Users can interface with the Dispatcher by using the EnFuzion Eye and a web browser, or through the command line utilities enfsub and enfcmd. Custom programs can also communicate with the Dispatcher using its network-based programming interface. Chapter 9 describes how users and custom programs can accomplish most common tasks.
This chapter provides details about the Eye program, the command line programs enfsub and enfcmd, and the Dispatcher programming interface.
The Eye program provides your EnFuzion cluster with an intuitive, web-based interface. It establishes a connection to the EnFuzion Dispatcher and displays information about a running cluster. The Eye uses a set of web pages, so that the user can interact with EnFuzion using a graphical web browser.
The Eye allows the user to monitor the state of the cluster, nodes and runs that EnFuzion uses. Furthermore, the Eye allows the inspection of cluster and run logs. Using a web browser, it can be used to browse and retrieve run results and to submit new runs and related data files.
The Eye runs as a separate program, interfering as little with the actual EnFuzion cluster as possible. If you encounter a problem while using the Eye, your cluster should continue functioning normally.
The Eye is started by executing the enfeye executable residing on the root machine in the same location as the EnFuzion Dispatcher.
The Eye is normally started automatically by the Dispatcher, as described in the Section called Handling of the Eye by the Dispatcher in Chapter 9, so there is no need to change any of the configuration defaults to use the Eye.
The Eye can also be started manually from a command line, or its default configuration can be changed. The Eye command line options are described in the Section called The Eye in Chapter 11. The Eye configuration options are described in the Section called Specifying Root Configuration Options in Chapter 6.
Once the Eye is started as described above, you can use your web browser to connect to it. The Eye uses only plain HTML, conforming to the W3C HTML 4.01 DTD, and cascading style sheets to construct its web pages. The Eye works best with Internet Explorer 5 or higher, Mozilla 1.0 or higher and Netscape 6 or higher. Cookies must be enabled in your browser for the Eye to function properly.
Your web browser needs to be directed to the system where the Eye is executing and to the port that the Eye is listening on. The default port number is 10101. Using default values, you can connect to the Eye with the following link:
http://<root_host>:10101
The Eye port number can be changed as described in the Section called Port Number for the Eye in Chapter 6.
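As a minimal sketch, the link above can be built, and the port checked before opening a browser, with a few lines of Python. The function names and the host name root.example.com are hypothetical; only the default port 10101 comes from this chapter. The check only confirms that something accepts TCP connections on the port, not that it is the Eye.

```python
import socket

DEFAULT_EYE_PORT = 10101  # default Eye port stated in this chapter


def eye_url(root_host: str, port: int = DEFAULT_EYE_PORT) -> str:
    """Build the URL to open in a web browser for an Eye running on root_host."""
    return f"http://{root_host}:{port}"


def is_listening(host: str, port: int = DEFAULT_EYE_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Hypothetical root machine name, used for illustration only.
print(eye_url("root.example.com"))
```

Calling is_listening("root.example.com") before pointing the browser at the URL helps distinguish an Eye that is not running from a browser or proxy problem.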
Upon establishing a connection, you arrive at the Eye home page (see Figure 10-1).
The header, which is common to most of the Eye pages, presents a short descriptive title of the page on the left-hand side, just below the Axceleon logo. The left-hand side also displays the time when the information used in creating the page was last updated. The right-hand side displays the host name and port where the Dispatcher is listening and the user that the Eye is currently logged in as.
The navigation bar in the header provides quick access to the most common activities. On the left, the "Home" option always brings you back to the home page shown here. The other options, which are described in the sections that follow, take you to the pages listed below:
Cluster: Cluster State page
Nodes: Node List page
Runs: Run List page
Accounting: Accounting Reports page
Execution: Executing Job List page
Submit: Run Submission page
Results: Run Results page
The Eye home page offers you a choice of activities:
The "Login" link lets you submit new login information.
The "Logout" link gives you an "anonymous" user ID.
The "Submit A Run" link allows you to submit a run.
The "Check Run Results" link presents you with a list of directories containing run results. Their contents may then be inspected and retrieved.
The "Cluster Monitoring" link takes you to a set of pages that show information on the overall cluster state, as well as the runs and nodes used by the cluster.
The "Accounting" link takes you to the page that lets you generate and view reports of EnFuzion activity.
Most of the information in the Eye is presented in tables. When appropriate, the table contents may be sorted by column, in either ascending or descending order. If the column header is a hyperlink, simply click on it to sort the table by that column.
A table that consists of more than one hundred rows is broken into pages of one hundred rows each. In this case, a page index appears above the table, displaying the current page number and links for navigating between the pages.
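The sorting and paging behavior described above can be illustrated with a short Python sketch. It is not the Eye's implementation; the function names and the dictionary-based row layout are assumptions made for the example.

```python
from typing import Any, Dict, List

PAGE_SIZE = 100  # rows per page, as described above

Row = Dict[str, Any]


def sort_rows(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Return the rows sorted by one column, ascending unless descending is set."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def paginate(rows: List[Row], page_size: int = PAGE_SIZE) -> List[List[Row]]:
    """Split the rows into consecutive pages of at most page_size rows each."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


# Example: 250 rows sorted by a hypothetical "Run ID" column, newest first.
rows = [{"Run ID": i} for i in range(250)]
pages = paginate(sort_rows(rows, "Run ID", descending=True))
print([len(page) for page in pages])
```

A table with one hundred rows or fewer yields a single page, which matches the case where no page index is shown.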
Runs can be submitted through the Run Submission page, which can be reached via the Eye home page or through the Submit link, available in the header menu. The Run Submission page is shown in Figure 10-2.
When submitting a run, you first need to upload the run file to the Dispatcher. Click on the Browse button near the Run file field, and select your run file.
Clicking on the Submit button will upload the selected file and create a run from it. If your run file was not correctly formed, you will see an error message reporting that adding the run failed. Otherwise, a page will be displayed, enabling you to select and upload optional data files (see Figure 10-3).
Select a file with the Browse button, and then click on the Submit Data File button. You will see the data file added in the list below the submission form. Repeat this process for every data file, and select Start Run Execution. The results of starting a run will then be displayed (see Figure 10-4).
If the run was successfully started, you can immediately view its state by clicking on the link that includes the ID of the started run. This process is described in more detail in the Section called Detailed Run Information Page.
Note that although EnFuzion allows you to specify a custom name for the run directory, custom directories are not supported by the Eye. You need to allow the run to create its own directory, using a default name.
If you are accessing the Eye via a proxy, it is possible that the proxy will not allow you to post large data files to the Eye. One solution is to bypass the proxy and connect to the Eye directly. Otherwise, you may need to contact your proxy administrator for assistance.
This collection of pages displays an in-depth view of the EnFuzion cluster that the Eye is connected to, including its runs and nodes.
The first table contains general information about the cluster (see Figure 10-5):
Cluster: the host name and port that the EnFuzion Dispatcher is using
Status: the status of the cluster
Uptime: the total time that the cluster has been running
Active Nodes: the number of active nodes, which may be executing or idle
Down Nodes: the number of nodes that are down and unable to perform work
Submitted Runs: the number of runs already submitted to the cluster
Completed Runs: the number of runs completed by the cluster
The "Nodes" link takes you to the Node List page, as described in the Section called Node List Page below. The corresponding table shows the numbers of nodes, grouped by the node status.
The "Runs" link takes you to the Run List page, as described in the Section called Run List Page. The corresponding table shows the number of runs, grouped by their status.
Finally, a table lists the ten most recent diagnostic messages from the cluster log that merit user attention. If there are more than ten messages, two buttons under the table allow you to view all diagnostic messages or the complete cluster log, respectively.
This page displays a single table, containing all of the runs that the EnFuzion cluster recognizes. The following information is displayed in the table (see Figure 10-6):
Selection: the first column allows you to add and remove runs from the selection
Run ID: the run ID. Clicking this takes you to the detailed run information page.
Name: the run name
User: user ID of the run owner
Status: the run status
Uptime: the time elapsed since the run was started
Finish In: the estimated time required to complete this run
Priority Level: priority level for the run
Priority Weight: priority weight for the run
Allocated Nodes: the number of nodes allocated to perform work for this run
Jobs Waiting: the number of jobs still waiting to be executed
Jobs Executing: the number of jobs currently executing
Jobs Done: the number of completed jobs
Jobs Failed: the number of jobs that did not complete due to some error
Job Length: the average time to complete a job
Total Time: the sum of completion times for all the jobs
Below the table, three buttons allow you to operate on the set of selected runs:
Start: starts selected runs
Stop: stops selected runs
Abort: aborts selected runs
This page displays detailed information about a single run (see Figure 10-7):
The first table contains general run information:
Run ID: the run ID
Name: the run name
User: user ID of the run owner
Account: user specified string
Priority Level: priority level for the run
Priority Weight: priority weight for the run
Node Limit: maximum number of nodes to execute the run
Persistent: persistence switch
Preemptive: preemption switch
Execution Limit: time to complete the run
Job Execution Limit: time to complete a job
The second table contains information about run status:
Status: one of Created, Started, Done, Failed, Stopped
Stage: one of Initializing, Rootstart, Jobsexecuting, Nodefinish, Rootfinish
Allocated Nodes: the number of nodes allocated to perform work for this run.
Uptime: the time elapsed since the run was started.
Finish In: the estimated time required to complete this run.
Total Time: the sum of completion times for all the jobs
The next table contains information about how the run is executing:
Jobs Waiting: the number of jobs still waiting to be executed
Jobs Executing: the number of jobs currently executing
Jobs Done: the number of completed jobs.
Jobs Failed: the number of jobs that did not complete due to some error
Job Length: the average time to complete a job
Datajobs Executing: the number of data jobs currently executing
Datajobs Done: the number of completed data jobs
Datajob Length: the average time to complete a data job
Below this table, a list of nodes that are initialized to serve this run is displayed. The following columns are specified for each node:
Node: node ID
Host: host name executing the node
Jobs Done: jobs completed on the node
Datajobs Done: datajobs completed on the node
Nice: job priority on the node. If nice is on, then the jobs are executed at a background priority.
User: user account on the node that executes the jobs
Directory: the main directory where jobs are executing
At the bottom of the page, additional buttons enable you to view further run details:
Output: shows list of files produced by this run
Log: displays the run log
Completed Jobs: takes you to the Completed Jobs page
Requirements: lets you inspect and edit run requirements
The last row of buttons allows you to control the run:
Start: starts the run
Stop: stops the run
Abort: aborts the run
Edit: takes you to a page that displays and lets you edit various run attributes
Editing run attributes brings you to a new page where you can edit the following run attributes: Priority Level, Priority Weight, Node Limit and Execution Limit. When you change these to the desired values, simply click on the "Apply Changes" button in order to commit the changes and have them take effect.
The completed jobs page shows a table of all jobs in the specified run that have completed (see Figure 10-8).
Job ID: ID of the job
Node ID: ID of the node that the job was completed on
Node Host: host name of the node that the job was completed on
Execution Time: time that the job executed
Start Time: time when the job was first started
End Time: time when the job was completed
Job Starts: number of times the job was started
Type: type of job, which is either "nodestart" for jobs that initialize a node or "main" for user specified jobs
Status: status of job, which is either "done" or "failed"
This page displays a list of all nodes that the EnFuzion cluster recognizes. For each node, the following information is displayed (see Figure 10-9):
Selection: the first column allows you to add and remove nodes from the selection
Node ID: the node name
Host: the host name of the node
Status: one of Executing, Idle, Busy, Down, Starting, Terminating
Uptime: the time elapsed since the node last changed its status to "Up"
Executing: the percentage of the uptime that the node was executing user jobs
Idle: the percentage of the uptime that the node was idle
Busy: the percentage of the uptime that the node was unavailable, since it was busy with processing unrelated to EnFuzion
Downtime: the time elapsed since the node last changed its status to "Down"
Job Limit: the maximum number of concurrent jobs that this node can execute
Jobs Executing: the number of jobs currently executing on this node
Jobs Done: the number of jobs completed by this node
Job Length: the average time needed to complete a job on this node
Clicking on the node name link provides you with yet more information about that node. See the Section called Detailed Node Information page for further information.
Below the table, you may choose to start, terminate or remove selected nodes or add a new node.
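The Executing, Idle and Busy columns are percentages of the node uptime. The arithmetic can be sketched in Python as follows, assuming that the uptime is the sum of the time spent in these three states; this chapter does not state that explicitly, and the function and parameter names are hypothetical.

```python
from typing import Dict


def uptime_percentages(executing_s: float, idle_s: float, busy_s: float) -> Dict[str, float]:
    """Return the share of uptime spent executing, idle and busy, in percent.

    Assumption: uptime = executing + idle + busy (not confirmed by the manual).
    """
    uptime = executing_s + idle_s + busy_s
    if uptime == 0:
        # A node that has just come up has no history yet.
        return {"Executing": 0.0, "Idle": 0.0, "Busy": 0.0}
    return {
        "Executing": 100.0 * executing_s / uptime,
        "Idle": 100.0 * idle_s / uptime,
        "Busy": 100.0 * busy_s / uptime,
    }


# Example: a node up for two hours, executing jobs for 30 minutes of that time.
print(uptime_percentages(executing_s=1800, idle_s=5400, busy_s=0))
```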
Adding a node brings you to a new page where you have to enter information on the new node. This data is mostly the same as that used in the enfuzion.nodes file:
Host name of the node
Username used to login to the node
Password that is used to log in to the node. You need to type it twice in order to confirm it. If you use key authorization for the SSH method, which does not require a password, just enter a dummy string for the password.
Connection type
Clicking the "Add" button will add a new node to the cluster. You are only allowed to add a node if privileges are not enforced or if you are logged in as a user with administrator privileges.
The detailed information page consists of three tables. The first table displays general information about the selected node (see Figure 10-10):
Node ID: the node name
Host: the host name of the node
User: the user that is used to log on the node
Port: the port used for communication with the EnFuzion Dispatcher
Operating System: the operating system running on this node
Root Start: switch to indicate whether the root starts the node or the node is started independently
Start Type: the method to start the node
Start Command: the command used to start the node
The second table displays the node's status information:
Status: the node status
Total Time: the total time since the node was added to the cluster
Total Uptime: the total time that the node was "Up"
Total Downtime: the total time that the node was "Down"
Uptime: the time elapsed since the node last changed its status to "Up"
Executing: the percentage of the uptime that the node was executing user jobs
Idle: the percentage of the uptime that the node was idle
Busy: the percentage of the uptime that the node was unavailable, since it was busy with processing unrelated to EnFuzion
Downtime: the time elapsed since the node last changed its status to "Down"
Finally, the third table displays job execution statistics for this node:
Job Limit: the maximum number of concurrent jobs that this node can execute
Jobs Executing: the number of jobs currently executing on this node
Jobs Done: the number of jobs completed by this node
Job Length: the average time needed to complete a job on this node
Below the tables, a set of buttons enables you to control the node. You may choose to:
Start the node
Terminate the node
Remove the node
View the log
Edit the node properties
Selecting the properties button brings you to the Node Properties page. Here you can view all the node properties in a list. You may select any number of properties and remove them using the "Remove" button or enter a new property in the text field below the "Remove" button and add it with the "Add" button.
The Executing Jobs page shows all currently executing jobs. The table consists of the following fields:
Selection: allows you to add jobs to or remove jobs from the selected set
Run ID: shows ID of the run the job belongs to. The ID links to the respective run page
Run Name: shows name of the run the job belongs to
Run Owner: shows the user ID of the run owner
Job ID: shows ID of the job
Node ID: ID of the node the job is currently executing on; the ID links to the respective node page
Node Host: hostname of that node
Execution Time: the time the job has been executing for
Below the table two buttons allow you to abort or reschedule the selected set of jobs.
This page shows a list of all directories that store results of an EnFuzion run (see Figure 10-12):
Selection: the first column allows you to add and remove completed runs from the selection
Run ID: ID of the completed run; the ID links to the page with the contents of the run directory where output files are stored
Name: name of the run
Status: "done" or "failed"
User: the owner of the run
Account: user specified string
Submitted: time of submission of run
Completed: time of completion of run
Uptime: time the run was up
Total Time: the sum of execution times for all the nodes
Jobs Waiting: number of waiting jobs. If the run is aborted, then this number represents the number of uncompleted jobs
Jobs Done: number of done jobs
Jobs Failed: number of failed jobs
Jobs Rescheduled: number of rescheduled jobs
Job Length: average length of a single job
Data Jobs Done: number of done data jobs
Data Job Length: average length of a single data job
Nodes: number of used nodes, which links to the page of nodes used.
Beneath the table, the following buttons allow you to inspect details of the selected run:
Output: shows contents of the directory containing files output by the run
Log: shows run log
Completed Jobs: takes you to the Completed Jobs page
Used Nodes: view the Used Nodes page
Delete: deletes the run directory and all user files in the directory. This operation deletes all information about the run; use with care!
By following the Run ID link or using the Output button, the user may browse the contents of run directories in a similar fashion to browsing a file system with a file manager. Clicking on a directory displays its contents, their sizes and the dates of their last modification (see Figure 10-13):
You can browse recursively through the subdirectories and download the files contained in them.
This page shows a table of all nodes used by the specified run (see Figure 10-14):
Node ID: node ID; it links to the appropriate node page
Host Name: host name of the node
Jobs Done: number of jobs completed on this node
Data Jobs Done: number of data jobs completed on this node
Nice: execution priority
User: the account used on the node
Directory: the working directory on the node
The accounting page lists available run and node activity reports (see Figure 10-15):
At the top of the page, a "Change Report Layout" button takes you to the Report Layout page. Below it, three links, "Hourly Reports", "Daily Reports" and "Monthly Reports", take you to the parts of the page that list hourly, daily and monthly reports, respectively.
The three tables below list reports by period of activity, with run reports in the left column and node reports in the right column. The first table lists hourly reports, the second daily reports and the third monthly reports. Clicking on a link in a table shows the desired report.
The report layout page lets you edit the columns shown in the run and node reports (see the Section called Accounting Page):
The first table is dedicated to the node reports and the second one to the run reports. You may check the "Group By Column" checkbox in order to group report rows by certain columns. In this case, the values for grouped rows are added together. Entering a "Match Value" only shows the rows where the desired column's value matches the entered one.
You may use the buttons beneath each table to reset the layout specification to the default one.
At the bottom of the page, you may select a group filter for run reports. Only runs owned by users in the selected group will be shown in the reports.
Changes to layout should be committed by clicking on the "Apply Changes" button.
Each report page starts with a header that describes the report type and the period for which the report stands. The actual report table follows and the page ends with a button that allows you to change the report layout.
Reports are available for runs (see Figure 10-17) and nodes (see Figure 10-18):
This section lists error messages that the Eye produces.
An unexpected error occurred. Please follow the instructions on the page in order to try and remedy the problem. Retry your action and if it fails again, restart the Eye and retry your action again. If the problem persists, send a bug report with a detailed description of how to reproduce it to support@axceleon.com.
The client has been denied access to the Eye. You should check your access permissions in the root.options file.
The client failed to log in to the Eye and to the Dispatcher. Check that you have used a proper user identity file, generated by the enfcmd utility, and that the file has not been altered by anyone.
The Eye was unable to connect to the Dispatcher. Please verify that the Dispatcher is actually running and that the Eye has been set up to connect to the proper port.
You have attempted to perform an action that requires at least one selected item, but you have selected none.
You have attempted to perform an action that requires exactly one selected item, but you have selected more than one.
You have chosen an action that requires user privileges that you do not have: perhaps you have attempted to perform an administrative action while not logged in as a user with administrative privileges or have chosen to manipulate a run that is not owned by the user you are currently logged in as.
The Dispatcher was run in batch mode and has exited, bringing down the Eye with it. If you wish to browse the run results after the Dispatcher has quit, you need to start the Eye manually. Alternatively, set the eyeterminate option to off in the root.options file, which prevents the Dispatcher from taking down the Eye when it exits.
Your session has probably expired. Go to the Home Page and log in again.
The Eye was unable to connect to the Dispatcher. Check the port number of the Dispatcher given to the Eye through command line options or entered through the login page.
You have attempted to submit a file without specifying which file.
You have attempted to display information about a node that does not exist.
You have attempted to fetch information on a run that the Dispatcher does not recognize.
The results of the requested run do not exist, or are in a directory with a non-default name.
Reporting data for the period you want the report for was not found.
You have requested a page that the Eye knows nothing about.
The maximum number of concurrent sessions that the Eye is willing to handle has been reached. You need to wait for one session to expire. Currently, the Eye supports 256 sessions. A session expires after a week of inactivity.
The run submission has expired since you have not completed it in a reasonable time. The submission cannot be completed and you need to resubmit the run.
The Eye was unable to submit the run to the Dispatcher.
A parameter that is mandatory was not entered. This error might happen whenever the Eye requires you to supply some values for an action like editing node or run attributes, adding a node or similar.
A parameter you entered has a value that is not allowed.
When adding a node you need to enter the same password twice in order to confirm it. You have not entered the same password in both text fields.
The root options noanonsubmit (see the Section called Rejecting Anonymous Run Submission in Chapter 6) and privileges (see the Section called Enforcing Privileges in Chapter 6) affect which actions users can perform. By default, noanonsubmit and privileges are turned off, which allows any user to perform any action.
If noanonsubmit is turned on, then the following action is not permitted by users with the anonymous user ID:
run submission, described in the Section called Submitting a Run;
If privileges are turned on, then the following actions are permitted only by users with administrative privileges:
Start, Terminate, Remove actions on the Node List page, described in the Section called Node List Page, and the Add Node action on the subpage to add a new node;
Start, Terminate, Remove actions on the Detailed Node Information page, described in the Section called Detailed Node Information page; Remove, Add actions on the Properties subpage;
If privileges are turned on, then the following actions are permitted only by users with administrative privileges or the run owner:
Start, Stop, Abort actions on the Run List page, described in the Section called Run List Page;
Start, Stop, Abort actions on the Detailed Run Information page, described in the Section called Detailed Run Information Page; Remove, Add actions on the Run Requirements subpage; Apply Changes action on the Edit Run subpage;
Abort, Reschedule actions on the Executing Jobs page, described in the Section called Executing Jobs Page.
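The rules above can be summarized as a small decision function. The Python sketch below is one reading of those rules, not the Dispatcher's actual logic; the action labels (such as "node.remove") and the User type are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

ANONYMOUS = "anonymous"  # user ID given by the "Logout" link

# Hypothetical labels for the actions listed above.
SUBMIT = "run.submit"
ADMIN_ONLY = {
    "node.start", "node.terminate", "node.remove", "node.add",
    "node.property.add", "node.property.remove",
}
OWNER_OR_ADMIN = {
    "run.start", "run.stop", "run.abort", "run.edit",
    "run.requirement.add", "run.requirement.remove",
    "job.abort", "job.reschedule",
}


@dataclass
class User:
    user_id: str = ANONYMOUS
    is_admin: bool = False


def is_permitted(action: str, user: User, run_owner: Optional[str] = None,
                 noanonsubmit: bool = False, privileges: bool = False) -> bool:
    """Decide whether a user may perform an action under the two root options."""
    if action == SUBMIT:
        # noanonsubmit only affects run submission by the anonymous user.
        return not (noanonsubmit and user.user_id == ANONYMOUS)
    if not privileges:
        return True  # default: any action may be performed by any user
    if action in ADMIN_ONLY:
        return user.is_admin
    if action in OWNER_OR_ADMIN:
        return user.is_admin or (run_owner is not None and user.user_id == run_owner)
    return True  # actions not listed above are not restricted


# Example: with privileges on, only the owner or an administrator may stop a run.
print(is_permitted("run.stop", User("alice"), run_owner="alice", privileges=True))
```

For job actions, run_owner stands for the owner of the run that the job belongs to.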
The Eye offers IP-based authentication. The administrator can set a list of IP addresses that are allowed or denied to connect to the Eye (see the Section called Restricting Access to the Eye in Chapter 6 for details).
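The idea of IP-based allow and deny lists can be sketched in Python with the standard ipaddress module. This does not reproduce the Eye's configuration syntax, which is described in Chapter 6. The sketch assumes that a deny entry takes precedence over an allow entry and that an empty allow list admits everyone; both assumptions, and the function name, are made for illustration only.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Check a client address against allow and deny lists.

    Entries may be single addresses or CIDR networks.
    Assumptions (not from the manual): deny wins over allow, and an
    empty allow list means that every address not denied is allowed.
    """
    addr = ipaddress.ip_address(client_ip)
    allow_nets = [ipaddress.ip_network(entry) for entry in allow]
    deny_nets = [ipaddress.ip_network(entry) for entry in deny]
    if any(addr in net for net in deny_nets):
        return False
    if not allow_nets:
        return True
    return any(addr in net for net in allow_nets)


# Example: admit one subnet, but refuse a single host inside it.
print(ip_allowed("192.168.1.10", allow=["192.168.1.0/24"], deny=["192.168.1.10"]))
```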