Chapter 10. Interfacing with the Dispatcher

Table of Contents
Graphical Web Based Interface
Command Line Interface
Application Programming Interface

Users can interface with the Dispatcher through the EnFuzion Eye and a web browser, or through the command line utilities enfsub and enfcmd. Custom programs can also communicate with the Dispatcher through its network-based programming interface. Chapter 9 describes how users and custom programs can accomplish most common tasks.

This chapter provides details about the Eye program, the command line programs enfsub and enfcmd, and the Dispatcher programming interface.

Graphical Web Based Interface

The Eye program provides your EnFuzion cluster with an intuitive, web-based interface. It establishes a connection to the EnFuzion Dispatcher and displays information about a running cluster. The Eye uses a set of web pages, so that the user can interact with EnFuzion using a graphical web browser.

The Eye allows the user to monitor the state of the cluster, nodes and runs that EnFuzion uses. Furthermore, the Eye allows the inspection of cluster and run logs. Using a web browser, it can be used to browse and retrieve run results and to submit new runs and related data files.

The Eye runs as a separate program, interfering as little with the actual EnFuzion cluster as possible. If you encounter a problem while using the Eye, your cluster should continue functioning normally.

The Eye

The Eye is started by executing the enfeye executable residing on the root machine in the same location as the EnFuzion Dispatcher.

The Eye is normally started automatically by the Dispatcher, as described in the Section called Handling of the Eye by the Dispatcher in Chapter 9, so there is no need to change any of the configuration defaults to use the Eye.

The Eye can also be started manually from a command line, or its default configuration can be changed. The Eye command line options are described in the Section called The Eye in Chapter 11. The Eye configuration options are described in the Section called Specifying Root Configuration Options in Chapter 6.

Using the Eye

Once the Eye is started as described above, you can use your web browser to connect to it. The Eye uses only plain HTML, conforming to the W3C HTML 4.01 DTD, and cascading style sheets to construct its web pages. The Eye works best with Internet Explorer 5 or higher, Mozilla 1.0 or higher and Netscape 6 or higher. Cookies must be enabled in your browser for the Eye to function properly.

Your web browser needs to be directed to the system where the Eye is executing and to the port that the Eye is listening on. The default port number is 10101. Using default values, you can connect to the Eye with the following link:


    http://<root_host>:10101
The Eye port number can be changed as described in the Section called Port Number for the Eye in Chapter 6.
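
As a minimal sketch, the address above can also be assembled and probed from a script before opening a browser. The helper names eye_url and eye_is_reachable and the host name used in the usage note are illustrative assumptions; only the default port 10101 comes from this chapter.

```python
import urllib.error
import urllib.request

DEFAULT_EYE_PORT = 10101  # default Eye port given in this chapter


def eye_url(root_host: str, port: int = DEFAULT_EYE_PORT) -> str:
    """Build the address of the Eye home page (hypothetical helper)."""
    return f"http://{root_host}:{port}"


def eye_is_reachable(root_host: str, port: int = DEFAULT_EYE_PORT,
                     timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at the Eye address."""
    try:
        with urllib.request.urlopen(eye_url(root_host, port), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means that a server is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, unknown host, or timeout.
        return False
```

For example, eye_url("root.example.com") returns "http://root.example.com:10101". A successful probe only shows that something answers on that port; it does not replace a browser session, since the Eye relies on cookies.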

Upon establishing a connection, you arrive at the Eye home page (see Figure 10-1).

Figure 10-1. The Eye Home Page

The header is common to most of the Eye pages. On the left-hand side, just below the Axceleon logo, it presents a short descriptive title of the page being viewed and the time when the information used in creating the page was last updated. On the right-hand side, it displays the host name and port where the Dispatcher is listening and the user that the Eye is currently logged in as.

The navigation bar in the header provides quick access to the most common activities. On the left, the "Home" option always returns you to the home page, which you are currently viewing. The other options, which are described in the sections that follow, take you to the pages listed below:

  • Cluster: Cluster State page

  • Nodes: Node List page

  • Runs: Run List page

  • Accounting: Accounting Reports page

  • Execution: Executing Job List page

  • Submit: Run Submission page

  • Results: Run Results page

This navigation bar is also replicated in the footer of each page.

The Eye home page offers you a choice of activities:

  • The "Login" link lets you submit new login information.

  • The "Logout" link gives you an "anonymous" user ID.

  • The "Submit A Run" link allows you to submit a run.

  • The "Check Run Results" link presents you with a list of directories containing run results. Their contents may then be inspected and retrieved.

  • The "Cluster Monitoring" link takes you to a set of pages that show information on the overall cluster state, as well as the runs and nodes used by the cluster.

  • The "Accounting" link takes you to the page that lets you generate and view reports of EnFuzion activity.

Most of the information in the Eye is presented in tables. When appropriate, the table contents may be sorted by column, in either ascending or descending order. If the column header is a hyperlink, simply click on it to sort the table by that column.

A table that consists of more than one hundred rows is broken into pages of one hundred rows each. In this case, a page index appears above the table, displaying the current page number and links for navigating between the pages.
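
The sorting and paging behaviour described above can be sketched as follows. This is an illustration only, not the Eye's implementation: the function names and the sample column name are assumptions, and only the page size of one hundred rows comes from the text.

```python
from typing import Any

PAGE_SIZE = 100  # rows per page, as stated in the text


def sort_rows(rows: list[dict[str, Any]], column: str,
              descending: bool = False) -> list[dict[str, Any]]:
    """Return the rows sorted by one column, ascending by default."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def paginate(rows: list[dict[str, Any]],
             page_size: int = PAGE_SIZE) -> list[list[dict[str, Any]]]:
    """Split the rows into consecutive pages of at most page_size rows."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


# Hypothetical table of 250 runs, sorted by run ID in descending order.
rows = [{"run_id": i} for i in range(250)]
pages = paginate(sort_rows(rows, "run_id", descending=True))
print(len(pages), [len(page) for page in pages])
```

With 250 rows, the table is split into three pages, the last of which holds the remaining 50 rows.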

Submitting a Run

Runs can be submitted through the Run Submission page, which can be reached via the Eye home page or through the Submit link, available in the header menu. The Run Submission page is shown in Figure 10-2.

Figure 10-2. The Run Submission Page

When submitting a run, you first need to upload the run file to the Dispatcher. Click on the Browse button near the Run file field, and select your run file.

Clicking on the Submit button will upload the selected file and create a run from it. If your run file was not correctly formed, you will see an error message reporting that adding the run failed. Otherwise, a page will be displayed, enabling you to select and upload optional data files (see Figure 10-3).

Figure 10-3. Submission of Data Files

Select a file with the Browse button, and then click on the Submit Data File button. You will see the data file added in the list below the submission form. Repeat this process for every data file, and select Start Run Execution. The results of starting a run will then be displayed (see Figure 10-4).

Figure 10-4. Successful Run Submission

If the run was successfully started, you can immediately view its state by clicking on the link that includes the ID of the started run. This process is described in more detail in the Section called Detailed Run Information Page.

Note that although EnFuzion allows you to specify a custom name for the run directory, custom directories are not supported by the Eye. You need to allow the run to create its own directory, using a default name.

If you are accessing the Eye via a proxy, it is possible that the proxy will not allow you to post large data files to the Eye. One solution is to bypass the proxy and connect to the Eye directly. Otherwise, you may need to contact your proxy administrator for assistance.

Monitoring Execution

This collection of pages displays an in-depth view of the EnFuzion cluster that the Eye is connected to, including its runs and nodes.

Cluster Status Page

The first table contains general information about the cluster (see Figure 10-5):

Figure 10-5. The Cluster Status Page

  • Cluster: the host name and port that the EnFuzion Dispatcher is using

  • Status: the status of the cluster

  • Uptime: the total time that the cluster has been running

  • Active Nodes: the number of active nodes, which may be executing or idle

  • Down Nodes: the number of nodes that are down and unable to perform work

  • Submitted Runs: the number of runs already submitted to the cluster

  • Completed Runs: the number of runs completed by the cluster

The "Nodes" link takes you to the Node List page, as described in the Section called Node List Page below. The corresponding table shows the numbers of nodes, grouped by the node status.

The "Runs" link takes you to the Run List page, as described in the Section called Run List Page. The corresponding table shows the number of runs, grouped by their status.

Finally, a table lists the ten most recent diagnostic messages from the cluster log that merit user attention. If there are more than ten messages, two buttons under the table allow you to view all diagnostic messages or the complete cluster log, respectively.

Run List Page

This page displays a single table, containing all of the runs that the EnFuzion cluster recognizes. The following information is displayed in the table (see Figure 10-6):

Figure 10-6. The Run List Page

  • Selection: the first column allows you to add and remove runs from the selection

  • Run ID: the run ID. Clicking this takes you to the detailed run information page.

  • Name: the run name

  • User: user ID of the run owner

  • Status: the run status

  • Uptime: the time elapsed since the run was started

  • Finish In: the estimated time required to complete this run

  • Priority Level: priority level for the run

  • Priority Weight: priority weight for the run

  • Allocated Nodes: the number of nodes allocated to perform work for this run

  • Jobs Waiting: the number of jobs still waiting to be executed

  • Jobs Executing: the number of jobs currently executing

  • Jobs Done: the number of completed jobs

  • Jobs Failed: the number of jobs that did not complete due to some error

  • Job Length: the average time to complete a job

  • Total Time: the sum of completion times for all the jobs
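
The relationship between the Job Length and Total Time columns can be illustrated with a small calculation. The job durations below are made-up sample values, and the sketch assumes that both figures are computed over the same set of completed jobs.

```python
from statistics import mean

# Hypothetical execution times, in seconds, of three completed jobs.
completed_job_seconds = [120.0, 90.0, 150.0]

total_time = sum(completed_job_seconds)   # "Total Time": sum over all jobs
job_length = mean(completed_job_seconds)  # "Job Length": average per job

print(f"Total Time: {total_time:.0f} s, Job Length: {job_length:.0f} s")
```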

Below the table, three buttons allow you to operate on the set of selected runs:

  • Start: starts selected runs

  • Stop: stops selected runs

  • Abort: aborts selected runs

Detailed Run Information Page

This page displays detailed information about a single run (see Figure 10-7):

Figure 10-7. Detailed Run Information

The first table contains general run information:

  • Run ID: the run ID

  • Name: the run name

  • User: user ID of the run owner

  • Account: user specified string

  • Priority Level: priority level for the run

  • Priority Weight: priority weight for the run

  • Node Limit: maximum number of nodes to execute the run

  • Persistent: persistence switch

  • Preemptive: preemption switch

  • Execution Limit: time to complete the run

  • Job Execution Limit: time to complete a job

The second table contains information about run status:

  • Status: one of Created, Started, Done, Failed, Stopped

  • Stage: one of Initializing, Rootstart, Jobsexecuting, Nodefinish, Rootfinish

  • Allocated Nodes: the number of nodes allocated to perform work for this run.

  • Uptime: the time elapsed since the run was started.

  • Finish In: the estimated time required to complete this run.

  • Total Time: the sum of completion times for all the jobs

The next table contains information about how the run is executing:

  • Jobs Waiting: the number of jobs still waiting to be executed

  • Jobs Executing: the number of jobs currently executing

  • Jobs Done: the number of completed jobs.

  • Jobs Failed: the number of jobs that did not complete due to some error

  • Job Length: the average time to complete a job

  • Datajobs Executing: the number of data jobs currently executing

  • Datajobs Done: the number of completed data jobs

  • Datajob Length: the average time to complete a data job

Below this table, a list of nodes that are initialized to serve this run is displayed. The following columns are specified for each node:

  • Node: node ID

  • Host: host name executing the node

  • Jobs Done: jobs completed on the node

  • Datajobs Done: datajobs completed on the node

  • Nice: job priority on the node. If nice is on, then the jobs are executed at a background priority.

  • User: user account on the node that executes the jobs

  • Directory: the main directory where jobs are executing

At the bottom of the page, additional buttons enable you to view further run details:

  • Output: shows list of files produced by this run

  • Log: displays the run log

  • Completed Jobs: takes you to the Completed Jobs page

  • Requirements: lets you inspect and edit run requirements

Run requirements are shown in a list on a dedicated page: you may select and remove them with the "Remove" button or you may type a new requirement in the text field below the list and add it with the "Add" button.

The last row of buttons allows you to control the run:

  • Start: starts the run

  • Stop: stops the run

  • Abort: aborts the run

  • Edit: takes you to a page that displays and lets you edit various run attributes

Editing run attributes brings you to a new page where you can edit the following run attributes: Priority Level, Priority Weight, Node Limit and Execution Limit. When you change these to the desired values, simply click on the "Apply Changes" button in order to commit the changes and have them take effect.

Completed Jobs Page

The completed jobs page shows a table of all jobs in the specified run that have completed (see Figure 10-8).

Figure 10-8. The Completed Jobs Page

  • Job ID: ID of the job

  • Node ID: ID of the node that the job was completed on

  • Node Host: host name of the node that the job was completed on

  • Execution Time: time that the job executed

  • Start Time: time when the job was first started

  • End Time: time when the job was completed

  • Job Starts: number of times the job was started

  • Type: type of job, which is either "nodestart" for jobs that initialize a node or "main" for user specified jobs

  • Status: status of job, which is either "done" or "failed"

Node List Page

This page displays a list of all nodes that the EnFuzion cluster recognizes. For each node, the following information is displayed (see Figure 10-9):

Figure 10-9. The Node List Page

  • Selection: the first column allows you to add and remove nodes from the selection

  • Node ID: the node name

  • Host: the host name of the node

  • Status: one of Executing, Idle, Busy, Down, Starting, Terminating

  • Uptime: the time elapsed since the node last changed its status to "Up"

  • Executing: the percentage of the uptime that the node was executing user jobs

  • Idle: the percentage of the uptime that the node was idle

  • Busy: the percentage of the uptime that the node was unavailable, since it was busy with processing unrelated to EnFuzion

  • Downtime: the time elapsed since the node last changed its status to "Down"

  • Job Limit: the maximum number of concurrent jobs that this node can execute

  • Jobs Executing: the number of jobs currently executing on this node

  • Jobs Done: the number of jobs completed by this node

  • Job Length: the average time needed to complete a job on this node
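
The Executing, Idle and Busy columns are percentages of the node uptime. A minimal sketch of that arithmetic follows; the function name and the sample durations are hypothetical, and the sketch assumes that the three states together account for the whole uptime.

```python
def uptime_percentages(executing_s: float, idle_s: float,
                       busy_s: float) -> dict[str, float]:
    """Express the time spent in each state as a percentage of uptime."""
    uptime = executing_s + idle_s + busy_s  # assumption: states cover uptime
    if uptime == 0:
        return {"Executing": 0.0, "Idle": 0.0, "Busy": 0.0}
    return {
        "Executing": 100.0 * executing_s / uptime,
        "Idle": 100.0 * idle_s / uptime,
        "Busy": 100.0 * busy_s / uptime,
    }


# A node that has been up for one hour: 30 minutes executing user jobs,
# 20 minutes idle and 10 minutes busy with unrelated processing.
shares = uptime_percentages(1800, 1200, 600)
print({state: round(value, 1) for state, value in shares.items()})
```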

Clicking on the node name link provides you with yet more information about that node. See the Section called Detailed Node Information page for further information.

Below the table, you may choose to start, terminate or remove selected nodes or add a new node.

Adding a node brings you to a new page where you have to enter information about the new node. This information is mostly the same as that used in the enfuzion.nodes file:

  • Host name of the node

  • Username used to login to the node

  • Password that is used to log in to the node. You need to type it twice in order to confirm it. If you use key authorization for the SSH method, which does not require a password, enter a dummy string for the password.

  • Connection type

Clicking the "Add" button will add a new node to the cluster. You are allowed to add a node only if privileges are not enforced or if you are logged in as a user with administrator privileges.

Detailed Node Information page

The detailed information page consists of three tables. The first table displays general information about the selected node (see Figure 10-10):

Figure 10-10. Detailed Node Information

  • Node ID: the node name

  • Host: the host name of the node

  • User: the user that is used to log on to the node

  • Port: the port used for communication with the EnFuzion Dispatcher

  • Operating System: the operating system running on this node

  • Root Start: switch that indicates whether the root starts the node or the node is started independently

  • Start Type: the method to start the node

  • Start Command: the command used to start the node

The second table displays the node's status information:

  • Status: the node status

  • Total Time: the total time since the node was added to the cluster

  • Total Uptime: the total time that the node was "Up"

  • Total Downtime: the total time that the node was "Down"

  • Uptime: the time elapsed since the node last changed its status to "Up"

  • Executing: the percentage of the uptime that the node was executing user jobs

  • Idle: the percentage of the uptime that the node was idle

  • Busy: the percentage of the uptime that the node was unavailable, since it was busy with processing unrelated to EnFuzion

  • Downtime: the time elapsed since the node last changed its status to "Down"

Finally, the third table displays job execution statistics for this node:

  • Job Limit: the maximum number of concurrent jobs that this node can execute

  • Jobs Executing: the number of jobs currently executing on this node

  • Jobs Done: the number of jobs completed by this node

  • Job Length: the average time needed to complete a job on this node

Below the tables, a set of buttons enables you to control the node. You may choose to:

  • Start the node

  • Terminate the node

  • Remove the node

  • View the log

  • Edit the node properties

Selecting the properties button brings you to the Node Properties page. Here you can view all the node properties in a list. You may select any number of properties and remove them using the "Remove" button or enter a new property in the text field below the "Remove" button and add it with the "Add" button.

Executing Jobs Page

The Executing Jobs page shows all currently executing jobs. The table consists of the following fields:

Figure 10-11. The Executing Jobs Page

  • Selection: allows you to add jobs to or remove jobs from the selected set

  • Run ID: shows ID of the run the job belongs to. The ID links to the respective run page

  • Run Name: shows name of the run the job belongs to

  • Run Owner: shows the user ID of the run owner

  • Job ID: shows ID of the job

  • Node ID: ID of the node the job is currently executing on; the ID links to the respective node page

  • Node Host: hostname of that node

  • Execution Time: the time the job has been executing for

Below the table two buttons allow you to abort or reschedule the selected set of jobs.

Run Results page

This page shows a list of all directories that store results of an EnFuzion run (see Figure 10-12):

Figure 10-12. The Run Results page

  • Selection: the first column allows you to add and remove completed runs from the selection

  • Run ID: ID of the completed run; the ID links to the page with the contents of the run directory where output files are stored

  • Name: name of the run

  • Status: "done" or "failed"

  • User: the owner of the run

  • Account: user specified string

  • Submitted: time of submission of run

  • Completed: time of completion of run

  • Uptime: time the run was up

  • Total Time: the sum of execution times for all the nodes

  • Jobs Waiting: number of waiting jobs. If the run is aborted, then this number represents the number of uncompleted jobs

  • Jobs Done: number of done jobs

  • Jobs Failed: number of failed jobs

  • Jobs Rescheduled: number of rescheduled jobs

  • Job Length: average length of a single job

  • Data Jobs Done: number of done data jobs

  • Data Job Length: average length of a single data job

  • Nodes: number of used nodes, which links to the page of nodes used.

Beneath the table, the following buttons allow you to inspect details of the selected run:

  • Output: shows contents of the directory containing files output by the run

  • Log: shows run log

  • Completed Jobs: takes you to the Completed Jobs page

  • Used Nodes: view the Used Nodes page

  • Delete: deletes the run directory and all user files in the directory. This operation deletes all information about the run; use with care!

By following the Run ID link or using the "Output" button, you can browse the contents of run directories in a similar fashion to browsing a file system with a file manager. Clicking on a directory displays its contents, their sizes and the dates of their last modification (see Figure 10-13):

Figure 10-13. Run Directory

You can browse recursively through the subdirectories and download the files contained in them.

Used Nodes Page

This page shows a table of all nodes used by the specified run (see Figure 10-14):

Figure 10-14. The Used Nodes Page

  • Node ID: node ID; it links to the appropriate node page

  • Host Name: host name of the node

  • Jobs Done: number of jobs completed on this node

  • Data Jobs Done: number of data jobs completed on this node

  • Nice: execution priority

  • User: the account used on the node

  • Directory: the working directory on the node

Accounting Page

The accounting page lists available run and node activity reports (see Figure 10-15):

Figure 10-15. The Accounting Page

At the top of the page, a "Change Report Layout" button takes you to the Report Layout page. Below it, three links, "Hourly Reports", "Daily Reports" and "Monthly Reports", take you to the parts of the page listing hourly, daily and monthly reports, respectively.

The three tables below list reports by period of activity, with run reports in the left column and node reports in the right column. The first table lists hourly reports, the second daily reports and the third monthly reports. Clicking on a link in a table shows the corresponding report.

Report Layout Page

The report layout page lets you edit the columns shown in the run and node reports (see the Section called Accounting Page):

Figure 10-16. The Report Layout Page

The first table is dedicated to the node reports and the second one to the run reports. You may check the "Group By Column" checkbox in order to group report rows by certain columns. In this case, the values for grouped rows are added together. Entering a "Match Value" only shows the rows where the desired column's value matches the entered one.

You may use the buttons beneath each table to reset the layout specification to the default one.

At the bottom of the page, you may select a group filter for run reports. Only runs owned by users in the selected group will be shown in the reports.

Changes to layout should be committed by clicking on the "Apply Changes" button.

Report Pages

Each report page starts with a header that describes the report type and the period for which the report stands. The actual report table follows and the page ends with a button that allows you to change the report layout.

Reports are available for runs (see Figure 10-17) and nodes (see Figure 10-18):

Figure 10-17. Run Report

Figure 10-18. Node Report

Error Messages List

This section lists error messages that the Eye produces.

General Error

An unpredicted error occurred. Please follow the instructions on the page in order to try and remedy the problem. Retry your action and if it fails again, restart the Eye and retry your action again. If the problem persists, send a bug report with a detailed description of how to reproduce it to support@axceleon.com.

Error: Access Denied

The client has been denied access to the Eye. You should check your access permissions in the root.options file.

Error: Authentication Failed

The client failed to log in to the Eye and to the Dispatcher. Check that you have used a proper user identity file, generated by the enfcmd utility, and that the file has not been altered by anyone.

Error: Connection Failed

The Eye was unable to connect to the Dispatcher. Please verify that the Dispatcher is actually running and that the Eye has been set up to connect to the proper port.

Error: Empty Selection

You have attempted to perform an action that requires at least one selected item, but you have selected none.

Error: Multiple Selected Items Not Allowed

You have attempted to perform an action that requires exactly one selected item, but you have selected more than one.

Error: Action Not Permitted!

You have chosen an action that requires user privileges that you do not have: perhaps you have attempted to perform an administrative action while not logged in as a user with administrative privileges or have chosen to manipulate a run that is not owned by the user you are currently logged in as.

Error: The Eye has Quit

The Dispatcher was run in batch mode and has exited, bringing down the Eye with it. If you wish to browse the run results after the Dispatcher has quit, you need to start the Eye manually. Alternatively, set the eyeterminate option to off in the root.options file, which prevents the Dispatcher from taking down the Eye when it exits.

Error: Login Failed

Your session has probably expired. Please go to the Home Page and attempt to log in again.

Error: Dispatcher Not Found

The Eye was unable to connect to the Dispatcher. Check the port number of the Dispatcher given to the Eye through command line options or entered through the login page.

Error: No File Name

You have attempted to submit a file without specifying which file.

Error: No Such Node

You have attempted to display information about a node that does not exist.

Error: No Such Run

You have attempted to fetch information on a run that the Dispatcher does not recognize.

Error: No Run Results

The results of the requested run do not exist, or are in a directory with a non-default name.

Error: No Reporting Data

Reporting data for the period you want the report for was not found.

Error: Page Not Found

You have requested a page that the Eye knows nothing about.

Error: Session Limit Reached

The maximum number of concurrent sessions that the Eye is willing to handle has been reached. You need to wait for one session to expire. Currently, the Eye supports 256 sessions. A session expires after a week of inactivity.

Error: Run Submission Expired

The run submission has expired since you have not completed it in a reasonable time. The submission cannot be completed and you need to resubmit the run.

Error: Run Submission Failed

The Eye was unable to submit the run to the Dispatcher.

Error: Mandatory Parameters Missing

A parameter that is mandatory was not entered. This error might happen whenever the Eye requires you to supply some values for an action like editing node or run attributes, adding a node or similar.

Error: Invalid Parameter Value

A parameter you entered has a value that is not allowed.

Error: Passwords Do Not Match

When adding a node you need to enter the same password twice in order to confirm it. You have not entered the same password in both text fields.

Handling of Privileges

The root options noanonsubmit (see the Section called Rejecting Anonymous Run Submission in Chapter 6) and privileges (see the Section called Enforcing Privileges in Chapter 6) affect which actions users can perform. By default, both options are turned off, which allows any user to perform any action.

If noanonsubmit is turned on, then the following action is not permitted by users with the anonymous user ID:

If privileges are turned on, then the following actions are permitted only by users with administrative privileges:

If privileges are turned on, then the following actions are permitted only by users with administrative privileges or the run owner:

Access Control

The Eye offers IP-based authentication. The administrator can set a list of IP addresses that are allowed or denied to connect to the Eye (see the Section called Restricting Access to the Eye in Chapter 6 for details).
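
The idea of an allow and deny list can be sketched with the Python standard library. This is not the Eye's configuration syntax, which is described in Chapter 6. The function name, the sample addresses and the evaluation order (deny entries win, and an empty allow list admits everyone) are assumptions made for illustration.

```python
import ipaddress


def is_allowed(client_ip: str, allow: list[str], deny: list[str]) -> bool:
    """Decide whether a client address may connect (hypothetical policy)."""
    address = ipaddress.ip_address(client_ip)
    # Assumption: an explicit deny entry always takes precedence.
    if any(address in ipaddress.ip_network(entry) for entry in deny):
        return False
    # Assumption: an empty allow list means that every address is allowed.
    if not allow:
        return True
    return any(address in ipaddress.ip_network(entry) for entry in allow)


# Allow one subnet, but deny a single machine inside it.
allow = ["192.168.1.0/24"]
deny = ["192.168.1.20"]
print(is_allowed("192.168.1.10", allow, deny))
print(is_allowed("192.168.1.20", allow, deny))
```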