EnFuzion runs describe work that is scheduled and executed by EnFuzion on remote machines. A run can be a command line program, a script or a parametric execution. A parametric execution contains multiple jobs that share execution commands, but have different input parameters and different outputs. Parametric executions are optimized for applications where the same program is executed again and again, thousands of times if necessary, each time with different input parameters. Normally, each instance of application execution represents one job.
Runs are submitted by EnFuzion users from submit hosts, which are normally local user machines. Jobs are executed by EnFuzion nodes, which are computer hosts that perform the computation. A central host, called EnFuzion root, controls the nodes and manages job execution.
EnFuzion implements the concept of a user. All interactions with EnFuzion at run time are assigned an owner user ID. User IDs are used for generating activity reports and for restricting permitted actions.
The sections below explain basic EnFuzion concepts in more detail.
EnFuzion can manage remote execution of regular command line programs and scripts. Additionally, it is optimized for parametric executions, executing the same application many times with different input parameters. Parametric executions are common in computational modeling, simulations, and analysis. Many tasks can be reduced to parametric executions, such as Monte Carlo analysis, design optimization and verification, computational experiments, data mining, searching, combinatorial optimization, what-if scenarios and other similar tasks.
Each parametric execution is described by a run, which is a container for jobs that perform the same commands with different input values.
User jobs are submitted through runs. Each run specifies an environment for job execution and contains one or more jobs. The number of jobs in a run can range from one to millions of jobs.
A run can be a command line program, a script or a parametric execution containing many jobs. A parametric execution consists of tasks, job descriptions and configuration options. Tasks include commands that are executed for each job in the run. These task commands provide instructions on how to execute applications and specify input and output files. All jobs in a run share the same tasks. Job descriptions provide specific input values for each job. The run configuration specifies the options that determine run behavior, such as run priorities and timeout limits.
Runs are described in detail in Chapter 8. For run options see the Section called Options in Chapter 8.
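The relationship between shared tasks, job descriptions and run options can be illustrated with a small Python sketch. This is not EnFuzion's run file syntax, which is covered in Chapter 8. The `Run` class, the `sweep` helper and the `simulate` command are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Run:
    """Hypothetical container: shared task commands plus per-job parameter values."""
    task_commands: list                          # command templates shared by all jobs
    jobs: list = field(default_factory=list)     # one dict of parameter values per job
    options: dict = field(default_factory=dict)  # e.g. priority, timeout

    def expand(self):
        """Return, for every job, the task commands with that job's values filled in."""
        return [[cmd.format(**params) for cmd in self.task_commands]
                for params in self.jobs]


def sweep(**ranges):
    """Build one job description per combination of the given parameter values."""
    names = list(ranges)
    return [dict(zip(names, values)) for values in product(*ranges.values())]


run = Run(
    # "simulate" is a made-up program name standing in for the user's application.
    task_commands=["simulate --seed {seed} --rate {rate} > out_{seed}_{rate}.txt"],
    jobs=sweep(seed=[1, 2], rate=[0.1, 0.5]),
    options={"priority": 5},
)
commands = run.expand()  # four jobs, each with its own filled-in command line
```

Two values for each of two parameters yield four jobs. Every job runs the same command template and differs only in the values substituted into it.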
A job corresponds to one unit of work. It executes commands from the common tasks in the run, but uses its own specific input parameter values.
EnFuzion supports two kinds of jobs. Regular jobs must have an associated task description and a set of input parameters. These regular jobs are simply referred to as jobs. They are used in most applications. Datastream jobs consist of input data and resulting output data. Datastream jobs are referred to as datajobs. They deliver higher throughput, with less overhead than regular jobs and are better suited for certain special applications.
Run execution results in contexts. For each node that executes jobs from a run, the run maintains a context with temporary information about the node. A context is created dynamically during job execution, after a node has been initialized to execute the jobs of the run. Initialization can include installing the execution binary on the node or copying common files to the node. If a run or a node is terminated, the corresponding context is deleted. Contexts are handled automatically by EnFuzion. There is no need for users to issue any special context commands.
Submit computers are used to submit jobs for execution. These are usually local user machines, although any other machine can be used to submit jobs.
Users on submit computers can use a standard web browser to submit jobs and communicate with the EnFuzion root. In that case, there is no need to install any EnFuzion related software on the system.
EnFuzion provides additional programs, which allow job submission from a command line, provide user identification and simplify job preparation. If these features are required, then EnFuzion submit components must be installed on each submit system.
Besides having EnFuzion submit software installed, there are no special requirements for any additional software or hardware on the submit host.
If EnFuzion is used as a service on the network, then the service address must be specified in the submit.config file. Details are provided in Chapter 5.
There are no background EnFuzion processes on submit hosts. All EnFuzion processes are executed under explicit user control.
Users can communicate with the root from their submit computers using a web based interface, a command line program or directly through a network based application programming interface.
Jobs are submitted and results can be retrieved by users from their submit computers through a standard web browser, which communicates with the Eye process on the root. See the Section called Graphical Web Based Interface in Chapter 10 for more details on this process.
EnFuzion provides the command line program Enfsub, which is used to submit jobs. This command is detailed in the Section called The enfsub Program in Chapter 10. The enfsub command is useful to automate submission steps in scripts, which can be implemented through standard scripting languages, such as shells, Perl, Python and Ruby.
EnFuzion provides the command line program Enfcmd, which is used to monitor and control submission as well as to retrieve results. This command is detailed in the Section called The Enfcmd Program in Chapter 10. The enfcmd command is useful to automate EnFuzion activity in scripts, which can be implemented in standard scripting languages, such as shells, Perl, Python and Ruby.
Alternatively, other programs in programming languages such as C/C++ and Java can communicate directly with the root through the EnFuzion network based API, which provides a complete range of commands to control the root. See the Section called Application Programming Interface in Chapter 10 for more details on this topic.
The root is the central component of an EnFuzion cluster. It controls the networked cluster nodes, handles communication with users, and manages the execution of jobs. Each root can control hundreds of nodes and can process thousands, or even millions of jobs, sometimes in just a few minutes.
The root activates and terminates cluster nodes. It exchanges heartbeat messages with nodes to determine their availability. It sends jobs for execution to nodes and retrieves job results.
Besides having EnFuzion root software installed, there are no special requirements for any additional software or hardware on the root host. Since EnFuzion itself introduces little overhead, regular desktop computers can serve as cluster roots, even for very large clusters. In most EnFuzion environments, the load on the root host is light, so almost any computer can act as an EnFuzion root, as long as it provides sufficient disk storage for EnFuzion users.
Cluster nodes are described in the enfuzion.nodes file. For a detailed description, see the Section called Specifying EnFuzion Nodes in Chapter 6. The root provides a range of user configurable options in the root.options file. For a description of these cluster options, see the Section called Specifying Root Configuration Options in Chapter 6. The handling of EnFuzion users is configured by several files: users, which modifies default user assignments; groups, which lists group memberships; admins, which specifies users with administrative privileges; and user.accounts, which specifies how user accounts are determined on the nodes. These files are described in the Section called Specifying User Identities in Chapter 6, the Section called Specifying Groups in Chapter 6, the Section called Specifying Administrators in Chapter 6, and the Section called Specifying User Accounts for Job Execution on Nodes in Chapter 6.
The central process on the root is the Dispatcher, described in Chapter 9. The Dispatcher controls several subprocesses, including the node manager to manage the nodes, the node starter to start the nodes and the job daemon to execute job commands on the root.
The root also hosts the Eye process, which provides a web based user interface to the Dispatcher.
The EnFuzion root can be monitored and controlled using any standard web browser, the command line program Enfcmd or directly through a network based application programming interface (API). The web interface is provided by the Eye, which runs on the EnFuzion root host. The Eye is described in more detail in the Section called Graphical Web Based Interface in Chapter 10. The enfcmd command is detailed in the Section called The Enfcmd Program in Chapter 10. Finally, the API is described in the Section called Application Programming Interface in Chapter 10.
When jobs are submitted to the root, the root prioritizes their execution and executes them as nodes become available. The root communicates with nodes in order to maximize job throughput and to assure fast and reliable job execution. Through its resource management capabilities, the root matches job requirements with node capabilities. If a node becomes unavailable or a system error occurs, a job is automatically restarted on one of the working nodes.
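The behavior described above (prioritized dispatch, matching job requirements to node capabilities, and automatic restart after a node failure) can be sketched with a priority queue. This is a simplified illustration and not the Dispatcher's actual algorithm. The `Scheduler` class, its method names and the capability labels are all hypothetical.

```python
import heapq
from itertools import count


class Scheduler:
    """Hypothetical dispatcher: highest priority first, restricted to jobs a node can run."""

    def __init__(self):
        self._queue = []      # heap of (-priority, sequence, job name, requirements)
        self._seq = count()   # tie-breaker that keeps submission order within a priority
        self.running = {}     # node name -> queue entry the node is executing

    def submit(self, job, priority, requires=()):
        heapq.heappush(self._queue,
                       (-priority, next(self._seq), job, frozenset(requires)))

    def node_available(self, node, capabilities):
        """Assign the highest-priority job whose requirements the node satisfies."""
        skipped, assigned = [], None
        while self._queue:
            entry = heapq.heappop(self._queue)
            if entry[3] <= set(capabilities):   # requirements are a subset of capabilities
                assigned = entry
                break
            skipped.append(entry)
        for entry in skipped:                   # put unsuitable jobs back in the queue
            heapq.heappush(self._queue, entry)
        if assigned is None:
            return None
        self.running[node] = assigned
        return assigned[2]

    def node_failed(self, node):
        """Requeue the job of a failed node so another node can restart it."""
        entry = self.running.pop(node, None)
        if entry is not None:
            heapq.heappush(self._queue, entry)


s = Scheduler()
s.submit("low", priority=1)
s.submit("gpu_job", priority=9, requires={"gpu"})
s.submit("high", priority=5)
first = s.node_available("n1", {"linux"})   # gpu_job is skipped: n1 has no "gpu"
s.node_failed("n1")                         # the job on n1 returns to the queue
second = s.node_available("n2", {"gpu"})
third = s.node_available("n3", set())
```

In this sketch a job that no current node can satisfy simply stays queued until a suitable node reports in.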
Cluster nodes execute user jobs. A cluster can have hundreds of nodes, and each node can be configured to execute more than one user job. Furthermore, more than one cluster node can run on a single computer. This is useful for powerful computers with multiple processors.
Node computers can vary in size and functionality, ranging from desktop computers to powerful servers running Windows NT/2000/XP, Unix, or Linux. There are no special hardware or software requirements for nodes. All that is really required is to have the EnFuzion node software installed and a TCP/IP connection to the root host.
Nodes provide a range of user configurable options in the node.config file, detailed in Chapter 7. Load monitoring options, which determine when a node is available to execute an EnFuzion job, are specified in the enfuzion.options file. The enfuzion.options file is detailed in Chapter 7.
The main process on the node is the node server. The node server communicates with the root and manages other processes on the node. User jobs are executed by the job server processes. Each job has its own job server process. The job server manages all aspects of job execution, such as controlling user commands and the copying of files.
EnFuzion provides a wide range of load monitoring options for the node hosts. These options specify when a computer is idle and when it is available to execute user jobs. Examples of load monitoring options include 'no interactive use', 'sufficient available RAM', 'sufficient available disk space', and 'low CPU load'. Options are controlled by system administrators to provide optimal utilization of resources in their computing environment. See Chapter 7.
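The four example conditions above can be combined into a single availability test, as in the following sketch. The field names and threshold values are invented for illustration and do not correspond to actual EnFuzion option names, which are listed in Chapter 7.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical administrator-chosen thresholds."""
    max_cpu_load: float
    min_free_ram_mb: int
    min_free_disk_mb: int
    require_no_interactive_use: bool = True


@dataclass
class Sample:
    """Hypothetical snapshot of the current state of a node host."""
    cpu_load: float
    free_ram_mb: int
    free_disk_mb: int
    interactive_use: bool


def node_is_available(sample, limits):
    """Return True only if every configured load condition is satisfied."""
    if limits.require_no_interactive_use and sample.interactive_use:
        return False
    return (sample.cpu_load <= limits.max_cpu_load
            and sample.free_ram_mb >= limits.min_free_ram_mb
            and sample.free_disk_mb >= limits.min_free_disk_mb)


limits = Limits(max_cpu_load=0.3, min_free_ram_mb=512, min_free_disk_mb=1024)
idle = Sample(cpu_load=0.05, free_ram_mb=2048, free_disk_mb=50000, interactive_use=False)
busy = Sample(cpu_load=0.05, free_ram_mb=2048, free_disk_mb=50000, interactive_use=True)
```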
EnFuzion implements the concept of a user. All interactions with EnFuzion at run time are assigned an owner user ID. This owner assignment is used in accounting reports to identify the work done by a single user or to restrict user actions.
A user is identified by a string in the form <user>@<host_name>. By default, <user> is the account name of the user who submits the run and <host_name> is the host name of the computer where the submission is performed. <host_name> is usually the fully qualified domain name (FQDN) of the host. If the domain name is not set but the host name is, <host_name> is the plain host name. If neither is set, <host_name> is the IP address of the local host. If EnFuzion is unable to determine the default user ID string, a generic anonymous user ID is assigned as the run owner.
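The fallback order for the default user ID can be written out as a short Python sketch. It mirrors the rules stated above but is not EnFuzion's own implementation. The function names are hypothetical, and the string "anonymous" is a placeholder because the actual generic ID is not given here.

```python
import getpass
import socket

ANONYMOUS = "anonymous"  # placeholder; the real generic user ID is not specified in the text


def build_user_id(account, fqdn=None, host_name=None, ip_address=None):
    """Form <user>@<host_name>, preferring FQDN, then host name, then IP address."""
    host = fqdn or host_name or ip_address
    if not account or not host:
        return ANONYMOUS
    return f"{account}@{host}"


def default_user_id():
    """Derive the ID from the operating system of the local (submit) machine."""
    try:
        account = getpass.getuser()
    except Exception:          # no account name could be determined
        account = None
    host_name = socket.gethostname()
    fqdn = socket.getfqdn()
    # Treat a name without a dot as "domain name not set".
    return build_user_id(account, fqdn if "." in fqdn else None, host_name)
```

For example, `build_user_id("alice", "ws1.example.com", "ws1", "10.0.0.5")` yields `alice@ws1.example.com`, and the same call without the FQDN yields `alice@ws1`.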
The default user ID string can be changed by the EnFuzion administrator through a configuration file on the EnFuzion root system.
An EnFuzion user ID is assigned to each run when the run is submitted for execution. The user assignment cannot be changed later.
To enhance security and simplify usage, EnFuzion delegates the task of user identification to the operating system of the submit computer. When a user connects to EnFuzion for the first time, the user account name on the submit computer and the submit computer host and domain name are used to form a user identification string, which is sent to the Dispatcher on the root system. The EnFuzion user cannot influence the user assignment.
If the run is submitted through a command line, this user identification and assignment are done transparently to the EnFuzion user.
If the run is submitted through a web browser, then the EnFuzion user must perform a login. Otherwise, a generic anonymous user ID is assigned as the run owner. The user performs a login by using an identification file that was generated by the EnFuzion enfcmd command line utility.
An administrator can restrict actions that regular EnFuzion users can perform. By default, there are no restrictions and any EnFuzion user can perform any action.
Privilege enforcement is turned on by the administrator in a configuration file on the EnFuzion root system. This enforcement restricts the actions of regular EnFuzion users. They can only add new runs and control runs that they own. They are not allowed to control the cluster by performing actions such as removing a run owned by another user, adding and removing nodes, shutting down the cluster, and modifying cluster and node settings and properties.
Even when privilege enforcement is turned on, there are no restrictions on actions by users who are identified as EnFuzion administrators. These users are enumerated in a configuration file on the EnFuzion root system.
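The privilege rules above amount to a small decision function, sketched below. The action names and the `is_allowed` signature are hypothetical. EnFuzion's real configuration files and command names are described in Chapter 6 and Chapter 10.

```python
# Hypothetical labels for cluster-level actions reserved for administrators.
ADMIN_ONLY = {"add_node", "remove_node", "shutdown_cluster", "modify_settings"}


def is_allowed(user, action, run_owner=None, admins=(), enforce=False):
    """Decide whether `user` may perform `action` under the rules described above."""
    if not enforce:
        return True               # default: no restrictions for anyone
    if user in admins:
        return True               # administrators are never restricted
    if action == "add_run":
        return True               # any user may submit new runs
    if action in ADMIN_ONLY:
        return False              # cluster control is reserved for administrators
    return run_owner == user      # run control only for the owner of the run


admins = {"root@cluster.example.com"}
alice = "alice@ws1.example.com"
bob = "bob@ws2.example.com"
```

With enforcement on, `is_allowed(alice, "remove_run", run_owner=bob, admins=admins, enforce=True)` is `False`, while the same call with `enforce=False` is `True`.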
EnFuzion users can be grouped by the administrator in order to report combined activities of related users. Users can be members of one or more user groups.
Groups are useful to generate combined activity reports for different departments or group projects.
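A combined activity report of this kind can be sketched as a simple aggregation. The data layout, user IDs and group names below are invented for illustration and are not the format of the groups file or of EnFuzion's reports.

```python
def group_report(jobs_per_user, groups):
    """Sum per-user job counts into per-group totals.

    A user who belongs to several groups is counted in each of them.
    """
    return {group: sum(jobs_per_user.get(user, 0) for user in members)
            for group, members in groups.items()}


jobs_per_user = {"alice@ws1": 120, "bob@ws2": 30, "carol@ws3": 50}
groups = {
    "engineering": ["alice@ws1", "bob@ws2"],
    "project-x": ["alice@ws1", "carol@ws3"],
}
report = group_report(jobs_per_user, groups)
```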