GriddLeS: Grid Enabling Legacy Software

 

 

Overview

Computational and data Grids couple geographically distributed resources such as high performance computers, workstations, clusters, and scientific instruments. Accordingly, they have been proposed as the next generation computing platform for solving large-scale problems in science, engineering, and commerce. Unlike traditional high performance computing systems, such Grids provide more than just computing power, because they address issues of wide area networking, wide area scheduling and resource discovery in ways that allow many resources to be assembled on demand to solve large problems. Grid applications have the potential to allow real time processing of data streams from scientific instruments such as particle accelerators and telescopes in ways that are much more flexible and powerful than is currently possible. A number of prototype applications have been built, and these demonstrate that the Grid computing paradigm holds much promise.

Of particular interest are applications, called Grid Workflows, that consist of a number of components, including computational models, distributed files, scientific instruments and special hardware platforms (such as visualisation systems). Importantly, the components of such workflows are interconnected in a flexible and dynamic way to give the appearance of a single application that has access to a wide range of data and runs on a very high performance platform. Grid workflows have been specified for a number of different scientific domains including physics, gravitational wave physics, geophysics, astronomy and bioinformatics.

Much of the effort in Grid computing is being directed towards the construction of new applications, in many cases written from scratch. We are interested in building new applications, but from legacy components. In particular, we want to leverage the billions of lines of code embodied in existing scientific and engineering codes, by stitching them together into new Grid aware applications.

Over the past 5 years we have constructed a software tool called Nimrod/G, which allows a user to migrate a particular class of applications to the Grid. Specifically, it automates the execution of parameter sweep applications (parameter studies) over global computational grids. Nimrod/G is particularly novel because it supports user-defined deadline and budget constraints for scheduling computations, and manages the supply and demand of resources in the Grid using an experimental computational economy. Thus, using Nimrod/G, we have demonstrated that it is possible to build specific Grid applications very easily and quickly for a niche class of problems, namely parameter sweeps. However, Nimrod/G cannot be used to build general grid workflows.

The GriddLeS environment is more general than Nimrod/G: it facilitates the composition of arbitrary grid applications from legacy software. The underlying belief is that it is possible to take existing programs and grid-enable them by providing a high level tool that facilitates the composition of complex systems from smaller, working components. A user of this environment interacts with a visual, graphical manipulation language to describe the interaction between programs, data sources, and IO devices such as shared scientific instruments.

One of the more important aspects of GriddLeS is the mechanism it uses to support communication between components. GriddLeS supports the construction of complete applications without source modification to the existing components. To achieve this we have overloaded the normal IO primitives in conventional languages so they support interprocess communication as well as file operations. This allows the individual components to behave as though they are operating in a conventional file system, whilst in fact they are sending and receiving data across a distributed grid infrastructure. The mechanism, called GridFiles, is very flexible and can be implemented by a range of different techniques, from file copy to IP sockets.

GridFiles makes use of a “File Multiplexer” as shown here. This routine replaces the normal file IO library for a particular language, and allows the system to redirect file IO requests dynamically to local files, remote files or remote processes. In the last case, a file multiplexer on the writer machine is linked with a symmetric file multiplexer on the reader machine. The multiplexer handles the synchronisation of readers and writers, and thus supports quite complex interprocess communication patterns. It is also possible to cache the data being transmitted between components.

Normal file IO primitives are intercepted by the File Multiplexer and are processed by the Local File Client, the Remote File Client or the Grid Buffer Client, depending on whether the file reference is for a local file, a remote file or an inter-process socket, respectively.
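The dispatch step described above can be sketched as follows. This is a minimal illustration, not the GriddLeS implementation: the class names mirror the three clients named in the text, but the `Target` type, the `open` method signatures and the stub return values are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class TargetKind(Enum):
    """The three kinds of file reference named in the text."""
    LOCAL_FILE = "local"
    REMOTE_FILE = "remote"
    GRID_BUFFER = "buffer"


@dataclass(frozen=True)
class Target:
    """Hypothetical result of resolving a file name."""
    kind: TargetKind
    location: str  # local path, remote path, or buffer identifier


# Stub clients: each only reports which client would handle the request.
class LocalFileClient:
    def open(self, location: str) -> str:
        return f"local:{location}"


class RemoteFileClient:
    def open(self, location: str) -> str:
        return f"remote:{location}"


class GridBufferClient:
    def open(self, location: str) -> str:
        return f"buffer:{location}"


class FileMultiplexer:
    """Routes an intercepted OPEN to one of three clients (assumed API)."""

    def __init__(self) -> None:
        self._clients = {
            TargetKind.LOCAL_FILE: LocalFileClient(),
            TargetKind.REMOTE_FILE: RemoteFileClient(),
            TargetKind.GRID_BUFFER: GridBufferClient(),
        }

    def open(self, target: Target) -> str:
        # The calling program is unaware of which client was chosen.
        return self._clients[target.kind].open(target.location)
```

Because the choice of client is made inside `open`, the calling component issues the same call in all three cases, which is what lets legacy code run unmodified.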

The GNS Client is responsible for resolving the local file names specified in the OPEN calls, and for mapping these to either local files, remote files, remote replicated files or remote processes. The File Multiplexer treats the GNS as a read only database, and matches up multiple OPEN calls. The GNS is loaded by a separate process responsible for configuring a grid application. Each entry in the GNS indicates what should happen when a particular file is opened on a particular resource. For example, if the file is to remain local to the resource, then the GNS simply stores the local file name. However, if the file is to be read from a remote resource, the full pathname of the remote file is stored in the GNS entry. If a Grid Buffer is required, then the local file name is mapped onto a Grid Buffer identifier.
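To make the mapping concrete, the GNS can be pictured as a read-only table keyed by resource and local file name. The sketch below is an assumption-laden illustration: the entry layout, the names `GnsEntry`, `load_gns` and `resolve`, and the host and file names are all hypothetical, and the source does not say how unmapped names are handled, so this version simply raises an error.

```python
from types import MappingProxyType
from typing import Dict, Mapping, NamedTuple, Tuple


class GnsEntry(NamedTuple):
    kind: str    # "local", "remote", or "buffer" (assumed labels)
    target: str  # local name, full remote pathname, or buffer identifier


Key = Tuple[str, str]  # (resource, local file name)


def load_gns(entries: Dict[Key, GnsEntry]) -> Mapping[Key, GnsEntry]:
    """Played by the separate configuration process; returns a read-only view."""
    return MappingProxyType(dict(entries))


def resolve(gns: Mapping[Key, GnsEntry], resource: str, filename: str) -> GnsEntry:
    """Look up what should happen when `filename` is opened on `resource`."""
    try:
        return gns[(resource, filename)]
    except KeyError:
        # Behaviour for unmapped names is not specified in the source.
        raise KeyError(f"no GNS entry for {filename!r} on {resource!r}")


# Hypothetical configuration: a writer on hostA and a reader on hostB
# open different local names that map onto the same Grid Buffer.
gns = load_gns({
    ("hostA", "scratch.dat"): GnsEntry("local", "scratch.dat"),
    ("hostA", "model.in"): GnsEntry("remote", "/home/user/model.in"),
    ("hostA", "out.dat"): GnsEntry("buffer", "buf-1"),
    ("hostB", "in.dat"): GnsEntry("buffer", "buf-1"),
})
```

Here the two OPEN calls for `out.dat` on hostA and `in.dat` on hostB resolve to the same buffer identifier, which is how a writer and a reader would be matched up in this sketch.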


The Local File Client simply passes the calls on to the local file system, using the file name as resolved by the GNS. The Remote File Client connects to a GridFTP server on the remote machine, and passes back blocks of the file as required. Note that the GridFTP server is a standard part of the Globus distribution, not a special component of GriddLeS. The Grid Buffer Client is responsible for implementing inter-process communication. It connects to a corresponding Grid Buffer Server on the other host, and sends blocks of data for each local WRITE call. At the other end of the socket, the Grid Buffer Client reads blocks by making calls to the local Grid Buffer Server. A cache file can be stored at either the sending end or the receiving end of a Grid Buffer connection.
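The reader and writer behaviour of a Grid Buffer can be illustrated with an in-process stand-in. This is only a sketch under stated assumptions: a thread-safe queue replaces the socket and the Grid Buffer Server, a list replaces the cache file, and the `GridBuffer` class and `run_demo` function are hypothetical names.

```python
import queue
import threading
from typing import List, Optional


class GridBuffer:
    """In-process stand-in for a Grid Buffer connection (assumed API)."""

    def __init__(self) -> None:
        self._blocks: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self.cache: List[bytes] = []  # stands in for the optional cache file

    def write(self, block: bytes) -> None:
        """One call per WRITE issued by the producing program."""
        if not block:
            return  # a zero-length write carries no data
        self.cache.append(block)
        self._blocks.put(block)

    def close(self) -> None:
        self._blocks.put(None)  # end-of-stream marker

    def read(self) -> bytes:
        """Blocks until the writer has produced data; b'' means end of stream."""
        block = self._blocks.get()
        if block is None:
            self._blocks.put(None)  # keep end-of-stream visible to later reads
            return b""
        return block


def run_demo() -> bytes:
    buf = GridBuffer()
    received: List[bytes] = []

    def reader() -> None:
        while True:
            block = buf.read()
            if not block:
                break
            received.append(block)

    consumer = threading.Thread(target=reader)
    consumer.start()  # the reader may start before any data exists
    for block in (b"alpha ", b"beta ", b"gamma"):
        buf.write(block)
    buf.close()
    consumer.join()
    return b"".join(received)
```

The reader thread is started before any block is written, so the blocking `read` provides the reader and writer synchronisation that the File Multiplexer is described as handling.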

The GRS Client is used to implement a GriddLeS data Replication Service. When an application opens a replicated file, the GRS makes an actual binding to one of the replicas. It determines the most appropriate one by measuring the available bandwidth to each replica (using tools such as the Network Weather Service (NWS)), and it dynamically switches sources during program execution should the bandwidth change. The GRS has been designed to support a variety of replication services, but the current implementation uses the Storage Resource Broker (SRB) from SDSC.
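The replica choice can be sketched as picking the replica with the highest measured bandwidth and repeating the choice when measurements change. This is an illustration only: `pick_replica`, `ReplicaBinding` and the `measure_bandwidth` callable are hypothetical, the bandwidth figures are made up for the example, and no NWS or SRB calls are shown.

```python
from typing import Callable, Dict, Iterable, List


def pick_replica(replicas: Iterable[str],
                 measure_bandwidth: Callable[[str], float]) -> str:
    """Return the replica with the highest measured bandwidth."""
    return max(replicas, key=measure_bandwidth)


class ReplicaBinding:
    """Tracks which replica an open file is currently bound to (assumed API)."""

    def __init__(self, replicas: Iterable[str],
                 measure_bandwidth: Callable[[str], float]) -> None:
        self._replicas: List[str] = list(replicas)
        self._measure = measure_bandwidth
        self.current = pick_replica(self._replicas, self._measure)

    def rebind(self) -> str:
        """Re-measure and switch source if another replica is now better."""
        self.current = pick_replica(self._replicas, self._measure)
        return self.current


# Made-up bandwidth readings standing in for a monitoring tool.
bandwidth: Dict[str, float] = {"replica-a": 10.0, "replica-b": 40.0}
binding = ReplicaBinding(["replica-a", "replica-b"], bandwidth.__getitem__)
first_choice = binding.current

bandwidth["replica-b"] = 2.0  # the link to replica-b degrades
second_choice = binding.rebind()
```

In this sketch `rebind` would be called periodically, or between block reads, during program execution; how often the real GRS re-measures is not stated in the source.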

GriddLeS can be combined with Grid workflow packages such as Kepler, a public domain grid workflow system. Using Kepler, it is possible to specify advanced Grid workflows, and GriddLeS provides the IO mechanism that allows the components to communicate in flexible ways.


