GriddLeS: Grid Enabling Legacy Software
Overview
Computational and data Grids couple geographically distributed resources such as high performance computers, workstations, clusters, and scientific instruments. Accordingly, they have been proposed as the next generation computing platform for solving large-scale problems in science, engineering, and commerce. Unlike traditional high performance computing systems, such Grids provide more than just computing power, because they address issues of wide area networking, wide area scheduling and resource discovery in ways that allow many resources to be assembled on demand to solve large problems. Grid applications have the potential to allow real time processing of data streams from scientific instruments such as particle accelerators and telescopes in ways that are much more flexible and powerful than is currently possible.
A number of prototype applications have been built, and these demonstrate that the Grid computing paradigm holds much promise. Of particular interest are applications, called Grid Workflows, that consist of a number of components, including computational models, distributed files, scientific instruments and special hardware platforms (such as visualisation systems). Importantly, the components of such workflows are interconnected in a flexible and dynamic way to give the appearance of a single application that has access to a wide range of data and runs on a very high performance platform. Grid workflows have been specified for a number of different scientific domains including physics, gravitational wave physics, geophysics, astronomy and bioinformatics.
Much of the effort in Grid computing is being directed towards the construction of new applications, in many cases written from scratch. We are also interested in building new applications, but from legacy components. In particular, we want to leverage the billions of lines of code embodied in existing scientific and engineering codes by stitching them together into new Grid-aware applications.
Over the past 5 years we have constructed a software tool called Nimrod/G, which allows a user to migrate a particular class of applications to the Grid. Specifically, it automates the execution of parameter sweep applications (parameter studies) over global computational grids. Nimrod/G is particularly novel because it supports user-defined deadline and budget constraints for scheduling computations, and manages the supply and demand of resources in the Grid using an experimental computational economy. Thus, using Nimrod/G, we have demonstrated that it is possible to build specific Grid applications very easily and quickly for a niche class of problems, namely parameter sweeps. However, Nimrod/G cannot be used to build general Grid workflows.
The GriddLeS environment is more general than Nimrod/G: it facilitates the composition of arbitrary Grid applications from legacy software. The underlying belief is that it is possible to take existing programs and Grid-enable them by providing a high level tool that facilitates the composition of complex systems from smaller, working components. A user of this environment interacts with a visual, graphical manipulation language to describe the interaction between programs, data sources, and IO devices such as shared scientific instruments.
One of the more important aspects of GriddLeS is the mechanism it uses to support communication between components. GriddLeS supports the construction of complete applications without source modification to the existing components. To achieve this we have overloaded the normal IO primitives in conventional languages so that they support interprocess communication as well as file operations. This allows the individual components to behave as though they are operating on a conventional file system, whilst in fact they are sending and receiving data across a distributed Grid infrastructure.
The mechanism, called GridFiles, is very flexible and can be implemented by a range of different techniques, from file copy to IP sockets.
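The idea of overloading the IO primitives can be illustrated with a small sketch. It is not the GriddLeS implementation: the names `grid_open`, `MAPPINGS`, `BufferChannel`, `producer` and `consumer` are hypothetical, the mapping table is a plain dictionary, and the "buffer" transport is an in-memory list rather than a socket. GriddLeS intercepts the IO calls so that component source is untouched; this sketch only approximates that by passing the open routine in as a parameter.

```python
# Hypothetical mapping table: logical file name -> (transport, target).
MAPPINGS = {}

# Hypothetical in-memory channels standing in for inter-process buffers.
_CHANNELS = {}


class BufferChannel:
    """Minimal file-like object backed by a shared list of text blocks."""

    def __init__(self, blocks):
        self._blocks = blocks

    def write(self, data):
        self._blocks.append(data)  # one block per WRITE call
        return len(data)

    def read(self):
        data = "".join(self._blocks)  # drain every block written so far
        self._blocks.clear()
        return data

    def close(self):
        pass  # the shared blocks outlive any single handle

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False


def grid_open(name, mode="r"):
    """Open `name` like open(), but redirect it according to MAPPINGS."""
    kind, target = MAPPINGS.get(name, ("local", name))
    if kind == "local":
        return open(target, mode)  # ordinary file on the local file system
    if kind == "buffer":
        return BufferChannel(_CHANNELS.setdefault(target, []))
    raise ValueError(f"unknown transport: {kind}")


def producer(opener):
    """Stand-in legacy component: believes it writes an ordinary file."""
    with opener("results.dat", "w") as f:
        f.write("42\n")


def consumer(opener):
    """Stand-in legacy component: believes it reads an ordinary file."""
    with opener("input.dat", "r") as f:
        return f.read()
```

Mapping both `results.dat` and `input.dat` to the same buffer target connects the two components directly, while mapping them to the same local path makes them communicate through a file. In either case `producer` and `consumer` are unchanged, which is the property the text describes.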
The Local File Client simply passes the calls on to the local file system, using the file name as resolved by the GNS.
The Remote File Client connects to a GridFTP server on the remote machine, and passes back blocks of the file as required. Note that the GridFTP server is a standard part of the Globus distribution, not a special component of GriddLeS.
The Grid Buffer Client is responsible for implementing inter-process communication. It connects to a corresponding Grid Buffer Server on the other host, and sends blocks of data for each local WRITE call. At the other end of the socket, the Grid Buffer Client reads blocks by making calls to the local Grid Buffer Server. A cache file can be stored at either the sending end of a Grid Buffer connection or the receiving end.
The GRS Client is used to implement a GriddLeS data Replication Service. When an application opens a replicated file, the GRS makes an actual binding to one of those replicas. It determines the most appropriate one by measuring the available bandwidth to each replica (using tools such as the Network Weather Service (NWS)), and it dynamically switches source during program execution should the bandwidth change. The GRS has been designed to support a variety of replication services, but the current implementation uses the Storage Resource Broker (SRB) from SDSC.
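The replica selection behaviour described for the GRS Client can be sketched as follows. This is a simplified illustration under stated assumptions, not the GRS code: `pick_replica` and `ReplicaReader` are hypothetical names, replicas are held in memory as byte strings, and `probe` is a stand-in callable that returns a bandwidth estimate per replica (the role the text assigns to tools such as NWS). No NWS or SRB API is used.

```python
def pick_replica(bandwidth):
    """Return the name of the replica with the highest measured bandwidth."""
    return max(bandwidth, key=bandwidth.get)


class ReplicaReader:
    """Reads one logical file block by block, re-binding before each block."""

    def __init__(self, replicas, probe):
        self.replicas = replicas  # replica name -> identical bytes content
        self.probe = probe        # callable returning {replica name: bandwidth}
        self.offset = 0           # position in the logical file
        self.sources = []         # which replica served each block

    def read_block(self, size):
        source = pick_replica(self.probe())  # may differ from the last block
        data = self.replicas[source][self.offset:self.offset + size]
        self.offset += len(data)  # offset is shared across all replicas
        self.sources.append(source)
        return data


# Usage: bandwidth to replica "a" drops after the first block is read.
readings = iter([{"a": 10.0, "b": 5.0}, {"a": 2.0, "b": 5.0}])
reader = ReplicaReader(
    {"a": b"abcdefgh", "b": b"abcdefgh"},
    probe=lambda: next(readings),
)
content = reader.read_block(4) + reader.read_block(4)
```

Because the read offset belongs to the logical file rather than to any one replica, switching source mid-read is invisible to the application, which still sees a single contiguous file.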