RealityGrid: an integrated approach to middleware through ICENI

Jeremy Cohen, A Stephen McGough, John Darlington, Nathalie Furmento, Gary Kong, Anthony Mayer

Abstract

The advancement of modelling and simulation within complex scientific applications is currently constrained by the rate at which knowledge can be extracted from the data produced. As Grid computing evolves, new means of increasing the efficiency of data analysis are being explored. RealityGrid aims to enable more efficient use of scientific computing resources within the condensed matter, materials and biological science communities. The Imperial College e-Science Networked Infrastructure (ICENI) Grid middleware provides an end-to-end pipeline that simplifies the stages of computation, simulation and collaboration. The intention of this work is to give all scientists access to these features without the heroic efforts that have been associated with this sort of work in the past. Scientists can utilise advanced scheduling mechanisms to ensure efficient planning of computations, visualize and interactively steer simulations and securely collaborate with colleagues via the Access Grid through a single integrated middleware application.

1. Introduction

The advancement of modelling and simulation within complex scientific applications is currently constrained by the rate at which knowledge can be extracted from the data produced. As Grid computing evolves, new means of increasing the efficiency of data analysis are being explored. RealityGrid, an Engineering and Physical Sciences Research Council project that is exploring the use of Grid technology to investigate the structure of matter at the meso- and nanoscale levels, aims to enable more efficient use of scientific computing resources within the condensed matter, materials and biological science communities.

The Imperial College e-Science Networked Infrastructure (ICENI; Furmento et al. 2002a,b; Mayer et al. 2005) Grid middleware provides an end-to-end pipeline that simplifies the stages of computation, simulation and collaboration. Scientists can utilise advanced scheduling mechanisms to ensure efficient planning of computations, visualize and interactively steer simulations and securely collaborate with colleagues via the Access Grid through a single integrated middleware application. In this paper, we explore various computer science issues raised by the RealityGrid project and show how the ICENI Grid middleware has been developed to tackle some of these issues.

ICENI has been developed through effort at the London e-Science Centre over a three-year period. The main purpose of this work has been to investigate novel aspects of Grid computing and their applicability for integration into a full Grid architecture. These have included a service-oriented architecture that supports all phases of Grid application construction and execution, along with a full end-to-end pipeline including workflow scheduling and enactment. As such, the use of ICENI within the RealityGrid project has been that of ‘deep-track’ research. By deep-track research, we refer to the aim of developing more generic, long-term solutions to the type of problems raised in this project. Other project partners are working on shorter-term, more specific solutions to address particular project aims.

The rest of the paper is organized as follows. In §2, we introduce the stages of the application pipeline. These stages are covered in detail in §§3–5. In §6, we describe the deployment work that has been undertaken. In §7, we discuss the usefulness of the current ICENI implementation to RealityGrid, and other projects, and outline how the future plans for ICENI II can improve this situation. We conclude in §8 by looking at future work that can further enhance ICENI's contribution to the RealityGrid project.

2. The application pipeline

The application pipeline consists of a series of operations required during the lifetime of an application. These operations can be grouped into three broad areas.

(i) Deployment. This stage involves the preparation of the code for the execution environment. It also involves the specification of the application metadata that will be used, for example, during the scheduling phase, as well as the possible componentisation of the code.

(ii) Execution. The application is executed within a Grid environment. This stage of the pipeline includes scheduling execution against available resources. In most cases, the scheduling algorithm will try to minimize the execution time of the overall application. Different criteria might also be used, such as the location of the resources on which the different components will execute (this might be specified by the end-user when composing and submitting the application).

(iii) Analysis. While the application is running, or at the end of its execution, the output from the application may be analysed. This can be done through visualization, either in real time or after completion of the execution, or by other means of mining the results data.

These three steps provide an end-to-end pipeline that allows end-users to deploy their application for execution on the Grid resources, and to interact with the application while it is running. We now present in more detail each of the stages of the application pipeline.

3. Deploying an application

The first stage of the application pipeline is the deployment of the application. Placing an application on to the Grid raises two issues: how native code not designed for remote deployment can operate within the context of a distributed system, and how to select target resources to provide optimal execution times and resource usage. This section looks at the issue of ‘Grid enabling’ native applications, while optimal selection of target resources is carried out at execution time and discussed in §4.

(a) Legacy code issues

LB3D (Chin et al. 2003), a three-dimensional lattice-Boltzmann code written in Fortran90 and parallelized using the Message Passing Interface (MPI) standard, was developed by partners within the RealityGrid project and is used extensively in application pipeline development within the ICENI framework. In order to support LB3D and similar legacy applications, ICENI provides the ‘binary component’ (see figure 1). This is a mechanism by which a legacy executable, compiled on a particular machine and linked with particular libraries, is wrapped as an ICENI component. As such, it is published as a component service and made available for composition within the ICENI runtime system. In order to target multiple resources, the native application must be compiled and wrapped for each target platform.

Figure 1

An ICENI binary component.

A new binary component is defined through a metadata document. Different characteristics of the binary component need to be specified, such as the location of the binary code, the files that need to be staged before and after execution, and the resource names on which the code may be executed. The binary component builder is a tool that simplifies the wrapping of native code applications into binary components (see figure 2).

Figure 2

The ICENI binary component builder.
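To make the metadata document concrete, the sketch below models the same information as a plain Java descriptor. This is a minimal sketch only: the class and field names are invented for illustration and do not reproduce ICENI's actual JDML schema.

```java
import java.util.List;

// Hypothetical descriptor mirroring the information an ICENI binary
// component metadata document carries; names are illustrative, not
// the actual JDML schema.
public final class BinaryComponentDescriptor {
    private final String executablePath;   // location of the compiled binary
    private final List<String> stageIn;    // files required before execution
    private final List<String> stageOut;   // files to retrieve afterwards
    private final List<String> resources;  // resource names the binary may run on

    public BinaryComponentDescriptor(String executablePath,
                                     List<String> stageIn,
                                     List<String> stageOut,
                                     List<String> resources) {
        this.executablePath = executablePath;
        this.stageIn = stageIn;
        this.stageOut = stageOut;
        this.resources = resources;
    }

    public String executablePath()  { return executablePath; }
    public List<String> stageIn()   { return stageIn; }
    public List<String> stageOut()  { return stageOut; }
    public List<String> resources() { return resources; }
}
```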

The ICENI component model abstracts the application view such that, where an application is available on many resources, this plurality may be hidden from the user. The user may select a location-specific binary component, or may select a location-independent binary component, leaving the choice of which binary component (and hence hardware resource) to the runtime scheduler.

The binary component has ports that allow standard input, output and error to be streamed to and from the native executable's process. However, most scientific applications require files to be made available at initialization and, following execution, output their results to local files. LB3D is no exception. To overcome this restriction, the binary component wrapping includes a specification of required and provided files, and stages these files using the Grid File Transfer Protocol (GridFTP), the Hypertext Transfer Protocol (HTTP) or local file copy. Details of the particular protocols enabled locally are available from each resource's launcher service, and any resource registered within ICENI must specify which of these protocols it provides. For each of the supported protocols, the appropriate metadata must be provided; for example, a Web server requires its address and the path under which files are published. The selection of a specific launcher to execute a component determines which protocol is used to transfer files. This process of automatic file staging hides the remote nature of the code execution from the user and allows the user interface component (in the case of LB3D, a parameter input panel) to execute transparently on a different machine from the binary.
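The sketch below illustrates this staging step: given a protocol advertised by a launcher, a file is fetched by the matching mechanism. It is a hedged sketch, not ICENI's staging code; the protocol strings and the FileStager class are assumptions, and the GridFTP branch is left as a stub since it would depend on an external client library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative staging step: pick a transfer mechanism from the protocols
// a resource's launcher advertises. Protocol names are hypothetical.
public class FileStager {

    public void stage(String protocol, String source, Path destination)
            throws IOException {
        switch (protocol) {
            case "local":
                // Source and destination share a file system: plain copy.
                Files.copy(Path.of(source), destination,
                           StandardCopyOption.REPLACE_EXISTING);
                break;
            case "http":
                // Fetch from the resource's Web server; the server address
                // and publish path come from the launcher's metadata.
                try (InputStream in = new java.net.URL(source).openStream()) {
                    Files.copy(in, destination,
                               StandardCopyOption.REPLACE_EXISTING);
                }
                break;
            case "gridftp":
                // Would delegate to a GridFTP client library (not shown).
                throw new UnsupportedOperationException("GridFTP stub");
            default:
                throw new IllegalArgumentException("Unknown protocol: " + protocol);
        }
    }
}
```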

(b) Componentizing LB3D

The LB3D application has been wrapped as a binary component. The component is configured through a JDML (Job Description Markup Language; McGough 2003) document that specifies parameters such as the location of the executable, a list of arguments, and the input and output sandboxes. In order to stage files either to or from the resource, the document needs to contain detailed descriptions of the possible transport mechanisms. LB3D is configured through its own input file, which uses attribute-value pairs to define the application behaviour. An input editor component presents the contents of the input file within a graphical interface, allowing easy editing of the input values prior to execution of the LB3D application. Simple output components, part of ICENI's built-in set of basic components, are then connected to the LB3D binary component to allow visual monitoring of the standard output and error streams. This results in a four-component application composition (see figure 3).

Figure 3

LB3D component composition within ICENI.
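As an illustration of the attribute-value configuration style just described, the sketch below reads such a file with java.util.Properties. The file name and the parameter name are invented for illustration; LB3D's actual input schema is not reproduced here.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Reads an attribute-value style input file of the kind the input editor
// component writes. Parameter names below are hypothetical.
public class InputFileReader {
    public static void main(String[] args) throws IOException {
        Properties params = new Properties();
        try (Reader r = Files.newBufferedReader(Path.of("lb3d.input"))) {
            params.load(r);   // parses lines of the form key = value
        }
        int timesteps = Integer.parseInt(params.getProperty("timesteps", "1000"));
        System.out.println("Running for " + timesteps + " timesteps");
    }
}
```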

4. Application execution

(a) Scheduling an application

Dynamically selecting a resource that has low load and guaranteed availability along with sufficient processing power in the context of a continuously changing Grid environment is a complex task. The key to approaching even an approximate solution is information, and for this purpose, ICENI is built around the concept of capturing metadata wherever possible. Each of the components within the LB3D application composition is scheduled by the ICENI scheduling framework and each may execute on a different physical system. Resource information is provided by a published resource service, which is discovered by the ICENI scheduling framework. This allows the scheduling architecture to keep an up-to-date model of the resources on the Grid. In addition, scheduling decisions require performance characteristics of the application (McGough et al. 2004a,b; 2005).
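As a minimal illustration of this selection step, the sketch below chooses, from a set of candidate resources, the available one with the lowest predicted execution time, which is the common objective noted in §2. The Resource type and the predictor function are hypothetical stand-ins, not ICENI's actual scheduling interfaces.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

// Sketch of the basic scheduling objective: among the resources able to
// run a component, pick the one with the lowest predicted execution time.
public class SimpleScheduler {

    public record Resource(String name, boolean available) {}

    public Optional<Resource> choose(List<Resource> resources,
                                     ToDoubleFunction<Resource> predictedRuntime) {
        return resources.stream()
                .filter(Resource::available)
                .min(Comparator.comparingDouble(predictedRuntime));
    }
}
```

In ICENI itself the prediction would be supplied by the performance repository described below, and richer criteria (such as user-specified locations) may override the simple minimum-runtime rule.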

Application performance models are constructed from three sources of metadata: application composition, component workflow and component performance data.

(i) Application composition. The composition of the application is accessible through ICENI's component-based design model, and is thus provided by the user on an application-by-application basis.

(ii) Component workflow. The activity of each individual component is encoded within component metadata provided by the component designer.

(iii) Component performance data. The performance data is used to predict the execution time of the whole application. Historical performance data is processed in order to improve future predictions. This metadata is stored in the ICENI performance repository.

The gathering of component performance data is achieved by instrumenting the ICENI runtime system so as to provide events recording the initiation and completion of component threads and all intercomponent communication. Together with the composite workflow model of the application, these events can be used to infer activity times for the individual components. This data is captured by a performance repository service, which listens to performance events and stores the event data using a pluggable database, allowing information to be stored in commodity implementations such as those based around Structured Query Language (SQL) access to relational databases. The event information includes the values of flagged application parameters, as well as the particular resource upon which the execution times were observed (McGough et al. 2004b).
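The sketch below shows what a relational backing store for such events might look like, using plain JDBC as one commodity implementation of the pluggable database. The table layout and column names are assumptions for illustration, not ICENI's actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of a performance repository backed by a relational database.
// Assumes a table: perf_events(component, resource, start_ms, end_ms).
public class PerformanceRepository {
    private final Connection conn;

    public PerformanceRepository(String jdbcUrl) throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
    }

    // Records one observed component execution on a particular resource.
    public void recordEvent(String component, String resource,
                            long startMillis, long endMillis) throws SQLException {
        String sql = "INSERT INTO perf_events(component, resource, start_ms, end_ms)"
                   + " VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, component);
            ps.setString(2, resource);
            ps.setLong(3, startMillis);
            ps.setLong(4, endMillis);
            ps.executeUpdate();
        }
    }
}
```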

When the ICENI scheduling system requires performance characteristics, the appropriate information is supplied by the repository. The option is also available to use developer-supplied component performance models in place of historical data. The various scheduling mechanisms are described in Young et al. (2003). Support for advance reservations is also integrated into the ICENI scheduling framework, allowing execution capacity to be reserved as required (McGough et al. 2004a).

(b) Execution

At runtime, the LB3D input editor component generates its output file and then passes control to the LB3D binary component, the next component in the composition. It also passes information about the location of any files that the next component may require in order to execute, in this case the LB3D configuration file. The binary component can then obtain the required data before execution, using local knowledge about how to transfer data from the resource that was running the input editor component. In a similar way, files can be transferred from the resource running the binary component at the end of the execution of the LB3D application. As output is produced on the standard output and error streams, this data is displayed within the output components, which may execute on different resources from the binary component. This allows a remote resource to execute the LB3D algorithm while the output data is transmitted back to the user's local system in real time.
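A minimal sketch of the shape of this hand-off is shown below. Both the record and its fields are hypothetical stand-ins for ICENI's actual inter-component interfaces; they simply capture the idea that the downstream component is told where to stage files from.

```java
import java.util.List;

// Hypothetical hand-off message: when the input editor finishes, it tells
// the downstream binary component which files it produced and where.
public record ControlHandoff(String sourceResource, List<String> producedFiles) {

    public static void main(String[] args) {
        // e.g. the input editor announces the LB3D configuration file it
        // wrote; the binary component stages it before running the executable.
        ControlHandoff handoff =
                new ControlHandoff("editor-host.example.org",
                                   List.of("/tmp/lb3d.input"));
        System.out.println("Stage from " + handoff.sourceResource()
                + ": " + handoff.producedFiles());
    }
}
```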

5. Collaborative visualization and steering

ICENI supports collaborative e-Science by providing a number of mechanisms that enable a simulation (such as LB3D) to be visualized and steered by multiple remote partners (see figure 4), and supports an integrated Access Grid ‘collaboratory’.

Figure 4

The NetBeans interface is used to start the LB3D, visualization and steering applications; the steering Graphical User Interface (GUI) is then used to steer the LB3D execution, affecting the visualization output.

Much of the science enabled by LB3D concerns features (such as the gyroid phase) that are very difficult to discover automatically from the data; such features are only discernible through visualization. ICENI enables easy remote visualization of an existing LB3D simulation through the exposure of running components as services. Collaborators can discover an executing application and connect visualization components at runtime without interfering with, or linking against, the initial application.

Visualization components are provided to allow both local rendering of the visualization data and transmission of the visualization over the Access Grid. Multiple display components can be added to a visualization composition, allowing some systems to visualize the output data locally while others receive the visualization via the Access Grid. Stereo visualization components are also available.

The RealityGrid ‘fast-track’ steering library (Brooke et al. 2003) has been wrapped into an ICENI component, thus exposing the steering library's interface to other components. The library itself provides hooks and instrumentation into LB3D (and many other RealityGrid applications), and provides the link from the infrastructure to the application. As an ICENI component, the steering library is published as a service, and may be discovered and invoked by anyone with the correct access privileges. By utilising the ICENI framework, multiple clients can invoke the steering library, with their commands passed to the application in a coherent way, a capability the steering library alone does not provide.

An ICENI steering proxy component wraps the code of the native library using the Java Native Interface (JNI). The ICENI steering client components, which provide a graphical interface for a user to easily steer the LB3D computation, interact with the ICENI proxy component in order to access the steering library's functionality (see figure 5). Multiple steering clients can be connected to a single steering proxy component.

Figure 5

Component composition for the LB3D steering application.
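The paper does not reproduce the proxy's interface, but the sketch below indicates how such a JNI wrapper might be declared. The method names are invented for illustration and do not reflect the real RealityGrid steering API; the native methods would be implemented in C against the steering library.

```java
// Hedged sketch of a JNI wrapper around a native steering library.
public class SteeringProxy {
    static {
        // Loads the native shared library (e.g. libsteerproxy.so) that
        // bridges Java calls to the underlying steering library.
        System.loadLibrary("steerproxy");
    }

    // Native entry points, implemented in C; names are hypothetical.
    public native void connect(String simulationId);
    public native void setParameter(String name, String value);
    public native String[] listSteerableParameters();
    public native void disconnect();
}
```

Because the proxy is itself an ICENI component, several steering client components can hold references to one SteeringProxy-like service, which serializes their commands to the single underlying library instance.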

The visualization tool uses the Visualization ToolKit (VTK) and is written on top of the Chromium library. For users unwilling or unable to run Chromium and VTK, a facility is provided to stream video of the visualization through the Access Grid.

A further series of ICENI components allows Access Grid control through the ICENI framework. The Access Grid nodes are started through ICENI and the control of the different nodes is achieved centrally through the Access Grid Controller, another ICENI component. Any modifications to the Access Grid session from one node are automatically propagated to all other nodes. The controller interface, a Java Swing GUI, provides a full complement of Access Grid control functions through an easy-to-use interface.
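The propagation behaviour just described follows a simple observer pattern, sketched below under stated assumptions: the NodeHandle interface and the string-based change representation are hypothetical stand-ins for ICENI's node components.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of centralized session control: the controller keeps a list of
// nodes and forwards any session change to all of them.
public class AccessGridController {

    public interface NodeHandle {
        void apply(String sessionChange);
    }

    private final List<NodeHandle> nodes = new CopyOnWriteArrayList<>();

    public void register(NodeHandle node) { nodes.add(node); }

    // A change made at any node is routed through the controller and
    // propagated to every node in the session, keeping them consistent.
    public void propagate(String sessionChange) {
        for (NodeHandle node : nodes) {
            node.apply(sessionChange);
        }
    }
}
```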

The ICENI Access Grid components provide a means to generate and distribute session keys for the encryption of video and audio data. These keys can be generated from any control component and are then sent to all the display nodes that are part of the session. Any third-party user connected to the Access Grid room will be unable to join the ICENI session without the correct key.
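An illustrative key-generation step of this kind is sketched below using the standard Java cryptography API. The paper does not specify the cipher used; AES with a 128-bit key is an assumption made here for the example.

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative session-key generation of the kind a control component
// could perform before distributing the key to the display nodes.
public class SessionKeyFactory {
    public static String newSessionKey() throws NoSuchAlgorithmException {
        KeyGenerator gen = KeyGenerator.getInstance("AES"); // cipher assumed
        gen.init(128);                                      // 128-bit key
        SecretKey key = gen.generateKey();
        // Encode for transmission to the nodes in the session.
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }
}
```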

The ability to broadcast visualization data over the Access Grid is of particular importance when working with collaborators in different locations. Visualization data appears alongside the video windows of the other users in the Access Grid session, simplifying the analysis of the data. The ability for a single operator to start up and configure the Access Grid nodes at one or more remote locations, and to generate encryption keys and distribute them to the other locations, provides security and reduces administration costs.

6. Test deployment

LB3D has been compiled for two of the London e-Science Centre's parallel resources: a Sun E6800 with 24 UltraSPARC III processors running at 750 MHz, and a 64-node dual Pentium 4 Xeon cluster with a Myrinet interconnect. Because performance testing is not the focus of this paper, this issue was not considered during the test deployment. The focus was on the complete application pipeline, providing more than one parallel resource for the scheduling framework to consider. This allowed scheduling to take into account a choice of two available implementations at different execution locations. By submitting the application to run several times, performance data can be fed into the performance database, enabling more accurate decisions about how to schedule the application in the most efficient manner.

7. ICENI and the wider Grid community

ICENI has now had exposure in the wider Grid community through our involvement with projects and uptake by interested third parties, though not enough time has yet passed to make a full assessment of its potential. The GENIE project has used ICENI to Grid enable its unified Earth System Model, which has allowed it to significantly reduce run times (Gulamali et al. 2004). The e-Protein project is using ICENI to control its gene annotation workflows (O'Brien et al. 2004). The Immunology Grid project is using ICENI in areas of molecular medicine and immunology.

Through feedback from the RealityGrid project and the other parties mentioned, we have been assessing the usability of the current ICENI implementation. Although we still feel that the underlying concepts and architectural ideals developed are sound, it is apparent that the current ICENI implementation has become overburdened through the process of software decay. Over the last few years, the Grid community has been moving rapidly between different underlying middleware architectures; this has led to ICENI seeming somewhat outdated by its original selection of the Jini network technology architecture. This situation now appears to have resolved itself, with Web Service-oriented middlewares becoming the prevalent standard. ICENI has also suffered from a process of incremental design, as new features were incorporated into the implementation over time. This has resulted in the current realization being cumbersome to install, as each feature requires the full ICENI installation. However, none of these factors is intrinsic to the underlying ICENI architecture.

From these observations, we have come to the conclusion that a refactoring of the ICENI implementation is required. The main aims of this refactoring, to be known as ICENI II, are outlined below:

  1. Develop ICENI on top of Web Services. ICENI has always been architected in a communications-agnostic manner. It now appears correct to develop ICENI on top of Web Services while still retaining this agnostic approach.

  2. Decompose the ICENI architecture into a number of separate, composable toolkits, each of which can be used on its own to perform tasks within the Grid. Alternatively, these toolkits, and those from other Grid technology developers, can be combined in an ‘à la carte’ fashion, with the sum functionality being greater than that of the parts.

  3. Reduce the footprint of ICENI on resources within the Grid. A barrier to the adoption of ICENI has been the amount of code that needs to be deployed on to a resource. By making most of ICENI optional, only those parts that are required need to be installed.

  4. Tightly define the functionality of each of the toolkits. This should minimize software decay and allow each toolkit to focus on one goal.

The first step towards ICENI II has been the development of GridSAM, a lightweight Web Service for job submission and monitoring, based on an original prototype (WS-JDML; Lee et al. 2004). GridSAM uses the upcoming Job Submission Description Language (JSDL; Anjomshoaa et al. 2005), which evolved from JDML and is being standardized through the Global Grid Forum. This work is being hardened through collaboration with the Open Middleware Infrastructure Institute.
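To give a flavour of JSDL, the sketch below shows a minimal job description for an LB3D-style run, embedded in a Java text block for consistency with the other examples. The namespaces and element names follow the JSDL 1.0 specification as later published; the draft in use at the time, along with the executable path and file names shown, are assumptions.

```java
// Minimal JSDL-style job description; paths and arguments are illustrative.
public class JsdlExample {
    static final String JOB = """
        <jsdl:JobDefinition
            xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
            xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
          <jsdl:JobDescription>
            <jsdl:Application>
              <posix:POSIXApplication>
                <posix:Executable>/usr/local/bin/lb3d</posix:Executable>
                <posix:Argument>lb3d.input</posix:Argument>
                <posix:Output>stdout.txt</posix:Output>
              </posix:POSIXApplication>
            </jsdl:Application>
          </jsdl:JobDescription>
        </jsdl:JobDefinition>
        """;
}
```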

8. Conclusion and further work

The combination of LB3D, the fast-track steering library and ICENI has enabled the deployment of a complete end-to-end Grid infrastructure highlighting many advances in collaborative working environments for scientists. The ability for several geographically distributed scientists to come together via the Access Grid and jointly monitor and steer a complex computation in real time is an important advance for the Grid. In addition, tools are provided that aim directly at making the complex installation and configuration procedures of the software practical for end users, a vital element for wider deployment of the software in the future.

Future work includes the transition from the current fast-track, file-based steering approach to service-based steering. ICENI's service-oriented architecture is ideal for this approach, making the movement of steering data between nodes within a Grid easier and more transparent to the user. The development of the ICENI II toolkits should allow those parts of ICENI best suited to the RealityGrid project to be used within the project without overburdening it with advanced features that are not required.

In providing a full end-to-end Grid middleware through ICENI, utilising collected metadata, it has been possible to provide an automatic process by which the e-Scientist can move from concept to useful collaborative resource without the heroic effort often associated with this process.
