Royal Society Publishing

Scientific Grid computing

Peter V Coveney

Abstract

We introduce a definition of Grid computing which is adhered to throughout this Theme Issue. We compare the evolution of the World Wide Web with current aspirations for Grid computing and indicate areas that need further research and development before a generally usable Grid infrastructure becomes available. We discuss work that has been done in order to make scientific Grid computing a viable proposition, including the building of Grids, middleware developments, computational steering and visualization. We review science that has been enabled by contemporary computational Grids, and associated progress made through the widening availability of high performance computing.

This Theme Issue of Philosophical Transactions of the Royal Society A is devoted to the new field of scientific Grid computing. But what is Grid computing? The definition adhered to throughout this issue is the following:

Grid computing is distributed computing performed transparently across multiple administrative domains.

I would like to make several comments about this definition. The first is that the term ‘computing’ here is used in its most general sense to mean any form of digital activity. The definition therefore applies whether one is involved in low or high end numerical computation, symbolic computing, accessing databases, performing visualization or any combination of these.

Second, ‘transparency’ is intended to imply minimal complexity so far as any user of this technology is concerned. After all, the very word ‘Grid’ was coined by analogy with other utilities in a modern society, such as electric power grids. The vision is that of computing power on demand, regardless of location and without the user needing to know from where it is being supplied.

Third, the ability to use computing resources regardless of their location and, therefore, managed by different people and organizations, is what distinguishes Grid computing from any of its forerunners. It is obvious from this definition that Grid computing cannot be seriously undertaken without simultaneously addressing the issue of security, for who is going to be willing to offer access to their own resources without the certainty that their users have been appropriately authorized? And there is the need, manifest in many medical and commercial activities inter alia, to prevent eavesdropping on confidential data. Accounting is also a central issue for a production Grid: who will be the first to make serious money from this technology?

Berners-Lee (1999) was able to revolutionize access to the Internet (the global network of networks by means of which all computers communicate) through the introduction of the hypertext transfer protocol (HTTP), hypertext markup language (HTML) and the development of browser technology to exploit these new tools. The result was the World Wide Web, something that almost everyone has heard of today and that anyone with access to a computer will be fully aware of. Before the advent of the World Wide Web, access to the Internet was limited to a much smaller number of people, mainly located in research and technical organizations, who used it to send electronic mail. My own home-based copy of the Chambers 20th Century Dictionary dating from 1986 does not even contain an entry for the Internet, yet its ubiquity today has changed the world we live in.

The Web greatly facilitated access to the Internet, with the result that extraordinary numbers of people started to use it, and their own engagement with it is part of the remarkable story of how the World Wide Web has evolved. Today, I expect that almost every reader of this journal will know very well what this means: a web browser enables someone sitting at a computer (or even a personal digital assistant with wireless Internet access) to view web pages as if they were on their own client machine, when in fact they could be located at an arbitrary geographical location.

Just as the Web permitted widespread use and exploitation of the Internet, the goal of Grid computing today is to facilitate analogous access to computing resources. It builds on and extends the capabilities provided by the Web and requires considerable complexity in underlying infrastructure, including the hardware, networks and ‘middleware’ (software that sits between the hardware and the scientist) comprising it. The ultimate purposes to which access to Grids may be put surely cannot be foreseen by today's pioneers of the technology; but in order to enable such a dynamical evolution to occur, one thing above all others must be addressed—the issue of usability. For without facilitating access to the resources on a computational Grid, no organic growth is possible, and the entire enterprise risks being stillborn.

Today, there are worldwide efforts aimed at building such Grids, with national and international programmes in many parts of the globe. This issue contains papers describing the US TeraGrid (Beckman 2005) and the European Union's Grid called EGEE (Gagliardi et al. 2005). The academic community in the United Kingdom has received about one quarter of a billion pounds in government funding to build up such infrastructure and to develop applications that will exploit it in the so-called e-Science Initiative. The UK's initiative seems to be qualitatively quite distinct from the others that I am aware of within Europe, South East Asia and the USA. In most of these Grid programmes, emphasis has been placed first on building up the ‘infrastructure’, and only secondarily on seeking active involvement from users. In at least one case, I recall being told that the Grid builders were anxious that their programme should not be seen as a route to free compute cycles for scientists; in another, that ‘no one is using the service, and no one is planning to’.

The problem, unfortunately, is that right now we remain quite far from realizing the definition of Grid computing that I put forward above. Partly due to the difficulties in handling security in a reasonable way, but also because insufficient attention has been paid to the needs of users, as well as because the enterprise is intrinsically hard, there is little taking place today that can be strictly called scientific Grid computing. Even in the limited number of cases where persistent Grid infrastructure does exist, it is extremely difficult to use it to do real science. For this reason, we are still in a building phase, trying to construct Grid infrastructure that will be truly usable, in a manner akin to the way in which the Web made access to the Internet easy for all.

This Theme Issue provides the reader with a record of the state of scientific Grid computing as it existed at the time that these papers were assembled in late 2004 and early 2005. It contains a wide range of contributions covering many aspects relevant to the theme, ranging from a small number of articles that report on genuine scientific Grid computing, through high-performance computing applications, to hardware, networking, middleware and human factors issues, all oriented towards assisting the scientific user. The majority of the contributions are from members of RealityGrid, a large UK EPSRC-funded e-Science Pilot Project with considerable international reach. This project has enjoyed a number of important successes in Grid computing since its launch in 2002. I believe these are due to a rather optimal combination of expertise: a blend of scientists with sophisticated knowledge of computational issues, software engineers and computer scientists, all working together to deliver useful scientific capabilities.

The central theme of RealityGrid is computational steering. In its general form, deployed on a high performance computing Grid, it provides the scientist with the ability to choreograph a complex sequence of operations, many elements of which can be run concurrently, with the result that the time taken to obtain new insight and results is dramatically reduced. Grid-based computational steering requires interactive access and, frequently, co-allocation of resources, such as compute and visualization, through advance reservation; such provision remains difficult to secure today but is necessary in order to realize the full potential of Grid computing. This approach is illustrated by the work reported here on the TeraGyroid and the Steered Thermodynamic Integration of Molecular Dynamics projects (Fowler et al. 2005; Harting et al. 2005). But the RealityGrid steering API and associated library have been designed for maximum flexibility (Pickles et al. 2005); such steering need not take place on a Grid and, owing to the current difficulties in working with Grids, has found most of its daily utilization in off-Grid situations. At the time of writing, more than ten codes, from within and outside the RealityGrid project, have been made steerable in this way and the steering system has been adopted by other projects, including the EPSRC's Integrative Biology. There are several papers in the issue which describe steering and its wide-ranging scientific applications (for example, Kenny et al. 2005; Mason & Sutton 2005). Other steering systems exist and Eickermann et al. (2005) describe one of these.
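The basic idea of computational steering, adjusting a running simulation without stopping and restarting it, can be illustrated with a minimal sketch. This is not the RealityGrid steering API: the function `run_steered_simulation`, the `on_step` callback, the `"rate"` parameter and the `"stop"` command are all hypothetical names chosen for illustration, and the "simulation" is a trivial accumulator standing in for a real code.

```python
import queue


def run_steered_simulation(steps, commands, params, on_step=None):
    """Advance a toy simulation, applying steering commands between timesteps.

    `commands` is a queue of (name, value) pairs; all names are hypothetical.
    """
    state = 0.0
    history = []
    for step in range(steps):
        # Drain any pending steering messages before taking the next timestep.
        while True:
            try:
                name, value = commands.get_nowait()
            except queue.Empty:
                break
            if name == "stop":
                return history  # the scientist has ended the run early
            params[name] = value  # change a parameter without restarting
        # One "timestep" of the stand-in simulation.
        state += params["rate"]
        history.append(state)
        # Give a monitoring client the chance to inspect the state and steer.
        if on_step is not None:
            on_step(step, state, commands)
    return history


def scientist(step, state, commands):
    """Stand-in for a steering client: raise the rate after the second step."""
    if step == 1:
        commands.put(("rate", 10.0))


commands = queue.Queue()
history = run_steered_simulation(4, commands, {"rate": 1.0}, on_step=scientist)
print(history)
```

In this sketch the steering client runs in the same process as the simulation; in a Grid setting the commands would instead arrive over the network from a remote client, which is where the interactive access and co-allocation requirements described above arise.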

Grid-based computational steering raises numerous issues, including important ones concerned with human factors; some of these aspects are discussed by Kalawsky et al. (2005a). Within RealityGrid we have developed a variety of steering clients for scientists to use; the hand-held personal digital assistant is described by Kalawsky et al. (2005b).

As we have previously described, much existing Grid middleware is difficult to use and has hindered the attainment of a central goal of Grid computing, enshrined in the above definition, that it be ‘transparent’ to users (Chin & Coveney 2004). Middleware innovations that have made it possible to put together, in a facile manner, the essential services required for any scientific application are discussed by Coveney et al. (2005) and Haines et al. (2005).

As important as Moore's law—the exponential growth of supercomputer performance—has been for the explosion of computational science activity, at least as important a factor has been the development of improved algorithms which can exploit the new high-end computer architectures being constructed today. In recent years, for example, we have seen the development of highly scalable classical molecular dynamics codes that are optimized to run on massively parallel architectures enabling the study of much bigger systems for much longer times than would have been possible with raw compute power alone. One paper reports on comparative performance of several molecular dynamics codes on selected state-of-the-art high-end computing architectures (Hein et al. 2005); there are also some articles describing new science being done on these platforms (Cates et al. 2005; Finn et al. 2005; Giordanetto et al. 2005; Wan et al. 2005). Clearly, an analogous message will prevail for future Grid architectures. In this case, coupled models comprising distinct and largely independent components are likely to benefit from computational Grids, provided that the algorithms concerned are suitably optimized.

While benchmarking of monolithic codes on multiprocessor machines can be done quite reproducibly, optimizing the execution time of a job on a computational Grid is quite another matter. Precisely because such Grids are expected to be highly dynamic entities, with large numbers of users (eventually, we anticipate) competing for a finite set of resources, there is a very complex optimization problem to be addressed concerning how best to handle the deployment of jobs running on Grids. This is the ‘performance control’ problem; ultimately, one would expect that this could be done automatically, with no human involvement, based on dynamically updated information on available resources, code performance on available platforms and heuristic guidelines for minimizing execution time. One paper in this issue reports on progress towards automated performance control on a restricted set of machines (Mayes et al. 2005); another describes a route to decomposition of a coupled model within a flexible coupling framework so that the individual components can be optimally deployed on a Grid (Delgado-Buscalioni et al. 2005). Such coupled models are likely to become more and more common in the next few years as they arise in all areas of science and engineering.
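A deliberately simple heuristic gives a flavour of the performance control problem. The sketch below is an assumption made for illustration, not the method of any paper in this issue: it supposes that each resource reports a current queue wait and a measured throughput for the code in question, and it picks the resource with the smallest estimated completion time. The `Resource` class, its fields and the example figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical snapshot of one machine on a Grid."""
    name: str
    queue_wait_s: float  # current estimated wait before the job starts
    steps_per_s: float   # measured throughput of this code on this machine


def estimated_completion_s(resource, total_steps):
    """Estimated wall-clock time to finish: queue wait plus run time."""
    return resource.queue_wait_s + total_steps / resource.steps_per_s


def choose_resource(resources, total_steps):
    """Pick the resource that minimizes the estimated completion time."""
    return min(resources, key=lambda r: estimated_completion_s(r, total_steps))


resources = [
    Resource("fast_busy", queue_wait_s=3600.0, steps_per_s=100.0),
    Resource("slow_idle", queue_wait_s=0.0, steps_per_s=20.0),
]

# A short job is better off on the idle machine, a long job on the fast one.
print(choose_resource(resources, 10_000).name)
print(choose_resource(resources, 1_000_000).name)
```

Even this toy shows why the problem is hard in practice: the best choice depends on the size of the job, and on a real Grid the queue waits and throughputs change continually as other users compete for the same machines.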

High performance and Grid computing are gaining ground in the life sciences. Projects such as Integrative Biology, which is concerned with providing a Grid infrastructure for multi-level modelling and simulation of heart dynamics, BioSimGrid (Tai et al. 2004) and Comb-e-Chem are reflected in contributions to this Theme Issue (Gavaghan et al. 2005; Woods et al. 2005). We have argued elsewhere for the central importance of Grid computing in modern approaches to systems biology (Coveney & Fowler 2005).

In other areas, concerned with access to the original scientific data on which publications are based, new methods for dealing with publication at source have been developed and are reported on here for the first time (Rousay et al. 2005).

The interdisciplinary nature of computational science, the fact that its powerful methods find application in so many diverse domains, is one of the compelling aspects of the field and one of its real strengths. Just as computational science is characterized by common underlying problems, so computational methods, algorithms and implementations transcend domain-specific barriers. I hope that computer scientists, software engineers and scientists working in most fields of research will be able to build on the work and experiences reported here in order to help make Grid computing part of the scientist's everyday toolkit.

This Theme Issue is also a first for the Royal Society, which has encouraged me to prepare distinct online and printed versions of the theme. The online version exploits multimedia to full advantage in order to enhance some of the papers, particularly where visualization and steering are critical aids to scientific understanding. I am grateful to the editor, Professor Michael Thompson FRS, and to Cathy Brennan and the production team for supporting this project from its inception to the final products.

Acknowledgments

I must thank first and foremost the Engineering and Physical Sciences Research Council for funding the lion's share of my own scientific research in recent years, as well as the Biotechnology and Biological Sciences Research Council. Colleagues at the UK's National Supercomputing Facilities (CSAR, University of Manchester, and HPCx/EPCC, Daresbury Laboratory and the University of Edinburgh) have been of tremendous assistance in making several of our ambitious projects work at the dawn of the ‘capability computing’ era. In addition, through colleagues and collaborators in the USA, particularly Bruce Boghosian at Tufts University, I have been fortunate to have had access to National Science Foundation funded resources and facilities within the US TeraGrid and the Pittsburgh Supercomputing Center via PACI and NRAC awards. Access to such massive supercomputing resources, harnessed within a Grid environment and federated with the UK's National Grid Service, has made it possible to perform scientific research undreamed of only two or three years ago.

None of my own work would have been possible without the assistance of a highly talented group of post-doctoral research fellows and Ph.D. students. Our endeavours in Grid computing have relied heavily on strongly productive collaborations with colleagues inside and beyond the RealityGrid project, particularly with friends and colleagues at Manchester Computing, including Stephen Pickles and John Brooke. I am also grateful to a set of hard-working referees who assisted in enhancing the quality of the papers collected here.

