Louisiana researchers and universities are leading a concentrated, collaborative effort to advance statewide e-Research through a new cyberinfrastructure: computing systems, data storage systems, advanced instruments and data repositories, visualization environments and people, all linked together by software programs and high-performance networks. This effort has led to a set of interlinked projects that have started making a significant difference in the state, and has created an environment that encourages increased collaboration, leading to new e-Research. This paper describes the overall effort, the new projects and environment and the results to date.
Cyberinfrastructure has been defined as consisting of ‘computing systems, data storage systems, advanced instruments and data repositories, visualization environments and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs not otherwise possible’ (Stewart 2007). Its primary role is the enablement of new e-Research (Hey & Trefethen 2003).
A coordinated effort is advancing cyberinfrastructure in Louisiana to create a statewide environment that can integrate and aggregate Louisiana's many strengths, currently distributed across its universities and industries. Starting in 2001 and catalysed by the Louisiana Vision 20/20 initiative, the state began investing $25 million per year at five research universities to advance information technology with the aim of stimulating new economic development; part of that investment led to the creation of the LSU Center for Computation & Technology (CCT). In 2003, a white paper proposed the creation of a statewide high-speed advanced data network (Allen et al. 2009) including both networks and compute resources connected by an enabling software infrastructure. Grand Challenge applications important to Louisiana were identified, such as modelling the Mississippi River basin, petroleum engineering, astrophysics, coastal ocean observing and biological computing, where such an infrastructure would be a crucial part of retaining existing expertise and recruiting new faculty. Furthermore, the white paper highlighted the potential impact on economic development in areas such as: oil, gas and energy; information technology; entertainment; advanced materials; biotech; and environmental industries. In 2004, a statewide workshop (the Louisiana Optical Network Initiative (LONI) Forum) was organized, with national leaders and state researchers presenting on high-speed networks and applications. At this workshop, Governor Kathleen Blanco committed $40 million to build and run a statewide network (§§2 and 3), with an additional $10 million provided in 2005 to expand computing capacity.
The main contribution of this paper is to describe the various elements of the cyberinfrastructure (including networks, computers, storage, software and people), while making the point that all of these elements are needed simultaneously in order for e-Research to advance. Any one of them alone is useful, but is limited in its impact on e-Research. For example, if an institution acquires new computers, the researchers would probably use them in the same way they used the old computers. This leads to the common situation where e-Research is done on current cyberinfrastructure just as it would have been done on the cyberinfrastructure 10 or 20 years ago: batch submission of jobs followed by transfer of output data to a local system for visualization and analysis. In general, most applications are pre-existing and are merely ported from previous systems. Even new applications are usually not written to make full use of all of the cyberinfrastructure. Thus, the impact of the cyberinfrastructure on e-Research is very limited, and new science results are also limited. Here, our effort has been to make a revolutionary change in capability, and as part of this change, to encourage new applications and new collaborations, in order to make maximum use of the cyberinfrastructure, to make maximum impact on e-Research and to lead to maximum new science.
2. LONI network
Initially, a statewide 40 Gb s−1 fibre optic network was proposed to link the six major research universities (LSU, Louisiana State University; LaTech, Louisiana Tech University; SUBR, Southern University and A&M College; Tulane, Tulane University; UL-Lafayette, University of Louisiana at Lafayette; and UNO, University of New Orleans) and two health science campuses (LSU Health Sciences Centers, New Orleans and Shreveport) to each other and to the broader national and international cyberinfrastructure (National LambdaRail and Internet 2). This network has since grown to include Louisiana Public Broadcasting, Louisiana Department of Transportation, 14 additional universities and institutions in Louisiana, 50 community and technical colleges in Louisiana, one K-12 Public School System and five universities and institutions in Mississippi.
The 40 Gb s−1 network that makes up the LONI network consists of four 10 Gb s−1 lambdas in three physical rings, as shown in figure 1. The four lambdas will be used for three distinct purposes: two for connectivity between the universities, one for connectivity between the supercomputers and one for research.
LONI is managed by the Louisiana Board of Regents, which delegated day-to-day governance to the LONI Management Council. The council includes members from the state government, the three Louisiana university systems, the Community and Technical College system and Tulane University, as well as advisors on economic development, science and technology. The actual maintenance and operation of the LONI network are carried out at LSU.
3. LONI computing
An important part of the original plan was the provisioning of compute capabilities across the network. To date, the Louisiana cyberinfrastructure includes approximately 85 Tflops of compute resources, approximately 250 TB of disk and 400 TB of tape, primarily at LONI member sites. Five 13-node, 104-processor IBM P5-575 systems are at LaTech (Bluedawg), SUBR (LaCumba), Tulane (Ducky), UL-Lafayette (Zeke) and UNO (Neptune). Six 128-node, 512-core Dell Xeon systems are at LSU (Eric), LaTech (Painter), SUBR (unnamed), Tulane (Louie), UL-Lafayette (Oliver) and UNO (Poseidon). A central cluster, Queen Bee, a 668-node, 5344-core Dell Xeon system, is in a state information technology building in Baton Rouge. All systems have 1 GB memory per core. In September 2007, LONI was awarded an NSF agreement to allocate 50 per cent of the cycles available on Queen Bee through the TeraGrid, and to add Queen Bee to the TeraGrid, starting in February 2008. The overall number of CPU hours available has grown from 160 K in August 2006 to 0.6 M in August 2007 and to 6 M in July 2008, and usage has grown from 39 K CPU hours in August 2006 to 0.45 M in August 2007 and to 3.7 M in July 2008.
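The figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the '104-processor' figure for the IBM systems can be counted in the same way as the core counts of the Dell systems, and that the available and used CPU hours quoted for each date cover the same period. The variable names are not taken from any LONI software.

```python
# Systems as listed in the text: (number of systems, processors or cores each).
# Assumption: IBM "processors" are counted like Dell "cores".
systems = {
    "IBM P5-575": (5, 104),
    "Dell Xeon (campus)": (6, 512),
    "Queen Bee": (1, 5344),
}

total_cores = sum(count * cores for count, cores in systems.values())

# (CPU hours available, CPU hours used) as quoted for each date.
cpu_hours = {
    "August 2006": (160_000, 39_000),
    "August 2007": (600_000, 450_000),
    "July 2008": (6_000_000, 3_700_000),
}

# Fraction of the available CPU hours that was actually used.
utilisation = {
    date: used / available for date, (available, used) in cpu_hours.items()
}

print(f"total processors/cores: {total_cores}")
for date, fraction in utilisation.items():
    print(f"{date}: {fraction:.0%} of available CPU hours used")
```

Under these assumptions, the listed systems total 8936 processors or cores, and usage rose from roughly a quarter of the available CPU hours in August 2006 to well over half in the two later samples.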
All of the LONI systems are administered by a single team (HPC@LSU, consisting of 20 staff members, nine of whom are involved in user support). The near-term intent is that users can easily move from one system to another; in the longer term, users may be able to submit jobs to the pool of systems without knowing which system they are using. All account and allocation information is maintained centrally to enable this idea. A committee that includes representatives from each of the six research universities allocates cycles on the set of systems. Small requests (less than 50 K CPU hours) are reviewed immediately, and larger requests are reviewed quarterly. Additionally, 10 per cent of the cycles are reserved to encourage economic development. During the first year (2006) in which allocations were awarded, there were 14 project allocations, and this grew to 178 in 2007 and to 607 in the first three quarters of 2008. The number of LONI users grew to 375 in October 2007 and was 450 in August 2008. Partially overlapping this, there were 970 total users of LSU or LONI systems in October 2007 and 1350 in August 2008. Additionally, because Queen Bee joined the TeraGrid, there were an additional 1300 TeraGrid users on Queen Bee. The main research areas are astrophysics, chemistry and mechanical engineering (computational fluid dynamics, CFD), high-energy physics, hurricane and storm surge modelling, biocomputing and material science.
The HPC@LSU team also provides user support and training for current and potential users. This includes a help desk, consulting and advanced user support. The HPC@LSU team, working with other experts at the Louisiana institutions, provides training to encourage usage of the LONI hardware and software systems. This includes 1–2 day general workshops approximately twice per semester at varying sites around the state, and focused 3-hour sessions 10–15 times per semester at LSU, distributed via Access Grid to other institutions. The specific topics in both the general workshops and the focused training sessions depend on user demand, but include how to get allocations and accounts, compilation, optimization, message passing, threads and scientific software.
4. Tools and services
In order to promote new e-Research and achieve the vision of LONI as a single pool of resources, the main challenge is to provide effective scheduling and data management. Work in metascheduling is part of the CyberTools project. Co-scheduling is enabled by Highly Available Resource Co-allocator (HARC), and data management is delivered by the PetaShare project. Simple API for Grid Applications (SAGA) then provides a platform-independent application programming interface (API) for application development to make it easy to move to and from LONI.
(a) CyberTools
CyberTools (cybertools.loni.org), an NSF-funded collaboration of nine Louisiana institutions, is aligning cyberinfrastructure development with the needs of a wide range of scientific applications. CyberTools is working with targeted application groups to integrate, extend and deploy a range of coordinated services necessary for the next generation of applications. Many of these services and tools are developed in research projects in Louisiana (e.g. HARC, Cactus, SAGA and PetaShare). The project is structured to facilitate a deep interaction between applications and tools developers, crucial for both to progress. In particular, targeted applications drive the development and deployment of the tools, which in turn enable the increased capability of the applications. Although the project is based on two broadly defined science application areas—biosensors and biotransport processes—CyberTools has a goal of providing tools to many other scientific domains and involves researchers from coastal engineering, porous transport and astrophysics.
CyberTools provides a fast-track aimed primarily at the accelerated uptake of existing infrastructure by providing easy-to-adapt integrative solutions, and a deep-track to provide advanced capabilities needed for future Grand Challenge-type problems. The deep-track is more research oriented, involves incremental capability development and requires sustained interaction between tool developers and science drivers. In general, the fast-track enables applications to use features by lowering the barrier to entry, while the deep-track involves the design and development of toolkits and mechanisms that provide advanced capabilities.
CyberTools is divided into four work packages, for data scheduling, information services, visualization, and applications and application toolkits. To highlight the strong coupling between science drivers and CyberTools, we outline some of the issues that are being addressed within WP4 (applications and application toolkits).
Many simulation components need to be assembled for a full CFD simulation, needed for several science drivers—different numerical schemes, solvers and domain representations are needed for different scientific problems. CyberTools is creating a Cactus-based toolkit that can mix-and-match these different components. The ‘plug-and-play’ mechanism needs to be available for multiple CFD applications, and CyberTools is building a general purpose CFD toolkit rather than a single problem-solving environment. A specific example of the fast-track approach in WP4 is embedding a legacy retinal transport code using a multiblock for the Navier–Stokes equations into the Cactus framework; this enables scientists to take advantage of existing distributed cyberinfrastructure while using an existing and previously validated code. A component-based approach to facilitate multiphysics and multiscale coupling of disparate codes provides an interesting and important example of a deep-track research problem being addressed in WP4. Additional examples include collaborative work with science drivers to scale codes in preparation for being able to use the next generation of high-end machines.
Another way in which existing distributed cyberinfrastructure can be used is by providing support for application patterns used by scientists. This is being done, for example, via application managers or hosting agents that support all stages of the deployment and execution life cycle of an application. A specific example of an application-level pattern that we support is replica-exchange (RE), which is used for a wide range of scientific problems.
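To make the RE pattern concrete, the following is a minimal sketch of the standard Metropolis exchange test used in temperature replica-exchange, in which neighbouring replicas at temperatures T_i and T_j with potential energies E_i and E_j swap with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). It is a generic illustration and not the code of the application manager described here; the function names, the `assignment` bookkeeping and the choice of kcal/mol units are assumptions made for the example.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of exchanging replicas at temp_i and temp_j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_step(assignment, energies, temperatures, rng, offset=0):
    """Attempt swaps between neighbouring temperature levels.

    assignment[k] is the index of the replica currently at temperatures[k];
    energies[r] is the current potential energy of replica r.
    Alternating offset between 0 and 1 pairs up different neighbours.
    """
    assignment = list(assignment)
    for k in range(offset, len(temperatures) - 1, 2):
        a, b = assignment[k], assignment[k + 1]
        p = swap_probability(energies[a], temperatures[k],
                             energies[b], temperatures[k + 1])
        if rng.random() < p:
            assignment[k], assignment[k + 1] = b, a
    return assignment


rng = random.Random(0)
# Replica 1 (at 310 K) has the lower energy, so moving it to 300 K is always accepted.
new_assignment = exchange_step([0, 1], [-90.0, -100.0], [300.0, 310.0], rng)
print(new_assignment)
```

Between exchange attempts each replica runs independently, which is why the pattern maps naturally onto a pool of distributed machines: only the energies and the small assignment table need to be communicated.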
(b) Highly Available Resource Co-allocator
HARC (MacLaren 2007) is an open-source system for reserving multiple resources in a coordinated fashion. HARC handles different types of resources, and has been used to reserve time on supercomputers across a US-wide testbed, together with dedicated lightpaths connecting the machines. One example of the use of HARC is to reserve a LONI supercomputer at one site, a network between that computer and a visualization resource, and the visualization resource itself, so that a job can be run and visualized in real time. A second example would be to reserve a set of supercomputers for a hurricane forecast ensemble using input data that will be delivered in 6 hours.
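The essential property of co-allocation is that a reservation is made on every requested resource or on none of them. The sketch below illustrates that all-or-nothing behaviour with toy in-memory resources. It is not HARC's interface or protocol: the `Resource` class, the `co_allocate` function and the integer time slots are hypothetical names and simplifications introduced for illustration.

```python
class Resource:
    """Toy reservable resource (hypothetical; not a HARC class)."""

    def __init__(self, name):
        self.name = name
        self.bookings = []  # list of (start, end) half-open intervals

    def is_free(self, start, end):
        # Free if the requested interval overlaps no existing booking.
        return all(end <= s or start >= e for s, e in self.bookings)

    def book(self, start, end):
        self.bookings.append((start, end))

    def cancel(self, start, end):
        self.bookings.remove((start, end))


def co_allocate(resources, start, end):
    """Book every resource for [start, end), or leave all of them untouched."""
    booked = []
    for resource in resources:
        if not resource.is_free(start, end):
            # Roll back the bookings already made so nothing is held.
            for done in booked:
                done.cancel(start, end)
            return False
        resource.book(start, end)
        booked.append(resource)
    return True


supercomputer = Resource("supercomputer")
lightpath = Resource("lightpath")
visualization = Resource("visualization")
lightpath.book(2, 4)  # the network is already reserved for slots 2-4

trio = [supercomputer, lightpath, visualization]
first_try = co_allocate(trio, 3, 5)   # clashes on the lightpath
second_try = co_allocate(trio, 4, 6)  # all three are free
print(first_try, second_try)
```

A production co-allocator must also cope with resources and coordinators that fail part-way through a request, which this single-process sketch does not attempt to model.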
(c) PetaShare
PetaShare (Balman et al. 2008) is an NSF-funded project that is deploying additional disk and tape storage at LONI sites and developing user-friendly data-aware storage systems, data-aware schedulers and cross-domain metadata schemes (figure 2). PetaShare leverages the LONI infrastructure, fully exploiting high-bandwidth, low-latency optical network technology. It links over 50 senior researchers and 200 graduate and undergraduate students from 10 different disciplines to perform multidisciplinary research. Application areas supported by PetaShare include coastal and environmental modelling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering and high-energy physics. PetaShare currently manages over 200 TB of distributed disk storage across Louisiana, and will manage 300 TB of disk and 400 TB of tape when fully operational. Several sites from Mississippi and Alabama are interested in joining PetaShare.
PetaShare provides a ‘petashell’ interface, allowing users to access remote and distributed data in the same way as local data, without requiring changes to their applications. Additionally, a Web interface called ‘petasearch’ provides a keyword search across the distributed PetaShare archives, returning (for each file) a logical file name, an archive to which the file belongs and the file's physical location. This cross-domain ontology-based scheme enables powerful searches across multiple science domains and across multiple physical sites for new e-Research applications.
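The shape of a petasearch result, as described above, can be sketched as a small record type with a keyword filter over a catalogue. This is a hypothetical illustration and not PetaShare's implementation or schema: the `SearchHit` type, the `keyword_search` function, the catalogue layout and all of the example values are invented for the sketch, and a plain keyword match stands in for the ontology-based scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """One result row: the three fields the text says are returned per file."""
    logical_name: str
    archive: str
    physical_location: str


def keyword_search(catalogue, keyword):
    """Return a hit for every catalogue entry tagged with `keyword`."""
    keyword = keyword.lower()
    return [
        SearchHit(entry["logical_name"], entry["archive"],
                  entry["physical_location"])
        for entry in catalogue
        if keyword in (k.lower() for k in entry["keywords"])
    ]


# Made-up catalogue entries, for illustration only.
catalogue = [
    {"logical_name": "/example/surge/run01.nc", "archive": "coastal",
     "physical_location": "site-a", "keywords": ["storm surge", "hurricane"]},
    {"logical_name": "/example/genome/sample.fa", "archive": "bioinformatics",
     "physical_location": "site-b", "keywords": ["genome"]},
]

hits = keyword_search(catalogue, "Hurricane")
for hit in hits:
    print(hit.logical_name, hit.archive, hit.physical_location)
```

Separating the logical file name from the physical location is what allows a file to be found by name regardless of the site that stores it.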
(d) Simple API for Grid Applications
SAGA is a proposed recommendation of the Open Grid Forum that defines a high-level programmatic interface for developers of distributed applications. The fundamental goal of SAGA is to lower the barrier for applications and application scientists to use distributed infrastructure. SAGA provides a simple, uniform, stable interface to the most often required functionality in order to construct general purpose extensible and scalable applications. The SAGA effort has been led by researchers at LSU. We are also developing several different novel applications using SAGA to harness the power of distributed infrastructure. SAGA has already been used to develop different types of distributed applications: converting legacy applications to use distributed resources; developing applications based upon abstractions and frameworks that are themselves developed using SAGA; and new first principles applications, explicitly cognizant of the fact that they will operate in a distributed environment, where the application logic is coupled with the distributed logic (figure 3). SAGA supports the development of these applications and many others, thus providing a tool to develop a broad and general class of applications.
SAGA facilitates the use of distributed infrastructure by providing a simple interface across different middleware distributions and environments. Therefore, once an application has been written using SAGA, it can be deployed and run on any environment in which SAGA is supported. We are developing adaptors for the most commonly occurring distributed environments. Additionally, SAGA provides the abstractions from which commonly occurring execution patterns and usage modes can be supported. For example, for data-intensive applications, we create a framework that supports the common MapReduce pattern. Applications involving basic functionality such as searching can then be deployed over distributed environments.
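As an illustration of the MapReduce pattern mentioned above, the following sketch runs a simple search (counting occurrences of a term per document) through explicit map, shuffle and reduce phases. It is written in plain Python in a single process and does not use the SAGA API or the SAGA-based framework; all function names are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(doc_id, text, term):
    """Emit a (doc_id, 1) pair for each occurrence of `term` in the text."""
    return [(doc_id, 1) for word in text.lower().split() if word == term]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def search(documents, term):
    term = term.lower()
    pairs = []
    for doc_id, text in documents.items():
        # Each map call is independent, so it could run on a different machine.
        pairs.extend(map_phase(doc_id, text, term))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


documents = {
    "a": "storm surge storm",
    "b": "river basin",
    "c": "Storm track",
}
counts = search(documents, "storm")
print(counts)
```

Because the map calls share no state, a distributed framework can place them close to the data and only move the much smaller intermediate pairs across the network.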
SAGA is being used within the CyberTools project in several critical ways. It is being used to create a general purpose ‘application manager’ that will enable many science drivers to use remote LONI machines without any changes to the execution environment. In particular, it can be used to support specific application usage patterns; for example, it has been used for distributed RE simulations using NAMD (nanoscale molecular dynamics; Luckow et al. 2008). The same infrastructure can be used with other codes such as LAMMPS (large-scale atomic/molecular massively parallel simulator). Additionally, SAGA has been interfaced with Cactus applications (Jha et al. 2007, 2008) to use information services and other advanced cyberinfrastructure features. Finally, SAGA will also provide the basic capability for interfacing multiphysics applications (via extension to the API to support messaging).
The final aspect of cyberinfrastructure is the people. The LONI Institute is addressing this area by setting up a collaboration across the six LONI research universities, funded jointly by Louisiana and the universities, initially focused on computational materials, computational biology and computational science. The institute is hiring 12 faculty members (two at each site) and six computational scientists (one at each site), and is funding six graduate students per year over 5 years. The faculty and the computational scientists will start partnering projects with each other and with industry, with the goal of infusing computational science throughout the universities, leading to growth in: competitiveness of researchers in seeking funding and universities seeking the best students, staff and faculty; Louisiana's high-tech workforce; industry–academia cooperative programmes; and economic development.
This effort to advance Louisiana provides a model for building regional cyberinfrastructure based upon local needs but consistent with and integrating into national-level efforts. It has led to an unprecedented level of partnering between the state's research universities and to the largest amount of per capita computing accessible to the state's researchers. In addition, the ongoing projects to develop software and services and to increase collaboration are building a base that has the potential to transform the state's educational system and economy.
(a) A cyberinfrastructure enabled application
One example of the new e-Research that this cyberinfrastructure enables is National Incident Management Systems and Advanced Technologies (NIMSAT), a national-scale homeland security centre that has been established at and is led by UL-Lafayette. It focuses on conducting research; developing tools; and providing planning, training, exercising and operational support for incident managers. The NIMSAT institute leverages LONI cyberinfrastructure to develop disaster management applications for decision makers to help understand emerging threats and manage disasters more effectively. For the decision making to be near real time and interactive, the underlying cyberinfrastructure needs to: allow and help decision-making tools to access data from multiple data sources in near real time; allow and help disaster models (such as storm surge predictions) to run at a fine granular level and seamlessly interoperate with various other disaster management applications (transportation, evacuation, storm impact and relief operations); and support visualization tools. These applications also need prioritized access to data, network and computing resources that are normally shared among multiple users. Finally, the applications are designed and built by collaborating researchers across the state.
NIMSAT is currently developing a disaster response application called the Points-Of-Distribution (POD) tool that could be used by emergency managers (pre- or post-disaster) to effectively plan the government distribution of basic commodities (food, water, ice, tarps, prophylactics, etc.) to provide sustenance to people in the immediate aftermath of a disaster. The POD tool would geospatially analyse the post-disaster ground reality by identifying the requirements of the affected people, the road networks that are passable, the available government facilities for setting up a POD, etc. Owing to the computational complexity of the algorithm and the need for immediate results post-disaster, the POD tool has been designed to use computing resources of LONI on the back-end while providing an interactive Web-based interface that lets emergency management authorities use the tool from their desktops. The facility location algorithm for the POD tool was parallelized and takes only a few seconds on Zeke (a LONI supercomputer) instead of several hours on a server. In addition to the computing, network and storage resources of LONI, disaster management applications can use several of the LONI tools and services for resource reservation and provisioning, data scheduling and visualization. For example, HARC can be used for advanced resource reservations on multiple supercomputers and SPRUCE can provide secure urgent access to computing resources during emergencies.
(b) High-energy physics processing
LONI is enabling scientists in Louisiana to significantly contribute to major international e-Research projects that have high demands for high-performance networking and computing. The LaTech High Energy Physics (HEP) group has been very successful in applying the high throughput capabilities of the Open Science Grid (OSG) software to the LONI network for the extreme data processing demands of experiments in the USA and, soon, in Europe. These physicists are involved in the DØ experiment at Fermilab (near Chicago) and the ATLAS experiment at CERN (Geneva). Contributing to this effort is Department of Energy-funded research on high availability grid computing by LONI computer scientists.
LaTech established an OSG compute element for HEP grid computing on Eric (at LSU). The LaTech group members were the first users of Eric when they participated in the massive Spring 2007 DØ reprocessing effort. Of the 455 million events processed remotely, LONI processed more than 47 million, and was one of the top two performing OSG sites, using more than 200 LONI processors. LaTech is currently continuing Monte Carlo production for DØ and for the ATLAS experiment on Eric, with plans to have additional OSG compute elements established on two other LONI Dell clusters, including Painter (at LaTech). A key feature of the new installations will be access to PetaShare storage at each of the sites.
(c) Additional collaborations
In addition to new projects, the largest consequence of the statewide shared cyberinfrastructure has been the development of a shared culture of collaboration. Three examples of this are the awarding to the LONI collaboration of an NSF agreement to put Queen Bee on the TeraGrid; the PetaShare project, which combines computer science research at LSU with application research at UNO, Tulane, LSU, UL-Lafayette, SUBR and Johns Hopkins; and the LONI Institute, where the LONI universities are coordinating the hiring of 12 faculty members and six computational scientists across the six universities, rather than having each university do its own hiring, possibly in competition with the others. The results can be seen in the 29 papers from the last year that involved multi-university collaborative e-Research. This number is expected to increase during the coming years, particularly as the LONI Institute starts selecting new projects and supporting them with effort from the computational scientists.
The efforts that have been put in place to build a coordinated, statewide cyberinfrastructure are beginning to make an impact on e-Research in Louisiana, catalysing new collaborative projects that are building new e-Research applications, supporting network, compute and storage intensive research and attracting new faculty to our institutions.
The institutions and the state are actively building on the foundation described in this paper. For example, a new multidisciplinary hiring initiative in computational science at LSU is seeking to recruit six new leading faculty members, and the CCT continues to recruit joint faculty with many different campus departments. A new centre, the Center for CyberSpace, is bringing new faculty and research to LaTech and LSU. Also, a new NSF EPSCoR solicitation across Louisiana, Alabama and Mississippi is intended to fund research related to building joint cyberinfrastructure.
One important component that has led to the success of these programmes in Louisiana is the continued and sustained support from the high levels of administration in the state and the institutions. This is not just in funding the infrastructure and associated projects but also in supporting the changes necessary to enable multidisciplinary e-Research through joint faculty hires, the hiring of research staff and new models of collaboration. This has been possible despite the turnover in state government and institutions (e.g. LSU has passed through four chancellors and three provosts since 2002), and illustrates the importance the state places on cyberinfrastructure as a pathway to e-Research, education and economic growth.
In summary, the efforts that have been put in place in Louisiana have started to make an impact in new faculty members, new staff, new tools and new e-Research. The ‘Louisiana model’ is being talked about in the USA and around the world as an example for other states and regions to follow.
We would like to acknowledge support from the State of Louisiana (specifically, the Louisiana Board of Regents (LEQSF(2007-12)-ENH-PKSFI-PRS-01)), NSF (OCI-0710874, LEQSF(2007-10)-CyberRII-01, CNS-0619843), NIH (NCRR P20RR016456) and the UK EPSRC (GR/D0766171/1).
One contribution of 16 to a Theme Issue ‘Crossing boundaries: computational science, e-Science and global e-Infrastructure I. Selected papers from the UK e-Science All Hands Meeting 2008’.
© 2009 The Royal Society