Remote scientific visualization, in which rendering services are provided by larger-scale systems than are available on the desktop, is becoming increasingly important as dataset sizes grow beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and on the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community in meeting these goals through the porting of a commercial visualization package to a large-scale system. The application uses the message-passing interface (MPI) to distribute data among data-processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system under consideration. We present details, and a performance analysis, of a new MPI proxy method that allows the application to run within the Cray environment while still supporting the MPI communication the application requires. Example use cases from materials science are considered.
The visualization of large datasets is a bottleneck in applications where data acquired from scientific equipment or in silico experiments must be validated at an early stage. Despite advances in graphics processing unit (GPU) hardware, researchers are able to produce datasets that are too large to visualize even on modern graphics workstations. A related issue is data location: the researcher is often not co-located with their large datasets, and computational resources and scientific instruments are now often scattered across the globe.
Over the last decade, a new philosophy based on e-Science, termed the fourth paradigm, has been applied to these data creation, storage, transmission and manipulation processes. It offers guidelines that we believe can and should be applied to the data visualization process when working with large datasets.
In this paper, we consider visualizing volume datasets that provide material density data acquired from a new range of X-ray imaging technologies. These include those at the Diamond Light Source (http://www.diamond.ac.uk/), a third-generation synchrotron light source (currently with a focus on the I12 Joint Engineering, Environmental and Processing beamline), and the facilities available at the Henry Moseley X-Ray Imaging Facility within the School of Materials Science at the University of Manchester. Working with researchers from the School of Materials Science, we have access to datasets that are typically 500 GB in size. Figure 1 shows images representing the workflow for processing a smaller dataset from a biological sample that was physically scanned on two X-ray computed tomography (CT) machines, each creating unique sets of two-dimensional radiographs. These can be reconstructed into three-dimensional volume datasets that, in turn, can be segmented and visualized in qualitative and quantitative ways. However, given the much larger size of new datasets and the memory limits of GPU hardware typically found in desktop visualization systems, an alternative method of visualizing these datasets is required.
One approach is to distribute rendering to a cluster of workstations acting as a visualization system, with a final compositing step used to form a complete image from the partial images produced on the cluster. In this case, the bottleneck is moved to that of transferring the dataset from the acquisition or simulation hardware to the visualization system. The number of workstations available in such clusters is also often insufficient for very large datasets.
The large memory and core counts offered by supercomputers provide another alternative. In the case of simulation code running on supercomputers, it is possible to move the visualization code to the supercomputer or to add visualization capabilities to the simulation code [5,6]. In the work by Manos et al., each process taking part in the simulation performed, as a final step, the rendering of its own domain of data using ray casting. The final image is formed by a gather and composite of each partial image. This technique has the advantage of removing any dependency on third-party visualization software, and possibly the need to write data to disk (apart from the final image), but it was restricted to a particular visualization technique and lacked the ability to perform explorative visualization after the simulation had finished. In both cases, rendering is often performed in software on the CPU in the absence of any GPU hardware on such systems. While this results in long render times, often purely in a batch process rather than as an interactive application, it does allow the large core counts and distributed memory of such systems to be utilized to render large datasets.
Remote visualization of large datasets requires an efficient parallel distribution of the data coupled with a parallel image compositing system, all preferably running at interactive rates, although this may depend on whether GPU hardware is available. Since 1999, Research Computing Services at the University of Manchester has developed various parallel CPU- and GPU-based visualization solutions. The provision of visualization tools on large compute systems is one of a suite of tools that can increase productivity for researchers using those systems. This idea has been taken up by national-level services: for example, the National Grid Service (NGS) provided a pilot visualization service and the HPCx service offered ParaView as a visualization tool. To summarize, there is a growing need for visualization services, but also an indication that data size may be an overriding issue when specifying a new system.
The work presented in this paper has taken the approach of porting the commercial visualization application AVS/Express (http://www.avs.com) to the Cray XT4 hardware provided by HECToR, the UK National Supercomputing Service (http://www.hector.ac.uk). This allows our target end users from materials science to visualize large datasets that currently exceed the capabilities of our GPU-based visualization systems.
The remainder of this paper describes in detail our approach and is structured as follows. Section 2 describes some of the guidelines proposed by the UK e-Science programme, which, although not specific to remote visualization, we believe could be considered beneficial for any large research-based software development project that is considering providing a visualization service element. A new visualization system, in the form of software ported to a Cray XT4 supercomputer, is described in detail in §3. Section 4 then looks at initial results and performance measures and finally §5 offers some conclusions.
2. Lessons from UK e-Science
Jim Gray formulated several informal rules, or laws, that codify how to approach data engineering challenges related to large-scale scientific datasets. These rules are distilled from the experience of managing projects within the UK e-Science programme and are summarized by Szalay and Blakeley as follows.
Data intensive. Scientific computing is becoming increasingly data intensive rather than computationally intensive in terms of resource requirements. It is therefore proposed that data management is becoming a more important problem than computation management.
‘Scale-out’ architecture. A system should be able to scale in a linear manner, removing a future hurdle that would limit the data or computational size that can be exploited. Truly linear scaling may not be achievable, and could be considered quite rare, but keeping the computational cost and the order of system complexity polynomial is essential for managing any large system that is expected to expand.
Computations to the data. Bring the computational problem to the data, rather than transferring the data to the computation processor. It is perceived that, for large problems, the size of the program is always smaller than the data and even if the computational processors are slower at the data location this is still likely to be a faster option than moving the data.
Ask ‘20 queries’. Start the design by asking 20 questions of the researchers. The rule of 20 comes from the fact that a very small number of questions may not be enough to extract a broad picture, whereas a large number will dilute the focus. Therefore, 20 has been taken as a sensible rule-of-thumb.
Working solutions. Always go from a working solution to a working solution at every iteration of development. This is especially important when creating applications that need to scale up.
Considering these points and the user needs, we embarked upon porting a full visualization application to the system on which the data will be located, allowing the size of datasets being processed to increase.
3. Porting the visualization software
In order to provide large dataset visualization facilities, this project has taken the approach of porting an existing visualization code to a Cray XT4 system (HECToR) with 22 656 cores (as 5664 quad-core processors) and 43.5 TB of distributed memory. Each quad-core node has access to at most 8 GB of memory. Typically, parallel codes are written using the message-passing interface (MPI) to distribute data for computation across the system. No GPU hardware is available, and rendering is therefore performed in software using the MesaGL (http://www.mesa3d.org) implementation of OpenGL. This type of CPU-based rendering will not match the rendering speed of GPU-based solutions, and we therefore offer no comparison of rendering times between such systems. Our goal is to provide a general-purpose visualization tool that can be used to visualize data on the HECToR system.
The AVS/Express visualization product is a scientific visualization application allowing the rendering of many forms of data. It uses the visualization pipeline method, in which data are read into the system, possibly filtered and then mapped to geometry for rendering. At each stage, the user is able to control how the data are processed, whether that involves choosing an appropriate file reader, deciding how the data are filtered (for example, whether they are down-sampled or cropped) or determining how the data are converted to geometry. The last stage usually employs a suitable visualization technique, such as isosurfacing or volume rendering, to produce an image of the dataset. AVS/Express provides a user interface in which these stages are represented by modules and allows the user to connect the modules together to form a data-flow style network. Figure 2 shows an example network in which data (a CT scan of a golf ball) are read in via a file reader module before being volume rendered. A module may have its own parameters, which are controlled in the user interface panel shown on the far left.
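The read/filter/map/render pipeline described above can be sketched as a chain of simple functions. This is a minimal illustration of the dataflow model only; the module names, data shapes and wiring below are hypothetical and do not reflect the AVS/Express API.

```python
# Minimal sketch of the read -> filter -> map -> render dataflow pipeline.
# Module names and data layouts are illustrative, not the AVS/Express API.

def read_volume(path):
    # Stand-in for a file-reader module: returns a tiny 3-D volume as
    # nested lists of byte values (density 0-255).
    return [[[(x + y + z) % 256 for x in range(4)]
             for y in range(4)] for z in range(4)]

def downsample(volume, step=2):
    # Filter stage: keep every `step`-th voxel along each axis.
    return [[plane[y][::step] for y in range(0, len(plane), step)]
            for plane in volume[::step]]

def map_isosurface(volume, level):
    # Mapping stage: emit the coordinates of voxels at the iso-level,
    # a crude stand-in for geometry generation.
    return [(x, y, z)
            for z, plane in enumerate(volume)
            for y, row in enumerate(plane)
            for x, v in enumerate(row) if v == level]

def render(geometry):
    # Render stage stub: report how many primitives would be drawn.
    return f"rendered {len(geometry)} points"

# Wire the modules together, as the network editor would.
image = render(map_isosurface(downsample(read_volume("golfball.raw")), level=2))
```

In AVS/Express the same structure is expressed visually: each function above corresponds to a module box, and each call corresponds to a connection in the data-flow network.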
The particular version of AVS/Express to be ported is the AVS/Express Distributed Data Renderer (DDR) edition. This product provides parallel module computation by allowing the AVS modules to execute on decomposed datasets, whereby the dataset is split into smaller domains of data that are distributed to a number of compute processes. The DDR edition also provides a parallel renderer (referred to as the MPU renderer). Separate rendering processes receive geometry from particular compute processes, render an image of that geometry and then composite the images together to form a final visualization of the entire dataset. This technique is referred to as sort-last rendering and allows a dataset to be rendered that exceeds the capabilities of a single rendering process.
The main development task was to modify the AVS/Express code such that it could operate within the HECToR runtime environment. The first key issue was to enable the execution of the main AVS/Express application (network editor, module user interfaces, visualization window) on the login nodes, where X11 functionality is available, yet communicate with parallel module and rendering processes executing on the backend nodes. The second key issue was to modify the image compositing architecture within AVS/Express so that it can use MPI on the backend nodes.
AVS/Express must be run from a login node with X11 forwarded over the secure shell (SSH) connection to the user's X server, and the X server must support the GLX protocol. This is not a particularly efficient way of running an X11 application, and it places an upper bound on the overall rendering frame rate that cannot be improved by parallel rendering, because the limiting factor becomes the time needed to transfer the image from the login node to the user's X server. In the case of the AVS renderer, the upper bound is approximately 5.0 frames per second (fps) for a 512×512 window and 1.1 fps for a 1024×1024 window when rendering remotely to a desktop system running Linux. While this appears low, the visualization in fact remains sufficiently responsive for the user to interact with it.
Several alternatives to X11 forwarding were considered. Running a VNC server on the HECToR login node would allow the AVS/Express user interface to be rendered to a virtual desktop on HECToR and displayed on the user's local workstation via a VNC client. This method can significantly reduce the network traffic associated with remote X11 applications, resulting in a more responsive interactive session for the user. A second alternative would be to run the AVS/Express user interface on the local workstation and have it communicate directly with the parallel module and rendering processes on the backend nodes on HECToR. However, the backend nodes are only visible to the HECToR login nodes. A variation of this method would be to run a port-forwarding process on the login node. This would forward any connection normally opened between the backend node processes and AVS/Express running on a login node to an AVS/Express process running on a workstation. Unfortunately, none of these methods are permitted on HECToR for security reasons. Hence, X11 forwarding over SSH from the login node to a local workstation is the only method of displaying the AVS/Express user interface.
(a) MPI forwarding
Before describing the main development task of this project, it is useful to consider the current architecture of AVS/Express DDR.
(i) Existing DDR architecture
The existing visualization code is an MPI application that comprises a number of executables, the main one being express, which provides the AVS network editor, module user interface and visualization window. This is always rank 0 in the MPI job. The other components are two types of MPI executable, namely pstnode and mpunode.
Figure 3 shows the current scheme: the pstnode processes execute parallel module codes according to the modules in the visualization network. A key concept is that a dataset is never accessed directly by the express process. Instead, a dataset is decomposed into a number of smaller sub-domains, one for each pstnode process. The express process will instruct the pstnode processes on how they should process their sub-domain of data. For example, a parallel isosurface module will receive parameters from the user interface (e.g. the isosurface level to compute), but the computation will take place within the pstnode processes on their current sub-domain of data. The sub-domain of data within a pstnode process remains fixed. It is this decomposition of data and encapsulation within the pstnode process that allows AVS/Express to work with large datasets. At no point should sub-domains be gathered and recomposed in the main express process.
The visualization network will specify which modules should produce renderable geometry. Any geometry produced by the pstnode processes will be passed directly to an assigned mpunode rendering process. Hence, each mpunode process only receives and renders a fraction of the total geometry in the scene. The images produced by the mpunode processes are composited together (using either depth or alpha compositing) and the final image is sent back to the express process for display in the user interface.
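The per-pixel merge used in the final step above can be sketched as follows. This is an illustrative depth-compositing routine only, under the assumption of flat per-pixel colour and depth arrays; it is not the mpunode wire format or the Paracomp implementation, and alpha compositing would replace the depth test with a blend.

```python
# Sketch of sort-last depth compositing: each render process produces a
# partial image plus a depth buffer for its fraction of the geometry, and
# pixels are merged by keeping the nearest fragment.

def depth_composite(images, depths, background=0):
    """Merge partial images pixel by pixel, keeping the closest depth."""
    n_pixels = len(depths[0])
    out = [background] * n_pixels
    out_z = [float("inf")] * n_pixels
    for img, zbuf in zip(images, depths):
        for i, (colour, z) in enumerate(zip(img, zbuf)):
            if z < out_z[i]:          # nearer fragment wins
                out[i], out_z[i] = colour, z
    return out

# Two render processes, each having drawn only its own sub-domain:
img_a = ["red", "red", 0, 0]          # 0 = background (no geometry)
z_a   = [1.0, 2.0, float("inf"), float("inf")]
img_b = [0, "blue", "blue", 0]
z_b   = [float("inf"), 1.5, 3.0, float("inf")]

final = depth_composite([img_a, img_b], [z_a, z_b])
```

Note how pixel 1 is covered by both processes; the composite keeps the fragment with the smaller depth, which is why the order in which partial images arrive does not affect the result.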
(ii) MPI proxy
In order that the express user interface process can be run on the HECToR login node, a number of changes to AVS/Express are required. Most significantly, all MPI functionality must be removed from the executable so that it can be run outside of the MPI job. Removing MPI would be a major change to the AVS source code and it would mean that the AVS developer application programming interface (API) exposed to the user in AVS/Express would have to support an alternative communication layer. This API allows the user to develop parallel AVS modules using MPI. The user is able to call MPI functions directly as well as use AVS functionality that relies on MPI internally. Hence, removing MPI was not considered to be a feasible solution because it would require a rewrite of the AVS developer API and break existing user-developed modules. Instead, the strategy was to develop an alternative MPI library that does not use the Cray MPI layer but still allows the express executable to be linked without major source code changes. The express executable can then continue to make MPI function calls that do not require the Cray MPI layer found on the backend nodes.
Other work has examined intercepting MPI function calls for different purposes. For example, the MPI standard specifies the PMPI profiling interface to allow debugging and performance analysis tools to attach to the MPI implementation. An application can be linked against an alternative MPI library (provided by the profiling tool, for example) that collects the performance data before calling the native MPI functions. The MPI standard also states that this mechanism could be used for networking different MPI implementations, which may provide a method of modifying our MPI processes. The MPICH-G2 project modified MPI at a lower level to provide grid-based communication, allowing MPI to run between multiple systems. However, this low-level option is not available to us, as we wish to use the Cray MPI layer for the majority of communication between processes on the backend nodes. Both of these techniques also require that the MPI processes run within the MPI environment. The problem we are considering is that one of our MPI processes must execute outside of the MPI environment (on the login node) while the other MPI processes execute in the standard Cray MPI environment (on the backend nodes). To solve this problem, we provide an alternative MPI implementation for use by the login node process and an MPI proxy process that connects that implementation with the standard Cray MPI processes. This is now discussed in detail.
Our replacement MPI library is referred to as XPMT (Express MPI Tunnel). The express source includes xpmt_nompi.h (rather than the usual vendor <mpi.h>) and is compiled as a serial login-node executable. The XPMT library contains MPI functions that communicate with a proxy MPI process via a standard transmission control protocol/Internet protocol (TCP/IP) socket. This proxy process is a genuine Cray MPI process (always rank 0 in the MPI job) running on the backend nodes. As shown in figure 4, the non-MPI express sends requests for MPI functions to be called on the compute node on which the proxy xpnode is running. The proxy receives each request together with any arguments required by the requested MPI function.
The MPI proxy process maps XPMT's representation of MPI types to Cray MPI types. When express creates new MPI objects (communicators, datatypes, statuses etc.), the proxy creates the equivalent objects using the Cray MPI layer and a mapping between the two representations is maintained.
The pstnode and mpunode MPI processes are unchanged (they are standard Cray MPI executables) and will communicate with the xpnode process as though it were the express process. From their perspective, the rank 0 process is express (and xpnode is always rank 0), and they communicate with it only in response to MPI functions being called by express. For example, if express posts an MPI_Recv(), the proxy xpnode will make the same function call from rank 0. When the pstnode or mpunode processes make a corresponding MPI_Send() call, the proxy xpnode will receive the data and pass them back to the non-MPI express process. Hence, the sending processes are completely unaware that the xpnode process is a proxy for express.
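The request-forwarding idea described above can be sketched in a few lines. The JSON wire format, the function names and the fake "Cray MPI" backend below are all assumptions for illustration; the actual XPMT protocol and the real MPI layer are not shown.

```python
# Sketch of XPMT-style MPI forwarding: the non-MPI express process
# serializes an MPI call over a socket, and the proxy replays it against
# the MPI layer on rank 0. Wire format and backend are illustrative only.
import json
import socket
import threading

def fake_cray_mpi(func, args):
    # Stand-in for the real MPI layer visible only to the proxy.
    if func == "MPI_Comm_size":
        return {"size": 256}
    if func == "MPI_Recv":
        return {"data": [1, 2, 3], "source": args["source"]}
    raise ValueError(f"unsupported call: {func}")

def xpnode_proxy(conn):
    # Proxy loop (one iteration): receive a request, call the MPI layer,
    # send the result back over the socket.
    request = json.loads(conn.recv(4096).decode())
    result = fake_cray_mpi(request["func"], request["args"])
    conn.sendall(json.dumps(result).encode())

def xpmt_call(conn, func, **args):
    # Replacement MPI function used by the serial express executable.
    conn.sendall(json.dumps({"func": func, "args": args}).encode())
    return json.loads(conn.recv(4096).decode())

# A socketpair stands in for the TCP/IP connection between login node
# (express) and backend node (xpnode proxy).
express_end, proxy_end = socket.socketpair()
t = threading.Thread(target=xpnode_proxy, args=(proxy_end,))
t.start()
reply = xpmt_call(express_end, "MPI_Recv", source=3, tag=0)
t.join()
```

The key property, as in the real system, is that express never links against the vendor MPI library: every MPI call it makes is marshalled as a request and executed on its behalf by the rank 0 proxy.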
It should be noted that the pstnode and mpunode processes communicate with each other via the Cray MPI layer and so benefit from this optimized library and its use of the Cray interconnect. The largest data transfer occurs between a pstnode and its associated mpunode when geometry is passed for rendering. This communication never touches the proxy xpnode process, occurring entirely within the Cray MPI domain and so suffers no change in performance as a result of removing Cray MPI from the express user interface. The amount of data sent by the non-MPI express process via the socket is in general small, because it is mainly command-and-control messages from the express user interface. The global scene graph information sent from express to the mpunode render processes is also small, because most of the geometry is generated by the pstnode processes.
(iii) XPMT performance
The use of the MPI proxy imposes a performance penalty on any MPI communication to and from the express process. Figure 5 shows execution times for various tests using the proxy and the standard Cray MPI layer. A Cray MPI executable was first run entirely on backend nodes. The same tests were then compiled against the XPMT replacement library so that what was rank 0 now runs on the login node and communicates via the xpnode proxy process (which becomes rank 0 in the MPI job). The remaining ranks are standard Cray MPI executables. This matches exactly the changes made to AVS/Express. The tests perform common communication calls (MPI_Bcast, MPI_Gather, MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv, MPI_Waitall), all of which are found in AVS/Express. Each test sends or receives an array of MPI_INTs whose size increases as 64, 128, 256, 512, 1024, 1024², 2×1024² and 8×1024² (the Gather test uses array sizes up to 512). The graphs show the total time over all array sizes for each test. We can see that the use of the MPI proxy can slow communication by a factor of approximately three in some cases. However, within AVS/Express, most of the messages from the express process are less than 1 kB in size. The largest message sent back to the express process from the compute or render nodes is usually the one containing the final composited image, which for a 512×512 window is 1 MB (assuming 4 bytes per pixel). Given that AVS/Express is an interactive application (rather than a numerical simulation), we have found this performance penalty acceptable.
(b) Image compositing
AVS/Express currently uses the open source Paracomp compositing library (http://paracomp.sourceforge.net). The basic compositing method employed is the Scheduled Linear Image Compositing method. While this has proved to be an effective compositing library on render clusters, the lack of MPI support in the library prevented its use on HECToR for image communication.
Having removed the Paracomp communication facilities, we have implemented the 2–3 Swap Image Compositing method. This allows all image communication between render processes to take place within the Cray MPI layer. The 2–3 Swap method is similar to Binary-Swap Compositing but removes the requirement for a power-of-two number of render processes. This is important in AVS/Express because of the extra process introduced by the MPI proxy when considering the job queue sizes on HECToR.
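The reason 2–3 swap works for an arbitrary process count is that any n ≥ 2 can be partitioned into groups of size 2 and 3, whereas binary swap requires a power of two. The grouping rule below is a simple illustration of this property, not the published 2–3 swap schedule.

```python
# Sketch of the grouping step that lets 2-3 swap compositing accept any
# number of render processes: partition n ranks into groups of 2 and 3.

def two_three_groups(n):
    """Partition ranks 0..n-1 into contiguous groups of size 2 or 3."""
    if n < 2:
        raise ValueError("need at least two render processes")
    groups = []
    start = 0
    remaining = n
    while remaining:
        # Take a 3-group whenever the remainder is odd, so that we are
        # never left with a single stranded process.
        size = 3 if remaining % 2 else 2
        groups.append(list(range(start, start + size)))
        start += size
        remaining -= size
    return groups
```

For example, the 255 render processes used later in this paper (an odd, non-power-of-two count forced by the extra MPI proxy rank) partition into one 3-group and 126 2-groups, so every process participates in every compositing round.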
Parallel image compositing reduces the time required to blend the rendered images of the sub-domains (recall the dataset is divided into sub-domains of data) from every render process. Sending full-sized images from all render processes directly to one process for blending (using either depth testing or alpha blending) would introduce a bottleneck. By dividing images at every process in screen-space (i.e. discarding rows of pixels) and exchanging these sub-images, all processes can take part in the blending operation. Eventually, every process will have a sub-image that contains a fully rendered dataset. The final step is to gather these sub-images at a single process and copy them into the image buffer. This final gather step is not a bottleneck, because the screen-space sub-images are small at this stage (each sub-image contains (1/n)th of the total number of pixels in the final image, where n is the number of render processes). All of this communication takes place on the backend nodes. The final image is sent through the MPI proxy process to the express process so that it can display it in the user interface.
(i) Compositor performance
Figure 6 shows the performance of various compositing operations. Figure 6a shows timings for one particular rank for images of size 512² pixels and 1024² pixels. The rank chosen is the middle rank for each number of render processes (any arbitrary rank could be used). This gives a snapshot of what one particular render process is doing during image compositing. All times are totals for the entire image compositing operation. The Image exchange time is the total time the middle-rank process spends exchanging screen-space images with other processes during the compositing operation. The Image blend time is the total time spent blending image pixels (using depth testing). The Image final gather time is the time spent sending the blended sub-image to the single process (usually the first mpunode render process) that gathers the sub-images from all other render processes. Note that the gather operation writes the sub-images directly into the final image buffer. The Total time is the sum of the previous three operations. Clearly, the Image exchange phase is the dominant factor when measuring the time spent compositing images on the backend nodes. As more processors take part in the compositing operation, any particular processor exchanges image segments with an increasing number of processors. However, even the slowest compositing times correspond to a frame rate of approximately 90 fps for a 512² window and approximately 30 fps for a 1024² window. Hence, the image compositing scheme within the Cray MPI domain provides more than adequate performance for an interactive visualization application.
Figure 6b shows the times taken during the two-stage process used to send the final composited image, via the MPI proxy, to the express process on the login node. The image is first sent to the proxy from the backend process that has assembled the final image. This time is insignificant because the communication occurs entirely within the Cray MPI layer. However, the receiving time on the login node is significant and will therefore limit the overall frame rate of the renderer. This image transfer occurs over the TCP/IP socket through which express and the MPI proxy process communicate, and therefore does not benefit from the optimized communication available within the Cray MPI layer. The fluctuating times occurred on every run of the tests and were due to variations in load and network traffic on the login node. While this communication time represents a significant drawback of our method, it is at least bounded across a range of processor counts. The parallel image compositing operation on the backend nodes performs at interactive rates and shows that the number of rendering processes can be increased (typically as dataset size increases). If the final image does not need to be displayed in the user interface, instead being written to disk for example, then the final image transfer step can be removed altogether.
4. Results and analysis
A large dataset has been provided by the materials science users with whom we are working. The dataset is a uniform volume of dimensions 7150×7150×7369 containing byte data (density values in the range 0–255), resulting in a 351 GB raw dataset. It is to be volume rendered, and the large size of the dataset makes it a useful test case because the parallel volume render module is particularly memory hungry. Each mpunode render process takes a copy of the sub-domain from the pstnode process (which is typically only executing the parallel file reader code). It also allocates another volume, 1/32 the size of the original, for voxel lighting calculations; for gigabyte volumes, this can be significant. It is memory usage that dictates how many processes are needed to render the dataset. Volume rendering takes place at image resolution, and therefore performance is mainly influenced by the size of the final image and by whether the transfer function produces semi-transparent regions in the data (this increases the execution time of the volume render algorithm). However, the AVS volume renderer does not tile the image, and therefore adding more processors does not necessarily improve the frame rate. It does, however, reduce the size of the sub-domain of data that each mpunode render process has to store.
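A back-of-envelope estimate shows why memory dictates the process count. The sketch below models only the terms named above (the renderer's sub-domain copy plus the 1/32-sized lighting volume) and ignores all other AVS/Express overheads, so it gives an illustrative lower bound, not the tool's actual sizing rule; the 255 domains used in practice (see table 1) sit comfortably above it.

```python
# Back-of-envelope estimate of the minimum number of sub-domains needed to
# volume render the 7150 x 7150 x 7369 byte dataset on 8 GB quad-core
# nodes. Models only sub-domain copy + 1/32 lighting volume; other
# per-process overheads are deliberately ignored (lower bound only).

GIB = 1024 ** 3

def min_domains(volume_bytes, node_mem_bytes, procs_per_node):
    """Smallest domain count whose per-process footprint fits in memory."""
    mem_per_proc = node_mem_bytes / procs_per_node
    domains = 1
    while True:
        sub = volume_bytes / domains
        # The mpunode renderer dominates: it holds a copy of the
        # sub-domain plus a 1/32-sized volume for lighting.
        renderer_footprint = sub + sub / 32
        if renderer_footprint < mem_per_proc:
            return domains
        domains += 1

volume = 7150 * 7150 * 7369          # ~351 GiB of byte data
domains = min_domains(volume, 8 * GIB, procs_per_node=4)
```

With four processes sharing an 8 GiB node, each process gets 2 GiB, and the estimate lands well below the 255 domains actually required, which is consistent with the unmodelled overheads (file reader buffers, scene graph, framebuffer) consuming the rest.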
Table 1 shows statistics obtained when rendering the volume dataset for image sizes of 512² and 1024². The number of Domains into which the data are divided is the minimum required to volume render this dataset; using fewer sub-domains (hence dropping down to a smaller queue size on HECToR) resulted in the mpunode rendering processes running out of memory. The Procs per node is the number of processes running on a quad-core node, with 8 GB of memory divided among those processes. Increasing the number of processes per node is desirable, because it increases the intra-node communication between the processes, reducing communication over the interconnect between nodes; but doing so reduces the amount of memory available to each process, and hence the size of each dataset sub-domain, so more processes are used. The GB per Domain is the size of each sub-domain once the original dataset has been distributed to the backend pstnode processes; we always require GB per Domain < 8 GB/Procs per node. The times given are for operations performed by the express process on the login node. The Build+Distrib operation is the construction of the scene graph and its distribution to all rendering processes. The Render time is the total time from sending out the scene graph to receiving a composited image back from the render processes; the render processes will have rendered their data during this operation, so it is dependent on the AVS parallel volume render module. The Total time is the sum of the two times. The Frames per second figure is the best rate that can be achieved when rendering the data. While the frame rate for a 1024² image peaks at 1.3 fps, owing to variations in the rendering, compositing and communication times, for the smaller 512² image size there is a 31 per cent speed-up when the number of processors is approximately doubled.
Despite the low frame rate, as stated earlier, users are able to manipulate the visualization interactively.
5. Conclusions
This project has ported a commercial visualization code to the Cray XT4 architecture provided by the HECToR service. Using this platform, we have visualized CT data acquired on various X-ray CT systems using up to 511 cores (255 for parallel module processing, 255 for parallel rendering and 1 for the MPI proxy). The system allows the user to interact with the visualization and is sufficient for interactive manipulation. The use of the MPI forwarding layer and proxy has eliminated the need for a significant redevelopment of the communication code within the visualization application, allowing the communication between the parallel module and rendering processes to remain within the vendor MPI domain. There are, however, some issues that need to be considered.
Our solution requires that the datasets be locally accessible to the Cray XT4; if they have not been created on the Cray directly, they must first be transferred to the system. This clearly goes against the ‘computations to the data’ rule given in §2. While our use cases from materials science cannot satisfy this rule, porting a general visualization system means that simulation codes that generate large volumes of data can visualize those data in place, without transferring them to another system.
The Cray system uses a parallel file system that enables MPI processes to read sections of the same file in parallel, but even so it took a few minutes to read the 255 slices from the 351 GB raw dataset and then synchronize the MPI processes before interactive visualization could begin. This can make interactive applications awkward to use when exploring several large datasets during a single visualization session.
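Such a parallel read typically uses a block decomposition, with each rank reading a contiguous run of slices from the shared file. The sketch below (illustrative only; the decomposition actually used by the application is not specified here) computes each rank's slice range, spreading any remainder over the low-numbered ranks:

```python
def slice_range(n_slices, n_procs, rank):
    """Half-open [start, end) range of slices assigned to one MPI rank,
    balancing the remainder over the low ranks."""
    base, rem = divmod(n_slices, n_procs)
    start = rank * base + min(rank, rem)
    count = base + (1 if rank < rem else 0)
    return start, start + count

# With 255 slices over 255 processes, each rank reads exactly one slice
print(slice_range(255, 255, 0))  # -> (0, 1)
```

Each rank then seeks to `start * slice_bytes` in the shared file and reads its block, which is the access pattern parallel file systems are designed to serve efficiently.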
Interactivity is a further issue because the Cray XT4 uses a batch system for running jobs, so an interactive job may not be scheduled at a time convenient for the user. The system was not specifically designed for interactive use, but interactivity allows data exploration to occur and decisions to be made while the job is running. If supercomputer systems are to support interactive visualization, their security policies should allow the use of VNC software or more direct connections to running processes (as discussed in §3), avoiding the need to display user interfaces remotely over SSH connections. This would increase rendering frame rates and hence the interactivity experienced by the user.
This paper does not consider GPU-based solutions as they are unavailable on the Cray architecture and on national service machines at present. The availability of such hardware would improve rendering times but would not require any change to the application's structure: it would simply use the GPU via the OpenGL library, and the MPI proxy would still be required. Although there are disadvantages to running this system on HECToR, the advantages are clear. This may be the only way that a researcher can explore the complete dataset in near real time without downsampling or additional pre-processing stages. Verification and more precise visual quantitative analysis can now take place on the complete dataset rather than on a subset of the data. By providing a general visualization package, there is no need to build custom visualization techniques into simulation code.
Scientific visualization is not in itself a large e-Science project, but is often a component within one of these projects. With respect to the five informal rules, the key ones of data management and maintaining local computation have been considered, but there are still significant local data movements that go against the rules. The system works and has been shown to scale to a level that is required by the users.
Currently, remote visualization users do not appear to require the next level of peta-scale computing: no visualization systems are being proposed for the petabyte community without some form of prior data-mining operation, for example. This may indicate an upper bound on the scale of a scientific visualization system, but the trend towards adding high-performance parallel computation to the visualization network means that closer integration is inevitable and future demands on visualization systems will grow.
We acknowledge the funding and access to HECToR provided by NAG and EPSRC through their dCSE scheme. We also wish to thank AVS Inc. for their support and Dr Phil Manning for access to data.
One contribution of 12 to a Theme Issue ‘e-Science: novel research, new science and enduring impact’.
- This journal is © 2011 The Royal Society