## Abstract

The tuning of parameters in climate models is essential to provide reliable long-term forecasts of Earth system behaviour. We apply a multi-objective optimization algorithm to the problem of parameter estimation in climate models. This optimization process involves the iterative evaluation of response surface models (RSMs), followed by the execution of multiple Earth system simulations. These computations require an infrastructure that provides high-performance computing for building and searching the RSMs and high-throughput computing for the concurrent evaluation of a large number of models. Grid computing technology is therefore essential to make this algorithm practical for members of the GENIE project.

## 1. Introduction

The GENIE project (Lenton *et al*. 2007) has developed a component framework for the construction, execution and management of Earth system models (ESMs) capable of simulation over millennial time scales. Constituent models of the Earth system at varying resolutions, complexity and dimensionality can be coupled together to form a hierarchy of climate models. Grid computing technology is a key enabler within the project to aid in flexible coupling of component models, subsequent execution of the resulting ESMs and sharing and management of the data that they generate. In particular, a grid-enabled problem-solving environment is exploited for concurrent execution of multiple instances of GENIE models on computational grids.

At their heart, climate models solve a system of equations, derived from physical laws, which represent the major physical processes of the Earth system and their interactions. The numerical solution of these equations is practical only if approximations are made where the representations of the physical processes on small spatial and temporal scales are simplified or parametrized. Often, the parameters of a model do not relate to real-world physical quantities, so a key stage in climate model development is the selection of values that result in the simulation of a reasonable climatology. However, the nonlinear response of a model to its parameters and the often conflicting attainment goals make this a difficult problem to solve.

Parameter estimation has been the subject of much study within the GENIE project (see Price *et al*. (2006) for further details). Population-based multi-objective design search and optimization algorithms from the engineering domain have proved useful for parameter estimation in GENIE models. In particular, the non-dominated sorting genetic algorithm (NSGA-II) with surrogate modelling has been applied successfully for parameter estimation in a number of models from the GENIE framework (Price *et al*. 2006; Matsumoto *et al*. 2008). This optimization process involves the iterative evaluation of response surface models (RSMs) followed by the execution of multiple Earth system simulations. These computations require an infrastructure that provides high-performance computing for building and searching the RSMs and high-throughput computing for the concurrent evaluation of a large number of models. Grid computing technology is essential to make this algorithm practical for members of the GENIE project.

In this paper, we present the application of the NSGA-II algorithm with surrogate modelling to the tuning of a new model from the GENIE framework. In §2, the new GENIE model is presented and the parameter estimation problem formalized. The multi-objective optimization algorithm is presented in §3 and the use of surrogate modelling is detailed. This method presents two forms of computational challenge requiring both high-performance and high-throughput computing infrastructure to provide a timely result. We present the software environment used to meet these challenges in §4. The results of the optimization of this model are presented in §5 and we draw conclusions in §6.

## 2. Parameter estimation problem

The GENIE-1 ESM consists of a three-dimensional frictional-geostrophic ocean model (GOLDSTEIN) coupled to a two-dimensional energy–moisture balance model and a thermodynamic sea-ice model. The model is computationally efficient, designed for studies that require long time scales of integration.

We use a new version of GENIE-1, which employs an ocean vertical mixing scheme of increased complexity relative to the default included in GOLDSTEIN. In brief, an ocean mixing scheme aims to represent the effect of unresolved turbulence on simulated water mass properties. Vertical (diapycnal) turbulent mixing is thought to play a key role in sustaining or ‘closing’ the thermohaline circulation; studies suggest the diffusion of heat from the near-surface to deep ocean facilitates the upwelling of dense bottom waters against the large-scale stable ocean stratification (Rahmstorf 2003; Wunsch & Ferrari 2004). In terms of physical complexity, a wide spectrum of mixing schemes currently exists; ranging from a simple dependence on concentration gradients, through to explicit consideration of local energetics. A survey of existing literature reveals that the choice of mixing scheme exerts an important control on the resulting large-scale ocean circulation, in particular the thermohaline circulation stability (Marzeion *et al*. 2007). Our implementation of an increased complexity mixing scheme thus forms the basis for a larger study currently in progress, in which we explore the implications of vertical mixing complexity for large-scale ocean circulation stability. Future publications will provide a full description and discussion of both the scientific development of the new mixing scheme and the results of our investigations.

In this paper, we focus on the multi-objective optimization of the final implementation of this scheme as incorporated into GENIE-1. Owing to the fundamental modification to the ocean physics arising from use of the stratification-dependent mixing scheme, we consider it necessary to re-evaluate the standard GOLDSTEIN parameter set. In addition to the principal transport parameters identified in Edwards & Marsh (2005), we include two new parameters related to the stratification-dependent mixing scheme whose values are poorly constrained, but which strongly influence the model climatology. The aim of the tuning exercise is to find the optimal values for these free parameters that produce an equilibrium model end state with the closest fit to equivalent observational data. In order to reach equilibrium, the model is integrated from a uniform initial state for at least 4000 model years to reach a steady end state. On a typical desktop personal computer (PC) (Pentium IV 2.6 GHz), an integration over 4000 model years typically requires approximately 7 hours of central processing unit (CPU) time. The aim of the optimization process is to reduce the root mean square (r.m.s.) errors between model fields *s*_{i} and observational data *S*_{i} across four physical fields (*i*),(2.1)

The four physical datasets to which the model is tuned consist of three-dimensional ocean temperature (OCNTEMP) and salinity fields (OCNSAL) and two-dimensional atmospheric surface air temperature (SURFTEMP) and humidity (SURFHUM) fields. These datasets are an average over approximately five decades (spanning the period 1948–2002) and obtained from the World Ocean Atlas (Levitus *et al*. 1998) and the National Centers for Environmental Prediction (NCEP reanalysis data provided by the National Oceanic and Atmospheric Administration/Oceanic and Atmospheric Research/Earth System Research Laboratory Physical Sciences Division, Boulder, Colorado, USA, from their website at http://www.cdc.noaa.gov/). The observational data are interpolated onto the model grid and weighted by the variance of the data for each field .

## 3. Multi-objective optimization

Many optimization problems consist of competing design/attainment goals, and some form of compromise must be enforced to obtain a result. In ‘traditional’ single objective optimization, such compromise is typically achieved by optimizing a weighted sum of individual design goals. The results of employing such a summation can be highly dependent upon the weightings used. Multi-objective methods avoid this problem by investigating the problem in the objective space and seeking a set of non-dominated solutions, usually referred to as the Pareto front (PF). A Pareto set of design solutions contain members that are all in some sense optimal; in order to improve a single member in one objective, it must get worse in one or more of its other objectives. A solution is said to be ‘non-dominated’ if there is no other set member that exceeds its performance in *all* attainment goals.

### (a) Non-dominated sorting genetic algorithm

Population-based optimization algorithms map well to multi-objective problems because they maintain a family of solutions that evolve over successive generations. A suitable algorithm can seek to generate a population of Pareto optimal solutions in the objective space of the problem. There are many multi-objective algorithms based on evolutionary programming or genetic algorithms (see Zitzler *et al*. 2000) that will perform such multi-objective optimizations. Arguably, the most popular methods from the literature are NSGA-II (Deb *et al*. 2002) and Strength Pareto Evolutionary Algorithm (SPEA2; Zitzler *et al*. 2000). In this work, we use a modified NSGA-II method (Voutchkov & Keane 2006) that has been improved for gains in performance over the standard algorithm by using the genetic algorithm engine in the OptionsMatlab software package. This will be referred to as OptionsNSGA2. The algorithm maintains a population of solutions that ‘evolve’ over generations of the algorithm to generate a number of Pareto optimal model instances. At the end of the process, the modeller selects a subset of the members of the PF for further study. While this method yields high-quality solutions to the estimation problem, it places considerable demands on computational resource. At each generation, multiple evaluations of an ESM must be performed. For such methods to be practical, these evaluations must be executed concurrently. The grid computing software exploited within GENIE enables us to exploit appropriate grid computing resource to perform function evaluations in a timely and reliable fashion.

### (b) Non-dominated sorting genetic algorithm with surrogate modelling

Even with concurrent execution of the GENIE models, the large number of generations *O*(10^{2}) required to reach an optimal solution can make the method prohibitively expensive. For example, a standard application of the OptionsNSGA2 algorithm to a GENIE model would typically require several weeks of continuous computing time to achieve a high-quality result (figure 1*a*). In order to perform the parameter estimation exercise in a timely fashion, we therefore employ the OptionsNSGA2 method with surrogate modelling (figure 1*b*). The use of surrogate models with the OptionsNSGA2 algorithm can reduce, by an order of magnitude, the total number of simulation years required for a high-quality result in the calibration of a GENIE model. In the surrogate modelling approach, one seeks to build a representation of the individual objective functions of the problem using RSM techniques. This provides surrogate models of the underlying problem, which can be extensively searched at significantly less cost than the true expensive functions (Earth system simulations). The main challenge with this method is to refine the surrogate models so that they provide a sufficiently accurate representation of the objective functions, at least in the optimal regions of the parameter space. This is typically achieved through an iterative process in which the surrogate models are updated using the true response of the underlying function. By adding such evaluations to the data cache of existing points, the quality of subsequent RSMs is improved until good agreement between the GENIE model and the RSM is achieved. The resulting members of the PF are taken from the evaluations of the GENIE model and not from the RSM, thus ensuring that the results relate to the true response of the model.

### (c) Kriging

There are many RSMs that could be used to provide a cheap surrogate model for the optimization process. Voutchkov & Keane (2006) discussed some of the merits of different RSM methods, but we limit our discussion here to the Kriging method (Jones *et al*. 1998) that we have chosen to employ. Kriging is a curve-fitting technique that originated in the field of geological surveying. This method has been found to work very well for a wide range of multi-objective problems. However, there is a computational cost to building the Krig models of the underlying functions. The curvature of each Krig is controlled by a set of hyper-parameters that must themselves be tuned (optimized) to provide the best fit of the surface to the sampled data. This is achieved by maximizing a concentrated likelihood function (CLF) over a set of sampled data points. The evaluations of the CLF involve the inversion of a matrix of correlation measures (an *O*(*N*^{3}) operation) and consequently the tuning of the Krig can incur significant computational expense. For the typical scale of a GENIE parameter estimation problem (in this case 12 model parameters), the Krigs are built using in excess of 120 sampled data points, and the hyper-parameter tuning process is expensive. Since a separate Krig is built for each objective function, there is scope for concurrency, and these calculations are therefore performed in parallel on the best resource available.

Figure 2 presents the workflow of the optimization process and highlights the type of resource typically targeted for each stage of the calculation. For the benefit of the reader, we briefly summarize the algorithm, which is embedded in the OptionsNSGA2_RSM procedure of the OptionsNSGA2 software package.

An initial sampling of the parameter space is performed using a formal space filling design of experiments methods. An LP

_{τ}(Statnikov & Matuzov 1995) sequence is typically employed and the expensive simulations are performed using Condor high-throughput computing.The hyper-parameters of the Krigs are then tuned concurrently to provide the best representation of the response of the objective measures of model fitness to data across the parameter space. These calculations are targeted at high-performance computing clusters.

The OptionsNSGA2 algorithm is applied over the cheap surrogate models to perform extensive multi-objective searches over the design space.

The results of these searches are used to select update points to be evaluated on the underlying simulation code. These points are selected so that the Krigs in the following iteration will be built from higher quality data and consequently yield an improvement in the Pareto optimal points. The updates are selected from five sources:

Pareto optimal points in the Kriging representation of the objective space,

points of maximum r.m.s. error in the Krigs,

points of maximum expected improvement in the Krigs,

random points to help escape from local minima, and

points from a secondary OptionsNSGA2 process performed directly on the underlying expensive function.

The update points are evaluated concurrently by running the expensive simulations, again using a Condor pool.

The algorithm iterates by returning to point (b) until convergence.

## 4. Grid and distributed computing infrastructure

The process is made practical using Condor high-throughput computing (Thain *et al*. 2005), which enables *O*(10^{2}) update evaluations of the expensive simulations to be executed concurrently across workstations that would otherwise sit idle. This work was carried out at the University of Southampton condor pool, which is composed of approximately 1100 laboratory workstations and office PCs running various versions of the Windows operating system. The algorithm operates by improving the training data for the Krig hyper-parameters and increasing the accuracy of the stochastic process models over which the searches are performed. The tuning of the hyper-parameters of each Krig is performed concurrently on a high-performance computing cluster. At the end of the process, the Pareto-optimal members of the global data cache are returned for further analysis.

### (a) Geodise compute toolbox

GENIE has adopted software from the Geodise project (Eres *et al*. 2005) to build a distributed collaborative problem-solving environment for the study of ESMs. The Geodise toolboxes integrate compute and data grid functionality into the Matlab and Jython environments familiar to scientists and engineers.

The OptionsNSGA2 algorithm has been programmed in the Matlab problem-solving environment because this is a productive and familiar environment for engineers. The OptionsNSGA2 software package enables users to provide their optimization problem as a simple tunable function. For GENIE models, we expose a GENIE model invocation on the Condor pool as a tunable function pair; a submission function that accepts as input a parameter set chosen by the optimizer and returns a job handle, and a retrieval function that processes the model output upon completion of the simulation and returns the objective function values to the optimizer. The Krig tuning process is similarly managed by enabling the user to write a function pair that submit the tuning binary to a resource, monitor the job handle and retrieve the optimal hyper-parameters upon completion. For this work, we have exploited a licenced code for the evaluation of the hyper-parameter tuning, which is deployed on the Spitfire Microsoft Compute Cluster Server at the University of Southampton. The Globus interface of the UK National Grid Service to this system enables GENIE users from other institutions to exploit this service for the OptionsNSGA2 optimization process.

### (b) Condor DAGMan

The GENIE model used for this study needs approximately 5–7 hours of continuous CPU time on the range of resource available in the Condor pool targeted for this work. Furthermore, this particular Condor pool comprises exclusively machines running the Windows operating system and consequently Condor's native checkpointing mechanisms (which are available only for Linux platforms) could not be used. In practice, the execution of a large number of compute jobs, each requiring this duration of runtime, presents a challenge to such a pool. The nature of the resource is that jobs of this length are at higher risk of interruption due to user activity on campus. It is found that throughput degrades considerably as the time requirements of compute jobs increases. With GENIE models, we are however able to use the checkpoint and restart capabilities of the models to overcome these problems.

Condor DAGMan is a meta-scheduler for Condor that will manage the submission of a directed acyclic graph (DAG) of compute tasks. A DAG specifies a workflow that is built from activities that depend upon the prior execution of other activities. For Condor DAGMan, a simple text file is written to define the required jobs and express the explicit dependencies between those jobs. In the case of GENIE, the task graph simply expresses a linear sequence of jobs in which the output (simulation checkpoint files) from one task is passed as input to the next task. The Condor DAGMan tool then manages the submission of those tasks to the Condor pool, providing some fault tolerance through job recovery mechanisms. The simulations are divided into 10 tasks and with the individual tasks then requiring less than 1 hour of CPU time each, the execution of a large number of long simulations is reliably completed.

## 5. Results

The OptionsNSGA2 method with surrogate modelling has been applied to the parameter estimation problem for the new ocean mixing scheme. A total of 12 model parameters were tuned using a fixed budget of eight update iterations of the algorithm. The final PF consists of 48 non-dominated solutions. This highlights a key advantage of the multi-objective method; the domain expert is able to select from a range of solutions, depending on the requirements of the study. To illustrate this point from a climate modelling perspective, we show (figure 3) streamfunctions in the meridional–vertical plane, corresponding to the Atlantic meridional overturning circulation (AMOC), an important proxy for the state of the ocean circulation. The streamfunctions are obtained as top-to-bottom cumulative integrals of the zonally averaged meridional volume flux at each depth level in the ocean. The two panels in figure 3 are for two different members of the PF: member A and member B. The maximum of the streamfunction, a measure of AMOC strength, differs significantly between the two members: 26 Sv (A) versus 14 Sv (B). For a study requiring a single solution for the simulation of the AMOC in accordance with current scientific consensus (Lumpkin *et al*. 2008), the rejection of member A for exhibiting an excessively vigorous circulation might be appropriate. Alternatively, a statistically orientated study wishing to examine multiple optimal initial AMOC states might include members A and B, and others from the PF.

Selection of a parameter set from the PF is also possible by considering a weighted measure of the four single objectives. To demonstrate the improvement in objectives compared with the default parameter set, we weight each of the single objectives by the number of residuals that constitute that objective, and choose the PF member that provides the lowest combined weighting. Figure 4*a*–*c* presents a convincing reduction in the ocean temperature residuals when compared with the default point in parameter space. For comparison, the member of the PF with the worst fit to the temperature data is also shown. These results are a small part of an extensive study of our stratification-dependent mixing scheme; a complete scientific analysis will be the subject of future publications.

## 6. Conclusions

Parameter estimation is an important step in climate model development, especially in highly parametrized models such as GENIE. The multi-objective OptionsNSGA2 method is found to be successful for this class of problem, but is computationally demanding when the underlying objective function evaluations (GENIE simulations) are expensive. For timely results to be generated, the OptionsNSGA2_RSM method is employed. Appropriate computing platforms are required to execute the iterative process of building surrogate models and updating those models using the true model response. This algorithm benefits from both high-performance computing to build the surrogate models and high-throughput computing to evaluate the true behaviour of the model. Within GENIE, we target appropriate resource using the Geodise interfaces to Globus v. 2.4 and Condor. The RSMs are built using the Options optimization package on nodes of the UK National Grid Service to which (freely available academic) licence codes have been assigned. Condor DAGMan is used to progress the large number of simulations to completion using checkpoints and restarts. The OptionsNSGA2_RSM software only requires the GENIE scientist to wrap their model as a pair of Matlab functions accepting as input the tunable parameters and returning, after remote execution of the simulation on a specified resource, the multiple measures of fitness to data. The time taken from a ‘decision to study’ to an end result is, in practice, no more than 1 week. In this regard, the method is very competitive with alternative approaches that often require expert knowledge to execute. We have presented the results of this method applied to a new model in which a more complex ocean mixing scheme is introduced.

## Acknowledgements

The GENIE and GENIE*fy* projects are funded by the Natural Environment Research Council (NER/T/S/2002/00217 & NE/C515904). The Options RSM technology is supplied by dezineforce. Development of the OptionsNSGA2 software has been sponsored by Rolls Royce under the VIVACE project. The UK National Grid Service, the University of Southampton Condor pool and Spitfire Microsoft Compute Cluster Server at the University of Southampton have been used in this work.

## Footnotes

One contribution of 16 to a Theme Issue ‘Crossing boundaries: computational science, e-Science and global e-Infrastructure II. Selected papers from the UK e-Science All Hands Meeting 2008’.

- © 2009 The Royal Society