Perturbed physics experiments are among the most comprehensive ways to address uncertainty in climate change forecasts. In these experiments, parameters and parametrizations in atmosphere–ocean general circulation models are perturbed across ranges of uncertainty, and results are compared with observations. In this paper, we describe the largest perturbed physics climate experiment conducted to date, the British Broadcasting Corporation (BBC) climate change experiment, in which the physics of the atmosphere and ocean are changed, and run in conjunction with a forcing ensemble designed to represent uncertainty in past and future forcings, under the A1B Special Report on Emissions Scenarios (SRES) climate change scenario.
Predictions made with climate models are widely and increasingly used in policy making (Schellnhuber et al. 2006). Since forecasts are of limited use without an associated uncertainty range, climate modellers are usually encouraged to attempt to quantify the uncertainties that arise from various sources, including past and future forcing uncertainty, initial condition uncertainty and both parametric and structural error (Palmer & Hagedorn 2006). The most comprehensive way to do this is with a perturbed physics ensemble, in which uncertain parameters and parametrizations in climate models are perturbed (Allen 1999; Allen & Stainforth 2002).
Many of the techniques used in shorter range forecasting to select perturbations are not directly applicable to the climate problem; for example, we have no way of defining how ‘close’ two models are in terms of their formulation (Lea et al. 2000; Smith 2002). We therefore cannot know in advance which perturbations will give rise to a representative spread of model uncertainty, and large ensembles that sample parameter space more comprehensively are needed to quantify model error.
Until recently, large ensembles for the analysis of model error have been run with simple or intermediate-complexity models to save on computer processing time (e.g. Forest et al. 2000), or have been generated as ‘pseudo-ensembles’ by rescaling the output of a single atmosphere–ocean general circulation model (AOGCM; Allen et al. 2000; Stott & Kettleborough 2002). Both approaches are limited, however, to predicting uncertainty ranges on just a few global climate variables. In order to quantify uncertainty in regional forecasts, we need a large perturbed physics ensemble of fully coupled AOGCMs.
In the last few years, two perturbed physics experiments based on very similar models have been undertaken: the ‘Quantifying Uncertainty in Model Predictions’ (QUMP) project and the climateprediction.net (CPDN) project (Allen 1999). Both projects have attempted to explore and, where possible, quantify the uncertainty in climate predictions resulting from uncertainty in physical parameters and parametrizations. Both projects began by focusing on the equilibrium response to a doubling of CO2 in an atmosphere model coupled to a slab ocean (Murphy et al. 2004; Stainforth et al. 2005). More recently, QUMP has produced results exploring the transient response in an ensemble of 17 perturbed atmosphere general circulation models (GCMs), coupled to a single ocean (Collins et al. 2006).
Promisingly for the future of such perturbed physics ensembles, Collins et al. (2006) found that their transient climate response simulation (forced by a CO2 increase of 1% per year) almost spanned the range of responses in global mean temperature found in the Intergovernmental Panel on Climate Change (IPCC) AR4 multi-model transient ensemble, even with only one ocean. The inclusion of more oceans provides a means of spanning a wider and more representative range of uncertainties, which is necessary in order to develop a more comprehensive understanding of climate change. In addition, CPDN has conducted a forcing ensemble designed to explore uncertainty in past (and future) forcing. This is an important and often unrecognized source of uncertainty in climate change projections. Recent papers by Kiehl (2007) and Brohan (in preparation) have shown that the Coupled Model Intercomparison Project (CMIP) models undersample uncertainty in historical forcing: across the CMIP ensemble, the climate sensitivity of a model is too well correlated with the level of twentieth century aerosol forcing used in that model. By exploring a range of plausible historical forcings, we can attempt to quantify the impact of this uncertainty on projections of twenty-first century climate change.
The CPDN project addresses these issues by running a large perturbed forcing, perturbed physics and perturbed initial condition ensemble of AOGCMs simulating climate change for the period 1920–2080. By weighting models according to how well they fit the observational record, CPDN hopes to provide a broad and reasonably comprehensive uncertainty estimate of climate change over the first half of this century.
The CPDN project has two unique and innovative aspects. The first is the ability to explicitly resolve more of parameter space than smaller perturbed physics ensembles. Although there are putative methods for emulating the behaviour of perturbed physics GCMs at parameter settings that have not been sampled, the only way to test these methods comprehensively is to run the actual model. Stainforth et al. (2005) showed that, although reasonably simple methods for making inferences about the effects of combining physics perturbations in GCMs hold in many cases, they fail quite badly about a quarter of the time, especially where quite large perturbations are being made. Knight et al. (2007) demonstrate that the effects of parameter, hardware and software variation are detectable, complex and interacting. However, most of the effects of parameter variation are due to a small subset of parameters. Notably, the entrainment coefficient in clouds is associated with 30 per cent of the variation in climate sensitivity (CS) seen, although both low and high values can give high CS. The effects of hardware and software are small relative to the effect of parameter variation and, over the wide range of systems tested, may be treated as equivalent to those caused by changes in initial conditions. Preliminary analysis of duplicate runs in the BBC CCE echoes these findings. Recent work has aimed at infilling the space between parameter settings by training neural networks on the slab ensemble (Sanderson et al. 2008). These methods seem to work reasonably well in well-sampled regions of parameter space, where there are many models to learn from but, rather unsurprisingly, work less well in more sparsely sampled regions. The second innovative aspect of CPDN is the ability to investigate the responses of a perturbed physics ensemble to an array of forcing ensembles.
CPDN thus provides a unique tool with which to learn about the combined effects of forcing uncertainty and climate response uncertainty. As recent research has suggested that even quite aggregate aspects of the physical climate response can behave differently depending on the forcing (Joshi & Gregory in press), this is an important contribution that CPDN can make to our understanding of twentieth and twenty-first century climate.
A sample of preliminary results taken from the control ensemble of the British Broadcasting Corporation climate change experiment (BBC CCE) is presented later to illustrate the spread of physical responses to control forcings. A fuller analysis of results from the transient ensemble will be presented in a separate paper.
2. Models and experiment set-up
The first experiment to be launched under the CPDN distributed computing infrastructure used the UK Met Office Hadley Centre ‘slab’ model, HadSM3, a global atmosphere model coupled to a simplified single-layer (slab) ocean of constant depth, which represents the thermodynamic properties of the oceanic mixed layer. The coupling period between atmosphere and ocean is 1 day. The atmosphere is based on Pope et al. (2000) and has 19 levels in the vertical, with a horizontal resolution of 2.5° latitude by 3.75° longitude. The atmosphere component equations are a quasi-hydrostatic version of the primitive equations with full representation of the Coriolis force (Cullen 1993). The prognostic variables are surface pressure, horizontal wind components, liquid water potential temperature and total water mixing ratio. A dynamical integration time step of 30 min was used, with physics parametrization run every 3 hours. The dynamical equations are formulated to conserve mass, energy, momentum, angular momentum and total water. Parametrization processes include convection (Gregory & Rowntree 1990), radiation (Edwards & Slingo 1996), large-scale cloud (Smith 1990; Gregory & Morris 1996) and precipitation processes (Senior & Mitchell 1993; Gregory 1995), gravity wave drag (Gregory et al. 1998), boundary-layer physics (Smith 1990) and land-surface processes (Cox et al. 1999).
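As a rough aid to reading the numbers above, the following sketch works out the grid size and the number of time steps per coupling period implied by the quoted resolution and step lengths. The assumption that the latitude rows include both poles is ours (the text gives only the grid spacing), and all variable names are illustrative.

```python
# Values quoted in the text above.
DLAT_DEG = 2.5
DLON_DEG = 3.75
DYNAMICS_STEP_MIN = 30
PHYSICS_STEP_HOURS = 3
COUPLING_PERIOD_HOURS = 24

# Assumption: latitude rows include both poles (hence the +1).
n_lat = int(round(180.0 / DLAT_DEG)) + 1
n_lon = int(round(360.0 / DLON_DEG))
n_columns = n_lat * n_lon

# Number of dynamics steps and physics calls between atmosphere-ocean couplings.
dynamics_steps_per_coupling = COUPLING_PERIOD_HOURS * 60 // DYNAMICS_STEP_MIN
physics_calls_per_coupling = COUPLING_PERIOD_HOURS // PHYSICS_STEP_HOURS
dynamics_steps_per_physics = PHYSICS_STEP_HOURS * 60 // DYNAMICS_STEP_MIN

print(n_lat, n_lon, n_columns)
print(dynamics_steps_per_coupling, physics_calls_per_coupling,
      dynamics_steps_per_physics)
```

Under these assumptions, each daily coupling interval contains 48 dynamical steps and 8 physics calls on a 73 by 96 grid.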
The single-layer ocean component (Williams et al. 1999) has the same horizontal resolution as the atmosphere. A heat flux convergence field, representing the convergence of heat from neighbouring oceanic grid points, is applied to make up for the lack of ocean dynamics and to ensure a realistic sea surface temperature (SST) distribution. The ocean model is integrated with a time step of 1 day. The sea ice component is a zero-layer thermodynamic model on the ocean grid with a formulation to account for advection (Crossley & Roberts 1995).
The second experiment to be launched under CPDN used the same slab model, but was extended to include an interactive sulphur cycle. Details of the model set-up in this experiment can be found in Ackerley et al. (submitted b).
HadCM3L (Jones & Palmer 1998) has the same atmosphere component and coupling period as HadSM3 but, instead of a slab ocean, it is coupled to a three-dimensional dynamic ocean component. The ocean is based on the primitive equations model of Cox (1984), with the same horizontal resolution as the atmosphere (2.5°×3.75°) and 20 levels in the vertical. Level thicknesses range from 10 m near the surface, in order to resolve the mixed layer, to 616 m near the bottom of the ocean. The mixed layer uses an energy balance model (Kraus & Turner 1967), and negative surface buoyancy fluxes are convectively mixed down to a neutrally buoyant level. Wind stress contributes to turbulent kinetic energy (wind mixing energy), which is assumed to decay exponentially with depth. Vertical diffusion is performed for both tracers and momentum with a simplified Large et al. (1994) scheme in the mixed layer and, below this, a scheme based on Pacanowski & Philander (1981). A latitude-dependent horizontal diffusion is also applied. The eddy parametrization of Gent & McWilliams (1990) is included. A mixing parametrization is applied to simulate outflow from the Mediterranean Sea (scaled to simulate a throughflow of 1 Sv, where 1 Sv = 10^6 m^3 s^-1) because there is no direct connection between the Mediterranean and the Atlantic in the model's topography. A simplified sill overflow parametrization is also included for the Greenland–Scotland ridge (Gerdes et al. 1991; Roether et al. 1994). Shortwave solar radiation is selectively absorbed with depth using a double exponential decay; attenuation coefficients are given by Paulson & Simpson (1977). The sea ice component is the same as that for HadSM3, except that the top-level ocean current is used to advect sea ice.
In addition, the BBC CCE version of HadCM3L includes a modification to the ocean bathymetry: Iceland was removed and the Denmark Straits deepened, as suggested by Jones (2003). This improves the northward transport of heat in this coarse-resolution ocean. The BBC CCE also includes the interactive sulphur cycle described by Ackerley et al. (submitted b).
3. BBC CCE
The BBC CCE uses the CPDN distributed computing infrastructure described in Christensen et al. (2005). The two principal differences between the slab (Stainforth et al. 2005) and BBC CCE experiments are that the BBC CCE uses the model with fully dynamical ocean, HadCM3L, described earlier, and that it is run under transient forcing—historical forcings for the period 1920–2000 are applied, together with an ensemble of possible future forcings for 2000–2080. Control simulations corresponding to an unforced or stationary climate in the year 1920 are also run, so that each physically distinct model in the experiment can be checked for spurious model drifts.
The slab model experiments are made up of three phases, as described in Stainforth et al. (2005), and illustrated in figure 1. During a 15 year calibration phase, SSTs are relaxed to prescribed climatological values and a heat flux convergence field (Williams et al. 1999) is derived from a mean of the final 8 years of the diagnosed heat flux into the mixed layer. Two 15 year phases are then started from the end of the calibration phase: a control phase and a double CO2 (2×CO2) phase. In both of these phases, SSTs are allowed to vary freely, but heat into the ocean is constrained by the heat flux convergence field derived from the calibration phase. In the double CO2 phase, an instantaneous doubling of CO2 is also applied.
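The derivation of the heat flux convergence field in the calibration phase can be sketched as a simple time average. This is a minimal illustration only: it assumes one annual-mean flux field per calibration year with a hypothetical array layout (year, latitude, longitude), whereas the real diagnostic is likely to retain a seasonal cycle. The function name and the synthetic data are ours.

```python
import numpy as np

CALIBRATION_YEARS = 15   # length of the calibration phase (from the text)
AVERAGING_YEARS = 8      # final years used for the mean (from the text)


def heat_flux_convergence(diagnosed_flux):
    """Mean of the final 8 years of the diagnosed mixed-layer heat flux.

    diagnosed_flux: array of shape (15, n_lat, n_lon), one field per
    calibration year (hypothetical layout, annual means assumed).
    """
    flux = np.asarray(diagnosed_flux, dtype=float)
    if flux.shape[0] != CALIBRATION_YEARS:
        raise ValueError("expected one field per calibration year")
    return flux[-AVERAGING_YEARS:].mean(axis=0)


# Synthetic stand-in for the diagnosed flux (W m^-2), for illustration only.
rng = np.random.default_rng(0)
synthetic_flux = rng.normal(0.0, 5.0, size=(CALIBRATION_YEARS, 73, 96))
convergence = heat_flux_convergence(synthetic_flux)
print(convergence.shape)
```

The resulting field is then held fixed in both the control and the 2×CO2 phases, as described above.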
The coupled model control experiments were initialized using the spin-up method described in Jones & Palmer (1998). This method includes a long (greater than 1000 years) ocean-only phase, in order to speed up the equilibration of the ocean component, before a 50 year coupled atmosphere–ocean phase, in which SSTs and sea surface salinities (SSSs) are relaxed back to climatology (Levitus & Boyer 1994; Levitus et al. 1995); this process is known as ‘Haney forcing’. The control experiments are then run with flux adjustment fields for heat and fresh water computed from the final 10 years of the Haney phase. The greenhouse gases, solar constant, background volcanic aerosol, ozone and sulphur emissions were all set at levels appropriate to pre-industrial 1860 conditions. The double CO2 experiments were run under the same conditions, but with an instantaneous doubling of CO2 applied.
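The relationship between Haney forcing and the resulting flux adjustment can be illustrated with a short sketch. It assumes the usual linear relaxation form, in which the restoring flux is proportional to the difference between climatological and model SST; the relaxation coefficient, the array layout and the function names are hypothetical and are not taken from the model description.

```python
import numpy as np

RELAX_COEFF = 40.0      # W m^-2 K^-1, hypothetical relaxation strength
HANEY_YEARS = 50        # length of the coupled Haney phase (from the text)
ADJUSTMENT_YEARS = 10   # final years averaged for the adjustment (from the text)


def haney_flux(sst_model, sst_clim, coeff=RELAX_COEFF):
    """Restoring heat flux that pulls model SST back towards climatology."""
    return coeff * (np.asarray(sst_clim, dtype=float)
                    - np.asarray(sst_model, dtype=float))


def flux_adjustment(annual_haney_flux, n_years=ADJUSTMENT_YEARS):
    """Time mean of the restoring flux over the final years of the phase."""
    return np.asarray(annual_haney_flux, dtype=float)[-n_years:].mean(axis=0)


# Toy example: a model that sits a constant 0.5 K warmer than climatology.
sst_clim = np.full((HANEY_YEARS, 2, 3), 290.0)
sst_model = sst_clim + 0.5
adjustment = flux_adjustment(haney_flux(sst_model, sst_clim))
print(adjustment)
```

In this toy case the adjustment is a uniform cooling flux, which is what would be needed to hold the warm-biased model at climatology once the relaxation is switched off.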
For the BBC CCE, it was decided not to use the Jones & Palmer (1998) ocean-only phase, since it was found that the main effect of this phase was a change in ocean state with no notable improvement of the surface climate. Instead, an extended coupled Haney-forced phase, of up to 200 years, was used, and only those models that appeared most stable were subsequently selected for running the climate change experiments. In addition, in order to reduce the length of the experiments, inputs to the model were all set at levels appropriate to 1900 conditions instead of 1860 conditions. The Haney forcing SSTs used were an average of years 1871–1900 of the HadISST1 dataset (Rayner et al. 2003).
Transient simulations pose a particular challenge to perturbed physics experiments: in general, when atmospheric and ocean GCMs are coupled together, the models are out of balance and drift in (for example) temperature and salinity. This usually necessitates adjustments to the fluxes between the atmosphere and ocean to maintain a stable, reasonably representative model climate. Situations in which models do not require flux adjustments are desirable, since flux adjustments can spuriously suppress modes of variability, and most modern climate models do not require such brute force ways of maintaining balance. Unfortunately, however, such un-flux-adjusted combinations are comparatively elusive in perturbed physics ensembles, since somewhat randomly combining atmospheres and oceans does not generally lead to a balanced model: changing the physics in climate models has implications for the ‘new’ model's energy balance. Collins et al. (2006) found that the use of flux adjustments to deal with the general problem of energy balance was reasonable at large scales and, in the BBC CCE, we have developed a novel way of maintaining energy balance while limiting the number of full model spin-ups we undertake.
In total, 10 different oceans and 155 different atmospheres have been explored, giving a total of 1550 physically distinct atmosphere–ocean systems. Since the ocean evolves on much longer time scales than the atmosphere, a large proportion of the computing cost of these simulations is that required to spin up the ocean into a realistic state (of the order of hundreds of model years). Even under such a powerful computing infrastructure as CPDN, it is computationally prohibitively expensive to spin up anywhere near 1550 models. In designing the experiment, therefore, an alternative method was sought, and found to be successful—it involves the flux ‘readjustment’ method described in detail in Faull (2005) and Faull et al. (in preparation).
In this method, the atmosphere–ocean fluxes associated with each perturbed atmosphere in the slab experiment were taken and applied when that atmosphere was coupled to the fully dynamical, but not spun up, ocean. In cases involving the standard ocean component, this technique led to a climate response acceptably similar to that resulting from a model in which the same perturbed atmosphere was coupled to a fully spun-up ocean. In Faull (2005) and Faull et al. (in preparation), this linearity assumption was extended to account for physics perturbations made to the ocean component of the coupled model. This was carried out by applying a correction to the surface fluxes in the same way as we do for conventional flux adjustments that account for perturbations to the atmospheric physics. The additional flux adjustment accounting for ocean perturbations is referred to as a flux readjustment in Faull (2005) and Faull et al. (in preparation) and in the rest of this paper. The evidence suggests that the flux readjustment by and large works; although there are unrealistic aspects to the readjusted climatology, we generally find the readjustment of heat and fresh-water fluxes gives rise to a relatively stable base climate. Under transient forcing, the flux readjustment technique reproduces a climate response closer to that of the same model version with a full ocean spin-up than to that of the unperturbed model (Faull et al. in preparation). It should be noted that the linearity assumption only holds for physics perturbations to the fast components of the coupled model, such as the atmosphere and land-surface scheme, since their adjustment time is short; by contrast, the adjustment time scale of a physics perturbation to the ocean is too long for such a linear correction to be useful.
(a) Forcings in the BBC CCE
The BBC CCE used four datasets to obtain a range of plausible solar forcings (Hoyt & Schatten 1993; Lean et al. 1995; Solanki & Krivova 2003; M. Lockwood 2005, personal communication). There is a reasonable amount of variation among these datasets, which are all based on observations. In case all of these substantially underestimate the actual trend in solar index (in which case it could be argued that a large part of the observed warming in the second half of the twentieth century might be caused by the Sun), we have arbitrarily created a fifth dataset by doubling the trend in solar index in the Lean, Beer and Bradley dataset. Since future solar forcing is highly uncertain, the BBC CCE created three scenarios: in the first, the solar index carries on increasing at the same rate it has increased over the past 80 years; in the second, it decreases at the same rate; and in the third, it shows no significant trend either way. It seems reasonable to assume that reality will lie somewhere in between these cases.
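One way to picture the construction of the doubled-trend dataset and the three future scenarios is the following sketch. It is an assumption-laden illustration: the trend is taken to be a least-squares linear fit, the future scenarios are anchored to the last historical value, and the solar index series is synthetic. None of these choices is stated in the text, and all function names are ours.

```python
import numpy as np


def linear_slope(years, values):
    """Least-squares linear trend (index units per year)."""
    slope, _intercept = np.polyfit(years, values, 1)
    return slope


def double_trend(years, values):
    """Add the fitted trend once more, so that the overall trend doubles."""
    return values + linear_slope(years, values) * (years - years[0])


def future_scenarios(years, values, future_years):
    """Three futures: trend continues, trend reverses, no trend."""
    slope = linear_slope(years, values)
    elapsed = future_years - years[-1]
    last = values[-1]
    return {
        "increasing": last + slope * elapsed,
        "decreasing": last - slope * elapsed,
        "no_trend": np.full(future_years.shape, last, dtype=float),
    }


# Synthetic solar index: weak upward trend plus an 11 year cycle (illustrative).
past_years = np.arange(1920, 2001)
solar_index = (1365.0 + 0.01 * (past_years - 1920)
               + 0.5 * np.sin(2.0 * np.pi * (past_years - 1920) / 11.0))

doubled = double_trend(past_years, solar_index)
scenarios = future_scenarios(past_years, solar_index, np.arange(2001, 2081))
print(linear_slope(past_years, solar_index), linear_slope(past_years, doubled))
```

Because a least-squares fit is linear in the data, adding the fitted trend back onto the series doubles the fitted slope exactly while leaving the shorter-period variability unchanged.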
There is also considerable uncertainty in observations of past volcanic emissions, particularly in the pre-satellite era. For the past 80 years, we have created five datasets based on the Sato et al. (1993) and Ammann et al. (2003) observations of volcanic aerosol in the stratosphere. These data are divided into four latitude bands of equal area: 90° S to 30° S, 30° S to the equator, the equator to 30° N and 30° N to 90° N. For the future, we have created 10 possible scenarios, since we have, of course, no idea which volcanoes may erupt or where. One scenario simply repeats the recent past according to the Sato et al. (1993) and Ammann et al. (2003) datasets. Two more use observations of the preceding 80 years from the Sato et al. (1993) datasets. The remaining seven are subsets of observations of 1400–1960, taken from a dataset constructed by Crowley (1996). In each case, it was ensured that, as observed, there were no major eruptions in 2000–2005.
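That the four latitude bands quoted above have equal area follows from the fact that the area of a zonal band on a sphere is proportional to the difference of the sines of its bounding latitudes. A short check, with illustrative names:

```python
import math

# Band edges in degrees latitude, south to north (from the text).
BAND_EDGES_DEG = [-90.0, -30.0, 0.0, 30.0, 90.0]


def band_area_fraction(lat_south_deg, lat_north_deg):
    """Fraction of the sphere's surface lying between two latitudes."""
    return 0.5 * (math.sin(math.radians(lat_north_deg))
                  - math.sin(math.radians(lat_south_deg)))


fractions = [
    band_area_fraction(south, north)
    for south, north in zip(BAND_EDGES_DEG[:-1], BAND_EDGES_DEG[1:])
]
print(fractions)
```

Each band covers one quarter of the globe, since sin 30° = 0.5.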
(b) The BBC CCE ensemble
The results of the slab experiment have been used to inform the BBC CCE; atmospheric parameters were not sampled as completely in the BBC CCE as they were in the slab experiment. Instead, atmospheric parameters were chosen by interpolation. A combined root mean square error (RMSE) was derived from three different observation types: surface temperatures; top of atmosphere radiative fluxes; and total precipitation (Sanderson et al. 2008). For each observation, Empirical Orthogonal Functions (EOFs) were taken in the space defined by the area weighted Giorgi regions in the ‘spatial’ dimension and the ensemble in the ‘temporal’ dimension. The EOFs were truncated to account for 95 per cent of the ensemble variance, and the principal components were normalized by HadCM3 variability. Thus, for each model, an observational error in each EOF mode could be defined by taking the difference between the amplitude of the mode and the projection of that mode onto a reanalysis dataset. Each model in the ensemble is associated with three sets of errors—each set comprising the difference between the amplitude of each mode in the truncated EOF set in that model and the reanalysis projection of that mode. These numbers were then used to train a neural network (Sanderson et al. 2008), where the input was the perturbed parameters for each model and the output was the set of observational errors described above. The neural network was also trained to predict the climate sensitivity. Once trained from the ensemble, the neural network was then used to interpolate between the discrete parameter values used in the original experiment and predict the errors in each observation type, along with the model sensitivity. These errors were combined to predict an overall RMSE over the three observation types used. A brute force technique then found the minimum RMSE in each 0.1 K bin of sensitivity, and the parameters associated with these models are those that were employed in the transient ensemble.
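The final selection step, finding the minimum predicted RMSE in each 0.1 K bin of sensitivity, can be sketched as follows. The arrays of predicted sensitivity and RMSE stand in for the output of the trained neural network, which is not reproduced here; the function name and the example values are hypothetical.

```python
import numpy as np

BIN_WIDTH_K = 0.1   # width of the sensitivity bins (from the text)


def best_per_sensitivity_bin(sensitivity, rmse, bin_width=BIN_WIDTH_K):
    """Return {bin index: index of the candidate with the lowest RMSE}.

    Bin k covers sensitivities in [k * bin_width, (k + 1) * bin_width).
    """
    sensitivity = np.asarray(sensitivity, dtype=float)
    rmse = np.asarray(rmse, dtype=float)
    # The small offset guards against floating-point error at bin edges.
    bin_index = np.floor(sensitivity / bin_width + 1e-9).astype(int)
    best = {}
    for candidate, k in enumerate(bin_index):
        k = int(k)
        if k not in best or rmse[candidate] < rmse[best[k]]:
            best[k] = candidate
    return dict(sorted(best.items()))


# Hypothetical emulator predictions for five candidate parameter sets.
predicted_sensitivity = [2.01, 2.05, 2.12, 2.18, 3.40]   # K
predicted_rmse = [1.2, 0.9, 1.5, 1.1, 0.7]

selected = best_per_sensitivity_bin(predicted_sensitivity, predicted_rmse)
print(selected)
```

Selecting one candidate per bin, rather than the globally best candidates, keeps the full range of sensitivities represented in the transient ensemble.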
The second CPDN experiment, the sulphur cycle experiment (Ackerley et al. submitted b), extended the first CPDN experiment by including an interactive sulphur cycle (Ackerley et al. submitted a). The BBC CCE included a range of perturbations to the sulphur cycle, which are discussed at some length in Ackerley et al. (submitted a). As with the other parts of the perturbed physics ensemble, the sulphur cycle parameters were perturbed to a maximum, a minimum and a best guess value (each suggested by D. Roberts 2004, personal communication), a strategy intended to promote a large range of plausible model responses. The parameters given in table 1 were also perturbed in combination, as well as individually, to sample as much parameter space as possible and to identify whether perturbations interacted nonlinearly. The total number of perturbations to the sulphur cycle scheme itself was 81, including the unperturbed model.
In addition, the base dynamical ocean was perturbed in an attempt to span a range of heat uptake responses. The parameters sampled are listed in table 1. These were chosen following elicitations from Hadley Centre oceanographers and are detailed in Brierley (2007).
The 10 different physical oceans were each spun up for 200 years, using forcings that correspond to the climate in ca 1920. The spin-ups were completed in house, and the oceans then coupled to perturbed physics atmospheres, using the flux adjustment files from each slab run of the perturbed atmosphere, plus the flux readjustment calculated via the procedure outlined in Faull et al. (in preparation). The perturbations in the BBC CCE are listed in table 1. See http://www.climateprediction.net/science/parameters.php for further details. The coupled models, with the flux adjustment and readjustments, were then run under transient (1920–2080) and control (160 year) forcings. Some simple results for the control climates are presented below. The transient responses are to be discussed in a follow-up paper.
Figure 2 shows a subset of 302 control simulations taken from the BBC CCE, each with a different combination of atmospheric and oceanic physics. Time series of surface air temperature (figure 2a) and a histogram of temperature trends, calculated with 120 years of data from 40 years after the control simulation is started (figure 2b), are shown. Temperature trends calculated from control simulations from 25 members of the IPCC ensemble are shown as crosses. The first thing to note is that the drifts are reasonably small compared with the sorts of transient global warming signals we expect from the twentieth/twenty-first century forcing experiment: of the order of a few tenths of a degree over the 120 years from 1960 to 2080, compared with greenhouse gas (GHG)-induced warmings of approximately 3 or 4°C. Since the CPDN team are conducting the analysis of the transient ensemble as a set of ‘transient minus control’ pairs, these trends are subtracted from the transient ensemble and therefore do not bias the subsequent analysis. It will also be noted that there is a positive bias among the trends in the control runs. There seem to be two main possibilities to account for this, which are not mutually exclusive. The first is that the effort to offset the top of atmosphere imbalance and the surface flux imbalance with flux adjustments plus flux readjustments nearly works, but not quite, since we have not included all the fluxes. On this view, the positive bias may be the result of the decision not to add the precipitation fields to the convergence file (Faull et al. in preparation), thus under-correcting rather than over-correcting the (positive) imbalance. The second is that the 150 year spin-up is not long enough to produce a stable coupled system, so that the convergences are being diagnosed from a Haney-forced phase that has a trend in it (i.e. is not quite in balance), and this returns as a trend in the 160 year coupled run.
A preliminary analysis has not managed to discriminate between these two mechanisms and further analysis is currently underway (Yamazaki et al. in preparation).
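The drift diagnostic and the ‘transient minus control’ pairing described above can be sketched with synthetic series. The trend here is assumed to be an ordinary least-squares fit over the last 120 years of a 160 year run; the drift and forced warming rates are invented for illustration and the function names are ours.

```python
import numpy as np

RUN_YEARS = 160     # length of each control run (from the text)
SKIP_YEARS = 40     # years discarded at the start (from the text)
TREND_YEARS = 120   # years used for the trend (from the text)


def control_drift(annual_mean_temp):
    """Least-squares trend (K per year) over years 40 to 159 of a control run."""
    series = np.asarray(annual_mean_temp, dtype=float)
    window = series[SKIP_YEARS:SKIP_YEARS + TREND_YEARS]
    if window.size != TREND_YEARS:
        raise ValueError("control run is too short")
    slope, _intercept = np.polyfit(np.arange(TREND_YEARS), window, 1)
    return slope


def transient_minus_control(transient, control):
    """Remove the unforced drift shared by a transient-control pair."""
    return np.asarray(transient, dtype=float) - np.asarray(control, dtype=float)


# Synthetic pair: both runs share a drift of 0.3 K per 120 years, and the
# transient run carries an additional (hypothetical) forced warming.
t = np.arange(RUN_YEARS)
drift = (0.3 / 120.0) * t
forced = 0.02 * t
control = 287.0 + drift
transient = 287.0 + drift + forced

print(control_drift(control) * TREND_YEARS)
print(np.allclose(transient_minus_control(transient, control), forced))
```

In this idealized case, where the drift is identical in both members of the pair, the difference recovers the forced signal exactly; in practice the cancellation is only as good as the similarity of the drifts.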
Although the trends for the BBC CCE controls are typically larger than those for the IPCC ensemble, the initial drifts are smaller than those expected without applying a flux readjustment.
Temperature trends for the ensemble for 2040–2060 were derived by regressing the annual mean temperatures against a straight line. Figure 3 displays the temperature trends in the ensemble members with (dashed line) and without (solid line) the linear trend removed for each ensemble member, for a 5 year period (2048–2052) and a 19 year period (2041–2060). These plots show that the removal of a linear trend acts to narrow and de-bias the range of drifts. Given that there is every reason to believe that the transient ensemble members will suffer from very similar unforced drifts, it would seem reasonable to expect that the use of transient minus control pairs would have a similar effect in terms of countering any residual drifts in the surface climate.
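The narrowing and de-biasing effect of removing a linear trend can be reproduced in a toy ensemble. In the sketch below, every quantity is synthetic: the drift distribution, the noise level, the ensemble size and the choice of 1960–2080 as the period over which the linear trend is fitted are all assumptions made for illustration.

```python
import numpy as np


def window_trend(series, start, length):
    """Least-squares slope (K per year) over series[start:start + length]."""
    slope, _intercept = np.polyfit(np.arange(length), series[start:start + length], 1)
    return slope


def remove_linear_trend(series):
    """Subtract the least-squares straight line fitted to the whole series."""
    t = np.arange(series.size)
    slope, intercept = np.polyfit(t, series, 1)
    return series - (slope * t + intercept)


rng = np.random.default_rng(42)
n_members, n_years = 500, 120          # hypothetical ensemble, years 1960-2079
t = np.arange(n_years)

# Each member drifts at its own rate (positive on average) plus white noise.
drift_rate = rng.normal(0.003, 0.003, size=n_members)        # K per year
noise = rng.normal(0.0, 0.1, size=(n_members, n_years))      # K
controls = drift_rate[:, None] * t + noise

START, LENGTH = 81, 20                 # the 20 years starting in 2041
raw = np.array([window_trend(m, START, LENGTH) for m in controls])
detrended = np.array(
    [window_trend(remove_linear_trend(m), START, LENGTH) for m in controls]
)

print(raw.mean(), raw.std())
print(detrended.mean(), detrended.std())
```

With the member-specific drift removed, the distribution of window trends is centred near zero and its spread reflects interannual noise alone.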
A histogram of the control ensemble's interannual variability for the 120 year period 1960–2080 is plotted in figure 4. The quantity plotted is simply the standard deviation in the annual global mean temperature. The standard deviation of global mean temperature in the base model is 0.12 K, and that of the HadCRUT dataset (with linear trend removed) is 0.14 K (Collins et al. 2001). Thus, the interannual variability of the ensemble, unsurprisingly, has as its modal value that of the base model and also spans the values seen in the observations.
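The variability measure used here is straightforward to compute. The sketch below takes the sample standard deviation of an annual global mean series, optionally after removing a linear trend as was done for the observational value quoted above; the use of the sample (n - 1) estimator and the synthetic series are our assumptions.

```python
import numpy as np


def interannual_std(annual_mean_temp, detrend=False):
    """Sample standard deviation of an annual-mean temperature series (K)."""
    series = np.asarray(annual_mean_temp, dtype=float)
    if detrend:
        t = np.arange(series.size)
        slope, intercept = np.polyfit(t, series, 1)
        series = series - (slope * t + intercept)
    return series.std(ddof=1)


# Synthetic 120 year series: a +/-0.1 K year-to-year wiggle, with and
# without a superimposed warming trend of 0.01 K per year.
t = np.arange(120)
wiggle = 0.1 * (-1.0) ** t
control_like = 287.0 + wiggle
observation_like = 287.0 + 0.01 * t + wiggle

print(interannual_std(control_like))
print(interannual_std(observation_like))
print(interannual_std(observation_like, detrend=True))
```

The example shows why the trend must be removed from a forced record before its variability is compared with that of an unforced control run: without detrending, the standard deviation is dominated by the trend itself.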
4. Summary and conclusions
Preliminary results from the control runs suggest that the experiment, and especially the new technique (Faull et al. in preparation) for dealing with the somewhat random energy imbalances inherent in the random combination of physically perturbed atmospheres and oceans, works reasonably well; these results are discussed in much greater depth in Faull (2005) and Faull et al. (in preparation).
At the time of writing, the BBC CCE has produced some 50 000 transient-control pairs, including duplicates. Analysis of these is ongoing. Successive CPDN experiments have demonstrated the viability of large-scale public resource distributed computing in an environmental context (Stainforth et al. 2002). Analysis of the transient minus control pairs currently underway includes: looking at the global temperature response to quantify some aspects of the uncertainty in the transient climate response; regional studies aiming to inform adaptation planners; and exploring relationships between physical variables such as precipitation and temperature, pressure and temperature, and the land/sea contrast. The ensemble is also being used to inform climate system emulators. These studies will allow us not only to make progress in understanding some aggregate aspects of the relationship between twentieth and twenty-first century climate, but also to improve our understanding of climate processes and inform real-world decision making, all of which make CPDN a unique and versatile climate research resource.
Making sense of such a large ensemble of GCMs is something of a challenge, and one that far exceeds the research capacity of the core CPDN team. Work is continuing on developing software infrastructure to enable scientists to search, subset and process the runs so as to enable further analysis. We are attempting to turn the BBC CCE into a convenient community resource so that others can make use of this unique dataset. The main task at the moment is making the dataset more accessible, and providing the appropriate information, especially metadata, which describe the experimental structure and details of the data. Those interested in accessing the data are welcome to visit http://results.cpdn.org/ to register for data access.
The authors wish to thank the Natural Environment Research Council, Microsoft Research, the James Martin 21st Century School, the Smith School of Enterprise and the Environment, the Met Office, the British Atmospheric Data Centre, the Open University, the EU ENSEMBLES project and the very generous CPDN participants, without whom CPDN would never have been possible.
One contribution of 24 to a Discussion Meeting Issue ‘The environmental eScience revolution’.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright © 2008 The Royal Society