Model complexity versus ensemble size: allocating resources for climate prediction

Christopher A. T. Ferro, Tim E. Jupp, F. Hugo Lambert, Chris Huntingford, Peter M. Cox

Abstract

A perennial question in modern weather forecasting and climate prediction is whether to invest resources in more complex numerical models or in larger ensembles of simulations. If this question is to be addressed quantitatively, then information is needed about how changes in model complexity and ensemble size will affect predictive performance. Information about the effects of ensemble size is often available, but information about the effects of model complexity is much rarer. An illustration is provided of the sort of analysis that might be conducted for the simplified case in which model complexity is judged in terms of grid resolution and ensemble members are constructed only by perturbing their initial conditions. The effects of resolution and ensemble size on the performance of climate simulations are described with a simple mathematical model, which is then used to define an optimal allocation of computational resources for a range of hypothetical prediction problems. The optimal resolution and ensemble size both increase with available resources, but their respective rates of increase depend on the values of two parameters that can be determined from a small number of simulations. The potential for such analyses to guide future investment decisions in climate prediction is discussed.

1. Introduction

Two of the main drivers of improved weather forecasts and climate predictions in recent decades have been the increasing complexity of general circulation models and the increasing number of trajectories simulated with those models. Both of these advances place greater demands on computational resources and so a trade-off emerges: should future resources be invested in more complex models or in larger ensembles of simulations? This question pertains to forecasting on all time scales. We believe that answers to this question would benefit from quantitative analysis, but efforts are rarely made to collect the necessary data. In this paper, we illustrate how such analyses might be conducted when the relevant data are available. Our approach is highly simplified, but we hope that it will stimulate more detailed quantitative investigations of this important topic. We begin with a brief discussion of the roles of ensembles and model complexity.

Ensemble simulations are now routinely used to investigate the uncertainty that arises from limitations in our knowledge [1]. For example, uncertainty about external forcings of the climate system can be explored by running a model with each of several different boundary conditions. The impact of imperfections in models can be explored by running several models, differing either in the structure of their equations (to produce a multi-model ensemble) or just in the values of the parameters in their equations (to produce a perturbed-physics ensemble). An initial condition ensemble is formed by varying only the initial state of a model. These latter ensembles provide information about the frequency distribution of simulated weather that arises from a model's sensitivity to initial conditions. Increasing the number of members in an initial condition ensemble improves the precision with which this model climate (and properties of it such as its mean) is known, but the ensemble size will be limited by the available computational resources.

General circulation models are based on equations derived from physical laws, which are approximated and solved numerically on a discrete grid. The complexity of a model may be judged in terms of both the physical processes that are represented in the model and the resolution of the grid on which the equations are solved. Increasing model complexity may improve the accuracy with which a model simulates the climate system, but again the complexity will be limited by the available computational resources.

Weather forecasting and climate prediction, therefore, would benefit from both more complex models and larger ensembles. In order to simplify our illustrative analysis, we shall concentrate on the roles of model resolution and initial condition ensembles because their likely effects on forecast performance are more amenable to mathematical description. Extensions to other aspects of model complexity and other types of ensemble are left for future research. Larger initial condition ensembles provide more precise information about a model's climate, while increasing model resolution (reducing the distance between grid points) changes that climate, potentially making it closer to reality. Increasing ensemble size or model resolution is expensive, and resource constraints mean that a balance must be struck. Should future investment be targeted at higher resolution? If so, how high? Or should investment be targeted at larger ensembles? If so, how large? Is there an optimum trade-off between the two competing demands?

In §2, we survey existing evidence for how ensemble size and model resolution might be expected to affect the quality of predictions. In §§3 and 4, we outline a simple framework for describing these effects mathematically. In §5, we illustrate how such a framework can be used to identify an optimal trade-off between ensemble size and model resolution. We close with a discussion in §6.

2. Evidence for the effects of ensemble size and resolution

In order to decide where to invest resources, we should like to be able to predict how changes in ensemble size and model resolution will affect the quality of forecasts without actually having to generate extra ensemble members or create new models at higher resolutions. For this to be possible, we must first have some evidence that ensemble size and resolution have predictable effects on forecast performance.

For many measures of performance, increasing the size of initial condition ensembles does have a beneficial effect that is systematic and predictable. This is because the statistical properties of additional members of these ensembles are often similar to the properties of existing members. This has been known since the early days of ensemble weather forecasting. Leith [2], for example, showed that the expected mean-squared error (m.s.e.) of the ensemble mean exceeds its large-ensemble limit by an amount that is inversely proportional to the ensemble size. Many other analytical and empirical demonstrations of such effects have been published for lead times ranging from a few days to months [3–12].

Evidence for predictable effects of increasing model resolution is rarer. Indeed, isolating an effect that is owing to model resolution alone is difficult in practice because, usually, at least some model parameters must be tuned to each resolution in order to obtain realistic simulations. Nonetheless, there are some studies in the literature that identify effects which can be attributed largely to changes in resolution [13–17]. These effects, however, are not always systematic. For example, the performance of simulations can jump by unpredictable amounts when the resolution crosses a threshold beyond which new physical processes are resolved by the model. Even when the effects are systematic, increasing resolution does not always improve performance. However, the studies cited above suggest that there are times when predictable effects exist, when smooth changes in performance can be expected over a limited range of resolutions, and it is on these situations that we shall focus. Roeckner et al. [16] provide the most accessible, quantitative examples in their comparison of multiple resolutions of the ECHAM5 general circulation model. Figure 1 presents part of their data in graphical form, where, for four variables, the m.s.e. of global and seasonal means, averaged over seasons, is found to decrease smoothly across five different resolutions.

Figure 1.

Root mean-squared errors against horizontal grid spacing for temperature at 850 hPa (T850 in K, circles), mean sea-level pressure (MSLP in hPa, triangles), geopotential height at 500 hPa (Z500 in dam, pluses) and zonal wind speed at 850 hPa (U850 in m s⁻¹, crosses) simulated by the ECHAM5 model. Curves defined by expressions (4.1) and (4.2) and fitted by least squares are superimposed.

In figure 1 and throughout this paper, we consider the effects of model resolution on the prediction of scalar quantities, such as global means, rather than spatial fields. Such scalar quantities provide key tests of model quality, and understanding the effects of resolution in these terms is therefore important. By neglecting the case of spatial fields, however, we ignore a second, important benefit of increasing model resolution, which is to produce forecasts with greater spatial variability. Incorporating this second effect is left for future research.

3. Predicting the effects of ensemble size

We have seen that there are situations in which we may reasonably assume that the effects of increasing resolution and ensemble size can be predicted to some useful degree. In §§3 and 4, we present a simple example of how such effects might be described mathematically. This leads to a formula for how performance depends on resolution and ensemble size, which we then use to analyse the trade-off between the two effects.

Let Y represent the observed value of a scalar predictand, and let X1,…,Xn represent the corresponding predictions made by n members of an initial condition ensemble. We derive the effect of ensemble size on the m.s.e.,

S = (X̄ − Y)²,

of the ensemble mean, X̄, where

X̄ = (X1 + ⋯ + Xn)/n.

Our choice of the m.s.e. reflects its widespread use and the tractability of the results that follow, but other measures could be used instead.

We suppose that we are interested in the performance of the ensemble over some, possibly short, time interval during which the statistical properties of the ensemble members and observations are approximately stationary. We account for temporal variation in the ensemble members and observation over this interval by defining X1,…,Xn and Y to be random variables and then seek the effect of ensemble size on the expected value, E(S), of the m.s.e. To do so, we must make some assumptions about the statistical properties of the ensemble members, otherwise we would be unable to tell what effect adding or removing ensemble members would have on the expected m.s.e. The only assumption that we make is that the ensemble members are second-order exchangeable [18]. In other words, the ensemble members all have the same expectations and variances and all pairs of ensemble members are equally correlated. This is weaker than assuming independent and identically distributed ensemble members and assumes nothing about the relationship between the ensemble members and the observation. Our assumption is encapsulated in the following notation. For all i, the expectations of our random variables are

E(Xi) = μ and E(Y) = μₒ,

the variances are

var(Xi) = σ² and var(Y) = σₒ²,

and the correlations are

corr(Xi, Xj) = ρ and corr(Xi, Y) = ρₒ

for all j ≠ i. With these assumptions, we may now derive the effect of ensemble size on the expected m.s.e.
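To make the exchangeability assumption concrete, the following sketch (not part of the original analysis; all numerical values are arbitrary) constructs ensemble members that share a common random component, so that they have equal means, equal variances and equal pairwise correlations without being independent.

```python
# Sketch: second-order exchangeable ensemble members (illustrative values only).
# Each member X_i = mu + a*Z + b*eta_i shares the common component Z, so all
# members have the same mean and variance and every pair has correlation
# a^2 / (a^2 + b^2), without the members being independent.
import numpy as np

rng = np.random.default_rng(0)
mu, a, b = 1.0, 0.8, 0.6          # hypothetical mean and component scales
n, trials = 5, 200_000            # ensemble size and Monte Carlo sample size

Z = rng.standard_normal(trials)            # common component
eta = rng.standard_normal((n, trials))     # member-specific components
X = mu + a * Z + b * eta                   # shape (n, trials)

print("member variances:", X.var(axis=1).round(3))          # all close to a^2 + b^2 = 1.0
print("corr(X1, X2):", np.corrcoef(X[0], X[1])[0, 1].round(3))
print("theoretical rho:", round(a**2 / (a**2 + b**2), 3))    # 0.64
```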

We show in appendix A that the expected m.s.e. is

E(S) = (μ − μₒ)² + σₒ² − 2ρₒσσₒ + ρσ² + σ²(1 − ρ)/n.    (3.1)

This expression simplifies in situations in which the initial conditions confer no predictive skill on the ensemble members at the lead time of interest. This is typical of many climate predictions for which the lead time is longer than the time scales of any internal processes that influence significantly the variability of the predictand. In such cases, we may assume that the ensemble members are mutually uncorrelated and also uncorrelated with the observation so that ρ = ρₒ = 0. Under this assumption, the skill of the ensemble members comes from predicting the mean climate rather than the daily variations in weather.

We shall use the general expression (3.1) for the expected m.s.e. in order to analyse the benefit that would be gained from increasing the ensemble size. Although we would minimize the expected m.s.e. by choosing the ensemble size to be as large as possible, the benefit gained from increasing n depends on the magnitude of the ensemble variance, σ²(1−ρ), relative to the other terms in the expression. If the variance is relatively large, then a large percentage reduction in m.s.e. will result from increasing ensemble size; but, if the variance is relatively small, then the percentage reduction will be small. A law of diminishing returns also applies: one extra ensemble member has a greater impact on the m.s.e. if the current ensemble size is small than if the current ensemble size is large.
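As a minimal illustration of expression (3.1) and the diminishing returns described above, the sketch below evaluates the expected m.s.e. for a hypothetical set of parameter values; these values are illustrative assumptions, not estimates from any model.

```python
# Sketch: expected m.s.e. of the ensemble mean, expression (3.1), for
# made-up parameter values, showing diminishing returns in n.
import numpy as np

def expected_mse(n, mu, mu_o, sigma, sigma_o, rho, rho_o):
    """E(S) = (mu - mu_o)^2 + sigma_o^2 - 2*rho_o*sigma*sigma_o
              + rho*sigma^2 + (1 - rho)*sigma^2 / n."""
    return ((mu - mu_o) ** 2 + sigma_o ** 2 - 2 * rho_o * sigma * sigma_o
            + rho * sigma ** 2 + (1 - rho) * sigma ** 2 / n)

# Hypothetical climate-style setting with no initial-condition skill (rho = rho_o = 0).
params = dict(mu=0.3, mu_o=0.0, sigma=1.0, sigma_o=1.0, rho=0.0, rho_o=0.0)

for n in (1, 2, 4, 8, 16, 32):
    print(n, round(expected_mse(n, **params), 3))
# The gain from each doubling of n shrinks: the reducible term sigma^2*(1-rho)/n
# decays towards the floor set by the remaining terms in (3.1).
```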

4. Predicting the effects of model resolution

In §3, we derived an expression for the effect of ensemble size on the expected m.s.e. under quite weak assumptions about the statistical properties of initial condition ensemble members. Finding an expression for the effect of resolution is more challenging. Rather than attempting to derive such an expression theoretically, we adopt an alternative strategy: assume a sufficiently flexible parametric form for the effect of resolution on the statistical properties of ensemble members. Such a form can be fitted to ensembles from a model run at a small number of different resolutions in order to assess empirically its adequacy and to estimate its parameters. We adopt one, simple form below for illustrative purposes, but alternatives may be used with equal ease.

Define the grid spacing, r, to be the horizontal distance between grid points. This is inversely proportional to spectral truncation and model resolution so that higher resolution corresponds to smaller values of r. Let us write the effect of resolution on the expected value of an ensemble member as

μ(r) = μₒ + ϵ + β(r),    (4.1)

where we assume that the resolution-dependent bias β(r) → 0 as r → 0. Then, ϵ can be interpreted as an irreducible error. We also assume for simplicity that the model standard deviation, σ, and the correlations ρ and ρₒ are constant in r, although resolution-dependent models for these parameters could also be incorporated.

We found earlier that the expected m.s.e. (3.1) decreases monotonically as the ensemble size, n, increases. In contrast, our model (4.1) admits non-monotonic dependence of the m.s.e. on the grid spacing, r. For example, if β(r) and ϵ have different signs, then there may be a grid spacing r > 0 such that β(r) = −ϵ and, therefore, μ(r) = μₒ. If β(r) and ϵ have the same sign, however, then our model (4.1) implies that the expected m.s.e. will decrease monotonically as r → 0.

To complete our specification of the effect of resolution, we must specify a parametric form for β(r). We consider the following example:

β(r) = αr^δ,    (4.2)

where δ > 0. This choice is motivated by the finding that the energy spectra of atmospheric motions tend to follow power laws [19], such that the unresolved energy has a power-law dependence on model resolution. We assume that the bias, β(r), associated with these unresolved motions also approximates a power law in r.

The parameters ϵ, α and δ in the model defined by expressions (4.1) and (4.2) may be estimated from a small number of model simulations run at different resolutions. For example, the bias, μ(r) − μₒ, may be estimated by the difference between the mean of an ensemble run at grid spacing r and a time mean of corresponding observations. If such bias estimates are available for a few different resolutions, then the curve ϵ + αr^δ can be fitted to them by least squares. We have been unable to obtain bias estimates with which to illustrate this procedure and so we fitted the curve to the data in figure 1 instead. This would be equivalent to fitting the curve to bias estimates were the biases of the simulations underlying figure 1 positive in all four seasons. The parameter estimates are presented in table 1 and the fitted curves are superimposed on the data in figure 1. Although some of the parameters are estimated with low precision, our simple parametric form is able to describe the effect of resolution quite closely.

Table 1.

Parameter estimates and estimated standard errors (in brackets) for the data in figure 1 when the grid spacing, r, is rescaled to have units of 1000 km.
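The sort of least-squares fit described above is straightforward to reproduce. The sketch below fits ϵ + αr^δ to bias (or r.m.s.e.) estimates at a few grid spacings using non-linear least squares; the data values are invented placeholders, not the ECHAM5 values behind figure 1 or table 1.

```python
# Sketch: least-squares fit of eps + alpha * r**delta to bias estimates at a
# few grid spacings. The data points below are invented placeholders.
import numpy as np
from scipy.optimize import curve_fit

def bias_model(r, eps, alpha, delta):
    return eps + alpha * r ** delta

r = np.array([0.28, 0.42, 0.63, 0.83, 1.11])   # grid spacings in 1000 km (hypothetical)
b = np.array([0.55, 0.74, 1.05, 1.35, 1.80])   # bias estimates (hypothetical)

popt, pcov = curve_fit(bias_model, r, b, p0=[0.2, 1.5, 1.0])
stderr = np.sqrt(np.diag(pcov))
print("eps, alpha, delta =", popt.round(3))
print("standard errors   =", stderr.round(3))
```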

5. Analysing the trade-off

(a) Model cost

In this section, we demonstrate how descriptions of the effects of ensemble size and model resolution on performance measures such as the m.s.e. can be used to identify an optimal allocation of finite resources. First, we need to know how the cost of running an ensemble member depends on the resolution of the model. The form of this dependence is determined by the details of how the model code would change with resolution, and so different forms of dependence will be required for different modelling systems. The change of cost with resolution or other aspects of model complexity will typically contain some discontinuities (as model parametrizations change their nature, for example), but for our illustrative analysis we adopt a simple, smooth form based on the following generic ideas. Such forms may hold approximately over restricted resolution ranges, and alternative forms could be used with equal ease.

Halving the horizontal distance, r, between grid points quadruples the number of grid points and, therefore, quadruples the number of calculations required to run the model. So the number of calculations is approximately inversely proportional to r². In addition, however, the time step used to solve the model equations numerically tends to vary in direct proportion to r and so the number of calculations is approximately inversely proportional to r³ [17]. If the vertical distance between grid points is also proportional to r [20], then the number of calculations is approximately inversely proportional to r⁴. Rather than choosing a particular power now, we approximate the cost of running a single ensemble member at grid spacing r by c/r^γ for some positive constants c and γ. We shall specialize to the case γ = 4 later.
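A minimal sketch of this assumed cost law follows; the constant c and exponent γ would in practice be calibrated to a particular modelling system, and the values used here are purely illustrative.

```python
# Sketch: assumed cost of one ensemble member, c / r**gamma (arbitrary units).
def member_cost(r, c=1.0, gamma=4):
    """Cost of running a single member at grid spacing r."""
    return c / r ** gamma

# Halving the grid spacing with gamma = 4 makes each member 16 times as expensive.
print(member_cost(0.5) / member_cost(1.0))   # 16.0
```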

(b) Resource allocation

Suppose now that we have a total computational resource C for producing an initial condition ensemble of size n. Then, we are required to operate under the constraint nc/r^γ = C so that n = Cr^γ/c. Given our expressions (4.1) and (4.2) for how resolution affects the model bias, the expected m.s.e. (3.1) becomes

E(S) = (ϵ + αr^δ)² + σₒ² − 2ρₒσσₒ + ρσ² + σ²(1 − ρ)c/(Cr^γ).

This is minimized at r = r_opt, where r_opt is defined implicitly by the equation

2αδ r_opt^(γ+δ) (ϵ + α r_opt^δ) = γcσ²(1 − ρ)/C    (5.1)

when α ≠ 0, and the corresponding optimal ensemble size is

n_opt = C r_opt^γ / c.    (5.2)
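The implicit equation (5.1) is easy to solve numerically. The sketch below minimizes the resolution-dependent part of the expected m.s.e. over r under the resource constraint and then checks the result against (5.1); all parameter values are illustrative assumptions rather than the estimates in table 1.

```python
# Sketch: optimal grid spacing and ensemble size under the constraint
# n = C * r**gamma / c, found by minimising the expected m.s.e. over r.
# All parameter values are illustrative.
from scipy.optimize import minimize_scalar

eps, alpha, delta = 0.3, 1.2, 1.2     # bias model eps + alpha * r**delta (hypothetical)
sigma2_ens = 1.0                      # ensemble variance sigma^2 * (1 - rho)
c, gamma, C = 1.0, 4, 100.0           # cost constant, cost exponent, total resource

def expected_mse_given_resource(r):
    bias = eps + alpha * r ** delta
    n = C * r ** gamma / c            # ensemble size affordable at this grid spacing
    return bias ** 2 + sigma2_ens / n # resolution-independent terms omitted (constant)

res = minimize_scalar(expected_mse_given_resource, bounds=(1e-3, 2.0), method="bounded")
r_opt = res.x
n_opt = C * r_opt ** gamma / c
print("r_opt =", round(r_opt, 3), " n_opt =", round(n_opt, 2))

# Check that r_opt satisfies the implicit equation (5.1):
lhs = 2 * alpha * delta * r_opt ** (gamma + delta) * (eps + alpha * r_opt ** delta)
rhs = gamma * c * sigma2_ens / C
print("(5.1) residual (should be ~0):", lhs - rhs)
```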

As C increases, more resources become available and so r_opt decreases and n_opt increases. As c increases, the model becomes more expensive and so r_opt increases and n_opt decreases. As σ increases or ρ decreases, the ensemble variance increases and both r_opt and n_opt increase. As the absolute value of α increases, the resolution-dependent bias increases and both r_opt and n_opt decrease. If ϵα > 0, then, as the absolute value of ϵ increases, the impact of resolution on the squared bias increases and both r_opt and n_opt decrease. The opposite holds if ϵα < 0, since then a high resolution (small grid spacing) is needed to balance a large irreducible error. If ϵα > 0, then both r_opt and n_opt are minimized at an intermediate value of δ; both increase as δ → 0, since then the effect of resolution on the bias reduces; and both increase to the asymptotes r_opt = 1 and n_opt = C/c as δ → ∞, since then any grid spacing below r = 1 yields a small bias. If ϵα < 0, then r_opt and n_opt decrease monotonically as δ increases.

We can identify the optimal resolution and ensemble size to use for a given resource C if we can determine the values of the parameters in our equations (5.1) and (5.2). We have already described how ϵ, α and δ may be estimated. The ensemble variance, σ²(1 − ρ), may be estimated from small initial condition ensembles. For illustration, we use the parameter estimates presented in table 1 where, for want of relevant model data, σ has been estimated by the standard deviation of global, seasonal means from the ERA-40 reanalysis [21] for the same time period, 1979–1993, used by Roeckner et al. [16]. We can assume ρ = ρₒ = 0 for these ensembles and we set γ = 4. Figure 2 shows how r_opt changes with the available resource, where resource is measured in terms of the number of ensemble members that could be afforded when the model is run at grid spacing 276 km (corresponding to spectral truncation T85). For example, an ensemble size equal to one should be interpreted as the resource sufficient to run one member at resolution T85. For all variables and all levels of resource, we find that the optimal grid spacing is smaller than 276 km. The optimal resolutions for T850 and Z500 are similar to one another, and simulations of U850 and mean sea-level pressure (MSLP) would benefit from higher resolutions. The optimal resolutions for U850 and MSLP correspond to optimal ensemble sizes that are less than one, however, and so the optimal resolution for these two variables is actually the highest resolution that can be afforded with a one-member ensemble.

Figure 2.

Optimal grid spacing against the ensemble size that can be afforded with a grid spacing of 276 km based on the data for the four variables in table 1: T850 (solid line), MSLP (dashed line), Z500 (dotted line) and U850 (dotted-dashed line). The smallest grid spacing (long-dashed line) that can be afforded with a one-member ensemble is also shown.

(c) Large and small irreducible errors

In two special cases, equations (5.1) and (5.2) yield closed-form expressions for the optimal grid spacing and ensemble size. When the irreducible error, ϵ, is small compared with the resolution-dependent bias, αr^δ, equations (5.1) and (5.2) yield

r_opt ≈ [γcσ²(1 − ρ)/(2δα²C)]^(1/(γ+2δ))

and

n_opt ≈ (C/c)^(2δ/(γ+2δ)) [γσ²(1 − ρ)/(2δα²)]^(γ/(γ+2δ)),

while the cost of each member for this optimal model is

c/r_opt^γ ≈ c [2δα²C/(γcσ²(1 − ρ))]^(γ/(γ+2δ)).

The logarithms of the optimal grid spacing, ensemble size and model cost, therefore, scale approximately linearly with the logarithm of the total resource, C, and this scaling depends on the values of δ and γ only. Moreover, as resource increases, the optimal model cost increases faster than the ensemble size if and only if δ < γ/2.
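A quick numerical check of the small-irreducible-error approximation is possible with illustrative parameter values; here ϵ is set to zero so that the closed form should coincide with the numerically computed optimum.

```python
# Sketch: check of the small-epsilon closed form for r_opt against a numerical
# optimisation. Parameter values are illustrative only.
from scipy.optimize import minimize_scalar

alpha, delta, gamma, c, sigma2_ens = 1.2, 1.2, 4, 1.0, 1.0
eps = 0.0                                    # negligible irreducible error

def r_opt_numeric(C):
    f = lambda r: (eps + alpha * r ** delta) ** 2 + sigma2_ens * c / (C * r ** gamma)
    return minimize_scalar(f, bounds=(1e-4, 2.0), method="bounded").x

for C in (10.0, 100.0, 1000.0):
    approx = (gamma * c * sigma2_ens / (2 * delta * alpha ** 2 * C)) ** (1 / (gamma + 2 * delta))
    print(C, round(r_opt_numeric(C), 4), round(approx, 4))   # the two optima agree
```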

When the irreducible error is large compared with the resolution-dependent bias, something that can happen only when ϵα > 0, we obtain the approximations

r_opt ≈ [γcσ²(1 − ρ)/(2αδϵC)]^(1/(γ+δ))

and

n_opt ≈ (C/c)^(δ/(γ+δ)) [γσ²(1 − ρ)/(2αδϵ)]^(γ/(γ+δ)),

while the model cost is

c/r_opt^γ ≈ c [2αδϵC/(γcσ²(1 − ρ))]^(γ/(γ+δ)).

A different log-linear relationship holds in this case, with the optimal model cost increasing faster than the ensemble size if and only if δ < γ.

(d) Non-dimensionalization

A useful graphical representation of how the optimal grid spacing and ensemble size change with resource can be obtained by non-dimensionalizing our equations. Define the following dimensionless quantities:

r̃ = r/r_c,  ñ = nϵ²/[σ²(1 − ρ)]  and  C̃ = Cϵ²r_c^γ/[cσ²(1 − ρ)],

where r_c = |ϵ/α|^(1/δ) is a ‘critical’ grid spacing. As before, the optimal, dimensionless grid spacing and ensemble size are defined implicitly by two equations. When ϵα > 0, these equations are

2δ r̃_opt^(γ+δ) (1 + r̃_opt^δ) = γ/C̃

and

ñ_opt = C̃ r̃_opt^γ.

The relationships between r̃_opt, ñ_opt and C̃, therefore, depend only on δ and γ. Moreover, relative changes in these dimensionless quantities equal the relative changes in r, n and C, since r̃ ∝ r, ñ ∝ n and C̃ ∝ C. The curves in figure 3 show how r̃_opt and ñ_opt change with C̃ for the four variables listed in table 1 when γ = 4. At small and large values of C̃, the curves approach the log-linear relationships identified in §5c for the cases of small and large irreducible errors, respectively. The balance of grid spacing and ensemble size for existing ensembles can also be judged by plotting r̃ and ñ against C̃ and seeing how far the points fall above or below the curves of optimal resource allocation. We illustrate this idea for the variables in table 1 by plotting points corresponding to the situations in which the available resource is sufficient to run either 10 or one ensemble members at grid spacing 276 km. As discovered earlier, in all cases, the resolution is too low (grid spacing too large) and the ensemble size too high.

Figure 3.

Dimensionless grid spacing (r̃, grey) and dimensionless ensemble size (ñ, black) against dimensionless resource (C̃) on logarithmic axes for the parameter estimates in table 1: temperature at 850 hPa (T850: solid line, circles), mean sea-level pressure (MSLP: dashed line, triangles), geopotential height at 500 hPa (Z500: dotted line, pluses) and zonal wind speed at 850 hPa (U850: dotted-dashed line, crosses). The curves show how the optimal grid spacing decreases and the optimal ensemble size increases as resource increases. The symbols correspond to a grid spacing of 276 km and ensemble sizes 10 (large symbols) and one (small symbols).
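Curves of this kind are straightforward to trace out numerically: for each value of C̃, solve the dimensionless form of (5.1) for r̃_opt and then set ñ_opt = C̃ r̃_opt^γ. The sketch below does this for illustrative values of δ and γ rather than the fitted values in table 1.

```python
# Sketch: optimal dimensionless grid spacing and ensemble size as functions of
# dimensionless resource, for the case eps*alpha > 0 (illustrative delta, gamma).
from scipy.optimize import brentq

delta, gamma = 1.2, 4

def r_tilde_opt(C_tilde):
    # Solve 2*delta * r**(gamma+delta) * (1 + r**delta) = gamma / C_tilde for r > 0.
    g = lambda r: 2 * delta * r ** (gamma + delta) * (1 + r ** delta) - gamma / C_tilde
    return brentq(g, 1e-8, 1e8)

for C_tilde in (0.1, 1.0, 10.0, 100.0):
    r_t = r_tilde_opt(C_tilde)
    n_t = C_tilde * r_t ** gamma
    print(C_tilde, round(r_t, 3), round(n_t, 3))
```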

6. Discussion

The question of resource allocation for weather forecasting and climate prediction is ubiquitous. Should the priority be to create more models, more complex models, or to learn more about existing models by creating larger ensembles? In this paper, we have argued that attempts should be made to answer these questions quantitatively, and we have provided a simple illustration of the sort of approach that might be taken. We showed how expressions for the effects of ensemble size and model resolution on the m.s.e. of an initial condition ensemble mean can be used to analyse the trade-off between these two effects and to identify the optimal ensemble size and resolution for a given level of computational resource. We derived our expression for the effect of ensemble size analytically under the weak assumption of second-order exchangeable ensemble members and obtained our expression for the effect of model resolution empirically from a small number of model simulations. The simplifying assumptions that we adopted may need to be relaxed in order to develop more realistic frameworks. We illustrated our approach with data from simulations of the ECHAM5 climate model published previously by Roeckner et al. [16]. Our analysis suggested that, at current resource levels, higher resolution should perhaps take priority over larger ensembles, but we emphasize that our numerical results should be viewed as an illustration of an approach rather than as a reliable guide for resource allocation for ECHAM5 or any other climate model. Although our framework has various limitations that we discuss below, we believe that extensions of our approach could be highly informative and we hope that these ideas will be pursued by other researchers.

We considered only one measure of performance: the m.s.e. of the ensemble mean. This is a widely used measure for which the ensemble-size effect is tractable, but it fails to assess other properties of the ensemble such as its spread. Extensions to other measures such as ranked probability scores are possible [12], but incorporating the effects of model complexity in those cases may be more challenging. Another improvement may be to measure performance relative to the magnitude of the predictand. For example, if the ensemble mean represents the proportion of ensemble members predicting the occurrence of a rare event, then the absolute m.s.e. typically will be small, but achieving a small relative m.s.e. is more important. Although the same allocation of resources will optimize both the absolute and relative m.s.e., the level of resource needed to achieve a small relative m.s.e. for rare events can be very large: high resolution (small grid spacing) may be needed to control the relative bias and large ensembles may be needed to control the relative precision of the ensemble mean. A third improvement would be to incorporate a measure of the benefit of the greater spatial detail provided by higher resolution models.

An important extension of our work would accommodate perturbed-physics and multi-model ensembles. These ensembles are most relevant when forecasting at seasonal and longer lead times, over which the impacts of model imperfections become more significant. However, the effects of increasing the size of such ensembles are harder to predict than for initial condition ensembles because the statistical properties of ensemble members are affected by model structures and parameter values in complex ways. Multi-model ensembles are especially challenging. Determining how to distribute resources across a given set of existing models may be the best approach in such circumstances. More is possible for perturbed-physics ensembles if it can be assumed that ensemble members respond smoothly to changes in parameter values. In that case, statistical theory on the design of experiments [22] provides a way forward.

We relied on empirical information to predict the effects of changing model resolution on the statistical characteristics of ensemble members. Supplementing this information with theoretical results from numerical analysis would be beneficial, particularly as the empirical evidence available at present is sparse. We also interpreted model complexity in the narrow sense of model resolution. Extending our approach to accommodate the expected effects of changing models in other ways would be desirable.

Finally, decisions about resource allocation can rarely be formalized entirely in the sort of framework that we have outlined. For example, optimal allocations are typically unique to each predictand, and so compromises must be made to find an allocation that yields acceptable performance across all predictands of interest. Nonetheless, such compromises should be based on relevant evidence. We believe that valuable additions to such evidence could be provided by the type of analysis proposed in this paper.

Acknowledgements

The authors are grateful to Dr Renate Brokopf for providing access to data from the ECHAM5 simulations, and to the staff of the Isaac Newton Institute for Mathematical Sciences and the organizers of its programme on Mathematical and Statistical Approaches to Climate Modelling and Prediction (11 August–22 December 2010), during which this work was initiated.

Appendix A

We derive the expression (3.1) for the expected m.s.e. We have

E(S) = E{(X̄ − Y)²} = var(X̄ − Y) + {E(X̄ − Y)}² = var(X̄) + var(Y) − 2cov(X̄, Y) + (μ − μₒ)²,

where

var(X̄) = (1/n²){nσ² + n(n − 1)ρσ²} = σ²{ρ + (1 − ρ)/n} and cov(X̄, Y) = ρₒσσₒ,

and, therefore,

E(S) = (μ − μₒ)² + σₒ² − 2ρₒσσₒ + ρσ² + σ²(1 − ρ)/n.

References
