## Abstract

The impact of a nonlinear dynamic cellular automaton (CA) model, as a representation of the partially stochastic aspects of unresolved scales in global climate models, is studied in the European Centre for Medium Range Weather Forecasts coupled ocean–atmosphere model. Two separate aspects are discussed: impact on the systematic error of the model, and impact on the skill of seasonal forecasts. Significant reductions of systematic error are found both in the tropics and in the extratropics. Such reductions can be understood in terms of the inherently nonlinear nature of climate, in particular how energy injected by the CA at the near-grid scale can backscatter nonlinearly to larger scales. In addition, significant improvements in the probabilistic skill of seasonal forecasts are found in terms of a number of different variables such as temperature, precipitation and sea-level pressure. Such increases in skill can be understood both in terms of the reduction of systematic error as mentioned above, and in terms of the impact on ensemble spread of the CA's representation of inherent model uncertainty.

## 1. Introduction

Ever since their introduction, numerical climate models have been formulated according to a rather precise prescription: represent the equations of motion as accurately as possible by projection onto a Galerkin basis down to some truncation scale, and represent the effect of unresolved scales on the resolved-scale motions through deterministic bulk-formula parametrizations. The tendencies associated with such parametrizations are determined by, and hence slave to, the resolved-scale flow, typically at the truncation scale.

Such bulk-formula parametrizations are motivated by concepts in statistical mechanics. For example, the formulation of the effect of molecular viscosity is given by considering an ensemble of randomly moving molecules the mean-free path of which is small compared with some macroscale of interest. Similarly, climate model parametrizations (e.g. of convection or gravity wave drag) are formulated by supposing that there exist an ensemble of sub-grid processes (e.g. convective plumes or gravity waves) in quasi-equilibrium with the resolved-scale flow.

By using a simplified cloud resolving model, Shutts & Palmer (2007) studied this statistical mechanical assumption quantitatively, in the case of convection in the tropics. In regions of strong convection it was found that the bulk-formula assumptions were in error by an order of magnitude; rather than determining the sub-grid tendency, the truncation scale motions provided only a partial constraint on some underlying probability distribution of sub-grid tendencies.

Consistent with these findings, Palmer (2001) has suggested that sub-grid motions should be represented by simplified nonlinear stochastic dynamic models, as an alternative to the deterministic bulk-formula approach. One consequence of such an approach is that a stochastic dynamic model can allow sub-grid-scale energy to be backscattered onto the resolved-scale grid.

Stochastic kinetic energy backscatter has its origins in early three-dimensional turbulence closures that recognized the deficiencies of the eddy viscosity concept (Kraichnan 1976). Leith (1990) and Mason & Thomson (1992) demonstrated the benefits of using a stochastic backscatter term in large eddy simulations of turbulence and Frederiksen & Davies (1997) extended the technique to large-scale planetary motion. The idea is based on the notion that turbulent dissipation rate is the difference between upscale and downscale spectral transfer and that the upscale component is available to the resolved flow as a kinetic energy source. In a nonlinear system, such backscattered energy can in principle reduce the large-scale systematic error of the model. In addition, through multiple independent random samplings of the underlying stochastic processes, an ensemble of integrations is readily generated by such models, thus providing a representation of underlying model uncertainties.

One simple type of stochastic dynamic model considered by Palmer (2001) is based on cellular automaton (CA). This approach has been further developed by Shutts (2005) and applied to medium-range forecast problems. In this paper, the CA scheme of Shutts is applied to climate time scales. Specifically, the impact of the scheme, both on the systematic error of the European Centre for Medium Range Weather Forecasts (ECMWF) coupled ocean–atmosphere model, and on the seasonal forecast skill of the same model, is studied.

A brief summary of the CA scheme is described in §2. The experimental design is given in §3, and results in terms of impact on systematic error and on seasonal forecast skill are given in §4. Conclusions are summarized in §5.

## 2. The CA backscatter scheme

In this section, we give a brief summary of Shutts' kinetic energy CA backscatter scheme (CASBS) and point out some minor differences from the original scheme. An extension of this scheme and its impact on medium-range ensemble forecasting and flow-dependent predictability and error growth is discussed in Berner *et al*. (submitted).

Shutts (2005) argues that in numerical weather prediction (NWP) systematic kinetic energy is lost in both numerical integration schemes and parametrizations. For instance, errors in semi-Lagrangian departure-point interpolation cause a net energy sink, and kinetic energy released in deep convection does not find its way sufficiently into balanced flows and gravity wave generation. Consequently, the backscatter scheme aims at representing upscale error growth on synoptic and sub-synoptic scales from convection, gravity and mountain/wave drag and numerical dissipation.

The backscatter scheme generates a flow-dependent stochastic kinetic energy source by introducing a stream function forcing(2.1)where *D*(*φ*, *λ*, *z*, *t*) is the instantaneous dissipation rate and *ϕ*(*φ*, *λ*, *t*) a realization of the CA. The resulting stream function forcing is then transformed to spectral space, converted into a vorticity increment and added to the dynamical equations. True to its nature, as a kinetic energy backscatter scheme, this will affect the kinetic energy spectrum of the model. In ensemble forecasts, CASBS is initialized with different choices of the CA pattern. Since these patterns can be chosen randomly, CASBS is effectively a stochastic sub-grid parameterization.

In the following paragraphs, we give some more details on the pattern generator and the computation of the total dissipation rate *D*(*φ*, *λ*, *z*, *t*). The CA is a variant of the CA known as Conway's Game of Life (e.g. Gardner 1983). The CA domain covers the entire globe in a fine grid of cells that are regular in latitude/longitude coordinates; here 720×360 cells (figure 1*a*). Each cell can be alive or dead and living cells have up to 32 lives. Dead cells come to life if their nearest neighbours satisfy certain ‘birth’ conditions (e.g. when the number of them with 32 lives equals either 3, 4 or 5). When a dead cell comes to life it is given 32 lives and these are progressively lost in later steps when certain survival rules fail to be met. Only cells with the maximum number of lives (32) participate if the rule counts. In order to present a sufficiently smooth pattern to the forecast model, the cell values are averaged into 2° ‘macro’ cells comprising 4×4 CA cells and the subsequent array is smoothed again using passes of a 1-2-1 filter first in *x* (longitude) and then in *y* (latitude). The smoothed values are then normalized so that the domain average is unity (figure 1*b*). The resulting pattern looks very ‘organic’ and consists of clusters of densely populated cells with distinct fronts, reminiscent of mesoscale structures in the atmosphere. Further detail on the CASBS algorithm can be found in Shutts (2005).

The spatial structure of the smoothed coarse-grained CA pattern is analysed by calculating the spatial correlation as a function of the distance *r* between the macro cells, in units of the width of a single macro cell. The spatial correlation decreases with distance and oscillates at approximately zero for macro cells of seven or more units apart (figure 1*c*). The e-folding distance, i.e. the decorrelation scale, equals four units. Since the macro cell represents an area of 2×2° the decorrelation scale is effectively approximately 8° or 800 km.

The total instantaneous dissipation rate *D*(*φ*, *λ*, *z*, *t*) contains contributions from deep convection, numerical dissipation and gravity/mountain wave drag. We refer to Shutts (2005) for details on the computation of the numerical dissipation and contribution from gravity/mountain wave drag. Changes in the NWP model made it necessary to no longer base the dissipation from deep convection on convective updraught speeds; but instead a mass-flux formulation was usedwhere *M*(*φ*, *λ*, *z*, *t*) is the updraft convective mass flux rate in kg (m^{3}s)^{−1} from the convective parametrizations, *δ*(*φ*, *λ*, *z*, *t*) the updraft detrainment rate in kg (m^{3}s)^{−1}, *ρ*(*φ*, *λ*, *z*, *t*) the density and *β* an assumed detrainment cloud fraction for deep convection of *β*=2.6 10^{−2}. To focus on the large-scale structure, the total dissipation rate *D*(*φ*, *λ*, *z*, *t*) is subsequently smoothed by applying a spectral filter that completely retains wavenumbers *n*<20 and gradually reduces to zero for wavenumbers 20<*n*<30.

The total dissipation rate per unit area typical of the ECMWF atmospheric model for the months December through February and its contributions from deep convection, numerical dissipation and gravity/mountain wave drag are shown in figure 2. For illustration only, we show density-weighted vertical averages of the three-dimensional dissipation fields. The dominating contributor to the total dissipation rate with a global mean of 0.99 Wm^{−2} is deep convection. Its maxima are in the deep convective regions of the tropics, especially over Indonesia, but also downstream of the Andes. The most active regions are seen in the summer hemisphere, and for the period June through August there are larger contributions from the Northern Hemisphere (not shown). With a global average of 0.56 Wm^{−2}, the second largest contribution comes from the numerical dissipation. It is largest in the storm track regions and downstream of high orography such as Greenland, the Himalayas and the Andes. The global-mean dissipation from gravity and mountain wave drag is much smaller and occurs mainly in the lower tropospheric levels over orography. However locally, over the major mountain ranges, the dissipation rates may be very large indeed (e.g. over 100 Wm^{−2}).

## 3. Experimental set-up

The experiments discussed in this paper were performed using an ensemble of dynamical seasonal re-forecasts. Two sets of experiments with and without the stochastic backscatter scheme were performed. In the following, the control model simulations without stochastic backscatter will be denoted as CTRL, whereas the experiments using the new stochastic physics scheme will be referred to as CASBS.

The model used to produce these two sets of re-forecasts was integrated over seven months initialized twice a year on 1 May and 1 November at 00.00 hour GMT over the 11-year period 1991–2001. Ensembles of nine members generated by different initial conditions were used for each start date, so that a total set of 22 nine-member seasonal re-forecasts was available to analyse the results for both CTRL and CASBS.

As a forecast model, we used the ECMWF coupled atmosphere–ocean general circulation model (Anderson *et al*. 2007). The atmospheric component is the integrated forecasting system of ECMWF as used in operational medium-range weather forecasting in its version CY29R2. The model was run with a horizontal truncation of T_{L}95 and 40 vertical levels. The Hamburg Ocean Primitive Equation (HOPE) ocean model has a horizontal resolution of 1°, with an equatorial refinement of 0.3°, and 29 levels in the vertical. The coupler OASIS (version 2) is used to interpolate once per day between the oceanic and atmospheric grids.

The atmospheric initial conditions, including land surface conditions, come from ERA-40 (Uppala *et al*. 2005); the ocean initial conditions come from an ensemble of ocean analyses (Balmaseda *et al*. 2007). The nine initial-condition ensemble members were generated by adding small perturbations to the atmospheric and oceanic fields. For the initialization of the atmosphere, perturbations based on singular vectors were applied in a similar way as in the operational medium-range ensemble forecasts (Buizza & Palmer 1995; Rodwell & Doblas-Reyes 2006). For the initialization of the ocean, three different ocean analyses were created by adding daily wind stress perturbations on the basis of the differences between two quasi-independent datasets. In addition, four sea surface temperature (SST) perturbations, again based on the differences between two SST datasets, were added to or subtracted from the initial fields.

## 4. Results

### (a) Effect on kinetic energy spectrum

Figure 3 shows the rotational and divergent part of the kinetic energy spectrum at 500 hPa for ERA-40 data, a control integration without stochastic backscatter (CTRL) and an integration with CASBS. The kinetic energy in CTRL is less than in ERA-40, suggesting that the model is underactive for all wavenumbers *n*, but especially so for *n*>10. The stochastic backscatter scheme injects energy, so that the kinetic energy increases for all wavenumbers. This leads to an improvement in the rotational part of the kinetic energy spectrum in the synoptic band (wavenumbers 2–12) but to a slight overestimation for wavenumbers *n*>12. By introducing a stream function forcing, only the rotational part of the spectrum is directly forced. Nevertheless, a substantial impact is also seen in the divergent part of the kinetic energy spectrum for wavenumbers larger than *n*=5. The divergent part of the spectrum is now much closer to that of ERA-40 (figure 3), although there is now slightly too much energy for wavenumbers between 20 and 40. From this we conclude that the activity in the model has been changed in such a way that the model with CASBS is better at producing and maintaining divergent modes, which is especially important for tropical variability.

### (b) Systematic model error

In this section, the impact of the stochastic backscatter scheme on the systematic errors that develop during the course of the coupled seasonal integrations is described. We focus on the atmosphere and discuss results for the boreal winter (December–February, DJF) re-forecasts starting in November and for the summer (June–August) re-forecasts starting in May.

Figure 4 shows the systematic errors in simulating DJF mean total precipitation. The difference between CTRL and the global precipitation climatology project (GPCP) verification dataset (Adler *et al*. 2003) is displayed in figure 4*a*. The control model generates excessive precipitation over the tropics. The inter-tropical convergence zone (ITCZ) over the tropical Pacific is particularly wet. The control re-forecasts also generate too much precipitation in the Indonesian warm pool area. A dry bias can be seen over large parts of South America and Northern Australia. The errors in mid-latitudes are in general smaller than in the tropics.

The systematic error of the CASBS runs is shown in figure 4*b*; figure 4*c* displays the difference CASBS minus CTRL. The impact of the stochastic backscatter scheme on precipitation errors can be summarized as reducing the wet bias over the tropical ITCZ, in particular over the Pacific north of the equator. This is due to both a southward shift of the convergence zone and a reduction of the overall rainfall amounts. A decrease of the excessive precipitation over the tropical Atlantic and the Indian Ocean can also be detected. The systematic errors in precipitation over land are mainly unaffected by the stochastic parameterization scheme, although a positive impact can be found over Northern Australia.

Jung *et al*. (2005) showed that previous versions of the stochastic physics scheme induced significant changes in the extratropical tropospheric circulation in an uncoupled model. As a diagnostic of this impact, the Tibaldi & Molteni (1990) blocking detection index has been used to estimate the frequency of persistent blocking anticyclones in the simulations. The index computes the 500 hPa geopotential height meridional gradient for specific mid-latitude bands (of latitude between 40–60° N and 60–80° N) and classifies a longitude for a given time step as blocked, if in the northernmost latitude band a reverse of the westerly flow is detected. An event is classified as blocked if at least 10° of longitude are blocked over more than five consecutive days. Figure 5 shows the mean blocking frequency for DJF averaged over the years 1991–2001 and the nine ensemble members. 90% confidence intervals have been computed using a bootstrap method, where the original data were resampled with replacement 500 times. The grey dots (filled circle for CTRL and open circle for CASBS) in the top of the panel show the longitudes where the model climatology is not significantly different from the ERA-40 results, using a two-sample test based on the bootstrap estimates (Nicholls 2001; Lanzante 2005). The black squares correspond to those longitudes where both experiments are not significantly different. Although both experiments significantly underestimate the frequency of blocking events, the CASBS simulation reduces this error over the North Pacific to the point that the simulations are significantly different. The improvement of the mean state over the North Pacific is a robust feature of CASBS in seasonal integrations (Jung *et al*. 2005). In the experiments presented here, CASBS has no significant impact on Atlantic blocking.

In figure 6, we show the growth of systematic errors in SST over the Niño3.4 region (5° S–5° N, 170–120° W) in the eastern-to-central tropical Pacific, a key region for the development of the El Niño-Southern oscillation phenomenon, over the seven-month lead time of the re-forecasts. The dashed curves show the SST bias for the CTRL simulation for the two start dates in May and November; the solid ones that for the CASBS simulations. For both start dates and the two sets of experiments, the modelled SSTs are warmer than the climatological temperatures. However, for the May start dates, this warm bias is systematically reduced in the CASBS runs. For example, whereas in the CTRL experiment the SST errors after three and seven months lead time come close to 1.5 and 2.5 K, respectively, they are reduced by nearly 30%, that is approximately 0.5 K after three months and almost 1 K after seven months, in the CASBS experiments. For the November start dates no difference between the two was found.

### (c) Forecast quality assessment

Various measures of forecast quality have been used to assess the relative merit of CASBS when compared with CTRL. The scores include the anomaly correlation of the ensemble mean (ACC) and, for dichotomous probability forecasts, the Brier skill score (BSS) with respect to climatology and the relative operating characteristic skill score (ROCSS; Jolliffe & Stephenson 2003). The BSS has also been decomposed into the sum of two components (Murphy 1986): the reliability (RELSS) term that measures the relative bias of conditional means, and the resolution (RESSS) term that measures the relative variance of the conditional means. The BSS decomposition used here includes two additional terms in the resolution component which account for the within-bin variance and covariance of the probability forecasts, as described in Stephenson *et al*. (in press). As the BSS depends strongly on the ensemble size for small ensembles, Ferro (2007) has developed an analytical expression to estimate the Brier score for different ensemble sizes using the Brier score value obtained from the sample. This expression that depends on the variance (also known as sharpness) of the probability forecasts, allows the computation of a theoretical BSS for infinite ensemble sizes (BSSI henceforth).

The seasonal forecast system performance has been traditionally tested on tropical Pacific SSTs. This is because the main source of seasonal predictability is assumed to come from the interannual variability related to ENSO. Figure 7 shows the root mean square error (r.m.s.e.) and the anomaly correlation (ACC) of ensemble-mean surface temperature anomalies over the Niño3.4 region as a function of lead time, for the May and November start dates over the period 1991–2001. For comparison, the r.m.s.e. and ACC of a simple statistical model based on the persistence of the anomaly of the month previous to the start date are also shown. The accuracy of the re-forecasts generally decreases with lead time, although both experiments have higher skill than this simple persistence. The smaller r.m.s.e. and higher ACC for CASBS are evident after the third month of the forecast. This improvement of CASBS over CTRL is mainly associated with the forecasts started in May; for these start dates CASBS also leads to a large reduction in the systematic error. Figure 7*a* also shows the spread of both experiments measured by the standard deviation of the ensemble members around the ensemble mean. For a reliable ensemble system, state-dependent uncertainty is expected to be modelled by the spread. In a well-calibrated system, spread and r.m.s.e. should have a similar magnitude, which is not the case for CTRL. However, both experiments underestimate the forecast uncertainty, though CASBS has an increased spread with respect to CTRL from the first month of the integration.

Figure 8 illustrates the different ensemble forecasts for each experiment in the Niño3.4 region. In general, CASBS has a larger spread than CTRL, and its ensemble members encompass the CTRL ensemble in most cases. Furthermore, there are cases where CASBS seems to yield a better forecast. For example, the onset of the extremely warm ENSO event in 1997/98 was better predicted in CASBS than in CTRL.

The full set of forecast quality measures has been computed for several variables (500 hPa geopotential height, 850 hPa temperature, precipitation, 2 metre temperature and mean sea-level pressure) over a large number of regions (table 1) and for three different events in the case of the probability forecasts: anomalies above the upper tercile, above the median and below the lower tercile. The terciles and the median have been computed separately for each experiment and for the reference dataset to take into account the different systematic errors in the simulated distributions. Figure 9 shows the scatter plots of BSSI and ROCSS of CTRL versus CASBS for both start dates, the first four regions in table 1, the three events mentioned above and several forecast ranges (first month and forecast periods 1–3, 2–4, 3–5, and 4–6 months), making a total of 600 cases. For both skill scores there is a large range of values, from unskilful predictions (lower than zero) to values close to 0.5 and 1 for BSSI and ROCSS, respectively. Frequently, CASBS performs better than CTRL, as the larger number of points above the diagonal suggests. The difference of the skill scores for a given variable, region, lead time, start date and event, between CASBS and CTRL, has been tested for significance using a two-sample test and a bootstrap method with 1000 samples. While CTRL is significantly better (with 95% confidence) than CASBS in three cases for BSSI and in six cases for ROCSS, CASBS is significantly better than CTRL in 251 and 118 cases, respectively.

This overwhelming superiority of CASBS over CTRL in terms of forecast quality is also evidenced in table 2, which summarizes the results for several skill scores and all the regions in table 1. The large proportion of cases where RELSS is better for CASBS than for CTRL suggests that the increase in BSS and BSI can be largely attributed to an improvement in reliability. This might be linked to the alleviated underdispersion of the ensemble when using CASBS. A less intuitive result is the large number of cases where the ROCSS in CASBS is significantly higher than in CTRL. This improvement in terms of forecast resolution (of which ROCSS is a measure) is especially important because the resolution of a forecast system can only be enhanced using additional sources of forecast information, while the reliability can be improved a posteriori using climatological information. Improved resolution could arise owing to the smaller systematic error in CASBS.

The improvements due to CASBS can be highly relevant in specific cases. Figure 9 shows that most of the cases with negative skill scores for the European region for CTRL have positive skill for CASBS, especially for the May start date (note that the negative skill cases over Europe are displayed with blue dots in figure 9). This improvement can be appreciated in more detail in figure 10, where the ROCSS for the predictions of 500 hPa geopotential height and 850 hPa temperature is shown as a function of the start date and the forecast range. While the forecast quality decreases with lead time (from left to right, with the November start date results starting after the first five sets of bars), all the sample values (i.e. those obtained with the re-forecasts without resampling, which are displayed with black dots) for CASBS are positive, which is not always the case for CTRL. Confidence intervals (95%) obtained with the bootstrap method described above are displayed around the skill score sample value. They suggest that it is more likely to obtain a ROCSS significantly different from zero (colour bar clear of the zero-skill horizontal line) with CASBS than with the control.

Attribute diagrams (Hsu & Murphy 1986) offer a comprehensive illustration of the benefits of CASBS in terms of forecast quality. These diagrams allow the visualization of the reliability, resolution and sharpness of a system for a specific event. Figure 11 shows the attribute diagrams for first-month re-forecasts of precipitation anomalies over the tropics started on the 1 May. The diagrams illustrate the conditional relative frequency of occurrence of the events as a function of their forecast probability, based on a discrete binning of many forecast probabilities taken over a region. Each forecast probability bin in the diagrams is represented by a grey circle whose area is proportional to the bin sample size. The vertical line represents the average forecast probability, while the horizontal line is for the climatological frequency of the event. In the idealized case of infinite sample and ensemble sizes, the diagonal line represents perfect probabilistic reliability: if from a set of cases where an event is forecast with probability *p*, the event actually occurs on a fraction *p* of occasions. If, on the occasions where an ensemble forecast predicts some event with probability *p*, the event occurs in reality on a fraction *q* of times, then if *p* is sufficiently different from *q*, the ensemble forecast probabilities are not reliable. This case will appear in the diagram as a point away from the diagonal. If the corresponding curve is shallower than the diagonal, the forecast system is said to be over-confident, while if it is steeper the system will be under-confident. The sum of the horizontal squared distance of all the points to the diagonal (weighted by the sample size of each bin) is a measure of the lack of reliability of the system as measured by the Brier score. In the same way, the sum of the vertical distance of the points to the horizontal line corresponding to the climatological frequency of the event measures the forecast resolution, i.e. the ability of the system to produce reliable forecasts that differ from the naive probability. This means that if a reliability curve were to be horizontal, the frequency of occurrence would not depend on the forecast probabilities and the system would have zero resolution. The black dashed line in the diagram separates skilful regions from unskilful ones in the diagram: points with forecast probabilities smaller (larger) than the climatological frequency, which fall below (above) this line, contribute to positive BSS; otherwise they contribute negatively to the BSS. Finally, the sharpness is the variance of the forecast probabilities and is shown in the diagram by the proportion of probability forecasts away from the average probability.

The diagrams in figure 11 show that the unreliable curves obtained for CTRL are much steeper for CASBS, bringing the curve into the skilful area determined by the black dashed line (note the significant improvement of the corresponding BSSI). This happens for both the events considered. The increased steepness improves both the reliability (by decreasing the distance to the diagonal) and the resolution (by increasing the distance to the horizontal line corresponding to the climatological frequency). This occurs at the expense of a reduction in the variance of the forecast probability, or sharpness, illustrated by a reduction in the size of the dots for the extreme forecast probabilities in CASBS with respect to CTRL. Although a sharp set of probabilities, i.e. forecast probabilities with larger variance, is a desirable feature of a forecast system, large sharpness may be harmful in the case of overconfident reliability diagrams, such as the ones for CTRL.

## 5. Summary and conclusions

In this paper, we have studied the impact on the climate of the ECMWF coupled model of a CA scheme, as first proposed by Palmer (2001), with specific implementation by Shutts (2005), as a partial representation of stochastic sub-grid processes. Two aspects of this impact have been studied: on the systematic error of the model and on seasonal forecast skill. For the former, some remarkable reductions in the error in tropical seasonal mean rainfall, extratropical blocking frequency, and temperature drift in the tropical Pacific were found. The skill of the model, measured both in terms of ensemble-mean accuracy in predicting tropical Pacific SST variability, and in terms of probabilistic seasonal predictions of temperature, precipitation and mean sea-level pressure, was notably improved by the addition of the CA scheme.

The reduction of systematic error can be understood by the fact that climate is a profoundly nonlinear dynamical system, and that the proposed CA acts as a source of sub-synoptic scale and mesoscale energy that can be backscattered to the planetary-scale components of the flow. A simple analogy is a double-well potential. By adding stochastic noise to the system, the sub-dominant well may be visited more frequently, thus altering the mean state of the system. The increase in forecast skill arises from a combination of two effects. The first is the reduction in systematic error as discussed above. The second arises from the fact that the model equations are a key source of forecast uncertainty for climate prediction and, through different random initializations, the CA provides a representation of model uncertainty in ensemble predictions. Thus the CA increases the reliability of probabilistic seasonal forecasts, thereby impacting on standard skill scores.

In this paper, a relatively simple CA has been implemented in which the accurate representation of the dynamics of atmospheric sub-grid motions is somewhat minimal. As such, one line of development in the CA model is the inclusion of more explicit atmospheric sub-grid dynamics; for example, a ‘Mexican wave’ representation of equatorially trapped Kelvin wave dynamics can be coded very simply in a CA model. Hence, for example, one can envisage representing through a CA, the sort of convectively driven wave modes in the tropical atmosphere which are not well resolved in a standard global climate model. This is work for the future.

## Acknowledgments

This work was supported by the ENSEMBLES project (GOCE-CT-2003-505539). The authors would like to acknowledge Dr Thomas Jung for his contribution to the analysis of the experiments.

## Footnotes

One contribution of 12 to a Theme Issue ‘Stochastic physics and climate modelling’.

- © 2008 The Royal Society