## Abstract

Glacial climate variability is studied integrating simple nonlinear stochastic dynamical systems with palaeoclimatic records. Different models representing different dynamical mechanisms and modelling approaches are contrasted; model comparison and selection is based on a likelihood function, an information criterion as well as various long-term summary statistics. A two-dimensional stochastic relaxation oscillator model with proxy temperature as the fast variable is formulated and the system parameters and noise levels estimated from Greenland ice-core data. The deterministic part of the model is found to be close to the Hopf bifurcation, where the fixed point becomes unstable and a limit cycle appears. The system is excitable; under stochastic forcing, it exhibits noisy large-amplitude oscillations capturing the basic statistical characteristics of the transitions between the cold and the warm state. No external forcing is needed in the model. The relaxation oscillator is much better supported by the data than noise-driven motion in a one-dimensional bistable potential. Two variants of a mixture of local linear stochastic models, each associated with an unobservable dynamical regime or cluster in state space, are also considered. Three regimes are identified, corresponding to the different phases of the relaxation oscillator: (i) lingering around the cold state, (ii) rapid shift towards the warm state, (iii) slow relaxation out of the warm state back to the cold state. The mixture models have a high likelihood and are able to capture the pronounced time-reversal asymmetry in the ice-core data as well as the distribution of waiting times between onsets of Dansgaard–Oeschger events.

## 1. Introduction

A striking feature of climate variability during the last glacial period is the repeated abrupt transitions between cold stadials and warm interstadials, the so-called Dansgaard–Oeschger (DO) events [1]. The DO events have a characteristic saw-tooth shape: they typically start with a rapid warming by up to 10–15 K over only a few decades, followed by gradual cooling over several hundred or several thousand years; sometimes, the cooling phase ends with an abrupt final drop of temperature back to stadial conditions.

The origin and dynamical mechanism of the DO events is still under debate. Crucial involvement of the thermohaline ocean circulation appears to be well established. DO events may be attributed to a temporary collapse and resumption of the Atlantic meridional overturning circulation, corresponding to two different modes of deep-water formation [2,3]. Different mechanisms for the switching between the two states have been investigated. Several authors refer to internal oscillations in the climate system, possibly combined with excitability because of stochastic fluctuations [4–7]; deep-decoupling oscillations at the right time scale, driven by fresh water fluxes, have been demonstrated. Gildor & Tziperman [8] suggested a mechanism involving sea ice variations. Moreover, external forcing mechanisms, e.g. solar forcing, have been invoked [3,9]. Nonlinear resonance effects such as stochastic resonance [10,3] and/or coherence resonance [3,7] could explain the apparent regularity of the DO events.

Besides studies with complex numerical models, investigations have been made reducing the system to low-order, box and conceptual models both in a deterministic and stochastic setting. A bistable nonlinear system, at the simplest level described by a one-dimensional potential landscape, has been assumed in which shifts between the two distinctly different states are triggered randomly by stochastic forcing [11–13]. Stochastic resonance [14,15] may or may not play a role in such a model. Under the relaxation oscillation paradigm, a two-dimensional heat-salt oscillator exhibiting millennial-scale oscillations was first proposed by Welander [16]. Deterministic and stochastic low-order systems, physically based upon a highly truncated model of the thermohaline circulation of the Atlantic ocean, have been formulated [17,18]. Moreover, a system of relaxation oscillators with nonlinear coupling and phase synchronization has been postulated [19]. Furthermore, a system of nonlinear delay differential equations modulated by external forcing was put forward [20]. In a conceptual model approach, Dima & Lohmann [21] propose that millennial variability can originate from rectification of an external forcing, and suggest that the thermohaline circulation, through a threshold response, could be the rectifier. The latter hypothesis is a combination of the externally driven and internal oscillation hypotheses. In the study of Braun *et al.* [22], a very simple conceptual model is formulated that combines an internal threshold mechanism with external forcing. The mechanism of ghost resonance is introduced to explain how an external forcing could generate the time scale of about 1500 years of the DO events, even in the absence of that period in the input.

Many different dynamical mechanisms and low-order models have been proposed to explain the DO events, and some quantitative agreement with data has been confirmed. However, little work has been done so far on systematically integrating models with palaeoclimatic data, that is, on parameter estimation by assimilating data into such models and on model comparison in order to assess which of the dynamical mechanisms is better supported by the data. Recently, a one-dimensional Langevin equation describing stochastically driven motion in a potential landscape [12] and a Duffing-type oscillator [23] have been derived from an ice-core record using Kalman filtering. A relaxation oscillator model for glacial–interglacial cycles has been formulated and parameters estimated in a Bayesian framework [24].

In the present paper, two novel models for rapid glacial climate transitions are formulated and their parameters estimated from ice-core records: a relaxation oscillator model and a mixture of local linear models. The study addresses the issue of model comparison and selection using a likelihood approach, an information criterion as well as various long-term summary statistics.

The paper is organized as follows: the two Greenland ice-core records used for parameter estimation are introduced and described in §2. In §3, the relaxation oscillator model is formulated and the procedure for parameter estimation is described. A modelling approach based on combining local linear models is introduced in §4. The results are presented and discussed in §5. The paper closes with a concluding section.

## 2. Ice-core data

In this study, two ice-core datasets from different locations in Greenland are used to calibrate the dynamical models: a GRIP [25] and an NGRIP [26] ice core. They both give a record of the isotope ratio *δ*^{18}O as a proxy for Northern Hemisphere temperatures with a sampling interval of *δt*=50 years. We actually use the time series in the time interval from 60 000 to 20 000 years before present (figure 1) as this is the period of main DO activity during the last ice age and the instationarity of the datasets is not too severe in this interval. The dynamical models are all formulated as anomaly models; hence, the mean value is removed from the time series prior to the analysis.

## 3. Conceptual dynamical models: bistability versus relaxation oscillations

### (a) Bistable potential model

Abrupt palaeoclimatic changes are sometimes thought to be related to a switching between two distinct states in a stochastically driven nonlinear system [10,3]. A simple conceptual model is Brownian motion in a one-dimensional potential landscape [27,11,12] described by the Langevin equation
3.1
where *U*(*z*) is a potential function, *η* is a white Gaussian noise with zero mean and unit variance and *σ* is the standard deviation of the stochastic forcing. The system has stationary probability density function . Such a potential model has been calibrated to Greenland ice-core data [12,13] using a nonlinear Kalman filter assuming the potential to be of the form *U*(*z*)=*a*_{4}*z*^{4}+*a*_{3}*z*^{3}+*a*_{2}*z*^{2}+*a*_{1}*z* as a canonical ansatz allowing for bistability. A bistable potential is found with a deep well corresponding to a very stable cold state and a shallow well corresponding to a warm state close to marginal stability. Transitions back and forth occur randomly according to the Kramers rates determined by the noise level and the respective height of the potential barrier. The model is able to capture the amplitude of the variability and the time scale of the transitions. However, the potential model always satisfies the condition of detailed balance, and thus cannot break the time-reversal symmetry of the joint stationary probability density (*p*_{st}(*z*,*t*;*z*^{′},*t*^{′})=*p*_{st}(*z*^{′},*t*;*z*,*t*^{′})).

Still under the bistability paradigm, the dynamical framework has been extended to a noise-driven Duffing-type oscillator [23]. However, when estimating the system parameters from ice-core data [23], the oscillator turns out to be in the strongly dissipative limit (Smoluchowski regime) where it can be well approximated by an effective one-dimensional Langevin equation as given by equation (3.1).

### (b) Relaxation oscillator

A different paradigm for the dynamics of DO events are relaxation oscillations [5,6,28]. To derive a minimal conceptual model for this scenario, we start out from a generalized Duffing–van der Pol oscillator
3.2
and
3.3
The equations describe motion with inertia in a potential *V* (*z*) with a state-dependent damping *f*(*z*). The model could be generalized further by allowing the damping to also depend on *v* [29], but we avoid this complication here. The Duffing–van der Pol oscillator includes two different mechanisms: (i) the nonlinear restoring force from the Duffing oscillator with possible multi-stability of the system if the potential *V* (*z*) has multiple equilibria and (ii) the state-dependent damping that enables relaxation oscillations of which the van der Pol oscillator is a prototype. We here consider a pure relaxation oscillator by assuming a potential of the form
3.4
with parameters *k*>0 and *z*_{0}, giving rise to a linear restoring force with no bistability. We further assume the simple stability profile
3.5
with parameters *α*, *β* and *z*_{*}. Stable long-term solutions of the system require *β*>0. Using the Liénard transformation *w*=[*v*+*F*(*z*)]/*k* with
3.6
and adding stochastic forcing, we arrive at the two-dimensional stochastic dynamical system
3.7
with *G*(*z*)=*F*(*z*)/*k*. Here, *η*_{1} and *η*_{2} are independent white Gaussian noises with zero mean and unit variance; *σ*_{1} and *σ*_{2} are the standard deviations of the stochastic forcings. This system will be studied in the following.

The model (3.7) is a slow–fast system of the relaxation oscillator type. Its deterministic part is similar to the FitzHugh–Nagumo model [30,31]. The curvature *k* of the potential, that is, the strength of the restoring force is the time-scale ratio of the slow and the fast variable, corresponding to a small time-scale separation parameter *ε*=1/*k*. For *k*≫1, *z* is the fast variable; it will be identified with the isotope ratio given in the ice-core records as a proxy for palaeotemperature. Here, *w* is a recovery-type slow variable providing the destabilization process for the relaxation oscillator; its physical nature is not specified here. The slow manifold of the system is given by the *z*-nullcline as *w*=*G*(*z*). The *w*-nullcline is the straight line *z*=*z*_{0}. The present model is similar to a recently proposed model for glacial–interglacial cycles [24,28]. The structure of the slow manifold will here be inferred from data rather than prescribed. The author believes this is a novelty both in the palaeoclimate and applied mathematics communities.

The stochastic terms are included to enhance the realism of the model. The ice-core records are generated by an extremely complex dynamics involving many spatial and temporal scales. Any model as simple as the one considered here cannot be expected to provide more than a kind of ‘skeleton’ dynamics of the system. The noise terms account for the influence of unresolved scales on the variables *z* and *w* in the spirit of stochastic climate models. Moreover, they are meant to account for structural model error. From a statistics perspective, the stochastic terms keep an appropriate level of uncertainty consistent with the level of predictability present in the data (cf. [32]). Of course, even with the noise forcing the model is still crude; there may be systematic model biases on top of that, and also the Gaussian white noise is just an approximation to a more complex situation. It is worth noting that the additive noise in (*z*,*w*)-space amounts to multiplicative noise in (*z*,*v*)-space.

The stationary probability density function of the model (3.7) is generally not tractable analytically. The Duffing–van der Pol oscillator in (*z*,*v*)-space with noise only on the *v*-equation, that is, the system , has stationary probability density [33]. The stationary probability density of model (3.7) is even more complex and flexible. The system (3.7) breaks detailed balance and thus the time-reversal symmetry of the joint stationary probability density: *p*_{st}(*z*,*w*,*t*;*z*^{′},*w*^{′},*t*^{′})≠*p*_{st}(*z*^{′},−*w*^{′},*t*;*z*,−*w*,*t*^{′}), and thus generally *p*_{st}(*z*,*t*;*z*^{′},*t*^{′})≠*p*_{st}(*z*^{′},*t*;*z*,*t*^{′}).

For *α*<0, there is an unstable branch of the slow manifold extending on the interval [*z*_{*}−(|*α*|/*β*)^{1/2},*z*_{*}+(|*α*|/*β*)^{1/2}]. The deterministic part of the model possesses a single fixed point at (*z*_{0},0) given by the intersection of the *z*- and *w*-nullclines. For 4*k*−*f*^{2}(*z*_{0})>0, the eigenvalues of the Jacobian at the fixed point are *λ*_{1,2}=*μ*±i*ω* with *μ*=−*f*(*z*_{0})/2 and *ω*=[*k*−*f*^{2}(*z*_{0})/4]^{1/2}. The fixed point then is a focus characterized by an *e*-folding time *τ*_{e}=1/|*μ*| and a period *T*=2*π*/*ω* (the time between passing the same angle in the (*z*,*w*)-plane). If *f*(*z*_{0})>0, the focus is stable. If the system is close enough to the bifurcation point, it is excitable [34]. Under stochastic forcing, the model exhibits noisy oscillations corresponding to a stochastic limit cycle with a noise-induced eigenfrequency. There is an optimal noise variance at which the noise-induced oscillations are most coherent; this is the regime of coherence resonance [35,34]. At *f*(*z*_{0})=0, the system undergoes a supercritical Hopf bifurcation. If *f*(*z*_{0})<0, the focus is unstable. The system then has a unique stable limit cycle as its global attractor, performing self-sustained oscillations even in the absence of external perturbations or stochastic forcing. The limit cycle consists of two pieces of slow motion connected by fast jumps. A strong enough stochastic forcing can significantly alter the amplitude and the period of the limit cycle. For 4*k*−*f*^{2}(*z*_{0})<0, the fixed point becomes a node; this regime will not be relevant here.

When integrating the system (3.7) for a long time, we have , where 〈⋅〉 denotes the average with respect to the stationary probability density of the system. Averaging over the *w*-equation yields 〈*z*〉=*z*_{0}. Thus, the parameter *z*_{0} is the mean state of the model. It appears reasonable to require that the mean state of the model matches the mean state in the data. The model is here formulated, as an anomaly model; we therefore set *z*_{0}=0, eliminating one parameter from the estimation problem.

The number of parameters in the model (3.7) to be estimated from the ice-core data is *p*=6 (*α*, *β*, *z*_{*}, *k*, *σ*_{1} and *σ*_{2}).

### (c) Parameter estimation

The unscented Kalman filter [36] is used for parameter estimation in the relaxation oscillator model; it is applied to the augmented state vector formed by merging the state vector and the parameter vector. The suitability of the unscented Kalman filter for calibrating simple stochastic dynamical systems has been demonstrated [12,23], particularly in a palaeoclimatic context. For systems with high dynamical noise level, the unscented Kalman filter is preferable to the more common extended Kalman filter as the superior covariance propagation of the unscented Kalman filter has a visible effect [23]. Here, only the *z*-component of the oscillator is observed. The implementation of the unscented Kalman filter follows closely that detailed in the study of Kwasniok & Lohmann [23] for the Duffing oscillator with the obvious modifications. The step size in the Kalman filter is set to *h*=*δt*/100=0.5 years. The dataset spanning a period of 40 000 years with 800 data points is too short to obtain well-converged estimates for the parameters. Therefore, the data are processed 10 times to improve the estimates [12,23].

For estimation of the noise levels, we follow a likelihood approach [32]. The uncertainty is dominated by the dynamical noise [32] as the measurement error in the isotope records is small. We therefore set the observational noise variance in the Kalman filter to virtually zero and only the dynamical noise levels *σ*_{1} and *σ*_{2} are to be determined. The predictive likelihood of the Kalman filter [32] is calculated on a fine enough mesh in (*σ*_{1},*σ*_{2})-space and the maximum found. A separate issue is the dating uncertainty in the ice-core records, which is not considered here. Its size is hard to quantify and it is unclear how to incorporate it into the Kalman filter.

## 4. Mixture of linear stochastic models

Complementarily to the conceptual models considered before, the ice-core records are modelled using a collection of simple local stochastic models, associated with different dynamical regimes. This is a generic modelling approach from time-series analysis, not *a priori* referring to a particular dynamical mechanism or paradigm. Nevertheless, the inferred regimes and models may be interpreted afterwards. Local modelling appears to be suitable for systems with strongly nonlinear threshold effects and switches such that no global model is easily available. Two variants of the mixture model with slightly different philosophies are considered: a Markov-switching mixture model and a cluster-weighted mixture model.

### (a) Markov-switching mixture model

#### (i) Model formulation

The existence of a finite number of different dynamical regimes is assumed, which cannot be observed directly. The number of regimes is *M* and the integer variable *c* takes values from 1 to *M*, according to which regime the system is in. The dynamics of the regimes is modelled by a Markov chain described by an (*M*×*M*) transition matrix **A** with components
4.1
The matrix **A** is a row-stochastic matrix, that is, *A*_{ij}≥0 and . The initial probability distribution of the regimes is
4.2
Each regime is associated with a first-order autoregressive process (AR(1) process) modelling the data
4.3
*η*_{i} are independent white Gaussian noise processes with zero mean and unit variance. The local model corresponds to a conditional predictive probability density given by
4.4
The number of regimes *M* is a hyperparameter of the model, controlling its overall complexity. For *M*=1, the mixture model is an AR(1) process. The number of independent parameters is *M*^{2}+3*M*−1 (*M*^{2}+4*M* parameters and *M*+1 constraints).

#### (ii) Parameter estimation

Given a learning dataset of length *N*, {*z*_{0},…,*z*_{N}}, the parameters of the model are estimated by maximizing the likelihood function *L*(*z*_{1},…,*z*_{N}|*z*_{0}). A suitable expectation–maximization (EM) algorithm [37,38] is devised for this task. In the present model setting, this is a forward–backward algorithm similar to the Baum–Welch algorithm [39] for hidden Markov models. Two auxiliary probabilities are introduced: the forward variable *α*_{i,n}=*p*(*z*_{1},…,*z*_{n},*c*_{n}=*i*|*z*_{0}) and the backward variable *β*_{i,n}=*p*(*z*_{n+1},…,*z*_{N}|*z*_{n},*c*_{n}=*i*). They obey the recursion relations
4.5
with initialization *α*_{i,0}=*θ*_{i} and
4.6
with the definition *β*_{i,N}=1. We further introduce the posterior probabilities
4.7
and
4.8
For information purposes, a weight *w*_{i} is introduced as
4.9
which can be interpreted as the fraction of time spent in regime *i* or the fraction of data points accounted for by model *i*. The likelihood function of the data is
4.10
The EM algorithm is iterative. Having arrived at parameter estimates , , , and , the expectation step of the (*m*+1)th iteration consists of calculating the posterior probabilities and . In the maximization step, the parameters are re-estimated according to the following equations:
4.11
with , , , , ,
4.12
4.13
4.14
The algorithm is monotonically non-decreasing in likelihood and converges to a maximum of the likelihood function *L*. Different initial guesses for the parameters should be tried as the likelihood landscape may have multiple local maxima.

### (b) Cluster-weighted mixture model

#### (i) Model formulation

The methodology of cluster-weighted modelling [40,41] is suitably adapted here. The local model is conditional on clusters in state space. We here choose to condition the model on the current state *z*(*t*) and the increment *δz*(*t*)=*z*(*t*)−*z*(*t*−*δt*); this corresponds to a two-dimensional time-delay embedding of the system. A data point (*z*,*δz*) is mapped to a discrete state by partitioning the (*z*,*δz*)-space into bins. The *z*-space is divided into the two disjoint intervals and . We choose for the GRIP data and for the NGRIP data, which is the pronounced local minimum of the probability density of the data, separating the two modes of the bimodal distribution representing the cold and the warm state. The *δz*-space is divided into the two disjoint intervals and , corresponding to downward and upward trajectories. The binning is summarized in a single integer variable *s*, taking values from 1 to 4. We have *s*=1 if and *δz*>0; *s*=2 if and *δz*>0; *s*=3 if and *δz*≤0; *s*=4 if and *δz*≤0.

We introduce *M* clusters in the discrete *s*-space. Each cluster has an overall weight or probability of that cluster being chosen,
4.15
and a clustering probability distribution
4.16
describing its domain of influence in *s*-space. The parameters of the clusters satisfy a number of probabilistic constraints. The overall weights form a probability distribution: *w*_{i}≥0, . The clustering probability distributions satisfy *ψ*_{ik}≥0 and . The clusters are required to add up to the climatological distribution (invariant measure) of *s*, that is, , where *ρ*_{k} is empirically given as the fraction of data points in these bins: .

Each cluster is associated with an AR(1) process as given by equation (4.3), corresponding to a conditional predictive probability density *N*_{i,n+1} given by equation (4.4). The total predictive probability density is modelled as a sum over the clusters,
4.17
The state-dependent model weights are given by Bayes’ rule as
4.18
The mean local model weights are found to be . Hence, the overall model weight *w*_{i} can be interpreted as the fraction of the dataset accounted for by the model *i* or the fraction of time spent in cluster *i*.

The number of independent parameters in the model is 7*M*−4 (8*M* parameters and *M*+4 independent constraints).

#### (ii) Parameter estimation

As before, given an equally sampled learning dataset of length *N*, {*z*_{0},…,*z*_{N}}, the parameters of the cluster-weighted mixture model are estimated according to the maximum-likelihood principle using a suitably devised EM algorithm. The likelihood function of the data is
4.19
The expectation step of the (*m*+1)th iteration consists of calculating the posterior probabilities
4.20
In the maximization step, the parameters are updated according to the re-estimation formulae
4.21
and
4.22
Equations (4.11) and (4.12) again apply here. The constraints on the parameters are satisfied by the algorithm provided the initial guesses to satisfy them.

## 5. Results and discussion

### (a) Model comparison and selection

The various models were estimated from the GRIP and NGRIP ice cores as described previously. Table 1 summarizes the obtained values of the likelihood function. The likelihood function reflects the (in-sample) predictive capability of the various models as it is inversely proportional to the mean ignorance score [42] of probabilistic one-step predictions of the data. As a guideline for model comparison and selection, we use the Bayesian information criterion given by
5.1
where *p* is the number of parameters in the model, *N* the length of the time series (here *N*=800) and *L* the likelihood function for the optimal parameter set. The Bayesian information criterion quite strongly penalizes additional parameters. This appears to be very suitable in the present situation. Given the high noise level and the shortness of the ice-core records, they do not contain much information and one would therefore like to keep the number of parameters to a minimum and go for the most parsimonious model.

### (b) Bistable potential model

One-dimensional potential models as outlined in §3*a* were estimated from the GRIP and NGRIP time series using the procedure detailed in [12,32] based on the unscented Kalman filter. The number of parameters is *p*=4 (*a*_{4}, *a*_{3}, *a*_{2}, *σ*) as the parameter *a*_{1} is fixed by a mean-matching condition [12]. The estimated parameter values are *a*_{4}=0.33, *a*_{3}=−0.74, *a*_{2}=−1.28, *a*_{1}=3.58 and *σ*=4.1 for GRIP, and *a*_{4}=0.18, *a*_{3}=−0.28, *a*_{2}=−0.93, *a*_{1}=1.72 and *σ*=3.9 for NGRIP. Both potential landscapes are bistable, having a deep well corresponding to a cold stadial state and a shallow well corresponding to a warm interstadial state (not shown), in accordance with earlier findings [12,13]. The shallow potential well is a bit deeper than that obtained in [12] for the NGRIP data, the potential here being further away from degeneracy. This is due to the fact that a slightly different time period of the NGRIP record was used in [12].

### (c) Relaxation oscillator

The relaxation oscillator model was derived from both ice-core records as described in §3*c*. Figure 2 displays the log-likelihood in the final sweep through the data as a function of the noise levels *σ*_{1} and *σ*_{2} for the NGRIP data. Using a mesh of size 0.1 for both *σ*_{1} and *σ*_{2}, there is a maximum at *σ*_{1}=4.8 and *σ*_{2}=1.7. Expectedly, given the shortness and stochastic character of the ice-core records, there is some uncertainty about the noise levels as the likelihood function is quite flat. Figure 3 gives the parameter estimates and their uncertainties produced by the Kalman filter when running through the data. After 10 sweeps, reasonably converged estimates for all parameters are obtained. Taking averages over the last sweep and the final uncertainty estimate, the values of the parameters are *α*=−0.52±0.46, *β*=7.81±0.30, *z*_{*}=0.28±0.03 and *k*=26.36±1.10. The prior estimates and uncertainties used to initialize the Kalman filter were *α*=0±2, *β*=10±5, *z*_{*}=0±1 and *k*=20±10. There is substantial uncertainty in *α*; the other parameters are better determined, with *z*_{*} being particularly well defined. The parameter estimation for the GRIP data looks very similar (not shown). The parameter values are *σ*_{1}=6.6, *σ*_{2}=1.8, *α*=−1.24±0.60, *β*=15.46±0.63, *z*_{*}=0.49±0.03 and *k*=37.59±1.49. In view of the parameter uncertainties, the rounded values listed in table 2 are actually adopted. The stability characteristics of the fixed point, the *e*-folding time and the period are also given.

Figure 4 sketches the state space portrait of the two models. Both models have a small flat unstable branch of the slow manifold as *α* is relatively close to zero. The fixed point is close to marginal stability in both the GRIP and the NGRIP data. Taking the mean parameter estimates at face value, it lies just outside the unstable branch in both cases, with the GRIP model being more stable. This corresponds to the stability characteristics quoted in table 2. In view of the parameter uncertainties, it can actually not be decided whether the NGRIP model is before or beyond the Hopf bifurcation point. For the GRIP data, the fixed point is still stable within one error standard deviation of all parameters. Both systems are excitable. A scatter of data points as estimated by the Kalman filter illustrates the stochastically driven motion of the models. Ten points in the scatterplot correspond to one data point: the *z*-coordinate is given by the data as there is no state uncertainty; the Gaussian uncertainty in *w* given by the Kalman filter after the update step is represented by drawing 10 values at random.

The picture obtained here is consistent with earlier low-order model studies finding a system of the relaxation oscillator type close to a Hopf bifurcation with a limit cycle emerging [17,18]. Signatures of a relaxation oscillator mechanism and excitability owing to noise have also been found in more complex ocean–atmosphere models [5,3,7].

The relaxation oscillator model is clearly better supported by the data than the bistable potential model. The likelihood in the GRIP data is higher by a factor of e^{14}≈1.2×10^{6}, and it is clearly preferred by the Bayesian information criterion. For the NGRIP data, the likelihood ratio is e^{13.2}≈5.4×10^{5}. The bistable potential model is able to clearly outperform in likelihood in both datasets an AR(1) process as a simple benchmark.

There is a link between the relaxation oscillator model and one-dimensional potential dynamics. The *z*-equation of the model (3.7) can be written as a one-dimensional Langevin equation,
5.2
where *ϕ* is a non-stationary effective one-dimensional potential modulated by the slow variable *w*,
5.3
We have immediately set *z*_{0}=0 for simplicity. The effective potential constantly undergoes bifurcations during the DO cycles. This is depicted in figure 5, showing the effective potential for some values of *w* consistent with the dynamics derived from the data (cf. figure 4). Starting, for example, in the stadial state and *w*=−2, a rapid transition to the interstadial state is triggered by *w* drifting to positive values and rendering the stadial state highly unstable. A transition back to the stadial state is triggered by *w* drifting back to negative values, rendering the interstadial state unstable. Stochastic forcing is needed to excite the system and it adds an irregular random component to the cycle. The present scenario is conceptually very different from noise-induced transitions between two available states in a stationary bistable potential [12,13]. Here, *w* modulates only the linear term of the effective potential (equation (5.3)); it is therefore interesting to note that a mathematically equivalent and indistinguishable effective potential dynamics arises if *w* is a prescribed external forcing rather than being coupled to *z*. A physical interpretation of the variable *w* may be fresh water fluxes [5,17,2], possibly hinting at a triggering of DO events by Heinrich events or even a two-way coupling between DO events and Heinrich events [7].

Cessation of the DO events at the end of the glacial period probably relates to a stochastic bifurcation owing to a stabilization of the oscillator such that the system loses excitability and the warm state cannot be realized any more. The exact nature of this critical transition is unclear and needs further investigation (cf. [43]). The relaxation oscillator picture modifies the interpretation of the recent finding of a bifurcation at the end of the glacial period [13,44]. There is a change in the probability density of *z* from bimodal to unimodal, but for an understanding of its dynamical origin, the one-dimensional potential model framework is insufficient as the system is of higher dimension (cf. [45]).

Various extensions of the relaxation oscillator model are possible.

— As a test, the model was driven by red rather than white noise. Two Ornstein–Uhlenbeck processes

*ξ*_{1}and*ξ*_{2}were defined to drive the model instead of*η*_{1}and*η*_{2}, introducing two more parameters. However, it turns out that the increase in the likelihood function is only rather minor and the Bayesian information criterion clearly prefers the white noise model over the red noise model.— A damping term −

*λw*was added to the*w*-equation and then*z*_{0}kept as a parameter to be estimated. This corresponds to a tilted*w*-nullcline as in the FitzHugh–Nagumo model. It turns out that*λ*is actually very small, that is, the*w*-nullcline is almost vertical as is assumed in the model (3.7).— A fourth-order potential

*V*(*z*)=*a*_{4}*z*^{4}+*a*_{3}*z*^{3}+*a*_{2}*z*^{2}+*a*_{1}*z*was assumed instead of equation (3.4) and a suitable Liénard transformation used. This possibly combines the relaxation and the bistability mechanisms. The fourth-order potential is found not to be bistable; it just provides a rather small correction to the quadratic potential. This further strengthens the finding that the relaxation oscillator mechanism is clearly better supported by the data than the bistability paradigm.— Assimilation of data also for

*w*would add much information to the model identification problem. The more general model , with*G*(*z*)=*az*^{3}+*bz*^{2}+*cz*+*d*would then be identifiable.— Another modification worth studying is the inclusion of an external forcing to modulate the output of the model. Stochastic resonance effects, possibly together with a coherence resonance, could be investigated (cf. [3]). The timing of the excitation may then be related to the phase of the external forcing. A particular topic could be the ability of the model to produce some facsimile of a Bond cycle (bundling of DO events, cf. [17]), which the model described here is not able to represent.

— A three- or higher-dimensional slow–fast system could be investigated.

### (d) Mixture of linear models

Markov-switching and cluster-weighted mixture models were derived from both ice-core records as outlined in §4 for numbers of regimes *M* ranging from 1 to 5. For both datasets and both model settings, a model with *M*=3 regimes is sharply chosen by the Bayesian information criterion (table 1). The mixture models are vastly superior in likelihood to the relaxation oscillator and potential models and overwelmingly preferred by the information criterion. Their architecture allows a quite high degree of complexity to be modelled, including a flexible multiplicative noise. In the GRIP data, the Markov-switching model is clearly better than the cluster-weighted model; in the NGRIP data, the cluster-weighted model is slightly better.

Table 3 lists the estimated model parameters. Each linear model has a fixed point at towards which the state is relaxed and which is the mean state of the stochastic process. The transition matrices for the Markov-switching model are 5.4 for the GRIP data, and 5.5 for the NGRIP data.

The first regime corresponds to a prolonged lingering around the cold state, showing up in the AR(1) picture as a quite strong relaxation towards that state. This is the most populated regime with *w*_{1}=0.572 in the GRIP model and *w*_{1}=0.595 in the NGRIP model. The second regime has a large positive shift; it can be clearly interpreted as a rapid warming event. The warm state is quickly reached and there is little lingering around it in the data. This is reflected by a very strong relaxation (small *b*_{2}) in the model. As expected, the rapid warming regime only accounts for a small fraction of the time series. Its weight is *w*_{2}=0.085 in the GRIP data and *w*_{2}=0.059 in the NGRIP data. The third regime describes the gradual cooling following the rapid warming event. It exhibits a slow relaxation out of the warm state towards a medium state. The weight of the third regime is *w*_{3}=0.343 in the GRIP record and *w*_{3}=0.346 in the NGRIP record.

The transition matrices reveal a preferred cycle among the regimes. The first regime is very persistent and is left only for the rapid warming regime according to a geometric waiting time distribution with success probability *A*_{12} implying a mean waiting time *δt*/*A*_{12}, which is 1320 years in the GRIP model and 1160 years in the NGRIP model. In continuous time, this corresponds to a Poisson process with exponential waiting time distribution. The rapid warming state is short lived, being somewhat more persistent in GRIP. It is preferably left for the third regime, but there is some probability of going back to the first regime, corresponding to an aborted warming event. The gradual cooling regime is again very persistent. It is left only for the first regime. The mean residence time in the third regime is *δt*/*A*_{31}, which is 980 years in the GRIP data and 880 years in the NGRIP data. This dynamical pattern is consistent with the phases of the relaxation oscillator model, the probabilistic component of the cycle being due to the stochastic forcing.

The cluster-weighted mixture model reveals a very similar picture based on clusters in state space. Also, the parameter structure of the models is very similar (table 3). Figure 6 shows the location of the clusters and figure 7 the corresponding local model weights. The bins in the order from 1 to 4 correspond to a coarse-grained representation of the oscillation cycle as *δz* is a velocity-like variable. Cluster 1 describes a lingering around the cold state; cluster 2 is the DO event; cluster 3 is the relaxation state.

Figure 8 displays the time series of the regime posterior probabilities *δ*_{i,n}. This allows for a classification of time-series episodes into the regimes. In the Markov-switching model, owing to the persistence of regimes 1 and 3, there are prolonged episodes in these regimes that coincide well with what is visible in the time series. For example, the gradual cooling between 55 000 and 50 000 years before present is classified as regime 3 with probability almost 1; the period from 32 000 to 20 000 years is almost entirely in regime 1, except for three rapid warming events and two occasions of regime 3 contribution after warming events. Expectedly, in the cluster-weighted mixture model, the posterior distribution is less sharp. Nevertheless, there are still prolonged episodes where one of the clusters has the largest posterior probability.

The posterior probabilities allow for an objective identification of DO events, based on the underlying dynamical evolution. Figure 9 gives the posterior probability of the second regime only for instants in time where it is the largest out of the three regimes. Alternatively, one could define a threshold for *δ*_{2,n} above which a data point is classified as a DO event. The occasions flagged as DO events in this way largely coincide with the conventional identification of DO events by eye (events 2–17 in the interval from 60 000 to 20 000 years before present), except for some splitting of events. In the GRIP dataset, the event 15 is split into two events with both models, with the cluster-weighted model also the event 16. In the NGRIP data, the events 16, 15, 14 and 13 are split into two events with both models. In particular for event 15, it appears reasonable that dynamically it should be regarded as two events. This would have consequences for the analysis of waiting time statistics and conclusions about dynamical mechanisms based on them. The (somewhat weak) event 9 is identified as a DO event in both records with both models. In the GRIP data, the cluster-weighted model identifies with low probability an additional event at about 39 500 years before present, corresponding to a small spike visible in the record. Setting a probability threshold of, for example, 0.5 would eliminate this event. Overall, the cluster-weighted model gives a somewhat cleaner identification of the DO events, corresponding to a lower weight *w*_{2} of that state than in the Markov-switching model. The reason is that the rapid warming event is a short-lived rather than a metastable state.

Figure 10 illustrates the distribution of waiting times between onsets of DO events obtained from the different models. In the Markov chain, conditional on being in regime 1 (or 3), the waiting time until the next stadial–interstadial (or interstadial–stadial) transition follows a geometric distribution. The distribution for the total waiting time between successive onsets of DO events, that is, between successive entries into regime 2, is then more complex. The probability density has a maximum at a finite value, giving the preferred waiting time of about 1500 years. This is basically in accordance with recent work on the stochastic nature of DO events [47]. The waiting time distribution based on the data points flagged as DO events (figure 9) is in accordance with the distribution from the Markov chain for both the Markov-switching and the cluster-weighted model. The distribution based on the timings of onsets of DO events according to the conventional classification [46] has fewer short waiting times.

### (e) Long-term integration

Long-term integrations of the various models are performed in order to check to what extent they are able to reproduce certain statistical properties of the ice-core data. The potential and the relaxation oscillator model are integrated using the Euler–Maruyama scheme with step size 10^{−5}; the mixture models are iterated according to their one-step predictive probability density.

Figure 11 shows the stationary probability density of the models and that of the ice-core records. All models capture the bimodality of the distribution and the position and population of the cold and warm state but with different accuracy. The potential model fails to capture the clear separation of the states with a minimum in between. The oscillator model improves on this. The Markov-switching mixture model has two somewhat exaggerated distinct modes as, owing to the persistence of regimes 1 and 3, the probability density is basically a sum of two Gaussians corresponding to the models 1 and 3. The cluster-weighted mixture model has a notably less distinct minimum in the GRIP case.

Figure 12 displays the third-order moment *M*(*τ*)=〈*z*(*t*)*z*^{2}(*t*+*τ*)−*z*^{2}(*t*)*z*(*t*+*τ*)〉 as a function of the time lag *τ*. This is a measure for time-reversal asymmetry [48]. We have *M*(0)=0 and, for a mixing system, . The Markov-switching mixture model captures the pronounced asymmetry, up to a lag of 400 years with the correct strength; the asymmetry in the cluster-weighted model is somewhat weaker due to the inherent interpolation between the clusters. For large lags *τ*, the influence of the non-stationarity of the ice-core data on the statistic *M*(*τ*) is unclear. The relaxation oscillator model has too little instability to display pronounced time-reversal asymmetry. It carries an important basic dynamical mechanism but is too simple to model the strong switches in palaeoclimate dynamics. This is already evidenced by the much worse likelihood compared with the mixture models. The potential model is intrinsically not able to have any time-reversal asymmetry.

Figure 13 shows sample trajectories of the models in comparison with the anomaly dataset for NGRIP. All models reproduce the correct amplitude and time scale of the switches between the cold and the warm state. The mixture models, in accordance with the results from figure 12, also have some visible time-reversal asymmetry, that is, the characteristic saw-tooth shape of the warming events. The other models fail to show this feature.

## 6. Conclusions

Glacial climate variability was analysed and modelled using several different simple dynamical systems. Models and data are combined directly by using palaeoclimatic records to estimate parameters in conceptual models. This yields a natural likelihood-based statistical framework for model comparison and model selection. The paradigm of relaxation oscillations as the underlying dynamical mechanism of DO events [5,17,7] is adopted and a conceptual model formulated. The relaxation oscillator model is found to be much better supported by the data than a bistable potential model considered earlier [12,13]. Two mixtures of linear stochastic models are considered, yielding a decomposition of the dynamics into local models, each associated with a regime or cluster. For both of these, the best results were obtained with three local models, corresponding to the phases of the relaxation oscillation.

In summary, the dynamics of DO events appears to be formed by a deterministic ‘skeleton’ dynamics based on a relaxation oscillator mechanism, determining the amplitude and the preferred time scale of the DO cycle. Noise acts as a crucial agent to excite the system. Due to the high overall noise level, the onset of individual DO events is largely stochastic and unpredictable.

The simple models considered here may serve as null models against which to test further hypotheses about the dynamics and regularity of DO events based on simulations.

## Footnotes

One contribution of 13 to a Theme Issue ‘Mathematics applied to the climate system’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.