## Abstract

Approaching a dangerous bifurcation, from which a dynamical system such as the Earth's climate will jump (tip) to a different state, the current stable state lies within a shrinking basin of attraction. Persistence of the state becomes increasingly precarious in the presence of noisy disturbances. We argue that one needs to extract information about the nonlinear features (a ‘softening’) of the underlying potential from the time series to judge the probability and timing of tipping. This analysis is the logical next step if one has detected a decrease of the linear decay rate. If there is no discernible trend in the linear analysis, nonlinear softening is even more important in showing the proximity to tipping. After extensive normal-form calibration studies, we check two geological time series from palaeo-climate tipping events for softening of the underlying well. For the ending of the last ice age, where we find no convincing linear precursor, we identify a statistically significant nonlinear softening towards increasing temperature.

## 1. Introduction

The on-going work by the United Nations, following the major report of the IPCC [1] and the Climate Change Conferences in Copenhagen and Cancun (Mexico), centres on the prediction of future climate change, a key feature of which would be any suggestion of a sudden and (possibly) irreversible abrupt change called a tipping point [2,3]. Many tipping points, such as the switching on and off of ice ages, are well documented in palaeo-climate studies over millions of years of the Earth's history.

Predicting such tippings in advance using time-series data derived from preceding behaviour is now seen as a major challenge, impinging, for example, on the possible use of geo-engineering [4]. Techniques introduced by Held & Kleinen [5] and Livina & Lenton [6] draw on the assumption that the tipping events are governed by a bifurcation in an underlying nonlinear dissipative dynamical system. Specifically, researchers search for a slowing down of intrinsic transient responses within the data, which is predicted to occur before most bifurcational instabilities. This is done by determining a so-called propagator, estimated from the correlation between successive elements of a window sliding along the time series (this estimate is called *κ*_{ACF} in this paper). This propagator, such as the first-order autoregressive (AR(1)) mapping coefficient, is a measure of the linear decay rate and should increase to unity at a tipping instability (corresponding to a decrease of the linear decay rate to zero).

Prediction techniques can be tested on climatic computer models, but more challenging is to try to predict real ancient climate tippings, using their preceding geological data provided by ice cores, sediments, etc. Using this preceding data alone, the aim would be to see to what extent the actual tipping could have been predicted in advance.

One such study by Livina & Lenton [6] looks at the end of the most recent ice age and the associated Younger Dryas event, about 11 500 years ago, when the Arctic warmed by 7^{°}C in 50 years. This pioneering study used a time series derived from Greenland ice-core palaeo-temperature data. A second such study (one of eight made by Dakos *et al*. [7]), using data from tropical Pacific sediments, gives a good prediction for the end of ‘greenhouse’ Earth about 34 million years ago when the climate tipped from a tropical state into an icehouse state. Lenton *et al.* [8] give a complete overview of current techniques for extracting early warning signals from time series and compare them on realistic models and a range of palaeo-climate time series.

A remark that is made during the analysis of Lenton *et al.* [8] is that the absolute values of the extracted quantities have no direct bearing on the tipping probability over time unless one also knows the size of the *basin of attraction*. This basin is determined in the simplest cases by the dominant nonlinear term in the underlying equations of motion, which is the subject of this paper.

## 2. Concepts from nonlinear dynamics

The core techniques for analysis of time series of climate data aim to extract the linear decay rate towards a (possibly drifting) noise-perturbed equilibrium [5,6]. These techniques can be directed at the Earth's climate as a whole, or at the relevant climate subsystems described by Lenton *et al.* [2] as *tipping elements*. Our aim here is to augment this linear information with information about nonlinear features of the underlying dynamics, by examining to what extent we can identify nonlinear softening and include it in the list of early warning signs for climate tipping.

### (a) Shrinking basins around dangerous bifurcations

As we approach a dangerous bifurcation, from which a nonlinear dissipative dynamical system will experience a finite jump to a remote alternative state, the current attractor is located within a shrinking basin of attraction. Correspondingly, the maintenance of the state will become increasingly precarious in the presence of noisy disturbances.

Thinking in terms of an underlying potential energy function (the existence of which is theoretically well defined for a saddle–node fold and some other bifurcations), the parabolic shape of its graph will become increasingly perturbed by softening nonlinear features.

The relevant bifurcations are the so-called *codimension-one* events [9,10], namely those that can be typically encountered under a gradual change of a single control parameter. The complete list of the dangerous codimension-one bifurcations is given in table 1, where we give a brief description of the nature of the shrinking basin. More details can be found in Thompson & Sieber [11]. Focusing first on the local bifurcations (table 1*a*,*b*), the shrinking boundaries are illustrated schematically in figure 1, with the control parameter plotted horizontally and the response plotted vertically.

In figure 1*a*, we illustrate both types of the saddle–node fold bifurcation: the static fold involving a path of equilibrium fixed points, and the cyclic fold involving a trace of periodic orbits. For these, the basin shrinkage is one-sided. The basin is bounded by the unstable side of the parabolically folding path.

Next, in figure 1*b*, we illustrate the local subcritical bifurcations, of which the Hopf, flip and Neimark are codimension-one (we remember that the more familiar static pitchfork bifurcations are codimension-two, being structurally unstable against a symmetry-breaking perturbation). Here we see an unstable solution, which controls the basin boundary, shrinking parabolically around the stable solution, from which the (noise-free) system jumps out of our field of view as the control parameter, *μ*, reaches its critical value of *μ*_{C}.

The static fold and e Hopf bifurcation have a theoretically well-defined potential energy surface that neatly summarizes the shrinking basin, as illustrated in figure 2. Note that figure 2*a* will be discussed more fully in §2*b*.

### (b) A closer look at the fold

The fold is the simplest and most common way in which an equilibrium of a nonlinear dissipative dynamical system can lose its stability. Having only a single active degree of freedom (*x*) and being observable under the variation of a single control parameter (*μ*), it can be illustrated on a graph of *x* against *μ* as a smooth curve that simply reaches an extreme value (a maximum, say) of the control *μ*, as shown in figure 1*a*. So although it is traditionally called a saddle–node bifurcation, it does not exhibit any obvious ‘bifurcation’ of a path, but rather just a smooth folding of an existing path. In the case of a static fold (to which we shall largely restrict our attention), the path is a trace of equilibrium fixed points, while for a cyclic fold the path would be a trace of periodic solutions.

Assuming that we are close to a fold, the intrinsic damping of the system will have become super-critical so the system will have the non-oscillatory response of an over-damped particle sliding (in thick oil, as we might imagine) on a notional potential energy surface *U*(*x*,*μ*). This is illustrated for the fold in figure 2*a*, where we show the potential energy surface erected over the (*x*,*μ*) base plane. Here, the solid part of the equilibrium path, corresponding to energy minima, is a stable node, while the dashed part, corresponding to energy maxima (more generally, a geometrical saddle in higher dimensions) is a dynamically unstable saddle.

We see that as the control *μ* is slowly increased towards its critical value, *μ*_{C}, the stable equilibrium solution gets closer and closer to the hill-top potential barrier, and so gets progressively more precarious in the presence of noisy disturbances. With *μ* increasing at an infinitesimal rate, noise will ensure escape over the hill-top before *μ* reaches *μ*_{C}. If, however, *μ* increases rapidly, this early escape may be delayed or suppressed, as we have examined quantitatively elsewhere [12].

To give a greater perspective to our view of the fold it is useful to introduce another control parameter *μ*_{2} such that our parameter space is now represented jointly by the (*μ*_{1},*μ*_{2})-plane. We can now erect an equilibrium surface, of height *x*, over the two-dimensional control plane, as illustrated in figure 3. Here, we see two fold lines whose projections divide the control plane into regimes exhibiting either one or three coexisting solutions. These fold lines coalesce and vanish at the cusp. This cusp is a codimension-two phenomenon, which is typically only observed when we have independent control of both parameters (*μ*_{1} and *μ*_{2}). This is made transparent by the fact that a typical path in control space cannot be expected to pass precisely under the cusp point; by contrast, if a parameter path passes through a fold line, all near-by paths also cross this fold line, which is what ‘the fold has codimension-one’ means. For an appropriate choice of path in the (*μ*_{1},*μ*_{2})-plane through the cusp, one sees a pitchfork bifurcation [10].

We will consider the case in which the main equilibrium sheet (in grey) is stable while the narrow inverted sheet (white) is unstable.

A climate system will have a large number of possible control parameters, but in trying to predict a tipping point, we will be studying a particular evolution that corresponds to a particular route in control space; this is precisely why we are only likely to encounter a fold, and not a cusp, in figure 3. To explore possible scenarios that might underlie a tipping study, it is now useful to examine some possible routes in our two-parameter model of figure 3.

Consider, first, the simple scenario in which the system parameters take path BB^{′} leading transversally through a fold line. This is the classical encounter with a fold that is implicitly envisaged in earlier work. In the presence of noise, a premature tip from N occurs with a certain probability depending on the noise level and the speed with which the path is traversed, as analysed by Thompson & Sieber [11]. If the noise level was particularly high, a jump from N could easily be perceived as a purely noise-induced jump with no bifurcation, even though it is actually caused by the proximity to the fold: this could hold even if there were no movement in the control space at all, the system having rested at N throughout (Ashwin *et al.* [13] call this N-tipping). Another scenario, not specifically illustrated in the figure, could arise if the route in control space approached the fold line, but then turned back away from it. Analysis would then show a temporary decrease of the strength of the attraction (with the AR(1) coefficient moving towards unity), which might be discounted as a false alarm, even though the system did approach a fold and a noise-induced escape was temporarily probable.

Finally, in the scenario of path AA^{′}, one would observe a temporary increase of the AR(1) coefficient, even though no bifurcation is apparent but rather a gradual shift of the steady state. This shows that a rapid transition can be related to a nearby fold that was just barely missed. Similarly, the scenario called R-tipping by Ashwin *et al.* [13] occurs if one changes the other control parameter in figure 3 sufficiently rapidly from the path AA^{′} to N (avoiding the folds). In this scenario, the system jumps to the other stable equilibrium despite having never crossed a bifurcation curve.

## 3. Time-series analysis for leading-order terms: qualitative overview

The extraction of the underlying dynamics from time series is divided into subtasks of increasing difficulty. Assume that we have a time series coming from a process that is either stationary or has a slowly drifting underlying parameter *μ* as suggested by the paths in figure 3. In order to predict (or identify) the dynamics, one attempts to extract the leading orders of the dominant components of the underlying equations of motion. We illustrate the steps using a time series obtained from a process that has the saddle–node normal form with a slowly drifting normal-form parameter *μ* as its deterministic part (corresponding to figure 2*a*) and is influenced by additive Gaussian noise. We permit the parameter *μ* to drift slowly with speed *ε*:
3.1
Here, *x* is the response, *μ* is the control parameter and *t* is the time. The Gaussian perturbation d*W*_{t} has zero mean and unit variance, such that *σ* controls the noise-induced variance added to the dynamics of the deterministic saddle–node form , which has the stable equilibrium . The second equation prescribes the linear sweep of *μ* from its starting value *μ*_{0}>0 towards zero at rate *ε*≪1. For fixed *μ*>0, the unperturbed system (*σ*=0) has a stable equilibrium at and an unstable equilibrium at .

In §4, we will use the saddle–node normal form (3.1) to test the ability of a range of estimators to distinguish a time series generated by a process with a quadratic nonlinearity from a linear time series. Two example time series and their analysis are shown in figure 4. Both time series have identical linear properties but differ in the dominant nonlinear term of the deterministic part of the dynamics. The softening of the full (estimated) nonlinear potential well (corresponding to the illustration in figure 2*a*) is displayed in figure 5.

### (a) Equilibrium: zero order

The location of the slowly drifting equilibrium and its trend is estimated first. Lenton *et al.* [8] call this step ‘detrending’ and propose either piecewise linear fitting or filtering with a Gaussian kernel. The result of filtering with a Gaussian kernel is shown as a thick grey curve in figure 4*a*,*b*(i). Every method for detrending requires a bandwidth parameter (for example, the width of the Gaussian kernel). For figure 4, we chose the bandwidth that the kernel density estimation routine (Matlab's `kde` by Botev *et al.* [14]) offered. The thin black curve in figure 4*b*(i) shows the true value of the quasi-equilibrium (for *ε*=0). It gives a visual estimate of how the reliability of the zero-order estimates depends on the quality of the data. The oscillatory deviations of the extracted equilibrium (thick light grey curve) from its true value (thin black curve) show that the automatically chosen bandwidth was slightly smaller than the theoretical optimum.

### (b) Linear decay rate: first order

Assuming that the deterministic part of the underlying process has a stable equilibrium *x*_{eq} (which is possibly slowly drifting), and that the zero-order estimate is an approximation for *x*_{eq}, the next step is to estimate the linear decay rate *κ* towards *x*_{eq} and the trend of *κ* over time. Generally, if the underlying deterministic process is high-dimensional then these estimates capture the dominant (that is, smallest) decay rate. They are typically applied to the detrended time series, that is, , which is shown in the second row of figure 4.

—

*ACF*. The*k*-step (usually one-step) autocorrelation function (ACF)*α*is fitted directly to the detrended time series: . This was introduced by Held & Kleinen [5] as degenerate fingerprinting for the analysis of climate time series. The ACF*α*is related to*κ*via The estimate for*κ*_{ACF}is shown in figure 4*a*,*b*(iii).—

*DFA*. Detrended fluctuation analysis (DFA) has been introduced by Livina & Lenton [6] for climate time series; see also Lenton*et al.*[8] for a short description of how one extracts*κ*from this analysis.—

*Quasi-stationary density*. If one assumes that the deterministic dynamics of the detrended time series is essentially that of an overdamped particle moving in a potential well , that is, 3.2 then the linear decay rate equals the second derivative ∂_{xx}*U*(0) of*U*at the bottom of the well, . Moreover, the stationary density*p*(*x*) is related to the shape of the potential well*U*via the stationary Fokker–Planck equation 3.3 (the second line being obtained after one integration), where*c*is the constant of integration. This constant satisfies , and can be interpreted as the flow rate (*c*<0 indicates flow towards ). Remembering that the drift speed*ε*of the system parameter*μ*is small, the true time-dependent probability density*p*still satisfies (3.3) approximately at each*μ*: 3.4 In (3.4), we included the dependence of all quantities on*μ*expressly and remember that is of order*ε*. Assuming quasi-stationarity, we neglect the order-*ε*terms and determine the coefficients, −*σ*^{−2}∂_{x}*U*and*σ*^{−2}*c*, by fitting the empirical density*p*_{emp}(*x*) from a time window [*t*−*w*/2,*t*+*w*/2] to equation (3.4). Specifically,*c*_{emp}=*σ*^{−2}*c*is a scalar and ∂_{x}*U*(0) is equal to 0 after detrending, such that one has to fit two coefficients,*κ*_{U}and*c*_{emp}, using (3.4) if one truncates ∂_{x}*U*after first order: 3.5 In problems where no escape is possible, one can drop the term*c*in (3.3) (and, thus, in (3.4) and (3.5)), and simplify (3.3) to 3.6 Livina*et al.*[15] used relation (3.6) (fitting of*U*with higher-order polynomials) to detect potential wells in time series that included rapid transitions. Fitting*U*with a parabola (where*U*(0) is irrelevant and ∂_{x}*U*(0)=0) is equivalent to using the variance of if the density*p*_{emp}is Gaussian. Ditlevsen & Johnsen [16] state that monitoring the empirical variance helps to avoid erroneous detection of false trends.

Lenton *et al.* [8] compare these three estimates, *κ*_{ACF}, the DFA-based estimate for *κ*, and the variance (which is inversely proportional to *κ*_{U}), in detail using time series arising in climate models and in palaeo-climate records. Methods based on properties of the spectrum of the time series were proposed and investigated by Kleinen *et al.* [17] and Biggs *et al.* [18] but are not discussed here. Figure 4 shows in the third row how the estimates of *κ*_{U} from the quasi-stationary density compare with the ACF estimates *κ*_{ACF}. For both time series, the estimates are quantitatively close to the theoretical value of *κ* (which is indicated by the thin black line in figure 4*a*,*b*(iii). We also observe that in figure 4*a*,*b*(iii) the local trends of *κ*_{ACF} from the autocorrelation estimate and of *κ*_{U} based on the quasi-stationary density are strongly correlated, so it is unclear if monitoring both quantities really prevents false alarms.

There is one notable difference between the AR(1) and the DFA estimate on the one hand and the distribution-based estimate *κ*_{U} on the other: the estimate for *κ*_{U} based on the quasi-stationary density only looks at the distribution of the values in the window of interest but does not care about the order in which they appear. This is in contrast to, for example, the AR(1) estimate, which determines the correlation between each value and its predecessor.

#### (i) Estimate of the input noise amplitude

If the AR analysis shows the presence of a single positive dominant coefficient (which gives evidence that the underlying dynamics has really one distinct direction in which the decay is slowest), then one can combine both estimates of the decay rate to obtain an estimate of the amplitude *σ* of the additive noise. Equations (3.3) and (3.5) imply that *κ*_{U} is an estimate for *σ*^{−2}*κ*, where *κ* and *σ*^{2} are the true linear decay rate and input noise amplitude. Thus, if we replace the unknown *κ* by the estimate *κ*_{ACF}, we can use the relation between *κ*_{U} and the true *κ* to estimate *σ*^{2}:
3.7
where *κ*_{ACF} is the estimate of the linear decay rate obtained from the autocorrelation and *κ*_{U} is the estimate obtained from the quasi-stationary density *p*_{emp}. This is an alternative to using the residual of the least-squares fit that one obtains when estimating *κ*_{ACF}. We found that the residual systematically underestimates the noise level in the normal-form examples shown in figure 4.

### (c) Nonlinearity: basin boundary

One problem left open by the linear analysis is that the quantitative value of the estimated linear decay rate *κ* and the estimated noise level do not give any indication for the probability of tipping. What the linear methods estimate is the *α* and the *σ* of a discrete process,
3.8
(where the *η*_{k} are assumed to be independent and normally distributed random numbers). The discrete process (3.8) is the linearization of the time-Δ*t* map of the continuous process (3.2). An estimate for *α* less than but close to unity does not necessarily indicate that we are close to a stability boundary for the nonlinear problem (3.2). Rather, it implies that the time step Δ*t* between successive measurements has been small compared with the mean decay time to half, , of the process:
The *trend* of the estimated *α* is an indicator for incipient tipping because (if extrapolated in any way) it gives an estimate for the time to tipping if one ignores the possibility of early escape. However, the trend of *α* is less certain even in the artificial time series of figure 4. Similarly, the estimated noise amplitude *σ* has to be compared with the coefficient in front of the leading nonlinear term of the right-hand side (which equals −1 in (3.1)). The ratio between *σ* and the nonlinear term enters the estimates for the probability of tipping if there is no discernible upward trend in *α* (N-tipping), or for the probability of early escape if the trend of *α* points upwards [12]. So without at least an order-of-magnitude estimate of the leading nonlinear term, the linear methods only provide an estimate for the tipping time if there is a clearly discernible trend in *α*, and even then this estimate discounts early escape.

Figure 4 illustrates this problem. Apart from a slow trend, both time series in figure 4 have identical properties at the linear level. However, while for the process generating the time series in figure 4*b*(i) the median time to tipping is *t*=100 according to Thompson & Sieber [12] (indeed, it escapes shortly after *t*=120); for the time series in figure 4*a*(i) the extrapolation (correctly) predicts tipping (or rather, linear instability) at *t*≈320. Thus, we observe that the linear properties of the time series alone are not sufficient to provide estimates about tipping, for example, the median time to tipping. The difference between the time series in figure 4*a*(i) and *b*(i) is in the dominant nonlinear term of the deterministic part. That this term plays a role is not surprising because tipping is a feature of nonlinear systems. Extracting its presence from a time series is a necessary step that has to follow the linear analysis. The approaches presented in the following all generalize the estimate for *κ*_{U}, based on the quasi-stationary density, by looking at the full nonlinear potential well associated with the quasi-stationary density via the Fokker–Planck equation (3.4). Figure 5 illustrates what looks like for the time series shown in figure 4*b*. The realization of the noisy, evolving time series is shown in the base plane together with the equilibrium path *μ*=*x*^{2}, and the estimated (empirical) potential *U*_{emp} is shown as the coloured surface. The estimate for the potential well is taken in a sliding window, which explains why the surface can only be determined in the central region of the time series with half the length of the window unrepresented at either end. The numerically estimated equilibrium trend *x*_{eq} is displayed as a black curve at the bottom of the valley of *U*_{emp}. For comparison with this *U*_{emp}, we display at either end the real (nonlinear) potential *U*(*x*,*t*) corresponding to (3.1), which at the later time is already showing the hill-top to the left. The colour in figure 5 shows the deviation of the empirical potential from the linear-theory parabola, which is not itself displayed. The softening, highlighted by the blue coloration, is seen on the left-hand side, which is approaching the hill-top. Transparency of the colour is used to show the number of data points that support a particular part of the surface (as given by the empirical quasi-stationary density ).

Several methods have been proposed to detect nonlinearity in the potential well in the form of this type of softening (or tilting).

—

*Skewness*. Guttal & Jayaprakash [19,20] studied how the closeness of a bifurcation influences the skewness*γ*of the empirical quasi-stationary density from time series. As this approach is also based on the empirical quasi-stationary density*p*_{emp}, it generalizes the estimate*κ*_{U}to non-parabolic features of the well*U*and non-Gaussian features of the stationary density. Guttal & Jayaprakash [19,20] proposed to look at the*trend*of the skewness*γ*over time to detect incipient bifurcations. Figure 4*b*(iv) shows how the skewness, taken over a single window, changes. It is noticeably negative (compare with the empirical skewness from the linear time series in figure 4*a*(iv)) but no trend is discernible.—

*Quasi-stationary density*. Livina*et al.*[15] generalized the potential-well analysis for the quasi-stationary density beyond the estimate*κ*_{U}for the linear decay rate, fitting the potential well to higher-order polynomials of even degree using relation (3.6). This analysis was not used to attempt prediction of transition probabilities from time series but it was applied to time series that already included all transitions to count the number of wells of*U*depending on time*t*(which corresponds to the number of modes (peaks) of the empirical quasi-stationary density*p*_{emp}). See also Beaulieu*et al.*[21] for methods that detect change points in time series.—

*Drift ratio*. If one assumes that the underlying system parameter is approaching a fold more or less linearly and one is sufficiently close to the fold already, then the ratio between the drift Δ*x*_{eq}of the equilibrium estimate and the increase of the estimated linear decay rate Δ*κ*estimates the quadratic term in the normal form [12].

## 4. Quantitative estimates of the nonlinear coefficients

The colour shading of figure 5 suggests that the nonlinear term in the underlying deterministic system shows up in the quasi-stationary density *p*_{emp}. This raises two questions. First, can the level of deviation from linear theory be distinguished with confidence from random chance? That is, can one state, when looking at a time series, that a significant quadratic term is present? Second, is it possible to quantify the size of the nonlinear term with reasonable level of certainty?

Finding a significant nonlinearity is far easier than estimating its precise value because biases introduced during the analysis of the equilibrium and decay rate (zero-order and first-order analysis) may distort the value of the nonlinear estimate but may still keep it significantly different from zero. We will demonstrate this in §5 for two palaeo-climate time series.

### (a) Properties of the stationary density of the saddle–node normal form

Figure 6 shows the dependence of the mean, the variance var *p* and the skewness *γ* of the empirical stationary density *p*_{emp} on the system parameter *μ* for (3.1) as computed using the stationary Fokker–Planck equation (3.3). Note that this corresponds to the stationary process where the random realization is re-injected at whenever it has escaped to . We observe that the dependence of *γ* is strongly nonlinear for the saddle–node normal form with noise (, *σ*=1 in equation (3.1)). This means that trends of the skewness are likely to be difficult to ascertain when approaching a fold. However, the presence of skewness is an indicator for the nonlinearity uniformly for all parameters *μ*, and its sign indicates the direction of escape.

Note also how the nonlinearity affects the empirical mean and the variance var *p*, which is no longer inversely proportional to the linear decay rate *κ*. This shift of the mean and the additional variance is in part due to the fraction of trajectories that escape, and in part due to the non-parabolic shape of the well at some distance from its local minimum.

### (b) Practical estimates of the nonlinear term from time series

We can estimate the nonlinearity directly as the deviation of the empirical potential well *U* from a parabola as suggested by figure 5 (we are only interested in ∂_{x}*U*). The simplest approach (apart from looking at skewness *γ*) is to fit *κ*_{U} and *c*_{emp} to the empirical density *p*_{emp} using equation (3.5). If is indeed linear then *c*_{emp} would be zero such that *c*_{emp} is a signed scalar measure for the deviation of from linearity, serving as a proxy for the present nonlinearity. A second alternative is to expand to second order, incorporating a quadratic nonlinearity directly into ∂_{x}*U* with an unknown coefficient *N*_{2}. This leads to
4.1
Both quantities, *c*_{emp} and *N*_{2}, measure the deviation of from its linear approximation . The main difference between them is the weighting ( for *N*_{2}, uniform for *c*_{emp}). In summary, the step-by-step procedure to extract an estimate of the quadratic term (or a proxy for it) from a time series *x*_{k} is the following.

— Detrend the time series

*x*_{k}(see §3*a*). We call the detrended time series . The following steps are performed for each suitable time*t*to obtain*κ*_{U}(*t*),*c*_{emp}(*t*),*N*_{2}(*t*) or skewness*γ*(*t*).— Choose a length

*w*of sliding windows and obtain the empirical stationary density for the sliding window centred at*t*. (We used`kde1d`to obtain . For non-stationary time series, we used the sliding window length*w*=*N*/2.)— Use relation (3.5) to find

*κ*_{U}(*t*) and*c*_{emp}(*t*) from*p*_{emp}. Use relation (4.1) to find*N*_{2}(*t*) (and another estimate for*κ*_{U}) from*p*_{emp}. Compute the empirical skewness*γ*(*t*) from*p*_{emp}.

### (c) Comparison of estimates using saddle–node normal form

Figure 7 shows how the estimates for nonlinearity behave for the saddle–node normal form with noise, equation (3.1) with *ε*=0 and *σ*=1. Each panel shows the quartiles of the distribution of the estimate for 100 realizations of time series generated by (3.1). Each realization was run until it reached 2000 data points or *x*=−1 (indicating escape). Figure 7*a* shows the estimate *c*_{emp} obtained by linear least-squares fitting of the approximate stationary Fokker–Planck equation (3.5) to the empirical stationary density *p*_{emp}, figure 7*b* shows the quadratic coefficient *N*_{2} obtained from (4.1) and figure 7*c* shows the skewness *γ* of *p*_{emp}. For comparison, we also generate 100 linear time series using
4.2
and then fit *c*_{emp}, *N*_{2} and *γ* for these time series, too. Figure 7 compares the quartiles of the distributions for time series generated by the saddle–node normal form (3.1) and the linear equation (4.2). If there is small overlap in the distribution then the quantity is a good indicator for the presence of a quadratic nonlinear term. We observe that this is the case for all quantities as long as the time series has a length of 2000 (see figure 7*d* for the distribution of time-series lengths). Naturally, the uncertainty, and hence the overlap, increases when the time series is shorter due to escape from the potential well to . This is the case for smaller *μ* in figure 7 (see Thompson & Sieber [12] for a quantitative study of escape probabilities for small *μ*). We also notice that all quantities have a systematic deviation from the theoretical value of the quantity they supposedly estimate: *N*_{2} should be equal to −1 according to (3.1), the real escape rate *c* has a much smaller modulus than *c*_{emp} (cf. figure 6*a*), and the skewness *γ* has theoretically a much larger modulus (cf. figure 6*b*). The deviation for *N*_{2} is relatively small, and is probably caused by the kernel density estimate for *p*_{emp}. The large modulus of *c*_{emp} shows that the linear term is a poor fit for , which makes *c*_{emp} a measure of how much the odd symmetry is broken by , rather than an estimate for the escape rate. We note that *c*_{emp,2} as fitted in (4.1) has a realistic modulus (close to zero, not shown in figure 7). The bias of the skewness *γ* is mainly due to our restriction to time series that stay inside the potential well. However, the characteristic dip of *γ* seen in figure 6*b* is still visible in figure 7*c*. Appendix B addresses the dependence of the empirical skewness on all method parameters.

We also note that the estimates for the linear decay rate, *κ*_{U} and *κ*_{ACF}, are more accurate than the estimates for the nonlinearity but by their nature they cannot distinguish the time series, generated by the saddle–node normal form from a linear surrogate.

The summative conclusion from figure 7 is that one can detect the underlying nonlinearity of the deterministic part of equation (3.1) by observing *c*_{emp}, *N*_{2} or the skewness *γ* if the time series is moderately long. One can expect this to be true in the more general case of a time series generated by a system with deterministic dynamics close to a fold and additive noise. While one can in principle recover the underlying parameter *μ* from the estimates for *κ*_{U}, *N*_{2} and *σ* [12], the uncertainty in *N*_{2} propagates dramatically into uncertainty for *μ* (one has to divide by *N*_{2}).

The results of figure 7 show the stationary case (*ε*=0). For slowly drifting time series, the same analysis will then have to be applied to parts of the time series in sliding windows. We demonstrate this for two geological times series in §5. For time series with rapidly drifting system parameter *ε*, the estimates for the softening, *c*_{emp}, *N*_{2} and the skewness *γ*, will not give a detectable difference from the corresponding linear time series before tipping. The conclusion one would draw from this absence of detectable softening would be correct: tipping happens when the linear decay rate *κ* reaches the critical value 0 (or slightly later [12]). The same holds if the noise level is low, which is equivalent to rapid drift (see Thompson & Sieber [12] for the transformation). In practice, one uses the argument the other way round: if the nonlinear estimates do not give a value significantly different from zero, one will conclude that the ratio between noise level and drifting speed is small.

## 5. Studies of geological time series

We now apply our nonlinear investigations to two ancient climate tippings (see figures 8 and 9). The first is the major transition marking the end of the latest ice age, which occurred about 17 000 years before the present (yrs BP). The data used for this is a temperature reconstruction from the Vostok ice-core deuterium record [22]. The second more recent tipping is the ending of the Younger Dryas using the grey-scale from the Cariaco basin sediments in Venezuela [23]. This Younger Dryas event was a curious cooling, about 11 500 years ago, just as the Earth was warming up after the last ice age. It ended in a dramatic tipping point, when the Arctic warmed by 7^{°}C in 50 years. This sudden ending has been related [24] to a *switching-on* of the global oceanic thermohaline circulation (THC). This switch-on is known from extensive theoretical studies [25] to be at a saddle–node fold arising as a perturbation of a subcritical pitchfork [12,26].

Under linear time-series analysis of the directly preceding data, neither of these two events shows a strong trend in the stability propagator (figure 9). Meanwhile sample estimates of the underlying potential functions are shown in figure 8.

In figure 8*a*, for the end of the last glaciation, we see clearly that the well is softening (falling beneath the parabolic fit) on the high-temperature end while hardening (rising above the parabola) on the low-temperature end. So there is a strong nonlinear signal that a jump to higher temperatures may be pending (as indeed it was). We show that this signal is statistically significant in our full analysis in figure 9. By comparison, the study of the Younger Dryas, illustrated in figure 8*b*, provided no significant nonlinear conclusions.

For figure 9, we detrended both time series using Gaussian filtering (figure 9*c*,*d*) and estimated the linear indicators *κ*_{ACF} and *κ*_{U}. We observe weak to non-existent trends, leaving the evidence inconclusive at the linear level (grey and black curves in figure 9*e*,*f*). The ratio between *κ*_{ACF} and *κ*_{U} (both shown in figure 9*e*,*f*) gives an estimate of the variance *σ*^{2} of the noise input. The unit for *κ*_{ACF} is *units of x per time step* (so, for example, for figure 9

*e*,

^{°}

*C*/Δ

*t*), and, correspondingly, the unit for

*κ*

_{U}is Δ

*t*/

^{°}

*C*(unit of

*κ*

_{ACF}divided by unit for

*σ*

^{2}). Both units are arbitrary so that we do not indicate the scale of the

*y*-axis in figure 9

*e*,

*f*.

Lenton *et al.* [8] discuss the linear analysis of both time series in greater detail. Their time series *Vostok* corresponds to figure 9*a*; their time series *Cariaco* corresponds to figure 9*b*. (Note that Lenton *et al.* [8] use the deuterium proxy directly, whereas figure 9*a* shows the temperature reconstruction.) Specifically, figs. 2, 4, 9 and A1 in Lenton *et al.* [8] discuss how the results depend on method parameters such as detrending bandwidth and sliding window. Their analysis finds that the presence of a significant trend depends strongly on both parameters. See table 1 in Lenton *et al.* [8] (rows *Vostok* and *Cariaco*) for a summary of the evidence at the linear level.

At the nonlinear level, the time series for the end of the last glaciation has a strong quadratic nonlinearity in where the potential well *U* is presumed to govern the deterministic part of the dynamics (figure 9*g*). The indicators *c*_{emp}, *N*_{2} and the skewness *γ* in figure 9*g* are all far removed from what can be expected by chance in a linear time series. The histogram in the background of figure 9*g*,*h* has been sampled from 500 random linear time series generated by the linear process given in equation (4.2) with , where *κ*_{ACF} is taken from the estimate shown in figure 9*e*. The percentages at the top of figure 9*g*,*h* express how far in the tail of the histogram the mean of the quantity extracted from the geological time series is. We warn that the three quantities *γ*, *N*_{2} and *c*_{emp} are not really three independent indicators, as they all depend on the same estimate of the empirical density *p*_{emp}.

Visual inspection of the well shape in figure 8 confirms that the well is softening (bending downwards) on the high-temperature end and hardening (bending upwards) on the low-temperature end. So, at the nonlinear level the time-series data close to the bottom of the well already give evidence for a propensity to escape towards larger temperatures. The time series for the end of the Younger Dryas does not show evidence for strong nonlinearity of the underlying dynamics at the second-order level (figure 9*h*). The values for *c*_{emp}, *N*_{2} and the skewness *γ* can all be explained by randomness, as the histograms of estimates for the 500 linear time series in figure 9*h* show. Note that we did not include the previous transition (visible at the left end in figure 9*b*) in our analysis as our aim was to infer the nonlinearity exclusively from the data near the tipping equilibrium.

## 6. Conclusion

The main message of the paper is that the linear analysis investigated by Lenton *et al.* [8] on its own, even if all the estimates are accurate, does not contain all the information necessary to estimate the probability of tipping over time. The result of the linear analysis is an estimate of the decay rate *κ*, which is taken relative to the time spacing of the measurements. Even if this decay rate has an identifiable trend to zero, the probability of tipping over time is determined by the dominant nonlinear coefficient *N*_{2} in conjunction with *κ* and the noise level.

We found that the dominant nonlinear coefficient *N*_{2} is much harder to estimate accurately, so we also looked at proxies that indicate nonlinearity, such as the skewness *γ* (as proposed by Guttal & Jayaprakash [19]) or the coefficient *c*_{emp} from equation (3.5) (which is also a scalar signed measure for deviation from linear theory). Our study of the saddle–node normal form suggests that both proxies, *γ* and *c*_{emp}, give values that are easier to distinguish from random chance than the estimate for *N*_{2} itself. A significantly non-zero value for either of these proxies indicates that a quadratic nonlinear term is present and gives its sign.

However, we found the uncertainty for moderately long time series (*N*=2000) too large to translate the proxy back into a quantitatively reliable estimate for the normal-form parameter (this would be necessary to read off the probability for tipping from the tables in Thompson & Sieber [12]). We also found that the nonlinear proxies do not have discernible trends when one approaches tipping, so it makes sense only to extract their overall mean from the time series but not the trend.

## Appendix A. Sensitivity analysis for estimates of nonlinear quantities in palaeo-climate records

Figure 10 shows how the results in figure 9*g*,*h* depend on the parameters of our method. Specifically, we vary the length of the sliding window and the bandwidth of the Gaussian kernel when fitting the quasi-equilibrium drift. In all panels, the *x*-axis is the length of the sliding window relative to the overall length of the time series on a logarithmic scale (overall length is *N*=525 for the end of the last glaciation, *N*=2646 for the end of the Younger Dryas). For example, the *x*-value of −1 corresponds to a window length *w*=*N*/2. The *y*-axis is the kernel bandwidth for the Gaussian fitting kernel relative to the overall length of the time series on a logarithmic scale (this enters the function `kde1d` as a resolution parameter). For example, the *y*-value of −3 means that the resolution parameter of `kde1d` is 2^{3}=8.

For each measured quantity, we compare the mean of its value over all sliding windows to the percentiles of the linear surrogate. In this way, contour levels close to 0 or 100 correspond to significant deviations from a time series generated by a linear process. The distributions of these time series are shown as grey background in figure 9*g*,*h*. The value shown as a black vertical line in figure 9*g*,*h* was obtained using the method parameters highlighted by a black circle in figure 10.

Figure 10 gives evidence that the nonlinear features of the time series for the end of the last glaciation are robust as long as the detrending remains coarse-grained. The end of the Younger Dryas does not show any nonlinear feature that is significantly different from the surrogates for any choice of method parameters.

## Appendix B. Dependence of skewness on distance to tipping point

The observations in figures 6*b* and 7*c* appear to contradict the statement by Guttal & Jayaprakash [19] that skewness increases as one approaches tipping. Furthermore, there is a quantitative difference between figures 6*b* and 7*c*, even though they show the same qualitative feature: non-monotonicity of the skewness in the control parameter *μ*. Note that Guttal & Jayaprakash [19] studied the same type of tipping as in figure 6, not in the fold normal form but in a particular ecological model. In order to investigate this apparent contradiction, we look at how the skewness depends on two parameters: the control parameter *μ* and the value at which we consider a realization as escaped. More precisely, both figures 6*b* and 7*c* study
(B 1)
where *σ*=1 without loss of generality and *μ* is constant. Any realization of (B 1) escapes to in finite time almost surely so that (B 1) does not possess a stationary density. In the calculation for figure 6, we modified the process (B 1) by re-injecting the realization at whenever it escaped to . In this way, the process has a stationary density given by the Fokker–Planck equation (3.3) with −∂_{x}*U*(*x*)=*μ*−*x*^{2}. However, this is not accurately describing the density that one is interested in: the probability density of the realizations under the condition that the realization has not yet ‘escaped’. Clearly, this conditional probability depends on what one means by escaped. So, we introduce the escape boundary *b* as an additional parameter. Whenever, a realization gets smaller than the value (that is, it has run away beyond the local maximum of the potential by distance *b*) we stop the integration of (B 1). When one considers the set of all realizations before escape, then this also leads to a stationary density. This was done for figure 7 using (such that we count a realization as escaped whenever it reaches −1). The re-injection used for figure 6 and the choice of *x*=−1 as escape indicator lead to the systematic difference in skewness *γ* between figures 7*c* and 6*b*. In practice, one encounters only the conditional probability as shown in figure 7. So, it is important to check how the skewness depends on the somewhat arbitrary choice of the value *b* indicating escape. Moreover, the estimate shown in figure 7*c* suffers from the shortness of the time series for *μ* close to 0 (which is typical in practice).

Figure 11 shows the result of a more precise calculation: we simulate (B 1) for a large ensemble of realizations (*n*=100 000). At every time step *t*_{k}, we collect the *n*_{esc,k} realizations that have reached a value . They count as escaped and have to be discarded. In order to avoid depletion of the ensemble, we start *n*_{esc,k} new realizations at *t*_{k}. For each newly started realization, we pick the current value of one randomly chosen non-escaped realization as the initial value. This process reaches a stationary density, which is an approximation of the conditional probability density of *x* under the condition that *x* has not yet escaped beyond . We observe in figure 11 that the depth of the dip in the skewness depends on *b* (indicated by grey-scale) but that the location of the dip is uniformly larger than the critical value 0 of the control parameter *μ*.

It is possible that the simulations by Guttal & Jayaprakash [19] stayed consistently to the right of the curve of extreme skewness shown in figure 11 such that only the increase is observed. Note that in their model the notion of skewness has the opposite sign, so one should consider the modulus of the skewness.

## Footnotes

One contribution of 13 to a Theme Issue ‘Climate predictions: the influence of nonlinearity and randomness’.

- This journal is © 2012 The Royal Society