Approaching a dangerous bifurcation, from which a dynamical system such as the Earth's climate will jump (tip) to a different state, the current stable state lies within a shrinking basin of attraction. Persistence of the state becomes increasingly precarious in the presence of noisy disturbances. We argue that one needs to extract information about the nonlinear features (a ‘softening’) of the underlying potential from the time series to judge the probability and timing of tipping. This analysis is the logical next step if one has detected a decrease of the linear decay rate. If there is no discernible trend in the linear analysis, nonlinear softening is even more important in showing the proximity to tipping. After extensive normal-form calibration studies, we check two geological time series from palaeo-climate tipping events for softening of the underlying well. For the ending of the last ice age, where we find no convincing linear precursor, we identify a statistically significant nonlinear softening towards increasing temperature.
The on-going work by the United Nations, following the major report of the IPCC  and the Climate Change Conferences in Copenhagen and Cancun (Mexico), centres on the prediction of future climate change, a key feature of which would be any suggestion of a sudden and (possibly) irreversible abrupt change called a tipping point [2,3]. Many tipping points, such as the switching on and off of ice ages, are well documented in palaeo-climate studies over millions of years of the Earth's history.
Predicting such tippings in advance using time-series data derived from preceding behaviour is now seen as a major challenge, impinging, for example, on the possible use of geo-engineering . Techniques introduced by Held & Kleinen  and Livina & Lenton  draw on the assumption that the tipping events are governed by a bifurcation in an underlying nonlinear dissipative dynamical system. Specifically, researchers search for a slowing down of intrinsic transient responses within the data, which is predicted to occur before most bifurcational instabilities. This is done by determining a so-called propagator, estimated from the correlation between successive elements of a window sliding along the time series (this estimate is called κACF in this paper). This propagator, such as the first-order autoregressive (AR(1)) mapping coefficient, is a measure of the linear decay rate and should increase to unity at a tipping instability (corresponding to a decrease of the linear decay rate to zero).
Prediction techniques can be tested on climatic computer models, but more challenging is to try to predict real ancient climate tippings, using their preceding geological data provided by ice cores, sediments, etc. Using this preceding data alone, the aim would be to see to what extent the actual tipping could have been predicted in advance.
One such study by Livina & Lenton  looks at the end of the most recent ice age and the associated Younger Dryas event, about 11 500 years ago, when the Arctic warmed by 7°C in 50 years. This pioneering study used a time series derived from Greenland ice-core palaeo-temperature data. A second such study (one of eight made by Dakos et al. ), using data from tropical Pacific sediments, gives a good prediction for the end of ‘greenhouse’ Earth about 34 million years ago when the climate tipped from a tropical state into an icehouse state. Lenton et al.  give a complete overview of current techniques for extracting early warning signals from time series and compare them on realistic models and a range of palaeo-climate time series.
A remark that is made during the analysis of Lenton et al.  is that the absolute values of the extracted quantities have no direct bearing on the tipping probability over time unless one also knows the size of the basin of attraction. This basin is determined in the simplest cases by the dominant nonlinear term in the underlying equations of motion, which is the subject of this paper.
2. Concepts from nonlinear dynamics
The core techniques for analysis of time series of climate data aim to extract the linear decay rate towards a (possibly drifting) noise-perturbed equilibrium [5,6]. These techniques can be directed at the Earth's climate as a whole, or at the relevant climate subsystems described by Lenton et al.  as tipping elements. Our aim here is to augment this linear information with information about nonlinear features of the underlying dynamics, by examining to what extent we can identify nonlinear softening and include it in the list of early warning signs for climate tipping.
(a) Shrinking basins around dangerous bifurcations
As we approach a dangerous bifurcation, from which a nonlinear dissipative dynamical system will experience a finite jump to a remote alternative state, the current attractor is located within a shrinking basin of attraction. Correspondingly, the maintenance of the state will become increasingly precarious in the presence of noisy disturbances.
Thinking in terms of an underlying potential energy function (the existence of which is theoretically well defined for a saddle–node fold and some other bifurcations), the parabolic shape of its graph will become increasingly perturbed by softening nonlinear features.
The relevant bifurcations are the so-called codimension-one events [9,10], namely those that can be typically encountered under a gradual change of a single control parameter. The complete list of the dangerous codimension-one bifurcations is given in table 1, where we give a brief description of the nature of the shrinking basin. More details can be found in Thompson & Sieber . Focusing first on the local bifurcations (table 1a,b), the shrinking boundaries are illustrated schematically in figure 1, with the control parameter plotted horizontally and the response plotted vertically.
In figure 1a, we illustrate both types of the saddle–node fold bifurcation: the static fold involving a path of equilibrium fixed points, and the cyclic fold involving a trace of periodic orbits. For these, the basin shrinkage is one-sided. The basin is bounded by the unstable side of the parabolically folding path.
Next, in figure 1b, we illustrate the local subcritical bifurcations, of which the Hopf, flip and Neimark are codimension-one (we remember that the more familiar static pitchfork bifurcations are codimension-two, being structurally unstable against a symmetry-breaking perturbation). Here we see an unstable solution, which controls the basin boundary, shrinking parabolically around the stable solution, from which the (noise-free) system jumps out of our field of view as the control parameter, μ, reaches its critical value of μC.
The static fold and e Hopf bifurcation have a theoretically well-defined potential energy surface that neatly summarizes the shrinking basin, as illustrated in figure 2. Note that figure 2a will be discussed more fully in §2b.
(b) A closer look at the fold
The fold is the simplest and most common way in which an equilibrium of a nonlinear dissipative dynamical system can lose its stability. Having only a single active degree of freedom (x) and being observable under the variation of a single control parameter (μ), it can be illustrated on a graph of x against μ as a smooth curve that simply reaches an extreme value (a maximum, say) of the control μ, as shown in figure 1a. So although it is traditionally called a saddle–node bifurcation, it does not exhibit any obvious ‘bifurcation’ of a path, but rather just a smooth folding of an existing path. In the case of a static fold (to which we shall largely restrict our attention), the path is a trace of equilibrium fixed points, while for a cyclic fold the path would be a trace of periodic solutions.
Assuming that we are close to a fold, the intrinsic damping of the system will have become super-critical so the system will have the non-oscillatory response of an over-damped particle sliding (in thick oil, as we might imagine) on a notional potential energy surface U(x,μ). This is illustrated for the fold in figure 2a, where we show the potential energy surface erected over the (x,μ) base plane. Here, the solid part of the equilibrium path, corresponding to energy minima, is a stable node, while the dashed part, corresponding to energy maxima (more generally, a geometrical saddle in higher dimensions) is a dynamically unstable saddle.
We see that as the control μ is slowly increased towards its critical value, μC, the stable equilibrium solution gets closer and closer to the hill-top potential barrier, and so gets progressively more precarious in the presence of noisy disturbances. With μ increasing at an infinitesimal rate, noise will ensure escape over the hill-top before μ reaches μC. If, however, μ increases rapidly, this early escape may be delayed or suppressed, as we have examined quantitatively elsewhere .
To give a greater perspective to our view of the fold it is useful to introduce another control parameter μ2 such that our parameter space is now represented jointly by the (μ1,μ2)-plane. We can now erect an equilibrium surface, of height x, over the two-dimensional control plane, as illustrated in figure 3. Here, we see two fold lines whose projections divide the control plane into regimes exhibiting either one or three coexisting solutions. These fold lines coalesce and vanish at the cusp. This cusp is a codimension-two phenomenon, which is typically only observed when we have independent control of both parameters (μ1 and μ2). This is made transparent by the fact that a typical path in control space cannot be expected to pass precisely under the cusp point; by contrast, if a parameter path passes through a fold line, all near-by paths also cross this fold line, which is what ‘the fold has codimension-one’ means. For an appropriate choice of path in the (μ1,μ2)-plane through the cusp, one sees a pitchfork bifurcation .
We will consider the case in which the main equilibrium sheet (in grey) is stable while the narrow inverted sheet (white) is unstable.
A climate system will have a large number of possible control parameters, but in trying to predict a tipping point, we will be studying a particular evolution that corresponds to a particular route in control space; this is precisely why we are only likely to encounter a fold, and not a cusp, in figure 3. To explore possible scenarios that might underlie a tipping study, it is now useful to examine some possible routes in our two-parameter model of figure 3.
Consider, first, the simple scenario in which the system parameters take path BB′ leading transversally through a fold line. This is the classical encounter with a fold that is implicitly envisaged in earlier work. In the presence of noise, a premature tip from N occurs with a certain probability depending on the noise level and the speed with which the path is traversed, as analysed by Thompson & Sieber . If the noise level was particularly high, a jump from N could easily be perceived as a purely noise-induced jump with no bifurcation, even though it is actually caused by the proximity to the fold: this could hold even if there were no movement in the control space at all, the system having rested at N throughout (Ashwin et al.  call this N-tipping). Another scenario, not specifically illustrated in the figure, could arise if the route in control space approached the fold line, but then turned back away from it. Analysis would then show a temporary decrease of the strength of the attraction (with the AR(1) coefficient moving towards unity), which might be discounted as a false alarm, even though the system did approach a fold and a noise-induced escape was temporarily probable.
Finally, in the scenario of path AA′, one would observe a temporary increase of the AR(1) coefficient, even though no bifurcation is apparent but rather a gradual shift of the steady state. This shows that a rapid transition can be related to a nearby fold that was just barely missed. Similarly, the scenario called R-tipping by Ashwin et al.  occurs if one changes the other control parameter in figure 3 sufficiently rapidly from the path AA′ to N (avoiding the folds). In this scenario, the system jumps to the other stable equilibrium despite having never crossed a bifurcation curve.
3. Time-series analysis for leading-order terms: qualitative overview
The extraction of the underlying dynamics from time series is divided into subtasks of increasing difficulty. Assume that we have a time series coming from a process that is either stationary or has a slowly drifting underlying parameter μ as suggested by the paths in figure 3. In order to predict (or identify) the dynamics, one attempts to extract the leading orders of the dominant components of the underlying equations of motion. We illustrate the steps using a time series obtained from a process that has the saddle–node normal form with a slowly drifting normal-form parameter μ as its deterministic part (corresponding to figure 2a) and is influenced by additive Gaussian noise. We permit the parameter μ to drift slowly with speed ε: 3.1 Here, x is the response, μ is the control parameter and t is the time. The Gaussian perturbation dWt has zero mean and unit variance, such that σ controls the noise-induced variance added to the dynamics of the deterministic saddle–node form , which has the stable equilibrium . The second equation prescribes the linear sweep of μ from its starting value μ0>0 towards zero at rate ε≪1. For fixed μ>0, the unperturbed system (σ=0) has a stable equilibrium at and an unstable equilibrium at .
In §4, we will use the saddle–node normal form (3.1) to test the ability of a range of estimators to distinguish a time series generated by a process with a quadratic nonlinearity from a linear time series. Two example time series and their analysis are shown in figure 4. Both time series have identical linear properties but differ in the dominant nonlinear term of the deterministic part of the dynamics. The softening of the full (estimated) nonlinear potential well (corresponding to the illustration in figure 2a) is displayed in figure 5.
(a) Equilibrium: zero order
The location of the slowly drifting equilibrium and its trend is estimated first. Lenton et al.  call this step ‘detrending’ and propose either piecewise linear fitting or filtering with a Gaussian kernel. The result of filtering with a Gaussian kernel is shown as a thick grey curve in figure 4a,b(i). Every method for detrending requires a bandwidth parameter (for example, the width of the Gaussian kernel). For figure 4, we chose the bandwidth that the kernel density estimation routine (Matlab's kde by Botev et al. ) offered. The thin black curve in figure 4b(i) shows the true value of the quasi-equilibrium (for ε=0). It gives a visual estimate of how the reliability of the zero-order estimates depends on the quality of the data. The oscillatory deviations of the extracted equilibrium (thick light grey curve) from its true value (thin black curve) show that the automatically chosen bandwidth was slightly smaller than the theoretical optimum.
(b) Linear decay rate: first order
Assuming that the deterministic part of the underlying process has a stable equilibrium xeq (which is possibly slowly drifting), and that the zero-order estimate is an approximation for xeq, the next step is to estimate the linear decay rate κ towards xeq and the trend of κ over time. Generally, if the underlying deterministic process is high-dimensional then these estimates capture the dominant (that is, smallest) decay rate. They are typically applied to the detrended time series, that is, , which is shown in the second row of figure 4.
— ACF. The k-step (usually one-step) autocorrelation function (ACF) α is fitted directly to the detrended time series: . This was introduced by Held & Kleinen  as degenerate fingerprinting for the analysis of climate time series. The ACF α is related to κ via The estimate for κACF is shown in figure 4a,b(iii).
— DFA. Detrended fluctuation analysis (DFA) has been introduced by Livina & Lenton  for climate time series; see also Lenton et al.  for a short description of how one extracts κ from this analysis.
— Quasi-stationary density. If one assumes that the deterministic dynamics of the detrended time series is essentially that of an overdamped particle moving in a potential well , that is, 3.2 then the linear decay rate equals the second derivative ∂xxU(0) of U at the bottom of the well, . Moreover, the stationary density p(x) is related to the shape of the potential well U via the stationary Fokker–Planck equation 3.3 (the second line being obtained after one integration), where c is the constant of integration. This constant satisfies , and can be interpreted as the flow rate (c<0 indicates flow towards ). Remembering that the drift speed ε of the system parameter μ is small, the true time-dependent probability density p still satisfies (3.3) approximately at each μ: 3.4 In (3.4), we included the dependence of all quantities on μ expressly and remember that is of order ε. Assuming quasi-stationarity, we neglect the order-ε terms and determine the coefficients, −σ−2∂xU and σ−2c, by fitting the empirical density pemp(x) from a time window [t−w/2,t+w/2] to equation (3.4). Specifically, cemp=σ−2c is a scalar and ∂xU(0) is equal to 0 after detrending, such that one has to fit two coefficients, κU and cemp, using (3.4) if one truncates ∂xU after first order: 3.5 In problems where no escape is possible, one can drop the term c in (3.3) (and, thus, in (3.4) and (3.5)), and simplify (3.3) to 3.6 Livina et al.  used relation (3.6) (fitting of U with higher-order polynomials) to detect potential wells in time series that included rapid transitions. Fitting U with a parabola (where U(0) is irrelevant and ∂xU(0)=0) is equivalent to using the variance of if the density pemp is Gaussian. Ditlevsen & Johnsen  state that monitoring the empirical variance helps to avoid erroneous detection of false trends.
Lenton et al.  compare these three estimates, κACF, the DFA-based estimate for κ, and the variance (which is inversely proportional to κU), in detail using time series arising in climate models and in palaeo-climate records. Methods based on properties of the spectrum of the time series were proposed and investigated by Kleinen et al.  and Biggs et al.  but are not discussed here. Figure 4 shows in the third row how the estimates of κU from the quasi-stationary density compare with the ACF estimates κACF. For both time series, the estimates are quantitatively close to the theoretical value of κ (which is indicated by the thin black line in figure 4a,b(iii). We also observe that in figure 4a,b(iii) the local trends of κACF from the autocorrelation estimate and of κU based on the quasi-stationary density are strongly correlated, so it is unclear if monitoring both quantities really prevents false alarms.
There is one notable difference between the AR(1) and the DFA estimate on the one hand and the distribution-based estimate κU on the other: the estimate for κU based on the quasi-stationary density only looks at the distribution of the values in the window of interest but does not care about the order in which they appear. This is in contrast to, for example, the AR(1) estimate, which determines the correlation between each value and its predecessor.
(i) Estimate of the input noise amplitude
If the AR analysis shows the presence of a single positive dominant coefficient (which gives evidence that the underlying dynamics has really one distinct direction in which the decay is slowest), then one can combine both estimates of the decay rate to obtain an estimate of the amplitude σ of the additive noise. Equations (3.3) and (3.5) imply that κU is an estimate for σ−2κ, where κ and σ2 are the true linear decay rate and input noise amplitude. Thus, if we replace the unknown κ by the estimate κACF, we can use the relation between κU and the true κ to estimate σ2: 3.7 where κACF is the estimate of the linear decay rate obtained from the autocorrelation and κU is the estimate obtained from the quasi-stationary density pemp. This is an alternative to using the residual of the least-squares fit that one obtains when estimating κACF. We found that the residual systematically underestimates the noise level in the normal-form examples shown in figure 4.
(c) Nonlinearity: basin boundary
One problem left open by the linear analysis is that the quantitative value of the estimated linear decay rate κ and the estimated noise level do not give any indication for the probability of tipping. What the linear methods estimate is the α and the σ of a discrete process, 3.8 (where the ηk are assumed to be independent and normally distributed random numbers). The discrete process (3.8) is the linearization of the time-Δt map of the continuous process (3.2). An estimate for α less than but close to unity does not necessarily indicate that we are close to a stability boundary for the nonlinear problem (3.2). Rather, it implies that the time step Δt between successive measurements has been small compared with the mean decay time to half, , of the process: The trend of the estimated α is an indicator for incipient tipping because (if extrapolated in any way) it gives an estimate for the time to tipping if one ignores the possibility of early escape. However, the trend of α is less certain even in the artificial time series of figure 4. Similarly, the estimated noise amplitude σ has to be compared with the coefficient in front of the leading nonlinear term of the right-hand side (which equals −1 in (3.1)). The ratio between σ and the nonlinear term enters the estimates for the probability of tipping if there is no discernible upward trend in α (N-tipping), or for the probability of early escape if the trend of α points upwards . So without at least an order-of-magnitude estimate of the leading nonlinear term, the linear methods only provide an estimate for the tipping time if there is a clearly discernible trend in α, and even then this estimate discounts early escape.
Figure 4 illustrates this problem. Apart from a slow trend, both time series in figure 4 have identical properties at the linear level. However, while for the process generating the time series in figure 4b(i) the median time to tipping is t=100 according to Thompson & Sieber  (indeed, it escapes shortly after t=120); for the time series in figure 4a(i) the extrapolation (correctly) predicts tipping (or rather, linear instability) at t≈320. Thus, we observe that the linear properties of the time series alone are not sufficient to provide estimates about tipping, for example, the median time to tipping. The difference between the time series in figure 4a(i) and b(i) is in the dominant nonlinear term of the deterministic part. That this term plays a role is not surprising because tipping is a feature of nonlinear systems. Extracting its presence from a time series is a necessary step that has to follow the linear analysis. The approaches presented in the following all generalize the estimate for κU, based on the quasi-stationary density, by looking at the full nonlinear potential well associated with the quasi-stationary density via the Fokker–Planck equation (3.4). Figure 5 illustrates what looks like for the time series shown in figure 4b. The realization of the noisy, evolving time series is shown in the base plane together with the equilibrium path μ=x2, and the estimated (empirical) potential Uemp is shown as the coloured surface. The estimate for the potential well is taken in a sliding window, which explains why the surface can only be determined in the central region of the time series with half the length of the window unrepresented at either end. The numerically estimated equilibrium trend xeq is displayed as a black curve at the bottom of the valley of Uemp. For comparison with this Uemp, we display at either end the real (nonlinear) potential U(x,t) corresponding to (3.1), which at the later time is already showing the hill-top to the left. The colour in figure 5 shows the deviation of the empirical potential from the linear-theory parabola, which is not itself displayed. The softening, highlighted by the blue coloration, is seen on the left-hand side, which is approaching the hill-top. Transparency of the colour is used to show the number of data points that support a particular part of the surface (as given by the empirical quasi-stationary density ).
Several methods have been proposed to detect nonlinearity in the potential well in the form of this type of softening (or tilting).
— Skewness. Guttal & Jayaprakash [19,20] studied how the closeness of a bifurcation influences the skewness γ of the empirical quasi-stationary density from time series. As this approach is also based on the empirical quasi-stationary density pemp, it generalizes the estimate κU to non-parabolic features of the well U and non-Gaussian features of the stationary density. Guttal & Jayaprakash [19,20] proposed to look at the trend of the skewness γ over time to detect incipient bifurcations. Figure 4b(iv) shows how the skewness, taken over a single window, changes. It is noticeably negative (compare with the empirical skewness from the linear time series in figure 4a(iv)) but no trend is discernible.
— Quasi-stationary density. Livina et al.  generalized the potential-well analysis for the quasi-stationary density beyond the estimate κU for the linear decay rate, fitting the potential well to higher-order polynomials of even degree using relation (3.6). This analysis was not used to attempt prediction of transition probabilities from time series but it was applied to time series that already included all transitions to count the number of wells of U depending on time t (which corresponds to the number of modes (peaks) of the empirical quasi-stationary density pemp). See also Beaulieu et al.  for methods that detect change points in time series.
— Drift ratio. If one assumes that the underlying system parameter is approaching a fold more or less linearly and one is sufficiently close to the fold already, then the ratio between the drift Δxeq of the equilibrium estimate and the increase of the estimated linear decay rate Δκ estimates the quadratic term in the normal form .
4. Quantitative estimates of the nonlinear coefficients
The colour shading of figure 5 suggests that the nonlinear term in the underlying deterministic system shows up in the quasi-stationary density pemp. This raises two questions. First, can the level of deviation from linear theory be distinguished with confidence from random chance? That is, can one state, when looking at a time series, that a significant quadratic term is present? Second, is it possible to quantify the size of the nonlinear term with reasonable level of certainty?
Finding a significant nonlinearity is far easier than estimating its precise value because biases introduced during the analysis of the equilibrium and decay rate (zero-order and first-order analysis) may distort the value of the nonlinear estimate but may still keep it significantly different from zero. We will demonstrate this in §5 for two palaeo-climate time series.
(a) Properties of the stationary density of the saddle–node normal form
Figure 6 shows the dependence of the mean, the variance var p and the skewness γ of the empirical stationary density pemp on the system parameter μ for (3.1) as computed using the stationary Fokker–Planck equation (3.3). Note that this corresponds to the stationary process where the random realization is re-injected at whenever it has escaped to . We observe that the dependence of γ is strongly nonlinear for the saddle–node normal form with noise (, σ=1 in equation (3.1)). This means that trends of the skewness are likely to be difficult to ascertain when approaching a fold. However, the presence of skewness is an indicator for the nonlinearity uniformly for all parameters μ, and its sign indicates the direction of escape.
Note also how the nonlinearity affects the empirical mean and the variance var p, which is no longer inversely proportional to the linear decay rate κ. This shift of the mean and the additional variance is in part due to the fraction of trajectories that escape, and in part due to the non-parabolic shape of the well at some distance from its local minimum.
(b) Practical estimates of the nonlinear term from time series
We can estimate the nonlinearity directly as the deviation of the empirical potential well U from a parabola as suggested by figure 5 (we are only interested in ∂xU). The simplest approach (apart from looking at skewness γ) is to fit κU and cemp to the empirical density pemp using equation (3.5). If is indeed linear then cemp would be zero such that cemp is a signed scalar measure for the deviation of from linearity, serving as a proxy for the present nonlinearity. A second alternative is to expand to second order, incorporating a quadratic nonlinearity directly into ∂xU with an unknown coefficient N2. This leads to 4.1 Both quantities, cemp and N2, measure the deviation of from its linear approximation . The main difference between them is the weighting ( for N2, uniform for cemp). In summary, the step-by-step procedure to extract an estimate of the quadratic term (or a proxy for it) from a time series xk is the following.
— Detrend the time series xk (see §3a). We call the detrended time series . The following steps are performed for each suitable time t to obtain κU(t), cemp(t), N2(t) or skewness γ(t).
— Choose a length w of sliding windows and obtain the empirical stationary density for the sliding window centred at t. (We used kde1d to obtain . For non-stationary time series, we used the sliding window length w=N/2.)
(c) Comparison of estimates using saddle–node normal form
Figure 7 shows how the estimates for nonlinearity behave for the saddle–node normal form with noise, equation (3.1) with ε=0 and σ=1. Each panel shows the quartiles of the distribution of the estimate for 100 realizations of time series generated by (3.1). Each realization was run until it reached 2000 data points or x=−1 (indicating escape). Figure 7a shows the estimate cemp obtained by linear least-squares fitting of the approximate stationary Fokker–Planck equation (3.5) to the empirical stationary density pemp, figure 7b shows the quadratic coefficient N2 obtained from (4.1) and figure 7c shows the skewness γ of pemp. For comparison, we also generate 100 linear time series using 4.2 and then fit cemp, N2 and γ for these time series, too. Figure 7 compares the quartiles of the distributions for time series generated by the saddle–node normal form (3.1) and the linear equation (4.2). If there is small overlap in the distribution then the quantity is a good indicator for the presence of a quadratic nonlinear term. We observe that this is the case for all quantities as long as the time series has a length of 2000 (see figure 7d for the distribution of time-series lengths). Naturally, the uncertainty, and hence the overlap, increases when the time series is shorter due to escape from the potential well to . This is the case for smaller μ in figure 7 (see Thompson & Sieber  for a quantitative study of escape probabilities for small μ). We also notice that all quantities have a systematic deviation from the theoretical value of the quantity they supposedly estimate: N2 should be equal to −1 according to (3.1), the real escape rate c has a much smaller modulus than cemp (cf. figure 6a), and the skewness γ has theoretically a much larger modulus (cf. figure 6b). The deviation for N2 is relatively small, and is probably caused by the kernel density estimate for pemp. The large modulus of cemp shows that the linear term is a poor fit for , which makes cemp a measure of how much the odd symmetry is broken by , rather than an estimate for the escape rate. We note that cemp,2 as fitted in (4.1) has a realistic modulus (close to zero, not shown in figure 7). The bias of the skewness γ is mainly due to our restriction to time series that stay inside the potential well. However, the characteristic dip of γ seen in figure 6b is still visible in figure 7c. Appendix B addresses the dependence of the empirical skewness on all method parameters.
We also note that the estimates for the linear decay rate, κU and κACF, are more accurate than the estimates for the nonlinearity but by their nature they cannot distinguish the time series, generated by the saddle–node normal form from a linear surrogate.
The summative conclusion from figure 7 is that one can detect the underlying nonlinearity of the deterministic part of equation (3.1) by observing cemp, N2 or the skewness γ if the time series is moderately long. One can expect this to be true in the more general case of a time series generated by a system with deterministic dynamics close to a fold and additive noise. While one can in principle recover the underlying parameter μ from the estimates for κU, N2 and σ , the uncertainty in N2 propagates dramatically into uncertainty for μ (one has to divide by N2).
The results of figure 7 show the stationary case (ε=0). For slowly drifting time series, the same analysis will then have to be applied to parts of the time series in sliding windows. We demonstrate this for two geological times series in §5. For time series with rapidly drifting system parameter ε, the estimates for the softening, cemp, N2 and the skewness γ, will not give a detectable difference from the corresponding linear time series before tipping. The conclusion one would draw from this absence of detectable softening would be correct: tipping happens when the linear decay rate κ reaches the critical value 0 (or slightly later ). The same holds if the noise level is low, which is equivalent to rapid drift (see Thompson & Sieber  for the transformation). In practice, one uses the argument the other way round: if the nonlinear estimates do not give a value significantly different from zero, one will conclude that the ratio between noise level and drifting speed is small.
5. Studies of geological time series
We now apply our nonlinear investigations to two ancient climate tippings (see figures 8 and 9). The first is the major transition marking the end of the latest ice age, which occurred about 17 000 years before the present (yrs BP). The data used for this is a temperature reconstruction from the Vostok ice-core deuterium record . The second more recent tipping is the ending of the Younger Dryas using the grey-scale from the Cariaco basin sediments in Venezuela . This Younger Dryas event was a curious cooling, about 11 500 years ago, just as the Earth was warming up after the last ice age. It ended in a dramatic tipping point, when the Arctic warmed by 7°C in 50 years. This sudden ending has been related  to a switching-on of the global oceanic thermohaline circulation (THC). This switch-on is known from extensive theoretical studies  to be at a saddle–node fold arising as a perturbation of a subcritical pitchfork [12,26].
Under linear time-series analysis of the directly preceding data, neither of these two events shows a strong trend in the stability propagator (figure 9). Meanwhile sample estimates of the underlying potential functions are shown in figure 8.
In figure 8a, for the end of the last glaciation, we see clearly that the well is softening (falling beneath the parabolic fit) on the high-temperature end while hardening (rising above the parabola) on the low-temperature end. So there is a strong nonlinear signal that a jump to higher temperatures may be pending (as indeed it was). We show that this signal is statistically significant in our full analysis in figure 9. By comparison, the study of the Younger Dryas, illustrated in figure 8b, provided no significant nonlinear conclusions.
For figure 9, we detrended both time series using Gaussian filtering (figure 9c,d) and estimated the linear indicators κACF and κU. We observe weak to non-existent trends, leaving the evidence inconclusive at the linear level (grey and black curves in figure 9e,f). The ratio between κACF and κU (both shown in figure 9e,f) gives an estimate of the variance σ2 of the noise input. The unit for κACF is units of x per time step (so, for example, for figure 9e, °C/Δt), and, correspondingly, the unit for κU is Δt/°C (unit of κACF divided by unit for σ2). Both units are arbitrary so that we do not indicate the scale of the y-axis in figure 9e,f.
Lenton et al.  discuss the linear analysis of both time series in greater detail. Their time series Vostok corresponds to figure 9a; their time series Cariaco corresponds to figure 9b. (Note that Lenton et al.  use the deuterium proxy directly, whereas figure 9a shows the temperature reconstruction.) Specifically, figs. 2, 4, 9 and A1 in Lenton et al.  discuss how the results depend on method parameters such as detrending bandwidth and sliding window. Their analysis finds that the presence of a significant trend depends strongly on both parameters. See table 1 in Lenton et al.  (rows Vostok and Cariaco) for a summary of the evidence at the linear level.
At the nonlinear level, the time series for the end of the last glaciation has a strong quadratic nonlinearity in where the potential well U is presumed to govern the deterministic part of the dynamics (figure 9g). The indicators cemp, N2 and the skewness γ in figure 9g are all far removed from what can be expected by chance in a linear time series. The histogram in the background of figure 9g,h has been sampled from 500 random linear time series generated by the linear process given in equation (4.2) with , where κACF is taken from the estimate shown in figure 9e. The percentages at the top of figure 9g,h express how far in the tail of the histogram the mean of the quantity extracted from the geological time series is. We warn that the three quantities γ, N2 and cemp are not really three independent indicators, as they all depend on the same estimate of the empirical density pemp.
Visual inspection of the well shape in figure 8 confirms that the well is softening (bending downwards) on the high-temperature end and hardening (bending upwards) on the low-temperature end. So, at the nonlinear level the time-series data close to the bottom of the well already give evidence for a propensity to escape towards larger temperatures. The time series for the end of the Younger Dryas does not show evidence for strong nonlinearity of the underlying dynamics at the second-order level (figure 9h). The values for cemp, N2 and the skewness γ can all be explained by randomness, as the histograms of estimates for the 500 linear time series in figure 9h show. Note that we did not include the previous transition (visible at the left end in figure 9b) in our analysis as our aim was to infer the nonlinearity exclusively from the data near the tipping equilibrium.
The main message of the paper is that the linear analysis investigated by Lenton et al.  on its own, even if all the estimates are accurate, does not contain all the information necessary to estimate the probability of tipping over time. The result of the linear analysis is an estimate of the decay rate κ, which is taken relative to the time spacing of the measurements. Even if this decay rate has an identifiable trend to zero, the probability of tipping over time is determined by the dominant nonlinear coefficient N2 in conjunction with κ and the noise level.
We found that the dominant nonlinear coefficient N2 is much harder to estimate accurately, so we also looked at proxies that indicate nonlinearity, such as the skewness γ (as proposed by Guttal & Jayaprakash ) or the coefficient cemp from equation (3.5) (which is also a scalar signed measure for deviation from linear theory). Our study of the saddle–node normal form suggests that both proxies, γ and cemp, give values that are easier to distinguish from random chance than the estimate for N2 itself. A significantly non-zero value for either of these proxies indicates that a quadratic nonlinear term is present and gives its sign.
However, we found the uncertainty for moderately long time series (N=2000) too large to translate the proxy back into a quantitatively reliable estimate for the normal-form parameter (this would be necessary to read off the probability for tipping from the tables in Thompson & Sieber ). We also found that the nonlinear proxies do not have discernible trends when one approaches tipping, so it makes sense only to extract their overall mean from the time series but not the trend.
Appendix A. Sensitivity analysis for estimates of nonlinear quantities in palaeo-climate records
Figure 10 shows how the results in figure 9g,h depend on the parameters of our method. Specifically, we vary the length of the sliding window and the bandwidth of the Gaussian kernel when fitting the quasi-equilibrium drift. In all panels, the x-axis is the length of the sliding window relative to the overall length of the time series on a logarithmic scale (overall length is N=525 for the end of the last glaciation, N=2646 for the end of the Younger Dryas). For example, the x-value of −1 corresponds to a window length w=N/2. The y-axis is the kernel bandwidth for the Gaussian fitting kernel relative to the overall length of the time series on a logarithmic scale (this enters the function kde1d as a resolution parameter). For example, the y-value of −3 means that the resolution parameter of kde1d is 23=8.
For each measured quantity, we compare the mean of its value over all sliding windows to the percentiles of the linear surrogate. In this way, contour levels close to 0 or 100 correspond to significant deviations from a time series generated by a linear process. The distributions of these time series are shown as grey background in figure 9g,h. The value shown as a black vertical line in figure 9g,h was obtained using the method parameters highlighted by a black circle in figure 10.
Figure 10 gives evidence that the nonlinear features of the time series for the end of the last glaciation are robust as long as the detrending remains coarse-grained. The end of the Younger Dryas does not show any nonlinear feature that is significantly different from the surrogates for any choice of method parameters.
Appendix B. Dependence of skewness on distance to tipping point
The observations in figures 6b and 7c appear to contradict the statement by Guttal & Jayaprakash  that skewness increases as one approaches tipping. Furthermore, there is a quantitative difference between figures 6b and 7c, even though they show the same qualitative feature: non-monotonicity of the skewness in the control parameter μ. Note that Guttal & Jayaprakash  studied the same type of tipping as in figure 6, not in the fold normal form but in a particular ecological model. In order to investigate this apparent contradiction, we look at how the skewness depends on two parameters: the control parameter μ and the value at which we consider a realization as escaped. More precisely, both figures 6b and 7c study (B 1) where σ=1 without loss of generality and μ is constant. Any realization of (B 1) escapes to in finite time almost surely so that (B 1) does not possess a stationary density. In the calculation for figure 6, we modified the process (B 1) by re-injecting the realization at whenever it escaped to . In this way, the process has a stationary density given by the Fokker–Planck equation (3.3) with −∂xU(x)=μ−x2. However, this is not accurately describing the density that one is interested in: the probability density of the realizations under the condition that the realization has not yet ‘escaped’. Clearly, this conditional probability depends on what one means by escaped. So, we introduce the escape boundary b as an additional parameter. Whenever, a realization gets smaller than the value (that is, it has run away beyond the local maximum of the potential by distance b) we stop the integration of (B 1). When one considers the set of all realizations before escape, then this also leads to a stationary density. This was done for figure 7 using (such that we count a realization as escaped whenever it reaches −1). The re-injection used for figure 6 and the choice of x=−1 as escape indicator lead to the systematic difference in skewness γ between figures 7c and 6b. In practice, one encounters only the conditional probability as shown in figure 7. So, it is important to check how the skewness depends on the somewhat arbitrary choice of the value b indicating escape. Moreover, the estimate shown in figure 7c suffers from the shortness of the time series for μ close to 0 (which is typical in practice).
Figure 11 shows the result of a more precise calculation: we simulate (B 1) for a large ensemble of realizations (n=100 000). At every time step tk, we collect the nesc,k realizations that have reached a value . They count as escaped and have to be discarded. In order to avoid depletion of the ensemble, we start nesc,k new realizations at tk. For each newly started realization, we pick the current value of one randomly chosen non-escaped realization as the initial value. This process reaches a stationary density, which is an approximation of the conditional probability density of x under the condition that x has not yet escaped beyond . We observe in figure 11 that the depth of the dip in the skewness depends on b (indicated by grey-scale) but that the location of the dip is uniformly larger than the critical value 0 of the control parameter μ.
It is possible that the simulations by Guttal & Jayaprakash  stayed consistently to the right of the curve of extreme skewness shown in figure 11 such that only the increase is observed. Note that in their model the notion of skewness has the opposite sign, so one should consider the modulus of the skewness.
One contribution of 13 to a Theme Issue ‘Climate predictions: the influence of nonlinearity and randomness’.
- This journal is © 2012 The Royal Society