## Abstract

I describe an approach to data assimilation making use of an explicit map that defines a coordinate system on the slow manifold in the semi-geostrophic scaling in Lagrangian coordinates, and apply the approach to a simple toy system that has previously been proposed as a low-dimensional model for the semi-geostrophic scaling. The method can be extended to Lagrangian particle methods such as Hamiltonian particle–mesh and smoothed-particle hydrodynamics applied to the rotating shallow-water equations, and many of the properties will remain for more general Eulerian methods. Making use of Hamiltonian normal-form theory, it has previously been shown that, if initial conditions for the system are chosen as image points of the map, then the fast components of the system have exponentially small magnitude for exponentially long times as *ϵ*→0, and this property is preserved if one uses a symplectic integrator for the numerical time stepping. The map may then be used to parametrize initial conditions near the slow manifold, allowing data assimilation to be performed without introducing any fast degrees of freedom (more generally, the precise amount of fast motion can be selected).

## 1. Introduction

Data assimilation is the science of blending observational data with scientific models, with the aim of predicting future evolution or reconstructing missing data. Historically, numerical weather prediction is one area driving the field [1]; in that context the aim of data assimilation is to provide the optimal initial condition (or ensemble of initial conditions) for a forecast of future weather. In the field of ocean modelling, data are much more scarce since the ocean is mostly impenetrable to radiation, and data assimilation is an important tool even for predicting the current state of the ocean [2]. In the study of the Earth's climate, data assimilation has mostly performed two roles. Firstly, data can be used to estimate parameters in physics parametrizations and in other parts of an Earth system model such as biogeochemical processes [3–6] (although care must be taken to ensure that these parameters are not dependent on the current climate state; this would reduce confidence in the model in other scenarios). Secondly, data assimilation is used to produce reanalyses (estimates of past model states produced using all available data). The ECMWF (European Centre for Medium-Range Weather Forecasts) ERA-40 reanalyses [7,8] are a very popular dataset among climate scientists seeking to study climate patterns from the middle of the twentieth century onwards. Data assimilation is also used in the challenging problem of reconstructing prehistoric climate from palaeoclimate proxy data [9]. More recently, there have been efforts towards decadal climate prediction [10], anticipated by Bengtsson & Shukla [11], and driven by the IPCC (Intergovernmental Panel on Climate Change) choosing ‘near-term climate change’ as one of the focus areas of the AR5 IPCC report due for publication in 2014 [12]. Prediction on these relatively short scales will require accurate estimates of the current climate state, particularly of the ocean, in which data assimilation plays a key role.

This paper concentrates on the need to ensure that the assimilated model state exhibits geophysical balances in the presence of sparse or noisy data. It presents a mathematical treatment of these issues using techniques of slow–fast dynamical systems. In time-dependent variational data assimilation (4DVAR), one seeks optimal initial conditions for a dynamical system given a set of observations over a time window, i.e. the initial conditions for the trajectory that passes closest to each of the observations. If the dynamical system has more than one time scale, then noise in the observation data can excite spurious fast motions that are not present in the true model state. Roughly speaking, this is because the optimization procedure will search among all the degrees of freedom to take the solution trajectory closer to the observation data; to prevent this from happening, one needs to impose conditions on the underlying balance in the solution trajectory. The problem of initialization in ocean and atmosphere simulations has been addressed by filtering fast degrees of freedom from the initial data during the data assimilation process (see [13–15]), typically by introducing penalty terms into the functional to be optimized. A recent study of the effects on balance of other types of data assimilation (including the extended Kalman filter and the ensemble Kalman filter) is reported in [16], using the modification of the Lorenz '86 model proposed in [17].

In this paper, I propose an alternative method in which a nonlinear mapping is used that maps onto system states containing only very weak fast degrees of freedom. The mapping (which is closely related to the optimal PV balance method [18]) is very simple and is supported by the theory of exponentially accurate Hamiltonian normal forms (see [19] for a review) and backward error analysis for symplectic integrators (see [20,21] for reviews). It was first introduced in [22] in an experiment to measure the exchange of oscillatory energy between two colliding molecules, and applied in the context of semi-geostrophic Lagrangian particle motion in [23]. The systems considered in the latter paper include particle–mesh methods for rotating shallow-water equations (which can also be extended to the rotating compressible Euler equations [24] and the isentropic multilayered equations) as well as a simple illustrative toy model for balance, which has also been considered in [25] and which I shall use here to illustrate the method through numerical experiments.

The rest of the paper is organized as follows. The method is introduced in §2, with the model equations of motion being introduced in §2*a*, and a review of the theory of exponentially accurate normal forms and backward error analysis for symplectic integrators being given in §2*b*. The mapping is described in §3, the application to 4DVAR being given in §3*a*. Numerical experiments illustrating the method are shown in §4, and a summary and outlook are given in §5.

## 2. Description of method

### (a) Equations of motion

In this section, I introduce and motivate the model equations that are used to illustrate the method described in this paper. More generally, the method can be applied to any model system where the split between fast and slow variables can be made in the Hamiltonian, i.e. the Hamiltonian structure matrix is independent of *ϵ*.

The shallow-water equations (SWEs) on an *f*-plane are
$$\frac{\mathrm{D}u}{\mathrm{D}t} - fv = -gh_x, \tag{2.1}$$
$$\frac{\mathrm{D}v}{\mathrm{D}t} + fu = -gh_y \tag{2.2}$$
and
$$\frac{\mathrm{D}h}{\mathrm{D}t} + h(u_x + v_y) = 0. \tag{2.3}$$
Here, *h*=*h*(*t*,*x*,*y*) is the fluid depth, (*u*,*v*) is the fluid velocity, *g* is the acceleration due to gravity, *f* is twice the angular velocity of the reference plane,
$$\frac{\mathrm{D}}{\mathrm{D}t} = \frac{\partial}{\partial t} + u\frac{\partial}{\partial x} + v\frac{\partial}{\partial y} \tag{2.4}$$
is the Lagrangian or material time derivative, and subscripts denote partial differentiation with respect to that variable. See [26] for a derivation of the SWEs and their relevance to geophysical fluid dynamics.

In non-dimensionalized variables, the SWEs can be written in the form
$$Ro\,\frac{\mathrm{D}u}{\mathrm{D}t} - v = -B\,h_x, \tag{2.5}$$
$$Ro\,\frac{\mathrm{D}v}{\mathrm{D}t} + u = -B\,h_y \tag{2.6}$$
and
$$\frac{\mathrm{D}h}{\mathrm{D}t} + h(u_x + v_y) = 0, \tag{2.7}$$
where *Ro*=*U*/*fL* is the Rossby number and *B*=(*L*_{R}/*L*)^{2} is the Burger number. Furthermore, $L_R = \sqrt{gH}/f$ is the Rossby radius of deformation, *U* is a typical advection velocity, *L* is a typical length scale and *H* is the mean fluid depth for the problem under consideration. In this paper, following [23], I choose the semi-geostrophic scaling *Ro*=*B*=*ϵ*. See [26] for a derivation of the non-dimensionalized equations and the semi-geostrophic scaling limit. In addition to the SWEs and their approximation using the Hamiltonian particle–mesh (HPM) method, Cotter & Reich [23] and Frank *et al*. [27] considered a simpler model problem in which the layer depth *μ*(*ϵt*,*x*,*y*) is a given function of time *t* and space **x**=(*x*,*y*)^{T}. We may then investigate the motion of a single fluid parcel with coordinates **q**=(*q*_{x},*q*_{y})^{T}. After introducing the new time scale *τ*=*ϵt*, and denoting time derivatives with respect to *τ* by an overdot, the parcel coordinates and their conjugate momenta **p**=(*p*_{x},*p*_{y})^{T} satisfy the Hamiltonian system of equations (2.8) and (2.9).

Slightly more generally, we consider models of the form (2.10) and (2.11), where, for *i*=1,…,*n*,
$$J_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
is the 2×2 skew-symmetric matrix, *V* is a potential function, *ϵ* is a small parameter and *n* is a positive integer. This form covers Lagrangian particle discretizations of the rotating shallow-water equations (such as smoothed-particle hydrodynamics and the HPM method), as well as the low-dimensional toy model considered in [25] with *n*=1, *M*=*I* and *V* a given function of *X*_{1}, which I shall use in the numerical experiments. The scaling with *ϵ* corresponds to the semi-geostrophic scaling on the fast time scale.

These equations can be written in the Hamiltonian form
$$\dot{Z} = J\,\nabla_Z H(Z),$$
where the structure matrix *J* is block diagonal with *n* blocks *J*_{b} given by (2.12), and where the Hamiltonian function is
$$H(\mathbf{Q},\mathbf{P}) = K(\mathbf{P}) + V(\mathbf{Q}),$$
with *K* the kinetic term that generates the fast motion. For the case of the toy model there is only one block in the structure matrix *J*. The time derivative of any function *F* of **Q** and **P** is given by the Poisson bracket:
$$\dot{F} = \{F, H\} = (\nabla_Z F)^T J\,\nabla_Z H.$$
Another important example that I shall refer to in this paper is the semi-discretization (discrete in space, continuous in time) HPM method [27], which is a discretization of the shallow-water equations in Lagrangian variables that retains the Hamiltonian structure. In this case, the potential *V* is the discretization of the hydrostatic pressure *gh*, which couples together all of the Lagrangian particles.
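The key structural property here is that the structure matrix is skew-symmetric, which immediately implies conservation of the Hamiltonian, since $\dot{H} = (\nabla H)^T J \nabla H = 0$. A minimal numerical check of this identity, using the block $J_2$ assembled into a block-diagonal matrix (a simplified stand-in: the paper's blocks $J_b$ in (2.12) also carry an *ϵ*-dependence that is not reproduced here):

```python
import numpy as np

# 2x2 skew-symmetric building block
J2 = np.array([[0.0, 1.0], [-1.0, 0.0]])

def block_structure_matrix(n):
    """Block-diagonal skew-symmetric structure matrix with n copies of J2.
    (Simplified stand-in for the epsilon-dependent blocks J_b of (2.12).)"""
    J = np.zeros((2 * n, 2 * n))
    for i in range(n):
        J[2 * i:2 * i + 2, 2 * i:2 * i + 2] = J2
    return J

def grad_H(z):
    """Gradient of a generic quadratic Hamiltonian H(z) = |z|^2 / 2."""
    return z

n = 3
J = block_structure_matrix(n)

# Skew-symmetry: J^T = -J
assert np.allclose(J.T, -J)

# Along zdot = J grad H, the instantaneous energy change vanishes identically
z = np.random.default_rng(0).standard_normal(2 * n)
dH_dt = grad_H(z) @ (J @ grad_H(z))
print(abs(dH_dt))  # zero up to round-off
```

The same cancellation holds for any Hamiltonian and any skew-symmetric *J*, which is why the Poisson bracket of *H* with itself vanishes.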

Equations (2.10)–(2.11) represent a slow–fast system in which there are fast oscillations on the fast *O*(*ϵ*) time scale, and slow dynamics on the *O*(1) time scale. The method in this paper applies to the notion of a slow manifold in phase space, near which the fast oscillations are extremely weak and the slow dynamics dominates. As described in the next section, it is possible to obtain estimates for the magnitude of the fast oscillations, and for the rate of drift away from the slow manifold, which is also extremely slow. In this paper, I will assume that the dynamical system is evolving on the slow manifold and will seek to perform data assimilation based on this assumption. One possible approach is to derive an asymptotic expansion for the coordinates on the slow manifold: these are known as *balance relations*, which parametrize the slow manifold by specifying the momenta **P** as a function (written as an asymptotic expansion in *ϵ*) of the coordinates **Q**. The initial conditions could then be assumed to satisfy such conditions, having truncated the series. However, in this paper, I give an alternative method based on the theory of Hamiltonian normal forms, which could be described as ‘equation-free’ since the balance relations need never be explicitly computed.

### (b) Exponential estimates

In this section, I review the theory of exponentially accurate Hamiltonian normal forms applied to equations (2.10)–(2.11), as described in [23]; this theory is used in the data assimilation technique described in this paper.

The theory describes a change of coordinates that makes explicit the interaction between fast and slow frequencies in the system. If the potential *V* is analytic and bounded in some region of phase space, and if *ϵ* is sufficiently small, there exists a symplectic change of coordinates *Ψ* that transforms the Hamiltonian to the form
$$H\circ\Psi = K + G + R, \qquad \|\nabla R\| = O(\mathrm{e}^{-c/\epsilon}),$$
for some positive constant *c*, where *G* commutes with *K* in the Poisson bracket, i.e.
$$\{K, G\} = 0,$$
and where the magnitude of ∇*R* is bounded as above. In these coordinates, the time derivative of *K*(**P**) is
$$\dot{K} = \{K, H\circ\Psi\} = \{K, R\},$$
which is exponentially small as *ϵ*→0. This means that the magnitude of *K*(**P**(*t*)) satisfies
$$|K(\mathbf{P}(t)) - K(\mathbf{P}(t_0))| \le d\,\mathrm{e}^{-c/\epsilon} \quad\text{for } |t - t_0| \le \mathrm{e}^{c/\epsilon},$$
for some constant *d*. In the transformed coordinates the magnitude of **P** measures the amount of fast dynamics in the system; in particular, if **P** is initialized to **0** then it will stay exponentially close to **0** for exponentially long times. This means that {**P**=**0**} is an almost-invariant subspace of the dynamical system, and the *slow manifold* is defined to be the inverse image under *Ψ* of the space defined by **P**=**0**. If the dynamical system is initialized on the slow manifold, then there will be exponentially weak fast dynamics and the system will stay very close to the slow manifold for very long times. In this paper, I use a method for generating trajectories that are close to the slow manifold, for use in data assimilation.

However, I do not perform data assimilation with a perfect model, but must discretize the equations in time using a numerical integrator. As described in [23], if a symplectic integrator is used to integrate the dynamical system, then there exists a dynamical system with a modified Hamiltonian
$$H_{\Delta t} = H + O(\Delta t^p)$$
(where *p* is the order of the numerical method) with solutions that *shadow* the numerical trajectory, i.e. they stay exponentially close to the numerical trajectory for exponentially long times as Δ*t*→0. Furthermore, there exists a change of coordinates *Ψ*_{Δ*t*} such that the modified Hamiltonian in the transformed coordinates takes the form
$$H_{\Delta t}\circ\Psi_{\Delta t} = K + G_{\Delta t} + R_{\Delta t},$$
where
$$\{K, G_{\Delta t}\} = 0 \quad\text{and}\quad \|\nabla R_{\Delta t}\|\ \text{is exponentially small.}$$
This means that the numerical trajectory also has an exponentially accurate slow manifold, which is an *O*(Δ*t*^{p}) perturbation of the slow manifold for the original system.

Since the aim of data assimilation on the slow manifold is to prevent noisy observations from generating spurious fast oscillations, we will perform our data assimilation on the modified slow manifold for the numerical integrator rather than the slow manifold for the continuous-time differential equation. Since the method proposed does not need to explicitly compute the balance relations that parametrize the slow manifold, it is possible to change between numerical methods without altering the method, provided that a symplectic integrator is used.
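The practical content of backward error analysis can be seen in a small standard experiment (not from the paper): for the harmonic oscillator, the symplectic Euler method exactly conserves a modified energy, so its energy error stays bounded for all time, whereas the non-symplectic explicit Euler method drifts systematically.

```python
import math

def explicit_euler(q, p, dt):
    # Non-symplectic: both updates use the old state; energy grows steadily
    return q + dt * p, p - dt * q

def symplectic_euler(q, p, dt):
    # Symplectic: the position update uses the already-updated momentum,
    # so a modified Hamiltonian O(dt) away from H is conserved
    p_new = p - dt * q
    return q + dt * p_new, p_new

def energy(q, p):
    return 0.5 * (q * q + p * p)

def max_energy_error(stepper, steps=10000, dt=0.01):
    q, p = 1.0, 0.0
    e0 = energy(q, p)
    err = 0.0
    for _ in range(steps):
        q, p = stepper(q, p, dt)
        err = max(err, abs(energy(q, p) - e0))
    return err

print(max_energy_error(symplectic_euler))  # small, bounded oscillation
print(max_energy_error(explicit_euler))    # grows with integration time
```

The bounded energy error of the symplectic method is precisely the shadowing property used above: the numerical trajectory is the exact trajectory of a nearby modified Hamiltonian system.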

## 3. Mapping onto the slow manifold

In this section, I describe how to apply the normal-form theory to obtain dynamical trajectories on the slow manifold (and numerical trajectories on the modified slow manifold for the numerical integrator).

As described in [23], the exponential estimate can be extended to the case in which the potential is scaled by a time-varying function *r*(*ϵt*) that evolves on the slow time scale, so that the Hamiltonian becomes
$$H_r(\mathbf{Q},\mathbf{P},t) = K(\mathbf{P}) + r(\epsilon t)\,V(\mathbf{Q}),$$
and the equations of motion are the corresponding Hamiltonian equations (3.1) and (3.2).

I use the standard technique of making the system autonomous by introducing new canonically conjugate variables *τ* and *e*, so that the Hamiltonian becomes
$$H(\mathbf{Q},\mathbf{P},\tau,e) = K(\mathbf{P}) + r(\tau)\,V(\mathbf{Q}) + \epsilon e,$$
subject to the initial condition *τ*=*ϵt*_{0} (so that *τ*=*ϵt* along trajectories). We are now in the same framework as the previous section, and there exists a symplectic change of coordinates *Ψ*_{r} under which *K*(**P**(*t*)) is exponentially close to *K*(**P**(*t*_{0})) for exponentially long times.

To compute our mapping, I choose *r* so that *r*(*τ*)=0 for *τ*≪0 and *r*(*τ*)=1 for *τ*≫0. Since the proofs of the estimates require an analytic function, I use a smooth analytic switch such as
$$r(\tau) = \tfrac{1}{2}\bigl(1 + \tanh(\alpha\tau)\bigr),$$
where *α* is a positive constant. For large negative times *τ*_{0}≪0, *r*(*ϵt*) has almost vanished, the potential term drops out of the equations, and the change of coordinates *Ψ*_{r} is trivially the identity. For large positive times *τ*_{1}≫0, *r*(*ϵt*) is almost equal to 1, the system reduces to our original dynamical system (2.10)–(2.11), and *Ψ*_{r}, when restricted to **Q** and **P**, is equal to *Ψ*. Since the magnitude of **P** in the transformed coordinates changes only by an exponentially small amount over the time interval [*t*_{0},*t*_{1}], if we choose initial conditions **P**(*t*_{0})=**0** and solve the system forwards to time *t*_{1}, then we obtain a system state whose fast degrees of freedom are exponentially small in magnitude as *ϵ*→0. This technique can equally be applied to numerical solutions using a symplectic integrator, since all the properties hold for the system with the modified Hamiltonian. If we choose *τ*_{0} so that *r*(*τ*_{0}) is exponentially small in *ϵ* (and similarly for *τ*_{1}), then we can apply the map over a finite time on the slow time scale (in practice, this is not prohibitively long). The constant *α* should be chosen so that *r*′(*τ*) is of a similar order of magnitude to |∇*V*(**x**)| in the dynamical region of interest; this choice minimizes the integration time *τ*_{1}−*τ*_{0} required to compute the map.
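As a concrete sketch (the tanh switch is one natural analytic choice, not necessarily the exact function used in the original work), the ramp and the endpoint check *r*(*τ*_{0})≪1 can be written as:

```python
import math

def r(tau, alpha=1.0):
    """Analytic ramp: r -> 0 as tau -> -inf, r -> 1 as tau -> +inf."""
    return 0.5 * (1.0 + math.tanh(alpha * tau))

def r_prime(tau, alpha=1.0):
    """Derivative of the ramp; its peak value alpha/2 sets the switching rate,
    which should be matched to the size of the potential gradient."""
    return 0.5 * alpha / math.cosh(alpha * tau) ** 2

# Choose tau0, tau1 where the ramp is negligibly different from 0 and 1:
# these are suitable endpoints for the map integration interval
tau0, tau1 = -10.0, 10.0
print(r(tau0), 1.0 - r(tau1))  # both tiny: the ramp has effectively saturated
```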

This allows us to construct a mapping that maps onto the exponentially accurate slow manifold:

— given input **Q**, choose initial conditions **Q**(*t*_{0})=**Q**, **P**(*t*_{0})=**0**, *τ*(*t*_{0})=*τ*_{0};

— solve the system (3.1)–(3.2) using a symplectic integrator until *t*=*t*_{1}=*τ*_{1}/*ϵ*; and

— the map is defined as *ϕ*(**Q**)=**Z**(*t*_{1}).

We do not know much about the properties of this map; in particular, we do not know where on the slow manifold a given point is mapped to. However, by numerically inverting it we can use it to constrain the dynamics to be on the slow manifold during a data assimilation procedure. We can also use this technique to measure the amount of fast dynamics present in the system state. This is done by replacing *r*(*τ*) by *r*(−*τ*) (i.e. running the ramp backwards) so that at late times the Hamiltonian is equal to *K*(**P**) and the fast energy can be measured from the magnitude of **P**.
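The whole construction can be sketched on a stand-in linear parcel model (an illustrative system with the same fast–slow structure, not the paper's toy model): $\dot{\mathbf{q}} = \mathbf{u}$, $\dot{\mathbf{u}} = -J_2\mathbf{u}/\epsilon - r(\epsilon t)\nabla V(\mathbf{q})$ with $V(\mathbf{q}) = |\mathbf{q}|^2/2$. Ramping *r* from 0 to 1 with **u**(*t*_{0})=**0** produces a balanced state; reverse-ramping any state back to *r*=0 leaves pure inertial motion, whose amplitude |**u**| serves as the fast-energy measurement described above.

```python
import math
import numpy as np

EPS = 0.1  # time-scale separation parameter

def r(tau, alpha=1.0):
    # Analytic ramp from 0 to 1 (assumed tanh form)
    return 0.5 * (1.0 + math.tanh(alpha * tau))

def rhs(t, z, ramp):
    # Stand-in parcel model: qdot = u, udot = -J2 u / eps - r(eps t) q
    q, u = z[:2], z[2:]
    J2u = np.array([u[1], -u[0]])
    return np.concatenate([u, -J2u / EPS - ramp(EPS * t) * q])

def integrate(z, t0, t1, dt, ramp):
    # Classical RK4 for brevity (the paper uses a symplectic method)
    t = t0
    while t < t1 - 1e-12:
        k1 = rhs(t, z, ramp)
        k2 = rhs(t + dt / 2, z + dt / 2 * k1, ramp)
        k3 = rhs(t + dt / 2, z + dt / 2 * k2, ramp)
        k4 = rhs(t + dt, z + dt * k3, ramp)
        z = z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return z

T, dt = 50.0, 0.01  # t-interval corresponding to tau in [-5, 5]

def phi(Q):
    # Slow-manifold map: start with zero momentum, ramp the potential on
    z0 = np.concatenate([Q, np.zeros(2)])
    return integrate(z0, -T, T, dt, r)

def fast_amplitude(z):
    # Reverse-ramp back to r = 0; the remaining |u| measures the fast energy
    z_end = integrate(z, -T, T, dt, lambda tau: r(-tau))
    return np.linalg.norm(z_end[2:])

Q = np.array([1.0, 0.5])
balanced = phi(Q)
naive = np.concatenate([Q, np.zeros(2)])  # u = 0 imposed directly at r = 1

a_map, a_naive = fast_amplitude(balanced), fast_amplitude(naive)
print(a_map, a_naive)
# the mapped state carries far less fast motion than the naive initialization
```

The naive initialization is off the slow manifold by the *O*(*ϵ*) balanced velocity, and that mismatch appears as inertial oscillation; the mapped state suppresses it by orders of magnitude, mirroring the exponential estimates.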

### (a) Variational data assimilation on the slow manifold

Typically, during 4DVAR one tries to choose optimal initial conditions for a dynamical system given statistics (mean and covariance) for the initial conditions *Z*_{b} (called a background state, or prior) and a set of observations
$$y_i = \mathcal{H}_i\bigl(Z(t_i)\bigr) + e_i,$$
where $\mathcal{H}_i$ are observation operators, and *Z*(*t*_{i}) and *e*_{i} are the true values of the data trajectory and the observation errors at sample time *t*_{i}, respectively. Making the assumption of Gaussian statistics, we seek a maximum-likelihood estimator for the initial conditions of the data. This maximum-likelihood estimate minimizes the functional
$$J\bigl(Z(t_0)\bigr) = \tfrac{1}{2}\bigl(Z(t_0)-Z_b\bigr)^T B_0^{-1}\bigl(Z(t_0)-Z_b\bigr) + \tfrac{1}{2}\sum_i \bigl(y_i - \mathcal{H}_i(Z(t_i))\bigr)^T R_i^{-1}\bigl(y_i - \mathcal{H}_i(Z(t_i))\bigr),$$
where *B*_{0} and *R*_{i} are the background and observation covariance matrices. For derivations, discussion and references, see [1].
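A minimal sketch of evaluating this functional (identity observation operators and a precomputed list of model states are simplifying assumptions made here for brevity):

```python
import numpy as np

def fourdvar_cost(z0, model_states, z_b, B0_inv, obs, R_inv):
    """Evaluate the 4DVAR functional J for one trajectory.

    z0           -- initial condition Z(t_0)
    model_states -- model states Z(t_i) at the observation times
    z_b          -- background state (prior mean)
    B0_inv       -- inverse background covariance B_0^{-1}
    obs          -- observations y_i (identity observation operator assumed)
    R_inv        -- inverse observation covariances R_i^{-1}
    """
    d = z0 - z_b
    J = 0.5 * d @ B0_inv @ d  # background (prior) term
    for z_i, y_i, Ri in zip(model_states, obs, R_inv):
        resid = y_i - z_i     # observation-minus-model residual
        J += 0.5 * resid @ Ri @ resid
    return J

# Sanity check: with z0 = z_b and noise-free observations of the model
# trajectory itself, the functional vanishes
z_b = np.array([1.0, 0.5])
B0_inv = np.eye(2)
states = [z_b, z_b]  # trivial "trajectory" for illustration only
obs = [z_b.copy(), z_b.copy()]
R_inv = [np.eye(2), np.eye(2)]
print(fourdvar_cost(z_b, states, z_b, B0_inv, obs, R_inv))  # → 0.0
```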

The aim of our data assimilation procedure is to seek inputs **Q** for the slow-manifold map *ϕ* such that the initial conditions **Z**(*t*_{0})=*ϕ*(**Q**) minimize *J*. This is done by the following procedure:

— *Numerically invert the map ϕ to find* *Q*_{b}=*ϕ*^{−1}(*Z*_{b}). For the toy system used in the numerical experiments in this paper, it was sufficient to use the Scientific Python (SciPy) nonlinear solver `optimize.fsolve()` with an initial guess obtained by choosing first-order balanced velocities at **Z**=*Z*_{b}, running backwards (as described at the end of the previous section) and taking the **Q** components. For larger systems, such as particle methods for the rotating shallow-water equations, it may be necessary to use more sophisticated Newton methods such as continuation or multiple shooting.

— *Attach the data assimilation window to the end of the map time interval* [*τ*_{0},*τ*_{1}] *and solve for initial conditions* **Q** *that minimize J*∘*ϕ*, *using* *Q*_{b} *as an initial guess*. In our numerical experiments, this was done using the nonlinear conjugate gradient algorithm (`optimize.fmin_cg()` in the SciPy package) with gradients computed using the adjoint equations. We found that the numerical minimization procedure stagnated in some cases if the initial guess *Q*_{b} was not used.
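The inversion step can be sketched with `scipy.optimize.fsolve` applied to a stand-in map (a simple contractive nonlinear map here; in the paper, *ϕ* is the slow-manifold map computed by time integration):

```python
import numpy as np
from scipy.optimize import fsolve

def phi(Q):
    """Stand-in nonlinear map (in the paper, phi is computed by integrating
    the ramped system from t0 to t1)."""
    return Q + 0.1 * np.sin(Q)

def invert_map(Z_target, Q_guess):
    """Solve phi(Q) = Z_target for Q, as in the first step of the procedure."""
    return fsolve(lambda Q: phi(Q) - Z_target, Q_guess)

Z_b = np.array([0.9, -0.4])
Q_b = invert_map(Z_b, Z_b)           # here Z_b itself serves as initial guess
print(np.allclose(phi(Q_b), Z_b))    # → True
```

As the text notes, for this root-finding to converge on larger systems a good initial guess (e.g. from first-order balance) or continuation methods may be needed.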

## 4. Numerical experiments

In this section, I illustrate the method using the toy model (2.10)–(2.11) with *n*=1, *M*=*I* and a quartic potential *V*. For small *ϵ*, this system has an exponentially accurate slow manifold which produces periodic trajectories. The system is integrated using the second-order symplectic splitting method given by equations (4.1)–(4.5), which is the extension of the Verlet method to the non-canonical structure matrix for this system. The only modification required to apply the map is to scale the potential term in the middle step by *r*(*ϵt*). To apply the map, the system is integrated from *t*=−5/*ϵ* to *t*=5/*ϵ*, with Δ*t*=0.1. Illustrative plots are shown in figure 1, which shows how it is possible to control the amount of fast dynamics present in the system by altering the initial conditions for **P** at the beginning of the map.
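The splitting pattern can be illustrated in its simplest canonical setting (the paper's scheme extends this kick-drift-kick pattern to its non-canonical structure matrix, which is not reproduced here); halving the step size should reduce the final error by a factor of about four, confirming second order.

```python
import math

def verlet_step(q, p, dt):
    """One Stoermer--Verlet (kick-drift-kick) step for H = p^2/2 + V(q),
    with V(q) = q^2/2 chosen for illustration."""
    p = p - 0.5 * dt * q   # half kick: p -= dt/2 * V'(q)
    q = q + dt * p         # full drift with updated momentum
    p = p - 0.5 * dt * q   # half kick with updated position
    return q, p

def final_error(dt, T=1.0):
    q, p = 1.0, 0.0
    for _ in range(round(T / dt)):
        q, p = verlet_step(q, p, dt)
    # exact solution of the harmonic oscillator: q(T) = cos T, p(T) = -sin T
    return math.hypot(q - math.cos(T), p + math.sin(T))

# Error ratio under step halving: ~4 for a second-order method
print(final_error(0.01) / final_error(0.005))
```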

To illustrate the issue with assimilating noisy sparse observations, I used the map to construct a trajectory with exponentially weak fast dynamics, and took five observations of the **Q** variable, which were perturbed by normally distributed *N*(0, 0.2^{2}) noise, together with an estimate of the background state. Equal statistical weight was given to each observation and to the background state, with no cross-correlations. I then performed standard 4DVAR, in which optimization over all four components of the initial conditions was performed. The nonlinear conjugate gradient method was iterated to a gradient tolerance of 10^{−10}, and convergence was achieved in 31 iterations using 71 function evaluations (forward runs) and 71 gradient evaluations (adjoint runs), with an optimal functional value of 0.54. Figure 2 clearly shows that fast motions have been introduced to reach a lower value of the optimized functional. Note that the slow part of both solutions also varies quite a bit, because the variance of the observation error is large (0.2^{2}). The balanced assimilation is more constrained because it cannot reach the data through fast motions, and so the differences between observations and model trajectory are distributed differently.

Figure 3 shows the optimal trajectory obtained from the same five data observations of the particle position **Q**, using the map to constrain the dynamics to the slow manifold. The nonlinear conjugate gradient method was iterated to a gradient tolerance of 10^{−10}, and convergence was achieved in 16 iterations using 55 function evaluations (forward runs) and 55 gradient evaluations (adjoint runs), with an optimal functional value of 0.59. The plots show that the noise in the observations does not introduce any visible fast dynamics, as the initial conditions have been constrained to lie on the exponentially accurate slow manifold.

## 5. Summary and outlook

In this paper, I described a nonlinear mapping that can be used to exactly constrain the amount of fast dynamics in a range of slow–fast model dynamical systems. This mapping can be extended to the modified Hamiltonian systems that arise when symplectic integrators are used to discretize in time. The mapping can be used as an exponentially accurate filter, removing fast oscillations that would otherwise be excited by noisy observations in a data assimilation process. In comparison, the digital filter of [15] only removes the first-order component of the fast oscillations. I described how to apply this map to variational data assimilation and illustrated the method applied to a simple toy model of balance, integrated using a second-order symplectic splitting method.

The model systems described in this paper include particle and particle–mesh methods for rotating fluids, and future work will be to perform more numerical experiments for these models. An important consideration will be that these methods usually use compactly supported basis functions, which are not analytic, while the exponential estimates require analytic potentials. However, previous experience with these basis functions [23] suggests that the fast motion present after applying the map will still be very weak. The connection with the optimal PV balance also needs to be explored, as does the quasi-geostrophic scaling (I only consider the semi-geostrophic scaling here).

In general, the true trajectory may not be exponentially close to the slow manifold, in which case this procedure can be modified, since it is possible to select the magnitude of the fast dynamics in the system. An important issue is whether it is possible to estimate the distance from the slow manifold from the data; from sparse noisy observations this seems to be difficult, as the unbalanced motion only causes a very weak phase shift in the mean trajectory.

In this paper, I focused on variational data assimilation. However, the method could also be extended to other techniques such as the ensemble Kalman filter and particle filters. These methods involve propagating forward an ensemble of trajectories that approximate a probability density function (PDF) for the solution state. One interesting approach would be to consider PDFs for the **Q** components of the system only, with the **P** components being obtained by inverting the map *ϕ*; ensemble filters could then be performed on the exponentially accurate slow manifold. The method of this paper could also be used to generate ensembles with a chosen PDF for the distance from the slow manifold.

## Footnotes

One contribution of 13 to a Theme Issue ‘Mathematics applied to the climate system’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.