Royal Society Publishing

Fitting ordinary differential equations to short time course data

Daniel Brewer , Martino Barenco , Robin Callard , Michael Hubank , Jaroslav Stark


Ordinary differential equations (ODEs) are widely used to model many systems in physics, chemistry, engineering and biology. Often one wants to compare such equations with observed time course data, and use this to estimate parameters. Surprisingly, practical algorithms for doing this are relatively poorly developed, particularly in comparison with the sophistication of numerical methods for solving both initial and boundary value problems for differential equations, and for locating and analysing bifurcations. A lack of good numerical fitting methods is particularly problematic in the context of systems biology where only a handful of time points may be available. In this paper, we present a survey of existing algorithms and describe the main approaches. We also introduce and evaluate a new efficient technique for estimating ODEs linear in parameters particularly suited to situations where noise levels are high and the number of data points is low. It employs a spline-based collocation scheme and alternates linear least squares minimization steps with repeated estimates of the noise-free values of the variables. This is reminiscent of expectation–maximization methods widely used for problems with nuisance parameters or missing data.


1. Introduction

Ordinary differential equations (ODEs) are one of the most popular frameworks for describing the temporal evolution of a wide variety of systems in physics, chemistry, engineering and biology (e.g. Gershenfeld 1999). Such models take the formEmbedded Image(1.1)where x is a vector of variables evolving with time; f is a vector field; and α denotes an (optional) set of parameters. Once an ODE model has been built, a vast array of powerful analytical and numerical methods exists for exploring its properties. Any such model will usually involve one or more parameters, α. In some cases, particularly in physics where models are based on well-understood physical mechanisms, such parameters may be derived from first principles, or measured directly. Increasingly, however, ODEs are being applied in disciplines such as cell and molecular biology where many parameters cannot be determined by either of these approaches.

In such situations, one can attempt to estimate the unknown parameters from experimentally measured data. In most cases, these data consist of time series, or time courses, of repeated measurements of one or more experimental variables. In cell and molecular biology, these might, for instance, be mRNA or protein concentrations measured at several time points throughout an experiment. One can fit such data by systematically varying the parameters to determine a set of parameters which minimize the difference between a solution of the differential equation and the data (e.g. Gershenfeld 1999). One can also apply the same methodology to build a ‘black-box’ model by starting with a general set of basis functions (such as polynomials or radial basis functions) and then estimating their coefficients by minimizing the discrepancy between model and data. In principle, such an approach can be used to deduce the network of interactions between the variables (Stark et al. 2003a,b) which in turn can lead to more refined mechanistic models.

A variety of approaches exists to fit data to ODE models. Unfortunately, many of these are poorly documented in the literature, and may only be described in the context of specific applications in specialist publications. We therefore present an overview of the key concepts below. We show that such methods can be classified by whether the ODE is solved using a conventional iterative numerical integrator such as fourth-order Runge–Kutta, or whether a global solution is approximated using splines or related methods. This distinction is similar to that between shooting and collocation methods for boundary value problems (Golub & Ortega 1992). Although global methods are now generally preferred for solving boundary value problems, they are poorly developed in the context of fitting ODEs to data. In such cases, shooting is generally more popular and well known.

Most experiments in cell and molecular biology produce very short time courses, often with very noisy measurements. Shooting-type methods tend to perform particularly poorly in such situations; we show an example of this below. As ODEs become more widely applied in this area, particularly with the current rapid growth in systems biology, there is therefore an urgent need to develop alternative methods more suited to this type of data. The use of collocation-type algorithms appears to be particularly attractive. We explain the main ideas behind this approach below, describe existing algorithms and present a new two-stage algorithm for ODEs that are linear in parameters. To illustrate these ideas, we use a model of the p53 tumour-suppressor network, which is described briefly in appendix A.

2. Estimating parameters in ODEs

(a) General principles of model fitting

Any method for estimating model parameters from data requires two main ingredients. We need to construct an error function Embedded Image that quantifies the difference between a model with parameters α and the data, and we need an optimization method that finds the value of α that minimizes Embedded Image. Except for error functions that have particular features (such as those occurring in linear least squares problems), the minimization stage requires an iterative approach. Typically, the minimization method can be chosen independently of the construction of the error function. In some cases, however, an integrated approach can have advantages. This can occur, for instance, if it is sufficient to compute only a rough estimate of Embedded Image at each iteration of the minimization scheme, rather than calculating Embedded Image exactly.

A wide variety of standard minimization algorithms exist (e.g. Gershenfeld 1999). Where Embedded Image has no, or only a few, local minima apart from the global minimum, then methods that iteratively step downhill, such as the Nelder–Mead simplex method (Nelder & Mead 1965 or see Press et al. 2002) or the Levenberg–Marquardt method (Marquardt 1963 or see Press et al. 2002) work well. If, on the other hand, Embedded Image has a more complex landscape, stochastic search algorithms such as simulated annealing (Gershenfeld 1999 or Press et al. 2002), Markov Chain Monte Carlo (e.g. Gilks et al. 1996) or genetic algorithms (e.g. Gershenfeld 1999) are often necessary. These are usually computationally very demanding.

Attempting to estimate the parameters α of the ODE in equation (1.1) from observed data presents additional difficulties. These are mainly centred on the construction and efficient computation of a suitable error function. This is because we cannot directly determine how well a given set of data pointsEmbedded Imagefits the ODE in equation (1.1). We shall describe two common strategies for the construction and minimization of appropriate error functions.

(b) Solution-based approaches

By far the better known, and arguably statistically more valid, approach is to solve equation (1.1) numerically to obtain an approximate solution Embedded Image such thatEmbedded Image(2.1)Since this is meant to provide a model for the data, the points Embedded Image should be close to the values Embedded Image. It is thus natural to base the error function Embedded Image on the difference between Embedded Image and Embedded Image. The most common choice is to use the least squares errorEmbedded Image(2.2)possibly weighted by the reciprocal of the noise level at each data point. The subscript D here highlights that this is an error between the data and the solution u, in order to distinguish this quantity from other error functions defined below. When measurement errors are independently normally distributed, Embedded Image will be the logarithm of the likelihood of the data (or more strictly, log (likelihood)), and minimizing equation (2.2) is equivalent to maximum likelihood estimation of the parameters. It is also possible to use this approach even if we cannot measure all of the components of the state vector Embedded Image. In such a case, the norm in equation (2.2) is just taken over the measured components. As long as we have a sufficient number of data points to ensure that Embedded Image has a non-degenerate minimum, it may still be possible to estimate parameters successfully.

Observe that when f has linear dependence on its parameters, the error function given by equation (2.2) is a quadratic form. The minimum in such a case can be obtained efficiently in one step using algebraic methods such as QR decomposition (e.g. Lawson & Hanson 1974). This is much faster and more robust than the iterative optimization algorithms mentioned above. For ODEs this possibility seems irrelevant, however, since even if the vector field f depends linearly on the parameters α, the solution Embedded Image will depend nonlinearly on α. Nevertheless, we shall present below a new approach that makes use of linearity in parameters.

The most popular method of solving the differential equation is to use a standard numerical integration scheme such as fourth-order Runge–Kutta, possibly with adaptive step-size control (e.g. Press et al. 2002). Such an approach immediately encounters a major potential obstacle: we very rarely know the correct initial condition Embedded Image for all of the variables. Even if we are able to measure all of the components of Embedded Image, such measurements will inevitably be subject to some error. Thus, in practice, we only have Embedded Image where Embedded Image denotes the experimental error. If the differential equation is globally stable, and Embedded Image is small, the difference between Embedded Image and Embedded Image may not be too significant. However, most models in the real world are nonlinear. In such a case, the dynamics can dramatically amplify small differences in the initial conditions. Attempting to estimate parameters for such systems using Embedded Image as an initial condition leads to poor results.

(c) Shooting

A better approach is to regard the initial condition Embedded Image as an additional set of unknown parameters which are incorporated in the minimization scheme. We thus regard the error function Embedded Image as depending on both α and Embedded Image. This type of approach appears to have first been tried by Bellman et al. (1967) and similar ideas appear in Swartz & Bremermann (1975). It is closely related to shooting methods for boundary value problems, including methods used for finding periodic orbits and other special solutions (e.g. Golub & Ortega 1992; Kuznetsov 1995). It can work well if data are plentiful and noise levels are low. However, if we only have a few time points, as is typical of cell and molecular biology datasets, Embedded Image can have a large number of local minima separated by steep peaks and ridges. As we shall see below, it is difficult to find the global minimum in such cases.

One possible extension is to use multiple shooting methods where the solution is broken down into a number of successive segments, with appropriate matching at the joints (e.g. Kuznetsov 1995; Timmer et al. 2000). This can improve results, but our experience with even moderately complex models is that it can suffer from similar problems to simple shooting.

(d) Collocation methods

An alternative to using iterative numerical integration is to represent the solution globally using a set of convenient basis functions Embedded Image. A suitable choice is usually given by piecewise polynomials, usually of low order. If we match values and derivatives at the joins between adjacent polynomials, globally smooth functions can be obtained, called splines. We can represent these as linear combinations of the formEmbedded Image(2.3)A variety of different types of spline are available, but often the most convenient are B-splines (De Boor 1978; Gershenfeld 1999). Such splines have minimal support with respect to degree, smoothness and partition, and any other spline function of a given degree, smoothness and domain partition can be represented as a linear combination of B-splines of that same degree and smoothness over the same partition. B-splines have the great advantage that each Bj has compact support so that evaluating Embedded Image requires only summing over a small number of Bjs. More precisely, if we partition the domain using the knots Embedded Image, we can recursively define basis functions of increasing order byEmbedded ImageEmbedded ImageNote that the basis function at a given degree is obtained by interpolating appropriate combinations of basis functions of one degree less. It is easy to see that Embedded Image is 0 outside the interval Embedded Image. In applications, typically cubic B-splines are used (with i=3). We shall restrict ourselves to the case of uniform spacing of the knots and define Embedded Image in order to give a more symmetric form. An explicit formula is given byEmbedded Imagewhere h is the spacing between the knots sj. Observe that for Embedded Image the sum in equation (2.3) reduces toEmbedded ImageFurthermore, evaluating at the knot point sj, we haveEmbedded ImageThis can be written in matrix form asEmbedded Image(2.4)where Embedded Image and Embedded Image are dummy values required to give a well-determined system. These determine the behaviour of u at the boundaries and can, for instance, be chosen to set the second derivatives of u at the boundaries to 0 (a so-called natural cubic B-spline). The matrix in (2.4) is explicitly invertible (though in practice one would solve equation (2.4) using a specialized LR decomposition). This shows that we can either parametrize the spline Embedded Image using the coefficients Embedded Image, or using its value at the knot points Embedded Image. This makes it straightforward to obtain a spline that interpolates any particular set of points. It is also possible to generalize this derivation to irregularly spaced knot points, which can have significant advantages in some applications.

The Bj are polynomials and hence using equation (2.3), we can easily differentiate u and substitute the derivative into equation (1.1). In general, since we only take a finite number of basis functions, we cannot expect this to be satisfied exactly everywhere, i.e. for every t. Instead, we choose a finite number of so-called collocation points Embedded Image, and require (1.1) to hold at these points, so thatEmbedded Image(2.5)for all Embedded Image. Note that the derivatives Embedded Image can be easily pre-computed for any given set of basis functions and collocation points, so that equation (2.5) represents a system of algebraic equations for Embedded Image, or for the values Embedded Image via equation (2.4). Note that equation (2.5) is independent of the particular choice of spline, or indeed other basis functions.

Given an appropriate choice of splines and collocation points, the system (2.5) is well determined and can be solved to yield the coefficients of the approximate solution of the differential equation. It turns out that this is equivalent to an appropriate implicit Runge–Kutta integration scheme. This approach is particularly useful for boundary value problems (Villadsen & Stewart 1967) and today forms the basis of standard packages such as Auto for finding periodic orbits and bifurcation points. The basic principle is to use the Newton method or a similar root finder to solve equation (2.5), together with appropriate side conditions. These ensure that u is periodic (or homo-/heteroclinic for certain global bifurcations) and in the case of finding local bifurcations has particular eigenvalues. In the case of bifurcations, it is necessary for the root finder to simultaneously vary both the coefficients bj and the parameters α. This bears close resemblance to parameter estimation from data and suggests that collocation-based methods may also be useful in the latter problem.

However, there is an important difference between the two problems. When finding bifurcations, we simply want to solve a set of (nonlinear) equations made up of equation (2.5) and whatever side conditions we need to specify the bifurcation. In the case of parameter fitting, the side conditions are replaced by the minimization of ED(α) which has to be carried out simultaneously with the solution of equation (2.5). There are a number of possible approaches to this. Observe that if u is given by equation (2.3) then ED has no direct dependence on α but rather is a function of b. From now on, we shall thus denote it byEmbedded Image

(i) Nested minimization and collocation

This is closely related to shooting, except that instead of integrating the differential equation using a method such as fourth-order Runge–Kutta, we use the Newton method to solve equation (2.5) at each candidate value of the parameters α. This Newton solver is then nested inside an optimization method that iteratively minimizes Embedded Image. This is potentially very slow, and we are not aware of this method appearing in the published literature.

(ii) Simultaneous minimization and collocation

It is possible to construct combinations of gradient-based minimization and Newton root solving which essentially simultaneously linearize both equation (2.5) and Embedded Image. Given a reasonable initial guess, such an algorithm will rapidly converge to a minimum of Embedded Image that satisfies equation (2.5). The first such method appears to have been published by Baden & Villadsen (1982). Biegler (1984) independently initiated the application of collocation-based methods to dynamic optimization, which includes parameter estimation as a special case. His method uses sequential quadratic programming, a common algorithm for minimizing an objective function subject to equality constraints without requiring satisfaction of the constraints at each iteration. This scheme solves the exact constraint such as equation (2.5) once, and then at subsequent iterations, it uses only the linearization of the constraint. This is combined with a quadratic approximation to the objective function Embedded Image which is easily minimized subject to the linearized constraint. It can be shown that such an iteration converges quadratically to the desired minimum. Biegler's algorithm has stimulated a range of variations and extensions (e.g. Tjoa & Biegler 1991; Esposito & Floudas 2000; Wang 2000; Li et al. 2005) and is widely used in the chemical engineering community.

(iii) Dual minimization

Observe that instead of using a root finder to solve equation (2.5), we could minimize the difference between the r.h.s. and l.h.s.Embedded Image(2.6)with respect to Embedded Image. We could also give a different weight to each component of the ODE (e.g. Ramsay et al. 2007). We can think of Embedded Image as a measure of how well the spline Embedded Image, defined by b, satisfies the differential equation (1.1). This approach has the advantage that we can take a larger number q of collocation points. In that case, equation (2.5) will be overdetermined, and no longer have a solution, but it is still possible to find a minimum of Embedded Image. Indeed, in the limit Embedded Image, we haveEmbedded Image(2.7)The minimization of such a function is often used in the derivation of various collocation schemes (e.g. Golub & Ortega 1992). Given either version of Embedded Image, now our problem is to simultaneously minimize both Embedded Image and Embedded Image with respect to both b and α. This is most easily achieved by minimizingEmbedded ImageHere λ is a weighing factor that determines how much emphasis we place on the data and the model, respectively. This is a particular attraction of this approach, since if we have high confidence in the model but the data are very noisy, we can take λ large, and conversely if the measurement error for the data is low but the model is suspect, we can use a low λ.

Brewer (2006) compares the Nelder–Mead simplex algorithm and simulated annealing in the minimization Embedded Image, with λ=1 and Embedded Image defined by equation (2.6). A more sophisticated optimization scheme has recently been presented by Ramsay et al. (2007) for the case of equation (2.7). Ramsay et al. (2007) also analyse the behaviour of their method in the limit λ→∞.

An alternative algorithm, applicable when f is linear in parameters, was introduced in Brewer (2006) and is presented in §2e. This is motivated by the observation that if the data are observed without error, so that Embedded Image, then we know that the rate of change of a solution at ti is precisely Embedded Image. In such a case, it would be reasonable to replace Embedded Image byEmbedded ImageThe advantage of this is that Embedded Image is a quadratic form (i.e. a linear least squares problem) in Embedded Image. Since Embedded Image is also a quadratic form, the overall error Embedded Image is a linear least squares function that can be minimized using a single QR decomposition.

Of course, in general, the data are subject to measurement error. We can, however, still use a modified error function of the above type, with our best estimate Embedded Image of the real measurement replacing Embedded Image in the above formula. Motivated by the expectation–maximization methods used in the case of nuisance parameters or missing data (Dempster et al. 1977; Moon 1996), we alternatively minimize Embedded Image and generate a new estimate of the data. Full details are given below.

(e) Derivative approximation methods

All of the methods presented so far use the least squares error function in equation (2.2) which determines the discrepancy between the observed data and a numerically computed solution of the differential equation. An alternative is to minimize the discrepancy between the r.h.s. and l.h.s. of the differential equation at a selected number of data points. This gives an error function likeEmbedded Image(2.8)This relationship between minimizing Embedded Image and Embedded Image under suitable conditions is discussed by Baden & Villadsen (1982). Observe the close similarity between Embedded Image and Embedded Image in equation (2.6). In particular, if we chose an approximate solution u given by equation (2.3), then Embedded Image. In such circumstances, minimizing Embedded Image will be equivalent to minimizing Embedded Image in the limit Embedded Image.

Note, however, that Embedded Image has no direct dependence on the observed data Embedded Image. In the case of Embedded Image, for a finite λ, such dependence is obtained through Embedded Image. Alternatively, one can incorporate the data directly into Embedded Image by choosing the points Embedded Image in equation (2.8) to be exactly or approximately the data points Embedded Image. The advantage of this is that it can be done without the need to solve the differential equation (1.1). Instead, we simply need to estimate the derivative Embedded Image at the points of interest. This can be done by smoothing the data Embedded Image using an appropriate local polynomial (or globally using a spline), and then differentiating the result.

This approach appears to have first been employed by van den Bosch & Hellinckx (1974), possibly inspired by collocation-based methods for parameter estimation in partial differential equations (Seinfeld 1969). Baden & Villadsen (1982) suggested an improvement and compared both schemes with the simultaneous minimization of Embedded Image and solution of equation (2.5), as described above. Swartz & Bremermann (1975) independently published an algorithm of this type and compared it with a shooting method. In contrast to van den Bosch & Hellinckx (1974) and Baden & Villadsen (1982) who used a global spline to smooth the data, Swartz & Bremermann (1975) employed local polynomials to either approximate or interpolate several successive data points, without any attempt to match such local fits at their intersections. Varah (1982) presented and evaluated an algorithm based on their ideas but again using a global spline smoothing method.

Finally, in a modern systems biology context, where observations are only available for a very restricted number of time points, Barenco et al. (2006) used Lagrangian interpolation between nearby data points to estimate the required derivative. This allowed them to estimate both parameters in a simple model of gene expression and to estimate the activity of an unobserved transcription factor (p53) driving the system. This, in turn, allowed the prediction of previously unknown targets of p53.

3. A new efficient method for ODEs linear in parameters

(a) Linearity in parameters

Our experience is that direct minimization of Embedded Image can be time-consuming (Brewer 2006). One would thus like to use specific features of a particular class of models to develop a more efficient algorithm. In particular, many models in physics, chemistry, engineering and biology are linear in their parameters. This is true for both models derived from underlying principles such as mass–action models in chemistry and cell biology, and for data-driven ‘black-box’ models built with general set of basis functions (e.g. radial basis functions). Recall that if a model is linear in its parameters, then the objective function is a quadratic form and a least squares estimate can be obtained in one step (i.e. non-iteratively) using standard linear algebra techniques such as QR decomposition (e.g. Lawson & Hanson 1974). This is much faster than the iterative minimization routines required for models that are nonlinear in parameters.

It seems difficult, however, to make use of this for ODE models, since even if the vector field Embedded Image depends linearly on the parameters α, the solution Embedded Image will typically exhibit nonlinear dependence. This makes it difficult to benefit from the linearity of f for solution-based approaches employing ED or Embedded Image.

On the other hand, if Embedded Image is linear in α then for fixed Embedded Image in equation (2.8) the error function Embedded Image is a quadratic form that can be minimized in the usual way using QR decomposition. In other words, equation (2.8) is a linear least squares problem in α. This makes methods based on Embedded Image particularly attractive for ODEs that are linear in parameters. However, as Baden & Villadsen (1982) point out, Embedded Image is the wrong objective function from a likelihood point of view. Our aim here, therefore, is to use the similarity between Embedded Image and Embedded Image highlighted above to derive an algorithm that approximately minimizes Embedded Image making full use of the linearity of Embedded Image with respect to α.

(b) Overview of the new algorithm

As already indicated above, if the data are known precisely, we can replace Embedded Image by Embedded Image which is a linear least squares objective function. We can employ the same principle even with noisy data if we replace Embedded Image in Embedded Image by our best available estimate of the noise-free data values. More generally, we do not need to restrict to just the observed time points ti. In particular, if we have estimates Embedded Image of the variables at the collocation points rk, we obtain the modified model error functionEmbedded Image(3.1)which yields the overallEmbedded Image(3.2)Motivated by expectation–maximization methods used in the case of nuisance parameters or missing data (Dempster et al. 1977; Moon 1996), we can attempt to fit the model by generating a sequence of better and better estimates Embedded Image. At each stage, this estimate is substituted into Embedded Image which is minimized with respect to Embedded Image. The resulting b is used to generate a new estimate Embedded Image using equation (2.3). We thus alternate estimating the expected value of the nuisance parameter with a minimization of the parameter of interest. Note, however, that our algorithm differs from a conventional expectation–maximization method where we would only minimize α, whereas here we simultaneously minimize both α and b.

(c) Description of the algorithm

We iteratively generate a sequence of estimates Embedded Image with Embedded Image given byEmbedded Image(3.3)

To do this, we proceed as follows.

  1. The iteration is initialized with a spline Embedded Image obtained by smoothing the data. More precisely,Embedded Image is given by equation (3.3) with b(0) the minimum of Embedded Image.

  2. For each Embedded Image we obtain Embedded Image from Embedded Image by minimizing Embedded Image with respect to Embedded Image. This is a linear least squares problem which is carried out in one step using QR decomposition.

  3. We define Embedded Image from Embedded Image using equation (3.3).

  4. If for some preset tolerance δ we have Embedded Image then terminate, otherwise return to step (ii).

(d) Properties of the solution

When the iteration terminates, we have Embedded Image to some pre-specified numerical tolerance. ThusEmbedded Imageand henceEmbedded ImageSince Embedded Image minimizes Embedded Image with respect to Embedded Image, we see that Embedded Image is a minimum of Embedded Image with respect to α. In general, it does not appear to be possible to ensure that Embedded Image is also minimized with respect to b but in practice the distinction between Embedded Image and EM appears to have negligible effect.

(e) Implementation

The algorithm was implemented in C++, using the TNT library (Pozo 2004). A weight of λ=1 and a stopping tolerance of Embedded Image were used, unless otherwise stated. In evaluating the accuracy of the final spline to the model, we used the integral form of Embedded Image, as defined in equation (2.7). This was evaluated using Simpson's rule with 10 000 steps (Press et al. 2002).

4. Numerical results

(a) A test model

As a specific example, we consider a simple four-component model of the core of p53 gene regulatory network; see appendix A and Brewer (2006). This ODE describes the behaviour of active ATM (a), p53 (z), active p53 (x) and MDM2 (y) after a cell experiences DNA damageEmbedded ImageIn total, the model depends linearly on its nine parameters Embedded Image and k4. We generated a simulated dataset for the parameter values given in table 4a using a fourth-order Runge–Kutta scheme with step size of 10−8 implemented in C++ (Press et al. 2002). Different sized datasets were created by sampling from this set at fixed intervals.

To simulate measurement noise, we added an independent Gaussian error to each data point. This had mean 0 and variance σ2, with σ a constant value for all data points. A number of different noise levels were used, ranging from σ=0 to σ=0.06. For each noise level, we report results averaged over 1000 independent realizations, each of which was fitted to the model.

(b) Results: shooting method using Nelder–Mead optimization

We implemented a shooting algorithm approach in C++ using a Runge–Kutta algorithm with adaptive step size control as the integrator (Fehlberg 1968; Cash & Karp 1990) and a Nelder–Mead simplex method as the optimization routine (Nelder & Mead 1965; Press et al. 2002). A starting simplex was constructed around an initial parameter estimate Embedded Image byEmbedded Imagewhere Embedded Image is the unit vector in the ith coordinate direction. Four different choices of initial guess were tried, as shown in table 1.

View this table:
Table 1

The points in parameter space used as the initial starting point P0 for the Nelder–Mead optimization. (The first of these (A) consists of the true parameter values used to generate the data and was used as a stability check for the algorithm.)

We used a stopping criterion based on the fractional range of the simplexEmbedded Imagewhere η is the required accuracy; lh is the highest value of the objective function among the vertices of the simplex; and Embedded Image is the lowest value.

Table 2 shows the results of this method for the four different initial starting points from table 1. This demonstrates that the parameter space contains many local minima, and unless the algorithm is started close to the true value, it is difficult to recover the correct parameter estimates. Using the Powell minimization method (Acton 1990), instead of Nelder–Mead, produced similar results (data not shown). Multiple local minima are common when applying parameter estimation to ODE models (Esposito & Floudas 2000), especially when the model is nonlinear and there are a large number of parameters. In this case, using any kind of local method, including other approaches such as gradient- or Hessian-based methods, will not be sufficient unless the starting parameter estimates are close to their true values.

View this table:
Table 2

Parameter estimates obtained using a Nelder–Mead-based shooting method. The first column indicates the starting point P0 for the simplex as in table 1. (The dataset consisted of all four variables sampled at 1052 time points with no noise added (so that σ=0). The notation ‘it’ indicates the number of iterations before convergence and ‘LSQ’ the final least squares error ED. Implementation constants were ζ=10 and η=10−10.)

(c) Results: shooting method using simulated annealing

An alternative is to use a global minimization method such as simulated annealing (Metropolis et al. 1953; Kirkpatrick et al. 1983; Kirkpatrick 1984; Gershenfeld 1999 or Press et al. 2002), Markov Chain Monte Carlo (e.g. Gilks et al. 1996) or genetic algorithms (e.g. Gershenfeld 1999). Such methods all have some possibility of moving to a worse solution during a systematic search of the parameter space and hence are able to escape out of local minima. Simulated annealing is probably the oldest and best established of these methods. This global minimization method slowly ‘cools’ the system, where the ‘temperature’ determines the probability that a step in parameter space that worsens the error is accepted. Generally, the step is determined by taking a random point from a Gaussian distribution with the mean set at the current point but here a refinement is implemented that uses the Nelder–Mead algorithm to propose the step (Cardoso et al. 1996; Kvasnicka & Pospichal 1997; Torres et al. 1997; Press et al. 2002). The cooling scheme is an important factor in the effectiveness of the algorithm. Here we employ the one recommended by Press et al. (2002), taking Embedded Image where Embedded Image is the initial temperature, k is the total number of moves so far and K is the estimated number of moves required.

The above scheme was implemented in C++. A number of different choices of cooling parameters (starting temperature and rate of cooling) and initial conditions were applied to the 1052 time point dataset with no measurement error (Embedded Image) using a number of different G5-based computers with processor speeds between 1.6 and 2.0 GHz (tables 3 and 5). Convergence to the global minimum only occurred in approximately one-quarter of the runs and these successful runs all started with the same initial parameters. The duration of the runs was consistently high, almost always taking more than a day to converge. In order to provide the best chance of success, the test was performed with a large dataset; a reduction in the dataset would render the parameter space more complex and make convergence to the optimal solution harder.

View this table:
Table 3

Parameter estimation using shooting with simplex simulated annealing. (Initial temperatures of 10, 100 and 1000 correspond to initial acceptance percentages of 52, 68 and 71% (based on 1000 proposed steps after an initial transient of 100 steps). The total number of steps was chosen for its computational feasibility: 106 steps can require several days of computer time.)

We conclude that simulated annealing can give good parameter estimates, but requires a lot of time tuning the minimization algorithm for success. Even when it does work, it is slow and is at the limit of practicality on current popular hardware. These conclusions will apply to any global minimization method that relies on single shooting to determine the error function. Multiple shooting (e.g. Kuznetsov 1995; Timmer et al. 2000) can help to regularize the parameter space and generally provides a more robust algorithm, but in our experience still retains many of the same problems as ordinary shooting.

(d) Results: novel collocation scheme for ODEs linear in parameters

Finally, we applied the algorithm presented in §3 for a variety of combinations of noise levels and sizes of dataset. When the dataset is relatively large (1052 time points), we obtain accurate parameter estimates using 22 splines (p=19). In this case, except for Embedded Image, all estimated parameters are within 0.1% of their true values (table 4b). The solution splines are virtually indistinguishable from the true time course (data not shown) and satisfy the model to a high degree of accuracy (Embedded Image). The algorithm took approximately 2 min on a 2.0 GHz G5 Power Macintosh when p=19, n=1052, q=n, Embedded Image and Embedded Image=0.06. This is significantly faster than the shooting methods above. Furthermore, the speed improves considerably as the amount of data is reduced (12 s required when n=106).

View this table:
Table 4

Evaluation of the algorithm presented in §3. (a) True parameter values. (b) Percentage error for parameter estimates from a 1052 time point dataset with p=19, q=1052, λ=1 and ω=0.464. (c) Parameter estimates from a six time point dataset when p=19, q=29 and ω=0.464, where ωq/n.

As the size of the dataset is decreased, the accuracy of the estimates also declines (figure 1). This occurs because the larger the dataset, the more constrained the spline is and hence the closer it will be to the ‘true’ solution and the more accurate the estimates will be. However, the loss of accuracy is minimal down to approximately n=150. Even with very small amounts of data, the estimates are still reasonably accurate; when n=14 the error is less than 7% which is perfectly usable in many systems biology contexts. This behaviour was consistent for each of the parameter estimates, but there were orders of magnitude differences in the error, ranging from less than Embedded Image to 15% when n=14 (see figure 7 in appendix B). This may reflect the relative contributions a parameter has on the model solution. We used q=n throughout as preliminary experiments showed that when there was no error in the data, this consistently produced good results in the minimum amount of processing time.

Figure 1

The relationship between the amount of data and the accuracy of the parameter estimates. This plot shows the accuracy when applied to datasets with between 14 (the minimum possible in this situation) and 1052 time points, with p=19, q=n, Embedded Image. Plots for individual parameters are given in figure 7 in appendix B.

We next added varying levels of noise to the data, and found that the algorithm continues to perform well (figure 2). The mean estimate generally occurs close to the true value and is always within one standard error. As Embedded Image increases, the estimates shift away from the true parameter value and this shift is significantly greater in some parameters than others. As the error in the data increases, the positional constraints of the spline (i.e. the data points) less resemble the true solution and so the optimal spline is likely to deviate, which in turn produces less accurate estimates.

Figure 2

(a,b) The effect of increasing measurement error on the accuracy of parameter estimates. Results are based on 1000 independent noise realizations with p=19, Embedded Image, Embedded Image. The error bars indicate the standard error of the parameter estimates. A Embedded Image of 0.06 corresponds to approximately 75% relative error for active p53 and 12% for the other components. For results of the remaining parameters see figure 6 in appendix B.

The performance of the algorithm depends on a number of adjustable factors, including p, q and Embedded Image. It is beyond the scope of this paper to look at these factors in detail (Brewer 2006), but here we will briefly examine two situations where their optimization is beneficial. When the amount of error in the data is large, it is no longer appropriate to give equal weight to the spline being close to the data and the spline satisfying the model. This is because the solution spline is unable to represent the model solution accurately and so poor parameter estimates will result. More weight can be placed on the spline satisfying the model by increasing the number of collocation points and/or increasing Embedded Image. This gives an improvement in the spline quality (figure 3) and hence the parameter estimates. Increasing the number of collocation points has the additional benefit of spreading the positions where the model needs to be satisfied, which is important where the amount of data is small. There is a limit to how much additional weight can be used: if it is too large, the procedure fails to converge. In this case, the solution spline moves away from the data points, becoming a worse estimate of the model solution at each iteration. This occurs because the data no longer have a strong enough constraining influence on the spline. When there is a small amount of data, it is difficult to get reasonable parameter estimates, but with the appropriate optimization of the factors, it is possible to get good results (table 4c and figure 4). At such low amount of data, convergence is extremely sensitive to the factor values.

Figure 3

The relationship between model error Embedded Image and (a) the number of collocation points q; and (b) the ratio Embedded Image. Results are based on 1000 independent noise realizations with p=19, Embedded Image and n=106. In (a) we have Embedded Image and in (b) q=500. Error bars indicate standard deviations.

Figure 4

The solution spline for active p53 produced by the algorithm on a small amount of data with n=6 and either p=19, q=29 and Embedded Image or p=11, q=14 and Embedded Image.

5. Discussion

Estimation of parameters in ODE models is of considerable importance in many modelling fields. Increasingly in systems biology, this needs to be done for very short time courses with high levels of noise on the data. Traditional algorithms are poorly suited to such problems.

We have presented a summary of the various approaches to parameter fitting in ODEs. We have then introduced a new algorithm for the case of models linear in parameters. It employs a spline-based collocation scheme and alternates linear least squares minimization steps with repeated estimates of the noise-free values of the variables. This has the advantage that fast, established linear algebra solvers can be used to provide optimal parameter values. The proposed procedure also avoids the problems of the model becoming stiff which can hamper shooting-based methods, in particular, when applied to models that are linear in their parameters. Additionally, the proposed procedure is effective at dealing with large-scale models and does not require the estimation of initial conditions.

We have shown that the proposed procedure produces reasonable estimates, even at low amounts of data and quite high amounts of error. The accuracy of the proposed procedure depends on how close the intermediary spline is to the true model solution. At low amounts of data and high error, there is less valid information to constrain the spline and so the spline accuracy and hence the estimates suffer. By optimizing the algorithm factors, in particular the weight placed on the spline satisfying the model, it is possible to significantly improve the accuracy of the spline and hence the estimates.

In comparison with single shooting methods, there are many advantages to our new algorithm. Firstly, it is more accurate in converging to reasonable parameter estimates across a wide range of dataset sizes and amounts of error. Many single shooting global optimization methods rely on probabilistic steps but this can often result in inconsistent results. Secondly, our procedure is considerably faster. This arises because the problem is linearized so that efficient linear algebra solvers can be used and by using splines the model has been effectively discretized, negating the need for costly integration. Also, our procedure is simple, with only three key factors that can be varied. In comparison, simulated annealing requires numerous algorithm parameters to be optimized. Finally, this technique neither requires the estimation of parameter values to seed the optimization nor the initial conditions of the dynamic system. This simplifies the requirements to get reasonable parameter estimates, which are of particular concern at low amounts of data or when not much is known about the system in advance.

Despite our procedure being limited to models that are linear in their parameters, it is still a fast and useful tool that is applicable in many areas of modelling. Furthermore, there is potential for this method to be applied to simplifications of more complex problems to provide good initial estimates that can then be refined using a more complex optimization methods applied to the full model.


D.M.B. held a CHRAT studentship by the ICH and M.B. was supported by UK Biotechnology and Biological Sciences Research Council (BBSRC) Exploiting Genomics Initiative grant (39/EGM16102). J.S. is supported by the BBSRC via the Centre for Integrative Systems Biology at Imperial College (CISBIC), BB/C519670/1.


  • One contribution of 14 to a Theme Issue ‘Experimental chaos II’.


View Abstract