## Abstract

Increasingly complex applications involve large datasets in combination with nonlinear and high-dimensional mathematical models. In this context, statistical inference is a challenging issue that calls for pragmatic approaches that take advantage of both Bayesian and frequentist methods. The elegance of Bayesian methodology is founded in the propagation of information content provided by experimental data and prior assumptions to the posterior probability distribution of model predictions. However, for complex applications, experimental data and prior assumptions potentially constrain the posterior probability distribution insufficiently. In these situations, Bayesian Markov chain Monte Carlo sampling can be infeasible. From a frequentist point of view, insufficient experimental data and prior assumptions can be interpreted as non-identifiability. The profile-likelihood approach offers to detect and to resolve non-identifiability by experimental design iteratively. Therefore, it allows one to better constrain the posterior probability distribution until Markov chain Monte Carlo sampling can be used securely. Using an application from cell biology, we compare both methods and show that a successive application of the two methods facilitates a realistic assessment of uncertainty in model predictions.

## 1. Introduction

In many scientific situations, mathematical models are used to predict the properties of a system under study using explicitly formulated assumptions and hypotheses. This is especially important for complex applications that do not allow for an intuitive understanding of the processes involved. Applications range from analyses in particle physics [1] or in climate research [2,3] to quantifying dynamical processes in cell biology [4]. Often, these mathematical models contain parameters that are unknown or known only with large uncertainty. For example, in the biochemical models that are used in cell biology, parameters such as reaction rate constants, amount of molecular compounds, detection sensitivities or measurement backgrounds are often unknown. Before a model can be used for prediction reliably, the unknown parameters have to be estimated by comparing model output to experimental data.

For a realistic assessment of the accuracy of model predictions, it is important that uncertainties in the experimental data and in prior assumptions are propagated correctly via the parameters to the desired predictions. Bayesian Markov chain Monte Carlo (MCMC) sampling facilitates this propagation of uncertainties by sampling from the posterior probability distribution [5]. However, if experimental data are limited and the mathematical models are nonlinear and contain many unknown parameters, the posterior probability distribution can be insufficiently constrained. For such an insufficiently constrained posterior, the probability mass can be distributed widely in a high-dimensional parameter space. Consequently, MCMC sampling can quickly become infeasible.

Alternatively, one can resort to frequentist methods in this situation. Here, insufficient experimental data and prior assumptions can be interpreted as non-identifiability of the model parameters [6]. We applied a generic approach that uses the profile likelihood to detect both structural and practical non-identifiability [7]. Furthermore, this approach allows one to design new experiments that resolve non-identifiability. Therefore, it is beneficial to further constrain the posterior probability distribution until MCMC sampling can be applied reliably and efficiently.

We compared the results of both MCMC sampling and profile-likelihood methods. In the absence of non-identifiability, the results of both methods are in good agreement. However, in the presence of non-identifiability their results can be substantially different. Our results imply that MCMC sampling in the presence of non-identifiability can be misleading. Therefore, we suggest a successive application of the two methods that ensures a realistic assessment of uncertainty in model predictions.

### (a) Frequentist methods

Maximum-likelihood estimation of model parameters is a theoretically well-developed area [8]. Based on an assumption on the distribution of the measurement noise, the likelihood function *L*(*y*|*θ*) of the data *y* given the parameters *θ* describes the agreement of model output and experimental data. In the case of normally distributed measurement noise *ϵ*∼N(0,*σ*^{2}), the likelihood reads as
1.1where *m* model outputs *y*_{k}(*t*_{l},*θ*) and *d*_{k} data instances for each model output, such as time points *t*_{l}, can be considered. The maximum of *L*(*y*|*θ*), i.e. the best fit of the model to the data, provides a point estimate of the parameters. This maximum-likelihood estimate (MLE) can be calculated for nonlinear models by numerical optimization methods; see e.g. the trust region algorithm in [9] and the general introductions in [10,11]. The uncertainty of the estimate is buried in the shape of the likelihood function. Figure 1 shows an illustration of the likelihood for three typical cases.

If the amount of model quantities that can be accessed by experiments is limited, a subset of parameters can be structurally non-identifiable. This indicates that the parametrization of the model is such that two or more parameters can compensate their effects and yield exactly the same model outputs *y*_{k}(*t*_{l},*θ*). This in turn results in a constant likelihood value on a sub-manifold (figure 1*a*). Consequently, the MLE for the parameters cannot be determined uniquely. The parameter relations that cause the structural non-identifiability are akin to gauge invariances in physical theories. However, for complex models, it can be difficult to detect structural non-identifiability. For example, if models can only be evaluated by numerical simulation, such as in the case of detector models used in particle physics [13] or dynamical models that are used in cell biology [4], structural non-identifiability cannot be detected directly. In the latter case, methods for *a priori* structural non-identifiability analysis exist that analyse the structure of the ordinary differential equations (ODEs) without having an analytical solution. A comparison of these methods can be found in Chis *et al*. [14]. However, *a priori* methods are often limited to linear ODE systems or are impracticable for models containing many parameters [15]. Arguably one of the most practicable *a priori* methods that has also been proved to work for larger models applies a probabilistic algorithm [16,17].

In addition to structural non-identifiability, model parameters can be practically non-identifiable [7]. This type of non-identifiability arises if the amount and quality of experimental data are limited. It cannot be detected by *a priori* methods. However, practical non-identifiability is of equal importance. A generic approach that allows one to detect both structural and practical non-identifiability at the same time uses the concept of the profile likelihood [7,18]. The profile likelihood (PL) can be calculated for each parameter *θ*_{i} individually by
1.2The equation indicates that for each value of *θ*_{i} all of the remaining parameters *θ*_{j} are re-optimized; see figure 1 for illustration. The profiles PL(*y*|*θ*_{i}) break down the uncertainty contained in the high-dimensional likelihood *L*(*y*|*θ*) to a footprint in one dimension. It allows for reliable conclusions about whether a parameter can be inferred from the experimental data. Three typical cases arise and can be detected from the profiles. A flat profile with a constant value indicates structural non-identifiability (cf. figure 1*a*). A profile that decreases but tails out to a plateau to one or both sides indicates practical non-identifiability (cf. figure 1*b*). A profile that tails out to zero on both sides quickly enough, i.e. at least exponentially fast, indicates structural and practical identifiability (cf. figure 1*c*). Experimental design and model reduction strategies based on the profile likelihood allow one to resolve parameter non-identifiabilities iteratively; for an application see [19].

Furthermore, the profile likelihood allows one to assess likelihood-based confidence intervals [20–22]. A confidence interval to a confidence level *α*=0.95 signifies that the true value of the parameter *θ**_{i} is expected to be inside the interval with 95% probability. Using a threshold which is the *α* quantile of the *χ*^{2}-distribution with d.f. degrees of freedom [10], confidence intervals can be determined from the profiles by .

### (b) Bayesian methods

By applying Bayes' theorem, the likelihood function (1.1) is extended by the prior probability density function (PDF) of the parameters *P*(*θ*) and normalized by a factor *c*, yielding the posterior PDF of the parameters,
1.3In analogy to the MLE, the maximum *a posteriori* (MAP) estimate is defined by maximizing *P*(*θ*|*y*). The only decisive difference to frequentist methodology is the choice of the prior *P*(*θ*). The direct computation of the normalization factor *c* is not feasible for a high-dimensional parameter space. However, extensive MCMC sampling offers a way to evaluate *P*(*θ*|*y*) despite the unknown *c*. This intriguing feature opens the prospect of considering the full high-dimensional posterior PDF for statistical inference. Most prominently, the Metropolis–Hastings algorithm [23,24] defines a Markov process where transitions are generated using a proposal function *q*(*θ*|*θ*′) that eventually produces a series of samples of the posterior PDF *P*(*θ*|*y*). The transitions are accepted with probability
1.4For efficiency of the sampling, the choice of the proposal function *q*(*θ*|*θ*′) is important. Often, proposals drawn from a multivariate normal distribution are convenient. One of the most simple implementations uses where is a scaled identity matrix (SIM). Too small a choice of *s* in relation to the actual shape of the posterior PDF will cause the process to converge slowly and to produce correlated samples. Too large a choice of *s* will lead to rejection of too many proposals *θ*′ yielding a slow sampling. Assuming that the posterior PDF is a multivariate normal distribution, the optimal acceptance rate of proposals is ≈0.23 [25]. Nevertheless, in the light of complex and nonlinear models with possibly limited amount and accuracy of experimental data, this assumption is problematic because the shape of the posterior PDF can be far from the PDF of a normal distribution. For a high-dimensional parameter space, computational efficiency also becomes an important issue. In these cases, the Markov chain may have to move along complex structures [12], fig. 8.2.2. To increase efficiency, more sophisticated methods take into account the natural geometry of the posterior PDF, e.g. the manifold Metropolis adjusted Langevin algorithm (MMALA) takes into account the local gradient and curvature information [26].

Before applying an MCMC sampling method, the prior PDF of the parameters needs to be specified. Given that empirical evidence exists about the distribution of a parameter, such as a previous measurement or estimation, *P*(*θ*) should incorporate this prior knowledge accordingly [27]. If no empirical evidence about the parameter value is available, the prior should be chosen as uninformative [12]. This requires a flat metric in parameter space, i.e. one that does not artificially favour certain parameter values. Depending on the parametrization of the model, it can in practice be difficult to obtain a flat metric and hence to specify an uninformative prior [27].

The most crucial problem that MCMC sampling faces is to ensure that the samples obtained realistically represent the actual posterior PDF. One instance where the convergence of the Markov chain fails is if the posterior PDF is not proper [28]. Posterior PDFs are called *proper* if they are integrable [12]. This means that it has to tail out to zero sufficiently fast over the appreciable range of the parameters such that its integral can be normalized to one. Non-identifiable parameters cause the posterior PDF to be non-proper [28]. This indicates that neither the prior assumptions nor the likelihood that represents the experimental data constrain the posterior PDF sufficiently. A practical consequence is that the Markov chain cannot converge and hence gives inaccurate results [29]. For convergence, it is required that the Markov chain is positive recurrent, which is not given in the case of non-identifiability [30]. The user must ensure parameter identifiability before an MCMC technique can be used securely. If models contain many unknown parameters and the posterior PDF is not sufficiently constrained, the convergence of the Markov chain can be impracticably slow even for a proper posterior PDF.

## 2. Results

For reliable inference in the presence of non-identifiability, we propose a joint approach that take advantage of both profiling and MCMC sampling methods. The profile likelihood is suitable to detect parameter non-identifiability [7]. Furthermore, it allows for experimental design that helps to resolve parameter non-identifiability [19]. This ensures that the posterior PDF is proper and well constrained. Subsequently, efficient MCMC sampling [26] can be used reliably to generate samples of the posterior PDF. Finally, the uncertainty in model predictions can be assessed realistically. The workflow of this joint approach is displayed in figure 2*a*.

Both profiling and MCMC sampling methods separately are used for inference in various fields, e.g. in particle physics [1,31] or in climate research [2,3]. Here, we present results of using the joint approach that take advantage of both methods for an application from cell biology [32]. An ODE model was used to describe the concentration dynamics of six molecular compounds involved in the interplay of the hormone erythropoietin (Epo) and its corresponding membrane receptor (EpoR) (figure 2*b*). Epo is an important factor in the differentiation of blood cells. The dynamics of the molecular compounds are formulated as
2.1and a mapping of the dynamics to experimentally accessible quantities by the model output function
2.2that enters the likelihood function (1.1) was used. The model comprises six ODEs and 10 unknown parameters, including one nuisance parameter; for details about the model equations see the electronic supplementary material. All parameters are positive by definition, hence the logarithmic space yields a flat metric [12]. Since no empirical evidence about the values of the nine parameters of interest was available, the prior was chosen uninformative.

Radioactively labelled Epo facilitated the measurement of Epo concentration in the extracellular medium and of Epo bound to the cell membrane; see electronic supplementary material, table S1. The MAP estimates of the model parameters were obtained by numerical optimization; see electronic supplementary material, table S2. The agreement of model outputs and experimental data for this initial experimental set-up is shown in figure 2*c*. In accordance with the experimental data, the model describes the binding of Epo to the Epo receptor on the membrane, internalization and recycling of the Epo–EpoR complex. As first step of the proposed joint approach, the posterior PDF was screened for non-identifiability using the profiling method.

### (a) Identifiability analysis using posterior profiles

The profile-likelihood approach for identifiability analysis [7] can be translated into a profile posterior approach. In analogy to equation (1.2), profiling can be applied to the unnormalized posterior PDF by defining the profile posterior
2.3(cf. figure 1). Technical details on the implementation of the methods are given in the electronic supplementary material notes. For the initial experimental set-up, the profiles of the posterior PDF reveal four structurally non-identifiable, two practically non-identifiable and three identifiable parameters. One of each case is displayed as full red lines in figure 3*a*; the remaining profiles are shown in the electronic supplementary material, figure S1.

### (b) Markov chain Monte Carlo sampling

The results of the posterior profiling approach indicate that the posterior PDF for the initial experimental set-up is not proper, i.e. the posterior PDF cannot be normalized to one and is therefore not a valid PDF. In this situation, an MCMC sampling cannot converge and hence gives inaccurate results [29]. In order to allow for a comparison between profiling and MCMC sampling, the prior PDF was restricted artificially to a uniform distribution with support from 10^{−5} to 10^{+5}. The resulting posterior PDF is now proper and the Markov chain can in principle converge. It is important to note that the results of MCMC sampling can potentially be biased due to the artificial specification of this prior PDF. The new posterior PDF, though proper, still exhibits a complicated structure. The probability mass is distributed widely in the 10-dimensional parameter space; see the posterior profiles in figure 3*a*. In this situation, the SIM sampling algorithm, which uses a scale identity matrix for generating proposals, was too inefficient and did not yield reasonable results within an acceptable time; see the electronic supplementary material, figures S1–S3.

To improve efficiency, the MMALA algorithm, which takes into account the local geometry of *P*(*θ*|*y*), was used [26]. Figure 3*a* shows the histograms of marginalized MCMC samples. The remaining results are displayed in the electronic supplementary material, figures S4–S6. For the identifiable parameter *k*_{de} and for the structural non-identifiable parameter *k*_{on}, the results of the MCMC sampling are in good agreement with the results of the profiling. For the practically non-identifiable parameter *k*_{t}, the results are substantially different. The profile shows that the MAP point is located at . However, the lion's share of the marginalized MCMC samples propose to be >0. In this region, the posterior profile reveals a plateau.

### (c) Taking into account additional experimental data

The results of both posterior profiling and MCMC sampling indicate substantial uncertainty in the parameter estimates for the initial experimental set-up. Based on the results of the profiling approach, additional experiments were suggested; see [19] for details. The target of the experimental design was to resolve non-identifiabilities. The additional experimental data were included in the estimation procedure, yielding an extended experimental set-up (figure 2*c*). Figure 3*b* shows the recomputed posterior profiles. They indicate, by tailing out to zero, that the identifiability problems were resolved for all parameters. The remaining profiles are shown in the electronic supplementary material, figure S7.

As a consequence, the posterior PDF is now proper without artificial assumptions on the prior PDF. In this situation, the SIM sampling algorithm was still too inefficient and did not yield reasonable results within an acceptable time; see the electronic supplementary material, figures S7–S9. To improve efficiency, the MMALA algorithm, which takes into account the local geometry of *P*(*θ*|*y*), was used again [26]. Figure 3*b* shows the histograms of marginalized MCMC samples. The results for the remaining parameters are shown in the electronic supplementary material, figures S10–S12. For all parameters, the results of the MCMC sampling and posterior profiling are now in good agreement. The MCMC samples for parameter *k*_{t} that showed substantial difference between profiles and MCMC samples for the initial set-up are now in good agreement as well. Interestingly, the MCMC samples for parameter *k*_{t} for the extended set-up are localized close to the MAP point of the initial set-up. This suggests that the large probability mass of MCMC samples for values of obtained for the initial set-up was misleading.

MCMC sampling results are now reliable and allow one to consider the full high-dimensional posterior PDF for inference; see e.g. the nonlinear correlation between parameter *k*_{di} and *k*_{de} shown in the electronic supplementary material, figure S13. Using the generated MCMC samples, the parameter uncertainties contained in the posterior PDF can be propagated accurately to the prediction of the dynamics of molecular compounds that are not accessible by experiments directly (figure 4).

## 3. Summary

We introduced a joint approach that combines frequentist profiling and efficient Bayesian MCMC sampling methods. The proposed approach uses the analysis of posterior profiles that helps to determine the identifiability of the model parameters. Non-identifiability results in a posterior distribution that is not proper, i.e. that cannot be normalized to one. However, a proper posterior distribution is required for convergence of the MCMC sampling. The results of the posterior profiling can be used for experimental design that helps resolve non-identifiabilities iteratively. Consequently, this ensures that the posterior distribution is proper. Having ensured identifiability of the parameters, the MCMC sampling results are reliable and can be used to propagate the uncertainty of parameter estimates to model predictions. Using an application from cell biology, we showed that this approach enables one to obtain accurate results.

We compared the results of posterior profiling and marginalizing over the MCMC samples for two stages of experimental set-up: an initial set-up that contains non-identifiabilities and an extended set-up where all parameters are identifiable. A uniform prior distribution ensured that the posterior is proper despite non-identifiability for the initial set-up. For the extended set-up, the results of posterior profiling and marginalizing over the MCMC samples are in good agreement. For the initial set-up, substantial differences between profiles and MCMC samples are observed. Interestingly, the profiles for the initial set-up reflect the posterior distribution for the extended set-up much better than the MCMC samples of the initial set-up. This indicates that MCMC sampling results in the presence of non-identifiability can be inaccurate despite a proper posterior distribution.

## Acknowledgements

This work was supported by the German Federal Ministry of Education and Research (Virtual Liver (grant no. 0315766), LungSys (grant no. 0315415E) and FRISYS (grant no. 0313921)), the European Union (CancerSys (grant no. EU-FP7 HEALTH-F4-2008-223188)), the Initiative and Networking Fund of the Helmholtz Association within the Helmholtz Alliance on Systems Biology (CoReNe HMGU), and the Excellence Initiative of the German Federal and State Governments (EXC 294).

## Footnotes

One contribution of 17 to a Discussion Meeting Issue ‘Signal processing and inference for the physical sciences’.

- © 2012 The Author(s) Published by the Royal Society. All rights reserved.