## Abstract

Modelling human and animal metabolism is impeded by the lack of accurate quantitative parameters and the large number of biochemical reactions. This problem may be tackled by: (i) study of modules of the network independently; (ii) ensemble simulations to explore many plausible parameter combinations; (iii) analysis of ‘sloppy’ parameter behaviour, revealing interdependent parameter combinations with little influence; (iv) multiscale analysis that combines molecular and whole network data; and (v) measuring metabolic flux (rate of flow) *in vivo* via stable isotope labelling. For the latter method, carbon transition networks were modelled with systems of ordinary differential equations, but we show that coloured Petri nets provide a more intuitive graphical approach. Analysis of parameter sensitivities shows that only a few parameter combinations have a large effect on predictions. Model analysis of high-energy phosphate transport indicates that membrane permeability, inaccurately known at the organellar level, can be well determined from whole-organ responses. Ensemble simulations that take into account the imprecision of measured molecular parameters contradict the popular hypothesis that high-energy phosphate transport in heart muscle is mostly by phosphocreatine. Combining modular, multiscale, ensemble and sloppy modelling approaches with *in vivo* flux measurements may prove indispensable for the modelling of the large human metabolic system.

## 1. Introduction

Metabolism consists of the biochemical reactions taking place in the body. It is the counterpart in the molecular domain of the energy transformations and signalling taking place in the cells and the body as a whole. Metabolism is therefore of enormous importance to human and animal physiology and pathology. For instance, instead of studying the tremors in patients with Parkinson's disease, it may often be more efficient to study the synthesis of the neurotransmitter dopamine, which is involved in this disease, in cultures of human cells.

One difficulty in the study of human and animal metabolic networks is that the total system is very large and strongly interconnected. In addition, some metabolic pathways overlap extensively (Ma *et al*. 2007) and hence influence each other's behaviour. This causes difficulties in studying them separately. Qualitatively, much is known about metabolism, from micro-organisms to higher life forms. This means that we often know which metabolites take part in metabolism and how they are connected via reactions in the metabolic pathways. For the most part, biochemical reactions are very similar throughout all kingdoms of life, which to some extent may simplify the modelling process: many reactions and parts of pathways can be transferred from models of one organism to another. The possibility for such simplification is limited because the regulation of metabolic pathways and the mechanisms of regulation of enzymes vary between species and must in many cases be adapted to build accurate models for other species. Although there is a certain universality in metabolic systems, even between bacteria and higher organisms, the regulation and quantitative behaviour of the models have to be adapted and fine-tuned for each organism. This often means quantitative changes in pathways that remain qualitatively similar.

In contrast to qualitative properties, little is known about the quantitative parameters of these metabolic systems, especially not under *in vivo* conditions. Parameters are not yet available or perhaps not even measurable. This lack of accurate kinetic parameters is an important impediment for modelling human and animal metabolism. Here, we describe strategies for robust modelling of metabolic systems, which means that we deal with the limited accuracy of metabolic parameters. We try to obtain insight and useful predictions in the face of limited knowledge about the metabolic system.

The problem of insufficient quantitative knowledge may be tackled by: (i) modularization of the network to study parts of the system one by one and independently; (ii) making use of ensemble simulations to explore the range of plausible parameter values in multidimensional parameter space; (iii) determining combinations of parameters that have either large or very small effects on model predictions when the parameters are changed in a correlated way (‘sloppy’ parameter behaviour, see below); (iv) multiscale analysis that combines information from the molecular, biochemical and organ levels; and (v) measuring metabolism under *in vivo* conditions, providing important data on biochemical network function as a whole.

While these are important strategies to bridge the gap between the qualitative knowledge of metabolism and a comprehensive quantitative analysis of the human and animal metabolic system, another important gap exists between mathematical modellers and experimentalists. Many experimental scientists have difficulties in understanding mathematical and computational models. Such models of biochemical systems are presently often formulated as systems of ordinary differential equations (ODEs). Each differential equation represents the rate of change of a metabolite pool caused by the chemical reactions transporting mass into and out of that pool. For many experimental scientists, this approach is probably not appealing because it is not in line with their intuition. For this reason, we will discuss a more graphical approach, provided by Petri nets, which is accurate enough for many purposes and close in structure and functioning to the metabolic pathways that it represents. This approach may help experimental scientists to formulate and use models of metabolic pathways more efficiently.

## 2. From qualitative knowledge on metabolism to large-scale quantitative modelling

Textbooks of biochemistry and an enormous number of publications in the literature describe the types of molecules that have been discovered in the bodies of humans and animals. Many biochemical reactions, catalysed by enzymes, connect the metabolites as they react with each other or are formed from each other. Sequences of enzymatic reactions are connected in metabolic pathways. The Boehringer chart, which shows a large number of reactions and their connections, has decorated the walls of biochemical laboratories for many decades now. Recently, several metabolic system databases have been assembled. The KEGG, Reactome and PathwayCommons databases, for instance, list a great number of reactions in many pathways. Bernard Palsson and colleagues built a reconstruction of human metabolism *in silico* (Duarte *et al*. 2007). A second reconstruction of human metabolism was published (Ma *et al*. 2007). In addition, a database exists that contains the small metabolites found in the human body, the Human Metabolome database (Wishart *et al*. 2007).

There is a remarkable difference between metabolism as portrayed in textbooks, where dominant fluxes (rates of flow) in major pathways are treated, and the connectivity of metabolism as found in an *in silico* reconstruction of human metabolism (Duarte *et al*. 2007). The textbook version of the tricarboxylic acid (TCA) cycle, also known as the Krebs cycle, usually describes the TCA cycle as an ordered series of reactions and does not emphasize the many existing side reactions. The textbook mental picture of the TCA cycle is almost that of a distinct module. A close look at the comprehensive *in silico* description shows that the TCA cycle intermediates are substrates and products of numerous side reactions, which connect the TCA cycle intermediates extensively with the rest of metabolism. Kinetically, it may be true that the TCA cycle reactions dominate, but in reality its intermediates are part of an extensive metabolic network. To understand this complex densely interconnected metabolic system under all conditions, it would be very useful to have a dynamic model of metabolism that addresses the full network connectivity. If such a model proves to predict metabolic network function reasonably, it could in the future also help to regulate metabolism under disease conditions.

The lack of accurate kinetic equations and parameters is clear, but even if all the parameters are known with reasonable accuracy, this does not mean that metabolite levels or metabolite fluxes can be calculated with good precision. Even for a relatively ‘simple’ organism such as yeast, with kinetic measurements collected to characterize all the enzymes of the glycolytic pathway (Teusink *et al*. 2000), it turned out that the prediction of pathway flux and metabolite levels in the pathway was of rather limited accuracy.

Given that there is so much information available on the connectivity of metabolism, the question may be posed whether it is timely to start building quantitative and dynamic models of human and animal metabolism. The answer is clearly positive. On the one hand, much experimental data have been gathered, but on the other, understanding of the metabolic system is still very limited. Gradually, by trial and error, building a more comprehensive model of the human metabolic system may help to integrate and understand the enormous amount of experimental information on human and animal metabolism. Building a valid model will be possible only by making predictions on new experiments and correcting the model if the predictions prove wrong. In this way, dynamic models of metabolism, containing many pathways and metabolites, may prove their value to integrate the extensive knowledge on thousands of metabolites, with reactions catalysed and regulated by thousands of gene products. It is a major challenge to systems biochemistry to develop such a quantitative description of human metabolism, which integrates all biochemical and molecular biological knowledge and explains most experimental findings. However, because the mind of an ordinary scientist is, in general, insufficient to contain all the information available on the human biochemical system, robust modelling and analysis techniques making use of all available information are necessary. This becomes even more desirable when analysing and predicting metabolic system behaviour during a disease process or an experimental or therapeutic intervention. Because the challenge of developing models of human and animal metabolism must inevitably be faced, there is a need to deal with the large scale of the system and, in particular, with the substantial number of imprecisely known kinetic parameters.

## 3. Strategies for quantitative modelling of metabolic systems

There exist different strategies to deal with the large scale of the human biochemical system and with the incomplete quantitative knowledge of the parameters characterizing metabolic processes. Some robust approaches are as follows.

Modularization of the network to study in detail those parts of the entire system that can be experimentally ‘isolated’. Thereafter, the whole system is assembled from such well-characterized parts. This approach may not be possible in all or even most cases, but is rewarding where feasible, because the biological function of a small group of molecules can be accurately characterized.

Application of ensemble simulations to explore the range of plausible parameter values. To this end, many simulations are done with different parameter sets that cover the plausible part of multidimensional parameter space, taking correlation between the parameters into account. Importantly, such simulations can also be used to explore the effect of measurement noise on the confidence regions of parameters estimated from experimental data.

Determination of parameter combinations that are shown by analysis to have large or small effects on model predictions when changing in a correlated way. The former represent ‘stiff’ directions in parameter space, while the latter represent ‘sloppy’ directions. The approach of finding those parameter combinations has been termed ‘sloppy modelling’ (Brown *et al*. 2004; Gutenkunst *et al*. 2007*a*,*b*). This does not mean sloppy work by the scientists who design the model, but emphasizes that some combinations of parameters in the model exert a weak influence on the model prediction. Often strong correlations within groups of parameters are discovered, which together define the stiff and sloppy directions.

Application of multiscale analysis, which means that measurements at various aggregation levels are taken into account: information measured at the level of molecules, biochemical pathways, organelles, cells and organs can thus be combined. For instance, not only the properties of enzymes or isolated mitochondria are incorporated in the analysis, but also the measured response of a whole pathway or network of connected enzymes and organelles in the cell. The time course of metabolite levels or of metabolic flux (rate of flow) in response to stimulation of the pathway, for example, provides data at the pathway level. Experimentally, this may be accomplished by measuring the time course of adaptation to altered cellular workloads, such as increased muscle contraction frequency, neural firing rate or hormone secretion. Yet another example is the measurement of metabolic fluxes in specific pathways under various steady-state conditions in relation to the metabolite levels in the pathway.

Measuring metabolism under *in vivo* conditions. We describe computational methods to quantify metabolic fluxes from experimental measurements under *in vivo* conditions. This can, among other ways, be done by providing substrates for metabolism that are labelled with stable isotopes, and subsequently measuring the incorporation of the label into the network's metabolites. Such measurements define the operation of the metabolic system under its *in vivo* working conditions and thereby provide important information on the functioning of the intact living system.

We will start with a description of flux measurements because they provide a simple example of a model of a metabolic system. Such measurements provide an important input for the multiscale approach and show examples of the ‘sloppiness’ of parameter space.

## 4. Measuring metabolic fluxes *in vivo* with stable isotopes

Metabolic fluxes reflect cell function and dynamic adaptation of living organisms to their environment. Given the incompleteness of available kinetic parameters to dynamically simulate metabolism, it is very useful to determine the reaction rates in metabolic networks experimentally. Common isotope experiments are, for instance, done in experimental animals, human beings and micro-organisms, and entail the infusion of isotope-labelled substrate into the metabolic system. These isotopes are distributed by metabolic fluxes among the metabolites, usually until the isotopic steady state is reached. The amount of isotope incorporation is measured by nuclear magnetic resonance spectroscopy (NMRS) or mass spectrometry (MS) in chosen metabolites. This allows us to quantify metabolic fluxes via computational modelling and analysis, even in cases where kinetic parameters are missing (Kelleher 2001; Wiechert 2001; Sherry *et al*. 2004; Antoniewicz *et al*. 2007). However, measurements in tissue samples taken after a steady state of label incorporation is reached only give insight into the relative fluxes in different pathways. Especially in micro-organisms, substrates enriched with stable isotopes are often given for a long time until incorporation in small metabolites and proteins has reached a steady state. Hence, experiments become long and costly. By contrast, if steady states are short, as in humans and animals, this common approach will not be accurate due to continuous changes in the system. Yet another obstacle is destructive measurement techniques such as MS, which do not allow us to measure the time course of isotope incorporation in the same small tissue region. It is impossible to take multiple samples if there is little material available, as may be the case with tissue biopsies and in cell cultures.

Therefore, we have recently developed a method to define the time course of isotope incorporation and to quantitate metabolic fluxes by analysing a single sample taken at a single time point (van Beek *et al*. 1998, 1999). For this method, it is essential that the sample is collected before the attainment of a steady state of isotope incorporation. Despite taking only a single measurement, several metabolic fluxes can be quantified by computational model analysis of the measured isotope incorporation in the metabolites in the tissue sample. This new experimental protocol to quantify metabolic fluxes is named ‘labelling with isotope for pre-steady-state snapshots’ (LIPSSS). The isotope incorporation data are analysed with a computer package called FluxSimulator (Binsl *et al*. 2007). The LIPSSS protocol has in common with other isotope methods that a substrate labelled with isotopes is infused into the metabolic system of interest. However, in contrast to many other experimental protocols, in the LIPSSS protocol a single snapshot of stable isotope incorporation in metabolites is taken before the steady state of isotope incorporation is reached. This makes experiments shorter and cheaper. Furthermore, it facilitates *in vivo* experiments in mammals where metabolic steady states persist only briefly.

The quantification of metabolic fluxes from single time-point measurements is possible since the time course of isotope incorporation into metabolites depends on (i) the amount of isotopic material given, (ii) the sizes of the metabolite pools through which the label flows, with larger transit times of label caused by larger pool sizes, and (iii) the flux sizes that transport the isotopic material between the metabolites, with larger fluxes leading to shorter transit times for given metabolite pool sizes. Hence, at every point in time during the dynamic phase before isotope steady state is reached, each metabolite pool contains a characteristic composition of different isotope combinations, called isotopomers. These isotopomers within the metabolite pool determine distinct peaks measured with NMRS or MS. These peaks are calculated from models that contain the carbon transitions, usually already known from earlier experiments. The connectivity of metabolites at the carbon level is found in the scientific literature or databases.

An example of isotopomer fractions evolving dynamically over time in a simplified metabolic pathway is given in figure 1. The measurements are analysed computationally with our computer package FluxSimulator to quantitate the metabolic fluxes.

To this end, the experimental protocol is continuously simulated by FluxSimulator while different parameter estimates are explored by an algorithm that optimizes the fit of the model to the measured data. In each step, the corresponding nuclear magnetic resonance (NMR) multiplet intensities are calculated from the isotopomer fractions simulated by FluxSimulator. The parameter estimates are provided by a nonlinear parameter optimization routine (Nelder & Mead 1965), and the calculated multiplets are compared in each optimization step with the experimentally measured multiplets via a sum-of-squares criterion until a reasonable match between the measured multiplets and the model prediction is found (figure 2).
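The shape of this fitting loop can be sketched in a few lines of code. The model below is a deliberately simplified, hypothetical stand-in for the real carbon transition model: a single labelled fraction rising towards its plateau at a rate set by the flux and an assumed pool size, with synthetic noisy 'measurements'. It only illustrates how Nelder-Mead minimization of a sum-of-squares cost recovers a flux:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the labelling model (assumed, not FluxSimulator's model):
# labelled fraction rises towards 1 at a rate set by flux v and pool size.
def labelled_fraction(t, v, pool=2.0):
    return 1.0 - np.exp(-v * t / pool)

# Synthetic 'measured multiplet' data generated with a known flux plus noise.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.5, 5.0, 8)
v_true = 1.3
y_obs = labelled_fraction(t_obs, v_true) + rng.normal(0.0, 0.01, t_obs.size)

# Sum-of-squares criterion compared in each optimization step, as in the text.
def cost(params):
    (v,) = params
    return float(np.sum((labelled_fraction(t_obs, v) - y_obs) ** 2))

fit = minimize(cost, x0=[0.5], method="Nelder-Mead")
v_est = fit.x[0]   # flux estimate, close to v_true
```

Replacing the one-line model by a full isotopomer ODE simulation turns this sketch into the estimation scheme described above.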

FluxSimulator enables the user to implement metabolic models and to perform flux estimation in an efficient and user-friendly manner. In many existing simulation software packages, metabolic models are specified directly by their mathematical representation via ODEs. By contrast, FluxSimulator uses a straightforward specification of the metabolic pathway via three plain text files. Based on this specification, the entire system of ODEs is assembled and simulated automatically (Binsl *et al*. 2007). This enables biomedical researchers to easily specify models that consist of hundreds of equations in a more flexible way than the error-prone implementation of the ODEs by hand. To illustrate the usefulness of automatic model assembly, the ODEs representing the dynamics of the simple metabolic model in figure 1 are given as equations (4.1)–(4.9). The metabolite symbols *A*–*D* in these equations give the fractions of these pools taken up by the isotopomers given by the subscript. The metabolite symbols A–D in square brackets indicate the amount per gram of dry tissue mass. The time course of the labelled substrate for the pathway (here, A) is determined by the experimenter, who designs the label infusion protocol. Equation (4.1) indicates that the experimenter has arranged to keep the labelling state of pool A, which is infused, constant after stepping to a new value at *t*=0. The flux *v* in μmol min^{−1} g^{−1} of dry mass of tissue is constant throughout the circular pathway. This represents a simple model of a metabolic cycle, similar to, for instance, the TCA cycle.
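The structure of such a system of label-balance ODEs can be illustrated with a minimal sketch. To keep it short, the code assumes one-carbon metabolites (so each pool carries a single labelled fraction rather than a set of isotopomers) and illustrative values for the flux and the pool sizes; pool A is clamped to a fixed labelled fraction, as in equation (4.1):

```python
# Minimal sketch of the cycle A -> B -> C -> D with clamped substrate pool A.
# One-carbon metabolites are assumed, so each pool has one labelled fraction.
# Label balance for each pool X with upstream pool U: [X] dx/dt = v * (u - x).
v = 1.0                                   # flux (illustrative value)
pools = {"B": 1.0, "C": 2.0, "D": 0.5}    # pool sizes (illustrative values)
a = 0.9                                   # clamped labelled fraction of A

def simulate(t_end, dt=1e-3):
    """Forward-Euler integration of the labelled fractions of B, C and D."""
    b = c = d = 0.0                       # all pools unlabelled at t = 0
    for _ in range(int(t_end / dt)):
        db = v * (a - b) / pools["B"]
        dc = v * (b - c) / pools["C"]
        dd = v * (c - d) / pools["D"]
        b, c, d = b + db * dt, c + dc * dt, d + dd * dt
    return b, c, d

b, c, d = simulate(t_end=2.0)             # pre-steady-state snapshot (LIPSSS)
```

At the snapshot time the label has penetrated the cycle only partially, which is exactly the information that a pre-steady-state sample exploits; at long times all fractions converge to the clamped value of pool A.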

It is of course not very difficult to implement the ODEs for the system in figure 1 by hand in a computer program. However, this becomes a tedious task for extensive systems with metabolites containing more than two carbon atoms (Binsl *et al*. 2007). In particular, for each metabolite containing five carbon atoms, such as α-ketoglutarate in the TCA cycle and glutamate, 32 differential equations must be added to the ODE system. In FluxSimulator, these equations, similar to equations (4.1)–(4.9), are assembled based on simple textual information. In a text file, the brief statement A1:B1 D1:B2, for instance, indicates that atom 1 of metabolite B originates from atom 1 of metabolite A, and atom 2 of metabolite B originates from atom 1 of metabolite D.
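A reader for such statements is easy to sketch. The parser below is hypothetical and based only on the single-letter, single-digit format quoted above; the real FluxSimulator file format may differ in detail:

```python
# Illustrative parser for carbon transition statements such as "A1:B1 D1:B2",
# meaning: carbon 1 of B comes from carbon 1 of A, and carbon 2 of B comes
# from carbon 1 of D. The format is assumed from the text, not taken from
# FluxSimulator itself.
def parse_transitions(line):
    mapping = {}
    for token in line.split():
        src, dst = token.split(":")
        mapping[(dst[0], int(dst[1:]))] = (src[0], int(src[1:]))
    return mapping

transitions = parse_transitions("A1:B1 D1:B2")
n_carbons_b = len([1 for met, _ in transitions if met == "B"])
n_isotopomer_eqs = 2 ** n_carbons_b   # 4 for two carbons; 32 for five carbons
```

The 2^n growth of isotopomer equations with the number of carbon atoms n is precisely why automatic assembly of the ODE system pays off.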

FluxSimulator is implemented as a computer package realized in the ‘R’ programming language and environment (R Development Core Team 2008), which is available for the most common operating systems such as MS Windows, UNIX and Mac OS. Owing to on-the-fly generation of C code for the model, followed by compilation and dynamic loading of the resulting library, the simulation time for a model consisting of approximately 200 stiff ODEs is of the order of 0.1 s on a 3 GHz Pentium-based desktop computer. Parallelization and execution on a cluster computer will enhance the computational speed further. ‘Embarrassingly parallel’ execution to analyse multiple datasets, each on a single processor, was easily accomplished. Parallelization of parameter optimization on a single dataset has not been implemented yet, but would be straightforward using, for instance, the R package snowfall.
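The 'embarrassingly parallel' pattern, one independent analysis per dataset with no communication between workers, can be sketched generically. The snippet below is a Python illustration of the pattern, not the R/snowfall implementation discussed above, and `analyse` is a hypothetical stand-in for one complete flux estimation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one full flux estimation on one dataset;
# each call is completely independent of the others.
def analyse(dataset_id):
    return dataset_id, sum(i * i for i in range(dataset_id))

# One worker per dataset and no shared state: the essence of an
# 'embarrassingly parallel' workload.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(analyse, [10, 20, 30, 40]))
```

Because the analyses never exchange data, the same pattern scales to a cluster by replacing the local pool with one worker process per node.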

The approach of performing a LIPSSS experiment in combination with analysis and flux estimation with the FluxSimulator computer package and nonlinear parameter optimization was experimentally validated. To that end, the flux in the TCA cycle and exchange with the directly related amino acids glutamate and aspartate was determined in tissue samples from porcine cardiac tissue *in vivo*. The TCA cycle flux represents the final pathway of aerobic energy conversion, tightly coupled to oxygen consumption in the mitochondria. Excellent correlation (*r*=0.90) was found between flux in the TCA cycle calculated from NMR measurements on extracts of the cardiac tissue samples after 5.5 min infusion of ^{13}C-labelled acetate in the coronary artery and ‘gold standard’ measurements of oxygen consumption on the heart *in situ*.

Experimental protocols for isotope labelling experiments were designed based on computer simulation with the FluxSimulator package. Various experimental protocols, differing, for example, in the infusion time of the isotope-labelled substrate or in its isotopic composition, were simulated. The simulation shows the dynamic development of isotopomer fractions in one or more metabolite pools, and in that way demonstrates how the absolute isotopomer composition and, perhaps equally important, the ratios of various isotopomer fractions at a certain point in time depend on the metabolic flux (figure 1*b*). It is also possible to add realistic measurement noise to the simulated values at a certain point in time and then to simulate the process under more realistic conditions. This gives information about the model's complexity and parameter identifiability, and makes it possible to choose the experimental protocol that yields the most accurate and comprehensive measurement results.

Care must be taken that the model of the metabolic pathway and carbon distribution routes is compatible with the organ and cell type studied. The model for the cardiac study above was in particular designed and tested for heart muscle *in vivo*. Although the metabolic pathways of the TCA cycle and related amino acids incorporated in this model are almost universally found in other organs and cell types, the activity of various anaplerotic pathways varies under experimental conditions. Therefore, the models of the metabolic pathways and of carbon distribution must be adapted to accurately reflect the precise cell type under study. It is desirable that the model's suitability to accurately quantitate metabolic fluxes is examined on an organ-by-organ and cell-type by cell-type basis.

Using FluxSimulator, ensemble simulations to explore the ranges of parameter values (see §8) or the assessment of sloppy parameter sensitivity patterns and spectra (see §6) are easy. The analysis of parameter combinations by simulation enables the user to decide which parameter combinations can be estimated precisely and which are imprecise and can potentially be fixed at certain values or kept free to take care of sloppy directions in parameter space (Gutenkunst *et al*. 2007*a*,*b*).

## 5. Modelling metabolic systems modularly

To analyse complex metabolic systems, it is most useful if one can divide large networks into modules that can be isolated conceptually and experimentally to be studied independently in detail. A module is an independently operable unit that is part of a whole. An example of the modular approach is the experimental isolation of the adenine nucleotide–creatine–phosphate module in muscle metabolism (van Beek 2007, 2008). Although this module is small, not all the parameters of its components were reliably known *a priori* from the molecular and organellar level. The remaining parameters were estimated by combining information from the molecular and the whole-system levels (see §7).

Models such as the carbon transition network (CTN) in figure 1 are usually also part of a bigger system. If the fluxes in the pathway of figure 1 dominate strongly over those entering and leaving via side paths, then it is possible to study this model as a module. It is probably difficult to divide the metabolic system completely into modules that can be studied independently, because metabolic pathways overlap and interact extensively. However, if it is possible to study a certain part in isolation, this makes it possible to define the function of that part of the system accurately. Connecting the modules again to form the whole system is an ideal strategy, which we must fear is only partially feasible.

## 6. Sloppy parameter behaviour

Quantitative mathematical models of metabolic or signalling pathways usually contain tens or even hundreds of parameters describing the kinetics of molecular interactions within the system. Sufficiently precise parameters are essential for useful predictions based on the model. Experimental determination of all kinetic parameters of the system, however, often proves to be difficult or even impossible. This is either because not all components can be isolated or because isolation procedures damage the component. Furthermore, intracellular operating conditions are virtually impossible to reproduce *in vitro*. Hence, the estimation of unknown parameters poses a major challenge in mechanistic modelling of biological processes.

The total set of unknown model parameters can be constrained collectively by fitting different parameter combinations to experimental data. This is accomplished by minimizing a cost function, e.g. a least-squares distance measure between model prediction and experimental data points, which reports how well a set of parameters fits the available data. In many cases, a variety of different parameter combinations fit the available data with similar cost. One might then often hear the criticism that the model is overparametrized and that parameters cannot be determined with sufficient accuracy. This may, at first glance, seem to disqualify such an attempt to characterize the parameters, but we will show below that it is possible to live with uncertainty in some combinations of parameters and still obtain very useful predictions or useful quantification of some of the parameters.

It turns out that there are sloppy parameter combinations that do not significantly alter the simulation outcome of a model. System parameter combinations varying along sloppy directions in parameter space are therefore not well constrained by experimental data. Other directions (stiff directions) can be well constrained, with small parameter variations yielding significant changes in predictions. An example of sloppy parameter behaviour is given in figure 3. The contours in this plot give the locations of equal deviation of the model prediction from the result yielded by the best (‘reference’) parameter set for the data point. Figure 3*a* represents a model that is linear in the parameters *P*_{A} and *P*_{B}. Figure 3*b–d* represents a model of carbon transitions with five parameters, which is an extended version of the model of figure 1. Note that parameters may show strong dependences: in figure 3*a*, a joint increase in *P*_{A} and *P*_{B} results in small changes in model prediction. This constitutes the long axis of the ellipse in figure 3*a*, which is termed a sloppy direction in parameter space. In the case where *P*_{A} changes in the direction opposite to that of *P*_{B}, the prediction by the model changes sharply along the short axis of the ellipse. This is termed a stiff direction in parameter space. In the nonlinear case of figure 3*b* the contours are not ellipsoidal, although the inner contours can be approximated by ellipses. Note that the carbon transition model in figure 3*b* shows a correlation between *P*_{1} and *P*_{2} with short and long axes of similar length. The relationship between *P*_{2} and *P*_{3} in figure 3*c* is closer to the linear case. Parameter *P*_{4} in figure 3*d* can vary widely with little effect on the prediction by the model, and there is a large difference in the length of the short and long axes of the contour.
The ratio of the axes of ellipsoids approximating such contours can reach values of the order of 1000 in many systems biology models (Gutenkunst *et al*. 2007*b*). Parameter sensitivity analyses on many metabolic and signalling networks taken from the BioModels database (Le Novère *et al*. 2006) showed that all these models have sloppy sensitivity spectra and that many parameters show a high degree of uncertainty (Gutenkunst *et al*. 2007*b*). The dynamic behaviours of all analysed models depend on only a few stiff parameter combinations. Sloppy sensitivity spectra appear to be a universal property of systems biology models.

A simple example of a sloppy parameter combination can be given as follows. Consider a chemical side reaction of a linear metabolic pathway that follows the law of mass action with rate constants *k*_{+} and *k*_{−} for forward and backward reactions, respectively. If the reaction is very fast relative to the rest of the biochemical network it is part of, it can be characterized solely by its equilibrium constant *K*=*k*_{+}/*k*_{−} (Brown *et al*. 2004). Both rate constants can be changed freely by the same factor without affecting *K*. In the model, the parameters *k*_{+} and *k*_{−} represent removable degrees of freedom and they can be consolidated to one single parameter capturing the essential behaviour of the reaction.
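This removable degree of freedom is easy to check numerically; the rate constants below are arbitrary toy values:

```python
# For a fast reversible reaction obeying mass action, the equilibrium point
# depends only on K = k_plus / k_minus. Scaling both rate constants by the
# same factor is therefore a 'sloppy' move in parameter space.
def equilibrium_fraction(k_plus, k_minus):
    # fraction of material on the product side at equilibrium
    K = k_plus / k_minus
    return K / (1.0 + K)

base = equilibrium_fraction(2.0, 1.0)        # K = 2
scaled = equilibrium_fraction(200.0, 100.0)  # both rates x100, K unchanged
```

The two calls return the same equilibrium fraction, so data taken at equilibrium cannot distinguish the individual rate constants, only their ratio.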

The sloppy and stiff directions in parameter space are determined by calculating the eigenvectors of the Hessian matrix of the sum-of-squares function in the sloppy modelling approach (Brown *et al*. 2004; Gutenkunst *et al*. 2007*b*). A similar approach to unveil more or less hidden correlations between parameters is principal component analysis (PCA) on a matrix, which describes parameter influence on model behaviour. Both methods identify linear combinations of parameter sensitivities that account for the major variation in the simulation outcome of a model. Those combinations that represent independent directions in parameter space can provide a basis for model reduction (e.g. Gokulakrishnan *et al*. 2006) in which negligible parameters are removed from the system in order to reduce complexity. A specific example of eigenvector decomposition of a model and what it reveals about the components of the model and their relations is given below (§7).
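As a minimal sketch of this eigenanalysis, consider a hypothetical two-parameter model in which only the sum of the parameters matters, so that one direction in parameter space is exactly sloppy. Near the best fit, the Hessian of the sum-of-squares function is approximated by JᵀJ (Gauss–Newton), with J the Jacobian of the residuals:

```python
import numpy as np

def residuals(p, t, data):
    # hypothetical model y(t) = exp(-(p1 + p2) * t): the parameters enter
    # only through their sum, so (1, -1) is an exactly sloppy direction
    return np.exp(-(p[0] + p[1]) * t) - data

def numerical_jacobian(f, p, eps=1e-6):
    """Forward-difference Jacobian of the residual vector f at p."""
    r0 = f(p)
    J = np.empty((r0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (f(p + dp) - r0) / eps
    return J

t = np.linspace(0.0, 2.0, 20)
p_best = np.array([0.6, 0.4])
data = np.exp(-p_best.sum() * t)            # synthetic 'measurements'

J = numerical_jacobian(lambda p: residuals(p, t, data), p_best)
H = J.T @ J                                 # Gauss-Newton Hessian approximation
eigvals, eigvecs = np.linalg.eigh(H)        # eigenvalues in ascending order

# eigvecs[:, -1] (large eigenvalue) is the stiff direction, here (1, 1)/sqrt(2);
# eigvecs[:, 0] is the sloppy direction, with eigenvalue essentially zero.
```

For a full model the same decomposition is applied to the Hessian over all parameters; the spread of the eigenvalues then quantifies the sloppiness of the spectrum.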

For a model of nerve growth factor signalling, Brown *et al*. (2004) showed that individually determined experimental parameter measurements, e.g. obtained on isolated molecules, must be extremely precise in order to yield reasonable predictions of integrated network behaviour. They showed that model parameters fitted collectively to time-course data on signal protein phosphorylation perform much better for model predictions of responses to new experimental interventions. Some parameter combinations can be determined accurately in mechanistic models of biological processes, while other combinations are ill determined but matter little for further predictions.

To make useful predictions, we can run a set of simulations with parameter combinations drawn from the region of parameter space where plausible parameter combinations are located. The region considered should be compatible with the experimental data already available. This is termed an ensemble approach. It turns out that the ensemble prediction of the response in a new experiment can still be constrained to a usefully small range.

## 7. Module for high-energy phosphate group transfer in heart muscle

In the following, we will illustrate the sloppy modelling ensemble approach by the analysis of a small model, published previously (van Beek 2007). This model describes the transport and buffering of high-energy phosphate groups in heart muscle and is used to analyse the dynamic adaptation of oxidative phosphorylation in the muscle cells to changing workloads. The model is publicly available in the BioModels database (Kongas & van Beek 2007) and in the CellML model repository. A ‘skeleton model’ approach was chosen, meaning that only key elements that regulate the adaptation of adenosine triphosphate (ATP) synthesis in the mitochondrion to increased cytosolic ATP hydrolysis in the muscle cell are incorporated (figure 4).

After ATP has been synthesized in the mitochondrial matrix and exported to the mitochondrial intermembrane space (IMS), its high-energy phosphate group can be transported further into the cytosol (CYT) in two different ways, either by direct diffusion of ATP itself through the mitochondrial outer membrane (MOM) or by the so-called ‘phosphocreatine (PCr) shuttle’, which operates as follows. In the IMS, one high-energy phosphate group of ATP is transferred to creatine (Cr) to form PCr, which then diffuses through the membrane into the cytosol. In the cytosol, the phosphate group of PCr is then transferred to adenosine diphosphate (ADP) to form ATP. The reactions are catalysed by two isoforms of creatine kinase (CK), one mitochondrial form (Mi-CK) and one cytosolic form (MM-CK), respectively. The input of the model is ATP hydrolysis, given by a forcing function that represents the experimentally measured changing muscle workload. Inorganic phosphate (P_{i}) together with ADP is then transferred to the mitochondrial matrix to serve as the substrate for ATP synthesis.

The model consists of 10 ODEs, one for each of the five metabolites in the cytosol and for the same metabolites in the IMS, and has 24 free parameters for enzyme kinetics and mitochondrial membrane permeability (see van Beek (2007) for details). Metabolite and enzyme levels, as well as kinetic constants, had been experimentally determined and were collected from the scientific literature. The permeability of the MOM to ATP and ADP varies over a large range between experiments: reported values range from 0.16 to 85 μm s^{−1} (van Beek 2007). This may be due to damage to the mitochondrial outer membrane incurred during isolation of the mitochondria from the cell or during removal of the outer cell membrane by chemical treatment. Simulation with the model showed that the dynamics of adaptation of oxygen consumption in the mitochondria depends strongly on MOM permeability. ADP permeability was therefore optimized to oxygen electrode measurements in the venous outflow from the whole heart, corrected for the transport time in the coronary vessels. These measurements show that oxygen consumption in the mitochondria, which is directly linked to oxidative phosphorylation (OxPhos), increases with a generalized time constant of 3.7 s in response to a step increase in electrically paced heart rate.

With ADP permeability known, the behaviour of the system could be predicted. For instance, it was predicted that an increased expression of Mi-CK leads to faster dynamic adaptation of oxidative phosphorylation, while an increased expression of MM-CK leads to slower dynamic adaptation. Furthermore, it was predicted that the CK reaction buffers the oscillation of ADP in the cytosol and of oxidative phosphorylation in the mitochondria. Surprisingly, CK activity also leads to lower levels of inorganic phosphate, a breakdown product of ATP that may inhibit contractility.

The sloppy and stiff directions in parameter space (corresponding to the long and short axes of the ellipsoids; figure 3) for this model were determined by calculating the eigenvectors of the Hessian matrix of the sum of squares. In the four eigenvectors with the largest eigenvalues, the parameters of several molecular processes show up strongly: respectively, oxidative phosphorylation, MM-CK, Mi-CK and the ADP permeability of the MOM. In the eigenvectors with the three largest eigenvalues, an increased capacity for oxidative phosphorylation has effects similar to those of an increased affinity of oxidative phosphorylation for ADP and P_{i}. A group of parameters for MM-CK and, independently, a group of parameters for Mi-CK are also found in the eigenvectors. The parameters of the equations of a single molecular process are usually coupled as a group in the eigenvectors. However, the eigenvectors tend to show combinations of molecular processes that are not necessarily directly connected in the scheme of the model (figure 4). In this case, the eigenvectors therefore do not seem to provide a clear-cut basis for dividing the system into modules, although parameters in the nonlinear enzyme kinetic equations for a particular component are often found together. PCA of the parameter sensitivities of the model, an analysis done previously on a chemical reactor system (Gokulakrishnan *et al*. 2006), shows similar results.
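The PCA route can be sketched as follows; the sensitivity matrix here is filled with random placeholders rather than sensitivities computed from the actual model:

```python
import numpy as np

# PCA of a parameter-sensitivity matrix S: rows correspond to model outputs
# (e.g. time points of simulated fluxes), columns to the 24 parameters.
rng = np.random.default_rng(2)
S = rng.normal(size=(50, 24))       # placeholder sensitivities

S_centred = S - S.mean(axis=0)      # centre each parameter's column
U, sv, Vt = np.linalg.svd(S_centred, full_matrices=False)

# Rows of Vt are orthogonal directions in parameter space; sv**2 ranks how
# much of the variation in model behaviour each direction accounts for.
explained = sv**2 / np.sum(sv**2)
```

Directions with negligible `explained` values are candidates for model reduction, in the spirit of Gokulakrishnan *et al*. (2006).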

Interestingly, simulations with the system revealed that the PCr shuttle contributes only one-third to the total high-energy phosphate group transfer from the mitochondria to the sites of ATP hydrolysis to energize muscle contraction. This seemed to contradict the ‘PCr shuttle hypothesis’, which states that almost all high-energy phosphates in muscle cells are transferred as PCr (Bessman & Geiger 1981). However, our model prediction was based on a set of reference parameters on enzyme kinetics, and our statement was not supported by an extensive analysis of the experimental error levels that lead to imprecision in the parameters. Our aim in §8 is therefore to investigate how uncertainty in the parameter values affects the predictions by the model.

## 8. Ensemble predictions of high-energy phosphate group transfer

The predictions derived from simulations of the high-energy phosphate transfer model give valuable insights into the functioning of energy metabolism in muscle cells. Nevertheless, the model relies strongly on the accuracy and precision of experimentally measured kinetic parameters for the components of the system. The parameter estimates are subject to experimental error and may differ between *in vitro* and *in vivo* conditions. In the following, we describe an ensemble approach to answer the question whether it is reasonable to state that the PCr shuttle accounts for only a small part of the total high-energy phosphate group transfer, given the limited precision with which model parameters are known.

We chose the following ensemble method to evaluate the uncertainty of our prediction. We generated parameter sets for the ensemble simulation by drawing values from a Gaussian distribution with mean equal to the parameter value from the literature (van Beek 2007) and a relative standard deviation of 0.1. In this way, we investigated what an error of 10 per cent for each measured parameter means for the predicted fraction of high-energy phosphate groups carried by PCr. The membrane permeability for ADP and ATP, assumed equal, is not randomly drawn directly, but optimized to exactly reproduce the randomly drawn generalized time constant of the response to an increase in the cytosolic ATP hydrolysis (3.7 s, s.e.=0.3 s), reflecting the variation found in the whole heart experiment.
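The parameter draw described above can be sketched as follows; the parameter names and reference values are placeholders, not those of the published model:

```python
import numpy as np

rng = np.random.default_rng(0)

reference = {"Vmax_MiCK": 1.0, "Km_ATP": 0.15, "Vmax_MMCK": 2.0}  # hypothetical
REL_SD = 0.1               # 10 per cent relative standard deviation
N_ENSEMBLE = 1000

def draw_parameter_set(ref, rng):
    """One ensemble member: each parameter ~ N(mean, (REL_SD * mean)^2)."""
    return {name: rng.normal(mean, REL_SD * abs(mean))
            for name, mean in ref.items()}

ensemble = [draw_parameter_set(reference, rng) for _ in range(N_ENSEMBLE)]

# The generalized time constant is drawn from its whole-heart distribution;
# the membrane permeability of each member is then optimized to reproduce it.
tau = rng.normal(3.7, 0.3, size=N_ENSEMBLE)
```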

For each parameter set of the ensemble, the system is simulated over 100 s, with an increase in the ATP turnover rate by increasing the paced heart rate from 135 to 220 b.p.m. at 50 s. The fraction of high-energy phosphate being transferred by PCr is then calculated for each second of the resulting time series according to *J*_{diff,PCr}/(*J*_{diff,ATP}+*J*_{diff,PCr}), with *J*_{diff} giving the net diffusion flux of PCr and ATP from mitochondria to cytosol.

The upper and lower bounds enclosing the central 95 per cent of the resulting trajectories from the ensemble calculation for the fraction of PCr phosphate transfer are plotted in figure 5. Strikingly, values for the calculated fraction drop below zero owing to a net flux of PCr from the cytosol into the mitochondria, meaning that high-energy phosphate is transferred back into the IMS, which seems thermodynamically infeasible. This is caused by our scheme for generating random parameters, in which the kinetic parameters of the two CK enzymes were drawn independently. Six kinetic parameters for each enzyme determine the reaction's equilibrium constant, *K*_{CK}, via the Haldane relation. Drawing kinetic parameters independently therefore causes differences in *K*_{CK} between IMS and cytosol, which may reverse *J*_{diff,PCr} (figure 5*a*). Perpetual diffusion fluxes are calculated even when there is no ATP hydrolysis and oxidative phosphorylation is fully inhibited in the model (figure 5*b*). Drawing kinetic parameters independently therefore creates a thermodynamically impossible situation: when there is no ATP hydrolysis and synthesis, all diffusion fluxes should become zero in the steady state.

To prevent diffusion fluxes in the absence of any energy input, we add to the model the thermodynamic constraint that *K*_{CK} is equal in IMS and cytosol. This is implemented by drawing one *K*_{CK} (with 10% random error added) and setting it equal in IMS and cytosol for both CK isoforms. For MM-CK and Mi-CK, five of the kinetic constants are drawn and the sixth constant is calculated to yield the correct *K*_{CK}. The choice of which constant is calculated is rotated such that all kinetic parameters are calculated in this way an equal number of times.
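A sketch of this constrained draw, assuming a generic product-form Haldane relation *K*_{CK} = (*k*₁*k*₂*k*₃)/(*k*₄*k*₅*k*₆) in place of the actual CK rate law (which relates its six kinetic constants analogously); all numerical values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
REL_SD = 0.1

def constrained_draw(ref, solved_index, K_CK, rng):
    """Draw five of six constants; solve the Haldane relation for the sixth."""
    k = [rng.normal(m, REL_SD * m) for m in ref]
    num = k[0] * k[1] * k[2]
    den = k[3] * k[4] * k[5]
    if solved_index < 3:                      # constant sits in the numerator
        k[solved_index] = K_CK * den * k[solved_index] / num
    else:                                     # constant sits in the denominator
        k[solved_index] = num * k[solved_index] / (K_CK * den)
    return k

ref = [1.0, 0.5, 2.0, 0.8, 1.5, 0.9]          # hypothetical kinetic constants
K_ref = (ref[0] * ref[1] * ref[2]) / (ref[3] * ref[4] * ref[5])

# One K_CK is drawn with 10% error and imposed on both compartments; the
# solved-for constant is rotated so each is calculated equally often.
K_CK = rng.normal(K_ref, REL_SD * K_ref)
for i in range(6):
    k = constrained_draw(ref, i, K_CK, rng)
    K = (k[0] * k[1] * k[2]) / (k[3] * k[4] * k[5])
    assert abs(K - K_CK) < 1e-9 * K_CK        # constraint holds exactly
```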

The resulting uncertainty range for the prediction of the contribution of PCr to the overall high-energy phosphate group transfer is shown as the green region in figure 5. Even though the ensemble prediction covers a broad range, figure 5*a* suggests that PCr does not carry much more than half of the high-energy phosphate flux from mitochondria to contractile elements, which contradicts the PCr shuttle hypothesis posed more than 25 years ago in *Science* (Bessman & Geiger 1981). The prediction range is narrower when thermodynamically infeasible parameter combinations are excluded. This example demonstrates that, for models relying on experimental parameter measurements, each prediction should be carefully investigated with respect to possible measurement errors.

## 9. Intuitive modelling for biologists: models of metabolism with Petri nets

Petri nets were developed in 1962 by Carl Adam Petri. The Petri net formalism describes a mathematical structure for constructing directed bipartite graphs that model pairwise connections: a collection of two different node types is pairwise connected by edges. The nodes are called ‘places’, representing parts of a system, and ‘transitions’, representing interactions between the parts. In the most common type of Petri net, places contain a variable integer number of tokens that reflect the state of the place. The dynamic behaviour of the network is then described by the consumption of tokens from one place and the generation of tokens in the connected place when a transition fires during the execution of a Petri net (see Murata (1989) for a detailed review). Identifying the places with metabolites and the transitions with biochemical reactions is natural and is probably more intuitive to non-mathematicians than ODE-based models. Hence, besides modelling the LIPSSS experiments described above by systems of ODEs, we also modelled such metabolic networks with Petri nets.
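The firing rule can be captured in a few lines; the place and transition names are illustrative:

```python
# Minimal discrete Petri net sketch: places hold integer token counts, and a
# transition consumes tokens from its input places and produces tokens in its
# output places when it fires.

places = {"A": 3, "B": 0}

def fire(places, inputs, outputs):
    """Fire a transition if it is enabled; return whether it fired."""
    if not all(places[p] >= n for p, n in inputs.items()):
        return False                 # not enough tokens: transition disabled
    for p, n in inputs.items():
        places[p] -= n
    for p, n in outputs.items():
        places[p] += n
    return True

# Interpreting place A as a substrate, B as a product and the transition as a
# reaction A -> B, each firing moves one token (molecule).
while fire(places, inputs={"A": 1}, outputs={"B": 1}):
    pass

print(places)   # -> {'A': 0, 'B': 3}
```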

Petri nets can specify, verify and simulate a wide variety of systems (concurrent, asynchronous, distributed, parallel, deterministic and stochastic). Owing to their graphical representation, they provide an intuitive way for users to deal with complex models and dynamic system behaviour. The user can easily switch between two points of view: (i) the transitions and the order in which they are executed and (ii) the states of the system that are reached after a given sequence of actions. Petri nets have been investigated for many years, leading to different Petri net types specialized for different applications.

Here, we present a new approach to simulate tracer and LIPSSS experiments using coloured Petri nets (Jensen 1998). Like Petri nets, metabolic networks are inherently bipartite: metabolic fluxes (transitions) connect interacting metabolites (places) of the network. In addition, biological reactions take place independently or in parallel, which makes them easily representable by a concurrent system such as a Petri net. Furthermore, Petri nets provide random firing of transitions to simulate natural processes that are governed by stochastic laws. Their non-deterministic properties make them appropriate for modelling situations where the numbers of molecules are low, because random fluctuations average out only when many molecules react independently.

Coloured Petri nets contain distinct groups of tokens, each represented by a different colour. By assigning the colours to isotopomers, they are well suited to deal with the different labelling states of metabolites, i.e. isotopomers, occurring in tracer experiments. We developed a software package that enables one to easily specify metabolic networks at the level of carbon atoms and their connections via metabolic reactions, i.e. a carbon transition network (CTN). The user draws molecules and transitions in a graphical user interface. The CTN is then converted into a data structure representing the coloured Petri net.

During the conversion from CTN to Petri net representation, a place in the Petri net is created for each metabolic pool of the CTN, and the coloured tokens representing the isotopomers of the pool are determined. The user can choose to simulate the Petri net's dynamic behaviour in either a continuous or a discrete manner (figure 6*a*,*b*, respectively) (Chouikha & Schnieder 1998). The nature and number of the tokens depend on this choice. The continuous approach represents each isotopomer by a single coloured token, with the fraction of the particular isotopomer attached as a numeric attribute. In the discrete approach, the number of tokens for each isotopomer is proportional to the concentration of its metabolic pool; the number of molecules chosen to be represented by each token is the proportionality constant. Each token thus represents a fixed number of molecules of the metabolite, one or more, while the colour of the token still refers to its isotope composition. Finally, all individual carbon transitions present in the CTN are combined within the Petri net transitions: for each reaction in the metabolic network, it is determined which isotopomers in the precursor pools give rise to each isotopomer of the product pool, and these carbon transitions are lumped within one Petri net transition. This information is subsequently used to perform the simulations. Both approaches offer the possibility to simulate the system step by step, allowing the user to explore system behaviour interactively.

In the continuous case, a simulation of the Petri net closely resembles a simple Euler integration of the corresponding ODE representation of the model. At each time step, all Petri net transitions are fired simultaneously and the changes in the isotopomer fractions are updated in the numeric attribute attached to the tokens (one token per isotopomer). By contrast, during the discrete simulation, all Petri net transitions are fired in a random order and tokens representing ‘real’ molecules are transferred between the places. The results of a simulation of both the continuous and the discrete Petri net in figure 6 are shown in figure 7.
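The two modes can be sketched for a single label-inflow step; all pool sizes, fluxes and step counts below are illustrative:

```python
import random

# Label inflow into a metabolite pool B of constant size: a flux J carries
# fully labelled material in, and an equal flux removes randomly chosen
# material, so the labelled fraction obeys db/dt = J * (1 - b) / B_pool.

J, B_POOL, DT, N_STEPS = 1.0, 10.0, 0.1, 300

# Continuous mode: a single token for the labelled isotopomer of B carries
# the labelled fraction b as a numeric attribute; each time step is an
# explicit Euler update of that fraction.
b = 0.0
for _ in range(N_STEPS):
    b += DT * J * (1.0 - b) / B_POOL

# Discrete mode: tokens are molecules (here one molecule per token) and
# transitions fire stochastically, replacing a randomly chosen token in the
# place by a labelled one.
random.seed(0)
tokens = [0] * int(B_POOL)              # colour 0 = unlabelled, 1 = labelled
for _ in range(N_STEPS):
    if random.random() < J * DT:        # the transition fires in this interval
        tokens[random.randrange(len(tokens))] = 1

b_discrete = sum(tokens) / len(tokens)
# Both estimates approach 1 - exp(-J * t / B_pool); the discrete trajectory
# fluctuates randomly around the continuous one.
```

With many tokens per pool the random fluctuations shrink, and the discrete trajectory converges to the continuous (Euler) one.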

The results show that, although the discrete Petri net simulation is a completely different representation of the system, the overall behaviour of the system stays the same. The knowledge gained in this way is very useful for the experimentalist to develop metabolic models with desired properties, and it offers an entry point for a more detailed analysis using model representations via ODEs.

## 10. Future directions

The human or animal metabolic network is extensive. Because it lies at the heart of physiology, it would be extremely useful to capture a comprehensive understanding of metabolic functioning in computational models. Ideally, one would like to have access to an *in silico* model that covers the entirety of human metabolism without more or less arbitrary subdivisions and modularizations. However, owing to the complexity of metabolism, such subdivisions are necessary to study it. Nevertheless, subdivisions can be quite arbitrary: a particular modularization may be useful for one particular task and set of experiments, but insufficient for others. Metabolic pathways that interact little under one condition may become strongly linked under other conditions. The alternative is not very attractive either: a large model constructed in one step is very difficult to test comprehensively, and partial reactions in the system may be in error. Modelling in one large step makes it hard to debug the system, whereas dividing the system into modules that are each tested and debugged separately is desirable.

Together, the approaches in §3 may be helpful in tackling the modelling and analysis of the metabolic system, but the question is whether they are sufficient to allow us to model either the essence or the entirety of the complete metabolic system, despite the challenges posed by its large size and complex structure. The approaches of multiscale, modular and ensemble modelling, etc., are closely interwoven. Must we start by designing a grand scheme for modelling the system, providing a highly structured top-down approach to modelling human metabolism? Even a grand design will work best if we proceed in relatively small steps. The difficulty will be that the boundaries of the modules into which we divide the system may be hard to control.

A crucial aspect is organization of the modelling process in such a way that, if inevitable mistakes are made in early model versions, the interplay between computer simulations and experimental tests results in a gradual improvement of the model. We must aim to make the modelling process the driving force behind metabolic experimentation and data collection, such that it becomes the vehicle for integration of knowledge and understanding of the complete metabolic system.

## Acknowledgments

This work is part of the BioRange programme (project no. SP 2.2.1) of The Netherlands Bioinformatics Centre, which is supported by a BSIK grant through The Netherlands Genomics Initiative (NGI). The work is also part of the Centre for Medical Systems Biology, which is a Genomics Centre of Excellence funded by the Dutch Government via the NGI.

## Footnotes

One contribution of 15 to a Theme Issue ‘The virtual physiological human: tools and applications I’.

- © 2009 The Royal Society