## Abstract

Many important biological functions are strongly dependent on specific chemical interactions. Modelling how the physicochemical molecular details emerge at much larger scales is an active area of research, currently pursued with a variety of methods. We describe a series of theoretical and computational approaches that aim to derive bottom-up descriptions that capture the specificity that ensues from atomistic detail by extracting relevant features at the different scales. The multiscale models integrate the descriptions at different length and time scales by exploiting the idea of mechanical responses. The methodologies bring together concepts and tools developed in seemingly unrelated areas of mathematics such as algebraic geometry, model reduction, structural graph theory and non-convex optimization. We showcase the applicability of the framework with examples from protein engineering and enzyme catalysis, protein assembly, and with the description of lipid bilayers at different scales. Many challenges remain as it is clear that no single methodology will answer all questions in such multidimensional complex problems.

## 1. Multiscale dynamics of biomolecules: from chemical interactions to dynamical responses

The realization that there is a molecular basis to many important biological functions, and conversely to loss of function leading to disease, has prompted a shift in the study of biological systems towards an emphasis on the chemical interactions that underpin biological and physiological processes. However, despite the wealth of generated data and simulations, a theoretical framework that could lead to deeper mechanistic understanding, together with opportunities for biological design and control, is still in its infancy. It is now clear that the study of biological components in isolation is insufficient. At the same time, simulating the full spatial and temporal information of large number of molecules is impractical and, even if possible, it would not necessarily increase our insight.

Biological function is often the result of the action of tightly knit groups of molecules, hierarchically coupled in systems within systems into cellular and supracellular arrangements. A major challenge for computational approaches to biological simulation is to uncover how molecular interactions, which are specific, local and occur at the nano- and sub-nanometre scale, appear at larger scales where physiological responses occur. Because these are highly complex systems in which nonlinear and non-periodic interactions span orders of magnitude of time and length scales, naive attempts at compartmentalizing the dynamics often do not capture the key underlying phenomena. In the post-genomic era, the challenge is now to develop quantitative frameworks that can aid our understanding of biological behaviour, all the way from the molecular level to the emergent properties at the organism level. To achieve this, an integrated multi-level approach is required where the essential chemical components are identified and then integrated upwards taking into account the metabolic network structures and the relevant physiological mechanics. Its defining characteristic is the integration of successful reduced descriptions across multiple time and length scales. Our goal is to derive dynamical, bottom-up descriptions by sacrificing non-essential degrees of freedom, without losing the chemical specificity that ensues from the atomistic detail. Unlike continuum models on the one hand, which ignore the discreteness of biological components, and static networks on the other, where the nodes are either imposed or obtained from data, we seek to derive discrete, chemically based network models that, rather than being static representations, can lead to dynamical models.

The development of low-resolution or coarse-grained approaches to biomolecular systems is a very active area of research. For an excellent, recent review on the different directions on methodology see Ayton *et al.* (2007). These approaches are also relevant to lipid membranes (Brannigan *et al.* 2006), proteins (Tozzini 2005) and nucleic acids (Pyle & Widom 2006). Our focus here will be a particular set of methods motivated by recent advances in diverse areas of mathematics that are based on nonlinearity, robustness and input–output optimality. These features have been largely ignored in the physicochemical theoretical community but may be useful in the characterization of biological processes. They lead to the use of a set of distinct mathematical tools (such as algebraic theory, model reduction, graph theory and non-convex optimization) that have been developed in the area of engineering mathematics and theoretical computer science. Underpinning the mathematical framework is the realization that often chemical interactions are coupled with other scales through mechanical responses. For a wide range of biological examples, from cells to tissues to organs that use mechanics in a hierarchical manner, see the recent review by Ingber (2006). Mechanical analogies have been useful in coarse-graining chemical systems in the past, for example on harmonic lattices (Deutch & Silbey 1971). Here, we show how starting from detailed atomic structures, we derive reduced units based on generic mechanical properties, such as shape and rigidity, that are well defined and accessible at all scales.

The dynamical behaviour of biomolecules provides a link between biological structure and function. One of the outstanding successes of theoretical and computational chemistry has been the accurate representation of the dynamics of biomolecules through the development of carefully parametrized potentials to describe chemical interactions (Brooks *et al.* 1983; Conrell *et al.* 1995). This *molecular dynamics* framework involves the numerical integration of nonlinear equations of motion of a very large number of degrees of freedom in contact with thermostats and barostats that ensure thermodynamic equilibrium (Hoover 1986). However, there are computational limitations to the applicability of this approach to large biomolecules and supramolecular assemblies, over the time scales that are relevant to biological function. Moreover, molecular dynamics simulations show that a large proportion of the available degrees of freedom are usually ‘averaged out’ over time and lead to non-observable dynamics. In this paper, we briefly introduce some ideas that allow for a stratified, simplified description of biomolecular systems based on constrained dynamics.

Firstly, we would like to obtain biologically relevant reduced descriptions of biomolecules that faithfully represent their dynamics. Owing to the intrinsic complexity of these systems, the mathematical tools invoked depart from the standard linearization assumptions. We use graph theoretical concepts originally developed for the analysis and design of mechanical structures, specifically frameworks and rigidity theory (Connelly 1980; Connelly & Whiteley 1996), to eliminate degrees of freedom and simplify the molecular dynamics. This allows for the biomolecules to be partitioned into quasi-rigid bodies and suggests naturally reduced representations for further study at larger scales. It also highlights the intimate link between structure and functional behaviour, as the partitionings are derived from physicochemical interactions. We illustrate some of the results with applications to the mechanical modulation of enzyme activity and the emergence of supramolecular architecture at larger scales.

This dynamical partitioning leads to coarse-grained molecular models in which the concept of shape emerges naturally, since the resulting quasi-rigid bodies are almost always anisotropic. The study of anisotropic interactions is a classic area in chemical physics (Onsager 1949; Berne & Pechukas 1972; Gay & Berne 1981; Perram *et al.* 1996) but there is surprisingly little work on the implication to dynamics. We study how anisotropic reduced models can have important implications for structure and dynamics, and we concentrate on specific examples involving lipid bilayer formation and Brownian motion of ellipsoidal particles. Another area of current research is the emergence of continuum properties from assemblies of reduced models of biomolecules and how continuum properties depend on molecular detail. We will highlight some of these issues with examples where the physicochemical structure of lipids is shown to influence the spontaneous curvature of lipid bilayers and the activity of certain curvature-sensitive enzymes. This interdependence of ‘chemical composition–curvature–enzyme activity’ can lead to an intrinsic biophysical feedback mechanism that regulates lipid biosynthesis (Attard *et al.* 2000). Throughout the paper, we will use a series of model problems to exemplify the methodology which are nevertheless complex and relevant enough to merit research in their own right.

## 2. Towards dynamical coarse-graining of biomolecules: rigidity and model reduction

### (a) Graph rigidity for biomolecules

Our first goal is to identify rigid substructures in biomolecular structures. At the smallest scale, this is translated into groups of atoms that move as ‘blocks’ and can thus be approximated as rigid bodies with frozen internal degrees of freedom. Although there is ongoing research into the use of linear methods (such as normal mode analysis and Gaussian network models; Bahar & Rader 2005; Tozzini 2005) to find relevant atom groupings, we have used a nonlinear, graph theoretical approach to identify such structures. The mathematical foundation follows from rigorous results in graph and combinatorial rigidity theory (Connelly 1980) which were originally proved by Laman for planar graphs. These results can be extended to a sub-class of graphs in three dimensions (bond-bending networks) that are good descriptions of molecular structures (Jacobs 1998; Jacobs *et al.* 2001).

Starting from a full atom description that includes both covalent and non-covalent interactions, it is possible to derive a representation of a protein as a bond-bending network (a binary graph), where the nodes are the atoms and the edges indicate distance constraints derived from the interatomic interactions. Concepts from generic graph rigidity theory are then applied to identify flexible (underconstrained) and rigid (overconstrained or isostatic) regions (Jacobs 1998). These mechanical constraints can serve to identify, for example, which bonds can rotate; which bonds have correlated motions; and which segments of the protein behave as rigid bodies. Continuous regions of consecutive rigid bonds define rigid clusters that are interlinked by flexible bonds. This mechanical partitioning can be computed efficiently with FIRST, a computational tool for the analysis of proteins developed by Jacobs *et al.* (2001). FIRST implements a pebble game algorithm that scales linearly with the number of atoms in three-dimensional systems (Jacobs & Thorpe 1995). An important feature of this graph theoretical approach is that it uses a full-atom description to obtain the distance constraints and hence derives the set of rigid clusters instead of assuming them *a priori.* In addition, the identification of available degrees of freedom and constrained regions is a combinatorial process and thus it is truly nonlinear. The advantage of this method is that it remains efficient computationally by avoiding costly energy minimizations, diagonalization of large matrices, or long molecular dynamics simulations.

### (b) The impact of rigidity on enzyme inhibition

This approach can be used to quantify the impact on rigidity of a given intramolecular interaction or group of interactions. For example, the impact of a hydrogen bond can be assessed by removing the corresponding edges from the graph and recalculating the rigidity of the protein. Because graph rigidity is a nonlinear, non-local property, we expect such contributions to depend heavily on the topology of the global network and the relative position of the interactions and not only on the usual local rules based on energy. This application at the residue level of description can be useful in elucidating the influence of residue positions and substitutions, for example mutations, on the rigidity of the protein. The impact of each residue on the global rigidity depends both on its position and on its contribution to the global hydrogen-bonding network. This structural feature can have implications for the functionality of proteins.

We have used this methodology to study natural and synthetic proteins of the family of the Bowman–Birk inhibitor (BBI), a class of naturally occurring serine protease inhibitors that inhibit enzymes such as trypsin by binding to them (figure 1). These proteins share a conserved reactive site loop whose presence in a conserved conformation is linked to their inhibitory activity. The most potent inhibitor known to date (found in the sunflower seed) consists of only this loop in a cyclic form (Brauer *et al.* 2002). In addition to naturally occurring instances, there is considerable experimental effort devoted to the synthesis of BBI mimics, minimal in size but still functionally active, as well as the design of mutated ones with maximal activity (McBride *et al.* 2002). Our results (Costa & Yaliraki 2006) show that this canonical loop is an independent rigid cluster in the natural inhibitors. We have further used the technique to identify residues that play an important role in the structural rigidity of the protein and relate their presence to enzymatic activity. We find that the conserved elements among the natural and synthetic peptides are those that contribute the most to rigidity, even if they are located far from the active site. These results help to elucidate why certain mutations in the loop of the BBI produce peptides that fail to have the designed inhibitory activity.

The idea that the dynamical aspects of protein structure, and not only thermodynamic measures such as binding energies, are important in catalysis is gaining ground (for a review see Benkovic & Hammes-Schiffer (2003)). Recent experimental evidence showing that catalysis depends on the whole network of residues, and not only on the active site (Eisenmesser *et al.* 2005), highlights the need for such new methodological analyses.

### (c) Dynamical coarse-graining from rigidity

The output from rigidity analysis can also be used to postulate coarse-grained representations for the dynamics of biomolecules based on block rigidity (Hemberg *et al.* 2006). It is important to remark that the blocks are identified bottom-up, from the physicochemical interactions, without having to specify their number or structure *a priori.* This course-graining approach leads to great simplification—starting typically from several thousand atomic coordinates it outputs representations with tens of variables. These reduced models are then amenable to be used either within stochastic kinetics or rigid body dynamics frameworks. We have studied such mechanical partitionings in individual proteins as well as in supramolecular assemblies of proteins involved in viral capsid formation.

Figure 2 shows an example of the partitioning of the protein adenylate kinase based on rigid blocks. This protein undergoes a large conformational change upon biding, where two extremal domains of the protein (the LID domain at the top and the AMP domain on the right) come together in a ‘closing’ motion. Our rigidity analysis shows that most of the secondary structure elements of the protein belong to different clusters. Moreover, it can be seen that the LID and AMP domains form separate rigid clusters as would be expected from the conformational change. The partitioning can be then translated into a reduced model in terms of rigid blocks which leads to a reduced molecular dynamics. We have implemented this scheme computationally to evaluate the validity of these partitionings in a dynamical sense. Early results indicate that the partitions based on rigidity are also meaningful for the dynamics of the problem.

The reduction in the dimensionality of the problem opens the possibility of studying supramolecular assemblies of biomolecules and the process of self-assembly itself. Protein–protein interactions are central to cellular processes such as signalling, genetic regulation and molecular recognition. Protein aggregation and self-assembly are also the biophysical processes underlying the formation of plaques and other aggregates in neurodegenerative diseases, such as Huntington's (Burke *et al.* 2003). Another important example of protein self-assembly is the formation of viral capsids. Each virus carries genetic material that can code for a particular coat protein. These proteins then assemble into highly symmetric arrangements (icosahedral in many cases) of identical, non-symmetric proteins that surround and protect the viral genetic material. It is known that the assembly pathways differ from virus to virus which indicates that molecular detail plays an important role in the process. We have developed a bottom-up approach that starts with atomic descriptions of the individual proteins and performs rigidity analysis on the possible intermediate aggregates to establish partitionings into rigid substructures (Hemberg *et al.* 2006). The reduced descriptions are then used within a modified Gillespie algorithm to sample the kinetics of self-assembly.

Figure 3 shows some sample results of this approach, highlighting the fact that the molecular fingerprint of the individual viral proteins translates into the appearance of distinct assembly pathways, both in structural intermediates and time scales of formation. This influence of individual molecular characteristics on the emergent kinetics would be difficult to obtain from purely continuous models or from discrete phenomenological models that lack any atomistic detail. The observed differences are the result of distinctive intermediates that vary both in association energies and dissociation propensities, which are affected by the rigidity of the aggregates. Because it is a nonlinear, global property, the rigidity of a particular unit may change dramatically as aggregation occurs, even though local neighbourhoods remain identical. As units associate, the global interaction graph changes leading to a redistribution of clusters during the self-assembly progression. This information may provide insight into the emergence of higher-order architecture and into the possibility of therapeutic interventions in the self-assembly process.

### (d) Dynamical model reduction

In the approach described so far, time is not considered explicitly in the partitioning. Therefore, the partitionings would have to be tested dynamically, as indicated above. The graphs and partitions would need to be rederived as the system evolves in time, either stochastically as in the viral capsid assembly or under molecular dynamics of rigid bodies as discussed above.

If time is considered explicitly *a priori,* coarse-graining becomes more involved. We have used exact approaches based on projector operator techniques to provide solutions on linear model systems (Cubero & Yaliraki 2005*a*). However, these methods have limited validity due to the memory of the kernel of the system, an issue of importance in reduced models of condensed phase systems (Cubero & Yaliraki 2005*b*). We have also explored an alternative approach where we use model reduction techniques to obtain a representation of the system over a finite time horizon. This is inspired by the input–output theory of model reduction in the field of theoretical control. We have already shown that we can recover good low-order approximations to the classic problem of the appearance of dissipation (Barahona *et al.* 2002). We are currently exploring how these methods will deal with biomolecular systems.

## 3. Molecular shape: the effect of anisotropy on dynamics

### (a) Anisotropic potentials for coarse-grained dynamics

We have described above a procedure based on mechanical and graph theoretical concepts through which simplified structural descriptions can be derived rather than postulated. The induced partitionings bring to fore the importance of ‘shape’ in biomolecules, i.e. the well-known fact that units in biology are anisotropic, with important consequences for system properties. However, anisotropic interactions are important not only for structural or equilibrium properties but also for packing and dynamical properties. For instance, surprising, and to date largely ignored, is the effect of anisotropy on Brownian motion. Anisotropic particles do not conform to the classical results and the coupling of translation and rotation leads to non-Gaussian behaviour, especially in confined environment as has been recently also experimentally observed (Han *et al.* 2006). The effects are predicted to be even stronger in the presence of external forces (Grima & Yaliraki 2007), which suggests that they could have implications for the crowded cell.

When describing the relevant mechanical properties of biomolecules through coarse-graining, anisotropic potentials are a natural choice since they capture the intrinsic constraints introduced by the interactions. This is exemplified in figure 2, where we have represented the enclosing volumes for the atoms in the rigid clusters as ellipsoids. However, most standard approaches assume isotropic potentials (see, however, Liwo *et al.* 1997). General, rigorously defined and parametrized anisotropic potentials are still lacking. Recently, we have shown that the classic anisotropic potential, the Gay–Berne ellipsoidal potential (Berne & Pechukas 1972), can lead to artificial ordering forces because it does not represent volume properly under all particle orientations. This observation could have implications for the dynamical pathways of self-assembly where shape plays an important role. Inspired by the work of Perram and co-workers (Perram *et al.* 1996), we have provided an alternative framework, by realizing that their geometric approach could lead to the distance of minimal approach across the particle intercentre vector. We showed that this is a good approximation to the exact distance of closest approach and developed an ellipsoidal potential which is generally applicable to any mixture of elliptic particles at any orientations and which scales correctly with distance and captures the complexity of the well shapes associated with orientation (Paramonov & Yaliraki 2005). The potential is obtained using mathematical techniques from algebraic theory and optimization.

### (b) Assembly of coarse-grained lipid bilayers

A key advantage of our approach is its computational applicability. The derived potential is amenable to explicit, efficient dynamical simulation due to the analytical form of the forces (Paramonov & Yaliraki 2005). We have developed an in-house software that implements the full molecular dynamics based on this ellipsoidal potential (for details see Paramonov *et al.* (submitted)) and have used it to simulate coarse-grained dynamics of lipid bilayer assembly. Some of the results of these coarse-grained simulations are shown in figure 4. The obtained bilayers exhibit desired structural and dynamical features, i.e. regarding ordering, diffusion and fluctuation behaviours, which compare well with other models. We are currently pursuing a double direction. Firstly, we are furthering the parametrization of the ellipsoidal potential to reflect specific molecular properties. Secondly, we are studying averaged properties on large ensembles of coarse-grained lipids to relate them to (discretized) continuum mechanical models. As seen in figure 4, quasi-continuum properties, such as curvature and undulations, become more easily measurable in large ensembles.

### (c) Protein–membrane interactions and mechanical feedback

The goal of the preceding programme is to be able to introduce molecular detail into higher-level models of the mechanics of the membrane. These descriptions are becoming ever more important due to the growing experimental realization that the cell actively exploits and regulates the strong interdependence between the lipid composition of the membrane and its mechanical response. This is at the root of several important functions, such as membrane trafficking, vesicle budding, growth, division and movement (McMahon & Gallop 2005).

We have started by studying the interaction of lipid bilayers with small compounds and with proteins. Small compounds are also at the heart of non-specific drug binding. Early results indicate that small compounds can penetrate the bilayer in some cases severely disrupting its local structure (Baciu *et al.* 2006). The interaction of proteins, specifically enzymes, with lipid bilayers is key to several functions, in which there is an interplay between curvature-generating and curvature-sensing proteins (Attard *et al.* 2000). We have studied in detail a particular model in which some of the enzymes involved in lipid biosynthesis have been postulated to be curvature-sensitive, in the sense that their activity is modulated by the stored elastic curvature energy of the lipid bilayer. These enzymes have amphipathic α-helices that bind more or less strongly to the bilayer depending on its spontaneous curvature. The binding to the bilayer induces a conformational change that enhances the activity of the enzyme. This has the effect of modifying the lipid composition of the membrane and provides a biophysical mechanism for feedback control between curvature and lipid composition. Our analysis to date has been biophysical and kinetic. There is work in progress using the computational approach presented above to simulate bilayers with different lipids to estimate their spontaneous curvatures and to study their interaction with amphipathic ellipsoids.

### (d) Other applications

Since the developed potential is generic, it can be applied to any spherical or anisotropic coarse-grained unit of interest. For example, it can be combined with the rigidity analysis as described above—the rigid bodies derived from atomistic-detailed protein structures could be modelled through anisotropic particles. This methodology could be used to explore the effects of anisotropy on the explicit dynamics of viral capsid assembly and protein aggregation, beyond the kinetic approach introduced above, by introducing a reduced dynamics with a level of structural detail derived from the molecular interactions.

It is also applicable to other problems where anisotropic interactions are determinant. This includes the classic packing problems for ellipsoids, which could be relevant for crowding in the cellular environment. When dealing with such problems, we have also exploited the fact that our potential is based on the minimal distance between ellipsoids to recast the evaluation of configurations as a high-dimensional optimization problem. For the case of ellipsoids, this can be solved efficiently through a subset of semidefinite programmes (Vandenberghe & Boyd 1996) which gives all the pairwise distances at once, together with proofs that they have indeed been obtained. We have used this step in the implementation of the molecular dynamics (Burke & Yaliraki 2006) and it could also be used to perform quasi-static configurational searches. Importantly, this scheme is not restricted to ellipsoidal particles. A recently developed approach, namely sum-of-squares (SOS) decomposition theory (Parrilo 2000), employs algebraic techniques to optimize nonlinear (polynomial) non-convex problems. We are using this framework to extend this approach to non-convex bodies, thus increasing the fidelity of the coarse-grained descriptions.

## 4. Outlook

We have described here a series of coarse-graining directions which, starting from atomistic descriptions to using mechanical concepts, aim to show how the traditional chemistry of local interactions translates into global behaviours in biological systems. We remark that the mechanical aspects are just the initial step to be complemented through statistical mechanics in the future. Clearly, many challenges remain and a variety of approaches should be pursued since for such nonlinear, multidimensional problems no method will outperform all others in every situation.

This general approach of multiscale dynamics of biomolecules brings a complementary approach to other areas of current interest in biological modelling which can enhance current research in molecular dynamics simulations for biomolecules (driven by atomic potentials), bioinformatics (statistical analysis of static data) and systems biology (closed loop input–output biological modelling). There are exciting times ahead as new methodologies are being developed to increase the relevance and applicability of computational chemistry, mathematical modelling and systems engineering to biological and medical systems. Ultimately, experimental verification is important and a crucial partner in this endeavour.

## Acknowledgements

We thank our research groups, especially Joao Costa, Martin Hemberg and Leonid Paramonov, who have contributed directly to the core of this work. We thank Kim Parker for fruitful discussions. Funding from GSK, ONR (USA), EU-FP6, the Maths Institute at Imperial to S.N.Y. and from EPSRC and the Royal Society to M.B. is gratefully acknowledged.

## Footnotes

One contribution of 20 to a Triennial Issue ‘Chemistry and engineering’.

- © 2007 The Royal Society