## Abstract

We describe the recent advances in studying biological systems via multiscale simulations. Our scheme is based on a coarse-grained representation of the macromolecules and a mesoscopic description of the solvent. The dual technique handles particles, the aqueous solvent and their mutual exchange of forces resulting in a stable and accurate methodology allowing biosystems of unprecedented size to be simulated.

This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’.

## 1. Introduction

In the last decades, computer simulation has reached a high level of maturity and plays today a central role to investigate matter of growing complexity. As more computational methods became available and computing power grew at a rate following Moore’s law, the wealth of systems that can be accessed today by simulation extends on the everyday basis [1,2].

In the simulation of systems relevant to biology, the investigation of the cellular environment is considered the next frontier and it requires the development of appropriate multiscale methodologies. Pioneering work has recently attempted to model the interior of the cell-like environments [3,4] but was based on drastic simplifications, for instance by neglecting solvent-mediated interactions.

The cell interior is composed of several compartments: any closed part embedded in the cytosol, typically surrounded by membranes, such as in the case of organelles. These are specialized subunits that carry distinct functions, ranging from energy production to translation, folding, sorting and packaging of proteins, etc., typical examples being mitochondria, ribosomes, the Golgi apparatus or the endoplasmic reticulum. To execute their function, organelles display complex structures and internal organizations of large sets of proteins, whose type is closely related to the containing unit.

A major hallmark of cell compartments and the cytoplasm is the high level of crowding with small solutes, proteins, nucleic acids and membranes, such that macromolecules can reach a volume fraction of 40%. In addition, cell membranes are stuffed with lipids and proteins, some of which have large appendages or are tethered to skeletal proteins.

From the biological point of view, the consequences of crowding is still an open question. In fact, crowding exerts important effects on the transport of macromolecules and their aggregation, binding, reactivity and stability, cell metabolism, signalling and motility [5,6]. Given the wide diversity of cellular environments, it is quite arduous to determine a few working principles in the compartment organization. Crowding hinders solute diffusion, so that compartmentalized metabolism requires diffusive barriers to be overcome. A key characteristic of the cellular environments is the promiscuity of macromolecular interactions, such that functional interactions have to compete with a large number of non-functional ones. As a consequence, interactions in cells, such as protein–protein associations and enzyme reactions, are drastically altered. From the physico-chemical point of view, it is possible already today to shed some light on how crowding modulates the structural and dynamical features and on the role of non-functional interactions in influencing the properties of proteins.

The hydration level of soluble macromolecules is high enough in concentrated conditions to affect substantially the diffusional and transport properties of proteins. Hydrodynamic interactions are therefore key to understanding both the motion of proteins in dilute and in crowding conditions. This has been pointed out in seminal theoretical and experimental work [7–10].

From the previous discussion, it is clear that a crucial role is played by modelling appropriately the high diversity of biological environments. This is a formidable task and requires selecting the relevant degrees of freedom and propagating them in time subject to properly chosen boundary conditions. Computer simulation and modelling can play a synergistic role in this sense. In fact, an explicit particle-based representation of the macromolecules cannot be avoided, whereas hydrodynamics has to be accounted for in explicit terms.

Our approach to modelling the cellular environments takes these two elements in consideration by the following dual approach: (i) by describing macromolecules by using appropriate implicit solvent coarse-grained (CG) models at quasi-atomistic detail (i.e. the OPEP force field [11]), such that the secondary, tertiary and quaternary structures are well reproduced and (ii) by describing water as a dynamically responsive medium that obeys the laws of fluid dynamics and that is able to host the presence of embedded bodies (figure 1). Depending on the level of description of the biological process, the intramolecular flexibility of the macromolecules can be treated differently, from accurate standard intramolecular forces, including bonding, bending and torsional terms, to a cruder elastic network, up to a rigid body representation. At the same time, the fluid representation can be chosen with a certain liberty in order to capture either more microscopic or larger-scale features.

In this work, we first describe the lattice Boltzmann molecular dynamics technique [12,13] that we have recently implemented to simulate biomolecules, and then we present the results of exemplicative simulations on study-cases selected to demonstrate the potentiality of the method.

## 2. Methods

### (a) Coupling lattice Boltzmann and molecular dynamics

According to our computational scheme, macromolecules and the aqueous solvent are described differently. We employ particles to represent proteins and advance them in time via the molecular dynamics method, fluid populations are used to represent the aqueous solvent and advance them in time according to the lattice Boltzmann method. The resulting lattice Boltzmann molecular dynamics (LBMD) methodology handles the interactions between solute and solvent particles through a lumped representation of the local collisions, and consequently hydrodynamic interactions are naturally included in the simulation. Given the fact that we represent the solvent molecules at the probabilistic level, the dual simulation technique can host not only hydrodynamics, but also a wide range of physical elements, including thermodynamic forces, electrokinetics, pH-dependent forces and so on, being a flexible platform for multiphysics. At the same time, the LBMD is intrinsically multiscale, because the resolution of both components, the fluid and the molecular system can be chosen to cover a wide spectrum of conditions.

In the LBMD scheme, we take into account the presence of water via a minimalistic, yet explicit, representation of the solvent, with a computational cost that, at fixed concentration, is roughly proportional to the number of proteins in solution. While the explicit treatment of the solvent appears to burden the overall technique when compared with the Brownian dynamics simulation approach, the main advantage of LBMD is that the solvent locally responds to the motion of the embedded proteins without the need for specifying the system Green’s function to account for the hydrodynamic forces. In our scheme, the dynamical perturbations and fluid vorticity travel over arbitrary space and timescales and exchanges momentum with other proteins present in the system.

Similar to the mescoscopic description of the aqueous solvent, handling the macromolecule at the nanometric level is done by employing a CG representation of amino acids. A few successful force fields are available today to treat proteins and combinations of macromolecular entities, such as the Martini or other models [14–17]. In the last decade, we have been developing the optimized potential for efficient protein structure prediction (OPEP) force field as a result of intensive modelling and validation studies [11,18–25]. The molecular model is able to reproduce accurately the secondary and tertiary folding of proteins, intermolecular aggregations, internal stability and motions and so forth. In particular, OPEP is an implicit solvent molecular model, that is thermodynamic forces stemming from water–protein interactions are automatically included in the CG hamiltonian and do not need to be handled separately as part of the explicit solvent representation. However, hydrodynamics still requires to be included in full.

The equations of motion for the explicit particles follow a conventional second-order dynamics [13,26,27]
2.1where is a conservative force describing the sum of molecular interactions as encoded in the OPEP force field, and *M*_{i} indicates particle mass. *μ*_{i} is a white noise representing thermal fluctuations stemming from the fluid molecules, having zero mean and variance 〈*μ*_{i}(*t*′)*μ*_{i}(*t*)〉=2*k*_{B}*T*/*M*_{i}*δ*(*t*−*t*′), with *k*_{B} is the Boltzmann constant and *T* is the temperature.

The fluid-to-particle momentum exchange is modelled by a drag force exerted between the *i*th particle having velocity *V*_{i} and the fluid at position *R*_{i}

The parameter *γ* accounts for the Stokes-like fluid–particle coupling strength. It is related to the particle diffusivity via Stokes–Einstein relation *D*=*k*_{B}*T*/*γ*. In the case of macromolecules, a simple procedure is to set *γ* by reproducing the macromolecular diffusion constant at infinite dilution, if available from experiments or *ad hoc* calculations.

The OPEP CG model represents each amino acid by six centres of force. Each side-chain is represented by a bead and the backbone uses an atomic resolution with N, HN, C_{α}, C and O atoms. Proline is an exception, with all its heavy atoms considered [18,19]. The implicit solvent OPEP energy function, which retains structural accuracy and chemical specificity, is defined as a sum of local, non-bonded and hydrogen bonding (H-bond) terms and all analytical expressions are given in references [11,20]. Notably, H-bonds between backbone atoms are modelled by two- and four-body potentials, rather than Coulomb interactions, and the interactions between ion pairs were derived from all-atom PMF calculations. OPEP coupled to various sampling methods has been successful tested on many non-amyloid proteins, recovering experimental structures and thermodynamics properties, [21–24,28] and protein/protein complexes [25]. Applied to amyloid proteins, OPEP simulations were the first to predict various topologies, such as the β-barrels, that were later identified experimentally [29,30]. While the original OPEP function is free of any biases and allows an integration timestep of 1.5–2.0 fs for dynamics simulations, in this work, we also investigated the addition of elastic network forces to the OPEP Hamiltonian in order to explore protein diffusion a long timescale, through a timestep of 10 fs.

The lattice Boltzmann method describes the solvent in terms of the probability distribution of finding a particle at lattice site *x* at time *t* and moving in lattice space with discrete velocity *c*_{p}, called fluid populations *f*_{p}(*x*,*t*). The populations are defined on a cubic mesh having lattice spacing Δ*x* and are advanced in time over a timestep Δ*t* via a Boltzmann-like kinetic equation. Its simplest and most popular version is the so-called BGK equation
2.2

Here, the populations *f*_{p}(*x*,*t*) evolve towards the equilibrium target *f*^{eq}_{p} with a rate given by the characteristic relaxation frequency *ω*. The Chapman–Enskog multiscale analysis guarantees that at space–timescales larger than the mesh related ones, the kinetic equation (2.2) reconstructs the Navier–Stokes equation of fluid mechanics, with the relaxation frequency being related to the fluid kinematic viscosity via , where is the lattice speed of sound. The elementary numerical method can be interpreted in terms of pseudo-particles moving on the mesh and with displacements that occur between lattice neighbours. By employing the D3Q19 scheme [31], the information exchange occurs to first and second lattice neighbours, that is 18 discrete non-null vectors plus a null one that mimics particles at rest. Given the populations, the fluid density *ρ* and velocity *u* are constructed as and . It is a simple task to show that the thermodynamics of the solvent corresponds to an ideal gas, the fluid local pressure being *p*=*ρk*_{B}*T*.

Finally, the term *f*^{eq}_{p} in equation (2.2) is expressed as a quasi-Maxwellian term, whose explicit form is
2.3where *w*_{p} are lattice-dependent weights.

The term *g*_{p}(*x*,*t*) in equation (2.2) accounts for the presence an external forces *F*^{ext} and for the presence of the solute-induced drag force *F*^{D}, summed collectively as *G*=(1/*m*)[*F*^{D}+*F*^{ext}]. To the lowest order, it can be shown that , whereas, in practice, a higher-order version is needed to ensure a global second-order accuracy of the lattice Boltzmann solver [32,33]. Without delving into the analytical details, the method is typically extended to account for local fluctuations at the level of the stress tensor, such that fluctuating hydrodynamics is recovered [34]. In addition, it can be shown that local mass and momentum of the global particle-fluid elements are preserved, a key condition to obtain the correct fluid dynamic behaviour. It should also be remarked that the model can be directly extended to account for thermodynamic forces.

### (b) Physical considerations

From the practical point of view, given the resolution of our molecular model (OPEP), a convenient choice is to take the timestep Δ*t*=1–2 fs and mesh spacing Δ*x*=1–3 Å. These values guarantee a good resolution of the molecular dynamics solver and a rather over-resolved choice for the fluid solver. Before leaving this section, we remark two key points. At first, being nearly incompressible the typical biological solution operates a virtually zero Mach number, with density fluctuations *δρ*/*ρ*∼*Ma*^{2}, whereas the LB solver has finite compressibility [32].

Reproducing *Ma*≃0 conditions in simulation implies choosing a much smaller timestep that is unusable in practical cases (while for the current choice of mesh and temporal resolution, the speed of sound is Δ*x*/Δ*t*∼10^{5} m s^{−1}, that is much higher than that of water in standard conditions). Thus, the simulated fluid is more compressible than the real cellular environment and a proper analysis of the consequences of compressibility must be undertaken in specific cases. This is not a major drawback to the overall approach, because the quantities of interest are usually weakly dependent on the simulated *Ma*, under general biological operating conditions. Second, choosing Δ*x*∼1 Å implies a numerical Knudsen number (identified as Δ*x*/*σ*, with *σ* the solvent interparticle separation) that is close to unity, a value that produces a kinetic, rather than the hydrodynamic behaviour at the mesh spacing length scale, leading hydrodynamic behaviour at larger scales. In the physical macromolecular solutions, departures from equilibrium are typically mild and the physical *Kn* varies in the range 0.001–0.5. It should be borne in mind that the simulated Reynolds number is chosen to match exactly the physical value and in the subnanometric/picosecond scale the Reynolds number is typically . The physical Mach number is typically . Given the choice of the simulated *Ma* number (in the 10^{−3} to 10^{−2} range), it is verified *a posteriori* that the scheme guarantees high levels of numerical stability and accuracy.

### (c) Computational efficiency

Efficiency is a crucial feature of our computational approach due to the fact that water is reduced to handling a handful of degrees of freedom per mesh node. Given a mesh voxel of ∼1 Å ^{3}, when compared with a water molecule occupying a volume of 27 Å ^{3}, the cost of handling the single mesh node is still much more convenient than handling water explicitly via molecular dynamics owing to the local nature of the lattice Boltzmann collisions. In fact, aggregating floating-point operations with data movements to and from memory, one has to perform about 100 floating point operations per timestep; in contrast, computing forces in molecular dynamics requires about 10 times more operations, even for a CG version of the solvent.

As our effort aims at extending the spatial and temporal scales that can be accessed by the computational scheme, it should be remarked that by making use of a CG model for the macromolecules and the continuum-based methodology for water creates an intrinsically uneven distribution of particles in the simulated system. In parallel computing terms, one wants to distribute the simulation burden across multiple processing units in order to reduce systematically the time-to-solution. However, handling a sparse system is not the best configuration to handle, because load balancing is difficult to achieve and be guaranteed in the general case. New algorithmic paradigms are currently under investigation to cure this problem.

In terms of parallel computing, the lattice Boltzmann technique allows exploiting highly scalable algorithms to distribute the simulation burden across multiple processing units. The end result is that massive parallelism dramatically speeds up the treatment of the solvent and reduces the computational load to computing mostly the CG interparticle forces, with a highly favourable net efficiency.

## 3. Results

It is instructive to look at representative examples of application of the LBMD method applied to non-trivial biological systems, in particular including multiple macromolecules in solution and complex interactions with the fluid solvent. These examples show the capabilities of the method in typical operating conditions, whereas much more non-conventional settings can be built to look at inhomogeneous or compartmentalized conditions. All simulations in this section were performed by using the MUlti-PHYsics (MUPHY) software package, developed over the years to enable the simulation of biological and non-biological composite systems [35].

### (a) Diffusion in crowded conditions

Diffusion of crowded protein solutions is a significant application to shed light on the possibility that high concentration produces a marked slowing down of diffusion as well as anomalous and complex behaviours, such as size-dependent diffusivity. Experiments have probed a reduction of protein mobility by about one order of magnitude when the fraction of occupied volume approaches the level of the *in vivo* cellular environment (about 30% in volume). This reduced mobility depends on the fold and size of the proteins as well as on the nature of the crowders [8,36–38]. It is debated whether solvent-mediated interactions play a key role on top of excluded volume interactions. It is also debated how crowding affects the nature of diffusivity, for example subdiffusive versus standard diffusive dynamics [5,39].

In a preliminary investigation reported in reference [11], the LBMD technique was tested to simulate a huge system composed of about 18 000 rat I proteins modelled using the OPEP force field. The excellent scalability of the method allowed allocation of the run on 18 000 GPU and reach a few tenths of nanoseconds. The analysis showed that increasing the crowding condition protein diffusion decreases as expected by experiments on a protein of similar size. However, in order to investigate the effect of crowding on protein mobility in a more systematic way and extending the timescale of the process simulated, we have performed now extensive simulations on a homogeneous solution of proteins at varying concentration. The protein investigated is the chymotrypsin inhibitor 2 (CI2), which has been extensively studied experimentally by Wang *et al*. [36]. We have built several systems placing randomly 70 proteins in periodic cubic boxes of different volume, 135 Å ≤*L*≤250 Å , therefore spanning crowding condition from 5% up to 32% of volume fraction. CI2 is composed by 65 amino acids, and the total number of pseudo-atoms in the systems was about 23 000. If transposed to the explicit solvent atomistic resolution, then the largest system considered would count about 1.5 million of particles. Interactions were based on the OPEP potential to account for interprotein forces and on an elastic network model to replace the intramolecular interactions with a set of springs that preserve the protein internal structure within a prescribed tolerance (details will be reported in a forthcoming publication). This choice allows us to use larger integration timesteps (10 fs) and thus sample the dynamics over hundreds of nanoseconds (500 ns). The diffusion coefficient was measured from the linear time component of the mean-square displacement at short times below 10 ns, with a block analysis over the last 200 ns).

Figure 2 illustrates the results of this set of LBMD simulations and are compared with the experimental results of Pielak obtained on the CI2 protein embedded in crowded solutions where the crowding was caused by a different type of protein (lysozyme, BSA, ovalbumin). Our numerical results show a strikingly quantitative agreement between simulations and experiments, as shown in figure 2. This proves that the inclusion of the elastic model and hydrodynamic interactions are sufficient ingredients to match the experimental data, and at the same time open the way to study the complex behaviour of the crowded environment. Interestingly, from direct inspection of the mean-square displacement, we observed subdiffusive behaviour on the subnanosecond timescale. At 32% the analysis shows that the dynamics scales as MSD ∼*t*^{0.67} and at 26% as MSD ∼*t*^{0.75}, becoming less marked at decreasing volume fraction.

### (b) Amyloids

As a second example, we deployed the methodology to study complex processes controlled by diffusion and solvent-mediated correlations as the aggregation of amyloid proteins into unsoluble well-organized fibrils. This process is the hallmark of several neurodegenerative diseases (Alzheimer, Parkinson and Huntington) [40,41] and involves many length and timescales: fibrils extend up to hundreds of nanoseconds, and the timescale of full growth exceeds hours *in vitro*.

The stability, the conformational landscape and the aggregation process of small oligomers—now known to be highly toxic species—can be accessed by molecular dynamics at the atomistic resolution in combination with enhanced sampling methods [42]. However, in order to explore the early phase of aggregation of a larger number of peptides, it is required to model the system at the CG level with implicit solvent [43].

The LBMD technique based on the OPEP CG force field [12] is an excellent compromise to study massive aggregation including naturally the solvent-mediated correlations. We have already shown [12] that hydrodynamic interactions speed up the association of short amyloid peptides similarly to what observed by others in Brownian dynamics simulations including hydrodynamic interactions of a simplified model of lipid [7].

We present here a synthesis of an extended simulation of a very large system composed of 1000 amyloid peptides A*β*_{16−22}. Initially, the peptides were distributed randomly in a simulation cubic box of size 300 Å. The simulation was performed, using the standard OPEP model (no elastic network) with a lattice resolution of 2 Å, timestep of 1.5 fs. The coupling parameter *γ* was set to 0.05 fs^{−1}, a value that allows to recover for the elementary species (monomer and trimer) the experimental diffusivity when the kinematic viscosity of the fluid is set to the value of water. This system, if transposed to all-atom resolution with explicit solvent, would count about 2.4 million particles. It is worth noting that so far the simulation of the aggregation processes of ‘realistic’ molecular models of amyloid peptides was limited to a much smaller number of peptides [44,45]. During the simulation the peptides aggregate with two characteristic times, a fast time describing the encounter of monomers (a few nanoseconds), and a slower one (tenths of nanoseconds), describing the further fusion of smaller size oligomers. The evolution of the aggregation process is represented in figure 3 where we report the probability distribution of the aggregate sizes for different blocks of the simulation. During the first 20 ns, the majority of the aggregates have a size *n*<10, and the largest cluster is formed by approximately 20 monomers. At longer times, the size of the oligomers increases as mirrored by the broadening of the distribution. The largest cluster reaches in a hundred of nanoseconds a final size of about 100 monomers, 10% of the total system, with 10% of β-sheets content. Interestingly, the largest cluster is organized as a branched structure. Lateral branching is described theoretically by a second nucleation mechanism [46], but while observed experimentally for some proteins [47,48], it has never been reported from molecular simulations. In a recent work [49] reporting on the effect of hydrodynamics on the aggregation process of Aβ_{16−22} peptides in systems of different sizes, we have demonstrated that hydrodynamic interactions not only speed up the aggregation with respect to standard Langevin dynamics, but also enhance the fluctuations of the sizes of the formed clusters along the aggregation. It was speculated that the peptides aggregates act on the solution as active particles [9], and the change of their conformations and sizes generate coherent fluid flows that further favour the fusion of small entities in larger complexes.

### (c) Proteins under shear

Another class of biophysical phenomena for which the proposed methodology will be relevant involves the response of proteins to fluid-induced mechanical stresses, tensile forces. In the vascular system catch-bonds proteins, such as FimH [50], granting cell adhesions are activated by shear flow. Similarly, vessel injuries, altering the blood flow, activate the response of the multimeric von Willebrand factor (VWF), which favours the anchor of platelets at the injured spots.

The VWF is a linear multimeric protein that can extend up to 15 μm. The chain is formed by dimers linked by disulfide bridges at the N-terminals. Each monomer is formed by a series of domains of specific binding functionalities, e.g. A1 binds platelet GPIb and A3 domain to subendothelial collagen exposed at sites of vessel injury. In the static condition, VWF chains are collapsed but extend under the effect of enhanced flow. This elongation comes along with the unfolding of the domain A2 considered the hydrodynamic sensor of the protein [51]. Shear-induced opening of A2 domain favours its cleavage by ADAMTS13, and probably enables binding capability of adjacent domains [52]. Understanding how the single A2 domain, and multiple interacting domains [53], respond to external forces is therefore crucial for understanding how mechanical stresses activate the VWF protein and its functionality.

To demonstrate the capability of our computational tool, we have performed LBMD simulations of the A2 domain subject to shear flow using the standard OPEP model, timestep 1.5 fs and lattice resolution of 3 Å. In this preliminary test, the shear rate used in the simulations is orders of magnitude higher than the physiological condition (10^{11} versus 10^{3} s^{−1}). This extreme value allowed us to monitor unfolding events at the nanosecond timescale.

The A2 domain under the effect of the elongational component of the shear flow starts to disassemble, see the pictorial representation in figure 4. The first unfolding step interests the peripheral regions of the domains highlighted by the red and black circles. This is caused by the disassembling of the protein hydrophobic core formed by six β strands (β_{1}β_{2}β_{3}β_{4}β_{5}β_{6}). In fact, the first unfolding interests β_{5}β_{6}, and at the other extreme, the strands β_{2}β_{3}. The new structure, much more elongated, breaks apart when the central part of the hydrophobic core represented by the strands β_{2}β_{1}β_{4} cracks. The unzipping of β_{2}β_{1} occurs as the last unfolding step, anticipated by β_{1}β_{4} separation. Interestingly, only when the central core unfolds, the end-to-end distance of the domain significantly increases, and reaches a maximal extension of about 8 nm. At the timescale considered, the chain is not completely elongated. In AFM pulling experiments, the mechanical force required to unfold the A2 domain is the range of 10–20 pN with a loading rate comprises in the range 0.1–10 *nN* s^{−1}. When considering, as local reaction coordinate, the hydrogen bonds formed between the β-strands structures, the solvent-induced drag component is just a fraction of pN. This seems to indicate that for a complex fold as the one of VFW A2, the deformation of the protein matrix owing to the elongational term of the shear flow occurs first and then propagates down, mechanically, to the elementary molecular bricks that stabilize the fold, i.e. the secondary structure HBs. It is clear that many independent simulations should be produced to determine the precise sequence of the unfolding events. In fact, at lower shear rate, the tumbling of the protein owing to the rotational component of the flow, could expose to deformation different fragments of the protein matrix and induce a different downhill cracking of the fold. As a next step, the LBMD technology based on the quasi-atomistic force field for proteins OPEP will be deployed to study the shear rate dependence of the A2 unfolding rate, and, following the pioneering work of Katz and co-workers [54], of domains separation.

## 4. Conclusion

In summary, using a multiscale simulation method that represents water and proteins at the mesoscopic level paves the way for studying massive macromolecular systems composed of thousands of proteins. Such a possibility has already been explored by using specialized computational infrastructure made of thousands of GPUs [55].

Importantly, the LBMD computational scheme offers an original approach to multiscale simulations. In fact, the multi-resolution lattice Boltzmann scheme is today available [56], as much as devising methods where the particle resolution is handled differently in different regions of the system [57–59] and, in essence, allowing to handle situations where both the fluid and particle components cross the scale boundaries.

## Data accessibility

The files containing the processed Protein Databank structures relative to CI2, amyloid peptides and the A2 domain of VWF and containing the OPEP force field parameters are publicly available upon request to the authors. The trajectories of the simulated systems are also made available to users upon request to the authors and for file size not exceeding 10 GB.

## Competing interests

We declare we have no competing interests.

## Funding

The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) grant agreement no. 258748. Part of his work was performed using HPC resources from GENCI (CINES and TGCC; grant no. x201576818).

## Acknowledgements

We acknowledge the financial support for infrastructures from ANR-11-LABX-0011-01. The authors thank G. Pielak for useful discussions.

## Footnotes

One contribution of 17 to a theme issue ‘Multiscale modelling at the physics–chemistry–biology interface’.

- Accepted August 3, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.