## Abstract

Theoretical advances are reported on the kinetics and thermodynamics of free and template-directed living copolymerizations. Until recently, the kinetic theory of these processes had only been established in the fully irreversible regime, in which only the attachment rates are considered. However, the entropy production is infinite in this regime and the approach to thermodynamic equilibrium cannot be investigated. For this purpose, the detachment rates should also be included. In spite of this complication, the kinetics can be exactly solved in the regimes of steady growth and depolymerization. In this way, analytical expressions are obtained for the mean growth velocity, the statistical properties of the copolymer sequences, as well as the thermodynamic entropy production. The results apply to DNA replication, transcription and translation, allowing us to understand important aspects of molecular evolution.

This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’.

## 1. Introduction

Copolymers are macromolecules composed of several species of monomeric units [1,2]. Their sequences may thus contain information, as is the case for DNA, which encodes genetic information in biological organisms. Sequences are determined during the synthesis of the copolymers. There exist different kinds of copolymerization, including general processes where copolymers of any length can attach together or break, and living copolymerization, which proceeds by the attachment and detachment of monomers at one or both ends of the macromolecular chains, for instance with a catalyst. Copolymerization can proceed freely or be directed by a template. Examples of free living copolymerization are the synthesis of copolymers composed of ethylene and 1-octene or norbornene using homogeneous catalysts [3,4], and lactide with caprolactone by ring-opening metathesis [5]. Examples of template-directed copolymerization are DNA replication, transcription and translation, which play a central role in biology [6].

In view of the importance of these processes in the natural sciences, efforts have been devoted to developing the kinetic theory and thermodynamics of living copolymerization. Remarkably, the thermodynamic entropy production of these reactions depends not only on the free energy of monomeric attachment and detachment, but also on disorder in copolymer sequences, which is characterized by quantities from Shannon information theory [7,8]. Furthermore, the kinetic equations of living copolymerization can be exactly solved in the regimes of steady growth or depolymerization [9–15]. Studying the thermodynamics requires including the detachment rates in addition to the attachment rates, which complicates the kinetic equations. Nevertheless, they can be solved, leading to analytical expressions for the mean growth velocity, the entropy production and the statistical properties of the synthesized copolymers. Computationally, these new theoretical methods can be 10^{5} to 10^{6} times faster than kinetic Monte Carlo simulations with Gillespie’s algorithm [16].

The purpose of this paper is to report on recent advances in this topic. Section 2 is devoted to free living copolymerization and §3 to template-directed living copolymerization. The case of DNA replication is developed and implications for molecular evolution are discussed in §3. Conclusions are drawn in §4.

## 2. Free living copolymerization

### (a) Kinetics

Free living copolymerization proceeds by the attachment and detachment of different monomers *m*_{j}∈{1,2,…,*M*} to macromolecular chains:

$$m_1 m_2\cdots m_{l-1} + m_l \;\underset{W_{-m_l,l}}{\overset{W_{+m_l,l}}{\rightleftharpoons}}\; m_1 m_2\cdots m_{l-1}m_l. \tag{2.1}$$

The process is supposed to go on without termination at one end of the chain. The solution is large enough so that the monomeric concentrations remain constant and the process can reach a regime of steady growth. Copolymerization can be promoted by a catalyst located at the end of the chain, as shown in figure 1*a*. Moreover, an external force *f* may be exerted on the catalyst. The control parameters are thus the monomeric concentrations and the possible external force *f*. If the solution is dilute, the copolymers do not interact with each other so that their concentrations are proportional to the probabilities that a single copolymer has the sequence *m*_{1}*m*_{2}⋯*m*_{l} at the time *t*

$$P_t(m_1 m_2\cdots m_l) = \frac{V}{N}\,[m_1 m_2\cdots m_l]_t \tag{2.2}$$

in a volume *V* containing *N* copolymers. The time evolution of these probabilities is ruled by the kinetic equations

$$\frac{d}{dt}P_t(m_1\cdots m_l) = W_{+m_l,l}\,P_t(m_1\cdots m_{l-1}) + \sum_{m_{l+1}=1}^{M} W_{-m_{l+1},l+1}\,P_t(m_1\cdots m_l m_{l+1}) - \Big(W_{-m_l,l} + \sum_{m_{l+1}=1}^{M} W_{+m_{l+1},l+1}\Big)P_t(m_1\cdots m_l), \tag{2.3}$$

where the first gain term describes the attachment of the monomer *m*_{l} to *m*_{1}⋯*m*_{l−1}, the second gain term the detachment of the monomeric unit *m*_{l+1} from *m*_{1}⋯*m*_{l−1}*m*_{l}*m*_{l+1}, and the loss terms the events occurring to the chain *m*_{1}⋯*m*_{l−1}*m*_{l} itself. The rates of attachment and detachment *W*_{±ml,l} depend in general on the monomer *m*_{l} that is attached or detached, possibly on the configuration of the macromolecule of length *l*, as well as on the external control parameters.
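The stochastic process behind these kinetic equations can be simulated directly, which is what the kinetic Monte Carlo approach mentioned in the Introduction does. The following is a minimal sketch of such a simulation, assuming the simplest situation in which the rates depend only on the species of the unit that is attached or detached. The function name `gillespie_copolymer` and the rate values are hypothetical choices made for illustration; they are not taken from the paper.

```python
import random

def gillespie_copolymer(w_plus, w_minus, t_max, seed=0):
    """Simulate one growing chain up to time t_max.

    w_plus[m]  : attachment rate of species m (illustrative assumption)
    w_minus[m] : detachment rate of a terminal unit of species m
    Returns the sequence of the chain as a list of species labels.
    """
    rng = random.Random(seed)
    species = list(w_plus)
    chain, t = [], 0.0
    while True:
        # Possible events: attach any species, or detach the last unit (if any).
        rates = [w_plus[m] for m in species]
        if chain:
            rates.append(w_minus[chain[-1]])
        total = sum(rates)
        t += rng.expovariate(total)      # exponential waiting time
        if t > t_max:
            break
        # Select the event with probability proportional to its rate.
        r, acc = rng.random() * total, 0.0
        for i, rate in enumerate(rates):
            acc += rate
            if r < acc:
                break
        if i < len(species):
            chain.append(species[i])     # attachment
        else:
            chain.pop()                  # detachment of the terminal unit
    return chain

# Hypothetical rates for two monomer species.
chain = gillespie_copolymer({1: 2.0, 2: 1.0}, {1: 0.5, 2: 0.5}, t_max=2000.0)
print(len(chain) / 2000.0, chain.count(1) / len(chain))
```

With these illustrative rates, the length per unit time and the fraction of species 1 fluctuate around the values expected in steady growth, and the statistical error decreases only slowly with the simulated time, which is why analytical solutions are valuable.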

In the regime of steady growth, the growing chain has sequences characterized by a stationary probability distribution *μ*_{l}(*m*_{1}⋯*m*_{l}) so that the probabilities (2.2) factorize as

$$P_t(m_1\cdots m_l) \simeq p_t(l)\,\mu_l(m_1\cdots m_l), \tag{2.4}$$

where *p*_{t}(*l*) is the probability that the copolymer has the length *l* at time *t* [8]. After a long enough time, the latter becomes a Gaussian distribution of mean length 〈*l*〉_{t}≃*vt* with a constant growth velocity *v* and a variance also increasing linearly in time [11].

If the rates depend on *k* monomeric units behind the last one, it is known that the growing copolymer is a *k*th-order Markov chain so that the sequence probability distribution itself factorizes as

$$\mu_l(m_1\cdots m_l) = \Bigg[\prod_{j=1}^{l-k}\mu(m_j|m_{j+1}\cdots m_{j+k})\Bigg]\,\mu(m_{l-k+1}\cdots m_l) \tag{2.5}$$

in terms of the conditional probabilities *μ*(*m*_{j}|*m*_{j+1}⋯*m*_{j+k}) of the Markov chain and the tip probabilities *μ*(*m*_{l−k+1}⋯*m*_{l}) [11,14,17].

Analytical methods have been developed to determine these probabilities as well as the growth velocity [11,14].

### (b) Thermodynamics

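In steady growth regimes, the entropy production of processes ruled by the kinetic equation (2.3) can generally be expressed as [8,9]

$$\frac{1}{k_{\rm B}}\frac{d_{\rm i}S}{dt} = A\,v \geq 0, \qquad A=\epsilon+D, \tag{2.6}$$

in terms of Boltzmann’s constant *k*_{B}, the growth velocity *v* and the affinity *A*, which is the sum of the free-energy driving force per monomeric unit, *ϵ* (2.7), and the Shannon disorder per monomeric unit in the growing sequences

$$D = \lim_{l\to\infty} -\frac{1}{l}\sum_{m_1\cdots m_l}\mu_l(m_1\cdots m_l)\,\ln\mu_l(m_1\cdots m_l) \geq 0. \tag{2.8}$$

The latter is positive for a copolymer and zero for a pure polymer.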

Because the entropy production (2.6) is always non-negative by the second law of thermodynamics, the affinity must be positive in a growth regime. Therefore, the growth is possible either in a favourable free-energy landscape if the free-energy driving force is positive, or in an adverse free-energy landscape if −*D*<*ϵ*≤0. In the latter case, the growth is driven by the entropic effect of sequence disorder [8]. At thermodynamic equilibrium, the growth velocity vanishes together with the affinity so that the free-energy driving force is equal to minus the sequence disorder: *ϵ*_{eq}=−*D*_{eq}.

### (c) The case of Bernoulli chains

If the rates only depend on the monomeric unit that is attached or detached, *W*_{±ml,l}=*w*_{±ml}, the process yields Bernoulli chains (corresponding to *k*=0) [9]. Accordingly, the probability distribution (2.5) factorizes into the probabilities *μ*(*m*_{j}) to find the monomeric unit *m*_{j} anywhere in the sequence. If the growth velocity is equal to *v*, the net incorporation rate of the monomeric unit *m* is given by its attachment rate *w*_{+m} minus the detachment rate *w*_{−m} multiplied by the probability to find the unit *m* at the end of the chain

$$v\,\mu(m) = w_{+m} - w_{-m}\,\mu(m). \tag{2.9}$$

Inverting this relationship, the probability is obtained as

$$\mu(m) = \frac{w_{+m}}{w_{-m}+v}. \tag{2.10}$$

Because this probability distribution should be normalized to unity, we find the following self-consistent equation for the growth velocity in terms of the attachment and detachment rates

$$\sum_{m=1}^{M}\frac{w_{+m}}{w_{-m}+v} = 1. \tag{2.11}$$
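The self-consistent condition for the growth velocity, namely that the probabilities *w*_{+m}/(*w*_{−m}+*v*) sum to unity, has a left-hand side that decreases monotonically with *v*, so it can be solved by bisection. The sketch below assumes strictly positive detachment rates and a growth regime (*v*≥0); the function names and the numerical rates are hypothetical and serve only as an illustration.

```python
def growth_velocity(w_plus, w_minus, tol=1e-12):
    """Solve sum_m w_plus[m] / (w_minus[m] + v) = 1 for v >= 0 by bisection.

    Assumes w_minus[m] > 0 for all m. Raises ValueError if the sum is already
    below 1 at v = 0, in which case there is no steady growth.
    """
    def excess(v):
        return sum(wp / (wm + v) for wp, wm in zip(w_plus, w_minus)) - 1.0

    if excess(0.0) < 0.0:
        raise ValueError("no steady growth: sum of w_plus/w_minus is below 1")
    lo, hi = 0.0, sum(w_plus)   # excess(hi) < 0 because every w_minus > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid            # sum still too large: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def composition(w_plus, w_minus, v):
    """Probability of each species in the chain, mu(m) = w_plus/(w_minus + v)."""
    return [wp / (wm + v) for wp, wm in zip(w_plus, w_minus)]

# Hypothetical rates for two species.
w_plus, w_minus = [2.0, 1.0], [0.5, 0.5]
v = growth_velocity(w_plus, w_minus)
mu = composition(w_plus, w_minus, v)
print(v, mu)
```

For these rates the detachment rates are equal, so the condition reduces to 3/(0.5+*v*)=1 and the result can be checked by hand: *v*=2.5 and *μ*=(2/3, 1/3).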

Figure 2 illustrates the copolymerization of two monomeric species if the kinetics obeys the law of mass action, according to which the attachment rates are proportional to the concentration of the attached monomer, *w*_{+m}=*k*_{+m}*c*_{m}; and the detachment rates are independent of the concentrations, *w*_{−m}=*k*_{−m}. In the concentration space, equilibrium happens on the line *k*_{+1}*c*_{1}/*k*_{−1}+*k*_{+2}*c*_{2}/*k*_{−2}=1, on which the growth velocity vanishes, as shown in figure 2*a*. The copolymer is growing for larger concentrations (figure 2*b*) and it undergoes depolymerization for smaller concentrations (figure 2*c*). In between, the growth velocity vanishes, but the copolymer length fluctuates because of random attachment and detachment events (figure 2*d*).
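The decomposition of the affinity into a free-energy driving force and a sequence disorder, discussed in §2*b*, can be made concrete for Bernoulli chains. The sketch below assumes that, in this case, the driving force is the composition-weighted average *ϵ*=Σ_{m}*μ*(*m*) ln(*w*_{+m}/*w*_{−m}) and the disorder is *D*=−Σ_{m}*μ*(*m*) ln *μ*(*m*). This expression for *ϵ* is an assumption made here for illustration, as are the function names and the rate values; the rates are chosen so that growth proceeds in an adverse free-energy landscape.

```python
import math

def steady_growth(w_plus, w_minus, tol=1e-13):
    """Return (v, mu) with sum_m w_plus/(w_minus + v) = 1, assuming v >= 0."""
    def excess(v):
        return sum(p / (m + v) for p, m in zip(w_plus, w_minus)) - 1.0
    lo, hi = 0.0, sum(w_plus)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)
    return v, [p / (m + v) for p, m in zip(w_plus, w_minus)]

def affinity_terms(w_plus, w_minus):
    """Return (v, eps, D) for a Bernoulli chain.

    eps is taken as sum_m mu(m) ln(w_plus/w_minus): an assumption of this sketch.
    D is the Shannon disorder per monomeric unit of the composition mu.
    """
    v, mu = steady_growth(w_plus, w_minus)
    eps = sum(q * math.log(p / m) for q, p, m in zip(mu, w_plus, w_minus))
    D = -sum(q * math.log(q) for q in mu)
    return v, eps, D

# Hypothetical rates: each species alone would depolymerize (w_plus < w_minus).
w_plus, w_minus = [0.7, 0.7], [1.0, 1.0]
v, eps, D = affinity_terms(w_plus, w_minus)
print(v, eps, D, v * (eps + D))
```

Here *ϵ*=ln 0.7 is negative while *D*=ln 2 is large enough for the affinity *ϵ*+*D*=ln 1.4 to be positive, so the chain grows with *v*=0.4 although each pure polymer would shrink.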

Figure 3 shows another example with rates depending on an external force *f* as

$$w_{\pm m} = w^{0}_{\pm m}\,\exp(\beta f\delta_{\pm m}), \tag{2.12}$$

where *β*=(*k*_{B}*T*)^{−1}, *δ*_{±1}=*δ*_{±2}=±*δ*/2, and the remaining parameters take the values of equation (2.13). As seen in figure 3*a*, the growth is stopped if a strong enough force is opposed, which defines the stall force: *βf*_{stall}*δ*≃−0.4055. The growth is driven by sequence disorder if −0.4055<*βfδ*≤0.09878 and by a positive free energy if 0.09878<*βfδ*. Figure 3*b* shows that the copolymer can grow against an opposed external force, while the corresponding pure polymers cannot because their velocity *v* becomes positive only at positive values of the rescaled external force *βfδ*. Accordingly, mechanical work is generated by the sequence disorder of the growing copolymer, but not by the pure polymers [13,18].
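The stall force can be located from the equilibrium condition that the ratios *w*_{+m}/*w*_{−m} sum to unity. The following sketch assumes an exponential force dependence *w*_{±m}=*w*^{0}_{±m} exp(*βfδ*_{±m}) with *δ*_{±m}=±*δ*/2, so that each ratio is multiplied by exp(*βfδ*). The zero-force rates used below are hypothetical: they are not the parameters of figure 3, and were only chosen so that the ratios sum to 1.5.

```python
import math

def force_dependent_rates(x, w0_plus, w0_minus):
    """Rates at rescaled force x = beta*f*delta (assumed exponential dependence)."""
    w_plus = [w * math.exp(+0.5 * x) for w in w0_plus]
    w_minus = [w * math.exp(-0.5 * x) for w in w0_minus]
    return w_plus, w_minus

def stall_force(w0_plus, w0_minus):
    """Rescaled force at which sum_m w_plus/w_minus = 1, i.e. zero velocity."""
    return -math.log(sum(p / m for p, m in zip(w0_plus, w0_minus)))

# Hypothetical zero-force rates (not taken from the paper).
w0_plus, w0_minus = [0.9, 0.6], [1.0, 1.0]

x_copolymer = stall_force(w0_plus, w0_minus)
x_pure = [stall_force([p], [m]) for p, m in zip(w0_plus, w0_minus)]
print(x_copolymer, x_pure)
```

With these values the copolymer stalls at *βfδ*=−ln 1.5≃−0.4055, whereas each pure polymer stalls at a positive rescaled force, which illustrates how sequence disorder allows growth against an opposing force.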

## 3. Template-directed living copolymerization

### (a) Kinetics

In the presence of a template (figure 1*b*), copolymerization can proceed by pairing of monomers with complementary monomeric units of the template *α*=*n*_{1}*n*_{2}⋯*n*_{l}*n*_{l+1}⋯ before the elongation of the copy *ω*=*m*_{1}*m*_{2}⋯*m*_{l}, as in the fundamental biological processes of DNA replication, transcription and translation, which are catalysed by polymerases or ribosomes [6]. Under such circumstances, the rates *W*_{±ml,l} also depend on some subsequence of the template around the length *l* of the copy. In general, there are *M* different species of units *m*_{l} and *n*_{l}.

Information-containing templates are aperiodic so that the catalyst moves in a disordered landscape because of sequence heterogeneity [19]. Contrary to a usual random walk in a random environment where disorder is quenched [20–25], the motion of the catalyst also depends on randomness in the incorporation of monomeric units *m*_{l} in the copy. Indeed, replication errors may occur, which leads to possible mutations.

In the simplest process, the rates only depend on the pairing *m*_{l}:*n*_{l} that happens at the location *l*: *W*_{±ml,l}=*W*_{±ml,nl}. In this case, the growing copy forms a Bernoulli chain, as discussed in §2, and the probability to find the unit *m*_{l} paired with *n*_{l} is given by [15]

$$\mu_l(m_l) = \frac{W_{+m_l,n_l}\,x_l}{(W_{-m_l,n_l}+x_l)\,x_{l-1}}, \tag{3.1}$$

where *x*_{l} is a local velocity that varies with the location *l*. Because the probability distribution (3.1) is normalized to unity, we find that the local velocities are determined by the following *iterated function system* (IFS)

$$x_{l-1} = \sum_{m_l=1}^{M}\frac{W_{+m_l,n_l}\,x_l}{W_{-m_l,n_l}+x_l}, \tag{3.2}$$

running backwards along the template *α* [15]. The mean growth velocity *v* is thus given by

$$v = \lim_{l\to\infty}\Bigg(\frac{1}{l}\sum_{j=1}^{l}\frac{1}{x_j}\Bigg)^{-1}. \tag{3.3}$$

Similar results hold for kinetic schemes depending on previously incorporated units *m*_{l−1}⋯*m*_{l−k} with *k*≥1 [15]. The IFS (3.2) allows us to compute the different properties 10^{5} to 10^{6} times faster than Monte Carlo simulations. Furthermore, it predicts new regimes arising from sequence heterogeneity. In particular, it is known [26] that the IFS (3.2) may generate different kinds of invariant sets in the space of the local velocity *x*: point-like, fractal or continuous. In this last case, there exists a probability density, which may become singular as *p*(*x*)∼*x*^{γ−1} at *x*=0 with 0<*γ*<1. This implies that the random drift of the catalyst is anomalous with a vanishing mean growth velocity *v*=0, but a mean length growing sub-linearly in time as 〈*l*〉_{t}∼*t*^{γ}. Thermodynamic equilibrium happens if the growth stops, i.e. if *γ*=0. In the space of control parameters, there is thus a plain domain between the locus of equilibrium and the domain where the growth velocity is positive [15]. This plain domain where the growth velocity is zero does not exist if the rates can only discriminate between correct and incorrect pairing, in which case the effects of sequence heterogeneity disappear, the process is equivalent to free copolymerization, and the invariant set becomes point-like.
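The backward iteration of the local velocities can be sketched in a few lines. The code below assumes that the IFS takes the form *x*_{l−1}=Σ_{m}*W*_{+m,nl}*x*_{l}/(*W*_{−m,nl}+*x*_{l}) and that the mean velocity is the harmonic mean of the local velocities, which reduces to the Bernoulli result of §2*c* when the template is homogeneous. The function names, the rate tables and the random template are hypothetical illustrations.

```python
import random

def ifs_velocities(template, W_plus, W_minus, x_end=1.0):
    """Iterate the local velocities backwards along the template.

    template   : list of template units n_1 ... n_L
    W_plus[n]  : list of attachment rates of each copy species on template unit n
    W_minus[n] : list of the corresponding detachment rates
    Returns [x_0, ..., x_L] with x_L = x_end (arbitrary positive starting value).
    """
    L = len(template)
    x = [0.0] * (L + 1)
    x[L] = x_end
    for l in range(L, 0, -1):
        n = template[l - 1]              # template unit at location l
        x[l - 1] = sum(wp * x[l] / (wm + x[l])
                       for wp, wm in zip(W_plus[n], W_minus[n]))
    return x

def mean_velocity(x, discard=200):
    """Harmonic mean of x_1 ... x_{L-discard}; the discarded end depends on x_end."""
    xs = x[1:len(x) - discard]
    return len(xs) / sum(1.0 / xi for xi in xs)

# Homogeneous template: the IFS converges to the Bernoulli growth velocity.
v_hom = mean_velocity(ifs_velocities([0] * 1000, {0: [2.0, 1.0]}, {0: [0.5, 0.5]}))

# Heterogeneous template with two kinds of template units (hypothetical rates).
rng = random.Random(1)
template = [rng.randrange(2) for _ in range(20000)]
W_plus = {0: [2.0, 1.0], 1: [1.0, 0.2]}
W_minus = {0: [0.5, 0.5], 1: [0.5, 0.5]}
v_het = mean_velocity(ifs_velocities(template, W_plus, W_minus))
print(v_hom, v_het)
```

The cost is a single pass over the template, with no stochastic sampling, which is consistent with the large speed-up over Monte Carlo simulations quoted in the text. For the illustrative rates above, the two template units alone would give velocities 2.5 and 0.7, and the heterogeneous velocity lies between them.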

### (b) Thermodynamics

For template-directed living copolymerization, the entropy production is also given by equation (2.6), but the disorder *D* is now the conditional Shannon disorder of the copy *ω* with respect to the template *α* [8]. By a standard formula of information theory [27], it can be expressed as

$$D(\omega|\alpha) = D(\omega) - I(\omega,\alpha) \tag{3.4}$$

in terms of the overall disorder *D*(*ω*) of the copy and the mutual information *I*(*ω*,*α*) between the copy and the template. The latter quantity characterizes the fidelity of the copying process. As for free copolymerization, the growth of the copy can be driven either by free energy if *ϵ*>0, or by the entropic effect of the sequence disorder if −*D*<*ϵ*≤0 [8]. This result establishes a fundamental relationship between thermodynamics and molecular information.
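The relation between the conditional disorder, the overall disorder of the copy and the mutual information can be checked numerically on a per-unit basis. The sketch below assumes independent sites, so that all quantities are computed from a single joint distribution of template and copy units; the function name and the example distribution (four species, uniform template, error probability 0.01 spread evenly over the three incorrect pairings) are hypothetical.

```python
import math

def disorders(joint):
    """joint[n][m] = probability that template unit n is paired with copy unit m.

    Returns (D_copy, I, D_cond): the disorder of the copy, the mutual
    information, and the conditional disorder of the copy given the template,
    all per monomeric unit and in natural units (nats).
    """
    p_n = [sum(row) for row in joint]            # template marginal
    p_m = [sum(col) for col in zip(*joint)]      # copy marginal
    D_copy = -sum(p * math.log(p) for p in p_m if p > 0)
    D_cond = -sum(p * math.log(p / p_n[n])
                  for n, row in enumerate(joint) for p in row if p > 0)
    I = sum(p * math.log(p / (p_n[n] * p_m[m]))
            for n, row in enumerate(joint) for m, p in enumerate(row) if p > 0)
    return D_copy, I, D_cond

eta = 0.01  # hypothetical error probability
joint = [[0.25 * (1 - eta) if m == n else 0.25 * eta / 3 for m in range(4)]
         for n in range(4)]
D_copy, I, D_cond = disorders(joint)
print(D_copy, I, D_cond)
```

In this example the copy is as disordered as a random sequence, *D*(*ω*)=ln 4, but almost all of that disorder is shared with the template, so the conditional disorder that enters the entropy production is small.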

### (c) DNA replication

These methods apply to DNA replication, for which the rates *W*_{±ml,l} have Michaelis–Menten dependences on the nucleotide concentrations. A key issue is to determine the probability of errors in replication (i.e. the probability of finding base pairs that are not of Watson–Crick type) and to understand how this probability depends on the kinetics of DNA polymerases and on the nucleotide concentrations. If errors are rare and uncorrelated, the error probability *η* determines the conditional Shannon disorder by *D*(*ω*|*α*)≃*η* ln(3e/*η*), assuming that the three incorrect nucleotides are equally probable, and thus the thermodynamic entropy production (2.6) [12].
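The link between the error probability and the conditional disorder can be illustrated with a short calculation. The sketch assumes that errors are uncorrelated and that the three incorrect nucleotides are equally probable, in which case the disorder per site is −(1−*η*) ln(1−*η*)−*η* ln(*η*/3), which reduces to *η* ln(3e/*η*) for *η*≪1. The function names are hypothetical.

```python
import math

def exact_disorder(eta, wrong=3):
    """Per-site conditional disorder for error probability eta, assuming the
    `wrong` incorrect units are equally probable and sites are independent."""
    return -(1.0 - eta) * math.log(1.0 - eta) - eta * math.log(eta / wrong)

def approx_disorder(eta, wrong=3):
    """Leading-order expression eta*ln(wrong*e/eta), valid for eta << 1."""
    return eta * math.log(wrong * math.e / eta)

for eta in (1e-3, 1e-4, 1e-6):
    print(eta, exact_disorder(eta), approx_disorder(eta))
```

The two expressions differ by a term of order *η*^{2}, so the approximation is accurate over the whole range of error probabilities relevant to polymerases.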

The kinetic parameters of the 16 possible pairings have been experimentally measured for several DNA polymerases, including the human mitochondrial one [28]. This allows us to set up kinetic models, which can be solved with the aforementioned methods [29,30]. Typically, exonuclease-deficient polymerases have an error probability in the range 10^{−6}– [29]. With exonuclease, DNA polymerases have a proofreading activity, reducing the error probability to the range –10^{−5} [30].

The error probability is enhanced if the nucleotide pool is imbalanced, as shown in figure 4 for the human mitochondrial polymerase. The error probability takes its lowest value, , if the four nucleotides have concentrations of the same magnitude, as under physiological conditions [31]. If there is some nucleotide imbalance, the error probability increases while the growth velocity decreases. These errors cause DNA damage, which may result in mutations if transmitted to offspring.

### (d) Molecular evolution

The kinetics of DNA polymerases plays a pivotal role in understanding mutation rates and molecular evolution. Mutation rates have been measured for different biological systems as a function of their genome size [32]. Viroids have mutation rates in the range just below the equilibrium limit, as if non-equilibrium constraints could remain minimal for the replication of such genomes. RNA viruses are in the range , which is consistent with the aforementioned values of the error probability for exonuclease-deficient DNA polymerases. The mutation rates drop to for DNA viruses in agreement with the fact that their DNA polymerases have acquired exonuclease proofreading. For bacteria and eukaryotes, the mutation rates are in the range , which is possible thanks to further proofreading with DNA mismatch repair [33–35]. In this regard, the general trends of mutation rates during evolution can be understood on the basis of biochemical kinetics. Furthermore, the genome size is observed to be inversely proportional to the mutation rate in accordance with the Eigen–Schuster theory [36–38].

Besides, the evolution of DNA sequences under repeated replications can be simulated using kinetic models of DNA polymerases. As seen in figure 5, the base pairs A:T become more frequent than C:G from replication to replication in agreement with the higher probability to find A or T in different genomes [39,40]. The reason is that the average physiological concentration [dATP] + [dTTP] is larger than [dCTP] + [dGTP] [31]. If an error occurs, the substitution by A or T is thus more frequent than by C or G. Accordingly, the nucleotide probabilities drift towards their stationary values after a number of replications that is inversely proportional to the error probability, *η*^{−1}≃10^{4}, as observed in figure 5. The time scale of this drift is thus given by

$$t_{\rm drift} \sim \frac{t_{\rm replication}}{\eta} \tag{3.5}$$

in terms of the error probability *η* and the replication time scale *t*_{replication}.
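The origin of this time scale can be seen in a toy model, which is not the kinetic model of DNA polymerases used for figure 5. It assumes that, at each replication, every site is substituted with probability *η*, and that a substituted site ends up as A or T with a fixed probability `bias`. The function name and all numerical values are hypothetical.

```python
import math

def at_fraction_history(p0, eta, bias, n_replications):
    """Fraction of A/T sites after each replication in the toy model:
    a site is kept with probability 1 - eta, otherwise it is replaced and
    becomes A or T with probability `bias`."""
    p = p0
    history = [p]
    for _ in range(n_replications):
        p = (1.0 - eta) * p + eta * bias
        history.append(p)
    return history

eta, bias = 1e-4, 0.6          # hypothetical error probability and bias
history = at_fraction_history(p0=0.5, eta=eta, bias=bias, n_replications=10_000)

# Remaining distance to the stationary value after 1/eta replications,
# relative to the initial distance.
gap_ratio = (history[-1] - bias) / (history[0] - bias)
print(gap_ratio, math.exp(-1))
```

In this model the distance to the stationary value shrinks by a factor (1−*η*) per replication, so it is reduced by about 1/e after *η*^{−1} replications, in line with the scaling of the drift time with the replication time divided by the error probability.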

## 4. Conclusion

This paper reports recent theoretical advances on the kinetics and thermodynamics of living copolymerization and their implications, in particular for understanding molecular evolution in biology.

Remarkably, the kinetics of free living copolymerization can be solved in the regimes of steady growth even if the detachment rates are not vanishing, extending the classic work by Mayo & Lewis [41] and Alfrey & Goldfinger [42]. If the rates depend on previously incorporated monomeric units, the growing copolymers form Markov chains instead of Bernoulli chains. It turns out that the thermodynamic entropy production depends on the Shannon disorder of the growing copolymer, which predicts the possibility of growth driven by the entropic effect of sequence disorder. The kinetics and thermodynamics of depolymerization have also been investigated.

These results can be generalized to template-directed living copolymerization. In the presence of sequence heterogeneity, the kinetics is reminiscent of random drifts in disordered media with the additional feature that the copy disorder may differ from the template disorder. Consequently, the process may become anomalous with the sub-linear growth in time of the copy. These methods apply to DNA replication, transcription and translation, allowing us to study proofreading as well as important aspects of molecular evolution on the basis of physico-chemical principles.

Besides template-directed copolymerization, sequence-defined copolymers can also be synthesized by stepwise coupling of monomeric units on a solid support [43–45]. Such processes are not autonomous because their rates are time-dependent and controlled with external conditions. Understanding the kinetics and thermodynamics of these iterative synthesis methods is also of great importance for future developments.

## Competing interests

I declare that I have no competing interests.

## Funding

This research is financially supported by the Université Libre de Bruxelles, the FNRS-F.R.S., and the Belgian Federal Government under the Interuniversity Attraction Pole project No. P7/18 ‘DYGEST’.

## Acknowledgements

The author thanks David Andrieux and Yves Geerts for fruitful discussions.

## Footnotes

One contribution of 17 to a theme issue ‘Multiscale modelling at the physics–chemistry–biology interface’.

- Accepted August 1, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.