## Abstract

Recently, analytical techniques have been developed for detecting single-nucleotide polymorphisms in DNA sequences. Improvements of the sequence identification techniques has attracted much attention in several fields. However, there are many things that have not been clarified about DNA. In the present study, we have developed a coarse-graining DNA model with single-nucleotide resolution, in which potential functions for hydrogen bonds and the *π*-stack effect are taken into account. Using Langevin-dynamics simulations, several characteristics of the coarse-grained DNA have been clarified. The validity of the present model has been confirmed, compared with other experimental and computational results. In particular, the melting temperature and persistence length are in good agreement with the experimental results for a wide range of salt concentrations.

## 1. Introduction

Rapid development of DNA sequencing techniques make it possible to find single-nucleotide polymorphisms (SNPs) that cause serious diseases such as cancer (Miki *et al.* 1994; Risch *et al.* 2001; Velasco *et al.* 2007). For example, denaturing gradient gel electrophoresis (DGGE) is known to be one of the major techniques that can detect SNPs in several hundred base alignments (Myers *et al.* 1985; Sheffield *et al.* 1989; Abrams *et al.* 1995; Michikawa *et al.* 1997). DNA fragments that have SNPs in the sequences show different electrophoretic flows caused by different interactions between dissociated base pairs and the surrounding gel media (Ganguly *et al.* 1993). However, the detailed mechanism of mobility reduction is not yet understood. There are some difficulties in clarifying the nature because the target, about 50 nm in length, is very small for experimental observation, but very large for all atom molecular simulations.

In order to overcome these difficulties, several coarse-graining methods have been developed. DNA models such as the bead-spring model (Deutsch 1988; Chen *et al.* 2005; Streek *et al.* 2005; Nagahiro *et al.* 2007; Trahan & Doyle 2009), the worm-like chain (WLC) model (Jian & Vologodskii 1997; Wang & Gao 2005) and the double-strand model (Cocco & Monasson 1999; Lyubartsev & Laaksonen 1999; Drukker *et al.* 2001; Tepper & Voth 2005; Knotts *et al.* 2007; Mielke *et al.* 2008) are well known. In the bead-spring model, the coarse-grained particles are about 100 nm in diameter, and the particles are connected with springs. Hydrodynamic interactions between particles can be treated effectively, although bending elasticity of DNA is not considered. Bead-spring models are available for the case that the contour length *L* is larger than the persistence length *P*. It is known that flow in very narrow or confined channels is effectively simulated using the bead-spring models (Chen *et al.* 2005). On the other hand, the WLC model is known to be effective when *L* is nearly equal to *P* (Jian & Vologodskii 1997; Wang & Gao 2005). A potential function for bond angles is considered based on the bending elasticity.

In order to simulate further characteristics of DNA, such as melting, hybridization or loop structures, the double-stranded structure of DNA is required to be constructed in the coarse-graining procedure. These characteristics are caused by making or breaking hydrogen bonds between base molecules. In particular, differences between adenine-thymine (AT) and guanine-cytosine (GC) base pairs (bps) are the nature of the phenomena. Several kinds of the double-strand model have been developed for each purpose. For example, Drukker *et al.* (2001) investigated the melting property of DNA using a model in which a single nucleotide was replaced by two particles. The coarse-grained particles, which was 1 nm in diameter, represented parts of a base and backbone. They succeeded in realizing the melting phenomenon of AT and GC fragments. Knotts *et al.* (2007) replaced a single nucleotide with three particles that represented a phosphate group, sugar and a base. In addition, negative charges on phosphate groups were considered and the distribution of counter cations was treated using the Debye–Hückel approximation. Using the model of 60 bps, they found that not only were loop structures observed in AT-rich parts, but also the hybridization phenomenon. Mielke *et al.* (2008) constructed a model in which three nucleotides were replaced by a particle. Their model succeeded in realizing the persistence length of double-stranded chains and clarified that counter cations around DNA fragments obeyed the Poisson–Boltzmann distribution. However, their model did not have single-nucleotide resolution.

In the present study, we develop a coarse-graining model focusing on the dynamical aspects of double-stranded B–DNA. One of our interests is to clarify the detecting process of different DNA sequences in electrophoretic flow. Therefore, our goal is to make this model available for fluid-dynamics simulations of DNA, which are influenced by viscosity, salt concentrations and external electric fields. The simulation techniques have not been developed sufficiently, although each property of DNA was realized by coarse-graining methods from previous work. In our model, a single nucleotide is replaced by a coarse-grained particle. Several kinds of potential functions are adopted for interactions between the particles. We define hydrogen bonds in base pairs by Morse-type functions and particularly adopt weak harmonic oscillators for the *π*-stack effect in the base-pair stacks. Using the present model, the melting temperature and persistence length of DNA sequences are evaluated; these are variables for measuring the bending characteristics. The relation between persistence length and salt concentrations is discussed and compared with experimental data. The scalability of the model from 10 to 155 bps is demonstrated.

## 2. Development of the coarse-graining DNA model

### (a) Potential energy functions in coarse-grained DNA

An SNP is observed within about 1000 bps. Single-base-pair differences can be found clearly as a result of electrophoresis. However, the behaviour of DNA fragments in the gel solution has not been understood sufficiently. In the present study, we develop a coarse-graining model of DNA that has single-nucleotide resolution. A schematic illustration of our model is presented in figure 1. Figure 1*a* shows an image of the all-atom model of double-stranded DNA and (*b*) represents its coarse-grained model. The coarse-grained particles are located at positions of each P atom in the backbones. Interactions between the particles are represented by several potential functions,
2.1
Here, bond lengths and bond angles are assumed to be oscillated around the equilibrium structures,
2.2
and
2.3
These potentials can be well applied to many models in which the interactions can be assumed to be harmonic oscillations. Here, *k*_{b} and *k*_{θ} are force constants in the oscillators and *d*_{eq} and *θ*_{eq} are bond length and bond angle in the equilibrium structures, respectively. The *V* ^{bond} and *V* ^{angle} depend on chemical bonds and molecular sturctures, which are dominated by short-range interactions. They are defined between the nearest-neighbour particles that represent single nucleotides. On the other hand, the long-range interactions are represented by *V* ^{elec}. Moreover, hydrogen bonds between base molecules are known to be written by Morse-type functions as follows:
2.4
A binding energy of hydrogen bonds given by *D* in equation (2.4) is different between each complementary base pair. Hydrogen bonds in a GC pair are considered to be 1.5 times stronger than those in an AT pair because of the difference in the number of hydrogen bonds. The constant *l*_{eq} is an equilibrium distance between base pairs and *α* is a parameter depending on the force constants. Furthermore, the potential function caused by *π*–*π* interactions in base-pair stacks is defined as follows:
2.5
The interaction between base-pair stacks is assumed to be that of harmonic oscillations. The constants *k*_{s} and *h*_{eq} are the force constant and the equilibrium distance between the base-pair stacks, respectively. The *h*_{eq} is estimated to be 3.40 Å and referred to as the double-stranded structure of B–DNA. The force constant of *k*_{s} is represented by *k*_{s}=*k*_{B}*T*/*γ*^{2} and the averaged thermal fluctuation is assumed to be *γ*=0.10*h*_{eq}. This stacking potential plays an important role in the double-stranded structures of DNA. Furthermore, negative charges on the phosphate groups are taken into account. It is known that the electrophoresis of DNA fragments is caused by the response of these negative charges to external electric fields. The electrostatic interaction is represented by means of the Debye–Hückel approximation,
2.6
Charges of *q*_{i} and *q*_{j} are located at the distance *r*_{ij}. *N* is the number of charged particles. The constants *ε*_{0} and *ε*_{k} are the permittivity of the vacuum and the relative permittivity of the aqueous solution, respectively. The factor *λ*_{D} is the Debye length. *N*_{A}, *k*_{B} and *T* are Avogadro’s number, the Boltzmann constant and temperature, respectively. *I* is the ionic strength of the salt. In the present study, negative charges on each particle are fixed at −1 in the elementary charge.

The characteristics of the potential functions are verified, compared with the melting temperature of DNA fragments. In order to recognize the result, parameters in each potential function are determined as below. The force constant of a chemical bond is defined as *k*_{b}=*k*_{B}*T*/*δ*^{2}. The *δ* represents an average of the thermal fluctuation and the value is set to be 0.01 *d*_{eq}. The bending elasticity of the backbone is represented by the angle potential of equation (2.3). The force constant of the angle potential is defined as *k*_{θ}=*k*_{B}*TP*_{ss}/*d*_{eq} (Hagerman & Zimm 1981). Here, *P*_{ss} is the persistence length of a single-stranded DNA. The value is assumed to be 4.0 nm (Tinland *et al.* 1997). The equilibrium angle of *θ*_{eq} is set to be 180.0° based on an assumption of the isotropic nature of the bending elasticity. Figure 2 shows the angles between the nearest neighbours in the optimized base-pair stacks. The angles are estimated from the inner product of the base-pair axes between the nearest neighbours. The plots show the averages of 10 bp fragments. According to *γ*, the stacking angle settles at different values. When *γ* is 0.10*h*_{eq}, the averaged angle between the stacking pair converges to 35.4°. The Morse potentials for hydrogen bonds are determined reasonably in order to realize the melting temperature. In equation (2.4), the binding energies of hydrogen bonds *D*_{GC} and *D*_{AT} are estimated as 28.8 and 19.2 kJ mol^{−1}, respectively. Moreover, the parameter *α* in equation (2.4) influences the force constant defined as 2*α*^{2}*D*. The force constant is a key factor to determine the melting temperature of the coarse-graining model. The constants, *α*_{GC} and *α*_{AT}, are set to be 4.5 Å and 4.0 Å, respectively. Figure 3 shows the computational results of the melting temperature for 10 bp poly(dG)·poly(dC) and 10 bp poly(dA)·poly(dT). These plots were obtained from the averages of 200 ns samplings. The validity of the values are confirmed to be in agreement with the results of the nearest-neighbour method (Santalucia *et al.* 1996). The melting temperature is determined from half-cleaved structures of the base-pair fragments. A base-pair distance longer than *l*_{eq}+6/*α* is considered to be cleaved according to the consideration of Cocco & Monasson (1999). The temperatures are estimated as 338 K and 304 K for poly(dG)·poly(dC) and poly(dA)·poly(dT), respectively. On the other hand, the nearest-neighbour method estimates the melting temperatures as 330 K and 295 K for poly(dG)·poly(dC) and poly(dA)·poly(dT), respectively. The present results are in good agreement with the results from the nearest-neighbour method within the errors of 2.4 per cent for poly(dG)·poly(dC) and 3.0 per cent for poly(dA)·poly(dT). In the present study, using the potential functions and parameters described above, the melting temperatures and bending properties of the coarse-grained DNA are computed and compared with other theoretical and experimental results.

### (b) Langevin-dynamics simulation

Langevin-dynamics computations are carried out for DNA fragments in an aqueous solution. The equation of motion of the *i*th particle that moves with velocity **v**_{i} is written as
2.7
where *m*_{i} is the mass, is the viscous drag and is the random force caused by thermal fluctuations. The inertial motion of the coarse-grained particles is so small when compared with the other terms that it can be ignored. The random force is assumed to be generated with the following conditions:
2.8
and
2.9
The viscous drag is defined as according to Stokes’s law and the friction coefficient *ζ* can be replaced by 6*πηa*, with viscosity of water *η* and radius of the particle *a*. In the present computation, the radius of a single nucleotide is represented by *a*. The stochastic Runge–Kutta procedure (Honeycutt 1992) is employed to integrate the Langevin equation. A time step of 10 fs is chosen for the numerical integrations.

## 3. Results and discussion

### (a) Validation of coarse-graining DNA model

Taking the thermal fluctuations into account, Langevin-dynamics simulations have been carried out to analyse the dynamical properties of DNA fragments. According to the topological symmetry of B–DNA, El Hassan & Calladine (1995) suggested several indices to characterize molecular condition. In their model, a set of two base pairs was divided into six degrees of freedom, such as tilt, roll, twist, slide, shift and rise. Figure 4*a* shows a schematic of six degrees of freedom. Base molecules represented by plates are connected rigidly with another complementary pair. Three axes are defined for translations of the base pair and the other three are for rotations. In our model shown in figure 4*b*, however, two nucleotides replaced by particles are connected with a line, and only four degrees of freedom, such as twist, slide, shift and rise, can be compared with the model of figure 4*a*. The simulations for a 155 bp sequence were carried out for 100 ns at the temperature of 300 K. Table 1 shows each parameter resulting from Langevin-dynamics analysis for several salt concentrations. The time-averaged rotational angle 〈twist〉 increases as a function of the salt concentration. This is because increase of the salt concentration causes screening of negative charges on the backbone, the electrostatic repulsion between the particles is weakened and the force from the stacking potential turns to be effective to twist the double strand. On the other hand, 〈rise〉 decreases and the molecular contour length is shortened as 〈twist〉 increases. In addition, 〈slide〉 and 〈shift〉 are not influenced more by the screening effect. The results are compared with other molecular dynamics computations (Šponer & Lankaš 2006) and crystal-structure analyses (Olson *et al.* 1998). Our results show good agreement with those theoretical and experimental analyses. Furthermore, our averaged parameters show the dependency on salt concentrations. Therefore, the coarse-grained model is expected to realize dynamics of DNA fragments in some kinds of aqueous solution.

### (b) Melting temperature of DNA fragments

In order to discuss the melting properties of our model, the melting temperature was calculated using the Langevin-dynamics simulation. Several B–DNA sequences presented in table 2 were prepared for comparison with experimental results (Zeng *et al.* 2004). As a result of 100 ns samplings in each single trial, ratios of hydrogen-bond breaking were estimated as shown in figure 5. In the present case, the ionic strength of the solvent was set to be 50 mM. The dissociation of a complementary base pair was judged from the bond length of *l*_{eq}+6/*α* (Cocco & Monasson 1999), where *l*_{eq} and *α* are parameters of equation (2.4). The melting temperatures, which were determined from the half-cleaved point, are summarized in table 3. The half-cleaved points were determined from the point at the intersection with 0.5 using linear interpolation. In the present model, hydrogen-bond breaking is mainly caused by the *V* ^{bp} adjusted to the melting properties of 10 bp poly(dG)·poly(dC) and poly(dA)·poly(dT). Results from our simulations are usually in agreement with the experimental data (Zeng *et al.* 2004), particularly for the cases of L19AS2, L33B9, L42B18 and L48AS. The errors from these models are as small as the cases of 10 bp poly(dG)·poly(dC) and poly(dA)·poly(dT). L19AS2 and L48AS sequences have many AT pairs near the edge. Therefore, base-pair cleavages occurred and expanded from the edge as the temperature increased. On the other hand, in L33B9 and L42B18 sequences, GC pairs are concentrated at both ends. These alignments seem to make the structures toughen and retard their melting. It is found that the adjustment for small DNA fragments, such as 10 bp poly(dG)·poly(dC) and poly(dA)·poly(dT), is available for larger models. This coarse-graining technique has scalability about the size and can realize melting characteristics of double-stranded DNA. Meanwhile, the disagreement is remarkable in the case of L60B36, although the error is at most 7.1 per cent. The melting property is known to be enhanced by the expansion of cleavages at the ends or middle parts of a fragment (Barbi *et al.* 1999; Hwa *et al.* 2003; Kalosakas *et al.* 2004; Ares *et al.* 2005). In the present computations, the dissociation of complementary base pairs tended to expand from the edges at temperatures higher than 350 K and from the middle parts at less than 340 K. Because of these different melting characteristics, the melting temperature of L60B36 was overestimated, compared with the experimental result (Zeng *et al.* 2004). In the experiment, the L60B36 sequence is considered to have loop structures at the middle parts. Therefore, the melting temperature is expected to be lower than that of the completely closed structure. The present result suggests that loop structures formed at the middle parts of a sequence enhance the melting at lower temperature than cleavages from the ends of GC clamps.

### (c) Persistence length

The persistence length of DNA fragments is analysed quantitatively in order to discuss the bending properties. The persistence length is mainly characterized by bending elasticity and Coulomb interaction between the coarse-grained particles. Baumann *et al.* (1997) calculated the persistence length by fitting their tension test for DNA fragments to the WLC model. They reported that their results agree well with computational results based on Poisson–Boltzmann theory. The persistence length of our DNA model is analysed using an approximation by Hagerman & Zimm (1981),
3.1
Here, *R*_{a}=*τ*_{a}/*τ*_{B} denotes a proportion of the approximated rotational relaxation time *τ*_{a} to Broersma’s rotational relaxation time, *τ*_{B}. The variable *X*=*L*/*P* is defined as a proportion of the contour length *L* to the persistence length *P* that should be determined. Equation (3.1) is known to be effective for the condition *L*/*P*≥0.5 (Hagerman & Zimm 1981). The approximated rotational relaxation time *τ*_{a} is defined as
3.2
where 〈*D*_{x}〉 and 〈*D*_{y}〉 are time averages of the rotational diffusion coefficient for the *x*- and *y*-directions, respectively. Broersma’s relaxation time is calculated according to the relation 1/*τ*_{B}=6*D*_{B}. The rotational diffusion coefficient *D*_{B} is defined as follows (Hagerman & Zimm 1981):
3.3
where *h* denotes the height of the coarse-grained DNA referred to as Broersma’s cylinder and *b* is its radius. The value *h* is estimated as *h*=*n*〈rise〉, where *n* is the number of base pairs. The value *b* is determined as the cylindrical radius that denotes the equivalent volume as the coarse-grained base pair,
3.4
In this study, the hydrodynamic radius *a* of a single nucleotide is set to be 10 Å. The Langevin-dynamics simulations were performed for a 155 bp fragment at 300 K for 100 ns. Figure 6 shows the molecular structures at 100 ns, obtained in salt concentrations of 1, 10 and 150 mM. The molecular structures depending on the salt concentration are visually characterized. The higher the salt concentration, the longer the distance between both ends are because of the screening effect on the negative charges of the backbones. For qualitative validation, these molecular structures were analysed from the averages of 10 000 time-samplings for 100 ns. Figure 7 shows the salt-concentration dependency of the rotational relaxation time, comparing between Broersma’s model and our coarse-graining model. In both models, the rotational relaxation times tend to be reduced as the salt concentration increases. At each salt concentration, the rotational diffusion coefficients are confirmed to be converged enough during 100 ns samplings, although the figure is omitted here. Substituting the proportion of these two rotational relaxation times in equation (3.1), the relation between the persistence length and the salt concentration can be evaluated. Figure 8 shows a comparison between our computational results and other experimental results (Baumann *et al.* 1997). The experimental data were applied to some coarse-graining models, such as the inextensible WLC, strong stretching limit and extensible WLC. Another estimation using Poisson–Boltzmann theory is also presented in figure 8. In this case, DNA fragments are assumed to have a cylindrical structure that has a homogeneous electronic distribution. The persistence length *l*_{p} was estimated as (Baumann *et al.* 1997)
3.5

where *l*_{p0} was the non-electrostatic contribution and was set to be 50 nm, *l*_{B} was Bjerrum length, *κ*=1/*λ*_{D}, and *I* was salt concentration. As shown in figure 8, the persistence length resulting from our computations decreases as a function of the salt concentration. This tendency agrees with experimental analyses and the Poisson–Boltzmann theoretical model. However, differences between each model appear at the high salt concentration near 100 mM. Therefore, it can be said that our coarse-graining model explains the behaviour of DNA at low salt concentrations. Reasonable quantities have been obtained from our potential functions, which depend on the charge distribution and elasticity of a DNA fragment. The screening effect represented by the Debye–Hückel approximation is considered to be appropriate. However, the convergence at high salt concentrations appeared to be slower than the experimental and the theoretical estimations. This is because the rotational relaxation times and the distance of base-stack 〈rise〉 moderately change as the salt concentration becomes higher. The improvement of the difference with other models is a subject of future study. In the present DNA model, instead of a conventional potential for dihedral angles along the backbones, the stacking potential *V* ^{stack} that represents the *π*-stack effect between base pairs was adopted effectively to maintain its double-stranded structure. In base-pair stacks, *π*–*π* interactions turn to be effective as electrostatic repulsive interactions between the particles are reduced because of the screening effect. Therefore, differences between the potential functions clearly influence the persistence length that depends on the rotational relaxation time and the distance of base stack 〈rise〉. In the present study, it is confirmed that the double-stranded structure is maintained by *π*–*π* interactions between the base stacks.

## 4. Conclusion

In the present study, a coarse-grained model of B–DNA has been developed. The potential function for *π*-stacks of base pairs and single-nucleotide resolution are particularly emphasized. Using the coarse-grained model, several characteristics have been concluded as follows: (i) the robustness of the double-stranded structure was confirmed in a wide range of salt concentrations compared with experimental (Olson *et al.* 1998) and other computational (Šponer & Lankaš 2006) results, (ii) it was clarified that the melting properties were dominated by the strength of hydrogen bonds between complementary bases and the melting temperatures agreed well with several DNA sequences, and (iii) the bending properties of DNA were confirmed to be in agreement with the experimental results by investigating the relation between the persistence length and salt concentrations.

## Footnotes

One contribution of 13 to a Theme Issue ‘The virtual physiological human: computer simulation for integrative biomedicine I’.

- © 2010 The Royal Society