## Abstract

For comparing RNA rings or hairpins with reference or random ring sequences, circular versions of distances and distributions like those of Hamming and Gumbel are needed. We define these circular versions and apply them to the comparison of RNA relics (such as micro-RNAs and tRNAs) with viral genomes that have coevolved with them. We then show how robust regulation networks are when they incorporate, at their boundary, micro-RNAs acting as sources, or new feedback loops involving ubiquitous proteins like p53 (which is a micro-RNA transcription factor) or oligopeptides regulating protein translation. Finally, we propose a new coevolution game between viral and host genomes.

## 1. Introduction

A challenge 40 years ago was to give an objective score summarizing the genetic distance between a host (e.g. human) and an infectious agent (e.g. *Haemophilus influenzae*) in order to predict the latter’s pathogenicity or virulence. In the classical Gatlin diagram (Gatlin 1968), whose variables were DNA redundancy *R* and GC per cent of genomes, the quadratic distance between the two genomes was a way to compare them, based on their global purine and pyrimidine base distribution (figure 1*a*). Comparing genomes that come from infectious agents, hosts and vectors remains pertinent, and more sophisticated tools use entropy or circular distances based on the distribution of nucleic bases along DNA (Vinga & Almeida 2004) when all the sequences of their genomes can be used, even when these genomes have a complex architecture (in their information organization or in their topology). This is the case, for example, for the circular DNA of the 4 600 755 bp chromosome of *Yersinia pestis* (http://cmr.tigr.org/tigr-scripts/CMR/shared/CircularGenomeDisplay.cgi) or for the hepatitis D circular RNA (http://pathmicro.med.sc.edu/virol/hepatitis-virus.htm), also known as the delta agent, more similar to a plant viroid than to a complete virus (figure 2). The main difference from the historical approach of the 1960s is that we can now compare chain or ring sequences of RNA or DNA with reference sequences or random rings (figure 1*b*), with appropriate distances and distribution functions expressing the variability of these distances among a population of given chains or rings.

In figure 1*a*, the redundancy *R* (that is, the ability of the genome to repeat pairs of bases) is defined as follows, where (1−*p*) denotes GC per cent:
where *P*_{AU/AU} (respectively, *P*_{GC/AU}) is the probability of having a base A or U after a base A or U (respectively, G or C).

In this paper, we give the essentials of the mathematical properties of these distances in the case of rings (Demongeot 1978; Demongeot & Besson 1983; Demongeot & Moreira 2007*a*,*b*), the work for chains having been extensively published already (e.g. Comet *et al.* 1999; Bacro & Comet 2000), and later make the comparison between genomes of some coevolving triplets (host, vector and infectious agent) in virology. For example, with data from recent studies (Jopling *et al.* 2005), we will show how some human (host) or mosquito (vector) micro-RNAs coming from their UTR (untranslated) genomes fit with the genomes of some viruses (infectious agent), and we discuss a possible coevolution giving this fit as a result of a global game favouring the survival of the three interacting species, each winning (a ‘win/win/win game’).

## 2. Distances between rings and chains

If we wish to compare chains of dinucleotides, we could use classical distances between integer vectors, such as that defined by Hamming, but, in the case of rings, two vectors are considered the same if one is a rotation of the other (Moreira 2003). Let us consider a finite alphabet *A* and a fixed integer *n* denoting the length of the rings, described by vectors in *A*^{n}. We first introduce a notation for the rotation: given *x*∈*A*^{n}, such that *x*=(*x*_{0},*x*_{1},…,*x*_{n−1}), *σ*^{i}(*x*)=(*x*_{i},…,*x*_{n−1},*x*_{0},…,*x*_{i−1}) is the *i*th circular left-permutation. It is evident that the following properties hold: *σ* is invertible, *σ*^{i}(*σ*^{j}(*x*))=*σ*^{i+j}(*x*) and *σ*^{i}(*x*)=*σ*^{i(mod n)}(*x*). We define the notion of equivalence under rotation, denoted ‘≡’, for two vectors *x*,*y*∈*A*^{n}, by

$$x\equiv y \iff \exists\, i\in\{0,\dots,n-1\}\ \text{such that}\ y=\sigma^{i}(x).$$

It is easy to see that this is an equivalence relation. Our space of rings will hence be *A*^{n}/≡, the quotient composed of the equivalence classes of the vectors, and a ring will be described as [*x*] ∈*A*^{n}/≡.
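
As a minimal sketch of this quotient construction, the Python fragment below represents a ring by one canonical rotation of its vector. The function names (`rotate`, `canonical`, `same_ring`) and the choice of the lexicographically smallest rotation as representative are illustrative assumptions, not part of the original formalism.

```python
def rotate(x, i):
    """Return sigma^i(x), the i-th circular left shift of the sequence x."""
    n = len(x)
    if n == 0:
        return x
    i %= n  # sigma^i = sigma^(i mod n)
    return x[i:] + x[:i]


def canonical(x):
    """Pick one representative of the class [x]: its smallest rotation (an arbitrary convention)."""
    if len(x) == 0:
        return x
    return min(rotate(x, i) for i in range(len(x)))


def same_ring(x, y):
    """True when x and y are equivalent under rotation, i.e. [x] = [y]."""
    return len(x) == len(y) and canonical(x) == canonical(y)
```

With this convention, `same_ring("ACGU", "GUAC")` holds because the second word is the rotation *σ*^{2} of the first, whereas `"ACGU"` and `"AGCU"` define two different rings.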

### (a) Circular Hamming distance

The most usual way to compare vectors with values in a finite alphabet is through the Hamming distance. Given two vectors *x*,*y*∈*A*^{n}, the Hamming distance between them is

$$d_H(x,y)=\#\{i\in\{0,\dots,n-1\}:\ x_i\neq y_i\}.$$

In other words, it is the number of positions in which the values of the vectors differ. The function *d*_{H} is a metric: it is non-negative, symmetric, satisfies the triangle inequality and a null distance implies identity of the vectors. It is also easy to see that, for all *i*∈{0,…,*n*−1},

$$d_H(\sigma^{i}(x),\sigma^{i}(y))=d_H(x,y),$$

and hence

$$d_H(\sigma^{i}(x),\sigma^{j}(y))=d_H(x,\sigma^{j-i}(y)).$$

Using this last property, we define the circular Hamming distance between two rings [*x*] and [*y*] as

$$\bar{d}_H([x],[y])=\min_{k\in\{0,\dots,n-1\}} d_H(x,\sigma^{k}(y)),$$

a value that does not depend on the representatives chosen for [*x*] and [*y*].

In general, the minimum of two metrics is not necessarily a metric, but here it is.
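
The definition can be illustrated by a short Python sketch, written under the assumption that rings are given as plain strings (or lists) of equal length. The names `hamming` and `circular_hamming` are hypothetical, and the brute-force minimum over the *n* rotations is only one straightforward way to evaluate the distance.

```python
def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def circular_hamming(x, y):
    """Minimum Hamming distance between x and every rotation of y."""
    n = len(x)
    if n != len(y):
        raise ValueError("rings must have the same length")
    if n == 0:
        return 0
    return min(hamming(x, y[k:] + y[:k]) for k in range(n))
```

For instance, `"ACGU"` and `"GUAC"` differ in all four positions as chains but are at circular distance 0, since they describe the same ring.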

### Lemma 2.1.

*The circular Hamming distance* $\bar{d}_H$ *is a metric on A*^{n}*/≡.*

### Proof.

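If $\bar{d}_H([x],[y])=0$, this implies that there exists a *k* such that *d*_{H}(*x*,*σ*^{k}(*y*))=0; hence

$$x=\sigma^{k}(y),\quad\text{i.e.}\quad [x]=[y].$$

Let us now prove the symmetry:

$$\bar{d}_H([x],[y])=\min_{k} d_H(x,\sigma^{k}(y))=\min_{k} d_H(\sigma^{-k}(x),y)=\min_{k} d_H(y,\sigma^{k}(x))=\bar{d}_H([y],[x]).$$

Let [*x*],[*y*],[*z*]∈*A*^{n}/≡. We must show that the triangle inequality is satisfied, i.e.

$$\bar{d}_H([x],[y])\le \bar{d}_H([x],[z])+\bar{d}_H([z],[y]).$$

Let *i* and *j* be such that $\bar{d}_H([x],[z])=d_H(x,\sigma^{i}(z))$ and $\bar{d}_H([z],[y])=d_H(z,\sigma^{j}(y))$. In addition, we define

$$k=i+j \pmod n.$$

Then $d_H(x,\sigma^{k}(y))\le d_H(x,\sigma^{i}(z))+d_H(\sigma^{i}(z),\sigma^{i+j}(y))=\bar{d}_H([x],[z])+\bar{d}_H([z],[y])$ and hence $\bar{d}_H([x],[y])\le \bar{d}_H([x],[z])+\bar{d}_H([z],[y])$. ▪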

### (b) Maxsubstrings distance

We now define another distance measure denoted by *d*_{s}, which evaluates the existence of substrings shared by the rings; more precisely, we define *d*_{s}([*x*],[*y*]) as the difference between *n* and the longest length of the substrings present in both rings:

$$d_s([x],[y])=n-\max\{m\le n:\ \exists\, i,j\ \text{with}\ x_{(i+k)\bmod n}=y_{(j+k)\bmod n}\ \text{for}\ k=0,\dots,m-1\}.$$

It is easy to see that *d*_{s} is a semi-metric in *A*^{n}/≡. It is not a metric, since the triangle inequality may fail when substrings shared by [*z*] with [*x*] and [*y*] have as intersection two disconnected subchains, i.e. when taken together, the shared substrings cover *z*, and intersect each other in both of their extremities. For example, the triangle inequality may fail for *d*_{s} if
and
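
The failure of the triangle inequality can be checked numerically. The sketch below is a brute-force evaluation of *d*_{s}, with hypothetical function names; the triple of rings used afterwards is built over an arbitrary six-letter alphabet purely for illustration and is an assumption of this sketch, not the triple given in the text.

```python
def longest_shared_arc(x, y):
    """Length (at most n) of the longest substring readable along both rings."""
    n = len(x)
    if n != len(y):
        raise ValueError("rings must have the same length")
    xx, yy = x + x, y + y  # doubling lets substrings wrap around the ring
    best = 0
    for i in range(n):
        for j in range(n):
            m = 0
            while m < n and xx[i + m] == yy[j + m]:
                m += 1
            best = max(best, m)
    return best


def maxsubstring_distance(x, y):
    """d_s: ring length minus the length of the longest shared substring."""
    return len(x) - longest_shared_arc(x, y)


# Illustrative triple: z shares 'abc' with x and 'cda' with y; together these
# arcs cover z and overlap at both ends (letters 'a' and 'c').
x, z, y = "abce", "abcd", "cdaf"
lhs = maxsubstring_distance(x, y)
rhs = maxsubstring_distance(x, z) + maxsubstring_distance(z, y)
```

Here `lhs` is 3 while `rhs` is 1+1=2, so the inequality *d*_{s}([*x*],[*y*])≤*d*_{s}([*x*],[*z*])+*d*_{s}([*z*],[*y*]) is violated for this triple.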

### Lemma 2.2.

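*We have*

$$\bar{d}_H([x],[y])\le d_s([x],[y]),$$

*where* $\bar{d}_H$ *denotes the circular Hamming distance.*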

### Proof.

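Since *d*_{H}(*x*,*y*)=#{*i*:*x*_{i}≠*y*_{i}}, we also have *n*−*d*_{H}(*x*,*y*)=#{*i*:*x*_{i}=*y*_{i}}, and hence we can write the circular Hamming distance $\bar{d}_H$ as

$$\bar{d}_H([x],[y])=n-\max_{k}\#\{i:\ x_i=\sigma^{k}(y)_i\}.$$

If the longest substring shared by [*x*] and [*y*] is of length *m*, then the rotation that aligns this substring produces at least *m* matches, so we have

$$\max_{k}\#\{i:\ x_i=\sigma^{k}(y)_i\}\ge m,$$

and thus

$$\bar{d}_H([x],[y])\le n-m=d_s([x],[y]).$$

▪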
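
The inequality between the circular Hamming distance and *d*_{s} can also be verified exhaustively on small cases. The sketch below assumes rings are given as strings; the helper names and the choice of small alphabets and lengths are illustrative only, and such a check is of course no substitute for the proof.

```python
import itertools


def circular_hamming(x, y):
    """Minimum number of mismatches between x and any rotation of y."""
    n = len(x)
    return min(sum(a != b for a, b in zip(x, y[k:] + y[:k])) for k in range(n))


def maxsubstring_distance(x, y):
    """n minus the length of the longest substring shared by the two rings."""
    n = len(x)
    xx, yy = x + x, y + y
    best = 0
    for i in range(n):
        for j in range(n):
            m = 0
            while m < n and xx[i + m] == yy[j + m]:
                m += 1
            best = max(best, m)
    return n - best


def lemma_holds(alphabet, n):
    """Check circular Hamming <= d_s for every pair of words of length n."""
    words = ["".join(w) for w in itertools.product(alphabet, repeat=n)]
    return all(
        circular_hamming(x, y) <= maxsubstring_distance(x, y)
        for x in words
        for y in words
    )
```

For example, `lemma_holds("AC", 5)` enumerates the 32×32 pairs of binary words of length 5.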

### (c) Shuffle distance

Until now, the two ‘distances’ we have defined can measure some form of similarity between rings, but each of them has advantages as well as disadvantages. The circular Hamming distance measures similarities between rings, but ignores the order of their positions: if we apply the same permutation to both rings, the distance does not change. Hence, in a scenario of rings cut into pieces, which are shuffled and then come together to build new rings, the circular Hamming distance will not capture much of what happens with those substrings. On the other hand, *n*−*d*_{s} measures the size of the longest common substrings between rings, but does not tell us anything about the other sequences.

Hence, in order to capture another aspect of the idea of similarity in which we are interested, we introduce a third function, *d*_{t}. This function will be finite only for pairs of rings [*x*],[*y*] that use the same amount of each kind of letter in *A*, i.e. such that *d*_{H}(*x*,*α*…*α*)=*d*_{H}(*y*,*α*…*α*), for all *α* in *A*, where *α*…*α* is the sequence of *A*^{n} made by the concatenation of *n* copies of *α*; it will be ∞ otherwise. In the finite case, we define *d*_{t}([*x*],[*y*]) as the minimum number of cuts to be made in [*x*] so that, after reordering the resulting pieces, we may obtain [*y*].

### Lemma 2.3.

*The shuffle distance d*_{t} *is a metric on A*^{n}*/≡.*

### Proof.

If *d*_{t}([*x*],[*y*])=0, then no cut is necessary, and the rings must be identical. Symmetry is easy to see, since the pieces used to go in opposite directions are the same. Finally, for the triangle inequality, *d*_{t}([*x*],[*y*])≤*d*_{t}([*x*],[*z*])+*d*_{t}([*z*],[*y*]), we cut [*x*] in the optimal way to build [*z*], and then in addition we make the cuts needed to build [*y*] out of [*z*]. In this way, we pass from [*x*] to [*y*] with *d*_{t}([*x*],[*z*])+*d*_{t}([*z*],[*y*]) cuts; this may not be the optimal way of going from [*x*] to [*y*], but it provides an upper bound for *d*_{t}([*x*],[*y*]), proving the inequality. The previous argument holds for the case where all values are finite; if the left-hand side of the inequality is infinite, then letter usage is different in [*x*] and [*y*], and since they cannot both share the letter usage of [*z*], the right-hand side will be infinite too. ▪

### (d) The semi-metric $\tilde{d}_t$

To speed up the computation, the ‘distance’ we eventually propose will be an approximation of *d*_{t}, denoted $\tilde{d}_t$. Given two rings [*x*] and [*y*], we remove from both of them one of the longest substrings they share, leaving two words $x^{(1)}$ and $y^{(1)}$. Then, we initialize two lists of words; let us denote by *x*^{(k)} the set of (one or two) subwords left in the words of *x*^{(k−1)} by removing one of the longest substrings common with the words of *y*^{(k−1)}; then these two lists are $(x^{(k)})_{k\ge 1}$ and $(y^{(k)})_{k\ge 1}$. At each time step, the lists contain a family of non-overlapping substrings of [*x*] and [*y*], respectively. More precisely, at each iteration *k*, the algorithm finds the longest substrings between two words, taken from each list at the same level *x*^{(k)} and *y*^{(k)} (i.e. maximizing over all possible pairs between these words from *x*^{(k)} and *y*^{(k)}), removes one of these substrings from these words, and returns the remaining words to the respective lists. We define $\tilde{d}_t([x],[y])$ as the number *N* of iterations of the algorithm until the word sets *x*^{(N)} and *y*^{(N)} are empty. It is easy to see why we call this function $\tilde{d}_t$: it represents the same idea as *d*_{t}, cutting the sequences into the required number of pieces in order to obtain one by reassembling the pieces of the other and reciprocally. The function $\tilde{d}_t$ is a semi-metric (the triangle inequality may fail).

## 3. Circular Hamming distribution and circular Gumbel distribution

If one of the sequences to compare is a fixed chain *x*, the other being a random ring [*y*], both being of length *n*, let us denote by *M* the random variable equal to the number of matches between them; we have $M=\max_{k\in\{0,\dots,n-1\}}\#\{i:\ x_i=\sigma^{k}(y)_i\}=n-\min_{k} d_H(x,\sigma^{k}(y))$, where *σ*^{k}(*y*) is the chain obtained by opening *y* at the letter of phase *k*. We will call *circular Hamming distribution* the probability law of *M*. The expected number of matches *E*(*M*) in the case of the comparison of an RNA chain with a reference RNA ring, each having for example 22 bases, is less than the expected maximum number of matches observed in the case of comparison with 22 independent chains of length 22, because a change in the origin of phases on the ring does not correspond strictly to a new chain tossing. Then we can write $E(M)\le E(\sup_{i=1,\dots,22}X_i)$, where the *X*_{i} are independent and identically distributed (i.i.d.) random variables, having as common distribution the binomial law *B*(22,1/4), i.e. the distribution of a binomial variable *X* equal to the number of matches between the given RNA chain and a random reference RNA chain of the same length (we suppose that the occurrence of each base A, U, G, C has the probability 1/4). By exploiting the binomial histogram (figure 3), we obtain
Hence, we have
Let us note that this result is in agreement with the inequality whose proof is reported by , which gives a majorant equal to 11. *E*(*M*) is also of course strictly larger than the expected number in the case of comparison with only one reference random chain, i.e. 22/4=5.5, hence *E*(*M*) lies in the interval [6, 10].
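
As a rough numerical companion to this bound, the sketch below computes the expectation of the supremum of i.i.d. binomial variables exactly from the binomial distribution function, and estimates *E*(*M*) by simulation. The function names, the number of trials, the seed and the decision to redraw the chain *x* at random in every trial (rather than keeping it fixed) are assumptions made for illustration.

```python
import math
import random


def binomial_cdf(n, p):
    """List of P(X <= k), k = 0..n, for X distributed as B(n, p)."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    cdf, total = [], 0.0
    for q in pmf:
        total += q
        cdf.append(min(total, 1.0))
    return cdf


def expected_sup_binomial(n=22, p=0.25, copies=22):
    """E[sup of `copies` i.i.d. B(n, p)] via E[S] = sum over k >= 0 of P(S > k)."""
    cdf = binomial_cdf(n, p)
    return sum(1.0 - cdf[k] ** copies for k in range(n))


def simulate_mean_matches(n=22, trials=2000, seed=0, alphabet="ACGU"):
    """Monte Carlo estimate of E(M) for uniformly random chains and rings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = [rng.choice(alphabet) for _ in range(n)]
        y = [rng.choice(alphabet) for _ in range(n)]
        total += max(
            sum(a == b for a, b in zip(x, y[k:] + y[:k])) for k in range(n)
        )
    return total / trials
```

With `copies=1` the first function returns the binomial expectation 22/4=5.5, and with 22 copies it returns a value of about 9.6, in line with the figure quoted in this section.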

The observed empirical mean (see §5) in the numerical experiments shows a value near 9.5, i.e. about the value of the expectation of the supremum of 22 binomial variables *B*(22,1/4). This observation suggests a conjecture: the distribution of *M* is in general a convex combination of the binomial law of *X*, the sup_{i=1,…,n}*X*_{i} distribution and the Dirac distribution located on the singleton {22} (with weights to be determined). The extremal distributions can be obtained in the following circumstances. If the length of the reference random ring goes to infinity, the length of the given RNA remaining finite and equal to 22, *E*(*M*) tends to the binomial expectation 5.5. If, on the contrary, the length of the given RNA tends to infinity while the length of the reference random ring remains fixed at 22, the perfect fit is asymptotically observed and *E*(*M*) tends to 22. If both lengths remain the same, equal to *n*, and if *n* tends to infinity, we observe the sup_{i=1,…,n}*X*_{i} distribution, whose expectation is about 9.6 if *n*=22. This last case is observed in our example. If *n* is small, the bias observed in simulations with respect to the sup_{i=1,…,n}*X*_{i} distribution is due to the relatively small number *A*_{n} of aperiodic rings (i.e. rings whose circular permutations are all distinct) among the *R*_{n} possible rings (Ruskey & Sawada 2000):
$$A_n=\frac{1}{n}\sum_{d\mid n}\mu(d)\,|A|^{n/d}$$

and

$$R_n=\frac{1}{n}\sum_{d\mid n}\phi(d)\,|A|^{n/d},$$

where *μ* and *ϕ* are, respectively, the Möbius and the Euler functions and |*A*| is the size of the alphabet. For example, we have for rings of *n* nucleotides having only two states (purine and pyrimidine): *A*_{8}=30 and *R*_{8}=36, but *A*_{22}=190 557 and *R*_{22}=190 746, which shows the reduction of the bias when *n* increases.
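
The counts quoted above follow from the classical necklace-counting formulas, in which the Möbius function counts the aperiodic rings and the Euler function counts all rings. The sketch below evaluates them directly; the function names and the naive implementations of *μ* and *ϕ* are illustrative choices, adequate for small *n*.

```python
import math


def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function: 0 if n has a squared prime factor, else (-1)^(number of primes)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def euler_phi(n):
    """Euler totient: number of integers in 1..n coprime to n."""
    return sum(1 for k in range(1, n + 1) if math.gcd(k, n) == 1)


def aperiodic_rings(n, k=2):
    """A_n: rings of length n over k letters whose n rotations are all distinct."""
    return sum(mobius(d) * k ** (n // d) for d in divisors(n)) // n


def all_rings(n, k=2):
    """R_n: all rings (rotation classes) of length n over k letters."""
    return sum(euler_phi(d) * k ** (n // d) for d in divisors(n)) // n
```

For two states, `aperiodic_rings(8)` and `all_rings(8)` give 30 and 36, and for *n*=22 they give 190 557 and 190 746, so the proportion of periodic rings drops from 1/6 to about 10^{−3}.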

We will call *circular Gumbel distribution* the probability distribution of the random variable defined by (*M*−*E*(*M*))/*σ*(*M*), where *σ*(*M*) is the standard deviation (s.d.) of *M*. This quantity is random, but partially independent of the length (here 22) of the reference RNA ring. For a ‘circular’ *Z*-score it could play the same role as the ‘classical’ Gumbel distribution for the ‘classical’ *Z*-score (Comet *et al.* 1999). By using an upper bound of the large deviations of this distribution given by the supremum of binomial variables, we can show for example the significance (at the threshold of 2.5%) of the fit between specific chains (200 siRNAs from http://www.rnainterference.org/HumanSequences.html) and a reference ring called AL (cf. §5). The circular Gumbel distribution can be estimated using a von Mises–Tychonov kernel (Shmaliy 2005).

## 4. RNA relics

The RNA relics (essentially tRNA loops, siRNAs and micro-RNAs) are made of short sequences (length of about 20 bases) having the same function in many realms (viral, bacterial, plant, animal) and a low interspecific variability, as is the case for the genetic code, which is universal (Eigen 1971; Labouygues 1976; Hopfield 1978; Trifonov & Sussman 1980; Eigen *et al*. 1981; Figureau & Pouzet 1984; Hartman 1984; Swanson 1984; Hobish *et al*. 1995; Szathmary & Maynard Smith 1997; Trinquier & Sanejouand 1998; Yarus 2000; de Duve 2002; Hornos *et al*. 2004; Wang & Schultz 2005; Wang *et al*. 2006). This is for example the case for the tRNA loops, which are highly invariant between species and amino acids, and it has been recently discovered that it also holds for micro-RNAs, which are small sequences of mean length 22 (see §7), present in the non-coding regions of many known genomes (especially of plants and animals), whose maturation process (figure 4) allows the interaction with mRNAs, in general preventing their translation in ribosomes. These micro-RNAs are particularly useful as cancer biomarkers (Calin *et al.* 2004) and could also be used in infectious diseases for predicting the pathogenicity of the infectious agents.

During the first step of the maturation process, the micro-RNAs (miRs) have a hairpin structure (http://protein3d.ncifcrf.gov/shuyun/Web/talk/Talk04.pdf), and both bioinformatics approaches and direct cloning methods have identified many such miRs, including orthologues from various species: the repository miRBase (http://microrna.sanger.ac.uk) contains over 5000 annotated miRs, including numerous human miR genes. Many miRs are ubiquitously expressed, whereas others are expressed in a cell-type specific manner. Because a single miR can target transcripts from multiple genes and, conversely, several miRs can control a single target (Krek *et al.* 2005), the miRs and their targets function as a complex regulatory network. We take advantage of the complete sequencing of vectors like *Anopheles gambiae* (Holt *et al.* 2002; Hill *et al.* 2005) and *Aedes aegypti* (Nene *et al*. 2007) and also use the 5′-untranslated region (5′-UTR) part of viral RNAs, like a typical isolated mRNA of *Hepacivirus*, hepatitis C virus (HCV), a 341-nucleotide sequence containing an internal ribosome entry site (IRES) required for the initiation of translation. It is widely accepted that the 5′- and 3′-UTRs may play a role in the initiation of negative-strand synthesis of viral RNAs released from entering virions, in switching from negative-strand synthesis to synthesis of progeny plus strand RNA at late times after infection, and finally in the initiation of translation and in the packaging of virus plus strand RNA into particles (Markoff 2004). Until recently very little was known about regulation of *Flavivirus* RNA replication and translation, in particular via the RNA interference machinery (Bartenschlager *et al.* 2004), but a human liver-specific miR (miR-122) enhances intracellular levels of HCV RNAs, and a recent work noted that this miR was likely to facilitate replication of the viral RNA (Appel & Bartenschlager 2006).

By searching for matches between miRs and viral genomes, we also discovered that a dozen miRs had a conserved coincidence in all four dengue virus subtypes, and also a dozen in all five HCV subtypes, with three miRNAs present in both, and of these only one, called *Anopheles gambiae* miRNA-281, was found with a coincidence in the same UTR (5′) and in the same sense (+) for dengue and HCV. Its matching with dengue virus is interesting: for the subtypes 1, 2 and 3, it matches exactly the end of the 5′-UTR, right before the beginning of the first CDS (coding sequence). It turns out that this part, in the absence of the miR, has a high hairpin-building potential, and is hybridized in chain form if the miR is added (figure 5).

Concerning human miRs, if the virus requires something to ‘open up’ the 5′-end, then it should also happen with *Homo sapiens* miR518c (cf. http://microrna.sanger.ac.uk/cgi-bin/sequences/mirna_entry.pl?acc=MI0003159; figure 6), in which the matching concerns the Watson–Crick pairing plus the G-U pairing, with two hydrogen bonds, which occurs fairly often in RNA (but rarely in DNA).

For each mature miR and each target sequence, we slide the Watson–Crick complement of the miR over the target sequence on all possible positions. Thus, for each position, we compare a sequence *m*_{1},*m*_{2},…,*m*_{L} (the miR) with a segment of the target, *s*_{i}*s*_{i+1}…*s*_{i+L−1}. We define *v*_{j}=1 if *m*_{j}=*s*_{i+j−1} and *v*_{j}=−1 otherwise. We consider the segment [start, stop] a candidate match, if:

(i) *v*_{start}=*v*_{stop}=1 and ;

(ii) it is maximal, i.e. not contained in a larger segment verifying the previous conditions.
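
The sliding comparison can be sketched as follows. The sketch only builds the ±1 vectors *v* for every position and ranks positions by their total score; it does not implement the candidate-segment conditions. The function names are hypothetical, sequences are assumed to be RNA strings over A, C, G, U, and the complement is taken base by base without reversing the miR, following the wording of the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base Watson-Crick complement of an RNA string."""
    return "".join(COMPLEMENT[b] for b in seq)


def match_vectors(mir, target):
    """For each offset i, the vector v with v_j = +1 on a match and -1 otherwise."""
    probe = complement(mir)
    length = len(probe)
    if length > len(target):
        raise ValueError("target shorter than the miR")
    return [
        [1 if probe[j] == target[i + j] else -1 for j in range(length)]
        for i in range(len(target) - length + 1)
    ]


def best_offsets(mir, target):
    """Highest total score sum(v) and the offsets at which it is reached."""
    scores = [sum(v) for v in match_vectors(mir, target)]
    top = max(scores)
    return top, [i for i, s in enumerate(scores) if s == top]
```

For the toy input `best_offsets("ACGU", "GGUGCAGG")`, the complement `UGCA` is found at offset 2 with the maximal score 4.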

When we analyse the mean match score (calculated for all miRs of species indicated in the legends of figures 7 and 8) along the viral UTR, we notice a best match for the hosts whose coevolution with the virus has been the closest (e.g. showing a better fit for *Gallus gallus* than *Homo sapiens* for West Nile virus and the inverse for HCV, the fit being identified as the integral of the mean match curve). If we focus on precise miRs (figure 9), we find good matches between some of them and the UTR, showing a better resistance of some hosts, similar to the human miR-122 at the beginning of the HCV 5′-UTR (cf. figure 8) and dengue UTR (figure 9).

It is clear that the genomic congruences shown above are more pertinent than the proximities in the Gatlin diagram, but they are calculated in the same spirit. Complementary studies, namely of modelling and simulation, should be performed in order to understand well the effective role of miRs in the host and the vector regulatory networks during viral infection. A variational principle maximizing the benefit that each species (host, vector and virus) is getting in this three-player game has also to be found in order to explain why the coevolution has produced these fits between the three genomes. This evolutionary variational principle would involve only the three genomes and no exogenous information (with respect to the set of game players) coming, for example, from ancestral genomes. However, if we want to introduce an external reference in order to emphasize the internal homogeneity of a given genome with respect to the set of all possible genomes, we need to calculate distances to this referential set and show that they are smaller between the given genome and the referential set than between the given genome and a set of randomly chosen genomes.

## 5. Primitive genome and comparison with RNA relics

It has been shown (Demongeot & Besson 1996; Demongeot & Moreira 2007*a*,*b*) that specific RNA rings (e.g. the ring shown in figure 10, called AL for archetypal loop) could be selected as solutions of a variational principle: to be of minimal length favouring RNA naturation or renaturation after denaturation, as well as RNA replication processes (figure 10*a*,*b*) and to offer at least one reasonable affinity site for each amino acid (in the sense of the stereo-chemical theory of the genetic code, i.e. with electrostatic and/or van der Waals interactions). AL is represented in the Gatlin diagram (figure 1) and lies between the archaebacteria and the human mitochondrial genome. The total number of all the rings (denoted alRNAs) selected under this variational principle is 29 520. They all have a length of 22 bases, with a hairpin secondary structure (figure 10*b*,*c*), and are close (for the distances of §2) to all known tRNA relics essentially made of a succession of tRNA loops (Moreira 2003; Demongeot & Moreira 2007*a*,*b*), which can come from ancestral hairpins (Di Giulio 1992, 1997, 2009; Fujishima *et al*. 2008) and be present in miRNAs conserved or not in humans (Bentwich *et al*. 2005). The proximity or identity in the case of some tRNAs, like *Oenothera lamarckiana* Gly-tRNA, is explained by the fact that rings that are subsolutions of the variational problem present a tRNA-like structure (figure 11), creating stems as for the *O. lamarckiana* Gly-tRNA clover leaf.

The ring AL fits with a high significance (less than 2.5% in figure 12) with siRNAs and miRs involved in many important cell functions. The mean and s.d. of the mean matching score (for the 22-circular Hamming distance) between all alRNAs and all known miRs are 9.634 and 0.088 (blue curve in figure 13). If we compare all known miRs with randomized samples from the set of all RNA rings having a length of 22 bases (there are about 16×10^{12} such rings) and presenting the same base composition as the 29 520 alRNAs, these values become 9.558 and 0.11 (yellow curve in figure 13).

The comparison between all the known miRs with AL gives a mean of 9.77935, over *μ*+1.645*σ* with respect to alRNAs, and slightly over *μ*+2*σ* with respect to randomized rings. Then, the mean matching score is significantly (*p*=0.005) higher with AL for the set of all the known miRs (*μ*=9.78, *σ*=0.09) than for a set of miRNAs obtained by chance (*μ*=9.56, *σ*=0.11). In the same way, the expectation of the maximal length *L* of consecutive matches of known miRs is significantly higher (*p*=0.01) with AL (*P*(*L*≥6)=0.093) than with miRs obtained by chance with the same base composition (*P*(*L*≥6)=0.07). The barycentre of all known miRs also has a significantly higher number of matches with AL: mean and s.d. are 4.131 and 0.041 for randomized rings with the same base composition as alRNAs; 4.227 and 0.070 for the set of 29 520 alRNA rings; and 4.31781 for AL, more than 4.5 times the s.d. from the mean 4.131. A unilateral test of mean shows that the match with AL is significantly better than the match with randomized rings (*p*=10^{−6}) (figure 14).
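
The thresholds quoted in this paragraph can be re-derived from the reported means and standard deviations. The short sketch below does only this arithmetic; the helper name `z_score` and the variable names are illustrative, and the numbers are copied from the text.

```python
def z_score(value, mean, sd):
    """Number of standard deviations separating `value` from `mean`."""
    return (value - mean) / sd


AL_MEAN_MATCH = 9.77935  # mean matching score of known miRs with AL

# AL against the 29 520 alRNAs (mean 9.634, s.d. 0.088)
z_al_vs_alrnas = z_score(AL_MEAN_MATCH, 9.634, 0.088)

# AL against randomized rings with the same base composition (mean 9.558, s.d. 0.11)
z_al_vs_random = z_score(AL_MEAN_MATCH, 9.558, 0.11)

# Barycentre of known miRs: AL (4.31781) against randomized rings (mean 4.131, s.d. 0.041)
z_barycentre = z_score(4.31781, 4.131, 0.041)
```

The three values are about 1.65, 2.01 and 4.56, which matches the statements that AL lies over *μ*+1.645*σ* with respect to alRNAs, slightly over *μ*+2*σ* with respect to randomized rings, and more than 4.5 s.d. from the mean for the barycentre.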

In figure 15, we show a significantly lower maxsubstring distance between known miRs observed in at least two species (repmiRs) and randomized miRs (with the same base composition as repmiRs) than with AL. More generally, in figure 16, we summarize for the two different distances introduced in §2 and for the circular version of the classical edit distance (the edit distance between two strings of characters is the number of operations required to transform one of them into the other) the proximity diagram (generalization of the Gatlin diagram) between the set of all known tRNA loops, the set of ancestral ring solutions of the variational problem of §5 and the set of all known miRs. An explanation of this proximity could lie in the fact that these structures with a low interspecific variability (tRNA loops and miRs, as well as siRNAs as shown in figure 12) come from the same primitive reservoir of RNA rings satisfying a variational principle and that the fitness to their function (protein building for tRNAs and translation control for miRs) has been from the beginning sufficiently high to ensure their survival. We hope in the future to find the same type of variational principle explaining the fitness we have observed in §4 between host, vector and virus genomes.

## 6. Beyond a common fitness function between RNA relics, viral genome and primitive genome

### (a) A first equilibrium between two genomes, a primitive and an evolutive

Let us suppose that a primitive RNA genome G_{1} is able to appear well protected against denaturation by amino acids AA_{i}, having a great affinity with it. To evolve, G_{1} needs a second RNA genome G_{2} with which it has the following relationship, summarized in figure 17.

(i) G_{1} is able to make peptides P (by confining amino acids, these giving peptides because of their proximity and ability to create covalent bonds), and it exports P as ‘capsule’ peptides to contribute to the protection of G_{2} (which presents affinity only for some amino acids of P like AA_{1}).

(ii) G_{2} is able to duplicate and grow through the classical operations such as mutation, insertion, deletion, inversion and translocation (Faraut & Demongeot 2000) with nucleic material already present in the environment, for example, nucleic bases synthesized from Miller’s type reaction (Johnson *et al*. 2008), and to export small RNA fragments to G_{1}.

(iii) Finally, the coevolution of G_{1} and G_{2} allows a first equilibrium between two genomes, G_{1} able to capitalize on the evolutionary memory and G_{2} able to evolve and ensure possibilities of evolution to G_{1}. The game with two players, G_{1} and G_{2}, leads to an equilibrium with two winners, each of them transmitting to the other its main survival feature, i.e. peptide protection for G_{1} and ability to evolve/adapt for G_{2}.

### (b) A second equilibrium between three genomes, a host, a virus and a vector

We can consider that after the first stage of evolution with two genomes that we have described above, another game appeared and went to equilibrium (cf. figure 18). This game consists in exchanging proteins and RNA (or DNA) between three players. The *host*, like the primitive genome G_{1}, capitalizes on the evolutionary memory and is able, if infected by the RNA (or DNA) of a virus, to replicate it and to make the proteins necessary for its protective capsule. The *virus* plays the same role as G_{2} by being able to evolve rapidly in a given environment. It continues to contribute to the evolution of G_{1} by incorporating and leaving a part of its RNA inside G_{1} (whose molecular form becomes more stable and adapted to a conservative replication by adopting the DNA configuration; this is also the case for certain viruses that have adopted this more stable form for their genome). To be more efficient, in particular, at passing through the defences of the host, the virus uses a third species, the *vector* (which can also be an intermediary host susceptible to start the multiplication of the virus) well adapted to transport the viral RNA inside the host cells. The game still leads to an equilibrium with only winners: the host and the vector are increasing their adaptability, and the virus ensures its survival and multiplication. Because this game corresponds to a coevolution over a long time, it is not surprising now to find common RNAs between host, vector and virus, as we have shown in the previous sections, these common sequences being just the traces of the exchanges between the three species. A computational implementation of this game is possible and will be presented and discussed in a further paper.

## 7. Robustness of the micro-RNA control

Micro-RNAs 17_5p or 34 (figures 5, 8, 9 and 12) match not only with viral genomes and the AL ring, but also with mRNAs of proteins controlling important functions. Among these functions, proliferation is ruled by the cell-cycle network, whose boundary elements act on the transcription factor E2F, which belongs to the core of the network, made of a double positive loop (figure 19). By fixing the states of these micro-RNAs or of p53 (a transcription factor of miR-34; Jouanneau & Larsen 2007) to the value 1 (corresponding to their expression state), four limit cycles occur in their dynamics, when all elements of the cell-cycle network are synchronously updated. These limit cycles are never present in the case of parallel updating, when the boundary genes are in state 0 (figure 19). More generally, a strongly connected subnetwork of size three like those containing E2F can present four different dynamical behaviours: Cy (respectively, Fi and Mi), in which attractors are only limit cycles (respectively, only fixed configurations, and at least one fixed configuration and one limit cycle) independently of the updating mode; and Ev, in which attractors are only fixed configurations or only limit cycles for certain updating modes (Elena & Demongeot 2008; Elena *et al*. 2008; Elena 2009). The addition of a micro-RNA to these strongly connected subnetworks of size three increases the percentage of class Fi and decreases that of class Ev, hence improving the robustness of networks like the cell-cycle network, whose class becomes independent of the updating mode (table 1). The cell-cycle network is thus very sensitive to its boundary elements, especially to the miRs. The viral mRNAs hybridizing these miRs can therefore play a direct role in important cell functions such as proliferation; for the other main functions, robustness has already been studied in many papers (Ben Amor *et al*. 2008, 2009; Demongeot *et al.* 2000, 2006, 2008, 2009*a*,*b*).
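
The dependence of attractors on the updating mode can be illustrated on a toy example. The sketch below is not the cell-cycle network of figure 19: it uses a hypothetical positive circuit of three Boolean genes (each gene copies its predecessor), and the function names and the sequential order (0, 1, 2) are assumptions made for illustration.

```python
from itertools import product


def parallel_step(state, rules):
    """Synchronous (parallel) updating: every gene reads the same old state."""
    return tuple(rule(state) for rule in rules)


def sequential_step(state, rules, order):
    """Sequential updating: genes are updated one after the other in `order`."""
    s = list(state)
    for i in order:
        s[i] = rules[i](tuple(s))
    return tuple(s)


def attractors(step, n):
    """Set of attractors (fixed configurations or limit cycles) of a map on {0,1}^n."""
    found = set()
    for start in product((0, 1), repeat=n):
        seen = set()
        s = start
        while s not in seen:  # iterate until a configuration is revisited
            seen.add(s)
            s = step(s)
        cycle = [s]  # s lies on the attractor; walk once around it
        t = step(s)
        while t != s:
            cycle.append(t)
            t = step(t)
        found.add(frozenset(cycle))
    return found


# Hypothetical positive circuit of size three: gene 0 <- gene 2, gene 1 <- gene 0, gene 2 <- gene 1
rules = [lambda s: s[2], lambda s: s[0], lambda s: s[1]]
par = attractors(lambda s: parallel_step(s, rules), 3)
seq = attractors(lambda s: sequential_step(s, rules, (0, 1, 2)), 3)
```

For this toy circuit, parallel updating yields two fixed configurations and two limit cycles of length 3, whereas the sequential order (0, 1, 2) leaves only the two fixed configurations 000 and 111.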

## 8. Conclusion

We have shown in this paper that, for some RNA relics (i.e. RNA sequences well conserved among species) like tRNA loops and micro-RNA sequences extracted from their hairpin form, we have significant similarities. Because the mean length of these sequences is low (about 22), we used this to prove its existence as an intermediary, a reference set made of RNA rings selected from a variational principle (minimization of their length and maximization of their amino acid affinity in the stereochemical theory of the genetic code), which provided rings of length 22 only. Other small RNAs (like siRNAs) have also been tested, showing the same similarity. From this perspective, we could address the problem of the systematic detection of micro-RNAs in non-coding parts of genomes and show that there is a correlation between the low interspecific variability of these structures and their fit with the archetypal genome, as well as with the viral genome, both fits satisfying variational principles due to common coevolution. Even if Gilbert’s hypothesis (Gilbert 1986) of a primordial RNA world is not yet proved (cf. Ertem 2004; Shapiro 2007), the intense period of research about RNAs over the past 20 years is a reality. It has not been a ‘revolution’, but we can say, following Mello & Conte (2004), that ‘considering the potential role of RNA as a primordial biopolymer of life, it is perhaps more apt to call it an RNA ‘revelation’. RNA is not taking over the cell—it has been in control all along’.

## Footnotes

One contribution of 17 to a Theme Issue ‘From biological and clinical experiments to mathematical models’.

- © 2009 The Royal Society