Genes and causation

Denis Noble


Relating genotypes to phenotypes is problematic not only owing to the extreme complexity of the interactions between genes, proteins and high-level physiological functions but also because the paradigms for genetic causality in biological systems are seriously confused. This paper examines some of the misconceptions, starting with the changing definitions of a gene, from the cause of phenotype characters to the stretches of DNA. I then assess whether the ‘digital’ nature of DNA sequences guarantees primacy in causation compared to non-DNA inheritance, whether it is meaningful or useful to refer to genetic programs, and the role of high-level (downward) causation. The metaphors that served us well during the molecular biological phase of recent decades have limited or even misleading impacts in the multilevel world of systems biology. New paradigms are needed if we are to succeed in unravelling multifactorial genetic causation at higher levels of physiological function and so to explain the phenomena that genetics was originally about. Because it can solve the ‘genetic differential effect problem’, modelling of biological function has an essential role to play in unravelling genetic causation.


1. Introduction: what is a gene?

At first sight, the question raised by this paper seems simple. Genes transmit inherited characteristics; so in each individual they must be the cause of those characteristics. And so it was when the idea of a gene was first mooted. The word itself was coined by Johannsen (1909), but the concept already existed and was based on ‘the silent assumption [that] was made almost universally that there is a 1:1 relation between genetic factor (gene) and character’ (Mayr 1982).

Since then, the concept of a gene has changed fundamentally (Kitcher 1982; Mayr 1982; Dupré 1993; Pichot 1999; Keller 2000a,b), and this is a major source of confusion when it comes to the question of causation. Its original biological meaning referred to the cause of an inheritable phenotype characteristic, such as eye/hair/skin colour, body shape and weight, number of legs/arms/wings, to which we could perhaps add more complex traits such as intelligence, personality and sexuality.

The molecular biological definition of a gene is very different. Following the discovery that DNA codes for proteins, the definition shifted to locatable regions of DNA sequences with identifiable beginnings and endings. Complexity was added through the discovery of regulatory elements, but the basic cause of phenotype characteristics was still the DNA sequence since that determined which protein was made, which in turn interacted with the rest of the organism to produce the phenotype.

But unless we subscribe to the view that the inheritance of all phenotype characteristics is attributable entirely to DNA sequences (which I will show is just false) then genes, as originally conceived, are not the same as the stretches of DNA. According to the original view, genes were necessarily the cause of inheritable phenotypes since that is how they were defined. The issue of causation is now open precisely because the modern definition identifies them instead with DNA sequences.

This is not a point that is restricted to the vexed question of the balance of nature versus nurture. Even if we could separate those out and arrive at percentages attributable to one or the other (which I believe is misconceived in a system of nonlinear interactions, in which either on its own produces nothing), we would still be faced with the fact that not all the ‘nature’ characteristics are attributable to DNA alone. Indeed, as we will see at the conclusion of this paper, strictly speaking no genetic characteristics as originally defined by geneticists in terms of the phenotype could possibly be attributable to DNA alone.

My first point therefore is that the original concept of a gene has been taken over and significantly changed by molecular biology. This has undoubtedly led to a great clarification of molecular mechanisms, surely one of the greatest triumphs of twentieth-century biology, and widely acknowledged as such. But the more philosophical consequences of this change for higher level biology are profound and they are much less widely understood. They include the question of causation by genes. This is also what leads us to questions such as ‘how many genes are there in the human genome?’, and to the search to identify ‘genes’ in the DNA sequences.

2. Where does the genetic code lie?

Of course, it is an important question to ask which stretches of DNA code for proteins, and that is a perfectly good molecular biological question. It also leads us to wonder what the other stretches of DNA are used for, a question to which we are now beginning to find answers (Pearson 2006). But genetics, as originally conceived, is not just about what codes for each protein. Indeed, had it turned out (as in very simple organisms) that each coding stretch of DNA translates into just one protein, then it would have been as valid to say that the genetic code lies in the protein sequences, as was originally thought (Schrödinger 1944). We would then still be left with the question ‘how do these sequences, whether DNA or protein, generate the phenotypic characteristics that we wish to explain?’ Looked at from this viewpoint, modern molecular biology, starting with Watson and Crick's work, has succeeded brilliantly in mapping sequences of DNA to those of amino acids in proteins, but not in explaining phenotype inheritance. Whether we start from DNA or protein sequences, the question is still there. The answer lies in the complexity of the way in which the DNA and proteins are used by the organism to generate the phenotype. Life is not a soup of proteins.

The existence of multiple splice variants and genetic ‘dark matter’ (only 1–2% of the human genome actually codes for proteins, but much of the rest codes for non-protein coding RNA; Bickel & Morris 2006; Pearson 2006) has made this question more complicated in higher organisms, while epigenetics (gene marking) makes it even more so (Qiu 2006; Bird 2007), but the fundamental point remains true even for higher organisms. In a more complicated way, the ‘code’ could still be seen to reside in the proteins. Some (e.g. Scherrer & Jost 2007) have even suggested that we should redefine genes to be the completed mRNA before translation into a polypeptide sequence (see also Noble 2008, in press). In that case, there would be as many as 500 000 genes rather than 25 000. The more complex genome structure (of multiple exons and introns and the way in which the DNA is folded in chromosomes) could then be viewed as an efficient way of preserving and transmitting the ‘real’ causes of biological activity, the proteins. It is still true that, if we identify genes as just the stretches of DNA and identify them by the proteins they code for, we are already failing to address the important issues in relation to genetic determinism of the phenotype. By accepting the molecular biological redefinition of ‘gene’, we foreclose some of the questions I want to ask. For, having redefined what we mean by a gene, many people have automatically taken over the concept of necessary causation that was correctly associated with the original idea of a gene, but which I will argue is incorrectly associated with the new definition, except in the limited case of generating proteins from DNA. This redefinition is not therefore just an arcane matter of scientific history. It is part of the mindset that needs to change if we are to understand the full nature of the challenge we face.

3. Digital versus analogue genetic determinism

The main reason why it is just false to say that all nature characteristics are attributable to DNA sequences is that, by itself, DNA does nothing at all. We also inherit the complete egg cell, together with any epigenetic characteristics transmitted by sperm (in addition to its DNA), and all the epigenetic influences of the mother and environment. Of course, the latter begins to be about ‘nurture’ rather than nature, but one of my points in this paper is that this distinction is fuzzy. The proteins that initiate gene transcription in the egg cell and impose an expression pattern on the genome are initially from the mother, and other such influences continue throughout development in the womb and have influences well into later life (Gluckman & Hanson 2004). Where we draw the line between nature and nurture is not at all obvious. There is an almost seamless transition from one to the other. ‘Lamarckism’, the inheritance of acquired characteristics, lurks in this fuzzy crack to a degree yet to be defined (Jablonka & Lamb 1995, 2005).

This inheritance of the egg cell machinery is important for two reasons. First, it is the egg cell gene reading machinery (a set of approx. 100 proteins and the associated cellular ribosome architecture) that enables the DNA to be used to make more proteins. Second, the complete set of other cellular elements, mitochondria, endoplasmic reticulum, microtubules, nuclear and other membranes and a host (billions) of chemicals arranged specifically in cellular compartments, is also inherited. Much of this is not coded for by DNA sequences since they code only for RNA and proteins. Lipids certainly are not so coded. But they are absolutely essential to all the cell architecture. The nature of the lipids also determines how proteins behave. There is intricate two-way interaction between proteins and lipids (see Roux et al. 2008).

One way to look at this situation therefore is to say that there are two components to molecular inheritance: the genome DNA, which can be viewed as digital information, and the cellular machinery, which can, perhaps by contrast, be viewed as analogue information. I will refer to both of these as ‘molecular inheritance’ to emphasize that the distinction at this point in my argument is not between genetic molecular inheritance and higher-level causes. The egg cell machinery is just as molecular as the DNA. We will come to higher-level causation later.

The difference lies elsewhere. Both are used to enable the organism to capture and build the new molecules that enable it to develop, but the process involves a coding step in the case of DNA and proteins, while no such step is involved in the rest of the molecular inheritance. This is the essential difference.

The coding step in the case of the relationship between DNA and proteins is what leads us to regard the information as digital. This is what enables us to give a precise number to the base pairs (3 billion in the case of the human genome). Moreover, the CGAT code could be completely represented by binary code of the kind we use in computers. (Note that the code here is metaphorical in a biological context—no one has determined that this should be a code in the usual sense. For that reason, some people have suggested that the word ‘cipher’ would be better.)
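To make the claim about the CGAT code concrete, here is a minimal Python sketch of packing a sequence into binary at two bits per base. The particular base-to-bits mapping and the names `encode`, `decode` and `STORAGE_BYTES` are illustrative assumptions, not any standard file format. Note that what gets stored is the sequence alone: nothing about methylation, histone marks or expression levels survives the encoding.

```python
# Hypothetical illustration: each of the four bases mapped to two bits.
# The particular mapping is arbitrary; any one-to-one assignment would do.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(sequence: str) -> int:
    """Pack a DNA sequence into a single integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value: int, length: int) -> str:
    """Recover a sequence of known length from its packed integer form.

    The length must be supplied because leading 'A's (bits 00) would
    otherwise be indistinguishable from leading zeros.
    """
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


# Storage needed for about 3 billion base pairs at 2 bits per base.
HUMAN_BASE_PAIRS = 3_000_000_000
STORAGE_BYTES = HUMAN_BASE_PAIRS * 2 // 8

if __name__ == "__main__":
    packed = encode("CGAT")
    print(bin(packed))          # 0b1100011
    print(decode(packed, 4))    # CGAT
    print(STORAGE_BYTES)        # 750000000
```

At two bits per base, the 3 billion base pairs quoted above come to 750 million bytes, which is the sense in which the genome has a precise, countable information content.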

By contrast, we cannot put similar precise numbers to the information content of the rest of the molecular inheritance. The numbers of molecules involved (trillions) would be largely irrelevant since many are exactly the same, though their organization and compartmentalization also need to be represented. We could therefore ask how much digital information would be required to ‘represent’ the non-DNA inheritance but, as with encoding of images, that depends on the resolution with which we seek to represent the information digitally. So, there is no simple answer to the question of a quantitative comparison of the DNA and non-DNA molecular inheritance. But given the sheer complexity of the egg cell—it took evolution at least 1 or 2 billion years to get to the eukaryotic cellular stage—we can say that it must be false to regard the genome as a ‘vast’ database while regarding the rest of the cell as somehow ‘small’ by comparison. At fine enough resolution, the egg cell must contain even more information than the genome. If it needed to be coded digitally to enable us to ‘store’ all the information necessary to recreate life in, say, some distant extra-solar system by sending it out in an ‘Earth-life’ information capsule, I strongly suspect that most of that information would be non-genomic. In fact, it would be almost useless to send just DNA information in such a capsule. The chances of any recipients anywhere in the Universe having egg cells and a womb capable of permitting the DNA of life on Earth to ‘come alive’ may be close to zero. We might as well pack the capsule with the bar codes of a supermarket shelf!
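The dependence on resolution can be illustrated with a deliberately crude sketch, assuming some continuous quantity (say, a concentration) is sampled at a number of points and each sample is quantized to a fixed number of levels. The function names and all the numbers are hypothetical, and a cell is of course not a one-dimensional signal; the point is only that the bit count is set by the resolution we choose, not by the thing being represented.

```python
import math


def bits_needed(n_samples: int, levels_per_sample: int) -> int:
    """Bits to store n_samples values, each quantized to one of
    levels_per_sample discrete levels with a fixed-length code."""
    bits_per_sample = math.ceil(math.log2(levels_per_sample))
    return n_samples * bits_per_sample


def quantize(x: float, levels: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Round a continuous value in [lo, hi] to the nearest of `levels`
    evenly spaced representable values."""
    step = (hi - lo) / (levels - 1)
    return lo + round((x - lo) / step) * step


if __name__ == "__main__":
    # Coarse versus fine digitization of the same hypothetical quantity.
    print(bits_needed(10, 16))               # 40
    print(bits_needed(1_000_000, 65_536))    # 16000000
    # Finer quantization gives a smaller representation error.
    print(abs(quantize(0.123, 11) - 0.123))
    print(abs(quantize(0.123, 1001) - 0.123))
```

Refining either the sampling or the quantization keeps raising the total, whereas the count for a DNA sequence stops at two bits per base.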

4. Is digital information privileged?

Of course, quantity of information is not the only criterion we could choose. Whatever the proportions would be in my imagined Earth-life capsule, some information may be more important than the rest. So, which is privileged in inheritance? Would it be the cell or the DNA? ‘How central is the genome?’ as Werner puts the question (Werner 2007). On the basis of our present scientific knowledge, there are several ways in which many people would seek to give primacy to the DNA.

The first is that, because it can be viewed as digital information, it can appear, in our computer-oriented age, to be more secure and more reliable, much as the music recorded on a CD is said to be ‘clearer’ and less ‘noisy’ than that on a vinyl disc. Digital information is discrete and fixed, whereas analogue information is fuzzy and imprecise. But I wonder whether that is entirely correct. Large genomes actually require error-correcting machinery to ensure their precision. Nevertheless, with such machinery, the genome clearly is secure enough to act as reliably inheritable material. By contrast, it could be said that attempting to reduce analogue information, such as image data, to digital form is always fuzzy since it involves a compromise over questions such as resolution. But this criterion already biases us towards the DNA. We need to ask the fundamental question ‘why do we need to prioritize digital information?’ After all, DNA needs a digital code simply and precisely because it does not code only for itself. It codes for another type of molecule, the proteins. The rest of the cellular machinery does not need a code, or to be reduced to digital information, precisely because it represents itself. To Dawkins' famous description of DNA as the eternal replicator (Dawkins 1976, ch. 2), we should add that egg cells, and sperm, also form an eternal line, just as do all unicellular organisms. DNA cannot form an eternal line on its own.

So, although we might characterize the cell information as analogue, that is only to contrast it with being digital. But it is not an analogue representation. It itself is the self-sustaining structure that we inherit and it reproduces itself directly. Cells make more cells, which make more cells (and use DNA to do so), …, etc. The inheritance is robust: liver cells make liver cells for many generations of liver cells, at each stage marking their genomes to make that possible. So do all the other 200 or so cell types in the body (Noble 2006, ch. 7). Yet, the genome is the same throughout. That common ‘digital’ code is made to dance to the totally different instructions of the specific cell types. Those instructions are ‘analogue’, in the form of continuous variations in imposed patterns of gene expression. The mistake in thinking of gene expression as digital lies in focusing entirely on the CGAT codes, not on the continuously variable degree of expression. It is surely artificial to emphasize one or the other. When it comes to the pattern of expression levels, the information is analogue.

So, I do not think we get much leverage on the question of privileged causality (DNA or non-DNA) through the digital–analogue comparison route. We might even see the digital coding itself as the really hazardous step, and indeed it does require complex machinery to check for errors in large genomes (Maynard Smith & Szathmáry 1995; Maynard Smith 1998). Lipid membranes that automatically ‘accept’ certain lipids into their structure, and so grow and enable cells to divide, also seem to be chemically reliable. The lipid membranes are also good chemical replicators. That process was probably ‘discovered’ and ‘refined’ by evolution long before cells ‘captured’ genes and started the process towards the full development of cells as we now know them. I suspect that primitive cells, probably not much more than lipid envelopes with a few RNA enzymes (Maynard Smith & Szathmáry 1995, 1999), ‘knew’ how to divide and have progeny long before they acquired DNA genomes.

5. An impossible experiment

Could we get a hold on the question by a more direct (but currently and probably always impossible; Keller 2000a,b) biological experiment? Would the complete DNA sequence be sufficient to ‘resurrect’ an extinct species? Could dinosaur DNA (let us forget about all the technical problems here), for example, be inserted into, say, a bird egg cell? Would it generate a dinosaur, a bird or some extraordinary hybrid?

At first sight, this experiment seems to settle the question. If we get a dinosaur, then DNA is the primary, privileged information. The non-DNA is secondary. I suspect that this is what most ‘genetic determinists’ would expect. If we get a bird, then the reverse is true (this is highly unlikely in my or anyone else's view). If we get a hybrid, or nothing (I suspect that this would be the most likely outcome), we could maintain a view of DNA primacy by simply saying that there is, from the DNA's point of view, a fault in the egg cell machinery. But note the phrase ‘DNA's point of view’ in that sentence. It already gives the DNA primacy and so begs the question.

The questions involved in such experiments are important. Cross-species clones are of practical importance as a possible source of stem cells. They could also reveal the extent to which egg cells are species specific. This is an old question. Many early theories of what was called ‘cytoplasm inheritance’ were eventually proved wrong (Mayr 1982), though Mayr notes that ‘The old belief that the cytoplasm is important in inheritance … is not dead, although it has been enormously modified.’ I suspect that the failure of most cross-species clones to develop to the adult stage is revealing precisely the extent to which ‘the elaborate architecture of the cytoplasm plays a greater role than is now realized’ (Mayr 1982). Since we cannot have the equivalent of mutations in the case of the non-DNA inheritance, using different species may be our only route to answering the question.

Interspecies cloning has already been attempted, though not with extinct animals. About a decade ago, J. B. Cibelli of Michigan State University tried to insert his own DNA into a cow egg cell and even patented the technique. The experiment was a failure and ethically highly controversial. Cibelli has since failed to clone monkey genes in cow's eggs. The only successful case is that of a wild ox (the banteng, Bos javanicus) cloned in domestic cow's eggs. The chances are that the technique will work only on very closely related species. At first sight, a banteng looks very much like a cow and some have been domesticated in the same way. More usually, interspecies clones fail to develop much beyond the early embryo.

But however interesting these experiments are, they are misconceived as complete answers to the question I am raising. Genomes and cells have evolved together (Maynard Smith & Szathmáry 1995). Neither can do anything without the other. If we got a dinosaur from the imagined experiment, we would have to conclude that dinosaur and bird egg cells are sufficiently similar to make that possible. The difference (between birds and dinosaurs) would then lie in the DNA not in the rest of the egg cell. Remember that eukaryotic cells evolved aeons before dinosaurs and birds and so all cells necessarily have much of their machinery in common. But that difference does not give us grounds for privileging one set of information over the other. If I play a PAL video tape on a PAL reading machine, surely, I get a result that depends specifically on the information on the tape, and that would work equally well on another PAL reader, but I would get nothing at all on a machine that does not read PAL coding. The egg cell in our experiment still ensures that we get an organism at all, if indeed we do get one, and that it would have many of the characteristics that are common between dinosaurs and birds. The egg cell inheritance is not limited merely to the differences we find. It is essential for the totality of what we find. Each and every high-level function depends on effects attributable to both the DNA and the rest of the cell. ‘Studying biological systems means more than breaking the system down into its components and focusing on the digital information encapsulated in each cell’ (Neuman 2007).

6. The ‘genetic differential effect problem’

This is a version of a more general argument relating to genes (defined here as DNA sequences) and their effects. Assignment of functions to genes depends on observing differences in phenotype consequent upon changes (mutations, knockouts, etc.) in genotype. Dawkins made this point very effectively when he wrote ‘It is a fundamental truth, though it is not always realized, that whenever a geneticist studies a gene “for” any phenotypic character, he is always referring to a difference between two alleles’ (Dawkins 1982).

But differences cannot reveal the totality of functions that a gene may be involved in, since they cannot reveal all the effects that are common to the wild and mutated types. We may be looking at the tip of an iceberg. And we may even be looking at the wrong tip, since we may be identifying a gene through the pathological effects of just one of its mutations rather than by the function for which it must have been selected. This must be true of most so-called oncogenes, since causing cancer is unlikely to be a function for which the genes were selected. This is why the Gene Ontology (GO) Consortium excludes oncogenesis: ‘oncogenesis is not a valid GO term because causing cancer is not the normal function of any gene’. Actually, causing cancer could be a function if the gene concerned has other overwhelming beneficial effects. This is a version of the ‘sickle cell’ paradigm (Jones 1993, p. 219), and it is the reason why I do not accept that oncogenesis could never be a function of a gene: nature plays with balances of positive and negative effects of genes (see ‘Faustian pacts with the devil’; Noble 2006, p. 109).

Identifying genes by differences in phenotype correlated with those in genotype is therefore hazardous. Many, probably most, genetic modifications are buffered. Organisms are robust. They have to be to have succeeded in the evolutionary process. Even when the function of the gene is known to be significant, a knockout or mutation may not reveal that significance. I will refer to this problem as the genetic differential effect problem. My contention is that it is a very severe limitation in unravelling the causal effects of genes. I will propose a solution to the problem later in this paper.
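The problem can be made concrete with a toy model, assuming two hypothetical genes (`gene_a` and `gene_b`) that can each supply the same substrate, and a phenotype that saturates once one unit of substrate is available. This is an illustrative sketch of buffering by redundancy, not a model of any real pathway; the names and the saturation rule are assumptions.

```python
# Hypothetical toy model: either gene alone supplies enough substrate,
# so the phenotype is buffered against the loss of one of them.
REDUNDANT_GENES = ("gene_a", "gene_b")


def phenotype(genes_present: set) -> float:
    """Phenotype level: substrate supply, capped at 1.0 (saturation)."""
    supply = float(sum(1 for gene in REDUNDANT_GENES if gene in genes_present))
    return min(supply, 1.0)


def knockout_effect(gene: str, genotype: set) -> float:
    """Difference-based estimate of a gene's role:
    phenotype with the gene minus phenotype without it."""
    return phenotype(genotype) - phenotype(genotype - {gene})


if __name__ == "__main__":
    wild_type = {"gene_a", "gene_b"}
    print(knockout_effect("gene_a", wild_type))   # 0.0: no visible effect
    print(knockout_effect("gene_b", wild_type))   # 0.0: no visible effect
    print(phenotype(wild_type) - phenotype(set()))  # 1.0: together essential
```

Judged by single knockouts, neither gene appears to have any function; only the double knockout, or a model of the whole pathway, reveals that both contribute to the wild-type phenotype.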

It is also important to remember that large numbers (hundreds or more) of genes are involved in each and every high-level function and that, at that level, individual genes are involved in many functions. We cannot assume that the first phenotype–genotype correlation we found for a given gene is its only or even its main function.

7. Problems with the central dogma

The video reader is a good analogy so far as it goes in emphasizing that the reading machinery must be compatible with the coding material, but it is also seriously limited in the present context. It is best seen as an analogy for the view of those who accept an extension of the central dogma of biology: that information passes from the coded material to the rest of the system but not the other way. What we now know of epigenetics requires us to modify that view. The cell machinery does not just read the genome. It imposes extensive patterns of marking and expression on the genome (Qiu 2006). This is what makes the precise result of our imagined experiment so uncertain. According to the central dogma, if the egg cell is compatible, we will automatically get a dinosaur, because the DNA dictates everything. If epigenetic marking is important, then the egg cell also plays a determining, not a purely passive, role. There are therefore two kinds of influence that the egg cell exerts. The first is that it is totally necessary for any kind of organism at all to be produced. It is therefore a primary ‘genetic cause’ in the sense that it is essential to the production of the phenotype and is passed on between the generations. The second is that it exerts an influence on what kind of organism we find. It must be an empirical question to determine how large the second role is. At present, we are frustrated in trying to answer that question by the fact that virtually no cross-species clones develop into adults. As I have already noted, that result itself suggests that the second role is important.

It would also be an interesting empirical question to determine the range of species across which the egg cell machinery is sufficiently similar to enable different genomes to work, but that would tell us about how well different genomes match the egg cells of different species, and about their mutual compatibility in enabling development, not about the primacy or otherwise of DNA or non-DNA inheritance. In all cases, the egg cell machinery is as necessary as the DNA. And, remember, as ‘information’ it is also vast.

Note also that what is transferred in cross-species cloning experiments is not just the DNA. Invariably, the whole nucleus is inserted, with all its machinery (Tian et al. 2003). If one takes the contribution of the egg cell seriously, that is a very serious limitation. The nucleus also has a complex architecture in addition to containing the DNA, and it must be full of transcription factors and other molecules that influence epigenetic marking. Strictly speaking, we should be looking at the results of inserting the raw DNA into a genome-free nucleus of an egg cell, not at inserting a whole nucleus, or even just the chromosomes, into an enucleated egg cell. No one has yet done that. And would we have to include the histones that mediate many epigenetic effects? This is one of the reasons, though by no means the only one, why the dinosaur cloning experiment may be impossible.

To conclude this section, if by genetic causation we mean the totality of the inherited causes of the phenotype, then it is plainly incorrect to exclude the non-DNA inheritance from this role, and it probably does not make much sense to ask which is more important, since only an interaction between DNA and non-DNA inheritance produces anything at all. Only when we focus more narrowly on changes in phenotype attributable to differences in genotype (which is how functionality of genes is currently assessed) could we plausibly argue that it is all down to the DNA, and even that conclusion is uncertain until we have carried out experiments that may reveal the extent to which egg cells are species specific, since nuclear DNA marking may well be very important.

8. Genetic programs?

Another analogy that has come from comparison between biological systems and computers is the idea of the DNA code being a kind of program. This idea was originally introduced by Monod & Jacob (1961) and a whole panoply of metaphors has now grown up around their idea. We talk of gene networks, master genes and gene switches. These metaphors have also fuelled the idea of genetic (DNA) determinism.

But there are no purely gene networks! Even the simplest example of such a network—that discovered to underlie circadian rhythm—is not a gene network, nor is there a gene for circadian rhythm. Or, if there is, then there are also proteins, lipids and other cellular machinery for circadian rhythm.

The circadian rhythm network involves at least three other types of molecular structures in addition to the DNA code. The stretch of DNA called the period gene (per) codes for a protein (PER) that builds up in the cell cytoplasm as the cellular ribosome machinery makes it. PER then diffuses slowly through the nuclear (lipid and protein) membrane to act as an inhibitor of per expression (Hardin et al. 1990). The cytoplasmic concentration of PER then falls, and the inhibition is slowly removed. Under suitable conditions, this process takes approximately 24 hours. It is the whole network that has this 24 hour rhythm, not the gene (Foster & Kreitzman 2004). However else this network can be described, it is clearly not a gene network. At the least, it is a gene–protein–lipid–cell network. It does not really make sense to view the gene as operating without the rest of the cellular machinery. So, if this network is part of a ‘genetic program’, then the genetic program is not a DNA program. It does not lie within the DNA coding. Moreover, as Foster & Kreitzman emphasized, there are many layers of interactions overlaid onto the basic mechanism—so much so that it is possible to knock out the CLOCK gene in mice and retain circadian rhythm (Debruyne et al. 2006). I prefer therefore to regard the DNA as a database rather than as a program (Atlan & Koppel 1990; Noble 2006). What we might describe as a program uses that database, but is not controlled by it.
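The point that the rhythm belongs to the loop rather than to any one component can be illustrated with a generic delayed negative-feedback equation. This is a minimal sketch, not the model of Hardin et al. (1990): P loosely stands for PER, the Hill-type repression term and every parameter value are assumptions, time units are arbitrary (nothing here is tuned to 24 hours), and the single delay lumps together the synthesis and transport steps described above.

```python
def simulate_feedback(delay, production=10.0, decay=1.0, hill=4,
                      t_end=60.0, dt=0.01):
    """Euler integration of
        dP/dt = production / (1 + P(t - delay)**hill) - decay * P(t).

    P is a repressor that inhibits its own synthesis after a delay.
    P is taken as zero for all t <= 0. All quantities are in arbitrary units.
    """
    steps = int(round(t_end / dt))
    lag = int(round(delay / dt))
    p = [0.0] * (steps + 1)
    for i in range(steps):
        delayed = p[i - lag] if i >= lag else 0.0   # P(t - delay)
        synthesis = production / (1.0 + delayed ** hill)
        p[i + 1] = p[i] + dt * (synthesis - decay * p[i])
    return p


def late_amplitude(trace):
    """Peak-to-trough range over the final third, after transients."""
    tail = trace[2 * len(trace) // 3:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    # Same components, no delay: settles to a steady level (tiny amplitude).
    print(late_amplitude(simulate_feedback(delay=0.0)))
    # Same components with a delay in the loop: sustained oscillation.
    print(late_amplitude(simulate_feedback(delay=2.0)))
```

With the delay set to zero, the same synthesis and decay terms settle to a steady level; with a delay of 2 time units, the trace keeps oscillating. The oscillation is therefore a property of the whole loop, including the steps that take time, rather than of the repression term on its own.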

The plant geneticist Coen (1999) goes even further. I will express his point in my own way, but I would like to acknowledge his ideas and experiments as a big influence on my thinking about this kind of question. In the early days of computing, during the period in which Monod & Jacob (1961) developed their idea of le programme génétique, a program was a set of instructions separate from the functionality it serves. The program was a complete piece of logic, a set of instructions, usually stored on cards or tapes, that required data to work on and outputs to produce. Pushing this idea in relation to the DNA/non-DNA issue, we arrive at the idea that there is a program in the DNA, while the data and output are the rest: the cell and its environment. Jacob was quite specific about the analogy: ‘The programme is a model borrowed from electronic computers. It equates the genetic material with the magnetic tape of a computer’ (Jacob 1982). That analogy is what leads people to talk of the DNA ‘controlling’ the rest of the organism.

Coen's point is that there is no such distinction in biological systems. As we have seen, even the simplest of the so-called gene networks are not ‘gene programs’ at all. The process is the functionality itself. There is no separate program. I see similar conclusions in relation to my own field of heart rhythm. There is no heart rhythm program (Noble 2008, in press), and certainly not a heart rhythm genetic program, separate from the phenomenon of heart rhythm itself. Surely, we can refer to the functioning networks of interactions involving genes, proteins, organelles, cells, etc. as programs if we really wish to. They can also be represented as carrying out a kind of computation (Brenner 1998), in the original von Neumann sense introduced in his theory of self-reproducing machines. But if we take this line, we must still recognize that this computation does not tell something else to carry out the function. It is itself the function.

Some will object that computers are no longer organized in the way they were in the 1960s. Indeed not, and the concept of a program has developed to the point at which distinctions between data and instructions, and even the idea of a separate logic from the machine itself, may have become outdated. Inasmuch as this has happened, it seems to me that such computers are getting a little closer to the organization of living systems.

Not only is the period gene not the determinant of circadian rhythm, either alone or as part of a pure gene network, but it could also be argued that it is incorrect to call it a ‘circadian rhythm’ gene. Or, if it is, then it is also a development gene, for it is used in the development of the fly embryo. And it is a courtship gene! It enables male fruitflies to sing (via their wing-beat frequencies) to females of the correct species of fruitfly (more than 3000 such species are known). Genes, in the sense of stretches of DNA, are therefore like pieces of reusable Lego. That is, in principle, why there are relatively few genes compared with the vast complexity of biological functions. Needless to say, human courtship uses other genes! And all of those will be used in many other functions. My own preference would be to cease using high-level functionality for naming genes (meaning here DNA sequences), but I realize that this is now a lost cause. The best we can do is to poke fun at such naming, which is why I like the Fruit Fly Troubadour Gene story (Noble 2006, p. 72).

9. Higher-level causation

I have deliberately couched the arguments so far in molecular terms because I wish to emphasize that the opposition to simplistic gene determinism, gene networks and genetic programs is not based only on the distinction between higher- and lower-level causation. Additional factors must also be taken into account as a consequence of multilevel interactions.

The concept of level is itself problematic. It is a metaphor, and a very useful one in biology. Thus, there is a sense in which a cell, an organ or an immune system, for example, is much more than its molecular components. In each of these cases, the molecules are constrained to cooperate in the functionality of the whole. Constrained by what? A physicist or an engineer would say that the constraints do not lie in the laws governing the behaviour of the individual components: the same quantum mechanical laws will be found in biological molecules as in molecules not forming part of a biological system. The constraints lie in the boundary and initial conditions: ‘organisation becomes cause in the matter’ (Strohman 2000; Neuman 2006). These conditions, in turn, are constrained by what? Well, ultimately by billions of years of evolution. That is why I have used the metaphor of evolution as the composer (Noble 2006, ch. 8). But that metaphor is itself limited. There may have been no direction to evolution (but for arguments against this strict view, see Jablonka & Lamb 2005). We are talking of a set of historical events, even of historical accidents. The information that is passed on through downward causation is precisely this set of initial and boundary conditions, without which we could not even begin to integrate the equations representing molecular causality.
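A deliberately simple numerical sketch can show how the same law produces different behaviour under different conditions. The first-order exchange law, the rate constant `k`, and the functions `rate` and `integrate` are hypothetical and chosen only for illustration. They do not model any particular molecular process. The rate law is identical in every run. Even so, the trajectory cannot be computed at all until an initial value and a boundary value are supplied, and changing either one changes the outcome.

```python
# A minimal sketch, assuming a hypothetical first-order exchange law
# dx/dt = -k * (x - x_boundary). The law (the rate function) never changes;
# the initial value x0 and the boundary value x_boundary come from outside it.


def rate(x, k, x_boundary):
    # The "law": identical in every run below.
    return -k * (x - x_boundary)


def integrate(x0, x_boundary, k=0.5, dt=0.01, steps=200):
    # Forward-Euler integration up to t = steps * dt. It cannot take a single
    # step without x0 (initial condition) and x_boundary (boundary condition).
    x = x0
    for _ in range(steps):
        x += dt * rate(x, k, x_boundary)
    return x


# Same law, three sets of conditions, three different states at t = 2.
for x0, xb in [(0.0, 1.0), (5.0, 1.0), (0.0, 3.0)]:
    print(f"x0={x0}, boundary={xb} -> x(2)={integrate(x0, xb):.3f}")
```

Forward Euler is used here only because it is transparent. Any standard ODE solver would make the same point.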

To spell this out for the circadian rhythm process: these conditions determine the cytoplasm volume in which the concentration of the protein changes, the speed with which it crosses the nuclear membrane, the speed with which ribosomes make new protein and so on. And those characteristics will have been selected by the evolutionary process to give a roughly 24-hour rhythm. Of course, no individual molecule in this process ‘knows’ or represents such information, but the ensemble of molecules does. It behaves differently from the way in which it would behave if the conditions were different or if they did not exist at all. This is the sense in which molecular events are different as a consequence of the life process. Moreover, the boundary and initial conditions are essentially global properties, identifiable at the level at which they can be said to exist.
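A hypothetical calculation shows how one such cell-level condition, the volume in which a protein accumulates, can change the timing of a molecular event while the molecular rates stay fixed. The sketch assumes a protein that is synthesized at a constant rate, degraded by first-order kinetics, and starts from zero. The function `time_to_threshold`, its parameters and all the numbers are illustrative assumptions. This is not a model of a circadian clock.

```python
import math


def time_to_threshold(synthesis_rate, degradation_rate, volume, threshold):
    """Time for a protein to reach a threshold concentration from zero.

    Hypothetical kinetics: dC/dt = s / V - k * C with C(0) = 0, whose solution
    is C(t) = C_ss * (1 - exp(-k t)) with C_ss = s / (V * k).
    Returns math.inf if the steady state never reaches the threshold.
    """
    c_ss = synthesis_rate / (volume * degradation_rate)
    if threshold >= c_ss:
        return math.inf
    return -math.log(1.0 - threshold / c_ss) / degradation_rate


# Same molecular rates, different (hypothetical) cell volumes.
for v in (1.0, 2.0, 10.0):
    print(f"volume={v}: time to threshold = {time_to_threshold(10.0, 0.1, v, 20.0):.3f}")
```

Doubling the volume delays the threshold crossing. In the largest volume the threshold is never reached. In each case the molecular rates themselves are unchanged.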

What is metaphorical here is the notion of ‘up and down’ (Noble 2006, ch. 10)—it would be perfectly possible to turn everything conceptually upside down so that we would speak of upward causation instead of downward causation. The choice is arbitrary, but important precisely because the principle of reductionism is always to look for ‘lower-level’ causes. That is the reductionist prejudice and it seems to me that it needs justification; it is another way in which we impose our view on the world.

Although the concept of level is metaphorical, it is nevertheless an essential basis for the idea of multilevel causation. The example I often give is that of pacemaker rhythm, which depends on another global property of cells, i.e. the electrical potential, influencing the behaviour of the individual proteins, the ionic channels, which in turn determine the potential. There is a multilevel feedback network here: channels→ionic current→electrical potential→channel opening or closing→ionic current and so on. This cycle is sometimes called the Hodgkin cycle, since it was Alan Hodgkin who originally identified it in the case of nerve excitation (Hodgkin & Huxley 1952).
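The Hodgkin cycle can be sketched as a toy membrane with a single voltage-gated channel type plus a leak. This is not the Hodgkin & Huxley (1952) model. The one-gate channel, the parameter values and the function names are assumptions made purely for illustration. In the loop, the potential sets the channels' open fraction, the open channels carry current, and the current changes the potential. Depending only on the starting potential, the loop either settles near rest or regenerates to a depolarized state.

```python
import math

# Hypothetical single-channel-type membrane; all parameters are illustrative.
C_M = 1.0                 # membrane capacitance
G_CH = 1.0                # maximal channel conductance
E_CH = 50.0               # reversal potential of the channel current (mV)
G_LEAK = 0.1              # leak conductance
E_LEAK = -80.0            # leak reversal potential (mV)
V_HALF, SLOPE, TAU_N = -40.0, 5.0, 1.0  # gating: half-activation (mV), slope (mV), time constant (ms)


def n_inf(v):
    # Potential -> channel opening: steady-state open fraction rises with v.
    return 1.0 / (1.0 + math.exp(-(v - V_HALF) / SLOPE))


def derivatives(v, n):
    i_ch = G_CH * n * (v - E_CH)      # channels -> ionic current
    i_leak = G_LEAK * (v - E_LEAK)
    dv = -(i_ch + i_leak) / C_M       # ionic current -> potential
    dn = (n_inf(v) - n) / TAU_N       # potential -> gating
    return dv, dn


def simulate(v0, n0=0.0, dt=0.01, t_end=50.0):
    v, n = v0, n0
    for _ in range(int(t_end / dt)):
        dv, dn = derivatives(v, n)    # both computed from the same state
        v, n = v + dt * dv, n + dt * dn
    return v, n


for v0 in (-70.0, -40.0):
    v_end, n_end = simulate(v0)
    print(f"start {v0} mV -> end {v_end:.1f} mV, open fraction {n_end:.3f}")
```

Nothing outside the loop instructs the channels. The outcome is a property of the channel-current-potential cycle as a whole.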

Similarly, we can construct feedback networks of causation for many other biological functions. I see the identification of the level at which such networks are integrated, i.e. the highest level involved in the network, as being a primary aim of systems biology. This will also be the lowest level at which natural selection can operate since it is high-level functionality that determines whether organisms live or die. We must shift our focus away from the gene as the unit of selection to that of the whole organism (Tautz 1992).
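One way to make ‘the level at which a feedback network is integrated’ operational is to represent causal interactions as a directed graph. We can then find the cycle that contains a given process and report the highest level among its members. Every particular here is hypothetical: the node names, the level labels assigned to them and the helper functions are illustrative assumptions. The graph encodes only the channel/current/potential cycle plus an upstream gene.

```python
# Hypothetical level ordering and causal graph (names are illustrative).
LEVELS = {"gene": 0, "protein": 1, "cell": 2}

NODE_LEVEL = {
    "channel gene": "gene",
    "channel protein": "protein",
    "channel gating": "protein",
    "ionic current": "protein",
    "membrane potential": "cell",
}

EDGES = {
    "channel gene": ["channel protein"],
    "channel protein": ["channel gating"],
    "channel gating": ["ionic current"],
    "ionic current": ["membrane potential"],
    "membrane potential": ["channel gating"],
}


def reachable(graph, start):
    # All nodes reachable from start by following at least one edge.
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def feedback_loop(graph, node):
    # Nodes sharing a cycle with node (its strongly connected component).
    return {m for m in reachable(graph, node) if node in reachable(graph, m)}


def integration_level(graph, node):
    # Highest level among the members of node's feedback loop, if any.
    loop = feedback_loop(graph, node)
    if not loop:
        return None
    return max((NODE_LEVEL[m] for m in loop), key=LEVELS.__getitem__)


print(integration_level(EDGES, "ionic current"))
print(integration_level(EDGES, "channel gene"))
```

The reachability test is quadratic in the number of nodes. That cost is fine for a sketch. A larger network would call for a linear-time strongly-connected-components algorithm such as Tarjan's.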

But I also have hesitations about such language using the concepts of levels and causation. My book, in its last chapter, recommends throwing all the metaphors away once we have used them to gain insight (Noble 2006, ch. 10). In the case of the cycles involving downward causation, my hesitation arises because such language can make the causation involved appear sequential in time. I do not see this as being the case. In fact, the cell potential influences the protein kinetics at exactly the same time as the protein kinetics influence the cell potential. Neither is primary or privileged as a causal agent, either in time or in space. This fact is evident in the differential equations we use. The physical laws represented in the equations themselves, and the initial and boundary conditions, operate at the same time (i.e. during every integration step, however infinitesimal), not sequentially.
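For a generic membrane model, this simultaneity can be written out explicitly. The equations below are a schematic form, not any specific published model. Here $V$ is the membrane potential, $y_i$ are channel gating variables, $I_i$ are the corresponding currents and $f_i$ their kinetics. An explicit Euler step shows that both updates are computed from the same state.

```latex
C_m \frac{dV}{dt} = -\sum_i I_i(V, y_i), \qquad
\frac{dy_i}{dt} = f_i(V, y_i)

V(t+\Delta t) = V(t) - \frac{\Delta t}{C_m} \sum_i I_i\big(V(t), y_i(t)\big), \qquad
y_i(t+\Delta t) = y_i(t) + \Delta t \, f_i\big(V(t), y_i(t)\big)
```

As $\Delta t \to 0$, neither update precedes the other. Neither can be computed at all until the initial values $V(0)$ and $y_i(0)$ have been supplied.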

This kind of conceptual problem (causality is one of our ways of making sense of the world, not the world's gift to us) underlies some knotty problems in thinking about such high-level properties as intentionality. As I show in The Music of Life (Noble 2006, ch. 9), looking for neural or, even worse, genetic ‘causes’ of an intention is chasing a will-o’-the-wisp. I believe that this is the reason why the concept of downward causation may play a fundamental role in the philosophy of action (intentionality, free will, etc.).

I am also conscious of the fact that causality in any particular form does not need to be a feature of all successful scientific explanations. General relativity theory, for example, changes the nature of causality by replacing movement in space with geodesics in the structure of space–time. At the least, that example shows that a process that requires one form of causality (gravity acting at a distance between bodies) in one theoretical viewpoint can be seen from another viewpoint to be unnecessary. Moreover, there are different forms of causality, ranging from proximal causes (one billiard ball hitting another) to ultimate causes of the kind that evolutionary biologists seek in accounting for the survival value of biological functions and features. Genetic causality is a particularly vexed question, not only because the concept of a gene has become problematic, as we have seen in this paper, but also because genes are not usually proximal causes. Genes, as we now define them in molecular biological terms, lie a long way from their phenotypic effects, which are exerted through many levels of biological organization and subject to many influences from both those levels and the environment. We do not know what theories are going to emerge in the future to cope with the phenomenon of life. But we can be aware that our ways of viewing life are almost certainly not the only ones. It may require a fundamental change of mindset to provoke us to formulate new theories. I hope that this paper will contribute to that change.

10. Unravelling genetic causation: the solution to the genetic differential effect problem

Earlier in this paper, I referred to this problem and promised a solution. The problem arises as an inherent difficulty in the ‘forward’ (reductionist) mode of explanation. The consequences of manipulating the lowest end of the causal chain, the genes, can be hidden by the sheer ability of organisms to compensate for genetic mistakes and problems. Modern geneticists call this genetic buffering; earlier biologists would call it redundancy, or back-up mechanisms that kick in to save the functionality. The solution is not to rely solely on the forward mode of explanation. The backward mode is sometimes referred to as reverse engineering. The principle is that we start the explanation at the higher, functional level, using a model that incorporates the forward-mode knowledge but, crucially, also incorporates higher-level insights into functionality. For example, if we can successfully model the interactions between all the proteins involved in cardiac rhythm, we can then use the model to assess qualitatively and quantitatively the contribution that each gene product makes to the overall function. That is the strength of reverse engineering: we are no longer dealing just with differences. If the model is good, we are dealing with the totality of the gene function within the process we have modelled. We can even quantify the contribution of a gene product whose effect may be largely or even totally buffered when the gene is manipulated (see Noble 2006, p. 108). This is the reason why higher-level modelling of biological function is an essential part of unravelling the functions of genes: ‘Ultimately, in silico artificial genomes and in vivo natural genomes will translate into each other, providing both the possibility of forward and reverse engineering of natural genomes’ (Werner 2005).
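A toy calculation shows how buffering can hide a contribution from the ‘differential’ view while a model of the intact system still exposes it. Everything here is a hypothetical assumption:

- the output is taken to be a weighted sum of gene-product contributions;
- buffering is represented by rescaling the surviving products until a set point is restored;
- the names `product_a` and `product_b`, and all numbers, are invented for illustration.

```python
from typing import Dict

# Hypothetical set point of the modelled function, arbitrary units.
TARGET = 100.0


def function_output(levels: Dict[str, float], efficacy: Dict[str, float]) -> float:
    # Forward model: each gene product contributes level * efficacy.
    return sum(levels[g] * efficacy[g] for g in levels)


def knock_out_with_buffering(levels, efficacy, gene):
    # Delete one product, then scale the survivors until the target is restored
    # (an assumed stand-in for genetic buffering).
    survivors = {g: x for g, x in levels.items() if g != gene}
    scale = TARGET / function_output(survivors, efficacy)
    return {g: x * scale for g, x in survivors.items()}


def contributions(levels, efficacy):
    # Reverse-engineering view: fraction of the intact output due to each product.
    total = function_output(levels, efficacy)
    return {g: levels[g] * efficacy[g] / total for g in levels}


efficacy = {"product_a": 1.0, "product_b": 2.0}
intact = {"product_a": 40.0, "product_b": 30.0}
knockout = knock_out_with_buffering(intact, efficacy, "product_a")

# Differential view: the knockout appears to make no difference.
print(function_output(intact, efficacy), function_output(knockout, efficacy))
# Model view: product_a nevertheless carries a share of the intact function.
print(contributions(intact, efficacy))
```

The knockout leaves the output unchanged. Yet in the intact model, `product_a` accounts for 40% of it. Real buffering is rarely this complete or this simple; the sketch isolates only the logical point.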

11. Conclusions

The original notion of a gene was closely linked to the causes of particular phenotype characteristics, so questions about causal relationships between genes and phenotype were circular and made little sense. The question of causality has become acute because genes are now identified more narrowly with particular sequences of DNA. The problem is that these sequences are uninterpretable outside the cellular context in which they can be read and so generate functionality. But that means that the cell is also an essential part of the inheritance and therefore was, implicitly at least, a part of the original definition of a gene. Depending on how we quantify the comparison between the contributions, it may even be the larger part. Genetic information is not confined to the digital information found in the genome. It also includes the analogue information in the fertilized egg cell. If we were ever to send out through space in an Earth-life capsule the information necessary to reconstruct life on Earth on some distant planet, we would have to include both forms of information. Now that we can sequence whole genomes, the difficult part would be encoding the information about the cell. As Sydney Brenner has said, ‘I believe very strongly that the fundamental unit, the correct level of abstraction, is the cell and not the genome’ (Lecture to Columbia University in 2003). This fundamental insight has yet to be adopted by the biological science community in a way that will ensure success in unravelling the complexity of interactions between genes and their environment. In particular, the power of reverse engineering using mathematical models of biological function to unravel gene function needs to be appreciated. Multilevel systems biology requires a more sophisticated language when addressing the relationships between genomes and organisms.


Work in the author's laboratory is supported by EU FP6 BioSim network, EU FP7 PreDiCT project, BBSRC and EPSRC. I would like to acknowledge valuable discussions with Jonathan Bard, John Mulvey, James Schwaber, Eric Werner and the critical comments of the referees.


  • One contribution of 12 to a Theme Issue ‘The virtual physiological human: building a framework for computational biomedicine I’.

