Modern computing products are among the most complex engineering artefacts yet created. For example, Microsoft's Windows operating system has around 100 million lines of code, predicted to grow by 33% a year. Intel's Itanium 2 processor has around 400 million transistors, with Moore's law predicting a doubling in that number every 18 months. Ensuring the correctness of such large and complex systems demands ever-increasing resources: verification is claimed to take 50% of the cost of chip design, rising to 70% in some companies. The damage to users and to the companies themselves caused by widely publicised flaws, such as the Pentium division bug or Internet Explorer security loopholes, means that companies like Intel and Microsoft are devoting increased attention to verification in both their research and product divisions.

These techniques build on years of earlier research into appropriate mathematical theories for modelling processes and devices (Jones 2003). Pioneers like von Neumann, Goldstine and Turing in the 1940s understood that computing could be viewed as a branch of logic; in the 1960s Hoare and Floyd developed logics for reasoning about assertions and programs; in the 1970s Scott and Strachey developed the rigorous notions of semantics that allowed us to understand what a program did independently of the particular machine architecture it ran on, and Milner laid the foundations of the theories we need to study distributed and interacting processes. The influence of this work is seen in the design of modern programming languages like Java and in the take-up of logic-based verification techniques: today, articles in trade magazines like EE Times routinely mention ‘assertions’ or ‘model checking’, which until a few years ago were the preserve of academic specialists.

Effective machine support is essential if these ideas are to be applied in practice (MacKenzie 2001). Milner and Gordon were among the first to develop the theorem-proving tools that made it practical to apply these theoretical ideas to obtain correctness proofs in domains where calculation or verification by hand would be totally infeasible. These tools build up formal proofs from axioms and rules, and are particularly valuable when large numbers of rather similar results need to be verified, since useful tactics for semi-automation may be devised even if the underlying theory is undecidable. Clarke devised model checking, which provides counterexamples when such correctness proofs fail and has proved particularly useful for systems modelled as finite-state automata.
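Model checking in its simplest form can be conveyed in a few lines: exhaustively explore the reachable states of a finite-state model and, if a ‘bad’ state is found, return the trace leading to it as a counterexample. The following Python sketch (an illustrative toy of my own, nothing like a production checker) does this by breadth-first search:

```python
from collections import deque

def check_safety(initial, transitions, is_bad):
    """Breadth-first search over a finite-state system.

    Returns None if no reachable state satisfies is_bad, otherwise
    a shortest trace (list of states) ending in a bad state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            # Reconstruct the counterexample trace by walking back
            # through the BFS parent pointers.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return list(reversed(trace))
        for nxt in transitions(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Toy model: a counter modulo 8 that is never supposed to reach 5.
trace = check_safety(0, lambda s: [(s + 1) % 8], lambda s: s == 5)
print(trace)  # [0, 1, 2, 3, 4, 5]
```

Industrial checkers differ enormously in scale, using symbolic representations and abstraction to make astronomically large state spaces tractable, but the counterexample trace returned on failure is precisely the feature that makes the technique so useful to engineers.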

Current applications of theorem proving are far from the early pioneers' dream of running fully verified software on fully verified hardware: the companies involved are making pragmatic business decisions about how best to incorporate useful techniques into their existing, well-developed design infrastructure. For example, at Microsoft, Ball and others (Ball *et al*. 2004) have developed theorem provers and model checkers to verify device drivers—the tricky code that makes peripherals, like monitors or printers, work correctly (usually). Windows contains thousands of drivers. In their tests of 30 properties of the Microsoft Windows Parallel Port device driver, the prover was called 487 716 times; on average, each query contained 19 unique atoms and 40 instances of Boolean operators, while in the worst case one query contained 658 unique atoms and another 96 691 instances of Boolean operators. At Intel, Harrison (Aagaard & Harrison 2000) used theorem proving in a version of Gordon's HOL system to verify the floating-point division of the IA-64 against the IEEE standard. The IA-64 refines the hardware's division in software, and his work necessitated developing theories of real arithmetic and analysis in HOL in order to verify large numbers of special cases arising from edge effects on registers and the like; this in turn involved treating various Diophantine equations that determined where the bad cases lay. In both cases, once the underlying mathematical theories are established and the infrastructure set up in the theorem prover, it may be used over and over again to derive results about particular cases which are important for the matter at hand but unlikely to be of great mathematical significance.
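The flavour of the floating-point problem can be conveyed by a small executable check (a toy Python sketch of my own, in no way resembling the HOL development): the IEEE standard requires division to be correctly rounded, meaning no representable number lies strictly closer to the exact quotient than the returned result. Exact rational arithmetic lets us test this condition on particular cases:

```python
import math  # math.nextafter requires Python 3.9+
from fractions import Fraction

def correctly_rounded_division(a, b):
    """Return True if the float result of a / b is correctly rounded:
    neither neighbouring float is strictly closer to the exact quotient."""
    exact = Fraction(a) / Fraction(b)   # exact rational quotient
    result = a / b                      # IEEE double-precision division
    err = abs(Fraction(result) - exact)
    neighbours = (math.nextafter(result, math.inf),
                  math.nextafter(result, -math.inf))
    return all(abs(Fraction(n) - exact) >= err for n in neighbours)

# Spot checks on quotients with no exact binary representation.
for a, b in [(1.0, 3.0), (2.0, 7.0), (1e16, 9.0)]:
    assert correctly_rounded_division(a, b)
```

Harrison's task was of course the converse and vastly harder: proving, inside HOL, that the IA-64's software refinement of division meets this condition for *all* inputs, which is where the theories of real analysis and the Diophantine case analysis come in.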

This context provides a way of addressing questions about the different kinds of mathematical approach involved. As an analogy, consider applying the theory of dynamical systems to determine the stability of a fighter aircraft:

The theory of dynamical systems is well developed, building on the foundations of Newton and Leibniz to provide precise definitions and theorems, for example conditions for solutions to exist, to be stable and so on. Even so, this development took several hundred years with surprises along the way, for example Poincaré's discovery of chaotic phenomena.

To investigate a particular system, for example to establish that it is stable, we use practical engineering knowledge to determine an appropriate model. For established disciplines like aircraft design, our choice of model may be codified as fairly rigorous protocols and standards, based on the theories of dynamical systems and control engineering, which are well established as a lingua franca in the community.

To establish properties of the model we run tests on it, for example computing the eigenvalues of a linear system to verify stability, most likely with machine assistance in a program such as MATLAB to ensure accurate answers. The results of these tests are unlikely to be of general importance in the theory—they act as routine calculations.

If this process produces an unsatisfactory answer, it may arise from problems at any of the three stages: incorrect computation of the eigenvalues, a poor choice of model or, possibly though unlikely, a hitherto unnoticed flaw in the underlying theory of differential equations.
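The eigenvalue test in this analogy is routine enough to sketch directly. For a two-dimensional linear system x′ = Ax it needs only the characteristic polynomial; the following Python sketch (with example matrices of my own choosing) verifies stability by checking that every eigenvalue has negative real part:

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic
    polynomial lam^2 - (a + d)*lam + (a*d - b*c) = 0."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def is_stable(a, b, c, d):
    """Asymptotically stable iff every eigenvalue has negative real part."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(a, b, c, d))

# A damped oscillator x' = v, v' = -x - v is stable;
# flipping the sign of the damping term destroys stability.
print(is_stable(0, 1, -1, -1))  # True
print(is_stable(0, 1, -1, 1))   # False
```

A practitioner would of course reach for MATLAB or a numerical library for anything larger than a toy; the point is only that the test itself is a routine calculation, exactly as described above.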

The development of verification described above follows this pattern in outline, but by contrast we have a great variety of mathematical models and verification tools under consideration, and much less consensus about how or where to apply them, or about a unified approach to education and standards.

It seems to me that this should be viewed as a remarkable opportunity, rather than a weakness. The dream of some of the early workers in mechanised proof was that this would transform mathematics. It has not yet done so and maybe never will, but it has made possible the machine verification of routine yet unwieldy mathematical results that are needed to model computational phenomena (Martin 1999). By the standards of computer science, the development of classical mathematics is a slow process, yet it has a humbling depth and richness which we can aspire to in developing our present theories and the software tools that support them, so that they can bring to computational systems the power that calculus has brought to physics and engineering. By the standards of mathematics, modern computer science may seem to have identified a ridiculous variety of theories that may sometimes appear rather shallow, yet it has enormous challenges to meet in modelling, understanding and predicting next generation networks and devices. There is plenty of exciting work ahead for everyone.

## Footnotes

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof’.

- © 2005 The Royal Society