## Abstract

Progress in the foundations of mathematics has made it possible to formulate all thinkable mathematical concepts, algorithms and proofs in one language and in an impeccable way. This is not in spite of, but partially based on the famous results of Gödel and Turing. In this way statements are about mathematical objects *and* algorithms, proofs show the correctness of statements *and* computations, and computations are dealing with objects *and* proofs. Interactive computer systems for a full integration of defining, computing and proving are based on this. The human defines concepts, constructs algorithms and provides proofs, while the machine checks that the definitions are well formed and the proofs and computations are correct. Results formalized so far demonstrate the feasibility of this ‘computer mathematics’. There are also very good applications. The challenge is to make the systems more mathematician-friendly, by building libraries and tools. The eventual goal is to help humans to learn, develop, communicate, referee and apply mathematics.

## 1. The nature of mathematical proof

Proofs in mathematics have come to us from the ancient Greeks. The notion of proof is said to have been invented by Thales (*ca* 624–547 BC). For example, he demonstrated that the angles at the bases of any isosceles triangle are equal. His student Pythagoras (*ca* 569–475 BC) went on to prove, among other things, the theorem bearing his name. Pythagoras started a society of followers, half religious, half scientific, that continued after he passed away. Theodorus (465–398 BC), a member of the society, taught the philosopher Plato (428–347 BC) the irrationality of √3, √5, …, √17. Plato emphasized to his students the importance of mathematics, with its proofs that show non-obvious facts with a clarity such that everyone can understand them. In Plato's dialogue *Meno*, a slave was requested by Socrates (469–399 BC) to listen and answer, and together, using the maieutic method, they came to the insight that the size of the long side of an isosceles rectangular triangle is, in modern terminology, √2 times the size of the shorter side. Not much later the subject of mathematics had evolved to a sufficient degree that Plato's student Aristotle (384–322 BC) could reflect on this discipline. He described the *axiomatic method* as follows. Mathematics consists of objects and of valid statements. Objects are defined from previously defined objects; in order to be able to get started one has the *primitive objects*. Valid statements are *proved* from other such statements; in order to get started one has the *axioms*. Just a few decades later Euclid (*ca* 325–265 BC) wrote his monumental *Elements*, describing geometry in this axiomatic fashion. Besides that, the *Elements* contain the first important results in number theory (theory of divisibility, prime numbers, factorization) and even Eudoxos' (408–355 BC) account of treating ratios (which was later the inspiration for Dedekind (1831–1916) to give a completely rigorous description of the reals as cuts of rational numbers).

During the course of the history of mathematics, proofs increased in complexity. In particular, in the 19th century some proofs could no longer be followed easily by just any other capable mathematician: one had to be a specialist. This started what has been called the sociological validation of proofs. In disciplines other than mathematics the notion of *peer review* is quite common. Mathematics for the Greeks had the ‘democratic virtue’ that anyone (even a slave) could follow a proof. This changed somewhat after the complex proofs appeared in the 19th century that could only be checked by specialists. Nevertheless, mathematics kept developing, and with enough stamina one could decide to become a specialist in some area. Moreover, one did believe in the review by peers, although occasionally a mistake remained undiscovered for many years. This was the case, e.g., with the erroneous proof of the Four Colour Conjecture by Kempe (1879).

In the 20th century, this development went to an extreme. There is the complex proof of Fermat's Last Theorem by Wiles. At first the proof contained an error, discovered by Wiles himself, and later his new proof was checked by a team of twelve specialist referees.1 Most mathematicians have not followed in detail the proof of Wiles, but feel confident because of the sociological verification. Then there is the proof of the Classification of the Finite Simple Groups. This proof was announced in 1979 by a group of mathematicians led by Gorenstein. The proof consisted of a collection of connected results written down in various places, totalling 10 000 pages. The proof also relied on ‘well-known’ results, and it turned out that not all of these were valid. Work towards improving the situation has been performed, and in Aschbacher (2004) it is announced that at least this author believes in the validity of the theorem. Finally there are the proofs of the Four Colour Theorem, Appel & Haken (1977*a*,*b*) and Robertson *et al*. (1996), and of Kepler's Conjecture, Hales (in press). All these proofs use a long computation performed on a computer. (Actually Aschbacher (2004) believes that at present also the proof of the Classification of the Finite Simple Groups relies on computer performed computation.) The situation is summed up in table 1.

A very different development, defended by Zeilberger and others, consists of admitting proofs where the result is not 100% certain, but say 99.9999999999%. Examples of these proofs concern the primality of large numbers, see Miller (1976) and Rabin (1980).
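The flavour of such probabilistic proofs of primality can be illustrated with the Miller–Rabin test in the spirit of Miller (1976) and Rabin (1980). The sketch below is illustrative and not taken from those papers; each round detects a composite number with probability at least 3/4, so the error probability after `rounds` independent rounds is at most 4^(−rounds).

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin: if n is composite, each round catches it with
    probability >= 3/4, so `rounds` rounds leave an error probability
    of at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness of compositeness
    return True
```

With 40 rounds, the residual doubt is far smaller than the chance of a hardware error during the computation.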

In this situation the question arises whether there has been a necessary devaluation of proofs. One may fear that the quote of Benjamin Peirce (1809–1880) ‘Mathematics is the science which draws necessary conclusions’ may no longer hold. Scientific American even ventured an article called ‘The Death of Proof’, see Horgan (1993). The Royal Society Discussion Meeting ‘The Nature of Mathematical Proof’ (18–19 October 2004) had a more open-minded attitude and genuinely wanted to address the question. We will argue below that proofs remain alive and kicking and at the heart of mathematics. There is a sound methodology to ensure the full correctness of theorems with large proofs, even if they depend on complex computations, like the Four Colour Theorem, or on a sociological verification, like the Classification of the Finite Simple Groups.

### (a) Phenomenology

Where does the confidence provided by a proof in mathematics come from? When a student asks the teacher: ‘Sir, am I allowed to do this step?’, the answer we often give is: ‘When it is convincing, both for you and me!’ Mathematics is rightly considered the most exact science. It is not too widely known to outsiders that this certainty eventually relies on a mental judgement. Proofs and computations are indeed a warranty for the exactness of mathematics. But both proofs and computations need a judgement that the performed steps are correct and applicable. This judgement is based on a trained form of our intuition. For this reason Husserl (1901), and also Gödel (1995), and notably Bernays in Wang (1997, p. 337, 10.2.7), emphasize the phenomenological character of the act of doing mathematics.

### (b) Computation versus intuition

In Buddhist psychology one distinguishes discursive versus intuitive knowledge. In order to explain this a contemporary example may be useful. Knowing physics one can calculate the range of angles a bike rider may use while making a right turn. This is discursive knowledge; it does not enable someone to ride a bike. On the other hand, a person who knows how to ride a bike ‘feels’ the correct angles by intuition, but may not be able to compute them. Both forms of knowledge are useful and probably use different parts of our brain.

For the mental act of doing mathematics one may need some support. In fact, before the Greek tradition of proofs, there was the Egyptian–Chinese–Babylonian tradition of mathematics as the art of computing. Being able to use computational procedures can be seen as discursive knowledge. This aspect is often called the ‘algebraic’ side of mathematics. On the other hand, proofs often rely on our intuition. One speaks loosely about the intuitive ‘geometric’ side of mathematics.

A computation like 13 338×3 145 727=41 957 706 726 needs to be done on paper or by some kind of computer (unless we are an *idiot savant*; this computation is related to the famous ‘Pentium bug’ appearing in 1994). Symbolic manipulations, like multiplying numbers or polynomials, performing symbolic integrations and arbitrary other algebraic computations, may not be accompanied by intuition. Some mathematicians like to use their intuition, while others prefer algebraic operations. Of course, knowing both styles is best. In the era of Greek mathematics, the invention of proofs, with their compelling exactness, at first drew attention away from computations. Later, in the work of Archimedes (287–212 BC), both computations and intuition excelled.

The story repeated itself some two millennia later. The way in which Newton (1643–1727) introduced calculus was based on the solid grounds of Euclidean geometry. On the other hand, Leibniz (1646–1716) based his calculus on infinitesimals that had some dubious ontological status (do they really exist?). But Leibniz' algebraic approach did lead to many fruitful computations and new mathematics, as witnessed by the treasure of results by Euler (1707–1783). Infinitesimals did lead to contradictions. But Euler was clever enough to avoid these. It was only after the foundational work of Cauchy (1789–1857) and Weierstrass (1815–1897) that full rigour could be given to the computational way of calculus. That was in the 19th century and mathematics bloomed as never before, as witnessed by the work of Gauss (1777–1855), Jacobi (1804–1851), Riemann (1826–1866) and many others.

During the last third of the 20th century the ‘schism’ between computing and proving reoccurred. Systems of computer algebra, being good at symbolic computations, were at first introduced for applications of mathematics in physics: a pioneering system is Schoonschip, see Veltman (1967), which helped win a Nobel prize in physics. Soon they became useful tools for pure mathematics. Their drawback is that the systems contain bugs and cannot state logically necessary side-conditions for the validity of the computations. On the other hand, systems for proof-checking on a computer have been introduced, the pioneer being Automath of de Bruijn, see Nederpelt *et al*. (1994). These systems are able to express logic and hence necessary side conditions, but at first they were not good at making computations. The situation is changing now, as will be seen below.

### (c) Computer science proofs

Programs are elements of a formal (i.e. precisely defined) language and thereby become mathematical objects. It was pointed out by Turing (1949) that one needs a proof to show that a program satisfies some desired properties. This method was refined and perfected by Floyd (1967) and Hoare (1969). Not all software has been specified, let alone proven correct, as it is often hard to know exactly what one wants from it. But for parts of programs, and for some complete programs that are small but vital (like protocols), proofs of correctness have been given. The methodology of (partially) specifying software and proving that the required property holds for the program is called ‘Formal Methods’.

Proofs of the correctness of software are often long and boring, relying on nested case distinctions, in contrast to proofs in mathematics, which usually have more depth. Therefore the formal methods ideal seemed to fail: who would want to verify the correctness proofs if they were longer than the program itself and utterly uninspiring? Below we will see that this situation, too, has changed.

## 2. Foundations of mathematics

A foundation for mathematics requires a formal language in which one can express mathematical statements, and a system of derivation rules with which one can prove some of these statements. In order to classify the many objects that mathematicians have considered, an ‘ontology’, describing the ways in which collections of interest can be defined, comes in handy. This will be provided by set theory or type theory. Finally, one also needs a model of computation in which algorithms performed by humans can be represented in one way or another. In other words, one needs logic, ontology and computability.

### (a) Logic

Not only did Aristotle describe the axiomatic method, he also started the quest for logic. This is the endeavour to chart the logical steps needed in mathematical reasoning. He started a calculus for deductions. The system was primitive: not all connectives (and, or, implies, not, for all, exists) were treated, only monadic predicates, like *P*(*n*) being ‘*n* is a prime number’, and not binary ones, like *R*(*n*, *m*) being ‘*n*<*m*’, were considered. Nevertheless, the attempt to find rules sufficient for reasoning was quite daring.

The quest for logic, as needed for mathematical reasoning, was finished 2300 years later in Frege (1879). Indeed, Gödel (1930) showed that Frege's system was complete (mathematically this was done already by Skolem 1922; but the result itself was not mentioned there). This means that from a set of hypotheses *Γ* a statement *A* can be derived iff in all structures in which the hypotheses *Γ* hold, also the conclusion *A* holds.

A particularly nice version of logic was given by Gentzen in 1934 (see his collected papers, Gentzen 1969). This system is presented in table 2. Some explanations are in order. The signs →, &, ∨, ⊤, ∀, ∃ stand for ‘implies’, ‘and’, ‘or’, ‘true’, ‘for all’ and ‘exists’, respectively. ¬*A* stands for ‘not *A*’ and is defined as *A*→⊥, where ⊥ stands for a (‘the’) false statement (like 0=1). *Γ* stands for a set of statements and *Γ*⊢*A* stands for ‘from the set *Γ* the statement *A* is derivable’. A rule like *Γ*⊢*A* & *B* / *Γ*⊢*A* has to be read as follows: ‘If *A* & *B* is derivable from *Γ*, then so is *A*.’

#### (i) First-, second- and higher-order logic

The logic presented is first-order logic. It speaks about the elements of a structure (a set *A* with some given operations and relations on it) and can quantify over the elements of this structure. In second-order logic one can quantify over subsets of the structure, i.e. over elements of the powerset 𝒫(*A*). Then there is higher-order logic that can quantify over each 𝒫(*A*), 𝒫(𝒫(*A*)), …. In first-order logic, one can express the difference between continuity and uniform continuity of a given function *f* (say on ℝ):

∀*x*.∀*ϵ*>0.∃*δ*>0.∀*y*. |*x*−*y*|<*δ* → |*f*(*x*)−*f*(*y*)|<*ϵ*

versus

∀*ϵ*>0.∃*δ*>0.∀*x*.∀*y*. |*x*−*y*|<*δ* → |*f*(*x*)−*f*(*y*)|<*ϵ*.

Here ∀*ϵ*>0.… has to be translated to ∀*ϵ*.(*ϵ*>0 → …).

In second-order logic one may express that an element *x* of a group *G* has torsion (a power of *x* is the unit element *e*) without having the notion of natural number:

∀*X*⊆*G*. [*x*∈*X* & ∀*y*.(*y*∈*X* → *x*·*y*∈*X*)] → *e*∈*X*.

This states that *e* belongs to the intersection of all subsets of *G* that contain *x* and that are closed under left-multiplication by *x*.

In higher-order logic, one may state that there exists a non-trivial topology on ℝ that makes a given function *f* continuous:

∃*T*⊆𝒫(ℝ). [*T* is a non-trivial topology & ∀*O*∈*T*. *f*^{−1}(*O*)∈*T*].

Here ‘*T* is a non-trivial topology’ stands for the (higher-order) statement that *T* contains ∅ and ℝ, is closed under finite intersections and arbitrary unions, and differs from the trivial examples {∅, ℝ} and 𝒫(ℝ).
#### (ii) Intuitionistic logic

Not long after the first complete formalization of (first-order) logic was given by Frege, Brouwer criticized this system of ‘classical logic’. It may promise an element when a statement like ∃*x*.*P*(*x*) has been proved, but nevertheless it may not be the case that a *witness* is found, i.e. one may not know how to prove any of *P*(0), *P*(1), *P*(2), ….

For example, this is the case for the statement ∃*x*.*P*(*x*), where *P*(*x*) is defined as (*x*=0 & RH) ∨ (*x*=1 & ¬RH), and RH stands for the Riemann Hypothesis, which can be formulated in (Peano) Arithmetic. The only possible witnesses are 0 and 1. By classical logic RH ∨ ¬RH holds. In the first case one can take *x*=0 and in the second case *x*=1. Therefore one can prove ∃*x*.*P*(*x*). One, however, cannot provide a witness, as *P*(0) can be proved only if the RH is proved and *P*(1) can be proved only if the RH is refuted. At present neither is the case. One may object that ‘tomorrow’ the RH may be settled. But then one can take another open problem instead of the RH, or an independent statement, for example a Gödel sentence *G* stating that *G* itself is not provable, or the Continuum Hypothesis (if we are in set theory). A similar criticism can be addressed to provable statements of the form *A*∨*B*. These can be provable, while neither *A* nor *B* can be proved.

Brouwer analysed the situation and concluded that the law of excluded middle, *A*∨¬*A*, is the cause of this unsatisfactory situation. He proposed to do mathematics without this ‘unreliable’ logical principle. In Heyting (1930) an alternative logic was formulated. For this logic one can show that if ⊢ ∃*x*.*P*(*x*), then ⊢ *P*(*t*) for some term *t*, and similarly that if ⊢ *A*∨*B*, then ⊢ *A* or ⊢ *B*.

Gentzen provided a convenient axiomatization of both classical and intuitionistic logic. In table 2 the system of classical logic is given; if one leaves out the rule of double negation one obtains the system of intuitionistic logic.

### (b) Ontology

Ontology is the philosophical theory of ‘existence’. Kant remarked that existence is not a predicate. He probably meant that in order to state that something exists we already must have it. Nevertheless, we can state that there exists a triple (*x*, *y*, *z*) of positive integers such that *x*^{2}+*y*^{2}=*z*^{2} (as Pythagoras knew), but not such that *x*^{3}+*y*^{3}=*z*^{3} (as Euler knew). Ontology in the foundations of mathematics focuses on collections of objects *O*, so that one may quantify over them (i.e. stating ∀*x*∈*O*.*P*(*x*) or ∃*x*∈*O*.*P*(*x*)). Traditional mathematics only needed a few of these collections: number systems and geometric figures. From the 19th century on a wealth of new spaces was needed and ample time was devoted to constructing these. Cantor (1845–1918) introduced set theory, which has the virtue of bringing together all possible spaces within one framework. Actually this theory is rather strong and not all postulated principles are needed for the development of mathematics. An interesting alternative is type theory, in which the notion of function is a first-class object.

#### (i) Set theory

Postulated are the following axioms of ‘set existence’: empty set, pairing, union, powerset, infinity, separation and replacement.

These axioms have as intuitive interpretation the following. ∅ is a set; if *a*, *b* are sets, then {*a*, *b*} is a set;…; if *P* is a property over sets, then {*x*∈*a*|*P*(*x*)} is a set; if for every set *x* there is given a unique *F*(*x*) in some way or another, then {*F*(*x*)|*x*∈*a*} is a set. We will not spell out the way the above axioms have to be formulated and how *P* and *F* are given, but refer the reader to a textbook on axiomatic set theory, see e.g. Kunen (1983). Also there are the axioms of ‘set properties’: extensionality and foundation.

The axiom of extensionality states that a set is completely determined by its elements. The axiom of foundation is equivalent to the statement that every predicate *P* on sets is well-founded: if there is a witness *x* such that *P*(*x*) holds, then there is a minimal witness *x*. This means that *P*(*x*) holds, but for no *y*∈*x* one has *P*(*y*). Another way to state foundation is the principle of ∈-induction: [∀*x*. ((∀*y*∈*x*. *P*(*y*)) → *P*(*x*))] → ∀*x*. *P*(*x*).

#### (ii) Type theory

Type Theory, coming in several variants, forms an alternative to set theory. Postulated are inductively defined data types with their recursively defined functions. Moreover, types are closed under function spaces and products. A type may be thought of as a set, and that an element `a` belongs to type `A` is denoted by `a:A`. The difference with set theory is that in type theory an element has a unique type. Inductive types are given in the following examples (booleans, natural numbers, lists of elements of `A`, binary trees with natural numbers at the leaves), with conventional constructor names: `bool := true | false`; `nat := 0 | S(nat)`; `list_nat := nil | cons(nat, list_nat)`; `tree := leaf(nat) | branch(tree, tree)`.

These definitions should be read as follows. The only elements of `bool` are `true` and `false`. The elements of `nat` are freely generated from `0` and the unary ‘constructor’ `S`, obtaining `0`, `S(0)`, `S(S(0))`, …. One writes for elements of `nat`: `1`=`S(0)`, `2`=`S(1)`, ….

A typical element of `list_nat` is `cons(1, cons(2, nil))`.

A typical tree is `branch(leaf(1), branch(leaf(2), leaf(3)))`.
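Inductive types and their structurally recursive functions can be mimicked in an ordinary programming language. The following Python sketch is illustrative only (the class and function names are ours): it encodes the inductive type `nat` with constructors `0` and `S`, and defines addition by recursion on the structure of its first argument.

```python
from dataclasses import dataclass

class Nat:
    pass

@dataclass(frozen=True)
class Zero(Nat):
    pass

@dataclass(frozen=True)
class S(Nat):
    pred: Nat

def add(m: Nat, n: Nat) -> Nat:
    # structural recursion:
    #   add(0, n)    = n
    #   add(S(m), n) = S(add(m, n))
    if isinstance(m, Zero):
        return n
    return S(add(m.pred, n))

def to_int(n: Nat) -> int:
    """Interpret a formal numeral as an ordinary integer."""
    return 0 if isinstance(n, Zero) else 1 + to_int(n.pred)

two = S(S(Zero()))      # the numeral S(S(0))
three = S(two)          # the numeral S(S(S(0)))
```

Because the elements are freely generated, two numerals are equal exactly when they are built by the same constructors, which is what the dataclass equality checks.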

A typical element of `A×B` is `⟨a, b⟩`=`pair(a, b)`, where `a:A`, `b:B`. Given types `A`, `B`, one may form the ‘function-space’ type `A->B`. There is the primitive operation of application: if `f:A->B` and `a:A`, then `f(a):B`. Conversely there is abstraction: if `M:B` ‘depends on’ an `a:A` (like `a^2+a+1` depends on `a:nat`) one may form the function `f ≔ (a↦M) : A->B`. This function is denoted by `λa:A.M` (function abstraction). For example, this can be used to define composition: if `f:A->B` and `g:B->C`, then `g∘f ≔ λa:A.g(f(a)) : A->C`. Next to the formation of function-space types there is the *dependent cartesian product*. If `B` is a type that depends on an `a:A`, then one may form `Πa:A.B`. One has (here `B[a≔t]` denotes the result of substituting `t` for `a` in `B`): if `f:Πa:A.B` and `t:A`, then `f(t):B[a≔t]`. A typical example is `B=A^n` for `n:nat`. If `f:Πn:nat.A^n`, then `f(2n):A^(2n)`.

Type theories are particularly convenient to express intuitionistic mathematics. Type theories differ as to what dependent cartesian products and what inductive types are allowed, whether or not they are predicative,2 and whether they have ‘powersets’ or the axiom of choice. See Martin-Löf (1984), Aczel & Rathjen (2001), Barendregt & Geuvers (2001) and Moerdijk & Palmgren (2002). In Feferman (1998, ch. 14), a type-free system (which can be seen as a type system with just one type) is presented for predicative mathematics.
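Function abstraction and composition translate directly into any language with first-class functions. A minimal Python sketch (the type hints play the role of the types `A->B`, but Python does not enforce them; the names are ours):

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    # g∘f ≔ λa:A. g(f(a))
    return lambda a: g(f(a))

square: Callable[[int], int] = lambda a: a * a   # λa:nat. a^2
succ: Callable[[int], int] = lambda a: a + 1     # λa:nat. S(a)

h = compose(succ, square)                        # λa:nat. S(a^2)
```

Here `compose` is exactly the term `λa:A.g(f(a))` from the text, written as a Python closure.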

### (c) Computability

Mathematical algorithms are much older than mathematical proofs. They were introduced in Egyptian–Babylonian–Chinese mathematics long before the notion of proof. In spite of that, reflection on the notion of computability through algorithms appeared much later, only about 80 years ago. The necessity came when Hilbert announced in 1900 his famous list of open problems. His 10th problem was the following: ‘Given a Diophantine equation with any number of unknown quantities and with rational integral numerical coefficients: to devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers.’3

By a number of steps spread over a time interval of nearly 50 years, the final one taken by Matijasevic using the Fibonacci numbers, this problem was shown to be undecidable, see Davis (1973). In order to be able to state such a result one needed to reflect on the notion of algorithmic computability.

Steps towards the formalization of the notion of computability were taken by Skolem, Hilbert, Gödel, Church and Turing. At first Hilbert (1926; based on work by Grassmann, Dedekind, Peano and Skolem) introduced the primitive recursive functions over ℕ by the following schemata (figure 1).4
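Since figure 1 is not reproduced here, the following Python sketch assumes the standard schemata: the zero function, successor, projections, composition and primitive recursion. Addition and multiplication are then built purely from these combinators (an illustrative encoding; the names are ours).

```python
def zero(*args):
    return 0

def succ(x):
    return x + 1

def proj(i):
    # the projection function returning the i-th argument
    return lambda *args: args[i]

def comp(f, *gs):
    # composition: (f o (g1,...,gk))(xs) = f(g1(xs), ..., gk(xs))
    return lambda *args: f(*(g(*args) for g in gs))

def primrec(f, g):
    # primitive recursion:
    #   h(0, xs)     = f(xs)
    #   h(n + 1, xs) = g(n, h(n, xs), xs)
    def h(n, *xs):
        acc = f(*xs)
        for k in range(n):
            acc = g(k, acc, *xs)
        return acc
    return h

# addition: add(0, x) = x;  add(n+1, x) = S(add(n, x))
add = primrec(proj(0), comp(succ, proj(1)))
# multiplication: mul(0, x) = 0;  mul(n+1, x) = add(x, mul(n, x))
mul = primrec(zero, lambda k, acc, x: add(x, acc))
```

Every function defined this way is total: the recursion always terminates after exactly `n` steps.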

It was shown by Sudan (1927) and Ackermann (1928) that not all computable functions were primitive recursive. Then Gödel (based on a suggestion of Herbrand) introduced the notion of totally defined computable functions,5 based on what is called nowadays *Term Rewrite Systems*, see Terese (2003). This class of total computable functions can also be obtained by adding to the primitive recursive schemata the scheme of minimalization (‘*μy*…’ stands for ‘the least *y* such that…’), Kleene (1936).
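The minimalization scheme is an unbounded search, and this is what takes the schemata beyond the total functions: the search may diverge. An illustrative Python sketch (the names `mu` and `isqrt` are ours):

```python
def mu(f):
    """mu y. f(y, xs) = 0: return the least y with f(y, *xs) == 0;
    loops forever if no such y exists (hence partial functions)."""
    def g(*xs):
        y = 0
        while f(y, *xs) != 0:
            y += 1
        return y
    return g

# integer square root as the least y with (y+1)**2 > x:
# the test function returns 0 exactly when that condition holds
isqrt = mu(lambda y, x: 0 if (y + 1) ** 2 > x else 1)
```

For instance, `isqrt(10)` searches y=0, 1, 2, … until `(y+1)**2 > 10` first holds.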

Finally, it was realized that it is more natural to formalize computable partial functions. This was done by Church (1936) using lambda calculus and by Turing (1936) using what are now called Turing machines. The formalized computational models of Turing and Church later gave rise to the so-called imperative and functional programming styles, respectively. The first is easier to implement, the second easier to use and easier for showing the correctness of programs.

Both the computational models of Church and Turing have a description about as simple as that of the first-order predicate calculus. Simpler still is the computational model given in Schönfinkel (1924), which also captures all partial computable functions. It is a very simple example of a Term Rewrite System (figure 2).

The system is based on terms built up from the constants *K* and *S* under a binary operation (application). Various forms of data (natural numbers, trees, etc.) can be represented as *K*, *S* expressions. Operations on this represented data can be performed by other such expressions.
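The two rewrite rules of this system (assumed here, as figure 2 is not reproduced) are the usual ones: *K x y* → *x* and *S x y z* → *x z* (*y z*). A small illustrative reducer in Python, with terms encoded as the strings `'K'`, `'S'` or nested application pairs `(f, a)`:

```python
def step(t):
    """Perform one leftmost rewrite step, or return None if t is normal."""
    if isinstance(t, tuple):
        f, a = t
        if isinstance(f, tuple) and f[0] == 'K':
            return f[1]                      # K x y -> x
        if (isinstance(f, tuple) and isinstance(f[0], tuple)
                and f[0][0] == 'S'):
            x, y, z = f[0][1], f[1], a
            return ((x, z), (y, z))          # S x y z -> x z (y z)
        s = step(f)                          # otherwise reduce inside
        if s is not None:
            return (s, a)
        s = step(a)
        if s is not None:
            return (f, s)
    return None

def normalize(t, limit=1000):
    """Rewrite until no rule applies (terms need not terminate in general)."""
    for _ in range(limit):
        s = step(t)
        if s is None:
            return t
        t = s
    raise RuntimeError("no normal form within limit")

# S K K behaves as the identity:  S K K a -> K a (K a) -> a
I = (('S', 'K'), 'K')
```

The `limit` guard is needed because, as with any universal model of computation, some terms have no normal form.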

### (d) The compactness of the foundations

The study of the foundations of mathematics has achieved the following. The triple activity of defining, computing and reasoning can be described in each case by a small set of rules. This implies that it is decidable whether a (formalized) putative proof *p* (from a certain mathematical context) is indeed a proof of a given statement *A* (in that context). This is the basis of the technology of computer mathematics. For more on the relation between the foundational studies and computer mathematics, see Barendregt (2005).

## 3. Computer mathematics

In systems for Computer Algebra one can deal with mathematical objects like √2 with full precision. The idea is that this number is represented as a symbol, say *α*, and that with this symbol one computes symbolically. One has *α*^{2}−2=0, but *α*+1 cannot be simplified. This can be done, since the computational rules for √2 are known. In some sense √2 is a ‘computable object’. There are many other computable objects, like expressions dealing with transcendental functions (*e*^{x}, log *x*) and integration and differentiation.
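The essence of such exact symbolic computation can be shown in a few lines. The sketch below is illustrative (not a real computer algebra system): numbers of the form *a*+*b*√2 are stored as pairs of rationals, and the only ‘knowledge’ about √2 used is the rule *α*^{2}=2.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Q2:
    """Exact numbers a + b*sqrt(2), with rational a and b."""
    a: Fraction  # rational part
    b: Fraction  # coefficient of sqrt(2)

    def __add__(self, o):
        return Q2(self.a + o.a, self.b + o.b)

    def __mul__(self, o):
        # (a + b*r)(c + d*r) = ac + 2bd + (ad + bc)r,  using r**2 = 2
        return Q2(self.a * o.a + 2 * self.b * o.b,
                  self.a * o.b + self.b * o.a)

alpha = Q2(Fraction(0), Fraction(1))   # the symbol for sqrt(2)
two = Q2(Fraction(2), Fraction(0))
```

Here `alpha * alpha` simplifies to `2` exactly, while `alpha + Q2(Fraction(1), Fraction(0))` simply stays as the pair (1, 1), mirroring the fact that *α*+1 cannot be simplified.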

In systems for computer mathematics, also called Mathematical Assistants, one can even represent non-computable objects. For example the set *S* of parameters for which a Diophantine equation is solvable. Also these can be represented on a computer. Again the non-computable object is represented by a symbol. This time one cannot simply compute whether a given number, say 7, belongs to *S*. Nevertheless, one can state that it does and in some cases one may prove this. If one provides a proof of this fact, then that proof can be checked and one can add 7∈*S* to the database of known results and use it in subsequent reasoning. In short, although provability is undecidable, being a proof of a given statement is decidable and this is the basis of systems for computer mathematics. It has been the basis for informal mathematics as well.

One may wonder whether proofs verified by a computer are at all reliable. Indeed, many computer programs are faulty. It was emphasized by de Bruijn that in the case of verification of formal proofs there is an essential gain in reliability. Indeed, a verifying program only needs to check whether in the putative proof the small number of logical rules are always observed. Although the proof may have the size of several megabytes, the verifying program can be small. This program then can be inspected in the usual way by a mathematician or logician. If someone does not believe the statement that a proof has been verified, one can do independent checking with a trusted proof-checking program. In order to do this one does need formal proofs of the statements. A Mathematical Assistant that allows independent checking by a small program is said to satisfy the *de Bruijn criterion*.
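The point can be made concrete with a toy checker for a minimal logic (everything here, including the proof format, is an assumption for illustration, not a real Mathematical Assistant). A proof is a list of lines, each either a hypothesis or obtained from two earlier lines by modus ponens; the checker stays this small no matter how long the proofs it verifies.

```python
def checks(hypotheses, proof, goal):
    """Verify that each proof line is a hypothesis or follows by
    modus ponens from earlier lines, and that the last line is goal."""
    derived = []
    for line in proof:
        ok = line in hypotheses or any(
            f == ('->', a, line)   # modus ponens: from a and a -> line
            for f in derived for a in derived)
        if not ok:
            return False
        derived.append(line)
    return bool(derived) and derived[-1] == goal

# formulas: atoms are strings, implications are ('->', A, B)
hyps = ['p', ('->', 'p', 'q'), ('->', 'q', 'r')]
proof = ['p', ('->', 'p', 'q'), 'q', ('->', 'q', 'r'), 'r']
```

A dozen lines of checking code can thus validate arbitrarily long derivations, which is exactly the asymmetry the de Bruijn criterion exploits.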

Of particular interest are proofs that essentially contain computations. This happens on all levels of complexity. In order to show that a linear transformation *A* on a finite dimensional vector space has a real eigenvalue one computes the characteristic polynomial *p*(*λ*)=det(*A*−*λ*I) and determines whether *p*(*λ*) has a real root. In order to show that a polynomial function *F* vanishes identically on some variety *V*, one computes a Groebner basis to determine whether *F* is contained in the ideal generated by the equations defining *V*, see Buchberger & Winkler (1998).
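For the 2×2 case the eigenvalue check is a one-line computation: *p*(*λ*)=det(*A*−*λ*I) expands to *λ*^{2}−trace(*A*)*λ*+det(*A*), and a real root exists iff the discriminant is non-negative. An illustrative sketch:

```python
def has_real_eigenvalue(a, b, c, d):
    """A = [[a, b], [c, d]]: real eigenvalue iff trace**2 - 4*det >= 0."""
    trace = a + d
    det = a * d - b * c
    return trace * trace - 4 * det >= 0

# rotation by 90 degrees: no real eigenvalue
rotation = (0, -1, 1, 0)
# a symmetric matrix always has real eigenvalues
symmetric = (2, 1, 1, 3)
```

The proof assistant then only has to check this arithmetic, together with a once-and-for-all proof that the criterion is correct.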

Although it is shown that provability in general is undecidable, for interesting particular cases the provability of statements may be reduced to computing. These form the decidable cases of the decision problem. This will help computer mathematics considerably. Tarski (1951) showed that the theory of real closed fields (and hence elementary geometry) is decidable. An essential improvement was given by Collins (1975). In Buchberger (1965) a method to decide membership of finitely generated ideals in certain polynomial rings was developed. For polynomials over ℝ this can be done also by the Tarski–Collins method, but much less efficiently so. Moreover, ‘Buchberger's algorithm’ was optimized by e.g. Bachmair & Ganzinger (1994).
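In the special univariate case the ideal membership problem is much easier than the general one Buchberger solved: *k*[*x*] is a principal ideal domain, so *F* belongs to the ideal (*g*₁, …, *g*ₖ) exactly when gcd(*g*₁, …, *g*ₖ) divides *F*. An illustrative sketch (polynomials as coefficient lists, lowest degree first; the multivariate case genuinely needs Groebner bases):

```python
from fractions import Fraction

def polydivmod(f, g):
    """Quotient and remainder of f by g over the rationals."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(f):
        while f and f[-1] == 0:          # strip leading zeros
            f.pop()
        if len(f) < len(g):
            break
        k = len(f) - len(g)
        c = f[-1] / g[-1]                # cancel the leading term
        q[k] = c
        for i, gc in enumerate(g):
            f[k + i] -= c * gc
    while f and f[-1] == 0:
        f.pop()
    return q, f                          # remainder [] means divisible

def polygcd(f, g):
    while any(g):
        _, r = polydivmod(f, g)
        f, g = g, r if r else [Fraction(0)]
    return f

def in_ideal(F, gens):
    g = gens[0]
    for h in gens[1:]:
        g = polygcd(g, h)
    _, r = polydivmod(F, g)
    return not any(r)
```

For example, (*x*−1)^{2} lies in the ideal (*x*^{2}−1, *x*−1) since the gcd of the generators is *x*−1, while *x* does not.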

In order to show that the Four Colour Theorem holds one checks that 633 configurations are ‘reducible’, involving millions of cases, see Robertson *et al*. (1996). How can such computations be verified? All these cases can be stylistically rendered as *f*(*a*)=*b*, which needs to be verified. In order to do this one first needs to represent *f* in the formal system. One way to do this is to introduce a predicate *P*_{f}(*x*, *y*) such that for all *a*, *b* (say natural numbers) one has

⊢ *P*_{f}(⌜*a*⌝, ⌜*b*⌝) ⟺ *f*(*a*)=*b*.

Here ‘⊢’ stands for provability and ⌜*a*⌝ for the representation of *a* in the formal system. If e.g. *a*=2, then ⌜*a*⌝=`S(S(0))`, a representation of the object 2 in the formal system.6 In these languages algorithms are represented as so-called ‘logical programs’, as happened also in Gödel (1931). In other formal theories, notably those based on type theory, the language itself contains expressions for functions and the representing predicate has a particularly natural form:

⊢ *F*(⌜*a*⌝)=⌜*b*⌝ ⟺ *f*(*a*)=*b*.

This is the representation of the algorithm in the style of functional programming. Of course this all is not enough. One also needs to prove that the computation is relevant. For example, in the case of linear transformations one needs a formal proof that a real root of the computed characteristic polynomial is indeed a real eigenvalue.

But once this proof is given and verified one only needs to check instances of *F*(⌜*a*⌝)=⌜*b*⌝ for establishing *f*(*a*)=*b*.

There are two ways of doing this. In the first one the computation trace is produced and annotated by steps in the logical program *P*_{f} (respectively, functional program *F*). This produces a very long proof (in the order of the length of computation of *f*(*a*)=*b*) that can be verified step by step. Since the resulting proofs become long, they are usually not stored, but only the local steps to be verified (‘Does this follow from that and that?’). One therefore can refer to these as *ephemeral proofs*.

On the other hand, there are systems in which proofs are fully stored for later use (like extraction of programs from them). These may be called *petrified proofs*. In systems with such proofs one often has adopted the Poincaré Principle. This principle states that for a certain class of equations *t*=*s* no proofs are needed, provided that their validity can be checked by an algorithm. This puts some strain on the de Bruijn criterion requiring that the verifying program be simple. But since the basic steps in a universal computational model are simple, this is justifiable.

## 4. The nature of the challenge

### (a) State of the art: effort and space

Currently there are not many people who formalize mathematics with the computer, but that does not mean that the field of computer mathematics is not yet mature. The full formalization of all of undergraduate university mathematics is within reach of current technology. Formalizing on that level will be labour-intensive, but it will not need any advances in proof assistant technology.

To give an indication of how much work is needed for formalization, we estimate that it takes approximately *one work-week* (five work-days of eight work-hours) to formalize one page from an undergraduate mathematics textbook. Some people find this estimate surprisingly low, others surprisingly high. Some think it is impossible to formalize a non-trivial theorem in full detail all the way down to the axioms, and this measure shows that they are wrong. On the other hand, it takes much longer to formalize a proof than it takes to write a good informal version of it (which takes about half a work-day per page, a factor of ten less).

One can also compare the formal version of a mathematical proof with the corresponding informal—‘traditional’—way of presenting that proof. In Wiedijk (2000) it is experimentally found that a file containing a full formalization of a mathematical theory is approximately *four* times as long as the LaTeX source of the informal presentation. We call this factor the *de Bruijn factor*, as de Bruijn claimed that this ratio is a constant, which does not change when one proceeds in formalizing a mathematical theory. Some researchers actually believe that the factor decreases as the theory grows.

### (b) State of the art: systems

In figure 3 some contemporary systems for computer mathematics that are especially suited for the formalization of mathematical proof are presented. On the left in this diagram there are the four ‘prehistoric’ systems that started the subject in the early seventies (three of those systems are no longer actively being used and have their names in parentheses). These systems differed in the amount of automated help that they gave to their users when doing proofs. At one extreme there was the Automath system of de Bruijn, that had no automation whatsoever: all the details of the proofs had to be provided by the user of the system himself (it is surprising how far one still can go in such a system). At the other extreme there was the nqthm system—also known as the Boyer–Moore prover—which fully automatically tries to prove the lemmas that the user of the system puts to it. In between these two extremes there was the LCF system, which implemented an interesting compromise. The user of this system was in control of the proof, but could make use of the automation of so-called tactics which tried to do part of the proof automatically. As will be apparent from this diagram the LCF system was quite influential. The seven systems in this diagram are those contemporary Proof Assistants in which a significant body of mathematics has been formalized. To give some impression of the ‘flavour’ of those systems we have put a superficial characterization in the right margin. See http://www.cs.ru.nl/∼freek/digimath/ for the web-addresses with information on these systems. The ontologies on which these systems are based are stated in table 3. All of these systems (with the exception of Automath and Mizar) were primarily motivated by computer science applications. Being able to prove algorithms and systems correct is at the moment the main driving force for the development of Proof Assistants. This is an extremely exciting area of research. 
Currently, people who have experience with programming claim to ‘know’ that serious programs without bugs are impossible. However, we think that the technology of computer mathematics will eventually evolve into a methodology that changes this perception. Then a bug-free program will be as normal as a ‘bug-free’ formalized proof is for those of us who do formal mathematics.

When one starts applying the technique to mathematics, one may be struck by a curious sensation on finishing a formalization. Usually one needs to go over a proof when it is finished, to make sure one has really understood everything and made no mistakes. With a formalization that phase is no longer needed. One can even finish a proof before one has fully understood it! The feeling in that case is not unlike trying to take another step on a staircase and finding that the step is not there.

On the other hand, when one returns from formalization to ‘normal’ programming, it feels as if a safety net has been removed. One can write down incorrect things again without the system noticing!

A currently less successful application of Proof Assistants, but one which in the long run will turn out to be even more important than verification in computer science, is the application of Proof Assistants to mathematics. The QED manifesto, see Boyer *et al*. (1994), gives a lucid description of how this might develop. We believe that when later generations look back at the development of mathematics one will recognize four important steps: (i) the Egyptian–Babylonian–Chinese phase, in which correct computations were made, without proofs; (ii) the ancient Greeks with the development of ‘proof’; (iii) the end of the nineteenth century when mathematics became ‘rigorous’; (iv) the present, when mathematics (supported by computer) finally becomes fully precise and fully transparent.

To show what current technology is able to do, in table 4 we list some theorems that have been formalized already. Clearly the technology has not yet reached ‘the research frontier’, but the theorems that can be formalized are not exactly trivial either.

The formalizations that are listed in this table are much like computer programs. To give an indication of the size of these formalizations: the Isabelle formalization of the Prime Number Theorem by Avigad and others consists of 44 files that together take 998 kb in almost thirty thousand lines of ‘code’.

### (c) What is needed?

Today no mathematician uses a Proof Assistant for checking or developing new work. We believe that in the coming decades this will change (although we do not know exactly when). We will now list some properties that a system for computer mathematics should have before this can happen.

#### (i) Mathematical style

In the current proof assistants the mathematics does not resemble traditional mathematics very much. This holds for the statements as well as for the proofs. As an example consider the following statement:

$$\lim_{x\to x_0} f(x) = l \;\wedge\; \lim_{x\to x_0} g(x) = m \;\Longrightarrow\; \lim_{x\to x_0}\bigl(f(x)+g(x)\bigr) = l+m.$$

In the HOL system this statement is called `LIM_ADD`, and there it reads7

This does not match the LaTeX version of the statement. (The technical reason for this is that as HOL does not support partial functions, the limit operator is represented as a relation instead of as a function.)

In the Mizar library the statement is called `LIMFUNC3:37`, and there it reads (where for clarity we replaced the condition of the statement, which states that the limits actually exist, by an ellipsis):

Again this does not resemble the informal version of the statement. (Here the reason is that Mizar does not support binders, so the limit operator cannot bind the limit variable. Therefore the functions *f* and *g* themselves have to be added, instead of the function values *f*(*x*) and *g*(*x*).)

Clearly, unless a system can accept this statement written in a form similar to the informal version above, mathematicians will not be much inclined to use it.8

While in most current systems the statements themselves do not look much like their informal counterparts, for the proofs the situation is even worse. The main exceptions to this are the Mizar language and the Isar language for the Isabelle system. We call these two proof languages *mathematical modes*. As an example, the following is what a proof looks like in the actual Coq system.9

Not even a Coq specialist will be able to understand what is going on in this proof without studying it closely with the aid of a computer. It will be clear why we think that having a mathematical mode is essential for a system to be attractive to working mathematicians.

As an example of what a mathematical mode looks like, in figure 4 there is the Coq proof rephrased using the Mizar proof language.10
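To give a flavour of the contrast between the two styles, here is a sketch in Lean (a modern system, used purely for illustration; its terse tactic scripts play the role of the Coq script above, and its structured `have` style plays the role of a mathematical mode):

```lean
-- Tactic style: a chain of commands whose effect cannot be read off
-- the script without replaying it in the system.
example (a b c : Nat) (h₁ : a ≤ b) (h₂ : b ≤ c) : a ≤ c + 1 := by
  apply Nat.le_trans h₁
  apply Nat.le_trans h₂
  exact Nat.le_succ c

-- "Mathematical mode" style: the intermediate statements are written
-- out, so the proof can be followed on paper like ordinary prose.
example (a b c : Nat) (h₁ : a ≤ b) (h₂ : b ≤ c) : a ≤ c + 1 := by
  have hac : a ≤ c := Nat.le_trans h₁ h₂
  have hle : c ≤ c + 1 := Nat.le_succ c
  exact Nat.le_trans hac hle
```

Both proofs are accepted by the system; only the second can be read without it.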

#### (ii) Library

The most important part of a proof assistant is its library of pre-proved lemmas. If one looks at which systems are useful for doing formal mathematics, they are exactly the systems with a good library. Using an average system with a good library is painful but doable; using an excellent system without a library is not. The bigger the library, the more mathematics one can deal with in a reasonable time.

As an example, in Nijmegen we formalized a proof of the Fundamental Theorem of Algebra (see Geuvers *et al*. 2001) and it took a team of three people two years. At about the same time Harrison formalized the same theorem all by himself (as described in Harrison 2001) and it took him only a few days. The main explanation for this huge difference in effort is that he already had an extensive library, while in Nijmegen we had not.11

#### (iii) Decision procedures

One might imagine that the computer can help mathematicians find proofs. However, *automated theorem proving* is surprisingly weak when it comes to finding proofs that are interesting to human mathematicians. Worse, if one takes an existing informal textbook proof and considers the gaps between the steps in that proof as ‘proof obligations’, then a general-purpose theorem prover often will not even be able to find proofs for those. For this reason Shankar, whose group is developing PVS, emphasized that rather than general automated theorem provers it is *decision procedures*, each specializing in one very specific task, that are important, as they will always be able to solve their problems in a short time. In fact Shankar claims that the big success of PVS is mainly due to the fact that it has the best decision procedures of all the systems, and combines them well.
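For instance (a hedged sketch in Lean, a modern assistant; the specific tactic names are Lean's, not PVS's), the tactic `decide` evaluates any decidable proposition, and `omega` is a decision procedure for linear arithmetic over the integers and naturals:

```lean
-- A decidable proposition is settled by computation, not by search.
example : 2 ^ 10 = 1024 := by decide

-- A linear-arithmetic goal falls to a dedicated decision procedure.
example (x y : Int) (h : x < y) : x + 1 ≤ y := by omega
```

Each procedure is complete for its own narrow fragment, which is exactly why it succeeds reliably where general proof search fails.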

Our view on automating computer mathematics is that a proof is something like an iceberg. When considering all details of the proof, a human mathematician will not even be consciously aware of the majority of those, just like an iceberg is 90% under water. What is written in papers and communicated in lectures is only the 10% of the proof (or even less) which is present in the consciousness of the mathematician. We think that the automation of a system for computer mathematics should provide exactly those unconscious steps in the proof. (There is a risk of having the computer provide too many steps so that we will not understand anymore what it is doing, and then we will not be able to guide the proof any longer.)

One should make a distinction between unconscious steps and decidable ones. Some unconscious steps in undecidable areas may be guided by heuristic tactics. Also, some decision procedures have horrendous complexity, so it is not necessarily the case that they will ‘solve problems in a short time’. However, we would like to emphasize that the main function of automation in proof assistants should be to take care of unconscious steps, and that decision procedures are an important part of that.

#### (iv) Support for reasoning with gaps

The manner in which proof assistants are generally used today is that the whole formalization is completed all the way down to the axioms of the system. This is for a good reason: it turns out to be very difficult to write down fully correct formal statements without having the computer help ‘debug’ them by requiring the proofs to be formalized. If one starts a formalization by first writing down a global sketch of a theory, it often turns out, when filling in the actual formal proofs, that some of those statements are not provable after all!

If one just wants to use a Proof Assistant to order one's thoughts, or to communicate something to another mathematician, then fully working out all proofs is simply not practical. In that case one would like to give just a *sketch* of the proof inside the formal system, as described in Wiedijk (2004). Related to this is the technique called *proof planning*; see for instance Bundy (1991). Still, the current Proof Assistants do not support this way of working very well.
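In Lean (again used only as an illustrative modern system), this style of working is approximated by the `sorry` placeholder: the skeleton of the argument is fully machine-checked, while the unfinished steps are explicitly flagged by the system as gaps:

```lean
-- A proof sketch: the overall structure type-checks, and the single
-- remaining gap is marked `sorry` and reported as unfinished.
theorem le_add_one (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  have step : b ≤ b + 1 := sorry  -- gap: to be refined later
  exact Nat.le_trans h step
```

This gives a partial version of the debugging benefit described above: wrongly stated intermediate claims are caught by the checker even before their proofs are filled in.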

In Lamport (1995) a proof style is described in which proofs are incrementally developed by *refining* steps in the proof into more detailed steps. Although that paper does not talk about proofs in the computer, and although we are not sure that the specific proof display format that is advocated in that paper is optimal, it is clear that this *style of working* should be supported by systems for computer mathematics, in order to be accepted by the mathematical community.

## 5. Romantic versus cool mathematics

After the initial proposals of the possibility of computer mathematics many mathematicians protested on emotional grounds. ‘Proofs should be surveyable by the human mind’, was and still is an often-heard objection. We call this the *romantic* attitude towards mathematics. There is another style, *cool* mathematics: mathematics verified by a computer. The situation may be compared to that in biology. In romantic biology, based on the human eye, one is concerned with flowers and butterflies. In cool biology, based on the microscope, an extension of our eyes, one is concerned with cells. There is even *super-cool* molecular biology, based on electron microscopes. By now we know very well that these latter forms of biology are vital and essential, and have a romanticism of their own. Similarly, we expect that cool proofs in mathematics will eventually lead to romantic proofs based on them. Continuing the comparison with biology, there is also super-cool mathematics: mathematics checked by a computer using a program that is this time not checked by the human mind, but itself checked by a computer in the cool way. This kind of *bootstrap* has been used for a compiled (hence faster) version of Coq, see Barras (1996, 1999).

A fully formalized proof in Coq of the Four Colour Theorem has been verified, see Gonthier (2004). Moreover, a full proof in HOL of the Jordan curve theorem has been produced by Tom Hales as part of his work towards a full formalization of his proof of the Kepler conjecture. Both informal proofs need a long computer verification. Such theorems with long proofs seem exceptional, but they are not: from the undecidability of provability it follows trivially that there are relatively short statements with arbitrarily long proofs.12

We foresee that in the future *cool* proofs will have *romantic* consequences and moreover that computer mathematics will have viable applications.

## Discussion

N. Shah (*British Society of History of Mathematics*, *Durham*, *UK*). Automated reasoning systems use either logic or intuitionistic type theory. Which (in the speaker's opinion) will win out?

D. H. Barendregt. Intuitionistic type theory also uses logic, just a more explicit one. You are probably asking whether the classical logic of some systems will win out over the intuitionistic logic used in others. Good question! I think that when the technology gains momentum the classical systems will be in the majority, but that in the long term the intuitionistic ones will win. After all, they can also operate in a classical mode by assuming the principle of the excluded third.

A. Bundy (*School of Informatics*, *University of Edinburgh*, *UK*). If the mathematicians in the audience accept your argument then they will start to use automatic theorem provers. How long will it take before this becomes commonplace?

D. H. Barendregt. Pessimists think in 50 years; optimists in 10 years.

A. V. Borovik (*School of Mathematics*, *University of Manchester*, *UK*). What are advantages of deterministic proof checkers over simpler and quicker non-deterministic procedures? If a non-deterministic procedure confirms the validity of the statement with probability of error less than one in ten, after repeating it 100 times we have the probability of error in the non-deterministic judgement being less than one in ten to the power of one hundred, which is smaller than the probability of hardware error in the computer.

D. H. Barendregt. For practical purposes I do dare to step into an airplane with such a low chance of falling down. So it is a matter of aesthetics. Nevertheless, the method of proof-checking applies to the correctness of your non-deterministic procedure as well.

A. J. Macintyre (*School of Mathematical Sciences*, *Queen Mary*, *University of London*, *UK*). The Isabelle proof of the Prime Number Theorem is based on the elementary proof. This proof is regarded by mathematicians as less illuminating than the complex variable proof. When will Isabelle be able to do the latter proof? Is the ‘library’ for a Master's Programme realistic without this?

D. H. Barendregt. We need more formalized libraries and in order to get these more certified tools. When this will happen depends on how hard we as community work. Complex variables should definitely be in the ‘library’ of a certified Master's.

P. H. Hines (*retired*). ‘Romantic’ proof inspires other mathematicians. But cool/computer proof does not.

D. H. Barendregt. Cool proofs have a romantic flavour of their own. Some centuries ago biologists got excited about flowers and bees. This is still the case. But now they also get excited about genome sequences.

M. Atiyah (*Department of Mathematics & Statistics*, *University of Edinburgh*, *UK*). I can understand how a computer could check a proof, or even develop in detail a proof which was only outlined. It would do what a competent research student could undertake. But in the real mathematical world the proof, or even the formulation of a theorem, is rarely known in advance, even in outline. It is hard to see how a computer can assist in such an ill-defined process.

D. H. Barendregt. The human provides the intuition, and then wants to check that intuition by constructing proofs. At first the proofs are sketchy. It is at this phase that mathematical assistants can help.

## Acknowledgments

The authors thank Mark van Atten, Wieb Bosma, Femke van Raamsdonk and Bas Spitters for useful input.

## Footnotes

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof’.

↵One of these referees told us the following. ‘If an ordinary non-trivial mathematical paper contains an interesting idea and its consequences and obtains ‘measure 1’, then Wiles' proof can be rated as having measure 156.’

↵In predicative systems a subset of an infinite set *X* can only be defined if one does not refer to the class of all subsets of *X*. For example, a subset given by a condition on the elements of *X* is allowed, but not one whose defining condition quantifies over all subsets of *X*.

↵By ‘rational integers’ Hilbert just meant the set of integers ℤ. This problem is equivalent to the problem over ℕ. The solvability of Diophantine equations over ℚ is still open.

↵This definition scheme was generalized by Scott (1970) to inductive types. For example, over the binary trees introduced above one can define a primitive recursive function mirror that swaps the two subtrees of every node. It mirrors the tree displayed above.

↵Previously called (*total*) *recursive functions*.

↵For substantial computations one needs to introduce decimal (or binary) notation for numbers and prove that the operations on them are correctly defined. In the history of mathematics it was al-Khowàrizmì (780–850) who, contrary to what the name suggests, did not introduce these algorithms, but proved that the well-known basic operations on decimal numbers are correct.

↵Here ‘!’ stands for ‘∀’ and ‘\’ stands for ‘λ’.

↵Structurally this last version is what one would like to write, but typographically it still is not ideal. Perhaps one day it will be possible to use in a proof assistant a mathematical style like in † above or, at least, the LaTeX source for it.

↵In this proof ‘`limit_in1 f D l x0`’ is the Coq notation for the statement that *f*(*x*) tends to *l* as *x* tends to *x*₀, where *x* ranges over the set *D*.

↵Actually, this is a mixture of Mizar and Coq. The proof language is Mizar, but the statements are written in Coq syntax. We do this to be able to follow the Coq script very closely.

↵Another difference was that in Nijmegen we formalized an intuitionistic proof, while Harrison formalized a classical proof. But when analysing the formalizations, it turned out that this was not the main reason for the difference in work needed.

↵Indeed, if every theorem of length *n* had a proof of length bounded by a computable function of *n*, then theoremhood would be decidable by checking all the possible candidate proofs.

© 2005 The Royal Society