## Abstract

To those brought up in a logic-based tradition there seems to be a simple and clear definition of proof. But this is largely a twentieth century invention; many earlier proofs had a different nature. We will look particularly at the faulty proof of Euler's Theorem and Lakatos' rational reconstruction of the history of this proof. We will ask: how is it possible for the errors in a faulty proof to remain undetected for several years—even when counter-examples to it are known? How is it possible to have a proof about concepts that are only partially defined? And can we give a logic-based account of such phenomena? We introduce the concept of *schematic proofs* and argue that they offer a possible cognitive model for the human construction of proofs in mathematics. In particular, we show how they can account for persistent errors in proofs.

## 1. Introduction

To those brought up in a logic-based tradition there seems to be a simple and clear definition of proof. Paraphrazing Hilbert (Hilbert 1930):

A proof is a sequence of formulae each of which is either an axiom or follows from earlier formulae by a rule of inference.

Let us call a proof in this format *Hilbertian*.

But formal logic and its Hilbertian view of proof is largely a twentieth century invention. It was invented to help avoid erroneous proofs and to enable proofs about proofs, for instance Gödel's proof of the incompleteness of arithmetic (Gödel 1931). Formal logic has since become the basis for automated theorem proving.

Prior to the invention of formal logic, a proof was any convincing argument. Indeed, it still is. Presenting proofs in Hilbertian style has never taken off within the mathematical community. Instead, mathematicians write *rigorous* proofs, i.e. proofs in whose soundness the mathematical community has confidence, but which are not Hilbertian.

To see that rigorous proofs are not Hilbertian, consider erroneous proofs. The history of mathematics is full of erroneous proofs. The faults in some of these proofs have remained undetected or uncorrected for many years. However, Hilbertian proofs can be readily checked. We merely need to ask of each formula in the proof: ‘is it an axiom?’ or ‘does it follow from earlier formulas by a rule of inference?’. Such checking can be readily automated—and, indeed, it often has been. But even in the age before computers, postgraduate students could have carried out the necessary checks as an apprentice exercise. Mathematical proofs are subjected to a lot of checking, so we can conclude that proofs containing persistent errors are unlikely to be Hilbertian. If a Hilbertian proof contained an error, it would surely be quickly detected and corrected.

This may lead us to ask, what are the alternatives to Hilbertian proof presentation? Can these alternative presentations help us to understand how a fault can lie undetected or uncorrected for a long time? Could we formalize these alternative presentations? Could we automate such mathematical reasoning?

We will present an alternative method of proof presentation that we call *schematic proof*. Schematic proof is partly inspired by Lakatos's rational reconstruction of the history of Euler's Theorem (Lakatos 1976). It has been automated in two projects at Edinburgh, using the constructive *ω*-rule (Baker 1993, Jamnik 2001).

We argue that schematic proofs offer a possible cognitive model for the human construction of proofs in mathematics. In particular, we show how they can account for persistent errors in proofs. They also help explain a key role for examples in the construction of proofs and a paradox in the relative obviousness of different, but related, theorems.

## 2. Lakatos's discussion of Euler's Theorem

Euler's Theorem1 states that, in any polyhedron, *V*−*E*+*F*=2, where *V* is the number of vertexes, *E* is the number of edges and *F* is the number of faces. This theorem is illustrated in the case of the cube in figure 1.

In (Lakatos 1976), Imre Lakatos gives a rational reconstruction of the history of Euler's Theorem. This history is set in a fictitious classroom of extremely bright students and a teacher. The students adopt different roles in the history of the evolution of mathematical methodology. The teacher leads them through this history. To initiate the discussion, the teacher presents a ‘proof’ of Euler's Theorem due to Cauchy, which we reproduce in figure 2. Later, the teacher and the students discover various counter-examples to the Theorem. They use these counter-examples to analyse the faulty proof and propose a wide variety of different methods for dealing with the conflict between alleged proofs and counter-examples. These methods involve refining definitions, correcting proofs, modifying conjectures, etc. Lakatos's mission was to show how the methodology of mathematics had evolved: becoming more sophisticated in its handling of proofs and refutations. Our mission is different: we will analyse the proof method underlying Cauchy's faulty proof and show how errors can arise from the use of this proof method.

### (a) Cauchy's ‘proof’ of Euler's Theorem

Lakatos's account of Cauchy's ‘proof’ is illustrated in figure 2 for the case of the cube. The general ‘proof’ is given in theorem 2.1.

*For any polyhedron*, *V*−*E*+*F*=*2*, *where V is the number of vertexes*, *E is the number of edges and F is the number of faces.*

Given a polyhedron, carry out the following steps.

*Remove one face and stretch the other faces onto the plane*. Note that*F*has diminished by 1, but that*V*and*E*are unchanged. So we are required to prove that*V*−*E*+*F*=1.*Triangulate the remaining faces by drawing diagonals*. Note that each new diagonal increases both*E*and*F*by 1, but leaves*V*unchanged. So*V*−*E*+*F*is unaffected.*Remove the triangles one by one*. There are two cases to consider, illustrated by step (iii) in figure 2. In the first case, we remove an edge, so that both*E*and*F*decrease by 1. In the second case, we remove two edges and a vertex, so that both*V*and*F*decrease by 1, but*E*decreases by 2. In either case,*V*−*E*+*F*=1 is unaffected.

Finally, we are left with one triangle. In a triangle, *V*=3, *E*=3 and *F*=1, so *V*−*E*+*F*=1, as required. ▪

### (b) A counter-example to Euler's Theorem

Many people found Cauchy's ‘proof’ convincing. However, as Lakatos illustrates, eventually many counter-examples were found. The simplest is the hollow cube, illustrated in figure 3.

Lakatos reports the reaction to this counter-example as a debate about whether the hollow cube is really a *polyhedron*. He offers two possible definitions of polyhedron, providing two opposite answers to this question.

*A polyhedron is a solid whose surface consists of polygonal faces*. Under this definition, the hollow cube *is* a polyhedron.

*A polyhedron is a surface consisting of a system of polygons*. Under this definition, the hollow cube *is not* a polyhedron.

It is interesting to ask how it is possible for Cauchy to have offered a proof about polyhedra, when the question of their definition was still open.2 This could not happen in Hilbertian proofs: definitions are axioms and must come first. What kind of proof *might* allow us to keep the definitions open?

## 3. Schematic proofs

Note that the proof of theorem 2.1 is a *procedure*: given a polyhedron, a series of operations is specified, whose application will reduce the polyhedron to the triangle. The value of *V*−*E*+*F* is tracked during these operations. The actual number of operations to be applied will vary depending on the input polyhedron. This is very *unlike* a Hilbertian proof, which is not procedural and in which the same number of proof steps is used for all examples.

Independently of (Lakatos 1976), the Mathematical Reasoning Group at Edinburgh became interested in the constructive *ω*-rule. We subsequently realized that this rule generates just the kind of proof used above to prove theorem 2.1. We call this kind of proof *schematic*. Schematic proofs are procedures that, given an example, generate a proof, which is specific to that example. The number of steps in each proof depends on the example.

### (a) The constructive *ω*-rule

The *ω*-rule for the natural numbers 0, 1, 2, … isi.e. we can infer that *ϕ*(*x*) for all natural numbers *x* provided we can prove *ϕ*(*n*) for *n*=0, 1, 2, …. The *ω*-rule is clearly not a very practical rule of inference, since it requires the proof of an infinite number of premises to prove its conclusion. A Hilbertian proof using it would consist of an infinite sequence of formulas. Its use is usually confined to theoretical discussions. It was first described in a published work by Hilbert (1931).

The constructive *ω*-rule is a refinement of the *ω*-rule that *can* be used in practical proofs. It has the additional requirement that the *ϕ*(*n*) premises be proved in a *uniform* way, i.e. that there exists a recursive program, *proof*_{ϕ}, which takes a natural number *n* as input and returns a proof of *ϕ*(*n*) as output. We will write this as *proof*_{ϕ}(*n*): *ϕ*(*n*). The recursive program *proof*_{ϕ} formalizes our notion of schematic proof. Applied to the domain of polyhedra, rather than natural numbers, it could be used to formalize Cauchy's ‘proof’ of Euler's Theorem given in theorem 2.1 in §2*a*.

### (b) Implementation of the constructive *ω*-rule

To demonstrate its practicality as a rule of inference, we have implemented the constructive *ω*-rule within two automated theorem-proving systems. In outline, these automated theorem-provers use the following procedure.

Start with some proofs for specific examples, e.g.

*proof*_{ϕ}(3):*ϕ*(3),*proof*_{ϕ}(4):*ϕ*(4).Generalize these proofs of examples to obtain a recursive program:

*proof*_{ϕ}.Verify, by induction, that this program constructs a proof for each

*n*.

At first sight it may appear that step (iii) replaces a Hilbertian, object-level induction with an isomorphic meta-level induction. Siani Baker's work (Baker 1993), however, shows that the meta-level induction is *not* isomorphic to the object-level one; often it is much simpler. Her work revealed many examples in which the object-level proof required generalization or intermediate lemmas, but the meta-level proof did not, i.e. the meta-level proof was inherently simpler.

Note that Cauchy's ‘proof’ omits the final verification step (iii) in the above procedure. He is trusting that his program for reducing polyhedra to triangles will work for all polyhedra. As we have seen, it does not. This helps explain the error in his proof.

Both of our implementations of the constructive *ω*-rule were for the natural numbers: Siani Baker's for simple arithmetic theorems (Baker 1993) and the second author's for a form of diagrammatic reasoning (Jamnik 2001), as described in Roger Nelsen's book ‘Proofs without words’ (Nelsen 1993). Both our implementations included the final verification step, so were guaranteed to produce only correct schematic proofs.

An example schematic proof from Baker's work is given in figure 4. Rather than describe the program in a programming language, we try to capture the infinite family of proofs that it outputs, using ellipses to indicate those steps and expressions that occur a variable number of times.

An example proof from the second author's work is shown in figure 5. Her program, Diamond, was shown proofs for two numbers and then generalized these into a program for generating the proof for any number.

## 4. The relative difficulty of proofs

Proof by induction is the Hilbertian alternative to schematic proofs of theorems about recursive data-types, such as the natural numbers. We have also investigated the automation of inductive proofs (Bundy 2001). However, evidence arising from these investigations suggests that humans do *not* use inductive proof when assessing the truth or falsity of conjectures. This is perhaps not surprising, since the formalization of mathematical induction is a relatively modern development.3 Schematic proof is an alternative candidate model of the mechanism humans use to prove conjectures over infinite domains.

To illustrate the evidence against inductive proof as a cognitive model, consider the rotate-length theorem on lists:(4.1)where *l* is a list of elements of type *τ*, len is a unary function that takes a list *l* and returns its length and rot is a binary function that takes a number *n* and list *l* and rotates the first *n* elements of *l* from the front to the back. The recursive definitions of len and rot are given in figure 6. Also defined, is a binary function < >, which takes two lists and appends them together. < > is an auxiliary function in the definition of rot. The most straightforward induction rule for lists iswhere [ ] is the empty list, [*h*|*t*] places an *h* at the front of a list *t*, *τ* is an arbitrary type and list(*τ*) is the type of lists of elements of type *τ*.

Having absorbed the definitions, most people will readily agree that the rotate-length theorem, stated in equation (4.1) is true. However, its inductive proof is surprisingly difficult. It cannot be proved directly from the recursive definitions in figure 6. The proof requires the use of auxiliary lemmas, the generalization of the theorem and/or the use of a more elaborate form of induction. For instance, one way to prove the rotate-length theorem is first to generalize it to(4.2)Although people will also readily agree that equation (4.2) is also true, they find this assessment a little more difficult than that of equation (4.1).

So, if people are using induction to assess conjectures such as equations (4.1) and (4.2), even if unconsciously, then we are faced with a paradox: what appears to be a fairly easy assessment of equation (4.1), entails what appears to be the harder assessment of equation (4.2). Moreover, the inductive proof is quite difficult, requiring, for instance, the generalization of the initial conjecture and the speculation and proof of a couple of intermediate lemmas (or some alternative but similarly complex processes) (Bundy 2001, §§ 6.3.3 and 6.2.2). This phenomenon is not rare. On the contrary, we have found lots of similar examples, where an intuitively obvious conjecture has only a complex inductive proof, requiring generalization, lemmas, non-standard inductions or a mixture of these. The intermediate generalizations, lemmas, etc. are often harder to assess than the original conjecture.

The schematic proof of the rotate-length theorem is given in figure 7. This does *not* require any generalizations or intermediate lemmas and is fairly straightforward. The schematic proof can be viewed as evaluating the theorem on a generic list using the recursive definitions. In informal experiments, when we have asked subjects what mechanism they used to assess the truth/falsity of this theorem, they report a process that resembles this schematic proof,4 making it a candidate for a cognitive model.

## 5. Schematic proofs as a cognitive model

We hypothesize that schematic proofs provide a cognitive model of some mathematical proofs. We are soon to start a project to test this hypothesis. It appears to provide a good fit to procedural-style proofs, such as Cauchy's ‘proof’ of Euler's Theorem and some informal justifications of inductive conjectures. It gives an account of how certain kinds of errors might occur: if the final verification step is omitted, then it is a matter of luck whether a schematic proof will produce a sound proof for every example. If the number of possible examples is infinite or very large, it may take some considerable time before someone comes across an example for which the schematic proof fails. It may be difficult to tell *why* the schematic proof fails on this example and how to repair it.

The constructive *ω*-rule provides a logic-based, formal account of schematic proof, as an alternative to the standard Hilbertian account. This gives us a sound mathematical basis to investigate rigorous,5 as opposed to Hilbertian proof. It fills a gap in MacKenzie's classification of proofs (this volume) by providing an example of a rigorous but mechanizable style of proof.

To confirm our hypothesis, we will have to carry out a series of psychological experiments to compare human mathematical performance with our implementations and formal accounts of schematic proof. Here is an outline of the kind of experiment we might perform.

Present theorems and near-miss non-theorems to human mathematicians.

Ask for both their judgement of the truth/falsity of each example, together with a justification of that decision.

Investigate how subjects explore and absorb the meaning of recursive definitions. Do they try examples instances? Do they reason inductively? We may provide a computational tool that will support subjects' exploration and keep a record of these explorations as experimental data.

Try to model these justifications and explorations computationally.

Decide whether schematic proof or a Hilbertian proof provides a better explanation of these justifications or whether neither is a good fit and there is a third possibility.6 It may be that subjects' preferences vary according to their reasoning style. We may want to apply some pre-tests to classify the subjects' reasoning styles.

There has been some previous work on building computational models for how students recognize patterns in numbers and construct functions to generate the numbers (Haverty *et al*. 2000). This work could provide guidance on our experimental design, for instance, asking the students to speak aloud during the experiment, then analysing the resulting protocols for clues about internal processing.

We also plan to conduct an investigation of some historical proofs, especially those that were later found to be faulty, to see if schematic proof can offer an explanation of the mechanism used and help explain why errors were made that were hard to detect or correct.

## 6. Discussion

In this section, we discuss some related work.

### (a) Comparison to type theory

There are superficial similarities between the proposals here, to use the constructive *ω*-rule to produce schematic proofs and the formal representation of proofs in constructive type theory (Martin-Löf 1984). Both, for instance, associate, with each theorem, an object that can be interpreted as both a program and a proof. However, there are also significant differences.

Firstly, the object associated with a theorem in type theory can be interpreted both as a program constructed by the proof and as a verification proof of the correctness of that program and of the theorem. Moreover, the type theory proof proves the *whole* theorem. On the other hand, the program associated with a theorem by the constructive *ω*-rule is not a general proof of the theorem; it is a program which will *generate* a putative proof for each instance of the theorem. Note that members of this family of proofs can contain different numbers of steps.

Secondly, the program in type theory is generated by the process of proving the theorem. Its correctness is guaranteed by the soundness of type theory. The program in the constructive *ω*-rule is generated by inductive generalization from a few example proofs of these theorem instances. As we have shown, its correctness is *not* guaranteed; an additional meta-level proof is required to establish this. This meta-level proof has no counterpart in type theory.

### (b) Rigorous proof as Hilbertian proof highlights

Many accounts of rigorous proof implicitly or explicitly adopt the position that a rigorous proof is essentially a Hilbertian proof, but with steps missing; possibly, 90% or more of the steps. This is probably a valid account of many human-constructed proofs. We must then ask how errors may arise in such proofs and how these errors may lie undetected.

It is not the case that mathematicians first produce the Hilbertian proofs and then summarize them for publication by eliding 90%+ of the steps. Firstly, this explanation is at variance with accounts of proof discovery. Secondly, the few attempts to turn rigorous proofs into Hilbertian proofs often reveal errors. For instance, Jacques Fleuriot's formalization of Newton's proofs of Kepler's Laws, using the Isabelle prover and non-standard analysis (Fleuriot 2001), revealed an error in Newton's manipulation of infinitesimal numbers. Even Hilbert himself was not immune. Laura Meikle's Isabelle formalization of Hilbert's Grundlagen (Meikle & Fleuriot 2003), revealed that Hilbert had appealed to the semantics of the geometry domain rather than just the axioms and rules of the formal theory. Thirdly, rigorous proofs often prove unreasonably hard to check. We have argued that checking Hilbertian proofs should be routine: even Hilbertian proofs with many missing steps. At the time of writing, putative proofs of both the Poincaré Conjecture and the Kepler Conjecture are undergoing extensive checking.

Human accounts of proof discovery suggest that mathematicians first form a plan, which they then unpack until they are satisfied of the truth of each proof step. This process of proof discovery can also be automated using the technique of *proof planning* (Bundy 1991). However, in our automation of proof planning, the plan is unpacked into a Hilbertian proof. Humans stop short of this level of unpacking. It would be interesting to investigate how they decide when to stop. A possibility worth investigation is that schematic proof is used at the leaves of the proof plan, i.e. that the proof plan is unpacked until the remaining subgoals can be checked against a few well-chosen examples. This would explain how errors could be introduced into the proof plan. It also unites our two rival accounts of rigorous proof.

## 7. Conclusion

The standard Hilbertian account of mathematical proof fails to model some historically important proofs, to account for the possibility of undetected and uncorrected error and to account for the relative difficulty of proofs. Schematic proof provides an alternative account of proof that *does* address these issues. Schematic proofs are based on the constructive *ω*-rule, which provides a formal, logic-based foundation. This, for instance, enables us to automate the construction and application of schematic proofs. Schematic proof provides a link to computer program verification, in which an invariant formula, analogous to *V*−*E*+*F*, is shown to be preserved by successive computational operations. Just like Cauchy, programmers who do not verify their programs run the risk that their systems will fail on unforeseen inputs. The process of forming schematic proofs by generalizing from examples provides a key role for examples in the construction of proofs. This may go some way to explain why humans find models, diagrams, etc. so valuable during proof discovery.

We are now planning to conduct some psychological investigations into the extent to which schematic proofs can account for the mechanisms of human proof discovery.

## Discussion

S. Colton (*Department of Computing, Imperial College London, UK*). In the (admittedly few) times I have undertaken research mathematics, half of the time, I have progressed in the style of Fermat, Goldbach and Euler, i.e. I have noticed a pattern, then proved that the pattern is no coincidence. However, half the time, the theorem statement emerged at the same time that I finished the proof, because when I started, I did not know exactly what I wanted to prove. I worry that in asking the question ‘What is proof?’, this emphasizes a distinction between theorem and proof, whereas often, they should not be differentiated. I was wondering what your thoughts were about this.

A. Bundy. I strongly agree that the development of the theorem and the development of the proof often proceed in parallel. Indeed, in axiomatic theories, the development of the theory and the definition of the concepts *also* sometimes proceeds in parallel with proof and theorem development. In automatic theorem proving this mutability of definitions, theorem and theory is usually neglected: the theory and theorem are taken as given and fixed. I have always been interested in the potential for interaction between axioms, definitions, theorems and proofs and hope to investigate this further in my future research. For instance, I previously conducted joint research with Raul Monroy on the automatic detection and correction of false conjectures, using a failed proof attempt to guide the correction process. Currently, I have a project with Fiona McNeill that uses failures in proof to change the underlying representation of the theory. I have a project proposal to build a proof management system that will assist users to manage a large corpus of proofs, including adapting the proofs to accomodate changes in axioms and definitions.

E. B. Davies (*Department of Mathematics, King's College London, UK*). In some areas of mathematics almost the entire interest consists in finding appropriate generalizations of simple examples, guided by what is possible and what seems natural, beautiful and simple. Different mathematicians have frequently been led to quite different generalizations of the same theorem. Checking the correctness of the proofs is not the interesting part of the subject. Do you have any comments?

A. Bundy. More generally, I think that mathematical reasoning consists of the interaction between a heterogeneous collection of processes: theory development, conjecture making, counter-example finding, formalization of informal problems, calculation, as well as proof discovery. Among these processess, proof discovery has received a disproportionate amount of attention from the automated reasoning community. In my group's research, we have tried to investigate a wider range of mathematical processes. For instance, Simon Colton has built the HR system for theory development, including the formation of concepts and conjectures by generalization from examples.

T. Anderson (*CSR, School of Computing Science, University of Newcastle, UK*). Must we consign mathematics to the dustbin until computers have confirmed the validity of the theorems and proofs?

A. Bundy. What an extraordinary idea! Firstly, Lipton has argued in this meeting that the correctness of a proof is confirmed by a social process of interaction between mathematicians. We might want to integrate our computer systems into this social process so that they played a complementary role to the human mathematicians, but that requires solving very hard problems about the accessibility of and interaction with computer proof. Secondly, automated theorem proving is not yet capable of dealing with most state-of-the-art mathematics, but needs augmenting with human guidance, which calls for skills that are in short supply. Thirdly, even if the propensity of humans to err means we cannot be 100% confident of the correctness of a proof, a human-generated proof could still form the starting point for a computer verification—indeed, most interative proof case studies have taken this form. Fourthly, human mathematicians find mathematics fun. Why would they stop having fun? I think we need to find niches where automated mathematical reasoning systems can play a useful complementary role to human mathematicians, e.g. in checking very large or complicated, but elementary proofs, in computer-aided teaching tools, in paper writing and refereeing aids, etc. We can also use automated reasoning as a modelling mechanism to try to understand human reasoning.

N. Shah (*British Society of History of Mathematics, Durham, UK*). Professor Bundy mentioned that Newton's Principia had a flaw. The flaw was addressed (as I understand it), by Berkeley and the axiom of the first and last ratio.

A. Bundy. Bishop Berkeley addressed some general worries about the consistency of reasoning with infinitesimals. As a result of such criticisms, the use of infinitesimals in calculus was replaced by the epsilon/delta arguments of real analysis. Interestingly, work by the logician Abraham Robinson and others in the 1960s provided a consistent theory of infinitesimal and infinite numbers, called non-standard analysis, within which the arguments about infinitesimals in the Principia and elsewhere can be formulated without the risk of paradoxes. Jacques Fleuriot's mechanization of parts of the Principia, as mentioned in my paper, was based on Robinson's non-standard analysis. The particular flaw Fleuriot discovered in Newton's ‘proof’ of Kepler's laws of planetary motion, was not found by Berkeley, nor, to the best of our knowledge, by anyone else in the 318 years since the Principia was published, even though Newton himself was aware of this kind of flaw (dividing both sides of an equation by an infinitesmal). This illustrates how the nit-picking detail required in computer proofs can help uncover problems that a human's more cursory reading will skim over without noticing.

N. Shah. Historically, mathematicians are interested in the intention behind the maths, and so long as their proof submitted can be repaired if wrong, mathematicians are not going to be convinced by proof reasoning people that the i's dotted/t's crossed are the most important thing.

A. Bundy. One of the most surprising lessons I have learnt from this meeting is the tolerance of mathematicians for minor errors in proofs. The clearest example of this was Michael Aschbacher's estimate of the probablity of error in the classification of finite groups as *p*=1. His conclusion was not to reject the proof, but to assume that any error could be readily repaired, if necessary by adding a few more categories in the classification. I interpret this stance as evidence of the hierarchical nature of proofs: if the high-level structure is right then any error in the low-level detail can probably be repaired. My research group has exploited this hierarchical structure in our work on proof plans: a hierarchical way of describing computer proofs, which can be used to guide the automated search for a proof and to automatically repair failed proof attempts. Despite this tolerance for minor error, I am guessing that mathematicians would still welcome a tool that could detect, and maybe even repair, such errors.

M. Atiyah (*Department of Mathematics & Statistics, University of Edinburgh, UK*). I think I have to defend Newton! Any error in his work on planetary orbits must surely have been of a minor kind, easily rectifiable. His results have been rederived by modern methods innumerable times and experimentally confirmed, so it is not significant if such a minor lapse actually went undetected until recent times.

A. Bundy. This remark surely provides further evidence of the point made by the last questioner and in my response to it. Indeed, Newton's error *was* of a minor kind. It was rectified by Fleuriot, who was able to replace the faulty step with a correct sub-proof. My Principia example was not meant to be a criticism of Newton. Rather, I was using it to illustrate both how hard it can be to detect minor errors, even in one of the world's oldest and most well-known mathematics books, and how a computer proof can help us detect and repair such errors.

A more serious example arises from the more recent work of Fleuriot and his student Laura Meikle (a student helper at this meeting). They automated Hilbert's Grundlagen, which was an attempt to formalize Euclidean Geometry without the use of geometric intuition. But the detailed computer reconstruction showed that Hilbert *had* used geometric intuition, although again the missing formal steps could be provided. However, although the errors were minor, their presence was not, since it undermined the raison d'etre of Hilbert's work.

R. Pollack (*School of Informatics, University of Edinburgh, UK*). Questioners objected that the error in Newton's Principia was minor, and after all, the theorem was correct, so nothing much was gained by Fleuriot discovering and fixing the error. But at the same time, questioners insisted on the importance of the deep understanding captured in the proof: this is trying to have your cake and eat it at the same time.

A. Bundy. It all depends on what you mean by ‘deep’. You could argue that the deep understanding arises from the high-level structure of the proof, which captures the essential intuition behind the proof, rather than the detailed low-level proof steps within which the error was found. On the other hand, many of the paradoxes that arose from the informal use of infinitesimals arose from just such faulty low-level proof steps. Any ‘deep understanding’ of the cause of these paradoxes would require the investigation of exactly these low-level steps.

## Acknowledgements

The research reported in this paper was supported by EPSRC grants GR/S01771 and GR/S31099 (Bundy), an EPSRC Advanced Research Fellowship GR/R76783 (Jamnik) and a Swedish Institute Guest Scholarship and EPSRC/MRC Neuroinformatics Studentship EP/C51291X/1 (Fugard). We are grateful to Jürgen Zimmer for help with (Hilbert 1930) and to Torkel Franzen for discussions about the *ω* rule.

## Footnotes

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof’.

↵More properly, this would be called ‘Euler's Conjecture’, since he proposed, but did not prove it.

↵Nor is it yet closed, since terms such as surface, system, etc. have still to be defined.

↵It is usually attributed to Richard Dedekind in 1887, although informal uses of induction date back as far as Euclid.

↵For instance, Aaron Sloman, personal communication.

↵Rigorous proofs are sometimes called ‘informal’, but this is a misleading description since, as we have seen, they can be formalized.

↵Another potential candidate is a mechanism we have investigated for reasoning with ellipsis (Bundy & Richardson 1999).

- © 2005 The Royal Society