On proofs and Kolmogorov complexity Working paper


On proofs and Kolmogorov complexity

Working paper

Comments welcome; my email: armandobcm@yahoo.com

Armando B. Matos December 29, 2013

Various forms of proof complexity, such as the number of lines and the symbol-length, have been studied. In this work we study the Kolmogorov complexity, or just complexity, of formal proofs. In brief, we show that, in a “reasonable” formal system, every theorem has proofs of arbitrarily large complexity and also has a low complexity proof, that is, a proof with a complexity not exceeding the complexity of the theorem.

Gregory Chaitin argued that it is impossible to deduce a theorem with a bits of information (Kolmogorov complexity, KC, or simply complexity) from a set of axioms with less than a bits of information. Franzén presented a counter-example in which a theorem with an arbitrarily large complexity can be deduced from a simple formal system. We justify this discrepancy by analysing the reasons for the increase of the complexity of a theorem.

The proof process may introduce “extra” complexity in the theorem that is proved (the last line of the proof) by three mechanisms that are discussed and illustrated in this work, namely: the selection of the inference rule and of the previous lines used to infer a new line of the proof; the axiom instantiation mechanism (for instance, the instantiation of a universally quantified formula, the selection of an axiom from an axiom schema involving a term or formula, or the selection of an axiom schema from a set of axiom schemata); and the length of the proof itself. As shown in the examples, all these mechanisms can cause an unbounded increase in complexity, and may be seen as non-deterministic steps of the proof.

If the Church-Turing thesis is accepted, an entirely deterministic formalization of formal proofs is also possible. It consists of a (deterministic) Turing machine that enumerates all the ⟨theorem, proof⟩ pairs deducible in the formal system. Using this alternative view of proof, we analyze the origin of the possibly large value of the complexity of a theorem.

We also show that every theorem t has a proof whose complexity is not larger than the complexity of t; in particular, not larger than the length (in number of symbols) of t. This is counter-intuitive because empirical observation suggests that there are theorems t that seem to require a proof much more complex than KC(t).


1 Introduction

Various authors have studied several forms of proof complexity including the number of lines (or steps) and the symbol length, see for instance [7, 3, 15, 12]. In this work we concentrate on the Kolmogorov complexity of proofs and theorems.

Gregory Chaitin has argued that in any formal system it is impossible to prove a theorem that contains more information than the formal system¹. For instance, in [5] he writes

[. . . ] if a theorem contains more information than a given set of axioms, then it is impossible for the theorem to be derived from the axioms.

and again in [4]

If a set of theorems constitutes t bits of information, and a set of axioms contains less than t bits of information, then it is impossible to deduce these theorems from these axioms.

An “improved version” of this statement appears in [6],

One can not prove a theorem from a set of axioms that is of greater complexity than the axioms and know that one has done this. I.e., one can not realize that a theorem of substantially greater complexity than the axioms from which it has been deduced, if this should happen to be the case.

Franzén presents the following counter-example in [8].

Example 1 (Axiom instantiation as a source of KC)

Consider a formal system (in the domain of the integers) with the only axiom “∀n: n = n”. Let xxx be a very large, Kolmogorov random integer. Then, with only one axiom (of very low complexity), we can in one step prove that “xxx = xxx”, which is a statement having very large Kolmogorov complexity. We get a similar conclusion if we use the axiom schema “n = n” (which represents an infinite number of axioms, see page 6), instead of the axiom “∀n: n = n”.
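As a rough illustration of Example 1, the following Python sketch compares the compressed size of the axiom with that of one of its instances. Kolmogorov complexity is not computable, so the zlib-compressed length is used here only as a crude upper-bound proxy; the names `compressed_size` and `instantiate` are ours, and the integer drawn from the operating system's entropy source merely stands in for xxx.

```python
import os
import zlib

def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed text: a crude upper-bound proxy for KC."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def instantiate(n: int) -> str:
    """Instance of the axiom 'forall n: n = n' at the integer n."""
    return f"{n} = {n}"

AXIOM = "forall n: n = n"

# Stand-in for xxx: 2048 bits from the OS entropy source. This is an
# assumption of the sketch; Kolmogorov randomness cannot be certified.
xxx = int.from_bytes(os.urandom(256), "big")
theorem = instantiate(xxx)

axiom_size = compressed_size(AXIOM)
theorem_size = compressed_size(theorem)
print("axiom:", axiom_size, "bytes; instance:", theorem_size, "bytes")
```

The single instantiation step is responsible for the whole difference between the two sizes; nothing else in the one-line proof depends on xxx.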

This work has two contrasting parts:

¹ Chaitin’s “result” mentioned above should not be confused with another result (see for instance [4]): there is an integer m such that it is impossible to prove that any particular string has Kolmogorov complexity greater than m.


– Proofs (and theorems) can be arbitrarily complex, Section 3, page 7. Just as shown in Franzén’s example above, the complexity of a proof may be arbitrarily larger than the complexity of the axioms from which it is derived. In Section 3.6 (page 11) we summarize the reasons for this increase in complexity.

– Proofs do not have to be complex, Section 4, page 12. We also show that the complexity of a proof does not need to exceed the length² of the theorem that is proved plus a constant. Thus, minimum proofs may have an arbitrarily large length, but there is always a proof with a complexity bounded by the length of the theorem (plus a constant). This implies that the common idea that there are theorems whose minimum proof is very complex is false. The difference between the length of a proof and the length of its (minimum) description, that is, its Kolmogorov complexity, was already mentioned by Parikh [11]:

[. . . ] reader’s common sense to agree that a proof with over 10¹⁰⁰⁰ symbols is not itself feasible, though it may be possible to give a feasible description of it.

We show, and the proof is simple, that this always happens, i.e., every theorem with a feasible description (in particular with a feasible length) has a proof with a feasible description. Here “feasible description” means small Kolmogorov complexity.

In Section 5 (page 16) we briefly discuss the influence of the complexity of the formal system in use, mentioning the modifications of our results caused by this generalization.

2 Preliminaries and notation

The following abbreviations are used.

F: formal system with domain N.

proves(p, t): p is a proof of t.

Check: a TM such that Check(t, p) checks if p is a proof of Theorem t in a specific formal system. A proof should be clear and easy to verify; for every formal system there should be efficient proof checkers, in the sense that the execution (checking) time should be bounded by a low degree polynomial.

TM: Turing machine (deterministic).

² For a better bound replace “length” by “complexity”.


NDTM: non deterministic Turing machine.

UTM: universal Turing machine.

Th-enu: a TM that enumerates the theorems of a specific formal system.

TP-enu: a TM that enumerates the pairs ⟨theorem, proof⟩ of a specific formal system.

KC and “complexity”: Kolmogorov complexity. We will use the plain Kolmogorov complexity C instead of the prefix complexity, see [10]. That is, KC(x) ≡ C(x). By “random” we mean “Kolmogorov random”.

LKR: large and Kolmogorov random integer (C(x)≈ |x|).

xxx: a specific LKR.

The designations “KC”, “complexity” and “Kolmogorov complexity” will be used interchangeably. The equalities and inequalities involving the Kolmogorov complexity C are to be understood “apart from a constant”.

The replacement of the variable x by the term t in formula ψ is denoted by ψ[t/x].

On formal proofs, the Church-Turing thesis, and Turing machines

In a formal system it is usually assumed that there is a mechanical method for deciding if a string of symbols p is a proof of the theorem t (which is also a string of symbols). Thus, the relation expressed by the predicate proves(p, t) is recursive and, by the Church-Turing thesis, there is a Turing machine³ that always halts with input ⟨p, t⟩, accepting or rejecting p as a proof of t.

Machines Th-enu and TP-enu are deterministic. In particular, when enumerating the theorems or the ⟨proof, theorem⟩ pairs of a formal system, there are no non-deterministic axiom instantiations: all possible instantiations are considered in sequence. The KC of the Th-enu and TP-enu machines is not much larger than the complexity of the corresponding formal system.

The domain of the formal systems is assumed to be N. “Much less than” and “much greater than” are denoted by “<<” and “>>” respectively.

³ This is a well known consequence of the Church-Turing thesis; we cite for instance [2], page 189: . . . [on whatever reasonable proof procedure one prefers] . . . one can effectively decide whether a given object D is a deduction of a given sentence from a given finite set of sentences Γ₀. If Γ is an infinite set of sentences, then a deduction of D from Γ is simply a deduction of D from some finite subset of Γ₀, and therefore, so long as one can effectively decide whether a given sentence C belongs to Γ, and hence can effectively decide whether a given finite set Γ₀ is a subset of Γ, one can also effectively decide whether a given object is a deduction of D from Γ. Church’s thesis then implies the following. [. . . ] (P) If Γ is a recursive set of sentences, then the relation “Σ is a deduction of sentence D from Γ” is recursive. The existence of a Turing machine that enumerates all the pairs ⟨p, t⟩ such that p is a proof of t is a simple consequence of proposition (P).


We view a (“traditional”) formal proof as a sequence of lines A₁, A₂, . . . , A_n, each one being a logical formula; the theorem proved is A_n. Each line A_i is either an axiom or is inferred from the previous lines A₁, A₂, . . . , A_{i−1} by a rule of inference (RI). Instead of using traditional proofs⁴ we can use other proof formalisms, such as the sequent calculus (see for instance [9], Chapter XV), and obtain identical conclusions.

Axiom schemata

As mentioned before, instead of an axiom we can use an axiom schema, which represents an infinite set of axioms of the sort exemplified below.

Example 2 (of axiom schema)

(From [13], Chapter III) The axiom schema (F ⊃ (G ⊃ F)), where “⊃” means “implies”, represents all axioms that can be obtained by replacing F and G by logical formulas. We can of course replace this schema by the single second-order axiom ∀F G: (F ⊃ (G ⊃ F)).

Associated with axiom schemata there is a meta-rule of uniform substitution by which we get a single axiom by replacing schema variables by (integer or propositional) terms.

An axiom schema and a single (universally quantified) axiom can often be used interchangeably. For instance, the single axiom “∀x : x = x” can replace the infinity of axioms represented by the axiom schema “x = x” (where x is a schema variable).

Although the set of axioms is often infinite, and this usually happens when there are axiom schemata, there is a weak condition that guarantees that the set of provable statements (and the set of pairs of the form ⟨theorem, proof⟩) is recursively enumerable.

Theorem 1 If the set of axioms is recursively enumerable, the set of provable statements is also recursively enumerable.

A formal theory is effectively generated if the set of axioms is r.e. We assume that this is the case.

⁴ Roughly, Hilbert-style deductions.


3 Reasons for the increase of KC

As illustrated by Franzén’s example, the KC of a theorem (and of a formal proof) can be larger than the KC of the axioms from which it is derived. In this work we discuss the reasons for this increase in KC.

There are at least two ways of specifying a proof. One is the traditional form in which a non-deterministic machine (possibly the mathematician) prints a sequence of lines, the last of which is the theorem that is proved. The other is based on a deterministic machine that enumerates all the pairs ⟨theorem, proof⟩.

Traditional proof. There are three main causes of the Kolmogorov complexity of a proof or theorem, namely:

(a) Axiom instantiation: the instantiation of the universal variables of the axioms to generate a new line of the proof. Franzén’s example described above (Example 1, page 3) is of this kind. In this class we include the selection of a particular axiom from an “axiom schema”, see page 6.

(b) Selection: the selection of the RI and the previous proof lines used to infer a new line of the proof. Independently of the length of the proof, the number of previous lines used in each step is of course bounded by a constant. Example 5 (page 12) is of this kind.

(c) Length: consider for instance a formal proof where there are no choices in the proof (no possible “selections” or “axiom instantiations”). The Kolmogorov complexity of the last line, the theorem, may be large just because the proof itself is very long; this is because a theorem may specify in particular the length of such a proof.

This shows that the length of a proof may be a (usually minor) contribution to the complexity of the proof. Example 4 (page 11) is of this kind.

Enumerating machines. A Turing machine that enumerates the theorems of a formal system can be used to print the proof of any particular theorem. Apart from a constant, the large Kolmogorov complexity of a proof or theorem is due to the necessity of specifying the “time” (the Turing machine step number, or the enumerated theorem number) when the theorem is printed. Usually this time is extremely large.

In the following sections we describe in more detail these “KC increasing” mechanisms.


3.1 Traditional proofs: instantiating axioms

A possible step of a (traditional) formal proof is the instantiation of a universally quantified variable (either in an axiom or in a previous line of the proof) with a particular integer or, more generally, with a term.

Definition 1 (Axiom instantiation) Given a universally valid formula ψ in a first-order language, a variable x and a term t, the formula ψ[t/x] is also universally valid. This process will be called “axiom instantiation” or AxInst. We also denote by AxInst the similar process of selecting an axiom from an axiom schema.

For instance, from an axiom of the form “∀x: P(x)”, where P is some integer unary predicate, we can deduce P(5).

This “axiom instantiation” may increase the KC of a proof by an arbitrary amount⁵. That is exactly what happens in the proof step

∀x:x=x → xxx=xxx

Many other common axioms are universally quantified expressions. Also, the selection of an axiom from an axiom schema is a similar process. For instance, the associativity axiom is

∀x∀y∀z: ((x+y) +z) = (x+ (y+z))

If we replace x → xxx, y → yyy, z → zzz, and the sum of the 3 values is LKR, the KC of the proof increases with this replacement.

In order to further illustrate the possible increase of a statement’s (or proof’s) KC, we consider two Peano axioms, namely P2 and P3, page 121 of [14]. The successor of x is denoted by x′.

P2: If x is a natural number, then x′ is a natural number.

Here, the conclusion “x′ is a natural number” can have a large KC only if the antecedent “x is a natural number” already has a large Kolmogorov complexity. Thus, using this axiom cannot substantially increase the KC of the statement.

P3: There is no natural number x such that x′ = 0.

This is equivalent to ∀x: ¬(x′ = 0), and we have an axiom of the form ∀x: φ, which can cause a large increase of KC. We may, for instance, derive ¬((xxx)′ = 0), that is, the successor of xxx is not 0.

⁵ When we talk about the KC of a proof, we can alternatively talk about the KC of a theorem, because the term φ may eventually occur in the theorem statement.


3.2 Traditional proofs: “selection”

A proof can be seen as a sequence of “lines” where each line is either an axiom or follows from the previous lines by an RI.

Thus, in a long proof there may be a large number of choices, because for each step we have to select the axioms, or the rules of inference and the previous lines, from which the current line is deduced. This sequence of choices may contribute substantially to the KC of the proof or theorem. In other words, even if no AxInst is used, the theorem or proof may have a KC much larger than the complexity of the formal system.

In this work these selections will be denoted by SEL and will sometimes be referred to as “axiom selections” (although they also involve in general previous proof lines or rules of inference) or simply “selections”. Notice that AxInst may be seen as a particular case of SEL; however, the KC increase associated with each proof step is unbounded for AxInst, and bounded for SEL.

3.3 Modus ponens does not increase the complexity

The usage of the inference rule modus ponens does not significantly increase KC. Consider for instance the following part of a proof

. . .
1   [∀x: P(x)]
2   [∀x: P(x) ⇒ R(x)]
3   P(xxx)               AxInst, line 1
4   P(xxx) ⇒ R(xxx)      AxInst, line 2
5   R(xxx)               modus ponens, lines 3, 4

The large value of KC is due to the axiom instantiation in line 3, not to the usage of the modus ponens rule.

3.4 Discussion of Chaitin’s argument

Consider the axiom instantiation mechanism as presented in Example 1, page 3. It may be argued that we can replace the axiom ∀n: n = n by the axiom schema “n = n”, which represents an infinite number of axioms, so that in a formal deduction we can choose beforehand the finite number of axioms (from the axiom schemata) that are used in the proof. If we view Example 1 this way, we must include the axiom xxx = xxx in the proof (the schema x = x is not directly used). Thus, the complexity of the theorem proved (“xxx = xxx”) is equal to the complexity of the axiom used (also “xxx = xxx”).


However, this view has its shortcomings. On one hand, the other mechanisms of KC increase continue to exist. For instance, there is a long proof of xxx = xxx which does not use an instantiation of the axiom schema n = n, namely

0 = 0 → s(0) = s(0) → s(s(0)) = s(s(0)) → . . . → xxx = xxx

Using only the axiom selection and the length mechanisms we can arbitrarily increase the value of KC, see Examples 4 and 5 (pages 11 and 12 respectively).

On the other hand, the usage of universally quantified axioms, for instance ∀x, y, z: x + (y + z) = (x + y) + z, seems to be as natural and useful as the axiom instantiation mechanism, say ∀n: P(n) → P(m), where m is any fixed value.

3.5 Proof enumeration machines

A Turing machine that enumerates the theorems of a formal system provides another method for deriving the proof of a given theorem. This formalization is used in the proof of Theorem 3, page 13⁶.

Associated with every formal system F is an enumerating TM that works indefinitely, printing every pair ⟨theorem, proof⟩ derivable in F. The KC of the machine is essentially equal to the KC of F; however, theorems with proofs having arbitrarily large KC may be deduced. While a (traditional) formal proof is non-deterministic (as mentioned in Section 3, page 7), an enumerating machine is deterministic; thus the KC of a proof cannot increase because of non-deterministic machine steps. For instance, all “axiom instantiations” and all “selections” are considered sequentially and deterministically by the enumerating machine. We will see below where, in this TM formulation, the extra KC comes from.

How can the output of a deterministic machine have an arbitrarily large complexity? In order to understand this apparent paradox, let us begin with two simple examples that correspond essentially to enumeration machines.

Example 3 (A counting machine)

If a machine lists an infinity of different strings, there are strings that are listed and have a KC much larger than the KC of the machine. For instance, the program

n ← 0; while true do { print n; n ← n + 1 }

prints integers with arbitrarily large KC. It prints all integers, for instance

⁶ The proof precedes the theorem.


xxx and all encodings of all the proofs of Fermat’s last theorem. However, to produce, say, xxx, we have to stop the machine when n equals xxx. This can be done with the program

P : n ← 0; while n ≠ xxx do { n ← n + 1 }; print n

but now KC(P) ≥ KC(xxx) (because “xxx” is included in the program)! Thus, if we want to halt the machine at the appropriate step, we may get a much more complex machine.
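The contrast between the two programs of Example 3 can be sketched in Python as follows. The function names are ours, and the length of the generated source text is used only as an informal stand-in for the complexity of program P: the stopping value has to be written out inside the program, so the text grows with the number of digits of that value.

```python
def counting_machine():
    """The enumerating program of Example 3: yields 0, 1, 2, ... forever."""
    n = 0
    while True:
        yield n
        n += 1

def halting_program_source(target: int) -> str:
    """Source text of program P, which stops when n equals target.

    The target is written out in full inside the program, so the length of
    the text grows with the number of digits of target.
    """
    return (
        "n = 0\n"
        f"while n != {target}:\n"
        "    n += 1\n"
        "result = n\n"
    )

def run(source: str) -> int:
    """Execute a program produced by halting_program_source and return its output."""
    namespace = {}
    exec(source, namespace)
    return namespace["result"]

small = halting_program_source(12345)
large = halting_program_source(10**60 + 7)  # only inspected, never run
print(run(small), len(small), len(large))
```

The fixed generator `counting_machine` never changes, whatever integer we are interested in; only the halting variant has to carry a description of the target.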

The same kind of reasoning works for some proofs and also for Turing machines that enumerate the theorems derivable in a formal system.

Example 4 (The length of the proof as a source of KC)

In order to have a deterministic formal proof system, suppose that the axioms/rules are: (i) “0 is nat”. (ii) If “t is nat” is a previous line of the proof, then the line “s(t) is nat” can be deduced. (iii) No proof can contain two or more identical lines. Consider the proof

Line number   Proof line                            Deduction
1             0 is nat                              rule (i)
2             s(0) is nat                           line 1 and rule (ii)
3             s(s(0)) is nat                        line 2 and rule (ii)
. . .
xxx + 1       s(. . . s(0)) is nat (with xxx s’s)   line xxx and rule (ii)

The theorem, namely “s^(xxx)(0) is nat”, has a large KC, yet the formal system is very simple. Note however that, at each line of the proof, we have two options: finish the proof (so that this last line is the theorem), or continue. Essentially, the large KC of the theorem is due to the large KC of the length of the proof.

In fact, this example is not very different from Example 3, page 10.

This example corresponds to the situation described in item (c), page 7. All KC comes from the length of the proof, or equivalently from xxx.
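A minimal Python sketch of Example 4 makes the point concrete: the whole proof is a function of a single integer, its number of derivation steps. The names `nat_term` and `nat_proof` are ours.

```python
def nat_term(k: int) -> str:
    """The term s(s(...s(0))) with k applications of s."""
    return "s(" * k + "0" + ")" * k

def nat_proof(n: int) -> list[str]:
    """The unique proof of 's^n(0) is nat': each line follows from the previous one by rule (ii)."""
    return [f"{nat_term(k)} is nat" for k in range(n + 1)]

proof = nat_proof(3)
for number, line in enumerate(proof, start=1):
    print(number, line)
```

Since `nat_proof` is a fixed short program, any description of n yields a description of the proof and of the theorem with only a constant overhead.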

3.6 Complexity of a proof: upper bound

Theorem 2 (Bounds on the complexity of a proof) Let p be a proof with length (number of lines) |p| and assume that the non-determinism of p is due only to selection and instantiation steps. Then, apart from a term

O(log(max{C_s(p), C_i(p), C_l(p)})),

C(p) ≤ C_s(p) + C_i(p) + C_l(p)


where C_s(p), C_i(p), and C_l(p) are the terms corresponding to axiom (and previous line) selection, to the axiom instantiation, and to the length of the proof.

Each term is bounded as follows.

C_s(p) is O(|p| log |p|)
C_i(p) is unbounded
C_l(p) is O(log |p|)

In the last inequality O(log|p|) can be replaced by C(|p|) +c, where c is a constant.

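Proof. C_s(p) (selection): for each line deduced, the number of axioms, rules of inference and previous lines used is bounded by a constant, and each previous line can be identified by its line number. Thus the selection made at each line can be specified by O(log |p|) bits, and the selections of the whole proof by O(|p| log |p|) bits.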

C_i(p) (instantiation): in any instantiation a variable is replaced by an integer. Thus, KC may increase arbitrarily.

C_l(p) (length, KC of the proof): a proof specifies in particular its length. Thus, the overall increase of KC (due to the length of the proof) is bounded by log₂ |p|.

Clearly, if we use the Kolmogorov complexity conditional to the length, C(x : |x|), there is no contribution due to the length of the proof: C_l(p : |p|) = 0.
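The selection and length terms of Theorem 2 can be made concrete with a small bit count. The Python sketch below uses an encoding of our own choosing (one rule index plus a bounded number of previous line numbers per line); the parameters `num_rules` and `max_premises` are hypothetical constants of the formal system, not quantities defined in this work.

```python
def bits_to_choose(alternatives: int) -> int:
    """Bits needed to name one of `alternatives` options (0 if there is at most one)."""
    return (alternatives - 1).bit_length() if alternatives > 1 else 0

def selection_bits(num_lines: int, num_rules: int, max_premises: int) -> int:
    """Upper bound on the bits describing all selections in a proof with num_lines lines.

    Line i names one rule (or axiom) and at most max_premises of the i - 1
    previous lines; num_rules and max_premises are constants of the system.
    """
    total = 0
    for i in range(1, num_lines + 1):
        total += bits_to_choose(num_rules) + max_premises * bits_to_choose(i - 1)
    return total

def length_bits(num_lines: int) -> int:
    """Bits needed to write the number of lines |p| in binary."""
    return max(1, num_lines.bit_length())

for n in (10, 1000, 100000):
    print(n, selection_bits(n, num_rules=2, max_premises=2), length_bits(n))
```

With these conventions the selection count grows like |p| log |p| and the length count like log |p|, while no such count exists for instantiation, since the instantiated integer is arbitrary.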

Examples 5 (below), 1 (page 3) and 4 (page 11) illustrate the “selection”, “instantiation” and “length” contributions to the complexity of a proof.

Example 5 (Axiom selection as a source of KC)

Consider a formal system with the following axioms/rules: (i) “0 is nat”. (ii) If “t is nat” is a previous line of the proof, then the line “r(t) is nat” can be deduced. (iii) If “t is nat” is a previous line of the proof, then the line “s(t) is nat” can be deduced. (iv) No proof can contain two or more identical lines.

Theorems have the form “F(F(. . . F(0))) is nat”, where each F may be either r or s. If, to derive the next line, rule (ii) or (iii) is chosen randomly and independently, then, with high probability, the complexity of the proof with n lines is very nearly n; the contribution due to the length of the proof is negligible.
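The correspondence between proofs of Example 5 and binary strings can be sketched in Python as follows (function names are ours). Each derivation step records one choice between r and s, and the choices can be read back from the theorem, so a proof with n lines carries n − 1 freely chosen bits. The seeded pseudo-random generator is used only for reproducibility; its output is not Kolmogorov random.

```python
import random

def proof_from_choices(choices: str) -> list[str]:
    """Build the proof whose i-th derivation step applies rule 'r' or 's' as given."""
    term = "0"
    lines = [f"{term} is nat"]
    for symbol in choices:
        if symbol not in ("r", "s"):
            raise ValueError("each choice must be 'r' or 's'")
        term = f"{symbol}({term})"
        lines.append(f"{term} is nat")
    return lines

def choices_from_theorem(theorem: str) -> str:
    """Recover the sequence of choices (first step first) from the last line of a proof."""
    term = theorem.removesuffix(" is nat")
    outer_to_inner = term.replace("(", "").replace(")", "").rstrip("0")
    return outer_to_inner[::-1]

rng = random.Random(2013)  # reproducible stand-in for independent random choices
choices = "".join(rng.choice("rs") for _ in range(64))
proof = proof_from_choices(choices)
print(len(proof), proof[-1])
```

Because the map from choice sequences to theorems is invertible, a theorem obtained from an incompressible sequence of choices cannot itself be much more compressible than that sequence.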

4 No complex proofs needed

Some theorems seem to require very long and complex proofs (see for instance [1] and the references therein). Fermat’s last theorem and Thurston’s Monster Theorem are two examples. Assume that there are infinitely many theorems t₁, t₂, . . . such that

∀c ∃i ∀p_i : C(p_i) ≥ |t_i| + c   (p_i is a proof of t_i)   (1)

where c is a constant. This is a very weak requirement. It is enough for instance that, for any c > 1, we have C(p_i) ≥ c|t_i| for infinitely many theorems t_i. However, in this section (Theorem 3, page 13), we prove that, apart from a constant, the Kolmogorov complexity of a proof need not be larger than the Kolmogorov complexity of the theorem itself, and of course not larger than the length of the theorem,

∃c ∀t ∃p : C(p) ≤ C(t) + c ≤ |t| + c′   (2)

where c′ is a constant, which contradicts the intuition expressed in (1).

The following result shows that the minimum complexity of a traditional proof is essentially bounded by the length of the theorem proved.

Theorem 3 For every formal system F and for every theorem t of F, there is a traditional proof p such that C(p) ≤ C(t) +c, where c is a constant that depends only on F.

Proof. Let M be a TP-enu that corresponds⁷ to F, that is, M prints in sequence the pairs ⟨theorem, proof⟩. For every statement t of F consider the following TM M′.

Turing machine M′, input t.

1. Execute M until (and if) a pair of the form ⟨t, p⟩ is printed.

2. Print p.

As p is specified uniquely by t and M′, we have C(p) ≤ C(t) + c for some constant c, dependent only on F. It follows that every theorem t has a traditional proof p with a Kolmogorov complexity not exceeding |t| plus a constant: such a proof, even when long, is very compressible and has a simple “structure”. In particular it does not contain a sequence of independent random axiom selections (seen as a sequence of 0’s and 1’s) longer than the length of the theorem (plus a constant). Similarly, the Kolmogorov complexity of the random axiom instantiations does not need to exceed the length of the theorem (plus a constant). In other words, to prove a theorem t, an NDTM only has to generate a binary sequence with length |t| + c (in fact, C(t) + c).
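The machine M′ can be sketched for a toy formal system. The Python code below assumes the r/s system of Example 5 as the formal system F; `enumerate_pairs` plays the role of the enumerating machine M and `m_prime` that of M′. Both names, and the particular enumeration order, are choices made for this illustration only.

```python
from itertools import count, product

def enumerate_pairs():
    """Toy TP-enu: yields every (theorem, proof) pair of the r/s system, shortest first."""
    for steps in count(0):
        for choices in product("rs", repeat=steps):
            term = "0"
            proof = ["0 is nat"]
            for symbol in choices:
                term = f"{symbol}({term})"
                proof.append(f"{term} is nat")
            yield proof[-1], proof

def m_prime(theorem: str) -> list[str]:
    """Machine M': run the enumeration until a pair (theorem, p) appears, then output p.

    Like M' in the proof, this loops forever when the input is not a theorem.
    """
    for candidate, proof in enumerate_pairs():
        if candidate == theorem:
            return proof

p = m_prime("s(r(s(0))) is nat")
print(p)
```

The output p is computed from t by a fixed program, which is exactly why a description of t, plus a constant number of bits for the program, suffices to describe p.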

⁷ The existence of M assumes the Church-Turing hypothesis.


4.1 Generalized proofs

With a reasonable generalization of the concept of proof, Definition 2 (page 14), we can replace in the previous results, and specifically in Theorem 3 (page 13), “the Kolmogorov complexity of the proof” by “the length of the proof”. With this generalization, every theorem has a proof not longer, apart from a constant, than the Kolmogorov complexity of the theorem itself (and of course, not longer than the length of the theorem).

Now we explain what the “reasonable generalization of proof” is. Suppose that mathematicians agree that the proof checking can be done mechanically⁸ by, say, a Turing machine Check; of course, each formal system has its proof checker.

Suppose also that mathematicians agree, once and for all, on the correctness of some fixed universal Turing machine, say U. This machine is unique (it does not depend on the formal system). It defines a partial function {0,1}* → {0,1}*.

Definition 2 (Generalized proof) For any formal system F, let Check be a corresponding proof checking machine. It defines a total function

Check(p) = ⟨true, t⟩    if p is a proof of t,
           ⟨false, 0⟩   otherwise.

Let also U be a fixed UTM. Suppose that p satisfies Check(U(p)) = ⟨true, t⟩. Then, p is accepted as a proof of t, and this proof technique is called a generalized proof of t.

Generalized proofs may be significantly shorter than traditional proofs⁹; as we will see, the length of a shortest generalized proof never exceeds the length of the theorem proved plus a constant, Theorem 3 (page 13).

The following result shows that the minimum length of a generalized proof (Definition 2, page 14) is essentially bounded by the length of the theorem proved.

Theorem 4 For every formal system F and for every theorem t of F, there is a generalized proof p such that |p| ≤ C(t) + c, where c is a constant that depends only on F.

Proof. Let z be some string. Execute the computation U(z), assume it finishes and let p be the output. Compute Check(p). If the result is ⟨true, t⟩,

⁸ Incidentally, this verification is usually quite efficient.

⁹ Traditional proofs often have a large degree of regularity and, because of that, are highly compressible.


accept z as a generalized proof of t. Otherwise output no (z is not a generalized proof of t).

Thus, when z is a minimum length program for p, we have |z| = C(p); taking for p the traditional proof given by Theorem 3, we get |z| = C(p) ≤ C(t) + c for some constant c.
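Definition 2 can be illustrated with a deliberately simplified Python sketch. Here the fixed machine U is replaced by a zlib decompressor, which is of course not a universal machine; it only shows how a short string z can be accepted as a proof because Check(U(z)) succeeds. The formal system is that of Example 4, and `nat_proof_text`, `check` and `u` are names introduced for this illustration.

```python
import zlib

def nat_proof_text(n: int) -> str:
    """Traditional proof of 's^n(0) is nat' in the system of Example 4, one line per row."""
    lines = ["s(" * k + "0" + ")" * k + " is nat" for k in range(n + 1)]
    return "\n".join(lines)

def check(proof_text: str):
    """Toy Check: returns (True, theorem) for a valid Example 4 proof, else (False, 0)."""
    lines = proof_text.split("\n")
    for k, line in enumerate(lines):
        if line != "s(" * k + "0" + ")" * k + " is nat":
            return (False, 0)
    return (True, lines[-1])

def u(z: bytes) -> str:
    """Stand-in for the fixed machine U: merely a decompressor, not a universal machine."""
    return zlib.decompress(z).decode("utf-8")

p = nat_proof_text(300)                   # a long traditional proof
z = zlib.compress(p.encode("utf-8"), 9)   # a generalized proof of the same theorem
accepted, theorem = check(u(z))
print(accepted, len(p), len(z))
```

With a genuinely universal U, z could be a short program that generates p from the theorem, which is what the bound |z| ≤ C(t) + c expresses; a general-purpose compressor only captures part of that regularity.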

4.2 Proof length: relationship with the Kolmogorov complexity

Consider an undecidable formal system like PA (Peano Arithmetic). It is known that no recursive function of the length of a theorem bounds the length or the number of steps of a proof; otherwise the formal system would be decidable.

But, as shown in Theorem 3 (page 13), for every theorem t there is a proof p of t with C(p) ≤ C(t) ≤ |t| (in this section the constants related to Kolmogorov complexity will be omitted). Given the statement t, consider the following algorithm.

getProof(t). Input t.

If t is a theorem, generates a low complexity proof of t.

– Run “in parallel” all computations U(x) with |x| ≤ |t|.
  For any such computation U(x) that halts, let p be the output.
  Check if p is a proof of t.
  If it is, halt with output p.

Since every theorem t has a small complexity proof, the computation getProof(t) always halts (if t is a theorem). However, the proof p may be enormous.
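The “in parallel” execution in getProof can be sketched by dovetailing, that is, by advancing every computation one step at a time in round-robin order so that diverging computations cannot block the search. In the Python sketch below the universal machine is replaced by a toy machine `toy_u` invented for this illustration: programs not starting with 1 never halt, and a program 1b outputs the Example 4 proof of “s^n(0) is nat”, where n is the binary number b. The parameter `bound` plays the role of |t|; a small value is used to keep the run short.

```python
from itertools import product

def toy_u(x: str):
    """Toy stand-in for U, written as a generator so that it can be run step by step."""
    if not x.startswith("1"):
        while True:
            yield                      # diverging computation
    n = int(x[1:], 2) if len(x) > 1 else 0
    lines = []
    for k in range(n + 1):
        lines.append("s(" * k + "0" + ")" * k + " is nat")
        yield                          # one step per proof line
    return lines

def proves(p, t: str) -> bool:
    """Toy proof checker for the system of Example 4."""
    expected = ["s(" * k + "0" + ")" * k + " is nat" for k in range(len(p))]
    return len(p) > 0 and p == expected and p[-1] == t

def get_proof(t: str, bound: int):
    """Dovetail all computations toy_u(x) with |x| <= bound; never halts if no proof is found."""
    programs = [""] + ["".join(bits) for length in range(1, bound + 1)
                       for bits in product("01", repeat=length)]
    running = [toy_u(x) for x in programs]
    while True:
        still_running = []
        for computation in running:
            try:
                next(computation)      # advance this computation by one step
                still_running.append(computation)
            except StopIteration as halted:
                if proves(halted.value, t):
                    return halted.value
        running = still_running

p = get_proof("s(s(s(0))) is nat", bound=4)
print(p)
```

As in the text, the procedure halts on theorems because some short program outputs a proof, and it may run forever otherwise.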

Consider a Turing machine that generates a low complexity proof p of a given theorem t, see for instance M′(t) in the proof of Theorem 3, page 13. The string p can be seen as an expansion of t, so that the relationship between |t| and p has as a lower bound the function m(x) = min{C(y) : y ≥ x} (see [10], page 126), which is unbounded but grows slower than any recursive unbounded function.

It would be interesting to know if there are arbitrarily long theorems t such that the “compression” of p = M′(t) to t approaches the ultimate compression function m⁻¹(x).

About the relation with Parikh sentences

For any fixed integer n, a Parikh sentence ([11, 16]) states (informally) that

This sentence does not have a proof shorter than n


or more formally, and for each fixed n, a statement P such that (using the notation of [16])

P ≡ ¬∃m: (m < n ∧ proofLen(m, ⌜P⌝))

where ⌜P⌝ is the Gödel number corresponding to statement P, and proofLen(m, s) is the predicate: m is the length of a proof of the statement with Gödel number s. It can be shown that every Parikh statement has a (long) proof.

Consider the Parikh statement P for n = 1 000 000 000 000 000 000. From what we said above and from the results of this work we conclude that

1. P has no traditional proof with length less than n.

2. P has a traditional proof with length at least n.

3. P has a short generalized proof.

4. P has a traditional proof p with C(p) small.

The last two statements are essentially equivalent.¹⁰

5 The complexity of the formal system

Up to now we have considered a fixed “working” formal system but, like Chaitin (see the quotations on page 3), we think that it is interesting to consider alternative formal systems and in particular to study the influence of the complexity of the axioms. This section contains only a few observations about this influence.

Given a formal system F, denote by C(F) the Kolmogorov complexity of a description of F, including of course the axioms, axiom schemata, and rules of inference. Associated with F there is a proof verifier Turing machine Check, whose complexity satisfies C(Check) ≤ C(F) + c, where c is a constant independent of the formal system. A similar inequality holds for enumerating Turing machines.

With this observation it is easy to generalize several previous results. For instance, the generalized statement of Theorem 3 (page 13) is the following, where the term C(F) + O(log(|F|)) replaces the constant c.

For every formal system F and for every theorem t of F, there is a traditional proof p such that C(p) ≤ C(t) + C(F) + O(log(|F|)).

¹⁰ Although a Parikh sentence (for large n) only has large proofs, there is a simple and short proof of this fact. Is this related to item 3 (or 4) above?


We leave to the reader the “parametrization” of the other results of this work. Essentially, what we have to do is to replace a constant c that depends on the formal system F by C(F) + c, where c is an absolute constant (not depending on F).

6 Conclusions

In this work three related topics have been studied: the (Kolmogorov) complexity of “traditional” formal proofs, the complexity of theorems specified by enumerating machines, and an upper bound on the complexity of a proof.

Proofs with large complexity. Contrary to what has sometimes been stated, theorems with arbitrarily large complexity can often be deduced from a low complexity set of axioms. Symbolically, for a typical formal system F, we have

∃t:C(t)>> C(F)

where t is a theorem of F. The complexity in each line of the proof (the last line is the theorem) may increase due to: (i) the selection of axioms, rules of inference, and previous lines used to deduce the current line; the increase depends on the number of the current line; (ii) axiom instantiation, by an unbounded amount (recall that the domain is N); (iii) the length of the proof; this last contribution is due to the decision, at each line, of whether or not to stop the proof. Usually this contribution is negligible, but it can be dominant in some cases. It is bounded by log₂(length of the proof). These results are summarized in Theorem 2, page 11.

Enumerating machines. A formal deduction system may also be specified by a Turing machine that enumerates the pairs ⟨theorem, proof⟩ deducible in the system. As these Turing machines are deterministic, the only non-constant source of Kolmogorov complexity of a proof (or theorem) is the number of the computation step in which the corresponding ⟨theorem, proof⟩ pair is printed.

Bound on the proof complexity. We have shown that, although there are short theorems which seem to require very long proofs, the complexity of the proof never needs to exceed the complexity (and thus the length) of the theorem which is proved plus a constant. Symbolically, and apart from a constant

∀t ∃p: [proves(p, t)] ∧ [C(p) ≤ C(t) ≤ |t|]

See Theorem 3, page 13. This conclusion is surprising; one would expect much more non-determinism (human creativity?) in a proof of a simple to state result like Fermat’s last Theorem.

Let us informally state two consequences of these results.

1. In a “reasonable” formal system every theorem t has


– A proof with complexity not exceeding the complexity of t.

– Proofs of arbitrarily large complexity.

2. In the proof of a simple (low complexity) theorem, an axiom instantiation that greatly increases the complexity of the proof is never needed.


References

[1] John Carlos Baez. Insanely long proofs, October 2012. http://johncarlosbaez.wordpress.com/2012/10/19/insanely-long-proofs/.

[2] George S. Boolos, John P. Burgess, and Richard C. Jeffrey. Computability and Logic. Cambridge University Press, 2007. Fifth Edition.

[3] Samuel Buss. Lectures on proof theory. Technical Report SOCS–96.1, School of Computer Science, McGill University, University of California, San Diego, 1996.

[4] Gregory Chaitin. Information-theoretic limitations of formal systems. Journal of the ACM, 21:403–424, 1974.

[5] Gregory Chaitin. Gödel’s theorem and information. International Journal of Theoretical Physics, 22:941–954, 1982.

[6] Gregory Chaitin. Information-theoretic incompleteness. Applied Mathematics and Computation, 52:83–101, 1992.

[7] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC ’71, pages 151–158, New York, NY, USA, 1971. ACM.

[8] Torkel Franzén. Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse. A K Peters, 2005.

[9] Stephen Cole Kleene. Introduction to Metamathematics. North-Holland, 1952.

[10] Ming Li and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, third edition, 2008.

[11] Rohit Parikh. Existence and feasibility in arithmetic. The Journal of Symbolic Logic, 36(3):494–508, 1971.

[12] Wolfram Pohlers. Proof Theory: the First Step into Impredicativity. Springer, 2009.

[13] Raymond Smullyan. Gödel’s Incompleteness Theorems. Oxford, 1992.

[14] Patrick Suppes. Axiomatic Set Theory. Dover Books on Mathematics. Dover, 1972.

[15] Gaisi Takeuti. Proof Theory. Advanced Studies in Pure Mathematics. North-Holland, 1987.

[16] Noson S. Yanofsky. The Outer Limits of Reason: What Science, Mathematics, and Logic Cannot Tell Us. MIT Press, 2013.
