I introduced algorithmic information theory (AIT) in Part 23. The fractal figure below needs ~1.6 million bits for specification or storage. But it can be generated by a very small program (requiring only a tiny fraction of this number of bits), using the definition of the Mandelbrot set and a few other specifications. We say that the figure has only a small algorithmic information content, even though it looks very complex.
Computation involves three things:
1. The length of the computer program.
2. The time it takes to do the computation.
3. The computer memory needed for the job.
AIT largely ignores the second and the third aspect, and recognizes only the first for defining the information content of a given set of numbers or a given set of data. In other words, it focuses on program-size complexity.
Chaitin, who along with Kolmogorov founded the subject of AIT, makes the point that a theory can be likened to a computer program. The program calculates and explains a certain set of observations, and the smaller this program is (in terms of compression of information), the better is the theory (recall Ockham's razor).
When a set of observations or data cannot be described compactly in terms of axioms and/or theorems, there is no structure or order, or pattern, in the data. Such a set of data is logically random. Something is said to be random if the smallest program that calculates or generates it is the same size as it is, so there is no compression.
Chaitin has shown that certain facts are not just computationally irreducible or incompressible; they are logically irreducible or incompressible as well. The proof of their 'truth' must be in the form of additional axioms, without any reasoning. So there are severe limits to the powers of logic and reason.
Chaitin introduced a number omega (Ω) to quantify the degree of logical randomness of any system, and to show that the powers of reasoning are limited. He demonstrated the existence of an infinite stream of unprovable mathematical facts.
Let the term 'program' imply 'the concatenation of the computer program and the data to be read in by the program'. Consider an ensemble of all such possible programs. What is the probability that a program chosen at random from this set will ever halt (cf. Part 66)? The number Ω is that probability.
How do we choose a program at random for testing this? A program is simply a succession of bits (0s and 1s). Since we are considering all possible programs, any succession of bits is a possible program for testing its halting behaviour. We can flip a coin repeatedly to get a random sequence of bits. We go on adding random bits, one at a time, till the sequence of bits is a program that halts, if at all it can halt. The number Ω is the probability that the halting will indeed occur (if at all) for the tested sequence of randomly generated bits.
These operations, of course, assume the presence of a computing machine for doing the job of testing. We also assume the use of a programming language. But it turns out that the crucial conclusions about halting or otherwise do not depend on these things: the actual values of Ω may depend on them, but not the general conclusions drawn. Our arguments can proceed by assuming a particular computer and a particular language for computing.
Since the number Ω is a probability, it lies between 0 and 1. In binary notation it may look something like 0.110010101… The central point made by Chaitin is that the bits after the decimal point form an irreducible stream. Every 0 or 1 after the decimal point represents a fact, and the totality of these bits represents irreducible mathematical facts.
The number Ω can be regarded as an infinite sum. Each N-bit program that halts contributes 1/2N to this sum. Each such program adds a 1 to the Nth bit. One may think that a precise value of Ω can be computed by adding all the bits for the programs that halt. This is not so. Although Ω is a perfectly well-defined specific number, it is impossible to compute it in its entirety (see below). It is possible to compute only a few digits of Ω. For example, if we know that computer programs 0, 10, and 110 all halt, then Ω = 0.111 up to three digits. But the first N digits of Ω cannot be calculated by a program of length significantly shorter than N. Knowing the first N digits of Ω will enable us to tell whether or not each program up to N digits in size ever halts. This means that at least an N-bit program is needed to calculate N bits of Ω.
Chaitin’s Ω cannot be computed to arbitrarily high precision because if we know Ω exactly, we can solve Turing’s halting problem, which is actually unsolvable.
Given any finite program, no matter how long, we have an infinite number of bits that the program cannot compute. This implies that, given any finite set of axioms, there are an infinite number of truths that are unprovable in that system. Ω is irreducible.
Thus a theory of everything for all of mathematics cannot exist. The number Ω has an infinite number of bits or mathematical facts that cannot be derived from any principles simpler than the string of bits itself.
Gödel’s work had shown that individual formal axiomatic mathematical theories cannot prove the true numerical statement 'This statement is unprovable'. According to Chaitin (2006), Gödel left unanswered the key question: 'Is this an isolated phenomenon, or are there many important mathematical truths that are unprovable?' It now turns out that the number Ω provides an infinite number of true theorems that cannot be proved by any finite system of axioms.
Leibniz had stipulated that if something (a theorem) is true in mathematics, it is true for a reason, the reason being the proof of the theorem. But the bits of Ω are totally random, and therefore these mathematical truths are truths for no reason.