Is the pattern you found persuasive?

Hero image: Kolmogorov complexity (Wikipedia)

This entry is a summary of a previous Zhihu article by me (《一种讨论“逻辑简单”的框架》, sorry again if you don’t read Chinese). This entry starts from a frequented type of question posed in elementary maths exams, ‘complete a sequence by finding a pattern’. The questions are mostly okay, and such quizzes are generally good. However, from time to time, one might find puzzles whose patterns are hard to come up with, and when we are told the answer, the patterns fail to be persuasive, making people think the designated pattern is just an artefact. Some ridicule these problems by solving all problems with polynomial interpolation. An (at least somehow) unbiased framework to determine the persuasiveness of a pattern is in dire need, which is the central topic of this entry. For Chinese speakers, you might also want to check this V2EX post.

Navigation:

Background: finding a pattern
Framework: Ockham’s razor and Kolmogorov complexity
Examples and discussion

Background: finding a pattern

Most of us have the experience of being given a (finite) sequence of numbers with a blank for a missing term, and being asked to find a pattern from the existing sequence and fill the blank one with the value that would have been there according to the rule. Examples are:

Question	One possible pattern	Possible missing term
1, 4, 7, _, 13, 16	Arithmetic progression with common difference 3	10
1, 2, _, 8, 16	Geometric progression with common ratio 2	4
1, 1, 2, 3, 5, _, 13, 21	The next term is the sum of two previous terms	8

But what if you are given this sequence: 1, 1, 2, 5, 14, _, 132, 429? The answer is 42 and the sequence is supposed to be Catalan numbers. The pattern is obscure for those who never knew the sequence. What about 38, 28, 30, _, 142, 288, 518, then? The answer is 62, and the general term is $a_n=3n^3-12n^2+5n+42$ , which was randomly made up by me.

The problems of finding a pattern and completing the missing term are mostly encountered when we are at lower grades of primary school. These problems, if well designed, are good for the development of the ability to discover patterns, or the ‘non-exhaustive inductive reasoning’ (不完全归纳). However, sometimes the pattern is hard to discover, not persuasive or not suitable for someone without certain knowledge. To make things worse, sometimes there are multiple ‘possible’ rules to produce the given terms while giving different missing terms. By putting ‘possible’ in quotation marks, I am using it in day-to-day sense, meaning that multiple similarly complicated (or similarly simple) rules.

Some people mock this kind of problems by using polynomial interpolation for all such problems, often intendedly producing ‘ridiculous’ results to exaggerate. For example, one may fill the blank in 1, 2, 3, 4, _, 6, 7 with 19260817 because the pattern is $a_n=n+\frac{19260817-5}{4!2!}({n-1})({n-2})({n-3})({n-4})({n-6})({n-7}).$ Oh, did I mention it was not necessary that we use polynomials? Even non-elementary functions are okay! Mathematically speaking, there is no ‘one single pattern’ that governs the finite observation. In real world, we would want a persuasive, presumably simple, pattern. Plus, according to the previous entry, any sequence of integers has an elementary general term, which allows us ‘filling infinite number of missing terms with “simple” rules’ if we equate elementariness with simplicity.

After all, which pattern is the most persuasive (in the day-to-day sense) varies from one to another. Perceptual thinking fails to make people agree on the thing. It is thus desirable to define a baseline to formally compare the persuasiveness between rules. We then should be able to define the standard answer of a find-the-pattern problem to be the most persuasive one.

Framework: Ockham’s razor and Kolmogorov complexity

Intuition favours simpler patterns to more complicated ones and thinks the simplest rule is the most persuasive. We also have the well-known Ockham’s razor:

Entia non sunt multiplicanda praeter necessitatem.

Entities must not be multiplied beyond necessity.

— John Punch

If we agree that simpler means more persuasive, the remaining task is to compare simplicity, which we owe to Kolmogorov complexity, a concept that naturally extends to model the complexity of a pattern.

Prerequisite Let $\mathcal{T}$ be the set of Turing machines. For $T\in\mathcal{T}$ , we denote the output of $T$ given input $n$ (if it halts) by $T(n)$ (abusing the notation for the sequence computed by it), and define $T(n)={\perp}$ for those $n$ on which $T$ does not halt. We assume without proof that there exists an injective function $r:\mathcal{T}\to\mathbb{N}$ together with a UTM $U\in\mathcal{T}$ and a computable bijection $u:\mathbb{N}^2\mapsto\mathbb{N}$ such that $U(u({r(T),x}))=T(x)$ for all $T\in\mathcal{T}$ and $x\in\mathbb{N}$ . We also assume that $r(\mathcal{T})$ is a decidable language.

Remark. Traditionally the input, the output and the encoding are (binary) strings. However, strings can be mapped to natural numbers with ‘order’ preserved (shorter strings are smaller, and ties are broken by lexicographical order). For ease of language, we’ll use numbers directly.

Remark. Think $r$ as a programming language, and $U$ an interpreter of this language. For example, you may think $r$ as JavaScript, and $U$ as ChakraCore, Node.js or V8. Or you may think $r$ as the binary executable, and $U$ as the CPU. When given the (source or machine) code $r(T)$ and the input $x$ , the interpreter simulates (executes, or simply ‘runs’) $T$ with input $x$ .

Definition A partial sequence is a function $p:\mathbb{N}\to{\mathbb{N}\cup{\left\{{\perp}\right\}}}$ . We write $p_n$ for $p(n)$ . The bottom ${\perp}$ is used to mark missing terms. A sequence is a function $s:\mathbb{N}\to\mathbb{N}$ .

Definition A Turing machine $T\in\mathcal{T}$ is said to be a solution to a partial sequence $p$ , if $T$ halts on all inputs, and $T(n)=p_n$ for all $n\in\mathbb{N},p_n\neq{\perp}$ .

Definition For a Turing machine $T\in\mathcal{T}$ , the natural number $r(T)$ is said to be the complexity of $T$ .

Remark. For source code, a shorter piece of code is simpler. Ties are broken by lexicographical order.

Definition For a partial sequence, if it has a solution, the one with the lowest complexity is said to be its pattern. The computable sequence produced by its pattern is said to be its completion.

Definition A partial sequence $p_n$ is said to be finite, if there exists $N$ such that for all $n>N$ , $p_n={\perp}$ . In other words, it is finite if and only if only there are only finitely many terms given. For a sequence $s$ , its $k$ leading terms form a finite sequence (called its $k$ -th leading part) $p^{(k)}_n=\left\{\begin{array}{ll}s_n,&n\leq k;\\{\perp},&n>k.\end{array}\right.$

From these definitions we reach the following propositions:

Computability A partial sequence $p$ has a solution if and only if there exists a computable function $f$ such that $f(n)=p_n$ for all $n\in\mathbb{N},p_n\neq{\perp}$ . In other words, a partial sequence has a solution if and only if it ‘comes from’ a computable sequence by removing some terms.

Proof. If $p$ has a solution $T\in\mathcal{T}$ , by definition, $T(n)=p_n$ for all $n\in\mathbb{N},p_n\neq{\perp}$ . Conversely, if $p$ coincides with the function computed by some $T\in\mathcal{T}$ everywhere $p_n\neq{\perp}$ , by definition, $T$ is a solution to $p$ .

Corollary A finite partial sequence always has a solution.

Lock-down of pattern/completion For a sequence $s$ , the following are equivalent:

$s$ is computable.
(Lock-down of pattern) There exists $K$ such that for all $k>K$ , the pattern of its $k$ -th leading part is the same as that of its $K$ -th one.
(Lock-down of completion) There exists $k$ such that the completion of its $k$ -th leading part is $s$ itself.

Proof. We prove by chasing a cyclic chain of implications:

(1 ⟹ 2) If $s$ is computable, let $T\in\mathcal{T}$ be the machine with the lowest complexity among those that compute $s$ . Let $K$ be the complexity of $T$ and consider machines whose complexity is less than $K$ . There are only finitely many such machines, specifically, at most $K$ ones. For each such machine $T'$ , by the choice of $T$ , $T'$ does not compute $s$ , therefore, either it doesn’t halt for some natural number, or there exists $K_{T'}$ such that $s\left(K_{T'}\right)\neq T'\left(K_{T'}\right)$ . Let $K$ be a natural number greater than all such $K_{T'}$ , then for all $k\geq K$ , $T$ is a solution to the $K$ -th leading part of $s$ , and by the choice of $T$ and $K$ , it the one with the lowest complexity, hence the pattern.
(2 ⟹ 3) Let $T$ be the common pattern for all $k$ -th leading part of $s$ for all $k\geq K$ , then it is also a common solution. For $n\in\mathbb{N}$ , since $T$ is a solution to the $({n+K})$ -th leading part, we have $s_n=T(n)$ . Therefore, the completion of the $K$ -th leading part of $s$ is the sequence itself.
(3 ⟹ 1) As $s$ is the completion of some partial sequence, it obviously is computable.

Uncomputability There does not exist an algorithm that computes (the description of) the pattern of all finite partial sequences.

Proof (Sketch). If the problem were computable, we would be able to decide the language of compressible strings (see Kolmogorov complexity). Consider the following program:

Input: $s$ .

For all $s'$ shorter than $s$ :

Find the description $r(T)$ of the pattern $T$ of $p_{s'}=s$ (the other terms are ${\perp}$ ).

If $({r(T),s'})$ is shorter than $s$ , output $\mathrm{COMPRESSIBLE}$ and terminate.

If the loop finishes without terminating, output $\mathrm{INCOMPRESSIBLE}$ .

The pattern $T$ is the ‘shortest’ machine such that $T(s')=s$ , by which the correctness of the program follows. Note that traditionally, the definition of Kolmogorov complexity does not require that $T$ halt on all inputs. Nevertheless, tweaking the definition keeps compressibility undecidable.

Examples and discussion

In simple words, for a partial sequence, a solution is a program that produces correct given terms. The pattern is the shortest program among these. If there is a tie, break it by lexicographical order.

Examples

Let’s take some non-abstract sense. Suppose we use JavaScript (ES2015) as our programming language. We define a ‘source code’ as a JavaScript expression that evaluates to a method f and define f(n) as the output on input n. Consider the sequence $s_n=n$ . If we are given only the initial term $p^{(0)}_0=0$ , what is the completion of $p^{(0)}$ ?

function(){return 0}

The completion of $p^{(0)}$ is $s^{(0)}_n=0$ . What if we are given the first two terms, i.e., $p^{(1)}$ ? What’s its completion?

function($){return $}

Its completion is $s$ itself. Let’s consider another example: 1, 2, 3, 4, _, 6, 7. Clearly the completion is $s$ . Let’s look at one of its solutions:

function (n)
{
    if (n == 5) return 19260817;
    return n;
}

That solution is obviously more complicated (longer) than the pattern, hence not considered the pattern (in day-to-day sense).

Real-world considerations

For any given partial sequence, its completion depends on the encoding $r$ , or the programming language selected. For example, given 2, 3, 5, 7, 11, the completion might not be the prime if the programming language is not efficient in expressing such concepts. (Interleave: JavaScript can express the concept with short code thanks to its regular expression engine.)

For real-world cases, the ‘programming language’ also varies from people to people. Perhaps Mathematica can be used to model mathematicians? But clearly I shouldn’t be modelled so.

The framework resolves the definition of ‘persuasive’ by resorting to simplicity. However, simplicity varies according to ‘preset knowledge’ (built-in notations), which, if to be uniformised, is just a matter of choice.

For quizzes in school, we can modify our ‘program’ to ‘mathematical expression’ and define a total order over these expressions to compare their complexities. As suggested by Liwei Cai, for people with more maths knowledge, we should include a lot of built-in functions.

Philosophical implications

The computability of the terms in the completion together with the existence of a solution to any finite partial sequence (that a pattern exists) and the uncomputability of the pattern (that finding the pattern is hard) form a dialectical contradiction (coarsely understood as ‘the two sides of a coin among everything’, a concept from Marxism). The two results are not surprising by themselves or together, but making use of the parlance learnt from philosophy (pronounced ‘political’) courses is interesting to me.

The uncomputability suggests:

The discovery of the most clean theory explaining everything we see cannot be purely mechanical.

The lock-down of completion suggests the rationality of this framework. For a computable sequence $s$ , the completion of its $k$ -th leading part eventually will be $s$ itself, independent of the choice of encoding $r$ . For any programming language, as long as it is Turing complete, the most succinct rule that explains the sequence will eventually be the same, given enough terms. Applying the analogy to human, this implies that:

Having observed abundant phenomena, people vastly different yet all knowledgeable eventually agree on the law that governs the nature.