Is the pattern you found persuasive?

Hero image: Kolmogorov complexity (Wikipedia)Hero image: Kolmogorov complexity (Wikipedia)Hero image: Kolmogorov complexity (Wikipedia)Hero image: Kolmogorov complexity (Wikipedia)
Hero image: Kolmogorov complexity (Wikipedia)

This entry is a summary of a previous Zhihu article by me (《一种讨论“逻辑简单”的框架》, sorry again if you don’t read Chinese). This entry starts from a frequented type of question posed in elementary maths exams, ‘complete a sequence by finding a pattern’. The questions are mostly okay, and such quizzes are generally good. However, from time to time, one might find puzzles whose patterns are hard to come up with, and when we are told the answer, the patterns fail to be persuasive, making people think the designated pattern is just an artefact. Some ridicule these problems by solving all problems with polynomial interpolation. An (at least somehow) unbiased framework to determine the persuasiveness of a pattern is in dire need, which is the central topic of this entry. For Chinese speakers, you might also want to check this V2EX post.

Background: finding a pattern

Most of us have the experience of being given a (finite) sequence of numbers with a blank for a missing term, and being asked to find a pattern from the existing sequence and fill the blank one with the value that would have been there according to the rule. Examples are:

Question One possible pattern Possible missing term
1, 4, 7, _, 13, 16 Arithmetic progression with common difference 3 10
1, 2, _, 8, 16 Geometric progression with common ratio 2 4
1, 1, 2, 3, 5, _, 13, 21 The next term is the sum of two previous terms 8

But what if you are given this sequence: 1, 1, 2, 5, 14, _, 132, 429? The answer is 42 and the sequence is supposed to be Catalan numbers. The pattern is obscure for those who never knew the sequence. What about 38, 28, 30, _, 142, 288, 518, then? The answer is 62, and the general term is an=3n312n2+5n+42a_n=3n^3-12n^2+5n+42, which was randomly made up by me.

The problems of finding a pattern and completing the missing term are mostly encountered when we are at lower grades of primary school. These problems, if well designed, are good for the development of the ability to discover patterns, or the ‘non-exhaustive inductive reasoning’ (不完全归纳). However, sometimes the pattern is hard to discover, not persuasive or not suitable for someone without certain knowledge. To make things worse, sometimes there are multiple ‘possible’ rules to produce the given terms while giving different missing terms. By putting ‘possible’ in quotation marks, I am using it in day-to-day sense, meaning that multiple similarly complicated (or similarly simple) rules.

Some people mock this kind of problems by using polynomial interpolation for all such problems, often intendedly producing ‘ridiculous’ results to exaggerate. For example, one may fill the blank in 1, 2, 3, 4, _, 6, 7 with 19260817 because the pattern is an=n+1926081754!2!(n1)(n2)(n3)(n4)(n6)(n7).a_n=n+\frac{19260817-5}{4!2!}({n-1})({n-2})({n-3})({n-4})({n-6})({n-7}). Oh, did I mention it was not necessary that we use polynomials? Even non-elementary functions are okay! Mathematically speaking, there is no ‘one single pattern’ that governs the finite observation. In real world, we would want a persuasive, presumably simple, pattern. Plus, according to the previous entry, any sequence of integers has an elementary general term, which allows us ‘filling infinite number of missing terms with “simple” rules’ if we equate elementariness with simplicity.

After all, which pattern is the most persuasive (in the day-to-day sense) varies from one to another. Perceptual thinking fails to make people agree on the thing. It is thus desirable to define a baseline to formally compare the persuasiveness between rules. We then should be able to define the standard answer of a find-the-pattern problem to be the most persuasive one.

Framework: Ockham’s razor and Kolmogorov complexity

Intuition favours simpler patterns to more complicated ones and thinks the simplest rule is the most persuasive. We also have the well-known Ockham’s razor:

Entia non sunt multiplicanda praeter necessitatem.

Entities must not be multiplied beyond necessity.

— John Punch

If we agree that simpler means more persuasive, the remaining task is to compare simplicity, which we owe to Kolmogorov complexity, a concept that naturally extends to model the complexity of a pattern.

Prerequisite Let T\mathcal{T} be the set of Turing machines. For TTT\in\mathcal{T}, we denote the output of TT given input nn (if it halts) by T(n)T(n) (abusing the notation for the sequence computed by it), and define T(n)=T(n)={\perp} for those nn on which TT does not halt. We assume without proof that there exists an injective function r:TNr:\mathcal{T}\to\mathbb{N} together with a UTM UTU\in\mathcal{T} and a computable bijection u:N2Nu:\mathbb{N}^2\mapsto\mathbb{N} such that U(u(r(T),x))=T(x)U(u({r(T),x}))=T(x) for all TTT\in\mathcal{T} and xNx\in\mathbb{N}. We also assume that r(T)r(\mathcal{T}) is a decidable language.

Remark. Traditionally the input, the output and the encoding are (binary) strings. However, strings can be mapped to natural numbers with ‘order’ preserved (shorter strings are smaller, and ties are broken by lexicographical order). For ease of language, we’ll use numbers directly.

Remark. Think rr as a programming language, and UU an interpreter of this language. For example, you may think rr as JavaScript, and UU as ChakraCore, Node.js or V8. Or you may think rr as the binary executable, and UU as the CPU. When given the (source or machine) code r(T)r(T) and the input xx, the interpreter simulates (executes, or simply ‘runs’) TT with input xx.

Definition A partial sequence is a function p:NN{}p:\mathbb{N}\to{\mathbb{N}\cup{\left\{{\perp}\right\}}}. We write pnp_n for p(n)p(n). The bottom {\perp} is used to mark missing terms. A sequence is a function s:NNs:\mathbb{N}\to\mathbb{N}.

Definition A Turing machine TTT\in\mathcal{T} is said to be a solution to a partial sequence pp, if TT halts on all inputs, and T(n)=pnT(n)=p_n for all nN,pnn\in\mathbb{N},p_n\neq{\perp}.

Definition For a Turing machine TTT\in\mathcal{T}, the natural number r(T)r(T) is said to be the complexity of TT.

Remark. For source code, a shorter piece of code is simpler. Ties are broken by lexicographical order.

Definition For a partial sequence, if it has a solution, the one with the lowest complexity is said to be its pattern. The computable sequence produced by its pattern is said to be its completion.

Definition A partial sequence pnp_n is said to be finite, if there exists NN such that for all n>Nn>N, pn=p_n={\perp}. In other words, it is finite if and only if only there are only finitely many terms given. For a sequence ss, its kk leading terms form a finite sequence (called its kk-th leading part) pn(k)={sn,nk;,n>k.p^{(k)}_n=\left\{\begin{array}{ll}s_n,&n\leq k;\\{\perp},&n>k.\end{array}\right.

From these definitions we reach the following propositions:

Computability A partial sequence pp has a solution if and only if there exists a computable function ff such that f(n)=pnf(n)=p_n for all nN,pnn\in\mathbb{N},p_n\neq{\perp}. In other words, a partial sequence has a solution if and only if it ‘comes from’ a computable sequence by removing some terms.

Proof. If pp has a solution TTT\in\mathcal{T}, by definition, T(n)=pnT(n)=p_n for all nN,pnn\in\mathbb{N},p_n\neq{\perp}. Conversely, if pp coincides with the function computed by some TTT\in\mathcal{T} everywhere pnp_n\neq{\perp}, by definition, TT is a solution to pp.

Corollary A finite partial sequence always has a solution.

Lock-down of pattern/completion For a sequence ss, the following are equivalent:

  1. ss is computable.
  2. (Lock-down of pattern) There exists KK such that for all k>Kk>K, the pattern of its kk-th leading part is the same as that of its KK-th one.
  3. (Lock-down of completion) There exists kk such that the completion of its kk-th leading part is ss itself.

Proof. We prove by chasing a cyclic chain of implications:

  • (1 ⟹ 2) If ss is computable, let TTT\in\mathcal{T} be the machine with the lowest complexity among those that compute ss. Let KK be the complexity of TT and consider machines whose complexity is less than KK. There are only finitely many such machines, specifically, at most KK ones. For each such machine TT', by the choice of TT, TT' does not compute ss, therefore, either it doesn’t halt for some natural number, or there exists KTK_{T'} such that s(KT)T(KT)s\left(K_{T'}\right)\neq T'\left(K_{T'}\right). Let KK be a natural number greater than all such KTK_{T'}, then for all kKk\geq K, TT is a solution to the KK-th leading part of ss, and by the choice of TT and KK, it the one with the lowest complexity, hence the pattern.
  • (2 ⟹ 3) Let TT be the common pattern for all kk-th leading part of ss for all kKk\geq K, then it is also a common solution. For nNn\in\mathbb{N}, since TT is a solution to the (n+K)({n+K})-th leading part, we have sn=T(n)s_n=T(n). Therefore, the completion of the KK-th leading part of ss is the sequence itself.
  • (3 ⟹ 1) As ss is the completion of some partial sequence, it obviously is computable.

Uncomputability There does not exist an algorithm that computes (the description of) the pattern of all finite partial sequences.

Proof (Sketch). If the problem were computable, we would be able to decide the language of compressible strings (see Kolmogorov complexity). Consider the following program:

  • Input: ss.
  • For all ss' shorter than ss:
    • Find the description r(T)r(T) of the pattern TT of ps=sp_{s'}=s (the other terms are {\perp}).
    • If (r(T),s)({r(T),s'}) is shorter than ss, output COMPRESSIBLE\mathrm{COMPRESSIBLE} and terminate.
  • If the loop finishes without terminating, output INCOMPRESSIBLE\mathrm{INCOMPRESSIBLE}.

The pattern TT is the ‘shortest’ machine such that T(s)=sT(s')=s, by which the correctness of the program follows. Note that traditionally, the definition of Kolmogorov complexity does not require that TT halt on all inputs. Nevertheless, tweaking the definition keeps compressibility undecidable.

Examples and discussion

In simple words, for a partial sequence, a solution is a program that produces correct given terms. The pattern is the shortest program among these. If there is a tie, break it by lexicographical order.

Examples

Let’s take some non-abstract sense. Suppose we use JavaScript (ES2015) as our programming language. We define a ‘source code’ as a JavaScript expression that evaluates to a method f and define f(n) as the output on input n. Consider the sequence sn=ns_n=n. If we are given only the initial term p0(0)=0p^{(0)}_0=0, what is the completion of p(0)p^{(0)}?

function(){return 0}

The completion of p(0)p^{(0)} is sn(0)=0s^{(0)}_n=0. What if we are given the first two terms, i.e., p(1)p^{(1)}? What’s its completion?

function($){return $}

Its completion is ss itself. Let’s consider another example: 1, 2, 3, 4, _, 6, 7. Clearly the completion is ss. Let’s look at one of its solutions:

function (n)
{
    if (n == 5) return 19260817;
    return n;
}

That solution is obviously more complicated (longer) than the pattern, hence not considered the pattern (in day-to-day sense).

Real-world considerations

For any given partial sequence, its completion depends on the encoding rr, or the programming language selected. For example, given 2, 3, 5, 7, 11, the completion might not be the prime if the programming language is not efficient in expressing such concepts. (Interleave: JavaScript can express the concept with short code thanks to its regular expression engine.)

For real-world cases, the ‘programming language’ also varies from people to people. Perhaps Mathematica can be used to model mathematicians? But clearly I shouldn’t be modelled so.

The framework resolves the definition of ‘persuasive’ by resorting to simplicity. However, simplicity varies according to ‘preset knowledge’ (built-in notations), which, if to be uniformised, is just a matter of choice.

For quizzes in school, we can modify our ‘program’ to ‘mathematical expression’ and define a total order over these expressions to compare their complexities. As suggested by Liwei Cai, for people with more maths knowledge, we should include a lot of built-in functions.

Philosophical implications

The computability of the terms in the completion together with the existence of a solution to any finite partial sequence (that a pattern exists) and the uncomputability of the pattern (that finding the pattern is hard) form a dialectical contradiction (coarsely understood as ‘the two sides of a coin among everything’, a concept from Marxism). The two results are not surprising by themselves or together, but making use of the parlance learnt from philosophy (pronounced ‘political’) courses is interesting to me.

The uncomputability suggests:

The discovery of the most clean theory explaining everything we see cannot be purely mechanical.

The lock-down of completion suggests the rationality of this framework. For a computable sequence ss, the completion of its kk-th leading part eventually will be ss itself, independent of the choice of encoding rr. For any programming language, as long as it is Turing complete, the most succinct rule that explains the sequence will eventually be the same, given enough terms. Applying the analogy to human, this implies that:

Having observed abundant phenomena, people vastly different yet all knowledgeable eventually agree on the law that governs the nature.

Please enable JavaScript to view the comments powered by Disqus.