In memory of a nightmare: solving linear equation over PID

Nightmare seasoned with Lecture of Algebra
Nightmare seasoned with Lecture of Algebra

I had a maths-related nightmare that (initially) was concerned about counting solutions of a linear equation modulo MM with integral coefficients. Though the whole detail of the dream cannot be written down due to privacy reasons, at least I can write a simple introduction on solving this specific problem here.

Update on 26 December 2018: The original solution is wrong. See the correction. Another update the same day: The original method is indeed correct, see the correction to correction.

In the dream, someone modelled the possible versions of wicked nursery rhymes as solutions to linear equations modulo MM, and asked how to count the solutions. Write the equations as a matrix equation Axb(mod M),Ax\equiv b\quad\left({\mathrm{mod}\ M}\right), where AZm×n,xZMn,bZmA\in\mathbb{Z}^{m\times n},x\in\mathbb{Z}_M^n,b\in\mathbb{Z}^m. The objective is to count the solutions (and possibly represent them in an easy-to-enumerate way).

Smith normal form of matrix over PID

If you are familiar with structural theorem of finitely generated module over a principal ideal domain, or λ\lambda-matrices (used for determining whether two matrices are similar), or just Smith normal form, you can skip this part.

Recall the elementary reductions for matrices over a field. Row reductions represent manipulations of equations. Rarely used in solving linear equations are column reductions, which represent change of variables. The implication of elementary reductions is that any invertible matrix is a product of several elementary matrices. Moreover, for any matrix AA (over a field), there exist two invertible matrices P,QP,Q and a natural number rr, such that A=P(Ir000)Q,A=P\begin{pmatrix}I_r&0\\0&0\end{pmatrix}Q, where IrI_r is the r×rr\times r identity matrix. As we all know, rr is the rank of the matrix.

The similar thing for the second part goes for matrices over a PID (or more generally, a principal ideal ring or a Bézout domain). For any matrix AA, there exist invertible matrices P,QP,Q, a natural number rr and rr ring elements α1,,αr0\alpha_1,\dots,\alpha_r\neq 0 such that aiai+1a_i\mid a_{i+1} for i=1,,r1i=1,\dots,r-1 and A=P(α10αr00)Q.A=P\left(\begin{array}{ccc|c}\alpha_1&&&\\&\ddots&&0\\&&\alpha_r&\\\hline&0&&0\end{array}\right)Q. The counterpart of the first part, that invertible matrices are products of ‘elementary matrices’, holds for Euclidean domains. It holds for general Bézout domain if we allow a new kind of elementary matrices, ‘coprime combination’, which essentially means multiplying by (ursv)\begin{pmatrix}u&r\\s&v\end{pmatrix} for any uvrs=1uv-rs=1 for two selected rows (resp. columns). Including this kind of elementary matrices generate the general linear group. Read the Wikipedia article for Smith normal form for an example.

The αi\alpha_i’s appearing on the diagonal are called invariant factors, which are unique up to associatedness (multiplication by a unit). The prime factorisation of them corresponds to elementary divisors. The invariant factors can be determined by the quotients of successive determinant factors, the kk-th of which is a greatest common divisor of all k×kk\times k minors.

Solving the equation

Smith normal form answers the question in the nightmare. Recall that we want to count the solutions (and write them in an easy-to-enumerate way) of Axb(mod M).Ax\equiv b\quad\left({\mathrm{mod}\ M}\right). First find out the Smith normal form of AA as A=PDQ,D=(α10αr00).A=PDQ,\quad D=\left(\begin{array}{ccc|c}\alpha_1&&&\\&\ddots&&0\\&&\alpha_r&\\\hline&0&&0\end{array}\right). You can use the Smith normal form of AA as a matrix over either Z\mathbb{Z} or ZM\mathbb{Z}_M, and for the latter case, it is possible to write (the integer representatives of the representatives of) the invariant factors as a positive divisor of MM, which happen to be the greatest common divisors of the corresponding factors over Z\mathbb{Z} and MM (see here). Note that for the former case, P,QP,Q are invertible over Z\mathbb{Z}, hence are also invertible over ZM\mathbb{Z}_M (when naturally mapped via the quotient map). The equation can be rewritten as DQxP1bc(mod M).DQx\equiv P^{-1}b\triangleq c\quad\left({\mathrm{mod}\ M}\right). Since QQ is invertible, xQxyx\mapsto Qx\triangleq y is a bijection over ZMn\mathbb{Z}_M^n. The problem reduces to counting the solutions of Dyc(mod M)Dy\equiv c\quad\left({\mathrm{mod}\ M}\right). This is a diagonal system of equations, of which the number of solutions is a piece of cake — the product of the number of solutions to each single equation. Consider the equation αiyici(mod M),\alpha_i y_i\equiv c_i\quad\left({\mathrm{mod}\ M}\right), where αi\alpha_i is a positive divisor of MM. There are αi\alpha_i solutions if αici\alpha_i\mid c_i, which are yi,tci+tMαi(mod M),t=1,,αi.y_{i,t}\equiv\frac{c_i+tM}{\alpha_i}\quad\left({\mathrm{mod}\ M}\right),\quad t=1,\dots,\alpha_i. Otherwise, there is no solution. Finally, any solution of the original equation is just x=Q1yx=Q^{-1}y, where yy is a solution to the diagonal system.

Correction

The strikethrough texts above are wrong. Each equation αiyici(mod M),\alpha_i y_i\equiv c_i\quad\left({\mathrm{mod}\ M}\right), where αi\alpha_i might not be (representable by) a factor of MM, by Chinese remainder theorem (CRT), is equivalent to the system αiyici(mod pjsj),\alpha_i y_i\equiv c_i\quad\left({\mathrm{mod}\ p_j^{s_j}}\right), where pjsjp_j^{s_j} runs through prime powers such that sj1,pjsjMs_j\geq 1,p_j^{s_j}\mid M and pjsj+1Mp_j^{s_j+1}\nmid M. Multiplied by appropriate coefficients, they can be rewritten as pjtijyivij(mod pjsj),p_j^{t_{ij}}y_i\equiv v_{ij}\quad\left({\mathrm{mod}\ p_j^{s_j}}\right), where pjtijp_j^{t_{ij}} is the maximum prime power dividing αi\alpha_i yet below or equal to pjsjp_j^{s_j}. The solutions to each equation are, this time for real, easy. If pjtijvijp_j^{t_{ij}}\mid v_{ij}, there are pjtijp_j^{t_{ij}} solutions, namely yivijpjtij+kpjsitij(mod pjsj),k=1,,pjtij.y_i\equiv\frac{v_{ij}}{p_j^{t_{ij}}}+kp_j^{s_i-t_{ij}}\quad\left({\mathrm{mod}\ p_j^{s_j}}\right),\quad k=1,\dots,p_j^{t_{ij}}. Otherwise, there is none. Each component in the diagonal system can be combined from the corresponding subset of the expanded system by CRT, which gives a formula to enumerate the solutions. If we just want to count, we can just multiply the number of solutions of each modulo-prime-power equation.

Remarks. CRT can be invoked before or after invariant factorisation. Doing so in the later stage allows reusing the factorisation over Z\mathbb{Z} or ZM\mathbb{Z}_M.

Correction to Correction

I thought the part about choosing αi\alpha_i as a divisor of MM was wrong, but it turns out it is correct. Moreover, we can pretend MM to be some of the invariant factors that follow non-zero entries (as MM is just zero in ZM\mathbb{Z}_M), which unifies the way we write equations. The rewriting isn’t quite immediate. If you read carefully, ZM\mathbb{Z}_M is not a principal ideal domain, though matrices still admit an (and even very nice) invariant factorisation. Let’s slow down and look carefully how we could reach the factorisation we need.

Let M=p1s1pzszM=p_1^{s_1}\cdots p_z^{s_z} be the standard factorisation, where p1,,pzp_1,\dots,p_z are distinct primes. For any dZ+d\in\mathbb{Z}_{+}, factorise d=p1s1pzszqd=p_1^{s_1'}\cdots p_z^{s_z'}q, where gcd(q,M)=1\gcd\left({q,M}\right)=1. Let ti=min(si,si)t_i=\min\left({s_i,s_i'}\right) and g=gcd(d,M)g=\gcd\left({d,M}\right), then g=p1t1pztzg=p_1^{t_1}\cdots p_z^{t_z}. Now CRT guarantees some hZh\in\mathbb{Z} such that h{d/g,si<si;1,sisi;(mod pisi),i=1,,z.h\equiv\begin{cases}d/g,&s_i'<s_i;\\1,&s_i'\geq s_i;\end{cases}\quad\left({\mathrm{mod}\ p_i^{s_i}}\right),\quad i=1,\dots,z. Note this hh is invertible modulo MM (because it’s invertible in each equation in CRT), let an inverse be kZk\in\mathbb{Z}. Note that kd{g,si<si;0g,sisi;}=g(mod pisi),i=1,,z,kd\equiv\left\{\begin{array}{rl}g,&s_i'<s_i;\\0\equiv g,&s_i'\geq s_i;\end{array}\right\}=g\quad\left({\mathrm{mod}\ p_i^{s_i}}\right),\quad i=1,\dots,z, which, again by CRT, yields kdg (mod M)kd\equiv g\ \left({\mathrm{mod}\ M}\right). This means any non-zero number dd is associated with (a unit factor away from) gcd(d,M)\gcd\left({d,M}\right) in ZM\mathbb{Z}_M.

Now we turn to invariant factorisation of matrices over ZM\mathbb{Z}_M. First, find out the invariant factorisation of AA over Z\mathbb{Z} as A=PDQA=PDQ. For each diagonal entry dd, make it positive by adding a multiple of MM, then invoke the above result and do one more row reduction (multiplying a unit) to transform DD so that each diagonal entry is a factor of MM, whence dd becomes gcd(d,M)\gcd\left({d,M}\right).

Remarks. The above method isn’t more desirable (for human; the time to find out the invariant factors dominates the total running time) than the one in the first correction, because it, after all, has to go through CRT to find out kk, which involves breaking equations down to the prime powers and putting them back together. An obvious attempt of fix would be using Euclidean algorithm to find out Bézout’s identity kd+?M=gcd(d,M)kd+?M=\gcd\left({d,M}\right). However, this method (without extra guarantees on the result returned) might fail, e.g., 2×46=gcd(4,6)2\times 4-6=\gcd\left({4,6}\right), but gcd(2,6)1\gcd\left({2,6}\right)\neq 1.

Remarks. However, the above method is faster (for human) for counting. Since multiplying by kk doesn’t affect the prime powers in cic_i that are relevant (i.e., also primes whose powers appearing in MM), we can just test whether gcd(d,M)ci\gcd\left({d,M}\right)\mid c_i and conclude the count.

Anecdotes

  • I dug through my camera roll to find a photo for the hero image so that I don’t have to deal with copyright issues. The photo is a picture of John Steinberger’s lecture of Mathematics in Computer Science (before those fancy artistic effects), and it is chosen based on its content relevance. The picture demonstrates the Bézout’s identity. Just to clarify, the lecture wasn’t a nightmare.
  • My blog building system is updated for this entry so that I can draw horizontal lines inside matrices.
  • Now that I have to revise twice after the entry’s original publication, this really is a nightmare!