Tuesday, September 10, 2019

How elusive was 42 as the sum of three cubes?

The big mathematical news September 9th 2019 was the final resolution of the decade old Diophantine equation.

As is often the case for super tough questions, the question itself seems completely trivial.
Can all the numbers from 1 to 100 be written as the sum of the cubes of three integers,
i^3 + j^3 + k^3?

Of course the key part of the question is integers, otherwise 0^3 + 0^3 + (cubic root of x)^3 will return x for any x we want!

The other subtlety is that integers can be negative, so each cube can be either positive or negative. So 92 can be written as 9^3 + (-8)^3 + (-5)^3.



This puzzle has been around for at least decades in this current form, but similar equations with integer solutions have been around at least since the third century!

Four our problem, solutions for almost all numbers 1 through 100 had been found, but two remained elusive 33 and 42.



And then 33 was found earlier this year:
   (8866128975287528)^3
+ (−8778405442862239)^3
+ (−2736111468807040)^3 = 33

Which left 42, but advances on the theory and on the computing side have cracked the last remaining elusive number:
   (80435758145817515) ^ 3
+ (-80538738812075974) ^ 3
+  (12602123297335631) ^ 3 = 42

Now I will describe man algorithm to generate all numbers 1 through 1000....
Just kidding.

I was actually curious of the elusiveness of these numbers 33 and 42. Mathematicians had to go really far to get the right numbers. Is that because numbers just weren't very likely to fall in the 1 to 100 range (very likely given how quickly cubes get!) or are certain numbers much more likely to get all the attention. in other words, as all combinations of three integers were tested, were all numbers 1 through 100 obtained in roughly equal proportions or did frequency concentrate around certain ranges?

I wrote a small script which explored this, simply generating all possible combinations of three integers (up to -1000 to 1000 in range, didn't feel like going all the way up to 80538738812075974...) and look at the distribution of i^3 + j^3 + k^3 in the 1 to 100 range.

With i,j,k in -10 to 10


With integers just in that small range, already 61 out of the 100 numbers are solved!
The distribution is far from being randomly uniform though! Two things jump out:

  • the spikes: These are perfectly normal! If a number is a perfect cube (i.e. 27), then there is an infinite number of solutions setting the two other values to opposite values: 27 = 3^3 = 3^3 + 0^3 + 0^3 = 3^3 + (1761)^3 + (-1761)^3... And you can add in all the permutations!
  • the clusters: it seems the solved numbers go in groups. Again, quite reasonable if we think about it. If a number can be generated as the sum of two cubes, then we can easily generate the one immediately before and after by adding either 1^3 or (-1)^3. We have an extra degree of freedom around the perfect cubes. Being able to generate consecutive solved numbers creates the clusters. There is a further discussion at the end of the post on how unexpected the number of clusters is.

Based on the spike comment, I will truncate at frequencies of 10 to focus on coverage:


With i,j,k in -100 to 100

Expanding the range ten-fold leads to 68 out of the 100 being solved:




With i,j,k in -1000 to 1000

Increasing the range ten-fold once again leads to a single additional number being solved: 51


We now have a better sense of the elusiveness of the solutions. If they aren't found with the range of the smaller integers, then it will get increasingly difficult to identify later, no wonder this has been such an open problem for so many years!

Just for fun, I looked at a new problem switching the power from 3 to 5, only 13 (compared to 69 for cubes) numbers were solved in the -1000 to 1000 range:


Again the clustering is pretty remarkable! While 42 is still unsolved, I'm glad to report a solution for 33: 0^5 + 1^5 + 2^5 !

And to close out this post, I wanted to return to the topic of the clusters. We saw that when exploring the -1000 to 1000 ranges for the integers we were able to cover 69 out of the 100 target numbers. But it didn't appear as though the 69 numbers were randomly distributed in 1-100 but grouped into clusters, more specifically into 13 clusters. We had a high-level explanation for this as consecutive numbers can easily be generated via k^3 with k = -1, 0 and 1. But the question remains, how non-random is this grouping into 13 clusters?

One approach is to sample without replacement 69 out of the 100 first integers, and see how many clusters are obtained. In order to count clusters, I used the rle() function which stands for run length encoding and basically describes a sequence. E.g. for 1,2,2,2,1,0,0 it will return that there is one 1, three 2s, one 1 and two 0s. From there it is easy to get the number of clusters.

I generated 10,000 samples of "solved" numbers between 1-100 and in each case computed the number of clusters obtained. Here is the associated histogram to which I've added the 13 we obtained in our case as a red vertical line:


While we'd typically expect to get about 20-25 clusters, only in extreme cases do we get a 15 or even a 14. Our 13 was not obtained a single time in all ten thousand simulations. This should settle the question of whether the low number of clusters comes as a surprise or not!