# The Collatz Conjecture

The Collatz conjecture (or 3N+1 problem) is named after Lothar Collatz, who first proposed it in 1937 (see Wikipedia). Start with any positive integer. If it is odd, multiply by three and add 1. If it is even, divide by two. The conjecture is that every starting integer eventually converges on 1, then cycles 4, 2, 1, 4, 2, 1.
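The rule itself is tiny; a minimal Python sketch (the function name `collatz_sequence` is my own) makes the 4, 2, 1 terminus easy to see:

```python
def collatz_sequence(n):
    """Return the Collatz trajectory of n, ending at the first 1."""
    seq = [n]
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2  # odd: 3N+1; even: N/2
        seq.append(n)
    return seq

print(collatz_sequence(6))  # → [6, 3, 10, 5, 16, 8, 4, 2, 1]
```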

This has not been proved, and may be undecidable. Any counterexample would either (1) diverge towards infinity forever, or (2) cycle in a loop other than 4, 2, 1. No divergent trajectory (which would involve infinitely many integers) or nontrivial loop (a finite set of integers) has been found; either would disprove the conjecture. Any such loop would have to contain a vast number of very large integers, with its size related to the binary expansion of the ratio ln(3)/ln(2).

## Collatz, Improving the Definition

An infinitely large integer, on average, converges in log(infinite) cycles - also infinite. Is this actually "convergence"? What if a sequence needs 2*log(infinite) cycles, or 10*log(infinite) cycles, or N*log(infinite) cycles? What if a sequence needs another log(infinite) cycles after we lose hope and give up computing with our hypothetical Collatz computer? Unlike many other problems of large numbers (like Fermat's conjecture), a number problem based on repeated calculations of infinitely large numbers has an inherent infinity to it, and may be undecidable until someone decides on a more precisely defined statement of the problem. Those better-defined statements - such as "for any group of N-bit numbers, there will be fewer than N nonconvergent values" - may actually be provable or disprovable. The whole problem might be divided into a few well-defined subproblems, and the ability to define the subproblems may be more valuable than a vague answer to the original vague conjecture.

## Collatz, Logically

The Collatz problem can be mapped onto an infinite array of shifters and adders - all binary logic. Since the proposition we wish to test is also logical, then the infinite array of gates can produce a logical signal - the conjecture is true or false - in K*log(infinite) time, where K is (probably) a finite number of hypothetical gate delays multiplied by the log().

It is profitable to think about this hypothetical machine. Theoretical logic design, and especially fault testing of logic, has received a lot of academic attention over the last few decades. Proving that a logical machine (or a program) is 100% functional and without flaws ("stuck at zero or one" or too slow), without testing every possible input, is an incomplete problem, but the theoretical attention it has received might be applied to the Collatz conjecture, with practical results that might prove valuable to manufacturers of digital logic machines.

The Collatz problem also maps easily onto a Turing machine, and that has been used to develop theorems about computability and the halting problem. With proper remapping, Collatz may prove to be equivalent to the halting problem, or extend it to new realms. Either result would be useful to the programming community.

Perhaps pure number theoreticians care about Collatz purely because of the problem itself, but there are practical benefits to the search for the proof. Maximizing the benefits and leveraging off existing mathematical systems developed for other practical problems might be one way to roadmap a search for new knowledge about the Collatz conjecture. Elegance is nice, but sometimes you should move stuff with a dump truck rather than tweezers.

## Parallel Collatz Computation - Gate Array Implementation

A "multiply" is not necessary; this can be constructed with a binary adder and a shifter, and hundreds of evaluation circuits could operate in parallel on a fast gate array. If N is odd, then 3N+1 is always even, so there is always at least one divide by two. So, adding N, plus N/2 (the same bits shifted towards the least significant bit), plus a carry into the bottom of a binary adder, is the same operation.

Shifting can be done with a "barrel shifter", with multiple stages shifting zero or one position, zero or two positions, zero or four positions, etc. as necessary. However, a "shift by one" occurs only 1/2 of the time, a "shift by two" 1/4 of the time, etc., so if we iterate with a single shift we on average add only two shift times, which is probably cheaper than adding a lot of complex logic for a barrel shifter, and we only have to look at one or two least significant bits to decide whether to do the operation.
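That geometric distribution of shift lengths is easy to confirm empirically; this Python sketch (names are my own) measures the average number of divide-by-2 steps following an odd step:

```python
def trailing_zeros(m):
    """Count how many divide-by-2 shifts m allows before going odd."""
    k = 0
    while m % 2 == 0:
        m //= 2
        k += 1
    return k

# 3n+1 is always even for odd n; half the results need one shift,
# a quarter need two, an eighth need three, and so on.
odds = range(1, 200001, 2)
avg = sum(trailing_zeros(3 * n + 1) for n in odds) / len(odds)
print(round(avg, 2))  # close to the expected average of 2 shifts
```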

The addition is the most complex operation, since we need to propagate a carry through the whole sum. However, adder logic is sped up by using carry/propagate logic and hierarchical chains of blocks. This reduces carry propagate time from O(N) gate delays to O(log N). Details: Half sums are computed for each pair of input bits, generating a partial sum (using an exclusive OR), a generate signal (if both input bits are one, an AND gate) and a propagate signal (if one input bit is a one, an OR gate). Adder bits are grouped, typically in blocks of 4, with the whole block producing a block propagate signal (all bit propagate signals within the block are 1) or a block generate signal (one of the bit generate signals within the block is one, along with all the subsequent bit propagate signals within the block). These signals are fed to higher level groups of blocks, calculating carries for 4 bits, then 16 bits, then 64 bits, then 256 bits. After the global carries are propagated, we go through another set of blocks to compute the carries for each bit pair. Then we generate the output sum by feeding the first sum and the carry signal into another half adder (an exclusive OR). Given speed limitations associated with moving signals far across a chip, the higher level blocks may use three- or two-input carry/propagate blocks.
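A toy bit-level model of this generate/propagate scheme, with 4-bit blocks and one level of block lookahead (a sketch only; little-endian bit lists and all function names are my own assumptions, and a real adder computes these signals in parallel gates, not loops):

```python
def bits(x, n):
    """Integer x as a little-endian list of n bits."""
    return [(x >> i) & 1 for i in range(n)]

def cla_add(a, b, n=16, block=4):
    """Toy carry-lookahead adder: bit generate/propagate, then 4-bit block
    generate/propagate, then block carries, then per-bit carries and XOR sums."""
    A, B = bits(a, n), bits(b, n)
    g = [x & y for x, y in zip(A, B)]   # bit generate: both inputs 1 (AND)
    p = [x | y for x, y in zip(A, B)]   # bit propagate: at least one input 1 (OR)
    nb = n // block
    G, P, C = [], [], [0]
    for j in range(nb):                 # block generate/propagate signals
        bg, bp = 0, 1
        for i in range(j * block, (j + 1) * block):
            bg = g[i] | (p[i] & bg)     # a generate, propagated to the block top
            bp &= p[i]                  # every bit in the block propagates
        G.append(bg)
        P.append(bp)
    for j in range(nb):                 # carries into each block
        C.append(G[j] | (P[j] & C[j]))
    s = 0
    for j in range(nb):                 # per-bit carries inside each block, then XOR
        c = C[j]
        for i in range(j * block, (j + 1) * block):
            s |= (A[i] ^ B[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return s                            # carry out of bit n-1 is discarded (sum mod 2^n)

print(hex(cla_add(0x1234, 0x0FF1)))  # → 0x2225
```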

Since the least significant bit from the adder tells us whether we will do a subsequent add or shift (which include the same N/2 part), the question is whether to add N or not, and whether to wait for carry propagate, or just blow the bits through a shift down and move on to the next cycle.

Upper bit zero detection can be delayed through more pipeline logic. If the upper bits are delayed, and our 1/2/4 cycle detection is delayed by a few clock cycles, this is a win overall if we can reduce the number of logic stages and increase the clock rate. Or better, keep a moderate clock rate, reduce the logic swing, and increase gate delay to lower the power dissipation. A gate array full of such circuits will be much more power efficient than a microprocessor running a stored program, but if there are hundreds of these circuits running in parallel on the gate array it could get hot.

After all that perseveration, the simplest way to do this is to shift N down every cycle making N/2, but add the unshifted number and a carry if the LSB is one. That is, the result of every step is N>>1, plus N+1 when the LSB of N is one. A straightforward gate array adder of bit length B, using carry propagate, gets larger as B ln(B), and the step time grows proportional to ln(B). However, if we pipeline the upper bits and delay the zero detection, the step time remains small, with some ln(B)-proportional cleanup time at the end of our calculation. The average number of steps is also proportional to ln(B).
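That unified step — always shift down, conditionally add N+1 — can be sketched in Python (a software model only; `step` and `cycles` are hypothetical names):

```python
def step(n):
    """One unified Collatz cycle: next = (n >> 1) + (n + 1 if n is odd else 0).
    For odd n this is (3n + 1) / 2; for even n it is n / 2."""
    return (n >> 1) + ((n + 1) & -(n & 1))  # -(n & 1) is an all-ones mask when the LSB is 1

def cycles(n):
    """Count unified steps until the trajectory reaches 1."""
    k = 0
    while n != 1:
        n = step(n)
        k += 1
    return k

print(cycles(27))
```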

This would act as a coprocessor; a CPU would generate the candidate number to compute, and shove new numbers at each gate array "cell", after pulling the results of the previous calculation out of that cell. Adding arithmetic comparisons to each Collatz cell for "highest value reached", "last descent below original value", hailstone cycle counts, etc. would each cost as much time and logic as the original calculation, though simple tallies (how many cycles, highest carry generated, etc.) would be cheaper. We could use the gate array to survey for "interesting" numbers, then analyze them in detail with the main CPU or with a slower, more complicated, and more expensive Collatz cell.

| Method | Clocks per cycle | Cycle speed | Logic speed | Multithread | Power | Notes |
|--------|------------------|-------------|-------------|-------------|-------|-------|
| Standard 64 bit CPU | B/64 + overhead | many instructions | Better | 16? | Very High | |
| Gate array with standard adder | 1 | B | Slower | 400? | Moderate | |
| Gate array with carry propagate | 1 | ln(B) | Slower | 200? | High | |
| Gate array, cp, pipelined | 1 | 1 | Slower | 100? | High | |
| Custom logic, cp, pipelined | 1 | 1 | Optimum | 1000? | High | Huge NRE |

Using a gate array to develop a "normal" 256 bit calculator for the usual explorations, then a very large carry propagate adder (64K bit words?) for the exploration of very large numbers that are potentially part of large loops, would be a lot faster than the usual multiprecision calculations. The discovery process for large loop candidates might also benefit from a gate array. But at some point, the only way some of these very large calculations will complete in an acceptable time is with a full custom chip design (or two), which might cost $1M to design and build the chips, for a "Moon Shot" calculating system. This will happen when the task involves dozens of researchers and thousands of gate arrays in boards, and will be for cost and runtime reduction.

## Odd Integer Starts Only

All even starting integers divide down monotonically to odd integers, so "even starts" are trivial and need not be computed. Odd starts that are divisible by three have only their 2^N multiples as antecedents, and are also uninteresting: they cannot have complex webs of antecedents like other integers, nor can they be part of a loop. Every odd integer that is divisible by three reduces to a different odd integer not divisible by 3, or to the "binary backbone chain".
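The reason odd multiples of three are dead ends is that 3m+1 ≡ 1 (mod 3), so no 3m+1 step can ever land on a multiple of three; a quick numeric spot check:

```python
# No result of a 3m+1 step is divisible by 3, so a multiple of 3 can only be
# reached by halving one of its power-of-two multiples.
assert all((3 * m + 1) % 3 == 1 for m in range(1, 100000))
print("no 3m+1 result is divisible by 3")
```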

## Binary backbone chain

All Collatz sequences that terminate at 1 pass through the binary backbone chain, a sequence of powers of 4. Every third power (one whose exponent is divisible by 3) has an odd entry point divisible by three, shown non-bolded in the table:

| 4^k | k | odd entry point |
|-----|---|-----------------|
| 4 | 1 | 1 |
| 16 | 2 | **5** |
| 64 | 3 | 21 |
| 256 | 4 | **85** |
| 1024 | 5 | **341** |
| 4096 | 6 | 1365 |
| 16384 | 7 | **5461** |
| 65536 | 8 | **21845** |
| 262144 | 9 | 87381 |
| 1048576 | 10 | **349525** |
| 4194304 | 11 | **1398101** |
| 16777216 | 12 | 5592405 |
| 67108864 | 13 | **22369621** |
| 268435456 | 14 | **89478485** |
| 1073741824 | 15 | 357913941 |
| 4294967296 | 16 | **1431655765** |

... et cetera ...

So if the Collatz conjecture is true,

• **all odd integers (and hence all integers) terminate in this power-of-four backbone chain, and enter it *only* via one of the odd entry points in the third column.**

Indeed, very nearly all integers enter the backbone chain via the **bolded** odd integer entry points, with the non-bolded integers (divisible by three) as entry points for **only** those odd integers themselves (and their power-of-two even multiples).
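The table's pattern can be verified numerically: the odd entry point for 4^k is x = (4^k − 1)/3 (so that 3x + 1 = 4^k), and x is divisible by three exactly when k is — the non-bolded rows, apart from the trivial k = 1 entry. A sketch:

```python
# Check of the backbone table: the odd entry point for 4^k is
# x = (4^k - 1) / 3, since 3x + 1 = 4^k.
for k in range(1, 17):
    x = (4**k - 1) // 3
    assert 3 * x + 1 == 4**k               # x maps directly onto the backbone
    assert (x % 3 == 0) == (k % 3 == 0)    # entry points divisible by 3 occur at k = 3, 6, 9, ...
print("entry points verified for k = 1..16")
```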

A program or logic system that computes Collatz sequences need not detect those entry points; it can just shift the working integer down by one bit, testing for a 0 LSB and either continuing to the next shift, or testing for a value of 1 and terminating. If these sequential tests are NOT looped, they will be quite fast, and the many terminating locations in the code will correspond to the terminating numbers in the backbone chain.
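A looped software stand-in for that shift-and-test logic (a hardware or generated-code version would unroll the tests; `shifts_to_backbone` is a hypothetical name):

```python
def shifts_to_backbone(n):
    """Shift the working value down one bit per test, exactly as the unlooped
    logic would; report whether it bottoms out at 1 (a backbone power of two)."""
    while n & 1 == 0:  # LSB is 0: keep shifting down
        n >>= 1
    return n == 1      # LSB is 1: terminate; on the backbone iff we reached 1

print(shifts_to_backbone(4096), shifts_to_backbone(80))  # → True False
```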

An interesting 64 bit unsigned computation would start with a maximum odd integer (say 2^56), then follow this loop, decrementing downwards. If the result of the tests exceeds 2^63, then pass those integers to another processor that does double- or triple-precision BIGINT computations. That might take about a year to compute on a million-MIPS multicore/multithread CPU. Starting with a 2^64 integer would probably take more than a century.

Only a minuscule fraction of all those starts would enter the backbone chain above the starts in the table shown. My WAG is that a table twice as long would cover ALL 2^64 starts. But that can't be proven without doing the computations, and the general case for much larger integers can't be proven until the Collatz conjecture is proven.

A nested, pipelined, carry-lookahead adder can compute an N bit bigint addition in log(N) time. So it is conceivable that a megabit-integer Collatz processor could still process Collatz cycles at GIPS rates, but the universe would not last long enough for it to test all the starting integers it is capable of processing.

A 1000-thread custom logic chip made in 10,000 chip batches would be much faster and less power hungry than a general CPU, but very costly to design and implement. It is probably cheaper to pay a lot of mathematicians to fold the problem down to a smaller logical problem. Enough problem foldings (with the aid of automated logic minimization) could lead to a mathematical proof of Collatz ... which only a giant logical system could understand.

Perhaps the proof of the Collatz conjecture would fill more silicon than the universe contains. That would be an awful way for the world to end ... :-(

Collatz (last edited 2020-07-17 02:03:03 by KeithLofstrom)