D.R. Kaprekar was a self taught recreational mathematician, perhaps known mostly for some numbers that bear his name.

Today, I'll focus on Kaprekar's constant (as opposed to Kaprekar numbers.)

The idea is a simple one, embodied in these 5 steps.

1. Take any 4 digit integer, reduce to its decimal digits.

2. Sort the digits in decreasing order.

3. Flip the sequence of those digits, then recompose the two sets of sorted digits into 4 digit numbers. If there were any 0 digits, they will become leading zeros on the smaller number. In this case, a leading zero is acceptable to consider a number as a 4 digit integer.

4. Subtract the two numbers, smaller from the larger. The result will always have no more than 4 decimal digits. If it is less than 1000, then presume there are leading zero digits.

5. If necessary, repeat the above operation, until the result converges to a stable result, or until you see a cycle.

Since this process is deterministic, and must always result in a new 4 digit integer, it must either terminate at either an absorbing state, or in a cycle.

For example, consider the number 6174.

7641 - 1467

ans = 6174

We get 6174 directly back. That seems rather surprising to me. But even more interesting is you will find all 4 digit numbers (excluding the pure rep-digit nmbers) will always terminate at 6174, after at most a few steps. For example, if we start with 1234

4321 - 1234

ans = 3087

8730 - 0378

ans = 8352

8532 - 2358

ans = 6174

and we see that after 3 iterations of this process, we end at 6174. Similarly, if we start with 9998, it too maps to 6174 after 5 iterations.

9998 ==> 999 ==> 8991 ==> 8082 ==> 8532 ==> 6174

Why should that happen? That is, why should 6174 always drop out in the end? Clearly, since this is a deterministic proces which always produces another 4 digit integer (Assuming we treat integers with a leading zero as 4 digit integers), we must either end in some cycle, or we must end at some absorbing state. But for all (non-pure rep-digit) starting points to end at the same place, it seems just a bit surprising.

I always like to start a problem by working on a simpler problem, and see if it gives me some intuition about the process. I'll do the same thing here, but with a pair of two digit numbers. There are 100 possible two digit numbers, since we must treat all one digit numbers as having a "tens" digit of 0.

N = (0:99)';

Next, form the Kaprekar mapping for 2 digit numbers. This is easier than you may think, since we can do it in a very few lines of code on all possible inputs.

Ndig = dec2base(N,10,2) - '0';

Nmap = sort(Ndig,2,'descend')*[10;1] - sort(Ndig,2,'ascend')*[10;1];

I'll turn it into a graph, so we can visualize what happens. It also gives me an excuse to employ a very pretty set of tools in MATLAB.

G2 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,2)));

plot(G2)

Do you see what happens? All of the rep-digit numbers, like 11, 44, 55, etc., all map directly to 0, and they stay there, since 0 also maps into 0. We can see that in the star on the lower right.

G2cycles = cyclebasis(G2)

G2cycles = 2x1 cell array

{1x1 cell} {1x5 cell}

All other numbers eventually end up in the cycle:

G2cycles{2}

ans = 1x5 cell array

{'09'} {'45'} {'27'} {'63'} {'81'}

That is

81 ==> 63 ==> 27 ==> 45 ==> 09 ==> and back to 81

looping forever.

Another way of trying to visualize what happens with 2 digit numbers is to use symbolics. Thus, if we assume any 2 digit number can be written as 10*T+U, where I'll assume T>=U, since we always sort the digits first

syms T U

(10*T + U) - (10*U+T)

ans=

So after one iteration for 2 digit numbers, the result maps ALWAYS to a new 2 digit number that is divisible by 9. And there are only 10 such 2 digit numbers that are divisible by 9. So the 2-digit case must resolve itself rather quickly.

What happens when we move to 3 digit numbers? Note that for any 3 digit number abc (without loss of generality, assume a >= b >= c) it almost looks like it reduces to the 2 digit probem, aince we have abc - cba. The middle digit will always cancel itself in the subtraction operation. Does that mean we should expect a cycle at the end, as happens with 2 digit numbers? A simple modification to our previous code will tell us the answer.

N = (0:999)';

Ndig = dec2base(N,10,3) - '0';

Nmap = sort(Ndig,2,'descend')*[100;10;1] - sort(Ndig,2,'ascend')*[100;10;1];

G3 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,2)));

plot(G3)

This one is more difficult to visualize, since there are 1000 nodes in the graph. However, we can clearly see two disjoint groups.

We can use cyclebasis to tell us the complete story again.

G3cycles = cyclebasis(G3)

G3cycles = 2x1 cell array

{1x1 cell} {1x1 cell}

G3cycles{:}

ans = 1x1 cell array

{'000'}

ans = 1x1 cell array

{'495'}

And we see that all 3 digit numbers must either terminate at 000, or 495. For example, if we start with 181, we would see:

811 - 118

ans = 693

963 - 369

ans = 594

954 - 459

ans = 495

It will terminate there, forever trapped at 495. And cyclebasis tells us there are no other cycles besides the boring one at 000.

What is the maximum length of any such path to get to 495?

D3 = distances(G3,496) % Remember, MATLAB uses an index origin of 1

D3 = 1x1000

Inf 6 5 4 3 1 2 3 4 5 6 6 5 4 3 1 2 3 4 5 5 5 5 4 3 1 2 3 4 5

<mw-icon class=""></mw-icon>

<mw-icon class=""></mw-icon>

D3(isinf(D3)) = -inf; % some nodes can never reach 495, so they have an infinite distance

plot(D3)

The maximum number of steps to get to 495 is 6 steps.

find(D3 == 6) - 1

ans = 1x54

1 10 11 100 101 110 112 121 122 211 212 221 223 232 233 322 323 332 334 343 344 433 434 443 445 454 455 544 545 554

<mw-icon class=""></mw-icon>

<mw-icon class=""></mw-icon>

So the 3 digit number 100 required 6 iterations to eventually reach 495.

shortestpath(G3,101,496) - 1

ans = 1x7

100 99 891 792 693 594 495

<mw-icon class=""></mw-icon>

<mw-icon class=""></mw-icon>

I think I've rather exhausted the 3 digit case. It is time now to move to the 4 digit problem, but we've already done all the hard work. The same scheme will apply to compute a graph. And the graph theory tools do all the hard work for us.

N = (0:9999)';

Ndig = dec2base(N,10,4) - '0';

Nmap = sort(Ndig,2,'descend')*[1000;100;10;1] - sort(Ndig,2,'ascend')*[1000;100;10;1];

G4 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,2)));

plot(G4)

cyclebasis(G4)

ans = 2x1 cell array

{1x1 cell} {1x1 cell}

ans{:}

ans = 1x1 cell array

{'0000'}

ans = 1x1 cell array

{'6174'}

And here we see the behavior, with one stable final point, 6174 as the only non-zero ending state. There are no circular cycles as we had for the 2-digit case.

How many iterations were necessary at most before termination?

D4 = distances(G4,6175);

D4(isinf(D4)) = -inf;

plot(D4)

The plot tells the story here. The maximum number of iterations before termination is 7 for the 4 digit case.

find(D4 == 7,1,'last') - 1

ans = 9985

shortestpath(G4,9986,6175) - 1

ans = 1x8

9985 4086 8172 7443 3996 6264 4176 6174

<mw-icon class=""></mw-icon>

<mw-icon class=""></mw-icon>

Can you go further? Are there 5 or 6 digit Kaprekar constants? Sadly, I have read that for more than 4 digits, things break down a bit, there is no 5 digit (or higher) Kaprekar constant.

We can verify that fact, at least for 5 digit numbers.

N = (0:99999)';

Ndig = dec2base(N,10,5) - '0';

Nmap = sort(Ndig,2,'descend')*[10000;1000;100;10;1] - sort(Ndig,2,'ascend')*[10000;1000;100;10;1];

G5 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,2)));

plot(G5)

cyclebasis(G5)

ans = 4x1 cell array

{1x1 cell} {1x2 cell} {1x4 cell} {1x4 cell}

ans{:}

ans = 1x1 cell array

{'00000'}

ans = 1x2 cell array

{'53955'} {'59994'}

ans = 1x4 cell array

{'61974'} {'63954'} {'75933'} {'82962'}

ans = 1x4 cell array

{'62964'} {'71973'} {'83952'} {'74943'}

The result here are 4 disjoint cycles. Of course the rep-digit cycle must always be on its own, but the other three cycles are also fully disjoint, and are of respective length 2, 4, and 4.