Sunday, 4 May, 2008

Largest prime factor of a number

Long live the programming overlords! Today is the day I have begun serious programming again... 'serious' is perhaps a bit too much - programming is never actually serious, it's a sort of fun activity that happens to appear boring and difficult and hence obtains good paychecks from the HR types... ;)

Anyway, I happened to come across the Project Euler and registered in the site. The registration was painless and I fell in love with the site immediately. Problems 1 and 2 were trivial (finding the sum of the multiples of 3 or 5 under 1000, finding the sum of even numbers in the Fibonacci series which are less than 4 million).

Problem 3 was a little more involved. I had to find the largest prime factor of 600851475143. It isn't particularly difficult, but it does call for some thinking. The first method I thought of was to
  1. divide the number by successive integers upto sqrt(n), and
  2. when the remainder is 0, check whether the divisor is prime,
  3. if so, store it as answer overwriting the previous value, if any.
An obvious improvement was to come down from sqrt(n) so that the first number to satisfy conditions 2 and 3 would be the answer.
However, by this time, something seemed fishy in this method. Particularly, it occurred to me that a prime number checking was costly and unnecessary here. Instead, you can go about it using a recursive solution:

FindLargestPrimeFactor(n):
  1. divide the number by successive integers (each denoted by i) upto sqrt(n),
  2. when the remainder is 0, return the maximum among i and FindLargestPrimeFactor(n/i).
It might not be clear at first sight that the algorithm works correctly. There seems to be no check for primality. However, this algorithm exploits the fact that the smallest factor sf of a number n will necessarily be prime. This is because, if it were not prime, its factors would be factors of n and hence sf would not be the smallest factor. Thus, we need only find the smallest factor to have a prime factor. Then, we can divide away this factor and again find the smallest factor. This way, we can have all the prime factors of a number.

In the algorithm, we find a prime factor of n, compare it with the largest prime factor of n/i, and return the larger number. Thus, finally, we'll have the largest prime factor of the number n itself.
Obviously, we need a limit for the recursion. In this case, this limit occurs when i overflows sn. This means that n is prime, and hence we must return n itself as the largest prime number.

This was the logic of my program as I coded and finished it. I entered the answer into the Project Euler website (which is projecteuler.net, by the way). After that, I saw the forums in the website and saw that many had implemented my original algorithm which used prime testing. This more efficient recursive solution had missed the neurons of many... With that boost to my hubris, I turned to the pdf containing the 'overview' of the problem.

Wow, the algorithm described is very close to mine. One improvement is that the recursion has instead become an iteration. This is one area where I stumble every time - converting a recursive program into an iterative one. So I bookmarked this in my mind and moved on.

The next improvement was one that made me feel stupid for not thinking of it earlier. Since I'm going to find only prime factors, I need only use 2 and odd numbers in step 1. So, I can use a separate step for 2 and then start with 3 and increment
i by 2. This way, we'll save about half of the iterations...

UPDATE: There are quite a number of other improvements that must be done to this algorithm. For the actual algorithm for this problem (which develops on the algorithm here), see Finding the Largest prime factor efficiently.

The final perl program I wrote was:

use strict;
use warnings;

my $magic = 600851475143;

sub largestprimef($);
sub max($$);

print largestprimef($magic);
sub largestprimef($)
{
my $n = shift;

my $i;
return largestprimef(max(2, $n/2) if($n % 2 == 0)); #UPDATE: There was a bug here where I didn't missed the recursive call. Corrected 20th April 2009.
my $sn = int(sqrt($n));

for ($i = 3; $i <= $sn; $i += 2) { if($n % $i == 0) { last; } } if($i > $sn) #loop ran over, means the number is prime
{
return $n;
}
else
{
return max($i, largestprimef($n/$i));
}
}

sub max($$)
{
return (sort { $a <=> $b }(@_))[1];
}


A few notes about the program here:
  • not 'use'-ing the strict and warning pragmas gave a speed improvement of a few milliseconds
  • first I implemented max in the good old comparison way - if i > j, return i else return j. Turns out perl's sort implementation gives faster results than this equivalent speed. I just tried it for the heck of it , and was surprised to see a few milliseconds improvement. Probably the speed improvement came from the removal of 'shift's to get the numbers from @_ to $i and $j.(that was a temporary illusion; in repeated testing both turned out to be equivalent.
  • Since I did the program in Linux, the age old newline problem cropped up. When I pasted the code into blogger, it showed the code as one continuous mess. I had to use http://www.hashemall.com/ for converting the newlines. The site dutifully hashed the program, and thankfully returned my original text with MS Windows newlines in place of Unix ones.. (The hash value under sha-1 was b8b7188ad3440084e5db7f44fd70d599f897e33a, in case you are interested... :) )
  • The final runtime was around 14 ms.
  • UPDATE: The code highlighting is courtesy of Online syntax highlighter 'tohtml' which is an excellent service by the way.