Sunday 4 May, 2008

Largest prime factor of a number

Long live the programming overlords! Today is the day I have begun serious programming again... 'serious' is perhaps a bit too much - programming is never actually serious, it's a sort of fun activity that happens to appear boring and difficult and hence obtains good paychecks from the HR types... ;)

Anyway, I happened to come across the Project Euler and registered in the site. The registration was painless and I fell in love with the site immediately. Problems 1 and 2 were trivial (finding the sum of the multiples of 3 or 5 under 1000, finding the sum of even numbers in the Fibonacci series which are less than 4 million).

Problem 3 was a little more involved. I had to find the largest prime factor of 600851475143. It isn't particularly difficult, but it does call for some thinking. The first method I thought of was to
  1. divide the number by successive integers upto sqrt(n), and
  2. when the remainder is 0, check whether the divisor is prime,
  3. if so, store it as answer overwriting the previous value, if any.
An obvious improvement was to come down from sqrt(n) so that the first number to satisfy conditions 2 and 3 would be the answer.
However, by this time, something seemed fishy in this method. Particularly, it occurred to me that a prime number checking was costly and unnecessary here. Instead, you can go about it using a recursive solution:

FindLargestPrimeFactor(n):
  1. divide the number by successive integers (each denoted by i) upto sqrt(n),
  2. when the remainder is 0, return the maximum among i and FindLargestPrimeFactor(n/i).
It might not be clear at first sight that the algorithm works correctly. There seems to be no check for primality. However, this algorithm exploits the fact that the smallest factor sf of a number n will necessarily be prime. This is because, if it were not prime, its factors would be factors of n and hence sf would not be the smallest factor. Thus, we need only find the smallest factor to have a prime factor. Then, we can divide away this factor and again find the smallest factor. This way, we can have all the prime factors of a number.

In the algorithm, we find a prime factor of n, compare it with the largest prime factor of n/i, and return the larger number. Thus, finally, we'll have the largest prime factor of the number n itself.
Obviously, we need a limit for the recursion. In this case, this limit occurs when i overflows sn. This means that n is prime, and hence we must return n itself as the largest prime number.


This was the logic of my program as I coded and finished it. I entered the answer into the Project Euler website (which is projecteuler.net, by the way). After that, I saw the forums in the website and saw that many had implemented my original algorithm which used prime testing. This more efficient recursive solution had missed the neurons of many... With that boost to my hubris, I turned to the pdf containing the 'overview' of the problem.


Wow, the algorithm described is very close to mine. One improvement is that the recursion has instead become an iteration. This is one area where I stumble every time - converting a recursive program into an iterative one. So I bookmarked this in my mind and moved on.

The next improvement was one that made me feel stupid for not thinking of it earlier. Since I'm going to find only prime factors, I need only use 2 and odd numbers in step 1. So, I can use a separate step for 2 and then start with 3 and increment
i by 2. This way, we'll save about half of the iterations...


UPDATE: There are quite a number of other improvements that must be done to this algorithm. For the actual algorithm for this problem (which develops on the algorithm here), see Finding the Largest prime factor efficiently.

The final perl program I wrote was:

use strict;
use warnings;

my $magic = 600851475143;

sub largestprimef($);
sub max($$);

print largestprimef($magic);
sub largestprimef($)
{
my $n = shift;

my $i;
return largestprimef(max(2, $n/2)) if($n % 2 == 0); 
my $sn = int(sqrt($n));

for ($i = 3; $i <= $sn; $i += 2)  { 
  last if($n % $i == 0);
} 
 
if($i > $sn) #loop ran over, means the number is prime
{
  return $n;
}
else
{
return max($i, largestprimef($n/$i));
}
}

sub max($$)
{
return (sort { $a <=> $b }(@_))[1];
}

A few notes about the program here:
  • not 'use'-ing the strict and warning pragmas gave a speed improvement of a few milliseconds
  • first I implemented max in the good old comparison way - if i > j, return i else return j. Turns out perl's sort implementation gives faster results than this equivalent speed. I just tried it for the heck of it , and was surprised to see a few milliseconds improvement. Probably the speed improvement came from the removal of 'shift's to get the numbers from @_ to $i and $j.(that was a temporary illusion; in repeated testing both turned out to be equivalent.
  • Since I did the program in Linux, the age old newline problem cropped up. When I pasted the code into blogger, it showed the code as one continuous mess. I had to use http://www.hashemall.com/ for converting the newlines. The site dutifully hashed the program, and thankfully returned my original text with MS Windows newlines in place of Unix ones.. (The hash value under sha-1 was b8b7188ad3440084e5db7f44fd70d599f897e33a, in case you are interested... :) )
  • The final runtime was around 14 ms.
  • UPDATE: The code highlighting is courtesy of Online syntax highlighter 'tohtml' which is an excellent service by the way.

25 comments:

Anonymous said...

No comments, huh? I for one found this useful, as I'm undergoing a lot of "self teaching" regarding number theory (and math in general). Without some outside help I'm not sure it would have occured to me that the smallest factor of a number must be prime - and this helps for a great many problems afterwards. Perhaps I should look at the overviews more often :)

Anonymous said...

Another solution. Start with a list of primes (or any increasing list of numbers that includes all the primes, if you're scared of making a list of primes). Divide by the first one until you can't any more. Then divide by the second one until you can't any more. And so on. Eventually, you'll find that the next prime *is* the remaining number. That's the largest prime factor.

That's essentially equivalent to the solution given here, except with the optimization that the recursive call doesn't really need to start out looking at small numbers that didn't work the previous time. They won't work this time either.

Anonymous said...

Another cheap optimization...

After checking for divisibility by 2, 3, and 5, increment i by 4 instead of 2, every other time:

5 +2 -> 7 +4 -> 11 +2 -> 13 +4 -> 17

This allows you to avoid checking i valuess that are already divisible by 3, reducing factors you have to check by a third.

Sundar said...

@First commenter
Glad to know it helped, I too am trying some 'self teaching' in number theory. Would be glad to know if there is any blog where you share your experiences too.

@Second commenter
Thanks, thinking about it, this is the only sensible way to do it iteratively, and I might have got it if I'd gone ahead and implemented an iterative version instead of all that talking. :) I've done one now, and will post soon on it.

@Third commenter
Yes, I got that advice from a commenter at the Reddit discussion on the post too, though in a different formulation. I've written a program using this method too, and will post soon with some benchmarks.

Unknown said...

Hi there. I'd just like to say thank you - I found your blog post today when attempting to solve Euler program #3 (in Python), and it was invaluable. I'm a second year computing student, but have never done prime factorisation, and your blog was the best source of information on an actual algorithm for doing it. It did take me a little while to totally understand your algorithm - "divide by successive integers" suggested at first to me that you should divide the number regardless of the remainder, and then divide the *new* number you would get by the next i until you reached an i that would divide neatly - but I got there in the end!

So, yeah, thanks!

Anonymous said...

This is how I solved the problem:

static void Main(string[] args)
{
long dividend = 600851475143;
long n = 2;

while (n < dividend)
{
if (dividend % n == 0)
{
dividend = dividend / n;

}
n++;
}


}

Sundeepan Sen

Anonymous said...

I don't get it; where does the sqrt(n) come from as an upper limit? Why is this the case?

Sundar said...

@Anonymous user:

sqrt(n) is always the upper limit for finding the square root, since if a factor say x is greater than sqrt(n), n/x will be less than sqrt(n), and so we would check x when we encounter n/x itself - note that we find "the maximum among i and FindLargestPrimeFactor(n/i)" in step two, and n/i in this case would be x.

Mark said...

This is my solution. It probably looks like a train wreck (I'm just a beginner), but it works.


#include
#include
#include

using namespace std;

int main() {
long n = 2;
double square;

int number = 3;
int factor = 0;

while (number < 100000) {

square = long int(sqrt(double(number)));


while (n <= square) {
if (number % n != 0) {
n++;
continue;
} else {
break;
}

}

if ( number % n != 0) {
if (600851475143 % number == 0)
if (number > factor) {
factor = number;
}

}
n = 2;
number++;
}
cout << factor << endl;
cin >> number;
}

Mark said...

Oh. At the beginning where I've written while (number < 100000), that is leftover from another thing I was doing. Should have probably changed that to something better.

test said...

Instead of starting from 2 i started from the square root of 600851475143 and kept decrementing the loop variable. On The first prime factor the loop breaks and you have your answer. :)

George said...

The line
return largestprimef(max(2, $n/2) if($n % 2 == 0));

Was giving me a syntax error on the default install of Perl on OS X Snow Leopard, Perl 5.10.0.

Changing parentheses to
return largestprimef(max(2, $n/2)) if($n % 2 == 0);

And it works fine, the answer it reports back is accepted as correct on Project Euler.

bess_Bisrat_Tekle said...

this how i solved it in cpp


#include
#include
using namespace std;

int lpf(long long int n){
for(int i=sqrt(n); i>=2; i--)if(n%i==0)return max(lpf(i),lpf(n/i));
return n;
}

int main(){
cout<<lpf(600851475143LL);
}

Stephen Holtom said...

Even reading the comments above, I still don't get why we can limit the search at sqrt(x).

The highest prime factor of 34 is 17, whereas it's square root is about 5.8.

Sundar said...

@Stephen Holtom: Of course, most numbers do have factors above their square roots, there's no dispute there. What we're doing here is to exploit the fact that, for each factor greater than sqrt($n), there is going to be a corresponding factor less than sqrt($n) - such that multiplying those two will give $n.

So, here we iterate up to sqrt($n), but in each iteration, if the current $i happens to be a factor of $n, we use both $i and $n/$i to determine the largest factor.

In your example, when we find that 34 is divisible by 2, the line

return largestprimef(max(2, $n/2) if($n % 2 == 0));

will find $n/2 = 34/2 = 17 to be the larger number, and call the function recursively with 17. During this call, in turn, the lines

if($i > $sn) #loop ran over, means the number is prime
{
return $n;
}


will make sure 17 itself is returned as the result. So, the program will print 17 correctly.

Hope this clarifies it for you.

Stephen Holtom said...

Ah yes of course. Not really sure how I missed that :/
Thanks for the explanation!

Anonymous said...

#include
using namespace std;
int main()
{
long long int ieskom = 1;
int i = 1;
while ( ieskom < 600851475143 )
if ( 600851475143 % i == 0 )
{
ieskom = ieskom * i;
i++;
}
else i++;
cout << --i << endl;
system ("pause");
}

thats it...

Sundar said...

@Anonymous the last:
That happens to work for this number, but seems like an unsystematic solution. It doesn't even check whether the factor is prime, and prints wrong results for even simple cases such as 12 (prints 4) and 45 (prints 9).

Anonymous said...

you cant be sure the largest prime factor is smaller than the sqrt of the number, most of the time its right but take semiprime numbers for example 26 largest prime is 13... sqrt of 26 is just above 5...

rdsoze said...

exactly .. the limit need to be (n/2) and not sqrt(n) . The largest factor of a number is itself. The next largest factor cannot exceed (n/2). The third largest factor cannot exceed (n/3). So in this case as we are not concerned with the factor n, we should stick till n/2.

rdsoze said...

exactly .. the limit need to be (n/2) and not sqrt(n) . The largest factor of a number is itself. The next largest factor cannot exceed (n/2). The third largest factor cannot exceed (n/3). So in this case as we are not concerned with the factor n, we should stick till n/2.

Anibal said...

Perhaps my code isn't as sophisticated but it works =)

def main(args: Array[String]) {

var rest: BigInt = BigInt("600851475143")
var number: Long = 2

while(number < rest){
if(rest%number == 0)
rest = rest / number;
else
number += 1;
}

println("Answer: " + rest.toString());

}

Unknown said...

Thanks a ton for a useful post. :) I'm going through the Project Euler problems while getting more comfortable with F# and functional coding in general. This post help me organize my thoughts a bit and got me away from a much slower prime checking or finding the primes type implementation. If curious, here's the algorithm in F#. I did go with a recursion rather than for loop, since it seemed more idiomatic, but it seems to work. :)

let max n1 n2 =
if n1 > n2 then n1 else n2

let rec hpf number =
if number % 2L = 0L then hpf (max 2L (number/2L))
else
let limit = number |> float |> sqrt |> int64
let rec hpf2 n =
match n with
| _ when n > limit -> number
| _ when number % n = 0L -> max n (hpf (number / n))
| _ -> hpf2 (n + 2L)
hpf2 3L

Pranith said...

Why only check up to sqrt(n)?
if n is not a prime and n = a*b then one of a or b is less than sqrt(n) which implies the other is greater than. In this case if the greater number is prime then we miss that number.
For example consider the number 87. The largest prime factor is 29. But while checking the largest prime factor if we only check up to floor(sqrt(87)) which is 9 then we won't be able to find the largest prime factor.
I am unable to get the logic so explanation would be appreciated.

Sundar said...

Hi @Maddineni Pranith Srujan Roy,

As you say, if n = a*b, when one of them is greater than sqrt(n), the other will be less. What we're doing in this algorithm is, finding the lesser number, say a, and then finding b immediately there by taking n/a. That's done in this line of code:

return max($i, largestprimef($n/$i));

So, for eg. with 87, checking up to sqrt(87) will find 3 as a factor along the way. When we find 3, the algorithm returns max(3, 87/3), which is max(3, 29), which will return 29.
(Actually the algorithm first calls largestprimef(29) first, because in general we can't be sure that $n/$i is prime. But that's a detail separate from what we're discussing here.)

Basically, checking up to sqrt(n) is just a shortcut way of checking all the factors of n, since for every factor a that we find that's less than sqrt(n), we also immediately check the corresponding factor b = n/a, which will be greater than sqrt(n).