Monday, 27 April, 2009

Finding the Largest prime factor efficiently

Long long ago, I wrote an article on finding the largest prime factor of a number (if you haven't read it, I'd recommend reading it now as much of this article won't make sense without that). It was just a fun little thing I did for solving Project Euler, so I didn't expect it would invite such excellent comments and get posted to Reddit!

And almost every one of the comments pointed out how the algorithm could be made better. I hadn't really thought there would be more optimal solutions than the ones given out by Project Euler, but turns out there are!

The first improvement was one I might have done myself if I had tried to do an iterative version myself. In the recursive version, when I find the number is divisible by (say) 2, I divide out the 2 and call the function recursively. Now, if there's another 2 in there (that is, if the number was divisible by 4), the number would again get divided by 2 and another recursive call would occur. When I'm converting this algorithm to an iterative version too, the most natural way would be:
my $max_factor = 1;
while($n % 2 == 0)
$max_factor = 2;
$n /= 2;

for(my $i = 3; $i < $n; $i += 2)
while($n % $i == 0)
$max_factor = $i;
$n /= $i;

There's a subtle optimization (compared to the recursive version) that has happened here that you probably missed. It's actually a simple one - in the recursive version, when you've divided n by say 2, you pass it back to the same function - which again tries to divide it by 2. This is good as long as n is divisible by 2, but once you've divided out all the 2's, this becomes just a waste of CPU cycles. In the iterative version, this is avoided because, once we know we've divided out all the 2's, we no longer try dividing by that, and instead move on to greater things. That's a Good Thing, as for big numbers, this optimization might save several cycles.

But even this turns out to be a naive implementation. As someone in the math subreddit as well as one of the commenters here pointed out, there's another optimization that can be done here - after 2 and 3, what are the prime numbers you have? 5, 7, 11, 13, 17, 19, 23, 29, etc. Is there some common property these numbers have? Why, yes, they are all either of the form 6*n+1 or 6*n-1. Turns out that it has been proved that all primes after 2 and 3 are of either of these forms (second half of second paragraph). So, instead of trying out all the odd
numbers, we can try only the numbers of these forms. The first code I wrote for this purpose was,

$i = 6;
while($n > ($factor = $i - 1))
$n /= $factor while $n % $factor == 0;

$factor = $i + 1;
last if($n == $factor);
$n /= $factor while $n % $factor == 0;

$i += 6;
That basically just implements the concept of trying out 6n +/- 1 for each value of n. After that, I remembered a comment on the previous post which had made me realize that this is the same as just adding 2 and 4 in alternate iterations of the check. Just for the heck of it, I tried that too:

$factor = 5;
while($n > $factor)
$n /= $factor while $n % $factor == 0;
$factor += 2;
last if($factor == $n);

$n /= $factor while $n % $factor == 0;
$factor += 4;

Finally, I put these all in different subroutines in a Perl program, and benchmarked those using Perl's module. It's a very easy to use module - my entire benchmarking code consists of

# Upper limit for random number generation
my $rangemax = 1000_000;
cmpthese(-10, {
# Recursive => " largestprimefRecurs((int(rand($rangemax))))",
Naive_Iteration => " largestprimefNaiveIter(int(rand($rangemax)))",
Arcane6nplusminus1 => " largestprimef6n(int(rand($rangemax)))",
Arcane6nadding2and4 => " largestprimef6nadding2and4(int(rand($rangemax)))",
Arcane6nusingCount => " largestprimef6nusingCount(int(rand($rangemax)))"

The -10 there asks Benchmark to run each code segment for at least 10 CPU seconds, so that random fluctuations due to irrelevant factors are smoothed out, and we have reliable results.

You might notice that the call to the recursive version (the original one) is commented out. Due to (what I believe to be) a problem with the module, I was not able to benchmark it using this method. would report that the recursive function ran thousands of times faster than the newer methods (probably counting the recursive calls too), but other testing revealed it was actually much slower:

sundar@System:~$ time for (( i=0; i<500; i++ )); do perl; done;
real 0m5.631s
user 0m3.848s
sys 0m1.556s

sundar@System:~$ time for (( i=0; i<500; i++ )); do perl; done;
real 0m3.219s
user 0m1.692s
sys 0m1.344s

We see the recursive version is slower by more than 2 seconds! I'm planning to investigate more about this behaviour of (a quick googling didn't turn up much), but for now suffice to know the old algorithm is much slower.

There's another entry in the code there that I haven't talked about: Arcane6nusingCount. [By the way, the 'arcane' naming convention is a tongue-in-cheek reference to this comment where the commenter said "I get the feeling that [the programmer who wrote this post] would consider that working with 6n +/- 1 as too arcane a method..." :) I couldn't start the function name with the digit 6, so I added 'Arcane' before each one. ;)]
Anyway, that Arcane6nusingCount was a version where I tried to implement the 6n +/- 1 algorithm by keeping a counter of the iterations, and incrementing by 2 or 4 based on whether count%2 was 0 or 1. Turns out this extra mod operation kills performance, as you can see in the benchmark results below.

Benchmark Results:
Due to Blogger's @&%*&%# formatting, I had to change the names to shorter versions as follows:
Naive_Iteration     : Iter
Arcane6nplusminus1 : 6n+/-1
Arcane6nadding2and4 : Add2,4
Arcane6nusingCount : Count
The percentages are the amounts by which the algorithm in the horizontal (the row) is better than the one in the vertical (the column). The /s amounts are the number of times this subroutine could have been run in one second, averaged over the given inputs. For this test, the inputs were random numbers from 0 to 1000,000, as seen in the code above.

Trial 1

Rate Count Iter 6n+/-1 Add2,4
Count 45.9/s -- -24% -35% -39%
Iter 60.1/s 31% -- -15% -20%
6n+/-1 70.5/s 54% 17% -- -6%
Add2,4 75.2/s 64% 25% 7% --

Trial 2

Rate Count Iter 6n+/-1 Add2,4
Count 43.1/s -- -25% -41% -42%
Iter 57.4/s 33% -- -22% -23%
6n+/-1 73.6/s 71% 28% -- -2%
Add2,4 74.9/s 74% 30% 2% --

We see that the method of adding 2 and 4 directly (Arcane6nadding2and4) works best, but the one using multiples of 6 and dividing by $i-1 and $i+1 (Arcane6nplusminus1) is only slightly worse. And, in my opinion, it is clearer and more elegant. They're both much better than the naive iterative method, and the original recursive solution is not even comparable to these!

So, I guess the moral of the story is, look at ways you can optimize your code even if an 'authoritative source' seems to tell you've reached the optimum. Also, where performance is critical, profile and benchmark your code - they are much more reliable practically than theoretical algorithmic analyses (of course, this is true only when they are done right).

The program along with benchmarking code is available at:
To run it, you need to have the Perl module installed (which would probably be available by default).

Have a nice day...


Mike said...

Great information and analysis! I have incorporated some of your ideas in my own Project Euler blog for problem 3. Thanks for the effort.
Check it out here.

numericalrecipes said...

Hi Sundara,

I'm afraid you are misusing "efficient" in your analysis... An algorithm that runs twice as fast as another one, and does so for any input is, I'm afraid, as efficient as the other one. The usual meaning of efficiency in Computer Science is related to computational complexity, and in the end boils down to how the time to a solution scales with the input data.

In your problem, in the worst case situation, i.e. when the number to test is prime, or the square of a prime, you re having to do about sqrt(N) trial divisions, where N is the given number.

Your optimizations can cut that down to sqrt(N)/3, and much further if you are not in the wrost case situation. But in the end, your solution still scales as sqrt(N): if the input is multiplied by 4, the run time is multiplied by 2.

I recently wrote a post on factorization, which presents a way of doing what you are after in a truly more efficient way, i.e. with lower computational complexity. I still consider it a very naive approach, but it may help you gain some insight into the fascinating world of efficient algorithms. And, by the way, neither yours nor mine really are (efficient)...

daksh said...

I think the reason why the recursive one is slower is because of the overhead that the repeated function calls put on the processor.

Praveen said...

The number 2639 = 440*6-1.
But this is not a prime. 377*7 = 2639
So the premise that all numbers which are 6*n-1 or 6*n+1 is not correct.

Praveen said...

All numbers which are 6*n-1 or 6*n+1 are not prime numbers. Example 2639 = 440*6-1. But is has a factor 377.

Sundar said...

@Praveen: The premise is not that all numbers of those forms are prime, instead that any prime after 2, 3 will be in one of those forms.

That is, let's say the set of numbers of the form 6n-1 and 6n+1 is X. And let the set of prime numbers be P. You're saying that some members of X (such as 2639) are not in P, and that's indeed correct. But what we're using in the article is the fact that all members of P are in X. X contains all prime numbers after 2,3, and in addition contains many non-prime numbers also.