I had a list of mp3 files like:
ANBE_VAA_ARUGILE.MP3
ANBU_THAYAE.MP3
ANDANVAN_THEANAKTRU.MP3
ANNANENNA.MP3
and wanted the list of song names from this… What shall we do?
* Open Cygwin (Unix emulator within Windows), navigate to that folder, do:
ls | cut -d. -f1 | sed –e 's/_/ /g'
list the files in this directory, cut and extract the 1st field with delimiter as . character, then substitute space character in place of _ character globally. In other words, get the actual filename removing the .mp3 extension, then replace underscores with spaces.
* Save the above output to a file
Actually this too should be a part of the above step, like:
ls | cut -d. -f1 | sed –e 's/_/ /g' > ./songnames.txt
The file songnames.txt now has:
ANBE VAA ARUGILE
ANBU THAYAE
ANDANVAN THEANAKTRU
ANNANENNA
* Open the file in Vim editor, and select all the text by pressing ggVG. Then, press : to enter command mode, and Vim will show :’<,’> and wait for your input. Type this:
!perl –lpe "@words = map { ucfirst(lc($_)) } split; $_ = join(\" \", @words)"
and press enter, and voila, the list has become:
Anbe Vaa Arugile
Anbu Thayae
Andanvan Theanaktru
Annanenna
Beautiful, isn't it?
Thank God, Unix exists! :)
A weblog of algorithms, analyses, adventures and anecdotes from a lover of Math and Programming.
(If your device has problems displaying this blog properly, try using http://goo.gl/MUyY)
Showing posts with label perl. Show all posts
Showing posts with label perl. Show all posts
Thursday, 1 April 2010
Sunday, 4 May 2008
Largest prime factor of a number
Long live the programming overlords! Today is the day I have begun serious programming again... 'serious' is perhaps a bit too much - programming is never actually serious, it's a sort of fun activity that happens to appear boring and difficult and hence obtains good paychecks from the HR types... ;)
Anyway, I happened to come across the Project Euler and registered in the site. The registration was painless and I fell in love with the site immediately. Problems 1 and 2 were trivial (finding the sum of the multiples of 3 or 5 under 1000, finding the sum of even numbers in the Fibonacci series which are less than 4 million).
Problem 3 was a little more involved. I had to find the largest prime factor of 600851475143. It isn't particularly difficult, but it does call for some thinking. The first method I thought of was to
- divide the number by successive integers upto sqrt(n), and
- when the remainder is 0, check whether the divisor is prime,
- if so, store it as answer overwriting the previous value, if any.
An obvious improvement was to come down from sqrt(n) so that the first number to satisfy conditions 2 and 3 would be the answer.
However, by this time, something seemed fishy in this method. Particularly, it occurred to me that a prime number checking was costly and unnecessary here. Instead, you can go about it using a recursive solution:
However, by this time, something seemed fishy in this method. Particularly, it occurred to me that a prime number checking was costly and unnecessary here. Instead, you can go about it using a recursive solution:
FindLargestPrimeFactor(n):
- divide the number by successive integers (each denoted by i) upto sqrt(n),
- when the remainder is 0, return the maximum among i and FindLargestPrimeFactor(n/i).
It might not be clear at first sight that the algorithm works correctly. There seems to be no check for primality. However, this algorithm exploits the fact that the smallest factor sf of a number n will necessarily be prime. This is because, if it were not prime, its factors would be factors of n and hence sf would not be the smallest factor. Thus, we need only find the smallest factor to have a prime factor. Then, we can divide away this factor and again find the smallest factor. This way, we can have all the prime factors of a number.
In the algorithm, we find a prime factor of n, compare it with the largest prime factor of n/i, and return the larger number. Thus, finally, we'll have the largest prime factor of the number n itself.
Obviously, we need a limit for the recursion. In this case, this limit occurs when i overflows sn. This means that n is prime, and hence we must return n itself as the largest prime number.
Obviously, we need a limit for the recursion. In this case, this limit occurs when i overflows sn. This means that n is prime, and hence we must return n itself as the largest prime number.
This was the logic of my program as I coded and finished it. I entered the answer into the Project Euler website (which is projecteuler.net, by the way). After that, I saw the forums in the website and saw that many had implemented my original algorithm which used prime testing. This more efficient recursive solution had missed the neurons of many... With that boost to my hubris, I turned to the pdf containing the 'overview' of the problem.
Wow, the algorithm described is very close to mine. One improvement is that the recursion has instead become an iteration. This is one area where I stumble every time - converting a recursive program into an iterative one. So I bookmarked this in my mind and moved on.
The next improvement was one that made me feel stupid for not thinking of it earlier. Since I'm going to find only prime factors, I need only use 2 and odd numbers in step 1. So, I can use a separate step for 2 and then start with 3 and increment i by 2. This way, we'll save about half of the iterations...
UPDATE: There are quite a number of other improvements that must be done to this algorithm. For the actual algorithm for this problem (which develops on the algorithm here), see Finding the Largest prime factor efficiently.
The final perl program I wrote was:
use strict; use warnings; my $magic = 600851475143; sub largestprimef($); sub max($$); print largestprimef($magic); sub largestprimef($) { my $n = shift; my $i; return largestprimef(max(2, $n/2)) if($n % 2 == 0);
my $sn = int(sqrt($n)); for ($i = 3; $i <= $sn; $i += 2) {
last if($n % $i == 0);
}
if($i > $sn) #loop ran over, means the number is prime { return $n; } else { return max($i, largestprimef($n/$i)); } } sub max($$) { return (sort { $a <=> $b }(@_))[1]; }
A few notes about the program here:
- not 'use'-ing the strict and warning pragmas gave a speed improvement of a few milliseconds
- first I implemented max in the good old comparison way - if i > j, return i else return j. Turns out perl's sort implementation gives
faster results than thisequivalent speed. I just tried it for the heck of it, and was surprised to see a few milliseconds improvement. Probably the speed improvement came from the removal of 'shift's to get the numbers from @_ to $i and $j.(that was a temporary illusion; in repeated testing both turned out to be equivalent. - Since I did the program in Linux, the age old newline problem cropped up. When I pasted the code into blogger, it showed the code as one continuous mess. I had to use http://www.hashemall.com/ for converting the newlines. The site dutifully hashed the program, and thankfully returned my original text with MS Windows newlines in place of Unix ones.. (The hash value under sha-1 was b8b7188ad3440084e5db7f44fd70d599f897e33a, in case you are interested... :) )
- The final runtime was around 14 ms.
- UPDATE: The code highlighting is courtesy of Online syntax highlighter 'tohtml' which is an excellent service by the way.
Wednesday, 7 November 2007
Law of small numbers and such...
I just read this at Wolfram Mathworld; and something flashed within me. What if the differences between this poly and the actual powers of two were significant? Well, being terribly bored and having a lot of time to kill, I set out to explore that. I was stuck at the first step itself - I was in Linux, and didn't know any way to do math explorations like this. Had I been in Windows, I would have used Matlab (damn good and surprisingly easy to use). I'm sure there's some 'popular' Free Software for this kind of job, but I was not in a mood to search for software packages and get into dependency hell just yet. So, I just asked myself, can I do it with what I do know to do in Linux?The first strong law of small numbers (Gardner 1980, Guy 1988, Guy 1990) states "There aren't enough small numbers to meet the many demands made of them."
The second strong law of small numbers (Guy 1990) states that "When two numbers look equal, it ain't necessarily so." Guy (1988a) gives 35 examples of this statement, and 40 more in Guy (1990). For example, example 35 notes that the first few values of the interpolating polynomial[...] for
, 2, ... are 1, 2, 4, 8, 16, .... Thus, the polynomial appears to give the powers of 2, but then continues 31, 57, 99, ... (Sloane's A000127).
Writing C programs. Perl scripts. Shell scripts. For this kind of job, Perl somehow seemed more appropriate. So, I typed out a Perl script to find the differences (here f(n) is the 4th order polynomial given in the above article):
n | f(n) | 2^n | diff |
6 | 31 | 32 | 1 |
7 | 57 | 64 | 7 |
8 | 99 | 128 | 29 |
9 | 163 | 256 | 93 |
10 | 256 | 512 | 256 |
11 | 386 | 1024 | 638 |
12 | 562 | 2048 | 1486 |
13 | 794 | 4096 | 3302 |
14 | 1093 | 8192 | 7099 |
15 | 1471 | 16384 | 14913 |
16 | 1941 | 32768 | 30827 |
17 | 2517 | 65536 | 63019 |
18 | 3214 | 131072 | 127858 |
19 | 4048 | 262144 | 258096 |
20 | 5036 | 524288 | 519252 |
21 | 6196 | 1048576 | 1042380 |
22 | 7547 | 2097152 | 2089605 |
23 | 9109 | 4194304 | 4185195 |
24 | 10903 | 8388608 | 8377705 |
To check this out, I then printed the quotient obtained by dividing each difference by its predecessor. Lo and behold, suddenly, there was Light:
n | f(n) | 2^n | diff | quotient |
6 | 31 | 32 | 1 | ~ |
7 | 57 | 64 | 7 | 7 |
8 | 99 | 128 | 29 | 4.14285714285714 |
9 | 163 | 256 | 93 | 3.20689655172414 |
10 | 256 | 512 | 256 | 2.75268817204301 |
11 | 386 | 1024 | 638 | 2.4921875 |
12 | 562 | 2048 | 1486 | 2.32915360501567 |
13 | 794 | 4096 | 3302 | 2.22207267833109 |
14 | 1093 | 8192 | 7099 | 2.14990914597214 |
15 | 1471 | 16384 | 14913 | 2.10071841104381 |
16 | 1941 | 32768 | 30827 | 2.06712264467243 |
17 | 2517 | 65536 | 63019 | 2.04427936549129 |
18 | 3214 | 131072 | 127858 | 2.02888017899364 |
19 | 4048 | 262144 | 258096 | 2.01861440035039 |
20 | 5036 | 524288 | 519252 | 2.01185605356147 |
21 | 6196 | 1048576 | 1042380 | 2.00746458367036 |
22 | 7547 | 2097152 | 2089605 | 2.00464801703793 |
23 | 9109 | 4194304 | 4185195 | 2.00286417767951 |
24 | 10903 | 8388608 | 8377705 | 2.00174782775952 |
That is the kind of beauty math has. You obtain a sequence that looks like powers of 2 for small numbers, you find that it deviates from the actual value for higher powers, then you find that the deviation itself is increasing in powers of two. This thing has some inexplicable beauty to it... :)
So, now that we've found that this thing seems to converge to 2, how do we interpret it? That's the kind of thing for which you need mathematical training. If I had had some good math course on series and sequences, may be I would know what to do with this. May be I'd theoretically prove that the quotients actually converge to 2 as n tends to infinity. May be we can show that this f(n) can indeed be used to find powers of 2.
For now, let's get into the nitty gritty of this number crunching. First, I wrote this cute little program (any claims to its uncuteness are by Agents of Satan; avoid them like plague) :
#!/usr/bin/perl
use warnings;
use strict;
my @range = (6 .. 24); #6 is where the differences start to be non-zero
my $i;
my @diffs;
foreach $i (@range)
{
print $i,"\t\t";
print (( $i**4 - 6*$i**3 + 23*$i**2 - 18*$i + 24 ) / 24, "\t\t");
print 2**($i - 1), "\t\t";
push @diffs, (2**($i - 1) - (( $i**4 - 6*$i**3 + 23*$i**2 - 18*$i + 24 ) / 24));
print $diffs[$#diffs], "\t\t";
$#diffs && print $diffs[$#diffs]/$diffs[$#diffs-1];
print "\n";
}
Then, I ran the output through txt2html with --make_tables option. I got the HTML code for a table, which you can see with naked eye by right clicking here and choosing 'View source' (or whatever option reads close to that). Using the code, I made a blog post named 'Law of small numbers and such...'.
You just finished reading it!
Subscribe to:
Posts (Atom)