The Tao of Programming: 2009

Wednesday, 11 November 2009

Perl poetry

Perl is said to be close to natural language in many ways. But, can you write a poem in it that expresses your heart? Let's see:

my $sweetheart;
my $heart = $yours for $ever;
do { accept my $love, my $dear }, or kill $me;

bless \$me, my $Lord; join($us, $together) until eternity; 
heaven: our $home, wait for $us; 

my $soul, my $darling, say "I love you";
let's, $honey, splice(@together), not split $ever,  
                                   and goto heaven, $together;

Perl indeed does a good job, isn't it?
So, why am I into this crazy stuff suddenly? Well, there was this post in my friend Sharmi's blog that said:

Love is the language between two hearts; for everything else there is C#

which is apparently a praise of C#. Well, I thought, why for everything else only? :)

The real deal here is that the above is valid Perl code, and when executed, prints "I love you" repeatedly on the screen.

Now take that, C#. :)

*Execution instructions: You need Perl 5.10 or above, and probably need to enable the 'say' feature. If you saved the above program as perlpoetry.pl, the invocation might be: perl -Mfeature=say perlpoetry.pl
This would fill your screen with text, so be ready to press Ctrl-C immediately. :)

PS: I do like C#, and in fact think MS did a great service to humanity by relieving many developers from Java. It's just that I love Perl even more. ;)

Friday, 23 October 2009

The Internet vs a cycler

I’ve been reading Programming Pearlsfor some time now. It’s a great book, the kind of book that makes you realize what programming really is about.

Among the exercises behind Column 6 in it, there was this strange problem: “At what distances can a courier on a bicycle with a reel of magnetic tape be a more rapid carrier of information than a telephone line that transmits 56,000 bits per second? Than a 1200-bps line?”

The thought that a bicycle can be faster than an electronic communication line seems ridiculous indeed, but it turns out it’s not difficult. For the sake of modernity, let’s replace the ‘magnetic tape’ with a hard disk. 1TB disks are common enough these days that I’ll use that for the calculations.

Now, let’s calculate the time taken for transmitting ‘g’ Gigabytes of data using both the bicycle method and an Internet line, between two places which are ‘d’ kilometres apart. In the bicycle method, we have two kinds of ‘time’s to consider: the time taken to transfer things to and from the hard disk, and the time taken in the bicycle itself. These days, data transfer rates in hard disks lies around 25MB/s from my experience, so I’ll use that as the transfer rate here. Then, 1 gigabyte will take 1GB/25MB ~= 40 seconds. For g GB, it is 40g seconds. This is for one transfer. Since we do this twice (transferring to the hard disk, and later back from it), it’d be 80g seconds in total. In addition, we have the time taken in the bicycle itself. Assuming a minimal speed of 10kmph by the bicycler, the time taken would be d/10 hours, which in seconds would be 360d. So, the total time is 360d + 80g.

For a 56kbps Internet line, the time taken for transferring the same g gigabytes would be the upload time + download time. From discussions in the Internet, it appears that upload speeds were “from 1/2 to less of the the download speed”. Of course, this is not ‘authoritative’, but is good enough for our purposes. The download speed here is 7KB/s (since 56kbps = 7KB/s). Let’s take the upper limit and take the upload speed to be half of download i.e. 3.5KB/s. Then, time taken = 1000,000g/3.5 + 1000,000g/7 seconds (since 1GB ~= 1000,000KB) = 3000,000g/7. So we’ve got to solve for d in:

360d + 80g < 3000,000g/7

360d < (3000,000 – 560)g/7

d < 2999440g/2520

d < 1200g (approx.)

If g = 1GB, the bicycle is faster for upto a distance of 1200 km.

If we are to use the 1TB hard disk to its full capacity, g ~= 1000GB. Substituting that,

d < 1,200,000 km

which means that upto an astounding distance of about 1200,000 kilometres (that’s about 750,000 miles), a bicycle with a 1TB hard disk is faster than a 56kbps line for transferring 1TB of data!

Now, 56kbps is one of those old dialup lines – the more common internet connection these days is a broadband connection. For purposes of modernity, let’s now assume a 1Mbps connection, which transfers at a rate of 128KB/s. Again, the upload speed is less. And in this case, it appears that upload speeds of 1/4th of download are most common. Hence, let’s assume an uplink of 128/4 = 32KB/s. Then, the inequality is:

360d + 80g < 1000,000g/32 + 1000,000g/128

360d < 4999920g/128

d < 108g

Again, substituting g = 1,

d < 108 km

For 1024 GB (i.e., 1TB data),

d < 110,000 km

Given that the earth’s circumference itself is just 40,000 km, this means that if you need to transfer 1TB of data to anywhere on earth, you’re better off sending a bicycle courier than sending it through today’s internet connections (of course, there’s the issue of crossing the oceans, which we’ll ignore for simplicity ;) )

Now, the same is not true if you need to transfer only 1GB – as we saw above, a bicycle is faster only upto 108 km. So, for what amount of data is a bicycle faster to anywhere on earth? Let’s do one final calculation.

The maximum distance you’d need to travel on earth is half its circumference (can you see why?). This is 20,000 km. So, let’s make d=20,000 km in the inequality and see what g we get.

360*20,000 < 4999920g/128

g > 184 GB

So, the moral of the story is, even if you have a 1Mbps broadband line, if you need to transfer more than 184 GB of data, you’d achieve better speeds to anywhere on earth by carrying a hard disk on a bicycle than by transferring it through the 1Mbps connection!

Wednesday, 1 July 2009

Programming for the Finance industry

The last two days, I had a training in finance. While the class was entirely about finance, the trainer mentioned quite a few times how programmers (he called us "IT professionals") had made everything very easy and transparent.

So, I thought about the various things he had taught, and realised that finance is probably the best application of textbook programming. Ok, maybe there are other fields like scientific programming, etc. to compete to that title, but finance is indeed an extremely suitable field.

If you think about it, almost all of finance is about numbers and logic. The two things computers do very well. The only two things computers do 'naturally'.

The trainer told us about one common task that was a chore to do manually until computers took over - checking a stock price every now and then, and performing an action when the price reached a certain value. You can imagine how boring and difficult a job this would have been when done manually.

I thought about it from a programmer's point of view, and couldn't help smile when I realized how easy this would be to program. Just keep fetching the value, comparing and waiting in a loop until you get to the required value. As I said, a textbook application of programming.

There were many other things, obviously - finance is full of calculations: calculating the interests, computing dividends, finding the tax taking into account all the various exemptions, etc. Every one of them is just an application of the most basic operations of current computers - number crunching and logic evaluation.

But, this is looking at only one face of finance - the part which involved tedious manual calculations which are algorithmically specifiable. However, there is another face to finance which is what makes the experts who they are. The part that involves judgement.

If you are any familiar with Artificial Intelligence, you'll know how difficult it is to teach computers to do this. Even most humans find it difficult to give a reasonably correct prediction of how a stock price will change or how an investment will turn out beforehand. How can mere computers which are just number crunching machines do it?

However, we humans are not an easily-giving-up lot. Using these mere number crunching machines, we're trying to build things that 'think'. We have things like neural networks which try to learn from previous data and project them to the future. We have Genetic Programming methods to evolve good models of the stock price variations. Of course, it's not foolproof and will probably never be. But, it's a wonder that we've even managed to even take a number-crunching-machine to the level of a (sometimes faulty) thinker. Also, as we give the system more and more data about the economics involved, it will get more and more accurate. Currently, we're giving them only numbers - previous stock prices, possibly the company's profits and losses, and such relevant data. However, we're currently mostly unable to give some important data that a human expert has access to - the news surrounding the company, the quality of the current management, etc. When we one day find a way to feed these things to the computer (or better, have the computer find those things itself), I believe we can expect much better predictions.

Finance industry has benefited a lot from the 'primitive' types of computation we are now easily able to do - extremely fast mathematical calculations and reliable logic evaluation. It appears it also stands to benefit from the 'newer generations' of computing which are coming up - where computers will cease to be mere machines and will 'understand' and 'think' about things!

Monday, 27 April 2009

Finding the Largest prime factor efficiently

Long long ago, I wrote an article on finding the largest prime factor of a number (if you haven't read it, I'd recommend reading it now as much of this article won't make sense without that). It was just a fun little thing I did for solving Project Euler, so I didn't expect it would invite such excellent comments and get posted to Reddit!

And almost every one of the comments pointed out how the algorithm could be made better. I hadn't really thought there would be more optimal solutions than the ones given out by Project Euler, but turns out there are!

The first improvement was one I might have done myself if I had tried to do an iterative version myself. In the recursive version, when I find the number is divisible by (say) 2, I divide out the 2 and call the function recursively. Now, if there's another 2 in there (that is, if the number was divisible by 4), the number would again get divided by 2 and another recursive call would occur. When I'm converting this algorithm to an iterative version too, the most natural way would be:

my $max_factor = 1;
while($n % 2 == 0)
{
$max_factor = 2;
$n /= 2;
}

for(my $i = 3; $i < $n; $i += 2)
{
while($n % $i == 0)
{
$max_factor = $i;
$n /= $i;
}
}

There's a subtle optimization (compared to the recursive version) that has happened here that you probably missed. It's actually a simple one - in the recursive version, when you've divided n by say 2, you pass it back to the same function - which again tries to divide it by 2. This is good as long as n is divisible by 2, but once you've divided out all the 2's, this becomes just a waste of CPU cycles. In the iterative version, this is avoided because, once we know we've divided out all the 2's, we no longer try dividing by that, and instead move on to greater things. That's a Good Thing, as for big numbers, this optimization might save several cycles.

But even this turns out to be a naive implementation. As someone in the math subreddit as well as one of the commenters here pointed out, there's another optimization that can be done here - after 2 and 3, what are the prime numbers you have? 5, 7, 11, 13, 17, 19, 23, 29, etc. Is there some common property these numbers have? Why, yes, they are all either of the form 6*n+1 or 6*n-1. Turns out that it has been proved that all primes after 2 and 3 are of either of these forms (second half of second paragraph). So, instead of trying out all the odd
numbers, we can try only the numbers of these forms. The first code I wrote for this purpose was,

$i = 6;
while($n > ($factor = $i - 1))
{
$n /= $factor while $n % $factor == 0;

$factor = $i + 1;
last if($n == $factor);
$n /= $factor while $n % $factor == 0;

$i += 6;
}

That basically just implements the concept of trying out 6n +/- 1 for each value of n. After that, I remembered a comment on the previous post which had made me realize that this is the same as just adding 2 and 4 in alternate iterations of the check. Just for the heck of it, I tried that too:

$factor = 5;
while($n > $factor)
{
$n /= $factor while $n % $factor == 0;
$factor += 2;
last if($factor == $n);

$n /= $factor while $n % $factor == 0;
$factor += 4;
}

Finally, I put these all in different subroutines in a Perl program, and benchmarked those using Perl's Benchmark.pm module. It's a very easy to use module - my entire benchmarking code consists of

# Upper limit for random number generation
my $rangemax = 1000_000;
cmpthese(-10, {
# Recursive => " largestprimefRecurs((int(rand($rangemax))))",
Naive_Iteration => " largestprimefNaiveIter(int(rand($rangemax)))",
Arcane6nplusminus1 => " largestprimef6n(int(rand($rangemax)))",
Arcane6nadding2and4 => " largestprimef6nadding2and4(int(rand($rangemax)))",
Arcane6nusingCount => "  largestprimef6nusingCount(int(rand($rangemax)))"
});

The -10 there asks Benchmark to run each code segment for at least 10 CPU seconds, so that random fluctuations due to irrelevant factors are smoothed out, and we have reliable results.

You might notice that the call to the recursive version (the original one) is commented out. Due to (what I believe to be) a problem with the Benchmark.pm module, I was not able to benchmark it using this method. Benchmark.pm would report that the recursive function ran thousands of times faster than the newer methods (probably counting the recursive calls too), but other testing revealed it was actually much slower:

sundar@System:~$ time for (( i=0; i<500; i++ )); do perl Recursive.pl; done;
real 0m5.631s
user 0m3.848s
sys 0m1.556s

sundar@System:~$ time for (( i=0; i<500; i++ )); do perl 6nMethod.pl; done;
real 0m3.219s
user 0m1.692s
sys 0m1.344s

We see the recursive version is slower by more than 2 seconds! I'm planning to investigate more about this behaviour of Benchmark.pm (a quick googling didn't turn up much), but for now suffice to know the old algorithm is much slower.

There's another entry in the code there that I haven't talked about: Arcane6nusingCount. [By the way, the 'arcane' naming convention is a tongue-in-cheek reference to this comment where the commenter said "I get the feeling that [the programmer who wrote this post] would consider that working with 6n +/- 1 as too arcane a method..." :) I couldn't start the function name with the digit 6, so I added 'Arcane' before each one. ;)]
Anyway, that Arcane6nusingCount was a version where I tried to implement the 6n +/- 1 algorithm by keeping a counter of the iterations, and incrementing by 2 or 4 based on whether count%2 was 0 or 1. Turns out this extra mod operation kills performance, as you can see in the benchmark results below.

Benchmark Results:
Legend:
Due to Blogger's @&%*&%# formatting, I had to change the names to shorter versions as follows:

Naive_Iteration     : Iter
Arcane6nplusminus1  : 6n+/-1
Arcane6nadding2and4 : Add2,4
Arcane6nusingCount  : Count

The percentages are the amounts by which the algorithm in the horizontal (the row) is better than the one in the vertical (the column). The /s amounts are the number of times this subroutine could have been run in one second, averaged over the given inputs. For this test, the inputs were random numbers from 0 to 1000,000, as seen in the code above.

Trial 1

	Rate	Count	Iter	6n+/-1	Add2,4
Count	45.9/s	--	-24%	-35%	-39%
Iter	60.1/s	31%	--	-15%	-20%
6n+/-1	70.5/s	54%	17%	--	-6%
Add2,4	75.2/s	64%	25%	7%	--

Trial 2

	Rate	Count	Iter	6n+/-1	Add2,4
Count	43.1/s	--	-25%	-41%	-42%
Iter	57.4/s	33%	--	-22%	-23%
6n+/-1	73.6/s	71%	28%	--	-2%
Add2,4	74.9/s	74%	30%	2%	--

We see that the method of adding 2 and 4 directly (Arcane6nadding2and4) works best, but the one using multiples of 6 and dividing by $i-1 and $i+1 (Arcane6nplusminus1) is only slightly worse. And, in my opinion, it is clearer and more elegant. They're both much better than the naive iterative method, and the original recursive solution is not even comparable to these!

So, I guess the moral of the story is, look at ways you can optimize your code even if an 'authoritative source' seems to tell you've reached the optimum. Also, where performance is critical, profile and benchmark your code - they are much more reliable practically than theoretical algorithmic analyses (of course, this is true only when they are done right).

The program along with benchmarking code is available at: http://sundaryourfriend.googlepages.com/largestprimefactorBenchmark.pl
To run it, you need to have the Benchmark.pm Perl module installed (which would probably be available by default).

Have a nice day...

Wednesday, 22 April 2009

Top ten Internet resources for programmers

1. http://www.stackoverflow.com

A year ago, this website didn't exist. Now, it's at the top of this list of Internet resources for programmers! Stack Overflow is a place where programmers like you and me ask questions and answer them. Some things that make it great are: each question and answer can be 'voted' on (either +1 or -1), and the answers with the greatest number of votes appear at the top near the question; only programming related questions are allowed, and everything else is 'closed' down; any discussions on the answers go into the 'comments' section which is hidden by default - this means that flamewars and such nonsense are much less here, and mostly hidden from our view too.
But I can go all day about what makes it tick, but you won't really see how good a resource it is until you actually go there and try to search for some programming question you've been having. Chances are, someone else had it before you, and asked it there; in that case, you'd probably find very good answers to your question. If you are so unique that no other questioner thought of what you thought, you can login and post the question yourself - in which case you'd probably start getting answers within minutes, and the answer you wanted would be there within hours.
All that talk from me isn't probably going to convince you, so just go try it.

2. http://reader.google.com (actually, RSS feeds from programming blogs)

To be honest, this isn't a resource in itself, but can act as an aggregator of resources. If you're not familiar with the concept of 'feeds' or 'subscription' or 'RSS', here's a simplified view of it: when you add feeds from websites to Google Reader, you're asking it to check the sites for new content now and then, fetch the content automatically whenever it arrives and keep it stored, so that you can read it all in Reader without having to check each site yourself. If this does not seem like a big advantage to you (it didn't to me at first), start using it, and you'll see how much of a convenience it is.
The most important and common use of Google Reader (for us) is to subscribe to and read programming blogs such as Lambda the Ultimate, Joel on Software, etc., (in addition to webcomics such as Dilbert or xkcd :) ). Whenever you come across a blog that you consider interesting, click on the orange icon with the three curved lines that appears near the address bar (you are using Firefox, right?), and choose to subscribe using Google.
And after all this, don't forget to check out Google Reader every day or at least once a week. :)
While Google Reader might be the simplest for newbies (opinions differ even in this), there are many other feed readers and each one has its own fan base. ProBlogger seems to recommend FeedDemon, while other readers like the Sage extension for Firefox, Bloglines, etc. are also very popular.

3. IRC

Internet Relay Chat, simply put, is a set of 'channels' (what you might call chat rooms) each one devoted to a particular subject of interest. For example, there is a channel devoted to Wikipedia (#wikipedia I think), while another one devoted to discussions of mathematics (#math). You probably might find more use for ##c (for C programming), ##c++ (obviously, for C++ programming), etc. Like Google groups, here too, world experts on the field gather, so it's a very good resource if you know how to use it properly. If you're a beginner to IRC, I'd recommend the free IceChat client which is very newbie friendly and also powerful. The channels are hosted in things called 'IRC servers'. For programmers, almost the only server we'd need is irc.freenode.net. Connect to it, be polite, and share your knowledge too.

4. http://www.reddit.com/r/programming

Proggit, as it is fondly called, is a rich place where intelligent discussions, incandescent flamewars and 4chan memes happen side by side. Like Hacker News (#5 in this list), this too is a place where people share and vote upon links (obviously, links relating to programming). If you have the attitude of a gold miner, ready to dig out gold from among so much dirt and rock, you can learn a real lot from here. Just remember not to get trapped into the memes, for it is said there is no salvation after that.

5. http://news.ycombinator.com/
Known as 'Hacker News', it's a site where people post links to things they find interesting, and others vote upon them. This site is for 'hackers', and so though most of the content is about programming, other brain stimulating things that would interest a hacker would also be present. The ambience here is one of extreme decency and politeness, in contrast to most of the Internet. Some like the UI of the site, some don't, see if it fits you.
This site would have been much higher in the list if I had written this post a few weeks ago. But in that space, I've come to see that the discussions on Reddit (site #4 in this list) are usually much richer and informative than the ones here. Reddit does have other noise (people attempting at humour and nasty flamewars), but in trying to suppress that noise, HN (Hacker News) has seems to have suppressed some of the useful discussion too.

6. http://en.wikibooks.org/wiki/Category:Programming

Even if you knew about all the other resources here, I'd bet you didn't know this one till now. From Regular Expressions to Functional Programming, there's a wealth of information about many programming topics. This along with Wikipedia can take your programming knowledge to a higher plane.

7. comp. heirarchy in Google groups

This is a whole list of 'Usenet' groups about computers. Within these, you'll most probably find the groups under comp.lang part most useful, but the others cater to special needs - for eg., if you're into programming for graphics, you might find the comp.graphics groups useful. These groups have lots of experts in the field of the group, which means you'll be interacting with top notch programmers in your area of interest.

8. http://www.topcoder.com

A website where you get to 'compete' with other programmers from throughout the world in various 'tournaments', 'marathons', etc. Frankly though, I don't care a bit for the competition or ranking part of it. It contains so many good programming problems that, when you solve them, give you insights into so many things. Also, the Algorithm Tutorials here are said to be very good (they are on my 'to read' list :) ).
Some practical tips (since the UI of TopCoder is less than ideal and you might find it difficult to do things): To register, click on Login in the top right corner (you might have to scroll right a little to see it), then click on Register in the page that comes up. To participate in some tournament or just practice using questions from old tournaments, login to TC (that's the short form of TopCoder), click on Algorithm in the left panel, then on 'Launch Arena'. It would probably ask to install Java Web Start, go ahead and install it. After that, when the arena appears, login again in that window. Then, if you wanna participate in some tournament going on now (they'll send you mail about the schedules from time to time), choose the Active Contests menu. If you just want to try out old questions, choose the Practice Rooms menu. Either way, choose one of the SRMs from the menu. When you've been 'moved' to that room, click on the box that says 'Select one' and choose one of the scores listed there (like 250, 500 and 1000). A programming problem of that score would be presented to you. After this, I guess you'd be able to take off on your own.
There are other sites like Project Euler and Sphere Online Judge (SPOJ) which are somewhat similar to TC. Project Euler is heavily inclined towards mathematics, so I find its scope to be quite narrow. SPOJ is similar to TopCoder in that it too provides several algorithmic problems and allows us to submit solutions to be evaluated and validated. A small disadvantage is that the collection of problems is less than that in TC. But a great advantage in SPOJ is that it allows a wide variety of programming languages - TopCoder is limited to a few mainstream languages: Java, C++, C#, and VB .Net.

9. http://www.thedailywtf.com

Thought the Internet is serious business, at times we need to take some time off to laugh out and relax. What if you can learn something while you do that? Well, the Daily WTF (they claim that it means Worse Than Failure) is a place for that. Working programmers tell tales about horrid code and other maladies from their co-workers, which are mostly because of a lack of programming knowledge. The stories are hilarious, but at the end of most of them, you'd be a bit wiser knowing a little more as to how to be a good programmer. Of special mention are the 'Featured Articles' and 'Tales from the Interview' sections.

10. http://www.google.com

Ok, I can hear your 'Duh!' even from here, but give me a few seconds. To be honest, no list ever can contain all of the wonderful programming resources out there. There are just too many forums, too many personal web pages that contain precious wealth of information. With some power googling (I'm planning on making a post on this soon), you can dig all that out. So, this makes Google probably the most useful resource for programmers. The only reason this is on number 10 is that you very probably know about it already (if you didn't, what forest were you living in? :) )

So, if I were to limit myself to 10, those would be it...

I know, I know, I probably missed your favourite website out of this list!
If so, do leave a comment here so that others (including me) might know and experience its glory. Thanks... :)

Wednesday, 15 April 2009

Lisp by and for the C programmer

Finally, it's done. That program for finding the next palindrome given a number. I had to jump through several hoops and get help from the www and IRC to get it done (thanks to the helpful guys at #lisp - the newbie bashing notwithstanding).

Lisp is supposed to be a functional language. To be honest, I myself don't clearly understand what that means; I'm trying to learn little by little. As of now, it seems to me to be a conglomorate of various concepts and language facilities, some of which are present in some languages, and almost no real language seems have all of them.

Anyway, because of that, I tried not to make the program entirely C-like, and tried to do it the 'functional' way. Practically to me, it involved using recursion where I would have used iteration in C, and using variables as if they were constants - trying not to modify anything. In functional programming (FP) circles, this is called treating symbols as immutable (I think), and is glorified as a Good Thing. I haven't adhered to it fully (one place that comes to mind is the changing the value of ndigits in increm-left-from-here function), but I feel this is good enough for a C/C++/Perl programmer who understands very little of FP. :)

Some descriptions that might help understand the program are here. You can try out things with the Lisp in a Box IDE for Windows (you'll need to install the base installer and the CLISP module). Some clarifications: I'm not an expert in Lisp and don't claim to be one, I've described things as I understand them. Also, when making analogies to C, I've tried to take the thing that comes closest to it, but don't take the analogies literally. Concepts in Lisp are quite different from those in C, these are here only to give you a rough idea about them.
() syntax:
        The parenthesis syntax here is something that might be very new to you. But it's quite simple, and very consistent.
        When you see (A B C) in Lisp, what it means is call the function A with arguments B and C (A can also be a macro, which I've explained under incf). In C syntax, that might be A(B, C). One good thing about Lisp is, this notation is followed very consistently throughout the language. Even operations like addition are done with this syntax: 1+2 in C will be (+ 1 2) in Lisp. Whenever you see a list (x y z a b c), you can say it's a function call calling function x with arguments y, z, a, b, and c. The only exception to this are literal lists - you know, the lists you use to store data, like C arrays. When you want a list of number 1 to 5, you might try (1 2 3 4 5). But... Lisp sees such lists as function calls, and so tries to 'call' the function 1 with arguments 2, 3, 4, and 5. Surely not what we wanted. So, to tell Lisp that we want a literal list, and not a function call, we have to put a single quote ' before the list. So, '(1 2 3 4 5) will achieve what we wanted. This actually works by changing itself it (quote (1 2 3 4 5)) where quote is a macro. But you don't need to understand that to use the single quote. :)

defun:
        This defines a function, like the sub keyword in Perl. For the C-ers among us,

(defun funName (arg1 arg2)
(body ...
....))

is very much like

void funName(type1 arg1, type2 arg2)
{ body ...
...}

Immediately, we can notice a few things: In Lisp, we don't specify the type of the arguments that the function takes - it can take whatever it wants. arg1 itself can be an entire list itself, or a number, or a string, or an object, or any type you can imagine (and Lisp can handle :) ). Also, there's no declaration of what the function is going to return - this too can be anything.

incf:
        This is a simple macro that increments whatever you give to it as argument. If you're asking 'hello, what is a macro?', congratulations for the careful reading.
        A macro is a thing that modifies the code at the place where you put it. They are somewhat like the preprocessor macros in C - and before the Lisp fans start grumbling, I'll add - but they're supposed to be much much more powerful. There's an entire book that is almost fully dedicated to doing magic with Lisp macros: On Lisp.
        In case you didn't get any clear idea about what macros are from the previous paragraph (yes, I guessed it), I'll explain with this example itself. In my code, in one place, I had

(incf i)

While debugging the code, I saw how the incf was expanded: the macro just replaced itself with

(setq i (+ i 1))

I myself wrote a macro in this code using the defmacro command.
defmacro:
I described what a macro is above. In Lisp, the syntax for macros is very much like the one for functions, with defun replaced by the keyword defmacro. For example, I have defined a macro in this code as:

(defmacro get-mid-index (list)
`(ceiling (1- (/ (length ,list) 2))))

A macro too gets arguments like a function (the argument is 'list' here). What follows after the argument is the expansion of the macro, which means the macro is replaced with this thing wherever we call the macro.
There are some curious things to note here. First off, the expansion of the macro is not in an ordinary

(some code here ...
...)

format here. There is a ` and a , in that code, which makes it special. In () syntax, we said that '() makes a literal list - that is, a list that is not evaluated, but is treated as data. The ` (that is the backtick - the thing which is to the left of 1 in the keyboard) is similar to ' but with a difference: while nothing within the '() is evaluated, in a `() you can ask Lisp to evaluate certain things alone by putting a comma (,) before it. For example, in this expansion, the 'list' has a comma before it, so it will be evaluated. Since here 'list' is the argument we pass to the macro, it gets evaluated to the actual list which we pass. For example,

(get-mid-index digits-of-n)

becomes

(ceiling (1- (/ (length digits-of-n) 2)))

That's nifty, right? :)
nth:
nth is a function that returns the n'th element of a list. For example, (nth 2 '(a b c d e)) will return c. Here, it's important to note that Lisp lists are 0-based - the counting starts from 0. So a is the 0th element, b is the 1st, etc.
cond:
cond is a 'special form' (which is something like a function or a macro, but they say it's not either of those. I'm yet to understand special forms) which is somewhat like the if() construct of C. Lisp too has an (if ...) syntax, but it is limited to single statements. You can say (if (condition) (do-this) (else-do-that)), but (if (condition) ((do-this) (and-this)) (else-do-that)) is not allowed. In C syntax, it's as if

if(condition)
do-this;
else
do-that;

is allowed, but

if(condition) {
    do-this;
    and-do-this;
}
else
    do-that;

is not allowed. In most practical cases, I find that I need to put multiple statements in the if's body, so I rarely use if in Lisp.
The cond function is similar to if, but much more flexible. It's syntax is,

(cond ((condition1) (do-this) (and-this) (also-this) (and-much-more) ...)
        ((condition2) (do-that) (and-that) (also-that) (and-much-more) ...)
            ...
            (t (do-something) (do-something-else) (and-much-more) ...))

In this syntax, cond is followed by many lists. Each list contains one condition and a set of statements. Lisp will look through each of those conditions, and when any of the conditions succeeds (that is, returns true), it will execute the set of statements that are with that conditions. One weird thing about cond is that it requires that at least one of the conditions be true. If not, Lisp will go mad and bite you (or so they say). So, just for safety, it is good practice to put a (t ..) as the last condition, which stands for 'true', so it is a condition that will always be true. If any of the conditions succeeds, Lisp will execute its statements and exit the cond. If none of them succeed, the statements that are with the (t ...) will run.
If you want, you can think of the above cond as equivalent to:

if(condition1) {
do-this;
and-this;
also-this;
and-much-more; ...
}
else if(condition2) {
do-that;
and-that;
also-that;
and-much-more; ...
}
...
else {
do-something;
do-something-else;
and-much-more; ...
}

Note that the final (t ...) in Lisp is like the final else clause of C.
setq:
        setq is like the = assignment of C. i = 1; in C would be (setq i 1) in Lisp. Only today (two days after I finished this code), I read here that it is better to use setf for this purpose in Lisp, but so far everyone has told me to use setq, so I'm not sure which one is really better. The author of this FAQ that I linked to seems to know things, so I'm gonna trust him and use setf hereafter. It's really a trivial change, here it would be (setf i 1) instead of (setq i 1).
car and cdr:
        These strange names stand for simple concepts - they give you the first element of the list and the rest of the elements of the list, respectively. (car '(1 2 3)) would be 1. (cdr '(1 2 3)) would be (2 3). As you can see, car returns a single value (which the Lispers call an atom), but cdr returns a list.
format:
        The format command is the printf of Lisp. It has many fancy options - it can even iterate through a list and print the elements one by one. Peter Seibel gives a very good explanation of the format function here. One simple example is:

(format t "~D~%" n)

is similar to

printf("%d\n", n);

The 't' argument to format tells it to print the output to the screen (and not to a file or something like that).
With this intro, and some help from the Net, I think you'd be able to get a grasp of this program and learn a lot of Lisp in the meantime. Learning Lisp is said to be an enlightening experience, so I hereby invite you to the path to Nirvana, with this piece of code (many thanks to this online syntax highlighter for producing this HTML):

(defun inc-nth (n list)
  (incf (nth n list))
  list)

(defun get-mid-elem (list)
  (nth (get-mid-index list) list))

(defun mid-inc (list)
  (inc-nth (get-mid-index list) list) 
  list)

(defun get-digits (n)
  (cond ((and (< n 10) (>= n 0))
         (list n))
        ((< n 0)
         (get-digits (- n)))
        (t 
         (append (get-digits (floor (/ n 10))) (list (mod n 10))))))

(defun num-from-digits (digits)
  (cond ((= 1 (length digits))
         (car digits))
        (t
         (+ (car (last digits)) (* 10 (num-from-digits (butlast digits)))))))

(defmacro get-mid-index (list)
  `(ceiling (1- (/ (length ,list) 2))))

(defun get-mirrored-num (ndigits) 
  (num-from-digits 
  (let ((firsthalf (butlast ndigits (floor (/ (length ndigits) 2)))))
    (append firsthalf
            (if (evenp (length ndigits))
                (reverse firsthalf)
                (cdr (reverse firsthalf)))))))

(defun increm-left-from-here (index ndigits)
  (cond ((= (nth index ndigits) 9)
         (progn
           (setf (nth index ndigits) 0)
           (if (zerop index)
               (setq ndigits (cons 1 ndigits))
               (setq ndigits (increm-left-from-here (1- index) ndigits)))))
        (t 
         (inc-nth index ndigits))))

(defun get-next-palin (n)
  (let ((ndigits (get-digits n)))
    (cond ((> (get-mirrored-num ndigits) n)
           (get-mirrored-num ndigits))
          (t (get-mirrored-num (increm-left-from-here (get-mid-index ndigits) ndigits))))))

(let ((num-cases (parse-integer (read-line))))
  (format t "~{~D~%~}" 
          (loop 
           for i from 1 to num-cases
           collect (get-next-palin (parse-integer (read-line))))))

Tuesday, 14 April 2009

I want a "good" editor to create blogs

It's been happening for quite a while.

Blogger's post editor has been playing games with me. Especially on my GATE solutions blog, where I like to keep things in a uniform format (dunno why, but it feels good to have similar formatting for similar posts - and all the posts there are similar to one another :) )

I do a 'blockquote' (that's the double quotes icon in the Compose mode), type something, want to get out of the blockquote mode for the next line, it un-blockquotes all the previous lines too. What the heck?

I select some text to delete it, Blogger decides I want to delete some surrounding text also and 'helpfully' deletes. Just for testing, I tried deleting 'Blogger' in the previous line, it deleted 'decides' along with it. What on earth can be the logic behind such a thing?

And the biggest fail here is that, when I try to undo that stupid deletion to restore order, it undoes more than I asked for and deletes some things I typed! Seriously, what did they think they were doing when they added this 'feature'? Now, just because I wanted to delete something, I have two options - either not do an undo and retype the things near the place I deleted, or do the undo and retype the things I typed recently. What if what I did not remember what I had typed there? I have to juggle back and forth between undo and redo to see what I typed myself, and retype it myself! That's really... Worse Than Failure on Blogger's part.

With all that frustration out of me, what I wanted to say was, I need a new editor to create my blog posts or I think I'll stop blogging from the mere frustration of dealing with this. Don't worry, I don't think the situation is so bad I won't be able to get any editor to do the job. :)

First, I remembered Microsoft Word itself has a 'Post to Blogger' option, so I tried it. Well, what can I say? The HTML it created was, to put it bluntly, horrible! I'd have been somewhat ok with that if not for the next fact: when I clicked publish to blogger (or something similar), it told me my username and password would be sent in plaintext and anyone can read them. What? This huge program doesn't have a small place for incuding some SSL agent to do encrypted transmissions? I sure ain't giving my Google username and password in the wild like that.

There's also a Blogger for Word add-in to Word that probably handles things more securely, but it's development has been discontinued by Google, and what they give for download doesn't work for me in Word 2007.

Thinking there should be some solution for it, I searched throughout the Internets, only ending up reading things like this which told me some things about Blogger's WYSIWIG editor worked. Not really what I was looking for, but I learnt something. :)

So, the option I'm looking at currently is, creating the posts in the Kompozer or some similar editor, copying the resulting HTML here and posting. Quite a lot of work for such a simple task, but it seems there's no other option. Also, Kompozer doesn't seem to have easy buttons for things such as blockquotes, etc. like Blogger's interface, which makes things a bit more difficult. But in a few days' time, I think I'll be ok with anything to get off this annoying editor.

So, kind reader, if thou knowest of any editor which can my life more pleasant, kindly bestow thy blessings on poor me and reveal its name. I shall be indebted a lot to thou. Thanks. :)

Friday, 10 April 2009

Finding the next palindrome given a number

I recently was looking for websites similar to Project Euler, and came across SPOJ - the Sphere Online Judge. It gives us some problems, then asks us to submit a solution, which it will evaluate and tell if we are worth our salt as programmers ;)
More details here.

There, I came across this problem which looked quite interesting. Basically, the challenge is, given a number n, we've got to find the palindromic number which comes after this.
Some definitions:
Palindromic number: A number that reads the same either way, like 12421, 57375, 8448, 146641, etc.
"comes after this": the palindrome number must be greater than the given number - given 12321, we must find the next palindrome and not return 12321 itself as the answer.

Ok, with those made clear, let's dive into the problem.

On first sight, it seems simple enough: iterate from n+1 checking if each number is a palindrome. Stop when you find a palindrome.
Unfortunately, that strategy is too slow for our purposes - SPOJ clearly states the inputs can be very huge, in which case this brute force would probably not be able to complete it within the given 9 seconds.

Anyway, we are programmers, and must seek more efficient ways when they exist. Here, it does seem there must be a way to find the next palindrome given a number. What would it be?

My first thought was, simply mirror the number around its centre. Then, we'd surely end up with a palindromic number.
What I mean is, given a number 16310, mirror the first half values across the central '3' to give 16361. If the number has an even number of digits, just mirror it about the centre of the number, like 3628 to 3663.

That strategy works as long as the resulting 'mirror-imaged' value is greater than the original number. What if it results in a number less than (or equal to) the original? Say, for 41258, it would give 41214, which is less than the original number. Or 12321, which would give 12321 itself, which is equal to the original number.

So my next strategy was, if simple mirror-imaging fails, increment the middle digit and then do a mirror image. So, 41258 would become 41314. 12321 would be 12421.

But again, there's a problem with this approach. What if the number had even number of digits? There is no middle digit to increment. Let's say the number was 4265. It's simple to see that, instead of the middle digit, if we increment the digit to the left of the middle (2 here), and do the mirror image, we'll have the palindrome we want: 4334 in this case.

So far, so good.

Now, onto a big issue: 9. Yes, the digit nine. We saw that when the simple mirroring fails, we increment the middle (or left to middle) digit. Now, what do we get when we increment 9? 10. Oops! We can't put a 10 where 9 was. Want proof? Try 41968. Simple mirroring would give 41914, no luck there. Incrementing the 9 to 10 would give 411014, not a palindrome even. And also, it's surely false that there's no palindrome between 41,968 and 411,014 (that's between forty one thousand and four hundred and eleven thousand).

So, what do we do? Luckily, something we learnt in elementary school - carrying over - seems to help here too. Wherever the 10 occurred, keep the 0 there itself, carry over the 1 to the previous digit. So here, 41968 would become 42068 which on applying mirroring would give 42024.

Here too, there's a caveat: consider the extreme cases: 99999 would give (with repeated carryover) 1000001, and 9999 would give 100001. In both cases there's an extra 0 hanging in there. So far I've not been able to generalize it, and have been keeping the case of all digits being 9 a special case only. In this special case, I do the usual procedure, then remove one of the 0's from the resulting number. Ugly, but works. :)

EDIT: Turns out if I finish the increment and carrying over parts fully, then mirror image the number according to how many digits it now has, the 9999 cases take care of themselves with the original algo itself - no 'special cases' needed. :)

So, I believe this is the complete algorithm to find the next palindrome given a number. I have not thought through the steps very carefully, so if you see any flaws, kindly point them out...

I'm trying to code this thing in the Lisp programming language, so I'm taking baby steps towards the solution. As of now, I'm yet to implement the carrying over for 9 part alone. And I don't have any idea how complex that is going to be.

Update: I finished coding this in Lisp, but lost the code somewhere. I coded it again in Python now, which is available here. I've put in a lot of comments so as to make it somewhat easily understandable. Hope this helps the doubting Thomases believe. :)

By the way, my GATE solutions blog now has full feed access, which means you can read it fully without leaving the comfort of your RSS reader.

Happy coding :)

Thursday, 2 April 2009

Back to blogging

Yeow... It's been quite a while since I blogged here. I've been doing somewhat better in my other blog about GATE question papers' solutions.
The reason I've not been so active here is, unlike there, here I don't have anything specific to say.. I was just bloggin whenever I felt I had free time, and also had something technically interesting to write about.
These recent days, I had convinced myself that I was busy, couldn't afford time for blogging, didn't have anything worthy enough to blog about, etc. It's crazy to what extent you can deceive yourself, given this is possibly the most vetti (translated: being without any work) period I have ever had... :)
Meanwhile, Vijay had a look at my blog and was upset that it was fully technical. He recommended I write something 'not-so-technical' and more like a blog. And that seemed like a good idea too, as that would (hopefully) be much easier to do than to analyze and write a technical one...
But what non-technical topic do I know anything about?
I'm a closet nerd, and hate small talk. I might do it in RL for 'social' reasons (gee, do I hate that word!), but surely not in my blog. This is my space, and shall remain so.
Vijay himself came to the rescue again - 'you think a lot about spiritual and philosophical issues right, why not write articles about that?' he asked. For then, I pointed him to my pseudo home page: http://sundaryourfriend.googlepages.com, but I knew it was heavily outdated and incomplete, and the content wasn't all that mature either. My views and ideas have changed a lot in the mean time (hopefully in a positive gradient), and I've (again hopefully) matured a lot.
So, the conclusion of all this rant is, this is probably going to be the platform for my spiritual or philosophical posts too (unless I decide to start a new blog for it - in which case, I'll surely tell you).
Also, don't get daunted by those two words - By spiritual, I mean things that can improve your life, and by philosophical, I just mean things that I think through deeply.. I'll keep the articles as interesting as Sundar-ly possible. ;)
Finally, I promise you this will be the only nearly-contentless post in my blog (at least for some time), and I'll post something worthwhile as soon as possible... :)