The Tao of Programming: 2008

Monday, 29 December 2008

The golden rule of stock market investing

Never invest your emotions in the market.
Remain Rational.

Sunday, 21 December 2008

Solving a Zimbabwe Math Olympiad - Part 1

Today, while reading this entry in bit-player, I saw a link to a Maths Olympiad (an Olympiad is supposedly: period of four years, by which the ancient Greeks reckoned time, being the interval from one celebration of the Olympic games to another. I don't know why they call this an Olympiad...)

There were 20 questions from various areas of math, and they seemed quite easy and that was attractive to me (as my math workouts these days are very less). So, I've started solving them, and am gonna post my modus operandi for each problem as I solve it.

The first problem was,

If log₂(log₃ a) = 2, what is the value of a?

Quite very easy, if you ask me. First, to remove any logarithm, you've got to raise both sides to the power of the base of the log. So, here, the first step would be, to raise both sides to the power of 2, so we'll end up with 2^(log₂(log₃a)) = 2^2 which would reduce to log₃a = 4 (since any x to the power of a log_xy will leave us with y alone.)
Now, formally, I should have used the same method to remove the next log also. Instead, I found myself thinking, 'Which number's log to the base 3 will be 4?'. The answer to this was stored in my memory's cache as '81', so I determined that was the answer. And I did the ordinary computation method too in order to verify my answer, since the answers don't seem to have been given in that page...

The second question was:

In triangle ABD, the angular bisector of angle DAB meets BD at C. If CD=6, BD=10 and AD=9, what is the length of AB?

Ok, this wasn't as easy to me. Geometry has never been a particularly favourite subject of mine, and the theorems were all things I learnt to pass through the exams. I remembered I had read somewhere about angle bisectors and things, but that was all I remembered. :)
No worries, Google to the rescue. I searched for properties of angle bisectors and ended up in this interactive page where they have some Java applet. Fortunately, my NoScript Firefox extension blocked the Java thingies (they slow down my system badly!) and I had what I wanted scrolling down a little itself:

Angle bisectors divide the opposite side in the ratio of the adjacent sides. More accurately, if, in ABC, AD is an angle bisector of angle A, then

AB/AC = DB/DC

So now, I have AD, CD and CB, I just need to apply this formula (note that C and D have reversed roles in the problem and the formula.)

For us, AB/AD = CB/CD which means AB/9 = 4/6 which gives us AB = 6.
I think I cheated by Googling, but what the heck! ;)

to be continued... (as I solve more problems)

Saturday, 29 November 2008

Explaining how diversification lowers risk - using probability

With the economic crisis ringing high and loud everywhere, I decided I must delve into Economics and learn some things, if only to understand those newspaper and blog articles. Fortunately, one of the sites I just stumbled upon, ShareKhan, has an excellent tutorial on stocks and shares. Very informative, and somewhat to my surprise, a very interesting read too.
In it, in this page, there was an example about risk taking and avoidance. A simplified version of it is:

Which of these options do you think is very risky? Which one is the least risky to invest in?
1. SAIL
2. SAIL & TISCO
3. SAIL & HLL
4. HLL, TISCO, ACC & Infosys

Now, if you have any idea about this thing called diversification at all, you would immediately say the answer - the last one is the least risky.

The first one here is very risky - you bet all your money on a single horse.

The second one is slightly better: you place your money on two different companies, but... they are in the same industry (steel), so a problem in the industry itself (which happens ever so often) would still affect you.

In the third one, you still bet on two companies, but the risk is reduced, possibly by a lot. This is because the two companies deal with widely differing fields, so the chance that both of them will fail you together is quite less.

The fourth one is perhaps the safest. Quoting from the tutorial itself,

All these four stocks are leaders in their sectors. The sectors almost cover the entire spectrum of the market. At least two of them will be doing well.

That is all fine, but may be you are not convinced. At least, not convinced as much as to bet your money on it. No problem, Math exists to convince you doubting Thomases.

So, let's begin.

Assume that the probability that SAIL's shares will yield good returns and will not fail you is P(S), the same for TISCO is P(T) and that for HLL is P(H).

Now, if you have bought the shares of SAIL and TISCO, the probability that either of them will yield you good returns is

P(S U T)

which, as we know from our high school math, can be written as:

P(S U T) = P(S) + P(T) - P(S ∩ T)

Here, P(S ∩ T) represents the amount of correlation between the yields of SAIL and TISCO - the probability that both companies will give you good yield. And this is where the difference comes between case 2 and case 3. In case 3, we have,

P(S U H) = P(S) + P(H) - P(S ∩ H)

Now, the conditions so that neither SAIL and TISCO fail you are:

1. When neither company does anything stupid
2. When steel industry is going well and
3. When entire economic situation is good

And the conditions for both SAIL and HLL give you good yield are:
1. When neither company does anything stupid
2. When steel industry is going well and the household items market is going well
3. When entire economic situation is good

The two seem quite similar, except for the second point. But that small difference can make a large difference to your life. Heck, it might even decide the difference between luxury and bankruptcy!
Ok, that dramatism was unwarranted, but you get the point. The probability that two unrelated industries will be very bright together is somewhat less than the probability that just one industry become bright. P(S ∩ T) is a little more than P(S ∩ H).

When we apply this to the above two equations, we see that P(S U T) becomes less than P(S U H), or to put it in optimistic terms, the probability that the stocks fail someone who diversified more, is quite less than the probability that someone who didn't diversify effectively regrets his decision. Thus, good diversification reduces your risk and increases the probability that you will have a successful yield.

From the above analysis, it is quite obvious that the lesser the value of (S ∩ H), the better are your chances of success. One way to achieve this is, as we did above, to choose industries that are less correlated with each other.

However, while writing this, it occurred to me that P(S ∩ H) will be even lesser if the two industries were competing in some sense (in simplistic terms, think car and bike industry, which in some ways are competing with each other - though this is not a perfect example). In such a case, however, an increase in P(S) would decrease P(H), and vice versa. This is due to the very nature of the industries we chose - each one negatively affects the other.

So, the choice is between choosing
* uncorrelated industries so that P(S) and P(H) can increase simultaneously, but P(S ∩ H) is not quite less, and
* competing industries so that P(S) and P(H) negatively influence each other, but P(S ∩ H) is very less

Obviously, there is no single answer: it depends on P(S), P(H) and the trend of P(S ∩ H). There is an equilibrium point: on one side of it, the first strategy works better, while on the other side, the second works better. Do your math and invest wisely. :)

Wednesday, 12 November 2008

Video and voice chat in Gmail

Now, it's finally here.
The folks at Google have finally added Video and Voice chatting ability to the Gmail chat program (no, not Gtalk, it's in the chat thingie that's at the side within Gmail itself.) The announcement seems to have been made only in the Google blog, which I kind of find strange since it won't reach most normal users this way. Anyway, it's Google, so they must have something clever in mind.
To use this facility, you first have to download a plugin from here and restart your browser. After that, you are supposed to have video and voice chat abilities; it took some time for me to appear, but it did get enabled after some time.
Once the facility gets activated, there are some changes in your chat box. Mainly, the menu that said 'Options' in the lower left hand side of the chat box will now say 'Video & more'.

Expanding the menu gives:

The actual options differ based on whom you are chatting with. If the other person too had this Video chat plugin installed, there would be an option to start a video chat.
So, from now on, you can do voice chat with your friends who are in Gtalk, from within Gmail itself.
And if you have a Webcam, you can show your new haircut or your newest girlfriend to your mom. Have fun! :)

Sunday, 24 August 2008

Million dollars...

This really must have been a coincidence... No, really... ;)

Sunday, 4 May 2008

Largest prime factor of a number

Long live the programming overlords! Today is the day I have begun serious programming again... 'serious' is perhaps a bit too much - programming is never actually serious, it's a sort of fun activity that happens to appear boring and difficult and hence obtains good paychecks from the HR types... ;)

Anyway, I happened to come across the Project Euler and registered in the site. The registration was painless and I fell in love with the site immediately. Problems 1 and 2 were trivial (finding the sum of the multiples of 3 or 5 under 1000, finding the sum of even numbers in the Fibonacci series which are less than 4 million).

Problem 3 was a little more involved. I had to find the largest prime factor of 600851475143. It isn't particularly difficult, but it does call for some thinking. The first method I thought of was to

divide the number by successive integers upto sqrt(n), and
when the remainder is 0, check whether the divisor is prime,
if so, store it as answer overwriting the previous value, if any.

An obvious improvement was to come down from sqrt(n) so that the first number to satisfy conditions 2 and 3 would be the answer.
However, by this time, something seemed fishy in this method. Particularly, it occurred to me that a prime number checking was costly and unnecessary here. Instead, you can go about it using a recursive solution:

FindLargestPrimeFactor(n):

divide the number by successive integers (each denoted by i) upto sqrt(n),
when the remainder is 0, return the maximum among i and FindLargestPrimeFactor(n/i).

It might not be clear at first sight that the algorithm works correctly. There seems to be no check for primality. However, this algorithm exploits the fact that the smallest factor sf of a number n will necessarily be prime. This is because, if it were not prime, its factors would be factors of n and hence sf would not be the smallest factor. Thus, we need only find the smallest factor to have a prime factor. Then, we can divide away this factor and again find the smallest factor. This way, we can have all the prime factors of a number.

In the algorithm, we find a prime factor of n, compare it with the largest prime factor of n/i, and return the larger number. Thus, finally, we'll have the largest prime factor of the number n itself.
Obviously, we need a limit for the recursion. In this case, this limit occurs when i overflows sn. This means that n is prime, and hence we must return n itself as the largest prime number.

This was the logic of my program as I coded and finished it. I entered the answer into the Project Euler website (which is projecteuler.net, by the way). After that, I saw the forums in the website and saw that many had implemented my original algorithm which used prime testing. This more efficient recursive solution had missed the neurons of many... With that boost to my hubris, I turned to the pdf containing the 'overview' of the problem.

Wow, the algorithm described is very close to mine. One improvement is that the recursion has instead become an iteration. This is one area where I stumble every time - converting a recursive program into an iterative one. So I bookmarked this in my mind and moved on.

The next improvement was one that made me feel stupid for not thinking of it earlier. Since I'm going to find only prime factors, I need only use 2 and odd numbers in step 1. So, I can use a separate step for 2 and then start with 3 and increment i by 2. This way, we'll save about half of the iterations...

UPDATE: There are quite a number of other improvements that must be done to this algorithm. For the actual algorithm for this problem (which develops on the algorithm here), see Finding the Largest prime factor efficiently.

The final perl program I wrote was:

use strict;
use warnings;

my $magic = 600851475143;

sub largestprimef($);
sub max($$);

print largestprimef($magic);
sub largestprimef($)
{
my $n = shift;

my $i;
return largestprimef(max(2, $n/2)) if($n % 2 == 0);

my $sn = int(sqrt($n));

for ($i = 3; $i <= $sn; $i += 2)  {

  last if($n % $i == 0);

if($i > $sn) #loop ran over, means the number is prime
{
  return $n;
}
else
{
return max($i, largestprimef($n/$i));
}
}

sub max($$)
{
return (sort { $a <=> $b }(@_))[1];
}

A few notes about the program here:

not 'use'-ing the strict and warning pragmas gave a speed improvement of a few milliseconds
first I implemented max in the good old comparison way - if i > j, return i else return j. Turns out perl's sort implementation gives ~~faster results than this~~ equivalent speed. I just tried it for the heck of it ~~, and was surprised to see a few milliseconds improvement. Probably the speed improvement came from the removal of 'shift's to get the numbers from @_ to $i and $j.~~(that was a temporary illusion; in repeated testing both turned out to be equivalent.
Since I did the program in Linux, the age old newline problem cropped up. When I pasted the code into blogger, it showed the code as one continuous mess. I had to use http://www.hashemall.com/ for converting the newlines. The site dutifully hashed the program, and thankfully returned my original text with MS Windows newlines in place of Unix ones.. (The hash value under sha-1 was b8b7188ad3440084e5db7f44fd70d599f897e33a, in case you are interested... :) )
The final runtime was around 14 ms.
UPDATE: The code highlighting is courtesy of Online syntax highlighter 'tohtml' which is an excellent service by the way.