Saturday 27 May 2006

google and erdos

What do the greatest mathematician of the 20th century, Paul Erdos, and the Google search engine have in common? Quite a bit, it turns out. The way Google ranks its search results rests on an age-old idea we all encounter every day: it is better to be talked about by a few smart people than by many mediocre ones. The genius of Google's founders lay in applying this idea in a completely new context.

In this article I set out to explore what lay behind Google's dramatic rise to dominance in what, in the mid-1990s, already seemed a crowded search engine market, a rise that made two Stanford graduate-student dropouts billionaires almost overnight. Along the way, we will look at the role the fathers of the two founders might have played in the conception of this astonishing search engine, which has profoundly changed the way we work and saved us countless hours of labor.


It is well known that the two Google founders, Sergey Brin and Larry Page, are today billionaires, worth $12 billion each to be exact, at the ages of 31 and 32 respectively at the time of writing. It is also well known that their company is not just a pipe dream: it brings in over $1 billion in real cash every year and adds fundamentally important value to our information society, which in turn justifies the company's valuation of over $80 billion.


But what is less talked about is the simple yet elegant concept on which Google's search algorithm is based, and where the founders got the idea for it. To understand the full story, we need to step back and look at the search engine market in the mid-1990s, when Sergey and Larry were sitting in the Gates Computer Science Building at Stanford, among the few people who were unhappy with existing search engines, and decided to come up with an elegant solution. It is also the story of Paul Erdos, the legendary Hungarian Jewish mathematician whom Time Magazine nominated as the most important person of the 20th century in mathematics (a verdict the vast majority of top mathematicians today would readily agree with).

search engines before google

The search engine market seemed quite saturated, with AltaVista, AOL, Yahoo and Microsoft complacently basking in the warm sunlight of the rising dotcom fever. It seemed inconceivable that another search engine could add anything of value to the internet, and most slick venture capitalists from Silicon Valley to New York would have told you that anyone launching such a business was destined for miserable failure.

There was only one problem with these engines: their results were poor, because their ranking algorithms were primitive and simplistic and returned many irrelevant pages. So although they said 'search' on their web pages, they were hardly search engines at all. You still had to sift manually through dozens, and often hundreds, of results to find what you really needed. The problem was that these engines based their results only on the number of occurrences of the keyword in the body text of the page and in its meta tags: the hidden part of a web page, invisible to users, that contains the description the author entered when creating the page.

Thus, for instance, when I entered 'John Lennon' into AltaVista, most of the top results might well have been messy pages containing absolutely no relevant information about the famous member of "The Beatles". This was likely because, more often than not, the authors of those pages were well-meaning fans like Tommy the Teenager from Ohio and Grace the Groupie from the UK, who had simply written "I love John Lennon" 1000 times on their colorful teenage home pages. In other words, searches were painful and time-consuming.
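To see why counting keywords is so easy to game, here is a minimal sketch of a ranker that scores pages purely by how often the query words appear in the body text and meta tags. The function names `keyword_score` and `rank` and the two sample pages are hypothetical illustrations, not the code of AltaVista or any real engine.

```python
import re

def keyword_score(query: str, body: str, meta: str) -> int:
    """Hypothetical scorer: count occurrences of each query word
    in the page body plus its meta-tag description."""
    words = re.findall(r"[a-z0-9]+", (body + " " + meta).lower())
    return sum(words.count(term) for term in query.lower().split())

def rank(query: str, pages: dict) -> list:
    """Sort page names by keyword score, highest first.
    `pages` maps a page name to a (body, meta) tuple."""
    return sorted(pages, key=lambda name: keyword_score(query, *pages[name]),
                  reverse=True)

# Two made-up pages: a fan page that repeats the name, and a real biography.
pages = {
    "fan page": ("I love John Lennon " * 1000, "john lennon"),
    "biography": ("John Lennon was a founding member of The Beatles ...",
                  "beatles biography"),
}
print(rank("John Lennon", pages))  # ['fan page', 'biography']
```

The fan page scores 2002 against the biography's 2, so repetition alone wins, which is exactly the weakness described above.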

erdos number

Clearly, a completely novel, revolutionary mechanism for making searches relevant was needed: some mechanism that would impose a hierarchy, or organization, on this web of unstructured information. Even though many people understood this intuitively, it was very hard to make such a hierarchy concrete or rigorous. This is where Paul Erdos, and an informal prestige ranking used among mathematicians known as the Erdos number, offered a surprising solution to the search problem.

Let us be clear: unless you were a mathematician yourself, or for some reason deeply embedded in the subculture of mathematicians, you were very unlikely to have even heard of the Erdos number (the only reason the author of this article knows about it is that he shared a college dorm room with a young Romanian mathematician who told him about it).

fathers and sons

Interestingly enough, both Sergey and Larry had close ties to mathematics at home: Sergey's father is a mathematics professor, and Larry's father was a professor of computer science. So it does not take a great leap of imagination to suppose that the fathers were instrumental in making the connection.

Paul Erdos (pictured left) was an eccentric mathematician who for most of his life had neither an apartment nor a wife, but lived out of a plastic bag, traveling like a vagabond from one mathematician's home to another. In return for the hospitality his mathematician friends offered him, Erdos worked with them, posing new problems and solving old ones. In doing so, he revolutionized modern mathematics, producing a vast body of work in number theory, combinatorics and discrete mathematics, the branches of mathematics on which much of modern computer science is based.

Because of his prolific output, mathematicians created the Erdos number as a humorous tribute to him, based on a simple idea: Erdos himself has Erdos number 0; people who co-authored a paper with him have Erdos number 1; people who co-authored a paper with someone of Erdos number 1 (but not with Erdos himself) have Erdos number 2, and so forth. In other words, a person's Erdos number is the length of the shortest chain of co-authorships linking them to Erdos. The Erdos number thus became an unofficial ranking and a status symbol in the world of mathematics. Even though Erdos died in 1996, some have estimated that 9 in 10 of the world's active mathematicians today have an Erdos number smaller than 10.
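Since an Erdos number is a shortest-path distance in the co-authorship graph, it can be computed with a breadth-first search starting from Erdos. The sketch below assumes papers are given simply as lists of author names; the function name `erdos_numbers` and the authors Alice, Bob and so on are hypothetical.

```python
from collections import deque

def erdos_numbers(papers, source="Erdos"):
    """Breadth-first search over the co-authorship graph.

    `papers` is a list of author lists; two people are linked if they share
    at least one paper. People not connected to `source` are left out
    (their Erdos number is conventionally infinite)."""
    coauthors = {}
    for authors in papers:
        for a in authors:
            coauthors.setdefault(a, set()).update(b for b in authors if b != a)

    dist = {source: 0}          # Erdos himself has number 0
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in coauthors.get(person, ()):
            if other not in dist:   # first visit = shortest chain
                dist[other] = dist[person] + 1
                queue.append(other)
    return dist

papers = [
    ["Erdos", "Alice"],
    ["Alice", "Bob"],
    ["Bob", "Carol", "Dave"],
    ["Eve", "Frank"],           # never connected to Erdos
]
print(sorted(erdos_numbers(papers).items()))
```

Alice gets 1, Bob 2, Carol and Dave 3, while Eve and Frank have no finite Erdos number. Breadth-first search is the natural choice here because it visits people in order of distance, so the first time someone is reached is along a shortest chain.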

google's version of the erdos number: google pagerank

The flash of genius on the part of the Google boys, in the author's opinion, was to carry the idea of the Erdos number into the world of search engines through a very simple analogy. Erdos had number 0 because he was an authoritative source in mathematics. Why not define a core set of authoritative institutions for each field of human endeavor, such as government institutions, universities and major newspapers, and give their web sites an "Erdos number", or Google PageRank, of 0? And why not give the sites that those rank-0 sites link to a rank of 1, and so forth? The actual PageRank is adjusted for some minor factors, but the basic idea is still the Erdos number, even though this is not explicitly stated in the Stanford paper that Sergey and Larry originally wrote about the Google algorithm.
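The analogy above can be sketched as the same breadth-first search, this time following links outward from a hand-picked set of authoritative seed sites. This is only an illustration of the idea as described here, under the assumption that rank equals link distance from the seeds; it is not Google's published PageRank formula, which Brin and Page define iteratively in terms of the ranks of the pages linking to a page. The function name `seed_distance_rank` and all site names are hypothetical.

```python
from collections import deque

def seed_distance_rank(links, seeds):
    """Rank pages by link distance from authoritative seed pages.

    `links` maps each page to the pages it links to. Seeds play the role
    of Erdos (rank 0); a page linked from a rank-0 page gets rank 1, and
    so on. Lower is better; pages unreachable from the seeds get no rank."""
    rank = {page: 0 for page in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in rank:
                rank[target] = rank[page] + 1
                queue.append(target)
    return rank

# Hypothetical web: two trusted seeds and a pair of isolated fan pages.
links = {
    "university.edu": ["beatles-history.org"],
    "newspaper.com": ["beatles-history.org", "lennon-biography.net"],
    "beatles-history.org": ["lennon-biography.net", "fan-forum.net"],
    "tommys-homepage.net": ["graces-homepage.co.uk"],
}
seeds = ["university.edu", "newspaper.com"]
print(seed_distance_rank(links, seeds))
# {'university.edu': 0, 'newspaper.com': 0, 'beatles-history.org': 1,
#  'lennon-biography.net': 1, 'fan-forum.net': 2}
```

Under this scheme, Tommy's and Grace's pages get no rank at all, however many times they repeat "John Lennon", because no authoritative site links to them.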

Lo and behold, almost overnight the search engine landscape experienced what could only be called an earthquake. Google's results were extremely relevant, while other search engines faded into oblivion.

google today

I once read a statistic that Google has 33% of the search market, but in reality it feels closer to 100%, since I do not know anyone who uses a search engine other than Google. So will Google's dominance continue? Is Google perhaps a threat to the lion of the high-tech jungle, the omnipotent Microsoft? Perhaps the key is to model the company itself on the life of Erdos: a constantly evolving, prolific source of new ideas and tools that takes users to new levels of productivity and efficiency while challenging old-school ways of looking at the world.
