Lifestyle
Revving The Engine
Roland Henry
Thursday, July 02, 2009
When the world learnt that iconic King of Pop Michael Jackson was dead last Thursday, it was mass hysteria on the Internet.
The Internet's most powerful search engine, known to us as Google, had all but collapsed (think 91,100,000 hits for Michael Jackson and you'll get the idea). But while we're still caught up in the Jackson drama, it's a great time to examine search engines and how they work.
Search engines use algorithms, carried out by computer programs, to generate results. Google, for instance, employs automated programs called spiders or crawlers (software robots) that visit Web pages ahead of time and record the keywords they find in a large index. When a user enters a search term, the engine looks it up in that index rather than scanning the Web itself.
What sets Google apart is how it ranks search results. Simply put, it presents its findings in descending order of importance, judged in large part by how many other pages link to a particular website.
Google, pioneered by Stanford University PhD students Larry Page and Sergey Brin, began as little more than a tool to assist their research, and has spawned a totally new online culture. The information Google presents is stored in a way that makes it useful. To produce more useful results, most search engines store more than just the word and URL; they might also store the number of times a word appears on a Web page.
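For the curious, here is a minimal sketch in Python of what storing the word, the URL and the number of appearances could look like. The two pages and their addresses are invented for illustration, and the function name build_index is our own; this is not how Google itself is written.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pages, invented for illustration only.
pages = {
    "http://example.com/a": "Michael Jackson the King of Pop",
    "http://example.com/b": "Pop music charts and pop culture news",
}

def build_index(pages):
    """Map each word to {url: number of times the word appears on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        # Lower-case the text and count each run of letters as a word.
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        for word, n in counts.items():
            index[word][url] = n
    return index

index = build_index(pages)
print(index["pop"])
# {'http://example.com/a': 1, 'http://example.com/b': 2}
```

A search for "pop" then becomes a single lookup in the index, and the counts give the engine one simple hint about which page is more relevant.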
Thursday Tech research suggests that the data is often encoded to save storage space. For example, two bytes of eight bits each are used to store information on weighting - whether the word was capitalised, its font size, position, and other information to help in ranking the hit. Each factor might take up only a few bits within the two-byte grouping. As a result, much of the information can be stored in a very compact form. After the information is compacted, it's ready for indexing.
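To see how several facts can share two bytes, consider this rough Python sketch. The layout is an assumption made for the example (one bit for capitalisation, three bits for font size and twelve bits for position), and the names pack_hit and unpack_hit are hypothetical.

```python
def pack_hit(capitalised: bool, font_size: int, position: int) -> bytes:
    """Squeeze three facts about a word into two bytes (16 bits).

    Assumed layout: 1 bit capitalised | 3 bits font size | 12 bits position.
    """
    if not 0 <= font_size < 8:
        raise ValueError("font_size must fit in 3 bits (0-7)")
    if not 0 <= position < 4096:
        raise ValueError("position must fit in 12 bits (0-4095)")
    value = (int(capitalised) << 15) | (font_size << 12) | position
    return value.to_bytes(2, "big")

def unpack_hit(data: bytes):
    """Recover the three facts from the two packed bytes."""
    value = int.from_bytes(data, "big")
    return bool(value >> 15), (value >> 12) & 0b111, value & 0xFFF

packed = pack_hit(True, 5, 300)
print(len(packed), unpack_hit(packed))
# 2 (True, 5, 300)
```

Stored as ordinary text, the same three facts would take several times as much space, which adds up quickly across billions of words.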
The indexing is done so that information can be found in a time-efficient manner. These indices are built using what is known as a hash table. Simply put, this means attaching a numerical value to each word by way of a formula. The formula is designed to distribute the entries evenly.
In the dictionary, for example, more words begin with "M" than those that begin with "X". This inequality means that finding a word beginning with a popular letter takes longer than finding one that begins with a less popular one. Hashing evens out the difference and reduces the time it takes to find entries.
Therefore, on a search engine such as Google, finding both "M" and "X" words takes about the same amount of time.
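The hashing idea can be sketched in a few lines of Python. The formula below is a simple textbook one chosen for illustration, the table size of eight is arbitrary, and the names bucket_for, build_table and contains are our own; real search engines use far larger tables and more careful formulas.

```python
NUM_BUCKETS = 8  # arbitrary size chosen for this sketch

def bucket_for(word: str) -> int:
    """Turn a word into a bucket number using a simple formula."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % NUM_BUCKETS
    return value

def build_table(words):
    """Place every word in the bucket its number points to."""
    table = [[] for _ in range(NUM_BUCKETS)]
    for word in words:
        table[bucket_for(word)].append(word)
    return table

def contains(table, word):
    # Only one bucket is searched, whatever letter the word starts with.
    return word in table[bucket_for(word)]

words = ["music", "michael", "moonwalk", "memory", "media", "xylophone"]
table = build_table(words)
print(contains(table, "moonwalk"), contains(table, "xenon"))
# True False
```

Because the bucket number depends on every letter in the word and not just the first, the many "M" words are not all forced into the same bucket.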
So the next time you punch in keywords to look for a particular site, spare a thought for all that's happening in cyberspace and the process that's bringing you your requested information in mere seconds.



