Stories of algorithms: Google Page Rank
28 March 2019 | Written by Pietro Crovari
How does Google always find what we are looking for?
Recently Google presented the Stadia project, aiming to break into the video game industry. With this move, the Mountain View giant adds another piece to its multi-billion dollar business: advertising, multimedia content, cloud products and services, and telecommunications are just some of the sectors that Google has entered and whose markets it has shaken up with its highly innovative products. However, we must not forget how it all started: with an algorithm, hidden behind a simple search bar.
For whatever we need, Google has the answer. Are we looking for a recipe for our son’s birthday cake? We are immediately taken to a cooking blog. Are we looking for that pair of shoes whose name we do not know? We immediately find an e-commerce site that sells them, perhaps at a discounted price. Do we need to do research on the Punic Wars? We immediately find a list of sites with all the information we need. More recently, if we look for a quick fact such as a celebrity’s date of birth, the results of the latest round of the championship, a currency conversion or a quick calculation, Google provides the answer directly, without our having to open any web page (try asking Google “What is the answer to everything?”). But how does Google always find what we are looking for?
The search bar is only the tip of the iceberg. Behind the seemingly simple white search bar, there are hundreds of algorithms serving the most diverse purposes, ready to intervene as soon as we type something. Many of these interpret our request to understand what we really want to find. Others try to anticipate our intentions by offering the best possible search suggestions. Still others have the task of understanding our tastes and interests in order to show us the most relevant advertisements. Of them all, however, the most important algorithm is probably the one that decides the order in which to show the results: Page Rank.
Page Rank is one of the algorithms behind the Google search engine. Larry Page and Sergey Brin wrote it in 1996, just two years before they founded their startup, Google Inc. Page Rank was one of the key factors in the company’s success: before then, search engines arranged their results without any particular sorting criterion, making it frustrating for users to find the result they wanted. This algorithm, instead, sorts the results in order of importance: the most relevant first, the rest after. To understand how it works, however, we must define what it means for a web page to be “important”.
Important means widely cited. When looking for a suitable measure of a page’s importance, the creators of the algorithm made the following assumption: the more important a page is, the more other sites will link to it. Therefore, if many web pages contain a link to a specific page, that page is considered important and will occupy one of the top places in the search results. Conversely, if no page links to the page in question, its content is probably not important.
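To make this first assumption concrete, here is a minimal sketch that ranks pages purely by how many other pages link to them. The page names and the `links` dictionary are invented for illustration, and this naive count is only the starting idea, not the full algorithm.

```python
from collections import Counter

# Hypothetical toy web: each page maps to the list of pages it links to.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}


def count_incoming_links(links):
    """Return, for every page, how many distinct other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # ignore repeated links on one page
            if target != source:         # ignore links from a page to itself
                counts[target] += 1
    return counts


counts = count_incoming_links(links)
# Most linked-to pages first; ties are broken alphabetically.
ranking = sorted(counts, key=lambda page: (-counts[page], page))
print(ranking)
```

In this toy web, `c.example` is linked from three pages and therefore comes first, while pages that nobody links to end up at the bottom.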
The deception. If Page and Brin had stopped at this point, “cheating” to push your own pages to the top of the results would have been very easy: you would just need to build many fake pages whose only purpose is to link to your pages, so that their score grows considerably and beats that of rivals. However, the Page Rank algorithm does not stop there: voting power is not the same for every page, but depends on the importance of the voting page itself. Furthermore, the voting power of a page is divided equally among all the links it contains. For example, if a page has a value of 6 and contains 3 links, it gives each of the three linked pages 6/3 = 2 points. The overall value of a page is simply the sum of the values carried by the links pointing to it. In this way, if the owner of page X tried to “cheat” by creating a series of “puppet” pages full of links to page X, the attempt would fail: the puppet pages would have very low scores themselves, and would therefore contribute almost nothing to the total score of page X.
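The vote-splitting rule can be sketched in a few lines. This is a simplified illustration, not Google's implementation: the scores are hypothetical values fixed by hand, whereas the real algorithm has to compute them, typically by repeating rounds like this one until the values settle. The function name `incoming_value` and all page names are assumptions made for the example.

```python
def incoming_value(page, scores, links):
    """Sum the vote shares that `page` receives from the pages linking to it."""
    total = 0.0
    for source, targets in links.items():
        if page in targets:
            # A source splits its own score evenly across all of its links.
            total += scores[source] / len(targets)
    return total


# Hypothetical scores, chosen only to mirror the example in the text.
scores = {"portal": 6.0, "puppet1": 0.1, "puppet2": 0.1, "puppet3": 0.1}

links = {
    "portal": ["y", "news", "wiki"],  # value 6, 3 links: 6 / 3 = 2 points each
    "puppet1": ["x"],                 # low-value puppet pages pointing at x
    "puppet2": ["x"],
    "puppet3": ["x"],
}

value_y = incoming_value("y", scores, links)  # one link from a valuable page
value_x = incoming_value("x", scores, links)  # three links from puppet pages
print(value_y, value_x)
```

Page `y` receives a single link worth 2 points, while page `x` collects three links that together are worth only about 0.3, which is why the puppet pages barely help.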
A magnificent automated mechanism. The Google database contains hundreds of billions of web pages, so applying the algorithm “by hand” would be unthinkable. Instead, Google’s servers run small software agents, called crawlers, whose purpose is to visit web pages and extract their contents and links, in order to update the search engine’s database and, consequently, to rank pages in a completely automated way.
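As a rough idea of the link-extraction step a crawler performs, the sketch below pulls the links out of an HTML snippet using only Python's standard library. It is an assumption-laden toy: the `LinkExtractor` class, the sample HTML and the URLs are made up, and a real crawler would also download pages, respect site rules and handle many more cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of all <a href="..."> tags found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of every link in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a downloaded web page.
sample_html = (
    '<p>See <a href="/recipes/cake">this cake</a> and '
    '<a href="https://shop.example/shoes">these shoes</a>.</p>'
)
found = extract_links(sample_html, "https://blog.example/index.html")
print(found)
```

The list of links found on each page is exactly the kind of information that a ranking step needs as input.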
Optimizing results, a real art. More than 22 years have passed since the algorithm was first conceived, and if Google had not continued to innovate since then, it would certainly not be where it is today. Alongside Page Rank, many other criteria now refine the ranking to improve the end user’s experience. For example, if a page follows structural guidelines that guarantee a good level of usability, it will be favoured in the ranking. Conversely, if a page carries overly invasive advertising, it will be penalized. Many criteria like these are known, but many others are kept hidden by the “Mountain View giant”. For a company, understanding how to make its content appear at the top of certain searches is a strategic operation of essential importance, and anything but trivial. For this reason, in the last few years an entire discipline has emerged, known as SEO (Search Engine Optimization), which studies how to structure content so that it appears at the top of users’ searches.
We have only opened Pandora’s Box. We have only scratched the surface of Google, and we have already seen how much complexity lies behind the most famous page on the Internet. Many other algorithms contribute to making it so elegant, all working together like a single well-oiled machine to make our experience the best possible. However, that is another story…