In a world where Google is now a verb, it is easy to forget just how remarkable search engines are. The entire World Wide Web can be searched in a fraction of a second; reams of information travel across the globe almost as quickly as you can think of a question. Search engines are so fast and so comprehensive that people have come to take them for granted. But how is this possible? What Internet wizardry makes the process work at such ridiculous speeds? Scouring the world's information is an enormous undertaking, and the way the best search engines manage it is quite ingenious.
Clearly, this kind of knowledge gathering is far beyond what any human could do by hand; there is simply too much information, spread too thinly across the entire Internet. So clever programmers created small software robots called spiders. The name is a play on "the web": the spiders crawl across it, building a record of the words they find on web pages. They started with the biggest sites and logged page titles, meta tags, subheadings and the major words that appeared multiple times on a page, filing each word away with a careful note of where it was found. As this collection grew, users could enter their own search terms and the search engine would rapidly look them up in the aggregation of words its spiders had collected. As technology advanced, the spiders could crawl more pages faster, adding to the database of known words, so searches became more precise and results far more thorough.

Not every site wants to be crawled, though. Pages that do not want to appear in search results, or sites whose functionality would be disrupted by the spiders' crawling (gaming sites, for example), use the Robots Exclusion Protocol, a robots.txt file, to block them.
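To make that process concrete, here is a minimal sketch of a toy spider in Python. It checks each site's robots.txt before fetching a page, records every word it finds in an index keyed by word (noting which pages contained it), and follows links to discover new pages. The seed URL, the "toy-spider" user-agent name, and the page limit are illustrative assumptions; real search engines do all of this with massive, distributed infrastructure.

```python
"""A toy web spider and word index, sketched with Python's standard library."""
import re
import urllib.request
import urllib.robotparser
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def allowed_by_robots(url, user_agent="toy-spider"):
    """Honor the Robots Exclusion Protocol: check the site's robots.txt."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # no reachable robots.txt; assume crawling is allowed
    return rp.can_fetch(user_agent, url)


def crawl(seed_urls, max_pages=20):
    """Breadth-first crawl that builds an index mapping word -> set of URLs."""
    index = defaultdict(set)
    queue = list(seed_urls)
    seen = set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen or not allowed_by_robots(url):
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        # Record every word on the page, noting where it was found.
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower()):
            index[word].add(url)
        # Follow links to discover more pages -- the spider "crawls the web".
        queue.extend(urljoin(url, link) for link in parser.links)
    return index


if __name__ == "__main__":
    idx = crawl(["https://example.com/"])
    # Answering a query is then just a lookup in the collected index.
    print(idx.get("example", set()))
```

The key point the sketch illustrates is that the slow work, crawling and recording, happens ahead of time; answering a query is just a lookup in the index the spiders have already built, which is why results come back in a fraction of a second.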