Dogpile.com.au is NOT the Dogpile search
engine. It is an Australian offsite backup company.
Dogpile is a metasearch engine that fetches results from Google,
Yahoo!, Live Search, Ask.com, About.com, MIVA, LookSmart and several other
popular search engines, including those from audio and video content
providers. It is a registered trademark of InfoSpace, Inc.
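As a rough illustration of what a metasearch engine does with the results it fetches, the sketch below merges ranked result lists from several engines into a single list. The engine names, URLs, and the reciprocal-rank scoring rule are assumptions made for this example; the source does not describe how Dogpile actually combines results.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL earns 1/rank from every engine that returned it, so pages
    ranked highly by several engines rise to the top. This scoring rule
    is an illustrative assumption, not Dogpile's documented method.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Sort by descending score; break ties alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical result lists, best match first.
sample = {
    "engine_a": ["http://a.example/", "http://b.example/", "http://c.example/"],
    "engine_b": ["http://b.example/", "http://a.example/"],
    "engine_c": ["http://b.example/", "http://d.example/"],
}
merged = merge_results(sample)
```

Duplicate URLs collapse into one entry, which is the main benefit a user sees from a metasearch front end.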
A Web search engine such as Dogpile is
designed to search for information on the World Wide
Web.
Information may consist of web pages, images and other types of files.
Some search engines also mine data available in newsgroups, databases, or
open directories. Unlike Web directories, which are maintained by human
editors, search engines operate algorithmically or are a mixture of
algorithmic and human input.
Web search engines work by storing
information about many web pages, which they retrieve from the WWW
itself. These pages are retrieved by a Web crawler (sometimes also known
as a spider) — an automated Web browser which follows every link it
sees.
Exclusions can be made by the use of robots.txt. The contents of each page
are then analyzed to determine how it should be indexed (for example, words
are extracted from the titles, headings, or special fields called meta
tags). Data about web pages are stored in an index database for use in later
queries.
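The exclusion and indexing steps described above can be sketched in a few lines. This is a minimal illustration that assumes the pages have already been fetched and reduced to their title text; the site, the crawler name `ExampleBot`, and the robots.txt rules are all hypothetical. It uses the standard library's `urllib.robotparser` to honor robots.txt and builds a simple inverted index (word to set of URLs).

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Hypothetical fetched pages: URL -> title text.
pages = {
    "http://site.example/index.html": "Offsite backup services",
    "http://site.example/about.html": "About our backup company",
    "http://site.example/private/notes.html": "Secret backup notes",
}


def build_index(pages, robots, agent="ExampleBot"):
    """Map each lowercase word to the set of allowed URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        if not robots.can_fetch(agent, url):
            continue  # page is excluded by robots.txt
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


index = build_index(pages, robots)
```

A real engine would also weight words by where they appear (title, heading, meta tag); this sketch treats every extracted word equally.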
Some search engines, such as Google,
store all or part of the source page (referred to as a cache) as well as
information about the web pages, whereas others, such as AltaVista, store
every word of every page they find. The cached page always contains the
search terms, since it is the copy that was actually indexed, so it can be very
useful when the content of the current page has been updated and the search
terms are no longer in it.
This problem might be considered a mild
form of linkrot. Google's handling of it increases usability and satisfies
the principle of least astonishment, since the user normally expects the
search terms to be on the returned pages. Increased
search relevance makes these cached pages very useful, even beyond the fact
that they may contain data no longer available elsewhere.
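The following toy example shows why a cached copy can still contain the search terms after the live page has changed. The class `TinyEngine`, the URL, and the page text are invented for illustration and do not reflect how any real search engine stores its cache.

```python
class TinyEngine:
    """Toy engine that keeps a snapshot of each page at indexing time."""

    def __init__(self):
        self.cache = {}  # URL -> page text as it was when indexed

    def index_page(self, url, text):
        self.cache[url] = text

    def search(self, term):
        # Queries are answered from the cached snapshots, not the live Web.
        term = term.lower()
        return [url for url, text in self.cache.items()
                if term in text.lower().split()]


url = "http://news.example/"
live_web = {url: "storm warning issued today"}

engine = TinyEngine()
engine.index_page(url, live_web[url])

# The live page is updated after it was indexed.
live_web[url] = "sunny skies expected"

hits = engine.search("storm")
term_in_cache = "storm" in engine.cache[url]
term_on_live_page = "storm" in live_web[url]
```

Here the query still returns the URL, the cached snapshot contains the term, and the live page does not, which is the situation in which a cached link is most useful.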