Searching the Web

In the connected world, a lack of information is not really the problem. The problem is rather a lack of relevant information, and how to get to what you are really looking for. The tools for this are the various search engines that can be found online.
Mastering them, and knowing which engine to use for which query, is the difference between searching and surfing; it determines whether you find information fast or get lost in the web.

Using the tools

When using any of the search engines described below, you should be aware of two principles:

  1. A search engine is just a tool. It is up to you to learn how to use it. Read the instructions before using the search engine! Really, do. They are normally accessible via a "Help" or "Search Tips" link.

    The reason for this is that very few search engines work according to the same system. Some search engines, for example, treat words written with mixed capital and small letters as case-sensitive. In that case, entering "Sun" would not find "sun" in any text. Thus, you might find the newspaper "The Sun" but would probably miss a lot of scientific texts. Therefore: read the instructions!

    Fortunately, most search engines use the so-called "Boolean" search language, which accepts AND and NOT as operators, or their symbolic equivalents "+" and "-". By default, most search engines given multiple keywords first list all documents or sites that contain all the words, followed by those that contain only some of them. Most search engines also support searching for phrases, which are marked by quotation marks. Again, this rule is not universal, which is why you should read the instructions. Keep in mind that only a few search engines support the Boolean operator "OR". Some, like AltaVista, even support "NEAR" to narrow your search to pages where the words appear within ten words of each other.

    Common operators

    Query                            Result
    cat kitten                       Pages that contain both words are listed first, followed by pages that contain only one of the two words.
    cat AND kitten / +cat +kitten    Both words have to appear somewhere on the page.
    cat +kitten                      "kitten" has to be on the page, "cat" should be.
    +cat -kitten / cat NOT kitten    "cat" has to be on the page, "kitten" must not be.
    "cat food"                       "cat food" has to appear as a phrase on the page.
    cat*                             Lists pages that contain words that start with "cat", e.g. cats, catholic, catalog, catfood...

  2. No two search engines have the same content.
    Sometimes you will be absolutely positive that a certain piece of information has to be out there, but your favorite search engine doesn't find it. Try another one! No two search engines have exactly the same databases. A recent investigation by the NEC Research Institute showed that even the top search engines list only 16 percent of all web pages.
    One way to get around this problem is to use "meta search engines" (see below). A few search engines have also formed strategic alliances with each other, so that an unsuccessful query in one search engine can be run against the other search engine's database.
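
The common operators listed under the first principle can be sketched in a few lines of Python. This is only a simplified illustration under stated assumptions, not how any particular search engine works: the helper names (parse_query, term_found, score) are made up for this example, matching is assumed to be case-insensitive, and OR and NEAR are left out.

```python
import re
import shlex


def parse_query(query):
    """Split a query into required, excluded and optional terms.

    Hypothetical helper: understands +term, -term, "a phrase", term*,
    and the keywords AND / NOT between two terms.
    """
    required, excluded, optional = [], [], []
    pending = None  # AND/NOT keyword waiting for its right-hand term
    for token in shlex.split(query):  # keeps "cat food" together as one token
        if token in ("AND", "NOT"):
            # The term to the left of AND/NOT has to be on the page.
            if optional:
                required.append(optional.pop())
            pending = token
        elif pending == "NOT" or token.startswith("-"):
            excluded.append(token.lstrip("-"))
            pending = None
        elif pending == "AND" or token.startswith("+"):
            required.append(token.lstrip("+"))
            pending = None
        else:
            optional.append(token)
    return required, excluded, optional


def term_found(term, text):
    """True if a word, a phrase or a prefix (cat*) occurs in the text."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def score(query, text):
    """Return None if the page is rejected, otherwise a rank (higher is better)."""
    required, excluded, optional = parse_query(query)
    if any(term_found(t, text) for t in excluded):
        return None
    if not all(term_found(t, text) for t in required):
        return None
    hits = sum(term_found(t, text) for t in optional)
    if not required and hits == 0:
        return None
    return len(required) + hits
```

Sorting pages by the returned rank gives the behavior described for plain keywords: pages containing all the words come first, followed by pages containing only some of them.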


Categories

Roughly speaking, all search engines fall into four categories:
  1. Indices: Sometimes also called "spiders" or "crawlers", these search engines use little programs (robots) to scurry through the web. These robots index every word they come across on every page they visit and follow every link they can find. In this way the bots create huge databases, which can be searched for individual phrases. The downside of these indices is that relatively common keywords can produce millions of results. It is therefore often necessary to take a closer look at the search language of the index in order to refine the search so that only relevant hits are returned. In this category: Alltheweb, AltaVista, Google and Hotbot.

  2. Catalogs: Unlike indices, catalogs are created by humans. Sites must be submitted manually, are given a description and are then put into a certain category. The most important difference between catalogs and indices is that catalogs only list sites, whereas indices also contain all the pages within a site. Catalogs often give better results when you are searching for a topic (e.g. HIV, American newspapers, hockey) rather than a specific piece of information. Also note that a site's content might change without the catalog being updated. In this category: About, Britannica and Yahoo.

  3. Hybrids: Recently a lot of indices have also begun to catalog their information, so that the distinction between index and catalog has become somewhat blurry. In most of these cases, however, the user has to actively choose the catalog structure rather than the index, which tends to be the default. In this category: Excite and Lycos.

  4. Meta search engines: These do not fit as snugly into this list as the previous three. Unlike the others, these tools do not have a database of their own. Instead they forward your query to a number of search engines simultaneously and collect the results from all of them. The benefit is that you don't have to visit the various search engines one after the other to compare the results. But there is also a downside: since most search engines use a different syntax, it is almost impossible to use operators to refine the query. For example, AltaVista uses "-" to exclude a word from a query. There, typing poodle -cat means that the document should contain poodle and must not contain cat. But since the same query is also sent to search engines that do not support this operator, they might think that -cat is actually a word and look for it. In this category: Dogpile and Metacrawler.
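
The operator problem described for meta search engines can be made concrete with a small Python sketch. It shows one conceivable workaround, under the assumption that the meta tool knows which engines understand "-": the query is stripped of exclusions before it is sent to an engine without that operator, and the exclusions are applied locally to the returned results. The function names are invented for this illustration, and nothing here describes how Dogpile or Metacrawler actually work.

```python
def adapt_query(query, supports_minus):
    """Rewrite a query for one engine (hypothetical helper).

    If the engine does not understand the "-" operator, the excluded
    terms are dropped so that "-cat" is not searched for as a word.
    """
    words = query.split()
    if supports_minus:
        return " ".join(words)
    return " ".join(w for w in words if not w.startswith("-"))


def filter_results(query, pages):
    """Apply the exclusions locally to whatever the engines returned."""
    excluded = [w[1:].lower() for w in query.split() if w.startswith("-")]
    return [
        page for page in pages
        if not any(term in page.lower().split() for term in excluded)
    ]
```

With the query poodle -cat, an engine that supports "-" receives the query unchanged, while any other engine receives only poodle; the local filter then removes results that still mention cat.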

The most important thing is that you understand the concepts of indices and catalogs. If the Internet is like a library, and the websites are the books in that library, then the catalogs give you the author, title and maybe a brief abstract of each book, whereas the indices search the complete text of all the books and return individual pages and paragraphs to you. This has some important implications. Say you have 100 books about AIDS in this fictitious library, each 100 pages long. A catalog would return 100 hits, one for each book. An index, on the other hand, might return up to 10,000 hits, one for each page that contains the word AIDS.
As a rule of thumb, catalogs are better for searching for topics, whereas indices are better suited to searching for specific documents or pieces of information.
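
The library comparison can be played through in Python. The sketch below builds the fictitious library of 100 books with 100 pages each and assumes, for simplicity, that every single page mentions the keyword, which is the worst case behind the "up to 10,000 hits" figure. The data layout and the function names are invented for this example.

```python
# Fictitious library: 100 books about AIDS, each 100 pages long.
# Assumption: every page mentions the keyword (the worst case).
books = [
    {
        "title": f"Book {n}",
        "abstract": "A book about AIDS",
        "pages": ["... text that mentions AIDS ..."] * 100,
    }
    for n in range(1, 101)
]


def catalog_search(books, keyword):
    """A catalog only knows title and abstract: one hit per book."""
    return [
        book["title"] for book in books
        if keyword in book["title"] or keyword in book["abstract"]
    ]


def index_search(books, keyword):
    """An index knows the full text: one hit per matching page."""
    return [
        (book["title"], page_no)
        for book in books
        for page_no, text in enumerate(book["pages"], start=1)
        if keyword in text
    ]


catalog_hits = catalog_search(books, "AIDS")  # 100 hits, one per book
index_hits = index_search(books, "AIDS")      # 10,000 hits, one per page
```

The catalog result is short enough to browse by topic, while the index result has to be narrowed down with further keywords or operators.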

Do you want to know more about search engines and how they work? Then you should check out "Search Engine Watch". If you feel you need more information about operators etc., then take a look at the search-tutorial at the University at Albany. For those who prefer to read German, a look at the "Suchfibel" might be worthwhile.



Some search engines and what they are good for.

This is just a rough overview of a few search engines. Altogether there are more than 300 different search engines, with different approaches and advantages, out on the web. Most of them can be found at Search Engines Worldwide. We picked the biggest ones and those we think are most useful to professional researchers.

A more thorough investigation of many of the search-engines presented here can be found at the "Search Engine shoot-out".

If you want to try these search engines hands-on without having to visit each and every one of them, go to our "Try it" page.

About
http://www.about.com
Category: Catalog
About.com has taken the catalog category one step further than its competitors. Here, every field of interest has been assigned a specialist whose job it is to provide the best and most relevant links within this field. Because of this, About.com has only a comparatively small number of links to offer, but these tend to be highly relevant and up to date.

Alltheweb
http://www.alltheweb.com/
Category: Index
"Alltheweb", aka "Fast web search", is one of the new kids on the search engine block and one of the most successful ones as well. Its ambitious aim is to index all of the web within one year and to maintain the largest and most up-to-date index of the WWW. So far they claim to have 200 million web pages in their database, in addition to the FTP and MP3 search the site offers. While it is hard to judge whether the numbers are right (unfortunately Alltheweb was not one of the search engines tested by the NEC), this engine is definitely the fastest on the web.

AltaVista
http://www.altavista.com
Category: Index
AltaVista is the classical index and shows beautifully all the advantages and disadvantages of this category. Simple keywords often return millions of hits. AltaVista's strength emerges only when its extensive and impressive set of operators is used to refine the query. Reading the search tips is absolutely necessary!
AltaVista is an excellent tool when looking for very specific kinds of information. Its powerful operators make it the favourite search engine of professional researchers. Unfortunately, though, AltaVista's performance went down in 1999, probably mainly due to internal problems within the company and the fact that AltaVista seems to have lost its focus. Instead of concentrating on being "only" a good search engine, the company has recently been trying to become a portal, mail provider and even Internet service provider.

Britannica
http://www.britannica.com
Category: Catalog
Holding the record in name changes (they changed their name and URL three times during the last two years), Britannica is one of the smallest and finest catalogs on the web. The "Britannica, Encyclopaedia Britannica's Internet guide (...) classifies, rates, and reviews thousands of websites. Britannica editors search the Web to identify the highest-quality Web resources, which are then clearly and concisely described, rated according to consistent standards, and indexed for superior retrieval." (Quote: Britannica). And really: Britannica rarely returns irrelevant hits, since the editors won't include junk in their database. The downside is that quite often you don't get any hits at all. The Britannica website also lets you access the complete Encyclopaedia Britannica online.

Dogpile
http://www.dogpile.com
Category: Meta
A pretty sophisticated meta search tool, Dogpile offers you a range of operators to refine your search. It searches all of the major search engines, plus some news services, FTP and Usenet. The advanced options allow you to exclude certain types of searches or engines. Dogpile searches all engines one after another and does not exclude any duplicates.


Excite
http://www.excite.com
Category: Hybrid
Excite has left the waters of the classical search engines and now includes a vast array of extra features, like personalized news, an appointment reminder (by email), a chat and even a buddy client. A nice feature of Excite is the "More results like this" option, which recommends pages related to the selected site. Unfortunately the quality of Excite has deteriorated during the last year, as its financial problems became more and more serious.

Google
http://www.google.com
Category: Index
This new search engine uses a new approach to rank results. Whereas other search engines determine the relevance of a page in their database by looking at where and how often your search terms appear, "Google" looks at how many other sites link to a specific URL. The thought behind this method is that if a site is good, other people will link to it, and if not, they won't. And really, Google returns excellent results and is, together with Alltheweb, our current search engine of choice. Unfortunately, though, this system makes it nearly impossible for new sites with high-quality information to be found, e.g. new scientific papers that are not yet heavily linked to. A really cool feature is the "Google buttons", which can be integrated into your browser. The "Google Scout", for example, calls up a list of web pages similar to the one you are looking at.
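
The link-counting idea can be sketched in Python as follows. This is a deliberately naive illustration of "how many other sites link to a URL" and makes no claim to reproduce Google's actual ranking; the function name and the example.com-style addresses are invented for this example.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Order pages by the number of other pages that link to them.

    links: dict mapping a page URL to the list of URLs it links to.
    """
    inbound = Counter()
    for source, targets in links.items():
        inbound[source] += 0  # pages nobody links to still appear, with count 0
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore links from a page to itself
                inbound[target] += 1
    # Most linked-to pages first; ties are broken alphabetically.
    return sorted(inbound, key=lambda url: (-inbound[url], url))


links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": [],
    "new.example": ["c.example"],
}
ranking = rank_by_inbound_links(links)
```

The page new.example ends up at the bottom of the ranking because nothing links to it yet, whatever its quality, which is exactly the weakness described above for new sites.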

Hotbot
http://www.hotbot.com
Category: Index
In the beginning, Hotbot got a lot of media attention and received some very euphoric reviews. The praise was mainly due to its good graphical interface (though the colors are hideous), which allows non-computer-savvy people to refine their searches easily. In 1998 "c|net" voted it best search engine on the net. Since Hotbot was bought by Lycos, things have become rather quiet around it, but it is still worth a look.


Lycos
http://www.lycos.com
Category: Hybrid
Lycos seems to have a lot of problems: the returned hits are often not as accurate as they should be, there are a lot of duplicates, the index is not up to date and it contains a large number of dead links. On the upside, Lycos has a quite sophisticated "Advanced Search" feature, which alleviates the accuracy problems somewhat. A nice feature is that you can search specifically for pictures or sound files, and even specify in which part of the document the keyword is supposed to appear.

Metacrawler
http://www.metacrawler.com
Category: Meta
The Metacrawler is, as far as we know, the oldest meta engine on the net. Positive features of this engine are the easy-to-use interface and the low-bandwidth version.

Yahoo
http://www.yahoo.com
Category: Catalog
Yahoo is probably the best-known and biggest catalog online. Yahoo's categories and classifications generally work fine and make it a good and useful tool. Sites are submitted to Yahoo manually. Unfortunately, Yahoo contains a lot of "dead links", since it apparently does not check the validity of its links frequently, and it is extremely slow to include new sites. Webmasters often have to wait four or more months before their site shows up in Yahoo! So don't look for new sites here.


Do you have a tip or know a hot search engine? Please tell us about it!

mail to USUS
© 1998 - 2005 USUS