Search Engine Fundamentals:

The term Search Engine has become a catch-all phrase for all kinds of search services. Without these free resources it would be much more difficult to find anything anywhere on the ’Net. Searching is fundamental to gathering information on the Internet. You can search different areas of the ’Net, such as Usenet (commonly called newsgroups) or the World Wide Web, by using different search services. Each service has its own way of compiling and collecting information.

There are two main kinds of search services commonly used on the Web: the index, and the directory or subject guide. One way to think of the difference between these two kinds of engines is to think of web sites as books. An index catalogs every word in every book it looks at, and lists for you each page that contains the word(s) you’re looking for. A directory or subject guide takes the overall subject matter of the books it looks at and lists the front covers of the books that match your word(s).

You’ve probably heard of Alta Vista and HotBot, both popular search indexes. Indexes regularly scan the Internet for Web pages and record the HTML content and key words. They can also follow any links on the pages they scan to gather even more information.

The job of compiling data for indexes is done by spiders (also called robots, bots, or crawlers, hence the names HotBot and WebCrawler), software programmed by a human to automatically gather information from all over the ’Net based on specific or broad search criteria. Most of the time spiders scan pages on the fly, without the owner’s knowledge or consent (if you don’t want some or all of your web pages scanned by spiders, you can write some HTML into your page, such as a robots meta tag, to keep them out).
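The two behaviors described above, following links and being kept out by HTML in the page, can be sketched in a few lines of Python. This is only an illustration, not how any particular spider works: the class name PageScanner, the scan function and the sample pages are made up, and the example assumes the common robots meta tag (a meta element named "robots" with "noindex" and/or "nofollow" in its content) as the HTML that keeps spiders out.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Hypothetical spider helper: collects links and honors a robots meta tag."""

    def __init__(self):
        super().__init__()
        self.links = []          # href values found in <a> tags
        self.may_index = True    # False if the page says "noindex"
        self.may_follow = True   # False if the page says "nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            rules = [rule.strip() for rule in content.split(",")]
            if "noindex" in rules:
                self.may_index = False
            if "nofollow" in rules:
                self.may_follow = False


def scan(html):
    """Return (may_index, links_to_follow) for one page of HTML."""
    scanner = PageScanner()
    scanner.feed(html)
    links = scanner.links if scanner.may_follow else []
    return scanner.may_index, links


# Made-up sample pages for illustration.
open_page = '<html><body><a href="/spiders.html">Spiders</a></body></html>'
closed_page = ('<html><head><meta name="robots" content="noindex, nofollow"></head>'
               '<body><a href="/private.html">Private</a></body></html>')

print(scan(open_page))    # (True, ['/spiders.html'])
print(scan(closed_page))  # (False, [])
```

A well-behaved spider would skip the second page and ignore its links. Nothing forces a spider to obey the tag, so it works only with robots that choose to honor it.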

The advantage of this kind of service is that its databases are very large and are updated often by spiders working around the clock. Spiders catalog Web pages in a computational manner without human intervention. A search engine’s spider catalogs all the pages of a given web site, and the engine lists for you only the pages that match the words or phrases you’re searching for.
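To make the idea of cataloging every word on every page more concrete, here is a minimal sketch of a word-to-page index in Python. The page addresses and text are invented for the example, and the function names build_index and lookup are hypothetical; real services are far larger and more elaborate than this.

```python
import re
from collections import defaultdict

# Made-up pages standing in for what a spider might have collected.
pages = {
    "http://example.com/arachnids": "Spiders are eight-legged arachnids",
    "http://example.com/robots": "Web spiders and robots scan pages",
    "http://example.com/modems": "How to choose a modem",
}


def build_index(pages):
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def lookup(index, word):
    """Return the addresses of pages containing the word, in sorted order."""
    return sorted(index.get(word.lower(), set()))


index = build_index(pages)
print(lookup(index, "spiders"))
# ['http://example.com/arachnids', 'http://example.com/robots']
```

Because the index knows only which words appear on which pages, a search for spiders returns both the arachnid page and the Web robot page.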

For instance, if you’re looking for information about spiders, you’ll get over thirty-nine thousand hits (links to Web pages) from Alta Vista, each with the word “spiders” in it. This means you’ll get not only pages referencing Internet robots; mostly you’ll get the eight-legged, living-in-your-shoe-and-going-to-bite-you kind of spider.

A drawback to using services of this type is that sifting through so many hits to find what you’re looking for is sometimes a daunting task. Some indexes include options that help you narrow your search, such as “search for this exact phrase” or “search for any of the words” on HotBot.
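The difference between those two options can be shown with a short Python sketch. It is an illustration of the general idea only, not HotBot’s actual matching rules; the function names and sample text are made up.

```python
import re


def words(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any(query, text):
    """True if the page contains at least one of the query words."""
    page_words = set(words(text))
    return any(w in page_words for w in words(query))


def matches_phrase(query, text):
    """True if the query words appear in the page next to each other, in order."""
    q, p = words(query), words(text)
    if not q:
        return False
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))


page_a = "Web spiders crawl the Internet"
page_b = "Spiders live on the web"

print(matches_any("web spiders", page_a), matches_any("web spiders", page_b))        # True True
print(matches_phrase("web spiders", page_a), matches_phrase("web spiders", page_b))  # True False
```

Searching for any of the words matches both pages, while the exact phrase matches only the page about Web spiders, which is why the phrase option cuts down the number of hits.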

On the other hand, Yahoo! and Magellan are hierarchical directories of web page subjects. Each reference is entered and updated manually by a person, who places each web address in a certain context, much like your telephone company’s Yellow Pages directory.

People catalog the sites in a directory, so the hits often include reviews and/or recommendations, which can guide you through the content of the pages more quickly and easily.

To have a Web site listed in a directory you must submit it yourself, or you can hire a company to do it for you. The directory has the last word on where it catalogs your site. This means directories contain far fewer sites than indexes do, but their listings are better targeted to the word(s) you use to search.

For example, enter the same key word “spiders” in Yahoo!, and this time you’ll get a list of categories like “Science: Zoology: Animals, Insects, and Pets: Arachnids” or “Computers and Internet: Internet: World Wide Web: Searching the Web: Robots, Spiders, etc. Documentation”, which can narrow and shorten your search significantly. You’ll get fewer hits overall, and the hits will be on pages whose headings and content fit the context of the keywords you enter.
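A directory search of this kind can be sketched in Python as a lookup over category paths and site titles rather than over the full text of every page. The categories below are shortened versions of the ones mentioned above, the site titles are made up, and the function name search_directory is hypothetical; this is not how Yahoo! itself is implemented.

```python
# Made-up directory: category path -> site titles filed there by a human editor.
directory = {
    ("Science", "Zoology", "Arachnids"): ["Nancy's Page-O-Spiders"],
    ("Computers and Internet", "Searching the Web", "Robots, Spiders, etc."): ["Guide to Web Robots"],
    ("Computers and Internet", "Hardware", "Modems"): ["Modem Buyer's Guide"],
}


def search_directory(directory, keyword):
    """Return (category, site) pairs where the keyword is in a category name or a site title."""
    keyword = keyword.lower()
    hits = []
    for path, sites in directory.items():
        in_category = any(keyword in part.lower() for part in path)
        for site in sites:
            if in_category or keyword in site.lower():
                hits.append((": ".join(path), site))
    return hits


for category, site in search_directory(directory, "spiders"):
    print(category, "->", site)
```

Each hit comes back with its category attached, so the two meanings of spiders are already sorted into separate contexts.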

One drawback is that Yahoo!’s hits are usually to home pages (the first page of a site) only. For instance, it would hit a home page called Nancy’s Page-O-Spiders but not Nancy’s Home, which contains a page exclusively on spiders. Another drawback is that manually updating directories is tedious and time-consuming, which means old sites that are no longer valid (dead links) are often listed long after their demise.

Some search services, like Infoseek and Excite, use both schemes: they are both an index and a directory. These services occasionally send out a spider to collect and cull Web sites, while people catalog the sites that Web developers submit.

Yahoo!’s directory is one of the best on the Web, but its service is limited. To fill the gaps, Yahoo! teamed up with Alta Vista to automatically send your query there if your Yahoo! search finds no matching hits.

As a rule of thumb, if I’m not exactly sure what I’m looking for within a broad subject, like modems, I’ll start with a directory, which will show me lots of modem brand hits and companies that sell modems. But if I know I’m looking for information about a specific brand of modem, I’ll use an index, which will show me many sites with that particular brand name listed somewhere on the page.

No one service catalogs the whole web. Each service logs parts of it and there is overlap. Services also put their own spin on how they rank hits. For instance, some advertisers pay for their sites to be listed on some services, so their sites get priority listing and can show up in your search even if they have nothing to do with what you’re looking for. Knowing this, it’s a good idea to use more than one search service when you’re looking for something.

There are hundreds of search tools out there, so don’t use only the big, popular ones. There are even specific search services for special interests, such as art or science. Some search for email addresses, home addresses or phone numbers; some search Usenet only; and some search several of these and more. Look for all-in-one search engines like Dogpile or Metafind that enter your key words into many engines at once, which results in the first ten or so hits from each of several services listed on one seemingly unending page.
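The all-in-one approach can be sketched in Python as one function that passes the same query to several engines and lists the first few hits from each, one engine after another. The two engines here are hypothetical stand-ins that return canned addresses; this is not a description of how Dogpile or Metafind actually work.

```python
def metasearch(query, engines, per_engine=10):
    """Send one query to every engine and list the first few hits from each, in turn."""
    combined = []
    for name, search in engines.items():
        for hit in search(query)[:per_engine]:
            combined.append((name, hit))
    return combined


# Hypothetical stand-ins for real services; each returns a canned list of hits.
def engine_a(query):
    return [f"http://a.example/{query}/{n}" for n in range(1, 4)]


def engine_b(query):
    return [f"http://b.example/{query}/{n}" for n in range(1, 3)]


results = metasearch("spiders", {"Engine A": engine_a, "Engine B": engine_b}, per_engine=2)
for name, hit in results:
    print(name, hit)
```

The combined list simply stacks each service’s top hits, which is why the resulting page can grow so long.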

Try using a variety of search services using your favorite hobby as the key word(s) and you’ll see the radically different hits you’ll get with each directory and index. No one service is perfect, so use as many as you have time for. Using many search engines will also help you get a feel for how the different kinds of services work. You’ll soon find yourself using a favorite engine to find all the information you need quickly and painlessly.

The Internet is in a state of constant change. Internet addresses disappear as fast as or faster than new ones are created. Many sites relocate without telling anyone, and dead listings are everywhere, so finding a page that’s moved often requires using more than one engine.

Happy searching!