
Web Indexing SIG, a Special Interest Group of the American Society for Indexing

(Republished with permission from the author. Originally published on the site of HTML Indexer)

Why Create an Index?

By David M. Brown

To understand the importance of an index to your Web site, intranet, or help system, it helps to consider the alternatives. The most prevalent means of access to information in HTML files seems to be the search engine.

Search engines

At many Web sites, you can search hundreds or even thousands of files. You may get back hundreds of links, so you struggle through the list, trying to find a link or two that leads you to useful information.

Sometimes it's hard to imagine what the link has to do with the word or phrase you searched for. This is especially true with full-text search engines.

Full-text search

A concordance lists every word in a book and every place where that word appears. Full-text search is like using a virtual concordance. But a concordance is not an index.

Drawbacks of full-text search (virtual concordances)

Even if you omit insignificant words (obvious examples include "a," "and," "for," and "the"), a concordance is still limited to the words that actually appear in the source material.

Full-text search is even less useful than a real concordance, because you can't see the list of words. Each search is like a shot in the dark: maybe you get a hit, maybe you don't.

You can create a real concordance for your HTML files: a list of all the words in your source files with links to every place where they appear. There's even software to generate concordances automatically. But it doesn't solve the basic problem with a concordance:

  • If the word isn't in the source files, it isn't included in the search results or the concordance.
  • If it is in the source files, the search results include every occurrence of the word—it's pretty hard to separate the wheat from the chaff.
Keyword-based search engines are an attempt to overcome these limitations.
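To make the two concordance limitations concrete, here is a minimal Python sketch that builds a concordance for a set of HTML files. It assumes the pages are already loaded into a dictionary of file name to HTML text; the file names, page contents, and the short stop-word list are hypothetical. A word that never appears, such as "troubleshooting," gets no entry, while a common word such as "printer" points to every page.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical, deliberately short stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}

class TextExtractor(HTMLParser):
    """Collects the text between tags (script and style text is not filtered here)."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def build_concordance(files):
    """Map each non-stop word to the sorted list of files it appears in."""
    concordance = defaultdict(set)
    for name, html in files.items():
        extractor = TextExtractor()
        extractor.feed(html)
        text = " ".join(extractor.chunks).lower()
        for word in re.findall(r"[a-z0-9']+", text):
            if word not in STOP_WORDS:
                concordance[word].add(name)
    return {word: sorted(names) for word, names in sorted(concordance.items())}

# Hypothetical pages for illustration.
pages = {
    "install.html": "<h1>Installing the printer</h1><p>Connect the printer cable.</p>",
    "toner.html": "<h1>Replacing toner</h1><p>Open the printer cover.</p>",
}
concordance = build_concordance(pages)
print(concordance["printer"])            # ['install.html', 'toner.html']
print("troubleshooting" in concordance)  # False
```

Every entry is generated mechanically from the words on the page, which is exactly why the result cannot contain a term the author never used.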

Keyword-based search

Rather than searching the full text of the HTML file, keyword-based search engines look for a special tag in each source file. This tag can specify synonyms, concepts, variant terms, and other words that don't appear in the body of the HTML file.

To some extent, you can use this special tag to improve your readers' chances of finding the information they're looking for. (Some full-text search engines may also benefit from the enhanced content.)
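The article does not name the special tag. The sketch below assumes the common HTML convention `<meta name="keywords" content="...">` with comma-separated keywords, and a hypothetical engine that matches a query only against those keywords. The page contents are hypothetical. A search for "ink" finds the toner page even though "ink" never appears in its body text.

```python
from html.parser import HTMLParser

class KeywordExtractor(HTMLParser):
    """Reads the content attribute of <meta name="keywords"> tags."""
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Assumes commas separate keywords; other engines use other conventions.
            self.keywords.extend(k.strip().lower() for k in content.split(",") if k.strip())

def keyword_search(files, query):
    """Return the files whose keyword tag contains the query term exactly."""
    query = query.strip().lower()
    hits = []
    for name, html in sorted(files.items()):
        extractor = KeywordExtractor()
        extractor.feed(html)
        if query in extractor.keywords:
            hits.append(name)
    return hits

# Hypothetical pages for illustration.
pages = {
    "toner.html": '<meta name="keywords" content="toner, ink, cartridge, print quality">'
                  "<h1>Replacing toner</h1>",
    "install.html": '<meta name="keywords" content="setup, installation">'
                    "<h1>Installing the printer</h1>",
}
print(keyword_search(pages, "ink"))    # ['toner.html']
print(keyword_search(pages, "setup"))  # ['install.html']
```

The reader still has to guess the right keyword: the list of keywords is hidden inside the pages.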

At its best, searching for keywords is like using a limited virtual index.

Drawbacks of keyword-based search (virtual indexes)

Some search engines accept only individual words. This severely limits your ability to create meaningful access to your HTML files. (It would be unacceptable in most printed indexes!)

The search engines that do accept phrases vary in how they separate them—some use commas, others require you to enclose phrases in quotation marks. There's no single way to format the keywords that will work for every search engine.
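A small sketch shows why one keyword format cannot serve every engine. It models two hypothetical engines: engine A splits keywords on commas, and engine B splits on whitespace and groups phrases in quotation marks (Python's `shlex.split` stands in for B's parser). Neither corresponds to a specific product.

```python
import shlex

def parse_comma_separated(content):
    """Engine A (hypothetical): commas separate keywords, so spaces stay inside a phrase."""
    return [k.strip() for k in content.split(",") if k.strip()]

def parse_quoted_phrases(content):
    """Engine B (hypothetical): whitespace separates keywords; quotes group a phrase."""
    return shlex.split(content)

written_for_a = "paper jam, toner, print quality"
written_for_b = '"paper jam" toner "print quality"'

print(parse_comma_separated(written_for_a))  # ['paper jam', 'toner', 'print quality']
print(parse_quoted_phrases(written_for_a))   # ['paper', 'jam,', 'toner,', 'print', 'quality']
print(parse_quoted_phrases(written_for_b))   # ['paper jam', 'toner', 'print quality']
print(parse_comma_separated(written_for_b))  # ['"paper jam" toner "print quality"']
```

The same three keywords come out correctly only when each engine reads the format written for it; read by the other engine, the phrases either break apart or collapse into one meaningless keyword.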

And your readers are still shooting in the dark, because they can't see the list of keywords.

Newer search technology

Some search engines include plurals, past tense, "-ing" forms, and other variations of the word you search for. Some let you search for phrases as well as individual words. Others add Boolean logic, regular expression evaluation, or natural language processing.
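Matching plurals, past tense, and "-ing" forms is usually done by reducing words to a common stem. The following is a deliberately naive suffix-stripping sketch, not the Porter stemmer or any engine's actual algorithm; the function names are hypothetical. It also shows how crude rules fail: "jammed" becomes "jamm," which no longer matches "jam."

```python
import re

def crude_stem(word):
    """Strip one common English suffix (-ing, -ed, -s) if enough of the word remains."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters so short words such as "bed" or "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_match(query, text):
    """True if any word in text has the same crude stem as the query."""
    target = crude_stem(query)
    return any(crude_stem(w) == target for w in re.findall(r"[a-z]+", text.lower()))

print(stem_match("print", "Printing a test page"))   # True
print(stem_match("printed", "How to print labels"))  # True
print(stem_match("print", "Replacing the toner"))    # False
print(crude_stem("jammed"))                           # jamm
```

Each engine makes its own choices about rules like these, which is one reason the same query behaves differently from site to site.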

Newer versions of popular browsers use ActiveX, Java, and other technologies to improve the success rate of searches or to create an interface similar to the familiar index in 32-bit help files for Microsoft® Windows®.

Drawbacks

Each search engine implements these features differently and in different combinations, so readers have to learn which ones are available and how to use them at each site they visit. Many simply don't bother.

A lot of so-called "standards" seem to have the life span of a fruit fly. Your readers may not have the latest or most popular browsers, the array of necessary plug-ins, or the hardware and connections needed to use them at tolerable speeds.

Benefits of a real index

For most applications, we think an index is preferable to a concordance. We also believe a real index is preferable to even the best keyword-based search engine, for the following reasons:

  • All the entries are visible. Using a good index, you make discoveries: material related to what you're looking for, terms you might not have thought to look under, and concepts the author and indexer considered important.
  • Creating the index gives the author and indexer a chance to focus on the content, to see the document as a whole, to identify inconsistencies, and to improve the content of the document.
  • For most of us, the most familiar access method is the one that's been around the longest: the kind of index at the back of a book.
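As a rough illustration of "all the entries are visible," the sketch below renders a small back-of-the-book style index as a single HTML list. The terms, link targets, and the data layout (a "see" reference for a synonym, "see also" for related terms) are hypothetical choices for this example, not the design of any particular indexing tool.

```python
from html import escape

# Hypothetical index data: each term maps to its locators (link text, target)
# and to related terms ("see also"); a synonym redirects with "see".
INDEX = {
    "toner": {"links": [("replacing", "toner.html#replace")], "see_also": ["print quality"]},
    "print quality": {"links": [("streaks and smudges", "quality.html#streaks")], "see_also": ["toner"]},
    "ink": {"see": "toner"},
    "installation": {"links": [("printer", "install.html")], "see_also": []},
}

def render_index(index):
    """Render every entry, in alphabetical order, as one HTML list."""
    lines = ["<ul>"]
    for term in sorted(index, key=str.lower):
        entry = index[term]
        if "see" in entry:
            # A "see" reference sends the reader from a synonym to the preferred term.
            lines.append(f"  <li>{escape(term)}. <i>See</i> {escape(entry['see'])}</li>")
            continue
        links = ", ".join(f'<a href="{escape(url)}">{escape(label)}</a>' for label, url in entry["links"])
        item = f"  <li>{escape(term)}: {links}"
        if entry.get("see_also"):
            item += ". <i>See also</i> " + "; ".join(escape(t) for t in entry["see_also"])
        lines.append(item + "</li>")
    lines.append("</ul>")
    return "\n".join(lines)

print(render_index(INDEX))
```

Unlike a search box, the rendered list lets a reader who looks under "ink" discover that the material is filed under "toner," and from there follow "see also" to "print quality."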

Related reading

The usability experts at User Interface Engineering (UIE) offer some real-world examples in their article Why On-Site Searching Stinks. Learn more about search engines in these short articles by UIE.

Professional indexer Kevin Broccoli shows how an index improves the value of a corporate intranet in the article Improving Information Retrieval with Human Indexing. Kevin's earlier editorial, Indexes: An Old Tool for a New Medium, is also excellent. (Formerly at www.contentius.com.)


© 1998–2005 Brown Inc. 

David M. Brown is president of Brown Inc. and designer of HTML Indexer.
