(Republished with permission from the author. Originally published on the site of HTML Indexer)
Why Create an Index?
By David M. Brown
To understand the importance of an index to your Web site, intranet, or help system, it helps to consider the alternatives. The most prevalent means of access to information in HTML files seems to be the search engine.
At many Web sites, you can search hundreds or even thousands of files. You may get back hundreds of links, so you struggle through the list, trying to find a link or two that leads you to useful information.
Sometimes it's hard to imagine what the link has to do with the word or phrase you searched for. This is especially true with full-text search engines.
In a book, a concordance lists every word in the book and every place where that word appears. Full-text search is like using a virtual concordance. A concordance is not an index.
Even if you omit the insignificant words (obvious examples include a, and, for, and the), a concordance is still limited to the words that actually appear in the source material.
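To make the limitation concrete, here is a minimal Python sketch of a concordance builder. The function name, the sample pages, and the four-word stop list are all assumptions made for illustration; this is not how any particular concordance tool works.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (the four examples named in the text);
# a real list would be much longer.
STOP_WORDS = {"a", "and", "for", "the"}

def build_concordance(pages):
    """Map each significant word to the sorted list of pages where it appears.

    `pages` is a dict of {page name: plain text}. Both the name and the
    structure are hypothetical, chosen only for this sketch.
    """
    concordance = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                concordance[word].add(name)
    # Return a plain dict with words in alphabetical order.
    return {word: sorted(names) for word, names in sorted(concordance.items())}

pages = {
    "cars.html": "The automobile and the truck",
    "repair.html": "Repair for a truck",
}
concordance = build_concordance(pages)

print(concordance["truck"])   # ['cars.html', 'repair.html']
print("car" in concordance)   # False
```

A reader who looks up "car" finds nothing, even though one page is plainly about automobiles: the concordance can only offer the words the author happened to use.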
Full-text search is even less useful than a real concordance, because you can't see the list of words. Each search is like a shot in the dark: maybe you get a hit, maybe you don't.
You can create a real concordance for your HTML files: a list of all the words in your source files with links to every place where they appear. There's even software to generate concordances automatically. But it doesn't solve the basic problem with a concordance: it is still limited to the words that actually appear in the source material.
Rather than searching the full text of the HTML file, keyword-based search engines look for a special tag in each source file. This tag can specify synonyms, concepts, variant terms, and other words that don't appear in the body of the HTML file.
To some extent, you can use this special tag to improve your readers' chances of finding the information they're looking for. (Some full-text search engines may also benefit from the enhanced content.)
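The article does not name the special tag. The sketch below assumes it is the HTML `<meta name="keywords">` element, and shows roughly how a keyword-based engine might read it using Python's standard `html.parser` module. The class and function names are hypothetical, and real search engines differ in how they parse and weigh this tag.

```python
from html.parser import HTMLParser

class KeywordTagParser(HTMLParser):
    """Collect terms from <meta name="keywords" content="..."> tags.

    Assumption: keywords are comma-separated, which is only one of the
    formats that search engines accept.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords.extend(
                term.strip().lower() for term in content.split(",") if term.strip()
            )

def extract_keywords(html):
    parser = KeywordTagParser()
    parser.feed(html)
    return parser.keywords

page = """<html><head>
<meta name="keywords" content="automobile, car, vehicle repair">
</head><body><p>How to fix your truck.</p></body></html>"""

keywords = extract_keywords(page)
print(keywords)           # ['automobile', 'car', 'vehicle repair']
print("car" in keywords)  # True
```

Here a search for "car" succeeds even though the word never appears in the body of the page, which is the advantage the keyword tag offers over full-text search.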
At its best, searching for keywords is like using a limited virtual index.
Some search engines accept only individual words. This severely limits your ability to create meaningful access to your HTML files. (It would be unacceptable in most printed indexes!)
The search engines that do accept phrases vary in how they separate them: some use commas, others require you to enclose phrases in quotation marks. There's no single way to format the keywords that will work for every search engine.
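The two conventions mentioned above can be sketched in Python as follows. Both parsers are simplified assumptions for illustration, not the behavior of any specific search engine.

```python
import shlex

def parse_comma_separated(keywords):
    """Commas separate the terms, so a phrase may contain spaces."""
    return [term.strip() for term in keywords.split(",") if term.strip()]

def parse_quoted(keywords):
    """Spaces separate the terms; phrases must be wrapped in double quotes.

    shlex.split follows shell-style quoting, so a stray apostrophe in the
    input would raise ValueError. A real engine would need its own rules.
    """
    return shlex.split(keywords)

comma_terms = parse_comma_separated("index, search engine, concordance")
quoted_terms = parse_quoted('index "search engine" concordance')
print(comma_terms)   # ['index', 'search engine', 'concordance']
print(quoted_terms)  # ['index', 'search engine', 'concordance']

# The same keyword string read under the wrong convention:
mismatched = parse_quoted("index, search engine, concordance")
print(mismatched)    # ['index,', 'search', 'engine,', 'concordance']
```

The last call shows the practical problem: a keyword list formatted for one convention loses its phrases, and picks up stray punctuation, when it is read under the other.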
And your readers are still shooting in the dark, because they can't see the list of keywords.
Newer search technology
Some search engines include plurals, past tense, "-ing" forms, and other variations of the word you search for. Some let you search for phrases as well as individual words. Others add Boolean logic, regular expression evaluation, or natural language processing.
Newer versions of popular browsers use ActiveX, Java, and other technologies to improve the success rate of searches or to create an interface similar to the familiar index in 32-bit help files for Microsoft® Windows®.
Each search engine implements these features differently and in different combinations, so readers have to learn which ones are available and how to use them at each site they visit. Many simply don't bother.
A lot of so-called "standards" seem to have the life span of a fruit fly. Your readers may not have the latest or most popular browsers, the array of necessary plug-ins, or the hardware and connections needed to use them at tolerable speeds.
For most applications, we think an index is preferable to a concordance. We also believe a real index is preferable to even the best keyword-based search engine, for the following reasons:
Professional indexer Kevin Broccoli shows how an index improves the value of a corporate intranet in the article "Improving Information Retrieval with Human Indexing" (ex http://www.intranetjournal.com/features/humanindex-1.shtml). Kevin's earlier editorial, "Indexes: An Old Tool for a New Medium," is also excellent (ex www.contentius.com).
© 1998–2005 Brown Inc.