
(Republished with permission from the author. Originally published on the
site of HTML Indexer.)
Why Create an Index?
By David M. Brown
To understand the importance of an index to your Web site, intranet, or help system, it helps to consider the alternatives. The most prevalent means of access to information in HTML files seems to be the search engine.
Search engines
At many Web sites, you can search hundreds or even thousands of files. You may get back hundreds of links, so you struggle through the list, trying to find a link or two that leads you to useful information.
Sometimes it's hard to imagine what the link has to do with the word or phrase you searched for. This is especially true with full-text search engines.
Full-text search
In a book, the concordance lists every word in the book and every place where the word appears. Full-text search is like using a virtual concordance. A concordance is not an index.
Even if you omit the insignificant words (obvious examples include a,
and, for, and the), a concordance is still
limited to the words that actually appear in the source material.
Full-text search is even less useful than a real concordance, because you can't see the list of words. Each search is like a shot in the dark: maybe you get a hit, maybe you don't.
You can create a real concordance for your HTML files: a list of all the words in your source files with links to every place where they appear. There's even software to generate concordances automatically. But it doesn't solve the basic problem with a concordance:
- If the word isn't in the source files, it isn't included in the search
results or the concordance.
- If it is in the source files, the search results include
every occurrence of the word; it's pretty hard to separate
the wheat from the chaff.
Keyword-based search engines are an attempt to overcome these limitations.
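To make the concordance problem concrete, here is a minimal sketch in Python. The page names, the sample text, and the tiny stop list are invented for illustration; this is not how any particular concordance generator works.

```python
import re
from collections import defaultdict

# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = {"a", "and", "for", "the"}

def build_concordance(pages):
    """Map each significant word to the (page, word position) pairs where it appears.

    `pages` is a dict of page name -> plain text (an assumed input shape).
    """
    concordance = defaultdict(list)
    for name, text in pages.items():
        for position, word in enumerate(re.findall(r"[a-z']+", text.lower())):
            if word not in STOP_WORDS:
                concordance[word].append((name, position))
    return dict(concordance)

# Invented sample pages.
pages = {
    "cars.html": "The automobile needs fuel and the automobile needs oil",
    "bikes.html": "A bicycle needs no fuel",
}
concordance = build_concordance(pages)

print(sorted(concordance))      # every significant word, and nothing else
print(concordance["needs"])     # every occurrence, useful or not
print(concordance.get("car"))   # "car" never appears in the text, so no entry
```

A reader who looks up "car" gets nothing, because the pages say "automobile"; a reader who looks up "needs" gets every occurrence with no hint of which one matters.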
Keyword-based search
Rather than searching the full text of the HTML file, keyword-based search engines look for a special tag in each source file. This tag can specify synonyms, concepts, variant terms, and other words that don't appear in the body of the HTML file.
To some extent, you can use this special tag to improve your readers' chances of finding the information they're looking for. (Some full-text search engines may also benefit from the enhanced content.)
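As a rough illustration, the sketch below reads keywords from a page's head. It assumes the "special tag" is an HTML meta element with name="keywords" and a comma-separated content attribute; the article does not name the tag or the format, and the sample page is invented.

```python
from html.parser import HTMLParser

class KeywordExtractor(HTMLParser):
    """Collect terms from <meta name="keywords" content="..."> tags.

    Assumption: keywords are comma-separated, which only some engines expect.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def extract_keywords(html):
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return parser.keywords

# Invented sample page: "cars" is a keyword but never appears in the body.
page = """<html><head>
<meta name="keywords" content="cars, automobiles, fuel economy">
<title>Choosing a vehicle</title>
</head><body><p>Pick a vehicle that suits your commute.</p></body></html>"""

keywords = extract_keywords(page)
print(keywords)
print("cars" in keywords)
```

The body text never mentions "cars", yet a keyword search for that term can still find the page, which is the advantage over full-text search described above.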
At its best, searching for keywords is like using a limited virtual index.
Some search engines accept only individual words. This severely limits your ability to create meaningful access to your HTML files. (It would be unacceptable in most printed indexes!)
The search engines that do accept phrases vary in how they separate them: some use commas, others require you to enclose phrases in quotation marks. There's no single way to format the keywords that will work for every search engine.
And your readers are still shooting in the dark, because they can't see the list of keywords.
Newer search technology
Some search engines include plurals, past tense, "-ing" forms, and other variations of the word you search for. Some let you search for phrases as well as individual words. Others add Boolean logic, regular expression evaluation, or natural language processing.
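A crude sketch of how a search might treat plurals, past tense, and "-ing" forms as the same word follows. The suffix list and the minimum stem length are assumptions chosen for illustration; real engines use far more careful stemming rules, and this version will mangle many ordinary words.

```python
import re

# Assumed suffix list, checked in this order; purely illustrative.
SUFFIXES = ("ing", "ed", "es", "s")

def normalize(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matches(query, text):
    """Return True if any word in `text` normalizes to the same form as `query`."""
    target = normalize(query)
    return any(normalize(word) == target for word in re.findall(r"[A-Za-z]+", text))

print(matches("index", "She indexed the manual."))
print(matches("indexes", "Indexing takes time."))
print(matches("index", "Searching is hard."))
```

Two engines that pick different suffix lists will return different results for the same query, and the reader cannot see which rules are in force.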
Newer versions of popular browsers use ActiveX, Java, and other technologies to improve the success rate of searches or to create an interface similar to the familiar index in 32-bit help files for Microsoft® Windows®.
Drawbacks
Each search engine implements these features differently and in different combinations, so readers have to learn which ones are available and how to use them at each site they visit. Many simply don't bother.
A lot of so-called "standards" seem to have the life span of a fruit
fly. Your readers may not have the latest or most popular browsers, the
array of necessary plug-ins, or the hardware and connections needed to
use them at tolerable speeds.
For most applications, we think an index is preferable to a concordance. We also believe a real index is preferable to even the best keyword-based search engine, for the following reasons:
- All the entries are visible. Using a good index, you make discoveries:
material related to what you're looking for, terms you might not have
thought to look under, and concepts the author and indexer considered
important.
- Creating the index gives the author and indexer a chance to focus
on the content, to see the document as a whole, to identify inconsistencies,
and to improve the content of the document.
- For most of us, the most familiar access method is the one that's
been around the longest: the kind of index at the back of a book.
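By contrast with the search approaches, a hand-built index can be sketched as a visible, alphabetical list that includes terms the pages never use. The headings, link targets, and the convention that a string value means a "See" cross-reference are all hypothetical, invented for this example.

```python
# Hypothetical hand-written entries. A list value holds link targets;
# a string value is a "See" cross-reference to another heading.
index_entries = {
    "automobiles": ["cars.html#overview", "cars.html#fuel"],
    "cars": "automobiles",
    "fuel economy": ["cars.html#fuel"],
    "gas mileage": "fuel economy",
}

def render_index(entries):
    """Return the index as alphabetized lines the reader can browse."""
    lines = []
    for heading in sorted(entries, key=str.lower):
        target = entries[heading]
        if isinstance(target, str):
            lines.append(f"{heading}. See {target}")
        else:
            lines.append(f"{heading}: {', '.join(target)}")
    return lines

def resolve(entries, heading):
    """Follow "See" references until real link targets are reached.

    Assumes the cross-references contain no cycles.
    """
    target = entries[heading]
    while isinstance(target, str):
        target = entries[target]
    return target

for line in render_index(index_entries):
    print(line)
print(resolve(index_entries, "gas mileage"))
```

Every entry is on display, so a reader who thinks "gas mileage" is led to the indexer's preferred term without having to guess it.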
Related reading
The usability experts at User Interface Engineering (UIE) offer some
real-world examples in their article "Why On-Site Searching Stinks."
Learn more about search engines in UIE's other short articles.
Professional indexer Kevin Broccoli shows how an index improves the
value of a corporate intranet in the article "Improving Information
Retrieval with Human Indexing." Kevin's earlier editorial, "Indexes:
An Old Tool for a New Medium," is also excellent.
© 1998–2005 Brown Inc.
David M. Brown is president of Brown Inc. and designer of HTML Indexer.