
(Republished with permission from the author. Originally published November
17, 1998, in Contentious.)
Indexes: An Old Tool for a New Medium
By Kevin Broccoli
"I know it's here but I just can't find it!"
You've probably heard that exclamation in a variety of situations.
Today, however, it seems that people often experience this kind of frustration
when trying to locate specific information within HTML documents. This
is especially true concerning "content-rich" Web sites.
Search-Engine Shortcomings
Perhaps you've had the following experience: You visit a Web site hoping to find
information about a particular topic. You type a keyword or two into the site's search
engine. What do you find? Nothing! The search engine says, "0 results have been found
for your search."
So you try once more, this time using a different search term. Now you do
get some results, but too many. The search engine now says, "47 documents have
been retrieved." That's more than you wanted or expected.
Still, you start looking through those documents one at a time. After several hours
spent scanning many pages of text, you discover that only 4 of those 47 documents
contain the information you sought.
Exasperated, you wonder why so much information was presented to you, when so very
little of it met your needs. You also wonder what might have happened had you not
entered that precise search term into the search engine.
The root of the problem lies in how search engines perform searches. Put
simply, they scan text looking for occurrences of whatever word you typed
into the search box. Then, they list every single document that contains
even the merest mention of the word.
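The behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page names and text in PAGES are invented, and keyword_search is a hypothetical helper, not the code of any real search engine.

```python
# Hypothetical mini-site: page names and text are invented for illustration.
PAGES = {
    "grooming.html": "Brush your dog weekly. The collie sheds heavily.",
    "feeding.html": "Feeding a dog properly matters. Vitamins and minerals are essential.",
    "history.html": "Dogs were kept by every king of the dynasty.",
}


def keyword_search(pages, term):
    """Return every page whose text contains the term, however briefly."""
    term = term.lower()
    return sorted(name for name, text in pages.items() if term in text.lower())


# No page uses the literal word "nutrition", so the search comes back empty,
# even though feeding.html is about exactly that topic.
print(keyword_search(PAGES, "nutrition"))

# Every page mentions "dog" somewhere, so every page is listed,
# including the one that mentions dogs only in passing.
print(keyword_search(PAGES, "dog"))
```

Both failure modes from the scenario show up here: a miss when the reader's word differs from the author's, and a flood when the word is merely mentioned.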
What Makes a Good Index
The Internet is a relatively new medium, but you can learn a lot about how to make
online content work well from the "parent" of online media: print media.
Most printed reference or nonfiction books offer an index of some kind. An index is not
a blind, mechanical catalog of words. Rather, it is created by an indexer.
Indexers are trained to analyze concepts. An indexer will physically read every
page of a book and develop a list of page references that lead to information on various
topics, individuals, or places covered in that book.
The goal of an index is to direct readers to pertinent information
on each topic listed, rather than passing mentions. This requires the
indexer to make many judgment calls; that is, to consider context
as well as content.
Indexers also categorize concepts: they break down main subject headings
into subtopics, in a hierarchical format. This structure helps readers "narrow"
their search.
A well-written index assumes that the reader may not know specific terms used in the
text. Therefore, an indexer will use a thesaurus to create index entries that are synonyms
of the terms used within the text. This ensures that even if readers don't know the
exact words used in a text, they still will be directed to pages that discuss the topic
sought.
A well-written index also lists topics that are implied, rather than stated
directly in the text. Consider the example of a book about dogs that does not include a
section devoted to canine food or nutrition, but that does discuss (in various
places) the importance of feeding a dog properly, and also what vitamins and minerals are
essential to canine health.
It is likely that readers would turn to this book seeking information on
dog food or nutrition, so the book's index should include the terms "nutrition"
and "food," with references to relevant pages.
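One way to picture such an index is as a small data structure in which some entries hold page references and others point readers to the preferred term. The sketch below is a rough illustration only: the entries, the page numbers, and the look_up helper are all invented for the dog-book example, and a "see" cross-reference is just one of several ways an indexer might handle synonyms.

```python
# Hypothetical index for the dog book; all page numbers are invented.
INDEX = {
    "nutrition": {"pages": [12, 47, 103]},  # implied topic, gathered by the indexer
    "food": {"see": "nutrition"},           # synonym a reader might try first
    "diet": {"see": "nutrition"},
    "vitamins": {"pages": [47]},
}


def look_up(index, term):
    """Follow 'see' cross-references until an entry with page numbers is found."""
    seen = set()
    key = term.lower()
    while key in index and key not in seen:
        seen.add(key)  # guards against circular cross-references
        entry = index[key]
        if "pages" in entry:
            return entry["pages"]
        key = entry["see"]
    return []


print(look_up(INDEX, "food"))
```

A reader who looks up "food" or "diet" lands on the same pages as one who looks up "nutrition," even though the book has no section by any of those names.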
Web Indexes vs. Book Indexes
Indexes obviously are useful and appropriate for books. However, they also
can work well for Web sites. A Web site index offers the same benefits
over a search engine that a book index offers over a concordance.
What's a concordance? A concordance lists every single
occurrence of each individual word of significance contained in a specific
text. This is similar to the results produced by a search engine. If
you look at a large concordance (such as Strong's
Exhaustive Concordance of the Bible), you'll see how
many listings are possible for a single word. (For instance, in Strong's
Exhaustive, try looking up the word "king.") Therefore, for
most purposes a concordance isn't as useful as an index.
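To make the comparison concrete, here is a minimal sketch of how a concordance could be built mechanically. The sample text, the short STOP_WORDS list, and the build_concordance function are assumptions made for illustration; a published concordance is far more elaborate.

```python
import re
from collections import defaultdict

# Assumed list of words too common to be "of significance".
STOP_WORDS = {"a", "an", "the", "of", "and"}


def build_concordance(pages):
    """Map each significant word to every (page, word position) where it occurs."""
    concordance = defaultdict(list)
    for name, text in pages.items():
        words = re.findall(r"[a-z']+", text.lower())
        for position, word in enumerate(words):
            if word not in STOP_WORDS:
                concordance[word].append((name, position))
    return dict(concordance)


# Invented sample text.
sample = {
    "p1": "The king spoke. The king left.",
    "p2": "No king came.",
}
concordance = build_concordance(sample)
print(concordance["king"])
```

Every occurrence of "king" is recorded with equal weight. Nothing in the process judges which passage is actually about kings, which is the judgment an indexer supplies.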
In some respects, the process of creating an index for a Web site is similar to
creating an index for a book. For instance, a Web indexer will read through every
page in the site, analyze the concepts discussed, and develop an index that lists the
topics covered in the text.
One key difference between a book index and a Web index is hypertext.
In a Web index, the references listed can (and should) be live links
that take the user directly to the relevant text in the site. Live links
make a Web index not merely informative, but functional. Some examples
of Web site indexes that utilize live links are:
Ideally, a Web site indexer should know how to modify the HTML code
of Web pages, in order to create hyperlinks. Specifically, indexers
should know how to create an "anchor" in the Web page where
the text referenced in a particular index entry begins (if no anchor already
exists at that location), and then make the index entry a live link to
that anchor.
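As a rough sketch of that anchor-and-link pairing, the Python below generates the two pieces of HTML involved. The function names, the file name care.html, and the anchor name are hypothetical. The sketch uses the id attribute as the link target; pages from the era of this article more often used a named anchor (an a element with a name attribute) for the same purpose.

```python
from html import escape


def anchor_tag(anchor_id):
    """HTML to place in the page where the referenced text begins."""
    return f'<a id="{escape(anchor_id, quote=True)}"></a>'


def index_entry(label, page_url, anchor_id):
    """HTML for one index entry that links straight to the anchor."""
    href = escape(f"{page_url}#{anchor_id}", quote=True)
    return f'<li><a href="{href}">{escape(label)}</a></li>'


# Hypothetical page and anchor names.
print(anchor_tag("nutrition"))
print(index_entry("Nutrition", "care.html", "nutrition"))
```

The part of the link after the # sign must match the anchor in the page exactly; otherwise the link opens the page but does not jump to the relevant text.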
Updating is an important issue for both print and online
indexes. However, updating a Web index typically involves incremental
maintenance. (Index updates for books are infrequent, major projects.)
Most Web sites evolve constantly, from minor modifications to small sections of
text, to the addition or deletion of entire content sections. Also, existing content can
be moved to a different page or directory within the site.
In order for a Web index to remain useful, it must keep pace with the site's
evolution. Few things are more frustrating to a user than broken or outdated links in a
site's own index.
Consequently, there should be regular, frequent communication between the
site's developers and the indexer. Whenever significant content is
modified, moved, added, or deleted, the indexer should be informed. Then,
the indexer should immediately update the index to reflect the current
state of content on the site.
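Part of this upkeep can be checked mechanically. The sketch below assumes the indexer keeps a list of index entries with their link targets, plus an inventory of the pages and anchors that currently exist on the site. Both structures, and the find_broken_entries function, are hypothetical and shown only to illustrate the idea.

```python
def find_broken_entries(index_entries, site_anchors):
    """Return index entries whose page or anchor no longer exists on the site."""
    broken = []
    for label, target in index_entries.items():
        page, _, anchor = target.partition("#")
        if page not in site_anchors:
            broken.append(label)  # the page was moved or deleted
        elif anchor and anchor not in site_anchors[page]:
            broken.append(label)  # the page exists, but the anchor is gone
    return broken


# Invented example data.
INDEX_ENTRIES = {
    "Nutrition": "care.html#nutrition",
    "Grooming": "grooming.html#brushing",
    "Breeds": "breeds.html",
}
SITE_ANCHORS = {
    "care.html": {"nutrition"},
    "grooming.html": {"bathing"},
}
print(find_broken_entries(INDEX_ENTRIES, SITE_ANCHORS))
```

A check like this only finds links that no longer resolve. It cannot tell whether new content deserves a new entry, which is why the indexer still needs to hear from the site's developers.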
Is It an Index or Not?
A quick look around the Web reveals that the term "index" is much
misunderstood by Web developers and publishers. In fact, most Web reference tools labeled
"site index" are not indexes at all!
Most people know what an index is, from having used them in printed books. Therefore,
when a visitor sees a link on your site that says "site index," he or she may
click on that link expecting to encounter a real index. However, if that link leads to a
different type of guide, it might cause confusion, frustration, or disappointment.
If the guide or reference tool you've created for your Web site
is not a true index, it's helpful to your visitors if you call it
by its correct name.
The site guides and tools described below are not indexes,
but they commonly are mislabeled as such. Examples of sites that have
made this mistake also are listed:
- A table of contents, even a very detailed one, is not an index.
It is very common for a site's table of contents to be mislabeled
as a site index; in fact, it's more common to see this mistake
than to see true Web site indexes that are labeled correctly! A similar
misunderstanding could lead to a site map being mislabeled as
an index.
See: Sears, Chase Manhattan Bank, WebReference.com, and The Beer Info Source
- A collection of links to related Web sites or other resources
is not an index.
See: Family Tree Maker
Sometimes it can be hard to tell whether a particular site guide is an
index or some other kind of tool. For instance, at first glance the "index"
of the Association
for Health Services Research Web site appears to be a true index.
It is ordered alphabetically, and some entries (such as "About AHSR")
include subtopics.
However, this page is a sophisticated table of contents, not a true
index. All of its entries directly reflect the site's structure
(how information is divided into sections and pages). The list is not
really broken down by subject. For instance, while this list includes
entries for "Job and Resume Binder Order Form" and "Career
Center," there is no subject-based entry for "Jobs."
Not Every Site Needs an Index
Some types of Web sites would not benefit significantly from an index.
For instance:
- Online stores: These sites may be large, but since
they usually have very little content (in the conventional sense) the
only essential information retrieval tool is a search engine. Amazon.com
is a good example of this. There, users simply type in the title of the
book or CD sought, or perhaps the author's or musician's name,
and they are led to a page featuring information about the book or CD.
- Smaller sites: When a visitor can click through a
site's complete contents in a matter of minutes, an index would
not add much value.
In contrast, many types of sites would serve their visitors better by offering an
index. This is especially true of online magazines or other content-rich sites.
For example, 21st Century Online
publishes articles by professionals in various disciplines. Although a
reader can simply "drill down" through the current selection
of articles on the site, this becomes increasingly difficult as more and
more articles are published.
Even Hotwired (the
online counterpart of Wired magazine) does not yet have a site
index. However, an index would be especially helpful for finding specific
information in this venue's four years' worth of archives.
Working with (or as) an Indexer
If you decide that your Web site needs an index, you then must decide whether to hire
someone to create it, or whether to do it yourself.
If your site is very content-rich, you're probably better off investing in hiring
a professional indexer. This also could be a good decision for sites that are smaller or
less complex, as long as the budget is available.
Remember: the goal of an index is to improve the usability of a Web site.
Therefore, considering an indexer as a usability professional could help justify this
investment.
However, if your site is not especially large, or if there is no budget
to hire an indexer, or if you simply wish to learn a new skill, it is
possible to teach yourself enough about the basics of indexing to attempt
this project. A few resources that can help you learn how to create an
index are:
- "Organizing
Your Site from A to Z: Creating an index for users who know what
they're doing"
This article by Lou Rosenfeld, published in Web Review in Oct.
1997, covers the basics of what makes a good index. It also outlines
a four-step process for creating a Web site index.
- The American Society of Indexers (ASI) offers a bibliography
of resources for and about indexing, as well as a list of frequently asked questions
on this topic. They also have published a document specifically about
Indexing the Web.
The ASI site also has an Index Evaluation Checklist,
which can help you determine whether an index is appropriate and complete.
While this document does not specifically address Web indexing, many
of its points apply to Web indexes.
Indexing also can be a lucrative line of work. Although most available indexing
work is for print media (books, etc.), indexes are becoming increasingly common in online
and digital media (Web sites, Intranets, CD-ROMs, etc.). For writers, editors, producers,
or Web developers, indexing can be one more valuable service to market to your clients.
The ASI is a
good resource for people who seek to become professional indexers. This
group's indexing FAQ
covers several key points about the "business side" of this
field.
Conclusion
Whether your site has an index or not, or whether you learn to create indexes or not,
learning about indexing can prove valuable to anyone who develops or uses Web sites.
Understanding indexes makes Web developers and publishers consider what their users
would want to find, and how those searches could be simplified or aided. Similarly, Web
users who understand the value of a good index can encourage Web publishers to add this
key usability tool to their sites.
It's even possible that, one day, indexes might be considered as
indispensable to informational or content-rich Web sites as they are to
printed reference books today.
© 1998-2006 Kevin Broccoli
Kevin Broccoli is president of Broccoli
Information Management.