Indexing across Titles

By Mark Fretz of Scribe Inc.


Indexes are important components of publications: indexers create them to help people access the material being indexed. An index is ancillary to the primary content of the authored work, whatever form that content takes (e.g., book, journal or newspaper article, website, blog, newsletter). As such, we must remember that the index is metadata. Because indexes give people access to the content of books, journals, and so forth, publishers consider them valuable (i.e., they add value to the content), and customers view them as indispensable.

With the exception of journal indexes, indexes are rarely considered collectively. This seems odd, because publishers are usually deliberate and systematic about their investments and about the processes they employ to make their content easier to access.

Publisher-Wide Indexing

While indexes should be considered value-added metadata, we suggest that they should not be seen as ancillary. Instead, indexes should be considered as part of the content.

We further suggest that indexing should take place across titles. When approached collectively rather than on a title-by-title basis, in which there is no substantive connection between titles, indexes take on a new dimension. We can take one small, conceptual step in this direction by focusing attention on multivolume books with indexes that cover all volumes. Such indexes must be formatted consistently to look the same in each volume. More important, the entries themselves must be consistent and the locators comprehensive.

Expanding this concept further, what if publishers approached the indexing of all their titles as if the entire list were a multivolume work? Although it might not seem as if titles have a direct connection to each other, they are all products of the same house, and as a best practice, publishers logically want to increase access to their content (i.e., make their products easier to find and thus buy). The basic rules governing index creation across a publisher’s entire list would necessarily require consistency. One means of achieving this consistency is to create a glossary of indexable and indexed terms that would serve as a master list of index entries that have appeared in all the publisher’s indexes. This glossary could be developed as a terminology guideline for the publisher’s indexers (even when the indexes are created by authors).

When an indexer produces an index, he or she would first search the index glossary for possible entries that would work. If a satisfactory entry already exists, then the default would be to use the entry from the glossary. If no satisfactory entry exists, then the indexer would create a new entry for the index being created, which would be added to the glossary for use in a future index.
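The glossary-first lookup described above can be sketched in a few lines of code. The Glossary class and its in-memory storage are hypothetical illustrations of the workflow, not a description of any existing tool:

```python
# Sketch of a glossary-first lookup: prefer an established entry,
# otherwise add the new term so future indexes can reuse it.
# The Glossary class and its storage format are hypothetical.

class Glossary:
    """A master list of index entries used across a publisher's titles."""

    def __init__(self):
        self.entries = set()

    def find(self, term):
        """Return candidate entries matching the term (case-insensitive)."""
        needle = term.lower()
        return sorted(e for e in self.entries if needle in e.lower())

    def resolve(self, term):
        """Use an existing glossary entry if one exists; otherwise add the term."""
        matches = self.find(term)
        if matches:
            return matches[0]   # default to the established entry
        self.entries.add(term)  # new entry becomes available to future indexes
        return term
```

In a real system the glossary would persist between projects; the point of the sketch is only the decision order: search first, reuse if possible, add otherwise.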

Another aspect would be the creation of an index repository or database related to the glossary. This repository would hold all published indexes in a single, searchable place. Indexers, editors, proofreaders, or anyone involved in the production of the book or journal could search this database while preparing a new index to see how a given entry was handled in previous titles to achieve consistency across the publisher’s entire list.
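One way such a repository might be realized is as a small searchable database. The schema and function names below are a hypothetical sketch under the assumption of a simple relational store, not a description of any existing system:

```python
# Hypothetical sketch of an index repository: every published index entry
# is stored with its title, so staff can see how a term was handled before.
import sqlite3

def build_repository():
    # A real system would use a shared file or database server,
    # not an in-memory connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE index_entries (
        title TEXT,      -- book or journal the index belongs to
        entry TEXT,      -- main heading as it appeared in print
        subentry TEXT    -- optional subheading
    )""")
    return conn

def how_was_it_handled(conn, term):
    """Return (title, entry, subentry) rows whose entry matches the term."""
    return conn.execute(
        "SELECT title, entry, subentry FROM index_entries WHERE entry LIKE ?",
        (f"%{term}%",),
    ).fetchall()
```

An editor preparing a new index could then query the repository for a proposed heading and see every prior form of it across the list.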

Taking the technological component of indexing across titles one step further, the publisher could turn the indexing process on its head. In this upside-down indexing world, indexers would no longer create indexes as ancillary materials generated after proof pages; rather, the index creation process would migrate upstream in the publishing chain. Indexers would perform their work during the authoring, developmental editing, and copyediting stages of content creation. This could involve increased interaction and consultation with the content producers prior to finalizing the content, as opposed to having no contact with the book until it has already been typeset. There are various ways to implement this type of process, but perhaps the most logical way is to move toward embedded indexes, which would reverse the sequence of producing indexes at the proof pages stage and dispense with the creation of indexes as ancillary documents independent of the primary content. At the same time, in an ironic twist, any given index could be published as a separate, linked document to improve the discoverability of the book to which it is connected.
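The embedded-index idea can be illustrated with a minimal sketch: entries are marked directly in the source text, and the index is generated from those anchors, so repagination carries the locators along automatically. The marker syntax here is invented for illustration:

```python
# Sketch: index entries embedded in the manuscript as {{ix:term}} markers
# (an invented syntax, not a standard). The index is generated from the
# anchors, so locators always reflect the current pagination.
import re
from collections import defaultdict

MARKER = re.compile(r"\{\{ix:([^}]+)\}\}")

def build_index(pages):
    """pages: list of page texts. Returns {entry: [page numbers]}."""
    index = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        for term in MARKER.findall(text):
            if page_no not in index[term]:
                index[term].append(page_no)
    return dict(index)
```

Production systems embed index markers in richer formats (e.g., word-processor fields or XML elements), but the principle is the same: the index lives inside the content rather than alongside it.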

Benefits of Indexing across Titles

Publishers could reap several benefits from indexing across titles:

  1. Consistency. Customers would come to expect indexes to look a certain way, contain certain entries, and reflect the publisher’s brand (i.e., imprint identity).

  2. Efficiency. Indexing across titles would eliminate the need to repaginate an index based on changes during the proof pages stage of production; any changes to the pages would automatically be reflected in the index. Using an index glossary would simplify index creation and reduce the time it requires. Consequently, copyediting the index would be easier and faster and would require less quality control time for finding and fixing errors.

  3. Cost savings. Publishers would still need to invest in creating indexes that add value (i.e., thinking indexes vs. automatically generated concordances). That cost might remain relatively constant; instead, the cost savings would show up in fewer hours of corrections to proof pages and the elimination of rush fees when going to press. Publishers producing e-book versions of their titles could eliminate the cost of linking the indexes altogether, because the indexes would already be anchored to pages and linking could be automated in the conversion process.

  4. Increased discoverability. Consistent indexing across titles would increase the discoverability of content because the same terms would repeatedly surface in both the broadest Internet searches and searches of proprietary databases, such as those employed by distributors or library systems. When searching the Internet or a database, discoverability depends in part on keywords tagged in the content. While keywords can be determined based on frequency of use and then tagged automatically, the index can provide a more robust and meaningful keyword list. The natural language component of searching—predominantly natural language processing (NLP), which is part of the realm of artificial intelligence—requires a large enough quantity of data to determine patterns of usage. Discoverability based on natural language is not limited to terms tagged as keywords in a document. On the one hand, an index created by a human is based solely on the content of the book or journal that it indexes. On the other hand, that index also contains the results of decisions related to more than just the literal words written in the document—that is, such an index interprets the content within the larger realm of all human knowledge. The index makes the content more accessible because the content has been interpreted by human means: it increases the findability of individual elements of the content.
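The contrast between automatically tagged keywords and a human-made index can be seen in how crude frequency-based extraction is. The sketch below is a minimal word-count approach with an illustrative stop-word list, not a real NLP pipeline:

```python
# Minimal sketch of frequency-based keyword extraction: the automatic
# approach that a human-made index improves on. The stop-word list is
# illustrative, not exhaustive.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "on"}

def frequency_keywords(text, n=5):
    """Return up to n of the most frequent non-stop-words as keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]
```

Such a list captures only what the text literally says most often; it cannot supply the interpretive cross-references and synonym choices that a human indexer adds.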

Expanding Our Vision of Indexes

Indexes are increasing in importance and value, and publishers need to develop strategies to benefit from these value-added components of their products. In its simplest form, indexing across titles can be implemented by creating a master glossary of index entries based on indexes from backlist titles or by keeping a glossary from this day forward. This concept can become much more complex and sophisticated, with only a publisher’s vision and resources limiting its potential benefits. It is time to expand our vision of what indexes are, how we create them, how we produce them, and what their value is to us as publishers. If we are paying for indexes, we should get the highest return on our investment.