Book Metadata and Discoverability

By Michael Maher of Scribe Inc.


Over the course of the past year, “metadata” and “discoverability” have become popular buzzwords in the publishing industry. Many publishers, at the behest of digital marketers, have begun to recognize the importance of metadata, especially in a digital environment. Metadata is a broad term with several layers of meaning. Discoverability is not quite as daunting, but it still eludes many of us.

Earlier this year, Scribe’s President, David Rech, wrote a detailed article on defining, producing, and using metadata. For a comprehensive breakdown of the different types of metadata and their various applications, David’s article should be your starting point.

That article was broad in scope; this one focuses more narrowly on how book metadata and discoverability relate to each other. In its simplest definition, metadata is “data about data.” In the book publishing world, then, it essentially means “data about books.” As readers (or at least potential readers), we can use book metadata not just to gather information about a book but to find (or discover) that book in the first place. This is where discoverability and metadata overlap. When you consider that online channels now account for more than half of all book sales, this connection becomes even more important.

Metadata is, of course, not a new concept in book publishing. Books have been categorized and organized based on bibliographic data for hundreds of years (it seems like just yesterday we were all gushing over Melvil Dewey’s 1885 book, Decimal Classification and Relativ Index for arranging, cataloging, and indexing public and private libraries and for pamflets, clippings, notes, scrap books, index rerums, etc.). The ways in which we organize, display, and use book metadata, though, have changed drastically over the years. The two biggest changes over the past two decades have been the rise of online channels for purchasing books and the development of electronic (or digital) book formats.

With the emergence of online booksellers like Amazon, the change was obvious, at least at first: to sell books through this outlet, publishers had to rethink their use of metadata. Readers shopping online could no longer browse the organized sections of a bookstore to find books. Physical browsing gave way to digital browsing, with pages of search results in place of rows of shelves. In the early years of the internet and online purchasing, a user’s options were limited.

Once the capabilities of the internet and online transactions caught up with the ideas behind them (high-speed internet, search engines, interactive websites, and, eventually, the ability to connect from mobile devices), though, the options were nearly limitless. These advances made publishers’ use of metadata even more important, since readers now had access to more books than ever before.

The question for publishers became: “How do we use metadata correctly to ensure that readers can search for and discover our books?” Once people began regularly using search engines like Google to look for everything from books to pop culture news to Halloween costumes, the need for publishers to use metadata effectively to increase the discoverability of their books became greater than ever.

Some naysayers downplay the importance of search to discoverability. Andrew Rhomberg, founder of Jellybooks, wrote the following back in January of this year:

Search, be it via Google, Amazon, Barnes & Noble or Kobo, is not the answer. Most readers search by title or author, which implies they are already aware of the book, meaning they have already “discovered” it and are not on the path to interest -> desire -> action (searching by category is the exception to the rule here and the one segment where search plays a discovery function).

But this view is perhaps (at least slightly) narrow. It might hold for popular fiction titles or popular authors, but what if I am a student looking for a book on volunteers during the War of 1812? I do not know what I want yet, at least in terms of title or author. Going to Amazon or Barnes & Noble and wading through categories and pages of results is not going to be my first stop. Instead, I am going to turn to what I always turn to, my preferred search engine, and type in something along the lines of “book on volunteers in the war of 1812.” When I do, I will find Edward Skeen’s Citizen Soldiers in the War of 1812.

This is how most people use their computers, tablets, and smartphones in 2014, and it is why virtually every major browser now builds search capability right into the address bar. Users have a preferred method for finding the information they are seeking, and they turn to that method first. Only when it lets them down do they look elsewhere. If publishers overlook the importance of discoverability at a time when search engines are the first stop for most people, they will likely be disappointed with the results.


Earlier, we mentioned that the two biggest recent changes to the use of metadata were online bookselling and the development of electronic books. Everything said above about metadata applies even more strongly to e-books. E-books receive metadata in the same ways print books do: it is provided to online sellers when the book is added and uploaded to the website, and similar (in some cases identical) information appears on the copyright page.

However, e-books carry additional metadata. An .epub file (ePub being the open e-book standard developed by the International Digital Publishing Forum), for example, is essentially a ZIP archive, so it can be unzipped with a zip utility or opened with third-party software that lets you see inside it. Once it is unzipped, you will discover that an ePub actually has metadata stored within it.

You can find the following metadata stored inside most ePubs:

  1. Subject (and BISAC code)
  2. Language
  3. Date of publication
  4. Date/Time this version was created
  5. Description
  6. Title
  7. Author
  8. e-ISBN

This metadata stays with the book wherever it goes and, depending on the online venue, can improve the book’s discoverability. Some online sellers, like Amazon, also require the publisher or author to enter additional metadata when uploading the book (usually with limits on how much can be included). Others may simply use the metadata already included in the e-book.
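If you would like to look at this metadata without a dedicated tool, the following is a minimal Python sketch using only the standard library. It assumes the usual ePub layout: the archive’s META-INF/container.xml names the package (.opf) file, and that file’s metadata section holds Dublin Core elements such as dc:title, dc:creator, dc:language, dc:subject, and dc:identifier (where the e-ISBN is typically recorded). The function name read_epub_metadata is our own, and the sketch collects only Dublin Core elements; other entries, such as the modification timestamp that EPUB 3 stores in a meta element, are skipped.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the ePub container file, the package (.opf) file,
# and the Dublin Core metadata elements inside it.
NS = {
    "container": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC_PREFIX = "{" + NS["dc"] + "}"


def read_epub_metadata(epub_file):
    """Return {element name: [values]} for the Dublin Core metadata in an ePub.

    epub_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as epub:
        # META-INF/container.xml records where the package file lives.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find(".//container:rootfile", NS)
        package = ET.fromstring(epub.read(rootfile.get("full-path")))

    metadata = package.find("opf:metadata", NS)
    result = {}
    for element in metadata:
        # Keep only dc:* elements (title, creator, language, subject, ...);
        # other entries, such as EPUB 3 <meta> elements, are skipped.
        if element.tag.startswith(DC_PREFIX):
            name = element.tag[len(DC_PREFIX):]
            result.setdefault(name, []).append((element.text or "").strip())
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_epub_metadata.py book.epub
    for name, values in read_epub_metadata(sys.argv[1]).items():
        print(f"{name}: {'; '.join(values)}")
```

Reading the container file first, rather than assuming a fixed path such as OEBPS/content.opf, matters because different production tools place the package file in different folders.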


We have, thus far, stressed the importance of metadata and its relationship to discoverability. What we have not yet sufficiently covered is how best to use metadata to build that relationship. Below are some recommendations:

  • Use BISAC codes and the BISAC Subject Headings List to your advantage. Most publishers are already familiar with the BISAC Subject Headings List, the Book Industry Study Group (BISG) standard that most companies and book distributors throughout the supply chain use to categorize books by topic and content. If you are not familiar with this standard, make learning about it a priority. Correct BISAC information is crucial to making sure your book is categorized properly.
  • Research your venues. As mentioned earlier, each online channel handles metadata differently. Amazon, for example, has its own metadata guidelines and places certain limits on the use of metadata. Before sending a book off to distributors or uploading it to the various online venues, make sure you know their guidelines, restrictions, and preferences for metadata. Some venues allow more than others, and some may even remove your book from their listings if you do not follow their guidelines.
  • Use metadata without overusing it. Probably the biggest mistake publishers and self-publishing authors make with metadata is overuse: providing much more metadata than is necessary and, in most of those cases, providing excess metadata that is not entirely accurate. Overuse can be just as harmful as providing too little metadata. When it comes to metadata, remember the old cliché “quality over quantity.” You want your book to be categorized accurately based on its subject matter, with its placement true to the topics of the book and the expectations of its readers. Overuse and misuse of metadata will likely hurt sales in a number of ways, the most common results being reader disappointment, negative reviews, and, ultimately, the suppression of the book’s listing on the site. For example, your novel cannot (at least accurately and honestly) be categorized as “POL049000 POLITICAL SCIENCE / Propaganda” just because your main character watches Triumph of the Will in school.
  • Find and use keywords. Just as keywords are important for website search engine optimization (and the discoverability of websites), they are equally important for your books. The first step is knowing your keywords; the next is incorporating them. At the very least, keywords can be included in the description or summary of the book, but many online booksellers also provide a separate field just for keywords. The same rule of quality over quantity applies here as well.
  • Learn about recommendation algorithms. If you do any online shopping for books or other products, you have likely seen recommendations provided to you. All of the points above apply here too, since each of them can affect whether your book is recommended to readers, and where. While most booksellers are not transparent about how their algorithms generate reader recommendations, you can still learn enough about them to develop a good understanding of how they work. Your book metadata is, of course, crucial to these algorithms, as it is the primary source of information about your book.
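Two of the recommendations above, correct BISAC codes and keywords that actually appear in the book’s description, can be checked mechanically before a book goes out to distributors or retailers. The Python sketch below is hypothetical: the record layout and the check_metadata function are our own illustration, not any retailer’s or BISG’s format. It checks only that each code has the BISAC shape of three letters followed by six digits, and that each keyword occurs somewhere in the description.

```python
import re

# A BISAC subject code is three uppercase letters followed by six digits,
# e.g. "FIC000000". This checks the shape only; it does not confirm that
# the code appears in the current BISAC Subject Headings List.
BISAC_CODE = re.compile(r"[A-Z]{3}[0-9]{6}")


def check_metadata(record):
    """Return warnings for a hypothetical metadata record (a plain dict)."""
    warnings = []
    for code in record.get("bisac_codes", []):
        if not BISAC_CODE.fullmatch(code):
            warnings.append(f"malformed BISAC code: {code!r}")
    # Naive case-insensitive substring test for each keyword.
    description = record.get("description", "").lower()
    for keyword in record.get("keywords", []):
        if keyword.lower() not in description:
            warnings.append(f"keyword not used in description: {keyword!r}")
    return warnings


if __name__ == "__main__":
    record = {
        "bisac_codes": ["FIC000000", "POL04900"],  # second code is one digit short
        "keywords": ["coming of age", "propaganda"],
        "description": "A coming of age story set in 1930s Germany.",
    }
    for warning in check_metadata(record):
        print(warning)
    # malformed BISAC code: 'POL04900'
    # keyword not used in description: 'propaganda'
```

A real check would compare codes against the current BISAC list and would need smarter keyword matching than a substring test, which, for instance, treats “war” as present in “software.”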

Treating metadata as a distant afterthought, or ignoring it completely, is a mistake. It can lead to your book being miscategorized by distributors or booksellers, which can make the book difficult to find and damage sales. Metadata is a massive, ever-changing subject, and the long-winded breakdown above is only the tip of a very large (and growing) iceberg.

Michael Maher is the Lead Electronic Book Developer at Scribe Inc. He manages complex XML conversion and electronic book projects, performs quality control checks at the XML and e-book stages, and trains employees and clients on XML, HTML, CSS, and e-books. He oversees Scribe's social media channels and is involved in the development of ScML and the Well-Formed Document Workflow.