You Must Implement an XML Workflow

By Neil Litt of Princeton University Press


Princeton University Press produces multivolume, heavily illustrated textbooks filled with tables and references. When I started over twenty-five years ago, I considered a markup-based workflow that would capture the structure of a document, so that the content could be delivered through different applications without depending on any one layout format.

In the past, there were too many serious shortcomings for anyone publishing upper-level textbooks. An overwhelming number of technical problems still exist. Even so, we chose to adopt an XML workflow at Princeton. The benefits extend beyond technical capacity: They directly address our editorial and production process. I don’t regret implementing XML.

Before offering an explanation about why a structured workflow is so important, I would like to present a little history. About twelve years ago, it looked like e-books were going to take off. The Rocket e-book reader looked like it would be the next big thing. NetLibrary, among others, was banging on our door. We hired a marketing executive whose sole focus was to negotiate the “electronicizing” of our content, and she spent the next eighteen months painstakingly negotiating contracts with library aggregators and vendors like Microsoft and Amazon. Anyway, e-books went nowhere, and we shut down our e-book initiative.

Then in 2007, the Kindle and the Sony e-readers got the buzz going all over again. The same year, ePub became the official international standard for e-books. XML seemed to be the gateway to producing not only quality ePub files but also online scholarship through library aggregators and e-textbook distributors like Oxford Scholarship Online and, more recently, JSTOR. XML also seemed to be the gateway to the chunking of academic books that would surely be the foundation for the next generation of course packs. These were all good reasons to move to an XML workflow.

So we began a process to create an XML workflow that would be the least disruptive to our print production. First, there were a few systems we needed to look at more closely. I invited different systems’ representatives to the Press to address a working group I put together that included copyeditors, production editors, and designers. Our group interrogated each representative on the different data structures we regularly encountered as we published a wide range of books—including social science, applied mathematics, upper-level textbooks in mathematics and the physical sciences, humanities monographs, and natural history field guides. Representatives needed to field questions such as, “How do you code poetry embedded in a footnote?”

We determined that the optimal workflow for us would begin with the receipt of the manuscript from the author. The production editor would tag every element of the manuscript in MS Word before passing it on to the copyeditor. The designer would define corresponding tags in InDesign before passing it on to the compositor. Once we chose the vendor whose tagging system seemed most capable of handling our diverse materials, we invited the typesetters to whom we outsourced much of our composition, and the freelance copyeditors who worked on most of our manuscripts, to participate in two days of training, either in person or via webinar. Our staff’s training involved two additional sessions. The outside typesetters and copyeditors were very accepting of the process; they positively embraced it.
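The tagging step described above can be pictured as a fixed mapping from named Word styles to XML elements. What follows is only a minimal sketch in Python: the style names, the element names, and the `to_xml` helper are hypothetical illustrations, not the vendor's tag set or the Press's actual tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical style-to-element mapping; the real tag set is not given in this essay.
STYLE_TO_TAG = {
    "Chapter Title": "chapter-title",
    "Body Text": "para",
    "Footnote": "footnote",
    "Poetry Line": "verse-line",
}

def to_xml(styled_paragraphs):
    """Convert (style_name, text) pairs into a flat XML chapter element."""
    root = ET.Element("chapter")
    for style, text in styled_paragraphs:
        if style not in STYLE_TO_TAG:
            # An untagged paragraph is exactly the ad hoc case the workflow is meant to catch.
            raise ValueError(f"untagged style: {style!r}")
        ET.SubElement(root, STYLE_TO_TAG[style]).text = text
    return root

manuscript = [
    ("Chapter Title", "Introduction"),
    ("Body Text", "XML captures structure."),
]
xml_text = ET.tostring(to_xml(manuscript), encoding="unicode")
print(xml_text)
```

Rejecting unknown styles, rather than guessing a tag for them, reflects the point of the exercise: every element is named once, up front, and everyone downstream sees the same names.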

Our digital asset manager accepted our files and gave us a significant discount for providing XML. E-book sales increased exponentially and are now 10 percent of our sales and rising. The process does not work with every e-book and is certainly not a panacea for solving technical problems.

Our digital asset manager, despite that substantial discount, never told us directly that they were not actually using the XML. Why? Because we, as an industry, were not all using the same DTD. Thus all their conversions (all outsourced) had to go through a PDF-to-ePub workflow. Also, for scholarly books, having real page numbers accessible in the e-reader was and is an essential data point for researchers. Since our XML is incomplete in the sense that it does not tag the page breaks, the PDF, which does, provides an essential piece of information that the XML cannot currently retain. Of course, we thought XML would solve many problems like this, but it does not.
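To make the page-break gap concrete, here is a minimal sketch of what tagging page breaks could look like and how a conversion step might read them. It assumes a hypothetical empty `pagebreak` element carrying the print page number in an `n` attribute; our XML does not contain such markers, and the element name, sample text, and function are illustrations only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <pagebreak n="13"/> marks where print page 13 begins.
SAMPLE = """<chapter>
  <para>First paragraph on page twelve.</para>
  <para>Starts on page twelve <pagebreak n="13"/>and ends on thirteen.</para>
  <para>Entirely on page thirteen.</para>
</chapter>"""

def paragraph_start_pages(xml_text, first_page):
    """Return the print page on which each paragraph begins."""
    root = ET.fromstring(xml_text)
    current = first_page
    starts = []
    # iter() walks elements in document order, so a page break is seen
    # before any paragraph that follows it.
    for element in root.iter():
        if element.tag == "pagebreak":
            current = int(element.get("n"))
        elif element.tag == "para":
            starts.append(current)
    return starts

start_pages = paragraph_start_pages(SAMPLE, first_page=12)
print(start_pages)
```

With markers like these in the XML, an e-book conversion could in principle recover print page numbers without going back to the PDF.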

So why bother to use an XML workflow? It achieved some other very desirable outcomes unrelated to our reasons for initially implementing it. The XML workflow got everyone on the same page. Now, there is consistency in formatting from start to finish where, previously, there was every kind of ad hoc coding throughout the process. Most importantly, we arrived at a standard process within InDesign. Now everyone wants to work with the consistent IDTT (InDesign Tagged Text) files: the designers, who no longer have to write specs because a properly prepared IDTT file is better than a dozen pages of type specs, and the typesetters, who were previously receiving those type specifications written a dozen different ways. This process has also stimulated a heightened consciousness in the designers to anticipate and account for all possible combinations of elements. This, in turn, results in fewer problems at the proof stage.

The production editors have only positive things to say about their own heightened awareness of the structure of text that results from their tagging the manuscripts in preparation for editing. It is also easier to extract Word files with styles intact from consistently prepared InDesign files when the need arises to return files to the authors for revised editions. Someday all these structured documents will be in our archive, ready for the next new content delivery device. It is an investment in an uncertain future that is also reaping unanticipated benefits in the present.