Understand XML First

By David Alan Rech of Scribe Inc.

It is encouraging that publishers are discussing XML-first strategies. Having a structured publication that you can easily convert from one form to another is essential for publishers’ survival.

Augmenting our publications with metadata and semantic tagging is also helpful. Yet XML-first strategies are still being confused with the technical process of applying left and right angle brackets to content. Many still believe that books must be "tagged" in order to be XML, and that belief is problematic. Tags in the manuscript interfere with the editorial and production process, so confusing XML with tagging invites error. Moreover, focusing on the technical application of tags rather than on the methodology of XML draws our attention away from the practices that produce good XML and efficient workflows. If we do not overcome these problems, we may fail to realize the promise of XML and jeopardize our survival.

Even with macros, keyboard shortcuts, or other time-saving methods, applying left and right angle brackets disrupts the publishing process. It interferes with copyediting and increases error rates. Even the most tech-savvy, experienced editors have difficulty reading around code. At the very least, the brackets slow copyediting, break the flow of the sentence, and mask punctuation problems. Errors introduced at the copyediting stage lead to longer proofing and correction cycles, not to mention mounting frustration. Editors who use change-tracking features (e.g., Track Changes in Microsoft Word) on text that contains literal codes introduce more errors and spend more time accepting and rejecting changes. Those who do not use these tools face the same problems and also lose an easy way to control changes. Further, authors are almost universally ill-equipped to read coded material, which adds another layer of difficulty. Finally, before a book can be printed, the codes must be turned off or replaced in the typesetting files.

Merely having a tag does not mean that you can take advantage of XML. If publications are not consistently structured, or if structure varies from one book to another, the value of XML is lost. You cannot batch process books that do not share the same characteristics, "chunk" content that does not match, or automatically derive consistent publications from inconsistent markup. An e-first or e-only strategy is likewise easier to pursue when material can be processed by consistent, automated routines.
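To make the batch-processing point concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`book`, `chapter`, `title`, `head`), the two sample books, and the helper functions `chapter_titles` and `batch_titles` are hypothetical, chosen only for illustration; they are not drawn from any publisher's actual schema. Both samples are well-formed XML, yet a single extraction rule works on only one of them.

```python
import xml.etree.ElementTree as ET

# Two hypothetical books. Book A marks every chapter title the same way;
# Book B was tagged by different hands, so one title uses another element.
BOOK_A = """<book>
  <chapter><title>Beginnings</title><p>Text.</p></chapter>
  <chapter><title>Middles</title><p>Text.</p></chapter>
</book>"""

BOOK_B = """<book>
  <chapter><title>Beginnings</title><p>Text.</p></chapter>
  <chapter><head>Middles</head><p>Text.</p></chapter>
</book>"""


def chapter_titles(xml_text):
    """Return each chapter's title, or None where no <title> child exists.

    The rule assumes one agreed convention: a <title> inside every <chapter>.
    """
    root = ET.fromstring(xml_text)
    return [chapter.findtext("title") for chapter in root.iter("chapter")]


def batch_titles(books):
    """Apply the same rule to every book and list chapters it could not read."""
    report = {}
    for name, xml_text in books.items():
        titles = chapter_titles(xml_text)
        # Chapter numbers (1-based) where the convention was not followed.
        missing = [i + 1 for i, t in enumerate(titles) if t is None]
        report[name] = (titles, missing)
    return report


if __name__ == "__main__":
    results = batch_titles({"A": BOOK_A, "B": BOOK_B})
    for name, (titles, missing) in results.items():
        print(name, titles, "missing chapters:", missing)
```

Book B parses without complaint, so the failure has nothing to do with a lack of angle brackets; the rule breaks because the two books disagree about structure. No amount of code repairs that disagreement after the fact; it has to be settled by the people producing the books.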

Achieving consistency requires that everyone involved in editing and producing your books agree on the structure of your content. It requires consistent nomenclature and practice (i.e., consistently following the rules). Well-formed documents are the product of good practices, communication, project management, documentation, style guides, and the like. In other words, XML is the result of focused training and education, not merely technology. Only when your publishing house can produce consistent, well-formed publications can you derive publishable angle-bracketed content from an automated process.

In order to implement an XML-first strategy, you must understand XML first.