Many Scribe procedures include a step to perform text checks in a sam or ScML file using regular expressions. These search patterns may indicate errors in files. This page provides more information about how and why to review files for errors and inconsistencies related to italic terms, phrases, and titles.
The <(i|tnw|cite)>([^<]*)</\1> search can be found on the Regular Expressions Resource Page and returns results as part of the Sublime Text Regular Expression Result Counts package.
Using a sam or ScML file, run this regular expression to pull all the italic words and phrases from a file. Sort and review the results. In addition to the i style for italics, this search includes the accessibility styles tnw (title, name, or work) and cite (citation).
This search identifies italic text in the file, some of which could represent titles or names of newspapers, academic journals, and other publications. Other italic text may be foreign or scientific terms, emphasized phrases, or ship names.
There are many reasons to handle elements consistently in publications:
Deviations make it difficult to find information, tripping up searches in environments such as e-readers that may not be as sophisticated as Google. While some programs can compensate for certain variations, differences in spelling and punctuation or the use of ligatures can change the results a search returns.
Regularized, consistent treatments make it easier to augment information later. The promise of XML is an archival, extensible file. With this in mind, variation, whether in structure or in content, works against the purpose of XML. Indeed, Scribe would make the case that consistency across an imprint is even more desirable than consistency within a single book. Thus, for example, we would suggest index and abbreviation canons for publishers to employ across all their titles.
Solidifying a manuscript by cleaning up all possible errors prior to pages is a requirement for the successful deployment of the ScML2PDF process. Additionally, all our data demonstrate that the earlier in the workflow errors are fixed, the less effort is required.
Of course, the reader’s experience should be considered as well. An inconsistency in the presentation of titles (e.g., using abbreviated titles that do not reflect the full intent of the book title) can be jarring and distracting. Anything that distracts a reader or requires an extra act of interpretation can be detrimental to the reading experience and hinder the goal of the publication.
Sublime Text Search
Using Sublime Text on a sam or ScML file, search for <(i|tnw|cite)>([^<]*)</\1> using Find All.
Paste the results into a new document.
Use the Permute (Unique) function to remove duplicate entries (Edit > Permute Lines > Unique).
Use the Sort Lines function to place the results in alphabetical order (Edit > Sort Lines).
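For readers who want to see what the search and cleanup steps above do, here is a minimal Python sketch that approximates them outside Sublime Text. It is an illustration only, not part of the Scribe procedure: the function name `unique_sorted_matches` and the sample text are hypothetical, and the sort is a simple case-insensitive one that may not order every character exactly as Sublime Text does.

```python
import re

# Same pattern as the Sublime Text search: an opening i, tnw, or cite tag,
# any run of characters that contains no "<", and the matching closing tag.
ITALIC_PATTERN = re.compile(r"<(i|tnw|cite)>([^<]*)</\1>")


def unique_sorted_matches(text):
    """Return each distinct full match once, sorted case-insensitively."""
    # A set drops duplicates, like Permute Lines > Unique.
    matches = {m.group(0) for m in ITALIC_PATTERN.finditer(text)}
    # Sort alphabetically, like Sort Lines; the second key breaks ties.
    return sorted(matches, key=lambda s: (s.lower(), s))


if __name__ == "__main__":
    # Hypothetical sample text, used only for illustration.
    sample = (
        "<p>See <i>Music and Letters</i> and <tnw>Music &amp; Letters</tnw>; "
        "compare <i>Music and Letters</i> again.</p>"
    )
    for line in unique_sorted_matches(sample):
        print(line)
```

Because the pattern uses the backreference \1, a closing tag must match its opening tag, and because [^<]* stops at the first "<", italic text that contains a nested tag is not returned by this search.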
Review the Results
Scroll through the results and take note of any terms, phrases, or titles that may be incorrect or inconsistent within the file. Common problems include the following:
Shortened titles that are not sensible or do not match the corresponding aspect in the full title
Punctuation errors (e.g., quotation marks, em dashes, parentheses, or brackets that should open and close with the same italic or roman treatment; punctuation on abbreviations and acronyms that should be included within the italic treatment, as in “<i>120 lb</i>.” or “<i>The Man from U.N.C.L.E</i>.”)
Spacing errors (e.g., the treatment of initials)
Inconsistencies between new content (e.g., indexes, praise pages) and the existing, edited material
In some cases, apparent inconsistencies are intentional and correct, such as the following:
Citation formatting vs. the presentation in the body of the book (e.g., one section may use sentence case while another uses title case)
Instances within quoted material
Occurrences within the book that discuss the different terms or treatments specifically
Small Caps, Bold, and Other Styles
While italics are the most common place to find inconsistencies in the presentation of terms, phrases, and titles, some books may use small caps, bold, or other character styles that should be reviewed for these issues.
If needed, pull those phrases with a modified version of the italics search: replace the style names in the first set of parentheses with the style name to review.
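The substitution can be sketched in Python as a small helper that builds the pattern from a list of style names. This is a sketch under assumptions: the helper name `style_pattern` is hypothetical, and `sc` and `b` are placeholder style names, so use the names that actually appear in the file.

```python
import re


def style_pattern(style_names):
    """Build the search pattern for the given character style names."""
    # Join the names with "|" inside the first set of parentheses,
    # exactly where i|tnw|cite sits in the italics search.
    alternatives = "|".join(re.escape(name) for name in style_names)
    return re.compile(r"<(" + alternatives + r")>([^<]*)</\1>")


# "sc" and "b" are hypothetical style names, used only for illustration.
pattern = style_pattern(["sc", "b"])
sample = "<sc>nato</sc> and <b>Note</b> but not <i>Titanic</i>"
found = [m.group(0) for m in pattern.finditer(sample)]
print(found)
```

The rest of the expression stays unchanged, so the results can be deduplicated, sorted, and reviewed in the same way as the italic results.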
Examples of Possible Errors
These examples show possible errors. The search results could represent completely different books, updated editions, subsequent volumes, and so on. These aspects cannot be determined without context, but this search provides a good basis for further investigation.
Example 1: Inconsistent Pluralization
Civil War West: Testing the Limits of the United States
Civil War Wests: Testing the Limits of the United States
Example 2: Inconsistent Punctuation
Portland Oregon: Its History and Builders
Portland, Oregon, Its History and Builders
Portland, Oregon: Its History and Builders
Example 3: Inconsistent Capitalization
Report of the Adjutant General of the State of Oregon, For the Years 1865–6
Report of the Adjutant General of the State of Oregon, for the Years 1865–6
Example 4: Inconsistent Treatment of Quotation Marks
“<i>indirect empathy”</i> for participants
Example 5: Inconsistent Spelling
Music & Letters
Music and Letters
Note: Ampersands (&) will appear as the coded &amp; in sam and ScML files.
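As an optional aid to the manual review, variants like those in Examples 2, 3, and 5 can be grouped automatically by comparing a simplified form of each entry. The Python sketch below is an assumption-laden illustration, not a Scribe tool: the function names `comparison_key` and `group_variants` are hypothetical, and the simplification (removing tags, decoding entities, treating an ampersand as "and", and ignoring case and punctuation) is one possible choice. It does not catch differences in the words themselves, such as the pluralization in Example 1, so it supplements the review and does not replace it.

```python
import html
import re
from collections import defaultdict


def comparison_key(entry):
    """Reduce an entry to lowercase words so minor variations compare equal."""
    text = re.sub(r"</?\w+>", "", entry)             # drop tags such as <i> and </i>
    text = html.unescape(text).replace("&", " and ")  # &amp; and & both become "and"
    text = re.sub(r"[^\w\s]", " ", text.lower())      # ignore case and punctuation
    return " ".join(text.split())                     # collapse runs of whitespace


def group_variants(entries):
    """Return sorted groups of distinct entries that share a comparison key."""
    groups = defaultdict(set)
    for entry in entries:
        groups[comparison_key(entry)].add(entry)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


if __name__ == "__main__":
    # Entries taken from the examples on this page.
    entries = [
        "<i>Music &amp; Letters</i>",
        "<i>Music and Letters</i>",
        "<i>Portland Oregon: Its History and Builders</i>",
        "<i>Portland, Oregon: Its History and Builders</i>",
        "<i>Civil War West: Testing the Limits of the United States</i>",
    ]
    for group in group_variants(entries):
        print(group)
```

Each group that is printed still needs a decision from the reviewer, since the variants may be intentional or may refer to different works.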