OCR Standards

When OCR is performed, the resulting file must reproduce the content of the source materials with at least 99.9 percent accuracy and meet the standards below. Files will be reviewed using Scribe’s OCR Verification procedure.
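As a rough illustration of what a 99.9 percent threshold means in practice, the sketch below compares an OCR result against a trusted transcription of the same passage. It is not Scribe’s OCR Verification procedure, which is not described here. The function names and the choice to measure accuracy as matched characters divided by the longer text’s length are assumptions made for this example.

```python
from difflib import SequenceMatcher


def character_accuracy(source_text: str, ocr_text: str) -> float:
    """Share of characters that match between the two texts (assumed metric)."""
    if not source_text and not ocr_text:
        return 1.0
    # autojunk=False keeps frequent characters (spaces, "e") in the comparison.
    matcher = SequenceMatcher(None, source_text, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(source_text), len(ocr_text))


def meets_threshold(source_text: str, ocr_text: str, threshold: float = 0.999) -> bool:
    """True when the OCR text reaches the required accuracy."""
    return character_accuracy(source_text, ocr_text) >= threshold
```

Under this metric, a 1,000-character passage can contain at most one wrong character and still pass.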

Word Document

Provide text as a Word document, matching the source materials in all aspects.

Reproduce all content exactly. This includes errors in the source materials, such as misspellings.

Preserve Character Styles

Preserve all character styles as they appear in the source file.

  • Capture small caps, italics, superscript, and other character rendering using the appropriate font features of Microsoft Word.
  • Do not apply Microsoft Word’s character styles feature.


Character Accuracy

Requirements for character accuracy include the following:

  • Use Unicode characters throughout.
  • Maintain characters with accents as single Unicode entities.
  • Maintain foreign language characters.
  • Maintain em dashes, en dashes, and hyphens.
  • Maintain punctuation (and spacing around punctuation).
  • Maintain ellipses.
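The requirement to keep accented characters as single Unicode entities can be spot-checked on text extracted from the Word file. The sketch below is one possible check, not part of the standard. It flags a base letter followed by a combining mark wherever Unicode NFC normalization would merge the pair into one precomposed character. The function name is hypothetical.

```python
import unicodedata


def decomposed_accents(text: str) -> list[tuple[int, str]]:
    """Return (index, sequence) for base+combining pairs that have a precomposed form."""
    hits = []
    for i, ch in enumerate(text):
        # combining() is non-zero for combining marks such as U+0301 (acute accent).
        if i > 0 and unicodedata.combining(ch):
            pair = text[i - 1:i + 1]
            if len(unicodedata.normalize("NFC", pair)) == 1:
                hits.append((i - 1, pair))
    return hits
```

For example, "cafe" followed by U+0301 is flagged at index 3, while "caf" followed by U+00E9 passes. Applying `unicodedata.normalize("NFC", text)` to a whole file would repair such pairs, but NFC also rewrites a few other characters (for instance, the ohm sign U+2126 becomes the Greek capital omega U+03A9), so review the changes rather than applying them blindly.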

Paragraph Integrity and Content Order

Paragraph breaks must match those in the source. Key aspects include the following:

  • Maintain paragraph integrity across pages.
  • Do not use soft returns.
  • Do not break up a paragraph to insert content such as images, tables, or sidebars.

Place block elements such as tables, sidebars, boxes, figures, and editor’s notes after the paragraph they interrupt, unless doing so would move them onto a new print page (i.e., after the next page’s ID marker). In that case, place the element before the paragraph.
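The placement rule can be read as a simple decision. The sketch below is one interpretation, offered for illustration only. It assumes the paragraph text already carries its page ID markers, which begin with `{~?~PG:` as defined under Page Numbers, and that `interrupt_offset` (a hypothetical parameter) is the character position in the paragraph where the block element appeared in the source.

```python
PAGE_MARKER_PREFIX = "{~?~PG:"


def block_element_placement(paragraph: str, interrupt_offset: int) -> str:
    """Return "after" or "before": where the block element goes relative to the paragraph."""
    # If the rest of the paragraph crosses onto the next page, placing the
    # element after it would push the element past that page's ID marker.
    remainder = paragraph[interrupt_offset:]
    return "before" if PAGE_MARKER_PREFIX in remainder else "after"
```

A paragraph that finishes on the same page as the element yields "after". A paragraph that runs onto the next page after the interruption yields "before".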

Footnotes and Endnotes

Capture footnotes using Microsoft Word’s footnotes feature.

Capture endnotes using Microsoft Word’s endnotes feature.


Hyphenation

Resolve line-breaking hyphens.

  • Remove hyphens used to indicate a word break over two lines and reconnect the word.
  • Retain hyphens in words and phrases where hyphens are required.
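Telling a line-breaking hyphen from a required one usually takes a word list or a human check. The sketch below shows one possible approach, under the assumption that a set of known lowercase words is available as `vocabulary` (a hypothetical parameter). A hyphen at the end of a line is removed only when the rejoined word is in that set. Otherwise the hyphen is kept and only the line break is dropped.

```python
import re

# A word fragment, a hyphen at the end of a line, then the rest of the word.
LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\n(\w+)")


def resolve_line_break_hyphens(text: str, vocabulary: set[str]) -> str:
    """Rejoin words split across lines; keep hyphens that belong to the word."""
    def rejoin(match: re.Match) -> str:
        left, right = match.group(1), match.group(2)
        joined = left + right
        if joined.lower() in vocabulary:
            return joined          # hyphen only marked the line break
        return f"{left}-{right}"   # hyphen is part of the word or phrase

    return LINE_BREAK_HYPHEN.sub(rejoin, text)
```

With a vocabulary containing "implementation", the input "implemen-" plus a line break plus "tation" becomes "implementation", while "well-" plus a line break plus "known" becomes "well-known". Cases the word list cannot settle still need manual review against the source.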


Image Callouts

Insert image callouts and all associated image captions at the nearest paragraph break after the point where the image appears in the source.

Insert image callouts using the format {~?~IM: insert projectname-p#.jpg here.} where “#” is the page number on which the image appears.

Capture figure captions as text.

Image Files

If specified for a project, capture images with the following settings:

  • Format: .jpg
  • Resolution: 120 dpi minimum
  • Size: 600 px minimum on the largest side
  • Color: Provide color images as RGB; provide black and white images as grayscale.
  • File name: projectname-p#.jpg (where “#” is the page number on which the image appears).
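These settings lend themselves to an automated pre-delivery check. The sketch below is illustrative only. It assumes the image properties have already been read by some other tool into a small record. The `ImageInfo` type and its field names are hypothetical, and the mode names "RGB" and "L" (grayscale) follow the convention used by the Pillow imaging library.

```python
import re
from dataclasses import dataclass


@dataclass
class ImageInfo:
    file_name: str
    width_px: int
    height_px: int
    dpi: float
    color_mode: str  # "RGB" for color, "L" for grayscale (assumed naming)


def check_image(info: ImageInfo, project_name: str) -> list[str]:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    name_pattern = re.escape(project_name) + r"-p\w+\.jpg"
    if not re.fullmatch(name_pattern, info.file_name):
        problems.append("file name is not projectname-p#.jpg")
    if info.dpi < 120:
        problems.append("resolution below 120 dpi")
    if max(info.width_px, info.height_px) < 600:
        problems.append("largest side below 600 px")
    if info.color_mode not in ("RGB", "L"):
        problems.append("color mode is neither RGB nor grayscale")
    return problems
```

The check cannot tell whether a given source image was color or black and white, so the choice between RGB and grayscale still has to be confirmed against the source.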

Provide full-size, high-resolution source images in addition to processed images.


Tables

Capture table data using Microsoft Word’s table feature.

Insert tables and all associated table notes and captions at the nearest paragraph break after the point where the table appears in the source.

Page Numbers

Insert page numbers.

  • Format: {~?~PG: @#@} (where “#” is the page number).
  • Insert the page numbers at the exact point where pages begin.
  • If a word or URL is hyphenated across a page break in the source, insert the page number after the complete word or URL.
  • If an image occurred at the top of a printed page but is now called out after the paragraph it interrupted, place the page ID with the paragraph content, not with the image or caption.