Page ID Placement and Reading Order

When converting files from a print environment to a digital environment, one of the key steps is to place content, including page IDs, into the proper reading order. This task typically takes place in a .sam file, but it can also occur at the .scml adjustment stage.

Scribe Tools for InDesign

When exporting XML with Scribe Tools for InDesign, content outside the main text flow is anchored at a best-guess location. The anchoring tool cannot read for context; it considers only where the content sits in relation to paragraphs in the body text. It is useful for arranging content in an automated process, but it cannot replace expert judgment. Further review is required to determine the best location for reading order in an electronic environment.

The Export .sam from InDesign tool also places page ID markers throughout the file. Page IDs can only carry over when connected to text, so they will often need to be moved from a <figh> paragraph up to a preceding <fig> line.

Content Placement

In a print book, the typesetter is limited by the physical space available to place illustrative material such as images, tables, and sidebars. Scribe’s default rule is to place images at the top or bottom of a page, after the in-text reference and as near to it as possible. In some cases, an image in the print version may appear a few pages after it is first referenced. This constraint does not apply in the electronic version. When placing content for best reading order, items that had to be spread out in the print book can be placed at a more appropriate location.

In an ideal scenario, the text of a book provides guidance as to the proper placement of illustrative material. A reference to a table by name (e.g., “see Table 1.1”) indicates that the table should be placed after the end of that paragraph. Without specific in-text references, one can use key phrases or terms, as well as the typeset files, to guide the decision about where to place a table. The original manuscript may also contain a note about intended placement.

When placing illustrative material, another factor to consider is whether the table, sidebar, or figure would interrupt a thought. For example, a figure should not be placed after a paragraph that leads into a list, nor would a figure typically be placed between list items. In an e-book, the reader encounters all content in order, so paragraphs that end with a colon, a comma, or no punctuation at all should be taken into account when choosing the best location for a figure, even if the figure is first mentioned in that paragraph.
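The punctuation check described above can be sketched in Python. This is only an illustrative heuristic, not part of Scribe's tooling: the function name interrupts_thought and the choice to treat a period, question mark, or exclamation point as the only "complete" endings are assumptions made for this example. It does not detect list items or lead-ins that happen to end with a period, so it can support but not replace editorial judgment.

```python
# Hypothetical helper (not a Scribe tool): flags paragraphs after which a
# figure, table, or sidebar would likely interrupt a thought.
def interrupts_thought(paragraph: str) -> bool:
    """Return True if the paragraph ends with anything other than a period,
    question mark, or exclamation point (for example a colon, a comma, or
    no punctuation at all)."""
    text = paragraph.strip()
    # Ignore trailing closing quotes and brackets when looking for the
    # final punctuation mark.
    text = text.rstrip("\"'”’)]")
    if not text:
        # Nothing to judge; leave the decision to the editor.
        return False
    return text[-1] not in ".?!"


# Assumed sample paragraphs, for illustration only.
print(interrupts_thought("The steps are as follows:"))  # True
print(interrupts_thought("See Figure 2.1."))            # False
```

A paragraph flagged by this check is a candidate for placing the figure after the following paragraph, or after the list it introduces.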

ID Placement

In a print book, page numbers refer to an exact page, regardless of whether that page is designed to show a number.

In an e-book, the concept of a page is more fluid. There is no limit to the amount of content that can occur between one page ID and another, so any number of tables, figures, and sidebars can be placed at a desired location.

When free from the physical limitations of a print book, electronic content may shift radically in comparison to its typeset counterpart. When this happens, Scribe’s default is for the first piece of content that the reader encounters to hold the page ID.

The following guidelines use figures (<fig>) as examples:

  • If a figure is moved to appear between paragraphs on a preceding page, the page ID for the page on which the figure originally appeared should travel with the <fig> line.

  • If a figure that occurs at the top of a page does not move at all, remaining between a paragraph that ends on the preceding page and a new paragraph that starts beneath it on the current page, the page ID should remain with the <fig> line.

  • If a figure that occurs at the top of a page is moved to appear after a paragraph on its current page, the page ID should be moved to precede the first word in the body paragraph that occurred at the page break.

Additional guidelines for tables, “continued” paragraphs, URLs, and blank pages:

  • If a table crosses from one page to another, table heads and column heads may be repeated in the print version. In the electronic version, these repeated heads are unnecessary and should be removed. When doing so, the page ID should be kept with the first piece of content in the first cell that occurred at the page break.

  • If an index entry crosses from a recto page to a verso page in the print version, and a “continued” line was included, that line should be removed. For a run-in index, the index paragraph should be reconnected, and the page ID should remain where the page break occurred. If it is not a run-in index, the page ID should be moved to the start of the subentry paragraph that occurred at the page break.

  • If a URL crosses over a page and is interrupted by a page ID, the page ID should remain where it is, but the full href link should be applied to both portions of the URL.

    Find: <url>([^<]*)</url><page id="p([0-9A-Za-z]+)"/><url>([^<]*)</url>
    Replace with: <url href="\1\3">\1</url><page id="p\2"/><url href="\1\3">\3</url>

  • Where a blank page occurred in the print version, the page ID should be placed immediately before the page ID for the next page that includes content. Page IDs can be deleted only if they occurred at the very end of a file.
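The Find and Replace expressions for a URL interrupted by a page ID can be tried out with Python's re module. The following is a minimal sketch rather than a Scribe utility: the function name link_split_urls and the sample URL are assumptions made for illustration. The character class for the ID is written as [0-9A-Za-z] because the range A-z also matches the ASCII punctuation that sits between Z and a.

```python
import re

# Find expression: a URL, a page ID, and the rest of the URL.
SPLIT_URL = re.compile(
    r'<url>([^<]*)</url><page id="p([0-9A-Za-z]+)"/><url>([^<]*)</url>'
)

# Replace expression: both halves receive the full URL (\1 + \3) as href,
# and the page ID stays where the page break occurred.
REPLACEMENT = r'<url href="\1\3">\1</url><page id="p\2"/><url href="\1\3">\3</url>'


def link_split_urls(text: str) -> str:
    """Apply the full href to both portions of every URL split by a page ID."""
    return SPLIT_URL.sub(REPLACEMENT, text)


# Assumed sample input, for illustration only.
sample = '<url>http://www.exam</url><page id="p12"/><url>ple.com/page</url>'
print(link_split_urls(sample))
# <url href="http://www.example.com/page">http://www.exam</url><page id="p12"/><url href="http://www.example.com/page">ple.com/page</url>
```

Text that contains no split URL is returned unchanged, so the function can be run over a whole file.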

Tip: Consider the reading behavior in print vs. electronic books. When a page is listed in an index, for example, the reader can turn to that page in a print book. In an e-book, index numbers link to the corresponding page IDs. These links should not take the reader to a point that is after the indicated content.