Language Scribing

Language Scribing Overview

Conformance with the Web Content Accessibility Guidelines (WCAG) at Level AA requires that ebooks mark each change of language within a book. This markup helps screen readers and other assistive technology read the content without jarring, incorrect pronunciation. “Proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text” are all exempt from this requirement.

Languages that use non-Latin scripts (e.g., Hebrew, Arabic, Chinese, Greek) can frequently be identified using the Scribe Language Styles setting in the Digital Hub. Languages that use the Latin alphabet often need to be identified through manual actions.

The Well-Formed Document Workflow includes methods to mark these language shifts in different stages of a project. Ideally, this action takes place when preparing a manuscript in Word. In a .docx file, languages can be marked by manually creating a new Word character style with a name that combines an ScML style and an established language code.

  • Pattern in Word: [ScML Style]@lang=[Language Code]
  • Example Style Name: lang-i@lang=es

Language codes generally consist of two or three letters, determined by the BCP-47 standard. Some of the most common language codes are listed on the BCP-47 Wikipedia page. The subtag lookup tool can be used to find thousands of additional language codes. If a language has no corresponding code, Scribe recommends scribing this content as lang or lang-i with no additional code.

The metadata and language styles can be added in a Word document, a sam file, an ScML file, or an InDesign document. If added to sam, ScML, or InDesign, this metadata will travel through the Well-Formed Document Workflow.

This example shows how the metadata for Spanish-language italic text could be identified in Word and carried through to sam, ScML, and InDesign. In each environment, the formatting of the style name is slightly different.

  • In Word: lang-i@lang=es
  • In sam/ScML: <lang-i lang="es">
  • In InDesign: lang-i-language-es
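
The three spellings above follow a regular pattern, so the mapping can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the Scribe tools: the function names are hypothetical, and the regular expression assumes style names made of letters, digits, and hyphens, followed by a lowercase two- or three-letter language code.

```python
import re

# Assumed shape of a Word language style name: [ScML Style]@lang=[Language Code]
WORD_STYLE = re.compile(r"^(?P<style>[A-Za-z0-9-]+)@lang=(?P<code>[a-z]{2,3})$")


def parse_word_style(name):
    """Split a Word style name such as 'lang-i@lang=es' into (style, code)."""
    match = WORD_STYLE.match(name)
    if match is None:
        raise ValueError("not a language style name: %r" % name)
    return match.group("style"), match.group("code")


def to_scml_tag(style, code):
    """Spell the same metadata as a sam/ScML opening tag."""
    return '<%s lang="%s">' % (style, code)


def to_indesign_style(style, code):
    """Spell the same metadata as an InDesign style name."""
    return "%s-language-%s" % (style, code)


style, code = parse_word_style("lang-i@lang=es")
print(to_scml_tag(style, code))
print(to_indesign_style(style, code))
```

Run against the Spanish italic example, the two print calls reproduce the sam/ScML and InDesign spellings listed above.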

Note: Language codes must be entirely lowercase. Hyphenated language codes, including those with region subtags (en-US, en-GB), are not fully supported throughout the Well-Formed Document Workflow (WFDW). If region-level specificity is required, region subtags can be added at the ScML stage before converting to ebook.
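
A quick check along the following lines could flag codes that this note warns about before they are typed into style names. The function name is hypothetical, and the two checks are simply a reading of the note, not a full BCP-47 validator.

```python
def language_code_warnings(code):
    """Return a list of warnings for a language code used in a style name.

    Assumption: only the two restrictions from the note are checked
    (no hyphenated subtags, no uppercase letters).
    """
    warnings = []
    if "-" in code:
        warnings.append("hyphenated code (region or other subtag)")
    if code != code.lower():
        warnings.append("contains uppercase letters")
    return warnings


for candidate in ("es", "en-US", "FR"):
    print(candidate, language_code_warnings(candidate))
```

An empty list means the code passes both checks; a code such as en-US is reported twice, once for the hyphen and once for the uppercase region subtag.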

Procedure for Language Scribing in Word

  1. Create the language styles needed for the project in Word.
    • Deselect all text to prevent the new style from being applied to existing text.
    • Navigate to the Home tab and click the icon in the lower corner of the style group to open the Style Pane.
    • In the Style Pane, click the New Style icon (a capital letter A with a plus symbol).
    • Name the new style according to the language pattern described above. Change the style type to “Character.” Base the new style on the existing ScML style (e.g., lang-i@lang=es should be based on lang-i and gt@lang=fr should be based on gt). Save the new style.
  2. As part of the scribing process, review all italic text for phrases that need to have language metadata applied. Apply the language styles created.
  3. Apply additional language styles as needed (e.g., apply lang@lang=es to Spanish text using the default paragraph font, or gt@lang=es to Spanish-language glossary terms).

Note: The use of lang-i in bibliographies can prevent the bibliography tools from working as expected. If language scribing is needed in a bibliography, Scribe recommends applying the styles after copyediting this section.

Procedure for Language Scribing in sam

Review Special Characters

Review the special characters list in the Digital Hub for languages that fall outside the Latin alphabet. These can be searched for within Sublime.
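
Outside the Digital Hub, a rough tally of non-Latin letters can be produced with Python's standard unicodedata module. This is a heuristic sketch, not the Digital Hub's special characters list: the function name is hypothetical, and it treats the first word of each character's Unicode name (GREEK, HEBREW, CJK, and so on) as a script label, which only approximates the true Unicode script.

```python
import unicodedata
from collections import Counter


def non_latin_scripts(text):
    """Count letters whose Unicode name does not start with LATIN."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        name = unicodedata.name(ch, "")
        # Assumption: the first word of the character name stands in for the script.
        script = name.split(" ")[0] if name else "UNKNOWN"
        if script != "LATIN":
            counts[script] += 1
    return counts


sample = "caf\u00e9 \u03bb\u03bf\u03b3\u03bf\u03c2"  # Latin text followed by a Greek word
print(non_latin_scripts(sample))
```

A nonzero count for a script suggests searching the file for that script's characters and marking the passages found.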

Review Character Styles

Review italic terms, various “-i” styles, and various lang terms in a new Sublime file.

Find: <i>[^<]+</i>|<[^>]+-b?i>[^<]+</[^>]+-b?i>|<lang[^>]*>[^<]+</lang[^>]*>|<[^>]*lang[^>]*>[^<]+</[^>]*>

Copy the matches into a new file, reduce them to unique lines (Edit > Permute Lines > Unique), remove English text, and filter out proper names. Then add lang attributes as needed in the original file.
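
The search and deduplication steps can also be sketched outside Sublime with Python's re module. The sketch reuses the Find expression above unchanged; the function name is hypothetical, and removing English text and proper names remains a manual review step.

```python
import re

# The Find expression from this section, split across lines for readability.
CHARACTER_STYLES = re.compile(
    r'<i>[^<]+</i>'
    r'|<[^>]+-b?i>[^<]+</[^>]+-b?i>'
    r'|<lang[^>]*>[^<]+</lang[^>]*>'
    r'|<[^>]*lang[^>]*>[^<]+</[^>]*>'
)


def unique_styled_spans(sam_text):
    """Return each distinct match once, in order of first appearance."""
    seen = {}
    for match in CHARACTER_STYLES.finditer(sam_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


sample = (
    '<p>The <i>mise en place</i> and <lang-i lang="es">hola</lang-i> '
    'then <i>mise en place</i> again.</p>'
)
for span in unique_styled_spans(sample):
    print(span)
```

Keeping first-appearance order, rather than sorting, makes it easier to find each span again in the original file.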

Review Paragraph Styles

Block quote (bq) and senseline (sl) paragraphs are common places to find full paragraphs in another language. Review these first, then check other paragraph styles as needed.

Find: <[^>"]*(bq|sl)[^>]*>[^\n]+

Copy the matches into a new file, turn off word wrap, and skim for non-English text.
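
As a sketch of the same search run outside Sublime, the Find expression above can be applied with Python's re module to list one candidate paragraph per line. The function name is hypothetical; deciding which paragraphs are not English is still a manual skim.

```python
import re

# The paragraph Find expression from this section, unchanged.
PARAGRAPH_STYLES = re.compile(r'<[^>"]*(bq|sl)[^>]*>[^\n]+')


def candidate_paragraphs(sam_text):
    """Return every bq or sl paragraph matched by the Find expression."""
    return [match.group(0) for match in PARAGRAPH_STYLES.finditer(sam_text)]


sample = (
    "<p>Plain paragraph.</p>\n"
    "<bq>Ceci est une citation.</bq>\n"
    "<sl>Una linea.</sl>\n"
)
for paragraph in candidate_paragraphs(sample):
    print(paragraph)
```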

Review Book-Specific Styles

Certain books, such as Bibles or language books, may use additional paragraph or character styles to identify languages. Review additional content for languages based on the type of publication.