Regular Expressions Resource

The regular expressions from the checklists on scribenet.com are collected here with minimal context. Each Find pattern can be run on .sam and .scml files.
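
The patterns on this page use PCRE-style escapes such as \x{a0}, which some engines (Python's re module among them) do not accept as written. As a minimal sketch, assuming Python 3 and two hypothetical helpers named pcre_to_python and find_hits (neither comes from Scribe), a Find pattern could be run over the text of a file like this:

```python
import re

def pcre_to_python(pattern):
    # Rewrite PCRE-style \x{hh} escapes as \uhhhh, which Python's re accepts.
    return re.sub(
        r"\\x\{([0-9A-Fa-f]{1,4})\}",
        lambda m: "\\u{:04x}".format(int(m.group(1), 16)),
        pattern,
    )

def find_hits(pattern, text):
    # Return (line_number, matched_text) for every match in text.
    # MULTILINE makes ^ and $ match at the start and end of each line.
    compiled = re.compile(pcre_to_python(pattern), re.MULTILINE)
    hits = []
    for match in compiled.finditer(text):
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(0)))
    return hits

# Hypothetical sample text: a space next to a no-break space on each line.
sample = "a \u00a0b\nc d\u00a0 e"
for line_number, found in find_hits(r"( )(\x{a0})|(\x{a0})( )", sample):
    print(line_number, repr(found))
```

The helper only translates \x{...} escapes of up to four hex digits. Other differences between regex engines are not handled, so treat it as a starting point rather than a full converter.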

Quotation Marks, Parentheses, and Brackets

Find: ^[^“]*\”|\“[^”]*\“|\”[^“]*\”|\“[^”]*$
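
To illustrate what the pattern above flags, here is a small sketch in Python. The sample lines and the function name lines_with_unbalanced_quotes are made up for this example. It searches one line at a time, on the assumption that each paragraph sits on its own line.

```python
import re

# The quotation-mark pattern above, copied as written.
UNBALANCED_QUOTES = re.compile(r"^[^“]*\”|\“[^”]*\“|\”[^“]*\”|\“[^”]*$")

def lines_with_unbalanced_quotes(lines):
    # Search one line at a time so a match cannot run across a line break.
    return [line for line in lines if UNBALANCED_QUOTES.search(line)]

# Hypothetical sample lines.
sample = [
    "“Balanced,” she said.",
    "“Never closed, she said.",
    "Never opened,” she said.",
    "“Opened “twice” here.”",
]
for line in lines_with_unbalanced_quotes(sample):
    print(line)
```

The first sample line passes. The other three are reported because they contain a closing mark with no opener, an opener with no closing mark, or two openers in a row.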

Find: “—|”—|—“|—”

Find: ^[^(]*\)|\([^)]*\(|\)[^(]*\)|\([^)]*$

Find: ^[^[]*\]|\][^[]*\]|\[[^]]*$|\[[^]]*\[

Find: ”([^ \)\<\]\:;\?\&\/—])|([^ \(\>\[—])“

Punctuation

Find: ([\!:;,\.\?])\1
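
The backreference \1 in the pattern above requires the same punctuation mark twice in a row. A short sketch in Python, with a hypothetical function name and sample string:

```python
import re

# The doubled-punctuation pattern above, copied as written.
DOUBLED_PUNCTUATION = re.compile(r"([\!:;,\.\?])\1")

def doubled_punctuation(text):
    # Return (character_offset, matched_text) for each doubled mark.
    return [(m.start(), m.group(0)) for m in DOUBLED_PUNCTUATION.finditer(text)]

print(doubled_punctuation("Wait,, what?? Fine."))
```

Mixed pairs such as "?!" are not matched, because \1 repeats whatever the group captured. An unspaced run of three periods does produce a hit on its first two periods.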

Find: ([\[\(“‘])( )|( )([\.,:;\?!\)\]’”])([^\x{a0}])

Find: ([ \x{a0}])\.([ \x{a0}])\.([ \x{a0}])\.([ \x{a0}])\.

Find: ([^A-Z][\.\?\!”])([A-Z])|([,:;\)])([A-Za-z“])|([”])([a-z])|([a-z\>])\(|,”,|Scribe, Inc

Find: ([a-z0-9])</(p|pf|psec|paft|pcon|rf|rf1|rf2|rff)>\n|</([ib])></(rf|rf1|rf2|rff)>\n

Find: <([^>/]*)>[^A-Za-z0-9\n]?</\1>

Unexpected Character Patterns

Find: --|([a-z0-9]+)\||- -|'|“ | ”|^( *)(<[^\n]*?)( ){2,}|\),[0-9]

Find: ([\d]+)([\x{2013}\x{2014}-])([\d])([\x{2013}\x{2014}-])([\d]+)([\x{2013}\x{2014}-])([\d]+)([\x{2013}\x{2014}-])([\d])

Find: [A-Za-z]<i>[A-Za-z]|[A-Za-z]</i>[A-Za-z]

Spaces

Find: ( )(\x{a0})|(\x{a0})( )

Find: ([ \x{a0}])(\t)|(\t)([ \x{a0}])

Find: ( )<([ef]nref)([^>]*>[^<]*</\2>)

Find: ^( *)(<[^>]*>)( )|( )$

Find: ([^ ]\||\|[^ ])

Find: ^( *)(<[^\n]*?)( ){2,}

Incorrect Line Breaks

Find: ^[ ]*<[^>]*>[a-z]
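
The pattern above looks for a line whose opening tag is followed directly by a lowercase letter, which often means a paragraph was split mid-sentence. A minimal sketch in Python, with made-up sample text and a hypothetical function name:

```python
import re

# The line-break pattern above; MULTILINE lets ^ match at every line start.
LOWERCASE_START = re.compile(r"^[ ]*<[^>]*>[a-z]", re.MULTILINE)

def lowercase_paragraph_starts(text):
    # Return the 1-based numbers of lines that open with a tag plus a lowercase letter.
    return [text.count("\n", 0, m.start()) + 1 for m in LOWERCASE_START.finditer(text)]

# Hypothetical sample: the second line continues the sentence begun on the first.
sample = "<p>The sentence starts here and\n<p>continues in a new paragraph.</p>\n  <p>Another one.</p>"
print(lowercase_paragraph_starts(sample))
```

Hits still need review by eye, since some paragraphs legitimately begin with a lowercase letter.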

URLs

Find: (<url( href[^>]*)?>[^<]*)([\x{2013}\x{2014} ])

Find: <url>([ \.\(\[])|([ ,\.\)\]])</url>

Find: ([A-Za-z0-9\.\-:/]+\.(?!jpg|tif|eps|png|svg|jpeg)[A-Za-z]{2,})([^ <"\n]*[^ ><"”'’\)\],;:\.–\n—\?])?

Find: ([^ \<\"\>])http

Find: ([ ><"“'‘\(\[–\n—])(@[a-zA-Z0-9_]{1,15})

Special Characters

ISBNs

Angle Brackets

Find: &#x3e;|&#x3c;|&#62;|&#60;|&gt;|&lt;|<<|>>

Typesetter Spaces

Find: &#173;|&#819[2-9];|&#820[0-4];|&#8239;

Find: &#x00AD;|&#x200[0-9A-C];|&#x202F;

Find: [\x{ad}\x{2000}-\x{2009}\x{200a}-\x{200c}\x{202f}]
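
Because these characters are invisible or nearly so, it can help to report which one was found. The following sketch, assuming Python 3, rewrites the \x{...} escapes in the character class above as \u.... escapes and names each hit with the standard unicodedata module. The function name and sample string are hypothetical.

```python
import re
import unicodedata

# Same character class as above, with \x{...} rewritten as \u.... for Python's re.
TYPESETTER_SPACES = re.compile(r"[\u00ad\u2000-\u2009\u200a-\u200c\u202f]")

def typesetter_spaces(text):
    # Return (offset, code point, Unicode name) for each character found.
    return [
        (
            m.start(),
            "U+{:04X}".format(ord(m.group(0))),
            unicodedata.name(m.group(0), "UNKNOWN"),
        )
        for m in TYPESETTER_SPACES.finditer(text)
    ]

for hit in typesetter_spaces("thin\u2009space and soft\u00adhyphen"):
    print(hit)
```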

Find: [^\.](&#160;|&#x00A0;|\x{a0})[^\.]|(&#8205;|&#x200D;|\x{200d})

Hyphen Spacing

Find: - | -

Incorrect Hyphenation

Find: [A-Za-z]+-[A-Za-z]+

Missing Spaces Around Tags and Commas

Find: (</[^>]+>)([A-Za-z]+)|([A-Za-z]+)<(?![eft]nref|page)([^/][^>]*)>

Find: ,([A-Za-z0-9]+)([^ \n]*)

Find: (<in[12f]*>)(.*),[A-Za-z0-9]

Small Caps

Find: <[^>]*sm[^>]*>[^<]*</[^>]*sm[^>]*>

Tetragrammaton

Find: <[^>]*tetr[^>]*>[^<]*</[^>]*tetr[^>]*>

Self-Closing Note Reference Tags

Find: <([fe])nref/>|<([fe])nnum/>

Self-Closing and Unnecessary Tags (.sam/.scml)

Find: <(?!cell|img|page)[^<]*/>|</([^>]*)>[^A-Za-z0-9\n]?<\1>|<([^>/]*)>[^A-Za-z0-9\n]?</\2>

Index Section (.scml files)

Position of Tags and Spaces (.sam/.scml)

Find: ^( *)(<[^\n]*?)( )(</[^>]*>)|(<[^/|^>]*>)( )

Page IDs (.sam/.scml)

Find: (<xref.*?>.*?)(<page id=".*?"/>)(.*?</xref>)|(</url>)(<page id=".*?"/>)(<url>)

Find: <([^/>]*)>(<page[^>]*>)</\1>

Find: [a-z]+<page id="([^<]*)"/>[a-z]+
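
The pattern above catches a page tag that falls in the middle of a word, and its capture group holds the page id. A small sketch in Python, with hypothetical sample markup and function name:

```python
import re

# The mid-word page tag pattern above, copied as written.
PAGE_ID_IN_WORD = re.compile(r'[a-z]+<page id="([^<]*)"/>[a-z]+')

def page_ids_inside_words(text):
    # Return (page_id, matched_text) for each page tag that splits a word.
    return [(m.group(1), m.group(0)) for m in PAGE_ID_IN_WORD.finditer(text)]

# Hypothetical sample: the first tag splits "document", the second does not split a word.
sample = 'The docu<page id="p12"/>ment continues. <page id="p13"/>Next page.'
print(page_ids_inside_words(sample))
```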

Page References (.scml)

Find: [^</][Pp]age

Single-Chapter Bible Books

Find: (<xbr t=")(Ob|Phm|2Jn|3Jn|Jud|Pr Az|Bel|Sus|Pr Man|LJe)( )([2-9]|[0-9]{2,})(:)([0-9-]+")
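
The pattern above matches a reference to a single-chapter book only when the chapter number is 2 or higher, so references to chapter 1 pass. A minimal sketch in Python follows. The sample markup, including the closing of the xbr tag, is an assumption made for illustration, and the function name is hypothetical.

```python
import re

# The single-chapter book pattern above, copied as written.
SINGLE_CHAPTER_REF = re.compile(
    r'(<xbr t=")(Ob|Phm|2Jn|3Jn|Jud|Pr Az|Bel|Sus|Pr Man|LJe)( )([2-9]|[0-9]{2,})(:)([0-9-]+")'
)

def suspicious_references(text):
    # Return (book, chapter) pairs where the cited chapter is 2 or higher.
    return [(m.group(2), m.group(4)) for m in SINGLE_CHAPTER_REF.finditer(text)]

# Hypothetical sample markup; the tag layout beyond the t attribute is assumed.
sample = '<xbr t="Jud 1:4">Jude 4</xbr> but <xbr t="Phm 2:3">Philemon 2:3</xbr>'
print(suspicious_references(sample))
```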