So far, the examples you've seen have been concerned only with finding chapter headings wherever they occur. Any occurrence of the string 'Chapter' followed by a space, followed by a number, could be an actual chapter heading, or it could also be a cross-reference to another chapter. Since true chapter headings always appear at the beginning of a line, you'll need to devise a way to find only the headings and not find the cross-references.
The Purpose of Anchors
Anchors provide that capability. Anchors allow you to fix a regular expression to either the beginning or end of a line. They also allow you to create regular expressions that match either within a word or at the beginning or end of a word. The following table lists the regular expression anchors and their meanings:
Character | Description |
---|---|
^ | Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'. |
$ | Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. |
\b | Matches a word boundary, that is, the position between a word character and a nonword character (such as a space), or the beginning or end of the input string next to a word character. |
\B | Matches a nonword boundary, that is, any position that is not a word boundary. |
You cannot use a quantifier with an anchor. Since you cannot have more than one position immediately before or after a newline or word boundary, expressions such as '^*' are not permitted.
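As a quick check of this rule, the following sketch tries to compile a pattern that applies a quantifier to an anchor. It assumes a modern JavaScript engine such as Node.js rather than classic JScript, and the helper name `compiles` is made up for this illustration.

```javascript
// Hypothetical helper: reports whether a pattern source compiles.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    // An invalid pattern raises a SyntaxError; rethrow anything else.
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const anchorWithQuantifier = compiles("^*");  // quantifier applied to an anchor
const anchorAlone = compiles("^Chapter");     // ordinary anchored pattern

console.log(anchorWithQuantifier, anchorAlone);
```

Under those assumptions the first pattern is rejected and the second is accepted. Other engines may word the error differently.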
To match text at the beginning of a line of text, use the '^' character at the beginning of the regular expression. Do not confuse this use of the '^' with the use within a bracket expression.
To match text at the end of a line of text, use the '$' character at the end of the regular expression.
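The two roles of '^', and the '$' anchor, can be compared side by side. This is a minimal sketch assuming a modern JavaScript engine. The sample strings and variable names are invented for illustration.

```javascript
// ^ outside brackets anchors the match to the beginning of the input.
const beginsWithDigit = /^[0-9]/;
// ^ as the first character inside brackets negates the set instead.
const beginsWithNonDigit = /^[^0-9]/;
// $ at the end of the pattern anchors the match to the end of the input.
const endsWithChapter = /Chapter$/;

const anchorChecks = {
  digitFirst: beginsWithDigit.test("7 anchors"),              // true
  digitLater: beginsWithDigit.test("Chapter 7"),              // false
  nonDigitFirst: beginsWithNonDigit.test("Chapter 7"),        // true
  chapterLast: endsWithChapter.test("See the next Chapter"),  // true
  chapterFirst: endsWithChapter.test("Chapter 7"),            // false
};

console.log(anchorChecks);
```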
The following JScript regular expression uses the '^' anchor to match a chapter heading, with a one- or two-digit chapter number, that occurs at the beginning of a line:
`/^Chapter [1-9][0-9]{0,1}/`
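To see the effect of the beginning-of-line anchor, the sketch below runs the expression over a small invented sample. It assumes modern JavaScript, where the `m` flag plays the role of the Multiline property described in the table and the `g` flag collects every match.

```javascript
// Invented sample: two headings and one cross-reference inside a sentence.
const headingText = [
  "Chapter 1",
  "As noted in Chapter 12, anchors restrict where a match can start.",
  "Chapter 2",
].join("\n");

// No anchor: the headings and the cross-reference all match.
const unanchored = headingText.match(/Chapter [1-9][0-9]{0,1}/g);

// ^ with the m flag: only matches that begin a line.
const lineStart = headingText.match(/^Chapter [1-9][0-9]{0,1}/gm);

// ^ without the m flag: only a match at the very beginning of the input.
const inputStart = headingText.match(/^Chapter [1-9][0-9]{0,1}/g);

console.log(unanchored); // [ 'Chapter 1', 'Chapter 12', 'Chapter 2' ]
console.log(lineStart);  // [ 'Chapter 1', 'Chapter 2' ]
console.log(inputStart); // [ 'Chapter 1' ]
```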
For VBScript, the same beginning-of-line expression appears as:
`"^Chapter [1-9][0-9]{0,1}"`
A true chapter heading not only occurs at the beginning of a line, it is also the only text on the line, so it must end at the end of the line as well. The following expression matches chapter headings, and excludes cross-references that share a line with other text, by requiring the match to span the whole line from beginning to end:
`/^Chapter [1-9][0-9]{0,1}$/`
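The difference the '$' anchor makes shows up when a cross-reference happens to start a line. The following is a minimal sketch on invented sample text, again assuming modern JavaScript with the `m` and `g` flags.

```javascript
// Invented sample: the middle line starts with 'Chapter 4' but is a sentence.
const pageText = [
  "Chapter 3",
  "Chapter 4 covers quantifiers in more detail.",
  "Chapter 10",
].join("\n");

// Anchored at the beginning only: the sentence on line 2 still matches.
const startOnly = pageText.match(/^Chapter [1-9][0-9]{0,1}/gm);

// Anchored at both ends: the heading must be the only text on its line.
const wholeLine = pageText.match(/^Chapter [1-9][0-9]{0,1}$/gm);

console.log(startOnly); // [ 'Chapter 3', 'Chapter 4', 'Chapter 10' ]
console.log(wholeLine); // [ 'Chapter 3', 'Chapter 10' ]
```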
For VBScript, use the same whole-line expression as a string:
`"^Chapter [1-9][0-9]{0,1}$"`
Matching word boundaries is a little different but adds a very important capability to regular expressions. A word boundary is the position between a word character (a letter, digit, or underscore) and a nonword character such as a space or punctuation mark, or the beginning or end of the input next to a word character. A nonword boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they follow a word boundary:
`/\bCha/`
or for VBScript:
`"\bCha"`
The position of the '\b' operator is critical. If it is placed at the beginning of the pattern, it looks for the match at the beginning of a word; if it is placed at the end of the pattern, it looks for the match at the end of a word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:
`/ter\b/`
and
`"ter\b"`
The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':
`/\Bapt/`
and
`"\Bapt"`
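The string 'apt' occurs on a nonword boundary in the word 'Chapter' but on a word boundary in the word 'aptitude'. The \B operator asserts only that the position where it appears in the pattern is not a word boundary, so the match is not tied to the beginning or end of a word. Its placement still determines which side of the text is tested: 'apt\B' requires a nonword boundary after 'apt' and therefore matches in both 'Chapter' and 'aptitude'.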
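The word-boundary and nonword-boundary expressions above can be checked against a few sample words. This is a sketch assuming a modern JavaScript engine. Words such as 'SubChapter' and 'terminal' are invented test inputs, not part of the original examples.

```javascript
const boundaryChecks = {
  // \b before the text: 'Cha' must begin a word.
  chaStartsWord: /\bCha/.test("Chapter"),          // true
  chaInsideWord: /\bCha/.test("SubChapter"),       // false

  // \b after the text: 'ter' must end a word.
  terEndsWord: /ter\b/.test("Chapter"),            // true
  terBeforePunctuation: /ter\b/.test("Chapter."),  // true: a boundary need not be a space
  terStartsWord: /ter\b/.test("terminal"),         // false

  // \B before the text: 'apt' must not begin a word.
  aptInsideWord: /\Bapt/.test("Chapter"),          // true
  aptStartsWord: /\Bapt/.test("aptitude"),         // false

  // \B after the text: 'apt' must not end a word.
  aptThenLetter: /apt\B/.test("aptitude"),         // true
  aptEndsWord: /apt\B/.test("apt"),                // false
};

console.log(boundaryChecks);
```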