QEDIT Searching With Regular Expressions
In addition to searching for a simple, precise string (or pattern) of
characters, the editor also allows you to indicate a complex or ambiguous
search pattern, called a Regular Expression. The [X] (regular eXpression)
search option allows you to define a Regular Expression search pattern.
By using Regular Expressions, you could, for example, find all occurrences of words in a file beginning with "b" or "B", and ending with "ing" (such as being, Beginning, bring). Or you could change a group of names listed as first name followed by last name (example: Kari Hood), to a list composed of last name followed by first name, and separated by a comma (example: Hood, Kari).
Regular Expressions are composed by combining simple character patterns with special operators to create a complex search pattern. Operators in Regular Expressions allow you to: limit a match to specific characters or a Class of characters; broaden a match to any character; optionally match a pattern; indicate a repeated pattern; and specify alternate patterns to match. Matches across line boundaries, however, are not supported.
The following list identifies and describes the operation of the operator symbols used to create Regular Expressions. (A "Summary List of Regular Expression Operators" is provided at the end of this section.)
Regular Expression Operators
────────────────────────────

Symbol  Regular Expression Operation
______  _____________________________________________________________________
. In a search pattern, matches any single character. (This does not
match the end-of-line position.)
Example:
Search pattern: wh.t
matches a string beginning with the letters wh, followed by any
single character, followed by the letter t (such as, what or
whet or wh t (all on one line), but NOT wht or wheat)
^ In a search pattern, anchors the search for the sub-pattern that
follows, to the beginning of the line (column 1); or, if a block is
marked and the [L] (Local) option is specified, anchors the search to
the beginning column of the block on a line.
Example:
Search pattern: ^This
matches the string This beginning in column 1
$ In a search pattern, anchors the search for the preceding sub-pattern
to the end of the line; or, if a block is marked and the [L] option is
specified, anchors the search to the ending column of the block or to
the end of the line, whichever comes first.
Example:
Search pattern: that$
matches the string that occurring as the final text on a line
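The ".", "^", and "$" operators behave in much the same way in most regular expression engines. The following is a minimal sketch in Python, using its standard re module as a stand-in for the editor (an assumption; the editor's own search engine is not involved). The sample lines are invented for illustration.

```python
import re

# Hypothetical sample text, one string per editor line.
lines = ["This is what I whet", "wht wheat wh t", "or is This that"]

# "." matches exactly one character, so wht and wheat are rejected.
dot_hits = [m for line in lines for m in re.findall(r"wh.t", line)]

# "^" and "$" anchor a match to the start and end of a line.
starts_with_this = [line for line in lines if re.search(r"^This", line)]
ends_with_that = [line for line in lines if re.search(r"that$", line)]

print(dot_hits)          # ['what', 'whet', 'wh t']
print(starts_with_this)  # ['This is what I whet']
print(ends_with_that)    # ['or is This that']
```

Because each line is searched separately, no match can span a line boundary, which mirrors the editor's single-line restriction.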
| "Or" operator: in a search pattern, matches the preceding or the
following sub-pattern.
Example:
Search pattern: licens|ce
matches the strings license or licence
The order of precedence in searching for the indicated patterns is
the order in which they are listed in the search string. Thus, if
one shorter pattern (such as what) is included in another longer
"or" pattern (such as whatever), the longer pattern should be
listed first.
Example:
Search pattern: {whatever}|{what}
matches the strings whatever and what, and locates
occurrences of either string in the text. (Use of the { } symbols
is explained later in this section.)
HOWEVER:
Search pattern: {what}|{whatever}
never locates the full string whatever, since the search is
immediately satisfied by what
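The ordering rule can be tried out in Python, whose re module also tests alternatives from left to right. This is only an analogy, not the editor's engine, and two syntax differences are assumed here: Python groups with ( ) where the editor uses { }, and Python's "|" separates everything on either side of it, so the editor pattern licens|ce has to be written as licen(s|c)e.

```python
import re

text = "whatever"

# Longer alternative listed first: the full word is found.
longer_first = re.search(r"(whatever)|(what)", text).group(0)

# Shorter alternative listed first: the search is satisfied by "what".
shorter_first = re.search(r"(what)|(whatever)", text).group(0)

# Python equivalent of the editor pattern licens|ce (assumed translation).
spellings = [w for w in ("license", "licence", "licenze")
             if re.fullmatch(r"licen(s|c)e", w)]

print(longer_first)   # whatever
print(shorter_first)  # what
print(spellings)      # ['license', 'licence']
```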
? In a search pattern, optionally matches the preceding sub-pattern.
Example:
Search pattern: colou?r
matches the strings color or colour
[ ] In a search pattern, identifies a Class of characters against which
to match a single character.
Within a Class, the case of characters is significant. (The [I]
Ignore-case search option does not apply to a Class.) If upper-case
and lower-case characters are to be included in a Class, both cases
must be specified.
Example:
Search pattern: [123aAbC]
matches any one of the following single characters:
1 2 3 a A b C
[ - ] In a search pattern, indicates a range of characters (based on ASCII
sequence) when used BETWEEN characters in a Class.
The "-" symbol has no special meaning when it occurs as the first or
last character within the "[ ]" Class notation, or when used outside
the Class notation. In such cases, it is treated as the dash ("-")
character.
Example:
Search pattern: [0-9abc-]
matches any one of the following single characters:
0 1 2 3 4 5 6 7 8 9 a b c -
[~ ] In a search pattern, identifies a complement Class of characters to
match against a single character, when "~" is used as the first
character within the Class notation (immediately following the "["
symbol). It matches against the characters that ARE NOT in the
specified Class of characters.
The "~" symbol has no special meaning when it DOES NOT occur as the
FIRST character within the Class notation, or when used outside the
Class notation. In such cases, it is treated as the tilde ("~")
character.
Example:
Search pattern: [~0-2a=]
matches any single character OTHER than:
0 1 2 a =
Example:
Search pattern: [~ ] (This is ~ followed by a space.)
matches any single character OTHER than a space character
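Class, range, and complement notation can be illustrated with a short Python sketch. Python's re module is used here only as a stand-in, and it differs from the editor in two ways worth noting: the complement symbol is "^" rather than "~", and Python's ignore-case flag does apply inside a Class. The sample string is invented.

```python
import re

sample = "a-9 =B~2"

# Editor pattern [0-9abc-]: digits, a, b, c, or a literal dash.
in_class = re.findall(r"[0-9abc-]", sample)

# Editor pattern [~0-2a=] becomes [^0-2a=] in Python.
not_in_class = re.findall(r"[^0-2a=]", sample)

print(in_class)      # ['a', '-', '9', '2']
print(not_in_class)  # ['-', '9', ' ', 'B', '~']
```

Note that the tilde in the sample text is matched as an ordinary character by the complement Class, since it is not one of the excluded characters.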
* In a search pattern, matches 0 or more occurrences of the preceding
sub-pattern, with minimum closure. (See "Minimum/Maximum Closure"
below).
Example:
Search pattern: ba*c
matches a string beginning with the letter b, followed by zero
or more occurrences of the letter a, followed by the letter c
(such as, bc, bac or baac)
Example:
Search pattern: wh.*t
matches a string beginning with the letters wh, followed by zero
or more occurrences of ANY character, followed by the letter t
(such as, wh t (all on one line), wht, what or wheat)
+ In a search pattern, matches 1 or more occurrences of the preceding
sub-pattern, with minimum closure. (See "Minimum/Maximum Closure"
below).
Example:
Search pattern: ba+c
matches a string beginning with the letter b, followed by one or
more occurrences of the letter a, followed by the letter c
(such as, bac or baac, but NOT bc)
@ In a search pattern, matches 0 or more occurrences of the preceding
sub-pattern, with maximum closure. (See "Minimum/Maximum Closure"
below).
Example:
Search pattern: ba@c
matches a string beginning with the letter b, followed by zero
or more occurrences of the letter a, followed by the letter c
(such as, bc, bac or baac)
# In a search pattern, matches 1 or more occurrences of the preceding
sub-pattern, with maximum closure. (See "Minimum/Maximum Closure"
below).
Example:
Search pattern: ba#c
matches a string beginning with the letter b, followed by one or
more occurrences of the letter a, followed by the letter c
(such as, bac or baac, but NOT bc)
Example (to find identifiers in many programming languages):
Search pattern: [a-zA-Z0-9_]#
matches any sequence of one or more characters in the Class
consisting of all lower and upper case letters, all numbers, and
the underscore.
{ } In a search pattern, serves as a Tag to identify a sub-pattern within
the full search pattern. Tagged patterns can be nested.
Tags are used to define a group of characters as a sub-pattern so
that an operator acts on more than one character or character Class.
Tags are also used to identify a sub-pattern within a Regular
Expression so the sub-pattern can be separately referenced in a
subsequent replacement. Tagged sub-patterns are implicitly numbered
from 1 through 9 based on the leftmost "{" symbol. The sub-pattern
number can be used within a replacement string to reference a tagged
sub-pattern, using the following format:
\n
where "n" is the actual sub-pattern number from 1 - 9 that
represents the appropriate tagged sub-pattern. To identify the FULL
search pattern, "n" is "0" (that is, \0).
Example (defining groups of characters as sub-patterns):
Search pattern: {Begin}|{End}File
matches either of the strings BeginFile or EndFile
NOTE (without the Tags):
Search pattern: Begin|EndFile
matches the strings BeginndFile or BegiEndFile
Example (identifying sub-patterns for replacement):
Search pattern: {flip}{flop}s
Replace pattern: \2\1s
changes the string flipflops to flopflips
Example (identifying sub-patterns for replacement):
Rearrange the following list into last name, first name:
(1) Sammy Mitchell
(8) Steve Watkins
(15) Kevin Carr
Search pattern: {([0-9]#) +}{[a-zA-Z]#} {[a-zA-Z]#}
Replace pattern: \1\3, \2
changes the list to:
(1) Mitchell, Sammy
(8) Watkins, Steve
(15) Carr, Kevin
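The two replacement examples above can be reproduced in Python as a rough sketch. The translation is an assumption rather than the editor's own behavior: Python groups with ( ) instead of { }, the editor's "#" becomes Python's "+", and the literal parentheses around the list number must be escaped in Python. The \1, \2, \3 references in the replacement string work the same way.

```python
import re

# {flip}{flop}s with replacement \2\1s
flipped = re.sub(r"(flip)(flop)s", r"\2\1s", "flipflops")

# {([0-9]#) +}{[a-zA-Z]#} {[a-zA-Z]#} with replacement \1\3, \2
names = ["(1) Sammy Mitchell", "(8) Steve Watkins", "(15) Kevin Carr"]
pattern = re.compile(r"(\([0-9]+\) +)([a-zA-Z]+) ([a-zA-Z]+)")
swapped = [pattern.sub(r"\1\3, \2", name) for name in names]

print(flipped)  # flopflips
print(swapped)  # ['(1) Mitchell, Sammy', '(8) Watkins, Steve', '(15) Carr, Kevin']
```

Group 1 captures the list number together with its trailing spaces, which is why the replacement can reuse it unchanged in front of the reordered names.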
Example (identifying the full search pattern):
Search pattern: ^.*$ (any character(s) on entire line)
Replace pattern: "\0"
encloses the entire line in double quotes
\ In a search or replace pattern, serves as an Escape operator to
override a Regular Expression operator so the operator is treated as a
literal character. For a Regular Expression operator to be treated as
a literal character in a search or replace pattern, precede the
operator symbol with the Escape "\" symbol. Note that in a replace
pattern, the only symbol recognized as a Regular Expression operator
is the Escape symbol itself (as described in the following paragraph).
In a search or replace pattern, the "\" symbol is also used with
certain other letters and numbers to indicate specific characters or
values, such as a formfeed character or a hexadecimal value (as listed
below). Further, in a replace pattern, the "\" symbol is used to
reference a Tagged sub-pattern (see the explanation of "{ }" (Tags)
above).
The following examples show use of the "\" symbol as an Escape
operator.
Example:
Search pattern: abc\*\*
matches the string abc**
Example:
Search pattern: abc\\\*
matches the string abc\*
Example:
Search pattern: abc\\\*
Replace pattern: \\abc*
changes the string abc\* to \abc*
Note that within the "[ ]" Class notation in a search pattern, the
only symbols that are recognized as operators are "\", "-", "~", and
"]" (and then only when placed as indicated in their descriptions).
All other operators can be used literally in a Class notation without
the need for the Escape operator. Thus, to find the Class of
characters consisting of the question mark, the comma, the dollar
sign, and the dash, the Class can be designated as "[?,$-]" and DOES
NOT have to be designated as "[\?,\$\-]". (Note in this example that
the dash has not been placed BETWEEN characters within the Class
notation.)
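The Escape examples carry over almost unchanged to Python, which is used below purely as an illustration (it is not the editor's engine). Raw strings (r"...") are a Python convention assumed here so that each backslash reaches the regular expression engine as typed. re.escape is a Python helper with no counterpart described in this section.

```python
import re

# abc\*\* matches the literal string abc**
two_stars = re.search(r"abc\*\*", "x abc** y").group(0)

# abc\\\* matches abc\* ; replacing with \\abc* yields \abc*
source = r"abc\*"
replaced = re.sub(r"abc\\\*", r"\\abc*", source)

# Inside a Class, ?, comma, and $ need no Escape; the dash is last.
punctuation = re.findall(r"[?,$-]", "a?b,c$d-e")

# Python can add the Escape symbols for you.
escaped = re.escape("abc**")

print(two_stars)    # abc**
print(replaced)     # \abc*
print(punctuation)  # ['?', ',', '$', '-']
print(escaped)      # abc\*\*
```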
The following notations, each beginning with the "\" symbol, are used to indicate certain control characters or to identify a character by its numeric value.
\a In a search or replace pattern, represents the alert (beep) character
(^G or ASCII 7).
Example:
Search pattern: \axyz
matches the string xyz following an alert character
\b In a search or replace pattern, represents the backspace character
(^H or ASCII 8).
\c In a search pattern, designates the placement of the cursor in the
located string when used with the Find command. By default, the
cursor is positioned on the first character of the located string.
However, if "\c" is used, the cursor is positioned on the character
immediately following this operator. If multiple "\c" operators are
included, the last one is used. This operator has no effect when
used with the Replace command.
Example:
Search pattern (with Find): Hello \cWorld!
matches the string Hello World! and positions the cursor on the
character "W"
\f In a search or replace pattern, represents the formfeed character (^L
or ASCII 12).
\n In a search pattern, represents the newline (line feed) character
(^J or ASCII 10). This operator is intended for use in Binary mode.
It locates only imbedded newline characters; it does not represent the
end-of-line position, nor does it cause a search to span lines.
In a replace pattern, \n causes the line to be split (as if
SplitLine() was invoked) at the point in the replace string where
\n is specified.
\r In a search or replace pattern, represents the return character (^M
or ASCII 13). This operator is intended for use in Binary mode. It
locates only imbedded return characters; it does not represent the
end-of-line position, nor does it cause a search to span lines.
\t In a search or replace pattern, represents the tab character (^I or
ASCII 9).
\v In a search or replace pattern, represents the vertical tab character
(^K or ASCII 11).
\xnn In a search or replace pattern, represents the character that is
equivalent to the indicated hexadecimal value, where "nn" is a value
from 00 through FF that must be specified as a 2-digit number.
Example:
Search pattern: ^\x40
matches the string @ located at the beginning of the line
\dnnn In a search or replace pattern, represents the character that is
equivalent to the indicated decimal value, where "nnn" is a value from
000 through 255 that must be specified as a 3-digit number.
Example:
Search pattern: \d064$
matches the string @ located at the end of the line
\onnn In a search or replace pattern, represents the character that is
equivalent to the indicated octal value, where "nnn" is a value from
000 through 377 that must be specified as a 3-digit number.
Example:
Search pattern: \\xyz\o100abc
matches the string \xyz@abc
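The three numeric notations in the examples above all name the same character: hexadecimal 40, decimal 064, and octal 100 are each the "@" symbol. The short Python sketch below checks that arithmetic. Python is only a stand-in here; it supports \xnn in patterns, but the \dnnn form is specific to the editor, so chr() is used in its place.

```python
import re

code_from_hex = int("40", 16)
code_from_decimal = int("064", 10)
code_from_octal = int("100", 8)

# All three values identify the same character.
same_character = (
    chr(code_from_hex) == chr(code_from_decimal) == chr(code_from_octal) == "@"
)

# ^\x40 : the character at the beginning of the line.
at_line_start = re.search(r"^\x40", "@ marks the start") is not None

# \d064$ : built with chr() because Python has no \dnnn notation.
at_line_end = re.search(re.escape(chr(64)) + "$", "ends with @") is not None

print(code_from_hex, code_from_decimal, code_from_octal)  # 64 64 64
print(same_character, at_line_start, at_line_end)         # True True True
```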
Minimum/Maximum Closure
───────────────────────
Within the editor's Regular Expression search feature, you can indicate that you want a search to be satisfied either by zero or more, or by one or more, occurrences of a pattern, based on minimum or maximum closure. Minimum closure is achieved as soon as a string is located which contains the minimum set of characters that match a specified search pattern. Maximum closure is not achieved until the maximum set of characters is located that matches a specified search pattern. To satisfy either minimum or maximum closure, the editor searches for a matching string that is entirely contained on a single line. The symbols "*" (zero or more) and "+" (1 or more) are minimum closure operators; "@" (0 or more) and "#" (1 or more) are maximum closure operators.
To illustrate the difference between minimum and maximum closure in a search, consider the following:
Given the text: This_is_the_issue.
then:
Search pattern: Thi.*is (minimum closure, 0 or more occurrences)
matches the string This_is
and:
Search pattern: Th.*is (minimum closure, 0 or more occurrences)
matches the string This
HOWEVER:
Search pattern: Thi.@is (maximum closure, 0 or more occurrences)
matches the string This_is_the_is
and:
Search pattern: Th.@is (maximum closure, 0 or more occurrences)
also matches the string This_is_the_is
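The four searches above can be checked with a small Python sketch. Python spells the closures differently, so the mapping used here is an assumption: the editor's minimum closure "*" corresponds to Python's lazy "*?", and the editor's maximum closure "@" corresponds to Python's greedy "*".

```python
import re

text = "This_is_the_issue."

# Editor: Thi.*is and Th.*is (minimum closure)
minimum = [re.search(p, text).group(0) for p in (r"Thi.*?is", r"Th.*?is")]

# Editor: Thi.@is and Th.@is (maximum closure)
maximum = [re.search(p, text).group(0) for p in (r"Thi.*is", r"Th.*is")]

print(minimum)  # ['This_is', 'This']
print(maximum)  # ['This_is_the_is', 'This_is_the_is']
```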
Note that if the LAST character of a Regular Expression is a MINIMUM closure operator, then the number of characters matched will be 0 characters for the "*" operator (0 or more occurrences), and 1 character for the "+" operator (1 or more occurrences). Typically, you would not use either "*" or "+" as the LAST character of a Regular Expression, since the same result can be achieved without the use of these operators. However, you may encounter situations where it is useful to include the MAXIMUM closure operators ("@" and "#") as the last character of a Regular Expression. For example, consider the following:
Given the text: abbbxyz
then:
Search pattern: ab* (minimum closure, 0 or more occurrences)
matches the string a (because zero occurrences of b satisfies the
minimum); thus, you can achieve the same result with the simple Search
pattern: a
and:
Search pattern: ab+ (minimum closure, 1 or more occurrences)
matches the string ab (and not abbb, because one occurrence of b
satisfies the minimum designated number of occurrences); thus, you can
achieve the same result with the simple Search pattern: ab
HOWEVER:
Search patterns: ab@ or ab# (maximum closure)
both match the string abbb (because either search is satisfied only
by the maximum number of occurrences of b)
ALSO NOTE:
Search patterns: ab*x or ab+x or ab@x or ab#x
ALL match the string abbbx
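The effect of a closure operator in the last position can be seen in the following Python sketch. The editor operators are translated by assumption as "*" to "*?", "+" to "+?", "@" to "*", and "#" to "+"; Python's re module stands in for the editor.

```python
import re

text = "abbbxyz"

def first_match(pattern: str) -> str:
    """Return the first string matched by pattern in the sample text."""
    return re.search(pattern, text).group(0)

# Closure operator as the LAST character of the pattern.
trailing = {
    "ab*": first_match(r"ab*?"),  # minimum, 0 or more
    "ab+": first_match(r"ab+?"),  # minimum, 1 or more
    "ab@": first_match(r"ab*"),   # maximum, 0 or more
    "ab#": first_match(r"ab+"),   # maximum, 1 or more
}

# Closure operator followed by x: all four agree.
followed = {first_match(p) for p in (r"ab*?x", r"ab+?x", r"ab*x", r"ab+x")}

print(trailing)  # {'ab*': 'a', 'ab+': 'ab', 'ab@': 'abbb', 'ab#': 'abbb'}
print(followed)  # {'abbbx'}
```

The dictionary keys are the editor patterns; the values are what the assumed Python equivalents match.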
Here is another example distinguishing minimum and maximum closure:
Given the text:
They suggested we start singing something while sightseeing.
then:
Search pattern: s[a-z]*g (minimum closure, 0 or more occurrences)
matches the strings sug, sing, something, sig, and seeing, as
marked with carets in the following:
They suggested we start singing something while sightseeing.
     ^^^                ^^^^    ^^^^^^^^^       ^^^  ^^^^^^
HOWEVER:
Search pattern: s[a-z]@g (maximum closure, 0 or more occurrences)
matches the strings sugg, singing, something, and sightseeing, as
marked with carets in the following:
They suggested we start singing something while sightseeing.
     ^^^^               ^^^^^^^ ^^^^^^^^^       ^^^^^^^^^^^
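To list every match in the sentence rather than only the first, the same comparison can be run with Python's re.findall. As before, this is a sketch under the assumption that the editor's "*" maps to Python's lazy "*?" and the editor's "@" maps to Python's greedy "*".

```python
import re

sentence = "They suggested we start singing something while sightseeing."

# Editor: s[a-z]*g (minimum closure)
minimum_matches = re.findall(r"s[a-z]*?g", sentence)

# Editor: s[a-z]@g (maximum closure)
maximum_matches = re.findall(r"s[a-z]*g", sentence)

print(minimum_matches)  # ['sug', 'sing', 'something', 'sig', 'seeing']
print(maximum_matches)  # ['sugg', 'singing', 'something', 'sightseeing']
```

The word start produces no match in either case, because the space that follows it is not in the Class and no g is reached first.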
Additional Regular Expression Examples
──────────────────────────────────────
Following are additional examples illustrating the use of Regular Expressions.
∙ To find any string enclosed in double quotes (""), use the following
search pattern:
".*"
Use the following variation of this to position the cursor on the last
character of the located string (so if you search Again, the search will
begin immediately AFTER the quote at the end of the located string):
".*\c"
∙ To find any string enclosed in single ('') OR double ("") quotes, use
the following search pattern:
{".*"}|{'.*'}
∙ To find any white (tab or space) character, use either of the following
search patterns:
[ \x09]
[ \t]
∙ To find any non-white character, use either of the following search
patterns:
[~ \x09]
[~ \t]
∙ To find a blank line, use the following search pattern:
^$
∙ To find blank lines, or lines that contain only white characters, use
either of the following search patterns:
^[ \x09]@$
^[ \t]@$
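Several of the patterns in this list can be tried in Python as a rough illustration. The translations are assumptions: ".*" with minimum closure becomes ".*?", the complement symbol "~" becomes "^", and "@" becomes "*". Each element of the hypothetical lines list stands for one line of a file, since matches do not span lines.

```python
import re

lines = ["say \"hi\" and 'bye'", "", " \t ", "last line"]

# {".*"}|{'.*'} : strings in double or single quotes.
quote_pattern = r'".*?"' + "|" + r"'.*?'"
quoted = re.findall(quote_pattern, lines[0])

# [~ \t] : any non-white character.
non_white = re.findall(r"[^ \t]", "a b\tc")

# ^$ : blank lines only (reported as 0-based line indexes).
blank = [i for i, line in enumerate(lines) if re.search(r"^$", line)]

# ^[ \t]@$ : blank lines or lines containing only white characters.
blank_or_white = [i for i, line in enumerate(lines)
                  if re.search(r"^[ \t]*$", line)]

print(quoted)          # ['"hi"', "'bye'"]
print(non_white)       # ['a', 'b', 'c']
print(blank)           # [1]
print(blank_or_white)  # [1, 2]
```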
Summary List of Regular Expression Operators
────────────────────────────────────────────

Symbol  Regular Expression Operation
______  _____________________________________________________________________
. Search: matches any single character (except end-of-line)
^ Search: anchors to beginning of line (or block)
$ Search: anchors to end of line (or block)
| Search "Or" operator: matches preceding or following sub-pattern
? Search: optionally matches preceding sub-pattern
[ ] Search: identifies a Class of characters
[ - ] Search: indicates a range of characters in a Class
[~ ] Search: identifies a complement Class
* Search: matches 0 or more occurrences of preceding sub-pattern
(minimum closure)
+ Search: matches 1 or more occurrences of preceding sub-pattern
(minimum closure)
@ Search: matches 0 or more occurrences of preceding sub-pattern
(maximum closure)
# Search: matches 1 or more occurrences of preceding sub-pattern
(maximum closure)
{ } Search: Tags a sub-pattern
\0..\9 Replace: references a Tagged search sub-pattern
\ Search/replace: Escape operator (overrides Regular Expression
operators)
\a Search/replace: represents alert (beep) character (^G or ASCII 7)
\b Search/replace: represents backspace character (^H or ASCII 8)
\c Search: positions cursor within located string (with Find command)
\f Search/replace: represents formfeed character (^L or ASCII 12)
\n Search/replace: represents newline (line feed) character (^J or
ASCII 10) if used in a "Search for:" specification. Causes a
SplitLine() operation to occur (at the specified position) if used in
a "Replace with:" specification.
\r Search/replace: represents return character (^M or ASCII 13)
\t Search/replace: represents tab character (^I or ASCII 9)
\v Search/replace: represents vertical tab character (^K or ASCII 11)
\xnn Search/replace: represents hexadecimal value of equivalent character
\dnnn Search/replace: represents decimal value of equivalent character
\onnn Search/replace: represents octal value of equivalent character
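For readers who know another regular expression dialect, the following Python sketch pairs a few editor patterns from this section with hand-written Python equivalents and checks each against a sample string. The equivalents are one reading of the operator descriptions, not an official mapping, and the sample strings are invented.

```python
import re

# (editor pattern, assumed Python equivalent, sample text, expected match)
CASES = [
    ("colou?r",           r"colou?r",         "colour",      "colour"),
    ("[~0-2a=]",          r"[^0-2a=]",        "a=7",         "7"),
    ("ba*c",              r"ba*?c",           "baac",        "baac"),
    ("ba#c",              r"ba+c",            "xbacx",       "bac"),
    ("{Begin}|{End}File", r"(Begin|End)File", "see EndFile", "EndFile"),
    ("licens|ce",         r"licen(s|c)e",     "licence",     "licence"),
    ("\\d064$",           r"@$",              "mail@",       "@"),
]

# Run each assumed Python equivalent against its sample text.
matches = [re.search(py, text).group(0) for _, py, text, _ in CASES]

for (editor, py, _, _), found in zip(CASES, matches):
    print(f"{editor:<20} {py:<18} {found}")
```

The last three rows show where the dialects diverge most: the editor's "|" applies to the sub-patterns immediately beside it, so the Python form needs explicit grouping, and Python has no \dnnn notation.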