| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> | 
|---|
| 2 |  | 
|---|
| 3 | <html> | 
|---|
| 4 | <head> | 
|---|
| 5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> | 
|---|
| 6 | <meta name="Author" content="Thomas Bretz"> | 
|---|
| 7 | <title>MARS: Magic Analysis and Reconstruction Software</title> | 
|---|
| 8 | <link rel="stylesheet" type="text/css" href="mars.css"> | 
|---|
| 9 | </head> | 
|---|
| 10 |  | 
|---|
| 11 | <body background="background.gif" text="#000000" bgcolor="#000099" link="#1122FF" vlink="#8888FF" alink="#FF0000"> | 
|---|
| 12 |   | 
|---|
| 13 |  | 
|---|
| 14 | <center> | 
|---|
| 15 | <table class="Main" CELLPADDING=0> | 
|---|
| 16 |  | 
|---|
| 17 | <tr> | 
|---|
| 18 | <td class="Edge"><img SRC="ecke.gif" ALT=""></td> | 
|---|
| 19 | <td class="Header"> | 
|---|
| 20 | <B>M A R S</B><BR><B>M</B>agic <B>A</B>nalysis and <B>R</B>econstruction <B>S</B>oftware | 
|---|
| 21 | </td> | 
|---|
| 22 | </tr> | 
|---|
| 23 |  | 
|---|
| 24 | <tr> | 
|---|
| 25 | <td COLSPAN=2 BGCOLOR="#FFFFFF"> | 
|---|
| 26 | <hr SIZE=1 NOSHADE WIDTH="80%"> | 
|---|
| 27 | <center><table class="Inner" CELLPADDING=15> | 
|---|
| 28 |  | 
|---|
| 29 | <tr class="Block"> | 
|---|
| 30 | <td><b><u><A NAME="OVERVIEW">MySQL Regular Expressions</A>:</u></b> | 
|---|
| 31 | <P> | 
|---|
| 32 | A <B>regular expression (regex)</B> is a powerful way of specifying a complex search. <P> | 
|---|
| 33 |  | 
|---|
| 34 | MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX | 
|---|
| 35 | 1003.2. MySQL uses the extended version. <P> | 
|---|
| 36 |  | 
|---|
| 37 | This is a simplified reference that omits many details. For exact information, see | 
|---|
| 38 | Henry Spencer's <A HREF="#REGEX">regex(7)</A> manual page.<P> | 
|---|
| 39 |  | 
|---|
| 40 | A regular expression describes a set of strings. The simplest regexp is one that has no special characters in it. For | 
|---|
| 41 | example, the regexp <b>hello</B> matches <B>hello</B> and nothing else. <P> | 
|---|
| 42 |  | 
|---|
| 43 | Non-trivial regular expressions use certain special constructs so that they can match more than one string. For | 
|---|
| 44 | example, the regexp hello|word matches either the string hello or the string word. <P> | 
|---|
| 45 |  | 
|---|
| 46 | As a more complex example, the regexp B[an]*s matches any of the strings Bananas, Baaaaas, Bs, and any | 
|---|
| 47 | other string starting with a B, ending with an s, and containing any number of a or n characters in between. <P> | 
|---|
| 48 |  | 
|---|
| 49 | A regular expression may use any of the following special characters/constructs:   <P> | 
|---|
| 50 | <pre> | 
|---|
| 51 | ^         Match the beginning of a string. | 
|---|
| 52 | mysql> SELECT "fo\nfo" REGEXP "^fo$";           -> 0 | 
|---|
| 53 | mysql> SELECT "fofo"   REGEXP "^fo";            -> 1 | 
|---|
| 54 |  | 
|---|
| 55 | $         Match the end of a string. | 
|---|
| 56 | mysql> SELECT "fo\no" REGEXP "^fo\no$";         -> 1 | 
|---|
| 57 | mysql> SELECT "fo\no" REGEXP "^fo$";            -> 0 | 
|---|
| 58 |  | 
|---|
| 59 | .         Match any character (including newline). | 
|---|
| 60 | mysql> SELECT "fofo"  REGEXP "^f.*";            -> 1 | 
|---|
| 61 | mysql> SELECT "fo\nfo" REGEXP "^f.*";           -> 1 | 
|---|
| 62 |  | 
|---|
| 63 | a*        Match any sequence of zero or more a characters. | 
|---|
| 64 | mysql> SELECT "Ban"   REGEXP "^Ba*n";           -> 1 | 
|---|
| 65 | mysql> SELECT "Baaan" REGEXP "^Ba*n";           -> 1 | 
|---|
| 66 | mysql> SELECT "Bn"    REGEXP "^Ba*n";           -> 1 | 
|---|
| 67 |  | 
|---|
| 68 | a+        Match any sequence of one or more a characters. | 
|---|
| 69 | mysql> SELECT "Ban" REGEXP "^Ba+n";             -> 1 | 
|---|
| 70 | mysql> SELECT "Bn"  REGEXP "^Ba+n";             -> 0 | 
|---|
| 71 |  | 
|---|
| 72 | a?        Match either zero or one a character. | 
|---|
| 73 | mysql> SELECT "Bn"   REGEXP "^Ba?n";            -> 1 | 
|---|
| 74 | mysql> SELECT "Ban"  REGEXP "^Ba?n";            -> 1 | 
|---|
| 75 | mysql> SELECT "Baan" REGEXP "^Ba?n";            -> 0 | 
|---|
| 76 |  | 
|---|
| 77 | de|abc    Match either of the sequences de or abc. | 
|---|
| 78 | mysql> SELECT "pi"  REGEXP "pi|apa";            -> 1 | 
|---|
| 79 | mysql> SELECT "axe" REGEXP "pi|apa";            -> 0 | 
|---|
| 80 | mysql> SELECT "apa" REGEXP "pi|apa";            -> 1 | 
|---|
| 81 | mysql> SELECT "apa" REGEXP "^(pi|apa)$";        -> 1 | 
|---|
| 82 | mysql> SELECT "pi"  REGEXP "^(pi|apa)$";        -> 1 | 
|---|
| 83 | mysql> SELECT "pix" REGEXP "^(pi|apa)$";        -> 0 | 
|---|
| 84 |  | 
|---|
| 85 | (abc)*    Match zero or more instances of the sequence abc. | 
|---|
| 86 | mysql> SELECT "pi"   REGEXP "^(pi)*$";          -> 1 | 
|---|
| 87 | mysql> SELECT "pip"  REGEXP "^(pi)*$";          -> 0 | 
|---|
| 88 | mysql> SELECT "pipi" REGEXP "^(pi)*$";          -> 1 | 
|---|
| 89 |  | 
|---|
| 90 | {1}       This is a more general way of writing regexps that match many | 
|---|
| 91 | {2,3}     occurrences of the previous atom. | 
|---|
| 92 | a*          Can be written as a{0,}. | 
|---|
| 93 | a+          Can be written as a{1,}. | 
|---|
| 94 | a?          Can be written as a{0,1}. | 
|---|
| 95 |  | 
|---|
| 96 | To be more precise, an atom followed by a bound containing one | 
|---|
| 97 | integer i and no comma matches a sequence of exactly i matches | 
|---|
| 98 | of the atom. An atom followed by a bound containing one integer i | 
|---|
| 99 | and a comma matches a sequence of i or more matches of the atom. | 
|---|
| 100 | An atom followed by a bound containing two integers i and j matches | 
|---|
| 101 | a sequence of i through j (inclusive) matches of the atom. | 
|---|
| 102 |  | 
|---|
| 103 | Both arguments must be in the range from 0 to RE_DUP_MAX (default 255), | 
|---|
| 104 | inclusive.  If there are two arguments, the second must be greater | 
|---|
| 105 | than or equal to the first. | 
|---|
| 106 |  | 
|---|
| 107 | [a-dX]    Matches any character which is (or is not, if ^ is used) either a, b, c, | 
|---|
| 108 | [^a-dX]   d or X. To include a literal ] character, it must immediately follow | 
|---|
| 109 | the opening bracket [.  To include a literal - character, it must be | 
|---|
| 110 | written first or last. So [0-9] matches any decimal digit. Any character | 
|---|
| 111 | that does not have a defined meaning inside a [] pair has no special | 
|---|
| 112 | meaning and matches only itself. | 
|---|
| 113 | mysql> SELECT "aXbc"   REGEXP "[a-dXYZ]";               -> 1 | 
|---|
| 114 | mysql> SELECT "aXbc"   REGEXP "^[a-dXYZ]$";             -> 0 | 
|---|
| 115 | mysql> SELECT "aXbc"   REGEXP "^[a-dXYZ]+$";            -> 1 | 
|---|
| 116 | mysql> SELECT "aXbc"   REGEXP "^[^a-dXYZ]+$";           -> 0 | 
|---|
| 117 | mysql> SELECT "gheis"  REGEXP "^[^a-dXYZ]+$";           -> 1 | 
|---|
| 118 | mysql> SELECT "gheisa" REGEXP "^[^a-dXYZ]+$";           -> 0 | 
|---|
| 119 |  | 
|---|
| 120 | [[.characters.]] | 
|---|
| 121 | The sequence of characters of that collating element. characters is | 
|---|
| 122 | either a single character or a character name like newline. You can | 
|---|
| 123 | find the full list of character names in 'regexp/cname.h'. | 
|---|
| 124 |  | 
|---|
| 125 | [=character_class=] | 
|---|
| 126 | An equivalence class, standing for the sequences of characters of all | 
|---|
| 127 | collating elements equivalent to that one, including itself. | 
|---|
| 128 |  | 
|---|
| 129 | For example, if o and (+) are the members of an equivalence class, | 
|---|
| 130 | then [[=o=]], [[=(+)=]], and [o(+)] are all synonymous. An equivalence | 
|---|
| 131 | class may not be an endpoint of a range. | 
|---|
| 132 |  | 
|---|
| 133 | [:character_class:] | 
|---|
| 134 | Within a bracket expression, the name of a character class enclosed | 
|---|
| 135 | in [: and :] stands for the list of all characters belonging to that | 
|---|
| 136 | class. Standard character class names are alnum, alpha, blank, cntrl, | 
|---|
| 137 | digit, graph, lower, print, punct, space, upper, and xdigit. | 
|---|
| 138 | These stand for the character classes defined in the ctype(3) manual | 
|---|
| 139 | page. A locale may provide others. A character class may not be used | 
|---|
| 140 | as an endpoint of a range. | 
|---|
| 141 | mysql> SELECT "justalnums" REGEXP "[[:alnum:]]+";       -> 1 | 
|---|
| 142 | mysql> SELECT "!!" REGEXP "[[:alnum:]]+";               -> 0 | 
|---|
| 143 |  | 
|---|
| 144 | [[:<:]]   These match the null string at the beginning and end of a word | 
|---|
| 145 | [[:>:]]   respectively.  A word is defined as a sequence of word characters | 
|---|
| 146 | which is neither preceded nor followed by word characters. A word | 
|---|
| 147 | character is an alnum character (as defined by ctype(3)) or an | 
|---|
| 148 | underscore (_). | 
|---|
| 149 | mysql> SELECT "a word a" REGEXP "[[:<:]]word[[:>:]]";   -> 1 | 
|---|
| 150 | mysql> SELECT "a xword a" REGEXP "[[:<:]]word[[:>:]]";  -> 0 | 
|---|
| 151 |  | 
|---|
| 152 | mysql> SELECT "weeknights" REGEXP "^(wee|week)(knights|nights)$"; -> 1 | 
|---|
| 153 | </pre> | 
|---|
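The character class examples above can be tried outside MySQL with a small sketch in Python. Python's re module is not a POSIX engine and does not understand [:name:] classes, so the helper below (expand_posix_classes, a hypothetical name) rewrites a few of them into explicit ASCII ranges first. The table is an assumption covering only some class names in a plain ASCII locale; it is not how MySQL implements them.

```python
import re

# ASCII-only stand-ins for some POSIX class names (an assumption; a real
# locale may define these classes differently, and the table is incomplete).
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace each [:name:] inside a bracket expression with explicit ranges.

    Raises KeyError for class names missing from POSIX_CLASSES.
    """
    def substitute(match):
        return POSIX_CLASSES[match.group(1)]
    return re.sub(r"\[:([a-z]+):\]", substitute, pattern)

translated = expand_posix_classes("[[:alnum:]]+")
print(translated)
print(re.search(translated, "justalnums") is not None)
print(re.search(translated, "!!") is not None)
```

The two searches mirror the "justalnums" and "!!" queries shown earlier.<P>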
| 154 | </td></tr> | 
|---|
| 155 | <tr class="Block"> | 
|---|
| 156 | <td> | 
|---|
| 157 | <center><h3>--- <A NAME="REGEX"><U>REGEX</U></A>(7) ---</h3></center> | 
|---|
| 158 | <B>NAME</B><BR> | 
|---|
| 159 | regex - POSIX 1003.2 regular expressions<P> | 
|---|
| 160 |  | 
|---|
| 161 | <B>DESCRIPTION</B><BR> | 
|---|
| 162 | Regular expressions (``RE''s), as defined in POSIX 1003.2, | 
|---|
| 163 | come in two forms: modern REs  (roughly  those  of  egrep; | 
|---|
| 164 | 1003.2  calls  these  ``extended''  REs)  and obsolete REs | 
|---|
| 165 | (roughly those of ed; 1003.2 ``basic'' REs).  Obsolete REs | 
|---|
| 166 | mostly  exist  for backward compatibility in some old | 
|---|
| 167 | programs; they will be discussed at the end.   1003.2  leaves | 
|---|
| 168 | some  aspects  of  RE  syntax and semantics open; the original manual page | 
|---|
| 169 | uses a dagger to mark decisions on these aspects that may not be fully portable | 
|---|
| 170 | to other 1003.2 implementations (the dagger marks are not reproduced here).<P> | 
|---|
| 171 |  | 
|---|
| 172 | A (modern) RE is one or more non-empty branches, separated | 
|---|
| 173 | by `|'.  It matches  anything  that  matches  one  of  the | 
|---|
| 174 | branches.<P> | 
|---|
| 175 |  | 
|---|
| 176 | A  branch is one or more pieces, concatenated.  It matches | 
|---|
| 177 | a match for the first, followed by a match for the second, | 
|---|
| 178 | etc.<P> | 
|---|
| 179 |  | 
|---|
| 180 | A piece is an atom possibly followed by a single `*', `+', | 
|---|
| 181 | `?', or bound.  An atom followed by `*' matches a sequence | 
|---|
| 182 | of 0 or more matches of the atom.  An atom followed by `+' | 
|---|
| 183 | matches a sequence of 1 or more matches of the  atom.   An | 
|---|
| 184 | atom  followed by `?' matches a sequence of 0 or 1 matches | 
|---|
| 185 | of the atom.<P> | 
|---|
| 186 |  | 
|---|
| 187 | A bound is `{' followed by an  unsigned  decimal  integer, | 
|---|
| 188 | possibly  followed  by  `,'  possibly  followed by another | 
|---|
| 189 | unsigned decimal integer, always  followed  by  `}'.   The | 
|---|
| 190 | integers  must  lie  between 0 and RE_DUP_MAX (255) | 
|---|
| 191 | inclusive, and if there are two of  them,  the  first  may  not | 
|---|
| 192 | exceed the second.  An atom followed by a bound containing | 
|---|
| 193 | one integer i and no comma matches a sequence of exactly i | 
|---|
| 194 | matches of the atom.  An atom followed by a bound | 
|---|
| 195 | containing one integer i and a comma matches a sequence of  i  or | 
|---|
| 196 | more  matches  of  the  atom.  An atom followed by a bound | 
|---|
| 197 | containing two integers i and j matches a  sequence  of  i | 
|---|
| 198 | through j (inclusive) matches of the atom.<P> | 
|---|
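The three bound forms can be checked with a short sketch. It uses Python's re module, which is not Henry Spencer's implementation but accepts the same {i}, {i,} and {i,j} syntax for these simple patterns; the helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["", "a", "aa", "aaa", "aaaa"]

# `*', `+' and `?' agree with their bound spellings on every sample.
star_same = all(matches("a*", s) == matches("a{0,}", s) for s in samples)
plus_same = all(matches("a+", s) == matches("a{1,}", s) for s in samples)
opt_same = all(matches("a?", s) == matches("a{0,1}", s) for s in samples)

exactly_two = [s for s in samples if matches("a{2}", s)]         # {i}
two_or_more = [s for s in samples if matches("a{2,}", s)]        # {i,}
two_through_three = [s for s in samples if matches("a{2,3}", s)]  # {i,j}

print(star_same, plus_same, opt_same)
print(exactly_two, two_or_more, two_through_three)
```

The RE_DUP_MAX limit of 255 is specific to the POSIX implementation and is not modelled here.<P>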
| 199 |  | 
|---|
| 200 | An atom is a regular expression enclosed in `()' (matching | 
|---|
| 201 | a match for the regular expression), an empty set of  `()' | 
|---|
| 202 | (matching  the  null  string),  a  bracket expression (see | 
|---|
| 203 | below), `.'  (matching any single character), `^' | 
|---|
| 204 | (matching  the  null  string  at  the  beginning of a line), `$' | 
|---|
| 205 | (matching the null string at the end of  a  line),  a  `\' | 
|---|
| 206 | followed by one of the characters `^.[$()|*+?{\' (matching | 
|---|
| 207 | that character taken as an ordinary character), a `\' | 
|---|
| 208 | followed  by  any  other  character  (matching that character | 
|---|
| 209 | taken as an ordinary character, as if the `\' had not been | 
|---|
| 210 | present), or a single character with no other significance | 
|---|
| 211 | (matching that character).  A `{' followed by a  character | 
|---|
| 212 | other  than  a  digit  is  an  ordinary character, not the | 
|---|
| 213 | beginning of a bound.  It is illegal to  end  an  RE  with | 
|---|
| 214 | `\'.<P> | 
|---|
| 215 |  | 
|---|
| 216 | A  bracket  expression is a list of characters enclosed in | 
|---|
| 217 | `[]'.  It normally matches any single character  from  the | 
|---|
| 218 | list  (but  see  below).   If the list begins with `^', it | 
|---|
| 219 | matches any single character (but see below) not from  the | 
|---|
| 220 | rest of the list.  If two characters in the list are | 
|---|
| 221 | separated by `-', this is shorthand  for  the  full  range  of | 
|---|
| 222 | characters  between those two (inclusive) in the collating | 
|---|
| 223 | sequence, e.g. `[0-9]' in ASCII matches any decimal digit. | 
|---|
| 224 | It  is  illegal  for two ranges to share an endpoint, e.g. | 
|---|
| 225 | `a-c-e'.  Ranges  are  very  collating-sequence-dependent, | 
|---|
| 226 | and portable programs should avoid relying on them.<P> | 
|---|
| 227 |  | 
|---|
| 228 | To  include  a  literal `]' in the list, make it the first | 
|---|
| 229 | character (following a possible `^').  To include  a | 
|---|
| 230 | literal `-', make it the first or last character, or the | 
|---|
| 231 | second endpoint of a range.  To use  a  literal  `-'  as  the | 
|---|
| 232 | first  endpoint of a range, enclose it in `[.' and `.]' to | 
|---|
| 233 | make it a collating element (see below).  With the | 
|---|
| 234 | exception  of  these  and some combinations using `[' (see next | 
|---|
| 235 | paragraphs), all other special characters, including  `\', | 
|---|
| 236 | lose  their  special significance within a bracket | 
|---|
| 237 | expression.<P> | 
|---|
| 238 |  | 
|---|
| 239 | Within a bracket expression, a collating element (a | 
|---|
| 240 | character,  a  multi-character sequence that collates as if it | 
|---|
| 241 | were a single character, or a collating-sequence name  for | 
|---|
| 242 | either)  enclosed in `[.' and `.]' stands for the sequence | 
|---|
| 243 | of characters of that collating element.  The sequence  is | 
|---|
| 244 | a  single  element  of  the  bracket expression's list.  A | 
|---|
| 245 | bracket expression containing a multi-character  collating | 
|---|
| 246 | element  can  thus  match more than one character, e.g. if | 
|---|
| 247 | the collating sequence includes a `ch' collating  element, | 
|---|
| 248 | then the RE `[[.ch.]]*c' matches the first five characters | 
|---|
| 249 | of `chchcc'.<P> | 
|---|
| 250 |  | 
|---|
| 251 | Within a bracket expression, a collating element  enclosed | 
|---|
| 252 | in `[=' and `=]' is an equivalence class, standing for the | 
|---|
| 253 | sequences of characters of all collating elements | 
|---|
| 254 | equivalent  to  that  one,  including  itself.  (If there are no | 
|---|
| 255 | other equivalent collating elements, the treatment  is  as | 
|---|
| 256 | if  the  enclosing  delimiters  were  `[.' and `.]'.)  For | 
|---|
| 257 | example, if o and ô are  the  members  of  an  equivalence | 
|---|
| 258 | class,  then `[[=o=]]', `[[=ô=]]', and `[oô]' are all | 
|---|
| 259 | synonymous.  An equivalence class may not be an endpoint of a | 
|---|
| 260 | range.<P> | 
|---|
| 261 |  | 
|---|
| 262 | Within a bracket expression, the name of a character class | 
|---|
| 263 | enclosed in `[:' and `:]' stands for the list of all | 
|---|
| 264 | characters  belonging to that class.  Standard character class | 
|---|
| 265 | names are:<P> | 
|---|
| 266 | <table> | 
|---|
| 267 | <tr><td>alnum</TD><td>digit</td><td>punct</td></tr> | 
|---|
| 268 | <tr><td>alpha</TD><td>graph</TD><td>space</td></tr> | 
|---|
| 269 | <tr><td>blank</TD><td>lower</TD><td>upper</td></tr> | 
|---|
| 270 | <tr><td>cntrl</TD><td>print</TD><td>xdigit</td></tr> | 
|---|
| 271 | </table> | 
|---|
| 272 | <P> | 
|---|
| 273 | These stand for the character classes defined in ctype(3). | 
|---|
| 274 | A locale may provide others.  A character class may not be | 
|---|
| 275 | used as an endpoint of a range.<P> | 
|---|
| 276 |  | 
|---|
| 277 | There are two special cases of  bracket  expressions:  the | 
|---|
| 278 | bracket expressions `[[:<:]]' and `[[:>:]]' match the null | 
|---|
| 279 | string at the beginning and end of a word respectively.  A | 
|---|
| 280 | word  is defined as a sequence of word characters which is | 
|---|
| 281 | neither preceded nor followed by word characters.  A  word | 
|---|
| 282 | character  is  an alnum character (as defined by ctype(3)) | 
|---|
| 283 | or an underscore.  This is an extension,  compatible  with | 
|---|
| 284 | but not specified by POSIX 1003.2, and should be used with | 
|---|
| 285 | caution in software intended to be portable to other | 
|---|
| 286 | systems.<P> | 
|---|
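The word-boundary definition above can be written out explicitly. The following sketch uses Python lookaround assertions as a stand-in, since Python's re module does not accept `[[:<:]]' or `[[:>:]]'. The constants WORD_START and WORD_END and the helper contains_word are hypothetical names, and word characters are assumed to be ASCII alphanumerics plus underscore.

```python
import re

# Null string at the start of a word: no word character before, one after.
WORD_START = r"(?<![0-9A-Za-z_])(?=[0-9A-Za-z_])"
# Null string at the end of a word: a word character before, none after.
WORD_END = r"(?<=[0-9A-Za-z_])(?![0-9A-Za-z_])"

def contains_word(word: str, text: str) -> bool:
    """True if word occurs in text as a whole word."""
    pattern = WORD_START + re.escape(word) + WORD_END
    return re.search(pattern, text) is not None

print(contains_word("word", "a word a"))
print(contains_word("word", "a xword a"))
```

The two calls correspond to the `[[:<:]]word[[:>:]]' queries against "a word a" and "a xword a" in the MySQL section.<P>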
| 287 |  | 
|---|
| 288 | In  the  event  that  an RE could match more than one | 
|---|
| 289 | substring of a given string, the RE matches the one  starting | 
|---|
| 290 | earliest  in  the string.  If the RE could match more than | 
|---|
| 291 | one substring starting  at  that  point,  it  matches  the | 
|---|
| 292 | longest.   Subexpressions  also match the longest possible | 
|---|
| 293 | substrings, subject to the constraint that the whole match | 
|---|
| 294 | be  as long as possible, with subexpressions starting | 
|---|
| 295 | earlier in the RE taking priority over ones  starting  later. | 
|---|
| 296 | Note  that  higher-level subexpressions thus take priority | 
|---|
| 297 | over their lower-level component subexpressions.<P> | 
|---|
| 298 |  | 
|---|
| 299 | Match lengths are measured in  characters,  not  collating | 
|---|
| 300 | elements.   A  null  string  is  considered longer than no | 
|---|
| 301 | match at all.  For example, `bb*' matches the three middle | 
|---|
| 302 | characters    of   `abbbc',   `(wee|week)(knights|nights)' | 
|---|
| 303 | matches all ten characters of `weeknights', when  `(.*).*' | 
|---|
| 304 | is  matched  against `abc' the parenthesized subexpression | 
|---|
| 305 | matches all three characters, and when `(a*)*' is  matched | 
|---|
| 306 | against  `bc'  both  the  whole  RE  and the parenthesized | 
|---|
| 307 | subexpression match the null string.<P> | 
|---|
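The earliest-start, longest-match rule for the overall match can be sketched by brute force. The function below (leftmost_longest, a hypothetical name) tries every start position from left to right and, for each, every end position from longest to shortest. It relies on Python's re module only to test whether a candidate substring matches in full, so it applies only to patterns whose syntax Python shares, and it does not model the priority rules for subexpressions.

```python
import re

def leftmost_longest(pattern: str, text: str):
    """Return (start, end) of the earliest, then longest, match, or None."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Longest candidate first; end == start is the null string.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return start, end
    return None

print(leftmost_longest("bb*", "abbbc"))
print(leftmost_longest("(wee|week)(knights|nights)", "weeknights"))
print(leftmost_longest("a|ab", "xab"))
print(re.search("a|ab", "xab").span())
```

The last two lines show where the rule matters: Python's own search stops at the first alternative that succeeds, while the POSIX rule asks for the longer match at the same starting point.<P>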
| 308 |  | 
|---|
| 309 | If case-independent matching is specified, the  effect  is | 
|---|
| 310 | much  as  if  all  case distinctions had vanished from the | 
|---|
| 311 | alphabet.  When an  alphabetic  that  exists  in  multiple | 
|---|
| 312 | cases  appears  as an ordinary character outside a bracket | 
|---|
| 313 | expression, it is effectively transformed into  a  bracket | 
|---|
| 314 | expression containing both cases, e.g. `x' becomes `[xX]'. | 
|---|
| 315 | When it appears inside  a  bracket  expression,  all  case | 
|---|
| 316 | counterparts of it are added to the bracket expression, so | 
|---|
| 317 | that  (e.g.)  `[x]'  becomes  `[xX]'  and  `[^x]'  becomes | 
|---|
| 318 | `[^xX]'.<P> | 
|---|
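The "one case implies all cases" transformation can be illustrated for the simplest situation, a pattern made only of ordinary characters. In this sketch the helper expand_case (a hypothetical name) turns each letter into a two-member bracket expression and escapes everything else; handling letters already inside bracket expressions is left out.

```python
import re

def expand_case(literal: str) -> str:
    """Rewrite a literal string so that each letter matches in both cases."""
    parts = []
    for ch in literal:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append("[" + lower + upper + "]")  # e.g. x becomes [xX]
        else:
            parts.append(re.escape(ch))  # not a cased letter: keep it literal
    return "".join(parts)

print(expand_case("x"))
print(re.fullmatch(expand_case("Ban"), "bAN") is not None)
```
<P>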
| 319 |  | 
|---|
| 320 | No particular limit is imposed on the length of REs. | 
|---|
| 321 | Programs intended to be portable should not employ REs longer | 
|---|
| 322 | than  256 bytes, as an implementation can refuse to accept | 
|---|
| 323 | such REs and remain POSIX-compliant.<P> | 
|---|
| 324 |  | 
|---|
| 325 | Obsolete (``basic'') regular expressions differ in several | 
|---|
| 326 | respects.   `|',  `+', and `?' are ordinary characters and | 
|---|
| 327 | there is  no  equivalent  for  their  functionality.   The | 
|---|
| 328 | delimiters  for bounds are `\{' and `\}', with `{' and `}' | 
|---|
| 329 | by themselves ordinary characters.   The  parentheses  for | 
|---|
| 330 | nested  subexpressions are `\(' and `\)', with `(' and `)' | 
|---|
| 331 | by themselves ordinary characters.   `^'  is  an  ordinary | 
|---|
| 332 | character  except at the beginning of the RE or the | 
|---|
| 333 | beginning of a parenthesized subexpression, `$' is an  ordinary | 
|---|
| 334 | character  except  at  the  end  of the RE or the end of a | 
|---|
| 335 | parenthesized subexpression, and `*' is an ordinary | 
|---|
| 336 | character  if  it  appears  at  the beginning of the RE or the | 
|---|
| 337 | beginning of a parenthesized subexpression (after a | 
|---|
| 338 | possible leading `^').  Finally, there is one new type of atom, | 
|---|
| 339 | a back reference: `\' followed by a non-zero decimal digit | 
|---|
| 340 | d  matches  the same sequence of characters matched by the | 
|---|
| 341 | dth parenthesized subexpression (numbering  subexpressions | 
|---|
| 342 | by  the  positions  of  their opening parentheses, left to | 
|---|
| 343 | right), so that (e.g.) `\([bc]\)\1' matches `bb'  or  `cc' | 
|---|
| 344 | but not `bc'.<P> | 
|---|
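The back-reference example can be tried with the sketch below. It uses Python's re module, which writes groups with plain parentheses rather than the `\(' and `\)' of basic REs, so the pattern is spelled `([bc])\1' here; the matching behaviour for this example is the same. The names doubled and results are hypothetical.

```python
import re

# \1 must repeat exactly what the first parenthesized group matched.
doubled = re.compile(r"([bc])\1")

results = {s: doubled.fullmatch(s) is not None for s in ("bb", "cc", "bc")}
print(results)
```
<P>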
| 345 |  | 
|---|
| 346 | <B>SEE ALSO</B><BR> | 
|---|
| 347 | POSIX 1003.2, section 2.8 (Regular Expression Notation).<P> | 
|---|
| 348 |  | 
|---|
| 349 | <B>BUGS</B><BR> | 
|---|
| 350 | Having two kinds of REs is a botch.<P> | 
|---|
| 351 |  | 
|---|
| 352 | The current 1003.2 spec says that `)' is an ordinary | 
|---|
| 353 | character in the absence of an  unmatched  `(';  this  was  an | 
|---|
| 354 | unintentional  result  of  a  wording error, and change is | 
|---|
| 355 | likely.  Avoid relying on it.<P> | 
|---|
| 356 |  | 
|---|
| 357 | Back references are a dreadful botch, posing  major | 
|---|
| 358 | problems  for  efficient implementations.  They are also | 
|---|
| 359 | somewhat  vaguely  defined   (does   `a\(\(b\)*\2\)*d'   match | 
|---|
| 360 | `abbbd'?).  Avoid using them.<P> | 
|---|
| 361 |  | 
|---|
| 362 | 1003.2's  specification  of  case-independent  matching is | 
|---|
| 363 | vague.  The ``one  case  implies  all  cases''  definition | 
|---|
| 364 | given  above is current consensus among implementors as to | 
|---|
| 365 | the right interpretation.<P> | 
|---|
| 366 |  | 
|---|
| 367 | The syntax for word boundaries is incredibly ugly.<P> | 
|---|
| 368 |  | 
|---|
| 369 | <B>AUTHOR</B><BR> | 
|---|
| 370 | This page was taken from Henry Spencer's regex package. | 
|---|
| 371 | </td> | 
|---|
| 372 | </tr> | 
|---|
| 373 |  | 
|---|
| 374 | </table></center> | 
|---|
| 375 |  | 
|---|
| 376 | <center> | 
|---|
| 377 | <hr NOSHADE WIDTH="80%"><i><font color="#000099"><font size=-1>This Web Site is | 
|---|
| 378 | hosted by Apache for OS/2 and done by <a href="mailto:tbretz@astro.uni-wuerzburg.de">Thomas Bretz</a>.</font></font></i><BR> | 
|---|
| 379 |  <BR> | 
|---|
| 380 | <a href="http://validator.w3.org/check/referer"><img border="0" | 
|---|
| 381 | src="../../valid-html40.png" alt="Valid HTML 4.0!" height="20" width="66"></a> | 
|---|
| 382 | </center>  | 
|---|
| 383 | </tr> | 
|---|
| 384 | </table> | 
|---|
| 385 |  | 
|---|
| 386 | </center> | 
|---|
| 387 |  | 
|---|
| 388 | </body> | 
|---|
| 389 | </html> | 
|---|