正則表達式Regular expression 個人筆記

Regular Expression Syntax

特殊符號 說明
. 匹配除换行符(Newline)以外的任意字符
^ 匹配字符串的開始
$ 匹配字符串的結束
* 匹配前面出現0次或多次的正則表達式
+ 匹配前面出現1次或多次的正則表達式
? 匹配前面出現0次或1次的正則表達式
{m} m必須為一個非負整數。匹配確定的m次,例如可以匹配a{6},你只能匹配有6個’a’的正則式
{m,n} 匹配的次數需要在m和之間。其中m<=n.例如,o{1,3}只可以匹配“fooooood”裡面前三個的o。
{m,} 匹配的次數最少為m,最多無限制。
{m,n}? 加了問號正則式只會匹配m而已,例子:‘aaaaaa’中,a{3,5}?,只會匹配3個a。

\ Either escapes special characters (permitting you to match characters like ‘*’, ‘?’, and so forth), or signals a special sequence; special sequences are discussed below.

If you’re not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn’t recognized by Python’s parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it’s highly recommended that you use raw strings for all but the simplest expressions.

[] Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match ‘a’, ‘m’, or ‘k’.
Ranges of characters can be indicated by giving two characters and separating them by a ‘-‘, for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a-z]) or if it’s placed as the first or last character (e.g. [-a] or [a-]), it will match a literal ‘-‘.
Special characters lose their special meaning inside sets. For example, [(+)] will match any of the literal characters ‘(‘, ‘+’, ‘‘, or ‘)’.
Character classes such as \w or \S (defined below) are also accepted inside a set, although the characters they match depends on whether ASCII or LOCALE mode is in force.
Characters that are not within a range can be matched by complementing the set. If the first character of the set is ‘^’, all the characters that are not in the set will be matched. For example, 5 will match any character except ‘5’, and [^^] will match any character except ‘^’. ^ has no special meaning if it’s not the first character in the set.
To match a literal ‘]’ inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[]{}] and [{}] will both match a parenthesis.

| A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the ‘|’ in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by ‘|’ are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the ‘|’ operator is never greedy. To match a literal ‘|’, use |, or enclose it inside a character class, as in [|].

To be continue.

0%