Python re module and basic regular expressions


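1. Iterator: an object that implements the __iter__() and __next__() methods. iter() returns the iterator, and next() steps through it one element at a time until StopIteration is raised.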
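A minimal sketch of the iterator protocol described above, assuming Python 3. The class name Countdown is hypothetical and only used for illustration.

```python
class Countdown:
    """Hypothetical example iterator: yields n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.n
        self.n -= 1
        return value


it = iter(Countdown(3))  # iter() calls __iter__()
print(next(it))          # 3   (next() calls __next__())
print(list(it))          # [2, 1]   (the remaining items)
```

A for loop does the same thing implicitly: it calls iter() once and then next() until StopIteration.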

 

Python regular expressions

1. Python supports regular expressions through the re module.

2. To list the Python modules available on the current system: help('modules')

help() can be used in two ways: interactively (entering the help utility) or as a function call with an argument.

Example: interactive call

>>> help()

Welcome to Python 3.5's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/3.5/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".

help> modules

 

Function call

help('modules')

 

3. Regular expression metacharacters

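\s  : whitespace character;
\S  : non-whitespace character;
[\s\S]  : any character, including newline;
[\s\S]*  : zero or more of any character;
[\s\S]*?  : zero or more of any character, non-greedy (as few as possible; on its own it matches the empty string, i.e. the position before the next character);

\d: digit;

\D: non-digit;

\w: word character; for str patterns Python 3 matches Unicode word characters, and only with re.ASCII is it equivalent to [a-zA-Z0-9_];

\W: non-word character;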
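A short sketch of the character classes above, assuming Python 3 str patterns. The sample strings are made up for illustration.

```python
import re

text = "id_42 is\tok!"
print(re.findall(r"\d", text))    # ['4', '2']
print(re.findall(r"\D", "a1b"))   # ['a', 'b']
print(re.findall(r"\s", text))    # [' ', '\t']
print(re.findall(r"\w+", text))   # ['id_42', 'is', 'ok']
print(re.findall(r"\W", text))    # [' ', '\t', '!']

# \w is Unicode-aware for str patterns; re.ASCII restricts it to [a-zA-Z0-9_]
print(re.findall(r"\w+", "café"))            # ['café']
print(re.findall(r"\w+", "café", re.ASCII))  # ['caf']

# The lazy [\s\S]*? matches as little as possible: here, the empty string
print(repr(re.match(r"[\s\S]*?", "abc").group()))  # ''
```

Raw strings (r"...") are used so that backslashes reach the regex engine unchanged.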

 

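Rules:

.  matches any single character except a newline (unless re.S is set);

*  matches the preceding item (character, class or group) 0 or more times;

+  matches the preceding item 1 or more times;

?  matches the preceding item 0 or 1 time;

{m}    matches the preceding item exactly m times;

{m,n} matches the preceding item m to n times;

{m,}   matches the preceding item at least m times, with no upper limit;

{,n}  matches the preceding item 0 to n times;

\  escape character;

[...]  character set, e.g. [a-z];

*? +? ?? {m,n}?    make *, +, ? and {m,n} non-greedy (lazy), e.g. .*?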
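A small sketch contrasting greedy and non-greedy quantifiers, assuming Python 3; the HTML-like string is only sample text, not a recommendation to parse HTML with regexes.

```python
import re

html = "<b>bold</b><i>italic</i>"
greedy = re.findall(r"<.*>", html)   # .* grabs as much as possible
lazy = re.findall(r"<.*?>", html)    # .*? stops at the first possible '>'
print(greedy)  # ['<b>bold</b><i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

print(re.findall(r"colou?r", "color colour"))  # ['color', 'colour']
print(re.findall(r"\d{2,3}", "1234567"))       # ['123', '456']
print(re.findall(r"\d{2,3}?", "1234567"))      # ['12', '34', '56']
print(re.findall(r"a{2,}", "a aa aaaa"))       # ['aa', 'aaaa']
```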

 

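Boundary matching (zero-width: does not consume any characters of the string being matched)

^: matches the start of the string; in multiline mode (re.M) it matches the start of every line;

$: matches the end of the string (or just before a trailing newline); in multiline mode it matches the end of every line;

\b: matches a word boundary. It matches no characters, only a position: on one side is a word character, on the other side a non-word character or the start/end of the string, so \b is zero-width. ("Word" here means a run of \w characters.) \b is equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w);

\B: matches wherever \b does not, i.e. at a position that is not a word boundary (also zero-width). It is not the same as [^\b], because inside a character class \b means the backspace character;

\A: matches only at the start of the string (unaffected by re.M);

\Z: matches only at the end of the string (unaffected by re.M);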
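A sketch of the boundary assertions above, assuming Python 3; the strings are made-up samples.

```python
import re

# \b: "cat" only as a whole word; \B: "cat" not at the start of a word
print(re.findall(r"\bcat\b", "cat concat cat."))  # ['cat', 'cat']
print(re.findall(r"\Bcat\b", "cat concat"))       # ['cat'] (the one inside "concat")

lines = "first line\nsecond line"
print(re.findall(r"^\w+", lines))                 # ['first']
print(re.findall(r"^\w+", lines, re.MULTILINE))   # ['first', 'second']
print(re.findall(r"\A\w+", lines, re.MULTILINE))  # ['first'] (\A ignores MULTILINE)
print(re.search(r"line\Z", lines).start())        # 18 (only the "line" at the very end)
```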

 

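Grouping:

|  alternation: matches either the left or the right expression. It tries the left expression first and, if that succeeds, skips the right one. If | is not enclosed in (), its scope is the entire regular expression.

()  group: groups are numbered from the left by the order of their opening parentheses, starting at 1; a group is treated as a single unit and can be followed by a quantifier; a | inside a group applies only within that group.   Examples: (abc){3}  (abc|def)123  (abc|def){3}123

\number  refers to the text matched by the group numbered number.   Example: (\d)([a-z])\1\2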
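A sketch of grouping, alternation scope and back-references, assuming Python 3; all input strings are illustrative.

```python
import re

# A quantifier after a group repeats the whole group
print(re.fullmatch(r"(abc){3}", "abcabcabc") is not None)  # True

# | inside a group applies only within that group
m = re.search(r"(abc|def)123", "xyz123 def123")
print(m.group(0), m.group(1))  # def123 def

# Without parentheses, | splits the whole pattern: "ab" OR "cd123"
print(re.findall(r"ab|cd123", "ab cd123 cd"))  # ['ab', 'cd123']

# \1 and \2 must repeat exactly what groups 1 and 2 matched
print(re.fullmatch(r"(\d)([a-z])\1\2", "1a1a") is not None)  # True
print(re.fullmatch(r"(\d)([a-z])\1\2", "1a2a"))              # None
```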

 

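Lookaround (zero-width assertions: they check the surrounding text without consuming it)

(?=...) : positive lookahead

(?!...) : negative lookahead

(?<=...) : positive lookbehind

(?<!...) : negative lookbehind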
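A sketch of the four lookaround forms, assuming Python 3; the price string is a made-up sample.

```python
import re

s = "price: $100, cost: 200 USD"

print(re.findall(r"\d+(?= USD)", s))  # ['200']  positive lookahead
print(re.findall(r"(?<=\$)\d+", s))   # ['100']  positive lookbehind

# \b keeps the negative assertions from matching part of a number
# (without it, "20" in "200" would satisfy (?! USD))
print(re.findall(r"\b\d+\b(?! USD)", s))  # ['100']  negative lookahead
print(re.findall(r"(?<!\$)\b\d+", s))     # ['200']  negative lookbehind
```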

 

4. Use the functions built into the re module to apply regular expressions

 

5. match and the match object:

match(pattern, string, flags=0)

    Try to apply the pattern at the start of the string, returning

    a match object, or None if no match was found.

 

 

m = re.match('a','abc')

 

All attributes and methods of a match object:

m.end        m.group      m.lastgroup  m.re         m.start

m.endpos     m.groupdict  m.lastindex  m.regs       m.string

m.expand     m.groups     m.pos        m.span       

 

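 group([group1, …]): 

Returns the string captured by one or more groups; with several arguments the results are returned as a tuple. A group can be given by number or by name; 0 means the whole match; with no argument it returns group(0). A group that captured nothing returns None; a group that captured several times returns the last captured substring.        

groups([default]):     

Returns all captured groups as a tuple, equivalent to group(1, 2, …, last). default is the value used for groups that captured nothing; it defaults to None.

m.pos: the pos argument passed to search()/match(), i.e. the index at which the regex engine started searching

m.endpos: the endpos argument, i.e. the index beyond which the engine does not search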

 

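m.start([group]): returns the index in the original string where the substring captured by the given group (default 0, the whole match) starts

m.end([group]): returns the index just past the end of that substring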
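A sketch of the match-object methods and attributes listed above, assuming Python 3; the strings and patterns are illustrative only.

```python
import re

m = re.match(r"(\w+) (\w+)", "hello world!")
print(m.group())             # hello world   (same as m.group(0))
print(m.group(1, 2))         # ('hello', 'world')
print(m.groups())            # ('hello', 'world')
print(m.start(2), m.end(2))  # 6 11          (span of group 2)
print(m.span())              # (0, 11)       (span of the whole match)
print(m.pos, m.endpos)       # 0 12

# A group that did not take part in the match gives None, or groups()'s default
m2 = re.match(r"(a)(b)?", "a")
print(m2.groups())           # ('a', None)
print(m2.groups("-"))        # ('a', '-')

# A group that matched several times keeps only its last capture
last = re.match(r"(abc|def){3}", "abcabcdef")
print(last.group(1))         # def

# pos/endpos reflect the arguments of a compiled pattern's search()/match()
p = re.compile(r"\d+")
m3 = p.search("ab12cd34", 4)
print(m3.group(), m3.pos, m3.endpos)  # 34 4 8
```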

 

 

6. search: scans through the string and returns a match object for the first match only

search(pattern, string, flags=0)

    Scan through string looking for a match to the pattern, returning

    a match object, or None if no match was found.

m.group()

m.groups()
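A sketch of the difference between match() and search(), assuming Python 3; the phone-number string is made up.

```python
import re

s = "tel: 010-1234, fax: 020-5678"

# match() only tries at position 0, where "tel" does not fit the pattern
print(re.match(r"\d{3}-\d{4}", s))  # None

# search() scans forward and stops at the first match
m = re.search(r"(\d{3})-(\d{4})", s)
print(m.group())   # 010-1234
print(m.groups())  # ('010', '1234')
```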

 

7. findall: finds all matches and returns them as a list

findall(pattern, string, flags=0)

    Return a list of all non-overlapping matches in the string.

    

    If one or more capturing groups are present in the pattern, return

    a list of groups; this will be a list of tuples if the pattern

    has more than one group.

    

    Empty matches are included in the result.

 The returned list can be printed directly.
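A sketch of how capturing groups change what findall() returns, as the help text above says, assuming Python 3; the input is a made-up sample.

```python
import re

s = "a1b22c333"
print(re.findall(r"\d+", s))           # ['1', '22', '333']  no groups: whole matches
print(re.findall(r"[a-z](\d+)", s))    # ['1', '22', '333']  one group: only that group
print(re.findall(r"([a-z])(\d+)", s))  # [('a', '1'), ('b', '22'), ('c', '333')]
print(re.findall(r"x*", "ab"))         # ['', '', '']  empty matches are included
```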

8. finditer (less commonly used)

finditer(pattern, string, flags=0)

    Return an iterator over all non-overlapping matches in the

    string.  For each match, the iterator returns a match object.

    

    Empty matches are included in the result.
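A sketch of finditer(), assuming Python 3: unlike findall(), each item is a full match object, so positions are available too. The input string is a made-up sample.

```python
import re

for match in re.finditer(r"\d+", "a1b22c333"):
    print(match.group(), match.span())
# 1 (1, 2)
# 22 (3, 5)
# 333 (6, 9)
```

Because it yields matches lazily, finditer() can be preferable to findall() on very large inputs.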

 

9.split

split(pattern, string, maxsplit=0, flags=0)

    Split the source string by the occurrences of the pattern,

    returning a list containing the resulting substrings.  If

    capturing parentheses are used in pattern, then the text of all

    groups in the pattern are also returned as part of the resulting

    list.  If maxsplit is nonzero, at most maxsplit splits occur,

    and the remainder of the string is returned as the final element

    of the list.

   Example: a = re.split(r'\.', 'www.baidu.com')

 print(a) gives ['www', 'baidu', 'com']
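A sketch of the split() behaviours described in the help text (character-class separators, capturing parentheses, maxsplit), assuming Python 3; the inputs are illustrative.

```python
import re

print(re.split(r"[.,]", "www.baidu.com,cn"))         # ['www', 'baidu', 'com', 'cn']
print(re.split(r"(\.)", "www.baidu.com"))            # ['www', '.', 'baidu', '.', 'com']
print(re.split(r"\.", "www.baidu.com", maxsplit=1))  # ['www', 'baidu.com']
```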

10. sub: find and replace

sub(pattern, repl, string, count=0, flags=0)

    Return the string obtained by replacing the leftmost

    non-overlapping occurrences of the pattern in string by the

    replacement repl.  repl can be either a string or a callable;

    if a string, backslash escapes in it are processed.  If it is

    a callable, it's passed the match object and must return

    a replacement string to be used.

   Example: In [47]: re.sub('baidu','BAIDU','www.baidu.com')

   Out[47]: 'www.BAIDU.com'
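A sketch of the other repl forms mentioned in the help text (a callable, and group references in a string) plus count, assuming Python 3; the inputs are made up.

```python
import re

# repl as a callable: it receives the match object and returns the replacement
print(re.sub(r"\d+", lambda mo: str(int(mo.group()) * 2), "a1 b20"))  # a2 b40

# Group references in a string repl
print(re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example"))  # example at user

# count limits the number of replacements
print(re.sub("o", "0", "foo boo", count=2))  # f00 boo
```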

11. subn: find and replace, also returning the number of substitutions made

Example:

In [48]: re.subn('baidu','BAIDU','www.baidu.com')

Out[48]: ('www.BAIDU.com', 1)

 

 

flags:

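re.I (IGNORECASE): case-insensitive matching;

re.M (MULTILINE): multiline mode; ^ and $ also match at the start and end of each line;

re.A (ASCII): make \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only;

re.U (UNICODE): Unicode matching for \w, \W and the other classes; this is already the default for str patterns in Python 3, so the flag is redundant there;

re.S (DOTALL): "." matches any character at all, including the newline.  Makes . match \n as well.

re.X (VERBOSE): Ignore whitespace and comments for nicer looking RE's. Allows comments (starting with #) inside the pattern; whitespace in the pattern is ignored, except inside a character class or when escaped.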
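A sketch of the flags above, assuming Python 3. Flags can be combined with |. The phone-number pattern and its comments are hypothetical, just to show re.X.

```python
import re

print(re.findall(r"python", "Python PYTHON", re.I))      # ['Python', 'PYTHON']
print(re.match(r"a.b", "a\nb"))                          # None: . skips \n by default
print(re.match(r"a.b", "a\nb", re.S) is not None)        # True
print(re.findall(r"^b", "a\nB", re.M | re.I))            # ['B']

# re.X: whitespace in the pattern is ignored and # starts a comment
phone = re.compile(r"""
    (\d{3})   # hypothetical area code
    -         # separator
    (\d{4})   # number
""", re.X)
print(phone.match("010-1234").groups())  # ('010', '1234')
```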

 

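12. Non-capturing groups (turning off the priority findall gives to captured groups):

xxx(?:...)xxx

 

(?:...)  : groups the expression without capturing it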
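A sketch of why (?:...) matters with findall(), assuming Python 3; the sample string is made up.

```python
import re

s = "abc123 def123"
print(re.findall(r"(abc|def)123", s))    # ['abc', 'def']  the captured group wins
print(re.findall(r"(?:abc|def)123", s))  # ['abc123', 'def123']  the whole match
```

The group is still needed to limit the scope of |; (?:...) keeps that grouping while dropping the capture.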

Named groups:

 (?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.

 

Named groups can be referenced in three contexts. If the pattern is (?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either single or double quotes):

 

Context of reference to group "quote" / Ways to reference it:

in the same pattern itself: (?P=quote) (as shown), \1

when processing match object m: m.group('quote'), m.end('quote') (etc.)

in a string passed to the repl argument of re.sub(): \g<quote>, \g<1>, \1
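A sketch of the three reference contexts for named groups, assuming Python 3. The quote pattern is the one quoted above; the date pattern and sample strings are hypothetical.

```python
import re

# In the pattern itself: (?P=quote) or \1
pattern = r"""(?P<quote>['"]).*?(?P=quote)"""
m = re.search(pattern, "say \"hi\" or 'bye'")
print(m.group())          # "hi"
print(m.group("quote"))   # "
print(m.groupdict())      # {'quote': '"'}
print(re.search(r"""(?P<quote>['"]).*?\1""", "say 'bye'").group())  # 'bye'

# On the match object: m.group('name'), m.start('name'), ...
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "on 2018-07")
print(date.group("year"), date.group("month"))  # 2018 07

# In the repl of re.sub(): \g<name>
print(re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})", r"\g<month>/\g<year>", "on 2018-07"))  # on 07/2018
```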

 

