Using re.compile, re.match, and re.search from Python's re regular expression module

Reposted article (view original). 2018-07-18 21:47, tagged: python

import re
help(re.compile)
'''
Output:
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
As help() shows, compile takes a regular expression pattern, compiles it, and returns a pattern object.
'''

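'''
The second argument, flags, sets the matching mode. Several flags can be combined with bitwise OR ('|'), and flags can also be set inside the pattern string itself.
Pattern objects cannot be instantiated directly; they can only be obtained through compile. The available flags are:
1) re.I (re.IGNORECASE): ignore case
2) re.M (re.MULTILINE): multi-line mode, changes the behaviour of '^' and '$'
3) re.S (re.DOTALL): dot-matches-all mode, changes the behaviour of '.'
4) re.L (re.LOCALE): makes the predefined classes \w \W \b \B \s \S depend on the current locale (in Python 3 this is only allowed with bytes patterns)
5) re.U (re.UNICODE): makes the predefined classes \w \W \b \B \s \S \d \D follow the Unicode character properties (already the default for str patterns in Python 3)
6) re.X (re.VERBOSE): verbose mode. The pattern may span several lines, whitespace is ignored, and comments can be added
'''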
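As a small illustration of the two ways of setting flags described above, the sketch below first combines re.I and re.M with '|' and then writes the same flags inline as (?im). The sample text and the variable names are made up for this example.

```python
import re

# Hypothetical sample text, not taken from the article.
sample = "first line\nSecond line\nSECOND try"

# Flags passed as an argument: ignore case, and let ^ match at each line start.
by_argument = re.compile(r'^second', re.I | re.M)

# The same two flags written inside the pattern itself.
by_inline = re.compile(r'(?im)^second')

flag_result = by_argument.findall(sample)
inline_result = by_inline.findall(sample)
print(flag_result)
print(inline_result)
```

Both patterns should behave identically; the inline form is handy when the pattern string is stored somewhere that cannot carry a separate flags argument.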

text = "JGod is a handsome boy ,but he is a ider"
print(re.findall(r'\w*o\w*', text))  # find the words that contain an "o"
# Output: ['JGod', 'handsome', 'boy']

# Build a pattern object with compile, then call its findall method on the text; the substrings that fit the rule are returned.
regex = re.compile(r'\w*o\w*')
print(regex.findall(text))
# >>> ['JGod', 'handsome', 'boy']

test1="who you are,what you do,When you get get there? What is time you state there?"
regex1=re.compile(r'\w*wh\w*',re.IGNORECASE)
wh=regex1.findall(test1)
print(wh)
#>>> ['who', 'what', 'When', 'What']

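'''
The re module also provides functions for working with regular expressions. The two covered here are match and search.
Definition: re.match tries to match a pattern at the beginning of a string.
Signature:
re.match(pattern, string, flags=0)
pattern is the regular expression. If it matches, a Match object is returned; otherwise None is returned.
string is the string to match against.
flags controls how matching is done, for example whether it is case-sensitive or multi-line.

The return value can be tested as true or false: a Match object is truthy and None is falsy.
For example, match('p', 'python') returns a Match object (truthy), while match('p', 'www.python.org') returns None (falsy).

Definition: re.search scans the given string for the first substring that matches the regular expression.

Return value: a Match object for the first match found, or None if nothing matches.

Signature:
re.search(pattern, string, flags=0)

The parameters have the same meaning as for re.match.
'''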
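To make the difference concrete, here is a minimal sketch that reuses the two strings from the match example in the description above ('python' and 'www.python.org'). The variable names are only illustrative.

```python
import re

# re.match succeeds only if the pattern matches at position 0.
at_start = re.match('p', 'python')
not_at_start = re.match('p', 'www.python.org')

# re.search scans forward and returns the first match anywhere in the string.
found_later = re.search('p', 'www.python.org')

print(at_start is not None)   # a Match object was returned
print(not_at_start is None)   # no match at the beginning
print(found_later.start())    # index of the first 'p'
```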
# re.match example 1
import re
your_love = re.match("wh", "What are you doing? who is you mate?", re.I)
if your_love:
    print("you are my angel")
else:
    print("i lose you ")
# A similar check, this time with a compiled pattern:
print("*" * 100)  # separator between the two examples
import re
content = "What are you doing? who is your mate?"
regu_cont = re.compile(r"\w*wh\w*", re.I)
yl = regu_cont.match(content)
if yl:
    print(yl.group(0))
else:
    print("what happen?")
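Explanation: first, the string to be matched, content, is created.
Next, re.compile() builds the matching rule and returns the pattern object regu_cont.
yl holds the result of calling regu_cont's match method on the content string.
If yl is not None, yl.group(index) returns the matched substring;
otherwise (the return value is None), "what happen?" is printed.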


match example 2


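'''
When match finds a result it returns a MatchObject, which you can query for information about the matched string. MatchObject instances have several methods and attributes; the most important ones are:
group() returns the string matched by the RE
start() returns the position where the match starts
end() returns the position where the match ends
span() returns a tuple containing the (start, end) positions of the match
'''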
import re
content = "What are you doing? who is your mate?"
regu_cont = re.compile(r"\w*wh\w*", re.I)
yl = regu_cont.match(content)
if yl:
    print(yl.group(0))
else:
    print("pass the test")
print(yl.group())
print(yl.start())
print(yl.end())
print(yl.span())

Output:
What
What
0
4
(0, 4)



# The search() method is used much like match(), but it looks for a match anywhere in the string

import re
content = 'Where are you from? You look so handsome.'
regex = re.compile(r'\w*som\w*')
m = regex.search(content)
if m:
    print(m.group(0))
else:
    print("Not found")

# Equivalent, using the module-level function:
import re
m = re.search(r'\w*som\w*', 'Where are you from? You look so handsome.', re.I)
if m:
    print(m.group(0))
else:
    print("not found")

Official documentation for the re module

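When writing a pattern, an r prefix in front of the string literal marks it as a raw string, which means Python does not treat the backslashes inside it as escape characters.

For example, the regex token \n can be written as r'\n', or, without a raw string, as '\\n'.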
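The sketch below shows why the raw-string prefix matters in practice. It uses \d and \b as examples; the test sentence is made up for illustration.

```python
import re

# In a normal string literal Python consumes one level of backslashes,
# so the regex token \d has to be written as '\\d'.
plain = '\\d+'
raw = r'\d+'
same_pattern = (plain == raw)

# '\b' in a normal literal is a backspace character, while r'\b' is the
# regex word-boundary token, so the two patterns behave differently.
boundary_hits = re.findall(r'\bcat\b', 'cat concat cat')
backspace_hits = re.findall('\bcat\b', 'cat concat cat')

print(same_pattern)
print(boundary_hits)
print(backspace_hits)
```

With the raw string only the two standalone words match; without it the pattern looks for literal backspace characters and finds nothing.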

Using re.match is recommended.

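The re.compile() function

Compiles a regular expression pattern and returns a pattern object. Frequently used regular expressions can be compiled once into pattern objects, which makes later calls more convenient and more efficient.

re.compile(pattern, flags=0)

pattern: the expression string to compile
flags: compile flags that modify how the regular expression matches. Several flags can be combined with |, for example re.I | re.M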

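The flags argument

re.I (re.IGNORECASE)
Makes matching case-insensitive

re.L (re.LOCALE)
Makes matching locale-aware

re.M (re.MULTILINE)
Multi-line matching; affects ^ and $

re.S (re.DOTALL)
Makes . match every character, including a newline

re.U (re.UNICODE)
Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B.

re.X (re.VERBOSE)
Gives you a more flexible layout so that the regular expression can be written in a more readable way.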
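As a rough illustration of re.X, the following sketch spreads a pattern over several lines with comments. The date pattern and the sample string are assumptions made for this example, not part of the original article.

```python
import re

# Hypothetical pattern for dates such as 2018-07-18. Because of re.X,
# whitespace and '#' comments inside the pattern are ignored.
date_pattern = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
''', re.X)

m = date_pattern.search('posted 2018-07-18 21:47')
date_parts = m.groups()
print(date_parts)
```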

Example:

 
import re

content = 'Citizen wang , always fall in love with neighbour，WANG'
rr = re.compile(r'wan\w', re.I)  # case-insensitive
print(type(rr))
a = rr.findall(content)
print(type(a))
print(a)

findall returns a list object. (On Python 3.7 and later the first line prints <class 're.Pattern'> instead.)

<class '_sre.SRE_Pattern'>
<class 'list'>
['wang', 'WANG']

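The re.match() function

Always matches from the beginning of the string and returns a match object (<class '_sre.SRE_Match'>) for the matched text, or None if the beginning of the string does not match.

re.match(pattern, string[, flags=0])

pattern: the pattern to match, either a pattern string or an object obtained from re.compile
string: the string to match against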

 
import re

pattern = re.compile(r'hello')
a = re.match(pattern, 'hello world')
b = re.match(pattern, 'world hello')
c = re.match(pattern, 'hell')
d = re.match(pattern, 'hello ')

if a:
    print(a.group())
else:
    print('a failed')
if b:
    print(b.group())
else:
    print('b failed')
if c:
    print(c.group())
else:
    print('c failed')
if d:
    print(d.group())
else:
    print('d failed')

Output:

hello
b failed
c failed
hello

Methods and attributes of match objects


 
import re

str = 'hello world! hello python'
# Groups: group 0 is the whole match 'hello world!', group 1 is 'hello', group 2 is the space, group 3 is 'world!'
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
match = re.match(pattern, str)

print('group 0:', match.group(0))  # group 0, the entire match
print('group 1:', match.group(1))  # group 1, hello
print('group 2:', match.group(2))  # group 2, the space
print('group 3:', match.group(3))  # group 3, world!
print('groups:', match.groups())  # groups() returns a tuple with every captured group
print('start 0:', match.start(0), 'end 0:', match.end(0))  # start and end index of the whole match
print('start 1:', match.start(1), 'end 1:', match.end(1))  # start and end index of group 1
print('start 2:', match.start(2), 'end 2:', match.end(2))  # start and end index of group 2
print('pos starts at:', match.pos)
print('endpos ends at:', match.endpos)  # length of the string
print('lastgroup, name of the last captured group:', match.lastgroup)
print('lastindex, index of the last captured group:', match.lastindex)
print('string, the text that was matched against:', match.string)
print('re, the Pattern object used for matching:', match.re)
print('span, the (start(group), end(group)) indices of a group:', match.span(2))

Output:

group 0: hello world!
group 1: hello
group 2:  
group 3: world!
groups: ('hello', ' ', 'world!')
start 0: 0 end 0: 12
start 1: 0 end 1: 5
start 2: 5 end 2: 6
pos starts at: 0
endpos ends at: 25
lastgroup, name of the last captured group: last
lastindex, index of the last captured group: 3
string, the text that was matched against: hello world! hello python
re, the Pattern object used for matching: re.compile('(?P<first>hell\\w)(?P<symbol>\\s)(?P<last>.*ld!)')
span, the (start(group), end(group)) indices of a group: (5, 6)
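The pattern in this example names its groups with (?P<name>...), so the groups can also be read by name rather than by number. The following is a minimal sketch of that, reusing the same pattern and text; the variable names are chosen for illustration only.

```python
import re

text = 'hello world! hello python'
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
m = pattern.match(text)

# group() and span() accept a group name as well as a group number.
first_by_name = m.group('first')
last_span = m.span('last')

# groupdict() returns all named groups as a dictionary.
named = m.groupdict()

print(first_by_name)
print(last_span)
print(named)
```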

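The re.search function

Searches the whole string and returns a match object for the first match found, or None if nothing matches.

re.search(pattern, string[, flags=0])

pattern: the pattern to match, either a pattern string or an object obtained from re.compile
string: the string to search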

 
import re

str = 'say hello world! hello python'
# Groups: group 0 is the whole match 'hello world!', group 1 is 'hello', group 2 is the space, group 3 is 'world!'
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
search = re.search(pattern, str)

print('group 0:', search.group(0))  # group 0, the entire match
print('group 1:', search.group(1))  # group 1, hello
print('group 2:', search.group(2))  # group 2, the space
print('group 3:', search.group(3))  # group 3, world!
print('groups:', search.groups())  # groups() returns a tuple with every captured group
print('start 0:', search.start(0), 'end 0:', search.end(0))  # start and end index of the whole match
print('start 1:', search.start(1), 'end 1:', search.end(1))  # start and end index of group 1
print('start 2:', search.start(2), 'end 2:', search.end(2))  # start and end index of group 2
print('pos starts at:', search.pos)
print('endpos ends at:', search.endpos)  # length of the string
print('lastgroup, name of the last captured group:', search.lastgroup)
print('lastindex, index of the last captured group:', search.lastindex)
print('string, the text that was matched against:', search.string)
print('re, the Pattern object used for matching:', search.re)
print('span, the (start(group), end(group)) indices of a group:', search.span(2))

Note the difference between the str searched here with re.search and the one used with re.match: this one starts with 'say ', so the pattern no longer matches at position 0.

Output:

group 0: hello world!
group 1: hello
group 2:  
group 3: world!
groups: ('hello', ' ', 'world!')
start 0: 4 end 0: 16
start 1: 4 end 1: 9
start 2: 9 end 2: 10
pos starts at: 0
endpos ends at: 29
lastgroup, name of the last captured group: last
lastindex, index of the last captured group: 3
string, the text that was matched against: say hello world! hello python
re, the Pattern object used for matching: re.compile('(?P<first>hell\\w)(?P<symbol>\\s)(?P<last>.*ld!)')
span, the (start(group), end(group)) indices of a group: (9, 10)
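The pos and endpos values printed above come from optional arguments that the methods of a compiled pattern accept (the module-level re.match and re.search do not take them). Below is a small sketch, assuming a simplified pattern hell\w and the same sample text, of how they change where matching starts and stops.

```python
import re

text = 'say hello world! hello python'
word = re.compile(r'hell\w')  # simplified pattern, chosen for illustration

# Without pos, search finds the first 'hello'.
from_start = word.search(text)

# Passing pos makes the search begin after the first hit.
second = word.search(text, from_start.end())

# match() with pos=4 anchors the match at index 4 instead of index 0.
anchored = word.match(text, 4)

# endpos=8 limits the search to 'say hell', which is too short to match.
limited = word.search(text, 0, 8)

print(from_start.span(), second.span(), second.pos)
print(anchored.span(), limited)
```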


https://www.cnblogs.com/huangdongju/p/7839697.html
