Python re module

Reposted article, 2014-08-24 15:32. Category: Python learning

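How to use Python's re module:

1. compile: re.compile(pattern[, flags])

pattern = re.compile(r"<div.*?>(.*?)</div>")

The result is a Pattern object with the following attributes:

pattern: the expression string the object was compiled from.
flags: the matching flags used at compile time, as an integer.
groups: the number of capturing groups in the expression.
groupindex: a dictionary mapping the name of each named group to its group number; unnamed groups are not included.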

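#!/usr/bin/env python
# coding=utf-8
import re

pattern = re.compile(r'(\w+) (\w+)(?P<sign>.*)', re.I | re.M)
print(pattern.pattern)
print(pattern.flags)             # integer bit mask of the flags
print(pattern.groups)
print(dict(pattern.groupindex))  # copy into a plain dict for printing
### output ###
# (\w+) (\w+)(?P<sign>.*)
# 42   (Python 3: re.I | re.M plus the implicit re.UNICODE; Python 2 prints 10)
# 3
# {'sign': 3}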
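The attributes listed above can also be inspected programmatically. The following is a minimal sketch, assuming Python 3; the pattern and the group names `first` and `second` are made up for illustration.

```python
import re

# Compile once and reuse; this sample pattern is hypothetical.
word_pair = re.compile(r'(?P<first>\w+) (?P<second>\w+)', re.I | re.M)

# flags is a bit mask, so individual flags can be tested with &.
has_ignorecase = bool(word_pair.flags & re.I)
has_dotall = bool(word_pair.flags & re.S)

# groupindex maps each group name to its number; copy it into a plain dict.
names = dict(word_pair.groupindex)

print(has_ignorecase, has_dotall)
print(word_pair.groups, names)
```

Testing single bits with & avoids depending on the exact integer value of flags, which differs between Python 2 and Python 3.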

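2. match: match(string[, pos[, endpos]]) | re.match(pattern, string[, flags])

For example: matchObj = re.match(r'(.*) are (.*?) .*', string, re.M | re.I)

or

pattern = re.compile("hello", re.S)

matchObj = pattern.match("hello world!")

This method tries to match pattern starting at index pos of string.

If the whole pattern is matched, a Match object is returned.

If pattern fails to match along the way, or endpos is reached before the pattern is complete, None is returned.

pos and endpos default to 0 and len(string) respectively.

re.match() cannot take these two arguments; its flags argument sets the matching mode used when the pattern is compiled.

Note: this method does not require a full match.

If string still has characters left when pattern ends, the match still counts as successful.

To require a full match, append the boundary anchor '$' to the expression.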
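The effect of pos, endpos, and the '$' anchor described above can be sketched as follows. This assumes Python 3; the sample strings are made up, and fullmatch() (available since Python 3.4) is not mentioned in the original text and is shown only as an alternative to anchoring.

```python
import re

pattern = re.compile(r'hello')

# match() only tries at the starting position (pos, 0 by default).
at_start = pattern.match('hello world!')
# With pos=4 the attempt starts at index 4 of 'say hello'.
with_pos = pattern.match('say hello', 4)
# endpos=3 limits the visible text to 'hel', so the match fails.
with_endpos = pattern.match('hello world!', 0, 3)

# Trailing text does not prevent a match unless the pattern is anchored.
anchored = re.match(r'hello$', 'hello world!')
# fullmatch() requires the whole string to match.
full = pattern.fullmatch('hello')

print(at_start.span(), with_pos.span(), with_endpos, anchored, full.group())
```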

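The result is a Match object with the following attributes:

string: the text that was matched against.
re: the Pattern object used for the match.
pos: the index in the text at which the regex started searching. Same value as the pos argument of Pattern.match() and Pattern.search().
endpos: the index in the text at which the regex stopped searching. Same value as the endpos argument of Pattern.match() and Pattern.search().
lastindex: the group number of the last captured group. None if no group was captured.
lastgroup: the name of the last captured group. None if that group has no name or no group was captured.

Methods:

group([group1, ...]):
Returns the string captured by one or more groups; with several arguments the result is a tuple. group1 can be a group number or a group name; number 0 stands for the whole matched substring; with no arguments, group(0) is returned; a group that captured nothing returns None; a group that captured several times returns its last capture.
groups([default]):
Returns the strings captured by all groups as a tuple. Equivalent to calling group(1, 2, ..., last). default is the value substituted for groups that captured nothing; it defaults to None.
groupdict([default]):
Returns a dictionary mapping the name of each named group to the substring it captured; unnamed groups are not included. default has the same meaning as above.
start([group]):
Returns the start index in string of the substring captured by the given group (the index of its first character). group defaults to 0.
end([group]):
Returns the end index in string of the substring captured by the given group (the index of its last character + 1). group defaults to 0.
span([group]):
Returns (start(group), end(group)).
expand(template):
Substitutes the matched groups into template and returns the result. template can reference groups with \id, \g<id>, or \g<name>. \0 cannot be used as a group reference; write \g<0> to refer to the whole match. \id and \g<id> are equivalent, but \10 is read as group 10, so to express \1 followed by the character '0' you have to write \g<1>0.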

#!/usr/bin/env python
# coding=utf-8
import re

pattern = re.compile(r'<p.*?>#(\d*)(\w+) (\w+)(?P<sign>.*)#.*?', re.I | re.M)
text = "<p style='font-family:arial;color:red;font-size:20px;'>#6748hello word!#6748</p>"
m = pattern.match(text)

# Attributes of the Match object
print("Match.string:", m.string)
print("Match.re:", m.re)
print("Match.pos:", m.pos)
print("Match.endpos:", m.endpos)
print("Match.lastindex:", m.lastindex)
print("Match.lastgroup:", m.lastgroup)

# Methods of the Match object
print("Match.group():", m.group())
print("Match.group(1,2):", m.group(1, 2))
print("Match.groups():", m.groups())
print("Match.groupdict():", m.groupdict())
print("Match.start(2):", m.start(2))
print("Match.end(2):", m.end(2))
print("Match.span(2):", m.span(2))
print(r"Match.expand(r'\2 \3 \4 \1'):", m.expand(r'\2 \3 \4 \1'))

### output ###
# Match.string: <p style='font-family:arial;color:red;font-size:20px;'>#6748hello word!#6748</p>
# Match.re: (repr of the compiled pattern; the exact text depends on the Python version)
# Match.pos: 0
# Match.endpos: 80
# Match.lastindex: 4
# Match.lastgroup: sign
# Match.group(): <p style='font-family:arial;color:red;font-size:20px;'>#6748hello word!#
# Match.group(1,2): ('6748', 'hello')
# Match.groups(): ('6748', 'hello', 'word', '!')
# Match.groupdict(): {'sign': '!'}
# Match.start(2): 60
# Match.end(2): 65
# Match.span(2): (60, 65)
# Match.expand(r'\2 \3 \4 \1'): hello word ! 6748
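The default arguments of groups() and groupdict(), and the \g<id> form in expand(), are described in the method list but not exercised by the listing. A small sketch, assuming Python 3 and a made-up pattern with two optional groups:

```python
import re

# Hypothetical pattern: an optional sign, digits, and an optional named unit.
m = re.match(r'([+-])?(\d+)(?P<unit>[a-z]+)?', '42')

print(m.groups())           # groups that captured nothing are None
print(m.groups('?'))        # or the supplied default
print(m.groupdict('none'))
print(m.lastindex)          # number of the last group that captured

# \g<2>0 means "group 2 followed by a literal 0".
print(m.expand(r'\g<2>0'))
```

Here only group 2 takes part in the match, so lastindex is 2 and the other groups fall back to the default value.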

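3. search
search(string[, pos[, endpos]]) | re.search(pattern, string[, flags]):
This method looks for a substring of string that the pattern can match.

It tries to match pattern starting at index pos of string.

If the whole pattern is matched, a Match object is returned.

If the match fails, pos is incremented by 1 and the match is tried again.

If there is still no match when pos reaches endpos, None is returned.

pos and endpos default to 0 and len(string) respectively.

re.search() cannot take these two arguments; its flags argument sets the matching mode used when the pattern is compiled.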

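So how does it differ from match?

match() only checks whether the regex matches at the start of string.

search() scans the whole string looking for a match.

match() returns a result only if the match succeeds at position 0; if the pattern would only match somewhere other than the start, match() returns None.
For example:
print(re.match('super', 'superstition').span())

returns (0, 5)

print(re.match('super', 'insuperable'))

returns None

search() scans the whole string and returns the first successful match.
For example:

print(re.search('super', 'superstition').span())

returns (0, 5)
print(re.search('super', 'insuperable').span())

returns (2, 7)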
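The pos and endpos arguments are only available on a compiled pattern. The sketch below, assuming Python 3 and a made-up sample string, shows how they narrow the scan, and how to guard against the None that both match() and search() return on failure.

```python
import re

pattern = re.compile(r'super')
text = 'insuperable superstition'

first = pattern.search(text)           # scans from index 0
second = pattern.search(text, 3)       # scans from index 3 onward
limited = pattern.search(text, 0, 6)   # only text[:6] == 'insupe' is visible

print(first.span(), second.span(), limited)

# Check the result before calling methods on it.
m = pattern.match(text)
print(m.span() if m else 'no match at position 0')
```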

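4. split

split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):
Splits string at every substring the pattern matches and returns the pieces as a list.

maxsplit sets the maximum number of splits; if omitted, the string is split at every match.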

import re
p = re.compile(r'\d+')
print(p.split('one1two2three3four4'))
### output ###
# ['one', 'two', 'three', 'four', '']
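The listing above splits at every match. As a sketch of the maxsplit argument, assuming Python 3: the second call also shows a behavior not covered in the original text, namely that a capturing group in the pattern keeps the separators in the result.

```python
import re

p = re.compile(r'\d+')
text = 'one1two2three3four4'

# At most two splits; the rest of the string is returned as the last item.
limited = p.split(text, maxsplit=2)
# A capturing group in the pattern keeps the matched separator in the list.
with_separator = re.split(r'(\d+)', text, maxsplit=1)

print(limited)
print(with_separator)
```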

5. findall

findall(string[, pos[, endpos]]) | re.findall(pattern, string[, flags]):
Searches string and returns all matching substrings as a list.

import re
p = re.compile(r'\d+')
print(p.findall('one1two2three3four4'))
### output ###
# ['1', '2', '3', '4']
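What findall returns depends on the capturing groups in the pattern, a detail the description above does not spell out. A short sketch, assuming Python 3 and illustrative patterns:

```python
import re

text = 'one1two2three3four4'

# No groups: the full matches are returned.
whole = re.findall(r'[a-z]+\d', text)
# One group: only that group's text is returned.
digits = re.findall(r'[a-z]+(\d)', text)
# Several groups: each item is a tuple of the groups.
pairs = re.findall(r'([a-z]+)(\d)', text)

print(whole)
print(digits)
print(pairs)
```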

6. finditer

finditer(string[, pos[, endpos]]) | re.finditer(pattern, string[, flags]):
Searches string and returns an iterator that yields each match result (a Match object) in order.

import re
p = re.compile(r'\d+')
for m in p.finditer('one1two2three3four4'):
    print(m.group(), end=' ')
### output ###
# 1 2 3 4
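Because finditer yields Match objects rather than strings, the position of each hit is available too. A minimal sketch, assuming Python 3 and a made-up sample string:

```python
import re

p = re.compile(r'\d+')
text = 'one1two22three333'

# Collect the matched text together with its (start, end) span.
hits = [(m.group(), m.span()) for m in p.finditer(text)]
print(hits)
```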

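7. sub

sub(repl, string[, count]) | re.sub(pattern, repl, string[, count]):
Replaces every matching substring of string with repl and returns the resulting string.
When repl is a string, it can reference groups with \id, \g<id>, or \g<name>. \0 cannot be used as a group reference; write \g<0> to refer to the whole match.
When repl is a function, it should accept a single argument (the Match object) and return the replacement string (the returned string is used as-is; group references in it are not expanded).
count sets the maximum number of replacements; if omitted, every match is replaced.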

import re
p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
print(p.sub(r'\2 \1', s))

def func(m):
    # Title-case both captured words.
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.sub(func, s))
### output ###
# say i, world hello!
# I Say, Hello World!
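The count argument and the named and whole-match references mentioned in the description can be sketched as follows. This assumes Python 3; the group names `first` and `second` are made up for illustration.

```python
import re

p = re.compile(r'(?P<first>\w+) (?P<second>\w+)')
s = 'i say, hello world!'

swapped = p.sub(r'\g<second> \g<first>', s)   # named references
first_only = p.sub(r'\2 \1', s, count=1)      # replace only the first match
bracketed = p.sub(r'[\g<0>]', s)              # \g<0> is the whole match

print(swapped)
print(first_only)
print(bracketed)
```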

8. subn(repl, string[, count]) | re.subn(pattern, repl, string[, count]):
Returns (sub(repl, string[, count]), number of replacements).

import re
p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
print(p.subn(r'\2 \1', s))

def func(m):
    # Title-case both captured words.
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.subn(func, s))
### output ###
# ('say i, world hello!', 2)
# ('I Say, Hello World!', 2)
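One possible use of the replacement count is to tell whether a substitution changed anything. The helper below is a hypothetical illustration, assuming Python 3; the name normalize_spaces and the whitespace pattern are not from the original article.

```python
import re

def normalize_spaces(text):
    """Collapse runs of two or more whitespace characters into one space.

    Returns the new text and the number of runs that were replaced.
    """
    return re.subn(r'\s{2,}', ' ', text)

changed = normalize_spaces('a  b   c d')
unchanged = normalize_spaces('already clean')

print(changed)
print(unchanged)
```

A count of 0 means the input was returned unmodified, which avoids comparing the two strings afterwards.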

Source: http://blog.csdn.net/pleasecallmewhy/article/details/8929576
