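1. Grouping

To create a group, wrap a subexpression in parentheses. (exp) matches the expression exp and captures the matched text into an automatically numbered group. For example: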

import re
s = 'c1c b2b c3c'
p = re.compile(r'c(\d)c')
print('【Output】')
print(re.findall(p, s))  # one group, so findall returns that group's text

【Output】
['1', '3']

What findall returns depends on how many groups the pattern contains: with no group it returns the whole matches, with one group it returns the text of that group, and with several groups it returns a tuple per match:

s = 'a1b2 c3d4 ea7f'
p1 = re.compile(r'[a-z]\d[a-z]\d')

print('【Output 1】')
print(re.findall(p1, s))

p2 = re.compile(r'[a-z]\d[a-z](\d)')

print('\n【Output 2】')
print(re.findall(p2, s))

p3 = re.compile(r'[a-z](\d)[a-z](\d)')

print('\n【Output 3】')
print(re.findall(p3, s))

【Output 1】
['a1b2', 'c3d4']

【Output 2】
['2', '4']

【Output 3】
[('1', '2'), ('3', '4')]

With finditer, each result is a match object. group(n) returns the text of group n, and group() or group(0) returns the whole match:

s = 'age:13,name:Tom;age:18,name:John'
p = re.compile(r'age:(\d+),name:(\w+)')
it = re.finditer(p,s)
print('【Output】')
for m in it:
    print('------')
    print(m.group())    # the whole match
    print(m.group(0))   # same as m.group()
    print(m.group(1))   # first group: the age
    print(m.group(2))   # second group: the name

【Output】
------
age:13,name:Tom
age:13,name:Tom
13
Tom
------
age:18,name:John
age:18,name:John
18
John
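
Besides group(n), a match object can also return all of its groups at once. The following is a minimal sketch that reuses the sample string above; the variable names records and first are chosen only for illustration.

```python
import re

s = 'age:13,name:Tom;age:18,name:John'
p = re.compile(r'age:(\d+),name:(\w+)')

# groups() returns every captured group of one match as a tuple,
# which here is equivalent to (m.group(1), m.group(2)).
records = [m.groups() for m in p.finditer(s)]
print(records)  # [('13', 'Tom'), ('18', 'John')]

# span(n) gives the start and end offsets of group n inside s.
first = p.search(s)
print(first.span(1))  # (4, 6)
```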

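2. Non-capturing groups

Sometimes parentheses are added to a subexpression not to capture it but only to group it, for example to make the pattern easier to read, so the subexpression should not appear in the match results. The solution is a non-capturing group, written (?:exp). For example, to capture only name and not age: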

s = 'age:13,name:Tom'
p1 = re.compile(r'age:(\d+),name:(\w+)')
print('【Output】')
# With age in a capturing group
print(re.findall(p1, s))

# With age in a non-capturing group
p2 = re.compile(r'age:(?:\d+),name:(\w+)')
print(re.findall(p2, s))

【Output】
[('13', 'Tom')]
['Tom']
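
Non-capturing groups also matter when parentheses are needed only to apply a quantifier to several characters. The sketch below uses a made-up sample string to show how the result of findall differs between the two forms.

```python
import re

s = 'ababab xyz'

# With a capturing group, findall reports only what the group captured
# (its last repetition), not the whole match.
captured = re.findall(r'(ab)+', s)
print(captured)  # ['ab']

# With a non-capturing group there is no group to report,
# so findall returns the whole match.
whole = re.findall(r'(?:ab)+', s)
print(whole)  # ['ababab']
```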

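3. Backreferences

A backreference refers again to the text matched by an earlier group. With the default group numbers, backreferences are written \1, \2, \3, ... (numbering starts at 1).

For example: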

# Match the same word occurring twice in a row
s = 'hello blue go go hello'
p = re.compile(r'\b(\w+)\b\s+\1\b')  # '\1' refers back to the earlier (\w+)
print('【Output】')
print(re.findall(p, s))

【Output】
['go']
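
The same numbered reference can also be used in the replacement string of re.sub. As a small sketch on the sample string above (the name deduped is illustrative), this collapses a repeated word into a single occurrence:

```python
import re

s = 'hello blue go go hello'

# \1 inside the pattern must repeat whatever group 1 matched;
# \1 inside the replacement inserts that same text once.
deduped = re.sub(r'\b(\w+)(?:\s+\1\b)+', r'\1', s)
print(deduped)  # hello blue go hello
```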

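4. Backreferences to named groups

Python regular expressions let you give a group a custom name and then use that name in a backreference. Named groups are clearer and easier to understand than numbered ones. To name a group: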

(?P<myname>exp)

To refer back to it:

(?P=myname)

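Note that Python's syntax for named groups differs from that of many other regex flavors (for example .NET, Java, and JavaScript), which write it like this:

# Named group
(?<name>exp)
# Backreference
\k<name>

An example in Python: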

s = 'hello blue go go hello'
p = re.compile(r'\b(?P<my_group1>\w+)\b\s+(?P=my_group1)\b')
print('【Output】')
print(re.findall(p, s))

【Output】
['go']
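
Named groups can also be read by name from a match object and referenced in a replacement string. The following is a sketch only; the group names age and name and the variable swapped are chosen for illustration and reuse the age/name sample from earlier.

```python
import re

s = 'age:13,name:Tom;age:18,name:John'
p = re.compile(r'age:(?P<age>\d+),name:(?P<name>\w+)')

m = p.search(s)
print(m.group('name'))  # Tom
print(m.groupdict())    # {'age': '13', 'name': 'Tom'}

# In a replacement string, \g<name> refers to a named group.
swapped = p.sub(r'name:\g<name>,age:\g<age>', s)
print(swapped)  # name:Tom,age:13;name:John,age:18
```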

5. Nested groups

s = '2017-07-10 20:00'
p = re.compile(r'(((\d{4})-\d{2})-\d{2}) (\d{2}):(\d{2})')
print(re.findall(p, s))
# Output:
# [('2017-07-10', '2017-07', '2017', '20', '00')]

se = re.search(p,s)
print(se.group())
print(se.group(0))
print(se.group(1))
print(se.group(2))
print(se.group(3))
print(se.group(4))
print(se.group(5))

# Output:
'''
2017-07-10 20:00
2017-07-10 20:00
2017-07-10
2017-07
2017
20
00
'''

As the output shows, groups are numbered by the position of their opening parenthesis '(', counted from left to right.
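
Non-capturing groups are skipped when groups are numbered. As a sketch, the variant below turns the outermost group of the date pattern into (?:...), a change made here only for illustration, and every remaining group moves up by one:

```python
import re

s = '2017-07-10 20:00'
p = re.compile(r'(?:((\d{4})-\d{2})-\d{2}) (\d{2}):(\d{2})')

# pattern.groups is the number of capturing groups in the pattern.
print(p.groups)  # 4

found = p.search(s).groups()
print(found)  # ('2017-07', '2017', '20', '00')
```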

6. Applications of backreferences

1. Matching "ABAB"-type strings

s = 'abab cdcd efek'
p = re.compile(r'(\w\w)\1')
print('【Output】')
print(re.findall(p, s))

【Output】
['ab', 'cd']

2. Matching "AABB"-type strings

s = 'abab cdcd xxyy'
p = re.compile(r'(\w)\1(\w)\2')
print('【Output】')
print(re.findall(p, s))

【Output】
[('x', 'y')]

3. Matching "AABA"-type strings

s = 'abab cdcd xxyx'
p = re.compile(r'(\w)\1(?:\w)\1')
print('【Output】')
print(re.findall(p, s))

【Output】
['x']

4. Matching "ABBA"-type strings

s = 'abab toot'
p = re.compile(r'(\w)(\w)\2\1')
print('【Output】')
print(re.findall(p, s))

【Output】
[('t', 'o')]

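5. Inserting characters at certain positions in a string

Suppose the requirement is to put a "\" in front of every wildcard character (% _ [ ]) in a string to escape it, except where the wildcard is already preceded by a "\". For example: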

s = 'abc\\_de%fgh[c][]c'
special = r'[%_\[\]]'
print('【Output】')
print('s = {0}'.format(s))
print(re.sub(r'([^\\])(?=%s)' % special, r'\1\\', s))
# Note: "(?=%s)" is a zero-width lookahead assertion, which matches a position; zero-width assertions are covered later

【Output】
s = abc\_de%fgh[c][]c
abc\_de\%fgh\[c\]\[\]c
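
One limitation of the pattern above is that ([^\\]) needs a character in front of the wildcard, so a wildcard at the very start of the string is left unescaped. A possible alternative, shown here only as a sketch, replaces that group with a negative lookbehind (another zero-width assertion); the helper name escape_wildcards is hypothetical.

```python
import re

special = r'[%_\[\]]'

def escape_wildcards(text):
    # (?<!\\) succeeds only where the previous character is not a backslash,
    # and it also succeeds at the very start of the string.
    return re.sub(r'(?<!\\)(%s)' % special, r'\\\1', text)

print(escape_wildcards('abc\\_de%fgh[c][]c'))  # abc\_de\%fgh\[c\]\[\]c
print(escape_wildcards('%abc'))                # \%abc
```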

6. Inserting a "," after every 3 characters, counting from the end of the string

s = '1234567890'
s = s[::-1]
print('【Output】')
print(s)
s = re.sub(r'(...)', r'\1,', s)  # append "," to every group of 3 characters
print(s[::-1])

【Output】
0987654321
1,234,567,890
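
Note that when the length of the string is a multiple of 3, the substitution above also appends a "," to the last group, which becomes a leading "," after reversing back. One way to avoid this, sketched here with a hypothetical helper add_commas, is to insert the comma only when at least one more character follows:

```python
import re

def add_commas(digits):
    # Reverse, put a comma after every full group of three characters that
    # is still followed by at least one more character, then reverse back.
    return re.sub(r'(...)(?=.)', r'\1,', digits[::-1])[::-1]

print(add_commas('1234567890'))  # 1,234,567,890
print(add_commas('123456'))      # 123,456
print(add_commas('12'))          # 12
```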
