Python Regular Expressions: the re Module

Reposted article, 2019-04-01 11:33

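What is a regular expression

There is a vast amount of information in the world, but we usually care about only a small part of it. If we want to extract just the data we are interested in, we can describe it with an expression and filter on that. A regular expression is one such kind of expression for selecting data from text.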

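Common regular expression syntax

Expression	Description
()	Groups the enclosed sub-pattern and captures the text it matches for extraction
.	Matches any single character except a newline
X*	Matches X (a single character) zero or more times
X+	Matches X (a single character) one or more times
.*	Matches any character (except a newline) any number of times
?	Matches the preceding character zero or one time
\d	Matches a digit
\b	Matches a word boundary (the start or end of a word)
\w	Matches a word character: a letter, digit, or underscore (for str patterns in Python 3 this includes Chinese characters)
{n}	Matches the preceding element exactly n times
{n,m}	Matches the preceding element between n and m times
\s	Matches a whitespace character (space, tab, newline, and so on)
[]	Defines a character set, e.g. [0-9a-zA-Z] matches a digit, a lowercase letter, or an uppercase letter
^	Matches at the start of the string; as the first character inside [], it negates the set, e.g. [^0-2] matches any character other than 0, 1, or 2
$	Matches at the end of the string
A|B	Matches A or B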
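
As a rough illustration of the table above, the following sketch applies a few of these constructs to a made-up sentence. The sample text and the variable names are assumptions chosen only for this example.

```python
import re

# Hypothetical sample text, used only for illustration
text = "Order 42 shipped to bob_smith on 2019-04-01"

# \d+ : one or more digits
numbers = re.findall(r"\d+", text)

# (...) and {n} : three groups capturing year, month and day
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()

# \b and \w : a whole word that contains an underscore
username = re.search(r"\b\w+_\w+\b", text).group()

# ^ : anchored at the start of the string
starts_with_order = re.search(r"^Order", text) is not None

# A|B : alternation
pet = re.search(r"cat|dog", "hotdog").group()

print(numbers)            # ['42', '2019', '04', '01']
print(date_parts)         # ('2019', '04', '01')
print(username)           # bob_smith
print(starts_with_order)  # True
print(pet)                # dog
```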

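The re module

In Python, regular expressions are used through the built-in re module. A regular expression uses \ to escape special characters: to match "baidu.com" literally, the pattern is baidu\.com. Because \ is also the escape character in ordinary Python string literals, that pattern has to be written as "baidu\\.com", which is inconvenient. A Python raw string avoids the double escaping. With the r prefix, the same pattern can be written as:

r"baidu\.com"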
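
A small sketch of the point above: the doubled-backslash string and the raw string hold the same pattern, and leaving the dot unescaped changes what is matched. The test string "baiduXcom" is an assumption made up for this example.

```python
import re

escaped = "baidu\\.com"  # ordinary string literal: the backslash is doubled
raw = r"baidu\.com"      # raw string literal: the backslash is written once

# Both literals produce exactly the same pattern text
same_pattern = (escaped == raw)

# With the dot escaped, only a literal "." is accepted
literal_hit = re.search(raw, "baiduXcom")

# With the dot unescaped, "." matches any character, so "X" is accepted too
loose_hit = re.search(r"baidu.com", "baiduXcom")

print(same_pattern)        # True
print(literal_hit)         # None
print(loose_hit.group())   # baiduXcom
```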

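The re module is typically used in three steps:

1) Use the compile function to compile the regular expression string into a Pattern object.

2) Use the methods of the Pattern object to search the text and obtain the result (a Match object).

3) Use the attributes and methods of the Match object to get the information you need, then carry out any further processing.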
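
The three steps above can be sketched as follows. The pattern and the address "alice@example.com" are hypothetical values chosen for illustration, not part of the original article.

```python
import re

# Step 1: compile the regular expression string into a Pattern object
pattern = re.compile(r"(\w+)@(\w+)\.com")

# Step 2: search the text; the result is a Match object, or None if nothing matches
match = pattern.search("contact: alice@example.com")

# Step 3: read information from the Match object
if match is not None:
    full = match.group(0)          # the whole matched substring
    user, domain = match.groups()  # the captured groups
    position = match.span()        # (start, end) of the match
    print(full, user, domain, position)
    # alice@example.com alice example (9, 26)
```

Checking for None before using the result is a deliberate choice here, since search returns None when the pattern is not found.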

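The compile function

The compile function compiles a regular expression and returns a Pattern object.

compile(pattern, flags=0)

Note: pattern is the regular expression string, and flags selects the matching mode, such as ignoring case.

import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r"\d+")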
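
To illustrate the flags argument, here is a minimal sketch using re.IGNORECASE and re.MULTILINE, two flags provided by the standard re module. The sample strings are assumptions for this example.

```python
import re

text = "Python and PYTHON and python"

# Without flags the match is case-sensitive
case_sensitive = re.compile(r"python").findall(text)

# re.IGNORECASE makes the same pattern match regardless of case
case_insensitive = re.compile(r"python", re.IGNORECASE).findall(text)

# Flags can be combined with |; re.MULTILINE lets ^ match at the start of each line
first_words = re.compile(r"^\w+", re.IGNORECASE | re.MULTILINE).findall(
    "first line\nsecond line"
)

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
print(first_words)       # ['first', 'second']
```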

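Commonly used methods of the Pattern object are match(), search(), findall(), finditer(), split(), sub(), and subn().

1) match() method

This method matches only at the beginning of the string (or at pos, if given) and returns as soon as it finds a match.

match(string[, pos[, endpos]])

Note: string is the string to match against, and pos and endpos give the start and end positions within it. When they are omitted, matching starts at the beginning of the string and may extend to its end. A successful match returns a Match object; otherwise the method returns None.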

import re

pattern = re.compile(r"\d+")
match = pattern.match("aaa123bbb123ccc123")
print(match)  # None, because the string does not start with a digit

match = pattern.match("aaa123bbb123ccc123", 3, 6)
print(match)  # <re.Match object; span=(3, 6), match='123'> (Python 3.7+)

print(match.group())  # 123; group() or group(0) returns the whole matched substring
print(match.start())  # 3, the start index of the match in the original string
print(match.end())  # 6, the end index of the match in the original string
print(match.span())  # (3, 6), i.e. (start(), end())
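
Beyond group() and group(0), a Match object also exposes each captured group by number. The sketch below assumes a made-up phone-like string; the pattern and variable names are illustrative only.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")

# match() succeeds here because the string begins with digits
m = pattern.match("010-12345 ext")

whole = m.group(0)       # the entire match
area = m.group(1)        # text captured by the first group
number = m.group(2)      # text captured by the second group
both = m.groups()        # all captured groups as a tuple
number_span = m.span(2)  # (start, end) of the second group

# match() returns None when the pattern is not at the start of the string
no_match = pattern.match("ext 010-12345")

print(whole, area, number)  # 010-12345 010 12345
print(both)                 # ('010', '12345')
print(number_span)          # (4, 9)
print(no_match)             # None
```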

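2) search() method

This method scans the string for a match at any position and returns as soon as it finds the first one.

search(string[, pos[, endpos]])

Note: string is the string to search, and pos and endpos are the start and end positions within it. A successful search returns a Match object; otherwise the method returns None.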

import re

pattern = re.compile(r"\d+")
match = pattern.search("aaaa1111bbbb2222cccc3333")
print(match)  # <re.Match object; span=(4, 8), match='1111'> (Python 3.7+)

match = pattern.search("aaaa1111bbbb2222cccc3333", 3, 6)
print(match)  # <re.Match object; span=(4, 6), match='11'>

print(match.group())  # 11; group() or group(0) returns the whole matched substring
print(match.start())  # 4, the start index of the match in the original string
print(match.end())  # 6, the end index of the match in the original string
print(match.span())  # (4, 6), i.e. (start(), end())
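
The difference between match() and search() is easiest to see side by side. The sketch below also uses fullmatch(), a Pattern method from the standard library that the article does not cover; it is included here only for contrast.

```python
import re

pattern = re.compile(r"\d+")
text = "abc123"

# match() only tries at the start of the string, so it fails here
at_start = pattern.match(text)

# search() scans forward and finds the digits later in the string
anywhere = pattern.search(text)

# fullmatch() succeeds only if the entire string matches the pattern
whole_ok = pattern.fullmatch("123")
whole_fail = pattern.fullmatch("123abc")

print(at_start)          # None
print(anywhere.group())  # 123
print(whole_ok.group())  # 123
print(whole_fail)        # None
```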

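3) findall() method

This method returns all non-overlapping matches.

findall(string[, pos[, endpos]])

Note: string is the string to search, and pos and endpos are the start and end positions within it. If there are matches, the method returns them as a list; if there are none, it returns an empty list.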

import re

pattern = re.compile("\\d+")
match = pattern.findall("aaaa1111bbbb2222cccc3333")
print(match)  # ['1111', '2222', '3333']
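
One detail worth knowing is that the shape of the findall() result depends on how many capturing groups the pattern has. The following sketch uses a made-up "key=value" string to show the three cases.

```python
import re

text = "x=1, y=22, z=333"

# No groups: each item is the whole match
no_groups = re.compile(r"\w=\d+").findall(text)

# One group: each item is the text captured by that group
one_group = re.compile(r"\w=(\d+)").findall(text)

# Several groups: each item is a tuple of the captured texts
two_groups = re.compile(r"(\w)=(\d+)").findall(text)

print(no_groups)   # ['x=1', 'y=22', 'z=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('x', '1'), ('y', '22'), ('z', '333')]
```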

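4) finditer() method

finditer(string[, pos[, endpos]])

Note: string is the string to search, and pos and endpos are the start and end positions within it.

Like findall(), this method finds every match, but it returns an iterator instead of a list. Iterating over it yields one Match object per match, so both the matched text and its position are available.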

import re

pattern = re.compile(r"\d+")
result_iter = pattern.finditer("aaaa1111bbbb2222cccc3333")
for result in result_iter:
    print("found {0} at position {1}".format(result.group(), result.span()))
# found 1111 at position (4, 8)
# found 2222 at position (12, 16)
# found 3333 at position (20, 24)
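
Because finditer() yields Match objects, it combines well with named groups, a standard re feature written as (?P<name>...) that the article does not discuss. The sketch below is one possible use, with a hypothetical "key=value" input.

```python
import re

# Named groups "key" and "value" are assumptions chosen for this example
pattern = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")

# Build a dictionary by reading the named groups of each Match object
pairs = {
    m.group("key"): int(m.group("value"))
    for m in pattern.finditer("x=1, y=22, z=333")
}

print(pairs)  # {'x': 1, 'y': 22, 'z': 333}
```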

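5) split() method

split(string, maxsplit=0)

Note: this method splits the string at every substring that matches the pattern. string is the string to split, and maxsplit is the maximum number of splits; when it is omitted (or 0), the string is split at every match.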

import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r"[,;\s]+")  # one or more commas, semicolons, or whitespace characters
l = pattern.split("a,b;c  d", 2)
print(l)  # ['a', 'b', 'c  d']
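
For comparison, the sketch below shows split() without a limit, with maxsplit=1, and with a capturing group in the pattern, in which case the separators are kept in the result. The second pattern is an assumption added for illustration.

```python
import re

pattern = re.compile(r"[,;\s]+")

# No maxsplit: split at every match
all_parts = pattern.split("a,b;c  d")

# maxsplit=1: split once, the rest of the string is left intact
one_split = pattern.split("a,b;c  d", maxsplit=1)

# If the pattern contains a capturing group, the separators are returned too
with_separators = re.compile(r"([,;])").split("a,b;c")

print(all_parts)        # ['a', 'b', 'c', 'd']
print(one_split)        # ['a', 'b;c  d']
print(with_separators)  # ['a', ',', 'b', ';', 'c']
```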

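6) sub() method

sub(repl, string, count=0)

Note: this method performs substitution.

If repl is a string, every matching substring is replaced with repl (backreferences such as \1 refer to captured groups), and the resulting string is returned. If repl is a function, it must accept a single Match object and return the replacement string.

count limits the number of replacements; the default 0 replaces every match.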

import re

# Compile the regular expression into a Pattern object
p = re.compile(r'(\w+) (\w+)')
s = 'test aaa, test bbb'


def func(m):
    return 'hei' + ' ' + m.group(2)


print(p.sub(r'hello world', s))  # hello world, hello world (each match replaced by 'hello world')
print(p.sub(r'\2 \1', s))  # aaa test, bbb test (backreferences to the groups)
print(p.sub(func, s))  # hei aaa, hei bbb (all matches replaced)
print(p.sub(func, s, 1))  # hei aaa, test bbb (at most one replacement)
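
A replacement function is useful when the new text has to be computed from the match. The sketch below doubles every number in a made-up sentence, and also shows the \g<1> form of a backreference, which the re module supports for cases where \1 followed by a digit would be ambiguous. The helper name double is hypothetical.

```python
import re

digits = re.compile(r"\d+")


def double(m):
    # m is the Match object for one occurrence; return its replacement text
    return str(int(m.group()) * 2)


doubled = digits.sub(double, "3 apples and 10 pears")

# \g<1>0 means "group 1 followed by a literal 0"; \10 would mean group 10
padded = re.compile(r"(\d)").sub(r"\g<1>0", "a1b2")

print(doubled)  # 6 apples and 20 pears
print(padded)   # a10b20
```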

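7) subn() method

subn(repl, string, count=0)

This method also performs substitution, but it returns a tuple of two elements: the first is the same string that sub() would return, and the second is the number of replacements made.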

import re

# Compile the regular expression into a Pattern object
p = re.compile(r'(\w+) (\w+)')
s = 'test aaa, test bbb'


def func(m):
    return 'hei' + ' ' + m.group(2)


print(p.subn(r'hello world', s))  # ('hello world, hello world', 2), each match replaced by 'hello world'
print(p.subn(r'\2 \1', s))  # ('aaa test, bbb test', 2), backreferences to the groups
print(p.subn(func, s))  # ('hei aaa, hei bbb', 2), all matches replaced
print(p.subn(func, s, 1))  # ('hei aaa, test bbb', 1), at most one replacement
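
One way to use the count returned by subn() is to detect whether a string was changed at all. The following sketch collapses runs of whitespace; the input strings are assumptions made up for this example.

```python
import re

whitespace = re.compile(r"\s+")

# Two runs of whitespace are collapsed, so the count is 2
cleaned, changes = whitespace.subn(" ", "a   b \t c")

# Nothing matches, so the string is returned as-is with a count of 0
untouched, no_changes = whitespace.subn(" ", "abc")

print(cleaned, changes)       # a b c 2
print(untouched, no_changes)  # abc 0
```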
