A problem with re.findall in Python's regex module re

Reposted article. 2020-09-08 17:39 Python

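I found that the findall method of Python's regex module re does not behave quite as I expected: it returns non-overlapping matches, so characters that belong to one match are consumed and cannot take part in the next one. For example: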

[In]:
import re
pat=r',\d+,' # one or more digits with a comma on both sides
text='1,2,3,4,5,6,7,'
[In]:
result=re.findall(pat,text)
print(result)
[Out]:
[',2,', ',4,', ',6,']

Ideally, 2, 3, 4, 5, 6 and 7 should all be matched.

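In the example above, however, the comma after 2 is taken (consumed) by the first match, so 3 no longer fits the pattern and is skipped. 4 has a comma on both sides and is matched; 5 is skipped for the same reason as 3; 6 is matched; and 7 is skipped because the comma before it was consumed by the match on 6.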
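To make the consumption visible, here is a minimal sketch that prints the position of every match with re.finditer, which yields the same non-overlapping matches as re.findall. The variable name spans is only for illustration.

```python
import re

text = '1,2,3,4,5,6,7,'

# finditer returns the same non-overlapping matches as findall,
# but as match objects, so the start and end indices are available
spans = [(m.group(0), m.span()) for m in re.finditer(r',\d+,', text)]
for matched, span in spans:
    print(matched, span)
```

Each match ends where the next search begins (indices 4, 8 and 12), so the comma that would have to open ',3,', ',5,' or ',7,' has already been used.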

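How can we get what we want, that is, scan from left to right without consuming the characters that have already been matched? I have not found a good way yet; for now the workaround is to call re.search() repeatedly, each time starting from a chosen index pos of the text. (The module-level re.search() has no pos parameter, so the code slices the text with text[pos:]; only the search() method of a compiled pattern accepts pos directly.)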
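One commonly used alternative, which is not part of the original post, is to wrap the whole pattern in a zero-width lookahead with a capturing group. The sketch below assumes the same pattern and text as the example; the names overlapping and with_spans are illustrative.

```python
import re

text = '1,2,3,4,5,6,7,'

# (?=...) is a lookahead: it checks that ',\d+,' follows the current position
# without consuming any characters. The inner group captures the text, and
# findall returns the captured group for every position where it succeeds.
overlapping = re.findall(r'(?=(,\d+,))', text)
print(overlapping)

# The same idea with finditer, if the positions are needed as well
with_spans = [(m.group(1), m.span(1)) for m in re.finditer(r'(?=(,\d+,))', text)]
print(with_spans)
```

Because the lookahead itself matches an empty string, the scanner only advances one position at a time, which is what allows neighbouring matches to share a comma.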

Reference code:

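import re

def recursive_search(res,pat,text,pos=0):
    # Recursive search that works around the non-overlapping behaviour of re.findall():
    # re.findall(r',\d+,','1,2,3,4,5,6,7,') returns [',2,', ',4,', ',6,'],
    # but we want [',2,', ',3,', ',4,', ',5,', ',6,', ',7,'].
    chars=[',','、','，','。','.']
    # pos is the index in text at which this search starts
    m=re.search(pat,text[pos:])
    if m:
        # m.span() is relative to the slice, so shift it back to indices in text
        span=(m.span()[0]+pos,m.span()[1]+pos)
        res.append({'result':m.group(0),'span':span})
        start_index=span[1]
        # If the pattern ends with a delimiter, step back one character so that
        # the trailing delimiter can also open the next match
        if pat[-1] in chars:
            start_index=span[1]-1
        return recursive_search(res,pat,text,pos=start_index)
    else:
        return res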

Test function:

def test():
    pat=r',\d+,'
    text='1,2,3,4,5,6,7,'
    res=[]
    res=recursive_search(res,pat,text)
    print(res)
Output:
[{'result': ',2,', 'span': (1, 4)}, {'result': ',3,', 'span': (3, 6)}, {'result': ',4,', 'span': (5, 8)}, 
{'result': ',5,', 'span': (7, 10)}, {'result': ',6,', 'span': (9, 12)}, {'result': ',7,', 'span': (11, 14)}]
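The recursive function above could also be written as a loop, which avoids Python's recursion limit on long texts. The following is only a sketch under some assumptions: overlapping_search is a hypothetical name, it uses the pos argument of a compiled pattern's search() method instead of slicing, and it restarts one character after the start of each match rather than stepping back over a trailing delimiter.

```python
import re

def overlapping_search(pat, text):
    # Hypothetical helper: a loop-based variant of the recursive search
    regex = re.compile(pat)
    results = []
    pos = 0
    while pos <= len(text):
        # Pattern.search accepts a start index, and the spans it reports
        # are already relative to the full text
        m = regex.search(text, pos)
        if m is None:
            break
        results.append({'result': m.group(0), 'span': m.span()})
        # Restart just after the start of this match, so that later
        # matches may overlap the one just found
        pos = m.start() + 1
    return results

print(overlapping_search(r',\d+,', '1,2,3,4,5,6,7,'))
```

For the sample text this produces the same six results and spans as the output above. Restarting at m.start() + 1 does not depend on what character the pattern ends with, at the cost of re-scanning part of each match.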
