Python re.sub processes backslash escapes in the replacement string

Reposted article. 2021-04-10 11:56, Python

Related error: `re.error: bad escape \x at position xxx (line xz, column xz)`

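For reference, the signature is re.sub(pattern, repl, string, count=0, flags=0).
When I hit this error I searched around and compared what I found with my own case: my pattern contained no bad escapes, so the only place left for things to go wrong was repl. Looking at the source code: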

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

Analysis

re.sub(pattern, repl, string, count=0, flags=0) processes backslash escapes inside repl when repl is a string. This is easy to overlook and worth keeping in mind.

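On one hand, sub relies on this backslash processing to implement backreferences. On the other hand, it can interfere with backslashes that are meant literally: \\ ends up as a single \, for instance. If the text being processed is source code, that causes problems. For example: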

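import re

# Replace the whole string in one go ("好的" means "OK")
origin = "/*  好的  */"
to = r"('\\', 0)"
print("Wanted replacement:", to)
# The asterisks are escaped so the pattern matches the comment literally
print("Actual replacement:", re.sub(r"/\*  好的  \*/", to, origin))

Wanted replacement: ('\\', 0)
Actual replacement: ('\', 0)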
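To make both sides of this behaviour concrete, here is a minimal sketch of what happens to backslashes in a string repl. The sample strings are made up for illustration; only re.sub itself comes from the discussion above.

```python
import re

# Backreference: \1 and \2 in repl are replaced by the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# Escape sequence: the two characters \ and n in repl become one newline.
with_newline = re.sub(r",", r"\n", "a,b")

# Escaped backslash: two backslashes in repl collapse into one.
collapsed = re.sub(r"X", r"\\", "aXb")

print(swapped)
print(repr(with_newline))
print(repr(collapsed))
```

The first case is the feature we want to keep; the other two are the side effects that get in the way when the replacement is meant to be taken literally.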

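If the replacement string repl is more complex, it may contain sequences that are not valid escapes, and an exception like the following is raised: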

re.error: bad escape \x at position 86013 (line 1575, column 7966)
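The error is easy to reproduce on a small scale. The sketch below assumes Python 3.7 or later, where an unknown escape made of a backslash and an ASCII letter in repl is treated as an error; the replacement text is a hypothetical Windows path chosen only because it contains \x.

```python
import re

repl = r"C:\xampp"  # hypothetical replacement text with a literal \x

try:
    re.sub(r"PATH", repl, "PATH")
    message = None
except re.error as exc:
    # The replacement template is parsed before any substitution happens.
    message = str(exc)

print(message)
```

The position in the message refers to the offset inside repl, not inside the string being searched, which is why the numbers can be very large when repl is a whole source file.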

I came across one workaround, import regex as re, which gets past this exception, but it still does not produce the replacement we originally wanted.

Solution

You could escape the repl string by hand a second time, but in practice that is more trouble than it is worth.
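For completeness, manual escaping might look like the sketch below: every backslash in the replacement is doubled so that re.sub collapses it back to one. The variable name escaped is my own. Note that the re documentation says re.escape must not be used for the replacement string; only backslashes should be escaped.

```python
import re

to = r"('\\', 0)"

# Double every backslash so that re.sub's escape processing restores it.
escaped = to.replace("\\", "\\\\")

result = re.sub(r"X", escaped, "X")
print(result)
```

This works, but it means every caller has to remember the extra step.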

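According to the source code, repl can be either a string or a callable such as a function. If it is a string, backslash escapes in it are processed. If it is a function, it is passed the Match object and must return the replacement string.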
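As a small illustration of the callable form, the sketch below uses a hypothetical helper, escape_for_latex, that puts a backslash in front of a few special characters. The return value is inserted as-is, so the backslash it adds stays a literal backslash.

```python
import re

def escape_for_latex(match):
    # match is the re.Match object for one special character; the returned
    # string is used verbatim, with no further escape processing.
    return "\\" + match.group(0)

result = re.sub(r"[%&_]", escape_for_latex, "50% of a_b")
print(result)
```

Doing the same with a string repl would require writing the backslash as an escaped backslash in the template.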

So we can define a function of our own to get around the escape processing in re.sub:

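import re

# "替換區" means "replacement zone", "好的" means "OK"
origin = "/*dfe1683替換區  \n好的   */"
to = r"('\\', 0)"
print("Wanted replacement:", to)
print("Actual replacement:", re.sub(r"/\*dfe1683替換區[\s\S]+?\*/", lambda m: to, origin))

Wanted replacement: ('\\', 0)
Actual replacement: ('\\', 0)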

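In other words, when the replacement string contains backslashes that should be left alone, pass the anonymous function lambda m: to instead of to. Compared with escaping the repl string by hand, this avoids the wasted work of escaping the backslashes only to have re.sub unescape them again.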
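If this comes up often, the lambda can be wrapped in a small helper. The function name sub_literal is hypothetical, not part of the re module; this is a sketch that assumes the replacement should always be inserted verbatim.

```python
import re

def sub_literal(pattern, replacement, string, count=0, flags=0):
    """Like re.sub, but insert `replacement` verbatim (no escape processing)."""
    # A callable repl bypasses template parsing, so backslashes survive.
    return re.sub(pattern, lambda _match: replacement, string,
                  count=count, flags=flags)

result = sub_literal(r"PATH", r"C:\xampp\new", "cd PATH")
print(result)
```

The trade-off is that backreferences such as \1 no longer work in the replacement, so this helper only fits the purely literal case.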
