The difference between str.casefold and str.lower
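Python 3.3 introduced the str.casefold method. Its effect is very similar to str.lower: both convert a string to lowercase. So what is the difference between them, and when should each one be used?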
Use casefold when working with Unicode text
str.casefold
Official documentation:
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
The casefolding algorithm is described in section 3.13 of the Unicode Standard.
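lower() is not limited to ASCII 'A-Z'; it also lowercases cased characters in other scripts. What it cannot do is remove a case distinction carried by a character that is already lowercase. The example given in the documentation is the German 'ß', which lower() leaves unchanged but casefold() converts to 'ss':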
s = 'ß'
s.lower()     # 'ß'
s.casefold()  # 'ss'
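To see why this matters in practice, here is a minimal sketch of a caseless comparison. The helper name caseless_equal is made up for illustration and is not part of the standard library; it only wraps str.casefold.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings while ignoring case."""
    return a.casefold() == b.casefold()


# "Straße" and "STRASSE" are the same word written in different cases.
print(caseless_equal("Straße", "STRASSE"))    # True
# With lower(), 'ß' is left unchanged, so the comparison fails.
print("Straße".lower() == "STRASSE".lower())  # False
```

The folded strings are meant for comparison only; keep the original string around if it will be shown to a user.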
str.lower
Official documentation:
Return a copy of the string with all the cased characters converted to lowercase.
The lowercasing algorithm used is described in section 3.13 of the Unicode Standard.
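A quick check, with sample strings picked only for illustration, suggests that lower() follows the Unicode lowercase mappings rather than stopping at ASCII:

```python
# Sample inputs chosen for illustration: ASCII, German umlauts, Cyrillic,
# and a character that is already lowercase.
samples = ["HELLO", "ÄÖÜ", "ПРИВЕТ", "ß"]

lowered = [s.lower() for s in samples]
print(lowered)  # ['hello', 'äöü', 'привет', 'ß']
```

Every uppercase letter gets its lowercase counterpart; only 'ß', which is already lowercase, is returned as-is.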
References
https://docs.python.org/3/library/stdtypes.html#str.casefold
https://segmentfault.com/q/1010000004586740/a-1020000004586838
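Summary
For Chinese and English text, lower() continues to work fine. When text in other languages with case distinctions has to be handled, especially for caseless comparison, use casefold().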
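The rule of thumb can be checked with a small sketch. The function name lower_and_casefold_agree and the sample strings are assumptions made for this example, not anything from the Python documentation.

```python
def lower_and_casefold_agree(text: str) -> bool:
    """Hypothetical helper: True when lower() and casefold() give the same result."""
    return text.lower() == text.casefold()


# English and Chinese samples agree; the German sample does not.
for text in ["Hello World", "PYTHON", "中文 Text", "Straße"]:
    print(repr(text), lower_and_casefold_agree(text))
# 'Hello World' True
# 'PYTHON' True
# '中文 Text' True
# 'Straße' False
```

When the two methods agree, either one works; when they differ, casefold() is the one intended for matching.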