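The difference between str.casefold() and str.lower()
Python 3.3 introduced the str.casefold() method. Its effect is very similar to str.lower(): both convert a string to lowercase. So what is the difference between them, and when should each one be used?
Use casefold() for caseless matching of Unicode text
Official description of str.casefold():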
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
The casefolding algorithm is described in section 3.13 of the Unicode Standard.
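lower() is not limited to ASCII 'A-Z'; it lowercases every cased Unicode character. What it cannot do is remove the case distinctions that remain in some languages between characters that are already lowercase. The example given in the documentation is the German 'ß', which is already lowercase but case-folds to 'ss':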
s = 'ß'
s.lower()     # 'ß'
s.casefold()  # 'ss'
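The difference matters most when two strings are compared without regard to case. The following is a minimal sketch, assuming the spellings "Straße" and "STRASSE" should count as the same word; the sample strings and variable names are illustrative and do not come from the documentation.

```python
# Two spellings of the same German word (illustrative sample values).
a = "Straße"
b = "STRASSE"

# lower() leaves 'ß' untouched, so the lowercased strings still differ.
equal_with_lower = a.lower() == b.lower()

# casefold() maps 'ß' to 'ss', so the folded strings match.
equal_with_casefold = a.casefold() == b.casefold()

print(equal_with_lower, equal_with_casefold)
```

Folding both sides before comparing is the important part: folding only one of the two strings would not give a reliable caseless match.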
Official description of str.lower():
Return a copy of the string with all the cased characters converted to lowercase.
The lowercasing algorithm used is described in section 3.13 of the Unicode Standard.
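Because lower() follows the Unicode lowercasing algorithm quoted above, it also handles cased characters outside ASCII. A small sketch to check this; the sample words are chosen only for illustration.

```python
# Uppercase words in German, Russian and French (illustrative samples).
samples = ["ÄÖÜ", "ПРИВЕТ", "ÉCOLE"]

# lower() converts every cased character, not only 'A'-'Z'.
lowered = [s.lower() for s in samples]

print(lowered)
```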
References
https://docs.python.org/3/library/stdtypes.html#str.casefold
https://segmentfault.com/q/1010000004586740/a-1020000004586838
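Summary
For Chinese and English text, lower() is still fine. Use casefold() when handling text in other languages that have case distinctions, especially when comparing strings without regard to case.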
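The rule of thumb above can be checked with a short sketch. The two sample strings are assumptions made for illustration: one mixes English and Chinese, the other is a German word containing 'ß'.

```python
# Mixed English and Chinese text: Chinese characters have no case,
# and the English letters fold the same way they lowercase.
english_and_chinese = "Hello, World! 你好,Python"

# A German word with 'ß', where the two methods diverge.
german = "Fußball"

same_for_english_chinese = (
    english_and_chinese.lower() == english_and_chinese.casefold()
)
same_for_german = german.lower() == german.casefold()

print(same_for_english_chinese, same_for_german)
```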
