【456】Contents of the Python string module (removing punctuation from text)


The repr() function returns a string representation of an object.
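As a small illustration of the sentence above, the sketch below compares repr() with str() on a few built-in values. The variable names are made up for this example.

```python
# repr() returns a string that shows quotes and escape sequences,
# while str() returns the plain text of the object.
text = "it's\n"
as_repr = repr(text)   # quotes and the newline escape become visible
as_str = str(text)     # the text itself, unchanged

number_repr = repr(42)       # numbers become their literal form
list_repr = repr([1, "a"])   # containers use repr() on each element

print(as_repr)
print(number_repr)
print(list_repr)
```

In every case the result is an ordinary str object, so it can be passed on to the string handling shown in the rest of this post.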

 

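The string module is handy mainly for NLP text processing. It defines a set of string constants covering digits, letters, uppercase letters, lowercase letters, punctuation, whitespace, and so on.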

Reference: 6.1. string — Common string operations

These constants can be used to remove punctuation from text by replacing each punctuation character with an empty string.

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> string.digits
'0123456789'
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.hexdigits
'0123456789abcdefABCDEF'
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> string.whitespace
' \t\n\r\x0b\x0c'

6.1.1. String constants

The constants defined in this module are:

string.ascii_letters

The concatenation of the ascii_lowercase and ascii_uppercase constants described below. This value is not locale-dependent.

string.ascii_lowercase

The lowercase letters 'abcdefghijklmnopqrstuvwxyz'. This value is not locale-dependent and will not change.

string.ascii_uppercase

The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. This value is not locale-dependent and will not change.

string.digits

The string '0123456789'.

string.hexdigits

The string '0123456789abcdefABCDEF'.

string.octdigits

The string '01234567'.

string.punctuation

String of ASCII characters which are considered punctuation characters in the C locale.

string.printable

String of ASCII characters which are considered printable. This is a combination of digits, ascii_letters, punctuation, and whitespace.

string.whitespace

A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.
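The relationships stated in these definitions can be checked directly, and the constants also work as simple membership tests. The following is a minimal sketch; count_classes is a hypothetical helper written for this example, not part of the string module.

```python
import string

# ascii_letters is documented as lowercase followed by uppercase.
letters_ok = string.ascii_letters == (
    string.ascii_lowercase + string.ascii_uppercase
)

# printable is documented as a combination of four other constants.
printable_ok = string.printable == (
    string.digits + string.ascii_letters + string.punctuation + string.whitespace
)


def count_classes(text):
    """Count how many characters of text fall into each ASCII class."""
    counts = {"digit": 0, "letter": 0, "punctuation": 0,
              "whitespace": 0, "other": 0}
    for ch in text:
        if ch in string.digits:
            counts["digit"] += 1
        elif ch in string.ascii_letters:
            counts["letter"] += 1
        elif ch in string.punctuation:
            counts["punctuation"] += 1
        elif ch in string.whitespace:
            counts["whitespace"] += 1
        else:
            counts["other"] += 1  # anything outside ASCII printable
    return counts


print(letters_ok, printable_ok)
print(count_classes("Hi, 42!\n"))
```

Characters outside the ASCII range, such as accented letters or CJK text, land in the "other" bucket because every constant in this module is ASCII-only.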

