【456】python string 类内容(去除文本标点)


repr() 函数可以将对象转为 string 类型。

 

主要用于 NLP 处理,里面存在一些常量列表,包括数字、字母、大写字母、小写字母、标点符号、空格等。

参考:6.1. string — Common string operations

可以用于删除文本中的标点符号,将标点符号 replace 为 空。

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> string.digits
'0123456789'
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.hexdigits
'0123456789abcdefABCDEF'
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> string.whitespace
' \t\n\r\x0b\x0c'

6.1.1. String constants

The constants defined in this module are:

string. ascii_letters

The concatenation of the ascii_lowercase and ascii_uppercase constants described below. This value is not locale-dependent.

string. ascii_lowercase

The lowercase letters 'abcdefghijklmnopqrstuvwxyz'. This value is not locale-dependent and will not change.

string. ascii_uppercase

The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. This value is not locale-dependent and will not change.

string. digits

The string '0123456789'.

string. hexdigits

The string '0123456789abcdefABCDEF'.

string. octdigits

The string '01234567'.

string. punctuation

String of ASCII characters which are considered punctuation characters in the C locale.

string. printable

String of ASCII characters which are considered printable. This is a combination of digits, ascii_letters, punctuation, and whitespace.

string. whitespace

A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.


免责声明!

本站转载的文章为个人学习借鉴使用,本站对版权不负任何法律责任。如果侵犯了您的隐私权益,请联系本站邮箱yoyou2525@163.com删除。



 
粤ICP备18138465号  © 2018-2025 CODEPRJ.COM