Python Strings and Text Operations (Part 1)

This article is a repost. 2018-04-09 23:12. Tags: python / text operations / strings

1. Splitting a string into fields when the delimiters (and the whitespace around them) are not fixed

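# The split() method of str only handles very simple splitting; it does not allow multiple delimiters or a variable amount of whitespace around them. When you need more flexible splitting, use re.split()
import re
line = 'asdf fjdk; afed, fjek,asdf, foo'
list_line = re.split(r'[;,\s]\s*', line)
print(list_line)  # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']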
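
As a small sketch of the difference described above, the following compares str.split() with re.split() on the same input. The helper name split_fields is made up for illustration and is not part of the original post.

```python
import re

def split_fields(line):
    """Split on a semicolon, comma, or whitespace, plus any whitespace that follows."""
    return re.split(r'[;,\s]\s*', line)

line = 'asdf fjdk; afed, fjek,asdf, foo'

# str.split() with no arguments splits on whitespace only,
# so the punctuation stays attached to the fields.
naive = line.split()

fields = split_fields(line)

print(naive)
print(fields)
```

With str.split(), 'fjek,asdf,' survives as a single field because it contains no whitespace, while re.split() treats the comma as a delimiter as well.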

2. Checking the start or end of a string against a given text pattern, such as a filename extension or a URL scheme

filename = 'spam.txt'
print(filename.endswith('.txt'))  # True
print(filename.startswith('file'))  # False

# To check against several possible matches, put all of them in a tuple and pass it to startswith() or endswith()
import os

filenames = os.listdir('.')
list_1 = [name for name in filenames if name.endswith(('.c', '.h'))]
print(any(name.endswith('.py') for name in filenames))  # True if the directory contains at least one .py file

from urllib.request import urlopen
def read_data(name):
    if name.startswith(('http:', 'https:', 'ftp:')):  # the argument must be a tuple
        return urlopen(name).read()
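
To illustrate why the comment above insists on a tuple, here is a minimal sketch: passing a list to startswith() raises TypeError, so a list of candidates has to be converted with tuple() first. The variable names and the sample URL are chosen only for illustration.

```python
schemes = ['http:', 'https:', 'ftp:']  # hypothetical list of prefixes
url = 'https://www.python.org'

try:
    url.startswith(schemes)  # a list is rejected
    list_accepted = True
except TypeError:
    list_accepted = False

# Converting the list to a tuple makes the call valid.
tuple_result = url.startswith(tuple(schemes))

print(list_accepted, tuple_result)
```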

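3. More complex matching requires regular expressions and the re module

The core steps are to compile the regular expression string with re.compile() first, then call methods such as match(), findall(), or finditer() on the compiled pattern.

match() always tries to match at the start of the string.

search() scans the whole string and returns the first match it finds.

When defining a regular expression, parentheses are commonly used to capture groups.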

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
m = datepat.match('11/27/2012')
m.group(0)  # '11/27/2012' (the whole match)
m.group(1)  # '11'
m.group(2)  # '27'
m.group(3)  # '2012'
m.groups()  # ('11', '27', '2012')
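
The methods listed in this section behave differently on the same input. The following is a minimal sketch, using a sample sentence picked for illustration, that contrasts match(), search(), findall(), and finditer() on one compiled pattern.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# match() only succeeds at the very start of the string, so this is None.
at_start = datepat.match(text)

# search() scans forward and returns the first match object.
first = datepat.search(text)

# findall() returns every match as a tuple of captured groups.
all_dates = datepat.findall(text)

# finditer() yields match objects one at a time, handy for reformatting.
iso = ['{}-{}-{}'.format(m.group(3), m.group(1), m.group(2))
       for m in datepat.finditer(text)]

print(at_start, first.group(0), all_dates, iso)
```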

4. Searching for and replacing a specified text pattern in a string

text = 'yeah, but no, but yeah, but no, but yeah'
text.replace('yeah', 'ok')  # for simple literal patterns, str.replace() is enough

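text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
import re
# For more complex patterns, use sub() from the re module. A backslash followed by a digit, such as \3, refers to the capture group with that number in the pattern
newtext = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
print(newtext)  # Today is 2012-11-27. PyCon starts 2013-3-13.


datepat = re.compile(r'(\d+)/(\d+)/(\d+)')  # if you plan to do many substitutions with the same pattern, consider compiling it first for better performance
datepat.sub(r'\3-\1-\2', text)  # 'Today is 2012-11-27. PyCon starts 2013-3-13.'
# For more complex substitutions, pass a replacement callback function instead
from calendar import month_abbr
def change_date(m):
    mon_name = month_abbr[int(m.group(1))]
    return '{} {} {}'.format(m.group(2), mon_name, m.group(3))
newtext = datepat.sub(change_date, text)  # 'Today is 27 Nov 2012. PyCon starts 13 Mar 2013.'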

5. Formatting a string with some kind of alignment: for basic alignment, use the string methods ljust(), rjust(), and center()

text = 'Hello World'
text.ljust(20, '>')  # 'Hello World>>>>>>>>>'
text.rjust(20, '<')  # '<<<<<<<<<Hello World'
text.center(20, '*')  # '****Hello World*****'

# The format() function can also be used to align strings easily. All you need to do is use the <, > or ^ character followed by the desired width
format(text, '>20')  # '         Hello World'

# When formatting multiple values, the same format codes can also be used in the format() method
'{:>10s} {:>10s}'.format('Hello', 'World')  # '     Hello      World'

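# One benefit of format() is that it is not limited to strings. It can format any value, which makes it very general. For example, you can use it to format numbers:
x = 1.2345
format(x, '>10')  # '    1.2345'
format(x, '^10.2f')  # '   1.23   '

total = 1.5  # sample value
print("The total value of your change is ${0:0.2f}".format(total))  # The total value of your change is $1.50

Index 0 means that the first (and only) argument is inserted into this slot, and the format specifier is 0.2f. The specifier has the form <width>.<precision><type>. The width says how much "space" the value should occupy. If the value is narrower than the specified width, it is padded with extra characters (spaces by default). If the value needs more space than was allotted, it takes as much space as it needs to be displayed. So putting a 0 here essentially says "use as much space as you need". The precision is 2, which tells Python to round the value to two decimal places. Finally, the type character f means the value is displayed as a fixed-point number, so the specified number of decimal places is always shown, even if they are 0.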
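
The effect of each part of the specifier can be checked with a short sketch. The sample values below are arbitrary and chosen only to show padding, rounding, and trailing zeros.

```python
value = 3.14159  # arbitrary sample value

# No minimum width: the result is exactly as wide as the digits need.
narrow = '{0:0.2f}'.format(value)

# Width 10: numbers are right-aligned and padded with spaces by default.
wide = '{0:10.2f}'.format(value)

# Fixed-point type f always shows two decimals, even when they are zero.
zeros = '{0:0.2f}'.format(2)

print(repr(narrow), repr(wide), repr(zeros))
```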

6. Combining several small strings into one large string: if the strings are in a sequence or iterable, the fastest way is the join() method

parts = ['Is', 'Chicago', 'Not', 'Chicago?']
' '.join(parts)#'Is Chicago Not Chicago?'

# If you are only combining a few strings, the plus operator (+) is usually good enough
a = 'Is Chicago'
b = 'Not Chicago?'
a + ' ' + b  #'Is Chicago Not Chicago?'
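
join() only accepts strings, so mixed data has to be converted first. A minimal sketch, with sample values invented for illustration, uses a generator expression to convert and join in one pass:

```python
data = ['ACME', 50, 91.1]  # sample values, not from the original post

# Convert each item to str on the fly; no temporary list is built.
joined = ','.join(str(d) for d in data)

print(joined)
```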

7. Creating a string with embedded variables, where each variable is replaced by the string representation of its value

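# Before f-strings (Python 3.6), Python had no direct syntax for simple variable substitution in strings; the format() method of str fills that gap
s = '{name} has {n} messages.'
s.format(name='Guido', n=37)  # 'Guido has 37 messages.'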


name = 'songshichao'
n = 18
'%(name)s has %(n)s messages.' % vars()  # 'songshichao has 18 messages.'

# Or
import string
s = string.Template('$name has $n messages.')
s.substitute(vars())

8. Formatting a string to a given column width: use the textwrap module to format string output

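# The textwrap module is very useful for printing strings, especially when you want the output to fit the terminal size. You can use os.get_terminal_size() to get the size of the terminal

import textwrap
s = 'The quick brown fox jumps over the lazy dog. ' * 5  # sample text; any long string works
print(textwrap.fill(s, 70))
print(textwrap.fill(s, 40, initial_indent=' '))
print(textwrap.fill(s, 40, subsequent_indent=' '))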
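
A minimal sketch of fitting output to the terminal follows. It uses shutil.get_terminal_size() rather than the os function mentioned above, because the shutil version accepts a fallback size and therefore also works when no terminal is attached. The sample text and the 80x24 fallback are assumptions made for illustration.

```python
import shutil
import textwrap

s = 'The quick brown fox jumps over the lazy dog. ' * 5  # sample text

# Falls back to 80 columns when the size cannot be determined
# (for example when output is redirected to a file).
width = shutil.get_terminal_size(fallback=(80, 24)).columns

wrapped = textwrap.fill(s, width)
print(wrapped)
```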
