The Python manual describes split() as follows:
str.split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].
For example:
>>> '1,2,3'.split(',')
['1', '2', '3']
>>> '1,2,3'.split(',', maxsplit=1)
['1', '2,3']
>>> '1,2,,3,'.split(',')
['1', '2', '', '3', '']
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
For example:
>>> '1 2 3'.split()
['1', '2', '3']
>>> '1 2 3'.split(maxsplit=1)
['1', '2 3']
>>> ' 1 2 3 '.split()
['1', '2', '3']
While processing a data set I ran into a small problem, which split() eventually solved.
The data file:
0.0888     201     36.02     28     0.5885
0.1399     198     39.32     30     0.8291
...
The goal is to store the data set in a list for further processing.
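data = []
# Read the file line by line; the with block closes it afterwards.
with open(r"date.txt", 'r') as f:
    for line in f:
        # Iterating over a string yields one character at a time.
        s = [x for x in line.strip()]
        data.append(s)
print(data)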
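The output contains every character as a separate element: 0.0888 becomes six characters instead of the single item expected. Printing the length of data[0] confirms this.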
[['0', '.', '0', '8', '8', '8', ' ', ' ', ' ', ' ', ' ', '2', '0', '1', ' ', ' ', ' ', ' ', ' ', '3', '6', '.', '0', '2', ' ', ' ', ' ', ' ', ' ', '2', '8', ' ', ' ', ' ', ' ', ' ', '0', '.', '5', '8', '8', '5'], ['0', '.', '1', '3', '9', '9', ' ', ' ', ' ', ' ', ' ', '1', '9', '8', ' ', ' ', ' ', ' ', ' ', '3', '9', '.', '3', '2', ' ', ' ', ' ', ' ', ' ', '3', '0', ' ', ' ', ' ', ' ', ' ', '0', '.', '8', '2', '9', '1']]
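The cause is that iterating over a string visits it one character at a time, so the list comprehension builds a list of characters rather than fields. A minimal sketch, assuming a sample line with the same five-space gaps as the data file (the names line and chars are only for illustration):

```python
# Sample line copied from the data file, with five spaces between fields.
line = "0.0888     201     36.02     28     0.5885\n"

# Iterating over a string yields single characters, spaces included.
chars = [x for x in line.strip()]

print(chars[:6])   # the first number alone occupies six elements
print(len(chars))  # 42 elements instead of the 5 fields expected
```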
Use split() with no argument, which splits on runs of whitespace, to separate the numbers:
data = []
with open(r"date.txt", 'r') as f:
    for line in f:
        # split() without arguments treats each run of spaces as one separator.
        s = line.strip().split()
        data.append(s)
print(data)
print(len(data[0]))
Output:
[['0.0888', '201', '36.02', '28', '0.5885'], ['0.1399', '198', '39.32', '30', '0.8291']]
5
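It matters that split() is called without a separator here. Passing ' ' explicitly would treat each of the five spaces as its own delimiter and fill the list with empty strings. The sketch below contrasts the two calls on one sample line and, as one possible next step, converts the fields to float. The sample line, the variable names, and the float conversion are assumptions for illustration, not part of the original script.

```python
# Sample line copied from the data file, with five spaces between fields.
line = "0.0888     201     36.02     28     0.5885\n"

# No argument: a run of whitespace counts as a single separator,
# and leading/trailing whitespace (including the newline) is ignored.
fields = line.split()
print(fields)         # ['0.0888', '201', '36.02', '28', '0.5885']

# Explicit ' ': every space is a delimiter, so empty strings appear.
by_space = line.strip().split(' ')
print(len(by_space))  # 21 (5 fields plus 16 empty strings)

# The fields are still strings; convert them before doing arithmetic.
row = [float(x) for x in fields]
print(row)            # [0.0888, 201.0, 36.02, 28.0, 0.5885]
```

Because split() with no argument already discards surrounding whitespace, the strip() call in the loop above is harmless but not strictly needed.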