1. What is a CSV file
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180.
2. Drawbacks of CSV files
The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
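As a small illustration of the point above, the sketch below parses two exports of the same table that differ only in delimiter and quote character. The sample strings and the parse helper are hypothetical, made up for this example; only csv.reader and its delimiter and quotechar parameters come from the csv module.

```python
import csv
import io

# Two hypothetical exports of the same table from different applications.
comma_export = 'name,comment\r\nAlice,"likes spam, eggs"\r\n'
semicolon_export = "name;comment\r\nAlice;'likes spam, eggs'\r\n"


def parse(text, **fmtparams):
    """Parse CSV text into a list of rows using the given format parameters."""
    return list(csv.reader(io.StringIO(text, newline=''), **fmtparams))


rows_a = parse(comma_export)                                    # default: comma and double quote
rows_b = parse(semicolon_export, delimiter=';', quotechar="'")  # semicolon and single quote

print(rows_a == rows_b)  # True: the same rows come out of both sources
```

The calling code only changes the format parameters; the parsing logic itself stays hidden inside the module.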
3. The Python csv module (csv.py)
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.
The csv module’s reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.
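A minimal sketch of the reader and writer objects working on plain sequences. It writes to an in-memory io.StringIO buffer instead of a real file, which is a choice made here to keep the example self-contained; the sample rows are invented.

```python
import csv
import io

# An in-memory text buffer stands in for a file opened with newline=''.
buffer = io.StringIO(newline='')

writer = csv.writer(buffer)
writer.writerow(['id', 'title'])                      # one row from a list
writer.writerows([[1, 'Spam'], [2, 'Baked Beans']])   # several rows at once

buffer.seek(0)                  # rewind so the reader starts at the beginning
rows = list(csv.reader(buffer))
print(rows)
# [['id', 'title'], ['1', 'Spam'], ['2', 'Baked Beans']]
```

Note that the reader returns every field as a string: the integers written above come back as '1' and '2'.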
reader(csvfile[, dialect='excel'][, fmtparam])
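csvfile
Any object that supports the iterator protocol and returns a string each time its next method is called. File objects and lists are both suitable. If it is a file object, open it with newline='' in Python 3 (Python 2 required the "b" mode flag instead).
dialect
The formatting style. The default is excel, which separates fields with a comma (,). The csv module also ships with excel-tab, which separates fields with a tab. Any other style has to be defined by hand and can then be registered with register_dialect; list_dialects returns the names of all registered dialects.
fmtparam
Formatting parameters that override individual settings of the chosen dialect.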
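The following sketch shows how a custom dialect could be registered and then overridden by a formatting parameter. The dialect name 'pipes' and the sample row are assumptions made for illustration; register_dialect, list_dialects and the dialect and delimiter arguments are part of the csv module.

```python
import csv
import io

# Register a hypothetical dialect named 'pipes' that uses | as the delimiter.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_MINIMAL)
print('pipes' in csv.list_dialects())  # True

# Write one row with the registered dialect.
buffer = io.StringIO(newline='')
csv.writer(buffer, dialect='pipes').writerow(['a', 'b|c', 'd'])
line = buffer.getvalue()
print(repr(line))  # 'a|"b|c"|d\r\n'

# A formatting parameter overrides the corresponding dialect setting.
buffer2 = io.StringIO(newline='')
csv.writer(buffer2, dialect='pipes', delimiter=';').writerow(['a', 'b|c', 'd'])
overridden = buffer2.getvalue()
print(repr(overridden))  # 'a;b|c;d\r\n'
```

In the first call the field 'b|c' is quoted because it contains the delimiter; once the delimiter is overridden to a semicolon, the pipe is no longer special and the field is written bare.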
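Parameter details:
delimiter: the field separator
quotechar: the quote character
quoting: the quoting policy, with 4 different options
These are defined as four constants in the csv module:
QUOTE_ALL: quote every field, regardless of its type.
QUOTE_MINIMAL: quote only fields that contain special characters (that is, characters that could confuse a parser configured with the same dialect and options). This is the default.
QUOTE_NONNUMERIC: quote every field that is not an integer or a float. When used with a reader, unquoted input fields are converted to floats.
QUOTE_NONE: never quote anything in the output. When used with a reader, quote characters are kept as part of the field value (normally they are treated as delimiters and stripped).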
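To make the four quoting options concrete, the sketch below writes the same row under each of them. The sample row and the render helper are invented for this example. QUOTE_NONE is given an escapechar here because the row contains the delimiter and nothing may be quoted.

```python
import csv
import io

row = ['spam', 'eggs, ham', 42, 3.5]


def render(quoting, **extra):
    """Write `row` with the given quoting option and return the resulting line."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, quoting=quoting, **extra).writerow(row)
    return buffer.getvalue().rstrip('\r\n')


quoted_all = render(csv.QUOTE_ALL)                    # "spam","eggs, ham","42","3.5"
quoted_minimal = render(csv.QUOTE_MINIMAL)            # spam,"eggs, ham",42,3.5
quoted_nonnumeric = render(csv.QUOTE_NONNUMERIC)      # "spam","eggs, ham",42,3.5
quoted_none = render(csv.QUOTE_NONE, escapechar='\\')  # spam,eggs\, ham,42,3.5

for line in (quoted_all, quoted_minimal, quoted_nonnumeric, quoted_none):
    print(line)

# On the reading side, QUOTE_NONNUMERIC turns unquoted fields into floats.
parsed = next(csv.reader(io.StringIO(quoted_nonnumeric), quoting=csv.QUOTE_NONNUMERIC))
print(parsed)  # ['spam', 'eggs, ham', 42.0, 3.5]
```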
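import csv


def testReader(file):
    # newline='' lets the csv module handle line endings itself (Python 3);
    # encoding='utf-8' is an assumption about how test.csv was saved
    with open(file, 'r', newline='', encoding='utf-8') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            print(', '.join(row))


if __name__ == '__main__':
    csvFile = 'test.csv'
    testReader(csvFile)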
writer(csvfile[, dialect='excel'][, fmtparam])
Parameters: omitted here; they are the same as for reader, see above.
def testWriter(file):
    # newline='' prevents extra blank lines between rows on Windows
    with open(file, 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
        spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
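DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Creates an object that operates like a regular reader but maps the information in each row to a dictionary (an OrderedDict in Python 3.6 and 3.7, a plain dict from Python 3.8 on) whose keys are given by the optional fieldnames parameter.
The fieldnames parameter is a sequence. If it is omitted, the values in the first row of file f are used as the field names. However the field names are determined, the resulting dictionary preserves their original order.
If a row has more fields than there are field names, the remaining data is put in a list and stored under the key given by restkey (default None). If a non-blank row has fewer fields than there are field names, the missing values are filled in with the value of restval (default None).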
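A short sketch of how restkey and restval behave for rows that are too long or too short. The in-memory sample data and the key name 'overflow' are hypothetical; they are not taken from test.csv.

```python
import csv
import io

# The header has two fields; the first data row has four, the second only one.
data = 'id,name\r\n1,Alice,extra1,extra2\r\n2\r\n'

reader = csv.DictReader(io.StringIO(data, newline=''),
                        restkey='overflow',   # key that collects surplus fields
                        restval='N/A')        # value used for missing fields
rows = [dict(row) for row in reader]

print(rows[0])  # {'id': '1', 'name': 'Alice', 'overflow': ['extra1', 'extra2']}
print(rows[1])  # {'id': '2', 'name': 'N/A'}
```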
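def testDictReader(file):
    # test.csv header: 院系,專業,年級,學生類別,班級,學號,姓名,學分成績,更新時間,班級排名,參與班級排名總人數
    # Printed below: 院系 (department), 專業 (major), 學號 (student ID), 姓名 (name)
    # Text mode is required in Python 3 ('rb' fails); utf-8 is an assumed encoding
    with open(file, 'r', newline='', encoding='utf-8') as csvfile:
        dictreader = csv.DictReader(csvfile)
        for row in dictreader:
            print(' '.join([row['院系'], row['專業'], row['學號'], row['姓名']]))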
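DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Creates an object that operates like a regular writer but maps dictionaries onto output rows. The fieldnames parameter is a sequence of keys that identifies the order in which the values of the dictionary passed to writerow() are written to file f. The optional restval parameter specifies the value to write when the dictionary is missing a key listed in fieldnames. If the dictionary passed to writerow() contains a key not found in fieldnames, the optional extrasaction parameter indicates what to do: if it is set to 'raise', the default, a ValueError is raised; if it is set to 'ignore', the extra values in the dictionary are ignored. Any other optional or keyword arguments are passed to the underlying writer instance.
Note that, unlike DictReader, the fieldnames parameter of DictWriter is not optional. The original rationale was that Python's dict objects were unordered, so there was not enough information to deduce the order in which columns should be written to file f. Dicts have preserved insertion order since Python 3.7, but fieldnames is still required and still determines the column order.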
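The sketch below exercises restval and both extrasaction settings on an in-memory buffer. The field names and sample dictionaries are invented for illustration.

```python
import csv
import io

fieldnames = ['id', 'name', 'email']

# 'ignore' drops keys that are not listed in fieldnames;
# restval fills in keys that the dictionary does not provide.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        restval='unknown', extrasaction='ignore')
writer.writeheader()
writer.writerow({'name': 'Alice', 'id': 1, 'nickname': 'Al'})
output = buffer.getvalue()
print(repr(output))  # 'id,name,email\r\n1,Alice,unknown\r\n'

# With the default extrasaction='raise', an unexpected key is an error.
strict = csv.DictWriter(io.StringIO(newline=''), fieldnames=fieldnames)
try:
    strict.writerow({'id': 2, 'nickname': 'Bob'})
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

The column order in the output follows fieldnames, not the order of the keys in the dictionary passed to writerow().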
def testDictWriter(file):
    # encoding='utf-8' is an assumption; newline='' avoids blank lines on Windows
    with open(file, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['院系', '專業', '年級', '學生類別', '班級', '學號']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101245'})
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101275'})
4. Example code
Copying a CSV file
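def copycsv(source, target):
    # Open both files in one with statement so each is closed even on error
    # (encoding='utf-8' is an assumption about the source file)
    with open(source, 'r', newline='', encoding='utf-8') as csvsource, \
            open(target, 'w', newline='', encoding='utf-8') as csvtarget:
        reader = csv.reader(csvsource, delimiter=',')
        # Create the writer once, outside the loop
        writer = csv.writer(csvtarget, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for line in reader:
            writer.writerow(line)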
5. Other approaches (numpy, pandas)
import numpy
my_matrix = numpy.loadtxt(open("num.csv", "rb"), delimiter=",", skiprows=0)
print(my_matrix)
import pandas as pd
obj = pd.read_csv('test.csv')
print(obj)
print(type(obj))
print(obj.dtypes)
test.csv
院系,專業,年級,學生類別,班級,學號,姓名,學分成績,更新時間,班級排名,參與班級排名總人數
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101244,欒,86.72,2017/9/5 9:59,1,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101237,劉,86.05,2017/9/5 9:59,2,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101233,劉,86.03,2017/9/5 9:59,3,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101250,李,85.43,2017/9/5 9:59,4,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101229,張,82.35,2017/9/5 9:59,5,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101241,韓,80.92,2017/9/5 9:59,6,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101232,丁,80.66,2017/9/5 9:59,7,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101228,張,79.61,2017/9/5 9:59,8,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101255,孟,79.55,2017/9/5 9:59,9,27
num.csv
1,2,3
4,5,6
7,8,9
6. Complete code
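# coding:utf-8
import csv


def testReader(file):
    # newline='' lets the csv module handle line endings itself (Python 3);
    # encoding='utf-8' is an assumption about how test.csv was saved
    with open(file, 'r', newline='', encoding='utf-8') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            print(', '.join(row))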


def testWriter(file):
    # newline='' prevents extra blank lines between rows on Windows
    with open(file, 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
        spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
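

def copycsv(source, target):
    # Open both files in one with statement so each is closed even on error
    # (encoding='utf-8' is an assumption about the source file)
    with open(source, 'r', newline='', encoding='utf-8') as csvsource, \
            open(target, 'w', newline='', encoding='utf-8') as csvtarget:
        reader = csv.reader(csvsource, delimiter=',')
        # Create the writer once, outside the loop
        writer = csv.writer(csvtarget, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for line in reader:
            writer.writerow(line)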
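

def testDictReader(file):
    # test.csv header: 院系,專業,年級,學生類別,班級,學號,姓名,學分成績,更新時間,班級排名,參與班級排名總人數
    # Printed below: 院系 (department), 專業 (major), 學號 (student ID), 姓名 (name)
    # Text mode is required in Python 3 ('rb' fails); utf-8 is an assumed encoding
    with open(file, 'r', newline='', encoding='utf-8') as csvfile:
        dictreader = csv.DictReader(csvfile)
        for row in dictreader:
            print(' '.join([row['院系'], row['專業'], row['學號'], row['姓名']]))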


def testDictWriter(file):
    # encoding='utf-8' is an assumption; newline='' avoids blank lines on Windows
    with open(file, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['院系', '專業', '年級', '學生類別', '班級', '學號']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101245'})
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101275'})


def testpandas_csv():
    import pandas as pd
    obj = pd.read_csv('test.csv')
    print(obj)
    print(type(obj))
    print(obj.dtypes)


def testnumpy_csv():
    import numpy
    my_matrix = numpy.loadtxt(open("num.csv", "rb"), delimiter=",", skiprows=0)
    print(my_matrix)


if __name__ == '__main__':
    # csvFile = 'test.csv'
    # testReader(csvFile)
    # csvFile = 'test2.csv'
    # testWriter(csvFile)
    # copycsv('test.csv', 'testcopy.csv')
    # testDictReader('test.csv')
    # testDictWriter('test2.csv')
    testnumpy_csv()
    # testpandas_csv()