Working with CSV Files in Python

Reposted article, 2017-10-03 17:10, Python

1. What is a CSV file

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180.

2. Drawbacks of CSV files

The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

3. The Python csv module (csv.py)

The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

The csv module’s reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.

reader(csvfile[, dialect='excel'][, fmtparam])

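csvfile
        Must be an object that supports the iterator protocol and returns a string each time its next method is called. Ordinary file objects and list objects are both suitable. If it is a file object, open it in text mode with newline='' in Python 3 (the "b" mode flag was a Python 2 requirement).
dialect
        The formatting style. The default is excel, which is comma-separated; the csv module also supports the excel-tab style, which is tab-separated. Any other style has to be defined yourself. It can then be registered with the register_dialect method, and the list_dialects method returns the names of all registered dialects.
fmtparam
        Formatting parameters, used to override individual settings of the dialect specified above.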
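
The dialect and fmtparam arguments described above can be tried out without touching the disk. The following is only a minimal sketch: the dialect name "pipes" and the sample rows are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Register a hypothetical dialect named "pipes": pipe-delimited, minimal quoting.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_MINIMAL)

# The built-in dialects plus the one just registered.
print(csv.list_dialects())

rows = [["id", "name"], ["1", "Spam"], ["2", "Baked Beans"]]

# Write with the registered dialect into an in-memory text buffer.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, dialect="pipes")
writer.writerows(rows)
print(buffer.getvalue())

# Read the same text back with the same dialect.
buffer.seek(0)
parsed = list(csv.reader(buffer, dialect="pipes"))
print(parsed)

# A fmtparam overrides a single setting of the chosen dialect:
# here the excel dialect is kept but the delimiter becomes a tab.
tab_buffer = io.StringIO(newline="")
csv.writer(tab_buffer, dialect="excel", delimiter="\t").writerows(rows)
print(tab_buffer.getvalue())
```

Reading and writing with the same dialect name keeps both sides in agreement about the delimiter and quoting rules.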

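Parameter details:

delimiter: sets the field separator

quotechar: sets the quote character

quoting: the quoting option; four options are covered here

They are defined as constants in the csv module:

QUOTE_ALL quotes every field, regardless of its type.

QUOTE_MINIMAL quotes only fields that contain special characters (that is, characters that could confuse a parser configured with the same dialect and options). This is the default.

QUOTE_NONNUMERIC quotes every field that is not an integer or a float. When used with a reader, unquoted input fields are converted to floats.

QUOTE_NONE quotes nothing in the output. When used with a reader, quote characters are kept as part of the field value (normally they are treated as delimiters and stripped).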
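
To make the difference between the four quoting constants concrete, the sketch below writes one sample row with each of them. The row contents and the helper name render are hypothetical, chosen only for illustration; the output goes to an in-memory buffer rather than a file.

```python
import csv
import io

# Sample row (made up): a string, an int, a float, and a string containing the delimiter.
row = ["Spam", 42, 3.5, "Baked, Beans"]


def render(quoting, **extra):
    """Write the sample row with the given quoting mode and return the CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, quoting=quoting, **extra).writerow(row)
    return buffer.getvalue()


print(render(csv.QUOTE_ALL))         # every field is quoted
print(render(csv.QUOTE_MINIMAL))     # only the field containing a comma is quoted
print(render(csv.QUOTE_NONNUMERIC))  # strings are quoted, numbers are left bare
# QUOTE_NONE needs an escapechar here because one field contains the delimiter.
print(render(csv.QUOTE_NONE, escapechar="\\"))

# On the reading side, QUOTE_NONNUMERIC converts unquoted fields to floats.
text = render(csv.QUOTE_NONNUMERIC)
parsed = next(csv.reader(io.StringIO(text, newline=""), quoting=csv.QUOTE_NONNUMERIC))
print(parsed)
```

Without the escapechar, the QUOTE_NONE call would raise csv.Error, because the writer has no way to represent the embedded comma.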

import csv

def testReader(file):
    # newline='' lets the csv module handle line endings itself
    with open(file, 'r', newline='') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            # each row is a list of strings
            print(', '.join(row))

if __name__ == '__main__':
    csvFile = 'test.csv'
    testReader(csvFile)

writer(csvfile[, dialect='excel'][, fmtparam])

Parameters: omitted here; they are the same as for reader (see above).

def testWriter(file):
    # newline='' prevents extra blank lines between rows on Windows
    with open(file, 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
        spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

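Create an object that operates like a regular reader but maps the information in each row to an OrderedDict whose keys are given by the optional fieldnames parameter (from Python 3.8 onward the rows are plain dict objects).

The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f are used as the field names. Regardless of how the field names are determined, the dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored under the key specified by restkey (which defaults to None). If a non-blank row has fewer fields than fieldnames, the missing values are filled in with the value of restval, which defaults to None.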
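
The restkey and restval behaviour is easiest to see on rows of uneven length. The sketch below is illustrative only: the in-memory data, the key name "overflow", and the placeholder "N/A" are all assumptions made for the example.

```python
import csv
import io

# Hypothetical in-memory data: the second line is too long, the third too short.
text = "name,score\r\nAlice,90,extra1,extra2\r\nBob\r\n"

reader = csv.DictReader(
    io.StringIO(text, newline=""),
    restkey="overflow",  # surplus fields are collected in a list under this key
    restval="N/A",       # missing fields are filled with this value
)
rows = [dict(row) for row in reader]

# The field names come from the first line because fieldnames was omitted.
print(reader.fieldnames)
for row in rows:
    print(row)
```

Converting each row with dict() only normalises the type across Python versions; the keys and values are unchanged.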

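def testDictReader(file):
    # Header columns of test.csv:
    # 院系,專業,年級,學生類別,班級,學號,姓名,學分成績,更新時間,班級排名,參與班級排名總人數
    # Python 3 needs text mode here, not 'rb'; encoding='utf-8' assumes the file is saved as UTF-8
    with open(file, 'r', newline='', encoding='utf-8') as csvfile:
        dictreader = csv.DictReader(csvfile)
        for row in dictreader:
            print(' '.join([row['院系'], row['專業'], row['學號'], row['姓名']]))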

DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

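Create an object that operates like a regular writer but maps dictionaries onto output rows. The fieldnames parameter is a sequence of keys that identifies the order in which the values of the dictionary passed to the writerow() method are written to file f. The optional restval parameter specifies the value to write when the dictionary is missing a key that appears in fieldnames. If the dictionary passed to writerow() contains a key that is not found in fieldnames, the optional extrasaction parameter indicates what action to take. If it is set to 'raise', the default, a ValueError is raised. If it is set to 'ignore', the extra values in the dictionary are ignored. Any other optional or keyword arguments are passed to the underlying writer instance.

Note that, unlike the DictReader class, the fieldnames parameter of DictWriter is not optional. The original rationale was that Python dict objects were not ordered, so there was not enough information to deduce the order in which a row should be written to file f.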
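
The effect of restval and extrasaction can be checked with a small in-memory experiment. This is a sketch under stated assumptions: the field names, the sample dictionaries, and the "N/A" placeholder are invented for illustration.

```python
import csv
import io

fieldnames = ["name", "score"]

# extrasaction="ignore" drops unknown keys; restval fills in missing ones.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="N/A", extrasaction="ignore")
writer.writeheader()
writer.writerow({"name": "Alice", "score": 90, "note": "not a declared field"})
writer.writerow({"name": "Bob"})  # "score" is missing, so restval is written
print(buffer.getvalue())

# With the default extrasaction="raise", an unknown key raises ValueError.
strict = csv.DictWriter(io.StringIO(newline=""), fieldnames=fieldnames)
try:
    strict.writerow({"name": "Carol", "note": "unexpected"})
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The default 'raise' setting is the safer choice when a misspelled key should be caught early instead of silently dropped.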

def testDictWriter(file):
    # newline='' avoids blank lines on Windows; encoding='utf-8' is an assumption for the Chinese text
    with open(file, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['院系', '專業', '年級', '學生類別', '班級', '學號']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        # write the header row first, then one dictionary per data row
        writer.writeheader()
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101245'})
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101275'})

4. Example code

Copying a CSV file

def copycsv(source, target):
    # open both files in one with statement so both are closed even on error
    with open(source, 'r', newline='') as csvsource, \
            open(target, 'w', newline='') as csvtarget:
        reader = csv.reader(csvsource, delimiter=',')
        # create the writer once, not once per row
        writer = csv.writer(csvtarget, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for line in reader:
            writer.writerow(line)

5. Other approaches (numpy, pandas)

import numpy

# loadtxt accepts a file path directly and parses the values as floats
my_matrix = numpy.loadtxt("num.csv", delimiter=",", skiprows=0)
print(my_matrix)

import pandas as pd

# read_csv returns a DataFrame; print() must be called as a function in Python 3
obj = pd.read_csv('test.csv')
print(obj)
print(type(obj))
print(obj.dtypes)

test.csv

院系,專業,年級,學生類別,班級,學號,姓名,學分成績,更新時間,班級排名,參與班級排名總人數
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101244,欒,86.72,2017/9/5 9:59,1,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101237,劉,86.05,2017/9/5 9:59,2,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101233,劉,86.03,2017/9/5 9:59,3,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101250,李,85.43,2017/9/5 9:59,4,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101229,張,82.35,2017/9/5 9:59,5,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101241,韓,80.92,2017/9/5 9:59,6,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101232,丁,80.66,2017/9/5 9:59,7,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101228,張,79.61,2017/9/5 9:59,8,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101255,孟,79.55,2017/9/5 9:59,9,27

num.csv

1,2,3
4,5,6
7,8,9

6. Complete code

# coding:utf-8

import csv


def testReader(file):
    # newline='' lets the csv module handle line endings itself
    with open(file, 'r', newline='') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            print(', '.join(row))


def testWriter(file):
    # newline='' prevents extra blank lines between rows on Windows
    with open(file, 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
        spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])


def copycsv(source, target):
    # open both files in one with statement so both are closed even on error
    with open(source, 'r', newline='') as csvsource, \
            open(target, 'w', newline='') as csvtarget:
        reader = csv.reader(csvsource, delimiter=',')
        # create the writer once, not once per row
        writer = csv.writer(csvtarget, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for line in reader:
            writer.writerow(line)


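def testDictReader(file):
    # Header columns of test.csv:
    # 院系,專業,年級,學生類別,班級,學號,姓名,學分成績,更新時間,班級排名,參與班級排名總人數
    # Python 3 needs text mode here, not 'rb'; encoding='utf-8' assumes the file is saved as UTF-8
    with open(file, 'r', newline='', encoding='utf-8') as csvfile:
        dictreader = csv.DictReader(csvfile)
        for row in dictreader:
            print(' '.join([row['院系'], row['專業'], row['學號'], row['姓名']]))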


def testDictWriter(file):
    # newline='' avoids blank lines on Windows; encoding='utf-8' is an assumption for the Chinese text
    with open(file, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['院系', '專業', '年級', '學生類別', '班級', '學號']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101245'})
        writer.writerow(
            {'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101275'})


def testpandas_csv():
    import pandas as pd

    # read_csv returns a DataFrame; print() must be called as a function in Python 3
    obj = pd.read_csv('test.csv')
    print(obj)
    print(type(obj))
    print(obj.dtypes)


def testnumpy_csv():
    import numpy

    # loadtxt accepts a file path directly, so no file handle is left open
    my_matrix = numpy.loadtxt("num.csv", delimiter=",", skiprows=0)
    print(my_matrix)


if __name__ == '__main__':
    # csvFile = 'test.csv'
    # testReader(csvFile)

    # csvFile = 'test2.csv'
    # testWriter(csvFile)

    # copycsv('test.csv', 'testcopy.csv')

    # testDictReader('test.csv')

    # testDictWriter('test2.csv')
    testnumpy_csv()

    # testpandas_csv()
