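A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean). A DataFrame has both a row index and a column index, and it can be thought of as a dict of Series (one Series per column, all sharing the same row index).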
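To make the "dict of Series" view concrete, here is a minimal sketch with small hypothetical sample data (the row labels 'a' and 'b' are made up for illustration). Each dict key becomes a column, each Series supplies that column's values, and the shared Series index becomes the row index.

```python
import pandas as pd

# Hypothetical sample data: two Series sharing the same row labels
state = pd.Series(['ohio', 'nevada'], index=['a', 'b'])
pop = pd.Series([1.5, 2.4], index=['a', 'b'])

# Each dict key becomes a column name; the Series index becomes the row index
frame = pd.DataFrame({'state': state, 'pop': pop})

# Selecting a single column gives back a Series
col = frame['pop']
print(type(col).__name__)
print(list(frame.index))
print(list(frame.columns))

# Each column keeps its own dtype (strings in 'state', floats in 'pop')
print(frame.dtypes)
```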
Constructing a DataFrame:
1.1 Pass in a dict of equal-length lists or NumPy arrays directly

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
import pandas as pd

data = {'state': ['ohio', 'ohio', 'ohio', 'nevada', 'nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)
print(frame)
print("--------------------------")

# A column sequence can be specified; the DataFrame's columns follow that order
frame1 = pd.DataFrame(data, columns=['year', 'state', 'pop'])
print(frame1)
print("--------------------------")

# A requested column that is not found in the data is filled with NA (NaN) values
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])
print(frame2)
print("--------------------------")
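The example above uses plain lists. Below is a minimal sketch of the NumPy-array variant mentioned in the heading, reusing the same sample data. It also checks that a requested column missing from the dict ('debt') comes back entirely as NaN, and that columns of unequal length are rejected with a ValueError (the exact error message may differ between pandas versions).

```python
import numpy as np
import pandas as pd

# Same sample data as above, but each column is a NumPy array
data = {'state': np.array(['ohio', 'ohio', 'ohio', 'nevada', 'nevada']),
        'year': np.array([2000, 2001, 2002, 2001, 2002]),
        'pop': np.array([1.5, 1.7, 3.6, 2.4, 2.9])}

frame = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                     index=['one', 'two', 'three', 'four', 'five'])

# 'debt' is not a key of data, so every value in that column is missing
print(frame['debt'].isna().all())
print(frame.shape)

# All columns must have the same length; a mismatch raises ValueError
try:
    pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2]})
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```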
1.2 Accessing columns and rows

import pandas as pd

data = {'state': ['ohio', 'ohio', 'ohio', 'nevada', 'nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])
print(frame2)
print("--------------------------")
# Column access by attribute
print(frame2.year)
print("--------------------------")
# Column access by dict-like key
print(frame2['year'])
print("--------------------------")
# Row access by index label with .loc
print(frame2.loc['two'])
print("--------------------------")
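#A column can be retrieved as a Series either with dict-like notation (frame2['year']) or as an attribute (frame2.year). The returned Series has the same index as the DataFrame, and its name attribute is set to the column name.
#Rows can also be retrieved by label with .loc or by integer position with .iloc. The older .ix indexer was deprecated and then removed in pandas 1.0.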
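A minimal sketch of these access patterns, rebuilding frame2 as above. It checks that a selected column is a Series carrying the column's name and the frame's index, and that .loc (by label) and .iloc (by position) return the same row. It also shows a caveat of attribute access with this particular data: pop is the name of an existing DataFrame method, so frame2.pop returns that method rather than the 'pop' column, and frame2['pop'] is the safe spelling.

```python
import pandas as pd

data = {'state': ['ohio', 'ohio', 'ohio', 'nevada', 'nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])

# A column comes back as a Series named after the column, sharing the frame's index
year = frame2['year']
print(year.name)
print(year.index.equals(frame2.index))

# The same row, selected by label and by integer position
row_by_label = frame2.loc['two']
row_by_pos = frame2.iloc[1]
print(row_by_label.equals(row_by_pos))

# Attribute access is shadowed by existing methods: frame2.pop is DataFrame.pop
print(callable(frame2.pop))
print(frame2['pop'].tolist())
```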
1.3 Operating on DataFrame columns

import pandas as pd

data = {'state': ['ohio', 'ohio', 'ohio', 'nevada', 'nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])
print(frame2)
print("--------------------------")
# A column can be modified by assignment; a scalar is broadcast to every row
frame2['debt'] = 16.5
print(frame2)
print("--------------------------")
# Assigning to a column that does not exist creates a new column
frame2['eastern'] = frame2['state'] == 'ohio'
print(frame2)
print("--------------------------")
# The del keyword deletes a column in place
del frame2['eastern']
print(frame2)
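Besides broadcasting a scalar, a column can be assigned a list or array of matching length, or a Series. A minimal sketch with hypothetical debt values: a Series is aligned on the index labels, so rows whose label is absent from it get NaN. It also contrasts del, which mutates frame2 in place, with DataFrame.drop, which returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

data = {'state': ['ohio', 'ohio', 'ohio', 'nevada', 'nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])

# A list must provide exactly one value per row (hypothetical values)
frame2['debt'] = [0.0, 1.0, 2.0, 3.0, 4.0]

# A Series is aligned on index labels; 'one' and 'three' are absent, so they become NaN
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
print(frame2['debt'])

# drop() returns a new DataFrame; frame2 itself keeps the 'eastern' column
frame2['eastern'] = frame2['state'] == 'ohio'
trimmed = frame2.drop(columns=['eastern'])
print('eastern' in frame2.columns, 'eastern' in trimmed.columns)
```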
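1.4 Another common input form is a nested dict (a dict of dicts). When it is passed in, the outer keys become the columns and the inner keys become the row index; the result can then be transposed (rows and columns swapped) with .T.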

import pandas as pd

pop = {'nevada': {2001: 2.4, 2002: 2.9},
       'ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)
print(frame3)
# .T transposes the DataFrame: rows become columns and columns become rows
print(frame3.T)
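When an explicit index is passed together with a nested dict, it takes precedence over the inner keys. A minimal sketch reusing the pop data above (the year 2003 is added here only to illustrate a label with no data): in the default construction the inner keys are combined, so nevada, which has no 2000 entry, gets NaN there; with an explicit index, unlisted years are dropped and listed years without data become NaN.

```python
import pandas as pd

pop = {'nevada': {2001: 2.4, 2002: 2.9},
       'ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Default: the row index combines all inner keys; missing combinations are NaN
frame3 = pd.DataFrame(pop)
print(frame3.loc[2000, 'nevada'])

# Explicit index: 2000 is dropped because it is not listed, 2003 has no data
frame4 = pd.DataFrame(pop, index=[2001, 2002, 2003])
print(frame4)

# After transposing, the outer keys (states) form the row index
print(list(frame3.T.index))
```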