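This section introduces the basic ways of working with the data held in a Series or DataFrame.
1. Reindexing
An important method on pandas objects is reindex, which creates a new object whose data conforms to a new index.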

'''
Created on 2016-8-10

@author: xuzhengzhu
'''
from pandas import Series

print("--------------obj result:-----------------")
obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)

print("--------------obj2 result:-----------------")
# Rearrange the data to follow the new index; 'e' has no value yet
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj2)

print("--------------obj3 result:-----------------")
# Same reindex, but fill the missing entry with 0 instead of NaN
obj3 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(obj3)
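reindex rearranges the data according to the new index. If an index value is not present in the original object, a missing value (NaN) is introduced for it.
Passing fill_value=0 substitutes 0 for those missing values.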

--------------obj result:-----------------
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
--------------obj2 result:-----------------
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
--------------obj3 result:-----------------
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
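To make the "new object" point concrete, here is a minimal sketch (the variable name reindexed is only illustrative) that checks the original Series is left unchanged after reindex:

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex builds a new Series; obj itself is not modified
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])

print(list(obj.index))         # the original keeps its own order
print(list(reindexed.index))   # the new object follows the requested order
print(reindexed.isna()['e'])   # 'e' had no value in obj, so it is missing
```

If the result should replace the original, it has to be assigned back explicitly, for example obj = obj.reindex(...).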
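2. Interpolation
For ordered data such as time series, you may want to interpolate or fill values when reindexing. The method option does this:
Options for the method parameter | |
Argument | Description |
ffill or pad | Forward fill: carry the previous value forward |
bfill or backfill | Backward fill: carry the next value backward |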

from pandas import Series

print("--------------obj3 result:-----------------")
obj3 = Series(['blue', 'red', 'yellow'], index=[0, 2, 4])
print(obj3)

print("--------------obj4 result:-----------------")
# Forward-fill: each new label takes the value of the nearest earlier label
obj4 = obj3.reindex(range(6), method='ffill')
print(obj4)

--------------obj3 result:-----------------
0      blue
2       red
4    yellow
dtype: object
--------------obj4 result:-----------------
0      blue
1      blue
2       red
3       red
4    yellow
5    yellow
dtype: object
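For comparison, the sketch below runs the same reindex with both fill directions from the table. The names colors, forward and backward are illustrative only. With bfill, the last label has no later value to borrow from, so it stays missing:

```python
import pandas as pd

colors = pd.Series(['blue', 'red', 'yellow'], index=[0, 2, 4])

# ffill copies the previous valid value forward into the new labels
forward = colors.reindex(range(6), method='ffill')
# bfill copies the next valid value backward into the new labels
backward = colors.reindex(range(6), method='bfill')

print(forward.tolist())
print(backward.tolist())
```

Both directions assume the original index is sorted, as it is here.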
For a DataFrame, reindex can alter the row index, the column index, or both. When passed only a sequence, it reindexes the rows:

import numpy as np
from pandas import DataFrame

print("--------------frame result:-----------------")
frame = DataFrame(np.arange(9).reshape((3, 3)),
                  index=['a', 'c', 'd'],
                  columns=['ohio', 'texas', 'california'])
print(frame)

print("--------------frame2 result:-----------------")
# A bare sequence reindexes the rows
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
print(frame2)

print("--------------frame3 result:-----------------")
# The columns keyword reindexes the columns
frame3 = frame.reindex(columns=['texas', 'utah', 'california'])
print(frame3)

print("--------------frame4 result:-----------------")
# Rows and columns together. The original used frame.ix[...], which was
# removed in pandas 1.0; reindex with index= and columns= gives the same result.
frame4 = frame.reindex(index=['a', 'b', 'c', 'd'],
                       columns=['texas', 'utah', 'california'])
print(frame4)

--------------frame result:-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
--------------frame2 result:-----------------
   ohio  texas  california
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
--------------frame3 result:-----------------
   texas  utah  california
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
--------------frame4 result:-----------------
   texas  utah  california
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0
3. Dropping entries from an axis

import numpy as np
from pandas import Series, DataFrame

print("--------------Series drop item by index:-----------------")
obj = Series(np.arange(3, 8), index=['a', 'b', 'c', 'd', 'e'])
print(obj)
# drop returns a new Series without the given label
obj1 = obj.drop('c')
print(obj1)

print("--------------DataFrame drop item by index :-----------------")
frame = DataFrame(np.arange(9).reshape((3, 3)),
                  index=['a', 'c', 'd'],
                  columns=['ohio', 'texas', 'california'])
print(frame)
# axis=1 drops a column label instead of a row label
frame1 = frame.drop(['ohio'], axis=1)
print(frame1)

--------------Series drop item by index:-----------------
a    3
b    4
c    5
d    6
e    7
dtype: int32
a    3
b    4
d    6
e    7
dtype: int32
--------------DataFrame drop item by index :-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
   texas  california
a      1           2
c      4           5
d      7           8
For a DataFrame, index values can be dropped from either axis.
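As a small sketch of dropping from each axis (rows_dropped and cols_dropped are illustrative names), the default axis=0 removes row labels and axis=1 removes column labels. In both cases drop returns a new object:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# axis=0 is the default, so these are row labels
rows_dropped = frame.drop(['a', 'c'])
# axis=1 tells drop to look for the labels among the columns
cols_dropped = frame.drop(['ohio', 'texas'], axis=1)

print(rows_dropped)
print(cols_dropped)
print(frame.shape)  # frame itself is unchanged
```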
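4. Indexing, selection, and filtering
Slicing a Series by label differs from ordinary Python slicing in that the endpoint is inclusive.
Indexing a DataFrame with a column name, or a list of names, retrieves one or more columns.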
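The difference between the two kinds of slice can be sketched as follows. This example uses the explicit loc and iloc accessors to make clear which slice is by label and which is by position; the Series and variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are included
by_label = obj.loc['b':'c']
# Positional slice: follows normal Python rules, position 2 is excluded
by_position = obj.iloc[1:2]

print(by_label.tolist())
print(by_position.tolist())
```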

import numpy as np
from pandas import DataFrame

print("--------------DataFrame drop item by index :-----------------")
frame = DataFrame(np.arange(9).reshape((3, 3)),
                  index=['a', 'c', 'd'],
                  columns=['ohio', 'texas', 'california'])
print(frame)
frame1 = frame.drop(['ohio'], axis=1)
print(frame1)

print("--------------DataFrame filter item by index :-----------------")
# Rows can also be selected with a slice or a boolean array
print(frame['ohio'])              # a single column
print(frame[:2])                  # the first two rows
print(frame[frame['ohio'] >= 3])  # rows where the ohio value is at least 3

print("--------------DataFrame filter item by index :-----------------")
# Label-based indexing on a DataFrame: row labels first, column labels second.
# The original used frame.ix, which was removed in pandas 1.0; loc is the
# label-based replacement.
print(frame.loc['a', ['ohio', 'texas']])

--------------DataFrame drop item by index :-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
   texas  california
a      1           2
c      4           5
d      7           8
--------------DataFrame filter item by index :-----------------
a    0
c    3
d    6
Name: ohio, dtype: int32
   ohio  texas  california
a     0      1           2
c     3      4           5
   ohio  texas  california
c     3      4           5
d     6      7           8
--------------DataFrame filter item by index :-----------------
ohio     0
texas    1
Name: a, dtype: int32
5. Arithmetic and data alignment

from pandas import Series

print("--------------Series addition with index alignment:-----------------")
s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
# Values are matched by label; a label missing from either side gives NaN
print(s1 + s2)

--------------Series addition with index alignment:-----------------
a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
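The NaN entries above belong to labels that exist in only one of the two Series. As a minimal sketch, the add method with fill_value=0 treats the absent side as 0, so every label ends up with a value (total is an illustrative name):

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# A label found in only one operand is treated as 0 in the other
total = s1.add(s2, fill_value=0)
print(total)
```

Whether 0 is a sensible stand-in depends on the data. For a sum it keeps the value from the side that has one.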

import numpy as np
from pandas import DataFrame

print("--------------DataFrame addition with alignment:-----------------")
df1 = DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                index=['ohio', 'texas', 'colorado'])
df2 = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
                index=['utah', 'ohio', 'texas', 'oregon'])
print(df1)
print("--------------------")
print(df2)

# Only cells whose row and column labels exist in both frames get a value
print("-------df1+df2-------------")
print(df1 + df2)

# When the operands are indexed differently, fill_value supplies a value for
# a label that is found in one object but not in the other
print("-------df3-------------")
df3 = df1.add(df2, fill_value=0)
print(df3)

--------------DataFrame addition with alignment:-----------------
          b  c  d
ohio      0  1  2
texas     3  4  5
colorado  6  7  8
--------------------
        b   d   e
utah    0   1   2
ohio    3   4   5
texas   6   7   8
oregon  9  10  11
-------df1+df2-------------
            b   c     d   e
colorado  NaN NaN   NaN NaN
ohio      3.0 NaN   6.0 NaN
oregon    NaN NaN   NaN NaN
texas     9.0 NaN  12.0 NaN
utah      NaN NaN   NaN NaN
-------df3-------------
            b    c     d     e
colorado  6.0  7.0   8.0   NaN
ohio      3.0  1.0   6.0   5.0
oregon    9.0  NaN  10.0  11.0
texas     9.0  4.0  12.0   8.0
utah      0.0  NaN   1.0   2.0
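A few cells in the fill_value result are still NaN. The sketch below (still_missing is an illustrative name) lists them. They are the cells that exist in neither frame, so there is no value on either side for the fill to combine with:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                   index=['ohio', 'texas', 'colorado'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
                   index=['utah', 'ohio', 'texas', 'oregon'])

df3 = df1.add(df2, fill_value=0)

# Collect the (row, column) pairs that are missing from both inputs
still_missing = [(row, col)
                 for row in df3.index
                 for col in df3.columns
                 if pd.isna(df3.loc[row, col])]
print(still_missing)
```

If those cells should also become 0, one option is to chain fillna(0) onto the result.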