Python Data Analysis: Using DataFrame in the pandas Library (Part 2)

Reposted article, 2016-08-11 11:02. Tags: python

This section covers the basic ways of working with the data in a Series or a DataFrame.

1. Reindexing

An important method on pandas objects is reindex, which creates a new object whose data conforms to a new index.

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
from pandas import Series

print("--------------obj result:-----------------")
obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)

print("--------------obj2 result:-----------------")
# Labels that are missing from obj (here 'e') get NaN
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj2)

print("--------------obj3 result:-----------------")
# fill_value replaces the NaN for labels that were missing
obj3 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(obj3)

reindex

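# reindex rearranges the values to follow the new index; a label that does not exist in the current index gets a missing value (NaN)
# Pass fill_value=0 to substitute a value for those missing entries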

--------------obj result:-----------------
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
--------------obj2 result:-----------------
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
--------------obj3 result:-----------------
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64

reindex_index
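Two details of reindex are easy to miss: it returns a new object rather than modifying the original, and fill_value applies only to labels that the new index introduces. The following is a minimal sketch of both points; the variable names (reindexed, with_gap, refilled) are made up for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex builds a new Series; obj keeps its original index order.
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(list(obj.index))
print(list(reindexed.index))

# fill_value is used only for labels absent from the original index.
# A NaN that was already stored in the data is left as NaN.
with_gap = pd.Series([1.0, np.nan], index=['x', 'y'])
refilled = with_gap.reindex(['x', 'y', 'z'], fill_value=0)
print(refilled)
```

If existing NaN values should be replaced as well, a separate fillna call on the result is one way to do it.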

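2. Interpolation

For ordered data such as time series, you may want to interpolate or fill in values when reindexing. The method option does this:

Options for the method parameter
Option	Description
ffill or pad	Fill forward (carry the previous value forward)
bfill or backfill	Fill backward (carry the next value backward)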

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
from pandas import Series

print("--------------obj3 result:-----------------")
obj3 = Series(['blue', 'red', 'yellow'], index=[0, 2, 4])
print(obj3)

print("--------------obj4 result:-----------------")
# Fill the new labels 1, 3 and 5 with the value that precedes them
obj4 = obj3.reindex(range(6), method='ffill')
print(obj4)

ffill: forward fill

--------------obj3 result:-----------------
0      blue
2       red
4    yellow
dtype: object
--------------obj4 result:-----------------
0      blue
1      blue
2       red
3       red
4    yellow
5    yellow
dtype: object

ffill result
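The listing above only demonstrates ffill. For comparison, here is a small sketch of bfill on the same data; the names colors, forward and backward are illustrative.

```python
import pandas as pd

colors = pd.Series(['blue', 'red', 'yellow'], index=[0, 2, 4])

# Forward fill: each new label takes the last known value before it.
forward = colors.reindex(range(6), method='ffill')

# Backward fill: each new label takes the next known value after it.
# Label 5 has no later value to borrow from, so it stays missing.
backward = colors.reindex(range(6), method='bfill')

print(forward.tolist())
print(backward.tolist())
```

Both methods expect the original index to be sorted (monotonic), which holds for the index 0, 2, 4 used here.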

For a DataFrame, reindex can alter the row index, the columns, or both. When it is passed only a single sequence, it reindexes the rows:

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
import numpy as np
from pandas import DataFrame

print("--------------frame result:-----------------")
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['ohio', 'texas', 'california'])
print(frame)

print("--------------frame2 result:-----------------")
# A single sequence reindexes the rows
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
print(frame2)

print("--------------frame3 result:-----------------")
# The columns keyword reindexes the columns
frame3 = frame.reindex(columns=['texas', 'utah', 'california'])
print(frame3)

print("--------------frame4 result:-----------------")
# Reindex rows and columns together.
# The original used frame.ix[...], which has since been removed from pandas.
frame4 = frame.reindex(index=['a', 'b', 'c', 'd'], columns=['texas', 'utah', 'california'])
print(frame4)

reindex_dataframe

--------------frame result:-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
--------------frame2 result:-----------------
   ohio  texas  california
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
--------------frame3 result:-----------------
   texas  utah  california
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
--------------frame4 result:-----------------
   texas  utah  california
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0

reindex result
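Rows and columns can be reindexed in a single call by passing both index and columns. The sketch below assumes a recent pandas release, where a label lookup with loc raises KeyError for labels that do not exist, so reindex is the tool for introducing new labels. The names both and loc_failed are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

rows = ['a', 'b', 'c', 'd']
cols = ['texas', 'utah', 'california']

# New labels ('b' and 'utah') are created and filled with NaN.
both = frame.reindex(index=rows, columns=cols)
print(both)

# Introducing NaN turns the integer columns into float columns.
print(both.dtypes)

# loc selects existing labels only; on recent pandas versions
# the unknown labels trigger a KeyError instead of producing NaN.
try:
    frame.loc[rows, cols]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)
```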

3. Dropping entries from an axis

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
import numpy as np
from pandas import Series, DataFrame

print("--------------Series drop item by index:-----------------")
obj = Series(np.arange(3, 8), index=['a', 'b', 'c', 'd', 'e'])
print(obj)

# drop returns a new Series without the given label
obj1 = obj.drop('c')
print(obj1)

print("--------------DataFrame drop item by index :-----------------")
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['ohio', 'texas', 'california'])
print(frame)

# axis=1 drops a column instead of a row
frame1 = frame.drop(['ohio'], axis=1)
print(frame1)

Dropping entries from an axis

--------------Series drop item by index:-----------------
a    3
b    4
c    5
d    6
e    7
dtype: int32
a    3
b    4
d    6
e    7
dtype: int32
--------------DataFrame drop item by index :-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
   texas  california
a      1           2
c      4           5
d      7           8

drop_item

# For a DataFrame, you can drop index values from either axis
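Like reindex, drop returns a new object and leaves the original alone. Here is a short sketch of dropping along each axis; it uses the index and columns keywords that recent pandas versions accept as an alternative to axis, and the result names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# Rows are the default axis (axis=0).
without_row = frame.drop('c')

# Several labels can be dropped at once; index= and columns= name the axis.
without_rows = frame.drop(index=['a', 'd'])
without_cols = frame.drop(columns=['ohio', 'texas'])

print(without_row)
print(without_rows)
print(without_cols)
print(frame.shape)  # the original frame still has all rows and columns
```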

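4. Indexing, selection, and filtering

Slicing a Series by label differs from ordinary Python slicing: the endpoint is included.

Indexing into a DataFrame retrieves one or more columns.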
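A short sketch of the difference between label slices and positional slices, plus column selection on a DataFrame. It uses the loc and iloc accessors to make the intent explicit; the sample data mirrors the other examples in this post and the result names are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(3, 8), index=['a', 'b', 'c', 'd', 'e'])

by_label = obj.loc['b':'d']    # labels b, c, d: the end label is included
by_position = obj.iloc[1:3]    # positions 1 and 2: the end position is excluded
print(by_label)
print(by_position)

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

one_column = frame['texas']              # a single name gives a Series
two_columns = frame[['ohio', 'texas']]   # a list of names gives a DataFrame
print(one_column)
print(two_columns)
```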

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
import numpy as np
from pandas import DataFrame

print("--------------DataFrame drop item by index :-----------------")
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['ohio', 'texas', 'california'])
print(frame)

frame1 = frame.drop(['ohio'], axis=1)
print(frame1)

print("--------------DataFrame filter item by index :-----------------")
# Select a column by name, rows by slice, or rows with a boolean condition
print(frame['ohio'])
print(frame[:2])
print(frame[frame['ohio'] >= 3])

print("--------------DataFrame filter item by index :-----------------")
# Label-based indexing on a DataFrame: row labels come first, then column labels.
# The original used frame.ix, which has since been removed from pandas; loc is the label-based replacement.
print(frame.loc['a', ['ohio', 'texas']])

Indexing, selection, and filtering

--------------DataFrame drop item by index :-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
   texas  california
a      1           2
c      4           5
d      7           8
--------------DataFrame filter item by index :-----------------
a    0
c    3
d    6
Name: ohio, dtype: int32
   ohio  texas  california
a     0      1           2
c     3      4           5
   ohio  texas  california
c     3      4           5
d     6      7           8
--------------DataFrame filter item by index :-----------------
ohio     0
texas    1
Name: a, dtype: int32

Result
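Current pandas versions split label-based and position-based selection between loc and iloc. Here is a sketch of the same row and column selection written both ways, along with a boolean filter combined with a column list; the result names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# Row label first, then column labels.
by_label = frame.loc['a', ['ohio', 'texas']]

# The same cells addressed by integer position.
by_position = frame.iloc[0, [0, 1]]

# A boolean condition picks the rows; the list picks the columns.
filtered = frame.loc[frame['ohio'] >= 3, ['texas', 'california']]

print(by_label)
print(by_position)
print(filtered)
```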

5. Arithmetic and data alignment

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
from pandas import Series

print("--------------Series addition :-----------------")
s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
# Values are aligned by index label; a label found in only one Series yields NaN
print(s1 + s2)

Arithmetic and data alignment

--------------Series addition :-----------------
a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

Result
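The + operator has no way to say what should happen for labels found in only one Series. As a sketch, the add method with fill_value treats the missing side as a chosen value (0 here) instead of producing NaN; the names plain and filled are illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Plain addition: labels d, f and g exist on one side only, so they become NaN.
plain = s1 + s2

# add with fill_value=0: the missing side counts as 0, so the existing value survives.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```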

'''
Created on 2016-8-10
@author: xuzhengzhu
'''
import numpy as np
from pandas import DataFrame

print("--------------DataFrame addition :-----------------")
df1 = DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'), index=['ohio', 'texas', 'colorado'])
df2 = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'), index=['utah', 'ohio', 'texas', 'oregon'])

print(df1)
print("--------------------")

print(df2)

# Only cells whose row and column labels exist in both frames get a value; the rest are NaN
print("-------df1+df2-------------")
print(df1 + df2)

# When operating on differently indexed objects, fill_value supplies a stand-in value
# wherever an axis label is found in one object but not in the other
print("-------df3-------------")
df3 = df1.add(df2, fill_value=0)
print(df3)

Alignment operations

--------------DataFrame addition :-----------------
          b  c  d
ohio      0  1  2
texas     3  4  5
colorado  6  7  8
--------------------
        b   d   e
utah    0   1   2
ohio    3   4   5
texas   6   7   8
oregon  9  10  11
-------df1+df2-------------
            b   c     d   e
colorado  NaN NaN   NaN NaN
ohio      3.0 NaN   6.0 NaN
oregon    NaN NaN   NaN NaN
texas     9.0 NaN  12.0 NaN
utah      NaN NaN   NaN NaN
-------df3-------------
            b    c     d     e
colorado  6.0  7.0   8.0   NaN
ohio      3.0  1.0   6.0   5.0
oregon    9.0  NaN  10.0  11.0
texas     9.0  4.0  12.0   8.0
utah      0.0  NaN   1.0   2.0

Result
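Note that df3 still contains NaN in a few cells. fill_value stands in for a cell that is missing from one of the two frames; where a cell is missing from both (for example, row colorado exists only in df1 and column e only in df2), there is nothing to add and the result stays NaN. The sketch below counts those cells and, as one possible follow-up, replaces them with fillna; the names still_missing and complete are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape((3, 3)),
                   columns=list('bcd'),
                   index=['ohio', 'texas', 'colorado'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   columns=list('bde'),
                   index=['utah', 'ohio', 'texas', 'oregon'])

df3 = df1.add(df2, fill_value=0)

# Cells that exist in neither frame remain NaN even with fill_value.
still_missing = int(df3.isna().sum().sum())
print(still_missing)

# One option (an assumption about what the caller wants): treat them as 0 too.
complete = df3.fillna(0)
print(complete)
```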
