Python Revisited <3>: Using the pandas Series Object

Reposted article, originally published 2018-04-16 21:14. Tags: Python, Series, pandas

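pandas is one of the most powerful data analysis and exploration libraries for Python. It is built on NumPy, supports SQL-like create, delete, query, and update operations on structured data, and ships with a rich set of data processing functions. pandas has two main data structures, Series and DataFrame; this article summarizes the common usage of Series.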

Convention:

import pandas as pd

1. What is a Series object?

A Series is essentially a one-dimensional array: a single column of elements, each made up of a value and its associated index label.

2. Creating a Series

A Series is created mainly with the pd.Series constructor. There are two common ways:

(1) From a list

Pass a list to pd.Series. When no index is specified, the default index runs from 0 to N-1.

ser1=pd.Series([11,22,33,44])
ser1
Out[60]:
0    11
1    22
2    33
3    44
dtype: int64

You can also specify the index with the index parameter:

ser2=pd.Series([11,22,33,44],index=['a','b','c','d'])
ser2
Out[61]:
a    11
b    22
c    33
d    44
dtype: int64

(2) From a dict

Pass a dict to pd.Series. The dict's keys become the index and its values become the values.

ser3=pd.Series({'a':11,'d':22,'c':33})
ser3
Out[62]:
a    11
d    22
c    33
dtype: int64
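As a small sketch of how the two creation routes combine (the variable names here are illustrative and not from the original session): when a dict and an explicit index are passed together, pandas keeps only the labels listed in index and fills labels that are missing from the dict with NaN.

```python
import pandas as pd

# Hypothetical data for illustration
data = {'a': 11, 'd': 22, 'c': 33}

# Labels are taken from `index`; 'b' is not a key of the dict, so it becomes NaN,
# and 'd' is dropped because it is not listed in `index`
ser = pd.Series(data, index=['a', 'b', 'c'])
print(ser)
```

Because NaN is a floating-point value, the resulting Series has a float dtype even though the dict holds integers.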

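3. The four main attributes of a Series

A Series has four main attributes: index, values, name, and data type.

(1) Index

a. Viewing the index
Use the index attribute of a Series to view its index; it returns an Index object.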

ser1.index
Out[63]: RangeIndex(start=0, stop=4, step=1)
ser2.index
Out[64]: Index([u'a', u'b', u'c', u'd'], dtype='object')

Index labels may repeat. The is_unique attribute of the Index object tells you whether every label is unique.

ser1.index.is_unique
Out[65]: True
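A minimal sketch of what a non-unique index looks like in practice, using made-up data: selecting a repeated label returns every matching element as a Series, while a unique label returns a single value.

```python
import pandas as pd

# Hypothetical Series in which the label 'a' is used twice
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

print(dup.index.is_unique)  # False, because 'a' appears twice
print(dup['a'])             # a Series holding both 'a' elements
print(dup['b'])             # a single scalar value
```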

b. Modifying the index
An Index object is an immutable array; its individual values cannot be changed in place.

ser1.index[0]=5
Traceback (most recent call last):
  File "<ipython-input-68-2029117c9570>", line 1, in <module>
    ser1.index[0]=5
  File "/usr/local/share/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.py", line 1404, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations

To change the index of a Series, the only option is to point it at a whole new index.

ser1.index=[5,6,7,8]
ser1.index
Out[70]: Int64Index([5, 6, 7, 8], dtype='int64')
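One detail worth illustrating is that the replacement index must have the same length as the Series. The sketch below uses made-up labels and simply records whether pandas rejected a mismatched assignment.

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44])

# Replacing the whole index works when the lengths match
ser.index = ['w', 'x', 'y', 'z']

# A new index of the wrong length is rejected with a ValueError
try:
    ser.index = ['only', 'two']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```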

c. Reindexing
Use the reindex method to rearrange the index.

ser2.reindex(['b','a','c','d'])
Out[73]:
b    22
a    11
c    33
d    44
dtype: int64

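Reindexing produces a new Series; the original object is unchanged.
Reindexing serves three purposes:
① Specify an order for the existing labels, i.e. rearrange the existing elements;
② Leave out an existing label, i.e. remove the corresponding element;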

ser2.reindex(['b','a','d'])
Out[74]:
b    22
a    11
d    44
dtype: int64

③ Add a new label, i.e. add a new element whose value is NaN.

ser2.reindex(['b','a','e','c','d'])
Out[75]:
b    22.0
a    11.0
e     NaN
c    33.0
d    44.0
dtype: float64
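The switch to float64 above happens because NaN is a float. As a hedged sketch, the fill_value parameter of reindex can supply a different value for new labels, which lets an integer Series stay integer (the choice of 0 here is only an example):

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

# Default: the new label 'e' gets NaN, which forces a float dtype
with_nan = ser.reindex(['a', 'e'])

# fill_value=0 supplies a value for new labels, so the integer dtype survives
with_zero = ser.reindex(['a', 'e'], fill_value=0)

print(with_nan.dtype, with_zero.dtype)
```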

d. Sorting by index
Use the sort_index method to sort by the existing index in ascending or descending order.

ser3.sort_index()
Out[80]:
a    11
c    33
d    22
dtype: int64

By default the sort is ascending by index label. Sorting produces a new Series; the original object is unchanged.
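For the descending case, a minimal sketch (the Series is built from a list here so that the starting order is explicit):

```python
import pandas as pd

ser = pd.Series([11, 22, 33], index=['a', 'd', 'c'])

# ascending=False sorts the labels in descending order
desc = ser.sort_index(ascending=False)
print(desc)
```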
e. Checking whether an index label exists
The in operator on a Series tests whether an index label exists, not whether a value exists.

'a' in ser3
Out[110]: True
11 in ser3
Out[111]: False

(2) Values

a. Viewing the values
Use the values attribute of a Series to view its values; it returns a NumPy array.

ser1.values
Out[81]: array([11, 22, 33, 44])

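b. Modifying the values
You can change the values of a Series by writing directly into the array returned by the values attribute; this modifies the original object in place. This relies on values returning a writable view of the data, which newer pandas versions with Copy-on-Write enabled may not provide; assigning through ser1.iloc[1] = 23 works either way.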

ser1.values[1]=23
ser1
Out[83]:
5    11
6    23
7    33
8    44
dtype: int64
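As an alternative that does not depend on the array returned by values being writable, here is a minimal sketch that assigns through the .iloc (position) and .loc (label) indexers:

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=[5, 6, 7, 8])

ser.iloc[1] = 23   # by integer position: the second element
ser.loc[8] = 45    # by index label: the element labelled 8

print(ser)
```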

c. Sorting by value
Use the sort_values method to sort by value in ascending or descending order.

ser3.sort_values()
Out[84]:
a    11
d    22
c    33
dtype: int64

By default the sort is ascending by value. Sorting produces a new Series; the original object is unchanged.
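A short sketch of two sort_values options, using made-up data that contains a missing value: ascending=False reverses the order, and na_position controls where missing values are placed.

```python
import numpy as np
import pandas as pd

ser = pd.Series([33, np.nan, 11], index=['x', 'y', 'z'])

# Descending order; missing values still go last by default
desc = ser.sort_values(ascending=False)

# na_position='first' moves missing values to the front instead
nan_first = ser.sort_values(na_position='first')

print(desc)
print(nan_first)
```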
d. Ranking the values
Use the rank method to get the rank of each element's value.

ser2.rank()
Out[145]:
a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64

Ranking is ascending by default; tied values receive the mean of the ranks they span.
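To make the tie handling concrete, a small sketch with a made-up Series that contains a tie. The method and ascending parameters shown are standard options of Series.rank.

```python
import pandas as pd

ser = pd.Series([11, 22, 22, 44], index=['a', 'b', 'c', 'd'])

avg_rank = ser.rank()                  # ties share the mean of ranks 2 and 3
min_rank = ser.rank(method='min')      # ties both take the lowest rank, 2
desc_rank = ser.rank(ascending=False)  # the largest value gets rank 1

print(avg_rank.tolist())   # [1.0, 2.5, 2.5, 4.0]
print(min_rank.tolist())   # [1.0, 2.0, 2.0, 4.0]
print(desc_rank.tolist())  # [4.0, 2.5, 2.5, 1.0]
```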
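e. Checking whether a value exists
Use the isin method; it takes a list and returns a boolean Series. (ser6 is created in section 4 (6). isin tests the values, so the label 'a' matches nothing here.)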

ser6.isin(['a'])
Out[164]:
a    False
b    False
d    False
e    False
dtype: bool

(3) Name

A Series has a name, available through its name attribute.
The index of a Series also has a name, available through the name attribute of the Index object.
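The original gives no example for names, so here is a minimal sketch; 'score' and 'key' are made-up names chosen for illustration.

```python
import pandas as pd

# The name can be set at construction time or assigned later
ser = pd.Series([11, 22, 33], index=['a', 'b', 'c'], name='score')
ser.index.name = 'key'

print(ser.name)        # score
print(ser.index.name)  # key
```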

(4) Data type

Available through the dtype attribute of a Series.

ser2.dtype
Out[146]: dtype('int64')

4. Element operations

(1) Selecting elements

Selecting a single element:
a. By index label

ser2['b']
Out[90]: 22

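b. By integer position (on recent pandas versions, ser2.iloc[1] is the explicit positional form; using an integer key with ser2[...] on a non-integer index is deprecated)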

ser2[1]
Out[91]: 22

Selecting multiple elements:
a. By a list of index labels

ser2[['a','c']]
Out[93]:
a    11
c    33
dtype: int64

b. By a slice of index labels

ser2['a':'d']
Out[94]:
a    11
b    22
c    33
d    44
dtype: int64

c. By a slice of integer positions

ser2[0:3]
Out[92]:
a    11
b    22
c    33
dtype: int64

Note: b and c differ in that a label slice includes the element at the right endpoint, while a positional slice excludes it.
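The same distinction can be written with the explicit .loc (label) and .iloc (position) indexers. This is a minimal sketch rather than the author's original session:

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

by_label = ser.loc['a':'c']   # label slice: includes 'c'
by_position = ser.iloc[0:2]   # positional slice: excludes position 2

print(by_label)
print(by_position)
print(ser.loc['b'], ser.iloc[1])  # the same element, reached two ways
```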

(2) Filtering elements

You can filter directly with a comparison condition on the values.

ser2[ser2>30]
Out[95]:
c    33
d    44
dtype: int64
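Conditions can also be combined. A minimal sketch, with thresholds chosen only for illustration:

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

# Each condition needs its own parentheses;
# use & for "and", | for "or", and ~ for "not"
in_range = (ser > 20) & (ser < 40)
middle = ser[in_range]
outer = ser[~in_range]

print(middle)
print(outer)
```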

(3) Adding elements

a. By assignment

ser2['e']=55
ser2
Out[97]:
a    11
b    22
c    33
d    44
e    55
dtype: int64

b. By reindexing (note that reindex produces a new object and does not modify the original)

ser2=ser2.reindex(['a','c','f'])
ser2
Out[100]:
a    11.0
c    33.0
f     NaN
dtype: float64

(4) Deleting elements

Use the drop method. drop produces a new object and does not modify the original.

ser2=ser2.drop('f')
ser2
Out[106]:
a    11.0
c    33.0
dtype: float64
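Two further drop variants, sketched on made-up data: passing a list removes several labels at once, and errors='ignore' skips labels that do not exist instead of raising KeyError.

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

# Pass a list to drop several labels at once
fewer = ser.drop(['a', 'd'])

# 'zzz' is not in the index; errors='ignore' returns the data unchanged
same = ser.drop('zzz', errors='ignore')

print(fewer)
```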

(5) Arithmetic

A Series supports arithmetic directly; the operation is applied to every element.

ser2+2
Out[107]:
a    13.0
c    35.0
dtype: float64
ser2*2
Out[108]:
a    22.0
c    66.0
dtype: float64

(6) Getting unique values

Use the unique method to get the distinct values of the elements.

ser6=pd.Series([11,22,44,22],index=['a','b','d','e'])
ser6
Out[159]:
a    11
b    22
d    44
e    22
dtype: int64
ser6.unique()
Out[160]: array([11, 22, 44])

Use the value_counts method to get the frequency of each distinct value.

ser6.value_counts()
Out[161]:
22    2
11    1
44    1
dtype: int64
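If relative frequencies are more useful than raw counts, value_counts accepts normalize=True. A minimal sketch on the same data as ser6:

```python
import pandas as pd

ser = pd.Series([11, 22, 44, 22], index=['a', 'b', 'd', 'e'])

# normalize=True returns the share of each value instead of its count
share = ser.value_counts(normalize=True)
print(share)
```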

(7) Checking whether an element exists

a. Using in
The in operator tests whether an index label exists, not whether a value exists.

'a' in ser3
Out[110]: True
11 in ser3
Out[111]: False

b. Using isin
The isin method takes a list and returns a boolean Series. It tests the values, so the label 'a' matches nothing here.

ser6.isin(['a'])
Out[164]:
a    False
b    False
d    False
e    False
dtype: bool

(8) Checking for missing values

Use the isnull or notnull method to test each element for a missing value.

ser3.isnull()
Out[114]:
a    False
c    False
d    False
dtype: bool
ser3.notnull()
Out[115]:
a    True
c    True
d    True
dtype: bool

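(9) Handling missing values

There are two main ways to handle missing values: filling and filtering.
a. Filling
Use the fillna method to fill missing values. It produces a new object and does not modify the original.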

ser2=ser2.reindex(['a','c','h'])
ser2=ser2.fillna(99)
ser2
Out[125]:
a    11.0
c    33.0
h    99.0
dtype: float64

b. Filtering
Use the dropna method to filter out missing values. It produces a new object and does not modify the original.

ser6=ser6.reindex(['a','b','d','f'])
ser6
Out[168]:
a    11.0
b    22.0
d    44.0
f     NaN
dtype: float64
ser6.dropna()
Out[169]:
a    11.0
b    22.0
d    44.0
dtype: float64
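A common variation, sketched here with made-up data, is to count the missing values first and then fill them with a statistic such as the mean rather than a fixed constant:

```python
import numpy as np
import pandas as pd

ser = pd.Series([11.0, np.nan, 44.0], index=['a', 'b', 'c'])

n_missing = ser.isnull().sum()   # number of missing values
filled = ser.fillna(ser.mean())  # mean() skips NaN, so this fills with (11 + 44) / 2

print(n_missing)
print(filled)
```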

(10) Filtering duplicate values

The duplicated method returns a boolean Series that marks which elements repeat an earlier value.

ser7=pd.Series([11,22,44,22,11],index=['a','b','d','e','h'])
ser7
Out[173]:
a    11
b    22
d    44
e    22
h    11
dtype: int64
ser7.duplicated()
Out[174]:
a    False
b    False
d    False
e     True
h     True
dtype: bool

Use the drop_duplicates method to filter out the duplicates. It does not modify the original object; it produces a new Series without duplicate values.

ser7.drop_duplicates()
Out[175]:
a    11
b    22
d    44
dtype: int64
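By default the first occurrence of each value is the one that is kept. The keep parameter changes that; a minimal sketch on the same data as ser7:

```python
import pandas as pd

ser7 = pd.Series([11, 22, 44, 22, 11], index=['a', 'b', 'd', 'e', 'h'])

# keep='last' keeps the final occurrence of each value
last_kept = ser7.drop_duplicates(keep='last')

# keep=False drops every value that occurs more than once
singletons = ser7.drop_duplicates(keep=False)

print(last_kept)
print(singletons)
```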

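(11) Replacing values

Use the replace method to replace specific values. The first argument is the old value and the second is the new value. It does not modify the original object; it produces a new one.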

ser7
Out[177]:
a    11
b    22
d    44
e    22
h    11
dtype: int64
ser7.replace(44,55)
Out[178]:
a    11
b    22
d    55
e    22
h    11
dtype: int64

To replace several values with the same new value in one call, pass the old values as a list.

ser7.replace([44,11],55)
Out[180]:
a    55
b    22
d    55
e    22
h    55
dtype: int64

To replace several values with different new values in one call, pass a dict that maps each old value to its new value.

ser7.replace({44:55,11:66})
Out[182]:
a    66
b    22
d    55
e    22
h    66
dtype: int64

(12) Summary statistics

Common statistical methods: sum (total), mean (average), cumsum (cumulative sum).

ser7.sum()
Out[183]: 110
ser7.mean()
Out[184]: 22.0
ser7.cumsum()
Out[185]:
a     11
b     33
d     77
e     99
h    110
dtype: int64

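You can also use the describe method to generate descriptive statistics directly.
a. When the elements are numeric, the result includes: count, mean, standard deviation, minimum, percentiles, and maximum.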

ser7.describe()
Out[186]:
count     5.000000
mean     22.000000
std      13.472194
min      11.000000
25%      11.000000
50%      22.000000
75%      22.000000
max      44.000000
dtype: float64

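b. When the elements are categorical (for example strings), the result includes: count, number of unique values, the most frequent value (top), and its frequency (freq).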

ser8=pd.Series({'a':'v1','b':'v2','c':'v3'})
ser8
Out[189]:
a    v1
b    v2
c    v3
dtype: object
ser8.describe()
Out[190]:
count      3
unique     3
top       v3
freq       1
dtype: object

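5. Operations between Series

(1) Arithmetic between Series

Arithmetic between two Series aligns automatically on the index, and the operation is applied between matching elements. A label that appears in only one of the operands yields NaN in the result.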

ser4=pd.Series([11,22,44],index=['a','b','d'])
ser5=pd.Series([11,33,44],index=['a','c','d'])
ser4+ser5
Out[126]:
a    22.0
b     NaN
c     NaN
d    88.0
dtype: float64
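If NaN for unmatched labels is not what you want, the method form of the operator accepts a fill_value. A minimal sketch using the same ser4 and ser5; treating a missing element as 0 is an assumption that suits addition but not every operation:

```python
import pandas as pd

ser4 = pd.Series([11, 22, 44], index=['a', 'b', 'd'])
ser5 = pd.Series([11, 33, 44], index=['a', 'c', 'd'])

# A label missing from one side is treated as 0 instead of producing NaN
total = ser4.add(ser5, fill_value=0)
print(total)
```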

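(2) Concatenating Series

a. Using append
The append method concatenates two Series. It places no requirement on their data types, and index labels may repeat. The result is a new object; the originals are not modified. Note that Series.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions pd.concat([ser4, ser5]) gives the same result.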

ser4.append(ser5)
Out[127]:
a    11
b    22
d    44
a    11
c    33
d    44
dtype: int64
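For pandas versions where append is no longer available, the same concatenation can be sketched with pd.concat. The ignore_index=True variant is an optional extra that replaces the repeated labels with a fresh 0 to N-1 index.

```python
import pandas as pd

ser4 = pd.Series([11, 22, 44], index=['a', 'b', 'd'])
ser5 = pd.Series([11, 33, 44], index=['a', 'c', 'd'])

# Same result as the append call above: labels 'a' and 'd' appear twice
joined = pd.concat([ser4, ser5])

# Discard the original labels and number the elements 0..N-1
renumbered = pd.concat([ser4, ser5], ignore_index=True)

print(joined)
print(renumbered)
```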

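b. Using concat
The concat function concatenates two or more Series. It places no requirement on their data types, and index labels may repeat. The result is a new object; the originals are not modified.
① With the default axis=0, the rows of the Series are stacked one after another.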

ser1
Out[191]:
5    11
6    23
7    33
8    44
dtype: int64
ser2
Out[192]:
a    11.0
c    33.0
f    99.0
dtype: float64
ser3
Out[193]:
a    11
c    33
d    22
dtype: int64
pd.concat([ser1,ser2,ser3])
Out[194]:
5    11.0
6    23.0
7    33.0
8    44.0
a    11.0
c    33.0
f    99.0
a    11.0
c    33.0
d    22.0
dtype: float64

② With axis=1, the Series are combined as columns, producing a DataFrame. Each Series becomes its own column, and the rows are aligned on the index.

pd.concat([ser1,ser2,ser3],axis=1)
Out[195]:
      0     1     2
5  11.0   NaN   NaN
6  23.0   NaN   NaN
7  33.0   NaN   NaN
8  44.0   NaN   NaN
a   NaN  11.0  11.0
c   NaN  33.0  33.0
d   NaN   NaN  22.0
f   NaN  99.0   NaN
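Two optional parameters are often useful with axis=1. The sketch below uses made-up Series and column names: keys names the resulting columns, and join='inner' keeps only the labels that every Series shares.

```python
import pandas as pd

left = pd.Series([11, 22, 44], index=['a', 'b', 'd'])
right = pd.Series([11, 33, 44], index=['a', 'c', 'd'])

# keys= names the resulting columns; 'left' and 'right' are arbitrary names
wide = pd.concat([left, right], axis=1, keys=['left', 'right'])

# join='inner' keeps only the labels present in every Series
common = pd.concat([left, right], axis=1, keys=['left', 'right'], join='inner')

print(wide)
print(common)
```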

6. References and acknowledgments

[1] [Python for Data Analysis (利用Python進行數據分析)](https://book.douban.com/subject/25779298/)
