Data Analysis with Pandas (2): Getting Started with Series

Reposted article · 2016-10-20 10:18 · Data Analysis

The basic data structures in pandas are Series and DataFrame. A Series is one-dimensional; a DataFrame is two-dimensional.

First, import pandas and NumPy:

from pandas import Series, DataFrame
import pandas as pd
import numpy as np

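A Series is a one-dimensional, array-like object made up of a sequence of values (of any NumPy data type) and an associated set of data labels, called its index.
The most basic way to create a Series is: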

s = pd.Series(data, index=index)

Here, data can be any of several different types:

a Python dict
an ndarray
a Python list
a scalar value
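
As a minimal sketch of the four input types listed above, the following builds one Series from each. The variable names and sample values are illustrative only and do not come from the original article.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From an ndarray: an explicit index must have the same length as the array.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["x", "y", "z"])

# From a list with no index: pandas supplies the labels 0, 1, 2, ...
from_list = pd.Series([10, 20, 30])

# From a scalar: the value is repeated once for every label in the index.
from_scalar = pd.Series(5, index=["p", "q", "r"])

print(from_dict, from_array, from_list, from_scalar, sep="\n\n")
```

Note that the scalar form only makes sense together with an index, since the index determines how many times the value is repeated.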

The simplest Series is formed from just a sequence of values. If you pass a list without specifying an index, pandas creates a default integer index:

In [2]: obj = Series([4, 7, -5, 3])
In [3]: obj
Out[3]:
0    4
1    7
2   -5
3    3
dtype: int64

You can get the array representation and the index object of a Series through its values and index attributes:

In [4]: obj.values
Out[4]: array([ 4,  7, -5,  3], dtype=int64)

In [5]: obj.index
Out[5]: RangeIndex(start=0, stop=4, step=1)
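
The two attributes can also be inspected from a plain script rather than an interactive session. This is a small sketch that reuses the obj data from above; the names values and index are chosen here for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

values = obj.values   # the underlying data as a NumPy ndarray
index = obj.index     # the labels as a pandas Index (a RangeIndex here)

print(type(values).__name__, values.tolist())
print(type(index).__name__, list(index))
```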

Often you will want to create a Series with an index that labels each data point:

In [7]: obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

In [8]: obj2
Out[8]:
d    4
b    7
a   -5
c    3
dtype: int64

In [9]: obj2.index
Out[9]: Index([u'd', u'b', u'a', u'c'], dtype='object')

Unlike with a plain NumPy array, you can use the index labels to select a single value or a set of values from a Series:

In [10]: obj2['a']
Out[10]: -5

In [11]: obj2['d'] = 6

In [12]: obj2[['c', 'a', 'd']]
Out[12]:
c    3
a   -5
d    6
dtype: int64
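
The same selections can be written as a standalone script. The sketch below also uses .loc, the explicit label-based indexer, which is not used in the original text but returns the same result for label lookups.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

single = obj2["a"]               # one label returns a scalar
obj2["d"] = 6                    # assigning by label changes the value in place
subset = obj2[["c", "a", "d"]]   # a list of labels returns a new Series in that order

# .loc makes the label-based lookup explicit and gives the same value
same_as_single = obj2.loc["a"]

print(single, same_as_single)
print(subset)
```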

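NumPy array operations (such as filtering with a boolean array, scalar multiplication, or applying math functions) all preserve the link between index and value: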

In [13]: obj2
Out[13]:
d    6
b    7
a   -5
c    3
dtype: int64

In [14]: obj2[obj2 > 0]
Out[14]:
d    6
b    7
c    3
dtype: int64

In [15]: obj * 2
Out[15]:
0     8
1    14
2   -10
3     6
dtype: int64

In [18]: np.exp(obj2)
Out[18]:
d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
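
To make the point about preserved labels concrete, here is a short sketch that applies all three kinds of operation to the same Series and checks which labels survive. The result names are illustrative.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

positive = obj2[obj2 > 0]   # boolean filtering keeps the labels of the surviving values
doubled = obj2 * 2          # scalar arithmetic keeps every label
exped = np.exp(obj2)        # a NumPy ufunc returns a Series with the same index

print(list(positive.index))             # ['d', 'b', 'c']
print(doubled["a"])                     # -10
print(exped.index.equals(obj2.index))   # True
```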

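You can also think of a Series as a fixed-length, ordered dict, since it maps index values to data values. It can be used in many functions that expect a dict argument: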

In [19]: 'b' in obj2
Out[19]: True

In [20]: 'e' in obj2
Out[20]: False
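
A few more dict-like operations are sketched below. Series.get and Series.to_dict are not shown in the original text; they are included here only to illustrate the dict analogy.

```python
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

has_b = "b" in obj2              # membership tests the index labels
has_7 = 7 in obj2                # False: 7 is a value, not a label
fallback = obj2.get("e", 0)      # like dict.get, returns the default for an absent label
as_dict = obj2.to_dict()         # convert back to a plain Python dict

print(has_b, has_7, fallback)
print(as_dict)
```

As with a dict, the in operator looks only at the keys (the index), so testing for a value requires checking obj2.values instead.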

If your data is stored in a Python dict, you can create a Series directly from that dict:

In [21]: sdata = {'Ohio':3500, 'Texas':7100, 'Oregon':1600, 'Utah':5000}

In [22]: obj3 = Series(sdata)

In [23]: obj3
Out[23]:
Ohio      3500
Oregon    1600
Texas     7100
Utah      5000
dtype: int64

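If you pass only a dict, the index of the resulting Series is made up of the dict's keys. In the output above they appear in sorted order; newer pandas versions keep the dict's insertion order instead. You can override this by also passing an index that lists the labels you want, in the order you want: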

In [24]: states = ['California', 'Ohio', 'Oregon', 'Texas']

In [25]: obj4 = Series(sdata, index=states)

In [26]: obj4
Out[26]:
California       NaN
Ohio          3500.0
Oregon        1600.0
Texas         7100.0
dtype: float64
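
The following sketch repeats the construction and compares it with building the Series from the dict first and then calling reindex. That the two routes give the same values is an observation about this small example, not a claim about how pandas implements the constructor.

```python
import pandas as pd

sdata = {"Ohio": 3500, "Texas": 7100, "Oregon": 1600, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

obj4 = pd.Series(sdata, index=states)        # select and order entries by label
reindexed = pd.Series(sdata).reindex(states) # two-step route to the same labels

print(obj4)
print(reindexed)
print(obj4.dtype)   # float64, because NaN is a floating-point value
```

"Utah" is dropped because it is not in states, and the integers are converted to floats so that the missing "California" entry can hold NaN.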

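In this example, the three values in sdata whose keys match the states index are found and placed in the corresponding positions. Since sdata has no value for "California", that entry is NaN ("not a number"), which pandas uses to mark missing or NA values. The pandas functions isnull and notnull can be used to detect missing data: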

In [29]: pd.isnull(obj4)
Out[29]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [30]: pd.notnull(obj4)
Out[30]:
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

Series also provides these as instance methods:

In [31]: obj4.isnull()
Out[31]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [32]: obj4.notnull()
Out[32]:
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
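
Because isnull and notnull return boolean Series, their results can be summed or used as a filter. This is a minimal sketch with the same data; the names n_missing and present are illustrative.

```python
import numpy as np
import pandas as pd

obj4 = pd.Series([np.nan, 3500.0, 1600.0, 7100.0],
                 index=["California", "Ohio", "Oregon", "Texas"])

n_missing = obj4.isnull().sum()   # True counts as 1, so this counts the NaN entries
present = obj4[obj4.notnull()]    # the boolean mask keeps only the non-missing entries

print(n_missing)   # 1
print(present)
```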

For many applications, the most important feature of Series is that it automatically aligns data by index label in arithmetic operations:

In [33]: obj3
Out[33]:
Ohio      3500
Oregon    1600
Texas     7100
Utah      5000
dtype: int64

In [34]: obj4
Out[34]:
California       NaN
Ohio          3500.0
Oregon        1600.0
Texas         7100.0
dtype: float64

In [35]: obj3 + obj4
Out[35]:
California        NaN
Ohio           7000.0
Oregon         3200.0
Texas         14200.0
Utah              NaN
dtype: float64
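
The sketch below reproduces the aligned addition and contrasts it with Series.add and its fill_value parameter, which the original text does not cover. It is included only to show one way of handling labels that exist on just one side.

```python
import pandas as pd

sdata = {"Ohio": 3500, "Texas": 7100, "Oregon": 1600, "Utah": 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# The + operator aligns on the union of labels; a label missing on either side gives NaN.
total = obj3 + obj4

# add(fill_value=0) treats a value missing on one side as 0.
# "California" stays NaN because it is missing on both sides.
filled = obj3.add(obj4, fill_value=0)

print(total)
print(filled)
```

With fill_value=0, "Utah" becomes 5000.0 instead of NaN, while "California" remains NaN in both results.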

Both the Series object itself and its index have a name attribute, which ties in closely with other key areas of pandas functionality:

In [36]: obj4.name = 'population'

In [37]: obj4.index.name = 'state'

In [38]: obj4
Out[38]:
state
California       NaN
Ohio          3500.0
Oregon        1600.0
Texas         7100.0
Name: population, dtype: float64
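
One place where the two names show up is when a Series is converted to a DataFrame. The following sketch uses to_frame and reset_index, which are not part of the original text, to illustrate where each name ends up.

```python
import numpy as np
import pandas as pd

obj4 = pd.Series([np.nan, 3500.0, 1600.0, 7100.0],
                 index=["California", "Ohio", "Oregon", "Texas"])
obj4.name = "population"
obj4.index.name = "state"

frame = obj4.to_frame()     # the Series name becomes the column name
flat = obj4.reset_index()   # the index name becomes the name of a regular column

print(list(frame.columns))  # ['population']
print(list(flat.columns))   # ['state', 'population']
```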

The index of a Series can be modified in place by assignment:

In [39]: obj
Out[39]:
0    4
1    7
2   -5
3    3
dtype: int64

In [40]: obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

In [41]: obj
Out[41]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
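
Two details are worth checking when assigning an index: the new labels must match the length of the Series, and there is a non-mutating alternative. The sketch below illustrates both; Series.rename is not mentioned in the original text and is shown only as one option.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj.index = ["Bob", "Steve", "Jeff", "Ryan"]   # replaces all labels in place

# Assigning the wrong number of labels raises a ValueError.
try:
    obj.index = ["only", "three", "labels"]
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

# rename returns a new Series with selected labels changed; obj itself is untouched.
renamed = obj.rename({"Bob": "Robert"})

print(error_message)
print(list(obj.index))
print(list(renamed.index))
```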
