Python Data Analysis with pandas: The Basic Data Structures Series and DataFrame

Reposted article, 2019-08-30 17:05. Tags: python, data analysis

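1 Introduction

This article summarizes the two most commonly used data structures in pandas:

(1) Series: a one-dimensional array object with labels.

(2) DataFrame: a two-dimensional structure that acts as a container of Series.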

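2 Series

2.1 Structure of a Series

A Series object consists of two parts:

Values (values): the elements of the one-dimensional array, stored as an ndarray.
Index (index): the labels that correspond one-to-one with the values. The index makes it easy to look up values in a Series.

As shown below, we create a Series from a list. The first column of the output is the index and the second column holds the values. Because no index is given, pandas generates integer labels starting from 0.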

>>> import pandas as pd
>>> a = pd.Series([102, 212, 332, 434])
>>> a
0 102
1 212
2 332
3 434
dtype: int64

You can also specify the index explicitly at creation time:

>>> a = pd.Series([102, 212, 332, 434], index=['第一列', '第二列', '第三列', '第四列'])
>>> a
第一列 102
第二列 212
第三列 332
第四列 434
dtype: int64

With these labels, looking up values becomes more convenient:

>>> a['第一列']
102
>>> a[['第一列', '第二列']]
第一列 102
第二列 212
dtype: int64

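Of course, you can still select values by integer position. When the index is not made of integers, use .iloc for this, because plain a[0] is deprecated for positional access in recent pandas versions:

>>> a.iloc[0]
102
>>> a.iloc[[0, 1]]
第一列 102
第二列 212
dtype: int64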
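
Label-based and position-based lookups also differ in how slices behave. The following is a minimal sketch that reuses the Series `a` from above; the variable names `by_label` and `by_position` are only illustrative. A label slice with .loc includes its end label, while a positional slice with .iloc excludes its end position, as ordinary Python slicing does.

```python
import pandas as pd

a = pd.Series([102, 212, 332, 434], index=['第一列', '第二列', '第三列', '第四列'])

# Label slice: both endpoints are included, so this returns three elements
by_label = a.loc['第一列':'第三列']

# Positional slice: the end position is excluded, so this returns two elements
by_position = a.iloc[0:2]

print(len(by_label), len(by_position))
```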

2.2 Creating a Series

(1) From a list or tuple

>>> pd.Series([123, 321, 345, 543]) # pass in a list
0 123
1 321
2 345
3 543
dtype: int64
>>> pd.Series((123, 321, 345, 543)) # pass in a tuple
0 123
1 321
2 345
3 543
dtype: int64

(2) From a one-dimensional numpy array

>>> import numpy as np
>>> n = np.arange(3) # create a one-dimensional numpy array
>>> pd.Series(n)
0 0
1 1
2 2
dtype: int32

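Note: the numpy array passed in must be one-dimensional; otherwise an error is raised (the exact exception type depends on the pandas version).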

>>> n = np.arange(6).reshape((2,3))
>>> pd.Series(n)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
……
packages\pandas\core\internals\construction.py", line 729, in sanitize_array
raise Exception("Data must be 1-dimensional")
Exception: Data must be 1-dimensional
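
If the data is two-dimensional to begin with, it has to be reduced to one dimension before it is passed to Series. The sketch below shows two possible ways to do that; which one fits depends on what the rows mean, and the names `flat` and `rows` are only illustrative.

```python
import numpy as np
import pandas as pd

n = np.arange(6).reshape((2, 3))

# Option 1: flatten the array into a single 1-D array, then build one Series
flat = pd.Series(n.ravel())

# Option 2: build one Series per row of the 2-D array
rows = [pd.Series(row) for row in n]

print(flat.tolist())
print([r.tolist() for r in rows])
```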

(3) From a dictionary

When a Series is created from a dictionary, the keys of the dictionary automatically become the index of the Series:

>>> pd.Series({'name':'張三', 'age':40, 'weight':140})
name 張三
age 40
weight 140
dtype: object
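
A dictionary can also be combined with an explicit index. In the sketch below, the `height` label is a hypothetical key added only for illustration: labels listed in index are looked up in the dictionary, keys that are not listed are dropped, and labels with no matching key get NaN.

```python
import pandas as pd

person = {'name': '張三', 'age': 40, 'weight': 140}

# Only 'age' and 'name' exist in the dictionary; 'height' has no matching key
s = pd.Series(person, index=['age', 'name', 'height'])

print(s)
```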

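(4) From a scalar value

When a scalar is passed in together with an index, the Series takes its length from the index and repeats the scalar for every label (without an index, the result has a single element):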

>>> a = pd.Series(10, index=['a', 'b', 'c', 'd'])
>>> a
a 10
b 10
c 10
d 10
dtype: int64

2.3 Common Series attributes

Series attributes closely resemble those of numpy arrays, as listed in the table below:

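Series.index	The index (labels) of the Series, returned as an Index object
Series.values	The values of the Series, returned as an ndarray
Series.dtype	The dtype object of the underlying data
Series.shape	A tuple giving the shape of the underlying data
Series.ndim	The number of dimensions, which is 1 by definition
Series.size	The number of elements
Series.base	The base array if the data is shared with another array (deprecated and no longer available in recent pandas versions)
Series.T	The transpose, which for a Series is the Series itself
Series.empty	Whether the Series is empty
Series.name	The name of the Series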
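
As a quick check of the table, the sketch below reads several of these attributes from a small Series. The labels and the name `scores` are made up for illustration.

```python
import pandas as pd

s = pd.Series([102, 212, 332, 434], index=['a', 'b', 'c', 'd'], name='scores')

print(type(s.index).__name__)   # the index is an Index object, not a list
print(type(s.values).__name__)  # the values are a numpy ndarray
print(s.shape, s.ndim, s.size)  # shape tuple, number of dimensions, element count
print(s.empty, s.name)          # emptiness flag and the Series name
```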

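3 DataFrame

3.1 Structure of a DataFrame

DataFrame is the other core data structure in pandas. It presents data as a two-dimensional table, much like an Excel sheet. Unlike a Series, a DataFrame holds two-dimensional data, so it has column names in addition to an index, as shown below: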

>>> d = {'one': [1, 2, 3, 4], 'two':['一', '二', '三', '四']}
>>> df = pd.DataFrame(d)
>>> df
one two
0 1 一
1 2 二
2 3 三
3 4 四
>>> df.index
RangeIndex(start=0, stop=4, step=1)
>>> df.columns
Index(['one', 'two'], dtype='object')

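As you can see, a DataFrame holds data in rows and columns, like a two-dimensional table. Like a Series, a DataFrame has an index; when none is specified, pandas generates one that starts at 0 with a step of 1. A DataFrame also has column names. The index and the column names are the main keys for selecting data from it.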
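
To make the role of the index and the column names concrete, here is a minimal sketch of selecting data with them. It only uses basic selection; the variable names `col`, `row` and `cell` are illustrative. Note that both a single column and a single row come back as a Series, which is why a DataFrame can be seen as a container of Series.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['一', '二', '三', '四']})

col = df['one']          # select a column by its name; the result is a Series
row = df.loc[0]          # select a row by its index label; also a Series
cell = df.loc[2, 'two']  # select one cell by index label and column name

print(col.tolist(), list(row.index), cell)
```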

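3.2 Creating a DataFrame

(1) From a dictionary

When a DataFrame is created from a dictionary, the keys become the column names. The values must be iterable objects such as Series, numpy arrays, lists or tuples. When the values are Series with different indexes, pandas aligns them by label and fills the missing entries with NaN:

A dictionary with lists as values: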

>>> d = {'one': [1, 2, 3, 4], 'two':['一', '二', '三', '四']}
>>> pd.DataFrame(d)
one two
0 1 一
1 2 二
2 3 三
3 4 四

A dictionary with numpy arrays as values:

>>> d = {'zero': np.zeros((3,)), 'ones': np.ones((3,)), 'twos': np.full((3,), 2)}
>>> pd.DataFrame(d)
zero ones twos
0 0.0 1.0 2
1 0.0 1.0 2
2 0.0 1.0 2

A dictionary with Series as values:

>>> d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
>>> df = pd.DataFrame(d) # create the DataFrame
>>> df
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d  NaN 4.0

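For a dictionary of Series such as the one above, passing index selects and reorders the rows by label (for lists or numpy arrays, index must instead have the same length as the data and simply assigns the row labels):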

>>> pd.DataFrame(d, index=['d', 'b', 'a'])
one two
d NaN 4.0
b 2.0 2.0
a 1.0 1.0

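You can also specify the column names manually. Only the columns whose names match keys of the dictionary receive data; any other column is filled with NaN: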

>>> pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
two three
d 4.0 NaN
b 2.0 NaN
a 1.0 NaN

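(2) From a list

When a DataFrame is created from a list of dictionaries, each dictionary becomes one row and its keys become the column names; keys missing from a row are filled with NaN.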

>>> d = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
>>> pd.DataFrame(d)
a b c
0 1 2 NaN
1 5 10 20.0
>>> pd.DataFrame(d, index=['第一行', '第二行']) # re-specify the index
a b c
第一行 1 2 NaN
第二行 5 10 20.0
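
For comparison, the DataFrame constructor also accepts a list of plain rows (lists or tuples). In that case there are no dictionary keys to take names from, so the column names come from the columns argument. The sketch below is a minimal example; the data in `rows` is made up for illustration.

```python
import pandas as pd

# Each inner list is one row; names are supplied separately via `columns`
rows = [[1, 2], [5, 10]]
df = pd.DataFrame(rows, columns=['a', 'b'], index=['第一行', '第二行'])

print(df)
```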

(3) From helper constructors

A DataFrame can also be created with helper constructors such as from_dict() and from_records(). Taking from_dict() as an example:

>>> d = {'A': [1, 2, 3], 'B': [4, 5, 6]}
>>> pd.DataFrame.from_dict(d)
A B
0 1 4
1 2 5
2 3 6

To use the dictionary keys as the index instead and specify new column names, pass orient='index' together with the columns argument:

>>> pd.DataFrame.from_dict(d, orient='index', columns=['one', 'two', 'three'])
one two three
A 1 2 3
B 4 5 6
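
The other helper mentioned above, from_records(), can be sketched in the same way. This example assumes the records are a list of tuples, one tuple per row; the data and the column names `id` and `label` are hypothetical.

```python
import pandas as pd

# Each tuple is one record (row); `columns` names the fields in order
records = [(1, 'x'), (2, 'y'), (3, 'z')]
df = pd.DataFrame.from_records(records, columns=['id', 'label'])

print(df)
```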

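3.3 Common DataFrame attributes

DataFrame attributes are almost the same as those of Series, with an additional columns attribute that stores the column names. Because each column has its own type, a DataFrame exposes dtypes rather than dtype. See the Series table above for the rest.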
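
The sketch below reads a few of these attributes from a small DataFrame, using the same example data as earlier in this section. For a DataFrame, shape has two entries and T swaps rows and columns.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['一', '二', '三', '四']})

print(df.shape)          # (number of rows, number of columns)
print(df.ndim, df.size)  # number of dimensions and total number of cells
print(list(df.columns))  # the column names
print(df.dtypes)         # one dtype per column
print(df.T.shape)        # the transpose swaps rows and columns
```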

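4 Summary

This article gave a brief overview of the two main pandas data structures, Series and DataFrame: their characteristics, the main ways to create them, and their attributes. For slicing and indexing data in these objects, see the blog post 《python數據分析之pandas數據選取：df[] df.loc[] df.iloc[] df.ix[] df.at[] df.iat[]》.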
