Python Data Analysis with Pandas: An Introduction to Basic Functionality

Reposted article, 2018-02-04 14:43, filed under Python data analysis

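Pandas has two main data structures: Series and DataFrame.

A Series is a one-dimensional, array-like object made up of a sequence of values and an associated set of data labels (its index). Let's see how it is used.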

In [1]: from pandas import Series,DataFrame

In [2]: import pandas as pd

In [3]: obj=Series([4,7,-5,3])

In [5]: obj

Out[5]:

0 4

1 7

2 -5

3 3

dtype: int64

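When a Series is displayed, the index is on the left and the values are on the right. Since no index was specified here, a default integer index was created. The values and index attributes return the underlying values and the index, respectively.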

In [6]: obj.values

Out[6]: array([ 4, 7, -5, 3])

In [7]: obj.index

Out[7]: RangeIndex(start=0, stop=4, step=1)

To specify the index yourself, pass it through the index argument when creating the Series:

In [8]: obj2=Series([4,7,-5,3],index=['a','b','c','d'])

In [9]: obj2

Out[9]:

a 4

b 7

c -5

d 3

dtype: int64

A value can then be accessed through its index label:

In [10]: obj2['a']

Out[10]: 4
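As a small sketch of label-based access beyond a single lookup, the example below reuses the same data as obj2. The variable name prices is only illustrative and does not come from the original session.

```python
import pandas as pd

# Same values and labels as obj2 above; "prices" is an illustrative name.
prices = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

# Label-based lookup of a single value.
print(prices["a"])

# Selecting several labels returns a new Series in the requested order.
print(prices[["c", "a"]])

# Assigning through a label updates the stored value.
prices["d"] = 6
print(prices.values.tolist())

# Membership tests check the index labels, not the values.
print("b" in prices, "z" in prices)
```

The last line shows that a Series behaves a little like an ordered dict keyed by its index labels.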

NumPy operations on a Series (with NumPy imported as np) also preserve the link between the index and the values:

In [12]: np.exp(obj2)

Out[12]:

a 54.598150

b 1096.633158

c 0.006738

d 20.085537

dtype: float64
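The same index-preserving behavior applies to boolean filtering and scalar arithmetic. The following is a minimal sketch that rebuilds obj2 so it can run on its own; the names positive, doubled and magnitudes are illustrative.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

# Boolean filtering keeps only the labels whose values match.
positive = obj2[obj2 > 0]
print(list(positive.index))

# Scalar arithmetic is applied element-wise; the labels are unchanged.
doubled = obj2 * 2
print(doubled["c"])

# NumPy ufuncs return a Series carrying the same index.
magnitudes = np.abs(obj2)
print(magnitudes.index.equals(obj2.index))
```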

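If the data is already in a dict, a Series can be created directly from it, and the dict's keys become the index. (The output below comes from an older pandas version that sorts the keys; recent versions keep the dict's insertion order.)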

In [13]: data={'name':'zhf','age':33,'city':'chengdu'}

In [14]: obj3=Series(data)

In [15]: obj3

Out[15]:

age 33

city chengdu

name zhf

dtype: object
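If the order of the labels matters, one option is to pass an explicit index together with the dict. The sketch below assumes the same data dict; the extra label 'country' is hypothetical and is only there to show what happens when a label has no matching key.

```python
import pandas as pd

data = {"name": "zhf", "age": 33, "city": "chengdu"}

# index= selects and orders the keys explicitly; "country" is a
# hypothetical label with no matching key, so its value becomes NaN.
obj3 = pd.Series(data, index=["name", "city", "age", "country"])
print(obj3)

# isnull() flags the missing entries.
print(obj3.isnull())
```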

DataFrame:

A DataFrame is a tabular data structure. It has both a row index and a column index, and can be thought of as a dict of Series that share the same index.

In [25]: data={'city':['chongqing','chengdu','beijing'],'weather':['rainy','sunshaw','snow'],'temperature':[9,5,-3]}

In [26]: frame=DataFrame(data)

In [27]: frame

Out[27]:

city temperature weather

0 chongqing 9 rainy

1 chengdu 5 sunshaw

2 beijing -3 snow

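However, the column order here differs from the order in which data was initialized (older pandas versions sort the dict keys). To control the column order explicitly, pass columns to DataFrame: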

In [28]: frame=DataFrame(data,columns=['city','weather','temperature'])

In [29]: frame

Out[29]:

city weather temperature

0 chongqing rainy 9

1 chengdu sunshaw 5

2 beijing snow -3

The row index labels can be specified in the same way:

In [30]: frame=DataFrame(data,columns=['city','weather','temperature'],index=['first','second','third'])

In [31]: frame

Out[31]:

city weather temperature

first chongqing rainy 9

second chengdu sunshaw 5

third beijing snow -3
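The columns argument does more than reorder: a name that is not a key in data produces a column of NaN. The following sketch illustrates this with a hypothetical 'humidity' column that is not part of the original data.

```python
import pandas as pd

data = {
    "city": ["chongqing", "chengdu", "beijing"],
    "weather": ["rainy", "sunshaw", "snow"],
    "temperature": [9, 5, -3],
}

# "humidity" is a hypothetical column with no data behind it,
# so pandas fills it with NaN.
frame = pd.DataFrame(
    data,
    columns=["city", "weather", "temperature", "humidity"],
    index=["first", "second", "third"],
)
print(frame)
print(frame["humidity"].isnull().all())
```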

With an index in place, rows and columns can be accessed by label.

Access by column label:

In [33]: frame.city

Out[33]:

first chongqing

second chengdu

third beijing

Name: city, dtype: object

Access by row label:

In [41]: frame.loc['first']

Out[41]:

city chongqing

weather rainy

temperature 9

Name: first, dtype: object
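Attribute access such as frame.city only works when the column name is a valid Python identifier that does not clash with a DataFrame method, so bracket notation is the safer general form. The sketch below rebuilds a reduced version of frame (two columns only, an assumption made to keep it short) and compares label-based and position-based row access.

```python
import pandas as pd

frame = pd.DataFrame(
    {"city": ["chongqing", "chengdu", "beijing"],
     "temperature": [9, 5, -3]},
    index=["first", "second", "third"],
)

# Bracket notation works for any column name.
temps = frame["temperature"]
print(temps.tolist())

# loc selects a row by label, iloc by integer position.
row_by_label = frame.loc["second"]
row_by_position = frame.iloc[1]
print(row_by_label.equals(row_by_position))

# A single cell: row label first, then column label.
print(frame.loc["third", "temperature"])
```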

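Another common form of input is a nested dict (a dict of dicts).

With this format, the outer dict keys become the columns and the inner keys become the row index. Where an inner dict lacks a key, the cell is filled with NaN.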

In [42]: pop={'cost':{2016:3000,2017:3400,2018:5000},'need':{2017:4000,2018:6000}}

In [43]: frame3=DataFrame(pop)

In [44]: frame3

Out[44]:

cost need

2016 3000 NaN

2017 3400 4000.0

2018 5000 6000.0

The result can of course be transposed:

In [45]: frame3.T

Out[45]:

2016 2017 2018

cost 3000.0 3400.0 5000.0

need NaN 4000.0 6000.0
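To make the NaN handling in the nested-dict case concrete, here is a minimal sketch using the same pop data. Passing an explicit index to restrict the rows is shown as one possible variation; the name subset is illustrative.

```python
import pandas as pd

pop = {
    "cost": {2016: 3000, 2017: 3400, 2018: 5000},
    "need": {2017: 4000, 2018: 6000},
}

frame3 = pd.DataFrame(pop)

# "need" has no 2016 entry, so that cell is NaN and the column
# is stored as float rather than int.
print(frame3["need"].isnull().tolist())
print(frame3.dtypes)

# An explicit index keeps only the requested inner keys as rows.
subset = pd.DataFrame(pop, index=[2017, 2018])
print(subset)

# Transposing swaps rows and columns.
print(frame3.T.shape)
```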

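Basic functionality:

1. Reindexing

First, look at the Series created earlier: obj.index returns an Index object. Suppose we try to change one of its labels with index[1]='a':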

In [50]: obj.index

Out[50]: RangeIndex(start=0, stop=4, step=1)

In [51]: index=obj.index

In [52]: index[1]='a'

This raises the error "Index does not support mutable operations". Index objects are immutable, so their labels cannot be modified this way.

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-52-336c3a4c2807> in <module>()

----> 1 index[1]='a'

/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.pyc in __setitem__(self, key, value)

1722

1723 def __setitem__(self, key, value):

-> 1724 raise TypeError("Index does not support mutable operations")

1725

1726 def __getitem__(self, key):

TypeError: Index does not support mutable operations

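Since the Index cannot be modified element by element, use obj.reindex instead. It returns a new object conformed to the given labels; labels that are not present in the original index get NaN values. Because obj is indexed by 0 to 3, none of the labels below match, so every value in the result is NaN.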

In [60]: obj.reindex(['a','b','c','d','e'])
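reindex is easier to follow on a Series whose labels partly match the new ones. The sketch below uses the earlier obj2 data for that purpose. It also shows that replacing the whole index by assignment is allowed, even though changing a single element of an Index is not. The names reindexed, filled and relabeled are illustrative.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

# reindex rearranges existing labels and adds NaN for new ones ("e").
reindexed = obj2.reindex(["d", "c", "b", "a", "e"])
print(reindexed)

# fill_value supplies a default for labels missing from the original.
filled = obj2.reindex(["a", "b", "e"], fill_value=0)
print(filled.tolist())

# Assigning a whole new index relabels the existing values in place.
relabeled = obj2.copy()
relabeled.index = ["w", "x", "y", "z"]
print(relabeled["w"])
```

Note that reindex never changes obj2 itself; it always returns a new object.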

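2. Dropping entries from an axis

The drop method removes entries by index label and returns a new object; the original is left unchanged. The argument is the row label. In the session below, obj has index labels 1 through 4.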

In [64]: obj

Out[64]:

1 4

2 7

3 5

4 3

dtype: int64

In [65]: new=obj.drop(1)

In [66]: new

Out[66]:

2 7

3 5

4 3

dtype: int64
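drop works on a DataFrame as well, for either axis. The following sketch assumes a reduced two-column version of the earlier frame so that it runs on its own.

```python
import pandas as pd

frame = pd.DataFrame(
    {"city": ["chongqing", "chengdu", "beijing"],
     "temperature": [9, 5, -3]},
    index=["first", "second", "third"],
)

# Drop one row by label (rows are the default axis).
without_first = frame.drop("first")
print(without_first.index.tolist())

# Drop several rows at once by passing a list of labels.
only_third = frame.drop(["first", "second"])
print(only_third.index.tolist())

# Drop a column by passing axis=1.
no_temperature = frame.drop("temperature", axis=1)
print(no_temperature.columns.tolist())

# The original DataFrame is unchanged.
print(frame.shape)
```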

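3. Indexing, selection, and filtering

With Python lists and tuples, slicing extracts the part we want, and pandas objects support slicing too. With an integer index like the one below, a plain slice such as obj[2:4] selects by position, not by label.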

In [67]: obj[2:4]

Out[67]:

3 5

4 3

dtype: int64

The frame DataFrame created earlier can also be sliced by row position:

In [81]: frame

Out[81]:

city weather temperature

first chongqing rainy 9

second chengdu sunshaw 5

third beijing snow -3

In [82]: frame[0:1]

Out[82]:

city weather temperature

first chongqing rainy 9

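A single row can be selected by position with iloc (older code used ix, which was deprecated in pandas 0.20 and removed in 1.0):

In [83]: frame.iloc[1]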

Out[83]:

city chengdu

weather sunshaw

temperature 5

Name: second, dtype: object
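One detail worth seeing in code is that positional slices exclude the end position while label slices include both endpoints. The sketch below also shows filtering rows with a boolean condition. It rebuilds obj2 and a reduced two-column frame as assumptions so it is self-contained.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

# Positional slice: the end position is excluded.
print(obj2.iloc[1:3].tolist())

# Label slice: both endpoints are included.
print(obj2.loc["b":"d"].tolist())

frame = pd.DataFrame(
    {"city": ["chongqing", "chengdu", "beijing"],
     "temperature": [9, 5, -3]},
    index=["first", "second", "third"],
)

# Filtering: keep only the rows where the condition is True.
warm = frame[frame["temperature"] > 0]
print(warm.index.tolist())

# loc can select rows and columns together by label.
cities = frame.loc[["first", "third"], ["city"]]
print(cities)
```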

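4. Arithmetic and data alignment

When two objects are added and their indexes differ, the index of the result is the union of the two indexes. In the two Series below, only the label 'a' appears in both, so after the addition only 'a' has a value and every other label is NaN.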

In [84]: s1=Series([1,2,3,4],index=['a','b','c','d'])

In [85]: s2=Series([5,6,7,8],index=['x','a','y','z'])

In [86]: s1+s2

Out[86]:

a 7.0

b NaN

c NaN

d NaN

x NaN

y NaN

z NaN

dtype: float64

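5. Fill values in arithmetic methods

As shown above, labels that appear in only one of the operands produce NaN in the result. To treat the missing side as a fixed value such as 0 instead, use s1.add(s2,fill_value=0); the result then holds the value from whichever operand has it rather than NaN.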
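A short sketch comparing the plain + operator with add and fill_value=0, using the same s1 and s2 as in the alignment example:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([5, 6, 7, 8], index=["x", "a", "y", "z"])

# Plain addition: labels present in only one Series give NaN.
plain = s1 + s2
print(plain)

# With fill_value=0 the missing side is treated as 0, so each
# unmatched label keeps the value from the Series that has it.
filled = s1.add(s2, fill_value=0)
print(filled)
```

Only 'a' exists in both Series, so it is the only label whose value is a true sum (1 + 6); every other label simply carries over its single available value.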

6. Operations between DataFrame and Series

Let's look at a concrete example.

In [111]: frame=DataFrame(np.arange(12).reshape((4,3)),columns=list('bde'),index=['a1','a2','a3','a4'])

In [119]: series=frame.iloc[0]

In [120]: series

Out[120]:

b 0

d 1

e 2

Name: a1, dtype: int64

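Arithmetic between a DataFrame and a Series matches the Series index against the DataFrame's columns and broadcasts the operation down the rows. Here the first row is subtracted from every row: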

In [122]: frame

Out[122]:

b d e

a1 0 1 2

a2 3 4 5

a3 6 7 8

a4 9 10 11

In [123]: frame-series

Out[123]:

b d e

a1 0 0 0

a2 3 3 3

a3 6 6 6

a4 9 9 9
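By default the Series is matched against the columns. To match against the row index and broadcast across the columns instead, the arithmetic methods accept an axis argument. The following sketch shows both directions on the same frame; the names first_row, by_row, col_d and by_col are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    columns=list("bde"),
    index=["a1", "a2", "a3", "a4"],
)

# Default: the Series index is matched to the columns and the
# subtraction is repeated for every row.
first_row = frame.iloc[0]
by_row = frame - first_row
print(by_row)

# axis="index": the Series index is matched to the row labels and
# the subtraction is repeated for every column.
col_d = frame["d"]
by_col = frame.sub(col_d, axis="index")
print(by_col)
```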

7. Sorting and ranking

To sort by row or column labels, use the sort_index method, which returns a new, sorted object.

In [133]: frame

Out[133]:

e c d

a3 0 1 2

a2 3 4 5

a0 6 7 8

a1 9 10 11

Sorting by the row index:

In [134]: frame.sort_index()

Out[134]:

e c d

a0 6 7 8

a1 9 10 11

a2 3 4 5

a3 0 1 2

Sorting by the column index:

In [135]: frame.sort_index(axis=1)

Out[135]:

c d e

a3 1 2 0

a2 4 5 3

a0 7 8 6

a1 10 11 9

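To sort by the values in a particular column, pass the column name through the by parameter of sort_values. (Older pandas versions also accepted by in sort_index with the same effect, but that usage has since been deprecated in favor of sort_values.)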

In [139]: frame.sort_values(by='d')

Out[139]:

e c d

a3 0 1 2

a2 3 4 5

a0 6 7 8

a1 9 10 11
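The frame above is already ordered by column d, so the sorted output looks the same as the input. The sketch below sorts in descending order to make the effect visible, and adds a brief look at rank, which the section title mentions but the session does not show. The scores Series is made-up illustrative data.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=["a3", "a2", "a0", "a1"],
    columns=list("ecd"),
)

# Sort rows by the values in column "d", largest first.
by_d_desc = frame.sort_values(by="d", ascending=False)
print(by_d_desc.index.tolist())

# Sort rows by their labels in descending order.
by_label_desc = frame.sort_index(ascending=False)
print(by_label_desc.index.tolist())

# rank assigns positions starting at 1; by default tied values
# share the average of the ranks they would occupy.
scores = pd.Series([7, -5, 7, 4])  # illustrative data
print(scores.rank().tolist())
```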
