Python data analysis library pandas ------ Getting to know pandas and the Series object

This article is a repost. 2018-07-31 10:23. Tag: pandas

Using pandas in Python:

By convention, pandas is imported with import pandas as pd. You can check the installed version with pd.__version__.

pandas has two main data structures: Series and DataFrame. The rest of this article covers Series.

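Series: a one-dimensional, array-like object made up of a set of data (of a NumPy data type) and an associated set of data labels (the index). A Series can also be built from the data alone, in which case a default index is generated. Note: index values in a Series may be repeated.

DataFrame: a tabular data structure holding an ordered set of columns, where each column can have a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and can be thought of as a dictionary of Series.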

The descriptions above come from the blog post "pandas---Series基礎使用" (pandas: basic usage of Series).
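To make the "dictionary of Series" view concrete, here is a minimal sketch. The column names `num` and `flag` and all values are made up for illustration; they do not come from the original post.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative values only)
num = pd.Series([1, 2, 3], index=["a", "b", "c"])
flag = pd.Series([True, False, True], index=["a", "b", "c"])

# A DataFrame built from a dict of Series: keys become column names
df = pd.DataFrame({"num": num, "flag": flag})
print(df)

# Selecting a column by its key gives back a Series, as with a dict lookup
col = df["num"]
print(type(col))
```

Each column keeps its own dtype, which is why a DataFrame can mix numeric and boolean columns.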

Series objects:

Creating a Series object

import pandas as pd
# print(pd.__version__)
s = pd.Series([True, 1, 2, 'kl'])  # a default integer index is added
print("python%s data:\n" % type(s), s)
Out[1]:
python<class 'pandas.core.series.Series'> data:
 0    True
1       1
2       2
3      kl
dtype: object

Adding an index to a Series object

s = pd.Series([True, 1, 2, 'kl'], index=['logical', 'num1', 'num2', 'id'])
print("python%s data:\n" % type(s), s)
Out[2]:
python<class 'pandas.core.series.Series'> data:
 logical    True
num1          1
num2          2
id           kl
dtype: object

Accessing elements

print(s.values)
print(s.index)
print(s.iloc[2])  # by position; plain s[2] on a string index is deprecated in recent pandas
print(s['num1'])  # by label
print(s[:2])      # slice by position
print(s[['num1', 'num2']])  # list of labels
Out[3]:
[True 1 2 'kl']
Index(['logical', 'num1', 'num2', 'id'], dtype='object')
2
1
logical    True
num1          1
dtype: object
num1    1
num2    2
dtype: object
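Because index values may repeat, a label lookup does not always return a single element. The sketch below uses a small made-up Series (`dup` and its values are illustrative) to show the difference.

```python
import pandas as pd

# Illustrative Series whose index contains the label "a" twice
dup = pd.Series([10, 20, 30], index=["a", "a", "b"])

both = dup["a"]    # repeated label: the result is a Series with two rows
single = dup["b"]  # unique label: the result is a scalar

print(both)
print(single)
print(dup.index.is_unique)  # reports whether all labels are distinct
```

Checking `index.is_unique` before relying on scalar lookups is one way to avoid surprises with repeated labels.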

Assigning values to elements

s = pd.Series([True, 1, 2, 'kl'], index=['logical', 'num1', 'num2', 'id'])
s.iloc[0] = False  # assign by position
s['num1'] = 1.1    # assign by label
print(s)
Out[4]:
logical    False
num1         1.1
num2           2
id            kl
dtype: object

Defining a new Series from a NumPy array or another Series

import numpy as np
a = np.array([1, 2, 3, 4])  # dtype follows NumPy's platform default (int32 in this run)
s1 = pd.Series(a)
s2 = pd.Series(s1)
print("s1:\n", s1)
print("s2:\n", s2)
print(s1 == s2)
s1[2] = 100
print("s2 after changing s1:\n", s2)
Out[5]:
s1:
 0    1
1    2
2    3
3    4
dtype: int32
s2:
 0    1
1    2
2    3
3    4
dtype: int32
0    True
1    True
2    True
3    True
dtype: bool
s2 after changing s1:
 0      1
1      2
2    100
3      4
dtype: int32

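Note that in the run above, changing s1 also changed s2: pd.Series(s1) shared the underlying data instead of copying it (compare deep copy with shallow copy). This behavior depends on the pandas version; with Copy-on-Write enabled (the default from pandas 3.0), s2 would stay unchanged.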
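If an independent copy is what you want, you can ask for one explicitly. The sketch below shows two ways to do that; the names `s3` and `s4` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.array([1, 2, 3, 4]))

s3 = s1.copy()                 # Series.copy() makes a deep copy by default
s4 = pd.Series(s1, copy=True)  # the constructor can also be told to copy

s1[2] = 100                    # modify the original afterwards

print(s3[2])  # the copies keep the old value
print(s4[2])
```

Either form keeps the new Series unaffected by later changes to s1, whatever the pandas version.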

Filtering elements

a = pd.Series(np.array([1, 2, 3, 4]))
print(a[a < 3])  # a boolean mask keeps only the matching elements
Out[6]:
0    1
1    2
dtype: int32

Arithmetic and math functions

s1 = pd.Series([6, 1, 2, 9])  # supports +, -, *, /
b = s1 + 2
print(b)
print(np.log(s1))  # NumPy functions are applied element-wise
Out[7]:
0     8
1     3
2     4
3    11
dtype: int64
0    1.791759
1    0.000000
2    0.693147
3    2.197225
dtype: float64

Inspecting the elements of a Series

color = pd.Series([1, 0, 2, 1, 2, 3], index=['white', 'white', 'blue', 'green', 'green', 'yellow'])
print("color:\n", color)
print("color.unique():\n", color.unique())  # distinct values
print("color.value_counts():\n", color.value_counts())  # occurrences of each value
print("color.isin():\n", color.isin([0, 3]))  # membership test per element
print("color[color.isin([0, 3])]:\n", color[color.isin([0, 3])])
Out[8]:
color:
 white     1
white     0
blue      2
green     1
green     2
yellow    3
dtype: int64
color.unique():
 [1 0 2 3]
color.value_counts():
 2    2
1    2
3    1
0    1
dtype: int64
color.isin():
 white     False
white      True
blue      False
green     False
green     False
yellow     True
dtype: bool
color[color.isin([0, 3])]:
 white     0
yellow    3
dtype: int64

Missing values (NaN)

s = pd.Series([1, 2, np.nan, 6])
print(s.isnull())       # True where the value is missing
print(s.notnull())      # True where the value is present
print(s[s.notnull()])   # keep only the non-missing values
Out[9]:
0    False
1    False
2     True
3    False
dtype: bool
0     True
1     True
2    False
3     True
dtype: bool
0    1.0
1    2.0
3    6.0
dtype: float64
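The last line of the listing filters with a boolean mask. As a small aside, pandas also offers `dropna()`, which should give the same result here; the sketch below compares the two on the same data (the names `masked` and `dropped` are illustrative).

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 6])

masked = s[s.notnull()]  # filter with a boolean mask
dropped = s.dropna()     # built-in shortcut for removing missing values

print(dropped)
print(masked.equals(dropped))  # compares values and index labels
```

Note that both forms keep the original index labels, so label 2 is simply absent from the result.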

Using a Series as a dictionary

mydict = {'red': 200, 'blue': 100, 'yellow': 50, 'orange': 100}
myseries = pd.Series(mydict)  # keys become the index, values become the data
print(myseries)
Out[10]:
red       200
blue      100
yellow     50
orange    100
dtype: int64

You can also pass the index parameter to specify the index explicitly.
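A minimal sketch of what passing index together with a dictionary does: the labels you list select and order the entries, and a label with no matching key gets NaN. The label 'green' and the name `picked` are made up for illustration.

```python
import pandas as pd

mydict = {'red': 200, 'blue': 100, 'yellow': 50, 'orange': 100}

# Only the listed labels are kept, in the listed order;
# 'green' is not a key of mydict, so its value is NaN
picked = pd.Series(mydict, index=['blue', 'red', 'green'])
print(picked)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.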

Operations between Series objects

mydict0 = {'red': 200, 'blue': 100, 'yellow': 50, 'orange': 100}
myseries0 = pd.Series(mydict0)
print(myseries0)
mydict1 = {'red': 200, 'blue': 100, 'yellow': 50, 'orange': 100, 'black': 30}
myseries1 = pd.Series(mydict1)
print(myseries0 + myseries1)  # values are aligned by index label before adding
Out[11]:
red       200
blue      100
yellow     50
orange    100
dtype: int64
black       NaN
blue      200.0
orange    200.0
red       400.0
yellow    100.0
dtype: float64

Note that myseries0 has no 'black' entry, so the sum puts NaN at that label by default.
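If you would rather treat a missing label as zero than get NaN, one option is `Series.add` with `fill_value`. This is a sketch of an alternative, not something the original post uses; the name `total` is illustrative.

```python
import pandas as pd

myseries0 = pd.Series({'red': 200, 'blue': 100, 'yellow': 50, 'orange': 100})
myseries1 = pd.Series({'red': 200, 'blue': 100, 'yellow': 50, 'orange': 100, 'black': 30})

# fill_value=0 stands in for a label that exists in only one of the two Series
total = myseries0.add(myseries1, fill_value=0)
print(total)
```

With this call 'black' keeps the value from myseries1 instead of becoming NaN.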
