Getting Started with pandas: Series


1. Creating a Series

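A Series is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index.

Parameters

- data: the values to store, e.g. a list, a NumPy array, a dict, or a scalar.
- index: the axis labels. Labels must be hashable and have the same length as data; they do not have to be unique. If no index is passed, a default RangeIndex (0, 1, ..., n-1) is used.
- dtype: the data type of the resulting Series. If None, the data type is inferred from the data.
- copy: whether to copy the input data. Defaults to False.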
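A short sketch of how the index and dtype parameters behave, assuming a recent pandas version; the variable names are illustrative only. It shows the default RangeIndex, repeated labels, a forced dtype, and the error raised when the index length does not match the data.

```python
import pandas as pd

# No index passed: pandas uses a default RangeIndex 0..n-1
s_default = pd.Series([10, 20, 30])
print(s_default.index)  # RangeIndex(start=0, stop=3, step=1)

# Labels must be hashable and match len(data), but they may repeat
s_dup = pd.Series([1, 2, 3], index=["x", "x", "y"])
print(s_dup["x"])  # a label that appears twice returns a Series with both rows

# dtype forces the element type instead of letting pandas infer it
s_float = pd.Series([1, 2, 3], dtype="float64")
print(s_float.dtype)  # float64

# An index whose length differs from the data is rejected
try:
    pd.Series([1, 2], index=["a"])
except ValueError as exc:
    print("ValueError:", exc)
```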

Creating from a list

import pandas as pd

data = ['a', 'b', 'c', 'd', 'e']
res = pd.Series(data, index=list(range(1, 6)), dtype=str)
print(res)

1    a
2    b
3    c
4    d
5    e
dtype: object
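A NumPy array can be passed as data in the same way as a list. The sketch below assumes NumPy is installed alongside pandas; the labels "x", "y", "z" are arbitrary.

```python
import numpy as np
import pandas as pd

arr = np.array([1.5, 2.5, 3.5])
res = pd.Series(arr, index=["x", "y", "z"])
print(res)
print(res.dtype)  # float64, inferred from the array
```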

Creating from a dict

data = {"a": 1., "b": 2, "c": 3, "d": 4}
res = pd.Series(data, index=["d", "c", "b", "a"])
print(res)  # the dict keys become the index; the passed index selects and orders them

d    4.0
c    3.0
b    2.0
a    1.0
dtype: float64
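When an index is passed with a dict, only the listed labels are kept, and a label that is not a key of the dict gets NaN. A small sketch with made-up keys:

```python
import pandas as pd

data = {"a": 1, "b": 2}

# Without an index, the keys are used in insertion order
res_all = pd.Series(data)
print(res_all)

# "z" is not a key, so its value is NaN; "b" is dropped because it is not in the index
res = pd.Series(data, index=["a", "z"])
print(res)  # the NaN forces the integers up to float64
```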

Creating from a scalar

# If data is a scalar, supply an index: the value is repeated to match the length of the index.
res = pd.Series(5, index=[1, 2, 3, 4, 5])
print(res)

1    5
2    5
3    5
4    5
5    5
dtype: int64
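The same broadcasting works with any index, including string labels. If the index is omitted, pandas still accepts a scalar but wraps it in a one-element Series, which is usually not what is intended. A minimal sketch:

```python
import pandas as pd

# The scalar is repeated once per label
res = pd.Series(0.5, index=["a", "b", "c"])
print(res)

# Without an index, the scalar becomes a one-element Series
single = pd.Series(5)
print(len(single))  # 1
```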

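2. Querying Data

Slicing

data = [1, 2, 3, 4, 5]
res = pd.Series(data, index=["a", "b", "c", "d", "e"])
# .iloc selects by position; a plain res[3] on a string index relies on a positional fallback that pandas 2.x deprecates
print(res.iloc[0:3], "---")  # positional slicing works just like Python list slicing
print(res.iloc[3], "---")    # a single element by position
print(res.iloc[-3:], "---")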

a    1
b    2
c    3
dtype: int64 ---

4 ---

c    3
d    4
e    5
dtype: int64 ---
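Slicing by label with .loc behaves differently from positional slicing: the end label is included. The sketch below reuses the same data to contrast the two.

```python
import pandas as pd

res = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_position = res.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = res.loc["a":"c"]  # labels a through c; the end label is included
print(by_position)
print(by_label)
```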

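Retrieving data by index label

data = [1, 2, 3, 4, 5]
res = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(res["a"])
# To retrieve several values, pass a list of labels (hence the double brackets)
print(res[["a", "b"]])  # looking up a label that does not exist, e.g. "f", raises a KeyError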

1

a    1
b    2
dtype: int64
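
With the default integer index, the labels are the integers 0 to n-1, so res[[2, 4]] selects by label (here the labels happen to equal the positions):

data = [1, 2, 3, 4, 5]
res = pd.Series(data)
res[[2, 4]]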

2    3
4    5
dtype: int64
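To avoid the KeyError for a missing label, check membership with `in` (which tests index labels, not values) or use .get() with a default. A short sketch:

```python
import pandas as pd

res = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print("f" in res)       # False: `in` checks the index labels
print(res.get("f", 0))  # 0: .get returns the default instead of raising
print(res.get("a"))     # 1

try:
    res["f"]
except KeyError as exc:
    print("KeyError:", exc)
```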

Using head()/tail() to view the first or last few elements

data = [1, 2, 3, 4, 5]
res = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(res.head(3))  # first three elements
print(res.tail(2))  # last two elements
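head() and tail() default to five elements, and a negative n counts from the other end: head(-n) returns everything except the last n elements, and tail(-n) everything except the first n. A quick sketch on a longer Series:

```python
import pandas as pd

res = pd.Series(range(10))

print(res.head())    # default n=5: the first five elements
print(res.head(-2))  # everything except the last two
print(res.tail(-2))  # everything except the first two
```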

3. Other Operations

Removing duplicate values

unique() returns the distinct values of a Series as an array, in order of first appearance.

s = pd.Series(data=[1,1,2,2,3,4,5,6,6,6,7,6,6,7,8])
s.unique()

array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64)
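Several related Series methods are often used alongside unique(): nunique() counts the distinct values, drop_duplicates() returns a Series (keeping the first occurrence and its index label), and value_counts() tallies each value. A sketch on the same data:

```python
import pandas as pd

s = pd.Series(data=[1, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 6, 6, 7, 8])

print(s.nunique())          # number of distinct values
print(s.drop_duplicates())  # first occurrence of each value, original labels kept
print(s.value_counts())     # how often each value occurs
```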

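Adding two Series

Arithmetic between Series

- Data with different indexes is automatically aligned by label during the operation.
- A label that appears in only one of the Series gets NaN in the result.

# When a label has no matching value in the other Series, the result is missing data, shown as NaN (not a number)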
s1 = pd.Series(data=[1,2,3,4,5],index=["a","b","c","d","e"])
s2 = pd.Series(data=[1,2,3,4,5],index=["a","b","c","d","f"])
s = s1 + s2
s

a    2.0
b    4.0
c    6.0
d    8.0
e    NaN
f    NaN
dtype: float64
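If NaN is not wanted for labels present on only one side, the method form add() accepts a fill_value that stands in for the missing operand before adding. A sketch with the same two Series:

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])

# A label missing from one side is treated as 0 before adding
s = s1.add(s2, fill_value=0)
print(s)  # e and f are now 5.0 instead of NaN
```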

Detecting missing data

isnull()  # returns True for missing values
notnull() # returns False for missing values

isnull

s1 = pd.Series(data=[1,2,3,4,5],index=["a","b","c","d","e"])
s2 = pd.Series(data=[1,2,3,4,5],index=["a","b","c","d","f"])
s = s1 + s2
s.isnull()  # True for missing values

a    False
b    False
c    False
d    False
e     True
f     True
dtype: bool

notnull

s1 = pd.Series(data=[1,2,3,4,5],index=["a","b","c","d","e"])
s2 = pd.Series(data=[1,2,3,4,5],index=["a","b","c","d","f"])
s = s1 + s2
s.notnull()  # False for missing values

a     True
b     True
c     True
d     True
e    False
f    False
dtype: bool
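pandas also provides isna() and notna() as aliases of isnull() and notnull(). Because True counts as 1, summing the boolean result counts the missing values, while count() returns the number of non-missing ones. A sketch:

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])
s = s1 + s2

print(s.isna().equals(s.isnull()))  # True: isna() is an alias of isnull()
print(s.isnull().sum())             # number of missing values
print(s.count())                    # number of non-missing values
```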

Indexing a Series with a list of booleans (one per element) keeps only the elements that correspond to True:

s[[True,True,False,False,True,True]] 

a    2.0
b    4.0
e    NaN
f    NaN
dtype: float64
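In practice the boolean mask usually comes from an element-wise comparison rather than a hand-written list. A minimal sketch with illustrative values:

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0], index=["a", "b", "c", "d"])

mask = s > 4    # element-wise comparison produces a boolean Series
print(mask)
print(s[mask])  # keeps only the elements where mask is True
```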

Combining this with isnull() and notnull(), you can select all missing values or all non-missing values:

s[s.isnull()]   # select all missing values

e   NaN
f   NaN
dtype: float64

s[s.notnull()]  # select all non-missing values

a    2.0
b    4.0
c    6.0
d    8.0
dtype: float64
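The same filtering is available as a method: dropna() returns the non-missing values, and fillna() replaces NaN with a given value instead of dropping it. A sketch using the Series from above:

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])
s = s1 + s2

print(s.dropna())   # same result as s[s.notnull()]
print(s.fillna(0))  # NaN replaced by 0, all six labels kept
```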


s.index  # get the index

Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')
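Besides reading the index, you can get the values as a NumPy array with to_numpy(), or replace the labels by assigning a new index of the same length. A short sketch with illustrative labels:

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0], index=["a", "b", "c"])

print(s.index.tolist())  # the labels as a plain Python list
print(s.to_numpy())      # the values as a NumPy array

# Assigning a new index replaces the labels; its length must match the Series
s.index = ["x", "y", "z"]
print(s)
```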
