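DataFrame definition:
DataFrame is one of the two main data structures in pandas; the other is Series.
- a tabular data structure
- holds an ordered collection of columns
- can roughly be viewed as a collection of Series that share the same index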
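The "collection of Series sharing one index" view can be checked directly. The following is a minimal sketch, not part of the original notes; the index labels "a", "b", "c" are made up for illustration.

```python
import pandas as pd

# Two Series built on the same index labels (the labels are illustrative).
index = ["a", "b", "c"]
name = pd.Series(["Wangdachui", "Linling", "Niuyun"], index=index)
pay = pd.Series([4000, 5000, 6000], index=index)

# A dict of Series becomes a DataFrame: one column per Series, one shared index.
frame = pd.DataFrame({"name": name, "pay": pay})

# Selecting a column hands back a Series that carries the same index.
column = frame["pay"]
print(type(column).__name__)             # Series
print(column.index.equals(frame.index))  # True
```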
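Creating a DataFrame:
Default creation (from a dict of lists; the index defaults to 0, 1, 2, ...):
>>> import pandas as pd
>>> data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
>>> frame = pd.DataFrame(data)
>>> frame
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000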
Creation with an explicit index:
>>> import numpy as np
>>> import pandas as pd
>>> # np.array stores the mixed str/int tuples as strings, so pay holds '4000', not 4000
>>> data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
>>> frame = pd.DataFrame(data, index=range(1, 4), columns=['name', 'pay'])
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame.index
RangeIndex(start=1, stop=4, step=1)
>>> frame.columns
Index(['name', 'pay'], dtype='object')
>>> frame.values
array([['Wangdachui', '4000'],
       ['Linling', '5000'],
       ['Niuyun', '6000']], dtype=object)
Basic DataFrame operations:
Selecting rows and columns of a DataFrame (a column can be selected by key or as an attribute)
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame['name']
1    Wangdachui
2       Linling
3        Niuyun
Name: name, dtype: object
>>> frame.pay
1    4000
2    5000
3    6000
Name: pay, dtype: object
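Selecting specific rows or columns (iloc counts positions from 0, regardless of the index labels)
>>> frame.iloc[:2, 1]   # rows at positions 0 and 1, column at position 1
1    4000
2    5000
Name: pay, dtype: object
>>> frame.iloc[:1, 0]   # row at position 0, column at position 0
1    Wangdachui
Name: name, dtype: object
>>> frame.iloc[2, 1]    # row at position 2, column at position 1
'6000'
>>> frame.iloc[2]       # row at position 2
name    Niuyun
pay       6000
Name: 3, dtype: object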
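Because this frame's index starts at 1, positions and labels differ by one, which is easy to trip over. The sketch below contrasts position-based iloc with label-based loc. It is an illustration rather than part of the original notes, and it assumes a frame built with integer pay values instead of the string values used in the transcript.

```python
import pandas as pd

# Same shape as the frame in the text, but with integer pay (an assumption).
frame = pd.DataFrame(
    {"name": ["Wangdachui", "Linling", "Niuyun"], "pay": [4000, 5000, 6000]},
    index=range(1, 4),
)

by_position = frame.iloc[0]  # first row, whatever its label is
by_label = frame.loc[1]      # row whose index label is 1 (the same row here)

# Position slices exclude the end point; label slices include it.
first_two_by_position = frame.iloc[0:2]  # positions 0 and 1
first_two_by_label = frame.loc[1:2]      # labels 1 and 2

# A single cell addressed by row label and column name.
niuyun_pay = frame.loc[3, "pay"]
```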
Modifying and deleting DataFrame data
>>> frame['name'] = 'admin'   # a scalar is broadcast to every row
>>> frame
    name   pay
1  admin  4000
2  admin  5000
3  admin  6000
>>> del frame['pay']          # removes the column in place
>>> frame
    name
1  admin
2  admin
3  admin
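DataFrame statistics
Find the lowest pay and the people whose pay is 5000 or more (frame is rebuilt here as in the explicit-index example; its pay column holds strings, so min() and >= compare text rather than numbers)
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame.pay.min()
'4000'
>>> frame[frame.pay >= '5000']
      name   pay
2  Linling  5000
3   Niuyun  6000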
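The quoted '4000' in the result shows that pay holds strings, so the comparisons are lexicographic and only work because every value has four digits. A minimal sketch of one way to make the statistics numeric is to convert the column with astype(int) first. The variable names lowest, high_earners, and mean_pay are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([("Wangdachui", 4000), ("Linling", 5000), ("Niuyun", 6000)])
frame = pd.DataFrame(data, index=range(1, 4), columns=["name", "pay"])

# The array stored every value as text, so convert pay before doing arithmetic.
frame["pay"] = frame["pay"].astype(int)

lowest = frame["pay"].min()                 # numeric minimum
high_earners = frame[frame["pay"] >= 5000]  # numeric comparison
mean_pay = frame["pay"].mean()              # now meaningful as well
```

Building the frame from a dict or a list of tuples instead of a NumPy array avoids the conversion step, since pandas then infers an integer column on its own.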
Example:
A list holds the following music data:
music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]
Build the following DataFrame from this data:
               singer             song_name
1  the rolling stones          Satisfaction
2             Beatles             Let It Be
3        Guns N'Roses             Don't Cry
4           Metallica  Nothing Else Matters
One way to do it:
>>> import pandas as pd
>>> music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]
>>> music_table = pd.DataFrame(music_data)
>>> music_table
                    0                     1
0  the rolling stones          Satisfaction
1             Beatles             Let It Be
2        Guns N'Roses             Don't Cry
3           Metallica  Nothing Else Matters
>>> music_table.index = range(1, 5)                # relabel the rows 1..4
>>> music_table.columns = ['singer', 'song_name']  # name the columns
>>> print(music_table)
               singer             song_name
1  the rolling stones          Satisfaction
2             Beatles             Let It Be
3        Guns N'Roses             Don't Cry
4           Metallica  Nothing Else Matters
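The same table can probably be built in one step, since the DataFrame constructor accepts index and columns arguments (as the explicit-index example earlier shows). The following is a sketch of that alternative, not a replacement for the method above.

```python
import pandas as pd

music_data = [
    ("the rolling stones", "Satisfaction"),
    ("Beatles", "Let It Be"),
    ("Guns N'Roses", "Don't Cry"),
    ("Metallica", "Nothing Else Matters"),
]

# Pass the row labels and column names directly to the constructor.
music_table = pd.DataFrame(
    music_data,
    index=range(1, len(music_data) + 1),  # labels 1..4
    columns=["singer", "song_name"],
)
print(music_table)
```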
More basic DataFrame operations
Start from the following DataFrame:
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
(1) Adding a column
A column can be added by direct assignment, for example a tax column on frame:
>>> frame['tax'] = [0.05, 0.05, 0.1]
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
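(2) Adding a row
A row can be added by assigning to a new label through the loc indexer, or by using pd.concat(). The position-based iloc indexer cannot enlarge a DataFrame, and DataFrame.append() was removed in pandas 2.0. Here loc is used:
>>> frame.loc[5] = {'name': 'Liuxi', 'pay': 5000, 'tax': 0.05}
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
5       Liuxi  5000  0.05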
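For the concat() route named in the text, a minimal sketch might look like this. It assumes integer pay values, and the names new_row and extended are introduced here for illustration. Unlike assignment through loc, pd.concat returns a new DataFrame and leaves its inputs unchanged.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "name": ["Wangdachui", "Linling", "Niuyun"],
        "pay": [4000, 5000, 6000],
        "tax": [0.05, 0.05, 0.1],
    },
    index=range(1, 4),
)

# The new row is itself a one-row DataFrame with the desired index label.
new_row = pd.DataFrame({"name": ["Liuxi"], "pay": [5000], "tax": [0.05]}, index=[5])

# Stack the two frames vertically; frame itself is not modified.
extended = pd.concat([frame, new_row])
print(extended)
```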
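(3) Deleting elements
Data can be deleted directly with "del data", but that modifies the original object in place, which is risky. The drop() method instead returns a new object with the given labels removed along the chosen axis.
>>> frame.drop(5)
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
>>> frame.drop('tax', axis=1)
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
5       Liuxi  5000
frame itself is not affected by either call:
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
5       Liuxi  5000  0.05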
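Since drop() leaves frame untouched, its result has to be bound to a name to be kept. The sketch below also uses the index= and columns= keywords, which express the same thing as a label plus axis. The frame is rebuilt with integer pay values as an assumption, and the result names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "name": ["Wangdachui", "Linling", "Niuyun", "Liuxi"],
        "pay": [4000, 5000, 6000, 5000],
        "tax": [0.05, 0.05, 0.1, 0.05],
    },
    index=[1, 2, 3, 5],
)

without_row = frame.drop(index=5)      # same as frame.drop(5)
without_tax = frame.drop(columns="tax")  # same as frame.drop('tax', axis=1)

# The original still has every row and column.
print(frame.shape)  # (4, 3)
```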
(4) Modifying
Continuing with the frame above, set tax to 0.03 for every row:
>>> frame['tax'] = 0.03
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.03
2     Linling  5000  0.03
3      Niuyun  6000  0.03
5       Liuxi  5000  0.03
A whole row can also be modified directly through loc:
>>> frame.loc[5] = ['Liuxi', 9800, 0.05]
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.03
2     Linling  5000  0.03
3      Niuyun  6000  0.03
5       Liuxi  9800  0.05
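loc also accepts a row label together with a column name, or a boolean mask in place of the row label, so a single cell or a filtered set of rows can be changed without rewriting whole rows. This is a minimal sketch assuming integer pay values; the 0.1 rate for pay above 5000 is made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "name": ["Wangdachui", "Linling", "Niuyun", "Liuxi"],
        "pay": [4000, 5000, 6000, 5000],
        "tax": [0.03, 0.03, 0.03, 0.03],
    },
    index=[1, 2, 3, 5],
)

# Change one cell: row label 5, column 'pay'.
frame.loc[5, "pay"] = 9800

# Change 'tax' only on the rows where the condition holds (hypothetical rule).
frame.loc[frame["pay"] > 5000, "tax"] = 0.1
print(frame)
```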