python pandas: telling apart the data-combining functions merge, join, concat and combine_first

Reposted article. 2017-10-15 16:43. Tags: python, python-details

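Data held in pandas objects can be combined with several built-in tools: pandas.merge, pandas.concat, and the instance methods join and combine_first. They apply to different situations and produce different results, so this article compares them.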

Data can be combined along the column direction or along the row direction, the two ways shown in the figure below:

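pandas.merge and the instance method join perform the column-wise join shown in Figure 2. Taking DataFrame as the example: DataFrame1 and DataFrame2 must share overlapping values in at least one column (the index counts as well as the ordinary columns). One or more of those columns are named as the join key; for each key value, the data in the other columns of DataFrame2 is looked up and attached to the matching rows of DataFrame1. For example, using a column as the join key: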

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                 'data1': range(7)})
df2 = DataFrame({'key': ['a', 'b', 'd'],
                 'data2': range(3),
                 'data3': range(3, 6)})
DF1 = pd.merge(df1, df2)
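
To make the result of this merge concrete, here is a minimal sketch that rebuilds the same two frames and inspects the output. The names `merged` and `merged_explicit` are illustrative and not part of the original article. Row order is not checked because it can differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3),
                    'data3': range(3, 6)})

# With no `on` argument, merge joins on the column names both frames share.
merged = pd.merge(df1, df2)

# Naming the key explicitly gives the same result and documents the intent.
merged_explicit = pd.merge(df1, df2, on='key')

print(merged)
print(sorted(merged['key'].unique()))  # only keys present in both frames
```

Only the keys 'a' and 'b' appear in both frames, so the rows with 'c' (from df1) and 'd' (from df2) are absent from the result.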

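The merge parameters 'on', 'left_on' and 'right_on' specify which columns to join on (that is, the key columns holding the shared values). The index can also serve as the join key: pass left_index=True or right_index=True (or both) to say that the index is the key. For example: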

left1 = DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                  'value': range(6)})
right1 = DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
lr=pd.merge(left1, right1, left_on='key', right_index=True)
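
As a rough illustration of what this call returns, the sketch below repeats it and adds an outer-join variant for contrast. The variable names `inner` and `outer` are chosen here for illustration only.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Inner join (merge's default): key 'c' has no match in right1's index,
# so that row is dropped.
inner = pd.merge(left1, right1, left_on='key', right_index=True)

# Outer join: the 'c' row is kept and its group_val is NaN.
outer = pd.merge(left1, right1, left_on='key', right_index=True, how='outer')

print(inner)
print(outer)
```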

The instance method join, by contrast, joins on the index by default. For example:

left2 = DataFrame([[1., 2.], [3., 4.], [5., 6.]], index=['a', 'c', 'e'],
                 columns=['Ohio', 'Nevada'])
right2 = DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]],
                   index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])
lr2=left2.join(right2, how='outer')

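join can also join on a column of the calling DataFrame: set the 'on' parameter, and that column is matched against the index of the other DataFrame.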
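
The following sketch, using the same frames as above, shows join with its default settings, with how='outer', and with the 'on' parameter. The result names (`left_default`, `outer`, `joined_on`) are illustrative.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'],
                      columns=['Missouri', 'Alabama'])

# Default how='left': only the caller's index labels (a, c, e) are kept.
left_default = left2.join(right2)

# how='outer': the union of both indexes (a, b, c, d, e).
outer = left2.join(right2, how='outer')

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# on='key': match left1's 'key' column against right1's index.
joined_on = left1.join(right1, on='key')

print(left_default)
print(outer)
print(joined_on)
```

Because join defaults to a left join, `joined_on` keeps the row with key 'c' and fills its group_val with NaN, unlike the inner merge shown earlier.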

The functions above all join columns. To join rows, that is, to stack objects on top of each other, use pd.concat. For example:

s1 = Series([0, 1], index=['a', 'b'])
s2 = Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = Series([5, 6], index=['f', 'g'])
ss=pd.concat([s1, s2, s3])
st=pd.concat([s1,s2,s3],axis=1)

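By default concat works along axis=0 (stacking downward, row after row). With axis=1 (side by side, column after column) it can also join columns: here the three Series become the columns of a DataFrame.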
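
A small sketch of both directions, assuming the three Series above. The extra Series `s4` and the `join='inner'` call are additions for illustration and do not appear in the original article.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

# axis=0 (default): the Series are stacked end to end into one longer Series.
ss = pd.concat([s1, s2, s3])

# axis=1: each Series becomes a column; labels a Series lacks are NaN.
st = pd.concat([s1, s2, s3], axis=1)

# join='inner' keeps only the labels that every input shares.
s4 = pd.concat([s1 * 5, s3])                     # index: a, b, f, g
common = pd.concat([s1, s4], axis=1, join='inner')

print(ss)
print(st)
print(common)
```

Since the three indexes do not overlap, `st` has seven rows and each row holds exactly one non-missing value.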

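The last one, the instance method combine_first, joins neither rows nor columns. It "patches" data: values from the argument object fill in the missing values of the calling object. For example: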

a = Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan],
           index=['f', 'e', 'd', 'c', 'b', 'a'])
b = Series(np.arange(len(a), dtype=np.float64),
           index=['f', 'e', 'd', 'c', 'b', 'a'])
b.iloc[-1] = np.nan  # set the last element by position
c = b[:-2].combine_first(a[2:])

df1 = DataFrame({'a': [1., np.nan, 5., np.nan],
                 'b': [np.nan, 2., np.nan, 6.],
                 'c': range(2, 18, 4)})
df2 = DataFrame({'a': [5., 4., np.nan, 3., 7.],
                 'b': [np.nan, 3., 4., 6., 8.]})
df = df1.combine_first(df2)
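
The patching behaviour is easier to see on whole objects than on slices. The sketch below reuses the data above; the names `filled` and `patched` are illustrative, and `a.combine_first(b)` on the full Series is a simplification of the sliced call in the article.

```python
import numpy as np
import pandas as pd

labels = ['f', 'e', 'd', 'c', 'b', 'a']
a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=labels)
b = pd.Series(np.arange(len(a), dtype=np.float64), index=labels)
b.iloc[-1] = np.nan

# Values of `a` win; where `a` is NaN, the value from `b` is used instead.
filled = a.combine_first(b)

df1 = pd.DataFrame({'a': [1., np.nan, 5., np.nan],
                    'b': [np.nan, 2., np.nan, 6.],
                    'c': range(2, 18, 4)})
df2 = pd.DataFrame({'a': [5., 4., np.nan, 3., 7.],
                    'b': [np.nan, 3., 4., 6., 8.]})

# The result covers the union of both indexes and both sets of columns.
patched = df1.combine_first(df2)

print(filled)
print(patched)
```

A value stays missing only where both objects lack it: label 'a' in `filled`, row 0 of column 'b', and row 4 of column 'c' in `patched`.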

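In short: combining with merge or join adds columns; concat can add rows (the default, axis=0) or columns (axis=1); and combine_first fills the missing values of one object with values from another.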

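Note: pd.merge uses an "inner" join (the intersection of keys) by default, and join uses a "left" join by default (the join example above passes how='outer' explicitly). Both can be changed with the "how" parameter. pd.concat keeps the union of labels by default, which its "join" parameter controls.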
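
To see what the "how" parameter changes, this sketch counts the rows pd.merge returns for each join type on the df1 and df2 frames from the first example. The `row_counts` dictionary is an illustrative helper, not something from the original article.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3),
                    'data3': range(3, 6)})

# Number of result rows for each join type.
row_counts = {
    how: len(pd.merge(df1, df2, on='key', how=how))
    for how in ('inner', 'left', 'right', 'outer')
}
print(row_counts)  # {'inner': 6, 'left': 7, 'right': 7, 'outer': 8}
```

The left join adds the unmatched 'c' row, the right join adds the unmatched 'd' row, and the outer join adds both.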
