A summary of the pandas merge and concat functions

Reposted article · 2019-11-05 13:59 · Python

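pandas offers several functions for combining data, chiefly concat, merge, and join.

concat stacks objects along rows or columns and aligns them by index; on the other axis it can only take the intersection (inner) or the union (outer) of the labels.
merge combines two DataFrames on shared columns or on the index, and supports inner, left, right, and outer joins.
join works much like merge, so it is not covered separately here.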

import pandas as pd
from pandas import Series,DataFrame
# Helper: build a DataFrame in which each cell is the column name followed by the row label
def make_df(cols,inds):
    data = {c:[c+str(i) for i in inds] for c in cols}
    return DataFrame(data,index=inds)

df1 = make_df(list("abc"),[1,2,4])
df1

	a	b	c
1	a1	b1	c1
2	a2	b2	c2
4	a4	b4	c4

df2 = make_df(list("abcd"),[2,4,6])
df2

	a	b	c	d
2	a2	b2	c2	d2
4	a4	b4	c4	d4
6	a6	b6	c6	d6

df11=df1.set_index('a')
df22=df2.set_index('a')

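1. The concat function

axis: defaults to 0, which concatenates row-wise; 1 concatenates column-wise
ignore_index: defaults to False, which keeps the original index labels; True discards the labels along the concatenation axis and builds a new 0..n-1 index
join: how the other axis is handled, either inner (intersection) or outer (union, the default)
sort: True sorts the non-concatenation axis (the column labels when axis=0) if the inputs are not already aligned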
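
The effect of sort is easiest to see when the inputs list their columns in different orders. The following is a minimal sketch; the frames left and right are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Two small frames whose columns are deliberately out of alphabetical order
left = pd.DataFrame({"b": ["b1"], "a": ["a1"]}, index=[1])
right = pd.DataFrame({"d": ["d2"], "a": ["a2"]}, index=[2])

# sort=False keeps the columns in order of first appearance
unsorted_cols = pd.concat([left, right], sort=False)

# sort=True sorts the column labels (the non-concatenation axis)
sorted_cols = pd.concat([left, right], sort=True)

print(list(unsorted_cols.columns))
print(list(sorted_cols.columns))
```

Only the column order differs between the two results; the rows and their index labels are the same.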

(1) Simple row-wise and column-wise concatenation by index

# Row-wise concatenation
pd.concat([df1,df2],sort=False)

	a	b	c	d
1	a1	b1	c1	NaN
2	a2	b2	c2	NaN
4	a4	b4	c4	NaN
2	a2	b2	c2	d2
4	a4	b4	c4	d4
6	a6	b6	c6	d6

# Column-wise concatenation
pd.concat([df1,df2],axis=1)

	a	b	c	a	b	c	d
1	a1	b1	c1	NaN	NaN	NaN	NaN
2	a2	b2	c2	a2	b2	c2	d2
4	a4	b4	c4	a4	b4	c4	d4
6	NaN	NaN	NaN	a6	b6	c6	d6

(2) Concatenation that discards the original index

# Row-wise concatenation; drop the original row index and renumber
pd.concat([df1,df2],sort=False,ignore_index=True)

	a	b	c	d
0	a1	b1	c1	NaN
1	a2	b2	c2	NaN
2	a4	b4	c4	NaN
3	a2	b2	c2	d2
4	a4	b4	c4	d4
5	a6	b6	c6	d6

# Column-wise concatenation; drop the original column labels and renumber
pd.concat([df1,df2],axis=1,ignore_index=True)

	0	1	2	3	4	5	6
1	a1	b1	c1	NaN	NaN	NaN	NaN
2	a2	b2	c2	a2	b2	c2	d2
4	a4	b4	c4	a4	b4	c4	d4
6	NaN	NaN	NaN	a6	b6	c6	d6

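(3) Concatenation with an explicit join mode

The join mode is either inner or outer; it applies to the axis that is not being concatenated (here, the columns).

# Intersection of columns, inner join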
pd.concat([df1,df2],sort=False,join='inner')

	a	b	c
1	a1	b1	c1
2	a2	b2	c2
4	a4	b4	c4
2	a2	b2	c2
4	a4	b4	c4
6	a6	b6	c6

# Union of columns, outer join
pd.concat([df1,df2],sort=False,join='outer')

	a	b	c	d
1	a1	b1	c1	NaN
2	a2	b2	c2	NaN
4	a4	b4	c4	NaN
2	a2	b2	c2	d2
4	a4	b4	c4	d4
6	a6	b6	c6	d6
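
The two examples above apply join to the columns because the concatenation runs along the rows. With axis=1 the roles swap and join decides which row labels survive. A minimal sketch, rebuilding df1 and df2 with the same make_df helper so that it runs on its own:

```python
import pandas as pd

def make_df(cols, inds):
    # Each cell is the column name followed by the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# Column-wise concatenation that keeps only row labels present in both frames
inner_rows = pd.concat([df1, df2], axis=1, join="inner")
print(inner_rows)
```

Only the row labels 2 and 4 appear in both frames, so the result has two rows and all seven columns.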

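2. The merge function

how: the type of join. left keeps every key from the left DataFrame; right keeps every key from the right DataFrame; outer keeps the union of keys from both; inner keeps only the intersection of keys. Defaults to 'inner'.
on: the column name(s) to join on; they must exist in both DataFrames.
left_on/right_on: the key column name(s) in the left/right DataFrame, for use when the names differ.
left_index/right_index: whether to use the index of the left/right DataFrame as its join key; True means yes. Can be combined with left_on/right_on.
sort: whether to sort the result by the join keys; defaults to False.
suffixes: suffixes appended to column names that appear on both sides and are not join keys, usually given as a tuple or list; defaults to ('_x', '_y').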
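
To make the four how options concrete, the sketch below merges the same two frames on column a under each mode and counts the resulting rows. The row_counts dictionary is just an illustrative name, and df1 and df2 are rebuilt here so that the snippet runs on its own.

```python
import pandas as pd

def make_df(cols, inds):
    # Each cell is the column name followed by the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])   # keys in a: a1, a2, a4
df2 = make_df(list("abcd"), [2, 4, 6])  # keys in a: a2, a4, a6

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(df1, df2, how=how, on="a")
    row_counts[how] = len(merged)
    # Show which key values survive under this join mode
    print(how, sorted(merged["a"]))
```

The inner join keeps the two shared keys, the left and right joins keep three rows each, and the outer join keeps all four distinct keys.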

(1) Merging on columns with the same name

df3 = pd.merge(df1,df2,how='inner',on='a')        # merge on a single column
df4 = pd.merge(df1,df2,how='inner',on=['a','b'])  # merge on multiple columns
df5 = pd.merge(df1,df2,how='left',on='a',suffixes=['_1','_2']) # left join with custom suffixes
df5

	a	b_1	c_1	b_2	c_2	d
0	a1	b1	c1	NaN	NaN	NaN
1	a2	b2	c2	b2	c2	d2
2	a4	b4	c4	b4	c4	d4
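
The result of the two-column merge (df4) is not displayed above. As a rough sketch of what to expect, the snippet below repeats that call on its own and inspects the columns; since suffixes is not passed, the overlapping non-key column c should pick up the default suffixes.

```python
import pandas as pd

def make_df(cols, inds):
    # Each cell is the column name followed by the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# Both a and b are join keys, so they appear once; c overlaps and gets suffixed
df4 = pd.merge(df1, df2, how="inner", on=["a", "b"])
print(list(df4.columns))
print(df4)
```

A row is kept only when both a and b match, which here leaves the rows for a2/b2 and a4/b4.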

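(2) Merging on differently named columns, a column and an index, or two indexes

df6 = pd.merge(df1,df2,how='inner',left_on='a',right_on='b')             # differently named columns (empty here: no value in a equals a value in b)
df7 = pd.merge(df1,df22,how='inner',left_on='a',right_index=True)        # left column against right index
df8 = pd.merge(df1,df2,how='inner',left_index=True,right_index=True)    # index against index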
df8

	a_x	b_x	c_x	a_y	b_y	c_y	d
2	a2	b2	c2	a2	b2	c2	d2
4	a4	b4	c4	a4	b4	c4	d4
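
The introduction notes that join works much like merge. As a minimal sketch of that claim, the index-on-index merge above (df8) can also be written with DataFrame.join, which aligns on the index by default. The lsuffix and rsuffix values below are chosen only to match the default suffixes that merge uses.

```python
import pandas as pd

def make_df(cols, inds):
    # Each cell is the column name followed by the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# Index-on-index inner merge, as in df8
merged = pd.merge(df1, df2, how="inner", left_index=True, right_index=True)

# The same operation via join; overlapping column names need explicit suffixes
joined = df1.join(df2, how="inner", lsuffix="_x", rsuffix="_y")

print(joined)
print(joined.equals(merged))
```

Note that join defaults to how='left' while merge defaults to how='inner', so how is passed explicitly here to keep the two calls comparable.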
