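Implementing a keyword co-occurrence matrix in Python: take the keywords that appear together in each row, as shown in the figure below,
and convert them into the co-occurrence matrix shown in the following figure.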
The code is as follows:
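import pandas as pd
import numpy as np

# Read the spreadsheet; header=None gives integer column labels (0, 1, 2, ...)
data = pd.read_excel(r'E:\Python\data.xlsx', header=None)

# Column 2 holds '/'-separated keywords; build one set of keywords per row
keyword = (set(i.split('/')) for i in data.iloc[:, 2])
# All distinct keywords, as a sorted list (recent pandas versions reject a set as an index)
keyword = sorted(set.union(*keyword))

# Square matrix of zeros, labelled by keyword on both axes
togo = pd.DataFrame(np.zeros([len(keyword), len(keyword)]), columns=keyword, index=keyword)

for i in data.iloc[:, 2]:
    line = i.split('/')
    # Add 1 to every cell whose row keyword and column keyword both occur in this row
    togo.loc[line, line] = togo.loc[line, line] + 1

# Set the diagonal to 0: a keyword co-occurring with itself is not counted
for i in range(len(togo)):
    togo.iloc[i, i] = 0

togo.to_csv(r'E:\Python\togo.csv')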
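The resulting table looks like the figure above. It is too long to show in full, so the figure below gives a rough view of the information in the co-occurrence matrix.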
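To check the counting logic without the Excel file, here is a minimal sketch that uses only the standard library. The sample rows and the `cooccurrence` helper are made up for illustration and are not part of the original script. It counts each unordered keyword pair once per row and, unlike the pandas version, explicitly deduplicates keywords within a row.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample rows; each string mimics one cell of the keyword column.
rows = ["a/b/c", "a/b", "b/d"]


def cooccurrence(rows, sep="/"):
    """Count how often each unordered keyword pair shares a row."""
    pair_counts = Counter()
    vocab = set()
    for row in rows:
        # Deduplicate within a row so a repeated keyword is counted once.
        words = sorted(set(row.split(sep)))
        vocab.update(words)
        pair_counts.update(combinations(words, 2))
    keywords = sorted(vocab)
    # Symmetric matrix stored as nested dicts; the diagonal stays 0.
    matrix = {k: {j: 0 for j in keywords} for k in keywords}
    for (a, b), n in pair_counts.items():
        matrix[a][b] = n
        matrix[b][a] = n
    return keywords, matrix


keywords, matrix = cooccurrence(rows)
for k in keywords:
    print(k, [matrix[k][j] for j in keywords])
```

On a small input like this the matrix can be checked by hand, which makes it easier to trust the result on the full spreadsheet.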