Using wordcloud


 

I spent a little time learning how to use wordcloud.

The examples use the iMet dataset:

https://www.kaggle.com/c/imet-2019-fgvc6

Read train.csv and labels.csv to build two dicts: id2name (maps each label id to its label name) and attribute_count (how many times each label occurs in the training set).

import matplotlib.pyplot as plt
import numpy as np
import os
import csv

# train.csv columns: id, attribute_ids (space-separated label ids per image)
with open("train.csv", newline="") as f:
    lines = csv.reader(f)
    head_row = next(lines)  # skip the header row
    train_content = list(lines)

# Flatten every image's label list into one long list of label ids
attribute_ids = []
for line in train_content:
    attributes = line[1].split()
    for a in attributes:
        attribute_ids.append(a)

# labels.csv columns: attribute_id, attribute_name
with open("labels.csv", newline="") as f:
    lines = csv.reader(f)
    head_row = next(lines)  # skip the header row
    attribute_content = list(lines)

# Map each label id to its label name
id2name = {}
for line in attribute_content:
    if line[0] not in id2name:
        id2name[line[0]] = line[1]
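from collections import Counter

def count_list(lt):
    # Count how often each item occurs in a single pass.
    # Calling lt.count(i) for every distinct item rescans the whole list each time.
    return dict(Counter(lt))

attribute_count = count_list(attribute_ids)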
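If you want to see what the two dicts look like without downloading the dataset, here is a minimal sketch that runs the same steps on a few made-up rows. The sample ids and label names are hypothetical. The column layout (id, attribute_ids for train.csv; attribute_id, attribute_name for labels.csv) is assumed to match the Kaggle files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data in the assumed column layout
TRAIN_CSV = """id,attribute_ids
img_a,0 2 3
img_b,2 3
img_c,3
"""
LABELS_CSV = """attribute_id,attribute_name
0,culture::french
1,culture::italian
2,tag::flowers
3,tag::women
"""

def read_rows(text):
    """Parse CSV text and return its data rows, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return list(reader)

train_content = read_rows(TRAIN_CSV)
# Flatten the space-separated label ids of every image
attribute_ids = [a for line in train_content for a in line[1].split()]
id2name = {line[0]: line[1] for line in read_rows(LABELS_CSV)}
attribute_count = dict(Counter(attribute_ids))

print(id2name)
print(attribute_count)  # {'0': 1, '2': 2, '3': 3}
```

Label '1' does not appear in attribute_count because no sample training row uses it. For the same reason, a plain lookup like attribute_count[idx] can fail for real label ids.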

Sort attribute_count and print the ten most frequent labels (id and name, followed by the count):

# Sort (label_id, count) pairs by count, largest first
sorted_attribute = sorted(attribute_count.items(), key=lambda item: item[1], reverse=True)
for label_id, count in sorted_attribute[:10]:
    print(label_id, ':   ', id2name[label_id])
    print(count)
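The sort-then-slice step can also be written with collections.Counter.most_common. It returns the n most frequent items directly. If there are fewer than n labels, it just returns all of them instead of raising an error. This is a sketch on hypothetical stand-in data, and top_labels is a helper name made up for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for the dicts built from the CSV files
attribute_count = {'0': 1, '2': 2, '3': 3}
id2name = {'0': 'culture::french', '2': 'tag::flowers', '3': 'tag::women'}

def top_labels(counts, names, n=10):
    """Return (label_id, label_name, count) for the n most frequent labels."""
    return [(label_id, names[label_id], count)
            for label_id, count in Counter(counts).most_common(n)]

for label_id, name, count in top_labels(attribute_count, id2name, n=2):
    print(label_id, ':   ', name)
    print(count)
```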

For each of the ten labels, this prints the id and name, then the count on the next line.

A printed list is still not very intuitive, though. A word cloud shows word frequencies much more visually.

Required Python libraries:

seaborn, wordcloud (plus matplotlib, imported above)

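Prepare the frequency dicts. Every label name starts with either "tag::" or "culture::". The prefix is stripped, and the counts are split into one dict per prefix: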

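culture_count_dict = {}
tag_count_dict = {}
# Iterate over all label ids instead of hard-coding range(1103)
for idx, name in id2name.items():
    count = attribute_count.get(idx, 0)
    if count == 0:
        continue  # skip labels that never appear in train.csv
    if name.startswith('tag::'):
        tag_count_dict[name[len('tag::'):]] = count
    else:
        # every remaining name starts with 'culture::'
        culture_count_dict[name[len('culture::'):]] = count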
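Stripping the prefix with fixed lengths only works because "tag::" has 5 characters and "culture::" has 9. Here is a more general sketch, assuming every label name has the form prefix::name. It splits on the first "::" with str.partition and groups the counts by prefix, so it would also handle more than two prefixes. split_by_prefix is a hypothetical helper, and the data is made up.

```python
# Hypothetical label names and counts in the "prefix::name" format
id2name = {'0': 'culture::french', '1': 'culture::italian',
           '2': 'tag::flowers', '3': 'tag::women'}
attribute_count = {'0': 1, '2': 2, '3': 3}

def split_by_prefix(names, counts):
    """Group label counts by the text before '::' (e.g. 'culture', 'tag')."""
    groups = {}
    for idx, full_name in names.items():
        count = counts.get(idx, 0)
        if count == 0:
            continue  # skip labels that never appear in the training data
        prefix, _, short_name = full_name.partition('::')
        groups.setdefault(prefix, {})[short_name] = count
    return groups

groups = split_by_prefix(id2name, attribute_count)
culture_count_dict = groups.get('culture', {})
tag_count_dict = groups.get('tag', {})
print(culture_count_dict)  # {'french': 1}
print(tag_count_dict)      # {'flowers': 2, 'women': 3}
```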

Generate the word cloud images with wordcloud:

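import matplotlib.pyplot as plt
import seaborn as sns  # not used directly below
from wordcloud import WordCloud

# Shared settings: black background, matplotlib's 'Paired' colormap,
# a 1600x800 canvas, and a fixed random_state so the layout is reproducible
cloud_kwargs = dict(background_color='black', colormap='Paired',
                    width=1600, height=800, random_state=123)

# Each word is drawn at a size scaled by its frequency
culture_cloud = WordCloud(**cloud_kwargs).generate_from_frequencies(culture_count_dict)
tag_cloud = WordCloud(**cloud_kwargs).generate_from_frequencies(tag_count_dict)

# Stack the two clouds vertically: culture labels on top, tag labels below
plt.figure(figsize=(24, 24))
plt.subplot(211)
plt.imshow(culture_cloud, interpolation='bilinear')
plt.axis('off')

plt.subplot(212)
plt.imshow(tag_cloud, interpolation='bilinear')
plt.axis('off')

plt.tight_layout()
plt.show()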

 

