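Data Analysis of Western ("欧美") Playlists on NetEase Cloud Music
I. Background
Listening to music is now part of everyday life, and the medium has moved from cassette tapes to CDs and vinyl records to digital albums. The mainstream music platforms today include NetEase Cloud Music, QQ Music, Kugou Music, and Kuwo Music. NetEase Cloud Music is a music product developed by NetEase and originated at the NetEase Hangzhou Research Institute. It draws on professional musicians, DJs, friend recommendations, and social features. Its online music service emphasizes playlists, social interaction, big-name recommendations, and music fingerprinting, with playlists, DJ programs, social features, and geolocation as its core elements and discovery and sharing as its focus. For this assignment I crawl the NetEase Cloud Music platform and then analyze and visualize the collected data.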
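II. Design of the Topic-Focused Web Crawler
1. Name of the crawler
Name: NetEase Cloud Music playlist crawler
2. Content to crawl and data characteristics
Content crawled: playlist title, play count, favorite count, comment count, tags, song count
Data characteristics: web page text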
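3. Overview of the crawler design
(1) Approach
1. Parse the web page
2. Store the data
3. Inspect the page structure
4. Locate the content to crawl within the page
5. Extract the data
6. Iterate over the data
(2) Technical difficulties
1. Exception handling
2. Reading page content
3. Iterating over the data
4. Storing data in batches
5. Overall system design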
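Exception handling is listed first among the difficulties, so here is a minimal sketch of one way to approach it: retry a failed request a few times before giving up. The function name fetch_with_retry, its parameters, and the stand-in flaky_fetch are my own illustrative assumptions and do not appear in the crawler itself; in real use the fetch argument would wrap requests.get and the except clause would catch requests.RequestException.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     retries: int = 3, delay: float = 0.0) -> Optional[str]:
    """Call fetch(url) up to `retries` times; return None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # with requests, catch requests.RequestException instead
            print(f"attempt {attempt} failed for {url}: {exc}")
            time.sleep(delay)  # wait before the next attempt
    return None


# Stand-in fetch function (hypothetical): fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(url: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


result = fetch_with_retry(flaky_fetch, "https://music.163.com/")
print(result)
```

Passing the fetch function in as an argument keeps the retry logic separate from the HTTP library, which also makes it easy to try out without touching the network.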
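III. Structural Analysis of the Target Pages
(1) Structure and features of the target pages
(2) HTML page parsing
The first page (the playlist index) only needs to yield the URL of each playlist.
url: 'https://music.163.com/discover/playlist/?cat=欧美&order=hot&limit=35&offset=' + str(i)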
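As a rough sketch of how the index-page URLs can be generated, the snippet below builds one URL per page by stepping the offset in increments of 35. The total of 1330 is the upper bound used in the complete listing; the function name index_page_urls and the use of urllib.parse.urlencode (which percent-encodes the category) are my own choices for illustration, not what the crawler does.

```python
from urllib.parse import urlencode

BASE = "https://music.163.com/discover/playlist/"


def index_page_urls(total: int = 1330, page_size: int = 35, category: str = "欧美"):
    """Return the index-page URLs, one per offset 0, 35, 70, ... below `total`."""
    urls = []
    for offset in range(0, total, page_size):
        query = urlencode({
            "cat": category,      # percent-encoded as UTF-8 by urlencode
            "order": "hot",
            "limit": page_size,
            "offset": offset,
        })
        urls.append(f"{BASE}?{query}")
    return urls


urls = index_page_urls()
print(len(urls))
print(urls[0])
print(urls[-1])
```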
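The second page (the playlist detail page) needs to yield the playlist title, play count, favorite count, comment count, and song count.
Locating nodes:
Press F12 to locate the elements in the browser developer tools, then use BeautifulSoup's select() to pick out the corresponding tags.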
# Playlist title; ASCII commas become full-width commas so the CSV columns stay aligned
title = soup.select('h2')[0].get_text().replace(',', ',')
# Tag elements
tags = []
tags_message = soup.select('.u-tag i')
# Playlist description (not scraped; '无' means "none")
text = '无'
# Favorite count, with the surrounding parentheses removed
collection = soup.select('#content-operation i')[1].get_text().replace('(', '').replace(')', '')
# Play count
play = soup.select('.s-fc6')[0].get_text()
# Number of songs in the playlist
songs = soup.select('#playlist-track-count')[0].get_text()
# Comment count
comments = soup.select('#cnt_comment_count')[0].get_text()
Traversal method:
Use a for loop to extract the content.
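The selectors in this section index straight into the result of select(), for example select('h2')[0], which raises IndexError whenever a page lacks that element. A small guard could look like the sketch below. The helper text_at and the FakeNode stand-in are hypothetical names of my own; the helper only assumes that each node offers a get_text() method, as BeautifulSoup tags do.

```python
from typing import Sequence


def text_at(nodes: Sequence, index: int = 0, default: str = "") -> str:
    """Return the stripped text of nodes[index], or `default` if that node is missing."""
    if -len(nodes) <= index < len(nodes):
        return nodes[index].get_text().strip()
    return default


class FakeNode:
    """Stand-in for a BeautifulSoup tag; only get_text() is needed here."""

    def __init__(self, text: str):
        self._text = text

    def get_text(self) -> str:
        return self._text


nodes = [FakeNode(" 收藏 "), FakeNode("(1234)")]
print(text_at(nodes, 1))               # text of the second node
print(text_at(nodes, 5, default="0"))  # falls back to the default
```

With a real page this would be called as text_at(soup.select('h2'), 0, default='无'), so a missing element produces a placeholder instead of stopping the crawl.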
IV. Crawler Program Design
(1) Data crawling and collection
Saving the URLs (an excerpt from playlist() in the complete listing under item 7; i is the page offset and headers is the request-header dict):
import requests
from bs4 import BeautifulSoup

url = 'https://music.163.com/discover/playlist/?cat=欧美&order=hot&limit=35&offset=' + str(i)
response = requests.get(url=url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
# Anchor tags that carry the playlist detail-page paths
ids = soup.select('.dec a')
# List items of the playlist index page
lis = soup.select('#m-pl-container li')
print(len(lis))
# Append each detail-page path to the CSV file
with open('playlist.csv', 'a+', encoding='utf-8-sig') as f:
    for a in ids:
        url = a['href']
        print(url)
        f.write(url + '\n')

Saving the playlist contents (an excerpt from singlist(); data is the DataFrame read from playlist.csv):
for i in data['url']:
    # time.sleep(2)
    url = 'https://music.163.com' + str(i)
    print(url)
    response = requests.get(url=url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Playlist title; ASCII commas become full-width commas so the CSV columns stay aligned
    title = soup.select('h2')[0].get_text().replace(',', ',')
    # Collect the tag texts
    tags = [p.get_text() for p in soup.select('.u-tag i')]
    # Join the tags with '-'; fall back to '无' ("none") when a playlist has no tags
    tag = '-'.join(tags) if tags else '无'
    # Playlist description (not scraped)
    text = '无'
    # Favorite count, with the surrounding parentheses removed
    collection = soup.select('#content-operation i')[1].get_text().replace('(', '').replace(')', '')
    # Play count
    play = soup.select('.s-fc6')[0].get_text()
    # Number of songs in the playlist
    songs = soup.select('#playlist-track-count')[0].get_text()
    # Comment count
    comments = soup.select('#cnt_comment_count')[0].get_text()
    # Print the detail-page information
    print(title, tag, text, collection, play, songs, comments)
    # Append the detail-page information to the CSV file
    with open('muslist_message.csv', 'a', encoding='utf-8-sig') as f:
        f.write(','.join([title, tag, text, collection, play, songs, comments]) + '\n')
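The crawler builds each CSV row by joining strings with commas, which is why commas in the title have to be replaced first. As an alternative sketch, Python's csv module quotes such fields automatically. The column names below match the header row used by the crawler; the sample row is made-up data for illustration only, and write_rows is a hypothetical helper name.

```python
import csv
import io

FIELDS = ["title", "tag", "text", "collection", "play", "songs", "comments"]


def write_rows(handle, rows):
    """Write a header row and the given dict rows as CSV, quoting fields when needed."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for open('muslist_message.csv', 'w', newline='', ...).
buffer = io.StringIO()
write_rows(buffer, [{
    "title": "Rock, Pop & More",  # illustrative title containing a comma
    "tag": "欧美-流行",
    "text": "无",
    "collection": "1234",
    "play": "56789",
    "songs": "40",
    "comments": "12",
}])

# Reading the data back shows that the comma inside the title survived intact.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["title"])
```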
Demonstration of the crawler running
(2) Data cleaning and processing
1. Importing the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the scraped playlist data (adjust the path for your own machine)
muc = pd.read_csv(r'C:/Users/Bling/Desktop/muslist_message1.csv')
2. Data cleaning
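# Drop duplicate playlists (same title)
muc = muc.drop_duplicates('title')
# Drop the unused 'text' column
muc = muc.drop(['text'], axis=1)
# Replace the placeholder labels '评论' (comments) and '收藏' (favorites) with '0'
muc = muc.replace({'评论': '0', '收藏': '0'})
# Convert the count columns to numbers; values that cannot be parsed become NaN
for col in ['collection', 'play', 'songs', 'comments']:
    muc[col] = pd.to_numeric(muc[col], errors='coerce')
# Drop rows that contain NaN
muc = muc.dropna(axis=0)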
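To make the replacement step concrete, here is a small pure-Python sketch of turning one scraped count string into an integer. The placeholder labels '评论' and '收藏' come from the cleaning code; the handling of parentheses and of a trailing '万' (ten thousand) is an assumption on my part about how counts might appear on the page, and to_count is a hypothetical helper name.

```python
def to_count(raw, placeholders=("评论", "收藏")) -> int:
    """Convert a scraped count string to an int, treating placeholder labels as 0."""
    text = str(raw).strip().strip("()")
    if text == "" or text in placeholders:
        return 0
    if text.endswith("万"):  # assumed abbreviation: 1.5万 means 15000
        return int(round(float(text[:-1]) * 10000))
    return int(text)


for sample in ["评论", "(1234)", "1.5万", " 42 "]:
    print(repr(sample), to_count(sample))
```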
3. Text analysis
# Word cloud
import numpy as np
import wordcloud as wc
from PIL import Image
import matplotlib.pyplot as plt

# Use the image as the mask that shapes the cloud
mask = np.array(Image.open("wyymuc.jpg"))
word_cloud = wc.WordCloud(
    mask=mask,
    background_color='mintcream',
    font_path='msyhbd.ttc',  # a font with Chinese glyphs, needed to render the tags
    max_font_size=300,
    random_state=50,
)
# Join the tag strings of all playlists into one text
text = " ".join(muc["tag"].astype(str))
word_cloud.generate(text)
plt.imshow(word_cloud)
plt.show()
Tag analysis:
from collections import Counter

import squarify
import pandas as pd
import matplotlib.pyplot as plt

# Font settings, applied before drawing so that the labels pick them up
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['font.size'] = 6
plt.rcParams['axes.unicode_minus'] = False

# Read the file with its header row so the column name 'tag' is not counted as a tag
df = pd.read_csv(r'C:/Users/Bling/Desktop/muslist_message1.csv')
# Count how often each tag occurs ('-' separates the tags of one playlist)
tag_counts = Counter()
for entry in df['tag'].dropna():
    tag_counts.update(entry.split('-'))
# Keep the ten most frequent tags
df1 = pd.DataFrame(tag_counts.most_common(10), columns=['tags', 'num'])
name = df1['tags']
income = df1['num']
# Treemap colors
colors = ['#FFB6C1', '#B0C4DE', '#FFFF00', '#FF4500', '#DCDCDC', '#009966', '#FF6600', '#FF0033', '#009999', '#333366']
plot = squarify.plot(sizes=income, label=name, color=colors, alpha=1, value=income, edgecolor='white', linewidth=1.5)
# Title
plot.set_title('Tags of Western playlists on NetEase Cloud Music', fontsize=13, fontweight='light')
# Hide the axes
plt.axis('off')
# Show the figure
plt.show()
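The tag counting itself can be tried out on its own with collections.Counter. The sketch below uses three made-up tag strings in the same '-'-separated format that the crawler writes; count_tags is a hypothetical helper name.

```python
from collections import Counter


def count_tags(tag_strings, sep="-"):
    """Count how often each tag occurs across '-'-separated tag strings."""
    counter = Counter()
    for entry in tag_strings:
        counter.update(t for t in entry.split(sep) if t)
    return counter


sample = ["欧美-流行-摇滚", "欧美-电子", "欧美"]  # illustrative data
counts = count_tags(sample)
print(counts.most_common(1))
```

Counter.most_common(n) returns the n most frequent tags in descending order, which is what the treemap needs for its top ten.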
4. Data analysis and visualization
Histograms
# Font settings for the figure (STXihei supplies Chinese glyphs)
plt.rcParams['font.sans-serif'] = ['STXihei']
plt.rcParams['font.size'] = 12
plt.rcParams['axes.unicode_minus'] = False
# Create the figure with a white background
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
# The x-axis holds the play count; the y-axis holds the number of playlists in each bin
ax.set_xlabel('Play count')
ax.set_ylabel('Number of playlists')
# Draw the histogram
ax.hist(muc['play'], bins=30, alpha=0.7, color='plum')
ax.set_title('Distribution of play counts for Western playlists', fontsize=20)
# Show the figure
plt.show()
# Font settings for the figure (STXihei supplies Chinese glyphs)
plt.rcParams['font.sans-serif'] = ['STXihei']
plt.rcParams['font.size'] = 12
plt.rcParams['axes.unicode_minus'] = False
# Create the figure with a white background
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
# The x-axis holds the comment count; the y-axis holds the number of playlists in each bin
ax.set_xlabel('Comment count')
ax.set_ylabel('Number of playlists')
# Draw the histogram
ax.hist(muc['comments'], bins=30, alpha=0.7, color='wheat')
ax.set_title('Distribution of comment counts for Western playlists', fontsize=20)
# Show the figure
plt.show()
# Font settings for the figure (STXihei supplies Chinese glyphs)
plt.rcParams['font.sans-serif'] = ['STXihei']
plt.rcParams['font.size'] = 12
plt.rcParams['axes.unicode_minus'] = False
# Create the figure with a white background
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
# The x-axis holds the favorite count; the y-axis holds the number of playlists in each bin
ax.set_xlabel('Favorite count')
ax.set_ylabel('Number of playlists')
# Draw the histogram
ax.hist(muc['collection'], bins=30, alpha=0.7, color='m')
ax.set_title('Distribution of favorite counts for Western playlists', fontsize=20)
# Show the figure
plt.show()
Horizontal bar charts
# Horizontal bar chart of the first 20 playlists
plt.rcParams['font.sans-serif'] = ['STXihei']
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
x = muc['collection'].head(20)
y = muc['title'].head(20)
plt.barh(y, x, alpha=0.2, color='tan', label="Popularity", lw=3)
plt.xticks(rotation=90)
plt.title("Western playlist trends on NetEase Cloud Music", fontsize=18)
plt.legend(loc="best")  # legend
plt.show()
# Sort by favorite count, descending
muc.sort_values(by='collection', inplace=True, ascending=False)
# Horizontal bar chart of the ten most-favorited playlists
plt.rcParams['font.sans-serif'] = ['STXihei']
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
x = muc['collection'].head(10)
y = muc['title'].head(10)
plt.barh(y, x, alpha=0.2, color='pink', label="Popularity", lw=3)
plt.xticks(rotation=90)
plt.title("Top 10 most popular Western playlists on NetEase Cloud Music", fontsize=18)
plt.legend(loc="best")  # legend
plt.show()
Scatter plot
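# Scatter plot of play count against favorite count
x = muc['collection']
y = muc['play']
fig = plt.figure(figsize=(18, 7))
plt.scatter(x, y, color='lightgreen', marker='o', s=40, alpha=0.5)
plt.xticks(rotation=90)
plt.xlabel('Favorite count')
plt.ylabel('Play count')
plt.title("Scatter plot of NetEase Cloud Music playlists")
plt.show()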
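A scatter plot shows the relationship between favorites and plays only visually. One way to put a number on it is the Pearson correlation coefficient, sketched below in plain Python. The function name pearson and the sample values are illustrative assumptions; with the real data one could pass muc['collection'] and muc['play'] converted to lists.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values: a perfectly increasing and a perfectly decreasing relationship.
print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

A value near 1 would mean that playlists with more favorites also tend to have more plays, while a value near 0 would mean no linear relationship.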
Box plot
# Box plot of the play counts
plt.boxplot(muc['play'])
plt.title("Box plot of NetEase Cloud Music playlists")
plt.show()
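For readers unfamiliar with what a box plot encodes, the sketch below computes the same ingredients by hand: the quartiles, the interquartile range, and the points beyond 1.5 times that range, which a box plot conventionally draws as outliers. The function name five_number_summary and the sample values are assumptions for illustration; statistics.quantiles requires Python 3.8 or later.

```python
import statistics


def five_number_summary(values):
    """Return the quartiles, IQR, and outliers using the usual 1.5 * IQR rule."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between the data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Illustrative play counts with one extreme value.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```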
5. Linear regression equation
Scatter plot
# Scatter plot with a fitted regression line
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import warnings

warnings.filterwarnings("ignore")
four = pd.read_csv('C:/Users/Bling/Desktop/muslist_message1.csv')
# Plot song count against play count and overlay the fitted line
sns.regplot(x='play', y='songs', data=four, color='r')
plt.show()
Building the linear regression model
# Build the linear regression model
from sklearn.linear_model import LinearRegression
import pandas as pd

predict_model = LinearRegression()
three = pd.read_csv('C:/Users/Bling/Desktop/muslist_message1.csv')
# scikit-learn expects a 2-D feature matrix, hence the reshape
X = three['play'].values.reshape(-1, 1)
predict_model.fit(X, three['songs'])
# coef_ is an array with one entry per feature; intercept_ is a scalar here
a = predict_model.coef_[0]
b = predict_model.intercept_
print("Regression coefficient: {:.6g}".format(a))
print("Regression intercept: {:.2f}".format(b))
print("Fitted linear model: y = {:.6g} * x + {:.2f}".format(a, b))
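For a single feature, the coefficient and intercept that LinearRegression reports can also be obtained from the closed-form least-squares formulas: the slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x) squared, and the intercept is mean_y - slope * mean_x. The sketch below applies them to a few made-up points lying on y = 2x + 1; fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("y = {:.2f} * x + {:.2f}".format(slope, intercept))
```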
6. Data persistence
7. Complete code
4.5.1 Crawler system
from bs4 import BeautifulSoup
import requests
import time
import pandas as pd

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}


def playlist():
    """Collect the detail-page paths of the playlists and append them to playlist.csv."""
    count = 1
    for i in range(0, 1330, 35):
        print("Crawling page {}.".format(count))
        time.sleep(2)
        url = 'https://music.163.com/discover/playlist/?cat=欧美&order=hot&limit=35&offset=' + str(i)
        response = requests.get(url=url, headers=HEADERS)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        # Anchor tags that carry the playlist detail-page paths
        ids = soup.select('.dec a')
        # List items of the playlist index page
        lis = soup.select('#m-pl-container li')
        print(len(lis))
        # Append each detail-page path to the CSV file
        with open('playlist.csv', 'a+', encoding='utf-8-sig') as f:
            for a in ids:
                url = a['href']
                print(url)
                f.write(url + '\n')
        count += 1


def singlist():
    """Visit every playlist detail page and append its fields to muslist_message.csv."""
    # Write the header row (no trailing comma, so the header has as many columns as the data)
    with open("muslist_message.csv", "a", encoding='utf-8-sig') as file:
        file.write("title,tag,text,collection,play,songs,comments\n")
    # on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which pandas 2.0 removed
    data = pd.read_csv('playlist.csv', header=None, on_bad_lines='skip', names=['url'])

    for i in data['url']:
        # time.sleep(2)
        url = 'https://music.163.com' + str(i)
        print(url)
        response = requests.get(url=url, headers=HEADERS)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        # Playlist title; ASCII commas become full-width commas so the CSV columns stay aligned
        title = soup.select('h2')[0].get_text().replace(',', ',')
        # Collect the tag texts
        tags = [p.get_text() for p in soup.select('.u-tag i')]
        # Join the tags with '-'; fall back to '无' ("none") when a playlist has no tags
        tag = '-'.join(tags) if tags else '无'
        # Playlist description (not scraped)
        text = '无'
        # Favorite count, with the surrounding parentheses removed
        collection = soup.select('#content-operation i')[1].get_text().replace('(', '').replace(')', '')
        # Play count
        play = soup.select('.s-fc6')[0].get_text()
        # Number of songs in the playlist
        songs = soup.select('#playlist-track-count')[0].get_text()
        # Comment count
        comments = soup.select('#cnt_comment_count')[0].get_text()
        # Print the detail-page information
        print(title, tag, text, collection, play, songs, comments)
        # Append the detail-page information to the CSV file
        with open('muslist_message.csv', 'a', encoding='utf-8-sig') as f:
            f.write(','.join([title, tag, text, collection, play, songs, comments]) + '\n')


if __name__ == '__main__':
    # playlist()  # run this once first to build playlist.csv
    singlist()

4.5.2 Data analysis
from collections import Counter

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import squarify
import wordcloud as wc
from PIL import Image
from sklearn.linear_model import LinearRegression

CSV_PATH = r'C:/Users/Bling/Desktop/muslist_message1.csv'
muc = pd.read_csv(CSV_PATH)

# Drop duplicate playlists (same title)
muc = muc.drop_duplicates('title')
# Drop the unused 'text' column
muc = muc.drop(['text'], axis=1)
# Replace the placeholder labels '评论' (comments) and '收藏' (favorites) with '0'
muc = muc.replace({'评论': '0', '收藏': '0'})
# Convert the count columns to numbers; values that cannot be parsed become NaN
for col in ['collection', 'play', 'songs', 'comments']:
    muc[col] = pd.to_numeric(muc[col], errors='coerce')
# Drop rows that contain NaN
muc = muc.dropna(axis=0)
# Sort by play count, descending
muc.sort_values(by='play', inplace=True, ascending=False)

# Font settings for the figures (STXihei supplies Chinese glyphs)
plt.rcParams['font.sans-serif'] = ['STXihei']
plt.rcParams['font.size'] = 12
plt.rcParams['axes.unicode_minus'] = False


def plot_hist(column, xlabel, title, color):
    """Draw a 30-bin histogram of one count column."""
    fig = plt.figure(figsize=(16, 8), dpi=80)
    ax = plt.subplot(1, 1, 1)
    ax.patch.set_color('white')
    ax.set_xlabel(xlabel)
    ax.set_ylabel('Number of playlists')
    ax.hist(muc[column], bins=30, alpha=0.7, color=color)
    ax.set_title(title, fontsize=20)
    plt.show()


# Histograms of play, comment, and favorite counts
plot_hist('play', 'Play count', 'Distribution of play counts for Western playlists', 'plum')
plot_hist('comments', 'Comment count', 'Distribution of comment counts for Western playlists', 'wheat')
plot_hist('collection', 'Favorite count', 'Distribution of favorite counts for Western playlists', 'm')

# Horizontal bar chart of the first 20 playlists
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
x = muc['collection'].head(20)
y = muc['title'].head(20)
plt.barh(y, x, alpha=0.2, color='tan', label="Popularity", lw=3)
plt.xticks(rotation=90)
plt.title("Western playlist trends on NetEase Cloud Music", fontsize=18)
plt.legend(loc="best")  # legend
plt.show()

# Sort by favorite count, descending
muc.sort_values(by='collection', inplace=True, ascending=False)
# Horizontal bar chart of the ten most-favorited playlists
fig = plt.figure(figsize=(16, 8), dpi=80)
ax = plt.subplot(1, 1, 1)
ax.patch.set_color('white')
x = muc['collection'].head(10)
y = muc['title'].head(10)
plt.barh(y, x, alpha=0.2, color='pink', label="Popularity", lw=3)
plt.xticks(rotation=90)
plt.title("Top 10 most popular Western playlists on NetEase Cloud Music", fontsize=18)
plt.legend(loc="best")  # legend
plt.show()

# Scatter plot of play count against favorite count
x = muc['collection']
y = muc['play']
fig = plt.figure(figsize=(18, 7))
plt.scatter(x, y, color='lightgreen', marker='o', s=40, alpha=0.5)
plt.xticks(rotation=90)
plt.xlabel('Favorite count')
plt.ylabel('Play count')
plt.title("Scatter plot of NetEase Cloud Music playlists")
plt.show()

# Box plot of the play counts
plt.boxplot(muc['play'])
plt.title("Box plot of NetEase Cloud Music playlists")
plt.show()

# Scatter plot with a fitted regression line (song count against play count)
four = pd.read_csv(CSV_PATH)
sns.regplot(x='play', y='songs', data=four, color='r')
plt.show()

# Build the linear regression model
predict_model = LinearRegression()
three = pd.read_csv(CSV_PATH)
# scikit-learn expects a 2-D feature matrix, hence the reshape
X = three['play'].values.reshape(-1, 1)
predict_model.fit(X, three['songs'])
a = predict_model.coef_[0]
b = predict_model.intercept_
print("Regression coefficient: {:.6g}".format(a))
print("Regression intercept: {:.2f}".format(b))
print("Fitted linear model: y = {:.6g} * x + {:.2f}".format(a, b))

# Tag treemap; font settings are applied before drawing
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['font.size'] = 6
df = pd.read_csv(CSV_PATH)
# Count how often each tag occurs ('-' separates the tags of one playlist)
tag_counts = Counter()
for entry in df['tag'].dropna():
    tag_counts.update(entry.split('-'))
# Keep the ten most frequent tags
df1 = pd.DataFrame(tag_counts.most_common(10), columns=['tags', 'num'])
name = df1['tags']
income = df1['num']
colors = ['#FFB6C1', '#B0C4DE', '#FFFF00', '#FF4500', '#DCDCDC', '#009966', '#FF6600', '#FF0033', '#009999', '#333366']
plot = squarify.plot(sizes=income, label=name, color=colors, alpha=1, value=income, edgecolor='white', linewidth=1.5)
plot.set_title('Tags of Western playlists on NetEase Cloud Music', fontsize=13, fontweight='light')
# Hide the axes
plt.axis('off')
plt.show()

# Word cloud; the image is used as the mask that shapes the cloud
mask = np.array(Image.open("wyymuc.jpg"))
word_cloud = wc.WordCloud(
    mask=mask,
    background_color='mintcream',
    font_path='msyhbd.ttc',  # a font with Chinese glyphs, needed to render the tags
    max_font_size=300,
    random_state=50,
)
text = " ".join(muc["tag"].astype(str))
word_cloud.generate(text)
plt.imshow(word_cloud)
plt.show()
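V. Summary
1. What conclusions can be drawn from analyzing and visualizing the data? Were the expected goals met?
Analyzing the play, favorite, and comment counts gives a clear picture of how popular a playlist is, so the expected goals were met.
2. What did I gain from completing this design, and what could be improved?
I learned a great deal about data processing during this project and was able to achieve the results I wanted. Besides gaining a better understanding of data analysis, I became more familiar with the programming language it relies on.
The main area for improvement is that I cannot yet write the programs entirely on my own and still need to draw on online resources or books. I also write code slowly, so the work took a long time.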