[Repost] Movie Recommendation in Python

This article is a repost (see the original). 2018-01-01 16:51 | 1867 | python / machine learning

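Collaborative filtering is a common technique in recommender systems. Depending on which dimension is analyzed, it can produce either "user-based" or "item-based" recommendations.

What follows is a concrete way to implement movie recommendation in Python. The dataset comes from the book "Programming Collective Intelligence"; the implementation is entirely my own (the implementation in the book is rather fragmented and hard to follow).

I use the item-based approach here. In most cases the number of items is fairly small while the number of users is very large, so an item-based recommender keeps the amount of computation down.

The basic idea is simple: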

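First, read the data and build the user-movie matrix, as shown in the figure: each entry is the rating that a user (horizontal axis) gave to a particular movie (vertical axis).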
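A minimal sketch of this first step, assuming the ratings arrive as a nested dict of the form {user: {movie: rating}}. The user and movie names here are made-up placeholders, not part of the dataset used in the article.

```python
import pandas as pd

# Hypothetical toy ratings: {user: {movie: rating}}
ratings = {
    "alice": {"Movie A": 3.0, "Movie B": 4.0},
    "bob": {"Movie A": 2.0, "Movie C": 5.0},
}

# A dict of dicts becomes a DataFrame whose columns are the outer keys
# (users) and whose index holds the inner keys (movies).
movie_by_user = pd.DataFrame(ratings)

# Transpose to get one row per user and one column per movie.
# Movies a user has not rated show up as NaN.
user_by_movie = movie_by_user.T

print(user_by_movie)
```

Whether to keep the missing entries as NaN or fill them with 0 is a design choice; the program further down fills them with 0 and treats 0 as "not rated".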

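Second, use the user-movie matrix to compute the correlation coefficient between every pair of movies (usually the Pearson correlation coefficient), which gives a movie-movie correlation matrix.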
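As an illustration of this step, the sketch below computes Pearson correlations between movie columns with pandas and rescales them from [-1, 1] to [0, 1]. The ratings are invented so that the expected correlations are obvious. `DataFrame.corr` is used here as an alternative to `np.corrcoef`; on data without missing values the two agree.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows are users, columns are movies.
ratings = pd.DataFrame(
    {
        "Movie A": [1.0, 2.0, 3.0, 4.0],
        "Movie B": [2.0, 4.0, 6.0, 8.0],  # rises with Movie A
        "Movie C": [4.0, 3.0, 2.0, 1.0],  # falls as Movie A rises
    },
    index=["u1", "u2", "u3", "u4"],
)

# Pearson correlation between every pair of columns (movies).
movie_corr = ratings.corr(method="pearson")

# Map correlations from [-1, 1] onto [0, 1] so they can serve as weights.
movie_sim = 0.5 + 0.5 * movie_corr

print(movie_sim.round(3))
```

One difference worth knowing: `DataFrame.corr` skips missing values pair by pair, whereas filling unrated entries with 0 before calling `np.corrcoef` makes those zeros count as real ratings.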

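Third, use the movie-movie correlation matrix together with the ratings a user has already given to estimate, as a weighted average, the ratings for the movies that user has not rated. For example, suppose a user gave movie A a 3 and movie B a 4 but has not rated movie C, and the correlations of C with A and B are 0.3 and 0.8. The estimated rating for C is then (0.3*3+0.8*4)/(0.3+0.8), which is about 3.73.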
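The arithmetic of that example can be checked with a few lines of Python. `predict_rating` is a hypothetical helper written only for this illustration.

```python
def predict_rating(ratings, similarities):
    """Similarity-weighted average of the ratings a user has already given.

    ratings[i] is the user's rating of a movie they have seen, and
    similarities[i] is that movie's similarity to the target movie.
    """
    weighted_sum = sum(r * s for r, s in zip(ratings, similarities))
    sim_sum = sum(similarities)
    if sim_sum == 0:
        return None  # no usable neighbours, so no estimate
    return weighted_sum / sim_sum


# Movie A rated 3 (similarity 0.3 to C), movie B rated 4 (similarity 0.8 to C).
estimate = predict_rating([3, 4], [0.3, 0.8])
print(round(estimate, 2))  # 3.73
```

Because B is much more similar to C than A is, the estimate lands closer to the 4 given to B than to the 3 given to A.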

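Finally, for each user, take the movies they have not rated, sort them by estimated rating in descending order, and return the top n as recommendations.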
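A small sketch of this last step, assuming the estimated ratings for one user are already available as a pandas Series. The function name `top_n_unrated` and the sample values are made up for illustration.

```python
import pandas as pd


def top_n_unrated(user_ratings, predicted, n):
    """Return the n unrated movies with the highest estimated rating.

    user_ratings: the user's actual ratings (0 or NaN means "not rated").
    predicted: estimated ratings for the same movies.
    """
    unrated = user_ratings[user_ratings.isna() | (user_ratings == 0)].index
    return predicted.loc[unrated].sort_values(ascending=False).head(n)


# Hypothetical user who has only rated Movie A.
user_ratings = pd.Series({"Movie A": 3.0, "Movie B": 0.0, "Movie C": 0.0, "Movie D": 0.0})
predicted = pd.Series({"Movie A": 3.0, "Movie B": 2.1, "Movie C": 4.2, "Movie D": 3.3})

top2 = top_n_unrated(user_ratings, predicted, 2)
print(top2)
```

Movie A is excluded because the user has already rated it, so the two recommendations are Movie C and Movie D.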

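The source code follows. The block comments are fairly detailed and should make the role of each part clear. The program uses the pandas and numpy libraries, which keeps it concise: operations such as computing correlation coefficients and sorting are already implemented there and can be used directly.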

import pandas as pd
import numpy as np

# read the data
data = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'The Night Listener': 3.0},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                     'The Night Listener': 4.5, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Just My Luck': 2.0, 'Lady in the Water': 3.0,
                     'Superman Returns': 3.0, 'The Night Listener': 3.0,
                     'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
                      'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}

# clean & transform the data: columns are users, rows are movies
data = pd.DataFrame(data)
# 0 means "not rated"
data = data.fillna(0)
# transpose so that each row is a user and each column is a movie
mdata = data.T

# calculate the similarity between movies, then rescale it from [-1, 1] to [0, 1]
# (note: the 0 placeholders for unrated movies take part in the correlation)
np.set_printoptions(precision=3)
mcors = np.corrcoef(mdata, rowvar=False)
mcors = 0.5 + mcors * 0.5
mcors = pd.DataFrame(mcors, columns=mdata.columns, index=mdata.columns)

# calculate the score of one movie for one user
# matrix: the user-movie matrix
# mcors: the movie-movie correlation matrix
# item: the movie id
# user: the user id
# returns: the score of the movie for the given user
def cal_score(matrix, mcors, item, user):
    totscore = 0
    totsims = 0
    score = 0
    if pd.isnull(matrix.loc[user, item]) or matrix.loc[user, item] == 0:
        # not rated yet: similarity-weighted average over the movies the user has rated
        for mitem in matrix.columns:
            if matrix.loc[user, mitem] == 0:
                continue
            else:
                totscore += matrix.loc[user, mitem] * mcors.loc[mitem, item]
                totsims += mcors.loc[mitem, item]
        score = totscore / totsims
    else:
        # already rated: keep the user's own rating
        score = matrix.loc[user, item]
    return score

# calculate the score matrix
# matrix: the user-movie matrix
# mcors: the movie-movie correlation matrix
# returns: score_matrix, the score of every movie for every user
def cal_matscore(matrix, mcors):
    score_matrix = np.zeros(matrix.shape)
    score_matrix = pd.DataFrame(score_matrix, columns=matrix.columns, index=matrix.index)
    for mitem in score_matrix.columns:
        for muser in score_matrix.index:
            # use .loc for assignment; chained indexing may write to a copy
            score_matrix.loc[muser, mitem] = cal_score(matrix, mcors, mitem, muser)
    return score_matrix

# give recommendations based on the score matrix
# matrix: the user-movie matrix
# score_matrix: the score of every movie for every user
# user: the user id
# n: the number of recommendations
def recommend(matrix, score_matrix, user, n):
    # .ix has been removed from pandas; .loc selects the row by label
    user_ratings = matrix.loc[user]
    not_rated_item = user_ratings[user_ratings == 0]
    recom_items = {}
    # e.g. recom_items = {'a': 1, 'b': 7, 'c': 3}
    for item in not_rated_item.index:
        recom_items[item] = score_matrix.loc[user, item]
    recom_items = pd.Series(recom_items)
    recom_items = recom_items.sort_values(ascending=False)
    return recom_items[:n]

# main (Python 3: input() returns a string and print is a function)
score_matrix = cal_matscore(mdata, mcors)
for i in range(10):
    user = input(str(i) + ' please input the name of user:')
    print(recommend(mdata, score_matrix, user, 2))
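
The double loop in `cal_matscore` can also be written as two matrix products. The following is only a sketch under the same conventions as the program above (0 means "not rated", similarities are non-negative); `predict_all` and the toy data are hypothetical and not part of the original program.

```python
import numpy as np
import pandas as pd


def predict_all(ratings, sims):
    """Fill every unrated entry with a similarity-weighted average.

    ratings: users x movies DataFrame, 0 means "not rated".
    sims: symmetric movies x movies similarity DataFrame.
    """
    r = ratings.to_numpy(dtype=float)
    s = sims.loc[ratings.columns, ratings.columns].to_numpy(dtype=float)
    rated = (r > 0).astype(float)

    weighted_sum = r @ s   # sum of rating * similarity over rated movies
    sim_sum = rated @ s    # sum of similarity over rated movies
    with np.errstate(divide="ignore", invalid="ignore"):
        predicted = np.where(sim_sum > 0, weighted_sum / sim_sum, 0.0)

    # Keep real ratings, use the estimate only where nothing was rated.
    filled = np.where(r > 0, r, predicted)
    return pd.DataFrame(filled, index=ratings.index, columns=ratings.columns)


movies = ["A", "B", "C"]
toy_ratings = pd.DataFrame([[3, 4, 0], [0, 2, 5]], index=["u1", "u2"], columns=movies)
toy_sims = pd.DataFrame(
    [[1.0, 0.5, 0.3], [0.5, 1.0, 0.8], [0.3, 0.8, 1.0]], index=movies, columns=movies
)

toy_scores = predict_all(toy_ratings, toy_sims)
print(toy_scores.round(3))
```

For user u1 this reproduces the worked example from the text: the estimate for movie C is (0.3*3+0.8*4)/(0.3+0.8).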
