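Mahout wraps collaborative filtering algorithms behind a ready-made API. Let's look at a simple user-based collaborative filtering example.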
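User-based: use each user's preferences for items to find that user's nearest neighbors in taste, then infer what the user is likely to prefer from the neighbors' preferences and recommend accordingly.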
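To make "nearest neighbors in taste" concrete, here is a minimal sketch of a Pearson correlation computed over the items two users have both rated. It is plain Java and does not call Mahout; the class name PearsonSketch and its helper methods are made up for illustration, and the preference values are copied from the test data listed later in this post. Mahout's own PearsonCorrelationSimilarity may differ in edge-case handling.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: Pearson correlation over co-rated items. */
class PearsonSketch {

    /** Returns the correlation over items both users rated, or NaN if it is undefined. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // only items rated by both users count
            }
            double x = e.getValue();
            double y = other;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
            n++;
        }
        if (n < 2) {
            return Double.NaN; // fewer than two shared items: no correlation
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    /** Hypothetical helper: builds an item-to-preference map from parallel arrays. */
    static Map<Long, Double> prefs(long[] items, double[] values) {
        Map<Long, Double> m = new HashMap<>();
        for (int i = 0; i < items.length; i++) {
            m.put(items[i], values[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = prefs(new long[] {101, 102, 103}, new double[] {5, 3, 2.5});
        Map<Long, Double> user2 = prefs(new long[] {101, 102, 103, 104}, new double[] {2, 2.5, 5, 2});
        Map<Long, Double> user5 = prefs(new long[] {101, 102, 103, 104, 105, 106},
                new double[] {4, 3, 2, 4, 3.5, 4});
        System.out.println("sim(1,5) = " + pearson(user1, user5)); // roughly 0.9449
        System.out.println("sim(1,2) = " + pearson(user1, user2)); // roughly -0.7643
    }
}
```

Users 1 and 5 rank items 101 to 103 in the same order, so their correlation is close to 1, while users 1 and 2 rank them in nearly opposite order and get a negative value.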
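All data used by the program is stored in a MySQL database, and the computed recommendations are written back to MySQL, into a separate table for each user.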
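As a side note on the per-user result tables, the following sketch shows one way the SQL could be prepared with placeholders rather than string concatenation. The class ResultTableSql and its methods are hypothetical; the table layout (item_id bigint primary key, value float) and the "user_" naming convention are taken from the program below, while "if not exists" and the "?" placeholders are my own assumptions about how one might tidy it up.

```java
/** Illustrative sketch only: SQL text for the per-user recommendation tables. */
class ResultTableSql {

    /** Table name for a user, following the "user_" + id convention of the program. */
    static String tableName(long userId) {
        if (userId < 0) {
            throw new IllegalArgumentException("user id must be non-negative");
        }
        return "user_" + userId;
    }

    /** DDL that creates the table only when it is missing. */
    static String createSql(long userId) {
        return "create table if not exists " + tableName(userId)
                + " (item_id bigint primary key, value float)";
    }

    /** Insert statement meant for java.sql.PreparedStatement (two placeholders). */
    static String insertSql(long userId) {
        return "insert into " + tableName(userId) + " (item_id, value) values (?, ?)";
    }

    public static void main(String[] args) {
        System.out.println(createSql(1));
        System.out.println(insertSql(1));
    }
}
```

With a PreparedStatement built from insertSql, the item ID and the estimated value would be bound with setLong(1, ...) and setFloat(2, ...) instead of being spliced into the SQL text.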
package com.mahout.helloworlddemo;

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import com.mahout.util.DBUtil;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

/**
 * User-based recommender that reads preferences from MySQL and writes
 * each user's recommendations to a per-user table.
 *
 * @author wxisme
 * @time 2015-9-13 6:25:26 PM
 */
public class RecommenderIntroFromMySQL {

    public static void main(String[] args) throws Exception {
        // Connect to MySQL
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setServerName("localhost");
        dataSource.setUser("root");
        dataSource.setPassword("1234");
        dataSource.setDatabaseName("mahoutdemo");

        // Build the data model from the preference table
        JDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource,
                "taste_preferences", "user_id", "item_id", "preference", "time");
        DataModel model = dataModel;

        // Compute user-to-user similarity
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Define the neighborhood: the 2 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        // Build the recommender
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        Connection con = DBUtil.getConnection();
        Statement stmt = con.createStatement();

        // Fetch up to 3 recommendations per user and store them in the database.
        // User IDs in the test data run from 1 to 5; asking for a user ID that
        // is not in the data (such as 0) would fail.
        for (int i = 1; i <= 5; i++) {
            List<RecommendedItem> recommendations = recommender.recommend(i, 3);
            String tableName = "user_" + i;
            for (RecommendedItem recommendation : recommendations) {
                // Create the user's table on the first recommendation
                if (!doesTableExist(tableName)) {
                    String createSQL = "create table " + tableName
                            + " (item_id bigint primary key,value float);";
                    stmt.execute(createSQL);
                }
                String insertSQL = "insert into " + tableName + " values ("
                        + recommendation.getItemID() + "," + recommendation.getValue() + " );";
                // Insert the recommendation for this user
                stmt.execute(insertSQL);
                System.out.println(recommendation);
            }
        }

        // Release the JDBC resources used for writing results
        stmt.close();
        con.close();
    }

    /**
     * Checks whether the given table exists.
     *
     * @param tablename name of the table to look for
     * @return true if the table exists
     * @throws SQLException if the metadata lookup fails
     */
    public static boolean doesTableExist(String tablename) throws SQLException {
        HashSet<String> set = new HashSet<String>();
        Connection con = DBUtil.getConnection();
        DatabaseMetaData meta = con.getMetaData();
        ResultSet res = meta.getTables(null, null, null, new String[]{"TABLE"});
        while (res.next()) {
            set.add(res.getString("TABLE_NAME"));
        }
        DBUtil.close(res, con);
        return set.contains(tablename);
    }
}
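The recommendation step itself can be pictured as a similarity-weighted average of the neighbors' ratings. The sketch below is plain Java and is not Mahout's code: the class WeightedEstimateSketch is hypothetical, the neighbors of user 1 are assumed to be users 4 and 5, and the weights 1.0 and 0.9449 are Pearson correlations worked out by hand from the test data in this post. It also assumes positive similarities; Mahout's estimator may handle details differently, for example items rated by only one neighbor.

```java
/** Illustrative sketch only: estimate a preference from neighbors' ratings. */
class WeightedEstimateSketch {

    /**
     * @param similarities similarity of each neighbor to the target user (assumed positive)
     * @param ratings      each neighbor's rating of the item, NaN if the neighbor has not rated it
     * @return the similarity-weighted mean rating, or NaN if no neighbor rated the item
     */
    static double estimate(double[] similarities, double[] ratings) {
        double weighted = 0;
        double total = 0;
        for (int i = 0; i < similarities.length; i++) {
            if (Double.isNaN(ratings[i])) {
                continue; // this neighbor has no opinion on the item
            }
            weighted += similarities[i] * ratings[i];
            total += similarities[i];
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        // Assumed neighbors of user 1: user 4 (similarity 1.0) and user 5 (about 0.9449)
        double[] sims = {1.0, 0.9449};
        // Item 104: user 4 rated 4.5, user 5 rated 4
        System.out.println("item 104: " + estimate(sims, new double[] {4.5, 4.0})); // about 4.257
        // Item 106: both neighbors rated 4
        System.out.println("item 106: " + estimate(sims, new double[] {4.0, 4.0})); // about 4.0
    }
}
```

Item 104 scores higher than item 106 because the closer neighbor (user 4) rated it 4.5, which pulls the weighted mean above 4.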
Test data (one preference per line: user ID, item ID, preference value):
1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
Output:
Further introductions to and analyses of Mahout and collaborative filtering:
http://www.cnblogs.com/dlts26/archive/2011/08/23/2150225.html
http://www.tuicool.com/articles/FzmQziz
http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/