Implementing User-Based Collaborative Filtering with Mahout


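Mahout ships with ready-made implementations of collaborative filtering algorithms. This post walks through a simple user-based one.

User-based: use each user's preference scores for items to find that user's nearest neighbors in taste, then infer the user's likely preferences from what those neighbors like and recommend items accordingly.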
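
To make the neighbor search concrete, here is a minimal plain-Java sketch that does not use Mahout. It assumes Pearson correlation over co-rated items as the similarity measure and simply keeps the n most similar users. The class and method names (`UserSimilaritySketch`, `parse`, `pearson`, `nearestNeighbors`) are illustrative and not part of Mahout; the preference values are this post's test data in user,item,preference form.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UserSimilaritySketch {

    // The post's test data: user ID, item ID, preference.
    static final String[] DATA = {
        "1,101,5", "1,102,3", "1,103,2.5",
        "2,101,2", "2,102,2.5", "2,103,5", "2,104,2",
        "3,101,2.5", "3,104,4", "3,105,4.5", "3,107,5",
        "4,101,5", "4,103,3", "4,104,4.5", "4,106,4",
        "5,101,4", "5,102,3", "5,103,2", "5,104,4", "5,105,3.5", "5,106,4"
    };

    /** Builds userId -> (itemId -> preference) from "user,item,preference" lines. */
    static Map<Long, Map<Long, Double>> parse(String[] lines) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            prefs.computeIfAbsent(Long.parseLong(f[0]), k -> new TreeMap<>())
                 .put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return prefs;
    }

    /** Pearson correlation over the items both users rated; NaN if undefined. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[] {e.getValue(), other});
            }
        }
        int n = pairs.size();
        if (n < 2) {
            return Double.NaN; // fewer than two co-rated items: no correlation
        }
        double meanA = 0, meanB = 0;
        for (double[] p : pairs) { meanA += p[0]; meanB += p[1]; }
        meanA /= n;
        meanB /= n;
        double sumAB = 0, sumA2 = 0, sumB2 = 0;
        for (double[] p : pairs) {
            double da = p[0] - meanA, db = p[1] - meanB;
            sumAB += da * db;
            sumA2 += da * da;
            sumB2 += db * db;
        }
        double denom = Math.sqrt(sumA2 * sumB2);
        return denom == 0 ? Double.NaN : sumAB / denom;
    }

    /** The n users most similar to userId, skipping undefined similarities. */
    static List<Long> nearestNeighbors(Map<Long, Map<Long, Double>> prefs, long userId, int n) {
        Map<Long, Double> target = prefs.get(userId);
        List<Long> candidates = new ArrayList<>();
        for (Long other : prefs.keySet()) {
            if (other != userId && !Double.isNaN(pearson(target, prefs.get(other)))) {
                candidates.add(other);
            }
        }
        // Sort by similarity, highest first.
        candidates.sort(Comparator.comparingDouble((Long u) -> -pearson(target, prefs.get(u))));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = parse(DATA);
        System.out.println("Neighbors of user 1: " + nearestNeighbors(prefs, 1L, 2));
    }
}
```

On this data, users 1 and 3 share only item 101, so their similarity is undefined and user 3 can never be picked as a neighbor of user 1. A real library may handle such edge cases differently from this sketch.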


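All the data the program uses is stored in a MySQL database, and the computed recommendations are written back to MySQL, into a table for the corresponding user.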

package com.mahout.helloworlddemo;

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import com.mahout.util.DBUtil;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;



/**
 *
 *@author wxisme
 *@time 2015-9-13 6:25:26 PM
 */
public class RecommenderIntroFromMySQL {
    
    public static void main(String[] args) throws Exception {
        
        // Connect to MySQL
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setServerName("localhost");
        dataSource.setUser("root");
        dataSource.setPassword("1234");
        dataSource.setDatabaseName("mahoutdemo");
        
        
        // Build the data model from the preference table
        JDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource, "taste_preferences", "user_id", "item_id", "preference","time");
                                       
        DataModel model = dataModel;
        
        // Compute user-to-user similarity
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Define the neighborhood: the 2 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2,similarity,model);
        
        // Build the recommender
        Recommender recommender = new GenericUserBasedRecommender(model,neighborhood,similarity);
        
        // Fetch each user's recommendations and store them in the database.
        // The user IDs in the test data run from 1 to 5.
        try (Connection con = DBUtil.getConnection();
             Statement stmt = con.createStatement()) {

            for (int userId = 1; userId <= 5; userId++) {
                List<RecommendedItem> recommendations = recommender.recommend(userId, 3);

                String tableName = "user_" + userId;

                // On the first recommendation run, create this user's table
                if (!recommendations.isEmpty() && !doesTableExist(tableName)) {
                    String createSQL = "create table " + tableName
                            + " (item_id bigint primary key,value float);";
                    stmt.execute(createSQL);
                }

                for (RecommendedItem recommendation : recommendations) {
                    // Insert one recommended item for this user
                    String insertSQL = "insert into " + tableName + " values ("
                            + recommendation.getItemID() + "," + recommendation.getValue() + " );";
                    stmt.execute(insertSQL);

                    System.out.println(recommendation);
                }
            }
        }
    }
    
    
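    /**
     * Checks whether a table with the given name exists.
     * @param tablename name of the table to look for
     * @return true if the table exists
     * @throws SQLException if the database metadata cannot be read
     */
    public static boolean doesTableExist(String tablename) throws SQLException {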
        HashSet<String> set = new HashSet<String>();
        Connection con = DBUtil.getConnection();
        DatabaseMetaData meta = con.getMetaData();
        ResultSet res = meta.getTables(null, null, null,
                new String[]{"TABLE"});
        while (res.next()) {
            set.add(res.getString("TABLE_NAME"));
        }
        DBUtil.close(res, con);
        return set.contains(tablename);
    }

}
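
How a user-based recommender turns neighbors into scores can be sketched without Mahout. The version below assumes the estimate for an item is the similarity-weighted average of the neighbors' preferences for it. This is a common formulation, not a verified copy of Mahout's internals. The class `WeightedEstimateSketch` and its methods are hypothetical; the example values are user 1's two most similar users in the test data (users 4 and 5), with their Pearson similarities worked out by hand from the co-rated items.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WeightedEstimateSketch {

    /**
     * Similarity-weighted average of the neighbors' preferences for one item.
     * Returns NaN when no neighbor has rated the item.
     */
    static double estimate(Map<Long, Double> similarityByNeighbor,
                           Map<Long, Map<Long, Double>> prefsByNeighbor,
                           long itemId) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> e : similarityByNeighbor.entrySet()) {
            Map<Long, Double> prefs = prefsByNeighbor.get(e.getKey());
            Double pref = (prefs == null) ? null : prefs.get(itemId);
            if (pref != null) {
                weightedSum += e.getValue() * pref;
                totalSimilarity += e.getValue();
            }
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    /** Similarities of user 1 to users 4 and 5 (hand-derived, see comments). */
    static Map<Long, Double> exampleSimilarities() {
        Map<Long, Double> sims = new LinkedHashMap<>();
        sims.put(4L, 1.0);                   // only two co-rated items, perfectly correlated
        sims.put(5L, 2.5 / Math.sqrt(7.0));  // Pearson over items 101-103
        return sims;
    }

    /** Preferences of users 4 and 5, copied from the test data. */
    static Map<Long, Map<Long, Double>> examplePrefs() {
        Map<Long, Double> user4 = new LinkedHashMap<>();
        user4.put(101L, 5.0); user4.put(103L, 3.0); user4.put(104L, 4.5); user4.put(106L, 4.0);
        Map<Long, Double> user5 = new LinkedHashMap<>();
        user5.put(101L, 4.0); user5.put(102L, 3.0); user5.put(103L, 2.0);
        user5.put(104L, 4.0); user5.put(105L, 3.5); user5.put(106L, 4.0);
        Map<Long, Map<Long, Double>> prefs = new LinkedHashMap<>();
        prefs.put(4L, user4);
        prefs.put(5L, user5);
        return prefs;
    }

    public static void main(String[] args) {
        // Items 104 and 106 are unrated by user 1 but rated by both neighbors.
        for (long itemId : new long[] {104L, 106L}) {
            System.out.println(itemId + " -> "
                    + estimate(exampleSimilarities(), examplePrefs(), itemId));
        }
    }
}
```

Under these assumptions item 104 scores higher than item 106 for user 1, because the more similar neighbor (user 4) rated it 4.5 while both neighbors gave item 106 a 4.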

 

Test data (user ID, item ID, preference):

1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
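
The post does not show how these rows get into MySQL. One possible way, sketched below, is to turn each line into an INSERT statement for the `taste_preferences` table that the program reads. The table and column names come from the program above; the helper class `PreferenceLoaderSketch` is hypothetical, and the sketch assumes the `time` column is nullable or has a default, since the test data carries no timestamps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreferenceLoaderSketch {

    /** Converts one "user,item,preference" line into an INSERT statement. */
    static String toInsert(String csvLine) {
        String[] f = csvLine.trim().split(",");
        if (f.length != 3) {
            throw new IllegalArgumentException("Expected user,item,preference: " + csvLine);
        }
        // Parsing to numeric types rejects anything that is not a plain number,
        // so no raw text from the line ends up inside the SQL string.
        long userId = Long.parseLong(f[0].trim());
        long itemId = Long.parseLong(f[1].trim());
        float preference = Float.parseFloat(f[2].trim());
        return "INSERT INTO taste_preferences (user_id, item_id, preference) VALUES ("
                + userId + ", " + itemId + ", " + preference + ");";
    }

    /** Converts every line, preserving order. */
    static List<String> toInserts(List<String> csvLines) {
        List<String> statements = new ArrayList<>();
        for (String line : csvLines) {
            statements.add(toInsert(line));
        }
        return statements;
    }

    public static void main(String[] args) {
        // First three rows of the test data.
        for (String sql : toInserts(Arrays.asList("1,101,5", "1,102,3", "1,103,2.5"))) {
            System.out.println(sql);
        }
    }
}
```

The generated statements can be pasted into a MySQL client or executed over JDBC; for larger files a batched `PreparedStatement` would be the more usual choice.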

Further reading on Mahout and collaborative filtering algorithms:

http://www.cnblogs.com/dlts26/archive/2011/08/23/2150225.html

http://www.tuicool.com/articles/FzmQziz

http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
