Spark MLlib Concepts 6: ALS (Alternating Least Squares) or ALS-WR

Reposted article, 2015-02-03. Tags: algorithms / scala / ML

Large-scale Parallel Collaborative Filtering for the Netflix Prize

http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf

MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS

http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf

-------------------------------------------------

MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS

Recommender systems are based on one of two strategies:

1) Content filtering builds a profile for each user or product. The external information this requires, however, may be hard to obtain.

The content filtering approach creates a profile for each user or product to characterize its nature. Of course, content-based strategies require gathering external information that might not be available or easy to collect.

2) The alternative relies only on users' past behavior and needs no explicit profiles. This approach is collaborative filtering.

An alternative to content filtering relies only on past user behavior—for example, previous transactions or product ratings— without requiring the creation of explicit profiles. This approach is known as collaborative filtering, a term coined by the developers of Tapestry, the first recommender system. While generally more accurate than content-based techniques, collaborative filtering suffers from what is called the cold start problem, due to its inability to address the system’s new products and users. In this aspect, content filtering is superior.

The two main branches of collaborative filtering are neighborhood methods and latent factor models.

The two primary areas of collaborative filtering are the neighborhood methods and latent factor models.

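Neighborhood methods center on computing the relationships between items or between users. The item-oriented approach estimates a user's preference for an item from that same user's ratings of "similar" items.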

Neighborhood methods are centered on computing the relationships between items or, alternatively, between users. The item-oriented approach evaluates a user’s preference for an item based on ratings of “neighboring” items by the same user.
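The item-oriented idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ratings, the choice of cosine similarity over co-rating users, and the function names are all hypothetical and do not come from the paper.

```python
import math

# Hypothetical toy ratings (assumption for illustration): user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob":   {"A": 4.0, "B": 5.0, "C": 2.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 2.0, "C": 5.0, "D": 1.0},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity restricted to users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of the user's ratings of neighboring items."""
    target = item_vector(item)
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        s = cosine(target, item_vector(other))
        num += s * r
        den += abs(s)
    return num / den if den else 0.0

# alice has not rated D; estimate it from her ratings of A, B and C.
alice_d = predict("alice", "D")
```

Because the estimate is a weighted average of alice's own ratings, it stays within her rating range; items that other users rated like D pull the estimate hardest.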

Latent factor models are an alternative approach that explains the ratings by characterizing items and users at the same time.

Latent factor models are an alternative approach that tries to explain the ratings by characterizing both items and users on, say, 20 to 100 factors inferred from the ratings patterns.

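Recommender systems rely on different types of input data, usually arranged in a matrix with one dimension for users and the other for items. One strength of matrix factorization is that it can incorporate additional information: when explicit feedback is unavailable, user preferences can be inferred from implicit feedback.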

Recommender systems rely on different types of input data, which are often placed in a matrix with one dimension representing users and the other dimension representing items of interest.

One strength of matrix factorization is that it allows incorporation of additional information. When explicit feedback is not available, recommender systems can infer user preferences using implicit feedback.

A BASIC MATRIX FACTORIZATION MODEL

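Matrix factorization models map both items and users into a joint latent factor space of dimension f, so that user-item interactions are modeled as inner products in that space.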

Formula (1) gives the estimated rating of user u for item i.
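For reference, in the notation of the cited article by Koren, Bell and Volinsky, where q_i is the f-dimensional factor vector of item i and p_u is the f-dimensional factor vector of user u, formula (1) can be written as:

```latex
\hat{r}_{ui} = q_i^{T} p_u \tag{1}
```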

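This model closely resembles singular value decomposition (SVD), but because the rating matrix is sparse, conventional SVD is a poor fit: most entries are missing, and working around that easily loses or distorts useful information. Recent work prefers to model only the observed ratings directly, with a regularization parameter to avoid overfitting. Formula (2) is the objective function to be minimized.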
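For reference, the regularized objective from the cited article can be written as follows. Here \kappa is the set of (u, i) pairs whose rating r_{ui} is known, q_i and p_u are the item and user factor vectors, and \lambda is the regularization constant, usually chosen by cross-validation:

```latex
\min_{q^{*},\,p^{*}} \sum_{(u,i)\in\kappa}
  \left( r_{ui} - q_i^{T} p_u \right)^{2}
  + \lambda \left( \lVert q_i \rVert^{2} + \lVert p_u \rVert^{2} \right) \tag{2}
```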

LEARNING ALGORITHMS

Two approaches to minimizing formula (2):

1) Stochastic gradient descent: an old and simple algorithm. It is not the topic here, so it is not covered further.

2) Alternating least squares (ALS).

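Because both sets of unknowns, q and p, appear in formula (2), the objective is not convex. (Machine learning textbooks usually optimize a convex objective by finding the point where the first derivative is zero. The simplest case is a quadratic, that is, a parabola, whose minimum is exactly where the first derivative vanishes.) If one set of unknowns is held fixed, however, the objective becomes quadratic in the other set and can be solved exactly. So we alternate between the two: fix q (items) and solve for p (users), then fix p and solve for q, optimizing one side at a time and repeating until convergence.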
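The alternation can be sketched in plain Python as below. This is a minimal illustration, not Spark MLlib's implementation: the toy ratings, the function names, and the choice to penalize each factor vector once (rather than per rating) are assumptions made for the example. With one side fixed, setting the gradient to zero gives a small linear system per row, (sum of v v^T + lam I) x = sum of r v, which is solved here by Gaussian elimination.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge_update(observed, fixed, f, lam):
    """Best factor vector for one row, given the other side's fixed vectors.

    observed is a list of (index into fixed, rating) pairs.
    """
    A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
    rhs = [0.0] * f
    for j, r in observed:
        v = fixed[j]
        for a in range(f):
            rhs[a] += r * v[a]
            for b in range(f):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)

def loss(R, P, Q, lam):
    """Squared error on known ratings plus an L2 penalty on all factors."""
    err = sum((r - dot(P[u], Q[i])) ** 2 for (u, i), r in R.items())
    reg = sum(dot(p, p) for p in P) + sum(dot(q, q) for q in Q)
    return err + lam * reg

def als(R, n_users, n_items, f=2, lam=0.1, iters=10, seed=0):
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(f)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for (u, i), r in R.items():
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    history = [loss(R, P, Q, lam)]
    for _ in range(iters):
        # Fix Q (items), solve every user's vector; then fix P, solve items.
        P = [ridge_update(by_user[u], Q, f, lam) for u in range(n_users)]
        Q = [ridge_update(by_item[i], P, f, lam) for i in range(n_items)]
        history.append(loss(R, P, Q, lam))
    return P, Q, history

# Hypothetical toy data: (user index, item index) -> rating
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0,
     (2, 1): 1.0, (2, 2): 5.0, (3, 0): 1.0, (3, 2): 4.0}
P, Q, history = als(R, n_users=4, n_items=3)
```

Each half-step minimizes the objective exactly over one side, so the values in history never increase, even though the joint problem is not convex.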

Gradient descent is computationally efficient, but there are at least two reasons to use ALS:

1) ALS is easier to parallelize;

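2) ALS handles implicit data more conveniently. (The paper's phrase "cannot be considered sparse" is not a slip: with implicit feedback every user-item pair carries a preference value, including the pairs with no observed interaction, so the training set is effectively dense and looping over every training case, as gradient descent does, is impractical.)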
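The parallelization point can be illustrated with a small sketch: once the item factors are fixed, each user's update depends only on that user's own ratings, so the updates can run in any order or at the same time. The example below uses rank-1 factors so the solution is a one-line closed form. The data, the names, and the use of a thread pool are assumptions for illustration only; a distributed implementation would hand the rows to different workers instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: fixed rank-1 item factors and per-user known ratings.
q = {"A": 1.0, "B": 2.0, "C": 0.5}
ratings_by_user = {
    "u1": {"A": 2.0, "B": 4.0},
    "u2": {"B": 1.0, "C": 3.0},
    "u3": {"A": 5.0},
}
LAM = 0.1  # regularization constant (assumed value)

def update_user(item_ratings):
    """Rank-1 regularized least-squares solution for one user, with q fixed."""
    num = sum(q[i] * r for i, r in item_ratings.items())
    den = sum(q[i] ** 2 for i in item_ratings) + LAM
    return num / den

users = sorted(ratings_by_user)

# Sequential pass over users.
sequential = [update_user(ratings_by_user[u]) for u in users]

# Same updates submitted to a pool; no update reads another user's result.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(update_user, (ratings_by_user[u] for u in users)))
```

Both passes produce identical factors, which is what makes the per-row solves safe to distribute. A stochastic gradient step, by contrast, updates a user vector and an item vector together for each rating, so steps that share a user or an item interact.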