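Series:
(2) Attribute-based similar-product recommendation algorithm: computing implicit product ratings in real time with Flink SQL
(3) Attribute-based similar-product recommendation algorithm: batch-processing product attributes into attribute prefixes and complete attribute strings
(4) Attribute-based similar-product recommendation algorithm: recommending products whose attributes are similar to those of highly rated products
Update 2020.04.15: 協同過濾推薦算法.pptx (slides on collaborative filtering recommendation algorithms)
Extraction code: 4tds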
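Recommending products whose attributes are similar to those of highly rated products
Key point: similarity coefficient = number of identical attribute positions / total number of attribute positions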
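Written out as a formula (the notation is ours, not the author's): a and b are two complete property strings, a_i and b_i their values at attribute position i, V(x) is the set of comma-separated values at one position, and n is the number of positions in a. Counting a position as identical when the value sets overlap follows the genSimilarity() code in step 4.

```latex
\mathrm{sim}(a, b) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\, V(a_i) \cap V(b_i) \neq \varnothing \,\right]
```

For example, two products with 4 attribute positions of which 3 match have a similarity of 3/4 = 0.75.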
1. Query the products the member has browsed, ordered by score (descending)
$sql = "SELECT t1.member_code, t1.goods_code, t1.score,
               t2.goods_code, t2.goods_name, t2.shopping_guide, t2.market_price, t2.wbiao_price,
               t2.sale_type, t2.promotion, t2.models, t2.products, t2.image_url,
               t3.property_prefix, t3.properties
        FROM rc_member_goods t1
        LEFT JOIN sj_goods t2 ON t2.goods_code = t1.goods_code
        LEFT JOIN rc_goods_properties t3 ON t3.goods_code = t1.goods_code
        WHERE t1.score < 1000
          AND t1.member_code IN ('" . implode("','", $memberCodes) . "')
        ORDER BY t1.score DESC
        LIMIT {$nums}";
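Note 1: The condition t1.score<1000 filters out records produced by malicious (or automated) behavior.
Note 2: To keep the calculation simple, $nums defaults to 2 here.
Note 3: LIMIT applies to the joined rows, not to distinct products. Because rc_member_goods and rc_goods_properties are in a one-to-many relationship, one product can occupy several rows, so the rows returned may cover fewer than $nums distinct products, and the rows belonging to the same product have to be merged and deduplicated afterwards. To get $nums distinct products reliably, one option is to apply the LIMIT to rc_member_goods in a derived table (a subquery in FROM) before joining.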
Merge and deduplicate:
// $records here is the PDOStatement returned by the query above
$return = [];
while ($v = $records->fetch(PDO::FETCH_ASSOC)) {
    $member = $v['member_code'];
    $goods  = $v['goods_code'];
    if (!isset($return[$member][$goods])) {
        // First row of this member/product pair: keep the whole row
        $return[$member][$goods] = $v;
    } else {
        // Further rows of the same product: append prefix and properties, comma-separated
        // (PHP concatenates strings with .=, not +=)
        $return[$member][$goods]['property_prefix'] .= ',' . $v['property_prefix'];
        $return[$member][$goods]['properties'] .= ',' . $v['properties'];
    }
}
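To check the merge logic without a database, the loop can be pulled out into a pure function over an array of rows. This is only a sketch: mergeRows() and the sample rows are hypothetical, and turning one member's products into the 0, 1, ... list that the later steps index as $records[0] / $records[1] (via array_values) is our assumption about how the pieces connect.

```php
<?php
// Hypothetical standalone version of the merge step: takes the fetched rows
// as a plain array instead of reading them from a PDOStatement.
function mergeRows(array $rows): array
{
    $merged = [];
    foreach ($rows as $v) {
        $member = $v['member_code'];
        $goods  = $v['goods_code'];
        if (!isset($merged[$member][$goods])) {
            $merged[$member][$goods] = $v;   // first row of this product
        } else {
            // later rows of the same product: append with a comma
            $merged[$member][$goods]['property_prefix'] .= ',' . $v['property_prefix'];
            $merged[$member][$goods]['properties'] .= ',' . $v['properties'];
        }
    }
    return $merged;
}

// Made-up sample: product A has two property rows, product B has one
$rows = [
    ['member_code' => 7, 'goods_code' => 'A', 'score' => 90, 'property_prefix' => 'p1', 'properties' => '1|2|3'],
    ['member_code' => 7, 'goods_code' => 'A', 'score' => 90, 'property_prefix' => 'p2', 'properties' => '1|2|4'],
    ['member_code' => 7, 'goods_code' => 'B', 'score' => 40, 'property_prefix' => 'p3', 'properties' => '5|6|7'],
];
$merged = mergeRows($rows);
$records = array_values($merged[7]);   // one member's products as a list: [A, B]
```

PHP arrays keep insertion order, so the score order from ORDER BY t1.score DESC survives the merge.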
2. Prepare the property prefixes to query and the products to exclude
// $records holds the merged rows from step 1 for one member (the two records queried above),
// as a list indexed 0, 1, ...
$one = $records[0];
$memberCode = isset($one['member_code']) ? $one['member_code'] : 0;
$goodsCodes = [];
$propertyPrefixs = [];
if ($memberCode > 0) {
    // Products from the member's past orders (needed later)
    $goodsCodes = $this->getMemberOrderGoods($memberCode);
}
$temp = [];
foreach ($records as $key => $value) {
    $temp[$value['goods_code']] = [];
    $goodsCodes[] = $value['goods_code'];
    // The prefixes merged above get split apart again here (ironic, I know)
    if (strpos($value['property_prefix'], ',') !== false) {
        $prefixs = explode(',', $value['property_prefix']);
        $propertyPrefixs = array_merge($propertyPrefixs, $prefixs);
    } else {
        $propertyPrefixs[] = $value['property_prefix'];
    }
}
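The prefix collection can be written more compactly, as a sketch: explode() on a string without ',' already returns a one-element array, so the strpos() branch is not needed, and array_unique() drops prefixes shared by both rated products. collectPrefixes() is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper: collects the property prefixes of the merged records.
function collectPrefixes(array $records): array
{
    $prefixes = [];
    foreach ($records as $value) {
        // 'p1,p2' -> ['p1', 'p2']; 'p2' -> ['p2']
        $prefixes = array_merge($prefixes, explode(',', $value['property_prefix']));
    }
    // The IN (...) list only needs each prefix once; array_values re-indexes 0, 1, ...
    return array_values(array_unique($prefixes));
}

$records = [
    ['goods_code' => 'A', 'property_prefix' => 'p1,p2'],
    ['goods_code' => 'B', 'property_prefix' => 'p2'],
];
$prefixes = collectPrefixes($records);
```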
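3. Query products with the same property prefix (a shared prefix means the products are broadly similar)
$sql = "SELECT t2.goods_code, t2.goods_name, t2.shopping_guide, t2.market_price, t2.wbiao_price,
               t2.sale_type, t2.promotion, t2.models, t2.products, t2.image_url, t1.properties
        FROM rc_goods_properties t1
        LEFT JOIN sj_goods t2 ON t1.goods_code = t2.goods_code
        WHERE t1.goods_code NOT IN ('" . implode("','", $goodsCodes) . "')
          AND t1.property_prefix IN ('" . implode("','", $propertyPrefixs) . "')
          AND t2.status = 1 AND t2.shelf_status = 1 AND t2.view_status = 1";
Note: $goodsCodes and $propertyPrefixs are the values computed in the previous step. $goodsCodes contains both the rated products and the products the member has already ordered, so none of them is recommended again.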
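The query above inlines product codes and prefixes with implode(), which breaks if a value contains a quote and leaves room for SQL injection. A sketch of the same query with PDO positional placeholders follows; placeholders() and buildCandidateSql() are hypothetical helpers, the column list is shortened for brevity, and both lists are assumed non-empty (IN () is invalid SQL).

```php
<?php
// Hypothetical helper: one '?' per value, for use inside IN (...)
function placeholders(array $values): string
{
    return implode(',', array_fill(0, count($values), '?'));
}

// Builds the step 3 query with placeholders instead of inlined values.
// Both arrays must be non-empty.
function buildCandidateSql(array $goodsCodes, array $propertyPrefixs): string
{
    return "SELECT t2.goods_code, t2.goods_name, t1.properties"
        . " FROM rc_goods_properties t1"
        . " LEFT JOIN sj_goods t2 ON t1.goods_code = t2.goods_code"
        . " WHERE t1.goods_code NOT IN (" . placeholders($goodsCodes) . ")"
        . " AND t1.property_prefix IN (" . placeholders($propertyPrefixs) . ")"
        . " AND t2.status = 1 AND t2.shelf_status = 1 AND t2.view_status = 1";
}

$sql = buildCandidateSql(['A', 'B'], ['p1', 'p2', 'p3']);
// With a PDO connection $pdo (not shown), bind values in placeholder order:
// $list = $pdo->prepare($sql);
// $list->execute(array_merge($goodsCodes, $propertyPrefixs));
```

The bound values must follow the order of the placeholders in the SQL, which is why the NOT IN list comes first in array_merge().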
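4. Loop over the candidates, compare the complete property strings, and compute the similarity coefficient
while ($row = $list->fetch(PDO::FETCH_ASSOC)) {
    $goodsCode = $row['goods_code'];   // candidate product being compared
    $properties = $row['properties'];  // candidate's complete property string
    unset($row['properties']);
    // Compare the candidate with each of the member's highly rated products
    foreach ($records as $key => $value) {
        if (strpos($value['properties'], ',') !== false) {
            // Property strings merged in step 1: keep the best match among them
            $vProperties = explode(',', $value['properties']);
            $row['similarity'] = 0;
            foreach ($vProperties as $p) {
                $row['similarity'] = max($this->genSimilarity($p, $properties), $row['similarity']);
            }
        } else {
            $row['similarity'] = $this->genSimilarity($value['properties'], $properties);
        }
        // The filtering code of step 5 runs here, once per (rated product, candidate) pair
    }
}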
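Similarity coefficient = number of identical attribute positions / total number of attribute positions. Positions are separated by '|'; a position counts as identical if its values are equal or, for multi-valued positions, if the value sets share at least one value.
private function genSimilarity($s1, $s2)
{
    $arr1 = explode('|', $s1);
    $arr2 = explode('|', $s2);
    $same = 0;
    $total = count($arr1);   // explode() always returns at least one element, so $total >= 1
    foreach ($arr1 as $key => $v1) {
        if (!isset($arr2[$key])) {
            continue;   // $s2 has fewer positions: count this one as different
        }
        $v2 = $arr2[$key];
        if ($v1 === $v2) {
            $same++;
        } else {
            // Multi-valued position (values separated by ','): identical if the sets overlap
            $t1 = explode(',', $v1);
            $t2 = explode(',', $v2);
            if (array_intersect($t1, $t2)) {
                $same++;
            }
        }
    }
    return $same / $total;
}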
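A worked example of the formula, using a standalone copy of genSimilarity() as a plain function so it can run outside the class. The property strings are made up for illustration.

```php
<?php
// Standalone copy of genSimilarity() (plain function instead of a private method).
function genSimilarity(string $s1, string $s2): float
{
    $arr1 = explode('|', $s1);
    $arr2 = explode('|', $s2);
    $same = 0;
    foreach ($arr1 as $key => $v1) {
        if (!isset($arr2[$key])) {
            continue;   // missing position in $s2 counts as different
        }
        // identical value, or overlapping comma-separated value sets
        if ($v1 === $arr2[$key]
            || array_intersect(explode(',', $v1), explode(',', $arr2[$key]))) {
            $same++;
        }
    }
    return $same / count($arr1);
}

// Position by position: 1 = 1 (same), 2 vs 9 (different),
// {3,4} and {4} overlap (same), 5 vs 6 (different) -> 2 / 4
$sim = genSimilarity('1|2|3,4|5', '1|9|4|6');
```

Note that the denominator is the number of positions in the first string, so the measure is not symmetric when the two strings have different lengths.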
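5. Filter out products with low similarity and sort by similarity (descending)
// Runs inside the inner foreach of step 4, after $row['similarity'] has been computed
// $similarity is the minimum similarity you require
if ($row['similarity'] >= $similarity) {
    if (isset($temp[$value['goods_code']][$goodsCode])) {
        // Because multi-select attributes were split, several prefixes can point to the same product;
        // compare and keep the record with the highest similarity
        if ($row['similarity'] > $temp[$value['goods_code']][$goodsCode]['similarity']) {
            $temp[$value['goods_code']][$goodsCode] = $row;
        }
    } else {
        // $value['goods_code'] is the rated product, $goodsCode is the candidate being compared
        $temp[$value['goods_code']][$goodsCode] = $row;
    }
}
// Called after the loop of step 4, e.g. $temp = $this->sortAndFilter($temp);
private function sortAndFilter($arr)
{
    $return = [];
    foreach ($arr as $k => $v) {
        if (!empty($v)) {   // drop rated products that got no similar products
            $v = array_values($v);
            // Sort by similarity, highest first
            usort($v, function ($a, $b) {
                return $b['similarity'] <=> $a['similarity'];
            });
            $return[] = $v;
        }
    }
    // Re-indexed 0, 1, ... in the order of $records; an empty group shifts
    // the following groups down by one index
    return $return;
}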
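Steps 5 and the sorting helper can also be sketched as one pure function over (rated product, candidate, similarity) triples. rankSimilar() is hypothetical. Unlike sortAndFilter(), it keeps the rated product's code as the key instead of re-indexing, so a rated product without matches cannot shift the other lists to a different index.

```php
<?php
// Hypothetical pure-function version of step 5 plus sorting.
// $pairs: [ratedCode, candidateCode, similarity] triples; the same candidate may
// appear several times (once per matching prefix).
function rankSimilar(array $pairs, float $minSimilarity): array
{
    $best = [];
    foreach ($pairs as [$original, $candidate, $sim]) {
        if ($sim < $minSimilarity) {
            continue;   // below the threshold
        }
        // keep only the highest similarity per (rated product, candidate)
        if (!isset($best[$original][$candidate]) || $sim > $best[$original][$candidate]) {
            $best[$original][$candidate] = $sim;
        }
    }
    foreach ($best as $original => $candidates) {
        arsort($candidates);   // highest similarity first, candidate codes kept as keys
        $best[$original] = $candidates;
    }
    return $best;
}

$ranked = rankSimilar([
    ['A', 'X', 0.5],
    ['A', 'Y', 0.75],
    ['A', 'X', 0.8],   // same candidate via another prefix: the higher value wins
    ['A', 'Z', 0.25],  // dropped by the threshold
    ['B', 'X', 0.6],
], 0.5);
```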
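6. Take N products in proportion to the scores
$return = [];
// Share of product B (the second-highest rated)
$rate = 0;
if (isset($records[1]['score'])) {
    $sum = $records[0]['score'] + $records[1]['score'];
    $rate = $sum > 0 ? $records[1]['score'] / $sum : 0;   // avoid division by zero
}
// Number of similar products taken for B
$num2 = intval($nums * $rate);
// Number of similar products taken for A
$num1 = $nums - $num2;
$taken1 = 0;   // how many of A's similar products have been used
foreach ($temp as $key => $value) {
    if ($key == 0) {
        $p = array_slice($value, 0, $num1);
        $taken1 = count($p);
        $return = array_merge($return, $p);
        // A may have fewer than $num1 similar products; give the shortfall to B
        $num2 += $num1 - $taken1;
    } elseif ($key == 1) {
        $p = array_slice($value, 0, $num2);
        $return = array_merge($return, $p);
    }
}
// If B has no recommendations, or the total is still short of $nums,
// top up from A's list, continuing after the products already taken from it
$has = count($return);
if ($has < $nums && isset($temp[0])) {
    $p = array_slice($temp[0], $taken1, $nums - $has);
    $return = array_merge($return, $p);
}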
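The allocation can be checked in isolation with a pure-function sketch. allocate() and its parameters are hypothetical; lists are plain arrays of product codes instead of rows. The second assertion below covers the case the top-up offset matters for: A has plenty of similar products, B has only one, and the missing slot is filled with A's next product rather than skipping one.

```php
<?php
// Hypothetical pure-function version of step 6: split $nums slots between the
// similar-product lists of A (higher score) and B, in proportion to their scores.
function allocate(int $nums, float $scoreA, ?float $scoreB, array $listA, array $listB): array
{
    $rate = 0.0;   // share of B
    if ($scoreB !== null && $scoreA + $scoreB > 0) {
        $rate = $scoreB / ($scoreA + $scoreB);
    }
    $num2 = intval($nums * $rate);   // slots for B
    $num1 = $nums - $num2;           // slots for A
    $fromA = array_slice($listA, 0, $num1);
    $num2 += $num1 - count($fromA);  // A's shortfall goes to B
    $fromB = array_slice($listB, 0, $num2);
    $result = array_merge($fromA, $fromB);
    // Still short (B ran out as well): continue A's list after what was already taken
    $missing = $nums - count($result);
    if ($missing > 0) {
        $result = array_merge($result, array_slice($listA, count($fromA), $missing));
    }
    return $result;
}

$even = allocate(4, 50, 50, ['a1', 'a2', 'a3', 'a4', 'a5'], ['b1']);
```

Products similar to both A and B could still appear twice; deduplicating the two lists is left out of this sketch.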
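7. Other
If the final number of recommended products is still below $nums, consider filling the gap with best-selling products (picked at random).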
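A minimal sketch of that top-up, assuming a list of best-selling product codes is already available (how it is queried is not covered here). topUpWithBestSellers() is a hypothetical helper that works on product codes rather than full rows, and skips products already recommended or otherwise excluded.

```php
<?php
// Hypothetical helper for step 7: fill the remaining slots with best sellers
// picked at random, skipping products already recommended or excluded.
function topUpWithBestSellers(array $recommended, array $bestSellers, int $nums, array $exclude = []): array
{
    $missing = $nums - count($recommended);
    if ($missing <= 0) {
        return $recommended;   // already enough
    }
    $pool = array_values(array_diff($bestSellers, $recommended, $exclude));
    shuffle($pool);   // random order
    return array_merge($recommended, array_slice($pool, 0, $missing));
}

// 'a1' is already recommended and 's3' is excluded (e.g. already ordered)
$out = topUpWithBestSellers(['a1'], ['a1', 's1', 's2', 's3'], 3, ['s3']);
```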
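Previous: (3) Attribute-based similar-product recommendation algorithm: batch-processing product attributes into attribute prefixes and complete attribute strings
Next: (5) Attribute-based similar-product recommendation algorithm: algorithm tuning and other topics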