NDCG and Its Implementation

Published: 2020-07-10

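1. The goal of NDCG: we want the ranked list we produce to be of the highest possible quality. The closer the more relevant items are placed to the top, the higher the computed NDCG.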
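The post does not write the formula out, so here is a sketch of the definition that matches the code further down (exponential gain, log2 position discount). The symbols are assumptions of this sketch: rel_i is the relevance label of the item at position i, and IDCG@k is the DCG@k of the ideal ordering, that is, the labels sorted from most to least relevant. Other variants use rel_i directly as the gain.

```latex
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
```

Because the denominator grows with i, a relevant item contributes less the further down the list it appears, which is the position weighting discussed in this section.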
Differences between AUC and NDCG:

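1. What AUC means: the probability that a positive sample is ranked ahead of a negative sample. AUC looks at the ordering as a whole: every positive that is ranked ahead of a negative scores, and no position weighting is applied.
2. NDCG also measures ranking quality, but with position weighting. For example, if we want ranking accuracy in the top 10 to matter more than ranking accuracy in the bottom 10, NDCG is the better fit.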

So the difference between AUC and NDCG comes down to whether positions are weighted. Under AUC, ranking quality in the top 10 and in the bottom 10 count equally. Under NDCG, the two carry different weights.
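To make the contrast concrete, here is a minimal sketch in plain Python. The helper names `pairwise_auc` and `ndcg` and the toy data are made up for illustration and are not from the post. Both label lists have the same number of correctly ordered (positive, negative) pairs, so their AUC is identical, but list A puts a positive at the very top and so gets the higher NDCG.

```python
import math


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ndcg(labels, scores):
    """NDCG over the full list, with gain 2**rel - 1 and a log2 position discount."""
    def dcg(rels):
        # enumerate starts at 0, so position i + 1 gets discount log2(i + 2).
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]
    ideal = sorted(labels, reverse=True)
    return dcg(ranked) / dcg(ideal)


# Hypothetical data: six items already listed in descending score order.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels_a = [1, 0, 0, 0, 0, 1]  # positives at positions 1 and 6
labels_b = [0, 0, 1, 1, 0, 0]  # positives at positions 3 and 4

auc_a, auc_b = pairwise_auc(labels_a, scores), pairwise_auc(labels_b, scores)
ndcg_a, ndcg_b = ndcg(labels_a, scores), ndcg(labels_b, scores)
print(auc_a, auc_b)    # both 0.5
print(ndcg_a, ndcg_b)  # A is higher than B
```

With these numbers NDCG comes out at roughly 0.83 for A and 0.57 for B, while AUC cannot tell the two rankings apart.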

2. Implementation

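Note: older sklearn releases do not ship an NDCG metric. The original post gives 0.20 as the first version with support, while the current `sklearn.metrics.ndcg_score` is documented as new in 0.22. For older versions we can copy the code out and use it directly.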

import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import check_X_y


def dcg_score(y_true, y_score, k=5):
    # Rank items by predicted score, highest first, and keep the top k.
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])
    # Exponential gain and log2 position discount (positions start at 1).
    gain = 2 ** y_true - 1
    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gain / discounts)


def ndcg_score(y_true, y_score, k=5):
    y_score, y_true = check_X_y(y_score, y_true)

    # Make sure we use all the labels (max between the length and the higher
    # number in the array)
    lb = LabelBinarizer()
    lb.fit(np.arange(max(np.max(y_true) + 1, len(y_true))))
    binarized_y_true = lb.transform(y_true)

    if binarized_y_true.shape != y_score.shape:
        raise ValueError("y_true and y_score have different value ranges")

    scores = []
    # Iterate over each y_value_true and compute the DCG score
    for y_value_true, y_value_score in zip(binarized_y_true, y_score):
        actual = dcg_score(y_value_true, y_value_score, k)
        best = dcg_score(y_value_true, y_value_true, k)
        scores.append(actual / best)

    return np.mean(scores)


# This NDCG code has trouble with two-class (binary) input, so the problem
# is recast as a three-class one.
y_true = [0, 1, 0]
y_score = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(ndcg_score(y_true, y_score, k=2))

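Note: this sklearn-based NDCG does not seem to handle binary classification well, so as a workaround we treat the problem as three-class and pad the third class with a probability of 0.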
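One way to avoid the padding, sketched here under the assumption that every sample has exactly one correct class: the one-hot relevance vector then has a single 1, so the ideal DCG is always 1 and NDCG@k reduces to 1 / log2(rank + 1), where rank is the position of the true class in the predicted ordering (and 0 if it falls outside the top k). The function name `ndcg_at_k_single_label` and the sample data are hypothetical. Ties are broken by class index here, which may differ from the `np.argsort`-based code above.

```python
import math


def ndcg_at_k_single_label(y_true, y_score, k=5):
    """Mean NDCG@k when each row has exactly one correct class."""
    scores = []
    for label, row in zip(y_true, y_score):
        # Rank class indices by predicted score, highest first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        rank = ranked.index(label) + 1  # 1-based position of the true class
        scores.append(1.0 / math.log2(rank + 1) if rank <= k else 0.0)
    return sum(scores) / len(scores)


# Hypothetical binary example: two samples, two class probabilities each.
y_true = [0, 1]
y_score = [[0.3, 0.7], [0.2, 0.8]]
result = ndcg_at_k_single_label(y_true, y_score, k=2)
print(result)
```

The first sample ranks its true class second and the second sample ranks it first, so the result is the mean of 1 / log2(3) and 1. No third class is needed because nothing here depends on how a label binarizer shapes its output.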

Computing the metric for binary classification:

import numpy as np
import pandas as pd


# ndcg
def get_dcg(y_pred, y_true, k):
    # y_pred and y_true must be aligned element by element; a larger y_pred
    # means closer to label=1 (in relevance terms, more relevant to label=1).
    df = pd.DataFrame({"y_pred": y_pred, "y_true": y_true})
    # Sort by y_pred in descending order: items nearer the top are closer to label=1.
    df = df.sort_values(by="y_pred", ascending=False)
    df = df.iloc[0:k, :]  # keep the top k
    # Positions are counted from 1.
    dcg = (2 ** df["y_true"] - 1) / np.log2(np.arange(1, df["y_true"].count() + 1) + 1)
    return np.sum(dcg)


def get_ndcg(df, k):
    # df contains the columns y_pred and y_true
    dcg = get_dcg(df["y_pred"], df["y_true"], k)
    idcg = get_dcg(df["y_true"], df["y_true"], k)
    ndcg = dcg / idcg
    return ndcg
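As a hand check of the binary case, take four hypothetical items with y_pred = (0.9, 0.6, 0.4, 0.2) and y_true = (1, 0, 1, 0), and k = 3. These numbers are made up for illustration. Sorted by y_pred, the top three labels are (1, 0, 1); sorted by y_true, the ideal top three are (1, 1, 0).

```latex
\mathrm{DCG@}3 = \frac{2^{1} - 1}{\log_2 2} + \frac{2^{0} - 1}{\log_2 3} + \frac{2^{1} - 1}{\log_2 4}
             = 1 + 0 + 0.5 = 1.5

\mathrm{IDCG@}3 = \frac{2^{1} - 1}{\log_2 2} + \frac{2^{1} - 1}{\log_2 3} + \frac{2^{0} - 1}{\log_2 4}
              \approx 1 + 0.631 + 0 = 1.631

\mathrm{NDCG@}3 = \frac{1.5}{1.631} \approx 0.920
```

If none of the labels is positive, IDCG is 0 and the ratio is undefined, so callers may want to handle that case explicitly.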
