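Machine Learning with sklearn (24): Model Evaluation (4): Quantifying the Quality of Predictions (1): The scoring Parameter: Defining Model Evaluation Rules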

This article is a repost. 2021-06-19 23:12

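There are 3 different APIs for evaluating the quality of a model's predictions:

Estimator score method: Estimators have a score method that provides a default evaluation criterion for the problem they are designed to solve. It is not discussed on this page, but in each estimator's documentation.
Scoring parameter: Model-evaluation tools that use cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy. This is discussed in the section "The scoring parameter: defining model evaluation rules".
Metric functions: The metrics module implements functions that assess prediction error for specific purposes. These metrics are detailed in the sections on classification metrics, multilabel ranking metrics, regression metrics and clustering metrics.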
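The three APIs listed above can be compared side by side. The following is a minimal sketch, assuming the iris dataset and a LogisticRegression classifier (both chosen here only for illustration, they are not part of the original text):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1. Estimator score method: for classifiers the default criterion is mean accuracy.
score_from_method = clf.score(X_test, y_test)

# 2. Scoring parameter: the cross-validation tool looks up the scorer by name.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, scoring="accuracy", cv=5
)

# 3. Metric function: compares true labels with predictions directly.
score_from_metric = accuracy_score(y_test, clf.predict(X_test))

print(score_from_method, score_from_metric, cv_scores)
```

For a classifier, the first and third values should agree, since the default score of a classifier is its accuracy on the given data.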

Finally, dummy estimators are useful to get a baseline value of these metrics for random predictions.
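As a rough illustration of such a baseline, the sketch below scores a DummyClassifier with cross-validation. The dataset and the most_frequent strategy are assumptions made for this example only:

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# A dummy estimator ignores the features and always predicts the most
# frequent class seen during fit; its score is a floor for real models.
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X, y, scoring="accuracy", cv=5)

print(baseline_scores.mean())
```

A real model is only useful on this task if it scores clearly above this baseline.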

See also: For "pairwise" metrics, between samples and not estimators or predictions, see the section on pairwise metrics, affinities and kernels.

1. The `scoring` parameter: defining model evaluation rules

Model selection and evaluation tools, such as model_selection.GridSearchCV and model_selection.cross_val_score, take a scoring parameter that controls what metric they apply to the estimators being evaluated.

1.1. Common cases: predefined values

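For the most common use cases, you can designate a scorer object with the scoring parameter; the table below shows all possible values. All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics that measure the distance between the model and the data, such as metrics.mean_squared_error, are available as neg_mean_squared_error, which returns the negated value of the metric.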

Scoring	Function	Comment
Classification
‘accuracy’	`metrics.accuracy_score`
‘average_precision’	`metrics.average_precision_score`
‘f1’	`metrics.f1_score`	for binary targets
‘f1_micro’	`metrics.f1_score`	micro-averaged
‘f1_macro’	`metrics.f1_score`	macro-averaged
‘f1_weighted’	`metrics.f1_score`	weighted average
‘f1_samples’	`metrics.f1_score`	by multilabel sample
‘neg_log_loss’	`metrics.log_loss`	requires `predict_proba` support
‘precision’ etc.	`metrics.precision_score`	suffixes apply as with ‘f1’
‘recall’ etc.	`metrics.recall_score`	suffixes apply as with ‘f1’
‘roc_auc’	`metrics.roc_auc_score`
Clustering
‘adjusted_mutual_info_score’	`metrics.adjusted_mutual_info_score`
‘adjusted_rand_score’	`metrics.adjusted_rand_score`
‘completeness_score’	`metrics.completeness_score`
‘fowlkes_mallows_score’	`metrics.fowlkes_mallows_score`
‘homogeneity_score’	`metrics.homogeneity_score`
‘mutual_info_score’	`metrics.mutual_info_score`
‘normalized_mutual_info_score’	`metrics.normalized_mutual_info_score`
‘v_measure_score’	`metrics.v_measure_score`
Regression
‘explained_variance’	`metrics.explained_variance_score`
‘neg_mean_absolute_error’	`metrics.mean_absolute_error`
‘neg_mean_squared_error’	`metrics.mean_squared_error`
‘neg_mean_squared_log_error’	`metrics.mean_squared_log_error`
‘neg_median_absolute_error’	`metrics.median_absolute_error`
‘r2’	`metrics.r2_score`
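The sign convention for error metrics can be checked directly. This is a minimal sketch on synthetic regression data (the data and the LinearRegression model are assumptions for illustration): the neg_mean_squared_error scorer returns non-positive values, and negating them recovers the usual mean squared error.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# The scorer negates the metric so that "higher is better" still holds.
neg_mse = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_squared_error", cv=5
)

# Flip the sign to report the error in its familiar, positive form.
mse = -neg_mse

print(neg_mse)
print(mse)
```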

Usage examples:

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf = svm.SVC(probability=True, random_state=0)
>>> cross_val_score(clf, X, y, scoring='neg_log_loss')
array([-0.07..., -0.16..., -0.06...])
>>> model = svm.SVC()
>>> cross_val_score(model, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']

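Note

The values listed by the ValueError exception correspond to the functions measuring prediction accuracy described in the following sections. The scorer objects for those functions are stored in the dictionary sklearn.metrics.SCORERS. (Later scikit-learn releases deprecated this dictionary in favor of sklearn.metrics.get_scorer_names.)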
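A scorer registered under one of these names can also be retrieved with sklearn.metrics.get_scorer and called by hand. The sketch below assumes the iris data and an SVC classifier purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, get_scorer
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(random_state=0).fit(X, y)

# Look up the predefined scorer object by its string name.
accuracy_scorer = get_scorer("accuracy")

# A scorer is called with (estimator, X, y), not with predictions.
scorer_value = accuracy_scorer(clf, X, y)

# The underlying metric function is called with (y_true, y_pred).
metric_value = accuracy_score(y, clf.predict(X))

print(scorer_value, metric_value)
```

The difference in call signature is the main distinction between a scorer object and a metric function.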

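1.2. Defining your scoring strategy from metric functions

The module sklearn.metrics also exposes a set of simple functions measuring a prediction error given ground truth and prediction:

Functions ending with _score return a value to maximize; the higher the better.
Functions ending with _error or _loss return a value to minimize; the lower the better. When converting one into a scorer object using make_scorer, set the greater_is_better parameter to False (True by default; see the parameter description below).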
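The two naming conventions above translate into make_scorer calls as follows. This is a minimal sketch, assuming synthetic data and a Ridge model chosen only for the example:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score

X, y = make_regression(n_samples=80, n_features=4, noise=5.0, random_state=0)
model = Ridge().fit(X, y)

# A *_score function: higher is better, so the default is kept.
r2_scorer = make_scorer(r2_score)

# An *_error function: lower is better, so the scorer must negate it.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

mae = mean_absolute_error(y, model.predict(X))
neg_mae = mae_scorer(model, X, y)
r2 = r2_scorer(model, X, y)

print(mae, neg_mae, r2)
```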

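Metrics available for various machine learning tasks are detailed in the sections below.

Many metrics are not given names to be used as scoring values, sometimes because they require additional parameters, such as fbeta_score. In such cases, you need to generate an appropriate scoring object. The simplest way to generate a callable object for scoring is by using make_scorer. That function converts metrics into callables that can be used for model evaluation.

One typical use case is to wrap an existing metric function from the library with non-default values for its parameters, such as the beta parameter for the fbeta_score function: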

>>> from sklearn.metrics import fbeta_score, make_scorer
>>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)

The second use case is to build a completely custom scorer object from a simple python function using make_scorer, which can take several parameters:

the python function you want to use (my_custom_loss_func in the example below)
whether the python function returns a score (greater_is_better=True, the default) or a loss (greater_is_better=False). If a loss, the output of the python function is negated by the scorer object, conforming to the cross validation convention that scorers return higher values for better models.
for classification metrics only: whether the python function you provided requires continuous decision certainties (needs_threshold=True). The default value is False.
any additional parameters, such as beta in fbeta_score or labels in f1_score.

Here is an example of building custom scorers, and of using the greater_is_better parameter:

>>> import numpy as np
>>> def my_custom_loss_func(y_true, y_pred):
...     diff = np.abs(y_true - y_pred).max()
...     return np.log1p(diff)
...
>>> # score will negate the return value of my_custom_loss_func,
>>> # which will be np.log(2), 0.693, given the values for X
>>> # and y defined below.
>>> score = make_scorer(my_custom_loss_func, greater_is_better=False)
>>> X = [[1], [1]]
>>> y = [0, 1]
>>> from sklearn.dummy import DummyClassifier
>>> clf = DummyClassifier(strategy='most_frequent', random_state=0)
>>> clf = clf.fit(X, y)
>>> my_custom_loss_func(y, clf.predict(X))
0.69...
>>> score(clf, X, y)
-0.69...

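1.3. Implementing your own scoring object

You can generate even more flexible model scorers by constructing your own scoring object from scratch, without using the make_scorer factory. For a callable to be a scorer, it needs to meet the protocol specified by the following two rules:

It can be called with parameters (estimator, X, y), where estimator is the model that should be evaluated, X is validation data, and y is the ground truth target for X (in the supervised case) or None (in the unsupervised case).
It returns a floating point number that quantifies the estimator prediction quality on X, with reference to y. Again, by convention higher numbers are better, so if your scorer returns loss, that value should be negated.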
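The two rules above can be satisfied by a plain function. The sketch below defines a hypothetical scorer, neg_max_abs_error (a name invented for this example, not a scikit-learn function), and passes it directly as the scoring argument:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def neg_max_abs_error(estimator, X, y):
    """Hypothetical scorer: negated worst-case absolute error on (X, y)."""
    y_pred = estimator.predict(X)
    # The raw quantity is a loss, so it is negated to keep "higher is better".
    return -float(np.max(np.abs(y - y_pred)))


X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Any callable with the (estimator, X, y) signature is accepted as scoring.
custom_scores = cross_val_score(
    LinearRegression(), X, y, scoring=neg_max_abs_error, cv=5
)

print(custom_scores)
```

Because the scorer receives the fitted estimator itself, it can call predict, predict_proba or any other method, which is what makes this route more flexible than make_scorer.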

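Note: Using custom scorers in functions where n_jobs > 1

While defining the custom scoring function alongside the calling function should work out of the box with the default joblib backend (loky), importing it from another module will be a more robust approach and work independently of the joblib backend.

For example, to use an n_jobs greater than 1 in the example below, the custom_scoring_function function is saved in a user-created module (custom_scorer_module.py) and imported:

>>> from custom_scorer_module import custom_scoring_function
>>> cross_val_score(model,
...                 X_train,
...                 y_train,
...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
...                 cv=5,
...                 n_jobs=-1)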

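1.4. Using multiple metric evaluation

Scikit-learn also permits evaluation of multiple metrics in GridSearchCV, RandomizedSearchCV and cross_validate.

There are two ways to specify multiple scoring metrics for the scoring parameter:

As an iterable of string metrics: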

>>> scoring = ['accuracy', 'precision']

As a dict mapping the scorer name to the scoring function:

>>> from sklearn.metrics import accuracy_score
>>> from sklearn.metrics import make_scorer
>>> scoring = {'accuracy': make_scorer(accuracy_score),
...            'prec': 'precision'}

Note that the dict values can either be scorer functions or one of the predefined metric strings.
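Both ways of specifying several metrics can be passed to cross_validate. The following is a minimal sketch on a toy binary dataset (the dataset and the LinearSVC model are assumptions for illustration); it shows that the result keys are built as test_ plus the metric or dict key name:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_validate
from sklearn.svm import LinearSVC

X, y = make_classification(n_classes=2, random_state=0)
clf = LinearSVC(random_state=0)

# Variant 1: an iterable of predefined metric strings.
list_results = cross_validate(clf, X, y, scoring=["accuracy", "precision"], cv=5)

# Variant 2: a dict mixing a scorer object and a predefined string.
dict_scoring = {"accuracy": make_scorer(accuracy_score), "prec": "precision"}
dict_results = cross_validate(clf, X, y, scoring=dict_scoring, cv=5)

# Result keys follow the names given in the list or the dict.
print(sorted(list_results))
print(sorted(dict_results))
```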

Currently only those scorer functions that return a single score can be passed inside the dict. Scorer functions that return multiple values are not permitted and will require a wrapper to return a single metric:

>>> from sklearn.model_selection import cross_validate
>>> from sklearn.metrics import confusion_matrix
>>> # A sample toy binary classification dataset
>>> X, y = datasets.make_classification(n_classes=2, random_state=0)
>>> svm = LinearSVC(random_state=0)
>>> def tn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 0]
>>> def fp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 1]
>>> def fn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 0]
>>> def tp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 1]
>>> scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
...            'fp': make_scorer(fp), 'fn': make_scorer(fn)}
>>> cv_results = cross_validate(svm, X, y,
...                             scoring=scoring, cv=5)
>>> # Getting the test set true positive scores
>>> print(cv_results['test_tp'])  
[10  9  8  7  8]
>>> # Getting the test set false negative scores
>>> print(cv_results['test_fn'])  
[0 1 2 3 2]
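Multiple metrics work the same way in GridSearchCV, with one extra requirement: since there is no single score to rank candidates by, the refit parameter has to name the metric used to select the best estimator. The sketch below assumes a toy dataset and a small, arbitrary grid over C:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_classes=2, random_state=0)

search = GridSearchCV(
    LinearSVC(random_state=0),
    param_grid={"C": [0.1, 1, 10]},  # illustrative grid, not a recommendation
    scoring=["accuracy", "precision"],
    refit="accuracy",  # metric used to pick best_params_ and refit the model
    cv=5,
)
search.fit(X, y)

# One mean test score per candidate and per metric.
mean_accuracy = search.cv_results_["mean_test_accuracy"]
mean_precision = search.cv_results_["mean_test_precision"]

print(search.best_params_, mean_accuracy, mean_precision)
```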
