I trained a model with sklearn's DecisionTreeClassifier and then computed its AUC with roc_auc_score. The code is as follows:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

clf = DecisionTreeClassifier(criterion='gini', max_depth=6, min_samples_split=10, min_samples_leaf=2)
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, y_pred)
The error message is as follows:
/Users/wgg/anaconda/lib/python2.7/site-packages/sklearn/metrics/ranking.pyc in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
    297     check_consistent_length(y_true, y_score)
    298     y_true = column_or_1d(y_true)
--> 299     y_score = column_or_1d(y_score)
    300     assert_all_finite(y_true)
    301     assert_all_finite(y_score)

/Users/wgg/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in column_or_1d(y, warn)
    560         return np.ravel(y)
    561
--> 562     raise ValueError("bad input shape {0}".format(shape))
    563
    564

ValueError: bad input shape (900, 2)
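The problem looks to be your y_pred: it is an array of shape (900, 2), meaning it has two columns, which matches the shape reported in the ValueError.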
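That is because predict_proba returns one column per class, so for a binary problem it returns two columns (ordered as in clf.classes_), whereas roc_auc_score expects a 1-D array of scores for the positive class. For usage details, see the scikit-learn documentation for predict_proba.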
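To make the shape issue concrete, here is a minimal sketch on synthetic data. The dataset from make_classification and the variable names proba and row_sums are assumptions for illustration only, not the asker's data. It shows that predict_proba returns one column per class and that each row sums to 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, used only for illustration (not the asker's data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

proba = clf.predict_proba(X)
print(proba.shape)    # (n_samples, n_classes)
print(clf.classes_)   # column j of proba belongs to clf.classes_[j]

# Each row is a probability distribution over the classes.
row_sums = proba.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```

Because the two columns always sum to 1, either one carries the full information in the binary case; the metric just needs to know which single column to rank by.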
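In short, change your code above to the following and it will work: keep only the second column, which is the predicted probability of the positive class.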
y_pred = clf.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_pred)
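For reference, the fix can be tried end to end with the sketch below. It assumes synthetic data from make_classification as a stand-in for the asker's X_train and X_test, and a positive class labelled 1; pos_col and y_score are hypothetical names introduced here. Instead of hard-coding column 1, it looks the positive class up in clf.classes_, which gives the same result when the labels are 0 and 1.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's data (assumption, illustration only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Same hyperparameters as in the question; random_state added for repeatability.
clf = DecisionTreeClassifier(criterion='gini', max_depth=6,
                             min_samples_split=10, min_samples_leaf=2,
                             random_state=0)
clf.fit(X_train, y_train)

# Find the column of the positive class (label 1) rather than assuming its position.
pos_col = list(clf.classes_).index(1)
y_score = clf.predict_proba(X_test)[:, pos_col]  # 1-D array, one score per test row

roc_auc = roc_auc_score(y_test, y_score)
print(roc_auc)
```

Picking the column matters: passing the other column ranks the samples in the opposite order, which yields 1 minus the intended AUC.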
Source: http://sofasofa.io/forum_main_post.php?postid=1001678