Installation
Installing Scikit-plot is simple. Run the following command:
pip install scikit-plot
This completes the installation.
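One detail that is easy to trip over: the package is installed as scikit-plot but imported as scikitplot. The following is a minimal sketch, using only the Python standard library, for checking that the package is visible to the current interpreter. The helper name scikit_plot_status is made up for this illustration and is not part of Scikit-plot.

```python
import importlib.util
from importlib import metadata


def scikit_plot_status():
    """Hypothetical helper: report (importable, version) without importing the package."""
    # The distribution name is "scikit-plot"; the import name is "scikitplot".
    importable = importlib.util.find_spec("scikitplot") is not None
    try:
        version = metadata.version("scikit-plot")
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None
    return importable, version


importable, version = scikit_plot_status()
print("importable:", importable, "| version:", version)
```

If importable is False, pip most likely installed the package into a different Python environment than the one running the script.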
Repository:
https://github.com/reiinakano/scikit-plot
The repository contains usage instructions and examples (in .py and .ipynb formats).
Usage
Here are a few simple examples.
-
For example, here is the complete code for plotting the ROC curve, a common evaluation metric for classifiers:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)

# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc(y_test, predicted_probas)
plt.show()
The result is shown in the figure below.
Figure: ROC curve
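To make clear what such a curve contains, here is a minimal pure-Python sketch of how ROC points can be computed for a binary problem: the decision threshold is lowered step by step, and the false positive rate and true positive rate are recorded at each step. The function names roc_points and auc and the toy data are illustrative assumptions, not Scikit-plot code. For a multi-class problem such as the digits example, a curve of this kind is drawn for each class against the rest.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for a binary problem with labels 0 and 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from highest to lowest score, i.e. lower the threshold step by step.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples tied at this score have been counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy data (made up for illustration).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(fpr)            # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)            # [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc(fpr, tpr))  # 0.75
```

The sketch assumes that both classes occur at least once in y_true; otherwise one of the rates is undefined.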
-
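The P-R curve plots precision against recall, with recall on the horizontal axis and precision on the vertical axis. First, a brief explanation of the two terms: precision is the fraction of samples predicted as positive that are actually positive, TP / (TP + FP), and recall is the fraction of actually positive samples that are predicted as positive, TP / (TP + FN).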
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt

# Load dataset
X, y = load_data(return_X_y=True)

# Create classifier instance then fit
nb = GaussianNB()
nb.fit(X, y)

# Get predicted probabilities (on the training data, for illustration only)
y_probas = nb.predict_proba(X)

# Newer scikit-plot releases deprecate this name in favour of plot_precision_recall
skplt.metrics.plot_precision_recall_curve(y, y_probas, cmap='nipy_spectral')
plt.show()
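Precision and recall can also be worked out by hand. The sketch below uses only plain Python and counts true positives (TP), false positives (FP) and false negatives (FN) at two thresholds; each threshold yields one point on a P-R curve. The function names precision_recall and pr_points and the toy data are assumptions made for this illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one set of hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator is zero (a convention chosen for this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_points(y_true, scores, thresholds):
    """Return one (precision, recall) pair per threshold."""
    points = []
    for threshold in thresholds:
        # A sample is predicted positive when its score reaches the threshold.
        y_pred = [1 if s >= threshold else 0 for s in scores]
        points.append(precision_recall(y_true, y_pred))
    return points


# Toy data (made up for illustration).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = pr_points(y_true, scores, thresholds=[0.3, 0.5])
for threshold, (p, r) in zip([0.3, 0.5], points):
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
```

Raising the threshold from 0.3 to 0.5 increases precision from 2/3 to 1 while recall drops from 1 to 1/2, which is the trade-off the curve visualizes.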
-
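The confusion matrix is an important evaluation tool for classification. The code below classifies the digits dataset (loaded with load_digits) using a random forest and plots a normalized confusion matrix of the cross-validated predictions.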
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits as load_data
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt
import scikitplot as skplt

X, y = load_data(return_X_y=True)

# Create an instance of the RandomForestClassifier
classifier = RandomForestClassifier()

# Perform predictions
predictions = cross_val_predict(classifier, X, y)

plot = skplt.metrics.plot_confusion_matrix(y, predictions, normalize=True)
plt.show()
Figure: normalized confusion matrix
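As a rough illustration of what "normalized" means here, the sketch below builds a confusion matrix in plain Python and divides each row by its total, so that every row shows how the samples of one true class are distributed across the predicted classes. It assumes the usual row-wise convention. The function names and the toy labels are made up for this example.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true labels, columns are predicted labels."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide every row by its sum so that each non-empty row adds up to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Toy labels (made up for illustration).
labels = ["bird", "cat", "dog"]
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]

counts = confusion_matrix(y_true, y_pred, labels)
rates = normalize_rows(counts)
print(counts)  # [[0, 1, 0], [0, 2, 1], [0, 0, 2]]
for label, row in zip(labels, rates):
    print(label, [round(value, 2) for value in row])
```

Normalization makes classes of different sizes comparable: the diagonal of the normalized matrix is the per-class recall.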
-
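Other plots, such as learning curves, feature importances, and the elbow plot for clustering, can also be produced with just a few lines of code.
Figure: learning curve and feature importances
Figure: K-means elbow plot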
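A sketch of how these three plots might be produced is given below. It assumes the skplt.estimators and skplt.cluster modules of the 0.3.x releases of Scikit-plot; the wrapper function show_other_plots, the choice of the iris dataset, and the parameter values are assumptions made for this illustration, so check the call signatures against the installed version.

```python
def show_other_plots():
    """Hypothetical wrapper: draw a learning curve, feature importances and an elbow plot."""
    # Imports are kept local so that the libraries are only needed when the function is called.
    import matplotlib.pyplot as plt
    import scikitplot as skplt
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    # Learning curve: the estimator is refitted on training subsets of increasing size.
    skplt.estimators.plot_learning_curve(RandomForestClassifier(), X, y)

    # Feature importances: the estimator must already be fitted.
    forest = RandomForestClassifier().fit(X, y)
    skplt.estimators.plot_feature_importances(
        forest,
        feature_names=["sepal length", "sepal width", "petal length", "petal width"],
    )

    # Elbow plot: K-means is refitted for each candidate number of clusters.
    skplt.cluster.plot_elbow_curve(KMeans(random_state=1), X, cluster_ranges=range(1, 11))

    plt.show()
```

Calling show_other_plots() should open one figure per plot.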