Voting in sklearn

Reposted article. 2018-04-10 11:18. Category: Machine Learning Algorithms

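Voting is a combination strategy used in ensemble learning for classification problems. The basic idea is to choose the class that the largest number of base classifiers predict.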

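A classification algorithm produces one of two kinds of output: class labels directly, or class probabilities. Voting on the labels is called hard voting (majority voting); voting on the probabilities is called soft voting. VotingClassifier in sklearn implements both.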
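As a small illustration of the two output types, the sketch below fits a single classifier and compares its label output with its probability output. The choice of `GaussianNB` and of the full iris feature set is an assumption made only for this illustration; in sklearn, `predict` returns class labels and `predict_proba` returns one probability per class.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Any probabilistic classifier would do; GaussianNB is an arbitrary choice here.
clf = GaussianNB().fit(X, y)

labels = clf.predict(X)        # shape (n_samples,): one class label per sample
proba = clf.predict_proba(X)   # shape (n_samples, n_classes): one probability per class

print(labels[:3])              # what hard voting counts
print(proba[:3].round(3))      # what soft voting averages
print(proba.sum(axis=1)[:3])   # each row of probabilities sums to 1
```

Hard voting only needs `predict`, so it works with any classifier; soft voting requires every member to provide `predict_proba`.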

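Hard voting

Hard voting selects the label predicted by the most classifiers. If several labels receive the same number of votes, the one that comes first in ascending sort order is chosen. Here is an example: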

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

iris = datasets.load_iris()
X, y = iris.data[:,1:3], iris.target
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

eclf = VotingClassifier(estimators=[('lr',clf1),('rf',clf2),('gnb',clf3)], voting='hard')
# Combine the three models by voting. `estimators` takes a list of (name, estimator) pairs, [(name1, clf1), (name2, clf2), ...], the same format Pipeline uses; voting='hard' selects hard voting.

for clf, clf_name in zip([clf1, clf2, clf3, eclf], ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble']):
    scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
    print('Accuracy: {:.2f} (+/- {:.2f}) [{}]'.format(scores.mean(), scores.std(), clf_name))

The output is:

Accuracy: 0.90 (+/- 0.05) [Logistic Regression]
Accuracy: 0.93 (+/- 0.05) [Random Forest]
Accuracy: 0.91 (+/- 0.04) [Naive Bayes]
Accuracy: 0.95 (+/- 0.05) [Ensemble]

In practice, running this code also emits a DeprecationWarning.
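The majority rule and its tie-breaking behaviour can be reproduced by hand. The sketch below is a minimal illustration, not sklearn's own code: `hard_vote` is a hypothetical helper, the toy votes are made up, and `max_iter=1000` is an added assumption to keep the logistic regression from stopping early. It then checks the helper against a fitted `VotingClassifier`, whose `transform` method returns each member's predicted label when `voting='hard'`.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB


def hard_vote(votes):
    """Majority label per row of an (n_samples, n_classifiers) integer array.

    np.argmax returns the first maximum, so a tie goes to the smallest label.
    """
    votes = np.asarray(votes, dtype=int)
    return np.array([np.argmax(np.bincount(row)) for row in votes])


# Toy votes (made up for illustration) from four hypothetical classifiers.
toy_votes = np.array([
    [0, 0, 1, 0],  # class 0 wins 3 to 1
    [2, 1, 2, 2],  # class 2 wins 3 to 1
    [1, 2, 2, 1],  # 2-2 tie between classes 1 and 2: the smaller label wins
])
toy_result = hard_vote(toy_votes)
print(toy_result)

# Compare the hand-written rule with a fitted VotingClassifier.
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

eclf = VotingClassifier(
    estimators=[('lr', LogisticRegression(random_state=1, max_iter=1000)),  # max_iter is an assumption
                ('rf', RandomForestClassifier(random_state=1)),
                ('gnb', GaussianNB())],
    voting='hard')
eclf.fit(X, y)

# The iris labels are already 0, 1, 2, so the members' votes need no decoding.
member_votes = eclf.transform(X)  # shape (n_samples, n_classifiers)
matches = np.array_equal(hard_vote(member_votes), eclf.predict(X))
print(matches)
```

With an odd number of classifiers and only two classes a tie cannot occur; with three or more classes, or an even number of classifiers, the ascending-order rule decides.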

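Soft voting

Soft voting chooses the class using the class probabilities output by each classifier. If weights are supplied, a weighted average of the predicted probabilities is computed for every class, and the class with the largest average is selected.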
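The weighted average is easy to work through by hand. The following is a minimal sketch with made-up probabilities for a single sample; `soft_vote` is a hypothetical helper written for this illustration, not part of sklearn.

```python
import numpy as np


def soft_vote(probas, weights=None):
    """Average class probabilities over classifiers and pick the best class.

    probas has shape (n_classifiers, n_samples, n_classes).
    Returns the averaged probabilities and the index of the winning class.
    """
    avg = np.average(np.asarray(probas), axis=0, weights=weights)
    return avg, np.argmax(avg, axis=1)


# Made-up probabilities: three classifiers, one sample, three classes.
probas = np.array([
    [[0.2, 0.5, 0.3]],  # classifier 1 favours class 1
    [[0.6, 0.3, 0.1]],  # classifier 2 favours class 0
    [[0.5, 0.3, 0.2]],  # classifier 3 favours class 0
])

avg_equal, label_equal = soft_vote(probas)                          # plain mean
avg_weighted, label_weighted = soft_vote(probas, weights=[2, 1, 1])  # classifier 1 counts double

print(avg_equal.round(3), label_equal)
print(avg_weighted.round(3), label_weighted)
```

With equal weights the averages are roughly (0.433, 0.367, 0.200), so class 0 wins. With weights 2, 1, 1 they become (0.375, 0.400, 0.225), so class 1 wins: doubling the weight of the first classifier is enough to change the decision.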

from itertools import product

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]  # use only two features so the result can be plotted in 2D
y = iris.target

clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
eclf = VotingClassifier(estimators=[('dt',clf1),('knn',clf2),('svc',clf3)], voting='soft', weights=[2,1,1])
# weights sets the weight of each classifier; voting='soft' selects soft voting


clf1.fit(X,y)
clf2.fit(X,y)
clf3.fit(X,y)
eclf.fit(X,y)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))  # build a grid covering the feature range

fig, axes = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10, 8))  # share the x axis per column and the y axis per row

for idx, clf, title in zip(product([0, 1],[0, 1]),
                           [clf1, clf2, clf3, eclf],
                           ['Decision Tree (depth=4)', 'KNN (k=7)',
                            'Kernel SVM', 'Soft Voting']):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])  # predict on the grid points built above (not on X); these predictions are used to draw the decision regions
    Z = Z.reshape(xx.shape)
    axes[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.4)
    axes[idx[0], idx[1]].scatter(X[:, 0],X[:, 1], c=y, s=20, edgecolor='k')
    axes[idx[0], idx[1]].set_title(title)
plt.show()

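The output is a 2x2 grid of plots showing the decision regions of the four classifiers: Decision Tree (depth=4), KNN (k=7), Kernel SVM, and Soft Voting.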
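As a sanity check on the soft-voting example, the sketch below rebuilds the ensemble and confirms that its probabilities are the weighted average of its members' probabilities. It assumes the same data and settings as the example; the `random_state=0` arguments are an addition for reproducibility, and `estimators_` is the attribute where a fitted `VotingClassifier` keeps its fitted members.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data[:, [0, 2]], iris.target

weights = [2, 1, 1]
eclf = VotingClassifier(
    estimators=[('dt', DecisionTreeClassifier(max_depth=4, random_state=0)),
                ('knn', KNeighborsClassifier(n_neighbors=7)),
                ('svc', SVC(kernel='rbf', probability=True, random_state=0))],
    voting='soft', weights=weights)
eclf.fit(X, y)

# Probabilities from each fitted member, stacked as (n_classifiers, n_samples, n_classes).
member_probas = np.array([est.predict_proba(X) for est in eclf.estimators_])

# Weighted average over the classifier axis, then the most probable class.
manual_proba = np.average(member_probas, axis=0, weights=weights)
manual_labels = eclf.classes_[np.argmax(manual_proba, axis=1)]

same_proba = np.allclose(manual_proba, eclf.predict_proba(X))
same_labels = np.array_equal(manual_labels, eclf.predict(X))
print(same_proba, same_labels)
```

Changing `weights` and rerunning shows how much each member influences the ensemble's decision regions.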

Reference:

Voting Classifier
