Machine Learning: SVM (SVM in scikit-learn: LinearSVC)

Reposted article · 2018-08-12 19:22 · Machine learning algorithms

I. Basic Concepts

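Hard Margin SVM and Soft Margin SVM both learn a linear decision boundary. Hard Margin SVM requires the data to be linearly separable; Soft Margin SVM also handles data that is not linearly separable by tolerating some points inside the margin or on the wrong side of it, with the hyperparameter C controlling how much violation is tolerated.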

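As with kNN, standardize the data before applying SVM.
Reason: SVM works with the margin, which is a distance; if the features are measured on very different scales, the distance computation is distorted.
Example: with two features whose scales differ greatly, the decision boundary SVM finds can be almost a horizontal line. Distances along the small-scale horizontal feature are negligible by comparison, so the largest margin found is essentially the vertical distance between the two margin lines.
Only when all features are on comparable scales is the resulting decision boundary reliable.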
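A minimal numpy sketch of the scale problem, using hypothetical toy data (not from the original article): it measures what fraction of the squared Euclidean distance between two points each feature contributes, before and after standardizing each column to zero mean and unit variance (the transformation StandardScaler applies).

```python
import numpy as np

# Hypothetical toy data: feature 0 lies in [0, 1], feature 1 is in the thousands
X = np.array([[0.1, 1000.0],
              [0.9, 1200.0],
              [0.5, 9000.0],
              [0.2, 5000.0]])

def feature_shares(a, b):
    """Fraction of ||a - b||^2 contributed by each feature."""
    sq = (a - b) ** 2
    return sq / sq.sum()

# Raw data: the large-scale feature dominates the distance
share_raw = feature_shares(X[0], X[1])

# Standardize each column (zero mean, unit variance), as StandardScaler does
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
share_std = feature_shares(X_std[0], X_std[1])

print("raw shares:         ", np.round(share_raw, 4))
print("standardized shares:", np.round(share_std, 4))
```

On the raw data, feature 0 contributes almost nothing even though the two points differ by most of its range; after standardization each feature's contribution reflects the difference relative to that feature's own spread.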

II. Example

1) Load and plot the dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
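# Keep only classes 0 and 1 (a binary problem) and the first two features (sepal length, sepal width)
X = X[y<2, :2]
y = y[y<2]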

plt.scatter(X[y==0, 0], X[y==0, 1], color='red')
plt.scatter(X[y==1, 0], X[y==1, 1], color='blue')
plt.show()

2) LinearSVC (linear SVM algorithm)

LinearSVC: a linear classifier built on the support vector machine idea of maximizing the margin.
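For comparison, a hedged sketch assuming the same two-class, two-feature iris subset: LinearSVC (liblinear solver, by default squared hinge loss with an L2 penalty) and sklearn.svm.SVC(kernel="linear") (libsvm solver, standard hinge loss) both fit a linear boundary, but only SVC stores the support vectors in support_vectors_. The C=1.0 and max_iter values here are illustrative choices, not from the article.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC

iris = datasets.load_iris()
X = iris.data[iris.target < 2, :2]
y = iris.target[iris.target < 2]
X_std = StandardScaler().fit_transform(X)

# LinearSVC: liblinear solver, squared hinge loss by default
lin = LinearSVC(C=1.0, max_iter=10000).fit(X_std, y)
# SVC with a linear kernel: libsvm solver, standard hinge loss
ker = SVC(kernel="linear", C=1.0).fit(X_std, y)

print("LinearSVC accuracy:", lin.score(X_std, y))
print("SVC(kernel='linear') accuracy:", ker.score(X_std, y))
# Only SVC keeps the support vectors explicitly
print("number of support vectors (SVC):", len(ker.support_vectors_))
print("LinearSVC has support_vectors_:", hasattr(lin, "support_vectors_"))
```

This is why the article draws the support-vector lines from coef_ and intercept_ instead of reading support vectors from the LinearSVC model.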

Standardize the data

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)
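After fit, the scaler stores each column's mean and standard deviation in mean_ and scale_, and transform computes (x - mean_) / scale_. A short check, rebuilding the X from step 1 so it runs on its own. With a train/test split, the usual practice is to fit the scaler on the training set only and reuse it to transform the test set.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target < 2, :2]

scaler = StandardScaler().fit(X)
X_standard = scaler.transform(X)

# transform is (x - mean_) / scale_, computed column by column
manual = (X - scaler.mean_) / scaler.scale_
print(np.allclose(X_standard, manual))
# Each standardized column has mean ~0 and standard deviation ~1
print(X_standard.mean(axis=0).round(6), X_standard.std(axis=0).round(6))
```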

Fit a LinearSVC

from sklearn.svm import LinearSVC

svc = LinearSVC(C=10**9)  # a very large C penalizes margin violations heavily, approximating Hard Margin SVM
svc.fit(X_standard, y)
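Once fitted, a binary LinearSVC labels a point by the sign of its decision function w·x + b: a positive value gives classes_[1], otherwise classes_[0]. A small sketch that rebuilds the same standardized data so it runs on its own:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[iris.target < 2, :2]
y = iris.target[iris.target < 2]
X_standard = StandardScaler().fit_transform(X)

svc = LinearSVC(C=10**9)
svc.fit(X_standard, y)

scores = svc.decision_function(X_standard)   # signed values of w·x + b
pred_from_sign = np.where(scores > 0, svc.classes_[1], svc.classes_[0])
print(np.array_equal(pred_from_sign, svc.predict(X_standard)))
```

With C = 10**9 liblinear may emit a ConvergenceWarning; raising max_iter can help, though the prediction rule above holds either way.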

Define a function that plots the decision boundary, then plot the model's boundary: Hard Margin SVM idea (C = 10**9)

def plot_decision_boundary(model, axis):
    # Build a dense grid covering axis = [x0_min, x0_max, x1_min, x1_max]
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1,1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1,1)
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]
    
    # Predict a class for every grid point and reshape back to the grid
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)
    
    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    # Fill each predicted region with its own color
    plt.contourf(x0, x1, zz, cmap=custom_cmap)

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0, 0], X_standard[y==0, 1], color='red')
plt.scatter(X_standard[y==1, 0], X_standard[y==1, 1], color='blue')
plt.show()

Plot the decision boundary: Soft Margin SVM idea (C = 0.01)

svc2 = LinearSVC(C=0.01)  # a small C tolerates margin violations: Soft Margin SVM
svc2.fit(X_standard, y)

plot_decision_boundary(svc2, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0, 0], X_standard[y==0, 1], color='red')
plt.scatter(X_standard[y==1, 0], X_standard[y==1, 1], color='blue')
plt.show()
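One way to quantify the difference between the two plots, sketched under the assumption of the same data: for a linear SVM the distance between the lines w·x + b = +1 and w·x + b = -1 is 2/||w||, so a smaller C (weaker penalty on violations) should yield a smaller ||w||, a wider margin, and more training points with |w·x + b| < 1.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[iris.target < 2, :2]
y = iris.target[iris.target < 2]
X_standard = StandardScaler().fit_transform(X)

results = {}
for C in (10**9, 0.01):
    model = LinearSVC(C=C).fit(X_standard, y)
    w = model.coef_[0]
    # Distance between the lines w·x + b = +1 and w·x + b = -1
    margin_width = 2 / np.linalg.norm(w)
    # Training points strictly inside the margin
    inside = int(np.sum(np.abs(model.decision_function(X_standard)) < 1))
    results[C] = (margin_width, inside)
    print(f"C={C:g}: margin width={margin_width:.3f}, points inside margin={inside}")
```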

3) Plot the lines on which the support vectors lie

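svc.coef_: the model coefficients, an array of shape (1, 2) holding one coefficient per feature, since each sample has two features.
Each coefficient weights how strongly the corresponding feature pushes a sample toward one class or the other.
svc.intercept_: the model intercept, a 1-D array holding a single value, because a binary problem has only one decision line.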
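A quick check of these attributes, assuming the same standardized data: for a binary problem coef_ has shape (1, n_features) and intercept_ has shape (1,), and decision_function is just X @ coef_[0] + intercept_[0].

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[iris.target < 2, :2]
y = iris.target[iris.target < 2]
X_standard = StandardScaler().fit_transform(X)

svc2 = LinearSVC(C=0.01).fit(X_standard, y)
print(svc2.coef_.shape, svc2.intercept_.shape)   # one row of coefficients, one intercept

# Recompute the decision function by hand: w·x + b for every sample
manual = X_standard @ svc2.coef_[0] + svc2.intercept_[0]
print(np.allclose(manual, svc2.decision_function(X_standard)))
```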

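Coefficients: w = svc.coef_[0]
Intercept: b = svc.intercept_[0]
Decision boundary: w[0] * x0 + w[1] * x1 + b = 0
Support-vector (margin) lines: w[0] * x0 + w[1] * x1 + b = ±1
Solving each equation for x1 (assuming w[1] ≠ 0):

Decision boundary: x1 = -w[0]/w[1] * x0 - b/w[1]
Support-vector lines: x1 = -w[0]/w[1] * x0 - b/w[1] ± 1/w[1]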
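The rearrangement can be checked numerically. A numpy-only sketch with arbitrary illustrative values for w and b (not taken from a fitted model): points generated from the three x1 formulas give w[0] * x0 + w[1] * x1 + b equal to 0, +1 and -1 respectively.

```python
import numpy as np

# Arbitrary illustrative parameters (w[1] must be non-zero to solve for x1)
w = np.array([4.0, 2.5])
b = 0.7

x0 = np.linspace(-3, 3, 7)
boundary = -w[0] / w[1] * x0 - b / w[1]
up = boundary + 1 / w[1]      # line where w·x + b = +1
down = boundary - 1 / w[1]    # line where w·x + b = -1

def f(x0, x1):
    """Linear decision function w[0]*x0 + w[1]*x1 + b."""
    return w[0] * x0 + w[1] * x1 + b

print(np.round(f(x0, boundary), 10))
print(np.round(f(x0, up), 10))
print(np.round(f(x0, down), 10))
```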

Extend the plotting function

# Plot the decision boundary and the lines on which the support vectors lie
def plot_svc_decision_boundary(model, axis):
    
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1,1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1,1)
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]
    
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)
    
    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)
    
    w = model.coef_[0]
    b = model.intercept_[0]
    
    plot_x = np.linspace(axis[0], axis[1], 200)
    up_y = -w[0]/w[1] * plot_x - b/w[1] + 1/w[1]
    down_y = -w[0]/w[1] * plot_x - b/w[1] - 1/w[1]
    
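    # Draw the two support-vector lines as line plots of plot_x against up_y and down_y
    # Caveat: up_y and down_y may fall outside the y range given by axis, so filter them:
    # up_index: boolean mask, True where the value of up_y lies within [axis[2], axis[3]];
    # down_index: the same mask for down_y.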
    up_index = (up_y >= axis[2]) & (up_y <= axis[3])
    down_index = (down_y >= axis[2]) & (down_y <= axis[3])
    plt.plot(plot_x[up_index], up_y[up_index], color='black')
    plt.plot(plot_x[down_index], down_y[down_index], color='black')

Plot: Hard Margin SVM (svc, C = 10**9)

plot_svc_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0, 0], X_standard[y==0, 1], color='red')
plt.scatter(X_standard[y==1, 0], X_standard[y==1, 1], color='blue')
plt.show()

Plot: Soft Margin SVM (svc2, C = 0.01)

plot_svc_decision_boundary(svc2, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0, 0], X_standard[y==0, 1], color='red')
plt.scatter(X_standard[y==1, 0], X_standard[y==1, 1], color='blue')
plt.show()

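Observation (C = 0.01): the margin is very wide, and many sample points fall inside it.
Cause: the hyperparameter C is too small, so the model tolerates too many margin violations.
Remedy: tune C.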
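One hedged way to tune C, assuming cross-validated accuracy is an acceptable criterion (the article does not name one): search a log-spaced grid of C values with GridSearchCV, keeping the StandardScaler inside a Pipeline so each fold is standardized using only its own training part. The grid bounds and max_iter are illustrative choices.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[iris.target < 2, :2]
y = iris.target[iris.target < 2]

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", LinearSVC(max_iter=10000)),
])
# Log-spaced candidate values for C, from very soft to nearly hard margin
param_grid = {"svc__C": np.logspace(-3, 3, 7)}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["svc__C"])
print("mean CV accuracy:", round(search.best_score_, 3))
```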
