#Notes on the SVM lectures from Andrew Ng's machine learning course on Coursera
#Note: these notes cover only the parts of this lecture that I found important, hard to understand, or easy to forget, plus some additions; they are not detailed lecture notes or a full summary.
#Passages marked <Supplement> are my own additions rather than lecture content; references are listed at the end. My knowledge is limited, so if you find errors, please point them out.
#---------------------------------------------------------------------------------#
<Supplement> The three elements of the support vector machine method (if the meaning of model, strategy, and algorithm in machine learning is unfamiliar, see the earlier note 機器學習三要素 on the three elements of machine learning):
Model: a linear classifier with the maximum margin; with the kernel trick it becomes, in effect, a non-linear classifier;
Strategy: margin maximization, which can be formalized as a convex quadratic programming problem;
Algorithm: an optimization algorithm for solving that convex quadratic program, such as Sequential Minimal Optimization (SMO);
#---------------------------------------------------------------------------------#
From logistic regression to the SVM
Let z = θᵀx.
Hypothesis: hθ(x) = g(z) = 1 / (1 + e^(−θᵀx));
Graph of the logistic function g(z):
[Figure: sigmoid curve of g(z), rising from 0 toward 1 as z increases]
When θᵀx is much greater than 0, hθ(x) is close to 1; when θᵀx is much less than 0, hθ(x) is close to 0;
Cost function of logistic regression (for a single example):
cost(hθ(x), y) = −y·log(hθ(x)) − (1 − y)·log(1 − hθ(x)), with hθ(x) = 1 / (1 + e^(−z));
When y = 1, the expression above becomes −log(1 / (1 + e^(−z))) = log(1 + e^(−z)); see the plot:
[Figure: cost for y = 1 as a function of z = θᵀx, decreasing smoothly toward 0 as z grows]
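As a concrete reference, here is a minimal numpy sketch (my own addition, not from the lecture) of the hypothesis g(z) and the per-example cost for y = 1 described above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_y1(z):
    """Per-example logistic cost when y = 1: -log(h) = log(1 + e^{-z})."""
    return np.log(1.0 + np.exp(-z))

z = np.linspace(-3, 3, 7)
print(sigmoid(z))           # rises from ~0 toward ~1 as z grows
print(logistic_cost_y1(z))  # decreases toward 0 as z grows
```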
The SVM cost function modifies the logistic regression cost. For y = 1 the SVM cost, written cost1(θᵀx), has two parts (the magenta line in the figure below): when z ≥ 1, cost1(θᵀx) = 0; when z < 1, cost1(θᵀx) is a straight line. This has two advantages: it is cheaper to compute (a straight line instead of the logistic function), and it is more convenient for the later optimization;
[Figure: logistic cost for y = 1 with the piecewise-linear SVM surrogate cost1 overlaid in magenta]
Similarly, doing the same for y = 0 gives cost0(θᵀx), the magenta line in the figure below.
[Figure: logistic cost for y = 0 with the piecewise-linear SVM surrogate cost0 overlaid in magenta]
This gives us cost0(θᵀx) and cost1(θᵀx):
cost1(z) = 0 for z ≥ 1 and increases linearly as z falls below 1; cost0(z) = 0 for z ≤ −1 and increases linearly as z rises above −1;
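A small numpy sketch (my own addition) writing the two surrogates as hinge functions; the lecture only draws the piecewise-linear curves, so taking the slope to be 1 here is an assumption:

```python
import numpy as np

def cost1(z):
    """SVM cost for y = 1: zero once z >= 1, a straight line for z < 1."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """SVM cost for y = 0: zero once z <= -1, a straight line for z > -1."""
    return np.maximum(0.0, 1.0 + z)

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(cost1(z))  # [3. 2. 1. 0. 0.]
print(cost0(z))  # [0. 0. 1. 2. 3.]
```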
Starting from minimizing the regularized logistic regression cost function:
min over θ of (1/m) Σ_{i=1..m} [ −y^(i) log hθ(x^(i)) − (1 − y^(i)) log(1 − hθ(x^(i))) ] + (λ/2m) Σ_{j=1..n} θ_j²,
and replacing the two log terms with the SVM surrogates, we get:
min over θ of (1/m) Σ_{i=1..m} [ y^(i) cost1(θᵀx^(i)) + (1 − y^(i)) cost0(θᵀx^(i)) ] + (λ/2m) Σ_{j=1..n} θ_j²;
Finally, let C = 1/λ and drop the factor 1/m (m is a constant, so removing it does not change the optimal θ), which gives the SVM cost function:
min over θ of C Σ_{i=1..m} [ y^(i) cost1(θᵀx^(i)) + (1 − y^(i)) cost0(θᵀx^(i)) ] + (1/2) Σ_{j=1..n} θ_j²;
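Putting the pieces together, a sketch (my own addition) of the full SVM objective above, reusing the hinge surrogates and leaving the intercept θ_0 unregularized as in the lecture:

```python
import numpy as np

def svm_cost(theta, X, y, C):
    """SVM objective: C * sum of hinge costs + (1/2) * sum of theta_j^2 (j >= 1).

    X is (m, n+1) with a leading column of ones, y has labels in {0, 1}.
    """
    z = X @ theta
    hinge = y * np.maximum(0.0, 1.0 - z) + (1 - y) * np.maximum(0.0, 1.0 + z)
    reg = 0.5 * np.sum(theta[1:] ** 2)   # do not regularize the intercept
    return C * np.sum(hinge) + reg

# tiny usage example with made-up numbers
X = np.array([[1.0, 2.0], [1.0, -1.5], [1.0, 0.5]])   # first column = bias term
y = np.array([1, 0, 1])
theta = np.array([0.0, 1.0])
print(svm_cost(theta, X, y, C=1.0))
```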
#---------------------------------------------------------------------------------#
Large margin intuition
Look again at the SVM's cost0(θᵀx) and cost1(θᵀx):
if y = 1, the cost is zero only when θᵀx ≥ 1 (not merely ≥ 0); if y = 0, the cost is zero only when θᵀx ≤ −1 (not merely < 0);
Note: the SVM wants a bit more than that - it doesn't want to *just* get it right, but have the value be quite a bit bigger than zero
- Throws in an extra safety margin factor
For the training data, the SVM not only requires the classification to be correct; it imposes an extra margin requirement so that the separation is "good";
[Figure: a linearly separable 2-D dataset with several candidate linear decision boundaries (green, magenta, black)]
- The green and magenta lines are functional decision boundaries which could be chosen by logistic regression
- But they probably don't generalize too well
- The black line, by contrast, is the one chosen by the SVM because of this safety net imposed by the optimization
- Mathematically, that black line has a larger minimum distance (margin) from any of the training examples
- More robust separator
- By separating with the largest margin, you incorporate robustness into your decision making process
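To make "larger minimum distance from any training example" concrete, here is a small sketch (my own addition, with made-up points and candidate boundaries) that computes the geometric margin of a few linear separators; the SVM prefers the one whose smallest point-to-boundary distance is largest:

```python
import numpy as np

def min_margin(w, b, X):
    """Smallest distance from any point in X to the hyperplane w.x + b = 0."""
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

# two separable clusters (made-up data)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])

# three candidate boundaries that all separate the clusters
candidates = {
    "steep":    (np.array([1.0, 0.2]), 0.0),
    "balanced": (np.array([1.0, 1.0]), 0.0),
    "shallow":  (np.array([0.2, 1.0]), 0.0),
}
for name, (w, b) in candidates.items():
    print(name, round(min_margin(w, b, X), 3))
# the "balanced" boundary has the largest minimum margin here
```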
<Supplement> What is a support vector?
In the figure below, two hyperplanes support the gap in the middle; they are equidistant from the separating hyperplane (the red line in the middle), and that distance is the largest geometric margin we can obtain. Some points must "support" these two hyperplanes, and those supporting points are called support vectors.
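A quick way to see which training points end up as support vectors in practice (my own addition; assumes scikit-learn is available):

```python
import numpy as np
from sklearn.svm import SVC

# tiny linearly separable dataset (made-up)
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, -2.5], [-3.0, -3.0]])
y = np.array([1, 1, 1, 0, 0, 0])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)        # the points that "support" the margin
print(clf.coef_, clf.intercept_)   # w and b of the separating hyperplane
```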
How the choice of C affects the SVM
When C is chosen appropriately:
[Figure: decision boundary with a sensible margin that tolerates the outlier]
When C is too large, the model overfits (magenta line):
[Figure: decision boundary shifted toward the outlier (magenta) so as to classify it correctly]
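A sketch (my own addition, with scikit-learn and made-up data containing one outlier) of the C = 1/λ trade-off just described: a very large C pushes the boundary to accommodate the outlier, while a small C tolerates the mistake and keeps a wider margin:

```python
import numpy as np
from sklearn.svm import SVC

# linearly separable data plus one "outlier" of the negative class
# sitting close to the positive cluster (made-up points)
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 1.0],
              [-1.0, -1.0], [-2.0, -2.0], [-3.0, -1.0],
              [0.5, 0.5]])
y = np.array([1, 1, 1, 0, 0, 0, 0])

for C in (0.01, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # large C (small lambda) tends to bend the boundary to fit the outlier;
    # small C tolerates the single mistake and keeps the boundary wide
    print(f"C={C}: w={clf.coef_.ravel()}, b={clf.intercept_[0]:.3f}, "
          f"train accuracy={clf.score(X, y):.2f}")
```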
<Supplement> Existence and uniqueness of the maximum-margin separating hyperplane: if the training data are linearly separable (this is a prerequisite), then the maximum-margin separating hyperplane that correctly separates all training samples exists and is unique;
#---------------------------------------------------------------------------------#
Kernels
<Supplement> When the training data are linearly separable or approximately linearly separable, a linear classifier is learned by maximizing the margin; when the training data are not linearly separable, the kernel trick is used to learn a non-linear classifier;
<Supplement> A kernel function gives the inner product between the feature vectors obtained by mapping the inputs from the input space into a feature space. By using a kernel function one can learn a non-linear support vector machine, which is equivalent to implicitly learning a linear support vector machine in a high-dimensional feature space. In other words, given the kernel K(x, z), the methods for solving linear classification problems can be used to train a non-linear SVM. The learning happens implicitly in the feature space; there is no need to define the feature space or the mapping explicitly. This trick is called the kernel trick;
Several commonly used kernel functions
Gaussian kernel (the most widely used): you need to choose σ (σ²); K(x, l) = exp(−‖x − l‖² / (2σ²)), see the sketch after this list;
linear kernel: i.e., "no kernel" (predict with θᵀx directly);
others: Polynomial kernel, String kernel, Chi-squared kernel, ...
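As promised above, a sketch (my own addition) of the Gaussian (RBF) kernel as a similarity function between a point x and a landmark l, which is how the lecture motivates building non-linear features f_i = K(x, l^(i)):

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    """Similarity between x and landmark l: exp(-||x - l||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(l, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
landmark = np.array([1.0, 2.0])
print(gaussian_kernel(x, landmark, sigma=1.0))        # 1.0: x is exactly at the landmark
print(gaussian_kernel(x + 5.0, landmark, sigma=1.0))  # ~0: far from the landmark
# a smaller sigma makes the similarity fall off faster with distance
```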
#---------------------------------------------------------------------------------#
Logistic regression vs. SVM
If n (features) is large vs. m (training set)
- e.g. text classification problem
- Feature vector dimension is 10 000
- Training set is 10 - 1000
Then use logistic regression or SVM with a linear kernel
If n is small and m is intermediate
- n = 1 - 1000
- m = 10 - 10 000
- Gaussian kernel is good
If n is small and m is large
- n = 1 - 1000
- m = 50 000+
- SVM will be slow to run with a Gaussian kernel
- In that case:
  - Manually create or add more features
  - Use logistic regression or SVM with a linear kernel
Logistic regression and SVM with a linear kernel are pretty similar
- Do similar things
- Get similar performance
A lot of the SVM's power comes from using different kernels to learn complex non-linear functions
For all these regimes a well designed NN should work
- But, for some of these problems a NN might be slower - a well-implemented SVM would be faster
SVM has a convex optimization problem - so you get a global minimum
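The guidelines above reduce to a simple rule of thumb based on n and m; here is a sketch (my own addition) encoding them, with the thresholds taken from the bullet points:

```python
def suggest_model(n_features, m_examples):
    """Rough heuristic from the lecture: choose by the relative sizes of n and m."""
    if n_features >= m_examples:
        # many features, few examples (e.g. text classification)
        return "logistic regression or SVM with a linear kernel"
    if m_examples <= 10_000:
        # n small, m intermediate
        return "SVM with a Gaussian kernel"
    # n small, m large: a Gaussian-kernel SVM would be slow
    return "add features, then logistic regression or linear-kernel SVM"

print(suggest_model(10_000, 1_000))   # n large relative to m
print(suggest_model(100, 5_000))      # m intermediate
print(suggest_model(100, 100_000))    # m large
```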
#---------------------------------------------------------------------------------#
References
統計學習方法 (Statistical Learning Methods), by Li Hang
理解SVM的三層境界——支持向量機通俗導論 (Understanding SVM at three levels: a popular introduction to support vector machines), by July and pluskid
Stanford Machine Learning (Coursera), by Andrew Ng