Machine Learning Algorithms: The Perceptron Algorithm

Reposted from the original article, 2020-01-02 11:08. Category: Machine Learning

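Overview

The perceptron is a linear model for binary classification: its input is a feature vector and its output is a class label. It learns a hyperplane that separates the data into a positive class and a negative class, and it is arguably the most basic classifier in machine learning.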

Model

The perceptron is a linear classifier.
Given an input vector $x = (x_1, x_2, \ldots, x_n)$, multiply it by the weight vector w and add the bias b:

\[f(x) = \sum_{i=1}^{n} w_i x_i + b \]

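After choosing a threshold, we get

\[y = +1 \quad \text{if } f(x) > \text{threshold} \]

\[y = -1 \quad \text{if } f(x) < \text{threshold} \]

The negative class is labeled $-1$ rather than 0 so that the labels agree with the sign function and with the update rule derived later. Absorbing the threshold into the bias b, the model can be written as: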

\[\mathbf{y} = sign(\mathbf{w}^{T} \mathbf{x}+\mathbf{b}) \]
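
The decision rule above can be sketched in a few lines. The function name `perceptron_predict` and the sample weights are illustrative assumptions, not part of the original code; a score of exactly 0 is assigned to the negative class here purely by convention.

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = np.dot(w, x) + b
    # Assumption: a score of exactly 0 is treated as the negative class.
    return 1 if score > 0 else -1

w = np.array([2.0, -1.0])   # illustrative weights
b = -0.5                    # illustrative bias

# score = 2*1.0 - 1*0.5 - 0.5 = 1.0, so the label is +1
print(perceptron_predict(w, b, np.array([1.0, 0.5])))
# score = 2*0.0 - 1*1.0 - 0.5 = -1.5, so the label is -1
print(perceptron_predict(w, b, np.array([0.0, 1.0])))
```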

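Parameter Learning

We now know what the model looks like and that the perceptron has two parameters, w and b. How are these two parameters updated?
First, here is the perceptron's update strategy:

1. Initialize the parameters.
2. Check every data point to see whether the hyperplane separates the positive and negative examples correctly.
3. If it does not, update w and b.
4. Repeat steps 2 and 3 until the data is separated or the iteration limit is reached.

So how are w and b updated?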

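Learning always moves toward lower loss. In other words, each update should reduce the loss of the current point with respect to the hyperplane.
The distance from a point to the hyperplane is given by the following formula (the derivation is easy to look up), in which $\left \| \mathbf{w} \right \|$ is the L2 norm of w, that is, the square root of the sum of the squares of its components: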

\[\frac{1}{\left \| \mathbf{w} \right \|} \left | \mathbf{w}^{T} \mathbf{x}+\mathbf{b} \right | \]
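
As a quick numerical check of the distance formula, here is a minimal sketch. The helper name `distance_to_hyperplane` and the numbers are chosen for illustration only.

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Distance |w^T x + b| / ||w|| from point x to the hyperplane w^T x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # illustrative weights, ||w|| = 5
b = -5.0                   # illustrative bias
x = np.array([3.0, 4.0])

# |3*3 + 4*4 - 5| / 5 = 20 / 5 = 4
d = distance_to_hyperplane(w, b, x)
print(d)
```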

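Let $M$ be the set of misclassified points. For a misclassified point, $\mathbf{y}_{i}(\mathbf{w}^{T} \mathbf{x}_{i}+\mathbf{b}) < 0$, so its distance equals $-\mathbf{y}_{i}(\mathbf{w}^{T} \mathbf{x}_{i}+\mathbf{b}) / \left \| \mathbf{w} \right \|$, and the loss function can be written as:

\[L(\mathbf{w},\mathbf{b}) = -\frac{1}{\left \| \mathbf{w} \right \|}\sum_{\mathbf{x}_{i}\in M}\mathbf{y}_{i}(\mathbf{w}^{T} \mathbf{x}_{i}+\mathbf{b}) \]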

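To simplify the computation, the factor $ \frac{1}{\left \| \mathbf{w} \right \|}$ is dropped here, for two reasons:

- $ \frac{1}{\left \| \mathbf{w} \right \|}$ is positive, so it does not change the sign of $-\mathbf{y}_{i}(\mathbf{w}^{T} \mathbf{x}_{i}+\mathbf{b})$ and therefore does not affect the intermediate steps of the learning algorithm.
- The perceptron learning algorithm terminates when every input is classified correctly, that is, when no misclassified points remain. The loss is then 0, so $ \frac{1}{\left \| \mathbf{w} \right \|}$ has no effect on the final result either.

The loss function can then be written as:

\[L(\mathbf{w},\mathbf{b}) = -\sum_{\mathbf{x}_{i}\in M}\mathbf{y}_{i}(\mathbf{w}^{T} \mathbf{x}_{i}+\mathbf{b}) \]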
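
The simplified loss can be evaluated directly. The sketch below is an illustration under stated assumptions: the function name `perceptron_loss` and the three sample points are made up, and a point with margin exactly 0 is counted as misclassified, matching the `<= 0` test used in the classes later in this post.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """Sum of -y_i (w^T x_i + b) over the misclassified points only."""
    margins = y * (X @ w + b)
    misclassified = margins <= 0   # assumption: margin 0 counts as an error
    return -np.sum(margins[misclassified])

X = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0]])   # illustrative points
y = np.array([1.0, -1.0, -1.0])                        # illustrative labels
w = np.array([1.0, 1.0])
b = 0.0

# Margins are [2, -2, 2]: only the second point is misclassified,
# so the loss is -(-2) = 2.
loss = perceptron_loss(w, b, X, y)
print(loss)
```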

Here we consider the loss of a single misclassified point $(x_{i}, y_{i})$ only, as in stochastic gradient descent. Differentiating $L(\mathbf{w},\mathbf{b})$ with respect to w and b gives the gradients that drive the update:

\[ L(\mathbf{w},\mathbf{b}) = -\mathbf{y}_{i}(\mathbf{w}^{T} \mathbf{x}_{i}+\mathbf{b})\Rightarrow \left\{\begin{matrix}\frac{\partial L(\mathbf{w},\mathbf{b})}{\partial \mathbf{w}} = -y_{i}x_{i}\\\frac{\partial L(\mathbf{w},\mathbf{b})}{\partial \mathbf{b}} = -y_{i}\end{matrix}\right. \]

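Stepping against the gradient with learning rate $\eta$ (the `learning_rate` parameter in the code later in this post), the updates for w and b are:

\[w_{k+1} = w_{k}+\eta y_{i}x_{i} \]

\[b_{k+1} = b_{k}+\eta y_{i} \]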
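
To make the update concrete, the sketch below applies one step to a single misclassified point and checks that the point's margin $y_{i}(w^{T}x_{i}+b)$ increases. The function name `perceptron_update`, the sample point, and the `learning_rate` argument (set to 1 here) are illustrative assumptions.

```python
import numpy as np

def perceptron_update(w, b, x_i, y_i, learning_rate=1.0):
    """One perceptron step on a misclassified point (x_i, y_i)."""
    return w + learning_rate * y_i * x_i, b + learning_rate * y_i

w = np.zeros(2)
b = 0.0
x_i = np.array([2.0, 1.0])   # illustrative point
y_i = -1.0                   # its true label

margin_before = y_i * (np.dot(w, x_i) + b)   # 0: counted as misclassified
w, b = perceptron_update(w, b, x_i, y_i)     # w = [-2, -1], b = -1
margin_after = y_i * (np.dot(w, x_i) + b)    # -1 * (-4 - 1 - 1) = 6

print(margin_before, margin_after)
```

One step does not always fix the point, but it always moves the margin in the right direction, by $\eta(\|x_{i}\|^{2}+1)$.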

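Notes

For linearly separable data, the perceptron is guaranteed to find a hyperplane that classifies the data correctly within a finite number of iterations, but this separating hyperplane is not unique.

For data that is not linearly separable, the perceptron learning algorithm never terminates on its own and stops only when the iteration limit is reached. No hyperplane classifies the data perfectly, so in the later stages of training the hyperplane keeps oscillating.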
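
The oscillation can be seen on the smallest non-separable example, XOR. The sketch below is an illustration, not the author's experiment: the dataset, the 20-epoch cap, and the variable names are assumptions. Because no hyperplane separates XOR, every pass over the data makes at least one mistake.

```python
import numpy as np

# XOR with labels in {-1, +1}: not linearly separable.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

Xb = np.hstack([np.ones((4, 1)), X])   # prepend a bias column of ones
w = np.zeros(3)                        # w[0] plays the role of b

mistakes_per_epoch = []
for epoch in range(20):                # assumed iteration cap
    mistakes = 0
    for x_i, y_i in zip(Xb, y):
        if y_i * np.dot(w, x_i) <= 0:  # misclassified (or on the boundary)
            w = w + y_i * x_i
            mistakes += 1
    mistakes_per_epoch.append(mistakes)

# No epoch is ever error-free, so the loop only ends at the cap.
print(mistakes_per_epoch)
```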

Verification and Code

First, generate a dataset that is perfectly linearly separable

import matplotlib.pyplot as plt
import numpy as np
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

# generate the separable data
def generate_separable_data(N):
    np.random.seed(2333)  # for reproducibility
    w = np.random.uniform(-1, 1, 2)
    b = np.random.uniform(-1, 1)
    X = np.random.uniform(-1, 1, [N, 2])
    y = np.sign(np.inner(w, X)+b)
    return X,y,w,b

# generate the non separable data, set the first negative point as the positive point.
def generate_non_separable_data(N):
    np.random.seed(2333)  # for reproducibility
    w = np.random.uniform(-1, 1, 2)
    b = np.random.uniform(-1, 1)
    X = np.random.uniform(-1, 1, [N, 2])
    y = np.sign(np.inner(w, X)+b)
    for i in range(len(y)):
        if y[i] == -1:
            y[i] = 1
            break
            
    return X,y,w,b

#plot the data 
def plot_data(X, y, w, b) :
    fig = plt.figure(figsize=(4,4))
    plt.xlim(-1,1)
    plt.ylim(-1,1)
    a = -w[0]/w[1]
    pts = np.linspace(-1,1)
    plt.plot(pts, a*pts-(b/w[1]), 'k-')
    cols = {1: 'r', -1: 'b'}
    for i in range(len(X)): 
        plt.plot(X[i][0], X[i][1], cols[y[i]]+'o')
    plt.show()

X,y,w,b = generate_separable_data(50)
plot_data(X, y, w,b)

X,y,w,b = generate_non_separable_data(50)
plot_data(X, y, w,b)

The Perceptron

class Perceptron :
 
    """An implementation of the perceptron algorithm."""
 
    def __init__(self, max_iterations=100, learning_rate=0.2) :
 
        self.max_iterations = max_iterations
        self.learning_rate = learning_rate
 
    def fit(self, X, y) :
        X = self.add_bias(X,y)
        
        self.w = np.zeros(len(X[0]))

        converged = False
        iterations = 0
        while (not converged and iterations <= self.max_iterations) :
            converged = True
            for i in range(len(X)) :
                if y[i] * self.decision_function(X[i]) <= 0 :
                    self.w = self.w + y[i] * self.learning_rate * X[i]
                    converged = False
            iterations += 1
        self.converged = converged
        plot_data(X[:,1:], y, self.w[1:],self.w[0])
        if converged :
            print ('converged in %d iterations ' % iterations)
        else:
            print ('cannot converged in %d iterations ' % iterations)

        print ('weight vector: ' + str(self.w))

    def decision_function(self, x) :
        return np.inner(self.w, x)

    def predict(self, X) :
        # X must already contain the leading bias column (see add_bias)
        scores = np.inner(self.w, X)
        return np.sign(scores)
    
    def add_bias(self,X,y):
        a = np.ones(X.shape[0])
        X = np.insert(X, 0, values=a, axis=1)
        return X
    def error(self,X,y):
        num = 0
        err_sco = self.predict(X)
        num =sum (err_sco!=y)
        return num/len(X)

Testing on separable data

X,y,w,b = generate_separable_data(100)
p = Perceptron()
p.fit(X,y)

converged in 41 iterations 
weight vector: [-1.          0.07009608  1.53050105]
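
The printed weight vector stores the bias in its first entry, because `add_bias` prepends a column of ones to X. A sketch of how such a vector could be applied to new, raw 2-D points follows; the helper `classify` and the two sample points are hypothetical, and the weights are copied from the output above.

```python
import numpy as np

# Weight vector copied from the printed output: [bias, w_1, w_2].
learned_w = np.array([-1.0, 0.07009608, 1.53050105])

def classify(weights, points):
    """Split off the bias entry and apply sign(w^T x + b) to raw points."""
    bias, w = weights[0], weights[1:]
    return np.sign(points @ w + bias)

points = np.array([[0.0, 1.0],    # score = 1.53050105 - 1 > 0
                   [0.0, 0.0]])   # score = -1 < 0
labels = classify(learned_w, points)
print(labels)   # first point is classified +1, second -1
```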

The Pocket

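An improved variant of the Perceptron.

Pocket updates its parameters exactly as the Perceptron does, but it mitigates the Perceptron's oscillation problem to some extent.

When the Perceptron meets a dataset that cannot be separated perfectly, it can only run until the iteration limit is reached. The result is unpredictable: it may be fairly good or very poor, depending on where the oscillation happens to stop.
Pocket additionally keeps a record of the best parameters seen so far. Even when the data cannot be separated perfectly, it still returns the best solution found during training.

Here is the Pocket update strategy:

1. Initialize the parameters and record them as the best so far.
2. Check every data point to see whether the hyperplane separates the positive and negative examples correctly.
3. If it does not, compute the error e_p under the best parameters w_p, b_p and the error e under the current w, b, then compare e_p with e.
4. If the current w, b give a smaller error (e < e_p), set w_p = w and b_p = b; otherwise leave w_p, b_p unchanged.
5. Update w and b.
6. Repeat steps 2, 3, 4 and 5 until the data is separated or the iteration limit is reached.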

import copy
class Pocket :
 
    """An implementation of the Pocket algorithm."""
    
    def __init__(self, max_iterations=100, learning_rate=0.2) :
 
        self.max_iterations = max_iterations
        self.learning_rate = learning_rate
 
    def fit(self, X, y) :
        X = self.add_bias(X,y)
        
        self.w = np.zeros(len(X[0]))
        self.w_p = np.zeros(len(X[0])) # the best w
        self.error_pocket = self.error(X,y)   
        converged = False
        iterations = 0
        while (not converged and iterations <= self.max_iterations) :
            converged = True
            for i in range(len(X)) :
                if y[i] * self.decision_function(X[i]) <= 0 :
                    self.w = self.w + y[i] * self.learning_rate * X[i]
                    
                    """if self.w makes fewer mistakes than self.w_p, replace the 
                            self.w_p by self.w
                            In this way, it can find a "(somewhat)best" weight
                                """ 
                    self.error_in = self.error(X,y)
                    ### save the best value of w and b
                    if self.error_pocket>self.error_in:
                        self.w_p = copy.copy(self.w)
                        self.error_pocket = copy.copy(self.error_in)
                    converged = False

            iterations += 1
        self.converged = converged
        self.w = self.w_p
        plot_data(X[:,1:], y, self.w[1:],self.w[0])
        if converged :
            print ('converged in %d iterations ' % iterations)
        else:
            print ('cannot converged in %d iterations ' % iterations)
        print ('weight vector: ' + str(self.w))

    def decision_function(self, x) :
        return np.inner(self.w, x)
 
    def predict(self, X) :
        # X must already contain the leading bias column (see add_bias)
        scores = np.inner(self.w, X)
        return np.sign(scores)
     
    def add_bias(self,X,y):
        a = np.ones(X.shape[0])
        X = np.insert(X, 0, values=a, axis=1)
        return X
    def error(self,X,y):
        num = 0
        err_sco = self.predict(X)
        num =sum (err_sco!=y)
        return num/len(X)

Testing on separable data

X,y,w,b = generate_separable_data(100)
p = Pocket()
p.fit(X,y)

converged in 41 iterations 
weight vector: [-1.          0.07009608  1.53050105]

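Comparison

To see the concrete difference between the Perceptron and Pocket, we test both on the non-separable data.

As the plots show, the Perceptron does not handle non-separable data well, whereas Pocket keeps the better model it has seen. In theory, given enough iterations, Pocket can reach the best model.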

Perceptron

X,y,w,b = generate_non_separable_data(100)
p = Perceptron()
p.fit(X,y)

cannot converged in 101 iterations 
weight vector: [-0.4        -0.08723007  0.56359207]

Pocket

X,y,w,b = generate_non_separable_data(100)
p = Pocket()
p.fit(X,y)

cannot converged in 101 iterations 
weight vector: [-0.4        -0.00380026  0.60356631]
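
To put a number on the comparison, the training error of each weight vector can be computed on the same data. The sketch below is an assumption-laden illustration: it regenerates the data with the same procedure and seed as `generate_non_separable_data`, copies the two weight vectors from the printed outputs (which are rounded), and uses a hypothetical helper `error_rate`. The values it prints may therefore differ slightly from those of the original run.

```python
import numpy as np

def generate_non_separable_data(N):
    """Same procedure as in the post: separable data with one flipped label."""
    np.random.seed(2333)
    w = np.random.uniform(-1, 1, 2)
    b = np.random.uniform(-1, 1)
    X = np.random.uniform(-1, 1, [N, 2])
    y = np.sign(np.inner(w, X) + b)
    for i in range(len(y)):
        if y[i] == -1:
            y[i] = 1      # flip the first negative label
            break
    return X, y

def error_rate(weights, X, y):
    """Fraction of points misclassified by weights = [bias, w_1, w_2]."""
    scores = X @ weights[1:] + weights[0]
    return np.mean(np.sign(scores) != y)

X, y = generate_non_separable_data(100)

# Weight vectors copied (rounded) from the printed outputs above.
perceptron_w = np.array([-0.4, -0.08723007, 0.56359207])
pocket_w = np.array([-0.4, -0.00380026, 0.60356631])

perceptron_err = error_rate(perceptron_w, X, y)
pocket_err = error_rate(pocket_w, X, y)
print('Perceptron training error:', perceptron_err)
print('Pocket training error:', pocket_err)
```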
