Implementing a Perceptron Learning Algorithm in Python

This article is a repost. 2017-06-09 19:46. Tags: python / perceptron / machine learning algorithm

This article mainly follows Chapter 2 of the English-language textbook Python Machine Learning. PDF download link: https://pan.baidu.com/s/1nuS07Qp (password: gcb9).

It covers implementing a perceptron model in Python and using that model to carry out a classification task.

Warren McCulloch and Walter Pitts first proposed the MCP neuron model in 1943 [1]. Frank Rosenblatt later built on the MCP neuron model to propose the perceptron model [2]. See Chapter 2 of the textbook for the details.
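As a brief reminder of what the code in this article implements, the perceptron's decision function and learning rule can be written roughly as follows. The notation is an assumption chosen to match the variable names used in the code ($\eta$ for the learning rate `eta`, $w_0$ for the bias weight `w_[0]`, class labels in $\{-1, 1\}$), not a quotation from the references.

```latex
z = w_0 + \sum_{j=1}^{m} w_j x_j, \qquad
\hat{y} =
\begin{cases}
 1 & \text{if } z \ge 0 \\
-1 & \text{otherwise}
\end{cases}

\Delta w_j = \eta \,\bigl(y^{(i)} - \hat{y}^{(i)}\bigr)\, x_j^{(i)}, \qquad
\Delta w_0 = \eta \,\bigl(y^{(i)} - \hat{y}^{(i)}\bigr), \qquad
w_j \leftarrow w_j + \Delta w_j
```

For a correctly classified sample $y^{(i)} - \hat{y}^{(i)} = 0$, so the weights stay unchanged. For a misclassified sample the factor is $\pm 2$, which moves the weights toward the correct side.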

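We write the perceptron interface in an object-oriented style, so that a new perceptron object can be initialized, learn its parameters from data through the fit() method, and make predictions through the predict() method. The code below walks through the implementation: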

import numpy as np

class Perceptron(object):
    """Perceptron classifier.
    Parameters
    ------------
    eta:float,Learning rate (between 0.0 and 1.0)
    n_iter:int,Passes over the training dataset.
    
    Attributes
    -------------
    w_: 1d-array,Weights after fitting.
    errors_: list,Number of misclassifications in every epoch.
    """
    def __init__(self,eta=0.01,n_iter=10):
        self.eta = eta
        self.n_iter = n_iter
    def fit(self,X,y):
        """Fit training data.

        Initialize the weights, then loop over every sample in the training
        set and update the weights according to the perceptron learning rule.

        Parameters
        ------------
        X: {array-like}, shape=[n_samples, n_features]
            Training vectors, where n_samples is the number of samples and n_features is the number of features.
        y: array-like, shape=[n_samples]
            Target values.
        Returns
        ----------
        self: object
        """
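        # Initialize the weights: number of features + 1, where w_[0] is the bias weight w_0.
        self.w_ = np.zeros(1 + X.shape[1])
        # Records the number of misclassified samples in each epoch.
        self.errors_ = []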
        
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X,y):
                update = self.eta * (target - self.predict(xi))  # calls predict()
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self
    
    def net_input(self,X):
        """calculate net input"""
        return np.dot(X,self.w_[1:]) + self.w_[0]  # dot product of inputs and weights, plus the bias

    def predict(self,X):  # predict the class label
        """return class label after unit step"""
        return np.where(self.net_input(X) >= 0.0,1,-1)

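Next we train the perceptron model on the Iris dataset. We load two classes of flowers, Setosa and Versicolor, and select two features: sepal length and petal length. The perceptron is of course not limited to two features. A binary classifier can also be extended to the multi-class case with the One-vs-All (OvA) technique, also called One-vs-Rest (OvR).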
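The One-vs-Rest idea mentioned above can be sketched as follows: train one binary perceptron per class (that class labeled 1, every other class labeled -1) and predict the class whose perceptron gives the largest net input. This is only a minimal sketch under stated assumptions: the function names `train_binary_perceptron`, `fit_one_vs_rest` and `predict_one_vs_rest` and the toy data are hypothetical and do not come from the textbook, and the approach only behaves well when each class is linearly separable from the rest.

```python
import numpy as np


def train_binary_perceptron(X, y, eta=0.5, n_iter=200):
    """Train one perceptron on labels in {-1, 1}; bias weight is stored at index 0."""
    w = np.zeros(1 + X.shape[1])
    for _ in range(n_iter):
        errors = 0
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(xi, w[1:]) + w[0] >= 0.0 else -1
            update = eta * (target - prediction)
            w[1:] += update * xi
            w[0] += update
            errors += int(update != 0.0)
        if errors == 0:  # stop early once a full epoch makes no mistakes
            break
    return w


def fit_one_vs_rest(X, y, eta=0.5, n_iter=200):
    """Train one binary perceptron per class: that class is +1, all others are -1."""
    classes = np.unique(y)
    weights = np.array([
        train_binary_perceptron(X, np.where(y == c, 1, -1), eta, n_iter)
        for c in classes
    ])
    return classes, weights


def predict_one_vs_rest(X, classes, weights):
    """Pick, for each sample, the class whose perceptron has the largest net input."""
    net_inputs = X @ weights[:, 1:].T + weights[:, 0]
    return classes[np.argmax(net_inputs, axis=1)]


# Hypothetical toy data: three well-separated clusters, one per class.
X_toy = np.array([[0, 0], [1, 0], [0, 1],
                  [5, 0], [6, 1], [5, 1],
                  [0, 5], [1, 6], [1, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

classes, weights = fit_one_vs_rest(X_toy, y_toy)
predictions = predict_one_vs_rest(X_toy, classes, weights)
print(predictions)
```

Comparing raw net inputs across separately trained perceptrons is a simplification, since their scales are not calibrated against each other. It is sufficient here because each toy cluster is separable from the other two.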

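import pandas as pd  # pandas is used to read the data
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from Perceptron_1 import Perceptron  # assumes the class above is saved as Perceptron_1.py

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',header=None)  # the data could also be fetched with the requests package
print(df.tail())  # print the last five rows to inspect the format of the Iris dataset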

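"""Extract the first 100 samples, which are exactly the Setosa and Versicolor
samples. Versicolor is labeled as class 1 and Setosa as -1. For the features we
take sepal length and petal length, then visualize the data with a scatter plot."""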

y = df.iloc[0:100,4].values
y = np.where(y == 'Iris-setosa',-1,1)
X = df.iloc[0:100,[0,2]].values
plt.scatter(X[:50,0],X[:50,1],color = 'red',marker='o',label='setosa')
plt.scatter(X[50:100,0],X[50:100,1],color='blue',marker='x',label='versicolor')
plt.xlabel('sepal length')
plt.ylabel('petal length')
plt.legend(loc='upper left')
plt.show()

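# Train our perceptron model now.
# To better understand the training process, plot the number of
# misclassifications in each epoch to check whether the algorithm
# converges and finds a decision boundary.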
ppn=Perceptron(eta=0.1,n_iter=10)
ppn.fit(X,y)
plt.plot(range(1,len(ppn.errors_)+1),ppn.errors_,marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of misclassifications')
plt.show()

# Plot the decision boundary (separating hyperplane)
def plot_decision_region(X,y,classifier,resolution=0.02):
    #setup marker generator and color map
    markers=('s','x','o','^','v')
    colors=('red','blue','lightgreen','gray','cyan')
    cmap=ListedColormap(colors[:len(np.unique(y))])
    
    #plot the decision surface
    x1_min,x1_max=X[:,0].min()-1,X[:,0].max()+1
    x2_min,x2_max=X[:,1].min()-1,X[:,1].max()+1               
    
    xx1,xx2=np.meshgrid(np.arange(x1_min,x1_max,resolution),
                        np.arange(x2_min,x2_max,resolution))
    Z=classifier.predict(np.array([xx1.ravel(),xx2.ravel()]).T)
    Z=Z.reshape(xx1.shape)
    
    plt.contour(xx1,xx2,Z,alpha=0.4,cmap=cmap)
    plt.xlim(xx1.min(),xx1.max())
    plt.ylim(xx2.min(),xx2.max())
    
    #plot class samples
    for idx,cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y==cl,0],y=X[y==cl,1],alpha=0.8,color=cmap(idx), marker=markers[idx],label=cl)

plot_decision_region(X,y,classifier=ppn)
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')
plt.show()
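Because the perceptron predicts class 1 exactly when w_0 + w_1*x_1 + w_2*x_2 >= 0, the boundary drawn by plot_decision_region is the straight line on which that expression equals zero. As a minimal sketch, the helper below solves this equation for x_2. The name `boundary_x2` is hypothetical, and the example weights are made up for illustration rather than taken from the model trained on Iris.

```python
import numpy as np


def boundary_x2(w, x1):
    """Return x2 on the line w[0] + w[1]*x1 + w[2]*x2 = 0.

    w follows the layout of Perceptron.w_: bias first, then one weight per feature.
    Assumes w[2] != 0; otherwise the boundary is a vertical line in this plot.
    """
    w = np.asarray(w, dtype=float)
    if w[2] == 0.0:
        raise ValueError("w[2] is zero: the boundary is a vertical line")
    return -(w[0] + w[1] * np.asarray(x1, dtype=float)) / w[2]


# Illustrative weights only (not the result of training on Iris).
w_example = [-1.0, -2.0, 4.0]
x1_values = np.array([4.0, 5.0, 6.0])
x2_values = boundary_x2(w_example, x1_values)
print(x2_values)
```

With a fitted model, calling boundary_x2(ppn.w_, x1_values) would give points that can be passed to plt.plot to overlay the line on the scatter plot.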

Running the plotting code gives the result shown in the figure below:

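If the two classes are linearly separable, i.e. there exists a linear hyperplane that separates them, the perceptron learning process is guaranteed to converge and yield a suitable weight vector. Otherwise learning oscillates (fluctuates): the weight vector does not settle down and no suitable solution is found. See [3] for the detailed proof.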
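The difference between the two cases can be illustrated on a small scale. The sketch below applies the same update rule as the Perceptron class to two hypothetical toy problems: logical AND, which is linearly separable, and logical XOR, which is not. The function name `errors_per_epoch` and the choice of eta=0.5 and 20 epochs are assumptions made for this illustration.

```python
import numpy as np


def errors_per_epoch(X, y, eta=0.5, n_iter=20):
    """Run the perceptron rule and return the number of weight updates in each epoch."""
    w = np.zeros(1 + X.shape[1])  # w[0] is the bias weight
    history = []
    for _ in range(n_iter):
        errors = 0
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(xi, w[1:]) + w[0] >= 0.0 else -1
            update = eta * (target - prediction)
            w[1:] += update * xi
            w[0] += update
            errors += int(update != 0.0)
        history.append(errors)
    return history


X_bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])   # logical AND: linearly separable
y_xor = np.array([-1, 1, 1, -1])    # logical XOR: not linearly separable

and_errors = errors_per_epoch(X_bits, y_and)
xor_errors = errors_per_epoch(X_bits, y_xor)

print("AND, errors in the last epoch:", and_errors[-1])
print("XOR, fewest errors in any epoch:", min(xor_errors))
```

On AND the error count drops to zero and stays there. On XOR every epoch contains at least one misclassification, because no single line classifies all four points correctly, so the weights keep changing.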

The code uses the NumPy, Pandas, and Matplotlib libraries. Readers unfamiliar with them can learn from the links below.

　　NumPy: http://wiki.scipy.org/Tentative_NumPy_Tutorial

　　Pandas: http://pandas.pydata.org/pandas-docs/stable/tutorials.html

　　Matplotlib: http://matplotlib.org/users/beginner.html

References:

[1] W. S. McCulloch and W. Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943

[2] F. Rosenblatt, The Perceptron, a Perceiving and Recognizing Automaton. Cornell Aeronautical Laboratory, 1957

[3] M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969
