This assignment has a fairly heavy coding workload: overall it requires implementing three models, a neural network, kNN, and k-means.
Q11~Q14 are the Neural Network questions. My implementation is single-threaded and takes quite a while to run, so I record the correct answers to these questions here:
Q11: 6
Q12: 0.001
Q13: 0.01
Q14: 0.02 ≤ Eout ≤ 0.04
The answers to Q11 and Q14 are fairly clear-cut; for Q12 and Q13 two of the choices come out quite close (I consulted the discussion forum and eventually tuned my way to the right ones).
The neural network code is organized as follows:
1) Initialize the weight matrices W (def init_W(nnet_struct, w_range))
2) Compute every neuron's output in each iteration, i.e. the forward pass of the backpropagation algorithm (def forward_process(x, y, W))
3) Compute, in each iteration, the derivative of the output error with respect to each neuron's input score, i.e. the backward pass of backpropagation (def backward_process(x, y, neuron_output, W))
4) Update each layer's weight matrix W by gradient descent (def update_W_withGD(x, neuron_output, gradient, W, ita))
The hardest part is step 3): to write it in vectorized (matrix) form you need to know the structure of every layer cold, and also be very familiar with the matrix operations of the language you are using. I am still weak on this front; it really comes down to practice.
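For reference, steps 2) to 4) boil down to the standard backpropagation recursion for tanh units with squared error. The notation below is mine, not from the course handout: the gradient arrays in my code hold the deltas, i.e. the derivatives of the error on one sample with respect to each neuron's input score s.

\[
\delta^{(L)}_1 = -2\,\bigl(y_n - o^{(L)}_1\bigr)\,\tanh'\bigl(s^{(L)}_1\bigr),
\qquad
\delta^{(\ell)}_j = \Bigl(\sum_k W^{(\ell+1)}_{jk}\,\delta^{(\ell+1)}_k\Bigr)\,\tanh'\bigl(s^{(\ell)}_j\bigr)
\]
\[
W^{(\ell)}_{ij} \leftarrow W^{(\ell)}_{ij} - \eta\, x^{(\ell-1)}_i\,\delta^{(\ell)}_j,
\qquad
\tanh'(s) = 1 - \tanh^2(s) = \frac{4}{e^{2s} + e^{-2s} + 2}
\]

The sum over k excludes the bias weight of the next layer, which is why the code below slices W[i+1][1:] when propagating the deltas backward.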
>> This was my first time writing an NNet algorithm, so I started debugging from a single hidden layer (2 hidden units), working through modules 1) 2) 3) 4) in order and debugging each one separately. This incremental approach is slow, but the module quality is higher, which makes the later integration debugging much easier.
>> For a really complex network, how would you debug this kind of gradient code? It is practically impossible to check the gradient at every point by hand, so I looked up the gradient checking technique online (see the sketch after this list): http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization
>> Hyperparameter tuning for NNets really matters. Take Q14: even with the same total number of hidden units, distributing them differently across layers changes the result (I was careless at first and set the NNet structure to 3 8 1, and found it did worse than 8 3 1). I should read up more on tuning and accumulate some experience.
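A minimal sketch of that gradient checking idea from the link above. This is not part of my homework code; loss_fn, grad_fn and the flat parameter vector theta are hypothetical stand-ins for whatever network you want to verify:

import numpy as np

def numerical_gradient(loss_fn, theta, eps=1e-4):
    # central difference: dJ/dtheta_i ~ (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (loss_fn(theta + e) - loss_fn(theta - e)) / (2 * eps)
    return grad

def gradient_check(loss_fn, grad_fn, theta):
    # compare the analytic (backprop) gradient with the numerical one;
    # a tiny relative error means the backprop code is very likely correct
    num = numerical_gradient(loss_fn, theta)
    ana = grad_fn(theta)
    denom = np.linalg.norm(num) + np.linalg.norm(ana) + 1e-12
    return np.linalg.norm(num - ana) / denom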
The code follows (I did not delete the debugging code, both to keep a record of the debugging process and to avoid repeating similar mistakes later); it is admittedly a bit messy, so please bear with me:
#encoding=utf8
import sys
import numpy as np
import math
from random import *

##
# read data from local file
# return with numpy array
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        if line.strip()=='': continue
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0,len(items)-1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x),np.array(y)

##
# initialize weight matrix
# input neural network structure & initializing uniform value range (both low and high)
# each layer's bias need to be added
# return with initialized W
def init_W(nnet_struct, w_range):
    W = []
    for i in range(1,len(nnet_struct)):
        tmp_w = np.random.uniform( w_range['low'], w_range['high'], (nnet_struct[i-1]+1,nnet_struct[i]) )
        W.append(tmp_w)
    return W

##
# randomly pick sample from raw data for Stochastic Gradient Descent
# T indicates the iterative numbers
# return with data for each SGD iteration
def pick_SGD_data(x, y, T):
    sgd_x = np.zeros((T,x.shape[1]))
    sgd_y = np.zeros(T)
    for i in range(T):
        index = randint(0, x.shape[0]-1)
        sgd_x[i] = x[index]
        sgd_y[i] = y[index]
    return sgd_x, sgd_y

##
# forward process
# calculate each neuron's output
def forward_process(x, y, W):
    ret = []
    # print W[0].shape
    # print W[1].shape
    pre_x = np.hstack((1,x))
    for i in range(len(W)):
        pre_x = np.tanh(np.dot(pre_x, W[i]))
        ret.append(pre_x)
        pre_x = np.hstack((1,pre_x))
    return ret

##
# backward process
# calculate the gradient of error and each neuron's input score
def backward_process(x, y, neuron_output, W):
    ret = []
    L = len(neuron_output)
    # print neuron_output[0].shape, neuron_output[1].shape
    # Output layer
    score = np.dot( np.hstack((1, neuron_output[L-2])), W[L-1] )
    # print score
    # print score.shape
    gradient = np.array( [-2 * (y-neuron_output[L-1][0]) * tanh_gradient(score)] )
    # print gradient
    # print gradient.shape
    ret.insert(0, gradient)
    # Hidden layer
    for i in range(L-2,-1,-1):
        if i==0:
            score = np.dot(np.hstack((1, x)),W[i])
            # print score.shape
            # print gradient.shape
            # print W[1][1:].transpose().shape
            # print score
            gradient = np.dot(gradient, W[1][1:].transpose()) * tanh_gradient(score)
            # print gradient
            # print gradient.shape
            ret.insert(0, gradient)
        else:
            score = np.dot(np.hstack((1,neuron_output[i-1])),W[i])
            # print score.shape
            # print gradient.shape
            # print W[i+1][1:].transpose().shape
            # print "......"
            gradient = np.dot(gradient , W[i+1][1:].transpose()) * tanh_gradient(score)
            # print gradient.shape
            # print "======"
            ret.insert(0, gradient)
    return ret

# give a numpy array
# broadcast tanh gradient to each element
def tanh_gradient(s):
    ret = np.zeros(s.shape)
    for i in range(s.shape[0]):
        ret[i] = 4.000001 / (math.exp(2*s[i])+math.exp(-2*s[i])+2)
    return ret

##
# update W with Gradient Descent
def update_W_withGD(x, neuron_output, gradient, W, ita):
    ret = []
    L = len(W)
    # print "L:"+str(L)
    # print neuron_output[0].shape, neuron_output[1].shape
    # print gradient[0].shape, gradient[1].shape
    # print W[0].shape, W[1].shape
    # print np.hstack((1,x)).transpose().shape
    # print gradient[0].shape
    ret.append( W[0] - ita * np.array([np.hstack((1,x))]).transpose() * gradient[0] )
    for i in range(1, L, 1):
        ret.append( W[i] - ita * np.array([np.hstack((1,neuron_output[i-1]))]).transpose() * gradient[i] )
    # print len(ret)
    return ret

##
# calculate Eout
def calculate_E(W, path):
    x,y = read_input_data(path)
    error_count = 0
    for i in range(x.shape[0]):
        if predict(x[i],y[i],W):
            error_count += 1
    return 1.000001*error_count/x.shape[0]

def predict(x, y, W):
    y_predict = x
    for i in range(0, len(W), 1):
        y_predict = np.tanh( np.dot( np.hstack((1,y_predict)), W[i] ) )
    y_predict = 1 if y_predict>0 else -1
    return y_predict!=y

##
# Q11
def Q11(x,y):
    R = 20 # repeat time
    Ms = { 6, 16 } # hidden units
    M_lowests = {}
    for M in Ms:
        M_lowests[M] = 0
    for r in range(R):
        T = 50000
        ita = 0.1
        min_M = -1
        E_min = float("inf")
        for M in Ms:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            nnet_struct = [ x.shape[1], M, 1 ]
            # print nnet_struct
            w_range = {}
            w_range['low'] = -0.1
            w_range['high'] = 0.1
            W = init_W(nnet_struct, w_range)
            # for i in range(len(W)):
            #     print W[i]
            # print sgd_x,sgd_y
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                # print sgd_x[t],sgd_y[t]
                # print W
                # print neuron_output
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                # print error_neuronInputScore_gradient
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W,"test.dat")
            # print str(r)+":::"+str(M)+":"+str(E)
            M_lowests[M] += E
    for k,v in M_lowests.items():
        print str(k)+":"+str(v)

##
# Q12
def Q12(x,y):
    ita = 0.1
    M = 3
    nnet_struct = [ x.shape[1], M, 1 ]
    Rs = { 0.001, 0.1 }
    R_lowests = {}
    for R in Rs:
        R_lowests[R] = 0
    N = 40
    T = 30000
    for i in range(N):
        for R in Rs:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            w_range = {}
            w_range['low'] = -1*R
            w_range['high'] = R
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            print str(R)+":"+str(E)
            R_lowests[R] += E
    for k,v in R_lowests.items():
        print str(k)+":"+str(v)

##
# Q13
def Q13(x,y):
    M = 3
    nnet_struct = [ x.shape[1], M, 1 ]
    itas = {0.001,0.01,0.1}
    ita_lowests = {}
    for ita in itas:
        ita_lowests[ita] = 0
    N = 20
    T = 20000
    for i in range(N):
        for ita in itas:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            w_range = {}
            w_range['low'] = -0.1
            w_range['high'] = 0.1
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            print str(ita)+":"+str(E)
            ita_lowests[ita] += E
    for k,v in ita_lowests.items():
        print str(k)+":"+str(v)

##
# Q14
def Q14(x,y):
    T = 50000
    ita = 0.01
    E_total = 0
    R = 10
    for i in range(R):
        nnet_struct = [ x.shape[1], 8, 3, 1 ]
        w_range = {}
        w_range['low'] = -0.1
        w_range['high'] = 0.1
        W = init_W(nnet_struct, w_range)
        sgd_x, sgd_y = pick_SGD_data(x, y, T)
        for t in range(T):
            neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
            error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
            W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
        E = calculate_E(W, "test.dat")
        print E
        E_total += E
    print E_total*1.0/R

def main():
    x,y = read_input_data("train.dat")
    # print x.shape, y.shape
    # Q11(x, y)
    # Q12(x, y)
    # Q13(x, y)
    Q14(x, y)

if __name__ == '__main__':
    main()
Q15~Q18 concern the KNN algorithm. Each of them produces a result almost instantly, so I will not record the answers here.
The core of KNN is the KNN function itself:
1) Given the number of neighbors K, return which class the point belongs to; try to keep the code reasonably configurable.
2) numpy has an argsort function that sorts an array's indices by the values and returns the sorted indices; used well, it keeps the code very concise (a tiny demo follows this list).
3) In another language, it would be worth implementing a module similar to numpy.argsort; it would make the overall code noticeably cleaner.
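A tiny demo of that argsort trick on made-up points (the data here is purely illustrative; the real KNN function is in the code below):

import numpy as np

x = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]])  # toy training points
y = np.array([1, -1, 1, -1])                                    # toy labels
test_x = np.array([0.1, 0.1])                                   # toy query point

distance = np.sum((x - test_x) ** 2, axis=1)  # squared distance to every training point
order = np.argsort(distance)                  # training indices sorted by distance, nearest first
k = 3
vote = np.sum(y[order[:k]])                   # signed vote of the k nearest labels
print(1 if vote > 0 else -1)                  # prints 1 for this toy data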
The KNN code is as follows:
#encoding=utf8
import sys
import numpy as np
import math
from random import *

##
# read data from local file
# return with numpy array
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        if line.strip()=='': continue
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0,len(items)-1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x),np.array(y)

##
# KNN ( for binary classification )
# input all labeled data & test sample
# return with label
def KNN(k, x, y, test_x):
    distance = np.sum((x-test_x)*(x-test_x), axis=1)
    order = np.argsort(distance)
    ret = 0
    for i in range(k):
        ret += y[order[i]]
    return 1 if ret>0 else -1

##
# Q15 calculate Ein
def calculate_Ein(x, y):
    error_count = 0
    k = 5
    for i in range(x.shape[0]-1):
        # tmp_x = np.vstack( ( x[0:i],x[(i+1):(x.shape[0]-1)] ) )
        # tmp_y = np.hstack( ( y[0:i],y[(i+1):(x.shape[0]-1)] ) )
        ret = KNN( k, x, y, x[i])
        if y[i]!=ret:
            error_count += 1
    return 1.0*error_count/x.shape[0]

##
# Q16 calculate Eout
def calculate_Eout(x, y, path):
    test_x, test_y = read_input_data(path)
    error_count = 0
    k = 1
    for i in range(test_x.shape[0]):
        ret = KNN (k, x, y, test_x[i])
        if test_y[i]!=ret:
            error_count += 1
    return 1.0*error_count/test_x.shape[0]

def main():
    x,y = read_input_data("knn_train.dat")
    print calculate_Ein(x,y)
    print calculate_Eout(x,y, "knn_test.dat")

if __name__ == '__main__':
    main()
Q19~Q20 concern the Kmeans algorithm. The code also produces results very quickly, so I will not record the answers either.
The implementation plan for Kmeans is very clear-cut:
1) Initialize the cluster centers by random selection (the problem specifies randomly picking points from the raw data; if a different selection scheme were wanted, it is isolated in its own module and does not affect the others).
2) In each round, update the cluster assignment of every data point (def update_category(x, K, centers)).
3) With the point assignments fixed, update the center coordinates of each cluster (def update_centers(x, y, K)).
The module implementations benefit a lot from numpy's matrix-computation functions. (It is worth building up your own toolkit of matrix-manipulation code so it can be picked up and reused at any time.)
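For example, the assignment step can be written entirely with numpy broadcasting instead of the explicit loops I use in update_category below; a minimal sketch, assuming x has shape (N, d) and centers has shape (K, d):

import numpy as np

def assign_categories(x, centers):
    # (N, 1, d) - (K, d) broadcasts to (N, K, d); summing over d gives an (N, K) matrix
    # of squared distances from every point to every center
    dist = np.sum((x[:, np.newaxis, :] - centers) ** 2, axis=2)
    return np.argmin(dist, axis=1)  # nearest-center index for each point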
The code is as follows:
#encoding=utf8
import sys
import numpy as np
import math
from random import *

##
# read data from local file
# return with numpy array
def read_input_data(path):
    x = []
    for line in open(path).readlines():
        if line.strip()=='': continue
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0,len(items)):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
    return np.array(x)

##
# input all data and category K
# return K category centers
def Kmeans(x, K):
    T = 50
    E_total = 0
    for t in range(T):
        centers = init_centers(x, K)
        y = np.zeros(x.shape[0])
        R = 50
        for r in range(R):
            y = update_category(x, K, centers)
            centers = update_centers(x, y, K)
        E = calculate_Ein(x, y, centers)
        print E
        E_total += E
    return E_total*1.0/T

def init_centers(x, K):
    ret = []
    order = range(x.shape[0])
    np.random.shuffle(order)
    for i in range(K):
        ret.append(x[order[i]])
    return np.array(ret)

def update_category(x, K, centers):
    y = []
    for i in range(x.shape[0]):
        category = -1
        distance = float("inf")
        for k in range(K):
            d = np.sum((x[i] - centers[k])*(x[i] - centers[k]),axis=0)
            if d < distance:
                distance = d
                category = k
        y.append(category)
    return np.array(y)

def update_centers(x, y, K):
    centers = []
    for k in range(K):
        # print "np.sum(x[np.where(y==k)],axis=0)"
        # print np.sum(x[np.where(y==k)],axis=0).shape
        center = np.sum(x[np.where(y==k)],axis=0)*1.0/np.array(np.where(y==k)).shape[1]
        centers.append(center)
    return np.array(centers)

def calculate_Ein(x, y, centers):
    # print centers[0].shape
    error_total = 0
    for i in range(x.shape[0]):
        error_total += np.sum((x[i]-centers[y[i]])*(x[i]-centers[y[i]]),axis=0)
    return 1.0*error_total/x.shape[0]

def main():
    x = read_input_data("kmeans_train.dat")
    # print x.shape
    print Kmeans(x,2)

if __name__ == '__main__':
    main()
==========================================================================
Having finished this assignment, I have finally worked through all 32 lectures and 8 coding assignments of "Machine Learning Foundations + Machine Learning Techniques".
For me, the course brought three main gains:
1) Through the coding assignments I implemented a number of mainstream machine learning algorithms (Perceptron, AdaBoost-stump, Linear Regression, Logistic Regression, Decision Tree, Neural Network, KNN, Kmeans). Previously I only used algorithm packages; implementing each algorithm once gives a much deeper and finer understanding than just calling it.
2) My earlier understanding of the algorithms amounted to knowing how to use them (and not even that well). After the course I have some grasp of each model's Motivation: why is the model designed this way? Why is the Regularizer designed this way? What are the model's pros and cons? Plus the more intuitive mathematical derivations behind each model.
3) I used to look at each machine learning algorithm in isolation (this one solves X, that one solves Y), without pulling them together into a system. The NTU course keeps a very strong systemic view running through all the lectures; here are a few examples:
a. How Linear Network relates to Factorization (Lecture 15)
b. How Decision Tree relates to AdaBoost (Lectures 8 and 9)
c. How Linear Regression relates to Neural Network (Lecture 12)
Before taking this course I would never have connected the two models in any of the pairs above; the course really does provide deep motivation and a very strong overarching thread.
Finally, a few personal thoughts on taking open online courses:
1) Just listening once: a cursory skim; you learn next to nothing.
2) Listening plus doing the assignments: learning with a practitioner's attitude; you learn far more than by listening alone.
3) Listening, doing the assignments, and blogging about the lectures: learning with the attitude of both practitioner and researcher. "The best way to learn is to teach": writing the blog forces you to clear up many points that were fuzzy at the time, otherwise you simply cannot write them down.
4) Repeating 3) in cycles: everyone knows the value of reviewing the old to learn the new; it just depends on whether you have the time.
Sigh, I will stop here.....