This assignment has a fairly heavy coding workload: overall it requires implementing three models, a neural network, kNN, and k-means.
Q11~Q14 are the Neural Network questions. My implementation is single-threaded and takes quite a while to run, so I record the correct answers to these questions here:
Q11: 6
Q12: 0.001
Q13: 0.01
Q14: 0.02 ≤ Eout ≤ 0.04
The answers to Q11 and Q14 are fairly clear-cut; for Q12 and Q13 two of the choices come out quite close (I consulted the discussion forum and eventually tuned my way to the right ones).
The neural network code is organized as follows:
1) Initialize the weight matrices W (def init_W(nnet_struct, w_range))
2) Compute every neuron's output in each iteration, i.e. the forward pass of the backpropagation algorithm (def forward_process(x, y, W))
3) Compute, in each iteration, the derivative of the output error with respect to each neuron's input score, i.e. the backward pass of backpropagation (def backward_process(x, y, neuron_output, W))
4) Update each layer's weight matrix W by gradient descent (def update_W_withGD(x, neuron_output, gradient, W, ita))
The hardest part is step 3): to write it in vectorized (matrix) form you need to know the structure of every layer cold, and also be very familiar with the matrix operations of the language you are using. I am still weak on this front; it really comes down to practice.
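For reference, steps 2) to 4) boil down to the standard backpropagation recursion for tanh units with squared error. The notation below is mine, not from the course handout: the gradient arrays in my code hold the deltas, i.e. the derivatives of the error on one sample with respect to each neuron's input score s.

\[
\delta^{(L)}_1 = -2\,\bigl(y_n - o^{(L)}_1\bigr)\,\tanh'\bigl(s^{(L)}_1\bigr),
\qquad
\delta^{(\ell)}_j = \Bigl(\sum_k W^{(\ell+1)}_{jk}\,\delta^{(\ell+1)}_k\Bigr)\,\tanh'\bigl(s^{(\ell)}_j\bigr)
\]
\[
W^{(\ell)}_{ij} \leftarrow W^{(\ell)}_{ij} - \eta\, x^{(\ell-1)}_i\,\delta^{(\ell)}_j,
\qquad
\tanh'(s) = 1 - \tanh^2(s) = \frac{4}{e^{2s} + e^{-2s} + 2}
\]

The sum over k excludes the bias weight of the next layer, which is why the code below slices W[i+1][1:] when propagating the deltas backward.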
>> This was my first time writing an NNet algorithm, so I started debugging from a single hidden layer (2 hidden units), working through modules 1) 2) 3) 4) in order and debugging each one separately. This incremental approach is slow, but the module quality is higher, which makes the later integration debugging much easier.
>> For a really complex network, how would you debug this kind of gradient code? It is practically impossible to check the gradient at every point by hand, so I looked up the gradient checking technique online (see the sketch after this list): http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization
>> Hyperparameter tuning for NNets really matters. Take Q14: even with the same total number of hidden units, distributing them differently across layers changes the result (I was careless at first and set the NNet structure to 3 8 1, and found it did worse than 8 3 1). I should read up more on tuning and accumulate some experience.
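A minimal sketch of that gradient checking idea from the link above. This is not part of my homework code; loss_fn, grad_fn and the flat parameter vector theta are hypothetical stand-ins for whatever network you want to verify:

import numpy as np

def numerical_gradient(loss_fn, theta, eps=1e-4):
    # central difference: dJ/dtheta_i ~ (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (loss_fn(theta + e) - loss_fn(theta - e)) / (2 * eps)
    return grad

def gradient_check(loss_fn, grad_fn, theta):
    # compare the analytic (backprop) gradient with the numerical one;
    # a tiny relative error means the backprop code is very likely correct
    num = numerical_gradient(loss_fn, theta)
    ana = grad_fn(theta)
    denom = np.linalg.norm(num) + np.linalg.norm(ana) + 1e-12
    return np.linalg.norm(num - ana) / denom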
The code follows (I did not delete the debugging code, both to keep a record of the debugging process and to avoid repeating similar mistakes later); it is admittedly a bit messy, so please bear with me:
#encoding=utf8
import sys
import numpy as np
import math
from random import *

##
# read data from local file
# return with numpy array
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        if line.strip()=='': continue
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0,len(items)-1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x),np.array(y)

##
# initialize weight matrix
# input neural network structure & initializing uniform value range (both low and high)
# each layer's bias need to be added
# return with initialized W
def init_W(nnet_struct, w_range):
    W = []
    for i in range(1,len(nnet_struct)):
        tmp_w = np.random.uniform( w_range['low'], w_range['high'], (nnet_struct[i-1]+1,nnet_struct[i]) )
        W.append(tmp_w)
    return W

##
# randomly pick sample from raw data for Stochastic Gradient Descent
# T indicates the iterative numbers
# return with data for each SGD iteration
def pick_SGD_data(x, y, T):
    sgd_x = np.zeros((T,x.shape[1]))
    sgd_y = np.zeros(T)
    for i in range(T):
        index = randint(0, x.shape[0]-1)
        sgd_x[i] = x[index]
        sgd_y[i] = y[index]
    return sgd_x, sgd_y

##
# forward process
# calculate each neuron's output
def forward_process(x, y, W):
    ret = []
    # print W[0].shape
    # print W[1].shape
    pre_x = np.hstack((1,x))
    for i in range(len(W)):
        pre_x = np.tanh(np.dot(pre_x, W[i]))
        ret.append(pre_x)
        pre_x = np.hstack((1,pre_x))
    return ret

##
# backward process
# calculate the gradient of error and each neuron's input score
def backward_process(x, y, neuron_output, W):
    ret = []
    L = len(neuron_output)
    # print neuron_output[0].shape, neuron_output[1].shape
    # Output layer
    score = np.dot( np.hstack((1, neuron_output[L-2])), W[L-1] )
    # print score
    # print score.shape
    gradient = np.array( [-2 * (y-neuron_output[L-1][0]) * tanh_gradient(score)] )
    # print gradient
    # print gradient.shape
    ret.insert(0, gradient)
    # Hidden layer
    for i in range(L-2,-1,-1):
        if i==0:
            score = np.dot(np.hstack((1, x)),W[i])
            # print score.shape
            # print gradient.shape
            # print W[1][1:].transpose().shape
            # print score
            gradient = np.dot(gradient, W[1][1:].transpose()) * tanh_gradient(score)
            # print gradient
            # print gradient.shape
            ret.insert(0, gradient)
        else:
            score = np.dot(np.hstack((1,neuron_output[i-1])),W[i])
            # print score.shape
            # print gradient.shape
            # print W[i+1][1:].transpose().shape
            # print "......"
            gradient = np.dot(gradient , W[i+1][1:].transpose()) * tanh_gradient(score)
            # print gradient.shape
            # print "======"
            ret.insert(0, gradient)
    return ret

# give a numpy array
# broadcast tanh gradient to each element
def tanh_gradient(s):
    ret = np.zeros(s.shape)
    for i in range(s.shape[0]):
        ret[i] = 4.000001 / (math.exp(2*s[i])+math.exp(-2*s[i])+2)
    return ret

##
# update W with Gradient Descent
def update_W_withGD(x, neuron_output, gradient, W, ita):
    ret = []
    L = len(W)
    # print "L:"+str(L)
    # print neuron_output[0].shape, neuron_output[1].shape
    # print gradient[0].shape, gradient[1].shape
    # print W[0].shape, W[1].shape
    # print np.hstack((1,x)).transpose().shape
    # print gradient[0].shape
    ret.append( W[0] - ita * np.array([np.hstack((1,x))]).transpose() * gradient[0] )
    for i in range(1, L, 1):
        ret.append( W[i] - ita * np.array([np.hstack((1,neuron_output[i-1]))]).transpose() * gradient[i] )
    # print len(ret)
    return ret

##
# calculate Eout
def calculate_E(W, path):
    x,y = read_input_data(path)
    error_count = 0
    for i in range(x.shape[0]):
        if predict(x[i],y[i],W):
            error_count += 1
    return 1.000001*error_count/x.shape[0]

def predict(x, y, W):
    y_predict = x
    for i in range(0, len(W), 1):
        y_predict = np.tanh( np.dot( np.hstack((1,y_predict)), W[i] ) )
    y_predict = 1 if y_predict>0 else -1
    return y_predict!=y

##
# Q11
def Q11(x,y):
    R = 20 # repeat time
    Ms = { 6, 16 } # hidden units
    M_lowests = {}
    for M in Ms:
        M_lowests[M] = 0
    for r in range(R):
        T = 50000
        ita = 0.1
        min_M = -1
        E_min = float("inf")
        for M in Ms:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            nnet_struct = [ x.shape[1], M, 1 ]
            # print nnet_struct
            w_range = {}
            w_range['low'] = -0.1
            w_range['high'] = 0.1
            W = init_W(nnet_struct, w_range)
            # for i in range(len(W)):
            #     print W[i]
            # print sgd_x,sgd_y
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                # print sgd_x[t],sgd_y[t]
                # print W
                # print neuron_output
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                # print error_neuronInputScore_gradient
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W,"test.dat")
            # print str(r)+":::"+str(M)+":"+str(E)
            M_lowests[M] += E
    for k,v in M_lowests.items():
        print str(k)+":"+str(v)

##
# Q12
def Q12(x,y):
    ita = 0.1
    M = 3
    nnet_struct = [ x.shape[1], M, 1 ]
    Rs = { 0.001, 0.1 }
    R_lowests = {}
    for R in Rs:
        R_lowests[R] = 0
    N = 40
    T = 30000
    for i in range(N):
        for R in Rs:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            w_range = {}
            w_range['low'] = -1*R
            w_range['high'] = R
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            print str(R)+":"+str(E)
            R_lowests[R] += E
    for k,v in R_lowests.items():
        print str(k)+":"+str(v)

##
# Q13
def Q13(x,y):
    M = 3
    nnet_struct = [ x.shape[1], M, 1 ]
    itas = {0.001,0.01,0.1}
    ita_lowests = {}
    for ita in itas:
        ita_lowests[ita] = 0
    N = 20
    T = 20000
    for i in range(N):
        for ita in itas:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            w_range = {}
            w_range['low'] = -0.1
            w_range['high'] = 0.1
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            print str(ita)+":"+str(E)
            ita_lowests[ita] += E
    for k,v in ita_lowests.items():
        print str(k)+":"+str(v)

##
# Q14
def Q14(x,y):
    T = 50000
    ita = 0.01
    E_total = 0
    R = 10
    for i in range(R):
        nnet_struct = [ x.shape[1], 8, 3, 1 ]
        w_range = {}
        w_range['low'] = -0.1
        w_range['high'] = 0.1
        W = init_W(nnet_struct, w_range)
        sgd_x, sgd_y = pick_SGD_data(x, y, T)
        for t in range(T):
            neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
            error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
            W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
        E = calculate_E(W, "test.dat")
        print E
        E_total += E
    print E_total*1.0/R

def main():
    x,y = read_input_data("train.dat")
    # print x.shape, y.shape
    # Q11(x, y)
    # Q12(x, y)
    # Q13(x, y)
    Q14(x, y)

if __name__ == '__main__':
    main()
Q15~Q18 concern the KNN algorithm. Each of them produces a result almost instantly, so I will not record the answers here.
The core of KNN is the KNN function itself:
1) Given the number of neighbors K, return which class the point belongs to; try to keep the code reasonably configurable.
2) numpy has an argsort function that sorts an array's indices by the values and returns the sorted indices; used well, it keeps the code very concise (a tiny demo follows this list).
3) In another language, it would be worth implementing a module similar to numpy.argsort; it would make the overall code noticeably cleaner.
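A tiny demo of that argsort trick on made-up points (the data here is purely illustrative; the real KNN function is in the code below):

import numpy as np

x = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]])  # toy training points
y = np.array([1, -1, 1, -1])                                    # toy labels
test_x = np.array([0.1, 0.1])                                   # toy query point

distance = np.sum((x - test_x) ** 2, axis=1)  # squared distance to every training point
order = np.argsort(distance)                  # training indices sorted by distance, nearest first
k = 3
vote = np.sum(y[order[:k]])                   # signed vote of the k nearest labels
print(1 if vote > 0 else -1)                  # prints 1 for this toy data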
The KNN code is as follows:
#encoding=utf8
import sys
import numpy as np
import math
from random import *

##
# read data from local file
# return with numpy array
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        if line.strip()=='': continue
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0,len(items)-1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x),np.array(y)

##
# KNN ( for binary classification )
# input all labeled data & test sample
# return with label
def KNN(k, x, y, test_x):
    distance = np.sum((x-test_x)*(x-test_x), axis=1)
    order = np.argsort(distance)
    ret = 0
    for i in range(k):
        ret += y[order[i]]
    return 1 if ret>0 else -1

##
# Q15 calculate Ein
def calculate_Ein(x, y):
    error_count = 0
    k = 5
    for i in range(x.shape[0]-1):
        # tmp_x = np.vstack( ( x[0:i],x[(i+1):(x.shape[0]-1)] ) )
        # tmp_y = np.hstack( ( y[0:i],y[(i+1):(x.shape[0]-1)] ) )
        ret = KNN( k, x, y, x[i])
        if y[i]!=ret:
            error_count += 1
    return 1.0*error_count/x.shape[0]

##
# Q16 calculate Eout
def calculate_Eout(x, y, path):
    test_x, test_y = read_input_data(path)
    error_count = 0
    k = 1
    for i in range(test_x.shape[0]):
        ret = KNN (k, x, y, test_x[i])
        if test_y[i]!=ret:
            error_count += 1
    return 1.0*error_count/test_x.shape[0]

def main():
    x,y = read_input_data("knn_train.dat")
    print calculate_Ein(x,y)
    print calculate_Eout(x,y, "knn_test.dat")

if __name__ == '__main__':
    main()
Q19~Q20 concern the Kmeans algorithm. The code also produces results very quickly, so I will not record the answers either.
The implementation plan for Kmeans is very clear-cut:
1) Initialize the cluster centers by random selection (the problem specifies randomly picking points from the raw data; if a different selection scheme were wanted, it is isolated in its own module and does not affect the others).
2) In each round, update the cluster assignment of every data point (def update_category(x, K, centers)).
3) With the point assignments fixed, update the center coordinates of each cluster (def update_centers(x, y, K)).
The module implementations benefit a lot from numpy's matrix-computation functions. (It is worth building up your own toolkit of matrix-manipulation code so it can be picked up and reused at any time.)
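For example, the assignment step can be written entirely with numpy broadcasting instead of the explicit loops I use in update_category below; a minimal sketch, assuming x has shape (N, d) and centers has shape (K, d):

import numpy as np

def assign_categories(x, centers):
    # (N, 1, d) - (K, d) broadcasts to (N, K, d); summing over d gives an (N, K) matrix
    # of squared distances from every point to every center
    dist = np.sum((x[:, np.newaxis, :] - centers) ** 2, axis=2)
    return np.argmin(dist, axis=1)  # nearest-center index for each point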
The code is as follows:
#encoding=utf8
import sys
import numpy as np
import math
from random import *

##
# read data from local file
# return with numpy array
def read_input_data(path):
    x = []
    for line in open(path).readlines():
        if line.strip()=='': continue
        items = line.strip().split(' ')
        tmp_x = []
        for i in range(0,len(items)):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
    return np.array(x)

##
# input all data and category K
# return K category centers
def Kmeans(x, K):
    T = 50
    E_total = 0
    for t in range(T):
        centers = init_centers(x, K)
        y = np.zeros(x.shape[0])
        R = 50
        for r in range(R):
            y = update_category(x, K, centers)
            centers = update_centers(x, y, K)
        E = calculate_Ein(x, y, centers)
        print E
        E_total += E
    return E_total*1.0/T

def init_centers(x, K):
    ret = []
    order = range(x.shape[0])
    np.random.shuffle(order)
    for i in range(K):
        ret.append(x[order[i]])
    return np.array(ret)

def update_category(x, K, centers):
    y = []
    for i in range(x.shape[0]):
        category = -1
        distance = float("inf")
        for k in range(K):
            d = np.sum((x[i] - centers[k])*(x[i] - centers[k]),axis=0)
            if d < distance:
                distance = d
                category = k
        y.append(category)
    return np.array(y)

def update_centers(x, y, K):
    centers = []
    for k in range(K):
        # print "np.sum(x[np.where(y==k)],axis=0)"
        # print np.sum(x[np.where(y==k)],axis=0).shape
        center = np.sum(x[np.where(y==k)],axis=0)*1.0/np.array(np.where(y==k)).shape[1]
        centers.append(center)
    return np.array(centers)

def calculate_Ein(x, y, centers):
    # print centers[0].shape
    error_total = 0
    for i in range(x.shape[0]):
        error_total += np.sum((x[i]-centers[y[i]])*(x[i]-centers[y[i]]),axis=0)
    return 1.0*error_total/x.shape[0]

def main():
    x = read_input_data("kmeans_train.dat")
    # print x.shape
    print Kmeans(x,2)

if __name__ == '__main__':
    main()
==========================================================================
Having finished this assignment, I have finally worked through all 32 lectures and 8 coding assignments of "Machine Learning Foundations + Machine Learning Techniques".
For me, the course brought three main gains:
1) Through the coding assignments I implemented a number of mainstream machine learning algorithms (Perceptron, AdaBoost-stump, Linear Regression, Logistic Regression, Decision Tree, Neural Network, KNN, Kmeans). Previously I only used algorithm packages; implementing each algorithm once gives a much deeper and finer understanding than just calling it.
2) My earlier understanding of the algorithms amounted to knowing how to use them (and not even that well). After the course I have some grasp of each model's Motivation: why is the model designed this way? Why is the Regularizer designed this way? What are the model's pros and cons? Plus the more intuitive mathematical derivations behind each model.
3) I used to look at each machine learning algorithm in isolation (this one solves X, that one solves Y), without pulling them together into a system. The NTU course keeps a very strong systemic view running through all the lectures; here are a few examples:
a. How Linear Network relates to Factorization (Lecture 15)
b. How Decision Tree relates to AdaBoost (Lectures 8 and 9)
c. How Linear Regression relates to Neural Network (Lecture 12)
Before taking this course I would never have connected the two models in any of the pairs above; the course really does provide deep motivation and a very strong overarching thread.
Finally, a few personal thoughts on taking open online courses:
1) Just listening once: a cursory skim; you learn next to nothing.
2) Listening plus doing the assignments: learning with a practitioner's attitude; you learn far more than by listening alone.
3) Listening, doing the assignments, and blogging about the lectures: learning with the attitude of both practitioner and researcher. "The best way to learn is to teach": writing the blog forces you to clear up many points that were fuzzy at the time, otherwise you simply cannot write them down.
4) Repeating 3) in cycles: everyone knows the value of reviewing the old to learn the new; it just depends on whether you have the time.
Sigh, I will stop here.....