Course Notes
Coursera - Andrew Ng Machine Learning - Course Notes - Lecture 9_Neural Networks learning
Assignment Overview
Exercise 4, Week 5: implement the backpropagation algorithm for a neural network and use it to recognize the handwritten digits 0-9 in images.
Dataset: ex4data1.mat. Handwritten digit images, 5000 examples. Each image is 20px x 20px, i.e. 400 features in total, so the dataset X has dimensions 5000 x 400.
ex4weights.mat: the weights for each layer of the neural network.
File List
ex4.m - Octave/MATLAB script that steps you through the exercise
ex4data1.mat - Training set of hand-written digits
ex4weights.mat - Neural network parameters for exercise 4
submit.m - Submission script that sends your solutions to our servers
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
computeNumericalGradient.m - Numerically compute gradients
checkNNGradients.m - Function to help check your gradients
debugInitializeWeights.m - Function for initializing weights
predict.m - Neural network prediction function
[*] sigmoidGradient.m - Compute the gradient of the sigmoid function
[*] randInitializeWeights.m - Randomly initialize weights
[*] nnCostFunction.m - Neural network cost function
Files marked with [*] are the ones you must complete.
Conclusions
As in last week's assignment, array indices in Octave start at 1, so the class label 0 is replaced by 10. In the prediction output, labels 1-10 correspond to the image digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0.
The tricky part of the matrix operations is getting the dimensions to line up; knowing where a transpose is needed is key.
1 Neural Networks
1.1 Visualizing the data
Randomly select 100 digits from the dataset X and plot them.
displayData.m:
function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
%   [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
%   stored in X in a nice grid. It returns the figure handle h and the
%   displayed array if requested.

% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width)
    example_width = round(sqrt(size(X, 2)));
end

% Gray Image
colormap(gray);

% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);

% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);

% Between images padding
pad = 1;

% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
                       pad + display_cols * (example_width + pad));

% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m, break; end
        % Copy the patch
        % Get the max value of the patch
        max_val = max(abs(X(curr_ex, :)));
        display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
                      pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
                        reshape(X(curr_ex, :), example_height, example_width) / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m, break; end
end

% Display Image
h = imagesc(display_array, [-1 1]);

% Do not show axis
axis image off

drawnow;

end
The call in ex4.m:
load('ex4data1.mat');
m = size(X, 1);

% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);

displayData(X(sel, :));
The result looks like this:
1.2 Model representation
ex4.m loads the pre-trained weight matrices from ex4weights.mat.
% Load saved matrices from file
load('ex4weights.mat');

% The matrices Theta1 and Theta2 will now be in your workspace
% Theta1 has size 25 x 401
% Theta2 has size 10 x 26
Here g(z) is the sigmoid function.
In the network diagram, each node from top to bottom is a feature x0, x1, x2, ..., not a training example. The computation is simply a layer-by-layer mapping of features; after each layer the number of features may grow or shrink. The number of features in layer i+1 is determined by the number of rows of the current weight matrix θ(i).
The two weight matrices θ are already given in ex4weights.mat. The matrix θ1 that maps a1 to a2 is 25 x 401, and the matrix θ2 that maps a2 to a3 is 10 x 26, because there are 10 output classes. (This means you have to pay attention to transposes when computing.)
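Written out, the feedforward pass for this three-layer network (400 input features plus bias, 25 hidden units plus bias, 10 outputs) is, in the notation of the lectures:

$$a^{(1)} = x \;(\text{add } a^{(1)}_0 = 1), \qquad z^{(2)} = \Theta^{(1)} a^{(1)}, \quad a^{(2)} = g(z^{(2)}) \;(\text{add } a^{(2)}_0 = 1)$$
$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad h_\Theta(x) = a^{(3)} = g(z^{(3)}), \qquad g(z) = \frac{1}{1 + e^{-z}}$$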
1.3 Feedforward and cost function
First implement the cost function without regularization. The formula is as follows:
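In the course's notation, the unregularized cost over all m examples and K = num_labels output units is:

$$J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y^{(i)}_k \log\big( (h_\Theta(x^{(i)}))_k \big) - \big(1 - y^{(i)}_k\big) \log\big( 1 - (h_\Theta(x^{(i)}))_k \big) \Big]$$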
Note that, unlike before, y represents digits 0-9 (stored as labels 1-10 as explained above), so before computing the cost each label must be converted into a vector of 0s and 1s; for example, label 5 becomes a 10-dimensional vector with a 1 in position 5 and 0s everywhere else.
The code:
% convert y (0-9) to vectors
c = 1:num_labels;
yt = zeros(m, num_labels);
for i = 1:m
    yt(i,:) = (c == y(i));
end
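As an aside, the same conversion can be done without the loop by indexing into an identity matrix. A minimal sketch, assuming y holds labels 1..num_labels as in this exercise:

% each row of yt is the row of the identity matrix selected by the label y(i)
I = eye(num_labels);
yt = I(y, :);          % m x num_labels matrix of 0s and 1s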
The cost computation in nnCostFunction.m:
% compute h(x)
a1 = [ones(m, 1) X];            % 5000 x 401
a2 = sigmoid(a1 * Theta1');     % 5000x401 times 401x25 gives 5000x25: maps 401 features to 25
a2 = [ones(m, 1) a2];           % 5000 x 26
hx = sigmoid(a2 * Theta2');     % 5000x26 times 26x10 gives 5000x10: maps 26 features to 10

% first term
part1 = -yt .* log(hx);

% second term
part2 = (1 - yt) .* log(1 - hx);

% compute J
J = 1 / m * sum(sum(part1 - part2));
Note that in last week's logistic regression assignment the cost was computed with a matrix product:
part1 = -yt' * log(hx);
part2 = (1 - yt') * log(1 - hx);
Here, however, the neural network formula has a double sum, so it must be computed with element-wise multiplication followed by sum, then sum again. Using a matrix product, which silently performs one level of summation, gives the wrong result.
1.4 Regularized cost function
Add the regularization term to the neural network cost function. The formula is as follows:
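For the architecture used here (Theta1 is 25 x 401 and Theta2 is 10 x 26), the regularized cost adds a penalty over all weights except the bias columns:

$$J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y^{(i)}_k \log\big( (h_\Theta(x^{(i)}))_k \big) - \big(1 - y^{(i)}_k\big) \log\big( 1 - (h_\Theta(x^{(i)}))_k \big) \Big] + \frac{\lambda}{2m} \Big[ \sum_{j=1}^{25} \sum_{k=1}^{400} \big( \Theta^{(1)}_{j,k} \big)^2 + \sum_{j=1}^{10} \sum_{k=1}^{25} \big( \Theta^{(2)}_{j,k} \big)^2 \Big]$$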
The code in nnCostFunction.m:
% convert y (0-9) to vectors
c = 1:num_labels;
yt = zeros(m, num_labels);
for i = 1:m
    yt(i,:) = (c == y(i));
end

% compute h(x)
a1 = [ones(m, 1) X];            % 5000 x 401
z2 = a1 * Theta1';              % 5000x401 times 401x25 gives 5000x25 (kept so the backpropagation code below can reuse z2)
a2 = sigmoid(z2);               % maps 401 features to 25
a2 = [ones(m, 1) a2];           % 5000 x 26
hx = sigmoid(a2 * Theta2');     % 5000x26 times 26x10 gives 5000x10: maps 26 features to 10

% first term
part1 = -yt .* log(hx);

% second term
part2 = (1 - yt) .* log(1 - hx);

% regularization term (bias columns excluded)
regTerm = lambda / 2 / m * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

% J with regularization
J = 1 / m * sum(sum(part1 - part2)) + regTerm;
The call in ex4.m:
% without regularization
lambda = 0;
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);

% with regularization
lambda = 1;
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);
2 Backpropagation
2.1 Sigmoid gradient
Compute the gradient of the sigmoid function. The formula is as follows:
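With g(z) the sigmoid function, its derivative can be written in terms of g itself:

$$g'(z) = \frac{d}{dz} g(z) = g(z)\,\big(1 - g(z)\big), \qquad g(z) = \frac{1}{1 + e^{-z}}$$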
sigmoidGradient.m
function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

% element-wise product so this works for vectors and matrices as well as scalars
g = sigmoid(z) .* (1 - sigmoid(z));

end
The call in ex4.m:
%% ================ Part 5: Sigmoid Gradient ================
g = sigmoidGradient([1 -0.5 0 0.5 1]);
fprintf('Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]:\n ');
fprintf('%f ', g);
2.2 Random initialization
When training a neural network, it is important to initialize the parameters randomly in order to break symmetry. An effective strategy is to pick the values of Θ(l) uniformly at random in the range [-εinit, εinit]; here you should use εinit = 0.12. There is a note on how this value is chosen:
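The exercise bases εinit on the number of units on either side of Θ(l):

$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$$

where L_in = s_l and L_out = s_{l+1} are the sizes of the layers adjacent to Θ(l). For the 400-25-10 architecture used here this works out to roughly 0.12.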
randInitializeWeights.m
function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms

epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

end
The call in ex4.m:
%% ================ Part 6: Initializing Parameters ================
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);

% Unroll parameters
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
2.3 Backpropagation
The backpropagation algorithm computes the error terms δj(l) from right to left:
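For this three-layer network the error terms are:

$$\delta^{(3)} = a^{(3)} - y, \qquad \delta^{(2)} = \big(\Theta^{(2)}\big)^T \delta^{(3)} \odot g'(z^{(2)})$$

where ⊙ denotes element-wise multiplication; there is no δ(1), since no error term is associated with the input layer.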
For details, see my course notes: Coursera - Andrew Ng Machine Learning - Course Notes - Lecture 9_Neural Networks learning.
(1) Compute the "error terms" according to the formula above. The code:
%----------------------------PART 2----------------------------------
% Accumulate the error term
delta_3 = hx - yt;                                               % 5000 x 10
delta_2 = delta_3 * Theta2 .* sigmoidGradient([ones(m, 1) z2]);  % 5000 x 26 = (5000x10 * 10x26) .* 5000x26

% drop the delta_2(0) term (the bias unit has no error)
delta_2 = delta_2(:,2:end);                                      % 5000 x 25
(2) Accumulate the gradient. The formula and code:
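In matrix form, accumulating over all examples is:

$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} \big(a^{(l)}\big)^T$$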
% Accumulate the gradient
D2 = delta_3' * a2;   % 10 x 26  = 10x5000 * 5000x26
D1 = delta_2' * a1;   % 25 x 401 = 25x5000 * 5000x401
(3) Obtain the partial derivatives of the cost function J(θ) with respect to Theta1 and Theta2. The formula and code:
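Dividing the accumulated values by m gives the unregularized gradient:

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = \frac{1}{m} \Delta^{(l)}_{ij}$$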
% Obtain the (unregularized) gradient for the neural network cost function
Theta2_grad = 1/m * D2;
Theta1_grad = 1/m * D1;
2.4 Gradient checking
The idea behind gradient checking:
If the gradient is computed correctly, the following two values should differ only very slightly:
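For each parameter θi, the numerical estimate perturbs only that parameter by a small ε (computeNumericalGradient.m uses ε = 1e-4) and is compared against the analytical gradient from backpropagation:

$$\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon}$$

where e_i is the i-th unit vector.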
Gradient checking is already implemented in checkNNGradients.m (using computeNumericalGradient.m); it creates a small neural network and dataset for the check. If the gradient is computed correctly, the relative difference will be smaller than 1e-9.
When you actually start training the model, gradient checking should be turned off, since it is very slow.
2.5 Regularized neural networks
The partial derivatives computed above do not include the regularization term. The regularized formula is as follows (j = 0 is not regularized, which is implemented by setting the first column of θ to 0):
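That is, the bias column is left unregularized and every other entry gets an extra λ/m term:

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = \frac{1}{m}\Delta^{(l)}_{ij} \quad (j = 0), \qquad \frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = \frac{1}{m}\Delta^{(l)}_{ij} + \frac{\lambda}{m}\Theta^{(l)}_{ij} \quad (j \ge 1)$$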
%----------------------------PART 3----------------------------------
%---Regularize gradients
temp1 = Theta1;
temp2 = Theta2;
temp1(:,1) = 0;   % set first column (bias terms) to 0
temp2(:,1) = 0;   % set first column (bias terms) to 0

Theta1_grad = Theta1_grad + lambda/m * temp1;
Theta2_grad = Theta2_grad + lambda/m * temp2;
The call in ex4.m:
%% =============== Part 8: Implement Regularization ===============
% Check gradients by running checkNNGradients
lambda = 3;
checkNNGradients(lambda);

% Also output the costFunction debugging values
debug_J = nnCostFunction(nn_params, input_layer_size, ...
                         hidden_layer_size, num_labels, X, y, lambda);
2.6 Training the parameters with fmincg
The call in ex4.m:
%% =================== Part 8: Training NN ===================
options = optimset('MaxIter', 50);

% You should also try different values of lambda
lambda = 1;

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
3 Visualizing the hidden layer
If we take one row of Theta1 and drop its first element (the bias term), we get a 400-dimensional vector. One way to visualize a hidden unit is to reshape this 400-dimensional vector into a 20 x 20 image and display it.
The call in ex4.m:
%% ================= Part 9: Visualize Weights =================
displayData(Theta1(:, 2:end));   % drop the first (bias) column
The image is shown below; each small tile corresponds to one row of Theta1:
4 Prediction
The training set accuracy is 94.34%. Regularization is introduced to avoid overfitting: if λ in section 2.6 is set to 0 or a very small value, or if MaxIter is increased, it is even possible to get a model with 100% training accuracy, but such a model may perform poorly on new, unseen data.
The call in ex4.m:
%% ================= Part 10: Implement Predict =================
pred = predict(Theta1, Theta2, X);
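predict.m is provided with the exercise; a minimal sketch of the feedforward prediction it performs (assuming the same Theta1/Theta2 layout as above) is:

% run one feedforward pass and pick the class with the largest output activation
m = size(X, 1);
h1 = sigmoid([ones(m, 1) X] * Theta1');    % hidden layer activations, m x 25
h2 = sigmoid([ones(m, 1) h1] * Theta2');   % output layer activations, m x 10
[~, pred] = max(h2, [], 2);                % predicted label = index of the max output

The accuracy is then mean(double(pred == y)) * 100.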
5 Run results
Running ex4.m produces the following output:
Loading and Visualizing Data ...
Program paused. Press enter to continue.

Loading Saved Neural Network Parameters ...

Feedforward Using Neural Network ...
Cost at parameters (loaded from ex4weights): 0.287629
(this value should be about 0.287629)
Program paused. Press enter to continue.

Checking Cost Function (w/ Regularization) ...
Cost at parameters (loaded from ex4weights): 0.383770
(this value should be about 0.383770)
Program paused. Press enter to continue.

Evaluating sigmoid gradient...
Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]:
0.196612 0.235004 0.250000 0.235004 0.196612
Program paused. Press enter to continue.

Initializing Neural Network Parameters ...

Checking Backpropagation...
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
-0.0002 -0.0002
0.0002 0.0002
-0.0003 -0.0003
0.0003 0.0003
-0.0004 -0.0004
-0.0001 -0.0001
0.0001 0.0001
-0.0001 -0.0001
0.0002 0.0002
-0.0002 -0.0002
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1641 0.1641
0.0576 0.0576
0.0505 0.0505
0.1646 0.1646
0.0578 0.0578
0.0508 0.0508
0.1583 0.1583
0.0559 0.0559
0.0492 0.0492
0.1511 0.1511
0.0537 0.0537
0.0471 0.0471
0.1496 0.1496
0.0532 0.0532
0.0466 0.0466
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 2.2366e-11
Program paused. Press enter to continue.

Checking Backpropagation (w/ Regularization) ...
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0168 -0.0168
0.0394 0.0394
0.0593 0.0593
0.0248 0.0248
-0.0327 -0.0327
-0.0602 -0.0602
-0.0320 -0.0320
0.0249 0.0249
0.0598 0.0598
0.0386 0.0386
-0.0174 -0.0174
-0.0576 -0.0576
-0.0452 -0.0452
0.0091 0.0091
0.0546 0.0546
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1187 0.1187
0.0000 0.0000
0.0337 0.0337
0.2040 0.2040
0.1171 0.1171
0.0755 0.0755
0.1257 0.1257
-0.0041 -0.0041
0.0170 0.0170
0.1763 0.1763
0.1131 0.1131
0.0862 0.0862
0.1323 0.1323
-0.0045 -0.0045
0.0015 0.0015
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 2.17629e-11
Cost at (fixed) debugging parameters (w/ lambda = 10): 0.576051
(this value should be about 0.576051)
Program paused. Press enter to continue.

Training Neural Network...
Iteration 1 | Cost: 3.298708e+00
Iteration 2 | Cost: 3.254768e+00
Iteration 3 | Cost: 3.209718e+00
Iteration 4 | Cost: 3.124366e+00
Iteration 5 | Cost: 2.858652e+00
Iteration 6 | Cost: 2.454280e+00
Iteration 7 | Cost: 2.259612e+00
Iteration 8 | Cost: 2.184967e+00
Iteration 9 | Cost: 1.895567e+00
Iteration 10 | Cost: 1.794052e+00
Iteration 11 | Cost: 1.658111e+00
Iteration 12 | Cost: 1.551086e+00
Iteration 13 | Cost: 1.440756e+00
Iteration 14 | Cost: 1.319321e+00
Iteration 15 | Cost: 1.218193e+00
Iteration 16 | Cost: 1.174144e+00
Iteration 17 | Cost: 1.121406e+00
Iteration 18 | Cost: 1.001795e+00
Iteration 19 | Cost: 9.730070e-01
Iteration 20 | Cost: 9.396211e-01
Iteration 21 | Cost: 8.982489e-01
Iteration 22 | Cost: 8.785754e-01
Iteration 23 | Cost: 8.558708e-01
Iteration 24 | Cost: 8.358078e-01
Iteration 25 | Cost: 8.074475e-01
Iteration 26 | Cost: 7.975287e-01
Iteration 27 | Cost: 7.883648e-01
Iteration 28 | Cost: 7.543000e-01
Iteration 29 | Cost: 7.318456e-01
Iteration 30 | Cost: 7.151468e-01
Iteration 31 | Cost: 6.919630e-01
Iteration 32 | Cost: 6.823971e-01
Iteration 33 | Cost: 6.766813e-01
Iteration 34 | Cost: 6.639429e-01
Iteration 35 | Cost: 6.579100e-01
Iteration 36 | Cost: 6.491120e-01
Iteration 37 | Cost: 6.405250e-01
Iteration 38 | Cost: 6.318625e-01
Iteration 39 | Cost: 6.180036e-01
Iteration 40 | Cost: 6.081649e-01
Iteration 41 | Cost: 5.973954e-01
Iteration 42 | Cost: 5.684440e-01
Iteration 43 | Cost: 5.465935e-01
Iteration 44 | Cost: 5.399081e-01
Iteration 45 | Cost: 5.320386e-01
Iteration 46 | Cost: 5.289632e-01
Iteration 47 | Cost: 5.252995e-01
Iteration 48 | Cost: 5.236517e-01
Iteration 49 | Cost: 5.233562e-01
Iteration 50 | Cost: 5.197894e-01
Program paused. Press enter to continue.

Visualizing Neural Network...
Program paused. Press enter to continue.

Training Set Accuracy: 94.340000
https://github.com/madoubao/coursera_machine_learning/tree/master/homework/machine-learning-ex4/ex4