Course Notes
Coursera - Andrew Ng Machine Learning - Course Notes - Lecture 9_Neural Networks learning
Assignment Overview
Exercise 4, Week 5: implement the backpropagation algorithm for a neural network and use it to recognize the handwritten digits 0-9 in images.
Dataset: ex4data1.mat. Handwritten digit images, 5000 examples. Each image is 20px x 20px, i.e. 400 features in total, so the dataset X has dimensions 5000 x 400.
ex4weights.mat: the weights for each layer of the neural network.
File List
ex4.m - Octave/MATLAB script that steps you through the exercise
ex4data1.mat - Training set of hand-written digits
ex4weights.mat - Neural network parameters for exercise 4
submit.m - Submission script that sends your solutions to our servers
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
computeNumericalGradient.m - Numerically compute gradients
checkNNGradients.m - Function to help check your gradients
debugInitializeWeights.m - Function for initializing weights
predict.m - Neural network prediction function
[*] sigmoidGradient.m - Compute the gradient of the sigmoid function
[*] randInitializeWeights.m - Randomly initialize weights
[*] nnCostFunction.m - Neural network cost function
Files marked with [*] are the ones you must complete.
Conclusions
As in last week's assignment, array indices in Octave start at 1, so the class label 0 is replaced by 10. In the prediction output, labels 1-10 correspond to the image digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0.
The tricky part of the matrix operations is getting the dimensions to line up; knowing where a transpose is needed is key.
1 Neural Networks
1.1 Visualizing the data
Randomly select 100 digits from the dataset X and plot them.
displayData.m:
function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
%   [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
%   stored in X in a nice grid. It returns the figure handle h and the
%   displayed array if requested.

% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width)
    example_width = round(sqrt(size(X, 2)));
end

% Gray Image
colormap(gray);

% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);

% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);

% Between images padding
pad = 1;

% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
                       pad + display_cols * (example_width + pad));

% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m, break; end
        % Copy the patch
        % Get the max value of the patch
        max_val = max(abs(X(curr_ex, :)));
        display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
                      pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
                        reshape(X(curr_ex, :), example_height, example_width) / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m, break; end
end

% Display Image
h = imagesc(display_array, [-1 1]);

% Do not show axis
axis image off

drawnow;

end
The call in ex4.m:
load('ex4data1.mat');
m = size(X, 1);

% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);

displayData(X(sel, :));
The result looks like this:
1.2 Model representation
ex4.m loads the pre-trained weight matrices from ex4weights.mat.
% Load saved matrices from file
load('ex4weights.mat');

% The matrices Theta1 and Theta2 will now be in your workspace
% Theta1 has size 25 x 401
% Theta2 has size 10 x 26
Here g(z) is the sigmoid function.
In the network diagram, each node from top to bottom is a feature x0, x1, x2, ..., not a training example. The computation is simply a layer-by-layer mapping of features; after each layer the number of features may grow or shrink. The number of features in layer i+1 is determined by the number of rows of the current weight matrix θ(i).
The two weight matrices θ are already given in ex4weights.mat. The matrix θ1 that maps a1 to a2 is 25 x 401, and the matrix θ2 that maps a2 to a3 is 10 x 26, because there are 10 output classes. (This means you have to pay attention to transposes when computing.)
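Written out, the feedforward pass for this three-layer network (400 input features plus bias, 25 hidden units plus bias, 10 outputs) is, in the notation of the lectures:

$$a^{(1)} = x \;(\text{add } a^{(1)}_0 = 1), \qquad z^{(2)} = \Theta^{(1)} a^{(1)}, \quad a^{(2)} = g(z^{(2)}) \;(\text{add } a^{(2)}_0 = 1)$$
$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad h_\Theta(x) = a^{(3)} = g(z^{(3)}), \qquad g(z) = \frac{1}{1 + e^{-z}}$$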
1.3 Feedforward and cost function
First implement the cost function without regularization. The formula is as follows:
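In the course's notation, the unregularized cost over all m examples and K = num_labels output units is:

$$J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y^{(i)}_k \log\big( (h_\Theta(x^{(i)}))_k \big) - \big(1 - y^{(i)}_k\big) \log\big( 1 - (h_\Theta(x^{(i)}))_k \big) \Big]$$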
Note that, unlike before, y represents digits 0-9 (stored as labels 1-10 as explained above), so before computing the cost each label must be converted into a vector of 0s and 1s; for example, label 5 becomes a 10-dimensional vector with a 1 in position 5 and 0s everywhere else.
The code:
% convert y (0-9) to vectors
c = 1:num_labels;
yt = zeros(m, num_labels);
for i = 1:m
    yt(i,:) = (c == y(i));
end
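As an aside, the same conversion can be done without the loop by indexing into an identity matrix. A minimal sketch, assuming y holds labels 1..num_labels as in this exercise:

% each row of yt is the row of the identity matrix selected by the label y(i)
I = eye(num_labels);
yt = I(y, :);          % m x num_labels matrix of 0s and 1s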
The cost computation in nnCostFunction.m:
% compute h(x)
a1 = [ones(m, 1) X];            % 5000 x 401
a2 = sigmoid(a1 * Theta1');     % 5000x401 times 401x25 gives 5000x25: maps 401 features to 25
a2 = [ones(m, 1) a2];           % 5000 x 26
hx = sigmoid(a2 * Theta2');     % 5000x26 times 26x10 gives 5000x10: maps 26 features to 10

% first term
part1 = -yt .* log(hx);

% second term
part2 = (1 - yt) .* log(1 - hx);

% compute J
J = 1 / m * sum(sum(part1 - part2));
Note that in last week's logistic regression assignment the cost was computed with a matrix product:
part1 = -yt' * log(hx);
part2 = (1 - yt') * log(1 - hx);
Here, however, the neural network formula has a double sum, so it must be computed with element-wise multiplication followed by sum, then sum again. Using a matrix product, which silently performs one level of summation, gives the wrong result.
1.4 Regularized cost function
Add the regularization term to the neural network cost function. The formula is as follows:
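For the architecture used here (Theta1 is 25 x 401 and Theta2 is 10 x 26), the regularized cost adds a penalty over all weights except the bias columns:

$$J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y^{(i)}_k \log\big( (h_\Theta(x^{(i)}))_k \big) - \big(1 - y^{(i)}_k\big) \log\big( 1 - (h_\Theta(x^{(i)}))_k \big) \Big] + \frac{\lambda}{2m} \Big[ \sum_{j=1}^{25} \sum_{k=1}^{400} \big( \Theta^{(1)}_{j,k} \big)^2 + \sum_{j=1}^{10} \sum_{k=1}^{25} \big( \Theta^{(2)}_{j,k} \big)^2 \Big]$$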
The code in nnCostFunction.m:
% convert y (0-9) to vectors
c = 1:num_labels;
yt = zeros(m, num_labels);
for i = 1:m
    yt(i,:) = (c == y(i));
end

% compute h(x)
a1 = [ones(m, 1) X];            % 5000 x 401
z2 = a1 * Theta1';              % 5000x401 times 401x25 gives 5000x25 (kept so the backpropagation code below can reuse z2)
a2 = sigmoid(z2);               % maps 401 features to 25
a2 = [ones(m, 1) a2];           % 5000 x 26
hx = sigmoid(a2 * Theta2');     % 5000x26 times 26x10 gives 5000x10: maps 26 features to 10

% first term
part1 = -yt .* log(hx);

% second term
part2 = (1 - yt) .* log(1 - hx);

% regularization term (bias columns excluded)
regTerm = lambda / 2 / m * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

% J with regularization
J = 1 / m * sum(sum(part1 - part2)) + regTerm;
The call in ex4.m:
% without regularization
lambda = 0;
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);

% with regularization
lambda = 1;
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);
2 Backpropagation
2.1 Sigmoid gradient
Compute the gradient of the sigmoid function. The formula is as follows:
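With g(z) the sigmoid function, its derivative can be written in terms of g itself:

$$g'(z) = \frac{d}{dz} g(z) = g(z)\,\big(1 - g(z)\big), \qquad g(z) = \frac{1}{1 + e^{-z}}$$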
sigmoidGradient.m
function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

% element-wise product so this works for vectors and matrices as well as scalars
g = sigmoid(z) .* (1 - sigmoid(z));

end
The call in ex4.m:
%% ================ Part 5: Sigmoid Gradient ================
g = sigmoidGradient([1 -0.5 0 0.5 1]);
fprintf('Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]:\n ');
fprintf('%f ', g);
2.2 Random initialization
When training a neural network, it is important to initialize the parameters randomly in order to break symmetry. An effective strategy is to pick the values of Θ(l) uniformly at random in the range [-εinit, εinit]; here you should use εinit = 0.12. There is a note on how this value is chosen:
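The exercise bases εinit on the number of units on either side of Θ(l):

$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$$

where L_in = s_l and L_out = s_{l+1} are the sizes of the layers adjacent to Θ(l). For the 400-25-10 architecture used here this works out to roughly 0.12.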
randInitializeWeights.m
function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms

epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

end
The call in ex4.m:
%% ================ Part 6: Initializing Parameters ================
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);

% Unroll parameters
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
2.3 Backpropagation
The backpropagation algorithm computes the error terms δj(l) from right to left:
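For this three-layer network the error terms are:

$$\delta^{(3)} = a^{(3)} - y, \qquad \delta^{(2)} = \big(\Theta^{(2)}\big)^T \delta^{(3)} \odot g'(z^{(2)})$$

where ⊙ denotes element-wise multiplication; there is no δ(1), since no error term is associated with the input layer.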
For details, see my course notes: Coursera - Andrew Ng Machine Learning - Course Notes - Lecture 9_Neural Networks learning.
(1) Compute the "error terms" according to the formula above. The code:
%----------------------------PART 2----------------------------------
% Accumulate the error term
delta_3 = hx - yt;                                               % 5000 x 10
delta_2 = delta_3 * Theta2 .* sigmoidGradient([ones(m, 1) z2]);  % 5000 x 26 = (5000x10 * 10x26) .* 5000x26

% drop the delta_2(0) term (the bias unit has no error)
delta_2 = delta_2(:,2:end);                                      % 5000 x 25
(2) Accumulate the gradient. The formula and code:
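In matrix form, accumulating over all examples is:

$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} \big(a^{(l)}\big)^T$$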
% Accumulate the gradient
D2 = delta_3' * a2;   % 10 x 26  = 10x5000 * 5000x26
D1 = delta_2' * a1;   % 25 x 401 = 25x5000 * 5000x401
(3) Obtain the partial derivatives of the cost function J(θ) with respect to Theta1 and Theta2. The formula and code:
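Dividing the accumulated values by m gives the unregularized gradient:

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = \frac{1}{m} \Delta^{(l)}_{ij}$$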
% Obtain the (unregularized) gradient for the neural network cost function
Theta2_grad = 1/m * D2;
Theta1_grad = 1/m * D1;
2.4 Gradient checking
The idea behind gradient checking:
If the gradient is computed correctly, the following two values should differ only very slightly:
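For each parameter θi, the numerical estimate perturbs only that parameter by a small ε (computeNumericalGradient.m uses ε = 1e-4) and is compared against the analytical gradient from backpropagation:

$$\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon}$$

where e_i is the i-th unit vector.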
Gradient checking is already implemented in checkNNGradients.m (using computeNumericalGradient.m); it creates a small neural network and dataset for the check. If the gradient is computed correctly, the relative difference will be smaller than 1e-9.
When you actually start training the model, gradient checking should be turned off, since it is very slow.
2.5 Regularized neural networks
The partial derivatives computed above do not include the regularization term. The regularized formula is as follows (j = 0 is not regularized, which is implemented by setting the first column of θ to 0):
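That is, the bias column is left unregularized and every other entry gets an extra λ/m term:

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = \frac{1}{m}\Delta^{(l)}_{ij} \quad (j = 0), \qquad \frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = \frac{1}{m}\Delta^{(l)}_{ij} + \frac{\lambda}{m}\Theta^{(l)}_{ij} \quad (j \ge 1)$$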
%----------------------------PART 3----------------------------------
%---Regularize gradients
temp1 = Theta1;
temp2 = Theta2;
temp1(:,1) = 0;   % set first column (bias terms) to 0
temp2(:,1) = 0;   % set first column (bias terms) to 0

Theta1_grad = Theta1_grad + lambda/m * temp1;
Theta2_grad = Theta2_grad + lambda/m * temp2;
The call in ex4.m:
%% =============== Part 8: Implement Regularization ===============
% Check gradients by running checkNNGradients
lambda = 3;
checkNNGradients(lambda);

% Also output the costFunction debugging values
debug_J = nnCostFunction(nn_params, input_layer_size, ...
                         hidden_layer_size, num_labels, X, y, lambda);
2.6 Training the parameters with fmincg
The call in ex4.m:
%% =================== Part 8: Training NN ===================
options = optimset('MaxIter', 50);

% You should also try different values of lambda
lambda = 1;

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
3 Visualizing the hidden layer
If we take one row of Theta1 and drop its first element (the bias term), we get a 400-dimensional vector. One way to visualize a hidden unit is to reshape this 400-dimensional vector into a 20 x 20 image and display it.
The call in ex4.m:
%% ================= Part 9: Visualize Weights =================
displayData(Theta1(:, 2:end));   % drop the first (bias) column
The image is shown below; each small tile corresponds to one row of Theta1:
4 Prediction
The training set accuracy is 94.34%. Regularization is introduced to avoid overfitting: if λ in section 2.6 is set to 0 or a very small value, or if MaxIter is increased, it is even possible to get a model with 100% training accuracy, but such a model may perform poorly on new, unseen data.
The call in ex4.m:
%% ================= Part 10: Implement Predict =================
pred = predict(Theta1, Theta2, X);
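predict.m is provided with the exercise; a minimal sketch of the feedforward prediction it performs (assuming the same Theta1/Theta2 layout as above) is:

% run one feedforward pass and pick the class with the largest output activation
m = size(X, 1);
h1 = sigmoid([ones(m, 1) X] * Theta1');    % hidden layer activations, m x 25
h2 = sigmoid([ones(m, 1) h1] * Theta2');   % output layer activations, m x 10
[~, pred] = max(h2, [], 2);                % predicted label = index of the max output

The accuracy is then mean(double(pred == y)) * 100.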
5 Run results
Running ex4.m produces the following output:
Loading and Visualizing Data ...
Program paused. Press enter to continue.

Loading Saved Neural Network Parameters ...

Feedforward Using Neural Network ...
Cost at parameters (loaded from ex4weights): 0.287629
(this value should be about 0.287629)
Program paused. Press enter to continue.

Checking Cost Function (w/ Regularization) ...
Cost at parameters (loaded from ex4weights): 0.383770
(this value should be about 0.383770)
Program paused. Press enter to continue.

Evaluating sigmoid gradient...
Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]:
0.196612 0.235004 0.250000 0.235004 0.196612
Program paused. Press enter to continue.

Initializing Neural Network Parameters ...

Checking Backpropagation...
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
-0.0002 -0.0002
0.0002 0.0002
-0.0003 -0.0003
0.0003 0.0003
-0.0004 -0.0004
-0.0001 -0.0001
0.0001 0.0001
-0.0001 -0.0001
0.0002 0.0002
-0.0002 -0.0002
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1641 0.1641
0.0576 0.0576
0.0505 0.0505
0.1646 0.1646
0.0578 0.0578
0.0508 0.0508
0.1583 0.1583
0.0559 0.0559
0.0492 0.0492
0.1511 0.1511
0.0537 0.0537
0.0471 0.0471
0.1496 0.1496
0.0532 0.0532
0.0466 0.0466
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 2.2366e-11
Program paused. Press enter to continue.

Checking Backpropagation (w/ Regularization) ...
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0168 -0.0168
0.0394 0.0394
0.0593 0.0593
0.0248 0.0248
-0.0327 -0.0327
-0.0602 -0.0602
-0.0320 -0.0320
0.0249 0.0249
0.0598 0.0598
0.0386 0.0386
-0.0174 -0.0174
-0.0576 -0.0576
-0.0452 -0.0452
0.0091 0.0091
0.0546 0.0546
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1187 0.1187
0.0000 0.0000
0.0337 0.0337
0.2040 0.2040
0.1171 0.1171
0.0755 0.0755
0.1257 0.1257
-0.0041 -0.0041
0.0170 0.0170
0.1763 0.1763
0.1131 0.1131
0.0862 0.0862
0.1323 0.1323
-0.0045 -0.0045
0.0015 0.0015
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 2.17629e-11
Cost at (fixed) debugging parameters (w/ lambda = 10): 0.576051
(this value should be about 0.576051)
Program paused. Press enter to continue.

Training Neural Network...
Iteration 1 | Cost: 3.298708e+00
Iteration 2 | Cost: 3.254768e+00
Iteration 3 | Cost: 3.209718e+00
Iteration 4 | Cost: 3.124366e+00
Iteration 5 | Cost: 2.858652e+00
Iteration 6 | Cost: 2.454280e+00
Iteration 7 | Cost: 2.259612e+00
Iteration 8 | Cost: 2.184967e+00
Iteration 9 | Cost: 1.895567e+00
Iteration 10 | Cost: 1.794052e+00
Iteration 11 | Cost: 1.658111e+00
Iteration 12 | Cost: 1.551086e+00
Iteration 13 | Cost: 1.440756e+00
Iteration 14 | Cost: 1.319321e+00
Iteration 15 | Cost: 1.218193e+00
Iteration 16 | Cost: 1.174144e+00
Iteration 17 | Cost: 1.121406e+00
Iteration 18 | Cost: 1.001795e+00
Iteration 19 | Cost: 9.730070e-01
Iteration 20 | Cost: 9.396211e-01
Iteration 21 | Cost: 8.982489e-01
Iteration 22 | Cost: 8.785754e-01
Iteration 23 | Cost: 8.558708e-01
Iteration 24 | Cost: 8.358078e-01
Iteration 25 | Cost: 8.074475e-01
Iteration 26 | Cost: 7.975287e-01
Iteration 27 | Cost: 7.883648e-01
Iteration 28 | Cost: 7.543000e-01
Iteration 29 | Cost: 7.318456e-01
Iteration 30 | Cost: 7.151468e-01
Iteration 31 | Cost: 6.919630e-01
Iteration 32 | Cost: 6.823971e-01
Iteration 33 | Cost: 6.766813e-01
Iteration 34 | Cost: 6.639429e-01
Iteration 35 | Cost: 6.579100e-01
Iteration 36 | Cost: 6.491120e-01
Iteration 37 | Cost: 6.405250e-01
Iteration 38 | Cost: 6.318625e-01
Iteration 39 | Cost: 6.180036e-01
Iteration 40 | Cost: 6.081649e-01
Iteration 41 | Cost: 5.973954e-01
Iteration 42 | Cost: 5.684440e-01
Iteration 43 | Cost: 5.465935e-01
Iteration 44 | Cost: 5.399081e-01
Iteration 45 | Cost: 5.320386e-01
Iteration 46 | Cost: 5.289632e-01
Iteration 47 | Cost: 5.252995e-01
Iteration 48 | Cost: 5.236517e-01
Iteration 49 | Cost: 5.233562e-01
Iteration 50 | Cost: 5.197894e-01
Program paused. Press enter to continue.

Visualizing Neural Network...
Program paused. Press enter to continue.

Training Set Accuracy: 94.340000
https://github.com/madoubao/coursera_machine_learning/tree/master/homework/machine-learning-ex4/ex4