These are study notes on the CNN code in MATLAB's DeepLearnToolbox.
The toolbox can be downloaded from GitHub: https://github.com/rasmusbergpalm/DeepLearnToolbox
Recommended notes analyzing the CNN code: https://blog.csdn.net/u013007900/article/details/51428186
Notes explaining error backpropagation: https://blog.csdn.net/viatorsun/article/details/82696475
Notes on MATLAB convolution operations: https://blog.csdn.net/baoxiao7872/article/details/80435214
Recommended reading: the book Deep Learning.
The CNN example trains and tests a CNN on the bundled data (images of handwritten digits).
The files involved are test_example_CNN, cnnsetup, cnntrain, cnnff, cnnbp, cnnapplygrads, cnntest, and the data file mnist_uint8.
Among them, test_example_CNN is the test script and mnist_uint8 holds the data. Its code and comments follow:
function test_example_CNN
load mnist_uint8;   % handwritten digit samples; each sample is a 28*28 feature vector

train_x = double(reshape(train_x',28,28,60000))/255;  % training data: reshape to 28*28, 60000 samples, normalized to [0,1]
test_x  = double(reshape(test_x',28,28,10000))/255;   % test data: 10000 samples
train_y = double(train_y');
test_y  = double(test_y');

%% ex1 Train a 6c-2s-12c-2s Convolutional neural network
%will run 1 epoch in about 200 second and get around 11% error.
%With 100 epochs you'll get around 1.2% error
rand('state',0)   % fix the RNG state so every run produces the same random numbers

cnn.layers = {
    struct('type', 'i') % input layer
    struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5) % convolution layer
        % outputmaps: number of output feature maps, 6
        % kernelsize: convolution kernel size, 5
    struct('type', 's', 'scale', 2) % subsampling layer, similar in function to pooling
        % subsampling scale: 2
    struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5) % convolution layer
        % outputmaps: number of output feature maps, 12
        % kernelsize: convolution kernel size, 5
    struct('type', 's', 'scale', 2) % subsampling layer
        % subsampling scale: 2
};
% The network defined here has 5 layers:
% input - convolution - subsampling - convolution - subsampling

opts.alpha = 1;        % learning rate
opts.batchsize = 50;   % number of samples per mini-batch
opts.numepochs = 1;    % number of training epochs

cnn = cnnsetup(cnn, train_x, train_y);        % initialize the CNN
cnn = cnntrain(cnn, train_x, train_y, opts);  % train the CNN

[er, bad] = cnntest(cnn, test_x, test_y);     % test the CNN

%plot mean squared error
figure; plot(cnn.rL);   % plot the MSE (mean squared error)
assert(er<0.12, 'Too big error');
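Before reading the individual files, it helps to trace how the feature map size shrinks through the 6c-2s-12c-2s architecture. The following standalone sketch (my own illustration, not toolbox code) reproduces the size arithmetic that cnnsetup performs: a 'valid' convolution shrinks each side by kernelsize - 1, and a subsampling layer divides it by scale.

% Standalone sketch: trace the map size through the 6c-2s-12c-2s network.
% The layer list mirrors the cnn.layers definition above.
mapsize = [28 28];                                % MNIST input size
layers = {struct('type','c','kernelsize',5), ...  % 6c
          struct('type','s','scale',2), ...       % 2s
          struct('type','c','kernelsize',5), ...  % 12c
          struct('type','s','scale',2)};          % 2s
for l = 1 : numel(layers)
    if strcmp(layers{l}.type, 'c')
        mapsize = mapsize - layers{l}.kernelsize + 1;   % 'valid' convolution
    else
        mapsize = mapsize / layers{l}.scale;            % subsampling
    end
    fprintf('after layer %d: %d x %d\n', l, mapsize(1), mapsize(2));
end
% prints 24x24, 12x12, 8x8, 4x4 -- matching the comments in cnnsetup below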
Following the execution order, cnnsetup runs next to initialize the CNN's parameters, mainly the initial values of the convolution kernels and biases. The code and comments follow:
%% Initialize the CNN parameters:
%  convolution kernels, biases, and the final single-layer perceptron
function net = cnnsetup(net, x, y)
    assert(~isOctave() || compare_versions(OCTAVE_VERSION, '3.8.0', '>='), ['Octave 3.8.0 or greater is required for CNNs as there is a bug in convolution in previous versions. See http://savannah.gnu.org/bugs/?39314. Your version is ' myOctaveVersion]);
    inputmaps = 1;                        % number of maps fed into the current layer
    mapsize = size(squeeze(x(:, :, 1)));  % squeeze the 3-D data to 2-D and take its size
                                          % mapsize = [28, 28]
    for l = 1 : numel(net.layers)   % initialize the parameters of each layer
        if strcmp(net.layers{l}.type, 's')   % subsampling layer
            mapsize = mapsize / net.layers{l}.scale;
            % if l=3, mapsize = [24, 24]/2 = [12, 12]
            % if l=5, mapsize = [8, 8]/2 = [4, 4]
            assert(all(floor(mapsize)==mapsize), ['Layer ' num2str(l) ' size must be integer. Actual: ' num2str(mapsize)]);
            for j = 1 : inputmaps
                net.layers{l}.b{j} = 0;   % every input map of a subsampling layer gets bias b = 0
            end
        end
        if strcmp(net.layers{l}.type, 'c')   % convolution layer
            mapsize = mapsize - net.layers{l}.kernelsize + 1;
            % map size after convolution (default stride 1), computed as
            % (input size - kernel size)/stride + 1
            % if l=2, mapsize = [28, 28] - 5 + 1 = [24, 24]
            % if l=4, mapsize = [12, 12] - 5 + 1 = [8, 8]
            fan_out = net.layers{l}.outputmaps * net.layers{l}.kernelsize ^ 2;
            % total number of kernel weights leaving this layer
            % if l=2, fan_out = 6 * 5^2 = 150
            % if l=4, fan_out = 12 * 5^2 = 300
            for j = 1 : net.layers{l}.outputmaps   % output map
                fan_in = inputmaps * net.layers{l}.kernelsize ^ 2;
                % number of kernel weights feeding each output map
                % if l=2, fan_in = 1 * 5^2 = 25
                % if l=4, fan_in = 6 * 5^2 = 150
                for i = 1 : inputmaps   % input map
                    net.layers{l}.k{i}{j} = (rand(net.layers{l}.kernelsize) - 0.5) * 2 * sqrt(6 / (fan_in + fan_out));
                    % initialize each kernel:
                    % k{i}{j} = (random 5*5 matrix - 0.5) * 2 * sqrt(6/(fan_in + fan_out))
                    % if l=2, this layer has 1*6 = 6 kernels
                    % if l=4, this layer has 6*12 = 72 kernels: each output map
                    % is driven by 6 kernels, one per input map
                end
                net.layers{l}.b{j} = 0;   % bias initialized to 0
                % each output map has a single bias, not one bias per filter
            end
            inputmaps = net.layers{l}.outputmaps;   % number of input maps for the next layer
            % if l=2, inputmaps = 6
            % if l=4, inputmaps = 12
        end
    end
    % 'onum' is the number of labels, that's why it is calculated using size(y, 1). If you have 20 labels so the output of the network will be 20 neurons.
    % 'fvnum' is the number of output neurons at the last layer, the layer just before the output layer.
    % 'ffb' is the biases of the output neurons.
    % 'ffW' is the weights between the last layer and the output neurons. Note that the last layer is fully connected to the output layer, that's why the size of the weights is (onum * fvnum)
    fvnum = prod(mapsize) * inputmaps;   % number of neurons in the layer just before the output layer
    % fvnum = 4 * 4 * 12 = 192
    onum = size(y, 1);   % total number of labels
    % onum = 10
    net.ffb = zeros(onum, 1);   % biases of the output neurons
    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));   % weights between the last layer and the output neurons
    % ffW = (random 10*192 matrix - 0.5) * 2 * sqrt(6 / (10 + 192))
end
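The kernel initialization above draws each weight uniformly from the interval [-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out))], which is essentially the normalized initialization of Glorot and Bengio. Here is a minimal standalone sketch of just that step, with the l=2 numbers from the comments above hard-coded:

% Minimal sketch of the kernel initialization for layer l=2 (values taken
% from the comments above: 1 input map, 6 output maps, 5x5 kernels).
kernelsize = 5;
fan_in  = 1 * kernelsize^2;    % 25
fan_out = 6 * kernelsize^2;    % 150
r = sqrt(6 / (fan_in + fan_out));       % half-width of the uniform range
k = (rand(kernelsize) - 0.5) * 2 * r;   % 5x5 kernel, uniform on [-r, r]
assert(all(abs(k(:)) <= r));            % every weight lies inside the range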
Next, cnntrain runs, training the network on mini-batches of the data. The code and comments follow:
%% Train the CNN
function net = cnntrain(net, x, y, opts)
    m = size(x, 3);   % total number of training samples
    % m = 60000
    numbatches = m / opts.batchsize;   % total number of mini-batches
    % numbatches = 60000/50
    if rem(numbatches, 1) ~= 0
        error('numbatches not integer');
    end
    net.rL = [];
    for i = 1 : opts.numepochs   % number of epochs
        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs)]);
        tic;   % start timing
        kk = randperm(m);   % random permutation of the indices 1..m
        for l = 1 : numbatches
            batch_x = x(:, :, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));   % draw one random mini-batch
            batch_y = y(:,    kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));
            net = cnnff(net, batch_x);        % forward pass
            net = cnnbp(net, batch_y);        % backpropagate the error and compute gradients
            net = cnnapplygrads(net, opts);   % update kernel weights
            if isempty(net.rL)
                net.rL(1) = net.L;
            end
            net.rL(end + 1) = 0.99 * net.rL(end) + 0.01 * net.L;
            % net.L is the MSE loss of the current batch
            % net.rL is an exponentially smoothed series of that loss
        end
        toc;
    end
end
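Two details of cnntrain are easy to gloss over: how the shuffled index vector kk carves the data into mini-batches, and how net.rL exponentially smooths the noisy per-batch loss. A small self-contained sketch with toy numbers (10 samples instead of 60000) illustrates both:

% Toy sketch of the batching and loss-smoothing logic in cnntrain.
m = 10; batchsize = 5;   % toy sizes instead of 60000 and 50
kk = randperm(m);        % shuffled sample indices
for l = 1 : m / batchsize
    idx = kk((l - 1) * batchsize + 1 : l * batchsize);   % indices of batch l
    disp(idx)            % each sample appears in exactly one batch per epoch
end
% Exponential smoothing of a noisy loss series, as done for net.rL:
L = [1.0 0.9 1.1 0.8];   % pretend per-batch MSE values
rL = L(1);
for t = 1 : numel(L)
    rL(end + 1) = 0.99 * rL(end) + 0.01 * L(t);   % new value gets weight 0.01
end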
Inside cnntrain, the first step for each batch is cnnff, the forward pass: the batch is fed through the network to produce its output. The code and comments follow:
%% Forward pass
function net = cnnff(net, x)
    n = numel(net.layers);   % number of layers, n = 5
    net.layers{1}.a{1} = x;  % first layer (input layer)
    % a is the input map, here a [28, 28, 50] array (depends on the batch)
    inputmaps = 1;

    for l = 2 : n   % for each layer
        if strcmp(net.layers{l}.type, 'c')   % convolution layer
            % !!below can probably be handled by insane matrix operations
            for j = 1 : net.layers{l}.outputmaps   % for each output map
                % create temp output map
                z = zeros(size(net.layers{l - 1}.a{1}) - [net.layers{l}.kernelsize - 1 net.layers{l}.kernelsize - 1 0]);
                % z stores the values of the output feature map
                % if l=2, size(net.layers{l - 1}.a{1}) = [28, 28, 50],
                % z = zeros([28, 28, 50] - [5 - 1, 5 - 1, 0]) = zeros([24, 24, 50])
                for i = 1 : inputmaps   % for each input map
                    % convolve with corresponding kernel and add to temp output map
                    z = z + convn(net.layers{l - 1}.a{i}, net.layers{l}.k{i}{j}, 'valid');
                    % convolve the input with the kernel; 'valid' keeps only the unpadded part
                end
                % add bias, pass through nonlinearity
                net.layers{l}.a{j} = sigm(z + net.layers{l}.b{j});
                % add the bias and apply the sigmoid nonlinearity;
                % the activation becomes this layer's output
            end
            % set number of input maps to this layers number of outputmaps
            inputmaps = net.layers{l}.outputmaps;   % the next layer takes this layer's output maps as input
        elseif strcmp(net.layers{l}.type, 's')   % subsampling layer
            % downsample
            for j = 1 : inputmaps
                z = convn(net.layers{l - 1}.a{j}, ones(net.layers{l}.scale) / (net.layers{l}.scale ^ 2), 'valid');   % !! replace with variable
                % convolve the previous output with a 2*2 matrix whose entries are all 1/4
                % ('valid' part only), i.e. a sliding 2*2 mean
                net.layers{l}.a{j} = z(1 : net.layers{l}.scale : end, 1 : net.layers{l}.scale : end, :);
                % keep every scale-th row and column: together this is mean pooling
            end
        end
    end

    % final single-layer perceptron
    % concatenate all end layer feature maps into vector
    net.fv = [];
    for j = 1 : numel(net.layers{n}.a)
        sa = size(net.layers{n}.a{j});   % size of one last-layer map
        % sa = [4, 4, 50]
        net.fv = [net.fv; reshape(net.layers{n}.a{j}, sa(1) * sa(2), sa(3))];
        % each map is reshaped to [16, 50]; stacking all 12 maps gives net.fv of size [192, 50]
    end
    % feedforward into output perceptrons
    net.o = sigm(net.ffW * net.fv + repmat(net.ffb, 1, size(net.fv, 2)));   % output
    % multiply by the weights, add the replicated biases, and apply sigmoid:
    % sigmoid([10, 192] * [192, 50] + 50 copies of the bias vector)
end
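The subsampling step above, a 'valid' convolution with ones(scale)/scale^2 followed by keeping every scale-th row and column, is exactly non-overlapping mean pooling. A quick standalone check on a small matrix (my own illustration, not toolbox code):

% Verify that 'valid' convolution with ones(2)/4 followed by stride-2
% indexing equals averaging each non-overlapping 2x2 block.
A = magic(4);                         % 4x4 test matrix
z = convn(A, ones(2) / 4, 'valid');   % sliding 2x2 mean, size 3x3
pooled = z(1:2:end, 1:2:end);         % keep every 2nd entry -> 2x2 result
manual = [mean(mean(A(1:2,1:2))) mean(mean(A(1:2,3:4))); ...
          mean(mean(A(3:4,1:2))) mean(mean(A(3:4,3:4)))];
assert(max(abs(pooled(:) - manual(:))) < 1e-12)   % identical results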
Inside cnntrain, cnnbp then computes the backpropagated errors and the gradients. This is the hardest part to follow, so it helps to review the principles of error backpropagation first. The code and comments follow:
function net = cnnbp(net, y)
    n = numel(net.layers);

    % error
    net.e = net.o - y;   % difference between the forward-pass output and the desired output
    % loss function
    net.L = 1/2 * sum(net.e(:) .^ 2) / size(net.e, 2);
    % loss: mean squared error (the factor 1/2 makes the derivative cleaner)

    %% backprop deltas
    net.od = net.e .* (net.o .* (1 - net.o));   % output delta
    % delta at the output layer, propagated back into the single-layer perceptron;
    % error * output * (1 - output) is the partial derivative of the loss with
    % respect to the pre-activation (the learning rate is not applied here)
    net.fvd = (net.ffW' * net.od);   % feature vector delta, passed back to the perceptron input
    if strcmp(net.layers{n}.type, 'c')   % only conv layers has sigm function
        net.fvd = net.fvd .* (net.fv .* (1 - net.fv));
        % when the layer before the perceptron is a conv layer, its output went
        % through a sigmoid, so multiply by the sigmoid derivative once more
    end

    % reshape feature vector deltas into output map style
    sa = size(net.layers{n}.a{1});   % last-layer map size: 4*4, 12 maps, 50 samples
    fvnum = sa(1) * sa(2);           % 4*4 = 16
    for j = 1 : numel(net.layers{n}.a)   % j = 1:12
        net.layers{n}.d{j} = reshape(net.fvd(((j - 1) * fvnum + 1) : j * fvnum, :), sa(1), sa(2), sa(3));
        % a 4*4*50 delta array per map
    end

    for l = (n - 1) : -1 : 1   % walk backwards through the layers
        if strcmp(net.layers{l}.type, 'c')   % conv layer: deltas come from the subsampling layer above
            for j = 1 : numel(net.layers{l}.a)
                net.layers{l}.d{j} = net.layers{l}.a{j} .* (1 - net.layers{l}.a{j}) .* (expand(net.layers{l + 1}.d{j}, [net.layers{l + 1}.scale net.layers{l + 1}.scale 1]) / net.layers{l + 1}.scale ^ 2);
                % expand upsamples the next layer's delta array back to this layer's
                % size (the inverse of subsampling); the result keeps the
                % error * output * (1 - output) form
            end
        elseif strcmp(net.layers{l}.type, 's')   % subsampling layer: deltas come from the conv layer above
            for i = 1 : numel(net.layers{l}.a)
                z = zeros(size(net.layers{l}.a{1}));
                for j = 1 : numel(net.layers{l + 1}.a)
                    z = z + convn(net.layers{l + 1}.d{j}, rot180(net.layers{l + 1}.k{i}{j}), 'full');
                    % rotate the kernel by 180 degrees and do a 'full' convolution
                end
                net.layers{l}.d{i} = z;
            end
        end
    end

    %% calc gradients
    for l = 2 : n   % walk forwards
        if strcmp(net.layers{l}.type, 'c')   % conv layer
            for j = 1 : numel(net.layers{l}.a)
                for i = 1 : numel(net.layers{l - 1}.a)   % previous layer
                    net.layers{l}.dk{i}{j} = convn(flipall(net.layers{l - 1}.a{i}), net.layers{l}.d{j}, 'valid') / size(net.layers{l}.d{j}, 3);
                    % kernel gradient = input map (flipped) convolved with the
                    % output delta, averaged over the batch
                end
                net.layers{l}.db{j} = sum(net.layers{l}.d{j}(:)) / size(net.layers{l}.d{j}, 3);   % bias gradient
            end
        end
    end
    % gradients of the single-layer perceptron
    net.dffW = net.od * (net.fv)' / size(net.od, 2);   % weight gradient
    net.dffb = mean(net.od, 2);                        % bias gradient

    function X = rot180(X)
        X = flipdim(flipdim(X, 1), 2);
    end
end
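The least obvious call above is expand, a toolbox utility that replicates each entry of the next layer's delta map over a scale-by-scale block (the inverse of the mean pooling in cnnff). On a single 2-D slice it behaves like kron with an all-ones matrix, which makes the step easy to sanity-check standalone:

% Sanity check of the delta upsampling in cnnbp: each subsampling-layer
% delta is spread evenly over the 2x2 block it was pooled from.
d = [1 2; 3 4];   % pretend 2x2 delta map from the layer above
scale = 2;
up = kron(d, ones(scale)) / scale^2;   % same effect as expand(d,[2 2 1])/4 on a 2-D slice
disp(up)
% up =
%     0.25    0.25    0.50    0.50
%     0.25    0.25    0.50    0.50
%     0.75    0.75    1.00    1.00
%     0.75    0.75    1.00    1.00
% Each delta is divided by 4 and copied into its 2x2 block, because every
% input pixel contributed 1/4 of the pooled average in the forward pass.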
Inside cnntrain, cnnapplygrads then applies the computed gradients to update the kernel weights. This file was missing from my copy of the toolbox, so I copied a version from code found online. The code and comments follow:
function net = cnnapplygrads(net, opts)   % apply the gradients
    % weight updates for the feature-extraction layers (convolution + subsampling)
    for l = 2 : numel(net.layers)   % start from the second layer
        if strcmp(net.layers{l}.type, 'c')   % for each convolution layer
            for j = 1 : numel(net.layers{l}.a)   % for each output map of this layer
                % loop over all kernels net.layers{l}.k{ii}{j}
                for ii = 1 : numel(net.layers{l - 1}.a)   % for each output map of the previous layer
                    net.layers{l}.k{ii}{j} = net.layers{l}.k{ii}{j} - opts.alpha * net.layers{l}.dk{ii}{j};
                    % update the kernel values
                end
                % update the bias
                net.layers{l}.b{j} = net.layers{l}.b{j} - opts.alpha * net.layers{l}.db{j};
            end
        end
    end
    % weight update for the single-layer perceptron
    net.ffW = net.ffW - opts.alpha * net.dffW;
    net.ffb = net.ffb - opts.alpha * net.dffb;
end
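The update rule itself is plain gradient descent with step size opts.alpha: every parameter moves against its gradient. A one-step toy example with made-up numbers, using alpha = 1 as set in test_example_CNN:

% Toy illustration of the update in cnnapplygrads: one gradient-descent step.
alpha = 1;                      % opts.alpha in test_example_CNN
W  = [0.2 -0.1; 0.4 0.3];       % pretend weight matrix
dW = [0.05 0.00; -0.02 0.01];   % pretend gradient from cnnbp
W = W - alpha * dW;             % same form as net.ffW = net.ffW - opts.alpha * net.dffW
disp(W)                         % [0.15 -0.10; 0.42 0.29]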
With that, training of the CNN is complete. Finally, cnntest measures how accurately the trained network classifies the test set.
function [er, bad] = cnntest(net, x, y)
    % feedforward
    net = cnnff(net, x);   % forward pass
    [~, h] = max(net.o);   % index of the largest output per sample (predicted class)
    [~, a] = max(y);       % index of the 1 in each one-hot label (true class)
    bad = find(h ~= a);    % indices of the misclassified samples
    er = numel(bad) / size(y, 2);   % error rate
end
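Because the labels are one-hot columns, taking max over each column of net.o and y recovers the predicted and the true class index. A tiny worked example with made-up outputs for 3 samples:

% Toy example of the error-rate computation in cnntest (3 samples, 3 classes).
o = [0.8 0.1 0.3; 0.1 0.7 0.4; 0.1 0.2 0.3];   % pretend network outputs (columns = samples)
y = [1 0 0; 0 1 0; 0 0 1];                     % one-hot true labels
[~, h] = max(o);    % predicted classes: [1 2 2]
[~, a] = max(y);    % true classes:      [1 2 3]
bad = find(h ~= a); % sample 3 is misclassified
er = numel(bad) / size(y, 2);   % error rate = 1/3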