Preface:
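This experiment is practice in using convolution and pooling. The goal is a deeper understanding of how to apply convolution to large images to obtain the output of each feature, and then apply pooling to those outputs so that they gain properties such as translation invariance. The experiment follows the Stanford web tutorial Exercise:Convolution and Pooling. See also the earlier post Deep learning: 17 (Linear Decoders, Convolution and Pooling). This experiment runs on top of the feature-extraction network learned in the earlier post Deep learning: 22 (linear decoder exercise).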
Experiment basics:
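First, an overview of the overall training and testing flow. As this post makes clearer, during the training stage whitening is applied to small patches. Because the input data are large images, every convolution step has to include both the whitening and the network weight computation. As a result, each learned hidden-unit feature produces, for every image, a slightly smaller feature map. Mean pooling is then applied to that feature map (before this, the program contains some code that tests the correctness of the convolution and pooling code). With these feature values and the labels, softmax can be used to train a multi-class classifier.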
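In the testing stage, convolution is applied to the large images. Each image block being convolved must also be preprocessed with the whitening parameters obtained during training. Features are then extracted through convolution and pooling, exactly as in the training process. The trained softmax classifier is then used for prediction.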
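The reuse of the training-time whitening parameters can be checked on a single patch. The following is a minimal sketch, assuming small made-up sizes and random stand-ins for W, b, ZCAWhite and meanPatch (none of these values come from the exercise data). It shows that whitening a patch explicitly and then applying the network gives the same activation as folding the whitening into the weights and bias once, which is what cnnConvolve does with WT and b_mean.

```matlab
% Hypothetical small sizes, chosen only for illustration.
visibleSize = 12;   % length of one vectorized patch
hiddenSize  = 5;    % number of learned features

% Random stand-ins for the learned and precomputed quantities (assumptions).
W         = randn(hiddenSize, visibleSize);
b         = randn(hiddenSize, 1);
ZCAWhite  = randn(visibleSize, visibleSize);
meanPatch = rand(visibleSize, 1);
x         = rand(visibleSize, 1);   % one raw (unwhitened) patch

sigmoid = @(z) 1 ./ (1 + exp(-z));

% Route 1: whiten the patch explicitly, then apply the network.
aExplicit = sigmoid(W * (ZCAWhite * (x - meanPatch)) + b);

% Route 2: fold whitening into the weights and bias once, apply to raw x.
WT      = W * ZCAWhite;
b_mean  = b - WT * meanPatch;
aFolded = sigmoid(WT * x + b_mean);

maxDiff = max(abs(aExplicit - aFolded));
fprintf('max difference between the two routes: %g\n', maxDiff);
```

Folding the preprocessing into WT and b_mean means the large images never have to be whitened patch by patch, which is the point of the precomputation step in cnnConvolve.m.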
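Training the parameters of the feature-extraction network takes a relatively long time, whereas training something like the softmax classifier takes much less.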
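In MATLAB, an n-dimensional array is usually "peeled" from right to left, because that is how MATLAB prints it: one 2-D slice at a time for each combination of the trailing indices. For understanding, reading the dimensions left to right or right to left both work; use whichever is easier to follow.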
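A small sketch of this layout, using a made-up 2-by-3-by-4 array rather than anything from the exercise: the trailing index selects a 2-D page, and within the flattened array the leftmost index varies fastest.

```matlab
A = reshape(1:24, 2, 3, 4);   % 2 rows, 3 columns, 4 pages

% Peeling from the right: fix the last index to get one 2-D page.
page2 = A(:, :, 2);
disp(page2);

% Column-major storage: the leftmost index varies fastest, so the
% linear position of A(r, c, p) is r + (c-1)*2 + (p-1)*2*3.
r = 1; c = 2; p = 3;
linearIndex = r + (c - 1) * 2 + (p - 1) * 2 * 3;
fprintf('A(%d,%d,%d) = %d, linear index = %d\n', r, c, p, A(r, c, p), linearIndex);
```

The same ordering is what patch(:) relies on in the exercise code when a patchDim x patchDim x 3 block is turned into a column vector.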
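The rationale of the convolution test in the program is: first use the cnnConvolve function to compute the convolution values for the given samples, then randomly pick many patches and compute the network output for each by direct algebraic calculation. If the difference between the two is very small for all of the patches (1000 are chosen here), the convolution computation is correct.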
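The idea behind this check can be reproduced on a toy single-channel example. The sketch below uses made-up sizes and random data (assumptions, not the exercise's images) and compares conv2 against the direct dot product at every valid position. Note that conv2 flips its kernel, so the feature is flipped beforehand, as in cnnConvolve.m.

```matlab
imageDim = 6;                      % hypothetical image side length
patchDim = 3;                      % hypothetical feature side length
im       = rand(imageDim);         % stand-in single-channel image
feature  = rand(patchDim);         % stand-in feature (one weight block)

% conv2 flips the kernel, so pre-flip it to obtain a sliding dot product.
flipped   = flipud(fliplr(feature));
convolved = conv2(im, flipped, 'valid');

% Direct computation: dot product of the feature with each patch.
outDim  = imageDim - patchDim + 1;
maxDiff = 0;
for r = 1:outDim
    for c = 1:outDim
        patch   = im(r:r + patchDim - 1, c:c + patchDim - 1);
        direct  = feature(:)' * patch(:);
        maxDiff = max(maxDiff, abs(direct - convolved(r, c)));
    end
end
fprintf('output size %dx%d, max difference %g\n', outDim, outDim, maxDiff);
```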
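The rationale of the pooling test in the program is: pooling is computed by the function cnnPool, whose arguments are the pooling dimension and the data to be pooled. The program therefore takes an arbitrary data set, computes the mean pooling result by hand, and then computes the result with cnnPool as well. If the two results are identical, the pooling function is correct.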
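The hand calculation can be written out as follows. This sketch uses the same 8-by-8 test matrix as the exercise script but a plain double loop in place of cnnPool, so it only illustrates what the expected values are; the variable names are illustrative.

```matlab
testMatrix = reshape(1:64, 8, 8);   % 1..64, increasing down each column
poolDim    = 4;                     % pool over 4x4 regions
resultDim  = floor(size(testMatrix, 1) / poolDim);

pooled = zeros(resultDim);
for poolRow = 1:resultDim
    for poolCol = 1:resultDim
        rows  = (poolRow - 1) * poolDim + (1:poolDim);
        cols  = (poolCol - 1) * poolDim + (1:poolDim);
        block = testMatrix(rows, cols);
        pooled(poolRow, poolCol) = mean(block(:));   % mean pooling
    end
end

% Same values written the way the exercise script computes expectedMatrix.
expectedMatrix = [mean(mean(testMatrix(1:4, 1:4))) mean(mean(testMatrix(1:4, 5:8))); ...
                  mean(mean(testMatrix(5:8, 1:4))) mean(mean(testMatrix(5:8, 5:8)))];
disp(pooled);   % [14.5 46.5; 18.5 50.5]
```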
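Color feature learning shows up in the program as follows: each convolution is applied to only one of the RGB channels, so it is computed 3 times, and the three resulting convolution matrices are then added element by element. This way the subsequent pooling operation only has to run on a single image.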
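Why summing the three per-channel results is valid can be seen on a toy example. The sketch below assumes made-up sizes and random data, and assumes the weight row stores the channels one after another in blocks of patchDim*patchDim values (the layout used by the offset computation in cnnConvolve.m). It shows that the channel-wise sum equals one dot product between the full weight row and the vectorized 3-channel patch.

```matlab
imageDim    = 6;                    % hypothetical image side length
patchDim    = 3;                    % hypothetical patch side length
numChannels = 3;
patchSize   = patchDim * patchDim;

im = rand(imageDim, imageDim, numChannels);   % stand-in RGB image
w  = rand(1, patchSize * numChannels);        % stand-in weight row for one feature

% Convolve each channel with its own block of weights and accumulate.
outDim = imageDim - patchDim + 1;
summed = zeros(outDim);
for channel = 1:numChannels
    offset  = (channel - 1) * patchSize;
    feature = reshape(w(offset + 1:offset + patchSize), patchDim, patchDim);
    summed  = summed + conv2(im(:, :, channel), flipud(fliplr(feature)), 'valid');
end

% Direct computation at one position: full 3-channel patch, vectorized.
r = 2; c = 4;
patch   = im(r:r + patchDim - 1, c:c + patchDim - 1, :);
direct  = w * patch(:);
diffVal = abs(direct - summed(r, c));
fprintf('difference at (%d,%d): %g\n', r, c, diffVal);
```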
The result of convolution has the following layout:
convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
The result of pooling has the following layout:
pooledFeatures(featureNum, imageNum, poolRow, poolCol)
The images are stored in the following layout:
convImages(imageRow, imageCol, imageChannel, imageNum)
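To make these layouts concrete, the following sketch works through the sizes for the exercise's parameters (64-pixel images, 8-pixel patches, 19-pixel pooling regions, 400 features) and shows how the pooled array is rearranged into one feature column per image, as the exercise script does before calling softmaxTrain. The image count of 5 and the random contents are assumptions made to keep the example small.

```matlab
imageDim   = 64;
patchDim   = 8;
poolDim    = 19;
hiddenSize = 400;
numImages  = 5;     % hypothetical small number of images

convDim = imageDim - patchDim + 1;      % side of each convolved feature map
poolOut = floor(convDim / poolDim);     % side of each pooled feature map

% Stand-in for pooledFeatures(featureNum, imageNum, poolRow, poolCol).
pooledFeatures = rand(hiddenSize, numImages, poolOut, poolOut);

% Move the image index last, then flatten everything else per image.
X = permute(pooledFeatures, [1 3 4 2]);
X = reshape(X, numel(pooledFeatures) / numImages, numImages);

% Column k of X holds all pooled features of image k.
image2 = squeeze(pooledFeatures(:, 2, :, :));
fprintf('convDim = %d, poolOut = %d, features per image = %d\n', ...
        convDim, poolOut, size(X, 1));
```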
Because only a 4-class softmax classifier needs to be trained, training is very fast and takes less than 1 minute.
Some MATLAB functions:
squeeze:
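B = squeeze(A): B has the same elements as A, but with all singleton dimensions removed. A singleton dimension is one for which size(A,dim) = 1. Two-dimensional arrays are unaffected by squeeze; if A is a row or column vector or a scalar (1-by-1) value, then B = A. For example, rand(4,1,3) produces a uniformly distributed random array with 3 pages of 4 rows and 1 column each. After squeeze, the dimension with 1 column is gone, leaving a two-dimensional array with 4 rows and 3 columns. rand(4,2,3) has no singleton dimension, so squeeze leaves it unchanged.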
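The cases described above can be checked directly. This is a small sketch with throwaway arrays; the last case mirrors the 1 x 1 x 8 x 8 test matrix used in the pooling check of the exercise script.

```matlab
B = squeeze(rand(4, 1, 3));      % singleton 2nd dimension removed
C = squeeze(rand(4, 2, 3));      % no singleton dimension, unchanged
R = squeeze(rand(1, 5));         % 2-D row vector, unchanged
D = squeeze(rand(1, 1, 8, 8));   % leading singleton dimensions removed

fprintf('B: %s\n', mat2str(size(B)));
fprintf('C: %s\n', mat2str(size(C)));
fprintf('R: %s\n', mat2str(size(R)));
fprintf('D: %s\n', mat2str(size(D)));
```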
size:
size(A,n): if A is a multidimensional array, size(A,n) is the length of the n-th dimension, returned as a scalar.
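As a short illustration, here is how size(A,n) reads the dimensions of an array stored in the images(row, col, channel, imageNum) layout, the way cnnConvolve.m does. The all-zero array is just a placeholder with the right shape.

```matlab
images = zeros(64, 64, 3, 8);          % placeholder: 8 RGB images of 64x64

imageDim      = size(images, 1);       % number of rows
imageChannels = size(images, 3);       % number of channels
numImages     = size(images, 4);       % number of images
beyond        = size(images, 5);       % dimensions past the last one have length 1

fprintf('%d x %d pixels, %d channels, %d images\n', ...
        imageDim, size(images, 2), imageChannels, numImages);
```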
Experimental results:
The learned feature images are:
The final prediction accuracy is: Accuracy: 80.406%
Main experiment code:
CnnExercise.m:
%% CS294A/CS294W Convolutional Neural Networks Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  convolutional neural networks exercise. In this exercise, you will only
%  need to modify cnnConvolve.m and cnnPool.m. You will not need to modify
%  this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageDim = 64;         % image dimension
imageChannels = 3;     % number of channels (rgb, so 3)

patchDim = 8;          % patch dimension
numPatches = 50000;    % number of patches

visibleSize = patchDim * patchDim * imageChannels;  % number of input units, 8*8*3 = 192
outputSize = visibleSize;   % number of output units
hiddenSize = 400;           % number of hidden units

epsilon = 0.1;         % epsilon for ZCA whitening

poolDim = 19;          % dimension of pooling region

%%======================================================================
%% STEP 1: Train a sparse autoencoder (with a linear decoder) to learn
%  features from color patches. If you have completed the linear decoder
%  exercise, use the features that you have obtained from that exercise,
%  loading them into optTheta. Recall that we have to keep around the
%  parameters used in whitening (i.e., the ZCA whitening matrix and the
%  meanPatch)

% --------------------------- YOUR CODE HERE --------------------------
% Train the sparse autoencoder and fill the following variables with
% the optimal parameters:

optTheta = zeros(2*hiddenSize*visibleSize+hiddenSize+visibleSize, 1); % total number of parameters of the patch network
ZCAWhite = zeros(visibleSize, visibleSize);
meanPatch = zeros(visibleSize, 1);
load STL10Features.mat;   % overwrites the placeholders above with the saved values

% --------------------------------------------------------------------

% Display and check to see that the features look good
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

displayColorNetwork( (W*ZCAWhite)'); % explained in an earlier post

%%======================================================================
%% STEP 2: Implement and test convolution and pooling
%  In this step, you will implement convolution and pooling, and test them
%  on a small part of the data set to ensure that you have implemented
%  these two functions correctly. In the next step, you will actually
%  convolve and pool the features with the STL10 images.

%% STEP 2a: Implement convolution
%  Implement convolution in the function cnnConvolve in cnnConvolve.m

% Note that we have to preprocess the images in the exact same way
% we preprocessed the patches before we can obtain the feature activations.

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels

%% Use only the first 8 images for testing
convImages = trainImages(:, :, :, 1:8);

% NOTE: Implement cnnConvolve in cnnConvolve.m first!
% W and b are already in matrix / vector form here.
convolvedFeatures = cnnConvolve(patchDim, hiddenSize, convImages, W, b, ZCAWhite, meanPatch);

%% STEP 2b: Checking your convolution
%  To ensure that you have convolved the features correctly, we have
%  provided some code to compare the results of your convolution with
%  activations from the sparse autoencoder

% For 1000 random points
for i = 1:1000
    featureNum = randi([1, hiddenSize]);             % pick a random feature
    imageNum = randi([1, 8]);                        % pick a random image
    imageRow = randi([1, imageDim - patchDim + 1]);  % pick a random top-left corner
    imageCol = randi([1, imageDim - patchDim + 1]);

    % Pick one of the 8 images at random and cut out the patch whose
    % top-left corner is the randomly chosen point
    patch = convImages(imageRow:imageRow + patchDim - 1, imageCol:imageCol + patchDim - 1, :, imageNum);
    patch = patch(:);            % vectorized in column-major order
    patch = patch - meanPatch;
    patch = ZCAWhite * patch;    % whiten the patch with the same parameters

    features = feedForwardAutoencoder(optTheta, hiddenSize, visibleSize, patch); % network output for this patch

    if abs(features(featureNum, 1) - convolvedFeatures(featureNum, imageNum, imageRow, imageCol)) > 1e-9
        fprintf('Convolved feature does not match activation from autoencoder\n');
        fprintf('Feature Number    : %d\n', featureNum);
        fprintf('Image Number      : %d\n', imageNum);
        fprintf('Image Row         : %d\n', imageRow);
        fprintf('Image Column      : %d\n', imageCol);
        fprintf('Convolved feature : %0.5f\n', convolvedFeatures(featureNum, imageNum, imageRow, imageCol));
        fprintf('Sparse AE feature : %0.5f\n', features(featureNum, 1));
        error('Convolved feature does not match activation from autoencoder');
    end
end

disp('Congratulations! Your convolution code passed the test.');

%% STEP 2c: Implement pooling
%  Implement pooling in the function cnnPool in cnnPool.m

% NOTE: Implement cnnPool in cnnPool.m first!
pooledFeatures = cnnPool(poolDim, convolvedFeatures);

%% STEP 2d: Checking your pooling
%  To ensure that you have implemented pooling, we will use your pooling
%  function to pool over a test matrix and check the results.

testMatrix = reshape(1:64, 8, 8); % the numbers 1..64 as a matrix, increasing down each column
% Compute the mean pooling values directly
expectedMatrix = [mean(mean(testMatrix(1:4, 1:4))) mean(mean(testMatrix(1:4, 5:8))); ...
                  mean(mean(testMatrix(5:8, 1:4))) mean(mean(testMatrix(5:8, 5:8))); ];

testMatrix = reshape(testMatrix, 1, 1, 8, 8);

% squeeze removes the singleton dimensions
pooledFeatures = squeeze(cnnPool(4, testMatrix)); % the argument 4 means pooling over 4x4 regions

if ~isequal(pooledFeatures, expectedMatrix)
    disp('Pooling incorrect');
    disp('Expected');
    disp(expectedMatrix);
    disp('Got');
    disp(pooledFeatures);
else
    disp('Congratulations! Your pooling code passed the test.');
end

%%======================================================================
%% STEP 3: Convolve and pool with the dataset
%  In this step, you will convolve each of the features you learned with
%  the full large images to obtain the convolved features. You will then
%  pool the convolved features to obtain the pooled features for
%  classification.
%
%  Because the convolved features matrix is very large, we will do the
%  convolution and pooling 50 features at a time to avoid running out of
%  memory. Reduce this number if necessary

stepSize = 50;
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize'); % hiddenSize/stepSize is an integer; here there are 8 batches

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels
load stlTestSubset.mat  % loads numTestImages,  testImages,  testLabels

pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, ...  % imageDim is the side of the large images, 64 here
    floor((imageDim - patchDim + 1) / poolDim), ...          % poolDim is the side of one pooling region, 19 here
    floor((imageDim - patchDim + 1) / poolDim) );            % pooledFeaturesTrain ends up 400*2000*3*3
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );            % pooledFeaturesTest ends up 400*3200*3*3

tic();

for convPart = 1:(hiddenSize / stepSize) % features are extracted in batches of stepSize hidden units
    featureStart = (convPart - 1) * stepSize + 1;  % first feature of this batch
    featureEnd = convPart * stepSize;              % last feature of this batch

    fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);
    Wt = W(featureStart:featureEnd, :);
    bt = b(featureStart:featureEnd);

    fprintf('Convolving and pooling train images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ... % argument 2 is the number of hidden units in this batch
        trainImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();
    clear convolvedFeaturesThis pooledFeaturesThis; % free these large variables before processing the test images

    fprintf('Convolving and pooling test images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        testImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();

    clear convolvedFeaturesThis pooledFeaturesThis;
end

% You might want to save the pooled features since convolution and pooling takes a long time
save('cnnPooledFeatures.mat', 'pooledFeaturesTrain', 'pooledFeaturesTest');
toc();

%%======================================================================
%% STEP 4: Use pooled features for classification
%  Now, you will use your pooled features to train a softmax classifier,
%  using softmaxTrain from the softmax exercise.
%  Training the softmax classifier for 1000 iterations should take less than
%  10 minutes.

% Add the path to your softmax solution, if necessary
% addpath /path/to/solution/

% Setup parameters for softmax
softmaxLambda = 1e-4; % weight decay coefficient
numClasses = 4;

% Reshape the pooledFeatures to form an input vector for softmax
softmaxX = permute(pooledFeaturesTrain, [1 3 4 2]); % permute reorders the dimensions, moving the image index last
% numel(pooledFeaturesTrain) / numTrainImages is the feature vector length per image
softmaxX = reshape(softmaxX, numel(pooledFeaturesTrain) / numTrainImages, ...
    numTrainImages);
softmaxY = trainLabels;

options = struct;
options.maxIter = 200;
softmaxModel = softmaxTrain(numel(pooledFeaturesTrain) / numTrainImages, ... % the first argument is inputSize
    numClasses, softmaxLambda, softmaxX, softmaxY, options);

%%======================================================================
%% STEP 5: Test classifier
%  Now you will test your trained classifier against the test images

softmaxX = permute(pooledFeaturesTest, [1 3 4 2]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTest) / numTestImages, numTestImages);
softmaxY = testLabels;

[pred] = softmaxPredict(softmaxModel, softmaxX);
acc = (pred(:) == softmaxY(:));
acc = sum(acc) / size(acc, 1);
fprintf('Accuracy: %2.3f%%\n', acc * 100); % prediction accuracy

% You should expect to get an accuracy of around 80% on the test images.
cnnConvolve.m:
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

patchSize = patchDim*patchDim;
assert(numFeatures == size(W,1), 'W should have numFeatures rows');
numImages = size(images, 4);      % size of dimension 4: number of images
imageDim = size(images, 1);       % size of dimension 1: number of image rows
imageChannels = size(images, 3);  % size of dimension 3: number of channels
assert(patchSize*imageChannels == size(W,2), 'W should have patchSize*imageChannels cols');

% Instructions:
%   Convolve every feature with every large image here to produce the
%   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
%   matrix convolvedFeatures, such that
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
%
% Expected running times:
%   Convolving with 100 images should take less than 3 minutes
%   Convolving with 5000 images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps

WT = W*ZCAWhite;              % equivalent network weights acting on raw patches
b_mean = b - WT*meanPatch;    % bias correction needed because the input is not mean-subtracted

% --------------------------------------------------------

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    % convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels

      % Obtain the feature (patchDim x patchDim) needed during the convolution
      % ---- YOUR CODE HERE ----
      offset = (channel-1)*patchSize;
      feature = reshape(WT(featureNum,offset+1:offset+patchSize), patchDim, patchDim); % take out one block of weights

      % Flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum)); % take out one channel of one image

      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----
      convolvedoneChannel = conv2(im, feature, 'valid');
      % Sum the 3 channels directly: the 3 channels act like 3 feature maps,
      % similar to the input of the second and later layers of a CNN.
      convolvedImage = convolvedImage + convolvedoneChannel;
      % ------------------------

    end

    % Add the bias unit (b_mean already corrects for the mean subtraction)
    % Then, apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    convolvedImage = sigmoid(convolvedImage+b_mean(featureNum));
    % ------------------------

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

end

function sigm = sigmoid(x)
    sigm = 1./(1+exp(-x));
end
cnnPool.m:
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)
%

numImages = size(convolvedFeatures, 2);     % number of images
numFeatures = size(convolvedFeatures, 1);   % number of features
convolvedDim = size(convolvedFeatures, 3);  % number of rows of each convolved feature map
resultDim = floor(convolvedDim / poolDim);
pooledFeatures = zeros(numFeatures, numImages, resultDim, resultDim);

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim)
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region
%   (see http://ufldl/wiki/index.php/Pooling )
%
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------
for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        for poolRow = 1:resultDim
            offsetRow = 1+(poolRow-1)*poolDim;
            for poolCol = 1:resultDim
                offsetCol = 1+(poolCol-1)*poolDim;
                patch = convolvedFeatures(featureNum,imageNum,offsetRow:offsetRow+poolDim-1,...
                    offsetCol:offsetCol+poolDim-1); % take out one pooling region
                pooledFeatures(featureNum,imageNum,poolRow,poolCol) = mean(patch(:)); % mean pooling
            end
        end
    end
end

end
References:
Deep learning: 17 (Linear Decoders, Convolution and Pooling)
Exercise:Convolution and Pooling
Deep learning: 22 (linear decoder exercise)
http://blog.sina.com.cn/s/blog_50363a790100wyeq.html