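Preface:
This post is an exercise in applying softmax regression to multi-class classification. The theory was introduced in my earlier post Deep learning: 13 (Softmax Regression). The experiment follows the exercise page http://deeplearning.stanford.edu/wiki/index.php/Exercise:Softmax_Regression. The task is handwritten digit recognition on the MNIST database, which has 60,000 training samples and 10,000 test samples covering the ten digits 0 to 9. Each sample is a small 28x28 image.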
Environment: MATLAB R2012a
Background:
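This experiment uses only the softmax model: there is no hidden layer, just an input layer and an output layer, because no features are extracted from the MNIST samples and the raw pixel values are used directly. The main work is computing the cost function of the model and its partial derivatives, given by the following formulas: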
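As a reconstruction inferred from what softmaxCost.m below computes (notation assumed, following the usual UFLDL conventions: m = numCases samples, k = numClasses, N = inputSize, θ_j the j-th row of theta, 1{·} the indicator function, and no bias term), the class probabilities, the weight-decayed cost and its gradient can be written as:

```latex
\[
p\left(y^{(i)} = j \mid x^{(i)}; \theta\right)
  = \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}
\]
\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k}
  1\left\{y^{(i)} = j\right\} \log p\left(y^{(i)} = j \mid x^{(i)}; \theta\right)
  + \frac{\lambda}{2} \sum_{j=1}^{k} \sum_{n=1}^{N} \theta_{jn}^{2}
\]
\[
\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  x^{(i)} \left( 1\left\{y^{(i)} = j\right\} - p\left(y^{(i)} = j \mid x^{(i)}; \theta\right) \right)
  + \lambda \theta_j
\]
```

In matrix form this is exactly the `cost` and `thetagrad` lines of softmaxCost.m: groundTruth holds the indicators and p holds the probabilities.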
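Some MATLAB functions:
sparse:
Creates a sparse matrix. For example, in sparse(A, B, k), A and B are vectors of equal length and k is a scalar. Every nonzero entry of the result equals k, and the nonzeros sit at the coordinates (A(i), B(i)); values sharing the same coordinate are summed. By default the matrix is max(A) x max(B); the size can be given explicitly with sparse(A, B, k, m, n).
full:
Converts a matrix to ordinary (dense) storage; it is usually used to turn a sparse matrix back into a normal matrix.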
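To make the groundTruth construction concrete, here is a small stdlib-only Python sketch of what full(sparse(labels, 1:m, 1, numClasses, m)) produces. The function name one_hot_columns is hypothetical; labels are assumed to be 1-based as in the MATLAB code (digit 0 remapped to 10).

```python
def one_hot_columns(labels, num_classes):
    """Return a dense num_classes x m list of lists with a 1 at (labels[i], i).

    Mirrors MATLAB full(sparse(labels, 1:m, 1, numClasses, m)) for 1-based
    labels. sparse() sums values that share a (row, column) pair, hence the
    +=, although with column indices 1:m every pair here is unique.
    """
    m = len(labels)
    ground_truth = [[0] * m for _ in range(num_classes)]
    for i, label in enumerate(labels):
        ground_truth[label - 1][i] += 1
    return ground_truth


print(one_hot_columns([3, 1, 2], 3))  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each column is a one-hot vector for one sample. Giving the size explicitly keeps the matrix numClasses rows tall even when some class never appears in labels.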
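Problem encountered:
With the author's starter code, the data would not even load; MATLAB reported: Error using permute Out of memory. Type HELP MEMORY for your options. The error traced to the line images = permute(images,[2 1 3]) in loadMNISTImages.m: the images array was too large to be permuted within the available memory. The data file itself is only a few dozen megabytes, but fread returns doubles by default, so the 60000 x 28 x 28 training array takes about 376 MB, and permute needs a second copy of it. None of the usual out-of-memory remedies I tried helped. In the end I changed the line just before it, images = reshape(images, numCols, numRows, numImages);, to images = reshape(images, numRows, numCols, numImages);, so that the permute step can be dropped. Because MNIST images are square (28 x 28), the array shape is unchanged and each image is simply stored transposed. The training and test images are transposed in the same way, and softmax regression on raw pixels is unaffected by a fixed reordering of the pixels (the learned weights are reordered the same way), so the result is the same. Since the root cause is memory, the alternatives are to use 64-bit MATLAB or to optimise the loading function yourself so it uses less memory while running.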
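The claim that the modified reshape only transposes each image can be checked with a small stdlib-only Python sketch. It emulates MATLAB's column-major reshape on one hypothetical 3x3 "image" stored row by row (as IDX files store pixels); the helper names are illustrative, not part of the starter code. It also reproduces the memory estimate for the training array held as doubles.

```python
def matlab_reshape_2d(flat, rows, cols):
    """Emulate MATLAB reshape(flat, rows, cols), which fills column by column."""
    assert len(flat) == rows * cols
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Hypothetical square image; the file stores its pixels row by row.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
num_rows, num_cols = 3, 3
flat = [p for row in image for p in row]

# Original loader: reshape to numCols x numRows, then permute([2 1 3]) = transpose.
original = transpose(matlab_reshape_2d(flat, num_cols, num_rows))
# Modified loader: reshape to numRows x numCols and skip the permute.
modified = matlab_reshape_2d(flat, num_rows, num_cols)

print(original == image)             # True: the original loader recovers the image
print(modified == transpose(image))  # True: the modified loader stores it transposed

# 60000 images of 28 x 28 pixels, 8 bytes per double after fread.
train_bytes = 60000 * 28 * 28 * 8
print(train_bytes)  # 376320000
```

For square images this is a pure transpose; for non-square images the modified reshape would instead scramble the pixels, which still leaves a linear classifier unaffected but would make the images wrong for display.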
Results:
Accuracy: 92.640%
This is very close to the 92.200% reported in the tutorial.
Main code of the experiment:
softmaxExercise.m:
%% CS294A/CS294W Softmax Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  softmax exercise. You will need to write the softmax cost function
%  in softmaxCost.m and the softmax prediction function in softmaxPredict.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28; % Size of input vector (MNIST images are 28x28)
numClasses = 10;     % Number of classes (MNIST images fall into 10 classes)

lambda = 1e-4; % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels,
%  the input data is the images, and
%  the output data is the labels.
%

% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte

images = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% For debugging purposes, you may wish to reduce the size of the input data
% in order to speed up gradient checking.
% Here, we create synthetic dataset using random data for testing

% DEBUG = true; % Set DEBUG to true when debugging.
DEBUG = false;
if DEBUG
    inputSize = 8;
    inputData = randn(8, 100);
    labels = randi(10, 100, 1);
end

% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1); % theta is passed in as a column vector

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
%

if DEBUG
    numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
                                    inputSize, lambda, inputData, labels), theta);

    % Use this to visually compare the gradients side by side
    disp([numGrad grad]);

    % Compare numerically computed gradients with those computed analytically
    diff = norm(numGrad-grad)/norm(numGrad+grad);
    disp(diff);
    % The difference should be small.
    % In our implementation, these values are usually less than 1e-7.

    % When your gradients are correct, congratulations!
end

%%======================================================================
%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter = 100;
% softmaxModel is just a struct holding the learned optimal parameters
% together with the input size and the number of classes
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);

% Although we only use 100 iterations here to train a classifier for the
% MNIST data set, in practice, training for more iterations is usually
% beneficial.

%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.

images = loadMNISTImages('t10k-images.idx3-ubyte');
labels = loadMNISTLabels('t10k-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;
size(softmaxModel.optTheta)
size(inputData)

% You will have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% After 100 iterations, the results for our implementation were:
%
% Accuracy: 92.200%
%
% If your values are too low (accuracy less than 0.91), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
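STEP 3 calls computeNumericalGradient, which is not listed in this post. The following stdlib-only Python sketch shows the idea under the assumption that it uses centered differences with a step of 1e-4; the function names are hypothetical. It checks a toy function with a known gradient using the same norm(numGrad-grad)/norm(numGrad+grad) measure as the exercise.

```python
import math


def numerical_gradient(f, theta, eps=1e-4):
    """Centered-difference estimate of the gradient of scalar f at theta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def relative_difference(a, b):
    """norm(a - b) / norm(a + b), the comparison used in STEP 3."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(a, b)))
    return diff / total


# Hypothetical toy objective f(x) = sum(x_i^3) with analytic gradient 3 * x_i^2.
def cube_sum(x):
    return sum(v ** 3 for v in x)


theta = [0.5, -1.2, 2.0]
analytic = [3 * v ** 2 for v in theta]
numeric = numerical_gradient(cube_sum, theta)
print(relative_difference(numeric, analytic))
```

Each evaluation perturbs one parameter at a time, so the check costs two function calls per parameter. That is why the exercise shrinks the problem to inputSize = 8 and 100 random samples when DEBUG is on.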
softmaxCost.m:
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single sample
% labels - an M x 1 matrix containing the labels corresponding for the input data
%

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize); % turn the parameter column vector into a matrix

numCases = size(data, 2); % number of input samples

% sparse builds a sparse matrix whose nonzero values all equal the third
% argument, 1; the nonzero in column i sits in row labels(i). Passing the
% size explicitly keeps groundTruth numClasses x numCases even if some
% class does not occur in labels.
groundTruth = full(sparse(labels, 1:numCases, 1, numClasses, numCases));
cost = 0;

thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

scores = theta * data;                           % numClasses x numCases
M = bsxfun(@minus, scores, max(scores, [], 1));  % subtract each column's max for numerical stability
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));                 % class probabilities; each column sums to 1
cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2);
thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
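For readers who want to trace the vectorised MATLAB loop by loop, here is a stdlib-only Python sketch of the same cost and gradient. It assumes theta is flattened in MATLAB's column-major theta(:) order and labels are 1-based; the function name is hypothetical. One deliberate difference: it computes log p(y | x) as a log-sum-exp rather than taking log of p, which avoids log(0) if a probability underflows.

```python
import math


def softmax_cost(theta, num_classes, input_size, lam, data, labels):
    """Weight-decayed softmax cost and gradient, mirroring softmaxCost.m.

    theta  : flat list of num_classes * input_size weights in MATLAB's
             column-major theta(:) order, so W[j][n] = theta[n * num_classes + j]
    data   : list of m samples, each a list of input_size pixel values
    labels : list of m labels in 1..num_classes
    Returns (cost, grad), with grad flattened in the same order as theta.
    """
    m = len(data)
    W = [[theta[n * num_classes + j] for n in range(input_size)]
         for j in range(num_classes)]
    cost = 0.0
    grad_W = [[0.0] * input_size for _ in range(num_classes)]
    for x, y in zip(data, labels):
        scores = [sum(w * v for w, v in zip(row, x)) for row in W]
        top = max(scores)  # same max subtraction as the bsxfun(@minus, ...) line
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # log p(y | x) = (score_y - top) - log(sum of shifted exps)
        cost -= (scores[y - 1] - top) - math.log(total)
        for j in range(num_classes):
            indicator = 1.0 if j == y - 1 else 0.0
            coeff = indicator - exps[j] / total  # 1{y = j} - p(y = j | x)
            for n in range(input_size):
                grad_W[j][n] -= coeff * x[n]
    decay = sum(w * w for row in W for w in row)
    cost = cost / m + lam / 2 * decay
    grad = [0.0] * (num_classes * input_size)
    for j in range(num_classes):
        for n in range(input_size):
            grad[n * num_classes + j] = grad_W[j][n] / m + lam * W[j][n]
    return cost, grad


# With all-zero weights every class has probability 1/3, so the cost is log(3).
cost, grad = softmax_cost([0.0] * 6, 3, 2, 0.0, [[1.0, 2.0]], [2])
print(cost, math.log(3))
```

The explicit loops are far slower than the matrix version and are meant only to make each term of the formula visible.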
softmaxTrain.m:
function [softmaxModel] = softmaxTrain(inputSize, numClasses, lambda, inputData, labels, options)
%softmaxTrain Train a softmax model with the given parameters on the given
% data. Returns softmaxModel, a struct whose field optTheta holds the
% trained parameters for the model.
%
% inputSize: the size of an input vector x^(i)
% numClasses: the number of classes
% lambda: weight decay parameter
% inputData: an N by M matrix containing the input data, such that
%            inputData(:, c) is the cth input
% labels: M by 1 matrix containing the class labels for the
%            corresponding inputs. labels(c) is the class label for
%            the cth input
% options (optional): options
%   options.maxIter: number of iterations to train for

if ~exist('options', 'var')
    options = struct;
end

if ~isfield(options, 'maxIter')
    options.maxIter = 400;
end

% initialize parameters
theta = 0.005 * randn(numClasses * inputSize, 1);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % softmaxCost.m satisfies this.
options.display = 'on';   % set on options itself; a separate minFuncOptions struct is never passed to minFunc

[softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
                                   numClasses, inputSize, lambda, ...
                                   inputData, labels), ...
                                   theta, options);

% Fold softmaxOptTheta into a nicer format
softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
softmaxModel.inputSize = inputSize;
softmaxModel.numClasses = numClasses;

end
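softmaxTrain.m hands minFunc a function handle that returns both the cost and the gradient. As a sketch of that contract only, the Python below swaps L-BFGS for plain fixed-step gradient descent (a different and usually slower optimiser, not what minFunc does) and runs it on a hypothetical quadratic objective. The function names and the learning rate are assumptions for illustration.

```python
def gradient_descent(cost_and_grad, theta0, learning_rate=0.5, max_iter=400):
    """Minimise a function that returns (cost, grad), the same contract minFunc uses.

    Fixed-step gradient descent as a stand-in for L-BFGS; max_iter defaults
    to 400 to match the default options.maxIter in softmaxTrain.m.
    """
    theta = list(theta0)
    for _ in range(max_iter):
        _, grad = cost_and_grad(theta)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    cost, _ = cost_and_grad(theta)  # cost at the final parameters
    return theta, cost


def quadratic(theta):
    """Hypothetical test objective with its minimum at (3, -1)."""
    a, b = theta
    return (a - 3) ** 2 + (b + 1) ** 2, [2 * (a - 3), 2 * (b + 1)]


theta, cost = gradient_descent(quadratic, [0.0, 0.0], learning_rate=0.25, max_iter=100)
print(theta, cost)
```

Passing a closure such as `lambda p: f(p, extra_args...)` plays the same role as `@(p) softmaxCost(p, numClasses, inputSize, lambda, inputData, labels)` in the MATLAB code: it fixes every argument except the parameter vector.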
softmaxPredict.m:
function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test sample
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y = c | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

% exp and normalisation preserve order, so the class with the largest
% score theta*x also has the largest softmax probability
[~, pred] = max(theta * data);

% ---------------------------------------------------------------------

end
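The prediction step and the accuracy line of softmaxExercise.m can be sketched in stdlib-only Python as follows. The weight matrix and samples are hypothetical toy values; labels are 1-based as in the MATLAB code, and ties go to the first maximum, as with MATLAB's max.

```python
def softmax_predict(W, data):
    """Return the 1-based argmax_j of (W x)_j for each sample x; rows of W are classes.

    Softmax's exp-and-normalise step is monotonic, so the largest linear
    score also has the largest probability; no probabilities are needed.
    """
    preds = []
    for x in data:
        scores = [sum(w * v for w, v in zip(row, x)) for row in W]
        preds.append(max(range(len(scores)), key=scores.__getitem__) + 1)
    return preds


def accuracy(labels, preds):
    """Fraction of matching labels, like mean(labels(:) == pred(:))."""
    return sum(1 for a, b in zip(labels, preds) if a == b) / len(labels)


W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
data = [[2.0, 1.0], [0.0, 3.0], [-2.0, -2.0]]
preds = softmax_predict(W, data)
print(preds)                       # [1, 2, 3]
print(accuracy([1, 2, 1], preds))  # 0.6666666666666666
```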
References:
Deep learning: 13 (Softmax Regression)
http://deeplearning.stanford.edu/wiki/index.php/Exercise:Softmax_Regression