[Original] Coursera - Andrew Ng Machine Learning - Programming Exercise 1: Linear Regression

Reposted from the original article. 2018-10-30 09:38 · Tags: Andrew Ng / Machine Learning / Octave / Coursera

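Assignment Overview

Exercise 1 (Week 2): implement linear regression in Octave. Datasets: ex1data1.txt and ex1data2.txt.

Linear regression with one variable is required: implement the cost computation (Computing Cost) and Gradient Descent.

Linear regression with multiple variables is optional: implement Feature Normalization, Computing Cost, Gradient Descent, and the Normal Equations.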

File List

ex1.m
ex1_multi.m
ex1data1.txt - dataset used by ex1.m
ex1data2.txt - dataset used by ex1_multi.m
submit.m - submits your code for grading
[*] warmUpExercise.m
[*] plotData.m
[*] computeCost.m
[*] gradientDescent.m
[+] computeCostMulti.m
[+] gradientDescentMulti.m
[+] featureNormalize.m
[+] normalEqn.m

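* marks files you must complete
+ marks optional files

Background: suppose we own a restaurant chain that already has outlets in many cities (these form the training set). We now want to open a new outlet and need to use the existing data to predict its profit.

ex1data1.txt provides the training set. The first column is a city's population (in units of 10,000) and the second column is the corresponding profit (in units of $10,000). Negative values indicate a loss.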

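Conclusions

When the number of features is small, the normal equation needs no feature normalization and gives an exact, stable result. Gradient descent is iterative, so if the learning rate or the number of iterations is unsuitable, it stops short of the minimum and its result differs from the normal equation's. The linear regression cost function is convex, so gradient descent cannot get trapped in a local optimum; any difference comes from incomplete convergence.

Watch the matrix dimensions in the vectorized computation. When summing over the training examples, the first factor must be transposed (X' * errors).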

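Required: Linear Regression with One Variable

1. warmUp

The entry point for the one-variable part is ex1.m.

warmUpExercise.m:

A = eye(5); % return the 5 x 5 identity matrix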

2. Plotting the Data

My plotData.m:

function plotData(x, y)
figure; % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10); % red crosses, one per training example
xlabel('Population of City in 10,000s');
ylabel('Profit in $10,000s');
title('POPULATION AND PROFIT');
end

The call in ex1.m:

%% ======================= Part 2: Plotting =======================
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples
plotData(X, y);

Running it produces a scatter plot of profit against city population.

3. Cost Function

My computeCost.m:

function J = computeCost(X, y, theta)
m = length(y); % number of training examples
J = 0;
predictions = X * theta; % predictions of hypothesis on all m examples
sqrErrors = (predictions - y) .^ 2; % squared errors; .^ squares each element
J = 1 / (2 * m) * sum(sqrErrors);
end
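The cost computed above is $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ with $h_\theta(x) = \theta^T x$. The same quantity can also be written as an inner product, because $e^Te$ is the sum of squared entries of the error vector $e = X\theta - y$. Below is a minimal sketch of that form. The three-point dataset and the name `costVec` are hypothetical and chosen only for illustration.

```octave
% Hypothetical toy data: three examples lying exactly on y = x
X = [1 1; 1 2; 1 3];   % first column is the intercept term x0 = 1
y = [1; 2; 3];

% Vectorized cost: e' * e is the sum of squared errors, so no .^ or sum is needed
costVec = @(X, y, theta) ((X * theta - y)' * (X * theta - y)) / (2 * length(y));

J0 = costVec(X, y, [0; 0]);   % errors are -1, -2, -3, so J = (1 + 4 + 9) / 6
J1 = costVec(X, y, [0; 1]);   % perfect fit, so J = 0
fprintf('J([0;0]) = %.4f, J([0;1]) = %.4f\n', J0, J1);
```

Because `X * theta - y` is an m x 1 column, `e' * e` is a 1 x 1 result. Both this form and the `.^`/`sum` form in computeCost.m give the same value.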

4. Gradient Descent

My gradientDescent.m:

Matrix property used in the vectorized update: $(AB)^T = B^TA^T$
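The derivation below shows where the transpose comes from, for the cost computed by computeCost.m (error vector $X\theta - y$, learning rate $\alpha$). The sum over examples is a row vector times $X$. Transposing it with $(AB)^T = B^TA^T$ gives the column vector used in the code.

```latex
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
  &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \\
\nabla_\theta J
  &= \frac{1}{m}\left((X\theta - y)^T X\right)^T = \frac{1}{m}X^T(X\theta - y) \\
\theta &:= \theta - \frac{\alpha}{m}X^T(X\theta - y)
\end{aligned}
```

The last line is the same one-line update given in the comment inside gradientDescent.m.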

function [theta, J_history, theta_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
theta_history = zeros(length(theta), num_iters); % [modified] one column per iteration stores theta after that step

for iter = 1:num_iters

    % Split into small steps for readability; they combine into one line:
    % theta = theta - (alpha / m) * X' * (X * theta - y)
    % prediction h(x): m x 2 matrix * 2 x 1 vector = m x 1 column vector
    predictions = X * theta;
    % error h(x) - y: m x 1 column vector
    errors = predictions - y;
    % summed gradient term: 2 x m matrix (X') * m x 1 vector = 2 x 1 column vector
    slope = X' * errors;
    % theta: 2 x 1 column vector
    theta = theta - (alpha / m) * slope;

    J_history(iter) = computeCost(X, y, theta);  % save the cost J in every iteration
    theta_history(:, iter) = theta;  % store theta in column iter
end

end
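One way to sanity-check the update rule is to run it on data whose answer is known. The sketch below uses a hypothetical five-point dataset lying exactly on $y = 2 + 3x$, with an assumed learning rate of 0.05. It repeats the vectorized update from gradientDescent.m inline instead of calling the function, so it runs on its own. It also records the cost after each step, as J_history does.

```octave
% Hypothetical toy data: y = 2 + 3x exactly, so the optimum is theta = [2; 3]
x = (1:5)';
y = 2 + 3 * x;
X = [ones(5, 1), x];          % add the intercept column
m = length(y);

alpha = 0.05;                 % assumed learning rate, small enough for this data
num_iters = 5000;
theta = zeros(2, 1);
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  theta = theta - (alpha / m) * X' * (X * theta - y);     % one vectorized update
  J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);  % cost after the update
end

disp(theta')   % should be very close to [2 3]
```

If alpha is too large, J_history increases instead of decreasing, which is why plotting it is a useful check.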

5. Plotting the Fitted Line

The call in ex1.m:

%% =================== Part 3: Cost and Gradient descent ===================

% Set up X and theta
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters

% Set the number of iterations and the learning rate
iterations = 2000;
alpha = 0.01;

% compute and display initial cost: cost at theta = [0; 0]
J = computeCost(X, y, theta);

% further testing of the cost function: cost at theta = [-1; 2]
J = computeCost(X, y, [-1 ; 2]);

fprintf('\nRunning Gradient Descent ...\n')
% [Modification 1] Request all outputs: J_history stores the cost at every iteration,
% theta_history stores theta0 and theta1 at every iteration
% Original: theta = gradientDescent(X, y, theta, alpha, iterations);
[theta, J_history, theta_history] = gradientDescent(X, y, theta, alpha, iterations);

% Plot the linear fit on top of the data plot
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

Running it draws the fitted line over the training data.

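6. Plotting the Cost and Theta over the Iterations

This is a feature I added myself. Adding the following code to ex1.m plots how the cost and theta change during the iterations.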

% -------------- [Modification 2] plot cost and theta curves: start --------------
fprintf('Size of J_history saved by gradient descent:\n');
fprintf('%d\n', size(J_history));
iterX = (1:iterations)'; % x-axis values: iteration number

figure; % new window, so the linear-fit plot above is not overwritten
% Left panel: cost at each iteration
subplot(1,2,1);
plot(iterX, J_history, '-', 'linewidth', 3); % cost curve
title('cost of each step');
xlabel('iteration'); ylabel('value of cost');
legend('value of cost');

% Right panel: theta at each iteration
theta0_history = theta_history(1,:);
theta1_history = theta_history(2,:);
subplot(1,2,2);
plot(iterX, theta0_history, '-', 'linewidth', 2);
hold on;
plot(iterX, theta1_history, '-', 'linewidth', 2, 'color', 'r');
title('theta of each step'); xlabel('iteration'); ylabel('value of theta'); legend('theta0', 'theta1');
% -------------- [Modification 2] plot cost and theta curves: end --------------

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);

Running the modified ex1.m opens the two-panel figure and prints the two profit predictions.
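The two predictions in ex1.m can also be computed with one matrix product. Stack the inputs as rows, with a leading column of ones, the same way X is built. In this sketch `theta = [1; 2]` is a hypothetical value chosen so the arithmetic is easy to follow. It is not the result of the exercise.

```octave
% Hypothetical fitted parameters, for illustration only (not the values ex1.m produces)
theta = [1; 2];

populations = [3.5; 7];                                            % populations in units of 10,000
predictions = [ones(numel(populations), 1), populations] * theta;  % h(x) for every row at once
profits = predictions * 10000;                                     % convert back to dollars

fprintf('population %g -> profit %.2f\n', [populations * 10000, profits]');
```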

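7. Surface and Contour Plots of the Cost Function

The call in ex1.m:

1. Initialize theta0 as 100 evenly spaced points on [-10, 10], theta1 as 100 evenly spaced points on [-1, 4], and J_vals as a 100 × 100 array.

This uses the linspace(x1, x2, N) command, available in both MATLAB and Octave. It generates a row vector of N linearly spaced points from x1 to x2, endpoints included. x1, x2, and N are the start value, end value, and number of elements; N defaults to 100.

2. Loop over every (theta0, theta1) pair and store the corresponding cost in J_vals.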

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
      t = [theta0_vals(i); theta1_vals(j)];
      J_vals(i,j) = computeCost(X, y, t);
    end
end


% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);

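3. Draw the 3D surface with theta0 as the x coordinate, theta1 as the y coordinate, and J_vals as the z coordinate.

4. Draw the contour plot of J_vals with theta0 on the x axis and theta1 on the y axis.

5. On the contour plot, mark the (theta0, theta1) found by gradient descent above, which minimizes the cost. It lies at the center of the contours.

These plots show where $J(\theta)$ reaches its minimum; each gradient descent step moves closer to this point.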
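The double loop fills J_vals so that J_vals(i, j) = J(theta0_vals(i), theta1_vals(j)). However, surf and contour expect rows to follow the y axis (theta1) and columns the x axis (theta0), which is why J_vals is transposed. The sketch below builds the same surface with meshgrid and checks that it equals J_vals'. It uses hypothetical four-point data and smaller linspace grids of 50 and 40 points, so a wrong orientation would show up as a size mismatch. Looping over the m examples instead of the grid points is just one possible vectorization.

```octave
% Hypothetical toy data: intercept column plus one feature, as in ex1
x = [1; 2; 3; 4];
y = [3; 5; 7; 9];
m = length(y);
X = [ones(m, 1), x];
theta0_vals = linspace(-10, 10, 50);
theta1_vals = linspace(-1, 4, 40);

% Loop version from ex1.m: J_vals(i, j) is the cost at (theta0_vals(i), theta1_vals(j))
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    t = [theta0_vals(i); theta1_vals(j)];
    J_vals(i, j) = sum((X * t - y) .^ 2) / (2 * m);
  end
end

% Grid version: meshgrid puts theta1 along the rows and theta0 along the columns
[T0, T1] = meshgrid(theta0_vals, theta1_vals);
J_grid = zeros(size(T0));
for k = 1:m
  J_grid = J_grid + (T0 + T1 * x(k) - y(k)) .^ 2;   % add example k's squared error at every grid point
end
J_grid = J_grid / (2 * m);

% J_grid already has the orientation surf/contour expect, i.e. it equals J_vals'
fprintf('max difference: %g\n', max(max(abs(J_grid - J_vals'))));
```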

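Rotating the 3D surface to view it from directly above shows why this is called a contour plot. The contours are lines of equal cost on the surface, seen from the top.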

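Optional: Linear Regression with Multiple Variables

1. Feature Normalization

Feature scaling brings the features into comparable, smaller ranges, which makes gradient descent converge faster:

- Compute the mean of each feature
- Compute the standard deviation of each feature
- Scale each feature (feature scaling): subtract its mean and divide by its standard deviation

* The standard deviation is used here; the range (max - min) can be used instead.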
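Written out, the scaling steps above replace every value of feature $j$ with its standardized version. Here $\mu_j$ and $\sigma_j$ are that feature's mean and standard deviation over the training set. With the range variant, $\sigma_j$ is replaced by $\max_j - \min_j$.

```latex
x_j^{(i)} := \frac{x_j^{(i)} - \mu_j}{\sigma_j}, \qquad i = 1, \dots, m
```

In ex1_multi.m the intercept column of ones is added after this step, so it is not normalized.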

featureNormalize.m:

function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
mu = zeros(1, size(X, 2));    % 1 row, same number of columns as X
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
mu = mean(X);                    % mean of each column (feature)
sigma = std(X);                  % standard deviation of each column
X_norm = (X_norm - mu) ./ sigma; % broadcasting: each row minus mu, divided by sigma

end
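This is a quick check of the normalization, using the first five training examples shown in the test output later in this post. Because only five rows are used, mu and sigma differ from the full dataset's. `(X - mu) ./ sigma` relies on automatic broadcasting, which Octave and MATLAB R2016b or later support. The sketch also shows the equivalent `bsxfun` form for older MATLAB versions. After scaling, every column should have a mean of about 0 and a standard deviation of about 1.

```octave
% First five examples of ex1data2.txt (size in sq-ft, number of bedrooms)
X = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];

mu = mean(X);       % 1 x 2 row vector of column means
sigma = std(X);     % 1 x 2 row vector of column standard deviations

X_norm = (X - mu) ./ sigma;                               % broadcasting form
X_old = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);   % form for MATLAB before R2016b

disp(mean(X_norm))  % close to 0 0
disp(std(X_norm))   % close to 1 1
```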

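2. Cost Function and Gradient Descent

The one-variable implementation is already vectorized, so it applies unchanged to multiple variables and nothing needs to be rewritten.

computeCostMulti.m is identical to computeCost.m, and gradientDescentMulti.m is identical to gradientDescent.m. This holds as long as theta_history is sized with length(theta) rather than a hard-coded 2, since theta now has 3 elements.

The call in ex1_multi.m: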

%% ================ Part 2: Gradient Descent ================
alpha = 1.2;
num_iters = 400;

% Init Theta and Run Gradient Descent
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Estimate the price of a 1650 sq-ft, 3 br house
% Note: the input must be normalized with the same mu and sigma before it goes into the hypothesis
predict_x = [1650, 3];
predict_x = (predict_x - mu) ./ sigma;
price = [1, predict_x] * theta;

3. Normal Equation

Formula: $\theta = (X^TX)^{-1}X^Ty$
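The normal equation follows from setting the vectorized gradient of the cost, $\frac{1}{m}X^T(X\theta - y)$, to zero. This derivation assumes $X^TX$ is invertible.

```latex
\begin{aligned}
\frac{1}{m}X^T(X\theta - y) &= 0 \\
X^TX\,\theta &= X^Ty \\
\theta &= (X^TX)^{-1}X^Ty
\end{aligned}
```

$X^TX$ can be singular, for example with redundant features or more features than examples. In that case pinv still returns the minimum-norm least-squares solution, which is one reason to prefer pinv over inv here.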

normalEqn.m:

function [theta] = normalEqn(X, y)
theta = zeros(size(X, 2), 1);
theta = pinv(X' * X) * X' * y; % normal equation; semicolon added so theta is not printed
end
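As a cross-check, Octave's backslash operator solves the same least-squares problem directly from X, without forming X' * X. This is a swapped-in technique, not part of the exercise. The sketch uses a hypothetical dataset lying exactly on $y = 1 + 2x_1 + 3x_2$, so both methods should recover $\theta = [1; 2; 3]$.

```octave
% Hypothetical toy data: y = 1 + 2*x1 + 3*x2 exactly
Xraw = [1 2; 2 1; 3 4; 4 3; 5 5];
y = 1 + 2 * Xraw(:, 1) + 3 * Xraw(:, 2);
X = [ones(size(Xraw, 1), 1), Xraw];   % add the intercept column

theta_pinv = pinv(X' * X) * X' * y;   % normal equation, as in normalEqn.m
theta_bs = X \ y;                     % least-squares solve without forming X' * X

disp([theta_pinv, theta_bs])          % both columns should be close to [1; 2; 3]
```

Forming X' * X squares the condition number of X. For poorly scaled features, `X \ y` is therefore generally the more accurate of the two.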

The call in ex1_multi.m:

%% ================ Part 3: Normal Equations ================
%  predict the price of a 1650 sq-ft, 3 br house.
data = csvread('ex1data2.txt');   % reload the data (unnormalized)
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Add intercept term to X
X = [ones(m, 1) X];

% Calculate the parameters from the normal equation
theta = normalEqn(X, y);      % solve with the normal equation

% ====================== YOUR CODE HERE ======================
price = [1, 1650, 3] * theta; % predicted price (raw features, no normalization needed)
% ============================================================

4. Testing

Output:

Loading data ...
First 10 examples from the dataset:
 x = [2104 3], y = 399900
 x = [1600 3], y = 329900
 x = [2400 3], y = 369000
 x = [1416 2], y = 232000
 x = [3000 4], y = 539900
 x = [1985 4], y = 299900
 x = [1534 3], y = 314900
 x = [1427 3], y = 198999
 x = [1380 3], y = 212000
 x = [1494 3], y = 242500
Program paused. Press enter to continue.
Normalizing Features ...
   1.00000000   0.13000987  -0.22367519
   1.00000000  -0.50418984  -0.22367519
   1.00000000   0.50247636  -0.22367519
   1.00000000  -0.73572306  -1.53776691
   1.00000000   1.25747602   1.09041654
   1.00000000  -0.01973173   1.09041654
   1.00000000  -0.58723980  -0.22367519
   1.00000000  -0.72188140  -0.22367519
   1.00000000  -0.78102304  -0.22367519
   1.00000000  -0.63757311  -0.22367519
   1.00000000  -0.07635670   1.09041654
   1.00000000  -0.00085674  -0.22367519
   1.00000000  -0.13927334  -0.22367519
   1.00000000   3.11729182   2.40450826
   1.00000000  -0.92195631  -0.22367519
   1.00000000   0.37664309   1.09041654
...
Running gradient descent ...
Theta computed from gradient descent:
 334302.063993
 100087.116006
 3673.548451

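(1) With α = 0.05, the predicted selling price of a 1650 sq-ft, 3-bedroom house differs between gradient descent and the normal equation. The cost function is convex, so both methods aim at the same minimum. The gap therefore indicates that gradient descent had not fully converged with these settings: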

Predicted price of a 1650 sq-ft, 3 br house (using gradient descent):
 $289314.620338
...
Predicted price of a 1650 sq-ft, 3 br house (using normal equations):
 $293081.464335
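The sketch below shows that the gap closes once gradient descent has converged. It runs both methods on a hypothetical, exactly linear dataset ($y = 1 + 2x_1 + 3x_2$), with an assumed learning rate of 0.1 and 3000 iterations. As in ex1_multi.m, gradient descent works on normalized features and the normal equation on raw features. The new example is normalized with the same mu and sigma before prediction.

```octave
% Hypothetical toy data: y = 1 + 2*x1 + 3*x2 exactly
Xraw = [1 2; 2 1; 3 4; 4 3; 5 5];
y = 1 + 2 * Xraw(:, 1) + 3 * Xraw(:, 2);
m = length(y);

% Normal equation on the raw (unnormalized) features
Xr = [ones(m, 1), Xraw];
theta_ne = pinv(Xr' * Xr) * Xr' * y;

% Gradient descent on normalized features
mu = mean(Xraw);
sigma = std(Xraw);
Xn = [ones(m, 1), (Xraw - mu) ./ sigma];
theta_gd = zeros(3, 1);
alpha = 0.1;                       % assumed learning rate for this toy data
for iter = 1:3000
  theta_gd = theta_gd - (alpha / m) * Xn' * (Xn * theta_gd - y);
end

% Predict for a new example: normalize it with the SAME mu and sigma first
x_new = [6 4];
price_ne = [1, x_new] * theta_ne;
price_gd = [1, (x_new - mu) ./ sigma] * theta_gd;
fprintf('normal equation: %.6f  gradient descent: %.6f\n', price_ne, price_gd);
```

With fewer iterations or a smaller alpha, price_gd lags behind price_ne, which is the same pattern seen in the output above.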