Feature Normalization, Feature Mapping, and Regularization


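Feature Normalization (Feature Scaling)

Overview

When the features of a dataset have very different value ranges, gradient descent needs a very small learning rate and many iterations to converge to an optimum. Feature normalization therefore has two main benefits:

  • it speeds up the convergence of gradient descent;

  • it may improve accuracy.

Common types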

  • Min-max normalization

    Suitable for data that already lies within a bounded range.

    \[x_i^{(j)} = \frac{x_i^{(j)}-\min(x_i)}{\max(x_i)-\min(x_i)} \]

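  • Mean-variance normalization (standardization; in speech processing the same idea is known as cepstral mean and variance normalization, CMVN)

    Suitable for data whose distribution has no clear bounds.

    \[x_i^{(j)} = \frac{x_i^{(j)}-\mu_i}{\sigma_i} \]

    where \(x_i^{(j)}\) is the value of feature \(i\) in example \(j\), and

    \[\left\{ \begin{aligned} \mu_i &= \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)} \\ \sigma_i &= \sqrt{\frac{\sum_{j=1}^{m}(x_i^{(j)}-\mu_i)^2}{m}} \end{aligned} \right. \]

After applying mean-variance normalization, record \(\mu_i,\sigma_i\) so that later inputs can be scaled the same way. Note that std in Octave/MATLAB divides by \(m-1\) by default, whereas the formula above divides by \(m\); std(X, 1) matches the formula exactly. The implementation of mean-variance normalization is shown below.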

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X 
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the 
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma. 
%
%               Note that X is a matrix where each column is a 
%               feature and each row is an example. You need 
%               to perform the normalization separately for 
%               each feature. 
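mu = mean(X);      % 1 x n row vector: mean of each feature (column)
sigma = std(X);    % 1 x n row vector: standard deviation of each feature
% bsxfun applies the row vectors to every row of X, so this works for
% any number of features rather than only the first two columns.
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);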
% Hint: You might find the 'mean' and 'std' functions useful.
%       
% =========================================================

end

Usage

[X, mu, sigma] = featureNormalize(X);
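
The point about recording the statistics can be illustrated with a minimal sketch. It fits both scalers on a toy training set and then scales an unseen example with the stored values. The data values and the variable names (X_train, x_new, x_min, x_range) are made up for illustration and do not come from the original exercise.

```octave
% Toy training set (made-up values): columns are [area, bedrooms].
X_train = [2104 3; 1600 3; 2400 4; 1416 2];
x_new   = [1650 3];              % unseen example to scale later

% Mean-variance normalization: fit on the training set only.
mu    = mean(X_train);
sigma = std(X_train);            % default divides by m-1
X_std = bsxfun(@rdivide, bsxfun(@minus, X_train, mu), sigma);
x_new_std = (x_new - mu) ./ sigma;       % reuse the stored mu and sigma

% Min-max normalization: store the column minima and ranges instead.
x_min   = min(X_train);
x_range = max(X_train) - x_min;
x_range(x_range == 0) = 1;       % avoid dividing by zero for constant columns
X_mm = bsxfun(@rdivide, bsxfun(@minus, X_train, x_min), x_range);
x_new_mm = (x_new - x_min) ./ x_range;   % reuse the stored minima and ranges
```

Scaling x_new with its own mean or range would place it on a different scale from the training data, which is why the training statistics are kept.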

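Feature Mapping

Feature mapping builds more complex features for nonlinear regression. A loop expands the original input matrix into the terms of a polynomial expansion, which yields a fitted function that is more expressive than a plain linear one.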

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to polynomial features up to degree 6, as used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising
%   1, X1, X2, X1.^2, X1.*X2, X2.^2, X1.^3, etc.
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
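
To see the column order the loop produces, here is a small sketch that runs the same double loop with degree 2 instead of 6 on two made-up samples. The variable num_terms is introduced only for illustration.

```octave
% Two made-up samples with features X1 and X2.
X1 = [1; 2];
X2 = [3; 4];
degree = 2;                      % smaller than the 6 used above, for readability

out = ones(size(X1));            % bias column
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)) .* (X2.^j);
    end
end
% Column order: 1, X1, X2, X1.^2, X1.*X2, X2.^2

% Number of columns for a given degree, including the bias column.
num_terms = (degree + 1) * (degree + 2) / 2;
```

By the same count, degree 6 gives (7 * 8) / 2 = 28 columns, so two input features become 28 features.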

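Overfitting and Regularization

Any dataset may contain anomalous samples: they are real data, but they do not follow the pattern shared by most of the other samples. For example, in the area-versus-price problem, one house may be very small yet expensive, or very large yet cheap. As another example, suppose we want to judge whether a watermelon is good from the attributes "color, stem, texture, shape". Intuitively, shape has little effect on quality, but particular samples can make the fitted parameters depend too heavily on it.

If the model learns such anomalous samples too closely, the fitted function is forced to pass through or near them, producing a model with a small empirical (training) error but a large generalization error. This phenomenon is called overfitting. An overfitted model does not reflect the general pattern, and regularization is the usual technique for avoiding it.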
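
A toy illustration of this effect, using made-up data and the built-in polyfit and polyval functions: a degree-5 polynomial fitted to six points passes through the anomalous sample and gets a near-zero training error, yet it predicts an unseen point much worse than a straight line does.

```octave
% Toy data (made up): y = 2*x, except for one anomalous sample at x = 4.
x = (1:6)';
y = 2 * x;
y(4) = y(4) + 4;

p1 = polyfit(x, y, 1);   % simple model: straight line
p5 = polyfit(x, y, 5);   % flexible model: passes through all six points

% Empirical (training) error: mean squared error on the fitted samples.
train_err1 = mean((polyval(p1, x) - y) .^ 2);
train_err5 = mean((polyval(p5, x) - y) .^ 2);

% Generalization check on an unseen point that follows the true pattern.
x_test = 7;
y_test = 2 * x_test;
test_err1 = abs(polyval(p1, x_test) - y_test);
test_err5 = abs(polyval(p5, x_test) - y_test);
```

Here train_err5 is smaller than train_err1 while test_err5 is much larger than test_err1, which is the small-empirical-error, large-generalization-error pattern described above.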

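Regularized linear regression

For linear regression we introduce a penalty coefficient \(\lambda\) and the penalty term \(\lambda \sum_{j=1}^{n}\theta_j^2\). The sum starts at \(j=1\), so the intercept \(\theta_0\) is not penalized.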

  • Regularized cost function for gradient descent

    \[J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda\sum_{j=1}^n\theta_j^2\right] \]

  • Regularized normal equation

    \[\Theta = \text{pinv}\left(X^TX + \lambda \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}\right) X^T Y \]
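
Both forms can be sketched on a toy problem. The data, the value of lambda, and the variable names (L, err, reg) are assumptions made for illustration; the cost and normal equation follow the formulas above, and the gradient is obtained by differentiating that cost.

```octave
% Toy data (made up): y = 1 + 2*x exactly; first column of X is the bias.
X = [ones(4, 1), (1:4)'];
y = [3; 5; 7; 9];
lambda = 1;
m = length(y);
n = size(X, 2) - 1;              % number of features, excluding the bias

% Regularized normal equation: identity matrix with a zero in the top-left
% corner, so that theta_0 is not penalized.
L = eye(n + 1);
L(1, 1) = 0;
theta = pinv(X' * X + lambda * L) * X' * y;

% Regularized cost J(theta) at that solution.
err = X * theta - y;
J = (sum(err .^ 2) + lambda * sum(theta(2:end) .^ 2)) / (2 * m);

% Gradient of J, as used by gradient descent; theta_0 gets no penalty term.
reg = [0; theta(2:end)];
grad = (X' * err + lambda * reg) / m;
```

At the normal-equation solution the gradient is zero up to rounding, which shows that the two forms agree. With lambda = 1 the fitted slope theta(2) shrinks below the unregularized value of 2.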

Regularized classification

For classification problems, regularize by adding the term \(\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2\) to the cost function \(J(\theta)\).
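
Written out for logistic regression, this gives the cost and gradient below. This is a sketch of the commonly used form, assuming the sigmoid hypothesis \(h_\theta(x) = 1/(1+e^{-\theta^Tx})\) and an unpenalized intercept \(\theta_0\), consistent with the sum starting at \(j=1\).

\[J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2 \]

\[\left\{ \begin{aligned} \frac{\partial J}{\partial \theta_0} &= \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \frac{\partial J}{\partial \theta_j} &= \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1) \end{aligned} \right. \]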


Example code:

function [J, grad] = costFunction(theta, x, y, lambda)
%COSTFUNCTION Compute cost and gradient for logistic regression with regularization
%   [J, grad] = COSTFUNCTION(theta, x, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta


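h = sigmoid(x * theta);          % predicted probabilities, m x 1
theta_reg = theta;
theta_reg(1) = 0;                % the intercept theta_0 is not penalized (sum starts at j = 1)
J = -1/m * sum(y .* log(h) + (1 - y) .* log(1 - h)) + lambda/(2*m) * sum(theta_reg .^ 2);
grad = 1/m * x' * (h - y) + lambda/m * theta_reg;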

% =============================================================

end
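
The function above calls sigmoid, which is not defined in this article. The sketch below assumes the usual logistic definition and checks an analytic gradient against central finite differences. The cost follows the formula with the regularization sum starting at j = 1. The data, theta, and the names cost, eps_fd, num_grad, and max_diff are made up for illustration.

```octave
% Assumed helper: the standard logistic function.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Regularized logistic cost as an anonymous function (theta_0 not penalized).
cost = @(t, X, y, lambda) ...
    -mean(y .* log(sigmoid(X * t)) + (1 - y) .* log(1 - sigmoid(X * t))) ...
    + lambda / (2 * length(y)) * sum(t(2:end) .^ 2);

% Made-up data: first column of X is the bias term.
X = [1 0.5 1.2; 1 -1.0 0.3; 1 1.5 -0.7; 1 -0.2 -1.1];
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];
lambda = 1;
m = length(y);

% Analytic gradient of the cost above.
grad = X' * (sigmoid(X * theta) - y) / m + lambda / m * [0; theta(2:end)];

% Numerical gradient by central differences, one parameter at a time.
eps_fd = 1e-5;
num_grad = zeros(size(theta));
for k = 1:numel(theta)
    e = zeros(size(theta));
    e(k) = eps_fd;
    num_grad(k) = (cost(theta + e, X, y, lambda) - cost(theta - e, X, y, lambda)) / (2 * eps_fd);
end
max_diff = max(abs(grad - num_grad));
```

A small max_diff indicates that the gradient expression matches the cost. A check like this is a cheap way to catch mistakes such as penalizing theta_0 in one expression but not the other.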

