http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/lectures/lecture14.pdf
-
Loss Function
A loss function can be viewed as an error part (loss term) plus a regularization part (regularization term).
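As a hedged sketch of that decomposition (the symbols m, L, R, and \lambda below are illustrative, not taken from the slides), the objective typically has the form

J(w) = \frac{1}{m} \sum_{i=1}^{m} L\big(y_i, f(x_i; w)\big) + \lambda\, R(w)

where the first term measures the error on the training data and the second penalizes model complexity.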

1.1 Loss Term
- Gold Standard (ideal case)
- Hinge (SVM, soft margin)
- Log (logistic regression, cross entropy error)
- Squared loss (linear regression)
- Exponential loss (Boosting)
The Gold Standard loss is also called the 0-1 loss; it simply counts the number of misclassifications.
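The 0-1 loss can be sketched in a few lines of numpy (zero_one_loss and the use of raw scores with sign() are my own illustrative choices, not from the lecture):

import numpy as np

def zero_one_loss(y_true, scores):
    # Count misclassifications: labels are +/-1, scores are raw classifier outputs.
    predictions = np.sign(scores)
    return np.sum(predictions != y_true)

# Example: one of the three predictions has the wrong sign.
y = np.array([1, -1, 1])
s = np.array([0.7, 0.2, 1.5])
print(zero_one_loss(y, s))  # -> 1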

Hinge Loss http://en.wikipedia.org/wiki/Hinge_loss
For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as

\ell(y) = \max(0,\, 1 - t \cdot y)

Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. E.g., in linear SVMs, y = \mathbf{w} \cdot \mathbf{x} + b.

It can be seen that when t and y have the same sign (meaning y predicts the right class) and |y| \ge 1, the hinge loss \ell(y) = 0, but when they have opposite sign, \ell(y) increases linearly with y (one-sided error).
From <http://en.wikipedia.org/wiki/Hinge_loss>
[Figure omitted] Plot of hinge loss (blue) vs. zero-one loss (misclassification, green: y < 0) for t = 1 and variable y. Note that the hinge loss penalizes predictions y < 1, corresponding to the notion of a margin in a support vector machine.
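A minimal numpy sketch of the hinge loss exactly as defined above (hinge_loss is an illustrative helper, not a library function):

import numpy as np

def hinge_loss(t, y):
    # max(0, 1 - t*y) for labels t in {-1, +1} and raw classifier scores y.
    return np.maximum(0.0, 1.0 - t * y)

# A correct prediction with margin >= 1 gives zero loss; otherwise the loss grows linearly.
t = np.array([1, 1, -1])
y = np.array([1.5, 0.3, 0.4])
print(hinge_loss(t, y))  # -> [0.  0.7 1.4]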

In the paper "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM", the objective is

\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}
Here the first term is treated as the regularization part and the second term as the loss part; compare this with Ng's lecture notes on the SVM.

Without regularization:

\min_{\mathbf{w}} \; \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}

With regularization:

\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}
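A hedged sketch of the Pegasos-style stochastic sub-gradient step for this objective: the update w <- (1 - eta*lambda)*w + eta * 1[y<w,x> < 1] * y*x with step size eta = 1/(lambda*t) follows the paper, while the function name, toy data, and training loop are my own.

import numpy as np

def pegasos_step(w, x, y, lam, t):
    # One stochastic sub-gradient step on (lambda/2)*||w||^2 + max(0, 1 - y*<w, x>).
    eta = 1.0 / (lam * t)            # step size 1/(lambda * t)
    if y * np.dot(w, x) < 1.0:       # hinge term active: sub-gradient is lam*w - y*x
        return (1.0 - eta * lam) * w + eta * y * x
    return (1.0 - eta * lam) * w     # hinge term inactive: only the regularizer contributes

# Toy usage: a few passes over two linearly separable points.
w = np.zeros(2)
data = [(np.array([1.0, 2.0]), 1), (np.array([-1.0, -1.5]), -1)]
lam = 0.1
for t, (x, y) in enumerate(data * 20, start=1):
    w = pegasos_step(w, x, y, lam, t)
print(w)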

Log Loss
In Ng's lecture notes 1, linear regression comes first, leading to the least-squares error, which is then justified probabilistically via the Gaussian distribution. Logistic regression comes next, where MLE is used to derive the optimization objective: maximize the probability of the observed training data.

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

p(y \mid x; \theta) = (h_\theta(x))^{y} \, (1 - h_\theta(x))^{1 - y}

L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)

Maximize the following log-likelihood:

\ell(\theta) = \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
And this is exactly minimizing the cross entropy!
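To make that step explicit, with cross entropy H(p, q) = -\sum_k p_k \log q_k as defined on the Wikipedia page quoted below, minimizing the negative log-likelihood is the same as minimizing a sum of per-example cross entropies:

-\ell(\theta) = \sum_{i=1}^{m} \Big[ -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big] = \sum_{i=1}^{m} H\big(p^{(i)}, q^{(i)}\big),

where p^{(i)} = (y^{(i)}, 1 - y^{(i)}) is the empirical label distribution and q^{(i)} = (h_\theta(x^{(i)}), 1 - h_\theta(x^{(i)})) is the model's predicted distribution.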
http://en.wikipedia.org/wiki/Cross_entropy
http://www.cnblogs.com/rocketfan/p/3350450.html (information theory: the relation between cross entropy and KL divergence)


Cross entropy can be used to define a loss function in machine learning and optimization. The true probability p_i is the true label, and the given distribution q_i is the predicted value of the current model.

More specifically, let us consider logistic regression, which (in its most basic guise) deals with classifying a given set of data points into two possible classes generically labelled 0 and 1. The logistic regression model thus predicts an output y \in \{0, 1\}, given an input vector \mathbf{x}. The probability is modeled using the logistic function g(z) = 1/(1 + e^{-z}). Namely, the probability of finding the output y = 1 is given by

q_{y=1} = \hat{y} \equiv g(\mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}},

where the vector of weights \mathbf{w} is learned through some appropriate algorithm such as gradient descent. Similarly, the conjugate probability of finding the output y = 0 is simply given by

q_{y=0} = 1 - \hat{y}.

The true (observed) probabilities can be expressed similarly as p_{y=1} = y and p_{y=0} = 1 - y.

Having set up our notation, p \in \{y, 1 - y\} and q \in \{\hat{y}, 1 - \hat{y}\}, we can use cross entropy to get a measure for similarity between p and q:

H(p, q) = -\sum_k p_k \log q_k = -y \log \hat{y} - (1 - y) \log(1 - \hat{y}).

The typical loss function that one uses in logistic regression is computed by taking the average of all cross entropies in the sample. For example, suppose we have N samples, with each sample indexed by n = 1, \dots, N. The loss function is then given by

L(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N} \sum_{n=1}^{N} \Big[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \Big],

where \hat{y}_n \equiv g(\mathbf{w} \cdot \mathbf{x}_n) = 1/(1 + e^{-\mathbf{w} \cdot \mathbf{x}_n}), with g(z) the logistic function as before.

The logistic loss is sometimes called cross-entropy loss. It is also known as log loss (in this case, the binary label is often denoted by {-1, +1}).[1]
From <http://en.wikipedia.org/wiki/Cross_entropy>
So this agrees exactly with the conclusion Ng derives from the MLE point of view; the only difference is the overall minus sign.
In other words, the optimization objective of logistic regression is the cross entropy.
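A quick numerical check of that statement (a sketch: sigmoid, the toy data, and the weights below are my own illustrative choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features, labels in {0, 1}, arbitrary weights.
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]])
y = np.array([1, 0, 0, 1])
w = np.array([0.4, -0.2])

y_hat = sigmoid(X @ w)

# Average cross entropy (the logistic regression loss from the Wikipedia formula above).
cross_entropy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Log-likelihood from Ng's MLE derivation: it differs only by the sign and the 1/N factor.
log_likelihood = np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(cross_entropy, -log_likelihood / len(y))  # the two values are identical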

Correction to formula (14.8): the slides appear to contain a small typo; the first + should be a -, so that the loss objective is "smaller is better" while the MLE objective is "larger is better".
Squared Loss

L(y, f(x)) = (y - f(x))^2
Exponential Loss

L(y, f(x)) = e^{-y f(x)}

Exponential loss is usually used in boosting. It is always > 0, but it ensures that the closer the prediction is to the correct result, the smaller the loss, and the further away it is, the larger the loss.
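A small numpy sketch of this behaviour (labels in {-1, +1}; exp_loss is an illustrative helper, not a library function):

import numpy as np

def exp_loss(y, f):
    # Exponential loss exp(-y * f(x)) for labels y in {-1, +1} and real-valued scores f(x).
    return np.exp(-y * f)

# The loss is always positive, shrinks as the correctly-signed margin grows,
# and blows up as the prediction moves further in the wrong direction.
y = np.array([1, 1, 1, 1])
f = np.array([2.0, 0.5, -0.5, -2.0])
print(exp_loss(y, f))  # -> [0.135 0.607 1.649 7.389] (approximately)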

