http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/lectures/lecture14.pdf
-
Loss Function
A loss function can be viewed as an error part (loss term) plus a regularization part (regularization term).
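As a hedged sketch of that decomposition (the symbols m, L, R, and \lambda below are illustrative, not taken from the slides), the objective typically has the form

J(w) = \frac{1}{m} \sum_{i=1}^{m} L\big(y_i, f(x_i; w)\big) + \lambda\, R(w)

where the first term measures the error on the training data and the second penalizes model complexity.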

1.1 Loss Term
- Gold Standard (ideal case)
- Hinge (SVM, soft margin)
- Log (logistic regression, cross entropy error)
- Squared loss (linear regression)
- Exponential loss (Boosting)
The Gold Standard loss is also called the 0-1 loss; it simply counts the number of misclassifications.
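The 0-1 loss can be sketched in a few lines of numpy (zero_one_loss and the use of raw scores with sign() are my own illustrative choices, not from the lecture):

import numpy as np

def zero_one_loss(y_true, scores):
    # Count misclassifications: labels are +/-1, scores are raw classifier outputs.
    predictions = np.sign(scores)
    return np.sum(predictions != y_true)

# Example: one of the three predictions has the wrong sign.
y = np.array([1, -1, 1])
s = np.array([0.7, 0.2, 1.5])
print(zero_one_loss(y, s))  # -> 1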

Hinge Loss http://en.wikipedia.org/wiki/Hinge_loss
For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as

\ell(y) = \max(0,\, 1 - t \cdot y)

Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. E.g., in linear SVMs, y = \mathbf{w} \cdot \mathbf{x} + b.

It can be seen that when t and y have the same sign (meaning y predicts the right class) and |y| \ge 1, the hinge loss \ell(y) = 0, but when they have opposite sign, \ell(y) increases linearly with y (one-sided error).
From <http://en.wikipedia.org/wiki/Hinge_loss>
[Figure omitted] Plot of hinge loss (blue) vs. zero-one loss (misclassification, green: y < 0) for t = 1 and variable y. Note that the hinge loss penalizes predictions y < 1, corresponding to the notion of a margin in a support vector machine.
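A minimal numpy sketch of the hinge loss exactly as defined above (hinge_loss is an illustrative helper, not a library function):

import numpy as np

def hinge_loss(t, y):
    # max(0, 1 - t*y) for labels t in {-1, +1} and raw classifier scores y.
    return np.maximum(0.0, 1.0 - t * y)

# A correct prediction with margin >= 1 gives zero loss; otherwise the loss grows linearly.
t = np.array([1, 1, -1])
y = np.array([1.5, 0.3, 0.4])
print(hinge_loss(t, y))  # -> [0.  0.7 1.4]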

In the paper "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM", the objective is

\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}
Here the first term is treated as the regularization part and the second term as the loss part; compare this with Ng's lecture notes on the SVM.

Without regularization:

\min_{\mathbf{w}} \; \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}

With regularization:

\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}
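A hedged sketch of the Pegasos-style stochastic sub-gradient step for this objective: the update w <- (1 - eta*lambda)*w + eta * 1[y<w,x> < 1] * y*x with step size eta = 1/(lambda*t) follows the paper, while the function name, toy data, and training loop are my own.

import numpy as np

def pegasos_step(w, x, y, lam, t):
    # One stochastic sub-gradient step on (lambda/2)*||w||^2 + max(0, 1 - y*<w, x>).
    eta = 1.0 / (lam * t)            # step size 1/(lambda * t)
    if y * np.dot(w, x) < 1.0:       # hinge term active: sub-gradient is lam*w - y*x
        return (1.0 - eta * lam) * w + eta * y * x
    return (1.0 - eta * lam) * w     # hinge term inactive: only the regularizer contributes

# Toy usage: a few passes over two linearly separable points.
w = np.zeros(2)
data = [(np.array([1.0, 2.0]), 1), (np.array([-1.0, -1.5]), -1)]
lam = 0.1
for t, (x, y) in enumerate(data * 20, start=1):
    w = pegasos_step(w, x, y, lam, t)
print(w)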

Log Loss
In Ng's lecture notes 1, linear regression comes first, leading to the least-squares error, which is then justified probabilistically via the Gaussian distribution. Logistic regression comes next, where MLE is used to derive the optimization objective: maximize the probability of the observed training data.

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

p(y \mid x; \theta) = (h_\theta(x))^{y} \, (1 - h_\theta(x))^{1 - y}

L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)

Maximize the following log-likelihood:

\ell(\theta) = \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
And this is exactly minimizing the cross entropy!
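To make that step explicit, with cross entropy H(p, q) = -\sum_k p_k \log q_k as defined on the Wikipedia page quoted below, minimizing the negative log-likelihood is the same as minimizing a sum of per-example cross entropies:

-\ell(\theta) = \sum_{i=1}^{m} \Big[ -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big] = \sum_{i=1}^{m} H\big(p^{(i)}, q^{(i)}\big),

where p^{(i)} = (y^{(i)}, 1 - y^{(i)}) is the empirical label distribution and q^{(i)} = (h_\theta(x^{(i)}), 1 - h_\theta(x^{(i)})) is the model's predicted distribution.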
http://en.wikipedia.org/wiki/Cross_entropy
http://www.cnblogs.com/rocketfan/p/3350450.html (information theory: the relation between cross entropy and KL divergence)


Cross entropy can be used to define a loss function in machine learning and optimization. The true probability p_i is the true label, and the given distribution q_i is the predicted value of the current model.

More specifically, let us consider logistic regression, which (in its most basic guise) deals with classifying a given set of data points into two possible classes generically labelled 0 and 1. The logistic regression model thus predicts an output y \in \{0, 1\}, given an input vector \mathbf{x}. The probability is modeled using the logistic function g(z) = 1/(1 + e^{-z}). Namely, the probability of finding the output y = 1 is given by

q_{y=1} = \hat{y} \equiv g(\mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}},

where the vector of weights \mathbf{w} is learned through some appropriate algorithm such as gradient descent. Similarly, the conjugate probability of finding the output y = 0 is simply given by

q_{y=0} = 1 - \hat{y}.

The true (observed) probabilities can be expressed similarly as p_{y=1} = y and p_{y=0} = 1 - y.

Having set up our notation, p \in \{y, 1 - y\} and q \in \{\hat{y}, 1 - \hat{y}\}, we can use cross entropy to get a measure for similarity between p and q:

H(p, q) = -\sum_k p_k \log q_k = -y \log \hat{y} - (1 - y) \log(1 - \hat{y}).

The typical loss function that one uses in logistic regression is computed by taking the average of all cross entropies in the sample. For example, suppose we have N samples, with each sample indexed by n = 1, \dots, N. The loss function is then given by

L(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N} \sum_{n=1}^{N} \Big[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \Big],

where \hat{y}_n \equiv g(\mathbf{w} \cdot \mathbf{x}_n) = 1/(1 + e^{-\mathbf{w} \cdot \mathbf{x}_n}), with g(z) the logistic function as before.

The logistic loss is sometimes called cross-entropy loss. It is also known as log loss (in this case, the binary label is often denoted by {-1, +1}).[1]
From <http://en.wikipedia.org/wiki/Cross_entropy>
So this agrees exactly with the conclusion Ng derives from the MLE point of view; the only difference is the overall minus sign.
In other words, the optimization objective of logistic regression is the cross entropy.
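A quick numerical check of that statement (a sketch: sigmoid, the toy data, and the weights below are my own illustrative choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features, labels in {0, 1}, arbitrary weights.
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]])
y = np.array([1, 0, 0, 1])
w = np.array([0.4, -0.2])

y_hat = sigmoid(X @ w)

# Average cross entropy (the logistic regression loss from the Wikipedia formula above).
cross_entropy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Log-likelihood from Ng's MLE derivation: it differs only by the sign and the 1/N factor.
log_likelihood = np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(cross_entropy, -log_likelihood / len(y))  # the two values are identical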

Correction to formula (14.8): the slides appear to contain a small typo; the first + should be a -, so that the loss objective is "smaller is better" while the MLE objective is "larger is better".
Squared Loss

L(y, f(x)) = (y - f(x))^2
Exponential Loss

L(y, f(x)) = e^{-y f(x)}

Exponential loss is usually used in boosting. It is always > 0, but it ensures that the closer the prediction is to the correct result, the smaller the loss, and the further away it is, the larger the loss.
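A small numpy sketch of this behaviour (labels in {-1, +1}; exp_loss is an illustrative helper, not a library function):

import numpy as np

def exp_loss(y, f):
    # Exponential loss exp(-y * f(x)) for labels y in {-1, +1} and real-valued scores f(x).
    return np.exp(-y * f)

# The loss is always positive, shrinks as the correctly-signed margin grows,
# and blows up as the prediction moves further in the wrong direction.
y = np.array([1, 1, 1, 1])
f = np.array([2.0, 0.5, -0.5, -2.0])
print(exp_loss(y, f))  # -> [0.135 0.607 1.649 7.389] (approximately)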

