Loss Function - 1


http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/lectures/lecture14.pdf

  1. Loss Function

    A loss function can be viewed as an error part (loss term) plus a regularization part (regularization term).
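In symbols (using generic notation of my own choosing rather than anything from the linked slides: a per-example loss $L$, a regularizer $R$, and a trade-off weight $\lambda$), this decomposition is typically written as

$$J(\mathbf{w}) = \underbrace{\frac{1}{m} \sum_{i=1}^{m} L\bigl(y_i, f(\mathbf{x}_i; \mathbf{w})\bigr)}_{\text{loss term}} \;+\; \underbrace{\lambda\, R(\mathbf{w})}_{\text{regularization term}}$$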

1.1 Loss Term

  • Gold Standard (ideal case)
  • Hinge (SVM, soft margin)
  • Log (logistic regression, cross entropy error)
  • Squared loss (linear regression)
  • Exponential loss (Boosting)

   

The gold standard loss is also called the 0-1 loss; it counts the number of classification errors.
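As a formula (stated here for a label $t = \pm 1$ and a raw classifier score $y$, matching the hinge-loss notation used below):

$$\ell_{0\text{-}1}(t, y) = \begin{cases} 0 & \text{if } t \cdot y > 0, \\ 1 & \text{otherwise.} \end{cases}$$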

Hinge Loss http://en.wikipedia.org/wiki/Hinge_loss

For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as

$$\ell(y) = \max(0,\, 1 - t \cdot y).$$

Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. E.g., in linear SVMs, $y = \mathbf{w} \cdot \mathbf{x} + b$.

It can be seen that when t and y have the same sign (meaning y predicts the right class) and $|y| \ge 1$, the hinge loss $\ell(y) = 0$, but when they have opposite sign, $\ell(y)$ increases linearly with y (one-sided error).

   

From <http://en.wikipedia.org/wiki/Hinge_loss>

Figure: Plot of hinge loss (blue) vs. zero-one loss (misclassification, green: y < 0) for t = 1 and variable y. Note that the hinge loss penalizes predictions y < 1, corresponding to the notion of a margin in a support vector machine.
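A minimal NumPy sketch of the two losses compared in that figure (function and variable names are my own, not from the original post):

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss for labels t in {-1, +1} and raw classifier scores y."""
    return np.maximum(0.0, 1.0 - t * y)

def zero_one_loss(t, y):
    """0-1 loss: 1 when the sign of the score disagrees with the label."""
    return (t * y <= 0).astype(float)

# Example: label t = +1, scores ranging from wrong to confidently right.
t = 1.0
scores = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])
print(hinge_loss(t, scores))     # [3.  1.5 1.  0.5 0.  0. ]
print(zero_one_loss(t, scores))  # [1. 1. 1. 0. 0. 0.]
```

Note how the hinge loss also penalizes correct but low-confidence predictions with 0 < y < 1, which is exactly the margin notion mentioned in the caption.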

   


   

   

In the paper Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, the first part of the objective is viewed as the regularization term and the second part as the loss term; compare this with Ng's lecture notes on the SVM.

Without regularization (the averaged hinge loss over the training set $S$):

$$\min_{\mathbf{w}} \; \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}$$

With regularization (the Pegasos objective):

$$\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}$$
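As a rough illustration of how this objective is optimized, here is a minimal single-example Pegasos-style sub-gradient step (variable names and the omission of the paper's optional projection step are my own simplifications):

```python
import numpy as np

def pegasos_step(w, x, y, lam, t):
    """One Pegasos-style sub-gradient update on example (x, y), y in {-1, +1}.

    lam is the regularization strength, t is the (1-based) iteration count.
    """
    eta = 1.0 / (lam * t)              # step size schedule used by Pegasos
    if y * np.dot(w, x) < 1.0:         # hinge loss is active: use its sub-gradient
        w = (1.0 - eta * lam) * w + eta * y * x
    else:                              # hinge loss is zero: only shrink w
        w = (1.0 - eta * lam) * w
    return w

# Toy usage: repeated passes over two linearly separable points.
w = np.zeros(2)
data = [(np.array([1.0, 2.0]), +1), (np.array([-1.0, -1.5]), -1)]
lam = 0.1
for t, (x, y) in enumerate(data * 20, start=1):
    w = pegasos_step(w, x, y, lam, t)
print(w)
```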

   

Log Loss

Ng's lecture notes (set 1) first cover linear regression and derive the least-squares error, then explain the least-squares error probabilistically via a Gaussian noise assumption.

They then cover logistic regression, using MLE to derive the optimization objective: maximize the probability of the observed training data.
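Concretely (written here in Ng's usual notation $h_\theta(x) = 1/(1 + e^{-\theta^T x})$; the symbols are my reconstruction, not copied from the slides), the per-example Bernoulli model and the likelihood are

$$p(y \mid x; \theta) = h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y}, \qquad L(\theta) = \prod_{i=1}^{m} p\bigl(y^{(i)} \mid x^{(i)}; \theta\bigr)$$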

   

   

Maximize the following log-likelihood function:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]$$

Maximizing this is exactly the same as minimizing the cross entropy.

   

http://en.wikipedia.org/wiki/Cross_entropy

http://www.cnblogs.com/rocketfan/p/3350450.html (information theory: the relationship between cross entropy and KL divergence)

   

Cross entropy can be used to define a loss function in machine learning and optimization. The true probability $p_i$ is the true label, and the given distribution $q_i$ is the predicted value of the current model.

More specifically, let us consider logistic regression, which (in its most basic guise) deals with classifying a given set of data points into two possible classes generically labelled $0$ and $1$. The logistic regression model thus predicts an output $y \in \{0, 1\}$, given an input vector $\mathbf{x}$. The probability is modeled using the logistic function $g(z) = 1/(1 + e^{-z})$. Namely, the probability of finding the output $y = 1$ is given by

$$q_{y=1} = \hat{y} \equiv g(\mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}},$$

where the vector of weights $\mathbf{w}$ is learned through some appropriate algorithm such as gradient descent. Similarly, the conjugate probability of finding the output $y = 0$ is simply given by

$$q_{y=0} = 1 - \hat{y}.$$

The true (observed) probabilities can be expressed similarly as $p_{y=1} = y$ and $p_{y=0} = 1 - y$.

Having set up our notation, $p \in \{y,\, 1 - y\}$ and $q \in \{\hat{y},\, 1 - \hat{y}\}$, we can use cross entropy to get a measure for similarity between $p$ and $q$:

$$H(p, q) = -\sum_i p_i \log q_i = -y \log \hat{y} - (1 - y) \log(1 - \hat{y}).$$

The typical loss function that one uses in logistic regression is computed by taking the average of all cross-entropies in the sample. For example, suppose we have $N$ samples with each sample labeled by $n = 1, \dots, N$. The loss function is then given by

$$L(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N} \sum_{n=1}^{N} \Bigl[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \Bigr],$$

where $\hat{y}_n \equiv g(\mathbf{w} \cdot \mathbf{x}_n)$, with $g(z)$ the logistic function as before.

   

The logistic loss is sometimes called cross-entropy loss. It's also known as log loss (In this case, the binary label is often denoted by {-1,+1}).[1]

   

From <http://en.wikipedia.org/wiki/Cross_entropy>

   

   

Therefore this agrees exactly with the conclusion Ng derives from the MLE perspective! The only difference is the leading minus sign.

In other words, the optimization objective of logistic regression is the cross entropy.

   Correction: formula 14.8 in the lecture notes appears to contain a small mistake; the first + should be a -, so that the loss objective is minimized (smaller is better), while the MLE objective is maximized (larger is better).
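A small numerical sketch of this equivalence (the sigmoid and loss helpers below are my own illustration, not code from the post): the averaged cross entropy is exactly the negative of the average log-likelihood, so minimizing one maximizes the other.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(w, X, y):
    """Averaged binary cross entropy for labels y in {0, 1}."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_likelihood(w, X, y):
    """Average log-likelihood of the same Bernoulli model."""
    p = sigmoid(X @ w)
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100) > 0).astype(float)
w = rng.normal(size=3)

# Both prints show the same number: the loss is the negated average log-likelihood.
print(cross_entropy_loss(w, X, y), -log_likelihood(w, X, y))
```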

Squared Loss
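For reference (my own addition, in line with the "Squared loss (linear regression)" bullet above; the $\tfrac{1}{2}$ factor is a common convention that simplifies the gradient), the squared loss for a target $y$ and a prediction $\hat{y}$ is

$$\ell_{\text{sq}}(y, \hat{y}) = \tfrac{1}{2}\,(y - \hat{y})^2$$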

   

Exponential Loss

The exponential loss is commonly used in boosting. It is always > 0, but it ensures that the closer the prediction is to the correct result, the smaller the loss, and the further away, the larger the loss.
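Written out (for a label $t = \pm 1$ and a raw score $y$, in the same notation as the hinge loss above; this is the standard form used e.g. in AdaBoost, added here for reference):

$$\ell_{\exp}(t, y) = e^{-t \cdot y}$$

When $t$ and $y$ agree with a large margin the loss decays toward 0; when they disagree it grows exponentially, matching the description above.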

   

   

