Huber loss (repost)

This article is a repost. Posted 2016-06-15 08:46. Category: Machine Learning Algorithms

Original article: https://en.wikipedia.org/wiki/Huber_loss

In statistics, the Huber loss is a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.

Definition

[Figure: Huber loss (green)]

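The Huber loss function describes the penalty incurred by an estimation procedure $f$. Huber (1964) defines the loss function piecewise by^[1]

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^{2} & \text{for } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$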

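This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the two sections at the points where $|a|=\delta$. The variable $a$ often refers to the residual, that is, the difference between the observed and predicted values, $a = y - f(x)$.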
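As a minimal sketch, the piecewise definition can be written directly in Python, assuming the standard form of the loss: $\frac{1}{2}a^2$ for $|a| \le \delta$ and $\delta(|a| - \frac{1}{2}\delta)$ otherwise. The function names `huber_loss` and `total_huber_loss` and the default `delta=1.0` are illustrative choices, not part of the original article.

```python
def huber_loss(a: float, delta: float = 1.0) -> float:
    """Huber loss of a single residual a, with threshold delta > 0."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    abs_a = abs(a)
    if abs_a <= delta:
        # Quadratic branch: behaves like half the squared error.
        return 0.5 * a * a
    # Linear branch: grows with slope delta, like a scaled absolute error.
    return delta * (abs_a - 0.5 * delta)


def total_huber_loss(residuals, delta: float = 1.0) -> float:
    """Sum of Huber losses over an iterable of residuals."""
    return sum(huber_loss(a, delta) for a in residuals)
```

Both branches give $\frac{1}{2}\delta^2$ at $|a| = \delta$, so the function is continuous where the pieces meet.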

Motivation

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a)=|a|$. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^{n}L(a_{i})$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
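The effect of a single outlier on the two estimators can be illustrated with a small sketch. The data below is made up for illustration; the mean minimizes the summed squared loss and the median minimizes the summed absolute loss.

```python
import statistics

# Made-up sample: five well-behaved values plus one gross outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = clean + [100.0]

mean_clean = statistics.mean(clean)              # minimizer of squared loss
mean_bad = statistics.mean(contaminated)
median_clean = statistics.median(clean)          # minimizer of absolute loss
median_bad = statistics.median(contaminated)

print("mean:  ", mean_clean, "->", mean_bad)      # 3.0 -> about 19.17
print("median:", median_clean, "->", median_bad)  # 3.0 -> 3.5
```

One outlier moves the mean far outside the range of the well-behaved values, while the median shifts only slightly.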

As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum $a=0$; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points $a=-\delta$ and $a=\delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute value function).
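One way to see this combination in practice is to estimate a location parameter by minimizing the mean Huber loss. The sketch below assumes the standard Huber loss, whose derivative is the residual clipped to $[-\delta, \delta]$, and uses plain gradient descent. The optimizer, step size, iteration count, and the names `huber_grad` and `huber_location` are illustrative assumptions, not something the article prescribes.

```python
def huber_grad(a: float, delta: float = 1.0) -> float:
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, a))


def huber_location(xs, delta: float = 1.0, steps: int = 500, lr: float = 0.5) -> float:
    """Estimate a location mu by gradient descent on the mean Huber loss."""
    xs = list(xs)
    n = len(xs)
    mu = sorted(xs)[n // 2]  # start from a middle order statistic
    for _ in range(steps):
        # Gradient of (1/n) * sum_i L(mu - x_i) with respect to mu.
        grad = sum(huber_grad(mu - x, delta) for x in xs) / n
        mu -= lr * grad
    return mu


data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]      # made-up sample with one outlier
robust = huber_location(data, delta=1.0)      # outlier's influence is capped at delta
near_mean = huber_location(data, delta=1000.0)  # every residual is in the quadratic zone
```

With a small `delta` the outlier contributes only a bounded gradient, so the estimate stays near the bulk of the data; with a very large `delta` every residual falls in the quadratic region and the estimate approaches the sample mean.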

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function, and ensures that derivatives are continuous for all degrees. It is defined as^[3]^[4]

$$L_\delta(a) = \delta^{2}\left(\sqrt{1 + (a/\delta)^{2}} - 1\right).$$

As such, this function approximates $a^{2}/2$ for small values of $a$, and approximates a straight line with slope $\delta$ for large values of $a$.
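The two limiting behaviors can be checked numerically with a short sketch, assuming the common form $L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right)$. The function names and default `delta` are illustrative.

```python
import math


def pseudo_huber_loss(a: float, delta: float = 1.0) -> float:
    """Pseudo-Huber loss: a smooth approximation of the Huber loss."""
    return delta * delta * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)


def pseudo_huber_grad(a: float, delta: float = 1.0) -> float:
    """Derivative with respect to a: a / sqrt(1 + (a/delta)^2)."""
    return a / math.sqrt(1.0 + (a / delta) ** 2)


small = pseudo_huber_loss(1e-3)          # close to a^2 / 2 = 5e-7
slope = pseudo_huber_grad(1000.0, 2.0)   # close to delta = 2 for large a
```

Unlike the Huber loss, there is no explicit branch here: the transition between the quadratic and linear regimes is gradual.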

While the above is the most common form, other smooth approximations of the Huber loss function also exist.^[5]

Variant for classification

For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y\in \{+1,-1\}$, the modified Huber loss is defined as^[6]

$$L(y, f(x)) = \begin{cases} \max(0, 1 - y\,f(x))^{2} & \text{for } y\,f(x) \ge -1, \\ -4y\,f(x) & \text{otherwise.} \end{cases}$$

The term $\max(0,1-y\,f(x))$ is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of $L$.^[6]
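A minimal sketch of the modified Huber loss, assuming the usual definition: the squared hinge loss $\max(0, 1 - y\,f(x))^2$ when $y\,f(x) \ge -1$, and the linear term $-4y\,f(x)$ otherwise. The function name and the argument name `score` (standing in for $f(x)$) are illustrative.

```python
def modified_huber_loss(y: int, score: float) -> float:
    """Modified Huber loss for a label y in {+1, -1} and a real-valued score f(x)."""
    if y not in (1, -1):
        raise ValueError("y must be +1 or -1")
    margin = y * score
    if margin >= -1.0:
        # Squared hinge loss: zero once the margin reaches 1.
        return max(0.0, 1.0 - margin) ** 2
    # Linear penalty for badly misclassified points.
    return -4.0 * margin
```

At a margin of $-1$ both branches equal 4, so the loss is continuous, and beyond that point it grows only linearly, which limits the influence of badly misclassified examples.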

Applications

The Huber loss function is used in robust statistics, M-estimation and additive modelling.^[7]

References

[1] Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". Annals of Mathematical Statistics 35 (1): 73–101. doi:10.1214/aoms/1177703732. JSTOR 2238020.
[2] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning. p. 349. Compared to Hastie et al., the loss is scaled by a factor of ½, to be consistent with Huber's original definition given earlier.
[3] Charbonnier, P.; Blanc-Feraud, L.; Aubert, G.; Barlaud, M. (1997). "Deterministic edge-preserving regularization in computed imaging". IEEE Trans. Image Processing 6 (2): 298–311. doi:10.1109/83.551699.
[4] Hartley, R.; Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. p. 619. ISBN 0-521-54051-8.
[5] Lange, K. (1990). "Convergence of Image Reconstruction Algorithms with Gibbs Smoothing". IEEE Trans. Medical Imaging 9 (4): 439–446. doi:10.1109/42.61759.
[6] Zhang, Tong (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.
[7] Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". Annals of Statistics 29 (5): 1189–1232. doi:10.1214/aos/1013203451. JSTOR 2699986.
