Differences between lasso and ridge regression

Reposted article, 2018-03-26 04:58

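Lasso, also called L1 regularization, penalizes the absolute values of the coefficients.

Ridge, also called L2 regularization, penalizes the squares of the coefficients.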
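In symbols, the two objectives are commonly written as follows. The notation is an assumption added here, not taken from the original post: n observations, p predictors x_ij, intercept beta_0, and a tuning parameter lambda >= 0.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \beta_j^{2}
```

The only difference is the penalty term: a sum of absolute values for the lasso, a sum of squares for ridge.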

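Under the ridge penalty, every coefficient is shrunk.

Under the lasso penalty, some coefficients become exactly 0 and the rest are shrunk.

LASSO: least absolute shrinkage and selection operator

The lasso therefore also performs variable selection.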
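The difference between "shrink everything" and "set some to zero" is easiest to see in a special case. The sketch below assumes an orthonormal design (X'X = I) and the objectives RSS + lambda * sum(beta_j^2) for ridge and RSS + lambda * sum(|beta_j|) for the lasso. Under those assumptions ridge divides each OLS coefficient by (1 + lambda), while the lasso soft-thresholds it at lambda / 2. The OLS coefficients are made-up numbers for illustration.

```python
def soft_threshold(z, t):
    """Move z toward zero by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_shrink(b_ols, lam):
    """Ridge under an orthonormal design: proportional shrinkage."""
    return [b / (1.0 + lam) for b in b_ols]


def lasso_shrink(b_ols, lam):
    """Lasso under an orthonormal design: soft-thresholding at lam / 2."""
    return [soft_threshold(b, lam / 2.0) for b in b_ols]


b_ols = [3.0, -1.5, 0.4, -0.2]  # hypothetical OLS coefficients
lam = 1.0

ridge_b = ridge_shrink(b_ols, lam)
lasso_b = lasso_shrink(b_ols, lam)
print("ridge:", ridge_b)  # every coefficient is smaller, none is zero
print("lasso:", lasso_b)  # the two small coefficients are exactly 0.0
```

With a general (non-orthonormal) design neither formula holds exactly, but the qualitative behavior is the same.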

===============

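What they have in common:

(1) When an intercept is present, neither method penalizes it. With centered predictors, the intercept estimate is

beta_0 = mean(y)

(2) Both estimators are biased.

(3) Both require the predictors to be standardized (put on a common scale) before the penalty is applied. The penalty sums over all coefficients (sum ||beta||), so it is only fair if the predictors are on comparable scales.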
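Points (1) and (3) can be checked on a toy example. The sketch below is a one-predictor ridge fit with an unpenalized intercept, written with the standard library only. The data are hypothetical: the same predictor recorded once in metres and once in millimetres. After standardization the two fits coincide, and the intercept equals mean(y).

```python
from statistics import mean, pstdev


def standardize(col):
    """Center a predictor and scale it to unit (population) standard deviation."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def ridge_1d(x, y, lam):
    """Minimize sum((y_i - b0 - b1 * x_i)^2) + lam * b1^2 for centered x.

    Only b1 is penalized. Because x is centered, the optimal b0 is mean(y).
    """
    b0 = mean(y)
    sxy = sum(xi * (yi - b0) for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return b0, sxy / (sxx + lam)


# Hypothetical data: one predictor in two different units.
height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
height_mm = [1000.0 * h for h in height_m]
y = [50.0, 58.0, 63.0, 72.0, 77.0]

fit_m = ridge_1d(standardize(height_m), y, lam=2.0)
fit_mm = ridge_1d(standardize(height_mm), y, lam=2.0)
print("metres:     ", fit_m)
print("millimetres:", fit_mm)
```

Without the standardization step, the penalized coefficient would depend on the unit of measurement, which is the unfairness point (3) refers to.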

=============

On bias and variance

The bias of the lasso estimate increases as lambda increases.

The variance of the lasso estimate decreases as lambda increases.
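A small Monte Carlo sketch can illustrate this trade-off. It assumes the simplest possible setting: a single coefficient whose OLS estimate is normally distributed around the true value, with the lasso estimate obtained by soft-thresholding (as in an orthonormal design, where the threshold plays the role of lambda / 2). The true coefficient, noise level, and thresholds are hypothetical choices.

```python
import random
from statistics import mean, pvariance


def soft_threshold(z, t):
    """Move z toward zero by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


random.seed(0)
beta, sigma = 2.0, 1.0  # hypothetical true coefficient and noise level
ols_draws = [random.gauss(beta, sigma) for _ in range(20000)]  # simulated OLS estimates

results = []
for t in [0.0, 0.5, 1.0, 2.0]:  # larger threshold = larger lambda
    est = [soft_threshold(z, t) for z in ols_draws]
    bias = mean(est) - beta
    var = pvariance(est)
    results.append((t, bias, var))
    print(f"threshold={t:.1f}  bias={bias:+.3f}  variance={var:.3f}")
```

As the threshold grows, the printed bias moves further from zero while the variance falls.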

=============

On prediction error

(1) Some say the two are comparable: "In terms of prediction error (or mean squared error), the lasso performs comparably to ridge regression."

(2) Some say ridge is better: “Typically ridge or ℓ2 penalties are **much better** for minimizing prediction error rather than ℓ1 penalties. The reason for this is that when two predictors are highly correlated, ℓ1 regularizer will simply pick one of the two predictors. In contrast, the ℓ2 regularizer will keep both of them and jointly shrink the corresponding coefficients a little bit. Thus, while the ℓ1 penalty can certainly reduce overfitting, you may also experience a loss in predictive power.”
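The behavior with correlated predictors described in (2) can be reproduced in the limiting case of two identical predictors. The sketch below uses toy data and makes these assumptions: ridge minimizes ½||y - Xb||² + (lambda/2)||b||², solved in closed form; the lasso minimizes ½||y - Xb||² + lambda * ||b||_1, solved by plain cyclic coordinate descent starting from zero. With exact duplicates the lasso solution is not unique, and which predictor is kept depends on the update order.

```python
def soft_threshold(z, t):
    """Move z toward zero by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_two(X, y, lam):
    """Solve (X'X + lam * I) b = X'y for two predictors via Cramer's rule."""
    a11 = sum(r[0] * r[0] for r in X) + lam
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + lam
    c1 = sum(r[0] * yi for r, yi in zip(X, y))
    c2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a11 * a22 - a12 * a12
    return [(c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det]


def lasso_cd(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for the lasso, starting from b = 0."""
    p = len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of predictor j with the residual that excludes predictor j.
            rho = sum(
                row[j] * (yi - sum(row[k] * b[k] for k in range(p) if k != j))
                for row, yi in zip(X, y)
            )
            norm_sq = sum(row[j] ** 2 for row in X)
            b[j] = soft_threshold(rho, lam) / norm_sq
    return b


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [[v, v] for v in x]        # two identical predictors (toy data)
y = [3.0 * v for v in x]

ridge_b = ridge_two(X, y, lam=10.0)
lasso_b = lasso_cd(X, y, lam=10.0)
print("ridge:", ridge_b)  # [1.0, 1.0]
print("lasso:", lasso_b)  # [2.0, 0.0]
```

Ridge splits the weight evenly across the duplicates, while this lasso run puts all of it on the first predictor. The toy example only shows the selection behavior; it says nothing about which method predicts better, since both fits produce the same fitted values here.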

===================

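Extensions

Bayesian Lasso

In a Bayesian framework, if the coefficients are given a Laplace prior, maximizing the posterior probability (MAP estimation) yields an objective function of the lasso form.

Bayesian Ridge

If the coefficients are given a normal prior, maximizing the posterior probability yields an objective function of the ridge form.

See https://www.zhihu.com/question/23536142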
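A short derivation makes the correspondence explicit. The setup is an assumption added here: a Gaussian likelihood y | beta ~ N(X beta, sigma^2 I) with known sigma^2, independent priors on the coefficients, and scale parameters b (Laplace) and tau^2 (normal) that do not appear in the original post.

```latex
% MAP estimate: minimize the negative log posterior
\hat{\beta}_{\text{MAP}}
  = \arg\min_{\beta}\; \bigl[-\log p(y \mid \beta) - \log p(\beta)\bigr],
\qquad
-\log p(y \mid \beta) = \frac{1}{2\sigma^{2}} \lVert y - X\beta \rVert_2^{2} + \text{const}

% Laplace prior: p(\beta_j) = \frac{1}{2b} \exp(-\lvert\beta_j\rvert / b)
-\log p(\beta) = \frac{1}{b} \sum_{j=1}^{p} \lvert \beta_j \rvert + \text{const}
\;\Longrightarrow\;
\hat{\beta}_{\text{MAP}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^{2} + \lambda \lVert \beta \rVert_1,
\qquad \lambda = \frac{2\sigma^{2}}{b}

% Normal prior: \beta_j \sim N(0, \tau^{2})
-\log p(\beta) = \frac{1}{2\tau^{2}} \sum_{j=1}^{p} \beta_j^{2} + \text{const}
\;\Longrightarrow\;
\hat{\beta}_{\text{MAP}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^{2} + \lambda \lVert \beta \rVert_2^{2},
\qquad \lambda = \frac{\sigma^{2}}{\tau^{2}}
```

In both cases a tighter prior (smaller b or tau^2) corresponds to a larger lambda, that is, stronger shrinkage.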

Further reading

Trevor Park & George Casella (2008) The Bayesian Lasso, Journal of the
American Statistical Association, 103:482, 681-686, DOI: 10.1198/016214508000000337

=================

Application

The lasso, Bayesian lasso, and extensions can be fitted using the monomvn package in R.
