Differences between lasso and ridge regression


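  • Lasso is also called L1 regularization: it penalizes the absolute values of the coefficients.

  Ridge is also called L2 regularization: it penalizes the squares of the coefficients.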

 

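  • Under the ridge penalty, every coefficient is shrunk toward 0, but in general none becomes exactly 0.

  Under the lasso penalty, some coefficients become exactly 0 and the rest are shrunk.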

 

  • LASSO: least absolute shrinkage and selection operator

  Because it can set coefficients exactly to 0, the lasso also performs variable selection.
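
A minimal sketch of the contrast in the three points above. It assumes an orthonormal design (X^T X = I), where both estimators have a closed form in terms of the least-squares coefficients. The function names, the 1/2 scaling of the loss, and the sample numbers are illustrative assumptions and do not come from the original note.

```python
import math

def lasso_orthonormal(b_ols, lam):
    """Lasso solution for (1/2)*||y - X b||^2 + lam * sum(|b_j|), assuming X^T X = I.

    Each least-squares coefficient is soft-thresholded: moved toward 0 by lam,
    and set exactly to 0 when its magnitude is at most lam.
    """
    return [0.0 if abs(b) <= lam else math.copysign(abs(b) - lam, b) for b in b_ols]

def ridge_orthonormal(b_ols, lam):
    """Ridge solution for (1/2)*||y - X b||^2 + (lam/2) * sum(b_j^2), assuming X^T X = I.

    Every coefficient is scaled by the same factor 1 / (1 + lam), so none
    becomes exactly 0 unless it was already 0.
    """
    return [b / (1.0 + lam) for b in b_ols]

b_ols = [3.0, -0.5, 1.2]   # hypothetical least-squares coefficients
lam = 1.0
lasso_coefs = lasso_orthonormal(b_ols, lam)
ridge_coefs = ridge_orthonormal(b_ols, lam)
print("lasso:", lasso_coefs)   # the middle coefficient is dropped
print("ridge:", ridge_coefs)   # all three are kept, each one smaller
```

With a general, non-orthonormal design the lasso has no closed form and is usually fit iteratively, but the same qualitative behavior holds.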

 

===============

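What they have in common:

(1) When the model has an intercept, neither method penalizes it.

beta_0 = mean(y) when the predictors are centered

(2) Both estimators are biased.

(3) Both need the predictors standardized before the penalty is applied. The penalty sums the coefficient magnitudes (sum ||beta||), so the coefficients must be on comparable scales for the penalty to treat them fairly.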
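
Points (1) and (3) can be checked numerically. The sketch below assumes a single predictor and a ridge penalty, because that case has a simple closed form. The helper names `standardize` and `ridge_1d` and the data are hypothetical.

```python
from statistics import mean, pstdev

def standardize(x):
    """Center x to mean 0 and scale it to unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]

def ridge_1d(x, y, lam):
    """Ridge fit with one predictor and an unpenalized intercept.

    Minimizes sum((y_i - b0 - b1 * x_i)^2) + lam * b1^2.
    """
    x_bar, y_bar = mean(x), mean(y)
    xc = [v - x_bar for v in x]
    sxy = sum(a * (b - y_bar) for a, b in zip(xc, y))
    sxx = sum(a * a for a in xc)
    b1 = sxy / (sxx + lam)
    b0 = y_bar - b1 * x_bar   # equals mean(y) when x is already centered
    return b0, b1

x_m = [1.0, 2.0, 3.0, 4.0]          # hypothetical predictor, in meters
x_mm = [1000.0 * v for v in x_m]    # the same predictor, in millimeters
y = [1.1, 1.9, 3.2, 3.8]
lam = 5.0

# Raw units: the same lambda gives different fits, so the slope per meter
# depends on an arbitrary choice of units.
_, slope_m = ridge_1d(x_m, y, lam)
_, slope_mm = ridge_1d(x_mm, y, lam)
print("slope per meter:", slope_m, "vs", slope_mm * 1000.0)

# Standardized: both versions give the same fit, and the intercept is
# mean(y) because the standardized predictor is centered.
b0_a, b1_a = ridge_1d(standardize(x_m), y, lam)
b0_b, b1_b = ridge_1d(standardize(x_mm), y, lam)
print("intercept:", b0_a, " slopes:", b1_a, b1_b)
```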

 

=============

Bias and variance

The bias of the lasso estimate increases as lambda increases.

The variance of the lasso estimate decreases as lambda increases.
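
A small simulation can illustrate this trade-off. The sketch assumes the simplest possible setting: a single coefficient whose least-squares estimate is normally distributed around the true value, with the lasso estimate obtained by soft-thresholding. The true coefficient, noise level, number of draws, and lambda grid are all made-up values.

```python
import random
from statistics import mean, pvariance

def soft_threshold(z, lam):
    """Lasso estimate of a single coefficient from a noisy least-squares value z."""
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam

rng = random.Random(0)
beta, sigma, n_draws = 2.0, 1.0, 20000   # hypothetical true coefficient and noise level
# Simulated least-squares estimates; the same draws are reused for every lambda.
z_draws = [rng.gauss(beta, sigma) for _ in range(n_draws)]

lambdas = [0.0, 0.5, 1.0, 2.0]
biases, variances = [], []
for lam in lambdas:
    estimates = [soft_threshold(z, lam) for z in z_draws]
    biases.append(mean(estimates) - beta)      # empirical bias
    variances.append(pvariance(estimates))     # empirical variance
    print(f"lambda={lam:.1f}  bias={biases[-1]:+.3f}  variance={variances[-1]:.3f}")
```

In this setting the bias grows in magnitude and the variance shrinks as lambda grows, which is the usual reason for choosing lambda by cross-validation rather than fixing it in advance.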

 

=============

Discussion of prediction error

(1) Some say the two are comparable: In terms of prediction error (or mean squared error), the lasso
performs comparably to ridge regression.

(2) Some say ridge is better: “Typically ridge or ℓ2 penalties are **much better** for minimizing prediction error rather than ℓ1 penalties. The reason for this is that when two predictors are highly correlated, ℓ1 regularizer will simply pick one of the two predictors. In contrast, the ℓ2 regularizer will keep both of them and jointly shrink the corresponding coefficients a little bit. Thus, while the ℓ1 penalty can certainly reduce overfitting, you may also experience a loss in predictive power.”
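
The mechanism described in quote (2) can be sketched with two highly correlated predictors. The code assumes both predictors have unit norm, so the problem is fully described by their inner product `rho` and their inner products `c` with y. The lasso is solved by coordinate descent and ridge by its closed form. All names and numbers are hypothetical. The sketch shows only which coefficients survive and does not measure prediction error.

```python
def soft_threshold(z, lam):
    """Move z toward 0 by lam, returning exactly 0 when |z| <= lam."""
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam

def lasso_two_predictors(rho, c, lam, n_sweeps=100):
    """Coordinate descent for (1/2)*||y - X b||^2 + lam * (|b1| + |b2|).

    Assumes two unit-norm predictors with inner product rho, and
    c = [x1 . y, x2 . y].
    """
    b = [0.0, 0.0]
    for _ in range(n_sweeps):
        b[0] = soft_threshold(c[0] - rho * b[1], lam)
        b[1] = soft_threshold(c[1] - rho * b[0], lam)
    return b

def ridge_two_predictors(rho, c, lam):
    """Closed form for (1/2)*||y - X b||^2 + (lam/2) * (b1^2 + b2^2), same assumptions."""
    d = 1.0 + lam                       # diagonal of (X^T X + lam * I)
    det = d * d - rho * rho
    return [(d * c[0] - rho * c[1]) / det, (d * c[1] - rho * c[0]) / det]

rho, c, lam = 0.95, [2.0, 1.95], 1.2    # hypothetical values
lasso_b = lasso_two_predictors(rho, c, lam)
ridge_b = ridge_two_predictors(rho, c, lam)
print("lasso:", lasso_b)   # one predictor kept, the other dropped
print("ridge:", ridge_b)   # both kept, with similar coefficients
```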

===================

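Extensions

Bayesian Lasso

In a Bayesian framework, if the prior on the parameters is a Laplace distribution, maximizing the posterior probability yields an objective function of the lasso form.

Bayesian Ridge

If the prior on the parameters is a normal distribution, maximizing the posterior probability yields an objective function of the ridge form.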
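
The two statements above can be written out as a short derivation. This sketch assumes a Gaussian likelihood with known noise variance and independent, zero-centered priors on each coefficient. The symbols sigma, b, and tau are introduced here for illustration and do not appear in the original note.

```latex
% Assumed model: y | \beta \sim N(X\beta, \sigma^2 I), independent priors on each \beta_j.
\begin{align*}
\hat\beta_{\mathrm{MAP}}
  &= \arg\max_\beta \; p(\beta \mid y)
   = \arg\min_\beta \; \Big[ \tfrac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2
     - \sum_j \log p(\beta_j) \Big] \\[6pt]
% Laplace prior: p(\beta_j) = \frac{1}{2b} \exp(-|\beta_j| / b)
\hat\beta_{\mathrm{MAP}}^{\mathrm{Laplace}}
  &= \arg\min_\beta \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
  \qquad \lambda = \frac{2\sigma^2}{b} \\[6pt]
% Normal prior: \beta_j \sim N(0, \tau^2)
\hat\beta_{\mathrm{MAP}}^{\mathrm{Normal}}
  &= \arg\min_\beta \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}
\end{align*}
```

In both cases a tighter prior (smaller b or tau) corresponds to a larger lambda, that is, a stronger penalty.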

https://www.zhihu.com/question/23536142

Further reading

Trevor Park & George Casella (2008) The Bayesian Lasso, Journal of the
American Statistical Association, 103:482, 681-686, DOI: 10.1198/016214508000000337

 

=================

Applications

The lasso, Bayesian lasso, and extensions can be fit using the
monomvn package in R.

