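- Lasso is also called L1 regularization; it penalizes the absolute values of the coefficients.
Ridge is also called L2 regularization; it penalizes the squares of the coefficients.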
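- With the ridge penalty, every coefficient is shrunk toward 0, but in general none becomes exactly 0.
With the lasso penalty, some coefficients become exactly 0 and the others are shrunk.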
- LASSO: least absolute shrinkage and selection operator
Lasso therefore also performs variable selection.
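To make the shrink-vs-zero contrast concrete, here is a minimal sketch assuming an orthonormal design (X^T X = I) and the objectives 1/2*||y - X beta||^2 + lambda*sum|beta_j| (lasso) and 1/2*||y - X beta||^2 + lambda/2*sum beta_j^2 (ridge). Under that assumption both estimates have closed forms in terms of the OLS estimate: ridge rescales every coefficient by the same factor, while lasso soft-thresholds each one. The function names and the OLS numbers are hypothetical.

```python
def ridge_orthonormal(beta_ols, lam):
    # Ridge under X^T X = I: every coefficient is scaled by 1/(1 + lam),
    # so all are shrunk but none becomes exactly 0 (unless it already was).
    return [b / (1.0 + lam) for b in beta_ols]


def soft_threshold(b, lam):
    # Move b toward 0 by lam; anything inside [-lam, lam] becomes exactly 0.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_orthonormal(beta_ols, lam):
    # Lasso under X^T X = I: soft-threshold each OLS coefficient independently.
    return [soft_threshold(b, lam) for b in beta_ols]


beta_ols = [3.0, -1.5, 0.4, -0.2]  # hypothetical OLS estimates
print(ridge_orthonormal(beta_ols, 1.0))  # [1.5, -0.75, 0.2, -0.1]
print(lasso_orthonormal(beta_ols, 0.5))  # [2.5, -1.0, 0.0, 0.0]
```

The two lambda values are not on a comparable scale across the two penalties; the point is only the shape of the shrinkage: proportional for ridge, shift-and-truncate for lasso.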
===============
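What they have in common:
(1) When an intercept is present, neither method penalizes it; with centered predictors its estimate is simply
beta_0 = mean(y)
(2) Both are biased estimators.
(3) Both require standardizing the predictors (not the coefficients) before penalizing: the penalty sums over all coefficients (sum |beta_j| or sum beta_j^2), and each coefficient's size depends on the units of its predictor, so scaling the predictors keeps the penalty fair across them.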
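Points (1) and (3) can be illustrated with a one-predictor ridge fit. This is a minimal sketch assuming the objective sum((y_i - b0 - b1*z_i)^2) + lam*b1^2, where z is the centered (optionally standardized) predictor; the helper ridge_1d_fitted and the data are hypothetical. It shows that the unpenalized intercept equals mean(y) once the predictor is centered, and that without standardization the same lam acts very differently when only the predictor's units change (metres vs centimetres).

```python
from statistics import mean, pstdev


def ridge_1d_fitted(x, y, lam, standardize=True):
    """Fitted values of a single-predictor ridge regression (hypothetical helper).

    Minimizes sum((y_i - b0 - b1 * z_i)**2) + lam * b1**2, where z is x centered
    and, if standardize is True, divided by its population standard deviation.
    The intercept b0 is not penalized.
    """
    mx = mean(x)
    sx = pstdev(x) if standardize else 1.0
    z = [(xi - mx) / sx for xi in x]
    # Since z is centered, setting the derivative w.r.t. b0 to zero gives b0 = mean(y).
    b0 = mean(y)
    # Setting the derivative w.r.t. b1 to zero gives the shrunken slope.
    b1 = sum(zi * yi for zi, yi in zip(z, y)) / (sum(zi * zi for zi in z) + lam)
    return [b0 + b1 * zi for zi in z]


y = [2.0, 3.0, 5.0, 6.0]
x_m = [1.0, 2.0, 3.0, 4.0]         # hypothetical predictor in metres
x_cm = [100.0 * v for v in x_m]    # the same predictor in centimetres

# Standardized: changing the units leaves the fit unchanged.
print(ridge_1d_fitted(x_m, y, lam=1.0))
print(ridge_1d_fitted(x_cm, y, lam=1.0))
# Not standardized: the same lam penalizes the two versions very differently.
print(ridge_1d_fitted(x_m, y, lam=1.0, standardize=False))
print(ridge_1d_fitted(x_cm, y, lam=1.0, standardize=False))
```

In the unstandardized case the centimetre version is barely penalized (its slope coefficient is 100 times smaller, so lam*b1^2 is tiny), which is exactly the unfairness that standardization removes.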
=============
On bias and variance
The bias of the lasso estimate increases as lambda increases.
The variance of the lasso estimate decreases as lambda increases.
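Both statements can be checked with a small simulation. This is a sketch under stated assumptions: an orthonormal design, so the OLS estimate of a single coefficient is distributed N(beta, sigma^2) and the lasso estimate is its soft-thresholded value; the true beta = 1, sigma = 1, the lambda grid, and the function names are hypothetical choices.

```python
import random


def soft_threshold(b, lam):
    # Lasso estimate of one coefficient when X^T X = I (objective 1/2*RSS + lam*|b|).
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_bias_variance(beta, sigma, lams, n_rep=20000, seed=0):
    """Monte Carlo bias and variance of the lasso estimate of one coefficient.

    Assumes an orthonormal design, so the OLS estimate is N(beta, sigma^2) and the
    lasso estimate is its soft-thresholded value. The same OLS draws are reused for
    every lambda, so differences across lambdas are not due to resampling noise.
    """
    rng = random.Random(seed)
    ols_draws = [rng.gauss(beta, sigma) for _ in range(n_rep)]
    results = []
    for lam in lams:
        est = [soft_threshold(b, lam) for b in ols_draws]
        m = sum(est) / n_rep
        var = sum((e - m) ** 2 for e in est) / n_rep
        results.append((lam, m - beta, var))  # bias = E[estimate] - beta
    return results


for lam, bias, var in lasso_bias_variance(beta=1.0, sigma=1.0, lams=[0.0, 0.5, 1.0, 2.0]):
    print(f"lambda={lam:.1f}  bias={bias:+.3f}  variance={var:.3f}")
```

The bias is negative and grows in magnitude (shrinkage toward 0), while the variance falls from about sigma^2 at lambda = 0. Soft-thresholding is a contraction, which is why the variance cannot increase with lambda.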
=============
On prediction error
(1) Some say they are comparable: "In terms of prediction error (or mean squared error), the lasso performs comparably to ridge regression."
(2) Some say ridge is better: “Typically ridge or ℓ2 penalties are **much better** for minimizing prediction error rather than ℓ1 penalties. The reason for this is that when two predictors are highly correlated, ℓ1 regularizer will simply pick one of the two predictors. In contrast, the ℓ2 regularizer will keep both of them and jointly shrink the corresponding coefficients a little bit. Thus, while the ℓ1 penalty can certainly reduce overfitting, you may also experience a loss in predictive power.”
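The correlated-predictor argument in (2) can be sketched with a two-predictor example. All numbers are hypothetical assumptions: two standardized predictors whose Gram matrix G = X^T X has unit diagonal and correlation 0.95, and c = X^T y = (1.0, 0.95), so the first predictor is slightly more correlated with y. The lasso is solved by cyclic coordinate descent on 1/2*||y - X b||^2 + lam*||b||_1 written in terms of G and c; ridge uses the closed form (G + lam*I) b = c for the penalty lam/2*||b||^2. The helper names are illustrative.

```python
def soft_threshold(b, lam):
    # Shrink b toward 0 by lam; values inside [-lam, lam] become exactly 0.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_cd(G, c, lam, n_sweeps=100):
    """Cyclic coordinate descent for 1/2 * b^T G b - c^T b + lam * sum(|b_j|).

    With G = X^T X and c = X^T y this equals 1/2 * ||y - X b||^2 + lam * ||b||_1
    up to a constant that does not depend on b.
    """
    p = len(c)
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of predictor j with the partial residual that excludes b_j.
            r_j = c[j] - sum(G[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(r_j, lam) / G[j][j]
    return b


def ridge_2x2(G, c, lam):
    """Solve (G + lam * I) b = c for two predictors by Cramer's rule.

    This is the minimizer of 1/2 * ||y - X b||^2 + lam/2 * ||b||^2.
    """
    a, u = G[0][0] + lam, G[0][1]
    v, d = G[1][0], G[1][1] + lam
    det = a * d - u * v
    return [(d * c[0] - u * c[1]) / det, (a * c[1] - v * c[0]) / det]


# Hypothetical standardized design: correlation 0.95 between the two predictors,
# the first one slightly more correlated with y than the second.
G = [[1.0, 0.95], [0.95, 1.0]]
c = [1.0, 0.95]
print("lasso:", lasso_cd(G, c, lam=0.2))   # second coefficient is exactly 0.0
print("ridge:", ridge_2x2(G, c, lam=0.2))  # both coefficients nonzero
```

Here the lasso puts all the weight on the first predictor and sets the second exactly to 0, while ridge keeps both and splits the weight between them. Because G is positive definite (correlation below 1), the lasso minimizer is unique, so this is not an artifact of the coordinate-descent starting point.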
===================
Extensions
Bayesian Lasso
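In a Bayesian framework, if the prior on the coefficients is a Laplace distribution, then maximizing the posterior (MAP estimation) gives an objective function of the lasso form.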
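A short derivation of the lasso case, assuming the linear model y | beta ~ N(X beta, sigma^2 I) with sigma^2 known and i.i.d. Laplace priors p(beta_j) = (1/(2b)) exp(-|beta_j|/b); the prior scale b is notation introduced here. Taking the negative log of the posterior and multiplying by 2 sigma^2:

```latex
\begin{aligned}
\hat\beta_{\text{MAP}}
&= \arg\max_\beta \; p(y \mid \beta) \prod_{j=1}^{p} p(\beta_j) \\
&= \arg\min_\beta \; \frac{1}{2\sigma^2}\lVert y - X\beta \rVert_2^2 + \frac{1}{b}\sum_{j=1}^{p} \lvert \beta_j \rvert \\
&= \arg\min_\beta \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad \lambda = \frac{2\sigma^2}{b}.
\end{aligned}
```

A smaller prior scale b (a prior more concentrated at 0) corresponds to a larger lambda.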
Bayesian Ridge
If the prior on the coefficients is a normal distribution, maximizing the posterior gives an objective function of the ridge form.
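The ridge case follows the same steps, assuming the same Gaussian likelihood with sigma^2 known and i.i.d. priors beta_j ~ N(0, tau^2), where tau^2 is notation introduced here:

```latex
\begin{aligned}
\hat\beta_{\text{MAP}}
&= \arg\min_\beta \; \frac{1}{2\sigma^2}\lVert y - X\beta \rVert_2^2 + \frac{1}{2\tau^2}\sum_{j=1}^{p} \beta_j^2 \\
&= \arg\min_\beta \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \beta_j^2,
\qquad \lambda = \frac{\sigma^2}{\tau^2}.
\end{aligned}
```

Because a Gaussian prior with a Gaussian likelihood gives a Gaussian posterior, the ridge estimate here is also the posterior mean, not only the posterior mode.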
See https://www.zhihu.com/question/23536142
Further reading
Trevor Park & George Casella (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103(482), 681-686. DOI: 10.1198/016214508000000337
=================
Applications
The lasso, the Bayesian lasso, and their extensions can be fit using the monomvn package in R.