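- Lasso is also called L1 regularization: it penalizes the absolute values of the coefficients.
Ridge is also called L2 regularization: it penalizes the squares of the coefficients.
- After the ridge penalty, every coefficient shrinks, but none becomes exactly 0.
After the lasso penalty, some coefficients become exactly 0 and the others shrink.
- LASSO: least absolute shrinkage and selection operator
Lasso therefore performs variable selection.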
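The contrast between "shrinks to exactly 0" and "shrinks but stays nonzero" is easiest to see in the special case of an orthonormal design (X^T X = I), which is an assumption made here for illustration only. Under the conventions (1/2)||y - X beta||^2 + lambda * ||beta||_1 for lasso and (1/2)||y - X beta||^2 + (lambda/2) * ||beta||_2^2 for ridge, the lasso solution is the soft-thresholded OLS estimate and the ridge solution is the OLS estimate divided by (1 + lambda). A minimal sketch, with made-up OLS coefficients:

```python
def soft_threshold(b, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # coefficients with |b| <= lam are set exactly to zero


def ridge_shrink(b, lam):
    """Ridge solution for one coefficient under an orthonormal design."""
    return b / (1.0 + lam)  # proportional shrinkage, never exactly zero for b != 0


# Hypothetical OLS estimates and penalty level, chosen only for illustration.
ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

print("OLS  :", ols)
print("lasso:", lasso)
print("ridge:", ridge)
```

With a general (non-orthonormal) design there is no closed form for lasso, so this only illustrates the mechanism, not how a solver works in practice.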
===============
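What they have in common:
(1) When an intercept is present, neither method penalizes it
beta_0 = mean(y) (when the predictors are centered)
(2) Both are biased estimators (for lambda > 0)
(3) Both require standardizing the predictors (not the coefficients) before penalizing, because the penalty sums over all beta_j and must treat each coefficient fairly regardless of the units of its predictor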
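Points (1) and (3) can be sketched in a few lines. The helper name `standardize_columns` and the toy data are hypothetical. The sketch centers and scales each predictor column, after which the unpenalized intercept is simply the mean of y, because every standardized column has mean 0.

```python
from statistics import mean, pstdev


def standardize_columns(X):
    """Center each column to mean 0 and scale it to unit (population) std."""
    cols = list(zip(*X))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    Z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]
    return Z, means, sds


# Hypothetical toy data: the two columns are on very different scales.
X = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 4000.0]]
y = [1.0, 3.0, 2.0, 6.0]

Z, means, sds = standardize_columns(X)

# With centered predictors the intercept is not penalized and equals mean(y).
intercept = mean(y)
print("intercept:", intercept)
print("standardized X:", Z)
```

Without this step, a predictor measured in small units would get a small coefficient and be penalized less than the same predictor measured in large units.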
=============
On bias and variance
The bias of the lasso estimate increases as lambda increases
The variance of the lasso estimate decreases as lambda increases
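A rough Monte Carlo check of how bias and variance move with lambda, again in the simplified one-coefficient, orthonormal-design setting where the lasso estimate is the soft-thresholded OLS estimate. The true coefficient, noise level, seed, and number of draws are arbitrary assumptions. In this setting the magnitude of the bias grows with lambda while the variance falls.

```python
import random
from statistics import mean, pvariance


def soft_threshold(b, lam):
    """Lasso estimate of one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


random.seed(0)       # fixed seed so the run is reproducible
beta_true = 2.0      # assumed true coefficient
# Each draw plays the role of one OLS estimate: truth plus unit-variance noise.
draws = [random.gauss(beta_true, 1.0) for _ in range(5000)]

lambdas = [0.0, 0.5, 1.0, 2.0]
results = []
for lam in lambdas:
    est = [soft_threshold(z, lam) for z in draws]
    bias = mean(est) - beta_true
    var = pvariance(est)
    results.append((lam, bias, var))

for lam, bias, var in results:
    print(f"lambda={lam:.1f}  bias={bias:+.3f}  variance={var:.3f}")
```

This is the usual trade-off: a larger lambda buys lower variance at the cost of more bias, so lambda is normally chosen by cross-validation.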
=============
On prediction error
(1) Some say the two are similar: In terms of prediction error (or mean squared error), the lasso performs comparably to ridge regression.
(2) Some say ridge is better: “Typically ridge or ℓ2 penalties are **much better** for minimizing prediction error rather than ℓ1 penalties. The reason for this is that when two predictors are highly correlated, ℓ1 regularizer will simply pick one of the two predictors. In contrast, the ℓ2 regularizer will keep both of them and jointly shrink the corresponding coefficients a little bit. Thus, while the ℓ1 penalty can certainly reduce overfitting, you may also experience a loss in predictive power.”
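The correlated-predictor argument in the quote can be made concrete in the extreme case of two identical predictors, x_1 = x_2 (an assumption made purely for illustration). The fit then depends only on the sum c = beta_1 + beta_2, so the penalty alone decides how c is split:

```latex
\begin{aligned}
\text{Ridge:}\quad & \min_{\beta_1+\beta_2=c}\ \beta_1^2+\beta_2^2
  \;\Longrightarrow\; \beta_1=\beta_2=\tfrac{c}{2}
  \quad\text{(unique: both predictors kept, weight shared)} \\
\text{Lasso:}\quad & \min_{\beta_1+\beta_2=c}\ |\beta_1|+|\beta_2| = |c|,
  \quad\text{attained by every split with } \beta_1\beta_2 \ge 0, \\
  & \text{e.g. } (c,0),\ (0,c),\ (\tfrac{c}{2},\tfrac{c}{2})
  \quad\text{(not unique: a solver may keep only one predictor)}
\end{aligned}
```

This shows why lasso can arbitrarily pick one of two highly correlated predictors. It does not by itself settle which method predicts better on a given data set.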
===================
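Extensions
Bayesian Lasso
In a Bayesian framework, if the prior on the parameters is a Laplace distribution, maximizing the posterior probability (the MAP estimate) yields an objective function of lasso form.
Bayesian Ridge
If the prior on the parameters is a normal distribution, maximizing the posterior probability yields an objective function of ridge form.
See https://www.zhihu.com/question/23536142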
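The correspondence between prior and penalty can be written out as a short derivation. It assumes a Gaussian likelihood y | beta ~ N(X beta, sigma^2 I) with known sigma^2 and independent priors on the coefficients. The symbols b (Laplace scale) and tau^2 (normal prior variance) are notation introduced here, not taken from the sources above.

```latex
\begin{aligned}
\hat\beta_{\text{MAP}}
  &= \arg\min_\beta \Big[ -\log p(y \mid \beta) - \log p(\beta) \Big]
   = \arg\min_\beta \Big[ \tfrac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2 - \textstyle\sum_j \log p(\beta_j) \Big] \\[4pt]
\text{Laplace prior } p(\beta_j) = \tfrac{1}{2b}\, e^{-|\beta_j|/b}:\quad
  &\hat\beta_{\text{MAP}} = \arg\min_\beta\ \lVert y - X\beta\rVert_2^2 + \lambda \lVert\beta\rVert_1,
  \qquad \lambda = \tfrac{2\sigma^2}{b} \\[4pt]
\text{Normal prior } \beta_j \sim N(0,\tau^2):\quad
  &\hat\beta_{\text{MAP}} = \arg\min_\beta\ \lVert y - X\beta\rVert_2^2 + \lambda \lVert\beta\rVert_2^2,
  \qquad \lambda = \tfrac{\sigma^2}{\tau^2}
\end{aligned}
```

A tighter prior (smaller b or tau^2) corresponds to a larger lambda, that is, stronger shrinkage.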
Further reading
Trevor Park & George Casella (2008) The Bayesian Lasso, Journal of the
American Statistical Association, 103:482, 681-686, DOI: 10.1198/016214508000000337
=================
Applications
The lasso, Bayesian lasso, and extensions can be done using the
monomvn package in R.