Differences between lasso and ridge regression


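  • Lasso is also called L1 regularization: it penalizes the absolute values of the coefficients.

  Ridge is also called L2 regularization: it penalizes the squares of the coefficients.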

 

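  • Under the ridge penalty, every coefficient is shrunk toward zero, but none is set exactly to zero.

  Under the lasso penalty, some coefficients become exactly 0 and the rest are shrunk.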

 

  • LASSO: least absolute shrinkage and selection operator

  Because it can set coefficients exactly to zero, the lasso also performs variable selection.
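
The contrast between shrinking and zeroing is easiest to see in the special case of an orthonormal design (X^T X = I), where both estimators have closed forms in terms of the OLS estimate. The sketch below assumes the objectives (1/2)||y - Xb||^2 + (lam/2)||b||^2 for ridge and (1/2)||y - Xb||^2 + lam * ||b||_1 for lasso; the OLS values and lam are made-up numbers for illustration only.

```python
import numpy as np

def ridge_orthonormal(beta_ols, lam):
    """Ridge solution when X^T X = I: every coefficient is scaled by 1 / (1 + lam)."""
    return beta_ols / (1.0 + lam)

def lasso_orthonormal(beta_ols, lam):
    """Lasso solution when X^T X = I: soft-thresholding of the OLS estimate at lam."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([3.0, -1.5, 0.4, -0.2])  # hypothetical OLS estimates
lam = 0.5                                    # hypothetical penalty strength

ridge_coef = ridge_orthonormal(beta_ols, lam)
lasso_coef = lasso_orthonormal(beta_ols, lam)

print("ridge:", ridge_coef)  # all four coefficients shrink, none is zero
print("lasso:", lasso_coef)  # the two small coefficients are set to zero
```

For a general (non-orthonormal) design these closed forms no longer hold, but the qualitative behaviour is the same: ridge rescales, lasso thresholds.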

 

===============

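What they have in common:

(1) When an intercept term is present, neither method penalizes it. With centered predictors, the intercept estimate is

beta_0 = mean(y)

(2) Both estimators are biased.

(3) Both require the predictors (and hence the coefficients) to be put on a common scale before the penalty is applied, so that the penalty term, sum |beta_j| or sum beta_j^2, treats every coefficient fairly.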
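
Points (1) and (3) can be checked numerically. The sketch below standardizes two simulated predictors on very different scales, fits ridge slopes by the closed form, and confirms that a joint fit with an unpenalized intercept gives an intercept equal to mean(y). The data, the value of lam, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors on very different scales
X = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 100, n)])
y = 5.0 + 2.0 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 1, n)

# Standardize each column to mean 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

lam = 10.0
p = X_std.shape[1]

# Ridge slopes on standardized predictors; the intercept is handled separately
beta = np.linalg.solve(X_std.T @ X_std + lam * np.eye(p), X_std.T @ (y - y.mean()))
beta_0 = y.mean()

# Cross-check: fit intercept and slopes jointly, with zero penalty on the intercept
A = np.column_stack([np.ones(n), X_std])
P = np.diag([0.0] + [lam] * p)  # first diagonal entry 0 means the intercept is unpenalized
theta = np.linalg.solve(A.T @ A + P, A.T @ y)

print("intercept from joint fit:", theta[0], " mean(y):", beta_0)
```

Without the standardization step, the second predictor's coefficient would be tiny purely because of its units, and the penalty would barely touch it.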

 

=============

On bias and variance

The bias of the lasso estimate increases as lambda increases.

The variance of the lasso estimate decreases as lambda increases.
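
A small Monte Carlo experiment can show how bias and variance move with lambda. The sketch below assumes a single coefficient under an orthonormal design, so the OLS estimate is unbiased with unit variance and the lasso estimate is its soft-thresholded value; the true coefficient, the noise level, and the lambda grid are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_beta = 2.0  # hypothetical true coefficient

# Simulated OLS estimates: unbiased, variance 1 (an assumption of this sketch)
beta_ols = true_beta + rng.normal(0.0, 1.0, size=200_000)

def soft_threshold(z, lam):
    """Lasso estimate for one coefficient under an orthonormal design."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

results = []
for lam in (0.0, 1.0, 3.0):
    est = soft_threshold(beta_ols, lam)
    bias = est.mean() - true_beta  # Monte Carlo estimate of the bias
    var = est.var()                # Monte Carlo estimate of the variance
    results.append((lam, bias, var))
    print(f"lambda={lam:.1f}  bias={bias:+.3f}  variance={var:.3f}")
```

In this setup the magnitude of the bias grows with lambda while the variance falls, which is the usual bias-variance trade-off for shrinkage estimators.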

 

=============

Discussion of prediction error

(1) Some say the two are comparable: In terms of prediction error (or mean squared error), the lasso
performs comparably to ridge regression.

(2) Others say ridge is better: “Typically ridge or ℓ2 penalties are **much better** for minimizing prediction error rather than ℓ1 penalties. The reason for this is that when two predictors are highly correlated, ℓ1 regularizer will simply pick one of the two predictors. In contrast, the ℓ2 regularizer will keep both of them and jointly shrink the corresponding coefficients a little bit. Thus, while the ℓ1 penalty can certainly reduce overfitting, you may also experience a loss in predictive power.”
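
The mechanism described in the quote (not the prediction-error claim itself) can be illustrated with an extreme case: two identical predictors. The sketch below uses a hand-written cyclic coordinate descent for the lasso and the closed form for ridge; both functions, the simulated data, and lam are assumptions for illustration, not a reference implementation. With exact duplicates the lasso solution is not unique, and which column receives the weight here is a consequence of the update order starting from zero.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with predictor j's current contribution added back
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

def ridge(X, y, lam):
    """Closed form for (1/2n)||y - Xb||^2 + (lam/2) * ||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
x = (x - x.mean()) / x.std()
X = np.column_stack([x, x])  # two perfectly correlated predictors
y = 2.0 * x + rng.normal(0, 0.5, n)
y = y - y.mean()

lam = 0.1
b_lasso = lasso_cd(X, y, lam)
b_ridge = ridge(X, y, lam)

print("lasso:", b_lasso)  # weight lands on one column, the other stays at (numerically) zero
print("ridge:", b_ridge)  # weight is split equally across both columns
```

Whether this behaviour translates into worse prediction error depends on the data, which is why the two views quoted above can both be found in practice.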

===================

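Extensions

Bayesian Lasso

In a Bayesian framework, if the prior on the parameters is a Laplace distribution, then maximizing the posterior probability (MAP estimation) yields an objective function of the lasso form.

Bayesian Ridge

If the prior on the parameters is a normal distribution, then maximizing the posterior probability yields an objective function of the ridge form.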

https://www.zhihu.com/question/23536142
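
The correspondence between priors and penalties can be written out in a few lines. The derivation below is a sketch under assumptions not stated in the note: a Gaussian likelihood with known noise variance \sigma^2, independent priors on the coefficients, a Laplace scale b, and a normal prior variance \tau^2.

```latex
% Likelihood (assumed): y | \beta \sim N(X\beta, \sigma^2 I)
% MAP estimate: minimize the negative log posterior
\hat\beta_{\mathrm{MAP}}
  = \arg\min_{\beta}\; \frac{1}{2\sigma^2}\,\lVert y - X\beta\rVert_2^2 \;-\; \sum_{j} \log p(\beta_j)

% Laplace prior: p(\beta_j) = \frac{1}{2b}\exp\!\left(-\frac{|\beta_j|}{b}\right)
\hat\beta_{\mathrm{MAP}}
  = \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^2 + \lambda \lVert\beta\rVert_1,
  \qquad \lambda = \frac{2\sigma^2}{b}
  \quad \text{(lasso form)}

% Normal prior: p(\beta_j) = \frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{\beta_j^2}{2\tau^2}\right)
\hat\beta_{\mathrm{MAP}}
  = \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^2 + \lambda \lVert\beta\rVert_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}
  \quad \text{(ridge form)}
```

In both cases a tighter prior (smaller b or smaller \tau^2) corresponds to a larger penalty \lambda.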

Further reading

Trevor Park & George Casella (2008) The Bayesian Lasso, Journal of the
American Statistical Association, 103:482, 681-686, DOI: 10.1198/016214508000000337

 

=================

Applications

The lasso, Bayesian lasso, and extensions can be fitted using the
monomvn package in R.

