A Brief Discussion of Norm Regularization

    Author: 凱魯嘎吉 - 博客園 http://www.cnblogs.com/kailugaji/

    This post discusses what different norms do when used as regularization terms. It first reviews common vector and matrix norms and explains why a regularization term is added to a model. It then introduces the vector $L_0$, $L_1$ and $L_2$ norms and their effects as regularizers, compares the three, and uses a Bayesian viewpoint to interpret the traditional linear model and its regularization terms. After that, it covers the matrix $L_{2,1}$ norm and its generalization, the $L_{p,q}$ norm, as well as the matrix nuclear norm and its generalization, the Schatten norm. Finally, a MATLAB program plots the probability density functions of the Laplace and Gauss distributions. For how to solve optimization problems involving matrix norms, see the companion post 一類涉及矩陣范數的優化問題 (A Class of Optimization Problems Involving Matrix Norms) - 凱魯嘎吉 - 博客園.
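    All of the norms mentioned above can be evaluated in a few lines of MATLAB. The sketch below is added here purely for illustration and is not part of the original post; the example vector and matrix are arbitrary, and the row-wise convention is assumed for the $L_{2,1}$ norm.

%% Quick numeric check of the norms discussed in this post (illustrative sketch)
x = [3; 0; -4; 0];                    % example vector (arbitrary)
X = [1 2; 0 0; 3 -1];                 % example matrix (arbitrary)
L0  = nnz(x);                         % "L0 norm": number of nonzero entries
L1  = norm(x, 1);                     % L1 norm: sum of absolute values
L2  = norm(x, 2);                     % L2 norm: Euclidean length
L21 = sum(sqrt(sum(X.^2, 2)));        % L2,1 norm: sum of the 2-norms of the rows
nuc = sum(svd(X));                    % nuclear norm: sum of singular values
p   = 3;
Sp  = sum(svd(X).^p)^(1/p);           % Schatten p-norm (p = 2 gives the Frobenius norm, p = 1 the nuclear norm)
fprintf('L0=%d, L1=%.2f, L2=%.2f, L2,1=%.2f, nuclear=%.2f, Schatten-%d=%.2f\n', ...
    L0, L1, L2, L21, nuc, p, Sp);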

1. Vector norms and matrix norms

2. Why add a regularization term?

3. The $L_0$ norm

4. The $L_1$ norm

5. The $L_2$ norm

6. Differences between the $L_1$ and $L_2$ norms as regularization terms

7. A probabilistic view of the traditional linear regression model

8. The $L_2$ norm is equivalent to a Gauss prior

9. The $L_1$ norm is equivalent to a Laplace prior

10. The matrix $L_{2,1}$ norm and the $L_{p,q}$ norm

11. The matrix nuclear norm and the Schatten norm

12. MATLAB program: probability density plots of the Laplace and Gauss distributions

%% Demo of Laplace Density Function
% x      : variable
% miu    : location parameter
% lambda : scale parameter
clear
clc
x = -10:0.1:10;
y_1=Laplace_distribution(x, 0, 1);
y_2=Laplace_distribution(x, 0, 2);
y_3=Laplace_distribution(x, 0, 4);
y_4=Laplace_distribution(x, -5, 4);
y_5=Laplace_distribution(x, 5, 4);
y_6=normpdf(x,0,1);
plot(x, y_1, 'r-', x, y_2, 'g-', x, y_3, 'c-', x, y_4, 'm-', x, y_5, 'y-', x, y_6, 'b-', 'LineWidth',1.2);
legend('\mu=0, \lambda=1','\mu=0, \lambda=2','\mu=0, \lambda=4','\mu=-5, \lambda=4','\mu=5, \lambda=4', '\mu=0, \sigma=1'); % legend settings
xlabel('x');
ylabel('f(x)');
title('Laplace vs Gauss pdf');
set(gca, 'FontName', 'Times New Roman', 'FontSize',11);
saveas(gcf, 'demo_Laplace_Gauss.jpg'); % save the figure

%% Laplace Density Function
% f(x) = exp(-|x - miu|/lambda) / (2*lambda), with scale lambda > 0
function y=Laplace_distribution(x, miu, lambda)
    y = 1/(2*lambda) * exp(-abs(x-miu)/lambda);
end
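As a quick sanity check that the density above is correctly normalized, the following lines (an illustrative addition, not from the original post; they repeat the formula inline so they can be run on their own) numerically integrate the pdf:

%% Sanity check (illustrative addition): the Laplace pdf should integrate to about 1
xx = -50:0.01:50;
mu = 0; lambda = 2;
f  = exp(-abs(xx - mu)/lambda) / (2*lambda);
fprintf('integral of the Laplace pdf over [-50, 50]: %.4f\n', trapz(xx, f));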

13. References

[1] Proof that the nuclear norm is the convex envelope of the matrix rank

Candès E J, Recht B. Exact Matrix Completion via Convex Optimization[J]. Foundations of Computational Mathematics, 2009, 9(6): 717.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.312.1183&rep=rep1&type=pdf

[2] Papers and lecture notes showing that the $L_1$ norm is the convex envelope of the $L_0$ norm

Donoho D L, Huo X. Uncertainty Principles and Ideal Atomic Decomposition[J]. IEEE Transactions on Information Theory, 2001, 47(7): 2845-2862.

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=00BC0C50CDECB265657379792F917FFE?doi=10.1.1.161.9300&rep=rep1&type=pdf

Learning with Combinatorial Structure Note for Lecture 12

http://people.csail.mit.edu/stefje/fall15/notes_lecture12.pdf

L1-norm Methods for Convex-Cardinality Problems

https://web.stanford.edu/class/ee364b/lectures/l1_slides.pdf

[3] Lecture notes on overfitting and the source of the figures

2017 Lecture 2: Overfitting. Regularization

https://www.cs.mcgill.ca/~dprecup/courses/ML/Lectures/ml-lecture02.pdf

[4] Additional reading

The difference between L1 and L2 regularization

https://explained.ai/regularization/L1vsL2.html

Why L1 norm for sparse models

https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models

Why L1 regularization can “zero out the weights” and therefore leads to sparse models? [duplicate]

https://stats.stackexchange.com/questions/375374/why-l1-regularization-can-zero-out-the-weights-and-therefore-leads-to-sparse-m

What are L1, L2 and Elastic Net Regularization in neural networks?

https://www.machinecurve.com/index.php/2020/01/21/what-are-l1-l2-and-elastic-net-regularization-in-neural-networks/

Introduction. Sharpness Enhancement and Denoising of Image Using L1-Norm Minimization Technique in Adaptive Bilateral Filter. 

https://www.ijsr.net/archive/v3i11/T0NUMTQxMzUy.pdf

