How to understand the various norms in machine learning and statistics | L1 | L2 | Which regularization method should you use?

Reposted article (see the original), 2018-04-07 19:02

References:

L1 Norm Regularization and Sparsity Explained for Dummies: an article written for complete beginners, in a very humorous style.

why does a small L1 norm give a sparse solution?
why does a sparse solution avoid over-fitting?
what does regularization do really?

Reducing the number of features helps prevent overfitting, especially when there are far more features than samples.

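In two dimensions, the L1 norm ball is a quadrilateral, specifically a diamond (the L1 norm is |x| + |y|). It has a fixed shape but no fixed size, so it can be scaled up or down freely. With two parameters, the parameter values that fit the data form a straight line, which means there are infinitely many ways to choose the parameters. Because we also want to satisfy the L1 constraint, we choose the parameter pair at the point where the diamond touches that line. This point is typically a corner of the diamond, where one parameter is exactly zero, which is why a small L1 norm gives a sparse solution.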
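The geometric argument can be checked numerically. The sketch below assumes a made-up line, w1 + 2*w2 = 2, standing in for "all parameter pairs that fit the data" (the line and the helper names are illustrative, not from the original article). It scans points on that line and picks the one with the smallest L1 norm and the one with the smallest L2 norm.

```python
# Hypothetical example: every (w1, w2) on the line w1 + 2*w2 = 2
# is assumed to fit the data equally well.
def on_line(t):
    """Return the point on the line w1 + 2*w2 = 2 whose first coordinate is t."""
    return (t, (2.0 - t) / 2.0)

def l1(w):
    """L1 norm of a 2D point: |w1| + |w2|."""
    return abs(w[0]) + abs(w[1])

def l2(w):
    """L2 norm of a 2D point: sqrt(w1^2 + w2^2)."""
    return (w[0] ** 2 + w[1] ** 2) ** 0.5

# Scan the line on a fine grid (t from -3 to 3 in steps of 0.001).
candidates = [on_line(i / 1000.0) for i in range(-3000, 3001)]

w_l1 = min(candidates, key=l1)  # where the smallest diamond touches the line
w_l2 = min(candidates, key=l2)  # where the smallest circle touches the line

print("min-L1 point:", w_l1)
print("min-L2 point:", w_l2)
```

On this line the minimum-L1 point sits on an axis (w1 is exactly zero), while the minimum-L2 point, roughly (0.4, 0.8), keeps both parameters nonzero.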

Then why not let p < 1? Because when p < 1, there are computational difficulties. So in practice we usually choose only between L1 and L2; this is for computational reasons, not because other values of p are impossible.
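One concrete reason p < 1 is awkward, beyond what the quoted answer spells out, is that the p-norm formula then stops satisfying the triangle inequality, so it is no longer a true norm and the resulting penalty is non-convex. A small sketch with p = 0.5 (the function name `p_measure` is a hypothetical helper):

```python
def p_measure(x, p):
    """Apply the p-norm formula (sum |x_i|^p)^(1/p) for any p > 0."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [1.0, 0.0]
y = [0.0, 1.0]
x_plus_y = [a + b for a, b in zip(x, y)]

p = 0.5
lhs = p_measure(x_plus_y, p)               # "length" of x + y
rhs = p_measure(x, p) + p_measure(y, p)    # sum of the two "lengths"

# A norm requires lhs <= rhs; with p = 0.5 this fails.
print(lhs, rhs, lhs <= rhs)
```

With p = 1 or p = 2 the same check passes, which fits the practical habit of staying with L1 and L2.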

l0-Norm, l1-Norm, l2-Norm, … , l-infinity Norm

$\left\| x \right\|_p = \sqrt[p]{\sum_{i}\left| x_i \right|^p}$ where $p \in \mathbb{R}$ (a true norm requires $p \ge 1$)

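It is just one simple formula, and with it every norm becomes immediately understandable. (Note the notation: the order p is written as a subscript, and the vector is enclosed in double vertical bars.)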
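The formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `p_norm` and the sample vector are illustrative choices, and the infinity norm is handled as the limiting case (the largest absolute entry).

```python
import math

def p_norm(x, p):
    """Return the l_p norm of a sequence of numbers for p >= 1 or p = math.inf."""
    if p == math.inf:
        # Limit of the formula as p grows: the largest absolute entry.
        return max(abs(v) for v in x)
    if p < 1:
        raise ValueError("p must be >= 1 for a true norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [1.0, -2.0, 2.0]
print(p_norm(x, 1))         # |1| + |-2| + |2| = 5
print(p_norm(x, 2))         # sqrt(1 + 4 + 4) = 3
print(p_norm(x, math.inf))  # max(1, 2, 2) = 2
```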

Before answering your question, I should point out that the Manhattan norm is actually the L1 norm and the Euclidean norm is the L2 norm.

As for real-life meaning, the Euclidean norm measures the beeline (as-the-crow-flies) distance, i.e. just the length of the line segment connecting two points. However, when we move around, especially in a crowded city area like Manhattan, we obviously cannot follow a straight line (unless you can fly like a bird). Instead, we need to follow a grid-like route, e.g. 3 blocks to the west, then 4 blocks to the south. The length of this grid route is the Manhattan norm.
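The city-block example can be worked out in a few lines. In this sketch, west and south are encoded as negative x and y offsets (an arbitrary convention chosen for illustration), and the two function names are hypothetical helpers.

```python
import math

def manhattan_distance(dx, dy):
    """Length of the grid route: total blocks walked along each axis."""
    return abs(dx) + abs(dy)

def euclidean_distance(dx, dy):
    """Length of the straight line segment between start and end."""
    return math.hypot(dx, dy)

# 3 blocks to the west, then 4 blocks to the south.
dx, dy = -3, -4

print(manhattan_distance(dx, dy))  # 7 blocks on foot
print(euclidean_distance(dx, dy))  # 5.0 blocks as the bird flies
```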

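My earlier impression was that L1 is Lasso: its constraint region is a quadrilateral, corresponding to a sum of absolute values.

L2 is Ridge: its constraint region is a circle.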
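The practical difference between the two penalties is easiest to see in a special case. The sketch below assumes an orthonormal design (X^T X = I) and the objectives 1/2 * ||y - Xw||^2 + lam * ||w||_1 for Lasso and 1/2 * ||y - Xw||^2 + lam/2 * ||w||_2^2 for Ridge. Under those assumptions the Lasso solution soft-thresholds each least-squares coefficient, while Ridge rescales every coefficient by 1 / (1 + lam). The coefficient values and function names are made up for illustration; general designs need an iterative solver instead.

```python
def lasso_orthonormal(b, lam):
    """Soft-threshold one least-squares coefficient (Lasso, orthonormal design)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_orthonormal(b, lam):
    """Shrink one least-squares coefficient proportionally (Ridge, orthonormal design)."""
    return b / (1.0 + lam)

# Hypothetical least-squares coefficients and penalty strength.
ols = [3.0, -0.5, 0.2, 1.5]
lam = 1.0

lasso = [lasso_orthonormal(b, lam) for b in ols]
ridge = [ridge_orthonormal(b, lam) for b in ols]

print("lasso:", lasso)  # some entries become exactly 0.0
print("ridge:", ridge)  # every entry shrinks but stays nonzero
```

Lasso zeroes out the two small coefficients, which amounts to feature selection, while Ridge keeps all four features with smaller weights.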
