References:
L1 Norm Regularization and Sparsity Explained for Dummies: an article written specifically for beginners, in a very humorous style.
- why does a small L1 norm give a sparse solution?
- why does a sparse solution avoid over-fitting?
- what does regularization do really?
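Reducing the number of features helps prevent overfitting, especially when there are many more features than samples.
In two dimensions the L1 constraint region is a quadrilateral, a diamond (the L1 norm is |x| + |y|). Only its shape is fixed, not its size, so it can be shrunk or enlarged freely. With two parameters, the solutions that fit the data form a straight line, so there are infinitely many ways to choose the parameters. Since we also want to satisfy the L1 constraint, we pick the parameter pair where the line meets the diamond: grow the diamond from the origin until it first touches the line. That first contact usually happens at a corner, which lies on an axis, so one of the two parameters is exactly zero. This is where the sparsity comes from.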
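The "grow the ball until it touches the line" picture can be checked numerically. The sketch below is an illustration only. It assumes a made-up line 2x + y = 4 and a brute-force grid over x in [-10, 10], and `closest_point_on_line` is a hypothetical helper, not from any library. It finds the point on the line with the smallest L1 norm and the smallest L2 norm.

```python
def closest_point_on_line(a, b, c, p, lo=-10.0, hi=10.0, steps=20_000):
    """Brute-force search along the line a*x + b*y = c (requires b != 0) for the
    point with the smallest Lp norm, i.e. where an Lp ball grown from the
    origin first touches the line."""
    best = None
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        y = (c - a * x) / b
        norm = (abs(x) ** p + abs(y) ** p) ** (1.0 / p)
        if best is None or norm < best[0]:
            best = (norm, x, y)
    return best[1], best[2]

# Hypothetical line 2x + y = 4: every point on it fits the "data" equally well.
x1, y1 = closest_point_on_line(2, 1, 4, p=1)  # L1 diamond
x2, y2 = closest_point_on_line(2, 1, 4, p=2)  # L2 circle
print(x1, y1)                      # 2.0 0.0: touches at a corner, y is exactly 0
print(round(x2, 3), round(y2, 3))  # 1.6 0.8: both coordinates non-zero
```

The L1 solution sits on the x axis, while the L2 solution (the foot of the perpendicular from the origin) keeps both parameters non-zero.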

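Then why not let p < 1? Because when p < 1 there are computational difficulties: the function is no longer convex, so it is much harder to optimize. That is why we usually choose only between L1 and L2. The reason is computational; it is not that p < 1 is impossible.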
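A small illustration of what goes wrong for p < 1, assuming p = 0.5 and the two unit vectors (1, 0) and (0, 1) as test points (`lp_value` is a hypothetical helper). Both points have value 1, yet their midpoint has value about 2, so the "unit ball" is not convex; the triangle inequality fails as well.

```python
def lp_value(v, p):
    """(sum |v_i|^p)^(1/p); this is a true norm only for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

p = 0.5
a, b = (1.0, 0.0), (0.0, 1.0)
mid = tuple((ai + bi) / 2 for ai, bi in zip(a, b))  # midpoint (0.5, 0.5)
s = tuple(ai + bi for ai, bi in zip(a, b))          # sum (1.0, 1.0)

print(lp_value(a, p), lp_value(b, p))  # 1.0 1.0: both lie on the unit "ball"
print(lp_value(mid, p))                # about 2.0 > 1: the midpoint falls outside
print(lp_value(s, p))                  # 4.0 > 1.0 + 1.0: triangle inequality fails
```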
l0-Norm, l1-Norm, l2-Norm, … , l-infinity Norm
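The general Lp norm is $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$, where $p$ is a real number.
It is just one simple formula, and with it every norm becomes instantly understandable. (Note the notation: p is written as a subscript, and the norm uses double vertical bars.)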
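The general form $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$ can be sketched as one function. Following the usual convention, p = 0 is treated as the count of non-zero entries (the "L0 norm", which is not a true norm) and p = infinity as the largest absolute entry. The function name and the example vector are made up for illustration.

```python
import math

def lp_norm(v, p):
    """Compute (sum |v_i|^p)^(1/p), with p = 0 meaning 'number of non-zeros'
    and p = math.inf meaning 'largest absolute entry'."""
    if p == 0:
        return sum(1 for x in v if x != 0)
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = [1.0, -2.0, 0.0, 2.0]
for p in (0, 1, 2, math.inf):
    print(p, lp_norm(v, p))  # 3, 5.0, 3.0, 2.0 respectively
```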
Before answering your question, I need to correct one point: the Manhattan norm is actually the L1 norm, and the Euclidean norm is the L2 norm.
As for real-life meaning, the Euclidean norm measures the straight-line ("as the crow flies") distance, i.e. just the length of the line segment connecting two points. However, when we move around, especially in a crowded city area like Manhattan, we obviously cannot follow a straight line (unless you can fly like a bird). Instead, we need to follow a grid-like route, e.g. 3 blocks to the west, then 4 blocks to the south. The length of this grid route is the Manhattan norm.
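The city-block example works out as follows, assuming x grows eastward, y grows northward, and one block is one unit (these coordinates are an illustrative choice).

```python
import math

# Walk 3 blocks west, then 4 blocks south.
start = (0, 0)
end = (-3, -4)

dx, dy = end[0] - start[0], end[1] - start[1]
manhattan = abs(dx) + abs(dy)   # L1: length of the grid route
euclidean = math.hypot(dx, dy)  # L2: straight-line distance

print(manhattan, euclidean)  # 7 5.0
```

The grid route is 7 blocks, while the bird's straight line is only 5.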
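My earlier impression: L1 is Lasso, whose constraint region is a quadrilateral (a diamond), corresponding to the absolute value.
L2 is Ridge, whose constraint region is a circle.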
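The practical difference between Lasso and Ridge shows up clearly in a simplified setting. This sketch assumes orthonormal features, so the penalized problem splits into one-dimensional problems per coefficient, and uses the penalty scalings written in the docstrings (conventions vary). Under those assumptions Lasso is soft thresholding and Ridge is proportional shrinkage. The unregularized estimates `ols` and `lam` are made-up numbers.

```python
def lasso_coef(z, lam):
    """argmin_w 0.5*(w - z)**2 + lam*|w|  (soft thresholding)."""
    if abs(z) <= lam:
        return 0.0  # small coefficients are set exactly to zero
    return z - lam if z > 0 else z + lam

def ridge_coef(z, lam):
    """argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2  (proportional shrinkage)."""
    return z / (1.0 + lam)

ols = [3.0, 0.5, -0.2, -2.0]  # made-up unregularized estimates
lam = 1.0
print([lasso_coef(z, lam) for z in ols])  # [2.0, 0.0, 0.0, -1.0]
print([ridge_coef(z, lam) for z in ols])  # [1.5, 0.25, -0.1, -1.0]
```

Lasso (the diamond) zeroes out the two small coefficients, while Ridge (the circle) only shrinks every coefficient toward zero without reaching it.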
