An unbiased risk estimator vs. the same minimizer
- The corrected loss yields an unbiased (consistent) estimate of the clean risk (a numerical check follows this list):
\[\mathbb E_{p(x,\tilde{y})}[\ell_{correct}^1(h(x),\tilde{y})] = \mathbb E_{p(x,y)}[\ell(h(x),y)],\forall\,h \]
- The two objectives share the same minimizer:
\[\mathop{\arg\!\min}_h\mathbb E_{p(x,\tilde{y})}[\ell_{correct}^2(h(x),\tilde{y})] = \mathop{\arg\!\min}_h\mathbb E_{p(x,y)}[\ell(h(x),y)] \]
Condition (2) above is weaker than condition (1):
- (1) implies (2).
- (2) may still hold when (1) fails.
- (2) essentially amounts to preserving the same Bayes-optimal classifier.
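A minimal numerical sketch of condition (1), assuming class-conditional label noise with a known, invertible transition matrix \(T\) (with \(T_{ij}=p(\tilde y=j\mid y=i)\)) and the backward-corrected loss \(T^{-1}\ell\) in the spirit of NIPS-13 / CVPR-17; the names `T`, `loss`, `loss_corrected` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                   # number of classes

# Class-conditional noise: T[i, j] = p(noisy label j | clean label i);
# rows sum to 1 and T is assumed invertible.
T = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])

# loss[y] = ell(h(x), y) for one fixed predictor h and input x
# (arbitrary values, only used to check the identity).
loss = rng.uniform(0.0, 2.0, size=K)

# Backward correction: the corrected loss vector is T^{-1} @ loss.
loss_corrected = np.linalg.solve(T, loss)

# Unbiasedness (condition (1)): for every clean class y,
#   E_{noisy y ~ T[y, :]}[loss_corrected[noisy y]] = loss[y],
# i.e. T @ loss_corrected == loss.
print(np.allclose(T @ loss_corrected, loss))    # True
```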
Reference:
- NIPS-13. Learning with Noisy Labels
- CVPR-17. Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
- ICML-20. Does Label Smoothing Mitigate Label Noise?
- ICML-20. Learning with Multiple Complementary Labels
Statistical consistency, classifier-consistency, risk-consistency
Statistical consistency is mainly about the asymptotic regime \(n\to\infty\): whether the difference between the two vanishes in that limit.
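A rough formalization of the two notions as used in the references below, with symbols introduced here for illustration: \(\hat h_n\) the classifier learned from \(n\) (weakly or noisily labeled) samples, \(R\) the risk under the clean distribution, and \(h^\ast\) its minimizer,
\[ \text{risk-consistent: } R(\hat h_n)\to R(h^\ast), \qquad \text{classifier-consistent: } \hat h_n \to h^\ast \ (\text{the Bayes-optimal classifier}), \qquad \text{as } n\to\infty. \]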
Reference:
- NIPS-19. Are Anchor Points Really Indispensable in Label-Noise Learning? (discusses risk-consistent and classifier-consistent)
- ICML-20. Does Label Smoothing Mitigate Label Noise? (discusses classification consistency)
- ICML-13. On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
- ICML-20. Progressive Identification of True Labels for Partial-Label Learning (classifier-consistency)
- ICML-20. Learning with Multiple Complementary Labels (classifier-consistency, risk-consistent)
- NIPS-20. Provably Consistent Partial-Label Learning (risk-consistent, classifier-consistent)
Excess risk bound vs. generalization bound vs. learnability
(1). The excess risk is the gap between the generalization error of the current classifier (e.g., the one produced by ERM) and that of the optimal classifier.
(2). A generalization bound controls the uniform gap between the empirical error and the generalization error; it has to hold simultaneously for every hypothesis in the hypothesis space, which is why the complexity of the space is characterized via Rademacher complexity or the VC dimension.
(3). Given a generalization bound, an excess risk bound follows almost immediately; the excess risk is essentially at most twice the uniform deviation (see Foundations of Machine Learning (2nd ed.), Proposition 4.1, and the decomposition after this list).
(4). Learnability concerns the gap between the generalization error of the classifier output by ERM and that of the optimal classifier, which is precisely the excess risk.
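The standard one-line argument behind (3), with notation introduced here for illustration: \(\hat R\) the empirical risk, \(R\) the true risk, \(h_{\mathrm{ERM}}\) the empirical risk minimizer, and \(h^\ast\) the minimizer of \(R\) over the hypothesis space \(\mathcal H\),
\[ R(h_{\mathrm{ERM}}) - R(h^\ast) = \underbrace{R(h_{\mathrm{ERM}}) - \hat R(h_{\mathrm{ERM}})}_{\le\, \sup_{h\in\mathcal H}|R(h)-\hat R(h)|} + \underbrace{\hat R(h_{\mathrm{ERM}}) - \hat R(h^\ast)}_{\le\, 0} + \underbrace{\hat R(h^\ast) - R(h^\ast)}_{\le\, \sup_{h\in\mathcal H}|R(h)-\hat R(h)|} \le 2\sup_{h\in\mathcal H}\bigl|R(h)-\hat R(h)\bigr|. \]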
Reference:
- ICML20. Class-Weighted Classification: Trade-offs and Robust Approaches.
- ICML20. Learning with Bounded Instance- and Label-dependent Label Noise.
Plug-in classifiers
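A minimal sketch of a plug-in classifier, assuming binary labels and scikit-learn's LogisticRegression as the posterior estimator (any estimate \(\hat\eta(x)\approx p(y=1\mid x)\) would do); the class-weighted variant referenced below simply shifts the threshold away from 1/2. Function and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def plug_in_classifier(X_train, y_train, X_test, cost_fp=1.0, cost_fn=1.0):
    """Plug-in rule: estimate eta(x) = p(y=1 | x), then threshold it.

    With asymmetric costs the optimal threshold moves from 1/2 to
    cost_fp / (cost_fp + cost_fn), i.e. the class-weighted variant.
    """
    # Step 1: plug in any estimator of the class posterior.
    eta_hat = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]
    # Step 2: threshold the estimated posterior.
    threshold = cost_fp / (cost_fp + cost_fn)
    return (eta_hat >= threshold).astype(int)

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
print(plug_in_classifier(X[:150], y[:150], X[150:]))
```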
Reference:
- NIPS-09
- ICML-20.
- ICML-20. Class-Weighted Classification: Trade-offs and Robust Approaches
- The rejected paper from an earlier review
Losses unbounded below lead to overfitting
Unlike the 0-1 error, convex losses are typically unbounded and can therefore place excessively large weight on outliers.
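A small numerical contrast of this point (hypothetical margin values, logistic loss as the convex surrogate): on a badly misclassified outlier the 0-1 loss contributes at most 1, while the unbounded convex loss grows with the margin violation and can dominate the average.

```python
import numpy as np

def zero_one_loss(margin):
    # 0-1 loss on the margin z = y * f(x): 1 if misclassified, else 0.
    return (margin <= 0).astype(float)

def logistic_loss(margin):
    # A typical convex surrogate; grows without bound as the margin -> -inf.
    return np.log1p(np.exp(-margin))

# Margins y * f(x): four reasonable points plus one extreme outlier.
margins = np.array([2.0, 1.5, 1.0, 0.5, -20.0])

print(zero_one_loss(margins))   # [0. 0. 0. 0. 1.]  -> outlier contributes at most 1
print(logistic_loss(margins))   # last entry ~= 20  -> outlier dominates the average
```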
Reference:
- NIPS-09
- ICML-20. Learning with Multiple Complementary Labels
- NIPS-19. Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
0-1 loss non-convex, non-smooth
The Bayes classifier is in fact optimizing the 0-1 loss, i.e., minimizing the probability of error.
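Written out (standard notation, introduced here for illustration): the 0-1 risk equals the error probability, and it is minimized pointwise by predicting the most probable class,
\[ R_{0\text{-}1}(h)=\mathbb E_{p(x,y)}\bigl[\mathbf 1[h(x)\neq y]\bigr]=\mathbb P\bigl(h(x)\neq y\bigr),\qquad h^{\mathrm{Bayes}}(x)=\mathop{\arg\!\max}_{y}\, p(y\mid x). \]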
Reference:
- Neurocomputing-15. Making Risk Minimization Tolerant to Label Noise