Co-Training vs Self-Training


First of all, in practical classification scenarios we often have only a small amount of labeled data, while far more of the data is unlabeled. Co-training and self-training are two algorithms designed to handle exactly this situation.

 

The two algorithms are described below.

1. Self-training:

Use the existing labeled data to build an initial classifier, then use it to estimate labels for the unlabeled data.

Next, combine the original labeled data with the newly estimated "pseudo-labeled" data and train a new classifier on the union.

Repeat the steps above until all of the unlabeled data has been classified.
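
A minimal sketch of this loop in Python, assuming scikit-learn and numpy arrays; the base learner, the confidence threshold, and the iteration cap are illustrative assumptions (the description above simply repeats until all unlabeled data is absorbed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_labeled, y_labeled, X_unlabeled,
                  confidence=0.9, max_iter=20):
    """Iteratively pseudo-label confident unlabeled samples and retrain."""
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000)   # illustrative base learner
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)                     # train on the current labeled set
        proba = clf.predict_proba(X_u)        # estimate the unlabeled data
        picked = proba.max(axis=1) >= confidence
        if not picked.any():                  # nothing confident enough; stop
            break
        pseudo = clf.classes_[proba[picked].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[picked]])   # add the pseudo-labeled data
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~picked]
    clf.fit(X_l, y_l)                         # final classifier
    return clf
```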

2. Co-training:

Co-training is used in a special case of the more general multi-view learning setting.

That is, it applies when the training data can be looked at from different views. For example, when building a web-page classification model, the features come from two sources: the URL features of the websites, denoted A, and the text features of the websites, denoted B.

The co-training algorithm is:

• Inputs: an initial collection of labeled documents and a collection of unlabeled documents.

• Loop while there exist documents without class labels:

        • Build classifier A using the A portion of each document.

        • Build classifier B using the B portion of each document.

        • For each class C, pick the unlabeled document about which classifier A is most confident that its class label is C, and add it to the collection of labeled documents.

        • For each class C, pick the unlabeled document about which classifier B is most confident that its class label is C, and add it to the collection of labeled documents.

• Output: two classifiers, A and B, that predict class labels for new documents. These predictions can be combined by multiplying together and then renormalizing their class probability scores.

In other words, build two classifiers from feature sets A and B respectively. Each classifier iterates in the self-training manner on its own (absorbing newly labeled unlabeled data each round), and when the two self-training runs finish, the two classifiers make predictions together.

The main idea is: for data whose features split naturally, build a separate classifier from each feature group; classifiers built from different features can complement one another.
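
A minimal sketch of the algorithm listed above, assuming each document arrives as two aligned feature matrices (the A/URL view and the B/text view) and using scikit-learn's MultinomialNB as an illustrative base learner; the iteration cap and the tie-breaking rule are assumptions not specified in the original description:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_training(Xa_l, Xb_l, y_l, Xa_u, Xb_u, max_iter=30):
    """Grow the labeled set: each view labels its most confident
    unlabeled document per class, then both classifiers retrain."""
    clf_a, clf_b = MultinomialNB(), MultinomialNB()   # illustrative learners
    for _ in range(max_iter):
        if len(Xa_u) == 0:
            break
        clf_a.fit(Xa_l, y_l)          # classifier A: the A (URL) view
        clf_b.fit(Xb_l, y_l)          # classifier B: the B (text) view
        new_labels = {}
        for clf, X_u in ((clf_a, Xa_u), (clf_b, Xb_u)):
            proba = clf.predict_proba(X_u)
            for c_idx, c in enumerate(clf.classes_):
                i = int(proba[:, c_idx].argmax())  # most confident doc for class c
                new_labels.setdefault(i, c)        # first picker wins on overlap
        idx = sorted(new_labels)
        Xa_l = np.vstack([Xa_l, Xa_u[idx]])  # the same documents join both views
        Xb_l = np.vstack([Xb_l, Xb_u[idx]])
        y_l = np.concatenate([y_l, [new_labels[i] for i in idx]])
        keep = np.setdiff1d(np.arange(len(Xa_u)), idx)
        Xa_u, Xb_u = Xa_u[keep], Xb_u[keep]
    return clf_a, clf_b

def predict_combined(clf_a, clf_b, Xa, Xb):
    """Combine views by multiplying class probabilities and renormalizing."""
    p = clf_a.predict_proba(Xa) * clf_b.predict_proba(Xb)
    p /= p.sum(axis=1, keepdims=True)
    return clf_a.classes_[p.argmax(axis=1)]
```

Here `predict_combined` implements the Output step's multiply-and-renormalize rule for combining the two classifiers' predictions.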

 

To summarize:

The most intuitive difference between co-training and self-training is that, during learning, the former has two classifiers while the latter has only one.

 

