Python vs. Machine Learning: Semi-Supervised Learning

Reposted article (see the original), 2017-09-23 20:57. Category: Machine Learning

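Semi-supervised learning combines labeled and unlabeled data to produce a suitable classification function. It is a family of algorithms that can automatically exploit unlabeled data to improve learning performance.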

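1. Generative semi-supervised learning

Advantages: the method is simple and easy to implement. When labeled data is extremely scarce, generative semi-supervised methods usually outperform other methods.

Disadvantages: the assumed generative model must match the true data distribution; if it does not, results can be very poor. Specifying a generative model that matches the true distribution requires thorough knowledge of the problem domain.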
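To make the idea concrete, here is a minimal sketch of one common generative approach: fitting a Gaussian mixture with EM, where labeled samples keep fixed class memberships and unlabeled samples receive soft memberships. The function name `fit_semi_supervised_gmm`, the shared spherical variance, and the synthetic two-blob data are all assumptions made for this illustration; they do not come from the original article.

```python
import numpy as np

def fit_semi_supervised_gmm(X_l, y_l, X_u, n_classes, n_iter=30):
    """EM for a Gaussian mixture with one component per class.

    Assumptions of this sketch: all classes share one spherical variance,
    and every class has at least one labeled sample.
    Returns (priors, means, variance, predicted labels for X_u).
    """
    n_l, n_features = X_l.shape
    X = np.vstack([X_l, X_u])
    # Responsibilities: one-hot and fixed for labeled rows, zero for
    # unlabeled rows so that the first M-step uses labeled data only.
    R = np.zeros((len(X), n_classes))
    R[np.arange(n_l), y_l] = 1.0

    for _ in range(n_iter):
        # M-step: re-estimate parameters from the current responsibilities.
        Nk = R.sum(axis=0)
        priors = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variance = max((R * sq_dist).sum() / (Nk.sum() * n_features), 1e-9)

        # E-step: update the responsibilities of the unlabeled rows only.
        log_p = np.log(priors) - 0.5 * sq_dist[n_l:] / variance
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        R[n_l:] = resp / resp.sum(axis=1, keepdims=True)

    return priors, means, variance, R[n_l:].argmax(axis=1)

# Synthetic data (hypothetical): two well-separated 2-D Gaussian blobs.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0]])
y_true = np.repeat([0, 1], 100)
X_all = centers[y_true] + rng.randn(200, 2)
labeled = np.r_[0:5, 100:105]  # 5 labeled points per class
unlabeled = np.setdiff1d(np.arange(200), labeled)

priors, means, variance, y_pred = fit_semi_supervised_gmm(
    X_all[labeled], y_true[labeled], X_all[unlabeled], n_classes=2)
accuracy = (y_pred == y_true[unlabeled]).mean()
print("Accuracy on unlabeled points: %f" % accuracy)
```

The sketch works here because the data really are Gaussian blobs. If the true distribution differed from this assumption, the same procedure could pull the class means in the wrong direction, which is the weakness described above.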

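2. Graph-based semi-supervised learning

(1) Label propagation algorithm:

Advantages: conceptually clear

Disadvantages: high storage overhead, so it is hard to apply directly to large-scale data; moreover, when new samples arrive, the graph must be rebuilt and label propagation rerun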
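A quick back-of-the-envelope calculation shows why storage becomes the bottleneck. The sketch below assumes the graph is stored as a dense n x n matrix of 8-byte floats; the helper name `dense_affinity_bytes` is made up for this illustration.

```python
def dense_affinity_bytes(n_samples, bytes_per_entry=8):
    """Bytes needed to store a dense n x n affinity matrix (float64 by default)."""
    return n_samples * n_samples * bytes_per_entry

for n in (1_000, 100_000, 1_000_000):
    print("%9d samples -> %.3f GB" % (n, dense_affinity_bytes(n) / 1e9))
```

Memory grows quadratically with the number of samples, so a dataset that is 100 times larger needs 10,000 times more memory for the graph. A sparse graph, such as a k-nearest-neighbor graph, is one common way to reduce this cost.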

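(2) Iterative label propagation algorithm:

Input: labeled sample set Dl, unlabeled sample set Du, graph construction parameter δ, trade-off parameter α

Output: predictions y for the unlabeled samples

Steps:

    1) Compute the affinity matrix W

    2) Construct the label propagation matrix S from W

    3) Initialize F^(0) according to the formula

    4) Set t = 0

    5) Iterate until F converges to F*:

        F^(t+1) = αSF^(t) + (1-α)Y

        t = t + 1

    6) Construct the prediction yi for each unlabeled sample

    7) Output y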
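The steps above can be sketched in NumPy as follows. Several details are assumptions of this sketch rather than statements from the article: W is built with a Gaussian kernel whose width is δ, S is the symmetrically normalized matrix D^(-1/2) W D^(-1/2), F^(0) is set to Y (one-hot rows for labeled samples, zero rows for unlabeled ones), and convergence is checked with a simple tolerance. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def iterative_label_propagation(X, y, delta=1.0, alpha=0.9,
                                tol=1e-6, max_iter=1000):
    """Label every sample; y uses -1 to mark unlabeled samples."""
    classes = np.unique(y[y != -1])
    n = len(X)
    # 1) Affinity matrix W: Gaussian kernel with a zero diagonal (assumed form).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * delta ** 2))
    np.fill_diagonal(W, 0.0)
    # 2) Propagation matrix S = D^(-1/2) W D^(-1/2) (assumed form).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # 3) Y holds one-hot rows for labeled samples; F^(0) = Y (assumed).
    Y = np.zeros((n, len(classes)))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0
    F = Y.copy()
    # 4)-5) Iterate F <- alpha * S F + (1 - alpha) * Y until F stops changing.
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1 - alpha) * Y
        converged = np.abs(F_next - F).max() < tol
        F = F_next
        if converged:
            break
    # 6)-7) Each sample gets the class with the largest score in its row of F.
    return classes[F.argmax(axis=1)]

# Synthetic data (hypothetical): two blobs, 3 labeled points per class.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0]])
y_true = np.repeat([0, 1], 100)
X_demo = centers[y_true] + rng.randn(200, 2)
y_train = np.full(200, -1)
labeled = np.r_[0:3, 100:103]
y_train[labeled] = y_true[labeled]

y_pred = iterative_label_propagation(X_demo, y_train)
mask = y_train == -1
accuracy = (y_pred[mask] == y_true[mask]).mean()
print("Accuracy on unlabeled points: %f" % accuracy)
```

Note that the whole n x n matrix S is held in memory, which is the storage cost mentioned earlier for graph-based methods.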

LabelPropagation experiment code:

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelPropagation

def load_data():
    """Load the digits dataset, shuffle it, and pick the samples to treat as unlabeled."""
    digits = datasets.load_digits()
    rng = np.random.RandomState(0)
    index = np.arange(len(digits.data))
    rng.shuffle(index)
    X = digits.data[index]
    Y = digits.target[index]
    n_labeled_points = len(Y) // 10  # keep labels for the first 10% only
    unlabeled_index = np.arange(len(Y))[n_labeled_points:]

    return X, Y, unlabeled_index

def test_LabelPropagation(*data):
    X, Y, unlabeled_index = data
    Y_train = np.copy(Y)
    Y_train[unlabeled_index] = -1  # scikit-learn treats -1 as "unlabeled"
    cls = LabelPropagation(max_iter=100, kernel='rbf', gamma=0.1)
    cls.fit(X, Y_train)
    # Accuracy on the samples whose labels were hidden during training
    print("Accuracy:%f" % cls.score(X[unlabeled_index], Y[unlabeled_index]))

X, Y, unlabeled_index = load_data()
test_LabelPropagation(X, Y, unlabeled_index)


Results:

The prediction accuracy turns out to be quite high.

LabelSpreading experiment code:

import numpy as np
from sklearn import metrics
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

def load_data():
    """Same loader as in the LabelPropagation experiment."""
    digits = datasets.load_digits()
    rng = np.random.RandomState(0)
    index = np.arange(len(digits.data))
    rng.shuffle(index)
    X = digits.data[index]
    Y = digits.target[index]
    n_labeled_points = len(Y) // 10  # keep labels for the first 10% only
    unlabeled_index = np.arange(len(Y))[n_labeled_points:]

    return X, Y, unlabeled_index

def test_LabelSpreading(*data):
    X, Y, unlabeled_index = data
    Y_train = np.copy(Y)
    Y_train[unlabeled_index] = -1  # scikit-learn treats -1 as "unlabeled"
    cls = LabelSpreading(max_iter=100, kernel='rbf', gamma=0.1)
    cls.fit(X, Y_train)
    # transduction_ holds the label assigned to every training sample
    predicted_labels = cls.transduction_[unlabeled_index]
    true_labels = Y[unlabeled_index]
    print("Accuracy:%f" % metrics.accuracy_score(true_labels, predicted_labels))

X, Y, unlabeled_index = load_data()
test_LabelSpreading(X, Y, unlabeled_index)


Note: LabelSpreading is similar to LabelPropagation, but it uses an affinity matrix based on the normalized graph Laplacian and applies soft clamping to the labels.
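The two ingredients in this note can be illustrated on a tiny graph. The sketch below is not scikit-learn's implementation; it is a hand-rolled example on a hypothetical 4-node chain graph. It builds S = D^(-1/2) W D^(-1/2), so that I - S is the normalized graph Laplacian, and applies soft clamping through α: each step mixes the neighbors' scores (weight α) with the initial labels (weight 1 - α) instead of resetting labeled nodes to their original labels. It also checks that the iteration reaches the closed-form fixed point F* = (1 - α)(I - αS)^(-1) Y.

```python
import numpy as np

# Toy affinity matrix for a 4-node chain graph 0 - 1 - 2 - 3 (hypothetical data).
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))  # D^(-1/2) W D^(-1/2)
L_norm = np.eye(4) - S           # normalized graph Laplacian

# Node 0 is labeled class 0, node 3 is labeled class 1, nodes 1 and 2 are unlabeled.
Y = np.array([[1., 0.],
              [0., 0.],
              [0., 0.],
              [0., 1.]])
alpha = 0.2  # soft clamping: labeled nodes keep weight (1 - alpha) on their label

# Closed-form fixed point of F = alpha * S F + (1 - alpha) * Y
F_star = (1 - alpha) * np.linalg.solve(np.eye(4) - alpha * S, Y)

# The same result obtained by iterating
F = Y.copy()
for _ in range(200):
    F = alpha * (S @ F) + (1 - alpha) * Y

matches = np.allclose(F, F_star)
labels = F_star.argmax(axis=1)
print(matches)  # True
print(labels)   # [0 0 1 1]
```

Each unlabeled node takes the class of the labeled node it is closer to along the chain. In scikit-learn, the `alpha` parameter of `LabelSpreading` plays this clamping role.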

Results:

The predictions are also quite good.

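3. Summary

Exploiting unlabeled samples does not necessarily improve generalization in semi-supervised learning; in some cases it even degrades performance. For generative methods, the cause is usually an inaccurate model assumption, so the model must be designed with sufficiently reliable domain knowledge. More general safe semi-supervised learning remains an open problem. Here "safe" means that, after exploiting unlabeled samples, the resulting performance is guaranteed to be no worse than when only the labeled samples are used.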
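Since no such guarantee exists, a practical habit is to compare the semi-supervised model against a purely supervised baseline trained on the labeled samples alone. The sketch below does this on the digits dataset with the same 10% labeled split idea used in the experiments. The choice of a 1-nearest-neighbor classifier as the baseline is an assumption of this example, not something the article prescribes, and no particular outcome is implied.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import LabelSpreading

digits = datasets.load_digits()
rng = np.random.RandomState(0)
index = rng.permutation(len(digits.data))
X, Y = digits.data[index], digits.target[index]

n_labeled = len(Y) // 10
labeled = np.arange(n_labeled)
unlabeled = np.arange(n_labeled, len(Y))

# Supervised baseline: sees only the labeled 10% (hypothetical choice of model).
baseline = KNeighborsClassifier(n_neighbors=1).fit(X[labeled], Y[labeled])
baseline_acc = baseline.score(X[unlabeled], Y[unlabeled])

# Semi-supervised model: sees all samples, with -1 marking the hidden labels.
Y_train = np.copy(Y)
Y_train[unlabeled] = -1
semi = LabelSpreading(max_iter=100, kernel='rbf', gamma=0.1).fit(X, Y_train)
semi_acc = np.mean(semi.transduction_[unlabeled] == Y[unlabeled])

print("Supervised baseline accuracy: %f" % baseline_acc)
print("LabelSpreading accuracy:      %f" % semi_acc)
```

If the semi-supervised score falls below the baseline on held-out data, the unlabeled samples are hurting rather than helping for that model and parameter choice.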
