Machine Learning Notes: Hierarchical_clustering with scikit-learn (Hierarchical Clustering)

Reposted article · 2019-11-18 12:47 · Machine Learning Notes

Hierarchical clustering

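Concept: Hierarchical clustering partitions a dataset at several levels, producing a tree-shaped cluster structure. The partitioning can follow a "bottom-up" agglomerative strategy or a "top-down" divisive strategy.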
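The "different levels" idea can be shown with scikit-learn's bottom-up AgglomerativeClustering. The sketch below cuts the same tree at two levels and checks that the finer partition nests inside the coarser one. The toy data and the helper name cut_tree are made up for illustration. Single linkage is used here only as one example choice.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: five 1-D points forming two tight pairs and one outlier
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])

def cut_tree(X, n_clusters):
    """Hypothetical helper: cut the bottom-up merge tree where n_clusters clusters remain."""
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="single")
    return model.fit_predict(X)

coarse = cut_tree(X, 2)  # higher level of the tree: fewer, larger clusters
fine = cut_tree(X, 3)    # lower level of the tree: more, smaller clusters

# In a hierarchy, every fine cluster lies entirely inside one coarse cluster
nested = all(len(set(coarse[fine == c])) == 1 for c in np.unique(fine))
print("coarse:", coarse)
print("fine:  ", fine)
print("nested:", nested)
```

The integer label ids scikit-learn assigns are arbitrary, so only the grouping is meaningful. Here the two pairs merge first, then join each other, and the outlier at 20 stays on its own.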

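Algorithm: AGNES (AGglomerative NESting) is a hierarchical clustering algorithm that uses the bottom-up agglomerative strategy. It starts by treating each sample in the dataset as its own cluster. At every step it finds the two closest clusters and merges them, and it repeats this until the preset number of clusters is reached. The key question is how to measure the distance between two clusters. Each cluster is a set of samples, so any suitable distance between sets will do. Three are commonly used: d_min, d_max, and d_avg. These are the minimum, maximum, and average distance over all pairs of samples taken one from each cluster. In AGNES they give the "single-linkage", "complete-linkage", and "average-linkage" algorithms, respectively.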
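The AGNES loop and the three cluster distances can be sketched from scratch. The sketch makes these assumptions:

* Euclidean distance is used between samples. The text above does not fix a sample-level distance.
* The helper names cluster_distance and agnes are hypothetical.
* The all-pairs search is naive and slow, so it is only suitable for small data. This is not how scikit-learn implements it.

```python
import numpy as np

def cluster_distance(A, B, linkage):
    """Distance between clusters A and B (arrays of shape (n_a, d) and (n_b, d))."""
    # Assumed Euclidean distance between every sample in A and every sample in B
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    if linkage == "single":
        return pairwise.min()   # d_min
    if linkage == "complete":
        return pairwise.max()   # d_max
    if linkage == "average":
        return pairwise.mean()  # d_avg
    raise ValueError(f"unknown linkage: {linkage}")

def agnes(X, k, linkage="average"):
    """Naive AGNES: one cluster per sample, then merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(X))]  # each cluster is a list of sample indices
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair found so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = cluster_distance(X[clusters[a]], X[clusters[b]], linkage)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]                          # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Usage on hypothetical toy data: two tight pairs and one outlier
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
for linkage in ("single", "complete", "average"):
    print(linkage, agnes(X, 3, linkage))  # each line ends with [0 0 1 1 2]
```

On this toy data all three linkages agree. They differ only in which pair counts as "closest" once clusters hold several samples. For example, d_min for {0, 1} and {5, 6} is 4, d_max is 6, and d_avg is 5.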

Code example:

# Import the libraries we need
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Run AgglomerativeClustering on the iris dataset and plot the adjusted Rand index (ARI)
# against n_clusters: one curve for linkage='average', the other for linkage='complete'
def test_AgglomerativeClustering(*data):
    X, y = data
    linkages = ['average', 'complete']
    nums = range(1, 50)
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    markers = "+o*"
    for i, linkage in enumerate(linkages):
        ARIs = []
        for num in nums:
            # n_clusters: number of clusters to find; linkage: how inter-cluster distance is computed
            clu = AgglomerativeClustering(n_clusters=num, linkage=linkage)
            predicts = clu.fit_predict(X)
            # Compare predicted labels with the true labels (1.0 = identical partition)
            ARIs.append(adjusted_rand_score(y, predicts))
        ax.plot(nums, ARIs, marker=markers[i], label="linkage:%s" % linkage)
    ax.set_xlabel("n_clusters")
    ax.set_ylabel("ARI")
    ax.legend(loc="best")
    plt.show()

# Main function: load the iris data and run the test
def main():
    iris = load_iris()
    X = iris.data
    y = iris.target
    test_AgglomerativeClustering(X, y)

if __name__ == '__main__':
    main()

Output: a plot of ARI against n_clusters, with one curve for each linkage.
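The y-axis of the plot is the adjusted Rand index computed by adjusted_rand_score. It compares two labelings by how they group samples, not by which ids they use. The small check below uses made-up labels. It also shows why each curve starts at 0: with n_clusters=1 every sample gets the same label, and the ARI for that prediction against these labels is 0.

```python
from sklearn.metrics import adjusted_rand_score

# Made-up labels for six samples in three true groups
y_true = [0, 0, 1, 1, 2, 2]
same_partition_renamed = [2, 2, 0, 0, 1, 1]  # identical grouping, different label ids
one_big_cluster = [0, 0, 0, 0, 0, 0]         # what n_clusters=1 produces

print(adjusted_rand_score(y_true, same_partition_renamed))  # 1.0
print(adjusted_rand_score(y_true, one_big_cluster))         # 0.0
```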

In PyCharm, to see a library function's parameters and source code, hold Ctrl and click the function name. This jumps to its definition.
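Outside PyCharm, the standard-library inspect module gives a similar view from plain Python. A minimal sketch that lists the constructor parameters of AgglomerativeClustering with their defaults, and the file that defines the class:

```python
import inspect
from sklearn.cluster import AgglomerativeClustering

# Constructor parameters and their default values
sig = inspect.signature(AgglomerativeClustering)
for name, param in sig.parameters.items():
    print(name, "=", param.default)

# Path of the source file that defines the class (the file Ctrl+click opens)
print(inspect.getsourcefile(AgglomerativeClustering))
```

The exact parameter list varies between scikit-learn versions. Running this against the installed version is a quick way to check what it accepts.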
