Using the Elbow Method to Choose k for k-Means


import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt

c1x = np.random.uniform(0.5, 1.5, (1, 10))
c1y = np.random.uniform(0.5, 1.5, (1, 10))
c2x = np.random.uniform(3.5, 4.5, (1, 10))
c2y = np.random.uniform(3.5, 4.5, (1, 10))
x = np.hstack((c1x, c2x))
y = np.hstack((c1y, c2y))
X = np.vstack((x, y)).T

K = range(1, 10)
meanDispersions = []
for k in K:
    kmeans = KMeans(n_clusters=k, n_init=10)
    kmeans.fit(X)
    # For each point, take the Euclidean distance to its nearest cluster center,
    # then average those distances over all points: the "mean dispersion" for this k.
    meanDispersions.append(sum(np.min(cdist(X, kmeans.cluster_centers_, 'euclidean'), axis=1)) / X.shape[0])

plt.plot(K, meanDispersions, 'bx-')
plt.xlabel('k')
plt.ylabel('Average Dispersion')
plt.title('Selecting k with the Elbow Method')
plt.show()

X is:

[[0.84223858 1.18059879]
 [0.84834276 0.84499409]
 [1.13263229 1.34316399]
 [0.95487981 0.59743761]
 [0.81646041 1.32361288]
 [0.90405171 0.54047701]
 [1.2723004  1.3461647 ]
 [0.52939142 1.03325549]
 [0.84592514 0.74344317]
 [1.07882783 1.4286598 ]
 [3.71702311 3.97510452]
 [3.95476036 3.83842502]
 [4.4297804  3.91854623]
 [4.08686159 4.15798624]
 [3.90406684 3.84413461]
 [4.32395689 4.06825926]
 [4.23112269 3.78578326]
 [3.70602931 4.08608482]
 [3.58690191 4.37072349]
 [4.38564657 4.02168693]]

As K increases, the curve falls and eventually levels off. The k value at the bend, the "elbow" of the curve, can reasonably be taken as the best number of clusters.
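Reading the elbow off a plot is subjective; it can also be picked programmatically. A simple heuristic (the idea behind the "kneedle" method): normalize the curve, then take the point farthest from the straight line joining its endpoints. A sketch, where elbow_index is a hypothetical helper, not part of the script above:

```python
import numpy as np

def elbow_index(values):
    """Index of the point farthest from the chord joining the curve's endpoints."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Normalize both axes to [0, 1] so the two distances are comparable
    x = np.linspace(0.0, 1.0, n)
    y = (values - values.min()) / (values.max() - values.min())
    p0 = np.array([x[0], y[0]])
    chord = np.array([x[-1], y[-1]]) - p0
    chord /= np.linalg.norm(chord)
    # Perpendicular distance of each point from the chord
    vecs = np.stack([x, y], axis=1) - p0
    perp = vecs - np.outer(vecs @ chord, chord)
    return int(np.argmax(np.linalg.norm(perp, axis=1)))

# A dispersion curve that drops sharply after the first value, then flattens
dispersions = [4.0, 0.5, 0.4, 0.35, 0.33, 0.32]
best_k = elbow_index(dispersions) + 1  # k values start at 1, so best_k == 2
```

This only automates the visual judgment; for data without a clear elbow, a criterion such as the silhouette score may be a better guide.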

