Like other algorithms, K-means clustering has several weaknesses:
1. When the number of data points is small, the initial grouping largely determines the clusters, which affects the final result.
2. The number of clusters, K, must be determined beforehand.
3. We never know the real clusters: with the same data, inputting the points in a different order may produce different clusters when the number of data points is small.
4. It is sensitive to the initial condition: a different initial condition may produce a different clustering, and the algorithm may be trapped in a local optimum (see the sketch after this list).
5. We never know which attribute contributes more to the grouping process, since we assume that each attribute has the same weight.
6. The arithmetic mean is not robust to outliers: data very far from a centroid may pull the centroid away from the real one.
7. Because it is based on distance, the resulting clusters are circular in shape.
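To make the sensitivity to the initial condition concrete, here is a minimal sketch (not the tutorial's code) of the standard K-means loop, assuming only NumPy; the data set and the two initial centroid choices are hypothetical, picked so that the two runs converge to different local optima.

```python
# Minimal K-means sketch showing weakness 4: different initial centroids
# can converge to different final clusters (local optima).
import numpy as np

def kmeans(X, k, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then recompute each centroid as the arithmetic mean of its points."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid = mean of its assigned points (kept if empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Four points at the corners of a long rectangle (hypothetical data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

labels_a, _ = kmeans(X, 2, np.array([[0.0, 0.5], [10.0, 0.5]]))
labels_b, _ = kmeans(X, 2, np.array([[5.0, 0.0], [5.0, 1.0]]))
print(labels_a)  # [0 0 1 1] -> left/right split (the good clustering)
print(labels_b)  # [0 1 0 1] -> top/bottom split (a poor local optimum)
```

In practice this is why K-means is usually run several times with different random initializations, keeping the run with the smallest within-cluster sum of squares.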
One way to overcome these weaknesses is to use K-means clustering only when many data points are available. To overcome the outlier problem, we can use the median instead of the mean (see the sketch below).
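As a quick illustration of the median fix (a K-medians-style update step rather than the tutorial's own code), assuming NumPy and a hypothetical cluster that contains one outlier:

```python
# Component-wise mean vs. median for a cluster with one far outlier.
import numpy as np

cluster = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [50.0, 50.0]])  # last point is an outlier
print("mean  :", cluster.mean(axis=0))        # ~[13.3, 13.3], dragged toward the outlier
print("median:", np.median(cluster, axis=0))  # ~[1.1, 1.05], stays near the true centre
```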
Some people point out that K-means clustering cannot be used for data types other than quantitative data. This is not true! See how you can use multivariate data up to n dimensions (even mixed data types) here. The key to using other types of dissimilarity is the distance matrix.
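The linked page explains this in detail; as a rough illustration only (a K-medoids-style loop over a precomputed dissimilarity matrix, not the tutorial's exact method), the records, the dissimilarity function, and the scaling constant below are all hypothetical.

```python
# Sketch: once all pairwise dissimilarities are collected in a matrix D,
# the clustering loop only needs D, so mixed data types work as well.
import numpy as np

# Hypothetical mixed-type records: (age, colour preference)
records = [(25, "red"), (27, "red"), (52, "blue"), (55, "blue"), (30, "red")]

def dissimilarity(a, b):
    # Hand-rolled mix: scaled numeric difference + categorical mismatch.
    num = abs(a[0] - b[0]) / 30.0          # 30 is an assumed scaling constant
    cat = 0.0 if a[1] == b[1] else 1.0
    return num + cat

n = len(records)
D = np.array([[dissimilarity(records[i], records[j]) for j in range(n)]
              for i in range(n)])

def k_medoids(D, k, medoids, max_iter=50):
    """K-medoids-style loop on a precomputed dissimilarity matrix: assign each
    object to its nearest medoid, then pick as the new medoid the cluster
    member with the smallest total dissimilarity to its cluster."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = []
        for j in range(k):
            members = np.where(labels == j)[0]
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[costs.argmin()]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids

labels, medoids = k_medoids(D, k=2, medoids=[0, 2])
print("labels :", labels)                       # young/"red" records vs. older/"blue" ones
print("medoids:", [records[m] for m in medoids])
```

Because the loop only ever reads D, any dissimilarity measure that can fill the matrix (for categorical, ordinal, or mixed data) plugs in without changing the clustering code.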
http://people.revoledu.com/kardi/tutorial/kMean/Weakness.htm