What is k-nearest neighbors (KNN)?
KNN can be used for both classification and regression; here we take classification as the example. Given a training set and a new input instance, find the k instances in the training set closest to it; the class held by the majority of those k instances is the class assigned to the input instance.
What are the three basic elements of a KNN model?
The distance metric, the choice of k, and the classification decision rule.
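The decision rule is usually majority voting. In the notation of 統計學習方法, where $N_k(x)$ is the set of the k training points nearest to $x$ and $I$ is the indicator function, it can be written as:

$$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j), \quad j = 1, \ldots, K$$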
Distance metric: usually the Euclidean distance, which is the Lp distance with p = 2; the Manhattan distance (p = 1) is another common special case of the Lp distance.
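For reference, the Lp distance between two points $x_i, x_j \in \mathbb{R}^n$ is defined as:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{\frac{1}{p}}$$

Setting p = 2 gives the Euclidean distance, p = 1 the Manhattan distance, and p → ∞ the maximum over coordinates (Chebyshev distance).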
Here is a concrete example of how the choice of p changes the result:
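Take $x_1 = (1, 1)^T$, $x_2 = (5, 1)^T$ and $x_3 = (4, 4)^T$. Since $x_1$ and $x_2$ differ only in their first coordinate, $L_p(x_1, x_2) = 4$ for every $p$, while

$$L_1(x_1, x_3) = 6, \quad L_2(x_1, x_3) = \sqrt{18} \approx 4.24, \quad L_3(x_1, x_3) = 54^{1/3} \approx 3.78$$

So for $p = 1$ or $p = 2$ the nearest neighbor of $x_1$ is $x_2$, but for $p \geq 3$ it is $x_3$: the distance metric determines which point counts as "nearest".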
How should the value of k be chosen?
A small k means only the very closest points vote, so the model is sensitive to noise and prone to overfitting; a large k lets distant, less relevant points vote, which smooths the decision but increases the approximation error. In practice k is usually a fairly small value selected by cross-validation.
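A minimal sketch of picking k by cross-validation; for illustration this uses scikit-learn's KNeighborsClassifier and cross_val_score rather than the KNN class implemented below:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9, 11]:
    clf = KNeighborsClassifier(n_neighbors=k)
    # Mean 5-fold cross-validated accuracy for this k
    print(k, cross_val_score(clf, X, y, cv=5).mean())

The k with the highest cross-validated accuracy is then used on the full training set.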
Next is the code implementation:
from __future__ import print_function, division
import numpy as np
from mlfromscratch.utils import euclidean_distance

class KNN():
    """ K Nearest Neighbors classifier.

    Parameters:
    -----------
    k: int
        The number of closest neighbors that will determine the class of the
        sample that we wish to predict.
    """
    def __init__(self, k=5):
        self.k = k

    def _vote(self, neighbor_labels):
        """ Return the most common class among the neighbor samples """
        counts = np.bincount(neighbor_labels.astype('int'))
        return counts.argmax()

    def predict(self, X_test, X_train, y_train):
        y_pred = np.empty(X_test.shape[0])
        # Determine the class of each sample
        for i, test_sample in enumerate(X_test):
            # Sort the training samples by their distance to the test sample
            # and get the K nearest
            idx = np.argsort([euclidean_distance(test_sample, x) for x in X_train])[:self.k]
            # Extract the labels of the K nearest neighboring training samples
            # (use a separate loop variable j so the outer index i is not shadowed)
            k_nearest_neighbors = np.array([y_train[j] for j in idx])
            # Label sample as the most common class label
            y_pred[i] = self._vote(k_nearest_neighbors)
        return y_pred
Notes on the NumPy functions used above (a short demo follows this list):
numpy.bincount(): counts the occurrences of each non-negative integer in an array; index i of the result holds the count of value i.
numpy.argmax(): returns the index of the maximum value, so counts.argmax() gives the most frequent label.
numpy.argsort(): returns the indices that would sort the array in ascending order.
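A quick demonstration of these three functions (a sketch; the array values are arbitrary):

import numpy as np

labels = np.array([2, 0, 2, 1, 2])
counts = np.bincount(labels)   # array([1, 1, 3]): value 0 occurs once, 1 once, 2 three times
print(counts.argmax())         # 2 -> the most frequent value, i.e. the majority vote

dists = np.array([0.9, 0.1, 0.5])
print(np.argsort(dists))       # [1 2 0]: indices that sort dists in ascending order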
Next is the euclidean_distance() helper used above:
import math

def euclidean_distance(x1, x2):
    """ Calculates the l2 distance between two vectors """
    distance = 0
    # Squared distance between each coordinate
    for i in range(len(x1)):
        distance += pow((x1[i] - x2[i]), 2)
    return math.sqrt(distance)
The L2 (Euclidean) distance is used here, i.e. the Lp distance with p = 2.
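For reference, an equivalent vectorized sketch in NumPy (assuming x1 and x2 are 1-D numeric arrays of equal length; the name euclidean_distance_np is mine):

import numpy as np

def euclidean_distance_np(x1, x2):
    # The l2 norm of the difference vector is exactly the Euclidean distance
    return np.linalg.norm(np.asarray(x1) - np.asarray(x2))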
The main function to run it:
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from mlfromscratch.utils import train_test_split, normalize, accuracy_score
from mlfromscratch.utils import euclidean_distance, Plot
from mlfromscratch.supervised_learning import KNN

def main():
    data = datasets.load_iris()
    X = normalize(data.data)
    y = data.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    clf = KNN(k=5)
    y_pred = clf.predict(X_test, X_train, y_train)

    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy:", accuracy)

    # Reduce dimensions to 2d using pca and plot the results
    Plot().plot_in_2d(X_test, y_pred, title="K Nearest Neighbors",
                      accuracy=accuracy, legend_labels=data.target_names)

if __name__ == "__main__":
    main()
Result:
Accuracy: 0.9795918367346939
Theory reference: 統計學習方法 (Statistical Learning Methods, by Li Hang).