Hyperparameters of the KNN Algorithm


1: Definition

  A hyperparameter is a parameter whose value is set before the learning process begins, as opposed to a parameter whose value is learned from the training data.

2: Common Hyperparameters

  The k of the k-nearest-neighbors algorithm, the vote weighting weights, and the exponent p of the Minkowski distance formula. All three of these parameters are set in the constructor of the KNeighborsClassifier class.
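As a minimal sketch of where these three hyperparameters live (the values shown are scikit-learn's defaults, not values tuned for any dataset):

```python
from sklearn.neighbors import KNeighborsClassifier

# All three hyperparameters are passed to the constructor,
# before the model ever sees any training data.
knn = KNeighborsClassifier(
    n_neighbors=5,      # k: how many nearest neighbors vote
    weights='uniform',  # 'uniform' (equal votes) or 'distance' (inverse-distance votes)
    p=2,                # Minkowski exponent: 1 = Manhattan, 2 = Euclidean
)

print(knn.get_params()['n_neighbors'])  # → 5
```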

3: Shared Code

 

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Load the handwritten-digits dataset bundled with sklearn
digits = datasets.load_digits()

x = digits.data
y = digits.target

# Hold out 20% of the samples as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

 

 

4: The Optimal Value of k

best_score = 0.0
best_k = -1
# Try k from 1 to 10 and keep the value with the highest test accuracy
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    t = knn.score(x_test, y_test)
    if t > best_score:
        best_score = t
        best_k = k

print(best_k)
print(best_score)

 

5: The Optimal Value of weights

  With weights='uniform', every neighbor gets an equal vote. For example, with k=3, if the three nearest neighbors each belong to a different class, sklearn simply picks one of them to return as the prediction. If we take distance into account instead, i.e. weights='distance', each neighbor's vote carries a weight, normally the inverse of its distance: if the distances from the query point to its three neighbors are 1, 3 and 4, their weights are 1, 1/3 and 1/4, so the neighbor at distance 1 is returned as the prediction.
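The inverse-distance example above can be reproduced directly (the class labels 'A', 'B', 'C' are hypothetical, chosen only for illustration):

```python
# Three nearest neighbors at distances 1, 3 and 4,
# each voting for a different class.
distances = [1.0, 3.0, 4.0]
labels = ['A', 'B', 'C']  # hypothetical labels, one per neighbor

# weights='distance' uses the inverse of each distance as the vote weight
weights = [1.0 / d for d in distances]  # → [1.0, 1/3, 0.25]

# Tally the weighted votes per class
votes = {}
for label, w in zip(labels, weights):
    votes[label] = votes.get(label, 0.0) + w

# The neighbor at distance 1 has weight 1 > 1/3 + 1/4,
# so its class wins the weighted vote.
winner = max(votes, key=votes.get)
print(winner)  # → A
```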

 

best_score = 0.0
best_k = -1
best_method = ''
# Search over both voting schemes and k at the same time
for method in ['uniform', 'distance']:
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)
        if t > best_score:
            best_score = t
            best_k = k
            best_method = method

print(best_score)
print(best_k)
print(best_method)

 

 

6: The Optimal Value of p

  When tuning p, weights should be set to 'distance' rather than 'uniform': with 'uniform', the distances only determine which neighbors are selected, not how much each vote counts, so p has much less influence on the prediction.

best_score = 0.0
best_k = -1
best_p = 1
# Search over the Minkowski exponent p and k together
for i in range(1, 6):
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights='distance', p=i)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)
        if t > best_score:
            best_k = k
            best_score = t
            best_p = i

print(best_p)
print(best_score)
print(best_k)
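To see what p actually changes, here is a sketch of the Minkowski distance that KNeighborsClassifier computes internally (the `minkowski` helper below is written for illustration; it is not part of sklearn's public API):

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p)

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))  # p=1, Manhattan distance → 7.0
print(minkowski(a, b, 2))  # p=2, Euclidean distance → 5.0
```

Larger p values weight the largest coordinate difference more heavily, which is why sweeping p in the loop above can change which neighbors are nearest.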
