1: Definition
A hyperparameter is a parameter whose value is set before the learning process begins, rather than being learned from the training data.
2: Common hyperparameters
For the k-nearest-neighbors algorithm: the number of neighbors k, the vote weighting weights, and the exponent p of the Minkowski distance formula. All three are set in the constructor of KNeighborsClassifier.
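As a quick illustration, all three hyperparameters are fixed in the constructor before any training takes place (the specific values below are arbitrary):

```python
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are chosen before fit() is ever called;
# nothing about them is estimated from the training data.
knn = KNeighborsClassifier(n_neighbors=5, weights='distance', p=2)
print(knn.n_neighbors, knn.weights, knn.p)  # 5 distance 2
```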
3: Shared code
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

digits = datasets.load_digits()
x = digits.data
y = digits.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
4: Finding the best k
best_score = 0.0
best_k = -1
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    t = knn.score(x_test, y_test)
    if t > best_score:
        best_score = t
        best_k = k
print(best_k)
print(best_score)
5: Finding the best weights
With weights='uniform', every neighbor votes equally. For example, with k=3, if the three nearest neighbors all belong to different classes, the vote is tied and sklearn simply picks one of them to return as the prediction. With weights='distance', each neighbor's vote is weighted, typically by the inverse of its distance: if the distances to the three neighbors are 1, 3, and 4, their weights are 1, 1/3, and 1/4, so the class of the neighbor at distance 1 wins and is returned as the prediction.
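The worked example above can be sketched in a few lines (the class labels 'a', 'b', 'c' are made up for illustration):

```python
# Inverse-distance weighted vote: three nearest neighbors at
# distances 1, 3 and 4, each belonging to a different class.
distances = [1, 3, 4]
labels = ['a', 'b', 'c']

votes = {}
for d, label in zip(distances, labels):
    votes[label] = votes.get(label, 0) + 1 / d  # weight = 1 / distance

# 'a' wins with weight 1 against 1/3 and 1/4.
prediction = max(votes, key=votes.get)
print(prediction)  # a
```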
best_score = 0.0
best_k = -1
best_method = ''
for method in ['uniform', 'distance']:
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)
        if t > best_score:
            best_score = t
            best_k = k
            best_method = method
print(best_score)
print(best_k)
print(best_method)
6: Finding the best p
The parameter p is the exponent of the Minkowski distance (p=1 gives Manhattan distance, p=2 gives Euclidean distance). When searching over p, weights is fixed to 'distance' here rather than 'uniform', so that the distances, and therefore p, also influence the weighted vote.
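For reference, the Minkowski distance that p controls can be written as a small helper (the function name and sample points below are made up for illustration):

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance between two points.

    p=1 reduces to Manhattan distance, p=2 to Euclidean distance.
    """
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(minkowski(a, b, 1))  # 7.0  (Manhattan: 3 + 4)
print(minkowski(a, b, 2))  # 5.0  (Euclidean: sqrt(9 + 16))
```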
best_score = 0.0
best_k = -1
best_p = 1
for i in range(1, 6):
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights='distance', p=i)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)
        if t > best_score:
            best_k = k
            best_score = t
            best_p = i
print(best_p)
print(best_score)
print(best_k)
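The three nested searches above can also be expressed with sklearn's GridSearchCV, which cross-validates every combination in one call. The parameter grid below mirrors the manual loops; the exact ranges are a choice, not a requirement:

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2)

# Same search space as the manual loops: p only varies
# in the 'distance' branch, matching the text above.
param_grid = [
    {'weights': ['uniform'], 'n_neighbors': list(range(1, 11))},
    {'weights': ['distance'], 'n_neighbors': list(range(1, 11)),
     'p': list(range(1, 6))},
]

grid = GridSearchCV(KNeighborsClassifier(), param_grid, n_jobs=-1)
grid.fit(x_train, y_train)
print(grid.best_params_)
print(grid.best_estimator_.score(x_test, y_test))
```

Note one difference: GridSearchCV selects hyperparameters by cross-validation on the training set, whereas the manual loops above score directly on the test set, which leaks test information into the choice of hyperparameters.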
