cs231n assignment1 KNN



title: cs231n assignment1 KNN
tags:
- KNN
- cs231n
categories:
- Machine Learning
date: 2019-09-16 17:03:13


Image classification with the kNN algorithm, in a Python 2.7 environment.

First, run get_datasets.sh under cs231n/datasets to download the dataset. If you are on Windows, you can instead download the archive from the link below and extract it into datasets.

Link: https://pan.baidu.com/s/1KMh7OoXAX3etAwIflorilg (extraction code: q1rd)

k-Nearest Neighbor (kNN) exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

The kNN classifier consists of two stages:

  • During training, the classifier takes the training data and simply remembers it
  • During testing, kNN classifies every test image by comparing it to all training images and transferring the labels of the k most similar training examples
  • The value of k is cross-validated

In this exercise you will implement these steps and understand the basic Image Classification pipeline, cross-validation, and gain proficiency in writing efficient, vectorized code.

The loaded dataset contains 50,000 training images and 10,000 test images:

cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

To reduce computation, we keep only 5,000 training images and 500 test images:

num_training = 5000
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

num_test = 500
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

Flatten each image into a 1-D vector so the data becomes a 2-D array:

X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print X_train.shape, X_test.shape
out:
(5000, 3072) (500, 3072)

Next, fill in cs231n/classifiers/k_nearest_neighbor.py.
First, implement the L2 distance between the test set and the training set using two nested loops:

    for i in xrange(num_test):
      for j in xrange(num_train):
        #####################################################################
        # TODO:                                                             #
        # Compute the l2 distance between the ith test point and the jth    #
        # training point, and store the result in dists[i, j]. You should   #
        # not use a loop over dimension.                                    #
        #####################################################################
        # pass
        dists[i][j] = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
    return dists

With a single loop, we rely on NumPy broadcasting (a small demo of the broadcast follows the code):

    for i in xrange(num_test):
      #######################################################################
      # TODO:                                                               #
      # Compute the l2 distance between the ith test point and all training #
      # points, and store the result in dists[i, :].                        #
      #######################################################################
      # pass
      dists[i] = np.sqrt(np.sum(np.square(self.X_train - X[i]), axis = 1))
      #######################################################################
      #                         END OF YOUR CODE                            #
      #######################################################################
    return dists
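To see the broadcasting at work: inside the loop, self.X_train - X[i] subtracts a (D,) vector from an (N, D) matrix row by row. A toy example (the names A and v are just for illustration):

import numpy as np

A = np.arange(6).reshape(3, 2)  # shape (3, 2): [[0, 1], [2, 3], [4, 5]]
v = np.array([1, 1])            # shape (2,)
print A - v                     # v is broadcast across all 3 rows
# [[-1  0]
#  [ 1  2]
#  [ 3  4]]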

The no-loop version is harder to follow; https://blog.csdn.net/zhyh1435589631/article/details/54236643 has a good walkthrough. If the test set X is M×D and the training set self.X_train is N×D, then d1 is M×N, d2 has shape (N,) and acts as a row vector, and d3 is reshaped into an M×1 column vector; the three can therefore be added together, again thanks to broadcasting.
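The identity behind the trick is the expansion of the squared L2 distance: for a test row $x_i$ and a training row $t_j$,

$$\lVert x_i - t_j \rVert^2 = \lVert x_i \rVert^2 - 2\, x_i \cdot t_j + \lVert t_j \rVert^2$$

Taken over all pairs at once, the cross terms form the matrix product $-2\,X X_{\mathrm{train}}^\top$ (d1), while the two squared-norm terms are d2 and d3, broadcast along rows and columns respectively.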

    #########################################################################
    # TODO:                                                                 #
    # Compute the l2 distance between all test points and all training      #
    # points without using any explicit loops, and store the result in      #
    # dists.                                                                #
    #                                                                       #
    # You should implement this function using only basic array operations; #
    # in particular you should not use functions from scipy.                #
    #                                                                       #
    # HINT: Try to formulate the l2 distance using matrix multiplication    #
    #       and two broadcast sums.                                         #
    #########################################################################
    # pass
    d1 = -2*np.dot(X, self.X_train.T)
    d2 = np.sum(np.square(self.X_train), axis=1)
    d3 = np.sum(np.square(X), axis=1)
    d3 = d3.reshape(d3.shape[0],1)
    dists = np.sqrt(d1+d2+d3)
    #dists = np.sqrt(-2*np.dot(X, self.X_train.T) + np.sum(np.square(self.X_train), axis = 1) + np.transpose([np.sum(np.square(X), axis = 1)]))
    #########################################################################
    #                         END OF YOUR CODE                              #
    #########################################################################
    return dists

Given dists, we can predict a label for each test image; with k=5, the 5 nearest training images vote on the prediction:

def predict_labels(self, dists, k=1):
    """
    Given a matrix of distances between test points and training points,
    predict a label for each test point.

    Inputs:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      gives the distance between the ith test point and the jth training point.

    Returns:
    - y: A numpy array of shape (num_test,) containing predicted labels (0-9) for
      the test data, where y[i] is the predicted label for the test point X[i].
    """
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in xrange(num_test):
      # A list of length k storing the labels of the k nearest neighbors to
      # the ith test point.
      closest_y = []
      #########################################################################
      # TODO:                                                                 #
      # Use the distance matrix to find the k nearest neighbors of the ith    #
      # testing point, and use self.y_train to find the labels of these       #
      # neighbors. Store these labels in closest_y.                           #
      # Hint: Look up the function numpy.argsort.                             #
      #########################################################################
      # pass     
      # np.argsort() returns the indices that would sort the array in ascending
      # order, e.g. np.argsort([4, 2, 5, 1]) returns [3, 1, 0, 2].
      # Take the first k indices (the k smallest distances) and map them to
      # class labels via self.y_train.
      closest_y = self.y_train[np.argsort(dists[i])[:k]]
      #########################################################################
      # TODO:                                                                 #
      # Now that you have found the labels of the k nearest neighbors, you    #
      # need to find the most common label in the list closest_y of labels.   #
      # Store this label in y_pred[i]. Break ties by choosing the smaller     #
      # label.                                                                #
      #########################################################################
      # pass
      # np.bincount() counts how many times each value occurs, e.g.:
      #   x = np.array([0, 1, 1, 3, 2, 1, 7])
      #   np.bincount(x)  ->  array([1, 3, 1, 1, 0, 0, 0, 1])
      # np.argmax() returns the index of the (first) maximum, so ties are
      # broken in favor of the smaller label, exactly as the hint asks.
      y_pred[i] = np.argmax(np.bincount(closest_y))
      #########################################################################
      #                           END OF YOUR CODE                            # 
      #########################################################################

    return y_pred
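A quick toy run of the voting step (the labels here are hypothetical, chosen to show the tie-break):

closest_y = np.array([2, 5, 2, 5, 1])  # labels of the k=5 nearest neighbors
votes = np.bincount(closest_y)         # array([0, 1, 2, 0, 0, 2]): counts per label
print np.argmax(votes)                 # prints 2: labels 2 and 5 tie with 2 votes
                                       # each, and the smaller label wins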

That completes k_nearest_neighbor.py.

Continuing with the notebook, you will see that dists has shape (500, 5000): the distances between the 500 test images and the 5,000 training images, where smaller values mean more similar images. Visualized, dists looks something like this:

[Figure: dists.png, a visualization of the distance matrix]

Black indicates low values and white high values.
If a test image is similar to many training images, you will see a dark row.

We can call classifier.predict_labels(dists, k=1) to predict each test image's label from its single nearest neighbor and then compute the accuracy:

y_test_pred = classifier.predict_labels(dists, k=1)

# Compute and print the fraction of correctly predicted examples
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print 'Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy)

out: Got 137 / 500 correct => accuracy: 0.274000

Trying different values of k changes the result slightly; with k=5, for example, the accuracy comes out around 0.278.

Next, let's compare how fast the three distance implementations in k_nearest_neighbor.py run.
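The notebook's scaffold times each version with a small helper along these lines (a sketch; classifier and X_test are as defined above):

import time

def time_function(f, *args):
    """Call f with args and return how many seconds it took."""
    tic = time.time()
    f(*args)
    toc = time.time()
    return toc - tic

two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)
print 'Two loop version took %f seconds' % two_loop_time

one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)
print 'One loop version took %f seconds' % one_loop_time

no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)
print 'No loop version took %f seconds' % no_loop_time

On my machine the results were: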

Two loop version took 29.503677 seconds
One loop version took 155.006175 seconds
No loop version took 0.291267 seconds
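Perhaps surprisingly, the one-loop version is slower than the two-loop version here, most likely because every iteration materializes a large (num_train, 3072) intermediate array. Before trusting the fast version, it is also worth checking that all three implementations agree; a minimal sketch, assuming dists holds the two-loop result:

dists_one = classifier.compute_distances_one_loop(X_test)
dists_no = classifier.compute_distances_no_loops(X_test)
# The Frobenius norm of the difference should be (numerically) zero
print 'One loop difference: %f' % np.linalg.norm(dists - dists_one, ord='fro')
print 'No loop difference: %f' % np.linalg.norm(dists - dists_no, ord='fro')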

Cross-validation

When training a model's parameters, the full dataset is commonly split into three parts (the MNIST handwriting dataset is a typical example): a training set (train_set), a validation set (valid_set), and a test set (test_set). The test set is easy to understand: it takes no part in training at all and is used only to measure final performance. The training and validation sets are motivated by the following.

In practice, a model usually fits its own training set quite well, but its fit to data outside the training set is often much less satisfying. We therefore do not train on all the available data; instead we hold out a portion (which takes no part in training) and use it to evaluate the parameters learned on the rest, giving a relatively objective estimate of how well they generalize. This idea is called cross-validation.

Cross-validation lets us find the value of k that gives the highest accuracy.

In this experiment we split the training set into 5 folds, stored in X_train_folds and y_train_folds, where y_train_folds[i] holds the labels for X_train_folds[i]:

################################################################################
# TODO:                                                                        #
# Split up the training data into folds. After splitting, X_train_folds and    #
# y_train_folds should each be lists of length num_folds, where                #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
# Hint: Look up the numpy array_split function.                                #
################################################################################
# pass
# Reshape y_train into a column vector
y_train_ = y_train.reshape(-1, 1)
# np.array_split divides each array into num_folds equal parts
X_train_folds, y_train_folds = np.array_split(X_train, num_folds), np.array_split(y_train_, num_folds)
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################
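For reference, np.array_split divides an array into roughly equal pieces; unlike np.split, it does not require the length to divide evenly:

import numpy as np

print np.array_split(np.arange(10), 3)
# [array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]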

The results are stored in k_to_accuracies = {}, a dict in which k_to_accuracies[k] is a list of length num_folds holding the cross-validation accuracies obtained with that k:

################################################################################
# pass
for k_ in k_choices:
    k_to_accuracies.setdefault(k_, [])
for i in range(num_folds):
    classifier = KNearestNeighbor()
    # Stack every fold except the ith into one training set; fold i is
    # held out as the validation set for this round.
    X_val_train = np.vstack(X_train_folds[0:i] + X_train_folds[i+1:])
    y_val_train = np.vstack(y_train_folds[0:i] + y_train_folds[i+1:])
    # y_val_train is a column vector; flatten it back to shape (N,)
    y_val_train = y_val_train[:,0]
    classifier.train(X_val_train, y_val_train)
    for k_ in k_choices:
        # Predict on the held-out fold and record the accuracy for this k
        y_val_pred = classifier.predict(X_train_folds[i], k=k_)
        num_correct = np.sum(y_val_pred == y_train_folds[i][:,0])
        accuracy = float(num_correct) / len(y_val_pred)
        k_to_accuracies[k_].append(accuracy)
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################
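The per-fold accuracies can then be printed with a loop like this (matching the notebook's scaffold):

# Print out the computed accuracies, grouped by k
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print 'k = %d, accuracy = %f' % (k, accuracy)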

Partial results:

k = 1, accuracy = 0.263000
k = 1, accuracy = 0.257000
k = 1, accuracy = 0.264000
k = 1, accuracy = 0.278000
k = 1, accuracy = 0.266000
k = 3, accuracy = 0.239000
k = 3, accuracy = 0.249000
k = 3, accuracy = 0.240000
k = 3, accuracy = 0.266000
k = 3, accuracy = 0.254000
k = 5, accuracy = 0.248000
k = 5, accuracy = 0.266000
k = 5, accuracy = 0.280000
k = 5, accuracy = 0.292000
k = 5, accuracy = 0.280000
k = 8, accuracy = 0.262000
k = 8, accuracy = 0.282000
k = 8, accuracy = 0.273000
k = 8, accuracy = 0.290000

Plotting the accuracies for each k gives something like the figure below.
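A sketch of the plotting code, roughly the notebook's scaffold (it assumes matplotlib.pyplot is imported as plt):

import numpy as np
import matplotlib.pyplot as plt

# Scatter the raw per-fold accuracies for each k
for k in k_choices:
    accuracies = k_to_accuracies[k]
    plt.scatter([k] * len(accuracies), accuracies)

# Then plot the mean accuracy across folds, with std-dev error bars
accuracies_mean = np.array([np.mean(v) for k, v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k, v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()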

[Figure: image.png, cross-validation accuracy as a function of k]

The highest accuracy is achieved around k = 10, so we set k to 10 and recompute the kNN accuracy on the test set:

# Based on the cross-validation results above, choose the best value for k,   
# retrain the classifier using all the training data, and test it on the test
# data. You should be able to get above 28% accuracy on the test data.
best_k = 10

classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
y_test_pred = classifier.predict(X_test, k=best_k)

# Compute and display the accuracy
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print 'Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy)

out: Got 141 / 500 correct => accuracy: 0.282000

We get an accuracy of 0.282. The improvement is small, but changing k adds essentially no computational cost, so even a modest gain is worthwhile.

The End!

