Deep Learning Principles and Frameworks - Neural Networks - CIFAR-10 Classification (Code): 1. np.concatenate (concatenate arrays) 2. np.hstack (stack arrays horizontally) 3. hasattr (check whether a .py module defines a given function) 4. reshape (reshape dimensions) 5. transpose (reorder dimensions) 6. pickle.load (read from a file object f) 7. np.argmax (index of the maximum value) 8. np.maximum (element-wise threshold comparison)


1. np.concatenate(list, axis=0): joins the arrays in list into a single array; here it is used to stack the per-batch CIFAR arrays along the first (sample) axis.

Parameter notes: list is the sequence of arrays to join; axis=0 stacks them from top to bottom (along the first axis).

2. np.hstack(list): stacks the elements of a list horizontally.

Parameter notes: after list.append([1, 2]) and list.append([3, 4]), np.hstack(list) returns [1, 2, 3, 4].

3. hasattr(optim, 'sgd'): checks whether the module optim (optim.py) defines an attribute named 'sgd'; sgd = getattr(optim, 'sgd') then binds that function object to the name sgd (see the demo after item 8).

4. .reshape(N, 3, 32, 32): reshapes the input array.

Parameter notes: N is the number of samples; the second dimension becomes 3, the third 32 and the fourth 32 (channels, height, width).

5. .transpose(0, 2, 3, 1): reorders the axes of the array.

Parameter notes: the output axes are taken from the input axes in the listed order, so input axis 0 stays first, input axis 2 becomes the new axis 1, input axis 3 becomes the new axis 2, and input axis 1 (the channel axis) becomes the new axis 3, i.e. (N, C, H, W) -> (N, H, W, C).

6. pickle.load(f, encoding='latin1'): reads the file f, opened in binary mode, with pickle.load.

Parameter notes: f is the binary file object to read; encoding specifies how byte strings in the pickle are decoded.

7. np.argmax(score, axis=1): returns the index of the maximum value in each row.

Parameter notes: score is the input score matrix; axis=1 takes the maximum across the columns of each row (left to right).

8. np.maximum(0, x): outputs 0 where x is less than 0, otherwise outputs x.

Parameter notes: 0 is the threshold and x is the input array. (A short demo of all the calls above follows.)
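
A short, self-contained demo of the calls above, using toy arrays (the shapes are illustrative only, not the real CIFAR-10 ones); the optim_stub namespace is a hypothetical stand-in for optim.py, used only to show the hasattr/getattr pattern:

import numpy as np
import types

a = np.arange(6).reshape(2, 3)                 # reshape: 6 values -> shape (2, 3)
b = np.arange(6, 12).reshape(2, 3)
stacked = np.concatenate([a, b], axis=0)       # shape (4, 3): rows of b appended below a

parts = [np.array([1, 2]), np.array([3, 4])]
flat = np.hstack(parts)                        # array([1, 2, 3, 4])

x = np.arange(24).reshape(1, 3, 2, 4)          # pretend (N, C, H, W)
x_t = x.transpose(0, 2, 3, 1)                  # -> (N, H, W, C), i.e. shape (1, 2, 4, 3)

scores = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.5,  0.2]])
preds = np.argmax(scores, axis=1)              # array([1, 0]): index of the row maximum
relu = np.maximum(0, np.array([-2.0, 0.0, 3.5]))   # array([0., 0., 3.5])

# hypothetical stand-in for optim.py, just to demonstrate hasattr/getattr
optim_stub = types.SimpleNamespace(sgd=lambda w, dw: w - 1e-2 * dw)
if hasattr(optim_stub, 'sgd'):
    sgd_fn = getattr(optim_stub, 'sgd')        # bind the function object to a name
    print(sgd_fn(np.ones(2), np.array([0.1, 0.2])))   # [0.999 0.998]

print(stacked.shape, flat, x_t.shape, preds, relu)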

 

Notes on the CIFAR neural-network code:

The code is organized into three parts:

      Part 1: data preparation

      Part 2: constructing the neural-network model, which returns the loss and the gradients

      Part 3: feeding the data and the model into the solver, which trains the model, evaluates it on the validation set after every epoch, and keeps the parameter set with the best validation accuracy

 

Part 1: data preparation

Step 1: build lists and load each raw batch with with open(...) as f: pickle.load(f), then apply .reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype('float')

Step 2: join the per-batch lists into single training and test arrays with np.concatenate()

Step 3: split the returned arrays: the first 5000 training samples become the training set, samples 5000 to 5500 become the validation set, and the first 500 test samples become the test set

Step 4: subtract the mean image, i.e. - np.mean(X_train, axis=0), from every split, and use transpose() to move the channel axis to the front

Step 5: return the training, validation and test sets in a dictionary

Code: data_utils.py

import pickle
import numpy as np
import os

def load_CIFAR_batch(filename):
  """ load single batch of cifar """
  # Step 1: read the batch with pickle.load, reshape the array and transpose its dimensions
  with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1')
    X = datadict['data']
    Y = datadict['labels']
    X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
    Y = np.array(Y)
    return X, Y

def load_CIFAR10(ROOT):
  """ load all of cifar """
  xs = []
  ys = []
  # Step 2: append each batch to a list, then join them with np.concatenate along the sample axis
  for b in range(1,2):
    f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
    X, Y = load_CIFAR_batch(f)
    xs.append(X)
    ys.append(Y)
  # join the batches into single arrays
  Xtr = np.concatenate(xs)
  Ytr = np.concatenate(ys)
  del X, Y
  # load the test data
  Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
  return Xtr, Ytr, Xte, Yte


def get_CIFAR10_data(num_training=5000, num_validation=500, num_test=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for classifiers. These are the same steps as we used for the SVM, but
    condensed to a single function.
    """
    # Load the raw CIFAR-10 data

    cifar10_dir = 'D://BaiduNetdiskDownload//神經網絡入門基礎(PPT,代碼)//神經網絡入門基礎(PPT,代碼)//cifar-10-batches-py//'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    print(X_train.shape)
    # Subsample the data
    # Step 3: split the returned training and test arrays into training (5000), validation (500) and test (500) sets
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    # Step 4: subtract the mean image from the training, validation and test sets
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    
    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2).copy()
    X_val = X_val.transpose(0, 3, 1, 2).copy()
    X_test = X_test.transpose(0, 3, 1, 2).copy()

    # Package data into a dictionary
    # Step 5: return the training, validation and test sets as a dictionary
    return {
      'X_train': X_train, 'y_train': y_train,
      'X_val': X_val, 'y_val': y_val,
      'X_test': X_test, 'y_test': y_test,
    }
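
A hedged usage example for get_CIFAR10_data; note that cifar10_dir above is the author's local path, so point it at wherever your cifar-10-batches-py folder lives before running:

from data_utils import get_CIFAR10_data

data = get_CIFAR10_data(num_training=5000, num_validation=500, num_test=500)
for k, v in data.items():
    print(k, v.shape)
# Expected shapes (channel axis first after the transpose):
# X_train (5000, 3, 32, 32)   y_train (5000,)
# X_val   (500, 3, 32, 32)    y_val   (500,)
# X_test  (500, 3, 32, 32)    y_test  (500,)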
    

Part 2: constructing the neural-network model, used to compute the loss and the gradients

Step 1: __init__(input dimension, hidden-layer dimension, output dimension, weight initialization scale, regularization strength)

Step 2: create self.params to hold the weights and initialize W1, b1, W2, b2

Step 3: write the loss function: the forward pass computes the loss, the backward pass computes the gradient of every weight

          Forward pass:

                   Step 1: pass the input X through the first layer: the linear map x * w + b followed by the ReLU activation np.maximum(0, x)

                   Step 2: apply the second linear layer x * w + b to the first layer's output to obtain the per-class scores

                   Step 3: if no labels are given, return the scores directly as the prediction

                   Step 4: compute the class probabilities with softmax, e^(x - max(x)) / Σ e^(x - max(x)), and the cross-entropy loss -np.sum(np.log(probs[np.arange(N), y])) / N

                   Step 5: the gradient of the loss with respect to the scores is the softmax output with 1 subtracted at the correct class (divided by N), i.e. dx[np.arange(N), y] -= 1; return the loss together with this gradient (a numerical check of this claim appears after this list)
           Backward pass:

                   Step 1: take the gradient dout returned by the softmax step and push it back through the second layer, giving dh1 = dout · W2.T (passed back to the first layer), dW2 = h1.T · dout and db2 = np.sum(dout, axis=0)

                   Step 2: push dh1 back through the first layer, which performed two operations, a linear map and a ReLU; back-propagate through the ReLU first: where the layer input was positive the gradient passes through unchanged, where it was <= 0 the gradient is zeroed, i.e. dx[x <= 0] = 0; then back-propagate through the linear map to obtain dX, dW1 and db1 exactly as in the previous step

                    Step 3: store dW2, db2, dW1 and db1 in grads and return the loss and the gradients
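
A quick numerical check of the Step 5 claim above (the gradient of the softmax loss is the probability matrix with 1 subtracted at the true class, divided by N); this is a standalone sketch on toy scores, using the same formulas as softmax_loss in layers.py further below:

import numpy as np

def softmax_loss_ref(x, y):
    # stabilized softmax, mean cross-entropy, dx = (probs - one_hot) / N
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx

x = np.random.randn(4, 3)
y = np.array([0, 2, 1, 1])
loss, dx = softmax_loss_ref(x, y)

# central-difference check of a single entry of dx
h = 1e-5
xp = x.copy(); xp[0, 1] += h
xm = x.copy(); xm[0, 1] -= h
numeric = (softmax_loss_ref(xp, y)[0] - softmax_loss_ref(xm, y)[0]) / (2 * h)
print(dx[0, 1], numeric)   # the two values should agree to several decimal places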

Main code: fc_net.py

            

from layer_utils import *
import numpy as np
class TwoLayerNet(object):   
    # Step 1: set up the hyper-parameters in the constructor
    def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,           
                              weight_scale=1e-3, reg=0.0):    
        """    
        Initialize a new network.   
        Inputs:    
        - input_dim: An integer giving the size of the input    
        - hidden_dim: An integer giving the size of the hidden layer    
        - num_classes: An integer giving the number of classes to classify    
        - dropout: Scalar between 0 and 1 giving dropout strength.    
        - weight_scale: Scalar giving the standard deviation for random 
                        initialization of the weights.    
        - reg: Scalar giving L2 regularization strength.    
        """
        # Step 2: build the parameter dictionary and initialize W1, b1, W2, b2
        self.params = {}    
        self.reg = reg   
        self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)     
        self.params['b1'] = np.zeros((1, hidden_dim))    
        self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)  
        self.params['b2'] = np.zeros((1, num_classes))

    # Step 3: the loss method runs the forward and backward passes and returns loss and the gradient dictionary grads
    def loss(self, X, y=None):    
        """   
        Compute loss and gradient for a minibatch of data.    
        Inputs:    
        - X: Array of input data of shape (N, d_1, ..., d_k)    
        - y: Array of labels, of shape (N,). y[i] gives the label for X[i].  
        Returns:   
        If y is None, then run a test-time forward pass of the model and return:    
        - scores: Array of shape (N, C) giving classification scores, where              
                  scores[i, c] is the classification score for X[i] and class c. 
        If y is not None, then run a training-time forward and backward pass and    
        return a tuple of:    
        - loss: Scalar value giving the loss   
        - grads: Dictionary with the same keys as self.params, mapping parameter             
                 names to gradients of the loss with respect to those parameters.    
        """
        # forward pass: compute the scores and the loss
        scores = None
        N = X.shape[0]
        # Unpack variables from the params dictionary
        # fetch the current values of the weights and biases
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        # Step 1: first layer: linear map followed by ReLU; h1 is the first layer's output
        h1, cache1 = affine_relu_forward(X, W1, b1)
        # Step 2: second layer: linear map only, producing the class scores
        out, cache2 = affine_forward(h1, W2, b2)
        scores = out              # (N,C)
        # Step 3: if there are no labels, return the scores directly as the prediction
        if y is None:   
            return scores
        # Step 4: compute the loss and the gradient of the softmax loss with respect to the scores
        loss, grads = 0, {}
        data_loss, dscores = softmax_loss(scores, y)
        # add the L2 regularization penalty
        reg_loss = 0.5 * self.reg * np.sum(W1*W1) + 0.5 * self.reg * np.sum(W2*W2)
        loss = data_loss + reg_loss

        # backward pass: compute the gradients
        # Step 1: back-propagate through the second layer to get dh1, dW2 and db2
        dh1, dW2, db2 = affine_backward(dscores, cache2)
        # Step 2: back-propagate through the ReLU and the linear map of the first layer
        dX, dW1, db1 = affine_relu_backward(dh1, cache1)
        # Add the regularization gradient contribution
        # add the gradient of the regularization term to dW2 and dW1
        dW2 += self.reg * W2
        dW1 += self.reg * W1
        # Step 3: store the gradients in grads and return the loss and grads
        grads['W1'] = dW1
        grads['b1'] = db1
        grads['W2'] = dW2
        grads['b2'] = db2

        return loss, grads
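
A minimal smoke test for TwoLayerNet on random data, a sketch that assumes fc_net.py and its helpers are on the import path (the inputs are fake, not real CIFAR images):

import numpy as np
from fc_net import TwoLayerNet

np.random.seed(0)
model = TwoLayerNet(input_dim=3*32*32, hidden_dim=50, num_classes=10, reg=0.1)

X = np.random.randn(5, 3, 32, 32)        # a fake minibatch of 5 "images"
y = np.array([0, 3, 7, 2, 9])

scores = model.loss(X)                   # no labels: test-time forward pass
print(scores.shape)                      # (5, 10)

loss, grads = model.loss(X, y)           # with labels: loss plus gradients
print(loss)                              # roughly ln(10) ~ 2.3 plus a tiny reg term
print(sorted(grads.keys()))              # ['W1', 'W2', 'b1', 'b2']
print(grads['W1'].shape, grads['b2'].shape)   # (3072, 50) (1, 10)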

Helper code: layer_utils.py

from layers import *



def affine_relu_forward(x, w, b):
  """
  Convenience layer that performs an affine transform followed by a ReLU

  Inputs:
  - x: Input to the affine layer
  - w, b: Weights for the affine layer

  Returns a tuple of:
  - out: Output from the ReLU
  - cache: Object to give to the backward pass
  """
  a, fc_cache = affine_forward(x, w, b)
  out, relu_cache = relu_forward(a)
  cache = (fc_cache, relu_cache)
  return out, cache


def affine_relu_backward(dout, cache):
  """
  Backward pass for the affine-relu convenience layer
  """
  fc_cache, relu_cache = cache
  da = relu_backward(dout, relu_cache)
  dx, dw, db = affine_backward(da, fc_cache)
  return dx, dw, db




def conv_relu_forward(x, w, b, conv_param):
  """
  A convenience layer that performs a convolution followed by a ReLU.

  Inputs:
  - x: Input to the convolutional layer
  - w, b, conv_param: Weights and parameters for the convolutional layer
  
  Returns a tuple of:
  - out: Output from the ReLU
  - cache: Object to give to the backward pass
  """
  a, conv_cache = conv_forward_fast(x, w, b, conv_param)
  out, relu_cache = relu_forward(a)
  cache = (conv_cache, relu_cache)
  return out, cache


def conv_relu_backward(dout, cache):
  """
  Backward pass for the conv-relu convenience layer.
  """
  conv_cache, relu_cache = cache
  da = relu_backward(dout, relu_cache)
  dx, dw, db = conv_backward_fast(da, conv_cache)
  return dx, dw, db


def conv_relu_pool_forward(x, w, b, conv_param, pool_param):
  """
  Convenience layer that performs a convolution, a ReLU, and a pool.

  Inputs:
  - x: Input to the convolutional layer
  - w, b, conv_param: Weights and parameters for the convolutional layer
  - pool_param: Parameters for the pooling layer

  Returns a tuple of:
  - out: Output from the pooling layer
  - cache: Object to give to the backward pass
  """
  a, conv_cache = conv_forward_fast(x, w, b, conv_param)
  s, relu_cache = relu_forward(a)
  out, pool_cache = max_pool_forward_fast(s, pool_param)
  cache = (conv_cache, relu_cache, pool_cache)
  return out, cache


def conv_relu_pool_backward(dout, cache):
  """
  Backward pass for the conv-relu-pool convenience layer
  """
  conv_cache, relu_cache, pool_cache = cache
  ds = max_pool_backward_fast(dout, pool_cache)
  da = relu_backward(ds, relu_cache)
  dx, dw, db = conv_backward_fast(da, conv_cache)
  return dx, dw, db
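
A small shape check for the affine_relu convenience layer above, a sketch assuming layer_utils.py (and layers.py) are importable; the conv helpers are skipped here because conv_forward_fast is not included in this post:

import numpy as np
from layer_utils import affine_relu_forward, affine_relu_backward

x = np.random.randn(4, 3, 8, 8)          # 4 samples, flattened internally to (4, 192)
w = np.random.randn(3 * 8 * 8, 10)
b = np.zeros(10)

out, cache = affine_relu_forward(x, w, b)
print(out.shape, out.min())              # (4, 10); no entry is negative thanks to the ReLU

dout = np.random.randn(4, 10)
dx, dw, db = affine_relu_backward(dout, cache)
print(dx.shape, dw.shape, db.shape)      # (4, 3, 8, 8) (192, 10) (1, 10)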

Helper code: layers.py

import numpy as np

def affine_forward(x, w, b):   
    """    
    Computes the forward pass for an affine (fully-connected) layer. 
    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N   
    examples, where each example x[i] has shape (d_1, ..., d_k). We will    
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and    
    then transform it to an output vector of dimension M.    
    Inputs:    
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)    
    - w: A numpy array of weights, of shape (D, M)    
    - b: A numpy array of biases, of shape (M,)   
    Returns a tuple of:    
    - out: output, of shape (N, M)    
    - cache: (x, w, b)   
    """
    out = None
    # Reshape x into rows
    N = x.shape[0]
    x_row = x.reshape(N, -1)         # (N,D)
    out = np.dot(x_row, w) + b       # (N,M)
    cache = (x, w, b)

    return out, cache

def affine_backward(dout, cache):   
    """    
    Computes the backward pass for an affine layer.    
    Inputs:    
    - dout: Upstream derivative, of shape (N, M)    
    - cache: Tuple of: 
    - x: Input data, of shape (N, d_1, ... d_k)    
    - w: Weights, of shape (D, M)    
    Returns a tuple of:   
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)    
    - dw: Gradient with respect to w, of shape (D, M) 
    - db: Gradient with respect to b, of shape (M,)    
    """    
    x, w, b = cache    
    dx, dw, db = None, None, None   
    dx = np.dot(dout, w.T)                       # (N,D)    
    dx = np.reshape(dx, x.shape)                 # (N,d1,...,d_k)   
    x_row = x.reshape(x.shape[0], -1)            # (N,D)    
    dw = np.dot(x_row.T, dout)                   # (D,M)    
    db = np.sum(dout, axis=0, keepdims=True)     # (1,M)    

    return dx, dw, db

def relu_forward(x):   
    """    
    Computes the forward pass for a layer of rectified linear units (ReLUs).    
    Input:    
    - x: Inputs, of any shape    
    Returns a tuple of:    
    - out: Output, of the same shape as x    
    - cache: x    
    """   
    out = None    
    out = ReLU(x)    
    cache = x    

    return out, cache

def relu_backward(dout, cache):   
    """  
    Computes the backward pass for a layer of rectified linear units (ReLUs).   
    Input:    
    - dout: Upstream derivatives, of any shape    
    - cache: Input x, of same shape as dout    
    Returns:    
    - dx: Gradient with respect to x    
    """    
    dx, x = None, cache    
    dx = dout.copy()   # copy so the upstream gradient is not modified in place
    dx[x <= 0] = 0    

    return dx

def svm_loss(x, y):   
    """    
    Computes the loss and gradient using for multiclass SVM classification.    
    Inputs:    
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class         
         for the ith input.    
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and         
         0 <= y[i] < C   
    Returns a tuple of:    
    - loss: Scalar giving the loss   
    - dx: Gradient of the loss with respect to x    
    """    
    N = x.shape[0]   
    correct_class_scores = x[np.arange(N), y]    
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)    
    margins[np.arange(N), y] = 0   
    loss = np.sum(margins) / N   
    num_pos = np.sum(margins > 0, axis=1)    
    dx = np.zeros_like(x)   
    dx[margins > 0] = 1    
    dx[np.arange(N), y] -= num_pos    
    dx /= N    

    return loss, dx

def softmax_loss(x, y):    
    """    
    Computes the loss and gradient for softmax classification.    Inputs:    
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class         
    for the ith input.    
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and         
         0 <= y[i] < C   
    Returns a tuple of:    
    - loss: Scalar giving the loss    
    - dx: Gradient of the loss with respect to x   
    """
    # compute the class probabilities (numerically stable softmax)
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))    
    probs /= np.sum(probs, axis=1, keepdims=True)    
    N = x.shape[0]
    # compute the mean cross-entropy loss
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    # compute the gradient of the softmax loss with respect to x
    dx = probs.copy()    
    dx[np.arange(N), y] -= 1    
    dx /= N    

    return loss, dx

def ReLU(x):    
    """ReLU non-linearity."""    
    return np.maximum(0, x)
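
A hedged gradient-check sketch for the layers above, assuming layers.py is importable; it compares the analytic affine_backward against a numerical gradient of affine_forward:

import numpy as np
from layers import affine_forward, affine_backward

def num_grad(f, x, h=1e-5):
    # central-difference numerical gradient of the scalar-valued f, perturbing x in place
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; fp = f()
        x[i] = old - h; fm = f()
        x[i] = old
        grad[i] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

np.random.seed(1)
x = np.random.randn(4, 5)
w = np.random.randn(5, 3)
b = np.random.randn(3)
dout = np.random.randn(4, 3)

out, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# numerical gradient of the scalar sum(out * dout) with respect to w
dw_num = num_grad(lambda: np.sum(affine_forward(x, w, b)[0] * dout), w)
print(np.max(np.abs(dw - dw_num)))       # should be very small, around 1e-8 or less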

Part 3: feed the data and the model into the solver; sample minibatches of size batch_size to get loss and grads, use grads to update params, measure accuracy on the validation set, and return the parameter set that achieved the best validation accuracy

Step 1: read each setting from the keyword-argument dictionary, falling back to a default when it is missing, and raise an error for any unrecognized arguments; use hasattr to check that the named update rule exists in optim.py and getattr to bind that function object to its name

Step 2: initialize the bookkeeping state, e.g. epoch = 0 and the loss/accuracy history lists; because momentum updates are used, also keep one optim config (learning rate, velocity) per parameter name

Step 3: train the parameters: num_train / batch_size gives the number of iterations per epoch, and num_epochs * iterations_per_epoch gives the total number of iterations

Step 4: loop and update the parameters:

               Step 1: randomly choose batch_size indices from the training samples to form one minibatch of data and labels

 

               Step 2: feed the minibatch into model.loss to obtain loss and grads

               Step 3: update each parameter with momentum gradient descent, v = 0.9 * v - learning_rate * dw, w = w + v, and store the new w and v back

Step 5: after every epoch multiply the learning rate by the decay factor (lr_decay), and compute the training and validation accuracy once per epoch: call model.loss without labels to get the scores and use (y == np.argmax(scores, axis=1)).mean()

Step 6: whenever the validation accuracy beats the best seen so far, save a copy of the parameters; at the end of training, assign that best copy back to model.params (a small accuracy-computation sketch follows)
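
A tiny sketch of the per-epoch accuracy computation described in Step 5, with made-up scores and labels:

import numpy as np

scores = np.array([[2.0, 0.1, -1.0],     # predicted class 0
                   [0.3, 0.2,  1.5],     # predicted class 2
                   [0.0, 0.9,  0.1]])    # predicted class 1
y = np.array([0, 2, 0])

y_pred = np.argmax(scores, axis=1)       # array([0, 2, 1])
acc = (y_pred == y).mean()               # 2 of 3 correct -> about 0.667
print(y_pred, acc)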

 

Code:
Main module: solver.py

import numpy as np

import optim


class Solver(object):
  """
  A Solver encapsulates all the logic necessary for training classification
  models. The Solver performs stochastic gradient descent using different
  update rules defined in optim.py.

  The solver accepts both training and validation data and labels so it can
  periodically check classification accuracy on both training and validation
  data to watch out for overfitting.

  To train a model, you will first construct a Solver instance, passing the
  model, dataset, and various options (learning rate, batch size, etc) to the
  constructor. You will then call the train() method to run the optimization
  procedure and train the model.
  
  After the train() method returns, model.params will contain the parameters
  that performed best on the validation set over the course of training.
  In addition, the instance variable solver.loss_history will contain a list
  of all losses encountered during training and the instance variables
  solver.train_acc_history and solver.val_acc_history will be lists containing
  the accuracies of the model on the training and validation set at each epoch.
  
  Example usage might look something like this:
  
  data = {
    'X_train': # training data
    'y_train': # training labels
    'X_val': # validation data
    'y_val': # validation labels
  }
  model = MyAwesomeModel(hidden_size=100, reg=10)
  solver = Solver(model, data,
                  update_rule='sgd',
                  optim_config={
                    'learning_rate': 1e-3,
                  },
                  lr_decay=0.95,
                  num_epochs=10, batch_size=100,
                  print_every=100)
  solver.train()


  A Solver works on a model object that must conform to the following API:

  - model.params must be a dictionary mapping string parameter names to numpy
    arrays containing parameter values.

  - model.loss(X, y) must be a function that computes training-time loss and
    gradients, and test-time classification scores, with the following inputs
    and outputs:

    Inputs:
    - X: Array giving a minibatch of input data of shape (N, d_1, ..., d_k)
    - y: Array of labels, of shape (N,) giving labels for X where y[i] is the
      label for X[i].

    Returns:
    If y is None, run a test-time forward pass and return:
    - scores: Array of shape (N, C) giving classification scores for X where
      scores[i, c] gives the score of class c for X[i].

    If y is not None, run a training time forward and backward pass and return
    a tuple of:
    - loss: Scalar giving the loss
    - grads: Dictionary with the same keys as self.params mapping parameter
      names to gradients of the loss with respect to those parameters.
  """

  def __init__(self, model, data, **kwargs):
    """
    Construct a new Solver instance.
    
    Required arguments:
    - model: A model object conforming to the API described above
    - data: A dictionary of training and validation data with the following:
      'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images
      'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images
      'y_train': Array of shape (N_train,) giving labels for training images
      'y_val': Array of shape (N_val,) giving labels for validation images
      
    Optional arguments:
    - update_rule: A string giving the name of an update rule in optim.py.
      Default is 'sgd'.
    - optim_config: A dictionary containing hyperparameters that will be
      passed to the chosen update rule. Each update rule requires different
      hyperparameters (see optim.py) but all update rules require a
      'learning_rate' parameter so that should always be present.
    - lr_decay: A scalar for learning rate decay; after each epoch the learning
      rate is multiplied by this value.
    - batch_size: Size of minibatches used to compute loss and gradient during
      training.
    - num_epochs: The number of epochs to run for during training.
    - print_every: Integer; training losses will be printed every print_every
      iterations.
    - verbose: Boolean; if set to false then no output will be printed during
      training.
    """
    # Step 1: store the data splits and read the keyword arguments with pop;
    # hasattr checks that the update rule exists and getattr binds the function object to its name
    self.model = model
    self.X_train = data['X_train']
    self.y_train = data['y_train']
    self.X_val = data['X_val']
    self.y_val = data['y_val']
    
    # Unpack keyword arguments
    # .pop returns the given default when the keyword argument was not passed in
    self.update_rule = kwargs.pop('update_rule', 'sgd')
    self.optim_config = kwargs.pop('optim_config', {})
    self.lr_decay = kwargs.pop('lr_decay', 1.0)
    self.batch_size = kwargs.pop('batch_size', 100)
    self.num_epochs = kwargs.pop('num_epochs', 10)

    self.print_every = kwargs.pop('print_every', 10)
    self.verbose = kwargs.pop('verbose', True)

    # Throw an error if there are extra keyword arguments
    if len(kwargs) > 0:
      extra = ', '.join('"%s"' % k for k in kwargs.keys())
      raise ValueError('Unrecognized arguments %s' % extra)

    # Make sure the update rule exists, then replace the string
    # name with the actual function
    if not hasattr(optim, self.update_rule):
      raise ValueError('Invalid update_rule "%s"' % self.update_rule)
    self.update_rule = getattr(optim, self.update_rule)

    self._reset()


  def _reset(self):
    """
    Set up some book-keeping variables for optimization. Don't call this
    manually.
    """
    # Set up some variables for book-keeping
    # Step 2: initialize the bookkeeping variables, plus one optim config per parameter (holding the learning rate and, for momentum, the velocity)
    self.epoch = 0
    self.best_val_acc = 0
    self.best_params = {}
    self.loss_history = []
    self.train_acc_history = []
    self.val_acc_history = []

    # Make a deep copy of the optim_config for each parameter
    self.optim_configs = {}
    for p in self.model.params:
      d = {k: v for k, v in self.optim_config.items()}
      self.optim_configs[p] = d


  def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.items():
      dw = grads[p]
      config = self.optim_configs[p]
      next_w, next_config = self.update_rule(w, dw, config)
      self.model.params[p] = next_w
      self.optim_configs[p] = next_config


  def check_accuracy(self, X, y, num_samples=None, batch_size=100):
    """
    Check accuracy of the model on the provided data.
    
    Inputs:
    - X: Array of data, of shape (N, d_1, ..., d_k)
    - y: Array of labels, of shape (N,)
    - num_samples: If not None, subsample the data and only test the model
      on num_samples datapoints.
    - batch_size: Split X and y into batches of this size to avoid using too
      much memory.
    Returns:
    - acc: Scalar giving the fraction of instances that were correctly
      classified by the model.
    """
    
    # subsample the data when computing training-set accuracy
    N = X.shape[0]
    if num_samples is not None and N > num_samples:
      mask = np.random.choice(N, num_samples)
      N = num_samples
      X = X[mask]
      y = y[mask]

    # Compute predictions in batches
    num_batches = N // batch_size
    if N % batch_size != 0:
      num_batches += 1
    y_pred = []
    # predict in batches so only part of the data is scored at a time
    for i in range(int(num_batches)):
      start = i * batch_size
      end = (i + 1) * batch_size
      scores = self.model.loss(X[start:end])
      y_pred.append(np.argmax(scores, axis=1))
    y_pred = np.hstack(y_pred)
    acc = np.mean(y_pred == y)

    return acc


  def train(self):
    """
    Run optimization to train the model.
    """
    # Step 3: compute the number of iterations
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train // self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in range(int(num_iterations)):
      # Step 4: on every iteration, sample a random minibatch, record the loss, and update every parameter (and its velocity, used by the momentum rule)
      self._step()

      # Maybe print training loss
      # print the current loss
      if self.verbose and t % self.print_every == 0:
        print('(Iteration %d / %d) loss: %f' % (
               t + 1, num_iterations, self.loss_history[-1]))

      # At the end of every epoch, increment the epoch counter and decay the
      # learning rate.
      epoch_end = (t + 1) % iterations_per_epoch == 0
      # Step 5: at the end of every epoch, decay the learning rate in every optim config
      if epoch_end:
        self.epoch += 1
        for k in self.optim_configs:
          self.optim_configs[k]['learning_rate'] *= self.lr_decay

      # Check train and val accuracy on the first iteration, the last
      # iteration, and at the end of each epoch.
      # Step 6: compute the training and validation accuracy once per epoch (and on the first and last iteration)
      first_it = (t == 0)
      last_it = (t == num_iterations - 1)
      if first_it or last_it or epoch_end:
        train_acc = self.check_accuracy(self.X_train, self.y_train,
                                        num_samples=1000)
        val_acc = self.check_accuracy(self.X_val, self.y_val)
        self.train_acc_history.append(train_acc)
        self.val_acc_history.append(val_acc)

        if self.verbose:
          print('(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                 self.epoch, self.num_epochs, train_acc, val_acc))

        # Keep track of the best model
        if val_acc > self.best_val_acc:
          self.best_val_acc = val_acc
          self.best_params = {}
          for k, v in self.model.params.items():
            self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params

Helper module: optim.py, containing the parameter-update rules: sgd, sgd_momentum, rmsprop, adam

import numpy as np

def sgd(w, dw, config=None):    
    """    
    Performs vanilla stochastic gradient descent.    
    config format:    
    - learning_rate: Scalar learning rate.    
    """    
    if config is None: config = {}    
    config.setdefault('learning_rate', 1e-2)    
    w -= config['learning_rate'] * dw    

    return w, config

def sgd_momentum(w, dw, config=None):    
    """    
    Performs stochastic gradient descent with momentum.    
    config format:    
    - learning_rate: Scalar learning rate.    
    - momentum: Scalar between 0 and 1 giving the momentum value.                
    Setting momentum = 0 reduces to sgd.    
    - velocity: A numpy array of the same shape as w and dw used to store a moving    
    average of the gradients.   
    """   
    if config is None: config = {}    
    config.setdefault('learning_rate', 1e-2)   
    config.setdefault('momentum', 0.9)    
    v = config.get('velocity', np.zeros_like(w))    
    next_w = None    
    v = config['momentum'] * v - config['learning_rate'] * dw
    next_w = w + v
    config['velocity'] = v    

    return next_w, config

def rmsprop(x, dx, config=None):    
    """    
    Uses the RMSProp update rule, which uses a moving average of squared gradient    
    values to set adaptive per-parameter learning rates.    
    config format:    
    - learning_rate: Scalar learning rate.    
    - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared                  
    gradient cache.    
    - epsilon: Small scalar used for smoothing to avoid dividing by zero.    
    - cache: Moving average of second moments of gradients.   
    """    
    if config is None: config = {}    
    config.setdefault('learning_rate', 1e-2)  
    config.setdefault('decay_rate', 0.99)    
    config.setdefault('epsilon', 1e-8)    
    config.setdefault('cache', np.zeros_like(x))    
    next_x = None    
    cache = config['cache']    
    decay_rate = config['decay_rate']    
    learning_rate = config['learning_rate']    
    epsilon = config['epsilon']    
    cache = decay_rate * cache + (1 - decay_rate) * (dx**2)    
    x += - learning_rate * dx / (np.sqrt(cache) + epsilon)  
    config['cache'] = cache    
    next_x = x    

    return next_x, config

def adam(x, dx, config=None):    
    """    
    Uses the Adam update rule, which incorporates moving averages of both the  
    gradient and its square and a bias correction term.    
    config format:    
    - learning_rate: Scalar learning rate.    
    - beta1: Decay rate for moving average of first moment of gradient.    
    - beta2: Decay rate for moving average of second moment of gradient.   
    - epsilon: Small scalar used for smoothing to avoid dividing by zero.    
    - m: Moving average of gradient.    
    - v: Moving average of squared gradient.    
    - t: Iteration number.   
    """    
    if config is None: config = {}    
    config.setdefault('learning_rate', 1e-3)    
    config.setdefault('beta1', 0.9)    
    config.setdefault('beta2', 0.999)    
    config.setdefault('epsilon', 1e-8)    
    config.setdefault('m', np.zeros_like(x))    
    config.setdefault('v', np.zeros_like(x))    
    config.setdefault('t', 0)   
    next_x = None    
    m = config['m']    
    v = config['v']    
    beta1 = config['beta1']    
    beta2 = config['beta2']    
    learning_rate = config['learning_rate']    
    epsilon = config['epsilon']   
    t = config['t']    
    t += 1    
    m = beta1 * m + (1 - beta1) * dx    
    v = beta2 * v + (1 - beta2) * (dx**2)    
    m_bias = m / (1 - beta1**t)    
    v_bias = v / (1 - beta2**t)    
    x += - learning_rate * m_bias / (np.sqrt(v_bias) + epsilon)    
    next_x = x    
    config['m'] = m    
    config['v'] = v    
    config['t'] = t    

    return next_x, config
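
A hedged toy comparison of two of the update rules above on the 1-D quadratic f(w) = 0.5 * w^2 (whose gradient is simply w), assuming optim.py is importable; it only illustrates the (w, dw, config) -> (next_w, config) interface shared by all four rules:

import numpy as np
from optim import sgd, sgd_momentum

w_sgd = np.array([5.0])
w_mom = np.array([5.0])
cfg_sgd = {'learning_rate': 0.1}
cfg_mom = {'learning_rate': 0.1, 'momentum': 0.9}

for step in range(20):
    # the gradient of 0.5 * w**2 is w itself
    w_sgd, cfg_sgd = sgd(w_sgd, w_sgd.copy(), cfg_sgd)
    w_mom, cfg_mom = sgd_momentum(w_mom, w_mom.copy(), cfg_mom)

print(w_sgd, w_mom)   # both decay toward 0; the momentum run overshoots and oscillates slightly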

 

The main script that ties all of the code above together: two_layer_fc_net_start.py

import matplotlib.pyplot as plt
from fc_net import *
from data_utils import get_CIFAR10_data
from solver import Solver
import numpy as np
# Part 1: load the data
data = get_CIFAR10_data()
# Part 2: build the model
model = TwoLayerNet(reg=0.9)
# Part 3: train the model's parameters
solver = Solver(model, data,                
                lr_decay=0.95,                
                print_every=100, num_epochs=40, batch_size=400, 
                update_rule='sgd_momentum',                
                optim_config={'learning_rate': 5e-4, 'momentum': 0.9})

solver.train()                 
# plot the training loss and the accuracy curves
plt.subplot(2, 1, 1) 
plt.title('Training loss')
plt.plot(solver.loss_history, 'o')
plt.xlabel('Iteration')

plt.subplot(2, 1, 2)
plt.title('Accuracy')
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.plot([0.5] * len(solver.val_acc_history), 'k--')
plt.xlabel('Epoch')
plt.legend(loc='lower right')
plt.gcf().set_size_inches(15, 12)
plt.show()

# evaluate the trained model on the validation and test sets
best_model = model
y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)
y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)
print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())
print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())
# Validation set accuracy:  about 52.9%
# Test set accuracy:  about 54.7%


# Visualize the weights of the best network
"""
from vis_utils import visualize_grid

def show_net_weights(net):    
    W1 = net.params['W1']    
    W1 = W1.reshape(3, 32, 32, -1).transpose(3, 1, 2, 0)    
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))   
    plt.gca().axis('off')    
show_net_weights(best_model)
plt.show()
"""

 

 

         

               

 

