CS231n Solver.py Explained

Reposted article. Originally published 2016-06-14 09:02. Tags: cs231n, numpy

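Solver is a class that takes a model together with the data and labels, and solves for the model's weights. Hyperparameters such as the update rule, learning rate, batch size, and number of epochs are set on the Solver and tuned to get the best training result.

Member functions

Initialization function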

# Assumes `import numpy as np` and `from cs231n import optim` at the top of solver.py
def __init__(self, model, data, **kwargs):
    """
    Construct a new Solver instance.

    Required arguments:
    - model: A model object conforming to the API described above
    - data: A dictionary of training and validation data with the following:
      'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images
      'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images
      'y_train': Array of shape (N_train,) giving labels for training images
      'y_val': Array of shape (N_val,) giving labels for validation images

    Optional arguments:
    - update_rule: A string giving the name of an update rule in optim.py.
      Default is 'sgd'.
    - optim_config: A dictionary containing hyperparameters that will be
      passed to the chosen update rule. Each update rule requires different
      hyperparameters (see optim.py) but all update rules require a
      'learning_rate' parameter so that should always be present.
    - lr_decay: A scalar for learning rate decay; after each epoch the learning
      rate is multiplied by this value.
    - batch_size: Size of minibatches used to compute loss and gradient during
      training.
    - num_epochs: The number of epochs to run for during training.
    - print_every: Integer; training losses will be printed every print_every
      iterations.
    - verbose: Boolean; if set to false then no output will be printed during
      training.
    """
    self.model = model
    self.X_train = data['X_train']
    self.y_train = data['y_train']
    self.X_val = data['X_val']
    self.y_val = data['y_val']

    # Unpack keyword arguments
    self.update_rule = kwargs.pop('update_rule', 'sgd')
    self.optim_config = kwargs.pop('optim_config', {})
    self.lr_decay = kwargs.pop('lr_decay', 1.0)
    self.batch_size = kwargs.pop('batch_size', 100)
    self.num_epochs = kwargs.pop('num_epochs', 10)

    self.print_every = kwargs.pop('print_every', 100)
    self.verbose = kwargs.pop('verbose', True)

    # Throw an error if there are extra keyword arguments
    if len(kwargs) > 0:
        extra = ', '.join('"%s"' % k for k in kwargs.keys())
        raise ValueError('Unrecognized arguments %s' % extra)

    # Make sure the update rule exists, then replace the string
    # name with the actual function
    if not hasattr(optim, self.update_rule):
        raise ValueError('Invalid update_rule "%s"' % self.update_rule)
    self.update_rule = getattr(optim, self.update_rule)

    self._reset()

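The initialization function takes the following arguments:

(1) model: an object that defines the network architecture. It is independent of the data and of the optimization method; it is purely the network structure, and it holds the parameters (model.params) together with the function that runs the forward and backward passes (model.loss).

(2) data: a dictionary containing the training set X_train, the validation set X_val, the training labels y_train, and the validation labels y_val.

(3) **kwargs: collects any remaining keyword arguments into a dictionary. The initialization function pops the known keys one by one, and any key that was not supplied falls back to a default value. Whatever is left in the dictionary afterwards is an unrecognized argument and raises a ValueError.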
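The pop-with-default pattern described in (3) can be tried in isolation. The following is a minimal sketch, not part of solver.py: `parse_options` is a hypothetical helper that handles only three of the options, just to show how defaults are applied and how a misspelled keyword is caught.

```python
def parse_options(**kwargs):
    """Hypothetical helper mirroring how Solver.__init__ unpacks **kwargs."""
    options = {
        'update_rule': kwargs.pop('update_rule', 'sgd'),
        'batch_size': kwargs.pop('batch_size', 100),
        'num_epochs': kwargs.pop('num_epochs', 10),
    }
    # Anything still left in kwargs was never popped, so it is unrecognized.
    if kwargs:
        extra = ', '.join('"%s"' % k for k in kwargs)
        raise ValueError('Unrecognized arguments %s' % extra)
    return options


# Only batch_size is supplied; the other two fall back to their defaults.
opts = parse_options(batch_size=50)

# A misspelled keyword is left over after the pops and triggers the error.
try:
    parse_options(batch_sz=50)
    error_message = None
except ValueError as e:
    error_message = str(e)
```

Popping rather than reading the keys is what makes the leftover check possible: after all known options are removed, a non-empty dictionary can only contain mistakes.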

Reset function

def _reset(self):
    """
    Set up some book-keeping variables for optimization. Don't call this
    manually.
    """
    # Set up some variables for book-keeping
    self.epoch = 0
    self.best_val_acc = 0
    self.best_params = {}
    self.loss_history = []
    self.train_acc_history = []
    self.val_acc_history = []

    # Make a separate (shallow) copy of optim_config for each parameter
    self.optim_configs = {}
    for p in self.model.params:
        d = {k: v for k, v in self.optim_config.items()}
        self.optim_configs[p] = d

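The reset function re-initializes the Solver's book-keeping variables. Pay particular attention to the new optim_configs dictionary, which is not the same thing as self.optim_config. self.optim_config is the single dictionary of optimization hyperparameters passed in by the user, whereas self.optim_configs maps each parameter name to its own copy of that dictionary, so that each parameter can carry its own optimizer state and its own learning rate.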
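Why each parameter needs its own copy becomes clearer with an update rule that stores state in its config. The sketch below is illustrative only: `sgd_momentum` here is a simplified stand-in written for this example (not copied from optim.py), and the parameter names `W1` and `b1` are assumptions.

```python
import numpy as np


def sgd_momentum(w, dw, config):
    """Illustrative momentum update that keeps its velocity inside config."""
    v = config.get('velocity', np.zeros_like(w))
    v = config['momentum'] * v - config['learning_rate'] * dw
    config['velocity'] = v  # state is written back into the config
    return w + v, config


params = {'W1': np.ones((2, 3)), 'b1': np.ones(3)}  # toy parameters
optim_config = {'learning_rate': 0.1, 'momentum': 0.9}

# One independent copy per parameter, as _reset does.
optim_configs = {p: dict(optim_config) for p in params}

for p, w in list(params.items()):
    dw = np.ones_like(w)  # pretend gradient
    params[p], optim_configs[p] = sgd_momentum(w, dw, optim_configs[p])

# Each parameter now owns a velocity with its own shape.
velocity_shapes = {p: optim_configs[p]['velocity'].shape for p in params}
```

If both parameters shared one dictionary, the velocity stored for `W1` (shape 2x3) would be read back when updating `b1` (shape 3), mixing state between unrelated parameters. The user-supplied `optim_config` also stays free of any optimizer state.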

_step function

def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]  # number of training examples
    # Randomly pick batch_size indices (np.random.choice samples with
    # replacement by default)
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]  # select those rows of the training set
    y_batch = self.y_train[batch_mask]  # select the matching labels

    # Compute loss and gradient by calling the model's loss function
    loss, grads = self.model.loss(X_batch, y_batch)
    # Store the loss so it can be plotted later. Note that every loss value
    # is computed from a single minibatch.
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.items():
        dw = grads[p]
        config = self.optim_configs[p]
        # Note: self.update_rule is a function here, because __init__
        # replaced the string name with getattr(optim, ...)
        next_w, next_config = self.update_rule(w, dw, config)
        self.model.params[p] = next_w
        self.optim_configs[p] = next_config
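The update loop in _step works because the string name of the update rule was swapped for a callable in __init__. The following is a small sketch of that lookup under stated assumptions: the `optim` object here is a `SimpleNamespace` standing in for the real optim module, and `sgd` is a plain gradient-descent rule written for this example with the same `(w, dw, config) -> (next_w, config)` shape that _step expects.

```python
import numpy as np
from types import SimpleNamespace


def sgd(w, dw, config):
    """Plain gradient descent with the (w, dw, config) -> (next_w, config) shape."""
    next_w = w - config['learning_rate'] * dw
    return next_w, config


# Hypothetical stand-in for the optim module, just for this example.
optim = SimpleNamespace(sgd=sgd)

update_rule = 'sgd'  # starts out as a string
if not hasattr(optim, update_rule):
    raise ValueError('Invalid update_rule "%s"' % update_rule)
update_rule = getattr(optim, update_rule)  # now a callable

w = np.array([1.0, 2.0])
dw = np.array([0.5, -0.5])
next_w, next_config = update_rule(w, dw, {'learning_rate': 0.1})
```

Looking the rule up by name means a new optimizer only has to be added as a function with this signature; the Solver itself does not change.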

check_accuracy function

def check_accuracy(self, X, y, num_samples=None, batch_size=100):
    """
    Check accuracy of the model on the provided data.

    Inputs:
    - X: Array of data, of shape (N, d_1, ..., d_k)
    - y: Array of labels, of shape (N,)
    - num_samples: If not None, subsample the data and only test the model
      on num_samples datapoints.
    - batch_size: Split X and y into batches of this size to avoid using too
      much memory.

    Returns:
    - acc: Scalar giving the fraction of instances that were correctly
      classified by the model.
    """

    # Maybe subsample the data
    N = X.shape[0]  # number of input examples
    if num_samples is not None and N > num_samples:  # too many: draw a random subset
        mask = np.random.choice(N, num_samples)
        N = num_samples
        X = X[mask]  # keep only the sampled examples
        y = y[mask]

    # Compute predictions in batches
    num_batches = N // batch_size  # how many full batches fit into N
    if N % batch_size != 0:  # if N is not evenly divisible
        num_batches += 1  # add one more batch for the remainder
    y_pred = []  # predictions
    for i in range(num_batches):  # loop over the batches
        start = i * batch_size  # start index of the current batch
        end = (i + 1) * batch_size  # end index of the current batch
        scores = self.model.loss(X[start:end])  # called without labels: returns class scores
        y_pred.append(np.argmax(scores, axis=1))  # highest score gives the predicted class
    y_pred = np.hstack(y_pred)  # concatenate the predictions of all batches
    acc = np.mean(y_pred == y)  # fraction of correct predictions

    return acc  # return the accuracy

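The predictions are computed batch by batch and then concatenated so that a large number of examples does not exceed the available memory.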
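The batching logic can be checked on its own with a toy scoring function. This is a minimal sketch, not the Solver's code: `predict_in_batches` and the identity "model" are assumptions made for the example, and the ceiling division is written in one line instead of the two-step form used in check_accuracy.

```python
import numpy as np


def predict_in_batches(score_fn, X, batch_size=100):
    """Run score_fn over X in chunks and return the predicted class per row."""
    N = X.shape[0]
    num_batches = (N + batch_size - 1) // batch_size  # ceiling division
    y_pred = []
    for i in range(num_batches):
        start = i * batch_size
        end = (i + 1) * batch_size  # slicing past the end of X is safe in NumPy
        scores = score_fn(X[start:end])
        y_pred.append(np.argmax(scores, axis=1))
    return np.hstack(y_pred)


# Toy "model": the score of class c is simply feature c of the input.
X = np.random.RandomState(0).randn(250, 3)
y_pred = predict_in_batches(lambda xb: xb, X, batch_size=100)
```

With 250 rows and a batch size of 100, the loop runs three times (100, 100, and 50 rows), and the concatenated result matches what a single call on the full array would give.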

train function

def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]  # number of training examples
    iterations_per_epoch = max(num_train // self.batch_size, 1)  # explained below
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in range(num_iterations):  # loop over every iteration
        # One update. Each step draws batch_size examples from the whole
        # training set, so the smaller the batch, the more iterations it
        # takes to cover the dataset. That is where iterations_per_epoch
        # comes from.
        self._step()

        # Maybe print training loss. Note that print_every counts
        # iterations, not epochs.
        if self.verbose and t % self.print_every == 0:
            print('(Iteration %d / %d) loss: %f' % (
                  t + 1, num_iterations, self.loss_history[-1]))

        # At the end of every epoch, increment the epoch counter and decay the
        # learning rate. An epoch is made up of iterations_per_epoch iterations.
        epoch_end = (t + 1) % iterations_per_epoch == 0
        if epoch_end:  # enough iterations have run, so the epoch is over
            self.epoch += 1
            for k in self.optim_configs:  # decay every parameter's learning_rate
                self.optim_configs[k]['learning_rate'] *= self.lr_decay

        # Check train and val accuracy on the first iteration, the last
        # iteration, and at the end of each epoch.
        first_it = (t == 0)
        last_it = (t == num_iterations - 1)  # t runs from 0 to num_iterations - 1
        if first_it or last_it or epoch_end:
            train_acc = self.check_accuracy(self.X_train, self.y_train,
                                            num_samples=1000)
            val_acc = self.check_accuracy(self.X_val, self.y_val)
            self.train_acc_history.append(train_acc)  # record both accuracies
            self.val_acc_history.append(val_acc)

            if self.verbose:
                print('(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                      self.epoch, self.num_epochs, train_acc, val_acc))

            # Keep track of the best model
            if val_acc > self.best_val_acc:
                self.best_val_acc = val_acc
                self.best_params = {}
                for k, v in self.model.params.items():
                    self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params

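iterations_per_epoch and num_iterations deserve a closer look.

(1) iterations_per_epoch: the number of training examples divided by the batch size (integer division). If the result is less than 1, it is set to 1.

For example, with 10,000 training examples and a batch size of 100, the value is 100, so one epoch consists of 100 iterations.

With 10,000 training examples and a batch size of 50, the value is 200, so one epoch consists of 200 iterations.

The smaller the batch, the more iterations there are in one epoch. Because _step draws every minibatch at random, an epoch here is a count of iterations rather than a guaranteed single pass over every example.

(2) num_iterations: self.num_epochs multiplied by the number of iterations per epoch from (1), which gives the total number of iterations.

(3) At the end of every epoch, the learning_rate of each parameter is decayed by multiplying it by lr_decay.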
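The three points above can be worked through numerically. The sketch below is a simplified re-creation of the loop's arithmetic only: `schedule` is a hypothetical helper, it tracks a single learning rate instead of one per parameter, and the values 1e-3 and 0.5 are chosen purely for illustration.

```python
def schedule(num_train, batch_size, num_epochs, lr, lr_decay):
    """Return iterations per epoch, total iterations, and the lr after each epoch."""
    iterations_per_epoch = max(num_train // batch_size, 1)
    num_iterations = num_epochs * iterations_per_epoch
    lrs = []
    for t in range(num_iterations):
        # Same epoch-end test as in train()
        if (t + 1) % iterations_per_epoch == 0:
            lr *= lr_decay
            lrs.append(lr)
    return iterations_per_epoch, num_iterations, lrs


# 10,000 examples, batch size 100, 3 epochs
ipe_100, total_100, lrs_100 = schedule(10000, 100, 3, 1e-3, 0.5)

# Halving the batch size doubles the iterations per epoch
ipe_50, total_50, _ = schedule(10000, 50, 3, 1e-3, 0.5)

# Fewer examples than one batch still counts as one iteration per epoch
ipe_small, _, _ = schedule(30, 100, 3, 1e-3, 0.5)
```

Here `ipe_100` is 100 and `ipe_50` is 200, matching the two examples in (1), and the learning rate is halved exactly once per epoch regardless of the batch size.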
