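Solver is a class that takes in data and labels and solves for the model's weights. The hyperparameters that govern training are set on the Solver and adjusted there to get the best training result.
Member functions
Initialization function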
```python
import numpy as np

from cs231n import optim  # the assignment's optim.py, which holds update rules such as sgd


def __init__(self, model, data, **kwargs):
    """
    Construct a new Solver instance.

    Required arguments:
    - model: A model object conforming to the API described above
    - data: A dictionary of training and validation data with the following:
      'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images
      'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images
      'y_train': Array of shape (N_train,) giving labels for training images
      'y_val': Array of shape (N_val,) giving labels for validation images

    Optional arguments:
    - update_rule: A string giving the name of an update rule in optim.py.
      Default is 'sgd'.
    - optim_config: A dictionary containing hyperparameters that will be
      passed to the chosen update rule. Each update rule requires different
      hyperparameters (see optim.py) but all update rules require a
      'learning_rate' parameter so that should always be present.
    - lr_decay: A scalar for learning rate decay; after each epoch the learning
      rate is multiplied by this value.
    - batch_size: Size of minibatches used to compute loss and gradient during
      training.
    - num_epochs: The number of epochs to run for during training.
    - print_every: Integer; training losses will be printed every print_every
      iterations.
    - verbose: Boolean; if set to false then no output will be printed during
      training.
    """
    self.model = model
    self.X_train = data['X_train']
    self.y_train = data['y_train']
    self.X_val = data['X_val']
    self.y_val = data['y_val']

    # Unpack keyword arguments
    self.update_rule = kwargs.pop('update_rule', 'sgd')
    self.optim_config = kwargs.pop('optim_config', {})
    self.lr_decay = kwargs.pop('lr_decay', 1.0)
    self.batch_size = kwargs.pop('batch_size', 100)
    self.num_epochs = kwargs.pop('num_epochs', 10)

    self.print_every = kwargs.pop('print_every', 100)
    self.verbose = kwargs.pop('verbose', True)

    # Throw an error if there are extra keyword arguments
    if len(kwargs) > 0:
        extra = ', '.join('"%s"' % k for k in kwargs.keys())
        raise ValueError('Unrecognized arguments %s' % extra)

    # Make sure the update rule exists, then replace the string
    # name with the actual function
    if not hasattr(optim, self.update_rule):
        raise ValueError('Invalid update_rule "%s"' % self.update_rule)
    self.update_rule = getattr(optim, self.update_rule)

    self._reset()
```
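The initialization function takes the following arguments:
(1) model: a class object that defines the network structure. It is independent of the data and of the optimization method. It is purely the network, including the functions that compute the forward and backward passes. The Solver only touches its params dictionary and its loss method.
(2) data: a dictionary that holds the training set X_train, the validation set X_val, the training labels y_train and the validation labels y_val.
(3) **kwargs: all remaining keyword arguments are collected into a dictionary. The initializer pops each known option in turn and uses a default value when that option was not given. Any key still left afterwards is unrecognized and raises a ValueError. The update_rule string is then looked up in optim with getattr and replaced by the function itself.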
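The pop-with-default pattern in (3) can be tried in isolation. The sketch below uses a hypothetical helper, `parse_solver_options`, that copies only the option handling from `__init__`, so it runs without a model or data. It shows that omitted options fall back to their defaults and that a misspelled option is rejected instead of being silently ignored.

```python
def parse_solver_options(**kwargs):
    """Hypothetical stand-in for the option handling in Solver.__init__."""
    options = {
        'update_rule': kwargs.pop('update_rule', 'sgd'),
        'optim_config': kwargs.pop('optim_config', {}),
        'lr_decay': kwargs.pop('lr_decay', 1.0),
        'batch_size': kwargs.pop('batch_size', 100),
        'num_epochs': kwargs.pop('num_epochs', 10),
        'print_every': kwargs.pop('print_every', 100),
        'verbose': kwargs.pop('verbose', True),
    }
    # Anything still in kwargs was not a recognized option (e.g. a typo)
    if len(kwargs) > 0:
        extra = ', '.join('"%s"' % k for k in kwargs.keys())
        raise ValueError('Unrecognized arguments %s' % extra)
    return options


opts = parse_solver_options(batch_size=50, lr_decay=0.95)
print(opts['batch_size'], opts['lr_decay'], opts['num_epochs'])  # 50 0.95 10

try:
    parse_solver_options(bacth_size=50)  # misspelled option name
except ValueError as e:
    print(e)  # Unrecognized arguments "bacth_size"
```

Rejecting leftover keys is what makes a typo such as `bacth_size` fail loudly. Without that check, training would quietly run with the default batch size of 100.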
Reset function
```python
def _reset(self):
    """
    Set up some book-keeping variables for optimization. Don't call this
    manually.
    """
    # Set up some variables for book-keeping
    self.epoch = 0
    self.best_val_acc = 0
    self.best_params = {}
    self.loss_history = []
    self.train_acc_history = []
    self.val_acc_history = []

    # Make a separate copy of the optim_config for each parameter
    self.optim_configs = {}
    for p in self.model.params:
        d = {k: v for k, v in self.optim_config.items()}
        self.optim_configs[p] = d
```
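The reset function reinitializes the Solver's bookkeeping variables. Pay particular attention to the new dictionary optim_configs. It holds one copy of the optimizer settings for each model parameter, while the settings the user passed in stay in self.optim_config. These two are completely different things! A separate copy per parameter matters because _step stores whatever config the update rule returns (next_config) back under that parameter's name. Any state the rule keeps in its config therefore stays attached to a single parameter.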
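Per-parameter state is why separate copies are needed. A momentum-style rule, for example, keeps a velocity array in its config. The sketch below uses a simplified `sgd_momentum` written for this illustration in the `(w, dw, config) -> (next_w, config)` style. Its defaults and details are assumptions, not the assignment's exact code. The parameter names `W1` and `b1` are hypothetical.

```python
import numpy as np


def sgd_momentum(w, dw, config=None):
    """Simplified momentum update (illustrative; modeled on the (w, dw, config) style)."""
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)
    v = config.get('velocity', np.zeros_like(w))  # per-parameter state
    v = config['momentum'] * v - config['learning_rate'] * dw
    config['velocity'] = v
    return w + v, config


params = {'W1': np.ones((3, 2)), 'b1': np.zeros(2)}  # hypothetical parameters
optim_config = {'learning_rate': 0.1}                 # user-supplied settings

# One independent copy of the user's settings per parameter, as _reset does
optim_configs = {p: {k: v for k, v in optim_config.items()} for p in params}

for p, w in params.items():
    dw = np.ones_like(w)  # dummy gradient
    params[p], optim_configs[p] = sgd_momentum(w, dw, optim_configs[p])

print(optim_configs['W1']['velocity'].shape)  # (3, 2)
print(optim_configs['b1']['velocity'].shape)  # (2,)
print('velocity' in optim_config)             # False
```

Suppose all parameters shared a single dict. Then `b1` would pick up the `(3, 2)` velocity left behind by `W1`, NumPy would broadcast it, and `b1` would end up with the wrong shape. The per-parameter copies prevent that, and they also leave the user's `optim_config` untouched.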
_step function
```python
def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]                          # number of training examples
    batch_mask = np.random.choice(num_train, self.batch_size)  # batch_size random indices (with replacement)
    X_batch = self.X_train[batch_mask]                         # pick those examples from the training set
    y_batch = self.y_train[batch_mask]                         # and their matching labels

    # Compute loss and gradient by calling the model's loss function
    loss, grads = self.model.loss(X_batch, y_batch)
    # Store the loss so it can be plotted later; note that each value is
    # computed on a single minibatch, not on the whole training set
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.items():
        dw = grads[p]
        config = self.optim_configs[p]
        # Note: self.update_rule is a function here, not a string, because
        # __init__ replaced the name with getattr(optim, name)
        next_w, next_config = self.update_rule(w, dw, config)
        self.model.params[p] = next_w
        self.optim_configs[p] = next_config
```
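The sketch below runs the body of `_step` on a toy problem so you can watch the minibatch updates converge. Everything in it is hypothetical and chosen for illustration: `ToyModel` is a least-squares model that exposes the same `params` / `loss(X, y)` interface the Solver relies on, `sgd` is a plain gradient step in the `(w, dw, config) -> (next_w, config)` style, and a `SimpleNamespace` stands in for the `optim` module so that `getattr` resolves the rule by name, as `__init__` does.

```python
import types

import numpy as np


def sgd(w, dw, config=None):
    """Plain SGD in the (w, dw, config) -> (next_w, config) style (simplified)."""
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    return w - config['learning_rate'] * dw, config


optim = types.SimpleNamespace(sgd=sgd)  # hypothetical stand-in for optim.py


class ToyModel:
    """Hypothetical least-squares model exposing params and loss(X, y)."""

    def __init__(self, dim):
        self.params = {'W': np.zeros(dim)}

    def loss(self, X, y):
        err = X @ self.params['W'] - y
        loss = 0.5 * np.mean(err ** 2)
        grads = {'W': X.T @ err / X.shape[0]}
        return loss, grads


rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5])  # noise-free targets

model = ToyModel(3)
update_rule = getattr(optim, 'sgd')  # string name -> function, as in __init__
optim_configs = {p: {'learning_rate': 0.2} for p in model.params}
batch_size = 50

for it in range(300):  # each pass mirrors one call to _step
    batch_mask = rng.choice(X_train.shape[0], batch_size)  # sampled with replacement
    loss, grads = model.loss(X_train[batch_mask], y_train[batch_mask])
    for p, w in model.params.items():
        model.params[p], optim_configs[p] = update_rule(w, grads[p], optim_configs[p])

print(np.round(model.params['W'], 3))
```

The targets have no noise, so the learned weights should end up very close to the generating weights `[1, -2, 0.5]`.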
check_accuracy function
```python
def check_accuracy(self, X, y, num_samples=None, batch_size=100):
    """
    Check accuracy of the model on the provided data.

    Inputs:
    - X: Array of data, of shape (N, d_1, ..., d_k)
    - y: Array of labels, of shape (N,)
    - num_samples: If not None, subsample the data and only test the model
      on num_samples datapoints.
    - batch_size: Split X and y into batches of this size to avoid using too
      much memory.

    Returns:
    - acc: Scalar giving the fraction of instances that were correctly
      classified by the model.
    """

    # Maybe subsample the data
    N = X.shape[0]                                   # number of input examples
    if num_samples is not None and N > num_samples:  # too many examples:
        mask = np.random.choice(N, num_samples)      # randomly draw a subset
        N = num_samples
        X = X[mask]
        y = y[mask]

    # Compute predictions in batches
    num_batches = N // batch_size                    # how many full batches fit into N
    if N % batch_size != 0:                          # if N does not divide evenly,
        num_batches += 1                             # add one more batch for the remainder
    y_pred = []                                      # predicted labels
    for i in range(num_batches):                     # loop over the batches
        start = i * batch_size                       # first index of the current batch
        end = (i + 1) * batch_size                   # one past its last index
        scores = self.model.loss(X[start:end])       # no labels given, so loss() returns scores
        y_pred.append(np.argmax(scores, axis=1))     # highest-scoring class is the prediction
    y_pred = np.hstack(y_pred)                       # concatenate the predictions of all batches
    acc = np.mean(y_pred == y)                       # fraction correct = accuracy

    return acc
```
We compute predictions batch by batch and then concatenate them so that a large number of examples does not exhaust memory.
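Batching changes memory use but should not change the predictions. The sketch below copies the batching loop of `check_accuracy` into a standalone function and compares it with a single full pass. The linear scoring function and its random weights are hypothetical stand-ins for `model.loss(X)`. Note that the last slice `X[200:300]` simply stops at row 250, so the final batch can be shorter without any special handling.

```python
import math

import numpy as np


def predict_in_batches(score_fn, X, batch_size=100):
    """Batching loop from check_accuracy, applied to any score function."""
    N = X.shape[0]
    num_batches = N // batch_size
    if N % batch_size != 0:
        num_batches += 1  # same as math.ceil(N / batch_size)
    y_pred = []
    for i in range(num_batches):
        start = i * batch_size
        end = (i + 1) * batch_size  # slicing past N just stops at N
        y_pred.append(np.argmax(score_fn(X[start:end]), axis=1))
    return np.hstack(y_pred), num_batches


rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))  # hypothetical linear classifier weights
X = rng.normal(size=(250, 4))


def score_fn(xb):
    """Stand-in for model.loss(X) called without labels."""
    return xb @ W


batched, num_batches = predict_in_batches(score_fn, X, batch_size=100)
full = np.argmax(score_fn(X), axis=1)

print(num_batches, math.ceil(250 / 100))  # 3 3
print(np.array_equal(batched, full))      # True
```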
train function
```python
def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]                            # number of training examples
    iterations_per_epoch = max(num_train // self.batch_size, 1)  # explained below
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in range(num_iterations):  # loop over every iteration
        # One update. Each update draws batch_size examples from the whole
        # training set, so the smaller the batch, the more iterations are needed
        # to cover the data set; this is where iterations_per_epoch comes from.
        self._step()

        # Maybe print training loss to watch intermediate results.
        # print_every counts iterations, not epochs.
        if self.verbose and t % self.print_every == 0:
            print('(Iteration %d / %d) loss: %f' % (
                t + 1, num_iterations, self.loss_history[-1]))

        # At the end of every epoch, increment the epoch counter and decay the
        # learning rate. An epoch consists of iterations_per_epoch iterations.
        epoch_end = (t + 1) % iterations_per_epoch == 0
        if epoch_end:
            self.epoch += 1
            for k in self.optim_configs:  # every parameter's learning_rate decays
                self.optim_configs[k]['learning_rate'] *= self.lr_decay

        # Check train and val accuracy on the first iteration, the last
        # iteration, and at the end of each epoch.
        first_it = (t == 0)
        last_it = (t == num_iterations - 1)  # t runs from 0 to num_iterations - 1
        if first_it or last_it or epoch_end:
            train_acc = self.check_accuracy(self.X_train, self.y_train,
                                            num_samples=1000)
            val_acc = self.check_accuracy(self.X_val, self.y_val)
            self.train_acc_history.append(train_acc)  # record both accuracies
            self.val_acc_history.append(val_acc)

            if self.verbose:
                print('(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                    self.epoch, self.num_epochs, train_acc, val_acc))

            # Keep track of the best model
            if val_acc > self.best_val_acc:
                self.best_val_acc = val_acc
                self.best_params = {}
                for k, v in self.model.params.items():
                    self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
```
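When exactly does train() evaluate accuracy? The sketch below replays only the loop's bookkeeping, with no model involved, and lists the iterations `t` at which `check_accuracy` would run. The function name `accuracy_check_iterations` is hypothetical. It tests the last iteration as `t == num_iterations - 1`, because `range(num_iterations)` ends there. Since `num_iterations` is a whole multiple of `iterations_per_epoch`, the last iteration is always also an epoch end.

```python
def accuracy_check_iterations(num_train, batch_size, num_epochs):
    """List the iterations t at which train() would call check_accuracy."""
    iterations_per_epoch = max(num_train // batch_size, 1)
    num_iterations = num_epochs * iterations_per_epoch
    checks = []
    for t in range(num_iterations):
        first_it = (t == 0)
        last_it = (t == num_iterations - 1)
        epoch_end = (t + 1) % iterations_per_epoch == 0
        if first_it or last_it or epoch_end:
            checks.append(t)
    return checks


print(accuracy_check_iterations(num_train=1000, batch_size=100, num_epochs=3))
# [0, 9, 19, 29]
```

This gives `num_epochs + 1` accuracy checks in total (one at the start, one after each epoch), so `train_acc_history` and `val_acc_history` usually end up one entry longer than the number of epochs.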
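iterations_per_epoch and num_iterations look a bit odd at first:
(1) iterations_per_epoch: the number of training examples divided by the batch size (integer division), or 1 if that result is smaller than 1.
For example, with 10,000 training examples and a batch size of 100, this variable is 100, meaning one epoch consists of 100 iterations.
With 10,000 training examples and a batch size of 50, it is 200, meaning one epoch consists of 200 iterations.
The smaller the batch, the more iterations one epoch takes. Because _step samples each minibatch at random with replacement, an "epoch" here means enough iterations to process as many examples as the training set contains. It does not guarantee that every example is seen exactly once.
(2) num_iterations: self.num_epochs multiplied by the number of iterations per epoch, i.e. the total number of iterations.
(3) At the end of each epoch, every parameter's learning_rate is multiplied by lr_decay.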
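The arithmetic in (1) to (3) can be checked with a few lines. The sketch below follows the same formulas as train() and also tracks the learning rate in effect during each epoch, decaying it by repeated multiplication as the loop does. The function name `training_schedule` and the sample values (learning rate `1e-3`, `lr_decay` of `0.95`) are illustrative choices, not values from the original post.

```python
def training_schedule(num_train, batch_size, num_epochs, learning_rate, lr_decay):
    """Iteration counts and per-epoch learning rates, following train()'s logic."""
    iterations_per_epoch = max(num_train // batch_size, 1)  # // is integer division
    num_iterations = num_epochs * iterations_per_epoch
    lrs = []
    lr = learning_rate
    for _ in range(num_epochs):
        lrs.append(lr)  # rate used throughout this epoch
        lr *= lr_decay  # decayed once at the end of the epoch
    return iterations_per_epoch, num_iterations, lrs


print(training_schedule(10000, 100, 10, 1e-3, 0.95)[:2])  # (100, 1000)
print(training_schedule(10000, 50, 10, 1e-3, 0.95)[:2])   # (200, 2000)
print(training_schedule(30, 100, 5, 1e-3, 0.95)[:2])      # (1, 5)
```

The last call shows the `max(..., 1)` guard at work. When the batch is larger than the training set, each epoch still gets one iteration instead of zero. The original Python 2 code wrote `num_train / self.batch_size`, which was already integer division there. In Python 3 the same intent needs `//`.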