Deep Learning 31: Why the Same Code Gives Different Results Under Different Versions of Keras


I. The Question

For the past few days I have been stuck on a puzzle:

With exactly the same code, Keras 0.3.3 fits well with no overfitting, and validation accuracy stays above training accuracy. After switching to Keras 1.2.0, the model overfits, and validation error stays above training error.

II. The Answer

Today I finally found the cause: the two versions implement their optimizers differently, even though the default parameters are the same. My code optimizes with Adam, so the Adam class from optimizers.py serves as the example below.

1. The Adam implementation in optimizers.py of keras==0.3.3:

class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
                 *args, **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [(self.iterations, self.iterations+1.)]

        t = self.iterations + 1
        lr_t = self.lr * K.sqrt(1 - K.pow(self.beta_2, t)) / (1 - K.pow(self.beta_1, t))

        for p, g, c in zip(params, grads, constraints):
            # zero init of moment
            m = K.variable(np.zeros(K.get_value(p).shape))
            # zero init of velocity
            v = K.variable(np.zeros(K.get_value(p).shape))

            m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1 - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append((m, m_t))
            self.updates.append((v, v_t))
            self.updates.append((p, c(p_t)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "beta_1": float(K.get_value(self.beta_1)),
                "beta_2": float(K.get_value(self.beta_2)),
                "epsilon": self.epsilon}
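To make the symbolic updates above concrete, here is a plain-Python sketch of a single Adam step for one scalar parameter, following the formulas in the 0.3.3 code (adam_step is my own illustrative name, not part of Keras):

```python
import math

def adam_step(p, g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam update for a scalar parameter; t counts from 1."""
    # bias-corrected step size, matching lr_t in the Keras code above
    lr_t = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    m_t = beta_1 * m + (1 - beta_1) * g         # biased first-moment estimate
    v_t = beta_2 * v + (1 - beta_2) * g ** 2    # biased second-moment estimate
    p_t = p - lr_t * m_t / (math.sqrt(v_t) + epsilon)
    return p_t, m_t, v_t

# first step from zero-initialized moments
p_t, m_t, v_t = adam_step(p=0.0, g=1.0, m=0.0, v=0.0, t=1)
```

On the very first step with zero moments, the parameter moves by almost exactly lr regardless of the gradient's scale, which is a well-known consequence of Adam's bias correction.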

 

2. The Adam implementation in optimizers.py of keras==1.2.0:

class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)
        self.decay = K.variable(decay)
        self.inital_decay = decay

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))

        t = self.iterations + 1
        lr_t = lr * K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t))

        shapes = [K.get_variable_shape(p) for p in params]
        ms = [K.zeros(shape) for shape in shapes]
        vs = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + ms + vs

        for p, g, m, v in zip(params, grads, ms, vs):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))

            new_p = p_t
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Comparing the two listings shows the implementations really do differ: 1.2.0 adds a decay argument with optional time-based learning-rate decay, keeps the moment estimates ms and vs as persistent optimizer weights, and applies updates through K.update rather than raw tuples. Since my code always used Adam with its default parameters, the two versions ended up behaving differently.
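As a sanity check on how the defaults interact with the new decay argument, the effective step size derived in the listings can be sketched in plain Python (effective_lr is my own illustrative name; note that with the default decay=0 the 1.2.0 decay branch is inactive):

```python
def effective_lr(iterations, lr=0.001, beta_1=0.9, beta_2=0.999, decay=0.0):
    """Effective Adam step size after `iterations` updates (1.2.0 formulas)."""
    if decay > 0:  # 1.2.0-only: time-based decay of the base learning rate
        lr = lr * (1.0 / (1.0 + decay * iterations))
    t = iterations + 1
    # bias correction shared by both 0.3.3 and 1.2.0
    return lr * (1.0 - beta_2 ** t) ** 0.5 / (1.0 - beta_1 ** t)

base = effective_lr(0)                  # default step size at the first update
decayed = effective_lr(100, decay=0.01) # smaller once decay is switched on
```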

III. The Solution

Either of the following avoids the problem:

1. Specify the optimizer's parameters explicitly in your own code rather than relying on the defaults, e.g.:

adam = optimizers.Adam(lr=1e-4)

However, the official Keras documentation explicitly recommends using these optimizers with their default parameters, so the second approach below can be used instead.

2. Control the learning rate explicitly at training time, i.e. pass a schedule through the callbacks argument of fit().

For example:

# Callback that implements the learning rate schedule
schedule = Step([20], [1e-4, 1e-6])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, Y_test),
                    callbacks=[
                        schedule,
                        # saves the model to filepath after every epoch, keeping only the best
                        keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=True, mode='auto')
                        # stops training when the monitored value stops improving: once early
                        # stopping is triggered (e.g. loss failed to drop relative to the
                        # previous epoch), training halts after `patience` further epochs
                        # ,keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, verbose=0, mode='auto')
                        ],
                    verbose=2, shuffle=True)

 

The Step callback used above is defined as follows:

from keras import backend as K
from keras.callbacks import Callback


class Step(Callback):

    def __init__(self, steps, learning_rates, verbose=0):
        self.steps = steps
        self.lr = learning_rates
        self.verbose = verbose

    def change_lr(self, new_lr):
        K.set_value(self.model.optimizer.lr, new_lr)
        if self.verbose == 1:
            print('Learning rate is %g' % new_lr)

    def on_epoch_begin(self, epoch, logs={}):
        # use learning_rates[i] until epoch reaches steps[i] ...
        for i, step in enumerate(self.steps):
            if epoch < step:
                self.change_lr(self.lr[i])
                return
        # ... and the final rate once past the last boundary
        self.change_lr(self.lr[i + 1])

    def get_config(self):
        config = {'class': type(self).__name__,
                  'steps': self.steps,
                  'learning_rates': self.lr,
                  'verbose': self.verbose}
        return config

    @classmethod
    def from_config(cls, config):
        offset = config.get('epoch_offset', 0)
        steps = [step - offset for step in config['steps']]
        return cls(steps, config['learning_rates'],
                   verbose=config.get('verbose', 0))
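The epoch-to-rate logic of on_epoch_begin can be checked standalone with a pure-Python mirror (scheduled_lr is a hypothetical helper of my own, no Keras required):

```python
def scheduled_lr(epoch, steps, learning_rates):
    """Mirror of Step.on_epoch_begin: learning_rates[i] applies while
    epoch < steps[i]; past the last boundary the final rate applies."""
    for i, step in enumerate(steps):
        if epoch < step:
            return learning_rates[i]
    return learning_rates[len(steps)]

# Step([20], [1e-4, 1e-6]) from the fit() example above behaves like:
assert scheduled_lr(0, [20], [1e-4, 1e-6]) == 1e-4
assert scheduled_lr(19, [20], [1e-4, 1e-6]) == 1e-4
assert scheduled_lr(20, [20], [1e-4, 1e-6]) == 1e-6
```

Note that learning_rates must have exactly one more entry than steps, since the last rate covers every epoch beyond the final boundary.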

 

