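The paper Bag of Tricks for Image Classification with Convolutional Neural Networks [1] notes that applying L2 regularization pushes the regularized weights toward 0. For CNNs, L2 regularization (weight decay) is usually applied only to the weights of convolutional and fully connected layers, not to the biases. The parameters of Batch Normalization layers are also left unregularized.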
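To make the "pushes weights toward 0" effect concrete, here is a minimal sketch of a plain SGD step with the L2 penalty folded into the gradient, w ← w − lr · (g + λw). The helper name `sgd_step` and the values of `lr` and `weight_decay` are illustrative assumptions, not taken from the paper. With a zero data gradient, each step simply multiplies the weight by (1 − lr · λ), so it shrinks geometrically.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One plain SGD step with an L2 penalty folded into the gradient."""
    return w - lr * (grad + weight_decay * w)


w = 1.0
history = [w]
for _ in range(3):
    # Zero data gradient: only the weight-decay term acts on w,
    # so each step scales w by (1 - lr * weight_decay).
    w = sgd_step(w, grad=0.0, lr=0.1, weight_decay=0.5)
    history.append(w)
```

A bias or a BatchNorm scale that is excluded from weight decay corresponds to calling the same step with `weight_decay=0.0`, in which case only the data gradient moves it.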
In PyTorch, applying L2 (weight decay) only to the weights of convolutional and fully connected layers looks like this:
import torch

# Assumes `model` is an nn.Module whose BatchNorm layers have "bn" in their parameter names.
weight_decay_list = [param for name, param in model.named_parameters() if not name.endswith('bias') and 'bn' not in name]
no_decay_list = [param for name, param in model.named_parameters() if name.endswith('bias') or 'bn' in name]
parameters = [{'params': weight_decay_list},
              {'params': no_decay_list, 'weight_decay': 0.}]
# The top-level weight_decay applies only to groups that do not set their own value.
optimizer = torch.optim.SGD(parameters, lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True)
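The name-based filter above only works when BatchNorm layers happen to be named with "bn". As a hedged alternative, the sketch below groups parameters by the type of the module that owns them. The helper `split_decay_params` and the toy `model` are hypothetical names introduced here for illustration only, and the sketch assumes no parameter is shared between modules.

```python
import torch
import torch.nn as nn


def split_decay_params(model):
    """Return (decay, no_decay) parameter lists based on the owning module's type."""
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False so each parameter is visited once, via its owning module.
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            is_conv_or_fc = isinstance(
                module, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.Linear)
            )
            if is_conv_or_fc and name == 'weight':
                decay.append(param)
            else:
                # Biases, BatchNorm weight/bias, and anything else.
                no_decay.append(param)
    return decay, no_decay


# Hypothetical toy model, used only to exercise the helper.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)
decay, no_decay = split_decay_params(model)
optimizer = torch.optim.SGD(
    [{'params': decay}, {'params': no_decay, 'weight_decay': 0.0}],
    lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True,
)
```

Checking `optimizer.param_groups` afterwards is a quick way to confirm that the first group inherited the default weight decay and the second group has it set to 0.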
References
[1] He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., Li, M. (2019). Bag of Tricks for Image Classification with Convolutional Neural Networks. CVPR. https://dx.doi.org/10.1109/cvpr.2019.00065