The constructor of the nn.Module base class:
def __init__(self):
    self._parameters = OrderedDict()
    self._modules = OrderedDict()
    self._buffers = OrderedDict()
    self._backward_hooks = OrderedDict()
    self._forward_hooks = OrderedDict()
    self.training = True
Each attribute is explained below:
_parameters: a dictionary holding the parameters set directly by the user. An assignment such as self.param1 = nn.Parameter(t.randn(3, 3)) is detected, and an entry with key 'param1' and the corresponding Parameter as value is added to the dictionary. The parameters inside self.submodule = nn.Linear(3, 4) are not stored here.
_modules: the sub-modules. A sub-module assigned via self.submodel = nn.Linear(3, 4) is stored here.
_buffers: caches. For example, BatchNorm uses a momentum mechanism, so each forward pass relies on statistics from the previous forward pass (the running mean and variance).
_backward_hooks and _forward_hooks: hook mechanisms for extracting intermediate variables, similar to variable hooks (a forward-hook sketch follows the Net example below).
training: BatchNorm and Dropout layers behave differently during training and evaluation; the value of training determines which forward-pass strategy is used.
Among the attributes above, the entries of the three dictionaries _parameters, _modules and _buffers can all be accessed as self.key, which is equivalent to self._parameters['key'].
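As a quick check of this attribute routing, here is a minimal sketch (the module name Demo is chosen for illustration): assignments of nn.Parameter and nn.Module instances are redirected into _parameters and _modules, and attribute lookup falls through to those dictionaries.

import torch as t
from torch import nn

class Demo(nn.Module):
    def __init__(self):
        super(Demo, self).__init__()
        self.param1 = nn.Parameter(t.randn(3, 3))  # routed into self._parameters
        self.sub = nn.Linear(3, 4)                 # routed into self._modules

demo = Demo()
print(demo.param1 is demo._parameters['param1'])  # True
print(demo.sub is demo._modules['sub'])           # True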
Let's define a Module that contains both its own Parameters and a sub-Module with its own Parameters:
import torch as t
from torch import nn
from torch.autograd import Variable as V
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # equivalent to self.register_parameter('param1', nn.Parameter(t.randn(3, 3)))
        self.param1 = nn.Parameter(t.rand(3, 3))
        self.submodel1 = nn.Linear(3, 4)
    def forward(self, input):
        x = self.param1.mm(input)
        x = self.submodel1(x)
        return x
net = Net()
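Before inspecting the dictionaries one by one, here is a short, hedged sketch of using net: it registers a forward hook on submodel1 (as promised above) to capture the intermediate output, then runs a forward pass. The input must be 3x3 so that param1.mm(input) yields a tensor whose last dimension matches submodel1's in_features; the names features and hook_fn are illustrative.

features = {}

def hook_fn(module, input, output):
    # called right after submodel1's forward; stash the intermediate result
    features['submodel1'] = output

handle = net.submodel1.register_forward_hook(hook_fn)
out = net(V(t.rand(3, 3)))
print(out.size())                    # torch.Size([3, 4])
print(features['submodel1'].size())  # torch.Size([3, 4])
handle.remove()  # detach the hook once it is no longer needed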
1. _modules
# printing the network object shows the sub-module structure
print(net)
Net(
  (submodel1): Linear(in_features=3, out_features=4)
)

# ._modules also holds the sub-module structure, but as a different data structure
print(net.submodel1)
print(net._modules)  # a subclass of dict
Linear(in_features=3, out_features=4)
OrderedDict([('submodel1', Linear(in_features=3, out_features=4))])

for name, submodel in net.named_modules():
    print(name, submodel)
 Net(
  (submodel1): Linear(in_features=3, out_features=4)
)
submodel1 Linear(in_features=3, out_features=4)

print(list(net.named_modules()))  # named_modules includes the module itself in the collection
[('', Net( (submodel1): Linear(in_features=3, out_features=4) )), ('submodel1', Linear(in_features=3, out_features=4))]
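For comparison, named_children (and the unnamed variant children) iterates only over direct sub-modules and does not include the module itself; a minimal sketch:

for name, child in net.named_children():
    print(name, child)
# submodel1 Linear(in_features=3, out_features=4)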
2. _parameters
# ._parameters stores the same kind of structure
print(net.param1)
print(net._parameters)  # a subclass of dict, containing only the directly defined nn.Parameter objects
Parameter containing:
 0.6135  0.8082  0.4519
 0.9052  0.5929  0.2810
 0.6825  0.4437  0.3874
[torch.FloatTensor of size 3x3]

OrderedDict([('param1', Parameter containing:
 0.6135  0.8082  0.4519
 0.9052  0.5929  0.2810
 0.6825  0.4437  0.3874
[torch.FloatTensor of size 3x3]
)])
for name, param in net.named_parameters():
    print(name, param.size())

param1 torch.Size([3, 3])
submodel1.weight torch.Size([4, 3])
submodel1.bias torch.Size([4])
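parameters() (the values yielded by named_parameters()) is what is normally handed to an optimizer, so that both param1 and submodel1's weight and bias get updated; a minimal sketch, with the learning rate chosen arbitrarily:

from torch import optim

optimizer = optim.SGD(net.parameters(), lr=0.01)  # receives param1, submodel1.weight, submodel1.bias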
3. _buffers
bn = nn.BatchNorm1d(2)
input = V(t.rand(3, 2), requires_grad=True)
output = bn(input)
bn._buffers
OrderedDict([('running_mean',
1.00000e-02 *
9.1559
1.9914
[torch.FloatTensor of size 2]), ('running_var',
0.9003
0.9019
[torch.FloatTensor of size 2])])
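Custom non-learnable state can be added with register_buffer; such a buffer is saved in state_dict together with the parameters, but it is excluded from parameters() and receives no gradients. A minimal sketch, with BufferDemo and 'running_sum' chosen for illustration:

class BufferDemo(nn.Module):
    def __init__(self):
        super(BufferDemo, self).__init__()
        # a non-learnable tensor tracked by the module
        self.register_buffer('running_sum', t.zeros(2))
    def forward(self, input):
        return input

bd = BufferDemo()
print(bd._buffers)                       # OrderedDict([('running_sum', ...)])
print(list(bd.parameters()))             # [] -- buffers are not parameters
print('running_sum' in bd.state_dict())  # True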
4. training
input = V(t.arange(0, 12).view(3, 4))
model = nn.Dropout()
# during training, about half of the values are randomly zeroed out
model(input)
model.training = False
# during evaluation, dropout does nothing
model(input)
The relationship between the Module.train() and Module.eval() methods and the Module.training attribute:
print(net.training, net.submodel1.training)
net.train()  # sets training to True for this module and all of its sub-modules
net.eval()   # sets training to False for this module and all of its sub-modules
net.training = True  # note: setting the attribute directly only affects this module; sub-modules are unaffected
net.training, net.submodel1.training
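To make the difference visible, a minimal sketch that repeats the two steps and walks the module tree, printing every training flag (the label printed for the root is chosen for readability):

net.eval()           # recursively sets training = False on net and submodel1
net.training = True  # flips only the top-level flag
for name, m in net.named_modules():
    print(name or 'net', m.training)
# net True
# submodel1 False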
