torch.nn ------ Parameter and the Module container

Reposted article. 2022-03-31 16:23. Tag: pytorch

Author: elfin. Reference: torch.nn

1. Parameter
2. Containers in torch.nn
- 2.1 Module

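torch.nn is the basic building block for constructing computation graphs. model.train() and model.eval() switch a model into training mode and evaluation mode respectively.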

1. Parameter

nn.parameter.Parameter(data=None, requires_grad=True)

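Wraps a tensor so that it becomes part of a model; requires_grad=True controls whether the parameter can be updated. The difference from setting requires_grad=True directly on a torch.tensor is that a plain tensor is not added to model.parameters(). It is therefore easy to lose this key data when saving the model's parameters, and a model that trained well can then give much worse results at inference time on the same data!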

Arguments:

data: the parameter tensor
requires_grad: whether to compute gradients so that the parameter can be updated

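The other two classes in this module, UninitializedParameter and UninitializedBuffer, are of little practical use in everyday code (they support lazily initialized modules such as nn.LazyLinear). In most cases from torch.nn.parameter import Parameter is all we need.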
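A minimal sketch of the difference described above. The class name Scale and its attributes are made up for illustration; the point is only that a plain tensor with requires_grad=True stays invisible to parameters() and state_dict(), while a Parameter is registered.

```python
import torch
import torch.nn as nn
from torch.nn.parameter import Parameter


class Scale(nn.Module):
    """Hypothetical module that stores a tensor in two different ways."""

    def __init__(self):
        super().__init__()
        # Plain tensor: tracked by autograd, but NOT registered on the module
        self.plain = torch.ones(3, requires_grad=True)
        # Parameter: registered automatically when assigned as an attribute
        self.weight = Parameter(torch.ones(3))

    def forward(self, x):
        return x * self.weight * self.plain


model = Scale()
param_names = [name for name, _ in model.named_parameters()]
state_keys = list(model.state_dict().keys())
print(param_names)  # ['weight']
print(state_keys)   # ['weight']
```

Only weight would be saved with the model; plain would silently fall back to its initial value after a reload.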

2. Containers in torch.nn

2.1 Module

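Base class of all neural network models; your own models should subclass it. Modules can also contain other modules, which lets you nest them in a tree structure. Submodules can be assigned as regular attributes. Below we use a depth-wise convolution as the example: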

import torch.nn as nn


class DepthWiseConv(nn.Module):
    """Depth-wise convolution implemented with grouped convolution (groups=dim)."""

    def __init__(self, dim=768):
        super(DepthWiseConv, self).__init__()
        self.DWConv = nn.Conv2d(
            in_channels=dim, out_channels=dim,
            kernel_size=3, stride=1, padding=1,
            bias=True, groups=dim
        )

    def forward(self, x):
        x = self.DWConv(x)
        return x

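Submodules assigned as attributes in __init__ this way are registered automatically, and their parameters are returned by parameters() on a DepthWiseConv instance. Note that if you implement a trainable tensor yourself, it must be wrapped in Parameter!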

The training attribute

Controls whether the module is in training mode or evaluation mode.

>>> DepthWiseConv.training
AttributeError: type object 'DepthWiseConv' has no attribute 'training'
>>> DepthWiseConv.training = False
>>> DepthWiseConv.training
False

The code above shows that the class itself does not have this attribute. Our code clearly never declares it, so will it exist after instantiation?

>>> model = DepthWiseConv()
>>> model.training
True

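The test shows that an instance has the training attribute by default, and that its default value is True. Note that even though I set the attribute to False on the DepthWiseConv class, a newly created instance still reports True. This is because the attribute is set in the Module class: even if our own module declares it, calling the parent class's __init__() initializes it to True on the instance.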
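In practice the attribute is rarely set by hand; train() and eval() toggle it on the module and on every submodule. A small sketch, assuming a toy nn.Sequential that is not part of the original article:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

net.eval()  # sets training=False on the module and all of its submodules
eval_flags = [m.training for m in net.modules()]

x = torch.ones(2, 4)
# In evaluation mode Dropout is the identity, so two passes agree exactly
same_in_eval = torch.equal(net(x), net(x))

net.train()  # back to training mode
train_flags = [m.training for m in net.modules()]

print(eval_flags)   # [False, False, False]
print(train_flags)  # [True, True, True]
```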

2.2.1 add_module

Arguments:

name: name of the child module to add
module: an instance of an nn.Module subclass

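Adds a child module to the current module; the child can then be accessed as an attribute under the given name. When defining a custom module we sometimes build a series of layers. If we write it as below, the submodules are not registered: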

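class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        # A plain Python list: these Linear layers are NOT registered
        self.layer = [nn.Linear(64, 64) for _ in range(5)]

    def forward(self, x):
        # Apply the layers one after another
        for layer in self.layer:
            x = layer(x)
        return x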

We instantiate this module:

>>> test = Test()
>>> list(test.modules())
[Test()]

Clearly the layers stored in layer are not registered! In this situation we can either wrap the list in ModuleList or add each layer with add_module:

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.layer = [nn.Linear(64, 64) for _ in range(5)]
        # Register every layer under its own name
        for i, layer in enumerate(self.layer):
            self.add_module(f"layer_{i}", layer)

    def forward(self, x):
        for layer in self.layer:
            x = layer(x)
        return x
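The ModuleList alternative mentioned above could look like the following sketch. The class name TestModuleList is hypothetical; nn.ModuleList registers every layer it holds, so no explicit add_module call is needed.

```python
import torch
import torch.nn as nn


class TestModuleList(nn.Module):
    """Hypothetical variant of Test that stores its layers in a ModuleList."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(5)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


test = TestModuleList()
num_modules = len(list(test.modules()))    # root + ModuleList + 5 Linear layers
num_params = len(list(test.parameters()))  # 5 weights + 5 biases
out = test(torch.randn(2, 64))
print(num_modules, num_params)  # 7 10
```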

2.2.2 apply

apply(fn) passes every submodule, and finally the module itself, to fn, where fn is the function we hand to apply.

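import math

import torch.nn as nn
from torch.nn.init import trunc_normal_  # assumption: the original import is not shown


class Mlp(nn.Module):
    """
    Custom multi-layer perceptron
    """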

    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0., linear=False):
        super(Mlp, self).__init__()
        # Derive the hidden and output channel counts from the input channel count
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.FC1 = nn.Conv2d(in_features, hidden_features, 1)
        self.DW_Conv = DepthWiseConv(hidden_features)  # runs on the output of FC1
        self.ACT = act_layer()
        self.FC2 = nn.Conv2d(hidden_features, out_features, 1)
        self.DROP = nn.Dropout(drop)
        self.LINEAR = linear
        if self.LINEAR:
            self.ReLU = nn.ReLU(inplace=True)
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Conv2d):
            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            fan_out //= m.groups
            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
            if m.bias is not None:
                m.bias.data.zero_()
        elif isinstance(m, nn.LayerNorm):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.)

    def forward(self, x):
        x = self.FC1(x)
        if self.LINEAR:
            x = self.ReLU(x)
        x = self.DW_Conv(x)
        x = self.ACT(x)
        x = self.DROP(x)
        x = self.FC2(x)
        x = self.DROP(x)
        return x

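As the weight-initialization function shows, fn handles one submodule m per call. apply visits submodules recursively: model.modules() lists every module that will be visited, while model.children() lists the direct children only.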
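To make the visiting order concrete, here is a small sketch on a toy nn.Sequential. The function init_linear and the list visited are illustrative names, not part of the original code.

```python
import torch.nn as nn

visited = []


def init_linear(m):
    # Record the visit, then initialize only the Linear layers
    visited.append(type(m).__name__)
    if isinstance(m, nn.Linear):
        nn.init.constant_(m.weight, 0.5)
        nn.init.zeros_(m.bias)


net = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 1))
net.apply(init_linear)
print(visited)  # ['Linear', 'ReLU', 'Linear', 'Sequential']
```

The children are processed first and the calling module last, which is why Sequential appears at the end.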

2.2.3 buffers: module buffers

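Buffers store tensors that are not trained but should still be saved in the state_dict. If an ordinary tensor is not registered through register_buffer(), it will be missing when we save and load the model.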

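Buffer-related methods:

buffers(): returns all registered buffers;
get_buffer(target): returns one specific buffer; target is the buffer's name;
named_buffers(prefix='', recurse=True): returns an iterator over buffers whose names carry the given prefix. recurse controls whether submodules are searched; if False, only the buffers of the current module are returned;
register_buffer(name, tensor, persistent=True): registers a tensor as a buffer; persistent controls whether the module's state_dict includes it.

These are all of the module's buffer interfaces. A simple test follows: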

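import torch
import torch.nn as nn


class BuffersTest(nn.Module):
    def __init__(self):
        super(BuffersTest, self).__init__()
        # Plain attribute: not registered, so absent from state_dict
        self.data1 = torch.tensor([1, 2, 3, 4])
        # Registered buffer: saved in state_dict but not trained
        self.register_buffer("data2", torch.tensor([5, 6, 7, 8]))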
        
    def forward(self, x):
        x = self.data1 * x + self.data2
        return x

Now we instantiate it and look at the model's parameters, buffers, and state_dict:

>>> buffer_test = BuffersTest()      # instantiate
>>> list(buffer_test.buffers())      # inspect the buffers
[tensor([5, 6, 7, 8])]
>>> list(buffer_test.parameters())
[]
>>> list(buffer_test.state_dict())
['data2']

Now we add a trainable parameter and see how the three change:

>>> from torch.nn.parameter import Parameter
>>> buffer_test.data3 = Parameter(data=torch.tensor([9, 10]).float())
>>> list(buffer_test.buffers())      # inspect the buffers
[tensor([5, 6, 7, 8])]
>>> list(buffer_test.parameters())
[Parameter containing:
 tensor([ 9., 10.], requires_grad=True)]
>>> list(buffer_test.state_dict())
['data3', 'data2']

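Comparing the two experiments, both parameters and buffers appear in the state_dict, but buffers() and parameters() return different objects. Trainable data should be wrapped in Parameter; tensors that are not updated but must be saved with the model should be registered with register_buffer!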
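The persistent flag of register_buffer can be checked in the same way. The sketch below uses a hypothetical RunningStats module: a non-persistent buffer is still returned by buffers(), but it is left out of the state_dict.

```python
import torch
import torch.nn as nn


class RunningStats(nn.Module):
    """Hypothetical module with one persistent and one non-persistent buffer."""

    def __init__(self):
        super().__init__()
        self.register_buffer("mean", torch.zeros(3))
        self.register_buffer("scratch", torch.zeros(3), persistent=False)


stats = RunningStats()
buffer_names = [name for name, _ in stats.named_buffers()]
state_keys = list(stats.state_dict().keys())

# Change the persistent buffer, then restore it into a fresh instance
stats.mean.add_(1.0)
restored = RunningStats()
restored.load_state_dict(stats.state_dict())

print(buffer_names)  # ['mean', 'scratch']
print(state_keys)    # ['mean']
```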

2.2.4 Parameter-related methods

These are the methods that manage the data which needs gradients for parameter updates.

parameters(recurse=True)
get_parameter(target)
named_parameters(prefix='', recurse=True)
register_parameter(name, param)

We will not explain these methods again; they follow the same pattern as the buffer methods above.
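For completeness, a short sketch of the four methods on a toy nn.Sequential. The parameter name scale is made up for illustration, and get_parameter assumes a PyTorch version that provides it.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 2), nn.Linear(2, 1))

# Names are prefixed with the path of the owning submodule
names = [name for name, _ in net.named_parameters()]
# With recurse=False only parameters owned directly by net are returned
own = [name for name, _ in net.named_parameters(recurse=False)]

# Look up a single parameter by its fully qualified name
w = net.get_parameter("0.weight")

# Register an extra parameter directly on the container
net.register_parameter("scale", nn.Parameter(torch.ones(1)))
own_after = [name for name, _ in net.named_parameters(recurse=False)]

print(names)      # ['0.weight', '0.bias', '1.weight', '1.bias']
print(own_after)  # ['scale']
```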

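2.2.5 Module-related methods

All related methods:

modules: mentioned several times above; a basic method we must know
get_submodule(target)
register_module(name, module): alias of add_module
named_modules(memo=None, prefix='', remove_duplicate=True)

The module methods resemble the buffer methods, but the rules are not identical. The main difference is in lookup: because the module itself is the current node, a lookup searches for its submodules, so the interface is get_submodule(target). named_modules can also reach every submodule, but it iterates over all of them, so to fetch a single one we use get_submodule(target), which is cheaper.

named_modules also differs from the other named_* interfaces. Besides the prefix argument it takes two more:

memo=None: a memo of modules already visited; we can ignore this argument
remove_duplicate=True: whether to remove duplicated module instances from the result

Output of a plain named_modules call: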

>>> for i, m in enumerate(model.named_modules()):
...:    print(i, "-->", m)
...:
0 --> ('', DepthWiseConv(
  (DWConv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
))
1 --> ('DWConv', Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768))

Output of named_modules with a prefix:

>>> for i, m in enumerate(model.named_modules(prefix="DW")):
...:    print(i, "-->", m)
...:
0 --> ('DW', DepthWiseConv(
  (DWConv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
))
1 --> ('DW.DWConv', Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768))

Clearly the prefix has been prepended to every module name!
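The effect of remove_duplicate only shows when the same module instance is registered more than once. A minimal sketch, assuming a toy network that reuses one Linear layer (not from the original article), together with a get_submodule lookup:

```python
import torch.nn as nn

shared = nn.Linear(4, 4)
# The same Linear instance is registered under the names "0" and "2"
net = nn.Sequential(shared, nn.ReLU(), shared)

unique = [name for name, _ in net.named_modules()]
everything = [name for name, _ in net.named_modules(remove_duplicate=False)]

# Fetch one submodule directly by its name
first = net.get_submodule("0")

print(unique)      # ['', '0', '1']
print(everything)  # ['', '0', '1', '2']
```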

2.2.6 children and named_children

>>> list(model.children())
[Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)]
>>> list(model.named_children())
[('DWConv',
  Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768))]

Compared with children, named_children yields tuples that also contain the name of each child module instance!
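DepthWiseConv has a single level of nesting, so children and modules look almost the same there. On a nested toy network (an assumption for illustration) the difference is easier to see: children stops at the direct children, while modules walks the whole tree.

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Sequential(nn.Linear(2, 2), nn.ReLU()),
    nn.Linear(2, 1),
)

child_names = [name for name, _ in net.named_children()]
module_names = [name for name, _ in net.named_modules()]

print(child_names)   # ['0', '1']
print(module_names)  # ['', '0', '0.0', '0.1', '1']
```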

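2.2.7 Changing the module's data type

Method	Description	In-place
bfloat16()	Casts to bfloat16, a 16-bit type first associated with TPUs; it corresponds to the top 16 bits of a float32	True
double()	Casts the data to double precision (float64)	True
float()	Casts the data to float32	True
half()	Casts the data to half precision (float16)	True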
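A short sketch of these casts, assuming a hypothetical module WithCounter that also holds an integer buffer. The methods cast floating-point parameters and buffers only, modify the module in place, and return the module itself.

```python
import torch
import torch.nn as nn


class WithCounter(nn.Module):
    """Hypothetical module with float parameters and an integer buffer."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 2)
        self.register_buffer("steps", torch.zeros(1, dtype=torch.long))


model = WithCounter()
returned = model.double()             # in place; the same object comes back
weight_dtype = model.fc.weight.dtype  # torch.float64
steps_dtype = model.steps.dtype       # integer buffer is left untouched

model.half()
half_dtype = model.fc.weight.dtype    # torch.float16
```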

2.2.8 Device selection

model.cpu() moves the module to the CPU; model.cuda(device=None) moves it to the selected GPU!
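A common device-agnostic pattern, sketched here with a toy nn.Linear, is to pick the device once and move both the model and its inputs there. It falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)  # inputs must live on the same device
out = model(x)

param_device = next(model.parameters()).device
```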

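2.2.9 Hook methods

register_backward_hook: registers a backward hook on the module (deprecated in recent PyTorch versions in favor of register_full_backward_hook)
register_forward_hook
register_forward_pre_hook
register_full_backward_hook

For explanations of these methods, see:

https://blog.csdn.net/foneone/article/details/107099060 (understanding the hook mechanism and the outputs of intermediate layers)
https://blog.csdn.net/winycg/article/details/100695373 (getting intermediate-layer information in torch)

With hook functions we can capture information from the model's intermediate layers, which makes inspection convenient!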
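As a rough sketch of capturing an intermediate output with register_forward_hook: the dictionary features and the helper save_output below are illustrative names, and the network is a toy example.

```python
import torch
import torch.nn as nn

features = {}


def save_output(name):
    # Build a hook that stores the layer's output under the given name
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook


net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
handle = net[1].register_forward_hook(save_output("relu"))

out = net(torch.randn(5, 4))
features_shape = tuple(features["relu"].shape)  # (5, 8)
non_negative = bool((features["relu"] >= 0).all())

# Remove the hook once the information is no longer needed
handle.remove()
features.clear()
net(torch.randn(5, 4))
hook_removed = "relu" not in features
```

Keeping the returned handle and calling remove() avoids leaving stale hooks attached to the model.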

2.2.10 Other methods

set_extra_state

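Sets extra state information. It is called from load_state_dict() with any extra state found in the state_dict, so to use it you implement set_extra_state(state) together with a matching get_extra_state() in your own module. state is typically a dictionary, and get_extra_state() returns the extra state that should be stored.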
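A minimal sketch of the pair of methods, assuming a PyTorch version that supports extra state. The module Tagged and its tag attribute are hypothetical.

```python
import torch
import torch.nn as nn


class Tagged(nn.Module):
    """Hypothetical module that stores a string tag next to its weights."""

    def __init__(self, tag="untrained"):
        super().__init__()
        self.fc = nn.Linear(2, 2)
        self.tag = tag

    def get_extra_state(self):
        # Returned object is stored inside state_dict()
        return {"tag": self.tag}

    def set_extra_state(self, state):
        # Called by load_state_dict() with the stored object
        self.tag = state["tag"]


src = Tagged(tag="epoch-3")
dst = Tagged()
keys = list(src.state_dict().keys())
dst.load_state_dict(src.state_dict())
print(dst.tag)  # epoch-3
```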

extra_repr

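To print customized extra information, re-implement this method in your own module. Both single-line and multi-line strings are acceptable, and the method is expected to return the string. The experiment below prints inside the method instead:

def extra_repr(self):
    res = """
    Printing:
    MODEL: DepthWiseConv
    """
    print(res)
    for m, v in self.named_children():
        print(m)
        print(v)

The body runs whenever the module's repr is built, for example by print(model) or by an interactive environment that displays the object. In the author's session the output appeared right after instantiation:

>>> model = DepthWiseConv(6)
        Printing:
        MODEL: DepthWiseConv
        
DWConv
Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=6)

Given this behavior, overriding the method this way is not recommended!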
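For comparison, a sketch of an extra_repr that returns a string rather than printing. The class name DepthWiseConvRepr and the stored dim attribute are assumptions made for this example.

```python
import torch.nn as nn


class DepthWiseConvRepr(nn.Module):
    """Hypothetical variant of DepthWiseConv with a string-returning extra_repr."""

    def __init__(self, dim=768):
        super().__init__()
        self.dim = dim
        self.DWConv = nn.Conv2d(dim, dim, kernel_size=3, stride=1,
                                padding=1, bias=True, groups=dim)

    def extra_repr(self):
        # The returned text is inserted into the module's repr
        return f"dim={self.dim}"

    def forward(self, x):
        return self.DWConv(x)


model = DepthWiseConvRepr(6)
text = repr(model)
print(text)
```

Nothing is printed at construction time; the extra line only appears inside the text produced by repr(model) or print(model).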

requires_grad_

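model.requires_grad_(requires_grad=True) sets whether autograd records operations on the module's parameters, that is, whether they receive gradients and can be updated. By default it enables this. The change is made in place and the module itself is returned!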
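A typical use is freezing part of a model. The sketch below, on a toy two-layer network, turns off gradients for the first layer only.

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze the first layer; the call returns the layer itself
returned = net[0].requires_grad_(False)

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['1.weight', '1.bias']
```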

zero_grad

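Resets the gradients of all model parameters. set_to_none controls whether the gradients are set to None instead of zero; the default was False when this article was written, while PyTorch 2.0 and later default to True.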
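Because the default differs between versions, the sketch below passes set_to_none explicitly on a toy nn.Linear to show both behaviors.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)

# Two all-ones input rows give a weight gradient of 2.0 in every position
model(torch.ones(2, 3)).sum().backward()
grad_before = model.weight.grad.clone()

model.zero_grad(set_to_none=False)  # gradients become zero tensors
zeroed = model.weight.grad.clone()

model(torch.ones(2, 3)).sum().backward()
model.zero_grad(set_to_none=True)   # gradients become None
grad_is_none = model.weight.grad is None
```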

type

Casts the module's parameters and buffers to the given data type, in place.

>>> model.type(torch.int32)
DepthWiseConv(
  (DWConv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
)

to

Changes the device and/or data type of the module's parameters and buffers. It can be called in several ways:

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

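Arguments:

device: a torch.device object
dtype: a torch.dtype object
tensor: a tensor whose dtype and device are the ones desired for the parameters and buffers
memory_format: the desired memory format for 4D parameters and buffers in this module

Example: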

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

to_empty

Moves the parameters and buffers to the specified device without copying their storage.
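One place where this is useful is materializing a module that was created on the meta device, which holds shapes but no data. The following is only a sketch, assuming a PyTorch version in which nn.Linear accepts a device argument; after to_empty the values are uninitialized, so they are re-initialized explicitly.

```python
import torch
import torch.nn as nn

# Create the layer without allocating real storage
layer = nn.Linear(4, 2, device="meta")
was_meta = layer.weight.device.type == "meta"

# Allocate uninitialized storage on the CPU; nothing is copied
layer.to_empty(device="cpu")
layer.reset_parameters()  # the values must be initialized before use

out = layer(torch.randn(3, 4))
now_device = layer.weight.device.type
```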

dump_patches

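The description here concerns the module's version number: if new parameters or buffers are added to or removed from a module, this number should be bumped, and the module's _load_from_state_dict method can compare version numbers and make the appropriate changes if the state dict comes from before the change.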

share_memory

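Moves the underlying storage to shared memory. This is a no-op for CUDA tensors and when the underlying storage is already in shared memory. Tensors in shared memory cannot be resized.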

xpu

Moves all model parameters and buffers to the XPU. For background on XPU, see: https://blog.csdn.net/ybhuangfugui/article/details/116616954

Similar to the cuda() and cpu() methods!

The end!
