PyTorch Hook¶

為什么要引入hook? -> hook可以做什么？
都有哪些hook?
如何使用hook?

1. 為什么引入hook?¶

參考：Pytorch中autograd以及hook函數詳解
在pytorch中的自動求梯度機制(Autograd mechanics)中，如果將tensor的requires_grad設為True, 那么涉及到它的一系列運算將在反向傳播中自動求梯度。

x = torch.randn(5, 5) # requires_grad=False by default
y = torch.randn(5, 5) # requires_grad=False by default
z = torch.randn((5, 5), requires_grad=True)
a = x + y
b = a + z
print(a.requires_grad, b.requires_grad)

False True

但是自動求導的機制有個我們需要注意的地方：在自動求導機制中只保存葉子節點，也就是中間變量在計算完成梯度后會自動釋放以節省空間. 所以下面代碼我們在計算過程中只得到了z對x的梯度，而y和z的梯度都在梯度計算后被自動釋放了，所以顯示為None.

x = torch.tensor([1,2],dtype=torch.float32,requires_grad=True)
y = x * 2
z = torch.mean(y)
z.backward()
print("x.grad =", x.grad)
print("y.grad =", y.grad)
print("z.grad =", z.grad)

x.grad = tensor([1., 1.])
y.grad = None
z.grad = None

那么能否得到y,z的梯度呢？這就需要引入hook.
在pytorch的tutorial中介紹：
We’ve inspected the weights and the gradients. But how about inspecting / modifying the output and grad_output of a layer ? We introduce hooks for this purpose. hook的引入是為了讓我們可以檢測或者修改一個layer的output或者grad_output.

2. hook的種類¶

TENSOR.register_hook(FUNCTION)
MODULE.register_forward_hook(FUNCTION)
MODULE.register_backward_hook(FUNCTION)

可以為Module或者Tensor注冊hook。
如果為Tensor注冊hook, 用register_hook();
如果為Module注冊hook, 若希望獲取前向傳播中layer的input, output信息，可以用register_forward_hook(); 如果為Module注冊hook, 若希望獲取反向傳播中layer的grad_in, grad_out信息，可以用register_backward_hook().

3. TENSOR.register_hook(FUNCTION)¶

x = torch.tensor([1,2],dtype=torch.float32,requires_grad=True)
y = x * 2
y.register_hook(print)
z = torch.mean(y)
z.backward()

tensor([0.5000, 0.5000])

以上代碼中，對y進行register_hook引入print這個函數，print即是簡單的打印，將y相關的grad打印出來。
在執行z.backward()執行的時候，由於y的hook函數也執行了，打印出了y關於輸出z的梯度，即 tensor([0.5000, 0.5000]) 便是y的梯度。

4. MODULE.register_forward_hook(FUNCTION) && MODULE.register_backward_hook(FUNCTION)¶

參考鏈接：Toy example to understand Pytorch hooks
介紹這兩個的用法前，我們先定義module, 之后的hook便是為以下的module注冊的。

import numpy as np
import torch
import torch.nn as nn
from IPython.display import Image

1. Define the network¶

''' Define the Net '''
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2,2)
        self.s1 = nn.Sigmoid()
        self.fc2 = nn.Linear(2,2)
        self.s2 = nn.Sigmoid()
        self.fc1.weight = torch.nn.Parameter(torch.Tensor([[0.15,0.2],[0.250,0.30]]))
        self.fc1.bias = torch.nn.Parameter(torch.Tensor([0.35]))
        self.fc2.weight = torch.nn.Parameter(torch.Tensor([[0.4,0.45],[0.5,0.55]]))
        self.fc2.bias = torch.nn.Parameter(torch.Tensor([0.6]))
        
    def forward(self, x):
        x= self.fc1(x)
        x = self.s1(x)
        x= self.fc2(x)
        x = self.s2(x)
        return x

net = Net()
print(net)

Net(
  (fc1): Linear(in_features=2, out_features=2, bias=True)
  (s1): Sigmoid()
  (fc2): Linear(in_features=2, out_features=2, bias=True)
  (s2): Sigmoid()
)

''' Get the value of parameters defined in the Net '''
# parameters: weight and bias
print(list(net.parameters()))

[Parameter containing:
tensor([[0.1500, 0.2000],
        [0.2500, 0.3000]], requires_grad=True), Parameter containing:
tensor([0.3500], requires_grad=True), Parameter containing:
tensor([[0.4000, 0.4500],
        [0.5000, 0.5500]], requires_grad=True), Parameter containing:
tensor([0.6000], requires_grad=True)]

''' feed the input data to get the output and loss '''
# input data
data = torch.Tensor([0.05,0.1])
# output of last layer
out = net(data)
target = torch.Tensor([0.01,0.99])  # a dummy target, for example
criterion = nn.MSELoss()
loss = criterion(out, target)
print(loss)

tensor(0.2984, grad_fn=<MseLossBackward>)

2. The structure of hook input, output && grad_in, grad_out¶

在MODULE.register_forward_hook(FUNCTION)中，涉及到input, output參數，
在MODULE.register_backward_hook(FUNCTION)中，涉及到grad_in, grad_out參數，下面的圖示顯示了input, output分別是一個layer的輸入和輸出；
grad_in是整個神經網絡的輸出(可以想成最終的損失L)對layer的output求偏導， grad_out是 ( L對output求偏導 × output對input的偏導) => 鏈式法則。

from google.colab import files
from IPython.display import Image
uploaded = files.upload()

Saving hook_in_out.png to hook_in_out.png

Image("hook_in_out.png")

input ----------------------------> output------------------> Last layer output
y--------------------------------------------z -----> ... ------------> L
grad_out <------------------------ grad_in
(dL/dz) * (dz / dy) ---------------(dL/dz)

下面代碼中，如果backward = False, 表示的是前向傳播，input, output分別對應layer的輸入和輸出；
如果backward = True, 表示的是反向傳播過程，input 表示的是上圖中的 grad_in, output 表示的是上圖中的 grad_out.

''' Define hook 
'''
# A simple hook class that returns the input and output of a layer during forward/backward pass
class Hook():
    def __init__(self, module, backward=False):
        if backward==False:
            self.hook = module.register_forward_hook(self.hook_fn)
        else:
            self.hook = module.register_backward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.input = input
        self.output = output
    def close(self):
        self.hook.remove()

# get the _modules.items()
# format: (name, module) 
print(list(net._modules.items()))

# use layer[0] to get the name and layer[1] to get the module
for layer in net._modules.items():
  print(layer[0], layer[1])

[('fc1', Linear(in_features=2, out_features=2, bias=True)), ('s1', Sigmoid()), ('fc2', Linear(in_features=2, out_features=2, bias=True)), ('s2', Sigmoid())]
fc1 Linear(in_features=2, out_features=2, bias=True)
s1 Sigmoid()
fc2 Linear(in_features=2, out_features=2, bias=True)
s2 Sigmoid()

為Hook類創建對象時，需要傳入module參數，以下代碼通過layer[1] 獲取。將前向的hook都放在hookF數組中，將反向的hook都放在hookB的數組中。
注意一定要先注冊hook, 之后再將data傳入神經網路進行前向傳播，即注冊hook一定要在net(data)之前進行，因為hook函數是在forward的時候進行綁定的。

''' Register hooks on each layer 
'''
hookF = [Hook(layer[1]) for layer in list(net._modules.items())]
hookB = [Hook(layer[1],backward=True) for layer in list(net._modules.items())]

# run a data batch
out=net(data)
print(out)

tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)

3. Get the hook input, output and grad_in, grad_out value¶

注意loss.backward(retain_graph = True) 對於backward_hook並不適用
以下報錯顯示了 'Hook' object has no attribute 'input'，對於loss, 它並不是一個有input,output的網絡層，而只是網絡最后一層的輸出與target的aggregated的結果。
而之前定義的Hook中，要求有明確的input和output,所以，並不適用於loss.backward()
應該采用out.backward(label_tensor, retain_graph = True)

3.1 loss.backward(retain_graph = True)¶

loss.backward(retain_graph = True)

print('***'*3+'  Forward Hooks Inputs & Outputs  '+'***'*3)
for hook in hookF:
    print(hook.input)
    print(hook.output)
    print('---'*17)
print('\n')

#! loss.backward(retain_graph=True)  # doesn't work with backward hooks, 
#! since it's not a network layer but an aggregated result from the outputs of last layer vs target 
print('***'*3+'  Backward Hooks Inputs & Outputs  '+'***'*3)
for hook in hookB:             
    print(hook.input)          
    print(hook.output)         
    print('---'*17)

*********  Forward Hooks Inputs & Outputs  *********
(tensor([0.0500, 0.1000]),)
tensor([0.3775, 0.3925], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([0.3775, 0.3925], grad_fn=<AddBackward0>),)
tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>)
---------------------------------------------------
(tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>),)
tensor([1.1059, 1.2249], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([1.1059, 1.2249], grad_fn=<AddBackward0>),)
tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)
---------------------------------------------------


*********  Backward Hooks Inputs & Outputs  *********

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-34-8c71e064b825> in <module>()
     12 print('***'*3+'  Backward Hooks Inputs & Outputs  '+'***'*3)
     13 for hook in hookB:
---> 14print(hook.input)
     15     print(hook.output)
     16     print('---'*17)

AttributeError: 'Hook' object has no attribute 'input'

3.2 out.backward(TENSOR, retain_graph = True)¶

下面采用的是正確的out.backward(torch.tensor([1,1],dtype=torch.float),retain_graph=True)的形式。
由於調用backward()的是out, 一個tensor而不是scalar, pytorch中不能直接求解它的Jacobian矩陣，需要為其指定grad_tensors.grad_tensors 可以看做對應張量的每個元素的梯度。
比如對於 y.backward(v,retain_graph = True), 其中 y = (y1, y2, y3), v = (v1, v2, v3), 那么backward中執行的操作是，先分別 (y1 v1, y2 v2, y3 * v3)，之后再對y求偏導，y再對parameter求偏導，鏈式法則。

其實也可以看做，在一般對網絡的輸出y, 與標簽l，利用損失函數得到一個損失標量L，表示為：
L = v1 y1 + v2 y2 + v3 y3;
dL/dy = (v1, v2, v3);
dL/dw = dL/dy dy/dw =( v1 dy/dw, v2 dy/dw, v3 * dy/dw)
上式dL/dw中的v即為 y.backward(v,retain_graph = True)的v的體現。相當於對於y.backward()的梯度都對應乘了v的系數。

out.backward(torch.tensor([1, 1], dtype = torch.float), retain_graph = True)
print('***'*3+'  Forward Hooks Inputs & Outputs  '+'***'*3)
for hook in hookF:
    print(hook.input)
    print(hook.output)
    print('---'*17)
print('\n')
print('***'*3+'  Backward Hooks Inputs & Outputs  '+'***'*3)
for hook in hookB:             
    print(hook.input)          
    print(hook.output)         
    print('---'*17)

*********  Forward Hooks Inputs & Outputs  *********
(tensor([0.0500, 0.1000]),)
tensor([0.3775, 0.3925], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([0.3775, 0.3925], grad_fn=<AddBackward0>),)
tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>)
---------------------------------------------------
(tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>),)
tensor([1.1059, 1.2249], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([1.1059, 1.2249], grad_fn=<AddBackward0>),)
tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)
---------------------------------------------------


*********  Backward Hooks Inputs & Outputs  *********
(tensor([0.0392, 0.0435]), tensor([0.0827]))
(tensor([0.0392, 0.0435]),)
---------------------------------------------------
(tensor([0.0392, 0.0435]),)
(tensor([0.1625, 0.1806]),)
---------------------------------------------------
(tensor([0.1868, 0.1755]), tensor([0.3623]))
(tensor([0.1868, 0.1755]),)
---------------------------------------------------
(tensor([0.1868, 0.1755]),)
(tensor([1., 1.]),)
---------------------------------------------------

4. Module Hooks Problem¶

Problem with backward hook function #598
在該Issue中，指出了pytorchde module的一個問題：
“Ok, so the problem is that module hooks are actually registered on the last function that the module has created. In your case x + y + z is computed as ((x + y) + z) so the hook is registered on that (_ + z) operation, and this is why you're getting only two grad inputs.

We'll definitely have to resolve this but it will need a large change in the autograd internals. However, right now @colesbury is rewriting them to make it possible to have multiple functions dispatched in parallel, and they would heavily conflict with his work. For now use only Variable hooks (or module hooks, but not on containers). Sorry!”

翻譯過來是，module hooks只為一個module的最后的function注冊，比如對於 (x + y + z)，本應分別得到關於(x, y, z)這三個的grad, 但是pytorch會先計算(x + y), 之后計算( _ + z), 所以最終只有兩個grad，一個是關於(x + y)整體的grad, 一個是關於z的grad. 這是pytorch開發中一個比較難以解決的問題，目前該問題還沒有被解決。
鑒於這個問題，為了避免不必要的bug出現，設計者建議使用tensor的register_hook, 而不是module的hook。如果出現類似問題，可以知道從這里找原因。

from IPython.display import Image
Image(filename = "../../Downloads/zhifubao.png", width = 200, height = 200)

The wound is the place where the Light enters you. ~Rumi

[pytorch] PyTorch Hook