Pytorch Study Notes 11 ---- model.train() vs. model.eval(), how Dropout works, the relu, sigmoid and tanh activation functions, a brief look at nn.Linear, and how to print an entire tensor

Reposted article, 2020-08-03 09:35. Category: Pytorch, natural language processing.

1. Usage of model.train() and model.eval()

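While reading other people's interview write-ups, I came across a question about exactly this. When I first started with PyTorch I reused other people's code templates, which call model.train() before training and model.eval() before testing. I kept the habit in my own code without ever asking why.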

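After some reading, here is my summary:
If the model contains BN (Batch Normalization) layers or Dropout, call model.train() before training and model.eval() before testing. For BN, model.train() makes the layer normalize with the mean and variance of the current mini-batch (and update its running estimates), while model.eval() makes it use the running mean and variance accumulated over the training batches. For Dropout, model.train() randomly drops a subset of units in each iteration, so only the remaining connections are used and updated, while model.eval() uses all connections.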

Once you recall how Batch Normalization and Dropout work, the reason for this is easy to see.
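The behaviour described above can be observed with a small experiment. This is a minimal sketch assuming a toy nn.Sequential model with one BatchNorm1d and one Dropout layer (the layer sizes, batch size and seed are arbitrary illustration choices): it checks that train()/eval() flip the training flag of every submodule, that BN updates its running statistics only in training mode, and that Dropout makes repeated forward passes differ only in training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model (sizes are arbitrary, for illustration only)
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.Dropout(p=0.5),
    nn.Linear(16, 1),
)
x = torch.randn(32, 8)
bn = model[1]

# Training mode: BN normalizes with batch statistics and updates
# running_mean/running_var; Dropout zeroes random elements each call.
model.train()
before = bn.running_mean.clone()
out_a = model(x)
out_b = model(x)
train_flags = [m.training for m in model.modules()]
running_stats_changed = not torch.equal(before, bn.running_mean)
train_outputs_differ = not torch.equal(out_a, out_b)

# Evaluation mode: BN uses the stored running statistics without updating
# them, and Dropout passes its input through unchanged.
model.eval()
frozen = bn.running_mean.clone()
with torch.no_grad():
    out_c = model(x)
    out_d = model(x)
eval_flags = [m.training for m in model.modules()]
eval_stats_frozen = torch.equal(frozen, bn.running_mean)
eval_outputs_equal = torch.equal(out_c, out_d)

print(all(train_flags), running_stats_changed, train_outputs_differ)
print(any(eval_flags), eval_stats_frozen, eval_outputs_equal)
```

Note that model.eval() only switches layer behaviour; it does not turn off gradient tracking, which is why torch.no_grad() is used separately in the evaluation part of the sketch.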

2. Dropout

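Dropout is commonly used to reduce overfitting, and PyTorch provides convenient modules for it. What often causes confusion is the meaning of its parameter p. In TensorFlow 1.x the corresponding argument of tf.nn.dropout is keep_prob, so I had assumed that PyTorch's p is also the fraction of units kept. Experiments show the opposite: p is the probability that a unit is dropped (set to zero). See the example below: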

>>> import torch
>>> a = torch.randn(10, 1)
>>> a
tensor([[ 0.0684],
        [-0.2395],
        [ 0.0785],
        [-0.3815],
        [-0.6080],
        [-0.1690],
        [ 1.0285],
        [ 1.1213],
        [ 0.5261],
        [ 1.1664]])

>>> # p = 0.5
>>> torch.nn.Dropout(0.5)(a)
tensor([[ 0.0000],
        [-0.0000],
        [ 0.0000],
        [-0.7631],
        [-0.0000],
        [-0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 1.0521],
        [ 2.3328]])

Numerically, the surviving elements are scaled up: 2.3328 = 1.1664 × 2, i.e. each kept value is multiplied by 1/(1-p) = 2.

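With torch.nn.Dropout(0.5), the 0.5 means that in each training iteration every neuron (element) of the layer is independently dropped (zeroed) with probability 50% and does not take part in that update.

As a result, on average half of the previous layer's outputs are passed on; the exact number varies from call to call (in the example above, 7 of the 10 values were zeroed).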
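PyTorch implements what is usually called inverted dropout: during training each element is zeroed independently with probability p and the survivors are multiplied by 1/(1-p), which is where the factor 2 above comes from; in eval mode the layer returns its input unchanged. The sketch below checks these three properties on a large all-ones tensor (the tensor size and seed are arbitrary choices).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 0.5
drop = nn.Dropout(p)
x = torch.ones(100_000)

drop.train()                      # nn.Dropout starts in training mode anyway
y = drop(x)
kept = y[y != 0]
zero_fraction = (y == 0).float().mean().item()   # should be close to p

# Every surviving element is scaled by 1 / (1 - p) = 2.0
scale_ok = torch.allclose(kept, torch.full_like(kept, 1 / (1 - p)))
# The scaling keeps the expected value unchanged: mean(y) is close to mean(x) = 1
mean_after = y.mean().item()

drop.eval()                       # eval mode: dropout is the identity
identity_ok = torch.equal(drop(x), x)

print(f"zeroed: {zero_fraction:.3f}, mean: {mean_after:.3f}")
print(scale_ok, identity_ok)
```

Scaling at training time (rather than at test time) is what lets model.eval() simply skip dropout without any extra correction.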

3. relu, sigmoid, tanh activation functions

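Without activation functions, every layer of a neural network is a linear map, and a stack of linear maps is still linear. Many real problems, however, are nonlinear (for example, in house-price prediction the price cannot keep growing linearly with floor area forever). Passing each layer's linear output through an activation function makes the overall mapping nonlinear, which greatly increases what the network can represent.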
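The claim that stacking linear layers adds no nonlinearity can be checked directly: two nn.Linear layers with nothing in between collapse into one linear map with weight W2·W1 and bias W2·b1 + b2. A minimal sketch, with arbitrary layer sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
l1 = nn.Linear(4, 8)
l2 = nn.Linear(8, 3)
x = torch.randn(5, 4)

with torch.no_grad():
    stacked = l2(l1(x))                    # two layers, no activation
    # Equivalent single linear map: W = W2 @ W1, b = W2 @ b1 + b2
    W = l2.weight @ l1.weight              # shape (3, 4)
    b = l2.weight @ l1.bias + l2.bias      # shape (3,)
    single = x @ W.T + b
    # Inserting a ReLU between the layers breaks this equivalence
    with_relu = l2(torch.relu(l1(x)))

linear_collapses = torch.allclose(stacked, single, atol=1e-6)
relu_breaks_it = not torch.allclose(with_relu, single, atol=1e-6)
print(linear_collapses, relu_breaks_it)
```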

Common activation functions: relu, sigmoid, tanh, softmax, softplus
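For reference, these functions are relu(x) = max(0, x), sigmoid(x) = 1 / (1 + e^(-x)), tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), softplus(x) = ln(1 + e^x), and softmax(x)_i = e^(x_i) / Σ_j e^(x_j) (the only one of the five that is not element-wise). The sketch below compares hand-written versions with the PyTorch built-ins; the input range simply mirrors the plotting example that follows.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-5, 5, 200)

# Hand-written definitions
relu = torch.clamp(x, min=0)                          # max(0, x)
sigmoid = 1 / (1 + torch.exp(-x))                     # 1 / (1 + e^-x)
tanh = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
softplus = torch.log1p(torch.exp(x))                  # ln(1 + e^x)
softmax = torch.exp(x) / torch.exp(x).sum()           # normalizes to sum 1

checks = {
    "relu": torch.allclose(relu, torch.relu(x)),
    "sigmoid": torch.allclose(sigmoid, torch.sigmoid(x)),
    "tanh": torch.allclose(tanh, torch.tanh(x), atol=1e-6),
    "softplus": torch.allclose(softplus, F.softplus(x), atol=1e-6),
    "softmax": torch.allclose(softmax, torch.softmax(x, dim=0)),
}
print(checks)
```

The built-ins should be preferred in practice; for example torch.softmax subtracts the maximum before exponentiating, which avoids overflow for large inputs where the naive formula above would fail.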

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt  # used only for plotting

# Generate input data
x = torch.linspace(-5, 5, 200)
x_np = x.numpy()

# Activation functions (converted to numpy for plotting).
# torch.autograd.Variable has been deprecated since PyTorch 0.4; plain tensors work directly.
relu_function = torch.relu(x).numpy()
sigmoid_function = torch.sigmoid(x).numpy()
tanh_function = torch.tanh(x).numpy()
softplus_function = F.softplus(x).numpy()
np_data = x_np  # x values for the plots below

# Plot with matplotlib
plt.figure(1, figsize=(6, 6))

plt.subplot(221)
plt.plot(np_data, relu_function, c="green", label="relu")
plt.ylim(-1, 5)
plt.legend(loc="best")

plt.subplot(222)
plt.plot(np_data, sigmoid_function, c="green", label="sigmoid")
plt.ylim(-0.2, 1.2)
plt.legend(loc="best")

plt.subplot(223)
plt.plot(np_data, tanh_function, c="green", label="tanh")
plt.ylim(-1.2, 1.2)
plt.legend(loc="best")

plt.subplot(224)
plt.plot(np_data, softplus_function, c="green", label="softplus")
plt.ylim(-0.2, 6)
plt.legend(loc="best")

plt.show()

Result: a 2×2 grid of plots showing the relu, sigmoid, tanh and softplus curves (image not preserved).

4. A brief look at nn.Linear

nn.Linear applies a linear transformation to its input data.

Looking at the source code, the initialization part is:

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
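The constructor stores weight with shape (out_features, in_features), i.e. transposed relative to an [in, out] matrix, and registers bias as None when bias=False. A quick check, reusing the sizes 20 and 30 from the example further down:

```python
import torch.nn as nn

m = nn.Linear(20, 30)
print(m.weight.shape)   # torch.Size([30, 20]): (out_features, in_features)
print(m.bias.shape)     # torch.Size([30]): (out_features,)

m_no_bias = nn.Linear(20, 30, bias=False)
print(m_no_bias.bias)   # None: registered via register_parameter('bias', None)

# Learnable parameters: 30*20 weights + 30 biases
n_params = sum(p.numel() for p in m.parameters())
print(n_params)         # 630
```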

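The computation it implements is y = xA^T + b, where A is self.weight (shape out_features × in_features) and b is self.bias.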

Parameters:

Args:
        in_features: size of each input sample (the last dimension of the input)
        out_features: size of each output sample (the last dimension of the output)
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

For example:

>>> import torch
>>> import torch.nn as nn
>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])

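The tensor shape changes from 128 x 20 to 128 x 30.

The operation performed is (weight is stored as [30, 20], so it is transposed in the product, and the bias of shape [30] is added to every row):

input [128, 20] × weight^T [20, 30] = [128, 30]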
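The shape arithmetic above can be verified numerically. This sketch checks that nn.Linear matches both a manual x @ weight.T + bias and torch.nn.functional.linear, and that extra leading dimensions pass through unchanged, since only the last dimension must equal in_features (the 3-D shape is an arbitrary illustration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
m = nn.Linear(20, 30)
x = torch.randn(128, 20)

with torch.no_grad():
    out = m(x)
    manual = x @ m.weight.T + m.bias           # [128,20] x [20,30] + [30]
    functional = F.linear(x, m.weight, m.bias)

print(out.shape)                               # torch.Size([128, 30])
matches_manual = torch.allclose(out, manual, atol=1e-6)
matches_functional = torch.allclose(out, functional)
print(matches_manual, matches_functional)

# Only the last dimension has to match in_features
x3 = torch.randn(4, 128, 20)
out3_shape = m(x3).shape
print(out3_shape)                              # torch.Size([4, 128, 30])
```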

5. How to print an entire tensor

import torch

logit = torch.randn(100, 100)  # example tensor (hypothetical); any large tensor works
torch.set_printoptions(profile="full")
print(logit)  # prints the whole tensor
torch.set_printoptions(profile="default")  # reset
print(logit)  # prints the truncated tensor
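Note that profile="default" restores PyTorch's default print options, not whichever options were active before. A small helper, hypothetical and not part of PyTorch, can return the full string and reset the options even if str() raises. The 2000-element tensor is chosen because, with the default threshold of 1000 elements, PyTorch summarizes larger tensors with "...".

```python
import torch

def full_repr(t: torch.Tensor) -> str:
    """Hypothetical helper: return the untruncated string form of t."""
    torch.set_printoptions(profile="full")
    try:
        return str(t)
    finally:
        # Resets *all* print options to PyTorch's defaults,
        # not to whatever they were before the call.
        torch.set_printoptions(profile="default")

big = torch.arange(2000)
summarized = "..." in str(big)        # default options: truncated
full_has_dots = "..." in full_repr(big)
print(summarized, full_has_dots)
```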

