A Detailed Look at nn.Conv1d in PyTorch

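While learning to use PyTorch for text classification I needed one-dimensional convolution and spent some time working out how it operates. I could not find a blog post that explains it in detail, so I am writing it down here.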

Conv1d
class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

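in_channels (int) – Number of channels in the input signal. In text classification this is the word-vector dimension.
out_channels (int) – Number of channels produced by the convolution. One 1D filter is needed per output channel.
kernel_size (int or tuple) – Size of the convolution kernel. The kernel has size (k,) along the sequence; its other dimension is fixed by in_channels, so each filter actually has kernel_size*in_channels weights.
stride (int or tuple, optional) – Stride of the convolution.
padding (int or tuple, optional) – Number of zeros added to both ends of the input.
dilation (int or tuple, optional) – Spacing between kernel elements.
groups (int, optional) – Number of blocked connections from input channels to output channels.
bias (bool, optional) – If bias=True, a learnable bias is added to the output.
For example: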

import torch
import torch.nn as nn

conv1 = nn.Conv1d(in_channels=256, out_channels=100, kernel_size=2)
input = torch.randn(32,35,256)
# batch_size x text_len x embedding_size -> batch_size x embedding_size x text_len
input = input.permute(0,2,1)
out = conv1(input)
print(out.size())

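Here 32 is batch_size, 35 is the maximum sentence length, and 256 is the word-vector dimension.

Before feeding the tensor to the 1D convolution, 32*35*256 must be permuted to 32*256*35, because Conv1d slides along the last dimension. The size of out is therefore 32*100*(35-2+1) = 32*100*34.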
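As a quick check of the shapes discussed above, the sketch below inspects the weight of the same layer and shows what happens when the permute step is skipped. The variable names are chosen only for illustration.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv1d(in_channels=256, out_channels=100, kernel_size=2)

# Each of the 100 filters spans all 256 input channels and 2 time steps.
weight_shape = tuple(conv1.weight.shape)   # (out_channels, in_channels, kernel_size)

x = torch.randn(32, 35, 256)               # batch_size x text_len x embedding_size

# Without permute, Conv1d reads dimension 1 (35) as the channel count and fails.
try:
    conv1(x)
    failed_without_permute = False
except RuntimeError:
    failed_without_permute = True

out = conv1(x.permute(0, 2, 1))            # batch_size x embedding_size x text_len
out_shape = tuple(out.shape)               # last dimension is 35 - 2 + 1
print(weight_shape, failed_without_permute, out_shape)
```

The weight shape makes the point from the parameter list concrete: the kernel_size argument only sets the extent along the sequence, and the channel extent comes from in_channels.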

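The figure shows intuitively how 1D convolution is used:

In the figure the word vectors have dimension 5 and the input has size 7*5. The 1D kernels have sizes 2, 3 and 4, with two filters of each size, giving 6 features in total.

For k=4, see the large red matrix in the figure: the kernel has size 4*5 and the stride is 1. The kernel scans the input from top to bottom, so the output vector has size ((7-4)/1+1)*1 = 4*1. A max_pooling with kernel size 4 then reduces it to a single value. The 6 values obtained this way are concatenated and passed through a fully connected layer, which outputs the probabilities of the 2 classes.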
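The k=4 case can be reproduced by hand. The sketch below assumes a random 7*5 sentence matrix and a single filter; it compares nn.Conv1d with an explicit 4*5 window sliding from top to bottom, then applies the max pooling step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
sentence = torch.randn(7, 5)                     # 7 words, 5-dimensional word vectors
conv = nn.Conv1d(in_channels=5, out_channels=1, kernel_size=4)

# Conv1d expects batch x channels x length, so the 7x5 matrix becomes 1x5x7.
conv_out = conv(sentence.t().unsqueeze(0))       # shape: 1 x 1 x 4

# The same result by hand: slide a 4x5 window down the sentence matrix.
kernel = conv.weight[0].t()                      # 4 x 5, one row per word in the window
manual = torch.stack([
    (sentence[i:i + 4] * kernel).sum() + conv.bias[0]
    for i in range(7 - 4 + 1)
])

# Max pooling over all 4 positions leaves a single value for this filter.
pooled = F.max_pool1d(conv_out, kernel_size=4)   # shape: 1 x 1 x 1
print(conv_out.shape, pooled.shape)
```

Each of the 4 outputs is the sum of an elementwise product between the kernel and 4 consecutive word vectors, plus the bias, which is what "scanning from top to bottom" means in the figure.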

The following code works through this in detail:

Here embedding_size=256, feature_size=100, window_sizes=[3,4,5,6], max_text_len=35.

import os

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    def __init__(self, config):
        super(TextCNN, self).__init__()
        self.is_training = True
        self.dropout_rate = config.dropout_rate
        self.num_class = config.num_class
        self.use_element = config.use_element
        self.config = config
 
        self.embedding = nn.Embedding(num_embeddings=config.vocab_size, 
                                embedding_dim=config.embedding_size)
        self.convs = nn.ModuleList([
                nn.Sequential(nn.Conv1d(in_channels=config.embedding_size, 
                                        out_channels=config.feature_size, 
                                        kernel_size=h),
#                              nn.BatchNorm1d(num_features=config.feature_size), 
                              nn.ReLU(),
                              nn.MaxPool1d(kernel_size=config.max_text_len-h+1))
                     for h in config.window_sizes
                    ])
        self.fc = nn.Linear(in_features=config.feature_size*len(config.window_sizes),
                            out_features=config.num_class)
        if os.path.exists(config.embedding_path) and config.is_training and config.is_pretrain:
            print("Loading pretrain embedding...")
            self.embedding.weight.data.copy_(torch.from_numpy(np.load(config.embedding_path)))    
    
    def forward(self, x):
        embed_x = self.embedding(x)
        
        # print('embed size 1', embed_x.size())  # 32*35*256
        # batch_size x text_len x embedding_size -> batch_size x embedding_size x text_len
        embed_x = embed_x.permute(0, 2, 1)
        # print('embed size 2', embed_x.size())  # 32*256*35
        out = [conv(embed_x) for conv in self.convs]  # out[i]: batch_size x feature_size x 1
        # for o in out:
        #     print('o', o.size())  # 32*100*1
        out = torch.cat(out, dim=1)  # concatenate along the second dimension (dim=1), e.g. 5*2*1 and 5*3*1 become 5*5*1
        # print(out.size())  # 32*400*1
        out = out.view(-1, out.size(1))
        # print(out.size())  # 32*400
        if not self.use_element:
            out = F.dropout(input=out, p=self.dropout_rate, training=self.training)
            out = self.fc(out)
        return out

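embed_x initially has size 32*35*256, where 32 is batch_size. After permute it becomes 32*256*35. Passing it through the convolution branches defined above gives a list out with 4 elements, each of size 32*100*1. Concatenating along dim=1 gives 32*400*1, and view reshapes that to 32*400. Finally the fully connected layer, a 400*num_class matrix, produces 32*2 (with num_class=2).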
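The shape walk-through above can be traced without a config object. This is a stripped-down sketch of the same pipeline, not the full class: vocab_size, num_class and batch_size are values assumed here for illustration, and dropout and pretrained embeddings are left out.

```python
import torch
import torch.nn as nn

# Values from the text; vocab_size, num_class and batch_size are assumed.
embedding_size, feature_size, max_text_len = 256, 100, 35
window_sizes = [3, 4, 5, 6]
vocab_size, num_class, batch_size = 1000, 2, 32

embedding = nn.Embedding(vocab_size, embedding_size)
convs = nn.ModuleList([
    nn.Sequential(nn.Conv1d(embedding_size, feature_size, kernel_size=h),
                  nn.ReLU(),
                  nn.MaxPool1d(kernel_size=max_text_len - h + 1))
    for h in window_sizes
])
fc = nn.Linear(feature_size * len(window_sizes), num_class)

x = torch.randint(0, vocab_size, (batch_size, max_text_len))  # token ids

shapes = {}
embed_x = embedding(x)
shapes["embed"] = tuple(embed_x.shape)
embed_x = embed_x.permute(0, 2, 1)
shapes["permuted"] = tuple(embed_x.shape)
branches = [conv(embed_x) for conv in convs]
shapes["branches"] = [tuple(b.shape) for b in branches]
out = torch.cat(branches, dim=1)
shapes["concat"] = tuple(out.shape)
out = out.view(-1, out.size(1))
shapes["flat"] = tuple(out.shape)
shapes["logits"] = tuple(fc(out).shape)

for name, shape in shapes.items():
    print(name, shape)
```

Because each MaxPool1d kernel equals the full convolution output length (max_text_len - h + 1), every branch collapses to length 1 regardless of its window size, which is what allows the concatenation along dim=1.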

===================================================================================================================================

Differences between the convolution functions in PyTorch (the two forms of conv2d)

Operating on 2D matrices:

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

Applies a 2D convolution over an input signal composed of several feature planes.

torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)

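Applies a 2D convolution over an input image composed of several input planes. The operation is the same as the one above. The difference is in usage: the functional form is mostly used to compute the result of convolving an input with a given set of kernels, while the class form above is mostly used when defining a network.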
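To make the relationship concrete, the sketch below feeds the weight and bias of an nn.Conv2d layer into torch.nn.functional.conv2d and gets the same output. It then uses the functional form with a hand-written kernel; the 3x3 mean filter is only an example chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)                      # batch x channels x height x width

# Module form: the layer owns its weight and bias as learnable parameters.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
module_out = conv(x)

# Functional form: the caller passes the kernels explicitly.
functional_out = F.conv2d(x, conv.weight, conv.bias, stride=1, padding=1)

# Functional form with a fixed kernel: a 3x3 mean filter applied per channel.
mean_kernel = torch.full((3, 1, 3, 3), 1.0 / 9.0)
blurred = F.conv2d(x, mean_kernel, padding=1, groups=3)
print(module_out.shape, blurred.shape)
```

The module form suits layers whose kernels are learned during training, while the functional form suits kernels that are supplied from outside, such as fixed filters.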

======================================================================================================================

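First, the 2D convolution conv2d. The signature and source below are from TensorFlow (tf.nn.conv2d), not PyTorch: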

conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None)
"""Computes a 2-D convolution given 4-D `input` and `filter` tensors."""
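In other words, it performs a 2D convolution on a 4D input tensor with a 4D filter tensor.

input: 4D tensor of shape [batch, in_height, in_width, in_channels]

filter: the filter (convolution kernel), a 4D tensor of shape [filter_height, filter_width, in_channels, out_channels]

strides: the step by which the sliding window moves along each dimension of input, given as a 1D tensor of length 4.

padding: the border padding algorithm, either 'SAME' or 'VALID'. The two differ in how the feature-map size changes after convolution or pooling. For how to compute the feature-map size, see https://blog.csdn.net/qq_26552071/article/details/81171161

return: a tensor of the same type as input.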
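The effect of the two padding modes on the output size can be sketched in plain Python. The helper name conv_output_size is hypothetical, and the sketch assumes no dilation; it follows the size rules given in the TensorFlow documentation for 'SAME' and 'VALID'.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension for TensorFlow-style padding."""
    if padding == "SAME":
        # The input is zero-padded so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

same_size = conv_output_size(28, 5, 2, "SAME")    # ceil(28 / 2)
valid_size = conv_output_size(28, 5, 2, "VALID")  # ceil(24 / 2)
print(same_size, valid_size)
```

With stride 1, 'SAME' keeps the spatial size unchanged, while 'VALID' shrinks it by filter_size - 1.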

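Next, the 1D convolution conv1d. In TensorFlow, 1D convolution is ultimately implemented with 2D convolution: the input tensor and the filter are each expanded by one dimension, and conv2d is then called.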

def conv1d(value,filters, stride, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
"""Computes a 1-D convolution given 3-D input and filter tensors."""
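In other words, it performs a 1D convolution on a 3D input tensor with a 3D filter tensor.

value: the input, a 3D tensor whose shape depends on data_format:

(1) data_format = "NWC", shape = [batch, in_width, in_channels]

(2) data_format = "NCW", shape = [batch, in_channels, in_width]

filters: 3D tensor, shape = [filter_width, in_channels, out_channels]

stride: the step by which the filter window moves, an integer.

padding: same as above.

The conv1d source shows that 1D convolution is implemented by first adding one dimension to both the input tensor and the filter, and then calling the 2D convolution: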

    value = array_ops.expand_dims(value, spatial_start_dim)  # input tensor
    filters = array_ops.expand_dims(filters, 0)  # filter
    result = gen_nn_ops.conv2d(
        value,
        filters,
        strides,
        padding,
        use_cudnn_on_gpu=use_cudnn_on_gpu,
        data_format=data_format)
    return array_ops.squeeze(result, [spatial_start_dim])

The complete source of conv1d follows:

def conv1d(value,
           filters,
           stride,
           padding,
           use_cudnn_on_gpu=None,
           data_format=None,
           name=None): 
  with ops.name_scope(name, "conv1d", [value, filters]) as name:
    # Reshape the input tensor to [batch, 1, in_width, in_channels]
    if data_format is None or data_format == "NHWC" or data_format == "NWC":
      data_format = "NHWC"
      spatial_start_dim = 1
      strides = [1, 1, stride, 1]
    elif data_format == "NCHW" or data_format == "NCW":
      data_format = "NCHW"
      spatial_start_dim = 2
      strides = [1, 1, 1, stride]
    else:
      raise ValueError("data_format must be \"NWC\" or \"NCW\".")
    value = array_ops.expand_dims(value, spatial_start_dim)
    filters = array_ops.expand_dims(filters, 0)
    result = gen_nn_ops.conv2d(
        value,
        filters,
        strides,
        padding,
        use_cudnn_on_gpu=use_cudnn_on_gpu,
        data_format=data_format)
    return array_ops.squeeze(result, [spatial_start_dim])
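The same expand, convolve, squeeze idea can be tried in PyTorch. This is only an analogue of the TensorFlow source above, not a description of how PyTorch implements conv1d internally. Note that PyTorch stores filters as out_channels x in_channels x width, unlike the TensorFlow layout.

```python
import torch
import torch.nn.functional as F

x = torch.randn(32, 256, 35)      # NCW: batch x in_channels x in_width
w = torch.randn(100, 256, 2)      # PyTorch layout: out_channels x in_channels x width

direct = F.conv1d(x, w, stride=1)

# Insert a height dimension of size 1 into both tensors, run a 2D convolution,
# then squeeze the dummy height dimension away again.
x4d = x.unsqueeze(2)              # batch x in_channels x 1 x in_width
w4d = w.unsqueeze(2)              # out_channels x in_channels x 1 x width
via_conv2d = F.conv2d(x4d, w4d, stride=(1, 1)).squeeze(2)
print(direct.shape, via_conv2d.shape)
```

Both paths give a 32*100*34 tensor with matching values up to floating-point rounding, since a 2D convolution whose kernel has height 1 over an input of height 1 reduces to a 1D convolution along the width.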

==============================================

Notes

input w*h

output wo*ho

filter F

Padding P

stride S

wo = (w - F + 2*P)/S +1
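The formula can be checked against nn.Conv2d. The sketch below assumes the division is rounded down and that there is no dilation; out_size is a hypothetical helper name, and the concrete sizes are arbitrary.

```python
import torch
import torch.nn as nn

def out_size(w, F, P, S):
    # wo = (w - F + 2*P) / S + 1, with the division rounded down
    return (w - F + 2 * P) // S + 1

w, h, F, P, S = 32, 20, 5, 2, 3
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=F, padding=P, stride=S)
y = conv(torch.randn(1, 1, h, w))   # PyTorch orders the spatial dims as height, width

expected = (out_size(h, F, P, S), out_size(w, F, P, S))
actual = tuple(y.shape[-2:])
print(expected, actual)
```

The same formula applies to ho with h in place of w, and to the sequence length of Conv1d.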
