Since my research direction is relation extraction, one-dimensional convolution is an essential technique to master for text processing. Here is a brief introduction to reinforce what I have learned.
Official PyTorch parameter description:
Conv1d
class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
- in_channels (int) – number of channels in the input signal; for text classification, this is the dimension of the word vectors
- out_channels (int) – number of channels produced by the convolution; each output channel requires its own 1-D convolution kernel
- kernel_size (int or tuple) – size of the convolving kernel. The kernel is (k,); its second dimension is determined by in_channels, so the effective kernel size is kernel_size * in_channels
- stride (int or tuple, optional) – stride of the convolution
- padding (int or tuple, optional) – number of zeros padded on each side of the input
- dilation (int or tuple, optional) – spacing between kernel elements
- groups (int, optional) – number of blocked connections from input channels to output channels
- bias (bool, optional) – if bias=True, adds a learnable bias to the output
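For completeness, the output length of Conv1d follows the same floor formula that PyTorch documents for Conv2d (the height/width versions appear near the end of this post):

$L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor$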
One blogger explained this very clearly, so I include his content here; the link is listed in the references below.
```python
import torch
import torch.nn as nn

conv1 = nn.Conv1d(in_channels=256, out_channels=100, kernel_size=2)
input = torch.randn(32, 35, 256)
# batch_size x text_len x embedding_size -> batch_size x embedding_size x text_len
input = input.permute(0, 2, 1)
out = conv1(input)
print(out.size())  # torch.Size([32, 100, 34])
```
Here 32 is the batch_size, 35 is the maximum sentence length, and 256 is the word-vector dimension.
Before feeding the data into the 1-D convolution, the 32×35×256 tensor must be permuted to 32×256×35, because Conv1d slides over the last dimension. The output size is therefore 32×100×(35-2+1) = 32×100×34.
In the figure, the word-vector dimension is 5 and the input size is 7×5. There are 1-D convolution kernels of sizes 2, 3, and 4, two of each, giving six feature maps in total.
For k=4 (the large red matrix in the figure), the kernel size is 4×5 and the stride is 1. The kernel sweeps the input from top to bottom, producing an output vector of size ((7-4)/1+1)×1 = 4×1, which a max pooling with kernel size 4 then reduces to a single value. The six resulting values are concatenated and passed through a fully connected layer, which outputs the probabilities of the 2 classes.
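A minimal sketch of the k=4 branch described above (the 7×5 input and the kernel shape follow the figure; the variable names are mine):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 5, 7)                    # batch=1, embedding_size=5, text_len=7
conv = nn.Conv1d(in_channels=5, out_channels=1, kernel_size=4)
feat = conv(x)                              # (1, 1, 4): (7 - 4) / 1 + 1 = 4
pooled = nn.MaxPool1d(kernel_size=4)(feat)  # (1, 1, 1): one value per kernel
print(feat.shape, pooled.shape)
```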
A detailed walkthrough of the attached code follows:
Here, embedding_size=256, feature_size=100, window_sizes=[3,4,5,6], and max_text_len=35.
```python
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, config):
        super(TextCNN, self).__init__()
        self.is_training = True
        self.dropout_rate = config.dropout_rate
        self.num_class = config.num_class
        self.use_element = config.use_element
        self.config = config

        self.embedding = nn.Embedding(num_embeddings=config.vocab_size,
                                      embedding_dim=config.embedding_size)
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv1d(in_channels=config.embedding_size,
                                    out_channels=config.feature_size,
                                    kernel_size=h),
                          # nn.BatchNorm1d(num_features=config.feature_size),
                          nn.ReLU(),
                          nn.MaxPool1d(kernel_size=config.max_text_len - h + 1))
            for h in config.window_sizes
        ])
        self.fc = nn.Linear(in_features=config.feature_size * len(config.window_sizes),
                            out_features=config.num_class)
        if os.path.exists(config.embedding_path) and config.is_training and config.is_pretrain:
            print("Loading pretrain embedding...")
            self.embedding.weight.data.copy_(torch.from_numpy(np.load(config.embedding_path)))

    def forward(self, x):
        embed_x = self.embedding(x)
        # print('embed size 1', embed_x.size())  # 32*35*256
        # batch_size x text_len x embedding_size -> batch_size x embedding_size x text_len
        embed_x = embed_x.permute(0, 2, 1)
        # print('embed size 2', embed_x.size())  # 32*256*35
        out = [conv(embed_x) for conv in self.convs]  # out[i]: batch_size x feature_size x 1
        # for o in out:
        #     print('o', o.size())  # 32*100*1
        out = torch.cat(out, dim=1)  # concatenate along dim 1, e.g. 5*2*1 and 5*3*1 become 5*5*1
        # print(out.size(1))  # 32*400*1
        out = out.view(-1, out.size(1))
        # print(out.size())  # 32*400
        if not self.use_element:
            out = F.dropout(input=out, p=self.dropout_rate)
            out = self.fc(out)
        return out
```
embed_x starts out with size 32×35×256, where 32 is the batch_size. After permute it becomes 32×256×35. Passing it through the custom network produces an out list whose 4 elements each have size 32×100×1. Concatenating along dim=1 gives 32×400×1; view reshapes this to 32×400; and finally a 400×num_class fully connected layer produces 32×2.
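A quick way to check these shapes is to run the model on dummy data. The config values below are the ones stated above, except vocab_size, which is an arbitrary choice; SimpleNamespace stands in for the author's config object:

```python
import torch
from types import SimpleNamespace

config = SimpleNamespace(vocab_size=5000, embedding_size=256, feature_size=100,
                         window_sizes=[3, 4, 5, 6], max_text_len=35, num_class=2,
                         dropout_rate=0.5, use_element=False,
                         embedding_path='', is_training=True, is_pretrain=False)
model = TextCNN(config)
x = torch.randint(0, config.vocab_size, (32, 35))  # a batch of token-id sequences
print(model(x).shape)  # torch.Size([32, 2])
```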
In a relation-extraction paper I read, the author uses the following CNN code:
```python
import torch
import torch.nn as nn

class CNN3(nn.Module):
    def __init__(self, config):
        super(CNN3, self).__init__()
        self.config = config
        self.word_emb = nn.Embedding(config.data_word_vec.shape[0], config.data_word_vec.shape[1])
        self.word_emb.weight.data.copy_(torch.from_numpy(config.data_word_vec))
        self.word_emb.weight.requires_grad = False

        # self.char_emb = nn.Embedding(config.data_char_vec.shape[0], config.data_char_vec.shape[1])
        # self.char_emb.weight.data.copy_(torch.from_numpy(config.data_char_vec))
        # char_dim = config.data_char_vec.shape[1]
        # char_hidden = 100
        # self.char_cnn = nn.Conv1d(char_dim, char_hidden, 5)

        self.coref_embed = nn.Embedding(config.max_length, config.coref_size, padding_idx=0)
        self.ner_emb = nn.Embedding(7, config.entity_type_size, padding_idx=0)
        # input_size is 140 dimensions in this configuration
        input_size = config.data_word_vec.shape[1] + config.coref_size + config.entity_type_size  # + char_hidden
        self.out_channels = 200
        self.in_channels = input_size
        self.kernel_size = 3
        self.stride = 1
        self.padding = int((self.kernel_size - 1) / 2)
        self.cnn_1 = nn.Conv1d(self.in_channels, self.out_channels, self.kernel_size, self.stride, self.padding)
        self.cnn_2 = nn.Conv1d(self.out_channels, self.out_channels, self.kernel_size, self.stride, self.padding)
        self.cnn_3 = nn.Conv1d(self.out_channels, self.out_channels, self.kernel_size, self.stride, self.padding)
        self.max_pooling = nn.MaxPool1d(self.kernel_size, stride=self.stride, padding=self.padding)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(config.cnn_drop_prob)
        self.bili = torch.nn.Bilinear(self.out_channels + config.dis_size,
                                      self.out_channels + config.dis_size, config.relation_num)
        self.dis_embed = nn.Embedding(20, config.dis_size, padding_idx=10)

    # model(context_idxs, context_pos, context_ner, context_char_idxs, input_lengths,
    #       h_mapping, t_mapping, relation_mask, dis_h_2_t, dis_t_2_h)
    def forward(self, context_idxs, pos, context_ner, context_char_idxs, context_lens,
                h_mapping, t_mapping, relation_mask, dis_h_2_t, dis_t_2_h):
        # para_size, char_size, bsz = context_idxs.size(1), context_char_idxs.size(2), context_idxs.size(0)
        # context_ch = self.char_emb(context_char_idxs.contiguous().view(-1, char_size)).view(bsz * para_size, char_size, -1)
        # context_ch = self.char_cnn(context_ch.permute(0, 2, 1).contiguous()).max(dim=-1)[0].view(bsz, para_size, -1)

        # self.word_emb(context_idxs).shape   = [40, 512, config.data_word_vec.shape[1]]
        # self.coref_embed(pos).shape         = [40, 512, config.coref_size]
        # self.ner_emb(context_ner).shape     = [40, 512, config.entity_type_size]
        sent = torch.cat([self.word_emb(context_idxs), self.coref_embed(pos), self.ner_emb(context_ner)], dim=-1)
        sent = sent.permute(0, 2, 1)  # torch.Size([40, 140, 512]): batch * embedding_size * max_len

        x = self.cnn_1(sent)     # (b, 140, 512) -> (b, 200, 512)
        x = self.max_pooling(x)  # (b, 200, 512) -> (b, 200, 512)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.cnn_2(x)        # (b, 200, 512) -> (b, 200, 512)
        x = self.max_pooling(x)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.cnn_3(x)        # (b, 200, 512) -> (b, 200, 512)
        x = self.max_pooling(x)
        x = self.relu(x)
        x = self.dropout(x)

        # padding=1 at every step adds a column on each side, so the sequence length never changes
        context_output = x.permute(0, 2, 1)  # (b, 512, 200)
        start_re_output = torch.matmul(h_mapping, context_output)  # (b,1800,512)*(b,512,200) -> (b,1800,200)
        end_re_output = torch.matmul(t_mapping, context_output)
        s_rep = torch.cat([start_re_output, self.dis_embed(dis_h_2_t)], dim=-1)  # (b, 1800, 200+20)
        t_rep = torch.cat([end_re_output, self.dis_embed(dis_t_2_h)], dim=-1)
        predict_re = self.bili(s_rep, t_rep)  # (b, 1800, 97)
        return predict_re
```
The author defines three CNN layers. The first layer increases the number of channels and uses a kernel of size 3 to extract features; the padding is chosen so that the sentence-length dimension stays unchanged through each convolution layer.
Relation classification is not done directly by the CNN. The first layer extracts features, then two more CNN layers follow (the data shape is unchanged; their exact role is unclear). The context_output obtained from the 3 CNN layers already carries some document-level information. Then h_mapping, which holds the head-entity mask information, is multiplied with context_output, and t_mapping, which holds the tail-entity mask information, is multiplied with context_output (a small sketch of this multiplication follows below).
Next, the head-to-tail distance features are concatenated to start_re_output along the feature dimension, and the tail-to-head distance features are concatenated to end_re_output along the feature dimension.
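Conceptually, each row of h_mapping holds weights that average the context vectors over a head entity's mention positions. A minimal sketch of the multiplication (the shapes follow the comments in the code; how the paper actually constructs the mapping is my assumption):

```python
import torch

b, num_pairs, seq_len, hidden = 2, 1800, 512, 200
context_output = torch.randn(b, seq_len, hidden)

# hypothetical mapping: averaging weights over a head entity spanning tokens 0-1
h_mapping = torch.zeros(b, num_pairs, seq_len)
h_mapping[:, :, 0:2] = 0.5

start_re_output = torch.matmul(h_mapping, context_output)
print(start_re_output.shape)  # torch.Size([2, 1800, 200])
```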
Finally, both representations are fed into the predefined bilinear layer to produce the prediction.
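nn.Bilinear applies a learned bilinear form $x_1^{\top} A x_2 + b$ per output feature; a quick shape check matching the dimensions in the code comments above:

```python
import torch
import torch.nn as nn

bili = nn.Bilinear(220, 220, 97)  # out_channels + dis_size = 200 + 20; relation_num = 97
s_rep = torch.randn(2, 1800, 220)
t_rep = torch.randn(2, 1800, 220)
print(bili(s_rep, t_rep).shape)   # torch.Size([2, 1800, 97])
```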
The commented-out code (the char_emb/char_cnn lines in __init__ and at the top of forward) shows that the author tried to use pretrained GloVe character vectors but apparently ran into problems and abandoned the attempt, which is perhaps why the results were not ideal.
Conv2d
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
| Parameter | Description |
| --- | --- |
| in_channels | Number of channels in the input image |
| out_channels | Number of channels produced by the convolution |
| kernel_size | Size of the convolving kernel; an int or an (int, int) tuple |
| stride | Stride of the cross-correlation; an int or an (int, int) tuple |
| padding | Amount of zero-padding added to each side of the input |
| dilation | Spacing between kernel elements |
| groups | Number of blocked connections from input channels to output channels; in_channels and out_channels must both be divisible by it, so it ranges from 1 to in_channels (see the sketch after this table) |
| bias | If True, adds a learnable bias to the output |
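For example, setting groups=in_channels (with out_channels a multiple of it) gives a depthwise convolution, in which each input channel is convolved with its own filters; a minimal sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 8, 8)
depthwise = nn.Conv2d(in_channels=4, out_channels=4, kernel_size=3, padding=1, groups=4)
print(depthwise(x).shape)  # torch.Size([1, 4, 8, 8]); each channel is convolved separately
```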
Example:
```python
import torch

x = torch.randn(2, 1, 7, 3)
conv = torch.nn.Conv2d(1, 8, (2, 3))
res = conv(x)
print(res.shape)  # torch.Size([2, 8, 6, 1])
```
Input x: `[batch_size, channels, height_1, width_1]`

| Dimension | Meaning | Value |
| --- | --- | --- |
| batch_size | number of samples in a batch | 2 |
| channels | number of channels, i.e. the depth of the current layer | 1 |
| height_1 | image height | 7 |
| width_1 | image width | 3 |
Conv2d parameters: `[channels, output, height_2, width_2]`

| Dimension | Meaning | Value |
| --- | --- | --- |
| channels | number of channels, matching the input depth above | 1 |
| output | output depth | 8 |
| height_2 | filter height | 2 |
| width_2 | filter width | 3 |
Output res: `[batch_size, output, height_3, width_3]`

| Dimension | Meaning | Value |
| --- | --- | --- |
| batch_size | number of samples in a batch, as above | 2 |
| output | output depth | 8 |
| height_3 | height of the convolution result | h1 - h2 + 1 = 7 - 2 + 1 = 6 |
| width_3 | width of the convolution result | w1 - w2 + 1 = 3 - 3 + 1 = 1 |
Shape:
Input: $(N, C_{in}, H_{in}, W_{in})$
Output: $(N, C_{out}, H_{out}, W_{out})$
$H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor$
$W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor$
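A quick numerical check of these formulas against PyTorch, using the example above:

```python
import math
import torch
import torch.nn as nn

H_in, W_in = 7, 3
conv = nn.Conv2d(1, 8, kernel_size=(2, 3), stride=1, padding=0, dilation=1)
H_out = math.floor((H_in + 2 * 0 - 1 * (2 - 1) - 1) / 1 + 1)  # 6
W_out = math.floor((W_in + 2 * 0 - 1 * (3 - 1) - 1) / 1 + 1)  # 1
out = conv(torch.randn(2, 1, H_in, W_in))
print(out.shape, (H_out, W_out))  # torch.Size([2, 8, 6, 1]) (6, 1)
```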
References:
- pytorch 中nn.MaxPool1d() 和nn.MaxPool2d()對比: https://www.jianshu.com/p/c5b8e02bedbe
- pytorch中的nn.Bilinear的計算原理詳解: https://blog.csdn.net/nihate/article/details/90480459
- pytorch之nn.Conv1d詳解: https://blog.csdn.net/sunny_xsc1994/article/details/82969867
- pytorch中的matmul: https://blog.csdn.net/yu_1628060739/article/details/102720385
- torch.nn.Conv2d()函數詳解: https://blog.csdn.net/m0_37586991/article/details/87855342