PyTorch Study Notes 05: Understanding pack_padded_sequence and pad_packed_sequence

Reposted article (see the original post), 2020-07-25 15:18. Tags: PyTorch, natural language processing

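1. Why use pack_padded_sequence
When using deep learning, and RNNs (LSTM/GRU) in particular, for sequence analysis, the sequences in a batch often have different lengths. The usual fix is padding: every sequence in the batch is brought to the same length, either the length of the longest sequence in the batch or one fixed length (truncating longer sequences and padding shorter ones). A padded batch can be fed to the LSTM as a single tensor and processed together, which speeds up training. The catch is that the LSTM treats padded positions exactly like real ones, which can hurt the accuracy of the model. The LSTM should be told where the padding is so that it only computes over the non-padded part. This is where PyTorch's pack_padded_sequence comes in.

In practice, feeding the padded batch in directly sometimes makes little difference, but using pack_padded_sequence may give better results.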
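The two padding strategies described above can be sketched as follows. This is only an illustration: the token ids are made up, and pad_or_truncate is a hypothetical helper written for this example, not a PyTorch function.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three made-up sequences of token ids with lengths 4, 2 and 1.
seqs = [torch.tensor([5, 6, 7, 8]), torch.tensor([3, 4]), torch.tensor([9])]

# Strategy 1: pad every sequence to the longest length in the batch.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
print(padded.tolist())  # [[5, 6, 7, 8], [3, 4, 0, 0], [9, 0, 0, 0]]

# Strategy 2: one fixed length; long sequences are truncated, short ones padded.
def pad_or_truncate(seq, fixed_len, pad_id=0):
    """Hypothetical helper: return seq cut or padded to exactly fixed_len."""
    out = torch.full((fixed_len,), pad_id, dtype=seq.dtype)
    n = min(len(seq), fixed_len)
    out[:n] = seq[:n]
    return out

fixed = torch.stack([pad_or_truncate(s, 3) for s in seqs])
print(fixed.tolist())   # [[5, 6, 7], [3, 4, 0], [9, 0, 0]]
```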

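An example:

What goes wrong without the pack and pad operations? Take a batch in which the sentence "Yes" has only one word and has been padded with many extra pad symbols. The LSTM's representation of that sentence is then computed after passing through many useless tokens, so the resulting sentence representation is distorted.

So what is the correct approach?

In this example, the representation we want is the LSTM state right after the word "Yes", not the state after it has also passed through a run of useless "Pad" tokens.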
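A small experiment makes the difference concrete. The sketch below assumes a randomly initialized single-layer LSTM and random input vectors standing in for word embeddings; the sizes and names are chosen only for illustration. It compares the final hidden state of a length-1 sequence in three settings: run alone, run inside a padded batch, and run inside a packed batch.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)

# Made-up batch of 2 sequences (lengths 5 and 1), already embedded as 4-dim vectors.
x = torch.randn(2, 5, 4)
x[1, 1:] = 0.0            # positions 1..4 of the short sequence are padding
lengths = [5, 1]

# Reference: the short sequence on its own, with no padding at all.
_, (h_ref, _) = lstm(x[1:2, :1])

# Padded batch without packing: the LSTM also steps through the 4 pad positions.
_, (h_padded, _) = lstm(x)

# Packed batch: the LSTM stops after the last real token of each sequence.
packed = pack_padded_sequence(x, lengths, batch_first=True)
_, (h_packed, _) = lstm(packed)

same_when_packed = torch.allclose(h_packed[0, 1], h_ref[0, 0], atol=1e-5)
same_when_padded = torch.allclose(h_padded[0, 1], h_ref[0, 0], atol=1e-5)
print(same_when_packed)   # True
print(same_when_padded)   # False
```

Only the packed run reproduces the state obtained from the unpadded sequence; in the plain padded run the state keeps changing over the pad positions.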

torch.nn.utils.rnn.pack_padded_sequence()

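Here "pack" is best understood as "compress": it compresses a padded batch of variable-length sequences. (Padding introduces redundancy, so we squeeze it out.)

Note how the data is packed: column by column (one time step at a time), not row by row.

After packing, the PAD placeholders (usually 0) are gone.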
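A minimal sketch of the column-wise packing, using a made-up 3×4 batch of token ids in which 0 is the PAD id:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Made-up padded batch (batch_first layout); rows are sorted by length: 4, 2, 1.
padded = torch.tensor([[1, 2, 3, 4],
                       [5, 6, 0, 0],
                       [7, 0, 0, 0]])
lengths = [4, 2, 1]

packed = pack_padded_sequence(padded, lengths, batch_first=True)
print(packed.data.tolist())         # [1, 5, 7, 2, 6, 3, 4]
print(packed.batch_sizes.tolist())  # [3, 2, 1, 1]
```

The data is read one time step (column) at a time: 1, 5, 7 from the first column, then 2, 6, then 3, then 4. No PAD value appears in the result.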

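The input can have shape (T×B×*), where T is the length of the longest sequence, B is the batch size, and * is any number of trailing dimensions (including none). With batch_first=True, the input shape is (B×T×*) instead.

The sequences in the input should be sorted by length in descending order, longest first: input[:, 0] is the longest sequence and input[:, B-1] is the shortest. (The original documentation speaks of a Variable; since PyTorch 0.4 this is a plain Tensor. Newer PyTorch releases only require the sorting when enforce_sorted=True, which is the default.)

NOTE: Any input with at least 2 dimensions can be passed to this function. You can use it to pack the labels as well, and then compute the loss from the RNN output and the packed labels. The underlying tensor is available through the .data attribute of the PackedSequence object.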
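The idea in the note can be sketched as below. The model (an LSTM followed by a hypothetical linear layer called proj), the sizes, and the labels are all invented for the illustration. The inputs and the labels are packed with the same lengths, so their .data tensors line up element by element.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
num_classes = 3
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
proj = nn.Linear(8, num_classes)          # hypothetical per-token classifier

x = torch.randn(2, 3, 4)                  # B=2, T=3, feature size 4
labels = torch.tensor([[1, 2, 0],
                       [2, 0, 0]])        # second row: 1 real label, 2 pads
lengths = [3, 1]

packed_x = pack_padded_sequence(x, lengths, batch_first=True)
packed_y = pack_padded_sequence(labels, lengths, batch_first=True)

packed_out, _ = lstm(packed_x)            # the output is a PackedSequence too
logits = proj(packed_out.data)            # shape: (sum(lengths), num_classes)
loss = nn.functional.cross_entropy(logits, packed_y.data)

print(logits.shape)                       # torch.Size([4, 3])
print(packed_y.data.tolist())             # [1, 2, 2, 0]
```

Because padded positions never enter packed_y.data, they contribute nothing to the loss.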

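Parameters:

input (Tensor) – the padded batch of variable-length sequences
lengths (Tensor or list[int]) – the valid length of each sequence in the batch (its true length without padding)
batch_first (bool, optional) – if True, the input is expected in B×T×* format.

Returns:

A PackedSequence object.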
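Manual sorting can be skipped in PyTorch versions whose pack_padded_sequence accepts the enforce_sorted argument. The sketch below assumes such a version and uses a made-up batch that is deliberately not sorted by length.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Made-up batch that is NOT sorted by length (lengths 1, 3, 2); 0 is PAD.
padded = torch.tensor([[7, 0, 0],
                       [1, 2, 3],
                       [5, 6, 0]])
lengths = torch.tensor([1, 3, 2])

# enforce_sorted=False lets PyTorch sort internally and remember the permutation.
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
print(packed.data.tolist())            # [1, 5, 7, 2, 6, 3]
print(packed.batch_sizes.tolist())     # [3, 2, 1]
print(packed.sorted_indices.tolist())  # [1, 2, 0]
```

sorted_indices records which original row ended up in which sorted position, which is the same bookkeeping that idx_sort does by hand in the example later in this post.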

torch.nn.utils.rnn.pad_packed_sequence()

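Pads a packed sequence.

pack_padded_sequence(), described above, compresses a padded batch of variable-length sequences. This function does the reverse: it pads the packed sequences back out. Padded positions are filled with 0 by default.

The returned tensor has size T×B×*, where T is the length of the longest sequence and B is the batch size. With batch_first=True, the returned tensor has size B×T×*.

The batch elements come back in the order in which they were passed to pack_padded_sequence(), which for a sorted batch is descending order of length.

Parameters:

sequence (PackedSequence) – the batch to pad
batch_first (bool, optional) – if True, the output is in B×T×* format.

Returns: a tuple containing the padded sequences and a tensor with the length of each sequence in the batch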
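A round trip through both functions, on a made-up batch, shows that pad_packed_sequence restores the padded tensor and also returns the lengths. The total_length argument used at the end is an optional parameter of pad_packed_sequence that pads to a longer fixed length.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

padded = torch.tensor([[1., 2., 3., 4.],
                       [5., 6., 0., 0.],
                       [7., 0., 0., 0.]])
lengths = [4, 2, 1]

packed = pack_padded_sequence(padded, lengths, batch_first=True)
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)

print(torch.equal(unpacked, padded))  # True
print(out_lengths.tolist())           # [4, 2, 1]

# Pad to a fixed length that is longer than the longest sequence in the batch.
longer, _ = pad_packed_sequence(packed, batch_first=True, total_length=6)
print(tuple(longer.shape))            # (3, 6)
```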

2. A small example

Suppose the file demo.txt contains the following 5 texts/sequences:

Some people like to choose those who are different from themselves while others prefer those who are similar to themselves.
People choose friends in differrent ways.
For instance, if an active and energetic guy proposes to his equally active and energetic friends that they should have some activities, it is more likely that his will agree at once.
When people have friends similar to themselves, they and their friends chat, play, and do thing together natually and harmoniously.
The result is that they all can feel relaxed and can trully enjoy each other's company.

The following script converts the words to indices and pads every text to the same length:

import numpy as np
import torch
import torch.nn as nn

vocab = {}  # maps each word to its index
token_id = 1  # token_id=0 is reserved for the padding symbol
lengths = []  # actual length of each text

with open('demo.txt', 'r') as f:
    for l in f:
        tokens = l.strip().split()  # naive whitespace tokenization; a proper tokenizer would also split off punctuation
        print(tokens)
        lengths.append(len(tokens))
        for t in tokens:
            if t not in vocab:
                vocab[t] = token_id
                token_id += 1

x = np.zeros((len(lengths), max(lengths)))  # pad every text to the maximum length
l_no = 0
with open('demo.txt', 'r') as f:
    for l in f:
        tokens = l.strip().split()
        for i in range(len(tokens)):
            x[l_no, i] = vocab[tokens[i]]
        l_no += 1

print(x)
print(x.shape)

x = torch.tensor(x,requires_grad=True)
lengths = torch.Tensor(lengths)

print("lenghts:",lengths)
# sort the text lengths in descending order; idx_sort holds the sorting indices
_, idx_sort = torch.sort(lengths, dim=0, descending=True)
print("idx_sort:",idx_sort)
# sort idx_sort itself in ascending order; the resulting indices, idx_unsort, undo the first sort
_, idx_unsort = torch.sort(idx_sort, dim=0)
print("idx_unsort:",idx_unsort)

x1 = x[idx_sort]  # reorder the rows of x so that the longest text comes first
lengths1 = list(lengths[idx_sort])  # the matching lengths, in descending order
print("lenghts1:",lengths1)
print("x1的形狀與內容:")  # prints "shape and content of x1:"
print(x1)
print(x1.shape)
x2=x1[idx_unsort]  # undo the sort: the rows of x2 are back in the original order
print("x2的形狀與內容:")  # prints "shape and content of x2:"
print(x2)
print(x2.shape)

Console output:

D:\softwaretools\anaconda\python.exe D:/pycharmprojects/hoteltest01/hoteltest01/testpy/test07_pack_pad.py
['Some', 'people', 'like', 'to', 'choose', 'those', 'who', 'are', 'different', 'from', 'themselves', 'while', 'others', 'prefer', 'those', 'who', 'are', 'similar', 'to', 'themselves.']
['People', 'choose', 'friends', 'in', 'differrent', 'ways.']
['For', 'instance,', 'if', 'an', 'active', 'and', 'energetic', 'guy', 'proposes', 'to', 'his', 'equally', 'active', 'and', 'energetic', 'friends', 'that', 'they', 'should', 'have', 'some', 'activities,', 'it', 'is', 'more', 'likely', 'that', 'his', 'will', 'agree', 'at', 'once.']
['When', 'people', 'have', 'friends', 'similar', 'to', 'themselves,', 'they', 'and', 'their', 'friends', 'chat,', 'play,', 'and', 'do', 'thing', 'together', 'natually', 'and', 'harmoniously.']
['The', 'result', 'is', 'that', 'they', 'all', 'can', 'feel', 'relaxed', 'and', 'can', 'trully', 'enjoy', 'each', "other's", 'company.']
[[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.  6.  7.  8. 15.
   4. 16.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [17.  5. 18. 19. 20. 21.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [22. 23. 24. 25. 26. 27. 28. 29. 30.  4. 31. 32. 26. 27. 28. 18. 33. 34.
  35. 36. 37. 38. 39. 40. 41. 42. 33. 31. 43. 44. 45. 46.]
 [47.  2. 36. 18. 15.  4. 48. 34. 27. 49. 18. 50. 51. 27. 52. 53. 54. 55.
  27. 56.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [57. 58. 40. 33. 34. 59. 60. 61. 62. 27. 60. 63. 64. 65. 66. 67.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
(5, 32)
lenghts: tensor([20.,  6., 32., 20., 16.])
idx_sort: tensor([2, 0, 3, 4, 1])
idx_unsort: tensor([1, 4, 0, 2, 3])
lenghts1: [tensor(32.), tensor(20.), tensor(20.), tensor(16.), tensor(6.)]
x1的形狀與內容:
tensor([[22., 23., 24., 25., 26., 27., 28., 29., 30.,  4., 31., 32., 26., 27.,
         28., 18., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 33., 31.,
         43., 44., 45., 46.],
        [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
          6.,  7.,  8., 15.,  4., 16.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.],
        [47.,  2., 36., 18., 15.,  4., 48., 34., 27., 49., 18., 50., 51., 27.,
         52., 53., 54., 55., 27., 56.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.],
        [57., 58., 40., 33., 34., 59., 60., 61., 62., 27., 60., 63., 64., 65.,
         66., 67.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.],
        [17.,  5., 18., 19., 20., 21.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.]], dtype=torch.float64, grad_fn=<IndexBackward>)
torch.Size([5, 32])
x2的形狀與內容:
tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
          6.,  7.,  8., 15.,  4., 16.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.],
        [17.,  5., 18., 19., 20., 21.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.],
        [22., 23., 24., 25., 26., 27., 28., 29., 30.,  4., 31., 32., 26., 27.,
         28., 18., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 33., 31.,
         43., 44., 45., 46.],
        [47.,  2., 36., 18., 15.,  4., 48., 34., 27., 49., 18., 50., 51., 27.,
         52., 53., 54., 55., 27., 56.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.],
        [57., 58., 40., 33., 34., 59., 60., 61., 62., 27., 60., 63., 64., 65.,
         66., 67.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.]], dtype=torch.float64, grad_fn=<IndexBackward>)
torch.Size([5, 32])

Process finished with exit code 0

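x2 has the same shape and the same row order as the original x, because idx_unsort is the inverse permutation of idx_sort, as these two lines show: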

idx_sort: tensor([2, 0, 3, 4, 1])
idx_unsort: tensor([1, 4, 0, 2, 3])
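Why sorting idx_sort yields the inverse permutation can be checked on a tiny stand-in for x. In this sketch idx_sort is copied from the console output above, and the five "rows" are just the numbers 0, 10, 20, 30, 40.

```python
import torch

idx_sort = torch.tensor([2, 0, 3, 4, 1])   # copied from the console output
_, idx_unsort = torch.sort(idx_sort)       # positions of 0, 1, 2, 3, 4 in idx_sort
print(idx_unsort.tolist())                 # [1, 4, 0, 2, 3]

x = torch.arange(5) * 10                   # stand-in for the 5 rows of x
x1 = x[idx_sort]                           # rows in sorted order
x2 = x1[idx_unsort]                        # rows back in the original order
print(x1.tolist())                         # [20, 0, 30, 40, 10]
print(x2.tolist())                         # [0, 10, 20, 30, 40]
```

The same idx_unsort can be applied to the output of an RNN that was run on the sorted batch, to put the results back in the original order.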

x_packed = nn.utils.rnn.pack_padded_sequence(input=x1, lengths=lengths1, batch_first=True)
print(x_packed)

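Note the arguments to pack_padded_sequence: lengths must be sorted in descending order (lengths1), and x1 has already been reordered to match (longest sequence in the first row, and so on). If batch_first is set to True, the first dimension of the input is batch_size and the second is seq_length; otherwise the two are swapped.
Printing x_packed gives: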

PackedSequence(data=tensor([22.,  1., 47., 57., 17., 23.,  2.,  2., 58.,  5., 24.,  3., 36., 40.,
        18., 25.,  4., 18., 33., 19., 26.,  5., 15., 34., 20., 27.,  6.,  4.,
        59., 21., 28.,  7., 48., 60., 29.,  8., 34., 61., 30.,  9., 27., 62.,
         4., 10., 49., 27., 31., 11., 18., 60., 32., 12., 50., 63., 26., 13.,
        51., 64., 27., 14., 27., 65., 28.,  6., 52., 66., 18.,  7., 53., 67.,
        33.,  8., 54., 34., 15., 55., 35.,  4., 27., 36., 16., 56., 37., 38.,
        39., 40., 41., 42., 33., 31., 43., 44., 45., 46.], dtype=torch.float64,
       grad_fn=<PackPaddedSequenceBackward>), batch_sizes=tensor([5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1]), sorted_indices=None, unsorted_indices=None)

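The two dimensions of x1 have been merged into one. x1 originally had shape (batch_size, max_seq_len) = (5, 32); the data field of x_packed is what you get by reading x1 column by column and skipping the padding value 0. The additional batch_sizes field holds max_seq_len = 32 numbers: the number of non-padded values in each column of x1. The first few columns (the start of every sequence) contain no padding, so the value there is batch_size = 5. As the shorter sequences run out, padding appears and the value drops below 5, until it reaches 1, where only the longest sequence in the batch still has real values and every other sequence is padding.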
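The batch_sizes values can be recomputed from the lengths alone. The sketch below uses the sorted lengths from the example and checks the result against PyTorch on a dummy all-zero batch of the same shape (the dummy tensor is an assumption made so that demo.txt is not needed).

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

lengths = [32, 20, 20, 16, 6]      # sorted lengths from the example
max_len = max(lengths)

# batch_sizes[t] = number of sequences that still have a real token at step t
batch_sizes = [sum(1 for n in lengths if n > t) for t in range(max_len)]
print(batch_sizes)

# Cross-check against PyTorch on a dummy padded batch of the same shape.
dummy = torch.zeros(len(lengths), max_len)
packed = pack_padded_sequence(dummy, lengths, batch_first=True)
print(packed.batch_sizes.tolist() == batch_sizes)   # True
print(packed.data.shape[0] == sum(lengths))         # True
```

The packed data holds sum(lengths) = 94 entries instead of the 5 × 32 = 160 entries of the padded tensor.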

References:

https://blog.csdn.net/sdu_hao/article/details/105408552

https://www.cnblogs.com/sbj123456789/p/9834018.html
