1. PyTorch LSTM model parameters
class torch.nn.LSTM(*args, **kwargs)
Parameter description from the official PyTorch documentation:
Args:
    input_size: The number of expected features in the input `x`
    hidden_size: The number of features in the hidden state `h`
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2`` would mean stacking two LSTMs together to form a `stacked LSTM`, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
    bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`. Default: ``True``
    batch_first: If ``True``, then the input and output tensors are provided as (batch, seq, feature). Default: ``False``
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to :attr:`dropout`. Default: 0
    bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``

Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See :func:`torch.nn.utils.rnn.pack_padded_sequence` or :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial cell state for each element in the batch. If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.

Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features `(h_t)` from the last layer of the LSTM, for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using ``output.view(seq_len, batch, num_directions, hidden_size)``, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`. Like *output*, the layers can be separated using ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
    - **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the cell state for `t = seq_len`.
Parameter list:
- input_size: the feature dimension of x; in NLP this is the dimensionality of the word vectors (e.g. 100, 200, or 300)
- hidden_size: the feature dimension of the hidden state
- num_layers: number of stacked LSTM layers; default 1
- bias: if False, the bias weights b_ih and b_hh are not used (effectively 0); default True
- batch_first: if True, input and output tensors use the (batch, seq, feature) layout; default False
- dropout: applies dropout to the outputs of every layer except the last; default 0
- bidirectional: if True, the LSTM is bidirectional; default False
- Inputs: input, (h_0, c_0)
- Outputs: output, (h_n, c_n)
Input shapes:
input (seq_len, batch, input_size)
h0 (num_layers * num_directions, batch, hidden_size)
c0 (num_layers * num_directions, batch, hidden_size)
Output shapes:
output (seq_len, batch, hidden_size * num_directions)
hn (num_layers * num_directions, batch, hidden_size)
cn (num_layers * num_directions, batch, hidden_size)
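The shapes above can be checked directly. A minimal sketch with the default layout (batch_first=False); the sizes seq_len=40, batch=10, input_size=100, hidden_size=20 are illustrative:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 40, 10, 100, 20
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=1)

x = torch.randn(seq_len, batch, input_size)   # input:  (seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)       # h_0: (num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(1, batch, hidden_size)       # c_0: same shape as h_0

output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape)  # torch.Size([40, 10, 20])  -> (seq_len, batch, num_directions * hidden_size)
print(hn.shape)      # torch.Size([1, 10, 20])
print(cn.shape)      # torch.Size([1, 10, 20])
```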
The input to a PyTorch LSTM must always be a 3-dimensional tensor, and the meaning of each dimension must not be confused. (The description below follows the (batch, seq, feature) layout, i.e. batch_first=True; note that the default layout is (seq_len, batch, input_size).)
The first dimension is the batch_size: how many sentences are fed to the network at once, or, for stock data, how many windows of data are fed to the model at once. In the running example, 10 means 10 sentences are fed to the model at a time.
The second dimension is the sequence length: for text, the length of each sentence, which is normally fixed to a set value before being fed to the network; for any other sequential data, it is the length of one clearly delimited segment. For stock data, it is the number of data points within one time window, and it also determines how many time steps (unrolled cells) of the layer process the input. In the running example, 40 means each of the 10 sentences has a uniform length of 40 words.
The third dimension is the elements of the input: how many dimensions each word vector has, or, for stock data, how many values are sampled at each time step (e.g. lowest price, highest price, mean price, 5-day average, 10-day average, and so on). In the running example, 100 means each word vector is 100-dimensional.
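The 10-sentences / 40-words / 100-dimensions example can be written out directly. A minimal sketch using batch_first=True so the tensor is laid out as (batch, seq, feature); hidden_size=50 is an arbitrary illustrative choice:

```python
import torch
import torch.nn as nn

# 10 sentences, each 40 words long, each word a 100-dim vector
batch, seq_len, input_size = 10, 40, 100
lstm = nn.LSTM(input_size=input_size, hidden_size=50, batch_first=True)

x = torch.randn(batch, seq_len, input_size)  # (batch, seq, feature)
output, (hn, cn) = lstm(x)                   # (h_0, c_0) omitted -> default to zeros

print(output.shape)  # torch.Size([10, 40, 50])  -> batch-first, like the input
print(hn.shape)      # torch.Size([1, 10, 50])   -> h_n keeps batch in dim 1 even with batch_first=True
```

Note that batch_first only affects input and output; h_n and c_n always have shape (num_layers * num_directions, batch, hidden_size).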
What do h_0 to h_n mean? h_t is the hidden state the cell produces at time t, computed from the current input and the hidden state of the previous time step.
This piece of state summarizes everything the network has seen up to that moment, and its shape matches the output at a single time step.
c_0 to c_n are the cell states, which carry the long-term memory of the sequence; the LSTM's gates decide how much of the cell state is updated by each input and how much of it influences the hidden state at the next time step. Their shape is the same as that of h_0 to h_n.
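The claim that the hidden state has the same shape as a single time step of the output can be verified: for a single-layer, unidirectional LSTM, h_n is exactly the last time step of output (the sizes below are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)  # single layer, unidirectional
x = torch.randn(5, 3, 8)                      # (seq_len=5, batch=3, input_size=8)
output, (hn, cn) = lstm(x)

# h_n is the hidden state at t = seq_len, i.e. the last row of output
print(torch.allclose(output[-1], hn[0]))  # True
```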
Of course, for bidirectional and multi-layer LSTMs, the number of directions and the number of stacked layers must also be taken into account.
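With num_layers=2 and bidirectional=True, num_directions becomes 2 and the shapes change accordingly; the directions can then be separated with `view`, as in the official docs. The sizes below are illustrative:

```python
import torch
import torch.nn as nn

num_layers, num_directions, hidden_size = 2, 2, 20
lstm = nn.LSTM(input_size=100, hidden_size=hidden_size,
               num_layers=num_layers, bidirectional=True)

x = torch.randn(40, 10, 100)        # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)

print(output.shape)  # torch.Size([40, 10, 40])  -> num_directions * hidden_size
print(hn.shape)      # torch.Size([4, 10, 20])   -> num_layers * num_directions

# Separate directions: forward is index 0, backward is index 1
out_dirs = output.view(40, 10, num_directions, hidden_size)
hn_layers = hn.view(num_layers, num_directions, 10, hidden_size)
```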
References:
https://zhuanlan.zhihu.com/p/41261640
https://www.zhihu.com/question/41949741/answer/318771336