This problem tends to arise when the model has multiple outputs (here, the recurrent hidden state is returned and fed back in on the next batch), as in the following program:
# zero the parameter gradients
model.zero_grad()
# forward + backward + optimize
outputs, hidden = model(inputs, hidden)
loss = _loss(outputs, session, items)
acc_loss += loss.item()  # loss.data[0] in pre-0.4 PyTorch
loss.backward()
# Add parameters' gradients to their values, multiplied by learning rate
for p in model.parameters():
    p.data.add_(p.grad.data, alpha=-learning_rate)
First solution:
Detach/repackage the hidden state in between batches. There are (at least) three ways to do this (a minimal sketch follows the list):
hidden.detach_()
hidden = hidden.detach()
hidden = Variable(hidden.data, requires_grad=True)  # legacy pre-0.4 Variable API
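For illustration, here is a minimal sketch of the first solution applied to the training loop above. It assumes the same model, _loss, session, items, acc_loss and learning_rate as in the original snippet, that inputs and hidden are already initialized before the loop, and a hypothetical num_steps loop count; only the handling of hidden changes:
for step in range(num_steps):
    # zero the parameter gradients
    model.zero_grad()
    # forward pass; hidden carries state over from the previous batch
    outputs, hidden = model(inputs, hidden)
    # detach the hidden state so that the next backward() stops at this
    # batch instead of reaching into a graph whose buffers are already freed
    hidden = hidden.detach()
    loss = _loss(outputs, session, items)
    acc_loss += loss.item()
    loss.backward()
    # manual SGD step, as in the original snippet
    for p in model.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)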
Second solution:
Replace loss.backward() with loss.backward(retain_graph=True), but be aware that each successive batch will take more time than the previous one, because backward() will have to back-propagate all the way through to the start of the first batch.
In general, the second solution is slow, and on machines with limited memory it will run out of memory; a short sketch is shown below for completeness.
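Under the same assumptions as the sketch above, the only change from the original loop is the retain_graph=True argument:
for step in range(num_steps):
    model.zero_grad()
    outputs, hidden = model(inputs, hidden)
    loss = _loss(outputs, session, items)
    acc_loss += loss.item()
    # retain_graph=True keeps this graph's buffers alive so a later
    # backward() can traverse them again; time and memory grow per batch
    loss.backward(retain_graph=True)
    for p in model.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)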
