Recording a bug hit during PyTorch multi-GPU training.
The error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 30; expected version 29 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This only happens with multi-GPU training; everything runs fine on a single GPU.
Following a tip found online, I wrapped the failing code in a `with torch.autograd.set_detect_anomaly(True):` block. With anomaly detection enabled, PyTorch prints the stack trace of the forward operation whose gradient computation failed; in my case it pointed at BatchNorm.
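A minimal sketch of what that looks like (the toy model, inputs, and loss here are stand-ins for the real training step):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model and batch.
model = nn.Sequential(nn.Linear(8, 512), nn.BatchNorm1d(512), nn.Linear(512, 2))
inputs = torch.randn(4, 8)
targets = torch.randint(0, 2, (4,))
criterion = nn.CrossEntropyLoss()

# Anomaly mode records a traceback for every forward op, so when backward()
# fails it also prints where the offending tensor was created.
with torch.autograd.set_detect_anomaly(True):
    loss = criterion(model(inputs), targets)
    loss.backward()  # a failure here now reports the forward-op stack trace
```

Anomaly detection adds significant overhead, so only enable it while debugging.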
Searching turned up a solution: https://discuss.pytorch.org/t/ddp-sync-batch-norm-gradient-computation-modified/82847/5
The fix is to pass broadcast_buffers=False when constructing the DistributedDataParallel wrapper.
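The likely reason this helps: with the default broadcast_buffers=True, DDP re-broadcasts module buffers (e.g. BatchNorm's running_mean/running_var) from rank 0 at the start of every forward pass, and that in-place write bumps the version counters that autograd checks during backward. A sketch of the change, assuming the script is launched with torchrun (which sets the LOCAL_RANK environment variable):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumption: launched via torchrun, which sets LOCAL_RANK per process.
local_rank = int(os.environ["LOCAL_RANK"])
dist.init_process_group(backend="nccl")
torch.cuda.set_device(local_rank)

model = nn.Sequential(nn.Linear(8, 512), nn.BatchNorm1d(512)).cuda(local_rank)
ddp_model = DDP(
    model,
    device_ids=[local_rank],
    # Default is True: DDP re-broadcasts buffers (BatchNorm running_mean /
    # running_var) from rank 0 on every forward, an in-place write that can
    # trip autograd's version check. False keeps each rank's buffers local.
    broadcast_buffers=False,
)
```

Trade-off: with broadcast_buffers=False, each rank keeps its own BatchNorm running statistics. If they need to stay synchronized across ranks, converting the model with torch.nn.SyncBatchNorm.convert_sync_batchnorm(model) before wrapping it in DDP is the usual route; the linked thread discusses that combination too.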