TensorFlow single-machine multi-GPU training is slower than a single GPU / shows no significant speedup


When running the CIFAR-10 multi-GPU training from the tensorflow/models repository, the measured training time at the end of the test was not reduced; it was actually slower than on a single GPU.

See the following two links for reference:

https://github.com/keras-team/keras/issues/9204

https://medium.com/@c_61011/why-multi-gpu-training-is-not-faster-f439fe6dd6ec

The likely reason is that synchronizing parameter gradients on the CPU accounts for a large fraction of each training step:

'''

It seems that CPU-side data-preprocessing can be one of the reason that greatly slow down the multi-GPU training, do you try disabling some pre-processing options such as data-augmentation and then see any boost?

Besides, the current version of multi_gpu_model seems to benefit large NN-models only, such as Xception, since weights synchronization is not the bottleneck. When it is wrapped to simple model such as mnist_cnn and cifar_cnn, weights synchronization is pretty frequent and makes the whole time much slower.

'''
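To make that synchronization cost concrete, below is a minimal sketch of the tower/parameter-server pattern that TF 1.x multi-GPU scripts such as cifar10_multi_gpu_train.py follow: each GPU computes gradients for its own "tower", and the gradients are averaged and applied on /cpu:0. The names build_tower_loss and NUM_GPUS are hypothetical placeholders, not code from the original script; in the real script a helper additionally pins every variable to the CPU.

'''
# Minimal sketch (TF 1.x graph mode), assuming 2 GPUs are visible.
# build_tower_loss and NUM_GPUS are hypothetical placeholders.
import tensorflow as tf

NUM_GPUS = 2

def average_gradients(tower_grads):
    # Average the gradient of each variable across all GPU towers.
    # This averaging and the variable update run on /cpu:0, so for a small
    # model the copy-and-average round trip can dominate the step time.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = tf.stack([g for g, _ in grads_and_vars], axis=0)
        averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
    return averaged

with tf.device('/cpu:0'):
    opt = tf.train.GradientDescentOptimizer(0.1)
    tower_grads = []
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i):      # forward/backward runs on each GPU
            loss = build_tower_loss(i)      # hypothetical per-tower loss fn;
                                            # (the real script pins its
                                            # variables to /cpu:0 via a helper)
            tower_grads.append(opt.compute_gradients(loss))
    # Gradients are copied back to the CPU, averaged, and applied there;
    # this per-step synchronization is the cost discussed above.
    train_op = opt.apply_gradients(average_gradients(tower_grads))
'''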

 

The suggested fix is to increase the model's complexity (especially the number of convolutional layers) or enlarge the input data; once each step carries enough computation, the advantage of multi-GPU training becomes visible, as in the sketch below.
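One rough way to check this is to wrap a heavy model with keras.utils.multi_gpu_model (Keras 2.x) and time a few steps; the sketch below assumes 2 GPUs and uses random data purely for timing, following the pattern from the Keras documentation.

'''
# Minimal sketch (Keras 2.x with a TF 1.x backend), assuming 2 GPUs.
# A heavy model such as Xception amortises the weight-synchronization cost,
# which is the case where multi_gpu_model actually shows a speedup.
import numpy as np
from keras.applications import Xception
from keras.utils import multi_gpu_model

model = Xception(weights=None, input_shape=(299, 299, 3), classes=10)
parallel_model = multi_gpu_model(model, gpus=2)   # replicate the model on 2 GPUs
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# Random data just to time a few steps; each batch is split across the GPUs.
x = np.random.random((256, 299, 299, 3)).astype('float32')
y = np.random.random((256, 10)).astype('float32')
parallel_model.fit(x, y, epochs=1, batch_size=128)
'''

Timing the same fit call with a small cifar_cnn-style network, by contrast, tends to come out slower than the single-GPU baseline, because the per-step computation is too small to hide the weight synchronization.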

 

