⚠ Error raised when training a model with TensorFlow-GPU:
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
Solution: see "TensorFlow: Dst tensor is not initialized" on Stack Overflow.
The main cause is that batch_size is too large for the available GPU memory. Reducing batch_size to a value that fits lets training run normally.
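One way to find a batch size that fits is to retry with a smaller value each time a memory error occurs. The following is a minimal sketch, not part of the original note: `train_with_backoff` and `train_fn` are hypothetical names, and `train_fn` is assumed to be a callable that builds and trains the model for a given batch size. With TensorFlow, one would likely pass `oom_errors=(tf.errors.InternalError, tf.errors.ResourceExhaustedError)`.

```python
def train_with_backoff(train_fn, batch_size, min_batch_size=1,
                       oom_errors=(MemoryError,)):
    """Call train_fn(batch_size), halving batch_size after each memory error.

    train_fn is a hypothetical user-supplied callable; oom_errors is the
    tuple of exception types that should trigger a retry.
    Returns (batch_size_that_worked, result_of_train_fn).
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except oom_errors:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; surface the original error
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory, retrying with batch_size={batch_size}")
```

Halving is only a convenient search strategy. In a long-running process the GPU may still hold memory from the failed attempt, so it can be safer to restart the process with the smaller value.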
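[Note] By default, TF reserves as much GPU memory as it can. Adjusting the GPU configuration makes it allocate memory on demand instead; see the "TensorFlow documentation" and the "TensorFlow code".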
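As a rough illustration of on-demand allocation, the sketch below uses the TF 2.x call `tf.config.experimental.set_memory_growth`. The wrapper name `enable_gpu_memory_growth` is hypothetical, and the sketch assumes it runs before any GPU has been initialized (that is, before any tensor or model is created).

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Returns the number of GPUs that were configured. Must be called before
    the GPUs are initialized, otherwise TensorFlow raises a RuntimeError.
    """
    import tensorflow as tf  # imported here so the helper stays self-contained

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

With the TF 1.x style session API, the corresponding setting is `config.gpu_options.allow_growth = True` on a `tf.compat.v1.ConfigProto`.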
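In addition, during long training sessions in Jupyter Notebook, this error can appear because GPU memory is not released in time. "This answer" describes a fix: define the following function: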
import gc

import tensorflow as tf
# With TF 2.x, these helpers may need to be imported from
# tensorflow.compat.v1.keras.backend instead.
from keras.backend import clear_session, get_session, set_session


# Reset the Keras session to release GPU memory.
def reset_keras():
    global classifier  # the model lives in global scope; change the name as needed

    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier
    except NameError:
        pass  # no model has been defined yet

    print(gc.collect())  # if it does something you should see a number as output

    # Use the same config as you used to create the session.
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tf.compat.v1.Session(config=config))
Whenever GPU memory needs to be cleared, call reset_keras. For example:
dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            reset_keras()
            # train your model here