在Windows10的GPU上跑一段普通的TensorFlow代碼報錯如下
2019-04-02 09:50:47.986024: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2019-04-02 09:50:47.991931: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2019-04-02 09:50:48.667536: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-02 09:50:48.672794: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2019-04-02 09:50:48.675436: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2019-04-02 09:50:48.678921: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8795 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-02 09:50:51.208473: E C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2019-04-02 09:50:51.213582: F C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:208] Unexpected Event status: 1
這是因為Windows版本下,GPU版本的TensorFlow里,tf.one_hot()函數有bug,最簡單的解決辦法是把這個函數放到CPU上(會影響速度,但不會報錯)
with tf.device('/cpu:0'):
b = tf.one_hot(a, 123)
還有一些其他報錯也是這個問題,比如這里stackoverflow
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1177] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_util.cc:370] GPU sync failed