Common Problems When Using Machine Learning and Deep Learning Frameworks

Reposted article, 2019-07-06 10:26. Tags: tensorflow, deep learning, errors, keras, machine learning

1. GPU error at runtime when using Keras for MNIST classification

The error output is as follows:

2019-07-06 10:26:32.949617: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-07-06 10:26:33.125786: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392] Found device 0 with properties: 
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.2785
pciBusID: 0000:02:00.0
totalMemory: 4.00GiB freeMemory: 3.33GiB
2019-07-06 10:26:33.125952: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1471] Adding visible gpu devices: 0
2019-07-06 10:26:33.395215: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-06 10:26:33.395314: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:958]      0 
2019-07-06 10:26:33.395375: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   N 
2019-07-06 10:26:33.395553: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3062 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960, pci bus id: 0000:02:00.0, compute capability: 5.2)
2019-07-06 10:26:33.982153: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-06 10:26:33.982424: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-06 10:26:33.983696: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-06 10:26:33.983828: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-06 10:26:33.984097: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-06 10:26:33.985419: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-06 10:26:33.985513: W T:\src\github\tensorflow\tensorflow\stream_executor\stream.cc:2009] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 32), m=32, n=32, k=784
     [[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/dense_1/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dense_1_input_0_2/_29, dense_1/kernel/read)]]
     [[Node: loss/mul/_61 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_361_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

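Solution:

CUBLAS_STATUS_ALLOC_FAILED means cuBLAS could not allocate GPU memory when creating its handle. At the top of the script, limit the fraction of GPU memory that TensorFlow allocates: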

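# Work around the GPU runtime error.
# Import tf here so the TensorFlow backend configuration can be changed.
# Note: tf.ConfigProto and tf.Session are TensorFlow 1.x APIs.
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
# Let this process use at most 30% of the GPU memory
config.gpu_options.per_process_gpu_memory_fraction = 0.3
# Make Keras use a session created with this configuration
set_session(tf.Session(config=config))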

2. Compatibility problem between Matplotlib and PyQt5

When using matplotlib in PyCharm with the "Show plots in tool window" option disabled, the following error is reported:

This application failed to start because it could not find or load the Qt platform plugin "windows"

Solution:

Add QT_PLUGIN_PATH to the system environment variables.
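The same variable can also be set from Python before matplotlib is imported, which avoids editing the system settings. This is a minimal sketch, not a definitive fix: `set_qt_plugin_path` is a hypothetical helper, and the example path in the comment is an assumption, since the real plugin directory depends on how PyQt5 was installed.

```python
import os


def set_qt_plugin_path(plugin_dir):
    """Point Qt at plugin_dir via QT_PLUGIN_PATH (hypothetical helper).

    plugin_dir should be the Qt "plugins" directory, i.e. the folder
    that contains the "platforms" subfolder.
    """
    if not os.path.isdir(plugin_dir):
        raise FileNotFoundError("Qt plugin directory not found: " + plugin_dir)
    os.environ["QT_PLUGIN_PATH"] = plugin_dir
    return plugin_dir


# Example usage (the path is an assumption; adjust it to your installation).
# Call this before importing matplotlib.pyplot so Qt sees the variable:
# set_qt_plugin_path(r"C:\path\to\site-packages\PyQt5\Qt\plugins")
```

Setting the variable inside the script only affects that process, so it is easy to try out before changing the system-wide setting.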

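3. NaN values appear during training, for example in the loss or accuracy

1. NaN usually comes from an invalid computation such as log(0), so check whether the computation uses functions such as tf.math.log().

If it does, clip the input first: tf.math.log(tf.clip_by_value(y, 1e-8, 1.0)) (in older TensorFlow 1.x releases the same function is also available as tf.log).

2. Try adjusting the learning rate. A rate that is too high can make the loss diverge, so lowering it is usually the first thing to try.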
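Why clipping helps can be seen without TensorFlow. The sketch below uses plain Python numbers in place of tensors; `safe_log` and `cross_entropy` are hypothetical names, and the bounds mirror the tf.clip_by_value call above. One difference to keep in mind: math.log(0) raises ValueError, whereas the TensorFlow op returns -inf, which becomes NaN once it is multiplied by 0.

```python
import math


def safe_log(p, eps=1e-8):
    """Clamp p into [eps, 1.0] before taking the log."""
    return math.log(min(max(p, eps), 1.0))


def cross_entropy(y_true, y_pred):
    """Categorical cross-entropy for one sample, using the clamped log."""
    return -sum(t * safe_log(p) for t, p in zip(y_true, y_pred))


# A prediction that assigns probability 0 to the true class would hit
# log(0) without the clamp; with it the loss is large but finite.
loss = cross_entropy([0.0, 1.0], [1.0, 0.0])
print(math.isfinite(loss))
```

The clamp caps the per-sample loss at -log(1e-8), so a single confident wrong prediction can no longer poison the whole batch.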

4. CUDA initialization error (cuInit) when training with TensorFlow

Error message:

tensorflow/stream_executor/cuda/cuda_driver.cc:406 failed call to cuInit: CUDA_ERROR_UNKNOWN

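Solution:

Look up the driver version your graphics card requires on the NVIDIA website: https://www.geforce.cn/drivers

Download and install the update; this resolved the problem.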
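To compare the installed driver version before and after the update, the nvidia-smi tool that ships with the NVIDIA driver can be queried. This is a minimal sketch: `installed_driver_version` is a hypothetical helper, and it assumes nvidia-smi is on the PATH (it returns None otherwise).

```python
import shutil
import subprocess


def installed_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # tool not found on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    # One line per GPU; they share a single driver, so take the first.
    return lines[0].strip() if lines else None


print(installed_driver_version())
```

If the helper returns None on a machine that has an NVIDIA card, the driver itself may be missing or broken, which is consistent with the cuInit failure above.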
