1. While debugging with
gdb python
r XX.py
where
the following problems came up:
1) Every run starts 38 or so threads. Is that too many threads?
[New Thread 0x7ffff2fd2700 (LWP 7415)]
[New Thread 0x7ffff27d1700 (LWP 7416)]
[New Thread 0x7fffeffd0700 (LWP 7417)]
[New Thread 0x7fffeb7cf700 (LWP 7418)]
[New Thread 0x7fffe8fce700 (LWP 7419)]
[New Thread 0x7fffe67cd700 (LWP 7420)]
[New Thread 0x7fffe3fcc700 (LWP 7421)]
[New Thread 0x7fffe17cb700 (LWP 7422)]
[New Thread 0x7fffdefca700 (LWP 7423)]
[New Thread 0x7fffdc7c9700 (LWP 7424)]
[New Thread 0x7fffd9fc8700 (LWP 7425)]
[New Thread 0x7fffd77c7700 (LWP 7426)]
[New Thread 0x7fffd4fc6700 (LWP 7427)]
[New Thread 0x7fffd27c5700 (LWP 7428)]
[New Thread 0x7fffcffc4700 (LWP 7429)]
[New Thread 0x7fffcd7c3700 (LWP 7430)]
[New Thread 0x7fffcafc2700 (LWP 7431)]
[New Thread 0x7fffc87c1700 (LWP 7432)]
[New Thread 0x7fffc5fc0700 (LWP 7433)]
[New Thread 0x7fffc37bf700 (LWP 7434)]
[New Thread 0x7fffc0fbe700 (LWP 7435)]
[New Thread 0x7fffbe7bd700 (LWP 7436)]
[New Thread 0x7fffbbfbc700 (LWP 7437)]
[New Thread 0x7fffb97bb700 (LWP 7438)]
[New Thread 0x7fffb6fba700 (LWP 7439)]
[New Thread 0x7fffb47b9700 (LWP 7440)]
[New Thread 0x7fffb1fb8700 (LWP 7441)]
[New Thread 0x7fffaf7b7700 (LWP 7442)]
[New Thread 0x7fffacfb6700 (LWP 7443)]
[New Thread 0x7fffaa7b5700 (LWP 7444)]
[New Thread 0x7fffa7fb4700 (LWP 7445)]
[New Thread 0x7fffa57b3700 (LWP 7446)]
[New Thread 0x7fffa2fb2700 (LWP 7447)]
[New Thread 0x7fffa07b1700 (LWP 7448)]
[New Thread 0x7fff9dfb0700 (LWP 7449)]
[New Thread 0x7fff9b7af700 (LWP 7450)]
[New Thread 0x7fff98fae700 (LWP 7451)]
[New Thread 0x7fff967ad700 (LWP 7452)]
[New Thread 0x7fff93fac700 (LWP 7453)]
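The [New Thread ...] lines are informational messages from gdb, not errors in themselves. To count them exactly instead of by eye, the captured gdb output can be run through a small parser. The sketch below is only an illustration: the function name new_thread_ids and the three-line sample are made up here, and the regular expression assumes the message format shown above.

```python
import re

# Matches gdb messages such as "[New Thread 0x7ffff2fd2700 (LWP 7415)]".
NEW_THREAD = re.compile(r"\[New Thread 0x[0-9a-f]+ \(LWP (\d+)\)\]")


def new_thread_ids(gdb_output):
    """Return the LWP id of every thread gdb announced, in order of appearance."""
    return [int(lwp) for lwp in NEW_THREAD.findall(gdb_output)]


# Hypothetical excerpt; paste the full gdb output here to count a real run.
sample = """\
[New Thread 0x7ffff2fd2700 (LWP 7415)]
[New Thread 0x7ffff27d1700 (LWP 7416)]
[New Thread 0x7fffeffd0700 (LWP 7417)]
"""

print(len(new_thread_ids(sample)))
```

Numerical libraries often start one worker thread per CPU core, so comparing the count with the number of cores on the machine may be a useful first check. That is a guess about this setup, not something the log confirms.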
2) It now reports:
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
...
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: cuDeviceGet: CUDA_ERROR_INVALID_DEVICE: invalid device ordinal
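Setting that aside for the moment, I ran a quick test first:
It turns out that a plain import keras reports exactly the same error!
conda install mkl
conda install mkl-service
# Both commands above print:
# All requested packages already installed.
conda install blas
keras still cannot be imported.
3) I deleted the original conda environment, created a fresh one, and installed mkl with conda. import keras still fails: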
Using Theano backend.
~/lib/python2.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "~/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "~/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "~/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: cuDeviceGet: CUDA_ERROR_INVALID_DEVICE: invalid device ordinal
My .theanorc config file contained:
[global]
floatX = float32
device = cuda1
What if I drop the number after cuda and write just device = cuda? Surprisingly, that worked!
Using Theano backend.
~/.conda/envs/xhs/lib/python2.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7201 on context None
Mapped name None to device cuda: GeForce GTX 1080 Ti (0000:03:00.0)
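"invalid device ordinal" means CUDA was asked for a GPU index that does not exist among the devices it can see. GPU numbering starts at 0, so cuda1 refers to the second GPU. The sketch below shows that check in isolation. It is written for Python 3 (configparser), the helper names are invented for this note, and visible_gpu_count=1 is a hypothetical value, since the post does not say how many GPUs the machine exposes.

```python
import configparser
import re


def configured_device(theanorc_text):
    """Read the device setting from the [global] section of .theanorc text."""
    parser = configparser.ConfigParser()
    parser.read_string(theanorc_text)
    return parser.get("global", "device")


def device_ordinal(device):
    """Return the GPU index in names like 'cuda1', or None for plain 'cuda'."""
    match = re.fullmatch(r"cuda(\d*)", device)
    if match is None:
        raise ValueError("not a cuda device name: %r" % device)
    return int(match.group(1)) if match.group(1) else None


def ordinal_is_valid(device, visible_gpu_count):
    """An explicit ordinal must be smaller than the number of visible GPUs."""
    ordinal = device_ordinal(device)
    return ordinal is None or ordinal < visible_gpu_count


# The config from the post; the GPU count of 1 is an assumption for illustration.
rc = "[global]\nfloatX = float32\ndevice =cuda1\n"
print(ordinal_is_valid(configured_device(rc), visible_gpu_count=1))  # False
```

With a single visible GPU only cuda0 (or plain cuda) passes this check, which would be consistent with the error disappearing once the number was removed.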
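Next I tried to get rid of the UserWarning shown above.
Theano is already at 1.0.4, the latest release, so it cannot be updated any further. The only option left is to downgrade cuDNN.
However, listing the installed packages with conda list shows: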
cudnn 6.0.21 cuda8.0_0 https://mirrors.tuna.tsinghua.edu.cn/a
# Try this command to check whether pygpu works
DEVICE="cuda" python -c "import pygpu; pygpu.test()"
This ran into the problem described here: https://github.com/Theano/Theano/issues/6420
According to that issue, the test_collectives error can be ignored if you are not using multiple GPUs.
# Tried the following:
python test_gpu.py
~/.conda/envs/xhs/lib/python2.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7201 on context None
Mapped name None to device cuda: GeForce GTX 1080 Ti (0000:03:00.0)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.192847 seconds
Result is [1.2317803 1.6187935 1.5227807 ... 2.2077181 2.2996776 1.623233 ]
Used the gpu
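The number 7201 in "Using cuDNN version 7201" is a packed integer. As far as I know, cuDNN releases of this era encode it as major*1000 + minor*100 + patch, which is the assumption behind the small decoder below (the function name is made up for this note).

```python
def decode_cudnn_version(number):
    """Split a packed cuDNN version, assumed to be major*1000 + minor*100 + patch."""
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7201))  # (7, 2, 1)
```

Under the same assumption, the conda package cudnn 6.0.21 would report 6021, so a log showing 7201 suggests that a different cuDNN library, possibly a system-wide install, is the one actually being loaded.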
The cuDNN version actually in use is 7.2. conda lists 6.0, so why is 7.2 being loaded?
Checking the CUDA version shows:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
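When the same check has to be repeated across environments, the release number can be pulled out of the nvcc -V text automatically. A minimal sketch, assuming the output keeps the "release X.Y" wording shown above; cuda_release is a name invented here.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no release number found in nvcc output")
    return int(match.group(1)), int(match.group(2))


# Last line of the output quoted in the post.
sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(cuda_release(sample))  # (9, 0)
```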
// Installing CUDA looked like a lot of trouble, so I first tried simply running the program.
Starting epoch 0...
Segmentation fault (core dumped)
http://imatlab.lofter.com/post/286ffc_a6ead7
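# Check the stack size limit
ulimit -a
# Output:
stack size (kbytes, -s) 8192
That is only 8 MB, which is really small.
After changing it with ulimit -s 102400, the segmentation fault remained.
Trying to raise it further, or to set it to unlimited, failed with:
-bash: ulimit: stack size: cannot modify limit: Operation not permitted
# With sudo the message is:
sudo: ulimit: command not found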
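Both messages have ordinary explanations. ulimit is a shell builtin rather than a program on disk, so sudo has nothing to execute. And an unprivileged process may raise its soft limit only up to the hard limit, which is why going beyond it is "not permitted". Python's resource module exposes the same limits (in bytes, whereas ulimit -s uses kilobytes). The sketch below uses two helper names invented for this note and only ever raises the soft limit, never past the hard one.

```python
import resource


def clamp_stack_limit(target, soft, hard, infinity=resource.RLIM_INFINITY):
    """Pick a soft limit: at least the current one, at most the hard limit."""
    if soft == infinity:
        return soft  # already unlimited, nothing to raise
    if target == infinity:
        return hard  # the hard limit is the most we can ask for
    if hard != infinity:
        target = min(target, hard)
    return max(soft, target)


def raise_stack_limit(target):
    """Raise this process's soft stack limit toward target (bytes)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    new_soft = clamp_stack_limit(target, soft, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
    return new_soft, hard


# Show the current (soft, hard) stack limits of this process.
print(resource.getrlimit(resource.RLIMIT_STACK))
```

Calling raise_stack_limit(102400 * 1024) would request the same 100 MB as ulimit -s 102400. As the rest of this post shows, a larger stack did not cure the segmentation fault here.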
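I added this line to limits.conf:
#* soft stack unlimited
After that, ulimit -s unlimited was accepted, but the program still crashed with a segmentation fault, so I kept adjusting:
#max locked memory (kbytes, -l) 64
# Tried raising max locked memory as well, but the same approach had no effect.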
——————
Finally solved! I found the answer in an issue posted under the keras project on GitHub:

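The machine had CUDA 9 installed, so I followed a tutorial to install CUDA 8 together with cuDNN 6.0, and after that everything worked!
The root cause is that CUDA 9 does not work well with Theano 1.0. After downgrading, the warning above is gone and keras code with the Theano backend runs successfully.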
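To avoid repeating this hunt on another machine, the one combination that worked here can be written down as a quick check. The sketch below encodes only what this post reports (CUDA 8 with cuDNN 6 for Theano 1.0). It is not an official compatibility matrix, and the names REPORTED_WORKING and mismatches are invented for illustration.

```python
# The only combination this post reports as working with Theano 1.0.
# Treat it as one data point, not as an official support table.
REPORTED_WORKING = {"cuda_major": 8, "cudnn_major": 6}


def mismatches(cuda_major, cudnn_major):
    """Return the names of components that differ from the reported working setup."""
    found = {"cuda_major": cuda_major, "cudnn_major": cudnn_major}
    return sorted(key for key, want in REPORTED_WORKING.items() if found[key] != want)


# The failing setup from the post: CUDA 9.0 with cuDNN 7.2 being loaded.
print(mismatches(cuda_major=9, cudnn_major=7))  # ['cuda_major', 'cudnn_major']
```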
