os: ubuntu14.04.4
python: 2.7.13
tensorflow-gpu: 1.4.1
cuda: 8.0.44-1
cudnn: cudnn-8.0-linux-x64-v6.0.tgz
1. Install tensorflow-gpu, the TensorFlow build with GPU support
pip install tensorflow-gpu==1.4.1 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
2. Install CUDA
dpkg -i cuda-repo-ubuntu1404_10.0.130-1_amd64.deb
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/7fa2af80.pub
apt-get update
apt-get install cuda=8.0.44-1
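Once CUDA is installed, the nvidia-smi command is available for inspecting the GPU devices from the shell, because the two packages nvidia-418 and nvidia-418-dev have already been installed as dependencies.
At the time I ran into this problem: https://devtalk.nvidia.com/default/topic/1048630/b/t/post/5322060/
The fix came from https://developer.nvidia.com/cuda-10.0-download-archive : select the operating system and version, then download cuda-repo-ubuntu1404_10.0.130-1_amd64.deb.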
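To confirm from a script that the driver is usable after this step, nvidia-smi can be queried from Python. This is only a minimal sketch: the helper names query_gpus and parse_gpu_csv are made up for illustration, and it assumes nvidia-smi is on the PATH and supports the --query-gpu and --format options.

```python
from __future__ import print_function
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version' CSV lines into (name, version) tuples."""
    rows = []
    for line in text.splitlines():
        # Split on the last comma so a comma inside the GPU name is tolerated.
        parts = [p.strip() for p in line.rsplit(",", 1)]
        if len(parts) == 2 and all(parts):
            rows.append((parts[0], parts[1]))
    return rows


def query_gpus():
    """Return [(name, driver_version), ...], or [] if nvidia-smi is unusable."""
    cmd = ["nvidia-smi", "--query-gpu=name,driver_version",
           "--format=csv,noheader"]
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError: binary not found; CalledProcessError: non-zero exit status.
        return []
    return parse_gpu_csv(out.decode("utf-8", "replace"))


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus:
        for name, version in gpus:
            print("GPU:", name, "| driver:", version)
    else:
        print("nvidia-smi is missing or returned an error")
```

An empty list here means either the command is not installed or it exited with an error, so it is worth running nvidia-smi by hand to see the actual message.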
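3. Install cuDNN
This step was needed because of the following error: libcudnn.so.6: cannot open shared object file: No such file or directory
After some googling, the cause turned out to be that the GPU build of TensorFlow 1.4 is built against cuDNN 6, so the library it looks for is libcudnn.so.6.
Solution:
Download the matching cuDNN library from the official site, https://developer.nvidia.com/cudnn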
tar xvzf cudnn-8.0-linux-x64-v6.0.tgz
cp -P cuda/include/cudnn.h /usr/local/cuda/include
cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Now set the path variables:
$ vim ~/.bashrc
Scroll to the bottom and add:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
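The cuDNN files and the two exports above can be sanity-checked by asking the dynamic loader for libcudnn.so.6 directly, which is the same lookup that fails in the error message. The following is a rough sketch with hypothetical helper names (can_load, ld_library_dirs). Note that changes to ~/.bashrc only apply to a new shell, or after running source ~/.bashrc, so the check should be run from such a shell.

```python
from __future__ import print_function
import ctypes
import os


def can_load(lib_name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        return False
    return True


def ld_library_dirs():
    """Return the non-empty entries of LD_LIBRARY_PATH as a list."""
    return [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]


if __name__ == "__main__":
    print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
    print("LD_LIBRARY_PATH entries:", ld_library_dirs())
    # libcudnn.so.6 is the library name TensorFlow 1.4 asks for.
    print("libcudnn.so.6 loadable:", can_load("libcudnn.so.6"))
```

If the last line prints False, the likely suspects are the copy step into /usr/local/cuda/lib64 or an LD_LIBRARY_PATH that does not include that directory.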
Finally, start a Python shell and run:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Check whether the output lists a GPU device.
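Rather than reading the full device dump by eye, the list can be filtered down to GPU entries. This is a small sketch, not part of the original steps: gpu_device_names is a hypothetical helper, and it relies on each returned device object having name and device_type attributes.

```python
from __future__ import print_function


def gpu_device_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError as exc:
        # A missing libcudnn.so.6 also surfaces here as an ImportError.
        print("TensorFlow could not be imported:", exc)
    else:
        names = gpu_device_names(device_lib.list_local_devices())
        if names:
            print("GPU devices:", names)
        else:
            print("No GPU device visible to TensorFlow")
```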
Also, if nvidia-smi fails with the error below, try rebooting (that is how I got past it, anyway):
Failed to initialize NVML: Driver/library version mismatch
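This error generally means the user-space NVML library and the NVIDIA kernel module currently loaded come from different driver versions, which would explain why a reboot (loading the newly installed module) clears it. As a rough diagnostic sketch, the version of the loaded module can be read from /proc. Both the path /proc/driver/nvidia/version and the "Kernel Module <version>" text it is assumed to contain are assumptions about a typical NVIDIA Linux driver, and the helper names are hypothetical.

```python
from __future__ import print_function
import re


def loaded_driver_version(text):
    """Extract the version following 'Kernel Module', or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def read_loaded_driver_version(path="/proc/driver/nvidia/version"):
    """Read the version of the loaded NVIDIA kernel module, or None."""
    try:
        with open(path) as handle:
            return loaded_driver_version(handle.read())
    except IOError:
        # File missing or unreadable, e.g. the module is not loaded.
        return None


if __name__ == "__main__":
    print("Loaded NVIDIA kernel module:", read_loaded_driver_version())
```

If the version printed here differs from the installed nvidia-418 package version, the running kernel module is stale and a reboot is the simplest way to reload it.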