NVIDIA driver fails to load after booting Ubuntu 18.04


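1. At boot, press Esc to open the GRUB menu, enter "Advanced options for Ubuntu", select a kernel version, and press Enter.

Note: remember this version number.

2. Next, edit /etc/default/grub as follows, then run sudo update-grub. In GRUB_DEFAULT, the first number is the index of the submenu and the second is the index of the entry inside it, so it should point at the kernel chosen in step 1.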

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

# GRUB_DEFAULT=0
GRUB_DEFAULT="1> 2"  # modified setting
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
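
To find the two numbers used in GRUB_DEFAULT, the menu titles can be listed from the generated grub.cfg. The helper below is a minimal sketch and not part of the original steps: list_grub_entries is a hypothetical name, and it assumes the Ubuntu-style grub.cfg layout in which titles are single-quoted and submenu entries are indented.

```shell
# List GRUB menu titles with the indices that GRUB_DEFAULT expects.
# Usage: list_grub_entries [path-to-grub.cfg]
list_grub_entries() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # Split on single quotes so that field 2 is the menu title.
    awk -F"'" '
        /^(menuentry|submenu) / { printf "%d: %s\n", top++, $2; child = 0; next }
        /^[ \t]+menuentry /     { printf "  %d>%d: %s\n", top - 1, child++, $2 }
    ' "$cfg"
}
```

Running list_grub_entries prints lines such as "1>2: Ubuntu, with Linux ..." and the part before the colon is the value to put in GRUB_DEFAULT. After editing the file, sudo update-grub regenerates /boot/grub/grub.cfg.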


3. After rebooting, run uname -r to check the current kernel version.
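
The check in step 3 can be scripted as a small comparison. This is only a sketch: check_kernel is a hypothetical helper, and the version string in the usage line is a placeholder taken from the DKMS log further down, not necessarily the kernel selected in step 1.

```shell
# Return 0 if the running kernel equals the expected release string.
check_kernel() {
    expected="$1"
    # The optional second argument only exists to make the check testable.
    running="${2:-$(uname -r)}"
    if [ "$running" = "$expected" ]; then
        echo "OK: running $running"
    else
        echo "MISMATCH: running $running, expected $expected"
        return 1
    fi
}
```

For example: check_kernel 4.15.0-162-generic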

Reference: https://www.toutiao.com/i7023555532728353294/

Running the training command under darknet then produced a new problem:

(yolov4) waq@waq-MS-7885:~/Downloads/ai/Vitis-AI-1.3.2/yolo_dploy/darknet-master$ ./darknet detector train  cfg/voc.data cfg/yolov4.cfg  yolov4.weights -map
CUDA status Error: file: ./src/dark_cuda.c : () : line: 38 : build time: Nov 22 2021 - 20:42:38 

 CUDA Error: unknown error
Darknet error location: ./src/dark_cuda.c, check_error, line #69
CUDA Error: unknown error: Bad file descriptor
(yolov4) waq@waq-MS-7885:~/Downloads/ai/Vitis-AI-1.3.2/yolo_dploy/darknet-master$ 

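After some searching, it turned out to be a CUDA problem. Sigh, time to install CUDA all over again!
1. Download the installer from the official site. I downloaded the runfile. Before installing, uninstall the previously installed CUDA 10.1 (check the exact version with nvcc --version).
The default install path is usually under /usr/local/. To uninstall the old version, go to /usr/local/cuda-10.1/bin and run sudo ./cuda-uninstaller. Once it finishes successfully, the leftover folder can be deleted.
2. Install the newly downloaded runfile.

Note: when choosing what to install, do not select the driver (clear the X next to it) and keep the other items selected. When installation finishes, a summary is printed: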

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.1/
Samples:  Installed in /home/waq/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
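
The warning in the summary says CUDA 10.1 needs a driver of at least version 418.00. A rough way to check the installed driver against that minimum is sketched below. version_ge and check_driver_for_cuda are hypothetical helper names; the sketch assumes GNU sort (for -V) and that nvidia-smi is on the PATH.

```shell
# True (exit 0) when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed driver with the minimum from the installer warning.
check_driver_for_cuda() {
    min="${1:-418.00}"
    # The second argument is only for testing; by default ask nvidia-smi.
    have="${2:-$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)}"
    if [ -z "$have" ]; then
        echo "no driver detected (nvidia-smi failed)"
        return 2
    fi
    if version_ge "$have" "$min"; then
        echo "driver $have satisfies >= $min"
    else
        echo "driver $have is older than $min"
        return 1
    fi
}
```

Calling check_driver_for_cuda with no arguments uses the 418.00 minimum quoted in the warning.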

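3. After installation, add the environment variables: in the home directory, press Ctrl+H to show hidden files, find .bashrc, and open it to add the paths (or run vi ~/.bashrc).
4. After that, I tested the official samples, and they kept failing: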

  
  ./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL
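
Error 999 from cudaGetDeviceCount is generic. One thing worth ruling out is that the nvidia kernel module is not loaded for the running kernel; this is an assumption, since the original post does not diagnose the cause. A minimal check, with nvidia_module_loaded as a hypothetical helper name:

```shell
# Return 0 if the nvidia kernel module appears in the module list.
# The optional argument (a file in /proc/modules format) is only for testing.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nvidia ' "$modules_file"
}
```

Usage: nvidia_module_loaded && echo "nvidia module loaded" || echo "nvidia module NOT loaded"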

5. No choice but to reinstall the driver...
https://www.jianshu.com/p/8594771c7d5e
Loading new nvidia-495.44 DKMS files...
Building for 4.15.0-162-generic 4.15.0-163-generic
Building for architecture x86_64
Building initial module for 4.15.0-162-generic
Error! Bad return status for module build on kernel: 4.15.0-162-generic (x86_64)
Consult /var/lib/dkms/nvidia/495.44/build/make.log for more information.
Setting up nvidia-compute-utils-495 (495.44-0ubuntu0.18.04.1) ...
Warning: The home dir /nonexistent you specified can't be accessed: No such file or directory
Adding system user `nvidia-persistenced' (UID 121) ...
Adding new group `nvidia-persistenced' (GID 127) ...
Adding new user `nvidia-persistenced' (UID 121) with group `nvidia-persistenced' ...
pam_tally2: /var/log/tallylog is either world writable or not a normal file
pam_tally2: Authentication error
useradd: failed to reset the tallylog entry of user "nvidia-persistenced"
Not creating home directory `/nonexistent'.
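(Sigh, the install probably failed because I had changed the gcc version the last time I ran another program.) Next, change the gcc version again...
References: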
https://blog.csdn.net/JerryZhang__/article/details/108865176
https://forum.xanmod.org/thread-3635.html
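
Before switching compilers, it can help to compare the gcc that built the running kernel with the gcc currently on the PATH, and to read the make.log named in the DKMS error. The sketch below is an illustration only: kernel_gcc_version is a hypothetical helper, and it assumes the /proc/version format used by 4.15-era kernels, which contains "gcc version X.Y.Z".

```shell
# Print the gcc version that was used to build the running kernel.
# The optional argument (a file in /proc/version format) is only for testing.
kernel_gcc_version() {
    version_file="${1:-/proc/version}"
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' "$version_file"
}
```

Usage: echo "kernel built with gcc $(kernel_gcc_version), current gcc is $(gcc -dumpversion)", followed by tail -n 30 /var/lib/dkms/nvidia/495.44/build/make.log to see the actual build error.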

sudo apt-get update
sudo apt-get install gcc-8
sudo apt-get install g++-8
cd /usr/bin
sudo rm gcc g++
sudo ln -s gcc-8 gcc
sudo ln -s g++-8 g++
https://blog.csdn.net/weixin_44128857/article/details/108554751
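
After recreating the symlinks, it is worth confirming where they actually point. A small sketch, with show_compiler_target as a hypothetical helper name and readlink -f assumed to be available (GNU coreutils):

```shell
# Print the final target of a compiler symlink.
show_compiler_target() {
    link="${1:-/usr/bin/gcc}"
    printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
}
```

Usage: show_compiler_target /usr/bin/gcc; show_compiler_target /usr/bin/g++; gcc --version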

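3. After changing the gcc version, install CUDA, then add the environment variables, and finally test.

Note: the CUDA version must match. The version in my screenshots is different, so change the version in the environment variables to the one actually installed.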
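
The two paths requested by the installer summary (PATH and LD_LIBRARY_PATH) can be generated for whichever version was actually installed. This is a sketch under the assumption that the toolkit lives in /usr/local/cuda-VERSION; cuda_env_lines is a hypothetical helper name.

```shell
# Print the export lines for a given CUDA version, e.g. 10.1.
cuda_env_lines() {
    ver="$1"
    # Single quotes keep $PATH and $LD_LIBRARY_PATH literal so they expand in .bashrc.
    printf 'export PATH=/usr/local/cuda-%s/bin:$PATH\n' "$ver"
    printf 'export LD_LIBRARY_PATH=/usr/local/cuda-%s/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}\n' "$ver"
}
```

Usage: cuda_env_lines 10.1 >> ~/.bashrc, then source ~/.bashrc and confirm with nvcc --version.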


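4. Install cuDNN
Go to https://developer.nvidia.com/cudnn and download the matching archive.
Note: it must match the CUDA version!

After the download finishes, extract the archive and enter the extracted folder: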

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/ 
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ 
sudo chmod a+r /usr/local/cuda/include/cudnn.h 
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Check the cuDNN version in a terminal:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Output:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
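
The three macros can be combined into a single version string. The helper below is a sketch; cudnn_version is a hypothetical name, and it assumes the macros live in cudnn.h as in the output above (cuDNN 8 and later moved them to cudnn_version.h, so pass that file instead).

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCH from a cuDNN header.
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```

With the header shown above, cudnn_version prints 7.5.0.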

