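Current environment
OS: Ubuntu 18.04
GPU: Tesla V100
GPU driver: NVIDIA-SMI 410.48
CUDA version: 10.0, V10.0.130
The goal is to install CUDA 10.1 as an additional version. The current GPU driver is too old for it: CUDA 10.1 requires driver 418.39 or newer, so the driver has to be reinstalled before installing CUDA 10.1 and the matching cuDNN. The driver/CUDA version table is in the release notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html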
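The version requirement above can be checked mechanically. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` and `MIN_DRIVER` are illustrative names, not part of any NVIDIA tool. It compares the two driver versions mentioned in this post against the 418.39 minimum.

```bash
#!/bin/sh
# Minimum driver version for CUDA 10.1, as stated in the text above.
MIN_DRIVER="418.39"

# version_ge A B: succeeds when version A >= version B.
# Relies on GNU sort -V (version sort); the helper name is hypothetical.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the old and the new driver versions from this post.
for v in 410.48 418.197.02; do
    if version_ge "$v" "$MIN_DRIVER"; then
        echo "$v: meets the $MIN_DRIVER minimum"
    else
        echo "$v: below the $MIN_DRIVER minimum"
    fi
done
```

With these inputs, 410.48 falls below the minimum and 418.197.02 meets it, which matches the reason for the driver upgrade.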
-
Download the GPU driver, CUDA, and cuDNN from the NVIDIA website
GPU driver version: 418.197.02
CUDA version: 10.1
cuDNN version: cuDNN v8.0.5
-
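Uninstall the old GPU driver
(1) Stop the graphical interface
sudo service lightdm stop
(2) Uninstall the existing driver, accepting the default at every prompt
sudo /usr/bin/nvidia-uninstall
(3) Reboot
sudo reboot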
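After the reboot, one optional sanity check (not part of the original steps) is to confirm that no NVIDIA kernel module is still loaded. This is a small sketch that reads `/proc/modules`; `count_nvidia_modules` is a hypothetical helper name.

```bash
#!/bin/sh
# count_nvidia_modules: read /proc/modules-style text on stdin and print
# how many loaded modules have a name starting with "nvidia".
count_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { n++ } END { print n + 0 }'
}

# /proc/modules lists one loaded kernel module per line, name first.
if [ -r /proc/modules ]; then
    loaded="$(count_nvidia_modules < /proc/modules)"
    echo "NVIDIA kernel modules currently loaded: $loaded"
else
    echo "/proc/modules is not readable on this system"
fi
```

A count of 0 suggests the old driver is no longer active; any other value means a module is still loaded and is worth investigating before installing the new driver.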
-
Install the new GPU driver
(1) Stop the graphical interface
sudo service lightdm stop
(2) Make the installer executable
chmod +x NVIDIA-Linux-x86_64-418.197.02.run
(3) Install the new driver
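```bash
sudo ./NVIDIA-Linux-x86_64-418.197.02.run -no-x-check -no-nouveau-check -no-opengl-files
```
*-no-x-check: do not abort the installation if an X server is detected running; -no-nouveau-check: skip the check for the Nouveau driver; -no-opengl-files: install only the driver files, without the OpenGL files.*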
Answers to the installer prompts:
Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.
Choose No.
WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. Your system may not be set up for 32-bit compatibility. 32-bit compatibility files will not be installed; if you wish to install them, re-run the installation and set a valid directory with the --compat32-libdir option.
Choose OK.
Installation of the kernel module for the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version 418.197.02) is now complete.
Choose OK.
(4) Restart the display service
sudo service lightdm restart
(5) Check that the installation succeeded:
nvidia-smi
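Rather than reading the `nvidia-smi` table by eye, the reported driver version can be compared with the one that was just installed. This is a small sketch using the `--query-gpu=driver_version` query of `nvidia-smi`; `report_driver` and `EXPECTED_DRIVER` are hypothetical names, and the script only prints a notice when `nvidia-smi` is missing.

```bash
#!/bin/sh
# Driver version installed in the steps above.
EXPECTED_DRIVER="418.197.02"

# report_driver EXPECTED ACTUAL: print whether the two versions match.
report_driver() {
    if [ "$2" = "$1" ]; then
        echo "OK: driver $2 is active"
    else
        echo "MISMATCH: expected $1, found ${2:-nothing}"
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask nvidia-smi for the driver version only (first GPU is enough).
    actual="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    report_driver "$EXPECTED_DRIVER" "$actual"
else
    echo "nvidia-smi not found on PATH"
fi
```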
-
Install CUDA 10.1
(1) Run the CUDA 10.1 installer
sudo sh cuda_10.1.243_418.87.00_linux.run
Choices during installation:
Do you accept the above EULA? Choose accept.
Leave the existing symlink unchanged: A symlink already exists at /usr/local/cuda. Update to this installation? Choose No.
After installation, nvcc still reports 10.0, and /usr/local/ contains both toolkits:
administrator@dell:/usr/local$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
administrator@dell:/usr/local$ ls -l
total 68
lrwxrwxrwx 1 root root 20 May 14 2020 cuda -> /usr/local/cuda-10.0
drwxr-xr-x 19 root root 4096 May 14 2020 cuda-10.0
drwxr-xr-x 14 root root 4096 Jun 24 14:21 cuda-10.1
(2) Add or update the environment variables in ~/.bashrc:
vim ~/.bashrc
#Original CUDA environment variables
#export PATH=$PATH:/usr/local/cuda-10.0/bin:/home/administrator/anaconda3/bin:/home/administrator/pycharm-2020.1.1/bin
#export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
#Change cuda-10.0 to cuda so that the paths follow the symlink
export PATH=$PATH:/usr/local/cuda/bin:/home/administrator/anaconda3/bin:/home/administrator/pycharm-2020.1.1/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Run the following in the terminal to apply the changes:
source ~/.bashrc
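With two toolkits living side by side, it helps to see at a glance which ones are installed and where the `/usr/local/cuda` symlink points. The following is a minimal sketch; `list_cuda_installs` is a hypothetical helper, and the `/usr/local` prefix is taken from the listing above.

```bash
#!/bin/sh
# list_cuda_installs PREFIX: print every cuda-* directory under PREFIX
# and the current target of the PREFIX/cuda symlink.
list_cuda_installs() {
    prefix="${1:-/usr/local}"
    for d in "$prefix"/cuda-*; do
        # The glob stays literal when nothing matches, so test for a directory.
        if [ -d "$d" ]; then
            echo "installed: $d"
        fi
    done
    if [ -L "$prefix/cuda" ]; then
        echo "default: $prefix/cuda -> $(readlink "$prefix/cuda")"
    else
        echo "default: no symlink at $prefix/cuda"
    fi
}

list_cuda_installs /usr/local
```

Because `PATH` and `LD_LIBRARY_PATH` now refer to `/usr/local/cuda`, the "default" line indicates which toolkit those variables resolve to.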
-
Switch the CUDA version
(1) Check the current CUDA version; it is 10.0
administrator@dell:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
(2) Repoint the /usr/local/cuda symlink
sudo rm -rf /usr/local/cuda #remove the existing symlink
sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda #create the new symlink
(3) Check the CUDA version again; nvcc still reports 10.0
administrator@dell:/usr/local$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
(4) Check the version recorded in /usr/local/cuda/version.txt; the symlink itself already points at 10.1
administrator@dell:/usr/local$ cat /usr/local/cuda/version.txt
CUDA Version 10.1.243
(5) Fix
sudo vim /etc/profile
Append the following at the end:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
Run the following in the terminal to apply the changes:
source /etc/profile
Check the CUDA version again; the switch has now taken effect:
administrator@dell:/usr/local$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
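A plausible explanation for step (3), which the original does not confirm, is lookup order: the shell runs the first `nvcc` it finds on `PATH`. The `~/.bashrc` line appends `/usr/local/cuda/bin` to the end of `PATH`, so an older `/usr/local/cuda-10.0/bin` entry from the running session may still come first, whereas the `/etc/profile` line puts `/usr/local/cuda/bin` at the front. The sketch below lists every `PATH` directory that contains a given command, in lookup order; `path_hits` is a hypothetical helper name.

```bash
#!/bin/sh
# path_hits CMD [PATHSTRING]: print, in lookup order, each directory entry
# of PATHSTRING (default: $PATH) that contains an executable file CMD.
# The body runs in a subshell so the IFS and globbing changes stay local.
path_hits() (
    cmd="$1"
    set -f      # do not expand wildcards that may appear in PATH entries
    IFS=:
    for dir in ${2:-$PATH}; do
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            echo "$dir/$cmd"
        fi
    done
)

# The first line printed is the nvcc the shell would actually run.
path_hits nvcc
```

If the first line shows a `cuda-10.0` directory, that entry is shadowing the symlinked toolkit; opening a new login shell or moving `/usr/local/cuda/bin` to the front of `PATH` changes the result.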
-
Install cuDNN
(1) Check the current cuDNN version
administrator@dell:~$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
(2) Extract the cuDNN archive
tar -zxf cudnn-10.1-linux-x64-v8.0.5.39.tgz
(3) Copy the headers and libraries from the extracted cuda folder into the corresponding directories under /usr/local/cuda-10.1
sudo cp cuda/include/cudnn*.h /usr/local/cuda-10.1/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.1/lib64
sudo chmod a+r /usr/local/cuda-10.1/include/cudnn*.h /usr/local/cuda-10.1/lib64/libcudnn*
(4) Verify the installation
administrator@dell:~$ cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#endif /* CUDNN_VERSION_H */
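The two checks above read different files because cuDNN 8 keeps its version macros in `cudnn_version.h`, while the 7.6.5 install had them in `cudnn.h`. The following sketch extracts the version from whichever header defines the macros; `cudnn_version` is a hypothetical helper, and the header paths are the ones used in this post.

```bash
#!/bin/sh
# cudnn_version FILE: print MAJOR.MINOR.PATCHLEVEL from the CUDNN_* macros
# in FILE, or nothing when the file does not define CUDNN_MAJOR.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Try the cuDNN 8 header first, then fall back to the older location.
for header in /usr/local/cuda/include/cudnn_version.h /usr/local/cuda/include/cudnn.h; do
    if [ -r "$header" ]; then
        version="$(cudnn_version "$header")"
        if [ -n "$version" ]; then
            echo "cuDNN $version (from $header)"
            break
        fi
    fi
done
```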