Docker errors related to NVIDIA or the GPU
-
- Error 1: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
- Error 2: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Error 1: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
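Solution:
Step 1: Check CUDA
Checking the CUDA version on the host machine fails with the error Failed to initialize NVML: Driver/library version mismatch.
Step 2: Check the driver version and the kernel module version
Check the kernel module version the GPU driver is using: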
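The check in Step 1 can be sketched as a small script. This is only a sketch under assumptions: the original does not say which command was used, so it assumes the check is done with nvidia-smi. The function names classify_nvml_output and check_nvml are made up for illustration.

```shell
#!/bin/sh
# Classify the text printed by an NVML-based tool.
# Prints "mismatch" if it contains the driver/library mismatch error,
# otherwise "no-mismatch".
classify_nvml_output() {
    if printf '%s\n' "$1" | grep -qi 'driver/library version mismatch'; then
        echo "mismatch"
    else
        echo "no-mismatch"
    fi
}

# Run nvidia-smi on the host (assumed tool) and classify its output.
check_nvml() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found"
        return 0
    fi
    classify_nvml_output "$(nvidia-smi 2>&1)"
}

check_nvml
```

A result of "mismatch" means the host itself is affected, so the problem lies outside the container and Docker is only passing the error along.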
cat /proc/driver/nvidia/version
Check the installed driver package version:
sudo dpkg --list | grep nvidia
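The version of the kernel module that is actually loaded turns out to differ from the version the installed driver requires. Because CUDA worked before and the problem appeared suddenly, some operation probably triggered a driver update (if it was not updated automatically). After such an update the user-space libraries are at the new version, while the kernel module loaded in memory is still the old one.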
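The comparison in Step 2 can also be automated. The sketch below rests on some assumptions: the kernel module is named nvidia, modinfo is available, and the version of the module on disk stands in for the version the installed driver packages expect. That is a proxy, not the exact check NVML performs. The helper names are hypothetical, and grep -o is not POSIX, though GNU and BSD grep both support it.

```shell
#!/bin/sh
# Extract the first dotted version number (e.g. 470.57.02) from stdin.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Version of the NVIDIA kernel module currently loaded in memory.
loaded_module_version() {
    grep 'NVRM version' /proc/driver/nvidia/version 2>/dev/null | extract_version
}

# Version of the NVIDIA kernel module installed on disk
# (assumption: the module is called "nvidia").
installed_module_version() {
    modinfo -F version nvidia 2>/dev/null | extract_version
}

# Print "match", "mismatch", or "unknown" if either version is empty.
compare_versions() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

compare_versions "$(loaded_module_version)" "$(installed_module_version)"
```

A result of "mismatch" is consistent with the driver packages having been updated while the old module was still loaded. "unknown" only means one of the two versions could not be read on this machine.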
Check what updates were made to the driver