Command:
lspci | grep -i nvidia
3b:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
af:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
Command:
nvidia-smi -L
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-658852dd-63f5-582e-a34f-41b9e40d10f9)
GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-8b0b68ae-d897-bf98-a8ce-6d0c020ca346)
Command:
nvidia-smi
Thu Dec 5 04:08:13 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74 Driver Version: 418.74 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 43C P0 35W / 250W | 5338MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:AF:00.0 Off | N/A |
| N/A 38C P0 29W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Detailed explanation of the nvidia-smi command: https://blog.csdn.net/u013066730/article/details/84831552
Monitor GPU status in real time:
watch nvidia-smi
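For scripting the same check, a minimal sketch (an assumption, using the nvidia-ml-py / pynvml package rather than anything from this page) that reads per-GPU memory and utilization programmatically:

# Minimal sketch: query the same numbers nvidia-smi prints, via NVML.
# Assumes the nvidia-ml-py package is installed (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # total/used/free in bytes
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # .gpu is the GPU-Util percent
        print(f"GPU {i}: {mem.used // 2**20}MiB / {mem.total // 2**20}MiB, util {util.gpu}%")
finally:
    pynvml.nvmlShutdown()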
CUDA error: out of memory (see the sketch after the links):
https://blog.csdn.net/keneyr/article/details/90266134
https://blog.csdn.net/eefresher/article/details/99979056
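A minimal PyTorch sketch of the usual first-line mitigations (generic code, an assumption rather than anything taken from the linked posts): shrink the batch and run inference under no_grad so no autograd graph is kept:

import torch

model = torch.nn.Linear(4096, 4096).cuda()      # stand-in for a real model
batch = torch.randn(16, 4096, device="cuda")    # a smaller batch size is the first lever to try

with torch.no_grad():                           # inference does not need the autograd graph
    out = model(batch)

print(torch.cuda.memory_allocated() // 2**20, "MiB allocated after the forward pass")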
nvidia-smi shows no processes, but Memory-Usage is nearly full and GPU-Util is 0%
Possible cause 1: a problem in the code (see the sketch after the links)
https://blog.csdn.net/cx415462822/article/details/103564170
https://www.cnblogs.com/wowarsenal/p/5644813.html
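One common code-level cause (an assumption here, not quoted from the linked posts) is accumulating the loss tensor itself, which keeps every iteration's autograd graph alive; a minimal sketch of the fix:

import torch

model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

total_loss = 0.0
for _ in range(100):
    x = torch.randn(32, 16, device="cuda")
    y = torch.randn(32, 1, device="cuda")
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    total_loss += loss.item()   # .item() detaches; `total_loss += loss` would hold every graph in memory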
Possible cause 2: a finished Jupyter notebook still holds GPU memory and does not release it (see the sketch after the link)
https://blog.csdn.net/m0_38007695/article/details/88954699
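Before killing the kernel, a minimal sketch of releasing memory from inside the notebook itself (the Linear layer is only a hypothetical stand-in for whatever the notebook actually allocated):

import gc
import torch

model = torch.nn.Linear(4096, 4096).cuda()   # stand-in for objects created earlier in the notebook

del model                      # drop the notebook's reference to the GPU object
gc.collect()
torch.cuda.empty_cache()       # return cached blocks so nvidia-smi reflects the release
print(torch.cuda.memory_reserved() // 2**20, "MiB still reserved by this process")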
List the processes using the GPU devices:
fuser -v /dev/nvidia*
(base) mcj@mcj:~$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0: root 927 F...m Xorg
mcj 1656 F...m gnome-shell
mcj 51465 F...m python3
mcj 51468 F...m python3
mcj 51475 F...m python3
mcj 87345 F...m jcef_helper
mcj 165919 F...m Web Content
mcj 165992 F...m Web Content
mcj 166118 F...m Web Content
mcj 166361 F...m Web Content
/dev/nvidiactl: root 927 F...m Xorg
mcj 1656 F...m gnome-shell
mcj 51465 F...m python3
mcj 51468 F...m python3
mcj 51475 F...m python3
mcj 87345 F...m jcef_helper
mcj 165919 F...m Web Content
mcj 165992 F...m Web Content
mcj 166118 F...m Web Content
mcj 166361 F...m Web Content
/dev/nvidia-modeset: root 927 F.... Xorg
mcj 1656 F.... gnome-shell
mcj 87345 F.... jcef_helper
mcj 165919 F.... Web Content
mcj 165992 F.... Web Content
mcj 166118 F.... Web Content
mcj 166361 F.... Web Content
/dev/nvidia-uvm: mcj 51465 F...m python3
mcj 51468 F...m python3
mcj 51475 F...m python3
ps -ef | grep <PID>
kill <PID>
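The same check-then-kill flow in Python, as a sketch (the PID is just one of the python3 entries from the fuser listing above; use the actual offending PID on your machine):

import os
import signal

pid = 51465   # hypothetical: one of the python3 PIDs from the fuser output above

# Confirm what the process is before killing it (equivalent to `ps -ef | grep <PID>`)
with open(f"/proc/{pid}/cmdline", "rb") as f:
    print(pid, f.read().replace(b"\x00", b" ").decode())

os.kill(pid, signal.SIGTERM)   # try SIGTERM first; escalate to SIGKILL only if it survives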