Fixing the NVIDIA GPU error "Failed to initialize NVML: Driver/library version mismatch"

2022-08-05 14:42:24

This article documents a fix for the error Failed to initialize NVML: Driver/library version mismatch.

Reproducing the problem

$ nvidia-smi 

-->
Failed to initialize NVML: Driver/library version mismatch

Analysis

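  • The version of the loaded NVIDIA kernel module does not match the version of the driver libraries installed on the system.
Check the version of the NVIDIA kernel module that is currently loaded: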
cat /proc/driver/nvidia/version

-->
NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.34  Wed Jun 26 12:19:48 CDT 2019
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

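  • The loaded kernel module is version 430.34. The 16.04.12 in the GCC line is part of the compiler's Ubuntu package version (the module was built with GCC from Ubuntu 16.04); it is not a kernel version.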
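To read the module version without scanning the output by eye, the NVRM line can be parsed with awk. This is a minimal sketch that assumes the line has the same layout as the sample output above (the version number directly follows the word "Module"); the function name loaded_module_version is hypothetical and other driver builds may format the line differently.

```shell
#!/bin/sh
# Hypothetical helper: print the driver version found on the "NVRM version:" line.
# It reads from stdin, so it can be tried on saved output without a GPU.
loaded_module_version() {
    # Assumption: the version is the field right after the word "Module".
    awk '/^NVRM version:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Module") { print $(i + 1); exit }
    }'
}

# Only query the live system when the NVIDIA proc entry exists.
if [ -r /proc/driver/nvidia/version ]; then
    loaded_module_version < /proc/driver/nvidia/version
fi
```

Fed the sample output shown above, the function prints 430.34.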
Check the package installation log:
grep nvidia /var/log/dpkg.log

-->
2021-03-30 14:04:55 install libnvidia-compute-460-server:amd64 <none> 460.32.03-0ubuntu0.18.04.2
2021-03-30 14:04:55 status half-installed libnvidia-compute-460-server:amd64 460.32.03-0ubuntu0.18.04.2
2021-03-30 14:04:57 status unpacked libnvidia-compute-460-server:amd64 460.32.03-0ubuntu0.18.04.2
2021-03-30 14:04:57 status unpacked libnvidia-compute-460-server:amd64 460.32.03-0ubuntu0.18.04.2
2021-03-30 14:05:15 install nvidia-cuda-dev:amd64 <none> 9.1.85-3ubuntu1
2021-03-30 14:05:15 status half-installed nvidia-cuda-dev:amd64 9.1.85-3ubuntu1
2021-03-30 14:05:34 status unpacked nvidia-cuda-dev:amd64 9.1.85-3ubuntu1
2021-03-30 14:05:34 status unpacked nvidia-cuda-dev:amd64 9.1.85-3ubuntu1
2021-03-30 14:05:34 install nvidia-cuda-doc:all <none> 9.1.85-3ubuntu1
2021-03-30 14:05:34 status half-installed nvidia-cuda-doc:all 9.1.85-3ubuntu1
2021-03-30 14:05:38 status unpacked nvidia-cuda-doc:all 9.1.85-3ubuntu1
2021-03-30 14:05:38 status unpacked nvidia-cuda-doc:all 9.1.85-3ubuntu1
2021-03-30 14:05:38 install nvidia-cuda-gdb:amd64 <none> 9.1.85-3ubuntu1
2021-03-30 14:05:38 status half-installed nvidia-cuda-gdb:amd64 9.1.85-3ubuntu1
2021-03-30 14:05:39 status unpacked nvidia-cuda-gdb:amd64 9.1.85-3ubuntu1
2021-03-30 14:05:39 status unpacked nvidia-cuda-gdb:amd64 9.1.85-3ubuntu1
2021-03-30 14:05:39 install nvidia-profiler:amd64 <none> 9.1.85-3ubuntu1
2021-03-30 14:05:39 status half-installed nvidia-profiler:amd64 9.1.85-3ubuntu1

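  • The log shows that a 460.32 driver library package built for Ubuntu 18.04 (libnvidia-compute-460-server) was installed at some point.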
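The log is noisy because every package also produces several status lines. As a rough sketch, the install events alone can be pulled out with awk, assuming the dpkg.log field layout seen above (date, time, action, package, old version, new version). The function name nvidia_installs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "date package new-version" for each install
# event of a package whose name contains "nvidia". Reads dpkg.log text on stdin.
nvidia_installs() {
    # Field 3 is the action, field 4 the package, field 6 the version being installed.
    awk '$3 == "install" && $4 ~ /nvidia/ { print $1, $4, $6 }'
}

# Only read the live log when it exists and is readable.
if [ -r /var/log/dpkg.log ]; then
    nvidia_installs < /var/log/dpkg.log
fi
```

On the excerpt above this reduces the output to one line per package, which makes the 460.32.03 install easier to spot.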
List the installed driver packages:
dpkg --list | grep nvidia

-->
ii  libnvidia-compute-460-server:amd64   460.32.03-0ubuntu0.18.04.2                 amd64        NVIDIA libcompute package
ii  libnvidia-container-tools            1.0.5-1                                    amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64           1.0.5-1                                    amd64        NVIDIA container runtime library
ii  nvidia-container-runtime             3.1.4-1                                    amd64        NVIDIA container runtime
ii  nvidia-container-toolkit             1.0.5-1                                    amd64        NVIDIA container runtime hook
ii  nvidia-cuda-dev                      9.1.85-3ubuntu1                            amd64        NVIDIA CUDA development files
ii  nvidia-cuda-doc                      9.1.85-3ubuntu1                            all          NVIDIA CUDA and OpenCL documentation

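  • The system has the NVIDIA 460 driver library for Ubuntu 18.04 installed.
  • The mismatch between the loaded kernel module (430.34) and the installed driver library (460.32.03) is the root cause of the error.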
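The comparison made in this analysis can be sketched as a small check. The two helper names (upstream_version, versions_match) are hypothetical, and the sketch assumes the package version has the form driver-version, hyphen, distro suffix, as in the dpkg output above; versions with an epoch prefix are not handled.

```shell
#!/bin/sh
# Hypothetical helper: strip the distro suffix from a package version,
# e.g. 460.32.03-0ubuntu0.18.04.2 -> 460.32.03
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Hypothetical helper: succeed only when the kernel module version ($1)
# equals the driver version embedded in the package version ($2).
versions_match() {
    [ "$1" = "$(upstream_version "$2")" ]
}

# Values taken from the command output shown in this article.
module_version="430.34"
package_version="460.32.03-0ubuntu0.18.04.2"

if versions_match "$module_version" "$package_version"; then
    echo "versions match"
else
    echo "mismatch: kernel module $module_version vs library $(upstream_version "$package_version")"
fi
```

With the values from this article the check reports a mismatch, which is the condition that makes NVML refuse to initialize.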

Solution

  • Uninstall the existing driver, then reinstall it.
Uninstall the driver:
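# Removes a driver installed from the .run installer (the script exists only in that case)
sudo /usr/bin/nvidia-uninstall
# Quote the patterns so the shell does not expand them against local file names
sudo apt-get --purge remove 'nvidia-*'
sudo apt-get purge 'nvidia*'
sudo apt-get purge 'libnvidia*'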

  • Repeat until the following command prints nothing:
dpkg --list | grep nvidia
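The "prints nothing" check can also be scripted. The following is a sketch only: the function name remaining_nvidia_packages is hypothetical, and it assumes the usual dpkg --list column layout (state in the first column, package name in the second).

```shell
#!/bin/sh
# Hypothetical helper: print "state name" for every dpkg --list row whose
# package name contains "nvidia". Reads dpkg --list output on stdin.
remaining_nvidia_packages() {
    # Matching on the name column only keeps header and description text out.
    awk '$2 ~ /nvidia/ { print $1, $2 }'
}

# Only query the package database on systems that have dpkg.
if command -v dpkg >/dev/null 2>&1; then
    left="$(dpkg --list | remaining_nvidia_packages)"
    if [ -z "$left" ]; then
        echo "no nvidia packages remain"
    else
        printf 'still present:\n%s\n' "$left"
    fi
fi
```

Printing the state column shows whether a leftover entry is still installed (ii) or only has configuration files remaining (rc).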

  • Reinstall the driver:
sudo chmod a+x NVIDIA-Linux-x86_64-450.80.02.run
sudo ./NVIDIA-Linux-x86_64-450.80.02.run --no-x-check --no-nouveau-check --no-opengl-files

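  • --no-opengl-files: install only the driver files, without the OpenGL files
  • --no-x-check: do not check for a running X server during installation
  • --no-nouveau-check: do not check for the nouveau driver during installation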

Verify the result

$ nvidia-smi

