Preface
Deploying the KYC face-comparison service requires GPU support, and our production environment runs everything in containers, so I had to work out how to give Docker access to the GPU.
Preparation
- Instance type: AWS g4
- GPU model: T4 (the service requires an NVIDIA driver version greater than 520)
- OS version: Ubuntu 22.04
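Before installing anything, it is worth confirming that the instance actually exposes the T4 on the PCI bus; a quick check with the stock Ubuntu tooling:

lspci | grep -i nvidia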
Installing the GPU driver
Download the driver matching your instance's GPU model from the NVIDIA website, or copy the corresponding driver from S3 as described in the AWS documentation; here I download it directly from NVIDIA. (Note: the KYC face-recognition service (face api) requires a driver version greater than 520, and for the K520 GPU in g2-type instances NVIDIA does not offer any driver newer than 520, so g2 instances are not an option.)
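For reference, the S3 route mentioned above looks roughly like this. It assumes the AWS CLI is installed and the instance has an IAM role with S3 read access; the bucket name is from memory of the AWS GPU driver documentation, so verify it against the current docs before relying on it:

aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .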
# Update the AWS-tuned kernel, reboot, then install build tools and matching kernel headers
sudo apt-get upgrade -y linux-aws
reboot
sudo apt-get install -y gcc make linux-headers-$(uname -r)
# Create a working directory for the driver package
mkdir -p /data/software
cd /data/software/
# Download the 535.104.05 local-repo package from NVIDIA, register it, and install its signing key
wget https://us.download.nvidia.com/tesla/535.104.05/nvidia-driver-local-repo-ubuntu2204-535.104.05_1.0-1_amd64.deb
dpkg -i nvidia-driver-local-repo-ubuntu2204-535.104.05_1.0-1_amd64.deb
sudo cp /var/nvidia-driver-local-repo-ubuntu2204-535.104.05/nvidia-driver-local-62140ACB-keyring.gpg /usr/share/keyrings/
# Blacklist the open-source nouveau driver and legacy framebuffer modules so they don't conflict with the NVIDIA driver
cat << EOF | sudo tee --append /etc/modprobe.d/blacklist.conf
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
EOF
# Add the following line to /etc/default/grub, then run `sudo update-grub` and reboot
GRUB_CMDLINE_LINUX="rdblacklist=nouveau"
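After the reboot, a quick way to confirm that nouveau is no longer being loaded (the command should print nothing):

lsmod | grep nouveau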
Installing CUDA
# Determine the distro string used in NVIDIA's repo URLs (e.g. ubuntu2204)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
# Remove NVIDIA's outdated apt signing key, then install the cuda-keyring package that ships the new one
sudo apt-key del 7fa2af80
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update
# The cuda-drivers meta-package pulls in the NVIDIA driver itself
sudo apt-get -y install cuda-drivers
# Prepare the apt keyring directory used for third-party repositories (e.g. Docker's)
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
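The two commands above are the usual prelude to adding a third-party apt repository. If Docker Engine is not installed yet, the remaining steps look roughly like this, taken from the standard Docker Engine install guide for Ubuntu, so treat it as a sketch rather than part of the original walkthrough:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin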
# Verify that the driver is loaded and the GPU is visible
nvidia-smi
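Since the KYC service requires a driver newer than 520, the installed version can also be checked directly; a small sketch using nvidia-smi's query mode:

nvidia-smi --query-gpu=driver_version --format=csv,noheader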
Installing the container runtime
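The nvidia-container-runtime package is served from NVIDIA's libnvidia-container repository rather than the CUDA repo, so that repository usually has to be added first. A sketch based on NVIDIA's container-toolkit install guide; note that here the distribution string keeps its dot (e.g. ubuntu22.04):

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update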
sudo apt-get install nvidia-container-runtime
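After installing the runtime, Docker needs a restart to pick it up, and a throwaway CUDA container makes a convenient smoke test. A minimal sketch; the image tag is an assumption, so substitute whatever CUDA base image matches your driver:

sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi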
Testing
Running the nvidia-smi command shown below confirms that the KYC containers are now using the GPU.
root@ip-192-115-111-202:~# nvidia-smi
Mon Sep 11 04:17:51 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0              29W /  70W |  13828MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      4178      C   /app/.venv/bin/python                       2812MiB |
|    0   N/A  N/A      4219      C   /app/.venv/bin/python                       2800MiB |
|    0   N/A  N/A      4525      C   /app/.venv/bin/python                       1822MiB |
|    0   N/A  N/A      4707      C   /app/.venv/bin/python                       2704MiB |
|    0   N/A  N/A      4877      C   /app/.venv/bin/python                       1832MiB |
|    0   N/A  N/A      4917      C   /app/.venv/bin/python                       1854MiB |
+-----------------------------------------------------------------------------------------+
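For completeness, the KYC containers shown above would be started with GPU access along these lines; the image name and port are hypothetical placeholders, not the actual service configuration:

docker run -d --gpus all --name kyc-face-api -p 8080:8080 kyc-face-api:latest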
References
- AWS: documentation
- NVIDIA: documentation