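This tutorial walks you through making your own digital-human presenter video: given a face image and a speech audio clip, it generates a video of the photo talking.
First, take a look at two videos generated with this tool:
The tool used is SadTalker, and the test environment is Colab, which Google provides for free. The steps are as follows: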
1. Confirm that the GPU and CUDA environment are available
### Confirm that the GPU and CUDA environment are available
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
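If you would rather check the GPU from Python than read the raw output, the same query can be wrapped in a small helper. This is only a sketch: the function names `parse_gpu_csv` and `query_gpus` are made up for illustration, and it assumes each line of output has exactly the three comma-separated fields requested above (name, total memory, free memory).

```python
import subprocess

def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip lines that are not "name, total, free"
        name, total, free = parts
        gpus.append({"name": name, "memory_total": total, "memory_free": free})
    return gpus

def query_gpus():
    """Run the nvidia-smi query from this step; return [] if it is missing or fails."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total,memory.free",
        "--format=csv,noheader",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(out)
```

Calling `query_gpus()` in a Colab cell returns an empty list when no GPU runtime is attached, which is a hint to switch the runtime type before continuing.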
2. Install the environment and download the source code
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.8 2
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.9 1
!sudo apt install python3.8
!sudo apt-get install python3.8-distutils
!python --version
!apt-get update
!apt install software-properties-common
!sudo dpkg --remove --force-remove-reinstreq python3-pip python3-setuptools python3-wheel
!apt-get install python3-pip
print('Git clone project and install requirements...')
!git clone https://github.com/Winfredy/SadTalker &> /dev/null
%cd SadTalker
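# `!export` only affects its own subshell, so set PYTHONPATH from Python instead
import os
os.environ['PYTHONPATH'] = '/content/SadTalker:' + os.environ.get('PYTHONPATH', '')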
!python3.8 -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!apt update
!apt install ffmpeg &> /dev/null
!python3.8 -m pip install -r requirements.txt
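Because several of the commands above send their output to /dev/null, a failed install can go unnoticed. A quick sanity check, sketched below, is to confirm that the executables this tutorial relies on are on the PATH. The helper name `missing_tools` is hypothetical, and the default tool list is simply the set of commands used in this step.

```python
import shutil

def missing_tools(tools=("python3.8", "ffmpeg", "git")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means all three commands were found. It does not prove that the Python packages installed correctly.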
3. Download the pretrained models
print('Downloading pretrained models...')
!rm -rf checkpoints
!bash scripts/download_models.sh
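After the download script finishes, it can be worth confirming that the checkpoints directory actually contains files. The sketch below lists every file with its size. The helper name `list_model_files` is an assumption for illustration, and no particular model file names are assumed, since they depend on what `scripts/download_models.sh` fetches.

```python
from pathlib import Path

def list_model_files(directory="checkpoints"):
    """Return sorted (relative_path, size_in_bytes) pairs for files under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []  # directory missing: the download probably did not run
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )
```

An empty result, or entries with a size of 0 bytes, suggests the download was interrupted and the script should be run again.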
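4. Generate the digital-human presenter video
Prepare a photo and an audio file. The photo must show a clearly visible face; any audio clip of someone speaking will do:
Photo: examples/source_image/face.png
Audio file: examples/driven_audio/jack.mp3
# Here the digital-human image is face.png and the audio file is jack.mp3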
img = 'examples/source_image/face.png'
print(img)
!python3.8 inference.py --driven_audio ./examples/driven_audio/jack.mp3 \
           --source_image {img} \
           --result_dir ./results --still --preprocess full --enhancer gfpgan
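If you plan to generate several videos, the same command can be assembled in Python so that the image and audio paths become parameters. The following is a minimal sketch, not part of SadTalker itself: `build_inference_command` and `run_inference` are hypothetical helper names, the flags are exactly the ones used in the command above, and it assumes the working directory is the SadTalker checkout.

```python
import subprocess
from pathlib import Path

def build_inference_command(source_image, driven_audio,
                            result_dir="./results", python="python3.8"):
    """Build the inference.py argument list with the flags used in this tutorial."""
    return [
        python, "inference.py",
        "--driven_audio", str(driven_audio),
        "--source_image", str(source_image),
        "--result_dir", str(result_dir),
        "--still",
        "--preprocess", "full",
        "--enhancer", "gfpgan",
    ]

def run_inference(source_image, driven_audio, result_dir="./results"):
    """Check that both inputs exist, then run inference.py and wait for it."""
    for path in (source_image, driven_audio):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Input file not found: {path}")
    cmd = build_inference_command(source_image, driven_audio, result_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_inference('examples/source_image/face.png', 'examples/driven_audio/jack.mp3')` corresponds to the cell above. Checking the inputs first avoids waiting for the models to load only to fail on a mistyped path.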
When generation finishes, output like the following is printed. It includes the name of the video file: ./results/2024_01_18_15.04.41.mp4
examples/source_image/face3.png
using safetensor as default
3DMM Extraction for source image
landmark Det:: 100% 1/1 [00:00<00:00, 15.77it/s]
3DMM Extraction In Video:: 100% 1/1 [00:00<00:00, 22.80it/s]
mel:: 100% 1787/1787 [00:00<00:00, 18679.82it/s]
audio2exp:: 100% 179/179 [00:00<00:00, 295.25it/s]
Face Renderer:: 100% 894/894 [08:45<00:00, 1.70it/s]
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (256, 254) to (256, 256) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie.mp4
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
seamlessClone:: 100% 1787/1787 [01:01<00:00, 28.90it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_full.mp4
face enhancer....
Face Enhancer:: 100% 1787/1787 [15:12<00:00, 1.96it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_enhanced.mp4
The generated video is named: ./results/2024_01_18_15.04.41.mp4
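The log names several intermediate videos before the final one, so picking out the result by eye is error-prone. As a sketch, the paths can be pulled out of captured log text with a regular expression. The helper `generated_videos` is hypothetical, and it assumes the log keeps the "The generated video is named" wording shown above, with or without the trailing colon.

```python
import re

# Matches both "The generated video is named <path>" and "... named: <path>"
_VIDEO_LINE = re.compile(r"^The generated video is named:?\s*(.+\.mp4)\s*$")

def generated_videos(log_text):
    """Return every .mp4 path announced in the log, in order of appearance."""
    paths = []
    for line in log_text.splitlines():
        match = _VIDEO_LINE.match(line.strip())
        if match:
            paths.append(match.group(1))
    return paths
```

In the log above, the last path returned would be the final video, ./results/2024_01_18_15.04.41.mp4.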