Make Your Own Digital-Human Presenter Video

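This tutorial walks you through making your own digital-human presenter video: a talking-photo video generated from a face image and a clip of speech audio.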

First, take a look at two videos generated with this tool:

The tool used is SadTalker, and the test environment is Colab, which Google provides for free. The steps are as follows:

1. Confirm that the GPU and CUDA environment are available

### Confirm that the GPU and CUDA environment are available
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
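
The same check can also be done from Python, which is handy when a later cell should stop early if no GPU is attached. The following is a minimal sketch, not part of SadTalker: the helper names `parse_gpu_csv` and `query_gpus` are made up for illustration, and it assumes each output row has exactly the three fields requested above.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.total, memory.free` rows (no header) into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name would not break parsing.
        name, total, free = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "memory_total": total, "memory_free": free})
    return gpus


def query_gpus():
    """Return the parsed GPU list, or [] when nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.free",
         "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


gpus = query_gpus()
if gpus:
    for gpu in gpus:
        print(gpu)
else:
    print("No GPU detected; in Colab, switch the runtime type to GPU.")
```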

2. Install the environment and download the source code

!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.8 2
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.9 1
!sudo apt install python3.8

!sudo apt-get install python3.8-distutils

!python --version

!apt-get update

!apt install software-properties-common

!sudo dpkg --remove --force-remove-reinstreq python3-pip python3-setuptools python3-wheel

!apt-get install python3-pip

print('Git clone project and install requirements...')
!git clone https://github.com/Winfredy/SadTalker &> /dev/null
%cd SadTalker
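import os
# `!export` only affects a throwaway subshell, so set PYTHONPATH in the notebook process instead
os.environ['PYTHONPATH'] = '/content/SadTalker:' + os.environ.get('PYTHONPATH', '')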
!python3.8 -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!apt update
!apt install ffmpeg &> /dev/null
!python3.8 -m pip install -r requirements.txt
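
Because the packages are installed for `python3.8` rather than for the interpreter that runs the notebook, a quick way to confirm the install is to ask `python3.8` itself. This is a minimal sketch under that assumption; the helper names `parse_check_output` and `check_torch` are hypothetical and not part of SadTalker.

```python
import shutil
import subprocess

# One-liner executed inside the target interpreter.
CHECK_SNIPPET = "import torch; print(torch.__version__); print(torch.cuda.is_available())"


def parse_check_output(stdout):
    """Turn the two printed lines into (version, cuda_available), or None."""
    lines = stdout.strip().splitlines()
    if len(lines) < 2:
        return None
    return lines[-2].strip(), lines[-1].strip() == "True"


def check_torch(python="python3.8"):
    """Run the check in `python`; return None if it is missing or torch fails to import."""
    if shutil.which(python) is None:
        return None
    result = subprocess.run([python, "-c", CHECK_SNIPPET],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_check_output(result.stdout)


status = check_torch()
if status is None:
    print("torch could not be imported by python3.8; re-run the install cell.")
else:
    print("torch version:", status[0], "| CUDA available:", status[1])
```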

3. Download the pretrained models

print('Downloading pretrained models...')
!rm -rf checkpoints
!bash scripts/download_models.sh
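
A download that is interrupted partway tends to show up only later as a confusing error, so it can help to list what actually landed in `checkpoints` before moving on. The sketch below simply walks the directory and prints file sizes; `list_model_files` is a hypothetical helper, and no particular file names are assumed.

```python
from pathlib import Path


def list_model_files(directory="checkpoints"):
    """Return sorted (relative path, size in bytes) pairs for every file under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        (str(path.relative_to(root)), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


files = list_model_files()
if not files:
    print("No files found under checkpoints; the download may have failed.")
for name, size in files:
    print(f"{size / 1e6:10.1f} MB  {name}")
```

An empty list or files of only a few kilobytes suggest the download script should be run again.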

4. Generate the digital-human presenter video

Prepare a photo and an audio file. The photo must show a clear face; any audio clip of someone speaking will do:

Photo: examples/source_image/face.png

Audio file: examples/driven_audio/jack.mp3

# Here the digital-human image is face.png and the audio file is jack.mp3
img = 'examples/source_image/face.png'
print(img)
!python3.8 inference.py --driven_audio ./examples/driven_audio/jack.mp3 \
           --source_image {img} \
           --result_dir ./results --still --preprocess full --enhancer gfpgan
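
Rendering takes many minutes, so it is worth catching a mistyped path before the run starts. The sketch below assembles the same command as an argument list and checks that both input files exist; `build_inference_cmd` and `missing_inputs` are hypothetical helpers, and the flags are exactly the ones used in the cell above. It only prints the command; passing the list to `subprocess.run` from inside the SadTalker directory would be one way to execute it.

```python
from pathlib import Path


def build_inference_cmd(image, audio, result_dir="./results", python="python3.8"):
    """Assemble the inference.py call from this step as an argument list."""
    return [
        python, "inference.py",
        "--driven_audio", str(audio),
        "--source_image", str(image),
        "--result_dir", str(result_dir),
        "--still",               # keep the original pose, with less head motion
        "--preprocess", "full",  # work on the full image instead of a crop
        "--enhancer", "gfpgan",  # run GFPGAN face enhancement on the result
    ]


def missing_inputs(image, audio):
    """Return the input paths that do not exist on disk."""
    return [str(path) for path in (image, audio) if not Path(path).is_file()]


img = "examples/source_image/face.png"
audio = "examples/driven_audio/jack.mp3"

missing = missing_inputs(img, audio)
if missing:
    print("Missing input files:", missing)
else:
    print(" ".join(build_inference_cmd(img, audio)))
```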

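When generation finishes, output like the following is printed. It includes the name of the video file, here ./results/2024_01_18_15.04.41.mp4. (This sample log comes from a run that used face3.png and an audio file named 2023zongjie, so its file names differ from the command above.)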

examples/source_image/face3.png
using safetensor as default
3DMM Extraction for source image
landmark Det:: 100% 1/1 [00:00<00:00, 15.77it/s]
3DMM Extraction In Video:: 100% 1/1 [00:00<00:00, 22.80it/s]
mel:: 100% 1787/1787 [00:00<00:00, 18679.82it/s]
audio2exp:: 100% 179/179 [00:00<00:00, 295.25it/s]
Face Renderer:: 100% 894/894 [08:45<00:00,  1.70it/s]
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (256, 254) to (256, 256) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie.mp4
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
seamlessClone:: 100% 1787/1787 [01:01<00:00, 28.90it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_full.mp4
face enhancer....
Face Enhancer:: 100% 1787/1787 [15:12<00:00,  1.96it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_enhanced.mp4
The generated video is named: ./results/2024_01_18_15.04.41.mp4
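
The log names several intermediate files before the final one, so a small parser can pull out the paths instead of reading them by eye. This is a minimal sketch that assumes the log wording shown above ("The generated video is named", with or without a colon); `find_generated_videos` is a hypothetical helper, and the sample text is copied from the log above.

```python
import re

# Matches both "... is named <path>.mp4" and "... is named: <path>.mp4".
VIDEO_LINE = re.compile(r"The generated video is named:?\s+(\S.*\.mp4)\s*$")


def find_generated_videos(log_text):
    """Return every .mp4 path announced in the log, in order of appearance."""
    paths = []
    for line in log_text.splitlines():
        match = VIDEO_LINE.search(line)
        if match:
            paths.append(match.group(1))
    return paths


sample_log = """\
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie.mp4
seamlessClone:: 100% 1787/1787 [01:01<00:00, 28.90it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_full.mp4
The generated video is named: ./results/2024_01_18_15.04.41.mp4
"""

videos = find_generated_videos(sample_log)
# The last announced path is the final video.
print(videos[-1] if videos else "no video path found in the log")
```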
