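Introduction to ControlNet for Stable Diffusion

Better control over text-to-image generation

This tutorial is a technical guide to text-to-image generation with ControlNet, using HuggingFace's diffusers package.

ControlNet is a neural network structure that controls Stable Diffusion models by adding extra conditions. It augments Stable Diffusion with a conditional input (for example scribbles, edge maps, segmentation maps, or pose keypoints) during text-to-image generation. As a result, the generated image follows the structure of the input image given to ControlNet much more closely, which is a substantial improvement over traditional approaches such as image-to-image generation.

In addition, a ControlNet model can be trained on a small dataset using a consumer-grade GPU. The trained model can then be combined with any pre-trained Stable Diffusion model for text-to-image generation.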

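The initial release of ControlNet comes with the following checkpoints:

Canny edge: a monochrome image with white edges on a black background.
Depth/Shallow areas: a grayscale image in which black represents deep areas and white represents shallow areas.
Normal map: a normal-mapped image.
Semantic segmentation map: an image following ADE20K's segmentation protocol.
HED edge: a monochrome image with white soft edges on a black background.
Scribbles: a hand-drawn monochrome image with white outlines on a black background.
OpenPose bone (pose keypoints): an OpenPose bone image.
M-LSD line: a monochrome image composed only of white straight lines on a black background.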
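
As a rough sketch, the conditioning types above can be mapped to the model IDs passed to ControlNetModel.from_pretrained. Only the canny and openpose IDs appear in this tutorial; the other entries are assumptions that follow the same naming pattern on the HuggingFace Hub and should be verified before use. The checkpoint_for helper is hypothetical and exists only for illustration.

```python
# Conditioning type -> Hub model ID.
# Only "canny" and "openpose" are confirmed by this tutorial; the remaining
# IDs are assumed to follow the same "lllyasviel/sd-controlnet-*" pattern.
CONTROLNET_CHECKPOINTS = {
    "canny": "lllyasviel/sd-controlnet-canny",
    "depth": "lllyasviel/sd-controlnet-depth",
    "normal": "lllyasviel/sd-controlnet-normal",
    "seg": "lllyasviel/sd-controlnet-seg",
    "hed": "lllyasviel/sd-controlnet-hed",
    "scribble": "lllyasviel/sd-controlnet-scribble",
    "openpose": "lllyasviel/sd-controlnet-openpose",
    "mlsd": "lllyasviel/sd-controlnet-mlsd",
}


def checkpoint_for(condition: str) -> str:
    """Return the model ID for a conditioning type (hypothetical helper)."""
    try:
        return CONTROLNET_CHECKPOINTS[condition]
    except KeyError:
        known = ", ".join(sorted(CONTROLNET_CHECKPOINTS))
        raise ValueError(f"unknown condition {condition!r}; expected one of: {known}")
```

The returned string would be used as the first argument to ControlNetModel.from_pretrained, as in the canny and OpenPose scripts later in this tutorial.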

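Let's move on to setup and installation in the next section.

Installation

It is highly recommended to create a new virtual environment before installing the diffusers package.

diffusers

Activate the virtual environment and run the following command to install the stable release of the diffusers module: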

pip install diffusers

Note: ControlNet requires diffusers>=0.14.0.

To get the latest development version of the diffusers package, install it from source as follows:

pip install git+https://github.com/huggingface/diffusers

accelerate

You can install the accelerate module as follows:

pip install accelerate

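This tutorial contains some code snippets that depend on accelerate>=0.17.0, which had not yet been released on PyPI at the time of writing. Install the latest version as follows: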

pip install git+https://github.com/huggingface/accelerate
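
The two minimum versions mentioned so far (diffusers>=0.14.0 and accelerate>=0.17.0) can be checked before running any of the scripts. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser is a simplification that ignores pre-release suffixes such as .dev0, so a development build is treated like the matching release.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components, e.g. '0.17.0.dev0' -> (0, 17, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name, minimum):
    """Return a one-line status message for an installed (or missing) package."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed (need >= {minimum})"
    if meets_minimum(installed, minimum):
        return f"{name} {installed}: ok"
    return f"{name} {installed}: too old, need >= {minimum}"


if __name__ == "__main__":
    for name, minimum in [("diffusers", "0.14.0"), ("accelerate", "0.17.0")]:
        print(check_package(name, minimum))
```

For strict comparisons that account for pre-releases, a dedicated version parser would be a better fit than this sketch.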

OpenCV-Python

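Note that preprocessors and their dependencies differ between ControlNet models. For simplicity, this tutorial covers the canny edge preprocessor, which depends on the opencv-python package.

opencv-python comes in 4 different packages. The official documentation recommends opencv-contrib-python, but any of the following can be used for inference:

opencv-python: the main package
opencv-contrib-python: the full package (includes the contrib/extra modules)
opencv-python-headless: the main package without GUI support
opencv-contrib-python-headless: the full package without GUI support

Install it with the following command (replace the package name according to your preference):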

pip install opencv-contrib-python

controlnet-aux

The OpenPose preprocessor, on the other hand, requires the controlnet-aux package. Run the following command to install it:

pip install controlnet-aux

xformers (optional)

The xformers package significantly speeds up inference. At the time of writing, the latest release ships pip wheels for PyTorch 1.13.1.

Pip install (win/linux)

If you are using torch==1.13.1, simply run the following command to install xformers:

pip install -U xformers

Conda (linux)

For conda users, the install supports torch==1.12.1 or torch==1.13.1:

conda install xformers -c xformers

Build from source

Otherwise, consider building xformers directly from source:

# (Optional) Makes the build much faster
pip install ninja

# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)

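Implementation

Let's explore how to use the canny edge ControlNet for image generation. It requires a canny edge image as input.

Canny

Create a new file named canny_inference.py and add the following import statements: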

import cv2
import numpy as np
from PIL import Image

Next, add the following code snippet, which creates a canny edge image from an existing image:

import cv2
import numpy as np
from PIL import Image

image = Image.open('input.png')
image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
canny_image.save('canny.png')
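
As an aside on low_threshold and high_threshold: Canny edge detection uses them for hysteresis thresholding. Gradient magnitudes above the high threshold become strong edges, magnitudes between the two thresholds are kept only when connected to a strong edge, and everything else is discarded. The following toy sketch illustrates that rule on a one-dimensional list of magnitudes. It is not OpenCV's implementation: it skips the gradient computation and non-maximum suppression steps, the function names are hypothetical, and the exact boundary comparisons are an assumption.

```python
def classify(magnitude, low_threshold, high_threshold):
    """Label one gradient magnitude as 'strong', 'weak', or 'suppressed'."""
    if magnitude > high_threshold:
        return "strong"
    if magnitude > low_threshold:
        return "weak"
    return "suppressed"


def hysteresis_1d(magnitudes, low_threshold=100, high_threshold=200):
    """Keep strong values, plus weak values connected to a strong one."""
    labels = [classify(m, low_threshold, high_threshold) for m in magnitudes]
    keep = [label == "strong" for label in labels]

    # Repeatedly promote weak values that touch an already kept neighbour.
    changed = True
    while changed:
        changed = False
        for i, label in enumerate(labels):
            if label == "weak" and not keep[i]:
                neighbours = keep[max(i - 1, 0):i] + keep[i + 1:i + 2]
                if any(neighbours):
                    keep[i] = True
                    changed = True

    # Mirror the tutorial's output convention: white (255) edges on black (0).
    return [255 if kept else 0 for kept in keep]


if __name__ == "__main__":
    print(hysteresis_1d([50, 150, 250, 150, 50, 150]))
```

In the example call, the last value (150) is weak but isolated from any strong edge, so it is dropped, while the two weak values next to 250 are kept. Lowering both thresholds generally yields more edges in the conditioning image, and raising them yields fewer.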

Save the file and run the following command to convert the image into a canny edge image:

python canny_inference.py

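Take a look at the following example:

The next step is to run inference using the canny image as the conditional input. Update the import statements as follows: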

import cv2
import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, DPMSolverMultistepScheduler

Update the code by initializing the ControlNet and Stable Diffusion pipelines:

...

canny_image = Image.fromarray(image)
# canny_image.save('canny.png')

# for deterministic generation
generator = torch.Generator(device='cuda').manual_seed(12345)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
# change the scheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# enable xformers (optional), requires xformers installation
pipe.enable_xformers_memory_efficient_attention()
# cpu offload for memory saving, requires accelerate>=0.17.0
pipe.enable_model_cpu_offload()
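
Because xformers is optional, the call to enable_xformers_memory_efficient_attention() may fail on machines where the package is not installed. One way to keep the script portable is to guard the call, sketched below. The helper name enable_xformers_if_installed is hypothetical; pipe stands for the StableDiffusionControlNetPipeline created above.

```python
import importlib.util


def enable_xformers_if_installed(pipe) -> bool:
    """Enable memory-efficient attention only when xformers can be imported.

    Returns True if the feature was enabled, False if xformers is missing.
    """
    if importlib.util.find_spec("xformers") is None:
        return False
    pipe.enable_xformers_memory_efficient_attention()
    return True
```

In the script this would replace the direct call, for example enable_xformers_if_installed(pipe), so that the rest of the pipeline setup still runs without xformers.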

Run inference and save the generated image:

...

# cpu offload for memory saving, requires accelerate>=0.17.0
pipe.enable_model_cpu_offload()

image = pipe(
    "a beautiful lady, celebrity, red dress, dslr, colour photo, realistic, high quality",
    negative_prompt="cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, blurry, bad anatomy, bad proportions",
    num_inference_steps=20,
    generator=generator,
    image=canny_image,
    controlnet_conditioning_scale=0.5
).images[0]
image.save('output.png')

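StableDiffusionControlNetPipeline accepts the following parameter:

controlnet_conditioning_scale: the outputs of the ControlNet are multiplied by controlnet_conditioning_scale before being added to the residuals of the original UNet. It defaults to 1.0, and this tutorial uses values between 0.0 and 1.0.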
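
The scaling described above can be illustrated with plain numbers. This is a toy sketch that uses Python lists in place of tensors; the function name is hypothetical and the code is not taken from the diffusers source.

```python
def apply_controlnet_residuals(unet_residuals, controlnet_residuals, conditioning_scale=1.0):
    """Scale the ControlNet outputs, then add them to the UNet residuals."""
    return [
        unet_value + conditioning_scale * controlnet_value
        for unet_value, controlnet_value in zip(unet_residuals, controlnet_residuals)
    ]


if __name__ == "__main__":
    unet = [1.0, 2.0]
    control = [0.5, -1.0]
    for scale in (0.0, 0.5, 1.0):
        print(scale, apply_controlnet_residuals(unet, control, scale))
```

With a scale of 0.0 the ControlNet contributes nothing and the UNet residuals pass through unchanged; with 1.0 the ControlNet outputs are added at full strength.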

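Run the script and you should get output similar to the following:

Let's rerun the script with a different input image and different settings: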

...

image = pipe(
    # prompt changed from the previous run
    "a beautiful lady wearing blue yoga pants working out on beach, realistic, high quality",
    negative_prompt="cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, blurry, bad anatomy, bad proportions",
    num_inference_steps=20,
    generator=generator,
    image=canny_image,
    # raised from 0.5 to 1.0
    controlnet_conditioning_scale=1.0
).images[0]
# the tmp directory must already exist
image.save('tmp/output.png')

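The output is as follows:

OpenPose

Let's try using an OpenPose bone image as the conditional input. The image below shows what one looks like:

The controlnet-aux module supports converting an image into an OpenPose bone image. Create a new Python file named pose_inference.py and add the following imports: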

import torch
from PIL import Image
from controlnet_aux import OpenposeDetector
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, DPMSolverMultistepScheduler

Continue by adding the following code snippet:

...

image = Image.open('input.png')
openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
pose_image = openpose(image)
pose_image.save('pose.png')

Save the file and run the following command to convert the image into an OpenPose bone image:

python pose_inference.py

See the following example for reference:

Complete the script by appending the following lines of code:

...

# for deterministic generation
generator = torch.Generator(device='cuda').manual_seed(12345)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose",
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
# change the scheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# enable xformers (optional), requires xformers installation
pipe.enable_xformers_memory_efficient_attention()
# cpu offload for memory saving, requires accelerate>=0.17.0
pipe.enable_model_cpu_offload()

image = pipe(
    "a beautiful hollywood actress wearing black dress attending award winning event, red carpet stairs at background",
    negative_prompt="cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, blurry, bad anatomy, bad proportions",
    num_inference_steps=20,
    generator=generator,
    image=pose_image,
    controlnet_conditioning_scale=1.0
).images[0]
image.save('output.png')

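Run the script; the output is as follows:

ControlNet is an extremely powerful neural network structure that controls diffusion models by adding extra conditions.

Note: at the time of writing, the open-source community is still actively developing support for Multi-ControlNet.

The new feature provides a way to use multiple ControlNets and add their outputs together to generate an image, which gives better control over the image as a whole. Simply pass a list of ControlNet models as controlnet and a matching list of conditioning images as image: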

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", 
                                                   torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", 
                                                   torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
 "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16,
 controlnet=[
  controlnet_pose, 
  controlnet_canny
 ],
).to("cuda")

image = pipe(prompt='...',
             image=[pose_image, canny_image],
        ).images[0]
image.save("output.png")

When using multiple ControlNets, you can control the scaling factors by passing a list of floats as the controlnet_conditioning_scale argument, as follows:

controlnet_conditioning_scale=[1.0, 0.5]
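
To make the per-model scaling concrete, here is a toy sketch of how several ControlNet outputs could be weighted and summed. It uses Python lists in place of tensors, the function name is hypothetical, and it is an illustration of the idea rather than the diffusers implementation. It assumes the scales are matched to the ControlNets by position, so with controlnet=[controlnet_pose, controlnet_canny] the list [1.0, 0.5] would apply 1.0 to the pose model and 0.5 to the canny model.

```python
def combine_controlnets(unet_residuals, controlnet_outputs, scales):
    """Add each ControlNet output, weighted by its own scale, to the UNet residuals."""
    if len(controlnet_outputs) != len(scales):
        raise ValueError("expected one scale per ControlNet output")

    combined = list(unet_residuals)
    for output, scale in zip(controlnet_outputs, scales):
        combined = [value + scale * residual for value, residual in zip(combined, output)]
    return combined


if __name__ == "__main__":
    pose_output = [1.0, 2.0]    # stand-in for the OpenPose ControlNet output
    canny_output = [4.0, 8.0]   # stand-in for the canny ControlNet output
    print(combine_controlnets([0.0, 0.0], [pose_output, canny_output], [1.0, 0.5]))
```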

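Conclusion

Let's recap what we learned today.

This article started with a brief introduction to ControlNet and the list of supported models.

It then moved on to the setup and installation steps via pip install.

Next, it used opencv-python to obtain a canny edge image, which then served as the conditional input for text-to-image generation.

Beyond that, the tutorial also explained how to use an OpenPose bone image as the conditional input. The controlnet-aux module makes it easy to convert an image into an OpenPose image.

Thanks for reading this article. Have a great day!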
