图像变换基础：齐次坐标系

★本文是《机器学习数学基础》一书第 2 章 2.2.4 节的节选。本书由电子工业出版社出版。2022年春节后在各大平台发售。 ”

2.2.4 齐次坐标系

在前面讨论线性变换的时候，我们没有提到平移。什么是平移？以二维的平面为例，如图2-2-10所示，向量 overrightarrow{O'A'} 就是向量 overrightarrow{OA} 平移的结果，即连接两个图形的对应点的直线平行，则两个图形是平移变换。很显然，这种平移不是线性变换——向量 overrightarrow{O'A'} 所在直线并不是平面空间的子空间。尽管如此，我们可以用矩阵加法表示图2-2-10所示的平移变换：

begin{split}&overrightarrow{OA} = begin{bmatrix}0&1\0&2end{bmatrix}, overrightarrow{O'A'}=begin{bmatrix}3&4\1&3end{bmatrix}\&begin{bmatrix}3&4\1&3end{bmatrix}=begin{bmatrix}0&1\0&2end{bmatrix} begin{bmatrix}3 &3\1 & 1end{bmatrix}end{split}

图 2-2-10

既然平移不是线性变换，当然就不能用矩阵乘法的形式表示。然而在计算机图形学中，旋转、缩放、平移又是三种非常经典且常用的图形变换，旋转、缩放用矩阵乘法形式表示，偏偏平移不能，这从形式上看不美，还不便于计算和操作。为了解决这个问题，数学家们引入了齐次坐标系，这是一种与笛卡尔坐标完全不同的坐标形式，还是以平面空间为例，在笛卡尔坐标中，每个点可以用 (x,y) 的形式表示，在齐次坐标系中，则变成了 (x', y', w) ，其中x = x'/w, y=y'/w 。通常，可以设 w=1 （关于齐次坐标系的详细内容，读者可以参考有关计算机图形学资料）。

利用齐次坐标系，图2-2-10所示的平移就可以写成：

begin{split}&overrightarrow{OA} = begin{bmatrix}0&1\0&2\1&1end{bmatrix}, overrightarrow{O'A'}=begin{bmatrix}3&4\1&3\1&1end{bmatrix}\&begin{bmatrix}3&4\1&3\1&1end{bmatrix}=begin{bmatrix}1&0&3\0&1&1\0&0&1end{bmatrix}begin{bmatrix}0 &1\0 & 2\1&1end{bmatrix}end{split}

这样，平移也可以用矩阵乘法形式表示了。还是注意，这本质上不是线性变换，只不过创建了齐次坐标系之后，可以使用线性变换的形式。

对于二维向量空间的齐次坐标系，以下几个矩阵分别是实现了齐次坐标中的旋转、缩放、平移变换：

旋转：pmb{A} = begin{bmatrix}costheta & -sintheta & 0 \ sintheta & costheta & 0 \ 0 & 0 & 1end{bmatrix} ，theta 表示旋转的角度
缩放： pmb{C} = begin{bmatrix}r_x & 0 & 0\0 & r_y & 0\0 & 0 & 1end{bmatrix}，r_x、r_y、分别为 x、y

方向的缩放倍数

平移：pmb{E} = begin{bmatrix}1&0&h\0&1&k\0&0&1end{bmatrix} ，h，k 分别为 h，k 移动的长度

图 2-2-11

对于某个向量分别实施缩放、旋转、平移变换，则可写成：

begin{split}pmb{EAC}begin{bmatrix}x\y\1end{bmatrix}&=begin{bmatrix}1&0&h\0&1&k\0&0&1end{bmatrix}begin{bmatrix}costheta & -sintheta & 0 \ sintheta & costheta & 0 \ 0 & 0 & 1end{bmatrix}begin{bmatrix}r & 0 & 0\0 & r & 0\0 & 0 & 1end{bmatrix}begin{bmatrix}x\y\1end{bmatrix}\&=begin{bmatrix}rcostheta&-rsintheta&hrcostheta-krsintheta\rsintheta&rcostheta&hrsintheta krcostheta\0&0&1end{bmatrix}begin{bmatrix}x\y\1end{bmatrix}end{split}

对于图2-2-11中的Delta ABC ，如果要让它连续完成“缩放→旋转→平移”变换之后，最后变成了 Delta A'B'C' ，用 pmb{M}= pmb{EAC} 实现：

begin{split}pmb{M}=pmb{EAC}&= begin{bmatrix}1&0&h\0&1&k\0&0&1end{bmatrix}begin{bmatrix}costheta & -sintheta & 0 \ sintheta & costheta & 0 \ 0 & 0 & 1end{bmatrix}begin{bmatrix}r & 0 & 0\0 & r & 0\0 & 0 & 1end{bmatrix}\&=begin{bmatrix}rcostheta&-rsintheta&hrcostheta-krsintheta\rsintheta&rcostheta&hrsintheta krcostheta\0&0&1end{bmatrix}end{split}

设 h=-2,k=1,r=2,theta=frac{pi}{2} ，则：

pmb{M} =begin{bmatrix}rcostheta&-rsintheta&hrcostheta-krsintheta\rsintheta&rcostheta&hrsintheta krcostheta\0&0&1end{bmatrix}= begin{bmatrix}0&-2&-2\2&0&1\0&0&1end{bmatrix}

于是：pmb{M}begin{bmatrix}3&0&1\3&2&1\1&1&1end{bmatrix}=begin{bmatrix}-8&-6&-4\7&1&3\1&1&1end{bmatrix} 即：

A:begin{bmatrix}3\3\1end{bmatrix} mapsto A':begin{bmatrix}-8\7\1end{bmatrix}, B:begin{bmatrix}0\2\1end{bmatrix} mapsto B':begin{bmatrix}-6\1\1end{bmatrix}, C:begin{bmatrix}1\1\1end{bmatrix} mapsto C':begin{bmatrix}-4\3\1end{bmatrix}

。如前所述，缩放、旋转是线性变换，但平移不是。如果将线性变换和平移综合起来，统称这类变换为仿射变换（affine transformation）。常见的仿射变换，除了缩放、旋转和平移之外，还包括反射和剪切。

以上以手工计算的方式演示了图形变换的基本原理，在程序中，我们会使用一些库和模块实现各种图形变换。下面以目前常用的 OpenCV 为例，演示图形的平移、缩放和旋转变换。

1. 平移

代码语言：javascript复制

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("headpic.png")
M = np.float32([[1, 0, 500], [0, 1, 1000]])
rows, cols, ch = img.shape

res = cv2.warpAffine(img, M, (rows, cols)) 

plt.subplot(121)
plt.imshow(img)
plt.title('Input')

plt.subplot(122)
plt.imshow(res)
plt.title('Output')

输出图像：

在上述程序中，M = np.float32([[1,0,500],[0,1,1000]])是平移变换矩阵，即 pmb{E}=begin{bmatrix}1&0&h\0&1&k\0&0&1end{bmatrix} ，只是在程序中省略了矩阵的最后一行。构造的矩阵M中，h=500, k=1000 ，这就是分别在x 轴和 y 轴方向移动距离（对照输出图像）。

OpenCV 中的函数warpAffine()实现了图像按照平移矩阵的仿射变换，其函数形式是warpAffine(src, M, dsize)，主要参数的含义为：

src：需要变换的图像对象，即上述程序中的img；
M：变换矩阵，上述程序中即为定义的平移变换矩阵M；
dsize：变换后输出图像的大小。程序中(rows,cols)使用了输入图像的大小。

2. 缩放

仿照实现平移变换的程序，构造缩放矩阵，依然使用warpAffine()函数实现变换。

代码语言：javascript复制

M = np.float32([[0.5, 0, 0],[0, 0.5, 0]])

res = cv2.warpAffine(img, M, (rows//2, cols//2)) 

plt.subplot(121)
plt.imshow(img)
plt.title('Input')

plt.subplot(122)
plt.imshow(res)
plt.title('Output')

输出图像：

在 OpenCV 中，还提供了专门实现缩放操作的函数 cv2.resize() ，如果实现以上输出效果，将上述程序中的 res = cv2.warpAffine(img, C, (rows//2,cols//2)) 替换为 res2 = cv2.resize(img,(rows//2,cols//2)) 即可，其中的 (rows//2,cols//2) 为缩放后的图像大小。

3. 旋转

虽然可以按照旋转变换的矩阵形式，比如旋转角度 theta=45° ，构建旋转矩阵，再使用warpAffine()函数实现变换，但是，这样做的结果往往不如人意。

代码语言：javascript复制

M = np.float32([[0.702, -0.702, 0],[0.702, 0.702, 0]])

res = cv2.warpAffine(img, M, (4000, 4000)) 

plt.subplot(121)
plt.imshow(img)
plt.title('Input')

plt.subplot(122)
plt.imshow(res)
plt.title('Output')

输出图像：

从输出结果中可以看出，上述旋转是以原始图像的坐标原点（注意：计算机图形中坐标原点在左上角）为旋转中心，旋转了 45° 。为了避免此种情况，可以使用 OpenCV 中的专有函数构造旋转变换矩阵，如以下程序所示。

代码语言：javascript复制

M = cv2.getRotationMatrix2D((rows/2, cols/2), 45, 1)
res = cv2.warpAffine(img, M, (rows, cols)) 

plt.subplot(121)
plt.imshow(img)
plt.title('Input')

plt.subplot(122)
plt.imshow(res)
plt.title('Output')

输出图像：

函数 getRotationMatrix2D(center, angle, scale) 可以设置旋转中心（center）、旋转角度（angle）和缩放比例（scale）。

以上简要介绍了OpenCV中的实现旋转、缩放、平移三种变换的函数，除了这三种变换之外，OpenCV还支持其他形式的变换，比如对应点变换（用函数cv2.getAffineTransform构造变换矩阵）等。读者若对计算机视觉或计算机图形学有兴趣，不妨深入研习OpenCV的有关应用。

如果用深度学习框架训练模型，往往需要大量的数据，但是很多真实业务中，数据量并不充足，此时常常会采取一些方式扩充数据。对于图像数据而言，比较简单的数据扩充方式包括图像水平翻转、尺度变换、旋转等。

opencv

0 人点赞