Source | Medium
Editor | Code Doctor Team
What are facial keypoints?
Facial keypoints, also known as facial landmarks, designate regions of the face such as the nose, eyes, and mouth. A face is annotated with 68 keypoints, each given as an (x, y) coordinate. With facial keypoints, applications such as face recognition and emotion recognition can be built.
Dots represent the keypoints
Choosing a dataset:
Since Udacity already provides the YouTube Faces dataset, it will be used here. It is a dataset containing 3,425 face videos, designed for studying the problem of unconstrained face recognition in videos. These videos have been fed through processing steps and turned into sets of image frames, each containing one face and the associated keypoints.
https://www.cs.tau.ac.il/~wolf/ytfaces/
Training and test data:
This facial keypoints dataset consists of 5,770 color images, all of which are separated into either a training or a test dataset.
3,462 of these images are training images, for use as the model is created to predict keypoints.
2,308 are test images, which will be used to test the accuracy of the model.
Preprocessing the data:
To feed the data (images) into a neural network, the images have to be converted to a fixed size and a standard color range, and the NumPy arrays converted to PyTorch tensors (for faster computation).
Transforms:
- Normalize: convert a color image to grayscale values in the range [0, 1] and normalize the keypoints to a range of roughly [-1, 1] (see the sketch after this list)
- Rescale: rescale an image to a desired size.
- RandomCrop: crop an image randomly.
- ToTensor: convert NumPy images to torch images.
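As an illustration, here is a minimal sketch of what such a Normalize transform might look like; the centering constant 100 and scale 50 follow a common convention for this kind of project and are assumptions, not values confirmed by the article:
import cv2
import numpy as np

class Normalize(object):
    # convert a color image to grayscale in [0, 1] and normalize keypoints to roughly [-1, 1]
    def __call__(self, sample):
        image, key_pts = sample['image'], sample['keypoints']
        # convert the image from RGB to grayscale and scale pixel values to [0, 1]
        image_copy = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) / 255.0
        # center and scale the keypoints (100 and 50 are assumed constants)
        key_pts_copy = (np.copy(key_pts) - 100) / 50.0
        return {'image': image_copy, 'keypoints': key_pts_copy}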
# test out some of these transforms
# (Rescale, RandomCrop, show_keypoints and face_dataset are defined in the project code)
rescale = Rescale(100)
crop = RandomCrop(50)
composed = transforms.Compose([Rescale(250),
                               RandomCrop(224)])
# apply the transforms to a sample image
test_num = 500
sample = face_dataset[test_num]
fig = plt.figure()
for i, tx in enumerate([rescale, crop, composed]):
    transformed_sample = tx(sample)
    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tx).__name__)
    show_keypoints(transformed_sample['image'], transformed_sample['keypoints'])
plt.show()
Transform output
Creating the transformed dataset:
# define the data transform
# order matters! i.e. rescaling should come before a smaller crop
data_transform = transforms.Compose([Rescale(250),
                                     RandomCrop(224),
                                     Normalize(),
                                     ToTensor()])
# create the transformed dataset
transformed_dataset = FacialKeypointsDataset(csv_file='/data/training_frames_keypoints.csv',
                                             root_dir='/data/training/',
                                             transform=data_transform)
Here 224 x 224 px is the standardized input image size obtained through the transforms, and the network's output should consist of 136 values, since 136 / 2 = 68 (x, y) keypoint pairs.
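Since the final layer emits a flat vector of 136 values, reshaping it into pairs makes this structure explicit; a small illustrative snippet (the tensor here is a stand-in for real network output):
import torch

output = torch.randn(4, 136)  # a dummy batch of network outputs
keypoint_pairs = output.view(output.size(0), 68, 2)  # 68 (x, y) pairs per image
print(keypoint_pairs.shape)  # torch.Size([4, 68, 2])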
Defining the CNN architecture:
After looking over the data to be used and understanding the shapes of the images and keypoints, it is time to define a convolutional neural network that can learn from this data.
Define all the layers of this CNN; the only requirements are:
- The network takes in a square (same width and height), grayscale image as input.
- It ends with a linear layer that represents the keypoints (the last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs).
Shape of a convolutional layer:
K — out_channels: the number of filters in the convolutional layer
F — kernel_size
S — the stride of the convolution
P — the padding
W — the width/height (square) of the previous layer
Given self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0) with 1 input image channel (grayscale), 32 output channels/feature maps, and a 5x5 square convolution kernel, the output size is (W - F + 2P)/S + 1 = (224 - 5)/1 + 1 = 220, so the output tensor for one image will have the dimensions (32, 220, 220).
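To sanity-check these numbers, a tiny helper (illustrative, not part of the original project) can replay the arithmetic for every convolution in the architecture below:
def conv_out(w, f, s=1, p=0):
    # output width/height of a square convolution: (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

assert conv_out(224, 5) == 220  # conv1; pooling then halves it to 110
assert conv_out(110, 3) == 108  # conv2 -> pooled to 54
assert conv_out(54, 3) == 52    # conv3 -> pooled to 26
assert conv_out(26, 3) == 24    # conv4 -> pooled to 12
assert conv_out(12, 1) == 12    # conv5 -> pooled to 6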
self.conv1 = nn.Conv2d(1, 32, 5)
# output size = (W-F)/S + 1 = (224-5)/1 + 1 = 220
self.pool1 = nn.MaxPool2d(2, 2)
# 220/2 = 110; the output tensor for one image will have the dimensions: (32, 110, 110)
self.conv2 = nn.Conv2d(32, 64, 3)
# output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
self.pool2 = nn.MaxPool2d(2, 2)
# 108/2 = 54; the output tensor for one image will have the dimensions: (64, 54, 54)
self.conv3 = nn.Conv2d(64, 128, 3)
# output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
self.pool3 = nn.MaxPool2d(2, 2)
# 52/2 = 26; the output tensor for one image will have the dimensions: (128, 26, 26)
self.conv4 = nn.Conv2d(128, 256, 3)
# output size = (W-F)/S + 1 = (26-3)/1 + 1 = 24
self.pool4 = nn.MaxPool2d(2, 2)
# 24/2 = 12; the output tensor for one image will have the dimensions: (256, 12, 12)
self.conv5 = nn.Conv2d(256, 512, 1)
# output size = (W-F)/S + 1 = (12-1)/1 + 1 = 12
self.pool5 = nn.MaxPool2d(2, 2)
# 12/2 = 6; the output tensor for one image will have the dimensions: (512, 6, 6)
# linear layers
self.fc1 = nn.Linear(512*6*6, 1024)
self.fc2 = nn.Linear(1024, 136)
Dropout layers can be added to regularize the deep neural network. One trick for getting better results is to keep the dropout probability (p) in the range 0.1 to 0.5. Likewise, it is best to have multiple dropout layers with different probabilities (p).
self.drop1 = nn.Dropout(p=0.1)
self.drop2 = nn.Dropout(p=0.2)
self.drop3 = nn.Dropout(p=0.25)
self.drop4 = nn.Dropout(p=0.25)
self.drop5 = nn.Dropout(p=0.3)
self.drop6 = nn.Dropout(p=0.4)
- The feedforward network will be built with ReLU as the activation function.
def forward(self, x):
    ## Define the feedforward behavior of this model
    ## x is the input image; each step pairs a conv/ReLU/pool block with dropout
    x = self.pool1(F.relu(self.conv1(x)))
    x = self.drop1(x)
    x = self.pool2(F.relu(self.conv2(x)))
    x = self.drop2(x)
    x = self.pool3(F.relu(self.conv3(x)))
    x = self.drop3(x)
    x = self.pool4(F.relu(self.conv4(x)))
    x = self.drop4(x)
    x = self.pool5(F.relu(self.conv5(x)))
    x = self.drop5(x)
    # flatten the feature maps before the linear layers
    x = x.view(x.size(0), -1)
    x = F.relu(self.fc1(x))
    x = self.drop6(x)
    x = self.fc2(x)
    # a modified x, having gone through all the layers of the model, is returned
    return x
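Putting the fragments above together, the complete module might look roughly like this; the class name Net is an assumption, but the layers and forward pass are exactly those listed above:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolution / pooling stack (output sizes derived above)
        self.conv1 = nn.Conv2d(1, 32, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.conv4 = nn.Conv2d(128, 256, 3)
        self.pool4 = nn.MaxPool2d(2, 2)
        self.conv5 = nn.Conv2d(256, 512, 1)
        self.pool5 = nn.MaxPool2d(2, 2)
        # dropout layers with increasing probabilities
        self.drop1 = nn.Dropout(p=0.1)
        self.drop2 = nn.Dropout(p=0.2)
        self.drop3 = nn.Dropout(p=0.25)
        self.drop4 = nn.Dropout(p=0.25)
        self.drop5 = nn.Dropout(p=0.3)
        self.drop6 = nn.Dropout(p=0.4)
        # linear layers mapping the (512, 6, 6) feature maps to 136 keypoint values
        self.fc1 = nn.Linear(512*6*6, 1024)
        self.fc2 = nn.Linear(1024, 136)

    def forward(self, x):
        x = self.drop1(self.pool1(F.relu(self.conv1(x))))
        x = self.drop2(self.pool2(F.relu(self.conv2(x))))
        x = self.drop3(self.pool3(F.relu(self.conv3(x))))
        x = self.drop4(self.pool4(F.relu(self.conv4(x))))
        x = self.drop5(self.pool5(F.relu(self.conv5(x))))
        x = x.view(x.size(0), -1)
        x = self.drop6(F.relu(self.fc1(x)))
        return self.fc2(x)

net = Net()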
- Create the transformed facial keypoints dataset, just as before
# create the transformed dataset
transformed_dataset = FacialKeypointsDataset(csv_file='/data/training_frames_keypoints.csv',
                                             root_dir='/data/training/',
                                             transform=data_transform)
print('Number of images: ', len(transformed_dataset))
# iterate through the transformed dataset and print some stats about the first few samples
for i in range(4):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['keypoints'].size())
- Batching and loading the data
Next, having defined the transformed dataset, PyTorch's DataLoader class can be used to load the training data in batches of whatever size, as well as to shuffle the data for training the model.
# load training data in batches
batch_size = 10
train_loader = DataLoader(transformed_dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)
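The test images mentioned earlier can be batched the same way; a sketch, assuming the test set mirrors the training layout (the CSV and directory paths here are assumptions):
# create a loader for the test data (paths assumed to mirror the training layout)
test_dataset = FacialKeypointsDataset(csv_file='/data/test_frames_keypoints.csv',
                                      root_dir='/data/test/',
                                      transform=data_transform)
test_loader = DataLoader(test_dataset,
                         batch_size=batch_size,
                         shuffle=True,
                         num_workers=0)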
- Train the CNN model and track the loss
## Define the loss and optimization
import torch.optim as optim

criterion = nn.SmoothL1Loss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
Note: try other loss functions for the criterion, and set the learning rate to as low a value as feasible; in this case, 0.001.
- Training and initial observations
To quickly observe how the model trains, and to decide whether its structure or hyperparameters should be modified, it is recommended to start with one or two epochs. While training, watch how the model's loss behaves over time: does it decrease quickly at first and then slow down? Does it take a while to start decreasing in the first place? What happens if the batch size of the training data is changed, or the loss function is modified? And so on.
Use these initial observations to change the model and determine the best architecture, before training for many epochs and creating the final model; a minimal training loop is sketched below.
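A minimal training-loop sketch for tracking the loss; the helper name train_net and the logging interval are assumptions, while the data keys match the transformed dataset above:
import torch

def train_net(n_epochs):
    net.train()  # put the model in training mode
    for epoch in range(n_epochs):
        running_loss = 0.0
        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding keypoints
            images = data['image'].type(torch.FloatTensor)
            key_pts = data['keypoints'].type(torch.FloatTensor)
            # flatten the keypoints to match the (batch, 136) network output
            key_pts = key_pts.view(key_pts.size(0), -1)

            optimizer.zero_grad()
            output_pts = net(images)
            loss = criterion(output_pts, key_pts)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if batch_i % 10 == 9:  # print every 10 batches
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(
                    epoch + 1, batch_i + 1, running_loss / 10))
                running_loss = 0.0

train_net(n_epochs=2)  # start with one or two epochs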
Once a good model has been found, save it. That way it can be loaded and used again later.
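Saving and restoring can use PyTorch's standard state-dict mechanism (the file path is a hypothetical example):
import torch

# save the trained model's parameters (path is a hypothetical example)
torch.save(net.state_dict(), 'saved_models/keypoints_model.pt')

# ... later, restore them into a fresh instance of the same architecture
net = Net()
net.load_state_dict(torch.load('saved_models/keypoints_model.pt'))
net.eval()  # switch to evaluation mode for inference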
After the neural network has been trained to detect facial keypoints, it can be applied to any image that includes faces.
- Detect faces in any image using the Haar cascade detector from the project.
# load in a haar cascade classifier for detecting frontal faces
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
# run the detector
# the output here is an array of detections; the corners of each detection box
# if necessary, modify these parameters until you successfully identify every face in a given image
faces = face_cascade.detectMultiScale(image, 1.2, 2)
# make a copy of the original image to plot detections on
image_with_detections = image.copy()
# loop over the detected faces, mark the image where each face is found
for (x, y, w, h) in faces:
    # draw a rectangle around each detected face
    # you may also need to change the width of the rectangle drawn depending on image resolution
    cv2.rectangle(image_with_detections, (x, y), (x+w, y+h), (255, 0, 0), 3)
fig = plt.figure(figsize=(9, 9))
plt.imshow(image_with_detections)
Haar cascade detector
Convert each detected face into an input tensor
The following steps need to be performed for each detected face:
- Convert the face from RGB to grayscale
- Normalize the grayscale image so that its color range falls in [0, 1] instead of [0, 255]
- Rescale the detected face to the expected square size for the CNN (224x224, suggested)
- Reshape the NumPy image into a torch image.
Detect and display the predicted keypoints
After each face has been appropriately converted into an input tensor for the network, the network can be applied to each face. The output should be the predicted facial keypoints.
These keypoints will need to be "un-normalized" for display, and it may help to write a helper function such as show_keypoints.
def showpoints(image, keypoints):
    plt.figure()
    keypoints = keypoints.data.numpy()
    # un-normalize the keypoints back into image coordinates
    keypoints = keypoints * 60.0 + 68
    keypoints = np.reshape(keypoints, (68, -1))
    plt.imshow(image, cmap='gray')
    plt.scatter(keypoints[:, 0], keypoints[:, 1], s=50, marker='.', c='r')

from torch.autograd import Variable
image_copy = np.copy(image)
# loop over the detected faces from your haar cascade
for (x, y, w, h) in faces:
    # select the region of interest that is the face in the image
    roi = image_copy[y:y+h, x:x+w]
    # convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)
    image = roi
    # normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = roi / 255.0
    # rescale the detected face to be the expected square size for the CNN (224x224, suggested)
    roi = cv2.resize(roi, (224, 224))
    # reshape the numpy image shape (H x W) into a torch image shape (batch x C x H x W)
    roi = np.expand_dims(roi, 0)
    roi = np.expand_dims(roi, 0)
    # make facial keypoint predictions using the loaded, trained network
    roi_torch = Variable(torch.from_numpy(roi))
    roi_torch = roi_torch.type(torch.FloatTensor)
    keypoints = net(roi_torch)
    # display each detected face and the corresponding keypoints
    showpoints(image, keypoints)
Output:
Detected facial keypoints
Oh! As for Voldemort's worry about a nose the CNN cannot detect, a piece of advice from Pinocchio might help.
Feel free to check out the project on GitHub.
https://github.com/Noob-can-Compile/Facial_Keypoint_Detection