Convolutional Neural Networks
A convolutional neural network is built mainly from three kinds of layers: convolutional layers, pooling layers, and fully connected layers. This post reviews CNNs by going through what each of these three layers does and how it is computed, using the simple LeNet model as the running example; its structure is shown below.
Convolutional layer
The job of the convolutional layer is feature extraction. Let us first fix the following notation:
- H: image height;
- W: image width;
- D: number of channels of the input image, which is also the depth of each convolution kernel;
- F: kernel height/width;
- P: padding size around the image border;
- S: stride;
- K: number of convolution kernels.
In the convolution operation the kernels are the learnable parameters; each convolutional layer therefore has D×F×F×K weights (plus K biases). Seen this way a convolutional layer has rather few parameters, mainly because it relies on two important properties, local connectivity and weight sharing (a quick numerical check follows the list below):
- Local connectivity: in terms of the network's connection structure, the lower layer and the hidden layer of a CNN are no longer fully connected; instead, each hidden unit connects only to a local block of the layer below.
After these block-wise connections, the small blocks can in turn be aggregated into larger blocks in higher layers.
- Weight sharing: given an input image, one filter is slid across the whole image, and the numbers inside the filter are the weights. Every position of the image is scanned by the same filter, so the weights are identical everywhere, i.e. shared.
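A quick Keras check of the D×F×F×K (+ K biases) count, with arbitrary example sizes D=3, F=3, K=16:

from tensorflow.keras import layers, models

# D = 3 input channels, F = 3 kernel size, K = 16 kernels (arbitrary example sizes).
check = models.Sequential([
    layers.Conv2D(filters=16, kernel_size=3, input_shape=(32, 32, 3)),
])
check.summary()  # 3*3*3*16 weights + 16 biases = 448 trainable parameters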
Pooling layer
If we stacked a CNN using only the scheme above, the hidden layers would still have too many parameters, wouldn't they? Every group of neighbouring blocks has to produce a larger block in the layer above. So, to reduce the number of parameters, we do not strictly aggregate all neighbouring blocks into upper-layer blocks; instead we can partition the lower-layer blocks into regions and aggregate within each region.
The pooling layer therefore mainly compresses the input feature map: on the one hand it makes the feature map smaller and simplifies later computation; on the other hand it compresses the features and keeps the dominant ones. The most common pooling operations are mean pooling and max pooling:
- Mean pooling: the average value of an image region is taken as the pooled value of that region.
- Max pooling: the maximum value of an image region is taken as the pooled value of that region.
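To make this concrete, a tiny numpy sketch of 2x2 max and mean pooling (stride 2) on a made-up 4x4 feature map:

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [3, 4, 1, 8]], dtype=float)

# Split the map into non-overlapping 2x2 blocks, then reduce each block.
blocks = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)
print(blocks.max(axis=-1))   # max pooling  -> [[6. 4.] [7. 9.]]
print(blocks.mean(axis=-1))  # mean pooling -> [[3.75 2.25] [4. 4.5]]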
3D convolution and pooling are illustrated in the figure below:
Fully connected layer
Convolution extracts local features; the fully connected layer uses a weight matrix to reassemble those local features into a representation of the whole image and feeds the output to a classifier (e.g. a softmax classifier).
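A rough Keras sketch of such a flatten-then-dense classifier head (the feature-map shape and class count are arbitrary examples):

from tensorflow.keras import layers, models

# Flatten the final feature maps into one vector, then let dense layers mix
# all positions before the softmax classifier.
classifier_head = models.Sequential([
    layers.Flatten(input_shape=(7, 7, 64)),   # example feature-map shape
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),   # e.g. 10 classes
])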
LeNet
Layer 1: convolutional layer
The input image is 32x32x1, the kernel size is 5x5, the depth is 6, no zero padding is used, and the stride is 1. The output of this layer is therefore 28x28x6, and the layer has 5x5x1x6 + 6 = 156 parameters.
Layer 2: pooling layer
The input of this layer is the output of the first layer, a 28x28x6 node matrix. The filter used here is 2x2 with a stride of 2 in both directions, so the output matrix is 14x14x6.
Layer 3: convolutional layer
The input matrix is 14x14x6, the filter size is 5x5 and the depth is 16. No zero padding is used and the stride is 1. The output matrix is 10x10x16, and this layer has 5x5x6x16 + 16 = 2416 parameters.
Layer 4: pooling layer
The input matrix is 10x10x16. The filter used here is 2x2 with a stride of 2 in both directions, so the output matrix is 5x5x16.
Layer 5: fully connected layer
The input matrix is 5x5x16. In the LeNet-5 paper this layer is called a convolutional layer, but since the filter size is exactly 5x5 it is no different from a fully connected layer: if the nodes of the 5x5x16 matrix are flattened into a vector, this layer is identical to a fully connected layer. The number of output nodes is 120, giving 5x5x16x120 + 120 = 48120 parameters.
Layer 6: fully connected layer
This layer has 120 input nodes and 84 output nodes, for a total of 120x84 + 84 = 10164 parameters.
Layer 7: fully connected layer
This layer has 84 input nodes and 10 output nodes, for a total of 84x10 + 10 = 850 parameters.
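Putting the seven layers together, here is a rough Keras reconstruction of the sizes above (the tanh activations and average pooling are illustrative choices; the original LeNet-5 differs in several details). Its model.summary() reproduces the parameter counts 156, 2416, 48120, 10164 and 850:

from tensorflow.keras import layers, models

lenet = models.Sequential([
    layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),  # 28x28x6,  5*5*1*6 + 6 = 156
    layers.AveragePooling2D(2),                                       # 14x14x6
    layers.Conv2D(16, 5, activation='tanh'),                          # 10x10x16, 5*5*6*16 + 16 = 2416
    layers.AveragePooling2D(2),                                       # 5x5x16
    layers.Flatten(),                                                 # 400-dim vector
    layers.Dense(120, activation='tanh'),                             # 400*120 + 120 = 48120
    layers.Dense(84, activation='tanh'),                              # 120*84 + 84 = 10164
    layers.Dense(10, activation='softmax'),                           # 84*10 + 10 = 850
])
lenet.summary()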
Some useful code
Displaying the training images in a notebook
%matplotlib inline
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4

# Index for iterating over images
pic_index = 0

# Set up matplotlib fig, and size it to fit 4x4 pics
# (train_cats_dir, train_dogs_dir, train_cat_fnames and train_dog_fnames
#  are defined earlier in the notebook.)
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_cat_pix = [os.path.join(train_cats_dir, fname)
                for fname in train_cat_fnames[pic_index-8:pic_index]]
next_dog_pix = [os.path.join(train_dogs_dir, fname)
                for fname in train_dog_fnames[pic_index-8:pic_index]]

for i, img_path in enumerate(next_cat_pix + next_dog_pix):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(nrows, ncols, i + 1)
    sp.axis('Off')  # Don't show axes (or gridlines)

    img = mpimg.imread(img_path)
    plt.imshow(img)

plt.show()
This mainly uses matplotlib to draw a 4x4 grid of images; the result looks like this:
Image augmentation in Keras
Augmentation artificially increases the diversity and number of training examples by applying random transformations to existing images, creating a set of new variants. Data augmentation is particularly useful when the original training set is relatively small.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# datagen = ImageDataGenerator(
#       rotation_range=40,
#       width_shift_range=0.2,
#       height_shift_range=0.2,
#       shear_range=0.2,
#       zoom_range=0.2,
#       horizontal_flip=True,
#       fill_mode='nearest')

# Flow training images in batches of 20 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        train_dir,  # This is the source directory for training images
        target_size=(150, 150),  # All images will be resized to 150x150
        batch_size=20,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

# Flow validation images in batches of 20 using test_datagen generator
validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

history = model.fit_generator(
        train_generator,
        steps_per_epoch=100,  # 2000 images = batch_size * steps
        epochs=15,
        validation_data=validation_generator,
        validation_steps=50,  # 1000 images = batch_size * steps
        verbose=2)
Some of the ImageDataGenerator parameters:
- rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
- width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally.
- shear_range is for randomly applying shearing transformations.
- zoom_range is for randomly zooming inside pictures.
- horizontal_flip is for randomly flipping half of the images horizontally. This is relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures).
- fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
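To get a feel for what these parameters do, one option (a hypothetical preview snippet, not from the original tutorial; img_path stands for any training image path) is to run a single image through datagen.flow and plot a few augmented variants:

from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import matplotlib.pyplot as plt

# Hypothetical preview: generate a few augmented variants of one training image.
datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2,
                             height_shift_range=0.2, shear_range=0.2,
                             zoom_range=0.2, horizontal_flip=True,
                             fill_mode='nearest')

img = img_to_array(load_img(img_path, target_size=(150, 150)))  # img_path: any training image
batch = img.reshape((1,) + img.shape)  # flow() expects a batch dimension

for i, augmented in enumerate(datagen.flow(batch, batch_size=1)):
    plt.subplot(1, 4, i + 1)
    plt.axis('off')
    plt.imshow(augmented[0].astype('uint8'))
    if i == 3:  # show four variants and stop (the generator loops forever)
        break
plt.show()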
Visualizing intermediate layers
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first. (img_input is the Input tensor used to build `model` earlier
# in the notebook.)
successive_outputs = [layer.output for layer in model.layers[1:]]
visualization_model = Model(img_input, successive_outputs)

# Let's prepare a random input image of a cat or dog from the training set.
cat_img_files = [os.path.join(train_cats_dir, f) for f in train_cat_fnames]
dog_img_files = [os.path.join(train_dogs_dir, f) for f in train_dog_fnames]
img_path = random.choice(cat_img_files + dog_img_files)

img = load_img(img_path, target_size=(150, 150))  # this is a PIL image
x = img_to_array(img)  # Numpy array with shape (150, 150, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 150, 150, 3)

# Rescale by 1/255
x /= 255

# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# These are the names of the layers, so we can have them as part of our plot
layer_names = [layer.name for layer in model.layers[1:]]

# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
    if len(feature_map.shape) == 4:
        # Just do this for the conv / maxpool layers, not the fully-connected layers
        n_features = feature_map.shape[-1]  # number of features in feature map
        # The feature map has shape (1, size, size, n_features)
        size = feature_map.shape[1]
        # We will tile our images in this matrix
        display_grid = np.zeros((size, size * n_features))
        for i in range(n_features):
            # Postprocess the feature to make it visually palatable
            x = feature_map[0, :, :, i]
            x -= x.mean()
            x /= x.std()
            x *= 64
            x += 128
            x = np.clip(x, 0, 255).astype('uint8')
            # We'll tile each filter into this big horizontal grid
            display_grid[:, i * size : (i + 1) * size] = x
        # Display the grid
        scale = 20. / n_features
        plt.figure(figsize=(scale * n_features, scale))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')
As we can see, the features the model learns become more and more abstract from the shallow layers to the deep ones: less and less of the raw pixel information is kept, while the information about the image class becomes more and more refined.
Visualizing loss and accuracy
# Retrieve a list of accuracy results on training and test data
# sets for each training epoch
acc = history.history['acc']
val_acc = history.history['val_acc']
# Retrieve a list of loss results on training and test data
# sets for each training epoch
loss = history.history['loss']
val_loss = history.history['val_loss']
# Get number of epochs
epochs = range(len(acc))
# Plot training and validation accuracy per epoch
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Training and validation accuracy')
plt.figure()
# Plot training and validation loss per epoch
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Training and validation loss')
The plots above show that the model has overfit; simply put, the model behaves inconsistently on the training and validation sets. The main reason is that the dataset is too small: with too few examples, the knowledge the model learns does not generalize to new data, i.e. overfitting happens when the model starts using irrelevant features to make predictions. For example, if you, as a human, only see three images of lumberjacks and three images of sailors, and the only person wearing a hat is a lumberjack, you might start to believe that wearing a hat is a sign of being a lumberjack rather than a sailor, and you would then make a very poor lumberjack/sailor classifier.
Reference
- https://developers.google.cn/machine-learning/practica/image-classification/
- 从LeNet到VGG,看卷积+池化串联的网络结构