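Caffe Vision Layers - Convolution Layer (ConvLayer)
Caffe's vision layers usually take images as input and produce other images as output, although they can also operate on data of other types and dimensions.
An image may be a single-channel (1 channel) grayscale image or a three-channel (3 channel) RGB color image.
Vision layers typically apply a specific operation to a local region of the input and produce the corresponding region of the output. Examples include the Convolution Layer, Pooling Layer, Spatial Pyramid Pooling (SPP), Crop, Deconvolution Layer, and Im2Col.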
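Convolution Layer (Conv Layer)
The Conv layer convolves the input image with a set of learnable filters; each filter produces one feature map in the output.
Caffe provides both CPU and GPU implementations of the Conv layer:
- Header: ./include/caffe/layers/conv_layer.hpp
- CPU implementation: ./src/caffe/layers/conv_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/conv_layer.cu
Example layer definition (prototxt):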
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param {
    lr_mult: 1
    decay_mult: 1
  }
  # learning rate and decay multipliers for the biases
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96 # learn 96 filters
    kernel_size: 11 # each filter is 11x11
    stride: 4 # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01 # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
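The two param blocks scale the solver's global settings for each parameter blob: the first block applies to the filters, the second to the biases. The following is a minimal sketch of that arithmetic. The solver values base_lr = 0.01 and weight_decay = 0.0005 are assumptions chosen for illustration; they come from a solver file, not from the layer definition above.

```python
# Hypothetical solver settings (assumed for this sketch; they live in the
# solver prototxt, not in the layer definition).
BASE_LR = 0.01
WEIGHT_DECAY = 0.0005

# Multipliers copied from the two param blocks of conv1.
PARAM_MULTS = {
    "filters": {"lr_mult": 1, "decay_mult": 1},
    "biases": {"lr_mult": 2, "decay_mult": 0},
}


def effective_rates(base_lr, weight_decay, mults):
    """Return the per-blob learning rate and weight decay."""
    return {
        name: {
            "lr": base_lr * m["lr_mult"],
            "decay": weight_decay * m["decay_mult"],
        }
        for name, m in mults.items()
    }


rates = effective_rates(BASE_LR, WEIGHT_DECAY, PARAM_MULTS)
for name, r in rates.items():
    print(name, r)
```

With these multipliers the biases learn at twice the rate of the filters and are excluded from weight decay.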
2. Definition in caffe.proto
message ConvolutionParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // Whether to have bias terms
  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in all spatial dimensions, or once per spatial dimension.
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting holes.
  repeated uint32 dilation = 18; // The dilation; defaults to 1
  // For 2D convolution only, the *_h and *_w versions may also be used to
  // specify both spatial dimensions.
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)
  // Splits the input and output channels into groups
  optional uint32 group = 5 [default = 1]; // The group size for group conv
  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];
  // The axis to interpret as "channels" when performing convolution.
  // Preceding dimensions are treated as independent inputs;
  // succeeding dimensions are treated as "spatial".
  // With (N, C, H, W) inputs, and axis == 1 (the default), we perform
  // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
  // groups g>1) filters across the spatial axes (H, W) of the input.
  // With (N, C, D, H, W) inputs, and axis == 1, we perform
  // N independent 3D convolutions, sliding (C/g)-channels
  // filters across the spatial axes (D, H, W) of the input.
  optional int32 axis = 16 [default = 1];
  // Whether to force use of the general ND convolution, even if a specific
  // implementation for blobs of the appropriate number of spatial dimensions
  // is available. (Currently, there is only a 2D-specific convolution
  // implementation; for input blobs with num_axes != 2, this option is
  // ignored and the ND implementation will be used.)
  optional bool force_nd_im2col = 17 [default = false];
}
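The comments in the message describe two ways to give pad, kernel_size, and stride: one value shared by all spatial dimensions, or one value per spatial dimension, with the *_h / *_w fields as a 2D-only alternative. The helper below is a hypothetical illustration of that rule, not Caffe's own code. The name resolve_spatial and its arguments are assumptions made for this sketch, and it simplifies by requiring both *_h and *_w whenever either is given.

```python
def resolve_spatial(values, num_spatial_axes, default=None, h=None, w=None):
    """Expand a repeated field into one value per spatial axis.

    values:  entries given for the repeated field (e.g. kernel_size); may be empty.
    h, w:    the optional *_h / *_w fields, only meaningful for 2D.
    default: value used when nothing is specified (None means "required").
    """
    if h is not None or w is not None:
        if num_spatial_axes != 2:
            raise ValueError("*_h / *_w can only be used for 2D convolution")
        if values:
            raise ValueError("give either the repeated field or *_h / *_w, not both")
        if h is None or w is None:
            # Simplification of this sketch: always ask for both.
            raise ValueError("both *_h and *_w are required")
        return [h, w]
    if not values:
        if default is None:
            raise ValueError("a value is required for this field")
        return [default] * num_spatial_axes
    if len(values) == 1:
        # A single value applies to every spatial dimension.
        return list(values) * num_spatial_axes
    if len(values) != num_spatial_axes:
        raise ValueError("expected 1 or %d values" % num_spatial_axes)
    return list(values)


kernel = resolve_spatial([11], 2)            # single value, shared
stride = resolve_spatial([4], 2, default=1)  # single value, shared
pad = resolve_spatial([], 2, default=0)      # nothing given, use default
rect = resolve_spatial([], 2, h=3, w=5)      # 2D-only *_h / *_w form
print(kernel, stride, pad, rect)
```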
3. Parameters
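The Conv layer's parameters are set in the convolution_param block of the layer definition.
- num_output (C_o) - the number of filters.
- kernel_size - the height and width of each filter; kernel_h and kernel_w can be given instead.
- weight_filler - how the weights are initialized. Available types include:
  - type: "constant" with value: 0 (the default)
  - type: "gaussian"
  - type: "positive_unitball"
  - type: "uniform"
  - type: "msra"
  - type: "bilinear"
- bias_term - optional (default true); whether to learn a set of biases that are added to the filter outputs.
- pad - zero padding, optional (default 0); pad_h and pad_w can be given instead.
- stride - step size, optional (default 1); stride_h and stride_w can be given instead.
- group - optional (default 1). If group > 1, the connectivity of each filter is restricted to a subset of the input: the input and output channels are split into group groups, and the i-th group of output channels is connected only to the i-th group of input channels.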
3.1 The dilation parameter
[Paper reading notes - Dilated Convolution]
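As the caffe.proto comment puts it, dilation spreads the kernel taps apart and implicitly fills the resulting holes with zeros. A kernel of size k with dilation d therefore spans d * (k - 1) + 1 input positions along each axis. A small sketch of that arithmetic follows; the function names and the input size of 15 are illustrative assumptions.

```python
def kernel_extent(kernel, dilation=1):
    """Number of input positions spanned by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def dilated_output_size(in_size, kernel, stride=1, pad=0, dilation=1):
    """Output length along one axis, taking dilation into account."""
    return (in_size + 2 * pad - kernel_extent(kernel, dilation)) // stride + 1


# A 3-tap kernel on an assumed input of length 15, for growing dilation.
for d in (1, 2, 4):
    print("dilation", d,
          "extent", kernel_extent(3, d),
          "output", dilated_output_size(15, 3, dilation=d))
```

The number of learnable weights stays the same for every dilation; only the area each filter covers grows.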
3.2 The group parameter
The ResNeXt paper, Aggregated Residual Transformations for Deep Neural Networks, includes a discussion of group convolution.
Paper reading notes - ResNeXt - Aggregated Residual Transformations for DNN
According to the official Caffe documentation:
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the i-th output group channels will be only connected to the i-th input group channels.
For example:
group conv (figure from the article "ResNeXt and Xception: New Thoughts on Models")
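The connectivity rule quoted above can be made concrete by listing which input channels each output group reads. The sketch below uses 96 input channels, 256 output channels, a 5 x 5 kernel, and group = 2 purely as an illustrative configuration; none of these numbers come from the text, and the helper name is hypothetical.

```python
def group_channel_ranges(c_in, c_out, group):
    """Return one (input_channels, output_channels) pair of ranges per group."""
    if c_in % group or c_out % group:
        raise ValueError("group must divide both channel counts")
    in_per, out_per = c_in // group, c_out // group
    return [
        (range(i * in_per, (i + 1) * in_per),
         range(i * out_per, (i + 1) * out_per))
        for i in range(group)
    ]


# Illustrative configuration (assumed, not taken from the article).
C_IN, C_OUT, K, GROUP = 96, 256, 5, 2

pairs = group_channel_ranges(C_IN, C_OUT, GROUP)
for i, (ins, outs) in enumerate(pairs):
    print("group %d: inputs %d..%d -> outputs %d..%d"
          % (i, ins[0], ins[-1], outs[0], outs[-1]))

# Each filter sees only C_IN / GROUP channels, so the weight count
# shrinks by a factor of GROUP compared with an ungrouped layer.
weights_dense = C_OUT * C_IN * K * K
weights_grouped = C_OUT * (C_IN // GROUP) * K * K
print(weights_dense, weights_grouped)
```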
4. Animated illustrations
The animations in the GitHub repository conv_arithmetic illustrate these cases well.
In the figures below, blue maps are inputs and cyan maps are outputs.
- No padding, no strides
- Arbitrary padding, no strides
- Half padding, no strides
- Full padding, no strides
- No padding, strides
- Padding, strides
- Padding, strides (odd)
- Dilated convolution - No padding, no stride, dilation
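The cases listed above differ only in padding, stride, and dilation, which a naive sliding-window implementation makes explicit. The sketch below is plain Python for a single channel and uses the cross-correlation form (no kernel flipping) that is common in deep learning frameworks. The input and kernel sizes are illustrative assumptions and are not read from the animations.

```python
def conv2d(image, kernel, stride=1, pad=0, dilation=1):
    """Naive single-channel 2D convolution (cross-correlation form)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-pad the input on all four sides.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for y in range(h):
        for x in range(w):
            padded[y + pad][x + pad] = image[y][x]
    # Area covered by the (possibly dilated) kernel.
    ext_h = dilation * (kh - 1) + 1
    ext_w = dilation * (kw - 1) + 1
    out_h = (h + 2 * pad - ext_h) // stride + 1
    out_w = (w + 2 * pad - ext_w) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    iy = oy * stride + ky * dilation
                    ix = ox * stride + kx * dilation
                    acc += padded[iy][ix] * kernel[ky][kx]
            out[oy][ox] = acc
    return out


def ones(n):
    """n x n map filled with ones."""
    return [[1] * n for _ in range(n)]


# (label, input size, kernel size, settings); sizes are assumed for illustration.
cases = [
    ("no padding, no strides", 4, 3, {}),
    ("half padding, no strides", 5, 3, {"pad": 1}),
    ("full padding, no strides", 5, 3, {"pad": 2}),
    ("no padding, strides", 5, 3, {"stride": 2}),
    ("padding, strides", 5, 3, {"stride": 2, "pad": 1}),
    ("dilated, no padding, no stride", 7, 3, {"dilation": 2}),
]
shapes = {}
for label, size, k, settings in cases:
    result = conv2d(ones(size), ones(k), **settings)
    shapes[label] = (len(result), len(result[0]))
    print(label, shapes[label])
```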