Caffe Vision Layers - Convolution Layer (ConvLayer)

2020-06-12 15:36:23

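Caffe's vision layers usually take images as input and produce other images as output, although they can also operate on data of other types and dimensions.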

An image may be a single-channel (1 channel) grayscale image or a three-channel (3 channel) RGB color image.

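Vision layers usually apply a particular operation to a local region of the input images to produce the corresponding region of the output. Examples include Convolution Layer, Pooling Layer, Spatial Pyramid Pooling (SPP), Crop, Deconvolution Layer, and Im2Col.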

Convolution Layer (Conv Layer)

The Conv layer convolves the input image with a set of learnable filters, and each filter produces one feature map in the output.
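To make "one feature map per filter" concrete, here is a minimal pure-Python sketch of a 2D convolution over nested lists. It is only an illustration, not Caffe's implementation (Caffe uses im2col plus matrix multiplication, or cuDNN); the function name `conv2d_naive` is hypothetical and the bias term is left out for brevity.

```python
def conv2d_naive(image, filters, stride=1, pad=0):
    """Naive 2D convolution (no kernel flip), for illustration only.

    image:   nested lists shaped [C][H][W]
    filters: nested lists shaped [num_output][C][kh][kw]
    returns: nested lists shaped [num_output][H_out][W_out]
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    h_out = (height + 2 * pad - kh) // stride + 1
    w_out = (width + 2 * pad - kw) // stride + 1

    def pixel(c, y, x):
        # Positions outside the image belong to the zero padding.
        if 0 <= y < height and 0 <= x < width:
            return image[c][y][x]
        return 0.0

    feature_maps = []
    for filt in filters:  # each filter yields exactly one feature map
        fmap = [[0.0] * w_out for _ in range(h_out)]
        for oy in range(h_out):
            for ox in range(w_out):
                acc = 0.0
                for c in range(channels):
                    for ky in range(kh):
                        for kx in range(kw):
                            y = oy * stride - pad + ky
                            x = ox * stride - pad + kx
                            acc += filt[c][ky][kx] * pixel(c, y, x)
                fmap[oy][ox] = acc
        feature_maps.append(fmap)
    return feature_maps


# A 1-channel 4x4 image of ones and two 3x3 filters (all ones, all twos).
image = [[[1.0] * 4 for _ in range(4)]]
filters = [[[[1.0] * 3 for _ in range(3)]], [[[2.0] * 3 for _ in range(3)]]]
maps = conv2d_naive(image, filters)
print(len(maps))  # 2 feature maps, one per filter
print(maps[0])    # [[9.0, 9.0], [9.0, 9.0]]
```

Two filters give two output maps, regardless of how many channels the input has, because each filter sums over all input channels.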

Caffe provides both CPU and GPU implementations of the Conv layer:

  • Header - ./include/caffe/layers/conv_layer.hpp
  • CPU implementation - ./src/caffe/layers/conv_layer.cpp
  • CUDA GPU implementation - ./src/caffe/layers/conv_layer.cu
  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    # learning rate and decay multipliers for the filters
    param { 
        lr_mult: 1 
        decay_mult: 1 
    }
    # learning rate and decay multipliers for the biases
    param { 
        lr_mult: 2 
        decay_mult: 0 
    }
    convolution_param {
      num_output: 96     # learn 96 filters
      kernel_size: 11    # each filter is 11x11
      stride: 4          # step 4 pixels between each filter application
      weight_filler {
        type: "gaussian" # initialize the filters from a Gaussian
        std: 0.01        # distribution with stdev 0.01 (default mean: 0)
      }
      bias_filler {
        type: "constant" # initialize the biases to zero (0)
        value: 0
      }
    }
  }
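As a rough check on what the definition above produces, the sketch below computes the output size and the number of learnable parameters for conv1. The input shape (3 channels, 227 x 227) is an assumption made for illustration; the layer definition itself does not state it. The helper names are hypothetical; the size formula follows the kernel-extent arithmetic Caffe uses for convolution output shapes.

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0, dilation=1):
    """Output length along one spatial axis."""
    kernel_extent = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * pad - kernel_extent) // stride + 1


def conv_param_count(num_output, in_channels, kernel_h, kernel_w, bias_term=True):
    """Number of learnable values for an ungrouped convolution (group = 1)."""
    weights = num_output * in_channels * kernel_h * kernel_w
    biases = num_output if bias_term else 0
    return weights + biases


# Assumed input blob: 3 x 227 x 227 (not given in the layer definition).
side = conv_output_size(227, kernel_size=11, stride=4)
params = conv_param_count(num_output=96, in_channels=3, kernel_h=11, kernel_w=11)
print(side)    # 55, so the top blob would be 96 x 55 x 55 per image
print(params)  # 34944 = 96 * 3 * 11 * 11 weights + 96 biases
```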

2. Definition in caffe.proto

message ConvolutionParameter {
  optional uint32 num_output = 1;  // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // Whether to have bias terms

  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in all spatial dimensions, or once per spatial dimension.
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1

  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting holes.
  repeated uint32 dilation = 18; // The dilation; defaults to 1

  // For 2D convolution only, the *_h and *_w versions may also be used to
  // specify both spatial dimensions.
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)

  // Split the input and output channels into groups
  optional uint32 group = 5 [default = 1]; // The group size for group conv

  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];

  // The axis to interpret as "channels" when performing convolution.
  // Preceding dimensions are treated as independent inputs;
  // succeeding dimensions are treated as "spatial".
  // With (N, C, H, W) inputs, and axis == 1 (the default), we perform
  // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
  // groups g>1) filters across the spatial axes (H, W) of the input.
  // With (N, C, D, H, W) inputs, and axis == 1, we perform
  // N independent 3D convolutions, sliding (C/g)-channels
  // filters across the spatial axes (D, H, W) of the input.
  optional int32 axis = 16 [default = 1];

  // Whether to force use of the general ND convolution, even if a specific
  // implementation for blobs of the appropriate number of spatial dimensions
  // is available. (Currently, there is only a 2D-specific convolution
  // implementation; for input blobs with num_axes != 2, this option is
  // ignored and the ND implementation will be used.)
  optional bool force_nd_im2col = 17 [default = false];
}
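The comment on axis above can be read as a simple split of the blob shape: dimensions before axis are independent inputs, the dimension at axis is the channel count, and everything after it is spatial. A small sketch of that reading (the helper name `split_axes` is hypothetical and not part of Caffe):

```python
def split_axes(shape, axis=1):
    """Split a blob shape into (independent dims, channels, spatial dims)."""
    independent = tuple(shape[:axis])
    channels = shape[axis]
    spatial = tuple(shape[axis + 1:])
    return independent, channels, spatial


# (N, C, H, W): N independent 2D convolutions over (H, W).
print(split_axes((8, 3, 227, 227)))      # ((8,), 3, (227, 227))
# (N, C, D, H, W): N independent 3D convolutions over (D, H, W).
print(split_axes((8, 3, 16, 112, 112)))  # ((8,), 3, (16, 112, 112))
```

The number of spatial dimensions returned here is what decides whether the 2D-specific path or the general ND path applies.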

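3. Parameters

The Conv layer's parameters are set in convolution_param in the Caffe layer definition.

  • num_output (C_o) - the number of filters
  • kernel_size - the height and width of each filter; kernel_h and kernel_w can be given instead
  • weight_filler - weight initialization
    • type: "constant" value: 0 (default)
    • type: "gaussian"
    • type: "positive_unitball"
    • type: "uniform"
    • type: "msra"
    • type: "bilinear"
  • bias_term - optional (default true); whether to learn a set of biases and add them to the filter outputs.
  • pad - zero padding, optional (default 0); pad_h and pad_w can be given instead.
  • stride - step size, optional (default 1); stride_h and stride_w can be given instead.
  • group - optional (default 1); if group > 1, the connectivity of each filter is restricted to a subset of the input. That is, the input and output channels are split into group groups, and the i-th output channel group is connected only to the i-th input channel group.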

3.1 The dilation parameter

[Paper reading notes - Dilated Convolution]
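The caffe.proto comment describes dilation as a factor that dilates the kernel while (implicitly) zero-filling the resulting holes. The sketch below makes those holes explicit for a small 2D kernel. It is for illustration only: the zeros are implicit in practice and no enlarged kernel needs to be stored. The function name `dilate_kernel` is hypothetical.

```python
def dilate_kernel(kernel, dilation):
    """Spread the taps of a 2D kernel apart, filling the gaps with zeros."""
    kh, kw = len(kernel), len(kernel[0])
    # Spatial extent covered by the dilated kernel.
    ext_h = dilation * (kh - 1) + 1
    ext_w = dilation * (kw - 1) + 1
    dilated = [[0] * ext_w for _ in range(ext_h)]
    for y in range(kh):
        for x in range(kw):
            dilated[y * dilation][x * dilation] = kernel[y][x]
    return dilated


dilated = dilate_kernel([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dilation=2)
for row in dilated:
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

A 3x3 kernel with dilation 2 therefore covers a 5x5 window while still having only 9 learnable weights.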

3.2 The group parameter

The ResNeXt paper, Aggregated Residual Transformations for Deep Neural Networks, includes an introduction to group convolution.

Paper reading notes - ResNeXt - Aggregated Residual Transformations for DNN

According to the official Caffe documentation:

group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the i-th output group channels will be only connected to the i-th input group channels.

For example:

group conv - ResNext and Xception: a new way of thinking about models
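The connectivity rule for group can be sketched in a few lines of Python. This is an illustration of the rule as described in the documentation, not Caffe code; both helper names are hypothetical, and the sketch assumes that the input and output channel counts are divisible by group.

```python
def group_connectivity(in_channels, num_output, group):
    """Map each output channel to the input channels it is connected to."""
    if in_channels % group or num_output % group:
        raise ValueError("in_channels and num_output must be divisible by group")
    in_per_group = in_channels // group
    out_per_group = num_output // group
    mapping = {}
    for o in range(num_output):
        g = o // out_per_group  # index of the group this output channel is in
        mapping[o] = list(range(g * in_per_group, (g + 1) * in_per_group))
    return mapping


def group_weight_shape(in_channels, num_output, group, kernel_h, kernel_w):
    """Each filter only spans the input channels of its own group."""
    return (num_output, in_channels // group, kernel_h, kernel_w)


mapping = group_connectivity(in_channels=4, num_output=6, group=2)
print(mapping[0])  # [0, 1]  first output group sees input channels 0-1
print(mapping[5])  # [2, 3]  second output group sees input channels 2-3
print(group_weight_shape(4, 6, 2, 3, 3))  # (6, 2, 3, 3)
```

With group = 2 the filters are half as deep as in the ungrouped case, so the layer holds half as many weights for the same num_output.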

4. GIF illustrations

The animations in Github - conv_arithmetic illustrate these cases very well.

In the figures below, blue maps are inputs, and cyan maps are outputs.

  • No padding, no strides
  • Arbitrary padding, no strides
  • Half padding, no strides
  • Full padding, no strides
  • No padding, strides
  • Padding, strides
  • Padding, strides (odd)
  • Dilated convolution - No padding, no stride, dilation
