This article works through Dr. Kaiming He's paper, Deep Residual Learning for Image Recognition, alongside an implementation of a 50-layer deep residual network. In theory, the solution space of a deeper network contains every solution a shallower network can express. In practice, however, as depth increases the accuracy saturates and then degrades, as the figure below shows: the 56-layer network performs worse than the 20-layer one. This degradation is not caused by overfitting, because the 56-layer network's training error is also higher.
Comparison of training error and test error for 56-layer and 20-layer networks
This is the degradation problem of deep neural networks. The residual learning approach proposed by Dr. He solves the degradation problem and has achieved great success in deep learning.
1. Residual Networks
The network architectures at each depth are as follows:
ResNet network architectures
The 50-layer network is obtained from the 34-layer network by replacing each building block of two 3x3 convolutional layers with a stack of three convolutions: 1x1, 3x3, 1x1. As the table below shows, the 50-layer network improves noticeably on the 34-layer network.
model | top-1 err. | top-5 err. |
---|---|---|
plain-34 | 28.54 | 10.02 |
ResNet-34 A | 25.03 | 7.76 |
ResNet-34 B | 24.52 | 7.46 |
ResNet-34 C | 24.19 | 7.40 |
ResNet-50 | 22.85 | 6.71 |
ResNet-101 | 21.75 | 6.05 |
ResNet-152 | 21.43 | 5.71 |
Code Implementation
The ResNet-50 implementation below follows the 50-layer architecture in the figure above exactly. For networks this deep, BN is indispensable; see my earlier article for an introduction to Batch Normalization and its implementation.
```python
import tensorflow as tf


class ResNet50(object):
    def __init__(self, inputs, num_classes=1000, is_training=True,
                 scope="resnet50"):
        self.inputs = inputs
        self.is_training = is_training
        self.num_classes = num_classes
        with tf.variable_scope(scope):
            # construct the model
            net = conv2d(inputs, 64, 7, 2, scope="conv1")  # -> [batch, 112, 112, 64]
            net = tf.nn.relu(batch_norm(net, is_training=self.is_training, scope="bn1"))
            net = max_pool(net, 3, 2, scope="maxpool1")  # -> [batch, 56, 56, 64]
            net = self._block(net, 256, 3, init_stride=1, is_training=self.is_training,
                              scope="block2")  # -> [batch, 56, 56, 256]
            net = self._block(net, 512, 4, is_training=self.is_training,
                              scope="block3")  # -> [batch, 28, 28, 512]
            net = self._block(net, 1024, 6, is_training=self.is_training,
                              scope="block4")  # -> [batch, 14, 14, 1024]
            net = self._block(net, 2048, 3, is_training=self.is_training,
                              scope="block5")  # -> [batch, 7, 7, 2048]
            net = avg_pool(net, 7, scope="avgpool5")  # -> [batch, 1, 1, 2048]
            net = tf.squeeze(net, [1, 2], name="SpatialSqueeze")  # -> [batch, 2048]
            self.logits = fc(net, self.num_classes, "fc6")  # -> [batch, num_classes]
            self.predictions = tf.nn.softmax(self.logits)
```
2. Building Block
Each block contains several sub-blocks, and each sub-block in turn consists of several convolutional layers. The first convolutional layer of the first sub-block in each block uses stride=2 to downsample the feature map (in the code, block2 is the exception: its init_stride is 1 because the preceding max-pooling layer has already halved the feature map).
Building block structure of a residual network
Code Implementation
```python
def _block(self, x, n_out, n, init_stride=2, is_training=True, scope="block"):
    with tf.variable_scope(scope):
        h_out = n_out // 4  # bottleneck width is a quarter of the output width
        out = self._bottleneck(x, h_out, n_out, stride=init_stride,
                               is_training=is_training, scope="bottleneck1")
        for i in range(1, n):
            out = self._bottleneck(out, h_out, n_out, is_training=is_training,
                                   scope=("bottleneck%s" % (i + 1)))
        return out
```
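To make the stride and channel bookkeeping concrete, here is a shape trace for `block3` as called in `ResNet50.__init__` above (a worked example read off the code, not additional functionality):

```python
# block3: input [batch, 56, 56, 256], n_out=512, n=4, default init_stride=2
#   bottleneck1:    h_out = 512 // 4 = 128, stride=2
#                   -> [batch, 28, 28, 512], projection shortcut (n_in != n_out)
#   bottleneck2..4: stride=None resolves to 1 because n_in == n_out == 512
#                   -> [batch, 28, 28, 512], identity shortcut
```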
3. Bottleneck Architectures
In the deeper networks (ResNet-50/101/152), to save computation the author redesigns the building block, replacing the original two 3x3 convolutional layers with a stack of three layers: 1x1, 3x3, 1x1.
The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.
Bottleneck block structure
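One way to see the saving is a back-of-the-envelope weight count (biases ignored): for a unit with 256 input/output channels, two 3x3 convolutions need far more weights than the 1x1-3x3-1x1 stack that first reduces to 64 channels and then restores 256:

```python
# weights of two 3x3 convolutions operating on 256 channels
two_3x3 = 2 * (3 * 3 * 256 * 256)                                    # = 1,179,648
# weights of the 1x1 -> 3x3 -> 1x1 bottleneck with 64 intermediate channels
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256   # = 69,632
```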
Code Implementation:
x: the input tensor, of shape [BatchSize, ImageHeight, ImageWidth, ChannelNum];
h_out: the number of convolution kernels (output channels) of the 1x1 reduction and 3x3 layers;
n_out: the number of output channels of the block;
stride: the convolution stride;
is_training: flag passed to Batch Normalization;
```python
def _bottleneck(self, x, h_out, n_out, stride=None, is_training=True, scope="bottleneck"):
    """A residual bottleneck unit."""
    n_in = x.get_shape()[-1]
    if stride is None:
        stride = 1 if n_in == n_out else 2
    with tf.variable_scope(scope):
        h = conv2d(x, h_out, 1, stride=stride, scope="conv_1")
        h = batch_norm(h, is_training=is_training, scope="bn_1")
        h = tf.nn.relu(h)
        h = conv2d(h, h_out, 3, stride=1, scope="conv_2")
        h = batch_norm(h, is_training=is_training, scope="bn_2")
        h = tf.nn.relu(h)
        h = conv2d(h, n_out, 1, stride=1, scope="conv_3")
        h = batch_norm(h, is_training=is_training, scope="bn_3")
        if n_in != n_out:
            # projection shortcut (1x1 convolution) when dimensions differ
            shortcut = conv2d(x, n_out, 1, stride=stride, scope="conv_4")
            shortcut = batch_norm(shortcut, is_training=is_training, scope="bn_4")
        else:
            # identity shortcut when dimensions match
            shortcut = x
        return tf.nn.relu(shortcut + h)
```
4. Shortcuts
Identity mapping is a core idea of deep residual networks. A building block of a deep residual network is expressed as

$$y = \mathcal{F}(x, \{W_i\}) + x$$

where $x$ is the layer input, $y$ is the layer output before the final ReLU activation, and $\mathcal{F}(x, \{W_i\})$ is the residual mapping to be learned. The formula above only handles the case where $\mathcal{F}(x, \{W_i\})$ and $x$ have the same dimensions; how should the case where the two dimensions differ be handled?
The author proposes two options: the zero-padding shortcut and the projection shortcut (a learned linear projection $W_s x$, implemented as a 1×1 convolution). Three shortcut schemes are compared in the experiments:
A) when the dimensions increase, pad the extra dimensions with zeros;
B) when the dimensions increase, use a projection shortcut; when the dimensions are unchanged, use the identity mapping;
C) use projection shortcuts whether or not the dimensions change.
The three schemes compare as follows:
model | top-1 err. | top-5 err. |
---|---|---|
plain-34 | 28.54 | 10.02 |
ResNet-34 A | 25.03 | 7.76 |
ResNet-34 B | 24.52 | 7.46 |
ResNet-34 C | 24.19 | 7.40 |
Accuracy improves slightly from A to B to C, but the differences among the three are small, so the author concludes that projection shortcuts are not essential for addressing the degradation problem. To avoid the extra computation and memory that projections bring, the paper largely avoids option C. The code above implements option B: projection shortcuts where the dimensions change, combined with identity shortcuts elsewhere.
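For comparison, option A (the parameter-free zero-padding shortcut) could be sketched roughly as below. This is not part of the implementation in this article, and the use of average pooling for the spatial downsampling is an assumption (the paper performs the identity mapping with a stride of 2):

```python
def zero_pad_shortcut(x, n_out, stride):
    """Sketch of option A: a parameter-free shortcut that downsamples spatially
    and pads the extra channels with zeros. Implementation details are assumptions."""
    n_in = int(x.get_shape()[-1])
    shortcut = x
    if stride > 1:
        # downsample spatially (average pooling is one common choice)
        shortcut = tf.nn.avg_pool(shortcut, [1, stride, stride, 1],
                                  [1, stride, stride, 1], padding="VALID")
    if n_out > n_in:
        # zero-pad the channel dimension up to n_out
        shortcut = tf.pad(shortcut, [[0, 0], [0, 0], [0, 0], [0, n_out - n_in]])
    return shortcut
```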
5. Implementation of Other Helper Functions
5.1 Variable Initializers
```python
fc_initializer = tf.contrib.layers.xavier_initializer
conv2d_initializer = tf.contrib.layers.xavier_initializer_conv2d
```
5.2 Helper for Creating Variables
```python
# create weight variable
def create_var(name, shape, initializer, trainable=True):
    return tf.get_variable(name, shape=shape, dtype=tf.float32,
                           initializer=initializer, trainable=trainable)
```
5.3 Convolution Helper
```python
# conv2d layer (no bias term; the batch norm that follows provides the shift)
def conv2d(x, num_outputs, kernel_size, stride=1, scope="conv2d"):
    num_inputs = x.get_shape()[-1]
    with tf.variable_scope(scope):
        kernel = create_var("kernel", [kernel_size, kernel_size,
                                       num_inputs, num_outputs],
                            conv2d_initializer())
        return tf.nn.conv2d(x, kernel, strides=[1, stride, stride, 1],
                            padding="SAME")
```
5.4 Fully Connected Helper
```python
# fully connected layer
def fc(x, num_outputs, scope="fc"):
    num_inputs = x.get_shape()[-1]
    with tf.variable_scope(scope):
        weight = create_var("weight", [num_inputs, num_outputs],
                            fc_initializer())
        bias = create_var("bias", [num_outputs, ],
                          tf.zeros_initializer())
        return tf.nn.xw_plus_b(x, weight, bias)
```
5.5 Batch Normalization (BN)
```python
from tensorflow.python.training import moving_averages


# batch norm layer
def batch_norm(x, decay=0.999, epsilon=1e-03, is_training=True,
               scope="scope"):
    x_shape = x.get_shape()
    num_inputs = x_shape[-1]
    reduce_dims = list(range(len(x_shape) - 1))
    with tf.variable_scope(scope):
        beta = create_var("beta", [num_inputs, ],
                          initializer=tf.zeros_initializer())
        gamma = create_var("gamma", [num_inputs, ],
                           initializer=tf.ones_initializer())
        # moving statistics, used for inference
        moving_mean = create_var("moving_mean", [num_inputs, ],
                                 initializer=tf.zeros_initializer(),
                                 trainable=False)
        moving_variance = create_var("moving_variance", [num_inputs, ],
                                     initializer=tf.ones_initializer(),
                                     trainable=False)
        if is_training:
            mean, variance = tf.nn.moments(x, axes=reduce_dims)
            update_move_mean = moving_averages.assign_moving_average(
                moving_mean, mean, decay=decay)
            update_move_variance = moving_averages.assign_moving_average(
                moving_variance, variance, decay=decay)
            tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_move_mean)
            tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_move_variance)
        else:
            mean, variance = moving_mean, moving_variance
        return tf.nn.batch_normalization(x, mean, variance, beta, gamma, epsilon)
```
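Because the moving-average updates are only added to the `tf.GraphKeys.UPDATE_OPS` collection, they must be attached to the training op explicitly; otherwise the moving statistics used at inference time are never updated. A minimal sketch of the usual TF1 pattern (the loss and the optimizer choice here are hypothetical placeholders):

```python
# `loss` and the optimizer are placeholders; only the control-dependency pattern matters
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.MomentumOptimizer(0.1, 0.9).minimize(loss)
```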
5.6 Pooling Functions
```python
# avg pool layer
def avg_pool(x, pool_size, scope):
    with tf.variable_scope(scope):
        return tf.nn.avg_pool(x, [1, pool_size, pool_size, 1],
                              strides=[1, pool_size, pool_size, 1], padding="VALID")


# max pool layer
def max_pool(x, pool_size, stride, scope):
    with tf.variable_scope(scope):
        return tf.nn.max_pool(x, [1, pool_size, pool_size, 1],
                              [1, stride, stride, 1], padding="SAME")
```
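Finally, a minimal smoke test (hypothetical usage, assuming `_block` and `_bottleneck` above are attached to the `ResNet50` class as methods and the helper functions live in the same module): build the graph and run one forward pass on random data to check the output shape.

```python
import numpy as np
import tensorflow as tf

# hypothetical usage of the code in this article
images = tf.placeholder(tf.float32, [None, 224, 224, 3], name="images")
model = ResNet50(images, num_classes=1000, is_training=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    logits = sess.run(model.logits,
                      feed_dict={images: np.random.rand(2, 224, 224, 3).astype(np.float32)})
    print(logits.shape)  # expected: (2, 1000)
```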