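Source: https://medium.com/@CinnamonAITaiwan/cnn模型-resnet-mobilenet-densenet-shufflenet-efficientnet-5eba5c8df7e4
The evolution of CNNs
The figure below compares model size and accuracy for the CNN models in common use before 2018. There are plenty of well-written articles online on how CNNs evolved [LeNet/AlexNet/VGG/Inception/ResNet], so today we introduce several more recent CNN models, how to build them, and where their advantages lie.
Comparison of CNN models
Classic CNN architectures
To appreciate what the newer models offer, a few basic architectural ideas have to come first: Inception, residual networks, and Depthwise Separable Convolution.
Inception
The Inception architecture was first proposed by Google in 2014. Its goal is to combine kernels with different receptive fields. How is that done? Take a look at the figure below:
The Inception architecture
The figure shows the classic Inception architecture. Four branches follow the input feature maps. Three of them first pass through a 1*1 kernel for compression, which mainly controls the depth of the output channels and also adds nonlinearity to the model; the remaining one first passes through a 3*3 kernel. To make sure every output feature map has the same height and width, we rely on padding: a 1*1 kernel already yields an output the same size as its input, while the 3*3 and 5*5 kernels use a padding of 1 and 2 respectively. In TensorFlow and Keras the quickest way is to set padding='same', which keeps the output size unchanged when the stride is 1. The implementation is as follows: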
import tensorflow as tf  # TensorFlow 1.x API (tf.layers)

def Inception(input_data, input_depth = 192):
    with tf.name_scope('Branch_1'):  # 1*1
        X_1 = tf.layers.conv2d(input_data, 64, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)
    with tf.name_scope('Branch_2'):  # 1*1 -> 3*3
        X_2 = tf.layers.conv2d(input_data, 96, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
        X_2 = tf.layers.conv2d(X_2, 128, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
    with tf.name_scope('Branch_3'):  # 1*1 -> 3*3 -> 5*5
        X_3 = tf.layers.conv2d(input_data, 16, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 32, (5, 5), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
    with tf.name_scope('Branch_4'):  # pooling -> 1*1
        X_4 = tf.layers.max_pooling2d(input_data, 2, 1, padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)
        X_4 = tf.layers.conv2d(X_4, 32, (1, 1), padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)
    # stack the four branches along the channel axis
    out = tf.concat((X_1, X_2, X_3, X_4), axis = 3)
    return out
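The padding rule described for Inception (1 for a 3*3 kernel, 2 for a 5*5 kernel) can be checked with the usual output-size formula. The following is only a small sketch in plain Python: the helper names and the 28-pixel feature-map size are assumptions made for illustration, while the branch depths 64, 128, 32 and 32 are the final filter counts of the four branches in the Inception function.

```python
def conv_output_size(n, kernel, padding, stride=1):
    """Output size along one spatial dimension of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Padding that keeps the size unchanged for an odd kernel at stride 1."""
    return (kernel - 1) // 2

size = 28  # hypothetical feature-map height/width
for k in (1, 3, 5):
    p = same_padding(k)
    print(k, p, conv_output_size(size, k, p))

# Depth after concatenating the four branches along the channel axis
branch_depths = (64, 128, 32, 32)
total_depth = sum(branch_depths)
print(total_depth)
```

Because every branch keeps the same height and width, only the channel counts add up at the concat.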
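Residual networks
Residual block
The figure above shows the classic residual block. The input x is carried over a skip connection and added to F(x), the output of 2-3 layers, so the output becomes y = F(x) + x. The benefit is that backpropagation always keeps at least one term equal to 1, which lowers the chance of vanishing gradients.
What does that mean? Take the partial derivative of the output y with respect to x in the function above: one term is x differentiated with respect to itself, which is 1. Every factor in the chain rule therefore keeps a 1, the gradient is less likely to vanish, and deeper networks can be built.
A TensorFlow implementation of the residual block: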
def Residual_Block(input_data, in_channel, out_channel, s = 1):
    X_shortcut = input_data  ## remember the input
    X = tf.layers.conv2d(input_data, out_channel, (1, 1), strides = (s, s))
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)
    # the stride is applied once (in the first conv), so the size is reduced only once
    X = tf.layers.conv2d(X, out_channel, (3, 3), padding = 'same', strides = (1, 1))
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)
    X = tf.layers.conv2d(X, out_channel, (1, 1), strides = (1, 1))
    X = tf.layers.batch_normalization(X)
    if in_channel != out_channel or s != 1:
        # project the shortcut so its depth and size match the residual branch
        X_shortcut = tf.layers.conv2d(X_shortcut, out_channel, (1, 1), strides = (s, s))
        X_shortcut = tf.layers.batch_normalization(X_shortcut)
    X = X + X_shortcut
    X = tf.nn.relu(X)
    return X
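The gradient argument for the skip connection can be checked with a toy calculation. The sketch below is an illustration under simplifying assumptions, not a model of a real network: every layer is reduced to a scalar function F(x) = w * x, and the weight 0.01 and depth 20 are made-up values.

```python
def plain_chain_gradient(w, depth):
    """d(output)/d(input) for `depth` stacked layers y = w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= w            # each layer contributes dF/dx = w
    return grad

def residual_chain_gradient(w, depth):
    """Same stack, but every layer is y = w * x + x."""
    grad = 1.0
    for _ in range(depth):
        grad *= (w + 1.0)    # dF/dx + 1: the skip path contributes the 1
    return grad

plain = plain_chain_gradient(0.01, 20)
residual = residual_chain_gradient(0.01, 20)
print(plain, residual)
```

With small weights the plain stack multiplies twenty small factors and the gradient all but disappears, while each factor of the residual stack stays close to 1.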
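Depthwise Separable Convolution
The figure above shows the architecture of Depthwise Separable Convolution. Unlike a regular convolution, it splits into two steps:
Step one convolves the input feature maps with k*k kernels whose depth equals that of the input (Depthwise); each feature map is convolved with its own kernel independently.
Step two then convolves with 1*1 kernels, as many as the output depth (Pointwise).
The benefit is a large saving in parameters. Let us count the difference below: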
import tensorflow as tf

# count the total number of trainable parameters in the default graph
def get_num_params():
    total_parameters = 0
    for variable in tf.trainable_variables():
        shape = variable.get_shape()
        variable_parameters = 1
        for dim in shape:
            variable_parameters *= dim.value
        total_parameters += variable_parameters
    return total_parameters

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
# padding = 'same' keeps the output at 300*300
X = tf.layers.conv2d(inputs, 64, (3, 3), strides = (1, 1), padding = 'same', activation = tf.nn.leaky_relu)
print(get_num_params())  ## (3*3*3 + 1)*64 = 1792

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.separable_conv2d(inputs, 64, (3, 3), padding = 'SAME')
print(get_num_params())  ## 3*3*3 + (1*1*3 + 1)*64 = 283
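As the code above shows, for the same 300*300*64 output, the separable convolution needs roughly 1/6 of the parameters of a regular convolution (283 versus 1792), which is how the model is made lightweight.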
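The two counts can also be reproduced without building a graph. This is a minimal sketch in plain Python; the function names are hypothetical helpers, and the formulas assume one bias per output channel on the regular and the pointwise convolution and no bias on the depthwise step.

```python
def conv2d_params(kernel, in_depth, out_depth):
    """Weights plus one bias per output channel of a regular convolution."""
    return (kernel * kernel * in_depth + 1) * out_depth

def separable_conv2d_params(kernel, in_depth, out_depth):
    """k*k depthwise weights per input channel, then a 1*1 pointwise conv with bias."""
    depthwise = kernel * kernel * in_depth
    pointwise = (in_depth + 1) * out_depth
    return depthwise + pointwise

regular = conv2d_params(3, 3, 64)
separable = separable_conv2d_params(3, 3, 64)
print(regular, separable, separable / regular)
```

Ignoring biases, the ratio works out to 1/out_depth + 1/kernel^2, so the saving grows as the layers get wider.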
Reference code for Depthwise Separable Convolution:
import tensorflow as tf
import tensorflow.contrib as tc  # tf.contrib exists only in TensorFlow 1.x
slim = tc.slim
tf.reset_default_graph()

## define a standalone depthwise_conv layer
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = normalizer_fn(x) if normalizer_fn is not None else x  # batch normalization
        x = activation_fn(x) if activation_fn is not None else x  # nonlinearity
        return x

inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out = depthwise_conv(
    inputs, kernel=3, stride=1, padding='SAME',
    activation_fn=None, normalizer_fn=None,
    weights_initializer=tf.contrib.layers.xavier_initializer(),
    data_format='NHWC', scope='depthwise_conv')
print(get_num_params())  ## 3*3*3 = 27

## simpler with slim
def depthwise_conv_bn(x, kernel_size, stride=1, dilation=1):
    with tf.variable_scope(None, 'depthwise_conv_bn'):
        # num_outputs=None skips the pointwise step, leaving a depthwise convolution
        x = slim.separable_conv2d(x, None, kernel_size, depth_multiplier=1, stride=stride,
                                  rate=dilation)
        # x = slim.batch_norm(x, activation_fn=None, fused=False)
    return x

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out = depthwise_conv_bn(inputs, (3,3), stride=1, dilation=1)
print(get_num_params())  ## 3*3*3 = 27 depthwise weights (plus 3 biases if slim's default biases_initializer is left on)
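CNN models
ResNetV2
(a) ResNetV1, (e) ResNetV2, and other variants
ResNetV2 also comes from Kaiming He's team. It inherits the residual idea of ResNetV1 but changes both the identity branch (left path) and the residual branch (right path).
Removing the ReLU after the Residual Block
The authors argue that placing a ReLU after every residual block makes the forward-propagated signal monotonically increasing, which lowers the representational power.
Removing the BN on the identity branch
With the approach in panel (b), the BN layer changes the distribution of the information on the identity branch and slows convergence. The paper also uses a small trick: a 1*1 kernel first compresses the depth and a final 1*1 kernel restores it, which cuts the computation.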
def ResentV2_block(input_data, input_depth, compress_depth, output_depth, strides = (1, 1)):
    X_shortcut = input_data
    X = tf.layers.conv2d(input_data, compress_depth, (1, 1))  ## compress first
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, compress_depth, (3, 3), padding = 'same', strides = strides)
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, output_depth, (1, 1))  ## then expand again
    if input_depth != output_depth:
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1), strides = strides, padding = 'same')  ## depth differs
    if (input_depth == output_depth) and (strides != (1, 1)):
        X_shortcut = tf.image.resize_images(X_shortcut, X.shape.as_list()[1:3], method = 0)  ## size differs
    # no ReLU after the addition and no BN on the identity branch
    out = X_shortcut + X
    return out
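With the Residual_Block in hand, you can rebuild the ResNetV2 model from the parameters given in the paper. The paper contains further variations, such as several variants of the Residual_Block, which interested readers can explore in more depth.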
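The saving from the compress-then-restore trick with 1*1 kernels can be estimated by counting weights. The sketch below is plain Python with a hypothetical helper; the depths 256 and 64 are assumed values chosen for illustration rather than taken from the article, and biases and batch normalization are ignored.

```python
def conv_weights(kernel, in_depth, out_depth):
    """Number of weights in a k*k convolution, biases ignored."""
    return kernel * kernel * in_depth * out_depth

depth, compressed = 256, 64  # hypothetical depths

# two 3*3 convolutions at full depth
plain = 2 * conv_weights(3, depth, depth)
# 1*1 compress -> 3*3 at reduced depth -> 1*1 restore
bottleneck = (conv_weights(1, depth, compressed)
              + conv_weights(3, compressed, compressed)
              + conv_weights(1, compressed, depth))
print(plain, bottleneck, plain / bottleneck)
```

The expensive 3*3 convolution only ever sees the compressed depth, which is where most of the saving comes from.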
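Inception-ResNet
InceptionResnet-A block
Inception-ResNet, for example Inception-ResNetV2 (published together with InceptionV4), is another model in frequent use today. With the Inception and Residual Block ideas covered above, Inception-ResNet is easy to understand: the core of the model is to replace the residual branch of the Residual Block with an Inception structure. The paper proposes three different combinations; here we implement the InceptionResnet-A block.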
def InceptionResentA_block(input_data, input_depth = 3, output_depth = 384):
    X_shortcut = input_data
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)
    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
        X_2 = tf.layers.conv2d(X_2, 32, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 64, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
    # the Inception branches form the residual branch
    out = tf.concat((X_1, X_2, X_3), axis = 3)
    out = tf.layers.conv2d(out, output_depth, (1, 1))
    if input_depth != output_depth:
        # project the shortcut when the depths differ
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1))
    out = X_shortcut + out
    return out
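DenseNet
DenseNet architecture
DenseNet is one of the representative lightweight models. The code below implements a Dense_Stage block (it also brings in Depthwise separable convolution to save more parameters and speed the model up; the original paper uses regular convolution layers):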
def Dense_Stage(inputs_, depth=64, repeat=8):
    for _ in range(repeat):
        X_input = inputs_
        X = tf.layers.conv2d(inputs_, depth, (1,1), strides=(1,1), activation=tf.nn.leaky_relu)
        X = tf.layers.batch_normalization(X)
        X = tf.layers.separable_conv2d(X, depth, (3,3), padding='SAME')
        X = tf.nn.leaky_relu(X)
        X = tf.layers.batch_normalization(X)
        X = tf.concat([X_input, X], 3)
        inputs_ = X
    return X
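Because every iteration of Dense_Stage concatenates its input with the new feature maps, the channel depth grows by `depth` on each repeat. The following sketch in plain Python traces that growth; the helper name and the input depth of 64 are assumptions for illustration, while growth 64 and 8 repeats are the defaults of Dense_Stage.

```python
def dense_stage_depths(input_depth, growth, repeat):
    """Channel depth after each concat in a dense stage."""
    depths = []
    depth = input_depth
    for _ in range(repeat):
        depth += growth          # concat([X_input, X]) adds `growth` channels
        depths.append(depth)
    return depths

depths = dense_stage_depths(input_depth=64, growth=64, repeat=8)
print(depths)
```

The 1*1 convolution at the start of each iteration brings the depth back down to `depth`, which keeps the 3*3 step cheap even though the concatenated tensor keeps widening.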
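ShuffleNetV2
When it comes to lightweight models, ShuffleNet is arguably the standout among the commonly used ones. Lightweight models fall into two main branches: SqueezeNet from UC Berkeley and Stanford University, and MobileNet from Google. Depthwise separable convolution is the building block that MobileNet popularized, while SqueezeNet works on a principle very similar to Inception, so we will not go into it here.
ShuffleNet builds on SqueezeNet with some changes, and its principle resembles Depthwise separable convolution in several ways. Depthwise separable convolution consists of a Depthwise plus a Pointwise convolution. The Pointwise convolution is needed because channels do not exchange information in the Depthwise step: each kernel convolves only one feature map and cannot see the global information. In ShuffleNet, Group Convolution has the same problem of isolated channels (see the figure below; it is very similar to Depthwise). Unlike MobileNet, which solves this with Pointwise convolution, ShuffleNet uses a "shuffle": it shuffles the feature maps of the different groups before sending them to the next layer. This saves the parameters of the Pointwise convolution as well and reaches the "ultra-lightweight" level.
Group Convolution
Good. With these basics in place, let us look at the important changes ShuffleNetV2 makes relative to V1:
1*1 convolution
First, V1 uses many 1*1 convolutions, which raises the MAC (memory access cost). In Depthwise Separable Convolution the Pointwise Convolution accounts for most of the computation and parameters, so V2 first splits the feature maps entering the block.
Concat at the output
The authors found that element-wise operations such as addition and ReLU are also a major cause of higher MAC, so V2 uses Concat in place of V1's Add.
ShuffleNetV1 and ShuffleNetV2: (a) basic V1 unit, (b) V1 unit with downsampling, (c) basic V2 unit, (d) V2 unit with downsampling
The code below demonstrates how to build a ShuffleNetV2 block. The main thing to watch is that the channel depth of the feature map must be divisible by shuffle_group.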
## reference: https://github.com/timctho/shufflenet-v2-tensorflow/blob/master/module.py
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py
def shuffle_unit(x, groups):  ## plain shuffle of the feature maps output by depthwise_conv
    with tf.variable_scope('shuffle_unit'):
        n, h, w, c = x.get_shape().as_list()
        x = tf.reshape(x, shape=([tf.shape(x)[0], h, w, groups, c // groups]))
        x = tf.transpose(x, tf.convert_to_tensor([0, 1, 2, 4, 3]))
        x = tf.reshape(x, shape=[tf.shape(x)[0], h, w, c])
    return x

def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):  ## plain depthwise_conv
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        # here any non-None normalizer_fn / activation_fn just switches the step on
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x

def conv_bn_relu(x, out_channel, kernel_size, stride=1):  ## plain Convolution, BN, ReLU
    with tf.variable_scope(None, 'conv_bn_relu'):
        x = tf.layers.conv2d(x, out_channel, kernel_size, stride)
        x = tf.nn.leaky_relu(tf.layers.batch_normalization(x))
    return x

# depthwise_conv_bn and get_num_params are the helpers defined in the earlier code
def shufflenet_v2_block(x, out_channel, kernel_size, stride=1, shuffle_group=2):  ## shufflenet_v2_block
    with tf.variable_scope(None, 'shuffle_v2_block'):
        if stride == 1:
            top, bottom = tf.split(x, num_or_size_splits=2, axis=3)
            half_channel = out_channel // 2
            top = conv_bn_relu(top, half_channel, 1)
            top = depthwise_conv_bn(top, kernel_size, stride)
            top = conv_bn_relu(top, half_channel, 1)
            out = tf.concat([top, bottom], axis=3)
            out = shuffle_unit(out, shuffle_group)
        else:  ## block with downsampling
            half_channel = out_channel // 2
            b0 = conv_bn_relu(x, half_channel, 1)
            b0 = depthwise_conv_bn(b0, kernel_size, stride)
            b0 = conv_bn_relu(b0, half_channel, 1)
            b1 = depthwise_conv_bn(x, kernel_size, stride)
            b1 = conv_bn_relu(b1, half_channel, 1)
            out = tf.concat([b0, b1], axis=3)
            out = shuffle_unit(out, shuffle_group)
        return out

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 4])
# with stride 1, out_channel equals the input depth so the concat stays divisible by shuffle_group
out = shufflenet_v2_block(inputs, 4, (3,3), stride=1, shuffle_group=2)
print(get_num_params())
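The reshape, transpose, reshape sequence in shuffle_unit is easier to follow on a flat list of channel labels. The sketch below is plain Python and mirrors only the reordering along the channel axis; the function name and the channel labels are made up for illustration.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat channel list the way reshape -> transpose -> reshape does."""
    assert len(channels) % groups == 0, "channel count must be divisible by groups"
    per_group = len(channels) // groups
    # reshape to (groups, per_group)
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # transpose to (per_group, groups), then flatten
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]

shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
print(shuffled)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, neighbouring channels come from different groups, so the next grouped or split operation sees information from both.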
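EfficientNet
EfficientNet was proposed by Google in 2019. Using Google's AutoML techniques, the authors built eight efficient models, B0 through B7. Looking at the details, its bottleneck is MobileNetV2's Inverted Residual Block combined with Squeeze-and-Excitation Networks, so once we can build an MBConv block we can reproduce the EfficientNet architecture. Let us first see which important changes MobileNetV2 makes relative to MobileNetV1 and ResNet.
EfficientNet-B0 architecture
Expand first, then compress
The authors argue that when a feature map with few channels passes through a ReLU, every value becomes greater than or equal to zero and a lot of information is lost. So whereas ResNet compresses first and MobileNetV1 applies Depthwise separable convolution directly, MobileNetV2 first expands the feature-map depth with a Pointwise convolution.
Skip connection
Unlike V1, V2 adopts the ResNet idea and adds a skip connection across the feature maps.
Linear activation at the output
As mentioned above, the authors consider ReLU unsuitable for feature maps with few channels, so the output layer uses a linear activation instead. If you do want a ReLU there, make sure the output channel depth is large enough.
Comparison of MobileNetV1, MobileNetV2, and ResNet
Here is a comparison with ShuffleNet from the other camp. ShuffleNetV2 had not been released when MobileNetV2 came out, so the figure compares against ShuffleNetV1.
Comparison of ShuffleNet and MobileNetV2
The code below shows how to build the Residual_block of MobileNetV2.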
def depthwise_conv(x, kernel = 3, stride = 1, padding = 'SAME',
                   activation_fn = None, normalizer_fn = None,
                   weights_initializer = tf.contrib.layers.xavier_initializer(),
                   data_format = 'NHWC', scope = 'depthwise_conv'):
    ## plain depthwise_conv
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable('depthwise_weights',
                            [kernel, kernel, in_channels, 1], dtype = tf.float32,
                            initializer = weights_initializer)
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format = 'NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x

def res_block(input, expansion_ratio, output_dim, stride, name, bias = False, shortcut = True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio * input.get_shape().as_list()[-1])
        net = tf.layers.conv2d(input, bottleneck_dim, (1, 1), name = 'pw',
                               kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias)  ## expand first
        net = tf.layers.batch_normalization(net, name = 'pw_bn')
        net = tf.nn.relu6(net)
        # dw (the stride is passed through so that stride > 1 actually downsamples)
        net = depthwise_conv(net, stride = stride)
        net = tf.layers.batch_normalization(net, name = 'dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1, 1), name = 'pw_linear',
                               kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias)  ## compress back to the output depth
        net = tf.layers.batch_normalization(net, name = 'pw_linear_bn')
        # element wise add, only for stride==1
        if shortcut and stride == 1:
            in_dim = int(input.get_shape().as_list()[-1])
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim, (1, 1), name = 'ex_dim',
                                       kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias)
                net = ins + net
            else:
                net = input + net
        return net
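To see what expanding first costs, the weights of the three convolutions in res_block can be counted by hand. This is a sketch in plain Python under stated assumptions: the helper name is hypothetical, the depths 24 and the expansion ratio 6 are illustrative values, and biases and batch normalization are left out.

```python
def inverted_residual_weights(in_depth, expansion_ratio, out_depth, kernel=3):
    """Weights of the pointwise-expand, depthwise and pointwise-project convolutions."""
    bottleneck = round(expansion_ratio * in_depth)
    expand = in_depth * bottleneck            # 1*1 pointwise, widens the feature map
    depthwise = kernel * kernel * bottleneck  # one k*k filter per widened channel
    project = bottleneck * out_depth          # 1*1 pointwise with linear output
    return bottleneck, expand + depthwise + project

bottleneck, weights = inverted_residual_weights(in_depth=24, expansion_ratio=6, out_depth=24)
# a regular 3*3 convolution at the widened depth, for comparison
regular_at_bottleneck = 3 * 3 * bottleneck * bottleneck
print(bottleneck, weights, regular_at_bottleneck)
```

The 3*3 filtering runs on the widened feature map, yet because it is depthwise it accounts for only a small share of the block's weights.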
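SENet (Squeeze-and-Excitation Networks)
SENet block
With the Inverted Residual Block in hand, we still need Squeeze-and-Excitation Networks. The core idea of SENet is to let the network learn a weight for each feature map, so that useful feature maps get large weights and useless or weak ones get small weights, and the model trains to a better result. I find it rather similar to Attention.
The architecture combined with a residual connection is shown above. Global Average Pooling gathers global information (Squeeze); FC layers extract semantic information, compressing first and then expanding (Excitation); finally the coefficient obtained for each feature map is multiplied back onto the original input (using ReLU in the compressed middle layer feels a little unsuitable?). The code below combines this with the Inverted residual block to build MBConv.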
import tensorflow as tf

def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):  ## plain depthwise_conv
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x

def MBConvBlock(input, expansion_ratio, output_dim, stride, name, squeeze, bias=False, shortcut=True,
                use_Squeeze_Excitation=True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio*input.get_shape().as_list()[-1])
        net = tf.layers.conv2d(input, bottleneck_dim, (1,1), name='pw',
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias)  ## expand first
        net = tf.layers.batch_normalization(net, name='pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride=stride)
        net = tf.layers.batch_normalization(net, name='dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1,1), name='pw_linear',
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias)  ## compress back to the output depth
        net = tf.layers.batch_normalization(net, name='pw_linear_bn')
        # SENET-Squeeze-Excitation
        if use_Squeeze_Excitation:
            in_dim = int(net.get_shape().as_list()[-1])
            # Squeeze: global average pooling over the full height and width
            Squeeze = tf.layers.average_pooling2d(net, net.get_shape().as_list()[1:-1], 1)
            Squeeze = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze))
            # Excitation: no ReLU before the sigmoid, otherwise every gate would be >= 0.5
            Excitation = tf.layers.dense(Squeeze, use_bias=False, units=output_dim)
            Excitation = tf.nn.sigmoid(Excitation)
            net = tf.reshape(Excitation, [-1,1,1,output_dim])*net
        in_dim = int(input.get_shape().as_list()[-1])
        if shortcut and stride == 1:
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim, (1,1), name='ex_dim',
                                       kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias)
                net = ins + net
            else:
                net = input + net
        return net

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 43])
out = MBConvBlock(inputs, 4, 64, 1, 'first', 4, bias=False, shortcut=True, use_Squeeze_Excitation=True)
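The squeeze, excitation and scale steps can be traced on a tiny example without TensorFlow. The sketch below uses plain Python lists; the function name, the 2*2 feature maps and both weight matrices are made-up values for illustration, and it follows the usual FC, ReLU, FC, sigmoid ordering.

```python
import math

def squeeze_excitation(feature_maps, w_reduce, w_expand):
    """feature_maps: list of C channels, each a 2-D list. Returns (gates, rescaled maps)."""
    # Squeeze: global average pooling, one number per channel
    squeezed = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation, step 1: FC layer that reduces the depth, followed by ReLU
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    # Excitation, step 2: FC layer that restores the depth, followed by sigmoid
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every channel by its gate
    scaled = [[[v * g for v in r] for r in fm] for fm, g in zip(feature_maps, gates)]
    return gates, scaled

maps = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0, mean 1
        [[2.0, 2.0], [2.0, 2.0]]]   # channel 1, mean 2
w_reduce = [[1.0, 1.0]]             # 2 channels -> 1 hidden unit (made-up weights)
w_expand = [[0.0], [1.0]]           # 1 hidden unit -> 2 gates (made-up weights)
gates, scaled = squeeze_excitation(maps, w_reduce, w_expand)
print(gates)
```

Each channel ends up multiplied by a single learned coefficient between 0 and 1, which is the per-feature-map weighting described for SENet.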
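MobileNetV3
All one can say is that the experts publish papers faster than we can read them. MobileNetV3 inherits the Depthwise Separable Convolution of MobileNetV1 and the skip connection and expand-then-compress idea of MobileNetV2, and adds Squeeze-and-Excitation Networks, so the overall architecture closely resembles EfficientNet's MBConvBlock. Beyond that, MobileNetV3 changes the activation functions:
In some blocks ReLU is replaced by H-swish and Sigmoid by H-sigmoid. H-swish is modeled on the swish function and exists mainly because swish is slow to compute; the authors' experiments confirm that using H-swish improves accuracy.
The H-swish activation function
Differences between the activation functions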
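To get a feel for how closely H-swish tracks swish, both can be evaluated at a few points. This is a small sketch in plain Python using the definitions h-sigmoid(x) = ReLU6(x + 3) / 6 and h-swish(x) = x * h-sigmoid(x); the sample points are arbitrary choices.

```python
import math

def relu6(x):
    return min(max(x, 0.0), 6.0)

def h_sigmoid(x):
    """Piecewise-linear stand-in for sigmoid: relu6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    """Piecewise-linear stand-in for swish: x * relu6(x + 3) / 6."""
    return x * h_sigmoid(x)

def swish(x):
    return x / (1.0 + math.exp(-x))

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, round(h_swish(x), 4), round(swish(x), 4))
```

H-swish needs only an addition, a clamp and a multiplication, with no exponential, which is the source of the speed advantage over swish.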
The figure below compares MobileNetV3 with MobileNetV2. At the same latency, the MobileNetV3 models come out ahead in Top-1 accuracy.
MobileNetV3 vs MobileNetV2
The code below implements the MobileNetV3 bottleneck.
import tensorflow as tf

def Hswish(input_):
    return input_ * tf.nn.relu6(input_ + 3.) / 6.

def Hsigmoid(input_):
    return tf.nn.relu6(input_ + 3.) / 6.

def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):  ## plain depthwise_conv
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x

def SEBlock(input_, squeeze=4):
    in_dim = int(input_.get_shape().as_list()[-1])
    # Squeeze: global average pooling over the full height and width
    Squeeze = tf.layers.average_pooling2d(input_, input_.get_shape().as_list()[1:-1], 1)
    Squeeze = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze))
    # no ReLU before the gate, otherwise every coefficient would be >= 0.5
    Excitation = tf.layers.dense(Squeeze, use_bias=False, units=in_dim)
    Excitation = Hsigmoid(Excitation)  ## Hsigmoid replaces Sigmoid
    Excitation = tf.reshape(Excitation, [-1,1,1,in_dim])
    return input_*Excitation

def MobileV3Bottleneck(input_, expand_size, squeeze, out_size, kernel_size, stride=1, relu=True, se=True):
    Shortcut = input_
    in_dim = int(input_.get_shape().as_list()[-1])
    out = tf.layers.batch_normalization(tf.layers.conv2d(input_, expand_size, (1,1), (1,1), use_bias=False))
    if relu:
        out = tf.nn.relu(out)  # or relu6
    else:
        out = Hswish(out)
    out = depthwise_conv(out, kernel=kernel_size, stride=stride, padding='SAME')
    out = tf.layers.batch_normalization(out)
    if relu:
        out = tf.nn.relu(out)  # or relu6
    else:
        out = Hswish(out)
    out = tf.layers.batch_normalization(tf.layers.conv2d(out, out_size, (1,1), (1,1), use_bias=False))
    if (in_dim != out_size) and (stride == 1):
        # project the shortcut when the depths differ
        Shortcut = tf.layers.conv2d(Shortcut, out_size, (1,1), strides = (stride, stride), use_bias=False)
        Shortcut = tf.layers.batch_normalization(Shortcut)
    if se:
        assert squeeze <= out_size
        out = SEBlock(out, squeeze=squeeze)
    out = out + Shortcut if stride == 1 else out
    return out

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 80])
out = MobileV3Bottleneck(inputs, 480, 4, 112, 3, stride=1, relu=False, se=True)
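Conclusion
Today we introduced the core techniques of several recent CNN architectures. We hope you got something out of it, and that when building models in the future you will not be limited to pretrained models but can craft the model that best fits your own needs and ideas.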