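Exclusive improvement in this article: VanillaNet for RT-DETR. The backbone is replaced with a network that is as simple and as shallow as possible. A depth-6 network reaches 76.36% accuracy on ImageNet, and the depth-13 VanillaNet reaches 83.1%.
Recommendation rating: five stars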
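1. VanillaNet
Paper: https://arxiv.org/pdf/2305.12972.pdf
Researchers from Huawei Noah's Ark Lab and the University of Sydney proposed VanillaNet, a minimalist neural network model. Following a minimalist design philosophy, the network contains only the simplest convolution operations and drops residual connections and attention modules, yet it achieves strong results on a variety of computer vision tasks.
VanillaNet is an elegantly designed neural network architecture. By avoiding high depth, shortcuts, and complex operations such as self-attention, it stays simple and clear while remaining powerful.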
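- Stem: a 4×4 convolution performs the feature transformation;
- Body: each stage first downsamples the features with MaxPool, then processes them with a 1×1 convolution;
- Head: two nonlinear layers perform classification.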
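The three parts listed above can be sketched in a few lines of PyTorch. This is only an illustrative toy, not the official VanillaNet: the class name `TinyVanillaSketch`, the channel widths, and the use of plain `nn.ReLU` are all assumptions made for the example, and the layer order inside each stage follows the description above literally.

```python
import torch
import torch.nn as nn


class TinyVanillaSketch(nn.Module):
    """Hypothetical, simplified illustration of the stem/body/head layout."""

    def __init__(self, in_chans=3, dims=(32, 64, 128), num_classes=10):
        super().__init__()
        # Stem: one 4x4 convolution with stride 4 (downsamples by 4)
        self.stem = nn.Sequential(
            nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),
            nn.ReLU(),  # assumption: plain ReLU as a stand-in activation
        )
        # Body: each stage downsamples with MaxPool, then applies a 1x1 conv
        stages = []
        for c_in, c_out in zip(dims[:-1], dims[1:]):
            stages.append(nn.Sequential(
                nn.MaxPool2d(kernel_size=2, stride=2),
                nn.Conv2d(c_in, c_out, kernel_size=1),
                nn.ReLU(),
            ))
        self.stages = nn.Sequential(*stages)
        # Head: global pooling, then two 1x1 conv layers with a nonlinearity between
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dims[-1], dims[-1], kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(dims[-1], num_classes, kernel_size=1),
        )

    def forward(self, x):
        x = self.stages(self.stem(x))
        return self.head(x).flatten(1)  # (batch, num_classes)


model = TinyVanillaSketch()
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

No shortcut connections or attention layers appear anywhere in the sketch; every layer is a convolution, a pooling operation, or an activation.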
A depth-6 network reaches 76.36% accuracy on ImageNet, and the depth-13 VanillaNet reaches 83.1%.
2. Integrating VanillaNet into RT-DETR
2.1 Add ultralytics/nn/backbone/VanillaNet.py
Core code:
import torch
import torch.nn as nn
# Assumed import (not shown in the original excerpt); in older timm versions
# the module lives at timm.models.layers instead.
from timm.layers import weight_init

# NOTE: `activation` and `Block` are used below but are not part of this excerpt;
# they are expected to be defined in the same file.


class VanillaNet(nn.Module):
    def __init__(self, in_chans=3, num_classes=1000, dims=[96, 192, 384, 768],
                 drop_rate=0, act_num=3, strides=[2, 2, 2, 1], deploy=False, ada_pool=None, **kwargs):
        super().__init__()
        self.deploy = deploy
        if self.deploy:
            # Deploy mode: a single fused 4x4 conv followed by the activation
            self.stem = nn.Sequential(
                nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),
                activation(dims[0], act_num)
            )
        else:
            # Training mode: 4x4 conv + BN, then 1x1 conv + BN + activation
            self.stem1 = nn.Sequential(
                nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),
                nn.BatchNorm2d(dims[0], eps=1e-6),
            )
            self.stem2 = nn.Sequential(
                nn.Conv2d(dims[0], dims[0], kernel_size=1, stride=1),
                nn.BatchNorm2d(dims[0], eps=1e-6),
                activation(dims[0], act_num)
            )

        # Negative slope of the leaky ReLU between stem1 and stem2;
        # with a slope of 1 the leaky ReLU is the identity.
        self.act_learn = 1

        self.stages = nn.ModuleList()
        # Stage i maps dims[i] -> dims[i + 1], so dims needs len(strides) + 1 entries.
        # The default arguments above do not satisfy this; pass matching dims and strides.
        for i in range(len(strides)):
            if not ada_pool:
                stage = Block(dim=dims[i], dim_out=dims[i + 1], act_num=act_num, stride=strides[i], deploy=deploy)
            else:
                stage = Block(dim=dims[i], dim_out=dims[i + 1], act_num=act_num, stride=strides[i], deploy=deploy,
                              ada_pool=ada_pool[i])
            self.stages.append(stage)
        self.depth = len(strides)

        self.apply(self._init_weights)
        # Record the channel count of every returned feature map via a dummy forward pass
        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

    def _init_weights(self, m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            weight_init.trunc_normal_(m.weight, std=.02)
            nn.init.constant_(m.bias, 0)

    def change_act(self, m):
        # Update the leaky ReLU slope in every stage and in the stem
        for i in range(self.depth):
            self.stages[i].act_learn = m
        self.act_learn = m

    def forward(self, x):
        input_size = x.size(2)
        scale = [4, 8, 16, 32]
        # One slot per downsampling ratio; a later stage at the same ratio overwrites the slot
        features = [None, None, None, None]
        if self.deploy:
            x = self.stem(x)
        else:
            x = self.stem1(x)
            x = torch.nn.functional.leaky_relu(x, self.act_learn)
            x = self.stem2(x)
        if input_size // x.size(2) in scale:
            features[scale.index(input_size // x.size(2))] = x
        for i in range(self.depth):
            x = self.stages[i](x)
            if input_size // x.size(2) in scale:
                features[scale.index(input_size // x.size(2))] = x
        return features

    def _fuse_bn_tensor(self, conv, bn):
        # Fold the BatchNorm statistics into the weight and bias of the preceding conv
        kernel = conv.weight
        bias = conv.bias
        running_mean = bn.running_mean
        running_var = bn.running_var
        gamma = bn.weight
        beta = bn.bias
        eps = bn.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta + (bias - running_mean) * gamma / std

    def switch_to_deploy(self):
        if not self.deploy:
            self.stem2[2].switch_to_deploy()
            # Fuse conv + BN in stem1
            kernel, bias = self._fuse_bn_tensor(self.stem1[0], self.stem1[1])
            self.stem1[0].weight.data = kernel
            self.stem1[0].bias.data = bias
            # Fuse conv + BN in stem2, then merge the resulting 1x1 conv into the 4x4 conv
            kernel, bias = self._fuse_bn_tensor(self.stem2[0], self.stem2[1])
            self.stem1[0].weight.data = torch.einsum('oi,icjk->ocjk', kernel.squeeze(3).squeeze(2),
                                                     self.stem1[0].weight.data)
            self.stem1[0].bias.data = bias + (self.stem1[0].bias.data.view(1, -1, 1, 1) * kernel).sum(3).sum(2).sum(1)
            self.stem = torch.nn.Sequential(*[self.stem1[0], self.stem2[2]])
            self.__delattr__('stem1')
            self.__delattr__('stem2')

            for i in range(self.depth):
                self.stages[i].switch_to_deploy()

            self.deploy = True
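The two fusion steps used in `switch_to_deploy` can be checked numerically on a toy stem. The following is a minimal sketch under stated assumptions: the helper names `fuse_conv_bn` and `merge_conv_then_1x1`, the layer sizes, and the random BatchNorm statistics are all hypothetical and chosen only for the example. It relies on the BatchNorm layers being in eval mode and on the leaky ReLU slope being 1, so that no nonlinearity sits between the two convolutions.

```python
import torch
import torch.nn as nn


def fuse_conv_bn(conv, bn):
    """Return (weight, bias) of one conv equivalent to conv followed by bn in eval mode."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = (bn.weight / std).reshape(-1, 1, 1, 1)
    weight = conv.weight * scale
    bias = bn.bias + (conv.bias - bn.running_mean) * bn.weight / std
    return weight, bias


def merge_conv_then_1x1(w1, b1, w2, b2):
    """Merge conv(w1, b1) followed directly by a 1x1 conv(w2, b2) into a single conv."""
    w2_mat = w2.squeeze(3).squeeze(2)                     # (out, mid)
    weight = torch.einsum('oi,icjk->ocjk', w2_mat, w1)    # (out, in, kh, kw)
    bias = b2 + w2_mat @ b1
    return weight, bias


torch.manual_seed(0)
conv1, bn1 = nn.Conv2d(3, 8, kernel_size=4, stride=4), nn.BatchNorm2d(8, eps=1e-6)
conv2, bn2 = nn.Conv2d(8, 8, kernel_size=1), nn.BatchNorm2d(8, eps=1e-6)

# Give the BatchNorm layers non-trivial statistics so the check is meaningful
for bn in (bn1, bn2):
    bn.running_mean.uniform_(-0.5, 0.5)
    bn.running_var.uniform_(0.5, 1.5)
    bn.weight.data.uniform_(0.5, 1.5)
    bn.bias.data.uniform_(-0.5, 0.5)
    bn.eval()

x = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    # Unfused path: conv + BN, leaky ReLU with slope 1 (identity), conv + BN
    hidden = nn.functional.leaky_relu(bn1(conv1(x)), 1.0)
    reference = bn2(conv2(hidden))

    # Fused path: fold each BN into its conv, then merge the 1x1 conv into the 4x4 conv
    w1, b1 = fuse_conv_bn(conv1, bn1)
    w2, b2 = fuse_conv_bn(conv2, bn2)
    w, b = merge_conv_then_1x1(w1, b1, w2, b2)
    merged = nn.functional.conv2d(x, w, b, stride=4)

    max_abs_diff = (reference - merged).abs().max().item()

print(f"max abs difference: {max_abs_diff:.2e}")
```

The printed difference should be at the level of floating-point rounding. If the leaky ReLU slope were anything other than 1, the two convolutions could no longer be merged exactly.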
For details, see:
https://blog.csdn.net/m0_63774211/article/details/134407373