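Lightweight YOLOv8: Battle of the Ghostnet, G_ghost and Ghostnetv2 Family (Part 3): Huawei's GhostNet Upgraded Again with G_ghost, a Minimalist AI Network Optimized Across All Hardware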

2023-11-02 15:08:08

1. Performance comparison of Ghostnet, G_ghost and Ghostnetv2

Each variant is integrated into YOLOv8 by combining its Bottleneck with C2f and using the resulting module to replace every C2f in the backbone.
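To make the idea of "combining a Bottleneck with C2f" concrete, here is a minimal, self-contained sketch. It is not the ultralytics implementation: `ConvBNAct`, `GhostModuleSketch`, `GhostBottleneckSketch` and `C2fSketch` are hypothetical names, and the code only assumes the general C2f pattern (split the features, stack bottlenecks, concatenate every intermediate output) with a Ghost-style bottleneck plugged in.

```python
import torch
import torch.nn as nn


class ConvBNAct(nn.Sequential):
    """Conv2d + BatchNorm2d + SiLU, the basic unit used in this sketch."""

    def __init__(self, c_in, c_out, k=1, groups=1):
        super().__init__(
            nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=groups, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(inplace=True),
        )


class GhostModuleSketch(nn.Module):
    """Half of the output comes from a 1x1 conv, the other half from a
    cheap depthwise conv applied to that first half."""

    def __init__(self, c_in, c_out):
        super().__init__()
        c_half = c_out // 2  # c_out is assumed to be even
        self.primary = ConvBNAct(c_in, c_half, k=1)
        self.cheap = ConvBNAct(c_half, c_half, k=3, groups=c_half)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)


class GhostBottleneckSketch(nn.Module):
    """Two ghost modules with a residual connection (channels unchanged)."""

    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(GhostModuleSketch(c, c), GhostModuleSketch(c, c))

    def forward(self, x):
        return x + self.body(x)


class C2fSketch(nn.Module):
    """C2f-style block whose inner bottleneck type is pluggable."""

    def __init__(self, c_in, c_out, n=1, block=GhostBottleneckSketch):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = ConvBNAct(c_in, 2 * self.c, k=1)
        self.cv2 = ConvBNAct((2 + n) * self.c, c_out, k=1)
        self.m = nn.ModuleList(block(self.c) for _ in range(n))

    def forward(self, x):
        # Split into two halves, then keep appending each bottleneck output.
        y = list(self.cv1(x).chunk(2, dim=1))
        for m in self.m:
            y.append(m(y[-1]))
        return self.cv2(torch.cat(y, dim=1))
```

Passing a different class as `block` is how the three variants compared in this section would differ under this sketch; everything outside the bottleneck stays the same.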

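| Model | Layers | Parameters | GFLOPs | Size (KB) |
| --- | --- | --- | --- | --- |
| YOLOv8s | 168 | 11125971 | 28.4 | 21991 |
| YOLOv8_C2f_GhostBottleneckV2s | 279 | 2553539 | 6.8 | 5250 |
| YOLOv8_C2f_GhostBottlenecks | 267 | 2553539 | 6.8 | 5248 |
| YOLOv8_C2f_g_ghostBottlenecks | 195 | 2581091 | 6.9 | 5283 |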
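The relative savings are easier to read as percentages. The short script below only does arithmetic on the figures listed in the comparison above; the helper names `reduction_pct` and `summarize` are made up for this illustration.

```python
# (layers, parameters, GFLOPs, size in KB), copied from the comparison above
MODELS = {
    "YOLOv8s": (168, 11125971, 28.4, 21991),
    "YOLOv8_C2f_GhostBottleneckV2s": (279, 2553539, 6.8, 5250),
    "YOLOv8_C2f_GhostBottlenecks": (267, 2553539, 6.8, 5248),
    "YOLOv8_C2f_g_ghostBottlenecks": (195, 2581091, 6.9, 5283),
}


def reduction_pct(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


def summarize(models, baseline="YOLOv8s"):
    """Return {name: (params %, GFLOPs %, size %)} reductions vs. the baseline."""
    _, base_params, base_flops, base_kb = models[baseline]
    summary = {}
    for name, (_, params, flops, kb) in models.items():
        if name == baseline:
            continue
        summary[name] = (
            reduction_pct(base_params, params),
            reduction_pct(base_flops, flops),
            reduction_pct(base_kb, kb),
        )
    return summary


if __name__ == "__main__":
    for name, (p, f, k) in summarize(MODELS).items():
        print(f"{name}: params -{p:.1f}%, GFLOPs -{f:.1f}%, size -{k:.1f}%")
```

All three variants cut parameters, GFLOPs and file size by roughly three quarters relative to YOLOv8s, and the G_ghost variant does so with noticeably fewer layers than the other two.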

2. Introduction to G_ghost

Paper: https://arxiv.org/abs/2201.03297

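GhostNet is one of the most popular lightweight neural network architectures of recent years and is already widely used on ARM and CPU devices. On parallel computing devices such as GPUs and NPUs, however, the original GhostNet shows no advantage. Researchers at Huawei Noah's Ark Lab recently addressed this by introducing cross-layer cheap operations tailored to GPU-like devices, which reduce both the amount of computation and the amount of memory data movement, and designed a GPU version of GhostNet on that basis. Experiments show that G-GhostNet achieves the best trade-off between speed and accuracy on existing GPU devices. On Huawei's own Ascend 310 NPU, G-GhostNet runs more than 30% faster than a ResNet of comparable size. The paper has been accepted by IJCV, a top computer vision journal.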

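As shown in the figure, the paper proposes two stage structures built on cross-layer cheap operations:

G-Ghost stage: given the input and output channel counts of the module (usually the two are equal), every convolutional layer from the second layer onward has half the output channel count. The remaining half of the output features is generated by applying a cheap operation to the output of the first convolutional layer.

G-Ghost stage with mix: building on the G-Ghost stage, every convolutional layer from the second layer onward again has half the output channel count, but the remaining half of the output features is generated by applying cheap operations to the outputs of all the preceding convolutional layers, one cheap operation per layer.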
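The two stage variants described above can be sketched in a few lines of PyTorch. This is a simplified illustration under stated assumptions, not the paper's or the repository's `Stage` class: `GGhostStageSketch` and `conv_bn_relu` are hypothetical names, plain 3x3 convolutions stand in for the real blocks, the cheap operation is assumed to be a 1x1 convolution with BatchNorm, and the mix path is assumed to be global average pooling followed by a 1x1 convolution.

```python
import torch
import torch.nn as nn


def conv_bn_relu(c_in, c_out, k=3, stride=1):
    """Plain conv block standing in for the real building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class GGhostStageSketch(nn.Module):
    def __init__(self, c_in, c_out, n_layers, stride=2, mix=False):
        super().__init__()
        assert n_layers >= 2 and c_out % 2 == 0
        c_half = c_out // 2
        # Layer 1 runs at full width and performs the downsampling.
        self.first = conv_bn_relu(c_in, c_out, stride=stride)
        # Layers 2..n run at half the output width.
        self.rest = nn.ModuleList(
            conv_bn_relu(c_half, c_half) for _ in range(n_layers - 1)
        )
        # Cheap operation on the other half of the first layer's output
        # (assumed here to be a 1x1 conv + BN).
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 1, bias=False),
            nn.BatchNorm2d(c_half),
        )
        self.mix = mix
        if mix:
            # One cheap projection per half-width layer (assumption:
            # global average pooling followed by a 1x1 conv).
            self.mix_proj = nn.ModuleList(
                nn.Conv2d(c_half, c_half, 1) for _ in range(n_layers - 1)
            )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.first(x)
        main, ghost_in = x.chunk(2, dim=1)
        ghost = self.cheap(ghost_in)  # cross-layer shortcut, computed once
        for i, layer in enumerate(self.rest):
            main = layer(main)
            if self.mix:
                pooled = main.mean(dim=(2, 3), keepdim=True)
                ghost = ghost + self.mix_proj[i](pooled)  # broadcast over H, W
        return torch.cat([main, self.act(ghost)], dim=1)
```

For example, `GGhostStageSketch(32, 64, n_layers=3)` maps a `(N, 32, 16, 16)` input to `(N, 64, 8, 8)`. Only the first layer touches all 64 channels; the later 3x3 convolutions work on 32 channels each, which is where the savings in computation and memory traffic come from.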

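In comparisons among lightweight neural networks, G-GhostNet achieves both the fastest inference speed and the highest accuracy. As the figure below shows, at an inference latency of 24 ms G-GhostNet exceeds 77.5% accuracy on ImageNet, well ahead of other networks such as MobileNetV3 and EfficientNet.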

3. Adding G-GhostNet to YOLOv8

3.1 Add ultralytics/nn/backbone/g_ghost.py

Core code:

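import torch
import torch.nn as nn

# Note: `Stage` and `conv1x1` are used below but are not shown in this excerpt;
# they are assumed to be defined elsewhere in g_ghost.py.
class GGhostRegNet(nn.Module):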

    def __init__(self, block, layers, widths, num_classes=1000, zero_init_residual=True,
                 group_width=1, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(GGhostRegNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 32
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False, False]
        if len(replace_stride_with_dilation) != 4:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 4-element tuple, got {}".format(replace_stride_with_dilation))
        self.group_width = group_width
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=2, padding=1,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        
        self.layer1 = self._make_layer(block, widths[0], layers[0], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        
        self.inplanes = widths[0]
        if layers[1] > 2:
            self.layer2 = Stage(block, self.inplanes, widths[1], group_width, layers[1], stride=2,
                          dilate=replace_stride_with_dilation[1], cheap_ratio=0.5) 
        else:      
            self.layer2 = self._make_layer(block, widths[1], layers[1], stride=2,
                                           dilate=replace_stride_with_dilation[1])
        
        self.inplanes = widths[1]
        self.layer3 = Stage(block, self.inplanes, widths[2], group_width, layers[2], stride=2,
                      dilate=replace_stride_with_dilation[2], cheap_ratio=0.5)
        
        self.inplanes = widths[2]
        if layers[3] > 2:
            self.layer4 = Stage(block, self.inplanes, widths[3], group_width, layers[3], stride=2,
                          dilate=replace_stride_with_dilation[3], cheap_ratio=0.5) 
        else:
            self.layer4 = self._make_layer(block, widths[3], layers[3], stride=2,
                                           dilate=replace_stride_with_dilation[3])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(widths[-1] * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes, stride),
                norm_layer(planes),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.group_width,
                            previous_dilation, norm_layer))
        self.inplanes = planes
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, group_width=self.group_width,
                                dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def _forward_impl(self, x):
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)

        return x

    def forward(self, x):
        return self._forward_impl(x)
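As listed, `GGhostRegNet` is a classifier: its forward pass ends in `avgpool`, `dropout` and `fc`. A detector usually consumes multi-scale feature maps instead of logits, so one possible way to reuse the network as a backbone is to tap the outputs of the later stages. The sketch below is an assumption about how this could be done, not the code from the linked post. `forward_features` is a hypothetical helper that relies only on the attribute names visible in the listing (`conv1`, `bn1`, `relu`, `layer1` to `layer4`), and `TinyBackbone` is a toy stand-in used because `Stage` and the block class are not shown.

```python
import torch
import torch.nn as nn


def forward_features(model, x):
    """Run the stem and the four stages, skipping avgpool/dropout/fc.

    Returns the outputs of layer2, layer3 and layer4, which sit at
    strides 8, 16 and 32 when every stage downsamples by 2.
    """
    x = model.relu(model.bn1(model.conv1(x)))
    c2 = model.layer1(x)
    c3 = model.layer2(c2)
    c4 = model.layer3(c3)
    c5 = model.layer4(c4)
    return [c3, c4, c5]


class TinyBackbone(nn.Module):
    """Hypothetical stand-in exposing the same attribute names as GGhostRegNet."""

    def __init__(self, widths=(16, 32, 64, 128)):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)
        c_prev = 32
        for i, w in enumerate(widths, start=1):
            stage = nn.Sequential(
                nn.Conv2d(c_prev, w, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True),
            )
            setattr(self, f"layer{i}", stage)  # registers layer1..layer4
            c_prev = w


if __name__ == "__main__":
    feats = forward_features(TinyBackbone().eval(), torch.randn(1, 3, 64, 64))
    print([tuple(f.shape) for f in feats])
```

With a real `GGhostRegNet` instance the same helper would return three maps whose channel counts follow `widths[1]`, `widths[2]` and `widths[3]` times the block expansion; how those are wired into the YOLOv8 neck depends on the model configuration, which this excerpt does not show.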

For details, see:

https://blog.csdn.net/m0_63774211/article/details/131301450
