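1. GhostNet, G_ghost, and GhostNetV2 performance comparison
Each Bottleneck variant is introduced into YOLOv8 by combining it with C2f, and the resulting module replaces every C2f in the backbone.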
model | layers | parameters | GFLOPs | size (KB)
---|---|---|---|---
YOLOv8s | 168 | 11125971 | 28.4 | 21991
YOLOv8_C2f_GhostBottleneckV2s | 279 | 2553539 | 6.8 | 5250
YOLOv8_C2f_GhostBottlenecks | 267 | 2553539 | 6.8 | 5248
YOLOv8_C2f_g_ghostBottlenecks | 195 | 2581091 | 6.9 | 5283
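To make the comparison easier to read, the figures in the table can be expressed relative to the YOLOv8s baseline. The short script below is only a convenience sketch: the dictionary `models` and the helper `relative` are names introduced here for illustration, and the numbers are copied directly from the table.

```python
# Parameters and GFLOPs copied from the table above.
models = {
    "YOLOv8s": (11125971, 28.4),
    "YOLOv8_C2f_GhostBottleneckV2s": (2553539, 6.8),
    "YOLOv8_C2f_GhostBottlenecks": (2553539, 6.8),
    "YOLOv8_C2f_g_ghostBottlenecks": (2581091, 6.9),
}

base_params, base_gflops = models["YOLOv8s"]


def relative(name):
    """Return (parameter ratio, GFLOPs ratio) of a model versus YOLOv8s."""
    params, gflops = models[name]
    return params / base_params, gflops / base_gflops


for name in models:
    p, g = relative(name)
    print(f"{name}: {p:.1%} of baseline parameters, {g:.1%} of baseline GFLOPs")
```

All three Ghost-based variants keep roughly 23% of the baseline parameters and about a quarter of its GFLOPs.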
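2. Introduction to G_ghost
Paper: https://arxiv.org/abs/2201.03297
GhostNet is one of the most popular lightweight neural network architectures of recent years and is already widely used on ARM and CPU devices. On parallel computing devices such as GPUs and NPUs, however, the original GhostNet shows no advantage. Recently, researchers at Huawei Noah's Ark Lab, targeting the characteristics of GPUs and similar devices, introduced cross-layer cheap operations that reduce both the amount of computation and the memory data movement, and designed a GPU version of GhostNet on that basis. Experiments show that G-GhostNet achieves the best balance of speed and accuracy on existing GPU devices. On Huawei's in-house Ascend 310 NPU, G-GhostNet runs more than 30% faster than a ResNet of comparable size. The paper has been accepted by IJCV, a top computer vision journal.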
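As shown in the figure, the paper proposes two stage structures based on cross-layer cheap operations:
G-Ghost stage: given the module's input and output channel counts (usually the two are equal), every convolutional layer from the second layer onward has 1/2 of the output channel count, and the remaining 1/2 of the output features are generated from the output of the first convolutional layer by a cheap operation.
G-Ghost stage with mix operation: building on the G-Ghost stage, every convolutional layer from the second layer onward still has 1/2 of the output channel count, while the remaining 1/2 of the output features are generated by applying cheap operations to the outputs of all preceding convolutional layers.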
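The plain G-Ghost stage described above can be sketched in PyTorch roughly as follows. This is a simplified illustration, not the paper's implementation: the class name `TinyGGhostStage`, the helper `conv_bn_relu`, the use of plain 3x3 convolutions in place of real blocks, and the choice of a 1x1 convolution as the cheap operation are all assumptions made for the example. The mix variant is not shown.

```python
import torch
import torch.nn as nn


def conv_bn_relu(c_in, c_out, kernel_size=3, stride=1):
    """Hypothetical helper: Conv2d + BatchNorm2d + ReLU with 'same' padding."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size, stride, kernel_size // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class TinyGGhostStage(nn.Module):
    """Simplified G-Ghost stage (illustrative only)."""

    def __init__(self, c_in, c_out, num_blocks, stride=2, cheap_ratio=0.5):
        super().__init__()
        assert num_blocks >= 2, "the stage needs a first layer plus at least one more"
        cheap_ch = int(c_out * cheap_ratio)   # channels produced by the cheap operation
        main_ch = c_out - cheap_ch            # channels produced by the remaining layers

        # First layer: full output width, also performs the downsampling.
        self.first = conv_bn_relu(c_in, c_out, stride=stride)

        # From the second layer onward, every layer outputs only main_ch channels.
        layers = [conv_bn_relu(c_out, main_ch)]
        layers += [conv_bn_relu(main_ch, main_ch) for _ in range(num_blocks - 2)]
        self.rest = nn.Sequential(*layers)

        # Cheap cross-layer operation applied to the first layer's output
        # (assumed here to be a 1x1 convolution).
        self.cheap = conv_bn_relu(c_out, cheap_ch, kernel_size=1)

    def forward(self, x):
        first_out = self.first(x)
        main = self.rest(first_out)     # expensive path, half the channels
        ghost = self.cheap(first_out)   # cheap path, the other half
        return torch.cat([main, ghost], dim=1)


stage = TinyGGhostStage(16, 32, num_blocks=3).eval()
with torch.no_grad():
    y = stage(torch.randn(1, 16, 32, 32))
print(y.shape)
```

The point of the design is that only the first layer runs at full width; the deeper layers work on half the channels, and the missing half is recovered from an early feature map at low cost.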
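In the comparison of lightweight neural networks, G-GhostNet achieves both the fastest inference speed and the highest accuracy. As shown in the figure below, at an inference latency of 24 ms G-GhostNet reaches an ImageNet accuracy above 77.5%, well ahead of other networks such as MobileNetV3 and EfficientNet.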
3. Introducing G-GhostNet into YOLOv8
3.1 Add ultralytics/nn/backbone/g_ghost.py
Core code:
```python
import torch
import torch.nn as nn

# Note: `Stage` and `conv1x1` are not shown in this excerpt; they are expected
# to be defined elsewhere in g_ghost.py (see the full source linked after this listing).


class GGhostRegNet(nn.Module):
    def __init__(self, block, layers, widths, num_classes=1000, zero_init_residual=True,
                 group_width=1, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(GGhostRegNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 32
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False, False]
        if len(replace_stride_with_dilation) != 4:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 4-element tuple, got {}".format(replace_stride_with_dilation))
        self.group_width = group_width

        # Stem: 3x3 convolution with stride 2
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=2, padding=1,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)

        # Stage 1 is always built from ordinary blocks
        self.layer1 = self._make_layer(block, widths[0], layers[0], stride=2,
                                       dilate=replace_stride_with_dilation[0])

        # Stages 2 and 4 use a G-Ghost Stage only when they have more than 2 blocks
        self.inplanes = widths[0]
        if layers[1] > 2:
            self.layer2 = Stage(block, self.inplanes, widths[1], group_width, layers[1], stride=2,
                                dilate=replace_stride_with_dilation[1], cheap_ratio=0.5)
        else:
            self.layer2 = self._make_layer(block, widths[1], layers[1], stride=2,
                                           dilate=replace_stride_with_dilation[1])

        self.inplanes = widths[1]
        self.layer3 = Stage(block, self.inplanes, widths[2], group_width, layers[2], stride=2,
                            dilate=replace_stride_with_dilation[2], cheap_ratio=0.5)

        self.inplanes = widths[2]
        if layers[3] > 2:
            self.layer4 = Stage(block, self.inplanes, widths[3], group_width, layers[3], stride=2,
                                dilate=replace_stride_with_dilation[3], cheap_ratio=0.5)
        else:
            self.layer4 = self._make_layer(block, widths[3], layers[3], stride=2,
                                           dilate=replace_stride_with_dilation[3])

        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(widths[-1] * block.expansion, num_classes)

        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes:
            # Project the shortcut when the resolution or channel count changes
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes, stride),
                norm_layer(planes),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.group_width,
                            previous_dilation, norm_layer))
        self.inplanes = planes
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, group_width=self.group_width,
                                dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def _forward_impl(self, x):
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)

        return x

    def forward(self, x):
        return self._forward_impl(x)
```
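The forward pass above ends in a classification head, whereas a detection neck usually consumes intermediate feature maps at strides 8, 16, and 32. In GGhostRegNet those strides correspond to the outputs of layer2, layer3, and layer4, since the stem and each of the four stages downsample by 2. How the linked article actually wires the backbone into YOLOv8 is not shown here, so the following is only a sketch of the general idea on a stand-in network: `ToyBackbone` and its channel widths are hypothetical and merely reproduce the same stride layout.

```python
import torch
import torch.nn as nn


def down_block(c_in, c_out):
    """Hypothetical stand-in for one stage: a stride-2 conv + BN + ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class ToyBackbone(nn.Module):
    """Stem (/2) followed by four stride-2 stages, like GGhostRegNet's layout."""

    def __init__(self, widths=(16, 32, 64, 128)):
        super().__init__()
        self.stem = down_block(3, 8)
        chans = (8,) + tuple(widths)
        self.stages = nn.ModuleList(
            [down_block(chans[i], chans[i + 1]) for i in range(4)]
        )

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i >= 1:            # keep outputs of stages 2, 3, 4 (strides 8, 16, 32)
                feats.append(x)
        return feats              # no avgpool / dropout / fc for detection use


backbone = ToyBackbone().eval()
with torch.no_grad():
    features = backbone(torch.randn(1, 3, 64, 64))
for f in features:
    print(tuple(f.shape))
```

Returning a list of feature maps instead of logits is the main change such a backbone needs; the pooling, dropout, and fully connected layers are simply skipped.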
For details, see:
https://blog.csdn.net/m0_63774211/article/details/131301450