本文独家改进:LSKNet 助力RT-DETR ,替换backbone,Large Selective Kernel Network (LSKNet),可以动态地调整其大空间感受野,以更好地建模遥感场景中各种物体的测距的场景。
推荐指数:五星
1. LSKNet介绍
论文:https://arxiv.org/pdf/2303.09030.pdf
本文提出了Large Selective Kernel Network (LSKNet),可以动态地调整其大空间感受野,以更好地建模遥感场景中各种物体的测距的场景。
首次在遥感物体检测领域探索大选择性卷积核机制的工作。在没有任何附加条件的情况下,我们 LSKNet 比主流检测器轻量的多,而且在多个数据集上刷新了 SOTA!HRSC2016(98.46% mAP)、DOTA-v1.0(81.64% mAP)和 FAIR1M-v1.0(47.87% mAP)。
提出的方法包括动态调整特征提取骨干的感受野,以便更有效地处理被检测物体的不同的广泛背景。这是通过一个空间选择机制来实现的,该机制对一连串的大 depth-wise 卷积核所处理的特征进行有效加权,然后在空间上将它们合并。这些核的权重是根据输入动态确定的,允许该模型自适应地使用不同的大核,并根据需要调整空间中每个目标的感受野。
2. LSKNet 引入到RT-DETR
2.1 加入ultralytics/nn/backbone/lsknet.py
代码语言:javascript复制class LSKblock(nn.Module):
def __init__(self, dim):
super().__init__()
self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3)
self.conv1 = nn.Conv2d(dim, dim//2, 1)
self.conv2 = nn.Conv2d(dim, dim//2, 1)
self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3)
self.conv = nn.Conv2d(dim//2, dim, 1)
def forward(self, x):
attn1 = self.conv0(x)
attn2 = self.conv_spatial(attn1)
attn1 = self.conv1(attn1)
attn2 = self.conv2(attn2)
attn = torch.cat([attn1, attn2], dim=1)
avg_attn = torch.mean(attn, dim=1, keepdim=True)
max_attn, _ = torch.max(attn, dim=1, keepdim=True)
agg = torch.cat([avg_attn, max_attn], dim=1)
sig = self.conv_squeeze(agg).sigmoid()
attn = attn1 * sig[:,0,:,:].unsqueeze(1) attn2 * sig[:,1,:,:].unsqueeze(1)
attn = self.conv(attn)
return x * attn
class LSKblockAttention(nn.Module):
def __init__(self, d_model):
super().__init__()
self.proj_1 = nn.Conv2d(d_model, d_model, 1)
self.activation = nn.GELU()
self.spatial_gating_unit = LSKblock(d_model)
self.proj_2 = nn.Conv2d(d_model, d_model, 1)
def forward(self, x):
shorcut = x.clone()
x = self.proj_1(x)
x = self.activation(x)
x = self.spatial_gating_unit(x)
x = self.proj_2(x)
x = x shorcut
return x
详见: https://blog.csdn.net/m0_63774211/article/details/134402474
我正在参与2023腾讯技术创作特训营第三期有奖征文,组队打卡瓜分大奖!