Improving YOLOv5/YOLOv7 with Involution: Building a New Generation of Neural Networks for Visual Recognition, with Notable Accuracy Gains

2023-11-30 16:17:00

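Abstract: Involution can be used to build a new generation of neural networks for visual recognition, with notable accuracy gains reported on classification, detection, and segmentation tasks.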

1. Involution

Involution: Inverting the Inherence of Convolution for Visual Recognition (CVPR 2021)

Paper: https://arxiv.org/abs/2103.06255
Code (GitHub): https://github.com/d-li14/involution

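The authors argue that the two defining properties of convolution, being spatial-agnostic and channel-specific, bring certain benefits but also drawbacks. They therefore propose Involution, whose properties mirror those of convolution: it is spatial-specific and channel-agnostic.

In other words, it is tied to the spatial position and independent of the channel. Like convolution, involution has its own kernels (involution kernels). An involution kernel differs from one spatial position to the next but is shared across channels, which already gives a first picture of how the operator works.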
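To make the contrast concrete, the sketch below compares the shape of the weights each operator applies. The function names and the example sizes are illustrative assumptions and do not come from the paper's code.

```python
def conv_kernel_shape(c_in, c_out, k):
    """Standard convolution: one k x k filter per (output, input) channel pair,
    reused at every spatial position (spatial-agnostic, channel-specific)."""
    return (c_out, c_in, k, k)


def involution_kernel_shape(h, w, k, groups):
    """Involution: one k x k kernel per spatial position and per group,
    shared by all channels inside a group (spatial-specific, channel-agnostic)."""
    return (h, w, groups, k, k)


# Example sizes (assumed for illustration): 64 channels, 3x3 kernel, 20x20 map,
# 16 channels per group.
channels, k, h, w = 64, 3, 20, 20
groups = channels // 16

conv_shape = conv_kernel_shape(channels, channels, k)
inv_shape = involution_kernel_shape(h, w, k, groups)
print("convolution kernel:", conv_shape)
print("involution kernels:", inv_shape)
```

Note that the involution kernels are computed from the input feature map at run time, so the second shape describes generated activations rather than stored parameters.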

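Advantages of involution:

1. It can aggregate context over a wider spatial extent, which addresses the difficulty of modeling long-range interactions (a standard convolution only gathers spatial information within a small fixed neighborhood such as 3x3).

2. It can adaptively allocate weights over different positions, prioritizing the most informative visual elements in the spatial domain (a standard convolution applies the same kernel, that is, the same set of weights, at every spatial position).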
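The second point can be illustrated with a small pure-Python sketch for a single channel. It is a simplified stand-in, not the paper's implementation: `involution_2d`, `generated_kernel`, and `fixed_kernel` are hypothetical names, and the kernel generator here simply scales a box filter by the center pixel instead of using a learned branch.

```python
def involution_2d(x, kernel_at, k=3):
    """Single-channel involution with zero padding and stride 1.

    x: H x W nested list; kernel_at(i, j) returns the k x k kernel for pixel (i, j).
    """
    h, w, r = len(x), len(x[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            kern = kernel_at(i, j)  # potentially a different kernel at every position
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        out[i][j] += kern[di + r][dj + r] * x[ii][jj]
    return out


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]


def generated_kernel(i, j):
    # Hypothetical generator: a 3x3 box filter scaled by the center pixel,
    # standing in for the learned kernel-generation branch.
    return [[x[i][j]] * 3 for _ in range(3)]


def fixed_kernel(i, j):
    # The same box filter everywhere, which is what a convolution does.
    return [[1.0] * 3 for _ in range(3)]


conv_like = involution_2d(x, fixed_kernel)
inv_out = involution_2d(x, generated_kernel)
print(conv_like[1][1], inv_out[1][1])
```

With a fixed kernel the routine reduces to an ordinary convolution, while the generated kernels weight each neighborhood according to the pixel at its center. In a multi-channel layer the same generated kernel would be applied to every channel of a group.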

2. Adding Involution to YOLOv5

2.1 Add Involution to common.py:

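# Added to models/common.py, which already imports torch.nn as nn and defines Conv.
class Involution(nn.Module):
    """Involution layer. The output keeps c1 channels; c2 is accepted only to
    match the (c1, c2, ...) argument order and is not used."""

    def __init__(self, c1, c2, kernel_size, stride):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.c1 = c1
        reduction_ratio = 4
        self.group_channels = 16  # channels sharing one kernel; c1 must be a multiple of 16
        self.groups = self.c1 // self.group_channels
        # Kernel-generation branch: reduce channels, then emit K*K weights per group.
        self.conv1 = Conv(c1, c1 // reduction_ratio, 1)
        self.conv2 = Conv(c1 // reduction_ratio, kernel_size ** 2 * self.groups, 1, 1)
        if stride > 1:
            # Bring the input of the kernel branch down to the output resolution.
            self.avgpool = nn.AvgPool2d(stride, stride)
        self.unfold = nn.Unfold(kernel_size, 1, (kernel_size - 1) // 2, stride)

    def forward(self, x):
        # One K*K kernel per output position and group: (b, groups * K*K, h, w).
        weight = self.conv2(self.conv1(x if self.stride == 1 else self.avgpool(x)))
        b, c, h, w = weight.shape
        weight = weight.view(b, self.groups, self.kernel_size ** 2, h, w).unsqueeze(2)
        # Extract K*K patches and split the channels into groups.
        out = self.unfold(x).view(b, self.groups, self.group_channels, self.kernel_size ** 2, h, w)
        # Multiply-accumulate over the K*K window; the kernel is broadcast over each group's channels.
        out = (weight * out).sum(dim=3).view(b, self.c1, h, w)
        return out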
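The forward pass reshapes the unfolded patches using the height and width of the generated weights, so the pooling branch and the unfold branch must produce the same spatial size. The sketch below applies the output-size formulas documented for nn.AvgPool2d and nn.Unfold to one axis; the helper names are made up for illustration.

```python
def avgpool_out(size, stride):
    """Output length of nn.AvgPool2d(stride, stride) along one axis (no padding)."""
    return (size - stride) // stride + 1


def unfold_out(size, kernel_size, stride):
    """Number of sliding windows nn.Unfold yields along one axis,
    with dilation 1 and padding (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - (kernel_size - 1) - 1) // stride + 1


# Compare both branches for an even and an odd feature-map size at stride 2.
for size in (80, 81):
    print(size, avgpool_out(size, 2), unfold_out(size, 3, 2))
```

For stride 2 the two sizes agree on even inputs but differ on odd ones, in which case the view call in forward would fail. YOLOv5 normally pads images to a multiple of the model's maximum stride, so feature maps at these layers are usually even, but that is worth checking for custom input sizes.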

2.2 Modify yolov5s_involution.yaml

# parameters
nc: 1 # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],
   [-1, 1, Involution, [128, 3, 2]],  # 2-P2/4
   [-1, 3, C3, [128,True]],
   [-1, 1, Conv, [256, 3, 2]],
   [-1, 1, Involution, [256,3,2]], # 5-P3/8
   [-1, 6, C3, [256,True]],
   [-1, 1, Conv, [512,3,2]],     #7
   [-1, 1, Involution, [512,3,1]],   # 8-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 1, Involution, [1024,3,1]], # 11-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024,5]],
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 8], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 17

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 5], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 21 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 18], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 24 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 27 (P5/32-large)

   [[21, 24, 27], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
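As a sanity check on the config, the sketch below walks the backbone and works out the scaled channel count, the number of involution groups, and the cumulative stride of each layer. It assumes that Involution has been added to the set of modules whose channel arguments are scaled in parse_model (models/yolo.py), a step not shown in this post, and that channels are rounded up to a multiple of 8 as YOLOv5 does. The `backbone` table is transcribed by hand from the YAML.

```python
import math

WIDTH_MULTIPLE = 0.50   # width_multiple from the config
GROUP_CHANNELS = 16     # fixed inside the Involution module

# (module, output channels in the YAML, stride) for backbone layers 0-13
backbone = [
    ("Conv", 64, 2), ("Conv", 128, 2), ("Involution", 128, 2), ("C3", 128, 1),
    ("Conv", 256, 2), ("Involution", 256, 2), ("C3", 256, 1),
    ("Conv", 512, 2), ("Involution", 512, 1), ("C3", 512, 1),
    ("Conv", 1024, 2), ("Involution", 1024, 1), ("C3", 1024, 1), ("SPPF", 1024, 1),
]


def make_divisible(x, divisor=8):
    """Round up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


summary = []
total_stride = 1
for index, (name, channels, stride) in enumerate(backbone):
    scaled = make_divisible(channels * WIDTH_MULTIPLE)
    total_stride *= stride  # running product of all strides so far
    groups = scaled // GROUP_CHANNELS if name == "Involution" else None
    summary.append((index, name, scaled, groups, total_stride))
    print(index, name, scaled, groups, total_stride)
```

Under these assumptions every Involution layer receives a channel count that is a multiple of 16, so the grouping divides evenly. The running product also shows that the stride-2 Involution layers at indices 2 and 5 downsample in addition to the Conv layers before them, so the cumulative strides come out larger than the P2/4 and P3/8 labels in the comments. Setting those two strides to 1 would match the labels, if that is the intent.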

by CSDN AI小怪兽 https://cv2023.blog.csdn.net/article/details/129621812
