Improving Object Detection Based on YOLOv7

南瓜AI

Last modified 2025-04-03 09:27:19


License: CC 4.0 BY-SA

Tags: YOLO, artificial intelligence, computer vision, python, algorithms

First published 2023-07-11 15:35:02

Article link: https://blog.csdn.net/zycnice/article/details/131660396

I. Introducing DCNv3 into the network

1. First, install DCNv3

Reference: https://blog.csdn.net/zycnice/article/details/130783757

2. Set up the YOLOv7 source code

https://github.com/WongKinYiu/yolov7
MMCV and MMDetection also need to be installed at this step.
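Before going further, it can help to confirm that the required packages are importable. The sketch below only checks whether each module can be found, without importing it. The import names, in particular `DCNv3` for the compiled operator built from InternImage's `ops_dcnv3` folder, are assumptions about a typical setup, and `check_environment` is a hypothetical helper.

```python
import importlib.util

# Assumed import names: PyTorch, MMCV, MMDetection and the compiled DCNv3 extension
REQUIRED_MODULES = ["torch", "mmcv", "mmdet", "DCNv3"]


def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:8s} {'found' if ok else 'MISSING'}")
```

A missing `DCNv3` entry usually means the CUDA operator was not compiled or installed into the active environment.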

3. Move intern_image.py into the nets folder of the YOLOv7 project.

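For example, after adding a modification to backbone.py, append the following code to the bottom of the file to print the computational cost (mult-adds) and the number of parameters: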

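if __name__ == '__main__':
    # Appended to the bottom of nets/backbone.py, where torch and Backbone are already defined
    from torchinfo import summary

    x = torch.randn(2, 3, 320, 320)
    x = x.cuda()  # the compiled DCNv3 operator runs on the GPU

    model = Backbone(transition_channels = 32, block_channels = 32, n = 4, phi = 'l', pretrained = False)
    # model = Backbone_inter_image(transition_channels=32, block_channels=32, n=4, phi='l', pretrained=False)  # modified backbone
    model.cuda()
    # Prints layer-by-layer output shapes, the total parameter count and mult-adds for a (2, 3, 640, 640) input
    summary(model, (2, 3, 640, 640))

    # Forward pass on x, then print the shapes of the three output feature maps
    feat1, feat2, feat3 = model(x)
    print(feat1.shape, feat2.shape, feat3.shape)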
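If torchinfo is not installed, the number of parameters can also be read directly from the model. A minimal sketch: `count_parameters` is a hypothetical helper, and the small `nn.Sequential` is only a stand-in for `Backbone`.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, trainable_only: bool = False) -> int:
    """Sum the number of elements of every parameter tensor in the model."""
    return sum(p.numel() for p in model.parameters()
               if p.requires_grad or not trainable_only)


if __name__ == "__main__":
    # Hypothetical stand-in for Backbone: a 3x3 conv (3 -> 32 channels) followed by BatchNorm
    toy = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.BatchNorm2d(32))
    # Conv: 3*32*3*3 weights + 32 biases = 896; BatchNorm: 32 weights + 32 biases = 64
    print(count_parameters(toy))  # 960
```

BatchNorm's running mean and variance are buffers, not parameters, so they are not included in the count.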


4. Wrap intern_image in a module. The following class is added to intern_image.py:

class Internimage(nn.Module):
    """Wraps one InternImage stage (InternImageBlock, built on DCNv3) for use inside YOLO."""
    def __init__(self,
                 channels,
                 depth,
                 groups,
                 downsample=True,
                 af='GELU',
                 ):
        super().__init__()
        # opsm is the DCNv3 ops module imported at the top of intern_image.py; InternImageBlock
        # expects the operator class itself, as InternImage passes getattr(opsm, core_op)
        self.intern_image = InternImageBlock(core_op = getattr(opsm, 'DCNv3'),
                        channels = channels,
                        depth = depth,  # InternImage-T uses [4, 4, 18, 4]
                        groups = groups,
                        downsample = downsample,  # True halves H and W and doubles the channels
                        mlp_ratio = 4.,
                        drop = 0.,
                        drop_path=0.,
                        act_layer = af,
                        norm_layer = 'LN',
                        post_norm = False,
                        offset_scale = 1.0,
                        layer_scale = 1.0,
                        with_cp = False,
                        dw_kernel_size = None,  # for InternImage-H/G
                        post_norm_block_ids = None,  # for InternImage-H/G
                        res_post_norm = False,  # for InternImage-H/G
                        center_feature_scale = False)  # for InternImage-H/G

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)  # NCHW -> NHWC (not in the original code): InternImage layers expect channels-last input
        x = self.intern_image(x)
        x = x.permute(0, 3, 1, 2)  # NHWC -> NCHW (not in the original code): convert back for YOLO's convolutions
        return x

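After that, the YOLO network only needs to call this one class. Most parameters never need to change, so only the five parameters above are exposed, which keeps each call short and readable. Because YOLO is built mostly from convolutions that work on NCHW tensors, while the InternImage layers work on channels-last (NHWC) tensors, permute is used to convert the layout before the intern_image layer and to convert it back afterwards. (DCNv3 was developed as an improvement within a ViT-style design.)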
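The layout conversion can be seen in isolation with a stand-in block. The sketch below is hypothetical: `ChannelsLastBlock` and `NCHWWrapper` are not part of InternImage or YOLOv7. The block applies `nn.LayerNorm` and `nn.Linear`, both of which act on the last dimension, which is why such layers need NHWC input. The wrapper mirrors the two `permute` calls in `Internimage.forward`.

```python
import torch
import torch.nn as nn


class ChannelsLastBlock(nn.Module):
    """Hypothetical stand-in for InternImageBlock: operates on (N, H, W, C) tensors."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.LayerNorm(channels)         # normalizes over the last dim (C)
        self.proj = nn.Linear(channels, channels)  # also acts on the last dim (C)

    def forward(self, x):
        return x + self.proj(self.norm(x))


class NCHWWrapper(nn.Module):
    """Lets a channels-last block sit between NCHW convolutions."""
    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)     # (N, C, H, W) -> (N, H, W, C)
        x = self.block(x)
        return x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)


if __name__ == "__main__":
    layer = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), NCHWWrapper(ChannelsLastBlock(32)))
    out = layer(torch.randn(2, 3, 64, 64))
    print(out.shape)  # torch.Size([2, 32, 64, 64])
```

Without the first `permute`, `nn.LayerNorm(32)` would try to normalize over the width dimension and fail for any input whose width is not 32.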

5. Add Internimage to the YOLO network code, or use it to replace an existing layer. For example:

        self.stem = nn.Sequential(
            Conv(3, transition_channels, 3, 1),  # 640 * 640 * 32
            Internimage(channels=32, depth=2, groups=4, downsample=False, af='GELU'),  # channels must equal transition_channels; downsample=False keeps 640 * 640 * 32
            Conv(transition_channels, transition_channels * 2, 3, 2),  # 320 * 320 * 64
            Conv(transition_channels * 2, transition_channels * 2, 3, 1),  # 320 * 320 * 64
        )
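Inserting Internimage in the middle of a stage only works if it leaves the tensor shape unchanged, which is why `downsample=False` is used. The pure-Python sketch below tracks (C, H, W) through the stem. It assumes YOLO's `Conv(c1, c2, k, s)` pads by `k // 2` and that InternImage's downsampling layer is a stride-2 3x3 convolution that doubles the channels. The helper names are hypothetical.

```python
def conv_out(size, k, s):
    """Output size of a convolution padded by k // 2."""
    p = k // 2
    return (size + 2 * p - k) // s + 1


def conv_shape(shape, c_out, k, s):
    """(C, H, W) after a Conv(c_in, c_out, k, s)."""
    _, h, w = shape
    return (c_out, conv_out(h, k, s), conv_out(w, k, s))


def internimage_shape(shape, downsample):
    """Internimage keeps (C, H, W) unless downsample=True (then 2C, H/2, W/2)."""
    c, h, w = shape
    if not downsample:
        return shape
    return (2 * c, conv_out(h, 3, 2), conv_out(w, 3, 2))


if __name__ == "__main__":
    t = 32  # transition_channels
    shape = (3, 640, 640)
    shape = conv_shape(shape, t, 3, 1)                   # (32, 640, 640)
    shape = internimage_shape(shape, downsample=False)   # unchanged
    shape = conv_shape(shape, 2 * t, 3, 2)               # (64, 320, 320)
    shape = conv_shape(shape, 2 * t, 3, 1)               # (64, 320, 320)
    print(shape)
```

With `downsample=True` the block would output 64 channels at 320 * 320, so the following `Conv(transition_channels, ...)` would no longer match its input.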

6. Parameter count comparison:

With an input size of (2, 3, 640, 640):

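Parameter counts after adding a single intern_image layer to one part of the backbone (each row modifies only one stage). These settings follow the original InternImage tiny configuration; in practice, depth and groups do not need to be this large.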
| Modification | Parameters |
| --- | --- |
| Original YOLOv7 | 26,694,432 |
| Internimage after the first CBS in stem: depth=4, groups=4, GELU | 26,811,392 |
| Internimage after the first CBS in dark2: depth=4, groups=4, GELU | 28,142,720 |
| Internimage after the first CBS in dark3: depth=4, groups=8, GELU | 32,433,632 |
| Internimage after the first CBS in dark4: depth=18, groups=16, GELU | 129,507,296 |
| Internimage after the first CBS in dark5: depth=4, groups=32, GELU | 72,284,320 |
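The absolute numbers are easier to compare as increments over the baseline. The sketch below only does arithmetic on the values in the table above.

```python
BASELINE = 26_694_432  # original YOLOv7

# stage: (depth, groups, total parameters) with one Internimage added to that stage only
measured = {
    "stem": (4, 4, 26_811_392),
    "dark2": (4, 4, 28_142_720),
    "dark3": (4, 8, 32_433_632),
    "dark4": (18, 16, 129_507_296),
    "dark5": (4, 32, 72_284_320),
}

added = {stage: total - BASELINE for stage, (_, _, total) in measured.items()}

for stage, (depth, groups, total) in measured.items():
    print(f"{stage:5s} depth={depth:<2d} groups={groups:<2d} "
          f"+{added[stage]:,} params ({total / BASELINE:.2f}x baseline)")
```

For example, dark4 with the tiny-style depth of 18 adds 102,812,864 parameters on its own, roughly 3.85 times the size of the entire original model.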

Effect of groups on the parameter count:
| Modification | Parameters |
| --- | --- |
| Original YOLOv7 | 26,694,432 |
| Internimage after the first CBS in stem: depth=4, groups=4, GELU | 26,811,392 |
| Internimage after the first CBS in stem: depth=4, groups=8, GELU | 26,839,904 |
| Internimage after the first CBS in dark4: depth=18, groups=2, GELU | 122,526,392 |
| Internimage after the first CBS in dark4: depth=18, groups=8, GELU | 125,518,208 |
| Internimage after the first CBS in dark4: depth=18, groups=16, GELU | 129,507,296 |
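Comparing rows pairwise shows how the count scales with groups. Dividing each difference by the number of groups added (arithmetic on the table values only; `per_group_increase` is a hypothetical helper) gives exactly the same increment per group for both dark4 steps, so in these measurements the parameter count grows linearly with groups.

```python
# (stage, depth, groups) -> total parameters, copied from the table above
totals = {
    ("stem", 4, 4): 26_811_392,
    ("stem", 4, 8): 26_839_904,
    ("dark4", 18, 2): 122_526_392,
    ("dark4", 18, 8): 125_518_208,
    ("dark4", 18, 16): 129_507_296,
}


def per_group_increase(stage, depth, g_from, g_to):
    """(extra parameters per additional group, remainder) between two measured rows."""
    diff = totals[(stage, depth, g_to)] - totals[(stage, depth, g_from)]
    return divmod(diff, g_to - g_from)


for args in [("stem", 4, 4, 8), ("dark4", 18, 2, 8), ("dark4", 18, 8, 16)]:
    per_group, remainder = per_group_increase(*args)
    print(args, f"+{per_group:,} per group (remainder {remainder})")
```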

Parameter counts when a single layer (depth=1) is added to one stage at a time:
| Modification | Parameters |
| --- | --- |
| Original YOLOv7 | 26,694,432 |
| Internimage after the first CBS in stem: depth=1, groups=4, GELU | 26,730,896 |
| Internimage after the first CBS in dark2: depth=1, groups=4, GELU | 27,056,888 |
| Internimage after the first CBS in dark3: depth=1, groups=8, GELU | 28,130,000 |
| Internimage after the first CBS in dark4: depth=1, groups=16, GELU | 32,186,576 |
| Internimage after the first CBS in dark5: depth=1, groups=32, GELU | 38,093,440 |
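The per-stage cost of a single layer can be read off the same way. The sketch below subtracts the baseline from each row of the table above. The final total is only an estimate that assumes the five increments simply add up when Internimage is added to all five stages at once; it is not a measured value.

```python
BASELINE = 26_694_432  # original YOLOv7

# Total parameters with one depth=1 Internimage added to a single stage
depth1_totals = {
    "stem": 26_730_896,   # groups=4
    "dark2": 27_056_888,  # groups=4
    "dark3": 28_130_000,  # groups=8
    "dark4": 32_186_576,  # groups=16
    "dark5": 38_093_440,  # groups=32
}

added = {stage: total - BASELINE for stage, total in depth1_totals.items()}
for stage, extra in added.items():
    print(f"{stage:5s} +{extra:,}")

# Estimate only: assumes the per-stage increments are independent and add up
estimated_all_five = BASELINE + sum(added.values())
print(f"estimated total with all five stages modified: {estimated_all_five:,}")
```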

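1. The data above show that the larger groups is, the more parameters are added, so groups should be increased gradually as the network gets deeper. Deeper stages also have more channels, which raises the parameter count further.

2. In deeper stages, Internimage increases the parameter count too much and should be used with caution: with depth=18 and groups=16 in dark4, the total grows from 26,694,432 to 129,507,296, nearly five times the original.

3. The InternImage source sets depth to 18 in its third stage (dark4 in YOLO). This value was presumably chosen through extensive experiments, so it should not be changed arbitrarily.

4. Setting depth to 0 adds almost no parameters, but not exactly zero (presumably because the block's final LayerNorm remains). With sufficient GPU resources, the wrapper could be refined further.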
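When increasing groups stage by stage, the channel count must stay divisible by groups. InternImage's DCNv3 module raises an error when channels are not divisible by group and warns when the channels per group is not a power of 2. The hypothetical helper `check_groups` below mirrors those two checks so a configuration can be validated before building the model; it is applied to the channels and groups used in the modified backbone (transition_channels=32).

```python
import warnings


def check_groups(channels, groups):
    """Return channels per group, mirroring the checks DCNv3 performs (sketch)."""
    if channels % groups != 0:
        raise ValueError(f"channels ({channels}) must be divisible by groups ({groups})")
    per_group = channels // groups
    if per_group & (per_group - 1) != 0:  # not a power of 2
        warnings.warn(f"{per_group} channels per group is not a power of 2")
    return per_group


if __name__ == "__main__":
    t = 32  # transition_channels
    channels = [t, t * 4, t * 8, t * 16, t * 32]  # stem, dark2, dark3, dark4, dark5
    groups_nums = [4, 4, 8, 16, 32]
    for c, g in zip(channels, groups_nums):
        print(c, g, check_groups(c, g))
```

With these settings every stage except the stem has 32 channels per group, so groups doubles exactly when the channel count doubles.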

7. Actual modifications:

1. First, try adding one Internimage layer to each of the five blocks.

class Backbone_inter_image(nn.Module):
    '''
    Improved backbone that incorporates intern_image layers, whose core operation is DCNv3. To use it,
    replace self.backbone in YoloBody (nets/yolo.py) with Backbone_inter_image, i.e. comment out or
    delete the line that builds the original Backbone.
    The original, unmodified network is named Backbone and is defined above.
    '''
    def __init__(self, transition_channels, block_channels, n, phi, pretrained = False):  # 32, 32, 4, l, True
        super().__init__()
        # -----------------------------------------------#
        #   The input image is 640, 640, 3
        # -----------------------------------------------#
        ids = {
            'l': [-1, -3, -5, -6],
            'x': [-1, -3, -5, -7, -8],
        }[phi]
        # Per-stage Internimage settings, in order: stem, dark2, dark3, dark4, dark5
        depths = [1, 1, 1, 1, 1]
        groups_nums = [4, 4, 8, 16, 32]
        AF = ['GELU', 'GELU', 'GELU', 'GELU', 'GELU']

        self.stem = nn.Sequential(
            Conv(3, transition_channels, 3, 1),  # 640 * 640 * 32
            Internimage(channels = transition_channels, depth = depths[0], groups = groups_nums[0], downsample = False, af = AF[0]),
            Conv(transition_channels, transition_channels * 2, 3, 2),  # 320 * 320 * 64
            Conv(transition_channels * 2, transition_channels * 2, 3, 1),  # 320 * 320 * 64
        )
        self.dark2 = nn.Sequential(
            Conv(transition_channels * 2, transition_channels * 4, 3, 2),  # 160 * 160 * 128
            Internimage(channels = transition_channels * 4, depth = depths[1], groups = groups_nums[1], downsample = False, af = AF[1]),
            Multi_Concat_Block(transition_channels * 4, block_channels * 2, transition_channels * 8, n = n, ids = ids),
            # 160 * 160 * 256
        )
        self.dark3 = nn.Sequential(
            Transition_Block(transition_channels * 8, transition_channels * 4),  # 80 * 80 * 256
            Internimage(channels = transition_channels * 8, depth = depths[2], groups = groups_nums[2], downsample = False, af = AF[2]),
            Multi_Concat_Block(transition_channels * 8, block_channels * 4, transition_channels * 16, n = n, ids = ids),
            # 80 * 80 * 512
        )
        self.dark4 = nn.Sequential(
            Transition_Block(transition_channels * 16, transition_channels * 8),  # 40 * 40 * 512
            Internimage(channels = transition_channels * 16, depth = depths[3], groups = groups_nums[3], downsample = False, af = AF[3]),
            Multi_Concat_Block(transition_channels * 16, block_channels * 8, transition_channels * 32, n = n,
                               ids = ids),  # 40 * 40 * 1024
        )
        self.dark5 = nn.Sequential(
            Transition_Block(transition_channels * 32, transition_channels * 16),  # 20 * 20 * 1024
            Internimage(channels = transition_channels * 32, depth = depths[4], groups = groups_nums[4], downsample = False, af = AF[4]),
            Multi_Concat_Block(transition_channels * 32, block_channels * 8, transition_channels * 32, n = n,
                               ids = ids),  # 20 * 20 * 1024
        )