Learning Python pipeline mechanisms: the pipeline in BEVFormer

The BEVFormer config file defines the training pipeline as follows:

train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.3]),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]
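
The pipeline above refers to three names that must be defined earlier in the same config file: point_cloud_range, class_names and img_norm_cfg. The sketch below shows values commonly seen in nuScenes-based BEVFormer configs. They are assumptions for illustration, so check them against your own config file.

```python
# Assumed values, typical of nuScenes BEVFormer configs; verify against your own config.
# Order: x_min, y_min, z_min, x_max, y_max, z_max (metres).
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]

class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# `**img_norm_cfg` unpacks the dict into keyword arguments of the transform config,
# so this entry carries type, mean, std and to_rgb.
normalize_step = dict(type='NormalizeMultiviewImage', **img_norm_cfg)
print(sorted(normalize_step))
```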

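Each step does the following:
LoadMultiViewImageFromFiles: loads the multi-view camera images from files and converts them to float32.
PhotoMetricDistortionMultiViewImage: applies photometric distortion as data augmentation.
LoadAnnotations3D: loads the 3D bounding-box annotations and labels.
ObjectRangeFilter: filters out objects that lie outside the predefined point cloud range.
ObjectNameFilter: filters out objects whose class is not in the given class list.
RandomScaleImageMultiViewImage: rescales the images by a factor drawn from scales (here the list holds a single value, 0.3, so every image is scaled by 0.3).
NormalizeMultiviewImage: normalizes the image data with the given mean and standard deviation.
PadMultiViewImage: pads the images so that their height and width are divisible by 32.
DefaultFormatBundle3D: converts the data into the standard format expected by 3D detection models.
CustomCollect3D: collects the specified keys (gt_bboxes_3d, gt_labels_3d, img) as model input.
These steps are chained together through the interfaces provided by the MMDetection3D framework and are applied to each sample in order during training.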
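
To make "applied in order" concrete, here is a minimal NumPy-only sketch that chains two of the steps, normalization and padding to a multiple of 32, on a shared results dict. The functions normalize_views and pad_views are hypothetical stand-ins written for this illustration, not the MMDetection3D or BEVFormer implementations, and the image size and mean/std values are assumed.

```python
import numpy as np


def normalize_views(results, mean, std):
    """Subtract the mean and divide by the std for every view image."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    results['img'] = [(img - mean) / std for img in results['img']]
    return results


def pad_views(results, size_divisor=32):
    """Zero-pad bottom and right so height and width are multiples of size_divisor."""
    padded = []
    for img in results['img']:
        h, w = img.shape[:2]
        pad_h = (-h) % size_divisor
        pad_w = (-w) % size_divisor
        padded.append(np.pad(img, ((0, pad_h), (0, pad_w), (0, 0))))
    results['img'] = padded
    results['pad_shape'] = [img.shape for img in padded]
    return results


# Six fake camera views of 270 x 480 x 3 (assumed: 900 x 1600 images scaled by 0.3).
results = {'img': [np.full((270, 480, 3), 128, dtype=np.float32) for _ in range(6)]}

# Each step receives the dict returned by the previous step.
steps = [
    lambda r: normalize_views(r, mean=[123.675, 116.28, 103.53],
                              std=[58.395, 57.12, 57.375]),
    lambda r: pad_views(r, size_divisor=32),
]
for step in steps:
    results = step(results)

print(results['img'][0].shape)  # (288, 480, 3): 270 is padded up to 288, 480 is unchanged
```

Because padding runs after normalization here, the padded rows are exactly zero in normalized space. Swapping the two steps would change the result, which is why the order of the entries in the pipeline list matters.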

Field                                Class                                File location
LoadMultiViewImageFromFiles          LoadMultiViewImageFromFiles          mmdet3d/datasets/pipelines/loading.py
PhotoMetricDistortionMultiViewImage  PhotoMetricDistortionMultiViewImage  projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
LoadAnnotations3D                    LoadAnnotations3D                    mmdet3d/datasets/pipelines/loading.py
ObjectRangeFilter                    ObjectRangeFilter                    mmdet3d/datasets/pipelines/transform_3d.py
ObjectNameFilter                     ObjectNameFilter                     mmdet3d/datasets/pipelines/transform_3d.py
RandomScaleImageMultiViewImage       RandomScaleImageMultiViewImage       projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
NormalizeMultiviewImage              NormalizeMultiviewImage              projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
PadMultiViewImage                    PadMultiViewImage                    projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
DefaultFormatBundle3D                DefaultFormatBundle3D                mmdet3d/datasets/pipelines/formating.py
CustomCollect3D                      CustomCollect3D                      projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py

The type string in each config entry is the same as the class name. The paths follow the layout of the upstream BEVFormer repository and the MMDetection3D version it bundles; check them against your own checkout. The classes under projects/mmdet3d_plugin (the multi-view image transforms such as RandomScaleImageMultiViewImage, plus CustomCollect3D) are plugins that ship with the BEVFormer project, while the others are used directly from MMDetection3D.
For example, ~\BEVFormer\mmdetection3d\mmdet3d\datasets\pipelines\loading.py contains:

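# Imports needed by this excerpt (they sit at the top of loading.py).
import mmcv
import numpy as np
from mmdet.datasets.builder import PIPELINES


@PIPELINES.register_module()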
class LoadMultiViewImageFromFiles(object):
    """Load multi channel images from a list of separate channel files.
    Expects results['img_filename'] to be a list of filenames.
    Args:
        to_float32 (bool): Whether to convert the img to float32.
            Defaults to False.
        color_type (str): Color type of the file. Defaults to 'unchanged'.
    """

    def __init__(self, to_float32=False, color_type='unchanged'):
        self.to_float32 = to_float32
        self.color_type = color_type

    def __call__(self, results):
        """Call function to load multi-view image from files.
        Args:
            results (dict): Result dict containing multi-view image filenames.
        Returns:
            dict: The result dict containing the multi-view image data. \
                Added keys and values are described below.
                - filename (str): Multi-view image filenames.
                - img (np.ndarray): Multi-view image arrays.
                - img_shape (tuple[int]): Shape of multi-view image arrays.
                - ori_shape (tuple[int]): Shape of original image arrays.
                - pad_shape (tuple[int]): Shape of padded image arrays.
                - scale_factor (float): Scale factor.
                - img_norm_cfg (dict): Normalization configuration of images.
        """
        filename = results['img_filename']
        # img is of shape (h, w, c, num_views)
        img = np.stack(
            [mmcv.imread(name, self.color_type) for name in filename], axis=-1)
        if self.to_float32:
            img = img.astype(np.float32)
        results['filename'] = filename
        # unravel to list, see `DefaultFormatBundle` in formating.py
        # which will transpose each image separately and then stack into array
        results['img'] = [img[..., i] for i in range(img.shape[-1])]
        results['img_shape'] = img.shape
        results['ori_shape'] = img.shape
        # Set initial values for default meta_keys
        results['pad_shape'] = img.shape
        results['scale_factor'] = 1.0
        num_channels = 1 if len(img.shape) < 3 else img.shape[2]
        results['img_norm_cfg'] = dict(
            mean=np.zeros(num_channels, dtype=np.float32),
            std=np.ones(num_channels, dtype=np.float32),
            to_rgb=False)
        return results

    def __repr__(self):
        """str: Return a string that describes the module."""
        repr_str = self.__class__.__name__
        repr_str += f'(to_float32={self.to_float32}, '
        repr_str += f"color_type='{self.color_type}')"
        return repr_str

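This class loads multi-view images (for example, from several camera views) from files. It inherits from object and implements the standard data-transform interface (the __call__ method), which is what the MMDetection3D data pipeline convention requires.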
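
The stacking and unravelling done inside __call__ can be checked in isolation. The sketch below replaces mmcv.imread with small synthetic arrays (an assumption, so that the snippet runs without image files) and then follows the same np.stack and list-comprehension steps as the class.

```python
import numpy as np

# Stand-ins for six decoded camera images of shape (h, w, c);
# the real class obtains these from mmcv.imread.
views = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(6)]

# Same as the class: stack along a new last axis -> (h, w, c, num_views).
img = np.stack(views, axis=-1)
img = img.astype(np.float32)  # what to_float32=True does

# Unravel back into a list holding one (h, w, c) array per view.
img_list = [img[..., i] for i in range(img.shape[-1])]

print(img.shape)          # (4, 6, 3, 6)
print(len(img_list))      # 6
print(img_list[2].shape)  # (4, 6, 3)
```

Note that img_shape, ori_shape and pad_shape are set to the 4-D stacked shape, while results['img'] holds a list of 3-D arrays.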
When bevformer_tiny_rgb.py contains the following entry, the MMDetection3D framework uses this class to load the images:

dict(type='LoadMultiViewImageFromFiles', to_float32=True)

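Typically a pipeline-building step (referred to here as build_pipeline()) constructs the data-processing pipeline object: Compose converts the list of dict(type='xxx') entries into a sequence of callable transform objects. This generally involves the following steps:
(1). Iterate over every dict(type='xxx') in the pipeline config.
(2). Look up the class with that name (for example LoadMultiViewImageFromFiles) through the PIPELINES registry, a reflection-like name-to-class lookup that @PIPELINES.register_module() fills in.
(3). Instantiate the class, passing the remaining keys as arguments (for example to_float32=True).
(4). Return the callable pipeline for use during training.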
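
The four steps above can be sketched in plain Python. This is a simplified illustration, not the mmcv or MMDetection source: the Registry class, build_from_cfg and Compose below are minimal re-implementations written for this example, and AddOne and Scale are hypothetical toy transforms.

```python
class Registry:
    """Hypothetical minimal name-to-class registry, loosely modelled on mmcv's Registry."""

    def __init__(self):
        self._module_dict = {}

    def register_module(self):
        def _register(cls):
            self._module_dict[cls.__name__] = cls
            return cls
        return _register

    def get(self, name):
        return self._module_dict[name]


PIPELINES = Registry()


def build_from_cfg(cfg, registry):
    """Steps (2) and (3): look up the class named by 'type' and instantiate it."""
    args = dict(cfg)  # copy so the original config is not mutated
    cls = registry.get(args.pop('type'))
    return cls(**args)


class Compose:
    """Steps (1) and (4): build every transform, then call them in order."""

    def __init__(self, transforms):
        self.transforms = [build_from_cfg(cfg, PIPELINES) for cfg in transforms]

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:  # a transform may drop the sample
                return None
        return results


@PIPELINES.register_module()
class AddOne:
    """Toy transform: add 1 to results[key]."""

    def __init__(self, key):
        self.key = key

    def __call__(self, results):
        results[self.key] += 1
        return results


@PIPELINES.register_module()
class Scale:
    """Toy transform: multiply results[key] by factor."""

    def __init__(self, key, factor=2):
        self.key = key
        self.factor = factor

    def __call__(self, results):
        results[self.key] *= self.factor
        return results


pipeline = Compose([
    dict(type='AddOne', key='x'),
    dict(type='Scale', key='x', factor=10),
])
out = pipeline({'x': 1})
print(out)  # {'x': 20}
```

Keeping the lookup in a registry means a config file only needs the class name as a string. A plugin such as BEVFormer's can add new transforms by registering them, without changing the framework code that builds the pipeline.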
