Learning the Python Pipeline Mechanism: The Pipeline in BEVFormer

信雪神话

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/hookie1990/article/details/148953305

The BEVFormer config file defines a data pipeline:

train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.3]),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]

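Each step does the following:
LoadMultiViewImageFromFiles: loads the multi-view camera images from files and converts them to float32.
PhotoMetricDistortionMultiViewImage: applies photometric distortion as data augmentation.
LoadAnnotations3D: loads the 3D bounding box annotations and labels (attribute labels are not loaded here, since with_attr_label=False).
ObjectRangeFilter: drops objects that fall outside the predefined point cloud range.
ObjectNameFilter: drops objects whose class is not in the given class list.
RandomScaleImageMultiViewImage: rescales the images by a factor picked at random from scales; because the list holds a single value here, the factor is always 0.3.
NormalizeMultiviewImage: normalizes the image data with the mean and standard deviation supplied in img_norm_cfg.
PadMultiViewImage: pads the images so that their height and width are divisible by 32.
DefaultFormatBundle3D: converts the data into the standard format expected by 3D detection models.
CustomCollect3D: collects the specified keys (gt_bboxes_3d, gt_labels_3d, img) as the model input.
The steps are chained together through the interfaces that the MMDetection3D framework provides, and during training they are applied to each sample in the order listed.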
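To make the effect of the scaling and padding steps concrete, the sketch below works through the image sizes by hand. It is only an illustration: the 900 x 1600 input size is an assumed example, the round-to-nearest rule in rescaled_size is an assumption about how the rescale rounds, and both helper functions are hypothetical names rather than part of BEVFormer or MMDetection3D.

```python
def rescaled_size(height, width, scale):
    """Size after uniform rescaling, rounded to the nearest pixel (assumed rule)."""
    return int(height * scale + 0.5), int(width * scale + 0.5)


def padded_size(height, width, divisor):
    """Smallest size >= (height, width) whose sides are multiples of divisor."""
    pad_h = -(-height // divisor) * divisor  # integer ceiling division
    pad_w = -(-width // divisor) * divisor
    return pad_h, pad_w


# Assumed input: one 900 x 1600 camera frame (example size, not from the article).
h, w = rescaled_size(900, 1600, scale=0.3)
pad_h, pad_w = padded_size(h, w, divisor=32)
print((h, w), (pad_h, pad_w))  # (270, 480) (288, 480)
```

Under these assumptions only the height needs padding (270 to 288), because 480 is already a multiple of 32.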

Config type	Class	File location
LoadMultiViewImageFromFiles	LoadMultiViewImageFromFiles	mmdet3d/datasets/pipelines/loading.py
PhotoMetricDistortionMultiViewImage	PhotoMetricDistortionMultiViewImage	projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
LoadAnnotations3D	LoadAnnotations3D	mmdet3d/datasets/pipelines/loading.py
ObjectRangeFilter	ObjectRangeFilter	mmdet3d/datasets/pipelines/transforms_3d.py
ObjectNameFilter	ObjectNameFilter	mmdet3d/datasets/pipelines/transforms_3d.py
RandomScaleImageMultiViewImage	RandomScaleImageMultiViewImage	projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
NormalizeMultiviewImage	NormalizeMultiviewImage	projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
PadMultiViewImage	PadMultiViewImage	projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py
DefaultFormatBundle3D	DefaultFormatBundle3D	mmdet3d/datasets/pipelines/formating.py
CustomCollect3D	CustomCollect3D	projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py

The paths above follow the upstream BEVFormer repository and the MMDetection3D release it builds on; they may differ in other versions or forks. PhotoMetricDistortionMultiViewImage, RandomScaleImageMultiViewImage, NormalizeMultiviewImage, PadMultiViewImage and CustomCollect3D live in the BEVFormer plugin directory and are registered into the same pipeline registry; the remaining classes are the ones MMDetection3D already implements.
For example, ~\BEVFormer\mmdetection3d\mmdet3d\datasets\pipelines\loading.py contains:

import mmcv
import numpy as np
from mmdet.datasets.builder import PIPELINES


@PIPELINES.register_module()
class LoadMultiViewImageFromFiles(object):
    """Load multi channel images from a list of separate channel files.
    Expects results['img_filename'] to be a list of filenames.
    Args:
        to_float32 (bool): Whether to convert the img to float32.
            Defaults to False.
        color_type (str): Color type of the file. Defaults to 'unchanged'.
    """

    def __init__(self, to_float32=False, color_type='unchanged'):
        self.to_float32 = to_float32
        self.color_type = color_type

    def __call__(self, results):
        """Call function to load multi-view image from files.
        Args:
            results (dict): Result dict containing multi-view image filenames.
        Returns:
            dict: The result dict containing the multi-view image data. \
                Added keys and values are described below.
                - filename (str): Multi-view image filenames.
                - img (np.ndarray): Multi-view image arrays.
                - img_shape (tuple[int]): Shape of multi-view image arrays.
                - ori_shape (tuple[int]): Shape of original image arrays.
                - pad_shape (tuple[int]): Shape of padded image arrays.
                - scale_factor (float): Scale factor.
                - img_norm_cfg (dict): Normalization configuration of images.
        """
        filename = results['img_filename']
        # img is of shape (h, w, c, num_views)
        img = np.stack(
            [mmcv.imread(name, self.color_type) for name in filename], axis=-1)
        if self.to_float32:
            img = img.astype(np.float32)
        results['filename'] = filename
        # unravel to list, see `DefaultFormatBundle` in formating.py
        # which will transpose each image separately and then stack into array
        results['img'] = [img[..., i] for i in range(img.shape[-1])]
        results['img_shape'] = img.shape
        results['ori_shape'] = img.shape
        # Set initial values for default meta_keys
        results['pad_shape'] = img.shape
        results['scale_factor'] = 1.0
        num_channels = 1 if len(img.shape) < 3 else img.shape[2]
        results['img_norm_cfg'] = dict(
            mean=np.zeros(num_channels, dtype=np.float32),
            std=np.ones(num_channels, dtype=np.float32),
            to_rgb=False)
        return results

    def __repr__(self):
        """str: Return a string that describes the module."""
        repr_str = self.__class__.__name__
        repr_str += f'(to_float32={self.to_float32}, '
        repr_str += f"color_type='{self.color_type}')"
        return repr_str

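This class loads multi-view images (for example, the views from several cameras) from files. It inherits from object and implements the standard data transform interface, namely the __call__ method, which takes the results dict and returns it with new keys added. That is what the MMDetection3D data pipeline convention requires.
When bevformer_tiny_rgb.py contains the following entry, the MMDetection3D framework uses this class to load the images: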

dict(type='LoadMultiViewImageFromFiles', to_float32=True)
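The sketch below illustrates the results-dict contract of such a transform without touching the file system. LoadMultiViewImageFromArrays and the img_arrays key are hypothetical stand-ins invented for this example; they only mirror the stack, cast and unravel steps of the class quoted above, using small synthetic arrays in place of images read with mmcv.imread.

```python
import numpy as np


class LoadMultiViewImageFromArrays:
    """Hypothetical stand-in that takes in-memory arrays instead of file names."""

    def __init__(self, to_float32=False):
        self.to_float32 = to_float32

    def __call__(self, results):
        # Stack the views along a new last axis: (h, w, c, num_views).
        img = np.stack(results['img_arrays'], axis=-1)
        if self.to_float32:
            img = img.astype(np.float32)
        # Unravel back into a list holding one (h, w, c) array per view.
        results['img'] = [img[..., i] for i in range(img.shape[-1])]
        results['img_shape'] = img.shape
        return results


# Six synthetic 4 x 6 RGB "camera views" (sizes chosen only for illustration).
views = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(6)]
loader = LoadMultiViewImageFromArrays(to_float32=True)
results = loader({'img_arrays': views})
print(len(results['img']), results['img'][0].shape, results['img_shape'])
# 6 (4, 6, 3) (4, 6, 3, 6)
```

The point to notice is that the transform receives one dict, adds keys to it, and returns it, so the next step in the pipeline can read img and img_shape.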

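The pipeline object is generally constructed in a build step such as build_pipeline(). In MMDetection-style datasets the work is done by Compose, which converts the list of dict(type='xxx') entries into a sequence of executable transform objects. The process typically involves the following steps:
(1). Iterate over every dict(type='xxx') in the pipeline config.
(2). Look up the class registered under that type name (for example, LoadMultiViewImageFromFiles) in the PIPELINES registry, a reflection-like lookup by name.
(3). Instantiate the class, passing the remaining entries of the dict as constructor arguments (for example, to_float32=True).
(4). Return a callable pipeline that the dataset applies to each sample during training.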
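The four steps above can be sketched in plain Python. This is a toy model under stated assumptions, not the mmcv or MMDetection implementation: the Registry and Compose classes here are simplified re-creations, and AddOffset and Scale are hypothetical transforms invented for the example.

```python
class Registry:
    """Toy name-to-class registry (illustrative; not mmcv's Registry)."""

    def __init__(self):
        self._classes = {}

    def register_module(self):
        def decorator(cls):
            self._classes[cls.__name__] = cls  # register under the class name
            return cls
        return decorator

    def build(self, cfg):
        args = dict(cfg)                       # copy so the config is not mutated
        cls = self._classes[args.pop('type')]  # step (2): look up by name
        return cls(**args)                     # step (3): instantiate with the rest


PIPELINES = Registry()


@PIPELINES.register_module()
class AddOffset:
    def __init__(self, offset=0):
        self.offset = offset

    def __call__(self, results):
        results['value'] += self.offset
        return results


@PIPELINES.register_module()
class Scale:
    def __init__(self, factor=1):
        self.factor = factor

    def __call__(self, results):
        results['value'] *= self.factor
        return results


class Compose:
    """Builds every transform from its config dict and applies them in order."""

    def __init__(self, transforms):
        # Step (1): iterate over the config list.
        self.transforms = [PIPELINES.build(cfg) for cfg in transforms]

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:  # a transform may drop the sample entirely
                return None
        return results


# Step (4): the composed pipeline is itself a callable.
pipeline = Compose([dict(type='AddOffset', offset=2), dict(type='Scale', factor=10)])
out = pipeline({'value': 1})
print(out)  # {'value': 30}
```

Registering by class name is what lets a plugin add new transforms without editing the framework: importing the plugin module runs the decorator, after which the new type string can be used in a config.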