The BEVFormer config file defines the training data pipeline:
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.3]),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]
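Each step does the following:
LoadMultiViewImageFromFiles: loads the multi-view camera images from files and converts them to float32.
PhotoMetricDistortionMultiViewImage: applies photometric distortion as data augmentation.
LoadAnnotations3D: loads the 3D bounding box annotations and labels.
ObjectRangeFilter: filters out objects that lie outside the predefined point cloud range.
ObjectNameFilter: filters out objects whose class is not in the specified class list.
RandomScaleImageMultiViewImage: rescales the images by a factor chosen from scales (only 0.3 is listed here, so the factor is always 0.3).
NormalizeMultiviewImage: normalizes the image data with the provided mean and standard deviation.
PadMultiViewImage: pads the images so that their height and width are divisible by 32.
DefaultFormatBundle3D: converts the data into the standard format expected by 3D detection models.
CustomCollect3D: collects the specified keys (gt_bboxes_3d, gt_labels_3d, img) as model input.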
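To make the effect of scales=[0.3] and size_divisor=32 concrete, the sketch below works through the image-size arithmetic only. It assumes a 900 x 1600 (height x width) input, which is the nuScenes camera resolution, and rounds the scaled size to the nearest integer. The helper names scaled_size and padded_size are hypothetical, and the real transforms may round differently.

```python
def scaled_size(height, width, scale):
    """Return the (height, width) after scaling both sides by `scale`.

    Assumption: the result is rounded to the nearest integer.
    """
    return int(round(height * scale)), int(round(width * scale))


def padded_size(height, width, divisor):
    """Return the smallest (height, width) >= the input that is divisible by `divisor`."""
    pad_h = (height + divisor - 1) // divisor * divisor
    pad_w = (width + divisor - 1) // divisor * divisor
    return pad_h, pad_w


h, w = scaled_size(900, 1600, 0.3)
print((h, w))                  # size after scaling
print(padded_size(h, w, 32))   # size after padding to a multiple of 32
```

Under these assumptions the width already divides by 32 after scaling, so only the height is padded.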
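These steps are chained together through the pipeline interface provided by the MMDetection3D framework and are applied to each sample, in the listed order, during training.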
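The chaining itself is simple: every step takes a results dict and returns it (possibly modified) for the next step. The following is a minimal sketch of that idea, not the framework code. The functions load_stub, normalize_stub and collect_stub are hypothetical stand-ins that work on tiny lists instead of real images; returning None to drop a sample follows the convention used by MMDetection's Compose.

```python
def run_pipeline(transforms, results):
    """Apply each transform in order; stop early if one returns None."""
    for transform in transforms:
        results = transform(results)
        if results is None:  # a transform may reject the sample
            return None
    return results


def load_stub(results):
    # Stand-in for image loading: one "view" with three pixel values.
    results['img'] = [[0.0, 128.0, 255.0]]
    return results


def normalize_stub(results, mean=128.0, std=64.0):
    # Stand-in for normalization: (pixel - mean) / std for every pixel.
    results['img'] = [[(p - mean) / std for p in view] for view in results['img']]
    return results


def collect_stub(results, keys=('img',)):
    # Stand-in for the collect step: keep only the requested keys.
    return {key: results[key] for key in keys}


out = run_pipeline(
    [load_stub, normalize_stub, collect_stub],
    {'img_filename': ['cam_front.jpg']},
)
print(out)
```

Because each step only reads and writes keys of the shared dict, steps can be reordered or swapped in the config without changing the others, as long as the keys they depend on are present.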
Config type | Class | File location |
---|---|---|
LoadMultiViewImageFromFiles | LoadMultiViewImageFromFiles | mmdet3d/datasets/pipelines/loading.py |
PhotoMetricDistortionMultiViewImage | PhotoMetricDistortionMultiViewImage | projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py |
LoadAnnotations3D | LoadAnnotations3D | mmdet3d/datasets/pipelines/loading.py |
ObjectRangeFilter | ObjectRangeFilter | mmdet3d/datasets/pipelines/transforms_3d.py |
ObjectNameFilter | ObjectNameFilter | mmdet3d/datasets/pipelines/transforms_3d.py |
RandomScaleImageMultiViewImage | RandomScaleImageMultiViewImage | projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py |
NormalizeMultiviewImage | NormalizeMultiviewImage | projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py |
PadMultiViewImage | PadMultiViewImage | projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py |
DefaultFormatBundle3D | DefaultFormatBundle3D | mmdet3d/datasets/pipelines/formating.py |
CustomCollect3D | CustomCollect3D | projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py |
The paths above follow the upstream BEVFormer repository and the MMDetection3D version it depends on; they may differ in other versions or forks. RandomScaleImageMultiViewImage was newly developed for the BEVFormer project and lives in its plugin. In the upstream repository, PhotoMetricDistortionMultiViewImage, NormalizeMultiviewImage, PadMultiViewImage and CustomCollect3D are also registered from the plugin directory, while the remaining classes are used directly from MMDetection3D.
For example, ~\BEVFormer\mmdetection3d\mmdet3d\datasets\pipelines\loading.py contains:
# Imports needed by this excerpt
import mmcv
import numpy as np
from mmdet.datasets.builder import PIPELINES


@PIPELINES.register_module()
class LoadMultiViewImageFromFiles(object):
    """Load multi channel images from a list of separate channel files.

    Expects results['img_filename'] to be a list of filenames.

    Args:
        to_float32 (bool): Whether to convert the img to float32.
            Defaults to False.
        color_type (str): Color type of the file. Defaults to 'unchanged'.
    """

    def __init__(self, to_float32=False, color_type='unchanged'):
        self.to_float32 = to_float32
        self.color_type = color_type

    def __call__(self, results):
        """Call function to load multi-view image from files.

        Args:
            results (dict): Result dict containing multi-view image filenames.

        Returns:
            dict: The result dict containing the multi-view image data. \
                Added keys and values are described below.

                - filename (str): Multi-view image filenames.
                - img (np.ndarray): Multi-view image arrays.
                - img_shape (tuple[int]): Shape of multi-view image arrays.
                - ori_shape (tuple[int]): Shape of original image arrays.
                - pad_shape (tuple[int]): Shape of padded image arrays.
                - scale_factor (float): Scale factor.
                - img_norm_cfg (dict): Normalization configuration of images.
        """
        filename = results['img_filename']
        # img is of shape (h, w, c, num_views)
        img = np.stack(
            [mmcv.imread(name, self.color_type) for name in filename], axis=-1)
        if self.to_float32:
            img = img.astype(np.float32)
        results['filename'] = filename
        # unravel to list, see `DefaultFormatBundle` in formating.py
        # which will transpose each image separately and then stack into array
        results['img'] = [img[..., i] for i in range(img.shape[-1])]
        results['img_shape'] = img.shape
        results['ori_shape'] = img.shape
        # Set initial values for default meta_keys
        results['pad_shape'] = img.shape
        results['scale_factor'] = 1.0
        num_channels = 1 if len(img.shape) < 3 else img.shape[2]
        results['img_norm_cfg'] = dict(
            mean=np.zeros(num_channels, dtype=np.float32),
            std=np.ones(num_channels, dtype=np.float32),
            to_rgb=False)
        return results

    def __repr__(self):
        """str: Return a string that describes the module."""
        repr_str = self.__class__.__name__
        repr_str += f'(to_float32={self.to_float32}, '
        repr_str += f"color_type='{self.color_type}')"
        return repr_str
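This class loads multi-view images (for example, the views of several cameras) from files. It inherits from object and implements the standard data transform interface, namely the __call__ method, so it conforms to MMDetection3D's data pipeline convention.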
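The shape bookkeeping in __call__ is easy to misread, because img_shape, ori_shape and pad_shape all describe the stacked array, including the trailing view axis. The sketch below reproduces only that bookkeeping with plain tuples and lists, with no image I/O. The helpers stacked_shape and initial_meta are hypothetical, and the six 900 x 1600 x 3 views are an assumed nuScenes-style input.

```python
def stacked_shape(view_shapes):
    """Shape of the array obtained by stacking equally shaped views on a new last axis."""
    first = tuple(view_shapes[0])
    if any(tuple(shape) != first for shape in view_shapes):
        raise ValueError('all views must share one shape to be stacked')
    return first + (len(view_shapes),)


def initial_meta(img_shape):
    """Initial metadata written after loading (plain lists instead of arrays)."""
    # Same rule as in the class: index 2 of the stacked shape is the channel count.
    num_channels = 1 if len(img_shape) < 3 else img_shape[2]
    return {
        'img_shape': img_shape,
        'ori_shape': img_shape,
        'pad_shape': img_shape,
        'scale_factor': 1.0,
        'img_norm_cfg': {
            'mean': [0.0] * num_channels,
            'std': [1.0] * num_channels,
            'to_rgb': False,
        },
    }


shape = stacked_shape([(900, 1600, 3)] * 6)  # six camera views
meta = initial_meta(shape)
print(shape)
print(meta['img_norm_cfg'])
```

The identity normalization config (zero mean, unit std) is only a placeholder at this stage; a later normalization step is expected to overwrite it.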
When bevformer_tiny_rgb.py contains the following entry, the MMDetection3D framework uses this class to load the images:
dict(type='LoadMultiViewImageFromFiles', to_float32=True)
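The data pipeline object is generally constructed in a build step (build_pipeline()), in which Compose converts the list of dict(type='xxx') entries into a sequence of callable transform objects. This usually involves the following steps:
(1). Iterate over every dict(type='xxx') entry in the pipeline config.
(2). Look up the class registered under that type name (for example LoadMultiViewImageFromFiles) in the PIPELINES registry, a reflection-like lookup by name.
(3). Instantiate the class, passing the remaining keys as arguments (for example to_float32=True).
(4). Return the callable pipeline for use during training.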
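The four steps above can be sketched with a toy registry. This is a simplified illustration, not the MMCV implementation: the Registry, build_from_cfg and Compose defined here only mirror the names used by MMCV and MMDetection, and LoadStub and CollectStub are hypothetical transforms invented for the example.

```python
class Registry:
    """Toy name-to-class registry (illustrative stand-in, not mmcv's Registry)."""

    def __init__(self, name):
        self.name = name
        self._classes = {}

    def register_module(self):
        def decorator(cls):
            self._classes[cls.__name__] = cls  # register under the class name
            return cls
        return decorator

    def get(self, key):
        if key not in self._classes:
            raise KeyError(f'{key} is not registered in {self.name}')
        return self._classes[key]


PIPELINES = Registry('pipeline')


def build_from_cfg(cfg, registry):
    """Steps (2) and (3): look up the class by `type`, then instantiate it."""
    args = dict(cfg)                        # copy so the config is not mutated
    cls = registry.get(args.pop('type'))
    return cls(**args)


class Compose:
    """Steps (1) and (4): build every entry and expose one callable."""

    def __init__(self, transforms):
        self.transforms = [build_from_cfg(cfg, PIPELINES) for cfg in transforms]

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
            if data is None:
                return None
        return data


@PIPELINES.register_module()
class LoadStub:
    def __init__(self, to_float32=False):
        self.to_float32 = to_float32

    def __call__(self, results):
        raw = results['raw']
        results['img'] = [float(v) for v in raw] if self.to_float32 else list(raw)
        return results


@PIPELINES.register_module()
class CollectStub:
    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        return {key: results[key] for key in self.keys}


pipeline = Compose([
    dict(type='LoadStub', to_float32=True),
    dict(type='CollectStub', keys=['img']),
])
out = pipeline({'raw': [1, 2, 3]})
print(out)
```

Registering classes by name is what lets a plugin add new transforms: once the module that defines a decorated class has been imported, its type string can be used in the config without changing the framework code.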