
3D bounding box regression using two variants of Point-
Net. The segmentation network predicts the 3D mask of
the object of interest (i.e. instance segmentation); and the
regression network estimates the amodal 3D bounding box
(covering the entire object even if only part of it is visible).
In contrast to previous work that treats RGB-D data as 2D maps for CNNs, our method is more 3D-centric: we lift depth maps to 3D point clouds and process them using 3D tools. This 3D-centric view lets us exploit the 3D data more effectively. First, in our pipeline, a sequence of transformations is applied successively to the 3D coordinates, aligning point clouds into progressively more constrained and canonical frames. These alignments factor out pose variation in the data, making 3D geometry patterns more evident and the job of the 3D learners easier. Second, learning in 3D space can better exploit the geometric and topological structure of 3D space.
In principle, all objects live in 3D space; therefore, we be-
lieve that many geometric structures, such as repetition, pla-
narity, and symmetry, are more naturally parameterized and
captured by learners that directly operate in 3D space. The
usefulness of this 3D-centric network design philosophy has
been supported by much recent experimental evidence.
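As a concrete illustration of the lifting step, the sketch below back-projects a depth map into a point cloud in the camera frame under a pinhole camera model. It is an illustration only, not the implementation used in this paper; the function name and the intrinsics fx, fy, cx, cy are assumptions for the example.

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    # Lift an H x W metric depth map to an (M, 3) point cloud in camera
    # coordinates, assuming a pinhole camera with the given intrinsics.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grid (column, row)
    valid = depth > 0                                # drop pixels with no depth reading
    z = depth[valid]
    x = (u[valid] - cx) * z / fx                     # X = (u - cx) * Z / fx
    y = (v[valid] - cy) * z / fy                     # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=-1)              # (M, 3) points in the camera frame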
Our method achieves leading positions on the KITTI 3D object detection [1] and bird's eye view detection [2] benchmarks. Compared with the previous state of the art [5], our method is 8.04% better on 3D car AP with high efficiency (running at 5 fps). Our method also fits well to indoor RGB-D data, where we achieve 8.9% and 6.4% better 3D mAP than [13] and [24] on SUN-RGBD while running one to three orders of magnitude faster.
The key contributions of our work are as follows:
• We propose a novel framework for RGB-D data based
3D object detection called Frustum PointNets.
• We show how we can train 3D object detectors un-
der our framework and achieve state-of-the-art perfor-
mance on standard 3D object detection benchmarks.
• We provide extensive quantitative evaluations to vali-
date our design choices as well as rich qualitative re-
sults for understanding the strengths and limitations of
our method.
2. Related Work
3D Object Detection from RGB-D Data Researchers have approached the 3D detection problem using various representations of RGB-D data.
Front view image based methods: [3, 19, 34] take monocular RGB images and shape priors or occlusion patterns to infer 3D bounding boxes. [15, 6] represent depth data as 2D maps and apply CNNs to localize objects in the 2D image. In comparison, we represent depth as a point cloud and use advanced 3D deep networks (PointNets) that can exploit 3D geometry more effectively.
Bird’s eye view based methods: MV3D [5] projects the LiDAR point cloud to a bird’s eye view and trains a region proposal network (RPN [23]) for 3D bounding box proposal. However, the method lags behind in detecting small objects such as pedestrians and cyclists, and cannot easily adapt to scenes with multiple objects in the vertical direction.
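For intuition about this representation, the sketch below rasterizes a LiDAR point cloud (x forward, y left, z up) into a binary bird's eye view occupancy grid; the ranges, resolution, and function name are illustrative assumptions and are not taken from MV3D [5].

import numpy as np

def point_cloud_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), res=0.1):
    # Rasterize an (N, 3) point cloud into a binary occupancy grid on the ground
    # plane. All points sharing a ground-plane cell collapse into one pixel, which
    # is why vertically stacked objects are hard to separate in this view.
    in_range = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
                (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[in_range]
    cols = ((pts[:, 0] - x_range[0]) / res).astype(int)   # discretize x into columns
    rows = ((pts[:, 1] - y_range[0]) / res).astype(int)   # discretize y into rows
    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    bev[rows, cols] = 1.0                                  # mark occupied cells
    return bev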
3D based methods: [31, 28] train 3D object classifiers with SVMs on hand-designed geometric features extracted from the point cloud and then localize objects with sliding-window search. [7] extends [31] by replacing the SVM with a 3D CNN on voxelized 3D grids. [24] designs new geometric features for 3D object detection in a point cloud. [29, 14] convert a point cloud of the entire scene into a volumetric grid and use a 3D volumetric CNN for object proposal and classification. The computational cost of these methods is usually quite high due to the expense of 3D convolutions and the large 3D search space. Recently, [13] proposes a 2D-driven 3D object detection method that is similar to ours in spirit. However, they use hand-crafted features (based on histograms of point coordinates) with simple fully connected networks to regress 3D box location and pose, which is sub-optimal in both speed and performance. In contrast, we propose a more flexible and effective solution with deep 3D feature learning (PointNets).
Deep Learning on Point Clouds Most existing works convert point clouds to images or volumetric forms before feature learning. [33, 18, 21] voxelize point clouds into volumetric grids and generalize image CNNs to 3D CNNs. [16, 25, 32, 7] design more efficient 3D CNN or neural network architectures that exploit sparsity in point clouds. However, these CNN based methods still require quantization of point clouds at a certain voxel resolution. Recently, a few works [20, 22] proposed a novel type of network architecture (PointNets) that directly consumes raw point clouds without converting them to other formats. While PointNets have been applied to single object classification and semantic segmentation, our work explores how to extend the architecture for the purpose of 3D object detection.
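To make concrete what directly consuming raw point clouds means, the toy sketch below shows the core PointNet idea from [20]: an MLP shared across all points followed by a symmetric max-pooling, so the output is invariant to point order. It is a simplified illustration, not the PointNet variants used in this paper; the class name and layer sizes are assumptions.

import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    # Toy PointNet-style classifier: shared per-point MLP + max-pooling.
    def __init__(self, num_classes=10):
        super().__init__()
        # 1x1 convolutions implement an MLP applied identically to every point.
        self.shared_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, points):                            # points: (B, N, 3)
        feat = self.shared_mlp(points.transpose(1, 2))    # (B, 1024, N)
        global_feat = feat.max(dim=2).values              # order-invariant pooling
        return self.classifier(global_feat)               # (B, num_classes) logits

logits = TinyPointNet()(torch.randn(4, 1024, 3))          # 4 clouds of 1024 points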
3. Problem Definition
Given RGB-D data as input, our goal is to classify and localize objects in 3D space. The depth data, obtained from LiDAR or indoor depth sensors, is represented as a point cloud in RGB camera coordinates. The projection matrix is also known, so that a 3D frustum can be obtained from a 2D image region (a minimal sketch of this step is given below).
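As a sketch of that step, the function below keeps the points whose image projection falls inside a 2D region, i.e. the points lying within the frustum extruded from that region; the function name and box format are assumptions for this illustration.

import numpy as np

def points_in_frustum(points, box2d, P):
    # Select the (N, 3) camera-frame points that project inside the 2D box
    # (xmin, ymin, xmax, ymax) under the known 3x4 projection matrix P.
    xmin, ymin, xmax, ymax = box2d
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    uvw = pts_h @ P.T                                        # project onto the image plane
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    in_box = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    in_front = points[:, 2] > 0                              # only points in front of the camera
    return points[in_box & in_front]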
Each object is represented by a class (one among k predefined classes) and an amodal 3D bounding box. The amodal box bounds the complete object even if part of the object is occluded or truncated. The 3D box is