How to Build a Data Closed-loop Platform for Autonomous Driving?

Yu Huang
Sunnyvale, California
Yu.huang07@gmail.com

Outline
• Introduction;
• data driven models for autonomous driving;
• cloud computing infrastructure and big data processing;
• annotation tools for training data;
• large scale model training platform;
• model testing and verification;
• related machine learning techniques;
• Conclusion.

Introduction
• Engineering development for autonomous driving has to solve a “long-tail problem” of rare events;
• The corner cases that do occur are a valuable source of data for data-driven algorithms & models.
https://www.self-driving-cars.org/

• Tesla’s data engine
ICML 2019

• Google Waymo’s ML factory
MIT 2019

• Nvidia’s AV ML platform MAGLEV

data driven models for autonomous driving
• A self-driving platform is usually classified as either an end-to-end (E2E) system or a modular system
“A Survey of Autonomous Driving: Common Practices and Emerging Technologies”

• An E2E system clearly applies data-driven models
“E2E Learning of Driving Models with Surround-View Cameras and Route Planners”

• Modular system
• Perception
• Mapping-Localization
• Prediction
• Planning
• Control
• Sensor Data Preprocessing
• Simulation

• Perception: 2D/3D detection, segmentation, tracking and (early/late) fusion etc.

• Mapping-Localization: semantic map, feature design, map update/online mapping, SLAM,
pose estimation and odometry etc.

• Prediction: trajectory forecasting, agent behavior & interaction, multimodal prediction, and
joint perception-prediction etc.

• Planning: reinforcement learning, imitation learning, inverse reinforcement learning,
localization & personalization of planning (aggressive or conservative), prediction-planning,
and mapping-localization-prediction-planning etc.

• Control: reinforcement learning, imitation learning, inverse reinforcement learning, and
planning-control etc.

• Sensor Data Preprocessing: pollution/dust detection, defogging, deraining, desnowing,
denoising, and enhancement etc.

• Simulation: vehicle/human, sensor, traffic, road and environment modeling etc.

cloud computing infrastructure and big data processing
• Data batch/stream processing, workflow management, distributed
computing, state monitoring and data storage.
AWS Momenta
Amazon Elastic Compute Cloud (EC2)
Amazon Elasticsearch Service
Amazon Kinesis
Amazon SageMaker

Apache Spark
Apache Kafka
Apache Flink
Apache Airflow

• Data batch/stream processing, resource monitoring & scheduling, workflow
management, distributed computing, state monitoring and data storage.
Apache Cassandra
Apache HBase
Apache Mesos

Kubernetes
Apache Hudi
Presto
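
Workflow managers such as Apache Airflow describe a pipeline as a directed acyclic graph (DAG) of tasks and run each task only after its dependencies have finished. The sketch below shows that idea with the Python standard library only; the stage names are hypothetical and are not taken from any of the platforms listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical stages of a data closed loop (names are assumptions).
# Each stage maps to the set of stages it depends on.
PIPELINE = {
    "ingest_logs": set(),
    "mine_corner_cases": {"ingest_logs"},
    "annotate": {"mine_corner_cases"},
    "train_model": {"annotate"},
    "evaluate_in_simulation": {"train_model"},
    "deploy": {"evaluate_in_simulation"},
}


def execution_order(dag):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(PIPELINE)
for stage in order:
    print(stage)
```

A real scheduler would add retries, resource allocation and state monitoring on top of this ordering step.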

annotation tools for training data
• There are manual, semi-automatic and fully automatic tools for annotation.

• Visualization tools are used for viewing, debugging and replaying the data, in addition to annotation.
Uber’s open-source visualization tool: Autonomous Visualization System (AVS)

XVIZ: a protocol for real-time transfer and visualization of autonomy data

streetscape.gl: a visualization toolkit for autonomy and robotics data encoded in the XVIZ protocol.

large scale model training platform
• Open-source deep learning training frameworks are available: Caffe was widely used earlier, and
the most popular ones now are TensorFlow and PyTorch.

Ring AllReduce Architecture
Parameter Server Architecture (PS)
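
To make the ring all-reduce pattern concrete, here is a minimal single-process simulation, assuming n workers arranged in a ring and a gradient split into n chunks (one number per chunk for brevity). It is a sketch of the communication schedule only, not the implementation used by any particular framework; the function name is an assumption.

```python
def ring_allreduce(grads):
    """Simulate a sum all-reduce over n workers arranged in a ring.

    grads[i] is worker i's gradient: a list of n numbers, one per chunk.
    Returns the per-worker data after the exchange; every worker ends up
    holding the element-wise sum of all gradients.
    """
    n = len(grads)
    data = [list(g) for g in grads]  # work on copies

    # Phase 1 (scatter-reduce): in step s, worker i sends chunk (i - s) mod n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] += value

    # Phase 2 (all-gather): worker i now owns the fully reduced chunk
    # (i + 1) mod n and passes finished chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] = value

    return data


result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)
```

Each worker sends and receives 2(n - 1) chunks in total, so per-worker traffic stays roughly constant as workers are added; in a parameter server design, by contrast, all workers exchange gradients and weights with central servers.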

model testing and verification
• Model testing and verification: simulation (MIL/SIL/HIL/VIL, i.e. model-, software-, hardware- and
vehicle-in-the-loop), closed driving districts, open driving areas, and end users (such as Tesla’s shadow mode).
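
The idea behind testing with users in a shadow mode can be sketched as follows: a candidate model runs silently next to the human driver, and frames where the two disagree are flagged for upload. This is a hypothetical illustration, not a description of Tesla’s system; the function name, the steering signal and the threshold are all assumptions.

```python
def disagreement_frames(driver_steering, shadow_steering, threshold=0.1):
    """Return indices of frames where the shadow model's steering command
    differs from the driver's by more than `threshold` (radians, assumed).
    """
    return [
        i
        for i, (driver, shadow) in enumerate(zip(driver_steering, shadow_steering))
        if abs(driver - shadow) > threshold
    ]


driver = [0.00, 0.05, 0.30, -0.20]
shadow = [0.00, 0.02, 0.05, -0.18]
flagged = disagreement_frames(driver, shadow)
print(flagged)
```

The flagged frames are candidates for the data closed loop: they can be uploaded, annotated and added to the training set.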

• Model Testing and Verification (MIL/SIL/HIL/VIL)
LiDARsim

S3: Shape, Skeleton, and Skinning

SceneGen

TrafficSim

GeoSim

AdvSim

SurfelGAN

• Testing from closed driving district

• Testing from open driving area

• Testing from users (such as Tesla’s shadow mode)

related machine learning techniques
• Active learning
• OOD detection & Corner case detection
• Data augmentation/Adversarial learning
• Transfer learning/Domain adaptation
• AutoML/Meta-learning
• Semi-supervised learning
• Self-supervised learning
• Zero/Few shot learning
• Continual learning/Open world learning

• Active learning: The goal of active learning is to find effective ways to choose data points to
label, from a pool of unlabeled data points, in order to maximize the accuracy. Active learning is
typically an iterative process in which a model is learned at each iteration and a set of points is
chosen to be labelled from a pool of unlabeled points using some heuristics.
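
One selection step of such a loop can be sketched as below. The heuristic used here is predictive entropy (uncertainty sampling), which is only one common choice and is an assumption of this example; the sample ids and probabilities are made up.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, k):
    """Pick the k unlabeled samples the current model is least sure about.

    pool_probs maps a sample id to the model's predicted class probabilities.
    """
    ranked = sorted(pool_probs, key=lambda sid: entropy(pool_probs[sid]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for three unlabeled frames.
pool = {"frame_a": [0.98, 0.02], "frame_b": [0.50, 0.50], "frame_c": [0.70, 0.30]}
chosen = select_for_labeling(pool, k=2)
print(chosen)
```

In the full iterative process the chosen samples are labelled, added to the training set, the model is retrained, and the selection is repeated.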

• OOD detection & Corner case detection: Detecting out-of-distribution (OOD) samples based on
uncertainty estimates is important in safety-critical applications; corner case detection is the
challenging task of finding unusual situations that may become critical. It is used online to alert
the autonomous driving system, and offline to screen vast amounts of data and select only the
relevant situations.
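
A simple uncertainty-based OOD check is to threshold the maximum softmax probability of a classifier. The sketch below assumes raw class logits are available; the threshold value is arbitrary and would need tuning on held-out data.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_ood(logits, min_confidence=0.6):
    """Flag a sample as OOD when the top class probability is low.

    min_confidence is an assumed threshold, not a recommended value.
    """
    return max(softmax(logits)) < min_confidence


print(is_ood([5.0, 0.1, 0.2]))  # confident prediction
print(is_ood([1.0, 1.1, 0.9]))  # near-uniform prediction
```

The same score can serve both use cases: as an online warning signal and as an offline filter over recorded data.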

• Data augmentation/Adversarial learning: Data augmentation encompasses a suite of techniques that
enhance the size and quality of training datasets such that better deep learning models can be
built using them; adversarial training can be an effective method for searching for augmentations.
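
As a small example of a label-preserving augmentation, the sketch below mirrors an image horizontally and updates its 2D bounding boxes to match. The image is a plain list of pixel rows and boxes are `(x_min, y_min, x_max, y_max)` edge coordinates; both conventions are assumptions of this example.

```python
def hflip(image, boxes):
    """Mirror an image left-right and transform its boxes accordingly.

    image: list of rows, each a list of pixel values.
    boxes: list of (x_min, y_min, x_max, y_max) in pixel-edge coordinates.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped_image, flipped_boxes


image = [[1, 2, 3], [4, 5, 6]]
boxes = [(0, 0, 1, 2)]  # covers the left-most column
new_image, new_boxes = hflip(image, boxes)
print(new_image, new_boxes)
```

For driving data a flip also changes left-hand and right-hand traffic semantics, so whether it is a valid augmentation depends on the task.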

• Transfer learning/Domain adaptation: Transfer learning (TL) relaxes the assumption that the
training data must be independent and identically distributed (i.i.d.) with the test data, which
makes it a way to counter the problem of insufficient training data; domain adaptation (DA) is a
particular case of transfer learning that utilizes labeled data in one or more relevant source
domains to execute new tasks in a target domain.

• AutoML/Meta-learning: Automated Machine Learning (AutoML) is designed to reduce the demand for
data scientists and enable domain experts to automatically build machine learning applications
without much requirement for statistical and machine learning knowledge; meta-learning is closely
related to AutoML since they share the same objects of study, namely the learning tools and
learning problems.
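
One of the simplest building blocks of AutoML is hyperparameter search. The sketch below runs a random search over a small discrete space; the search space and the toy objective standing in for a validation loss are assumptions made for illustration.

```python
import math
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample n_trials random configurations and keep the one with lowest loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss


# Hypothetical search space and a toy stand-in for "train, then validate".
SPACE = {"lr": [1e-1, 1e-2, 1e-3, 1e-4], "depth": [2, 4, 8]}


def toy_validation_loss(cfg):
    return (math.log10(cfg["lr"]) + 3) ** 2 + 0.1 * (cfg["depth"] - 4) ** 2


best_cfg, best_loss = random_search(toy_validation_loss, SPACE)
print(best_cfg, best_loss)
```

In practice the objective would train and evaluate a model, and the search could be extended to architecture choices (neural architecture search).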

• Semi-supervised learning: Semi-supervised learning leverages unlabeled data to produce a
prediction function with trainable parameters that is more accurate than what would have been
obtained by using only the labeled data.
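
Self-training with pseudo-labels is one simple semi-supervised scheme. The sketch below uses a 1D nearest-centroid classifier: unlabeled points that are clearly closer to one class centroid receive that class as a pseudo-label, and the centroids are then re-estimated. The classifier, the margin rule and the data are all assumptions chosen to keep the example small.

```python
def centroids(points, labels):
    """Mean of the points of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0):
    """One round of pseudo-labeling followed by refitting the centroids."""
    cents = centroids(labeled_x, labeled_y)
    xs, ys = list(labeled_x), list(labeled_y)
    for x in unlabeled_x:
        dists = sorted((abs(x - c), y) for y, c in cents.items())
        # Accept the pseudo-label only if the nearest centroid wins clearly.
        if len(dists) == 1 or dists[1][0] - dists[0][0] >= margin:
            xs.append(x)
            ys.append(dists[0][1])
    return centroids(xs, ys)


model = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 9.0, 5.0])
print(model)
```

The point at 5.0 is equally far from both centroids, so it is left unlabeled rather than risking a wrong pseudo-label.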

• Self-supervised learning: Self-supervised learning can be viewed as a branch of unsupervised
learning that aims at recovering rather than discovering; it uses a pretext task to learn
representations from unlabeled data.
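
Rotation prediction is one well-known pretext task: the training labels are generated from the data itself by rotating each image and asking a model to recover the rotation. The sketch below only builds those (input, label) pairs; the choice of pretext task and the list-of-rows image format are assumptions.

```python
def rotate90(image):
    """Rotate an image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_pretext_samples(image):
    """Return four (rotated_image, label) pairs, label k meaning k * 90 degrees."""
    samples = []
    current = [list(row) for row in image]
    for k in range(4):
        samples.append((current, k))
        current = rotate90(current)
    return samples


samples = rotation_pretext_samples([[1, 2], [3, 4]])
for rotated, label in samples:
    print(label, rotated)
```

A network trained to predict the label has to pick up on object shape and orientation, and its learned representation can then be reused for downstream tasks.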

• Zero/Few shot learning: Zero-shot learning (ZSL) aims to recognize objects whose instances may
not have been seen during training, and belongs to transfer learning; few-shot learning (FSL)
learns a task from limited supervised information; many FSL methods are meta-learning methods
that use the meta-learner as prior knowledge.
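
A common few-shot recipe is to average the embeddings of the few labeled examples of each class into a prototype and assign a query to the nearest prototype. The sketch below assumes the embeddings have already been produced by some pretrained encoder; the class names and vectors are made up.

```python
import math


def prototypes(support):
    """Average the few support embeddings of each class into one prototype.

    support maps a class label to a list of embedding vectors.
    """
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return protos


def classify(query, protos):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Two-shot support set with hypothetical 2D embeddings.
support = {
    "cone": [[0.0, 1.0], [0.2, 0.8]],
    "barrier": [[1.0, 0.0], [0.8, 0.2]],
}
protos = prototypes(support)
print(classify([0.1, 0.9], protos), classify([0.9, 0.2], protos))
```

Adding a new class only requires computing one more prototype, with no retraining of the encoder.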

• Continual learning/Open world learning: Continual learning accumulates knowledge continually
over different tasks without the need to retrain from scratch; open set recognition (OSR)
requires classifiers not only to classify the seen classes accurately but also to deal effectively
with unseen ones; open world learning can be seen as a sub-task of continual learning.
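
Rehearsal is one family of continual learning methods: a small buffer of past samples is replayed alongside new data so that earlier tasks are not forgotten. The sketch below keeps such a buffer with reservoir sampling, so every sample seen so far has the same chance of being retained; the class name and capacity are assumptions.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past samples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Keep the new sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample


buffer = ReplayBuffer(capacity=5)
for sample_id in range(100):
    buffer.add(sample_id)
print(buffer.items)
```

During training on a new task, mini-batches would mix fresh samples with samples drawn from `buffer.items`.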

Conclusion
• In summary, the key to building a data closed loop is an abundant source of data.
• The data-driven models and algorithms applied to autonomous driving tasks are the foundation.
• The direction in which such a system is upgraded depends on:
  • Data modality (camera, LiDAR, radar, IMU etc.)
  • Data-driven model architecture (AutoML)
  • Policy for selecting and using the data (corner cases).
