How to Build a Data Closed-loop Platform for Autonomous Driving?
Yu Huang
Sunnyvale, California
Yu.huang07@gmail.com
Outline
• Introduction;
• data-driven models for autonomous driving;
• cloud computing infrastructure and big data processing;
• annotation tools for training data;
• large scale model training platform;
• model testing and verification;
• related machine learning techniques;
• Conclusion.
Introduction
• Autonomous driving development is largely about solving the “long-tail problem” of rare events;
• Corner cases, when they occur, are valuable data sources for data-driven algorithms and models.
https://www.self-driving-cars.org/
• Tesla’s data engine
ICML 2019
• Google Waymo’s ML factory
MIT 2019
• Nvidia’s AV ML platform MAGLEV
data-driven models for autonomous driving
• A self-driving platform is usually classified as either an end-to-end (E2E) system or a modular system
“A Survey of Autonomous Driving: Common Practices and Emerging Technologies” 
• An E2E system, by its nature, is built on data-driven models
“E2E Learning of Driving Models with Surround-View Cameras and Route Planners”
• Modular system:
  – Perception
  – Mapping-Localization
  – Prediction
  – Planning
  – Control
  – Sensor Data Preprocessing
  – Simulation
• Perception: 2D/3D detection, segmentation, tracking, and (early/late) fusion, etc.
• Mapping-Localization: semantic map, feature design, map update/online mapping, SLAM,
pose estimation and odometry etc.
• Prediction: trajectory forecasting, agent behavior and interaction, multimodal prediction, perception-prediction, etc.
• Planning: reinforcement learning, imitation learning, inverse reinforcement learning, localization and personalization of planning (aggressive or conservative), prediction-planning, and mapping-localization-prediction-planning, etc.
• Control: reinforcement learning, imitation learning, inverse reinforcement learning, and
planning-control etc.
• Sensor Data Preprocessing: pollution/dust detection, defogging, deraining, desnowing,
denoising, and enhancement etc.
• Simulation: vehicle/human, sensor, traffic, road, and environment modeling, etc.
cloud computing infrastructure and big data processing
• Data batch/stream processing, workflow management, distributed
computing, state monitoring and data storage.
AWS Momenta
Amazon Elastic Compute Cloud (EC2)
Amazon Elasticsearch Service
Amazon Kinesis
Amazon SageMaker
Apache Spark
Apache Kafka
Apache Flink
Apache Airflow
• Data batch/stream processing, resource monitoring & scheduling, workflow
management, distributed computing, state monitoring and data storage.
Apache Cassandra
Apache HBase
Apache Mesos
Kubernetes
Apache Hudi
Presto
annotation tools for training data
• Annotation tools can be manual, semi-automatic, or fully automatic.
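The difference between semi-automatic and fully automatic annotation can be sketched as a triage step. This is a minimal sketch under stated assumptions: a pre-trained model has already proposed labels with confidence scores, and the `PreLabel` record, the `triage` function and the 0.95 auto-accept threshold are all hypothetical. Proposals above the threshold are accepted automatically; the rest go to a human annotator, who corrects them instead of labeling from scratch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PreLabel:
    """A model-proposed annotation (hypothetical record for illustration)."""
    frame_id: str
    label: str
    confidence: float  # model confidence in [0, 1]


def triage(pre_labels: List[PreLabel],
           auto_accept: float = 0.95) -> Tuple[List[PreLabel], List[PreLabel]]:
    """Split pre-labels into auto-accepted ones and ones needing human review."""
    accepted, needs_review = [], []
    for p in pre_labels:
        # Fully automatic path: trust confident proposals as-is.
        # Semi-automatic path: a human corrects the remaining proposals.
        (accepted if p.confidence >= auto_accept else needs_review).append(p)
    return accepted, needs_review
```

Lowering `auto_accept` shifts work from humans to the model at the cost of more label noise, so the threshold is usually a quality/cost trade-off.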
• Besides annotation, visualization tools are used for viewing, debugging, and replaying the data.
Uber’s open-source visualization tool: Autonomous Visualization System (AVS)
“XVIZ”: Protocol for Real-Time Transfer and Visualization of Autonomy Data
streetscape.gl: a visualization toolkit for autonomy and robotics data encoded in the XVIZ protocol.
large scale model training platform
• Open deep learning training platforms are available: Caffe was popular earlier, and the most popular ones today are TensorFlow and PyTorch.
Ring AllReduce Architecture
Parameter Server Architecture (PS)
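To show how the Ring AllReduce architecture works without a central server, here is a single-process simulation (an illustrative assumption; real frameworks run each worker on its own GPU or host and exchange chunks over the network). Each of the n workers splits its gradient into n chunks; a reduce-scatter phase of n - 1 steps leaves every worker with one fully summed chunk, and an all-gather phase of n - 1 steps circulates those chunks so that every worker ends with the complete sum. Each worker only ever talks to its ring neighbours.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate ring AllReduce (sum) over len(grads) workers in one process."""
    n = len(grads)
    length = len(grads[0])
    # Split every vector into n contiguous chunks (sizes may differ by one).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    buf = [list(g) for g in grads]  # each worker's local copy

    # Phase 1, reduce-scatter: at step s, worker i sends chunk (i - s) % n to
    # worker (i + 1) % n, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, buf[i][lo:hi]))  # slice = snapshot before updates
        for i in range(n):
            c, data = outgoing[(i - 1) % n]  # message from the left neighbour
            lo, _ = bounds[c]
            for j, v in enumerate(data):
                buf[i][lo + j] += v
    # Now worker i holds the fully summed chunk (i + 1) % n.

    # Phase 2, all-gather: circulate the summed chunks around the ring.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, buf[i][lo:hi]))
        for i in range(n):
            c, data = outgoing[(i - 1) % n]
            lo, hi = bounds[c]
            buf[i][lo:hi] = data  # overwrite with the final summed values
    return buf


workers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
print(ring_allreduce(workers))  # every worker ends with [111, 222, 333, 444]
```

Each worker sends about 2(n - 1)/n times its gradient size in total, roughly independent of the number of workers. In the Parameter Server architecture, by contrast, every worker pushes gradients to and pulls parameters from the server(s), which can become a bandwidth bottleneck as the cluster grows.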
model testing and verification
• Model testing and verification: simulation (model-, software-, hardware- and vehicle-in-the-loop, i.e., MIL/SIL/HIL/VIL), a closed driving district, an open driving area, and users (such as Tesla’s shadow mode).
• Simulation-based testing and verification (MIL/SIL/HIL/VIL):
LiDARsim
S3: Shape, Skeleton, and Skinning
SceneGen
TrafficSim
GeoSim
AdvSim
SurfelGAN
• Testing in a closed driving district
• Testing in an open driving area
• Testing with users (such as Tesla’s shadow mode)
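Shadow-mode testing can be sketched as follows: the model under test runs passively alongside the human driver, its outputs are never executed, and frames where it disagrees with the driver are flagged as candidates for upload and analysis. Tesla's actual trigger logic is not described here; the `Frame` fields, the `shadow_mode_triggers` function and the 5-degree steering tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Frame:
    """One time step of logged driving data (hypothetical schema)."""
    t: float             # timestamp in seconds
    human_steer: float   # driver's steering-wheel angle in degrees
    model_steer: float   # shadow model's proposed angle in degrees (never executed)
    human_brake: bool    # did the driver brake?
    model_brake: bool    # would the shadow model have braked?


def shadow_mode_triggers(frames: Iterable[Frame],
                         steer_tol_deg: float = 5.0) -> List[float]:
    """Return timestamps where the shadow model disagrees with the human driver."""
    triggers = []
    for f in frames:
        steer_mismatch = abs(f.human_steer - f.model_steer) > steer_tol_deg
        brake_mismatch = f.human_brake != f.model_brake
        if steer_mismatch or brake_mismatch:
            triggers.append(f.t)  # candidate clip for upload and review
    return triggers
```

Disagreement clips found this way are the kind of corner-case data that feeds back into annotation and training, closing the loop.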
related machine learning techniques
• Active learning
• OOD detection & Corner case detection
• Data augmentation/Adversarial learning
• Transfer learning/Domain adaptation
• AutoML/Meta-learning
• Semi-supervised learning
• Self-supervised learning
• Zero/Few-shot learning
• Continual learning/Open world learning
• Active learning: the goal of active learning is to choose which data points to label,
from a pool of unlabeled data points, so as to maximize model accuracy. Active learning
is typically an iterative process: at each iteration a model is trained, and a set of
points is selected for labeling from the unlabeled pool using some heuristic.
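The iterative process above can be sketched with entropy-based uncertainty sampling, one common selection heuristic (the text does not prescribe a specific one). The `train`, `predict_proba` and `annotate` callables are hypothetical placeholders for the training job, model inference and the annotation tool.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs: Dict[Hashable, Sequence[float]],
                        budget: int) -> List[Hashable]:
    """Pick the `budget` unlabeled samples the current model is least sure about."""
    ranked = sorted(pool_probs, key=lambda k: entropy(pool_probs[k]), reverse=True)
    return ranked[:budget]


def active_learning_loop(labeled: Dict, unlabeled: List,
                         train: Callable, predict_proba: Callable,
                         annotate: Callable, budget: int, rounds: int) -> Dict:
    """Iterate: train, score the unlabeled pool, query the most uncertain samples."""
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)                                   # retrain on current labels
        probs = {x: predict_proba(model, x) for x in unlabeled}  # score the pool
        for x in select_for_labeling(probs, budget):
            labeled[x] = annotate(x)                             # send to annotation
            unlabeled.remove(x)
    return labeled
```

For example, with the pool `{"frame_a": [0.98, 0.01, 0.01], "frame_b": [0.4, 0.35, 0.25], "frame_c": [0.7, 0.2, 0.1]}` and `budget=2`, `select_for_labeling` returns `["frame_b", "frame_c"]`, the two frames with the flattest predicted distributions.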
• OOD detection & Corner case detection: detecting out-of-distribution (OOD) samples based on uncertainty
estimates is important in safety-critical applications. Corner case detection is the challenging task of
detecting unusual situations that could become critical: online, such detections can be communicated to the
autonomous driving system; offline, they can be used to screen vast amounts of data and select only the
relevant situations.
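A simple uncertainty-based OOD score is the maximum softmax probability (MSP) baseline: the less confident the classifier's top prediction, the more likely the input is out of distribution. The sketch below applies it to the offline screening use case; the `screen_frames` helper and the 0.5 threshold are hypothetical, and in practice the threshold would be tuned on validation data.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_ood_score(logits: Sequence[float]) -> float:
    """1 - max softmax probability: near 0 = confident, larger = more likely OOD."""
    return 1.0 - max(softmax(logits))


def screen_frames(frame_logits: Dict[str, Sequence[float]],
                  threshold: float = 0.5) -> List[str]:
    """Offline screening: keep only frames whose OOD score exceeds the threshold."""
    return [fid for fid, logits in frame_logits.items()
            if msp_ood_score(logits) > threshold]
```

MSP is only a baseline; dedicated uncertainty estimates (ensembles, MC dropout, density models) usually separate corner cases better, but they plug into the same screening step.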
• Data augmentation/Adversarial learning: data augmentation encompasses a suite of techniques
that enhance the size and quality of training datasets so that better deep learning models can be
built from them; adversarial training can be an effective way to search for augmentations.
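As a concrete example of label-preserving augmentation, a horizontal flip must transform the 2D bounding boxes together with the image. This minimal sketch assumes images stored as lists of pixel rows and boxes as `(x_min, y_min, x_max, y_max)` in continuous pixel coordinates (pixel column j spans [j, j + 1)); it covers only the data augmentation half, not adversarial augmentation search.

```python
from typing import List, Tuple

Image = List[List[int]]                    # rows of pixel values
Box = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max)


def hflip(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror an image left-right and move its boxes accordingly."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # x -> width - x; min and max swap roles after mirroring.
    new_boxes = [(width - x_max, y_min, width - x_min, y_max)
                 for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, new_boxes


def augment(dataset: List[Tuple[Image, List[Box]]]) -> List[Tuple[Image, List[Box]]]:
    """Double the dataset by adding a flipped copy of every sample."""
    return dataset + [hflip(img, boxes) for img, boxes in dataset]
```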
• Transfer learning/Domain adaptation: transfer learning (TL) relaxes the assumption that the
training data must be independent and identically distributed (i.i.d.) with the test data, which
makes it a natural remedy for insufficient training data; domain adaptation (DA) is a special case
of TL that uses labeled data from one or more related source domains to perform new tasks in a
target domain.
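One of the simplest domain adaptation ideas, in the spirit of AdaBN, is to re-estimate feature normalization statistics on the target domain instead of reusing the source-domain ones, so that per-dimension shifts and scale changes between domains (for example, a different camera or weather) are removed before the classifier is applied. The sketch assumes features are already extracted as fixed-length vectors; it is a simplification, not a full AdaBN implementation inside a network.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Stats = Tuple[List[float], List[float]]  # per-dimension (means, stds)


def domain_stats(features: List[List[float]]) -> Stats:
    """Per-dimension mean and population std of one domain's features."""
    columns = list(zip(*features))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return means, stds


def normalize(features: List[List[float]], stats: Stats) -> List[List[float]]:
    """Standardize features with the given domain's statistics."""
    means, stds = stats
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in features]
```

For source features `[[0.0], [2.0]]` and target features `[[10.0], [14.0]]`, each normalized with its own domain's statistics, both become `[[-1.0], [1.0]]`, so a decision threshold learned on normalized source features carries over to the target.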
• AutoML/Meta-learning: Automated Machine Learning (AutoML) aims to reduce the demand for data
scientists and to let domain experts build machine learning applications automatically, without
requiring much statistical or machine learning knowledge; meta-learning is closely related to AutoML,
since both study the same objects, namely learning tools and learning problems.
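Much of AutoML comes down to an automated search over model and training configurations. The sketch below shows the simplest such strategy, random search; the search space and the `evaluate` function (which would train a model with a configuration and return a validation score) are hypothetical placeholders, and practical systems use stronger strategies such as Bayesian optimization or neural architecture search.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def random_search(search_space: Dict[str, List[Any]],
                  evaluate: Callable[[Dict[str, Any]], float],
                  n_trials: int, seed: int = 0) -> Tuple[Dict[str, Any], float]:
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    best_cfg: Dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(cfg)  # e.g. train briefly, return validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```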
• Semi-supervised learning: semi-supervised learning leverages unlabeled data to produce a
prediction function with trainable parameters that is more accurate than one obtained using
only the labeled data.
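A classic way to exploit unlabeled data is self-training with pseudo-labels: train on the labeled set, label the unlabeled samples the model is confident about, add them to the training set, and repeat. This sketch assumes hypothetical `fit` and `predict_proba` callables (any classifier would do) and a default confidence threshold of 0.9.

```python
from typing import Any, Callable, List, Sequence, Tuple


def self_train(
    labeled: List[Tuple[Any, int]],
    unlabeled: List[Any],
    fit: Callable[[List[Tuple[Any, int]]], Any],
    predict_proba: Callable[[Any, Any], Sequence[float]],
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Any, List[Tuple[Any, int]]]:
    """Pseudo-labeling: repeatedly add confident predictions to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit(labeled)
        still_unlabeled, newly_labeled = [], []
        for x in pool:
            probs = predict_proba(model, x)
            best = max(range(len(probs)), key=lambda k: probs[k])
            if probs[best] >= threshold:
                newly_labeled.append((x, best))  # accept the pseudo-label
            else:
                still_unlabeled.append(x)
        if not newly_labeled:
            break  # nothing confident left; stop early
        labeled += newly_labeled
        pool = still_unlabeled
    return fit(labeled), labeled
```

The threshold matters: set too low, wrong pseudo-labels get reinforced round after round (confirmation bias).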
• Self-supervised learning: self-supervised learning can be viewed as a branch of unsupervised
learning that aims at recovering rather than discovering; it uses a pretext task to learn
representations from unlabeled data.
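Rotation prediction is one well-known pretext task: each unlabeled image is rotated by 0, 90, 180 and 270 degrees, and a network is trained to predict the rotation, so the labels are recovered from the data itself. The sketch below only generates such training pairs (images as lists of pixel rows are an assumption); the network that learns from them is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]


def rot90_cw(img: Image) -> Image:
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotation_pretext(images: List[Image]) -> List[Tuple[Image, int]]:
    """Turn unlabeled images into (rotated image, k) pairs, k = number of 90° turns."""
    samples = []
    for img in images:
        rotated = img
        for k in range(4):
            samples.append((rotated, k))  # the label k comes for free
            rotated = rot90_cw(rotated)
    return samples
```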
• Zero/Few-shot learning: zero-shot learning (ZSL) aims to recognize objects whose instances may
not have been seen during training, and can be viewed as a form of transfer learning; few-shot
learning (FSL) learns a task from limited supervised information; many FSL methods are
meta-learning methods that use the meta-learner as prior knowledge.
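As an example of a metric-based meta-learning approach to few-shot learning, in the style of prototypical networks, each class is represented by the mean embedding of its few labeled examples, and a query is assigned to the nearest prototype. The embeddings are assumed to come from an encoder that has already been meta-trained; the class names and vectors in the test are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def prototypes(support: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Class prototype = mean embedding of that class's few labeled (support) examples."""
    return {c: [fmean(dim) for dim in zip(*embs)] for c, embs in support.items()}


def classify(query: Sequence[float], protos: Dict[str, List[float]]) -> str:
    """Assign the query to the class with the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda c: math.dist(query, protos[c]))
```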
• Continual learning/Open world learning: continual learning accumulates knowledge over different
tasks without the need to retrain from scratch; open set recognition (OSR) requires classifiers not
only to classify the seen classes accurately but also to deal effectively with unseen ones; open world
learning can be seen as a subtask of continual learning.
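The two ideas can be combined in a small prototype-based sketch: new classes are added incrementally without retraining or touching existing prototypes (continual learning in its simplest form), and inputs farther than a rejection distance from every known prototype are reported as "unknown" (open set recognition). The class name, the fixed embedding space and the rejection radius are hypothetical simplifications; real continual learners must also cope with a changing feature extractor and catastrophic forgetting.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


class OpenWorldPrototypeClassifier:
    """Nearest-prototype classifier with open-set rejection (illustrative sketch)."""

    def __init__(self, reject_distance: float):
        self.reject_distance = reject_distance  # hypothetical rejection radius
        self.protos: Dict[str, List[float]] = {}

    def add_class(self, name: str, embeddings: List[Sequence[float]]) -> None:
        """Learn a new class incrementally; existing prototypes are left untouched."""
        self.protos[name] = [fmean(dim) for dim in zip(*embeddings)]

    def predict(self, x: Sequence[float]) -> str:
        if not self.protos:
            return "unknown"
        best = min(self.protos, key=lambda c: math.dist(x, self.protos[c]))
        if math.dist(x, self.protos[best]) > self.reject_distance:
            return "unknown"  # open-set rejection: too far from every known class
        return best
```

In an open-world loop, inputs rejected as "unknown" would be collected, annotated, and then registered with `add_class`.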
Conclusion
• In summary, the key to building a data closed loop is a rich source of data.
• The data-driven models and algorithms used to solve autonomous driving tasks are the
foundation.
• How this system evolves depends on:
  – Data modality (camera, LiDAR, radar, IMU, etc.)
  – Data-driven model architecture (AutoML)
  – The policy for selecting and using the data (corner cases)
Thank You
