Deep Learning for
Embedded Vision System
Vion Technologies Co., Ltd.
Jan. 11th, 2017
Hai Tao, Dr.
Credits to all my colleagues who made
this presentation possible
Essentials
Founded in 2005
2014.12: Series A Funding
2016: NEEQ / Series B Funding
Vion Technologies: A Leader in the Field of Computer Vision
• Vion Technologies Co. Ltd., founded in 2005, currently employs
more than 200 staff. The company develops complete CV hardware/software
solutions for intelligent transportation systems (ITS), smart video
surveillance systems and business intelligence systems.
• Huge potential for CV products in ToB markets
• Every year more than 40 million surveillance cameras are sold
globally (IDC data analysis)
• High resolution (720p, 1080p, even 4K resolution) IP cameras
are replacing the D1 resolution analog cameras
• Better algorithms enable more ToB applications
• High performance, low power consumption, low cost
processors are available
• IoT + Computer Vision: Where Are the Applications?
• Embedded CV Hardware
• GPU, VPU, and FPGA
Intersection violation capture & smart
plate number recognition & light control
Transit & emergency vehicle lane use
capture
Parking Violation Capture
Smart Traffic
Smart parking management
Industry Applications
Malls
Retail Stores
Theatre People Counting
Subway People Counting
Cultural Attraction Guest Traffic
Transit People Counting
Smart People Counting
Industry Applications
Security & Counterterrorism: Fighting
Security & Counterterrorism: Chasing
Rail: Driver Fatigue
Intruder Alerts
Banking: ATM Protection
Mining: Production Safety
Public Security, City Management, Banking, Rail,
Border Control and Many More ...
Industry Applications
• IoT + Computer Vision: Where Are the Applications?
• Embedded CV Hardware
• GPU, VPU, and FPGA
• Integrated image sensing and analysis
• Wi-Fi probe & iBeacon
• PoE powered
• Patented exterior design, screw-free installation
• H.264 real-time video output
• 2-year data storage
• Integrated image sensing and analysis
• RS485, GPIO
• Patented exterior design, made specially for transit
• H.264 real-time video output
• 2-year data storage
• IP65, sealed against dust & water
• Sensor rich (multi-axis/temp)
• 3/6/8MP 25fps
• High performance platform
• 3G/4G/WIFI
• Smart traffic industry
Sensor Rich ITS Camera
Smart Traffic Camera
Bus People Counting
Smart Cameras
Spec: 4K resolution 4/3" CCD, Ambarella processor, Xilinx FPGA module
Applications: ePolice at road intersections, covering 4 lanes. The first 4K@25fps
ePolice in the world
Release date: 2016 Q3
Smart Cameras - 4K ePolice Camera
Spec: ARM processor, compact form factor
Applications: People counting for shopping malls and retail stores.
Release Date: 2016 Q3
Smart Cameras - People Counting
Tarsier I Module - A Step to Smart Edge Device
• Low Power: <1.5W
• Multi-Modal: video & audio interface (camera, other processors)
• Low Cost: <$15
• IC Technology: 28nm low power
• High Performance: deep learning (CNN, recurrent DNN), >40 GFLOPS
• Quick Time-to-Market: 2016 Q3
Front-End AI Module - Tarsier I
Smart Cameras Design- Bus Traffic Counting
• GPU platform, 300 GFLOPS
• Analog/IP video input
• 2.5-inch hard drive & eMMC
• USB 3.0, dual gigabit LAN
• High performance, 300 GFLOPS
• Four 3.5" hard drives
• USB 3.0
• Dual gigabit network ports
Front End Control Terminal
• Dual GPU, 600 GFLOPS
• 8 analog video & audio inputs
• Hard drive & eMMC storage
• 4 alarm in, 2 out
Smart Video & Audio Analysis Terminal
• 40 NVIDIA GPUs
• 80ch 1080p H.264 decoding
• Processing up to 160ch@D1 or 80ch@1080p
High Density GPU Cluster Server
Back-End GPU Processing Units
CBox - Single GPU Unit
Spec: 40 nVidia GPUs, <600W, analyze up to
160ch@D1 or 80ch@1080p in real time
Applications: ITS, crowd management, IVS in
various industries
Release date: 2016 Q3
Back-End GPU Processing Units - StarNet I
• IoT + Computer Vision: Where Are the Applications?
• Embedded CV Hardware
• GPU, VPU, and FPGA
DNN Speed on TK1, MA2450
• Nvidia TK1: 120ms/frame, <12W
• Movidius MA2450: 140ms/frame, <1.5W
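The two figures per platform can be combined into an energy-per-frame estimate (energy = power x time). The sketch below is illustrative arithmetic only: it treats the quoted power values as upper bounds, so the results are upper bounds too, and the function and variable names are not from the slides.

```python
def energy_per_frame_joules(latency_ms: float, power_w: float) -> float:
    """Energy = power x time; latency is converted from milliseconds to seconds."""
    return power_w * latency_ms / 1000.0

# (latency in ms/frame, power bound in W), as quoted on the slide.
platforms = {
    "Nvidia TK1": (120.0, 12.0),
    "Movidius MA2450": (140.0, 1.5),
}

energy = {
    name: energy_per_frame_joules(latency_ms, power_w)
    for name, (latency_ms, power_w) in platforms.items()
}

for name, joules in energy.items():
    print(f"{name}: at most {joules:.2f} J per frame")
```

Under these assumptions the MA2450 is slightly slower per frame but spends far less energy on each one.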
Nvidia Tegra K1: CNN Implementation
• GPU for detection (relatively low frequency) and CPU for tracking
• Memory footprint is optimized via buffer sharing and TK1's unified memory mechanism
• CPU & GPU utilization is maximized via NVIDIA asynchronous operations and streams
• cuDNN library for general layers
• Non-standard layers are implemented with fine-tuned kernels
• 1x1 convolution: balance between MACs & accuracy
• Balance between depth & width, depth for more representative power
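To make the MACs side of the 1x1 convolution trade-off concrete, here is a minimal sketch that counts multiply-accumulates for a plain 3x3 layer and for a 1x1 / 3x3 / 1x1 bottleneck. The layer sizes (56x56 map, 256 channels, 64-channel bottleneck) are hypothetical and are not taken from the network in the talk.

```python
def conv_macs(height: int, width: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a stride-1, same-padded k x k convolution on a height x width map."""
    return height * width * c_in * c_out * k * k

# Hypothetical layer sizes, chosen only for illustration.
H, W, C, BOTTLENECK = 56, 56, 256, 64

# One 3x3 convolution that keeps all 256 channels.
direct = conv_macs(H, W, C, C, 3)

# 1x1 reduce -> 3x3 on fewer channels -> 1x1 expand.
bottleneck = (
    conv_macs(H, W, C, BOTTLENECK, 1)
    + conv_macs(H, W, BOTTLENECK, BOTTLENECK, 3)
    + conv_macs(H, W, BOTTLENECK, C, 1)
)

print(f"direct 3x3: {direct / 1e9:.2f} GMAC")
print(f"bottleneck: {bottleneck / 1e9:.2f} GMAC")
print(f"reduction:  {direct / bottleneck:.1f}x")
```

The count says nothing about accuracy; that half of the balance has to be measured on the target dataset.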
Movidius Implementation
• fp16 is used with no accuracy loss
• Net architecture is tuned based on depth, width, kernel size
• Convolution/bias/ReLU/pooling -> combined layer
• All combined layer operations run in the on-chip CMX memory
• DDR and CMX exchange data only when a combined layer is completed
• 2D convolution is implemented as an assembly kernel
• Bias, ReLU and pooling are done via processor intrinsics
• Make full use of the underlying "SIMD" SHAVE architecture
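The combined-layer idea can be sketched in plain Python: convolution, bias, ReLU and 2x2 max pooling are computed together, and only the pooled result is written out. This is a single-channel stand-in under stated assumptions, not the assembly/intrinsics implementation described above; local variables play the role of on-chip CMX memory, and the function name is hypothetical.

```python
def fused_conv_bias_relu_pool(image, kernel, bias):
    """Valid 2D convolution (cross-correlation) + bias + ReLU + 2x2 max pool.

    Intermediate activations are never stored as a full feature map;
    only the pooled output is returned.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    def activated(y, x):
        # Convolution accumulate, starting from the bias, followed by ReLU.
        acc = bias
        for i in range(kh):
            for j in range(kw):
                acc += image[y + i][x + j] * kernel[i][j]
        return max(acc, 0.0)

    pooled = []
    for y in range(0, out_h - 1, 2):
        row = []
        for x in range(0, out_w - 1, 2):
            # 2x2 max pool over freshly computed activations.
            row.append(max(
                activated(y, x), activated(y, x + 1),
                activated(y + 1, x), activated(y + 1, x + 1),
            ))
        pooled.append(row)
    return pooled

# 5x5 ramp image and a 2x2 kernel computing image[y][x] - image[y+1][x+1].
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0], [0.0, -1.0]]

print(fused_conv_bias_relu_pool(image, kernel, bias=7.0))
print(fused_conv_bias_relu_pool(image, kernel, bias=0.0))
```

With this kernel every pre-bias response is -6, so the second call shows ReLU clamping the whole map to zero.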
Movidius Implementation: Inter-shave Task Parallelism
• Output feature map oriented strategy: each SHAVE is put in charge of several
output feature maps, with the load balanced among all SHAVEs
• Input feature map oriented strategy: each SHAVE takes charge of a "band" of the
input feature maps and computes all output channels of that spatial band
• The strategy is chosen per layer, according to the layer's specific
configuration, to minimize the amount of data transferred.
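Both partitioning strategies come down to splitting a range of work evenly across cores. The sketch below shows one simple way to do each split; the core count of 12 and the layer sizes are assumptions for illustration, the function names are hypothetical, and the halo rows a real convolution needs at band edges are left out.

```python
def assign_output_maps(num_maps: int, num_shaves: int) -> list[list[int]]:
    """Output-map strategy: round-robin, so per-core counts differ by at most one."""
    return [list(range(s, num_maps, num_shaves)) for s in range(num_shaves)]

def split_bands(height: int, num_shaves: int) -> list[tuple[int, int]]:
    """Input-band strategy: contiguous row ranges [start, end), sizes differ by at most one."""
    base, extra = divmod(height, num_shaves)
    bands, start = [], 0
    for s in range(num_shaves):
        size = base + (1 if s < extra else 0)
        bands.append((start, start + size))
        start += size
    return bands

NUM_SHAVES = 12  # assumed core count, for illustration only

maps_per_shave = assign_output_maps(64, NUM_SHAVES)
bands = split_bands(100, NUM_SHAVES)

print([len(m) for m in maps_per_shave])
print(bands)
```

Which split moves less data depends on the layer: many output channels on a small map favor the first, a large map with few channels favors the second.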
FPGA and DNN - Pottwal Project
Detection neural network (Faster R-CNN)
[Block diagram: Image (RGB), CONV Layers, Region Proposal Network (RPN), ROI projection, FAST, Detection result. Annotations: "Most of the computation"; "Softmax, NMS, coordinate inversion and so on".]
FPGA and DNN - Pottwal Project
[Block diagram: CONV Layers, ROI projection, Region Proposal Network (RPN), FAST, Interface, External memory interconnect]
Design Features
• Global pipeline
• Ping-Pong
• Reduced data interaction
• SIMD
• Int8
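As background for the Int8 design feature, the following is a minimal sketch of symmetric per-tensor int8 quantization followed by an integer dot product. It is a generic illustration, not the quantization scheme used in the Pottwal project (the slides do not describe it); the function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: real ~= scale * q, with q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(qa, scale_a, qb, scale_b):
    """Integer multiply-accumulate, then a single rescale back to real units."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b

# Hypothetical weights and activations.
weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

q_w, s_w = quantize_int8(weights)
q_a, s_a = quantize_int8(activations)

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(q_w, s_w, q_a, s_a)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The inner loop uses only integer multiplies and adds, which is the part that maps well onto narrow hardware multipliers.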
FPGA and DNN - Pottwal Project
Performance
• Up to 8 channels of 1080p@30 detection
• Effective performance: 1.2T ops
• PE computational efficiency: 87.2%
• Latency: 11.5ms
Platform            Effective performance   Power   Performance per watt
Our FPGA Platform   1.2T ops                7W      171.4G ops/W
NVIDIA TX1          220G ops                10W     22G ops/W
NVIDIA TK1          55G ops                 10W     5.5G ops/W
Movidius MA2450     40G ops                 1.5W    27G ops/W
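The performance-per-watt column is effective throughput divided by power, which the short check below recomputes from the quoted figures. The last two lines go one step further and are an assumption, not a figure from the slides: if all eight 1080p@30 channels run at once and share the full effective throughput, the budget works out to about 5 G ops per frame.

```python
# (effective performance in G ops, power in W), as quoted in the table.
platforms = {
    "Our FPGA Platform": (1200.0, 7.0),
    "NVIDIA TX1": (220.0, 10.0),
    "NVIDIA TK1": (55.0, 10.0),
    "Movidius MA2450": (40.0, 1.5),
}

gops_per_watt = {name: gops / watts for name, (gops, watts) in platforms.items()}

for name, value in gops_per_watt.items():
    print(f"{name}: {value:.1f} G ops/W")

# Assumption: 8 channels x 30 frames/s share the whole effective throughput.
frames_per_second = 8 * 30
gops_per_frame = platforms["Our FPGA Platform"][0] / frames_per_second
print(f"FPGA budget: {gops_per_frame:.1f} G ops per frame")
```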
Vision without Limits!
Vion Technologies Co., Ltd.
Vion Core Team
Hai Tao, Dr., Founder & CEO
Tsinghua Univ. BS'91, MS'93; UIUC
PhD'99; Sarnoff 99-01; UCSC
Assoc. & Tenured Prof. 01-10. US NSF
2004 Young Career Award. Published
150+ papers in CV, 10+ US patents.
Jun Song, CTO
Tsinghua Univ. Math, BS'01, MS'04;
Responsible for all R&D work.
Leads the smart traffic product core
development & hardware system design.
Xiang Zheng, Director, ITS
Tsinghua, CS, BS'01, MS'04; CV
algorithm expert; data department
manager; Rich vision product
experience.
Fan Yang, Director, Smart Counting
Tsinghua Univ. EE, BS'03, MS'06;
Manager: business intelligence group;
Manager: smart counting product line.
Yu Lin, Director, Vision System
Tsinghua Univ. AE, BS'03, MS'06;
Manager: smart city product line;
Manager: face recognition and
intelligent video analysis group.
Tianshu Wang, Product Director
Xian Jiaotong Univ. BS'93, PhD'03;
Microsoft Research 97-03, IBM
Research 03-10; Lenovo Research 10-16,
joined Vion in 2016.
Embedded CNN Structure
• Decrease the model size to less than 1 million params
• Limit the complexity to 1.5 GMAC, < 2% of VGG
Embedded CNN Performance
• Detection Rate >89% (FDDB)
• 5% lower than VGG (0.2 FP/frame)
• Face detection scale from 20 pixels to 400 pixels
• Detection Rate >83% for real unconstrained local scenarios
(illumination, expression, occlusion, pose)

Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
