Pelee: a real time object detection system on mobile devices Paper Review
This document summarizes the Pelee object detection system, which uses PeleeNet, an efficient feature extraction network, for real-time object detection on mobile devices. PeleeNet improves on DenseNet with two-way dense layers, a stem block, a dynamic number of channels in the bottleneck layers, and transition layers without compression. Pelee combines SSD with PeleeNet, selecting fewer feature maps and adding residual prediction blocks for faster, more accurate detection than SSD and YOLO. The document concludes that PeleeNet and Pelee achieve real-time classification and detection on mobile devices, outperforming existing models in speed, cost, and accuracy with simple code.
Intelligence Machine VisionLab
Pelee: A Real-Time Object Detection System on Mobile Devices (Review)
Hoseong Lee, SUALAB
Contents
• Introduction
• Related Works
• PeleeNet: an efficient feature extraction network for image classification
• Pelee: a real-time object detection system
• Conclusion
Introduction
• Increasing need to run CNNs on mobile devices
• Limited computing power and memory resources
• e.g., drones, smart cameras, smartphones
• A number of efficiency-oriented CNNs have been proposed
• MobileNet, ShuffleNet, and MobileNetV2 → heavily dependent on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks
• Pelee uses only conventional convolution instead
• The same design covers both classification (PeleeNet) and object detection (Pelee)
Related Works
MobileNet, 2017 arXiv
• Depthwise Separable Convolution
Fig from https://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
https://arxiv.org/pdf/1704.04861.pdf
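To make the appeal of depthwise separable convolution concrete, the sketch below counts multiplications for a standard 3x3 convolution and for its depthwise separable replacement (a per-channel 3x3 convolution followed by a 1x1 pointwise convolution). The layer shape is a hypothetical value chosen for illustration and is not taken from the slides; stride 1 and "same" padding are assumed.

```python
def standard_conv_mults(h, w, c_in, c_out, k=3):
    """Multiplications for a k x k standard convolution (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_mults(h, w, c_in, c_out, k=3):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
h, w, c_in, c_out = 56, 56, 128, 128
std = standard_conv_mults(h, w, c_in, c_out)
sep = depthwise_separable_mults(h, w, c_in, c_out)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2
print(f"standard: {std:,}  separable: {sep:,}  ratio: {ratio:.3f}")
```

The ratio shows the theoretical saving only. As the introduction notes, the measured speed depends on how well the framework implements the depthwise operation.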
ShuffleNet, 2017 arXiv
• Depthwise Separable Convolution
• Pointwise Group Convolution
• Channel Shuffle Operation
https://arxiv.org/pdf/1707.01083.pdf
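The channel shuffle operation can be sketched in a few lines. Channels are viewed as a (groups, channels-per-group) grid, the grid is transposed, and the result is flattened, so that each group in the next group convolution receives channels from every previous group. The version below works on a flat Python list of channel labels rather than on tensors; the labels and the function name are illustrative assumptions.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so the next grouped conv sees every input group."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View as (groups, per_group), transpose to (per_group, groups), flatten.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Two groups "a" and "b" with three channels each (illustrative labels).
labels = ["a0", "a1", "a2", "b0", "b1", "b2"]
shuffled = channel_shuffle(labels, groups=2)
print(shuffled)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```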
MobileNetV2, 2018 arXiv
• Depthwise Separable Convolution
• Linear Bottlenecks
• Inverted Residuals
https://arxiv.org/pdf/1801.04381.pdf
ShuffleNetV2, 2018 arXiv
• Equal channel width minimizes memory access cost (balanced convolution)
• Excessive group convolution increases memory access cost
• Network fragmentation reduces degree of parallelism
• Element-wise operations are non-negligible
https://arxiv.org/pdf/1807.11164.pdf
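The first guideline can be checked numerically. For a 1x1 convolution on an h x w feature map, the memory access cost is roughly MAC = hw(c_in + c_out) + c_in * c_out (reading the input, writing the output, reading the weights), while the FLOPs are proportional to hw * c_in * c_out. The sketch below holds the product c_in * c_out fixed and compares the MAC across channel splits; the feature-map size and the channel budget are hypothetical values.

```python
def mac_1x1(h, w, c_in, c_out):
    """Approximate memory access cost of a 1x1 convolution."""
    return h * w * (c_in + c_out) + c_in * c_out


h = w = 28                 # hypothetical feature-map size
budget = 128 * 128         # fixed c_in * c_out, i.e. fixed FLOPs
pairs = [(c, budget // c) for c in (16, 32, 64, 128, 256, 512, 1024)]
costs = {pair: mac_1x1(h, w, *pair) for pair in pairs}
best = min(costs, key=costs.get)

for pair, cost in costs.items():
    print(pair, cost)
print("lowest MAC at", best)
```

With the FLOPs held constant, the balanced split (c_in equal to c_out) gives the lowest memory access cost in this sweep, which is what the guideline states.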
DenseNet, 2016 arXiv (CVPR 2017)
• Densely Connected Convolution
• Bottleneck layer: BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3)
https://arxiv.org/pdf/1608.06993.pdf
Reviews of the five papers above (MobileNet, ShuffleNet, MobileNetV2, ShuffleNetV2, and DenseNet) can be found in the PR-12 video series.
PR12 Season 1: https://www.youtube.com/watch?v=auKdde7Anr8&list=PLWKf9beHi3Tg50UoyTe6rIm20sVQOH1br
PR12 Season 2: https://www.youtube.com/watch?v=FfBp6xJqZVA&list=PLWKf9beHi3TgstcIn8K6dI_85_ppAxzB8
PeleeNet: an efficient feature extraction network for image classification
• DenseNet variant architecture – PeleeNet
• Key Features
• Two-way Dense Layer
• Stem Block
• Dynamic number of Channels in Bottleneck Layer
• Transition Layer without Compression
• Composite Function
• Two-Way Dense Layer
• Motivated by GoogLeNet, PeleeNet uses a 2-way dense layer
• The two branches give receptive fields at different scales
• One branch uses a single 3x3 conv; the other stacks two 3x3 convs → learns visual patterns for large objects
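The difference in receptive field between the two branches can be worked out directly. The sketch below assumes stride-1 convolutions, a 1x1 bottleneck at the start of each branch, and that each branch emits half of the growth rate so the layer as a whole still adds `growth_rate` channels. Those channel widths are an assumption made for this illustration and are not stated on the slide.

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def two_way_dense_layer(c_in, growth_rate):
    """Channel and receptive-field bookkeeping for a 2-way dense layer."""
    branch_a = [1, 3]        # 1x1 bottleneck, then one 3x3 conv
    branch_b = [1, 3, 3]     # 1x1 bottleneck, then two stacked 3x3 convs
    half = growth_rate // 2  # assumption: each branch emits half the growth rate
    return {
        "rf_a": receptive_field(branch_a),
        "rf_b": receptive_field(branch_b),
        # Dense connectivity: the input is concatenated with both branch outputs.
        "c_out": c_in + 2 * half,
    }


info = two_way_dense_layer(c_in=32, growth_rate=32)
print(info)  # {'rf_a': 3, 'rf_b': 5, 'c_out': 64}
```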
• Stem Block
• Motivated by Inception-v4 and DSOD, PeleeNet uses a cost-efficient stem block before the first dense layer
• Improves the feature expression ability without adding much computational cost
• Dynamic number of Channels in Bottleneck Layer
• The number of bottleneck channels varies with the input shape instead of being fixed at 4 times the growth rate
• In the first several dense layers, a fixed-width bottleneck increases the computational cost instead of reducing it
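A quick count shows why a fixed bottleneck hurts in early layers. With growth rate k, a direct 3x3 convolution from c_in channels costs 9 * c_in * k multiplications per output pixel, while a bottleneck of width 4k costs c_in * 4k + 9 * 4k * k. The bottleneck is cheaper only when c_in > 7.2k. In the sketch below, `dynamic_width` is a hypothetical rule (cap the bottleneck width at the input channel count) used only to illustrate the idea; the slides do not give the exact rule.

```python
def direct_cost(c_in, k):
    """Multiplications per output pixel for a 3x3 conv from c_in to k channels."""
    return 9 * c_in * k


def bottleneck_cost(c_in, k, width):
    """1x1 conv from c_in to `width` channels, then 3x3 conv from `width` to k."""
    return c_in * width + 9 * width * k


def dynamic_width(c_in, k):
    """Hypothetical rule: never let the bottleneck be wider than its input."""
    return min(c_in, 4 * k)


k = 32  # growth rate, assumed for illustration
print("c_in  direct  fixed_4k  dynamic")
for c_in in (32, 64, 128, 256, 512):
    fixed = bottleneck_cost(c_in, k, 4 * k)
    dynamic = bottleneck_cost(c_in, k, dynamic_width(c_in, k))
    print(c_in, direct_cost(c_in, k), fixed, dynamic)
```

For c_in = 32 the fixed bottleneck costs more than four times the direct convolution, and it only becomes cheaper than the direct convolution once the input has grown past roughly 230 channels.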
• Transition Layer without Compression
• The compression factor proposed by DenseNet can hurt the feature expression
• Keep the number of output channels the same as the number of input channels in the transition layer
• Composite Function
• Use conventional post-activation (Conv-BN-ReLU)
• Also add a 1x1 conv after the last dense block to get stronger representational ability
• PeleeNet
• Early stage features are very important for vision tasks
• Prematurely reducing the feature map size can impair representational ability
[Tables: PeleeNet architecture; PeleeNet ablation study]
• PeleeNet Result
• Achieves higher accuracy and runs over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2, with only 66% of the model size of MobileNet.
• PeleeNet runs 1.8 times faster in FP16 mode than in FP32 mode.
→ Networks built on depthwise separable convolution are slow in FP16 mode on TX2
[Tables: ImageNet result; speed on NVIDIA TX2]
Pelee: a real-time object detection system
• SSD + PeleeNet → Pelee detector
• Key Features
• Feature Map Selection
• Residual Prediction Block
• Small Convolutional Kernel for Prediction
[Table: Effects of key features]
• Feature Map Selection
• SSD with feature maps at 5 scales (19x19, 10x10, 5x5, 3x3, 1x1)
• The 38x38 feature map is not used, to reduce computational cost
[Figures: SSD architecture; feature map selection]
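The effect of dropping the 38x38 map can be estimated by counting prediction locations. The sketch below compares the six feature-map sizes of the original 300x300 SSD with the five sizes listed above. It counts grid cells only; the number of default boxes per cell is left out because the slides do not give it.

```python
def num_locations(sizes):
    """Total number of grid cells across square feature maps."""
    return sum(s * s for s in sizes)


ssd_maps = [38, 19, 10, 5, 3, 1]   # original SSD (300x300 input)
pelee_maps = [19, 10, 5, 3, 1]     # sizes listed on the slide

ssd_total = num_locations(ssd_maps)
pelee_total = num_locations(pelee_maps)
removed = 1 - pelee_total / ssd_total

print(ssd_total, pelee_total, f"{removed:.1%} of locations removed")
```

The 38x38 map alone accounts for 1,444 of the 1,940 locations, so removing it cuts the number of positions the prediction layers must evaluate by roughly three quarters.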
• Residual Prediction Block
• For each feature map, a residual block is applied before prediction
• 1x1 convolutional kernels are used for prediction, instead of the 3x3 kernels used in SSD
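The saving from the smaller prediction kernel can be sketched as a weight count for a single prediction layer. The input width, the number of default boxes per location, and the class count below are hypothetical values for illustration. The count covers only the prediction convolution and ignores the cost of the residual block placed in front of it, so it overstates the saving for the whole detector.

```python
def prediction_weights(c_in, boxes_per_loc, num_classes, kernel):
    """Weights in a conv that predicts class scores and 4 box offsets per default box."""
    c_out = boxes_per_loc * (num_classes + 4)
    return kernel * kernel * c_in * c_out


# Hypothetical settings: 256 input channels, 6 boxes per location, 21 classes.
w3 = prediction_weights(256, 6, 21, kernel=3)
w1 = prediction_weights(256, 6, 21, kernel=1)
print(w3, w1, w3 // w1)  # 345600 38400 9
```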
• Pelee Result
• Evaluated on the PASCAL VOC 2007 and COCO 2015 benchmarks
• Faster, lower in computational cost, and more accurate than SSD and YOLO
Conclusion
• Depthwise separable convolution is not the only way to build an efficient model
• PeleeNet and Pelee are built with conventional convolution
• On real devices (iPhone 8, Jetson TX2), they perform real-time prediction for image classification and object detection
• Compared to existing models, PeleeNet and Pelee are faster, cheaper, and more accurate!
• And the code is simple to implement, so I highly recommend it!