Pelee: a real time object detection system on mobile devices Paper Review

Intelligence Machine Vision Lab
Pelee: A Real-Time Object Detection System on Mobile Devices (Review)
Hoseong Lee, SUALAB

Contents
• Introduction
• Related Works
• PeleeNet: an efficient feature extraction network for image classification
• Pelee: a real-time object detection system
• Conclusion

Introduction
• Growing need to run CNNs on mobile devices
• Limited computing power and memory resources
• e.g., drones, smart cameras, smartphones
• A number of efficiency-oriented CNNs have been proposed
• MobileNet, ShuffleNet, and MobileNet V2 → heavily dependent on depthwise separable convolution, whose implementations are often inefficient
• Pelee uses only conventional convolution instead
• Pelee can be used for both classification (PeleeNet) and object detection (Pelee)

Related Works
MobileNet, 2017 arXiv
• Depthwise Separable Convolution
Fig. from https://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
https://arxiv.org/pdf/1704.04861.pdf
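To make the appeal of depthwise separable convolution concrete, here is a minimal sketch that counts multiply-adds for a standard convolution and for its depthwise separable replacement. The layer shape (56x56, 128 channels) is an illustrative assumption, not a value from the slides.

```python
def standard_conv_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds of a k x k standard convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds of a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1x1 conv mixes the channels
    return depthwise + pointwise


# Illustrative layer shape (assumption, not taken from the slides).
h, w, c_in, c_out = 56, 56, 128, 128
std = standard_conv_madds(h, w, c_in, c_out)
sep = depthwise_separable_madds(h, w, c_in, c_out)
ratio = sep / std                          # algebraically 1/c_out + 1/k**2
print(f"separable / standard = {ratio:.3f}")
```

The count only covers arithmetic. It says nothing about how well a given framework or device executes the depthwise step, which is the gap the Pelee authors point to.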

ShuffleNet, 2017 arXiv
• Pointwise Group Convolution
• Channel Shuffle Operation
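The channel shuffle operation can be sketched on a plain list of channel indices. This is a minimal illustration of the reshape-transpose-flatten idea, assuming the channels are laid out group by group; the function name is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each group in the next layer sees every input group.

    Equivalent to reshaping to (groups, n), transposing to (n, groups),
    and flattening again.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


# Six channels in three groups: [0, 1], [2, 3], [4, 5]
shuffled = channel_shuffle(list(range(6)), groups=3)
print(shuffled)
```

After the shuffle, each consecutive pair of channels mixes two different source groups, so a following group convolution no longer sees only its own group's outputs.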

MobileNet V2, 2018 arXiv
• Linear Bottlenecks
• Inverted Residuals

ShuffleNet V2, 2018 arXiv
• Equal input and output channel widths minimize memory access cost (balanced convolution)
• Excessive group convolution increases memory access cost
• Network fragmentation reduces the degree of parallelism
• Element-wise operations are non-negligible
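The first guideline can be checked numerically. The sketch below uses the usual memory access cost estimate for a 1x1 convolution, MAC = hw(c_in + c_out) + c_in * c_out, and compares channel splits that all have the same FLOPs. The feature-map size and the channel pairs are illustrative assumptions.

```python
def conv1x1_mac(h, w, c_in, c_out):
    """Memory access cost of a 1x1 conv: read input, write output, read weights."""
    return h * w * (c_in + c_out) + c_in * c_out


h = w = 56  # illustrative feature-map size (assumption)
# Every pair has c_in * c_out == 16384, so FLOPs are identical.
pairs = [(32, 512), (64, 256), (128, 128), (256, 64), (512, 32)]
macs = {pair: conv1x1_mac(h, w, *pair) for pair in pairs}
best = min(macs, key=macs.get)

for pair, mac in macs.items():
    print(pair, mac)
print("lowest MAC:", best)
```

Under these assumptions the balanced pair has the lowest memory access cost, even though all five layers perform the same number of multiply-adds.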

DenseNet, 2016 arXiv
• Densely Connected Convolution
• BN-ReLU-Conv 1x1-BN-ReLU-Conv 3x3 bottleneck layer
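Dense connectivity means every layer receives the concatenation of all earlier outputs in its block, so the input width grows linearly with depth. A minimal sketch of that bookkeeping follows; the example values (64 input channels, growth rate 32, 6 layers) are illustrative assumptions and the function names are hypothetical.

```python
def dense_block_input_channels(c0, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block."""
    return [c0 + growth_rate * i for i in range(num_layers)]


def dense_block_output_channels(c0, growth_rate, num_layers):
    """Channel count after concatenating every layer's output."""
    return c0 + growth_rate * num_layers


inputs = dense_block_input_channels(c0=64, growth_rate=32, num_layers=6)
output = dense_block_output_channels(c0=64, growth_rate=32, num_layers=6)
print(inputs, output)
```

The growing input width is why DenseNet places a 1x1 bottleneck in front of each 3x3 convolution.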

MobileNet, 2017 arXiv
ShuffleNet, 2017 arXiv
MobileNet V2, 2018 arXiv
ShuffleNet V2, 2018 arXiv
DenseNet, 2016 arXiv
Reviews of these five papers can be found in the PR-12 series.
PR12 Season 1: https://www.youtube.com/watch?v=auKdde7Anr8&list=PLWKf9beHi3Tg50UoyTe6rIm20sVQOH1br
PR12 Season 2: https://www.youtube.com/watch?v=FfBp6xJqZVA&list=PLWKf9beHi3TgstcIn8K6dI_85_ppAxzB8

PeleeNet: an efficient feature extraction network for image classification
• DenseNet variant architecture – PeleeNet
• Key Features
• Two-way Dense Layer
• Stem Block
• Dynamic number of Channels in Bottleneck Layer
• Transition Layer without Compression
• Composite Function
Classification

• Two-Way Dense Layer
• Motivated by GoogLeNet, uses a two-way dense layer
• The two branches capture receptive fields of different scales
• One branch stacks two 3x3 convs → learns visual patterns for large objects
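The difference in receptive field between the two branches can be worked out with the standard recurrence for stacked convolutions. The sketch assumes each branch starts with a 1x1 conv, followed by one 3x3 conv in the first branch and two 3x3 convs in the second; the helper name is hypothetical.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a stack of convolutions applied in order."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= s              # stride enlarges the step between output pixels
    return rf


branch_small = receptive_field([1, 3])      # 1x1 conv -> 3x3 conv
branch_large = receptive_field([1, 3, 3])   # 1x1 conv -> 3x3 conv -> 3x3 conv
print(branch_small, branch_large)
```

Concatenating both branches lets a single dense layer mix 3x3 and 5x5 views of its input.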

• Stem Block
• Motivated by Inception-v4 and DSOD, uses a cost-efficient stem block before the first dense layer
• Improves feature expression ability without adding much computational cost

• Dynamic Number of Channels in Bottleneck Layer
• The bottleneck width varies with the input shape instead of being fixed at 4 times the growth rate
• In the first several dense layers, a fixed-width bottleneck increases computational cost instead of reducing it
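A rough per-pixel cost count shows why a fixed-width bottleneck hurts early layers. In the sketch below, the cap `min(c_in, 4 * growth)` is only an illustrative assumption about how the width could follow the input shape, not the paper's exact rule, and the values (32 input channels, growth rate 32) are likewise assumed.

```python
def direct_conv_cost(c_in, growth):
    """Per-pixel multiply-adds of a plain 3x3 conv from c_in to growth channels."""
    return 9 * c_in * growth


def bottleneck_cost(c_in, growth, width):
    """Per-pixel multiply-adds of a 1x1 conv to `width`, then a 3x3 conv to growth."""
    return c_in * width + 9 * width * growth


growth = 32
c_in = 32                               # an early dense layer (assumed value)

fixed_width = 4 * growth                # DenseNet-style fixed bottleneck
capped_width = min(c_in, 4 * growth)    # hypothetical cap, for illustration only

direct = direct_conv_cost(c_in, growth)
fixed = bottleneck_cost(c_in, growth, fixed_width)
capped = bottleneck_cost(c_in, growth, capped_width)
print(direct, fixed, capped)
```

With only 32 input channels, the fixed bottleneck costs several times more than a plain 3x3 convolution, and narrowing the bottleneck recovers most of that overhead.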

• Transition Layer without Compression
• The compression factor proposed by DenseNet can hurt feature expression
• Keep the number of output channels equal to the number of input channels in each transition layer
• Composite Function
• Use conventional post-activation (Conv-BN-ReLU)
• Also add a 1x1 conv after the last dense block for stronger representational ability
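One practical benefit of the post-activation order (Conv-BN-ReLU) is that batch normalization can be folded into the preceding convolution at inference time. The sketch below shows the algebra on a single scalar channel; all numeric values are arbitrary assumptions and the function names are hypothetical.

```python
import math


def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: affine 'conv' on one channel, then batch norm."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weight, bias) of a single conv equivalent to conv followed by BN."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# Arbitrary example parameters (assumptions for illustration).
params = dict(weight=0.5, bias=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
folded_weight, folded_bias = fold_bn(**params)

x = 2.0
unfused = conv_then_bn(x, **params)
fused = folded_weight * x + folded_bias
print(unfused, fused)
```

With pre-activation (BN-ReLU-Conv), a ReLU sits between the BN and the next convolution, so this folding does not apply directly.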

• PeleeNet
• Early-stage features are very important for vision tasks
• Prematurely reducing the feature map size can impair representational ability
PeleeNet architecture
PeleeNet ablation study

• PeleeNet Result
• Achieves higher accuracy and runs over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2, at only 66% of the model size of MobileNet
• PeleeNet runs 1.8 times faster in FP16 mode than in FP32 mode
→ Depthwise separable convolution is slow in FP16 mode on TX2
ImageNet Result
Speed on NVIDIA TX2

Pelee: a real-time object detection system
• SSD + PeleeNet → Pelee detector
• Key Features
• Feature Map Selection
• Residual Prediction Block
• Small Convolutional Kernel for Prediction
Object Detection
Effects of key features

• Feature Map Selection
• SSD with five feature-map scales (19x19, 10x10, 5x5, 3x3, 1x1)
• The 38x38 feature map is not used, to reduce computational cost
SSD architecture
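The saving from dropping the 38x38 map is easy to estimate by counting prediction locations. This sketch assumes the original SSD uses the six scales 38, 19, 10, 5, 3 and 1, and it ignores the number of default boxes per location.

```python
def total_cells(map_sizes):
    """Number of spatial prediction locations summed over square feature maps."""
    return sum(size * size for size in map_sizes)


ssd_maps = [38, 19, 10, 5, 3, 1]     # assumed SSD300-style scales
pelee_maps = [19, 10, 5, 3, 1]       # scales listed on the slide

ssd_cells = total_cells(ssd_maps)
pelee_cells = total_cells(pelee_maps)
removed_fraction = 1 - pelee_cells / ssd_cells
print(ssd_cells, pelee_cells, f"{removed_fraction:.1%}")
```

Roughly three quarters of the prediction locations sit on the 38x38 map, which is why removing it cuts so much work from the detection head.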

• SSD with five feature-map scales (19x19, 10x10, 5x5, 3x3, 1x1); the 38x38 map is not used
• Residual Prediction Block
• For each feature map, build a residual block before making predictions
• 1x1 convolutional kernel for prediction
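The effect of the smaller prediction kernel can be sketched by counting the weights of one prediction layer. The input width (256), boxes per location (6) and class count (21, i.e. 20 PASCAL VOC classes plus background) are illustrative assumptions, not figures from the slides.

```python
def predictor_weights(kernel, c_in, boxes_per_cell, num_classes):
    """Weights of a conv that predicts class scores and 4 box offsets per default box."""
    c_out = boxes_per_cell * (num_classes + 4)
    return kernel * kernel * c_in * c_out


# Assumed head configuration, for illustration only.
c_in, boxes_per_cell, num_classes = 256, 6, 21

cost_3x3 = predictor_weights(3, c_in, boxes_per_cell, num_classes)
cost_1x1 = predictor_weights(1, c_in, boxes_per_cell, num_classes)
print(cost_3x3, cost_1x1, cost_3x3 // cost_1x1)
```

The prediction layer itself becomes nine times cheaper. The saving for the whole detector is smaller, because the backbone and the residual prediction blocks are unchanged.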

• Pelee Result
• Benchmarks: PASCAL VOC 2007 and COCO 2015
• Faster, cheaper to compute, and more accurate than SSD and YOLO

Conclusion
• Depthwise separable convolution is not the only way to build an efficient model
• PeleeNet and Pelee are built with conventional convolution
• On real devices (iPhone 8, Jetson TX2), they perform real-time prediction for image classification and object detection
• Compared to existing models, PeleeNet and Pelee are faster, cheaper, and more accurate
• The code is also simple to implement, so I highly recommend it!
