Intelligence Machine Vision Lab
Strictly Confidential
Pelee: A Real-Time Object Detection System on Mobile Devices (Review)
Hoseong Lee (이호성), SUALAB
Contents
• Introduction
• Related Works
• PeleeNet: an efficient feature extraction network for image classification
• Pelee: a real-time object detection system
• Conclusion
Introduction
• Growing need to run CNNs on mobile devices
• Limited computing power and memory resources
• E.g., drones, smart cameras, smartphones
• A number of efficiency-oriented CNNs have been proposed
• MobileNet, ShuffleNet, and MobileNet V2 → heavily dependent on depthwise separable convolution, whose implementations are often inefficient
• Pelee uses only conventional convolution instead
• Pelee can be used for both classification (PeleeNet) and object detection (Pelee)!
Related Works
MobileNet, 2017 arXiv
• Depthwise Separable Convolution
Fig from https://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
https://arxiv.org/pdf/1704.04861.pdf
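A rough sketch of why depthwise separable convolution looks cheap on paper: it counts multiply-accumulates (MACs) for a standard convolution and for its depthwise plus pointwise factorization. The function names and the example layer shape are illustrative assumptions, not taken from the slides.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a k x k standard convolution (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 56x56 feature map, 128 -> 128 channels, 3x3 kernel.
    std = standard_conv_macs(56, 56, 128, 128, 3)
    sep = depthwise_separable_macs(56, 56, 128, 128, 3)
    print(f"standard:  {std:,} MACs")
    print(f"separable: {sep:,} MACs")
    print(f"ratio:     {sep / std:.3f}")  # equals 1/c_out + 1/k^2
```

The ratio works out to 1/c_out + 1/k^2, so a 3x3 layer with many output channels needs roughly 8 to 9 times fewer MACs. Whether that turns into real speed depends on how well the framework implements the depthwise step.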
Related Works
ShuffleNet, 2017 arXiv
• Depthwise Separable Convolution
• Pointwise Group Convolution
• Channel Shuffle Operation
https://arxiv.org/pdf/1707.01083.pdf
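The channel shuffle operation can be sketched in plain Python as a reshape, transpose, flatten over channel indices. This is a minimal illustration on a list of channel labels, not ShuffleNet's actual tensor code; `channel_shuffle` is a hypothetical helper name.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so the next grouped conv sees every input group."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Treat the flat list as a (groups, per_group) grid, transpose, then flatten.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels in three groups: [0, 1], [2, 3], [4, 5].
    print(channel_shuffle(list(range(6)), groups=3))
```

After the shuffle, each consecutive block of channels contains one channel from every original group, which lets information flow across groups despite the grouped pointwise convolutions.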
Related Works
MobileNet V2, 2018 arXiv
• Depthwise Separable Convolution
• Linear Bottlenecks
• Inverted Residuals
https://arxiv.org/pdf/1801.04381.pdf
Related Works
ShuffleNet V2, 2018 arXiv
• Equal channel width minimizes memory access cost (balanced convolution)
• Excessive group convolution increases memory access cost
• Network fragmentation reduces the degree of parallelism
• Element-wise operations are non-negligible
https://arxiv.org/pdf/1807.11164.pdf
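The first guideline can be checked numerically. For a 1x1 convolution on an h x w feature map, the ShuffleNet V2 paper models memory access cost as MAC = hw(c_in + c_out) + c_in * c_out. The sketch below compares channel splits that have identical FLOPs (same c_in * c_out); the layer sizes are hypothetical.

```python
def pointwise_conv_flops(h, w, c_in, c_out):
    """Multiply-accumulates of a 1x1 convolution."""
    return h * w * c_in * c_out


def pointwise_conv_mac(h, w, c_in, c_out):
    """Memory access cost: read input, write output, read weights."""
    return h * w * (c_in + c_out) + c_in * c_out


if __name__ == "__main__":
    # All three splits have c_in * c_out = 16384, so FLOPs are identical.
    for c_in, c_out in [(128, 128), (64, 256), (32, 512)]:
        flops = pointwise_conv_flops(56, 56, c_in, c_out)
        mac = pointwise_conv_mac(56, 56, c_in, c_out)
        print(f"{c_in:>4} -> {c_out:<4} FLOPs={flops:,} MAC={mac:,}")
```

With FLOPs held constant, the balanced 128 -> 128 layer has the lowest memory access cost, and the cost grows as the input and output widths drift apart.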
Related Works
DenseNet, 2016 arXiv (CVPR 2017)
• Densely Connected Convolution
• BN-ReLU-Conv 1x1-BN-ReLU-Conv 3x3 bottleneck layer
https://arxiv.org/pdf/1608.06993.pdf
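A minimal sketch of how channels accumulate in a dense block: every layer receives the concatenation of all earlier outputs and contributes `growth_rate` new channels. The helper name and the example numbers (64 input channels, growth rate 32, 6 layers) are illustrative assumptions.

```python
def dense_block_channels(c_in, growth_rate, num_layers):
    """Return (input channels seen by each layer, output channels of the block)."""
    layer_inputs = [c_in + growth_rate * i for i in range(num_layers)]
    block_output = c_in + growth_rate * num_layers
    return layer_inputs, block_output


if __name__ == "__main__":
    inputs, output = dense_block_channels(c_in=64, growth_rate=32, num_layers=6)
    for index, channels in enumerate(inputs, start=1):
        print(f"layer {index}: {channels} input channels")
    print(f"block output: {output} channels")
```

Because the input width keeps growing, later layers benefit from a 1x1 bottleneck that shrinks the input before the 3x3 convolution.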
Related Works
MobileNet, 2017 arXiv
ShuffleNet, 2017 arXiv
MobileNet V2, 2018 arXiv
ShuffleNet V2, 2018 arXiv
DenseNet, 2016 arXiv (CVPR 2017)
Reviews of these five papers can be found in PR-12.
PR12 Season 1: https://www.youtube.com/watch?v=auKdde7Anr8&list=PLWKf9beHi3Tg50UoyTe6rIm20sVQOH1br
PR12 Season 2: https://www.youtube.com/watch?v=FfBp6xJqZVA&list=PLWKf9beHi3TgstcIn8K6dI_85_ppAxzB8
PeleeNet: an efficient feature extraction network for image classification
• DenseNet variant architecture – PeleeNet
• Key Features
• Two-way Dense Layer
• Stem Block
• Dynamic number of Channels in Bottleneck Layer
• Transition Layer without Compression
• Composite Function
Classification
PeleeNet: an efficient feature extraction network for image classification
• Two-Way Dense Layer
• Motivated by GoogLeNet, uses a 2-way dense layer
• Captures receptive fields at different scales
• Two stacked 3x3 convs in one way → learn visual patterns for large objects
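To make the point about different scales of receptive fields concrete, the sketch below computes the receptive field of each way for stride-1 convolutions. It assumes that each way starts with a 1x1 bottleneck and emits half of the growth rate; that layout and the helper names are assumptions for illustration and should be checked against the paper's figure.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def two_way_dense_layer(growth_rate):
    """Describe both ways; each way is assumed to emit growth_rate // 2 channels."""
    ways = {
        "way_1": [1, 3],     # 1x1 bottleneck, then one 3x3 conv
        "way_2": [1, 3, 3],  # 1x1 bottleneck, then two stacked 3x3 convs
    }
    return {
        name: {
            "out_channels": growth_rate // 2,
            "receptive_field": stacked_receptive_field(kernels),
        }
        for name, kernels in ways.items()
    }


if __name__ == "__main__":
    for name, info in two_way_dense_layer(32).items():
        print(name, info)
```

The second way sees a 5x5 neighborhood while the first sees 3x3, so concatenating them gives the next layer features at two scales for the same total channel budget.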
PeleeNet: an efficient feature extraction network for image classification
• Stem Block
• Motivated by Inception-v4 and DSOD, uses a cost-efficient stem block before the first dense layer
• Improves feature expression ability without adding much computational cost
PeleeNet: an efficient feature extraction network for image classification
• Dynamic number of Channels in Bottleneck Layer
• The number of bottleneck channels varies with the input shape instead of being fixed at 4 times the growth rate
• For the first several dense layers, a fixed-width bottleneck increases computational cost instead of reducing it
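The cost argument can be checked with a few lines of arithmetic. The sketch below counts per-pixel MACs for a dense layer with and without a 1x1 bottleneck. The capping rule in `dynamic_bottleneck_width` is a hypothetical stand-in for "varies according to the input shape", not the paper's exact schedule, and the growth rate of 32 is an assumed example value.

```python
def bottleneck_layer_macs(c_in, growth_rate, bottleneck_width):
    """Per-pixel MACs: 1x1 conv (c_in -> width), then 3x3 conv (width -> growth_rate)."""
    return c_in * bottleneck_width + 9 * bottleneck_width * growth_rate


def plain_layer_macs(c_in, growth_rate):
    """Per-pixel MACs of a single 3x3 conv (c_in -> growth_rate), no bottleneck."""
    return 9 * c_in * growth_rate


def dynamic_bottleneck_width(c_in, growth_rate):
    """Hypothetical rule: never make the bottleneck wider than its input."""
    return min(4 * growth_rate, c_in)


if __name__ == "__main__":
    k = 32  # assumed growth rate
    for c_in in (32, 64, 128, 256, 512):
        fixed = bottleneck_layer_macs(c_in, k, 4 * k)
        dynamic = bottleneck_layer_macs(c_in, k, dynamic_bottleneck_width(c_in, k))
        plain = plain_layer_macs(c_in, k)
        print(f"c_in={c_in:>3} fixed={fixed:>6} dynamic={dynamic:>6} plain={plain:>6}")
```

With few input channels, the fixed 4x bottleneck costs several times more than applying the 3x3 convolution directly; once the input is wide, the bottleneck pays off. Narrowing the bottleneck for early layers removes most of that overhead.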
PeleeNet: an efficient feature extraction network for image classification
• Transition Layer without Compression
• The compression factor proposed by DenseNet can hurt feature expression
• Keep the number of output channels the same as the number of input channels in the transition layer
• Composite Function
• Use conventional post-activation (Conv-BN-ReLU)
• Also add a 1x1 conv after the last dense block for stronger representational ability
PeleeNet: an efficient feature extraction network for image classification
• PeleeNet
• Early stage features are very important for vision tasks
• Prematurely reducing the feature map size can impair representational ability
PeleeNet architecture
PeleeNet ablation study
PeleeNet: an efficient feature extraction network for image classification
• PeleeNet Result
• Achieves higher accuracy and runs over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2, with only 66% of the model size of MobileNet.
• PeleeNet runs 1.8 times faster in FP16 mode than in FP32 mode.
→ Depthwise separable convolution is slow on TX2 in FP16 mode
ImageNet Result
Speed on NVIDIA TX2
Pelee: a real-time object detection system
• SSD + PeleeNet → Pelee detector
• Key Features
• Feature Map Selection
• Residual Prediction Block
• Small Convolutional Kernel for Prediction
Object Detection
Effects of key features
Pelee: a real-time object detection system
• Feature Map Selection
• SSD with feature maps at 5 scales (19x19, 10x10, 5x5, 3x3, 1x1)
• The 38x38 feature map is not used, to reduce computational cost
SSD architecture
Feature Map Selection
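A quick count shows how much work the 38x38 map would add. The sketch below sums the spatial positions at which predictions are made, using the scales listed on the slide plus the 38x38 map used by the original SSD; the constant and function names are illustrative.

```python
def num_locations(feature_map_sizes):
    """Total number of spatial positions at which predictions are made."""
    return sum(size * size for size in feature_map_sizes)


WITH_38 = [38, 19, 10, 5, 3, 1]   # scales including the 38x38 map
PELEE_SCALES = [19, 10, 5, 3, 1]  # scales listed for Pelee

if __name__ == "__main__":
    full = num_locations(WITH_38)
    kept = num_locations(PELEE_SCALES)
    print(f"with 38x38:    {full} locations")
    print(f"without 38x38: {kept} locations")
    print(f"share removed: {1 - kept / full:.1%}")
```

The 38x38 map alone holds 1,444 of the 1,940 positions, about three quarters of the total, which is why dropping it saves so much computation in the prediction layers.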
Pelee: a real-time object detection system
• Feature Map Selection
• SSD with feature maps at 5 scales (19x19, 10x10, 5x5, 3x3, 1x1); the 38x38 map is not used
• Residual Prediction Block
• For each feature map, build a residual block before conducting prediction
• 1x1 convolutional kernel for prediction
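The saving from a small prediction kernel can be estimated as follows. The sketch counts MACs of the prediction convolution over the five feature maps for a 3x3 and a 1x1 kernel. The input width (256) and outputs per location (6 boxes times 21 class scores plus 4 offsets) are hypothetical example values, and the residual block itself is not modeled.

```python
def prediction_macs(feature_map_sizes, c_in, outputs_per_location, kernel):
    """MACs of a k x k prediction conv applied to every selected feature map."""
    return sum(
        size * size * c_in * outputs_per_location * kernel * kernel
        for size in feature_map_sizes
    )


if __name__ == "__main__":
    scales = [19, 10, 5, 3, 1]
    c_in = 256                  # hypothetical channel width before prediction
    outputs = 6 * (21 + 4)      # hypothetical: 6 boxes x (21 classes + 4 offsets)
    macs_3x3 = prediction_macs(scales, c_in, outputs, kernel=3)
    macs_1x1 = prediction_macs(scales, c_in, outputs, kernel=1)
    print(f"3x3 kernel: {macs_3x3:,} MACs")
    print(f"1x1 kernel: {macs_1x1:,} MACs")
    print(f"reduction:  {macs_3x3 // macs_1x1}x")
```

Whatever values are plugged in, the prediction layer's cost falls by a factor of nine when the 3x3 kernel is replaced by a 1x1 kernel.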
Pelee: a real-time object detection system
• Pelee Result
• PASCAL VOC 2007 and COCO 2015 benchmarks
• Faster, lower in computational cost, and more accurate than SSD and YOLO
Conclusion
• Depthwise separable convolution is not the only way to build an efficient model
• PeleeNet and Pelee are built with conventional convolution
• On real devices (iPhone 8, Jetson TX2), they perform real-time prediction for image classification and object detection
• Compared to existing models, PeleeNet and Pelee are faster, cheaper, and more accurate!
• The code is also simple to implement, so I highly recommend it!
Thank you
