2020/05/24
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab KR
PR-249 | YOLOv4: Optimal Speed and Accuracy of Object Detection 1
Contents
• Introduction
• Related Work
• Object Detection Models
• Bag of Freebies(Tricks)
• Bag of Specials
• YOLOv4
• Experiments
• Conclusion
Introduction
The majority of object detectors are largely applicable only to recommendation systems
• Searching for free parking spaces → being slow is acceptable → slow, accurate models
• Car collision warning → must be fast → fast, inaccurate models
→ We need a fast and accurate object detector for production systems
Main Contributions
• Develop an efficient and powerful object detection model that anyone can train and use with just a single GPU (1080 Ti or 2080 Ti)
• Verify the influence of SOTA Bag-of-Freebies and Bag-of-Specials methods
• Modify SOTA methods to make them more efficient and suitable for single-GPU training
Introduction
Object detection is a very popular topic in the PR-12 study
• 25 papers in total have been covered → almost 10%!
• The YOLO v1, v2, and v3 videos are recommended viewing:
• PR-016 YOLO, by 전태균
• PR-024 YOLO v2, by 이진원
• PR-207 YOLO v3, by 이진원
Introduction
YOLO v1 ~ v3 quick review: YOLO v1
• Very fast one-stage approach!
• Image → bounding box coordinates and class probabilities
Introduction
YOLO v1 ~ v3 quick review: YOLO v2
• YOLO v1 + many algorithms
YOLO v1 +
• Better: Batch Normalization, High resolution classifier, Anchor boxes, Dimension clusters, Direct location prediction, Fine-grained features, Multi-scale training
• Faster: Darknet-19, Transfer learning
• Stronger: Hierarchical classification, Dataset combination with WordTree, Joint classification and detection
Introduction
YOLO v1 ~ v3 quick review: YOLO v3
• YOLO v2 + many algorithms (YOLOv3: An Incremental Improvement)
YOLO v2 +
• Bounding box prediction → sum of squared error loss
• Class prediction → multilabel classification
• Predictions across scales
• Darknet-53
Introduction
YOLOv4: Optimal Speed and Accuracy of Object Detection
• YOLO v3 + many algorithms
YOLO v3 +
• Bag of Freebies
• Bag of Specials
• CSPDarknet53 + SPP, PAN
Related Work
Object Detection Models
[Figure: object detector architecture diagram, with the Head labeled]
Related Work
Bag of Freebies (pre-processing + training strategy)
• Methods that only change the training strategy or only increase the training cost are called “BoF” (training phase)
Data Augmentation
• Random erase
• CutOut
• MixUp
• CutMix
• Style transfer GAN
Regularization
• DropOut
• DropPath
• Spatial DropOut
• DropBlock
Loss Function
• MSE
• IoU
• GIoU (Generalized IoU)
• DIoU (Distance IoU)
• CIoU (Complete IoU)
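The IoU-family losses listed above can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) tuples; the function names are illustrative and not taken from the paper or from Darknet. Each value is usually turned into a loss as 1 - value. CIoU additionally adds an aspect-ratio term on top of DIoU, which is left out here to keep the example short.

```python
def _area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def _intersection(a, b):
    """Area of the overlap between boxes a and b."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a, b):
    """Intersection over union."""
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def giou(a, b):
    """Generalized IoU: IoU minus the share of the enclosing box not covered by the union."""
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    hull = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    hull_area = _area(hull)
    if union <= 0 or hull_area <= 0:
        return 0.0
    return inter / union - (hull_area - union) / hull_area


def diou(a, b):
    """Distance IoU: IoU minus squared center distance over squared enclosing-box diagonal."""
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - d2 / c2 if c2 > 0 else iou(a, b)


pred, gt = (0, 0, 2, 2), (1, 1, 3, 3)
print(iou(pred, gt), giou(pred, gt), diou(pred, gt))
```

Unlike plain IoU, GIoU and DIoU still vary when the boxes do not overlap at all, which is why they give a useful training signal for badly placed predictions.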
Related Work
Bag of Specials (plugin modules + post-processing)
• Methods that only increase the inference cost but can improve the accuracy are called “BoS” (inference phase, architecture related)
Enhancement of receptive field
• Spatial Pyramid Pooling (SPP)
• ASPP (dilated conv)
• Receptive Field Block (RFB)
Activation function
• ReLU
• Leaky ReLU
• Parametric ReLU
• ReLU6
• Swish
• Mish
Feature Integration
• Skip-connection
• Feature Pyramid Network (FPN)
• SFAM (Scale-wise Feature Aggregation Module)
• ASFF (Adaptively Spatial Feature Fusion)
• BiFPN
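To make the activation functions above concrete, here is a small scalar sketch using their standard definitions. The leaky slope of 0.1 is an assumption chosen for illustration, and the helper names are not from the paper.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.1):  # slope is an assumed value
    return x if x > 0 else slope * x


def relu6(x):
    return min(max(0.0, x), 6.0)


def swish(x):
    """x * sigmoid(x)."""
    return x * _sigmoid(x)


def mish(x):
    """x * tanh(softplus(x))."""
    return x * math.tanh(_softplus(x))


for x in (-2.0, 0.0, 1.0, 8.0):
    print(x, relu(x), leaky_relu(x), relu6(x), round(swish(x), 4), round(mish(x), 4))
```

Swish and Mish are smooth and let small negative values pass through, whereas ReLU and ReLU6 cut them off at zero.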
Related Work
Bag of Specials (plugin modules + post-processing), continued
Attention Module
• Squeeze-and-Excitation (SE)
• Spatial Attention Module (SAM)
Post Processing
• NMS
• Soft NMS
• DIoU NMS
Normalization
• Batch Norm (BN)
• Cross-GPU Batch Norm (CGBN or SyncBN)
• Filter Response Normalization (FRN)
• Cross-Iteration Batch Norm (CBN)
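As an illustration of the post-processing step, the following is a minimal sketch of greedy (hard) NMS. It assumes boxes as (x1, y1, x2, y2) tuples and an IoU threshold of 0.5; both the threshold and the function names are illustrative choices. Soft NMS would decay the scores of overlapping boxes instead of discarding them, and DIoU NMS would swap the IoU test for a DIoU test.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box is far away and survives.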
YOLOv4
Selection of architecture
• Higher input network size (resolution) – for detecting multiple small-sized objects
• More layers – for a higher receptive field to cover the increased size of input network
• More parameters – for greater capacity of a model to detect multiple objects of different sizes in a single image
better!
YOLOv4
CSPNet: A New Backbone that can Enhance Learning Capability of CNN, 2020 CVPRW
• Proposes the Cross Stage Partial Network (CSPNet) to mitigate heavy inference computation
• Partitions the feature map of the base layer into two parts and then merges them
• This splits the gradient flow
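The split-and-merge idea can be sketched in a few lines. This is a toy illustration only: a feature map is modeled as a plain list of channels, `block` stands in for the stage's convolutional layers, and the transition layers of the real CSPNet are omitted. The names `csp_stage` and `block` are hypothetical.

```python
def csp_stage(channels, block):
    """Toy cross-stage-partial stage.

    channels: list of per-channel values (stand-in for a feature map)
    block:    function applied to only one part of the channels
    """
    half = len(channels) // 2
    bypass, to_process = channels[:half], channels[half:]
    processed = block(to_process)   # only this part goes through the heavy layers
    return bypass + processed       # merge by concatenating along the channel axis


features = [1, 2, 3, 4]
print(csp_stage(features, lambda part: [10 * c for c in part]))
```

Because only part of the channels passes through the block, the computation in the block shrinks, and the bypassed part carries its gradient along a separate path.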
YOLOv4
Selection of architecture
YOLOv4 = YOLOv3 + CSPDarknet53 + SPP + PAN + BoF + BoS
• Backbone: CSPDarknet53
• Neck: SPP + PAN (Path Aggregation Network)
• Head: YOLOv3
YOLOv4
Selection of BoF and BoS
• PReLU, SELU → difficult to train
• ReLU6 → designed for quantized networks
• DropBlock → its authors compared it in detail with other methods, and it won by a large margin
• Only a single GPU is used → SyncBN is not considered
YOLOv4
Additional improvements
• Introduce new data augmentation methods: Mosaic and Self-Adversarial Training (SAT)
• Select optimal hyper-parameters using genetic algorithms
• Modify some existing methods for efficient training and detection
• Modified SAM
• Modified PAN
• Cross mini-Batch Normalization (CmBN)
• Mosaic mixes 4 training images → allows detection of objects outside their normal context
• BN calculates activation statistics from 4 different images on each layer → reduces the need for a large mini-batch size
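The mixing of 4 images can be sketched as follows. This is a simplified illustration, not the Darknet implementation: it assumes four equally sized images given as 2D lists and simply tiles them into a 2x2 grid, shifting each image's boxes accordingly. Random scaling and cropping around a random center point are left out, and the function name `mosaic` is illustrative.

```python
def mosaic(images, boxes_per_image):
    """Tile four equally sized images into one and shift their boxes.

    images:          four 2D lists (rows of pixel values) of identical size
    boxes_per_image: for each image, a list of (x1, y1, x2, y2) boxes
    """
    assert len(images) == 4 and len(boxes_per_image) == 4
    h, w = len(images[0]), len(images[0][0])
    # (dx, dy) offsets: top-left, top-right, bottom-left, bottom-right
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]
    canvas = [[0] * (2 * w) for _ in range(2 * h)]
    merged_boxes = []
    for img, boxes, (dx, dy) in zip(images, boxes_per_image, offsets):
        for y in range(h):
            canvas[dy + y][dx:dx + w] = list(img[y])
        for x1, y1, x2, y2 in boxes:
            merged_boxes.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return canvas, merged_boxes


imgs = [[[k, k], [k, k]] for k in (1, 2, 3, 4)]
boxes = [[(0, 0, 1, 1)], [], [(0, 0, 2, 2)], [(1, 1, 2, 2)]]
canvas, merged = mosaic(imgs, boxes)
for row in canvas:
    print(row)
print(merged)
```

A single mosaic sample therefore already contains content from four images, which is the property the batch-normalization remark above relies on.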
• Self-Adversarial Training (SAT) is a new data augmentation technique that operates in 2 forward-backward stages
• In the 1st stage, the NN alters the original image instead of the network weights
→ The NN executes an adversarial attack on itself
• In the 2nd stage, the NN is trained to detect an object on this modified image
→ However, there are no experimental results for SAT. Why?
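The two stages can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: a two-feature logistic classifier stands in for the detector, the image is altered with a sign-of-gradient step (FGSM style), and `eps` and `lr` are arbitrary values. The slides do not specify how the perturbation is computed.

```python
import math


def predict(w, b, x):
    """Logistic model: probability of the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Binary cross-entropy for a single sample."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def _sign(v):
    return (v > 0) - (v < 0)


def sat_step(w, b, x, y, eps=0.1, lr=0.5):
    # Stage 1: forward-backward pass that changes the input, not the weights.
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]               # dLoss/dx
    x_adv = [xi + eps * _sign(g) for xi, g in zip(x, grad_x)]
    # Stage 2: ordinary training step on the altered input.
    p_adv = predict(w, b, x_adv)
    new_w = [wi - lr * (p_adv - y) * xi for wi, xi in zip(w, x_adv)]
    new_b = b - lr * (p_adv - y)
    return new_w, new_b, x_adv


w, b, x, y = [1.0, -1.0], 0.0, [1.0, 0.0], 1
new_w, new_b, x_adv = sat_step(w, b, x, y)
print(loss(w, b, x, y), loss(w, b, x_adv, y), loss(new_w, new_b, x_adv, y))
```

The printed losses show the intended pattern: the altered input is harder for the current weights, and the stage-2 update then reduces the loss on that altered input.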
Reference: https://github.com/AlexeyAB/darknet/issues/5117
Experiments
Experimental Setup
• Please refer to the paper for details.
• ImageNet for classification
• MS COCO for object detection
Experiments
Influence of BoF and Mish on classifier training
• CutMix, Mosaic data augmentation, and label smoothing → improved classifier accuracy
• Mish activation → also improved accuracy
Experiments
Influence of BoF and Mish on object detector training
• S: Eliminate grid sensitivity → worse performance
• M: Mosaic data augmentation
• IT: IoU threshold, using multiple anchors for a single ground truth (GT) → worse performance
• GA: Genetic algorithms for selecting the optimal hyperparameters during the first 10% of the training time
• LS: Class label smoothing → worse performance
• CBN: Cross mini-Batch Normalization
• CA: Cosine annealing scheduler
• DM: Dynamic mini-batch size → worse performance
• OA: Optimized anchors
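Two of the items above, class label smoothing (LS) and the cosine annealing scheduler (CA), have simple closed forms that can be sketched directly. The standard textbook formulas are used here; the parameter values (`eps=0.1`, the learning-rate range, the step count) are assumptions for illustration, not the settings from the paper.

```python
import math


def smooth_labels(one_hot, eps=0.1):
    """Class label smoothing: y * (1 - eps) + eps / K for K classes."""
    k = len(one_hot)
    return [y * (1.0 - eps) + eps / k for y in one_hot]


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: decay from lr_max to lr_min over total_steps."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)


print(smooth_labels([0, 1, 0, 0]))
print([round(cosine_lr(s, 100, 0.01), 5) for s in (0, 25, 50, 75, 100)])
```

Label smoothing keeps the targets summing to 1 while removing hard 0/1 values, and the cosine schedule lowers the learning rate slowly at first, faster in the middle, and slowly again near the end.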
Experiments
Influence of BoS on object detector training
• PAN + SPP + SAM → better performance!
• RFB, Gaussian YOLO (G), ASFF → worse performance
Experiments
Influence of different backbones on object detector training
• Although CSPResNeXt has higher classification accuracy than CSPDarknet, the CSPDarknet model shows higher accuracy in object detection
Experiments
Influence of different mini-batch sizes on object detector training
• After adding BoF and BoS, the mini-batch size has almost no effect on the detector’s performance
Conclusion
• Offers a SOTA detector that is both faster (FPS) and more accurate
• YOLOv4 can be trained and used on a single conventional GPU with 8-16GB VRAM
• Verified a large number of features and selected those that improve the accuracy of both the classifier and the detector
