YOLOv4: optimal speed and accuracy of object detection review

2020/05/24
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab KR
PR-249 | YOLOv4: Optimal Speed and Accuracy of Object Detection

Contents
• Introduction
• Related Work
• Object Detection Models
• Bag of Freebies (Tricks)
• Bag of Specials
• YOLOv4
• Experiments
• Conclusion

Introduction
The majority of object detectors are largely applicable only to recommendation systems
• Searching for free parking spaces → it’s okay to be slow → can be more accurate
• Car collision warning → needs to be fast → less accurate
→ Need to design a fast and accurate object detector for production systems
Main Contributions
• Develop an efficient and powerful object detection model that anyone can train and use on a single GPU (1080 Ti or 2080 Ti)
• Verify the influence of SOTA Bag-of-Freebies and Bag-of-Specials methods
• Modify SOTA methods to make them more efficient and better suited to single-GPU training

Introduction
Object detection is a very popular topic in the PR-12 study group
• A total of 25 papers have been covered! → almost 10%!
• I recommend watching the YOLO v1, v2, and v3 videos:
• PR-016 YOLO (by 전태균)
• PR-024 YOLO v2 (by 이진원)
• PR-207 YOLO v3 (by 이진원)

Introduction
YOLO v1 ~ v3 quick review: YOLO v1
• Very fast one-stage approach!
• Image → bounding box coordinates and class probabilities, predicted in a single network pass
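To make the single-pass output concrete, here is a minimal sketch of the YOLO v1 prediction tensor, assuming the PASCAL VOC setting (S = 7 grid, B = 2 boxes per cell, C = 20 classes). The per-cell ordering (boxes first, then class probabilities) and the helper names are illustrative assumptions, not Darknet's actual memory layout.

```python
from typing import List, Tuple

def output_shape(S: int = 7, B: int = 2, C: int = 20) -> Tuple[int, int, int]:
    # Each box contributes (x, y, w, h, confidence); class probabilities are shared per cell.
    return (S, S, B * 5 + C)

def decode_cell(cell: List[float], B: int = 2, C: int = 20):
    """Split one cell vector (assumed order: B boxes, then C class probs) and score it."""
    assert len(cell) == B * 5 + C
    boxes = [cell[5 * b: 5 * b + 5] for b in range(B)]
    class_probs = cell[5 * B:]
    # Class-specific confidence = Pr(class | object) * box confidence
    scores = [[box[4] * p for p in class_probs] for box in boxes]
    return boxes, class_probs, scores
```

With the defaults, `output_shape()` gives the familiar 7 × 7 × 30 tensor.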

Introduction
• YOLO v2 = YOLO v1 + many algorithms, grouped as Better / Faster / Stronger
Better
• Batch Normalization
• High resolution classifier
• Anchor boxes
• Dimension clusters
• Direct location prediction
• Fine-grained features
• Multi-scale training
Faster
• Darknet-19
• Transfer learning
Stronger
• Hierarchical classification
• Dataset combination with WordTree
• Joint classification and detection
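Dimension clusters pick anchor shapes by running k-means over the training boxes with distance d = 1 − IoU instead of Euclidean distance, so large boxes do not dominate the error. The sketch below is a minimal pure-Python version, assuming boxes are given as (width, height) pairs compared as if they shared the same center; `kmeans_anchors` and `wh_iou` are hypothetical helper names.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box

def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same center (only width and height matter)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """k-means over box shapes using the distance d = 1 - IoU(box, centroid)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid with the smallest 1 - IoU distance
            best = min(range(k), key=lambda i: 1.0 - wh_iou(box, centroids[i]))
            clusters[best].append(box)
        # New centroid = mean width/height of its cluster (empty clusters keep the old one)
        new_centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)
```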

Introduction
• YOLO v3 = YOLO v2 + many algorithms (YOLOv3: An Incremental Improvement)
• Bounding box prediction → sum of squared error loss
• Class prediction → multilabel classification (independent logistic classifiers)
• Predictions across 3 scales
• Darknet-53 backbone
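The switch from softmax to multilabel classification can be shown with a small comparison: independent sigmoids let overlapping labels (for example a hypothetical "Person" and "Woman") both be active, while a softmax forces the classes to compete. The logits and label names below are made up for illustration.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_probs(logits: List[float]) -> List[float]:
    # One independent logistic classifier per class
    return [sigmoid(z) for z in logits]

def binary_cross_entropy(logits: List[float], targets: List[int], eps: float = 1e-12) -> float:
    probs = multilabel_probs(logits)
    return -sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for p, t in zip(probs, targets))

logits = [2.0, 2.0, -3.0]  # hypothetical classes: "Person", "Woman", "Car"
```

Here both "Person" and "Woman" get a sigmoid probability above 0.5, whereas the softmax splits the mass between them and neither reaches 0.5.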

Introduction
YOLOv4: Optimal Speed and Accuracy of Object Detection
• YOLOv4 = YOLO v3 + many algorithms:
• Bag of Freebies
• Bag of Specials
• CSPDarknet53 + SPP, PAN

Related Work
Object Detection Models
[Figure: components of an object detector (backbone, neck, head)]

Related Work
Bag of Freebies (pre-processing + training strategy)
• Methods that only change the training strategy or only increase the training cost are called “BoF”
• All of them act in the training phase
Data Augmentation
• Random erase
• CutOut
• MixUp
• CutMix
• Style transfer GAN
Regularization
• DropOut
• DropPath
• Spatial DropOut
• DropBlock
Loss Function
• MSE
• IoU
• GIoU (Generalized IoU)
• CIoU (Complete IoU)
• DIoU (Distance IoU)
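The IoU-based losses differ only in the penalty added to plain IoU: GIoU penalizes the empty area of the smallest enclosing box C, DIoU penalizes the normalized center distance ρ²/c², and CIoU adds an aspect-ratio term αv on top of DIoU. A minimal sketch, assuming (x1, y1, x2, y2) boxes with positive width and height; the loss for each variant is 1 minus the returned value.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou_family(pred: Box, gt: Box) -> Dict[str, float]:
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection and union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # Smallest enclosing box C
    cx1, cy1, cx2, cy2 = min(px1, gx1), min(py1, gy1), max(px2, gx2), max(py2, gy2)
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c
    # DIoU: squared center distance over squared diagonal of C
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    diou = iou - rho2 / c2
    # CIoU: add the aspect-ratio consistency term alpha * v
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    ciou = diou - alpha * v
    return {"iou": iou, "giou": giou, "diou": diou, "ciou": ciou}
```

For disjoint boxes, IoU is 0 no matter how far apart they are, while GIoU and DIoU still become more negative with distance, which is why they give a useful gradient.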

Related Work
Bag of Specials (plugin modules + post-processing)
• Methods that only increase the inference cost by a small amount but can significantly improve the accuracy are called “BoS”
• Inference phase, architecture related
Enhancement of receptive field
• Spatial Pyramid Pooling (SPP)
• ASPP (dilated conv)
• Receptive Field Block (RFB)
Activation function
• ReLU
• Leaky ReLU
• Parametric ReLU
• ReLU6
• Swish
• Mish
Feature Integration
• Skip-connection
• Feature Pyramid Network (FPN)
• SFAM (Scale-wise Feature Aggregation Module)
• ASFF (Adaptively Spatial Feature Fusion)
• BiFPN
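A scalar sketch of the activation candidates, for comparison: Swish is x·σ(βx) and Mish is x·tanh(softplus(x)); both are smooth and non-monotonic near zero, unlike (Leaky) ReLU. β = 1 for Swish and the 0.1 Leaky ReLU slope are assumptions chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(x: float) -> float:
    # log(1 + e^x), written to avoid overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def leaky_relu(x: float, slope: float = 0.1) -> float:
    return x if x >= 0 else slope * x

def swish(x: float, beta: float = 1.0) -> float:
    return x * sigmoid(beta * x)

def mish(x: float) -> float:
    return x * math.tanh(softplus(x))
```

For large positive inputs Mish behaves like the identity, and for large negative inputs it decays to 0 instead of being clipped at a fixed slope.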

Related Work
Bag of Specials (plugin modules + post-processing, continued)
• Inference phase, architecture related
Attention Module
• Squeeze-and-Excitation (SE)
• Spatial Attention Module (SAM)
Post Processing
• NMS
• Soft NMS
• DIoU NMS
Normalization
• Batch Norm (BN)
• Cross-GPU Batch Norm (CGBN or SyncBN)
• Filter Response Normalization (FRN)
• Cross-Iteration Batch Norm (CBN)
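Greedy NMS keeps the highest-scoring box and suppresses any remaining box whose overlap with it reaches a threshold; DIoU-NMS uses IoU minus the DIoU center-distance penalty as the overlap, so boxes whose centers are far apart are less likely to be suppressed. A minimal single-class sketch, assuming (x1, y1, x2, y2) boxes; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def diou_penalty(a: Box, b: Box) -> float:
    # Squared center distance divided by the squared diagonal of the enclosing box
    rho2 = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return rho2 / c2

def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.5,
        use_diou: bool = False) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        survivors = []
        for i in order:
            overlap = iou(boxes[best], boxes[i])
            if use_diou:
                overlap -= diou_penalty(boxes[best], boxes[i])
            if overlap < threshold:  # only sufficiently distinct boxes survive
                survivors.append(i)
        order = survivors
    return keep
```

For example, with A = (0, 0, 10, 10) and D = (0, 5, 10, 15) at threshold 0.3, plain NMS suppresses D (IoU ≈ 0.33), while DIoU-NMS keeps it (0.33 − 25/325 ≈ 0.26).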

YOLOv4
Selection of architecture
• Higher input network size (resolution) – for detecting multiple small-sized objects
• More layers – for a larger receptive field to cover the increased input size
• More parameters – for greater model capacity to detect multiple objects of different sizes in a single image
→ For a detector, bigger is better!
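The link between "more layers" and "larger receptive field" follows from a simple recurrence: each layer with kernel k grows the receptive field by (k − 1) times the current jump (the product of earlier strides). A minimal sketch, assuming plain convolution or pooling layers without dilation; `receptive_field` is a hypothetical helper.

```python
from typing import List, Tuple

def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels) of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) input-space steps
        jump *= stride             # later layers step over `jump` input pixels at a time
    return rf
```

Three stacked 3 × 3 stride-1 convolutions see 7 × 7 input pixels, and strided layers make every later layer grow the field faster.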

YOLOv4
CSPNet: A New Backbone that can Enhance Learning Capability of CNN, 2020 CVPRW
• Proposes the Cross Stage Partial Network (CSPNet) to mitigate heavy inference computation
• Partitions the feature map of the base layer into two parts and then merges them → splits the gradient flow
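The partition-and-merge idea can be sketched on a toy feature map stored as a list of channels: only one half of the channels goes through the expensive block, the other half bypasses it, and the two are concatenated at the end of the stage. This is a simplified sketch under stated assumptions: transition layers are omitted, and `toy_block` is a hypothetical stand-in for a dense or residual block.

```python
from typing import Callable, List

FeatureMap = List[List[float]]  # list of channels; each channel is a flattened spatial map

def csp_stage(x: FeatureMap, block: Callable[[FeatureMap], FeatureMap]) -> FeatureMap:
    """Cross Stage Partial stage (simplified): split channels, process one part, merge."""
    half = len(x) // 2
    part1, part2 = x[:half], x[half:]
    # Only part2 goes through the block; part1 bypasses it, so its gradient
    # path does not pass through the block's layers.
    processed = block(part2)
    # Cross-stage merge: concatenate along the channel axis
    return part1 + processed

def toy_block(fm: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for a dense/residual block: doubles every value."""
    return [[2.0 * v for v in channel] for channel in fm]
```

The block only sees half of the channels, which is where the computation savings come from.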

YOLOv4
Selection of architecture
• YOLOv4 = YOLOv3 + CSPDarknet53 + SPP + PAN + BoF + BoS
• Backbone: CSPDarknet53
• Neck: SPP + PAN (Path Aggregation Network)
• Head: YOLOv3
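The SPP block in the neck applies several max-pooling operations with stride 1 and "same" padding, then concatenates the results with the input, so the spatial size is unchanged while the receptive field grows. A single-channel sketch; the kernel sizes 5, 9 and 13 are an assumption here (they are the commonly used Darknet setting), and out-of-range cells are simply ignored, which is equivalent to padding with −∞.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one channel, H x W

def max_pool_same(x: Grid, k: int) -> Grid:
    """k x k max-pooling with stride 1 and 'same' padding (out-of-range cells ignored)."""
    h, w = len(x), len(x[0])
    r = k // 2
    return [
        [
            max(x[i][j]
                for i in range(max(0, y - r), min(h, y + r + 1))
                for j in range(max(0, c - r), min(w, c + r + 1)))
            for c in range(w)
        ]
        for y in range(h)
    ]

def spp(x: Grid, kernels: Sequence[int] = (5, 9, 13)) -> List[Grid]:
    """Return [x, pool_5(x), pool_9(x), pool_13(x)], i.e. the channel-wise concatenation."""
    return [x] + [max_pool_same(x, k) for k in kernels]
```

With one input channel the output has four channels of the same height and width.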

YOLOv4
Selection of BoF and BoS
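• PReLU, SELU → more difficult to train → excluded
• ReLU6 → designed specifically for quantized networks → excluded
• DropBlock’s authors compared their method with other regularization methods in detail, and it won most of the comparisons → DropBlock selected
• Only a single GPU is used → SyncBN is not considered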
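DropBlock drops contiguous square regions instead of independent units, because neighboring activations in a feature map are correlated. A single-channel sketch assuming a square feature map: seeds are sampled with probability γ = (1 − keep_prob)/block_size² · n²/(n − block_size + 1)², each seed zeroes a block_size × block_size square, and the survivors are rescaled. The function name and defaults are illustrative; at inference DropBlock is skipped.

```python
import random
from typing import List

Grid = List[List[float]]  # one square channel, n x n

def dropblock(x: Grid, keep_prob: float = 0.9, block_size: int = 3, seed: int = 0) -> Grid:
    """DropBlock on a single square channel (training-time only)."""
    n = len(x)
    if keep_prob >= 1.0:
        return [row[:] for row in x]
    rng = random.Random(seed)
    # Seed probability chosen so that roughly (1 - keep_prob) of the units get dropped
    gamma = (1.0 - keep_prob) / block_size ** 2 * n ** 2 / (n - block_size + 1) ** 2
    half = block_size // 2
    mask = [[1] * n for _ in range(n)]
    # Seeds are only sampled where a full block fits inside the feature map
    for i in range(half, n - block_size + 1 + half):
        for j in range(half, n - block_size + 1 + half):
            if rng.random() < gamma:
                for di in range(-half, block_size - half):
                    for dj in range(-half, block_size - half):
                        mask[i + di][j + dj] = 0
    kept = sum(sum(row) for row in mask)
    # Rescale the surviving units so the total activation is preserved
    scale = n * n / kept if kept else 0.0
    return [[x[i][j] * mask[i][j] * scale for j in range(n)] for i in range(n)]
```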

YOLOv4
Additional improvements
• Introduce new data augmentation methods: Mosaic and Self-Adversarial Training (SAT)
• Select optimal hyper-parameters by applying genetic algorithms
• Modify some existing methods for efficient training and detection:
  • Modified SAM
  • Modified PAN
  • Cross mini-Batch Normalization (CmBN)
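Hyper-parameter selection with a genetic algorithm can be sketched as: evaluate a population of candidate settings, keep the best half, and refill the population with crossed-over and mutated children. Everything below is an assumption for illustration: the search space, the operators (uniform crossover, Gaussian mutation), and `toy_fitness`, which stands in for "detector accuracy after a short training run".

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]

def genetic_search(fitness: Callable[[Params], float], bounds: Bounds, pop_size: int = 8,
                   generations: int = 20, mutation_scale: float = 0.1,
                   seed: int = 0) -> Tuple[Params, List[float]]:
    rng = random.Random(seed)

    def random_individual() -> Params:
        return {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}

    def crossover(a: Params, b: Params) -> Params:
        # Uniform crossover: each gene comes from one parent at random
        return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in bounds}

    def mutate(p: Params) -> Params:
        # Gaussian mutation, clipped back into the search bounds
        return {k: min(hi, max(lo, p[k] + rng.gauss(0.0, mutation_scale * (hi - lo))))
                for k, (lo, hi) in bounds.items()}

    population = [random_individual() for _ in range(pop_size)]
    history: List[float] = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness), history

def toy_fitness(p: Params) -> float:
    # Hypothetical stand-in for validation accuracy; peaks at lr=0.01, momentum=0.9
    return -((p["lr"] - 0.01) / 0.1) ** 2 - ((p["momentum"] - 0.9) / 0.1) ** 2

bounds = {"lr": (0.0001, 0.1), "momentum": (0.8, 0.99)}
```

Because the best half is always kept, the best fitness never decreases from one generation to the next.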

YOLOv4
Mosaic data augmentation
• Mixes 4 training images → allows detection of objects outside their normal context
• Batch normalization calculates activation statistics from 4 different images on each layer → reduces the need for a large mini-batch size
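The core of Mosaic is tiling four images around a mosaic center (cx, cy) and shifting their boxes into the new frame. A minimal sketch under simplifying assumptions: all four images are square S × S grayscale grids, each quadrant is filled by cropping the image whose corner meets the center, class labels are omitted, and boxes cropped away entirely are dropped. Practical implementations also resize and randomly crop; that is left out here.

```python
from typing import List, Tuple

Image = List[List[int]]          # H x W grayscale pixels; assumed square S x S
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; class labels omitted

def mosaic(images: List[Image], boxes: List[List[Box]], cx: int, cy: int) -> Tuple[Image, List[Box]]:
    """Tile 4 images around the mosaic center (cx, cy) and shift their boxes."""
    assert len(images) == 4 and len(boxes) == 4
    S = len(images[0])
    canvas = [[0] * S for _ in range(S)]
    # For each quadrant: canvas region (x0, x1, y0, y1) and offset (dx, dy),
    # where canvas coordinate = source coordinate + offset.
    layout = [
        ((0, cx, 0, cy), (cx - S, cy - S)),  # top-left: source's bottom-right corner meets (cx, cy)
        ((cx, S, 0, cy), (cx, cy - S)),      # top-right: source's bottom-left corner
        ((0, cx, cy, S), (cx - S, cy)),      # bottom-left: source's top-right corner
        ((cx, S, cy, S), (cx, cy)),          # bottom-right: source's top-left corner
    ]
    out_boxes: List[Box] = []
    for img, img_boxes, ((x0, x1, y0, y1), (dx, dy)) in zip(images, boxes, layout):
        for y in range(y0, y1):
            for x in range(x0, x1):
                canvas[y][x] = img[y - dy][x - dx]
        for bx1, by1, bx2, by2 in img_boxes:
            # Shift into canvas coordinates, then clip to this quadrant
            nx1, nx2 = (min(max(v + dx, x0), x1) for v in (bx1, bx2))
            ny1, ny2 = (min(max(v + dy, y0), y1) for v in (by1, by2))
            if nx2 > nx1 and ny2 > ny1:  # drop boxes that were cropped away entirely
                out_boxes.append((nx1, ny1, nx2, ny2))
    return canvas, out_boxes
```

Because every mini-batch image now contains content from four sources, each BN layer sees statistics from four images per sample, matching the second bullet above.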

YOLOv4
Self-Adversarial Training (SAT)
• A new data augmentation technique that operates in 2 forward-backward stages
• In the 1st stage, the network alters the original image instead of the network weights → the network executes an adversarial attack on itself
• In the 2nd stage, the network is trained to detect an object on this modified image in the normal way
→ But there are no experimental results for SAT in the paper. Why?
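The two SAT stages can be illustrated on a toy logistic-regression "network". The paper does not spell out the attack, so this sketch assumes an FGSM-style step (move each input value by ε in the direction of the sign of the loss gradient); the model, `epsilon`, `lr`, and the function names are all hypothetical.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce(p: float, y: int, eps: float = 1e-12) -> float:
    return -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))

def sign(v: float) -> float:
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0

def sat_step(w: List[float], b: float, x: List[float], y: int,
             epsilon: float = 0.1, lr: float = 0.1) -> Tuple[List[float], float, List[float]]:
    # Stage 1: weights frozen; perturb the INPUT to increase the loss (FGSM-style assumption)
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # dLoss/dx_i for logistic regression + BCE
    x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]
    # Stage 2: ordinary gradient step on the weights, using the modified input
    p_adv = predict(w, b, x_adv)
    new_w = [wi - lr * (p_adv - y) * xi for wi, xi in zip(w, x_adv)]
    new_b = b - lr * (p_adv - y)
    return new_w, new_b, x_adv
```

In stage 1 the loss on the modified input goes up (the model "attacks itself"); in stage 2 the weight update lowers the loss on that same modified input.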

YOLOv4
• Modified SAM: spatial-wise attention → point-wise attention
• Modified PAN: shortcut connection (addition) → concatenation
Reference: https://github.com/AlexeyAB/darknet/issues/5117
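The difference between the original and modified modules can be sketched on a feature map stored as [channel][position]. Original SAM produces one attention value per spatial position, shared by all channels (here a 1 × 1 mix of the channel-average and channel-max maps stands in for its larger convolution, a simplification). Modified SAM produces one value per element, assumed here to come from a 1 × 1 convolution followed by a sigmoid. For PAN, addition keeps the channel count while concatenation stacks both paths. All weights and function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # [channel][position], spatial positions flattened

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def spatial_sam(x: FeatureMap, w_avg: float, w_max: float, b: float) -> FeatureMap:
    """Original SAM (simplified): one attention value per position, shared across channels."""
    C, N = len(x), len(x[0])
    att = [sigmoid(w_avg * sum(x[c][n] for c in range(C)) / C
                   + w_max * max(x[c][n] for c in range(C)) + b) for n in range(N)]
    return [[x[c][n] * att[n] for n in range(N)] for c in range(C)]

def pointwise_sam(x: FeatureMap, W: List[List[float]], b: List[float]) -> FeatureMap:
    """Modified SAM: 1x1 conv (C x C weights) + sigmoid gives one attention value per element."""
    C, N = len(x), len(x[0])
    return [[x[c][n] * sigmoid(sum(W[c][k] * x[k][n] for k in range(C)) + b[c])
             for n in range(N)] for c in range(C)]

def pan_merge(a: FeatureMap, b: FeatureMap, modified: bool = True) -> FeatureMap:
    """Original PAN adds the two paths; the modified PAN concatenates them along channels."""
    if modified:
        return a + b  # list concatenation = channel concatenation
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
```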


Experiments
Experimental Setup
• Please refer to the paper for details.
• ImageNet for classification
• MS COCO for object detection

Experiments
Influence of BoF and Mish on classifier training
• CutMix, Mosaic data augmentation, label smoothing → improved classifier accuracy!
• Mish activation → improves it further!
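Label smoothing replaces the one-hot target with (1 − ε)·one_hot + ε/K over K classes, so the classifier is no longer rewarded for pushing a single probability all the way to 1. A minimal sketch; ε = 0.1 is an assumed value and the helper names are hypothetical.

```python
import math
from typing import List

def smooth_labels(one_hot: List[float], epsilon: float = 0.1) -> List[float]:
    k = len(one_hot)
    return [(1.0 - epsilon) * t + epsilon / k for t in one_hot]

def cross_entropy(probs: List[float], targets: List[float], eps: float = 1e-12) -> float:
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, targets))

hard = [0.0, 1.0, 0.0, 0.0]
soft = smooth_labels(hard)              # approximately [0.025, 0.925, 0.025, 0.025]
confident = [0.0, 1.0, 0.0, 0.0]        # over-confident prediction
moderate = [0.03, 0.91, 0.03, 0.03]     # calibrated-looking prediction
```

With hard targets the over-confident prediction has zero loss; with smoothed targets it is penalized more than the moderate one.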

Experiments
Influence of BoF and Mish on object detector training
• S: Eliminate grid sensitivity → worse performance
• M: Mosaic data augmentation
• IT: IoU threshold, i.e., use multiple anchors for a single ground truth → worse performance
• GA: Genetic algorithms for selecting the optimal hyper-parameters during the first 10% of training time
• LS: Class label smoothing → worse performance
• CBN: Cross mini-Batch Normalization
• CA: Cosine annealing scheduler
• DM: Dynamic mini-batch size → worse performance
• OA: Optimized anchors
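"Grid sensitivity" comes from the YOLOv2/v3 center decoding b = σ(t) + c_x: since σ(t) never reaches 0 or 1, a center on a cell border needs an unbounded raw output t. A common fix multiplies the sigmoid by a factor s > 1 and recenters it, b = s·σ(t) − (s − 1)/2 + c_x. The sketch below assumes this form (s is a tunable scale; 1.2 is only an example value) and also shows the size decoding b_w = p_w·e^(t_w) from direct location prediction.

```python
import math

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decode_center(t: float, cell: int, scale: float = 1.0) -> float:
    """Center in grid units; scale=1.0 gives the YOLOv2/v3 form sigmoid(t) + cell."""
    return scale * sigmoid(t) - (scale - 1.0) / 2.0 + cell

def decode_size(t: float, prior: float) -> float:
    """Width or height from an anchor prior: prior * exp(t)."""
    return prior * math.exp(t)

def t_needed_for_offset(offset: float, scale: float) -> float:
    """Raw output t that places the center at cell + offset (inverse of decode_center)."""
    s = (offset + (scale - 1.0) / 2.0) / scale
    if not 0.0 < s < 1.0:
        raise ValueError("offset unreachable: the sigmoid never reaches 0 or 1")
    return math.log(s / (1.0 - s))
```

With scale 1.0, reaching offset 1.0 (the cell border) raises an error because no finite t works; with scale 1.2 a finite t ≈ 2.4 suffices.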

Experiments
Influence of BoS on object detector training
• PAN + SPP + SAM → better performance!
• RFB, Gaussian YOLO (G), ASFF → worse performance

Experiments
Influence of different backbones on object detector training
• Although CSPResNeXt has higher classification accuracy than CSPDarknet, the CSPDarknet model achieves higher object detection accuracy

Experiments
Influence of different mini-batch size on object detector training
• After adding BoF and BoS, the mini-batch size has almost no effect on the detector’s performance

Conclusion
• Offers a state-of-the-art detector that is faster (FPS) and more accurate than the available alternatives
• YOLOv4 can be trained and used on a single conventional GPU with 8-16 GB of VRAM
• Verified a large number of features and selected those that improve the accuracy of both the classifier and the detector
