YOLOv4: optimal speed and accuracy of object detection review
YOLOv4 builds upon previous YOLO models and introduces techniques like CSPDarknet53, SPP, PAN, Mosaic data augmentation, and modifications to existing methods to achieve state-of-the-art object detection speed and accuracy while being trainable on a single GPU. Experiments show that combining these techniques through a "bag of freebies" and "bag of specials" approach improves classifier and detector performance over baselines on standard datasets. The paper contributes an efficient object detection model suitable for production use with limited resources.
Covers YOLO evolution from v1 to v4. Emphasizes the need for speed and accuracy in object detectors, highlighting YOLO's developments and efficiency.
Discusses Object Detection models and techniques including 'Bag of Freebies' and 'Bag of Specials' which enhance training and inference while managing cost.
Explains YOLOv4 architecture enhancements like CSPDarknet53 and new techniques for data augmentation, focusing on improving efficiency and detection capacity.
Describes experimental setups and results for YOLOv4 testing, analyzing the impacts of different techniques and parameters on object detection performance.
Summarizes YOLOv4 contributions: faster processing speed and improved accuracy using single GPU setups.
1.
2020/05/24
Ho Seong Lee(hoya012)
Cognex Deep Learning Lab KR
PR-249 | YOLOv4: Optimal Speed and Accuracy of Object Detection 1
2.
Contents
• Introduction
• Related Work
• Object Detection Models
• Bag of Freebies(Tricks)
• Bag of Specials
• YOLOv4
• Experiments
• Conclusion
3.
Introduction
The majority of object detectors are largely applicable only to recommendation systems
• Searching for free parking spaces → it’s okay to be slow → more accurate
• Car collision warning → needs to be fast → inaccurate
→ Need to design a fast and accurate object detector for production systems
Main Contributions
• Develop an efficient and powerful object detection model that anyone can train and use with just a single GPU (1080 Ti or 2080 Ti)
• Verify the influence of SOTA Bag-of-Freebies and Bag-of-Specials methods
• Modify SOTA methods and make them more efficient and suitable for single GPU training
4.
Introduction
Object detection is a very popular topic in the PR-12 study
• A total of 25 papers were covered! → Almost 10%!
• Watching the YOLO v1, v2, and v3 videos is recommended
PR-016 YOLO
By 전태균
PR-024 YOLO v2
By 이진원
PR-207 YOLO v3
By 이진원
5.
Introduction
YOLO v1 ~ v3 quick review: YOLO v1
• Very fast one-stage approach!
• Image → bounding box coordinates and class probabilities
6.
Introduction
YOLO v1 ~ v3 quick review: YOLO v2
• YOLO v1 + many algorithms
YOLO v1
• Better: Batch Normalization, High resolution classifier, Anchor boxes, Dimension clusters, Direct location prediction, Fine-grained features, Multi-scale training
• Faster: Darknet-19, Transfer learning
• Stronger: Hierarchical classification, Dataset combination with WordTree, Joint classification and detection
7.
Introduction
YOLO v1 ~ v3 quick review: YOLO v3
• YOLO v2 + many algorithms (YOLOv3: An Incremental Improvement)
YOLO v2
Bounding box prediction → sum of squared error loss
Class prediction → Multilabel classification
Predictions across scales
Darknet-53
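The move to multilabel classification can be illustrated with a small sketch. It assumes per-class logits (raw scores before any activation) and contrasts softmax, where classes compete for probability mass, with independent sigmoids trained with binary cross-entropy. The function names and the example logits are illustrative, not taken from the slides.

```python
import math

def softmax(logits):
    """Mutually exclusive class probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_probs(logits):
    """Independent per-class probabilities; several classes can be 'on' at once."""
    return [sigmoid(z) for z in logits]

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over classes, the usual loss for sigmoid outputs."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(probs)

# Hypothetical logits for overlapping labels ("person", "woman", "car")
logits = [3.0, 2.5, -4.0]
print(softmax(logits))           # "person" and "woman" share the probability mass
print(multilabel_probs(logits))  # both can be close to 1 independently
```

With overlapping labels, the sigmoid outputs can assign high probability to more than one class, which softmax cannot do.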
8.
Introduction
YOLOv4: Optimal Speed and Accuracy of Object Detection
• YOLO v3 + many algorithms
YOLO v3
Bag of Freebies
Bag of Specials
CSPDarknet53 + SPP, PAN
Related Work
Bag of Freebies (pre-processing + training strategy)
• Methods that only change the training strategy or only increase the training cost are called “Bag of Freebies” (BoF)
Data Augmentation
• Random erase
• CutOut
• MixUp
• CutMix
• Style transfer GAN
Regularization
• DropOut
• DropPath
• Spatial DropOut
• DropBlock
Loss Function
• MSE
• IoU
• GIoU (Generalized IoU)
• CIoU (Complete IoU)
• DIoU (Distance IoU)
Training Phase
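The overlap-based losses listed above build on box-overlap measures. The following is a minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) format; the function names are illustrative. It covers IoU, GIoU, and DIoU; CIoU adds an aspect-ratio consistency term that is omitted here. The corresponding loss is typically 1 minus the measure.

```python
def _areas(a, b):
    """Return (intersection, union) areas of boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter, union

def _enclosing(a, b):
    """Smallest axis-aligned box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def iou(a, b):
    """Intersection over Union."""
    inter, union = _areas(a, b)
    return inter / union if union > 0 else 0.0

def giou(a, b):
    """IoU minus the share of the enclosing box not covered by the union."""
    _, union = _areas(a, b)
    c = _enclosing(a, b)
    c_area = (c[2] - c[0]) * (c[3] - c[1])
    if c_area <= 0:
        return 0.0
    return iou(a, b) - (c_area - union) / c_area

def diou(a, b):
    """IoU minus squared center distance over squared enclosing-box diagonal."""
    c = _enclosing(a, b)
    diag2 = (c[2] - c[0]) ** 2 + (c[3] - c[1]) ** 2
    if diag2 <= 0:
        return 0.0
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    return iou(a, b) - (dx * dx + dy * dy) / diag2
```

For two disjoint boxes, IoU is 0 regardless of how far apart they are, while GIoU and DIoU still change with distance, which is why they give a useful training signal in that case.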
11.
Related Work
Bag of Specials (plugin modules + post-processing)
• Methods that only increase the inference cost but can improve accuracy are called “Bag of Specials” (BoS)
Enhancement of receptive field
• Spatial Pyramid Pooling (SPP)
• ASPP (dilated conv)
• Receptive Field Block (RFB)
Activation function
• ReLU
• Leaky ReLU
• Parametric ReLU
• ReLU6
• Swish
• Mish
Feature Integration
• Skip-connection
• Feature Pyramid Network (FPN)
• SFAM (Scale-wise Feature Aggregation Module)
• ASFF (Adaptively Spatial Feature Fusion)
• BiFPN
Inference Phase
architecture related
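Several of the activation functions listed above have simple closed forms. The sketch below writes them as scalar functions in plain Python for illustration; the leaky slope of 0.1 is an assumed default, and Parametric ReLU is omitted because its slope is a learned parameter.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    """Small negative slope instead of a hard zero (slope value is an assumption)."""
    return x if x > 0 else slope * x

def relu6(x):
    """ReLU clipped at 6."""
    return min(max(0.0, x), 6.0)

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x * sigmoid(x)

def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

Swish and Mish are smooth and non-monotonic near zero, unlike the piecewise-linear ReLU variants.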
12.
Related Work
Bag of Specials (plugin modules + post-processing), continued
Attention Module
• Squeeze-and-Excitation (SE)
• Spatial Attention Module (SAM)
Post Processing
• NMS
• Soft NMS
• DIoU NMS
Normalization
• Batch Norm (BN)
• Cross-GPU Batch Norm (CGBN or SyncBN)
• Filter Response Normalization (FRN)
• Cross-Iteration Batch Norm (CBN)
Inference Phase
architecture related
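As an illustration of the post-processing step, here is a minimal sketch of greedy non-maximum suppression (NMS), assuming boxes in (x1, y1, x2, y2) format with one confidence score each. The `overlap` parameter is a hypothetical hook: passing a DIoU function instead of IoU gives a DIoU-NMS-style variant. Soft NMS, which decays scores instead of discarding boxes, is not shown.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5, overlap=iou):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(overlap(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```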
13.
YOLOv4
Selection of architecture
• Higher input network size (resolution) – for detecting multiple small-sized objects
• More layers – for a higher receptive field to cover the increased size of input network
• More parameters – for greater capacity of a model to detect multiple objects of different sizes in a single image
better!
14.
YOLOv4
CSPNet: A New Backbone that can Enhance Learning Capability of CNN, 2020 CVPRW
• Proposes the Cross Stage Partial Network (CSPNet) to mitigate heavy inference computations
• Partitions the feature map of the base layer into two parts and then merges them
→ Splits the gradient flow
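The partition-and-merge idea can be sketched in a few lines. This is a simplified illustration, not the CSPNet implementation: a feature map is represented as a plain list of channels, `block` is a hypothetical stand-in for the dense or residual block of a stage, and the transition layers used in the real architecture are omitted.

```python
def csp_stage(channels, block):
    """Split channels in two, process only one part, then merge by concatenation."""
    half = len(channels) // 2
    part1, part2 = channels[:half], channels[half:]
    processed = block(part2)   # only this part goes through the heavy block
    return part1 + processed   # part1 bypasses the block entirely

def double_block(chs):
    """Toy 'block' that doubles every value, standing in for real convolutions."""
    return [[2.0 * v for v in ch] for ch in chs]

feature_map = [[1.0], [2.0], [3.0], [4.0]]   # 4 channels with 1 value each
print(csp_stage(feature_map, double_block))
```

Because half of the channels skip the block, the block processes fewer channels, and the two parts carry gradients along different paths.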
15.
YOLOv4
Selection of architecture
YOLOv4 = YOLOv3 + CSPDarknet53 + SPP + PAN + BoF + BoS
• Backbone: CSPDarknet53
• Neck: SPP + PAN (Path Aggregation Network)
• Head: YOLOv3
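To make the SPP component more concrete, the sketch below applies stride-1 max pooling with several kernel sizes to a single-channel feature map and returns the results as a list of maps that would be concatenated along the channel axis. The default kernel sizes (1, 5, 9, 13) follow my reading of the YOLOv4 paper and should be treated as an assumption; the function names are illustrative.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with a k x k window, keeping the spatial size."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    return [[max(fmap[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]

def spp_block(fmap, kernel_sizes=(1, 5, 9, 13)):
    """One pooled map per kernel size; a real block concatenates them channel-wise."""
    return [max_pool_same(fmap, k) for k in kernel_sizes]

fmap = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
for pooled in spp_block(fmap, kernel_sizes=(1, 3, 5)):
    print(pooled)
```

Each output position sees a progressively larger neighborhood, which enlarges the receptive field without changing the spatial resolution.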
16.
YOLOv4
Selection of BoF and BoS
PReLU, SELU → difficult to train
ReLU6 → designed for quantization networks
DropBlock → its authors compared the method with other methods in detail, and it won by a wide margin
Only a single GPU is used → SyncBN is not considered
17.
YOLOv4
Additional improvements
• Introduce a new data augmentation Mosaic and Self-Adversarial Training (SAT)
• Select optimal hyper-parameters while applying genetic algorithms
• Modify some existing methods for efficient training and detection
• Modified SAM
• Modified PAN
• Cross mini-Batch Normalization (CmBN)
18.
YOLOv4
Additional improvements
Mosaic data augmentation
• Mixes 4 training images → allows detection of objects outside their normal context
• BN calculates activation statistics from 4 different images on each layer → reduces the need for a large mini-batch size
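A minimal sketch of the image-mixing step, assuming four same-sized images stored as 2D lists and a randomly chosen split point. Real Mosaic implementations also rescale the images and remap the bounding-box labels, which is omitted here; the function name and the range of the split point are assumptions.

```python
import random

def mosaic(images, rng=random):
    """Tile four equally sized images into one, split at a random center point."""
    assert len(images) == 4
    h, w = len(images[0]), len(images[0][0])
    cy = rng.randint(h // 4, 3 * h // 4)   # horizontal split line
    cx = rng.randint(w // 4, 3 * w // 4)   # vertical split line
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # 0: top-left, 1: top-right, 2: bottom-left, 3: bottom-right
            idx = (0 if y < cy else 2) + (0 if x < cx else 1)
            row.append(images[idx][y][x])
        out.append(row)
    return out

# Four 8x8 "images", each filled with its own index
images = [[[i] * 8 for _ in range(8)] for i in range(4)]
for row in mosaic(images, random.Random(0)):
    print(row)
```

Every output image contains content from four source images, so each mini-batch element exposes the network to four different contexts.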
19.
YOLOv4
Additional improvements
Self-Adversarial Training (SAT)
• New data augmentation technique that operates in 2 forward-backward stages
• In the 1st stage, the NN alters the original image instead of the network weights
→ NN executes an adversarial attack on itself
• In the 2nd stage, the NN is trained to detect an object on this modified image
→ But there are no experimental results for SAT. Why?
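The two stages can be illustrated on a toy model. The sketch below is a generic adversarial-training step in the spirit of the description, not the Darknet implementation: it assumes a logistic-regression "network", a sign-of-gradient perturbation (FGSM-style), and hypothetical values for the perturbation size `eps` and learning rate `lr`.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_grads(w, x, y, tiny=1e-12):
    """Binary cross-entropy of a logistic model, with gradients w.r.t. w and x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    dz = p - y
    return loss, [dz * xi for xi in x], [dz * wi for wi in w]

def sat_step(w, x, y, eps=0.1, lr=0.5):
    # Stage 1: keep the weights fixed and alter the input to increase the loss.
    _, _, grad_x = loss_and_grads(w, x, y)
    x_adv = [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]
    # Stage 2: train the weights in the usual way on the modified input.
    _, grad_w, _ = loss_and_grads(w, x_adv, y)
    w_new = [wi - lr * g for wi, g in zip(w, grad_w)]
    return w_new, x_adv

w, x, y = [1.0, -1.0], [0.5, 0.2], 1
w_new, x_adv = sat_step(w, x, y)
print(x_adv, w_new)
```

In this toy setting, the modified input is harder for the current weights than the original input, and the weight update then reduces the loss on that harder input.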
20.
YOLOv4
Additional improvements
Reference: https://github.com/AlexeyAB/darknet/issues/5117
21.
YOLOv4
Additional improvements
22.
Experiments
Experimental Setup
• Please refer to the paper for details.
ImageNet for classification
MS COCO for object detection
23.
Experiments
Influence of BoF and Mish on classifier training
• CutMix, Mosaic data augmentation, Label smoothing → improved!
• Mish activation → Good!
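Of the classifier-side techniques above, label smoothing is the simplest to show. A minimal sketch, assuming the common uniform formulation in which a fraction epsilon of the probability mass is spread evenly over all classes; the value 0.1 is an assumed default, not one reported on the slides.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * target + epsilon / num_classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * v + epsilon / k for v in one_hot]

print(smooth_labels([0, 0, 1, 0]))  # the true class stays dominant but below 1
```

The softened target discourages the classifier from producing overconfident predictions.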
24.
Experiments
Influence of BoF on object detector training
• S: Eliminate grid sensitivity → worse performance
• M: Mosaic data augmentation
• IT: IoU threshold using multiple anchors for a single GT → worse performance
• GA: Genetic algorithms for selecting the optimal hyperparameters during network training on the first 10% of time periods
• LS: Class label smoothing → worse performance
• CBN: Cross mini-Batch Normalization
• CA: Cosine annealing scheduler
• DM: Dynamic mini-batch size → worse performance
• OA: Optimized Anchors
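The cosine annealing scheduler (CA) in the list above can be written as a single formula. The sketch below assumes one cosine cycle without warm-up or restarts; the parameter names and the example values are illustrative.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate that follows half a cosine wave from lr_max down to lr_min."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

for step in (0, 250, 500, 750, 1000):
    print(step, cosine_annealing_lr(step, total_steps=1000, lr_max=0.01))
```

The rate falls slowly at the start, fastest in the middle, and slowly again near the end of training.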
25.
Experiments
Influence of BoS on object detector training
• PAN + SPP + SAM → better performance!
• RFB, Gaussian YOLO (G), ASFF → worse performance
26.
Experiments
Influence of different backbones on object detector training
• Although the classification accuracy of CSPResNeXt is higher than that of CSPDarknet, the CSPDarknet model shows higher accuracy in terms of object detection
27.
Experiments
Influence of different mini-batch size on object detector training
• After adding BoF and BoS, the mini-batch size has almost no effect on the detector’s performance
28.
Conclusion
• Offers a SOTA detector that is both faster (FPS) and more accurate
• YOLOv4 can be trained and used on a single conventional GPU with 8-16GB VRAM
• Verified a large number of features and selected those that improve the accuracy of both the classifier and the detector