Computer Vision-Based Military Tank Recognition Using Object Detection Technique An Application of The YOLO Framework
Abstract— Military object detection is an indispensable and challenging task for defence systems, which includes the tracking, tracing, security, and surveillance of any territory or region. These systems should be very efficient, reliable, and accurate in executing their functions; a minute error may result in mass destruction and loss. Automatic real-time object detection is therefore imperative in today's world. Although over the years different traditional approaches and techniques have been used for the detection of military equipment, warheads, and other defence-related objects, the efficiency and accuracy of those techniques are low compared to artificial intelligence-based object detection techniques. Therefore, we demonstrate the latest computer vision-based real-time object detection technique to detect military objects with high accuracy and precision. We introduce YOLOv5 for the detection of military tanks and flags. This model successfully detects the targeted objects, i.e., tank and flag, with high confidence and precision. We trained and evaluated the performance of YOLOv3, YOLOv4, and four versions of the YOLOv5 model, i.e., YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5xl, with 922 images consisting of tank and flag objects. The dataset has been divided into 80% training, 10% validation, and 10% testing. The detection results of all six YOLO versions are compared and evaluated. The experimental results showed that YOLOv5xl achieved the highest performance: the precision, recall, mAP_0.5, and mAP_0.5:0.95 were 0.99, 0.995, 0.995, and 0.892, respectively. Since YOLOv5 is one of the latest and fastest real-time object detection approaches, this model will empower and enhance military surveillance systems by enabling military personnel to take prompt and proactive actions against any potential threats.

Keywords: Computer vision, Military tank detection, YOLOv5, Object detection, Deep learning.

I. INTRODUCTION

Military object detection is a very sensitive and at the same time very challenging task because it involves many factors and intricacies such as object localization, recognition, and classification. All these factors must be considered while developing a military object detection system. The quality of recognition results plays a very important role in taking any operational step and decision. Small negligence may cause massive destruction and loss in terms of valuable human lives, properties, and territories. In recent decades, artificial intelligence techniques such as machine learning, deep learning, and computer vision have been used to address the military object detection challenges. Artificial intelligence techniques have brought tremendous advancement in every field, from the healthcare domain to smart industries, from agriculture to livestock, and from space to the defence sector. Computer vision [1], one of the most interesting domains of artificial intelligence, has been used for object detection. It enables the computer to see, recognize, and analyse objects precisely. During the last few years, researchers have presented several state-of-the-art object detection approaches, and YOLO is one of the approaches that has gained popularity in real-time object detection. Redmon et al. proposed the first version of You Only Look Once (YOLO) [2] in 2015. Although there have been many research studies on object detection in different fields, very little research work has been conducted on military object detection, for example tank detection, enemy detection, suspicious movement detection, fighter plane detection, etc. Therefore, we apply the latest real-time object detection technology for detecting military tanks. The implementation of the latest technologies like YOLO will take the military object detection system to the next level with high accuracy and performance.

This paper explores the YOLO object detection approach and applies this state-of-the-art technique for military tank detection. It is lightweight and can easily be run on systems with a low hardware/software configuration and specification. It will help to beef up the security and defence of any country or region, and it will enable us to take proactive security measures. This approach will provide technical support for the real-time detection of military tanks in war fields and other defence-related areas. The contributions of this paper are as follows:
Authorized licensed use limited to: Inje University. Downloaded on April 12,2023 at 04:08:58 UTC from IEEE Xplore. Restrictions apply.
• A tank dataset has been collected, which consists of 922 images, and the images were annotated using the labelImg annotation tool. The annotation was conducted for two objects, i.e., tank and flag.
• The most advanced real-time object detection technique, i.e., YOLOv5, has been used for military tank detection.
• The model is trained with different parameters and the results are investigated.
• Subsequently, this military object detection system will help to provide proactive steps to thwart any suspicious activity.

II. RELATED WORK

In military operations and defence, object detection is considered the foundation for carrying out any kind of activity, like tracking and tracing objects or any counter operations. The quality of object detection determines the upcoming operations, and it helps to take precise and accurate decisions accordingly. One of the major aspects of military operations and surveillance is to identify the target holistically using some automatic system. There have been many ways to detect or collect data in military operations; in most cases, sensors have been used for the collection of data in warfare. Nowadays, information technology has advanced tremendously, and the traditional ways of detecting objects have been replaced with the latest and most robust ones. Now objects are identified from images using powerful algorithms. Zhi Yang et al. [3] applied deep transfer learning for military object detection. Their proposed approach consisted of two parts: transfer learning and a mixed layer scheme. They optimized their model by performing extensive experiments and obtained the optimal model by retaining the last three layers. The mixed layer scheme was adopted to make use of current information. The combination of these two approaches showed a large improvement in detecting military objects. M. Calderón et al. [4] applied YOLOv3 for the detection of micro-UAVs. They applied this neural network-based model at the CICTE Military Application Research Center. They found YOLOv3 very efficient in terms of sensitivity and specificity for the detection of real-time military objects. The model was able to recognize objects from different directions while keeping the UAV in a static position. They examined the detection during take-off and navigation. The sensitivity during take-off was 91% and the specificity was 70%; however, during GPS navigation the sensitivity and specificity decreased to 57% and 56%, respectively. Yongcan Yu et al. [5] proposed a real-time object detection method for underwater targets. They developed the model using YOLOv5s. They performed different tasks, like image pre-processing, sampling, and localizing the target, to develop their model. They named this model "a real-time automatic target recognition (ATR) method". They used side-scan sonar (SSS) images for the training of their model. An attention mechanism was used to address the target-sparse and feature-barren characteristics of SSS images. A down-sampling methodology was also used to retain the real aspect ratio of objects. This model showed a macro-F2 score of 87.8% and a mean average precision of 85.6%. It achieved a recognition speed of 0.068 s/image. Fahad Majeed et al. [6] developed a real-time surveillance system using YOLOv5. The model was trained and tested on a customized dataset and the Face Detection Dataset and Benchmark (FDDB). The FDDB dataset consists of 2845 images, while the other dataset has 500 images. The model achieved an accuracy of 94% for the customized dataset and 87% for the FDDB dataset by detecting the faces of persons accurately.

III. MATERIALS AND METHODS

In this section, we present the materials and methods in detail, such as the collection of data, data pre-processing, and the tank and flag detection algorithm.

A. Data Source and Data Description

The data has been collected by the Open AI Lab at Inje University as a pilot study for the detection of military tanks. We have used images of military tanks. The dataset consists of 922 images. The images contain two types of objects, i.e., military tanks and flags, as shown in Figure 1. These images are collected with a variety of scenes and with different image qualities.

Figure 1. Image samples of tanks and flags dataset

B. Data Pre-Processing

All the images are pre-processed so that they can be used for the model implementation. In YOLO, we have to annotate each object in the images, so all the images are annotated using a labeling tool. In our dataset, we annotate the objects, i.e., tanks and flags, in the images and make bounding boxes around the objects. As a result of the annotation, we get text files that correspond to each image. These text files contain information about the objects in the images. In other words, when an image is annotated, a corresponding text file is automatically generated in the same folder with the same name as the image, i.e., for "image_name.jpg" there will be a text file "image_name.txt". This text file holds the class ID of each object, with x and y giving the centre of the bounding box, and w and h representing the width and height of the bounding box. The dataset is divided into training and testing datasets. Figure 2 shows the labeling of images.
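As an illustration of the label-file layout described above, the sketch below parses one annotation line and recovers pixel coordinates. It assumes the usual YOLO convention of centre coordinates and sizes normalized to [0, 1]; the mapping of the classes tank and flag to IDs 0 and 1 is an assumption for illustration, not something fixed by this paper.

```python
def parse_yolo_label(line, img_w, img_h):
    """Parse one line of a YOLO-format label file.

    Each line holds: class_id x_center y_center width height,
    with coordinates normalized to [0, 1] (the YOLO convention).
    Returns the class ID and the box as pixel corner coordinates.
    """
    parts = line.split()
    cls = int(parts[0])
    x, y, w, h = (float(v) for v in parts[1:5])
    # Convert normalized centre/size to absolute pixel corners.
    x1 = (x - w / 2) * img_w
    y1 = (y - h / 2) * img_h
    x2 = (x + w / 2) * img_w
    y2 = (y + h / 2) * img_h
    return cls, (x1, y1, x2, y2)
```

For example, the line "0 0.5 0.5 0.25 0.5" on a 416×416 image (the input size used in this paper) describes a centred box a quarter of the image wide and half the image tall.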
Figure 2. Annotation of images using a labeling tool

C. YOLOv5 Network

YOLO (You Only Look Once) is one of the famous deep neural models for the detection of real-time objects [7]. It can detect objects in a real-time environment because of its robust performance. The first version of YOLO was proposed by Joseph Redmon and his research team in 2016. It is a single neural network architecture which can detect the bounding boxes of objects and predict the class probability of objects in an image, unlike previous object detection tools, i.e., CNN, RCNN, Faster-RCNN [8, 9], etc. Although Mask-RCNN also has a precise detection rate, it is mainly meant for image segmentation, and its processing time is longer compared to YOLO [10]. YOLO is more suitable in scenarios where quick and accurate detection of objects is required; in warfare, speed and accuracy are very important. It is a single network, robust and accurate. There are several versions of YOLO (YOLOv2, YOLOv3, YOLOv4, YOLOv5), and it has been upgraded by adding new features and functionalities [11, 12, 13, 14]. YOLOv5 is the latest, accurate, and fast single-stage object detector. It was first released in May 2020 and is implemented in PyTorch [15]. It is one of the transfer learning models, pre-trained on the large MS COCO object detection dataset [16]. It has four different network architectures: small, medium, large, and extra-large. To overcome overfitting, it uses data augmentation during training [17, 18]. It uses a new augmentation approach which combines four different training images and detects the objects in them, thus enhancing generalization. The CONV layers are used for the extraction of features from the images. The number of feature extraction modules and convolution kernels at any particular part of the network differs among the four types of YOLOv5. The bounding boxes are predicted from anchor boxes, and the object detection is carried out by regression. The input images are divided into S×S grids. Every grid cell takes part in the detection of bounding boxes, the detection of objects, and the confidence scores of each targeted object in the images. Each bounding box has a confidence score and x, y, w, and h prediction values. The coordinates of the box centre in a grid are denoted by x, y, while w, h give the width and height of the bounding box.

The YOLO network architecture has three major parts, i.e., backbone, neck, and head. The backbone consists of different modules, such as Focus; Convolution with batch normalization and LeakyReLU, denoted CBL; MixConv, an acronym for mixed convolution; the Cross Stage Partial network, represented as CSP; and Spatial Pyramid Pooling (SPP). It receives the input images at 416×416×3 resolution through the Focus structure. The images are reduced using the slicing operation, then kernels are applied as a convolutional operation.

The backbone outputs are used as the input for the next part of the YOLOv5 architecture, which is the neck network. The neck network comprises CBL, CSP, Concat, and up-sampling modules. It gives three outputs, and these are used as input to the head, which is also known as the detector. Predictions are made by this layer, showing the bounding box around the target objects, the classification of the objects, and the probability of the object belonging to a specific class.

D. Experimental Environment

All the experiments, data pre-processing, and analysis were performed using a 64-bit Windows operating system with an Intel(R) Core(TM) i7-7700 CPU @ 2.60 GHz (3.60 GHz) processor and 16 GB of installed RAM. Google Colab, which provides an NVIDIA Tesla P100 GPU, was used for the experiment and training of the model. PyTorch, OpenCV, the Python language (version 3.7), Keras (version 2.4.3), and Scikit-learn libraries were also used in this experiment.

E. Performance Measures

The model has been evaluated with different performance metrics, such as precision, recall, and mean average precision (mAP), because these metrics are very important for the performance evaluation of an object detector. The performance metrics are given by the following equations:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
mAP = (1/N) Σ_{i=1}^{N} AP_i

where TP represents true positives, i.e., a correct detection of a tank or flag that is actually present in the image. FP shows the incorrect detections made by the model, i.e., the model detects an object which does not really exist in the image; for example, the model may detect a flag as a tank or a tank as a flag. FN denotes the false negatives. Averaging the precision of all classes over an intersection over union (IoU) threshold gives the mean average precision (mAP). Precision calculates how accurate the predictions of the model are, while recall computes how well the model detects all the positives. The mean average precision (mAP) is one of the famous metrics used for the performance evaluation of object detectors. It computes the average value of precision for recall values over 0 to 1.
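As a minimal sketch of the metrics above (using the standard definitions, since the equation images are referenced in the text but not reproduced in this extraction), precision, recall, and mAP over per-class average precisions can be computed as:

```python
def precision(tp, fp):
    # Fraction of predicted boxes that are correct detections.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Fraction of ground-truth objects the model actually found.
    return tp / (tp + fn) if tp + fn else 0.0

def mean_average_precision(ap_per_class):
    # mAP is the mean of the average precision over all classes,
    # here computed at a single fixed IoU threshold (e.g. 0.5).
    return sum(ap_per_class) / len(ap_per_class)
```

With the two classes of this paper, tank and flag, `mean_average_precision` would simply average their two AP values; the per-class AP values themselves come from the precision-recall curve of the detector.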
IV. EXPERIMENTS
In this section, we present the experimental details of this research work. The developed model was able to detect the targeted objects, i.e., tanks and flags, in the images. The model classified the tanks and flags by making bounding boxes around them and labelled the objects correctly, showing the probability of detection as well.
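The experiments below also summarise classification quality with a confusion matrix of actual versus predicted classes (Figure 3). A minimal two-class tally can be sketched as follows; the function name and label strings are illustrative assumptions:

```python
def confusion_matrix(actual, predicted, labels=("tank", "flag")):
    """Tally actual vs. predicted class labels into a nested dict.

    matrix[a][p] counts how often an object of true class `a`
    was predicted as class `p`.
    """
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix
```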
A. Training YOLOv5
We trained and compared six YOLO versions; the YOLOv5xl model is the largest among the YOLOv5 versions. The neural networks were trained on Google Colaboratory. The dataset contained 922 images, and 80% of the data was used for training the model. We chose a batch size of 16, and the image size was 416×416. The learning rate was 0.01 with a learning rate decay of 0.999, and the SGD optimizer was used for the optimization of the network. The train.py script was run to train the model while specifying the epochs, batch size, weights of the particular model, etc. In order to train the model, certain folders need to be maintained. The dataset folder should be divided into two folders that contain images and labels. The images folder contains all the images, while the labels folder contains the respective labels of the images in text-file format, holding the information about the object classes and the coordinates, height, and width of the bounding boxes. Each text file contains the information of one object per line. The experimental parameters are listed below in Table 1.

Table 1. EXPERIMENTAL PARAMETERS

Parameters          Values
Batch size          16
Image size          416
Learning rate       0.01
Momentum            0.937
Weight_decay        0.0005
Warmup_epochs       3.0
Warmup_momentum     0.8
Warmup_bias_lr      0.1
Optimizer           SGD

B. Experiment Results and Discussion

We applied YOLOv3, YOLOv4, and all four versions of YOLOv5, i.e., YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5xl. The results were compared, and the experimental results revealed that the recognition of YOLOv5xl is higher than that of the rest of the models. The precision, recall, mAP_0.5, and mAP_0.5:0.95 were 0.99, 0.995, 0.995, and 0.892, respectively.

Figure 3. Confusion matrix of the YOLOv5 model on our test dataset.

Figure 3 shows the confusion matrix of YOLOv5 on our test dataset. The confusion matrix gives a broad picture of the YOLOv5 model with actual classes versus predicted classes.

Figure 4. Box loss and object loss for training and validation, respectively.

In Figure 4, the box loss and object loss are shown for training as well as for validation. During training, the box loss decreases continuously as the epochs increase. At epoch 0 the box loss was 0.10, and it gradually decreased; when the model reached epoch 100, the loss had reduced to 0.0092. Likewise, the object loss decreased from 0.0040 to 0.0032 as the epochs reached 100, with no further improvement. During validation, the box loss decreased from 0.0135 to 0.0130 as the model reached 100 epochs, with no further improvement. Furthermore, the object loss was 0.0028 at the start of validation, but it decreased gradually as the epochs increased, reducing to 0.00270 when the epochs reached 100.
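The 80% / 10% / 10% division of the 922 images used in this work can be sketched as below. The function name, the fixed seed, and the use of file-name stems (shared between the images and labels folders) are illustrative assumptions:

```python
import random

def split_dataset(image_names, train=0.8, val=0.1, seed=42):
    """Shuffle image names and split them into train/val/test lists.

    The remaining fraction (here 10%) becomes the test split,
    matching the 80%/10%/10% division used for the tank dataset.
    """
    names = list(image_names)
    random.Random(seed).shuffle(names)  # reproducible shuffle
    n_train = int(len(names) * train)
    n_val = int(len(names) * val)
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])
```

Because the labels folder mirrors the images folder by file stem, splitting the image names is enough to split the annotations consistently as well.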
The performance measures, i.e., mAP, precision, and recall, have also been calculated for all six architectures of YOLO. The experimental results showed that YOLOv5xl outperformed the other versions of YOLO. It showed an [email protected] of 99.5% and an [email protected]:0.95 of 89.2%. It has been observed that as the number of layers, parameters, and GFLOPS increased in the successive advanced versions of YOLO, the performance results also improved accordingly.

Figure 5. Precision, recall, mAP_0.5, and mAP_0.5:0.95 of YOLOv5xl

In Figure 5, the precision value started from 0.0124, gradually increased, and reached 0.990. Likewise, recall started from 0.0648 with an abrupt increase in the graph; during the first 20 epochs there was some fluctuation, however, as the epochs increased the curve steadied and eventually reached 0.995. Furthermore, [email protected] and [email protected]:0.95 are 0.995 and 0.892, as also shown in Figure 5.

We applied the YOLOv3, YOLOv4, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5xl models and trained them on the tank dataset. The models were evaluated on different performance measures. The results are shown in Table 2. Table 2 (A) and (B) show the comparison between the different YOLO architectures; the results are reported for different parameters, such as precision, recall, execution time, number of layers, number of parameters, and GFLOPS (giga floating-point operations per second) for each architecture.

Table 2 (A). COMPARISON OF YOLO MODELS

Models     Precision  Recall  [email protected]  [email protected]:0.95
YOLOv3     0.86       0.78    0.84     -
YOLOv4     0.92       0.90    0.89     -
YOLOv5s    0.989      0.985   0.985    0.873

C. Evaluating the YOLOv5 Detector on Test Images

Once the training was completed successfully, the model was tested and evaluated on new, unseen test images. For this purpose, the trained weights generated during the training of the model were used, and we chose the best weights for testing. The testing was conducted by specifying the path of the weights file and the testing dataset and executing the detector script. The model detected the tanks and flags with high accuracy and performance. Since YOLOv5 uses PyTorch, its execution time was also very fast, and at the same time the model predicted the objects, i.e., tanks and flags, accurately.

Figure 6. Detection of tanks and flags by the model

Figure 6 shows that the YOLOv5xl model has clearly detected the tanks and flags with high confidence and precision. In most cases the model achieved more than a 90% confidence score, and in some cases it gained a 100% confidence score. The results depict the good performance of the model and demonstrate how accurately the objects in the images have been detected, with precise bounding boxes and confidence scores for each prediction.
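Table 2's [email protected]:0.95 column averages AP over IoU thresholds from 0.5 to 0.95. The intersection-over-union score underlying those thresholds can be sketched as follows (corner-format boxes are assumed, as produced by the label-conversion convention used earlier):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A predicted box counts as a true positive at a given threshold when its IoU with a ground-truth box of the same class meets that threshold; [email protected]:0.95 repeats this check at thresholds 0.5, 0.55, ..., 0.95 and averages the resulting APs.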
will enhance military capabilities. Furthermore, it helps deter potential threats and enables proactive steps to beef up security and defence.

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW, supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation) in 2022, and by the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1I1A1A01050306).

REFERENCES

[1] Forsyth, D. and J. Ponce, Computer Vision: A Modern Approach. Prentice Hall, 2011.
[2] Calderón, M., W. G. Aguilar, and D. Merizalde, "Visual-based real-time detection using neural networks and micro-UAVs for military operations," Developments and Advances in Defense and Security, 2020, p. 55.
[3] Zhou, F., H. Zhao, and Z. Nie, "Safety helmet detection based on YOLOv5," in 2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), IEEE, 2021.
[4] Kasper-Eulaers, M., et al., "Detecting heavy goods vehicles in rest areas in winter conditions using YOLOv5," Algorithms, 2021, 14(4): p. 114.
[5] Amudhan, A., et al., "RFSOD: a lightweight single-stage detector for real-time embedded applications to detect small-size objects," Journal of Real-Time Image Processing, 2021: p. 1-14.
[6] Kashiyama, T., H. Sobue, and Y. Sekimoto, "Sky monitoring system for flying object detection using 4K resolution camera," Sensors, 2020, 20(24): p. 7071.
[7] Wang, X., A. Shrivastava, and A. Gupta, "A-Fast-RCNN: hard positive generation via adversary for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, p. 2606-2615.
[8] Ren, Y., C. Zhu, and S. Xiao, "Object detection based on fast/faster RCNN employing fully convolutional architectures," Mathematical Problems in Engineering, 2018.
[9] Peng, H. and S. Chen, "BDNN: binary convolution neural networks for fast object detection," Pattern Recognition Letters, 2019, 125: p. 91-97.
[10] Haralick, R. M. and L. G. Shapiro, "Image segmentation techniques," Computer Vision, Graphics, and Image Processing, 1985, 29(1): p. 100-132.
[11] Redmon, J., S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[12] Redmon, J. and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[13] Redmon, J. and A. Farhadi, "YOLOv3: an incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[14] Thuan, D., "Evolution of YOLO algorithm and YOLOv5: the state-of-the-art object detection algorithm," 2021.
[15] Ketkar, N., "Introduction to PyTorch," in Deep Learning with Python, Apress, Berkeley, CA, 2017, p. 195-208.
[16] Lin, T.-Y., et al., "Microsoft COCO: common objects in context," in European Conference on Computer Vision, Springer, Cham, 2014, p. 740-755.
[17] Zoph, B., et al., "Learning data augmentation strategies for object detection," in European Conference on Computer Vision, Springer, Cham, 2020, p. 566-583.
[18] Jiang, W., et al., "MeshCut data augmentation for deep learning in computer vision," PLoS One, 2020, 15(12): e0243613.

Sikandar Ali received his B.E. degree in Computer Engineering from Mehran University of Engineering & Technology, Pakistan. He received his M.S. from the Department of Computer Science at Chungbuk National University, Republic of Korea. He is now a Ph.D. candidate in the Department of Digital Anti-Aging Healthcare at Inje University, South Korea. His research interests include artificial intelligence, data science, big data, machine learning, deep learning, reinforcement learning, computer vision, and medical imaging.

Abdullah received a B.E. degree in Electrical and Computer Engineering from COMSATS University Islamabad (CUI), Abbottabad Campus, Pakistan, in 2018. He is currently pursuing an M.Sc. degree in the Department of Digital Anti-Aging Healthcare at Inje University, South Korea. His research interests are machine learning, deep learning, computer vision, medical imaging, and multimedia processing.

Ali Athar received his BSSE degree in Software Engineering from Government College University Faisalabad (GCUF), Pakistan. He received his M.S. degree from NUST, Pakistan. He is pursuing a Ph.D. degree at the Institute of Digital Anti-Aging Healthcare at Inje University. His research areas include text mining, machine learning, and deep learning.

Maisam Ali received his B.E. degree in Electrical and Communication Engineering from Hamdard University, Pakistan. He is currently pursuing his master's degree at Inje University. His research interests are artificial intelligence, machine learning, deep learning, and computer vision.

Ali Hussain received his BSCS degree in Computer Science from Government College University Faisalabad (GCUF), Pakistan, in 2019. He received his master's degree from the Department of Digital Anti-Aging Healthcare at Inje University, South Korea, and is currently pursuing a Ph.D. in the same department. His research interests include artificial intelligence, data science, big data, machine learning, deep learning, computer vision, reinforcement learning, and medical imaging.

Hee-Cheol Kim received his B.Sc. from the Department of Mathematics and M.Sc. from the Department of Computer Science at Sogang University in Korea, and his Ph.D. in Numerical Analysis and Computing Science from Stockholm University in Sweden in 2001. He is a Professor in the Department of Computer Engineering and Head of the Institute of Digital Anti-Aging Healthcare, Inje University, Korea. His research interests include machine learning, deep learning, computer vision, and medical informatics.