Automated PPE Detection System
Abstract—Construction has the most fatal occupational injuries of all industries due to the high number of annual accidents. There are many solutions to ensure workers' safety and limit these accidents, one of which is to ensure the appropriate use of the personal protective equipment (PPE) specified in safety regulations. However, the monitoring of PPE use is mainly based on manual inspection, which is time-consuming and ineffective. This paper proposes a fully automated vision-based system for real-time personal protective equipment detection and monitoring. The proposed system consists of two main components: PPE detection, and face detection and recognition. Several experiments have been conducted. The obtained detection accuracy for the 6 main PPE items is up to 98%, while that for face detection and recognition is 96%. The results demonstrate the ability to detect PPE and recognize faces with high precision and recall in real time.

Index Terms—PPE detection, deep learning, object detection, automatic monitoring.

I. INTRODUCTION

Construction has been identified as one of the most dangerous job sectors. In Vietnam, according to the 2017 report of the Ministry of Labour, Invalids and Social Affairs, there were 8,956 occupational accidents nationwide, causing 9,173 victims, of which 5.4% of cases involved not wearing personal protection equipment [1].

Several onsite safety regulations have been established to ensure construction workers' safety. These regulations specify the appropriate use of personal protective equipment (PPE), and contractors must ensure that the regulations are enforced through a monitoring process. Monitoring of PPE use is normally conducted in two areas: at site entry and on the onsite construction field.

Nowadays, most construction sites monitor PPE use manually, through inspectors. This work is tedious, time-consuming, and ineffective due to the high number of workers to monitor in the field.

Recently, several technologies have been proposed to enhance construction safety. Among the proposed solutions, computer vision has been widely used [2], [3], [4]. However, most recent works focus on detecting the use of hardhats on the onsite construction field. Besides the hardhat, other equipment such as gloves and shoes needs to be detected in order to ensure worker safety. Moreover, the monitoring of PPE use has to be conducted not only on the construction field but also at site entry.

Besides PPE detection, the face and identity of the workers have to be determined. However, this is usually performed manually or by other technologies that require an extra hardware platform, such as RFID (Radio Frequency Identification). In this paper, for the first time, we propose a fully automated vision-based PPE detection and monitoring system. The proposed system consists of two main components: PPE detection, and face detection and recognition. The main aim of PPE detection is to determine the presence of required PPE, while face detection and recognition aims to determine the identity of the workers.

II. RELATED WORKS

The enhancement of onsite construction safety has increasingly received the attention of researchers and industrial practitioners. The proposed solutions can be divided into two groups: non-computer-vision-based and computer-vision-based techniques.

The work of Kelm et al. [5] falls into the first group. The authors designed an RFID-based portal to check whether the workers' personal protective equipment (PPE) complied with the corresponding specifications. Dong et al. [6] developed a real-time location system (RTLS) combined with virtual construction for tracking a worker's location, in order to decide whether the worker should wear a helmet and to give a warning; a silicone single-point sensor is designed to show whether the PPE is worn properly for further behavior assessment. However, these methods are limited in their respective ways. For example, the worker's identification card only indicates that the distance between the worker and the PPE is small, and the loss of sensors may be a concern in deployment.

Concerning computer-vision-based techniques, and given the important role of the hardhat, several works address hardhat detection. Rubaiyat et al. [7] combine a Histogram of Oriented Gradients (HOG) with the Circle Hough Transform (CHT) to obtain features of workers and hardhats. Du et al. [8] combine face detection and hardhat detection based on color information.

In recent years, deep learning has developed extremely fast, driven by huge amounts of training data and the improved computing capabilities of computers. Results for object classification and detection problems have steadily improved. Object detection methods can be divided into two main approaches: two-stage detection (e.g., Faster Region-based Convolutional Neural Networks (Faster R-CNN) [9]) and one-stage detection (e.g., You Only Look Once (YOLO) [10], Single Shot Multibox Detector (SSD) [11]). In general, one-stage detectors are faster and simpler. Inspired by the performance of deep learning-based object detection methods, Fang et al. [12], [13] applied the Faster R-CNN algorithm to detect the absence of hardhats and to discover non-certified work. The proposed method has been evaluated in different situations and has shown high performance.

However, most current works focus on detecting the use of hardhats on the construction site. Besides the hardhat, other equipment has to be detected. Moreover, to make sure that workers use proper PPE, the presence of PPE should be checked at the entry point of the construction field.

III. PROPOSED FRAMEWORK

A. Overview

Currently, monitoring the use of personal protective equipment is still done manually. It is costly and inaccurate because a large number of workers need to be checked over a period of time. In response to these restrictions, the overall objective of this paper is to propose a novel solution to the unresolved problem of reliably identifying workers who comply with safety regulations at the entry point of the construction field. The proposed system, illustrated in Fig. 1, consists of two main steps: PPE detection, and face detection and recognition. The main aim of PPE detection is to determine the presence of the required PPEs, while face detection and recognition aims to determine the identity of the workers. Inspired by the impressive results of deep convolutional neural networks on different computer vision tasks, in this paper PPEs and faces are detected with the YOLO (You Only Look Once) network, while FaceNet is fine-tuned for face recognition.

B. PPE and face detection

In our work, we employ the YOLO network for PPE and face detection. The YOLO network was introduced by Joseph Redmon's team [10] for object detection.

The first version of YOLO is named YOLO v1. Different from two-stage methods, the core idea behind this fast detector is a single convolutional network consisting of convolutional layers followed by 2 fully connected layers. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. For this, the YOLO network divides the image into an S × S grid. For each grid cell, it predicts B bounding boxes, their confidence scores, and C class probabilities, as illustrated in Fig. 2. The design of YOLO enables end-to-end training and real-time speeds while maintaining high average precision, but it still has limitations when detecting small objects that appear in groups. Therefore, YOLO v2 was introduced [14], followed by YOLO v3 [15]. YOLO v3 brings significant improvements in both accuracy and speed, including a better ability to detect small objects.

Fig. 2: Illustration of PPE or face detection based on the YOLO network.
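As a back-of-the-envelope check of this design (our own sketch, not the authors' code), the size of a YOLO v1-style output tensor follows directly from S, B, and C: each of the S × S cells predicts B boxes of 5 values (x, y, w, h, confidence) plus C class probabilities.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Output tensor shape of a YOLO v1-style detector.

    Each of the S*S grid cells predicts B boxes (x, y, w, h, confidence)
    plus one set of C class probabilities, i.e. S x S x (B*5 + C) values.
    Defaults S=7, B=2, C=20 are the YOLO v1 paper's Pascal VOC settings.
    """
    return (S, S, B * 5 + C)

# With the defaults, the detector emits a 7 x 7 x 30 tensor.
```

If the six PPE items and the face were trained as C = 7 classes of a single detector (an assumption; the paper does not state its class layout), the same formula would give an S × S × 17 output.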
C. Face recognition

FaceNet, introduced in 2015 by Google researchers [17], is a state-of-the-art face recognition, verification, and clustering neural network. It is a 22-layer deep neural network that directly trains its output to be a 128-dimensional embedding. Once the FaceNet model has been trained with the triplet loss for different classes of faces, capturing the similarities and differences between them, the 128-dimensional embedding returned by the model can be used to cluster faces effectively. Once such a vector space (embedding) is created, tasks such as face recognition, verification, and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors. In this space, distances are small between similar faces and large between dissimilar faces.

The embedding is represented by f(x) ∈ R^d. It embeds an image x into a d-dimensional Euclidean space. The triplet loss learns to ensure that an image x_i^a (anchor) of a specific person is closer to all other images x_i^p (positive) of the same person than it is to any image x_i^n (negative) of any other person. This is visualized in Fig. 7. The loss function L that is being minimized is as follows:

L = Σ_{i=1}^{n} [ ‖f(x_i^a) − f(x_i^p)‖_2^2 − ‖f(x_i^a) − f(x_i^n)‖_2^2 + α ]_+    (2)

Fig. 7: The triplet loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.
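The triplet loss of Eq. (2) can be sketched numerically. The sketch below is a NumPy illustration over a batch of embeddings, not the authors' training code; the [·]_+ hinge follows the FaceNet formulation, and the margin value used in the comment is illustrative.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss over a batch of embeddings, as in Eq. (2).

    anchor, positive, negative: arrays of shape (n, d), one triplet per row.
    alpha: the margin enforced between positive and negative distances.
    """
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # ||f(x^a) - f(x^p)||^2
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # ||f(x^a) - f(x^n)||^2
    # Hinge: triplets that already satisfy the margin contribute zero loss.
    return np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))
```

For a triplet whose negative is farther than the positive by more than α, the loss is zero; otherwise the violation (plus the margin) is accumulated.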
B. Evaluation measures

The metrics we use to evaluate the model's accuracy are Precision, Recall, and F1-score for object and face detection, and the accuracy metric for face classification. Firstly, we define the meaning of TP (true positive), FP (false positive), and FN (false negative). A True Positive results when an object is correctly identified with an IOU (Intersection over Union) between the ground truth and the predicted bounding box greater than a threshold (in our case, this threshold is set to 0.5). A False Positive is the result of a wrong identification, meaning that the wrong class is identified or IOU < 0.5. A False Negative is the result of a missed identification, meaning that the object appears but is not recognized.

Precision, Recall, and F1-score are defined as follows:

F1 score = 2 × Precision × Recall / (Precision + Recall)    (3)

where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).

C. Experimental results

1) Evaluation of PPE detection and face detection and recognition at image level: To determine the number of iterations for the network, we evaluate the network with different numbers of iterations. The obtained results are shown in Table IV, with the network resolution being 416 × 416 and the threshold equal to 0.25. As shown, training with a large number of iterations gives better results and allows us to avoid over-fitting. Based on these results, we use the weights obtained after 4,000 iterations.

Fig. 8: Face is not correctly detected due to its small size.

Fig. 9: Face is detected but incorrectly recognized due to the presence of emotions.

2) Evaluation of the whole system: To evaluate the whole system, we prepare 4 scenarios. In all scenarios, the subjects do not wear all required PPE. In this evaluation, the chosen network resolution is 416 × 416. For each scenario, we perform 5 tests and evaluate the accuracy, i.e., the ratio between the number of correctly recognized tests and the total number of tests. A test is considered correctly recognized if all PPE items of interest are correctly detected and the identity of the person is correctly classified. As shown in Tab. VI, thanks to the voting technique and the robustness of the PPE detection and face recognition, the system produced correct results even when some PPEs are omitted or incorrectly detected and the identity of the person is wrongly classified in some frames.

The performance of the method on the testing dataset with three network resolutions is shown in Tab. V.
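The definitions above can be reproduced with a short sketch. The helper names are ours, and `iou` assumes axis-aligned boxes given as (x1, y1, x2, y2); the counts in the usage note come from Tab. V.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def detection_metrics(tp, fp, fn):
    """Precision, Recall and F1-score from TP/FP/FN counts, per Eq. (3)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, the hardhat column at resolution 416 × 416 in Tab. V (TP = 334, FP = 5, FN = 6) gives precision ≈ 0.99 and recall ≈ 0.98, matching the reported values.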
TABLE V: Obtained results for PPE and face detection with different network resolutions.
Network resolution Frame rate Evaluation metric Hardhat Shirt Glove Belt Pant Shoes Face
320 × 320 10Hz FN 8 5 0 5 5 0 0
FP 10 10 14 12 13 15 0
TP 332 245 210 155 105 150 500
Precision 0.97 0.96 0.94 0.93 0.89 0.91 1
Recall 0.98 0.98 1 0.97 0.95 1 1
F1 Score 0.97 0.97 0.97 0.95 0.92 0.95 1
416 × 416 8Hz FN 6 5 2 6 6 0 0
FP 5 7 14 12 8 13 0
TP 334 245 208 154 104 150 500
Precision 0.99 0.97 0.94 0.93 0.93 0.92 1
Recall 0.98 0.98 0.99 0.96 0.94 1 1
F1 Score 0.98 0.97 0.96 0.94 0.93 0.96 1
608 × 608 6Hz FN 10 5 2 6 6 0 0
FP 6 7 15 11 14 15 0
TP 330 245 208 154 104 150 500
Precision 0.98 0.97 0.93 0.93 0.88 0.91 1
Recall 0.97 0.98 0.99 0.96 0.94 0.99 1
F1 Score 0.97 0.97 0.96 0.94 0.91 0.95 1
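The frame-level voting used by the whole-system evaluation can be sketched as a simple majority vote over per-frame predictions. The function names, the 0.5 ratio, and the frame record layout below are our own illustration; the paper does not detail its implementation.

```python
from collections import Counter

def majority_vote(per_frame_labels, min_ratio=0.5):
    """Return the label seen in more than `min_ratio` of the frames, else None.

    per_frame_labels: one predicted label per processed frame, e.g. the
    recognized identity, or True/False for the presence of one PPE item.
    """
    if not per_frame_labels:
        return None
    label, count = Counter(per_frame_labels).most_common(1)[0]
    return label if count / len(per_frame_labels) > min_ratio else None

def check_worker(frames, required_ppe):
    """Accept a worker only if every required PPE item wins its per-frame
    vote and the identity vote is stable across frames."""
    identity = majority_vote([f["identity"] for f in frames])
    ppe_ok = all(
        majority_vote([item in f["ppe"] for f in frames]) is True
        for item in required_ppe
    )
    return identity, ppe_ok
```

This is why a test can still be counted as correct even when a PPE item is missed, or the identity is misclassified, in a few individual frames.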
TABLE VI: Detection and recognition results of the proposed system with different scenarios.

Types of equipment                  #tests  #false alarms  Accuracy (%)
Hardhat, shirt, belt, pant, shoe    5       0              100
Hardhat, shirt, pant, shoe          5       0              100
Shirt, glove, belt, pant, shoe      5       0              100
Hardhat, pant, shoe                 5       0              100

V. CONCLUSIONS AND FUTURE WORKS

Construction activities are dangerous and risky. Despite considerable attention to work safety management in construction, many occupational accidents still occur. Motivated by the practical needs and urgent requirements at construction sites, we have developed a PPE detection and identification system to replace manual inspection and monitoring, and to allow management and retrieval of data. In addition, we have built a database of images of 6 types of common protective equipment at construction sites; this database, along with its labeling information, can be used by the research community. The results of detection and identification on the test dataset show that the model performs well in terms of both accuracy and miss rate. In the future, we will expand the dataset with other equipment under different conditions. Furthermore, the code will be optimized to increase the speed of the system.

REFERENCES

[1] Ministry of Labour, Invalids and Social Affairs. (2018) Notification of the situation of labor accidents in 2017. [Online]. Available: http://vnniosh.vn/chitietNCKH/id/7559/Thong-bao-tinh-hinh-tai-nan-lao-dong-nam-2017
[2] B. E. Mneymneh, M. Abbas, and H. Khoury, "Automated hardhat detection for construction safety applications," Procedia Engineering, vol. 196, pp. 895–902, 2017, Creative Construction Conference 2017, CCC 2017, 19–22 June 2017, Primosten, Croatia. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877705817331430
[3] ——, "Vision-based framework for intelligent monitoring of hardhat wearing on construction sites," Journal of Computing in Civil Engineering, vol. 32, 2019.
[4] Z. Zhenhua, P. Man-Woo, and E. Nehad, "Automated monitoring of hardhats wearing for onsite safety enhancement," in International Construction Specialty Conference of the Canadian Society for Civil Engineering (ICSC), 2015.
[5] A. Kelm, L. Laußat, A. Meins-Becker, D. Platz, M. J. Khazaee, A. M. Costin, M. Helmus, and J. Teizer, "Mobile passive radio frequency identification (RFID) portal for automated and rapid control of personal protective equipment (PPE) on construction sites," Automation in Construction, vol. 36, pp. 38–52, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0926580513001234
[6] S. Dong, Q. He, H. Li, and Q. Yin, Automated PPE Misuse Identification and Assessment for Safety Performance Enhancement. [Online]. Available: https://ascelibrary.org/doi/abs/10.1061/9780784479377.024
[7] A. H. M. Rubaiyat, T. T. Toma, M. Kalantari-Khandani, S. A. Rahman, L. Chen, Y. Ye, and C. S. Pan, "Automatic detection of helmet uses for construction safety," in 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW), Oct 2016, pp. 135–142.
[8] S. Du, M. Shehata, and W. Badawy, "Hard hat detection in video sequences based on face features, motion and color information," in 2011 3rd International Conference on Computer Research and Development, vol. 4, March 2011, pp. 25–29.
[9] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," CoRR, vol. abs/1506.01497, 2015. [Online]. Available: http://arxiv.org/abs/1506.01497
[10] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," CoRR, vol. abs/1506.02640, 2015. [Online]. Available: http://arxiv.org/abs/1506.02640
[11] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg, "SSD: single shot multibox detector," CoRR, vol. abs/1512.02325, 2015. [Online]. Available: http://arxiv.org/abs/1512.02325
[12] Q. Fang, H. Li, X. Luo, L. Ding, H. Luo, T. Rose, and W. An, "Detecting non-hardhat-use by a deep learning method from far-field surveillance videos," Automation in Construction, vol. 85, pp. 1–9, Jan. 2018.
[13] Q. Fang, H. Li, X. Luo, L. Ding, T. M. Rose, W. An, and Y. Yu, "A deep learning-based method for detecting non-certified work on construction sites," Advanced Engineering Informatics, vol. 35, pp. 56–68, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1474034617303166
[14] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," CoRR, vol. abs/1612.08242, 2016. [Online]. Available: http://arxiv.org/abs/1612.08242
[15] ——, "YOLOv3: An incremental improvement," CoRR, vol. abs/1804.02767, 2018. [Online]. Available: http://arxiv.org/abs/1804.02767
[16] R. B. Girshick, "Fast R-CNN," CoRR, vol. abs/1504.08083, 2015. [Online]. Available: http://arxiv.org/abs/1504.08083
[17] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," CoRR, vol. abs/1503.03832, 2015. [Online]. Available: http://arxiv.org/abs/1503.03832
[18] AlexeyAB. (2016) Yolo_mark: GUI for marking bounded boxes of objects in images for training YOLO. [Online]. Available: https://github.com/AlexeyAB/Yolo_mark