
21BCE9535_AP2024254001483_DA01

This document presents a novel two-stage framework utilizing convolutional neural networks (CNNs) for pixel-level crack detection on highway roads, addressing the limitations of traditional image processing methods. The framework integrates a classification network for crack identification and a transformer-based network for precise segmentation, aiming to enhance detection efficiency and accuracy. The proposed approach demonstrates improved performance over existing methods by effectively handling low-contrast images and capturing fine details of pavement cracks.

Uploaded by

achyuthkumar721
Copyright
© © All Rights Reserved

PIXEL-LEVEL CRACK DETECTION ON HIGHWAY ROADS USING CNN

MAHIMALIURU ACHYUTH
21BCE9535 / SCOPE / VIT-AP University
[email protected]

Prof. SAROJ KUMAR PANIGRAHY
SCOPE / VIT-AP University
[email protected]

School of Computer Science and Engineering,
VIT-AP University, Amaravati, Andhra Pradesh, 522241, India

Abstract:

One of the most common structural issues in pavement is surface cracking, which deteriorates the pavement's sustainability and usability. The construction of computer vision-based models (image processing or deep learning-based) for the automatic detection of pavement surface cracks has received a lot of attention in the last ten years. Nonetheless, real-world image data obtained by a linear array charge-coupled device (CCD) camera, which is characterized by high resolution and few crack pixels per image, differs significantly from public image data captured by phones or other portable devices. We provide a novel two-stage framework for automatic pavement surface crack detection at the pixel level, which aims to increase the efficiency and accuracy of pavement surface crack detection in engineering practice. During stage I, a classification network based on convolutional neural networks (CNNs) is used to identify the photos that contain cracks.

1. Introduction:

Surface cracking is perhaps the most common type of pavement distress, negatively affecting pavement structural strength and introducing potential threats to members of the public. Periodic surveys are mandated by transportation management agencies in the maintenance plan. At present, pavement surveys still rely heavily on experienced crews, which makes pavement surface crack detection tedious and expensive. To detect pavement surface cracks efficiently and accurately without subjective intervention, and to reduce the high cost in both labor and time, computer vision-based systems integrating image processing or deep learning-based models have been developed and adopted in practice.

Reviewing the development of crack detection techniques, image processing attracted attention in the early stage since such methods are computationally efficient and economical. Generally, it includes two categories: threshold-based and edge-based methods. Threshold-based methods (e.g., global or local thresholding) separate individual pavement surface image pixels into background and cracks by establishing adequate threshold values. For example, global thresholding techniques use a single value to distinguish the intensity distributions of the foreground and background, which are expected to show salient differences. In most cases, the acquired image lacks overt contrast; as a result, it becomes infeasible to determine an optimal threshold to perform the pixel separation. Alternatively, local thresholding approaches, which can adapt to the changing road environment, have been established. However, high noise sensitivity and numerous false detections limit their broad application.

Edge detection models, such as the Sobel, Canny, and Roberts edge detectors, are frequently used to identify pavement surface cracks. Because of sharp changes in color, texture, lighting, etc., edge detection models emphasize the linear characteristics corresponding to crack boundaries, so that the detection results present the discontinuities between two areas. In practice, edge detectors cannot detect thin and smooth cracks due to inapparent feature transitions between two regions. In addition, when there are noisy images, complex parameter configuration operations are needed for each image, limiting the engineering application of these methods to massive image data.

Figure 1. The workflow of this study (F. Guo et al., Engineering Applications of Artificial Intelligence 133 (2024) 108312).

Inspired by the application of deep learning models to tasks such as object detection and semantic segmentation, many studies have contributed to detecting pavement surface cracks in an end-to-end manner. Unlike image processing-based approaches, deep learning models automatically learn multilevel features and are not sensitive to the low-contrast images that easily cause crack detection failure. With object detection models (e.g., YOLO, Faster R-CNN), the crack is classified and its location is provided. However, an object detection-based model only presents the pavement crack in a bounding box and cannot depict the exact contour of the crack, restricting the calculation of crack indexes (e.g., width and length). Semantic segmentation models, such as the Fully Convolutional Network (FCN) (Long et al., 2015) and UNet (Ronneberger et al., 2015), are therefore explored to present the pavement crack at the pixel level. Based on the encoder-decoder architecture, multi-scale feature maps are employed; meanwhile, superior recall and precision are achieved.
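To make the global thresholding idea from the Introduction concrete, the following is a minimal sketch in plain Python. It assumes Otsu's criterion (maximizing between-class variance) as the way to pick the single threshold, and the 4 × 4 toy patch and the function names `otsu_threshold` and `segment` are illustrative only; they are not taken from the paper.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # intensity sum of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def segment(image, t):
    """Mark pixels with intensity <= t as crack (1), others as background (0)."""
    return [[1 if p <= t else 0 for p in row] for row in image]


# Toy patch (assumed data): a dark vertical crack on a bright background.
image = [
    [200, 40, 205, 210],
    [198, 35, 202, 207],
    [201, 38, 199, 204],
    [203, 42, 206, 208],
]
t = otsu_threshold([p for row in image for p in row])
mask = segment(image, t)
```

On this high-contrast toy patch the single threshold separates the crack column cleanly. On a low-contrast survey image the two intensity distributions overlap, which is the failure mode the Introduction describes.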
2. Literature survey:

In this section, we briefly review the evolution of pavement surface crack detection technologies, covering image processing-based methods and machine learning-based methods.

Image processing-based pavement surface crack detection methods can be mainly classified into two types: intensity thresholding and edge detection (Zakeri et al., 2017; Kheradmandi and Mehranfar, 2022; Hsieh and Tsai, 2020). Intensity thresholding classifies the pixels of a pavement surface image into crack and background groups. In order to make prediction smooth and reliable, global thresholding and local thresholding techniques were created. Li et al. (2011) presented the neighboring difference histogram method (NDHM) with an optimized global threshold. Akagic et al. (2018) proposed a method to calculate the histogram and Otsu's threshold of every sub-image for better performance on crack segmentation. Quan et al. … computation and the identification of image blocks of crack pixels.

Taking advantage of the huge success of deep learning techniques, object detection or semantic segmentation-based models were used to detect the pavement crack at the block or pixel level. Problems such as low contrast and small intensity differences, from which image processing suffers, can be addressed inside the "black box" of deep learning models, significantly enhancing detection accuracy. YOLO family models were popular for detecting pavement surface cracks at the block level due to their balanced performance regarding accuracy and inference speed, and prediction results using the YOLO network and its variants have been reported.

Even though these models can offer precise location and category information for various cracks, crack indexes like length and width were hard to obtain. Inspired by the successful segmentation of biomedical images with U-Net, CrackU-Net was introduced to find the pavement surface crack at the pixel level. According to the evaluation metrics, it demonstrated better performance than traditional image processing methods and FCN. ShuffleNet was introduced to offer strong semantic segmentation results on pavement surface crack prediction through shortcut connections between consecutive encoding-decoding rounds. Based on CrackNet, CrackNet-V was designed with invariant spatial sizes for better prediction results at the pixel level. A multi-scale attention module was placed in the decoder of DeepLabv3+, which can assign reasonable weights to different feature maps, improving the crack detection accuracy. Combining infrared thermography (IRT) and the CNN, five classical CNN segmentation models and two UNet-based models were trained and evaluated on RGB images, infrared images, and fused images.

Motivated by the tremendous success of transformers in overall computer vision tasks, our earlier model, CT, is able to detect pavement surface cracks accurately by integrating the Swin Transformer as the encoder and the Segformer as the decoder. In CrackFormer, the hybrid-window attentive vision transformer was introduced, using a hybrid-window-based self-attention scheme and a weighted multi-head self-attention philosophy. LETNet was developed by creating a convolutional stem and a local enhancement module to enhance the local crack feature perception ability in the transformer architecture. SwinCrack was introduced to detect pavement surface cracks by substituting linear components of the Swin Transformer and integrating convolutional attention-gated skip connections. Although transformer-based approaches achieved high …

3. Methodology

In this section, we introduce the detailed design of the proposed two-stage framework for pavement surface crack detection. In the first stage, survey data are input into a CNN that separates crack images from crack-free images. After an image is identified as a crack image, it is input into the transformer-based network for pavement surface crack detection at the pixel level. The structure of the CNN for classification and the transformer-based semantic segmentation network are introduced in turn.

ResNet, developed by He et al. (2016), is one of the most successful classification networks or backbones used in multiple vision tasks. By using residual learning with the shortcut connection design, the problem of performance degradation as the model gets deeper is addressed. In order to obtain robust classification performance, ResNet-34 with 34 layers is utilized in stage I to perform the crack image classification. The reason we use ResNet-34 instead of ResNet-18 or ResNet-50 is that it possesses better classification accuracy; the training performance is presented in the section of Experiments and Discussion.

The resulting image has a resolution of 2000 (H) × 2048 (W) and is cropped to 224 × 224 for ease of training. ResNet-34 consists of five stages of convolution. The first-stage convolutional layer uses a filter of size 7 × 7 with 64 channels. After the first stage, the output size is 112 × 112. From the second stage to the fifth stage, the residual blocks are stacked 3, 4, 6, and 3 times, respectively. Each convolutional layer uses a filter of size 3 × 3. The channel numbers are 64, 128, 256, and 512, and the output sizes are 56 × 56, 28 × 28, 14 × 14, and 7 × 7, respectively. Figure 3 shows the architecture of ResNet-34.

Compared with VGGNet (Simonyan and Zisserman, 2014), GoogLeNet (Szegedy et al., 2015), etc., ResNet learns the residue instead of the original feature. The motivation behind residue learning is that it may be simpler to optimize the residue than the original mapping. Meanwhile, as the identity shortcut connections merely perform an addition operation, no redundant computational complexity arises. Equations (1) and (2) represent direct mapping and identity mapping, respectively. Fig. 2 shows the building block for the ResNet.

$y = F(x, \{W_i\})$    (1)

$y = F(x, \{W_i\}) + x$    (2)

where $x$ and $y$ are the layer input and output vectors, respectively, and $F(x, \{W_i\})$ is the residue learning operation.
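The stage-wise output sizes quoted for ResNet-34 in the Methodology (112, 56, 28, 14, 7 for a 224 × 224 crop) can be checked with the usual convolution output-size formula. The sketch below is only a sanity check: the strides and paddings are assumptions taken from the standard ResNet design (stride-2 7 × 7 stem with padding 3, a stride-2 3 × 3 max pool, and a stride-2 3 × 3 convolution at the start of stages 3 to 5), since the text itself does not list them, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


BLOCKS = [3, 4, 6, 3]             # residual blocks in stages 2-5 (from the text)
CHANNELS = [64, 128, 256, 512]    # channel numbers of stages 2-5 (from the text)


def resnet34_shapes(input_size=224):
    """Return (stage name, spatial size, channels) for the five stages."""
    shapes = []
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # assumed stem
    shapes.append(("stage1", size, 64))
    size = conv_out(size, kernel=3, stride=2, padding=1)        # assumed max pool
    for i, channels in enumerate(CHANNELS):
        if i > 0:
            # assumed: first block of stages 3-5 downsamples with stride 2
            size = conv_out(size, kernel=3, stride=2, padding=1)
        shapes.append((f"stage{i + 2}", size, channels))
    return shapes


def weighted_layers(blocks):
    """Stem conv + two 3x3 convs per basic block + final classifier layer."""
    return 1 + 2 * sum(blocks) + 1


for name, size, channels in resnet34_shapes():
    print(f"{name}: {size} x {size} x {channels}")
print("weighted layers:", weighted_layers(BLOCKS))
```

Under these assumptions the block counts 3, 4, 6, 3 also account for the "34" in the network name: one stem convolution, 32 convolutions inside the residual blocks, and one final classification layer.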
Compared with classic CNN models, the transformer-based network (Yuan et al., 2021; Zhang et al., 2022b) differs in its capacity to capture long-range dependencies by using the attention mechanism on high-resolution images. In general, the attention module allows the network to focus on the pavement crack by assigning different weights to different areas in the pavement crack image. If an area is highly relevant to the crack, its weight score is greater than that of the background area, thus boosting pavement crack detection. In this work, we introduce CTv2, employing the Swin Transformer (Liu et al., 2021) as the encoder, the feature pyramid network (FPN) (Yang et al., 2019) as the neck, and the Segformer (Xie et al., 2021) head as the decoder to locate the crack in the pavement surface of the complex survey data.

To weaken the effect of the extreme imbalance in the data, we build a separation-combination strategy in the semantic segmentation training procedure. The detailed steps of the proposed strategy are shown in Fig. 4. In the separation phase, the acquired images are divided into valid images (with crack) and invalid images (without crack). As the raw image is high resolution, a slice operation is carried out to save computational cost. With the trained classification model, valid and invalid images are separable. In the combination stage, the valid images are input into the proposed encoder-decoder structure to finish the model training. With the trained semantic segmentation model, newly acquired images can be tested for crack detection. It is worth mentioning that the test images are sliced first, and all patches are numbered for recovering the whole image after the crack detection. As the training data is still highly limited, the trained model is only proven effective on inspection images acquired in sections with similar conditions.

In the encoder, as indicated in Fig. 5, the input image is divided into patches, each of size 4 × 4. Like the channels in a CNN, the feature dimension is represented by C. The feature dimension of each raw patch is 4 × 4 × 3 = 48. There are four stages in the architecture. After the image is input, the patch partition operation divides the raw image into patches. Since the patch size is 4 × 4, the total number of patches is H/4 × W/4, and the feature dimensions become H/4 × W/4 × 48. Unlike the other stages, which have a patch merging layer, stage I has a linear embedding layer. The Swin Transformer blocks repeat 2, 2, 6, and 2 times in the four stages, respectively. In our implementation, C is equal to 96. Thus, with patch merging and feature transformation, the feature dimensions of the other stages are H/8 × W/8 × 2C, H/16 × W/16 × 4C, and H/32 × W/32 × 8C.

Fig 3: The Structure of Encoder

Fig. 6 shows the structure of the two successive Swin Transformer blocks, which play crucial roles in feature extraction. In the Swin Transformer, the classical window-based multi-head self-attention (W-MSA) module and the shifted window-based multi-head self-attention (SW-MSA) module are both included. Compared with W-MSA, SW-MSA has better modeling power and can strengthen connections between windows. The computation of the two successive blocks is given in Eqs. (3)-(6). More information regarding the Swin Transformer can be found in Liu et al. (2021).

$\hat{z}^{l} = \text{W-MSA}(\text{LN}(z^{l-1})) + z^{l-1}$    (3)

$z^{l} = \text{MLP}(\text{LN}(\hat{z}^{l})) + \hat{z}^{l}$    (4)

$\hat{z}^{l+1} = \text{SW-MSA}(\text{LN}(z^{l})) + z^{l}$    (5)

$z^{l+1} = \text{MLP}(\text{LN}(\hat{z}^{l+1})) + \hat{z}^{l+1}$    (6)

where $\hat{z}^{l}$ is the output feature of the W-MSA module, $\hat{z}^{l+1}$ is the output feature of the SW-MSA module, $z^{l-1}$ is the input feature, $z^{l}$ and $z^{l+1}$ are the output features of the MLP, and LN denotes layer normalization.

In the decoder part, the neck and the decoder head are added. In the neck part, the feature pyramid network (FPN) (Lin et al., 2017) is utilized to fuse the multi-scale features generated in the encoder. Following our previous design (Guo et al., 2023), the decoder head of Segformer (Xie et al., 2021) is utilized since its simple and lightweight design accelerates inference. The structure of the neck and decoder is shown in Fig. 7, and detailed information on the decoder can be found in Guo et al. (2023). In particular, the neck part follows the top-down fashion. The input channels are 96, 192, 384, and 768, respectively. According to the experimental results, all the output channels of the neck are fixed to 96. In the decoder section, the lightweight all-multilayer-perceptron (MLP) architecture is used. Under this approach, the output dimension is 56 × 56 × 96. As depicted in Fig. 7, after the feature fusion operation accumulates global and local information, background and crack pixels can be categorized. Inspired by the use of dice loss (Sudre et al., 2017) in medical image segmentation, we use the dice loss in training the segmentation network.
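The reason a dice loss suits crack images, where crack pixels are a tiny fraction of the image, can be shown with a short plain-Python sketch. This is the basic soft dice formulation on flattened probability and label lists, assumed here for illustration; it is not the generalised variant of Sudre et al. (2017) nor the exact loss configuration used with CTv2, and the function names and the smoothing term `eps` are hypothetical.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft dice loss: 1 - 2*|P∩G| / (|P| + |G|), with eps to avoid 0/0.

    pred   -- predicted crack probabilities in [0, 1], flattened
    target -- ground-truth labels (1 = crack, 0 = background), flattened
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def pixel_accuracy(pred, target, cutoff=0.5):
    """Fraction of pixels whose thresholded prediction matches the label."""
    hits = sum((p >= cutoff) == (t == 1) for p, t in zip(pred, target))
    return hits / len(target)


# Assumed toy data: 100 pixels, only 2 of them belong to a crack.
target = [0] * 98 + [1] * 2
all_background = [0.0] * 100          # a model that never predicts a crack
perfect = [float(t) for t in target]  # a model that matches the labels

print(pixel_accuracy(all_background, target))  # high despite missing every crack pixel
print(dice_loss(all_background, target))       # close to 1: heavily penalized
print(dice_loss(perfect, target))              # close to 0
```

Predicting background everywhere scores 98% pixel accuracy on this toy example yet receives a dice loss near 1, so the loss keeps pushing the network toward the few crack pixels.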
Fully connected layers include a hidden layer with 64 ReLU-activated neurons, followed by the output layer with three neurons for multi-class classification, using softmax activation. The model is compiled with 'rmsprop' and 'sparse_categorical_crossentropy' for multi-class tasks, assessed via Sparse Categorical Accuracy.

Fig 4: Workflow of the study.

4. Results:

In this section, the details of the system configuration and training implementation are first introduced. Afterwards, the data preparation, performance indicators, training results and visualization results are described. Finally, we present the crack detection results using different models.

The PyTorch library, version 1.12.0, is used for model training. A deep learning machine equipped with an NVIDIA 3080 Ti graphics processing unit (GPU) and the Ubuntu 20.04 LTS operating system provides the hardware and software configuration. In particular, the machine has 16 GB of RAM and an i7 CPU @ 3.5 GHz. The first stage is responsible for the classification of crack and crack-free images from the survey data. Three classification models, ResNet-18, ResNet-34, and ResNet-50, are used for training and evaluation. The second stage, deployed on MMSegmentation (Contributors, 2020), is responsible for crack detection at the pixel level. Six models, UNet (Ronneberger et al., 2015), FCN (Long et al., 2015), DeepLabv3 (Chen et al., 2017), PSPNet (Zhao et al., 2017), CTv1 (Guo et al., 2023), and the proposed CTv2, are used. The hyperparameters and settings of the first stage are listed in Table 1. In order to accelerate the training of the semantic segmentation models, each input image is cropped to 256 × 256 with overlapped regions. The hyperparameters of the second stage use the default settings.

… the labeling work difficult. With the CTv2 model, we can observe its effective modeling performance on long and complicated cracks. However, in the representation of fine features, it still has room to improve. As for the rest of the models, CTv1 and PSPNet are qualified on the prediction of images 1 and 2. In addition, CTv1 has a similar result to CTv2 on image 3. However, the rest of the models fail in the prediction of image 3. FCN, UNet and DeepLabv3 cannot predict successive and long crack patterns. Meanwhile, they have poor performance on image 3. Notably, even with the complex survey data, CTv2 can still predict most of the crack pixels and outperforms the other models.

Fig 5: Visualization of results of various models in the second stage based on CFD dataset.

Figs. 11 and 12 present the prediction results on the CFD and CrackSC datasets, respectively. In Fig. 11, with image 1, we find that most of the trained models can predict the long and crossed cracks well. Meanwhile, CTv2 can predict fine details that other models fail to capture. For instance, at the bottom of image 1, there is a transversal crack connecting the longitudinal cracks. Only CTv2 and CTv1 can retrieve these cracks; other models predict nothing or predict protrusions of the longitudinal cracks. With image 2, there are long and thin cracks with bifurcations at the top of the pattern. CTv2 has the closest results to the ground truth label, and other results show discontinuous segments. As for the prediction results of image 3, the raw image and ground truth label have separate cracks. CTv2 and CTv1 can predict them at different locations, but other models only predict part of the whole crack, leaving the left part of the crack unpredicted. Taking advantage of the neck part, CTv2 can predict more fine details than CTv1.
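The experimental setup crops each image to 256 × 256 with overlapped regions, and the methodology numbers every patch so the full image can be recovered after detection. A minimal sketch of such a numbered, overlapping tiling is given below. The overlap of 32 pixels, the rule of snapping the last tile to the image border, and the function names are assumptions for illustration; the source does not state how much overlap is used.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the border so no pixel is left uncovered.
    """
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_grid(height, width, tile=256, overlap=32):
    """Return (patch_id, top, left) for every tile, numbered row by row."""
    tiles = []
    for top in tile_starts(height, tile, overlap):
        for left in tile_starts(width, tile, overlap):
            tiles.append((len(tiles), top, left))
    return tiles


# Image size 2000 (H) x 2048 (W) as quoted in the Methodology.
tiles = tile_grid(2000, 2048)
print(len(tiles), "patches; first:", tiles[0], "last:", tiles[-1])
```

Keeping the `(patch_id, top, left)` triple with each patch is enough to paste the predicted masks back into a full-size canvas; how overlapping predictions are merged (for example by averaging) is a separate design choice not specified in the text.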
Acknowledgement

I would like to express my sincere gratitude to Prof. Saroj Kumar Panigrahy for his contribution to the successful completion of this research on pixel-level crack detection on highway roads.

First and foremost, I would like to thank my advisor, Prof. Debjati Goswami, for their continuous support, valuable guidance, and encouragement throughout the course of this study. Their expertise in computer vision and deep learning provided the foundation for this research.

I am also grateful to the faculty and staff of SCOPE, VIT-AP University, for providing the infrastructure, resources, and technical support needed for the experiments and analysis.

Special thanks to the creators and maintainers of publicly available datasets such as the Crack Forest
Dataset (CFD), Crack500, and SDNET2018, which played a crucial role in training and validating the
pixel-level segmentation models.

REFERENCES:
Ahmadi, A., Khalesi, S., Golroo, A., 2021. An integrated machine learning model for automatic road crack detection and classification in urban areas. Int. J. Pavement Eng. 1–17.
Akagic, A., Buza, E., Omanovic, S., Karabegovic, A., 2018. Pavement crack detection using Otsu thresholding for image segmentation. 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, pp. 1092–1097.
Ayenu-Prah, A., Attoh-Okine, N., 2008. Evaluating pavement cracks with bidimensional empirical mode decomposition. EURASIP Journal on Advances in Signal Processing 1–7, 2008.
Bao, P., Zhang, L., Wu, X., 2005. Canny edge detection enhancement by scale multiplication. IEEE Trans. Pattern Anal. Mach. Intell. 27 (9), 1485–1490.
Chen, L.-C., Papandreou, G., Schroff, F., Adam, H., 2017. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv preprint arXiv:1706.05587.
Contributors, M., 2020. MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark. Available online: https://github.com/open-mmlab/mmsegmentation. (Accessed 18 May 2022).
Dorafshan, S., Maguire, M., Qi, X., 2016. Automatic Surface Crack Detection in Concrete Structures Using OTSU Thresholding and Morphological Operations.
Du, Y., Pan, N., Xu, Z., Deng, F., Shen, Y., Kang, H., 2021. Pavement distress detection and classification based on YOLO network. Int. J. Pavement Eng. 22 (13), 1659–1672.
Fan, R., Bocus, M.J., Zhu, Y., Jiao, J., Wang, L., Ma, F., Cheng, S., Liu, M., 2019. Road Crack Detection Using Deep Convolutional Neural Network and Adaptive Thresholding. IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 474–479, 2019.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B., 2021. Swin transformer: hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022.
Liu, F., Liu, J., Wang, L., 2022. Asphalt pavement crack detection based on convolutional neural network and infrared thermography. IEEE Trans. Intell. Transport. Syst.
Shi, Y., Cui, L., Qi, Z., Meng, F., Chen, Z., 2016. Automatic road crack detection using random structured forests. IEEE Trans. Intell. Transport. Syst. 17 (12), 3434–3445.
Simonyan, K., Zisserman, A., 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.
Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M., 2017. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer, 240–248.
