Automation in Construction
Keywords: Unmanned aircraft vehicles (UAVs); Bridge inspection; Structure from motion (SfM); Large-scale point clouds; Semantic segmentation; 3D-to-2D projection; Crack identification; Deep learning

Abstract: For digital-image-based bridge inspection tasks, images captured by camera-carrying unmanned aircraft vehicles (UAVs) usually contain both the region of interest (ROI) and the background. However, accurately detecting cracks in concrete surface images containing background information is challenging. To improve UAV-based bridge inspection, an image ROI extraction and crack detection methodology is presented in this paper. First, a deep-learning-based semantic segmentation network, RandLA-BridgeNet, for large-scale bridge point clouds, which can facilitate 3D ROI extraction, is trained and tested. Second, an image ROI extraction method based on 3D-to-2D projection is presented to generate images containing only the ROI. Finally, a data-driven deep learning convolutional neural network (CNN) called the grid-based classification and box-based detection fusion model (GCBD) is utilized to identify cracks in the processed images. An experiment is conducted on highway bridge images to validate the presented methodology. The overall semantic segmentation and image ROI extraction accuracies are 97.0% and 98.9%, respectively. After ROI extraction, 47.9% of the grid cells, which represent background misrecognition, are filtered out, greatly improving the crack identification accuracy.
https://doi.org/10.1016/j.autcon.2023.105226
Received 17 May 2023; Received in revised form 25 November 2023; Accepted 27 November 2023; Available online 8 December 2023
The part of the image other than the ROI is defined as the background. The aforementioned crack identification studies primarily used images containing only the ROI as the input of deep neural networks. When an image contains both an ROI and background, the deep-learning-based crack identification algorithm may misidentify some parts of the background as cracks, thus affecting the quality of crack recognition.

As reviewed by Poorghasem et al. [13] and Ranyal et al. [14], various robot-based automated systems, e.g., unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) carrying vision systems such as cameras and optical lenses, have been developed to facilitate image collection. Different robot-based automated systems and computer vision methods can be combined for different engineering scenarios. Currently, robot-based image collection and digital-image-based crack identification are relatively mature for road pavement and tunnel surface scenarios [15,16]. In these two scenarios, pavement inspection vehicles and tunnel inspection vehicles, respectively, can be used to take digital images under fully controlled conditions. The obtained images usually cover only the surfaces of the inspected structure (the ROI), with no extraneous backgrounds, and are of high quality. However, the spatial distribution of bridge surfaces is much more complex than that of road pavements and tunnel surfaces, so contact inspection vehicles cannot be applied in this scenario. Instead, camera-carrying UAVs are required to capture bridge surface images. UAVs have six degrees of freedom in flight, and the scope of the photographed scene is often difficult to control. Therefore, images taken by UAVs inevitably contain extraneous background and are not suitable for direct use in crack identification.

In addition to crack identification, visualized crack localization is also a matter of concern for structural health monitoring and inspection. Liu et al. [17,18] fused two-dimensional (2D) digital image processing technology and three-dimensional (3D) reconstruction technology to achieve crack identification and localization. First, many 2D digital images taken in the field were used to complete the 3D reconstruction of the structure by structure from motion (SfM). Then, the images containing crack information were selected for crack identification. Finally, the projection method completed crack localization. Generally, images taken at a close distance with fine details improve crack identification, while images taken at a long distance that contain scene geometric information improve the success rate of 3D reconstruction. For images focusing on a local area of the concrete surface, state-of-the-art technology can accurately complete crack identification. However, images containing only flat, smooth, feature-sparse structural surfaces are not sufficient for SfM to succeed. For successful 3D reconstruction, images that capture background information such as ground and vegetation are better, as their feature points are more abundant. From this perspective, the background does not need to be excluded during UAV-based image capture.

Therefore, it is an objective requirement for UAV-based bridge inspection to accurately identify cracks from images with background. Extracting the ROI from the images is the best way to achieve this goal. Some recent researchers have extracted ROIs from 2D images based on deep learning semantic segmentation algorithms. Taking bridge structures as an example, Narazaki et al. [19] constructed a semantic segmentation algorithm containing 45 convolutional layers for bridge component recognition after performing scene classification. A MATLAB GUI image semantic segmentation annotation tool was developed specifically to manually annotate thousands of images, which were combined with an existing database to generate training data. Saovana et al. [20] trained a deep CNN (DCNN) for removing irrelevant features from bridge images. The training data were realistic scene images manually annotated using LabelMe, and the number of samples was expanded by rotating images and adjusting their brightness. Using 236 highway bridge images, Sajedi et al. [21] trained a Fully Convolutional DenseNet (FC-DenseNet) to extract different kinds of bridge components from the images. These studies show that the main challenges for bridge component recognition in 2D images are as follows:

(1) Bridge images may contain complex background information, and the background can sometimes be so large that the bridge is not the dominant object in the image.

(2) The characteristics of the same type of bridge components may differ in images with different shooting distances and lighting conditions.

(3) Deep learning-based semantic segmentation algorithms rely heavily on the quality and quantity of training data. Unfortunately, there is currently a scarcity of open-source annotated data, and the available scenarios are not sufficiently comprehensive. Consequently, the reliability and portability of the trained network may suffer. Due to the diverse range of scenes encountered in practical engineering, annotating images for each scene class separately is time-consuming, and building a general dataset is extremely challenging.

Considering the opportunities and limitations mentioned above, this paper proposes a methodology for extracting image ROIs based on 3D point cloud segmentation and 3D-to-2D projection, aiming to improve crack identification from bridge images taken by UAVs that contain background information. Instead of directly extracting bridge components from the images as in previous studies [19–21], the proposed methodology achieves this goal indirectly. It integrates point cloud semantic segmentation and 3D-to-2D projection technologies into the UAV-based bridge crack detection task, contributing to advancements in the field. The proposed methodology has both practical purposes (as shown in Section 2) and great potential to improve crack detection results when handling images containing complex background information. A highway bridge is taken as an example in this study to facilitate discussion, but the proposed methodology is also applicable to other engineering scenarios.

2. Methodology framework

The framework of the proposed methodology is illustrated in Fig. 1. For the inspection task of concrete bridges discussed in this paper, the inspector takes numerous images manually or using a UAV. These images contain information about both the spatial composition of the scene and the cracks on the concrete surface. This information is used not only for performing SfM to reconstruct the 3D point cloud of the bridge but also for subsequently identifying cracks. First, RandLA-Net, a deep learning framework for semantic segmentation of large-scale point clouds, is adopted to construct a point cloud semantic segmentation network, RandLA-BridgeNet, for highway bridges. Bridge point clouds from an open-source dataset are annotated to train and test the network. A large-scale bridge point cloud can be input into RandLA-BridgeNet directly to complete semantic segmentation, and then the 3D ROI can be easily extracted from the point cloud. Second, for each image containing the bridge components to be inspected, the 3D-to-2D projection is performed based on the pinhole camera model. This step, calculating the projection of the 3D ROI in the 2D image (i.e., the 2D ROI), is essentially the inverse process of SfM 3D reconstruction. Next, an edge detection algorithm is used to find the outer contour of the 2D ROI and generate a mask. The background pixels outside the outer contour of the 2D ROI are removed using the mask, and the 2D ROI is extracted, producing an image containing only the ROI. Finally, the ROI image is used for crack identification, effectively avoiding background interference with the identification algorithm.
Notably, the proposed methodology framework is not limited to the adopted SfM-based 3D reconstruction technique. It can still be applied with minor adjustments when using other 3D reconstruction techniques and 2D digital images for bridge disease detection.

The remainder of this paper is organized as follows: Section 3 describes the point cloud semantic segmentation method used to extract 3D ROIs; Section 4 describes the 3D-to-2D projection and 2D ROI extraction method; Section 5 describes the digital-image-based crack identification method; Section 6 describes the experimental study on a real bridge for validating the proposed methodology; and Section 7 concludes this work.
3. Point cloud segmentation technique for extracting 3D ROI

3.1. Overview of point cloud segmentation

3D point cloud segmentation is a key research area in computer vision that aims to classify each point of the point cloud into one of several classes based on its spatial location, color features, semantic information, etc. Classic segmentation methods include edge-based techniques, region growing, model fitting, and unsupervised clustering [22]. More recently, supervised deep learning methods have gained prominence [23], with voxel-based [24–26], multiview-based [27–29], and point cloud-based [30–35] methods emerging. Among these, point cloud-based methods have become common due to their ability to avoid the partial information loss caused by data preprocessing.

Some open-source deep learning semantic segmentation frameworks based on point clouds have been proposed, starting with the classical PointNet by Qi et al. [30]. PointNet directly uses 3D point clouds as input and has become the basis for many subsequently proposed methods. This framework, however, focuses too heavily on global features, ignores local features, and does not consider the adverse effects of uneven point cloud density, which makes adapting it to complex scenes difficult. Thus, Qi et al. [31] proposed PointNet++, which overcomes the problems of feature extraction methods to a certain extent. However, it adopts the K-nearest neighbor search method, which may lead to the concentration of sampling points in one direction. Point cloud data are usually disordered and have density inhomogeneity. Therefore, Li et al. [32] proposed PointCNN to learn the local relationships of point clouds in space, which effectively reduces the time and space complexity of segmentation. To address PointNet ignoring the correlation between neighboring points, Wang et al. [33] proposed a graph convolution-based DGCNN, which includes an EdgeConv operation that captures the distance information between each point and its neighboring points to learn edge features. While these frameworks are suitable for small-scale scenarios, they require block sampling when handling large-scale point clouds. More specifically, large-scale point clouds must be cut into 1 m × 1 m small blocks, and then each block must be sampled to obtain 4096 points as the network input [30–33]. To fully adapt to large-scale point clouds, Landrieu et al. [34] proposed the SPG framework based on the superpoint graph. SPG first divides the point cloud into geometrically simple but meaningful sets of superpoints, forms a superpoint graph, and then embeds each superpoint into a PointNet for semantic segmentation. However, dividing the superpoints is difficult to implement and prone to classification errors. Hu et al. [35] proposed RandLA-Net, a new framework that can directly handle large-scale point clouds. RandLA-Net achieved good segmentation results on the large public indoor and outdoor datasets S3DIS [36], Semantic3D [37] and SemanticKITTI [38].

In recent years, there have been several studies on point cloud segmentation of bridges. Due to the lack of a high-quality annotated bridge point cloud database, some researchers have suggested learning-independent segmentation methods [39–42]. Using the normal information of the points, Riveiro et al. [39] proposed a voxel-based method to recognize vertical and nonvertical components from the point cloud of a masonry arch bridge to divide the arch bridge into different parts. Yan et al. [40] proposed a heuristic algorithm for extracting structural components from the point clouds of steel bridges. Lu et al. [41] proposed a top-down point cloud segmentation algorithm for reinforced concrete bridges to complete bridge component recognition by stepwise classification. Truong-Hong et al. [42] used a cell- and voxel-based region growing method to extract surfaces individually from point clouds of reinforced concrete bridges. In conclusion, these segmentation methods use only one or a combination of classic point cloud segmentation techniques and depend heavily on domain knowledge such as geometric features specific to particular bridge types, e.g., common dimensions, axial orientation, and relative position relationships of components (piers, cap beams, girders, etc.). Therefore, these methods are only applicable to specific bridge types, and extending them to different scenarios may result in serious errors.

To address these issues, some scholars have applied deep learning-based methods to semantic segmentation of bridge point clouds. Kim et al. [43,44] utilized PointNet, PointCNN and DGCNN for bridge point cloud segmentation, and the three methods performed similarly overall. However, these methods require block sampling operations along the longitudinal direction of the bridge point cloud, and the size and overlap of the sampled blocks impact the segmentation results. Lee et al. [45] proposed a hierarchical DGCNN (HGCNN) based on PointNet and DGCNN, which effectively improved the recognition of electric poles on bridges. Yang et al. [46] utilized a weighted SPG to directly process large-scale bridge point clouds, which performs better than PointNet and DGCNN and does not require block sampling. Referring to PointNet++, Jing et al. [47] developed BridgeNet for point cloud segmentation of masonry arch bridges and identified the bridge geometric parameters based on the segmented point clouds.

3.2. Proposed segmentation network RandLA-BridgeNet

3D point clouds of bridges obtained through SfM reconstruction often contain millions of points or more. While the classic PointNet framework and its variations are widely used, they rely on block sampling techniques to handle large-scale point clouds; these techniques can be sensitive to sampling parameters and may affect the segmentation results. To address this issue, the deep learning framework RandLA-Net [35] is adopted in this study to develop a robust point cloud semantic segmentation network called RandLA-BridgeNet, which directly takes the entire bridge point cloud as input. The network architecture is illustrated in Fig. 2. The network adopts an encoder-decoder architecture with residual connections. The input point cloud is progressively downsampled to extract the features of each point using a shared multilayer perceptron (MLP). Then, four encoding and decoding layers are utilized to learn the features of the points. Finally, three fully connected (FC) layers and a dropout layer are applied to predict the semantic label of each point. Based on RandLA-Net [35], RandLA-BridgeNet follows most of the default parameter settings while adjusting the class definitions and the loss function to apply to the bridge point cloud dataset. Since the number of points in each class of the bridge dataset differs greatly, the proportion of each class is calculated by dividing the number of points in that class by the total number of points in the dataset; the value 1/(proportion + 0.02) is then used as the weight for that class in the loss function.
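The class-weighting rule just described can be sketched as follows. This is a minimal NumPy example; the class names and point counts are illustrative, not taken from the paper's dataset.

```python
import numpy as np

def class_weights(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Per-class loss weights as described in Section 3.2:
    weight_i = 1 / (proportion_i + 0.02), where proportion_i is the
    fraction of all points that belong to class i."""
    counts = np.bincount(labels, minlength=n_classes).astype(np.float64)
    proportion = counts / counts.sum()
    return 1.0 / (proportion + 0.02)

# Illustrative point counts for background, pier, superstructure, parapet:
labels = np.repeat([0, 1, 2, 3], [600_000, 40_000, 250_000, 30_000])
print(class_weights(labels, 4))   # rare classes receive larger weights
```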
To process a million-scale bridge point cloud directly with a deep neural network, it is necessary to gradually downsample while retaining as much geometric structure information as possible. Among the available sampling methods, farthest point sampling (FPS), inverse density importance sampling (IDIS), and generator-based sampling (GS) are computationally expensive, while continuous relaxation-based sampling (CRS) is demanding on GPU memory, and policy gradient-based sampling (PGS) has difficulty learning effective sampling strategies. Therefore, RandLA-BridgeNet adopts random sampling (RS), which is computationally efficient and has low memory overhead. However, RS results in a loss of useful information. To mitigate this issue, the network incorporates a local feature aggregation (LFA) module that complements RS. Fig. 2 illustrates the LFA module consisting of three submodules: Local Spatial Encoding (LocSE), Attentive Pooling, and Dilated Residual Block. The LocSE submodule encodes the 3D coordinate information and extracts neighborhood point features, enabling the network to better learn the geometric structure of the space from the relative position and distance information of points. Attentive pooling automatically learns and aggregates useful information from neighboring point features. The dilated residual block connects two sets of LocSE and attentive pooling units to cost-effectively increase the receptive field of each point and facilitate feature propagation between neighboring points. The entire LFA module preserves the overall geometric details of the input point cloud even if the features of some points are randomly discarded by random downsampling.

3.3. Bridge point cloud dataset

The bridge point clouds were manually annotated into the classes of background, pier, superstructure and parapet and assigned the corresponding ground truth semantic labels. The final point cloud data used to train the network consisted of spatial location (XYZ), color (RGB), and semantic label information.
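A minimal sketch of how such training data can be assembled is shown below. The file name and plain-text column layout are assumptions for illustration, not the authors' published format.

```python
import numpy as np

# Assumed layout, one point per row: x, y, z, r, g, b, label
# (label indices: 0 = background, 1 = pier, 2 = superstructure, 3 = parapet).
cloud = np.loadtxt("bridge_01.txt")           # hypothetical file name
xyz, rgb = cloud[:, 0:3], cloud[:, 3:6]       # spatial location and color
labels = cloud[:, 6].astype(np.int64)         # ground-truth semantic labels
```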
3.4. Network training and testing
Table 1. Metadata of the bridge point cloud dataset (columns: bridge number and number of points).
Fig. 4. Visualized comparison between prediction and ground truth of the test set.
Table 2. Quantitative evaluation of semantic segmentation of the test set (metrics for each class and global metrics).
The per-class metrics and global metrics are calculated as

$$
\left\{
\begin{aligned}
\mathrm{Precision}_i &= \frac{TP_i}{TP_i + FP_i}\\
\mathrm{Recall}_i &= \frac{TP_i}{TP_i + FN_i}\\
\mathrm{IoU}_i &= \frac{TP_i}{TP_i + FN_i + FP_i}\\
(\mathrm{F1\ score})_i &= \frac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}\\
\mathrm{OA} &= \frac{\sum_{i=1}^{n} TP_i}{N}\\
\mathrm{AR} &= \frac{\sum_{i=1}^{n} \mathrm{Recall}_i}{n}\\
\mathrm{mIoU} &= \frac{\sum_{i=1}^{n} \mathrm{IoU}_i}{n}\\
\mathrm{Average\ F1\ score} &= \frac{\sum_{i=1}^{n} (\mathrm{F1\ score})_i}{n}
\end{aligned}
\right. \tag{1}
$$

where $TP_i$, $FP_i$ and $FN_i$ denote the true positives, false positives and false negatives of class $i$, $n$ is the number of classes, and $N$ is the total number of points.
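A compact NumPy sketch of Eq. (1), assuming a confusion matrix with ground-truth classes on the rows and predicted classes on the columns:

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """Per-class and global metrics of Eq. (1) from an n x n confusion
    matrix (rows: ground truth, columns: prediction)."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp          # predicted as class i but wrong
    fn = conf.sum(axis=1) - tp          # class i missed by the prediction
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision, "recall": recall, "iou": iou, "f1": f1,
        "OA": tp.sum() / conf.sum(),    # overall accuracy
        "AR": recall.mean(),            # average recall
        "mIoU": iou.mean(),
        "avg_f1": f1.mean(),
    }
```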
Table 3. Comparison of performance metrics between representative models (columns: segmentation network, OA, mIoU, and IoU for each class, e.g., pier and parapet).
4.2. Boundary detection of projection points

Using the method mentioned in the previous subsection, all points in the 3D ROI are projected into the 2D image. The resultant discrete projected points represent the 2D ROI. The alpha shape algorithm [48] is used in this study to compute a series of boundary line segments and generate a polygon that encloses the ROI.

The alpha shape algorithm can be implemented in MATLAB R2022a [49] using the "alphaShape" function, whose behavior depends on the value of the parameter α. Fig. 7 shows the boundary detection results using different α values, taking an image of a bridge pier as an example. The red points denote the projected points, while the blue lines and green regions denote the detected boundary line segments and generated polygons, respectively. When α is set to 1 or 10, the generated polygons have many intersecting polylines, and the enclosing effect is weak. When α is set to 100, the generated polygons can enclose a portion of the projection points, but voids still exist inside. When α is set to 1000, the algorithm successfully captures the outer contours of the projection points, and the generated polygons effectively enclose all the projection points. Thus, setting α to a larger value is recommended for common nonporous bridge components or surfaces. Since the images involved in this study do not exceed 10,000 pixels in either width or height, setting α to 1000 is appropriate. Other details and the complete processing steps of this example are described in Section 4.3.
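The paper uses MATLAB's alphaShape; purely for illustration, the same idea can be sketched in Python via Delaunay triangulation: keep triangles whose circumradius is below α, and the edges used by exactly one kept triangle form the boundary. This sketch is ours, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_boundary(points: np.ndarray, alpha: float) -> set:
    """Boundary edges (as index pairs) of the alpha shape of 2D points."""
    count = {}
    for ia, ib, ic in Delaunay(points).simplices:
        a, b, c = points[ia], points[ib], points[ic]
        la, lb, lc = (np.linalg.norm(b - c), np.linalg.norm(c - a),
                      np.linalg.norm(a - b))
        area2 = abs((b[0] - a[0]) * (c[1] - a[1])
                    - (b[1] - a[1]) * (c[0] - a[0]))   # 2 * triangle area
        if area2 == 0.0:
            continue                       # degenerate (collinear) triangle
        # circumradius = (la * lb * lc) / (4 * area) = (la * lb * lc) / (2 * area2)
        if la * lb * lc / (2.0 * area2) < alpha:
            for e in ((ia, ib), (ib, ic), (ic, ia)):
                key = (min(e), max(e))
                count[key] = count.get(key, 0) + 1
    return {e for e, n in count.items() if n == 1}
```

As in the MATLAB experiments of Fig. 7, a small α leaves the hull fragmented, while a large α yields a single enclosing polygon.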
4.3. Batch processing algorithm for image ROI extraction

As shown in Fig. 8, the methods described above are integrated into an automatic MATLAB R2022a algorithm. The integrated algorithm can batch process all images containing the current component of interest, using the 3D ROI as input and outputting images that contain only the ROI. The algorithm has excellent operational efficiency, requiring an average processing time of only 8.3 s per 9504 × 6336 pixel image.

The right half of Fig. 8 illustrates the processing flow for a single image. Image 1 shows an original image, which contains the upper half of a pier (the current ROI) and its connection area with the pier cap. After performing the 3D-to-2D projection according to the pinhole camera model, image 2 is obtained. Since the 3D ROI includes the whole pier while the original image contains only its upper half, many projection points fall outside the scope of the image. After deleting these overrun points, image 3 is obtained. Then, the algorithm calls the "alphaShape" function to detect the boundaries of the projection points, as shown in image 4. These boundary points are then used to generate a mask, as shown in image 5. Finally, the pixels outside the mask are removed to obtain image 6, which contains only the ROI. Notably, this algorithm not only separates the ROI from the background but also removes the background so that the resulting images can be directly used for crack identification.
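Steps 5 and 6 of this flow (mask generation and background removal) can be sketched with OpenCV as follows; the function name and the zero-fill convention for background pixels are ours.

```python
import cv2
import numpy as np

def remove_background(image: np.ndarray, boundary: np.ndarray) -> np.ndarray:
    """Keep only the pixels inside the ROI boundary polygon:
    rasterize the polygon into a binary mask, then zero out
    everything outside it."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [boundary.astype(np.int32)], 255)   # boundary: (N, 2)
    return cv2.bitwise_and(image, image, mask=mask)
```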
5. Crack identification method

Digital image-based crack identification can be divided into two steps: crack extraction and crack segmentation. A data-driven deep learning convolutional neural network (CNN) called the grid-based classification and box-based detection fusion model (GCBD) is used for crack extraction [50]. The deep-learning-based method is more robust than traditional machine learning methods and can handle the complex scenes encountered in practical engineering. A typical threshold-based segmentation method [51] from digital image processing (DIP) is adapted for crack segmentation. The histogram distribution of crack regions is bimodal, so the threshold-based method can obtain ideal crack segmentation results.

5.1. GCBD fusion model for crack extraction

The deep learning model used to extract cracks is the GCBD fusion model. The model has two outputs: grid-based classification results and box-based detection results. In this paper, the grid-based classification results are the desired output, and the box-based detection results are not needed.

5.1.1. Grid-based classification branch

The network architecture has two branches, the grid-based classification branch and the box-based detection branch, as shown in Fig. 9. In this paper, we focus on the grid-based classification branch. This branch resizes the image to 1440 × 960 and outputs a 45 × 30 grid mask. It integrates the features extracted by the backbone from three scales through the grid neck and finally outputs the grid mask through a convolutional layer on the 5-fold subsampled feature map.

Each grid cell in the grid mask has a confidence level. An appropriate threshold is set based on the scenario and the requirements, filtering out the grid cells below the threshold. The threshold can be chosen by testing model performance on a small dataset, which in this paper is of size 5. The smaller the threshold, the higher the recall; the larger the threshold, the higher the precision. In identifying pier surface cracks, a threshold of 0.3 is set to find as many cracks as possible. A low threshold leads to some misidentifications, which are mostly non-pier-surface disturbances that can be filtered through the background exclusion method proposed in this paper. In addition, when the threshold is set to approximately 0.3, the F1 score reaches its maximum value, and recall and precision are perfectly balanced, as shown in Fig. 10. The remaining grid cells above the threshold are treated as areas containing cracks for further crack segmentation.

In addition, since only the grid-based classification branch is needed in this paper, the box neck and box head in the network can be removed through a pruning operation. This improves computing efficiency and does not affect the grid output.

5.1.2. Generalization on OOD data

Concrete bridge images have different data distributions than asphalt pavement images. However, the weights used in the test are those trained on the asphalt pavement image dataset due to the lack of an available surface crack image dataset for concrete bridges. Thus, directly applying the fusion model to concrete piers requires strong out-of-distribution (OOD) generalization performance.

Because the fusion model adopts a shared backbone network, multitask learning and joint training, it is highly robust. The grid-based classification branch focuses on local areas, while the box-based detection branch focuses on the whole region. Fusing two tasks with different objectives drives the model to capture the common features of cracks at both micro and macro scales. The experimental results show that the weights still generalize well on concrete bridge surfaces, as will be demonstrated in Section 6.5.

5.2. Crack segmentation

5.2.1. Threshold-based method

Crack segmentation is performed in each grid cell using the Otsu algorithm [51]. The crack segmentation algorithm can be divided into three steps: preprocessing, segmentation and postprocessing, as shown in Fig. 11.

Preprocessing can be divided into two parts: image preprocessing and cracked-region preprocessing. For cracked regions, grid cells with obvious misidentification are filtered out based on the confidence of each grid cell and the connectivity between all grid cells through connected component analysis (CCA). If a connected area is small and its average confidence is low, the area is considered an obvious misidentification and is filtered out. Afterward, a median filter is used to smooth the image and remove salt-and-pepper noise.

The maximum interclass variance is calculated in each grid cell to obtain the local Otsu segmentation threshold. Because the image resolution is high and the coverage is wide, different areas of the image have inconsistent lighting. Therefore, using a single segmentation threshold for the entire image would cause local tiny cracks to be missed. This problem can be alleviated by using the grid-cell-based local threshold method.

Preprocessing improves the reliability of the crack segmentation results on the macro scale, while postprocessing further refines the crack segmentation results on the micro scale. Calculating the connection relation of pixel points filters out noise such as holes. Then, a dilation-erosion closing operation is used to address edge nonclosure and internal cavities.

5.2.2. Segmentation performance

To test the performance of the segmentation algorithm, experiments are performed on the concrete crack dataset [53]. The edge of a crack is fuzzy, and there is a transition area between crack pixels and noncrack pixels, so the two adjacent pixels around a crack pixel can also be treated as crack pixels during evaluation.
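As a minimal illustration of the grid-cell-level Otsu segmentation of Section 5.2.1, the sketch below applies a local threshold per cell; the cell-box representation and the darker-than-background crack assumption are ours.

```python
import cv2
import numpy as np

def segment_crack_cells(gray: np.ndarray, cells: list) -> np.ndarray:
    """Local Otsu thresholding per grid cell: each cell gets its own
    threshold, which tolerates uneven lighting across the image.
    `gray` is an 8-bit grayscale image; `cells` holds
    (row0, row1, col0, col1) boxes of cells flagged as containing cracks."""
    out = np.zeros_like(gray, dtype=np.uint8)
    for r0, r1, c0, c1 in cells:
        patch = gray[r0:r1, c0:c1]
        # Otsu maximizes the interclass variance; cracks are darker than
        # the concrete surface, so keep pixels below the local threshold.
        _, binary = cv2.threshold(patch, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        out[r0:r1, c0:c1] = binary
    # Postprocessing (Section 5.2.1): a closing operation bridges small
    # gaps at crack edges and fills internal cavities.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(out, cv2.MORPH_CLOSE, kernel)
```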
6. Experimental validation

The survey revealed that one bridge along the G7 Beijing-Xinjiang Expressway was moderately sized and had cracks on the surfaces of several concrete piers, making it a suitable candidate for this experiment. Therefore, this bridge (referred to as the G7 bridge hereafter) was selected as the experimental scene, as demonstrated in Fig. 13. The G7 bridge is a three-span continuous concrete bridge with six lanes in total, comprising 8 prismatic piers arranged in two rows.

The experiment was carried out by following the process shown in Fig. 1, and the results of each step are discussed in detail in the following subsections.

processing technique can also be adopted to enhance the crack detection ability if necessary.

Since no product on the market integrates this camera with a UAV, our research team developed a gimbal system that enables the UAV to carry the camera. A corresponding software system was also developed to enable real-time control of the camera's three-axis rotation and shooting action via the UAV remote controller.

Fig. 15 displays some photos of the UAV at work. The UAV was manually controlled to fly and photograph the G7 bridge from various angles and distances. A total of 1577 images were obtained.
Table 4. Parametric analysis of the dense cloud quality setting (columns: quality setting, depth map generation, dense cloud generation, number of points, and file size/GB).
Fig. 16. Built point cloud of the G7 bridge (dense cloud quality: low).
Fig. 17. Visualized comparison between prediction and ground truth of the G7 bridge.
Table 5. Quantitative evaluation of semantic segmentation of the G7 bridge (metrics for each class and global metrics).
For images with 9504 × 6336 pixels, selecting a low quality can already generate 95,224,214 points with a point density similar to that of the dataset described in Section 3.3. Therefore, for this experiment and similar cases, a low dense cloud quality setting is recommended. The built point cloud is shown in Fig. 16. Details such as the drainage pipe, height limit mark and pier surface texture of the G7 bridge are quite clear, indicating that the scene reconstruction quality is good and that the point density meets the requirements. The subsequent processing and analysis are based on this point cloud.

6.3. Point cloud semantic segmentation

The built 3D point cloud of the G7 bridge was directly fed into the trained RandLA-BridgeNet to obtain a point cloud with predicted semantic labels. The prediction was obtained extremely quickly, with a processing time of only 125.4 s for the dense cloud containing approximately 95 million points from the approximately 40-m-long G7 bridge. CloudCompare was used to manually annotate the point cloud as the ground truth for further comparison. The comparison between the prediction and the ground truth of the G7 bridge is visualized in Fig. 17. The results show a small visual difference between the prediction and the ground truth, indicating that the semantic segmentation was generally successful.

The quantitative assessment results are presented in Table 5. The overall prediction performance is excellent, with the model achieving an OA of 97.0%, which is similar to that of the test set. The segmentation of the four classes, namely, background, pier, superstructure and parapet, is satisfactory, with F1 scores above 90% for each class.

6.4. Image ROI extraction

The identification targets of this experiment are the cracks on the concrete surfaces of the piers. A total of 26 images containing crack information on pier surfaces were selected as the data source for crack identification. The 3D ROI corresponding to each image is the pier to which the cracked concrete surface in the image belongs; it can be easily extracted from the semantically segmented point cloud of the G7 bridge, as described in Section 4.1. By batch processing the 26 images using the algorithm described in Section 4.3, images containing only the 2D ROIs (the cracked concrete surfaces of interest) were obtained.

Fig. 18 illustrates the ROI extraction results of some typical images. These images demonstrate the challenges mentioned by other researchers [19–21], showing that directly using a deep learning method for concrete component recognition in 2D images can be problematic. For the first three images, a deep learning method operating on the 2D images would recognize all bridge piers as ROIs and keep them, making it impossible to obtain the desired results directly. The fourth image was taken in poor lighting conditions underneath the bridge, so the pier surface appears dark while the abutment surface in the background appears brighter. Because of this, a 2D-image-based deep learning method is likely to misidentify the brighter abutment surface in the background as an ROI. The methodology proposed in this paper avoids this problem in principle, reasonably removing the background pixels and preserving the concrete surface of interest.

The aforementioned image ROI extraction can be regarded as a binary classification task to quantitatively evaluate its effectiveness. Specifically, this step involves classifying all pixels in each image as either background or ROI. The 26 images were manually annotated by removing the background to generate the ground truth. Table 6 presents the pixel-level evaluation metrics. There is only a slight difference between the evaluation metrics for the background and the ROI. Moreover, all the evaluation metrics exceed 97%, indicating that the boundaries between the background and ROIs were accurately detected.

Table 6
Quantitative evaluation of the image ROI extraction.
Metric       Background   ROI      Global metric
Precision    97.1%        99.9%    OA: 98.9%
Recall       99.9%        98.2%    AR: 99.1%
IoU          97.0%        98.2%    mIoU: 97.6%
F1 score     98.5%        99.1%    Average F1 score: 98.8%

6.5. Crack identification

The images before and after ROI extraction were processed by the crack identification method. The crack identification results were obtained by extracting grid cells containing cracks from the GCBD fusion model, as shown in Fig. 19. When the background is excluded, the crack identification accuracy improves. Lines such as beams, branches, and the interface between the ROI and the background can easily be misidentified as cracks when the background has not been excluded. Excluding the background not only eliminates background line interference but also improves the accuracy of crack identification near the interface between the foreground and background.

Fig. 19. Crack identification results of typical images before and after ROI extraction.

Table 7
Crack identification results.
Image number   Grid cells in ROI   Grid cells in background   Misidentification rate
1              144                 149                        50.9%
2              160                 293                        64.7%
3              56                  44                         44.0%
4              193                 182                        48.5%
5              117                 30                         20.4%
6              149                 192                        56.3%
7              146                 10                         6.4%
8              129                 20                         13.4%
9              57                  65                         53.3%
10             36                  133                        78.7%
11             141                 216                        60.5%
12             369                 60                         14.0%
13             142                 60                         29.7%
14             67                  204                        75.3%
15             45                  246                        84.5%
16             59                  289                        83.0%
17             210                 65                         23.6%
18             134                 23                         14.6%
19             113                 19                         14.4%
20             98                  34                         25.8%
21             99                  33                         25.0%
22             92                  37                         28.7%
23             173                 168                        49.3%
24             116                 136                        54.0%
25             163                 140                        46.2%
26             111                 208                        65.2%
All images     3319                3056                       47.9%

The quantitative results of the crack identification are listed in Table 7, in which the misidentification rate denotes the ratio of the number of grid cells in the background to the total number of grid cells. Overall, the ROI extraction operation filters out 47.9% of the grid cells (3056 of the 6375 grid cells detected across all images); these filtered grid cells are misidentified background cells. Therefore, ROI extraction can effectively improve crack identification accuracy. Eliminating the interference of a complex background makes the network focus on the ROI, which is more consistent with the training data
distribution and enhances the robustness of the network.

After crack extraction, the threshold segmentation method introduced in Section 5.2 is used to further segment the cracks, as shown in Fig. 20. The proposed method can segment cracks effectively and can be used for subsequent crack assessment to assist with sophisticated maintenance decisions.

7. Conclusions

Accurately detecting cracks in concrete surface images with complex backgrounds is a challenging task. To improve the results of this task, an image ROI extraction methodology based on 3D point cloud semantic segmentation and 3D-to-2D projection is presented in this paper. First, a deep-learning-based semantic segmentation network, RandLA-BridgeNet, for large-scale bridge point clouds is constructed. A real-world bridge point cloud dataset is established for training and testing the network. Using the entire point cloud of the scene as input, RandLA-BridgeNet can perform semantic segmentation accurately and efficiently, achieving mIoUs of 91.6% and 91.1% on the validation set and the test set, respectively. Then, the 3D ROIs (concrete components of interest) are easily extracted from the segmented point cloud and projected into the corresponding images according to the pinhole camera model and camera pose information. Next, the alpha shape algorithm is used to detect the boundaries of the projected 2D ROI and remove the background, generating images that contain only the ROIs (concrete surfaces of interest). Finally, improved deep-learning-based crack identification can be performed using these processed images.

The methodology was validated by an experiment on an approximately 40-m-long bridge along the G7 Beijing-Xinjiang Expressway in China. For the point cloud reconstructed from 1577 UAV aerial images and containing approximately 95 million points, the inference of RandLA-BridgeNet took only 125.4 s. RandLA-BridgeNet achieved excellent semantic segmentation results, with F1 scores of 98.2%, 91.2%, 96.8% and 90.3% for the background, pier, superstructure and parapet, respectively. Image ROI extraction was performed on 26 images containing concrete surface cracks, with the overall extraction accuracy reaching 98.9%. A grid-based classification and box-based detection fusion model was used to identify cracks in the images. After ROI extraction, 47.9% of the grid cells, which represent background misrecognition, were filtered out, greatly improving the crack identification accuracy.

The presented methodology integrates point cloud semantic segmentation and 3D-to-2D projection technologies into the UAV-based bridge crack detection task, contributing to advancements in the field. As indicated by the field experimental validation presented in Section 6, the methodology framework shown in Fig. 1 has much potential for practical UAV-based bridge inspection applications and achieves impressive crack detection results when handling images containing complex background information.

However, some limitations still exist and call for future research efforts:

(1) Due to the relatively limited scenarios covered by the training data, the semantic segmentation network has limited applicability to various scenarios. A large open-source point cloud database that covers more bridge types needs to be established.

(2) The parameter setting method of the alpha shape algorithm needs to be further studied to extract the 2D ROI boundary accurately for bridge components or surfaces with holes.

(3) In the presented experiment, the manually controlled UAV flight was cumbersome and inefficient, requiring large battery consumption. Automatic path planning and control methods for UAV bridge inspection tasks need to be developed, and multiple UAVs may collaborate to further improve efficiency.

(4) The crack identification model used in this study was trained on an asphalt pavement image dataset due to the lack of available surface crack image datasets for concrete bridges. Although the identification results are generally satisfactory, transfer learning and fine-tuning could feasibly further improve performance. The main feature extraction layers of the asphalt pavement weights can be frozen, and a small amount of concrete surface data can be labeled to train the fusion model so that the model can better adapt to the data distribution of concrete bridge cracks. From another perspective, establishing a large concrete bridge crack image dataset for training a new crack identification model would also be meaningful.

CRediT authorship contribution statement

Jing-Lin Xiao: Data curation, Investigation, Methodology, Visualization, Writing – original draft. Jian-Sheng Fan: Conceptualization, Funding acquisition, Writing – review & editing, Supervision. Yu-Fei Liu: Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing. Bao-Luo Li: Investigation, Validation, Visualization. Jian-Guo Nie: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

The research is supported by the National Natural Science Foundation of China (No. 52192662 and 52121005). The authors express sincere appreciation for the support.

References

[1] K. Chaiyasarn, A. Buatik, H. Mohamad, M. Zhou, S. Kongsilp, N. Poovarodom, Integrated pixel-level CNN-FCN crack detection via photogrammetric 3D texture mapping of concrete structures, Automation in Construction 140 (2022) 104388, https://doi.org/10.1016/j.autcon.2022.104388.
[2] S.Y. Kong, J.S. Fan, Y.F. Liu, X.C. Wei, X.W. Ma, Automated crack assessment and quantitative growth monitoring, Comput. Aided Civ. Inf. Eng. 36 (2021) 656–674, https://doi.org/10.1111/mice.12626.
[3] X. Tan, A. Abu-Obeidah, Y. Bao, H. Nassif, W. Nasreddine, Measurement and visualization of strains and cracks in CFRP post-tensioned fiber reinforced concrete beams using distributed fiber optic sensors, Automation in Construction 124 (2021) 103604, https://doi.org/10.1016/j.autcon.2021.103604.
[4] B.A. Graybeal, B.M. Phares, D.D. Rolander, M. Moore, G. Washer, Visual inspection of highway bridges, J. Nondestruct. Eval. 21 (3) (2002) 67–83, https://doi.org/10.1023/A:1022508121821.
[5] Y. Liu, S. Cho, B.F. Spencer, J. Fan, Automated assessment of cracks on concrete surfaces using adaptive digital image processing, Smart Struct. Syst. 14 (4) (2014) 719–741, https://doi.org/10.12989/sss.2014.14.4.719.
[6] R. Ali, J.H. Chuah, M.S.A. Talip, N. Mokhtar, M.A. Shoaib, Structural crack detection using deep convolutional neural networks, Automation in Construction 133 (2022) 103989, https://doi.org/10.1016/j.autcon.2021.103989.
[7] A. Zhang, K.C.P. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J.Q. Li, C. Chen, Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network, Comput. Aided Civ. Inf. Eng. 32 (10) (2017) 805–819, https://doi.org/10.1111/mice.12297.
[8] S. Dorafshan, R.J. Thomas, M. Maguire, Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete, Constr. Build. Mater. 186 (2018) 1031–1045, https://doi.org/10.1016/j.conbuildmat.2018.08.011.
[9] C.V. Dung, L.D. Anh, Autonomous concrete crack detection using deep fully convolutional neural network, Automation in Construction 99 (2019) 52–58, https://doi.org/10.1016/j.autcon.2018.11.028.
[10] S. Bang, S. Park, H. Kim, H. Kim, Encoder-decoder network for pixel-level road crack detection in black-box images, Comput. Aided Civ. Inf. Eng. 34 (8) (2019) 713–727, https://doi.org/10.1111/mice.12440.
[11] C. Xiang, W. Wang, L. Deng, P. Shi, X. Kong, Crack detection algorithm for concrete structures based on super-resolution reconstruction and segmentation network, Automation in Construction 140 (2022) 104346, https://doi.org/10.1016/j.autcon.2022.104346.
[12] P. Guo, X. Meng, W. Meng, Y. Bao, Monitoring and automatic characterization of cracks in strain-hardening cementitious composite (SHCC) through intelligent interpretation of photos, Compos. Part B Eng. 242 (2022) 110096, https://doi.org/10.1016/j.compositesb.2022.110096.
[13] S. Poorghasem, Y. Bao, Review of robot-based automated measurement of vibration for civil engineering structures, Measurement 207 (2023) 112382, https://doi.org/10.1016/j.measurement.2022.112382.
[14] E. Ranyal, A. Sadhu, K. Jain, Road condition monitoring using smart sensing and artificial intelligence: a review, Sensors 22 (8) (2022) 3044, https://doi.org/10.3390/s22083044.
[15] C. Chen, S. Chandra, Y. Han, H. Seo, Deep learning-based thermal image analysis for pavement defect detection and classification considering complex pavement conditions, Remote Sens. (Basel) 14 (1) (2022) 106, https://doi.org/10.3390/rs14010106.
[16] J. Guan, X. Yang, L. Ding, X. Cheng, V.C.S. Lee, C. Jin, Automated pixel-level pavement distress detection based on stereo vision and deep learning, Automation in Construction 129 (2021) 103788, https://doi.org/10.1016/j.autcon.2021.103788.
[17] Y.F. Liu, S. Cho, B.F. Spencer, J.S. Fan, Concrete crack assessment using digital image processing and 3D scene reconstruction, J. Comput. Civ. Eng. 30 (1) (2016) 04014124, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000446.
[18] Y.F. Liu, X. Nie, J.S. Fan, X.G. Liu, Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction, Comput. Aided Civ. Inf. Eng. 35 (2020) 511–529, https://doi.org/10.1111/mice.12501.
[19] Y. Narazaki, V. Hoskere, T.A. Hoang, Y. Fujino, A. Sakurai, B.F. Spencer, Vision-based automated bridge component recognition with high-level scene consistency, Comput. Aided Civ. Inf. Eng. 35 (2020) 465–482, https://doi.org/10.1111/mice.12505.
[20] N. Saovana, N. Yabuki, T. Fukuda, Development of an unwanted-feature removal system for structure from motion of repetitive infrastructure piers using deep learning, Adv. Eng. Inform. 46 (2020) 101169, https://doi.org/10.1016/j.aei.2020.101169.
[21] S.O. Sajedi, X. Liang, Uncertainty-assisted deep vision structural health monitoring, Comput. Aided Civ. Inf. Eng. 36 (2021) 126–142, https://doi.org/10.1111/mice.12580.
[22] Y. Xie, J. Tian, X.X. Zhu, Linking points with labels in 3D: a review of point cloud semantic segmentation, IEEE Geoscience and Remote Sensing Magazine 8 (4) (2020) 38–59, https://doi.org/10.1109/MGRS.2019.2937630.
[23] Y. Guo, H. Wang, Q. Hu, H. Liu, M. Bennamoun, Deep learning for 3D point clouds: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 43 (12) (2021) 4338–4364, https://doi.org/10.1109/TPAMI.2020.3005434.
[24] D. Maturana, S. Scherer, VoxNet: A 3D convolutional neural network for real-time object recognition, in: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, 2015, pp. 922–928, https://doi.org/10.1109/IROS.2015.7353481.
[25] Y. Zhou, O. Tuzel, VoxelNet: End-to-end learning for point cloud based 3D object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018, pp. 4490–4499, https://doi.org/10.1109/CVPR.2018.00472.
[26] C.R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, L.J. Guibas, Volumetric and multi-view CNNs for object classification on 3D data, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016, pp. 5648–5656, https://doi.org/10.1109/CVPR.2016.609.
[27] H. Su, S. Maji, E. Kalogerakis, E. Learned-Miller, Multi-view convolutional neural networks for 3D shape recognition, in: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015, pp. 945–953, https://doi.org/10.1109/ICCV.2015.114.
[28] A. Boulch, J. Guerry, B.L. Saux, N. Audebert, SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks, Computers & Graphics 71 (2018) 189–198, https://doi.org/10.1016/j.cag.2017.11.010.
[29] A. Milioto, I. Vizzo, J. Behley, C. Stachniss, RangeNet++: Fast and accurate LiDAR semantic segmentation, in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Venetian Macao, 2019, pp. 4213–4220, https://doi.org/10.1109/IROS40897.2019.8967762.
[30] C.R. Qi, H. Su, K. Mo, L.J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, 2017, pp. 652–660, https://doi.org/10.1109/CVPR.2017.16.
[31] C.R. Qi, L. Yi, H. Su, L.J. Guibas, PointNet++: Deep hierarchical feature learning on point sets in a metric space, in: Advances in Neural Information Processing Systems, Long Beach, 2017, pp. 5099–5108, https://dl.acm.org/doi/abs/10.5555/3295222.3295263.
[32] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, B. Chen, PointCNN: Convolution on X-transformed points, in: Advances in Neural Information Processing Systems, Montreal, 2018, pp. 820–830, https://dl.acm.org/doi/10.5555/3326943.3327020.
[33] Y. Wang, Y. Sun, Z. Liu, S.E. Sarma, M.M. Bronstein, J.M. Solomon, Dynamic graph CNN for learning on point clouds, ACM Trans. Graph. 38 (5) (2019) 1–12, https://doi.org/10.1145/3326362.
[34] L. Landrieu, M. Simonovsky, Large-scale point cloud semantic segmentation with superpoint graphs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018, pp. 4558–4567, https://doi.org/10.1109/CVPR.2018.00479.
[35] Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, A. Markham, RandLA-Net: Efficient semantic segmentation of large-scale point clouds, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020, pp. 11108–11117, https://doi.org/10.1109/CVPR42600.2020.01112.
[36] I. Armeni, O. Sener, A.R. Zamir, H. Jiang, I. Brilakis, M. Fischer, S. Savarese, 3D semantic parsing of large-scale indoor spaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016, pp. 1534–1543, https://doi.org/10.1109/CVPR.2016.170.
[37] T. Hackel, N. Savinov, L. Ladicky, J.D. Wegner, K. Schindler, M. Pollefeys, Semantic3D.net: A new large-scale point cloud classification benchmark, arXiv preprint arXiv:1704.03847, 2017, https://doi.org/10.48550/arXiv.1704.03847.
[38] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, J. Gall, SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences, in: Proceedings of the IEEE International Conference on Computer Vision, Seoul, 2019, pp. 9297–9307, https://doi.org/10.1109/ICCV.2019.00939.
[39] B. Riveiro, M.J. DeJong, B. Conde, Automated processing of large point clouds for structural health monitoring of masonry arch bridges, Automation in Construction 72 (2016) 258–268, https://doi.org/10.1016/j.autcon.2016.02.009.
[40] Y. Yan, J.F. Hajjar, Automated extraction of structural elements in steel girder bridges from laser point clouds, Automation in Construction 125 (2021) 103582, https://doi.org/10.1016/j.autcon.2021.103582.
[41] R. Lu, I. Brilakis, C.R. Middleton, Detection of structural components in point clouds of existing RC bridges, Comput. Aided Civ. Inf. Eng. 34 (2019) 191–212, https://doi.org/10.1111/mice.12407.
[42] L. Truong-Hong, R. Lindenbergh, Automatically extracting surfaces of reinforced concrete bridges from terrestrial laser scanning point clouds, Automation in Construction 135 (2022) 104127, https://doi.org/10.1016/j.autcon.2021.104127.
[43] H. Kim, J. Yoon, S.H. Sim, Automated bridge component recognition from point clouds using deep learning, Struct. Control Health Monit. 27 (2020) e2591, https://doi.org/10.1002/stc.2591.
[44] H. Kim, C. Kim, Deep-learning-based classification of point clouds for bridge inspection, Remote Sens. (Basel) 12 (22) (2020) 3757, https://doi.org/10.3390/rs12223757.
[45] J.S. Lee, J. Park, Y.M. Ryu, Semantic segmentation of bridge components based on hierarchical point cloud model, Automation in Construction 130 (2021) 103847, https://doi.org/10.1016/j.autcon.2021.103847.
[46] X. Yang, E.R. Castillo, Y. Zou, L. Wotherspoon, Y. Tan, Automated semantic segmentation of bridge components from large-scale point clouds using a weighted superpoint graph, Automation in Construction 142 (2022) 104519, https://doi.org/10.1016/j.autcon.2022.104519.
[47] Y. Jing, B. Sheil, S. Acikgoz, Segmentation of large-scale masonry arch bridge point clouds with a synthetic simulator and the BridgeNet neural network, Automation in Construction 142 (2022) 104459, https://doi.org/10.1016/j.autcon.2022.104459.
[48] H. Edelsbrunner, D. Kirkpatrick, R. Seidel, On the shape of a set of points in the plane, IEEE Trans. Inf. Theory 29 (1983) 551–559, https://doi.org/10.1109/TIT.1983.1056714.
[49] MATLAB R2022a, The MathWorks Inc., Natick, MA, https://ww2.mathworks.cn/help/matlab/, 2022 (accessed May 14, 2023).
[50] B.L. Li, Y. Qi, J.S. Fan, Y.F. Liu, C. Liu, A grid-based classification and box-based detection fusion model for asphalt pavement crack, Comput. Aided Civ. Inf. Eng. (2022), https://doi.org/10.1111/mice.12962.
[51] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. 9 (1979) 62–66, https://doi.org/10.1109/TSMC.1979.4310076.
[52] Agisoft LLC, Agisoft Metashape user manual: Professional edition, Version 2.0, https://www.agisoft.com/pdf/metashape-pro_2_0_en.pdf, 2023.
[53] C.F. Özgenel, Concrete crack images for classification, Mendeley Data V2 (2019), https://doi.org/10.17632/5y9wdsg2zt.2.
[54] J. Liu, X. Yang, S. Lau, X. Wang, S. Luo, V.C.S. Lee, L. Ding, Automated pavement crack detection and segmentation based on two-step convolutional neural network, Comput. Aided Civ. Inf. Eng. 35 (11) (2020) 1291–1305, https://doi.org/10.1111/mice.12622.