Automation in Construction
Keywords: Unmanned aircraft vehicles (UAVs); Bridge inspection; Structure from motion (SfM); Large-scale point clouds; Semantic segmentation; 3D-to-2D projection; Crack identification; Deep learning

Abstract: For digital-image-based bridge inspection tasks, images captured by camera-carrying unmanned aircraft vehicles (UAVs) usually contain both the region of interest (ROI) and the background. However, accurately detecting cracks in concrete surface images containing background information is challenging. To improve UAV-based bridge inspection, an image ROI extraction and crack detection methodology is presented in this paper. First, a deep-learning-based semantic segmentation network, RandLA-BridgeNet, for large-scale bridge point clouds, which can facilitate 3D ROI extraction, is trained and tested. Second, an image ROI extraction method based on 3D-to-2D projection is presented to generate images containing only the ROI. Finally, a data-driven deep learning convolutional neural network (CNN) called the grid-based classification and box-based detection fusion model (GCBD) is utilized to identify cracks in the processed images. An experiment is conducted on highway bridge images to validate the presented methodology. The overall semantic segmentation and image ROI extraction accuracies are 97.0% and 98.9%, respectively. After ROI extraction, 47.9% of the grid cells, which represent background misrecognition, are filtered out, greatly improving the crack identification accuracy.
https://doi.org/10.1016/j.autcon.2023.105226
Received 17 May 2023; Received in revised form 25 November 2023; Accepted 27 November 2023; Available online 8 December 2023
The part of the image other than the ROI is defined as the background. The aforementioned crack identification studies primarily used images containing only the ROI as the input of deep neural networks. When an image contains both an ROI and background, the deep-learning-based crack identification algorithm may misidentify some parts of the background as cracks, thus affecting the quality of crack recognition.

As reviewed by Poorghasem et al. [13] and Ranyal et al. [14], various robot-based automated systems, e.g., unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) carrying vision systems such as cameras and optical lenses, have been developed to facilitate image collection. Different robot-based automated systems and computer vision methods can be combined for different engineering scenarios. Currently, robot-based image collection and digital-image-based crack identification are relatively mature for road pavement and tunnel surface scenarios [15,16]. In these two scenarios, pavement inspection vehicles and tunnel inspection vehicles, respectively, can be used to take digital images under fully controlled conditions. The obtained images usually cover only the surfaces of the inspected structure (the ROI), with no extraneous backgrounds, and are of high quality. However, the spatial distribution of bridge surfaces is much more complex than that of road pavements and tunnel surfaces, so contact inspection vehicles cannot be applied in this scenario. Instead, camera-carrying UAVs are required to capture bridge surface images. UAVs have six degrees of freedom in flight, and the scope of the photographed scene is often difficult to control. Therefore, images taken by UAVs inevitably contain extraneous background and are not suitable for direct use in crack identification.

In addition to crack identification, visualized crack localization is also a matter of concern for structural health monitoring and inspection. Liu et al. [17,18] fused two-dimensional (2D) digital image processing technology and three-dimensional (3D) reconstruction technology to achieve crack identification and localization. First, many 2D digital images taken in the field were used to complete the 3D reconstruction of the structure by structure from motion (SfM). Then, the images containing crack information were selected for crack identification. Finally, the projection method completed crack localization. Generally, images taken at a close distance with fine details improve crack identification, while images taken at a long distance that contain scene geometric information improve the success rate of 3D reconstruction. For images focusing on a local area of the concrete surface, state-of-the-art technology can accurately complete crack identification. However, images containing only flat, smooth, feature-sparse structural surfaces are not sufficient for SfM to succeed. For successful 3D reconstruction, images that capture background information such as ground and vegetation are better, as their feature points are more abundant. From this perspective, the background does not need to be excluded during UAV-based image capture.

Therefore, it is an objective requirement for UAV-based bridge inspection to accurately identify cracks from images with background. Extracting the ROI from the images is the best way to achieve this goal. Some recent researchers have extracted ROIs from 2D images based on deep learning semantic segmentation algorithms. Taking bridge structures as an example, Narazaki et al. [19] constructed a semantic segmentation algorithm containing 45 convolutional layers for bridge component recognition after performing scene classification. A MATLAB GUI image semantic segmentation annotation tool was developed specifically to manually annotate thousands of images, which were combined with an existing database to generate training data. Saovana et al. [20] trained a deep CNN (DCNN) for removing irrelevant features from bridge images. The training data were realistic scene images manually annotated using LabelMe, and the number of samples was expanded by rotating images and adjusting their brightness. Using 236 highway bridge images, Sajedi et al. [21] trained a Fully Convolutional DenseNet (FC-DenseNet) to extract different kinds of bridge components from the images. These studies show that the main challenges for bridge component recognition in 2D images are as follows:

(1) Bridge images may contain complex background information, and the background can sometimes be so large that the bridge is not the dominant object in the image.

(2) The characteristics of the same type of bridge components may differ in images with different shooting distances and lighting conditions.

(3) Deep learning-based semantic segmentation algorithms rely heavily on the quality and quantity of training data. Unfortunately, there is currently a scarcity of open-source annotated data, and the available scenarios are not sufficiently comprehensive. Consequently, the reliability and portability of the trained network may suffer. Due to the diverse range of scenes encountered in practical engineering, annotating images for each scene class separately is time-consuming, and building a general dataset is extremely challenging.

Considering the opportunities and limitations mentioned above, this paper proposes a methodology for extracting image ROIs based on 3D point cloud segmentation and 3D-to-2D projection, aiming to improve crack identification from bridge images taken by UAVs that contain background information. Instead of directly extracting bridge components from the images as in previous studies [19–21], the proposed methodology achieves this goal indirectly. It integrates point cloud semantic segmentation and 3D-to-2D projection technologies into the UAV-based bridge crack detection task, contributing to advancements in the field. The proposed methodology has both practical purposes (as shown in Section 2) and great potential to improve crack detection results when handling images containing complex background information. A highway bridge is taken as an example in this study to facilitate discussion, but the proposed methodology is also applicable to other engineering scenarios.

2. Methodology framework

The framework of the proposed methodology is illustrated in Fig. 1. For the inspection task of concrete bridges discussed in this paper, the inspector takes numerous images manually or using a UAV. These images contain information about both the spatial composition of the scene and the cracks on the concrete surface. This information is used not only for performing SfM to reconstruct the 3D point cloud of the bridge but also for subsequently identifying cracks. First, RandLA-Net, a deep learning framework for semantic segmentation of large-scale point clouds, is adopted to construct a point cloud semantic segmentation network, RandLA-BridgeNet, for highway bridges. Bridge point clouds from an open-source dataset are annotated to train and test the network. A large-scale bridge point cloud can be input into RandLA-BridgeNet directly to complete semantic segmentation, and then the 3D ROI can be easily extracted from the point cloud. Second, for each image containing the bridge components to be inspected, the 3D-to-2D projection is performed based on the pinhole camera model. This step, calculating the projection of the 3D ROI in the 2D image (i.e., the 2D ROI), is essentially the inverse process of SfM 3D reconstruction. Next, an edge detection algorithm is used to find the outer contour of the 2D ROI and generate a mask. The background pixels outside the outer contour of the 2D ROI are removed using the mask, and the 2D ROI is extracted, producing an image containing only the ROI. Finally, the ROI image is used for crack identification, effectively avoiding background interference with the identification algorithm.
Notably, the proposed methodology framework is not limited to the adopted SfM-based 3D reconstruction technique. It can still be applied with minor adjustments when using other 3D reconstruction techniques and 2D digital images for bridge disease detection.

The remainder of this paper is organized as follows: Section 3 describes the point cloud semantic segmentation method used to extract 3D ROIs; Section 4 describes the 3D-to-2D projection and 2D ROI extraction method; Section 5 describes the digital-image-based crack identification method; Section 6 describes the experimental study on a real bridge for validating the proposed methodology; and Section 7 concludes this work.
3. Point cloud segmentation technique for extracting 3D ROI

3.1. Overview of point cloud segmentation

3D point cloud segmentation is a key research area in computer vision that aims to classify each point of the point cloud into one of several classes based on its spatial location, color features, semantic information, etc. Classic segmentation methods include edge-based techniques, region growing, model fitting, and unsupervised clustering [22]. More recently, supervised deep learning methods have gained prominence [23], with voxel-based [24–26], multiview-based [27–29], and point cloud-based [30–35] methods emerging. Among these, point cloud-based methods have become common due to their ability to avoid the partial information loss caused by data preprocessing.

Some open-source deep learning semantic segmentation frameworks based on point clouds have been proposed, starting with the classical PointNet by Qi et al. [30]. PointNet directly uses 3D point clouds as input and has become the basis for many subsequently proposed methods. This framework, however, focuses too heavily on global features, ignores local features, and does not consider the adverse effects of uneven point cloud density, which makes adapting it to complex scenes difficult. Thus, Qi et al. [31] proposed PointNet++, which overcomes the problems of feature extraction methods to a certain extent. However, it adopts the K-nearest neighbor search method, which may lead to the concentration of sampling points in one direction. Point cloud data are usually disordered and have density inhomogeneity. Therefore, Li et al. [32] proposed PointCNN to learn the local relationships of point clouds in space, which effectively reduces the time and space complexity of segmentation. To address PointNet ignoring the correlation between neighboring points, Wang et al. [33] proposed a graph convolution-based DGCNN, which includes an EdgeConv operation that captures the distance information between each point and its neighboring points to learn edge features. While these frameworks are suitable for small-scale scenarios, they require block sampling when handling large-scale point clouds. More specifically, large-scale point clouds must be cut into 1 m × 1 m small blocks, and then each block must be sampled to obtain 4096 points as the network input [30–33]. To fully adapt to large-scale point clouds, Landrieu et al. [34] proposed the SPG framework based on the superpoint graph. SPG first divides the point cloud into geometrically simple but meaningful sets of superpoints, forms a superpoint graph, and then embeds each superpoint into a PointNet for semantic segmentation. However, dividing the superpoints is difficult to implement and prone to classification errors. Hu et al. [35] proposed RandLA-Net, a new framework that can directly handle large-scale point clouds. RandLA-Net achieved good segmentation results on the large public indoor and outdoor datasets S3DIS [36], Semantic3D [37] and SemanticKITTI [38].

In recent years, there have been several studies on point cloud segmentation of bridges. Due to the lack of a high-quality annotated bridge point cloud database, some researchers have suggested learning-independent segmentation methods [39–42]. Using the normal information of the points, Riveiro et al. [39] proposed a voxel-based method to recognize vertical and nonvertical components from the point cloud of a masonry arch bridge to divide the arch bridge into different parts. Yan et al. [40] proposed a heuristic algorithm for extracting structural components from the point clouds of steel bridges. Lu et al. [41] proposed a top-down point cloud segmentation algorithm for reinforced concrete bridges to complete bridge component recognition by stepwise classification. Truong-Hong et al. [42] used a cell- and voxel-based region growing method to extract surfaces individually from point clouds of reinforced concrete bridges. In conclusion, these segmentation methods use only one or a combination of classic point cloud segmentation techniques and depend heavily on domain knowledge such as geometric features specific to particular bridge types, e.g., common dimensions, axial orientation, and relative position relationships of components (piers, cap beams, girders, etc.). Therefore, these methods are only applicable to specific bridge types, and extending them to different scenarios may result in serious errors.

To address these issues, some scholars have applied deep learning-based methods to semantic segmentation of bridge point clouds. Kim et al. [43,44] utilized PointNet, PointCNN and DGCNN for bridge point cloud segmentation, and the three methods performed similarly overall. However, these methods require block sampling operations along the longitudinal direction of the bridge point cloud, and the size and overlap of the sampled blocks impact the segmentation results. Lee et al. [45] proposed a hierarchical DGCNN (HGCNN) based on PointNet and DGCNN, which effectively improved the recognition of electric poles on bridges. Yang et al. [46] utilized a weighted SPG to directly process large-scale bridge point clouds, which performs better than PointNet and DGCNN and does not require block sampling. Referring to PointNet++, Jing et al. [47] developed BridgeNet for point cloud segmentation of masonry arch bridges and identified the bridge geometric parameters based on the segmented point clouds.

3.2. Proposed segmentation network RandLA-BridgeNet

3D point clouds of bridges obtained through SfM reconstruction often contain millions of points or more. While the classic PointNet framework and its variations are widely used, they rely on block sampling techniques to handle large-scale point clouds; these techniques can be sensitive to sampling parameters and may affect the segmentation results. To address this issue, the deep learning framework RandLA-Net [35] is adopted in this study to develop a robust point cloud semantic segmentation network called RandLA-BridgeNet, which directly takes the entire bridge point cloud as input. The network architecture is illustrated in Fig. 2. The network adopts an encoder-decoder architecture with residual connections. The input point cloud is progressively downsampled to extract the features of each point using a shared multilayer perceptron (MLP). Then, four encoding and decoding layers are utilized to learn the features of the points. Finally, three fully connected (FC) layers and a dropout layer are applied to predict the semantic label of each point. Based on RandLA-Net [35], RandLA-BridgeNet follows most of the default parameter settings while adjusting the class definitions and the loss function to apply to the bridge point cloud dataset. Since the number of points in each class of the bridge dataset differs greatly, the proportion of each class is calculated by dividing the number of points in that class by the total number of points in the dataset; the value 1/(proportion + 0.02) is then used as the weight for that class in the loss function.
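The class-weighting rule just described can be sketched as follows. This is a minimal NumPy example; the class names and point counts are illustrative, not taken from the paper's dataset.

```python
import numpy as np

def class_weights(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Per-class loss weights as described in Section 3.2:
    weight_i = 1 / (proportion_i + 0.02), where proportion_i is the
    fraction of all points that belong to class i."""
    counts = np.bincount(labels, minlength=n_classes).astype(np.float64)
    proportion = counts / counts.sum()
    return 1.0 / (proportion + 0.02)

# Illustrative point counts for background, pier, superstructure, parapet:
labels = np.repeat([0, 1, 2, 3], [600_000, 40_000, 250_000, 30_000])
print(class_weights(labels, 4))   # rare classes receive larger weights
```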
To process a million-scale bridge point cloud directly with a deep neural network, it is necessary to gradually downsample while retaining as much geometric structure information as possible. Among the available sampling methods, farthest point sampling (FPS), inverse density importance sampling (IDIS), and generator-based sampling (GS) are computationally expensive, while continuous relaxation-based sampling (CRS) is demanding on GPU memory, and policy gradient-based sampling (PGS) has difficulty learning effective sampling strategies. Therefore, RandLA-BridgeNet adopts random sampling (RS), which is computationally efficient and has low memory overhead. However, RS results in a loss of useful information. To mitigate this issue, the network incorporates a local feature aggregation (LFA) module that complements RS. Fig. 2 illustrates the LFA module consisting of three submodules: Local Spatial Encoding (LocSE), Attentive Pooling, and Dilated Residual Block. The LocSE submodule encodes the 3D coordinate information and extracts neighborhood point features, enabling the network to better learn the geometric structure of the space from the relative position and distance information of points. Attentive pooling automatically learns and aggregates useful information from neighboring point features. The dilated residual block connects two sets of LocSE and attentive pooling units to cost-effectively increase the receptive field of each point and facilitate feature propagation between neighboring points. The entire LFA module preserves the overall geometric details of the input point cloud even if the features of some points are randomly discarded by random downsampling.

3.3. Bridge point cloud dataset

The bridge point clouds were manually annotated into the classes of background, pier, superstructure and parapet and assigned the corresponding ground truth semantic labels. The final point cloud data used to train the network consisted of spatial location (XYZ), color (RGB), and semantic label information.
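A minimal sketch of how such training data can be assembled is shown below. The file name and plain-text column layout are assumptions for illustration, not the authors' published format.

```python
import numpy as np

# Assumed layout, one point per row: x, y, z, r, g, b, label
# (label indices: 0 = background, 1 = pier, 2 = superstructure, 3 = parapet).
cloud = np.loadtxt("bridge_01.txt")           # hypothetical file name
xyz, rgb = cloud[:, 0:3], cloud[:, 3:6]       # spatial location and color
labels = cloud[:, 6].astype(np.int64)         # ground-truth semantic labels
```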
3.4. Network training and testing
Table 1. Metadata of the bridge point cloud dataset (columns: bridge number and number of points).
Fig. 4. Visualized comparison between prediction and ground truth of the test set.
Table 2. Quantitative evaluation of semantic segmentation of the test set (metrics for each class and global metrics).
The per-class metrics and global metrics are calculated as

$$
\left\{
\begin{aligned}
\mathrm{Precision}_i &= \frac{TP_i}{TP_i + FP_i}\\
\mathrm{Recall}_i &= \frac{TP_i}{TP_i + FN_i}\\
\mathrm{IoU}_i &= \frac{TP_i}{TP_i + FN_i + FP_i}\\
(\mathrm{F1\ score})_i &= \frac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}\\
\mathrm{OA} &= \frac{\sum_{i=1}^{n} TP_i}{N}\\
\mathrm{AR} &= \frac{\sum_{i=1}^{n} \mathrm{Recall}_i}{n}\\
\mathrm{mIoU} &= \frac{\sum_{i=1}^{n} \mathrm{IoU}_i}{n}\\
\mathrm{Average\ F1\ score} &= \frac{\sum_{i=1}^{n} (\mathrm{F1\ score})_i}{n}
\end{aligned}
\right. \tag{1}
$$

where $TP_i$, $FP_i$ and $FN_i$ denote the true positives, false positives and false negatives of class $i$, $n$ is the number of classes, and $N$ is the total number of points.
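A compact NumPy sketch of Eq. (1), assuming a confusion matrix with ground-truth classes on the rows and predicted classes on the columns:

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """Per-class and global metrics of Eq. (1) from an n x n confusion
    matrix (rows: ground truth, columns: prediction)."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp          # predicted as class i but wrong
    fn = conf.sum(axis=1) - tp          # class i missed by the prediction
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision, "recall": recall, "iou": iou, "f1": f1,
        "OA": tp.sum() / conf.sum(),    # overall accuracy
        "AR": recall.mean(),            # average recall
        "mIoU": iou.mean(),
        "avg_f1": f1.mean(),
    }
```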
Table 3. Comparison of performance metrics between representative models (columns: segmentation network, OA, mIoU, and IoU for each class, e.g., pier and parapet).
4.2. Boundary detection of projection points

Using the method mentioned in the previous subsection, all points in the 3D ROI are projected into the 2D image. The resultant discrete projected points represent the 2D ROI. The alpha shape algorithm [48] is used in this study to compute a series of boundary line segments and generate a polygon that encloses the ROI.

The alpha shape algorithm can be implemented in MATLAB R2022a [49] using the "alphaShape" function, whose behavior depends on the value of the parameter α. Fig. 7 shows the boundary detection results using different α values, taking an image of a bridge pier as an example. The red points denote the projected points, while the blue lines and green regions denote the detected boundary line segments and generated polygons, respectively. When α is set to 1 or 10, the generated polygons have many intersecting polylines, and the enclosing effect is weak. When α is set to 100, the generated polygons can enclose a portion of the projection points, but voids still exist inside. When α is set to 1000, the algorithm successfully captures the outer contours of the projection points, and the generated polygons effectively enclose all the projection points. Thus, setting α to a larger value is recommended for common nonporous bridge components or surfaces. Since the images involved in this study do not exceed 10,000 pixels in either width or height, setting α to 1000 is appropriate. Other details and the complete processing steps of this example are described in Section 4.3.
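The paper uses MATLAB's alphaShape; purely for illustration, the same idea can be sketched in Python via Delaunay triangulation: keep triangles whose circumradius is below α, and the edges used by exactly one kept triangle form the boundary. This sketch is ours, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_boundary(points: np.ndarray, alpha: float) -> set:
    """Boundary edges (as index pairs) of the alpha shape of 2D points."""
    count = {}
    for ia, ib, ic in Delaunay(points).simplices:
        a, b, c = points[ia], points[ib], points[ic]
        la, lb, lc = (np.linalg.norm(b - c), np.linalg.norm(c - a),
                      np.linalg.norm(a - b))
        area2 = abs((b[0] - a[0]) * (c[1] - a[1])
                    - (b[1] - a[1]) * (c[0] - a[0]))   # 2 * triangle area
        if area2 == 0.0:
            continue                       # degenerate (collinear) triangle
        # circumradius = (la * lb * lc) / (4 * area) = (la * lb * lc) / (2 * area2)
        if la * lb * lc / (2.0 * area2) < alpha:
            for e in ((ia, ib), (ib, ic), (ic, ia)):
                key = (min(e), max(e))
                count[key] = count.get(key, 0) + 1
    return {e for e, n in count.items() if n == 1}
```

As in the MATLAB experiments of Fig. 7, a small α leaves the hull fragmented, while a large α yields a single enclosing polygon.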
4.3. Batch processing algorithm for image ROI extraction

As shown in Fig. 8, the methods described above are integrated into an automatic MATLAB R2022a algorithm. The integrated algorithm can batch process all images containing the current component of interest, using the 3D ROI as input and outputting images that contain only the ROI. The algorithm has excellent operational efficiency, requiring an average processing time of only 8.3 s per 9504 × 6336 pixel image.

The right half of Fig. 8 illustrates the processing flow for a single image. Image 1 shows an original image, which contains the upper half of a pier (the current ROI) and its connection area with the pier cap. After performing the 3D-to-2D projection according to the pinhole camera model, image 2 is obtained. Since the 3D ROI includes the whole pier while the original image contains only its upper half, many projection points fall outside the scope of the image. After deleting these overrun points, image 3 is obtained. Then, the algorithm calls the "alphaShape" function to detect the boundaries of the projection points, as shown in image 4. These boundary points are then used to generate a mask, as shown in image 5. Finally, the pixels outside the mask are removed to obtain image 6, which contains only the ROI. Notably, this algorithm not only separates the ROI from the background but also removes the background so that the resulting images can be directly used for crack identification.
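Steps 5 and 6 of this flow (mask generation and background removal) can be sketched with OpenCV as follows; the function name and the zero-fill convention for background pixels are ours.

```python
import cv2
import numpy as np

def remove_background(image: np.ndarray, boundary: np.ndarray) -> np.ndarray:
    """Keep only the pixels inside the ROI boundary polygon:
    rasterize the polygon into a binary mask, then zero out
    everything outside it."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [boundary.astype(np.int32)], 255)   # boundary: (N, 2)
    return cv2.bitwise_and(image, image, mask=mask)
```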
5. Crack identification method

Digital image-based crack identification can be divided into two steps: crack extraction and crack segmentation. A data-driven deep learning convolutional neural network (CNN) called the grid-based classification and box-based detection fusion model (GCBD) is used for crack extraction [50]. The deep-learning-based method is more robust than traditional machine learning methods and can handle the complex scenes encountered in practical engineering. A typical threshold-based segmentation method [51] from digital image processing (DIP) is adapted for crack segmentation. The histogram distribution of crack regions is bimodal, so the threshold-based method can obtain ideal crack segmentation results.

5.1. GCBD fusion model for crack extraction

The deep learning model used to extract cracks is the GCBD fusion model. The model has two outputs: grid-based classification results and box-based detection results. In this paper, the grid-based classification results are the desired output, and the box-based detection results are not needed.

5.1.1. Grid-based classification branch

The network architecture has two branches, the grid-based classification branch and the box-based detection branch, as shown in Fig. 9. In this paper, we focus on the grid-based classification branch. This branch resizes the image to 1440 × 960 and outputs a 45 × 30 grid mask. It integrates the features extracted by the backbone from three scales through the grid neck and finally outputs the grid mask through a convolutional layer on the 5-fold subsampled feature map.

Each grid cell in the grid mask has a confidence level. An appropriate threshold is set based on the scenario and the requirements, filtering out the grid cells below the threshold. The threshold can be chosen by testing model performance on a small dataset, which in this paper is of size 5. The smaller the threshold, the higher the recall; the larger the threshold, the higher the precision. In identifying pier surface cracks, a threshold of 0.3 is set to find as many cracks as possible. A low threshold leads to some misidentifications, which are mostly non-pier-surface disturbances that can be filtered through the background exclusion method proposed in this paper. In addition, when the threshold is set to approximately 0.3, the F1 score reaches its maximum value, and recall and precision are perfectly balanced, as shown in Fig. 10. The remaining grid cells above the threshold are treated as areas containing cracks for further crack segmentation.

In addition, since only the grid-based classification branch is needed in this paper, the box neck and box head in the network can be removed through a pruning operation. This improves computing efficiency and does not affect the grid output.

5.1.2. Generalization on OOD data

Concrete bridge images have different data distributions than asphalt pavement images. However, the weights used in the test are those trained on the asphalt pavement image dataset due to the lack of an available surface crack image dataset for concrete bridges. Thus, directly applying the fusion model to concrete piers requires strong out-of-distribution (OOD) generalization performance.

Because the fusion model adopts a shared backbone network, multitask learning and joint training, it is highly robust. The grid-based classification branch focuses on local areas, while the box-based detection branch focuses on the whole region. Fusing two tasks with different objectives drives the model to capture the common features of cracks at both micro and macro scales. The experimental results show that the weights still generalize well on concrete bridge surfaces, as will be demonstrated in Section 6.5.

5.2. Crack segmentation

5.2.1. Threshold-based method

Crack segmentation is performed in each grid cell using the Otsu algorithm [51]. The crack segmentation algorithm can be divided into three steps: preprocessing, segmentation and postprocessing, as shown in Fig. 11.

Preprocessing can be divided into two parts: image preprocessing and cracked-region preprocessing. For cracked regions, grid cells with obvious misidentification are filtered out based on the confidence of each grid cell and the connectivity between all grid cells through connected component analysis (CCA). If a connected area is small and its average confidence is low, the area is considered an obvious misidentification and is filtered out. Afterward, a median filter is used to smooth the image and remove salt-and-pepper noise.

The maximum interclass variance is calculated in each grid cell to obtain the local Otsu segmentation threshold. Because the image resolution is high and the coverage is wide, different areas of the image have inconsistent lighting. Therefore, using a single segmentation threshold for the entire image would cause local tiny cracks to be missed. This problem can be alleviated by using the grid-cell-based local threshold method.

Preprocessing improves the reliability of the crack segmentation results on the macro scale, while postprocessing further refines the crack segmentation results on the micro scale. Calculating the connection relation of pixel points filters out noise such as holes. Then, a dilation-erosion closing operation is used to address edge nonclosure and internal cavities.

5.2.2. Segmentation performance

To test the performance of the segmentation algorithm, experiments are performed on the concrete crack dataset [53]. The edge of a crack is fuzzy, and there is a transition area between crack pixels and noncrack pixels, so the two adjacent pixels around a crack pixel can also be treated as crack pixels during evaluation.
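As a minimal illustration of the grid-cell-level Otsu segmentation of Section 5.2.1, the sketch below applies a local threshold per cell; the cell-box representation and the darker-than-background crack assumption are ours.

```python
import cv2
import numpy as np

def segment_crack_cells(gray: np.ndarray, cells: list) -> np.ndarray:
    """Local Otsu thresholding per grid cell: each cell gets its own
    threshold, which tolerates uneven lighting across the image.
    `gray` is an 8-bit grayscale image; `cells` holds
    (row0, row1, col0, col1) boxes of cells flagged as containing cracks."""
    out = np.zeros_like(gray, dtype=np.uint8)
    for r0, r1, c0, c1 in cells:
        patch = gray[r0:r1, c0:c1]
        # Otsu maximizes the interclass variance; cracks are darker than
        # the concrete surface, so keep pixels below the local threshold.
        _, binary = cv2.threshold(patch, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        out[r0:r1, c0:c1] = binary
    # Postprocessing (Section 5.2.1): a closing operation bridges small
    # gaps at crack edges and fills internal cavities.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(out, cv2.MORPH_CLOSE, kernel)
```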
6. Experimental validation

The survey revealed that one bridge along the G7 Beijing-Xinjiang Expressway was moderately sized and had cracks on the surfaces of several concrete piers, making it a suitable candidate for this experiment. Therefore, this bridge (referred to as the G7 bridge hereafter) was selected as the experimental scene, as demonstrated in Fig. 13. The G7 bridge is a three-span continuous concrete bridge with six lanes in total, comprising 8 prismatic piers arranged in two rows.

The experiment was carried out by following the process shown in Fig. 1, and the results of each step are discussed in detail in the following subsections.

processing technique can also be adopted to enhance the crack detection ability if necessary.

Since no product on the market integrates this camera with a UAV, our research team developed a gimbal system that enables the UAV to carry the camera. A corresponding software system was also developed to enable real-time control of the camera's three-axis rotation and shooting action via the UAV remote controller.

Fig. 15 displays some photos of the UAV at work. The UAV was manually controlled to fly and photograph the G7 bridge from various angles and distances. A total of 1577 images were obtained.
Table 4. Parametric analysis of the dense cloud quality setting (columns: quality setting, depth map generation, dense cloud generation, number of points, and file size/GB).
Fig. 16. Built point cloud of the G7 bridge (dense cloud quality: low).
Fig. 17. Visualized comparison between prediction and ground truth of the G7 bridge.
Table 5. Quantitative evaluation of semantic segmentation of the G7 bridge (metrics for each class and global metrics).
For images with 9504 × 6336 pixels, selecting a low quality can already generate 95,224,214 points with a point density similar to that of the dataset described in Section 3.3. Therefore, for this experiment and similar cases, a low dense cloud quality setting is recommended. The built point cloud is shown in Fig. 16. Details such as the drainage pipe, height limit mark and pier surface texture of the G7 bridge are quite clear, indicating that the scene reconstruction quality is good and that the point density meets the requirements. The subsequent processing and analysis are based on this point cloud.

6.3. Point cloud semantic segmentation

The built 3D point cloud of the G7 bridge was directly fed into the trained RandLA-BridgeNet to obtain a point cloud with predicted semantic labels. The prediction was obtained extremely quickly, with a processing time of only 125.4 s for the dense cloud containing approximately 95 million points from the approximately 40-m-long G7 bridge. CloudCompare was used to manually annotate the point cloud as the ground truth for further comparison. The comparison between the prediction and the ground truth of the G7 bridge is visualized in Fig. 17. The results show a small visual difference between the prediction and the ground truth, indicating that the semantic segmentation was generally successful.

The quantitative assessment results are presented in Table 5. The overall prediction performance is excellent, with the model achieving an OA of 97.0%, which is similar to that of the test set. The segmentation of the four classes, namely, background, pier, superstructure and parapet, is satisfactory, with F1 scores above 90% for each class.

6.4. Image ROI extraction

The identification targets of this experiment are the cracks on the concrete surfaces of the piers. A total of 26 images containing crack information on pier surfaces were selected as the data source for crack identification. The 3D ROI corresponding to each image is the pier to which the cracked concrete surface in the image belongs; it can be easily extracted from the semantically segmented point cloud of the G7 bridge, as described in Section 4.1. By batch processing the 26 images using the algorithm described in Section 4.3, images containing only the 2D ROIs (the cracked concrete surfaces of interest) were obtained.

Fig. 18 illustrates the ROI extraction results of some typical images. These images demonstrate the challenges mentioned by other researchers [19–21], showing that directly using a deep learning method for concrete component recognition in 2D images can be problematic. For the first three images, a deep learning method operating on the 2D images would recognize all bridge piers as ROIs and keep them, making it impossible to obtain the desired results directly. The fourth image was taken in poor lighting conditions underneath the bridge, so the pier surface appears dark while the abutment surface in the background appears brighter. Because of this, a 2D-image-based deep learning method is likely to misidentify the brighter abutment surface in the background as an ROI. The methodology proposed in this paper avoids this problem in principle, reasonably removing the background pixels and preserving the concrete surface of interest.

The aforementioned image ROI extraction can be regarded as a binary classification task to quantitatively evaluate its effectiveness. Specifically, this step involves classifying all pixels in each image as either background or ROI. The 26 images were manually annotated by removing the background to generate the ground truth. Table 6 presents the pixel-level evaluation metrics. There is only a slight difference between the evaluation metrics for the background and the ROI. Moreover, all the evaluation metrics exceed 97%, indicating that the boundaries between the background and ROIs were accurately detected.

Table 6
Quantitative evaluation of the image ROI extraction.
Metric       Background   ROI      Global metric
Precision    97.1%        99.9%    OA: 98.9%
Recall       99.9%        98.2%    AR: 99.1%
IoU          97.0%        98.2%    mIoU: 97.6%
F1 score     98.5%        99.1%    Average F1 score: 98.8%

6.5. Crack identification

The images before and after ROI extraction were processed by the crack identification method. The crack identification results were obtained by extracting grid cells containing cracks from the GCBD fusion model, as shown in Fig. 19. When the background is excluded, the crack identification accuracy improves. Lines such as beams, branches, and the interface between the ROI and the background can easily be misidentified as cracks when the background has not been excluded. Excluding the background not only eliminates background line interference but also improves the accuracy of crack identification near the interface between the foreground and background.

Fig. 19. Crack identification results of typical images before and after ROI extraction.

Table 7
Crack identification results.
Image number   Grid cells in ROI   Grid cells in background   Misidentification rate
1              144                 149                        50.9%
2              160                 293                        64.7%
3              56                  44                         44.0%
4              193                 182                        48.5%
5              117                 30                         20.4%
6              149                 192                        56.3%
7              146                 10                         6.4%
8              129                 20                         13.4%
9              57                  65                         53.3%
10             36                  133                        78.7%
11             141                 216                        60.5%
12             369                 60                         14.0%
13             142                 60                         29.7%
14             67                  204                        75.3%
15             45                  246                        84.5%
16             59                  289                        83.0%
17             210                 65                         23.6%
18             134                 23                         14.6%
19             113                 19                         14.4%
20             98                  34                         25.8%
21             99                  33                         25.0%
22             92                  37                         28.7%
23             173                 168                        49.3%
24             116                 136                        54.0%
25             163                 140                        46.2%
26             111                 208                        65.2%
All images     3319                3056                       47.9%

The quantitative results of the crack identification are listed in Table 7, in which the misidentification rate denotes the ratio of the number of grid cells in the background to the total number of grid cells. Overall, the ROI extraction operation filters out 47.9% of the grid cells (3056 of the 6375 grid cells detected across all images); these filtered grid cells are misidentified background cells. Therefore, ROI extraction can effectively improve crack identification accuracy. Eliminating the interference of a complex background makes the network focus on the ROI, which is more consistent with the training data
distribution and enhances the robustness of the network.

After crack extraction, the threshold segmentation method introduced in Section 5.2 is used to further segment the cracks, as shown in Fig. 20. The proposed method can segment cracks effectively and can be used for subsequent crack assessment to assist with sophisticated maintenance decisions.

7. Conclusions

Accurately detecting cracks in concrete surface images with complex backgrounds is a challenging task. To improve the results of this task, an image ROI extraction methodology based on 3D point cloud semantic segmentation and 3D-to-2D projection is presented in this paper. First, a deep-learning-based semantic segmentation network, RandLA-BridgeNet, for large-scale bridge point clouds is constructed. A real-world bridge point cloud dataset is established for training and testing the network. Using the entire point cloud of the scene as input, RandLA-BridgeNet can perform semantic segmentation accurately and efficiently, achieving mIoUs of 91.6% and 91.1% on the validation set and the test set, respectively. Then, the 3D ROIs (concrete components of interest) are easily extracted from the segmented point cloud and projected into the corresponding images according to the pinhole camera model and camera pose information. Next, the alpha shape algorithm is used to detect the boundaries of the projected 2D ROI and remove the background, generating images that contain only the ROIs (concrete surfaces of interest). Finally, improved deep-learning-based crack identification can be performed using these processed images.

The methodology was validated by an experiment on an approximately 40-m-long bridge along the G7 Beijing-Xinjiang Expressway in China. For the point cloud reconstructed from 1577 UAV aerial images and containing approximately 95 million points, the inference of RandLA-BridgeNet took only 125.4 s. RandLA-BridgeNet achieved excellent semantic segmentation results, with F1 scores of 98.2%, 91.2%, 96.8% and 90.3% for the background, pier, superstructure and parapet, respectively. Image ROI extraction was performed on 26 images containing concrete surface cracks, with the overall extraction accuracy reaching 98.9%. A grid-based classification and box-based detection fusion model was used to identify cracks in the images. After ROI extraction, 47.9% of the grid cells, which represent background misrecognition, were filtered out, greatly improving the crack identification accuracy.

The presented methodology integrates point cloud semantic segmentation and 3D-to-2D projection technologies into the UAV-based bridge crack detection task, contributing to advancements in the field. As indicated by the field experimental validation presented in Section 6, the methodology framework shown in Fig. 1 has much potential for practical UAV-based bridge inspection applications and achieves impressive crack detection results when handling images containing complex background information.

However, some limitations still exist and call for future research efforts:

(1) Due to the relatively limited scenarios covered by the training data, the semantic segmentation network has limited applicability to various scenarios. A large open-source point cloud database that covers more bridge types needs to be established.

(2) The parameter setting method of the alpha shape algorithm needs to be further studied to extract the 2D ROI boundary accurately for bridge components or surfaces with holes.

(3) In the presented experiment, the manually controlled UAV flight was cumbersome and inefficient, requiring large battery consumption. Automatic path planning and control methods for UAV bridge inspection tasks need to be developed, and multiple UAVs may collaborate to further improve efficiency.

(4) The crack identification model used in this study was trained on an asphalt pavement image dataset due to the lack of available surface crack image datasets for concrete bridges. Although the identification results are generally satisfactory, transfer learning and fine-tuning could feasibly further improve performance. The main feature extraction layers of the asphalt pavement weights can be frozen, and a small amount of concrete surface data can be labeled to train the fusion model so that the model can better adapt to the data distribution of concrete bridge cracks. From another perspective, establishing a large concrete bridge crack image dataset for training a new crack identification model would also be meaningful.

CRediT authorship contribution statement

Jing-Lin Xiao: Data curation, Investigation, Methodology, Visualization, Writing – original draft. Jian-Sheng Fan: Conceptualization, Funding acquisition, Writing – review & editing, Supervision. Yu-Fei Liu: Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing. Bao-Luo Li: Investigation, Validation, Visualization. Jian-Guo Nie: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

The research is supported by the National Natural Science Foundation of China (No. 52192662 and 52121005). The authors express sincere appreciation for the support.

References

[1] K. Chaiyasarn, A. Buatik, H. Mohamad, M. Zhou, S. Kongsilp, N. Poovarodom, Integrated pixel-level CNN-FCN crack detection via photogrammetric 3D texture mapping of concrete structures, Automation in Construction 140 (2022) 104388, https://doi.org/10.1016/j.autcon.2022.104388.
[2] S.Y. Kong, J.S. Fan, Y.F. Liu, X.C. Wei, X.W. Ma, Automated crack assessment and quantitative growth monitoring, Comput. Aided Civ. Inf. Eng. 36 (2021) 656–674, https://doi.org/10.1111/mice.12626.
[3] X. Tan, A. Abu-Obeidah, Y. Bao, H. Nassif, W. Nasreddine, Measurement and visualization of strains and cracks in CFRP post-tensioned fiber reinforced concrete beams using distributed fiber optic sensors, Automation in Construction 124 (2021) 103604, https://doi.org/10.1016/j.autcon.2021.103604.
[4] B.A. Graybeal, B.M. Phares, D.D. Rolander, M. Moore, G. Washer, Visual inspection of highway bridges, J. Nondestruct. Eval. 21 (3) (2002) 67–83, https://doi.org/10.1023/A:1022508121821.
[5] Y. Liu, S. Cho, B.F. Spencer, J. Fan, Automated assessment of cracks on concrete surfaces using adaptive digital image processing, Smart Struct. Syst. 14 (4) (2014) 719–741, https://doi.org/10.12989/sss.2014.14.4.719.
[6] R. Ali, J.H. Chuah, M.S.A. Talip, N. Mokhtar, M.A. Shoaib, Structural crack detection using deep convolutional neural networks, Automation in Construction 133 (2022) 103989, https://doi.org/10.1016/j.autcon.2021.103989.
[7] A. Zhang, K.C.P. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J.Q. Li, C. Chen, Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network, Comput. Aided Civ. Inf. Eng. 32 (10) (2017) 805–819, https://doi.org/10.1111/mice.12297.
[8] S. Dorafshan, R.J. Thomas, M. Maguire, Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete, Constr. Build. Mater. 186 (2018) 1031–1045, https://doi.org/10.1016/j.conbuildmat.2018.08.011.
[9] C.V. Dung, L.D. Anh, Autonomous concrete crack detection using deep fully convolutional neural network, Automation in Construction 99 (2019) 52–58, https://doi.org/10.1016/j.autcon.2018.11.028.
[10] S. Bang, S. Park, H. Kim, H. Kim, Encoder-decoder network for pixel-level road crack detection in black-box images, Comput. Aided Civ. Inf. Eng. 34 (8) (2019) 713–727, https://doi.org/10.1111/mice.12440.
[11] C. Xiang, W. Wang, L. Deng, P. Shi, X. Kong, Crack detection algorithm for concrete structures based on super-resolution reconstruction and segmentation network, Automation in Construction 140 (2022) 104346, https://doi.org/10.1016/j.autcon.2022.104346.
[12] P. Guo, X. Meng, W. Meng, Y. Bao, Monitoring and automatic characterization of cracks in strain-hardening cementitious composite (SHCC) through intelligent interpretation of photos, Compos. Part B Eng. 242 (2022) 110096, https://doi.org/10.1016/j.compositesb.2022.110096.
[13] S. Poorghasem, Y. Bao, Review of robot-based automated measurement of vibration for civil engineering structures, Measurement 207 (2023) 112382, https://doi.org/10.1016/j.measurement.2022.112382.
[14] E. Ranyal, A. Sadhu, K. Jain, Road condition monitoring using smart sensing and artificial intelligence: a review, Sensors 22 (8) (2022) 3044, https://doi.org/10.3390/s22083044.
[15] C. Chen, S. Chandra, Y. Han, H. Seo, Deep learning-based thermal image analysis for pavement defect detection and classification considering complex pavement conditions, Remote Sens. (Basel) 14 (1) (2022) 106, https://doi.org/10.3390/rs14010106.
[16] J. Guan, X. Yang, L. Ding, X. Cheng, V.C.S. Lee, C. Jin, Automated pixel-level pavement distress detection based on stereo vision and deep learning, Automation in Construction 129 (2021) 103788, https://doi.org/10.1016/j.autcon.2021.103788.
[17] Y.F. Liu, S. Cho, B.F. Spencer, J.S. Fan, Concrete crack assessment using digital image processing and 3D scene reconstruction, J. Comput. Civ. Eng. 30 (1) (2016) 04014124, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000446.
[18] Y.F. Liu, X. Nie, J.S. Fan, X.G. Liu, Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction, Comput. Aided Civ. Inf. Eng. 35 (2020) 511–529, https://doi.org/10.1111/mice.12501.
[19] Y. Narazaki, V. Hoskere, T.A. Hoang, Y. Fujino, A. Sakurai, B.F. Spencer, Vision-based automated bridge component recognition with high-level scene consistency, Comput. Aided Civ. Inf. Eng. 35 (2020) 465–482, https://doi.org/10.1111/mice.12505.
[20] N. Saovana, N. Yabuki, T. Fukuda, Development of an unwanted-feature removal system for structure from motion of repetitive infrastructure piers using deep learning, Adv. Eng. Inform. 46 (2020) 101169, https://doi.org/10.1016/j.aei.2020.101169.
[21] S.O. Sajedi, X. Liang, Uncertainty-assisted deep vision structural health monitoring, Comput. Aided Civ. Inf. Eng. 36 (2021) 126–142, https://doi.org/10.1111/mice.12580.
[22] Y. Xie, J. Tian, X.X. Zhu, Linking points with labels in 3D: a review of point cloud semantic segmentation, IEEE Geoscience and Remote Sensing Magazine 8 (4) (2020) 38–59, https://doi.org/10.1109/MGRS.2019.2937630.
[23] Y. Guo, H. Wang, Q. Hu, H. Liu, M. Bennamoun, Deep learning for 3D point clouds: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 43 (12) (2021) 4338–4364, https://doi.org/10.1109/TPAMI.2020.3005434.
[24] D. Maturana, S. Scherer, VoxNet: A 3D convolutional neural network for real-time object recognition, in: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, 2015, pp. 922–928, https://doi.org/10.1109/IROS.2015.7353481.
[25] Y. Zhou, O. Tuzel, VoxelNet: End-to-end learning for point cloud based 3D object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018, pp. 4490–4499, https://doi.org/10.1109/CVPR.2018.00472.
[26] C.R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, L.J. Guibas, Volumetric and multi-view CNNs for object classification on 3D data, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016, pp. 5648–5656, https://doi.org/10.1109/CVPR.2016.609.
[27] H. Su, S. Maji, E. Kalogerakis, E. Learned-Miller, Multi-view convolutional neural networks for 3D shape recognition, in: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015, pp. 945–953, https://doi.org/10.1109/ICCV.2015.114.
[28] A. Boulch, J. Guerry, B.L. Saux, N. Audebert, SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks, Computers & Graphics 71 (2018) 189–198, https://doi.org/10.1016/j.cag.2017.11.010.
[29] A. Milioto, I. Vizzo, J. Behley, C. Stachniss, RangeNet++: Fast and accurate LiDAR semantic segmentation, in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Venetian Macao, 2019, pp. 4213–4220, https://doi.org/10.1109/IROS40897.2019.8967762.
[30] C.R. Qi, H. Su, K. Mo, L.J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, 2017, pp. 652–660, https://doi.org/10.1109/CVPR.2017.16.
[31] C.R. Qi, L. Yi, H. Su, L.J. Guibas, PointNet++: Deep hierarchical feature learning on point sets in a metric space, in: Advances in Neural Information Processing Systems, Long Beach, 2017, pp. 5099–5108, https://dl.acm.org/doi/abs/10.5555/3295222.3295263.
[32] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, B. Chen, PointCNN: Convolution on X-transformed points, in: Advances in Neural Information Processing Systems, Montreal, 2018, pp. 820–830, https://dl.acm.org/doi/10.5555/3326943.3327020.
[33] Y. Wang, Y. Sun, Z. Liu, S.E. Sarma, M.M. Bronstein, J.M. Solomon, Dynamic graph CNN for learning on point clouds, ACM Trans. Graph. 38 (5) (2019) 1–12, https://doi.org/10.1145/3326362.
[34] L. Landrieu, M. Simonovsky, Large-scale point cloud semantic segmentation with superpoint graphs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018, pp. 4558–4567, https://doi.org/10.1109/CVPR.2018.00479.
[35] Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, A. Markham, RandLA-Net: Efficient semantic segmentation of large-scale point clouds, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020, pp. 11108–11117, https://doi.org/10.1109/CVPR42600.2020.01112.
[36] I. Armeni, O. Sener, A.R. Zamir, H. Jiang, I. Brilakis, M. Fischer, S. Savarese, 3D semantic parsing of large-scale indoor spaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016, pp. 1534–1543, https://doi.org/10.1109/CVPR.2016.170.
[37] T. Hackel, N. Savinov, L. Ladicky, J.D. Wegner, K. Schindler, M. Pollefeys, Semantic3D.net: A new large-scale point cloud classification benchmark, arXiv preprint arXiv:1704.03847, 2017, https://doi.org/10.48550/arXiv.1704.03847.
[38] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, J. Gall, SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences, in: Proceedings of the IEEE International Conference on Computer Vision, Seoul, 2019, pp. 9297–9307, https://doi.org/10.1109/ICCV.2019.00939.
[39] B. Riveiro, M.J. DeJong, B. Conde, Automated processing of large point clouds for structural health monitoring of masonry arch bridges, Automation in Construction 72 (2016) 258–268, https://doi.org/10.1016/j.autcon.2016.02.009.
[40] Y. Yan, J.F. Hajjar, Automated extraction of structural elements in steel girder bridges from laser point clouds, Automation in Construction 125 (2021) 103582, https://doi.org/10.1016/j.autcon.2021.103582.
[41] R. Lu, I. Brilakis, C.R. Middleton, Detection of structural components in point clouds of existing RC bridges, Comput. Aided Civ. Inf. Eng. 34 (2019) 191–212, https://doi.org/10.1111/mice.12407.
[42] L. Truong-Hong, R. Lindenbergh, Automatically extracting surfaces of reinforced concrete bridges from terrestrial laser scanning point clouds, Automation in Construction 135 (2022) 104127, https://doi.org/10.1016/j.autcon.2021.104127.
[43] H. Kim, J. Yoon, S.H. Sim, Automated bridge component recognition from point clouds using deep learning, Struct. Control Health Monit. 27 (2020) e2591, https://doi.org/10.1002/stc.2591.
[44] H. Kim, C. Kim, Deep-learning-based classification of point clouds for bridge inspection, Remote Sens. (Basel) 12 (22) (2020) 3757, https://doi.org/10.3390/rs12223757.
[45] J.S. Lee, J. Park, Y.M. Ryu, Semantic segmentation of bridge components based on hierarchical point cloud model, Automation in Construction 130 (2021) 103847, https://doi.org/10.1016/j.autcon.2021.103847.
[46] X. Yang, E.R. Castillo, Y. Zou, L. Wotherspoon, Y. Tan, Automated semantic segmentation of bridge components from large-scale point clouds using a weighted superpoint graph, Automation in Construction 142 (2022) 104519, https://doi.org/10.1016/j.autcon.2022.104519.
[47] Y. Jing, B. Sheil, S. Acikgoz, Segmentation of large-scale masonry arch bridge point clouds with a synthetic simulator and the BridgeNet neural network, Automation in Construction 142 (2022) 104459, https://doi.org/10.1016/j.autcon.2022.104459.
[48] H. Edelsbrunner, D. Kirkpatrick, R. Seidel, On the shape of a set of points in the plane, IEEE Trans. Inf. Theory 29 (1983) 551–559, https://doi.org/10.1109/TIT.1983.1056714.
[49] MATLAB R2022a, The MathWorks Inc., Natick, MA, https://ww2.mathworks.cn/help/matlab/, 2022 (accessed May 14, 2023).
[50] B.L. Li, Y. Qi, J.S. Fan, Y.F. Liu, C. Liu, A grid-based classification and box-based detection fusion model for asphalt pavement crack, Comput. Aided Civ. Inf. Eng. (2022), https://doi.org/10.1111/mice.12962.
[51] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. 9 (1979) 62–66, https://doi.org/10.1109/TSMC.1979.4310076.
[52] Agisoft LLC, Agisoft Metashape user manual: Professional edition, Version 2.0, https://www.agisoft.com/pdf/metashape-pro_2_0_en.pdf, 2023.
[53] C.F. Özgenel, Concrete crack images for classification, Mendeley Data V2 (2019), https://doi.org/10.17632/5y9wdsg2zt.2.
[54] J. Liu, X. Yang, S. Lau, X. Wang, S. Luo, V.C.S. Lee, L. Ding, Automated pavement crack detection and segmentation based on two-step convolutional neural network, Comput. Aided Civ. Inf. Eng. 35 (11) (2020) 1291–1305, https://doi.org/10.1111/mice.12622.