Automated Pixel-Level Pavement Crack Detection On 3D Asphalt Surfaces Using A Deep-Learning Network
Abstract: The CrackNet, an efficient architecture based on the Convolutional Neural Network (CNN), is proposed in this article for automated pavement crack detection on 3D asphalt surfaces with explicit objective of pixel-perfect accuracy. Unlike the commonly used CNN, CrackNet does not have any pooling layers which downsize the outputs of previous layers. CrackNet fundamentally ensures pixel-perfect accuracy using the newly developed technique of invariant image width and height through all layers. CrackNet consists of five layers and includes more than one million parameters that are trained in the learning process. The input data of the CrackNet are feature maps generated by the feature extractor using the proposed line filters with various orientations, widths, and lengths. The output of CrackNet is the set of predicted class scores for all pixels. The hidden layers of CrackNet are convolutional layers and fully connected layers. CrackNet is trained with 1,800 3D pavement images and is then demonstrated to be successful in detecting cracks under various conditions using another set of 200 3D pavement images. The experiment using the 200 testing 3D images showed that CrackNet can achieve high Precision (90.13%), Recall (87.63%) and F-measure (88.86%) simultaneously. Compared with recently developed crack detection methods based on traditional machine learning and imaging algorithms, the CrackNet significantly outperforms the traditional approaches in terms of F-measure. Using parallel computing techniques, CrackNet is programmed to be efficiently used in conjunction with the data collection software.

∗ To whom correspondence should be addressed. E-mail: kelvin.[email protected].

1 INTRODUCTION

The automation of pavement crack detection generally requires robust algorithms of high level of intelligence. However, over the past decades, the private and public endeavors on worldwide basis vastly underestimated the challenges and difficulties in developing fully automated analysis tools for pavement cracks. The underlying difficulty is directly related to human’s limitation in developing mathematical models simulating cognition capability.

Thresholding algorithms were proposed to find cracks by setting global or local thresholds (Cheng et al., 2003; Oliveira and Correia, 2009), with various limitations for images with complex illuminance. Segmentation-based methods were proposed to conduct analyses at small blocks (Kirschke and Velinsky, 1992; Huang and Xu, 2006; Ying and Salari, 2010), but were incapable of representing crack width accurately as the detection is conducted at block level instead of pixel level. Edge detectors were widely used to detect the edges of pavement cracks (Attoh-Okine and Ayenu-Prah, 2008; Santhi et al., 2012; Nisanth and Mathew, 2014), however with incapability in detecting complete crack profiles. Filter-based algorithms were developed to find cracks of anticipated responses (Zhang et al., 2013; Zalama et al., 2014), with limitations in detecting cracks of weak
© 2017 Computer-Aided Civil and Infrastructure Engineering.
DOI: 10.1111/mice.12297
2 Zhang et al.
Fig. 1. Typical detection results on a smooth asphalt surface and a highly textured asphalt surface.
responses to predesigned filters. The CrackTree demonstrated a strong capability of detecting discontinued cracks (Zou et al., 2012), however without considering the actual crack width. Wavelet Transform was applied to decompose the original data into different frequency subbands. Consequently, cracks could be detected more easily following the assumption that cracks are primarily preserved in high frequency subbands (Zhou et al., 2006; Subirats et al., 2006; Wang et al., 2007). Nevertheless, the decomposition of original data into frequency domain adversely impacts the spatial entirety of pavement cracks, resulting in discontinued cracks.

The three-dimensional (3D) laser imaging technology has become the dominant approach to automated pavement data collection in recent years. Compared with two-dimensional (2D) pavement images, 3D pavement data are less vulnerable to lighting conditions and present more useful information as well as fewer noises in terms of crack detection (Wang, 2011; Zhang and Wang, 2017). Many studies were conducted to detect cracks specifically on 3D pavement surfaces. Depth-checking methods were proposed for crack detection using 3D pavement surface data (Jahanshahi et al., 2013; Ouyang and Xu, 2013), representing simple thresholding methodologies on the depth and length of a crack with limited successes particularly for complex and fine cracks. A modification of the dynamic optimization algorithm was implemented (Jiang and Tsai, 2016) to detect cracks on 3D pavement surfaces. Recently, the interactive cracking detection algorithm was proposed to detect cracks on 3D pavement surfaces with aid of the feedback from human operators (Zhang et al., 2016a). Hybrid procedures of matched filtering, tensor voting and minimum spanning tree were also developed for crack detection using 3D pavement data (Sollazzo et al., 2016). In addition, the 3D shadow modeling was proposed for detection of various descended patterns (i.e., cracks, joints, grooves, and potholes) on 3D pavement surface (Zhang et al., 2017). Although the recent crack detection algorithms demonstrate some successes, none of them have shown acceptable levels of precision and bias on consistent basis on a large set of diversified pavement data in various conditions.

The biggest challenge of automated crack detection is to consistently achieve high performance under various complex environments. A certain automated algorithm may yield detection results of satisfying accuracies on some particular roads, while resulting in completely unacceptable error rates on other roads. Such inconsistent performances may be frequently observed on asphalt surfaces where the textures and roughness levels are varying. Figure 1 shows detection results of the interactive cracking detection algorithm (Zhang et al., 2016a) on a smooth asphalt surface and a highly textured asphalt surface. It can be observed that the interactive cracking detection algorithm of high sensitivity yields many false-positive errors on the
Pixel-level pavement crack detection on 3D asphalt surfaces 3
Fig. 2. Detection results of the matched filtering, interactive cracking detection, and 3D shadow modeling algorithms on a 3D
pavement image.
highly textured surface, while successfully finding all cracks on the smooth surface without any errors. On the other hand, the interactive cracking detection algorithm of low sensitivity achieves high accuracy on the highly textured surface, but fails to detect the fine crack on the smooth surface. This example illustrates challenges of many conflicting performance scenarios for traditional automated algorithms, such as the matched filtering (Zhang et al., 2013) and the 3D shadow modeling (Zhang et al., 2017), which require substantial manual assistance to obtain consistent results.

The conflicting performances of traditional automated algorithms may also be observed on the same road, or even on the same pavement image. Figure 2 shows the detection results of the matched filtering (Zhang et al., 2013), interactive cracking detection (Zhang et al., 2016a) and 3D shadow modeling (Zhang et al., 2017) algorithms on a specific 3D pavement image. The sensitivity levels of the three algorithms are progressively tuned to detect the fine cracks located at the top right corner of the image. However, the three algorithms all result in false-positive errors particularly at the left side of the image when the fine cracks start to be recognized.

The indistinctive boundary between fine cracks and local noises/textures indeed has created many difficulties for traditional automated algorithms. Fundamentally, the diversity of cracks and textures on asphalt pavement surfaces would require any automated algorithms to be completely self-adaptive and fully tuned on the basis of exhaustive examples. However, most existing automated algorithms for pavement crack detection were developed based on specific and uncomprehensive hypotheses and lacked the capability of learning from examples. This could be one of the reasons why the current automated algorithms have tangible difficulties in detecting pavement cracks at consistently high levels of precision and bias for a pavement network in varying conditions and with different texture characteristics (Zhang et al., 2016a). However, it is also true that humans are good at recognizing pavement cracks without hesitation and can achieve consistent results. From such a perspective, advanced machine learning techniques that simulate human cognition potentially provide pavement engineers a truly automated tool for production purposes.

Many successful applications of machine learning techniques were reported in the field of transportation engineering, such as traffic incident detection (Adeli and Samant, 2000), work zone capacity estimation (Adeli and Jiang, 2003), traffic flow forecasting (Jiang and Adeli, 2005), and traffic sign classification (Ciresan et al., 2012). There were also pioneering applications of machine learning techniques, such as Artificial Neural Network (ANN) and Support Vector Machine (SVM), in classifying cracks on pavement surfaces (Kaseko and Ritchie, 1993; Kaseko et al., 1994; Lee and Lee, 2004; Gavilan et al., 2011; Nejad and Zakeri, 2011; Marques, 2012; Daniel and Preeja, 2014). However, these studies generally represent only one or two layers of abstraction and cannot fully reflect the complexity of pavement surface.
In recent years, Deep Learning has been found to provide an opportunity to learn from experiences and understand complex problems based on a hierarchy of concepts (Goodfellow et al., 2016), where the concepts are defined and learned through increasing levels of abstraction (Murphy, 2012).

Particularly, deep Convolutional Neural Networks (CNN) have demonstrated successes in large-scale object recognition problems (Krizhevsky et al., 2012; Zeiler and Fergus, 2013; Sermanet et al., 2014; Szegedy et al., 2014; Simonyan and Zisserman, 2015; He et al., 2015). The two most significant properties of CNNs are space invariance and distortion invariance due to the sharing weights and pooling layers (LeCun et al., 1998), which contribute to their great potentials for vision tasks. However, due to the existence of pooling layers, the original data are progressively downsized through increasing levels of abstraction, resulting in loss of the original information. Therefore, most convolutional networks were designed to classify small image patches and could not achieve pixel-perfect accuracies. Some convolutional networks intended to assign a class label to an individual pixel by treating the local context around the pixel as a whole patch (Tschopp, 2015; Maji et al., 2016), and may yield imprecise predictions around the target pixels (i.e., the crack pixels) due to overlaps. Other convolutional networks were also proposed for pixelwise classifications (Pinheiro and Collobert, 2015; Shelhamer et al., 2016), which all include pooling layers, and thus inevitably lose original data. In the All Convolutional Net, the pooling layers were replaced by convolution layers with larger strides (Springenberg et al., 2015), resulting in similar spatial reductions and data losses.

Pixel-perfect accuracy is critical for pavement crack detection, meaning the visible geometric features of cracks can be available regarding the shape, orientation, length, and width. First, the pixel-perfect accuracy can yield accurate identifications on the types of detected cracks (e.g., longitudinal cracks, transverse cracks, or alligator cracks). Second, the actual width, length and extent of a crack are important indicators to identify its severity level. Last, the accurate measurements of pavement cracks are very useful for timely monitoring on the developing behaviors of pavement cracks. Thus, the automated crack survey without a high level of pixel-perfect accuracy would become less useful. It is impossible for a human being to visually measure geometric features of cracks without resorting to measurement tools. Although current convolutional networks accomplished excellent performances in numerous vision tasks, it is challenging to apply them in pavement crack detection when pixel-level accuracy is fully considered.

Zhang et al. (2016b) proposed a CNN with four convolution layers, four max-pooling layers and two fully connected layers for crack detection. Nevertheless, the class label assigned to an individual pixel was still based on the local context around the pixel, which resulted in overestimation of crack width. It was shown in their study that deep CNN outperformed traditional machine learning techniques (i.e., SVM and Boosting method) for pavement crack detection. The classic architecture of AlexNet (Krizhevsky et al., 2012) was adopted to classify an input image as crack image or no-crack image (Some, 2016), and thus was incapable of detecting cracks at pixel level. Cha et al. (2017) also developed a deep CNN for piecewise classification on cracks. With the use of sliding window techniques, the class label was assigned to each small patch of size 256 × 256. It was demonstrated that their method achieved a high level of accuracy in classifying a small image patch as cracked or intact even under complex illuminations. However, the actual location of cracks in the small patch was not considered in their study, and the pixel-perfect accuracy was thus unattainable. Although the recent CNN-based methodologies have limitations in terms of pixel-perfect accuracy, they all reveal the great potential of deep CNN in detecting pavement cracks.

In this article, an efficient network architecture based on CNN is proposed for pavement crack detection on 3D asphalt surfaces with full considerations on pixel-perfect accuracy. Compared with traditional CNNs, the proposed network does not have any pooling layers. The pixel-level accuracy is achieved by the following regularizations. First, the spatial size of the input data is invariant through all layers. Second, the ground-truth of training data is prepared for pixel-to-pixel supervised learning. Last, an individual pixel is compared with its neighbors through the local connections provided at the convolution layer, and the final class score for an individual pixel is predicted by integrating and analyzing the multichannel responses evaluated at that pixel. Several thousand training data sets were prepared and manually processed by the research team for machine learning purposes. A total of 1,800 observations are used in the presented work of CrackNet in training, and another 200 observations are used for testing.

2 METHODOLOGY

2.1 Data preparation

All pavement surface data used in the article are 1-mm 3D data from the PaveVision3D system mounted in a Digital Highway Data Vehicle (DHDV) made by WayLink. The DHDV can scan the pavement surface
Table 2
Sizes and rotation angles of line filters used by the feature extractor

Filter no.   Filter width s_x   Filter length s_y   Rotation angle θ
1-36         3                  5                   0°, 5°, 10°, 15°, ..., 175°
37-72        5                  5                   0°, 5°, 10°, 15°, ..., 175°
73-108       7                  5                   0°, 5°, 10°, 15°, ..., 175°
109-144      9                  5                   0°, 5°, 10°, 15°, ..., 175°
145-180      11                 5                   0°, 5°, 10°, 15°, ..., 175°
181-216      3                  10                  0°, 5°, 10°, 15°, ..., 175°
217-252      5                  10                  0°, 5°, 10°, 15°, ..., 175°
253-288      7                  10                  0°, 5°, 10°, 15°, ..., 175°
289-324      9                  10                  0°, 5°, 10°, 15°, ..., 175°
325-360      11                 10                  0°, 5°, 10°, 15°, ..., 175°
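The numbering scheme of Table 2 can be reproduced programmatically. The following is a minimal sketch in plain Python, assuming only the widths, lengths, and angles listed in the table; the function name `filter_specs` and the tuple layout are hypothetical choices for illustration and are not taken from the article.

```python
from itertools import product

WIDTHS = (3, 5, 7, 9, 11)          # filter widths s_x listed in Table 2
LENGTHS = (5, 10)                  # filter lengths s_y listed in Table 2
ANGLES = tuple(range(0, 180, 5))   # 0, 5, ..., 175 degrees (36 orientations)


def filter_specs():
    """Return (filter_no, width, length, angle_deg) tuples in Table 2 order.

    Hypothetical helper: the table varies the angle fastest, then the
    width, then the length, which is the loop order used here.
    """
    specs = []
    number = 1
    for length, width in product(LENGTHS, WIDTHS):
        for angle in ANGLES:
            specs.append((number, width, length, angle))
            number += 1
    return specs


if __name__ == "__main__":
    specs = filter_specs()
    print(len(specs))   # 5 widths x 2 lengths x 36 angles
    print(specs[0], specs[-1])
```

Enumerating the specifications this way makes it easy to check that 5 widths, 2 lengths, and 36 orientations give the 360 filters stated in the text.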
corresponding pixels are noncrack pixels. Table 1 shows that CrackNet has more than 1 million parameters to be learned.

The CrackNet intends to detect cracks with explicit requirements on pixel-level accuracy. First, the image height and width are invariant through all layers such that the error at each pixel can be learned efficiently following an end-to-end manner. Second, the feature extractor and convolution layer I provide local connections such that the relationships between crack pixels and their local surroundings can be learned by the CrackNet. Finally, the convolution layer II, fully connected layers I and II are deployed primarily for learning the complex difference between a crack pixel and a noncrack pixel based on the multichannel responses obtained at the same location.

2.3 Feature extractor

In this article, the feature extractor serves to generate feature maps and prepare the input of CrackNet. The feature extractor utilizes line filters oriented at various directions and with varied lengths as well as widths to enhance the contrast between cracks and the background. It can be considered as a feature detection layer with fixed operations and without learnable parameters. Each feature map, or equivalently each channel of the input of CrackNet is a filtered image processed by an individual line filter. In particular, there are 360 line filters used in feature extraction to generate 360 feature maps. The profile (cross section) of each line filter is a symmetric sigmoid curve developed in the article as

$$f(x) = 1 - \frac{1}{1 + e^{-\lambda (D - |x|)/D}} \tag{2}$$

where x is the distance from the profile center; f(x) is the function value at x; λ is the parameter to control the flatness at the center area and the steepness near the center area; and D is the distance from the profile center where the filter value equals to 0.5.

In Figure 4, the transition from edge area to the center area becomes more dramatic when parameter λ increases. In terms of crack detection, the symmetric sigmoid curve can be used to mimic the sharp changes of elevations between cracks and the background. The response of a crack to the filter can be distinctive if the profile of the crack is matched with the filter profile. The parameter λ is fixed as 6 in this article. Note that λ is not a sensitive parameter and contributes to small differences when it is greater than 4. In addition, the width of the profile is fixed as 4 × D because the symmetric sigmoid curve becomes flat when the distance from center is greater than 2 × D.

Based on the defined profile, a single line filter is formulated by a collection of identical profiles along y axis. In other words, the profile defined in (2) is replicated along y axis. To yield nearly zero responses under noises, the line filter is shifted to have a zero mean. Table 2 shows the sizes and rotation angles of the 360 line filters used by the feature extractor. Five different widths are assigned to the 360 line filters to consider cracks of varying widths. In addition, two small lengths are used by the 360 line filters for matching with a crack at local regions. Finally, the 360 line filters are aligned at various orientations with a small fixed angle interval such that a crack fragment of arbitrary direction could be entirely or partially matched with one of the line filters. Figure 5 illustrates some representative line filters following the specifications in Table 2.
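The construction of a single line filter described above can be sketched as follows. This is an illustrative plain-Python sketch, not the authors' implementation, and it rests on several assumptions that the article does not state: the filter width s_x is taken to span the full 4 × D profile (so D = s_x / 4), the kernel is sampled at integer pixel offsets on a square grid, the rotation direction convention is arbitrary, and the zero-mean shift is applied only over the pixels inside the rotated s_x × s_y rectangle. The names `profile` and `line_filter` are hypothetical.

```python
import math


def profile(x, D, lam=6.0):
    """Symmetric sigmoid profile of Eq. (2).

    Close to 0 at the centre, exactly 0.5 at |x| = D, close to 1 for |x| >= 2D.
    """
    return 1.0 - 1.0 / (1.0 + math.exp(-lam * (D - abs(x)) / D))


def line_filter(width, length, angle_deg, lam=6.0):
    """Build one zero-mean line filter as a square list-of-lists kernel."""
    D = width / 4.0                      # assumption: s_x covers the 4*D profile
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    half = int(math.ceil(math.hypot(width, length) / 2.0))
    size = 2 * half + 1                  # odd size so the kernel has a centre pixel
    kernel = [[0.0] * size for _ in range(size)]
    support = []                         # pixels inside the rotated rectangle
    for r in range(size):
        for col in range(size):
            u, v = col - half, r - half  # offsets from the kernel centre
            x = u * c + v * s            # coordinate across the line (profile axis)
            y = -u * s + v * c           # coordinate along the line (replication axis)
            if abs(x) <= width / 2.0 and abs(y) <= length / 2.0:
                kernel[r][col] = profile(x, D, lam)
                support.append((r, col))
    # Shift the filter to zero mean so that flat regions give zero response.
    mean = sum(kernel[r][col] for r, col in support) / len(support)
    for r, col in support:
        kernel[r][col] -= mean
    return kernel


if __name__ == "__main__":
    k = line_filter(width=3, length=5, angle_deg=0)
    for row in k:
        print(" ".join(f"{value:6.3f}" for value in row))
```

After the zero-mean shift, the centre column of an unrotated filter is negative and the flanking columns are positive, so a narrow depression in the 3D elevation data that lines up with the filter produces a large response, while a flat patch produces none.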
3 IMAGE LIBRARY FOR TRAINING AND TESTING

An image library is established to feed CrackNet with diverse examples for supervised learning. The image library has more than 5,000 3D pavement images at 1 mm resolution representing diversified variations of cracks and pavement surface textures. All 3D pavement images in the library are collected in the last 5 years, and at various locations as well as different collection speeds ranging from 20 to 60 MPH. There is no overlap between any two images, and no more than 100 images are from the same road. All types of cracks with various severity levels defined in the LTPP protocol (Miller and Bellinger, 2014) can be found in the image library. The ground-truths of cracks on all images are manually marked with close supervision on pixel-perfect accuracies by multiple teams. A three-round inspection is conducted to ensure the ground-truths are accurate at pixel level. For the first round, several well-trained operators manually mark the cracks on provided 3D pavement images with full resolution. For the second round, several other well-trained operators examine and refine the ground-truths for correcting errors and reducing subjectivity. Finally, the ground-truths are further inspected and verified by experts. The entire process of preparing ground-truths of cracks is completed on continuing basis for nearly 1 year. Figure 6 illustrates several
typical 3D pavement images from the established image library.

Two thousand asphalt surface images are randomly selected from the image library for training and testing of CrackNet. The 2,000 asphalt surface images represent various textures and different mix types, including Hot Mix Asphalt (HMA) and Warm Mix Asphalt (WMA). In particular, 1,800 images are used as training data, and the other 200 images are considered as testing data.

4 TRAINING

4.1 Cost function

The sigmoid units at the output layer produce predicted values between 0 and 1 at all individual pixels. In this article, the target values for a background pixel and a crack pixel are set as 0 and 1, respectively. Given the ground-truth of cracks, the target values for all pixels of an image can be determined directly. Subsequently, Cross Entropy is employed as the cost function to measure the similarity between the predicted values and target values (Goodfellow et al., 2016; Nielsen, 2017):

$$C = -\sum_{i=1}^{n} \left[ y_i \ln a_i + (1 - y_i) \ln (1 - a_i) \right] \tag{3}$$

where $y_i$ is the target value at the ith pixel; $a_i$ is the predicted value at the ith pixel; and n is the number of pixels of the image.

Cross Entropy is used as the cost function for two reasons. First, it improves the learning speed. Second, it drives the network to learn at a rate controlled by the similarity between the predicted values and the target values. In other words, the network can learn faster when the errors are larger.

4.2 Learning method

The objective of training is to minimize the cost function using gradient-based optimization. Mini-batch Gradient Descent, a variation of Stochastic Gradient Descent, is implemented here to update parameters at each iteration. In addition, Backpropagation with the use of Momentum is applied to compute gradients. Thus, the learning method for each iteration is

$$W_{i+1} = 0.9 \cdot W_i - \varepsilon \cdot \left\langle \left. \frac{\partial C}{\partial w} \right|_{w_i} \right\rangle_{B_i}, \qquad w_{i+1} = w_i + W_{i+1} \tag{4}$$

where i is the iteration index; $w_i$ is the weights learned at the ith iteration; $W_i$ is the momentum variable at the ith iteration; ε is the learning rate; $B_i$ represents the ith mini-batch; and $\langle \partial C / \partial w |_{w_i} \rangle_{B_i}$ is the average gradient over $B_i$ evaluated at $w_i$.

In this article, the size of mini-batch and the learning rate are manually adjusted through the iterations. The mini-batch size for an iteration ranges from 10 to 30, whereas the learning rate for an iteration is between 0.001 and 0.05. In particular, smaller batch sizes and larger learning rates are preferred in the beginning for faster convergence. Larger batch sizes and smaller learning rates are used subsequently for fine-tuning.

4.3 Normalized initialization and dropout

Due to the large number of parameters, the initialization of parameters becomes important. A good strategy for parameter initialization is to keep similar variances of activation values and backpropagated gradients across all layers. In the article, the Normalized Initialization (Glorot and Bengio, 2010) is adopted to initialize parameters such that the variances of activation values and backpropagated gradients can be maintained efficiently.

In addition, another efficient technique called “Dropout” is implemented in the article to reduce the risk of overfitting (Hinton et al., 2012). The Dropout technique randomly omits each hidden neuron with a probability of 0.5. The omitted hidden neurons have zero output values, and thus will not be activated in both forward and backward passes. The Dropout technique prevents complex co-adaptations of neurons and forces neurons to be more independent instead of relying on other neurons.

4.4 Parallel computing

Except for the output layer, all other layers of CrackNet require intensive computations on 360 images of size 1,024 × 512 for both forward and backward passes. Therefore, parallel computing techniques are applied to improve the computational efficiency of CrackNet such that the training of CrackNet on the basis of 1,800 example images can become manageable. There have been pioneering applications of parallel computing techniques in large-scale engineering computations (Adeli and Kamal, 1989, 1992a, b; Adeli and Kumar, 1995; Schrefler et al., 2000; Hsieh et al., 2002; Torbol, 2014; Ponz-Tienda et al., 2016; Wu et al., 2016). All these studies demonstrated substantial gains of time efficiency through well-designed parallel processing. The fundamental challenge of parallel computing is to reformulate the problem and develop parallel algorithms such that the parallel machines can be fully utilized (Adeli and Vishnubhotla, 1987). As shown in Figure 3, the structure of CrackNet is compatible with parallel computing using the Graphics Processing Unit (GPU). For the forward pass to produce the output, the computational tasks at all elements or pixels of the output
data at each layer can be operated simultaneously. With respect to the backward pass to compute gradients, the computational tasks at all elements of the input data at each layer can also be executed concurrently. Given that the summation of partial gradients is executed at all elements simultaneously, race condition problems will occur when multiple GPU threads try to write at the same memory address concurrently. To avoid race condition problems, the atomic add function, a built-in function of the CUDA programming platform (NVidia, 2017), can be used to sum the partial gradients at each element without interference from other elements. In general, for both forward and backward passes, the computation tasks at each element of each layer in CrackNet can be considered as completely independent with the use of atomic add function. Therefore, the forward and backward passes are both implemented in a massively parallel manner using a single or multiple GPU devices for up to two to three orders of magnitude of computational improvement versus CPU-based serial code implementation of the same algorithms. The CrackNet is programmed under C++ environment and with the use of CUDA C platform, but without using the NVidia cuDNN library. Using 2 GPU devices (2 NVidia GeForce GTX TITAN Black), the average processing times of the forward and backward passes for a single image of size 1,024 × 512 are 5.37 and 18.51 seconds, respectively. CrackNet may require more computations than traditional deep CNNs, such as the networks proposed in Zhang et al. (2016b) and Cha et al. (2017). First, the input data of traditional deep CNNs are normally small image patches, and the data depths at hidden layers may not exceed 360. Second, the data size in traditional deep CNNs is continuously reduced through pooling layers or convolutional layers with stride greater than 1. Last, CrackNet provides full connections across all channels at each pixel, resulting in intensive computations. However, it will be demonstrated in the next section that the fully connected layers improve the accuracy of CrackNet. In addition, the time cost due to the use of invariant spatial size through all layers is also necessary, as it is beneficial for learning errors at pixel level.

4.5 Training result

Fully connected layers I and II establish full connections across all channels at each individual pixel, which might not be necessary. To justify the use of fully connected layers, a shallower network without fully connected layers I and II is trained for a comparison with CrackNet. This shallower network, denoted as CrackNet-1, shares the same architecture with CrackNet but only consists of three layers by excluding the two fully connected layers. To avoid overfitting problem, the 1,800 training images are divided into two subsets. First, 300 images are randomly selected from the 1,800 images and serve as validation data which do not participate in learning errors and tuning parameters. Second, the other 1,500 images are all involved in the learning process. For time efficiency, the performances of the two networks on 300 validation images are evaluated at every 50 iterations instead of each iteration to inspect if overfitting occurs. In the meantime, the learned parameters are saved at every 50 iterations for final selection of optimal parameters that yield the best performances on the 300 validation images. The timely performances of the two networks on validation data also provide a guide to manually tune hyperparameters, including learning rate and mini-batch size.

Precision and Recall are two commonly used indicators for evaluating crack detecting algorithms (Fawcett, 2006; Zhang et al., 2016a, 2017). Precision refers to the percentage of crack pixels classified correctly with respect to all detected pixels, whereas Recall represents the percentage of crack pixels classified correctly with respect to all true crack pixels. Precision and Recall frequently conflict with each other. For instance, a high level of sensitivity may yield high Recall but low Precision. On the other hand, a low level of sensitivity could result in high Precision but low Recall. Therefore, it is challenging to achieve high Precision and high Recall simultaneously. F-measure is the harmonic mean of Precision and Recall (Fawcett, 2006), reflecting the accuracy of an algorithm more appropriately. A high F-measure can be achieved only when the Precision and Recall are both high.

The training is completed after 700 iterations, which takes roughly 9 days on two NVidia GeForce GTX TITAN Black cards in the same computer. Figure 7 shows the cost function values for the 700 iterations and the overall F-measures on validation data evaluated at every 50 iterations. It is clearly demonstrated in Figure 7 that CrackNet yields better performance than CrackNet-1 in terms of both cost function and overall F-measure, indicating the importance of fully connected layers. After 500 iterations, the cost function values for the two networks are significantly reduced and only oscillate in a small range, implying that the two networks can finally yield similar outputs for most training examples. In addition, the overall F-measure on validation data is improved progressively, indicating that the model does not result in overfitting problems. Particularly, the highest overall F-measure 89.54% for CrackNet is observed at the 650th iteration. Therefore, the parameters saved at the 650th iteration are considered as optimal. Using the optimal parameters, the overall Precision, Recall, and
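The cost function of Section 4.1, the momentum update of Section 4.2, and the Precision, Recall, and F-measure indicators of Section 4.5 can be sketched in a few lines of plain Python. This is only an illustrative sketch and not the authors' C++/CUDA implementation. The function names are hypothetical, images are treated as flat lists of per-pixel values, the cost is written with a leading minus sign so that smaller values mean better agreement, and the clamping constant `eps` is an added numerical safeguard that the article does not mention.

```python
import math


def cross_entropy(targets, predictions, eps=1e-12):
    """Cross-entropy cost of Eq. (3), summed over all pixels.

    targets: 0 for background pixels, 1 for crack pixels.
    predictions: sigmoid outputs in (0, 1).
    """
    total = 0.0
    for y, a in zip(targets, predictions):
        a = min(max(a, eps), 1.0 - eps)   # avoid log(0); assumption, not in the article
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total


def momentum_step(weights, velocity, avg_grad, lr, momentum=0.9):
    """One update of Eq. (4).

    velocity plays the role of W_i, avg_grad is the mini-batch average
    gradient evaluated at the current weights, and lr is the learning rate.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, avg_grad)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def precision_recall_f(truth, detected):
    """Pixel-level Precision, Recall, and F-measure from two binary masks."""
    true_positive = sum(1 for t, d in zip(truth, detected) if t == 1 and d == 1)
    n_detected = sum(detected)            # all pixels reported as crack
    n_true = sum(truth)                   # all ground-truth crack pixels
    precision = true_positive / n_detected if n_detected else 0.0
    recall = true_positive / n_true if n_true else 0.0
    denom = precision + recall
    f_measure = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    print(cross_entropy([1, 0], [0.9, 0.2]))
    print(momentum_step([1.0], [0.0], [2.0], lr=0.1))
    print(precision_recall_f([1, 1, 0, 0], [1, 0, 1, 0]))
```

As a consistency check on the harmonic-mean definition, the reported Precision (90.13%) and Recall (87.63%) give 2 × 0.9013 × 0.8763 / (0.9013 + 0.8763) ≈ 0.8886, which agrees with the reported F-measure of 88.86%.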
Fig. 9. Illustration of Precision, Recall, and F-measure of CrackNet, 3D shadow modeling, and Pixel-SVM.
The 3D shadow modeling can lead to a higher level of pixel-perfect accuracy, but is sensitive to local noises. Compared with both Pixel-SVM and 3D shadow modeling, CrackNet is more robust in suppressing noises and detecting fine cracks. The performance of CrackNet highlights that automated algorithms based on deep-learning techniques have better potential than traditional algorithms with shallow level of abstraction and limited learning capability.

6 DISCUSSION

Figure 11 shows several testing images with typical errors resulting from the CrackNet. The false-positive errors are highlighted in the dashed rectangles, whereas the false-negative errors are indicated in the dashed circles. The false-negative errors universally occur at local regions of very fine cracks or hairline cracks. There are several possible reasons for missing hairline cracks. First, CrackNet may be trained to be conservative on hairline cracks such that complex textures on rough surfaces (e.g., open-graded asphalt surfaces) will be prevented from being misclassified. As shown in Table 3, the overall Recall is slightly lower than the overall Precision, implying the conservativeness of CrackNet. Second, the hairline cracks are degraded during downsampling. Last, the feature extractor does not grasp hairline cracks sufficiently. As an image block with hairline cracks can have multiple crack pixels as a whole, it is highly possible that the differences between hairline cracks and the background become more distinctive when they are analyzed at block level. Therefore, for pixel-level detection, CrackNet potentially has more difficulties in finding hairline cracks successfully. To reduce false-negative errors resulting from CrackNet, one
12 Zhang et al.
Fig. 10. Typical comparison between Pixel-SVM, 3D shadow modeling, and CrackNet.
of the possible solutions is to increase the resolution of in variety compared to those associated with traditional
the input data such that the continuity of fine cracks is methods.
improved. The second possible resolution is to increase In Figure 11, the second type of false-positive errors
the number of line filters used by the feature extrac- mainly results from shoulder drop-off (highlighted on
tor, and align the line filters at more possible orienta- the second image), pavement edge (highlighted on the
tions. Alternatively, other preprocessing techniques can last image), or any other patterns that appear to be sim-
be applied to further enhance the contrast between fine ilar to cracks. This type of errors is challenging to be
cracks and the background. eliminated due to complex similarities with pavement
On the other hand, false-positive errors generated by cracks. However, if the receptive field at convolution
CrackNet can be roughly divided into two types. The layers can be increased, there may be higher probability
first type of false-positive errors, such as those marked of eliminating this type of false-positive errors. In ad-
on the first image in Figure 11, is caused by local noises dition, increasing the receptive field may also lead to
of small extent. Such errors can be eliminated using tra- lower risks in generating the first type of false-positive
ditional postprocessing methods, such as length filtering errors.
that removes patterns whose lengths are smaller than The feature extractor conducts fixed operations to
certain thresholds. Generally speaking, errors associ- prepare the input of CrackNet and includes no learn-
ated with CrackNet are small in number and consistent able parameters. Although it reduces parameters and
training difficulty, it could be revised as a learnable layer to enhance the learning capability of CrackNet. In addition, CrackNet also has limitations for concrete pavement surfaces. However, the current version of CrackNet demonstrates a successful approach to applying learning techniques for pavement crack detection on asphalt surfaces with explicit requirements on pixel-perfect accuracy.

The training and testing images used in the article are 3D surface data. However, the architecture of CrackNet may also be suitable for other types of data, such as 2D pavement images. First, the feature extractor is a general-purpose procedure to enhance contrast between the crack and the background, given that the crack pixel has a lower intensity or elevation value. Second, all subsequent layers of CrackNet also implement general-purpose operations without constraints on data types. The only adjustment needed is to set the filter sizes according to the size of the input data. If appropriate data are used for training, it is likely that CrackNet can still yield similar performance for other types of data.

7 CONCLUSIONS

In this article, an efficient network architecture based on the Convolutional Neural Network (CNN), christened CrackNet, is described for the automated detection of pavement cracks on asphalt surfaces. Different from traditional CNNs, CrackNet does not have any pooling layers that downsize the outputs of previous layers. Regardless of the data depth, the data width and height are invariant through all layers to achieve pixel-perfect accuracy. The input of CrackNet consists of feature maps generated by the feature extractor using the proposed line filters. The output of CrackNet is the set of predicted class scores for all individual pixels. CrackNet uses more than one million parameters and consists of one general convolution layer, one 1 × 1 convolution layer, two fully connected layers, and one output layer.

Using 1,800 image data sets of asphalt surfaces randomly selected from the 3D image library established by the research team, CrackNet is trained on two GPU devices recursively with 700 iterations. The training of CrackNet is successfully completed by the use of various efficient learning techniques, including Mini-batch Gradient Descent, Momentum, Cross Entropy, Normalized Initialization, and Dropout. Then 200 testing images from the image library are processed by the trained CrackNet. The overall Precision, Recall, and F-measure of CrackNet on the 200 testing images are 90.13%, 87.63%, and 88.86%, respectively. It is demonstrated in the comparison study that CrackNet significantly outperforms 3D shadow modeling, which is a competitive crack detection algorithm without learning capability, and Pixel-SVM, which is based on traditional machine learning algorithms. The article reveals that applications of deep-learning techniques may provide much better solutions than traditional crack detection algorithms with a shallow level of abstraction and limited learning capability.

CrackNet in its current version requires substantial processing time and potentially has more difficulties in detecting hairline cracks successfully. However, the distinctive advantage of CrackNet is that it detects cracks at pixel level instead of block level. Pixel-perfect accuracy desired in automated pavement survey can thus be enhanced through CrackNet. The image library for training and testing purposes includes more than 5,000 3D pavement images collected from diverse pavement sections. Manual preparation of the ground-truths for thousands of images with explicit requirements on pixel-perfect accuracy demands a huge amount of labor and time. However, if the complexity and diversity of pavement surface are fully considered, it is worthwhile to use a great number of labeled examples for comprehensive learning and low risk of overfitting. The feature extractor used in the article can be considered as a feature detection layer with fixed operations and without learnable parameters. Such fixed operations may not be robust enough to extract hairline cracks or suppress complex noises desirably. However, according to current performance, CrackNet is still highly efficient even with a fixed feature extraction layer. For future developments, the feature extractor can be redesigned as a learnable layer for enhanced learning capability.

Improvements of CrackNet are continuing at rapid pace by the authors of this article, including identifying joints on concrete pavements, grooves, and shoulder drop-offs. A larger volume of test images in varying conditions is to be used to further verify the performance of CrackNet and make network refinements. Continuing advances of GPU have allowed the training of CrackNet to be conducted in an efficient manner. It is anticipated that future production-level software solutions to the problem of automated crack survey will be based on deep-learning techniques such as CrackNet due to their robust performance in terms of precision and bias, and their consistency across different pavement conditions.

ACKNOWLEDGMENTS

The authors wish to thank Michael Ohara, Ruxin Yan, Te Pei, Zhixing Ma, Xiaoli Xu, Guolong Wang, and Shihai Ding for their help in preparing the image library for training and testing CrackNet. Partial financial support
by the Federal Aviation Administration Grant 13-G-013 was appreciated for the presented work. In the past few years, FHWA provided various project and technical support to the OSU research team.