SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Figure 2. Overview of our proposed method. (a) illustrates the overall architecture of SCSegamba and the processing flow for crack
images. (b) displays the structure of the SAVSS block. The input crack image undergoes comprehensive morphological and texture feature
extraction through SAVSS, while MFS produces a high-quality pixel-level segmentation map.
combines Swin Transformer and CNN for automatic feature extraction, while TBUNet [60], a Transformer-based knowledge distillation model, achieves high-performance crack segmentation with a hybrid loss function. Although Transformer-based methods are highly effective at capturing crack texture cues and suppressing background noise, their self-attention mechanism introduces computational complexity that grows quadratically with sequence length. This results in a high parameter count and significant computational demands, which limit their deployment on resource-constrained edge devices.

2.2. Selective State Space Model

The introduction of the Selective State Space Model (S6) in the Mamba model [14] has highlighted the potential of SSMs [12, 13]. Unlike the linear time-invariant S4 model, S6 efficiently captures complex long-distance dependencies while preserving computational efficiency, achieving strong performance in NLP, audio, and genomics. Consequently, researchers have adapted Mamba to the visual domain, creating various VSS blocks. ViM [62] achieves comparable modeling to ViT [10] without attention mechanisms, using fewer computational resources, while VMamba [36] prioritizes efficient computation and high performance. PlainMamba [55] employs a fixed-width layer stacking approach, excelling in tasks such as instance segmentation and object detection. However, the VSS block and scanning strategy require specific optimizations for each visual task, as tasks differ in their reliance on long- and short-distance information, necessitating customized VSS block designs to ensure optimal performance.

Currently, no high-performing Mamba-based model exists for crack segmentation. Thus, designing an optimized VSS structure specifically for crack segmentation is essential to improve performance and efficiency. Given the intricate details and irregular textures of cracks, the VSS block requires strong shape extraction and directional awareness to effectively capture crack texture cues. Additionally, it should facilitate efficient crack segmentation while minimizing computational resource requirements.

Figure 3. Architecture of GBC. It employs bottleneck convolution to efficiently reduce the parameters and computational load, while the gating mechanism enhances the model's adaptability in processing diverse crack patterns and complex backgrounds. GN represents group normalization.

3. Methodology

3.1. Preliminary

The complete architecture of our proposed SCSegamba is depicted in Figure 2. It includes two main components: the SAVSS for extracting crack shape and texture cues, and the MFS for efficient feature processing. To capture key crack region cues, we integrate the GBC at the initial stage of SAVSS and the final stage of MFS.

For a single RGB image E \in \mathbb{R}^{3 \times H \times W}, spatial information is divided into n patches, forming a sequence \{B_1, B_2, \dots, B_n\}. This sequence is processed through the SAVSS block, embedding key crack pixel cues into multi-scale feature maps \{F_1, F_2, F_3, F_4\}. Finally, in the MFS, all information is consolidated into a single tensor, producing a refined segmentation output W \in \mathbb{R}^{1 \times H \times W}.

3.2. Lightweight Gated Bottleneck Convolution

The gating mechanism enables dynamic features for each spatial position and channel, enhancing the model's ability to capture details [8, 57]. To further reduce parameter count and computational cost, we embedded a bottleneck convolution (BottConv) with low-rank approximation [28], mapping matrices from high- to low-dimensional spaces and significantly lowering computational complexity.

In the convolution layer, assuming the spatial size of the filter is p, the number of input channels is d, and the input is s, the convolution response can be represented as:

z = Qs + c  (1)

where Q is a matrix of size f \times (p^2 \times d), f is the number of output channels, and c is the original bias term. Assuming z lies in a low-rank subspace of rank f_0, it can be represented as z = V(z - z_1) + z_1, where z_1 abstracts the mean vector of features, acting as an auxiliary variable to facilitate theoretical derivation and correct feature offsets, and V = LM^T (L \in \mathbb{R}^{f \times f_0}, M \in \mathbb{R}^{(p^2 d) \times f_0}) represents the low-rank projection matrix. The simplified response then becomes:

z = LM^T s + c'  (2)

Since f_0 < f, the computational complexity reduces from O(fp^2d) to O(f_0 p^2 d) + O(f f_0), where O(f f_0) \ll O(f p^2 d), indicating that the reduction in computational complexity is proportional to the ratio f_0 / f.

In BottConv, pointwise convolutions project features into and out of the low-rank subspace, significantly reducing complexity, while the depthwise convolution that performs adequate spatial feature extraction within the subspace adds only negligible complexity. As shown in Figure 5, BottConv in our GBC design significantly reduces parameter count and computational load compared to the original convolution, with minimal performance impact.

As shown in Figure 3, the input feature x \in \mathbb{R}^{C \times H \times W} is retained as x_{residual} = x to facilitate the residual connection. Subsequently, the feature x is passed through the BottConv layer, followed by normalization and activation functions, resulting in the features x_1 and g_2(x) as shown below:

g_1(x) = ReLU(Norm_1(f_1(x)))  (3)

x_1 = ReLU(Norm_2(BottConv_2(g_1(x))))  (4)

g_2(x) = ReLU(Norm_3(BottConv_3(x)))  (5)

To generate the gating feature map, x_1 and g_2(x) are combined through the Hadamard product:

m(x) = x_1 \odot g_2(x)  (6)

The gating feature map m(x) is subsequently processed through BottConv once again to further refine fine-grained details. After the residual connection is applied, the resulting output is:

y = ReLU(Norm_4(BottConv_4(m(x))))  (7)

Output = y + x_{residual}  (8)

The design of BottConv and the deeper gated branch enables the model to preserve basic crack features while dynamically refining the fine-grained feature characterization of the main branch, resulting in more accurate segmentation maps in detailed regions.

3.3. Structure-Aware Visual State Space Module

Our designed SAVSS features a two-dimensional selective scan (SS2D) tailored for visual tasks. Different scanning strategies impact the model's ability to capture continuous crack textures. As shown in Figure 4, current vision Mamba networks use various scanning directions, including parallel, snake, bidirectional, and diagonal scans [36, 55, 62]. Parallel and diagonal scans lack continuity across rows or diagonals, which limits their sensitivity to crack directions. Although bidirectional and snake scans maintain semantic continuity along horizontal or vertical paths, they struggle to capture diagonal or interwoven textures. To address this, our proposed diagonal snake scanning is designed to better capture complex crack texture cues.

SASS consists of four paths: two parallel snake paths and two diagonal snake paths. This design enables the effective extraction of continuous semantic information in regular crack regions while preserving texture continuity in multiple directions, making it suitable for multi-scenario crack images with complex backgrounds.

After the RGB crack image undergoes Patch Embedding and Position Encoding, it is input as a sequence into the SAVSS block. To maintain a lightweight network, we use only 4 layers of SAVSS blocks. The processing equations are as follows:

\overline{P} = e^{\Delta P}  (9)

\overline{Q} = (\Delta P)^{-1} (e^{\Delta P} - I) \cdot \Delta Q  (10)

z_k = \overline{P} z_{k-1} + \overline{Q} w_k  (11)

u_k = R z_k + S w_k  (12)

In these equations, the input w \in \mathbb{R}^{t \times D}, P \in \mathbb{R}^{G \times D} controls the hidden spatial state, S \in \mathbb{R}^{D \times D} is used to initialize the skip connection for the input, z_k represents the specific hidden state at time step k, and Q \in \mathbb{R}^{G \times D} and R \in \mathbb{R}^{G \times D} are matrices with hidden spatial dimension G and temporal dimension D, respectively, obtained through the selective scan SS2D. These are trainable parameters that are updated accordingly. u_k represents the output at time step k. SASS establishes multi-directional adjacency relationships, allowing the hidden state z_k to capture more intricate topological and textural details.
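To make the low-rank saving behind BottConv concrete, here is a quick multiply-accumulate count for Eqs. (1)-(2) in pure Python. The layer sizes (3x3 filter, 64 input/output channels, rank 16) are illustrative assumptions, not values from the released model:

```python
def full_conv_cost(f, p, d):
    # Dense response z = Qs: Q has f rows and p^2 * d columns -> O(f * p^2 * d)
    return f * p * p * d

def lowrank_cost(f, p, d, f0):
    # Factorized response z = L(M^T s): M^T s costs p^2 * d * f0, then L(.) costs f * f0
    return f0 * p * p * d + f * f0

full = full_conv_cost(64, 3, 64)   # 36864 multiply-accumulates
low = lowrank_cost(64, 3, 64, 16)  # 9216 + 1024 = 10240
print(full, low, low / full)       # ratio approaches f0/f as p^2*d grows
```

With p^2*d dominating, the ratio tends to f_0/f = 0.25, matching the complexity argument in the text.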
Figure 4. Illustration of our proposed SASS and other scanning strategies. The first row presents four commonly used single scanning
paths, along with our proposed diagonal snake path. The second row illustrates the execution flow of our proposed SASS scanning strategy.
o = MLP(Conv(o_1)) (15)
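The state-update recurrence of Eqs. (9)-(12) can be sketched in NumPy. This is a minimal sequential, diagonal-state version under stated assumptions (zero initial state, a scalar step size delta, elementwise discretization); real Mamba-style implementations fuse this loop into a parallel selective scan:

```python
import numpy as np

def ssm_scan(w, P, Q, R, S, delta=0.1):
    """Sequential form of Eqs. (9)-(12): z_k = Pbar z_{k-1} + Qbar w_k, u_k = R z_k + S w_k."""
    G, D = P.shape
    dP = delta * P
    Pbar = np.exp(dP)                          # Eq. (9), elementwise (diagonal state)
    Qbar = (np.expm1(dP) / dP) * (delta * Q)   # Eq. (10): (dP)^-1 (e^{dP} - I) dQ
    z = np.zeros((G, D))
    outs = []
    for w_k in w:                              # w: (t, D) token sequence
        z = Pbar * z + Qbar * w_k              # Eq. (11); w_k broadcast over G
        outs.append((R * z).sum(axis=0) + S @ w_k)  # Eq. (12): contract state dim G
    return np.stack(outs)                      # (t, D)

t, G, D = 6, 4, 8
rng = np.random.default_rng(0)
P = -np.abs(rng.standard_normal((G, D)))       # negative entries give a stable decay
u = ssm_scan(rng.standard_normal((t, D)), P,
             rng.standard_normal((G, D)),
             rng.standard_normal((G, D)),
             rng.standard_normal((D, D)))
print(u.shape)  # (6, 8)
```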
Figure 6. Visual comparison of typical cracks with 9 SOTA methods across four datasets. Red boxes highlight critical details, and green
boxes mark misidentified regions.
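The mIoU metric used throughout the evaluation (Eq. (19)) reduces to a confusion-matrix computation; a minimal pure-Python sketch for the binary case (N = 1) on flattened label arrays:

```python
def miou(pred, gt, n_classes=2):
    # p[l][t] counts pixels predicted as class l whose ground truth is class t
    p = [[0] * n_classes for _ in range(n_classes)]
    for l, t in zip(pred, gt):
        p[l][t] += 1
    total = 0.0
    for l in range(n_classes):
        inter = p[l][l]
        union = sum(p[l]) + sum(p[t][l] for t in range(n_classes)) - inter
        total += inter / union if union else 1.0  # absent class: count as perfect
    return total / n_classes                      # 1/(N+1) average over classes

print(miou([0, 0, 1, 1], [0, 1, 1, 1]))  # (1/2 + 2/3) / 2 = 0.5833...
```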
During processing, all datasets were divided into training, validation, and test sets with a 7:1:2 ratio.

4.2. Implementation Details

Experimental Settings. We built our SCSegamba network using PyTorch v1.13.1 and trained it on an Intel Xeon Platinum 8336C CPU with eight NVIDIA GeForce RTX 4090 GPUs. The AdamW optimizer was used with an initial learning rate of 5e-4, PolyLR scheduling, a weight decay of 0.01, and a random seed of 42. The network was trained for 50 epochs, and the model with the best validation performance was selected for testing.

Comparison Methods. To comprehensively evaluate our model, we compared SCSegamba with 9 SOTA methods. The CNN- or Transformer-based models included RIND [38], SFIAN [5], CTCrackSeg [44], DTrCNet [48], Crackmer [46], and SimCrack [20]. Additionally, we compared it with other Mamba-based models, including CSMamba [37], PlainMamba [55], and MambaIR [16].

Evaluation Metrics. We used six metrics to evaluate SCSegamba's performance: Precision (P), Recall (R), F1 Score (F1 = 2RP / (R + P)), Optimal Dataset Scale (ODS), Optimal Image Scale (OIS), and mean Intersection over Union (mIoU). ODS measures the model's adaptability to datasets of varying scales at a fixed threshold m, while OIS evaluates adaptability across image scales at an optimal threshold n. The calculation formulas are as follows:

ODS = \max_m \frac{2 \cdot P_m \cdot R_m}{P_m + R_m}  (17)

OIS = \frac{1}{N} \sum_{i=1}^{N} \max_n \frac{2 \cdot P_{n,i} \cdot R_{n,i}}{P_{n,i} + R_{n,i}}  (18)

mIoU is used to measure the mean proportion of the intersection over union between the ground truth and the predicted results. The calculation is given by the formula:

mIoU = \frac{1}{N+1} \sum_{l=0}^{N} \frac{p_{ll}}{\sum_{t=0}^{N} p_{lt} + \sum_{t=0}^{N} p_{tl} - p_{ll}}  (19)

where N is the number of classes, which we set as N = 1; t represents the ground truth, l represents the predicted value, and p_{tl} represents the count of pixels classified as l but belonging to t.

Additionally, we evaluated our method's complexity using three metrics: FLOPs, Params, and Model Size, representing computational complexity, parameter complexity, and memory footprint.

4.3. Comparison with SOTA Methods

As listed in Table 1, compared with 9 other SOTA methods, our proposed SCSegamba achieves the best performance across four public datasets. Specifically, on the Crack500 [56] and DeepCrack [35] datasets, which contain larger and more complex crack regions, SCSegamba achieved the highest performance. Notably, on the DeepCrack dataset, it surpassed the next best method by 1.50% in F1 score and 1.09% in mIoU. This improvement is due to the robust ability of GBC to capture morphological clues in large crack areas, enhancing the model's representational power. On the CrackMap [22] dataset, which features thinner and more elongated cracks, our method surpasses all other SOTA methods in every metric, outperforming the next best method by 2.06% in F1 and 1.65% in mIoU. This demonstrates the effectiveness of SASS in capturing fine textures and elongated crack structures.
Methods | Crack500 (ODS OIS P R F1 mIoU) | DeepCrack (ODS OIS P R F1 mIoU)
RIND [38] | 0.6469 0.6483 0.6998 0.7245 0.7119 0.7381 | 0.8087 0.8267 0.7896 0.8920 0.8377 0.8391
SFIAN [5] | 0.6977 0.7348 0.6983 0.7742 0.7343 0.7604 | 0.8616 0.8928 0.8549 0.8692 0.8620 0.8776
CTCrackSeg [44] | 0.6941 0.7059 0.6940 0.7748 0.7322 0.7591 | 0.8819 0.8904 0.9011 0.8895 0.8952 0.8925
DTrCNet [48] | 0.7012 0.7241 0.6527 0.8280 0.7357 0.7627 | 0.8473 0.8512 0.8905 0.8251 0.8566 0.8661
Crackmer [46] | 0.6933 0.7097 0.6985 0.7572 0.7267 0.7591 | 0.8712 0.8785 0.8946 0.8783 0.8864 0.8844
SimCrack [20] | 0.7127 0.7308 0.7093 0.7984 0.7516 0.7715 | 0.8570 0.8722 0.8984 0.8549 0.8761 0.8744
CSMamba [37] | 0.6931 0.7162 0.6858 0.7823 0.7315 0.7592 | 0.8738 0.8766 0.9025 0.8863 0.8943 0.8863
PlainMamba [55] | 0.7035 0.7173 0.7170 0.7557 0.7358 0.7682 | 0.8646 0.8668 0.9050 0.8659 0.8850 0.8788
MambaIR [16] | 0.7043 0.7189 0.7204 0.7681 0.7435 0.7663 | 0.8796 0.8840 0.9056 0.8895 0.8975 0.8907
SCSegamba (Ours) | 0.7244 0.7370 0.7270 0.7859 0.7553 0.7778 | 0.8938 0.8990 0.9097 0.9124 0.9110 0.9022

Methods | CrackMap (ODS OIS P R F1 mIoU) | TUT (ODS OIS P R F1 mIoU)
RIND [38] | 0.6745 0.6943 0.6023 0.7586 0.6699 0.7425 | 0.7531 0.7891 0.7872 0.7665 0.7767 0.8051
SFIAN [5] | 0.7200 0.7465 0.6715 0.7668 0.7160 0.7748 | 0.7290 0.7513 0.7715 0.7367 0.7537 0.7896
CTCrackSeg [44] | 0.7289 0.7373 0.6911 0.7669 0.7270 0.7785 | 0.7940 0.7996 0.8202 0.8195 0.8199 0.8301
DTrCNet [48] | 0.7328 0.7413 0.6912 0.7681 0.7276 0.7812 | 0.7987 0.8073 0.7972 0.8441 0.8202 0.8331
Crackmer [46] | 0.7395 0.7437 0.7229 0.7467 0.7346 0.7860 | 0.7429 0.7640 0.7501 0.7656 0.7578 0.7966
SimCrack [20] | 0.7559 0.7625 0.7380 0.7672 0.7523 0.7963 | 0.7984 0.8090 0.8051 0.8371 0.8208 0.8334
CSMamba [37] | 0.7371 0.7413 0.7053 0.7663 0.7346 0.7841 | 0.7879 0.7946 0.7947 0.8353 0.8146 0.8263
PlainMamba [55] | 0.7150 0.7189 0.6649 0.7616 0.7099 0.7699 | 0.7867 0.7967 0.7701 0.8523 0.8102 0.8253
MambaIR [16] | 0.7332 0.7347 0.7569 0.7013 0.7280 0.7834 | 0.7861 0.7930 0.7877 0.8387 0.8125 0.8249
SCSegamba (Ours) | 0.7741 0.7766 0.7629 0.7727 0.7678 0.8094 | 0.8204 0.8255 0.8241 0.8545 0.8390 0.8479

Table 1. Comparison with 9 SOTA methods across 4 datasets. Best results are in bold, and second-best results are underlined.
As illustrated in Figure 6, our method produces clearer and more precise feature maps, with superior detail capture in typical scenarios such as cement and bitumen, compared to other methods.

For the TUT dataset [33], which includes eight diverse scenarios, our method achieved the best performance, surpassing the next best method by 2.21% in F1 and 1.74% in mIoU. As shown in Figure 6, whether in the complex crack topology of plastic tracks, the noise-heavy backgrounds of metallic materials and turbine blades, or the low-contrast, dimly lit underground pipeline images, SCSegamba consistently produced high-quality segmentation maps while effectively suppressing irrelevant noise. This demonstrates that our method, with the enhanced crack morphology and texture perception from GBC and SASS, exhibits exceptional robustness and stability. Additionally, leveraging MFS for feature aggregation improves multi-scale perception, making our model particularly suited for diverse, interference-rich scenarios.

4.4. Complexity Analysis

Methods | Year | FLOPs↓ | Params↓ | Size↓
RIND [38] | 2021 | 695.77G | 59.39M | 453MB
SFIAN [5] | 2023 | 84.57G | 13.63M | 56MB
CTCrackSeg [44] | 2023 | 39.47G | 22.88M | 174MB
DTrCNet [48] | 2023 | 123.20G | 63.45M | 317MB
Crackmer [46] | 2024 | 14.94G | 5.90M | 43MB
SimCrack [20] | 2024 | 286.62G | 29.58M | 225MB
CSMamba [37] | 2024 | 145.84G | 35.95M | 233MB
PlainMamba [55] | 2024 | 73.36G | 16.72M | 96MB
MambaIR [16] | 2024 | 47.32G | 10.34M | 79MB
SCSegamba (Ours) | 2024 | 18.16G | 2.80M | 37MB

Table 2. Comparison of complexity with other methods. Best results are in bold, and second-best results are underlined.

Table 2 shows a comparison of the complexity of our method with other SOTA methods when the input image size is uniformly set to 512. With only 2.80M parameters and a model size of 37MB, our method surpasses all others, being 52.54% and 13.95% lower than the next best result, respectively. Additionally, compared to Crackmer [46], which prioritizes computational efficiency, our method's FLOPs are only 3.22G higher. This demonstrates that the combination of lightweight SAVSS and MFS enables high-quality segmentation in noisy crack scenes with minimal parameters and low computational load, which is essential for resource-constrained devices.

4.5. Ablation Studies

We performed ablation experiments on the representative multi-scenario dataset TUT [33].

Ablation study of segmentation heads. As listed in Table 3, with our designed MFS, SCSegamba achieved the best results across all six metrics, with F1 and mIoU scores 1.57% and 1.21% higher than those of the second-best method. In
Seg Head | ODS | OIS | P | R | F1 | mIoU | Params↓ | FLOPs↓ | Model Size↓
UNet [42] | 0.8055 | 0.8151 | 0.8148 | 0.8376 | 0.8260 | 0.8378 | 2.92M | 19.27G | 39MB
Ham [11] | 0.7703 | 0.7784 | 0.7962 | 0.7838 | 0.7909 | 0.8124 | 2.86M | 35.08G | 38MB
SegFormer [50] | 0.7947 | 0.7983 | 0.8170 | 0.8174 | 0.8172 | 0.8307 | 2.79M | 17.87G | 35MB
MFS | 0.8204 | 0.8255 | 0.8241 | 0.8545 | 0.8390 | 0.8479 | 2.80M | 18.16G | 37MB

Table 3. Ablation study of different segmentation heads. UNet [42], Ham [11], and SegFormer [50] are high-performance heads.
GBC PAF Res ODS OIS P R F1 mIoU Params ↓ FLOPs ↓ Model Size ↓
0.8136 0.8196 0.8213 0.8461 0.8335 0.8434 2.49M 16.75G 34MB
0.7998 0.8084 0.7918 0.8524 0.8222 0.8343 2.28M 14.91G 33MB
0.7936 0.8069 0.7952 0.8438 0.8197 0.8313 2.48M 15.65G 35MB
0.8047 0.8102 0.8174 0.8379 0.8275 0.8377 2.54M 17.08G 35MB
0.8116 0.8200 0.8156 0.8522 0.8334 0.8425 2.75M 17.82G 37MB
0.8023 0.8076 0.8219 0.8302 0.8260 0.8360 2.54M 15.99G 35MB
0.8204 0.8255 0.8241 0.8545 0.8390 0.8479 2.80M 18.16G 37MB
Table 4. Ablation study of components within the SAVSS block. Best results are in bold, and second-best results are underlined.
terms of complexity, although Params, FLOPs, and Model Size are only 0.01M, 0.29G, and 2MB larger than those of the SegFormer head, our method surpasses it in F1 and mIoU by 2.67% and 2.07%, respectively. This demonstrates that MFS enhances SAVSS output integration, significantly improving performance while keeping the model lightweight.

Ablation study of components. Table 4 shows the impact of each component in SAVSS on model performance. When fully utilizing GBC, PAF, and residual connections, our model achieved the best results across all metrics. Notably, adding GBC led to significant improvements in F1 and mIoU of 1.57% and 1.42%, respectively, highlighting its strength in capturing crack morphology cues. Similarly, residual connections boosted F1 and mIoU by 0.13% and 2.47%, indicating their role in focusing on essential crack features. Although using only PAF resulted in the lowest Params, FLOPs, and Model Size, it significantly reduced performance. These findings demonstrate that our fully integrated SAVSS effectively captures crack morphology and texture cues, achieving top pixel-level segmentation results while maintaining a lightweight model.

Scan | ODS | OIS | P | R | F1 | mIoU
Parallel | 0.8123 | 0.8184 | 0.8146 | 0.8523 | 0.8330 | 0.8427
Diag | 0.8091 | 0.8148 | 0.8225 | 0.8417 | 0.8320 | 0.8410
ParaSna | 0.8102 | 0.8162 | 0.8219 | 0.8365 | 0.8291 | 0.8408
DiagSna | 0.8153 | 0.8215 | 0.8237 | 0.8497 | 0.8365 | 0.8451
SASS | 0.8204 | 0.8255 | 0.8241 | 0.8545 | 0.8390 | 0.8479

Table 5. Ablation studies with different four-route scanning strategies in the SAVSS block, comparing parallel, diagonal, parallel snake, and diagonal snake scanning. The impact of different scanning strategies on complexity is negligible; thus, complexity analysis is omitted from this table. Best results are in bold, and second-best results are underlined.

Ablation studies of scanning strategies. As listed in Table 5, under the same conditions of using four different directional scanning paths, the model achieved the best performance with our designed SASS scanning strategy, improving F1 and mIoU by 0.30% and 0.33% over the diagonal snake strategy. This demonstrates SASS's ability to construct semantic and dependency information suited to crack topology, enhancing crack pixel perception in subsequent modules. More comprehensive experiments and real-world deployments are available in the Appendix.

5. Conclusion

In this paper, we proposed SCSegamba, a lightweight structure-aware Vision Mamba for precise pixel-level crack segmentation. SCSegamba combines SAVSS and MFS to enhance crack shape and texture perception with a low parameter count. Equipped with GBC and SASS scanning, SAVSS captures irregular crack textures across various structures. Experiments on four datasets show SCSegamba's exceptional performance, especially in complex, noisy scenarios. On the challenging multi-scenario dataset, it achieved an F1 score of 0.8390 and mIoU of 0.8479 with only 18.16G FLOPs and 2.8M parameters, demonstrating its effectiveness for real-world crack detection and suitability for edge devices. Future work will incorporate multi-modal cues to enhance segmentation quality, while further optimizing VSS design and scan strategies to achieve high-quality results with low computational resources.

6. Acknowledgement

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62272342, 62020106004, 62306212, and T2422015; the Tianjin Natural Science Foundation under Grants 23JCJQJC00070 and 24PTLYHZ00320; and the Marie Skłodowska-Curie Actions (MSCA) under Project No. 101111188.
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Supplementary Material
7. Details of SASS and Ablation Experiments

As described in Subsection 3.3, the SASS strategy enhances semantic capture in complex crack regions by scanning texture cues from multiple directions. SASS combines parallel snake and diagonal snake scans, aligning the scanning paths with the actual extension and irregular shapes of cracks, ensuring comprehensive capture of texture information.

To evaluate the necessity of using four scanning paths in SASS, we conducted ablation experiments with different path numbers across various scanning strategies on the multi-scenario dataset TUT. As listed in Table 6, all strategies performed significantly better with four paths than with two, likely because four paths allow SAVSS to capture finer crack details and topological cues. Notably, aside from SASS, the diagonal snake-like scan consistently achieved the second-best results, with two-path configurations yielding F1 and mIoU scores 0.48% and 0.45% higher than the diagonal unidirectional scan. This indicates that the diagonal snake-like scan provides more continuous semantic information, enhancing segmentation. Importantly, our proposed SASS achieved the best results with both two-path and four-path setups, demonstrating its effectiveness in capturing diverse crack topologies.

To clarify the implementation of our proposed SASS, we present its execution process in Algorithm 1.

8. Details of Objective Function and Analysis

The calculation formulas for the BCE [29] loss and Dice [43] loss are as follows:

L_{Dice} = 1 - \frac{2 \sum_{j=1}^{M} p_j \hat{p}_j + \epsilon}{\sum_{j=1}^{M} p_j + \sum_{j=1}^{M} \hat{p}_j + \epsilon}  (20)

L_{BCE} = -\frac{1}{M} \sum_{j=1}^{M} \left[ p_j \log(\hat{p}_j) + (1 - p_j) \log(1 - \hat{p}_j) \right]  (21)

where M denotes the number of samples, p_j is the ground truth label for the j-th sample, and \hat{p}_j is the predicted probability for the j-th sample.

As listed in Table 7, setting the α to β ratio at 1:5 yields the best performance, with improvements of 0.65% in F1

α:β | ODS | OIS | P | R | F1 | mIoU
BCE | 0.8099 | 0.8151 | 0.8207 | 0.8457 | 0.8330 | 0.8414
Dice | 0.8022 | 0.8072 | 0.8038 | 0.8430 | 0.8231 | 0.8358
5:1 | 0.8125 | 0.8168 | 0.8207 | 0.8432 | 0.8319 | 0.8428
4:1 | 0.8144 | 0.8184 | 0.8217 | 0.8442 | 0.8328 | 0.8437
3:1 | 0.8180 | 0.8229 | 0.8293 | 0.8436 | 0.8364 | 0.8463
2:1 | 0.8098 | 0.8152 | 0.8204 | 0.8392 | 0.8297 | 0.8408
1:1 | 0.8123 | 0.8184 | 0.8141 | 0.8507 | 0.8320 | 0.8423
1:2 | 0.8152 | 0.8214 | 0.8210 | 0.8484 | 0.8345 | 0.8443
1:3 | 0.8109 | 0.8163 | 0.8226 | 0.8396 | 0.8310 | 0.8418
1:4 | 0.8133 | 0.8185 | 0.8163 | 0.8515 | 0.8336 | 0.8433
1:5 | 0.8204 | 0.8255 | 0.8241 | 0.8545 | 0.8390 | 0.8479

Table 7. Sensitivity analysis experiments with different α and β ratios. Best results are in bold, and second-best results are underlined.
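Eqs. (20)-(21) and the combined objective L = α·L_BCE + β·L_Dice can be sketched in pure Python. The ε value and the probability clamping below are illustrative assumptions for numerical stability; the default 1:5 weighting follows the best ratio in Table 7:

```python
import math

def dice_loss(p, p_hat, eps=1.0):
    # Eq. (20): 1 - (2*sum(p*p_hat) + eps) / (sum(p) + sum(p_hat) + eps)
    num = 2.0 * sum(a * b for a, b in zip(p, p_hat)) + eps
    den = sum(p) + sum(p_hat) + eps
    return 1.0 - num / den

def bce_loss(p, p_hat, clamp=1e-7):
    # Eq. (21), averaged over the M pixels, with clamping so log() stays finite
    s = 0.0
    for a, b in zip(p, p_hat):
        b = min(max(b, clamp), 1.0 - clamp)
        s += a * math.log(b) + (1.0 - a) * math.log(1.0 - b)
    return -s / len(p)

def total_loss(p, p_hat, alpha=1.0, beta=5.0):
    return alpha * bce_loss(p, p_hat) + beta * dice_loss(p, p_hat)

gt = [1, 1, 0, 0]
pred = [0.9, 0.8, 0.1, 0.2]
print(total_loss(gt, pred))
```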
Figure 7. Visual comparison with 9 SOTA methods across four public datasets. Red boxes highlight critical details, and green boxes mark
misidentified regions.
primarily include bitumen, concrete, and brick scenarios with minimal background noise and a range of crack thicknesses, our method consistently achieves accurate segmentation, even capturing intricate fine cracks. This is attributed to GBC's strong capability in capturing crack morphology. In contrast, other methods show weaker performance in continuity and fine segmentation, resulting in discontinuities and expanded segmentation areas that do not align with actual crack images.

For the TUT [33] dataset, which includes diverse scenarios and significant background noise, our method excels at suppressing interference. For instance, in images of cracks
Layer Num | ODS | OIS | P | R | F1 | mIoU | Params↓ | FLOPs↓ | Model Size↓
2 | 0.8102 | 0.8165 | 0.8181 | 0.8420 | 0.8299 | 0.8413 | 1.56M | 12.26G | 20MB
4 | 0.8204 | 0.8255 | 0.8241 | 0.8545 | 0.8390 | 0.8479 | 2.80M | 18.16G | 37MB
8 | 0.8174 | 0.8222 | 0.8199 | 0.8579 | 0.8387 | 0.8461 | 5.23M | 29.27G | 68MB
16 | 0.8126 | 0.8187 | 0.8226 | 0.8475 | 0.8349 | 0.8430 | 10.08M | 51.51G | 127MB
32 | 0.5203 | 0.5365 | 0.5830 | 0.5680 | 0.5754 | 0.6785 | 19.79M | 95.97G | 247MB

Table 8. Experiments with different numbers of SAVSS layers. Best results are in bold, and second-best results are underlined.
Table 9. Experiments with different patch sizes. Best results are in bold, and second-best results are underlined.
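The Patch Size trade-off explored in Table 9 follows directly from the patch-embedding arithmetic. A sketch in pure Python: the 512x512 RGB input matches the paper's setting, while the embedding width of 64 is an assumption for illustration only:

```python
def patch_embed_stats(patch, img=512, in_ch=3, embed_dim=64):
    seq_len = (img // patch) ** 2         # tokens handed to the SAVSS scan
    per_patch = in_ch * patch * patch     # flattened values in one patch
    proj_weights = per_patch * embed_dim  # linear patch-projection parameters
    return seq_len, per_patch, proj_weights

for ps in (4, 8, 16, 32):
    print(ps, patch_embed_stats(ps))
# patch 4  -> 16384 tokens, 48 values per patch (long sequence, fine detail)
# patch 32 ->   256 tokens, 3072 values per patch (per-patch load rises)
```

A larger patch shortens the scanned sequence but raises the computational load per patch, consistent with the behavior reported around Figure 9.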
Table 10. Comparison experiments of different Mamba-based methods using 4 VSS layers. Best results are in bold, and second-best results
are underlined.
on generator blades and steel pipes, it effectively minimizes irrelevant noise and provides precise crack segmentation. This performance is largely attributed to SAVSS's accurate capture of crack topologies. In contrast, CNN-based methods like RIND [38] and SFIAN [5] struggle to distinguish background noise from crack regions, highlighting their limitations in contextual dependency capture. Other Transformer- and Mamba-based methods also fall short in segmentation continuity and detail handling compared to our approach.

10. Additional Analysis

To provide a thorough demonstration of the necessity of each component in our proposed SCSegamba, we conducted more extensive analysis experiments.

Comparison with different numbers of SAVSS layers. In our SCSegamba, we used 4 layers of SAVSS blocks to balance performance and computational requirements. As listed in Table 8, 4 layers achieved optimal results, with F1 and mIoU scores 0.036% and 0.21% higher than with 8 layers, while reducing parameters by 2.43M, computation by 11.11G, and model size by 31MB. Although using only 2 layers minimized resource demands, with 1.56M parameters, performance decreased. Conversely, using 32 layers increased resource use and reduced performance due to redundant features, which impacted generalization. Thus, 4 SAVSS layers strike an effective balance between performance and resource efficiency, making the model ideal for practical applications.

Comparison with different Patch Sizes. In our SAVSS, we set the Patch Size to 8 during Patch Embedding. To verify its effectiveness, we conducted experiments with various Patch Sizes. As listed in Table 9, a Patch Size of 8 yields the best performance, with F1 and mIoU scores 1.16% and 1.17% higher than a Patch Size of 4. Although a smaller Patch Size of 4 reduces parameters and model size, it limits the receptive field and hinders the effective capture of longer textures, impacting segmentation. As shown in Figure 9, as the Patch Size increases, parameter count and model size decrease, but the computational load per patch rises, affecting efficiency. At a Patch Size of 32, performance drops significantly due to reduced fine-grained detail capture and sensitivity to contextual variations. Thus, a Patch Size of 8 balances detail accuracy and generalization while maintaining model efficiency.

Comparison under the same number of VSS layers. In Subsection 4.3, we compare SCSegamba with other SOTA methods, using default VSS layer settings for Mamba-based models like MambaIR [16], CSMamba [37], and PlainMamba [55]. To examine complexity and performance under uniform VSS layer counts, we set all Mamba-based models to 4 VSS layers and conducted comparisons. As listed in Tables 2 and 10, although the computational requirements for MambaIR, CSMamba, and PlainMamba de-
Figure 8. Schematic of real-world deployment. The intelligent vehicle is placed on an outdoor road surface, and we use the server terminal
to remotely control it. The vehicle transmits the video data in real-time to the server, where it is processed to obtain the final output.
crease, their performance drops significantly. For example, CSMamba's F1 and mIoU scores drop to 0.7503 and 0.7773. While PlainMamba with 4 layers achieves reductions of 0.60M in parameters, 4.07G in FLOPs, and 19MB in model size, SCSegamba surpasses it by 4.04% in F1 and 3.39% in mIoU. Thus, with 4 SAVSS layers, SCSegamba balances performance and efficiency, capturing crack morphology and texture for high-quality segmentation.

Figure 9. Comparison of computing resources required for different Patch Sizes.

11. Real-world Deployment Applications

To validate the effectiveness of our proposed SCSegamba in real-world applications, we conducted a practical deployment and compared its real-world performance with other SOTA methods. Specifically, our experimental system consists of two main components: the intelligent vehicle and the server. The intelligent vehicle used is a Turtlebot4 Lite driven by a Raspberry Pi 4, equipped with a LiDAR and a camera. The camera model is OAK-D-Pro, fitted with an OV9282 image sensor capable of capturing high-quality crack images. The server is a laptop equipped with a Core i9-13900 CPU running Ubuntu 22.04. The intelligent vehicle and server communicate via the internet. This setup simulates resource-limited equipment to evaluate the performance of our SCSegamba in real-world deployment scenarios.

As shown in Figure 8, in the real-world deployment process, the intelligent vehicle was placed on an outdoor road surface filled with cracks. We remotely controlled the vehicle from the server terminal, directing it to move forward in a straight line at a speed of 0.15 m/s. The camera captured video at a frame rate of 30 frames per second. The vehicle transmitted the recorded video data to the server in real time via the network. To accelerate data transmission from the vehicle to the server, we set the recording resolution to 512 × 512. Upon receiving the video data, the server first segmented it into frames, then fed each frame into the pre-trained SCSegamba model, which was trained on all datasets, for inference. After segmentation, the server
Methods Inf Time↓
RIND [38] 0.0909s
SFIAN [5] 0.0286s Algorithm 1 SASS execution process
CTCrackseg [44] 0.0357s 1: Input: Patch matrix dimensions H, W
DTrCNet [48] 0.0213s 2: Output: O = (o1, o2, o3, o4), O inverse =
Crackmer [46] 0.0323s (o1 inverse, o2 inverse, o3 inverse, o4 inverse),
SimCrack [20] 0.0345s D = (d1, d2, d3, d4)
CSMamba [37] 0.0625s 3: Initialize: L = H × W
PlainMamba [55] 0.1667s 4: Initialize (i, j) ← (0, 0) for o1, (H − 1, W − 1) if H is
MambaIR [16] 0.0400s odd else (H − 1, 0) for o2
SCSegamba (Ours) 0.0313s 5: id ← down, jd ← lef t if H is odd else right
6: while j < W or i ≥ 0 do
Table 11. Comparison of inference time with other SOTA methods
7: idx ← i × W + j, append idx to o1, set
on resource-constrained server.
o1 inverse[idx]
8: if id = down and i < H − 1 then
recombined the processed frames into a video, yielding the 9: i ← i + 1, add down to d1
final output. This setup simulates real-time crack segmen- 10: else
tation in an real-world production process. 11: j ← j + 1, id ← up if i = H − 1 else down, add
Additionally, we deployed the weight files of other right to d1
SOTA methods on the server for comparison. As listed 12: end if
in Table 11, our SCSegamba achieved an inference speed 13: idx ← i × W + j, append idx to o2, set
of 0.0313 seconds per frame on the resource-constrained o2 inverse[idx]
server, outperforming most other methods. This demon- 14: if jd = right and j < W − 1 then
strates that our method has excellent real-time performance, 15: j ← j + 1, add right to d2
making it suitable for real-time segmentation of cracks in 16: else
video data. 17: i ← i − 1, jd ← lef t if j = W − 1 else right,
As shown in Figure 10, compared to other SOTA meth- add up to d2
ods, our SCSegamba better suppresses irrelevant noise in 18: end if
video data and generates continuous crack region segmen- 19: end while
tation maps. For instance, although SSM-based methods 20: d1 ← [dstart ] + d1[: −1], d2 ← [dstart ] + d2[: −1]
like PlainMamba [55], MambaIR [16], and CSMamba [37] 21: for diag ← 0 to H + W − 2 do
achieve continuous segmentation, they tend to produce false 22: direction ← right if diag is even else down
positives in some irrelevant noise spots. Additionally, while 23: for k ← 0 to min(diag + 1, H, W ) − 1 do
CNN and Transformer-based methods achieve high metrics 24: i, j ← (diag−k, k) if diag is even else (k, diag−k)
and performance on datasets with faster inference speed,
their performance on video data is suboptimal, often show- 25: if j < W then
ing discontinuous segmentation and poor background sup- 26: idx ← i × W + j
pression. For example, cracks segmented by DTrCNet [48] 27: Append idx to o3, set o3 inverse[idx], add
and CTCrackSeg [44] exhibit significant discontinuities, direction to d3
and Crackmer [46] struggles to distinguish between crack 28: end if
and background regions. Based on the above real-world 29: i, j ← (diag − k, W − k − 1) if diag is even else
deployment results, our SCSegamba produces high-quality (k, W − diag + k − 1)
segmentation results on crack video data with low param- 30: if j < W then
eters and computational resources, making it more suit- 31: idx ← i × W + j
able for deployment on resource-constrained devices and 32: Append idx to o4, set o4 inverse[idx], add
demonstrating its strong performance in practical produc- direction to d4
tion scenarios. 33: end if
34: end for
35: end for
36: d3 ← [dstart ] + d3[: −1], d4 ← [dstart ] + d4[: −1]
37: Return: O, O inverse, D
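As a rough illustration of the kind of scan paths Algorithm 1 constructs — serpentine traversals plus diagonal traversals, each paired with an inverse permutation for scattering features back to image order — the following sketch generates index orders for an H × W patch grid. The function names (`snake_cols`, `snake_rows`, `diagonal`, `inverse`) are illustrative; this is a simplified stand-in, not the authors' exact traversal, which additionally records the per-step direction lists d1–d4.

```python
def snake_cols(H, W):
    """Column-serpentine scan: down even columns, up odd columns."""
    order = []
    for j in range(W):
        rows = range(H) if j % 2 == 0 else range(H - 1, -1, -1)
        order.extend(i * W + j for i in rows)
    return order

def snake_rows(H, W):
    """Row-serpentine scan: right on even rows, left on odd rows."""
    order = []
    for i in range(H):
        cols = range(W) if i % 2 == 0 else range(W - 1, -1, -1)
        order.extend(i * W + j for j in cols)
    return order

def diagonal(H, W, anti=False):
    """Visit the H + W - 1 diagonals in turn; anti=True mirrors the
    columns so the scan starts from the top-right corner instead."""
    order = []
    for d in range(H + W - 1):
        for i in range(max(0, d - W + 1), min(d + 1, H)):
            j = d - i
            order.append(i * W + (W - 1 - j if anti else j))
    return order

def inverse(order):
    """inv[idx] = position of flat patch index idx within the scan."""
    inv = [0] * len(order)
    for pos, idx in enumerate(order):
        inv[idx] = pos
    return inv

# Reorder flattened patch tokens along a scan path, then undo the reordering.
H, W = 4, 5
tokens = list(range(H * W))               # stand-ins for patch feature vectors
path = snake_cols(H, W)
seq = [tokens[i] for i in path]           # sequence order fed to the SSM
restored = [seq[p] for p in inverse(path)]
assert restored == tokens
```

Every path is a permutation of the L = H × W flat indices, so gathering along a path and scattering with its inverse is lossless — which is what lets the four scan directions be fused back into a single spatial feature map.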
Figure 10. Visualisation comparison on video data keyframes (rows: Original, GT, SCSegamba, CSMamba, PlainMamba, MambaIR, SimCrack, Crackmer, DTrCNet, CTCrackseg, SFIAN, RIND; columns: Frame 001 through Frame 501). The interval between keyframes is 100 frames in order to ensure continuity of observation. Red boxes highlight critical details, and green boxes mark misidentified regions.
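The per-frame inference speeds reported in Table 11 can be collected with a minimal timing harness along these lines; `mean_inference_time`, the `model` callable, and the warm-up count are illustrative choices, not part of the paper's tooling:

```python
import time

def mean_inference_time(model, frames, warmup=3):
    """Average per-frame latency in seconds. The first `warmup` calls are
    executed but excluded, so one-off costs (weight loading, cache and
    kernel initialization) do not inflate the per-frame estimate."""
    for frame in frames[:warmup]:
        model(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        model(frame)
    elapsed = time.perf_counter() - start
    return elapsed / max(len(frames) - warmup, 1)
```

Excluding warm-up iterations matters on a laptop-class server like the one described above, since first-call overheads would otherwise dominate an average taken over a short clip; averaging over several hundred frames further smooths out scheduler jitter.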
References

[1] Zaid Al-Huda, Bo Peng, Riyadh Nazar Ali Algburi, Mugahed A Al-antari, AL-Jarazi Rabea, Omar Al-maqtari, and Donghai Zhai. Asymmetric dual-decoder-u-net for pavement crack semantic segmentation. Automation in Construction, 156:105138, 2023. 2
[2] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision, pages 801–818, 2018. 1
[3] Zhuangzhuang Chen, Jin Zhang, Zhuonan Lai, Jie Chen, Zun Liu, and Jianqiang Li. Geometry-aware guided loss for deep crack recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4703–4712, 2022. 1
[4] Zhuangzhuang Chen, Zhuonan Lai, Jie Chen, and Jianqiang Li. Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12698–12708, 2024. 1
[5] Xu Cheng, Tian He, Fan Shi, Meng Zhao, Xiufeng Liu, and Shengyong Chen. Selective feature fusion and irregular-aware network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems, 2023. 1, 2, 6, 7, 3, 5
[6] Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509, 2019. 2
[7] Wooram Choi and Young-Jin Cha. Sddnet: Real-time crack segmentation. IEEE Transactions on Industrial Electronics, 67(9):8016–8025, 2019. 2
[8] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In International conference on machine learning, pages 933–941. PMLR, 2017. 4
[9] Jiaxiu Dong, Niannian Wang, Hongyuan Fang, Wentong Guo, Bin Li, and Kejie Zhai. Mfafnet: An innovative crack intelligent segmentation method based on multi-layer feature association fusion network. Advanced Engineering Informatics, 62:102584, 2024. 1, 2
[10] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In The International Conference on Learning Representations, 2021. 1, 3
[11] Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, and Zhouchen Lin. Is attention better than matrix decomposition? In International Conference on Learning Representations, 2021. 8
[12] Karan Goel, Albert Gu, Chris Donahue, and Christopher Ré. It's raw! audio generation with state-space models. In International Conference on Machine Learning, pages 7616–7633. PMLR, 2022. 3
[13] Albert Gu, Karan Goel, Ankit Gupta, and Christopher Ré. On the parameterization and initialization of diagonal state space models. Advances in Neural Information Processing Systems, 35:35971–35983, 2022. 3
[14] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. In The International Conference on Learning Representations, 2022. 2, 3
[15] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. In The International Conference on Learning Representations, 2022. 2
[16] Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, and Shu-Tao Xia. Mambair: A simple baseline for image restoration with state-space model. In European Conference on Computer Vision, pages 222–241. Springer, 2024. 2, 6, 7, 3, 5
[17] Jing-Ming Guo, Herleeyandi Markoni, and Jiann-Der Lee. Barnet: Boundary aware refinement network for crack detection. IEEE Transactions on Intelligent Transportation Systems, 23(7):7343–7358, 2021. 2
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 2
[19] Yung-An Hsieh and Yichang James Tsai. Machine learning for crack detection: Review and model performance comparison. Journal of Computing in Civil Engineering, 34(5):04020038, 2020. 1
[20] Achref Jaziri, Martin Mundt, Andres Fernandez, and Visvanathan Ramesh. Designing a hybrid neural system to learn real-world crack segmentation from fractal-based simulation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 8636–8646, 2024. 6, 7, 5
[21] Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning, pages 5156–5165. PMLR, 2020. 2
[22] Iason Katsamenis, Eftychios Protopapadakis, Nikolaos Bakalos, Andreas Varvarigos, Anastasios Doulamis, Nikolaos Doulamis, and Athanasios Voulodimos. A few-shot attention recurrent residual u-net for crack segmentation. In International Symposium on Visual Computing, pages 199–209. Springer, 2023. 5, 6, 1
[23] Narges Kheradmandi and Vida Mehranfar. A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Construction and Building Materials, 321:126162, 2022. 1
[24] Hong Lang, Ye Yuan, Jiang Chen, Shuo Ding, Jian John Lu, and Yong Zhang. Augmented concrete crack segmentation: Learning complete representation to defend background interference in concrete pavements. IEEE Transactions on Instrumentation and Measurement, 2024. 1
[25] David Lattanzi and Gregory R Miller. Robust automated concrete damage detection algorithms for field applications. Journal of Computing in Civil Engineering, 28(2):253–262, 2014. 2
[26] Qin Lei, Jiang Zhong, and Chen Wang. Joint optimization of crack segmentation with an adaptive dynamic threshold module. IEEE Transactions on Intelligent Transportation Systems, 2024. 2
[27] Huaiyuan Li, Hui Li, Chuang Li, Baohai Wu, and Jinghuai Gao. Hybrid swin transformer-cnn model for pore-crack structure identification. IEEE Transactions on Geoscience and Remote Sensing, 2024. 2
[28] Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, and Chengjie Wang. Lors: Low-rank residual structure for parameter-efficient network stacking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15866–15876, 2024. 4
[29] Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, and Jinming Duan. Rediscovering bce loss for uniform classification. arXiv preprint arXiv:2403.07289, 2024. 5, 1
[30] Jianghai Liao, Yuanhao Yue, Dejin Zhang, Wei Tu, Rui Cao, Qin Zou, and Qingquan Li. Automatic tunnel crack inspection using an efficient mobile imaging module and a lightweight cnn. IEEE Transactions on Intelligent Transportation Systems, 23(9):15190–15203, 2022. 1
[31] Huajun Liu, Xiangyu Miao, Christoph Mertz, Chengzhong Xu, and Hui Kong. Crackformer: Transformer network for fine-grained crack detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3783–3792, 2021. 2
[32] Huajun Liu, Jing Yang, Xiangyu Miao, Christoph Mertz, and Hui Kong. Crackformer network for pavement crack segmentation. IEEE Transactions on Intelligent Transportation Systems, 24(9):9240–9252, 2023. 2
[33] Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mianzhao Wang, and Shengyong Chen. Staircase cascaded fusion of lightweight local pattern recognition and long-range dependencies for structural crack segmentation. arXiv preprint arXiv:2408.12815, 2024. 1, 5, 7, 2
[34] Wenze Liu, Hao Lu, Hongtao Fu, and Zhiguo Cao. Learning to upsample by learning to sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6027–6037, 2023. 5
[35] Yahui Liu, Jian Yao, Xiaohu Lu, Renping Xie, and Li Li. Deepcrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing, 338:139–153, 2019. 2, 5, 6, 1
[36] Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. Vmamba: Visual state space model. arXiv preprint arXiv:2401.10166, 2024. 2, 3, 4
[37] Liu Mushui, Jun Dan, Ziqian Lu, Yunlong Yu, Yingming Li, and Xi Li. Cm-unet: Hybrid cnn-mamba unet for remote sensing image semantic segmentation. arXiv preprint arXiv:2405.10530, 2024. 2, 6, 7, 3, 5
[38] Mengyang Pu, Yaping Huang, Qingji Guan, and Haibin Ling. Rindnet: Edge detection for discontinuity in reflectance, illumination, normal and depth. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6879–6888, 2021. 6, 7, 3, 5
[39] Haochen Qi, Xiangwei Kong, Zhibo Jin, Jiqiang Zhang, and Zinan Wang. A vision-transformer-based convex variational network for bridge pavement defect segmentation. IEEE Transactions on Intelligent Transportation Systems, 2024. 2
[40] Zhong Qu, Wen Chen, Shi-Yan Wang, Tu-Ming Yi, and Ling Liu. A crack detection algorithm for concrete pavement based on attention mechanism and multi-features fusion. IEEE Transactions on Intelligent Transportation Systems, 23(8):11710–11719, 2021. 2
[41] Jianing Quan, Baozhen Ge, and Min Wang. Crackvit: a unified cnn-transformer model for pixel-level crack extraction. Neural Computing and Applications, 35(15):10957–10973, 2023. 2
[42] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer, 2015. 8
[43] Carole H Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M Jorge Cardoso. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, pages 240–248. Springer, 2017. 5, 1
[44] Huaqi Tao, Bingxi Liu, Jinqiang Cui, and Hong Zhang. A convolutional-transformer network for crack segmentation with boundary awareness. In 2023 IEEE International Conference on Image Processing, pages 86–90. IEEE, 2023. 2, 6, 7, 5
[45] A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems, 2017. 1, 2
[46] Jin Wang, Zhigao Zeng, Pradip Kumar Sharma, Osama Alfarraj, Amr Tolba, Jianming Zhang, and Lei Wang. Dual-path network combining cnn and transformer for pavement crack segmentation. Automation in Construction, 158:105217, 2024. 1, 6, 7, 5
[47] Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, and Yifeng Shi. Vit-comer: Vision transformer with convolutional multi-scale feature interaction for dense predictions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5493–5502, 2024. 1
[48] Chao Xiang, Jingjing Guo, Ran Cao, and Lu Deng. A crack-segmentation algorithm fusing transformers and convolutional neural networks for complex detection scenarios. Automation in Construction, 152:104894, 2023. 1, 2, 6, 7, 5
[49] Xiao Xiao, Shen Lian, Zhiming Luo, and Shaozi Li. Weighted res-unet for high-quality retina vessel segmentation. In 2018 9th international conference on information technology in medicine and education, pages 327–331. IEEE, 2018. 2
[50] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090, 2021. 8
[51] Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, and Zitong Yu. Fusionmamba: Dynamic feature enhancement for multimodal image fusion with mamba. Visual Intelligence, 2(1):37, 2024. 2
[52] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 578–588. Springer, 2024. 2
[53] Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, Haoning Wu, Qiong Yan, and Weisi Lin. Boosting image quality assessment through efficient transformer adaptation with local feature enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2662–2672, 2024. 2
[54] Tomoyuki Yamaguchi, Shingo Nakamura, Ryo Saegusa, and Shuji Hashimoto. Image-based crack detection for real concrete surfaces. IEEJ Transactions on Electrical and Electronic Engineering, 3(1):128–135, 2008. 2
[55] Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, and Elliot J Crowley. Plainmamba: Improving non-hierarchical mamba in visual recognition. arXiv preprint arXiv:2403.17695, 2024. 2, 3, 4, 6, 7, 5
[56] Fan Yang, Lei Zhang, Sijia Yu, Danil Prokhorov, Xue Mei, and Haibin Ling. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems, 21(4):1525–1535, 2019. 2, 5, 6, 1
[57] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4471–4480, 2019. 4
[58] Hang Zhang, Allen A Zhang, Zishuo Dong, Anzheng He, Yang Liu, You Zhan, and Kelvin CP Wang. Robust semantic segmentation for automatic crack detection within pavement images using multi-mixing of global context and local image features. IEEE Transactions on Intelligent Transportation Systems, 2024. 1
[59] Tianjie Zhang, Donglei Wang, and Yang Lu. Ecsnet: An accelerated real-time image segmentation cnn architecture for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems, 2023. 1
[60] Xiaohu Zhang and Haifeng Huang. Distilling knowledge from a transformer-based crack segmentation model to a light-weighted symmetry model with mixed loss function for portable crack detection equipment. Symmetry, 16(5):520, 2024. 3
[61] Jian Zhou, Peisen S Huang, and Fu-Pen Chiang. Wavelet-based pavement distress detection and evaluation. Optical Engineering, 45(2):027007–027007, 2006. 2
[62] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model. In International Conference on Machine Learning, 2024. 2, 3, 4