IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 17, 2024

An Adaptive Multiview SAR Automatic Target Recognition Network Based on Image Attention

Renli Zhang, Member, IEEE, Yuanzhi Duan, Jindong Zhang, Minhui Gu, Shurui Zhang, Member, IEEE, and Weixing Sheng

Abstract—The deep neural network has achieved remarkable recognition performance in synthetic aperture radar (SAR) automatic target recognition (ATR) by extracting discriminative features from massive SAR images. Due to the sensitivity of SAR images to the observation aspect, multiview ATR methods can enhance the robustness of feature representation and improve recognition performance. However, existing multiview ATR methods suffer from increasingly complex structures and heavy computation when the number of input images grows. An adaptive multiview fusion network based on image attention (IA-AMF-Net), compatible with a variable number of input images, is proposed for SAR ATR in this article. In IA-AMF-Net, first, depthwise separable convolution is employed to extract the classification features from multiple SAR input images in parallel with a lightweight attribute. Second, the channel feature weight vector of each image is generated and concatenated by applying the squeeze-and-excitation operation to the extracted features. The image attention weights for feature fusion are calculated through softmax normalizing the concatenated channel feature weights of the input images. At last, the extracted features from the multiview SAR images are fused by the obtained image attention weights. The dimension of the fused feature keeps constant regardless of the number of input images, and the attention to the classification features of interested images is enhanced. Experimental results on the moving and stationary target acquisition and recognition dataset show that IA-AMF-Net achieves superior recognition performance under various operating conditions with fewer parameters and lower computational load compared to the other networks.

Index Terms—Automatic target recognition (ATR), feature fusion, image attention, squeeze-and-excitation, multiview synthetic aperture radar (SAR).

Manuscript received 29 March 2024; revised 26 May 2024 and 14 July 2024; accepted 21 July 2024. Date of publication 26 July 2024; date of current version 8 August 2024. This work was supported by the National Natural Science Foundation of China under Grant 61971224, Grant 62001227, and Grant 62001230. (Corresponding author: Renli Zhang.)
The authors are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: zhangrenli_nust@163.com; yuanzhiduan@njust.edu.cn; zhangjindong@njust.edu.cn; guminhui10@njust.edu.cn; shuruizhang@njust.edu.cn; shengwx@njust.edu.cn).
Digital Object Identifier 10.1109/JSTARS.2024.3434496
© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

Remote sensing technology, as an important means of Earth observation, allows for the study and analysis of human life and natural resources [1]. Remotely sensed data mainly include various types of sensor images, such as synthetic aperture radar (SAR) images, optical images, and hyperspectral images. With the advantages of all-weather, all-day, and high-resolution imaging capabilities [2], SAR has become an important tool for remote sensing and reconnaissance in modern society, and has been widely used in fields such as resource exploration, environmental monitoring, and military applications [3]. A superpixelwise polarimetric SAR change detection method is employed in [4] for built-up area extraction. In [5], an end-to-end trainable deep network incorporating differentiable superpixel generation and merging steps is designed for SAR image segmentation.

SAR automatic target recognition (ATR) [6] aims to locate the region of interest (ROI), which contains potential targets, and then classify the targets in the ROI. Most SAR ATR methods have been investigated for the single-view SAR ATR scenario, including the machine learning-based [7], [8] and the deep learning (DL)-based [9], [10] methods. The machine learning-based SAR target recognition methods, e.g., support vector machines (SVM) [11], linear discriminant analysis [12], adaptive boosting (AdaBoost) [13], the conditional Gaussian model (CGM) [14], and iterative graph thickening (IGT) [15], tend to involve complex algorithms and to rely on a single feature attribute due to the handcrafted feature extraction.

The SAR ATR methods based on DL have advanced quickly with the inherent ability of DL to perform automatic feature extraction [16]. In [17] and [18], a deep convolutional neural network (CNN) and an all-convolutional networks (A-ConvNets) model addressing overfitting are presented for target classification. In [19], the importance of data augmentation for the robustness of CNNs is emphasized by dealing with the translation invariance of CNNs. In [20], multiscale feature fusion is combined with an attention module, and super-class labels are introduced in the multiscale attention super-class CNN (MSA-SCNN). The fully convolutional attention block (FCAB) [21] cooperates with a CNN to refine the important features and suppress the unnecessary ones; meanwhile, this method is computationally efficient with few parameters. In [22], a semisupervised CNN method integrates the information contained in the unlabeled samples into the loss function of the training process, which improves the SAR ATR accuracy when the labeled samples are insufficient.

In [23], an incremental learning method based on strong separability features (SSF-IL) is presented to address the model's forgetting of previously learned knowledge. SSF-IL enables the classifier to modify the decision boundary of old classes and mitigate the classification bias toward new classes.
A cutting-edge concept known as class incremental learning in novel category discovery for SAR targets (CNTs) is explored in [24]. In CNTs, the inherent information within SAR images is captured by self-supervised learning, and a multiview consistency and enhanced separability strategy is introduced to improve the performance of new category discovery. In order to improve the recognition performance of DL models, meta-learning [25], few-shot learning [26], and contrastive learning [27] have also been investigated. The methods above have been primarily applied to the single-view SAR ATR scenario.

Due to the backscattering characteristics of SAR targets, such as geometric distortion and aspect sensitivity, extracting sufficient information from a single-view SAR image for the ATR purpose can be challenging. Thus, the single-view SAR image suffers from inherent limitations for target recognition. Multiview SAR images of the same target contain richer classification information than a single one. Combining the complementary information from multiple views of SAR images allows for a more accurate and comprehensive interpretation of the target features, which is beneficial for enhancing the robustness of feature representation and improving the recognition accuracy in SAR ATR. Thus, multiview learning is an effective method for SAR ATR through data fusion of the multifeature sets [28]. Exploring the correlation between multiple views and semantic information is crucial in multiview SAR ATR [29].

In [30], the bidirectional long short-term memory (BLSTM) network based on spatial information learning utilizes 50 images as input for multiview SAR target recognition, and the amount of SAR images required in BLSTM is not suitable for practical ATR tasks. In [31], the classification features are learned and fused at different layers in the multiview deep convolutional neural network (MVDCNN) with a parallel network topology. In [32], a feature extraction and fusion network (FEF-Net) is designed by using distinct and useful learning modules, such as deformable convolution. In [33], a convolutional autoencoder (CAE) is utilized to extract features and improve the antinoise ability in the multiaspect SAR recognition network based on self-attention (MACTN). The training process of MACTN is complex because of the pretraining and fine-tuning. In [34], a multiview deep feature learning network (MVDFLN) is designed to learn the intraview and interview features from multiview SAR images with a complex structure.

Besides, many effective feature fusion methods have been investigated by designing complex network structures and learning algorithms. In [35], a competitive game framework is designed to minimize the utility function for infrared small target segmentation. The mask-guided multilevel fusion network in [36] independently explores the consistent and specific features of RGB-T modalities at three different levels for pedestrian detection. The bidirectional low-frequency amplitude fusion method in [37] augments the source and target domain images, and aligns the styles while preserving the content.

Although the multiview SAR ATR methods above have achieved remarkable recognition performance, the following problems still remain. The multiview SAR ATR frameworks in [31], [32], [33], and [34] are mainly applied to a fixed number of input images, and suffer from an increasing number of parameters and a high computational load when the number of input images grows. Therefore, those methods are not suitable for scenarios where the SAR ATR task is implemented on a digital processing platform with limited computing resources and variable numbers of input images. The SAR ATR methods in [31], [32], and [33] directly concatenate the inherent classification features from multiview SAR images for fusion, which decreases the feature fusion efficiency and increases the model size.

In order to adapt to the variability of the number of multiview SAR input images, and simultaneously maintain robust recognition performance with a low computational load, an adaptive multiview fusion network for SAR ATR based on image attention (IA-AMF-Net) is proposed in this article. In IA-AMF-Net, first, the weight-sharing feature extraction module (WFE-module) is utilized to extract the classification features from the multiview SAR input images in parallel; the same network and parameters are shared across the images. Second, the image attention-based feature fusion module (IAFF-module) is designed to efficiently fuse the features from multiple images based on the importance of the target features in each view image. In the IAFF-module, the channel feature weight vector for each image is generated by applying the squeeze-and-excitation operation to the extracted features. Then, the image attention weights, which model the interdependencies between images, are calculated through softmax normalizing the concatenated channel feature weights of the input images along the channel dimension. The features extracted from the multiview SAR images are fused by using the obtained image attention weights, such that the attention to the classification features of interested images is enhanced, and the dimension of the fusion feature keeps constant and is independent of the number of input images. At last, the fusion feature is fed into the classifier in the deep feature extraction and classification module (DFEC-module).

IA-AMF-Net has the capacity to adapt to variable numbers of SAR input images with a significantly reduced number of parameters and computational load compared to A-ConvNets, FCAB, MA-BLSTM, MVDCNN, FEF-Net, and MVDFLN, and achieves excellent and robust recognition performance.

The rest of this article is organized as follows. In Section II, the network framework of IA-AMF-Net is designed. Section III describes the details of the experiments and the experimental results. The reasonability of IA-AMF-Net and future work are discussed in Section IV. Finally, Section V concludes this article.

II. PROPOSED IA-AMF-NET FOR SAR ATR

In this section, IA-AMF-Net is designed to extract and fuse classification features from multiview images for SAR ATR. Fig. 1 shows the network architecture of IA-AMF-Net. Three main modules constitute the network, i.e., the WFE-module, the IAFF-module, and the DFEC-module.

In Fig. 1, k parallel SAR images are fed into the WFE-module. At first, the classification features of each image are extracted in the WFE-module using depthwise separable convolution (DSC) [38] with shared parameters. Then, the IAFF-module employs the image attention to fuse the extracted features from the multiview images.

Fig. 1. Basic architecture of IA-AMF-Net.

The dimension of the fusion feature is independent of the number of input images, and the attention to the classification features of interested images is enhanced. Finally, the DFEC-module calibrates the feature responses of the fusion feature, and the recognition result is obtained by the softmax classifier. The details of IA-AMF-Net are presented as follows.

A. WFE-Module

Drawing on the idea of the Siamese network, the classification features from each input image are extracted by the same structure and parameters in the WFE-module. As shown in Fig. 1, the structure of the WFE-module is composed of six DSCs and a max pooling layer. By employing lightweight convolution computation through DSC instead of standard convolution, the WFE-module extracts the classification features from multiple parallel input SAR images with shared parameters to ensure the lightweight attribute.

In the WFE-module, the classification feature $u_{ni}$ is extracted from the $i$th image within a multiview SAR image combination sample of $k$ images belonging to the target class $n$, where $n \in [1, N]$ and $N$ represents the number of target classes, $u_{ni} = [u_{ni}^1, u_{ni}^2, \ldots, u_{ni}^C]$ with $u_{ni}^l \in \mathbb{R}^{H \times W}$ representing the feature of the $l$th channel, and $C$, $H$, and $W$ denote the number of channels, height, and width of the classification feature $u_{ni}$, respectively.

Fig. 2 shows the structure of standard convolution and DSC. In standard convolution, each input channel corresponds to a convolutional kernel, and an output channel is the sum of the outputs of all convolutional kernels. DSC consists of a depthwise convolution for filtering and a pointwise convolution for combining. In depthwise convolution, each output channel is determined by a single convolution kernel sliding over an input channel [39]. Pointwise convolution adopts a 1 × 1 convolutional kernel to perform a linear combination on the output of the depthwise convolution. The computational load of DSC is reduced by a factor of $(1/O + 1/D_K^2)$ in comparison to standard convolution, where $O$ and $D_K \times D_K$ represent the number of output channels and the convolution kernel size, respectively [38]. Therefore, the model size and computation cost of the WFE-module are significantly reduced.
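As a concrete illustration of this building block, the following is a minimal PyTorch sketch of one DSC layer. This is an assumption for illustration only: the paper publishes no code, and the BatchNorm/ReLU pairing and the channel sizes shown are generic choices rather than the exact Table I configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: depthwise filtering + 1x1 pointwise combination."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise convolution: one kernel per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        # Pointwise convolution: a 1x1 kernel linearly combines the depthwise outputs.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Illustrative shape check on a 96 x 96 feature map with 32 channels.
x = torch.randn(1, 32, 96, 96)
print(DepthwiseSeparableConv(32, 64)(x).shape)   # torch.Size([1, 64, 96, 96])
```

As a rough worked example of the reduction factor above, with O = 64 output channels and a 3 × 3 kernel, 1/O + 1/D_K^2 = 1/64 + 1/9 ≈ 0.13, i.e., roughly an eight-fold saving over standard convolution.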
B. IAFF-Module

The IAFF-module calculates image-level weights through the squeeze-and-excitation operation to enhance the attention to the classification features of interested images under the condition of varying input image quantities.

Fig. 2. Structure of standard convolution and DSC. (a) Standard convolution. (b) DSC.

Fig. 3. Block diagram of IAFF-module.

Different from the squeeze-and-excitation attention (SEA) in [40], the squeeze-and-excitation operation in the IAFF-module is adopted for the purpose of modeling interdependencies between images rather than channels, thereby endowing IA-AMF-Net with the capacity to effectively fuse the features with a constant dimension.

Fig. 4. Structure of squeeze-and-excitation operation.

The detailed block diagram of the IAFF-module is shown in Fig. 3. First, the squeeze-and-excitation operation is used to generate the channel feature weight vector $s_{ni} \in \mathbb{R}^C$ for the extracted feature $u_{ni}$ of the $i$th SAR image. The squeeze-and-excitation operation is divided into a squeeze step and an excitation step, as shown in Fig. 4. In the squeeze step, global average pooling is applied to generate a statistic $z_{ni} = [z_{ni}^1, z_{ni}^2, \ldots, z_{ni}^C]$ by shrinking $u_{ni}$ through its spatial dimensions $H \times W$. $z_{ni}^l$ is calculated by

$$z_{ni}^l = \frac{1}{H \times W} \sum_{t=1}^{H} \sum_{j=1}^{W} u_{ni}^l(t, j) \qquad (1)$$

where $u_{ni}^l(t, j)$ denotes the $(t, j)$ element of $u_{ni}^l$. The squeeze process produces an embedding of the global distribution of channel-wise feature responses. In the excitation step, a simple gating mechanism with a sigmoid activation is used to fully capture the channel-wise dependencies and produce a collection of per-channel modulation weights. The gating mechanism is formed by creating a bottleneck with two fully connected (FC) layers around the nonlinearity. The channel weight vector $s_{ni}$ is calculated as follows:

$$s_{ni} = \sigma\left(W_2\, \delta\left(W_1 z_{ni}\right)\right) \qquad (2)$$
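A compact sketch of (1) and (2) is given below, assuming PyTorch for illustration; fc1 and fc2 stand in for W1 and W2, and the reduction ratio r = 16 is a common default rather than necessarily the paper's value.

```python
import torch
import torch.nn as nn

class SqueezeExcitationWeights(nn.Module):
    """Produces the channel feature weight vector s_ni of eq. (2) for one view's feature u_ni."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction, bias=False)  # W1: C -> C/r
        self.fc2 = nn.Linear(channels // reduction, channels, bias=False)  # W2: C/r -> C
        self.relu = nn.ReLU(inplace=True)    # delta in eq. (2)
        self.sigmoid = nn.Sigmoid()          # sigma in eq. (2)

    def forward(self, u):                    # u: (batch, C, H, W)
        z = u.mean(dim=(2, 3))               # squeeze, eq. (1): global average pooling -> (batch, C)
        return self.sigmoid(self.fc2(self.relu(self.fc1(z))))   # excitation, eq. (2) -> (batch, C)
```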

Fig. 5. Optical and SAR images of 10 targets. (a) 2S1. (b) BMP2. (c) BRDM2. (d) BTR60. (e) BTR70. (f) D7. (g) T62. (h) T72. (i) ZIL131. (j) ZSU234.

where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weight matrices of the two FC layers for dimensionality reduction and increase, respectively [40], $r$ is the reduction ratio, $\delta$ is the rectified linear unit function [41], and $\sigma$ is the sigmoid function. The obtained $s_{ni}$ represents the interdependencies between the $C$ channels of the feature $u_{ni}$.

Then, the channel feature weight vector of each image is employed to calculate the image attention weights for feature fusion of the multiview SAR images. The $k$ channel feature weight vectors from the multiview images are concatenated to obtain the multiview channel feature weights $S_n = [s_{n1}; s_{n2}; \ldots; s_{nk}]$, where $S_n \in \mathbb{R}^{k \times C}$. In order to enhance the attention to the classification features of interested images, the concatenated multiview channel feature weights are normalized by the softmax function along the channel dimension, and the image attention weights $\bar{S}_n = [\bar{s}_{n1}; \bar{s}_{n2}; \ldots; \bar{s}_{nk}]$ are calculated by

$$\bar{s}_{ni}^l = \frac{\exp\left(s_{ni}^l\right)}{\sum_{j=1}^{k} \exp\left(s_{nj}^l\right)} \qquad (3)$$

where $s_{ni}^l$ and $\bar{s}_{ni}^l$ denote the $l$th elements of $s_{ni}$ and $\bar{s}_{ni}$, respectively. Finally, the fusion feature $Q = [q_1, q_2, \ldots, q_C]$ is obtained by fusing the extracted features from the multiview images with the image attention weights as follows:

$$q_l = \sum_{i=1}^{k} \bar{s}_{ni}^l\, u_{ni}^l \qquad (4)$$

where $q_l \in \mathbb{R}^{H \times W}$, $l = 1, 2, \ldots, C$.

As shown in (4) and Fig. 3, the dimension of the fusion feature $Q$ is the same as that of the extracted classification feature $u_{ni}$ of a single image, and is independent of the number of input images. Therefore, the IAFF-module achieves autonomous feature fusion under the condition of varying input image quantities, and reduces the number of parameters.
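The image-attention fusion of (3) and (4) can be sketched as follows. This is an assumed PyTorch illustration, not the authors' implementation; it reuses the SqueezeExcitationWeights sketch given after (2), and the function name is hypothetical.

```python
import torch

def image_attention_fusion(features, se_weights):
    """
    features: list of k tensors u_ni, each of shape (batch, C, H, W), one per view.
    se_weights: module mapping (batch, C, H, W) -> (batch, C), as in eq. (2).
    Returns the fused feature Q of shape (batch, C, H, W), independent of k.
    """
    u = torch.stack(features, dim=1)                         # (batch, k, C, H, W)
    s = torch.stack([se_weights(f) for f in features], 1)    # S_n: (batch, k, C)
    s_bar = torch.softmax(s, dim=1)                          # eq. (3): normalize across the k views
    q = (s_bar.unsqueeze(-1).unsqueeze(-1) * u).sum(dim=1)   # eq. (4): weighted sum over views
    return q
```

Because the weighted sum in (4) runs over the view dimension, the output shape is fixed at (batch, C, H, W) no matter how many views are supplied, which is the property that lets the same trained network accept two, three, or four input images.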
C. DFEC-Module

The DFEC-module is designed at the end of IA-AMF-Net to enhance the depth of the fusion feature and perform the multiclass classification. As shown in Fig. 1, the DFEC-module utilizes DSCs and SEA to calibrate the multiview image fusion feature $Q$ obtained in the IAFF-module. Then, the calibrated fusion feature is transformed into a feature vector $[a_1, a_2, \ldots, a_N]$ through a global average pooling layer and an FC layer. At last, the softmax classifier is applied to compute the probability of the multiview SAR image combination sample belonging to the $m$th target class as follows:

$$p_m = \frac{\exp\left(a_m\right)}{\sum_{j=1}^{N} \exp\left(a_j\right)}. \qquad (5)$$

Hence, the class probabilities of the multiview SAR image combination sample are represented as $[p_1, p_2, \ldots, p_N]$.
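A minimal sketch of this classification head follows (assumed PyTorch; the calibration DSC/SEA layers of Fig. 1 are omitted here, so this only covers the pooling, FC, and softmax stage of eq. (5)).

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Global average pooling + FC layer + softmax over N target classes, as in eq. (5)."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, q):                    # q: calibrated fusion feature, (batch, C, H, W)
        a = self.fc(q.mean(dim=(2, 3)))      # global average pooling, then FC -> [a_1, ..., a_N]
        return torch.softmax(a, dim=1)       # class probabilities [p_1, ..., p_N], eq. (5)
```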
III. EXPERIMENTS AND RESULTS

The recognition performance and the computational load of IA-AMF-Net are evaluated under various operating conditions and compared with other ATR methods in this section.

A. Multiview SAR Image Construction

The moving and stationary target acquisition and recognition (MSTAR) SAR image dataset is used to evaluate the performance of IA-AMF-Net. All SAR images from MSTAR, including military vehicles and separate clutter, are collected by a spotlight SAR sensor in the X-band with a resolution of 0.3 m × 0.3 m and HH polarization. The publicly released dataset contains ten classes of vehicle targets under various operating conditions, such as different aspect angles, depression angles, and serial numbers. Fig. 5 shows the optical and SAR images of the ten targets.

The public MSTAR dataset consists of two types of SAR image acquisition conditions: the standard operating condition (SOC) and extended operating conditions (EOC). SOC refers to training and testing sets with similar imaging configurations and target types, while EOC has more complex scenes with significant variations in depression angles, target configurations, and target versions. The recognition performance of IA-AMF-Net is evaluated under SOC and EOC.

Fig. 6 shows the geometric model for multiview SAR ATR of a ground target. The multiview SAR images can be obtained by imaging the same target from different aspect and depression angles by radars on one or more platforms, and each view direction refers to a given depression and aspect angle in which the SAR image is obtained. Based on the MSTAR dataset, the depression angle of the multiview images is set to a constant, and the aspect angles of the images cover 0° to 360°. The SAR images are selected from the original MSTAR dataset, and the corresponding multiview image samples are constructed and composed of multiple separate images with continuous variations in aspect angles.

TABLE I
NETWORK CONFIGURATION OF IA-AMF-NET

Fig. 6. Geometric model for multiview SAR imaging.

Fig. 7. Confusion matrix of a two-view IA-AMF-Net under SOC.

For a given view interval θ and view number k > 1, the multiview image sample $X_n$ is constructed by $k$ selected images from the same target class $n$ as

$$X_n = \{x_{n1}, x_{n2}, \ldots, x_{nk}\} \qquad (6)$$

with

$$\left|\varphi\left(x_{ni}\right) - \varphi\left(x_{nj}\right)\right| \le \theta, \quad x_{ni}, x_{nj} \in X_n \qquad (7)$$

where $\varphi(x_{ni})$ denotes the aspect angle of the SAR image $x_{ni}$.
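For illustration, a simple way to enumerate such combinations from per-image aspect angles might look like the sketch below. This is an assumed Python sketch with hypothetical names; the paper's exact selection procedure (e.g., how angle wrap-around at 0°/360° is handled and how many combinations are retained) is not specified in the excerpt and may differ.

```python
from itertools import combinations

def build_multiview_samples(images, aspect_angles, k, theta):
    """
    images: list of SAR image chips of one target class n.
    aspect_angles: aspect angle in degrees for each chip, same order as images.
    Returns all k-view combinations X_n whose pairwise aspect differences satisfy eq. (7).
    """
    samples = []
    for idx in combinations(range(len(images)), k):
        angles = [aspect_angles[i] for i in idx]
        if max(angles) - min(angles) <= theta:              # pairwise condition of eq. (7)
            samples.append(tuple(images[i] for i in idx))   # one multiview sample X_n, eq. (6)
    return samples
```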
B. Network Architecture Setup

In IA-AMF-Net, the size of the input image for each view is 96 × 96. During the training phase, the learning rate starts at 0.001 and decreases according to a cosine annealing schedule, and the mini-batch size is set to 32. The other hyperparameters of IA-AMF-Net are listed in Table I. The hyperparameters in Table I are denoted as (number of feature maps)@(kernel size in depthwise convolution). "SE-Oper" denotes the squeeze-and-excitation operation; "WS," "SS," and "NN" denote the window size, stride size, and number of neurons, respectively. The stride size of each pointwise convolution is 1 × 1.
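Under the stated settings (initial learning rate 0.001, cosine annealing, mini-batch size 32), the training-loop plumbing might be wired up as in the following sketch. The optimizer type, the epoch count, and the stand-in model and data are assumptions for illustration, since the excerpt does not specify them.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for IA-AMF-Net; only the schedule/batch plumbing is the point here.
model = nn.Sequential(nn.Flatten(), nn.Linear(96 * 96, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)                      # optimizer type assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)   # cosine annealing

for epoch in range(100):                          # epoch count is illustrative
    for _ in range(10):                           # stand-in for the mini-batches of an epoch
        x = torch.randn(32, 1, 96, 96)            # mini-batch of 32 single-channel 96 x 96 chips
        y = torch.randint(0, 10, (32,))
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                              # decay the learning rate along the cosine schedule
```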
The aspect interval θ is set to 45° for performance evaluation, and the numbers of images k in the multiview image sequence are set to 2, 3, and 4, respectively. Taking k = 3 as an example, a three-view image combination contains three SAR images from the same class of target with the same depression angle and aspect angle differences of no more than 45°.

Utilizing the characteristic that the dimension of the fusion feature keeps constant regardless of the number of input images, IA-AMF-Net is trained with the number of multiview images k = 2, and the recognition performance of the network is evaluated under two-view, three-view, and four-view input scenarios.

C. Recognition Performance Under SOC

In this experiment, ten classes of target images under SOC are selected as the experimental data. Table II shows the usage of raw SAR images for training and testing samples. Among the raw SAR images, only part of the images collected at a radar operating depression angle of 17° are selected to generate the multiview SAR image combinations for network training, and the images collected at a depression angle of 15° are used to generate the testing samples. The method described in Section III-A is used to generate the multiview SAR image combinations. In this way, 22 016 two-view SAR image combinations are generated for training IA-AMF-Net. For each target class, 2000 samples are randomly selected as a testing set from the combinations of all the raw images collected at a depression angle of 15°. In total, there are 20 000 samples for testing.

Figs. 7–9 show the recognition results of IA-AMF-Net with two-view, three-view, and four-view inputs, respectively, presented as confusion matrices. In the confusion matrices, the rows and columns represent the actual class labels of the targets and the predicted class labels, respectively, with the elements denoting the probability of recognition as a certain class. As shown in Figs. 7–9, the overall recognition rates of IA-AMF-Net with two-view, three-view, and four-view inputs are 99.24%, 99.73%, and 99.84%, respectively.

TABLE II
NUMBER OF RAW IMAGES FOR TRAINING AND TESTING SAMPLES UNDER SOC

TABLE III
NUMBER OF RAW IMAGES FOR TRAINING AND TESTING SAMPLES UNDER EOC-D

The recognition rates of the majority of target classes reach 100%. IA-AMF-Net has the ability to extract more classification features from multiview SAR images and effectively fuse them, thereby achieving precise recognition performance in the SOC ATR experiments.

To visually demonstrate the classification capability of IA-AMF-Net, the t-distributed stochastic neighbor embedding (t-SNE) algorithm [42] is employed to perform visual analysis with part of the multiview testing samples. The input samples and the corresponding outputs of the FC layer can be considered as representations of the samples in the high-dimensional space, and are mapped into the 2-D Euclidean space by t-SNE to illustrate the classification performance.

Fig. 8. Confusion matrix of a three-view IA-AMF-Net under SOC.

Fig. 9. Confusion matrix of a four-view IA-AMF-Net under SOC.

Fig. 10 shows a visualization example of the input samples and the outputs of the FC layer for IA-AMF-Net with three-view inputs. The points with the same color represent the same target class. In Fig. 10(a), the labels of the input samples are distributed disorderly and are difficult to classify in practice. After being processed by IA-AMF-Net, as shown in Fig. 10(b), the samples belonging to the same class are clustered more closely together, and the samples from different classes are separated clearly from each other, which brings excellent recognition results.

D. Recognition Performance Under EOC

Compared with SOC, EOC leads to greater difficulty in recognition. EOC includes depression-variant EOC (EOC-D) and version-variant EOC (EOC-V) [31].

First, the recognition performance of IA-AMF-Net is evaluated by recognizing four classes of targets in EOC-D from the MSTAR dataset: BRDM2, ZSU234, 2S1, and T72. The numbers of raw images for training and testing samples are shown in Table III. Only part of the images with a depression angle of 17° are selected to generate the multiview SAR image combinations as the training samples. In this way, 9120 two-view SAR image combinations are generated for training. The images collected at a depression angle of 30° are used to generate the testing samples. For each class, 2000 samples are randomly selected as the testing set. Notably, the T72-sn132 version is used for training and the T72-A64 version is used for testing.

As shown in Figs. 11–13, IA-AMF-Net with two-view, three-view, and four-view inputs achieves superior recognition results, reaching overall recognition rates of 97.49%, 99.13%, and 99.99%, respectively. The recognition rates for all classes are higher than 96%. As the number of multiview images increases, the corresponding recognition performance improves. The recognition rate of IA-AMF-Net with four-view inputs is close to 100%. Despite the large differences in the depression angles between the training and testing sets, IA-AMF-Net still possesses satisfactory recognition performance.

Fig. 10. Results of visualization. (a) Input samples. (b) Outputs of the FC layer.

Fig. 11. Confusion matrix of a two-view IA-AMF-Net under EOC-D.

Fig. 12. Confusion matrix of a three-view IA-AMF-Net under EOC-D.

TABLE IV
NUMBER OF RAW IMAGES FOR TRAINING SAMPLES UNDER EOC-V

TABLE V
NUMBER OF RAW IMAGES FOR TESTING SAMPLES UNDER EOC-V

Fig. 13. Confusion matrix of a four-view IA-AMF-Net under EOC-D.

Next, the experiment is conducted on EOC-V using four classes of targets: BRDM2, BTR70, BMP2sn-9563, and T72sn-132. Part of the raw images with a depression angle of 17° are selected to generate the training samples. The number of raw images for training is listed in Table IV. Thus, 7233 two-view SAR image combinations are generated as the training set. As shown in Table V, all the raw images of five version variants of T72, collected at depression angles of 17° and 15°, are used to generate SAR image combinations for testing. In total, 2000 testing samples are selected randomly from the multiview combinations for each class of target.

TABLE VI
CONFUSION MATRIX OF IA-AMF-NET UNDER EOC-V

TABLE VII
COMPARISON OF THE NUMBER OF PARAMETERS AND FLOPS

As shown in Table VI, IA-AMF-Net with two-view, three-view, and four-view inputs reaches overall recognition rates of 99.07%, 99.82%, and 99.99%, respectively. Remarkably, the recognition rates for the major classes can reach 100%.

The experimental results above demonstrate that IA-AMF-Net achieves outstanding recognition performance under different ATR operating conditions, indicating its potential capacity for SAR ATR tasks in remote sensing applications.

E. Recognition Performance Comparison

In this section, the performance of IA-AMF-Net is compared with that of other methods, including the traditional CNN [17], A-ConvNets [18], the CNN using augmented training data [19], FCAB [21], the multiaspect-aware BLSTM (MA-BLSTM) [30], MVDCNN [31], FEF-Net [32], MVDFLN [34], SVM [11], CGM [14], AdaBoost [13], IGT [15], MSA-SCNN [20], MACTN [33], and the multiview feature extraction and discrimination network [43].

Table VII illustrates the recognition rates, the number of parameters (# Params), and the floating point operations (FLOPs) of the different networks under SOC. IA-AMF-Net with two-view, three-view, and four-view inputs is denoted as 2-IA-AMF-Net, 3-IA-AMF-Net, and 4-IA-AMF-Net, respectively. As shown in Table VII, the recognition performance of IA-AMF-Net is superior to that of the other networks. The number of parameters in IA-AMF-Net keeps constant when the number of input images grows, and is less than that of the lightweight network FCAB. The number of parameters and FLOPs in IA-AMF-Net are both less than 4% of those in MVDCNN. Benefiting from the DSC structure, the parameter sharing strategy, and the constant dimension of the fused feature, the IA-AMF-Net method is lightweight and efficient for SAR ATR.

Next, the recognition performance of IA-AMF-Net under SOC and EOC is compared with that of other methods. Because these methods are based on the MSTAR dataset with different principles and implementations, the various formats of the input samples used for training make a direct performance comparison difficult [31]. Thus, we simultaneously compare the recognition rate and the number of raw images used for training samples in these methods. An ATR method is identified as superior if it achieves a higher recognition rate with fewer raw images for training.

Table VIII shows the recognition rates of the various methods under SOC and EOC, as well as the number of raw images for training. It can be observed that the recognition performance of the methods based on DL surpasses that of the traditional methods. Due to the rich target classification information of multiview images, the recognition ability of the multiview methods is superior to that of the single-view methods.

TABLE VIII
COMPARISON OF RECOGNITION RATE AND THE NUMBER OF RAW IMAGES

Fig. 14. Feature maps of 1st DSC layer in DFEC-module. (a) Single-view input. (b) Three-view inputs.

TABLE IX
ABLATION EXPERIMENTS OF IA-AMF-NET

IA-AMF-Net achieves the highest recognition performance compared with the other methods, and the recognition rates of IA-AMF-Net are at least 2.87% and 0.36% higher than those of the other methods under EOC-D and EOC-V, respectively. Utilizing the network trained with two-view inputs, IA-AMF-Net possesses an excellent and robust recognition ability in the three-view and four-view scenarios.

IV. DISCUSSION

For further discussion of IA-AMF-Net, experiments are conducted to explore the effectiveness of the network.

A. Effectiveness Analysis of Feature Fusion for IA-AMF-Net

Fig. 14 shows the feature map outputs of IA-AMF-Net with single-view input and three-view inputs to verify the fusion efficiency of the image attention using the multiview images. Partial feature map outputs of the first DSC layer in the DFEC-module are selected and passed through interpolation processing. Compared with the single-view feature map output in Fig. 14(a), the impact of the speckle noise in the raw SAR images on the three-view feature map in Fig. 14(b) is reduced, and the central region of the target is enhanced with obvious brightness. Therefore, IA-AMF-Net has the capacity to effectively weight and fuse the target information contained in the multiview inputs according to the importance of each image, and achieves a high recognition rate.

B. Ablation Study of IA-AMF-Net

IA-AMF-Net mainly consists of three parts: the WFE-module, the IAFF-module, and the DFEC-module. An ablation experiment is conducted to verify the effectiveness of the IAFF-module, which is used for feature fusion. The MSTAR datasets under SOC and EOC with two-view inputs are used for the ablation experiment. The ablation results are shown in Table IX. In Table IX, the "Average" method replaces the image attention weights $\bar{s}_{ni}^l$ in (4) of the IAFF-module with $1/k$ for feature fusion, which treats the feature of each image as equally important as those of the other multiview images.

The "IAFF-module without softmax" method replaces the weights $\bar{s}_{ni}^l$ in (4) of the IAFF-module with $s_{ni}^l$ for feature fusion, which focuses on the interchannel dependencies of the feature of each image while neglecting the importance of the interview relationships. As shown in Table IX, the recognition rates of IA-AMF-Net with two-view inputs improve by 2.09% under SOC, 2.44% under EOC-D, and 2.81% under EOC-V compared to the baseline "Average" method. Through the feature fusion of the IAFF-module, IA-AMF-Net models the interdependencies between images and efficiently fuses the classification features of multiple images based on the image attention weights to achieve the superior recognition performance.

C. Future Work

Investigating few-shot learning methods for multiview SAR ATR, and applying the IA-AMF-Net network to multiview images obtained from other types of sensors, including optical images, infrared images, and radar high-resolution range profiles, for target recognition are our future work.

V. CONCLUSION

In this article, we proposed an adaptive multiview fusion network based on image attention, named IA-AMF-Net, for feature extraction and effective fusion of multiview images. IA-AMF-Net adopts DSCs to extract the classification features from the multiview parallel input images in the WFE-module, and then fuses the extracted features by the image attention weights in the IAFF-module. The attention to the classification features of interested images is enhanced, and the dimension of the fused feature keeps constant and is independent of the number of input images. Finally, the DFEC-module calibrates the feature responses of the fusion feature and uses the softmax classifier for classification. Extensive experiments on the MSTAR dataset demonstrate that IA-AMF-Net achieves satisfactory recognition performance under various operating conditions, with less than 4% of the number of parameters and FLOPs of MVDCNN. Therefore, IA-AMF-Net can effectively be applied to SAR ATR tasks and exhibits superiority in computational complexity, accuracy, and robustness.

REFERENCES

[1] F. Mena, D. Arenas, M. Nuske, and A. Dengel, "Common practices and taxonomy in deep multiview fusion for remote sensing applications," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 17, pp. 4797–4818, Feb. 2024, doi: 10.1109/JSTARS.2024.3361556.
[2] T. Zhang, Y. Li, J. Wang, M. Xing, L. Guo, and P. Zhang, "A modified range model and extended Omega-k algorithm for high-speed-high-squint SAR with curved trajectory," IEEE Trans. Geosci. Remote Sens., vol. 61, Mar. 2023, Art. no. 5204515, doi: 10.1109/TGRS.2023.3255518.
[3] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou, "A tutorial on synthetic aperture radar," IEEE Geosci. Remote Sens. Mag., vol. 1, no. 1, pp. 6–43, Mar. 2013.
[4] F. Zhang, X. Sun, F. Ma, and Q. Yin, "Superpixelwise likelihood ratio test statistic for PolSAR data and its application to built-up area extraction," ISPRS J. Photogramm. Remote Sens., vol. 209, pp. 233–248, 2024.
[5] F. Ma, F. Zhang, D. Xiang, Q. Yin, and Y. Zhou, "Fast task-specific region merging for SAR image segmentation," IEEE Trans. Geosci. Remote Sens., vol. 60, Jan. 2022, Art. no. 5222316, doi: 10.1109/TGRS.2022.3141125.
[6] C. Clemente, L. Pallotta, D. Gaglione, A. De Maio, and J. J. Soraghan, "Automatic target recognition of military vehicles with Krawtchouk moments," IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 1, pp. 493–500, Feb. 2017.
[7] S. Dang, Z. Cui, Z. Cao, and N. Liu, "SAR target recognition via incremental nonnegative matrix factorization," Remote Sens., vol. 10, no. 3, 2018, Art. no. 374.
[8] L. Tao, X. Jiang, X. Liu, Z. Li, and Z. Zhou, "Multiscale supervised kernel dictionary learning for SAR target recognition," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 6281–6297, Sep. 2020.
[9] F. Zhou, L. Wang, X. Bai, and Y. Hui, "SAR ATR of ground vehicles based on LM-BN-CNN," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 12, pp. 7282–7293, Dec. 2018.
[10] J. Zhang, M. Xing, and Y. Xie, "FEC: A feature fusion framework for SAR target recognition based on electromagnetic scattering features and deep CNN features," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 3, pp. 2174–2187, Mar. 2021.
[11] Q. Zhao and J. Principe, "Support vector machines for SAR automatic target recognition," IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 2, pp. 643–654, Apr. 2001.
[12] M. Loog and D. de Ridder, "Local discriminant analysis," in Proc. Int. Conf. Pattern Recognit., 2006, vol. 3, pp. 328–331.
[13] Y. Sun, Z. Liu, S. Todorovic, and J. Li, "Adaptive boosting for SAR automatic target recognition," IEEE Trans. Aerosp. Electron. Syst., vol. 43, no. 1, pp. 112–125, Jan. 2007.
[14] J. O'Sullivan, M. DeVore, V. Kedia, and M. Miller, "SAR ATR performance using a conditionally Gaussian model," IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 1, pp. 91–108, Jan. 2001.
[15] U. Srinivas, V. Monga, and R. G. Raj, "SAR automatic target recognition using discriminative graphical models," IEEE Trans. Aerosp. Electron. Syst., vol. 50, no. 1, pp. 591–606, Jan. 2014.
[16] J. Liu, M. Xing, H. Yu, and G. Sun, "EFTL: Complex convolutional networks with electromagnetic feature transfer learning for SAR target recognition," IEEE Trans. Geosci. Remote Sens., vol. 60, Jun. 2022, Art. no. 5209811, doi: 10.1109/TGRS.2021.3083261.
[17] D. Morgan, "Deep convolutional neural networks for ATR from SAR imagery," in Proc. SPIE Algorithms Synth. Aperture Radar Imagery XXII, 2015, vol. 9475, Art. no. 94750F.
[18] S. Chen, H. Wang, F. Xu, and Y.-Q. Jin, "Target classification using the deep convolutional networks for SAR images," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4806–4817, Aug. 2016.
[19] H. Furukawa, "Deep learning for target classification from SAR imagery: Data augmentation and translation invariance," 2017, arXiv:1708.07920.
[20] D. Wang, Y. Song, J. Huang, D. An, and L. Chen, "SAR target classification based on multiscale attention super-class network," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 9004–9019, Sep. 2022, doi: 10.1109/JSTARS.2022.3206901.
[21] R. Li, X. Wang, J. Wang, Y. Song, and L. Lei, "SAR target recognition based on efficient fully convolutional attention block CNN," IEEE Geosci. Remote Sens. Lett., vol. 19, Nov. 2022, Art. no. 4005905, doi: 10.1109/LGRS.2020.3037256.
[22] Z. Yue et al., "A novel semi-supervised convolutional neural network method for synthetic aperture radar image recognition," Cogn. Comput., vol. 13, no. 4, pp. 795–806, 2021.
[23] F. Gao et al., "SAR target incremental recognition based on features with strong separability," IEEE Trans. Geosci. Remote Sens., vol. 62, Jan. 2024, Art. no. 5202813, doi: 10.1109/TGRS.2024.3351636.
[24] H. Huang, F. Gao, J. Sun, J. Wang, A. Hussain, and H. Zhou, "Novel category discovery without forgetting for automatic target recognition," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 17, pp. 4408–4420, Jan. 2024, doi: 10.1109/JSTARS.2024.3358449.
[25] L. Li, J. Liu, L. Su, C. Ma, B. Li, and Y. Yu, "A novel graph metalearning method for SAR target recognition," IEEE Geosci. Remote Sens. Lett., vol. 19, Jul. 2022, Art. no. 4015705, doi: 10.1109/LGRS.2021.3097130.
[26] K. Fu, T. Zhang, Y. Zhang, Z. Wang, and X. Sun, "Few-shot SAR target classification via metalearning," IEEE Trans. Geosci. Remote Sens., vol. 60, Feb. 2022, Art. no. 2000314, doi: 10.1109/TGRS.2021.3058249.
[27] C. Wang, H. Gu, and W. Su, "SAR image classification using contrastive learning and pseudo-labels with limited data," IEEE Geosci. Remote Sens. Lett., vol. 19, Apr. 2022, Art. no. 4012505, doi: 10.1109/LGRS.2021.3069224.
[28] H. Luo et al., "Multiview learning for impervious surface mapping using high-resolution multispectral imagery and LiDAR data," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 7866–7881, Nov. 2023, doi: 10.1109/JSTARS.2022.3221625.

[29] T. Zhang, X. Tong, and Y. Wang, "Semantics-assisted multiview fusion for SAR automatic target recognition," IEEE Geosci. Remote Sens. Lett., vol. 21, Mar. 2024, Art. no. 4007005, doi: 10.1109/LGRS.2024.3374375.
[30] F. Zhang, C. Hu, Q. Yin, W. Li, H.-C. Li, and W. Hong, "Multi-aspect-aware bidirectional LSTM networks for synthetic aperture radar target recognition," IEEE Access, vol. 5, pp. 26880–26891, 2017.
[31] J. Pei, Y. Huang, W. Huo, Y. Zhang, J. Yang, and T.-S. Yeo, "SAR automatic target recognition based on multiview deep learning framework," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2196–2210, Apr. 2018.
[32] J. Pei et al., "FEF-Net: A deep learning approach to multiview SAR image target recognition," Remote Sens., vol. 13, no. 17, 2021, Art. no. 3494.
[33] S. Li, Z. Pan, and Y. Hu, "Multi-aspect convolutional-transformer network for SAR automatic target recognition," Remote Sens., vol. 14, no. 16, 2022, Art. no. 3924.
[34] J. Pei et al., "Multiview deep feature learning network for SAR automatic target recognition," Remote Sens., vol. 13, no. 8, 2021, Art. no. 1455.
[35] H. Zhou, C. Tian, Z. Zhang, C. Li, Y. Xie, and Z. Li, "PixelGame: Infrared small target segmentation as a Nash equilibrium," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 8010–8024, Sep. 2022, doi: 10.1109/JSTARS.2022.3206062.
[36] X. Li, S. Chen, C. Tian, H. Zhou, and Z. Zhang, "M2FNet: Mask-guided multi-level fusion for RGB-T pedestrian detection," IEEE Trans. Multimedia, early access, Mar. 25, 2024, doi: 10.1109/TMM.2024.3381377.
[37] Z. Zhang et al., "Low-frequency amplitude fusion based consistency learning method for multi-source domain adaptation for joint optic disc and cup segmentation," Biomed. Signal Process. Control, vol. 96, 2024, Art. no. 106481.
[38] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017, arXiv:1704.04861.
[39] J. Zhang, W. Sheng, H. Zhu, S. Guo, and Y. Han, "MLBR-YOLOX: An efficient SAR ship detection network with multilevel background removing modules," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 5331–5343, May 2023, doi: 10.1109/JSTARS.2023.3280741.
[40] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7132–7141.
[41] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. Int. Conf. Mach. Learn., 2010, pp. 807–814.
[42] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.
[43] X. Zhang, J. Pei, Y. Ma, Q. Yi, W. Huo, and Y. Huang, "Multiview feature extraction and discrimination network for SAR ATR," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2023, pp. 7042–7045.

Renli Zhang (Member, IEEE) received the B.S. degree in electronic information engineering and the Ph.D. degree in communication and information system from the Nanjing University of Science and Technology, Nanjing, China, in 2008 and 2013, respectively. He is currently a Professor with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology. His research interests include digital beamforming, radar signal processing, and anti-jamming.

Yuanzhi Duan received the B.S. degree in electronic science and technology in 2022 from the Nanjing University of Science and Technology, Nanjing, China, where she is currently working toward the M.S. degree in communication and information system. Her research interests include radar signal processing and SAR automatic target recognition.

Jindong Zhang received the B.S. degree in communication engineering and the M.S. degree in communication and information system from the Nanjing University of Science and Technology, Nanjing, China, in 2021 and 2024, respectively. His research interests include deep learning and SAR ship detection.

Minhui Gu received the B.S. degree in communication engineering in 2023 from the Nanjing University of Science and Technology, Nanjing, China, where she is currently working toward the M.S. degree in communication and information system. Her research interests include radar signal processing and target detection.

Shurui Zhang (Member, IEEE) received the B.S. degree in electronic and information engineering from the Department of Electronic and Information Engineering and the Ph.D. degree in information and communication engineering from the Department of Information and Communication Engineering, Nanjing University of Science and Technology (NJUST), in 2013 and 2019, respectively. From 2017 to 2018, he was a visiting Ph.D. student with the Department of Electronic and Computer Engineering, McMaster University, Hamilton, ON, Canada. He is currently an Associate Professor with the School of Electronic and Optical Engineering, NJUST. His research interests include adaptive wideband beamforming, multidimensional signal processing, target detection, and target tracking.

Weixing Sheng received the B.Sc. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 1988, and the M.S. and Ph.D. degrees in electronic engineering from the Nanjing University of Science and Technology, Nanjing, China, in 1991 and 2002, respectively. Since 1991, he has been with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, where he is currently a Professor. His current research interests include adaptive signal processing.
