Adaptive Multiview SAR Automatic Target Recognition Network Based on Image Attention
Abstract—The deep neural network has achieved remarkable recognition performance in synthetic aperture radar (SAR) automatic target recognition (ATR) by extracting discriminative features from massive SAR images. Due to the sensitivity of SAR images to the observation aspect, multiview ATR methods can enhance the robustness of the feature representation and improve the recognition performance. However, existing multiview ATR methods suffer from increasingly complex structures and heavy computation when the number of input images grows. An adaptive multiview fusion network based on image attention (IA-AMF-Net), compatible with a variable number of input images, is proposed for SAR ATR in this article. In IA-AMF-Net, first, depthwise separable convolution is employed to extract the classification features from multiple SAR input images in parallel with a lightweight attribute. Second, the channel feature weight vector of each image is generated and concatenated by applying the squeeze-and-excitation operation to the extracted features. The image attention weights for feature fusion are calculated through softmax normalizing the concatenated channel feature weights of the input images. At last, the extracted features from the multiview SAR images are fused by the obtained image attention weights. The dimension of the fused feature keeps constant regardless of the number of input images, and the attention to the classification features of the images of interest is enhanced. Experimental results on the moving and stationary target acquisition and recognition dataset show that IA-AMF-Net achieves superior recognition performance under various operating conditions with fewer parameters and a lower computational load than the other networks.

Index Terms—Automatic target recognition (ATR), feature fusion, image attention, squeeze-and-excitation, multiview synthetic aperture radar (SAR).

Manuscript received 29 March 2024; revised 26 May 2024 and 14 July 2024; accepted 21 July 2024. Date of publication 26 July 2024; date of current version 8 August 2024. This work was supported by the National Natural Science Foundation of China under Grant 61971224, Grant 62001227, and Grant 62001230. (Corresponding author: Renli Zhang.)

The authors are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: zhangrenli_nust@163.com; yuanzhiduan@njust.edu.cn; zhangjindong@njust.edu.cn; guminhui10@njust.edu.cn; shuruizhang@njust.edu.cn; shengwx@njust.edu.cn).

Digital Object Identifier 10.1109/JSTARS.2024.3434496

© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

REMOTE sensing technology, as an important means of Earth observation, allows for the study and analysis of human life and natural resources [1]. Remotely sensed data mainly include various types of sensor images, such as synthetic aperture radar (SAR) images, optical images, and hyperspectral images. With the advantages of all-weather, all-day, and high-resolution imaging capabilities [2], SAR has become an important tool for remote sensing and reconnaissance in modern society and has been widely used in fields such as resource exploration, environmental monitoring, and military applications [3]. A superpixelwise polarimetric SAR change detection method is employed in [4] for built-up area extraction. In [5], an end-to-end trainable deep network incorporating differentiable superpixel generation and merging steps is designed for SAR image segmentation.

SAR automatic target recognition (ATR) [6] aims to locate the region of interest (ROI), which contains potential targets, and then classify the targets in the ROI. Most SAR ATR methods have been investigated for the single-view SAR ATR scenario, including machine learning-based [7], [8] and deep learning (DL)-based [9], [10] methods. The machine learning-based SAR target recognition methods, e.g., support vector machines (SVM) [11], linear discriminant analysis [12], adaptive boosting (AdaBoost) [13], the conditional Gaussian model (CGM) [14], and iterative graph thickening (IGT) [15], tend to involve complex algorithms and rely on a single feature attribute due to the handcrafted feature extraction.

The SAR ATR methods based on DL have advanced quickly owing to the inherent ability of DL to extract features automatically [16]. In [17] and [18], a deep convolutional neural network (CNN) and an all-convolutional networks (A-ConvNets) model addressing overfitting are presented for target classification. In [19], the importance of data augmentation for the robustness of CNNs is emphasized by dealing with the translation invariance of CNNs. In [20], multiscale feature fusion is combined with an attention module, and super-class labels are introduced in the multiscale attention super-class CNN (MSA-SCNN). The fully convolutional attention block (FCAB) [21] is combined with a CNN to refine the important features and suppress the unnecessary ones; meanwhile, this method is computationally efficient with few parameters. In [22], a semisupervised CNN method integrates the information contained in the unlabeled samples into the loss function of the training process, which improves the SAR ATR accuracy when the labeled samples are insufficient.

In [23], an incremental learning method based on strong separability features (SSF-IL) is presented to address the model's forgetting of previously learned knowledge. SSF-IL enables the classifier to modify the decision boundary of old classes and mitigate the classification bias toward new classes.
A cutting-edge concept known as class incremental learning in novel category discovery for SAR targets (CNTs) is explored in [24]. In CNTs, the inherent information within SAR images is captured by self-supervised learning, and a multiview consistency and enhanced separability strategy is introduced to improve the performance of new category discovery. In order to improve the recognition performance of the DL model, meta-learning [25], few-shot learning [26], and contrastive learning [27] have also been investigated. The methods above have been primarily applied to the single-view SAR ATR scenario.

Due to the backscattering characteristics of SAR targets, such as geometric distortion and aspect sensitivity, extracting sufficient information from a single-view SAR image for the ATR purpose can be challenging. Thus, the single-view SAR image suffers from inherent limitations for target recognition. Multiview SAR images of the same target contain richer classification information than a single one. Combining the complementary information from multiple views of SAR images allows for a more accurate and comprehensive interpretation of the target features, which is beneficial for enhancing the robustness of the feature representation and improving the recognition accuracy in SAR ATR. Thus, multiview learning is an effective method for SAR ATR through data fusion of the multifeature sets [28]. Exploring the correlation between multiple views and semantic information is crucial in multiview SAR ATR [29].

In [30], the bidirectional long short-term memory (BLSTM) networks based on spatial information learning utilize 50 images as input for multiview SAR target recognition, and the number of SAR images required by BLSTM is not suitable for practical ATR tasks. In [31], the classification features are learned and fused at different layers in the multiview deep convolutional neural network (MVDCNN) with a parallel network topology. In [32], a feature extraction and fusion network (FEF-Net) is designed by using distinct and useful learning modules, such as deformable convolution. In [33], a convolutional autoencoder (CAE) is utilized to extract features and improve the antinoise ability in the multiaspect SAR recognition network based on self-attention (MACTN). The training process of MACTN is complex because of the pretraining and fine-tuning. In [34], a multiview deep feature learning network (MVDFLN) is designed to learn the intraview and interview features from multiview SAR images with a complex structure.

Besides, many effective feature fusion methods have been investigated by designing complex network structures and learning algorithms. In [35], a competitive game framework is designed to minimize the utility function for infrared small target segmentation. The mask-guided multilevel fusion network in [36] independently explores the consistent and specific features in RGB-T modalities at three different levels for pedestrian detection. The bidirectional low-frequency amplitude fusion method in [37] augments the source and target domain images, and aligns their styles while preserving the content.

Although the multiview SAR ATR methods above have achieved remarkable recognition performance, the following problems still remain: The multiview SAR ATR frameworks in [31], [32], [33], and [34] are mainly applied to a fixed number of input images, and suffer from an increasing number of parameters and high computational load when the number of input images grows. Therefore, those methods are not suitable for the scenario where the SAR ATR task is implemented on a digital processing platform with limited computing resources for variable numbers of input images. The SAR ATR methods in [31], [32], and [33] directly concatenate the inherent classification features from multiview SAR images for fusion, which results in decreased feature fusion efficiency and an increase in model size.

In order to adapt to the variability of the number of multiview SAR input images, and simultaneously maintain robust recognition performance with a low computational load, an adaptive multiview fusion network for SAR ATR based on image attention (IA-AMF-Net) is proposed in this article. In IA-AMF-Net, first, the weight-sharing feature extraction module (WFE-module) is utilized to extract the classification features from the multiview SAR input images in parallel. The same network and parameters are shared across the images. Second, the image attention-based feature fusion module (IAFF-module) is designed to efficiently fuse the features from multiple images based on the importance of the target features in each view image. In the IAFF-module, the channel feature weight vector of each image is generated by applying the squeeze-and-excitation operation to the extracted features. Then, the image attention weights, which model the interdependencies between images, are calculated through softmax normalizing the concatenated channel feature weights of the input images along the channel dimension. The features extracted from the multiview SAR images are fused by using the obtained image attention weights, such that the attention to the classification features of the images of interest is enhanced, and the dimension of the fusion feature keeps constant and is independent of the number of input images. At last, the fusion feature is fed into the classifier in the deep feature extraction and classification module (DFEC-module).

IA-AMF-Net has the capacity to adapt to variable numbers of SAR input images with a significantly reduced number of parameters and computational load compared to A-ConvNets, FCAB, MA-BLSTM, MVDCNN, FEF-Net, and MVDFLN, and achieves excellent and robust recognition performance.

The rest of this article is organized as follows. In Section II, the network framework of IA-AMF-Net is designed. Section III describes the details of the experiments and the experimental results. The reasonability of IA-AMF-Net and future work are discussed in Section IV. Finally, Section V concludes this article.

II. PROPOSED IA-AMF-NET FOR SAR ATR

In this section, IA-AMF-Net is designed to extract and fuse classification features from multiview images for SAR ATR. Fig. 1 shows the network architecture of IA-AMF-Net. Three main modules constitute the network, i.e., the WFE-module, IAFF-module, and DFEC-module.

In Fig. 1, k parallel SAR images are fed into the WFE-module. At first, the classification features of each image are extracted in the WFE-module using depthwise separable convolution (DSC) [38] with shared parameters.
Then, the IAFF-module employs the image attention to fuse the extracted features from the multiview images. The dimension of the fusion feature is independent of the number of input images, and the attention to the classification features of the images of interest is enhanced. Finally, the DFEC-module calibrates the feature responses of the fusion feature, and the recognition result is obtained by the softmax classifier. The details of IA-AMF-Net are presented as follows.

A. WFE-Module

Drawing on the idea of the Siamese network, the classification features of each input image are extracted by the same structure and parameters in the WFE-module. As shown in Fig. 1, the WFE-module is composed of six DSCs and a max pooling layer. By employing the lightweight convolution computation of DSC instead of standard convolution, the WFE-module extracts the classification features from multiple parallel input SAR images with shared parameters to ensure the lightweight attribute.

In the WFE-module, the classification feature u_{ni} is extracted from the ith image within a multiview SAR image combination sample of k images belonging to target class n, where n ∈ [1, N] and N represents the number of target classes, u_{ni} = [u_{ni}^1, u_{ni}^2, ..., u_{ni}^C] with u_{ni}^l ∈ R^{H×W} representing the feature of the lth channel, and C, H, and W denote the number of channels, height, and width of the classification feature u_{ni}, respectively.

Fig. 2 shows the structure of standard convolution and DSC. In standard convolution, each input channel corresponds to a convolutional kernel, and an output channel is the sum of the outputs of all convolutional kernels. DSC consists of a depthwise convolution for filtering and a pointwise convolution for combining. In depthwise convolution, each output channel is determined by a single convolution kernel sliding over one input channel [39]. Pointwise convolution adopts a 1 × 1 convolutional kernel to perform a linear combination of the outputs of the depthwise convolution. The computational load of DSC is (1/O + 1/D_K^2) times that of the standard convolution, where O and D_K × D_K represent the number of output channels and the convolution kernel size, respectively [38]. Therefore, the model size and computation cost of the WFE-module are significantly reduced.
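As a concrete illustration, the following PyTorch sketch shows one DSC block of the kind used in the WFE-module; the channel widths, kernel size, and the ReLU here are placeholders for illustration, not the exact configuration in Table I.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution (one kernel per input channel) followed by a
    1 x 1 pointwise convolution that linearly combines the filtered channels."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pointwise(self.depthwise(x)))

# Rough per-layer cost (multiplications, bias ignored):
#   standard convolution: in_ch * out_ch * K * K per output position
#   DSC:                  in_ch * K * K + in_ch * out_ch per output position
# giving the ratio 1/out_ch + 1/K^2 quoted above; e.g., out_ch = 64 and K = 3
# give roughly 0.127 of the standard cost.
x = torch.randn(2, 32, 64, 64)                   # two 32-channel feature maps
print(DepthwiseSeparableConv(32, 64)(x).shape)   # torch.Size([2, 64, 64, 64])
```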
B. IAFF-Module

The IAFF-module calculates the image-level weights through the squeeze-and-excitation operation to enhance the attention to the classification features of the images of interest under the condition of varying input image quantities. Different from the squeeze-and-excitation attention (SEA) in [40], the squeeze-and-excitation
Fig. 2. Structure of standard convolution and DSC. (a) Standard convolution. (b) DSC.
Fig. 5. Optical and SAR images of 10 targets. (a) 2S1. (b) BMP2. (c) BRDM2. (d) BTR60. (e) BTR70. (f) D7. (g) T62. (h) T72. (i) ZIL131. (j) ZSU234.
where W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are the weight matrices of the two FC layers for dimensionality reduction and dimensionality increase, respectively [40], r is the reduction ratio, δ is the rectified linear unit function [41], and σ is the sigmoid function. The obtained s_{ni} represents the interdependencies between the C channels of feature u_{ni}.
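Because the excitation step is described only through the two FC layers W_1 and W_2, the ReLU δ, and the sigmoid σ, the sketch below follows the standard squeeze-and-excitation formulation of [40] to produce the channel weight vector s_{ni}; the default reduction ratio r = 16 is an assumption for illustration rather than the value used in IA-AMF-Net.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Produces a channel weight vector s in R^C for a feature map u in R^(C x H x W)."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)   # W1: dimensionality reduction
        self.fc2 = nn.Linear(channels // r, channels)   # W2: dimensionality increase

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, C, H, W)
        z = u.mean(dim=(2, 3))                           # squeeze: global average pooling -> (batch, C)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))  # excitation with delta = ReLU, sigma = sigmoid
        return s                                         # channel feature weights s_ni
```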
Then, the channel feature weight vector of each image is employed to calculate the image attention weights for feature fusion of the multiview SAR images. The k channel feature weight vectors from the multiview images are concatenated to obtain the multiview channel feature weights S_n = [s_{n1}; s_{n2}; ...; s_{nk}], where S_n ∈ R^{k×C}. In order to enhance the attention to the classification features of the images of interest, the concatenated multiview channel feature weights are normalized by the softmax function along the channel dimension, and the image attention weights S̄_n = [s̄_{n1}; s̄_{n2}; ...; s̄_{nk}] are calculated by

\bar{s}_{ni}^{l} = \frac{\exp(s_{ni}^{l})}{\sum_{j=1}^{k} \exp(s_{nj}^{l})}    (3)

where s_{ni}^{l} and s̄_{ni}^{l} denote the lth elements of s_{ni} and s̄_{ni}, respectively. Finally, the fusion feature Q = [q_1, q_2, ..., q_C] is obtained by fusing the extracted features from the multiview images with the image attention weights as follows:

q_{l} = \sum_{i=1}^{k} \bar{s}_{ni}^{l} u_{ni}^{l}    (4)

where q_l ∈ R^{H×W}, l = 1, 2, ..., C.

As shown in (4) and Fig. 3, the dimension of the fusion feature Q is the same as that of the extracted classification feature u_{ni} of a single image, and is independent of the number of input images. Therefore, the IAFF-module achieves autonomous feature fusion under the condition of varying input image quantities, and reduces the number of parameters.
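A minimal sketch of the image-attention fusion in (3) and (4) is given below: the k channel weight vectors are stacked, softmax-normalized over the k views for each channel, and used to form a weighted sum of the per-view feature maps. The variable names are illustrative, and the per-view features u and weights s are assumed to come from the WFE-module and the squeeze-and-excitation step sketched above.

```python
import torch

def image_attention_fusion(u: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """Fuse per-view features with image attention weights.

    u: (k, C, H, W) classification features of the k input views
    s: (k, C)       channel feature weight vectors from squeeze-and-excitation
    returns Q: (C, H, W) fused feature, independent of k
    """
    s_bar = torch.softmax(s, dim=0)               # (3): normalize over the k views, per channel
    Q = (s_bar[:, :, None, None] * u).sum(dim=0)  # (4): q_l = sum_i s_bar[i, l] * u[i, l]
    return Q

# Example: fusing k = 3 views with C = 64 channels of 16 x 16 feature maps
u = torch.randn(3, 64, 16, 16)
s = torch.rand(3, 64)
print(image_attention_fusion(u, s).shape)  # torch.Size([64, 16, 16]), same as a single view
```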
C. DFEC-Module

The DFEC-module is designed at the end of IA-AMF-Net to enhance the depth of the fusion feature and perform the multiclass classification. As shown in Fig. 1, the DFEC-module utilizes DSCs and SEA to calibrate the multiview image fusion feature Q obtained in the IAFF-module. Then, the calibrated fusion feature is transformed into a feature vector [a_1, a_2, ..., a_N] through a global average pooling layer and an FC layer. At last, the softmax classifier is applied to compute the probability of the multiview SAR image combination sample belonging to the mth target class as follows:

p_{m} = \frac{\exp(a_{m})}{\sum_{j=1}^{N} \exp(a_{j})}    (5)

Hence, the class probabilities of the multiview SAR image combination sample are represented as [p_1, p_2, ..., p_N].
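A compact sketch of the classification head described above (global average pooling, an FC layer, and the softmax of (5)) is shown below; the DSC-and-SEA calibration stage that precedes it is omitted, and the names are illustrative.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Global average pooling + FC layer + softmax, as in (5)."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)         # global average pooling
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # q: (batch, C, H, W) calibrated fusion feature
        a = self.fc(self.gap(q).flatten(1))        # feature vector [a_1, ..., a_N]
        return torch.softmax(a, dim=1)             # class probabilities [p_1, ..., p_N]
```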
III. EXPERIMENTS AND RESULTS

The recognition performance and the computational load of IA-AMF-Net are evaluated under various operating conditions and compared with other ATR methods in this section.

A. Multiview SAR Image Construction

The moving and stationary target acquisition and recognition (MSTAR) SAR image dataset is used to evaluate the performance of IA-AMF-Net. All SAR images in MSTAR, including military vehicles and separate clutter, are collected by a spotlight SAR sensor in the X-band with a resolution of 0.3 m × 0.3 m and HH polarization. The publicly released dataset contains ten classes of vehicle targets under various operating conditions, such as different aspect angles, depression angles, and serial numbers. Fig. 5 shows the optical and SAR images of the ten targets.

The public MSTAR dataset consists of two types of SAR image acquisition conditions: standard operating condition (SOC) and extended operating conditions (EOC). SOC refers to training and testing sets with similar imaging configurations and target types, while EOC has more complex scenes with significant variations in depression angles, target configurations, and target versions. The recognition performance of IA-AMF-Net is evaluated under both SOC and EOC.

Fig. 6 shows the geometric model for multiview SAR ATR of a ground target. The multiview SAR images can be obtained by imaging the same target from different aspect and depression angles with radars on one or more platforms, and each view direction refers to a given depression and aspect angle at which the SAR image is obtained. Based on the MSTAR dataset, the depression angle of the multiview images is set to a constant, and the aspect angles of the images cover 0° to 360°. The SAR images are selected from the original MSTAR dataset, and the corresponding multiview image samples are constructed from multiple separate images with continuous variations in aspect angle.

For a given view interval θ and view number k > 1, the multiview image sample X_n is constructed by k selected images
TABLE I NETWORK CONFIGURATION OF IA-AMF-NET
TABLE II NUMBER OF RAW IMAGES FOR TRAINING AND TESTING SAMPLES UNDER SOC
TABLE III NUMBER OF RAW IMAGES FOR TRAINING AND TESTING SAMPLES UNDER EOC-D
Fig. 10. Results of visualization. (a) Input samples. (b) Outputs of the FC layer.
Fig. 11. Confusion matrix of a two-view IA-AMF-Net under EOC-D.
Fig. 12. Confusion matrix of a three-view IA-AMF-Net under EOC-D.
TABLE IV NUMBER OF RAW IMAGES FOR TRAINING SAMPLES UNDER EOC-V
TABLE V NUMBER OF RAW IMAGES FOR TESTING SAMPLES UNDER EOC-V
TABLE VI CONFUSION MATRIX OF IA-AMF-NET UNDER EOC-V
TABLE VII COMPARISON OF THE NUMBER OF PARAMETERS AND FLOPS
of 99.07%, 99.82%, and 99.99%, respectively. Remarkably, the recognition rates for the major classes can reach 100%.

The experimental results above demonstrate that IA-AMF-Net achieves outstanding recognition performance under different ATR operating conditions, indicating its potential capacity for SAR ATR tasks in remote sensing applications.

E. Recognition Performance Comparison

In this section, the performance of IA-AMF-Net is compared with that of other methods, including the traditional CNN [17], A-ConvNets [18], the CNN using augmented training data [19], FCAB [21], multiaspect-aware BLSTM (MA-BLSTM) [30], MVDCNN [31], FEF-Net [32], MVDFLN [34], SVM [11], CGM [14], AdaBoost [13], IGT [15], MSA-SCNN [20], MACTN [33], and the multiview feature extraction and discrimination network [43].

Table VII lists the recognition rates, the number of parameters (# Params), and floating point operations (FLOPs) of different networks under SOC. IA-AMF-Net with two-view, three-view, and four-view inputs is denoted as 2-IA-AMF-Net, 3-IA-AMF-Net, and 4-IA-AMF-Net, respectively. As shown in Table VII, the recognition performance of IA-AMF-Net is superior to that of the other networks. The number of parameters in IA-AMF-Net keeps constant when the number of input images grows, and is less than that of the lightweight network FCAB. The number of parameters and FLOPs in IA-AMF-Net are both less than 4% of those in MVDCNN. Benefiting from the DSC structure, the parameter sharing strategy, and the constant dimension of the fused feature, the IA-AMF-Net method is lightweight and efficient for SAR ATR.

Next, the recognition performance of IA-AMF-Net under SOC and EOC is compared with that of other methods. Because these methods are based on the MSTAR dataset with different principles and implementations, the various formats of the input samples used for training in these methods make the performance comparison difficult [31]. Thus, we simultaneously compare the recognition rate and the number of raw images used for training samples in these methods. An ATR method is identified as superior if it achieves a higher recognition rate with fewer raw images for training.

Table VIII shows the recognition rates of the various methods under SOC and EOC, as well as the number of raw images for training. It can be observed that the recognition performance of the DL-based methods surpasses that of the traditional methods. Due to the rich target classification information of multiview images, the recognition ability of the multiview methods is superior to that of the single-view methods. IA-AMF-Net
TABLE VIII COMPARISON OF RECOGNITION RATE AND THE NUMBER OF RAW IMAGES
Fig. 14. Feature maps of 1st DSC layer in DFEC-module. (a) Single-view input. (b) Three-view inputs.
IV. DISCUSSION

For further discussion on IA-AMF-Net, experiments are conducted to explore the effectiveness of this network.

A. Effectiveness Analysis of Feature Fusion for IA-AMF-Net

Fig. 14 shows the feature map outputs of IA-AMF-Net with single-view input and three-view inputs to verify the fusion efficiency of the image attention using the multiview images. Partial feature map outputs of the first DSC layer in the DFEC-module are selected and substituted into the interpolation processing. Compared with the feature map output of the single view in Fig. 14(a), the impact of the speckle noise in the raw SAR images on the three-view feature map in Fig. 14(b) has been reduced, and the central region of the target is enhanced with obvious brightness. Therefore, IA-AMF-Net has the capacity to effectively weight and fuse the target information contained in the multiview inputs according to the importance of each image, and achieves a high recognition rate.

B. Ablation Study of IA-AMF-Net

IA-AMF-Net mainly consists of three parts: the WFE-module, IAFF-module, and DFEC-module. The ablation experiment is conducted to verify the effectiveness of the IAFF-module, which is used for feature fusion. The MSTAR datasets under SOC and EOC with two-view inputs are used for the ablation experiment. The ablation results are shown in Table IX.
In Table IX, the "Average" method replaces the image attention weights s̄_{ni}^{l} in (4) of the IAFF-module with 1/k for feature fusion, which treats the feature of each image as equally important as those of the other multiview images, and the "IAFF-module without softmax" method replaces the weights s̄_{ni}^{l} in (4) of the IAFF-module with s_{ni}^{l} for feature fusion, which focuses on the interchannel dependencies of the feature of each image while neglecting the importance of the interview relationships. As shown in Table IX, the recognition rates of IA-AMF-Net with two-view inputs improve by 2.09% under SOC, 2.44% under EOC-D, and 2.81% under EOC-V compared to the baseline "Average" method. Through the feature fusion of the IAFF-module, IA-AMF-Net models the interdependencies between images and efficiently fuses the classification features of multiple images based on the image attention weights to achieve the superior recognition performance.
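For illustration, the sketch below contrasts the two ablation baselines with the proposed image-attention fusion, reusing the illustrative u and s tensors from the fusion sketch in Section II-B; the function and mode names are hypothetical rather than taken from the article.

```python
import torch

def fuse(u: torch.Tensor, s: torch.Tensor, mode: str = "image_attention") -> torch.Tensor:
    """u: (k, C, H, W) per-view features; s: (k, C) channel weight vectors.

    "image_attention":      weights from (3), softmax over the k views (proposed IAFF-module)
    "average":              replaces the weights with 1/k, all views treated as equally important
    "iaff_without_softmax": uses the raw squeeze-and-excitation weights s, no interview normalization
    """
    k = u.shape[0]
    if mode == "image_attention":
        w = torch.softmax(s, dim=0)
    elif mode == "average":
        w = torch.full_like(s, 1.0 / k)
    elif mode == "iaff_without_softmax":
        w = s
    else:
        raise ValueError(mode)
    return (w[:, :, None, None] * u).sum(dim=0)
```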
C. Future Work

Investigating few-shot learning methods for multiview SAR ATR, and applying the IA-AMF-Net network to multiview images obtained from other types of sensors, including optical images, infrared images, and radar high-resolution range profiles, for target recognition are our future work.

V. CONCLUSION

In this article, we proposed an adaptive multiview fusion network based on image attention, named IA-AMF-Net, for feature extraction and effective fusion of multiview images. IA-AMF-Net adopts DSCs to extract the classification features from multiview parallel input images in the WFE-module, and then fuses the extracted features by the image attention weights in the IAFF-module. The attention to the classification features of the images of interest is enhanced, and the dimension of the fused feature keeps constant and is independent of the number of input images. Finally, the DFEC-module calibrates the feature responses of the fusion feature and uses the softmax classifier for classification. Extensive experiments on the MSTAR dataset demonstrate that IA-AMF-Net achieves satisfactory recognition performance under various operating conditions, with less than 4% of the number of parameters and FLOPs of MVDCNN. Therefore, IA-AMF-Net can be effectively applied to SAR ATR tasks and exhibits superiority in computational complexity, accuracy, and robustness.

REFERENCES

[1] F. Mena, D. Arenas, M. Nuske, and A. Dengel, "Common practices and taxonomy in deep multiview fusion for remote sensing applications," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 17, pp. 4797–4818, Feb. 2024, doi: 10.1109/JSTARS.2024.3361556.
[2] T. Zhang, Y. Li, J. Wang, M. Xing, L. Guo, and P. Zhang, "A modified range model and extended Omega-k algorithm for high-speed-high-squint SAR with curved trajectory," IEEE Trans. Geosci. Remote Sens., vol. 61, Mar. 2023, Art. no. 5204515, doi: 10.1109/TGRS.2023.3255518.
[3] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou, "A tutorial on synthetic aperture radar," IEEE Geosci. Remote Sens. Mag., vol. 1, no. 1, pp. 6–43, Mar. 2013.
[4] F. Zhang, X. Sun, F. Ma, and Q. Yin, "Superpixelwise likelihood ratio test statistic for PolSAR data and its application to built-up area extraction," ISPRS J. Photogramm. Remote Sens., vol. 209, pp. 233–248, 2024.
[5] F. Ma, F. Zhang, D. Xiang, Q. Yin, and Y. Zhou, "Fast task-specific region merging for SAR image segmentation," IEEE Trans. Geosci. Remote Sens., vol. 60, Jan. 2022, Art. no. 5222316, doi: 10.1109/TGRS.2022.3141125.
[6] C. Clemente, L. Pallotta, D. Gaglione, A. De Maio, and J. J. Soraghan, "Automatic target recognition of military vehicles with Krawtchouk moments," IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 1, pp. 493–500, Feb. 2017.
[7] S. Dang, Z. Cui, Z. Cao, and N. Liu, "SAR target recognition via incremental nonnegative matrix factorization," Remote Sens., vol. 10, no. 3, 2018, Art. no. 374.
[8] L. Tao, X. Jiang, X. Liu, Z. Li, and Z. Zhou, "Multiscale supervised kernel dictionary learning for SAR target recognition," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 6281–6297, Sep. 2020.
[9] F. Zhou, L. Wang, X. Bai, and Y. Hui, "SAR ATR of ground vehicles based on LM-BN-CNN," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 12, pp. 7282–7293, Dec. 2018.
[10] J. Zhang, M. Xing, and Y. Xie, "FEC: A feature fusion framework for SAR target recognition based on electromagnetic scattering features and deep CNN features," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 3, pp. 2174–2187, Mar. 2021.
[11] Q. Zhao and J. Principe, "Support vector machines for SAR automatic target recognition," IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 2, pp. 643–654, Apr. 2001.
[12] M. Loog and D. de Ridder, "Local discriminant analysis," in Proc. Int. Conf. Pattern Recognit., 2006, vol. 3, pp. 328–331.
[13] Y. Sun, Z. Liu, S. Todorovic, and J. Li, "Adaptive boosting for SAR automatic target recognition," IEEE Trans. Aerosp. Electron. Syst., vol. 43, no. 1, pp. 112–125, Jan. 2007.
[14] J. O'Sullivan, M. DeVore, V. Kedia, and M. Miller, "SAR ATR performance using a conditionally Gaussian model," IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 1, pp. 91–108, Jan. 2001.
[15] U. Srinivas, V. Monga, and R. G. Raj, "SAR automatic target recognition using discriminative graphical models," IEEE Trans. Aerosp. Electron. Syst., vol. 50, no. 1, pp. 591–606, Jan. 2014.
[16] J. Liu, M. Xing, H. Yu, and G. Sun, "EFTL: Complex convolutional networks with electromagnetic feature transfer learning for SAR target recognition," IEEE Trans. Geosci. Remote Sens., vol. 60, Jun. 2022, Art. no. 5209811, doi: 10.1109/TGRS.2021.3083261.
[17] D. Morgan, "Deep convolutional neural networks for ATR from SAR imagery," in Proc. SPIE Algorithms Synth. Aperture Radar Imagery XXII, 2015, vol. 9475, Art. no. 94750F.
[18] S. Chen, H. Wang, F. Xu, and Y.-Q. Jin, "Target classification using the deep convolutional networks for SAR images," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4806–4817, Aug. 2016.
[19] H. Furukawa, "Deep learning for target classification from SAR imagery: Data augmentation and translation invariance," 2017, arXiv:1708.07920.
[20] D. Wang, Y. Song, J. Huang, D. An, and L. Chen, "SAR target classification based on multiscale attention super-class network," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 9004–9019, Sep. 2022, doi: 10.1109/JSTARS.2022.3206901.
[21] R. Li, X. Wang, J. Wang, Y. Song, and L. Lei, "SAR target recognition based on efficient fully convolutional attention block CNN," IEEE Geosci. Remote Sens. Lett., vol. 19, Nov. 2022, Art. no. 4005905, doi: 10.1109/LGRS.2020.3037256.
[22] Z. Yue et al., "A novel semi-supervised convolutional neural network method for synthetic aperture radar image recognition," Cogn. Comput., vol. 13, no. 4, pp. 795–806, 2021.
[23] F. Gao et al., "SAR target incremental recognition based on features with strong separability," IEEE Trans. Geosci. Remote Sens., vol. 62, Jan. 2024, Art. no. 5202813, doi: 10.1109/TGRS.2024.3351636.
[24] H. Huang, F. Gao, J. Sun, J. Wang, A. Hussain, and H. Zhou, "Novel category discovery without forgetting for automatic target recognition," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 17, pp. 4408–4420, Jan. 2024, doi: 10.1109/JSTARS.2024.3358449.
[25] L. Li, J. Liu, L. Su, C. Ma, B. Li, and Y. Yu, "A novel graph metalearning method for SAR target recognition," IEEE Geosci. Remote Sens. Lett., vol. 19, Jul. 2022, Art. no. 4015705, doi: 10.1109/LGRS.2021.3097130.
[26] K. Fu, T. Zhang, Y. Zhang, Z. Wang, and X. Sun, "Few-shot SAR target classification via metalearning," IEEE Trans. Geosci. Remote Sens., vol. 60, Feb. 2022, Art. no. 2000314, doi: 10.1109/TGRS.2021.3058249.
[27] C. Wang, H. Gu, and W. Su, "SAR image classification using contrastive learning and pseudo-labels with limited data," IEEE Geosci. Remote Sens. Lett., vol. 19, Apr. 2022, Art. no. 4012505, doi: 10.1109/LGRS.2021.3069224.
[28] H. Luo et al., "Multiview learning for impervious surface mapping using high-resolution multispectral imagery and LiDAR data," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 7866–7881, Nov. 2023, doi: 10.1109/JSTARS.2022.3221625.
[29] T. Zhang, X. Tong, and Y. Wang, "Semantics-assisted multiview fusion for SAR automatic target recognition," IEEE Geosci. Remote Sens. Lett., vol. 21, Mar. 2024, Art. no. 4007005, doi: 10.1109/LGRS.2024.3374375.
[30] F. Zhang, C. Hu, Q. Yin, W. Li, H.-C. Li, and W. Hong, "Multi-aspect-aware bidirectional LSTM networks for synthetic aperture radar target recognition," IEEE Access, vol. 5, pp. 26880–26891, 2017.
[31] J. Pei, Y. Huang, W. Huo, Y. Zhang, J. Yang, and T.-S. Yeo, "SAR automatic target recognition based on multiview deep learning framework," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2196–2210, Apr. 2018.
[32] J. Pei et al., "FEF-Net: A deep learning approach to multiview SAR image target recognition," Remote Sens., vol. 13, no. 17, 2021, Art. no. 3494.
[33] S. Li, Z. Pan, and Y. Hu, "Multi-aspect convolutional-transformer network for SAR automatic target recognition," Remote Sens., vol. 14, no. 16, 2022, Art. no. 3924.
[34] J. Pei et al., "Multiview deep feature learning network for SAR automatic target recognition," Remote Sens., vol. 13, no. 8, 2021, Art. no. 1455.
[35] H. Zhou, C. Tian, Z. Zhang, C. Li, Y. Xie, and Z. Li, "PixelGame: Infrared small target segmentation as a Nash equilibrium," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 8010–8024, Sep. 2022, doi: 10.1109/JSTARS.2022.3206062.
[36] X. Li, S. Chen, C. Tian, H. Zhou, and Z. Zhang, "M2FNet: Mask-guided multi-level fusion for RGB-T pedestrian detection," IEEE Trans. Multimedia, early access, Mar. 25, 2024, doi: 10.1109/TMM.2024.3381377.
[37] Z. Zhang et al., "Low-frequency amplitude fusion based consistency learning method for multi-source domain adaptation for joint optic disc and cup segmentation," Biomed. Signal Process. Control, vol. 96, 2024, Art. no. 106481.
[38] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017, arXiv:1704.04861.
[39] J. Zhang, W. Sheng, H. Zhu, S. Guo, and Y. Han, "MLBR-YOLOx: An efficient SAR ship detection network with multilevel background removing modules," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 5331–5343, May 2023, doi: 10.1109/JSTARS.2023.3280741.
[40] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7132–7141.
[41] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. Int. Conf. Mach. Learn., 2010, pp. 807–814.
[42] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.
[43] X. Zhang, J. Pei, Y. Ma, Q. Yi, W. Huo, and Y. Huang, "Multiview feature extraction and discrimination network for SAR ATR," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2023, pp. 7042–7045.

Yuanzhi Duan received the B.S. degree in electronic science and technology in 2022 from the Nanjing University of Science and Technology, Nanjing, China, where she is currently working toward the M.S. degree in communication and information system. Her research interests include radar signal processing and SAR automatic target recognition.

Jindong Zhang received the B.S. degree in communication engineering and the M.S. degree in communication and information system from the Nanjing University of Science and Technology, Nanjing, China, in 2021 and 2024, respectively. His research interests include deep learning and SAR ship detection.

Minhui Gu received the B.S. degree in communication engineering in 2023 from the Nanjing University of Science and Technology, Nanjing, China, where she is currently working toward the M.S. degree in communication and information system. Her research interests include radar signal processing and target detection.

Shurui Zhang (Member, IEEE) received the B.S. degree in electronic and information engineering from the Department of Electronic and Information Engineering and the Ph.D. degree in information and communication engineering from the Department of Information and Communication Engineering, Nanjing University of Science and Technology (NJUST), in 2013 and 2019, respectively. From 2017 to 2018, he was a visiting Ph.D. student with the Department of Electronic and Computer Engineering, McMaster University, Hamilton, ON, Canada. He is currently an Associate Professor with the School of Electronic and Optical Engineering, NJUST. His research interests include adaptive wideband beamforming, multidimensional signal processing, target detection, and target tracking.