MRSCFusion: Joint Residual Swin Transformer and Multiscale CNN for Unsupervised Multimodal Medical Image Fusion
Xie et al., IEEE Transactions on Instrumentation and Measurement, vol. 72, 2023, Art. no. 5026917
which leads to the distortion of brightness, color, and other information.

The SR-based methods [12] combine the local and single-component fusion methods, such as orthogonal matching pursuit (OMP), simultaneous OMP (SOMP), and dictionary-based adaptive SR [13], [14], for image fusion. Although these methods have the advantage of easy solutions, their performance is constrained by the limited flexibility in designing fusion strategies, poor detail preservation ability, and low robustness to misregistration and noise.

The PCNN can extract image features and reduce the differences between the source images. PCNN-based image fusion methods are generally divided into two categories [15], [16], [17], [18]. The first is the combination of PCNN and multiscale transforms, such as PCNN and NSST [15], PCNN and wavelet transform [16], and PCNN and NSCT [17]. The other is to reduce the computational complexity of PCNN [18]. Although PCNN performs well in image fusion, the parameters of the PCNN model rely on experience and manual tuning. PCNN has recently seen more advanced adaptive developments; for example, the gradient descent method was employed to adapt the parameters and optimize the decay coefficient. However, these PCNN-based methods still have a high computational cost and require a difficult design of the fusion rule [19].

B. Deep-Learning Image Fusion

1) Non-End-to-End Image Fusion: Recently, deep neural networks have been introduced for image fusion to overcome the shortcomings of traditional approaches. Early on, deep neural networks were employed to extract image features [21], [23]. For example, Li et al. [21] designed a fusion network in which the images were decomposed into basic parts and salient parts. Then, the decision maps were calculated from the features and fused by average feature fusion. After that, the same authors proposed a deep fusion framework that used the pretrained ResNet-50 to extract deep features [23]. In [24], the feature extraction and the fusion were conducted by a single network. In addition, a CNN was trained on multiple blurred image patches to generate a decision map, from which the fused images could easily be reconstructed [25]. Inspired by DeepFuse [26], an autoencoder fusion network was developed in [27], which consisted of an encoder, a fusion layer, and a decoder. In this framework, the features of the multimodal images were extracted by Dense blocks [28] and CNN layers and then fused by a fusion strategy. Although it mitigated the degradation problem of traditional CNN-based networks, the network heavily relied on the fusion rules. In [29], NestFuse was designed to enhance the salient/intensity features, as well as preserve more details; besides, a novel attention model was proposed to fuse the images.

The deep-learning-based fusion models have achieved excellent performance; however, the above-mentioned methods generally employ handcrafted fusion strategies, such as element-wise addition and element-wise maximum, to combine the deep features, which hinders the optimal performance of the fusion model. It is worth noting that the parameters in these fusion strategies cannot be changed, and hence, they are unlearnable.

2) End-to-End CNN-Based Image Fusion: Several CNN-based end-to-end fusion methods have been suggested in [30], [31], [32], [33], [34], [35], and [36]. In [30], a generative adversarial network (GAN) was introduced, where the generator computed the fused images and the discriminator constrained the details. Later, DDcGAN [31] introduced a dual-discriminator architecture to enhance the prominence of thermal targets; nevertheless, although the fused images tend to be similar to the source images, they fail to preserve the details and also suffer content loss. It is difficult to obtain stable fusion images due to the use of a discriminator. To solve this problem, GANMcC [32] proposed a GAN-based method with multiclassification constraints to obtain better fused results with more significant contrast and abundant texture information. Zhang et al. [33] developed an end-to-end image fusion network (IFCNN) that was a simple and effective fusion network. IFCNN used two convolution layers to extract the deep features and an element-wise fusion rule to fuse them. Although IFCNN was a state-of-the-art image fusion model, its traditional fusion strategy was still not optimal. Xu et al. [34] proposed an unsupervised end-to-end network that tackled the fusion task by combining it with continual learning. Li et al. [35] designed a novel residual fusion network with a two-stage training strategy to supersede handcrafted fusion strategies; furthermore, they proposed a new loss to retain more details and salient information. Xu and Ma [36] proposed an unsupervised medical image fusion model whose network aimed at enhancing the chrominance information through surface- and deep-level constraints. However, these methods focused on learning spatially local features and did not take into account the long-range dependencies in images.

C. Vision Transformer

Although CNNs provide highly efficient and generalizable solutions for image fusion, their local feature extraction limits the capture of long-range dependencies. The Transformer model [43], which originated in natural language processing (NLP), can capture long-range dependencies and has become popular. The success of the Transformer stems from its self-attention (SA) mechanism, which can model long-range relationships. SA aims to capture the relationship between all features by transforming the inputs into query, key, and value using three learnable weight matrices.

The Vision Transformer (ViT) [44] was first proposed to perform image recognition tasks and has achieved excellent performance; moreover, according to most recent studies, the prediction errors of ViT are closer to those of humans than those of CNNs. These desirable properties of Transformers attract great interest in the medical community, and their applications alleviate the inherent inductive bias of CNNs. The Transformer can effectively encode organs distributed in large receptive fields by spatially modeling the relationship between distant pixels. Most Transformer-based models have shown superior performance over CNN-based methods. For image fusion, IFT [45] developed a Transformer-based multiscale fusion strategy.
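To make the self-attention mechanism described above concrete, the sketch below implements single-head scaled dot-product self-attention with the three learnable weight matrices (query, key, and value). It is a minimal, generic PyTorch illustration, not code from MRSCFusion or from the cited Transformer models.

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention over a token sequence."""
    def __init__(self, dim: int):
        super().__init__()
        # Three learnable weight matrices map the input to query, key, and value.
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim), e.g., flattened image patches.
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # Every token attends to every other token, which is what captures
        # long-range dependencies beyond a convolution's local receptive field.
        return attn @ v

# Example: 64 patch tokens of dimension 96 from one image.
tokens = torch.randn(1, 64, 96)
out = SelfAttention(96)(tokens)
print(out.shape)  # torch.Size([1, 64, 96])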
The parameter α controls the balance between L_ssim and L_grad; here, we set α = 0.01 via the ablation study (Section IV-D).

a) Structural similarity loss: The structural similarity (SSIM) index has been widely used in the image processing community to characterize the distortion of image structures. We employ a structural similarity loss to constrain the structural similarity between the source images and the fused image; the loss is defined in terms of the SSIM between each source image and the fused image.

The parameters λ1 and λ2 control the trade-off between L_content and L_int; here, we set λ1 = 1000 and λ2 = 10 via the ablation study (Section IV-D).

IV. EXPERIMENTS AND DISCUSSIONS

In this section, we perform extensive experiments to demonstrate the effectiveness of the proposed image fusion strategy.
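The displayed equations for these losses are not reproduced in this excerpt. As a hedged sketch only, the structure implied by the surrounding fragments is consistent with the following, where the algebraic forms are assumptions and only the coefficient values α = 0.01, λ1 = 1000, and λ2 = 10 come from the text:

\begin{aligned}
L_{\mathrm{ssim}}    &= \bigl(1 - \mathrm{SSIM}(I_f, I_1)\bigr) + \bigl(1 - \mathrm{SSIM}(I_f, I_2)\bigr),\\
L_{\mathrm{content}} &= L_{\mathrm{ssim}} + \alpha\, L_{\mathrm{grad}}, \quad \alpha = 0.01,\\
L_{\mathrm{total}}   &= \lambda_1\, L_{\mathrm{content}} + \lambda_2\, L_{\mathrm{int}}, \quad \lambda_1 = 1000,\ \lambda_2 = 10,
\end{aligned}

with I_1 and I_2 the source images, I_f the fused image, and L_grad and L_int the gradient and intensity losses named in the text.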
TABLE I
EVALUATION METRICS FOR QUANTITATIVE COMPARISONS
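Table I lists the six quantitative metrics used in the comparisons (SCD, MS-SSIM, Q^{AB/F}, FMI, SF, and VIF). As a concrete illustration of one of them, the sketch below computes the spatial frequency (SF) of a fused image from its standard row/column-frequency definition; this is a minimal NumPy example for reference, not the evaluation code used in the paper.

import numpy as np

def spatial_frequency(img: np.ndarray) -> float:
    """Spatial frequency of a grayscale image (standard definition):
    SF = sqrt(RF^2 + CF^2), where RF and CF are the root-mean-square
    horizontal and vertical first-order differences, respectively."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

# A larger SF indicates richer texture and edge detail in the fused image.
fused = np.random.rand(256, 256)  # placeholder for a fused image
print(spatial_frequency(fused))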
Fig. 9. Qualitative comparisons of MRSCFusion with other methods on three pairs of representative CT and MRI images. From left to right: CT images, MRI images, fusion results of IFCNN, FusionGAN, U2Fusion, DenseFuse, EMFusion, RFN-Nest, IFT, and MRSCFusion (ours).
A common practice for the batch size is to choose powers of 2, as that uses memory most efficiently. The number of epochs is the number of times that the network works through the training dataset; it depends on the type of network and the dataset. In this work, the batch size and the number of epochs are set to 4 and 2, respectively, since the training data are sufficient and the loss converges under the limited computational memory. In addition, the effectiveness of these parameters has been verified in previous studies [27], [29], [35], [45].

B. Comparison Methods and Evaluation Metrics

To demonstrate the effectiveness of the proposed MRSCFusion, we conduct comprehensive comparisons with seven representative fusion methods, including DenseFuse [27], FusionGAN [30], IFCNN [33], U2Fusion [34], RFN-Nest [35], EMFusion [36], and IFT [45]. We run these methods using their publicly available codes with the corresponding parameter settings.

Qualitative and quantitative comparisons of the fused results are implemented. The qualitative results intuitively evaluate the fusion performance based on visual perception. For multimodal medical image fusion, we expect the fused results to contain abundant texture information and appropriate intensity information. The quantitative results indicate the objective assessment of the fusion performance. We choose six popular quantitative evaluation metrics, including the sum of correlation differences (SCD) [55], the multiscale structural similarity index measure (MS-SSIM) [53], the edge information measurement Q^{AB/F} [56], feature mutual information (FMI) [57], spatial frequency (SF) [59], and visual information fidelity (VIF) [58]. These evaluation metrics and their descriptions are presented in Table I.

C. Experiment Results and Discussions

1) CT-MRI: To intuitively display the fusion performance, qualitative fusion comparisons on three pairs of representative CT and MRI images are illustrated in Fig. 9. From left to right are the CT images, the MRI images, and the fusion images with IFCNN, FusionGAN, U2Fusion, DenseFuse, EMFusion, RFN-Nest, IFT, and our MRSCFusion. In CT and MRI image fusion, we expect that dense structures (e.g., bone structures) in CT images and soft tissues in MRI images can be simultaneously retained in the fused images. To better demonstrate the fusion effect, we zoom in on an area with more dense structures and texture details, marked by a red box. As shown in Fig. 9, the brightness and sharpness of the fused images by FusionGAN, U2Fusion, DenseFuse, RFN-Nest, and IFT are unsatisfactory. With some comparison methods, the gray matter in the CT images blurs the details of the MRI images. Specifically, FusionGAN brings in much redundant information like
TABLE II
QUANTITATIVE COMPARISONS OF MRSCFUSION WITH SEVEN COMPETITORS ON CT-MRI FUSION. SIX METRICS ARE SHOWN BELOW (AVERAGE VALUE)
Fig. 10. Quantitative comparisons of MRSCFusion with other methods on 21 pairs of CT and MRI images. Six evaluation metrics are used, and average
values with different methods are marked in legends.
artifacts. In addition, for IFCNN, the fused images retain the structure details of the MRI images well, but the dense structures in the CT images are weakened. EMFusion can retain dense structures well but loses a little edge detail. By contrast, our MRSCFusion can preserve more edge details, as well as retain the dense structures, in the fused images.

The quantitative fusion comparisons with six evaluation metrics on 21 test CT and MRI image pairs are reported in Table II and Fig. 10. In Table II, we can find that MRSCFusion achieves the optimal results (average values) on the metrics SCD, MS-SSIM, Q^{AB/F}, FMI, and VIF, and ranks second on the metric SF. In Fig. 10, the evaluation scores of each metric on the 21 fused images are connected by a line, and the average value of each method is marked in the legend. Obviously, for MS-SSIM, Q^{AB/F}, and VIF, MRSCFusion achieves the highest scores on all 21 fused images. For SCD and FMI, it achieves higher scores than the competitors for most individual fused images, with the highest average values. The best results on SCD and MS-SSIM indicate that our fused images contain abundant information with less distortion and achieve higher structural similarities. The highest FMI indicates that MRSCFusion transfers more dense structures from the CT images to the fused images; moreover, the highest Q^{AB/F} reflects that more edge information from the MRI images is preserved, which benefits from the strong extraction ability of fine-grained details with the introduced GRDB. The highest VIF demonstrates that our fused images have a better visual effect, which is consistent with the qualitative results. In addition, for SF, MRSCFusion merely follows behind IFCNN, which implies that our fused images also contain rich texture details.

2) PET-MRI: The qualitative fusion comparisons on three pairs of representative PET and MRI images are illustrated in Fig. 11. From left to right are the PET images, the MRI images, and the fused images with different methods. We zoom in on an area with more functional (color) information and texture information, marked by a red box, to better exhibit the fusion effect. As shown in Fig. 11, MRSCFusion shows several superiorities. We find that most competitors can retain the functional information in the PET images well but lose some texture details in the MRI images. Specifically, the fused images by FusionGAN contain some redundant information, and texture details are blurred. IFCNN can extract the texture information well from the MRI images, but the fused images are a little dark, indicating that it cannot retain the color (functional) information well from the PET images. In addition, U2Fusion, EMFusion, RFN-Nest, IFT, and DenseFuse can achieve relatively satisfactory fusion results, while there is some loss of texture details. In contrast, our MRSCFusion well
Fig. 11. Qualitative comparisons of MRSCFusion with other methods on three pairs of representative PET and MRI images. From left to right: PET images, MRI images, fusion results of IFCNN, FusionGAN, U2Fusion, DenseFuse, EMFusion, RFN-Nest, IFT, and MRSCFusion (ours).
Fig. 12. Quantitative comparisons of MRSCFusion with other methods on 21 pairs of PET and MRI images. Six evaluation metrics are used, and average
values with different methods are marked in legends.
preserves texture details and meanwhile retains the functional information; moreover, there are fewer mosaics in our fusion images, which are more consistent with human vision.

The quantitative fusion comparisons of the six evaluation metrics on 21 test PET and MRI image pairs are reported in Table III and Fig. 12. In Table III, we can find that MRSCFusion achieves the optimal results on the metrics SCD, Q^{AB/F}, FMI, SF, and VIF, and ranks second on the metric MS-SSIM, following IFCNN by a narrow margin. In Fig. 12, the evaluation scores of each metric on the 21 fused images are connected by a line, and the average value of each method is marked in the legend. We can find that, for SCD, Q^{AB/F}, FMI, and SF, MRSCFusion achieves the highest scores on all 21 fused images. In addition, for VIF, it achieves higher scores than the competitors for most individual fused images, with the highest average values. The highest Q^{AB/F} and SF indicate that our fused images contain more edge information and texture details. The superior SCD, FMI, and MS-SSIM reflect that our MRSCFusion preserves more meaningful information and structural similarities from the source images; moreover, the best VIF shows that our fused images have a better visual effect, which is also consistent with the qualitative results. Overall, based on visual perception and the objective assessment, our MRSCFusion achieves better performance.

3) SPECT-MRI: The qualitative fusion comparisons on three pairs of representative SPECT and MRI images are illustrated in Fig. 13. From left to right are the SPECT images, the MRI images, and the fused images with different methods. To intuitively display the qualitative comparisons, we zoom in on a local area marked by a red box. Similar to PET and MRI image fusion, SPECT and MRI image fusion with MRSCFusion also has advantages. We can find that there are some artifacts or intensity distortions in the fused images of the competitors. Specifically, FusionGAN and DenseFuse over-preserve the functional information of the SPECT images, leading to artifacts. IFCNN can extract the texture information of the MRI images well but suffers from capturing the metabolism information of the SPECT images. EMFusion, U2Fusion, RFN-Nest, and IFT can also achieve satisfactory fusion results, while they cause a slight loss of details. In contrast, our MRSCFusion can retain the functional information of the SPECT images well and meanwhile preserve the structure details of the MRI images;
Fig. 13. Qualitative comparisons of MRSCFusion with other methods on three pairs of representative SPECT and MRI images. From left to right: SPECT images, MRI images, fusion results of IFCNN, FusionGAN, U2Fusion, DenseFuse, EMFusion, RFN-Nest, IFT, and MRSCFusion (ours).
Fig. 14. Quantitative comparisons of MRSCFusion with other methods on 21 pairs of SPECT and MRI images. Six evaluation metrics are used, and average
values with different methods are marked in legends.
TABLE III
QUANTITATIVE COMPARISONS OF MRSCFUSION WITH SEVEN COMPETITORS ON PET-MRI FUSION. SIX METRICS ARE SHOWN BELOW (MEAN VALUE)
furthermore, it preserves more edge information and has a better visual effect.

The quantitative fusion comparisons of the six evaluation metrics on 21 test SPECT and MRI image pairs are reported in Table IV and Fig. 14. In Table IV, we can find that MRSCFusion achieves the optimal results (average values) on most metrics, except for MS-SSIM, on which it ranks second, following IFCNN by a narrow margin. In Fig. 14, the evaluation scores of each metric on the 21 fused images are connected by a line, and the average value of each method is marked in the legend. We can find that, for SCD, Q^{AB/F}, FMI, SF, and VIF, MRSCFusion achieves higher scores than the other methods for most individual fused images, with the highest average values. With the best SCD and FMI, MRSCFusion can preserve more meaningful information from the source images. In addition, the optimal Q^{AB/F}, SF, and VIF indicate that our fused
TABLE IV
QUANTITATIVE COMPARISONS OF MRSCFUSION WITH SEVEN COMPETITORS ON SPECT-MRI FUSION. SIX METRICS ARE SHOWN BELOW (MEAN VALUE)
Fig. 16. Convergence of the losses without the weight block (left plot) and
with the weight block (right plot).
Fig. 15. Qualitative comparisons on RSCF module analysis. From left to right: MRI images, CT/PET/SPECT images, fusion images without RSCF, and fusion images with RSCF.

images have clearer quality and better visual effects. Overall, MRSCFusion achieves better performance, with the fused images containing more morphological information (texture and edge information) and functional information.

D. Ablation Study

1) Analysis of RSCF Fusion Module: The proposed RSCF fusion module boosts the comprehensive information with the guidance of RSCF. We implement ablation studies on the fusion strategy to verify its special role. Specifically, we train a model using only an autoencoder and employ "add" to replace RSCF as the fusion strategy, which means that the source features are added to generate the fused features. A typical example with/without the RSCF fusion module is displayed in Fig. 15. It can be noticed that the image edges obtained without RSCF are not clear enough. In the fusion images of CT-MRI and PET-MRI fusion, the information is lost seriously. The fusion results of SPECT-MRI fusion are also unsatisfactory, and the edge information is still blurred. Since the structure of the RSCF fusion module captures both local features and long-range dependencies, it can simultaneously enhance the description of texture details and retain the intensity distribution of prominent structures.

The quantitative fusion comparisons of the six evaluation metrics for CT-MRI, PET-MRI, and SPECT-MRI fusion are reported in Table V. We can find that the fusion results with the RSCF module surpass those without RSCF on the whole. It is worth noting that, for SPECT-MRI fusion, the fused result with RSCF has a lower MS-SSIM than that without RSCF. This is because the GRDB in the RSCF module refines the edges of color, which results in slightly lower similarities; however, it effectively improves the Q^{AB/F} and SF metrics.

2) Analysis of AWB: In MRSCFusion, we employ the AWB to assign learnable weights w_a and w_b for controlling the information preservation degree of the source images. To validate its effectiveness, we perform comparison experiments without the AWB, where the weights w_a and w_b are both fixed to 0.5. The experiments are carried out with 3000 pairs of images. The analysis covers two aspects: 1) the loss analysis and 2) the qualitative and quantitative results analysis.

The convergence of the losses in both cases is shown in Fig. 16. The left plot shows the loss on each fusion task without applying the AWB, while the right plot shows that with the weight block. Here, we perform 3000 iterations. It is worth noting that 300 points are drawn on the horizontal axis for exhibition. We can find that the losses with the weight block on the different tasks are lower than those without the weight block, which indicates that the weight block can reduce the loss of information.

The qualitative comparison results with/without the weight block on the different tasks are depicted in Fig. 17. The qualitative results show that our model has two advantages. First, we can find that the fusion images with the weight block have more details. The second is that the AWB can highlight the dense structure or color information in the source images. This advantage makes the fused images maintain high contrast. From Table VI, we can also find that most evaluation metrics with the weight block are higher than those without the weight block. The adaptive weights can assist the model in better
TABLE V
QUANTITATIVE COMPARISON RESULTS BETWEEN WITH/WITHOUT RSCF MODULE
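For context on the ablation above: the "add" baseline replaces the learned fusion module with an element-wise sum of the encoder features. The sketch below contrasts that unlearnable rule with a generic learnable fusion layer (a plain 1 × 1 convolution, used here purely as an illustration; it is not the RSCF module, whose global and local branches are described in the paper).

import torch
import torch.nn as nn

class AddFusion(nn.Module):
    """Handcrafted, parameter-free fusion: element-wise addition of source features."""
    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        return feat_a + feat_b

class LearnableFusion(nn.Module):
    """Generic learnable fusion: concatenate features and mix them with a 1x1 conv.
    Illustrative only; the paper's RSCF module is a far richer design."""
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([feat_a, feat_b], dim=1))

# Example with dummy encoder features (batch=1, 64 channels, 64x64 maps).
fa = torch.randn(1, 64, 64, 64)
fb = torch.randn(1, 64, 64, 64)
print(AddFusion()(fa, fb).shape)          # torch.Size([1, 64, 64, 64])
print(LearnableFusion(64)(fa, fb).shape)  # torch.Size([1, 64, 64, 64])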
Fig. 17. Qualitative comparisons on analysis of the AWB. The first two
columns are MRI and CT/PET/SPECT images, and the last two columns are
the fused images without/with the adaptive weight block.
TABLE VI
QUANTITATIVE COMPARISONS ON ANALYSIS OF THE AWB
TABLE VII
RUNNING TIME OF DIFFERENT METHODS FOR FUSING TWO IMAGES OF SIZE 256 × 256 (UNIT: SECONDS)
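Table VII reports the per-pair fusion time. As a reference for how such a number is typically measured (a hedged sketch with a placeholder network, not the authors' benchmarking script), one can time a single forward pass over a pair of 256 × 256 inputs:

import time
import torch
import torch.nn as nn

# Placeholder network standing in for a fusion model; MRSCFusion itself is not reproduced here.
model = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
model.eval()

pair = torch.rand(1, 2, 256, 256)  # two source images stacked as channels

with torch.no_grad():
    start = time.perf_counter()
    fused = model(pair)
    elapsed = time.perf_counter() - start  # CPU timing; GPU timing would need synchronization

print(f"fused shape: {tuple(fused.shape)}, running time: {elapsed:.4f} s")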
than the convolutional operations [47]. It is worth noting, however, that our MRSCFusion can run on common hardware devices and achieves better performance with an acceptable computational cost.

V. CONCLUSION AND FUTURE WORK

In this work, an end-to-end unsupervised fusion model is proposed to tackle multimodal medical image fusion. We design a novel RSCF module that can effectively mine and fuse multiscale deep features. The proposed RSCF fusion module includes a global branch based on the RSTB for capturing the global contextual information, as well as a local branch based on the GRDB for capturing the local fine-grained information. To further effectively integrate more meaningful information from the source images and ensure the visual quality of the fused images, we define a joint loss function, including a content loss and an intensity loss, to constrain the RSCF fusion module; moreover, we introduce adaptive weights to control the information preservation degree of the source images. The proposed model follows a two-stage training strategy, where an autoencoder is trained to extract multiple deep features and reconstruct fused images in the first stage. Then, the RSCF fusion module is trained to fuse the multiscale features in the second stage. The proposed model is evaluated on multiple medical fusion tasks, where we have achieved better results in both qualitative and quantitative evaluations compared with other state-of-the-art fusion methods. In future work, we will further improve the performance of the proposed model and reduce its computational cost. In addition, carrying out a comprehensive clinical evaluation in medical applications is valuable future work, which will help the adoption of our strategy.

REFERENCES

[1] K. Padmavathi, C. S. Asha, and V. K. Maya, "A novel medical image fusion by combining TV-L1 decomposed textures based on adaptive weighting scheme," Eng. Sci. Technol., Int. J., vol. 23, no. 1, pp. 225–239, Feb. 2020.
[2] P. Ganasala and V. Kumar, "Multimodality medical image fusion based on new features in NSST domain," Biomed. Eng. Lett., vol. 4, no. 4, pp. 414–424, Dec. 2014.
[3] G. Wang, W. Li, X. Gao, B. Xiao, and J. Du, "Functional and anatomical image fusion based on gradient enhanced decomposition model," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[4] M. Yin, X. Liu, Y. Liu, and X. Chen, "Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain," IEEE Trans. Instrum. Meas., vol. 68, no. 1, pp. 49–64, Jan. 2019.
[5] S. Li, B. Yang, and J. Hu, "Performance comparison of different multi-resolution transforms for image fusion," Inf. Fusion, vol. 12, no. 2, pp. 74–84, Apr. 2011.
[6] J. Jinju, N. Santhi, K. Ramar, and B. S. Bama, "Spatial frequency discrete wavelet transform image fusion technique for remote sensing applications," Eng. Sci. Technol., Int. J., vol. 22, no. 3, pp. 715–726, Jun. 2019.
[7] R. Singh, R. Srivastava, O. Prakash, and A. Khare, "Multimodal medical image fusion in dual tree complex wavelet transform domain using maximum and average fusion rules," J. Med. Imag. Health Informat., vol. 2, no. 2, pp. 168–173, Jun. 2012.
[8] J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, "Infrared and visible image fusion based on target-enhanced multiscale transform decomposition," Inf. Sci., vol. 508, pp. 64–78, Jan. 2020.
[9] L. Yang, B. L. Guo, and W. Ni, "Multimodality medical image fusion based on multiscale geometric analysis of contourlet transform," Neurocomputing, vol. 72, nos. 1–3, pp. 203–211, Dec. 2008.
[10] G. Bhatnagar, Q. M. J. Wu, and Z. Liu, "Directive contrast based multimodal medical image fusion in NSCT domain," IEEE Trans. Multimedia, vol. 15, no. 5, pp. 1014–1024, Aug. 2013.
[11] J. Jose et al., "An image quality enhancement scheme employing adolescent identity search algorithm in the NSST domain for multimodal medical image fusion," Biomed. Signal Process. Control, vol. 66, Apr. 2021, Art. no. 102480.
[12] J. Wang, C. Lu, M. Wang, P. Li, S. Yan, and X. Hu, "Robust face recognition via adaptive sparse representation," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2368–2378, Dec. 2014.
[13] X. Lu, B. Zhang, Y. Zhao, H. Liu, and H. Pei, "The infrared and visible image fusion algorithm based on target separation and sparse representation," Infr. Phys. Technol., vol. 67, pp. 397–407, Nov. 2014.
[14] M. Yin, P. Duan, W. Liu, and X. Liang, "A novel infrared and visible image fusion algorithm based on shift-invariant dual-tree complex shearlet transform and sparse representation," Neurocomputing, vol. 226, pp. 182–191, Feb. 2017.
[15] K. Zhang, Y. Huang, and C. Zhao, "Remote sensing image fusion via RPCA and adaptive PCNN in NSST domain," Int. J. Wavelets, Multiresolution Inf. Process., vol. 16, no. 5, Sep. 2018, Art. no. 1850037.
[16] Z. Wang and C. Gong, "A multi-faceted adaptive image fusion algorithm using a multi-wavelet-based matching measure in the PCNN domain," Appl. Soft Comput., vol. 61, pp. 1113–1124, Dec. 2017.
[17] S. Singh and D. Gupta, "Detail enhanced feature-level medical image fusion in decorrelating decomposition domain," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–9, 2021.
[18] C. Panigrahy, A. Seal, and N. K. Mahato, "MRI and SPECT image fusion using a weighted parameter adaptive dual channel PCNN," IEEE Signal Process. Lett., vol. 27, pp. 690–694, 2020.
[19] Y. Li, J. Zhao, Z. Lv, and J. Li, "Medical image fusion method by deep learning," Int. J. Cognit. Comput. Eng., vol. 2, pp. 21–29, Jun. 2021.
[20] H. Zhang, H. Xu, X. Tian, J. Jiang, and J. Ma, "Image fusion meets deep learning: A survey and perspective," Inf. Fusion, vol. 76, pp. 323–336, Dec. 2021.
[21] H. Li, X.-J. Wu, and J. Kittler, "Infrared and visible image fusion using a deep learning framework," in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Aug. 2018, pp. 2705–2710.
[22] Z. Wang, Y. Wu, J. Wang, J. Xu, and W. Shao, "Res2Fusion: Infrared and visible image fusion based on dense Res2Net and double nonlocal attention models," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[23] H. Li, X.-J. Wu, and T. S. Durrani, "Infrared and visible image fusion with ResNet and zero-phase component analysis," Infr. Phys. Technol., vol. 102, Nov. 2019, Art. no. 103039.
[24] Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Inf. Fusion, vol. 36, pp. 191–207, Jul. 2017.
[25] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, "Image fusion with convolutional sparse representation," IEEE Signal Process. Lett., vol. 23, no. 12, pp. 1882–1886, Dec. 2016.
[26] K. R. Prabhakar, V. S. Srikar, and R. V. Babu, "DeepFuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4724–4732.
[27] H. Li and X.-J. Wu, "DenseFuse: A fusion approach to infrared and visible images," IEEE Trans. Image Process., vol. 28, no. 5, pp. 2614–2623, May 2019.
[28] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.
[29] H. Li, X.-J. Wu, and T. Durrani, "NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Trans. Instrum. Meas., vol. 69, no. 12, pp. 9645–9656, Dec. 2020.
[30] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, "FusionGAN: A generative adversarial network for infrared and visible image fusion," Inf. Fusion, vol. 48, pp. 11–26, Aug. 2019.
[31] J. Ma, H. Xu, J. Jiang, X. Mei, and X.-P. Zhang, "DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion," IEEE Trans. Image Process., vol. 29, pp. 4980–4995, 2020.
[32] J. Ma, H. Zhang, Z. Shao, P. Liang, and H. Xu, "GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–14, 2021.
[33] Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, "IFCNN: A general image fusion framework based on convolutional neural network," Inf. Fusion, vol. 54, pp. 99–118, Feb. 2020.
[34] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, "U2Fusion: A unified unsupervised image fusion network," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 502–518, Jan. 2022.
[35] H. Li, X. Wu, and J. Kittler, "RFN-Nest: An end-to-end residual fusion network for infrared and visible images," Inf. Fusion, vol. 73, pp. 72–86, Sep. 2021.
[36] H. Xu and J. Ma, "EMFusion: An unsupervised enhanced medical image fusion network," Inf. Fusion, vol. 76, pp. 177–186, Dec. 2021.
[37] H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, "FusionDN: A unified densely connected network for image fusion," in Proc. AAAI Conf. Artif. Intell. (AAAI), Apr. 2020, pp. 12484–12491.
[38] C. Liu, B. Yang, Y. Li, X. Zhang, and L. Pang, "An information retention and feature transmission network for infrared and visible image fusion," IEEE Sensors J., vol. 21, no. 13, pp. 14950–14959, Jul. 2021.
[39] L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Inf. Fusion, vol. 82, pp. 28–42, Jun. 2022.
[40] Y. Long, H. Jia, Y. Zhong, Y. Jiang, and Y. Jia, "RXDNFuse: An aggregated residual dense network for infrared and visible image fusion," Inf. Fusion, vol. 69, pp. 128–141, May 2021.
[41] H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity," in Proc. AAAI Conf. Artif. Intell. (AAAI), Apr. 2020, pp. 12797–12804.
[42] Y. Liu, X. Chen, J. Cheng, and H. Peng, "A medical image fusion method based on convolutional neural networks," in Proc. 20th Int. Conf. Inf. Fusion (Fusion), Jul. 2017, pp. 1–7.
[43] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, and L. Jones, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–13.
[44] A. Dosovitskiy et al., "An image is worth 16×16 words: Transformers for image recognition at scale," 2020, arXiv:2010.11929.
[45] V. Vs, J. M. J. Valanarasu, P. Oza, and V. M. Patel, "Image fusion transformer," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2022, pp. 3566–3570.
[46] Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 9992–10002.
[47] A. Lin, B. Chen, J. Xu, Z. Zhang, G. Lu, and D. Zhang, "DS-TransUNet: Dual Swin transformer U-Net for medical image segmentation," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–15, 2022.
[48] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "SwinIR: Image restoration using Swin transformer," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2021, pp. 1833–1844.
[49] C.-M. Feng, Y. Yan, G. Chen, Y. Xu, L. Shao, and H. Fu, "Multi-modal transformer for accelerated MR imaging," 2021, arXiv:2106.14248.
[50] C. Feng, Y. Yan, H. Fu, L. Chen, and Y. Xu, "Task transformer network for joint MRI reconstruction and super-resolution," in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI), 2021, pp. 307–317.
[51] X. Li, H. Chen, Y. Li, and Y. Peng, "MAFusion: Multiscale attention network for infrared and visible image fusion," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–16, 2022.
[52] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: A nested U-Net architecture for medical image segmentation," in Proc. Int. Workshop Multimodal Learn. Clinical Decis. Support, 2018, pp. 3–11.
[53] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[54] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[55] V. Aslantas and E. Bendes, "A new image quality metric for image fusion: The sum of the correlations of differences," AEU-Int. J. Electron. Commun., vol. 69, no. 12, pp. 1890–1896, Dec. 2015.
[56] C. S. Xydeas and V. Petrović, "Objective image fusion performance measure," Electron. Lett., vol. 36, no. 4, pp. 308–309, 2000.
[57] M. B. A. Haghighat, A. Aghagolzadeh, and H. Seyedarabi, "A non-reference image fusion metric based on mutual information of image features," Comput. Electr. Eng., vol. 37, no. 5, pp. 744–756, Sep. 2011.
[58] Y. Han, Y. Cai, Y. Cao, and X. Xu, "A new image fusion performance metric based on visual information fidelity," Inf. Fusion, vol. 14, no. 2, pp. 127–135, Apr. 2013.
[59] K. Guo, X. Li, X. Hu, J. Liu, and T. Fan, "Hahn-PCNN-CNN: An end-to-end multi-modal brain medical image fusion framework useful for clinical diagnosis," BMC Med. Imag., vol. 21, no. 1, pp. 1–22, Jul. 2021.
[60] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," J. Big Data, vol. 6, no. 1, pp. 1–48, Jul. 2019.
[61] R. Livni et al., "On the computational efficiency of training neural networks," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2014, pp. 1–12.
[62] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, "Medical image fusion via convolutional sparsity based morphological component analysis," IEEE Signal Process. Lett., vol. 26, no. 3, pp. 485–489, Mar. 2019.

Xinyu Xie received the B.E. degree in communication engineering from the University of South China, Hengyang, China, in 2020, where she is currently pursuing the M.S. degree with the School of Electrical Engineering. Her research interests include computer vision, information fusion, and medical image processing.

Xiaozhi Zhang received the Ph.D. degree from the School of Information Engineering, Guangdong University of Technology, Guangzhou, China, in 2018. From 2017 to 2018, he was a joint Ph.D. student and a Research Assistant with the Department of Mathematics and Statistics, Curtin University, Perth, WA, Australia. He is an Associate Professor with the School of Electrical Engineering, University of South China, Hengyang, China. His main research interests include optimization algorithms and applications, machine learning, medical image processing, and time-frequency analysis. Dr. Zhang is a regular reviewer for many conferences and journals.
Shengcheng Ye received the B.E. degree in software engineering from the University of South China, Hengyang, China, in 2023. His research interests include computer vision and medical image processing.

Bin Yang (Member, IEEE) received the B.S. degree from the Zhengzhou University of Light Industry, Zhengzhou, China, in 2005, and the Ph.D. degree in electrical engineering from Hunan University, Changsha, China, in 2010. He joined the School of Electrical Engineering, University of South China, Hengyang, China, in 2010, where he is a Full Professor. His professional interests are information fusion, pattern recognition, and image processing. Dr. Yang won a Second-Grade National Award in Science of China in 2019.