Multimedia Tools and Applications (2024) 83:87237–87298
https://doi.org/10.1007/s11042-024-19756-x

A systematic review of transfer learning in software engineering

Ruchika Malhotra1 · Shweta Meena1

Received: 13 April 2023 / Revised: 16 June 2024 / Accepted: 23 June 2024 / Published online: 27 July 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024

Abstract
Nowadays, everyone requires good-quality software. Software quality often cannot be assured due to the lack of data available for training and testing. Thus, Transfer Learning (TL) plays an important role in reusing existing software knowledge to develop new software with a similar domain and task. TL focuses on transferring knowledge from existing prediction models to the development of new prediction models. The developed models are applied to unseen datasets based on the characteristics and nature of the data, since a sufficient amount of training data is often unavailable. The data distribution and task of the source and target project must be checked before employing TL for software development. In this Systematic Review (SR), we have investigated 39 studies published between January 1990 and March 2024 that used TL in the software engineering domain. The review focused on the identification of Machine Learning (ML) techniques used with TL techniques, types of TL explored, TL settings explored, experimental settings, datasets, quality attributes, validation methods, threats to validity, strengths and weaknesses of TL techniques, and hybrid techniques with TL. According to the experimental comparison, the performance of TL techniques is encouraging. The findings of this SR will serve as guidelines for academicians, software industry experts, software developers, software testers, and researchers. This SR is also helpful in selecting the appropriate type of TL and TL setting for the development of efficient software in the future, based on the type of problem. The study showed that 30.67% of the studies focused on defect prediction, of which 15% used open-source datasets. Further, 35% of the studies used SVM as a base classifier for TL, with different independent variables of the datasets considered as prediction model input, and the K-fold Cross-Validation (CV) method was used in 15 studies.

Keywords Transfer Learning · Machine Learning · Software Engineering · Cross-Project · Defect Prediction · Change Prediction · Effort Estimation · Software Quality · Search Optimization · Evolutionary Techniques · Heuristic · Hypothesis Testing

* Shweta Meena
shwetameena@dtu.ac.in
Ruchika Malhotra
ruchikamalhotra2004@yahoo.com
1 Department of Software Engineering, Delhi Technological University, Delhi, India


1 Introduction

Nowadays, the demand for efficient software is increasing rapidly. The development of correct and accurate software requires huge amounts of data for building prediction models using the latest techniques, such as the transfer of knowledge from one project to another, known as TL. TL consists of two words, i.e., transfer and learning; the name itself indicates that something is transferred by learning. TL means gathering knowledge from other models, storing that knowledge, and transferring it to other models, which would be helpful for those models. TL is also referred to as learning to learn, life-long learning, knowledge transfer, inductive transfer, multitask learning, knowledge consolidation, context-sensitive learning, knowledge-based inductive bias, meta-learning, and incremental/cumulative learning [1]. Multitask learning is associated with TL in that it tries to learn multiple tasks simultaneously, even when those tasks differ from one another. The approach that distinguishes multitask learning from the other types is that it uncovers latent features that benefit each individual task.
The two main issues addressed in TL are what knowledge to transfer and how to transfer it. That is why various TL algorithms are applied across a training (source) and target domain: they transfer different knowledge, and the performance improvement also varies across target domains. The objective is to explore and develop the optimal TL algorithm that maximizes performance improvement in a reasonable amount of time with minimum effort; this requires exhaustive research and expertise in the area. A novel TL framework known as Learning to Transfer (L2T) was developed for this purpose. The L2T framework helps automatically determine what and how to transfer by using past experiences in TL.

1.1 Motivation

Nowadays, the most widely used techniques and methods are related to ML, which has numerous applications in different fields. ML works on the assumption that the training and testing data come from a similar feature space and follow a similar probability distribution. However, in the real world, the training data and future data often have different feature spaces and distributions. When the distribution of the training and future data changes, we are required to build the prediction models from scratch using newly gathered training data. Performing this recursive task of collecting training data and then developing efficient prediction models is not feasible. TL would reduce the time and effort spent recollecting training data and would help in transferring knowledge between problem domains.
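To make the preceding point concrete, the following is a minimal illustrative sketch (not taken from any of the reviewed studies, and using synthetic stand-in data) of how the distribution similarity between source and target projects could be checked feature by feature with a two-sample Kolmogorov-Smirnov test before deciding whether TL is appropriate:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, scale=1.0, size=(500, 3))  # stand-in for source-project metrics
target = rng.normal(loc=0.7, scale=1.3, size=(300, 3))  # stand-in for target-project metrics

# Compare each feature's empirical distribution across the two projects.
for j in range(source.shape[1]):
    stat, p = ks_2samp(source[:, j], target[:, j])
    verdict = "differs" if p < 0.05 else "similar"
    print(f"feature {j}: KS statistic = {stat:.3f}, p = {p:.3f} -> distribution {verdict}")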
Various ML algorithms are available in the existing literature. These ML algorithms are used to construct prediction models for transferring knowledge. In the existing literature, various ML techniques have been used to transfer knowledge from a source to a target domain where the data is unlabeled. In the existing study [2], the authors employed Support Vector Machines (SVM), Decision Trees (DT), and Random Forest (RF) to transfer knowledge. The use of these classification techniques provides benefits for transferring knowledge and designing a prediction model. A Deep Learning (DL)-based sketch recognition method [3] has also been proposed, which uses sketches of different granularity for fine-tuning different layers of neural networks.
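As a hedged illustration of this idea (synthetic data, not the exact setup of [2]), a source-project classifier such as SVM or RF can be trained on labeled source data and reused to label an unlabeled target project:

import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_source = rng.normal(size=(200, 5))
y_source = (X_source[:, 0] + X_source[:, 1] > 0).astype(int)  # synthetic defect labels
X_target = rng.normal(loc=0.3, size=(50, 5))                  # unlabeled target project

for model in (SVC(kernel="rbf"), RandomForestClassifier(n_estimators=100, random_state=0)):
    model.fit(X_source, y_source)              # learn from the source project
    pseudo_labels = model.predict(X_target)    # transfer the learned knowledge to the target
    print(type(model).__name__, "predicted defect rate:", pseudo_labels.mean())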
The usage of TL techniques must be enhanced. However, this requires consistently documenting the already developed techniques and experiments from the existing literature. No SR currently exists that focuses on TL techniques in software engineering. In this paper, an extensive review of studies published between January 1990 and March 2024 is performed.
The objective of this SR is to analyze, outline, and evaluate the experimental evidence concerning: (1) the quality attributes used for TL; (2) ML techniques used for TL; (3) the experimental settings that have been used for TL, such as datasets, independent variables, algorithms, validation techniques, performance measures, and statistical tests used for TL; (4) the effectiveness of TL algorithms using ML techniques; (5) threats to validity when using TL; and (6) advantages and disadvantages of TL techniques.

1.2 Innovation

This SR provides various guidelines to software developers, industry experts, and research scholars concerning the usage of TL for knowledge transfer from one domain to another. To meet our objective, we examined different online libraries and identified 39 relevant studies. These 39 studies are used to answer the research questions related to ML techniques used with TL. We have also applied quality assessment criteria and inclusion and exclusion criteria for the selection of relevant studies. With respect to TL categories, some authors explored heterogeneous TL [4]. The researchers examined heterogeneous TL methods concerning target labels and surveyed the methods used for heterogeneous TL, discussing, for example, the Directed Cyclic Network (DCN). A comparison among empirical studies was then conducted considering different methodologies and techniques for heterogeneous TL.
The sentiment analysis domain makes wide use of TL. In a recent study, a TL-based approach [5] was designed for offensive and hate speech detection on social media platforms. ML models are used with TL for the prediction of hate speech and offensive language. The experiment was conducted using three different datasets, and the performance of the proposed methodology was consistent across all three. Unigram and bigram-based ML models were considered as baseline models in the study. The approach resulted in a more robust and efficient methodology for the detection of hate speech and abusive language. Object detection has also used TL [6]: TL has been applied to aircraft object detection in remote sensing images, where a Faster R-CNN algorithm developed for natural images is applied to remote sensing images. An ensemble method [7] has been presented that combines features of pre-trained deep convolutional neural networks (ResNet-50 and ResNet-101) for fruit freshness classification. The TL-based approach outperformed other methods but requires a reduction in dataset size. Human activity recognition has also used TL in various ways; in the existing study, the proposed TL-based approach is compared with existing state-of-the-art methods [8]. In the existing study [9], a novel algorithm termed the Gravitational Search Algorithm (GSA) was developed. GSA is based on the law of gravity and the notion of mass interactions. The performance of existing algorithms such as PSO, RGA, and CFO was compared with GSA, and GSA outperformed them. Likewise, the TL-based approach outperformed state-of-the-art methods. In another existing study, deep TL-based methods are used to mitigate the lack-of-data problem [10].


The authors compared pre-trained models with traditional ML models, considering data augmentation and deep TL methods. In the existing studies, TL is used in the software engineering field for defect prediction, change prediction, effort estimation, and maintenance of software. Furthermore, TL is useful for future projects that exhibit characteristics or properties similar to existing projects. Thus, TL plays an important role in designing prediction models.

1.3 Contribution

This SR mainly focuses on the usage of TL in the software engineering domain, covering literature from January 1990 to March 2024. The research is limited to this period, as the idea of TL emerged in 1990. The emergence of TL has helped academicians, software industry experts, and developers build more prediction models using available data. In total, 122 studies were reviewed, and 39 studies were examined thoroughly after an extensive search through the digital libraries mentioned below. This review presents and evaluates the empirical evidence on different TL settings, TL types, and the performance of search-based techniques for TL in software engineering. Moreover, TL is a subset of Cross-Project (CP) and cross-company models. The performance of the above techniques is empirically validated with ML techniques. The objectives of this SR are to analyze the performance of TL in software engineering including:

• Commonly used datasets.
• Commonly used ML techniques for TL.
• Transfer learning types explored.
• Level of knowledge transferred with different TL settings.
• Software quality attributes that are improved through TL.
• Performance metrics used for evaluation of TL model effectiveness.
• Statistical tests used for empirical validation.
• Strengths and weaknesses of these techniques.
• Potential threats to validity as reported by the existing studies.
• TL settings.
• TL algorithms.
• Validation methods used for CP.
• The predictive capability of TL concerning the ML algorithm as base learner.

1.4 Sections

Organization of the SR: Section II discusses relevant existing studies and background work. Section III summarizes the problem statement with a mathematical explanation. Section IV presents the methodology followed to conduct this review, including the research questions designed to analyze the existing literature related to TL, search strategy formation, quality assessment criteria, and extraction of data from the collected studies. Section V presents details of the selected primary studies including publication year, source of publication, and publisher. Section VI discusses the answers to the research questions. Section VII presents the limitations of the SR. Section VIII discusses applications of this SR. Section IX presents the conclusion and future work derived from the SR.


2 Background

In the existing literature, several studies effectively apply TL for the identification of defects and changes, while only one study estimates software effort. A survey [11] was conducted to analyze reinforcement learning with TL and provided future directions covering various TL domains, task differences, multi-task learning, and methods to transfer knowledge for reinforcement learning. The authors of [12] surveyed the usage of TL in bioinformatics, covering the growth of TL and various key bioinformatics application areas, including sequence classification, gene expression data analysis, biological network reconstruction, and biomedical applications, with 12 different domains and 25 primary studies. Furthermore, the authors of [13] conducted a survey of TL using computational intelligence, covering TL using a Neural Network (NN), TL using a Hierarchical Bayesian (HB) model, TL using a Bayesian Network (BN), TL using a Genetic Algorithm (GA), and TL using fuzzy systems, with 30 studies. Moreover, a detailed survey [2] of TL techniques was conducted with a main focus on 20 studies and the TL algorithms of the preceding 5 years.
A survey of TL [14] studied the concepts and definitions related to TL, with more focus on TL settings, TL approaches, TL applied to DL, and types of TL such as domain adaptation, domain confusion, multitask learning, one-shot learning, zero-shot learning, and meta-learning. The authors of [15] surveyed TL for activity recognition in terms of the categorization of TL by sensor modality (video sequences, wearable sensors, ambient sensors, crossing sensor boundaries, physical setting boundaries), by the difference between source and target data, by data availability, and by the information that is transferred (instance transfer, feature-representation transfer, parameter transfer, relational knowledge transfer). In the existing studies [16–20], the authors also explored inclined planes system optimization, which uses Newton's second law of motion, and developed an improved version of the inclined planes system optimization algorithm. The authors tested and validated the improved version on 100 independent trials, showing success in 90% of cases. It is concluded that the proposed improved inclined planes system optimization algorithm outperformed existing optimization algorithms in terms of various factors such as estimated coefficients, convergence, fitness, output responses, noise analysis, stability, and reliability.
The authors of [21] conducted a survey of TL for collaborative recommendation with auxiliary data, introducing TL definitions and a categorization of TL techniques with different TL strategies, along with a novel and generic TL framework. The representative work of each TL strategy is discussed in detail. The authors have also surveyed TL for smart home and new user home automation adaptation [22], TL for sentiment analysis [23], TL for robot and human automation [24], image style TL [25], a decade survey of TL [26], TL for edge computing [27], TL for cluster classification [28], a deep reinforcement TL survey [29], TL in natural language processing [30], a generalized survey of TL [31], TL for electronic vehicles [32], TL for renewable energy [33], TL for machinery diagnostics and prognostics [34], and learning from health sensing data using TL, which is helpful for future medication and the healthcare sector [35]. Thus, it has been observed that all these surveys address specific domains, and the software engineering domain has not been explored until now. This gave rise to the objective of conducting an SR for TL in software engineering. Apart from the software engineering domain, a huge number of studies exist in other domains in which the authors have applied TL in various aspects. The usage of TL is limited in software engineering, where it helps in the reusability of existing models or project data.


The authors of [36] analyzed the impact of TL on Cross-Project Defect Prediction (CPDP) by identifying the linear relationship between software defects. A novel method was proposed, named Heterogeneous CPDP (HCPDP) using Encoder Networks and Ensemble Learning (ENEL). The performance of ENEL was analyzed using precision, recall, G-mean, F1-score, and AUC. The performance and quality of large software are also enhanced using TL in the existing study [37]. To prevent early software aging, a Cross-Project-Based Aging (CPBA) related bug prediction model was developed: a hybrid CPBA approach with feature TL and class imbalance handling was proposed to mitigate early aging in software. The experiment was conducted and evaluated on three different projects with two performance metrics, AUC and the balance measure. The proposed hybrid CPBA performed 9.0%, 32.9%, 4.4%, and 3.9% better than the state-of-the-art CPBA methods TLAP, JPKS, SRLA, and JDA-ISDA, respectively.
Furthermore, the authors of [38] recently explored HCPDP in detail by reusing an existing project dataset for the target project. A novel approach named Multi-source HCPDP (MHCPDP) was developed to reduce the difference between source and target projects. A multi-source TL algorithm was developed to improve the performance of a base classifier by reducing the impact of negative transfer. The performance of MHCPDP was evaluated on five different datasets using two performance metrics, and it performed better than the traditional HCPDP method. In the existing study, it was shown that TCA does not perform well for CPDP; thus, TCA+ was developed, reducing the difference between the data distributions of source and target projects. Moreover, the authors developed a novel method named the Two-Phase TL model (TPTL) [39] to overcome the limitations of TCA+ and TCA. In the first phase, a source project estimator automatically selects the two source projects with the highest distribution similarity to the target project. In the second phase, two prediction models are built based on the source projects selected in the previous phase. The performance of TPTL was analyzed on 42 defect datasets from the PROMISE repository and compared with state-of-the-art methods. Results showed that TPTL performed better than the baseline methods by 19%, 5%, 36%, 27%, and 11% in terms of F1-score, and by 64%, 92%, 71%, 11%, and 66% in terms of cost-effectiveness.
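A minimal sketch of the phase-one idea follows; the similarity score below is a hypothetical stand-in (comparing per-feature means and standard deviations), not the actual source project estimator of [39]:

import numpy as np

def similarity(source, target):
    # Smaller gaps between per-feature means/stds imply higher distribution similarity.
    gap = (np.abs(source.mean(axis=0) - target.mean(axis=0)).sum()
           + np.abs(source.std(axis=0) - target.std(axis=0)).sum())
    return -gap

rng = np.random.default_rng(2)
target = rng.normal(size=(80, 4))  # target project metrics (synthetic)
candidates = {f"project_{k}": rng.normal(loc=0.5 * k, size=(100, 4)) for k in range(5)}

ranked = sorted(candidates, key=lambda name: similarity(candidates[name], target), reverse=True)
print("two most similar source projects:", ranked[:2])  # phase two would train one model per selected project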
The authors of [40] proposed 62 CPDP models using parameter optimization with TL and concluded that automatic parameter optimization for CPDP improves performance by 77% with reasonable computational cost. CPDP was also improved by weighting software modules via TL [41]. A gravity-based analogy is used to assign weights to training modules according to their relation to the test set, and cost-sensitive C4.5 is then employed on the weighted training data. The experiment was conducted using 10 NASA datasets and yielded a PD value of 0.81, an F-measure value of 0.41, and an AUC value of 0.8. The performance of the C4.5 CPDP model was compared with a Naïve Bayes (NB) CPDP model. The authors of [42] experimented with overcoming data distribution variation by combining feature transfer with EL.
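The gravity-based weighting can be illustrated with the following sketch; the weighting rule here (weight inversely proportional to squared distance from the test-set centroid) is one plausible reading of the analogy and may differ from the exact formulation in [41]:

import numpy as np

rng = np.random.default_rng(3)
X_train = rng.normal(size=(100, 4))           # source (training) modules
X_test = rng.normal(loc=0.5, size=(40, 4))    # target (test) modules

centroid = X_test.mean(axis=0)
d2 = ((X_train - centroid) ** 2).sum(axis=1)  # squared distance to the test-set "mass"
weights = 1.0 / (d2 + 1e-8)                   # closer modules receive larger weights
weights /= weights.sum()
# The weights could then feed a cost-sensitive learner, e.g.
# model.fit(X_train, y_train, sample_weight=weights) in scikit-learn.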
The feature transfer method is introduced with two stages: feature transfer and classification. The experiment conducted on 20 source projects showed that the two-stage combination of feature transfer with EL outperformed alternatives. Further studies were conducted to deal with feature selection and distance-weight instance transfer [43]. A new technique named Multi-WCM-WTrA was introduced and tested using the AEEEM and ReLink datasets; Multi-WCM-WTrA outperformed the TCA+ algorithm with an improvement of 23% on the AEEEM dataset and 5% on the ReLink dataset. A novel approach named TSboostDF was proposed in consideration of knowledge transfer and class imbalance issues [44], and the results showed that TSboostDF outperformed existing TL methods. Testing resources and testing efficiency play an important role in software defect prediction; thus, a novel approach named Multi-Source CPDP (MSCPDP) based on TL was proposed [45]. Existing MSCPDP approaches are not open-source, due to which the MSCPDP Lab toolbox reimplements state-of-the-art MSCPDP models with a unified structure, data processing, model training and testing, and 13 performance evaluation metrics; the toolbox functionalities were presented in the study. It was found that TL performed effectively for CPDP. However, two issues need to be addressed: the data distribution difference between source and target data, and the selection of single source projects. Thus, the authors of [46] developed a Three-Stage Weighting Framework for Multi-Source TL (3SW-MSTL). Furthermore, one study showed that it is not mandatory for source and target data to follow a similar distribution when using TL [47].
The conditional distribution was not considered by the authors of the existing study. Thus, a study [48] was conducted to develop a novel approach using the conditional distribution, named Balanced Distribution Adaptation (BDA). The effectiveness of BDA was analyzed using 18 projects from four datasets and six performance metrics, including AUC, Recall, G-mean, Balance, and F-measure. The performance of models developed using BDA was compared with 12 baseline methods and showed improvements of 23.8%, 12.5%, 11.5%, 4.7%, 34.2%, and 33.7%.
In the existing studies, search-based techniques have also been explored for CPDP with TL. The authors of [49] proposed a novel multi-objective approach for CPDP using logistic regression with a genetic algorithm. Instead of giving a single predictive model to the software engineer, it is better to offer multiple options through a multi-objective approach that reduces the compromise between defect-prone artifacts detected and LOC to be analyzed; the result was validated on 10 datasets of the PROMISE repository. Furthermore, the Nearest Neighbor (NN)-Filter [50], embedded in a GA, is used for CPDP to generate a training dataset in case of its non-availability. A new search-based approach named Genetic Instance Selection (GIS) was proposed that optimizes the combined measure of F-measure and G-Mean on a validation set created by the NN-filter. The performance is evaluated on 13 datasets of the PROMISE repository, comparing CPDP with GIS against the NN-filter and naïve CPDP.
A novel approach [51] was proposed considering manifold feature transformation. This approach transforms the actual features into a manifold space and reduces the difference in data distribution between the transformed source project and target project in that space. The transformed project data is then used with an NB classifier. The experiment was conducted on the AEEEM and ReLink datasets using the F1-measure. In the existing study [52], the authors experimented with hybrid search algorithms for CPDP and Within-Project DP (WPDP). In the existing study [53], the authors used Kernel Twin SVMs (KTSVMs) to implement Domain Adaptation (DA) to match the distributions of training data from different projects. KTSVMs with a DA function (called DA-KTSVM) are also used for CPDP. Further, the parameters are optimized using the Quantum Particle Swarm Optimization algorithm (QPSO), and the optimized DA-KTSVM is called DA-KTSVMO. The experiment was conducted on 17 open-source software projects. It was concluded that DA-KTSVMO better utilized the available data knowledge and easily reused defect data to improve prediction performance.
The authors of [54] used fuzzy c-means clustering to estimate the similarity between source and target features. The experiment was conducted on six projects from five heterogeneous datasets and concluded that the Quantum Crow Search Optimized Intuitionistic Fuzzy C Means Clustering (QCSO-IFCMC) achieved higher accuracy while avoiding local averaging in comparison to existing clustering models. Further, the authors of [55] conducted a study of CPDP using two-phase feature importance amplification. The authors of [56] conducted CPDP through Hybrid Feature Selection (HFS), where the strengths of RF and recursive feature elimination methods are used to select relevant features. The experimental results showed a 78% average accuracy across all prediction models using HFS for CPDP.
The authors of [57] developed a novel feature selection method based on GA with two stages: feature selection and EL. The feature selection stage selects features using the integrated training results of candidate feature subsets on the training set to obtain the optimal set; in the ensemble training phase, the EasyEnsemble method with multiple NB classifiers is used to alleviate the class imbalance problem. The proposed GA-based algorithm improves the average F1-score value by 38.9%, 31.6%, 35.1%, 22.0%, and 31.6%. Features are also selected optimally through a search-based optimizer for CPDP [58] by integrating the Artificial Neural Network (ANN) filter, KNN filter, Random Forest Ensemble (RFE) model, GA, and classifiers as manipulative independent variables. The authors of [59] analyzed the predictive capability of the firefly algorithm for selecting a minimal number of metrics and providing them as input to SVM classifiers. The fitness function of the firefly algorithm was modified to maximize the performance in terms of accuracy while minimizing the number of metrics. Furthermore, the Hybrid Firefly (HFF) algorithm, or Weighted FCM Firefly Search (WFCMFF) approach, is proposed to find a better set of metrics to further improve the performance of defect prediction: the FF algorithm and the Stochastic Weighted FCM Search (SWFCMS) algorithm are combined to select the better set of metrics. The proposed model improved the performance from 86.27% to 93.26%.
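As a hedged sketch of the kind of fitness function described above (the exact modification in [59] is not reproduced here, and the weighting alpha is an illustrative assumption), the fitness of a candidate metric subset can reward SVM accuracy while penalizing the number of selected metrics:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

def fitness(metric_mask, alpha=0.9):
    # metric_mask: boolean vector marking which software metrics are selected.
    if not metric_mask.any():
        return 0.0
    acc = cross_val_score(SVC(), X[:, metric_mask], y, cv=5).mean()
    selected_ratio = metric_mask.sum() / metric_mask.size
    return alpha * acc + (1 - alpha) * (1 - selected_ratio)  # high accuracy, few metrics

mask = np.zeros(20, dtype=bool)
mask[:5] = True
print("fitness of a 5-metric subset:", round(fitness(mask), 3))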

Reference Study | Main Findings | Future Work

[21]
Main Findings: Highlighted the importance of leveraging auxiliary data to enhance recommendation accuracy and efficiency. It integrated parameter transfer for feature matching and adversarial adaptation, among the TL methods explored in collaborative recommendation. The authors concluded that TL-CRAD represents a promising research direction with the potential to significantly advance the field of recommendation systems.
Future Work: Develop more heterogeneous knowledge transfer algorithms and strategies for better efficiency and effectiveness of TL models. Develop unified frameworks for integrating heterogeneous auxiliary data and scalable solutions to handle large and diverse datasets. Design multi-objective recommendation models. Integrating the above suggestions helps TL-CRAD advance recommendation systems in the era of big data and artificial intelligence.

[22]
Main Findings: The current state of smart home development was explored, and opportunities for enhancing the adaptation process were identified. The authors concluded that greater emphasis on user feedback early in the design phase improves the usability and acceptance of smart home systems and better caters to user needs and preferences.
Future Work: Investigating more advanced user engagement strategies, such as participatory design approaches, could lead to more user-centric smart home solutions. Exploring novel TL methodologies tailored specifically for smart home applications could enhance adaptation performance in diverse real-world scenarios. Furthermore, emerging technologies can be integrated with smart home IoT-based appliances and artificial intelligence based devices.

[23]
Main Findings: Highlighted the need for exploration of cross-domain TL in aspect extraction and addressed challenges associated with negative transfer in TL for text analysis. Additionally, aspect-level sentiment analysis focusing on short texts with semantic richness is identified as a more prominent domain in the future.
Future Work: Explore more cross-domain TL in aspect extraction to leverage its full potential. Address the effect of negative transfer problems in TL for text analysis.

[25]
Main Findings: Examined various image style transfer techniques, including both non-DL and DL approaches. DL methods, inspired by these algorithms, have seen rapid development, with a focus on enhancing transfer speed and quality. However, the subjective evaluation of experimental results remains prevalent, highlighting the need for a standardized evaluation metric for image transfer quality in future research.
Future Work: Advancements in DL methods to enhance transfer speed and quality, with a focus on integrating high-level semantic information into the transfer process.

[27]
Main Findings: Investigated the potentialities and challenges of DTL and Edge Computing in mitigating the COVID-19 pandemic.
Future Work: Collect more peer-reviewed studies and experimental results to enhance the understanding of these technologies' effectiveness in mitigating pandemics like COVID-19. Collaboration between scientific communities worldwide will be crucial in developing comprehensive strategies to combat future pandemics effectively.

[22]
Main Findings: Demonstrated the application of DTL techniques for the morphological classification of star clusters using HST images. The approach achieved better classification accuracy and robust performance across various training dataset scenarios and image sizes. Presented an advancement in the field of PHANGS-HST and provided a proof-of-concept for the automation of star cluster classification using DTL.
Future Work: Focus on training models with larger datasets, including classifications from multiple galaxies. Standardized datasets of human-labelled star cluster classifications must be developed for future network training. The main objective of future work is to advance the field of star cluster evolution by leveraging DL techniques. The study provided a way to explore the application of DTL to large-dataset classification, including the new PHANGS-HST data.

[30]
Main Findings: Discussed TL techniques in natural language processing, focusing on key classification algorithms such as BERT, GPT, ELMo, RoBERTa, and ALBERT. An in-depth analysis of their core principles, methodologies, and performance benchmarks is provided. Highlighted the diverse strategies utilized to leverage pre-trained models for efficient learning.
Future Work: Explore novel algorithmic enhancements and architectures for improvements in model performance and efficiency. Investigate the adaptation of TL techniques for specific natural language processing tasks and domains, and explore the integration of multimodal data sources for more robust learning.

[31]
Main Findings: Focused on TL settings and types of TL. The authors observed that existing studies mostly used inductive and transductive settings, along with interest in unsupervised TL. The issue of negative transfer is addressed, which requires the development of methods to ensure transferability between source and target domains or tasks while avoiding detrimental effects.
Future Work: Focus more on heterogeneous TL, including transferring knowledge across different domains and tasks via various feature spaces. TL can be further applied to various domains such as video classification, social network analysis, and logical inference.

[32]
Main Findings: The authors conducted a comprehensive and detailed review of DTL techniques from various perspectives including intelligent vehicle perception, covering perception tasks, benchmark datasets, and domain distribution discrepancies.
Future Work: Focus on improving sensor robustness, develop advanced DTL techniques, and conduct experiments with high-quality benchmark datasets in complex driving scenarios using more international-level hardware and software packages to increase effectiveness. Increase collaboration among various companies and research organizations.

[33]
Main Findings: Analyzed the effectiveness of TL for renewable energy systems. TL was investigated in renewable energy systems by focusing on forecasting energy consumption, predicting cross-building energy consumption, and fault diagnosis. Highlighted the growth of TL for renewable energy systems. It is identified that TL models rely more on feature and parameter transfer.
Future Work: Explore TL across different model types, and incorporate heterogeneous data. More efforts are required to validate TL-based methods across diverse weather and topographical features in geographical areas.

[34]
Main Findings: Surveyed the effectiveness of cross-domain TL methods in machinery diagnostics and prognostics. TL methods are categorized based on different aspects. Emphasized the importance of fault diagnosis and its application in useful life prediction. Additional information on open-source machinery datasets is included to enhance researcher and practitioner interest in this domain.
Future Work: Focus on using more advanced TL architectures, DL methods, and novel techniques; regression tasks such as useful life prediction deserve exploration. Explore TL for system health monitoring, domain shift, and data heterogeneity in the healthcare system.

[35]
Main Findings: Analyzed the application of TL in the digital health sector. TL methods and Federated Learning (FL) together contributed to achieving high prediction accuracy. Along with TL, privacy and security concerns around medical data are also addressed.
Future Work: Focus more on using TL for large datasets with more advanced technologies, for real-time data adaptation and edge device deployment. Design solutions for dealing with DA challenges and ensuring model interpretability. In addition, for equitable healthcare, it is important to mitigate bias in TL models. Such advancements will help the healthcare sector grow in terms of patient care and timely, effective treatment.

[36]
Main Findings: A novel approach termed ENEL was proposed for HCPDP, with the main objective of mitigating negative transfer during TL. Cost-sensitive learning was analyzed considering the imbalanced dataset. Model performance was evaluated on 16 datasets from 4 projects using PD, F1-score, G-mean, and AUC. It is concluded that the proposed ENEL approach worked better than the existing HCPDP and WPDP methods in the literature.
Future Work: Future research will focus on further validating the effectiveness of the proposed approach by employing additional software defect datasets and HCPDP baselines for comprehensive evaluation. Furthermore, alternative class imbalance handling methods will be explored to enhance performance. Continued experimentation and refinement contribute to the advancement of HCPDP methodologies.

[37]
Main Findings: Kernel Principal Component Analysis (KPCA) was developed for CPDP to mitigate the impact of negative transfer. Subsequently, Double Marginalized Denoising Autoencoders (DMDA) are used for global and local feature representations, enhancing transferability across software projects and class discriminability. The combination of KPCA and DMDA reduces data distribution differences. The imbalanced dataset and overlap challenge is addressed through the K-means Clustering Cleaning Ensemble (KCCE). Experimental results on Linux, MySQL, and NetBSD datasets demonstrate robust performance.
Future Work: Further validation of the proposed method's effectiveness will be pursued through the collection and analysis of additional datasets. Additionally, the exploration of alternative techniques and enhancements to address evolving challenges in CPBA prediction remains an area of future research.

[38]
Main Findings: MHCPDP was proposed to address low feature correlation. The autoencoder algorithm is updated for the selection of specified instances, reducing the difference between the various projects used for training and testing. Furthermore, a multi-source TL algorithm was introduced to mitigate negative transfer effects and utilize multiple source project datasets for enhanced training.
Future Work: Develop a solution for addressing class imbalance issues with TL. Further investigation is required to deal with imbalanced datasets.

[40]
Main Findings: The impact of automated parameter optimization was observed by conducting experiments across different projects using TL. The usage of automated parameter optimization enhanced performance by 77%. Parameter optimization is one of the best techniques for improving CPDP performance.
Future Work: The insights of this research provide valuable guidance for future work in the CPDP field. In the future, developers must focus on the development of an advanced optimizer for CPDP, focusing on parameter TL.

[41]
Main Findings: The performance of software defect prediction models is improved by comparing training and testing software metrics and computing weights for training software modules. Integrating the C4.5 classifier with weighted training data provides better results compared to Naive Bayes (NB), with average PD, F-measure, and AUC values of 0.81, 0.41, and 0.8 respectively.
Future Work: In the future, focus on enhancing the TL-based approach proposed in this study. Investigating the applicability of other ML algorithms and cost-sensitive techniques could result in an even more accurate prediction model.

[42]
Main Findings: A CPDP method is proposed that combines feature transfer and EL to address feature distribution variability and class imbalance. Three experiments were conducted on five projects from the AEEEM dataset, and the effectiveness of the proposed method was analyzed. The method yields significant improvements in the feature transfer and classification stages compared to classical CPDP methods.
Future Work: Parameter settings must be optimized for the optimization of TL algorithms. EL methods must be expanded to include base classifiers such as SVM and linear regression. Furthermore, the performance of EL methods such as AdaBoost and Bagging must be analyzed for CPDP.

[43]
Main Findings: A novel approach named WCM-WTrA was proposed for CPDP, integrating both feature selection and distance-weight instance transfer. Further, an enhanced version of WCM-WTrA was proposed considering multiple sources for defect prediction. Distance-weight instance transfer and feature selection play an important role in CPDP.
Future Work: Explore adaptive thresholds and more sophisticated feature importance calculations in the feature selection stage. Analyze the performance of the proposed algorithm on larger datasets.

[44]
Main Findings: TSboostDF, a novel TL algorithm, was developed for CPDP, integrating a sampling method with the TL technique. TSboostDF demonstrates superior performance compared to traditional CPDP. The success of TSboostDF highlights the potential of leveraging multi-source TL for further enhancing CPDP solutions.
Future Work: Investigate the effect of multi-source TL with TSboostDF on CPDP. By integrating knowledge from multiple source projects, classifiers can potentially achieve even better performance on CPDP tasks.

[45]
Main Findings: Researchers used MSCPDP to develop the MSCPDP Lab, which offers researchers and practitioners a comprehensive toolbox in the MATLAB environment. MSCPDP is helpful for advancement in both academia and industry for software defect prediction applications.
Future Work: Future research can be conducted on maintaining the proposed toolbox or creating a more efficient method.

[46]
Main Findings: A novel method based on a weighting framework using TL was proposed for CPDP. The proposed approach, with source selection, instance reweighting, and multi-source data utilization, trains the prediction model effectively. Experimental results showed the performance of 3SW-MSTL; the overall performance of the proposed approach is evident.
Future Work: In the future, focus more on validating the generalizability of 3SW-MSTL on additional datasets, particularly real-world datasets, to assess its applicability in diverse scenarios. Efforts are required to understand and mitigate the higher PF value observed for 3SW-MSTL compared to the WPDP method.

[47]
Main Findings: A cross-project bug-type prediction framework is introduced based on the TrAdaBoost TL method. The proposed framework is used to predict bug types in projects with limited labeled data using bug reports from another project, and was tested on four projects. The investigation identified influential prediction factors such as the source-target project pair and the size of the source project data.
Future Work: In the future, various program analysis tools, including SVF for C programs, SUPA for C/C++ programs, and tools for Java and Python programs, will be used for the extraction of relevant source code metrics. The detailed insight helps in bug type prediction and improves the effectiveness of the proposed framework.

[48]
Main Findings: A novel method, BDA, was introduced for addressing the challenges of CPDP. BDA mainly reduces the difference between data distributions and the conditional space. According to the experiment, BDA outperformed 12 baseline methods by 23.8%, 12.5%, 11.5%, 4.7%, 34.2%, and 33.7% across six evaluation indicators.
Future Work: Focus on integrating and developing additional strategies to address the issue of imbalanced datasets in the BDA framework. Further, efforts will be made to determine the optimal value of the parameter λ for BDA across different benchmark datasets, ensuring robust and adaptable performance.

[49]
Main Findings: The proposed multi-objective defect prediction approach, employing a logistic regression model trained with the vNSGA-II genetic algorithm, offers a novel strategy for achieving a compromise between precision and recall in defect prediction. By producing a Pareto front of predictors, it enables software engineers to choose configurations that align with their specific needs, balancing code inspection cost and defect prediction. Overall, the approach shows promise for both CPDP and WPDP scenarios.
Future Work: Future research endeavors aim to explore various cost-effectiveness models beyond KLOC, incorporating alternative indicators reflecting the cost of analysis and testing. Additionally, the focus will shift towards measuring the effectiveness of defect prediction in terms of the number of defects covered, rather than the number of defect-prone classes. Lastly, investigations will be conducted to assess the feasibility of integrating the proposed multi-objective approach with existing local prediction methods, potentially enhancing overall prediction performance and flexibility.

[53]
Main Findings: Semi-Supervised CPDP (SSCPDP) utilizing DA has gained significant attention in recent years. In this study, KTSVMs with DA capability were introduced, optimized by Improved Quantum-behaved Particle Swarm Optimization (IQPSO). The proposed approach matches the data distribution across various projects. The evaluation of the proposed approach on 17 open-source software projects demonstrates the superiority of the proposed model over a benchmark.
Future Work: Explore more approaches and perform comparisons with existing approaches to understand their applicability and characteristics across different domains. Additionally, the study aims to explore and promote the application of diverse DA techniques in software engineering and other relevant fields.

[54]
Main Findings: The paper introduces a novel two-stage approach for high-classification CPDP combining evolutionary-algorithm-based feature selection and uncertainty-theory-based unsupervised modeling. Unlike conventional methods focusing solely on removing irrelevant features, this approach integrates quantum crow search optimization to balance local and global search for optimal feature subsets.
Future Work: Future research efforts can explore further enhancements and applications of the proposed QCSO-IFCM approach in HCPDP. One avenue is to investigate the scalability and robustness of the approach across diverse datasets and software development environments. Additionally, refining the QCSO algorithm and exploring alternative uncertainty-based modeling techniques may further improve the accuracy and efficiency of defect prediction.

[55]
Main Findings: A two-phase model named TFIA was introduced for addressing the imbalanced dataset and data distribution issues in CPDP by integrating DA and a correlation-based feature selection method. The performance of the proposed approach was analyzed on the AEEEM dataset, with validation using an RF classifier.
Future Work: Focus on validating TFIA performance by experimenting on diverse datasets such as the AEEEM dataset. The scalability and robustness of TFIA must be enhanced through various other methods in different domains. Explore parameter optimization for fine-tuning of models and automatic adaptation to varying application scenarios.

[56]
Main Findings: CPDP was explored by performing experimentation on 14 projects from the PROMISE repository. The usage of HFS, specifically RF and RFE, was examined and shown to achieve higher accuracy. Multi-class characteristics of different versions of the PROMISE dataset help in the selection of relevant features. Furthermore, a Convolutional Neural Network (CNN) was also explored for CPDP and resulted in higher prediction accuracy. The authors validated the results using the Wilcoxon test. It is shown that HFS techniques impact CPDP model performance for multi-class datasets.
Future Work: More advanced technologies, methods, and techniques must be explored for CPDP. The parameters of HFS must be optimized and refined to improve prediction accuracy.

[60]
Main Findings: Proposed a novel feature selection method for CPDP based on GA, named GAFS. The existing limitations of CPDP space distribution are overcome through GAFS. The GAFS approach demonstrates superior prediction performance in comparison to traditional methods. Further optimization of the genetic algorithm to reduce search time and refinement of defect prediction methods for more granular results and interpretability are identified as future research areas.
Future Work: Future research will focus on optimizing the GA to reduce search time while maintaining effectiveness. Additionally, efforts will be directed towards enhancing the performance of defect prediction methods by providing more fine-grained results and explanations for predictions.

[61]
Main Findings: The multi-class nature of the PROMISE CPDP repository was revealed through exploratory data analysis. The performance of defect prediction models is improved through the proposed feature selection methods and optimizers, particularly with the utilization of ckloc features. The ANN filter and RFE optimizer outperform existing approaches, showcasing the potential for enhancing defect prediction performance.
Future Work: There is a need to explore additional determinants for further CPDP methodologies and establish more reliable and accurate methods for the selection of testing data. Further investigation into the use of ckloc features and other promising feature selection methods contributes to improving the quality and performance of defect prediction models.

[59]
Main Findings: A novel approach, WFCMFF, was introduced for feature selection and demonstrates its effectiveness in improving defect prediction model performance using various classifiers. Using SVM with WFCMFF, enhanced classification accuracy, recall, precision, and F-measure values are observed. Moreover, the application of the HFF algorithm results in a significant accuracy improvement from 90.27% to 93.26%, attributed to feature sets generated through the FF algorithm. Feature selection techniques with a hybrid algorithm enhanced software quality and performance through improved DP accuracy.
Future Work: Explore further enhancements to feature selection techniques like WFCMFF. Investigate the applicability of improved WFCMFF in diverse domains beyond defect prediction. Explore the integration of hybrid algorithms with advanced ML models to achieve better prediction accuracy and robust models.

3 Problem statement

3.1 Context

The objective of this section is to explain the computation involved in completing this study. The study was conducted to analyze the usage of TL in the software engineering domain; good-quality software must be designed and developed using TL in the absence of a sufficient amount of training data. Consider two projects A and B of two different companies, Amazon (X) and Flipkart (Y). A prediction model (PMX) needs to be developed using the XA data, and that PMX must be reused to develop a prediction model for company Y. However, the data distributions of the two companies are not similar: DSX ≠ DSY. Thus, company Y is required to check the similarity between the domains of the X and Y data. Based on the domains and data distributions of X and Y, PMX will be used for designing PMY through YB using different types of TL such as feature transfer, instance transfer, relational knowledge transfer, and parameter transfer.
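A minimal sketch of this scenario under the parameter-transfer setting is shown below; the data for XA and YB is synthetic, and the warm-started linear model is an illustrative choice rather than a prescribed one:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(4)
XA, yA = rng.normal(size=(500, 6)), rng.integers(0, 2, 500)          # company X, project A
YB, yB = rng.normal(loc=0.4, size=(60, 6)), rng.integers(0, 2, 60)   # small sample from company Y, project B

PMX = SGDClassifier(loss="log_loss", random_state=0)
PMX.fit(XA, yA)          # source model trained on company X's data
PMY = PMX                # reuse PMX's learned parameters as the starting point for PMY
PMY.partial_fit(YB, yB)  # adapt to company Y using its limited labeled data
print("PMY accuracy on YB:", PMY.score(YB, yB))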
Domain: SE.

1 Research Questions: RQ = {RQ1, RQ2, RQ3, …, RQi}.
2 Total Studies: TS = {S1, S2, S3, …, Si}.
3 Search Strategy: SS = {SS1, SS2, SS3, SS4, …, SSi}.
4 Primary Studies: PS = {PS1, PS2, PS3, PS4, …, PSi}.
5 I/EC = {IC1, IC2, IC3, …, ICn, EC1, EC2, EC3, EC4, …, ECi}.
6 QAC = {QAC1, QAC2, QAC3, …, QACi}.
7 If Si ∈ ICn, select Si; else remove Si.
8 Compute QACj; if the QAC value for PSa is greater than 7.5, then select PSa, else reject PSa.
9 DS = {DS1, DS2, DS3, …, DSa}.
10 Extract findings FInta of RQp from PSa.

Outcome of Systematic Review = ∏ (i = 1 to n) RQi ∗ SSi ∗ ICi ∗ ECi ∗ QACi ∗ PSi ∗ DSi
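A small sketch of steps 7 and 8 above follows, with hypothetical study records: the inclusion criteria are applied first, and then only studies whose quality assessment (QAC) score exceeds the 7.5 threshold are retained as primary studies:

studies = [
    {"id": "S1", "meets_inclusion": True,  "qac_score": 8.5},
    {"id": "S2", "meets_inclusion": True,  "qac_score": 6.0},
    {"id": "S3", "meets_inclusion": False, "qac_score": 9.0},
]

included = [s for s in studies if s["meets_inclusion"]]           # step 7: inclusion criteria
primary_studies = [s for s in included if s["qac_score"] > 7.5]   # step 8: QAC threshold
print("selected primary studies:", [s["id"] for s in primary_studies])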

This SR is most useful for industry experts, software developers, and academicians who want to reuse existing models for the development of similar models using TL. In the software engineering domain, the availability of training data is decreasing every day. Because of this, data from other projects with characteristics that are similar to some extent is useful in the development of an efficient prediction model.

4 Review methodology

This section discusses the procedure followed in completing this SR, as provided by Kitchenham [62]. This procedure has three different stages in which this review has been carried out: review planning, review organization, and describing the results of the SR.
The procedure is illustrated in Fig. 1. Concerning the novelty of the Kitchenham methodology [62] in the software engineering domain, the following aspects are considered: relevance to the research questions, establishment of best practices, and adaptability. The SR methodology is based on the selection of primary studies according to the relevance of the research questions designed. A structural framework is provided for extracting relevant studies from the large set of studies and synthesizing results from the primary studies. In the software engineering domain, various studies have referred to the Kitchenham guidelines for conducting an SR using specified procedures of data extraction in a time-bound manner. The reliability and effectiveness of the Kitchenham methodology have been demonstrated in a large number of studies, which strengthens its utility in our review. The methodology of conducting an SR is not new, but the application of established guidelines and a standard protocol ensures the reliability and validity of our SR findings. Thus, the novelty of the methodology used lies in the applicability of the established methodology to address the research questions and conduct research in the future.

Fig. 1  Procedure to conduct systematic literature review [62]
Various steps are followed in the review protocol: identification of research questions, design of a search procedure, selection criteria for primary studies, quality evaluation criteria, steps for extraction of relevant data from key studies, and the data combination procedure. The review protocol construction is followed by the sequential steps that were carried out for conducting this review. The first step is designing scientific questions that provide a solution to the problems in the SR. The second step is the search procedure, which includes the identification of search words and sources on the web to identify the prime studies for this SR. The third step involves the identification of key studies based on the research questions designed in the first step; the inclusion and exclusion benchmarks for the primary studies are also included in this step. In the fourth step, we identify the quality assessment criteria by constructing a questionnaire for quality evaluation to study and evaluate the existing studies related to the topic of the SR. The fifth step includes the data extraction forms, which are used to gather relevant data for finding the answers to the questions designed, and in the sixth step, we design different methods for combining data. The first stage, planning the review, plays a key role in conducting the SR; it decreases the probability and risk of researcher bias. We have used various studies related to the topic of the SR to design the review rules. In the following sections, we discuss the research questions and the procedure to conduct and manage the review systematically.
The various electronic databases explored for the collection of primary studies are as follows:

ACM Digital Library (https://dl.acm.org) (23 studies collected)
Google Scholar (https://scholar.google.com/) (34 studies collected)
IEEE Xplore (https://ieeexplore.ieee.org) (37 studies collected)
ScienceDirect (https://www.sciencedirect.com) (14 studies collected)
SpringerLink (www.springer.com) (12 studies collected)
Wiley Online Library (www.wiley.com) (02 studies collected)

The flowchart for conducting the SR, with the accompanying mathematical expression, is presented in Fig. 2. According to [62], after identification of the need to conduct an SR, the studies are collected from various digital libraries: ACM (23), Google Scholar (34), IEEE (37), ScienceDirect (14), Wiley (2), and SpringerLink (12). Furthermore, the inclusion/exclusion criterion is designed for the extraction of relevant studies (100). In the next step, the quality assessment criterion is designed to analyze the quality of the studies selected in the previous step with respect to abstract, introduction, motivation, methodology used, experimental results, performance measures used, limitations, and conclusion (39). Thus, the SR is conducted with 39 studies.

Fig. 2  Flowchart of systematic review procedure with mathematical expression
Algorithm for Kitchenham SR Methodology:

1 Identify the need for an SR.
2 Define the Research Questions (RQs) that will be addressed by the SR.

Fig. 2  Flowchart of systematic review procedure with mathematical expression

3 Develop and execute the Search Strategy or Search String (SS).
4 Design the Inclusion/Exclusion Criterion (IEC) and the Quality Assessment Criterion (QAC).
5 Select Primary Studies (PS) based on the quality assessment criterion. PS refers to the
studies retained from the total studies after applying the quality assessment criterion
and the inclusion/exclusion criterion; PS is the total count of studies used for
conducting the SR.
6 Extract Relevant Data (RD) from the selected PS.
7 Synthesize the extracted data to address the RQs.
8 Interpret the findings (IFind) of the SR from step 7. IFind refers to the data
synthesized from the PS selected in step 5. After the selection of PS from TSi (the total
set of retrieved studies), the RQs are answered through the findings and data extracted
from the selected relevant studies in order to characterize the existing literature.

4.1 Formation of research question

The objective behind conducting the SR is to study, analyze, and evaluate actual documen-
tation of the studies using various techniques of ML, different TL techniques, and various
approaches that are used by these TL techniques. Table 1 presents six research questions
focused on the SR. We have analyzed these studies using different quality attributes that

Table 1  Research questions formation for this systematic review
RQ_# Research Questions Motivation

RQ1 Which quality attributes are used for TL? Determine quality attributes used
RQ2 Which kind of ML techniques are used for TL? Determine various classes of ML techniques that have been used for knowledge transfer
RQ3 What experimental settings have been used for TL? Identify the experimental setup in which the experiment was conducted
RQ3.1 Which datasets have been used for TL? Identify datasets used for TL
RQ3.2 Which independent variables have been used for TL? Identify the independent variables
RQ3.3 Which algorithms have been used for TL? Identify the efficient used TL algorithm
RQ3.4 What validation techniques have been used for TL? Identify the various validation methods that are used
RQ3.5 Which performance measure has been used for TL? To check the performance of ML techniques for TL
RQ3.6 What statistical test has been used for TL? Identify a statistical test that is reported to be appropriate for TL
RQ3.7 Which category of TL has been used? Identify the category of the TL method
RQ4 Which TL methods are found to be effective using ML techniques? To explore the effective TL technique using results provided by various evaluation measures
RQ5 What are the threats to validity for TL? Identify the types of threats to validity used
RQ6 What are the advantages & disadvantages of various TL techniques? Examine the information about TL techniques

have been used for TL (RQ1). We have analyzed the various ML techniques that are used for
TL in the different studies (RQ2). In the third RQ, we have studied the experimental
settings used in the primary studies; this RQ summarizes the datasets, independent
variables, algorithms, validation techniques, performance measures, and statistical tests
used for TL, as well as the type of knowledge transferred between source and target data
(RQ3). In RQ4, we have studied the TL algorithms that are found to be effective using ML
techniques. In RQ5, we have summarized the threats to the validity of TL. The sixth
research question identifies the strengths and weaknesses of the various TL techniques
used in the primary studies (RQ6); its answer guides software practitioners, researchers,
professionals, and industry experts toward the appropriate ML techniques for TL.

4.2 Search strategy and various criteria used for the selection of primary studies

We have selected the key studies using various search terms. These terms were formed by
combining similar and alternative terms with the 'OR' boolean operator and joining the
main search terms with the 'AND' boolean operator. Some of the search terms used to
identify the primary studies are presented below:
((“Transfer” OR “transfer learning” OR “transfer knowledge” OR “knowledge transfer” OR
“transfer of learning”) AND (“variables” OR “parameters”) AND (“machine learning” OR
“support vector machine” OR “neural network” OR “ensemble learning” OR “random forest”
OR “decision tree” OR “naive bayes” OR “CART” OR “bayesian network”) AND
(“cross-project” OR “cross-company”) AND (“defect” OR “change” OR “effort” OR
“maintenance” OR “software quality” OR “software quality improvise”) AND (“improved” OR
“better” OR “enhanced”) AND (“validation” OR “empirical” OR “design” OR “development”)
AND (“evolutionary” OR “search” OR “optimized” OR “heuristic” OR “particle swarm” OR
“harmony search” OR “simulated annealing” OR “bat search” OR “swarm intelligence” OR
“firefly search” OR “gravitational search” OR “inclined planes system” OR “bio-inspired”
OR “genetic algorithm” OR “grey wolf” OR “cuckoo search” OR “ant colony” OR “artificial
bee colony”) AND (“method” OR “technique” OR “algorithm” OR “variant” OR “model”) AND
(“dataset” OR “database”) AND (“cross-validation” OR “hold-out validation”) AND
(“statistically” OR “validated” OR “statistical” OR “statistical test” OR “paired test”
OR “wilcoxon” OR “ANOVA”)).
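For concreteness, the following is a minimal Python sketch (not the authors' actual tooling) of how such a boolean search string can be assembled from groups of synonyms; the term groups shown are an abbreviated, hypothetical subset of the full string above.

```python
# A minimal sketch (assumed term groups, abbreviated from the string above):
# synonyms within a group are joined with OR, and the groups are joined with AND.
term_groups = [
    ["Transfer", "transfer learning", "knowledge transfer"],
    ["machine learning", "support vector machine", "neural network"],
    ["cross-project", "cross-company"],
    ["defect", "change", "effort", "software quality"],
]

def build_query(groups):
    # Quote each term, OR the terms within a group, and AND the groups together.
    clauses = ["(" + " OR ".join('"{}"'.format(t) for t in g) + ")" for g in groups]
    return " AND ".join(clauses)

print(build_query(term_groups))
```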
The ML-related search terms were extracted from ML-based research publications. After
identifying the search terms, we selected the digital portals; some of these portals are
accessible only at the university. The electronic databases mentioned above were queried
with combinations of the search strings, terms, and words for the selection of the key
studies. We restricted the search to the period from December 1990 to March 2024, since
the development of ML techniques started in 1990. First, we chose the electronic
databases that needed to be explored and performed the search procedure for the
identification of primary studies. The second step is to identify the


relevant studies by accessing the full-text papers; this step applies the inclusion and
exclusion criteria discussed in the following section.
Empirical studies related to ML techniques for TL are also included in this SR. We have
identified 39 key studies for inclusion. The studies were selected based on the
inclusion/exclusion criterion, which is as follows:
Criteria to include studies:

• Experimental studies for TL using different ML techniques.
• Empirical studies relevant to the software engineering field.
• Empirical studies combining ML and non-ML techniques.

Criteria to exclude studies:

• Empirical studies that are not related to TL in software engineering.
• Empirical studies that do not describe an experimental investigation.
• Empirical studies that do not provide the results of ML techniques for TL.
• Review studies.
• Empirical studies that are not written in the English language.
• Conference studies whose extended versions by the same authors have been published in a
journal.
• Book chapters on TL.

We tested the inclusion and exclusion criteria mentioned above and reviewed the complete
paper whenever there was doubt about whether a study should be included or excluded. The
quality of the studies was then assessed based on their relevance to the research
questions, and the final studies were obtained by applying the quality evaluation
criteria described in the following section.

4.3 Quality evaluation criteria

In this section, the formation of the quality evaluation questionnaire is discussed. The
questionnaire is used to study the purpose and strength of the selected primary studies.
The quality assessment criteria were designed by considering the guidelines and
suggestions provided in the existing studies [63], and they were used to assign a
particular weight to every study. Table 2 presents the quality evaluation criteria. We
defined three response options for each question, based on whether the particular study
answers it: "yes" if the study answers the question, "no" if it does not, and "partly"
for partial answers. Every response is assigned a score of 1 (yes), 0.5 (partly), or 0
(no). The summation of the values assigned to the questions gives the final score for
each study; the maximum and minimum possible scores are 13 and 0, respectively.

4.3.1 Questions for quality evaluation

Each study is assessed on the quality questions by assigning a score (0, 0.5, or
1). The scores were categorized into classes: very high (10.1 ≤ score ≤ 13),
high (7.6 ≤ score ≤ 10), medium (5.1 ≤ score ≤ 7.5), low (2.6 ≤ score ≤ 5.0), and very low


Table 2  Quality evaluation criterion


Q# Quality questions Yes (1) Partly (0.5) No (0)

Q1 Is the stated aim of the research clear? 25 7 7
Q2 Are the definition and usage of quality attributes clear? 28 5 6
Q3 Is the explanation of the experimental setup clear? 29 8 2
Q4 Does it specify the independent variables used? 21 13 5
Q5 Is a proper data size used? 28 3 8
Q6 Are the ML techniques clearly defined? 27 7 5
Q7 Have the performance measures been clearly stated and used? 25 14 0
Q8 Are validation techniques used in the study? 24 10 5
Q9 Has a comparative analysis been conducted (ML vs. TL)? 27 8 4
Q10 Are the stated results, findings, and conclusions clear? 21 15 3
Q11 Does the study add some contribution to the literature? 28 7 4
Q12 Are the stated limitations or threats to validity clear? 26 8 5
Q13 Does the study use a repeatable research methodology? 30 3 6

(0 ≤ score ≤ 2.5). The highest and lowest scores that could be given to a study were 13
and 0. In the next sub-sections, we refer to the selected primary studies by an
identifier and analyze them based on the information they contain and the experiments
they performed. The scoring scheme used as the quality evaluation criterion can be read
from Table 2: the quality questions designed for analyzing each study are listed in
column 2, and each question receives a rating of Yes, Partly, or No with a score of 1,
0.5, or 0, respectively. Thus, the quality of each study is analyzed by answering the 13
quality questions. After the ratings are assigned, the quality score of each study is
calculated as the sum of the scores over the questions. The studies whose quality score
lies in the range of 7 to 13 were selected for the further data extraction process, in
order to answer each RQ, and are presented in Table 3.
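The scoring and selection procedure described above can be expressed as a short Python sketch; the per-question ratings below are hypothetical and only illustrate the computation, they are not taken from the actual assessment sheets.

```python
# A minimal sketch of the quality scoring: each of the 13 questions is rated
# 1 (yes), 0.5 (partly), or 0 (no); studies scoring between 7 and 13 are kept.
ratings = {
    "PS1": [1, 1, 1, 0.5, 1, 1, 1, 1, 1, 0.5, 1, 1, 0.5],   # hypothetical ratings
    "PSx": [0.5, 0, 0.5, 0, 1, 0.5, 0, 0.5, 0, 0.5, 0, 0, 0.5],
}

def quality_band(score):
    # Score bands defined in Section 4.3.1 (very high down to very low).
    if 10.1 <= score <= 13:  return "very high"
    if 7.6 <= score <= 10:   return "high"
    if 5.1 <= score <= 7.5:  return "medium"
    if 2.6 <= score <= 5.0:  return "low"
    return "very low"

for study, r in ratings.items():
    score = sum(r)                       # summation over the 13 questions
    selected = 7 <= score <= 13          # selection threshold used in this SR
    print(study, score, quality_band(score), "selected" if selected else "excluded")
```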

4.4 Extraction and Synthesis of data

A data extraction form was filled out to extract data from the primary studies. The form
is organized around the RQs, i.e., it identifies which primary studies answer which RQ.
In the data extraction form, we summarized the details of every study, such as the
author's name, the title of the primary study, publisher details, experimental settings,
dataset details, independent variables, validation techniques, and the ML techniques used
in the primary studies. We gathered these details about the primary studies during data
extraction, and in the data extraction card we recorded which RQs are answered by each
particular primary study. The results are stored in an Excel file and are used for the
subsequent data synthesis process.
After data extraction, the next step is data synthesis. The role of data synthesis is to
gather factual data and evidence from the selected primary studies; these facts and
figures are combined to answer the RQs. Some of the primary studies state identical or
comparable viewpoints, while others establish different findings through different
experiments. We have studied and analyzed both the quantitative and the qualitative data
in this review. The quantitative data consist of different values, such as the evaluation

Table 3  Primary studies


Primary Study # Reference No Quality score Primary Study # Reference No Quality score

PS1 [64] 11.5 PS21 [65] 10


PS2 [66] 12 PS22 [67] 12
PS3 [68] 10 PS23 [69] 7
PS4 [70] 7 PS24 [71] 11.5
PS5 [72] 9 PS25 [73] 11.5
PS6 [74] 11 PS26 [75] 11.5
PS7 [76] 11.5 PS27 [77] 13
PS8 [78] 11.5 PS28 [79] 8
PS9 [80] 12.5 PS29 [81] 10
PS10 [82] 9 PS30 [83] 11
PS11 [84] 9.5 PS31 [85] 10.5
PS12 [86] 9 PS32 [87] 9
PS13 [88] 10.5 PS33 [89] 10.5
PS14 [90] 11 PS34 [91] 7
PS15 [92] 8.5 PS35 [93] 8.5
PS16 [94] 12.5 PS36 [46] 9
PS17 [95] 8 PS37 [39] 11.5
PS18 [96] 10 PS38 [48] 13
PS19 [97] 7 PS39 [47] 10
PS20 [98] 7.5

measures like precision, recall, accuracy, AUC, F-measure, and error rate. The
qualitative data consist of the experimental setup, the different ML techniques, the
datasets used in the primary studies, the empirical validation methods, and the strengths
and weaknesses of the various TL techniques/algorithms. We have used tables, together
with pictorial representations such as line charts, boxplots, bar graphs, and pie graphs,
for presenting and discussing the results that answer the RQs.

5 Primary Studies

This section provides a summary of the selected primary studies. We selected 39 out of
122 studies; these are related to TL in the software engineering field, use ML
techniques, and apply appropriate validation techniques for TL. Some of the studies used
public, proprietary, or open-source datasets.

5.1 Source of publication

In Table 4, we have summarized the details of the publications. The primary studies are
published in the top journals and conferences that are presented in Table 5. Table 4 con-
tains the count of primary studies and percent of primary studies corresponding to journals
and conferences mentioned in the table. The conferences and journals having the highest
publications are the International Conference on Artificial Intelligence, International Con-
ference on ML, Neural Information Processing Systems, International Conference on Tools

Table 4  Summary of publications
Publication name (Transaction/Journal/Conference/Proceedings/Workshop/Symposium name)  Type (Conference/Journal/Transaction/Symposium)  # of Studies  Percent
NIPS Conference: Advances in Neural Information Processing Systems Conference 2 5.13
International Conference on Machine Learning Conference 4 10.26
AAAI Conference on Artificial Intelligence Conference 5 12.82
International Conference on Software Engineering (ICSE) Conference 1 2.56
Empirical Software Engineering Journal 1 2.56
International Journal of Pervasive Computing and Communications Journal 1 2.56
Asia–Pacific Symposium on Internetware Symposium 1 2.56
ESEC/FSE’15 Conference 1 2.56
International Conference on Reliability Systems Engineering (ICRSE) Conference 1 2.56
ASE 16 Conference 1 2.56
International Conference on Tools with Artificial Intelligence Conference 2 5.13
International Conference on Machine Learning and Applications Conference 1 2.56
ACM Transaction Intelligent System Technology Transaction 1 2.56
IEEE Access Journal 2 5.13
The Journal of Systems & Software Journal 1 2.56
Brazilian Conference on Intelligent Systems Conference 1 2.56
International Conference on Bioinformatics and Biomedicine (BIBM) Conference 1 2.56
Asia–Pacific Software Engineering Conference Conference 1 2.56
AIP Conference Proceedings Conference 1 2.56
IEEE Transactions on Software Engineering Transaction 2 5.13
Mitsubishi Electric Research Laboratories Conference 1 2.56
IEEE Conference on computer vision and pattern recognition Conference 1 2.56
Computing Research Repository (CoRR) Journal 1 2.56
Information and Software Technology Journal 3 7.69
Computer Science and Technology Journal 1 2.56


Software Quality Journal Journal 1 2.56

Table 5  Top publication venues with impact factor
Publication name (Transaction/Journal/Conference/Proceedings/Workshop/Symposium name)  Type (Conference/Journal/Transaction/Symposium)  # of Studies  Percent  Impact Factor

IEEE Transactions on Software Engineering Transaction 2 5.13 9.322


ACM Transaction Intelligent System Technology Transaction 1 2.56 5
IEEE Access Journal 2 5.13 3.9
Information and Software Technology Journal 3 7.69 3.862
Empirical Software Engineering Journal 1 2.56 3.762
The Journal of Systems & Software Journal 1 2.56 2.829
Computer Science and Technology Journal 1 2.56 1.9
Software Quality Journal Journal 1 2.56 1.9
AAAI Conference on Artificial Intelligence Conference 5 12.82 -
International Conference on Machine Learning Conference 4 10.26 -
NIPS Conference: Advances in Neural Information Processing Systems Conference 2 5.13 -
International Conference on Tools with Artificial Intelligence Conference 2 5.13 -

with Artificial Intelligence, and the Computing Research Repository (CoRR). We have
seen that most of the studies were presented at conferences: one-third of the primary
studies (13, i.e., 33%) appeared in journals, while two-thirds (26, i.e., 67%) were
published in conferences.

5.2 Publication year

In Fig. 3, we have presented the categorization of studies over the period from 1990 to
2024. The figure shows a continuous increase in the number of studies from 2015 onwards,
with notable increases in the years 2009, 2012, 2014, 2015, 2016, 2017, 2018, 2020, and
2024. One of the datasets common to most of the primary studies is the 20 Newsgroup
dataset, and accuracy is continuously used as a performance measure in most of the
primary studies. We have collected and compiled the complete data until March 2024.
The summary of publications according to type, count, and percentage is given in
Table 4. Most of the existing studies are published in A* international conferences.
These journals and conferences are influential, highly reputed, and recognized in the
field of software engineering. PS1, PS29, and PS30 are published in top transactions,
while PS9, PS13, PS14, PS21, PS22, PS23, PS34, PS36, PS37, PS38, and PS39 are published
in reputed journals such as the Software Quality Journal, Empirical Software Engineering,
and IEEE Access.
The existing studies also appear in journals with high impact factors from reputed
publishers, such as IEEE Transactions on Software Engineering and ACM Transactions on
Intelligent Systems and Technology, as presented in Table 5. The number of TL
publications has increased in subsequent years, as shown in Fig. 3.
However, it has been noticed that TL is mostly explored in the field of defect prediction
compared to change prediction, effort estimation, and maintainability prediction. TL
enables the identification of defects in future projects by reusing existing defect
prediction models. The feasibility of TL for estimating effort has also been examined,
and researchers are still experimenting toward a convincing study of effort estimation
using TL. The utilization of the 20 Newsgroup dataset increased from 2015. Most of the
studies in the years of increase are due to the use of ML techniques, or of different TL
algorithms that use ML techniques as a base learner, and many studies determined the
effectiveness of ML techniques for TL; this made the inclusion and exclusion decisions
straightforward. Many of the studies are published for change prediction in software engineering
Fig. 3  Yearwise studies published


Fig. 4  Distribution of primary studies according to the type (Conference/Journal)

Table 6  Software and hardware requirement

Hardware Requirement
S.No Hardware Type Specification
1 Processor 11th Gen Intel® Core™ i7-12700 2.10 GHz
2 RAM 16.0 GB
3 Storage 1 TB SSD
4 Display 27" diagonal, FHD (1920 × 1080)
Software Requirement
S.No Software Type Specification
1 System type 64-bit Operating System, x64-based processor
2 Operating System Windows 11 Pro
3 Microsoft Excel Excel 2016
4 Mendeley Mendeley Desktop 1.19.8
5 SPSS IBM SPSS Statistics 21

for identifying the change-proneness of the software using TL. It is also observed that
the studies mostly used SVM as the ML technique. Furthermore, the distribution of studies
according to conference and journal type is presented in Fig. 4.

6 Results and discussions

In this section, we have summarized the results obtained from the selected primary studies.
The specified hardware and software used for conducting this SR are provided below in
Table 6.

6.1 RQ1: Which quality attributes are used for TL?

In this section, we have discussed the quality attributes that are used by the various
studies, such as effectiveness, performance, reliability, effort, change, and defect
proneness. It has been observed that most of the studies focus on defect prediction.
Defect prediction plays an important role in software engineering before deploying
software at the end-user site: software developers and the software testing team must
ensure that the software is free from any kind of defect at the deployment stage.
Furthermore, developers need to anticipate defect proneness in future projects with


the help of existing project datasets, using TL for knowledge transfer between projects
with similar data distributions and similar tasks.
The most commonly used attribute out of all the quality attributes is performance. It
has been used in 14 (PS1, PS3, PS4, PS10, PS11, PS19, PS20, PS21, PS22, PS24, PS25,
PS26, PS27, PS34) studies. The authors have analyzed the performance of an algorithm
that has been developed or used in the study. The next frequently used attribute is effec-
tiveness, which has been used in 10 (PS2, PS5, PS6, PS7, PS8, PS12, PS14, PS16, PS32,
PS33) studies. The defect attributes have been used in 13 studies (PS9, PS15, PS17, PS18,
PS23, PS28, PS29, PS30, PS35, P36, P37, P38, P39) out of the selected studies. The defect
attribute is used to analyze the effect of TL on defect prediction in software engineering.
The authors have analyzed the effect of defect prediction using TL, whether it is predicted
or not. The effort attribute has been used in 2 studies (PS13, PS39) out of 39 studies that
have been used for this review. It has been checked that it is feasible to build transfer learn-
ers for effort estimation [88]. It has been observed that TL is effective for defect predic-
tion. TL can estimate the effort across time as well as space. Furthermore, in recent years,
authors have focused more on defects using WPDP and CPDP. The occurrence of any kind of
change can lead to defects in the current and subsequent versions of the software, so
developers are required to collect all the requirements from the user at the initial
stage. If any change is requested by the customer, its feasibility must be checked and
approved by the Change Control Board (CCB). The quality attributes identified in the
primary studies are presented in Fig. 5, and their descriptions are provided in Table 7.

6.2 RQ2: Which kind of ML techniques are used for TL?

The current section summarizes the details of the ML techniques that are used in the
selected primary studies. We have categorized the ML techniques into six categories:
SVM, DT, EL, Bayesian Learners (BL), K-NN, and miscellaneous. Table 8 summarizes the
number and percentage of studies that used each category. Most of the techniques belong
to the SVM, EL, DT, and BL categories, examined in 35.90%, 28.21%, 25.64%, and
Fig. 5  Type of quality attribute used by various studies


Table 7  Description of quality attributes


Quality attribute Description

Effectiveness This attribute represents the capability of providing the desired output.
Performance This attribute provides the system output by doing some work for a particular period.
Reliability This attribute is related to the characteristics which deal with the software potential to
maintain its performance level under certain conditions which are stated in a certain
period.
Effort This quality attribute tells the reasonable amount of time required in developing a
particular software (in terms of person-hours or money).
Defect Proneness This quality attribute is defined as an error made in the source code or the logic in
the source code that can lead to crashing or can produce imprecise/ unpredicted
outcomes.

25.64% of the studies, respectively. It has been observed that SVM is widely used with
TL in software engineering, appearing in 14 studies (PS2, PS5, PS6, PS9, PS10, PS11,
PS16, PS19, PS22, PS24, PS26, PS30, PS31, PS37) with variants such as LSVM, SVDD, KNND,
and MSVMs. Further, EL-category techniques are used increasingly, in 11 studies (PS16,
PS17, PS18, PS19, PS24, PS29, PS30, PS34, PS36, PS37, PS38), with RF, VAWBSVM, AdaBoost,
the SGD classifier, and the Gradient Boosting classifier. The DT category is used in 10
studies (PS1, PS14, PS15, PS21, PS22, PS24, PS29, PS34, PS37, PS38) with C4.5 and CART
variants. All the ML techniques mentioned in Table 8 are used for TL. Furthermore, the
distribution of studies in terms of percentage is presented in Fig. 6, and Fig. 7 (a, b,
c, and d) presents the distribution of studies according to ML category and type.

6.3 RQ3: What experimental settings have been used for TL?

This section identifies the datasets, independent variables, algorithms, validation tech-
niques, performance measures, and statistical tests used for TL in the selected primary
studies.

6.3.1 Which datasets have been used for TL?

There are various types of datasets used in the TL studies. Figure 8 presents the number
and percentage of studies that used each type of dataset. All datasets have a different
nature. Private datasets consist of data collected by other researchers or by other
agencies for evaluation or research purposes; such datasets are not distributed among
researchers, so results based on them cannot be verified or replicated. Public datasets
are freely available. It has therefore been concluded that more proprietary and academic
datasets should be used in future experimentation; heavily reused benchmark datasets do
not always provide very informative results. It is always advisable to use more
industry-oriented datasets, which help researchers understand and study the datasets in
more detail.
The various categories of used datasets are as follows:

• AEEEM dataset: This dataset was collected by D’Ambros. AEEEM is a commonly used
dataset concerning software defects. This dataset consists of various

Table 8  Category of machine learning techniques used in primary studies
Category of ML classifier  Type  Percentage of studies  No. of studies

SVM  TSVM: Transductive SVM, MSVMs: Multiclass SVM, LSVM: Linear SVM, SVDD: Support Vector Domain Data Description, KNND: k-Nearest Neighbor Data Description  35.90  14
DT  CART, C4.5  25.64  10
EL  RF, VAWBSVM: Value Aware Boosting with SVM, SGDClassifier, Gradient Boosting Classifier, AdaBoost Classifier  28.21  11
BL  WNBC: Weighted NB classifier, NB: Naive Bayes, BN: Bayesian Networks  25.64  10
K-NN  Nearest Neighbor  17.95  7
Miscellaneous  SR: Softmax Regression, LR: Linear Regression, MTL: Multitask Learning, ST: Self-training, Logistic Regression, GBM: Graph-based methods, MR: Manifold Regularization  12.82  5

Fig. 6  Distribution of machine learning techniques used in the primary studies

metrics such as change metrics, existing defects metrics, code metrics, the entropy of
changes metrics, and the entropy of source code metrics. It consists of 61 metrics and
5386 instances [65, 97, 99]. This dataset is used in 11% of the primary studies (PS11,
PS16, PS17, PS18, PS29, PS30, PS34, PS36, PS38).
• MAGIC Gamma Telescope dataset: The MAGIC Gamma dataset (abbreviated MAG) is sourced
from the UCI ML repository. It is a binary classification dataset with numerous
instances and numerical attributes. This dataset is used in 4% of the primary studies
(PS19, PS24).
• MovieLens dataset: This dataset was collected by GroupLens. It is a movie rating
dataset on a scale of 1 to 5, available on the MovieLens web site, where users rate
each movie at different time intervals. This dataset is used in 2% of the primary
studies (PS7, PS8).
• NASA dataset: This dataset is publicly available, stored in the NASA repository and
maintained by the NASA Metrics Data Program. Each dataset in the repository corresponds
to a particular NASA software system or subsystem and contains defect labels together
with source code metrics. The source code metrics capture length, understandability,
and complexity, which are associated with software quality. This dataset is used in 15%
of the primary studies (PS4, PS9, PS13, PS16, PS17, PS18, PS23, PS28, PS29, PS30, PS34,
PS36, PS38).

• ReLink dataset: This dataset contains defect information that has been manually
verified and corrected. ReLink contains 26 complexity metrics used for defect
prediction, with features such as time interval, bug owner, change committer, and text
similarity. It consists of a total of 26 features and 658 instances [100]. This dataset
is used in 9% of the primary studies (PS11, PS16, PS17, PS18, PS29, PS30, PS36, PS38).
• Reuters-21578: This dataset was collected by Carnegie Group, Inc. and Reuters, Ltd.
during the development of the CONSTRUE text categorization system. It is one of the
most commonly used datasets for text categorization, defined as a collection of
documents that appeared on the Reuters commercial newswire system. It has five top
divisions and many subdivisions. This dataset is used in 5% of the primary studies
(PS2, PS5, PS6, PS10).


Fig. 7  Division of sub-categories of ML techniques in (a) SVM (b) EL (c) BL (d) Miscellaneous


Fig. 8  Dataset used

• SOFTLAB: This dataset was collected by a Turkish software company and comprises three
datasets, AR3, AR4, and AR5, which contain controller software for home appliances: a
washing machine, a dishwasher, and a refrigerator, respectively. The SOFTLAB and NASA
datasets used were obtained from the PROMISE repository [99]. This dataset is used in 7%
of the primary studies (PS9, PS16, PS17, PS28, PS30, PS38).
• Synthetic dataset: Synthetic datasets are created artificially, for example in the
process of ensuring data protection and data privacy. This category is used in 2% of the
primary studies (PS26, PS31).
• PROMISE repository dataset: This dataset is freely accessible in the PROMISE
repository, which was created to promote the use of prediction models in software
engineering. It is used in 6% of the primary studies (PS15, PS23, PS28, PS35, PS37).
• UCI 20 Newsgroups dataset: This dataset was collected by Ken Lang. It is a collection
of 20,000 newsgroup reports, evenly divided among 20 different newsgroups. It is widely
used for experimenting with text applications of ML algorithms, document categorization,
and document clustering. This dataset is used in 9% of the primary studies (PS2, PS3,
PS6, PS10, PS18, PS20, PS21, PS22).
• Others: This category consists of other datasets, such as movie datasets, image
datasets like ImageNet, species datasets like iNaturalist, and real-world datasets like
Amazon product reviews. It is used in 29% of the primary studies (PS2, PS3, PS4, PS6,
PS7, PS8, PS10, PS11, PS12, PS13, PS14, PS16, PS17, PS19, PS20, PS21, PS24, PS25, PS26,
PS27, PS30, PS31, PS32, PS33, PS39).

6.3.2 Which independent variables have been used for TL?

In this section, we have discussed the different independent variables used in each
study, as presented in Table 9. Various independent variables have been used by the
selected primary studies, such as number of features, classes, Object-Oriented (OO)
metrics, Halstead metrics, and Chidamber & Kemerer (CK) metrics. It is observed that CK
metrics are the most commonly used.

Table 9  Independent variables used
Independent Variables Primary Studies Independent Variables Primary Studies

Information Measure Metric (IM Metric) PS1 Number of test samples PS20
Number of classes in email PS2 Performance metrics PS21
The vocabulary of words and a summary of documents PS3 Train Pivot Predictors PS27
Eigenvector PS6 Attributes PS9, PS23
Regularization parameters, Number of feature clusters k, Number of PS10 PS15, PS18, PS30, PS35
nearest neighbors

Defect prediction metrics PS11 Line of Code (LOC) PS30


Tradeoff parameter λ, Feature corruption probability p PS12 Static code metrics PS29
Number of instances, number of labels PS14 OO metrics PS11, PS16, PS17, PS18, PS29, PS30, PS34, PS36, PS38
Common metrics, Company-specific metrics PS16 Halstead Metrics PS4, PS9, PS13, PS16, PS17, PS18, PS23, PS28, PS29,
PS30, PS34, PS36, PS38
Number of instance classes PS17 Source code metrics PS15, PS23, PS28, PS35, PS37
Test target size, number of labeled instances, Degree of freedom, PS19 Quality Model for PS35
p-value OO Design (QMOOD)
metrics

6.3.3 Which algorithms have been used for TL?

This section discusses the various algorithms that have been used by the primary studies.
The algorithms depend on the type of target and training data. Two studies have performed
a comparison among five TL methods using ML techniques as base learners; Adaptation
Regularization TL (ARTL), Geodesic Flow Kernel (GFK), TCA, and TJM are among the TL
algorithms used in these two studies (Table 10).
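As a concrete illustration of one of the algorithms listed in Table 10, the following is a minimal Python sketch in the spirit of TrAdaBoost (used in PS39), written from the published idea of reverse boosting rather than from any study's code: source instances misclassified by the current learner are down-weighted, while misclassified target instances are up-weighted as in AdaBoost. The base learner and the number of rounds are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(Xs, ys, Xt, yt, n_rounds=10):
    # Stack source and target training data; remember the boundary.
    X = np.vstack([Xs, Xt]); y = np.concatenate([ys, yt])
    n_s = len(ys)
    w = np.ones(len(y))
    # Fixed down-weighting rate for misclassified source instances.
    beta_s = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_s) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        miss = h.predict(X) != y
        # Error is measured on the target portion only.
        eps = np.clip(p[n_s:][miss[n_s:]].sum() / p[n_s:].sum(), 1e-10, 0.499)
        beta_t = eps / (1.0 - eps)
        w[:n_s] *= beta_s ** miss[:n_s]            # shrink bad source instances
        w[n_s:] *= beta_t ** (-1.0 * miss[n_s:])   # boost hard target instances
        learners.append(h); betas.append(beta_t)
    # In the full algorithm, predictions come from a weighted vote over the
    # later half of the learners; that step is omitted here for brevity.
    return learners, betas
```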

6.3.4 What validation techniques have been used for TL?

This section describes the different validation techniques that the reviewed studies used
to validate their outcomes after applying a particular algorithm or experiment. The
techniques, namely K-fold cross-validation, Leave-One-Out Cross-Validation (LOOCV), and
hold-out validation, are presented in Table 11. The most commonly used validation
technique is K-fold cross-validation, employed in 15 of the selected primary studies
(PS1, PS2, PS6, PS11, PS12, PS13, PS14, PS15, PS21, PS22, PS23, PS24, PS29, PS30, PS31).
LOOCV is used in two studies (PS13, PS26), and hold-out validation is used in one study
(PS3). The count of studies using each validation technique is presented graphically in
Fig. 9.
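The three schemes can be contrasted in a few lines of scikit-learn; this is a generic sketch on a stand-in dataset, not the setup of any particular primary study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=120, random_state=0)  # stand-in dataset
clf = GaussianNB()

# K-fold cross-validation (K=10 is a typical choice in the reviewed studies).
kfold_scores = cross_val_score(clf, X, y,
                               cv=KFold(n_splits=10, shuffle=True, random_state=0))

# LOOCV: K equals N, the number of data points.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())

# Hold-out validation: a single fixed train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_score = clf.fit(X_tr, y_tr).score(X_te, y_te)

print(kfold_scores.mean(), loo_scores.mean(), holdout_score)
```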

6.3.5 Which performance measure has been used for TL?

There are various metrics or measures used to analyze the performance of the different
models developed using TL. The evaluation measures play an important role in comparing
and evaluating the models developed using the various TL and ML techniques. Table 12
lists the evaluation measures together with a theoretical description of each.
The count of studies for each evaluation metric is illustrated in Fig. 10. From
Fig. 10, it has been observed that accuracy is the most widely used evaluation metric
(PS2, PS5, PS10, PS12, PS14, PS18, PS19, PS20, PS21, PS22, PS24, PS25, PS27, PS28, PS29,
PS32, PS33), followed by Recall (PS9, PS11, PS16, PS18, PS21, PS27, PS28, PS29, PS35,
PS36, PS38, PS39), F-measure (PS9, PS11, PS15, PS16, PS17, PS18, PS28, PS29, PS37, PS38,
PS39), AUC (PS4, PS9, PS15, PS17, PS21, PS23, PS26, PS30, PS34, PS36, PS38), Precision
(PS11, PS14, PS18, PS28, PS29, PS39), FPR (PS9, PS16, PS18, PS29, PS35), and G-mean
(PS36, PS38). The rarely used performance metrics are grouped in the miscellaneous
category, including absolute residual, AUCEC, CLL, error rate, error mean, error median,
MAE, MRE, MER, MBRE, misclassification error, mean square error, RMSE, SA, and UAR (PS3,
PS4, PS6, PS8, PS13, PS14, PS16, PS25, PS31, PS32).

6.3.6 What statistical test has been used for TL?

This section describes the various statistical tests that have been used by the studies.
These tests indicate whether there is a significant difference between distributions.
Table 13 presents each statistical test, its description, and the identifiers of the
studies in which it is used. In Fig. 11, we have graphically represented the statistical tests

Table 10  Transfer learning algorithm used
Primary Studies TL algorithm Description

PS2 Task-clustering algorithm The study [68] used a task-clustering algorithm for text
classification. There exists a linear text classification
algorithm that uses the inner product of a test document vector
and a parameter vector. In the task-clustering algorithm, the
tasks are grouped via the nearest neighbor algorithm to
facilitate knowledge transfer. Different parameter functions are
used in this algorithm; a parameter function is obtained with the
help of the training data and then applied to the testing data.
PS4 Find a legal mapping for a source clause The study used the algorithm to find a mapping for source clause.
The authors have used the concept of TL in terms of transferring
mapping learned from source to target. There are two different types
of mapping. One is global mapping, and the other is local mapping.
In global mapping, mapping is established for each source predicate
to a target predicate and used for the entire source translation. The
other approach called local mapping, is to find the top mapping of
Multimedia Tools and Applications (2024) 83:87237–87298

each source clause individually.


PS5 Dimensionality Reduction algorithm (TL via Maximum Mean The study proposed a two-step dimensionality reduction
Discrepancy Embedding) algorithm, which reduces the dimensionality with the help of
TL. It is designed to enable effective TL, with the objective of
minimizing the distance between the data distributions of
different domains.
PS9, PS29 Transfer NB (TNB) The study given by [80] existing authors proposed a TNB algorithm.
This algorithm takes the set of labeled samples and unlabeled
samples as input.
PS10, PS19 Graph co-regularized Collective Matrix tri-Factorization (GCMF)/ The study [2] has proposed a GCMF algorithm for TL. It uses any
Graph Co-Regularization TL (GTL) algorithm prior knowledge if available, and prior knowledge includes links in
network mining.
PS12 Hybrid Heterogeneous TL (HHTL) The study given by existing authors [101] proposed an HHTL algo-
rithm. This algorithm transfers features across different source and
target domains.
PS19, PS20, PS22 Adaptation Regularization TL (ARTL) algorithm The study given by existing authors [98] used ARTL algorithm. This
algorithm performs instant adaptation of different domains and
classifier learning.
PS19, PS24, PS29, PS32, PS37 TCA algorithm The study given by existing authors [81, 102] used TCA algorithm.
This algorithm explores similar features between the training and
target data.
PS19, PS24 TJM algorithm The study used TJM algorithm. This algorithm is similar to ARTL
algorithm. The main aim of this algorithm is to decrease the mar-
ginal probabilities.
PS21 Feature Space Remapping (FSR) The study given by existing researchers [65] used FSR algorithm. It
is a heterogeneous TL algorithm. It transforms features among the
source and target data. It calculates meta-features and then computes
the similarity between them.
PS27 Weight—Structural Corresponding Learning (SCL) algorithm The study given by existing researchers [77] used Weighted SCL
algorithm. This algorithm finds out the important and unimportant
features among the source and target domains.
PS22 Weighted-resampling-based TL algorithm (TrResampling) The study given by existing researchers [67] proposed a TrResam-
pling algorithm. In this algorithm, several iterations are performed.
The main focus of this algorithm is to transfer weights assigned to
the instances. In each iteration, a new source training data set is cre-
ated. The labeled data in the target dataset is also combined with the
source training data set.
PS35 TL Oriented Minority Oversampling Technique based on Feature The study given by [93] has proposed a TOMOFWTNB algorithm.
Weighting TNB (TOMOFWTNB) This algorithm transfers the features among the source and target
data. The transferred features are selected based on their correlation
with the predictor/ output.

PS36 3SW-MSTL A novel method named 3SW-MSTL was developed for multi-source TL. In the
first stage, multiple source projects are selected for the target
project and treated as training projects. Next, KNN is applied to
obtain reweighted training instances by minimizing the difference
in marginal distributions between each selected source project
and the target project. Finally, based on the difference between
the conditional probability distributions of the selected source
projects and the target project, a multi-source data utilization
scheme is employed for prediction-model training.
PS38 BDA BDA considers both marginal and conditional distribution differences
between both source and target projects.
PS39 TrAdaBoost It is a supervised, instance-based domain adaptation algorithm,
mainly used for classification tasks. It is based on a
reverse-boosting concept.

Table 11  Description of validation techniques used


Validation technique Description

K-Fold Cross-Validation In this validation technique, the original data is randomly divided into K
identical-sized subsets of original data. Out of K subsets, a single subset acts
as verification data to perform testing, and the leftover K-1 subsets act as
training data.
LOOCV This validation technique is similar to K–fold cross-validation where K is
equivalent to N, the number of data points in the set. It means that the func-
tion approximator is trained for all the data except one point and that one
point is used for prediction.
Hold-out Validation This is a simple and commonly used validation technique. The dataset
is divided into two sets: one is used as the training set and the
other as the testing set. The training set is used to fit a function
in the function approximator, which then predicts the outcome values
for the testing set provided as its input.

Fig. 9  Validation techniques used

and the number of studies in which they are used. The tests used in the studies are
one-way ANOVA, ANOVA, the paired t-test, the Wilcoxon test, the Wilcoxon rank-sum test,
Tukey's Honest Significant Difference test, the Friedman test, the two-tailed t-test, and
the Kolmogorov–Smirnov test (K–S test or KS test); the Kruskal–Wallis H-test is rarely
used. From the observed data, it is concluded that Wilcoxon tests were used in the
majority of cases (PS10, PS11, PS13, PS23, PS34, PS36, PS37), as the Wilcoxon test is a
non-parametric test used to compare two related samples. Furthermore, the Friedman test
is used in several studies (PS22, PS36, PS38) to compare multiple treatments; its
limitation is that it requires a minimum of three treatments. If the result of the
Friedman test leads to accepting the alternative hypothesis, a post-hoc analysis must be
performed; pairwise comparisons of two techniques can then be made using the Nemenyi
test, the Wilcoxon signed-rank test, or the Bonferroni-Dunn test.
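The recommended workflow, an omnibus Friedman test followed by pairwise post-hoc comparisons, can be sketched with scipy as follows; the performance values here are randomly generated and purely hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical AUC values of three techniques over the same ten datasets.
rng = np.random.default_rng(0)
a = rng.uniform(0.60, 0.90, 10)
b = rng.uniform(0.55, 0.85, 10)
c = rng.uniform(0.50, 0.80, 10)

# Friedman test: requires at least three treatments measured on the same blocks.
stat, p = stats.friedmanchisquare(a, b, c)
print("Friedman:", stat, p)

# If the null hypothesis is rejected, follow with pairwise post-hoc tests,
# e.g. Wilcoxon signed-rank tests (with a multiple-comparison correction such
# as Bonferroni-Dunn, or a Nemenyi test from a dedicated post-hoc library).
if p < 0.05:
    print("Wilcoxon a vs b:", stats.wilcoxon(a, b))
```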


Table 12  Description of performance measure


Evaluation Measure Description

Accuracy Accuracy is defined as the ratio of the correctly classified
instances to the total number of instances.
(TP+TN)/(TP+TN+FP+FN)
Area under the curve (AUC) This performance metric tells us whether the model is capable
of distinguishing between different classes. This metric or
performance measure is used for binary classification. A model
having an AUC value of 0.0 depicts 100% incorrect predictions
made by that model, and if the AUC is 1.0, that indicates the
predictions made by the model are 100% right.
False Positive Rate (FPR) FPR is defined as the ratio of the actual negative instances
that are incorrectly classified as positive to the total number
of actual negative instances.
FP/(FP+TN)
F-measure F-measure is the weighted harmonic mean of precision and recall
(sensitivity). Its value depends on both precision and recall; if
either is low, the F-measure is low.
((a+1)*precision*recall)/(recall+a*precision)
Precision Precision is defined as the proportion of predicted positive
instances that are actually positive; it reflects the count of
correct positive predictions. A 100% precision value for a class
A indicates that every instance predicted as class A indeed
belongs to class A; it does not indicate anything about instances
of class A that were predicted incorrectly.
TP/(TP+FP)
Recall Recall is defined as the ratio of the actual positive instances
that are correctly classified to the total number of actual
positive instances. Recall is also known as the True Positive
Rate (TPR) or sensitivity. A recall value of 1.0 for a class
indicates that all instances associated with that class are
identified as part of that class.
TP/(TP+FN)
Matthews correlation coefficient (MCC) MCC is used to identify the correlation coefficient between the
actual and predicted binary classifications. A coefficient of +1
means perfect prediction, 0 indicates a random prediction, and -1
indicates complete disagreement between actual and predicted
observations.
(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
G-Mean G-mean measures the balance between classification perfor-
mances of both the majority and minority classes.
Other miscellaneous measures They include absolute residual, Area Under the Cost-Effective-
ness Curve (AUCEC), Conditional Log-Likelihood (CLL),
Error rate, Error mean, Error median, Mean Absolute Error
(MAE), Magnitude of Relative Error (MRE), Magnitude of
Error Relative to the estimate (MER), Mean Balanced Relative
Error (MBRE), Misclassification error, Mean square error, Root
Mean Square Error (RMSE), Standardized Accuracy (SA),
Unweighted Average Recall (UAR)


Fig. 10  Performance measure used

6.3.7 Which category of TL has been used?

This section describes the various categories of TL method used in the studies. There are
three different categories of TL methods, such as Transductive TL (TdTL), Inductive TL
(IdTL), and Unsupervised TL (UnTL). These categories can also be termed as TL settings
in which the TL algorithms have been performed. The TL categories and their settings are
illustrated in Table 14. These categories differentiate from each other based on the type of
source data, type of target data, source and target domain, and source and target task. It has
been observed that most of the studies employed feature TL and instance TL with IdTL.
Moreover, relational-knowledge and parameter transfer are also feasible with IdTL;
however, the existing studies mainly explored feature-representation and instance-based
transfer. Knowledge transfer is easiest through the features of different projects.
Feature transfer considers the features of the source and target domains, and a
correlation needs to be established between the features of both projects. Based on
feature similarity, either the features are used directly, or a feature-matching analyzer
is employed when there is a large dissimilarity between the features of the source and
target projects. Parameter transfer is used when knowledge is transferred through the
algorithm's parameter values: the target project sets its algorithm parameters according
to the source project, and hyperparameter optimization can also be employed. In
relational-knowledge transfer, a relationship is established within the source project
dataset, a prediction model is designed from it, and the same methodology is then used
for knowledge transfer on the target dataset. In instance transfer, knowledge is shared
based on the instances of the source project, and the dataset must be preprocessed before
instance TL is applied. Thus, based on the analysis of the existing literature, it has
been concluded that feature transfer is the most effective and efficient form of TL in
the software engineering domain.
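As a simple illustration of aligning feature representations across projects (a minimal sketch only; full feature-transfer methods such as TCA instead learn a shared latent space), each project's common metrics can be standardized by its own statistics before a model trained on the source is applied to the target:

```python
import numpy as np

def align_features(source, target):
    # Standardize each project by its own statistics so that the shared
    # metrics are on a comparable scale before cross-project training.
    def z(X):
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    return z(source), z(target)

# Stand-in metric matrices: rows are modules, columns are common metrics.
src = np.random.rand(100, 20)   # source-project metric values
tgt = np.random.rand(80, 20)    # target-project metric values
src_aligned, tgt_aligned = align_features(src, tgt)
```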
We have observed that TdTL (34.28%) has been widely used among all the categories.
IdTL (28.57%) has been used in only those studies that are related to multi-task learning
and self-taught learning. The last category UnTL has not been used in any of the studies.
Most of the studies considered labeled data in the source domain (SD), whereas in the
case of UnTL neither source domain (SD) labels nor target domain (TD) labels are
available. Four different approaches correspond to these TL settings: instance
Table 13  Description of statistical test used
Statistical test Studies Description

One-way ANOVA PS14, PS21 One—way analysis of variance technique is used for the comparison of mean of two or more than two
samples. This technique applies to numeric data only.
ANOVA PS19, PS24, Analysis of variance technique is used to check if there is a significant difference between the mean of two
or more then two groups. It checks dependency between factors with the help of the mean comparison of
different samples
Kruskal–Wallis H-test PS2, PS18 The Kruskal–Wallis H test is also called a one-way ANOVA on ranks. It is a rank-based
nonparametric test that can be used to determine whether there are statistically
significant differences between two or more groups of an independent variable on a
continuous or ordinal dependent variable. It is considered an extended version of the
Mann–Whitney U test for comparisons across more than two independent groups.
Paired t-test PS20, PS32 The paired t-test is also known as the paired sample t-test and dependent sample t-test. It is a statistical test
that is used to identify whether the mean difference between two sets of observations is zero.
Wilcoxon test PS10, PS11, PS13, PS23, The Wilcoxon test has four different variants and is a non-parametric test. One of the
PS34, PS36, PS37 variants is the Wilcoxon signed-rank test, which is used to compare two related samples,
matched samples, or repeated measurements on one sample, to analyze the difference
between their population mean ranks; it can also be used to identify whether two
dependent samples were selected from populations having a similar distribution.
Wilcoxon rank-sum test PS9, PS16, PS35 Wilcoxon rank-sum test also known as the Mann–Whitney–Wilcoxon, Mann–Whitney U test, or Wilcoxon–
Mann–Whitney test. Wilcoxon rank-sum test is a non-parametric test of no effect that is the value that is
randomly selected form one population sample will be either less than or greater than a value that is ran-
domly selected from another population sample. Non-parametric means it does not have any assumptions
of gaussian distributions (normal distribution). This test applies to independent samples.
Friedman Test PS22, PS36, PS38 It is a non-parametric test and it is an alternative measure, This test is used to test the difference across dif-
ferent groups when the target variable is of ordinal type.
Two-tailed T-test PS22 In the two-tailed test, the critical area of the distribution is two-sided, it tests whether a sample is greater
than or less than a certain range of values. Thus, it is used in null hypothesis testing.
KS test PS34, PS35 It is a non-parametric test which is used to test the equality of continuous or
discontinuous distributions.
Tukey’s HSD PS19, PS24 Tukey’s Honest Significant Difference test or Tukey’s HSD (honestly significant difference) test, also known
as the Tukey’s range test, Tukey’s test, and Tukey method, is a one-step process of several comparison and
statistical tests. This test can be applicable to unprocessed data or in combination with an ANOVA to find
out the means that are different from each other.
Bonferroni-Dunn test PS36 It is used to perform comparison among multiple pairs of mean (averages) among groups of data and is
mostly used after applying statistical test for mean comparison such as ANOVA.
Nemenyi test PS38 This test is used as a post-hoc analysis test like the Wilcoxon signed rank test followed by the Friedman test.
It is used to find out which groups are different. The hypothesis for Friedman test concerning Nemenyi
tests as follows:
• The null hypothesis ­(Ho): The mean value for each of the populations is equal
• The alternative hypothesis: ­(Ha): At least one population mean differs from the others
Kendall tau-b rank correlation coefficient PS16 It is used to find the strength and direction of association between two variables measured on an ordinal scale.
One-sided paired t-test PS17 In a one-tailed test, the critical area of the distribution is one-sided; it tests whether a sample is
greater than or less than a certain range of values, but not both.
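For readers who wish to apply the tests listed in Table 13, the following minimal sketch (our illustration, assuming SciPy is installed; the performance scores are randomly generated, not taken from the primary studies) shows how several of them can be run:

```python
# Sketch: running several of the statistical tests from Table 13 on
# hypothetical AUC scores of three techniques over the same ten datasets.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
auc_a = rng.uniform(0.60, 0.80, size=10)        # technique A
auc_b = auc_a + rng.normal(0.02, 0.01, size=10) # technique B, paired with A
auc_c = rng.uniform(0.55, 0.75, size=10)        # technique C, independent

# Paired t-test: is the mean paired difference between A and B zero?
print(stats.ttest_rel(auc_a, auc_b))

# Wilcoxon signed-rank test: non-parametric analogue for paired samples.
print(stats.wilcoxon(auc_a, auc_b))

# Wilcoxon rank-sum / Mann-Whitney U test: independent samples A vs. C.
print(stats.mannwhitneyu(auc_a, auc_c, alternative="two-sided"))

# Kruskal-Wallis: extension of the rank-sum test to more than two groups.
print(stats.kruskal(auc_a, auc_b, auc_c))

# Friedman test: repeated measures across the three techniques.
print(stats.friedmanchisquare(auc_a, auc_b, auc_c))

# Two-sample Kolmogorov-Smirnov test: equality of two distributions.
print(stats.ks_2samp(auc_a, auc_c))
```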
Fig. 11  Statistical test used
transfer, feature-representation transfer, parameter transfer, and relational-knowledge transfer. For the InTL setting, instance transfer is mostly used, and for TnTL, feature-representation transfer is mostly used. The count of studies for each approach corresponding to the different TL settings is presented in Fig. 12. Also, if a huge number of features is available, then a suitable feature selection technique must be applied to select only the relevant features.
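As an illustration of this feature selection step (a sketch with randomly generated data, assuming scikit-learn is available; the choice of mutual information as the scoring function is our assumption, not a recommendation from the primary studies):

```python
# Sketch: keeping only the most informative features before transfer.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X = np.random.rand(200, 40)        # hypothetical software-metric matrix
y = np.random.randint(0, 2, 200)   # hypothetical defect labels

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)  # retain the 10 best-scoring features
print(X_reduced.shape)                    # (200, 10)
```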

6.4 RQ4: Which TL algorithm has been found to be effective using ML techniques?

This section discusses which TL algorithms are effective against various traditional learners. Traditional learners include ML techniques, which are compared with the algorithms proposed by different authors. These studies have used different datasets over which the comparisons have been made. We have observed the values of accuracy, AUC, Recall, and F-measure for analyzing the performance of TL algorithms, as these four metrics are the ones most used in the existing studies. For the comparative analysis concerning the existing studies, the combined dataset of results is collected and outliers are removed, since outliers lead to biased results corresponding to specified datasets. Thus, a boxplot is used to identify and remove these outliers. Figure 13 presents the distribution of studies concerning accuracy values, Figs. 14 and 15 the distributions concerning AUC values, Fig. 16 the distribution concerning Recall values, and Fig. 17 the distribution concerning F-measure values, each corresponding to all the majorly used datasets. The descriptive statistics of all the performance measures with respect to TL techniques are presented in Table 15, including the minimum, maximum, mean, median, and standard deviation values.
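The outlier-removal step can be illustrated with the standard boxplot (Tukey-fence) rule. Since the primary studies state only that a boxplot was used, the procedure below is an assumption-level sketch:

```python
# Sketch: IQR (boxplot-rule) outlier removal over pooled results, followed
# by the Table 15-style descriptive statistics.
import numpy as np

def remove_outliers_iqr(values: np.ndarray) -> np.ndarray:
    """Keep only values inside the fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values >= lower) & (values <= upper)]

pooled_auc = np.array([0.61, 0.64, 0.67, 0.70, 0.72, 0.15, 0.98])  # hypothetical
clean = remove_outliers_iqr(pooled_auc)
print(clean.min(), clean.max(), clean.mean(), np.median(clean), clean.std(ddof=1))
```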
In the study given by existing researchers [103], experiments have been performed with various TL and ML algorithms on different datasets. The TL algorithms used in the two studies are GTL, TCA, TJM, and GFK. These algorithms have been tested on five distortion profiles. It has been observed that the traditional ML algorithm RF performed best, while the GFK and TJM algorithms provided the worst results. The base classifier is the same for both of these algorithms; unlike the 1-NN classifier, the other base classifiers are robust to noisy datasets, which is why these two algorithms resulted in the worst performance.
Table 14  Transfer learning type and setting
TL Category | Relevant Field | Source Domain (SD) and Target Domain (TD) | Source Labels (SL) | Target Labels (TL) | Source Task (ST) and Target Task (TT)
TdTL | Domain Adaptation; Sample Selection Bias & Co-variate Shift | SD ≠ TD, different (related to some extent) | Present | Absent | ST = TT, similar
IdTL | Multi-task Learning | SD = TD, similar | Present | Present | ST ≠ TT, different (related to some extent)
IdTL | Self-taught Learning | SD = TD, similar | Absent | Present | ST ≠ TT, different (related to some extent)
UnTL | Not specified | SD ≠ TD, different (related to some extent) | Absent | Absent | ST ≠ TT, different (related to some extent)
Fig. 12  Type of transfer approach corresponding to different transfer learning settings
Fig. 13  Dataset-wise accuracy for TL techniques used

When SVM is used as a base classifier, the TCA algorithm results in the worst performance. Table 16 and Table 17 present the statistics of the performance measures obtained from the existing study. The performance of the ARTL algorithm proved to be the best in comparison with the other TL algorithms used in this study; ARTL attempts to resolve marginal and conditional distribution differences, which can be a reason for its best performance. The overall conclusion of the study stated that the TCA algorithm performs best out of all the TL algorithms used in this study for comparison, the TJM algorithm is second best after TCA, and the ARTL algorithm comes third after TJM; all of these algorithms perform best or worst on different distortions.
Fig. 14  Dataset-wise AUC for TL techniques used
Fig. 15  Dataset-wise AUC for ML techniques used
In the existing study [103], five different TL methods were compared on different datasets, using different statistical tests, against seven different base learners. The five TL algorithms used are GFK, JDA, TJM, TKL, and TCA. The seven base learners are RF, SVM, Discriminant Analysis, LR, 5NN, DT, and NB.
AUC values have been computed for four base learners corresponding to the MAG, USPS, CCC, and CV datasets. In the next step, accuracy has been calculated for all algorithms corresponding to seven different distortion profiles. In the third step, accuracy has been computed over the seven base learners corresponding to each TL algorithm. The best base learner for each TL algorithm has been individually investigated using Tukey's HSD test, and an HSD group has been assigned to every accuracy value.
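The grouping step can be reproduced with standard tooling. The sketch below (our illustration with hypothetical accuracy values, assuming statsmodels is installed) runs a pairwise Tukey HSD comparison of base learners:

```python
# Sketch: pairwise Tukey HSD over hypothetical accuracy values of three
# base learners evaluated for one TL algorithm.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

accuracies = np.array([71.6, 70.9, 72.1,   # SVM runs
                       66.0, 65.2, 66.8,   # NB runs
                       69.9, 70.4, 69.1])  # 5NN runs
learners = ["SVM"] * 3 + ["NB"] * 3 + ["5NN"] * 3

result = pairwise_tukeyhsd(endog=accuracies, groups=learners, alpha=0.05)
print(result)  # reports which pairs of base learners differ significantly
```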

6.5 RQ5: What are the threats to validity for TL?

This section discusses the threats to validity for TL based on the similarity between domains and the kind of data used in the source and target domains.
Fig. 16  Dataset-wise Recall values for TL techniques used
Fig. 17  Dataset-wise F-measure values for TL techniques used
Based on data distribution across different domains, [104] proposed a dimensionality reduction algorithm for effective TL. Different latent factors exist over the different domains. TL uses labeled data from a similar learning task and applies the available data for learning on the target data, which consists of both training and target test data. For TL, the source and target data have similarities, and there exists a relationship between them. It has been examined from all the primary studies considered for this review that the data must be specified in the target and training domains based on the algorithm.
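To make the notion of a distribution difference between source and target domains concrete, the following minimal sketch (our illustration, not code from [104]) computes an empirical Maximum Mean Discrepancy (MMD) with a linear kernel, the kind of quantity that dimensionality-reduction-based TL methods attempt to minimize:

```python
# Sketch: squared MMD between source and target samples, linear kernel.
import numpy as np

def linear_mmd2(source: np.ndarray, target: np.ndarray) -> float:
    """Squared distance between the domain means in feature space."""
    delta = source.mean(axis=0) - target.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(1)
Xs = rng.normal(0.0, 1.0, size=(100, 5))  # source-domain features
Xt = rng.normal(0.5, 1.0, size=(80, 5))   # shifted target-domain features
print(linear_mmd2(Xs, Xt))                # larger value = larger domain gap
```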


Table 15  Descriptive statistics of performance measures in the existing studies
Technique Performance Measure Minimum Maximum Mean Median Standard Deviation
Baseline (Logistic Regression) AUC​ 0.5796 0.778 0.6688 0.594 0.6688 ± 0.06426
Accuracy 57 63 59.333 60 59.333 ± 1.83356
F-measure 0.1 0.69 0.3538 0.315 0.3538 ± 0.14655
BDA AUC​ 0.651 0.704 0.6737 0.67 0.6737 ± 0.0191
AUC​ 0.703 0.733 0.7165 0.715 0.7165 ± 0.0132382
Recall 0.305 0.404 0.36175 0.369 0.36175 ± 0.036
CA AUC​ 0.665 0.79 0.73075 0.734 0.73075 ± 0.0451
CC AUC​ 1 6 2.698113 3 2.698113 ± 0.8813
F-measure 0.1686 0.5097 0.334 0.2655 0.334 ± 0.139
CCA + AUC​ 0.6732 0.869 0.814 0.8319 0.814 ± 0.0651
Recall 0.67 0.86 0.76 0.755 0.76 ± 0.07778
F-measure 0.55 0.84 0.712 0.73 0.712 ± 0.1042
CDT AUC​ 0.621 0.728 0.669 0.6635 0.669 ± 0.04063
Recall 0.267 0.375 0.3035 0.286 0.3035 ± 0.0424
CPDP-CM AUC​ 0.654 0.695 0.6628 0.654 0.6628 ± 0.01614
CPDP-IFS AUC​ 0.527 0.736 0.6419 0.665 0.6419 ± 0.06944
FeSCH AUC​ 0.685 0.801 0.72975 0.7165 0.72975 ± 0.04371
Recall 0.214 0.344 0.276 0.274 0.276 ± 0.0604
HDP AUC​ 0.5139 0.721 0.6326 0.63125 0.6326 ± 0.63125
HDP KS,0.05 AUC​ 0.5 0.59 0.5344 0.528 0.5344 ± 0.343
HISNN AUC​ 0.686 0.76 0.708 0.693 0.708 ± 0.3045
Recall 0.23 0.286 0.264 0.2705 0.264 ± 0.02
J48 AUC​ 0.6712 0.8462 0.8033 0.8268 0.8033 ± 0.0599
JDT AUC​ 0.674 0.752 0.7195 0.726 0.7195 ± 0.029
Recall 0.249 0.38 0.2995 0.2845 0.2995 ± 0.0547
KNN_FMT AUC​ 0.21 0.67 0.389 0.335 0.389 ± 0.1508
KNN_HDP AUC​ 0.504 0.763 0.644 0.6455 0.644 ± 0.06054
KNN_RM AUC​ 0.467 1.758 0.6812 0.643 0.6812 ± 0.2474
LR_FMT AUC​ 0.54 0.64 0.6498 0.6535 0.6498 ± 0.0542
LR_HDP AUC​ 0.43 0.794 0.65817 0.6655 0.65817 ± 0.08938
LR_RM AUC​ 0.414 1.826 0.7143 0.6725 0.7143 ± 0.6725
ManualUD AUC​ 1 1 1 1 1±0
Recall 0.222 0.451 0.345 0.3535 0.345 ± 0.095
MSMDA AUC​ 0.6511 0.7993 0.73126 0.739 0.73126 ± 0.04618
MTDP AUC​ 1 5 3 3 3 ± 1.414214
NB_FMT AUC​ 0.392 0.687 0.5433 0.5355 0.5433 ± 0.07829
NB_HDP AUC​ 0.514 0.812 0.6836 0.7 0.6836 ± 0.076
NB_RM AUC​ 0.422 1.816 0.699 0.649 0.699 ± 0.26057
NN- Recall 0.36 0.65 0.4925 0.48 0.4925 ± 0.1145
F-measure 0.34 0.59 0.45 0.425 0.45 ± 0.09233
NN Filter AUC​ 0.5745 0.7612 0.6508 0.644 0.6508 ± 0.05828
F-measure 0.1407 0.5372 0.3239 0.2695 0.3239 ± 0.14125
AUC​ 0.66662 0.8361 0.79553 0.8191 0.79553 ± 0.0584
Recall 0.85 1 0.908 0.875 0.908 ± 0.0656

Peter-Filter AUC​ 0.663 0.768 0.70425 0.693 0.70425 ± 0.0391


Recall 0.235 0.325 0.27 0.26 0.27 ± 0.036
RF AUC​ 0.5157 0.6564 0.6024 0.60605 0.6024 ± 0.4403627
TCA​ AUC​ 0.1 0.5 0.314 0.29 0.314 ± 0.10297
Recall 0.183 0.363 0.254 0.2355 0.254 ± 0.0665
F-measure 0.23 0.72 0.45 0.435 0.45 ± 0.1498
TCA + Accuracy 43 69 59.111 62 59.111 ± 7.4889
AUC​ 0.676 0.757 0.7235 0.7305 0.7235 ± 0.0314
Recall 0.38 0.47 0.425 0.425 0.425 ± 0.045
F-measure 0.23 0.72 0.454 0.43 0.454 ± 0.1484
F-measure 0.31 0.36 0.336 0.335 0.336 ± 0.025
TCANN AUC​ 0.15 0.38 0.2855 0.315 0.2855 ± 0.0794
F-measure 0.21 0.76 0.4515 0.396 0.4515 ± 0.1826
TNB Accuracy 45 61 56.111 58 56.111 ± 4.7
AUC​ 0.5981 0.7641 0.6392714 0.6171 0.6392714 ± 0.0528
Recall 0.83 1 0.901 0.875 0.901 ± 0.0719
F-measure 0.3 0.44 0.36 0.35 0.36 ± 0.0524
VAB-SVM AUC​ 0.23 0.62 0.4085 0.37 0.4085 ± 0.1263
F-measure 0.15 0.57 0.3234 0.335 0.3234 ± 0.1079
VCB Accuracy 44 65 56.8889 62 56.8889 ± 7.578
Yu-Filter AUC​ 0.563 0.764 0.66 0.6565 0.66 ± 0.075
Recall 0.198 0.333 0.2525 0.2375 0.2525 ± 0.0499

Table 16  Classification accuracy of base learners across various datasets
Distortion profile Algorithm/Test Classification Accuracy

CCI-60-C1 RF 79.59%
LCI-60-C0 RF 73.87%
FFB-1 RF 79.28%
CFB-2 ARTL 76.56%
DCB-80 RF 76.22%

6.6 RQ6: What are the advantages & disadvantages of various TL techniques?

This section discusses the advantages and disadvantages of the TL techniques studied by the researchers in the existing studies. Table 18 summarizes the advantages and disadvantages of TL techniques.


Table 17  Accuracy and HSD group ranking of best and worst base learners for TL algorithm used
TL Algorithm Best/Worst Base Learner Accuracy HSD Group Ranking
GFK algorithm Best SVM 71.60% A
Worst NB 62.70% F
Joint Distribution Adaptation (JDA) Best 5NN 69.90% A
Worst SVM 66.00% D
TJM Best SVM 71.10% A
Worst DT 65.60% E
Transfer Kernel Learning (TKL) Best SVM 73.10% A
Worst NB 66.20% E
TCA algorithm Best SVM 70.30% A
Worst NB 65.60% D

7 Limitations

This SR examined various studies and selected primary studies to analyze the existing TL algorithms that have been designed to perform different experiments. This SR also identifies the ML techniques that are used for TL. An exhaustive search was conducted to collect studies from all the digital libraries, and 39 primary studies were selected. Out of the 39 primary studies, only some are directly related to the software engineering field, which is one of the limitations of this review; thus, the effect of TL in software engineering is not conclusive. The primary studies considered for this review performed different experiments, and every study uses a different TL algorithm. Moreover, there is a threat that some relevant studies were excluded after applying the exclusion criteria; however, we assumed that the primary studies are non-biased and impartial, and if any such bias exists in our SR, it is a threat to the validity of the review. To conduct this SR, studies with ML and TL techniques were considered only with specified measures and validation methods, although more techniques could be explored, along with further datasets, performance measures, and validation methods. Moreover, the results in each primary study depend on the experimental setting used, such as the dataset, the variables, the feature selection techniques, the validation method, the type of projects, and the programming language. Thus, a threat to validity can occur, but statistical analysis of the results has been performed in this SR to mitigate it.

8 Applications

TL is a technique that is used to train a model on one task and reuse it for learning on another task by establishing a relationship between the data distributions of both tasks. The outcome of this SR is helpful for academicians, researchers, and industry experts to develop more reliable and robust software in the future. The authors tested the capability of these models on different versions of the same projects, termed inter-project validation. The performance of inter-project validation using TL is more efficient and effective in comparison to the cross-validation method.
Table 18  Advantages and Disadvantages of TL techniques


TL Technique Advantages and Disadvantages of TL techniques

Discriminability Based TL (DBT) It has been demonstrated that destination networks initialized via DBT learn much faster than networks initialized randomly; DBT indicates a considerable and important learning-speed improvement over randomly initialized networks. DBT is superior to literal transfer and to directly using the destination network on the destination task.
Task – clustering It provides an idea to achieve inductive transfer
in classifier design with the help of labeled data
from the related classification problems to solve a
particular classification problem.
TNB It improves prediction performance on datasets collected from various companies, i.e., cross-company data.
Graph co-regularized TL (GTL) The main focus of GTL is TdTL. In TdTL, the source domain has abundant labeled examples, while the destination domain consists of unlabeled examples only. GTL uncovers the latent features underlying the various domains as the bridge to transfer knowledge, simultaneously maximizing the empirical likelihood of all the domains and conserving the geometric structure in every domain.
TCA​ Advantages: TCA learns transfer components shared by the source and target domains such that, when the data is projected onto the subspace spanned by these components, the difference in the data distributions across domains is reduced while the important properties of the data are preserved. It is then beneficial to use traditional ML methods in this subspace to train classification and regression models across the domains. If two or more domains are associated with each other, there may exist several common components underlying them, through which the data across the domains can be related.
HHTL Advantages: It is useful for transferring knowledge across various feature spaces and concurrently rectifying the data error on the transformed feature space. The performance of HHTL is best and more stable when the size of the parallel data is increased. HHTL is effective and robust for cross-language sentiment classification.
Instance-based techniques Advantages: These techniques are used for handling
instances by removing the outliers, relevant filter-
ing, or weighting of instances.
Distribution-based techniques These techniques aim at managing the instance distri-
bution for training and testing sets with the help of
stratification, cost curves, and mixture models.

GA for Feature-Space Remapping (GAFSR) and Greedy Search for Feature-Space Remapping (GrFSR) Advantages: These techniques are informed, supervised learning techniques. The benefit of FSR is that it is applicable in both the informed and the uninformed case. The main advantage of GAFSR is that it achieves the best performance scores across all the metrics.
Disadvantages: GAFSR takes more time to execute in comparison to IFS; in IFS, the computation count is low, but the performance score is high.
Stacking Advantages: It is beneficial in terms of combining
the stacking with IFSR, and IFSR uses labeled data.
Disadvantage: To train ensemble classifiers, it needs
labeled data.
Canonical Correlation Analysis (CCA) Advantages: It is an effective TL method, used to match the distributions of the training and testing data of different companies. CCA with CCDP is effective for HCCDP. CCA acts as a powerful tool in multivariate data analysis to establish the correlation between two different sets of variables.
Feature-Space Remapping (FSR) Advantages: It can manage various feature spaces without using any co-occurrence data. This technique uses original raw data that has already been mapped onto a feature space, which is why it is described as a remapping. It requires a low amount of labeled data in the target domain; this labeled data is required to learn the relations to the training domain. It can increase the classification accuracy in the target domain by combining the relevant information from the training domain with the help of ensemble learners.
TNB Advantages: TNB performs better on the SOFTLAB dataset but not on the NASA dataset. TNB works for both within-company and cross-company prediction; the authors focused on cross-company defect prediction. TNB outperforms Naive Bayes in terms of performance measures such as F-measure and AUC over within-company and cross-company defect prediction.
Disadvantages: TNB is limited to a particular company dataset.
Ensemble technique Advantages: This technique performs better than a
trained classifier using a huge amount of labeled
data in the destination domain.
Voting Ensemble Advantages: It is defined as the simplest method for
combining multiple classifiers.
Bellwether Advantages: Bellwether can be used efficiently when the availability of historical data is limited or negligible; due to a lack of historical data, developers try to get data from other projects. It has been examined that, irrespective of the granularity of the data, there exists a bellwether dataset that can be used for training defect prediction models. The bellwether does not require complex data mining methods to discover, and it can be identified during the early phase of the project life cycle.

Moreover, reduced resource costs, shorter project completion times, and successful outcomes can be achieved with the usage of TL.
Thus, this SR proves the credibility of TL in providing insightful results for software engineering projects in case of unavailability of a sufficient amount of training data. This SR is useful in the real world from various aspects:
(1) It is useful to identify defects in the software and remove them at an early stage. It helps in reducing defects in future projects using TL in intra-version DP, inter-version DP, cross-company DP, WPDP, and CPDP. TL-based models leverage knowledge learned from large datasets and help in identifying patterns that indicate the occurrence of defects and the defect-prone parts of the software.
(2) TL can also be used for automatic code completion and code generation. Models pretrained on large code repositories can be fine-tuned for similar tasks, for software in a specified programming language, which helps in reducing the amount of effort spent on extensive training and in improving performance.
(3) TL helps in increasing the reusability of existing projects by reducing the amount of effort required for refactoring and maintaining code for future projects.
(4) TL is helpful in code summarization: models trained on a specified language can be fine-tuned for generating code summaries for code snippets and documentation manuals, including brief code summaries and documentation generation.
(5) TL helps in the classification and categorization of code by identifying the purpose and functionality of code, design patterns, and code smells.
(6) TL is also helpful in software project management by estimating project deadlines, the amount of time required to complete a project, and the resources required to accomplish specified projects based on existing project datasets using pre-trained models.
(7) Using cross-language ideas, TL can be used for translating code from one language to another via relational-knowledge TL, increasing the adaptability of legacy systems to new platforms.
(8) TL models are also helpful for software testing and software quality assurance, with test case generation, test case prioritization, estimation of the risk involved in the software process and product, and fault localization.
(9) TL models are also useful for security analysis and vulnerability detection; pre-trained models help to investigate code vulnerabilities such as SQL injection, buffer overflow vulnerabilities, and cross-site scripting to improve the security of software systems.
(10) TL models developed in the natural language processing domain are helpful in the software engineering field as well, for requirement analysis and automated generation of documentation manuals for the software.
Thus, TL helps the software engineering field from various aspects to improve the reusability of existing software through fine-tuned models, which in turn improve software quality attributes such as functionality, maintainability, testability, user-friendliness, and reliability.
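As a minimal sketch of the fine-tuning idea behind points (1) and (2) above (the checkpoint name, labels, and code snippet here are illustrative assumptions, not artifacts from the reviewed studies), a pretrained code encoder can be adapted to binary defect prediction with a standard sequence-classification head:

```python
# Sketch: one fine-tuning step of a pretrained code model for defect
# prediction, i.e., parameter transfer from a large source corpus to a
# small target dataset. Assumes torch and transformers are installed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "microsoft/codebert-base"  # one public pretrained code model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)           # labels: defect-prone vs. clean

snippets = ["int div(int a, int b) { return a / b; }"]  # hypothetical target data
labels = torch.tensor([1])                               # 1 = defect-prone

batch = tokenizer(snippets, return_tensors="pt", padding=True, truncation=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)  # forward pass computes the loss
out.loss.backward()                  # backpropagate on the target task
optimizer.step()                     # update the transferred parameters
```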

8.1 Economic analysis

In this section, the importance of assessing the economic impact of TL in software engineering is discussed. The evaluation of the value or wealth created by the usage of TL is a crucial aspect of its socio-economic implications in the software engineering domain. In the future, we will use an economic evaluation approach to quantify and analyze the economic impact of this SR at local and global scales. The economic analysis includes cost–benefit analysis, Return on Investment (ROI), and socio-economic impact analysis. The cost–benefit analysis will be done by analyzing the cost of implementing TL in industry through the reusability of existing code, including the costs of acquiring data, model development, and maintenance, against benefits such as development efficiency, product quality, and the reduced time-to-market of the software. Further, ROI will be computed by comparing the financial gains achieved through the implementation against the initial investment made. The ROI analysis will help business associates and software developers assess the profitability and viability of integrating TL techniques into the software engineering workflow. Furthermore, the socio-economic impact analysis will be accomplished by considering various aspects such as stimulating innovation, improving social welfare, and the employment of efficient software developers. This analysis will help provide a holistic perspective on the value-creation potential of TL within the software engineering ecosystem. Thus, a comprehensive analysis will provide valuable financial and socio-economic insights to researchers, academicians, and industry experts in the upcoming years.
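As a simple illustration of the ROI computation described above (the figures are hypothetical assumptions, not values drawn from any primary study): if adopting TL-based reuse requires an initial investment of 100,000 and yields financial gains of 150,000, then

ROI = (Gain - Cost) / Cost = (150,000 - 100,000) / 100,000 = 0.5 = 50%.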

9 Conclusion and future direction

In this paper, we have performed an SR of TL using ML techniques. We have studied and examined the various TL algorithms in the fields of artificial intelligence, ML, and software engineering. Firstly, we performed a deep analysis following a sequence of systematic steps and identified 39 primary studies during this period (1991–2024). Secondly, the quality attributes on which TL is focused are discussed. Thirdly, the characteristics or experimental settings of the primary studies have been discussed based on the dataset, independent variables, TL algorithms, validation techniques, performance measures, and statistical tests. Fourthly, we have analyzed the comparison of various TL techniques with traditional ML algorithms as base learners. In the end, the merits and demerits of TL techniques are summarized. The relevant outcomes obtained from the primary studies selected for this review are as follows:

• The quality attributes that have been used for TL are accuracy, effectiveness, performance, reliability, effort, and defect. The most commonly used quality attributes are performance and effectiveness, used in 32% and 23% of studies, respectively. No study has been conducted for change prediction using TL.
• The ML techniques were categorized into different classes such as SVM, EL, DT, BL, NB, and Miscellaneous. The most frequently used ML techniques for TL were SVM, RF, and NB.
• The most commonly used dataset for performing experiments in the literature is NASA.
• The independent variables that have been used by various studies do not exhibit any relationship with each other.
• The algorithms used by the selected studies differ, and these algorithms depend on the type of training and target dataset.
• The validation technique used in most of the primary studies is K-fold cross-validation. In K-fold cross-validation, the original dataset is used for both training and validation, and every sample is used for validation exactly once (a minimal sketch of this procedure is given after the future-direction list below).
• The performance measure used by most studies is accuracy, followed by AUC, F-measure, and recall.
• The TL categories used in the selected primary studies are IdTL and TdTL; UnTL has not been used in any study. The instance transfer setting has mostly been used in IdTL, and feature-representation transfer has mostly been used in TdTL.


• Regarding the characteristics of TL, IdTL is mostly used in the existing studies. The features and instances of different projects play an important role in transferring knowledge between two different projects. The performance of TL models was evaluated using the accuracy, adaptability, scalability, and generalization power of the predictive model. Thus, it has been analyzed that TL-based algorithms satisfy the above characteristics for the development of efficient software quality models for future projects.
• The comparisons made using SVM as a base learner provide the best classification accuracy in comparison to other base learners. The comparisons made with NB as a base learner provide the worst classification accuracy in most cases.
The following are some directions for research scholars, industry experts, and software developers to carry out future research on TL in software engineering:
• More experiments should be carried out to increase the number of studies for TL in software engineering and to demonstrate the benefits of TL in this domain.
• The experimental settings should be specified in each study: the datasets, independent variables, performance measures, and statistical tests used.
• More studies should be carried out for change prediction using different types of TL.
• To obtain more accurate, precise, and generalized results, more studies should be carried out on cross-project and cross-company prediction using TL.
• More studies on TL should be carried out using ML techniques.
• More studies should be carried out using different settings and approaches of TL. Each TL type should be explored corresponding to all the TL categories.
• It is observed that very few studies transfer knowledge using similar TL techniques or TL algorithms.
• The weaknesses and strengths should be noted for each of the TL techniques, which would be helpful in future studies.
• More studies should be conducted considering hybrid/evolutionary and swarm-based algorithms.
• More studies must be conducted to analyze the impact of feature selection techniques with TL.
• More studies must be conducted with bio-inspired algorithms for TL.
• The effectiveness of TL models must be analyzed with hyperparameter optimization.
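As a minimal illustration of the K-fold cross-validation procedure noted in the outcomes above (a sketch using scikit-learn and randomly generated data, not code from any primary study):

```python
# Sketch: 10-fold cross-validation; every sample is used for validation
# exactly once across the folds.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X = np.random.rand(100, 10)         # hypothetical software metrics
y = np.random.randint(0, 2, 100)    # hypothetical defect labels

scores = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = GaussianNB().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))
print(np.mean(scores))              # mean validation accuracy over folds
```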

Funding Not funded.

Data availability The details of the selected primary studies used as data in this article are specified in
Table 3.

Declarations
Ethical Approval and consent to participate This article does not contain any studies with human partici-
pants or animals performed by any of the authors.

Conflict of Interest The authors declare that they have no conflicts of interest.

Consent for Publication The authors provide consent for publication.


References
1. Joachims T (1999) Transductive inference for text classification using support vector machines. In
Icml 99:200–209. https://​dl.​acm.​org/​doi/​10.​5555/​645528.​657646
2. Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big data 3:1–40.
https://​doi.​org/​10.​1186/​s40537-​016-​0043-6
3. Zhao P, Liu Y, Lu Y, Xu B (2019) A sketch recognition method based on transfer deep learning
with the fusion of multi-granular sketches. Multimed Tools Appl 78:35179–35193. https://​doi.​org/​
10.​1007/​s11042-​019-​08216-6
4. Day O, Khoshgoftaar TM (2017) A survey on heterogeneous transfer learning. J Big Data 4:29.
https://​doi.​org/​10.​1186/​s40537-​017-​0089-0
5. Priyadarshini I, Sahu S, Kumar R (2023) A transfer learning approach for detecting offensive and
hate speech on social media platforms. Multimed Tools Appl 82:27473–27499. https://​doi.​org/​10.​
1007/​s11042-​023-​14481-3
6. Chen J, Sun J, Li Y, Hou C (2022) Object detection in remote sensing images based on deep trans-
fer learning. Multimed Tools Appl 81:12093–12109. https://​doi.​org/​10.​1007/​s11042-​021-​10833-z
7. Kang J, Gwak J (2022) Ensemble of multi-task deep convolutional neural networks using transfer
learning for fruit freshness classification. Multimed Tools Appl 81:22355–22377. https://​doi.​org/​
10.​1007/​s11042-​021-​11282-4
8. Varshney N, Bakariya B, Kushwaha AKS (2022) Human activity recognition using deep trans-
fer learning of cross position sensor based on vertical distribution of data. Multimed Tools Appl
81:22307–22322. https://​doi.​org/​10.​1007/​s11042-​021-​11131-4
9. Rashedi E, Nezamabadi-pour H, Saryazdi S (2009) GSA: A Gravitational Search Algorithm. Inf
Sci (Ny) 179:2232–2248. https://​doi.​org/​10.​1016/j.​ins.​2009.​03.​004
10. Ornek AH, Ceylan M (2022) Medical thermograms’ classification using deep transfer learn-
ing models and methods. Multimed Tools Appl 81:9367–9384. https://​doi.​org/​10.​1007/​
s11042-​021-​11852-6
11. Taylor ME, Stone P (2009) Transfer Learning for Reinforcement Learning Domains : A Survey. J
Mach Learn Res 10:1633–1685. https://​doi.​org/​10.​1145/​15770​69.​17558​39
12. Xu Q, Yang Q (2011) A Survey of Transfer and Multitask Learning in Bioinformatics. J Comput Sci
Eng 5:257–268. https://​doi.​org/​10.​5626/​jcse.​2011.5.​3.​257
13. Lu J, Behbood V, Hao P et al (2015) Transfer learning using computational intelligence: A survey.
Knowledge-Based Syst 80:14–23. https://​doi.​org/​10.​1016/j.​knosys.​2015.​01.​010
14. Ribani R, Marengoni M (2019) A Survey of Transfer Learning for Convolutional Neural Networks.
Proc - 32nd Conf Graph Patterns Images Tutorials. SIBGRAPI-T 2019:47–57. https://​doi.​org/​10.​
1109/​SIBGR​API-T.​2019.​00010
15. Cook D, Feuz KD, Krishnan NC (2013) Transfer Learning for Activity Recognition: A Survey.
Knowl Inf Syst 36:537–556
16. Mohammadi A, Zahiri SH (2018) Inclined planes system optimization algorithm for IIR system iden-
tification. Int J Mach Learn Cybern 9:541–558. https://​doi.​org/​10.​1007/​s13042-​016-​0588-x
17. Mohammadi A, Zahiri SH (2017) IIR model identification using a modified inclined planes system
optimization algorithm. Artif Intell Rev 48:237–259. https://​doi.​org/​10.​1007/​s10462-​016-​9500-z
18. Mohammadi A, Sheikholeslam F, Mirjalili S (2022) Inclined planes system optimization: the-
ory, literature review, and state-of-the-art versions for IIR system identification. Expert Syst Appl
200:117127. https://​doi.​org/​10.​1016/j.​eswa.​2022.​117127
19. Esfahrood SM, Mohammadi A, Zahiri SH (2019) A simplified and efficient version of inclined planes
system optimization algorithm. In: 2019 5th Conference on Knowledge Based Engineering and Inno-
vation (KBEI), pp 504–509. https://​doi.​org/​10.​1109/​KBEI.​2019.​87350​44
20. Mohammadi A, Sheikholeslam F, Mirjalili S (2023) Nature-inspired metaheuristic search algorithms
for optimizing benchmark problems: inclined planes system optimization to state-of-the-art methods.
Arch Comput Methods Eng 30(1):331–389. https://​doi.​org/​10.​1007/​s11831-​022-​09800-0
21. Pan W (2016) A survey of transfer learning for collaborative recommendation with auxiliary data.
Neurocomputing 177:447–453. https://​doi.​org/​10.​1016/j.​neucom.​2015.​11.​059
22. Ali SMM, Augusto JC, Windridge D (2019) A Survey of User-Centred Approaches for Smart Home
Transfer Learning and New User Home Automation Adaptation. Appl Artif Intell 33:747–774.
https://​doi.​org/​10.​1080/​08839​514.​2019.​16037​84
23. Liu R, Shi Y, Ji C, Jia M (2019) A Survey of Sentiment Analysis Based on Transfer Learning. IEEE
Access 7:85401–85412. https://​doi.​org/​10.​1109/​ACCESS.​2019.​29250​59
24. Liu Y, Li Z, Liu H, Kan Z (2020) Skill transfer learning for autonomous robots and human–robot
cooperation: A survey. Rob Auton Syst 128:103515. https://​doi.​org/​10.​1016/j.​robot.​2020.​103515


25. Zhao C (2020) A Survey on Image Style Transfer Approaches Using Deep Learning. J Phys Conf Ser 1453:012129. https://doi.org/10.1088/1742-6596/1453/1/012129
26. Niu S, Liu Y, Wang J, Song H (2020) A Decade Survey of Transfer Learning (2010–2020). IEEE
Trans Artif Intell 1:151–166. https://​doi.​org/​10.​1109/​TAI.​2021.​30546​09
27. Sufian A, Ghosh A, Sadiq AS, Smarandache F (2020) A Survey on Deep Transfer Learning to
Edge Computing for Mitigating the COVID-19 Pandemic: DTL-EC. J Syst Archit 108:101830.
https://​doi.​org/​10.​1016/j.​sysarc.​2020.​101830
28. Wei W, Huerta EA, Whitmore BC et al (2020) Deep transfer learning for star cluster classification:
I. application to the PHANGS-HST survey. Mon Not R Astron Soc 493:3178–3193. https://​doi.​
org/​10.​1093/​mnras/​staa3​25
29. Zhao W, Queralta JP, Westerlund T (2020) Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey. IEEE Symp Ser Comput Intell (SSCI) 2020:737–744. https://doi.org/10.1109/SSCI47803.2020.9308468
30. Dhyani B (2021) Transfer Learning in Natural Language Processing: A Survey. Math Stat Eng
Appl 70:303–311. https://​doi.​org/​10.​17762/​msea.​v70i1.​2312
31. Panigrahi S, Nanda A, Swarnkar T (2021) A Survey on Transfer Learning. Smart Innov Syst Tech-
nol 194:781–789. https://​doi.​org/​10.​1007/​978-​981-​15-​5971-6_​83
32. Liu X, Li J, Ma J et al (2023) Deep transfer learning for intelligent vehicle perception: A survey.
Green Energy Intell Transp 2:100125. https://​doi.​org/​10.​1016/j.​geits.​2023.​100125
33. Al-Hajj R, Assi A, Neji B, Ghandour R, Al Barakeh Z (2023) Transfer learning for renewable
energy systems: a survey. Sustainability 15(11):9131. https://​doi.​org/​10.​3390/​su151​19131
34. Yao S, Kang Q, Zhou MC et al (2023) A survey of transfer learning for machinery diagnostics and
prognostics. Springer, Netherlands
35. Chato L, Regentova E (2023) Survey of transfer learning approaches in the machine learning of
digital health sensing data. J Pers Med 13(12):1703. https://​doi.​org/​10.​3390/​jpm13​121703
36. Haque R, Ali A, Mcclean S et al (2024) Heterogeneous Cross-Project Defect Prediction Using
Encoder Networks and Transfer Learning. IEEE Access 12:409–419. https://​doi.​org/​10.​1109/​
ACCESS.​2023.​33433​29
37. Xie W, Zhang C, Jia K, et al (2023) Cross-Project Aging-Related Bug Prediction Based on Feature
Transfer and Class Imbalance Learning. Proc - 2023 IEEE 34th Int Symp Softw Reliab Eng Work
ISSREW 2023 206–213. https://​doi.​org/​10.​1109/​ISSRE​W60843.​2023.​00075
38. Wu J, Wu Y, Niu N, Zhou M (2021) MHCPDP: multi-source heterogeneous cross-project defect
prediction via multi-source transfer learning and autoencoder. Softw Qual J 29:405–430. https://​
doi.​org/​10.​1007/​s11219-​021-​09553-2
39. Liu C, Yang D, Xia X et al (2019) A two-phase transfer learning model for cross-project defect
prediction. Inf Softw Technol 107:125–136. https://​doi.​org/​10.​1016/j.​infsof.​2018.​11.​005
40. Li K, Xiang Z, Chen T, Wang S, Tan KC (2020) Understanding the automated parameter optimi-
zation on transfer learning for cross-project defect prediction: an empirical study. In: Proceedings
of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE ’20). Associa-
tion for Computing Machinery, New York, NY, pp 566–577. https://​doi.​org/​10.​1145/​33778​11.​
33803​60
41. Chen Y, Dai H (2021) Improving cross-project defect prediction with weighted software modules via transfer learning. J Phys Conf Ser 2025:012100. https://doi.org/10.1088/1742-6596/2025/1/012100
42. Zeng F, Lin W, Xing Y, et al (2022) A Cross-project Defect Prediction Model Using Feature
Transfer and Ensemble Learning. Teh Vjesn 29:1089–1099. https://​doi.​org/​10.​17559/​TV-​20220​
42111​0027
43. Lei T, Xue J, Wang Y et al (2022) WCM-WTrA: A Cross-Project Defect Prediction Method Based
on Feature Selection and Distance-Weight Transfer Learning. Chinese J Electron 31:354–366.
https://​doi.​org/​10.​1049/​cje.​2021.​00.​119
44. Tang S, Huang S, Zheng C, et al (2022) A novel cross-project software defect prediction algorithm
based on transfer learning. Tsinghua Sci Technol 27:41–57. https://​doi.​org/​10.​26599/​TST.​2020.​
90100​40
45. Zou J, Li Z, Liu X, Tong H (2023) MSCPDPLab: A MATLAB toolbox for transfer learning based
multi-source cross-project defect prediction. SoftwareX 21:101286. https://​doi.​org/​10.​1016/j.​
softx.​2022.​101286
46. Bai J, Jia J, Capretz LF (2022) A three-stage transfer learning framework for multi-source cross-
project software defect prediction. Inf Softw Technol 150:106985. https://​doi.​org/​10.​1016/j.​infsof.​
2022.​106985
47. Du X, Zhou Z, Yin B, Xiao G (2020) Cross-project bug type prediction based on transfer learning.
Softw Qual J 28:39–57. https://​doi.​org/​10.​1007/​s11219-​019-​09467-0


48. Xu Z, Pang S, Zhang T et al (2019) Cross Project Defect Prediction via Balanced Distribution
Adaptation Based Transfer Learning. J Comput Sci Technol 34:1039–1062. https://​doi.​org/​10.​
1007/​s11390-​019-​1959-z
49. Canfora G, De Lucia A, Di Penta M et al (2013) Multi-objective cross-project defect prediction.
Proc - IEEE 6th Int Conf Softw Testing. Verif Validation, ICST 2013:252–261. https://​doi.​org/​10.​
1109/​ICST.​2013.​38
50. Hosseini S, Turhan B, Mäntylä M (2016) Search based training data selection for cross project
defect prediction. In: Proceedings of the 12th international conference on predictive models and
data analytics in software engineering, pp 1–10. https://​doi.​org/​10.​1145/​29729​58.​29729​64
51. Zhao Y, Zhu Y, Yu Q, Chen X (2021) Cross-project defect prediction method based on manifold
feature transformation. Future Internet 13(8):216. https://​doi.​org/​10.​3390/​fi130​80216
52. Rhmann W (2020) Cross project defect prediction using hybrid search based algorithms. Int J Inf
Technol 12:531–538
53. Jin C (2021) Cross-project software defect prediction based on domain adaptation learning and
optimization. Expert Syst Appl 171:114637. https://​doi.​org/​10.​1016/j.​eswa.​2021.​114637
54. Deepalakshmi J, Chandran M (2022) An optimized clustering model for heterogeneous cross-pro-
ject defect prediction using Quantum Crow search. In: 1st Int Conf Softw Eng Inf Technol (ICo-
SEIT), pp 30–35. https://​doi.​org/​10.​1109/​ICoSE​IT556​04.​2022.​10030​011
55. Xing Y, Lin W, Lin X, Yang B, Tan Z (2022) Cross‐project defect prediction based on two‐phase
feature importance amplification. Comput Intell Neurosci 1:2320447. https://​doi.​org/​10.​1155/​
2022/​23204​47
56. Aljaidi M, Gul S, Faiz R, Samara G, Alsarhan A, al-Qerem A (2023) Impact evaluation of signifi-
cant feature set in cross project for defect prediction through hybrid feature selection in multiclass.
bioRxiv 2023-07
57. Hu Z, Zhu Y (2023) Cross-project defect prediction method based on genetic algorithm feature
selection. Eng Reports 1–15. https://​doi.​org/​10.​1002/​eng2.​12670
58. Faiz RB, Shaheen S, Sharaf M, Rauf HT (2023) Optimal Feature Selection through Search-Based Optimizer in Cross Project. Electronics 12(3):514. https://doi.org/10.3390/electronics12030514
59. Gottumukkala DP, Ushasree D, Suneetha TV (2024) Software Defect Prediction Through Effec-
tive Weighted Optimization Model for Assured Software Quality. Int J Intell Syst Appl Eng
12:619–633
60. Hu Z, Zhu Y (2023) Cross-project defect prediction method based on genetic algorithm feature selection. Engineering Reports 5(12):e12670. https://doi.org/10.1002/eng2.12670
61. Faiz RB, Shaheen S, Sharaf M, Rauf HT (2023) Optimal feature selection through search-based optimizer in cross project. Electronics 12(3):514. https://doi.org/10.3390/electronics12030514
62. Kitchenham BA (2012) Systematic review in software engineering: where we are and where we
should be going. Proc 2nd Int Work Evidential Assess Softw Technol (EAST ’12) 1–2. https://​doi.​
org/​10.​1145/​23722​33.​23722​35
63. Malhotra R (2016) Empirical research in software engineering: concepts, analysis, and applica-
tions. CRC press.
64. Pratt LY (1992) Discriminability-based transfer between neural networks. Advances in Neural
Information Processing Systems 5:204–211
65. Feuz KD, Cook DJ (2015) Transfer learning across feature-rich heterogeneous feature spaces via fea-
ture-space remapping (FSR). ACM Trans Intell Syst Technol 6:. https://​doi.​org/​10.​1145/​26295​28
66. Do CB, Ng AY (2005) Transfer learning for text classification. Adv Neural Inf Process Syst
18:299–306
67. Liu X, Liu Z, Wang G et al (2017) Ensemble Transfer Learning Algorithm. IEEE. Access 6:2389–
2396. https://​doi.​org/​10.​1109/​ACCESS.​2017.​27828​84
68. Rana R, Ng AY, Koller D (2006) Constructing informative priors using transfer learning. In: Pro-
ceedings of the 23rd international conference on Machine learning, pp 713–720. https://​doi.​org/​
10.​1145/​11438​44.​11439​34
69. Yu Q, Jiang S, Zhang Y (2017) A feature matching and transfer approach for cross-company defect
prediction. J Syst Softw 132:366–378. https://​doi.​org/​10.​1016/j.​jss.​2017.​06.​070
70. Mihalkova L, Huynh T, Mooney RJRJ (2007) Mapping and revising Markov logic networks for
transfer learning. Aaai 7:608–614
71. Weiss K, Khoshgoftaar T (2018) Evaluation of transfer learning algorithms using different base
learners. Proc - Int Conf Tools with Artif Intell ICTAI 2017-Novem:187–196. https://​doi.​org/​10.​
1109/​ICTAI.​2017.​00039
72. Pan SJ, Kwok JT, Yang Q (2008) Transfer learning via dimensionality reduction. Proceedeings
23th AAAI Conf Artif Intell 677–682. https://​doi.​org/​10.​1109/​TKDE.​2009.​191


73. Pereira FLF, Dos Santos Lima FD, De Moura Leite LG, et al (2017) Transfer learning for Bayesian
networks with application on hard disk drives failure prediction. Proc - 2017 Brazilian Conf Intell
Syst BRACIS 2017 2018-Janua:228–233. https://​doi.​org/​10.​1109/​BRACIS.​2017.​64
74. Dai W, Jin O, Xue GR, et al (2009) Eigentransfer: a unified framework for transfer learning. Proc 26th
Annu Int Conf Mach Learn 193–200. https://​doi.​org/​10.​1145/​15533​74.​15533​99
75. Gargees R, Keller J, Popescu M (2017) Early illness recognition in older adults using transfer learn-
ing. Proc - 2017 IEEE Int Conf Bioinforma Biomed BIBM 2017 2017-Janua:1012–1016. https://​doi.​
org/​10.​1109/​BIBM.​2017.​82177​95
76. Li B, Yang Q, Xue X (2009) Transfer learning for collaborative filtering via a rating-matrix genera-
tive model. 1–8. https://​doi.​org/​10.​1145/​15533​74.​15534​54
77. Yan S, Shen B, Mo W, Li N (2018) Transfer Learning for Cross-Platform Software Crowdsourcing
Recommendation. Proc - Asia-Pacific Softw Eng Conf APSEC 2017-Decem:269–278. https://​doi.​
org/​10.​1109/​APSEC.​2017.​33
78. Wan J, Wang X, Yin Y, Zhou R (2015) Transfer Learning in Collaborative Filtering for Sparsity
Reduction Via Feature Tags Learning Model. 56–60. https://​doi.​org/​10.​14257/​astl.​2015.​81.​12
79. Chen Y, Ding X (2018) Research on cross - Project software defect prediction based on transfer learn-
ing. AIP Conf Proc 1955:. https://​doi.​org/​10.​1063/1.​50337​47
80. Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect predic-
tion. Inf Softw Technol 54:248–256. https://​doi.​org/​10.​1016/j.​infsof.​2011.​09.​007
81. Krishna R, Menzies T (2019) Bellwethers: A Baseline Method for Transfer Learning. IEEE Trans
Softw Eng 45:1081–1105. https://​doi.​org/​10.​1109/​TSE.​2018.​28216​70
82. Long M, Wang J, Ding G et al (2012) Transfer learning with graph co-regularization. Proc Natl Conf
Artif Intell 2:1033–1039. https://​doi.​org/​10.​1609/​aaai.​v26i1.​8290
83. Nam J, Fu W, Kim S et al (2018) Heterogeneous Defect Prediction. IEEE Trans Softw Eng 44:874–
896. https://​doi.​org/​10.​1109/​TSE.​2017.​27206​03
84. Nam J, Pan SJ, Kim S (2013) Transfer defect learning. Proc - Int Conf Softw Eng 382–391. https://​
doi.​org/​10.​1109/​ICSE.​2013.​66065​84
85. Deshmukh AA, Laftchiev E (2018) Semi-supervised transfer learning using marginal predictors. In: 2018 IEEE Data Science Workshop (DSW), pp 160–164
86. Zhou JT, Pan SJ, Tsang IW, Yan Y (2014) Hybrid heterogeneous transfer learning through deep learn-
ing. Proc Natl Conf Artif Intell 3:2213–2219. https://​doi.​org/​10.​1609/​aaai.​v28i1.​8961
87. Wei Y, Zhang Y, Huang J, Yang Q (2018) Transfer Learning via Learning to Transfer. Icml
80:5085–5094
88. Kocaguneli E, Menzies T, Mendes E (2015) Transfer learning in effort estimation. Empir Softw Eng
20:813–843. https://​doi.​org/​10.​1007/​s10664-​014-​9300-5
89. Cui Y, Song Y, Sun C, et al (2018) Large Scale Fine-Grained Categorization and Domain-Specific
Transfer Learning. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 4109–4118. https://​
doi.​org/​10.​1109/​CVPR.​2018.​00432
90. Feuz KD, Cook DJ (2014) Heterogeneous transfer learning for activity recognition using heu-
ristic search techniques. Int J Pervasive Comput Commun 10:393–418. https://​doi.​org/​10.​1108/​
IJPCC-​03-​2014-​0020
91. Chen J, Yang Y, Hu K et al (2019) Multiview transfer learning for software defect prediction. IEEE
Access 7:8901–8916. https://​doi.​org/​10.​1109/​ACCESS.​2018.​28907​33
92. Qing H, Biwen L, Beijun S, Xia Y (2015) Cross-project software defect prediction using feature-
based transfer learning. In: Proceedings of the 7th Asia-Pacific Symposium on Internetware, pp
74–82. https://​doi.​org/​10.​1145/​28759​13.​28759​44
93. Tong H, Liu B, Wang S, Li Q (2019) Transfer-learning oriented class imbalance learning for cross-
project defect prediction. https://​doi.​org/​10.​48550/​arXiv.​1901.​08429
94. Jing X, Wu F, Dong X, et al (2015) Heterogeneous cross-company defect prediction by unified metric
representation and CCA-based transfer learning. 2015 10th Jt Meet Eur Softw Eng Conf ACM SIG-
SOFT Symp Found Softw Eng ESEC/FSE 2015 - Proc 496–507. https://​doi.​org/​10.​1145/​27868​05.​
27868​13
95. Cao Q, Sun Q, Cao Q, Tan H (2015) Software defect prediction via transfer learning based neural
network. Proc 2015 1st Int Conf Reliab Syst Eng ICRSE 2015. https://​doi.​org/​10.​1109/​ICRSE.​2015.​
73664​75
96. Krishna R, Menzies T, Fu W (2016) Too much automation? the bellwether effect and its implications
for transfer learning. ASE 2016 - Proc 31st IEEE/ACM Int Conf Autom Softw Eng 122–131. https://​
doi.​org/​10.​1145/​29702​76.​29703​39


97. Weiss KR, Khoshgoftaar TM (2017) An investigation of transfer learning and traditional machine
learning algorithms. Proc - 2016 IEEE 28th Int Conf Tools with Artif Intell ICTAI 2016 283–290.
https://​doi.​org/​10.​1109/​ICTAI.​2016.​48
98. Su KM, Robbins KA, Hairston WD (2017) Adaptive thresholding and reweighting to improve domain
transfer learning for unbalanced data with applications to EEG imbalance. Proc - 2016 15th IEEE Int
Conf Mach Learn Appl ICMLA 2016 320–325. https://​doi.​org/​10.​1109/​ICMLA.​2016.​34
99. Jing XY, Wu F, Dong X, Xu B (2017) An Improved SDA Based Defect Prediction Framework for
Both Within-Project and Cross-Project Class-Imbalance Problems. IEEE Trans Softw Eng 43:321–
339. https://​doi.​org/​10.​1109/​TSE.​2016.​25978​49
100. Wu F, Jing XY, Dong X, et al (2017) Cross-project and within-project semi-supervised software
defect prediction problems study using a unified solution. In: Proceedings - 2017 IEEE/ACM 39th
International Conference on Software Engineering Companion, ICSE-C 2017. Inst Electr Electron
Eng Inc 195–197. https://​doi.​org/​10.​1109/​ICSE-C.​2017.​72
101. Duan L, Tsang IW, Xu D (2012) Domain transfer multiple kernel learning. IEEE Trans Pattern Anal
Mach Intell 34:465–479. https://​doi.​org/​10.​1109/​TPAMI.​2011.​114
102. Wei Y, Zhang Y, Huang J, Yang Q (2018) Transfer learning via learning to transfer. In: Proceedings of the 35th International Conference on Machine Learning (ICML)
103. Weiss KR, Khoshgoftaar TM (2017) Detection of Phishing Webpages Using Heterogeneous Transfer
Learning. Proc - 2017 IEEE 3rd Int Conf Collab Internet Comput CIC 2017 2017-Janua:190–197.
https://​doi.​org/​10.​1109/​CIC.​2017.​00034
104. Xu Y, Pan SJ, Xiong H et al (2017) A Unified Framework for Metric Transfer Learning. IEEE Trans
Knowl Data Eng 29:1158–1171. https://​doi.​org/​10.​1109/​TKDE.​2017.​26691​93

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
