A Systematic Review of Transfer Learning in Software Engineering
https://doi.org/10.1007/s11042-024-19756-x
Abstract
Nowadays, everyone requires good-quality software. Software quality is difficult to
assure when too little data is available for training and testing. Transfer Learning
(TL) therefore plays an important role in reusing existing software artifacts when developing new
software with a similar domain and task. TL focuses on transferring knowledge from
existing prediction models to the development of new prediction models. The developed
models are applied to unseen datasets based on the characteristics and nature of the dataset,
since a sufficient amount of training data is often unavailable. The data distribution and task of
the source and target project must be checked before employing TL for software development.
In this Systematic Review (SR), we have investigated 39 studies from January 1990
to March 2024 that used TL in the software engineering domain. The review focused on
the identification of Machine Learning (ML) techniques used with TL, the types
of TL explored, TL settings, experimental settings, datasets, quality attributes, validation
methods, threats to validity, strengths and weaknesses of TL techniques, and hybrid
techniques involving TL. According to the experimental comparison, the performance of TL
techniques is encouraging. The findings of this SR will serve as guidelines for academicians,
software industry experts, software developers, software testers, and researchers.
This SR is also helpful in selecting the appropriate type of TL and TL setting for the
development of efficient software in the future, based on the type of problem. The study
showed that 30.67% of the studies focus on defect prediction, and 15% of the studies used
open-source datasets. Further, 35% of the studies used SVM as a base classifier for TL,
with different independent variables of the datasets used as prediction-model inputs,
and the K-fold Cross-Validation (CV) method was used in 15 studies.
* Shweta Meena (shwetameena@dtu.ac.in)
Ruchika Malhotra (ruchikamalhotra2004@yahoo.com)
1 Department of Software Engineering, Delhi Technological University, Delhi, India
87238 Multimedia Tools and Applications (2024) 83:87237–87298
1 Introduction
Nowadays, the demand for efficient software is increasing rapidly. The development of correct
and accurate software requires huge amounts of data for building prediction
models using the latest techniques, such as the transfer of knowledge from one project
to another, known as TL. TL consists of two words, i.e., transfer and learning; the
name itself indicates that something is transferred by learning. TL means gathering knowledge
from other models, storing that knowledge, and transferring it to other models, where
it can benefit them. TL is also referred to as learning to
learn, life-long learning, knowledge transfer, inductive transfer, multitask learning, knowledge
consolidation, context-sensitive learning, knowledge-based inductive bias, meta-learning,
and incremental/cumulative learning [1]. Multitask learning is related to TL
in that it tries to learn multiple tasks simultaneously, even when the tasks are not closely
related to each other. What distinguishes multitask learning from the other types is that
it uncovers latent features shared across the tasks.
The two main issues addressed in TL are what knowledge to transfer and how to transfer
it. This is why various TL algorithms are applied across a source (training)
and target domain: different algorithms transfer different knowledge, and the performance
improvement also varies across target domains. The objective is to explore and
develop the optimal TL algorithm that maximizes performance improvement in a reasonable
amount of time with minimum effort, which requires exhaustive research and expertise
in this area. A novel TL framework known as Learning to Transfer (L2T) was developed
for this purpose. The L2T framework helps to automatically determine what and how to
transfer by using past experiences in TL.
1.1 Motivation
Nowadays, the most widely used techniques and methods are related to ML, which has
numerous applications in different fields. ML works on the assumption that the training
and testing data are drawn from a similar feature space with a similar probability
distribution. In the real world, however, the training data and future data often have
different feature spaces and distributions. When the distribution of the training and future
data changes, prediction models must be rebuilt from scratch using newly gathered training
data. Performing this recursive task of collecting training data and then developing
efficient prediction models is not feasible. TL reduces the time and effort spent recollecting
the training data and helps in transferring knowledge between problem domains.
Various ML algorithms are available in the existing literature and are used to construct
prediction models for transferring knowledge. In the existing literature, various ML
techniques have been used to transfer knowledge from a source to a target domain where
data is unlabeled. In an existing study [2], the authors employed Support Vector Machines
(SVM), Decision Trees (DT), and Random Forest (RF) to transfer knowledge. These
classification techniques are beneficial for transferring knowledge and designing prediction
models. A Deep Learning (DL)-based sketch recognition method [3] has also been proposed,
which uses sketches of different granularity to fine-tune different layers of neural networks.
The usage of TL techniques should be expanded. Doing so, however, requires consistently
documenting the techniques and experiments already developed in the existing literature.
No SR yet exists that focuses on TL techniques in software engineering. In this paper,
an extensive review of studies published between December 1990 and March 2024 is performed.
The objective of this SR is to analyze, outline, and evaluate the empirical evidence
concerning: (1) the quality attributes used for TL; (2) ML techniques used for TL; (3) the
experimental settings that have been used for TL, such as datasets, independent variables,
algorithms, validation techniques, performance measures, and statistical tests used for TL;
(4) the effectiveness of TL algorithms using ML techniques; (5) threats to validity when
using TL; and (6) advantages and disadvantages of TL techniques.
1.2 Innovation
This SR provides guidelines for software developers, industry experts, and research
scholars concerning the usage of TL for transferring knowledge from one domain to
another. To meet our objective, we examined different online libraries and identified 39
relevant studies. These 39 studies are used to answer the research questions related to ML
techniques used with TL. We also applied quality assessment criteria and inclusion and
exclusion criteria for the selection of relevant studies. With respect to TL categories, some
authors have explored heterogeneous TL [4], surveying heterogeneous TL methods with
respect to target labels and the methods used for heterogeneous TL, such as the Directed
Cyclic Network (DCN). A comparison among empirical studies was then conducted
considering the different methodologies and techniques for heterogeneous TL.
The sentiment analysis domain makes wide use of TL. In a recent study, researchers
designed a TL-based approach [5] for detecting offensive and hate speech on social media
platforms. ML models are used with TL for the prediction of hate speech and offensive
language. The experiment was conducted using three different datasets, and the proposed
methodology performed equally well on all three. Unigram- and bigram-based ML models
are considered as baseline models in the study. The authors' approach resulted in a more
robust and efficient methodology for detecting hate speech and abusive language. Object
detection has also used TL [6]: TL has been applied to aircraft object detection in remote
sensing images, where a Faster R-CNN algorithm trained on natural images is adapted to
remote sensing images. An ensemble method [7] has been presented that combines features
of pre-trained deep convolutional neural networks (ResNet-50 and ResNet-101) for fruit
freshness classification; the TL-based approach outperformed other methods but required
a reduced dataset size. Human activity recognition has also used TL in various ways; in an
existing study, a proposed TL-based approach is compared with existing state-of-the-art
methods [8].
In an existing study [9], the researchers developed a novel algorithm termed the
Gravitational Search Algorithm (GSA), which is based on the law of gravity and the notion
of mass interaction. GSA's performance was compared with existing algorithms such as
PSO, RGA, and CFO, and it outperformed them. Similarly, TL-based approaches have
outperformed state-of-the-art methods; for example, in an existing study, deep TL-based
methods are used to mitigate the lack-of-data problem [10].
The authors have compared pre-trained models with traditional ML models considering
data augmentation and deep TL methods. In existing studies, TL is used in the software
engineering field for defect prediction, change prediction, effort estimation, and software
maintenance. Furthermore, TL is useful for future projects that exhibit characteristics or
properties similar to existing projects, and it plays an important role in designing prediction
models.
1.3 Contribution
This SR focuses on the usage of TL in the software engineering domain, covering the
literature from January 1990 to March 2024. The research is limited to this period because
the idea of TL emerged in 1990. The emergence of TL has helped academicians, software
industry experts, and developers build more prediction models using available data. In
total, 122 studies were reviewed, and 39 studies were examined thoroughly after an
extensive search through the digital libraries mentioned below. This review presents and
evaluates the empirical evidence on different TL settings, TL types, and the performance
of search-based techniques for TL in software engineering. Moreover, TL is closely related
to Cross-Project (CP) and cross-company models, whose performance is empirically
validated with ML techniques. The objective of this SR is to analyze the performance of
TL in software engineering.
1.4 Sections
2 Background
In the existing literature, several studies effectively apply TL to defect and change
identification, while only one study estimates software effort. A survey [11] was conducted
to analyze reinforcement learning with TL and provided future directions covering various
TL domains, task differences, multi-task learning, and methods to transfer knowledge for
reinforcement learning. The authors of [12] surveyed the usage of TL in bioinformatics,
considering key application areas, including sequence classification, gene expression data
analysis, biological network reconstruction, and biomedical applications, across 12 different
domains and 25 primary studies. Furthermore, the authors of [13] surveyed TL using
computational intelligence, covering TL with a Neural Network (NN), a Hierarchical
Bayesian (HB) model, a Bayesian Network (BN), a Genetic Algorithm (GA), and a fuzzy
system, across 30 studies. Moreover, a detailed survey [2] of TL techniques was conducted
with a main focus on 20 studies and the TL algorithms of the preceding 5 years.
A survey of TL [14] studied the concepts and definitions related to TL, with a focus on TL
settings, TL approaches, TL applied to DL, and types of TL such as domain adaptation,
domain confusion, multitask learning, one-shot learning, zero-shot learning, and
meta-learning. The authors of [15] surveyed TL for activity recognition, categorizing TL
by sensor modality (video sequences, wearable sensors, ambient sensors, crossing sensor
boundaries, physical setting boundaries), by the difference between source and target data,
by data availability, and by the information transferred (instance transfer,
feature-representation transfer, parameter transfer, relational knowledge transfer). In
existing studies [16–20], the authors also explored inclined planes system optimization,
which uses Newton's second law of motion, and developed an improved version of the
algorithm. The improved version was tested and validated on 100 independent trials,
succeeding in 90% of cases. It was concluded that the proposed improved inclined planes
system optimization algorithm outperformed existing optimization algorithms in terms of
various factors such as estimated coefficients, convergence, fitness, output responses, noise
analysis, stability, and reliability.
The authors of [21] surveyed TL for collaborative recommendation with auxiliary data,
introducing TL definitions, a categorization of TL techniques with different TL strategies,
and a novel, generic TL framework; representative work for each TL strategy is discussed
in detail. Other surveys cover TL for smart home and new user home automation
adaptation [22], TL for sentiment analysis [23], TL for robot and human automation [24],
image style TL [25], a decade of TL [26], TL for edge computing [27], TL for cluster
classification [28], deep reinforcement TL [29], TL in natural language processing [30], a
generalized survey of TL [31], TL for electric vehicles [32], TL for renewable energy [33],
TL for machinery diagnostics and prognostics [34], and learning from health sensing data
using TL, which is helpful for future medication and the healthcare sector [35]. It has thus
been observed that all of these surveys target a specific domain, and the software
engineering domain has not been explored until now, which motivated conducting an SR
of TL in software engineering. Apart from the software engineering domain, a huge
number of studies exists in other domains where authors have applied TL in various ways;
the usage of TL in software engineering, by contrast, is limited, even though TL helps in
reusing existing models or project data.
The authors of [36] analyzed the impact of TL on Cross-Project Defect Prediction (CPDP)
by identifying the linear relationship between software defects, and proposed a novel
method named Heterogeneous CPDP (HCPDP) using Encoder Networks and Ensemble
Learning (ENEL). The performance of ENEL is analyzed using precision, recall, G-mean,
F1-score, and AUC. The performance and quality of large software systems are also
enhanced using TL in an existing study [37]: to prevent software from early aging, a
Cross-Project-Based Aging (CPBA) related bug prediction model was developed, and a
hybrid CPBA with feature TL and class-imbalance handling was proposed to mitigate
early aging in software. The experiment was conducted and evaluated on three different
projects with two performance metrics, AUC and the balance measure. The proposed
hybrid CPBA performed 9.0%, 32.9%, 4.4%, and 3.9% better than the state-of-the-art
CPBA methods TLAP, JPKS, SRLA, and JDA-ISDA, respectively.
Furthermore, authors recently explored HCPDP in detail by reusing an existing project
dataset for the target project [38]. A novel approach named Multi-source HCPDP
(MHCPDP) was developed to reduce the difference between source and target projects,
and a multi-source TL algorithm was developed to improve the performance of a base
classifier by reducing the impact of negative transfer. The performance of MHCPDP was
evaluated on five different datasets using two performance metrics, and it performed better
than the traditional HCPDP method. An existing study showed that TCA does not perform
well for CPDP; thus, TCA+ was developed, reducing the difference between the data
distributions of source and target projects. Moreover, the authors developed a novel method
named the Two-Phase TL model (TPTL) [39] to overcome the limitations of TCA+ and
TCA. In the first phase, a source project estimator automatically selects the two source
projects with the highest distribution similarity to the target project. In the second phase,
two prediction models are built from the selected source projects. The performance of
TPTL was analyzed on 42 defect datasets from the PROMISE repository and compared
with state-of-the-art methods. Results showed that TPTL performed better than the baseline
methods by 19%, 5%, 36%, 27%, and 11% in terms of F1-score, and by 64%, 92%, 71%,
11%, and 66% in terms of cost-effectiveness.
The authors of [40] proposed 62 CPDP models using parameter optimization with TL,
concluding that automatic parameter optimization for CPDP improves performance by
77% with reasonable computational cost. CPDP has also been improved by weighting
software modules via TL [41]: a gravity-based analogy is used to assign weights to training
modules relative to the test set, and cost-sensitive C4.5 is employed on the weighted
training data. The experiment was conducted using 10 NASA datasets and yielded a PD
value of 0.81, an F-measure value of 0.41, and an AUC value of 0.8; the C4.5 CPDP model
was compared with a Naive Bayes (NB) CPDP model. Other authors experimented with
overcoming data distribution variation by combining feature transfer with Ensemble
Learning (EL) [42].
The feature transfer method is introduced with two stages, feature transfer and
classification. An experiment conducted on 20 source projects showed that the two-stage
combination of feature transfer with EL outperformed the alternatives. Further studies deal
with feature selection and distance-weighted instance transfer [43]: a new technique named
Multi-WCM-WTrA is introduced and tested using the AEEEM and ReLink datasets,
outperforming the TCA+ algorithm with an improvement of 23% on the AEEEM dataset
and 5% on the ReLink dataset. A novel approach named TSboostDF was proposed to
address knowledge transfer and
class imbalance issues [44], and the results proved that TSboostDF outperformed existing
TL methods. Testing resources and testing efficiency also play an important role in
software defect prediction; thus, a novel approach named Multi-Source CPDP (MSCPDP)
based on TL was proposed [45]. Because existing MSCPDP approaches are not
open-source, the MSCPDP lab toolbox re-implements state-of-the-art MSCPDP models
with a unified structure, data processing, model training and testing, and 13 performance
evaluation metrics; the toolbox functionalities were presented in the study. TL has been
shown to perform effectively for CPDP, but two issues need to be addressed: the
data-distribution difference between source and target data, and the selection of a single
source project. Thus, the authors of [46] developed a Three-Stage Weighting Framework
for Multi-Source TL (3SW-MSTL). Furthermore, one study showed that it is not
mandatory for source and target data to follow a similar distribution when using TL [47].
The conditional distribution was not considered by the authors of the existing study; thus,
a study [48] was conducted to develop a novel approach using the conditional distribution,
named Balanced Distribution Adaptation (BDA). The effectiveness of BDA was analyzed
using 18 projects from four datasets with six performance metrics, including AUC, Recall,
G-mean, Balance, and F-measure. Models developed using BDA, compared with 12
baseline methods, showed improvements of 23.8%, 12.5%, 11.5%, 4.7%, 34.2%, and 33.7%.
In existing studies, search-based techniques have also been explored for CPDP with TL.
The authors of [49] proposed a novel multi-objective approach for CPDP using logistic
regression with a genetic algorithm: instead of giving the software engineer a single
predictive model, a multi-objective approach offers multiple options that trade off the
number of defect-prone artifacts against the LOC to be analyzed. The results were validated
on 10 datasets of the PROMISE repository. Furthermore, the Nearest Neighbor (NN)-Filter
embedded in a GA [50] is used for CPDP to generate a training dataset when one is not
available. A new search-based approach named Genetic Instance Selection (GIS) was
proposed that optimizes a combined measure of F-measure and G-mean on a validation set
produced by the NN-filter; its performance is evaluated on 13 datasets of the PROMISE
repository against CPDP with the NN-filter and naive CPDP.
A novel approach [51] was proposed considering manifold feature transformation. This
approach transforms the original features into a manifold space and reduces the difference
in data distribution between the transformed source project and target project in that space;
the transformed data are then used with an NB classifier. The experiment was conducted
on the AEEEM and ReLink datasets using the F1-measure. In an existing study [52], the
authors experimented with hybrid search algorithms for CPDP and Within-Project DP
(WPDP). In another study [53], the authors used Kernel Twin SVMs (KTSVMs) to
implement Domain Adaptation (DA) and match the training-data distributions of different
projects; KTSVMs with a DA function (called DA-KTSVM) are also used for CPDP.
Further, the parameters are optimized using the Quantum Particle Swarm Optimization
algorithm (QPSO), and the optimized DA-KTSVM is called DA-KTSVMO. The
experiment was conducted on 17 open-source software projects, and it was concluded that
DA-KTSVMO utilized the available data knowledge better and could easily reuse defect
data to improve prediction performance.
The authors of [54] used fuzzy c-means clustering to estimate the similarity between
source and target features. The experiment was conducted on six projects from five
heterogeneous datasets and concluded that the Quantum Crow Search Optimized
Intuitionistic Fuzzy C Means Clustering (QCSO-IFCMC) achieved higher accuracy and
avoided local averages in comparison
to existing clustering models. Further, the authors of [55] conducted a study of CPDP
using two-phase feature importance amplification, and the authors of [56] conducted
CPDP through Hybrid Feature Selection (HFS), in which the strengths of RF and recursive
feature elimination are used to select relevant features. The experimental results showed
78% average accuracy across all prediction models using HFS for CPDP.
The authors of [57] developed a novel GA-based feature selection method with two stages,
feature selection and EL. The feature selection stage selects features using the integrated
training results of candidate feature subsets on the training set to obtain the optimal set;
in the ensemble training phase, the EasyEnsemble method is used to alleviate the class
imbalance problem by combining multiple NB classifiers. The proposed GA-based
algorithm improves the average F1-score by 38.9%, 31.6%, 35.1%, 22.0%, and 31.6%.
Features are also selected optimally through a search-based optimizer for CPDP [58] by
integrating an Artificial Neural Network (ANN) filter, a KNN filter, a Random Forest
Ensemble (RFE) model, a GA, and classifiers manipulating independent variables. The
authors of [59] analyzed the predictive capability of the firefly algorithm for selecting a
minimal number of metrics and providing them as input to SVM classifiers; the fitness
function of the firefly algorithm was modified to maximize accuracy while minimizing
the number of metrics. Furthermore, the Hybrid Firefly (HFF) algorithm, or Weighted
FCM Firefly Search (WFCMFF) approach, is proposed to find a better set of metrics to
further improve defect prediction performance: the FF algorithm and the Stochastic
Weighted FCM Search (SWFCMS) algorithm are combined to select the better set of
metrics. The proposed model improved the performance from 86.27% to 93.26%.
3 Problem statement
3.1 Context
The objective of this section is to explain the formulation of this study. The study was
conducted to analyze the usage of TL in the software engineering domain; good-quality
software must be designed and developed using TL even in the absence of a sufficient
amount of training data. Consider two projects, A and B, of two different companies,
Amazon (X) and Flipkart (Y). A prediction model (PM_X) is developed using X's
project-A data (X_A), and PM_X is to be reused to develop a prediction model for
company Y. However, the data distributions of the two companies are not similar:
DS_X ≠ DS_Y. Thus, company Y must check the similarity between the domains of the X
and Y data. Based on the domain and data distribution of X and Y, PM_X will be used for
designing PM_Y through Y_B, using different types of TL such as feature transfer,
instance transfer, relational knowledge transfer, and parameter transfer.
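As an illustration of the distribution check described here, the following sketch (with hypothetical metric values and a hypothetical similarity threshold, not taken from the reviewed studies) compares one feature of X_A and Y_B using a two-sample Kolmogorov-Smirnov statistic before deciding whether PM_X may be reused:

```python
# Illustrative sketch: decide whether project X's model may be reused for
# project Y by comparing feature distributions. Data and the 0.2 threshold
# are hypothetical, chosen only for demonstration.

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    points = sorted(set(a) | set(b))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

def can_transfer(source_feature, target_feature, threshold=0.2):
    """Reuse PM_X for PM_Y only if the distributions are similar enough."""
    return ks_statistic(source_feature, target_feature) <= threshold

# Hypothetical metric values (e.g., lines of code per module) for X_A and Y_B.
x_loc = [120, 85, 200, 150, 90, 170]
y_loc = [118, 88, 195, 148, 92, 165]
print(can_transfer(x_loc, y_loc))  # similar distributions -> True
```

In practice, such a check would be applied per feature (or with a multivariate divergence measure) before choosing a TL strategy.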
Domain: SE.
5. I/EC = {IC1, IC2, IC3, …, ICn, EC1, EC2, EC3, EC4, …, ECi}.
6. QAC = {QAC1, QAC2, QAC3, …, QACi}.
7. If s_i ∈ IC_n, select s_i; else remove s_i from S_i.
8. Compute QAC_j; if the QAC value for PS_a is greater than 7.5, select PS_a; else reject PS_a.
9. DS = {DS1, DS2, DS3, …, DSa}.
10. FInt_a of RQ_p from PS_a.

Outcome of Systematic Review = Σ_{i=1}^{n} (RQ_i × SS_i × IC_i × EC_i × QAC_i × PS_i × DS_i)
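Steps 7 and 8 of the selection procedure can be sketched as follows (the study records, criterion names, and quality scores are hypothetical, not taken from the review itself):

```python
# Minimal sketch of selection steps 7-8: keep a study only if it satisfies
# an inclusion criterion and its quality-assessment score exceeds 7.5.
# All study records below are hypothetical.

def select_studies(candidates, inclusion_criteria, qa_threshold=7.5):
    """Apply inclusion filtering (step 7) and the QA cutoff (step 8)."""
    selected = []
    for study in candidates:
        meets_ic = any(c in study["criteria"] for c in inclusion_criteria)
        if meets_ic and study["qa_score"] > qa_threshold:
            selected.append(study["id"])
    return selected

studies = [
    {"id": "PS1", "criteria": {"IC1", "IC3"}, "qa_score": 8.2},
    {"id": "PS2", "criteria": {"EC2"}, "qa_score": 9.0},  # no inclusion criterion
    {"id": "PS3", "criteria": {"IC2"}, "qa_score": 6.0},  # below QA threshold
]
print(select_studies(studies, {"IC1", "IC2", "IC3"}))  # ['PS1']
```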
This SR is most useful for industry experts, software developers, and academicians who
wish to reuse existing models for the development of similar models using TL. In the
software engineering domain, the availability of training data is shrinking every day;
because of this, data from other projects with somewhat similar characteristics is useful in
developing an efficient prediction model.
4 Review methodology
This section discusses the procedure followed in completing this SR, as provided by
Kitchenham [62]. The procedure has three stages in which this review has been carried
out: review planning, review organization, and reporting the results of the SR.
The procedure is illustrated in Fig. 1. Concerning the application of the Kitchenham
methodology [62] in the software engineering domain, the following aspects are
considered: relevance to the research questions, establishment of best practices, and
adaptability. The SR methodology selects primary studies based on their relevance to the
designed research questions. A structured framework is provided for extracting relevant
studies from the large set of studies and synthesizing results from the primary studies. In
the software engineering domain, various studies have followed the Kitchenham guidelines
for conducting an SR, using specified procedures for data extraction in a systematic manner.
The objective behind conducting the SR is to study, analyze, and evaluate the actual
documentation of the studies covering various ML techniques, different TL techniques, and
the approaches used by these TL techniques. Table 1 presents the six research questions on
which the SR is focused. We have analyzed these studies using the different quality attributes that
Table 1 Research questions formation for this systematic review
RQ_# Research Questions Motivation
RQ1 Which quality attributes are used for TL? Determine quality attributes used
RQ2 Which kind of ML techniques are used for TL? Determine various classes of ML techniques that have been used for knowledge transfer
RQ3 What experimental settings have been used for TL? Identify the experimental setup in which the experiment was conducted
RQ3.1 Which datasets have been used for TL? Identify datasets used for TL
RQ3.2 Which independent variables have been used for TL? Identify the independent variables
RQ3.3 Which algorithms have been used for TL? Identify the TL algorithms used
RQ3.4 What validation techniques have been used for TL? Identify the various validation methods that are used
RQ3.5 Which performance measure has been used for TL? To check the performance of ML techniques for TL
RQ3.6 What statistical test has been used for TL? Identify a statistical test that is reported to be appropriate for TL
RQ3.7 Which category of TL has been used? Identify the category of the TL method
RQ4 Which TL methods are found to be effective using ML techniques? To explore the effective TL technique using results provided by various evaluation measures
RQ5 What are the threats to validity for TL? Identify the types of threats to validity used
RQ6 What are the advantages & disadvantages of various TL techniques? Examine the information about TL techniques
have been used for TL (RQ1) and the various ML techniques used for TL in different
studies (RQ2). In the third RQ, we have studied the experimental settings used in the
primary studies, summarizing the datasets, independent variables, algorithms, validation
techniques, performance measures, statistical tests, and the type of knowledge transferred
between source and target data (RQ3). In RQ4, we have studied the TL algorithms found
to be effective using ML techniques. In RQ5, we have summarized the threats to validity
of TL. The sixth research question identifies the strengths and weaknesses of the various
TL techniques used in the primary studies (RQ6) and guides software practitioners,
researchers, professionals, and industry experts toward appropriate ML techniques for TL.
4.2 Search strategy and various criteria used for the selection of primary studies
We selected the key studies using various search terms. These search terms were formed
by combining similar and alternative terms with the 'OR' boolean operator and joining the
main search terms with the 'AND' boolean operator. Some of the search terms used to
identify the primary studies are:
((“Transfer” OR “transfer learning” OR “transfer knowledge” OR “knowledge
transfer” OR “transfer of learning”) AND (“variables” OR “parameters”) AND
(“machine learning” OR “support vector machine” OR “neural network” OR
“ensemble learning” OR “random forest” OR “decision tree” OR “naive bayes”
OR “CART” OR “bayesian network”) AND (“cross-project” OR “cross-company”)
AND (“defect” OR “change” OR “effort” OR “maintenance” OR “software
quality” OR “software quality improvise”) AND (“improved” OR “better” OR
“enhanced”) AND (“validation” OR “empirical” OR “design” OR “development”)
AND (“evolutionary” OR “search” OR “optimized” OR “heuristic” OR “particle
swarm” OR “harmony search” OR “simulated annealing” OR “bat search” OR
“swarm intelligence” OR “firefly search” OR “gravitational search” OR “inclined
planes system” OR “bio-inspired” OR “genetic algorithm” OR “Grey wolf” OR
“cuckoo search” OR “ant colony” OR “artificial bee colony”) AND (“method”
OR “technique” OR “algorithm” OR “variant” OR “model”) AND (“dataset” OR
“database”) AND (“cross-validation” OR “hold-out validation”) AND (“statisti-
cally” OR “validated” OR “statistical” OR “statistical test” OR “paired test” OR
“wilcoxon” OR “ANOVA”)).
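Such a query can be assembled programmatically. The following is an illustrative sketch of our own (not part of the original methodology), using a shortened, hypothetical set of term groups:

```python
# Illustrative sketch: assembling a boolean search string by OR-joining
# synonym groups and AND-joining the groups, as described above.
def or_group(terms):
    """Quote each synonym and join with OR, wrapped in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def build_query(groups):
    """AND-join the OR-groups into a single search string."""
    return "(" + " AND ".join(or_group(g) for g in groups) + ")"

groups = [
    ["transfer learning", "knowledge transfer", "transfer of learning"],
    ["machine learning", "support vector machine", "neural network"],
    ["cross-project", "cross-company"],
]
print(build_query(groups))
```

The same pattern extends to the full set of term groups shown above; each digital portal's own query syntax may require minor adjustments.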
The ML-related search terms were extracted from ML-based research publications. After
identifying the search terms, we selected the digital portals; some of these portals are
accessible only at the university. Various electronic databases were explored for the
collection of primary studies, and combinations of the search strings, terms, and words
above were applied in these databases. We restricted the search from December 1990 to
March 2024, as the development of ML techniques started in 1990. First, we chose the
electronic databases to explore and performed the search procedure for the identification of
primary studies. The second step is to identify the relevant studies by accessing the
full-text papers; it consists of the inclusion and exclusion criteria discussed in a further
section.
The empirical studies related to ML techniques for TL are also included in this SR. We
have identified 39 key studies that are included in this SR. The studies are selected based
on the following inclusion/exclusion criteria:
Criteria to include studies:
We have applied the inclusion and exclusion criteria mentioned above, reviewing the
complete paper whenever there was doubt about whether a study should be included or
excluded. The quality of the studies is also assessed based on the importance of the
research questions. The final studies are then obtained by applying the quality evaluation
criteria mentioned in the following section.
In this section, the formation of a quality evaluation questionnaire is discussed. This
questionnaire is used to assess the purpose and strength of the selected primary studies.
The quality assessment criteria were designed by considering the guidelines and
suggestions provided in the existing studies [63]. We have used the quality assessment
criteria to assign a particular weight to every study. Table 2 presents the quality evaluation
criteria. We have decided on three parameters for each question, based on whether the
particular study answers the question or not: a tick is placed against the 'yes' parameter if
the study answers the question, and against the 'no' parameter if it does not. Every
question is assigned a score of 1 (yes), 0.5 (partly), or 0 (no). The sum of the values
assigned to the questions gives the final score of each study; the maximum and minimum
scores are 13 and 0, respectively.
Each study is assessed against the quality questions by assigning a score (0, 0.5, or 1).
The scores were categorized into classes: very high (10.1 ≤ score ≤ 13), high
(7.6 ≤ score ≤ 10), medium (5.1 ≤ score ≤ 7.5), low (2.6 ≤ score ≤ 5.0), and very low
(0 ≤ score ≤ 2.5). The highest and lowest scores that could be given to a study were 13
and 0. In the next sub-sections, the selected primary studies are referred to by an
identifier. We analyze the studies by the information they contain and the experiments
they have performed. The scoring scheme used for the quality evaluation criteria can be
seen in Table 2: column 2 lists the quality questions designed for analyzing the quality of
each study, and each question is rated Yes, Partly, or No with scores of 1, 0.5, and 0,
respectively. Thus, the quality of each study is analyzed by answering 13 quality
questions on a scale of 0 to 1. After this rating, the quality score of each study is
calculated from the scores assigned to the individual quality questions. Studies with a
quality score in the range of 7 to 13 are selected for the further data extraction process in
order to answer the RQs; these studies are presented in Table 3.
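The scoring and banding scheme above can be sketched in a few lines (an illustrative fragment; the function names are ours, not from the paper):

```python
# Sketch of the quality evaluation scheme: 13 yes/partly/no ratings are summed
# into a score in [0, 13] and mapped to the quality bands used in this review.
RATING = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def quality_score(answers):
    """Sum the per-question ratings; 13 questions give a score in [0, 13]."""
    return sum(RATING[a] for a in answers)

def quality_class(score):
    """Map a score to the bands used in this review."""
    if score <= 2.5:
        return "very low"
    if score <= 5.0:
        return "low"
    if score <= 7.5:
        return "medium"
    if score <= 10.0:
        return "high"
    return "very high"

answers = ["yes"] * 8 + ["partly"] * 3 + ["no"] * 2   # 13 ratings
score = quality_score(answers)                        # 8*1 + 3*0.5 = 9.5
print(score, quality_class(score), score >= 7)        # 9.5 high True
```

The final comparison (`score >= 7`) reflects the selection threshold for the data extraction step.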
A data extraction form has been filled out to extract data from the primary studies. The
form is mainly used to map each primary study to the RQs it answers. In the data
extraction form, we have summarized the details of every study, such as the author's
name, the title of the primary study, publisher details, experimental settings, dataset
details, independent variables, validation techniques, and the ML techniques used. In the
data extraction card, we have recorded which RQs are answered by each primary study.
The results are stored in an Excel file and are used for the subsequent data synthesis
process.
After data extraction, the next step is data synthesis. The role of data synthesis is to
gather factual data and evidence from the selected primary studies; these facts and figures
are combined to answer the RQs. Some of the primary studies stated identical or
comparable viewpoints, while others proved different things by performing different
experiments. We have studied and analyzed both quantitative and qualitative data in this
review. The quantitative data consists of values for evaluation measures such as
precision, recall, accuracy, AUC, F-measure, and error rate. The qualitative data consists
of the experimental setup, the different ML techniques, the datasets used in the primary
studies, the empirical validation methods, and the strengths and weaknesses of the
various TL techniques/algorithms. We have used tables for presenting and discussing the
results, along with pictorial representations such as line charts, boxplots, bar graphs, and
pie charts, to answer the RQs.
5 Primary Studies
This section provides a summary of the selected primary studies. We have selected 39 out
of 122 studies; these are related to TL in the software engineering field, use ML
techniques, and apply appropriate validation techniques for TL. Some of the studies used
public, proprietary, or open-source datasets.
5.1 Source of publication
In Table 4, we have summarized the details of the publications. The primary studies are
published in the top journals and conferences presented in Table 5. Table 4 contains the
count and percentage of primary studies for each journal and conference listed. The
conferences and journals with the highest numbers of publications are the AAAI
Conference on Artificial Intelligence, the International Conference on ML, Neural
Information Processing Systems, the International Conference on Tools
Table 4 Summary of publications
NIPS Conference: Advances in Neural Information Processing Systems Conference 2 5.13
International Conference on Machine Learning Conference 4 10.26
AAAI Conference on Artificial Intelligence Conference 5 12.82
International Conference on Software Engineering (ICSE) Conference 1 2.56
Empirical Software Engineering Journal 1 2.56
International Journal of Pervasive Computing and Communications Journal 1 2.56
Asia–Pacific Symposium on Internetware Symposium 1 2.56
ESEC/FSE’15 Conference 1 2.56
International Conference on Reliability Systems Engineering (ICRSE) Conference 1 2.56
ASE 16 Conference 1 2.56
International Conference on Tools with Artificial Intelligence Conference 2 5.13
International Conference on Machine Learning and Applications Conference 1 2.56
ACM Transaction Intelligent System Technology Transaction 1 2.56
IEEE Access Journal 2 5.13
The Journal of Systems & Software Journal 1 2.56
Brazilian Conference on Intelligent Systems Conference 1 2.56
International Conference on Bioinformatics and Biomedicine (BIBM) Conference 1 2.56
Asia–Pacific Software Engineering Conference Conference 1 2.56
AIP Conference Proceedings Conference 1 2.56
IEEE Transactions on Software Engineering Transaction 2 5.13
Mitsubishi Electric Research Laboratories Conference 1 2.56
IEEE Conference on computer vision and pattern recognition Conference 1 2.56
Computing Research Repository (CoRR) Journal 1 2.56
Information and Software Technology Journal 3 7.69
Multimedia Tools and Applications (2024) 83:87237–87298
Table 5 Top publication venues with impact factor
Publication name (Transaction/Journal/Conference/Proceedings/Workshop/Symposium name) Type (Conference/Journal/Transaction/Symposium) # of Studies Percent Impact Factor
with Artificial Intelligence, and the Computing Research Repository (CoRR). We have
seen that most of the studies were presented at conferences: one-third of the primary
studies (33%, 13 studies) are published in journals, and two-thirds (67%, 26 studies) are
published in conferences.
5.2 Publication year
In Fig. 2, we have presented the categorization of studies during the period from 1990 to
2024. Figure 3 depicts a continuous increase in the number of studies from 2015
onwards. We have observed that the number of studies increased in the years 2009, 2012,
2014, 2015, 2016, 2017, 2018, 2020, and 2024. One dataset that is common to most of
the primary studies is the 20 Newsgroups dataset, and accuracy is used as a performance
measure in most of the primary studies. We have collected and compiled the complete
data until March 2024.
The summary of publications according to type, count, and percentage is given in
Table 4. Most of the existing studies are published in A* international conferences. The
journals and conferences are influential, highly reputed, and recognized in the field of
software engineering. PS1, PS29, and PS30 are published in top transactions. Further,
PS9, PS13, PS14, PS21, PS22, PS23, PS34, PS36, PS37, PS38, and PS39 are published
in reputed journals such as Software Quality, Empirical Software Engineering, and IEEE
Access. The existing studies are also published in reputed journals with high impact
factors from reputed publishers, such as IEEE Transactions on Software Engineering and
ACM Transactions on Intelligent Systems and Technology, as represented in Table 5.
The publication of studies on TL increased in subsequent years, as represented in Fig. 3.
However, it has been noticed that TL is mostly explored in the field of defect prediction
compared to change prediction, effort estimation, and maintainability prediction. TL has
enabled the identification of defects in future projects using existing defect prediction
models. The feasibility of TL for effort estimation has also been examined, and
researchers continue to work toward a successful study of effort estimation using TL.
The use of the 20 Newsgroups dataset increased from 2015. The increase in the number
of studies in those years is largely due to the use of ML techniques, or of different TL
algorithms with different ML techniques as base learners. Many studies determined the
effectiveness of ML techniques for TL. Many studies have also been published on
change prediction in software engineering,
Hardware Requirement
S.No Hardware Type Specification
1 Processor 11th Gen Intel® Core™ i7-12700, 2.10 GHz
2 RAM 16.0 GB
3 Storage 1 TB SSD
4 Display 27" diagonal, FHD (1920 × 1080)
Software Requirement
S.No Software Type Specification
1 System type 64-bit Operating System, x64-based processor
2 Operating System Windows 11 Pro
3 Microsoft Excel Excel 2016
4 Mendeley Mendeley Desktop 1.19.8
5 SPSS IBM SPSS Statistics 21
for identifying the change-proneness nature of the software using TL. It is also observed
that the studies mostly used SVM as an ML technique. Furthermore, the distribution of
studies according to conference and journal type is presented in Fig. 4.
In this section, we have summarized the results obtained from the selected primary
studies. The hardware and software used for conducting this SR are specified in Table 6.
In this section, we have discussed the quality attributes used by the various studies, such
as effectiveness, performance, reliability, effort, change, and defect proneness. It has been
observed that most of the studies focus on defect prediction. Defect prediction plays an
important role in software engineering before deploying software at the end-user site.
Thus, software developers and testing teams ensure that the software is free from any
kind of defect at the deployment stage. Furthermore, developers need to take care of
defect proneness in future projects with the help of existing project datasets, using TL for
knowledge transfer between projects with similar data distributions and similar tasks.
The most commonly used attribute of all the quality attributes is performance. It has been
used in 14 studies (PS1, PS3, PS4, PS10, PS11, PS19, PS20, PS21, PS22, PS24, PS25,
PS26, PS27, PS34). The authors have analyzed the performance of an algorithm that has
been developed or used in the study. The next most frequently used attribute is
effectiveness, which has been used in 10 studies (PS2, PS5, PS6, PS7, PS8, PS12, PS14,
PS16, PS32, PS33). The defect attribute has been used in 13 of the selected studies (PS9,
PS15, PS17, PS18, PS23, PS28, PS29, PS30, PS35, PS36, PS37, PS38, PS39). The
defect attribute is used to analyze the effect of TL on defect prediction in software
engineering; the authors have analyzed whether defects are successfully predicted using
TL or not. The effort attribute has been used in 2 of the 39 studies (PS13, PS39) used for
this review. It has been checked that it is feasible to build transfer learners for effort
estimation [88]. It has been observed that TL is effective for defect prediction, and TL
can estimate effort across time as well as space. Furthermore, in recent years, authors
have focused more on defects using WPDP and CPDP. The occurrence of any kind of
change can lead to defects in the current and subsequent versions of the software. Thus,
developers are required to collect all the requirements from the user at the initial stage;
if any change is requested by the customer, its feasibility must be checked and approved
by the Change Control Board (CCB). Furthermore, the quality attributes identified in the
primary studies are presented in Fig. 5, and their descriptions are provided in Table 7.
The current section summarizes the details of the ML techniques used in the selected
primary studies. We have categorized the ML techniques into six different categories:
SVM, DT, EL, Bayesian Learners (BL), K-NN, and miscellaneous. Table 8 summarizes
the number and percentage of studies that used each category of ML technique. Most of
the techniques are from the SVM, EL, DT, and BL categories, examined in 35.90%,
28.21%, 25.64%, and
Effectiveness This attribute is the ability to provide the desired output.
Performance This attribute describes the system output obtained by doing some work
over a particular period.
Reliability This attribute deals with the software's potential to maintain its performance
level under stated conditions for a stated period.
Effort This quality attribute gives the reasonable amount of time required to develop a
particular software (in terms of person-hours or money).
Defect Proneness This quality attribute is defined as an error made in the source code, or
in its logic, that can lead to crashing or can produce imprecise/unpredicted outcomes.
25.64% of studies, respectively. It has been observed that SVM is widely used with TL
in 14 software engineering studies (PS2, PS5, PS6, PS9, PS10, PS11, PS16, PS19, PS22,
PS24, PS26, PS30, PS31, PS37), with LSVM, SVDD, KNND, and MSVMs. Further,
EL-category ML techniques are used in 11 studies (PS16, PS17, PS18, PS19, PS24,
PS29, PS30, PS34, PS36, PS37, PS38) with RF, VAWBSVM, AdaBoost, the SGD
classifier, and the Gradient Boosting classifier. The DT category is used in 10 studies
(PS1, PS14, PS15, PS21, PS22, PS24, PS29, PS34, PS37, PS38) with C4.5 and CART
variants. Furthermore, the distribution of studies in terms of percentage is presented in
Fig. 6. The ML techniques mentioned in Table 8 are used for TL. Figure 7 (a, b, c, and d)
presents the distribution of studies according to ML category and type.
This section identifies the datasets, independent variables, algorithms, validation
techniques, performance measures, and statistical tests used for TL in the selected
primary studies.
Various types of datasets are used in the TL studies. Figure 8 presents the number and
percentage of studies that used each type of dataset. All datasets have a different nature.
Private datasets consist of data collected by other researchers or agencies for evaluation
or research purposes; they are not distributed among researchers, and due to this, results
on private datasets cannot be verified or reproduced by other researchers. Public datasets
are freely available. Thus, it has been concluded that more proprietary and academic
datasets must be used for future experimentation. Overused datasets do not provide very
insightful results in such cases, so it is always advisable to use more industry-oriented
datasets that help researchers understand and study the data in more detail.
The various categories of used datasets are as follows:
Table 8 Category of machine learning techniques used in primary studies
Category of ML classifier Type Percentage of studies No. of studies
SVM TSVM: Transductive SVM, MSVMs: Multiclass SVM, LSVM: Linear SVM, SVDD: Support Vector Domain Data Description, KNND: k-nearest Neighbor Data Description 35.90 14
DT CART, C4.5 25.64 10
EL RF, VAWBSVM: Value Aware Boosting with SVM, SGDClassifier, Gradient Boosting Classifier, AdaBoost Classifier 28.21 11
BL WNBC: Weighted NB classifier, NB: Naive Bayes, BN: Bayesian Networks 25.64 10
K-NN Nearest Neighbor 17.95 7
Miscellaneous SR: Softmax Regression, LR: Linear Regression, MTL: Multitask learning, ST: Self-training, Logistic Regression, GBM: Graph-based methods, MR: Manifold regularization 12.82 5
metrics such as change metrics, existing defects metrics, code metrics, the entropy of
changes metrics, and the entropy of source code metrics. It consists of 61 metrics and
5386 instances [65, 97, 99]. This dataset is used in 11% of the primary studies (PS11,
PS16, PS17, PS18, PS29, PS30, PS34, PS36, PS38).
• MAGIC Gamma Telescope dataset: The MAGIC Gamma dataset (also known as MAG)
is sourced from the UCI ML repository. It is a binary classification dataset with
numerous instances and numerical attributes. This dataset is used in 4% of the primary
studies (PS19, PS24).
• MovieLens dataset: This dataset is collected by GroupLens. It is a movie rating dataset
on a scale of 1 to 5, available on the MovieLens website. Users provide ratings for each
movie during different time intervals. This dataset is used in 2% of the primary studies
(PS7, PS8).
• NASA dataset: This dataset is publicly available; the NASA repository stores it, and the
NASA Metrics Data Program maintains it. Each dataset in the NASA repository
corresponds to a particular NASA software system or sub-part of a system. The data
comprise defect labels and source code metrics such as length, understandability, and
complexity, which are associated with software quality. This dataset is used in 15% of
the primary studies (PS4, PS9, PS13, PS16, PS17, PS18, PS23, PS28, PS29, PS30,
PS34, PS36, PS38).
• ReLink dataset: This dataset contains information regarding defects; the stored
information is manually verified and improved. ReLink contains 26 complexity metrics,
which are used for defect forecasting. The ReLink dataset has features like time
interval, bug owner, change committer, and text similarity, with a total of 26 features
and 658 instances [100]. This dataset is used in 9% of the primary studies (PS11, PS16,
PS17, PS18, PS29, PS30, PS36, PS38).
• Reuters-21578: This dataset was collected by Carnegie Group, Inc. and Reuters, Ltd.
during the development of the CONSTRUE text categorization system. It is one of the
most commonly used datasets for text categorization, defined as a collection of
documents that appeared on the Reuters commercial newswire system. It has five
top-level divisions and many subdivisions. This dataset is used in 5% of the primary
studies (PS2, PS5, PS6, PS10).
Fig. 7 Division of sub-categories of ML techniques in (a) SVM (b) EL (c) BL (d) Miscellaneous
• SOFTLAB: This dataset was collected by a Turkish software company and comprises
three datasets: AR3, AR4, and AR5. They contain data from controller software
embedded in home appliances, namely a washing machine, a dishwasher, and a
refrigerator, respectively. The SOFTLAB and NASA datasets used are obtained from
the PROMISE repository [99]. This dataset is used in 7% of the primary studies (PS9,
PS16, PS17, PS28, PS30, PS38).
• Synthetic dataset: This type of dataset is created to preserve data protection and data
privacy. It is used in 2% of the primary studies (PS26, PS31).
• PROMISE repository dataset: This dataset is freely accessible in the PROMISE
repository, which was created to promote the use of prediction models in software
engineering. It is used in 6% of the primary studies (PS15, PS23, PS28, PS35, PS37).
• UCI 20 Newsgroups dataset: This dataset was collected by Ken Lang for his work. It is
a collection of 20,000 newsgroup reports, equally divided among 20 different
newsgroups. It is widely used for experimenting with text applications of ML
algorithms, document categorization, and document clustering. This dataset is used in
9% of the primary studies (PS2, PS3, PS6, PS10, PS18, PS20, PS21, PS22).
• Others: This category consists of different datasets concerned with movies, image
datasets like ImageNet, species datasets like iNaturalist, and real-world datasets like
Amazon product reviews. It is used in 29% of the primary studies (PS2, PS3, PS4, PS6,
PS7, PS8, PS10, PS11, PS12, PS13, PS14, PS16, PS17, PS19, PS20, PS21, PS24,
PS25, PS26, PS27, PS30, PS31, PS32, PS33, PS39).
In this section, we have discussed the different independent variables used in each study,
as presented in Table 9. The selected primary studies use various independent variables,
such as the number of features, classes, Object-Oriented (OO) metrics, Halstead metrics,
and Chidamber & Kemerer (CK) metrics. It is observed that CK metrics are the most
frequently used.
Table 9 Independent variables used
Independent Variables Primary Studies Independent Variables Primary Studies
Information Measure Metric (IM Metric) PS1 Number of test samples PS20
Number of classes in email PS2 Performance metrics PS21
The vocabulary of words and a summary of documents PS3 Train Pivot Predictors PS27
Eigenvector PS6 Attributes PS9, PS23
Regularization parameters, Number of feature clusters k, Number of nearest neighbors PS10 PS15, PS18, PS30, PS35
This section discusses the various algorithms used by the primary studies. The
algorithms depend on the type of target and training data. Two studies have performed a
comparison among five TL methods using ML techniques as base learners; Adaptation
Regularization TL (ARTL), Geodesic Flow Kernel (GFK), TCA, and TJM are among the
TL algorithms used in these two studies (Table 10).
This section describes the different validation techniques the reviewed studies used to
validate their outcomes after applying a particular algorithm or experiment. The
validation techniques, such as K-fold cross-validation, Leave-one-out cross-validation
(LOOCV), and Hold-out validation, are presented in Table 11. The most commonly used
technique is K-fold cross-validation, which has been used in 15 of the selected primary
studies (PS1, PS2, PS6, PS11, PS12, PS13, PS14, PS15, PS21, PS22, PS23, PS24, PS29,
PS30, PS31). LOOCV is used in two studies (PS13, PS26), and hold-out validation is
used in one study (PS3). The graphical representation of the number of studies using
each validation technique is presented in Fig. 9.
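The K-fold partitioning scheme used by most of these studies can be illustrated with a minimal index generator (a sketch of our own in plain Python; the primary studies rely on their own tool implementations):

```python
# Minimal K-fold cross-validation index generator: each fold serves once as
# the test set while the remaining K-1 folds form the training set.
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k near-equal folds and yield (train, test)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))  # prints "8 2" for each of the 5 folds
```

Setting `k = n` reproduces LOOCV, where every single instance is held out once.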
Various metrics or measures are used to analyze the performance of the different models
developed using TL. The evaluation measures play an important role in comparing and
evaluating the models developed using various TL and ML techniques. Table 12 presents
the evaluation measures along with a theoretical description of each.
The number of studies using each evaluation metric is illustrated in Fig. 10. It can be
observed that accuracy is the most widely used evaluation metric (PS2, PS5, PS10, PS12,
PS14, PS18, PS19, PS20, PS21, PS22, PS24, PS25, PS27, PS28, PS29, PS32, PS33),
followed by recall (PS9, PS11, PS16, PS18, PS21, PS27, PS28, PS29, PS35, PS36,
PS38, PS39), F-measure (PS9, PS11, PS15, PS16, PS17, PS18, PS28, PS29, PS37, PS38,
PS39), the AUC measure (PS4, PS9, PS15, PS17, PS21, PS23, PS26, PS30, PS34, PS36,
PS38), precision (PS11, PS14, PS18, PS28, PS29, PS39), FPR (PS9, PS16, PS18, PS29,
PS35), and G-mean (PS36, PS38). Rarely used performance metrics are grouped into the
miscellaneous category: absolute residual, AUCEC, CLL, error rate, error mean, error
median, MAE, MRE, MER, MBRE, misclassification error, mean square error, RMSE,
SA, and UAR (PS3, PS4, PS6, PS8, PS13, PS14, PS16, PS25, PS31, PS32).
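Most of the classification measures above derive directly from the confusion-matrix counts (TP, FP, TN, FN). The following illustrative helper of our own shows the computations; the studies themselves use various tool implementations:

```python
# Classification measures computed from confusion-matrix counts.
import math

def metrics(tp, fp, tn, fn):
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # also reported as TPR or PD
    fpr       = fp / (fp + tn)          # false positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    tnr       = tn / (tn + fp)          # true negative rate
    g_mean    = math.sqrt(recall * tnr) # balances both class-wise rates
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "fpr": fpr, "f_measure": f_measure, "g_mean": g_mean}

print(metrics(tp=40, fp=10, tn=45, fn=5))
```

AUC, by contrast, is computed from ranked prediction scores rather than from a single confusion matrix, which is why it is often preferred on imbalanced defect datasets.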
This section describes the various statistical tests used by the studies. These tests
indicate whether there is a significant difference between distributions. Table 13 presents
the type of each statistical test, its description, and the identifiers of the studies in which
it is used. Figure 11 graphically represents the statistical tests.
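As an illustration of the paired tests listed in Table 13, the Wilcoxon signed-rank statistic W for two techniques' per-dataset scores can be computed as follows (a hedged sketch of our own with invented numbers; p-values would normally come from a statistics library such as SciPy's `scipy.stats.wilcoxon`):

```python
# Wilcoxon signed-rank statistic W for paired per-dataset scores.
def wilcoxon_w(a, b):
    """Rank the absolute nonzero paired differences (averaging tied ranks)
    and return the smaller of the positive and negative rank sums."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + 1 + j) / 2          # mean of the tied ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)

# Accuracy (%) of two techniques on five common datasets (invented numbers).
print(wilcoxon_w([85, 80, 78, 90, 70], [80, 82, 75, 88, 65]))  # prints 1.5
```

A small W (relative to the critical value for the sample size) indicates a significant difference between the two paired distributions.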
Table 10 Transfer learning algorithm used
Primary Studies TL algorithm Description
PS2 Task-clustering algorithm The study [68] used a task-clustering algorithm for text
classification. The underlying linear text classifier uses the inner product between a test
document vector and a parameter vector. In the task-clustering algorithm, the tasks are
grouped via the nearest neighbor algorithm to facilitate knowledge transfer. Different
parameter functions are used; each parameter function is obtained from the training data
and then applied to the testing data.
PS4 Find a legal mapping for a source clause The study used an algorithm to find a
mapping for a source clause. The authors applied TL by transferring a mapping learned
from the source to the target. There are two types of mapping: global and local. In global
mapping, a mapping is established from each source predicate to a target predicate and
used for the entire source translation. The other approach, local mapping, is to find the
top mapping of
PS19, PS20, PS22 Adaptation Regularization TL (ARTL) algorithm The study by
existing authors [98] used the ARTL algorithm, which simultaneously performs
adaptation across domains and classifier learning.
PS19, PS24, PS29, PS32, PS37 TCA algorithm Studies by existing authors [81, 102]
used the TCA algorithm, which explores the features shared between the training and
target data.
PS19, PS24 TJM algorithm The study used the TJM algorithm, which is similar to the
ARTL algorithm. Its main aim is to decrease the difference between the marginal
probability distributions.
PS21 Feature Space Remapping (FSR) The study given by existing researchers [65] used FSR algorithm. It
is a heterogeneous TL algorithm. It transforms features among the
source and target data. It calculates meta-features and then computes
the similarity between them.
PS27 Weighted Structural Correspondence Learning (SCL) algorithm The study by
existing researchers [77] used the Weighted SCL algorithm, which identifies the
important and unimportant features across the source and target domains.
PS22 Weighted-resampling-based TL algorithm (TrResampling) The study by existing
researchers [67] proposed the TrResampling algorithm, which runs over several
iterations. Its main focus is to adjust the weights assigned to the instances: in each
iteration, a new source training dataset is created, and the labeled data in the target
dataset is combined with it.
PS35 TL-Oriented Minority Oversampling Technique based on Feature Weighting TNB
(TOMOFWTNB) The study [93] proposed the TOMOFWTNB algorithm, which
transfers features between the source and target data. The transferred features are
selected based on their correlation with the predicted output.
PS36 | 3SW-MSTL: A novel method named 3SW-MSTL was developed for multi-source TL. In the first stage, it selects multiple source projects, which are considered as training projects for the target project. Further, KNN is applied to obtain 14 reweighted training instances by minimizing the difference between the marginal distributions of each selected source project and the target project. Based on the difference between the conditional probability distributions of the selected source projects and the target project, a multi-source data utilization scheme is employed for prediction model training.
PS38 | BDA: BDA considers both the marginal and conditional distribution differences between the source and target projects.
PS39 | TrAdaBoost: TrAdaBoost is a supervised, instance-based domain adaptation algorithm mainly used for classification tasks. It is based on a reverse boosting concept.
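As an illustration of the reverse boosting idea behind TrAdaBoost, the sketch below implements one weight-update step in NumPy. This is a minimal sketch under our own assumptions (the function name, the 0/1 toy labels, and the clipping constant are illustrative choices), not the exact procedure of any primary study:

```python
import numpy as np

def tradaboost_update(w, preds, y, n_src, n_iters):
    """One TrAdaBoost-style weight update (reverse boosting).

    w       : current instance weights, source instances first, then target
    preds   : base-learner predictions (0/1) for all instances
    y       : true labels (0/1)
    n_src   : number of source instances (the first n_src entries of w)
    n_iters : total number of boosting iterations (sets the source factor)
    """
    miss = (preds != y).astype(float)                  # 1 where misclassified
    # Weighted error of the base learner on the target portion only
    err_t = np.sum(w[n_src:] * miss[n_src:]) / np.sum(w[n_src:])
    err_t = min(max(err_t, 1e-10), 0.499)              # keep the factors well-defined
    beta_t = err_t / (1.0 - err_t)                     # AdaBoost factor for target data
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_iters))  # fixed source factor
    w_new = w.copy()
    w_new[:n_src] *= beta ** miss[:n_src]              # down-weight misclassified source
    w_new[n_src:] *= beta_t ** -miss[n_src:]           # up-weight misclassified target
    return w_new / w_new.sum()
```

The "reverse" part is the source update: misclassified source instances are treated as poorly transferable and lose weight, while misclassified target instances gain weight as in ordinary AdaBoost.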
K-Fold Cross-Validation: In this validation technique, the original data is randomly divided into K equal-sized subsets. Out of the K subsets, a single subset acts as verification data to perform testing, and the remaining K-1 subsets act as training data.
LOOCV: This validation technique is similar to K-fold cross-validation with K equal to N, the number of data points in the set. The function approximator is trained on all the data except one point, and that one point is used for prediction.
Hold-out Validation: This is a simple and commonly used validation technique in which the dataset is split into two sets: one is used as a training set and the other as a testing set. The training set is used to fit the function approximator, which then predicts the outcome values for the testing set provided as input.
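As a concrete illustration of the K-fold scheme described above, the following minimal sketch (plain Python, no external libraries; the function name and shuffling seed are our own choices) yields train/test index splits in which every sample is used for testing exactly once:

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # random division of the data
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]            # one subset acts as test data
        train_idx = idx[:start] + idx[start + size:]  # remaining K-1 subsets
        yield train_idx, test_idx
        start += size

# Five splits over ten samples; each sample appears in exactly one test fold.
splits = list(k_fold_indices(10, 5))
```

Setting k equal to the number of samples reduces this scheme to LOOCV, and a single train/test partition corresponds to hold-out validation.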
and the number of studies in which they are used. The various tests used in the studies
are one-way ANOVA, ANOVA, the paired t-test, the Wilcoxon test, the Wilcoxon rank-sum test,
Tukey’s Honest Significant Difference test, the Friedman test, the two-tailed t-test, and the
Kolmogorov–Smirnov test (K–S test or KS test). The Kruskal–Wallis H-test is rarely used as a
statistical test. From the observed data, it is concluded that Wilcoxon tests were used
in the majority of cases (PS10, PS11, PS13, PS23, PS34, PS36, PS37), as the Wilcoxon test is
a non-parametric test used to compare two samples. Furthermore, the Friedman test is used in
several studies (PS22, PS36, PS38) to compare multiple treatments. However, a limitation of
the Friedman test is that it can be applied only when there are at least three treatments. If
the result of the Friedman test leads to accepting the alternate hypothesis, then a post-hoc
analysis test must be performed; a comparison of two techniques can then be made using the
Nemenyi test, the Wilcoxon signed-rank test, or the Bonferroni-Dunn test.
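As a hedged illustration of this workflow (a Friedman test across three or more treatments, followed by a pairwise post-hoc comparison), the snippet below uses SciPy; the accuracy values are made-up placeholders, not results from the primary studies:

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical accuracy of three TL techniques over six datasets (illustrative only).
tca  = [0.71, 0.68, 0.74, 0.69, 0.72, 0.70]
tjm  = [0.69, 0.66, 0.71, 0.67, 0.70, 0.68]
artl = [0.73, 0.70, 0.75, 0.72, 0.74, 0.71]

# Friedman test: requires at least three treatments (here: three techniques).
stat, p = friedmanchisquare(tca, tjm, artl)

# If the null hypothesis is rejected, follow up with a pairwise post-hoc test,
# e.g. a Wilcoxon signed-rank test between two techniques.
stat2, p2 = wilcoxon(tca, artl)
```

A full Nemenyi or Bonferroni-Dunn post-hoc procedure would additionally correct the pairwise p-values for multiple comparisons.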
This section describes the various categories of TL methods used in the studies. There are
three categories: Transductive TL (TdTL), Inductive TL (IdTL), and Unsupervised TL (UnTL).
These categories can also be termed TL settings, in which the TL algorithms have been
performed; they are illustrated in Table 14. The categories differ from each other based on
the type of source data, the type of target data, the source and target domains, and the
source and target tasks. It has been observed that most of the studies employed feature TL
and instance TL with IdTL. Moreover, relational-knowledge and parameter transfer are also
feasible with IdTL; however, the existing studies explored feature-representation and
instance transfer-based learning. Knowledge transfer is easy with the features of different
projects: feature transfer considers the features of the source and target domains, and a
correlation needs to be established between the features of both projects. Based on feature
similarity, either features will be extracted directly, or a feature matching analyzer will
be used if there is a large amount of dissimilarity between the features of the source and
target projects. Parameter transfer is used when knowledge is transferred through changes
to default parameter values: the target project sets its algorithm parameter values
according to the source project. Hyperparameter optimization can also be employed for
parameter TL. In relational knowledge transfer, a relationship needs to be established
within the source project dataset, a prediction model must be designed using it, and this
prediction model is then used for knowledge transfer with the same methodology on the
target dataset. In instance transfer, knowledge is shared based on the instances of the
source project; the dataset must be preprocessed to apply instance TL. Thus, based on the
analysis of the existing literature, it has been concluded that feature transfer is the
most effective and efficient form of TL in the software engineering domain.
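Since the data distributions of the source and target projects must be checked before transfer, a simple sanity check is to estimate the distance between the two feature distributions. The sketch below computes a linear Maximum Mean Discrepancy (MMD) estimate in NumPy; the function name and the synthetic feature matrices are illustrative assumptions, and feature-transfer methods such as TCA minimize kernelized variants of this quantity:

```python
import numpy as np

def linear_mmd(source, target):
    """Squared MMD with a linear kernel: squared distance between feature means."""
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(diff @ diff)

rng = np.random.default_rng(0)
src      = rng.normal(0.0, 1.0, size=(200, 5))  # source project features
tgt_near = rng.normal(0.1, 1.0, size=(200, 5))  # similar distribution
tgt_far  = rng.normal(2.0, 1.0, size=(200, 5))  # strongly shifted distribution

# A larger MMD signals a bigger distribution gap, i.e. riskier transfer.
gap_near = linear_mmd(src, tgt_near)
gap_far  = linear_mmd(src, tgt_far)
```

In practice, a large gap would motivate either selecting a different source project or applying a distribution-adaptation technique before training.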
We have observed that TdTL (34.28%) has been the most widely used among all the categories.
IdTL (28.57%) has been used only in those studies related to multi-task learning
and self-taught learning. The last category, UnTL, has not been used in any of the studies.
Most of the studies considered labeled data in the source domain (SD), while in the case
of UnTL neither source domain (SD) labels nor target domain (TD) labels are
available. Four different approaches correspond to these TL settings, such as instance
Table 13 Description of statistical tests used
One-way ANOVA (PS14, PS21): The one-way analysis of variance technique is used to compare the means of two or more samples. This technique applies to numeric data only.
ANOVA (PS19, PS24): The analysis of variance technique is used to check whether there is a significant difference between the means of two or more groups. It checks dependency between factors with the help of a mean comparison of different samples.
Kruskal–Wallis H-test (PS2, PS18): The Kruskal–Wallis H-test is also called a one-way ANOVA on ranks. It is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. It is considered an extended version of the Mann–Whitney U test for comparison across more than two independent groups.
Paired t-test (PS20, PS32): The paired t-test is also known as the paired-sample t-test and dependent-sample t-test. It is a statistical test used to identify whether the mean difference between two sets of observations is zero.
Wilcoxon test (PS10, PS11, PS13, PS23, PS34, PS36, PS37): The Wilcoxon test has different variants and is a non-parametric test. One variant is the Wilcoxon signed-rank test, which is used to compare two related samples, matched samples, or repeated measurements over one sample to analyze the difference between their population mean ranks; it can be used to identify whether two dependent samples were selected from populations having a similar distribution.
Wilcoxon rank-sum test (PS9, PS16, PS35): The Wilcoxon rank-sum test is also known as the Mann–Whitney–Wilcoxon, Mann–Whitney U, or Wilcoxon–Mann–Whitney test. It is a non-parametric test of no effect, i.e. that a value randomly selected from one population sample is equally likely to be less than or greater than a value randomly selected from another population sample. Non-parametric means it makes no assumption of a Gaussian (normal) distribution. This test applies to independent samples.
Friedman test (PS22, PS36, PS38): A non-parametric alternative test used to test the difference across different groups when the target variable is of ordinal type.
Two-tailed t-test (PS22): In the two-tailed test, the critical area of the distribution is two-sided; it tests whether a sample is greater than or less than a certain range of values. It is used in null hypothesis testing.
KS test (PS34, PS35): A non-parametric test used to test the equality of continuous (or discontinuous) distributions.
Tukey’s HSD (PS19, PS24): Tukey’s Honest Significant Difference test, also known as Tukey’s range test, Tukey’s test, or the Tukey method, is a single-step multiple comparison procedure and statistical test. It can be applied to raw data or in combination with an ANOVA to find out which means differ from each other.
Bonferroni-Dunn test (PS36): Used to perform comparisons among multiple pairs of means (averages) among groups of data; mostly applied after a statistical test for mean comparison such as ANOVA.
Nemenyi test (PS38): Used as a post-hoc analysis test, like the Wilcoxon signed-rank test, following the Friedman test, to find out which groups differ. The hypotheses for the Friedman test concerning the Nemenyi test are as follows: the null hypothesis (H0) states that the mean value of each population is equal; the alternative hypothesis (Ha) states that at least one population mean differs from the others.
Kendall tau-b rank correlation coefficient (PS16): Used to find the strength and direction of association between two variables on an ordinal scale.
One-sided paired t-test (PS17): In a one-tailed test, the critical area of the distribution is one-sided; it tests whether a sample is greater than or less than a certain range of values, but not both.
This section discusses the TL algorithms that are effective against various traditional
learners. Traditional learners include ML techniques, which different authors compared with
their proposed algorithms. These studies have used different datasets over which comparisons
have been made. We have observed the values of accuracy, AUC, Recall, and F-measure for
analyzing the performance of TL algorithms, as these four metrics are the most frequently
used in the existing studies. In the comparative analysis concerning existing studies, the
combined dataset of results is collected and outliers are removed, since outliers lead to
biased results for the specified datasets; a boxplot is used to identify and remove these
outliers. Figure 13 presents the distribution of studies concerning accuracy values for all
the majorly used datasets. Figures 14 and 15 present the corresponding distribution for AUC
values, Fig. 16 for Recall values, and Fig. 17 for F-measure values. The descriptive
statistics of all the performance measures with respect to TL techniques are presented in
Table 15, including the minimum, maximum, mean, median, and standard deviation values.
In the study given by existing researchers [103], experiments were performed with
various TL and ML algorithms on different datasets. The TL algorithms used in the two
studies are GTL, TCA, TJM, and GFK, tested on five distortion profiles. It has been
observed that the traditional ML algorithm RF performed best, while the GFK and TJM
algorithms provided the worst results. The base classifier is the same for both of
these algorithms; other base classifiers are robust to noisy datasets,
unlike the 1-NN classifier, which is why these two algorithms show the worst
performance.
[Table 14 Transfer learning type and setting. Columns: TL Category; Relevant Field; Source Domain (SD) and Target Domain (TD); Source Labels (SL); Target Labels (TL); Source Task (ST) and Target Task (TT)]
When SVM is used as the base classifier, the TCA algorithm results in the
worst performance. Tables 16 and 17 present the statistics of performance measures
obtained from the existing study. The performance of the ARTL algorithm proved best in
comparison with the other TL algorithms in several settings; ARTL attempts to resolve
both marginal and conditional distribution differences, which can be a reason for this
strong performance. However, the overall conclusion of the study stated that the TCA
algorithm performs best of all the TL algorithms used for comparison, with the TJM
algorithm second best after TCA and the ARTL algorithm third after TJM. All of these
algorithms perform best or worst on different distortions.
In the existing study [103], five different TL methods were compared on different datasets,
with different statistical tests, against seven different base learners. The five TL
algorithms used are GFK, JDA, TJM, TKL, and TCA; the seven base learners are RF, SVM,
Discriminant Analysis, LR, 5NN, DT, and NB. AUC values were computed for four base learners
corresponding to the MAG, USPS, CCC, and CV datasets. In the next step, accuracy was
calculated for all algorithms corresponding to seven different distortion profiles. In the
third step, accuracy was computed over the seven base learners corresponding to each TL
algorithm. The best base learner for each TL algorithm was individually investigated with
Tukey’s HSD test, and an HSD group was assigned for every accuracy value.
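The per-learner comparison described above can be prototyped with a one-way ANOVA before any HSD grouping. In this hedged sketch the accuracy samples are invented placeholders, and SciPy's `f_oneway` stands in for the full Tukey HSD procedure used in the study:

```python
from scipy.stats import f_oneway

# Hypothetical accuracies of three base learners over repeated runs (illustrative only).
rf  = [0.79, 0.78, 0.80, 0.77, 0.79]
svm = [0.74, 0.73, 0.75, 0.72, 0.74]
nb  = [0.70, 0.71, 0.69, 0.70, 0.72]

# One-way ANOVA: do the mean accuracies of the learners differ significantly?
stat, p = f_oneway(rf, svm, nb)
# A small p-value justifies a post-hoc test (e.g. Tukey's HSD) to identify
# which particular learners differ from each other.
```

The ANOVA only says that some mean differs; the HSD step is what groups learners whose accuracies are statistically indistinguishable.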
This section discusses the threats to validity for TL based on the similarity between
domains and the kind of data used in the source and target domains. Based on data
Table 15 (continued): Technique; Performance Measure; Minimum; Maximum; Mean; Median; Standard Deviation
CCI-60-C1: RF, 79.59%
LCI-60-C0: RF, 73.87%
FFB-1: RF, 79.28%
CFB-2: ARTL, 76.56%
DCB-80: RF, 76.22%
Table 17 Accuracy and HSD group ranking of best and worst base learners for each TL algorithm used (columns: TL Algorithm; Best/Worst Base Learner; Accuracy; HSD Group Ranking)
7 Limitations
This SR examined various studies and selected primary studies to analyze the existing TL
algorithms that have been designed to perform different experiments. This SR also identifies
the ML techniques that are used for TL. An exhaustive search was conducted to collect
studies from all the digital libraries, and 39 primary studies were selected. Out of the 39
primary studies, only some are directly related to the software engineering field, which is
one limitation of this review; thus, the effect of TL in software engineering is not
conclusive. The primary studies considered for this review performed different experiments,
and every study uses a different TL algorithm. Moreover, there is a threat that some
relevant studies were excluded after applying the exclusion criterion. We assumed that the
primary studies are non-biased and impartial; if bias exists, it is a threat to the validity
of this review. To conduct this SR, studies with ML and TL techniques were considered only
with the specified measures and validation methods; more techniques could be explored,
including further datasets, performance measures, and validation methods. Moreover, the
results in each primary study depend on the experimental setting used, such as the dataset,
variables, feature selection techniques, validation method, type of projects, and
programming language. Thus, a threat to validity can occur, although statistical analysis
of the results has been performed in this SR.
8 Applications
TL is a technique used to train a model on one task and reuse it for learning on
another task by establishing a relationship between the data distributions of both tasks.
The operational success of this SR is helpful for academicians, researchers, and industry
experts to develop more reliable and robust software in the future. The authors tested the
capability of these models for different versions of the same projects, termed inter-project
validation. The performance of inter-project validation using TL is more efficient and
Discriminability Based TL (DBT): It has been demonstrated that destination networks initialized via DBT learn much faster than networks initialized randomly. DBT shows a considerable and important learning-speed improvement over randomly initialized networks, and it is superior to literal transfer and to directly using the destination network on the destination task.
Task clustering: It provides an idea for achieving inductive transfer in classifier design with the help of labeled data from related classification problems to solve a particular classification problem.
TNB: It improves performance on datasets collected from various companies, i.e. cross-company data.
Graph co-regularized TL (GTL): The main focus of GTL is TdTL, in which the source domain has generously labeled examples while the destination domain consists of unlabeled examples only. GTL discovers the latent features shared across domains as the bridge to transfer knowledge, simultaneously maximizing the empirical likelihood of all the domains and preserving the geometric structure in every domain.
TCA: Advantages: TCA learns transfer components shared by both domains such that, when the data is projected onto the learned subspace, the difference in data distributions across domains is reduced while important data properties are preserved. It is beneficial to use traditional ML methods in this subspace to train classification and regression models over various domains. If two or more domains are associated with each other, there may exist various shared components between them.
HHTL: Advantages: It is useful for transferring knowledge over various feature spaces while concurrently rectifying the data error on the transformed feature space. The performance of HHTL is best and more stable when the size of the parallel data is increased. HHTL is effective and robust for cross-language sentiment classification.
Instance-based techniques: Advantages: These techniques handle instances by removing outliers, relevant filtering, or weighting of instances.
Distribution-based techniques: These techniques aim at managing the instance distribution for training and testing sets with the help of stratification, cost curves, and mixture models.
Table 18 (continued): TL Technique; Advantages and Disadvantages of TL techniques
GA for Feature-Space Remapping (GAFSR) and Greedy Search for Feature-Space Remapping (GrFSR): Advantages: These are informed, supervised learning techniques. The benefit of FSR is that it is applicable in both the informed and the uninformed case. The main advantage of GAFSR is that it achieves the best performance scores across all the metrics. Disadvantages: It takes more time to execute in comparison to IFS, where the computation count is low but the performance score is high.
Stacking: Advantages: It is beneficial in terms of combining stacking with IFSR, and IFSR uses labeled data. Disadvantage: To train ensemble classifiers, it needs labeled data.
Canonical Correlation Analysis (CCA): Advantages: It is an effective TL method used to make the distributions of the training and testing data of different companies similar. CCA with CCDP is effective for HCCDP. CCA acts as a powerful tool in multivariate data analysis to establish the correlation between two different sets of variables.
Feature-Space Remapping (FSR): Advantages: It can manage various feature spaces without using any co-occurrence data. This technique uses raw data that is already mapped onto a feature space, hence the name remapping. It requires a low amount of labeled data in the target domain, which is needed to understand the relations to the training domain. It can increase the classification accuracy in the target domain by combining the relevant information from the training domain with the help of ensemble learners.
TNB: Advantages: TNB performs better on the SOFTLAB dataset but not on the NASA dataset. TNB works for both within-company and cross-company settings; the author focused on cross-company defect prediction. TNB outperforms Naive Bayes in terms of performance measures such as F-measure and AUC over within-company and cross-company defect prediction. Disadvantages: TNB is limited to a particular company dataset.
Ensemble technique: Advantages: This technique performs better than a classifier trained using a huge amount of labeled data in the destination domain.
Voting ensemble: Advantages: It is the simplest method for combining multiple classifiers.
Bellwether: Advantages: Bellwether can be used efficiently when the availability of historical data is limited or negligible; due to a lack of historical data, developers try to get data from other projects. It has been examined that, irrespective of the granularity of the data, there exists a bellwether dataset that can be used for training defect prediction models. The bellwether does not require elaborate data mining methods to discover, and it can be identified during the early phase of the project life cycle.
8.1 Economic analysis
In this section, the importance of assessing the economic impact of TL in software
engineering is discussed. The evaluation of the value or wealth derived from the usage of TL
is a crucial aspect of its socio-economic implications in the software engineering domain.
In the future, we will use an economic evaluation approach to quantify and analyze the
economic impact of this SR at local and global scales. The economic analysis includes
cost-benefit analysis, Return on Investment (ROI), and socio-economic impact analysis. The
cost-benefit analysis will be done by analyzing the cost of implementing TL in the industry
through reusability of existing code, including the cost of acquiring data, model
development, and maintenance; it covers development efficiency, product quality, and the
reduced time-to-market cost of the software. Further, ROI will be computed by comparing the
financial gains achieved through the implementation of TL against the initial investment
made. The ROI analysis would help business associates and software developers assess the
profitability and viability of integrating TL techniques into the software engineering
workflow. Furthermore, the socio-economic impact analysis will consider various aspects
such as innovation stimulation, social welfare improvement, and the employment of efficient
software developers. This analysis will thus provide a holistic perspective on the
value-creation potential of TL within the software engineering ecosystem, delivering
valuable financial and socio-economic insight to researchers, academicians, and industry
experts in the upcoming years.
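A minimal sketch of the ROI computation described above; the cost and gain figures are entirely hypothetical placeholders, not data from this SR:

```python
def roi(gain, investment):
    """Return on Investment as a fraction: (gain - investment) / investment."""
    return (gain - investment) / investment

# Hypothetical figures: reusing a trained defect-prediction model saves
# 120 person-hours of labeling and testing effort (valued at 60 currency
# units per hour), while adapting the model costs 3000 units.
gain = 120 * 60          # 7200 units of financial gain
investment = 3000        # initial investment in data, adaptation, maintenance
result = roi(gain, investment)   # 1.4, i.e. a 140% return
```

A positive ROI indicates that the transfer paid for itself; the cost-benefit analysis would additionally weigh qualitative factors such as product quality and time-to-market.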
9 Conclusion
In this paper, we have performed an SR of TL using ML techniques. We have studied and
examined the various TL algorithms in the fields of artificial intelligence, ML, and
software engineering. Firstly, we performed a deep analysis following a sequence of
systematic steps and identified 39 primary studies during this period (1991–2024). Secondly,
the quality attributes that are focused on TL are discussed. Thirdly, the characteristics or
experimental settings of the primary studies have been discussed based on the dataset,
independent variables, TL algorithms, validation techniques, performance measures, and
statistical tests. Fourth, we have analyzed the comparison of various TL techniques with
traditional ML algorithms as a base learner. In the end, the merits and demerits of TL tech-
niques are summarized. The relevant outcomes obtained from the primary studies selected
for this review are as follows:
• The quality attributes that have been used for TL are accuracy, effectiveness, perfor-
mance, reliability, effort, and defect. The most commonly used quality attributes are
performance and effectiveness, used in 32% and 23% of the studies, respectively. No study
has been conducted for change prediction using TL.
• The ML techniques were categorized into different classes such as SVM, EL, DT, BL,
NB, and Miscellaneous. The most commonly used ML techniques for TL were SVM, RF,
and NB.
• The most commonly used dataset for performing experiments is NASA in the litera-
ture.
• The independent variables that have been used by various studies do not exhibit any
relationship with each other.
• The algorithms that have been used by the selected studies differ and these algorithms
depend on the type of training and target dataset.
• The validation technique that has been used in most of the primary studies is K-fold
cross-validation. In K-fold cross-validation, the original dataset is used for both training
as well as validation, and it uses every sample for validation exactly once.
• The performance measure that has been used by most studies is accuracy followed by
AUC, F-measure, and recall.
• The TL categories used in the selected primary studies are IdTL and TdTL. UnTL has
not been used in any study. The instance transfer setting has been mostly used in IdTL,
and feature-representation transfer has been mostly used in TdTL.
Data availability The details of the selected primary studies used as data in this article are specified in
Table 3.
Declarations
Ethical Approval and consent to participate This article does not contain any studies with human partici-
pants or animals performed by any of the authors.
Conflict of Interest The authors declare that they have no conflicts of interest.
References
1. Joachims T (1999) Transductive inference for text classification using support vector machines. In: ICML, pp 200–209. https://dl.acm.org/doi/10.5555/645528.657646
2. Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big data 3:1–40.
https://doi.org/10.1186/s40537-016-0043-6
3. Zhao P, Liu Y, Lu Y, Xu B (2019) A sketch recognition method based on transfer deep learning with the fusion of multi-granular sketches. Multimed Tools Appl 78:35179–35193. https://doi.org/10.1007/s11042-019-08216-6
4. Day O, Khoshgoftaar TM (2017) A survey on heterogeneous transfer learning. J Big Data 4:29.
https://doi.org/10.1186/s40537-017-0089-0
5. Priyadarshini I, Sahu S, Kumar R (2023) A transfer learning approach for detecting offensive and hate speech on social media platforms. Multimed Tools Appl 82:27473–27499. https://doi.org/10.1007/s11042-023-14481-3
6. Chen J, Sun J, Li Y, Hou C (2022) Object detection in remote sensing images based on deep trans-
fer learning. Multimed Tools Appl 81:12093–12109. https://doi.org/10.1007/s11042-021-10833-z
7. Kang J, Gwak J (2022) Ensemble of multi-task deep convolutional neural networks using transfer learning for fruit freshness classification. Multimed Tools Appl 81:22355–22377. https://doi.org/10.1007/s11042-021-11282-4
8. Varshney N, Bakariya B, Kushwaha AKS (2022) Human activity recognition using deep trans-
fer learning of cross position sensor based on vertical distribution of data. Multimed Tools Appl
81:22307–22322. https://doi.org/10.1007/s11042-021-11131-4
9. Rashedi E, Nezamabadi-pour H, Saryazdi S (2009) GSA: A Gravitational Search Algorithm. Inf
Sci (Ny) 179:2232–2248. https://doi.org/10.1016/j.ins.2009.03.004
10. Ornek AH, Ceylan M (2022) Medical thermograms’ classification using deep transfer learning models and methods. Multimed Tools Appl 81:9367–9384. https://doi.org/10.1007/s11042-021-11852-6
11. Taylor ME, Stone P (2009) Transfer Learning for Reinforcement Learning Domains : A Survey. J
Mach Learn Res 10:1633–1685. https://doi.org/10.1145/1577069.1755839
12. Xu Q, Yang Q (2011) A Survey of Transfer and Multitask Learning in Bioinformatics. J Comput Sci
Eng 5:257–268. https://doi.org/10.5626/jcse.2011.5.3.257
13. Lu J, Behbood V, Hao P et al (2015) Transfer learning using computational intelligence: A survey.
Knowledge-Based Syst 80:14–23. https://doi.org/10.1016/j.knosys.2015.01.010
14. Ribani R, Marengoni M (2019) A Survey of Transfer Learning for Convolutional Neural Networks. In: Proc 32nd Conf Graph Patterns Images Tutorials (SIBGRAPI-T 2019), pp 47–57. https://doi.org/10.1109/SIBGRAPI-T.2019.00010
15. Cook D, Feuz KD, Krishnan NC (2013) Transfer Learning for Activity Recognition: A Survey.
Knowl Inf Syst 36:537–556
16. Mohammadi A, Zahiri SH (2018) Inclined planes system optimization algorithm for IIR system iden-
tification. Int J Mach Learn Cybern 9:541–558. https://doi.org/10.1007/s13042-016-0588-x
17. Mohammadi A, Zahiri SH (2017) IIR model identification using a modified inclined planes system
optimization algorithm. Artif Intell Rev 48:237–259. https://doi.org/10.1007/s10462-016-9500-z
18. Mohammadi A, Sheikholeslam F, Mirjalili S (2022) Inclined planes system optimization: the-
ory, literature review, and state-of-the-art versions for IIR system identification. Expert Syst Appl
200:117127. https://doi.org/10.1016/j.eswa.2022.117127
19. Esfahrood SM, Mohammadi A, Zahiri SH (2019) A simplified and efficient version of inclined planes
system optimization algorithm. In: 2019 5th Conference on Knowledge Based Engineering and Inno-
vation (KBEI), pp 504–509. https://doi.org/10.1109/KBEI.2019.8735044
20. Mohammadi A, Sheikholeslam F, Mirjalili S (2023) Nature-inspired metaheuristic search algorithms
for optimizing benchmark problems: inclined planes system optimization to state-of-the-art methods.
Arch Comput Methods Eng 30(1):331–389. https://doi.org/10.1007/s11831-022-09800-0
21. Pan W (2016) A survey of transfer learning for collaborative recommendation with auxiliary data.
Neurocomputing 177:447–453. https://doi.org/10.1016/j.neucom.2015.11.059
22. Ali SMM, Augusto JC, Windridge D (2019) A Survey of User-Centred Approaches for Smart Home
Transfer Learning and New User Home Automation Adaptation. Appl Artif Intell 33:747–774.
https://doi.org/10.1080/08839514.2019.1603784
23. Liu R, Shi Y, Ji C, Jia M (2019) A Survey of Sentiment Analysis Based on Transfer Learning. IEEE
Access 7:85401–85412. https://doi.org/10.1109/ACCESS.2019.2925059
24. Liu Y, Li Z, Liu H, Kan Z (2020) Skill transfer learning for autonomous robots and human–robot
cooperation: A survey. Rob Auton Syst 128:103515. https://doi.org/10.1016/j.robot.2020.103515
Multimedia Tools and Applications (2024) 83:87237–87298 87295
25. Zhao C (2020) A Survey on Image Style Transfer Approaches Using Deep Learning. J Phys Conf
Ser 1453:012129. https://doi.org/10.1088/1742-6596/1453/1/012129
26. Niu S, Liu Y, Wang J, Song H (2020) A Decade Survey of Transfer Learning (2010–2020). IEEE
Trans Artif Intell 1:151–166. https://doi.org/10.1109/TAI.2021.3054609
27. Sufian A, Ghosh A, Sadiq AS, Smarandache F (2020) A Survey on Deep Transfer Learning to
Edge Computing for Mitigating the COVID-19 Pandemic: DTL-EC. J Syst Archit 108:101830.
https://doi.org/10.1016/j.sysarc.2020.101830
28. Wei W, Huerta EA, Whitmore BC et al (2020) Deep transfer learning for star cluster classification:
I. application to the PHANGS-HST survey. Mon Not R Astron Soc 493:3178–3193. https://doi.
org/10.1093/mnras/staa325
29. Zhao W, Queralta JP, Westerlund T (2020) Sim-to-Real Transfer in Deep Reinforcement
Learning for Robotics: A Survey. Proc IEEE Symp Ser Comput Intell (SSCI), pp 737–744. https://doi.org/10.1109/SSCI47803.2020.9308468
30. Dhyani B (2021) Transfer Learning in Natural Language Processing: A Survey. Math Stat Eng
Appl 70:303–311. https://doi.org/10.17762/msea.v70i1.2312
31. Panigrahi S, Nanda A, Swarnkar T (2021) A Survey on Transfer Learning. Smart Innov Syst Tech-
nol 194:781–789. https://doi.org/10.1007/978-981-15-5971-6_83
32. Liu X, Li J, Ma J et al (2023) Deep transfer learning for intelligent vehicle perception: A survey.
Green Energy Intell Transp 2:100125. https://doi.org/10.1016/j.geits.2023.100125
33. Al-Hajj R, Assi A, Neji B, Ghandour R, Al Barakeh Z (2023) Transfer learning for renewable
energy systems: a survey. Sustainability 15(11):9131. https://doi.org/10.3390/su15119131
34. Yao S, Kang Q, Zhou MC et al (2023) A survey of transfer learning for machinery diagnostics and
prognostics. Springer, Netherlands
35. Chato L, Regentova E (2023) Survey of transfer learning approaches in the machine learning of
digital health sensing data. J Pers Med 13(12):1703. https://doi.org/10.3390/jpm13121703
36. Haque R, Ali A, McClean S et al (2024) Heterogeneous Cross-Project Defect Prediction Using
Encoder Networks and Transfer Learning. IEEE Access 12:409–419. https://doi.org/10.1109/ACCESS.2023.3343329
37. Xie W, Zhang C, Jia K et al (2023) Cross-Project Aging-Related Bug Prediction Based on Feature
Transfer and Class Imbalance Learning. Proc 2023 IEEE 34th Int Symp Softw Reliab Eng Workshops (ISSREW), pp 206–213. https://doi.org/10.1109/ISSREW60843.2023.00075
38. Wu J, Wu Y, Niu N, Zhou M (2021) MHCPDP: multi-source heterogeneous cross-project defect
prediction via multi-source transfer learning and autoencoder. Softw Qual J 29:405–430. https://
doi.org/10.1007/s11219-021-09553-2
39. Liu C, Yang D, Xia X et al (2019) A two-phase transfer learning model for cross-project defect
prediction. Inf Softw Technol 107:125–136. https://doi.org/10.1016/j.infsof.2018.11.005
40. Li K, Xiang Z, Chen T, Wang S, Tan KC (2020) Understanding the automated parameter optimi-
zation on transfer learning for cross-project defect prediction: an empirical study. In: Proceedings
of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE ’20). Associa-
tion for Computing Machinery, New York, NY, pp 566–577. https://doi.org/10.1145/3377811.
3380360
41. Chen Y, Dai H (2021) Improving cross-project defect prediction with weighted software modules
via transfer learning. J Phys Conf Ser 2025:012100. https://doi.org/10.1088/1742-6596/2025/1/012100
42. Zeng F, Lin W, Xing Y, et al (2022) A Cross-project Defect Prediction Model Using Feature
Transfer and Ensemble Learning. Teh Vjesn 29:1089–1099. https://doi.org/10.17559/TV-20220
421110027
43. Lei T, Xue J, Wang Y et al (2022) WCM-WTrA: A Cross-Project Defect Prediction Method Based
on Feature Selection and Distance-Weight Transfer Learning. Chinese J Electron 31:354–366.
https://doi.org/10.1049/cje.2021.00.119
44. Tang S, Huang S, Zheng C, et al (2022) A novel cross-project software defect prediction algorithm
based on transfer learning. Tsinghua Sci Technol 27:41–57. https://doi.org/10.26599/TST.2020.
9010040
45. Zou J, Li Z, Liu X, Tong H (2023) MSCPDPLab: A MATLAB toolbox for transfer learning based
multi-source cross-project defect prediction. SoftwareX 21:101286. https://doi.org/10.1016/j.
softx.2022.101286
46. Bai J, Jia J, Capretz LF (2022) A three-stage transfer learning framework for multi-source cross-
project software defect prediction. Inf Softw Technol 150:106985. https://doi.org/10.1016/j.infsof.
2022.106985
47. Du X, Zhou Z, Yin B, Xiao G (2020) Cross-project bug type prediction based on transfer learning.
Softw Qual J 28:39–57. https://doi.org/10.1007/s11219-019-09467-0
48. Xu Z, Pang S, Zhang T et al (2019) Cross Project Defect Prediction via Balanced Distribution
Adaptation Based Transfer Learning. J Comput Sci Technol 34:1039–1062. https://doi.org/10.
1007/s11390-019-1959-z
49. Canfora G, De Lucia A, Di Penta M et al (2013) Multi-objective cross-project defect prediction.
Proc IEEE 6th Int Conf Softw Testing, Verif Validation (ICST), pp 252–261. https://doi.org/10.1109/ICST.2013.38
50. Hosseini S, Turhan B, Mäntylä M (2016) Search based training data selection for cross project
defect prediction. In: Proceedings of the 12th international conference on predictive models and
data analytics in software engineering, pp 1–10. https://doi.org/10.1145/2972958.2972964
51. Zhao Y, Zhu Y, Yu Q, Chen X (2021) Cross-project defect prediction method based on manifold
feature transformation. Future Internet 13(8):216. https://doi.org/10.3390/fi13080216
52. Rhmann W (2020) Cross project defect prediction using hybrid search based algorithms. Int J Inf
Technol 12:531–538
53. Jin C (2021) Cross-project software defect prediction based on domain adaptation learning and
optimization. Expert Syst Appl 171:114637. https://doi.org/10.1016/j.eswa.2021.114637
54. Deepalakshmi J, Chandran M (2022) An optimized clustering model for heterogeneous cross-pro-
ject defect prediction using Quantum Crow search. In: 1st Int Conf Softw Eng Inf Technol (ICo-
SEIT), pp 30–35. https://doi.org/10.1109/ICoSEIT55604.2022.10030011
55. Xing Y, Lin W, Lin X, Yang B, Tan Z (2022) Cross-project defect prediction based on two-phase
feature importance amplification. Comput Intell Neurosci 2022:2320447. https://doi.org/10.1155/2022/2320447
56. Aljaidi M, Gul S, Faiz R, Samara G, Alsarhan A, al-Qerem A (2023) Impact evaluation of signifi-
cant feature set in cross project for defect prediction through hybrid feature selection in multiclass.
bioRxiv 2023-07
57. Hu Z, Zhu Y (2023) Cross-project defect prediction method based on genetic algorithm feature
selection. Eng Reports 5(12):e12670. https://doi.org/10.1002/eng2.12670
58. Faiz RB, Shaheen S, Sharaf M, Rauf HT (2023) Optimal Feature Selection through Search-
Based Optimizer in Cross Project. Electronics 12(3):514. https://doi.org/10.3390/electronics12030514
59. Gottumukkala DP, Ushasree D, Suneetha TV (2024) Software Defect Prediction Through Effec-
tive Weighted Optimization Model for Assured Software Quality. Int J Intell Syst Appl Eng
12:619–633
60. Hu Z, Zhu Y (2023) Cross-project defect prediction method based on genetic algorithm feature
selection. Engineering Reports 5(12):e12670. https://doi.org/10.1002/eng2.12670
61. Faiz RB, Shaheen S, Sharaf M, Rauf HT (2023) Optimal feature selection through search-based
optimizer in cross project. Electronics 12(3):514. https://doi.org/10.3390/electronics12030514
62. Kitchenham BA (2012) Systematic review in software engineering: where we are and where we
should be going. Proc 2nd Int Workshop Evidential Assess Softw Technol (EAST ’12), pp 1–2. https://doi.org/10.1145/2372233.2372235
63. Malhotra R (2016) Empirical research in software engineering: concepts, analysis, and applications. CRC Press
64. Pratt LY (1992) Discriminability-based transfer between neural networks. Advances in Neural
Information Processing Systems 5:204–211
65. Feuz KD, Cook DJ (2015) Transfer learning across feature-rich heterogeneous feature spaces via fea-
ture-space remapping (FSR). ACM Trans Intell Syst Technol 6:. https://doi.org/10.1145/2629528
66. Do CB, Ng AY (2005) Transfer learning for text classification. Adv Neural Inf Process Syst
18:299–306
67. Liu X, Liu Z, Wang G et al (2017) Ensemble Transfer Learning Algorithm. IEEE. Access 6:2389–
2396. https://doi.org/10.1109/ACCESS.2017.2782884
68. Raina R, Ng AY, Koller D (2006) Constructing informative priors using transfer learning. In: Pro-
ceedings of the 23rd international conference on Machine learning, pp 713–720. https://doi.org/10.1145/1143844.1143934
69. Yu Q, Jiang S, Zhang Y (2017) A feature matching and transfer approach for cross-company defect
prediction. J Syst Softw 132:366–378. https://doi.org/10.1016/j.jss.2017.06.070
70. Mihalkova L, Huynh T, Mooney RJ (2007) Mapping and revising Markov logic networks for
transfer learning. Proc AAAI 7:608–614
71. Weiss K, Khoshgoftaar T (2018) Evaluation of transfer learning algorithms using different base
learners. Proc Int Conf Tools with Artif Intell (ICTAI), pp 187–196. https://doi.org/10.1109/ICTAI.2017.00039
72. Pan SJ, Kwok JT, Yang Q (2008) Transfer learning via dimensionality reduction. Proc 23rd AAAI
Conf Artif Intell, pp 677–682
73. Pereira FLF, Dos Santos Lima FD, De Moura Leite LG et al (2017) Transfer learning for Bayesian
networks with application on hard disk drives failure prediction. Proc 2017 Brazilian Conf Intell
Syst (BRACIS), pp 228–233. https://doi.org/10.1109/BRACIS.2017.64
74. Dai W, Jin O, Xue GR et al (2009) EigenTransfer: a unified framework for transfer learning. Proc 26th
Annu Int Conf Mach Learn, pp 193–200. https://doi.org/10.1145/1553374.1553399
75. Gargees R, Keller J, Popescu M (2017) Early illness recognition in older adults using transfer learn-
ing. Proc 2017 IEEE Int Conf Bioinforma Biomed (BIBM), pp 1012–1016. https://doi.org/10.1109/BIBM.2017.8217795
76. Li B, Yang Q, Xue X (2009) Transfer learning for collaborative filtering via a rating-matrix genera-
tive model. Proc 26th Annu Int Conf Mach Learn, pp 1–8. https://doi.org/10.1145/1553374.1553454
77. Yan S, Shen B, Mo W, Li N (2018) Transfer Learning for Cross-Platform Software Crowdsourcing
Recommendation. Proc Asia-Pacific Softw Eng Conf (APSEC), pp 269–278. https://doi.org/10.1109/APSEC.2017.33
78. Wan J, Wang X, Yin Y, Zhou R (2015) Transfer Learning in Collaborative Filtering for Sparsity
Reduction Via Feature Tags Learning Model. Adv Sci Technol Lett 81:56–60. https://doi.org/10.14257/astl.2015.81.12
79. Chen Y, Ding X (2018) Research on cross-project software defect prediction based on transfer learn-
ing. AIP Conf Proc 1955. https://doi.org/10.1063/1.5033747
80. Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect predic-
tion. Inf Softw Technol 54:248–256. https://doi.org/10.1016/j.infsof.2011.09.007
81. Krishna R, Menzies T (2019) Bellwethers: A Baseline Method for Transfer Learning. IEEE Trans
Softw Eng 45:1081–1105. https://doi.org/10.1109/TSE.2018.2821670
82. Long M, Wang J, Ding G et al (2012) Transfer learning with graph co-regularization. Proc Natl Conf
Artif Intell 2:1033–1039. https://doi.org/10.1609/aaai.v26i1.8290
83. Nam J, Fu W, Kim S et al (2018) Heterogeneous Defect Prediction. IEEE Trans Softw Eng 44:874–
896. https://doi.org/10.1109/TSE.2017.2720603
84. Nam J, Pan SJ, Kim S (2013) Transfer defect learning. Proc - Int Conf Softw Eng 382–391. https://
doi.org/10.1109/ICSE.2013.6606584
85. Deshmukh AA, Laftchiev E (2018) Semi-supervised transfer learning using marginal predictors.
Proc IEEE Data Sci Workshop (DSW), pp 160–164
86. Zhou JT, Pan SJ, Tsang IW, Yan Y (2014) Hybrid heterogeneous transfer learning through deep learn-
ing. Proc Natl Conf Artif Intell 3:2213–2219. https://doi.org/10.1609/aaai.v28i1.8961
87. Wei Y, Zhang Y, Huang J, Yang Q (2018) Transfer Learning via Learning to Transfer. Proc 35th Int
Conf Mach Learn (ICML), PMLR 80:5085–5094
88. Kocaguneli E, Menzies T, Mendes E (2015) Transfer learning in effort estimation. Empir Softw Eng
20:813–843. https://doi.org/10.1007/s10664-014-9300-5
89. Cui Y, Song Y, Sun C et al (2018) Large Scale Fine-Grained Categorization and Domain-Specific
Transfer Learning. Proc IEEE Conf Comput Vis Pattern Recognit (CVPR), pp 4109–4118. https://doi.org/10.1109/CVPR.2018.00432
90. Feuz KD, Cook DJ (2014) Heterogeneous transfer learning for activity recognition using heu-
ristic search techniques. Int J Pervasive Comput Commun 10:393–418. https://doi.org/10.1108/
IJPCC-03-2014-0020
91. Chen J, Yang Y, Hu K et al (2019) Multiview transfer learning for software defect prediction. IEEE
Access 7:8901–8916. https://doi.org/10.1109/ACCESS.2018.2890733
92. Qing H, Biwen L, Beijun S, Xia Y (2015) Cross-project software defect prediction using feature-
based transfer learning. In: Proceedings of the 7th Asia-Pacific Symposium on Internetware, pp
74–82. https://doi.org/10.1145/2875913.2875944
93. Tong H, Liu B, Wang S, Li Q (2019) Transfer-learning oriented class imbalance learning for cross-
project defect prediction. https://doi.org/10.48550/arXiv.1901.08429
94. Jing X, Wu F, Dong X et al (2015) Heterogeneous cross-company defect prediction by unified metric
representation and CCA-based transfer learning. Proc 10th Joint Meeting Eur Softw Eng Conf and
ACM SIGSOFT Symp Found Softw Eng (ESEC/FSE 2015), pp 496–507. https://doi.org/10.1145/2786805.2786813
95. Cao Q, Sun Q, Cao Q, Tan H (2015) Software defect prediction via transfer learning based neural
network. Proc 2015 1st Int Conf Reliab Syst Eng (ICRSE). https://doi.org/10.1109/ICRSE.2015.7366475
96. Krishna R, Menzies T, Fu W (2016) Too much automation? The bellwether effect and its implications
for transfer learning. Proc 31st IEEE/ACM Int Conf Autom Softw Eng (ASE 2016), pp 122–131. https://doi.org/10.1145/2970276.2970339
97. Weiss KR, Khoshgoftaar TM (2017) An investigation of transfer learning and traditional machine
learning algorithms. Proc 2016 IEEE 28th Int Conf Tools with Artif Intell (ICTAI), pp 283–290. https://doi.org/10.1109/ICTAI.2016.48
98. Su KM, Robbins KA, Hairston WD (2017) Adaptive thresholding and reweighting to improve domain
transfer learning for unbalanced data with applications to EEG imbalance. Proc 2016 15th IEEE Int
Conf Mach Learn Appl (ICMLA), pp 320–325. https://doi.org/10.1109/ICMLA.2016.34
99. Jing XY, Wu F, Dong X, Xu B (2017) An Improved SDA Based Defect Prediction Framework for
Both Within-Project and Cross-Project Class-Imbalance Problems. IEEE Trans Softw Eng 43:321–
339. https://doi.org/10.1109/TSE.2016.2597849
100. Wu F, Jing XY, Dong X et al (2017) Cross-project and within-project semi-supervised software
defect prediction problems study using a unified solution. Proc 2017 IEEE/ACM 39th Int Conf
Softw Eng Companion (ICSE-C), pp 195–197. https://doi.org/10.1109/ICSE-C.2017.72
101. Duan L, Tsang IW, Xu D (2012) Domain transfer multiple kernel learning. IEEE Trans Pattern Anal
Mach Intell 34:465–479. https://doi.org/10.1109/TPAMI.2011.114
102. Wei Y, Zhang Y, Huang J, Yang Q (2018) Transfer learning via learning to transfer. Proc 35th Int Conf
Mach Learn (ICML), PMLR 80:5085–5094
103. Weiss KR, Khoshgoftaar TM (2017) Detection of Phishing Webpages Using Heterogeneous Transfer
Learning. Proc 2017 IEEE 3rd Int Conf Collab Internet Comput (CIC), pp 190–197. https://doi.org/10.1109/CIC.2017.00034
104. Xu Y, Pan SJ, Xiong H et al (2017) A Unified Framework for Metric Transfer Learning. IEEE Trans
Knowl Data Eng 29:1158–1171. https://doi.org/10.1109/TKDE.2017.2669193
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.