23_Domain_Adaptation_Challenges_Methods_Datasets_and_Applications (1)
ABSTRACT Deep Neural Networks (DNNs) trained on one dataset (source domain) do not perform well
on another set of data (target domain), which is different but has similar properties as the source domain.
Domain Adaptation (DA) strives to alleviate this problem and has great potential in its application in practical
settings, real-world scenarios, industrial applications and many data domains. Various DA methods aimed at
individual data domains have been reported in the last few years; however, there is no comprehensive survey
that encompasses all these data domains, focuses on the datasets available, the methods relevant to each
domain, and importantly the applications and challenges. To that end, this survey paper discusses how DA can
help DNNs work efficiently in these settings by reviewing DA methods and techniques. We have considered
five data domains: computer vision, natural language processing, speech, time-series, and multi-modal
data. We present a comprehensive taxonomy, including the methods, datasets, challenges, and applications
corresponding to each domain. Our goal is to discuss industrial use cases and DA implementation for those.
Our final aim is to provide future research directions based on evolving methods and results, the datasets
used, and industrial applications.
INDEX TERMS Artificial intelligence, computer vision, deep neural network, domain adaptation, multi-
modal data, natural language processing.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023 6973
P. Singhal et al.: Domain Adaptation: Challenges, Methods, Datasets, and Applications
the state-of-the-art models are limited to only some academic datasets. The performance degradation is caused by domain shift (domain gap or dataset bias): the difference in data distributions between source and target domains. DA is a field of AI that aims to alleviate, as far as possible, the impact of domain shift and to ensure that models perform well in the target domain after being trained on the source domain. The target and source domains should have some similarities (e.g., features) for a meaningful adaptation.
DA provides an attractive option for Deep Learning (DL), which, more often than not, provides higher performance than shallow or classical learning algorithms. DA removes the need for a vast amount of labeled data in the target domain and typically uses the available (labeled) data in the source domain, a boon to data-hungry supervised DL algorithms. Realistically, there is an abundance of unlabeled data available, but labeled data is scarce. Several techniques have been tried to improve the performance of deep networks, including using more (labeled) data from the target domain, better or alternative architectures and backbones, normalization layers (e.g., Instance Normalization (IN) [2], Batch Normalization (BN) [3]), and data generation and data augmentation. DA appears to provide a more robust alternative to all of these techniques.
Initial work on DA is related to shallow (or classical) learning. With DL more prevalent in recent years, the focus of research shifted to DA in DL. The invention of GAN [4] and of Attention and Attention-based Transformers [5] has boosted various DA methods in DL. The research direction and focus now is to solve real-world and practical problems with the latest methods and techniques (e.g., few- or zero-shot learning, self-supervised learning, meta-learning) and with real-world data situations (e.g., multi-modal data, multi-domain, continuous/incremental domains, and data restriction). This survey does not focus at length on Domain Generalization (DG), a related area where information about the target domain is unknown.
A number of survey papers on DA have been reported. The primary difference between this and the previous works is threefold. Firstly, this survey encompasses various data domains instead of focusing only on a specific (text/image-based) modality. Secondly, the survey is conducted with a primary focus on the applications of DA in these data domains: the challenges faced and how those can be mitigated using DA. Thirdly, it tries to understand the application of DA approaches across data domains/modalities and what makes a particular DA approach data domain specific. In summary, the primary goals of this work are:
1. To provide a joint perspective and recent updates of domain adaptation in five deep learning data domains: Visual or Computer Vision (CV), Natural Language Processing (NLP), speech, time-series, and multi-modal domains. Most of the previous surveys only focused on the visual domain (CV) or NLP domain and missed out on areas of cross-pollination. This survey, we believe, for the first time, discusses DA in multi-modal data settings. To understand data domain (CV, NLP, speech, time-series, multi-modal) specific DA methods and techniques and ones that are used across data domains.
2. To compile a list of existing and emerging DA datasets and tasks in five data domains.
3. To review recent DA methods and techniques for more practical DA settings like learning with fewer data, learning on the go, continuous adaptation, presence of domain or category gap, etc., across data domains.
4. To understand challenges and issues that hinder the adoption of DA. Based on these challenges and issues, research directions are also provided.
5. To understand and review industrial use-cases where DA has been employed and to appreciate use-cases where DA, if deployed, would provide rich dividends.
Organization of paper: A pictorial view of the organization of the paper can be seen in Figure 1. For completeness, the survey also briefly discusses the background, definition, and theory of DA in section II and then discusses DA in shallow or classical learning in section III. DA in DL is discussed in section IV; this section also focuses on more practical DA settings. Datasets used in five data domains and observations are mentioned in section V. Challenges and issues being worked on in this field are mentioned in section VI. Section VII looks at common and specific DA use-cases across industries and provides a perspective on how DA can be helpful. Section VIII provides the future research frontiers. The paper is concluded in section IX.
II. BACKGROUND
This section aims to succinctly provide the formal definition of DA, the categories of transfer learning and domain adaptation, and a theoretical foundation of domain adaptation.
A. FORMAL DEFINITION OF DOMAIN ADAPTATION
Let there be a source domain $D^s$, composed of a feature space $\mathcal{X}^s$ and marginal probability distribution $P(X^s)$ such that $D^s = \{\mathcal{X}^s, P(X^s)\}$. Also, there exists a sample set $X^s = \{x_1^s, x_2^s, \ldots, x_n^s\}$ and corresponding labels $Y^s = \{y_1^s, y_2^s, \ldots, y_n^s\}$ from $\mathcal{Y}$.
Similarly, there is a target domain $D^t$, composed of a feature space $\mathcal{X}^t$ and data with marginal probability distribution $P(X^t)$ such that $D^t = \{\mathcal{X}^t, P(X^t)\}$. Also, there exists a sample set $X^t = \{x_1^t, x_2^t, \ldots, x_n^t\}$ and corresponding labels $Y^t = \{y_1^t, y_2^t, \ldots, y_n^t\}$ from $\mathcal{Y}$.
Sometimes, labels in the target domain are unavailable (case of unsupervised DA) or only a few are available (case of semi-supervised DA), or no data at all is available in the
Ys serves as training target in supervised learning task.
FIGURE 1. Organization of the paper. Paper flow is from top to bottom and right to left. Industrial applications (Sec. VII-F)
include applications of both shallow and deep learning domain adaptations. Data Domain (CV, NLP, Speech, Time-series,
Multi-modal) specific approaches / methods, Datasets and Applications are contained in subsections of Sec. IV, Sec. V,
Sec. VII respectively.
target domain (case of domain generalization or zero-shot DA). Supervised or unsupervised DA refers to labels in the target domain being available or not for training. There exists a domain shift between $D^s$ and $D^t$. The task in the source domain is $T^s = \{\mathcal{Y}^s, P(Y^s \mid X^s)\}$ and the task in the target domain is $T^t = \{\mathcal{Y}^t, P(Y^t \mid X^t)\}$.
In the case of DA, suppose there exists a mathematical model $f : X^s \to Y^s$. If $T^s$ is related to $T^t$, and the same model $f$ also works for $X^t \to Y^t$ with a minimal or acceptable error, the model $f$ is said to have adapted across the source domain $D^s$ and the target domain $D^t$.
B. CATEGORIES OF TRANSFER LEARNING AND DOMAIN ADAPTATION
The seminal work by Pan and Yang [6] mentions that DA is a specific case of transfer learning (TL). The commonality between DA and TL is that some learning based on source domain data is utilized for the task in another domain. Hence it is beneficial to understand the different instances/types of TL:
A. Based on the feature set and data distributions, there are two types of transfer learning approaches (refer to Table 1).
B. Based on the task difference and the corresponding source and target domain data (refer to Table 2).
Figure 2 shows DA categories based on various source and target domain characteristics. DA work typically falls into homogeneous and transductive TL. However, in the recent past, there have been reasonable attempts to focus on heterogeneous DA. DA can be categorized based on the availability of labels in the target domain (refer to Table 3 and Table 4).
DA can also be categorized based on the label (classes) in domain and source data (refer to Table 4).
Typically, domain classification represents the scenario in which there is only a single source domain and the adaptation is to a single target domain (called single-target DA). However, recently, DA to multiple target domains (called multi-target DA) has also been reported. Adaptation from multiple source domains (called multi-source DA) has also been researched.
FIGURE 2. Overview of DA categories based on data characteristics (availability, feature space, number of classes) of the
source and target domains. Subclassification within each class leads us to specific category of DA.
Until recently, the DA focus was on reducing the dependency on labeled instances of data in the target domain; now, researchers are also focusing on reducing the dependency on the data itself in the target domain. Few-shot DA, single-shot DA, and zero-shot DA are examples of efforts to incrementally reduce the requirement of target domain data. Predictive DA uses metadata in the target domain to adapt. Domain generalization (DG) can be seen as similar to zero-shot DA, but it is deprived of any knowledge about the target; however, more robust DG methods should also include some essence of multi-target DA and universal DA. DA techniques also focus on the absence of source data during the DA
FIGURE 3. Domain adaptation categories plotted based on the availability of annotated data in the source and the target domain (forming a horizontal
plane) and category (class) set difference in source and target (forming the vertical axis), enhanced and adapted from Tommasi [7].
TABLE 1. Transfer learning (/domain adaptation) categories based on data.
TABLE 2. Transfer learning categories based on task differences and data.
TABLE 3. DA categories based on the availability of labeled target domain data.
TABLE 4. DA categories based on label set in target domain data. Cs & Ct represent the label sets (not the number of labels) in the source & target domains, respectively.
Techniques
1. Adversarial Domain Adaptation
2. Distribution Matching
3. Moment Matching
4. Contrastive or Self-Supervised Learning
5. Representation Learning via Domain Regularization
of each view simultaneously, assuming the features are not (much) common (i.e., $\mathcal{X}_{view_i} \neq \mathcal{X}_{view_j}$), very similar to heterogeneous DA where $\mathcal{X}_{source} \neq \mathcal{X}_{target}$. Also similar is Domain Separation Networks (DSN) [31], where private and shared feature spaces are kept orthogonal as far as possible; the endeavor (in different co-training strategies) is to have features split into two mutually exclusive views. Blum & Mitchell, in their co-training strategy [32], solved the NLP text classification problem using, as one view, the anchor texts of hyperlinks on pages pointing to the page and, as the other view, the text of the page itself. The features are taken to be dissimilar. Because it reduces the bias of predictions on unlabeled data, Ruder [33] mentions that Tri-training is one of the best multi-view training methods.
IV. DOMAIN ADAPTATION IN DEEP LEARNING
Since deep neural networks are associated with high accuracy (or any required metrics) and can provide state-of-the-art (SOTA) results, there has been increased usage of deep neural networks in many AI and ML applications and tasks. However, these networks also face domain shift problems: they are not able to adapt to data distributions that differ from the source domain and still provide the same SOTA results. Further, given that deep neural networks require a large amount of labeled data to train and the availability of labeled data is a concern (it is costly, arduous, or at times infeasible), it is much required that DA is supported for deep neural networks. Unlike DA in shallow learning, the focus of DA in deep learning is to include DA in the deep learning process and pipeline such that transferable representations are learned. In this direction, the earliest work, Glorot et al. [34], applied Stacked Denoising Autoencoders (SDA) to amazon.com product reviews to do sentiment analysis for different products. After that, substantial work has been done in the CV area, with NLP picking up (again) fast in the recent past, primarily due to the availability of transfer learning in NLP using transformers and attention architectures. DA research has now gathered pace to solve real-world problems (like multi-modal data support, data restrictions, and scarcity).
Table 7 lists the deep DA methods and approaches and further extends the deep DA categorization mentioned by Wang and Deng [35]. However, [35] only focused on deep DA techniques for the visual domain. In contrast, we aim to include more deep DA approaches that are data domain-specific and to review progress on other existing approaches. Also, our emphasis is to learn more about DA in unsupervised settings; supervised, semi-supervised, and pseudo-semi-supervised settings are included for completeness or novelty.
A. DISCREPANCY-BASED METHODS
These methods build on the shallow domain adaptation methods, map the features to a high dimensional RKHS space, and
measure the discrepancy using metrics like MMD or similar. The difference is that the distribution gap is measured and aligned using deep features rather than the hand-crafted features of shallow DA methods.
Figure 4 shows the typical structure/architecture of networks implementing discrepancy-based methods: a discrepancy metric, a representation of the network (along with a discrepancy metric), or a loss is used to regularize the network. Domain adaptation can happen at single or multiple layers (called adaptation layers in Figure 4).
1) DISCREPANCY METHODS: METRICS-BASED
Deep Domain Confusion (DDC) [45] was the first key idea that jointly optimized task (classification) and domain confusion. Similar to Figure 4, DDC used 2 parallel networks, with one network supervised (classification loss was included) and the other network not supervised. Domain (confusion) loss is used to adapt two fully connected layers with the idea that the features the network learns should be agnostic, i.e., they should lie in a feature space where domain information is lost while the class information is intact. MMD [14] is used as a discrepancy metric for domain loss. Extending (2), DDC mentioned the joint loss for domain adaptation as
$$L_{\text{Domain\_Adaptation}} = L_{\text{Classification}} + \lambda \, \mathrm{MMD}^2\big(P(X^s), P(X^t)\big) \tag{4}$$
Equation (4) also helps us understand that the discrepancy metric (MMD or similar) acts like a regularizer for the overall network. Later on, many works have built on DDC's key idea by using different or similar discrepancy metrics. Discrepancy metrics often used in deep DA are mentioned in Table 8.
The work of Kashyap et al. [46] further segregates the divergences into 3 classes: Geometric (distance between vectors), Information-theoretic (distance between probability distributions), and higher-order measures (amongst higher
FIGURE 4. Typical network structure/architecture of deep neural networks implementing discrepancy-based methods for domain
adaptation. The feature extraction layers are shared or regularized. The layers after feature called ‘‘Adaptation layers’’ include the
discrepancy metrics or a representation of network along with the discrepancy metric. Task specific layers do not take much part in adapting
the features of source and target. Best viewed in color.
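To make the role of the discrepancy term in Equation (4) concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the DDC implementation: the Gaussian kernel, the biased estimator, the bandwidth `sigma`, the weight `lam`, and all function and variable names are hypothetical choices made for this example, and the feature vectors are toy values.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical estimate of MMD^2 between two samples of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def joint_da_loss(classification_loss, source_feats, target_feats, lam=0.25, sigma=1.0):
    """Joint loss in the shape of Eq. (4): task loss plus lam times MMD^2 as a regularizer."""
    return classification_loss + lam * mmd_squared(source_feats, target_feats, sigma)


# Toy adaptation-layer features (hypothetical values).
source_feats = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
aligned_target = [[0.1, 0.1], [0.0, -0.1], [0.2, 0.0]]   # close to the source sample
shifted_target = [[2.0, 2.1], [2.2, 1.9], [2.1, 2.0]]    # strong domain shift

print(joint_da_loss(0.5, source_feats, aligned_target))
print(joint_da_loss(0.5, source_feats, shifted_target))
```

With the same classification loss, the shifted target sample yields the larger joint loss, which is the sense in which the discrepancy term regularizes the network toward domain-agnostic features.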
moment distance between distributions or distance between projections or distance between representations).
2) DISCREPANCY METHODS: ARCHITECTURE-BASED
In these methods, the focus is more on learning transferable features and architecture than on the metric. The underlying principle with architecture-based discrepancy methods is that information about the domain change (source to target) is only an affine transformation away, i.e., there exists a small transformation on weights that can help the transformation from source to target features. This small transformation can be an affine transformation or a Multi-Layer Perceptron (MLP) / deep network itself.
FIGURE 5. Typical network structure/architecture of deep neural networks implementing adversarial-based methods for domain adaptation. The
generator(s) are optional – used only to create synthetic data if required. Domain Discriminator looks to discriminate (classify) source and target
domains, while task-specific (say classification) is done by task-specific network. Best viewed in color.
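The interplay between the task-specific network and the domain discriminator described in this caption is commonly written as a saddle-point objective. The form below is a sketch in assumed notation (feature extractor $G_f$, task network $G_y$, domain discriminator $G_d$, domain label $d_i$, trade-off weight $\lambda$); these symbols are introduced here for illustration and are not defined in this survey.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \sum_{i \in \text{source}} L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big)
  - \lambda \sum_{i \in \text{source} \cup \text{target}} L_d\big(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\big)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

The discriminator parameters minimize the domain loss $L_d$ (they maximize $E$), while the feature extractor is pushed in the opposite direction, which is what produces domain confusion in the shared features.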
Deep Adaptation Network (DAN) [40] builds on the observation that, in deep convolutional networks, earlier layers learn generic features while later layers learn task-specific features. The authors froze the initial layers, fine-tuned the middle layers, and used a discrepancy-based criterion, MK-MMD (multiple kernel MMD), a variant of MMD [13], to adapt the later layers. Typically, discrepancy-based methods look to align the marginal distributions of source and target data, but there are different approaches too, like Joint Adaptation Network (JAN) [41]. JAN further improved on the DAN architecture by learning joint distributions of multiple domain-specific layers across domains using the joint maximum mean discrepancy (JMMD) criterion; it used a representation (ϕ) of the network itself.
Similarly, JAN-A [41] builds on the JAN architecture by adding another network (θ) that computes a representation on top of the network representations (ϕ). Training not only minimizes the JMMD but also learns the network (θ): the maximum is taken over the θ network and the minimum over the JMMD, giving an adversarial (min-max) objective. Computer Vision (CV) also uses normalization layers as a key architectural concept for DA. Given that this is specific to CV, it is detailed separately (refer to section Normalization layers). The hypothesis behind this is that the batch normalization (BN) layer represents domain-related knowledge. Transferrable Prototypical Networks (TPN) [47] focus on discrepancy (distances) for each class in an embedding space of 3 datasets: source only, target only, and a mix of source and target. TPN also assigns "pseudo-labels" to unlabeled target samples. Adaptation is done so that the prototypes of each class are close in the embedding space.
B. ADVERSARIAL METHODS
The idea behind the adversarial set of methods is to enhance domain confusion while a discriminator is still robustly trained to tell the domains apart (adversarial objective). This is closely related to Generative Adversarial Networks or GANs [4], which include two networks, a Generator and a Discriminator, in an adversarial setting. The generator aims to produce output (typically images) to fool or confuse the discriminator, while the discriminator, on the other hand, tries to segregate its input into real and fake. In DA, the borrowed idea is that the discriminator tries to segregate the source and target domain distributions, while the feature extractor learns domain-invariant features that confuse it. Adversarial Discriminative Domain Adaptation, or ADDA [48], introduced a generic framework (similar to Figure 5) for DA using adversarial models. Typical adversarial discriminative architectures follow a Siamese architecture with a source and a target stream and are trained on a task loss (typically classification) and either an adversarial loss or a discrepancy loss. In contrast, an adversarial generative architecture (in its simplest form) includes a generator that generates a mapping into the other domain (typically target) from the first domain (typically source); after that, the generated mapping and the other mappings follow the adversarial discriminative architecture.
1) ADVERSARIAL DISCRIMINATIVE MODELS
One of the seminal works in deep DA is Domain-Adversarial Neural Network (DANN) [49] (refer to Figure 6), which supports the idea of adversarial domain adaptation, i.e., the learned features should be discriminative for the task yet encourage domain confusion. They showed that any feed-forward model
could support adaptation if augmented with a novel gradient reversal layer.
DANN is the most widely used DA approach across all data domains. In CV, DANN was initially used for digit recognition and image classification. Later on, DANN or its derivatives were also used for more complex tasks like semantic segmentation and object detection. In the case of semantic segmentation, a Siamese network (consisting of two parallel tracks) approach is taken, where one track processes source samples and the other track processes target samples. Due to the inherent complexity of these tasks, domain alignment (Domain Classifier, the pink network in Figure 6) is present at various layers/stages, and the convolution layers (input to feature extractors) and deconvolution layers (feature extractors to semantic map) are aligned (shared, mapped, or matched using a statistical metric). Hoffman et al. [50] used 2 more losses in addition to the regular semantic loss: one loss to adapt category-specific parameters, i.e., category-specific adaptation, and the other loss to reduce "global distribution distance," i.e., global domain alignment. Huang et al. [51] looked at aligning features at each layer of the network.
In NLP, DANN has been widely used for classification tasks: Text Classification ([52], [53]) and Sentiment Analysis ([49], [54], [55]). DANN or its variants are also used for structured prediction tasks such as Named Entity Recognition (NER) ([52], [56]) and Parts of Speech (PoS) Tagging [57].
The Adversarial Discriminative Domain Adaptation or ADDA [48] model follows a similar philosophy to DANN but differs in that feature extractors are not shared between source and target, the loss function used in ADDA is the GAN loss whereas DANN uses the min-max loss, and the training is multistep. Conditional Domain Adversarial Networks (CDAN) [58] use a conditional discriminator that takes input from both the feature extractor and the classifier. Shen et al. [54], instead of using a pure classifier in the discriminator, used the Wasserstein distance between source and target samples as the loss during training (similar to Wasserstein GAN by Arjovsky et al. [59]). Inspired by the multi-view strategy, Du et al. [60] proposed Dual Adversarial Domain Adaptation (DADA), which has two "joint" discriminators, each supporting all the classes of the source and the target domain (2K-dimension), pitted against each other and back-propagating into the feature extractor. They also used a source class predictor to classify source labels and provide pseudo labels. The latest attempt to improve adversarial discriminative models is Smooth Domain Adversarial Training (SDAT) [61], which mentions that reaching smooth minima only for the task-specific loss (and not the domain discriminator loss) helps in adapting better to the target domain.
2) ADVERSARIAL GENERATIVE MODELS
Adversarial generative models differ from adversarial discriminative models in that they have a generative component (typically, the generator of a GAN) along with the discriminative component of discriminative models. This generative component typically creates synthetic target data from labeled source data. This synthetic labeled target data alleviates the need for labeled examples in target domains. The network is then trained under the assumption that little or no domain shift is present in the synthetic data. The source mapping component is the generator that maps the source domain into the target domain. Therefore, colloquially these generators are also known as domain mappers. One of the earliest works in adversarial generative models is Coupled Generative Adversarial Network (CoGAN) [60]. As the name suggests, two GANs run in parallel, and weight sharing happens in the initial layers of the generators and the final layers of the discriminators. These layers capture high-level features in discriminators and high-level semantics in generators. This helps the GAN to understand the joint distribution of domains. In CoGAN, the target domain is transformed into the source domain, and then the classification happens.
Typically, DA is specific to a task (shared across two domains); however, PixelDA [61] used an adversarial generative DA setup to provide a framework that is decoupled from task-related aspects. Typically, source images are transformed into target-like images; however, Generate-to-adapt [62] uses GANs for domain adaptation with the generator creating source-like images for target domain cases. It feeds the embeddings of images (learned during training) as the latent, auxiliary input to the GAN; the generator creates source-like images, and the discriminator both discriminates the domain (real/fake) and provides class labels. Other examples of adversarial generative models in the speech domain are Park et al. [63] and Augmented Cyclic Adversarial Learning (ACAL) [64].
3) ADVERSARIAL RECONSTRUCTION-BASED METHODS
Another variation of adversarial generative methods is reconstruction-based methods (on the same lines as the shallow feature matching DA strategy). Reconstruction methods typically use adversarial GAN-based networks or Autoencoder (AE) based networks to reconstruct one domain's content in another domain's style. Table 9 provides key ideas behind some adversarial reconstruction-based methods. There are other methods in the literature that do not fully comply with the adversarial reconstruction definition but are still very close to its working.
• An example in NLP is AE-SCL: Ziser and Reichart [66] brought SCL [16] into neural networks using Autoencoders; their network is called Autoencoder-SCL or AE-SCL. AE-SCL does not reconstruct the input but predicts whether the pivot features will be present in the input or not. They used this for cross-domain sentiment analysis. They further improved AE-SCL using Pivot-Based Language Modeling (PBLM) [67] and Task Refinement Learning using PBLM (TRL-PBLM) [68].
• An example in CV is DiscoGAN: DiscoGAN [69] is also very similar to CycleGAN, the difference being that it does not have cyclic reconstruction loss.
FIGURE 6. Domain-Adversarial Neural Network (DANN) trains two networks together. DANN trains the feature extractor (green network) and
class/label predictor (blue network) on source data. DANN also trains the feature extractor (green network) and domain classifier (pink
network) on the source and target data. The gradient reversal layer (GRL) leaves the forward pass unchanged; however,
during backpropagation, it reverses the gradient from domain discrimination (multiplies it by a negative quantity), which
leads the feature extractor (green network) to learn domain invariant features (domain confusion). λ helps to learn the
classification features first and then slowly learn domain features. Best viewed in color.
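The behaviour of the gradient reversal layer in this caption can be sketched without any deep learning framework. The snippet below is a minimal, framework-free illustration with hand-written forward and backward passes; the class and function names are hypothetical, and the λ schedule shown (with `gamma=10.0`) is one commonly used choice that is assumed here rather than taken from this survey.

```python
import math


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Reverse (and scale) the gradient before it reaches the feature extractor.
        return [-self.lam * g for g in grad_from_domain_classifier]


def feature_extractor_gradient(grad_task, grad_domain, grl):
    """Gradient seen by the feature extractor: task gradient plus reversed domain gradient."""
    reversed_grad = grl.backward(grad_domain)
    return [gt + gd for gt, gd in zip(grad_task, reversed_grad)]


def lambda_schedule(progress, gamma=10.0):
    """Assumed schedule: lam grows smoothly from 0 to about 1 as progress goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


grl = GradientReversal(lam=lambda_schedule(0.5))
print(grl.forward([1.0, -2.0]))
print(feature_extractor_gradient([1.0, 1.0], [0.2, -0.4], grl))
```

Starting with a small λ lets the feature extractor first fit the classification task on source data; as λ grows, the reversed domain gradient increasingly pushes the features toward domain confusion.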
C. MULTI-DOMAIN ADAPTATION
The multi-domain DA setting differs from a typical DA setting in that either there are multiple source domains (called multi-source adaptation) or there are multiple target domains (called multi-target adaptation).
1) MULTI-SOURCE ADAPTATION
To create more robust domain-adapted models, it makes sense to train the models on multiple sources. In earlier surveys (pre-deep learning), Sun et al. [70] mention either training an individual classifier on each source domain and a target domain and then merging the base classifiers, or merging all the sources into one source and then training. For deep learning, Zhao et al. [71] again mention the use of discrepancy, adversarial, and feature alignment-based strategies. Another strategy explored is intermediate domain generation for adaptation; in this case, a domain is generated using domain generators (typically GAN-based).
Moment Matching for Multi-Source Domain Adaptation (M3SDA) [72] created a multi-domain dataset called DomainNet with 6 domains. Further, they dynamically aligned moments of the feature distributions of the multiple labeled source domains and the target domain. Zhao et al. [73] introduced Multi-source Adversarial Domain Aggregation Network or MADAN, which essentially uses CycleGAN (a sub-domain aggregation discriminator for source domains and a cross-domain cycle discriminator for source-target domains) and creates a latent adapted domain for all source data and target data. Similarly, Russo, Tommasi, and Caputo [74] used CoGAN to adapt each source and target domain. Rebuffi et al. [75] used one residual adapter (which sits on the residual branch) for each domain. Yang and Hospedales [76] provided both multi-task and multi-domain perspectives using low-rank tensor methods; this work also provides an alternative to zero-shot learning.
In NLP, Guo et al. [77] introduce DistanceNet-Bandit, with distance metrics (DistanceNet) providing loss functions in addition to the task loss, along with a multi-armed bandit to control switching between multiple domains dynamically. Guo et al. [78] used meta-learning to combine predictors from each source-target domain.
In time-series, Zhu et al. [79] used a multi-adversarial strategy where multiple source domains (samples of roller bearings) were projected into a shared subspace, and domain
invariant features were obtained. Xia et al. [80] introduced a moment matching-based intraclass multisource domain adaptation network, which measures the discrepancy (MMD) between each source domain and target domain samples.
2) MULTI-TARGET ADAPTATION
Typically, DA follows the pairwise approach, with the source domain linked to the target domain. Inspired by [73], Gholami et al. [81] also look for shared information across domains. They propose Multi-Target DA-Information-Theoretic-Approach or MTDA-ITA, which uses private and shared spaces between source and target combinations, much like Domain Separation Networks (DSN) [31]. Isobe et al. [82] used multi-target DA for semantic segmentation tasks, using individual source-target models and bridges created amongst the pairs for collaboration. A student model is learned from all the individual source-target model pairs using regularization on each pair. Similar knowledge distillation is used in Multi-Teacher Multi-Target DA (MT-MTDA) [83].
D. HYBRID METHODS
Hybrid methods indicate the amalgamation of multiple techniques discussed before for executing DA.
1) ENSEMBLE-BASED METHODS
Ensemble methods contain multiple models, where the output of multiple models is combined, typically by averaging in regression and voting in classification tasks. The diversity of the models makes sure that the deviation from correctness is not much. One of the most significant drawbacks of these models is that they are computationally expensive. Ensemble methods for DA can be segregated into two sub-techniques: pseudo-labeling ensembling and self-ensembling.
In the case of the self-ensembling method, multiple outputs of a single model are combined over time. Combining outputs over time is also known as temporal ensembling. French et al. [84] used the teacher-student (mean teacher variant) architecture proposed by Tarvainen and Valpola [85] as a self-ensembling technique for visual DA. The teacher network is first trained on the task and outputs probabilities (floats) instead of hard 0/1 labels. The student then learns from the teacher, and the student can learn things better because the teacher informs the student of the nuances. Gradient descent is used to train the student network, while the weights of the teacher network are the exponential moving average of the student network's weights. The training loss is a combination of a supervised and an unsupervised component. This architecture dramatically reduces the model parameters without compromising on accuracy metrics. In NLP, [86] also used adaptive ensembling, an extension to temporal ensembling, and classified political data while studying temporal and topic drift. They used a temporal curriculum and a student-teacher network.
FIGURE 7. Typical network structure/architecture of deep neural networks implementing homogeneous multi-modal DA. Discriminators in the Intra-Modality block force the feature extractors to learn domain-invariant features. Inter-Modality and Inter-Domain are optional and not seen in every multi-modal DA setting. Best viewed in color.
Another data-centric variant of ensembling is pseudo-labeling ensembling, wherein the target domain labels are provided based on the combined perspective of the comprising models. If most models converge, i.e., there is high
confidence in the label class for a particular instance of the target domain, that instance of the target domain (not the source domain) is used for training the target classifier, hence the name pseudo-labeling. In computer vision, Saito et al. [87] proposed Asymmetric Tri-Training (ATT), in which two networks, first trained on the source domain, provide the labels for target domain instances: if the two networks agree, the pseudo-label is assigned to the target instance, and that data is used for training the third network. Final labels don't have to be provided at all times, and the probability score can also be used instead (examples: Zou et al. [88] and, to some extent, French et al. [84]).
E. MULTI-MODAL DOMAIN ADAPTATION
Multi-modal is a complex data domain with respect to DA, as the DA process has to take into account the different modality structures and different domain shifts (for each modality). In the case of heterogeneous multi-modal DA, the DA process must also take care of different feature spaces, feature representations, and dimensions of feature spaces.
1) HOMOGENEOUS MULTI-MODAL DOMAIN ADAPTATION
Most of the work in DA supports homogeneous data, i.e., the feature space remains the same ($\chi_s = \chi_t$), but the shift is because of different data distributions, i.e., $P(X_s) \neq P(X_t)$. When both source and target domains have at least two modalities, i.e., are multi-modal, but the feature space (features fed for the task perspective) is still the same, it is called homogeneous multi-modal DA. A typical homogeneous multi-modal architecture (refer to Figure 7) always implements intra-modality interaction; however, implementation of the inter-modality and inter-domain aspects is optional.
Qi et al. [89] created a multi-modal DA network with attention and fusion modules along with hybrid domain constraints to learn domain-invariant features. The intra and inter units in the attention module help to understand the relationship among modalities. The bilinear model approach ([90], [91]) was used for fusion, and then Tucker decomposition was used to support computational (GPU) [92] restrictions.
For social media event rumor detection, Zhang et al. [93] proposed Multi-modal Disentangled Domain Adaption
(MDDA), which looks to resolve two challenges: entanglement and domain shift. Disentanglement of event content from rumor style was done as part of the first challenge, and domain shift was tackled in the latter challenge (with only rumor style taken after the first challenge). The network learned only a transferable rumor style with the alignment of feature distributions over different events.
Multi-Modal Self-Supervised Adversarial Domain Adaptation or MM-SADA [94] uses two modalities of the EPIC-Kitchens video dataset, optical flow and RGB, to examine whether fine-grained action recognition (which depends highly on the environment) can be improved across dataset domains. They used self-supervision across two domains with both modalities and adversarial adaptation between each modality of source and target data (i.e., one discriminator for RGB and one for optical flow).
Li et al. [95] look at DA amongst multiple modalities from domains (scripted source, improvised source). They use an emotion recognition model based on adversarial training (which helps to remove domain difference between emotion elicitation approaches) and a soft label loss approach (which helps to understand non-rigid emotions and to consider emotion and domain categories simultaneously).
2) HETEROGENEOUS MULTI-MODAL DOMAIN ADAPTATION
One of the most prevalent kinds of real-world data is heterogeneous multi-modal data; as deep networks look to use more heterogeneous multi-modal data, it is imperative to learn DA in heterogeneous multi-modal settings. DA, in the case of heterogeneous data, is carried out by extracting features of the two domains using separate networks and handling the task-level aspects either by sharing weights (strong parameter sharing) or by weakly parameter-shared weights as in the work of Shu et al. [96].
The importance of heterogeneous multi-modal DA lies when one of the modalities is missing in the target domain: consider the source domain having modalities m1 and m2, while the target domain may just contain m3 with missing m4. Ding et al. [97] look at solving a real-world 'Missing Modality Problem' by introducing Missing Modality Transfer Learning via latent low-rank constraint (M2TL). The transfer of learning is twofold: one, from one database to another (cross-database transfer), and two, from source modality to target modality (cross-modality transfer). They use a low-rank matrix constraint to learn a subspace within a database across modalities and MMD to couple databases in the source domain (known modalities).
Conditional adversarial domain adaptation [58] uses conditional domain adversarial networks (CDAN), a variant of the adversarial discriminative model, which assists adversarial adaptation by employing discriminative information conveyed in the classifier predictions. The discriminator is conditioned on the cross-covariance of domain-specific feature representations and classifier predictions. CDAN can adapt to multi-modal data distributions and can also support scenarios involving higher dimensions (supported by a variant called Randomized Multilinear (RM) conditioning).
Athanasiadis et al. [98] present Domain Adaptation Conditional Semi-Supervised Generative Adversarial Networks (dacssGAN) in the realm of emotion recognition, where domains (audio, video) are heterogeneous and multi-modal. The network uses GANs and conformal prediction techniques [99] to implement DA.
Seo et al. [100] aim to improve audio-visual sentiment analysis performance using text modality during the training phase by "transferring knowledge" of a single modality (text) to other modalities (audio and visual). The knowledge transfer employs the reduction of distribution differences of feature representation in data for each modality.
In NLP, cross-lingual translation also falls under heterogeneous tasks as the words and the construct of the two languages are very different, leading to an assumption that input features don't match, i.e., $\chi_s \neq \chi_t$. Various attempts, including Conneau et al. [101], have been made to support cross-lingual DA as an unsupervised task; however, Søgaard et al. [102] showed that the underlying assumption, that the word (embedding) spaces of the two languages are isomorphic, is incorrect. They further suggested that a weakly supervised solution outperforms unsupervised cross-lingual DA (the metric used was bilingual dictionary induction scores). Conneau et al. [103] mentioned that pre-trained models (discussed later in the section Pre-Trained Models) achieve better results in unsupervised cross-lingual representation translation tasks. Generative adversarial text-to-image synthesis [104] provided a way to generate an image based on text, translating visual concepts from characters to pixels using a convolutional-recurrent neural network. Along similar lines, StackGAN [105] also created photo-realistic images from text in two stacked steps.
F. DOMAIN ADAPTATION IN COMPUTER VISION (CV)
This section focuses on DA strategies typically only seen in the computer vision data domain and not shared with other data domains.
1) NORMALIZATION LAYERS
Normalization layers help maintain a stable training of neural networks and are used in nearly all neural networks. A few examples of normalization layers in regular neural networks are batch normalization or batchnorm [3], layer normalization or layernorm [106], instance normalization or instancenorm [2], and group normalization or groupnorm [107].
FIGURE 8. Typical self-supervision network structure. It is a multi-task network and includes an auxiliary task, which aims at understanding the feature distribution mode; it does not impact the core DA task but "provides" knowledge of sorts to the core DA task. Best viewed in color.
Chang [108] created a DA framework using a domain-specific batch normalization layer, with the other model parameters shared between domains. Li et al. [109] proposed the Adaptive Batch Normalization (AdaBN) layer. The intuition behind these layers is that they learn domain knowledge, in contrast to weights learning task knowledge and biases learning some sort of priors. Carlucci et al. [110] in AutoDIAL built further on the AdaBN layers of [109] and used DA
layers amongst the standard CNN layers. The purpose of these layers was to normalize the target and source mini-batches separately for the two domains, while letting them influence each other based on a parameter learned as part of the training process.
Roy et al. [111] proposed Domain-specific Whitening Transform (DWT), domain alignment layers that compute intermediate feature covariance matrices, along with the Min-Entropy Consensus (MEC) loss (a merger of entropy and consistency loss) for coherent predictions for a sample.
2) SELF-SUPERVISION METHODS
Self-supervision DA methods look at joint training of an auxiliary self-supervision task alongside the main task and therefore are also aligned to multi-task learning. The Deep Reconstruction Classification Network (DRCN) [65] had a deconvolution network to reconstruct the image (an auxiliary self-supervised task) while the convolution network performed the label prediction (main task). The feature mapping parameters were shared in DRCN, very much similar to Figure 8. The intuition is that the main task receives knowledge transfer from the auxiliary task.
Carlucci et al. [112] used the auxiliary task of jigsaw puzzle solving (permutation index) while solving the main task as a DA/DG strategy. It is noted that typically the auxiliary task is an unsupervised task; however, the main task is a supervised task. Xu et al. [113] further increased the number of auxiliary tasks (image rotation prediction, flip prediction, and patch location prediction), further underlining that low-level tasks (like pixel-level reconstruction/prediction) are not very useful in DA. In contrast, high-level structural tasks (like image rotation prediction) are very useful. Kim et al. [114] showed that the self-supervision technique is useful even with few labeled instances in the source domain. They used within-domain instance discrimination (in-domain self-supervision) and cross-domain matching (across-domain self-supervision) to learn features that are domain-invariant as well as discriminative.
G. DOMAIN ADAPTATION IN NATURAL LANGUAGE PROCESSING (NLP)
This section focuses on DA strategies typically only seen in the NLP data domain and not shared with other data domains. Most of the work in DA has been done in the CV area, though the origins of DA have been in NLP. For example, DANN [49] was initially applied to sentiment classification, but later it was used for computer vision classification tasks. Ramponi and Plank [115] categorize NLP domain adaptation models into Model-Centric, Data-Centric, and Hybrid. Model-Centric models (which focus on augmenting the feature space, tinkering with loss functions, and changing the architecture of the model), discussed before, have been used in other applications and in computer vision too. Pre-trained models are Data-Centric models and are discussed below, and hybrid models are discussed in the section Hybrid Methods.
1) PRE-TRAINED MODELS
The Data-Centric models are not shared with computer vision tasks, perhaps because these models focus on data elements, which differ between computer vision and NLP, to support adaptation. These models are less prevalent but, of late, have picked up the interest of researchers. BERT (Devlin et al. [116]) was a model that revolutionized transfer learning; other methods include pseudo-labeling, pre-training (zero-shot) (example: Multilingual BERT), and fine-tuning (including multi-phase) (example: SciBERT [117] / BioBERT [118]). Figure 9 provides a typical pre-trained training strategy, and Table 10 lists different pre-trained training data and strategies. Based on the DA definition, pre-training and fine-tuning are not kinds of DA processes, but these transformer-based language models are task agnostic in the sense that they can be fine-tuned on specific tasks using a small dataset. They are included in this survey for completeness.
FIGURE 9. Typical Pre-trained Training strategy. Pre-training is typically task agnostic; future steps are required to adapt the model to the task in question. An optional multi-step pre-training is done to reduce the data distribution gap of source data and target data. Best viewed in color.
AdaptaBERT [119] used a two-step approach for domain-adaptive fine-tuning. In the first step, they performed domain tuning by taking contextualized word embeddings (unlabeled source and target domain data) and maximizing the
probability of masked tokens. In the second step, they focused on task tuning by taking labeled source data and back-propagating for the desired task (PoS tags in this case).
of data are used to train different models in multi-view training. The views differ from each other in the following dimensions (or a combination of dimensions):
The philosophy behind multi-view training is that the views complement each other, and the collaborating models improve each other's performance. Examples of multi-view training are Co-Training [31], Democratic Co-Training [129], and Tri-Training [130].
H. DOMAIN ADAPTATION IN SPEECH
In speech domain adaptation tasks, the focus is to first identify which elements of the data are actually speech and not noise; for the elements identified as speech, the focus is then either recognition of speech, called Automatic Speech Recognition (ASR), or adapting to a speaker. Text-to-speech (TTS) is a multi-modal variety where the output modality (space) is speech. The DA strategies that are typically employed are discrepancy-based ([131]) (refer to section Discrepancy-Based Methods), adversarial-based ([132], [133]) (refer to section Adversarial Methods), pseudo-semi-supervised training-based ([131]) (refer to section Pseudo-Semi-Supervised Domain Adaptation), and knowledge distillation-based ([134], [135]) (ensemble-based or teacher-student-based methods, refer to section Ensemble-Based Methods).
One speech-specific strategy is the work by Zhang [136], where a pretraining process is first undertaken on the DNN model using unlabeled target domain data. Later, labeled source data is used to fine-tune the network. The intuition behind the pretraining process is to seek a shared representation.
I. DOMAIN ADAPTATION IN TIME-SERIES
Typically, the tasks that are prevalent in time-series DA are classification (generally 2-class classification) and forecasting (predicting based on past time-stamped information). Further, the problems solved are univariate and multivariate, i.e., involving multiple time-stamped variables used for prediction, e.g., pressure, temperature, and flow rate predicting a fault in a power station. Jin et al. [137] describe the complexity in time-series DA as two-fold:
1) Varying input and output space: The output space of the source domain time-series (say, the flow rate in the power station) may be different from the output space of the target domain time-series (say, a count of units in a warehouse). Hence, it is imperative that not only domain-invariant features are captured but also domain-specific features, as in Domain Adaptation Forecaster (DAF) [137]. Similarly, the input space may be different.
2) Dependence on different time period subsets: It may be possible that the outcome (classification/forecasting) may not be captured by the overall history representation. In all likelihood, it would be a subset of the overall time-period representation that impacts the outcome.
A survey on sensor time series [138] mentions that the strategies used for time-series DA bear much resemblance to non-time-series DA, with two specific strategies for time-series DA: input space adaptation and output space adaptation.
1) INPUT SPACE ADAPTATION
In the input space DA strategy, the impetus is to use/generate source domain samples which resemble the target domain samples, much like reconstruction-based methods. Typically, prior knowledge (Wang et al. [139]) or GANs (ContraGAN [140]) are used in this strategy.
2) OUTPUT SPACE ADAPTATION
The output space DA strategy is used both for classification and forecasting (DAF [137]). In the case of classification (Yang et al. [141]), high-confidence labels on the target domain are selected for training, analogous to pseudo-semi-supervised training (refer to section Pseudo-Semi-Supervised Domain Adaptation). In the case of forecasting, domain-specific features are used (values of the transformer network in DAF [137]).
J. EMERGING DOMAIN ADAPTATION FOR PRACTICAL SETTINGS AND REAL-WORLD CHALLENGES
Some models and techniques available in the literature do not fit into existing categories, have gained a lot of traction, and are, to some extent, very innovative and adapted to more practical settings and/or real-world challenges. These emerging DA techniques are mentioned below.
1) FEW-SHOT DOMAIN ADAPTATION
The challenge with few-shot DA is that there is not enough target data that can conclusively conform to the simultaneous requirements of DA. These requirements are domain confusion and representation alignment between the two domains. One of the first works on few-shot DA is Motiian et al. [142]. They introduced Few-Shot Adversarial Domain Adaptation (FADA) using adversarial learning focusing on speed of adaptation. They alleviated the difficulty mentioned before by grouping source and target samples into four categories based on domain and class labels, and the domain-class discriminator then classified these four categories instead of the standard two. Further, they initialized the network (feature extractor and label classifier) using source data only, then updated the domain-class discriminator (freezing the feature extractor). Finally, they froze the domain-class discriminator and updated the feature extractor and label classifier.
In Domain-Adaptive Few-Shot Learning (DA-FSL) [143], the authors look to solve an even more complex problem related to few-shot learning, i.e., target data may have classes that come from a different domain. The focus of the domain-adversarial prototypical network (DAPN) in DA-FSL is to attain alignment in global domain distribution while keeping class discriminativeness intact by introducing new losses (domain discrimination, domain confusion, classification). The losses are weighted using an adaptive
re-weighting mechanism. Another novel aspect was the use of attention before the embedding of the source.
Further, Yue et al. [144] proposed an end-to-end Few-Shot Domain Adaptation method, which includes self-learning (called the Prototypical Cross-domain Self-Supervised Learning (PCS) framework) and is unsupervised. The main idea for knowledge transfer from source to target is to find similarities between instance and prototype (representative), making the transfer more robust.
2) ZERO-SHOT DOMAIN ADAPTATION
Zero-Shot DA is a complex scenario because actual target domain data is not present during training time; only some information about it (typically target metadata) is available. Zero-Shot DA differs from DG because DG does not have any information about the target data, not even the metadata.
1) Zero-Shot Learning (usage of task-irrelevant data): For the computer vision task, Peng et al. [145] used information in task-irrelevant data (domain pairs) to help the network understand information about the non-available task-relevant target domain.
2) Zero-Shot Learning (new labels in the target domain): The intention is to learn "different" class labels in the target domain, given labels in the source. This is genuinely not a DA scenario, as the label domain is different in both source and target. An example mentioned in Kodirov et al. [146] is that the label "Polar Bear" can be represented as embedding vectors of 'has fur,' 'is white,' and 'eats fish.' Any semantic embedding that is close to these embedding vectors can help label effectively.
3) LABEL SET DIFFERENCE IN DOMAINS
This perspective helps to close the category (label) gap in DA: it may be possible that the target label set contains more (open-set) or fewer (partial) labels than the source. The typical DA scenario is called closed-set DA, where the label set in source and target is the same. The solution that supports both open-set and partial is called universal domain adaptation.
1) Open (set) Domain Adaptation: The Open DA idea by Saito et al. [147] uses an adversarial generative model where the generator creates samples different from the data boundaries of source samples. The feature extractor component can either align the features of the target domain within the boundaries of the source domain or push them away from the boundaries; the samples pushed away from boundaries represent the unknown class. The separate-to-adapt strategy ([148]) progressively (coarse boundaries to finer boundaries) separates known classes and unknown classes and uses the adversarial discriminative method. Saito and collaborators again discuss open-set domain adaptation with a benchmark towards open-set classification in syn2real [149]. Pan et al. introduced Self-Ensembling with Category-agnostic Clusters (SE-CC) [150], which helps in domain adaptation by looking at cluster distribution of unknown (new) classes, giving more understanding to the network to segregate between known and unknown classes and within known classes.
2) Partial Domain Adaptation: The source domain having a greater number of label classes than the target domain, i.e., Partial DA setting, leads to a problem of negative transfer. Partial Adversarial Domain Adaptation (PADA) [151] implements an adversarial discriminative method and aligns the feature distribution of two domains in a shared space. Further, it weighs down the importance of the extra class(es) of the source domain. Cao et al. [152] extend their previous work using Example Transfer Network (ETN), where the strategy of weighting down the class importance is different. It evaluates transferability and only transfers examples like the target domain.
3) Universal Domain Adaptation: Universal DA is one of the most complex DA scenarios to deal with, and the research attempts are very recent. The idea by You et al. [153] is to consider two elements: domain similarity (which helps to understand if the task can be supported) and prediction uncertainty. Domain similarity deduces samples coming from similar labels, while prediction uncertainty deduces the unknown class. It further includes aspects of partial domain adaptation strategies by the same research group and supports all settings: closed-set, partial, and open-set variations. The training tries to find an optimum probability (that the sample is part of the source class) which can help segregate if data can be worked on; else, mark it as unknown. V. N. and Kundu et al. [154] support Universal DA by using a proxy of unobserved class (a hypothetical negative class), which therefore helps in class separability.
4) CONTINUOUS / SEQUENTIAL / INCREMENTAL DOMAIN ADAPTATION
In a representative DA setting, the source data and target data are available during the training time. However, in real-world settings, target data may be made available as we progress on DA testing over time, or the target domain itself may change. In these settings, continuous (or sequential or incremental) DA is imperative.
1) Online domain adaptation: In the work of J. P. and Mancini [155], continuous domain adaptation is done using batch normalization for unsupervised domain adaptation. Sharing of network parameters happens between source and target (online) except for the batch normalization parameters. Batch normalization parameters are updated on the go (over time). This online DA strategy was used in a robotics use case where the objects were lit differently in different settings.
2) Predictive and Online domain adaptation: For unsupervised learning scenarios, Mancini et al. in
AdaGraph [156] focused on a predictive domain adap- example, Source Hypothesis Transfer (SHOT) by Liang et al.
tation scenario with an online learning component. [184] only uses the source model instead of the source data.
The system learned generalizing from annotated source The model aligns the source model with target data by learn-
images alongside unlabeled samples (with associated ing target-specific features (uses information maximization
metadata) from secondary domains. AdaGraph is used and self-supervised pseudo-labeling).
to understand the domain-specific parameters, and it Universal Source Free domain adaptation [154] and Feder-
provides those parameters to batch normalization lay- ated domain adaptation [160] also aim to support DA where
ers as part of predictive DA. the availability of source data during training is unsure.
3) Continuously Changing Domains: Sometimes, the V. N. and Kundu et al. [154] support Universal DA (closed,
task involved is such that domains vary continuously open set, and partial domain adaptation) and use synthetically
(e.g., self-driving car driving on a sunny day, and sud- generated hypothetical negative classes, which can act as a
denly it rains); we cannot treat the shift as discrete or proxy for the unobserved class, knowledge of class separabil-
static domains. Continuous Unsupervised Adaptation ity, and category gap. In federated domain adaptation [160],
or CUA [157] learns to adapt to new distribution model parameters are trained for each source note separately,
without not deviating (replay) from how it performed converging at different speeds. The use of dynamic atten-
in previous distributions. CUA has an element of tion help understands the weightage of each source model.
adaptation (Adapt Module) and memory (to replay if Federated domain adaptation also uses concepts of Domain
the same domain is countered again, called Replay Alignment, Domain Disentanglement, and Mutual informa-
Module). tion minimization.
4) Continuously Indexed Domain Adaptation: One of
the drawbacks of the existing DA techniques is that they
look to transfer knowledge between categorical (A and 7) SELF-SUPERVISED LEARNING IN DOMAIN ADAPTATION
B) domains. However, in the real world, continuously Self-supervised learning (including domain adaptation) is
indexed domains are involved in many tasks. Contin- typically a two-step sequential process; the first process
uously Indexed Domain Adaptation or CIDA [158] step includes unsupervised learning from a pretext task (in
conditions domain index distribution on a discriminator CV: rotation, image reorganization, implanting, coloriza-
that models the encoding. Another variant of CIDA tion, etc.), which is used to understand intrinsic domain
is Probabilistic CIDA (PCIDA); here, instead of the information (in CV: say semantic information of images
predicted domain index as output, it provides mean and in a particular domain). In the second process step, this
variance for the domain. learning is applied to a new task which further broadens
it. Bucci et al. [161] implemented a similar process for
object recognition across domains. The first task broad-
5) OPEN COMPOUND DOMAIN ADAPTATION (OCDA)
ens the previous supervised learning of semantic labels,
At times, there do not exist any clear boundaries amongst
and the second task focuses on understanding the structure
the source and multiple target domains. X. S. and Liu [159] concentrated on open compound
domain adaptation (OCDA), where the target domain is a composite of numerous unlabeled and
homogeneous domains. To bootstrap generalization, they used curriculum domain adaptation in
a data-driven, self-organizing fashion, learning from easy to hard based on domain gaps.
OCDA also separates characteristics that are discriminative between classes from those
specific to domains. The curriculum of domain-robust learning is constructed from the
teased-out domain feature. Further, the use of memory modules increases the support for new
domains. Knowledge is transferred from the source domain to target domain instances, and
the network can dynamically balance the memory-transferred knowledge and the input
information. If the new domain is close to any source domain, it can work as a typical
domain adaptation; in case of a difference, the memory module helps.

6) SOURCE DATA RESTRICTIONS
There are conditions where data privacy is a concern or source data is not available.
A DA model that relies on little or no source data (after model creation) is a boon in
those conditions. For

of the objects and their orientation. Given that label bias does not affect
self-supervised learning, it can be used in partial (Bucci et al. [162]) and open-set
(Bucci et al. [163]) DA areas.

8) META-LEARNING IN DOMAIN ADAPTATION
Meta-learning (or learning-to-learn) represents algorithms that learn from the output of
other algorithms. These sit one level above the standard task algorithms (they can be
visualized as outer-loop algorithms) and are vital in model selection and tuning
processes. Li and Hospedales [164] implemented meta-learning for semi-supervised DA and
multi-source DA; they also mentioned that meta-learning could be used for good
initialization. Meta-learning in DA helps to increase evaluation metrics by 0.7% (DANN)
to 2.5% (MCD). Another example in the speech domain is the adaptation of generative-based
dialogue systems for unseen domains: Ribeiro et al. [165] improved DiKTNet (a dialogue
model) adaptation to unseen domains using meta-learning. Meta-learning also finds use in
domain generalization ([166], [167]).
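To make the "outer loop" view of meta-learning concrete, the following is a minimal sketch of learning a shared initialization across several source domains. It uses a first-order, Reptile-style update on a toy one-parameter regression problem; this is an illustrative stand-in and not the method of Li and Hospedales [164]. The helper names (`make_domain`, `sgd_steps`, `reptile`), the toy data, and all hyper-parameter values are assumptions introduced here.

```python
import numpy as np


def make_domain(rng, shift, n=64):
    """Hypothetical toy domain: y = (2 + shift) * x, one slope per domain."""
    x = rng.normal(size=(n, 1))
    y = (2.0 + shift) * x[:, 0]
    return x, y


def sgd_steps(w, x, y, lr=0.1, steps=5):
    """Inner loop: a few least-squares gradient steps on a single domain."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w


def reptile(domains, meta_lr=0.5, meta_iters=200, seed=0):
    """Outer loop: pull the shared initialization toward each adapted solution."""
    rng = np.random.default_rng(seed)
    w = np.zeros(1)
    for _ in range(meta_iters):
        x, y = domains[rng.integers(len(domains))]
        w_adapted = sgd_steps(w, x, y)
        w += meta_lr * (w_adapted - w)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    source_domains = [make_domain(rng, s) for s in (-0.5, 0.0, 0.5)]
    w_init = reptile(source_domains)
    # Adapt to an unseen domain with only two inner steps from the meta-learned start.
    x_new, y_new = make_domain(rng, 0.2)
    print("meta-init:", w_init, "adapted:", sgd_steps(w_init, x_new, y_new, steps=2))
```

Because the learned initialization already lies between the source-domain solutions, a few inner steps on a nearby unseen domain start much closer to its solution than a zero initialization would, which is the "good initialization" benefit mentioned above.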
FIGURE 10. Typical Pseudo-semi-supervised DA strategy. A subset of target data is pseudo-labeled using ‘‘non-adapted’’ source model. The
‘‘core-DA’’ task ensures that the pseudo-labels assigned are corrected. Best viewed in color.
9) PSEUDO-SEMI-SUPERVISED DOMAIN ADAPTATION
This set of methods includes the treatment of a subset of unlabeled target domain data
and labeling them before the start of the ‘‘core’’ DA process (refer to Figure 10).
Therefore, for the ‘‘core’’ DA process, there exists a subset of target domain data that
is labeled, hence the name pseudo-semi-supervised DA. It may be noted that the initial
labeling of unlabeled target domain data may be accurate or inaccurate, which is further
refined during the ‘‘core’’ DA process.

1) Active Learning in Domain Adaptation (Active DA): While DA attains excellent results,
the performances of DA methods often fall far behind their supervised counterparts. In
such cases, active domain adaptation (Active DA) has recently gained a lot of interest.
In the Active DA method, a subset of target samples is used to obtain annotations, which
further helps to improve the performance of the ‘‘core’’ DA. The focus is on selecting
samples that not only include the diversity of target data but also represent the
complexity.
Su et al. [168], in Active Adversarial Domain Adaptation (AADA), used selection criteria
based on a diversity cue (dependent on the optimal discriminator in the adversarial
setting) and an uncertainty cue (dependent on cross-entropy, a proxy for empirical risk).
They showed superior performance for digit recognition and object detection tasks.
Prabhu et al. [169] further improved on the basic active learning techniques of diversity
cue and uncertainty cue by proposing Clustering Uncertainty-weighted Embeddings (CLUE).
They weighted samples and selected them; here, diversity was supported by clustering and
uncertainty by entropy weighting. They surpassed previous active learning-based SOTA
(i.e., AADA) results in digit recognition and object detection.
2) Pseudo-Labeling in Domain Adaptation: Unlike active learning, Pseudo-label DA includes
applying the model trained on labeled source data on a batch of unlabeled target data to
predict labels / annotate. Here the labels/annotations on target data are not accurate
but a reflection of labeled source data. Thereafter, one of the techniques is to train a
new model with labeled source data and pseudo-labeled target data. However, this method
has the inherent weakness of propagating noisy labels (incorrect labels).
In CV, Kim and Kim [170] worked on abating the noisy label problem by implementing a
joint optimization framework, i.e., iteratively updating the model (network) and
pseudo-labels.
In NLP, Wang et al. [171] used Generative Pseudo Labeling (GPL) for query-passage
extraction purposes, where they retrieved positive passages from labeled data and applied
that model for retrieving negative passages in target data. Thereafter, they used
Margin-MSE loss, which helped the cross-encoder to soft-label query-passage pairs
effectively. They then used the soft-labeled pairs for the core task.
In time-series, as part of the output space strategy, Yang et al. [141] selected
high-confidence labels on the target domain for training.
Moving Semantic Transfer Network (MSTN) [174] looked to align the centroid of each class
in both labeled source and pseudo-labeled target data. Chen et al. [175], in Progressive
Feature Alignment Network (PFAN), formulated an easy-to-hard strategy (ETHS) and used
only easy samples for downstream network (Adaptive Prototype Alignment or APA) use. ETHS
and APA were then used iteratively till convergence for best results.

V. DATASETS USED IN DOMAIN ADAPTATION
This section captures the existing and emerging datasets used for DA across CV, NLP,
speech, time-series, and multi-modal data domains. One observation is that researchers
use very few benchmark DA datasets, and the research is done in a very narrow set of
tasks.

A. COMPUTER VISION (CV) DATASETS
In Computer Vision (CV), most of the DA work has been done in digit recognition and image
classification. Complex CV tasks (like pose estimation) are now getting traction. Table 11
lists common CV datasets used in DA in recent times.
TABLE 11. (Continued.) Common computer vision (CV) datasets used in DA.
TABLE 12. Common natural language processing (NLP) datasets used in DA.

of public time-series datasets used in DA continue to be very less.
instances. Results shown by researchers on diverse datasets would promote the creation of
more datasets, and finally, the diversity would lead to capturing more practical settings.
2) DA has the promise to apply to real-world problems and solve them. Researchers have
started investigating and solving some of the challenges, and some are yet to be explored.
Table 16 provides a view of real-world challenges and examples of research work
undertaken; however, some areas are still to be examined.
3) Need for more tasks and applications: New/other applications involving different types
of data (like NLP [115], [33]) for DA can be explored. Time-series data adaptation is not
looked at much (sensor type adaptation may be a great use case). Further, multi-modal
data-related domain adaptations are few. Also, industrial applications (where the target
is industrial data) can be looked at by exploiting domain adaptation (source data is
academic data). There is a need to develop a DA framework in these areas.
4) Research bias for Classification tasks: In computer vision (refer to Table 11), most
of the work done is in classification tasks (digit recognition and image classification).
Other tasks (pose estimation, object detection, etc.) are less explored.
Similarly, most of the work reported in NLP domain adaptation is in sentiment analysis,
followed by classification tasks (as in CV) (refer to Table 12), and not many tasks (most
are 1:1 adaptation tasks) are explored by researchers on the techniques published by them.
Areas like dependency parsing (DEP), Named Entity Recognition (NER), part-of-speech (POS),
and other areas are explored significantly less.
5) Bidirectional DA: It is understood that DA from the source domain to the target domain
may yield good performance, but the reverse (i.e., target domain to source DA) may not
yield equally good performance. Few papers discuss the bidirectional results. It is not
understood why a particular direction yields better performance than the other direction.
Example: SVHN to MNIST accuracy is very high [84], while MNIST to SVHN is not very high.
A general-purpose strategy is required for bi-directional DA.
6) Effective comparison metrics missing for some DA scenarios: Typically, absolute mAP is
used for object detection tasks; however, it is the relative mAP (source-only baseline
and after DA) that is important for DA. Relative mAP is more informative than absolute
mAP because different papers use models trained with different hyperparameters. There is
a need for similarly effective comparison metrics.
7) Varied model and data parameters in DA: Fair and comprehensive evaluation of DA
approaches and reusability comparison is difficult due to varied metrics,
hyper-parameters and data input (e.g., image size). There is an imperative need for
standardization of some possible parameters, e.g., image size.

VII. APPLICATIONS OF DOMAIN ADAPTATION
Given that DA includes relevant elements and supports generalization, it has found usage
in many applications. Mentioned are some motivating examples and possible usage in the
future.

A. COMPUTER VISION (CV) DOMAIN ADAPTATION USAGE
DA in CV continues to mirror the progress of CV tasks and techniques with a lag. The
initial focus of DA in CV was on simple CV tasks like digit recognition and image
classification, but later, the focus included complex tasks of object detection,
segmentation, depth estimation and similar. Surveys have been done on domain adaptation
on specific computer vision tasks, e.g., semantic segmentation [294] and object detection
[295]. The current focus is increasingly on even more complex tasks (e.g., pose
estimation, video classification), complex datasets (e.g., in the wild, 3D), and
improving state-of-the-art DA metrics in previously mentioned tasks. Also, due to the
scarcity of data in the target domain, most DA methods adapt from synthetic or other
domain data to real data.
TABLE 17. DA usage in various computer vision (CV) areas.
Most of the work on DA in CV is on two-dimensional (2D)
data, e.g., camera images, followed by 2D data with time, e.g.,
video images, followed by a focus on 3D, e.g., LiDAR (Light
Detection And Ranging). A survey on LiDAR perception
by [214] further captures deep DA techniques.
Table 17 provides a view of different CV tasks and key DA advances in those specific
tasks. DA for these tasks and techniques has found much use in industry (further
discussed in the section Industrial Applications), e.g., AI imaging is widely used in the
healthcare sector, while LiDAR DA is used in Advanced Driver Assistance Systems (ADAS) or
autonomous driving. These techniques are also used in situations where the data is
derived from different foundations (geographic, genetic, cultural, age, etc.).
TABLE 18. DA usage in various natural language processing (NLP) areas.
TABLE 19. DA usage in speech areas.
TABLE 20. DA usage in time-series tasks.
proliferated usage in NLP and CV is used in DA (DAF [137], Adversarial Memory Network
(AMN) (Attention + DANN + SCL MemNet) [55], Federated domain adaptation [160]). Also, the
focus is to include two or more DA techniques together. Wilson and Cook [356] mention the
combination of the teacher-student network [84] and AutoDIAL [110]; AutoDIAL can replace
the student network to understand the degree of adaptation. Similarly, GAN, a data
augmentation technique, can replace stochastic data augmentation in [84]. This
augmentation of multiple techniques or methods can be useful in multimodal DA.
• Multi-domain support: To support multiple domains in DA, techniques or methods are
required that are robust and/or can deal with larger domain shifts. StarGAN [357] looks
at multi-domain image-to-image translation and can be used in multi-domain adaptation.
• Cross-modal application: DA techniques or methods primarily developed for one modality
(say text) can be used in another modality (say an image). It is observed currently that
other than adversarial methods, not many methods are used across modalities.
• Supporting more real-world scenarios: DA researchers are looking to support more
real-world scenarios. These scenarios are inspired by data (unavailability, label-set
difference, etc.) and environmental (restricted, sequential, etc.) limitations. The
current research endeavor is to support a larger domain shift in DA when applied to
real-world applications. WILDS Datasets [358] provide 10 curated real-world dataset
benchmarks having a varied range of domain shifts. Further, DA provides the potential for
reinforcement learning applications to learn in a simulated environment and then apply
the policy learned to the real-world environment. More industrial applications as part of
Industry 4.0 can be supported by DA. For example, IoT devices or edge devices are quite
varied, and they are installed in varied environments / used by varied users; this
variation provides good ground to use DA.
• Use of more stable training approaches: Adversarial feature learning-based approaches
are still most utilized by researchers, even though the training is at times unstable in
practice and requires careful selection and tuning of parameters. However,
pseudo-learning-based approaches (including pseudo-learning based self-training) are
being adopted by researchers more and more based on their outperformance and training
stability. One drawback of pseudo-learning-based approaches is noise in pseudo labels,
which can lead to underperformance. Researchers are now looking to employ only the more
confident pseudo-predictions for training. Similarly, the use of the mean-teacher
strategy is on the rise, as the approach utilizes additional regularization or feature
matching strategies which improve the performance.
• Post-DA over pre-DA strategies: Post-DA techniques are becoming more common to improve
‘‘fallen’’ task accuracy. For example, Saunders and Byrne [355] used Elastic Weight
Consolidation (EWC) and a lattice-rescoring technique to prop up the ‘‘fallen’’ accuracy
(due to catastrophic forgetting during DA). However, pre-DA methods are not much found in
the literature. Incorporating pre-DA knowledge of domain gaps arising from data
processing (image processing techniques, text extraction techniques) may lead to a
performance increase. One possible way to incorporate this would be to use multi-level
constraints in adversarial-based approaches. Further research work undertaken in both
pre-DA and post-DA strategies would improve task accuracies.
TABLE 23. DA can be applied to a host of industrial use-cases.
• Removing bias for specific frameworks: Just like we
see a classification task bias for nearly all DA work,
there also exists a research bias for specific frame-
works. A case in point is object detection DA, where
nearly all the DA strategies focus on Faster RCNN.
Other frameworks like YOLO, SSD, and DETR must
also be evaluated for DA performance.
• Solving Industrial use-cases: DA has the potential
to solve many industrial AI use-cases that are not
implemented today because they lack economies of scale
across multiple locations, multiple cultures, multiple
demographics, etc., where large domain gaps occur
frequently. Table 23 provides a list of industrial
use-cases where DA would lead to enormous benefits
for the industry if applied.
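Two of the ideas in the list above, keeping only confident pseudo-predictions and the mean-teacher strategy, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an implementation from any of the surveyed papers: the function names, the 0.9 confidence threshold, and the 0.99 decay are hypothetical choices made here for the example.

```python
import numpy as np


def select_confident(probs, threshold=0.9):
    """Keep target samples whose top class probability reaches the threshold.

    probs: array of shape (n_samples, n_classes) with predicted probabilities.
    Returns the kept indices and their hard pseudo-labels.
    The threshold value is an illustrative assumption.
    """
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)


def ema_update(teacher, student, decay=0.99):
    """Mean-teacher style update: the teacher's weights follow an exponential
    moving average of the student's weights (dicts of name -> array)."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    target_probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]])
    idx, labels = select_confident(target_probs)
    print("kept samples:", idx, "pseudo-labels:", labels)

    teacher = {"w": np.array([1.0])}
    student = {"w": np.array([0.0])}
    print("teacher after update:", ema_update(teacher, student, decay=0.9))
```

Raising the threshold trades pseudo-label coverage for lower label noise, while a decay closer to 1 makes the teacher change more slowly and therefore act as a steadier target for the student.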
IX. CONCLUSION
There is an imperative need for deep networks to adapt to
multiple domains to reduce costs, increase application, and
be more human-like - the ultimate aim of artificial intelli-
gence. This paper explores the work done in DA in deep
neural networks (also known as deep DA) in multiple data
domains (computer vision, NLP, multimodal, speech, time-
series), reviews different methods and techniques, and men-
tions emerging datasets related to DA. This paper focuses on
applying DA in more practical settings, in various industries,
in the wild, and in real-world scenarios where the DA challenges
lie. We believe that research undertaken in the future research
frontiers mentioned above would greatly impact DA and AI as
a whole.
ACKNOWLEDGMENT
The authors would like to thank Dr. Prathosh, an Assistant
Professor of the Indian Institute of Science, for the insights
and perspective provided on domain adaptation in his course
‘‘Advanced Topics on Deep Learning.’’
REFERENCES
[1] C. Darwin, On the Origin of Species by Means of Natural Selection, or
Preservation of Favoured Races in the Struggle for Life. London, U.K.:
John Murray, 1859.
[2] D. Ulyanov, A. Vedaldi, and V. Lempitsky, ‘‘Instance normalization:
The missing ingredient for fast stylization,’’ 2016, arXiv:1607.08022.
[3] S. Ioffe and C. Szegedy, ‘‘Batch normalization: Accelerating deep net-
work training by reducing internal covariate shift,’’ in Proc. 32nd Int.
Conf. Mach. Learn., 2015, pp. 448–456.
[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial nets,’’ in
Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 1–9.
[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
L. Kaiser, and I. Polosukhin, ‘‘Attention is all you need,’’ in Proc. NIPS,
2017, pp. 1–15.
[6] S. J. Pan and Q. Yang, ‘‘A survey on transfer learning,’’ IEEE
Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[7] T. Tommasi, ‘‘Tutorial on domain adaptation,’’ in Proc. ECCV, 2020,
pp. 1–10.
[8] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, ‘‘Analysis of
representations for domain adaptation,’’ in Proc. NeurIPS, 2006, pp. 1–8.
[9] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and
J. W. Vaughan, ‘‘A theory of learning from different domains,’’ Mach.
Learn., vol. 79, nos. 1–2, pp. 151–175, May 2010.
[10] H. Zhao, R. T. Des Combes, Z. Kun, and G. Geoffrey, ‘‘On learning invariant representations for domain adaptation,’’ in Proc. Int. Conf. Mach. Learn., 2019, pp. 7523–7532.
[11] T. Le, K. Nguyen, N. Ho, H. Bui, and D. Phung, ‘‘On deep domain adaptation: Some theoretical understandings,’’ 2018, arXiv:1811.06199.
[12] G. Csurka, ‘‘Domain adaptation for visual applications: A comprehensive survey,’’ 2017, arXiv:1702.05374.
[13] S. J. Pan, J. T. Kwok, and Q. Yang, ‘‘Transfer learning via dimensionality reduction,’’ in Proc. 23rd AAAI Conf. Artif. Intell., 2008, pp. 677–682.
[14] K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. J. Smola, ‘‘Integrating structured biological data by kernel maximum mean discrepancy,’’ Bioinformatics, vol. 22, no. 14, pp. 49–57, 2006.
[15] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, ‘‘Domain adaptation via transfer component analysis,’’ IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[16] J. Blitzer, R. McDonald, and F. Pereira, ‘‘Domain adaptation with structural correspondence learning,’’ in Proc. Conf. Empirical Methods Natural Lang. Process., 2006, pp. 120–128.
[17] R. K. Ando and T. Zhang, ‘‘A framework for learning predictive structures from multiple tasks and unlabeled data,’’ J. Mach. Learn. Res., vol. 6, no. 11, pp. 185–1817, 2005.
[18] H. Daumé, ‘‘Frustratingly easy domain adaptation,’’ 2009, arXiv:0907.1815.
[19] R. Gopalan, R. Li, and R. Chellappa, ‘‘Unsupervised adaptation across domain shifts by generating intermediate data representations,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 11, pp. 2288–2302, Nov. 2014.
[20] B. Gong, Y. Shi, F. Sha, and K. Grauman, ‘‘Geodesic flow kernel for unsupervised domain adaptation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 2066–2073.
[21] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, ‘‘Unsupervised visual domain adaptation using subspace alignment,’’ in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 2960–2967.
[22] M. Long, G. Ding, J. Wang, J. Sun, Y. Guo, and P. S. Yu, ‘‘Transfer sparse coding for robust image representation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 407–414.
[23] J. D. Jong. (Oct. 2017). Transfer Learning: Domain Adaptation by Instance-Reweighting. Accessed: Sep. 18, 2021. [Online]. Available: https://johanndejong.wordpress.com/2017/10/15/transfer-learning-domain-adaptation-by-instance-reweighting/
[24] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, ‘‘Transfer joint matching for unsupervised domain adaptation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1410–1417.
[25] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu, ‘‘Boosting for transfer learning,’’ in Proc. 24th Int. Conf. Mach. Learn., Jun. 2007, pp. 1855–1862.
[26] S. Ruder, P. Ghaffari, and J. G. Breslin, ‘‘Knowledge adaptation: Teaching to adapt,’’ 2017, arXiv:1702.02052.
[27] B. Zadrozny, ‘‘Learning and evaluating classifiers under sample selection bias,’’ in Proc. 21st Int. Conf. Mach. Learn., 2004, p. 114.
[28] M. Sugiyama, S. Nakajima, H. Kashima, P. Buenau, and M. Kawanabe, ‘‘Direct importance estimation with model selection and its application to covariate shift adaptation,’’ in Proc. 20th Int. Conf. Neural Inf. Process. Syst., 2007, pp. 1–8.
[29] L. Duan, D. Xu, and I. W. Tsang, ‘‘Learning with augmented features for heterogeneous domain adaptation,’’ in Proc. 29th Int. Conf. Mach. Learn., 2012, pp. 1–12.
[30] B. Kulis, K. Saenko, and T. Darrell, ‘‘What you saw is not what you get: Domain adaptation using asymmetric kernel transforms,’’ in Proc. CVPR, Jun. 2011, pp. 1785–1792.
[31] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, ‘‘Domain separation networks,’’ in Proc. Neural Inf. Process. Syst., 2016, pp. 1–15.
[32] A. Blum and T. Mitchell, ‘‘Combining labeled and unlabeled data with co-training,’’ in Proc. 11th Annu. Conf. Comput. Learn. Theory, Jul. 1998, pp. 92–100.
[33] S. Ruder, ‘‘Neural transfer learning for natural language processing,’’ Ph.D. thesis, School Eng. Inform., Nat. Univ. Ireland, Galway, Ireland, 2019.
[34] X. Glorot, A. Bordes, and Y. Bengio, ‘‘Domain adaptation for large-scale sentiment classification: A deep learning approach,’’ in Proc. Int. Conf. Mach. Learn. (ICML), 2011, pp. 1–11.
[35] M. Wang and W. Deng, ‘‘Deep visual domain adaptation: A survey,’’ Neurocomputing, vol. 312, pp. 135–153, Oct. 2018.
[36] B. Sun and K. Saenko, ‘‘Deep CORAL: Correlation alignment for deep domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 443–450.
[37] B. Sun, J. Feng, and K. Saenko, ‘‘Return of frustratingly easy domain adaptation,’’ in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 1–8.
[38] G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann, ‘‘Contrastive adaptation network for unsupervised domain adaptation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 4893–4902.
[39] M. Long, Y. Cao, J. Wang, and M. I. Jordan, ‘‘Learning transferable features with deep adaptation networks,’’ in Proc. Int. Conf. Mach. Learn. (ICML), 2015, pp. 97–105.
[40] M. Long, H. Zhu, J. Wang, and M. I. Jordan, ‘‘Deep transfer learning with joint adaptation networks,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 1–6.
[41] B. B. Damodaran, B. Kellenberger, R. Flamary, D. Tuia, and N. Courty, ‘‘DeepJDOT: Deep joint distribution optimal transport for unsupervised domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 447–463.
[42] N. Courty, R. Flamary, A. Habrard, and A. Rakotomamonjy, ‘‘Joint distribution optimal transportation for domain adaptation,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1–12.
[43] F. Zhuang, X. Cheng, P. Luo, S. J. Pan, and Q. He, ‘‘Supervised representation learning: Transfer learning with deep autoencoders,’’ in Proc. 24th Int. Joint Conf. Artif. Intell., 2015, pp. 1–7.
[44] C.-Y. Lee, T. Batra, M. H. Baig, and D. Ulbricht, ‘‘Sliced Wasserstein discrepancy for unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10285–10295.
[45] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, ‘‘Deep domain confusion: Maximizing for domain invariance,’’ 2014, arXiv:1412.3474.
[46] A. R. Kashyap, D. Hazarika, M.-Y. Kan, and R. Zimmermann, ‘‘Domain divergences: A survey and empirical analysis,’’ 2020, arXiv:2010.12198.
[47] Y. Pan, T. Yao, Y. Li, Y. Wang, C.-W. Ngo, and T. Mei, ‘‘Transferrable prototypical networks for unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2239–2247.
[48] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, ‘‘Adversarial discriminative domain adaptation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7167–7176.
[49] Y. Ganin and V. Lempitsky, ‘‘Unsupervised domain adaptation by backpropagation,’’ in Proc. Int. Conf. Mach. Learn., 2015, pp. 1180–1189.
[50] J. Hoffman, D. Wang, F. Yu, and T. Darrell, ‘‘FCNs in the wild: Pixel-level adversarial and constraint-based adaptation,’’ 2016, arXiv:1612.02649.
[51] H. Huang, Q. Huang, and P. Krahenbuhl, ‘‘Domain transfer through deep activation matching,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 590–605.
[52] Y.-B. Kim, K. Stratos, and D. Kim, ‘‘Adversarial adaptation of synthetic or stale data,’’ in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 1297–1307.
[53] B. Xu, M. Mohtarami, and J. Glass, ‘‘Adversarial domain adaptation for stance detection,’’ 2019, arXiv:1902.02401.
[54] J. Shen, Y. Qu, W. Zhang, and Y. Yu, ‘‘Wasserstein distance guided representation learning for domain adaptation,’’ in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 1–6.
[55] Z. Li, Y. Zhang, Y. Wei, Y. Wu, and Q. Yang, ‘‘End-to-end adversarial memory network for cross-domain sentiment classification,’’ in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 2237–2243.
[56] A. Naik and C. Rose, ‘‘Towards open domain event trigger identification using adversarial domain adaptation,’’ in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 1–7.
[57] M. Yasunaga, J. Kasai, and D. Radev, ‘‘Robust multilingual part-of-speech tagging via adversarial training,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2018, pp. 1–15.
[58] M. Long, Z. Cao, J. Wang, and M. I. Jordan, ‘‘Conditional adversarial domain adaptation,’’ in Proc. NeurIPS, 2018, pp. 1–13.
[59] M. Arjovsky, S. Chintala, and L. Bottou, ‘‘Wasserstein GAN,’’ 2017, arXiv:1701.07875.
[60] Y. Du, Z. Tan, Q. Chen, X. Zhang, Y. Yao, and C. Wang, ‘‘Dual adversarial domain adaptation,’’ 2020, arXiv:2001.00153.
[61] H. Rangwani, S. K. Aithal, M. Mishra, A. Jain, and R. V. Babu, ‘‘A closer look at smoothness in domain adversarial training,’’ in Proc. 39th Int. Conf. Mach. Learn. (ICML), 2022, pp. 1–22.
[62] G. Shi, C. Feng, L. Huang, B. Zhang, H. Ji, L. Liao, and H. Huang, ‘‘Genre separation network with adversarial training for cross-genre relation extraction,’’ in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 1018–1023.
[63] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, ‘‘Unpaired image-to-image translation using cycle-consistent adversarial networks,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2223–2232.
[64] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell, ‘‘CyCADA: Cycle-consistent adversarial domain adaptation,’’ in Proc. Int. Conf. Mach. Learn. (ICML), 2018, pp. 1989–1998.
[65] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li, ‘‘Deep reconstruction-classification networks for unsupervised domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis., 2016, pp. 597–613.
[66] Y. Ziser and R. Reichart, ‘‘Neural structural correspondence learning for domain adaptation,’’ in Proc. 21st Conf. Comput. Natural Lang. Learn., 2017, pp. 1–11.
[67] Y. Ziser and R. Reichart, ‘‘Pivot based language modeling for improved neural domain adaptation,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2018, pp. 1241–1251.
[68] Y. Ziser and R. Reichart, ‘‘Task refinement learning for improved accuracy and stability of unsupervised domain adaptation,’’ in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 5895–5906.
[69] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, ‘‘Learning to discover cross-domain relations with generative adversarial networks,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 1–10.
[70] S. Sun, H. Shi, and Y. Wu, ‘‘A survey of multi-source domain adaptation,’’ Inf. Fusion, vol. 24, pp. 84–92, Jul. 2015.
[71] S. Zhao, B. Li, C. Reed, P. Xu, and K. Keutzer, ‘‘Multi-source domain adaptation in the deep learning era: A systematic survey,’’ 2020, arXiv:2002.12169.
[72] X. Peng, Q. Bai, X. Xia, Z. Huang, K. Saenko, and B. Wang, ‘‘Moment matching for multi-source domain adaptation,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1406–1415.
[73] S. Zhao, B. Li, X. Yue, Y. Gu, P. Xu, R. Hu, H. Chai, and K. Keutzer, ‘‘Multi-source domain adaptation for semantic segmentation,’’ 2019, arXiv:1910.12181.
[74] P. Russo, T. Tommasi, and B. Caputo, ‘‘Towards multi-source adaptive semantic segmentation,’’ in Proc. Int. Conf. Image Anal. Process., 2019, pp. 292–301.
[75] S.-A. Rebuffi, H. Bilen, and A. Vedaldi, ‘‘Learning multiple visual domains with residual adapters,’’ 2017, arXiv:1705.08045.
[76] Y. Yang and T. M. Hospedales, ‘‘A unified perspective on multi-domain and multi-task learning,’’ 2014, arXiv:1412.7489.
[77] H. Guo, R. Pasunuru, and M. Bansal, ‘‘Multi-source domain adaptation for text classification via DistanceNet-bandits,’’ in Proc. AAAI Conf. Artif. Intell., 2020, pp. 7830–7838.
[78] J. Guo, D. J. Shah, and R. Barzilay, ‘‘Multi-source domain adaptation with mixture of experts,’’ 2018, arXiv:1809.02256.
[79] J. Zhu, N. Chen, and C. Shen, ‘‘A new multiple source domain adaptation fault diagnosis method between different rotating machines,’’ IEEE Trans. Ind. Informat., vol. 17, no. 7, pp. 4788–4797, Jul. 2021.
[80] Y. Xia, C. Shen, D. Wang, Y. Shen, W. Huang, and Z. Zhu, ‘‘Moment matching-based intraclass multisource domain adaptation network for bearing fault diagnosis,’’ Mech. Syst. Signal Process., vol. 168, Apr. 2022, Art. no. 108697.
[81] B. Gholami, P. Sahu, O. Rudovic, K. Bousmalis, and V. Pavlovic, ‘‘Unsupervised multi-target domain adaptation: An information theoretic approach,’’ IEEE Trans. Image Process., vol. 29, pp. 3993–4002, 2020.
[82] T. Isobe, X. Jia, S. Chen, J. He, Y. Shi, J. Liu, H. Lu, and S. Wang, ‘‘Multi-target domain adaptation with collaborative consistency learning,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 8187–8196.
[84] G. French, M. Mackiewicz, and M. Fisher, ‘‘Self-ensembling for visual domain adaptation,’’ in Proc. Int. Conf. Learn. Represent. (ICLR), 2018, pp. 1–20.
[85] A. Tarvainen and H. Valpola, ‘‘Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,’’ 2017, arXiv:1703.01780.
[86] S. Desai, B. Sinno, A. Rosenfeld, and J. J. Li, ‘‘Adaptive ensembling: Unsupervised domain adaptation for political document analysis,’’ in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process. (EMNLP-IJCNLP), 2019, pp. 1–13.
[87] K. Saito, Y. Ushiku, and T. Harada, ‘‘Asymmetric tri-training for unsupervised domain adaptation,’’ in Proc. 34th Int. Conf. Mach. Learn., 2017, pp. 2988–2997.
[88] Y. Zou, Z. Yu, B. V. K. Kumar, and J. Wang, ‘‘Unsupervised domain adaptation for semantic segmentation via class-balanced self-training,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 289–305.
[89] F. Qi, X. Yang, and C. Xu, ‘‘A unified framework for multimodal domain adaptation,’’ in Proc. 26th ACM Int. Conf. Multimedia, Oct. 2018, pp. 429–437.
[90] A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach, ‘‘Multimodal compact bilinear pooling for visual question answering and visual grounding,’’ 2016, arXiv:1606.01847.
[91] J.-H. Kim, K.-W. On, W. Lim, J. Kim, J.-W. Ha, and B.-T. Zhang, ‘‘Hadamard product for low-rank bilinear pooling,’’ 2016, arXiv:1610.04325.
[92] H. Ben-younes, R. Cadene, M. Cord, and N. Thome, ‘‘MUTAN: Multimodal tucker fusion for visual question answering,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2612–2620.
[93] H. Zhang, S. Qian, Q. Fang, and C. Xu, ‘‘Multimodal disentangled domain adaption for social media event rumor detection,’’ IEEE Trans. Multimedia, vol. 23, pp. 4441–4454, 2020.
[94] J. Munro and D. Damen, ‘‘Multi-modal domain adaptation for fine-grained action recognition,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 122–132.
[95] H. Li, Y. Kim, C.-H. Kuo, and S. Narayanan, ‘‘Acted vs. improvised: Domain adaptation for elicitation approaches in audio-visual emotion recognition,’’ 2021, arXiv:2104.01978.
[96] X. Shu, G.-J. Qi, J. Tang, and J. Wang, ‘‘Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation,’’ in Proc. 23rd ACM Int. Conf. Multimedia, 2015, pp. 35–44.
[97] Z. Ding, M. Shao, and Y. Fu, ‘‘Missing modality transfer learning via latent low-rank constraint,’’ IEEE Trans. Image Process., vol. 24, no. 11, pp. 4322–4334, Nov. 2015.
[98] C. Athanasiadis, E. Hortal, and S. Asteriadis, ‘‘Audio–visual domain adaptation using conditional semi-supervised generative adversarial networks,’’ Neurocomputing, vol. 397, pp. 331–344, Jul. 2020.
[99] G. Shafer and V. Vovk, ‘‘A tutorial on conformal prediction,’’ J. Mach. Learn. Res., vol. 9, no. 3, pp. 1–51, 2008.
[100] S. Seo, S. Na, and J. Kim, ‘‘HMTL: Heterogeneous modality transfer learning for audio-visual sentiment analysis,’’ IEEE Access, vol. 8, pp. 140426–140437, 2020.
[101] A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jégou, ‘‘Word translation without parallel data,’’ in Proc. ICLR, 2018, pp. 1–11.
[102] A. Søgaard, S. Ruder, and I. Vulić, ‘‘On the limitations of unsupervised bilingual dictionary induction,’’ in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 1–11.
[103] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, ‘‘Unsupervised cross-lingual representation learning at scale,’’ 2019, arXiv:1911.02116.
[104] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, ‘‘Generative adversarial text to image synthesis,’’ in Proc. Int. Conf. Mach. Learn., 2016, pp. 1060–1069.
[105] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas, ‘‘StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 5907–5915.
[83] L. T. Nguyen-Meidine, A. Belal, M. Kiran, J. Dolz, L.-A. Blais-Morin, [106] J. Lei Ba, J. Ryan Kiros, and G. E. Hinton, ‘‘Layer normalization,’’ 2016,
and E. Granger, ‘‘Unsupervised multi-target domain adaptation through arXiv:1607.06450.
knowledge distillation,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. [107] Y. Wu and K. He, ‘‘Group normalization,’’ in Proc. Eur. Conf. Comput.
(WACV), Jan. 2021, pp. 1339–1347. Vis. (ECCV), 2018, pp. 3–19.
[108] W.-G. Chang, T. You, S. Seo, S. Kwak, and B. Han, ‘‘Domain-specific batch normalization for unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 7354–7362.
[109] Y. Li, N. Wang, J. Shi, X. Hou, and J. Liu, ‘‘Adaptive batch normalization for practical domain adaptation,’’ Pattern Recognit., vol. 80, pp. 109–117, Aug. 2018.
[110] F. M. Carlucci, L. Porzi, B. Caputo, E. Ricci, and S. R. Bulo, ‘‘AutoDIAL: Automatic domain alignment layers,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 5067–5075.
[111] S. Roy, A. Siarohin, E. Sangineto, S. R. Bulo, N. Sebe, and E. Ricci, ‘‘Unsupervised domain adaptation using feature-whitening and consensus loss,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 9471–9480.
[112] F. M. Carlucci, A. D’Innocente, S. Bucci, B. Caputo, and T. Tommasi, ‘‘Domain generalization by solving jigsaw puzzles,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2229–2238.
[113] J. Xu, L. Xiao, and A. M. López, ‘‘Self-supervised domain adaptation for computer vision tasks,’’ IEEE Access, vol. 7, pp. 156694–156706, 2019.
[114] D. Kim, K. Saito, T.-H. Oh, B. A. Plummer, S. Sclaroff, and K. Saenko, ‘‘Cross-domain self-supervised learning for domain adaptation with few source labels,’’ 2020, arXiv:2003.08264.
[115] A. Ramponi and B. Plank, ‘‘Neural unsupervised domain adaptation in NLP—A survey,’’ 2020, arXiv:2006.00632.
[116] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ‘‘BERT: Pre-training of deep bidirectional transformers for language understanding,’’ 2018, arXiv:1810.04805.
[117] I. Beltagy, K. Lo, and A. Cohan, ‘‘SciBERT: A pretrained language model for scientific text,’’ in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process. (EMNLP-IJCNLP), 2019, pp. 1–6.
[118] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang, ‘‘BioBERT: A pre-trained biomedical language representation model for biomedical text mining,’’ Bioinformatics, vol. 36, no. 4, pp. 1234–1240, Sep. 2019.
[119] X. Han and J. Eisenstein, ‘‘Unsupervised domain adaptation of contextualized embeddings for sequence labeling,’’ in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process. (EMNLP-IJCNLP), 2019, pp. 1–12.
[120] A. Conneau, G. Lample, R. Rinott, A. Williams, S. R. Bowman, H. Schwenk, and V. Stoyanov, ‘‘XNLI: Evaluating cross-lingual sentence representations,’’ 2018, arXiv:1809.05053.
[121] J. Howard and S. Ruder, ‘‘Universal language model fine-tuning for text classification,’’ in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 1–12.
[122] S. Merity, N. S. Keskar, and R. Socher, ‘‘Regularizing and optimizing LSTM language models,’’ 2017, arXiv:1708.02182.
[123] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, ‘‘How transferable are features in deep neural networks?’’ in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 1–12.
[124] S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith, ‘‘Don’t stop pretraining: Adapt language models to domains and tasks,’’ in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 1–19.
[125] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, ‘‘RoBERTa: A robustly optimized BERT pretraining approach,’’ 2019, arXiv:1907.11692.
[126] J. Phang, T. Févry, and S. R. Bowman, ‘‘Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks,’’ 2018, arXiv:1811.01088.
[127] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, ‘‘Deep contextualized word representations,’’ 2018, arXiv:1802.05365.
[128] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. (2018). Improving Language Understanding by Generative Pre-Training. Accessed: Sep. 18, 2021. [Online]. Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
[129] Y. Zhou and S. Goldman, ‘‘Democratic co-learning,’’ in Proc. 16th IEEE Int. Conf. Tools Artif. Intell., 2004, pp. 594–602.
[130] S. Ruder and B. Plank, ‘‘Strong baselines for neural semi-supervised learning under domain shift,’’ in Proc. ACL, 2018, pp. 1–11.
[131] P. Gimeno, D. Ribas, A. Ortega, A. Miguel, and E. Lleida, ‘‘Unsupervised adaptation of deep speech activity detection models to unseen domains,’’ Appl. Sci., vol. 12, no. 4, p. 1832, Feb. 2022.
[132] A. P. Prathosh and A. G. Ramakrishnan, ‘‘Unsupervised domain adaptation schemes for building ASR in low-resource languages,’’ in Proc. IEEE Autom. Speech Recognit. Understand. Workshop (ASRU), Dec. 2021, pp. 342–349.
[133] J.-H. Park, M. Oh, and H.-M. Park, ‘‘Unsupervised speech domain adaptation based on disentangled representation learning for robust speech recognition,’’ 2019, arXiv:1904.06086.
[134] T. Asami, R. Masumura, Y. Yamaguchi, H. Masataki, and Y. Aono, ‘‘Domain adaptation of DNN acoustic models using knowledge distillation,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), New Orleans, LA, USA, Mar. 2017, pp. 5185–5189.
[135] Z. Meng, J. Li, Y. Gaur, and Y. Gong, ‘‘Domain adaptation via teacher-student learning for end-to-end speech recognition,’’ in Proc. IEEE Autom. Speech Recognit. Understand. Workshop (ASRU), Dec. 2019, pp. 268–275.
[136] X.-L. Zhang, ‘‘Unsupervised domain adaptation for deep neural network based voice activity detection,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2014, pp. 6864–6868.
[137] X. Jin, Y. Park, D. C. Maddix, H. Wang, and Y. Wang, ‘‘Domain adaptation for time series forecasting via attention sharing,’’ 2021, arXiv:2102.06828.
[138] Y. Shi, X. Ying, and J. Yang, ‘‘Deep unsupervised domain adaptation with time series sensor data: A survey,’’ Sensors, vol. 22, no. 15, p. 5507, Jul. 2022.
[139] Q. Wang, C. Taal, and O. Fink, ‘‘Integrating expert knowledge with domain adaptation for unsupervised fault diagnosis,’’ IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[140] A. R. Sanabria, F. Zambonelli, S. Dobson, and J. Ye, ‘‘ContrasGAN: Unsupervised domain adaptation in human activity recognition via adversarial and contrastive learning,’’ Pervas. Mobile Comput., vol. 78, Dec. 2021, Art. no. 101477.
[141] B. Yang, Q. Li, L. Chen, C. Shen, and S. Natarajan, ‘‘Bearing fault diagnosis based on multilayer domain adaptation,’’ Shock Vib., vol. 2020, pp. 1–11, Sep. 2020.
[142] S. Motiian, Q. Jones, S. M. Iranmanesh, and G. Doretto, ‘‘Few-shot adversarial domain adaptation,’’ 2017, arXiv:1711.02536.
[143] A. Zhao, M. Ding, Z. Lu, T. Xiang, Y. Niu, J. Guan, and J.-R. Wen, ‘‘Domain-adaptive few-shot learning,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 1390–1399.
[144] X. Yue, Z. Zheng, S. Zhang, Y. Gao, T. Darrell, K. Keutzer, and A. S. Vincentelli, ‘‘Prototypical cross-domain self-supervised learning for few-shot unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13834–13844.
[145] K. C. Peng, Z. Wu, and J. Ernst, ‘‘Zero-shot deep domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 764–781.
[146] E. Kodirov, T. Xiang, Z. Fu, and S. Gong, ‘‘Unsupervised domain adaptation for zero-shot learning,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 2452–2460.
[147] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada, ‘‘Open set domain adaptation by backpropagation,’’ in Proc. Eur. Conf. Comput. Vis., 2018, pp. 153–168.
[148] H. Liu, Z. Cao, M. Long, J. Wang, and Q. Yang, ‘‘Separate to adapt: Open set domain adaptation via progressive separation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2927–2936.
[149] X. Peng, B. Usman, K. Saito, N. Kaushik, J. Hoffman, and K. Saenko, ‘‘Syn2Real: A new benchmark for synthetic-to-real visual domain adaptation,’’ 2018, arXiv:1806.09755.
[150] Y. Pan, T. Yao, Y. Li, C.-W. Ngo, and T. Mei, ‘‘Exploring category-agnostic clusters for open-set domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 13867–13875.
[151] Z. Cao, L. Ma, M. Long, and J. Wang, ‘‘Partial adversarial domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 135–150.
[152] Z. Cao, K. You, M. Long, J. Wang, and Q. Yang, ‘‘Learning to transfer examples for partial domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2985–2994.
[153] K. You, M. Long, Z. Cao, J. Wang, and M. I. Jordan, ‘‘Universal domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2720–2729.
[154] J. N. Kundu, N. Venkat, M. V. Rahul, and R. V. Babu, ‘‘Universal source-free domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 4544–4553.
[155] M. Mancini, H. Karaoguz, E. Ricci, P. Jensfelt, and B. Caputo, ‘‘Kitting in the wild through online domain adaptation,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2018, pp. 1103–1109.
[156] M. Mancini, S. R. Bulo, B. Caputo, and E. Ricci, ‘‘AdaGraph: Unifying predictive and continuous domain adaptation through graphs,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 6568–6577.
[157] A. Bobu, E. Tzeng, J. Hoffman, and T. Darrell, ‘‘Adapting to continuously shifting domains,’’ in Proc. 6th Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–4.
[158] H. Wang, H. He, and D. Katabi, ‘‘Continuously indexed domain adaptation,’’ in Proc. Int. Conf. Mach. Learn. (ICML), 2020, pp. 1–14.
[159] Z. Liu, Z. Miao, X. Pan, X. Zhan, D. Lin, S. X. Yu, and B. Gong, ‘‘Open compound domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1–10.
[160] X. Peng, Z. Huang, Y. Zhu, and K. Saenko, ‘‘Federated adversarial domain adaptation,’’ 2019, arXiv:1911.02054.
[161] S. Bucci, A. D’Innocente, Y. Liao, F. M. Carlucci, B. Caputo, and T. Tommasi, ‘‘Self-supervised learning across domains,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5516–5528, Sep. 2022.
[162] S. Bucci, A. D’Innocente, and T. Tommasi, ‘‘Tackling partial domain adaptation with self-supervision,’’ in Proc. Int. Conf. Image Anal. Process., 2019, pp. 70–81.
[163] S. Bucci, M. R. Loghmani, and T. Tommasi, ‘‘On the effectiveness of image rotation for open set domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis., 2020, pp. 422–438.
[164] D. Li and T. Hospedales, ‘‘Online meta-learning for multi-source and semi-supervised domain adaptation,’’ in Proc. Eur. Conf. Comput. Vis., 2020, pp. 382–403.
[165] R. Ribeiro, A. Abad, and J. Lopes, ‘‘Domain adaptation in dialogue systems using transfer and meta-learning,’’ 2021, arXiv:2102.11146.
[166] D. Li, Y. Yang, Y.-Z. Song, and T. M. Hospedales, ‘‘Learning to generalize: Meta-learning for domain generalization,’’ in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 1–8.
[167] Y. Balaji, S. Sankaranarayanan, and R. Chellappa, ‘‘MetaReg: Towards domain generalization using meta-regularization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 1–8.
[168] J.-C. Su, Y.-H. Tsai, K. Sohn, B. Liu, S. Maji, and M. Chandraker, ‘‘Active adversarial domain adaptation,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2020, pp. 739–748.
[169] V. Prabhu, A. Chandrasekaran, K. Saenko, and J. Hoffman, ‘‘Active domain adaptation via clustering uncertainty-weighted embeddings,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 8505–8514.
[170] Y. Kim and C. Kim, ‘‘Semi-supervised domain adaptation via selective pseudo labeling and progressive self-training,’’ in Proc. 25th Int. Conf. Pattern Recognit. (ICPR), Jan. 2021, pp. 1059–1066.
[171] K. Wang, N. Thakur, N. Reimers, and I. Gurevych, ‘‘GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2022, pp. 1–16.
[172] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[173] J. Wang, J. Chen, J. Lin, L. Sigal, and C. W. de Silva, ‘‘Discriminative feature alignment: Improving transferability of unsupervised domain adaptation by Gaussian-guided latent alignment,’’ Pattern Recognit., vol. 116, Aug. 2021, Art. no. 107943.
[174] S. Xie, Z. Zheng, L. Chen, and C. Chen, ‘‘Learning semantic representations for unsupervised domain adaptation,’’ in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 5423–5432.
[175] C. Chen, W. Xie, W. Huang, Y. Rong, X. Ding, Y. Huang, T. Xu, and J. Huang, ‘‘Progressive feature alignment for unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 627–636.
[176] S. Lee, S. Cho, and S. Im, ‘‘DRANet: Disentangling representation and adaptation networks for unsupervised cross-domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 15252–15261.
[177] J. J. Hull, ‘‘A database for handwritten text recognition research,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 550–554, May 1994.
[178] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, ‘‘Maximum classifier discrepancy for unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3723–3732.
[179] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, ‘‘Reading digits in natural images with unsupervised feature learning,’’ in Proc. NIPS Workshop Deep Learn. Unsupervised, 2011, pp. 1–9.
[180] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, ‘‘Adapting visual category models to new domains,’’ in Proc. Eur. Conf. Comput. Vis., 2010, pp. 213–226.
[181] J. Na, H. Jung, H. J. Chang, and W. Hwang, ‘‘FixBi: Bridging domain spaces for unsupervised domain adaptation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 1094–1103.
[182] S. Sankaranarayanan, Y. Balaji, C. D. Castillo, and R. Chellappa, ‘‘Generate to adapt: Aligning domains using generative adversarial networks,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8503–8512.
[183] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, ‘‘Deep hashing network for unsupervised domain adaptation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5018–5027.
[184] J. Liang, D. Hu, and J. Feng, ‘‘Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation,’’ 2020, arXiv:2002.08546.
[185] Q. Wang and T. P. Breckon, ‘‘Unsupervised domain adaptation via structured prediction based selective pseudo-labeling,’’ in Proc. AAAI Conf. Artif. Intell., 2020, pp. 6243–6250.
[186] B. Gong, K. Grauman, and F. Sha, ‘‘Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation,’’ in Proc. 30th Int. Conf. Mach. Learn., 2013, pp. 222–230.
[187] H.-Z. Feng, Z. You, M. Chen, T. Zhang, M. Zhu, F. Wu, C. Wu, and W. Chen, ‘‘KD3A: Unsupervised multi-source decentralized domain adaptation via knowledge distillation,’’ 2020, arXiv:2011.09757.
[188] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, ‘‘Microsoft COCO: Common objects in context,’’ in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[189] E. Real, J. Shlens, S. Mazzocchi, X. Pan, and V. Vanhoucke, ‘‘YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5296–5305.
[190] H. Tang and K. Jia, ‘‘Discriminative adversarial domain adaptation,’’ in Proc. AAAI Conf. Artif. Intell., 2020, pp. 5940–5947.
[191] Q. Cai, Y. Pan, C.-W. Ngo, X. Tian, L. Duan, and T. Yao, ‘‘Exploring object relation in mean teacher for cross-domain detection,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11457–11466.
[192] X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, ‘‘VisDA: The visual domain adaptation challenge,’’ 2017, arXiv:1710.06924.
[193] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, ‘‘The cityscapes dataset for semantic urban scene understanding,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 3213–3223.
[194] L. Ming-Yu and T. Oncel, ‘‘Coupled generative adversarial networks,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016, pp. 1–13.
[195] Q. Zhou, Q. Gu, J. Pang, X. Lu, and L. Ma, ‘‘Self-adversarial disentangling for specific domain adaptation,’’ 2021, arXiv:2108.03553.
[196] C. Sakaridis, D. Dai, and L. Van Gool, ‘‘Semantic foggy scene understanding with synthetic data,’’ Int. J. Comput. Vis., vol. 126, no. 9, pp. 973–992, 2018.
[197] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, ‘‘The German traffic sign recognition benchmark: A multi-class classification competition,’’ in Proc. Int. Joint Conf. Neural Netw., Jul. 2011, pp. 1453–1460.
[198] S. R. Richter, V. Vineet, S. Roth, and V. Koltun, ‘‘Playing for data: Ground truth from computer games,’’ in Proc. Eur. Conf. Comput. Vis., 2016, pp. 102–118.
[199] P. Zhang, B. Zhang, T. Zhang, D. Chen, Y. Wang, and F. Wen, ‘‘Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 12414–12424.
[200] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, ‘‘The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 3234–3243.
[201] N. Araslanov and S. Roth, ‘‘Self-supervised augmentation consistency for adapting semantic segmentation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 15384–15394.
[202] Y.-H. Chen, W.-Y. Chen, Y.-T. Chen, B.-C. Tsai, Y.-C.-F. Wang, and M. Sun, ‘‘No more discrimination: Cross city adaptation of road scene segmenters,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1992–2001.
[203] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, ‘‘HMDB: A large video database for human motion recognition,’’ in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 2556–2563.
[204] M.-H. Chen, Z. Kira, G. Alregib, J. Yoo, R. Chen, and J. Zheng, ‘‘Temporal attentive alignment for large-scale video domain adaptation,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 6321–6330.
[205] K. Soomro, A. R. Zamir, and M. Shah, ‘‘UCF101: A dataset of 101 human actions classes from videos in the wild,’’ 2012, arXiv:1212.0402.
[206] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab, ‘‘Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes,’’ in Proc. Asian Conf. Comput. Vis., 2012, pp. 548–562.
[207] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, ‘‘Appearance-based gaze estimation in the wild,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 4511–4520.
[208] Z. Guo, Z. Yuan, C. Zhang, W. Chi, Y. Ling, and S. Zhang, ‘‘Domain adaptation gaze estimation by embedding with prediction consistency,’’ in Proc. Asian Conf. Comput. Vis., 2020, pp. 1–16.
[209] K. A. F. Mora, F. Monay, and J.-M. Odobez, ‘‘EYEDIAP: A database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras,’’ in Proc. Symp. Eye Tracking Res. Appl., Mar. 2014, pp. 255–258.
[210] P. Jiang and S. Saripalli, ‘‘LiDARNet: A boundary-aware domain adaptation model for point cloud semantic segmentation,’’ 2020, arXiv:2003.01174.
[211] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, ‘‘SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9297–9307.
[212] Y. Pan, B. Gao, J. Mei, S. Geng, C. Li, and H. Zhao, ‘‘SemanticPOSS: A point cloud dataset with large quantity of dynamic instances,’’ in Proc. IEEE Intell. Vehicles Symp. (IV), Oct. 2020, pp. 687–693.
[213] A. Carballo, J. Lambert, A. Monrroy, D. Wong, P. Narksri, Y. Kitsukawa, E. Takeuchi, S. Kato, and K. Takeda, ‘‘LIBRE: The multiple 3D LiDAR dataset,’’ in Proc. IEEE Intell. Vehicles Symp. (IV), Oct. 2020, pp. 1094–1101.
[214] L. T. Triess, M. Dreissig, C. B. Rist, and J. M. Zollner, ‘‘A survey on deep domain adaptation for LiDAR perception,’’ in Proc. IEEE Intell. Vehicles Symp. Workshops (IV Workshops), Jul. 2021, pp. 350–357.
[215] Waymo Open Dataset, Waymo, Mountain View, CA, USA, Jun. 2022.
[216] B. Caine, R. Roelofs, V. Vasudevan, J. Ngiam, Y. Chai, Z. Chen, and J. Shlens, ‘‘Pseudo-labeling for scalable 3D object detection,’’ 2021, arXiv:2103.02093.
[217] R. He and J. McAuley, ‘‘Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,’’ in Proc. 25th Int. Conf. World Wide Web, Apr. 2016, pp. 1–11.
[218] Q. Nguyen. (2015). Skytrax Reviews. Accessed: Sep. 18, 2021. [Online]. Available: https://github.com/quankiquanki/skytrax-reviews-dataset
[219] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi, ‘‘Defending against neural fake news,’’ 2019, arXiv:1905.12616.
[220] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, ‘‘Learning word vectors for sentiment analysis,’’ in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics, Hum. Lang. Technol., 2011, pp. 142–150.
[221] J. Thorne, A. Vlachos, C. Christodoulopoulos, and A. Mittal, ‘‘FEVER: A large-scale dataset for fact extraction and verification,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2018, pp. 1–20.
[222] M. Davies, ‘‘The corpus of contemporary American English: 450 million words, 1990-present,’’ Brigham Young Univ., Provo, UT, USA, Tech. Rep., 2008.
[223] E. Sandhu. (2008). The New York Times Annotated Corpus LDC2008T19 Linguistic Data Consortium. Accessed: Sep. 18, 2021. [Online]. Available: https://catalog.ldc.upenn.edu/LDC2008T19
[224] Y. Wu, D. Bamman, and S. Russell, ‘‘Adversarial training for relation extraction,’’ in Proc. Conf. Empirical Methods Natural Lang. Process., Copenhagen, Denmark, 2017, pp. 1778–1783.
[225] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, ‘‘Building a large annotated corpus of English: The Penn Treebank,’’ Comput. Linguistics, vol. 19, no. 2, pp. 313–330, 1993.
[226] J. Nivre, Ž. Agić, M. J. Aranzabe, M. Asahara, A. Atutxa, M. Ballesteros, J. Bauer, K. Bengoetxea, R. A. Bhat, C. Bosco, S. Bowman, G. G. A. Celano and M. Connor. (Nov. 15, 2015). Universal Dependencies. Accessed: Sep. 18, 2021. [Online]. Available: http://hdl.handle.net/11234/1-1548
[227] S. Petrov and R. McDonald, ‘‘Overview of the 2012 shared task on parsing the web,’’ Google, Tech. Rep., 2012.
[228] C. Galves, ‘‘The Tycho Brahe corpus of historical Portuguese: Methodology and results,’’ Linguistic Variation, vol. 18, no. 1, pp. 49–73, 2018.
[229] Y. Yang and J. Eisenstein, ‘‘Fast easy unsupervised domain adaptation with marginalized structured dropout,’’ in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, 2014, pp. 538–544.
[230] A. Kroch, B. Santorini, and L. Delfs. (2004). The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). Accessed: Sep. 18, 2021. [Online]. Available: http://www.ling.upenn.edu/ppche/ppche-release-2016/PPCEME-RELEASE-3
[231] E. F. T. K. Sang and F. De Meulder, ‘‘Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition,’’ 2003, arXiv:cs/0306050.
[232] C. Jia, X. Liang, and Y. Zhang, ‘‘Cross-domain NER using Cross-domain language modeling,’’ in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 2464–2474.
[233] T.-T. Vu, D. Phung, and G. Haffari, ‘‘Effective unsupervised domain adaptation with adversarially trained language models,’’ in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2020, pp. 1–11.
[234] B. Strauss, B. Toma, A. Ritter, M. D. Marneffe, and W. Xu, ‘‘Results of the WNUT16 named entity recognition shared task,’’ in Proc. 2nd Workshop Noisy User-Generated Text (WNUT), 2016, pp. 1–7.
[235] D. Bamman, S. Popat, and S. Shen, ‘‘An annotated dataset of literary entities,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., Minneapolis, MI, USA, 2019, pp. 2138–2144.
[236] J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, and M. Lazo, ‘‘The TIMEBANK corpus,’’ in Proc. Corpus Linguistics Conf., 2003, p. 40.
[237] G. Crichton, S. Pyysalo, B. Chiu, and A. Korhonen, ‘‘A neural network multi-task learning approach to biomedical named entity recognition,’’ BMC Bioinf., vol. 18, no. 1, pp. 1–14, Dec. 2017.
[238] J. C. S. Alvarado, K. Verspoor, and T. Baldwin, ‘‘Domain adaption of named entity recognition to support credit risk assessment,’’ in Proc. Australas. Lang. Technol. Assoc. Workshop, 2015, pp. 84–90.
[239] J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier, ‘‘Introduction to the bio-entity recognition task at JNLPBA,’’ in Proc. Int. Joint Workshop Natural Lang. Process. Biomed. Appl., 2004, pp. 70–75.
[240] N. Poerner, U. Waltinger, and H. Schütze, ‘‘Inexpensive domain adaptation of pretrained language models: Case studies on biomedical NER and COVID-19 QA,’’ in Proc. Findings Assoc. Comput. Linguistics, EMNLP, 2020, pp. 1–8.
[241] L. Smith et al., ‘‘Overview of BioCreative II gene mention recognition,’’ Genome Biol., vol. 9, no. 2, pp. 1–19, 2008.
[242] J. Li, Y. Sun, R. J. Johnson, D. Sciaky, C.-H. Wei, R. Leaman, A. P. Davis, C. J. Mattingly, T. C. Wiegers, and Z. Lu, ‘‘BioCreative V CDR task corpus: A resource for chemical disease relation extraction,’’ Database, vol. 2016, Jan. 2016, Art. no. baw068.
[243] R. I. Doğan, R. Leaman, and Z. Lu, ‘‘NCBI disease corpus: A resource for disease name recognition and concept normalization,’’ J. Biomed. Informat., vol. 47, pp. 1–10, Feb. 2014.
[244] M. Krallinger, O. Rabal, F. Leitner, M. Vazquez, D. Salgado, Z. Lu, R. Leaman, Y. Lu, D. Ji, and D. M. Lowe, ‘‘The CHEMDNER corpus of chemicals and drugs and its annotation principles,’’ J. Cheminform., vol. 7, no. 1, pp. 1–17, 2015.
[245] M. Gerner, G. Nenadic, and C. M. Bergman, ‘‘LINNAEUS: A species name identification system for biomedical literature,’’ BMC Bioinf., vol. 11, no. 1, pp. 1–17, Dec. 2010.
[246] C. Walker, S. Strassel, J. Medero, and K. Maeda. (Feb. 15, 2006). ACE 2005 Multilingual Training Corpus. [Online]. Available: https://catalog.ldc.upenn.edu/LDC2006T06
[247] L. Fu, T. H. Nguyen, B. Min, and R. Grishman, ‘‘Domain adaptation for relation extraction with domain adversarial neural network,’’ in Proc. 8th Int. Joint Conf. Natural Lang. Process., 2017, pp. 425–429.
[248] S. Pyysalo, F. Ginter, J. Heimonen, J. Björne, J. Boberg, J. Järvinen, and T. Salakoski, ‘‘BioInfer: A corpus for information extraction in the biomedical domain,’’ BMC Bioinf., vol. 8, no. 1, pp. 1–24, Dec. 2007.
[249] A. Rios, R. Kavuluru, and Z. Lu, ‘‘Generalizing biomedical relation classification with neural adversarial domain adaptation,’’ Bioinformatics, vol. 34, no. 17, pp. 2973–2981, Sep. 2018.
[250] R. Bunescu, R. Ge, R. J. Kate, E. M. Marcotte, R. J. Mooney, A. K. Ramani, and Y. W. Wong, ‘‘Comparative experiments on learning information extractors for proteins and their interactions,’’ Artif. Intell. Med., vol. 33, no. 2, pp. 139–155, Feb. 2005.
[251] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, ‘‘Lessons learnt from the DDIExtraction-2013 shared task,’’ J. Biomed. Informat., vol. 51, pp. 152–164, Oct. 2014.
[252] A. Liu, S. Soderland, J. Bragg, C. H. Lin, X. Ling, and D. S. Weld, ‘‘Effective crowd annotation for relation extraction,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2016, pp. 897–906.
[253] Microsoft Corporation. (Jun. 2019). Microsoft Research Open Data (Cortana Dataset). Accessed: Sep. 18, 2021. [Online]. Available: https://msropendata.com/datasets/1cc496ec-aaff-4576-b4bc-4a65798fa907
[254] D. Pearce and H.-G. Hirsch, ‘‘The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions,’’ in Proc. 6th Int. Conf. Spoken Lang. Process., Oct. 2000, pp. 1–8.
[263] A. E. W. Johnson, T. J. Pollard, L. Shen, L.-W.-H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark, ‘‘MIMIC-III, a freely accessible critical care database,’’ Sci. Data, vol. 3, no. 1, pp. 1–9, May 2016.
[264] S. Purushotham, W. Carvalho, T. Nilanon, and Y. Liu, ‘‘Variational recurrent adversarial deep domain adaptation,’’ in Proc. ICLR, 2017, pp. 1–15.
[265] R. G. Khemani, D. Conti, T. A. Alonzo, R. D. Bart, and C. J. L. Newth, ‘‘Effect of tidal volume in children with acute hypoxemic respiratory failure,’’ Intensive Care Med., vol. 35, no. 8, pp. 1428–1437, Aug. 2009.
[266] A. Jain, H. S. Koppula, S. Soh, B. Raghavan, A. Singh, and A. Saxena, ‘‘Brain4Cars: Car that knows before you do via sensory-fusion deep learning architecture,’’ 2016, arXiv:1601.00740.
[267] M. Tonutti, E. Ruffaldi, A. Cattaneo, and C. A. Avizzano, ‘‘Robust and subject-independent driving manoeuvre anticipation through domain-adversarial recurrent neural networks,’’ Robot. Auto. Syst., vol. 115, pp. 162–173, May 2019.
[268] R. Cai, J. Chen, Z. Li, W. Chen, K. Zhang, J. Ye, Z. Li, X. Yang, and Z. Zhang, ‘‘Time series domain adaptation via sparse associative structure alignment,’’ in Proc. 35th AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 6859–6867.
[269] Y. Zheng, X. Yi, M. Li, R. Li, Z. Shan, E. Chang, and T. Li, ‘‘Forecasting fine-grained air quality based on big data,’’ in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2015, pp. 2267–2276.
[270] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan, ‘‘IEMOCAP: Interactive emotional dyadic motion capture database,’’ Lang. Resour. Eval., vol. 42, no. 4, pp. 335–359, Dec. 2008.
[271] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon, ‘‘Collecting large, richly annotated facial-expression databases from movies,’’ IEEE MultimediaMag., vol. 19, no. 3, pp. 34–41, Jul. 2012.
[272] C. Busso, S. Parthasarathy, A. Burmania, M. AbdelWahab, N. Sadoughi, and E. M. Provost, ‘‘MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception,’’ IEEE Trans. Affect. Comput., vol. 8, no. 1, pp. 67–80, Jan. 2017.
[273] D. Huang, J. Sun, and Y. Wang, ‘‘The BUAA-VisNir face database instructions,’’ Beihang Univ., Beijing, China, Tech. Rep., IRIP-TR-12-FR-001, 2012.
[274] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, ‘‘Facial expression recognition from near-infrared videos,’’ Image Vis. Comput., vol. 29, no. 9, pp. 607–619, Aug. 2011.
[275] T. Sim, S. Baker, and M. Bsat, ‘‘The CMU pose, illumination, and expression database,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615–1618, Dec. 2003.
[276] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, ‘‘From few to many: Illumination cone models for face recognition under variable lighting and pose,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 643–660, Jun. 2001.
[255] S. Sun, B. Zhang, L. Xie, and Y. Zhang, ‘‘An unsupervised deep domain adaptation approach for robust speech recognition,’’ Neurocomputing,
vol. 257, pp. 79–87, Sep. 2017. [277] J.-M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders,
[256] D. B. Paul and J. M. Baker, ‘‘The design for the wall street journal-based ‘‘The Amsterdam library of object images,’’ Int. J. Comput. Vis., vol. 61,
CSR corpus,’’ in Proc. 2nd Int. Conf. Spoken Lang. Process. (ICSLP), no. 1, pp. 103–112, Jan. 2005.
Oct. 1992, pp. 1–6. [278] S. A. Nene, S. K. Nayar, and H. Murase, ‘‘Columbia object image library
[257] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, ‘‘LibriSpeech: (COIL-100),’’ Dept. Comput. Sci., Columbia Univ., New York, NY, USA,
An ASR corpus based on public domain audio books,’’ in Proc. Tech. Rep. CUCS-006-96, 1996.
IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, [279] H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and
pp. 5206–5210. R. Verma, ‘‘CREMA-D: Crowd-sourced emotional multimodal actors
[258] D. Iskra, B. Grosskopf, K. Marasek, H. V. D. Heuvel, F. Diehl, and dataset,’’ IEEE Trans. Affect. Comput., vol. 5, no. 4, pp. 377–390,
A. Kiessling, ‘‘SPEECON—Speech databases for consumer devices: Oct. 2014.
Database specification and validation,’’ in Proc. LREC, 2002, pp. 1–6. [280] S. R. Livingstone and F. A. Russo, ‘‘The Ryerson audio-visual database
[259] P. Denisov, N. T. Vu, and M. F. Font, ‘‘Unsupervised domain adaptation of emotional speech and song (RAVDESS): A dynamic, multimodal set
by adversarial learning for robust speech recognition,’’ in Proc. Speech of facial and vocal expressions in North American English,’’ PLoS ONE,
Commun.; 13th ITG-Symp., 2018, pp. 1–15. vol. 13, no. 5, May 2018, Art. no. e0196391.
[260] P. Price et al., ‘‘Resource management RM1 2.0 LDC93S3B, [281] A. Zadeh, R. Zellers, E. Pincus, and L.-P. Morency, ‘‘MOSI: Multimodal
Web download,’’ Linguistic Data Consortium, Univ. Pennsylvania, corpus of sentiment intensity and subjectivity analysis in online opinion
Philadelphia, PA, USA, 1993. [Online]. Available: https://catalog.ldc. videos,’’ 2016, arXiv:1606.06259.
upenn.edu/LDC93S3B [282] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, ‘‘NUS-WIDE:
[261] C. J. Leggetter and P. C. Woodland, ‘‘Maximum likelihood lin- A real-world web image database from national university of Singapore,’’
ear regression for speaker adaptation of continuous density hidden in Proc. ACM Int. Conf. Image Video Retr., Jul. 2009, pp. 1–9.
Markov models,’’ Comput. Speech Lang., vol. 9, no. 2, pp. 171–185, [283] A. Zubiaga, M. Liakata, and R. Procter, ‘‘Exploiting context for rumour
Apr. 1995. detection in social media,’’ in Social Informatics. Cham, Switzerland:
[262] J. H. L. Hansen, A. Sangwan, A. Joglekar, A. E. Bulut, L. Kaushik, Springer, 2017, pp. 109–123.
and C. Yu, ‘‘Fearless steps: Apollo-11 corpus advancements for speech [284] E. Kochkina, M. Liakata, and A. Zubiaga, ‘‘All-in-one: Multi-task learn-
technologies from earth to the moon,’’ in Proc. Interspeech, Hyderabad, ing for rumour verification,’’ in Proc. 27th Int. Conf. Comput. Linguistics,
India, Sep. 2018, pp. 1–5. 2018, pp. 1–12.
[285] D. Damen, H. Doughty, G. M. Farinella, S. Fidler, A. Furnari, E. Kazakos, [307] Z. Yang, J. Hu, R. Salakhutdinov, and W. W. Cohen, ‘‘Semi-supervised
D. Moltisanti, J. Munro, T. Perrett, and W. Price, ‘‘Scaling egocentric QA with generative domain-adaptive nets,’’ 2017, arXiv:1702.02206.
vision: The EPIC-kitchens dataset,’’ in Proc. Eur. Conf. Comput. Vis. [308] D. Britz, Q. Le, and R. Pryzant, ‘‘Effective domain mixing for neu-
(ECCV), 2018, pp. 720–736. ral machine translation,’’ in Proc. 2nd Conf. Mach. Transl., 2017,
[286] J. Liang, Y. Wang, D. Hu, R. He, and J. Feng, ‘‘A balanced and pp. 118–126.
uncertainty-aware approach for partial domain adaptation,’’ in Proc. Eur. [309] B. Chen, C. Cherry, G. Foster, and S. Larkin, ‘‘Cost weighting for neural
Conf. Comput. Vis. (ECCV), 2020, pp. 123–140. machine translation domain adaptation,’’ in Proc. 1st Workshop Neural
[287] F. Qiao, L. Zhao, and X. Peng, ‘‘Learning to learn single domain gen- Mach. Transl., 2017, pp. 40–46.
eralization,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. [310] C. Chu and R. Wang, ‘‘A survey of domain adaptation for neural machine
(CVPR), Jun. 2020, pp. 12556–12565. translation,’’ in Proc. 27th Int. Conf. Comput. Linguistics, Sante Fe, NM,
[288] T. M. H. Hsu, W. Y. Chen, C.-A. Hou, Y.-H.-H. Tsai, Y.-R. Yeh, USA, 2018, pp. 1–16.
and Y.-C.-F. Wang, ‘‘Unsupervised domain adaptation with imbalanced [311] W.-J. Ko, G. Durrett, and J. J. Li, ‘‘Domain agnostic real-valued speci-
cross-domain data,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), ficity prediction,’’ in Proc. AAAI Conf. Artif. Intell., 2019, pp. 6610–6617.
Dec. 2015, pp. 4121–4129. [312] Q. Wang, W. Rao, S. Sun, L. Xie, E. S. Chng, and H. Li, ‘‘Unsupervised
[289] T. Gebru, J. Hoffman, and L. Fei-Fei, ‘‘Fine-grained recognition in the domain adaptation via domain adversarial training for speaker recogni-
wild: A multi-task domain adaptation approach,’’ in Proc. IEEE Int. Conf. tion,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP),
Comput. Vis. (ICCV), Oct. 2017, pp. 1349–1358. Apr. 2018, pp. 4889–4893.
[290] S. Reddy, D. Chen, and C. D. Manning, ‘‘CoQA: A conversational [313] S. Khurana, N. Moritz, T. Hori, and J. L. Roux, ‘‘Unsupervised domain
question answering challenge,’’ Trans. Assoc. Comput. Linguistics, vol. 7, adaptation for speech recognition via uncertainty driven self-training,’’
pp. 249–266, Nov. 2019. in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP),
[291] S. Gupta, J. Hoffman, and J. Malik, ‘‘Cross modal distillation for super- Jun. 2021, pp. 6553–6557.
vision transfer,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. [314] E. Hosseini-Asl, Y. Zhou, C. Xiong, and R. Socher, ‘‘Augmented cyclic
(CVPR), Jun. 2016, pp. 2827–2836. adversarial learning for low resource domain adaptation,’’ in Proc. Int.
[292] D. Saunders and B. Byrne, ‘‘Reducing gender bias in neural machine Conf. Learn. Represent., 2019, pp. 1–15.
translation as a domain adaptation problem,’’ 2020, arXiv:2004.04498. [315] L. Samarakoon, B. Mak, and A. Y. S. Lam, ‘‘Domain adaptation of end-to-
[293] T. DeVries, I. Misra, C. Wang, and L. V. D. Maaten, ‘‘Does object end speech recognition in low-resource settings,’’ in Proc. IEEE Spoken
recognition work for everyone?’’ in Proc. IEEE/CVF Conf. Comput. Vis. Lang. Technol. Workshop (SLT), Dec. 2018, pp. 382–388.
Pattern Recognit. Workshops, 2019, pp. 52–59. [316] H. Zhao, Z. Zhu, J. Hu, A. Coates, and G. Gordon, ‘‘Principled
[294] G. Csurka, R. Volpi, and B. Chidlovskii, ‘‘Unsupervised domain adapta- hybrids of generative and discriminative domain adaptation,’’ 2017,
tion for semantic image segmentation: A comprehensive survey,’’ 2021, arXiv:1705.09011.
arXiv:2112.03241. [317] X. Zhang, J. Wang, N. Cheng, and J. Xiao, ‘‘TDASS: Target domain
[295] P. Oza, V. A. Sindagi, and V. M. Patel, ‘‘Unsupervised domain adaptation adaptation speech synthesis framework for multi-speaker low-resource
of object detectors: A survey,’’ 2021, arXiv:2105.13502. TTS,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2022, pp. 1–7.
[296] K. Sohn, S. Liu, G. Zhong, X. Yu, M.-H. Yang, and M. Chandraker, [318] S. Mavaddaty, S. M. Ahadi, and S. Seyedin, ‘‘A novel speech enhance-
‘‘Unsupervised domain adaptation for face recognition in unlabeled ment method by learnable sparse and low-rank decomposition and
videos,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, domain adaptation,’’ Speech Commun., vol. 76, pp. 42–60, Feb. 2016.
pp. 3210–3218. [319] C. Chen, Y. Miao, C. X. Lu, L. Xie, P. Blunsom, A. Markham, and
[297] H.-K. Hsu, C.-H. Yao, Y.-H. Tsai, W.-C. Hung, H.-Y. Tseng, M. Singh, N. Trigoni, ‘‘MotionTransformer: Transferring neural inertial track-
and M.-H. Yang, ‘‘Progressive domain adaptation for object detection,’’ ing between domains,’’ in Proc. AAAI Conf. Artif. Intell., 2019,
in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2020, pp. 8009–8016.
pp. 749–757. [320] A. Hussein and H. Hajj, ‘‘Domain adaptation with representation learning
[298] Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van Gool, ‘‘Domain adaptive and nonlinear relation for time series,’’ ACM Trans. Internet Things,
faster R-CNN for object detection in the wild,’’ in Proc. IEEE/CVF Conf. vol. 3, no. 2, pp. 1–26, May 2022.
Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3339–3348. [321] F. Ott, D. Rügamer, L. Heublein, B. Bischl, and C. Mutschler, ‘‘Domain
[299] W. Xu, J. He, H. L. Zhang, B. Mao, and J. Cao, ‘‘Real-time target adaptation for time-series classification to mitigate covariate shift,’’ 2022,
detection and recognition with deep convolutional networks for intelli- arXiv:2204.03342.
gent visual surveillance,’’ in Proc. 9th Int. Conf. Utility Cloud Comput., [322] G. Wilson, J. R. Doppa, and D. J. Cook, ‘‘Multi-source deep domain adap-
Dec. 2016, pp. 321–326. tation with weak supervision for time-series sensor data,’’ in Proc. 26th
[300] J. Zhang, J. Huang, Z. Luo, G. Zhang, and S. Lu, ‘‘DA-DETR: ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2020,
Domain adaptive detection transformer by hybrid attention,’’ 2021, pp. 1768–1778.
arXiv:2103.17084. [323] B. Lucas, C. Pelletier, D. Schmidt, G. I. Webb, and F. Petitjean, ‘‘Unsuper-
[301] S. Sankaranarayanan, Y. Balaji, A. Jain, S. N. Lim, and R. Chellappa, vised domain adaptation techniques for classification of satellite image
‘‘Learning from synthetic data: Addressing domain shift for semantic time series,’’ in Proc. IEEE Int. Geosci. Remote Sens. Symp., Sep. 2020,
segmentation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1074–1077.
Jun. 2018, pp. 3752–3761. [324] X. Jin, Y. Park, D. C. Maddix, H. Wang, and Y. Wang, ‘‘Domain adapta-
[302] Y. Li, N. Wang, J. Liu, and X. Hou, ‘‘Demystifying neural style transfer,’’ tion for time series forecasting via attention sharing,’’ in Proc. Int. Conf.
in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 1–7. Mach. Learn., 2022, pp. 10280–10297.
[303] J. N. Kundu, P. K. Uppala, A. Pahuja, and R. V. Babu, ‘‘AdaDepth: [325] T. Li, X. Chen, S. Zhang, Z. Dong, and K. Keutzer, ‘‘Cross-domain
Unsupervised content congruent adaptation for depth estimation,’’ in sentiment classification with contrastive learning and mutual information
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, maximization,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process.
pp. 2656–2665. (ICASSP), Jun. 2021, pp. 8203–8207.
[304] T.-H. Chen, Y.-H. Liao, C.-Y. Chuang, W.-T. Hsu, J. Fu, and M. Sun, [326] P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes,
‘‘Show, adapt and tell: Adversarial training of cross-domain image O. Ramadan, and M. Gašić, ‘‘MultiWOZ—A large-scale multi-domain
captioner,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, Wizard-of-Oz dataset for task-oriented dialogue modelling,’’ 2018,
pp. 521–530. arXiv:1810.00278.
[305] W. Zhao, W. Xu, M. Yang, J. Ye, Z. Zhao, Y. Feng, and Y. Qiao, ‘‘Dual [327] W.-N. Zhang, Q. Zhu, Y. Wang, Y. Zhao, and T. Liu, ‘‘Neural personalized
learning for cross-domain image captioning,’’ in Proc. ACM Conf. Inf. response generation as domain adaptation,’’ World Wide Web, vol. 22,
Knowl. Manage., Nov. 2017, pp. 29–38. pp. 1427–1446, Jul. 2019.
[306] J. Johnson, A. Karpathy, and L. Fei-Fei, ‘‘DenseCap: Fully convolutional [328] M. Yang, W. Tu, Q. Qu, Z. Zhao, X. Chen, and J. Zhu, ‘‘Personalized
localization networks for dense captioning,’’ in Proc. IEEE Conf. Comput. response generation by dual-learning based domain adaptation,’’ Neural
Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 4565–4574. Netw., vol. 103, pp. 72–82, Jul. 2018.
[329] M. Azamfar, X. Li, and J. Lee, ‘‘Deep learning-based domain [351] X. Deng, H. L. Yang, N. Makkar, and D. Lunga, ‘‘Large scale unsu-
adaptation method for fault diagnosis in semiconductor manufactur- pervised domain adaptation of segmentation networks with adversarial
ing,’’ IEEE Trans. Semicond. Manuf., vol. 33, no. 3, pp. 445–453, learning,’’ in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2019,
Aug. 2020. pp. 4955–4958.
[330] M. Thota, S. Kollias, M. Swainson, and G. Leontidis, ‘‘Multi-source deep [352] X. Wang and X. Tang, ‘‘Face photo-sketch synthesis and recognition,’’
domain adaptation for quality control in retail food packaging,’’ 2020, IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 1955–1967,
arXiv:2001.10335. Nov. 2009.
[331] M. Wulfmeier, A. Bewley, and I. Posner, ‘‘Incremental adversarial [353] M. Latah and L. Toker, ‘‘Artificial intelligence enabled software-defined
domain adaptation for continually changing environments,’’ in Proc. networking: A comprehensive overview,’’ IET Netw., vol. 8, no. 2,
IEEE Int. Conf. Robot. Autom. (ICRA), May 2018, pp. 4489–4495. pp. 79–99, Mar. 2019.
[332] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, [354] J. Yoo, Y. Hong, Y. Noh, and S. Yoon, ‘‘Domain adaptation using adver-
L. Downs, J. Ibarz, P. Pastor, K. Konolige, S. Levine, and V. Vanhoucke, sarial learning for autonomous navigation,’’ 2017, arXiv:1712.03742.
‘‘Using simulation and domain adaptation to improve efficiency of deep [355] Y. Liu, L. Zhong, J. Qiu, J. Lu, and W. Wang, ‘‘Unsupervised domain
robotic grasping,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), adaptation for nonintrusive load monitoring via adversarial and joint
May 2018, pp. 4243–4250. adaptation network,’’ IEEE Trans. Ind. Informat., vol. 18, no. 1,
[333] F. Barbato, M. Toldo, U. Michieli, and P. Zanuttigh, ‘‘Latent space regu- pp. 266–277, Jan. 2022.
larization for unsupervised domain adaptation in semantic segmentation,’’ [356] G. Wilson and D. J. Cook, ‘‘A survey of unsupervised deep domain
in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops adaptation,’’ ACM Trans. Intell. Syst. Technol., vol. 11, no. 5, pp. 1–46,
(CVPRW), Jun. 2021, pp. 2835–2845. Oct. 2020.
[334] D. Kothandaraman, R. Chandra, and D. Manocha, ‘‘BoMuDANet: Unsu- [357] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, ‘‘Star-
pervised adaptation for visual scene understanding in unstructured driving GAN: Unified generative adversarial networks for multi-domain image-
environments,’’ 2020, arXiv:2010.03523. to-image translation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
[335] F. Munir, S. Azam, and M. Jeon, ‘‘SSTN: Self-supervised domain Recognit., Jun. 2018, pp. 8789–8797.
adaptation thermal object detection for autonomous driving,’’ 2021, [358] P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang,
arXiv:2103.03150. A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, I. Gao, T. Lee,
[336] M. Teichmann, M. Weber, M. Zollner, R. Cipolla, and R. Urtasun, ‘‘Multi- E. David, I. Stavness, W. Guo, B. A. Earnshaw, and I. S. Haque,
Net: Real-time joint semantic reasoning for autonomous driving,’’ in ‘‘WILDS: A benchmark of in-the-wild distribution shifts,’’ in Proc.
Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 1013–1020. PMLR, 2021, pp. 5637–5664.
[337] M. Sedinkina, N. Breitkopf, and H. Schütze, ‘‘Automatic domain adap- [359] T. Talaviya, D. Shah, N. Patel, H. Yagnik, and M. Shah, ‘‘Implementation
tation outperforms manual domain adaptation for predicting financial of artificial intelligence in agriculture for optimisation of irrigation and
outcomes,’’ 2020, arXiv:2006.14209. application of pesticides and herbicides,’’ Artif. Intell. Agricult., vol. 4,
[338] B. Lebichot, Y.-A. L. Borgne, L. He-Guelton, F. Oblé, and G. Bontempi, pp. 58–73, Jan. 2020.
‘‘Deep-learning domain adaptation techniques for credit cards fraud [360] Forbes. (Feb. 7, 2021). 10 Ways AI Has The Potential To Improve
detection,’’ in Recent Advances in Big Data and Deep Learning. Florence, Agriculture In 2021. Accessed: Sep. 18, 2021. [Online]. Avail-
Italy: Association for Computational Linguistics, 2020, pp. 78–88. able: https://www.forbes.com/sites/louiscolumbus/2021/02/17/10-ways-
[339] R. Caruana, ‘‘Multitask learning,’’ Mach. Learn., vol. 28, pp. 41–75, ai-has-the-potential-to-improve-agriculture-in-2021/?sh=379355747f3b
Jul. 1997. [361] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan,
[340] P. Gardner, X. Liu, and K. Worden, ‘‘On the application of domain ‘‘Unsupervised pixel-level domain adaptation with generative adversarial
adaptation in structural health monitoring,’’ Mech. Syst. Signal Process., networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
vol. 138, Apr. 2020, Art. no. 106550. Jul. 2017, pp. 3722–3731.
[341] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, ‘‘Adaptation regular- [362] A. Rahate, R. Walambe, S. Ramanna, and K. Kotecha, ‘‘Multimodal
ization: A general framework for transfer learning,’’ IEEE Trans. Knowl. co-learning: Challenges, applications with datasets, recent advances and
Data Eng., vol. 26, no. 5, pp. 1076–1089, May 2014. future directions,’’ 2021, arXiv:2107.13782.
[342] W. M. Kouw and M. Loog, ‘‘A review of domain adaptation without [363] Y. Zhang, R. Barzilay, and T. Jaakkola, ‘‘Aspect-augmented adversarial
target labels,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 3, networks for domain adaptation,’’ Trans. Assoc. Comput. Linguistics,
pp. 766–785, Mar. 2021. vol. 5, pp. 515–528, Dec. 2017.
[343] J. Ren, I. Hacihaliloglu, E. A. Singer, D. J. Foran, and X. Qi, ‘‘Unsuper- [364] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui,
vised domain adaptation for classification of histopathology whole-slide J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam,
images,’’ Frontiers Bioeng. Biotechnol., vol. 7, pp. 1–10, May 2019. H. Zhao, A. Timofev, S. Ettinger, M. Krivoko, and A. Joshi, ‘‘Scal-
[344] X. Tang, B. Du, J. Huang, Z. Wang, and L. Zhang, ‘‘On combining active ability in perception for autonomous driving: Waymo open dataset,’’
and transfer learning for medical data classification,’’ IET Comput. Vis., in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jul. 2020,
vol. 13, no. 2, pp. 194–205, Mar. 2019. pp. 2446–2454.
[345] K. Kamnitsas, C. Baumgartner, C. Ledig, V. F. Newcombe, J. P. Simpson,
A. D. Kane, D. K. Menon, A. Nori, A. Criminisi, D. Rueckert, and
B. Glocker, ‘‘Unsupervised domain adaptation in brain lesion segmen-
tation with adversarial networks,’’ in Proc. Int. Conf. Inf. Process. Med.
Imag., 2017, pp. 597–609.
[346] C. Liu, A. Mauricio, J. Qi, D. Peng, and K. Gryllias, ‘‘Domain adaptation
digital twin for rolling element bearing prognostics,’’ in Proc. Annu. Conf.
PHM Soc., 2020, pp. 1–10.
PEEYUSH SINGHAL received the B.Tech. and M.Tech. degrees from the Indian Institute of Technology Bombay. He has over two decades of experience in software engineering, technology consulting, and data science and engineering. He is currently a Doctor of Philosophy (Ph.D.) Scholar at the Symbiosis Institute of Technology, Symbiosis International University, Pune, India. His research interest includes domain adaptation and its applications in various research tasks.
RAHEE WALAMBE (Senior Member, IEEE) received the M.Phil. and Ph.D. degrees from Lancaster University, U.K., in 2008. From 2008 to 2017, she was a Research Consultant with various organizations in the control and robotics domain. Since 2017, she has been working as an Associate Professor with the Department of Electronics and Telecommunications, Symbiosis Institute of Technology, Symbiosis International University, Pune, India. Her research interests include applied deep learning and AI in the field of robotics and healthcare.

KETAN KOTECHA received the M.Tech. and Ph.D. degrees from the Indian Institute of Technology Bombay. He is currently the Head of the Symbiosis Centre for Applied AI, the Director of the Symbiosis Institute of Technology, and the Dean of the Faculty of Engineering, Symbiosis International University, Pune, India. He is also an Expert in artificial intelligence and deep learning. He has published widely in a number of excellent peer-reviewed journals on various topics ranging from cutting-edge AI, education policies, teaching-learning practices, and AI for all. He was a recipient of multiple international research grants and awards.