
a geodesic-flow kernel. While these methods have the advantage of providing easily computable out-of-sample extensions (by projecting unseen samples onto the latent-space eigenvectors), the transformation they define remains global and is applied in the same way to the whole target domain. An approach combining sample re-weighting with representation transfer is found in [53], where the authors extend sample re-weighting to reproducing kernel Hilbert spaces through the use of surrogate kernels. The transformation achieved is again a global linear transformation that helps align the domains.
Our proposition strongly differs from those reviewed above, as it defines a local transformation for each sample in the source domain. In this sense, the domain adaptation problem can be seen as a graph matching problem [35], [10], [11], as each source sample has to be mapped onto target samples under the constraint of marginal distribution preservation.
Optimal Transport and Machine Learning. The optimal transport problem was first introduced by the French mathematician Gaspard Monge at the end of the 18th century as a way to find a minimal-effort solution to the transport of a given mass of dirt into a given hole. The problem reappeared in the middle of the 20th century in the work of Kantorovitch [30] and has recently found surprising new developments as a polyvalent tool for several fundamental problems [49]. It has been applied in a wide range of fields, including computational fluid mechanics [3], color transfer between multiple images or morphing in the context of image processing [40], [20], [5], interpolation schemes in computer graphics [6], and economics, via matching and equilibrium problems [12].
Despite these appealing properties and application success stories, the machine learning community has considered optimal transport only recently (see, for instance, works considering the computation of distances between histograms [15] or label propagation in graphs [45]); the main reason is the high computational cost induced by the computation of the optimal transportation plan. However, new computing strategies have emerged [15], [17], [5] and have made the use of OT distances possible in operational settings.
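Among these strategies, the entropic regularization of [15] replaces the exact linear program by simple matrix-scaling (Sinkhorn) iterations. The sketch below is only an illustration of this idea; the variable names, the toy cost matrix and the reliance on numpy are our own assumptions, not material from [15].

import numpy as np

def sinkhorn(a, b, C, reg=1e-1, n_iter=1000):
    # Entropic-regularized OT between histograms a and b with cost matrix C.
    # Illustrative sketch only: returns an approximate transport plan.
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # Sinkhorn scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan gamma = diag(u) K diag(v)

# toy example: two histograms over 3 bins, squared-distance ground cost
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
C = (np.arange(3)[:, None] - np.arange(3)[None, :]) ** 2.0
gamma = sinkhorn(a, b, C)
print(gamma.sum(axis=1), gamma.sum(axis=0))  # marginals approximately equal to a and b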
2 OPTIMAL TRANSPORT AND APPLICATION
TO DOMAIN ADAPTATION
In this section, we present the general unsupervised
domain adaptation problem and show how it can be
addressed from an optimal transport perspective.
2.1 Problem and theoretical motivations
Let $\Omega \subset \mathbb{R}^d$ be an input measurable space of dimension $d$ and $C$ the set of possible labels. $\mathcal{P}(\Omega)$ denotes the set of all probability measures over $\Omega$. The standard learning paradigm assumes the existence of a set of training data $X^s = \{\mathbf{x}_i^s\}_{i=1}^{N_s}$ associated with a set of class labels $Y^s = \{y_i^s\}_{i=1}^{N_s}$, with $y_i^s \in C$, and a testing set $X^t = \{\mathbf{x}_i^t\}_{i=1}^{N_t}$ with unknown labels. In order to infer the set of labels $Y^t$ associated with $X^t$, one usually relies on an empirical estimate of the joint probability distribution $P(\mathbf{x}, y) \in \mathcal{P}(\Omega \times C)$ computed from $(X^s, Y^s)$, and assumes that $X^s$ and $X^t$ are drawn from the same distribution $P(\mathbf{x}) \in \mathcal{P}(\Omega)$.
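As a concrete, purely illustrative instance of this paradigm, the following sketch fits a classifier on the labeled source pairs and applies it to the unlabeled test samples. The synthetic data, the scikit-learn classifier and all names are our own assumptions; the procedure is only justified when both sets are drawn from the same $P(\mathbf{x})$.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# labeled training set {(x_i^s, y_i^s)} and unlabeled testing set {x_i^t}
Xs = rng.randn(100, 2)
ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(int)
Xt = rng.randn(50, 2)

# estimate the decision rule from (X^s, Y^s) and infer Y^t,
# assuming X^s and X^t are drawn from the same P(x)
clf = LogisticRegression().fit(Xs, ys)
yt_pred = clf.predict(Xt)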
2.2 Domain adaptation as a transportation problem
In domain adaptation problems, one assumes the existence of two distinct joint probability distributions $P_s(\mathbf{x}^s, y)$ and $P_t(\mathbf{x}^t, y)$, related respectively to a source and a target domain, denoted $\Omega_s$ and $\Omega_t$. In the following, $\mu_s$ and $\mu_t$ are their respective marginal distributions over $\mathbf{x}$. We also denote by $f_s$ and $f_t$ the true labeling functions, i.e. the Bayes decision functions, in each domain.
At least one of the two following assumptions is
generally made by most domain adaptation methods:
• Class imbalance: the label distributions differ between the two domains ($P_s(y) \neq P_t(y)$), but the conditional distributions of the samples with respect to the labels are the same ($P_s(\mathbf{x}^s|y) = P_t(\mathbf{x}^t|y)$);
• Covariate shift: the conditional distributions of the labels with respect to the data are equal ($P_s(y|\mathbf{x}^s) = P_t(y|\mathbf{x}^t)$, or equivalently $f_s = f_t = f$), but the data distributions in the two domains are supposed to be different ($P_s(\mathbf{x}^s) \neq P_t(\mathbf{x}^t)$). For the adaptation techniques to be effective, this difference needs to be small [2] (see the toy sketch after this list).
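The following toy construction (ours, not part of the paper) makes the covariate-shift assumption concrete: the two domains share the same labeling rule $P(y|\mathbf{x})$, while their marginals $P_s(\mathbf{x}^s)$ and $P_t(\mathbf{x}^t)$ differ.

import numpy as np

rng = np.random.RandomState(1)

def f(X):
    # shared labeling rule, i.e. P_s(y|x) = P_t(y|x)
    return (X[:, 0] > 0).astype(int)

# covariate shift: same conditional P(y|x), different marginals P(x)
Xs = rng.randn(500, 2)               # P_s(x): standard normal
Xt = 0.5 * rng.randn(500, 2) + 1.0   # P_t(x): rescaled and shifted
ys, yt = f(Xs), f(Xt)

print(Xs.mean(axis=0), Xt.mean(axis=0))  # the marginals clearly differ
print(ys.mean(), yt.mean())              # label proportions induced by the same rule f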
In real-world applications, the drift occurring between the source and the target domains generally implies a change in both the marginal and the conditional distributions. In our work, we assume that the domain drift is due to an unknown, possibly nonlinear transformation of the input space $T: \Omega_s \rightarrow \Omega_t$. This transformation may have a physical interpretation (e.g. a change in the acquisition conditions, sensor drift, thermal noise, etc.), or it can be directly caused by the unknown process that generates the data. Additionally, we suppose that the transformation preserves the conditional distribution, i.e.
$$P_s(y|\mathbf{x}^s) = P_t(y|T(\mathbf{x}^s)).$$
This means that the label information is preserved by the transformation, and that the Bayes decision functions are tied through the relation $f_t(T(\mathbf{x})) = f_s(\mathbf{x})$.
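A small synthetic example (again ours) of such a label-preserving transformation: the target samples are obtained as $T(\mathbf{x}^s)$ for a fixed nonlinear map $T$ of our choosing, and they keep the source labels, so that $f_t(T(\mathbf{x})) = f_s(\mathbf{x})$ holds by construction.

import numpy as np

rng = np.random.RandomState(2)

def T(X):
    # an arbitrary fixed nonlinear deformation of the input space (our choice)
    rot = np.array([[np.cos(0.6), -np.sin(0.6)],
                    [np.sin(0.6),  np.cos(0.6)]])
    return np.tanh(X) @ rot + np.array([2.0, 0.0])

Xs = rng.randn(300, 2)
ys = (Xs[:, 0] > 0).astype(int)  # source labels, y_i^s = f_s(x_i^s)

Xt = T(Xs)   # the target domain is the image of the source domain by T
yt = ys      # labels are carried along, so f_t(T(x)) = f_s(x) by construction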
Another insight can be provided regarding the transformation $T$. From a probabilistic point of view, $T$ transforms the measure $\mu$ into its image measure, noted $T\#\mu$, which is another probability measure over $\Omega_t$ satisfying
$$T\#\mu(\mathbf{x}) = \mu\left(T^{-1}(\mathbf{x})\right), \quad \forall \mathbf{x} \in \Omega_t. \qquad (1)$$
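In the discrete case, the image measure takes a particularly simple form (a standard fact, stated here in our notation): if $\mu$ is an empirical measure supported on the source samples, $T$ simply displaces the Dirac masses while keeping their weights,
$$\mu = \sum_{i=1}^{N_s} p_i \,\delta_{\mathbf{x}_i^s} \qquad \Longrightarrow \qquad T\#\mu = \sum_{i=1}^{N_s} p_i \,\delta_{T(\mathbf{x}_i^s)},$$
since for any measurable set $B \subset \Omega_t$ one has $T\#\mu(B) = \mu(T^{-1}(B)) = \sum_{i:\, T(\mathbf{x}_i^s) \in B} p_i$.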