
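### Earth Mover’s Distance (EMD): A Metric for Image Retrieval

#### Overview

The paper "The Earth Mover’s Distance as a Metric for Image Retrieval" studies a distance measure called the Earth Mover’s Distance (EMD) and applies it to content-based image retrieval. The EMD measures the difference between two distributions as the minimal cost of transforming one distribution into the other. This makes it valuable for image retrieval, because it often reflects human judgments of image similarity more accurately than other measures.

#### Definition and Principle

The EMD was first proposed by Peleg, Werman, and Rom for certain vision problems. For image retrieval, the authors combine it with a representation scheme for distributions based on vector quantization, which yields an image comparison framework that often captures perceptual similarity better than previously proposed methods.

Computing the EMD relies on the transportation problem from linear optimization, a classic problem in operations research: given a set of suppliers and a set of consumers, find the cheapest way to ship goods from the suppliers to the consumers. Here the "goods" are the mass of the feature distributions being compared (for example, clusters of pixel colors or texture features), and the cost of moving a unit of mass is the ground distance between the features involved.

#### Applications and Advantages

The paper applies the EMD mainly to color and texture: comparing the color or texture distributions of images with the EMD retrieves the images most similar to a query image. Compared with other distance measures, the EMD has the following advantages:

1. **Better perceptual similarity**: The EMD follows human perception of image similarity more closely, because it accounts for the ground distance between features in different bins instead of only comparing histograms bin by bin.
2. **Robustness**: The EMD is more robust than histogram matching techniques. It operates on variable-length representations of distributions, which avoids the quantization and binning problems typical of histograms.
3. **Partial matching**: The EMD naturally supports partial matching, in which one distribution is compared with a subset of the other, so an image can match another image that shares only part of its content.
4. **True metric**: When it compares distributions with the same total mass, and the ground distance is itself a metric, the EMD is a true metric: it is non-negative, symmetric, and satisfies the triangle inequality.

#### Implementation and Algorithms

The EMD is computed with efficient algorithms for the transportation problem. For example, the Hungarian algorithm solves the assignment problem, a special case of the transportation problem, in polynomial time.

#### Conclusion

The paper presents the EMD and its application to image retrieval in detail. The EMD can improve the performance of image retrieval systems and better models how the human visual system judges image similarity. Future work could explore the EMD in other computer vision tasks, such as object recognition and scene classification.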


International Journal of Computer Vision 40(2), 99–121, 2000

© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

The Earth Mover’s Distance as a Metric for Image Retrieval

YOSSI RUBNER, CARLO TOMASI AND LEONIDAS J. GUIBAS

Computer Science Department, Stanford University, Stanford, CA 94305, USA


Abstract. We investigate the properties of a metric between two distributions, the Earth Mover’s Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.

Keywords: image retrieval, perceptual metrics, color, texture, Earth Mover’s Distance

1. Introduction

Multidimensional distributions are often used in computer vision to describe and summarize different features of an image. For example, the one-dimensional distribution of image intensities describes the overall brightness content of a gray-scale image, and a three-dimensional distribution can play a similar role for color images. The texture content of an image can be described by a distribution of local signal energy over frequency. These descriptors can be used in a variety of applications including, for example, image retrieval.

It is often advantageous to ‘compress’ or otherwise approximate an original distribution by another distribution with a more compact description. This yields important savings in storage and processing time, and most importantly, as we will see, a certain perceptual robustness to the matching. Multidimensional distributions are usually compressed by partitioning the underlying space into a fixed number of bins, usually of a predefined size: the resulting quantized data structure is a histogram. However, even when the binning is adaptive, based on the overall distribution of the features of all the images in the database, often for specific images only a small fraction of the bins in a histogram contain significant information. For instance, when considering color, a picture of a desert landscape contains mostly blue pixels in the sky region and yellow-brown pixels in the rest. A finely quantized histogram in this case is highly inefficient. On the other hand, a multitude of colors is a characterizing feature for a picture of a carnival in Rio, and a coarsely quantized histogram would be inadequate. In brief, because histograms are fixed-size structures, they cannot achieve a balance between expressiveness and efficiency.

In contrast, we propose variable-size descriptions of distributions. In our signatures, as we call these descriptions, the dominant clusters are extracted from the original distribution using a clustering algorithm such as vector quantization, and are used to form its compressed representation. A signature is a set of the main clusters or modes of a distribution, each represented by a single point (the cluster center) in the underlying space, together with a weight that denotes the size of that cluster. Simple images have short signatures, complex images have long ones. Of course, in some applications, fixed-size histograms may still be adequate, and can be considered as special cases of signatures.
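As a rough illustration of how a signature might be built, the sketch below uses plain k-means (Lloyd's algorithm, one common form of vector quantization) as the clustering step. That choice is an assumption: the text only asks for "a clustering algorithm such as vector quantization". The helper `signature` and its parameters are hypothetical; each non-empty cluster becomes a (center, weight) pair, with the weight taken here as the fraction of points in the cluster.

```python
import random


def signature(points, k, iters=20, seed=0):
    """Hypothetical helper: compress feature vectors into a signature [(center, weight), ...].

    Assumption: plain k-means (Lloyd's algorithm) stands in for the vector quantizer;
    the weight of a cluster is the fraction of points assigned to it. Requires iters >= 1.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # start from k randomly chosen input points
    for _ in range(iters):
        # assignment step: each point joins its nearest center (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # update step: move each center to the mean of its members (empty clusters keep their center)
        for c, members in enumerate(clusters):
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # empty clusters are dropped, so a simple distribution yields a short signature
    return [(tuple(centers[c]), len(members) / len(points))
            for c, members in enumerate(clusters) if members]


features = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,)]
print(signature(features, k=2))
```

Here the five 1-D features collapse into two clusters centered near 0.1 and 10.05 with weights 0.6 and 0.4. How many clusters to keep (k) is what makes a signature short for a simple image and long for a complex one.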

In addition to histograms and signatures, which are based on global or local tessellation of the space into non-overlapping regions, there are other techniques to describe non-parametric distributions. For example, in kernel density estimation (Duda and Hart, 1973), each data point is replaced by some kernel and the density estimate is regarded as the superposition of all these kernels. These techniques are out of the scope of this paper.

Given two distributions, it is often useful to define a quantitative measure of their dissimilarity, with the intent of approximating perceptual dissimilarity as well as possible. This is particularly important in image retrieval applications, but has fundamental implications also for the understanding of texture discrimination and color perception. Defining a distance between two distributions requires first a notion of distance between the basic features that are aggregated into the distributions. We call this distance the ground distance. For instance, in the case of color, the ground distance measures dissimilarity between individual colors. Fortunately, color ground distance has been carefully studied in the literature of psychophysics, and has led to measures like the CIE-Lab color space (Wyszecki and Stiles, 1982). To be sure, this space was designed based on psychophysical experiments where colors were presented in pairs and on a neutral background. While this limits the appropriateness of this space for the more complex situations encountered in retrieval, we believe that it is hard to do better than CIE-Lab without explicitly modelling the geometric layout of colors in images. While RGB space has proven clearly inadequate in our experiments, it is possible that other spaces, such as HSV, may lead to performance similar to that obtained with CIE-Lab.
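A minimal sketch of a color ground distance along these lines: convert each color to CIE-Lab and take the Euclidean distance there (the CIE76 color difference). The input encoding (sRGB with components in [0, 1]) and the D65 white point are assumptions made for this example; the text does not say which conversion the authors used, and the function names are hypothetical.

```python
def srgb_to_lab(rgb):
    """Convert an sRGB color (components in [0, 1]) to CIE-Lab.

    Assumption: sRGB primaries and the D65 reference white.
    """
    def linearize(c):
        # undo the sRGB transfer curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # linear sRGB -> CIE XYZ
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_ground_distance(rgb1, rgb2):
    """Ground distance between two colors: Euclidean distance in CIE-Lab (CIE76)."""
    lab1, lab2 = srgb_to_lab(rgb1), srgb_to_lab(rgb2)
    return sum((a - b) ** 2 for a, b in zip(lab1, lab2)) ** 0.5


print(lab_ground_distance((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))  # about 100: black vs. white
```

Black and white differ by roughly 100 units, the full lightness range. Replacing `srgb_to_lab` with a different conversion (for example an HSV-based one) is how one would compare color spaces under this setup.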

In this paper, we address the problem of lifting these distances from individual features to full distributions. In other words, we want to define a consistent measure of distance, or dissimilarity, between two distributions of mass in a space that is itself endowed with a ground distance. For color, this means finding distances between image color distributions. For texture, we locally describe the texture content of a small neighborhood in an image as a distribution of energy in the frequency domain. The “lifted” distance is a distance between distributions of such local descriptors over the entire images, regarded as distributions of textures.

Mathematically, it would be convenient if these distribution distances were true metrics, which would lead to more efficient data structures and search algorithms (Bozkaya and Ozsoyoglu, 1997; Clarkson, 1997). Practically, it is important that distances between distributions correlate with human perception. In this paper we strive to achieve both goals: for the first we give a proof, and for the second we show experiments. We also would like these distances to allow for partial matches when one distribution is compared to a subset of the other. For partial matches, the distances we define are not metric. Concerning this point, we refer to Tversky’s discussion (Tversky, 1977) of the non-metric nature of perceptual distances. From a practical point of view, our measure deals naturally both with full, metric matches and with partial, non-metric matches.

In this paper we capitalize on the old transportation problem (Rachev, 1984; Hitchcock, 1941) from linear optimization, which was first introduced into computer vision by Peleg et al. (1989) to measure the distance between two gray-scale images. For image retrieval, we use this distance measure to compare two signatures in color or texture space. As discussed in more detail in the next section, this leads to very different computational properties, mainly because signatures rather than pixels are compared to each other. We give the name Earth Mover’s Distance (EMD), suggested by Stolfi (1994), to this metric in this new context. The transportation problem is to find the minimal cost that must be paid to transform one distribution into the other. The EMD is based on a solution to the transportation problem for which efficient algorithms are available, and it has many desirable properties for image retrieval, as we will see. It is also more robust than other histogram matching techniques, in that it suffers from no arbitrary quantization problems due to rigid binning, and it tolerates a certain amount of deformation that shifts features in the feature space. This robustness results in increased precision for image retrieval. It allows for partial matching, and hence naturally supports partial image retrieval queries. It can be applied to signatures with different sizes, which leads to better storage utilization. When used to compare distributions that have the same overall mass, the EMD is a true metric.
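The transportation problem behind the EMD can be sketched directly. The version below assumes the usual formulation for signatures, which this section does not spell out: flows $f_{ij} \ge 0$ from cluster $i$ of $P$ to cluster $j$ of $Q$, each cluster sending or receiving at most its weight, total flow equal to the smaller of the two total weights, and $\mathrm{EMD} = \sum_{ij} f_{ij} d_{ij} / \sum_{ij} f_{ij}$. It solves this as a min-cost flow by successive shortest paths (Bellman-Ford), a generic solver chosen for brevity and not necessarily the algorithm the authors use; `emd` and its argument names are hypothetical.

```python
import math


def emd(p_weights, q_weights, ground):
    """Hypothetical helper: EMD between signatures P and Q.

    ground[i][j] is the ground distance from cluster i of P to cluster j of Q.
    Assumed formulation: flows f_ij >= 0, cluster i of P sends at most p_weights[i],
    cluster j of Q receives at most q_weights[j], total flow = min(sum P, sum Q),
    EMD = sum(f_ij * d_ij) / sum(f_ij). Weights must be positive.
    Solved as a min-cost flow with successive shortest paths (a generic solver).
    """
    m, n = len(p_weights), len(q_weights)
    source, sink = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # node -> indices of outgoing edges
    to, cap, cost = [], [], []

    def add_edge(u, v, capacity, unit_cost):
        # forward edge at an even index e, residual reverse edge at e ^ 1
        for a, b, c, w in ((u, v, capacity, unit_cost), (v, u, 0.0, -unit_cost)):
            graph[a].append(len(to))
            to.append(b)
            cap.append(c)
            cost.append(w)

    for i, w in enumerate(p_weights):
        add_edge(source, i, float(w), 0.0)      # supplier i holds weight w
    for j, w in enumerate(q_weights):
        add_edge(m + j, sink, float(w), 0.0)    # consumer j accepts weight w
    for i in range(m):
        for j in range(n):
            add_edge(i, m + j, math.inf, float(ground[i][j]))

    required = min(sum(p_weights), sum(q_weights))
    flow, total_cost, eps = 0.0, 0.0, 1e-12
    while required - flow > eps:
        # Bellman-Ford, because residual reverse edges carry negative costs
        dist = [math.inf] * len(graph)
        prev_edge = [-1] * len(graph)
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for e in edges:
                    if cap[e] > eps and dist[u] + cost[e] < dist[to[e]] - eps:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            break  # no augmenting path left
        # bottleneck along the path, capped by the flow still required
        push, v = required - flow, sink
        while v != source:
            push = min(push, cap[prev_edge[v]])
            v = to[prev_edge[v] ^ 1]
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total_cost += push * dist[sink]
    return total_cost / flow


if __name__ == "__main__":
    # 1-D clusters: P at positions 0 and 1, Q at positions 1 and 2, ground distance |x - y|
    p_pos, q_pos = [0.0, 1.0], [1.0, 2.0]
    ground = [[abs(x - y) for y in q_pos] for x in p_pos]
    print(emd([1, 1], [1, 1], ground))  # 1.0
```

With unequal total weights the same code performs a partial match: `emd([1.0], [1.0, 1.0], [[0.0, 3.0]])` returns 0.0, because the lighter signature is matched entirely against the part of the heavier one it coincides with, and the rest of the heavier signature is simply left unmatched.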

In this paper we focus on applications of the EMD to color and texture images. In the next section, we introduce histograms and survey some of the existing measures of dissimilarity and their drawbacks. Then, in Sections 3 and 4, we introduce the concepts of a signature and of the Earth Mover’s Distance (EMD), which we apply to color and texture in Section 5. We compare the results of image retrieval using the EMD with those obtained with other metrics, and demonstrate the unique properties of the EMD for texture-based retrieval. Section 6 concludes with a summary and plans for future work.

2. Previous Work

Image retrieval systems usually represent image features by multi-dimensional histograms. For example, the color content of an image is defined by the distribution of its pixels in some color space (Swain and Ballard, 1991; Hafner et al., 1995; Belongie et al., 1998). Texture features are commonly defined by energy distributions in the spatial frequency domain (Farrokhnia and Jain, 1991; Bigün and Buf, 1994; Manjunath and Ma, 1996). Image databases are indexed by histograms of these distributions, and those images that have the closest histograms to that specified in the query are retrieved. For such a search, a measure of dissimilarity between histograms must be defined. In this section we formally define histograms, and discuss some of the most common histogram dissimilarity measures that are used for image retrieval. In Section 4 we define the EMD. In addition to histograms, this distance is well defined also for signatures, defined in Section 3. In Section 5 we also compare the EMD with the other methods surveyed below.

A histogram $\{h_i\}$ is a mapping from a set of $d$-dimensional integer vectors $i$ to the set of nonnegative reals. These vectors typically represent bins (or their centers) in a fixed partitioning of the relevant region of the underlying feature space, and the associated reals are a measure of the mass of the distribution that falls into the corresponding bin. For instance, in a grey-level histogram, $d$ is equal to one, the set of possible grey values is split into $N$ intervals, and $h_i$ is the number of pixels in an image that have a grey value in the interval indexed by $i$ (a scalar in this case). The fixed partitioning of the feature space does not have to be regular. If the distribution of features of all the images is known a priori, adaptive binning can be used.
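For the grey-level case just described, building $\{h_i\}$ is a single pass over the pixels. This sketch assumes integer grey values in [0, 256) and $N$ equal-width intervals; `grey_histogram` is a hypothetical helper name.

```python
def grey_histogram(pixels, n_bins, max_value=256):
    """h[i] = number of pixels whose grey value lies in the i-th of n_bins equal intervals.

    Assumption: pixels are integer grey values in [0, max_value).
    """
    h = [0] * n_bins
    for v in pixels:
        h[v * n_bins // max_value] += 1  # integer index of the interval containing v
    return h


print(grey_histogram([0, 10, 64, 128, 255, 255], n_bins=4))  # [2, 1, 1, 2]
```

Adaptive binning would replace the equal-width index computation with a lookup into bin edges derived from the whole database.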

Several measures have been proposed for the dissimilarity between two histograms $H = \{h_i\}$ and $K = \{k_i\}$. We divide them into two categories. The bin-by-bin dissimilarity measures only compare contents of corresponding histogram bins, that is, they compare $h_i$ and $k_i$ for all $i$, but not $h_i$ and $k_j$ for $i \neq j$. The cross-bin measures also contain terms that compare non-corresponding bins. To this end, cross-bin distances make use of the ground distance $d_{ij}$, defined as the distance between the representative features for bin $i$ and bin $j$. Predictably, bin-by-bin measures are more sensitive to the position of bin boundaries.
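Cross-bin measures need the matrix of ground distances $d_{ij}$ between bin representatives. A sketch for the 1-D grey-level case, assuming (our choice, not the paper's) that each bin is represented by its center and that the ground distance is the absolute difference of grey values; both helper names are hypothetical.

```python
def bin_centers(n_bins, max_value=256.0):
    """Centers of n_bins equal-width intervals covering [0, max_value)."""
    width = max_value / n_bins
    return [(i + 0.5) * width for i in range(n_bins)]


def ground_distance_matrix(centers_a, centers_b):
    """d[i][j] = |center_i - center_j|, an assumed 1-D ground distance between bins."""
    return [[abs(a - b) for b in centers_b] for a in centers_a]


centers = bin_centers(4)
for row in ground_distance_matrix(centers, centers):
    print(row)
```

A bin-by-bin measure never consults the off-diagonal entries of this matrix; a cross-bin measure weighs every pair $(i, j)$ by $d_{ij}$.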

2.1. Bin-By-Bin Dissimilarity Measures

In this category only pairs of bins in the two histograms that have the same index are matched. The dissimilarity between two histograms is a combination of all the pairwise comparisons. A ground distance is used by these measures only implicitly and in an extreme form: features that fall into the same bin are close enough to each other to be considered the same, and those that do not are too far apart to be considered similar. In this sense, bin-by-bin measures imply a binary ground distance with a threshold depending on bin size.

Minkowski-Form Distance:

$$d_{L_r}(H, K) = \Big(\sum_i |h_i - k_i|^r\Big)^{1/r}$$

The $L_1$ distance is often used for computing dissimilarity between color images (Swain and Ballard, 1991). Other common usages are $L_2$ and $L_\infty$. In Stricker and Orengo (1995) it was shown that for image retrieval the $L_1$ distance results in many false negatives because neighboring bins are not considered.
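The Minkowski-form distance transcribes directly; the sketch below (hypothetical function name) treats $r = \infty$ as the maximum absolute bin difference.

```python
import math


def minkowski(h, k, r):
    """Bin-by-bin Minkowski-form distance d_{L_r}(H, K); pass r = math.inf for L_inf."""
    diffs = [abs(a - b) for a, b in zip(h, k)]  # only bins with the same index are compared
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)


h = [0.5, 0.5, 0.0, 0.0]
k = [0.0, 0.0, 0.5, 0.5]
print(minkowski(h, k, 1), minkowski(h, k, 2), minkowski(h, k, math.inf))  # 2.0 1.0 0.5
```

Because only matching indices are compared, any two unit-mass histograms with disjoint supports get $L_1 = 2$, no matter how far apart their occupied bins are.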

Histogram Intersection:

$$d_\cap(H, K) = 1 - \frac{\sum_i \min(h_i, k_i)}{\sum_i k_i}$$

The histogram intersection (Swain and Ballard, 1991) is attractive because of its ability to handle partial matches when the areas of the two histograms (the sum over all the bins) are different. It is shown in Swain and Ballard (1991) that when the areas of the two histograms are equal, the histogram intersection is equivalent to the (normalized) $L_1$ distance.
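A direct transcription of $d_\cap$ as a sketch (function name hypothetical). The denominator uses the area of $K$, so a $K$ whose mass lies entirely inside $H$ gets distance 0 even when $H$ carries extra mass; for equal unit areas the value equals half the $L_1$ distance.

```python
def intersection_distance(h, k):
    """d_cap(H, K) = 1 - sum_i min(h_i, k_i) / sum_i k_i (normalized by the area of K)."""
    return 1 - sum(min(a, b) for a, b in zip(h, k)) / sum(k)


# Equal unit areas: the value is half the L1 distance, here 0.5
print(intersection_distance([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
# Partial match: all of K's mass lies inside H, so the distance is 0 despite H's extra mass
print(intersection_distance([1, 1, 0], [1, 0, 0]))
```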


Kullback-Leibler Divergence and Jeffrey Divergence:

The Kullback-Leibler (K-L) divergence (Kullback, 1968) is defined as follows:

$$d_{KL}(H, K) = \sum_i h_i \log \frac{h_i}{k_i}$$

From the information theory point of view, the K-L divergence has the property that it measures how inefficient on average it would be to code one histogram using the other as the code-book (Cover and Thomas, 1991). However, the K-L divergence is non-symmetric and is sensitive to histogram binning. The empirically derived Jeffrey divergence is a modification of the K-L divergence that is numerically stable, symmetric and robust with respect to noise and the size of histogram bins (Puzicha et al., 1997). It is defined as:

$$d_J(H, K) = \sum_i \Big(h_i \log \frac{h_i}{m_i} + k_i \log \frac{k_i}{m_i}\Big)$$

where $m_i = \frac{h_i + k_i}{2}$.
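Both divergences as a sketch (function names hypothetical). The convention that empty bins contribute $0 \log 0 = 0$ is an assumption the formulas leave implicit. The example pair shows the instability in practice: K-L is infinite as soon as $K$ has an empty bin where $H$ has mass, while Jeffrey stays finite because $m_i > 0$ wherever either histogram has mass.

```python
import math


def kl_divergence(h, k):
    """d_KL(H, K) = sum_i h_i log(h_i / k_i); bins with h_i = 0 contribute 0 (assumed convention)."""
    total = 0.0
    for a, b in zip(h, k):
        if a > 0:
            if b == 0:
                return math.inf  # mass in H where K is empty
            total += a * math.log(a / b)
    return total


def jeffrey_divergence(h, k):
    """d_J(H, K) = sum_i h_i log(h_i / m_i) + k_i log(k_i / m_i), with m_i = (h_i + k_i) / 2."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total


h, k = [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]
print(kl_divergence(h, k), jeffrey_divergence(h, k), jeffrey_divergence(k, h))
```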

$\chi^2$ Statistics:

$$d_{\chi^2}(H, K) = \sum_i \frac{(h_i - m_i)^2}{m_i}$$

where again $m_i = \frac{h_i + k_i}{2}$. This distance measures how unlikely it is that one distribution was drawn from the population represented by the other.
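A sketch of the $\chi^2$ statistic as defined above (function name hypothetical). Bins where both histograms are empty are skipped, an assumption made to avoid dividing by zero.

```python
def chi_square(h, k):
    """d_chi2(H, K) = sum_i (h_i - m_i)^2 / m_i with m_i = (h_i + k_i) / 2; empty bins skipped."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2
        if m > 0:
            total += (a - m) ** 2 / m
    return total


print(chi_square([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # 0.5
```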

Figure 1. Examples where the $L_1$ distance (as a representative of bin-by-bin dissimilarity measures) and the quadratic-form distance do not match perceptual dissimilarity. Assuming that histograms have unit mass, (a) the $L_1$ distance is 2 for the left pair and 1 for the right pair; (b) the quadratic-form distance is 0.1429 for the first pair and 0.0893 for the second. Perceptual dissimilarity is based on correspondence between bins in the two histograms. Figures (c) and (d) show the desired correspondences for (a) and (b) respectively.

These dissimilarity definitions can be appropriate in different areas. For example, the Kullback-Leibler divergence is justified by information theory and the $\chi^2$ statistics by statistics. However, these measures do not necessarily match perceptual similarity well. Their major drawback is that they account only for the correspondence between bins with the same index, and do not use information across bins. This problem is illustrated in Fig. 1(a), which shows two pairs of one-dimensional gray-scale histograms. For instance, the $L_1$ distance between the two histograms on the left is larger than the $L_1$ distance between the two histograms on the right, in contrast to perceptual dissimilarity. The desired distance should be based on correspondences between bins in the two histograms and on the ground distance between them, as shown in part (c) of the figure.
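The effect described for Fig. 1(a) is easy to reproduce with two hand-made pairs (our own illustrative histograms, not the ones in the figure). For equal-mass 1-D histograms with unit bin spacing and ground distance $|i - j|$, the EMD reduces to the summed absolute difference of the cumulative histograms divided by the total mass, a standard closed form that this section does not state. `l1` and `emd_1d` are hypothetical helper names.

```python
from itertools import accumulate


def l1(h, k):
    """Bin-by-bin L1 distance."""
    return sum(abs(a - b) for a, b in zip(h, k))


def emd_1d(h, k):
    """EMD for equal-mass 1-D histograms, ground distance |i - j| between bin indices."""
    cumulative_gap = sum(abs(a - b) for a, b in zip(accumulate(h), accumulate(k)))
    return cumulative_gap / sum(h)


# "near": all mass moves one bin; "far": half the mass moves five bins
near = ([1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0])
far = ([1, 0, 0, 0, 0, 0], [0.5, 0, 0, 0, 0, 0.5])
for name, (h, k) in (("near", near), ("far", far)):
    print(name, "L1 =", l1(h, k), "EMD =", emd_1d(h, k))
```

$L_1$ ranks the near pair as the more dissimilar one (2 vs. 1), while the EMD, which uses the ground distance between bins, ranks it as the less dissimilar one (1.0 vs. 2.5).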

Another drawback of bin-by-bin dissimilarity measures is their sensitivity to bin size. A binning that is too coarse will not have sufficient discriminative power, while a binning that is too fine will place similar features in different bins which will never be matched. On the other hand, cross-bin dissimilarity measures, described next, always yield better results with smaller bins.
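A small experiment on synthetic data of our own makes the bin-size trade-off concrete: two flat grey images that differ by four grey levels, compared with the $L_1$ distance between normalized histograms at several bin counts. The helper names are hypothetical, and grey values are assumed to be integers in [0, 256).

```python
def normalized_histogram(values, n_bins, max_value=256):
    """Fraction of grey values (assumed integers in [0, max_value)) in each of n_bins equal intervals."""
    counts = [0] * n_bins
    for v in values:
        counts[v * n_bins // max_value] += 1
    return [c / len(values) for c in counts]


def l1_distance(h, k):
    return sum(abs(a - b) for a, b in zip(h, k))


image_a = [100] * 100  # a flat grey image
image_b = [104] * 100  # the same image, four grey levels brighter
for n_bins in (4, 16, 256):
    d = l1_distance(normalized_histogram(image_a, n_bins), normalized_histogram(image_b, n_bins))
    print(f"{n_bins:3d} bins: L1 = {d}")
```

With 4 or 16 bins the two images look identical (0.0), and with 256 bins they look as different as two unit-mass histograms can be (2.0). Using grey values 126 and 130 instead gives 2.0 even with 4 bins, because they straddle the bin boundary at 128, which is the boundary sensitivity of bin-by-bin measures in action.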

2.2. Cross-Bin Dissimilarity Measures

When a ground distance that matches perceptual dissimilarity is available for single features, incorporating this information results in perceptually more meaningful dissimilarity measures.
