
Two-Dimensional Active Learning for Image Classification


Guo-Jun Qi†, Xian-Sheng Hua‡, Yong Rui‡, Jinhui Tang†, Hong-Jiang Zhang‡

†MOE-Microsoft Key Laboratory of Multimedia Computing and Communication
& Department of Automation, University of Science and Technology of China
{qgj, jhtang}@mail.ustc.edu.cn

‡Microsoft Research Asia
{xshua, yongrui, hjzhang}@microsoft.com

Abstract

In this paper, we propose a two-dimensional active learning scheme and show its application in image classification. Traditional active learning methods select samples only along the sample dimension. While this is the right strategy for binary classification, it is sub-optimal for multi-label classification. In multi-label classification, we argue that, for each selected sample, only a part of the more effective labels need to be annotated, while the others can be inferred by exploring the correlations among the labels. The reason is that the contributions of different labels to minimizing the classification error differ due to the inherent label correlations. To this end, we propose to select sample-label pairs, rather than only samples, to minimize a multi-label Bayesian classification error bound. This new active learning strategy considers not only the sample dimension but also the label dimension, and we call it Two-Dimensional Active Learning (2DAL). We also show that the traditional active learning formulation is a special case of 2DAL when there is only one label. Extensive experiments conducted on two real-world applications show that 2DAL significantly outperforms the best existing approaches, which do not take label correlation into account.

1. Introduction

Image semantic understanding is typically formulated as either a multi-class or a multi-label classification problem [15][2]. In the multi-class setting, each image is classified into one and only one predefined category; in other words, only one label is assigned to an image. Real-world applications [2], however, require that one or multiple labels can be assigned to an image. This requirement results in multi-label classification, which is significantly more challenging and is the focus of this paper. Specifically, we use active learning as the tool and extend it from a one-dimensional, sample-centric approach to a two-dimensional, joint sample-label-centric approach for multi-label image classification.

Active learning is one of the most widely used methods in image classification, as it can significantly reduce the human cost of labeling training samples. Specifically, active learning methods iteratively annotate a set of carefully selected samples so that the classification error is minimized in each iteration. As a result, the total number of training samples that need to be labeled is smaller than in non-active-learning approaches. It is clear that the core of any active learning approach is the sample selection strategy. In the past decade, a number of active learning approaches have been developed using different sample selection strategies [14][4][10][8]. Most of these approaches focus on the binary or multi-class classification scenario [10][4][15]. However, in many real-world applications such as image and video retrieval [2][12], text search [16] and bioinformatics [6], a sample is usually associated with multiple labels rather than a single one. Under such a multi-label setting, each sample is annotated as "positive" or "negative" for each and every label (see Figure 1 for some examples). As a result, active learning with multi-labeled samples is much more challenging than with binary-labeled ones, especially when the number of labels is large.

A direct way to tackle active learning in the multi-label setting is to translate it into a set of binary problems, i.e., each category/label is handled independently by a binary active learning algorithm. For example, in [11][3] two research groups each proposed such a binary-based active learning algorithm for the multi-labeled classification problem. However, this type of approach does not take into account the inherent relationships among multiple labels. In this paper, we propose a novel active learning strategy which iteratively selects sample-label pairs to minimize the expected classification error.

Figure 1. Some examples of multi-labeled images. "P" means positive label and "N" means negative label. [Three example images with per-concept labels: Field P/N/N, Mountain P/N/P, Urban N/P/N, Beach N/P/P, People N/P/N.]
Specifically, in each iteration, the annotators are only required to annotate/confirm a selected subset of the labels of the selected samples, while the remaining unlabeled part is inferred according to the label correlations. We call this strategy Two-Dimensional Active Learning (2DAL), since it considers not only the samples to be labeled along the sample dimension but also the labels associated with these samples along the label dimension.

Figure 2. The proposed two-dimensional 2DAL strategy. [Sample-label matrices before and after 2DAL selection: rows are samples X1, ..., Xn and columns are labels; "P" = labeled positive concept, "N" = labeled negative concept, "?" = unlabeled concept, "S" = concept selected to be labeled.]

An intuitive explanation of this strategy is that there exist both sample and label redundancies for multi-labeled samples. Therefore, annotating a set of selected sample-label pairs provides enough information for training the classifiers, since the information in the selected pairs can be propagated to the rest along both the sample and the label "dimensions". Unlike existing binary-based active learning strategies [11][3], which only take the sample redundancy into account, the 2DAL strategy additionally considers the label dimension to leverage the rich relationships embedded in the multiple labels. 2DAL efficiently selects an informative part of the labels rather than all the labels for a particular selected sample. Such a strategy significantly reduces the human labor required during active learning. For example, Field and Mountain tend to occur simultaneously in an image. Therefore, it is reasonable to select only one label (e.g., Mountain) for annotation, since the uncertainty of the other label can be remarkably decreased after annotating this one. Another example is Mountain and Urban: in contrast to Field and Mountain, these two labels seldom occur simultaneously. Thus, annotating one of them will most likely rule out the presence of the other.

To realize 2DAL, we will answer the following questions in this paper:

1. What is the proper selection strategy for finding the sample-label pairs? To address this issue, we formulate the selection of sample-label pairs as minimizing a derived Multi-Label Bayesian Classification Error Bound. We will demonstrate that selecting sample-label pairs in this way significantly reduces the uncertainty of both the samples and the labels.

2. How can we model the label relationships/correlations? Since the proposed 2DAL strategy utilizes the label dependencies to reduce labeling cost, the underlying classifier should be able to model the corresponding label correlations. Accordingly, we propose a Kernelized Maximum Entropy Model (KMEM) to model such correlations. Furthermore, since the 2DAL strategy only annotates a subset of the labels, we formulate an Expectation-Maximization (EM) [5] algorithm to solve the incomplete labeling problem.

To the best of our knowledge, we are the first to present a study of active learning at the granularity of sample-label pairs, with both theoretical analysis and empirical results on real-world data sets. The rest of the paper is organized as follows. In section 2, we present the 2DAL selection strategy used in the proposed active learning algorithm. We also show that the traditional active learning formulation is a special case of 2DAL when there is only one label. After that, a Kernelized Maximum Entropy Model is proposed in section 3 to model the label correlations. In addition, an Expectation-Maximization (EM) algorithm is given in this section to solve the incomplete labeling problem. In section 4 we evaluate the proposed 2DAL in comparison with a state-of-the-art one-dimensional active learning approach on two real-world data sets. Finally, we conclude in section 5.

2. Two-Dimensional Active Learning

In this section, we start by detailing the underlying idea of the proposed 2DAL strategy in the multi-label setting from the perspective of information theory. Then, a Bayesian error bound is derived that states the expected classification error given a selected sample-label pair. The proposed 2DAL strategy is then deduced by selecting the sample-label pairs which optimize this bound.

2.1. The proposed 2DAL strategy

Figure 2 illustrates the proposed 2DAL strategy. Different from the typical binary active learning formulation that selects the most informative samples for annotation, we jointly select both the samples and labels simultaneously. The underlying assumption is that different labels of a certain sample make different contributions to minimizing the expected classification error of the to-be-trained classifier, and that annotating a portion of well-selected labels may provide sufficient information for learning the classifier. As shown in Figure 2, this strategy trades off between the annotation labor and the learning performance along two dimensions, i.e., the sample and label dimensions. In essence, multi-label classifiers do have uncertainty along different labels as well as different samples. Traditional active learning algorithms can be seen as a one-dimensional active selection approach, which only reduces the sample uncertainty. In contrast, 2DAL is a two-dimensional active learning strategy, which selects the most "informative" sample-label pairs to reduce the uncertainty along the dimensions of both sample and label.
More specifically, along the label dimension all of the labels interact through their correlations. Therefore, once partial labels are annotated, the remaining unlabeled concepts can be inferred based on the label correlations. Theoretically, the label correlations are connected to the expected Bayesian error bound (see the lemma and theorem in section 2.2), and thus these label correlations can help to reduce the prediction errors on the testing set during the active learning procedure. This approach saves much of the labor of fully annotating multiple labels, and it is therefore far more efficient when the number of labels is huge. For instance, an image may be associated with hundreds or thousands of concepts, so a full annotation strategy would pay a large labor cost for even a single image. On the contrary, 2DAL only selects the most informative labels for annotation. In the following section, we derive such a two-dimensional selection criterion based on a Bayesian classification error bound in the multi-label setting.

On the other hand, it is worth noting that, as illustrated in Figure 2, during the learning process some samples may lack some labels since only a part of the labels are annotated. This is different from the traditional active learning algorithm. In section 3.2, we will address how to learn the classification model from such incomplete labels.

2.2. Multi-labeled Bayesian error bound

The 2DAL learner requests annotations on the basis of sample-label pairs which, once incorporated into the training set, are expected to result in the lowest generalization error. Here we first derive a Multi-Labeled Bayesian Error Bound for the case where a selected sample-label pair is labeled under the multi-label setting; 2DAL then iteratively selects the pairs that minimize this bound.

Before we move further, we first define some notation. Each sample x has m labels yi (1 ≤ i ≤ m), each of which indicates whether its corresponding semantic concept occurs. As stated before, in each 2DAL iteration some of these labels have already been annotated while others have not. Let U(x) ≜ {i | (x, yi) is an unlabeled sample-label pair} denote the set of indices of the unlabeled part and L(x) ≜ {i | (x, yi) is a labeled sample-label pair} denote the labeled part. Note that L(x) can be the empty set ∅, which indicates that no label has been annotated for x. Let P(y|x) be the conditional label distribution, where y ∈ {0, 1}^m is the complete label vector, and let P(x) be the marginal sample distribution.

First, we establish a Bayesian error bound for classifying one unlabeled yi once ys is actively selected for annotation. This error bound originates from the equivocation bound given in [7], and we extend it to the multi-label setting so that it can handle sample-label pairs.

Lemma 1. Given a sample x and its labeled and unlabeled parts U(x) and L(x), once ys is selected for labeling (but not yet annotated), the Bayesian classification error E(yi | ys; yL(x), x) for an unlabeled yi, i ∈ U(x), is bounded as

    (1/2) H(yi | ys; yL(x), x) − ε ≤ E(yi | ys; yL(x), x) ≤ (1/2) H(yi | ys; yL(x), x)        (1)

where

    H(yi | ys; yL(x), x) = Σ_{t,r ∈ {0,1}} { −P(yi = t, ys = r | yL(x), x) log P(yi = t | ys = r; yL(x), x) }

is the conditional entropy of yi given the selected part ys (both yi and ys are random variables since they have not been labeled) and yL(x) is the known labeled part; ε = (1/2) log(5/4) is a constant. This lemma is proven in the appendix.

Remark 1. It is worth noting that this bound does not depend on the true label of the selected ys. In fact, before the annotator gives the label of ys, the true value of ys is unknown. However, no matter whether ys turns out to be 1 or 0, the bound always holds.

Based on this lemma, we can obtain the following theorem, which bounds the multi-label error:

Theorem 1. (Multi-labeled Bayesian classification error bound) Under the conditions of Lemma 1, the Bayesian classification error bound E(y | ys; yL(x), x) for sample x over all the labels y is

    E(y | ys; yL(x), x) ≤ (1/(2m)) Σ_{i=1}^m { H(yi | yL(x), x) − MI(yi; ys | yL(x), x) }        (2)

where MI(yi; ys | yL(x), x) is the mutual information between the random variables yi and ys given the known labeled part yL(x).

Proof.

    E(y | ys; yL(x), x)
      =(1) (1/m) Σ_{i=1}^m E(yi | ys; yL(x), x)
      ≤(2) (1/(2m)) Σ_{i=1}^m H(yi | ys; yL(x), x)
      =(3) (1/(2m)) Σ_{i=1}^m { H(yi | yL(x), x) − MI(yi; ys | yL(x), x) }        (3)

where (2) directly follows from Lemma 1, and (3) uses the relationship between mutual information and entropy: MI(X; Y) = H(X) − H(X|Y).

We are concerned with pool-based active learning, i.e., a large pool P sampled from P(x) is available to the learner, and the proposed 2DAL then selects the most informative sample-label pairs from this pool. We first write the expected Bayesian classification error over all samples in P before selecting a sample-label pair (xs, ys):

    E^b(P) = (1/|P|) Σ_{x ∈ P} E(y | yL(x), x)        (4)
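To make the quantities in Lemma 1 and Theorem 1 concrete, the following minimal sketch (ours, not from the paper) takes a 2×2 joint posterior P(yi, ys | yL(x), x), as it would be estimated by the current classifier, computes the conditional entropy, the mutual information and the Bayes error, and numerically checks the bound of Eqn. 1. It assumes base-2 logarithms, under which ε = (1/2) log(5/4) is the exact maximal gap between (1/2)H(p) and min{p, 1−p}.

```python
import numpy as np

# Illustrative sketch: joint[t, r] = P(y_i = t, y_s = r | y_L(x), x), a 2x2 table
# assumed to come from the current model. All logarithms are base 2.

def conditional_entropy(joint):
    """H(y_i | y_s) from the 2x2 joint table."""
    p_ys = joint.sum(axis=0)                    # marginal of y_s
    cond = joint / p_ys                         # P(y_i = t | y_s = r)
    return float(-(joint * np.log2(cond)).sum())

def entropy(p):
    """Entropy of a discrete distribution p (base 2)."""
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """MI(y_i; y_s) = H(y_i) - H(y_i | y_s)."""
    return entropy(joint.sum(axis=1)) - conditional_entropy(joint)

def bayes_error(joint):
    """E(y_i | y_s) = sum_r P(y_s = r) * min_t P(y_i = t | y_s = r)."""
    p_ys = joint.sum(axis=0)
    cond = joint / p_ys
    return float((p_ys * cond.min(axis=0)).sum())

rng = np.random.default_rng(0)
joint = rng.random((2, 2))
joint /= joint.sum()                            # random 2x2 joint distribution

H = conditional_entropy(joint)
E = bayes_error(joint)
eps = 0.5 * np.log2(5.0 / 4.0)                  # the constant in Lemma 1
assert 0.5 * H - eps <= E + 1e-12 and E <= 0.5 * H + 1e-12   # Eqn. (1)
print(H, mutual_information(joint), E)
```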
We can use the above classification error on the pool to estimate the expected error over the full distribution P(x), i.e., E_{P(x)}[E(y | yL(x), x)] = ∫ P(x) E(y | yL(x), x) dx, because the pool not only provides a finite set of samples but also an estimate of P(x). After selecting the pair (xs, ys), the expected Bayesian classification error over the pool P is

    E^a(P) = (1/|P|) { E(y | ys; yL(xs), xs) + Σ_{x ∈ P\{xs}} E(y | yL(x), x) }
           = (1/|P|) { E(y | ys; yL(xs), xs) − E(y | yL(xs), xs) + Σ_{x ∈ P} E(y | yL(x), x) }        (5)

Therefore, the reduction of the expected Bayesian classification error after selecting (xs, ys) over the whole pool P is

    ΔE(P) = E^b(P) − E^a(P)        (6)

Thus our goal is to select the best (xs*, ys*) that maximizes the above expected error reduction. That is,

    (xs*, ys*) = arg max_{xs ∈ P, ys ∈ U(xs)} ΔE(P) = arg min_{xs ∈ P, ys ∈ U(xs)} −ΔE(P)        (7)

Applying Lemma 1 and Theorem 1, we have

    −ΔE(P) = E^a(P) − E^b(P)
      =(1) (1/|P|) { E(y | ys; yL(xs), xs) − E(y | yL(xs), xs) + Σ_{x ∈ P} E(y | yL(x), x) } − (1/|P|) Σ_{x ∈ P} E(y | yL(x), x)
      = (1/|P|) { E(y | ys; yL(xs), xs) − E(y | yL(xs), xs) }
      ≤(2) (1/|P|) { (1/(2m)) Σ_{i=1}^m H(yi | yL(xs), xs) − (1/(2m)) Σ_{i=1}^m MI(yi; ys | yL(xs), xs) − (1/m) Σ_{i=1}^m E(yi | yL(xs), xs) }
      ≤(3) (1/|P|) { (1/(2m)) Σ_{i=1}^m H(yi | yL(xs), xs) − (1/(2m)) Σ_{i=1}^m MI(yi; ys | yL(xs), xs) − (1/m) Σ_{i=1}^m [ (1/2) H(yi | yL(xs), xs) − ε ] }
      = (1/|P|) { ε − (1/(2m)) Σ_{i=1}^m MI(yi; ys | yL(xs), xs) }        (8)

The equality (1) comes from Eqns. 4 and 5. The first inequality (2) follows from Theorem 1, and the second inequality (3) comes from the lower bound of Lemma 1.

Consequently, by minimizing the obtained Bayesian error bound (8), we select the sample-label pair for annotation according to

    (xs*, ys*) = arg min_{xs ∈ P, ys ∈ U(xs)} (1/|P|) { ε − (1/(2m)) Σ_{i=1}^m MI(yi; ys | yL(xs), xs) }
               = arg max_{xs ∈ P, ys ∈ U(xs)} Σ_{i=1}^m MI(yi; ys | yL(xs), xs)        (9)

2.3. Further Discussions

1. As we discussed in section 2.1, the proposed 2DAL approach is an active learning algorithm along two dimensions, which reduces not only sample uncertainty but also label uncertainty. The selection strategy in Eqn. 9 reflects these two targets well. The last term in Eqn. 9 can be rewritten as

       Σ_{i=1}^m MI(yi; ys | yL(xs), xs)
         = MI(ys; ys | yL(xs), xs) + Σ_{i=1, i≠s}^m MI(yi; ys | yL(xs), xs)
         = H(ys | yL(xs), xs) + Σ_{i=1, i≠s}^m MI(yi; ys | yL(xs), xs)        (10)

   As we can see, the objective selection function for 2DAL is divided into two parts: H(ys | yL(xs), xs) and Σ_{i=1, i≠s}^m MI(yi; ys | yL(xs), xs). The former entropy measures the uncertainty of the selected pair (xs*, ys*) itself, and this is consistent with the typical one-dimensional active learning algorithm, i.e., selecting the most uncertain samples near the classification boundary [10][9]. On the other hand, the latter mutual information term measures the statistical redundancy between the selected label and the rest. By maximizing these mutual information terms, 2DAL provides information for the inference of the other labels and thereby helps reduce their label uncertainty. Therefore, the obtained strategy confirms our motivation of selecting the most informative sample-label pairs to reduce the uncertainties along both the sample and label dimensions. Note that when there is only one label for each sample, Eqn. 10 reduces to H(ys | xs). The selection criterion then becomes the same as the traditional binary-based criterion, i.e., to select the most uncertain sample for annotation [9][14].

2. When computing the mutual information terms in Eqn. 9, we need the distribution P(y|x). The true distribution is unknown, but we can estimate it using the current learner. As stated in [13], such an approximation is reasonable because the most useful labeling is usually consistent with the learner's prior belief over the majority (but not all) of the unlabeled pairs.

3. It is worth pointing out that the posterior P(y|x) is crucial for modeling the label correlations. If we assume independence among the different labels, i.e., P(y|x) = Π_{i=1}^m P(yi|x), then the mutual information terms become MI(yi; ys | yL(xs), xs) = 0 for i ≠ s. In this case, the selection criterion reduces to (xs*, ys*) = arg max_{xs ∈ P, ys ∈ U(xs)} H(ys | yL(xs), xs), that is, to select the most uncertain sample-label pair. Such a criterion neglects the label correlations and is less efficient in reducing label uncertainty. Therefore, a statistical method that can model the label correlations needs to be adopted. We introduce such a Bayesian model in the following section; a minimal sketch of the selection rule in Eqn. 9 itself is given below.
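Eqn. 9 can be turned into a direct search over candidate pairs. The following is a minimal sketch (ours, not the authors' implementation) assuming a helper posterior(x, labeled) that returns the model's joint posterior over the currently unlabeled labels of x as a dictionary mapping 0/1 assignment tuples to probabilities (e.g., obtained from the KMEM of section 3 by enumeration); pool, labeled_sets and unlabeled_sets are likewise illustrative names.

```python
import numpy as np

# Illustrative sketch of the selection rule in Eqn. 9.
# posterior(x, labeled) -> {assignment tuple over U(x): probability}

def pairwise_marginal(post, unlabeled, i, s):
    """2x2 joint P(y_i, y_s | y_L(x), x) from the posterior table."""
    joint = np.zeros((2, 2))
    pi, ps = unlabeled.index(i), unlabeled.index(s)
    for assignment, prob in post.items():
        joint[assignment[pi], assignment[ps]] += prob
    return joint

def mutual_information(joint):
    """MI(y_i; y_s) from a 2x2 joint table (base-2 logs)."""
    px, py = joint.sum(axis=1, keepdims=True), joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px * py)[mask])).sum())

def select_pair(pool, labeled_sets, unlabeled_sets, posterior):
    """Return the (x, s) maximizing sum_i MI(y_i; y_s | y_L(x), x) over the pool."""
    best, best_score = None, -np.inf
    for x in pool:
        unlabeled = sorted(unlabeled_sets[x])
        post = posterior(x, labeled_sets[x])
        for s in unlabeled:
            # labeled indices contribute zero MI, so summing over U(x) suffices
            score = sum(mutual_information(pairwise_marginal(post, unlabeled, i, s))
                        for i in unlabeled)
            if score > best_score:
                best, best_score = (x, s), score
    return best
```

Note that the i = s term in the inner sum equals H(ys | yL(x), x), so this sketch automatically reproduces the decomposition of Eqn. 10 into an uncertainty part plus a label-redundancy part.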
3. Maximum Entropy Model and EM Variant

In the above 2DAL strategy, we have indicated that a statistical model is needed to measure the label correlations. However, common multi-label classifiers, such as the one-against-rest encoded binary SVM, tackle the classification of multi-labeled samples in an independent manner. These models neglect the label correlations and hence do not fit our target. In this section, we introduce a multi-labeled Bayesian classifier in which the relations among different labels are well modeled.

3.1. Kernelized maximum entropy model

The principle of the Maximum Entropy Model (MEM) is to model all that is known and to assume nothing about the unknown. A traditional single-label MEM suffers from the same problem as the binary SVM above, and [16] extends the single-labeled MEM to the multi-labeled case. This model is linear and can be effective on a set of samples that vary linearly. However, it fails to capture the structure of the feature space if the variations among the samples are nonlinear. Image classification is exactly such a case, since features are extracted from image categories that vary in appearance, illumination conditions and complex background clutter. Therefore, a nonlinear version of this MEM is required to classify images based on their nonlinear feature structure. Moreover, [16] does not address the problem brought about by incomplete labels. We first introduce the model in [16] and then extend it to a nonlinear case by incorporating a kernel function into the model. This extension is used as the underlying classifier in 2DAL.

Let Q̃(x, y) and Q(x, y) denote the empirical and the model distribution, respectively. The optimal multi-label model can be obtained by solving the following formulation [16]:

    P̂ = arg max_P H(x, y | Q) = arg min_P ⟨log P(y|x)⟩_Q
    s.t.  ⟨yi⟩_Q = ⟨yi⟩_Q̃ + ηi,
          ⟨yi yj⟩_Q = ⟨yi yj⟩_Q̃ + θij,   1 ≤ i < j ≤ m,
          ⟨yi xl⟩_Q = ⟨yi xl⟩_Q̃ + φil,   1 ≤ i ≤ m, 1 ≤ l ≤ d,
          Σ_y P(y|x) = 1        (11)

where H(x, y | Q) is the entropy of x and y under distribution Q, and ⟨·⟩_Q denotes the expectation with respect to distribution Q. d is the dimension of the feature vector x and xl represents its l-th element. ηi, θij and φil are estimation errors following Gaussian distributions, which serve to smooth the MEM and improve the model's generalization ability. By modeling the pair-wise label correlations, the obtained model reveals the underlying label correlations. Formulation 11 can be solved by Lagrange multiplier algorithms, and the obtained posterior probability is P̂(y|x) = (1/Z(x)) exp(yᵀ(b + Ry + Wx)), where Z(x) = Σ_y exp(yᵀ(b + Ry + Wx)) is the partition function, and the parameters b, W and R are Lagrange multipliers that need to be determined. The optimal parameters can be found by minimizing the Lagrangian:

    L(b, R, W) = −⟨log P̂(y|x)⟩_Q̃ + (λb/(2n))||b||₂² + (λR/(2n))||R||F² + (λW/(2n))||W||F²
               = ⟨−yᵀ(b + Ry + Wx) + log Z(x)⟩_Q̃ + (λb/(2n))||b||₂² + (λR/(2n))||R||F² + (λW/(2n))||W||F²        (12)

where ||·||F denotes the Frobenius norm and n is the number of samples in the training set.

Now we extend the above multi-labeled MEM to a nonlinear one so that the powerful kernel method can be adopted. A transformation ψ maps samples into a target space in which the kernel function k(x′, x) gives the inner product. We can rewrite the multi-labeled MEM as P̂(y|x) = (1/Z(x)) exp(yᵀ(b + Ry) + yᵀK(W, x)). According to the Representer Theorem, the optimal weighting vector of the single-labeled problem is a linear combination of the samples. In the proposed multi-labeled setting, the mapped weighting matrix ψ(W) can still be written as a linear combination of the ψ(xi), except that the combination coefficients are vectors instead of scalars, i.e.

    ψ(W) = Σ_{i=1}^n θ(xi) ψᵀ(xi)
         = [ θ(x1) θ(x2) ··· θ(xn) ] · [ ψ(x1) ψ(x2) ··· ψ(xn) ]ᵀ
         = Θ · [ ψ(x1) ψ(x2) ··· ψ(xn) ]ᵀ        (13)

where the summation is taken over the samples in the training set {xi}_{i=1}^n, θ(xi) is an m × 1 coefficient vector, and Θ is an m × n matrix in which each row contains the weighting coefficients for one label. Accordingly, we have

    K(W, x) = ψ(W) · ψ(x) = Θ · [ k(x1, x) ··· k(xn, x) ]ᵀ = Θ · k(x)        (14)

and so

    P̂(y|x) = (1/Z(x)) exp(yᵀ(b + Ry) + yᵀK(W, x)) = (1/Z(x)) exp(yᵀ(b + Ry + Θk(x)))        (15)

where k(x) = [ k(x1, x) ··· k(xn, x) ]ᵀ is an n × 1 vector that can be seen as a new representation of sample x. Correspondingly, with the identity ||ψ(W)||F² = tr(ψ(W)ψ(W)ᵀ) = tr(ΘKΘᵀ), the Lagrangian function in Eqn. 12 can be rewritten as

    L(b, R, Θ) = −⟨log P̂(y|x)⟩_Q̃ + (λb/(2n))||b||₂² + (λR/(2n))||R||F² + (λW/(2n)) tr(ΘKΘᵀ)        (16)

where K = [k(xi, xj)]_{n×n} is the kernel matrix. We call the above model the Kernelized Maximum Entropy Model (KMEM). By minimizing Eqn. 16, we can estimate the optimal parameters of KMEM.
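As a concrete illustration of Eqn. 15, the sketch below (ours, not the authors' code) evaluates the KMEM posterior by brute-force enumeration of y ∈ {0,1}^m, which is feasible for the label-set sizes used in this paper (m = 6 and m = 14). The names b, R, Theta and the kernel vector k(x) follow the notation above, but the values in the toy usage are random placeholders rather than fitted parameters.

```python
import itertools
import numpy as np

def kmem_posterior(b, R, Theta, k_x):
    """Return all label vectors y in {0,1}^m with their probabilities P(y|x) under Eqn. 15."""
    m = b.shape[0]
    ys = np.array(list(itertools.product([0, 1], repeat=m)), dtype=float)   # (2^m, m)
    linear = b + Theta @ k_x                                   # b + Theta k(x), shape (m,)
    scores = ys @ linear + np.einsum('ki,ij,kj->k', ys, R, ys)  # y^T (b + Theta k(x)) + y^T R y
    scores -= scores.max()                                      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()                                        # divide by the partition function Z(x)
    return ys, probs

# Toy usage with random placeholder parameters (illustration only):
rng = np.random.default_rng(0)
m, n = 6, 20
ys, probs = kmem_posterior(rng.normal(size=m), rng.normal(size=(m, m)),
                           rng.normal(size=(m, n)), rng.normal(size=n))
marginal = probs[ys[:, 0] == 1].sum()   # model marginal P(y_1 = 1 | x)
```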
3.2. EM based approach for incomplete labels

Given the partially labeled training set constructed by 2DAL (see Figure 2), we can handle the incomplete labels by integrating out the unlabeled part to yield the marginal distribution of the labeled part, P̂(yL(x) | x) = Σ_{yU(x)} P̂(yU(x), yL(x) | x). Substituting it for P̂(y|x) in Eqn. 16, we obtain:

    L(b, R, Θ) = −⟨ log Σ_{yU(x)} P̂(yU(x), yL(x) | x) ⟩_Q̃ + (λb/(2n))||b||₂² + (λR/(2n))||R||F² + (λW/(2n)) tr(ΘKΘᵀ)        (17)

By minimizing Eqn. 17, we can find the optimal parameters for KMEM. However, it is difficult to minimize directly. Instead, we use the Expectation-Maximization (EM) algorithm [5] to solve this optimization problem:

E-Step: Given the parameter estimates bt, Rt, Θt at the current step t, the T-function (i.e., the expectation of the Lagrangian in Eqn. 16 under the current parameters, given the labeled part) can be written as

    T(b, R, Θ | bt, Rt, Θt) = ⟨ −E_{U(x)|L(x); bt, Rt, Θt} [ log P̂(yU(x), yL(x) | x; b, R, Θ) ] ⟩_Q̃
                              + (λb/(2n))||b||₂² + (λR/(2n))||R||F² + (λW/(2n)) tr(ΘKΘᵀ)        (18)

where E_{U(x)|L(x); bt, Rt, Θt} is the expectation operator under the current estimated conditional probability P̂(yU(x) | yL(x), x; bt, Rt, Θt).

M-Step: Update the parameters by minimizing the T-function:

    bt+1, Rt+1, Θt+1 = arg min_{b, R, Θ} T(b, R, Θ | bt, Rt, Θt)        (19)

The derivatives of the T-function with respect to its parameters b, R, Θ are

    ∂T/∂bi  = ⟨ ⟨yi⟩_Q − E_{yi|L(x); bt, Rt, Θt}[yi] ⟩_Q̃ + (λb/n) bi
    ∂T/∂Rij = ⟨ ⟨yi yj⟩_Q − E_{yi, yj|L(x); bt, Rt, Θt}[yi yj] ⟩_Q̃ + (λR/n) Rij
    ∂T/∂Θil = ⟨ ⟨yi k(xl, x)⟩_Q − E_{yi|L(x); bt, Rt, Θt}[yi k(xl, x)] ⟩_Q̃ + (λW/n) Σ_{k=1}^n Θik k(xk, xl)        (20)

Given these derivatives, we can use efficient gradient descent methods (e.g., LMVM [1]) to minimize Eqn. 18.

4. Experiments

In this section, we evaluate the proposed 2DAL strategy on two real-world data sets. The first is a natural scene set with six image categories. The second is a biological data set whose genes belong to 14 different functional classes. Both data sets are publicly available at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. We compare the proposed 2DAL with state-of-the-art active learning approaches.

Figure 3. The mutual information between different concepts in the Scene data set. [Matrix of pairwise mutual information among Beach, Sunset, Fall foliage, Field, Mountain and Urban.]

Class           Total    Class                      Total
Beach           369      Beach+Mountain             38
Sunset          364      Foliage+Mountain           13
Foliage         360      Field+Mountain             75
Field           327      Field+Foliage+Mountain     1
Beach+Field     1        Urban                      405
Foliage+Field   23       Beach+Urban                19
Mountain        405
Table 1. Description of the Scene data set.

4.1. Natural scene data set

This natural scene data set was first used in previous research on the multi-labeled image scene classification problem [2]. It contains 2,407 natural images belonging to one or more of six natural scene categories: beach, sunset, fall foliage, field, mountain, and urban. Since the data set is multi-labeled, it contains 14,442 sample-label pairs. Each sample has been assigned at most three positive labels. Table 1 describes the multi-label distribution of this set. We can see that 177 samples have more than one positive label. Although this number is not large, it does not imply that the label correlation is low. In fact, the statistical correlations between the labels are determined not only by the correlations between positive labels but also by those between negative labels, as well as between positive and negative ones. In Figure 3, we illustrate the mutual information calculated over the whole data set. According to information theory, the mutual information captures all of these kinds of correlations among the positive/negative labels. From this illustration, the correlations between the labels are obvious. Note that the mutual information computed here is not the one used by 2DAL in Eqn. 9; in Eqn. 9, the mutual information is calculated from the statistical model KMEM.

For the features used in this experiment, an image is first converted into the CIE Luv color space, and the first and second color moments (mean and variance) are then extracted over a 7 × 7 grid on the image. The result is a 49 × 2 × 3 = 294 dimensional feature vector [2].
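For illustration, a rough sketch of this 294-dimensional color-moment feature might look as follows (ours, not the authors' code). The Luv conversion itself is assumed to have been done beforehand, e.g. with OpenCV (cv2.cvtColor(img, cv2.COLOR_BGR2Luv)) or scikit-image (skimage.color.rgb2luv).

```python
import numpy as np

def color_moment_feature(img_luv, grid=7):
    """First and second color moments (mean, variance) over a grid x grid layout."""
    h, w, _ = img_luv.shape
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    feats = []
    for r in rows:
        for c in cols:
            block = img_luv[np.ix_(r, c)]                      # one of the 49 grid cells
            feats.append(block.reshape(-1, 3).mean(axis=0))    # 3 channel means
            feats.append(block.reshape(-1, 3).var(axis=0))     # 3 channel variances
    return np.concatenate(feats)                               # 49 * 2 * 3 = 294 values

feature = color_moment_feature(np.random.rand(240, 320, 3))    # placeholder Luv image
assert feature.shape == (294,)
```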
In this experiment, we compare the following three active learning strategies:

1. The proposed 2DAL strategy: using the proposed sample-label pair selection criterion of Section 2.2, with KMEM as the underlying classifier.

2. One-dimensional active learning (1DAL): using the mean-max loss active learning strategy proposed in previous work [11] on multi-label active learning. As stated in Section 1, this strategy selects only along the sample dimension and does not take advantage of the label correlations to reduce human labeling cost. Therefore, when a sample is selected, all of its labels have to be annotated.

3. Random strategy (RND): selecting the sample-label pairs at random. For a fair comparison with the proposed 2DAL, we also use KMEM as the classifier.

We use the average F1 score over all labels for performance evaluation, i.e., F1 = 2rp/(r + p), where p and r are precision and recall respectively. For the Scene data set, we use 241 (10%) images as the initial training set. In each iteration, 60 sample-label pairs are selected by 2DAL. Note that 1DAL requests annotation on the basis of samples rather than sample-label pairs, so in each iteration it selects 10 images and annotates all six labels, or equivalently 60 image-label pairs. The average F1 score is then computed over all the remaining unlabeled data. In Figure 4(a), we show the performance of the three strategies against the total number of selected sample-label pairs. The proposed 2DAL has the best performance over all iterations, and as the number of selected pairs increases, the improvement becomes more and more significant. Table 2 compares the F1 scores after 100 active learning iterations over all six scene categories. The proposed 2DAL outperforms the other strategies on all categories. In particular, the improvement is obvious on "Urban". Such an improvement is obtained by considering its significant correlations with the other categories (see Figure 3 for an illustration of these label correlations) during the active learning procedure. This confirms that 2DAL can clearly improve the classification performance.

Figure 4. The performance of the three active learning strategies over two real-world data sets: (a) Scene, (b) Yeast. [Average F1 score versus the total number of selected sample-label pairs for 2DAL, 1DAL and RND.]

Class          2DAL     1DAL     RND
Beach          0.9523   0.8652   0.6744
Sunset         0.9916   0.9421   0.9002
Fall Foliage   0.9887   0.9338   0.8927
Field          0.9588   0.8813   0.8071
Mountain       0.7806   0.6457   0.6122
Urban          0.8534   0.6162   0.6856
Table 2. F1 scores after 100 iterations on the six scene categories.

4.2. Gene data set

The second data set is the Yeast data set [11], which consists of micro-array expression data and phylogenetic profiles for 2,417 genes; each gene belongs to one or more of 14 different functional classes. As a multi-labeled gene data set, it contains 33,838 sample-label pairs. Each sample is annotated with at most 11 positive labels. A detailed description of this biological data set can be found in [6].

In the experiment, 242 (10%) genes with their labels are used as the initial training set. In each iteration, 140 sample-label pairs are selected. Similar to section 4.1, 1DAL selects 14 samples and annotates all of their labels, which is equivalent to 140 sample-label pairs. Figure 4(b) illustrates the performance of the three strategies on this data set.

From the above two experiments, we observe:

1. Given a fixed number of annotations, 2DAL outperforms 1DAL over all the active learning iterations. This is because the former considers both sample and label uncertainty when selecting sample-label pairs, while 1DAL only considers the sample uncertainty. Therefore, the informative label correlations associated with each sample help to reduce the expensive human labor needed to construct the labeled pool.

2. The proposed 2DAL gives good performance on diverse data sets, ranging from natural scene images to gene data. This is an important characteristic of a good algorithm for real-world applications.
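For reference, interpreting the "average F1 over all labels" above as a macro-average, the metric can be computed as in the following sketch (illustrative only; not the authors' evaluation code). It assumes binary ground-truth and prediction matrices of shape (num_samples, num_labels).

```python
import numpy as np

def average_f1(y_true, y_pred):
    """Macro-average of F1 = 2rp/(r+p) over the label dimension."""
    f1s = []
    for k in range(y_true.shape[1]):
        t, p = y_true[:, k], y_pred[:, k]
        tp = np.sum((t == 1) & (p == 1))
        precision = tp / max(np.sum(p == 1), 1)    # guard against empty predictions
        recall = tp / max(np.sum(t == 1), 1)       # guard against empty ground truth
        f1s.append(0.0 if precision + recall == 0 else
                   2 * precision * recall / (precision + recall))
    return float(np.mean(f1s))
```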
5. Conclusion

In this paper, we proposed an efficient two-dimensional active learning (2DAL) strategy for multi-labeled image classification. The 2DAL strategy selects the sample-label pairs to annotate along both the sample and the label dimensions. In contrast to traditional one-dimensional binary active learning algorithms, 2DAL only needs to annotate a subset of the labels associated with a selected sample and is thus much more efficient. Furthermore, we showed that the traditional active learning formulation is a special case of 2DAL when there is only one label. Extensive experiments on two widely used data sets have shown that, for a given number of required annotations, the proposed 2DAL strategy outperforms other state-of-the-art sample selection strategies.

Appendix

Here we give the proof of Lemma 1.

Proof. Since the selected ys can take on two values {0, 1}, there are two possible posterior distributions for the unlabeled yi, i.e., P(yi | ys = 0; yL(x), x) and P(yi | ys = 1; yL(x), x). If ys = 1 holds, the Bayesian classification error is [7]:

    E(yi | ys = 1; yL(x), x) = min{ P(yi = 1 | ys = 1; yL(x), x), P(yi = 0 | ys = 1; yL(x), x) }        (21)

Given the inequality (1/2)H(p) − ε ≤ min{p, 1 − p} ≤ (1/2)H(p), with ε = (1/2) log(5/4) (see Figure 5), we have

    (1/2) H(yi | ys = 1; yL(x), x) − ε ≤ E(yi | ys = 1; yL(x), x) ≤ (1/2) H(yi | ys = 1; yL(x), x)        (22)

Similarly, if ys = 0 holds,

    (1/2) H(yi | ys = 0; yL(x), x) − ε ≤ E(yi | ys = 0; yL(x), x) ≤ (1/2) H(yi | ys = 0; yL(x), x)        (23)

Therefore, the Bayesian classification error bound given the selected ys can be computed as:

    E(yi | ys; yL(x), x)
      = P(ys = 1 | yL(x), x) E(yi | ys = 1; yL(x), x) + P(ys = 0 | yL(x), x) E(yi | ys = 0; yL(x), x)
      ≤ (1/2) P(ys = 1 | yL(x), x) H(yi | ys = 1; yL(x), x) + (1/2) P(ys = 0 | yL(x), x) H(yi | ys = 0; yL(x), x)
      = (1/2) H(yi | ys; yL(x), x)        (24)

The last equality follows from the definition of conditional entropy. Similarly,

    E(yi | ys; yL(x), x)
      = P(ys = 1 | yL(x), x) E(yi | ys = 1; yL(x), x) + P(ys = 0 | yL(x), x) E(yi | ys = 0; yL(x), x)
      ≥ (1/2) P(ys = 1 | yL(x), x) { H(yi | ys = 1; yL(x), x) − 2ε } + (1/2) P(ys = 0 | yL(x), x) { H(yi | ys = 0; yL(x), x) − 2ε }
      = (1/2) H(yi | ys; yL(x), x) − ε        (25)

Figure 5. Illustration of the inequality (1/2)H(p) − ε ≤ min{p, 1 − p} ≤ (1/2)H(p), with ε = (1/2) log(5/4). [Plot of (1/2)H(p) and min{p, 1 − p} versus p, with the maximum gap ε marked.]

References

[1] S. Benson, L. C. McInnes, J. Moré, T. Munson, and J. Sarich. TAO user manual (revision 1.9). Technical Report ANL/MCS-TM-242, Mathematics and Computer Science Division, Argonne National Laboratory, 2007. http://www.mcs.anl.gov/tao.
[2] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9), 2004.
[3] K. Brinker. On active learning in multi-label classification. In From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization. Springer, 2006.
[4] E. Y. Chang, S. Tong, K. Goh, and C. Chang. Support vector machine concept-dependent active learning for image retrieval. IEEE Transactions on Multimedia, 2005.
[5] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 39(1), 1977.
[6] A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Proc. of NIPS, 2002.
[7] M. E. Hellman and J. Raviv. Probability of error, equivocation, and the Chernoff bound. IEEE Transactions on Information Theory, 1970.
[8] S. C. H. Hoi and M. R. Lyu. A semi-supervised active learning framework for image retrieval. In Proc. of IEEE CVPR, 2005.
[9] F. Jing, M. Li, and H.-J. Zhang. Entropy-based active learning with support vector machine for content-based image retrieval. In Proc. of IEEE International Conference on Multimedia and Expo (ICME), 2004.
[10] A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell. Active learning with Gaussian processes for object categorization. In Proc. of IEEE ICCV, 2007.
[11] X. Li, L. Wang, and E. Sung. Multi-label SVM active learning for image classification. In Proc. of ICIP, 2004.
[12] G.-J. Qi, X.-S. Hua, Y. Rui, J.-H. Tang, T. Mei, and H.-J. Zhang. Correlative multi-label video annotation. In Proc. of ACM Multimedia, 2007.
[13] N. Roy and A. McCallum. Toward optimal active learning through sampling estimation of error reduction. In Proc. of ICML, 2001.
[14] S. Tong and E. Y. Chang. Support vector machine active learning for image retrieval. In Proc. of ACM Multimedia, 2001.
[15] R. Yan, J. Yang, and A. Hauptmann. Automatically labeling video data using multi-class active learning. In Proc. of IEEE ICCV, 2003.
[16] S. Zhu, X. Ji, W. Xu, and Y. Gong. Multi-labelled classification using maximum entropy method. In Proc. of ACM SIGIR, 2005.
