
An Instance Level Analysis of Classification Difficulty for Unlabeled Data

Patricia S. M. Ueda1(B), Adriano Rivolli2, and Ana Carolina Lorena1

1 Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
[email protected], [email protected]
2 Universidade Tecnológica Federal do Paraná, Cornélio Procópio, Brazil
[email protected]

Abstract. Instance hardness measures allow us to assess and understand why some observations in a dataset are difficult to classify. With this information, one may curate and cleanse the training dataset for improved data quality. However, these measures require the data to be labeled, which limits their usage in the deployment stage, when data is unlabeled. This paper investigates whether it is possible to identify observations that will be hard to classify regardless of their label. To this end, two approaches are tested. The first adapts known instance hardness measures to the unlabeled scenario. The second learns regression meta-models to estimate the instance hardness of new data observations. In experiments, both approaches were effective at identifying instances lying in borderline regions of the dataset, which pose a greater difficulty when the label is unknown.

Keywords: Machine Learning · Instance hardness measures · Unlabeled data · Deployment of models

1 Introduction

The Machine Learning (ML) literature extensively covers algorithmic developments focused on model hyperparameter tuning and related model-centric tasks. More recently, the data-centric Artificial Intelligence (AI) community has shifted the focus toward understanding the data and improving its quality, rather than developing ever more complex ML models [13].
Paving the way for such a data-centric approach is a more fine-grained analysis of the data and of classification performance. Aggregated measures applied to classification problems, such as accuracy, precision, and similar metrics, limit the understanding of the particularities of the data the algorithms are modeling. These aggregated metrics do not reveal which instances are misclassified or why. However, a more reliable usage of ML algorithms must reveal for which particular instances
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2025
A. Paes and F. A. N. Verri (Eds.): BRACIS 2024, LNAI 15412, pp. 141–155, 2025.
https://doi.org/10.1007/978-3-031-79029-4_10

a model struggles to classify correctly and why. One way to achieve such understanding is to correlate data characteristics extracted by a set of meta-features [12] with the predictive performance of multiple algorithms, in a meta-learning (MtL) approach [2].
One particular set of meta-features is the set of data complexity measures, originally proposed by Ho and Basu [5] to quantify the overall complexity of solving a classification problem given the dataset available for learning, providing a global perspective of the problem's difficulty [1]. Since these measures can fail to provide information at the instance level [8], Instance Hardness Measures (IHMs) were introduced by Smith et al. [14] to characterize the difficulty level of each individual instance of a dataset, revealing which particular instances are misclassified and why. These developments respond to the growing interest in responsible AI, which has made researchers focus on the reliability and trustworthiness of the predictions obtained by ML models.
Nonetheless, current IHMs require the instance label to be computed, which restricts their use to the analysis and curation of ML training datasets. For ML in production, where the class of an instance is unknown, adaptations are needed. This paper proposes alternative instance hardness measures for instances that do not have a label. The idea is to leverage knowledge about the hardness of the (labeled) training dataset to assess the hardness level of new unlabeled instances. This knowledge can later support reject-option strategies, whereby the ML model may abstain from predictions that are likely to be uncertain [4].
First, a set of IHMs is adapted to disregard the labels of the new instances in their computation. The second strategy generates regression meta-models to estimate the IHMs of new unlabeled observations in a meta-learning approach at the instance level. Both approaches are compared experimentally using one synthetic dataset and four datasets from the health domain, known for containing hard instances. Instances whose characteristics place them in overlapping or borderline regions of the classes are highlighted as hard to classify by both approaches. The adapted measures show a higher correlation with the original instance hardness values and prove to be an adequate alternative for estimating instance hardness in the deployment stage, driving the solutions to a more refined level and contributing toward a more trustworthy use of ML models.
The paper is organized as follows: Sect. 2 details the hardness measures to
apply to unlabeled data and how they were modified from the original measures.
Section 3 presents the materials and methods used in experiments, whose results
are presented in Sect. 4. Finally, Sect. 5 presents the conclusions of this work.

2 Instance Hardness Measures

The concept of instance hardness was introduced in the seminal work of Smith et al. [14] as an alternative for a fine-grained analysis of classification difficulty.

They define an instance as hard to classify if it is consistently misclassified by a set of classifiers with different biases. They also define a set of measures to explain possible reasons why an instance is difficult to classify, which are regarded as instance-level meta-features in the literature [7].
The base IHMs adopted in this work are presented next, along with their adaptations, which are indicated by the "adj" (adjusted) suffix. In their definitions, let D be a training dataset with n labeled instance pairs (x_i, y_i), where each x_i ∈ X is described by m input features and y_i ∈ Y is the class of the instance in the dataset. The number of classes is denoted by C. Let x be a new instance whose label is unknown.
To illustrate the concepts, consider the dataset in Fig. 1, containing two classes, red and blue. Two instances are highlighted: x1 and x2. The instance x1 lies in a borderline area between the classes and might be difficult to classify regardless of its class. The instance x2 is more aligned with the blue class: if the label registered for it in the dataset is blue, it will be easily classified; otherwise, it will have a hardness level higher than that of x1. Standard IHMs need to know these labels, so both x1 and x2 must be contained in the labeled dataset D. This work introduces adaptations to estimate the hardness level of an instance in the absence of its label, meaning x1 and x2 are not in the labeled dataset D used to estimate the hardness levels. Note that the two scenarios lead to different estimates: based on its characteristics, x2 will probably be easily classified as blue, whereas x1 will probably be considered hard to classify in both scenarios.

Fig. 1. Example of a dataset with highlighted instances: x1 is in a borderline region and can be difficult to classify regardless of its class; x2 might be easy or hard to classify depending on its registered label. (Color figure online)

2.1 Neighborhood-Based IHM

The hardness level of an instance can be obtained from its neighbourhood in the dataset. In the original IHMs, instances surrounded by elements sharing their own label can be considered easier to classify. For new data without labels, our approach finds the neighbourhood of the instance in the labeled dataset D and assigns a higher hardness level when there is a mix of different classes in this region.

k-Disagreeing Neighbors (kDN): the original kDN measure computes the percentage of the k nearest neighbors of x_i in the dataset D that have a label different from that of the instance:

    kDN(x_i, y_i) = |{x_j | x_j ∈ kNN(x_i) ∧ y_j ≠ y_i}| / k,    (1)

where kNN(x_i) represents the set of k-nearest neighbors of the instance x_i in the dataset D. An instance is considered harder to classify when the value of kDN(x_i, y_i) is higher. Values close to 1 represent an instance surrounded by examples from a class different from its own; this would be x2's case in Fig. 1 when labeled red in D. Intermediate values of kDN(x_i, y_i) are found for borderline instances. Easier instances are those surrounded by elements sharing their class label, which would correspond to x2 when it has a blue label.
In the absence of an instance's label, an alternative way to measure the mixture of classes in its neighbourhood is to compute an entropy measure, based on the proportions of the classes found among the instance's neighbours. Higher entropy values indicate that the new instance lies in a region of D near elements from different classes; this corresponds to the x1 case in Fig. 1. In contrast, x2 will be regarded as easy to predict, as it is surrounded by elements of the blue class:

    kDNadj(x) = − Σ_{i=1}^{C} p(y_j = c_i) log p(y_j = c_i), for x_j ∈ kNN(x),    (2)

where p(y_j = c_i) are the proportions of the classes among the k-nearest neighbours of x in the dataset D.
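As a concrete sketch, the adjusted kDN of Eq. 2 takes only a few lines of NumPy. The function name kdn_adj and the Euclidean metric are our assumptions for illustration; the paper's own computations rely on the PyHard package.

```python
import numpy as np

def kdn_adj(x, X_train, y_train, k=10):
    """Adjusted kDN (Eq. 2): entropy of the class proportions among the
    k nearest labeled neighbours of the unlabeled instance x.
    Hypothetical helper; Euclidean distance is assumed."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distances to labeled data
    neighbours = y_train[np.argsort(dists)[:k]]   # labels of the k nearest
    _, counts = np.unique(neighbours, return_counts=True)
    p = counts / k                                # class proportions
    return float(-(p * np.log(p)).sum())          # 0 => pure neighbourhood
```

An instance deep inside one class gets entropy 0, while a borderline instance with a 50/50 neighbourhood gets log 2 ≈ 0.69 in a two-class problem.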
Ratio of the Intra-class and Extra-class Distances (N2_IHM): the original measure takes the complement of the normalized ratio between the distance of x_i to the nearest example from its own class in D and the distance to the nearest instance from a different class (its nearest enemy) in D:

    N2_IHM(x_i, y_i) = 1 − 1 / (IntraInter(x_i) + 1),    (3)

where:

    IntraInter(x_i, y_i) = d(x_i, NN(x_i ∈ y_i)) / d(x_i, NE(x_i)),    (4)

where d is a distance function, NN(x_i ∈ y_i) is the nearest neighbor of x_i from its own class, and NE(x_i) is the nearest enemy of x_i (NE(x_i) = NN(x_i ∈ y_j ≠ y_i)). In this formulation, when an instance is closer to an example from another class than to one from its own class, the N2_IHM value will be larger, indicating that the instance is harder to classify. This would correspond to the case where x2 in Fig. 1 has the red label.
The alternative measure for unlabeled instances is obtained by taking the ratio between the minimum distance from x to the closest element in D, denoted as x_j in Eq. 5, and the distance from x to the closest element of another class in D, that is, a class different from that of x_j. This ratio assumes values close to 1 when the instance is almost equally distant from different classes, which is more likely for borderline instances, such as x1 in Fig. 1:

    N2adj(x) = min(d(x, x_j)) / min(d(x, x_k) | y_k ≠ y_j).    (5)
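Equation 5 can be sketched analogously. Again, the helper name and the Euclidean metric are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def n2_adj(x, X_train, y_train):
    """Adjusted N2 (Eq. 5): distance from x to its nearest labeled
    neighbour over the distance to the nearest instance of any other
    class. Hypothetical sketch; Euclidean distance is assumed."""
    dists = np.linalg.norm(X_train - x, axis=1)
    j = np.argmin(dists)                         # closest labeled instance x_j
    enemy = dists[y_train != y_train[j]].min()   # closest other-class instance
    return float(dists[j] / enemy)               # values near 1 => borderline
```

A point deep inside one class yields a ratio near 0; a point midway between two classes yields a ratio approaching 1.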

2.2 Class Likelihood IHM

This type of measure captures whether the instance is well situated in its class, considering the general patterns of that class. The class likelihood can be estimated assuming the input features are independent, which simplifies the computation.

Class Likelihood Difference (CLD): the original measure takes the complement of the difference between the likelihood that x_i belongs to its class y_i and the maximum likelihood it has for any other class. The complement is taken to standardize the interpretation of the direction of hardness, since the confidence that an instance belongs to its own class is expected to be larger than that for any other class [9]:

    CLD(x_i, y_i) = (1 − (p(x_i|y_i)p(y_i) − max_{y_j ≠ y_i}[p(x_i|y_j)p(y_j)])) / 2,    (6)

where p(y_i) is the prior of class y_i, set as 1/C for all data instances, and p(x_i|y_i) is the likelihood that x_i belongs to class y_i, which can be estimated by treating the input features as independent of each other, as in Naïve Bayes classification. For example, if x2 in Fig. 1 is labeled as blue in D, it will be easy according to this measure, as its likelihood for the blue class will be higher than for the red class.
When the class of an instance cannot be defined in advance, the hardness measure can be estimated by the difference between the two highest likelihoods over all possible classes in the dataset. As in the original measure, the complement of the difference is taken to keep the interpretation that higher values correspond to instances that are harder to classify. The values of this measure tend to be higher for borderline instances, since their likelihoods of belonging to different classes will be similar:

    CLDadj(x) = (1 − (max_{y_i}[p(x|y_i)p(y_i)] − max_{y_j ≠ y_i}[p(x|y_j)p(y_j)])) / 2.    (7)
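A minimal sketch of Eq. 7 follows, using per-feature Gaussian densities under the naive independence assumption. The helper name, the Gaussian likelihood model, and the eps smoothing are our assumptions; the paper's values come from PyHard.

```python
import numpy as np

def cld_adj(x, X_train, y_train, eps=1e-9):
    """Adjusted CLD (Eq. 7): complement of the gap between the two
    largest class scores p(x|c)p(c), using per-feature Gaussian
    likelihoods (naive independence) and uniform priors 1/C.
    Hypothetical sketch of the measure."""
    classes = np.unique(y_train)
    prior = 1.0 / len(classes)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        mu, sd = Xc.mean(axis=0), Xc.std(axis=0) + eps
        # product of per-feature Gaussian densities, as in Naive Bayes
        dens = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        scores.append(prior * dens.prod())
    top2 = np.sort(scores)[-2:]
    return float((1.0 - (top2[1] - top2[0])) / 2.0)  # higher => harder
```

A borderline instance, whose two class scores nearly tie, lands close to 0.5, while an instance well inside one class scores lower.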

2.3 Tree-Based IHM

Decision trees (DTs) can be used to estimate the hardness level of an instance based on the number of splits necessary to classify it: if many splits are required, the instance is harder to classify. The DT is built from the labeled dataset D. Unlabeled instances are then submitted to the built DT, and the measures are computed based on where each instance is classified.

Disjunct Class Percentage (DCP): from a decision tree pruned after being built on D, the leaf node where the instance is classified is considered the disjunct of x_i. The complement of the percentage of instances in this disjunct that share the same label as x_i gives the original DCP measure:

    DCP(x_i, y_i) = 1 − |{x_j | x_j ∈ Disjunct(x_i) ∧ y_j = y_i}| / |{x_j | x_j ∈ Disjunct(x_i)}|,    (8)

where Disjunct(x_i) represents the set of instances contained in the disjunct (leaf node) where x_i is placed. For instances that are easy according to this measure, larger percentages of examples sharing the instance's label will be found in its disjunct. For example, if x2 in Fig. 1 has the red label in D, it will probably be placed in a leaf node containing many elements of the blue class, making it harder to classify under the interpretation of this measure.
In scenarios where the instance's class is unknown, we take the entropy of the disjunct where the instance is placed as the hardness measure, similarly to what was done for kDN:

    DCPadj(x) = − Σ_{i=1}^{C} p(y_j = c_i) log p(y_j = c_i), for x_j ∈ Disjunct(x),    (9)

where the proportions of the classes are computed over the disjunct where x is placed in the DT built from the dataset D.
Tree Depth (TD): the original measure gives the depth of the leaf node that classifies x_i in a DT built from the entire labeled dataset D, normalized by the maximum depth of the tree:

    TD(x_i, y_i) = depth_DT(x_i) / max(depth_DT(x_j ∈ D)),    (10)

where depth_DT(x_i) gives the depth at which the instance x_i is placed in the DT. Instances that are harder to classify tend to be placed at deeper levels of the tree, making TD higher. There are two versions of this measure: one derives from a pruned tree (TDP) and the other from an unpruned tree (TDU).

For unlabeled instances, the procedure for hardness estimation is the same as in DCP: the DT is built from the labeled set D, and the unlabeled instance is then submitted to the built DT. The depth of the leaf node where this instance is classified is used in the equation:

    TDadj(x) = depth_DT(x) / max(depth_DT(x_j ∈ D)).    (11)
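Both tree-based adaptations can be sketched from a single fitted scikit-learn tree. The helper below is a hypothetical illustration: it uses an unpruned tree (the paper's DCP and TDP derive from a pruned one) and names of our choosing.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_hardness(x, X_train, y_train):
    """DCPadj (Eq. 9) as the entropy of the disjunct receiving x, and
    TDadj (Eq. 11) as that leaf's depth over the maximum tree depth.
    Hypothetical sketch using an unpruned scikit-learn tree."""
    dt = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    leaf = dt.apply(x.reshape(1, -1))[0]            # disjunct of x
    labels = y_train[dt.apply(X_train) == leaf]     # training points in it
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    dcp_adj = float(-(p * np.log(p)).sum())
    # node depths from the tree structure (children follow their parent)
    depth = np.zeros(dt.tree_.node_count, dtype=int)
    for parent in range(dt.tree_.node_count):
        for child in (dt.tree_.children_left[parent],
                      dt.tree_.children_right[parent]):
            if child != -1:
                depth[child] = depth[parent] + 1
    td_adj = float(depth[leaf] / depth.max())
    return dcp_adj, td_adj
```

On well-separated data, a new point falls into a pure leaf, so its disjunct entropy is 0.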

2.4 Using Meta-models to Estimate IHM

Meta-learning is a traditional ML task that uses data about ML itself [2]. Here, MtL is designed to predict IHM values without access to the instances' labels. This is done by using the original input features of the dataset D to learn the expected IHM values in a regression task. Regression meta-models are therefore induced to estimate the IHM values of new instances. Their training datasets comprise the original input features of D and a target corresponding to an IHM computed on D in its original formulation. There is one regression model per IHM.

The estimation of IHM values for unlabeled data with this meta-learning approach is compared to the usage of the adjusted IHM values.
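The meta-learning pipeline can be sketched end to end as follows. The kDN-style target below is a stand-in for the PyHard-computed IHMs, and all names and the random seed are our illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

# Meta-dataset: original input features of D mapped to a precomputed IHM.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=0)

# Stand-in hardness target: fraction of disagreeing neighbours (kDN, k=10).
nn = NearestNeighbors(n_neighbors=11).fit(X)
_, idx = nn.kneighbors(X)                       # column 0 is the point itself
kdn = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

# One regression meta-model per IHM; here, a single RF for the kDN target.
meta_model = RandomForestRegressor(random_state=0).fit(X, kdn)
estimate = meta_model.predict(X[:1])            # hardness of an "unlabeled" point
```

At deployment time, the meta-model only needs the input features of a new instance, never its label, which is exactly the property this section exploits.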

3 Materials and Methods


In this section, we describe the materials and methods used in experiments
performed to analyze the behaviour of IHM for unlabeled data in classification
problems.

3.1 Datasets
Five datasets are employed in the experiments. The first dataset was created
synthetically, containing three classes with some overlap. The other four datasets
are from the health domain, for which some instances are hard to classify due
to the overlap of attribute values for different classes or inconsistencies. Two
of them are from the UCI public repository [6] and have been employed in
previous related work [8,11]. The last two are related to severe COVID-19 cases
in two large hospitals from the São Paulo metropolitan area [15]. The main
characteristics of the five datasets are presented in Table 1, including the number
of instances, classes and input features.

Table 1. Summary of the datasets used in the study.

            Blobs  Diabetes  Heart  Hospital1  Hospital2
Instances   300    768       270    526        134
Classes     3      2         2      2          2
Features    2      8         13     17         19

The blobs dataset was generated synthetically using the make_blobs function from the scikit-learn library [10], which generates isotropic Gaussian blobs in space. The standard deviation of the clusters was set to 2 to create some overlap between the classes and, consequently, regions where instances are harder to classify than others. Figure 2 presents this dataset, where some overlap in the borderline regions of the classes can be noticed.
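A dataset in the spirit of blobs can be recreated in one call (the random seed is our choice; the paper does not report one):

```python
from sklearn.datasets import make_blobs

# 300 instances, three isotropic Gaussian classes in 2D, cluster_std=2.0
# to force some overlap between the classes (seed is our assumption).
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=0)
```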
The diabetes dataset is related to the incidence of diabetes in female patients
of Pima Indian heritage who are at least 21 years old. The objective is to iden-
tify the presence of diabetes. The predictive variables record blood indices and
patient characteristics, such as number of pregnancies and age [6].

Fig. 2. Illustration of the blobs dataset.

The heart dataset registers heart disease in patients and has features collected
during the exercise test, others reflecting blood indices and personal character-
istics of the patients, such as age and gender [6].
The last two datasets, named hospital, were extracted from the raw public database provided by the FAPESP COVID-19 data sharing initiative [3]. The binary response labels a patient as severe when the hospital stay was greater than or equal to 14 days or the patient progressed to death. The features collected in these datasets are related to blood indices, age, and gender [15].

3.2 Methodology

The adjusted IHM measures proposed in this paper were applied to the datasets by treating each instance as unlabeled, one at a time, with the remaining instances labeled, resembling a leave-one-out (LOO) cross-validation scheme.

The same procedure is used to generate the meta-models that predict the IHM values, where one instance is left out as unlabeled at a time. The IHMs of the other instances are calculated using their original formulations. Next, a meta-dataset is built, mapping the original features of the instances to the computed IHM values. Regression meta-models are induced to learn this relationship and predict the expected IHM value of the left-out instance. One meta-model is induced per IHM considered. We used the Random Forest Regressor (RF) available in the scikit-learn library [10] with default hyperparameter values to generate these meta-models.
We also computed the original IHMs for the entire datasets, which use the labels of all instances. We then compare the association between the IHM values of the original measures and those of the estimated measures, where the estimates come either from the adjusted measures or from the induced meta-models. Spearman's correlation provides a non-parametric estimate of the association (monotonic relationship) between the modified measures and the original measure. It captures whether the direction of the adjusted/estimated IHM follows the values obtained from the original IHM: higher Spearman's correlation values indicate a stronger association between the estimated and original IHMs.
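To illustrate the property being measured: Spearman's correlation equals 1 whenever two score vectors rank the instances identically, even if their magnitudes differ. The vectors below are made-up hardness values.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical original vs. adjusted IHM values for five instances:
# different magnitudes, but the same ranking of instance hardness.
original = np.array([0.10, 0.30, 0.55, 0.70, 0.90])
adjusted = np.array([0.05, 0.20, 0.60, 0.80, 0.95])
rho, _ = spearmanr(original, adjusted)  # rho == 1.0: perfect monotonicity
```

This is why the comparison tolerates scale differences between the adjusted and original measures.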
We expect medium to high correlations, although some deviation is possible, since the estimates do not strive to deliver identical IHM values. Indeed, instances with noisy labels in the training datasets have characteristics that make them aligned with another class and are expected to show a lower correlation with the original IHM values. But in most cases, we expect the hardness directions to be maintained.
All code and analyses are implemented in Python. The original IHMs are computed using the PyHard package [7,9]. The code of the adjusted measures is in a public repository: https://anonymous.4open.science/r/Adj-IHM-BF75. The k value in kDN was set to 10, the default in the PyHard package.

4 Results

The results of the experiments performed are presented and discussed next.

4.1 Meta-models

First, we present the performance of the meta-models in the regression task. Table 2 presents the Mean Squared Error (MSE) obtained when predicting the IHMs using the regression meta-models. Lower values indicate better performance in predicting the original IHM values.

Table 2. MSE obtained by the RF algorithm when predicting the IHMs for the different datasets.

       blobs  diabetes  heart  hospital1  hospital2
kDN    0.219  0.214     0.232  0.190      0.209
N2     0.150  0.077     0.109  0.049      0.049
CLD    0.215  0.196     0.240  0.146      0.124
DCP    0.238  0.221     0.248  0.168      0.190
TDU    0.066  0.087     0.090  0.068      0.091
TDP    0.035  0.022     0.010  0.002      0.105

For some measures, the MSEs are lower, demonstrating a better approximation of the original IHM values. This happens mostly for the tree-depth measures. For others, the approximations are not as good (e.g., for kDN and DCP). One possible explanation is that the tree-depth measures do not depend as much on the labels of the instances as the others. The only difference between the original tree-depth measures and their estimated counterparts is the exclusion of one instance from the decision tree induction, which has little effect on the results. For the other measures, if an instance is incorrectly labeled, the original measures will flag it as very hard to classify, whereas without the label information this instance might be easily classified into another class, making it appear easy.

4.2 Correlation Analysis

Table 3 shows Spearman's correlation coefficient between the original IHMs and the measures obtained using the meta-learning approach. Values higher than 0.5 are highlighted in bold. The values for the estimated tree-depth measures are the highest, especially for the pruned version of the measure (TDP). This happens because, in the pruned version of the tree, noisy and outlier instances tend to be placed in nodes that have undergone pruning, so the label of the particular instance seems to matter less in the original IHM formulation. In contrast, the formulations of the original CLD, DCP, and kDN measures are highly influenced by the label of each instance for which they are measured. This decreases the correlations, especially in datasets with many instances whose feature values are akin to one class despite being labeled as another class in the dataset. This is the case for the hospital 1 and 2 datasets, where situations such as wrongly labeled instances or overlapping feature values are more common.

Table 3. Spearman coefficient obtained for the RF algorithm when predicting the IHMs for the different datasets.

       blobs  diabetes  heart  hospital1  hospital2
kDN    0.721  0.651     0.632  0.411      0.496
N2     0.794  0.592     0.642  0.299      0.295
CLD    0.685  0.666     0.430  0.432      0.287
DCP    0.611  0.680     0.508  0.392      0.456
TDU    0.927  0.824     0.671  0.870      0.911
TDP    0.987  0.965     0.953  0.993      0.876

Table 4. Spearman coefficient obtained for the adjusted measures compared to the original IHMs in the different datasets.

       blobs  diabetes  heart  hospital1  hospital2
kDN    0.696  0.713     0.836  0.427      0.554
N2     0.579  0.651     0.790  0.501      0.609
CLD    0.760  0.767     0.839  0.502      0.677
DCP    0.666  0.413     0.793  0.514      0.681
TDU    0.918  0.986     0.982  0.994      0.961
TDP    0.773  1.000     1.000  1.000      0.960

Table 4 presents the same analysis for the adjusted IHMs: their Spearman correlation with the original IHMs. As in Table 3, values higher than 0.5 are boldfaced, and more boldfaced correlations are observed here. Similar observations concerning the higher correlation values of the tree depth-based measures hold in Table 4 too. The correlations observed for the adjusted measures are generally higher than those observed for the measures estimated by the meta-regressors. To make the differences clearer, Fig. 3 plots the Spearman's correlations of the adjusted IHMs and of the meta-models against the original values. Blue bars represent the correlations of the adjusted measures, while orange bars denote the meta-learning approach. Only for the blobs dataset and for the DCP-diabetes combination were the correlations of the meta-models higher than those of the adjusted IHMs. The blobs dataset has its difficult instances concentrated on the borders of the classes, while the other datasets may pose other sources of difficulty that are not captured when the labels are absent, such as label noise.

Fig. 3. Spearman’s correlation applied to the adjusted IHM vs. the original IHM (blue
bars) and the predicted vs. expected values from MtL (orange bars). (Color figure
online)

Figure 4 shows the instances of the blobs dataset colored by hardness according to the original IHM (left), the adjusted IHM (center), and the meta-learning approach (right). This visualization is possible because the dataset is bi-dimensional. The harder an instance is to classify, the more intensely it is colored in red; instances that are easier to classify are filled with darker blue. The central areas of the plots contain the overlapping region between the three classes (see Fig. 2) and are therefore harder to classify. The first row corresponds to the kDN measure, the second to the TDU measure. For kDN, it is clear that the hardest instances are those on the borders of the classes. For TDU, the pattern observed in the three approaches shows that the hardness level is related to the partitions derived from the decision tree. All measures show similar behaviors; however, for the adjusted kDN measure, more central instances have higher IHM values than under the other measures. It is important to note that since the adjusted measures can vary on a scale different from the original IHM, the results presented in the plots were normalized between 0 and 1 to allow a direct comparison.

Fig. 4. Visualization of the measures kDN (top) and TDU (bottom): original IHM (left), adjusted IHM (middle), and meta-learning approach (right) for the blobs dataset. (Color figure online)

4.3 Discussion

Considering the different natures of the datasets, where blobs was artificially designed with three classes and the others are real-world health data, the Spearman's correlations in Fig. 3 show that MtL achieves results closer to the original measures than the adjusted IHM on the blobs dataset for most of the measures. Conversely, for the real datasets, the adjusted IHM is more strongly associated with the original measure for almost all measures.
The tree-depth measures had the highest correlations with the original measures for both the adjusted IHM and the MtL approach. This is mostly because the original tree-depth measures do not depend so directly on the labels of the instances. All the other measures check whether the labels in an instance's vicinity agree with its registered label, which makes them deviate more for mislabeled instances.
This can be observed in Fig. 5, where the original and estimated IHMs kDN (top) and TDU (bottom) are compared for all instances of the blobs dataset. The x-axis shows the original measures, whilst the y-axis shows the proposed counterparts. The adjusted kDN is normalized between 0 and 1 for direct comparison.

Fig. 5. The adjusted IHM vs. the original IHM and the meta-learning predictions vs. the original IHM for the blobs dataset.

For the kDN measure, one can observe that as the hardness of the instances grows, both the adjusted and the original IHM increase, reaching their peak in the middle of the scale. After that, the estimated IHM moves in the opposite direction of the original measure. This result is expected for unlabeled data: instances that are hard to classify because of their label will simply be predicted as belonging to another class, rather than standing out as outliers of a specific class.

Conversely, in the unpruned TD graphs, the results indicate that the hardness of classification is independent of whether the class is known. For both the adjusted IHM and the meta-learning approach, there is an approximately linear relationship between those measures and the original IHM. This means that, for this measure, the proposed measures for unlabeled data capture the same increase in hardness observed when the class is given. This result can be expected considering the nature of the measure.
The CLD measure, the only one based on likelihoods, tracked the original measure more closely through the adjusted IHM for all datasets. Especially for datasets with two classes, both estimates will agree in many cases, namely whenever the classes with the first and second maximum likelihoods are the same.
Overall, both the adjusted and the meta-learning IHMs were able to assess the
hardness level of unlabeled instances, with some advantage for the adjusted
measures, which showed larger correlations to the original measures in most
cases. The adjusted measures are also simpler to compute, as they do not
require inducing an ML model as the meta-learning approach does. In the
absence of labels, most measures are most effective at flagging borderline
instances as posing a higher difficulty for posterior classification.
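The agreement between a label-based IHM and its label-free counterpart can be quantified per instance with a rank correlation such as Spearman's. The arrays below are synthetic stand-ins for illustration only, not results from this work:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
ihm_original = rng.random(200)                           # computed with labels
ihm_adjusted = ihm_original + rng.normal(0, 0.05, 200)   # label-free estimate
ihm_meta = ihm_original + rng.normal(0, 0.15, 200)       # meta-model estimate

rho_adj, _ = spearmanr(ihm_original, ihm_adjusted)
rho_meta, _ = spearmanr(ihm_original, ihm_meta)
print(f"adjusted rho={rho_adj:.2f}, meta-learning rho={rho_meta:.2f}")
```

A higher rank correlation for the adjusted measure, as in this synthetic setup, corresponds to the advantage reported for it over the meta-learning estimates.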

5 Conclusion and Future Work


This research analyzed alternative ways to measure the hardness of instances
in classification problems when the label of an instance is unknown, that is,
at the deployment stage. Standard IHMs from the literature were adapted to
this scenario, and their results were compared to the alternative of training
regression meta-models to predict the IHM values. Both alternatives proved
effective, correlating with the original IHMs, which require the label of each
instance to be known. The correlations were higher for measures that rely less
on the labels; for the other measures, lower correlations are expected, since
their original formulations identify noise and outliers in the data based on
their labels. These results encourage the use of the adjusted measures during
the deployment of ML models, allowing the identification of instances that the
models might struggle to classify.
In future work, we will explore the patterns found in the comparisons between
the original and adjusted IHMs not presented in this work, as well as
alternative measures for unlabeled data not addressed in this research. We
will also expand the application of the adjusted measures and the
meta-learning approach to more datasets; in addition, tuning the meta-models
may reveal new findings about the characteristics of the instances. Another
fruitful direction is to explore the use of the adjusted measures in the
design of classification rejection options.
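As a final illustration, a rejection option driven by estimated hardness could take the following shape; the threshold and the -1 abstention code are illustrative choices rather than prescriptions from this work:

```python
import numpy as np

def predict_with_reject(model, X, hardness_scores, threshold=0.5):
    """Abstain (return -1) on instances whose estimated hardness
    exceeds the threshold; such cases can be deferred to a human
    expert or a fallback model."""
    preds = np.asarray(model.predict(X)).copy()
    preds[np.asarray(hardness_scores) > threshold] = -1
    return preds
```

Here the hardness scores could come from an adjusted measure or a meta-model, making the rejection rule usable precisely when no labels are available.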

Acknowledgements. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. The authors thank FAPESP for its support under grant 2021/06870-3.

