
A Comprehensive Treatise on Imbalanced Classification: Methodologies for Data Handling and Model Evaluation

The Pervasiveness and Perils of Class Imbalance

The pursuit of building effective predictive models is a cornerstone of machine learning. A
fundamental, yet often overlooked, assumption underpinning many classical classification
algorithms is that the data is drawn from a balanced class distribution. In practice, this
assumption is frequently violated. The phenomenon of class imbalance, where the number of
observations across different classes is unequally distributed, is not an academic curiosity but
a pervasive characteristic of real-world datasets. This section defines the class imbalance
problem, establishes its prevalence across various domains, and deconstructs the
fundamental reasons it undermines the performance and utility of standard classification
models.

Defining Class Imbalance: A Formal and Practical Perspective

Formally, a dataset is considered to have a class imbalance when the distribution of its target
classes is highly skewed.1 In the context of binary classification, this means one class, termed
the majority class, contains a significantly larger number of instances than the other, termed
the minority class.4 While a minor disparity, such as a 45-55 split, is relatively balanced, a
dataset with a 90-10 or a more extreme 99-1 split is considered imbalanced.3 The issue can
manifest in multiclass problems as well, though it is most frequently discussed in the binary
context.3

This imbalance is not an anomaly but a natural consequence of the phenomena being
modeled in numerous practical domains.5 The minority class often represents the event of
interest—a rare but critical occurrence. Prominent examples include:
●​ Fraud Detection: In financial datasets, fraudulent transactions constitute a minuscule
fraction of the total volume, often less than 0.1%.3
●​ Medical Diagnosis: The prevalence of certain diseases, particularly rare conditions or
specific types of cancer, is inherently low within the general population.3
●​ Anomaly Detection: In fields like cybersecurity (intrusion detection) or industrial
manufacturing (fault detection), anomalous events are, by definition, infrequent
deviations from normal operation.2
●​ Text Classification: Common tasks such as spam detection or targeted sentiment
analysis often encounter datasets where one category (e.g., legitimate emails, neutral
sentiment) vastly outnumbers others.2
●​ Customer Churn Prediction: The number of customers who churn is typically much
smaller than the number who remain loyal.10

The challenge posed by class imbalance is not solely a function of the skewed ratio. It is often
compounded by other intrinsic data characteristics that can further complicate the learning
process, such as the degree of separability between classes, the presence of noisy or
mislabeled data, and the absolute size of the dataset, particularly the number of available
minority class examples.11

The Algorithmic Bias: Why Standard Classifiers Fail

The primary reason imbalanced datasets pose a significant challenge is that most
conventional machine learning algorithms are designed with an implicit assumption of a
balanced class distribution.3 The optimization objective for these algorithms is typically to
maximize overall accuracy or, equivalently, to minimize the total classification error across all
samples.1 This design choice has profound and detrimental consequences when applied to
imbalanced data.

When faced with a dataset where one class overwhelmingly dominates, a classifier can
achieve a very high accuracy score through a trivial and uninformative strategy: always
predicting the majority class.2 Because the majority class constitutes the bulk of the data, this
"lazy" approach minimizes the overall error rate, thus satisfying the algorithm's optimization
objective.7 The model learns that ignoring the minority class is the path of least resistance to
minimizing its loss function. The contribution of errors from the small number of minority
instances becomes statistically insignificant in the calculation of the total loss, leading the
model to develop a strong predictive bias towards the majority class.1
This algorithmic bias manifests in several critical failures:
●​ Poor Minority Class Performance: The model fails to learn the distinguishing
characteristics of the minority class, resulting in very low sensitivity (also known as recall)
for that class.3 It essentially becomes incapable of identifying the very instances it was
designed to detect.
●​ Poor Generalization: A model trained on an imbalanced dataset does not generalize well
to new, unseen data, especially for minority class predictions.3 It has not been exposed to
a sufficient number of minority examples to build a robust and accurate representation of
that class.
●​ Loss of Important Insights: By neglecting the minority class, the model fails to uncover
the patterns and relationships that are often of the greatest practical and economic
importance.1

Visually, this bias can be seen in the model's decision boundary. An algorithm trained on
imbalanced data will often produce a boundary that is shifted heavily towards the minority
class, or in extreme cases, it may classify the entire feature space as the majority class,
rendering the model completely useless for its intended purpose.9 This reveals a fundamental
conflict: the algorithm is succeeding in its own terms (minimizing error) while completely
failing in the terms of the application's objective (detecting the rare event). The problem is not
that the algorithm is broken; it is that its objective function is profoundly misaligned with the
practical goal in an imbalanced context. This misalignment dictates that any viable solution
must either transform the data to fit the algorithm's objective (data-level methods) or
transform the algorithm's objective to fit the problem's goal (algorithm-level methods).

The Accuracy Paradox: A Deceptive Metric

The failure of standard algorithms on imbalanced data is directly linked to the failure of
standard evaluation metrics. The most common metric, classification accuracy, becomes
dangerously misleading in this context. This phenomenon is known as the accuracy paradox:
a model can achieve an exceptionally high accuracy score while being completely
non-informative and practically useless.18

Consider a credit card fraud detection dataset with a 99:1 class imbalance, where 99% of
transactions are legitimate (majority class) and 1% are fraudulent (minority class). A naive
classifier that simply predicts "legitimate" for every single transaction will achieve a
classification accuracy of 99%.20 A practitioner might see this score and conclude the model
is performing exceptionally well. However, this model has zero predictive power for the actual
target of interest; it has a 0% success rate in identifying fraudulent transactions.2
This paradox underscores a critical principle: accuracy is a measure of overall correctness,
and in an imbalanced dataset, the overall correctness is dominated by the model's
performance on the majority class. The metric is blind to the distribution of errors between
classes. Consequently, relying on accuracy for model evaluation in imbalanced scenarios is a
common but severe mistake. It fundamentally invalidates accuracy as a primary performance
indicator and mandates the adoption of a more sophisticated suite of evaluation metrics that
can provide a class-aware and context-sensitive assessment of a model's true predictive
capabilities.10

Data-Level Strategies: Reshaping the Feature Space

Data-level strategies are among the most common and intuitive approaches to tackling class
imbalance. The core philosophy of these methods is to modify the training dataset itself to
create a more balanced class distribution before feeding it to a machine learning algorithm.
By rebalancing the data, these techniques aim to mitigate the inherent bias of standard
classifiers, allowing them to learn the characteristics of the minority class more effectively.
This section provides a detailed examination of the three primary categories of data-level
methods: oversampling, undersampling, and hybrid approaches. The progression of these
techniques reveals a significant evolution in thought, moving from simple, quantity-based
rebalancing to more sophisticated, quality-based methods that intelligently sculpt the feature
space to improve class separability.

Oversampling Techniques: Amplifying the Minority Signal

Oversampling methods address class imbalance by increasing the number of instances in the
minority class. The goal is to provide the learning algorithm with a stronger signal from the
underrepresented class, thereby reducing its bias towards the majority class.23

Random Oversampling (ROS)

Mechanism: Random Oversampling is the most straightforward oversampling technique. It
works by randomly selecting instances from the minority class and duplicating them, with
replacement, until the number of minority instances matches a desired ratio, often 1:1 with the
majority class.1

Advantages: The primary advantages of ROS are its simplicity and ease of implementation.
Furthermore, because it does not discard any data, it ensures that no information from the
original dataset is lost during the resampling process.23

Disadvantages: The principal drawback of ROS is a significant risk of overfitting.1 By
creating exact copies of existing minority instances, the model may learn to recognize these
specific replicated patterns rather than a generalizable decision boundary. This can lead to
excellent performance on the training data but poor performance on new, unseen data. The
duplication of data also increases the size of the training set, which can lead to a higher
computational load and longer training times.1
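
As an illustration, the following minimal sketch (assuming the scikit-learn and imbalanced-learn packages are available, and using a synthetic 99:1 dataset purely for demonstration) shows how random oversampling changes the class counts:

```python
# Minimal sketch of Random Oversampling with imbalanced-learn (assumed installed).
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

# Synthetic binary dataset with roughly a 99:1 class ratio (illustrative only).
X, y = make_classification(n_samples=10_000, n_features=10,
                           weights=[0.99, 0.01], random_state=42)
print("Original distribution:", Counter(y))

# Duplicate minority instances (with replacement) until a 1:1 ratio is reached.
ros = RandomOverSampler(sampling_strategy=1.0, random_state=42)
X_res, y_res = ros.fit_resample(X, y)
print("Resampled distribution:", Counter(y_res))
```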

The SMOTE Family (Synthetic Minority Over-sampling Technique)

The Synthetic Minority Over-sampling Technique (SMOTE) and its variants were developed to
directly address the overfitting problem associated with ROS. Instead of duplicating existing
instances, SMOTE generates new, synthetic instances of the minority class, creating a more
diverse and robust representation.23

Core SMOTE Algorithm: The SMOTE algorithm operates in the feature space and can be broken down into the following steps 24 (a minimal code sketch follows the list):
1.​ For each instance in the minority class, identify its k nearest neighbors that also belong
to the minority class. A typical value for k is 5.
2.​ Randomly select one of these k neighbors.
3.​ Calculate the difference vector between the original instance and the selected neighbor.
4.​ Multiply this difference by a random number between 0 and 1.
5.​ Add this new vector to the feature vector of the original instance. This creates a new,
synthetic data point that lies on the line segment connecting the original instance and its
chosen neighbor.
6.​ This process is repeated until the desired number of synthetic minority instances have
been generated.
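
The following is a minimal NumPy sketch of the interpolation step described above; it is illustrative only, and in practice a library implementation such as imbalanced-learn's SMOTE class would normally be used. The helper name smote_sample and the toy minority matrix are assumptions made for the example.

```python
# Minimal NumPy sketch of the SMOTE interpolation step (illustrative only).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, k=5, n_new=100, rng=None):
    """Generate n_new synthetic points from the minority-class matrix X_min."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Step 1: k nearest minority neighbors of each minority instance
    # (k + 1 because each point is its own nearest neighbor).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbors = nn.kneighbors(X_min, return_distance=False)[:, 1:]

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))              # pick a minority instance
        j = rng.choice(neighbors[i])              # step 2: pick one of its k neighbors
        diff = X_min[j] - X_min[i]                # step 3: difference vector
        gap = rng.random()                        # step 4: random factor in [0, 1)
        synthetic.append(X_min[i] + gap * diff)   # step 5: interpolated synthetic point
    return np.vstack(synthetic)

# Tiny demonstration on a toy 2-D minority cluster.
X_min = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [0.3, 0.3], [0.15, 0.05], [0.25, 0.2]])
print(smote_sample(X_min, k=3, n_new=4))
```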

Advantages: By generating novel, yet plausible, minority instances, SMOTE provides a richer
and more diverse training set for the classifier. This significantly reduces the risk of overfitting
compared to ROS and has been shown to be a highly effective baseline oversampling strategy
in numerous applications.23

Disadvantages and Critiques: Despite its advantages, the original SMOTE algorithm has
several known limitations:
●​ Generation of Noise: SMOTE does not consider the proximity of majority class instances
when generating synthetic samples. This can lead to the creation of new minority
instances in regions that are heavily populated by the majority class, effectively creating
noise and increasing class overlap, which can make the decision boundary even more
difficult to learn.18
●​ Insensitivity to Data Distribution: The algorithm generates the same number of
synthetic samples for each original minority instance, regardless of its local
neighborhood. It does not differentiate between instances that are "safe" (deep within
the minority class region) and those that are on the noisy border with the majority
class.26
●​ Limitations with Data Types and Dimensionality: Standard SMOTE relies on Euclidean
distance to find nearest neighbors, making it unsuitable for datasets containing
categorical or discrete variables without modification. Variants like SMOTE-NC (for mixed
data) and SMOTE-N (for categorical data) were developed to address this.17 The
algorithm's effectiveness can also degrade in high-dimensional spaces due to the "curse
of dimensionality".18
●​ Potential for Bias: Some researchers argue that SMOTE can be misapplied or that its
core assumption is flawed. By interpolating between existing points, it may generate
"falsified instances" that do not accurately represent the true, unknown distribution of
the minority class, potentially introducing a new form of bias.11

Advanced SMOTE Variants: To overcome the limitations of the original algorithm, a family of
more sophisticated SMOTE variants has been developed. These methods move beyond simple
interpolation and incorporate more information about the local data distribution to generate
higher-quality synthetic samples (a usage sketch follows the list).
●​ Borderline-SMOTE: This variant focuses its synthetic sample generation efforts on the
most critical region: the decision boundary. It first identifies minority instances that are
"on the border" (i.e., where the majority of their nearest neighbors belong to the majority
class). It then applies the SMOTE algorithm only to these borderline instances, reinforcing
the areas where the model is most likely to be confused.17
●​ ADASYN (Adaptive Synthetic Sampling): ADASYN takes a similar but distinct approach
by adaptively generating more synthetic data for minority instances that are "harder to
learn." The "hardness" of an instance is determined by the proportion of majority class
instances in its local neighborhood. By generating more samples in these difficult
regions, ADASYN forces the classifier to pay greater attention to the most challenging
parts of the feature space.17
●​ SVM-SMOTE: This method leverages a Support Vector Machine (SVM) algorithm to
approximate the decision boundary. It then generates synthetic samples along the
direction of the support vectors, which are the instances that define the boundary. This
approach aims to create a cleaner and more robust separation between the classes by
strengthening the margin.34
●​ KMeans-SMOTE: This technique first applies the K-Means clustering algorithm to the
entire dataset. It then applies SMOTE within each cluster, generating synthetic samples
based on the local cluster density. This can be particularly effective if the minority class is
composed of several distinct sub-concepts or small, dense clusters, as it helps to
generate more diverse and representative samples for each sub-group.34
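
As a rough usage sketch (assuming imbalanced-learn is installed, with a synthetic dataset and default parameters chosen purely for illustration), these variants expose the same fit_resample interface and can be swapped and compared directly:

```python
# Sketch: the SMOTE variants discussed above share the fit_resample interface
# in imbalanced-learn, so they can be substituted for one another and compared.
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN, SVMSMOTE

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)

samplers = {
    "SMOTE": SMOTE(random_state=0),
    "Borderline-SMOTE": BorderlineSMOTE(random_state=0),
    "ADASYN": ADASYN(random_state=0),
    "SVM-SMOTE": SVMSMOTE(random_state=0),
}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X, y)
    print(f"{name}: resampled to {len(y_res)} instances")
```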

Undersampling Techniques: Reducing the Majority Noise

Undersampling techniques offer an alternative approach to rebalancing by reducing the
number of instances in the majority class. These methods are particularly appealing for very
large datasets where training time and computational resources are a concern.40

Random Undersampling (RUS)

Mechanism: As the name suggests, Random Undersampling involves randomly selecting and
discarding instances from the majority class until a desired class balance is achieved.9

Advantages: The primary benefit of RUS is its ability to significantly reduce the size of the
training dataset. This leads to faster model training and can alleviate storage and memory
constraints.18

Disadvantages: The main and most severe risk of RUS is information loss.3 By randomly
removing majority class instances, there is a high probability of discarding data points that are
crucial for defining the decision boundary or representing important variations within the
majority class. This can lead to a model that is biased in a new way and generalizes poorly to
unseen data.

Informed and Cleaning Undersampling

To mitigate the risk of information loss associated with RUS, more intelligent undersampling
methods have been developed. These techniques do not aim for a specific balance ratio but
instead focus on removing specific types of majority instances that are considered noisy or
unhelpful, thereby "cleaning" the feature space.

Tomek Links:
●​ Mechanism: A Tomek Link is defined as a pair of instances, one from the minority class
and one from the majority class, that are each other's nearest neighbors in the feature
space.40 The presence of such a pair often indicates either noise or an ambiguous region
along the class boundary. The Tomek Links undersampling algorithm identifies these
pairs and removes the majority class instance from each link.40
●​ Goal: The objective is not to balance the dataset but to "clean" the space between the
classes, creating a clearer and more well-defined decision boundary for the classifier to
learn.40
●​ Limitations: While effective for data cleaning, Tomek Links can be computationally
intensive on large datasets due to the pairwise distance calculations. Furthermore, it
often removes a relatively small number of instances, making it insufficient on its own to
address severe class imbalance.42

Edited Nearest Neighbors (ENN):


●​ Mechanism: The ENN rule is another data cleaning technique. It works by iterating
through the instances of the majority class and removing any instance whose class label
does not agree with the class of the majority of its k nearest neighbors. This process
effectively removes majority instances that are deep within minority regions or are
otherwise considered noisy or mislabeled.17 A brief sketch of both cleaning methods follows.
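
A minimal sketch of both cleaning undersamplers, assuming imbalanced-learn and a synthetic, slightly noisy dataset used purely for illustration:

```python
# Sketch of the cleaning undersamplers described above (imbalanced-learn assumed).
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import TomekLinks, EditedNearestNeighbours

X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1],
                           flip_y=0.05, random_state=0)  # add some label noise
print("Original:", Counter(y))

# Remove the majority member of every Tomek link (cleans the boundary only).
X_tl, y_tl = TomekLinks().fit_resample(X, y)
print("After Tomek Links:", Counter(y_tl))

# Remove majority instances that disagree with most of their k nearest neighbors.
X_enn, y_enn = EditedNearestNeighbours(n_neighbors=3).fit_resample(X, y)
print("After ENN:", Counter(y_enn))
```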

Hybrid Sampling Methodologies: The Best of Both Worlds

Recognizing the complementary strengths and weaknesses of oversampling and
undersampling, hybrid methodologies were developed to combine both approaches into a
single, more powerful pipeline.1 The rationale is to first address the lack of minority data
through oversampling and then clean up the potentially noisy result using a targeted
undersampling method.

The most common and effective strategy involves a two-step process:


1.​ Oversample: Apply an oversampling technique, typically SMOTE, to generate synthetic
instances for the minority class. This addresses the core problem of insufficient data for
the minority class.
2.​ Clean: Apply a cleaning undersampling technique, such as Tomek Links or ENN, to the
now-resampled dataset. This step removes instances that are considered noisy, including
original majority instances on the border and any "unsafe" synthetic minority instances
that may have been generated in overlapping regions.9

Key Examples:
●​ SMOTE-Tomek: This popular hybrid method first uses SMOTE to increase the
representation of the minority class and then uses Tomek Links to remove the resulting
borderline pairs. The outcome is a dataset that is not only more balanced but also has a
cleaner separation between the classes.9
●​ SMOTE-ENN: This method follows the same principle but uses the more aggressive
Edited Nearest Neighbors algorithm for the cleaning phase. This can result in the removal
of more noisy samples compared to Tomek Links.48

Advantages: Hybrid methods often yield superior performance compared to using either
oversampling or undersampling in isolation. They simultaneously tackle the problem of data
scarcity for the minority class and the problem of class overlap and noise at the decision
boundary.48 Empirical studies have shown that hybrid approaches can be particularly effective
for datasets with extreme levels of imbalance.50
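
A minimal sketch of both hybrid samplers, assuming imbalanced-learn's combined implementations and an illustrative synthetic dataset:

```python
# Sketch of the two-step hybrid strategies using imbalanced-learn's combined samplers.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek, SMOTEENN

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)

# SMOTE to oversample the minority class, then Tomek Links to clean the boundary.
X_st, y_st = SMOTETomek(random_state=0).fit_resample(X, y)
print("SMOTE-Tomek:", Counter(y_st))

# Same idea, but with the more aggressive ENN cleaning step.
X_se, y_se = SMOTEENN(random_state=0).fit_resample(X, y)
print("SMOTE-ENN:", Counter(y_se))
```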

The evolution of these data-level techniques illustrates a significant shift in the field's
understanding of the imbalance problem. The initial, naive solutions like ROS and RUS focused
purely on adjusting class counts—a quantity-based approach. The limitations of this
approach, namely overfitting and information loss, led to the development of SMOTE, which
improved the quality of the minority class representation by generating new data. However,
the recognition that SMOTE could introduce its own problems, such as noise and class
overlap, spurred the creation of more advanced SMOTE variants and hybrid methods. This
progression demonstrates a maturing perspective: the ultimate goal is not merely to achieve a
50:50 class ratio, but to intelligently sculpt the geometry of the feature space to create a
clean, well-defined, and easily learnable decision boundary. The choice of a data-level
technique, therefore, becomes a diagnostic decision. If the primary issue is a simple lack of
minority signal, SMOTE may be sufficient. If the data exhibits significant class overlap and a
noisy boundary, a more sophisticated approach like Borderline-SMOTE or a hybrid method
like SMOTE-Tomek is likely to be more effective.

The following summary compares the main data-level techniques by core mechanism, primary advantages, critical disadvantages, and ideal use case.

Random Oversampling (ROS)
●​ Core Mechanism: Duplicates random minority class instances.
●​ Primary Advantages: Simple to implement; no information loss.
●​ Critical Disadvantages: High risk of overfitting; increases training time.
●​ Ideal Use Case: Quick baseline when data is limited and overfitting can be controlled.

SMOTE
●​ Core Mechanism: Generates synthetic minority instances by interpolating between neighbors.
●​ Primary Advantages: Reduces overfitting compared to ROS; creates a more diverse minority set.
●​ Critical Disadvantages: Can generate noise and increase class overlap; not ideal for categorical or high-dimensional data.
●​ Ideal Use Case: Standard choice for oversampling when the minority class has a coherent structure.

ADASYN
●​ Core Mechanism: Generates more synthetic data for "harder-to-learn" minority instances.
●​ Primary Advantages: Focuses learning on difficult regions of the feature space.
●​ Critical Disadvantages: Can be sensitive to noise; may over-generate samples in noisy areas.
●​ Ideal Use Case: Datasets where the decision boundary is complex and minority instances are sparsely distributed.

Borderline-SMOTE
●​ Core Mechanism: Generates synthetic data only for minority instances near the class boundary.
●​ Primary Advantages: Strengthens the decision boundary where misclassification is most likely.
●​ Critical Disadvantages: Performance depends heavily on the quality of the boundary instances.
●​ Ideal Use Case: Problems where clear class separation at the boundary is the primary challenge.

Random Undersampling (RUS)
●​ Core Mechanism: Randomly removes majority class instances.
●​ Primary Advantages: Reduces dataset size, leading to faster training.
●​ Critical Disadvantages: High risk of information loss, potentially removing crucial majority class patterns.
●​ Ideal Use Case: Very large datasets where the majority class is highly redundant and training time is a major constraint.

Tomek Links
●​ Core Mechanism: Removes majority class instances that form nearest-neighbor pairs with minority instances.
●​ Primary Advantages: Cleans the class boundary by removing noisy and ambiguous points.
●​ Critical Disadvantages: Computationally expensive; may not remove enough samples to significantly address imbalance.
●​ Ideal Use Case: Data cleaning step, often used in hybrid methods, for datasets with noisy boundaries.

SMOTE + Tomek Links
●​ Core Mechanism: Applies SMOTE to oversample the minority class, then uses Tomek Links to clean the boundary.
●​ Primary Advantages: Balances the dataset while creating well-defined, clean class clusters.
●​ Critical Disadvantages: More computationally complex than single methods.
●​ Ideal Use Case: A robust, general-purpose approach for noisy and imbalanced datasets.

Algorithm-Level Strategies: Modifying the Learning Process

In contrast to data-level methods that alter the dataset, algorithm-level strategies modify the
learning algorithms themselves to make them more robust to class imbalance. These
techniques accept the original, skewed data distribution and adapt the model's training
process to pay more attention to the minority class. This approach represents a different
philosophical solution to the imbalance problem: instead of changing the data to suit the
model, it changes the model to suit the data. This section explores the three main categories
of algorithm-level strategies: cost-sensitive learning, specialized ensemble methods, and
one-class classification.

Cost-Sensitive Learning: Penalizing Errors Asymmetrically

Cost-sensitive learning directly confronts the fundamental issue of objective function
misalignment. Standard classifiers treat all misclassification errors as equal. However, in most
imbalanced classification scenarios, the consequences of different errors are vastly different.
For example, failing to detect a fraudulent transaction (a False Negative) is typically far more
costly than flagging a legitimate transaction for review (a False Positive). Cost-sensitive
learning formalizes this by assigning a higher penalty, or cost, to the misclassification of
minority class instances.1 The algorithm's optimization objective is then shifted from
minimizing the total number of errors to minimizing the total misclassification cost.13

Cost Matrices: The asymmetric penalties are formally defined in a cost matrix. For a binary
classification problem, this matrix specifies the cost for each of the four outcomes in a
confusion matrix. Crucially, the cost associated with a False Negative, $C(FN)$, is set to be
significantly higher than the cost of a False Positive, $C(FP)$.13 The total cost to be minimized
is then a weighted sum of errors:

$$Total Cost = C(FN) \times \text{Number of FNs} + C(FP) \times \text{Number of FPs}$$
Practical Implementation via Class Weights: In practice, most modern machine learning
libraries do not require the user to define an explicit cost matrix. Instead, they implement
cost-sensitive learning through a class_weight parameter. By assigning a higher weight to the
minority class, the user instructs the algorithm to amplify the loss contribution of any errors
made on instances of that class during training.1

A common and effective heuristic for setting these weights is to make them inversely
proportional to the class frequencies in the training data.54 For a dataset with a 99:1 ratio of
majority to minority instances, the weight for the minority class could be set to 99 and the
weight for the majority class to 1. Many libraries, such as scikit-learn, provide a 'balanced'
option for the class_weight parameter, which automatically calculates and applies these
inverse-frequency weights.56
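
A minimal sketch of cost-sensitive learning via class weights in scikit-learn; the dataset, the logistic regression model, and the explicit 99:1 weights are illustrative assumptions:

```python
# Sketch of cost-sensitive learning via class weights in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 'balanced' sets each class weight to n_samples / (n_classes * class_count),
# i.e. inversely proportional to class frequency.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))

# Explicit weights, e.g. roughly the 99:1 heuristic discussed above, also work.
clf_manual = LogisticRegression(class_weight={0: 1, 1: 99}, max_iter=1000)
clf_manual.fit(X_tr, y_tr)
```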

Algorithm-Specific Modifications: The implementation of class weights varies depending on the algorithm:
●​ Cost-Sensitive Logistic Regression: The class_weight parameter modifies the negative
log-likelihood loss function. The loss calculated for each sample is multiplied by its
corresponding class weight, meaning that errors on minority class samples contribute
much more to the total loss and, consequently, to the gradient updates of the model's
coefficients.56
●​ Cost-Sensitive Decision Trees and Random Forests: Class weights influence the node
splitting criterion (e.g., Gini impurity or information gain). When evaluating potential splits,
the algorithm gives more importance to splits that correctly classify instances from the
higher-weighted minority class. This biases the tree-building process towards creating
purer nodes for the minority class.54
●​ Cost-Sensitive Support Vector Machines (SVMs): The regularization parameter, $C$,
which controls the penalty for margin violations, is adjusted on a per-class basis. A
higher $C$ value is assigned to the minority class, imposing a stricter penalty for
misclassifying its instances and forcing the algorithm to find a hyperplane that better
separates them.34
●​ Focal Loss (for Deep Learning): Focal Loss is a more advanced cost-sensitive
technique designed for dense object detectors but widely applicable in deep learning. It
dynamically scales the standard cross-entropy loss, down-weighting the loss assigned to
well-classified examples (particularly the numerous "easy" examples from the majority
class). This allows the model to focus its training efforts on "hard," misclassified
examples, which are often instances of the minority class.18 A minimal sketch of this loss follows the list.
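
The following is a minimal NumPy sketch of the binary focal loss described above, written framework-agnostically; the alpha and gamma values and the toy inputs are illustrative assumptions:

```python
# Minimal NumPy sketch of binary focal loss:
# FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). Illustrative only; not tied
# to any particular deep-learning framework.
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss for labels in {0, 1} and predicted probabilities."""
    p_pred = np.clip(p_pred, eps, 1 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = np.where(y_true == 1, p_pred, 1 - p_pred)
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma down-weights easy, well-classified examples.
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

y_true = np.array([1, 0, 0, 0, 1])
p_pred = np.array([0.2, 0.1, 0.05, 0.3, 0.9])
print(focal_loss(y_true, p_pred))
```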

Ensemble Learning Approaches: The Power of Combination

Ensemble methods, which build a strong predictive model by combining the outputs of
multiple weaker models, are naturally well-suited for imbalanced classification due to their
inherent robustness and variance reduction properties.1 Their performance can be further
enhanced through specific adaptations for imbalanced data.

Adapting Standard Ensembles:


●​ Bagging (e.g., Random Forest): Bagging (Bootstrap Aggregating) involves training
multiple models (e.g., decision trees) on different bootstrap samples of the training data.
While a standard Random Forest can perform reasonably well on imbalanced data, its
effectiveness can be significantly improved by either using it in conjunction with a
resampled dataset or by activating its built-in class_weight functionality, which applies
cost-sensitive learning to each individual tree in the forest.31
●​ Boosting (e.g., AdaBoost, Gradient Boosting, XGBoost): Boosting algorithms build
models sequentially, where each subsequent model is trained to correct the errors of its
predecessors. This mechanism naturally lends itself to imbalanced problems, as the
hard-to-classify minority instances will be consistently misclassified by early models and
will therefore receive greater focus in later stages of training. Modern boosting libraries
like XGBoost include specific hyperparameters, such as scale_pos_weight, to explicitly
assign a higher weight to the positive (minority) class, combining the power of boosting
with cost-sensitive learning.2 A short example follows this list.
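
A short sketch of the scale_pos_weight approach, assuming the xgboost package is installed; the dataset and hyperparameters are illustrative:

```python
# Sketch of boosting with an explicit positive-class weight (xgboost assumed installed).
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10_000, weights=[0.98, 0.02], random_state=0)

# Common heuristic: scale_pos_weight = (number of negatives) / (number of positives).
neg, pos = np.bincount(y)
clf = XGBClassifier(scale_pos_weight=neg / pos, n_estimators=200, random_state=0)
clf.fit(X, y)
```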

Specialized Ensemble Methods for Imbalance:


Beyond adapting standard ensembles, several methods have been designed specifically to
handle class imbalance by integrating resampling directly into the ensemble construction
process.
●​ EasyEnsemble: This method addresses the information loss problem of Random
Undersampling (RUS). Instead of creating a single undersampled dataset, EasyEnsemble
independently generates multiple balanced training subsets. Each subset contains all
instances from the minority class, combined with a different, randomly sampled subset of
the majority class. A separate classifier (often an AdaBoost model) is trained on each of
these subsets. The final prediction is an aggregation of the outputs from all the trained
classifiers.61 By using all majority instances across the entire ensemble, EasyEnsemble
avoids the severe information loss of a single RUS pass while still benefiting from the
computational efficiency of training on smaller, balanced datasets.63 A usage sketch of this method follows the list.
●​ BalanceCascade: This method builds upon the idea of EasyEnsemble but introduces a
more guided, iterative approach. It trains a sequence of classifiers on undersampled
subsets, similar to EasyEnsemble. However, after each classifier in the cascade is trained,
the majority class instances that were correctly classified by it are removed from the pool
of available majority instances for the next iteration. This forces subsequent classifiers to
focus on progressively more difficult majority class examples that lie closer to the
decision boundary, leading to a more refined and accurate final ensemble.61
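
A usage sketch of EasyEnsemble via imbalanced-learn's EasyEnsembleClassifier (assumed available); the dataset, the number of estimators, and the scoring choice are illustrative:

```python
# Sketch of EasyEnsemble: imbalanced-learn trains AdaBoost learners on multiple
# balanced, randomly undersampled subsets and aggregates their predictions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from imblearn.ensemble import EasyEnsembleClassifier

X, y = make_classification(n_samples=10_000, weights=[0.97, 0.03], random_state=0)

eec = EasyEnsembleClassifier(n_estimators=10, random_state=0)
# Balanced accuracy averages recall over both classes, so it is not inflated
# by the majority class the way plain accuracy is.
scores = cross_val_score(eec, X, y, cv=5, scoring="balanced_accuracy")
print(scores.mean())
```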

One-Class Classification: Reframing as Anomaly Detection

In cases of extreme class imbalance, or when the minority class instances are highly
heterogeneous and do not form a coherent group, it can be effective to reframe the problem
entirely. Instead of trying to distinguish between two classes, the task becomes one of
anomaly detection.1 This approach involves training a model exclusively on the majority class
data to learn a representation of "normal." Any new instance that deviates significantly from
this learned norm is then classified as an "anomaly" or "outlier," which corresponds to the
minority class.68

Key Algorithms:
●​ One-Class SVM: This is a variant of the Support Vector Machine algorithm that is trained
in an unsupervised manner on data from a single class (the majority or "normal" class). It
learns a decision boundary, typically a hypersphere or hyperplane, that encloses the
majority of the training data in the feature space. Any new data point that falls outside
this learned boundary is flagged as an outlier.1 A crucial hyperparameter is nu, which
serves as an upper bound on the fraction of training examples that can be considered
errors (i.e., fall outside the boundary) and a lower bound on the fraction of instances that
will be used as support vectors.68
●​ Isolation Forest: This is an ensemble-based algorithm built on the principle that
anomalies are "few and different" and are therefore easier to isolate than normal data
points. The algorithm constructs a forest of "isolation trees," where each tree recursively
and randomly partitions the data until individual instances are isolated. Anomalous
points, being different, are likely to be isolated in fewer partitions, resulting in a shorter
average path length from the root of the trees. This path length is used to calculate an
anomaly score for each instance.1 A brief sketch of both detectors follows.
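
A brief sketch of both detectors with scikit-learn; the synthetic "normal" data, the nu and contamination values, and the test points are illustrative assumptions:

```python
# Sketch of the anomaly-detection reframing: fit on majority ("normal") data only,
# then flag deviations as the minority class.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(1_000, 2))   # majority-class training data
X_new = np.array([[0.1, -0.2], [4.0, 4.0]])                  # one typical point, one outlier

# One-Class SVM: nu bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(X_normal)
print(ocsvm.predict(X_new))   # +1 = inlier, -1 = outlier

# Isolation Forest: shorter average isolation paths imply anomalies.
iso = IsolationForest(contamination=0.05, random_state=0).fit(X_normal)
print(iso.predict(X_new))     # +1 = inlier, -1 = outlier
```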

The choice between a data-level and an algorithm-level strategy reflects a fundamental
trade-off between data purity and model complexity. Data-level methods operate on the
premise of "fixing" the data so that a standard, often simpler, model can be applied
effectively. The complexity in this approach lies in the data preprocessing pipeline. In contrast,
algorithm-level methods accept the data in its original, imbalanced state and instead build a
more complex or specialized model designed to handle its inherent challenges. The
complexity here is embedded within the learning algorithm itself. This presents a practical
decision point for a practitioner: is it more feasible to engineer the data or to engineer the
model? The answer may depend on constraints such as the availability of specific libraries,
computational resources, or requirements for model interpretability. If a legacy model cannot
be modified, data-level methods are the only option. Conversely, if the data itself is immutable
for regulatory or other reasons, algorithm-level methods become necessary.

Furthermore, the existence of specialized ensemble methods like EasyEnsemble and
BalanceCascade offers a profound conceptual lesson. They demonstrate that creating a
single, "perfectly" balanced dataset is often a suboptimal strategy. By training on multiple,
different balanced subsets, these methods implicitly learn a more robust and comprehensive
representation of the majority class, effectively averaging out the potential biases introduced
by any single random undersampling pass. This suggests that the true distribution of the data
is best approximated not by one ideal sample, but by an aggregation of diverse perspectives
from many imperfect samples—a powerful concept that mirrors the core philosophy of
ensemble learning itself.

A Robust Framework for Classifier Evaluation

The successful handling of an imbalanced dataset is only half the battle; the other half is
accurately and honestly measuring the performance of the resulting classifier. As established
previously, traditional accuracy is a deeply flawed metric in this context. A robust evaluation
framework requires moving beyond accuracy and adopting a suite of metrics that provide a
nuanced, multi-faceted, and class-aware view of a model's performance. This section
deconstructs the essential evaluation tools, starting with the foundational confusion matrix
and extending to a range of threshold-based and rank-based metrics specifically suited for
imbalanced classification.

The Confusion Matrix: The Foundational Tool for Evaluation

The confusion matrix is the cornerstone of classification model evaluation. It is a simple table
that provides a comprehensive summary of a model's predictive performance by
cross-tabulating the actual class labels against the predicted class labels.20 For a binary
classification problem, it is a 2x2 matrix that contains four essential counts 73:
●​ True Positives (TP): The number of instances from the positive (minority) class that
were correctly predicted as positive.
●​ True Negatives (TN): The number of instances from the negative (majority) class that
were correctly predicted as negative.
●​ False Positives (FP) (Type I Error): The number of instances from the negative
(majority) class that were incorrectly predicted as positive. These are often referred to as
"false alarms."
●​ False Negatives (FN) (Type II Error): The number of instances from the positive
(minority) class that were incorrectly predicted as negative. These are the critical
"missed detections."

The confusion matrix is indispensable because it moves beyond a single, aggregated score
and reveals the specific types of errors a model is making. It is the fundamental source from
which all other meaningful threshold-based evaluation metrics are derived.11

Threshold-Based Metrics: Beyond Accuracy

Threshold-based metrics are calculated from the counts in the confusion matrix and provide
single-value scores that summarize different aspects of a model's performance at a given
classification threshold (typically 0.5 for probabilistic models).

Core Class-Specific Metrics

These metrics focus on the performance of the model with respect to a single class, which is
crucial for understanding how well the model is handling the minority class.
●​ Precision (Positive Predictive Value - PPV): Precision measures the reliability of the
positive predictions. It answers the question: "Of all the instances that the model
predicted as positive, what proportion were actually positive?".16 High precision is critical
in scenarios where the cost of a False Positive is high. For example, in a targeted
marketing campaign, high precision ensures that resources are not wasted on customers
who are not actually interested.
○​ Formula: $Precision = \frac{TP}{TP + FP}$.1
●​ Recall (Sensitivity or True Positive Rate - TPR): Recall measures the model's ability to
find all the positive instances in the dataset. It answers the question: "Of all the actual
positive instances, what proportion did the model successfully identify?".16 High recall is
paramount in domains where the cost of a False Negative is severe. For instance, in
medical diagnosis or fraud detection, failing to identify a disease or a fraudulent
transaction can have catastrophic consequences.
○​ Formula: $Recall = \frac{TP}{TP + FN}$.73
●​ Specificity (True Negative Rate - TNR): Specificity is the equivalent of recall for the
negative class. It measures the model's ability to correctly identify all the negative
instances. It answers the question: "Of all the actual negative instances, what proportion
did the model successfully identify?".78
○​ Formula: $Specificity = \frac{TN}{TN + FP}$.75

Balanced Performance Metrics

In practice, there is often a trade-off between precision and recall. A model can achieve
perfect recall by classifying every instance as positive, but this would result in very low
precision. Conversely, a model can achieve high precision by being very conservative with its
positive predictions, but this would likely lower its recall. Balanced metrics aim to combine
these individual measures into a single score that provides a more holistic assessment of
performance.
●​ F1-Score: The F1-score is the harmonic mean of Precision and Recall. It provides a
balanced measure that is high only when both precision and recall are high. Because it is
a harmonic mean, it penalizes extreme values more than a simple arithmetic mean would.
It is one of the most widely used metrics for imbalanced classification because it is
sensitive to the model's performance on the minority class and is not inflated by a large
number of true negatives.17
○​ Formula: $F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$.78
○​ The more general F-beta Score allows for explicit weighting of recall over precision.
A $\beta$ value greater than 1 gives more weight to recall, while a $\beta$ value less
than 1 gives more weight to precision.82
●​ Geometric Mean (G-Mean): The G-Mean is the geometric mean of Sensitivity (Recall)
and Specificity. It measures the balance between the classification performance on both
the minority and majority classes. A low G-Mean score indicates poor performance in
identifying the minority class, even if the model achieves high specificity by correctly
classifying the majority class.78 It is a good indicator of a model's ability to perform well
on both classes simultaneously.
○​ Formula: $G-Mean = \sqrt{Sensitivity \times Specificity}$.80
●​ Matthews Correlation Coefficient (MCC): The MCC is considered by many researchers
to be one of the most informative and robust single-score metrics for binary
classification, especially in imbalanced scenarios.86 It is a correlation coefficient between
the observed and predicted classifications and takes into account all four values in the
confusion matrix (TP, TN, FP, and FN). A key advantage of MCC is that it produces a high
score only if the classifier obtains good results in all four categories. Its value ranges from
-1 (perfect misclassification) to +1 (perfect classification), with 0 indicating a
performance no better than random guessing. Unlike the F1-score, it is inherently
symmetric, meaning its value does not change if the positive and negative classes are
swapped.17 A computation example for these metrics follows the formula.
○​ Formula: $MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
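
The threshold-based metrics above can be computed directly from a set of predictions; the following sketch assumes scikit-learn (and imbalanced-learn for the G-Mean) and uses a tiny hand-made example:

```python
# Sketch: computing the threshold-based metrics above from a set of predictions.
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, matthews_corrcoef)
from imblearn.metrics import geometric_mean_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced ground truth (toy example)
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]   # model predictions at a 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("G-Mean:   ", geometric_mean_score(y_true, y_pred))
print("MCC:      ", matthews_corrcoef(y_true, y_pred))
```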

Rank-Based Metrics: Evaluating Probabilistic Performance

While threshold-based metrics evaluate a model's performance at a single decision point,
rank-based metrics evaluate the quality of a model's probabilistic outputs across all possible
thresholds. They assess the model's overall ability to separate the two classes.

Receiver Operating Characteristic (ROC) Curve and AUC-ROC

●​ Construction: The ROC curve is a graphical plot that illustrates the diagnostic ability of a
binary classifier as its discrimination threshold is varied. It plots the True Positive Rate
(TPR), or Recall, on the y-axis against the False Positive Rate (FPR) on the x-axis,
where $FPR = \frac{FP}{FP + TN}$.16
●​ Interpretation: The area under this curve, known as AUC-ROC, provides a
single-number summary of the model's performance. An AUC of 1.0 represents a perfect
classifier that can perfectly distinguish between the classes, while an AUC of 0.5
represents a model with no discriminative ability, equivalent to random guessing.77
●​ Limitation in Imbalanced Contexts: Despite its popularity, the AUC-ROC can be overly
optimistic and misleading on datasets with a severe class imbalance.17 The reason is
that the FPR calculation in the denominator includes the number of True Negatives
($TN$). In a highly imbalanced dataset, $TN$ is a very large number. This means the
model can generate a substantial number of false positives without making a significant
impact on the overall FPR, leading to a deceptively high AUC score that masks poor
performance on the minority class.
Precision-Recall (PR) Curve and AUC-PR

●​ Construction: The Precision-Recall (PR) curve plots Precision on the y-axis against
Recall on the x-axis at various threshold settings.16
●​ Interpretation: A skillful model is one that can maintain a high precision even as recall
increases. The baseline for a PR curve is not a diagonal line but a horizontal line
corresponding to the fraction of positive examples in the dataset (i.e., the prevalence of
the minority class).77 The area under this curve, AUC-PR (also known as Average
Precision), summarizes the plot.
●​ Superiority in Imbalanced Contexts: The PR curve is widely regarded as a more
informative and appropriate evaluation tool than the ROC curve for imbalanced
classification tasks.17 The key reason is that the calculation of Precision ($TP / (TP + FP)$)
does not involve the number of True Negatives. Therefore, the PR curve is not influenced
by the large number of correctly classified majority class instances. It focuses directly on
the performance of the model on the positive (minority) class, evaluating the trade-off
between correctly identifying positive instances and the rate of false alarms, which is
often the central business problem (see the sketch below).
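
A short sketch contrasting AUC-ROC with AUC-PR on a synthetic 99:1 problem, assuming scikit-learn; the model and dataset are illustrative:

```python
# Sketch: comparing AUC-ROC with AUC-PR (average precision) on an imbalanced problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score, precision_recall_curve

X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

print("AUC-ROC:", roc_auc_score(y_te, probs))            # often optimistic here
print("AUC-PR: ", average_precision_score(y_te, probs))  # baseline = positive prevalence
precision, recall, thresholds = precision_recall_curve(y_te, probs)  # points for the PR curve
```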

The selection of an evaluation metric is not a mere technicality performed after a model is
built; it is a strategic decision that must be made at the outset of a project, as it defines the
very criteria for success. This choice should be driven by the specific business context and
the relative costs of different types of prediction errors. For example, in a medical screening
application, the consequence of a False Negative (missing a sick patient) is far more severe
than that of a False Positive (subjecting a healthy patient to further tests). This context
dictates that Recall is the most critical metric, and the model should be optimized to
maximize it, even if it comes at the expense of lower Precision.60 Conversely, in a system that
automatically flags emails for deletion as spam, a False Positive (deleting an important,
non-spam email) is highly undesirable, making Precision the more important metric.76

This leads to a distinction between the tactical and strategic value of different metric types.
Threshold-based metrics like the F1-score or MCC are excellent for tactical evaluation,
assessing a model's performance at a single, fixed operational decision point. However,
rank-based metrics, particularly AUC-PR, offer a more strategic assessment. A model with a
high AUC-PR is inherently more valuable and flexible because it demonstrates a strong ability
to separate classes across a wide range of trade-offs. This provides the business with a
spectrum of good operating points, allowing them to adjust the decision threshold in the
future to meet changing needs (e.g., becoming more aggressive in fraud detection during a
high-risk period) without needing to retrain the entire model. Therefore, AUC-PR is invaluable
for strategic model comparison and selection, while threshold-based metrics are essential for
evaluating performance at the point of deployment.

The following summary relates each metric to its formula, the question it answers, and the business context in which it should be prioritized.

Precision
●​ Formula: $Precision = \frac{TP}{TP + FP}$
●​ Interpretation: Of all instances predicted as positive, how many were actually positive?
●​ Prioritize When: The cost of False Positives is high (e.g., spam filtering, targeted marketing).

Recall (Sensitivity)
●​ Formula: $Recall = \frac{TP}{TP + FN}$
●​ Interpretation: Of all actual positive instances, how many did the model correctly identify?
●​ Prioritize When: The cost of False Negatives is high (e.g., medical diagnosis, fraud detection).

F1-Score
●​ Formula: $F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$
●​ Interpretation: What is the balanced performance between precision and recall?
●​ Prioritize When: False Positives and False Negatives are of roughly equal importance.

G-Mean
●​ Formula: $G-Mean = \sqrt{Sensitivity \times Specificity}$
●​ Interpretation: How well does the model perform on both the minority and majority classes simultaneously?
●​ Prioritize When: Balanced performance across both classes is desired, ensuring the majority class is not ignored.

MCC
●​ Formula: $MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
●​ Interpretation: What is the correlation between the predicted and actual classifications, considering all four confusion matrix values?
●​ Prioritize When: A single, robust, and balanced score is needed that is reliable even with severe class imbalance.

AUC-PR
●​ Formula: Area under the Precision-Recall curve.
●​ Interpretation: How well can the model rank positive instances higher than negative ones across all thresholds, focusing on minority class performance?
●​ Prioritize When: Evaluating the overall probabilistic ranking ability of a model for an imbalanced problem, independent of a specific threshold.

Synthesis and Strategic Recommendations

Navigating the landscape of imbalanced classification requires a strategic, multi-faceted
approach that extends beyond the mere application of a single algorithm or metric. It
demands a holistic methodology that integrates domain knowledge, thoughtful data handling,
appropriate algorithmic choices, and rigorous, context-aware evaluation. This concluding
section synthesizes the preceding analysis into a coherent framework, providing practitioners
with actionable guidelines and best practices for successfully developing and deploying
robust classifiers in the face of class imbalance.

A Comparative Framework for Handling Techniques

The choice of a technique to handle class imbalance is a critical decision with significant
trade-offs. The three major paradigms—Data-Level, Algorithm-Level, and Hybrid—offer
different philosophies for solving the problem.
●​ Data-Level Methods (Resampling): These techniques, such as SMOTE and RUS, focus
on transforming the training data to create a more balanced distribution. Their primary
advantage is that they are model-agnostic; a balanced dataset can be used to train any
standard classifier. However, they carry inherent risks. Oversampling methods like SMOTE
can introduce noise and potentially create artificial instances that do not reflect the true
data distribution, leading to overfitting. Undersampling methods like RUS risk significant
information loss by discarding potentially valuable majority class examples. The
computational cost of these methods, especially sophisticated oversampling techniques,
can also be considerable.
●​ Algorithm-Level Methods (Cost-Sensitive Learning & Specialized Algorithms):
These techniques modify the learning algorithm's objective function or internal
mechanics to make it inherently sensitive to the minority class. Cost-sensitive learning,
implemented via class weights, is a powerful and often computationally efficient
approach that forces the model to penalize errors on the minority class more heavily.
Specialized algorithms like One-Class SVMs reframe the problem entirely. The main
advantage of this approach is that it works with the original, true data distribution,
avoiding the potential artifacts of resampling. However, these methods require the
chosen algorithm to support such modifications (e.g., have a class_weight parameter)
and may be less interpretable than training a standard model on balanced data.
●​ Hybrid and Ensemble Methods: These approaches represent the most advanced and
often most effective strategies. Hybrid sampling methods (e.g., SMOTE-Tomek) combine
the benefits of oversampling and undersampling to both increase the minority signal and
clean the class boundary. Specialized ensemble methods (e.g., EasyEnsemble) overcome
the limitations of simple undersampling by training multiple classifiers on different
subsets of the majority class, thereby reducing information loss and improving
robustness. These methods are often more computationally expensive but tend to yield
superior performance by addressing multiple facets of the imbalance problem
simultaneously.

Guidelines for Technique Selection: A Practical Workflow

There is no single "best" technique for all imbalanced classification problems. The optimal
choice depends on the specific characteristics of the dataset, the computational constraints,
and the project's goals. The following workflow provides a structured approach to selecting
and applying these techniques effectively.

Step 1: Understand Your Domain and Data.


Before applying any technique, it is crucial to perform a thorough exploratory data analysis.
Understand the source of the imbalance: is the minority class naturally rare, or is the skew a
result of a biased data collection process?.3 Analyze the absolute number of minority
samples, the dataset's overall size and dimensionality, and the degree of class separability or
overlap. Visualizing the data can provide invaluable intuition.91
Step 2: Establish a Rigorous Baseline.
Always begin by training a standard classification algorithm on the original, imbalanced data.
Evaluate this baseline model using a comprehensive suite of appropriate metrics (as detailed
in Section 4), but never with accuracy alone. This baseline provides a crucial reference point
to quantify the improvement gained from any subsequent handling technique.
Step 3: Start with Simple, Robust Approaches.
●​ Cost-Sensitive Learning: For many problems, the most efficient and effective first step
is to use a robust, gradient-boosted tree algorithm (like XGBoost, LightGBM) or a
Random Forest and enable its built-in cost-sensitive learning capabilities (e.g.,
scale_pos_weight in XGBoost or class_weight='balanced' in scikit-learn).31 This approach
is computationally efficient, works directly on the true data distribution, and is often a
very strong performer.
●​ Collect More Data: If feasible, the most powerful solution is often to collect more data,
particularly for the underrepresented minority class. This is the only method that
introduces genuinely new information into the model.19

Step 4: Experiment with Resampling Techniques.


If cost-sensitive learning is insufficient or not applicable, proceed to resampling. The choice
between oversampling and undersampling should be guided by the dataset size.
●​ For very large datasets where training time is a concern, undersampling can be a
viable option. Start with Random Undersampling but consider more informed methods
like Tomek Links or ENN to minimize information loss.40
●​ For small or medium-sized datasets, oversampling is generally the preferred
approach to avoid discarding valuable data.23 SMOTE is a standard starting point, but if
class overlap is a concern, variants like Borderline-SMOTE or ADASYN should be
explored.
●​ Hybrid methods like SMOTE-Tomek or SMOTE-ENN are often a robust choice, providing a
good balance between generating minority data and cleaning the decision boundary.
They are particularly recommended for noisy datasets.48​
Remember that there is no universally superior method; empirical experimentation is
essential to find the best fit for a specific problem.1

Step 5: Use Cross-Validation Correctly.


This is one of the most critical best practices in handling imbalanced data. Resampling
techniques must be applied correctly within a cross-validation framework to avoid data
leakage and obtain a reliable estimate of model performance. The resampling should be
performed only on the training portion of each fold, never on the entire dataset before
splitting. The validation/test fold must remain in its original, imbalanced state to reflect the
real-world data distribution on which the model will be deployed.3 Using stratified k-fold
cross-validation is also essential to ensure that the class proportions are preserved in each
train-test split.11
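
A minimal sketch of this practice, assuming imbalanced-learn: placing the sampler inside an imblearn Pipeline ensures SMOTE is fitted only on each training fold, while the validation folds stay in their original, imbalanced state.

```python
# Sketch of resampling inside cross-validation without data leakage:
# the sampler lives in an imbalanced-learn Pipeline, so SMOTE is applied to the
# training portion of each fold only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=10_000, weights=[0.97, 0.03], random_state=0)

pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified folds preserve the class proportions in every train-test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="average_precision")
print(scores.mean())
```

Note that a plain scikit-learn Pipeline cannot hold a resampler; the imblearn Pipeline exists precisely so the resampling step is applied during fit but skipped at prediction time.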

Best Practices for Evaluation and Reporting

Effective evaluation is paramount for building trust in a model and making informed decisions.
The following best practices should be standard procedure for any imbalanced classification
project.

1. Abandon Accuracy as the Primary Metric.


For any dataset with a non-trivial class imbalance, accuracy is a misleading and inappropriate
metric. Its use as the sole or primary performance indicator should be avoided entirely, as it
will almost always provide an overly optimistic and uninformative assessment of the model.2
2. Select Evaluation Metrics Based on Business Goals.
The choice of metric should be a direct reflection of the project's objectives and the relative
costs of different types of errors.
●​ If the primary goal is to capture as many minority instances as possible and the cost of
False Negatives is high (e.g., disease screening), prioritize Recall.76
●​ If the primary goal is to ensure that positive predictions are highly reliable and the cost of
False Positives is high (e.g., flagging a customer's account for fraud), prioritize
Precision.76
●​ If a balance between False Positives and False Negatives is required, the F1-Score is a
suitable choice. For a more robust and statistically sound single-score metric that is less
sensitive to class distribution, the Matthews Correlation Coefficient (MCC) is highly
recommended.80
●​ To evaluate a model's overall ability to separate classes across all possible operating thresholds, independent of a specific decision point, the Area Under the Precision-Recall Curve (AUC-PR) is the most informative metric for imbalanced data.17 A sketch showing how the chosen metric can drive model selection follows this list.
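As a concrete illustration of tying the chosen metric to model selection, the sketch below (reusing the training split from the baseline sketch; the parameter grid is arbitrary) evaluates several scorers during a grid search and refits on AUC-PR. The refit key would simply be changed to match whichever metric reflects the project's priorities.

```python
# Sketch: mapping a business goal to the metric used for model selection.
# The scoring strings below are standard scikit-learn scorer names.
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef

# e.g. disease screening -> optimise recall; fraud flagging -> precision;
# balanced view -> f1 or MCC; threshold-free ranking -> average_precision.
scoring = {
    "recall": "recall",
    "precision": "precision",
    "f1": "f1",
    "mcc": make_scorer(matthews_corrcoef),
    "auc_pr": "average_precision",
}

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring=scoring,
    refit="auc_pr",   # change this key to match the project's priority metric
    cv=5,
)
grid.fit(X_train, y_train)
print("Best C by AUC-PR:", grid.best_params_)
```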

3. Report a Comprehensive Suite of Metrics.


Never rely on a single metric to tell the whole story. A transparent and thorough evaluation
report should always include the full confusion matrix along with a suite of relevant metrics,
such as Precision, Recall, F1-Score or MCC, and AUC-PR. This multi-faceted reporting
provides a complete picture of the model's strengths and weaknesses.60
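A brief sketch of such a report for the baseline model from the earlier steps:

```python
# Comprehensive reporting sketch for the baseline model's test predictions.
from sklearn.metrics import (confusion_matrix, classification_report,
                             matthews_corrcoef, average_precision_score)

y_pred = baseline.predict(X_test)
y_prob = baseline.predict_proba(X_test)[:, 1]

print(confusion_matrix(y_test, y_pred))        # rows: actual, cols: predicted
print(classification_report(y_test, y_pred))   # per-class precision/recall/F1
print("MCC:   ", matthews_corrcoef(y_test, y_pred))
print("AUC-PR:", average_precision_score(y_test, y_prob))
```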
4. Consider Threshold Moving.
Most classifiers output a probability score before a decision threshold (typically 0.5) is
applied to assign a class label. This default threshold is often not optimal for imbalanced
problems. Threshold moving (or threshold tuning) is a post-processing step where the
decision threshold is adjusted to optimize a specific evaluation metric (like the F1-score) or to
meet a specific business constraint (e.g., achieve a minimum of 90% recall). This simple yet
powerful technique can significantly improve a model's practical utility without the need for
retraining.1
Ultimately, the successful management of imbalanced data is not a single, isolated step but an
integrated part of the end-to-end machine learning workflow. It begins with a deep
understanding of the problem domain, informs the choice of data preparation and modeling
techniques, and culminates in a rigorous and context-aware evaluation. The evolution of the
field from simple data balancing to more sophisticated methods that guide the learner's focus
indicates a shift in philosophy. The goal is no longer just to balance class counts, but to
provide the learning algorithm with the right information and the right incentives to learn the
difficult but critical patterns hidden within the minority class. By adopting a holistic, empirical,
and principled methodology, practitioners can build models that are not only statistically
sound but also practically valuable and robust in the face of real-world data complexities.

Works cited

1. Computational Strategies for Handling Imbalanced Data in Machine Learning - ISI, accessed October 21, 2025, [Link]
2. Everything You Need to Know When Assessing Imbalance Class Problem Skills - Alooba, accessed October 21, 2025, [Link]
3. What is Imbalanced Dataset - GeeksforGeeks, accessed October 21, 2025, [Link]
4. Class Imbalance Definition - Encord, accessed October 21, 2025, [Link]
5. Class-imbalanced datasets | Machine Learning - Google for Developers, accessed October 21, 2025, [Link]
6. Handling Imbalanced Data in Classification | Keylabs, accessed October 21, 2025, [Link]
7. Tackling the Challenge of Imbalanced Datasets: A Comprehensive Guide - Medium, accessed October 21, 2025, [Link]
8. [Link], accessed October 21, 2025, [Link]
9. Class Imbalance Strategies — A Visual Guide with Code | by Travis ..., accessed October 21, 2025, [Link]
10. How to Handle Imbalanced Data? - Analytics Vidhya, accessed October 21, 2025, [Link]
11. Class Imbalance in Machine Learning - Train in Data's Blog, accessed October 21, 2025, [Link]
12. Classification of Imbalanced Datasets using One-Class SVM, k-Nearest Neighbors and CART Algorithm - The Science and Information (SAI) Organization, accessed October 21, 2025, [Link]
13. Cost-Sensitive Learning Methods for Imbalanced Data, accessed October 21, 2025, [Link]
14. A review of machine learning methods for imbalanced data challenges in chemistry - PMC, accessed October 21, 2025, [Link]
15. How to Handle Imbalanced Data for Machine Learning in Python - Semaphore CI, accessed October 21, 2025, [Link]
16. Practical ML: Addressing Class Imbalance | by Juan C Olamendy - Medium, accessed October 21, 2025, [Link]
17. Imbalanced Dataset: Strategies to Fix Skewed Class Distributions - Label Your Data, accessed October 21, 2025, [Link]
18. How to Deal With Imbalanced Classification and Regression Data - [Link], accessed October 21, 2025, [Link]
19. 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset - [Link], accessed October 21, 2025, [Link]
20. Failure of Classification Accuracy for Imbalanced Class Distributions - [Link], accessed October 21, 2025, [Link]
21. The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression - PubMed Central, accessed October 21, 2025, [Link]
22. How To Handle Imbalanced Data in Classification - phData, accessed October 21, 2025, [Link]
23. Exploring Oversampling Techniques for Imbalanced Datasets - Train in Data's Blog, accessed October 21, 2025, [Link]
24. How to Handle Imbalanced Classes in Machine Learning - GeeksforGeeks, accessed October 21, 2025, [Link]
25. Sampling for imbalance data in Python | by Mabrouka Salmi - Medium, accessed October 21, 2025, [Link]
26. Challenges and limitations of synthetic minority oversampling techniques in machine learning - PMC, accessed October 21, 2025, [Link]
27. Overcoming Class Imbalance with SMOTE: How to Tackle Imbalanced Datasets in Machine Learning - Train in Data's Blog, accessed October 21, 2025, [Link]
28. ML | Handling Imbalanced Data with SMOTE and Near Miss Algorithm in Python, accessed October 21, 2025, [Link]
29. SMOTE for Imbalanced Classification with Python - [Link], accessed October 21, 2025, [Link]
30. SMOTE oversampling for better machine learning classification - Domino Data Lab, accessed October 21, 2025, [Link]
31. Handling Imbalanced Data: 7 Innovative ... - Data Science Dojo, accessed October 21, 2025, [Link]
32. Selective oversampling approach for strongly imbalanced data - PMC, accessed October 21, 2025, [Link]
33. How does SMOTE work for dataset with only categorical variables?, accessed October 21, 2025, [Link]
34. Mastering Imbalanced Data: Comprehensive Techniques for ..., accessed October 21, 2025, [Link]
35. Comparison of OverSampling Methods(ImbalancedData) - Kaggle, accessed October 21, 2025, [Link]
36. COMPARISON OF DATASET OVERSAMPLING ALGORITHMS AND THEIR APPLICABILITY TO THE CATEGORIZATION PROBLEM, accessed October 21, 2025, [Link]
37. A Comparative Study of SMOTE, Borderline-SMOTE, and ADASYN Oversampling Techniques using Different Classifiers | Request PDF - ResearchGate, accessed October 21, 2025, [Link]
38. Compare over-sampling samplers — Version 0.14.0 - Imbalanced Learn, accessed October 21, 2025, [Link]
39. A Comparative Study of Sampling Methods with Cross-Validation in the FedHome Framework - arXiv, accessed October 21, 2025, [Link]
40. The Role of Undersampling in Tackling Imbalanced Datasets in Machine Learning, accessed October 21, 2025, [Link]
41. 3. Under-sampling — Version 0.14.0 - Imbalanced-learn, accessed October 21, 2025, [Link]
42. Undersampling Techniques for Handling Unbalanced Datasets | CodeSignal Learn, accessed October 21, 2025, [Link]
43. Tomek Links: An Undersampling Approach | by Simardeep Kaur - Medium, accessed October 21, 2025, [Link]
44. 3. Under-sampling — Version 0.15.dev0 - Imbalanced-learn, accessed October 21, 2025, [Link]
45. Tomek links Algorithm – Undersampling to handle Imbalanced data in machine learning by Mahesh Huddar - YouTube, accessed October 21, 2025, [Link]
46. Mitigating the Effects of Class Imbalance Using SMOTE and Tomek Link Undersampling in SAS, accessed October 21, 2025, [Link]
47. (PDF) A Hybrid Sampling SVM Approach to Imbalanced Data Classification - ResearchGate, accessed October 21, 2025, [Link]
48. Hybrid and Ensemble Methods: Advanced Approaches to Address ..., accessed October 21, 2025, [Link]
49. (PDF) Comparative Analysis of Data Balancing Techniques for Machine Learning Classification on Imbalanced Student Perception Datasets - ResearchGate, accessed October 21, 2025, [Link]
50. A Comparison of Undersampling, Oversampling, and SMOTE ..., accessed October 21, 2025, [Link]
51. Cost-Sensitive Learning for Imbalanced Classification ..., accessed October 21, 2025, [Link]
52. Cost-Sensitive Learning (CSL) - Machine Learning with Imbalanced Data - YouTube, accessed October 21, 2025, [Link]
53. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteris, accessed October 21, 2025, [Link]
54. How to implement cost-sensitive learning in decision trees ..., accessed October 21, 2025, [Link]
55. Impact of imbalanced features on large datasets - PMC, accessed October 21, 2025, [Link]
56. Imbalanced Classification: Cost Sensitive Algrthms - Kaggle, accessed October 21, 2025, [Link]
57. Cost-Sensitive Logistic Regression for Imbalanced Classification ..., accessed October 21, 2025, [Link]
58. An Optimized Cost-Sensitive SVM for Imbalanced Data Learning - Department of Computing Science, accessed October 21, 2025, [Link]
59. IMBENS: Ensemble Class-imbalanced Learning in Python - Zhining Liu, accessed October 21, 2025, [Link]
60. Tips for Handling Imbalanced Data in Machine Learning ..., accessed October 21, 2025, [Link]
61. Ensemble of Rotation Trees for Imbalanced Medical Datasets - PMC - PubMed Central, accessed October 21, 2025, [Link]
62. EasyEnsemble and Feature Selection for Imbalance Data Sets - ResearchGate, accessed October 21, 2025, [Link]
63. Exploratory Undersampling for Class-Imbalance Learning, accessed October 21, 2025, [Link]
64. (PDF) EASY ENSEMMBLE WITH RANDOM FOREST TO HANDLE IMBALANCED DATA IN CLASSIFICATION - ResearchGate, accessed October 21, 2025, [Link]
65. Trainable Undersampling for Class-Imbalance Learning - AAAI Publications, accessed October 21, 2025, [Link]
66. Survey of Imbalanced Data Methodologies - arXiv, accessed October 21, 2025, [Link]
67. Handling Imbalanced Data in Machine Learning: Data-level, Model-level Strategies, and Evaluation Metrics | by Dgholamian | Medium, accessed October 21, 2025, [Link]
68. One-Class Classification Algorithms for Imbalanced Datasets ..., accessed October 21, 2025, [Link]
69. One Class SVM - Louise E. Sinks, accessed October 21, 2025, [Link]
70. Understanding One-Class Support Vector Machines - GeeksforGeeks, accessed October 21, 2025, [Link]
71. What is a confusion matrix? - IBM, accessed October 21, 2025, [Link]
72. Confusion matrix - Wikipedia, accessed October 21, 2025, [Link]
73. Understanding the Confusion Matrix in Machine Learning - GeeksforGeeks, accessed October 21, 2025, [Link]
74. How to interpret a confusion matrix for a machine learning model, accessed October 21, 2025, [Link]
75. Precision and recall - Wikipedia, accessed October 21, 2025, [Link]
76. Performance Metrics: Confusion matrix, Precision, Recall, and F1 Score, accessed October 21, 2025, [Link]
77. ROC Curves and Precision-Recall Curves for Imbalanced ..., accessed October 21, 2025, [Link]
78. Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data - SAS Support, accessed October 21, 2025, [Link]
79. 7. Metrics — Version 0.14.0 - Imbalanced-learn, accessed October 21, 2025, [Link]
80. Tour of Evaluation Metrics for Imbalanced Classification ..., accessed October 21, 2025, [Link]
81. F-score - Wikipedia, accessed October 21, 2025, [Link]
82. what are the appropriate evaluation metrics used for handle imbalanced data? - Kaggle, accessed October 21, 2025, [Link]
83. Advanced Evaluation Metrics for Imbalanced Classification Models | by Rajneesh Tiwari | CueNex | Medium, accessed October 21, 2025, [Link]
84. Best techniques and metrics for Imbalanced Dataset - Kaggle, accessed October 21, 2025, [Link]
85. Which metric should we use to evaluate highly imbalanced classification model performance? | ResearchGate, accessed October 21, 2025, [Link]
86. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation, accessed October 21, 2025, [Link]
87. Limitations in Evaluating Machine Learning Models for Imbalanced Binary Outcome Classification in Spine Surgery: A Systematic Review - PMC - PubMed Central, accessed October 21, 2025, [Link]
88. Low G-mean and MCC for binary classification of imbalanced data - Stack Overflow, accessed October 21, 2025, [Link]
89. ROC and precision-recall with imbalanced datasets, accessed October 21, 2025, [Link]
90. [Discussion] Metric to evaluate imbalance data. : r/MachineLearning - Reddit, accessed October 21, 2025, [Link]
91. [D] How to handle highly imbalanced dataset? : r/MachineLearning - Reddit, accessed October 21, 2025, [Link]
92. 7 Techniques to Handle Imbalanced Data - KDnuggets, accessed October 21, 2025, [Link]
93. How to Handle Unbalanced Classes: 5 Strategies - Roboflow Blog, accessed October 21, 2025, [Link]