Imbalanced-Learn module in Python
Last Updated: 11 Dec, 2020
Imbalanced-Learn is a Python module that helps balance datasets that are highly skewed or biased towards some classes. It does this by resampling the data: under-represented classes can be oversampled and over-represented classes can be undersampled. When the imbalance ratio is high, a model trained on the data tends to be biased towards the class with more examples. The following dependencies need to be installed to use imbalanced-learn:
- scipy(>=0.19.1)
- numpy(>=1.13.3)
- scikit-learn(>=0.23)
- joblib(>=0.11)
- keras 2 (optional)
- tensorflow (optional)
To install imbalanced-learn, run:
pip install imbalanced-learn
The resampling API is built around two kinds of objects:
Estimator: implements a fit method, as in scikit-learn. The data is a 2D array-like of shape (n_samples, n_features), and the targets are a 1D array-like of shape (n_samples,).
estimator = obj.fit(data, targets)
Resampler: the fit_resample method resamples the data and targets and returns them as a tuple (data_resampled, targets_resampled).
data_resampled, targets_resampled = obj.fit_resample(data, targets)
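To make the estimator/resampler contract concrete, here is a minimal sketch of a hypothetical ToySampler class (it is not part of imbalanced-learn) that follows it: fit learns from the data and returns the object itself, and fit_resample returns a (data_resampled, targets_resampled) tuple. Its strategy is deliberately trivial, keeping the first n samples of each class, where n is the size of the smallest class. Plain Python lists stand in for the 2D data and 1D targets arrays.

```python
from collections import Counter


class ToySampler:
    """Hypothetical sampler illustrating the fit / fit_resample contract.

    NOT part of imbalanced-learn: it keeps the first n samples of every
    class, where n is the size of the smallest class.
    """

    def fit(self, data, targets):
        # Estimator step: learn from the data (here, just the class counts).
        if len(data) != len(targets):
            raise ValueError("data and targets must have the same length")
        self.class_counts_ = Counter(targets)
        return self  # like scikit-learn, fit returns the estimator itself

    def fit_resample(self, data, targets):
        # Resampler step: return a (data_resampled, targets_resampled) tuple.
        self.fit(data, targets)
        n_keep = min(self.class_counts_.values())
        kept = Counter()
        data_resampled, targets_resampled = [], []
        for row, label in zip(data, targets):
            if kept[label] < n_keep:
                data_resampled.append(row)
                targets_resampled.append(label)
                kept[label] += 1
        return data_resampled, targets_resampled


# data is 2D (one row per sample), targets is 1D (one label per sample)
data = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [5.0, 5.1]]
targets = [0, 0, 0, 1]
data_res, targets_res = ToySampler().fit_resample(data, targets)
print(data_res)     # [[0.1, 1.0], [5.0, 5.1]]
print(targets_res)  # [0, 1]
```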
The Imbalanced-Learn module provides different algorithms for oversampling and undersampling. To demonstrate them, we will use scikit-learn's make_classification function, which generates a synthetic classification dataset and returns:
- x: a matrix of shape (n_samples, n_features), and
- y: an array of integer class labels.
Python3
# import required modules
from sklearn.datasets import make_classification
# define a dataset where roughly 99% of the samples belong to class 0
x, y = make_classification(n_samples=10000,
                           weights=[0.99],
                           flip_y=0)
print('x:\n', x)
print('y:\n', y)
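The sketch below shows why this imbalance matters. It counts the labels with collections.Counter (which also works directly on the NumPy array y returned by make_classification) and computes the accuracy of a baseline that always predicts the majority class. The label list is a stand-in assumption with the 99:1 split that weights=[0.99] asks for, and the helper name describe_imbalance is hypothetical.

```python
from collections import Counter


def describe_imbalance(labels):
    """Return per-class counts, the majority/minority ratio and the accuracy
    of a baseline that always predicts the majority class."""
    counts = Counter(labels)
    majority_count = max(counts.values())
    minority_count = min(counts.values())
    imbalance_ratio = majority_count / minority_count
    baseline_accuracy = majority_count / len(labels)
    return counts, imbalance_ratio, baseline_accuracy


# Stand-in labels with the 99:1 split requested by weights=[0.99]
y = [0] * 9900 + [1] * 100
counts, ratio, accuracy = describe_imbalance(y)
print(dict(counts))  # {0: 9900, 1: 100}
print(ratio)         # 99.0
print(accuracy)      # 0.99
```

A model that never predicts the minority class still reaches 99% accuracy here, which is why accuracy alone is misleading on heavily skewed data.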
Output:
Below are some programs that show how to apply oversampling and undersampling to the dataset:
Oversampling
- Random Over Sampler: a naive method that balances the classes by picking random samples from the minority class(es), with replacement, and adding duplicates of them to the dataset.
Syntax:
from imblearn.over_sampling import RandomOverSampler
Parameters (optional): sampling_strategy='auto', random_state=None (older releases also accepted the now-removed return_indices and ratio parameters)
Implementation:
oversample = RandomOverSampler(sampling_strategy='minority')
x_oversample, y_oversample = oversample.fit_resample(x, y)
Return Type: a tuple (x_resampled, y_resampled), where x_resampled has shape (n_samples_new, n_features) and y_resampled has shape (n_samples_new,)
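As a rough illustration of the idea (not imbalanced-learn's implementation), random oversampling fits in a few lines of plain Python. For every class smaller than the largest one, the sketch draws random samples of that class with replacement and appends copies until the counts match, which roughly corresponds to the default sampling_strategy='auto'. The helper name random_oversample is hypothetical.

```python
import random
from collections import Counter


def random_oversample(data, targets, random_state=None):
    """Append randomly chosen copies of samples from every class smaller than
    the largest one (sampling with replacement) until all classes match it."""
    rng = random.Random(random_state)
    counts = Counter(targets)
    n_target = max(counts.values())
    data_res, targets_res = list(data), list(targets)
    for label, count in counts.items():
        # indices of the samples belonging to this class
        idx = [i for i, t in enumerate(targets) if t == label]
        for _ in range(n_target - count):
            i = rng.choice(idx)  # with replacement: a row can be copied many times
            data_res.append(data[i])
            targets_res.append(label)
    return data_res, targets_res


data = [[1.0], [2.0], [3.0], [4.0], [10.0]]
targets = [0, 0, 0, 0, 1]
x_over, y_over = random_oversample(data, targets, random_state=0)
print(Counter(y_over))  # Counter({0: 4, 1: 4})
```

Because the new rows are exact duplicates, no new information is added, which is why this method is described as naive.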
Example:
Python3
# import required modules
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
# define dataset
x, y = make_classification(n_samples=10000,
weights=[0.99],
flip_y=0)
oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)
# print the features and the labels
print('x_over:\n', x_over)
print('y_over:\n', y_over)
Output:

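- SMOTE, ADASYN: Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic sampling (ADASYN) are two oversampling methods. Instead of duplicating existing examples, both generate new synthetic minority-class examples by interpolating between a minority sample and one of its nearest neighbours. ADASYN additionally uses a density distribution to decide how many synthetic samples to generate around each minority sample, producing more of them where minority samples are surrounded by majority-class neighbours (the regions that are harder to learn).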
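ADASYN's weighting step can be sketched as follows, assuming Euclidean distance and plain Python lists. For each minority sample it computes the fraction of majority samples among its k nearest neighbours, normalises these fractions into a distribution, and scales it by the total number of synthetic samples to create. The helper adasyn_allocation is hypothetical, and because of rounding the allocations may not sum exactly to n_synthetic.

```python
import math


def adasyn_allocation(data, targets, minority_label, n_synthetic, k=5):
    """Sketch of ADASYN's weighting step: how many synthetic samples to
    generate around each minority sample (hypothetical helper)."""
    minority_idx = [i for i, t in enumerate(targets) if t == minority_label]
    ratios = []
    for i in minority_idx:
        # k nearest neighbours of sample i in the whole dataset, excluding itself
        others = sorted((j for j in range(len(data)) if j != i),
                        key=lambda j: math.dist(data[i], data[j]))
        n_majority = sum(targets[j] != minority_label for j in others[:k])
        ratios.append(n_majority / k)  # fraction of majority neighbours
    total = sum(ratios)
    if total == 0:
        # no minority sample has a majority neighbour: this sketch allocates nothing
        return {i: 0 for i in minority_idx}
    # normalise into a distribution, then scale by the number of samples to create
    return {i: round(r / total * n_synthetic) for i, r in zip(minority_idx, ratios)}


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0],            # class 1
        [0.0, 1.8], [10.0, 11.0], [11.0, 10.0], [9.0, 10.0], [20.0, 20.0]]  # class 0
targets = [1, 1, 1, 1, 0, 0, 0, 0, 0]
alloc = adasyn_allocation(data, targets, minority_label=1, n_synthetic=6, k=2)
print(alloc)  # {0: 0, 1: 0, 2: 2, 3: 4}
```

Sample 3, surrounded by majority points, receives most of the synthetic samples, while samples 0 and 1, which sit among other minority points, receive none.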
Syntax:
from imblearn.over_sampling import SMOTE, ADASYN
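Parameters (optional, keyword-only): SMOTE takes sampling_strategy='auto', random_state=None, k_neighbors=5, n_jobs=None; ADASYN takes sampling_strategy='auto', random_state=None, n_neighbors=5, n_jobs=None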
Implementation:
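smote = SMOTE(sampling_strategy='minority')
x_smote, y_smote = smote.fit_resample(x, y)
Return Type: a tuple (x_resampled, y_resampled), where x_resampled has shape (n_samples_new, n_features) and y_resampled has shape (n_samples_new,)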
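The interpolation at the heart of SMOTE can be sketched in plain Python, assuming Euclidean distance: pick a random minority sample, pick one of its k nearest minority neighbours (k plays the role of SMOTE's k_neighbors parameter), and create a new point at a random position on the line segment between them. smote_sketch is a hypothetical helper, not the library's implementation.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, random_state=None):
    """Create n_synthetic points by interpolating between a random minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(random_state)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of base among the other minority samples
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # the new point lies on the segment from base towards neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_synthetic=5, k=2, random_state=42)
print(new_points)  # five new points, all inside the unit square
```

Since every new point lies between two existing minority samples, the synthetic data stays inside the region the minority class already occupies instead of repeating identical rows.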
Example:
Python3
# import required modules
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
# define dataset
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)
smote = SMOTE()
x_smote, y_smote = smote.fit_resample(x, y)
# print the features and the labels
print('x_smote:\n', x_smote)
print('y_smote:\n', y_smote)
Output:

Undersampling
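- Edited Nearest Neighbours: This algorithm removes samples whose class label disagrees with the labels of their nearest neighbours. With the default kind_sel='all', a sample is removed if any of its n_neighbors nearest neighbours belongs to a different class; with kind_sel='mode', it is removed only if the majority of them do.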
Syntax:
from imblearn.under_sampling import EditedNearestNeighbours
Parameters (optional): sampling_strategy='auto', n_neighbors=3, kind_sel='all', n_jobs=None (older releases also accepted the now-removed return_indices, random_state and ratio parameters)
Implementation:
en = EditedNearestNeighbours()
x_en, y_en = en.fit_resample(x, y)
Return Type: a tuple (x_resampled, y_resampled), where x_resampled has shape (n_samples_new, n_features) and y_resampled has shape (n_samples_new,)
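A minimal sketch of the cleaning rule with kind_sel='all', assuming Euclidean distance: every sample outside the minority class is kept only if all of its n_neighbors nearest neighbours share its label. Leaving the minority class untouched mirrors, as we understand it, the default sampling_strategy='auto' for cleaning methods. enn_sketch is a hypothetical helper name.

```python
import math
from collections import Counter


def enn_sketch(data, targets, n_neighbors=3):
    """Edited Nearest Neighbours with kind_sel='all' (hypothetical helper):
    drop every non-minority sample that has a neighbour of another class."""
    counts = Counter(targets)
    minority_label = min(counts, key=counts.get)
    data_res, targets_res = [], []
    for i, (row, label) in enumerate(zip(data, targets)):
        if label != minority_label:
            # n_neighbors nearest neighbours of this sample, excluding itself
            others = sorted((j for j in range(len(data)) if j != i),
                            key=lambda j: math.dist(row, data[j]))
            if any(targets[j] != label for j in others[:n_neighbors]):
                continue  # a neighbour disagrees: remove this sample
        data_res.append(row)
        targets_res.append(label)
    return data_res, targets_res


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [5.0, 5.0],  # class 0
        [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]                          # class 1
targets = [0, 0, 0, 0, 0, 1, 1, 1]
x_en, y_en = enn_sketch(data, targets)
print(x_en)  # [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
print(Counter(y_en))  # Counter({0: 4, 1: 3})
```

The class-0 point at [5.0, 5.0] sits inside the class-1 cluster, so it is the only sample removed. Unlike random undersampling, this method does not aim for equal class counts; it only cleans noisy or overlapping samples.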
Example:
Python3
# import required modules
from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours
# define dataset
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)
en = EditedNearestNeighbours()
x_en, y_en = en.fit_resample(x, y)
# print the features and the labels
print('x_en:\n', x_en)
print('y_en:\n', y_en)
Output:

- Random Under Sampler: randomly selects a subset of the samples in the majority class(es), with or without replacement (replacement=False by default), so that the classes are balanced.
Syntax:
from imblearn.under_sampling import RandomUnderSampler
Parameters (optional): sampling_strategy='auto', random_state=None, replacement=False (older releases also accepted the now-removed return_indices and ratio parameters)
Implementation:
undersample = RandomUnderSampler()
x_under, y_under = undersample.fit_resample(x, y)
Return Type: a tuple (x_resampled, y_resampled), where x_resampled has shape (n_samples_new, n_features) and y_resampled has shape (n_samples_new,)
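Random undersampling can be sketched the same way: every class larger than the smallest one is cut down to the size of the smallest class by drawing random samples, either without replacement (random.sample) or with replacement (random.choice). The helper random_undersample is hypothetical and only approximates RandomUnderSampler's default behaviour.

```python
import random
from collections import Counter


def random_undersample(data, targets, replacement=False, random_state=None):
    """Cut every class down to the size of the smallest class by random
    selection (hypothetical helper, not RandomUnderSampler itself)."""
    rng = random.Random(random_state)
    counts = Counter(targets)
    n_target = min(counts.values())
    data_res, targets_res = [], []
    for label, count in counts.items():
        idx = [i for i, t in enumerate(targets) if t == label]
        if count == n_target:
            chosen = idx  # the smallest class is kept as it is
        elif replacement:
            chosen = [rng.choice(idx) for _ in range(n_target)]  # may repeat rows
        else:
            chosen = rng.sample(idx, n_target)  # distinct rows
        for i in chosen:
            data_res.append(data[i])
            targets_res.append(label)
    return data_res, targets_res


data = [[float(i)] for i in range(10)]
targets = [0] * 8 + [1] * 2
x_under, y_under = random_undersample(data, targets, random_state=0)
print(Counter(y_under))  # Counter({0: 2, 1: 2})
```

The trade-off is the reverse of random oversampling: no rows are duplicated, but most majority-class samples are discarded.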
Example:
Python3
# import required modules
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
# define dataset
x, y = make_classification(n_samples=10000,
weights=[0.99],
flip_y=0)
undersample = RandomUnderSampler()
x_under, y_under = undersample.fit_resample(x, y)
# print the features and the labels
print('x_under:\n', x_under)
print('y_under:\n', y_under)
Output: