Make_pipeline() function in Sklearn

Last Updated : 04 Sep, 2022

In this article, we will learn how to use the make_pipeline() function of scikit-learn in Python.

The make_pipeline() function constructs a Pipeline from the given estimators. It is a shorthand for the Pipeline constructor: naming the estimators is neither required nor allowed. Instead, each step's name is set automatically to the lowercase of its type (class) name. When we want to perform operations on data step by step, we can chain all the estimators into a pipeline in sequence.
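To make the automatic naming concrete, here is a minimal sketch that builds the same two-step pipeline both ways. The explicit step names "scaler" and "clf" are arbitrary labels chosen for this illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline: step names are derived from the class names
auto_named = make_pipeline(StandardScaler(), LogisticRegression())

# Pipeline: every step must be given a name explicitly
# ("scaler" and "clf" are illustrative names, not required ones)
explicit = Pipeline([("scaler", StandardScaler()),
                     ("clf", LogisticRegression())])

auto_step_names = [name for name, _ in auto_named.steps]
explicit_step_names = [name for name, _ in explicit.steps]

print(auto_step_names)      # ['standardscaler', 'logisticregression']
print(explicit_step_names)  # ['scaler', 'clf']
```

The generated names are the ones to use when accessing a step later, for example auto_named.named_steps['standardscaler'].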

Syntax: make_pipeline(*steps, memory=None, verbose=False)

parameters:

*steps: list of Estimator objects: The scikit-learn estimators to be chained together, in order.
memory: str or object with the joblib.Memory interface, default=None: Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting, so the transformer instance passed to the pipeline cannot be inspected directly. Use the named_steps or steps attribute to inspect the estimators within the pipeline instead. Caching the transformers is advantageous when fitting is time consuming.
verbose: bool, default=False: If True, the time elapsed while fitting each step is printed as the step completes.
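The memory and verbose parameters can be tried out with a short sketch like the one below. It assumes a synthetic dataset from make_classification and a temporary directory as the cache location; both are choices made for this illustration only.

```python
import tempfile

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=8,
                                     random_state=0)

# keep a reference to the scaler instance handed to the pipeline
original_scaler = StandardScaler()

with tempfile.TemporaryDirectory() as cache_dir:
    # memory=<path> caches the fitted transformers in that directory;
    # verbose=True prints the time taken to fit each step
    cached_pipe = make_pipeline(original_scaler,
                                PCA(n_components=3),
                                LogisticRegression(),
                                memory=cache_dir,
                                verbose=True)
    cached_pipe.fit(X_demo, y_demo)

    # with caching enabled the pipeline fits clones, so inspect the
    # fitted steps through named_steps rather than the original objects
    fitted_scaler = cached_pipe.named_steps['standardscaler']
    original_is_fitted = hasattr(original_scaler, 'mean_')
    pipeline_step_is_fitted = hasattr(fitted_scaler, 'mean_')

print('original scaler fitted :', original_is_fitted)
print('pipeline step fitted   :', pipeline_step_is_fitted)
```

Caching pays off mainly when the same transformers are refitted many times with identical parameters and data, for instance during a grid search over the final estimator only.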

returns:

p: Pipeline: A pipeline object is returned.

Example: Classification using the make_pipeline() function

This example starts by importing the necessary packages and reading the 'diabetes.csv' file. X holds the independent feature variables, and y holds the dependent variable (the 'Outcome' column). train_test_split() splits X and y into train and test sets; test_size is 0.3, which means 30% of the data is used for testing. make_pipeline() creates a pipeline consisting of a standard scaler followed by a logistic regression model, so the data is scaled first and then passed to the model. The fit() method fits the pipeline on the training data, and the predict() method makes predictions on the test set. The accuracy_score() metric computes the accuracy of those predictions.


Python3

# import packages
from sklearn.linear_model import LogisticRegression 
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# import the csv file
df = pd.read_csv('diabetes.csv')

# feature variables
X = df.drop('Outcome',axis=1)
y = df['Outcome']

# splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                    test_size=0.3,
                                                    random_state=101)
# creating a pipe using the make_pipeline method
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression())

# fitting the pipeline on the training data
pipe.fit(X_train, y_train)

# predicting values
y_pred = pipe.predict(X_test)

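# calculating accuracy score (true labels first, then predictions);
# the result is stored under a new name so the imported
# accuracy_score function is not overwritten
accuracy = accuracy_score(y_test, y_pred)
print('accuracy score : ', accuracy)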
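
If 'diabetes.csv' is not at hand, the same workflow can be tried on a dataset bundled with scikit-learn. The sketch below swaps in load_breast_cancer() as a stand-in; this dataset is an assumption made for illustration and is not the one used in the example above, so the accuracy it reports will differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# bundled binary-classification dataset (stand-in for diabetes.csv)
X_bc, y_bc = load_breast_cancer(return_X_y=True)

# same split settings as in the example above
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bc, y_bc, test_size=0.3, random_state=101)

# scaler followed by logistic regression
bc_pipe = make_pipeline(StandardScaler(), LogisticRegression())

# fit on the training data only, then predict on the held-out data
bc_pipe.fit(Xb_train, yb_train)
yb_pred = bc_pipe.predict(Xb_test)

bc_accuracy = accuracy_score(yb_test, yb_pred)
print('accuracy score : ', bc_accuracy)
```

Because the scaler lives inside the pipeline, its mean and variance are learned from the training split only and then reused unchanged when predict() is called on the test split.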