
Machine Learning: Pradyumn Sharma Pragati Software Pvt. LTD

The document provides an overview of a 5-day machine learning training program. It includes an agenda that covers topics like linear regression, logistic regression, decision trees, clustering, and model deployment. It also discusses prerequisites like statistics and linear algebra. Finally, it introduces Google Colab as the development environment and libraries like Pandas and scikit-learn that will be used in the training.

Uploaded by

Surabhi Kulkarni

Machine Learning

Pradyumn Sharma
Pragati Software Pvt. Ltd.
www.pragatisoftware.com

Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 1
Program Contents
1. Day one
   1. Essential concepts and terminology
   2. Understanding linear regression
   3. Hypothesis, cost function
   4. Gradient Descent, learning rate
   5. Essentials of the numpy, pandas, and matplotlib libraries
   6. Training and test dataset split
2. Day two
   1. Predictive modelling
   2. Using the Stochastic Gradient Descent (SGD) regressor
   3. Tweaking the SGD regressor
   4. R-square: coefficient of determination
   5. Making predictions
   6. Feature scaling
3. Day three
   1. Polynomial regression
   2. Logistic regression (classification)
   3. Classification report
4. Day four
   1. Decision tree
   2. K-nearest neighbors
   3. Ensemble techniques
5. Day five
   1. Unsupervised learning: Clustering
   2. K-means
   3. Anomaly detection
   4. Deploying ML Models

Machine Learning Prerequisites

• Machine Learning is more about Maths than about Python or any libraries. More specifically, it needs familiarity with at least some basics of:
  ▪ Statistics, including probability
  ▪ Linear algebra
  ▪ Calculus
• https://www.youtube.com/watch?v=tGyfmzuR4d4&t=5s

What is Machine Learning?

• Giving computers the ability to learn without being explicitly programmed with some knowledge. –Arthur Samuel, 1959.
• Computer algorithms that autonomously learn from data.

Conventional Programming vs Machine Learning

Conventional programming: Rules + Input → Result

Rule: F = (9 * C / 5) + 32
Input: 20
Output: 68

Conventional Programming vs Machine Learning

Conventional programming: Rules + Input → Result

Machine learning: Input + Result → Rules

Rule learned: F = (9 * C / 5) + 32

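The contrast can be tried out in a few lines. The sketch below is an illustration only, not part of the original slides: it assumes scikit-learn's LinearRegression, and the five temperature pairs are made up from the known formula. The model is given inputs and results and recovers the rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example inputs (Celsius) and the results we already know (Fahrenheit).
celsius = np.array([[-40.0], [0.0], [20.0], [37.0], [100.0]])
fahrenheit = np.array([-40.0, 32.0, 68.0, 98.6, 212.0])

# The model is given inputs and results, and works out the rule itself.
rule = LinearRegression().fit(celsius, fahrenheit)

print('Learned slope:', rule.coef_[0])        # close to 9 / 5 = 1.8
print('Learned intercept:', rule.intercept_)  # close to 32
print('Prediction for 20 C:', rule.predict([[20.0]])[0])  # close to 68
```
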
Machine Learning: Some More Examples

• Making predictions: consumer behavior, market behavior
• Medical diagnostics
• Spam filters
• Face recognition in Facebook
• Troll detection system
• Robots that learn by observing human actions
• Program to play chess
• Chatbots
• Autonomous vehicles
• Machine translation
• Natural language processing

Machine Learning Technologies

• Python with Scikit-learn
• Python with Tensorflow
• Python with PyTorch
• R programming language
• Matlab, Octave
• Lex (Amazon), Luis (Microsoft), Watson (IBM), Wit.ai (Facebook)
• and many more
• Development and deployment environments
  ▪ Jupyter Notebook
  ▪ Google Colaboratory
  ▪ Google AutoML, Amazon SageMaker, Azure Machine Learning

Technologies Used in This Program

• Python
• General libraries: numpy, pandas, matplotlib
• ML library: scikit-learn
• Development environment: Google Colab

Introduction to Google Colaboratory

• Google Colaboratory, or 'Colab', allows you to write and run Python programs in the browser. Benefits:
  ▪ Zero configuration required
  ▪ Free access to GPUs
  ▪ Easy sharing
• Limitation: limited memory in the free version; higher limits in the paid versions

Setting Up Google Colab

Pandas

• Pandas (pandas.pydata.org) is an open-source library that provides data structures and data analysis tools.
• 10 minutes to Pandas: http://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html

scikit-learn

• Open-source Machine Learning library in Python.
• https://scikit-learn.org/stable/index.html

Loading and Examining Data

Rentals in Andheri East, Mumbai

Source: www.housing.com
27-Oct-2017

Rentals in Andheri East, Mumbai

Loading the Data from Google Drive

import pandas as pd
from google.colab import drive

# Mount Google Drive so that its files are visible under /content/drive
drive.mount('/content/drive')

# Read the rentals data set into a DataFrame
full_data = pd.read_csv("/content/drive/MyDrive/rentals.csv")

Viewing Data

• print (full_data)
• full_data

Some Statistics About Data

• full_data.describe ()

Standard Deviation of a Sample

• It is a statistical measure of the dispersion of a group of values around their mean.

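A minimal sketch of the calculation, assuming the usual sample formula (divide by n - 1). The rent values are hypothetical and are not taken from rentals.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical rents, for illustration only (not taken from rentals.csv).
rents = pd.Series([30000, 32000, 35000, 41000, 62000])

mean = rents.mean()
# Sample standard deviation: divide the sum of squared deviations by (n - 1).
manual_std = np.sqrt(((rents - mean) ** 2).sum() / (len(rents) - 1))

print('Mean:', mean)
print('Sample standard deviation (manual):', manual_std)
print('Sample standard deviation (pandas):', rents.std())  # pandas also uses n - 1
```

This is the same "std" figure that full_data.describe() reports for each column.
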
Percentiles

• 90th percentile = 90% of the data elements have lower values
• 99th percentile = 99% of the data elements have lower values
• 75th percentile = 75% of the data elements have lower values; also called the 3rd quartile
• 50th percentile = 50% of the data elements have lower values; also called the 2nd quartile, or median
• 25th percentile = 25% of the data elements have lower values; also called the 1st quartile
• Interquartile range: 3rd quartile – 1st quartile. This indicates the spread of data.

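The quartiles and the interquartile range can be computed with pandas, as sketched below. The area values are hypothetical. Note that pandas interpolates between neighbouring values by default, so on small data sets the results only approximately match the "x% of elements are lower" definition.

```python
import pandas as pd

# Hypothetical areas in square feet, for illustration only.
areas = pd.Series([400, 450, 500, 550, 600, 650, 700, 750, 800])

q1 = areas.quantile(0.25)      # 1st quartile (25th percentile)
median = areas.quantile(0.50)  # 2nd quartile (50th percentile)
q3 = areas.quantile(0.75)      # 3rd quartile (75th percentile)
iqr = q3 - q1                  # interquartile range

print('Q1:', q1, 'Median:', median, 'Q3:', q3, 'IQR:', iqr)
```
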
The Normal Distribution

The Normal (Distribution) Curve

Many Distributions Follow the Normal Curve

• Examples: height, weight, IQ levels.
• But not: income, house prices.

Normal Distribution Table

• About 68% of values fall within 1 standard deviation of the mean.
• About 95% of values fall within 2 standard deviations of the mean (to be more precise, 1.96 standard deviations).
• About 99.7% of values fall within 3 standard deviations of the mean.

Normal Distribution

• In a random sample from a population, suppose the statistics for the height of men are found to be as follows:
  ▪ mean: 67.8 inches
  ▪ standard deviation: 1.6 inches
• Then, if heights follow the normal distribution, we can say that
  ▪ about 68% of men have a height between 66.2 and 69.4 inches
  ▪ about 95% of men have a height between 64.6 and 71 inches
  ▪ about 99.7% of men have a height between 63 and 72.6 inches

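The three ranges can be checked with a quick simulation. This is a sketch under assumptions: the heights are randomly generated from a normal distribution with the mean and standard deviation quoted above, and the seed and sample size are arbitrary.

```python
import numpy as np

mean, std = 67.8, 1.6  # values from the slide, in inches

# Simulated heights; the seed and sample size are arbitrary choices.
rng = np.random.default_rng(0)
heights = rng.normal(mean, std, size=100_000)

shares = {}
for k in (1, 2, 3):
    low, high = mean - k * std, mean + k * std
    # Fraction of simulated heights that fall inside mean +/- k standard deviations.
    shares[k] = np.mean((heights >= low) & (heights <= high))
    print(f'within {k} sd ({low:.1f} to {high:.1f} inches): {shares[k]:.1%}')
```
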
Distinct Values and Their Count

full_data['bedrooms'].value_counts()

Correlations Among Variables

• full_data.corr ()
• Provides a measure of correlation between various variables.
• Value close to 1 => strong positive correlation
• Value close to -1 => strong negative correlation
• Value close to 0 => no linear correlation

Box Plot

full_data.area.plot(kind='box')

Box Plot

full_data.plot(kind='box', subplots=True, layout=(2, 2))

Box Plot

full_data[['area', 'rent']].plot(kind='box', subplots=True,
    layout=(2, 2), figsize=(10, 6))

Outliers

• Upper value outliers: values greater than 75th percentile + 1.5 * IQR
• Lower value outliers: values less than 25th percentile – 1.5 * IQR

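The two rules translate directly into pandas. The sketch below uses hypothetical rents in which one value is deliberately extreme; the names lower_fence and upper_fence are illustrative only.

```python
import pandas as pd

# Hypothetical rents, for illustration only; 150000 is deliberately extreme.
rents = pd.Series([28000, 30000, 32000, 35000, 38000, 41000, 45000, 150000])

q1, q3 = rents.quantile(0.25), rents.quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # values below this are lower value outliers
upper_fence = q3 + 1.5 * iqr  # values above this are upper value outliers

outliers = rents[(rents < lower_fence) | (rents > upper_fence)]
print(outliers.tolist())
```

These fences are also where a box plot draws the limits of its whiskers, which is why outliers show up as separate points beyond them.
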
Histogram

full_data.area.hist()

Histogram

full_data['area'].hist(bins=25)

Scatter Plot

full_data.plot(kind='scatter', x='area', y='rent')

Scatter Matrix

from pandas.plotting import scatter_matrix

scatter_matrix(full_data)

Saving a Diagram

import matplotlib.pyplot as plt

full_data.area.plot(kind='box')
# Save the current figure to Google Drive
plt.savefig("/content/drive/MyDrive/output.png")

Understanding the Key Concepts

An Introduction to Linear Regression

• Suppose we consider linear regression with just one input variable...

Area x Rent

Learning Algorithm

Training set → Learning Algorithm → Hypothesis (h)

Size of house (x) → Hypothesis (h) → Expected rent (y)

Hypothesis

• Given the value of an input variable (size of a house), estimate the value of the output variable (expected rent).
• Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
• Example:
  ▪ f = 32 + 1.8c
  ▪ rent = 5000 + 40 x area
• How do we choose the values of $\theta_0$ and $\theta_1$?

Generalized Hypothesis

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

Or...

$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j$, with $x_0 = 1$

Hypothesis as a Matrix Operation

Cost Function: Least Squared Error (L2)

• Cost for a training example $(x^{(i)}, y^{(i)})$ is taken as $(h_\theta(x^{(i)}) - y^{(i)})^2$
• Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
• Minimize J

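A small sketch of the cost calculation, assuming the common convention of averaging the squared errors over the m training examples and halving the result, J = (1 / 2m) x sum of squared errors. The function name cost and the data are hypothetical; the rents are generated from rent = 5000 + 40 x area so that the best parameters are known in advance.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J for the hypothesis theta0 + theta1 * x."""
    errors = (theta0 + theta1 * x) - y
    return (errors ** 2).sum() / (2 * len(x))

# Hypothetical areas (sq. ft.) and rents generated from rent = 5000 + 40 x area.
area = np.array([500.0, 650.0, 800.0, 1000.0])
rent = 5000 + 40 * area

print(cost(5000, 40, area, rent))  # 0.0: these parameters fit the data exactly
print(cost(5000, 30, area, rent))  # much larger: the slope is too small
```
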
The LinearRegression Class

Separating the Predictors and the Target

dataX = full_data.drop(columns=['rent'])
dataY = pd.DataFrame ({'rent':full_data.rent})

Training and Test Data Split

Training and Test Data Split

from sklearn.model_selection import train_test_split

trainX, testX, trainY, testY = train_test_split(dataX, dataY,
    test_size=0.20)

Different Random Splits Every Time

Ensuring Same Split Every Time

# A fixed random_state gives the same split on every run
trainX, testX, trainY, testY = train_test_split(dataX, dataY,
    test_size=0.20, random_state=11)

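The effect of random_state can be confirmed by splitting the same data twice. The sketch below uses a hypothetical ten-row DataFrame as a stand-in for the rentals data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the rentals data: ten rows, two columns.
data = pd.DataFrame({'area': np.arange(10) * 100 + 400,
                     'rent': np.arange(10) * 4000 + 20000})

first_train, first_test = train_test_split(data, test_size=0.20, random_state=11)
second_train, second_test = train_test_split(data, test_size=0.20, random_state=11)

print(first_test.index.tolist())       # rows chosen for the test set
print(first_test.equals(second_test))  # True: same seed, same split
```
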
Defining the Model

from sklearn.linear_model import LinearRegression

model = LinearRegression()

Training the Model

model.fit (trainX, trainY)

Evaluating the Results

print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
print('R2 on training data:', model.score(trainX, trainY))

Hypothesis:
rent = 911 + 27 x area + 6103 x bedrooms + 840 x furnished

R-Squared (Coefficient of Determination)

CoD: a measure of how much of the change in the output variable (y) is explained by changes in the input variable (x).

If
$y_i$ are the actual values,
$\hat{y}_i$ are the predicted values,
$\bar{y}$ is the mean of the actual values,

then

$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

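The formula can be checked by hand against scikit-learn's r2_score. The actual and predicted values below are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted rents, for illustration only.
actual = np.array([30000.0, 35000.0, 40000.0, 45000.0, 50000.0])
predicted = np.array([31000.0, 34000.0, 41000.0, 44000.0, 50000.0])

ss_res = ((actual - predicted) ** 2).sum()      # variation left unexplained
ss_tot = ((actual - actual.mean()) ** 2).sum()  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print('R2 (manual):', r2_manual)
print('R2 (scikit-learn):', r2_score(actual, predicted))
```

model.score(trainX, trainY) on a regression model returns this same quantity.
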
Mean Squared Error, Mean Absolute Error

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

predictions = model.predict(trainX)

# RMSE: square root of the mean squared error, in the same units as rent
mse = mean_squared_error(trainY, predictions)
rmse = np.sqrt(mse)
print('RMSE on training data: ', rmse)

# MAE: mean of the absolute errors
mae = mean_absolute_error(trainY, predictions)
print('MAE on training data: ', mae)

Predictions for User-Supplied Data

# input() returns text, so convert each value to a number
area = float(input('Enter area : '))
bedrooms = int(input('Enter bedrooms : '))
furnished = int(input('Enter furnished state (0/1/2) : '))
# Column names and order must match the training data
customTestX = pd.DataFrame({'area': area,
                            'bedrooms': bedrooms,
                            'furnished': furnished}, index=[0])
prediction = model.predict(customTestX)
print('\nPrediction result :: ', prediction)

The Normal Equation Method

• The Normal Equation Method is a "closed-form solution" to minimize the cost function J (using OLS, ordinary least squares).
• Method to solve for theta analytically: $\theta = (X^T X)^{-1} X^T y$
• X is an m x n matrix of input values and y is an m x 1 vector of target values.

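A minimal NumPy sketch of the closed-form solution, theta = (X^T X)^-1 X^T y. The data is hypothetical, generated from rent = 5000 + 40 x area, and a column of ones is added to X so that the first element of theta acts as the intercept.

```python
import numpy as np

# Hypothetical data generated from rent = 5000 + 40 x area.
area = np.array([500.0, 650.0, 800.0, 1000.0, 1200.0])
rent = 5000 + 40 * area

# Add a column of ones so that theta[0] acts as the intercept.
X = np.column_stack([np.ones_like(area), area])
y = rent.reshape(-1, 1)

# theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta.ravel())  # approximately [5000, 40]
```
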
Normal Equation Method

• A simple technique for regression. However,
• Inverting an (n x n) matrix is computationally intensive. The order of this is between O(n^2.4) and O(n^3). Thus it is too slow when n (the number of features) is very large, say in the thousands.
• Some matrices are not invertible; hence this method will not work for those.
• Does not work for classification and many other categories of ML algorithms.
• Requires all the data to be in memory for the model to train. Some other algorithms support learning in batches, including learning from a single training record at a time. Thus, available RAM is not a constraint for them.

The LinearRegression Class

• The LinearRegression() class in scikit-learn uses a refinement of the Normal Equation Method called "Singular Value Decomposition" (SVD).
• This is more efficient than the Normal Equation Method, with the computational complexity being O(n^2). This also works even if the matrix $X^T X$ is not invertible.
• However, like the Normal Equation Method, this also requires all the training data to be in memory.

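The point about non-invertible matrices can be illustrated with NumPy's np.linalg.lstsq, which solves least-squares problems with an SVD-based routine. It is used here only as a stand-in and is not scikit-learn's internal code. The data is hypothetical: the area column is deliberately duplicated so that X^T X is singular, and the rcond cut-off is an arbitrary choice for this example.

```python
import numpy as np

# Hypothetical data with a duplicated feature column, so X^T X is singular.
area = np.array([500.0, 650.0, 800.0, 1000.0])
X = np.column_stack([np.ones_like(area), area, area])
y = 5000 + 40 * area

# Singular values below rcond * (largest singular value) are treated as zero.
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=1e-10)

print('Rank of X:', rank)         # 2, although X has 3 columns
print('Predictions:', X @ theta)  # still match y closely
```
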
Gradient Descent

Lab: Manual Regression, Using Spreadsheet

• Step 1:
  ▪ Put in the formula for cost, play around with values of $\theta_0$ and $\theta_1$, and see the impact on average cost.
  ▪ With the same formula for cost, a fixed value for $\theta_0$, and varying values of $\theta_1$, see the impact on average cost in the data table.

Gradient Descent

[Chart: average cost (vertical axis, 0 to 900) for values from 20 to 70 on the horizontal axis]

Gradient Descent

Image source: https://stackoverflow.com/questions/64940632/how-to-illustrate-a-3d-graph-of-gradient-descent-using-python-matplotlib

Gradient Descent

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

Objective: Minimize J

Algorithm:
Start with some values of $\theta_0$ and $\theta_1$ (say $\theta_0 = 0$, $\theta_1 = 0$)
Repeat until convergence {
    simultaneously update $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
    (for j = 0 and j = 1)
}

What are $\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ and $\alpha$?

• $\frac{\partial J}{\partial \theta_0}$ is the slope of the curve for J plotted against $\theta_0$.
• Similarly, $\frac{\partial J}{\partial \theta_1}$ is the slope of the curve for J plotted against $\theta_1$.
• $\alpha$ is the learning rate.

Simultaneous Updates

Applying partial derivatives, the simultaneous updates become:

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}$

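A minimal sketch of batch gradient descent for one input variable, in which each step subtracts the learning rate times the average error (for theta0) and the average of error x input (for theta1). The data is hypothetical and deliberately small in scale (area in thousands of sq. ft., rent in thousands of rupees) so that a large learning rate works; the learning rate and iteration count were chosen by trial for this data only.

```python
import numpy as np

# Hypothetical data: area in thousands of sq. ft., rent in thousands of rupees,
# generated from rent = 5 + 40 x area so that the expected answer is known.
x = np.array([0.5, 0.65, 0.8, 1.0, 1.2])
y = 5 + 40 * x

theta0, theta1 = 0.0, 0.0  # starting values
alpha = 0.5                # learning rate, chosen by trial for this small data set

for _ in range(5000):
    error = (theta0 + theta1 * x) - y
    grad0 = error.mean()        # average error
    grad1 = (error * x).mean()  # average (error * x)
    # Simultaneous update: both gradients were computed from the old values.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # approaches 5 and 40
```
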
Lab: Gradient Descent using MS Excel

• In the sheet titled “Step 2”:
  ▪ Start with some $\theta_0$ and $\theta_1$ (such as 0, 0)
  ▪ Put in formulas for estimate, cost, error, error * x
  ▪ Put in formulas for average error, average (error * x)
  ▪ Set learning rate ($\alpha$) to 0.0000001.
  ▪ Manually run a few iterations of gradient descent logic, and see the impact on the chart plotted.

Gradient Descent Results

Learning Rate ($\alpha$)

• If $\alpha$ is too small, gradient descent can be too slow.
• If $\alpha$ is too large, gradient descent can overshoot the minimum and fail to converge, or may even diverge.

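Both effects can be seen by running the same number of gradient descent steps with different learning rates and comparing the final cost. This is a sketch on hypothetical small-scale data; the helper name final_cost and the three rates are illustrative choices that suit this data only.

```python
import numpy as np

# Hypothetical data: area in thousands of sq. ft., rent in thousands of rupees.
x = np.array([0.5, 0.65, 0.8, 1.0, 1.2])
y = 5 + 40 * x

def final_cost(alpha, iterations=200):
    """Run batch gradient descent from (0, 0) and return the final cost J."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y
        theta0, theta1 = (theta0 - alpha * error.mean(),
                          theta1 - alpha * (error * x).mean())
    error = (theta0 + theta1 * x) - y
    return (error ** 2).mean() / 2

print('alpha = 0.001:', final_cost(0.001))  # too small: cost is still high
print('alpha = 0.5:  ', final_cost(0.5))    # suitable: cost is close to 0
print('alpha = 1.5:  ', final_cost(1.5))    # too large: cost has blown up
```
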
Impact of Alpha on J

[Four charts of J, one for each learning rate: Alpha = 0.0000001, Alpha = 0.0000003, Alpha = 0.000001, Alpha = 0.000003]

Gradient Descent Variations

• Batch gradient descent: uses the entire training data set to compute the gradient at every step, making it slow with large training data sets.
• Stochastic gradient descent: at every step, a single training instance is randomly selected, and the gradient is computed based on that.
• Mini-batch gradient descent: at every step, computes the gradient based on a small subset of randomly selected instances from the training data set.

Comparison

• Batch gradient descent is the slowest; stochastic gradient descent is the fastest.
• Batch gradient descent eventually settles down near the optimal solution; stochastic gradient descent continues to walk around, never settling down on the optimal.

SGDRegressor

• Stochastic Gradient Descent Regressor.
• Stochastic = having a random probability distribution that may be analyzed statistically but may not be predicted precisely.
• A simple and efficient algorithm for linear regression.
• Scales very well for very large datasets (such as 10^5 rows) and very large number of features (such as 10^5).
• https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html

Using SGDRegressor for Stochastic GD

from sklearn.linear_model import SGDRegressor

model = SGDRegressor()

• SGDRegressor always performs stochastic gradient descent: the weights are updated after each training instance. Mini-batches can be fed in manually with partial_fit.
• The parameter "average" does not select a gradient descent mode. It controls averaged SGD:
  ▪ average = False (default): plain SGD; coef_ holds the latest weights
  ▪ average = True: coef_ holds the average of the weights over all updates
  ▪ average = an integer greater than 1: averaging starts once that many samples have been seen

Training the Model

model.fit (trainX, trainY)

Training the Model

model.fit (trainX, trainY.values.ravel())

Some of the Model Parameters

• verbose: Default = 0 (silent). Other value = 1 (verbose output).
• eta0: initial learning rate. Default = 0.01.
• max_iter: maximum number of iterations (epochs). Default = 1000.
• shuffle: whether the training data should be shuffled after each iteration. Default = True.
• random_state: used for shuffling the data, when shuffle = True. Setting it to a constant value results in deterministic randomization.
• tol: the stopping criterion. If, for n_iter_no_change iterations (default = 5), best_loss – loss < tol, then training stops.

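A short sketch of how these parameters are passed, under stated assumptions: the data is randomly generated and deliberately small in scale (area in thousands of sq. ft., rent in thousands of rupees), so the default-sized learning rate works without feature scaling. The seed, noise level, and parameter values are illustrative only.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical, small-scale data generated from rent = 5 + 40 x area plus noise.
rng = np.random.default_rng(11)
area = rng.uniform(0.3, 1.5, size=(200, 1))
rent = 5 + 40 * area.ravel() + rng.normal(0, 1, size=200)

model = SGDRegressor(eta0=0.01, max_iter=1000, tol=1e-3,
                     shuffle=True, random_state=11, verbose=0)
model.fit(area, rent)

print('Coefficient:', model.coef_, 'Intercept:', model.intercept_)
print('Epochs actually run:', model.n_iter_)  # at most max_iter
print('R2 on training data:', model.score(area, rent))
```
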
Making Gradient Descent Work Well

• Observe changes in J after each iteration (with verbose = 1), and if required, change eta0, max_iter, tol.

model = SGDRegressor(verbose=1)

Making Gradient Descent Work Well

• Start with verbose = 1, a low value of eta0 (such as 0.0000001), and a low value for max_iter (such as 10 or 20). Observe average loss, and R2.
• If the average loss seems to converge, increase eta0 by a factor of 2 or 3, until it no longer appears to converge.
• If the average loss seems to diverge, decrease eta0 by a factor of 2 or 3.
• Once you obtain a promising value of eta0, drop verbose = 1, and gradually increase max_iter. You may also want to tweak tol.

Tweaking Learning Rate

• For a sufficiently small learning rate, J should decrease on each iteration for batch gradient descent.
• But if it is too small, it can be slow to converge.

Making Gradient Descent Work Well

model = SGDRegressor(eta0=0.000005, max_iter=2000000,
    tol=0.1, shuffle=False)

Day 1 Review Questions

1. What is the difference between conventional programming and ML?
2. Which method in the pandas library reads a CSV file into memory?
3. What does the dataframe.describe() function show?
4. What is standard deviation?
5. What is a percentile score?
6. What is a boxplot?
7. What is an outlier? How are outliers identified?
8. What is the first split of the data that we perform for ML algorithms?
9. Why do we split training and test data, and how?
10. What does the random_state parameter achieve in train_test_split?
