Machine Learning: Pradyumn Sharma Pragati Software Pvt. LTD
Machine Learning: Pradyumn Sharma Pragati Software Pvt. LTD
Pradyumn Sharma
Pragati Software Pvt. Ltd.
[email protected]
www.pragatisoftware.com
[email protected]
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 1
Program Contents
1.Day one 3.Day three
1. Essential concepts and 1. Polynomial regression
terminology 2. Logistic regression
2. Understanding linear (classification)
regression 3. Classification report
3. Hypothesis, cost function 4.Day four
4. Gradient Descent, learning rate 1. Decision tree
5. Essentials of numpy, pandas, 2. K-nearest neighbors
matplot libraries 3. Ensemble techniques
6. Training and test dataset split 5.Day five
2.Day two 1. Unsupervised learning:
1. Predictive modelling Clustering
2. Using the Stochastic Gradient 2. K-means
Descent (SGD) regressor 3. Anomaly detection
3. Tweaking the SGD regressor 4. Deploying ML Models
4. R-square: coefficient of
determination
5. Making predictions
6. Feature scaling
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 2
Machine Learning Prerequisites
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 3
What is Machine Learning?
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 4
Conventional Programming vs Machine Learning
Rules
Conventional Result
Input Programming
Rule: F = (9 * C / 5) + 32
Input: 20
Output: 68
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 5
Conventional Programming vs Machine Learning
Rules
Conventional Result
Input Programming
Input
Machine
Result Rules
Learning
F = (9 * C / 5) + 32
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 6
Machine Learning: Some More Examples
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 7
Machine Learning Technologies
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 8
Technologies Used in This Program
• Python
• General libraries: numpy, pandas, matplotlib
• ML library: scikit-learn
• Development environment: Google Colab
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 9
Introduction to Google Colaboratory
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 10
Setting Up Google Colab
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 11
Pandas
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 12
scikit-learn
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 13
Loading and Examining Data
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 14
Rentals in Andheri East, Mumbai
Source: www.housing.com
27-Oct-2017
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 15
Rentals in Andheri East, Mumbai
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 16
Loading the Data from Google Drive
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
full_data = pd.read_csv
("/content/drive/MyDrive/rentals.csv")
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 17
Viewing Data
• print (full_data)
• full_data
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 18
Some Statistics About Data
• full_data.describe ()
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 19
Standard Deviation of a Sample
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 20
Percentiles
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 21
The Normal Distribution
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 22
The Normal (Distribution) Curve
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 23
Many Distributions Follow the Normal Curve
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 24
Normal Distribution Table
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 25
Normal Distribution
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 26
Distinct Values and Their Count
full_data['bedrooms'].value_counts()
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 27
Correlations Among Variables
• full_data.corr ()
• Provides a measure of correlation between various variables.
• Value close to 1 => strong positive correlation
• Value close to -1 => strong negative correlation
• Value close to 0 => no linear correlation
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 28
Box Plot
full_data.area.plot(kind='box')
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 29
Box Plot
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 30
Box Plot
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 31
Outliers
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 32
Histogram
full_data.area.hist()
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 33
Histogram
full_data['area'].hist(bins=25)
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 34
Scatter Matrix
full_data.plot(kind='scatter',
x = 'area', y = 'rent')
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 35
Scatter Matrix
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 36
Saving a Diagram
full_data.area.plot(kind='box')
plt.savefig
("/content/drive/MyDrive/output.png")
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 37
Understanding the Key Concepts
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 38
An Introduction to Linear Regression
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 39
Area x Rent
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 40
Learning Algorithm
Training set
Learning
Algorithm
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 41
Hypothesis
• Example:
f = 32 + 1.8c
rent = 5000 + 40 x area
• How do we choose the
values of and ?
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 42
Generalized Hypothesis
Or...
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 43
Hypothesis as a Matrix Operation
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 44
Cost Function: Least Squared Error (L2)
• Minimize J
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 45
The LinearRegression Class
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 46
Separating the Predictors and the Target
dataX = full_data.drop(columns=['rent'])
dataY = pd.DataFrame ({'rent':full_data.rent})
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 47
Training and Test Data Split
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 48
Training and Test Data Split
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 49
Different Random Splits Every Time
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 50
Ensuring Same Split Every Time
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 51
Defining the Model
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 52
Training the Model
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 53
Evaluating the Results
Hypothesis:
rent = 911 + 27 x area + 6103 x bedrooms + 840 x furnished
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 54
R-Squared (Coefficient of Determination)
If,
are the actual values,
are the predicted values,
is the mean of the actual values,
Then,
= 1-
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 55
Mean Squared Error, Mean Absolute Error
predictions = model.predict(trainX)
mse = mean_squared_error (trainY, predictions)
rmse = np.sqrt (mse)
print ('RMSE on training data: ', rmse)
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 56
Predictions for User-Supplied Data
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 57
The Normal Equation Method
• x= y=
mxn mx1
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 58
Normal Equation Method
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 59
The LinearRegression Class
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 60
Gradient Descent
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 61
Lab: Manual Regression, Using Spreadsheet
• Step 1:
put in formula for cost, play around with values of and , and see the
impact on average cost.
with the same formula for cost, and a fixed value for , and varying
values , see the impact on average cost in data table.
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 62
Gradient Descent
Avg cost
900
800
700
600
500
Avg cost
400
300
200
100
0
20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 63
Gradient Descent
Image source:
https://stackoverflow.com/questions/64940632/how-to-illustrate-a-3d-graph-of-gradient-descent-using-python-matplotlib
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 64
Gradient Descent
Hypothesis
Cost function
Objective
Minimize J
Algorithm
Start with some values of and (say = 0, = 0)
Repeat until convergence {
simultaneously update
(for j = 0 and j = 1)
}
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 65
What are and
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 66
Simultaneous Updates
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 67
Lab: Gradient Descent using MS Excel
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 68
Gradient Descent Results
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 69
Gradient Descent Results
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 70
Gradient Descent Results
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 71
Learning Rate ()
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 72
Impact of Alpha on J
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 73
Gradient Descent Variations
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 74
Comparison
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 75
SGDRegressor
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 76
Using SGDRegressor for Stochastic GD
model = SGDRegressor()
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 77
Training the Model
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 78
Training the Model
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 79
Some of the Model Parameters
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 80
Making Gradient Descent Work Well
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 81
Making Gradient Descent Work Well
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 82
Tweaking Learning Rate
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 83
Making Gradient Descent Work Well
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 84
Day 1 Review Questions
Pragati Software Pvt. Ltd., 312, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 85