© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Introduction to ML and Decision Tree
Suman Debnath
Principal Developer Advocate | AWS
“AI is the new ‘Electricity’.”
Before We Begin...
• Artificial Intelligence – Basically a computer program doing something “smart”
  – A bunch of if-then statements
  – Machine Learning
• Machine Learning (subset of AI) – A broad umbrella term for technology that finds patterns in your existing data and uses them to make predictions on new data points
  – Fraud detection
  – Deep Learning
• Deep Learning (subset of ML) – Uses deep neural networks (a shallow network has one hidden layer, a deep network has more than one) to learn features of the data in a hierarchical manner (e.g. pixels from one layer recombine to form a line in the next layer)
  – Computer vision
  – Speech recognition
  – Natural language processing
AI | ML | DL – Maybe a picture is better?
Great Resource:
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Timeline Of Machine Learning
1950 – The Learning Machine (Alan Turing)
1952 – Machine playing checkers (Arthur Samuel)
1957 – Perceptron (Frank Rosenblatt)
1979 – Stanford Cart
1986 – Backpropagation (D. Rumelhart, G. Hinton, R. Williams)
1997 – Deep Blue beats Kasparov
2011 – Watson wins Jeopardy
2012 – Google NN recognizes a cat on YouTube
2014 – Facebook DeepFace, Amazon Echo, Turing Test passed
2016 – DeepMind wins at Go
Explosion in AI and ML Use Cases
Image recognition and tagging for photo organization
Object detection, tracking and navigation for Autonomous Vehicles
Speech recognition & synthesis in Intelligent Voice Assistants
Algorithmic trading strategy performance improvement
Sentiment analysis for targeted advertisements
43,252,003,274,489,856,000
43 QUINTILLION UNIQUE COMBINATIONS
A scrambled cube (e.g. F2 U' R' L F2 R L' U') goes into a learning function, which proposes a solving sequence.
At first the learning function is only 1–2% accurate; with feedback it improves through 20%, 40%, 60%, 80%, and finally 95% accuracy.
Given a new scramble (F2 R F R′ B′ D F D′ B D F):
SOLVED IN 0.9 SECONDS
Don’t code the patterns
Let the system
Learn Through Data
We Call This Approach Machine Learning
Types Of Machine Learning
Supervised Learning (depends on labeled datasets)
“Baby?” – “No, it’s a Labrador.”
Supervised Learning – How Machines Learn
Human intervention and validation required (e.g. photo classification and tagging)
Training data: Input + Label (“Labrador”) → Machine Learning Algorithm → Prediction (“Cat”) → compared against the label (“Labrador”) → Adjust Model
Unsupervised Learning (learning without labels)
No human intervention required (e.g. customer segmentation)
Input → Machine Learning Algorithm → Prediction
Machine Learning Use Cases
Supervised Learning
• Classification
  – Spam detection
  – Customer churn prediction
• Regression
  – House price prediction
  – Demand forecasting
Unsupervised Learning
• Clustering
  – Customer segmentation
There are other types as well
(Reinforcement Learning, for example)
but these two are the primary areas today
There are Lots of Machine Learning Algorithms
machinelearningmastery.com
Color Size Fruit
Red Big Apple
Red Small Apple
Yellow Small Lemon
Red Big Apple
Green Big Apple
Yellow Big Lemon
Green Small Lemon
Red Big Apple
Yellow Big Lemon
Green Big Apple
Input features: Color, Size | Target label: Fruit
Some Dataset
A Decision Tree might look like this:
Root: Color of the fruit?
  – Red → Apple (Leaf)
  – Yellow → Lemon (Leaf)
  – Green → Size of the fruit? (another split)
      – Big → Apple (Leaf)
      – Small → Lemon (Leaf)
The top node is the Root, the outgoing edges are Branches, each question is a Splitting node, and the terminal nodes are Leaves.
But the question is… given a dataset, how can we build a tree like this?
General DT structure
A Root node at the top, Interior (decision) nodes beneath it, and Leaf nodes terminating every path.
Training flow of a Decision Tree
• Prepare the labelled dataset
• Pick the best feature as the root node
• Grow the tree until a stopping criterion is met
• Pass the prediction query through the tree until we arrive at a leaf
• Once we reach the leaf node, we have the prediction!! :) (a minimal code sketch follows)
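As a concrete illustration of that flow, here is a minimal sketch using pandas and scikit-learn on the fruit example above; the library choice and the one-hot encoding step are assumptions on top of the slides, not the speaker's exact code.

```python
# Minimal sketch: train a decision tree on the fruit dataset, then query it.
# Assumes pandas and scikit-learn are installed.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "Color": ["Red", "Red", "Yellow", "Red", "Green",
              "Yellow", "Green", "Red", "Yellow", "Green"],
    "Size":  ["Big", "Small", "Small", "Big", "Big",
              "Big", "Small", "Big", "Big", "Big"],
    "Fruit": ["Apple", "Apple", "Lemon", "Apple", "Apple",
              "Lemon", "Lemon", "Apple", "Lemon", "Apple"],
})

X = pd.get_dummies(data[["Color", "Size"]])   # one-hot encode the categorical features
y = data["Fruit"]                             # target label

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)   # grow the tree

# Pass a prediction query (a green, small fruit) through the tree.
query = pd.get_dummies(pd.DataFrame([{"Color": "Green", "Size": "Small"}]))
query = query.reindex(columns=X.columns, fill_value=0)
print(tree.predict(query))   # expected to be 'Lemon' for this training data
```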
Training data: Feature 1, Feature 2, Feature 3, Feature 4, Target Label – everything is known
Prediction data: Feature 1 to Feature 4 are known, the Target Label is UNKNOWN (???)
Send the query/inference through the trained tree (Root → Interior nodes → Leaf) and get the prediction.
Math behind Decision Tree
• Entropy
• Information Gain (IG)
Entropy
Entropy is the notion of the impurity of the data – how mixed the class labels are. A set with only one class is pure, a set dominated by one class is less pure, and an evenly mixed set is impure.
Entropy
H(x) = − ∑ P(k) · log2(P(k)), with k ranging from 1 through n
H(x) = entropy of x
P(k) = probability of the random variable x when x = k
Outlook Temperature Humidity Windy Play ball
Rainy Hot High FALSE No
Rainy Hot High TRUE No
Overcast Hot High FALSE Yes
Sunny Mild High FALSE Yes
Sunny Cool Normal FALSE Yes
Sunny Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Rainy Mild High FALSE No
Rainy Cool Normal FALSE Yes
Sunny Mild Normal FALSE Yes
Rainy Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Sunny Mild High TRUE No
Dataset – D
X = “Play Ball”
P(k=Yes) => 9/14 = 0.64
P(k=No) => 5/14 = 0.36
log2 (0.64) = -0.64
log2 (0.36) = -1.47
H(x) = - ∑ P(k) * log2(P(k))
H(x) = - [P(k=Yes) * log2(P(k=Yes)) + P(k=No) * log2(P(k=No))]
H(x) = - [(0.64 * log2 (0.64) + 0.36 * log2(0.36))]
H(x) = 0.94
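A quick numeric check of the same calculation in Python (a minimal sketch; the counts 9 Yes / 5 No come from the table above):

```python
# Entropy of the "Play Ball" column in dataset D (9 Yes, 5 No).
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

print(round(entropy(["Yes"] * 9 + ["No"] * 5), 2))  # 0.94
```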
Information Gain (IG)
Split Dataset D on Outlook:

Sub-Dataset – D1 (Outlook = Rainy)
Outlook Temperature Humidity Windy Play ball
Rainy Hot High FALSE No
Rainy Hot High TRUE No
Rainy Mild High FALSE No
Rainy Cool Normal FALSE Yes
Rainy Mild Normal TRUE Yes
HD1(“Play Ball”) = 0.97 (weight 5/14)

Sub-Dataset – D2 (Outlook = Overcast)
Outlook Temperature Humidity Windy Play ball
Overcast Hot High FALSE Yes
Overcast Cool Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
HD2(“Play Ball”) = 0 (weight 4/14)

Sub-Dataset – D3 (Outlook = Sunny)
Outlook Temperature Humidity Windy Play ball
Sunny Mild High FALSE Yes
Sunny Cool Normal FALSE Yes
Sunny Cool Normal TRUE No
Sunny Mild Normal FALSE Yes
Sunny Mild High TRUE No
HD3(“Play Ball”) = 0.97 (weight 5/14)

Weighted Entropy = (5/14) × 0.97 + (4/14) × 0 + (5/14) × 0.97 = 0.69

IG(Outlook) = Entropy(D) − Weighted Entropy
            = 0.94 − 0.69
            = 0.25
Maximum IG? → Outlook

IG(Outlook)     = HD(“Play Ball”) − Weighted Entropy after splitting the dataset on Outlook
                = 0.94 − 0.69 = 0.25
IG(Temperature) = HD(“Play Ball”) − Weighted Entropy after splitting the dataset on Temperature
                = 0.94 − 0.91 = 0.03
IG(Humidity)    = HD(“Play Ball”) − Weighted Entropy after splitting the dataset on Humidity
                = 0.94 − 0.79 = 0.15
IG(Windy)       = HD(“Play Ball”) − Weighted Entropy after splitting the dataset on Windy
                = 0.94 − 0.90 = 0.04
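The same weighted-entropy and IG arithmetic can be checked in code; a minimal sketch over dataset D (the helper names are illustrative, not from the slides):

```python
# Minimal sketch: weighted entropy and Information Gain over dataset D.
# The rows below are the "Play Ball" table from the slides.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

# (Outlook, Temperature, Humidity, Windy, Play ball)
D = [
    ("Rainy", "Hot", "High", False, "No"),       ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),   ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),   ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),   ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Sunny", "Mild", "High", True, "No"),
]

def information_gain(rows, feature_idx):
    parent = entropy([r[-1] for r in rows])           # H_D("Play Ball") = 0.94
    weighted = 0.0
    for value in set(r[feature_idx] for r in rows):
        subset = [r[-1] for r in rows if r[feature_idx] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return parent - weighted

for idx, name in enumerate(["Outlook", "Temperature", "Humidity", "Windy"]):
    print(name, round(information_gain(D, idx), 2))   # Outlook is the largest, ~0.25
```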
Here are the algorithmic steps:
1. First, the entropy of the total dataset is calculated for the target label/class.
2. The dataset is then split on each feature:
   a) The entropy of each branch is calculated, then added proportionally to get the total weighted entropy for the split.
   b) The resulting entropy is subtracted from the entropy before the split.
   c) The result is the Information Gain.
3. The feature that yields the largest IG is chosen for the decision node.
4. Steps #2 and #3 are repeated for each subset of the data (for each internal node) until:
   a) all the features are exhausted, or
   b) the stopping criteria are met (see the sketch below).
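These steps translate into a compact recursive sketch (ID3-style, categorical features only; an illustration of the steps above, not a production implementation):

```python
# Recursive sketch of the steps above: pick the highest-IG feature, split, repeat.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, idx):
    """IG of splitting `rows` (tuples ending in the label) on column `idx`."""
    parent = entropy([r[-1] for r in rows])
    weighted = sum(
        len(sub) / len(rows) * entropy([r[-1] for r in sub])
        for v in set(r[idx] for r in rows)
        for sub in [[r for r in rows if r[idx] == v]]
    )
    return parent - weighted

def build_tree(rows, features):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not features:          # step 4: stopping criteria
        return Counter(labels).most_common(1)[0][0]    # leaf = majority label
    best = max(features, key=lambda idx: information_gain(rows, idx))  # steps 2-3
    rest = [f for f in features if f != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest)
                   for v in set(r[best] for r in rows)}}   # recurse per branch
```

Calling build_tree on the Play Ball rows with feature indices [0, 1, 2, 3] would put Outlook (index 0) at the root, matching the IG comparison above.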
Thankfully, we do not have to do all of this by hand (calculating entropy, IG, etc.); there are plenty of Python libraries/packages we can use to solve a problem with a decision tree.
Can you please show the CODE?
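For example, with scikit-learn (a minimal sketch; pandas/scikit-learn availability and the one-hot encoding are assumptions on top of the slides):

```python
# Minimal sketch: a scikit-learn decision tree on the "Play Ball" dataset.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame(
    [("Rainy", "Hot", "High", False, "No"),
     ("Rainy", "Hot", "High", True, "No"),
     ("Overcast", "Hot", "High", False, "Yes"),
     ("Sunny", "Mild", "High", False, "Yes"),
     ("Sunny", "Cool", "Normal", False, "Yes"),
     ("Sunny", "Cool", "Normal", True, "No"),
     ("Overcast", "Cool", "Normal", True, "Yes"),
     ("Rainy", "Mild", "High", False, "No"),
     ("Rainy", "Cool", "Normal", False, "Yes"),
     ("Sunny", "Mild", "Normal", False, "Yes"),
     ("Rainy", "Mild", "Normal", True, "Yes"),
     ("Overcast", "Mild", "High", True, "Yes"),
     ("Overcast", "Hot", "Normal", False, "Yes"),
     ("Sunny", "Mild", "High", True, "No")],
    columns=["Outlook", "Temperature", "Humidity", "Windy", "PlayBall"],
)

X = pd.get_dummies(df.drop(columns="PlayBall"))    # one-hot encode the features
y = df["PlayBall"]

model = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(model, feature_names=list(X.columns)))   # inspect the learned splits
```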
AWS ML Stack
Broadest and most complete set of Machine Learning capabilities

AI SERVICES
  Vision: Amazon Rekognition
  Speech: Amazon Polly, Amazon Transcribe (+ Medical)
  Text: Amazon Textract, Amazon Comprehend, Amazon Translate
  Chatbots: Amazon Lex
  Personalization: Amazon Personalize
  Forecasting: Amazon Forecast

AMAZON SAGEMAKER
  Ground Truth data labelling, ML Marketplace, SageMaker Studio IDE
  SageMaker Notebooks, SageMaker Experiments, SageMaker Debugger, SageMaker Autopilot, SageMaker Model Monitor
  Model training, Model tuning, Model hosting, Built-in algorithms, SageMaker Neo

ML FRAMEWORKS & INFRASTRUCTURE
  Deep Learning AMIs & Containers, GPUs and CPUs, Inferentia, Elastic Inference, FPGA
Amazon SageMaker
• Jupyter notebooks
• Supports JupyterLab
• Multiple built-in kernels
• Bring your own kernels
• Integration with Git
• Sample notebooks
Reference
Blog: An Introduction to Decision Tree and Ensemble Methods – Part 1
Code: Repository
Stay Connected
/suman-d /_sumand
Suman Debnath
Principal Developer Advocate
ml.aws
Backup
Amazon SageMaker
Accessing the Internet and other resources from your SageMaker Notebook
(Architecture diagram: the notebook’s ENI sits in a customer VPC with public and private subnets, reaching the Internet through an Internet Gateway/NAT, reaching on-premises resources through a VPN Gateway, and reaching the SageMaker endpoint inside the Amazon VPC.)
Amazon SageMaker
Permissions Model
(Diagram: a User with an IAM Role calls CreateNotebookInstance, CreateHyperParameterTuningJob, CreateTrainingJob, and CreateModel; SageMaker passes an execution role and works with services such as Amazon EC2, Amazon Elastic Container Registry, Amazon CloudWatch, AWS CodeCommit, Amazon Simple Storage Service, AWS RoboMaker, AWS Lambda, AWS Key Management, and AWS Identity Management.)
Model Training
1. Split the labelled dataset: 70% training data, 30% test data
2. Train with the training data → trial model
3. Hold out the test data
4. Model evaluation: run the trial model on the test data → evaluation result
5. Performance measurement: compare predictions with the known labels → model accuracy
(a minimal code sketch follows)
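A minimal sketch of that flow with scikit-learn (the iris dataset stands in for “all labelled data”; it is not part of the original slides):

```python
# Minimal sketch of the train/evaluate flow: 70/30 split, fit, then measure accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                    # any labelled dataset works here
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)           # 70% training, 30% test

trial_model = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)
predictions = trial_model.predict(X_test)            # evaluate on the held-out test data
print("Accuracy:", accuracy_score(y_test, predictions))
```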
