Outlook Temp Humidity Windy Play

The document contains weather and play data for 14 days. It analyzes the data using entropy, information gain and gain ratio to determine the attribute that best splits the data for predicting whether play is possible. On the full data set Outlook has the highest information gain (0.247), followed by Humidity (0.152), Windy (0.048) and Temperature (0.029), so Outlook is the first split. The same data is then split with the Gini index as used by CART.

Uploaded by

Sarfaraz

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      0      No
Sunny     Hot   High      1      No
Overcast  Hot   High      0      Yes
Rainy     Mild  High      0      Yes
Rainy     Cool  Normal    0      Yes
Rainy     Cool  Normal    1      No
Overcast  Cool  Normal    1      Yes
Sunny     Mild  High      0      No
Sunny     Cool  Normal    0      Yes
Rainy     Mild  Normal    0      Yes
Sunny     Mild  Normal    1      Yes
Overcast  Mild  High      1      Yes
Overcast  Hot   Normal    0      Yes
Rainy     Mild  High      1      No

Total entropy E(S) = 0.940286

Number of Yes (play): 9    P(Yes) = 9/14 = 0.642857
Number of No (play):  5    P(No)  = 5/14 = 0.357143

TEMP      Days  Play    Entropy
HOT       4     2Y,2N   1
MILD      6     4Y,2N   0.918295834054
COOL      4     3Y,1N   0.811278124459
Info = 0.91106339301168, E(S) = 0.940286, Gain = 0.029223, Split info = 1.556657, Gain ratio = 0.018773

HUMIDITY  Days  Play    Entropy
HIGH      7     3Y,4N   0.985228136034
NORMAL    7     6Y,1N   0.591672778582
Info = 0.78845045730829, E(S) = 0.940286, Gain = 0.151836, Split info = 1, Gain ratio = 0.151836

WINDY     Days  Play    Entropy
1         6     3Y,3N   1
0         8     6Y,2N   0.811278124459
Info = 0.89215892826236, E(S) = 0.940286, Gain = 0.048127, Split info = 0.985228, Gain ratio = 0.048849

OUTLOOK   Days  Play    Entropy
Sunny     5     2Y,3N   0.970950594455
Overcast  4     4Y      0
Rainy     5     3Y,2N   0.970950594455
Info = 0.69353613889619, E(S) = 0.940286, Gain = 0.24675, Split info = 1.577406, Gain ratio = 0.156428

Outlook has the largest gain and gain ratio, so it is the first split. The Overcast branch is already pure (4Y); the Rainy and Sunny branches are analyzed next.
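The entropy, gain and gain ratio figures for the full 14-day table can be reproduced with a short script. This is a minimal sketch, not the spreadsheet's own formulas: the names `ROWS`, `COLUMNS`, `entropy` and `gain_and_ratio` are made up for illustration, and the data is retyped from the table in this document.

```python
from collections import Counter
from math import log2

# The 14-day table from this document: (Outlook, Temp, Humidity, Windy, Play).
ROWS = [
    ("Sunny", "Hot", "High", 0, "No"),
    ("Sunny", "Hot", "High", 1, "No"),
    ("Overcast", "Hot", "High", 0, "Yes"),
    ("Rainy", "Mild", "High", 0, "Yes"),
    ("Rainy", "Cool", "Normal", 0, "Yes"),
    ("Rainy", "Cool", "Normal", 1, "No"),
    ("Overcast", "Cool", "Normal", 1, "Yes"),
    ("Sunny", "Mild", "High", 0, "No"),
    ("Sunny", "Cool", "Normal", 0, "Yes"),
    ("Rainy", "Mild", "Normal", 0, "Yes"),
    ("Sunny", "Mild", "Normal", 1, "Yes"),
    ("Overcast", "Mild", "High", 1, "Yes"),
    ("Overcast", "Hot", "Normal", 0, "Yes"),
    ("Rainy", "Mild", "High", 1, "No"),
]
COLUMNS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}


def entropy(values):
    """Shannon entropy (base 2) of a list of labels."""
    total = len(values)
    return sum(-(n / total) * log2(n / total) for n in Counter(values).values())


def gain_and_ratio(rows, col):
    """Return (information gain, split info, gain ratio) for one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    # "Info" in the tables: weighted entropy of Play after the split.
    info = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[-1] for row in rows]) - info
    # Split info is the entropy of the attribute's own value distribution.
    split_info = entropy([row[col] for row in rows])
    return gain, split_info, gain / split_info


for name, col in COLUMNS.items():
    gain, split_info, ratio = gain_and_ratio(ROWS, col)
    print(f"{name:9s} gain={gain:.6f} split_info={split_info:.6f} ratio={ratio:.6f}")
```

The sketch divides by the split info without a guard, which is safe here because every attribute takes at least two values in the full table; a subset where an attribute is constant would need a check.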
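Subset: Outlook = Rainy (5 days, 3Y,2N)

Outlook  Temp  Humidity  Windy  Play
Rainy    Mild  High      0      Yes
Rainy    Cool  Normal    0      Yes
Rainy    Cool  Normal    1      No
Rainy    Mild  Normal    0      Yes
Rainy    Mild  High      1      No

TEMP      Days  Play    Entropy
HOT       0     -       0
MILD      3     2Y,1N   0.92
COOL      2     1Y,1N   1
Info = 0.952, E(S) = 0.97, Gain = 0.018, Split info = 0.970951, Gain ratio = 0.018539

HUMIDITY  Days  Play    Entropy
HIGH      2     1Y,1N   1
NORMAL    3     2Y,1N   0.92
Info = 0.952, E(S) = 0.97, Gain = 0.018, Split info = 0.970951, Gain ratio = 0.018539

WINDY     Days  Play    Entropy
0         3     3Y      0
1         2     2N      0
Info = 0, E(S) = 0.97, Gain = 0.97, Split info = 0.970951, Gain ratio = 0.999021

Windy separates the Rainy subset perfectly (not windy: play, windy: no play), so it is the split for this branch.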
Subset: Outlook = Sunny (5 days, 2Y,3N)

Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      0      No
Sunny    Hot   High      1      No
Sunny    Mild  High      0      No
Sunny    Cool  Normal    0      Yes
Sunny    Mild  Normal    1      Yes

TEMP      Days  Play    Entropy
HOT       2     2N      0
MILD      2     1Y,1N   1
COOL      1     1Y      0
Info = 0.4, E(S) = 0.970951, Gain = 0.570951, Split info = 1.521928, Gain ratio = 0.37515

HUMIDITY  Days  Play    Entropy
HIGH      3     3N      0
NORMAL    2     2Y      0
Info = 0, E(S) = 0.970951, Gain = 0.970951, Split info = 0.970951, Gain ratio = 1

WINDY     Days  Play    Entropy
1         2     1Y,1N   1
0         3     1Y,2N   0.918296
Info = 0.950978, E(S) = 0.970951, Gain = 0.019973, Split info = 0.970951, Gain ratio = 0.020571
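The per-branch analysis repeats the same gain calculation on each Outlook subset and keeps the attribute with the largest gain. The sketch below illustrates that step under stated assumptions: `DATA`, `ATTRS`, `info_gain` and `best_attribute` are hypothetical names, Windy is kept as the strings "0" and "1", and ties are broken by the order of `ATTRS`, which the spreadsheet does not specify.

```python
from collections import Counter, defaultdict
from math import log2

# Rows retyped from this document: Outlook Temp Humidity Windy Play.
DATA = """
Sunny Hot High 0 No
Sunny Hot High 1 No
Overcast Hot High 0 Yes
Rainy Mild High 0 Yes
Rainy Cool Normal 0 Yes
Rainy Cool Normal 1 No
Overcast Cool Normal 1 Yes
Sunny Mild High 0 No
Sunny Cool Normal 0 Yes
Rainy Mild Normal 0 Yes
Sunny Mild Normal 1 Yes
Overcast Mild High 1 Yes
Overcast Hot Normal 0 Yes
Rainy Mild High 1 No
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]
ATTRS = {"Temp": 1, "Humidity": 2, "Windy": 3}  # Outlook is already used


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of Play minus the weighted entropy after splitting on col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


def best_attribute(rows):
    """Name of the attribute with the largest information gain."""
    return max(ATTRS, key=lambda name: info_gain(rows, ATTRS[name]))


for outlook in ("Rainy", "Sunny"):
    branch = [row for row in ROWS if row[0] == outlook]
    gains = {name: round(info_gain(branch, col), 6) for name, col in ATTRS.items()}
    print(outlook, gains, "-> split on", best_attribute(branch))
```

Run on the two subsets, this selects Windy for the Rainy branch and Humidity for the Sunny branch, each of which leaves only pure groups.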
CART (Classification & Regression Tree)

CART creates a binary tree.

Formula
Gini(D) = 1 - P(play)^2 - P(no play)^2
Gini A(D) = (D1/D) x Gini(D1) + (D2/D) x Gini(D2), where D1 and D2 are the sizes of the two groups
Delta Gini(A) = Gini(D) - Gini A(D)

Data: the same 14-day table (Outlook, Temp, Humidity, Windy, Play) used above.

Gini(D) = 0.459184

Binary splits on Outlook

Group 1            Days  Play    Group 2   Days  Play    Gini A    Delta Gini A
(Overcast, Rainy)  9     7Y,2N   Sunny     5     2Y,3N   0.393651  0.065533
(Sunny, Overcast)  9     6Y,3N   Rainy     5     3Y,2N   0.457143  0.002041
(Sunny, Rainy)     10    5Y,5N   Overcast  4     4Y,0N   0.357143  0.102041
Gini index for every candidate binary split (same 14 rows, sorted by Outlook in the sheet)

Gini(D) = 0.459184

Attribute  Group 1            Days  Play    Group 2   Days  Play    Gini(attribute)  Delta Gini(attribute)
Temp       (Hot, Mild)        10    6Y,4N   Cool      4     3Y,1N   0.45             0.009184
Temp       (Hot, Cool)        8     5Y,3N   Mild      6     4Y,2N   0.4583333333     0.00085
Temp       (Mild, Cool)       10    7Y,3N   Hot       4     2Y,2N   0.4428571429     0.016327
Humidity   High               7     3Y,4N   Normal    7     6Y,1N   0.3673469388     0.091837
Windy      0                  8     6Y,2N   1         6     3Y,3N   0.4285714286     0.030612
Outlook    (Sunny, Overcast)  9     6Y,3N   Rainy     5     3Y,2N   0.457143         0.002041
Outlook    (Sunny, Rainy)     10    5Y,5N   Overcast  4     4Y,0N   0.357143         0.102041

The original sheet shows 0 and 0.459184 for Humidity; recomputing from the counts gives 7/14 x 0.489796 + 7/14 x 0.244898 = 0.367347 and a Delta Gini of 0.091837. The largest Delta Gini is still the Outlook split (Sunny, Rainy) versus Overcast, so the data is divided into Group 1 (Sunny, Rainy) and Group 2 (Overcast).
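The Gini figures for each candidate binary split can be checked with a short script. This is a sketch under stated assumptions rather than the method used in the spreadsheet: `DATA`, `COLUMNS`, `gini`, `binary_splits` and `split_gini` are illustrative names, and Windy is stored as the strings "0" and "1".

```python
from itertools import combinations

# Rows retyped from this document: Outlook Temp Humidity Windy Play.
DATA = """
Sunny Hot High 0 No
Sunny Hot High 1 No
Overcast Hot High 0 Yes
Rainy Mild High 0 Yes
Rainy Cool Normal 0 Yes
Rainy Cool Normal 1 No
Overcast Cool Normal 1 Yes
Sunny Mild High 0 No
Sunny Cool Normal 0 Yes
Rainy Mild Normal 0 Yes
Sunny Mild Normal 1 Yes
Overcast Mild High 1 Yes
Overcast Hot Normal 0 Yes
Rainy Mild High 1 No
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]
COLUMNS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def binary_splits(values):
    """Yield each way to cut the values into two non-empty groups, once."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Pinning the first value to the left group avoids mirrored duplicates.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            yield {first, *extra}


def split_gini(rows, col, left_values):
    """Size-weighted Gini impurity of the two groups produced by a split."""
    left = [row[-1] for row in rows if row[col] in left_values]
    right = [row[-1] for row in rows if row[col] not in left_values]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


base = gini([row[-1] for row in ROWS])
for name, col in COLUMNS.items():
    for left in binary_splits({row[col] for row in ROWS}):
        g = split_gini(ROWS, col, left)
        print(f"{name:9s} {sorted(left)} vs rest: gini={g:.6f} delta={base - g:.6f}")
```

An attribute with three values yields three binary splits and a two-valued attribute yields one, which matches the rows of the table.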
Group 1 (Outlook = Sunny or Rainy)

Outlook  Temp  Humidity  Windy  Play
Rainy    Mild  High      0      Yes
Rainy    Cool  Normal    0      Yes
Rainy    Cool  Normal    1      No
Rainy    Mild  Normal    0      Yes
Rainy    Mild  High      1      No
Sunny    Hot   High      0      No
Sunny    Hot   High      1      No
Sunny    Mild  High      0      No
Sunny    Cool  Normal    0      Yes
Sunny    Mild  Normal    1      Yes

Gini(D): 10 days, 5Y,5N, 0.5

Attribute  Group 1       Days  Play    Group 2  Days  Play    Gini(attribute)  Delta Gini(attribute)
Outlook    Sunny         5     2Y,3N   Rainy    5     3Y,2N   0.48             0.02
Temp       (Hot, Mild)   7     3Y,4N   Cool     3     2Y,1N   0.47619047619    0.023809523809524
Temp       (Hot, Cool)   5     2Y,3N   Mild     5     3Y,2N   0.48             0.02
Temp       (Cool, Mild)  8     5Y,3N   Hot      2     2N      0.375            0.125
Humidity   High          5     1Y,4N   Normal   5     4Y,1N   0.32             0.18
Windy      0             6     4Y,2N   1        4     1Y,3N   0.41666666667    0.083333333333333

Humidity has the largest Delta Gini (0.18), so Group 1 is split on Humidity next.

Group 2 (Outlook = Overcast)

Outlook   Temp  Humidity  Windy  Play
Overcast  Hot   High      0      Yes
Overcast  Cool  Normal    1      Yes
Overcast  Mild  High      1      Yes
Overcast  Hot   Normal    0      Yes

All four days in Group 2 are Yes, so this group needs no further split.
Group 1, Humidity = High

Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      0      No
Sunny    Hot   High      1      No
Rainy    Mild  High      0      Yes
Sunny    Mild  High      0      No
Rainy    Mild  High      1      No

Gini(D): 5 days, 1Y,4N, 0.32

Attribute  Group 1  Days  Play    Group 2  Days  Play    Gini(attribute)  Delta Gini(attribute)
Outlook    Sunny    3     0Y,3N   Rainy    2     1Y,1N   0.2              0.12
Temp       Hot      2     0Y,2N   Mild     3     1Y,2N   0.2666666667     0.053333
Windy      0        3     1Y,2N   1        2     0Y,2N   0.2666666667     0.053333

Group 1, Humidity = Normal

Outlook  Temp  Humidity  Windy  Play
Rainy    Cool  Normal    0      Yes
Rainy    Cool  Normal    1      No
Sunny    Cool  Normal    0      Yes
Rainy    Mild  Normal    0      Yes
Sunny    Mild  Normal    1      Yes

The main objective is to find homogeneity: each split should leave groups that are as pure as possible.
Advantages of Decision Tree
1 Easy classification and data interpretation
2 Depicts the most suitable option among the alternatives considered
3 Doesn't require normalization of data
4 Doesn't require scaling of data either
5 Compared to other algorithms, decision trees require less effort for data preparation during pre-processing
6 Missing values in the data do not affect the process of building a decision tree to any considerable extent
7 A decision tree model is very intuitive and easy to explain to technical teams as well as stakeholders

Disadvantages of Decision Tree Analysis
1 Inappropriate for excessive data
2 Difficult to handle numerous outcomes
3 Chances of classification error
4 Impact of variance: small changes in the data can change the tree
5 Less suitable for continuous variables
6 Sensitive to bias
7 Expensive process
Individual  Variable 1  Variable 2
A           1           1
B           1.5         2
C           3           4
D           5           7
E           3.5         5
F           4.5         5
G           3.5         4

Step 1: Select k = 2 and choose 2 individuals at random as the initial centroids.

         Individual  Mean vector
Group 1  A           (1, 1)
Group 2  G           (3.5, 4)

Distance of each individual from the initial centroids

Individual  Centroid 1 (1, 1)  Centroid 2 (3.5, 4)
A           0                  3.9051248379533
B           1.11803398875      2.8284271247462
C           3.605551275464     0.5
D           7.211102550928     3.3541019662497
E           4.716990566028     1
F           5.315072906367     1.4142135623731
G           3.905124837953     0

Each individual joins the group of its nearest centroid.

Group 1: A, B
Group 2: C, D, E, F, G

New centroids (mean of each group)

         X     Y
Group 1  1.25  1.5
Group 2  3.9   5

Distance of each individual from the new centroids

Individual  Centroid 1 (1.25, 1.5)  Centroid 2 (3.9, 5)
A           0.559016994374947       4.94064773081425
B           0.559016994374947       3.84187454245971
C           3.05163890393343        1.34536240470737
D           6.65676347784717        2.28254244210267
E           4.16082924427331        0.4
F           4.7762432936357         0.6
G           3.36340601176843        1.0770329614269

Since the individuals in both groups remain the same after this step, the assignment is final: A and B are in cluster 1, and C, D, E, F and G are in cluster 2.
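The clustering steps above (assign each individual to its nearest centroid, recompute the centroids as group means, stop when the groups no longer change) follow the usual k-means procedure and can be sketched as below. The names `POINTS`, `assign`, `update` and `kmeans` are illustrative; the sketch starts from A and G as initial centroids, uses Euclidean distance, and assumes no group ever becomes empty.

```python
from math import dist

# The seven individuals from the table: name -> (Variable 1, Variable 2).
POINTS = {
    "A": (1.0, 1.0), "B": (1.5, 2.0), "C": (3.0, 4.0), "D": (5.0, 7.0),
    "E": (3.5, 5.0), "F": (4.5, 5.0), "G": (3.5, 4.0),
}


def assign(points, centroids):
    """Map each individual to the index of its nearest centroid."""
    return {
        name: min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for name, p in points.items()
    }


def update(points, assignment, k):
    """Recompute each centroid as the mean vector of its members."""
    centroids = []
    for i in range(k):
        members = [points[name] for name, c in assignment.items() if c == i]
        # Assumption: every group keeps at least one member.
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids


def kmeans(points, centroids):
    """Alternate assignment and update until the groups stop changing."""
    assignment = assign(points, centroids)
    while True:
        centroids = update(points, assignment, len(centroids))
        new_assignment = assign(points, centroids)
        if new_assignment == assignment:
            return assignment, centroids
        assignment = new_assignment


assignment, centroids = kmeans(POINTS, [POINTS["A"], POINTS["G"]])
for name in sorted(assignment):
    print(name, "-> cluster", assignment[name] + 1)
print("centroids:", centroids)
```

A different choice of initial individuals can lead to different clusters, which is why the random selection in Step 1 matters.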
