Week 7 - Graded
S3: The number of data-points that we have to store increases as the size of the training dataset increases.
S4: The number of data-points that we have to store is independent of the value of $k$.
Options
(a)
(b)
(c)
(d)
(e)
Answer
(b)
Solution
The entire training dataset has to be stored in memory. For predicting the label of a test-point, we have to perform the following steps:
1. Compute the distance between the test-point and every point in the training dataset.
2. Find the $k$ training points that are closest to the test-point.
3. Assign the majority label among these $k$ neighbors to the test-point.
Since every training point may be needed for these computations, the number of stored data-points grows with the size of the training dataset and does not depend on $k$.
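A minimal sketch of these steps, assuming a training matrix X_train with one point per row, a label vector y_train and a single test-point x_test (these names and the data below are illustrative, not from the question):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k):
    # Step 1: Euclidean distance from the test-point to every training point
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Step 2: indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Step 3: majority label among these k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 2], [2, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))   # prints 0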
Question-2
Statement
How should we recolor the black train point if the test point is classified as "red" without any uncertainty by a $k$-NN classifier, with $k = 4$? Use the Euclidean distance metric for computing distances.
Options
(a)
blue
(b)
red
(c)
Insufficient information
Answer
(b)
Solution
Since we are looking at the $k$-NN algorithm with $k = 4$, we need to look at the four nearest neighbors of the test data-point. The four points from the training dataset that are closest to the test data-point are the following:
one black point
one blue point
two red points
Each of them is at unit distance from the test data-point. From the problem statement, it is given
that the test data-point is classified as "red" without any uncertainty. Let us now consider two
scenarios that concern the black training data-point:
Scenario 1: the black point is recolored red. There are then three red neighbors and one blue neighbor. Therefore, the test data-point will be classified as red. There is no uncertainty in the classification. This is what we want. However, for the sake of completeness, let us look at the alternative possibility.
Scenario 2: the black point is recolored blue. There will then be exactly two neighbors that are blue and two that are red. In such a scenario, we cannot classify the test-point without any uncertainty. That is, we could call it either red or blue.
This is one of the reasons why we choose an odd value of $k$ for the $k$-NN algorithm. If $k$ is odd, then this kind of tie between the two classes can be avoided.
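As a small illustration of the tie issue, a majority vote over an even number of neighbors can split evenly, while an odd number of neighbors over two classes cannot (the labels below are made up for the example, not taken from the figure):

from collections import Counter

print(Counter(["red", "red", "blue", "blue"]).most_common())   # [('red', 2), ('blue', 2)] -> a tie
print(Counter(["red", "red", "blue"]).most_common())           # [('red', 2), ('blue', 1)] -> no tie possible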
Question-3
Statement
Consider the following feature vectors:
$x_1 = (1, 2, 1, -1)$, $x_2 = (5, -3, -5, 10)$, $x_3 = (3, 1, 2, 4)$, $x_4 = (0, 1, 1, 0)$, $x_5 = (10, 7, -3, 2)$
If we use a $k$-NN algorithm with $k = 3$, what would be the predicted label for the following test point: $x_{\text{test}} = (1, 1, 1, 1)$?
Answer
1
Solution
The squared Euclidean distances from the test-point are:
$d^2(x_1, x_{\text{test}}) = 5$, $d^2(x_2, x_{\text{test}}) = 149$, $d^2(x_3, x_{\text{test}}) = 14$, $d^2(x_4, x_{\text{test}}) = 2$, $d^2(x_5, x_{\text{test}}) = 134$
The three nearest neighbors are therefore $x_4$, $x_1$ and $x_3$. We see that among the three nearest neighbors, two have label 1 and one has label 0. Hence the predicted label is 1. For those interested, here is code for the same:
import numpy as np

x_1 = np.array([1, 2, 1, -1])
x_2 = np.array([5, -3, -5, 10])
x_3 = np.array([3, 1, 2, 4])
x_4 = np.array([0, 1, 1, 0])
x_5 = np.array([10, 7, -3, 2])

x_test = np.array([1, 1, 1, 1])

# print the squared Euclidean distance of each training vector from the test-point
for x in [x_1, x_2, x_3, x_4, x_5]:
    print(round(np.linalg.norm(x_test - x) ** 2))
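A short extension of the snippet above (not part of the original solution) that also reports which three vectors are nearest, assuming the arrays defined above are still in scope:

X = np.stack([x_1, x_2, x_3, x_4, x_5])
sq_dists = np.sum((X - x_test) ** 2, axis=1)     # squared Euclidean distances: [5, 149, 14, 2, 134]
nearest_three = np.argsort(sq_dists)[:3] + 1     # 1-based indices of the three nearest vectors
print(nearest_three)                             # [4 1 3]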
Comprehension Type (4 to 6)
Statement
Consider the following split at some node in a decision tree:
Node | Class 0 | Class 1
Q1   | 100     | 100
L1   | 50      | 30
L2   | 50      | 70
For example, L1 has 80 points of which 50 belong to class 0 and 30 belong to class 1. Use $\log_2$ for all calculations that involve logarithms.
Question-4
Statement
If the algorithm is terminated at this level, then what are the labels associated with L1 and L2?
Options
(a)
L1 : 0
(b)
L1 : 1
(c)
L2 : 0
(d)
L2 : 1
Answer
(a), (d)
Solution
L1 has 80 data-points out of which 50 belong to class-0 and 30 belong to class-1. Since the majority of the points belong to class-0, this node will have 0 as the predicted label.
L2 has 120 data-points out of which 50 belong to class-0 and 70 belong to class-1. Since the majority of the points belong to class-1, this node will have 1 as the predicted label.
Question-5
Statement
What is the impurity in L1 if we use entropy as a measure of impurity? Report your answer correct
to three decimal places.
Answer
0.954
Solution
If $p$ represents the proportion of the samples that belong to class-1 in a node, then the impurity of this node using entropy as a measure is:
$E(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)$
For L1, $p = 30/80 = 3/8$, which gives an impurity of approximately 0.954.
import math

# entropy impurity as a function of the proportion p of class-1 points in a node
imp = lambda p: -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(imp(3 / 8))   # 0.9544...
Question-6
Statement
What is the information gain for this split? Report your answer correct to three decimal places.
Use at least three decimal places in all intermediate computations.
Answer
0.030
Solution
The information gain because of this split is equal to the decrease in impurity:
$IG = E(Q1) - \left[ \frac{|L1|}{|Q1|} E(L1) + \frac{|L2|}{|Q1|} E(L2) \right]$
Here, $|L1|$ and $|L2|$ denote the cardinality of the leaves and $|Q1|$ is the total number of points before the split at node Q1.
To calculate the entropy of the three nodes, we need the proportion of points that belong to class-1 in each of the three nodes. Let us call them $p_{Q1}$ for node Q1, $p_{L1}$ for node L1 and $p_{L2}$ for node L2:
$p_{Q1} = \frac{100}{200} = \frac{1}{2}, \qquad p_{L1} = \frac{30}{80} = \frac{3}{8}, \qquad p_{L2} = \frac{70}{120} = \frac{7}{12}$
The corresponding entropies are $E(Q1) = 1$, $E(L1) \approx 0.954$ and $E(L2) \approx 0.980$. The information gain is therefore $1 - \frac{80}{200}(0.954) - \frac{120}{200}(0.980) \approx 0.030$.
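The same computation as a short script (a sketch that only uses the counts from the table above):

import math

def entropy(p):
    # impurity of a node in which a proportion p of the points belong to class-1
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

E_Q1 = entropy(100 / 200)   # 1.0
E_L1 = entropy(30 / 80)     # about 0.954
E_L2 = entropy(70 / 120)    # about 0.980

# information gain = impurity of the parent - weighted impurity of the children
IG = E_Q1 - (80 / 200) * E_L1 - (120 / 200) * E_L2
print(round(IG, 3))         # 0.03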
Question-7
Statement
If a test-point comes up for prediction, what is the minimum and maximum number of questions that it would have to pass through before being assigned a label?
Options
(a)
(b)
(c)
(d)
(e)
Answer
(b), (e)
Solution
Look at all paths from the root to the leaves. Find the shortest and longest path.
Question-8
Statement
$p$ is the proportion of points with label 1 in some node in a decision tree. Which of the following statements are true? [MSQ]
Options
(a)
(b)
(c)
(d)
Answer
(d)
Solution
Options (a) and (b) are incorrect as the impurity increases from $p = 0$ to $p = 0.5$ and then decreases. Option-(c) is incorrect for obvious reasons.
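A quick numerical check of this rise-and-fall behavior, reusing the entropy formula from Question-5 (the evaluation points are chosen arbitrarily):

import math

imp = lambda p: -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(p, round(imp(p), 3))
# 0.469, 0.881, 1.0, 0.881, 0.469:
# the impurity peaks at p = 0.5 and is symmetric about it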
Question-9
Statement
Consider a binary classification problem in which all data-points are in $\mathbb{R}^2$. The red points belong to one class and the green points belong to the other. A linear classifier has been trained on this data. The decision boundary is given by the solid line.
This classifier misclassifies four points. Which of the following could be a possible value for the weight vector?
Options
(a)
(b)
(c)
(d)
Answer
(b)
Solution
The weight vector is orthogonal to the decision boundary, so it must lie along the dotted line. This gives us two quadrants in which the vector can lie: the second or the fourth. In other words, we only need to figure out its direction. If it points into the second quadrant, then there will be exactly four misclassifications. If it points into the fourth quadrant, then all but four points will be misclassified.
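A small sketch of the underlying fact used here, namely that a linear classifier predicts with the sign of $w^T x$, so replacing $w$ by $-w$ flips every prediction (the points and the weight vector below are made up for illustration):

import numpy as np

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5], [-2.0, -1.0]])   # made-up data-points
w = np.array([1.0, 1.0])                                             # made-up weight vector

print(np.sign(X @ w))    # [ 1.  1. -1. -1.]
print(np.sign(X @ -w))   # [-1. -1.  1.  1.]  -> every prediction is flipped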
Question-10
Statement
Which of the following are valid decision regions for a decision tree classifier for data-points in $\mathbb{R}^2$? The question in every internal node is of the form $x_i \leq \theta$, i.e., a single feature compared against a threshold. Both the features are positive real numbers.
Options
(a)
(b)
(c)
(d)
Answer
(a), (b), (d)
Solution
A question of the form $x_i \leq \theta$ can only result in one of these two lines:
a horizontal line ($x_2 = \theta$)
a vertical line ($x_1 = \theta$)
It cannot produce a slanted line as shown in option-(c). Options (a) and (d) correspond to what are called decision stumps: a single node splitting into two child nodes.
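For completeness, a decision stump over two positive features can be sketched as follows (the feature index and the threshold are arbitrary choices, not taken from the options):

import numpy as np

def stump_predict(X, feature=0, theta=2.5):
    # a single axis-aligned question "x_feature <= theta" splits the plane
    # with one vertical (feature 0) or horizontal (feature 1) line
    return np.where(X[:, feature] <= theta, 0, 1)

X = np.array([[1.0, 4.0], [3.0, 1.0], [2.0, 2.0]])
print(stump_predict(X))   # [0 1 0]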