Week 7 - Graded
S3: The number of data-points that we have to store increases as the size of the training dataset increases.
S4: The number of data-points that we have to store is independent of the value of $k$.
Options
(a)
(b)
(c)
(d)
(e)
Answer
(b)
Solution
The entire training dataset has to be stored in memory. For predicting the label of a test-point, we have to perform the following steps:
1. Compute the distance between the test-point and every point in the training dataset.
2. Find the $k$ training points that are closest to the test-point.
3. Assign the majority label among these $k$ neighbors to the test-point.
Since every training point may be needed for these computations, the number of stored data-points grows with the size of the training dataset and does not depend on $k$.
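A minimal sketch of these steps, assuming a training matrix X_train with one point per row, a label vector y_train and a single test-point x_test (these names and the data below are illustrative, not from the question):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k):
    # Step 1: Euclidean distance from the test-point to every training point
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Step 2: indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Step 3: majority label among these k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 2], [2, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))   # prints 0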
Question-2
Statement
How should we recolor the black train point if the test point is classified as "red" without any uncertainty by a $k$-NN classifier, with $k = 4$? Use the Euclidean distance metric for computing distances.
Options
(a)
blue
(b)
red
(c)
Insufficient information
Answer
(b)
Solution
Since we are looking at the $k$-NN algorithm with $k = 4$, we need to look at the four nearest neighbors of the test data-point. The four points from the training dataset that are closest to the test data-point are the following:
one black point
one blue point
two red points
Each of them is at unit distance from the test data-point. From the problem statement, it is given
that the test data-point is classified as "red" without any uncertainty. Let us now consider two
scenarios that concern the black training data-point:
Scenario 1: the black point is recolored red. There are then three red neighbors and one blue neighbor. Therefore, the test data-point will be classified as red. There is no uncertainty in the classification. This is what we want. However, for the sake of completeness, let us look at the alternative possibility.
Scenario 2: the black point is recolored blue. There will then be exactly two neighbors that are blue and two that are red. In such a scenario, we cannot classify the test-point without any uncertainty. That is, we could call it either red or blue.
This is one of the reasons why we choose an odd value of $k$ for the $k$-NN algorithm. If $k$ is odd, then this kind of tie between the two classes can be avoided.
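As a small illustration of the tie issue, a majority vote over an even number of neighbors can split evenly, while an odd number of neighbors over two classes cannot (the labels below are made up for the example, not taken from the figure):

from collections import Counter

print(Counter(["red", "red", "blue", "blue"]).most_common())   # [('red', 2), ('blue', 2)] -> a tie
print(Counter(["red", "red", "blue"]).most_common())           # [('red', 2), ('blue', 1)] -> no tie possible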
Question-3
Statement
Consider the following feature vectors:
$x_1 = (1, 2, 1, -1)$, $x_2 = (5, -3, -5, 10)$, $x_3 = (3, 1, 2, 4)$, $x_4 = (0, 1, 1, 0)$, $x_5 = (10, 7, -3, 2)$
If we use a $k$-NN algorithm with $k = 3$, what would be the predicted label for the following test point: $x_{\text{test}} = (1, 1, 1, 1)$?
Answer
1
Solution
The squared Euclidean distances from the test-point are:
$d^2(x_1, x_{\text{test}}) = 5$, $d^2(x_2, x_{\text{test}}) = 149$, $d^2(x_3, x_{\text{test}}) = 14$, $d^2(x_4, x_{\text{test}}) = 2$, $d^2(x_5, x_{\text{test}}) = 134$
The three nearest neighbors are therefore $x_4$, $x_1$ and $x_3$. We see that among the three nearest neighbors, two have label 1 and one has label 0. Hence the predicted label is 1. For those interested, here is code for the same:
import numpy as np

x_1 = np.array([1, 2, 1, -1])
x_2 = np.array([5, -3, -5, 10])
x_3 = np.array([3, 1, 2, 4])
x_4 = np.array([0, 1, 1, 0])
x_5 = np.array([10, 7, -3, 2])

x_test = np.array([1, 1, 1, 1])

# print the squared Euclidean distance of each training vector from the test-point
for x in [x_1, x_2, x_3, x_4, x_5]:
    print(round(np.linalg.norm(x_test - x) ** 2))
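A short extension of the snippet above (not part of the original solution) that also reports which three vectors are nearest, assuming the arrays defined above are still in scope:

X = np.stack([x_1, x_2, x_3, x_4, x_5])
sq_dists = np.sum((X - x_test) ** 2, axis=1)     # squared Euclidean distances: [5, 149, 14, 2, 134]
nearest_three = np.argsort(sq_dists)[:3] + 1     # 1-based indices of the three nearest vectors
print(nearest_three)                             # [4 1 3]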
Comprehension Type (4 to 6)
Statement
Consider the following split at some node in a decision tree:
Node | Class 0 | Class 1
Q1   | 100     | 100
L1   | 50      | 30
L2   | 50      | 70
For example, L1 has 80 points of which 50 belong to class 0 and 30 belong to class 1. Use $\log_2$ for all calculations that involve logarithms.
Question-4
Statement
If the algorithm is terminated at this level, then what are the labels associated with L1 and L2?
Options
(a)
L1 : 0
(b)
L1 : 1
(c)
L2 : 0
(d)
L2 : 1
Answer
(a), (d)
Solution
L1 has 80 data-points out of which 50 belong to class-0 and 30 belong to class-1. Since the majority of the points belong to class-0, this node will have 0 as the predicted label.
L2 has 120 data-points out of which 50 belong to class-0 and 70 belong to class-1. Since the majority of the points belong to class-1, this node will have 1 as the predicted label.
Question-5
Statement
What is the impurity in L1 if we use entropy as a measure of impurity? Report your answer correct
to three decimal places.
Answer
0.954
Solution
If $p$ represents the proportion of the samples that belong to class-1 in a node, then the impurity of this node using entropy as a measure is:
$E(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)$
For L1, $p = 30/80 = 3/8$, which gives an impurity of approximately 0.954.
import math

# entropy impurity as a function of the proportion p of class-1 points in a node
imp = lambda p: -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(imp(3 / 8))   # 0.9544...
Question-6
Statement
What is the information gain for this split? Report your answer correct to three decimal places.
Use at least three decimal places in all intermediate computations.
Answer
0.030
Solution
The information gain because of this split is equal to the decrease in impurity:
$IG = E(Q1) - \left[ \frac{|L1|}{|Q1|} E(L1) + \frac{|L2|}{|Q1|} E(L2) \right]$
Here, $|L1|$ and $|L2|$ denote the cardinality of the leaves and $|Q1|$ is the total number of points before the split at node Q1.
To calculate the entropy of the three nodes, we need the proportion of points that belong to class-1 in each of the three nodes. Let us call them $p_{Q1}$ for node Q1, $p_{L1}$ for node L1 and $p_{L2}$ for node L2:
$p_{Q1} = \frac{100}{200} = \frac{1}{2}, \qquad p_{L1} = \frac{30}{80} = \frac{3}{8}, \qquad p_{L2} = \frac{70}{120} = \frac{7}{12}$
The corresponding entropies are $E(Q1) = 1$, $E(L1) \approx 0.954$ and $E(L2) \approx 0.980$. The information gain is therefore $1 - \frac{80}{200}(0.954) - \frac{120}{200}(0.980) \approx 0.030$.
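The same computation as a short script (a sketch that only uses the counts from the table above):

import math

def entropy(p):
    # impurity of a node in which a proportion p of the points belong to class-1
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

E_Q1 = entropy(100 / 200)   # 1.0
E_L1 = entropy(30 / 80)     # about 0.954
E_L2 = entropy(70 / 120)    # about 0.980

# information gain = impurity of the parent - weighted impurity of the children
IG = E_Q1 - (80 / 200) * E_L1 - (120 / 200) * E_L2
print(round(IG, 3))         # 0.03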
Question-7
Statement
If a test-point comes up for prediction, what is the minimum and maximum number of questions that it would have to pass through before being assigned a label?
Options
(a)
(b)
(c)
(d)
(e)
Answer
(b), (e)
Solution
Look at all paths from the root to the leaves. Find the shortest and longest path.
Question-8
Statement
$p$ is the proportion of points with label 1 in some node in a decision tree. Which of the following statements are true? [MSQ]
Options
(a)
(b)
(c)
(d)
Answer
(d)
Solution
Options (a) and (b) are incorrect as the impurity increases from $p = 0$ to $p = 0.5$ and then decreases. Option-(c) is incorrect for obvious reasons.
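A quick numerical check of this rise-and-fall behavior, reusing the entropy formula from Question-5 (the evaluation points are chosen arbitrarily):

import math

imp = lambda p: -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(p, round(imp(p), 3))
# 0.469, 0.881, 1.0, 0.881, 0.469:
# the impurity peaks at p = 0.5 and is symmetric about it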
Question-9
Statement
Consider a binary classification problem in which all data-points are in $\mathbb{R}^2$. The red points belong to one class and the green points belong to the other. A linear classifier has been trained on this data. The decision boundary is given by the solid line.
This classifier misclassifies four points. Which of the following could be a possible value for the weight vector?
Options
(a)
(b)
(c)
(d)
Answer
(b)
Solution
The weight vector is orthogonal to the decision boundary, so it must lie along the dotted line. This gives us two quadrants in which the vector can lie: the second or the fourth. In other words, we only need to figure out its direction. If it points into the second quadrant, then there will be exactly four misclassifications. If it points into the fourth quadrant, then all but four points will be misclassified.
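A small sketch of the underlying fact used here, namely that a linear classifier predicts with the sign of $w^T x$, so replacing $w$ by $-w$ flips every prediction (the points and the weight vector below are made up for illustration):

import numpy as np

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5], [-2.0, -1.0]])   # made-up data-points
w = np.array([1.0, 1.0])                                             # made-up weight vector

print(np.sign(X @ w))    # [ 1.  1. -1. -1.]
print(np.sign(X @ -w))   # [-1. -1.  1.  1.]  -> every prediction is flipped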
Question-10
Statement
Which of the following are valid decision regions for a decision tree classifier for data-points in $\mathbb{R}^2$? The question in every internal node is of the form $x_i \leq \theta$, i.e., a single feature compared against a threshold. Both the features are positive real numbers.
Options
(a)
(b)
(c)
(d)
Answer
(a), (b), (d)
Solution
A question of the form $x_i \leq \theta$ can only result in one of these two lines:
a horizontal line ($x_2 = \theta$)
a vertical line ($x_1 = \theta$)
It cannot produce a slanted line as shown in option-(c). Options (a) and (d) correspond to what are called decision stumps: a single node splitting into two child nodes.
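For completeness, a decision stump over two positive features can be sketched as follows (the feature index and the threshold are arbitrary choices, not taken from the options):

import numpy as np

def stump_predict(X, feature=0, theta=2.5):
    # a single axis-aligned question "x_feature <= theta" splits the plane
    # with one vertical (feature 0) or horizontal (feature 1) line
    return np.where(X[:, feature] <= theta, 0, 1)

X = np.array([[1.0, 4.0], [3.0, 1.0], [2.0, 2.0]])
print(stump_predict(X))   # [0 1 0]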