This document contains 11 practice questions on machine learning algorithms, including linear regression, k-nearest neighbors (KNN), Euclidean and Manhattan distance, K-means clustering, logistic regression, and gradient descent. The questions cover finding the regression line and slope for a dataset, choosing the best k value for KNN, computing distances between data points, predicting class labels with KNN, identifying cluster centroids across K-means iterations, interpreting logistic regression outputs, and selecting an appropriate learning rate for gradient descent.


Practice Questions

Andrew Ng
P-1
• Find the least-squares regression line for the following set of
data:
{(-1, 0), (0, 2), (1, 4), (2, 5)}

m = (N Σxy − Σx Σy) / (N Σx² − (Σx)²)
c = (Σy − m Σx) / N

m = 1.7, c = 1.9
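
A quick numeric check of this answer (a minimal sketch using NumPy; the library choice is mine, not the slides'):

    import numpy as np

    # Data from P-1
    x = np.array([-1, 0, 1, 2])
    y = np.array([0, 2, 4, 5])

    # polyfit with degree 1 fits a least-squares line and returns [slope, intercept]
    m, c = np.polyfit(x, y, 1)
    print(m, c)  # 1.7 1.9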
P-2
• In the image below, which would be the best value for k,
assuming the algorithm you are using is k-Nearest Neighbors?

A) 3
B) 10
C) 20
D) 50
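
Since the scatter plot the question refers to is not reproduced in this copy, note that a common practical way to choose k is to compare cross-validated accuracy across candidate values. A minimal sketch with scikit-learn; the synthetic dataset is an assumption for illustration:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Stand-in dataset; the right k depends on the data shown in the image
    X, y = make_classification(n_samples=200, random_state=0)
    for k in (3, 10, 20, 50):
        knn = KNeighborsClassifier(n_neighbors=k)
        print(k, cross_val_score(knn, X, y, cv=5).mean())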

P-3
• Which of the following will be the Euclidean distance between
the two data points A(1, 3) and B(2, 3)?
A) 1
B) 2
C) 4
D) 8

P-4
• Which of the following will be the Manhattan distance between
the two data points A(1, 3) and B(2, 3)?

A) 1
B) 2
C) 4
D) 8
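
Both P-3 and P-4 can be verified directly (a minimal sketch; SciPy's distance helpers are one option among many):

    from scipy.spatial.distance import cityblock, euclidean

    A, B = (1, 3), (2, 3)
    print(euclidean(A, B))  # 1.0 -> Euclidean distance, answer A for P-3
    print(cityblock(A, B))  # 1   -> Manhattan distance, answer A for P-4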

Instructions
• Suppose you are given the following data, where x and y are
the two input variables and Class is the dependent variable.
[The data table shown on the original slide is not preserved in this copy.]

P-5
• Suppose you want to predict the class of a new data point with
x = 1 and y = 1 using Euclidean distance in 3-NN. To which class
does this data point belong?

A) + Class
B) – Class
C) Can’t say
D) None of these

P-6
• In the previous question, suppose you now want to use 7-NN
instead of 3-NN. To which class will the data point x = 1, y = 1 belong?

A) + Class
B) – Class
C) Can’t say
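
Because the original data table is not preserved in this copy, the sketch below uses a hypothetical stand-in dataset purely to illustrate the mechanics of 3-NN versus 7-NN in scikit-learn; the points and labels are assumptions, not the slide's data:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical stand-in for the slide's (x, y, Class) table
    X = np.array([[-1, 1], [0, 1], [0, 2], [1, -1], [1, 0], [1, 2], [2, 2]])
    y = np.array(['-', '+', '-', '-', '+', '+', '-'])

    for k in (3, 7):
        knn = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
        knn.fit(X, y)
        print(k, knn.predict([[1, 1]])[0])  # prediction for the new point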

P-7
• Assume you want to cluster 7 observations into 3 clusters using the K-Means
clustering algorithm. After the first iteration, clusters C1, C2, and C3 contain the
following observations:
• C1: {(2,2), (4,4), (6,6)}
• C2: {(0,4), (4,0)}
• C3: {(5,5), (9,9)}
• What will the cluster centroids be if you proceed to the second iteration?
A. C1: (4,4), C2: (2,2), C3: (7,7)
B. C1: (6,6), C2: (4,4), C3: (9,9)
C. C1: (2,2), C2: (0,0), C3: (5,5)
D. None of these
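
The second-iteration centroid of each cluster is simply the mean of its points; a minimal NumPy sketch:

    import numpy as np

    clusters = {
        'C1': [(2, 2), (4, 4), (6, 6)],
        'C2': [(0, 4), (4, 0)],
        'C3': [(5, 5), (9, 9)],
    }
    for name, pts in clusters.items():
        print(name, np.mean(pts, axis=0))
    # C1 (4, 4), C2 (2, 2), C3 (7, 7) -> option A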

P-8
• Assume you want to cluster 7 observations into 3 clusters using the K-Means
clustering algorithm. After the first iteration, clusters C1, C2, and C3 contain the
following observations:
• C1: {(2,2), (4,4), (6,6)}
• C2: {(0,4), (4,0)}
• C3: {(5,5), (9,9)}
• What will be the Manhattan distance of observation (9, 9) from cluster centroid
C1 in the second iteration?
A. 10
B. 5√2
C. 13√2
D. None of these
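
Taking the second-iteration centroid of C1 from P-7, namely (4, 4), the Manhattan distance is |9 − 4| + |9 − 4| = 10 (option A); a one-line check:

    c1, p = (4, 4), (9, 9)
    print(abs(p[0] - c1[0]) + abs(p[1] - c1[1]))  # 10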

P-9
• If two variables, V1 and V2, are used for clustering, which of the following is true
for K-means clustering with k = 3?

1. If V1 and V2 have a correlation of 1, the cluster centroids will lie on a straight line.

2. If V1 and V2 have a correlation of 0, the cluster centroids will lie on a straight line.

Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above
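
Statement 1 holds because a correlation of 1 means every observation already lies on a single straight line, so each centroid, being an average of such points, lies on that line as well; a correlation of 0 implies no such constraint. A quick empirical check (a sketch; the data and the line V2 = 2·V1 + 3 are assumptions for illustration):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    v1 = rng.uniform(0, 10, 100)
    X = np.column_stack([v1, 2 * v1 + 3])  # V2 = 2*V1 + 3, correlation exactly 1

    centers = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X).cluster_centers_
    # Every centroid satisfies the same line y = 2x + 3
    print(np.allclose(centers[:, 1], 2 * centers[:, 0] + 3))  # True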

P-10
• Suppose you have trained a logistic regression classifier, and on a
new example x it outputs the prediction hθ(x) = 0.2. This
means:
• Our estimate for P(y = 1 | x) is 0.2
• Our estimate for P(y = 0 | x) is 0.2
• Our estimate for P(y = 1 | x) is 0.8
• Our estimate for P(y = 0 | x) is 0.8
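
In logistic regression the hypothesis output is read as a probability: hθ(x) estimates P(y = 1 | x), and P(y = 0 | x) = 1 − hθ(x). A minimal sketch of that reading:

    h = 0.2                       # h_theta(x) on the new example
    print('P(y=1 | x) =', h)      # 0.2
    print('P(y=0 | x) =', 1 - h)  # 0.8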

P-11
• You run gradient descent for 15 iterations with α = 0.3 and compute
J(θ) after each iteration. You find that the value of J(θ)
decreases quickly and then levels off. Based on this, which of the
following conclusions seems most plausible?
• Rather than using the current value of α, use a larger value (say
α = 1.0)
• Rather than using the current value of α, use a smaller value
(say α = 0.1)
• α = 0.3 is an effective choice of learning rate (answer)
• None of the above
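
The "drops quickly, then levels off" curve is the signature of a well-chosen learning rate. A minimal sketch that reproduces the behavior on a toy one-parameter least-squares problem (the data and cost function are assumptions for illustration):

    import numpy as np

    # Toy cost J(theta) = mean((x*theta - y)^2) / 2, minimized at theta = 2
    x = np.array([0.5, 1.0, 1.5, 2.0])
    y = 2 * x

    theta, alpha = 0.0, 0.3
    for i in range(15):
        grad = np.mean((x * theta - y) * x)  # dJ/dtheta
        theta -= alpha * grad
        J = np.mean((x * theta - y) ** 2) / 2
        print(i + 1, J)
    # J decreases quickly and levels off near zero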
KNN implementation
• https://stackabuse.com/k-nearest-neighbors-algorithm-in-python-and-scikit-learn
