Midterm 2006
• There are 7 questions in this exam (11 pages including this cover sheet).
• Questions are not equally difficult.
• If you need more room to work out your answer to a question, use the back of the page
and clearly mark on the front of the page if we are to look at what’s on the back.
• This exam is open book and open notes. Computers, PDAs, and cell phones are not allowed.
• You have 1 hour and 20 minutes. Good luck!
Name:
Andrew ID:
Question                                           Points
1  Conditional Independence, MLE/MAP, Probability      12
2  Decision Tree                                       12
3  Neural Network and Regression                       18
4  Bias-Variance Decomposition                         12
5  Support Vector Machine                              12
6  Generative vs. Discriminative Classifier            20
7  Learning Theory                                     14
   Total                                              100
1 Conditional Independence, MLE/MAP, Probability (12 pts)
1. (4 pts) Show that Pr(X, Y |Z) = Pr(X|Z) Pr(Y |Z) if Pr(X|Y, Z) = Pr(X|Z).
2. (4 pts) If a data point y follows the Poisson distribution with rate parameter θ, then the
probability of a single observation y is

    p(y|θ) = θ^y e^{−θ} / y!,   for y = 0, 1, 2, . . . .

You are given data points y1, . . . , yn drawn independently from a Poisson distribution with
parameter θ. Write down the log-likelihood of the data as a function of θ.
3. (4 pts) Suppose that in answering a question in a multiple choice test, an examinee either
knows the answer, with probability p, or he guesses with probability 1 − p. Assume that the
probability of answering a question correctly is 1 for an examinee who knows the answer and
1/m for the examinee who guesses, where m is the number of multiple choice alternatives.
What is the probability that an examinee knew the answer to a question, given that he has
correctly answered it?
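The following is a minimal numerical sketch of the Poisson log-likelihood asked for in question 1.2, assuming a small made-up sample; the y values and θ candidates are hypothetical, chosen only for illustration.

    import math

    def poisson_log_likelihood(ys, theta):
        # log L(theta) = sum_i [ y_i * log(theta) - theta - log(y_i!) ],
        # i.e. the log of the product of p(y_i | theta) over the sample.
        return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

    # Hypothetical data points, purely for illustration.
    ys = [2, 0, 3, 1, 1]
    for theta in (0.5, 1.4, 3.0):
        print(theta, round(poisson_log_likelihood(ys, theta), 4))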
2 Decision Tree (12 pts)
The following data set will be used to learn a decision tree for predicting whether students are
lazy (L) or diligent (D) based on their weight (Normal or Underweight), their eye color (Amber or
Violet) and the number of eyes they have (2 or 3 or 4).
The following numbers may be helpful as you answer this problem without using a calculator:
log2 0.1 = −3.32, log2 0.2 = −2.32, log2 0.3 = −1.73, log2 0.4 = −1.32, log2 0.5 = −1.
*You don’t need to show the derivation for your answers in this problem.
2. (3 pts) What attribute would the ID3 algorithm choose to use for the root of the tree (no
pruning)?
3. (4 pts) Draw the full decision tree learned for this data (no pruning).
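Below is a minimal sketch of the entropy and information-gain computation that ID3 uses to choose the root attribute, assuming a small hypothetical set of labels and attribute values rather than the exam's data table.

    import math
    from collections import Counter

    def entropy(labels):
        # H(Y) = -sum_v p(v) * log2 p(v)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(attr_values, labels):
        # IG(Y; A) = H(Y) - sum_a P(A = a) * H(Y | A = a)
        n = len(labels)
        remainder = 0.0
        for a in set(attr_values):
            subset = [y for x, y in zip(attr_values, labels) if x == a]
            remainder += len(subset) / n * entropy(subset)
        return entropy(labels) - remainder

    # Hypothetical toy sample: lazy/diligent labels and one binary attribute.
    labels = ['L', 'L', 'D', 'D', 'L']
    weight = ['N', 'N', 'U', 'U', 'N']
    print(information_gain(weight, labels))  # ID3 picks the attribute with the largest gain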
3 Neural Network and Regression (18 pts)
Consider a two-layer neural network to learn a function f : X → Y where X = ⟨X1, X2⟩ consists of
two attributes. The weights, w1, . . . , w6, can be arbitrary. There are two possible choices for the
function implemented by each unit in this network:
• S: signed sigmoid function S(a) = sign[σ(a) − 0.5] = sign[1/(1 + exp(−a)) − 0.5]
• L: linear function
1. (4 pts) Assign proper activation functions (S or L) to each unit in the following graph so this
neural network simulates a linear regression: Y = β1 X1 + β2 X2 .
2. (4 pts) Assign proper activation functions (S or L) for each unit in the following graph so this
neural network simulates a binary logistic regression classifier: Y = arg max_y P(Y = y|X),
where P(Y = 1|X) = exp(β1 X1 + β2 X2) / (1 + exp(β1 X1 + β2 X2)) and
P(Y = −1|X) = 1 / (1 + exp(β1 X1 + β2 X2)).
4. (4 pts) Assign proper activation functions (S or L) for each unit in the following graph so this
neural network simulates a boosting classifier that combines two logistic regression classifiers,
f1 : X → Y1 and f2 : X → Y2, to produce its final prediction: Y = sign[α1 Y1 + α2 Y2]. Use
the same definitions as in problem 3.2 for f1 and f2.
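A minimal numerical sketch of the two unit types defined in this problem: it checks that the signed sigmoid S(a) = sign[σ(a) − 0.5] behaves like the sign of its input, and treats L as the identity (an assumption, since the problem only labels L as linear). The coefficients and inputs are made up for illustration.

    import math

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    def S(a):
        # signed sigmoid unit from the problem statement
        return 1 if sigmoid(a) - 0.5 >= 0 else -1

    def L(a):
        # linear unit, assumed here to be the identity
        return a

    # Made-up coefficients and inputs, purely for illustration.
    b1, b2 = 0.7, -1.3
    for x1, x2 in [(1.0, 0.5), (-2.0, 0.1), (0.3, 0.4)]:
        a = b1 * x1 + b2 * x2
        assert S(a) == (1 if a >= 0 else -1)  # S agrees with the sign of a
        print((x1, x2), L(a), S(a))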
4 Bias-Variance Decomposition (12 pts)
1. (6 pts) Suppose you have regression data generated by a polynomial of degree 3. Characterize
the bias and variance of the estimates of the following models, fit to this data, with respect to
the true model by circling the appropriate entries.
Bias Variance
Linear regression low/high low/high
Polynomial regression with degree 3 low/high low/high
Polynomial regression with degree 10 low/high low/high
2. Let Y = f(X) + ε, where ε has mean zero and variance σ_ε^2. In k-nearest neighbor (kNN)
regression, the prediction of Y at a point x0 is given by the average of the Y values at the k
neighbors closest to x0.
(a) (2 pts) Denote the ℓ-th nearest neighbor to x0 by x(ℓ) and its corresponding Y value by
y(ℓ). Write the prediction f̂(x0) of the kNN regression at x0 in terms of y(ℓ), 1 ≤ ℓ ≤ k.
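A minimal sketch of the kNN regression prediction described above, assuming one-dimensional inputs with Euclidean distance and made-up training pairs.

    def knn_predict(x0, xs, ys, k):
        # f_hat(x0) = average of the Y values of the k training points closest to x0
        order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
        return sum(ys[i] for i in order[:k]) / k

    # Hypothetical 1-D training data, purely for illustration.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.1, 0.9, 4.2, 9.1, 15.8]
    print(knn_predict(2.4, xs, ys, k=3))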
5 Support Vector Machine (12 pts)
Consider a supervised learning problem in which the training examples are points in 2-dimensional
space. The positive examples are (1, 1) and (−1, −1). The negative examples are (1, −1) and
(−1, 1).
1. (1 pt) Are the positive examples linearly separable from the negative examples in the original
space?
2. (4 pts) Consider the feature transformation φ(x) = [1, x1, x2, x1 x2], where x1 and x2 are,
respectively, the first and second coordinates of a generic example x. The prediction function
is y(x) = wᵀφ(x) in this feature space. Give the coefficients, w, of a maximum-margin
decision surface separating the positive examples from the negative examples. (You should
be able to do this by inspection, without any significant computation.)
3. (3 pts) Add one training example to the graph so that the five examples in total can no longer
be linearly separated in the feature space φ(x) defined in problem 5.2.
4. (4 pts) What kernel K(x, x′) does this feature transformation φ correspond to?
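A minimal sketch of the feature map φ from problem 5.2, in plain Python: it evaluates φ on the four training points and computes inner products in feature space, which by definition is the kernel problem 5.4 asks about (the closed form is left to the reader).

    def phi(x):
        # feature map from problem 5.2: x = (x1, x2) -> [1, x1, x2, x1*x2]
        x1, x2 = x
        return [1.0, x1, x2, x1 * x2]

    def k(x, xp):
        # the kernel induced by phi is, by definition, the inner product in feature space
        return sum(a * b for a, b in zip(phi(x), phi(xp)))

    positives = [(1, 1), (-1, -1)]
    negatives = [(1, -1), (-1, 1)]
    for x in positives + negatives:
        print(x, phi(x), k(x, (1, 1)))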
6 Generative vs. Discriminative Classifier (20 pts)
Consider the binary classification problem where class label Y ∈ {0, 1} and each training example
X has 2 binary attributes X1 , X2 ∈ {0, 1}.
In this problem, we will always assume X1 and X2 are conditionally independent given Y, that
the class priors are P(Y = 0) = P(Y = 1) = 0.5, and that the conditional probabilities are as
follows:
P(X1 |Y )   X1 = 0   X1 = 1
Y = 0         0.7      0.3
Y = 1         0.2      0.8

P(X2 |Y )   X2 = 0   X2 = 1
Y = 0         0.9      0.1
Y = 1         0.5      0.5
The expected error rate is the probability that a classifier provides an incorrect prediction for an
observation: if Y is the true label and Ŷ(X1, X2) is the predicted class label, then the expected
error rate is

    P_D( Y = 1 − Ŷ(X1, X2) ) = Σ_{X1=0}^{1} Σ_{X2=0}^{1} P_D( X1, X2, Y = 1 − Ŷ(X1, X2) ).
Note that we use the subscript D to emphasize that the probabilities are computed under the true
distribution of the data.
*You don’t need to show all the derivations for your answers in this problem.
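The following is a minimal sketch of the quantities defined above, assuming the stated class prior and conditional probability tables: it builds the joint distribution P_D(X1, X2, Y) under the conditional-independence assumption and evaluates the expected-error-rate sum for a predictor passed in as a function. The predictor shown is a made-up placeholder, not the naïve Bayes classifier the questions ask about.

    prior = {0: 0.5, 1: 0.5}
    p_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(X1 | Y), keyed by Y then X1
    p_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P(X2 | Y), keyed by Y then X2

    def joint(x1, x2, y):
        # X1 and X2 are conditionally independent given Y, as assumed in the problem
        return prior[y] * p_x1[y][x1] * p_x2[y][x2]

    def expected_error(predict):
        # sum over the 4 configurations of the probability that Y disagrees with Y_hat
        return sum(joint(x1, x2, 1 - predict(x1, x2))
                   for x1 in (0, 1) for x2 in (0, 1))

    # Placeholder predictor, purely for illustration (always predicts 0).
    print(expected_error(lambda x1, x2: 0))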
1. (4 pts) Write down the naïve Bayes prediction for each of the 4 possible configurations of X1, X2.
The following table may help you complete this problem.
2. (4 pts) Compute the expected error rate of this naïve Bayes classifier, which predicts Y given
both of the attributes {X1, X2}. Assume that the classifier is learned with infinite training
data.
3. (4 pts) Which of the following two has a smaller expected error rate?
4. (4 pts) Now, suppose that we create a new attribute X3, which is a deterministic copy of X2.
What is the expected error rate of the naïve Bayes classifier which predicts Y given all the attributes
(X1, X2, X3)? Assume that the classifier is learned with infinite training data.
5. (4 pts) Explain what is happening with naïve Bayes in problem 6.4. Does logistic regression
suffer from the same problem? Why?
7 Learning Theory (14 pts)
You read in the paper that the famous bird migration website, Netflocks, is offering a $1M prize
for accurately recommending movies about penguins. Furthermore, it is providing a training data
set containing 100,000,000 labeled training examples. Each training example consists of a set
of 100 real-valued features describing a movie, along with a boolean label indicating whether to
recommend this movie to a person.
You determine that the $1M can be yours if you can train a linear Support Vector Machine
with a true accuracy of 98%. Of course you understand that PAC learning theory provides only
probabilistic bounds, so you decide to enter only if you can prove you have at least a 0.9 probability
of achieving an accuracy of 98%.
1. (8 pts) Can you use PAC learning theory to decide whether you can meet your performance
objective? If yes, give an expression for the number of training examples sufficient to meet
your performance objective. If not, explain why not, then provide the minimum set of
additional assumptions needed so that PAC learning theory can be applied, and give an
expression for the number of training examples sufficient under your assumptions. (You may
leave your expression as an unevaluated arithmetic expression, but it should contain only
constants - no variables.)
2. (3 pts) Consider the PAC-style statement “we can achieve true accuracy of at least 98% with
probability 0.9.” What is the meaning of “with probability 0.9”? Answer this by describing
a randomized experiment which you could perform repeatedly to test whether the statement
is true.
3. (3 pts) Your friend already has a private dataset of 100,000,000 labeled movies, so she will end
up with twice as much training data as you. You train using the Netflocks data to produce
a classifier h1 . She uses the same learning algorithm, but trains with twice as much data to
produce her output hypothesis, h2 . You are interested in how well the training errors of h1
and h2 predict their true errors. Consider the ratio
4,   2,   √2,   1,   −1,   1/√2,   1/2,   1/4
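For reference on question 7.1, one commonly cited PAC sample-complexity bound (the VC-dimension form for a consistent learner, as given in standard textbook treatments) with the exam's numbers plugged in; whether this bound applies here, and which extra assumptions it needs, is exactly what the question asks you to decide. The value VC(H) = 101 assumes linear separators with a bias term over 100 real-valued features.

    % m examples suffice, for a consistent learner over a hypothesis class H, when
    %   m \ge \frac{1}{\epsilon}\Big( 4\log_2\frac{2}{\delta} + 8\,\mathrm{VC}(H)\,\log_2\frac{13}{\epsilon} \Big)
    % With \epsilon = 0.02 (98% true accuracy), \delta = 0.1, and \mathrm{VC}(H) = 101:
    m \ge \frac{1}{0.02}\Big( 4\log_2\frac{2}{0.1} + 8 \cdot 101 \cdot \log_2\frac{13}{0.02} \Big)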
* Any resemblance to real persons, animals, or organizations, living or dead, is purely coincidental.