HW 1 in 2015
$$f(x; \theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d$$
where d is the degree of the polynomial. Develop code that finds the θ which minimizes the risk
$$R_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left( y_i - f(x_i; \theta) \right)^2$$
on a dataset. To help you get started, download the Matlab code in “polyreg.m” (on the tutorial
web page) to do polynomial curve fitting. Use your code on the dataset “problem1.mat”. This file should
include a matrix x, corresponding to the scalar features $\{x_1, \dots, x_N\}$, and a matrix y, corresponding
to the scalar labels $\{y_1, \dots, y_N\}$. Fit a polynomial model to this data for various choices of d, the
degree of the polynomial.
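As a rough illustration of the fitting step, the sketch below solves the least-squares problem for a fixed degree d using the backslash operator. It is not the contents of “polyreg.m”: the synthetic data stands in for “problem1.mat”, and the helper names `design`, `fit_poly`, and `emp_risk` are hypothetical.

```matlab
% Hypothetical stand-in for the data in problem1.mat: N scalar inputs and labels.
N = 50;
x = linspace(-1, 1, N)';
y = 1 - 2*x + 3*x.^2;              % noise-free quadratic, so the fit is easy to check

% Design matrix with columns x.^0, x.^1, ..., x.^d (bsxfun works in Matlab and Octave).
design = @(x, d) bsxfun(@power, x(:), 0:d);

% Least-squares solution of design(x, d) * theta = y; this minimizes R_emp.
fit_poly = @(x, y, d) design(x, d) \ y(:);

% Empirical risk: (1/N) * sum over i of 0.5 * (y_i - f(x_i; theta))^2.
emp_risk = @(x, y, theta) mean(0.5 * (y(:) - design(x, numel(theta) - 1) * theta).^2);

d = 2;
theta = fit_poly(x, y, d);         % theta(1) is theta_0, ..., theta(d+1) is theta_d
risk = emp_risk(x, y, theta);
```

With real data, replacing the first three assignments by `load problem1` (assuming the file defines `x` and `y` as described) and looping over d gives the risk for each degree.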
Which value(s) of d seem most reasonable? Please justify your answer using some
empirical measure.
It is easy to overfit the data when using polynomial regression. Therefore, use cross-validation,
randomly splitting the dataset into two halves, to select the complexity of the model (in this
case, the degree of the polynomial). Include a plot showing the training and testing risk across
various choices of d, and plot your f(x; θ) overlaid on the data for the best choice of d according to
cross-validation.
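One possible way to organize the split is sketched below: fit on one random half, evaluate the risk on both halves, and pick the degree with the lowest testing risk. The synthetic cubic data, the range of degrees, and all variable names are assumptions for illustration, not part of the assignment.

```matlab
% Hypothetical stand-in for problem1.mat: a noisy cubic.
N = 100;
x = 2*rand(N, 1) - 1;
y = x.^3 - x + 0.1*randn(N, 1);

design = @(x, d) bsxfun(@power, x(:), 0:d);
risk = @(x, y, theta) mean(0.5 * (y - design(x, numel(theta) - 1) * theta).^2);

% Random split into two halves: one for training, one for testing.
idx = randperm(N);
half = floor(N/2);
tr = idx(1:half);
te = idx(half+1:end);

degrees = 0:8;                      % assumed range of candidate degrees
trainRisk = zeros(size(degrees));
testRisk = zeros(size(degrees));
for k = 1:numel(degrees)
    theta = design(x(tr), degrees(k)) \ y(tr);     % fit on the training half only
    trainRisk(k) = risk(x(tr), y(tr), theta);
    testRisk(k) = risk(x(te), y(te), theta);       % evaluate on the held-out half
end

% Degree with the smallest testing risk.
[~, kBest] = min(testRisk);
bestD = degrees(kBest);
```

Calling `plot(degrees, trainRisk, degrees, testRisk)` afterwards gives the requested risk curves. Training risk can only go down as d grows because the models are nested, so only the testing curve is informative for choosing d.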
2 Problem 2 (10 points)
Regularized risk minimization: Modify the Matlab code for “polyreg.m” such that it learns a
multivariate regression function $f : \mathbb{R}^{100} \to \mathbb{R}$, where the basis functions are of the form
$$f(x; \theta) = \sum_{i=1}^{k} \theta_i x_i$$
The dataset is available in “problem2.mat”. As before, the x variable contains $\{x_1, \dots, x_N\}$ and the
y variable contains their scalar labels $\{y_1, \dots, y_N\}$.
Use an $\ell_2$ penalty to limit the complexity of the model, i.e., minimize the regularized risk
$$R_{\mathrm{reg}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left( y_i - f(x_i; \theta) \right)^2 + \frac{\lambda}{2N} \|\theta\|^2$$
Use two-fold cross-validation (as in Problem 1) to find the best value for λ. Include a plot showing
training and testing risk across various choices of λ. A reasonable range for this dataset would be
from λ = 0 to λ = 1000. Also, mark the λ which minimizes the testing error on the dataset.
What do you notice about the training and testing error?
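A minimal sketch of the λ sweep follows, assuming the closed-form minimizer of the regularized risk: setting the gradient to zero gives (X'X + λI)θ = X'y. The synthetic data stands in for “problem2.mat”, and `ridge_fit`, the grid of λ values, and the other names are hypothetical.

```matlab
% Hypothetical stand-in for problem2.mat: N examples with 100 features each.
N = 400; k = 100;
X = randn(N, k);
thetaTrue = [ones(5, 1); zeros(k - 5, 1)];
y = X*thetaTrue + 0.5*randn(N, 1);

% Random split into two halves, as in Problem 1.
idx = randperm(N);
tr = idx(1:N/2);
te = idx(N/2+1:end);

% Minimizer of the regularized risk: solve (X'X + lambda*I) * theta = X'y.
ridge_fit = @(X, y, lambda) (X'*X + lambda*eye(size(X, 2))) \ (X'*y);

% Unregularized risk, used for both the training and testing curves.
risk = @(X, y, theta) mean(0.5 * (y - X*theta).^2);

lambdas = 0:50:1000;                % assumed grid over the suggested range
trainRisk = zeros(size(lambdas));
testRisk = zeros(size(lambdas));
for j = 1:numel(lambdas)
    theta = ridge_fit(X(tr, :), y(tr), lambdas(j));
    trainRisk(j) = risk(X(tr, :), y(tr), theta);
    testRisk(j) = risk(X(te, :), y(te), theta);
end

% Lambda with the smallest testing risk.
[~, jBest] = min(testRisk);
bestLambda = lambdas(jBest);
```

The curves can be drawn with `plot(lambdas, trainRisk, lambdas, testRisk)`, marking `bestLambda` on the testing curve. Solving the linear system with backslash avoids forming an explicit inverse; a gradient-descent variant would minimize the same objective.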
Since you are using gradient descent, you will have to specify the step size η and the tolerance ε.
Pick reasonable values for η and ε, then use your code to learn a classification function for the
dataset in “dataset4.mat”. Type “load dataset4” and you will have the variables X (input vectors)
and Y (binary labels) in your Matlab environment, which contain the dataset.
Show any derivations you need to make for this algorithm.
Use the whole dataset for training. Show with figures the resulting linear decision boundary on the
2D X data. Show the binary classification error and the empirical risk you obtained throughout
the run, from random initialization until convergence. Note the number of iterations needed for your
choice of η and ε.
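The loop below sketches what such a run could look like. Several things in it are assumptions, since this excerpt does not spell them out: the model is taken to be logistic regression with labels in {0, 1}, the tolerance is applied to the gradient norm, and two synthetic Gaussian clusters stand in for “dataset4.mat”. The variable `tol` plays the role of ε (the name `eps` is a Matlab built-in).

```matlab
% Hypothetical stand-in for dataset4.mat: two Gaussian clusters in 2D.
N = 200;
X = [randn(N/2, 2) + 1.5; randn(N/2, 2) - 1.5];
Y = [ones(N/2, 1); zeros(N/2, 1)];          % assumed label encoding: 0 or 1
Xb = [X, ones(N, 1)];                       % constant feature for the bias term

sigmoid = @(z) 1 ./ (1 + exp(-z));
% Assumed empirical risk: average negative log-likelihood (eps guards log(0)).
risk = @(theta) -mean(Y .* log(sigmoid(Xb*theta) + eps) + ...
                      (1 - Y) .* log(1 - sigmoid(Xb*theta) + eps));
% Gradient of that risk with respect to theta.
grad = @(theta) Xb' * (sigmoid(Xb*theta) - Y) / N;

eta = 0.1;                                  % step size
tol = 1e-3;                                 % tolerance on the gradient norm
maxIter = 10000;                            % safety cap on the number of steps
theta = 0.01 * randn(3, 1);                 % random initialization

riskHist = zeros(1, maxIter);
errHist = zeros(1, maxIter);
for iter = 1:maxIter
    g = grad(theta);
    riskHist(iter) = risk(theta);
    errHist(iter) = mean((sigmoid(Xb*theta) > 0.5) ~= Y);   % binary error
    if norm(g) < tol
        break;                              % converged under the assumed criterion
    end
    theta = theta - eta * g;                % gradient descent step
end
riskHist = riskHist(1:iter);
errHist = errHist(1:iter);
```

After the loop, `iter` holds the number of iterations used, `riskHist` and `errHist` can be plotted against `1:iter`, and the decision boundary is the line `theta(1)*x1 + theta(2)*x2 + theta(3) = 0`.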