
Homework 3

Sarthak Jain

2018/10/18

1 Running the Algorithms on a Random Data Set

1.1 Problem Formulation

min_x (1/2)||Ax − b||²    (1)

where A is a randomly generated 50×10 matrix and b is a random 50×1 vector. The largest eigenvalue of AᵀA is 61.9 (i.e., L = 61.9).
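The report does not include its code, so the following is a minimal NumPy sketch of this setup; the random seed and the names f, grad and L are illustrative, not the author's.

```python
import numpy as np

# Minimal sketch of the problem setup; the seed and names are illustrative.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # random 50x10 feature matrix
b = rng.standard_normal(50)         # random 50x1 target vector

def f(x):
    """Objective f(x) = (1/2)||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * (r @ r)

def grad(x):
    """Gradient of f: A^T (Ax - b)."""
    return A.T @ (A @ x - b)

# L is the largest eigenvalue of A^T A (the Lipschitz constant of the gradient).
L = np.linalg.eigvalsh(A.T @ A).max()
```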

1.2 Data Preprocessing

The feature matrix A is first "scaled and shifted" according to the following equation:

A = (A − mean(A))/σ (2)

This standardization improves the condition number of AᵀA, so the algorithms converge faster.
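A one-line NumPy sketch of Eq. (2), assuming mean(A) and σ are taken column-wise, which is the usual convention:

```python
# Column-wise standardization per Eq. (2): subtract each column's mean and
# divide by its standard deviation.
A = (A - A.mean(axis=0)) / A.std(axis=0)
```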

1.3 Using Steepest Descent

Figure 1 shows the convergence of steepest descent with various step sizes in close proximity to 1/L. In Figure 1 we notice that as the step size increases, the convergence time increases up to a step size of 0.015 and then decreases again.
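A minimal sketch of fixed-step steepest descent, reusing grad from the setup sketch above; the step size 0.015 mirrors the range studied in Figure 1:

```python
def steepest_descent(grad, x0, step, iters=1000):
    """Gradient descent with a fixed step size; returns the iterate history."""
    x = x0.copy()
    history = [x.copy()]
    for _ in range(iters):
        x -= step * grad(x)
        history.append(x.copy())
    return history

# With L = 61.9, 1/L is about 0.016, so step sizes near 0.015 are of interest.
iterates = steepest_descent(grad, np.zeros(10), step=0.015)
```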

Figure 1: log(Error) vs Number of iterations for Steepest Descent with various stepsizes.

1.4 Diminishing Step Size

Figure 2 shows the result of diminishing step size (α/r). We see that as α increases, the initial divergence grows, but the rate of convergence improves too; there is thus a trade-off between initial divergence and rate of convergence. We find that this algorithm performs best at α = 0.1.
Figure 3 shows the result of diminishing step size (α/√r). The initial divergence of this version is larger than that of the earlier version, so it works better for lower values of α than the previous version does. The best performance of this version is observed at α = 0.05.
Note that α/r² never actually reaches the optimum, because the step sizes are summable: Σ 1/r² converges to π²/6 ≈ 1.64 (and not to infinity), so the total distance the iterates can travel is finite.
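A sketch covering all three diminishing schedules (α/r, α/√r, α/r²) through a single power parameter; this is an illustration of the schedules discussed above, not the author's code:

```python
def diminishing_descent(grad, x0, alpha, power=1.0, iters=1000):
    """Gradient descent with step alpha / r**power at iteration r.

    power=1.0 gives alpha/r and power=0.5 gives alpha/sqrt(r). power=2.0
    gives alpha/r^2, whose steps are summable, so the iterates can stall
    short of the optimum no matter how long they run.
    """
    x = x0.copy()
    for r in range(1, iters + 1):
        x -= (alpha / r ** power) * grad(x)
    return x
```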

Figure 2: log(Error) vs Number of iterations for Diminishing stepsize=α/r.


Figure 3: log(Error) vs Number of iterations for Diminishing stepsize=α/√r.

1.5 Armijo Rule

Figures 4 and 5 display the performance of the Armijo rule for various values of s, β and σ, keeping two of them fixed at a time. We find that this algorithm performs best close to β = 0.5, s = 0.2 and σ = 0.1.
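A sketch of gradient descent with the Armijo rule as used above: each iteration starts from step s and shrinks it by β until the sufficient-decrease test with parameter σ passes. The lower bound on the step is an added safety guard, not part of the rule itself.

```python
def armijo_descent(f, grad, x0, s=0.2, beta=0.5, sigma=0.1, iters=200):
    """Gradient descent with the Armijo rule: start from step s and shrink it
    by beta until f(x) - f(x - step*g) >= sigma * step * ||g||^2 holds."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        step = s
        # Backtrack until sufficient decrease; the step floor is a guard.
        while step > 1e-12 and f(x) - f(x - step * g) < sigma * step * (g @ g):
            step *= beta
        x -= step * g
    return x
```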

Figure 4: log(Error) vs Number of iterations for Armijo Rule, varying β (s = 1 and σ = 0.1).

Figure 5: log(Error) vs Number of iterations for Armijo Rule, varying s (β = 0.5 and σ = 0.1).

2 Running the Algorithms on the Digits Data Set

2.1 Problem Formulation

min_x (1/2)||Ax − b||²    (3)

where A is a 7291×3 matrix whose first two columns contain the features "intensity" and "symmetry" and whose third column consists of ones (the intercept term); b is a 7291×1 vector with label +1 if the digit is 1 and −1 otherwise.
The eigenvalues of AᵀA are 3634, 7291 and 10946.
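A hedged sketch of how such a design matrix might be assembled; the images and digits arrays and the particular intensity/symmetry definitions are assumptions, since the report does not show its feature-extraction code.

```python
import numpy as np

def build_design_matrix(images, digits, target=1):
    """Assemble the 7291x3 matrix A and label vector b; `images` (n x h x w
    grayscale) and `digits` (n labels) are hypothetical inputs."""
    intensity = images.mean(axis=(1, 2))        # average pixel value
    # One common symmetry measure: negated difference from the mirror image.
    symmetry = -np.abs(images - images[:, :, ::-1]).mean(axis=(1, 2))
    A = np.column_stack([intensity, symmetry, np.ones(len(images))])
    b = np.where(digits == target, 1.0, -1.0)   # +1 for the target digit
    return A, b
```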

2.2 Data Preprocessing

The feature matrix A is first scaled and shifted according to the following equation:

A = (A − mean(A))/σ (4)

This standardization improves the condition number of AᵀA, so the algorithms converge faster.

2.3 Using Steepest Descent

Figure 6 shows the convergence of steepest descent with various step sizes. Since L = 10946, we look for values in the range 10⁻⁴ to 10⁻⁵. In Figure 6 we notice that as the step size decreases, the convergence time increases, which is expected. For α > 0.0001 the iterates diverge. We should therefore choose a step size smaller than 0.0001, but not a very small one, otherwise convergence takes longer. Hence α = 0.0001 is close to the best step size for steepest descent.

Figure 6: log(Error) vs Number of iterations for Steepest Descent with various stepsizes.

2.4 Diminishing Step Size

Figure 7 shows the result of diminishing step size (α/r). We see that as α increases over the interval 0.0006 to 0.00095, the performance improves. However, above 0.0001 the algorithm takes forever to converge: the initial error shoots up to a very high value (the iterates initially diverge), and the diminishing step size then takes a long time to bring the error back toward 0. This is consistent with the fixed step size results, where we found that the algorithm diverges for step sizes greater than 0.0001.

Figure 8 shows the result of diminishing step size (α/√r). The performance of this version is much better than the previous version, because the step size decreases more slowly in each iteration and the algorithm therefore converges faster. Here we get the best performance at α = 0.0005 (compared to higher values), because for lower α the initial overshoot is much smaller than for higher α. The algorithm becomes very slow for α > 0.001 because of the very large initial divergence.

Figure 7: log(Error) vs Number of iterations for Diminishing stepsize=α/r.


Figure 8: log(Error) vs Number of iterations for Diminishing stepsize=α/√r.

2.5 Armijo Rule

Figures 9, 10 and 11 display the performance of the Armijo rule for various values of s, β and σ, keeping two of them fixed at a time. For the first two plots, σ = 0.1. We find that this algorithm performs best close to β = 0.5, s = 0.2 and σ = 0.1.

2.6 Clash of the Best

In Figure 12 we take the best performers from the above three step size rules and compare them. For fixed step size we take α = 0.0001; for the Armijo rule we take β = 0.5, σ = 0.1 and s = 0.8. We find that the Armijo rule is the best performer; however, the diminishing step size becomes almost equal to Armijo asymptotically.
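A short sketch of this comparison, reusing the routines sketched in Section 1 with the best parameters reported above; f and grad are assumed to be rebuilt here from the digits data.

```python
# Comparison sketch: f and grad are assumed rebuilt from the 7291x3 problem.
x0 = np.zeros(3)
x_fixed = steepest_descent(grad, x0, step=0.0001)[-1]
x_dimin = diminishing_descent(grad, x0, alpha=0.0005, power=0.5)
x_armijo = armijo_descent(f, grad, x0, s=0.8, beta=0.5, sigma=0.1)
```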
Figure 9: log(Error) vs Number of iterations for Armijo Rule, varying β (s = 1 and σ = 0.1).

Figure 10: log(Error) vs Number of iterations for Armijo Rule, varying s (β = 0.5 and σ = 0.1).

Figure 11: log(Error) vs Number of iterations for Armijo Rule, varying σ (β = 0.5 and s = 0.8).


Figure 12: Comparison between the best performers of the above three algorithms.

2.7 Regression Lines for Classification

Figures 13 and 14 show the result of our algorithm for separating one digit from the rest. As the error is negligible with all three algorithms, the classifying hyperplane is the same irrespective of the algorithm used.
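A sketch of how the separating line can be drawn from the solution x = (w1, w2, w0); the plotting details are illustrative, not the report's code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(A, b, x):
    """Scatter the two classes and draw the boundary line
    w1*intensity + w2*symmetry + w0 = 0, where x = (w1, w2, w0)."""
    pos, neg = b > 0, b < 0
    plt.scatter(A[pos, 0], A[pos, 1], marker='o', label='target digit')
    plt.scatter(A[neg, 0], A[neg, 1], marker='x', label='rest')
    xs = np.linspace(A[:, 0].min(), A[:, 0].max(), 100)
    plt.plot(xs, -(x[0] * xs + x[2]) / x[1], 'k-', label='boundary')
    plt.xlabel('intensity')
    plt.ylabel('symmetry')
    plt.legend()
    plt.show()
```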

Figure 13: Separating 1 and rest.

Figure 14: Separating 0 and rest.
