
Foundations of Data Science, Fall 2024

Introduction to Data Science for Doctoral Students, Fall 2024

11. Gradient Descent Optimisation

Dr. Haozhe Zhang

October 28, 2024

MSc: https://lms.uzh.ch/url/RepositoryEntry/17589469505
PhD: https://lms.uzh.ch/url/RepositoryEntry/17589469506
Calculus Background

We consider differentiable multivariate functions f : R^D → R

Two fundamental observations are behind gradient-based optimisation:

1. The gradient of f is perpendicular to the contour line of f

2. The gradient of f points in the direction of the steepest increase of f

To search for a local minimum of f, we can repeatedly move in the direction of the
steepest decrease of f, so in the opposite direction of the gradient of f.

To move faster (yet at higher computational expense), we can employ optimisation
methods that rely on the second-order derivatives of f, so on its Hessian matrix.
Gradient Vectors are Orthogonal to Contour Curves

Theorem: If a function f is differentiable, the gradient of f at a point is either zero
or perpendicular to the contour line of f at that point.

Intuition for perpendicularity: Two hikers at the same location on a mountain.

1. One chooses the direction where the slope is steepest

2. The other chooses a path that keeps the same height

The theorem says that they depart in directions perpendicular to each other.
Gradient Points in the Direction of the Steepest Increase (1/2)

Each component of the gradient says how fast the function changes with respect
to the standard basis:

    ∂f/∂x_i (a) = lim_{h→0} [f(a + h·u_i) − f(a)] / h

where u_i = [0 ··· 1 ··· 0]^T is the unit vector in the direction of x_i
(the single 1 sits in position i of the D entries).

What about changing with respect to the direction of some arbitrary vector v?

Directional Derivative ∇_v: derivative in the direction of v

    ∇_v f(a) = lim_{h→0} [f(a + h·v) − f(a)] / h = ∇f(a) · v
             = v_1 ∂f/∂x_1(a) + … + v_D ∂f/∂x_D(a)

∇_v f(a) compounds the effect of moving along the direction of the first
component, then along the direction of the second component, and so on.
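
To see the identity ∇_v f(a) = ∇f(a) · v in action, here is a small numerical check;
a sketch where the function f, the point a, and the direction v are made up for
illustration:

```python
import numpy as np

# Illustrative function f(x) = x1^2 + 3*x2 with its analytic gradient
# (both made up for this check, not taken from the slides).
def f(x):
    return x[0]**2 + 3 * x[1]

def grad_f(x):
    return np.array([2 * x[0], 3.0])

a = np.array([1.0, -2.0])          # point at which we differentiate
v = np.array([3.0, 4.0])
v = v / np.linalg.norm(v)          # make v a unit direction

h = 1e-6
finite_diff = (f(a + h * v) - f(a)) / h   # the limit definition, small h
dot_formula = grad_f(a) @ v               # the formula  grad f(a) . v

print(finite_diff, dot_formula)    # both ~ 3.6; they agree up to O(h)
```
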
Gradient Points in the Direction of the Steepest Increase (2/2)

Directional Derivative in the direction of v: ∇_v f(a) = ∇f(a) · v

Which direction should we walk from a so that f’s output increases most quickly?

• That is, which unit vector v maximizes the directional derivative along v?

    v* = argmax_v ∇_v f(a)
       =(1) argmax_v ∇f(a) · v
       =(2) argmax_v ∥∇f(a)∥ ∥v∥ cos θ
       =(3) argmax_v ∥∇f(a)∥ cos θ

where:

• (1) is the definition of the directional derivative

• (2) is the definition of the dot product; θ is the angle between ∇f(a) and v

• (3) holds since ∥v∥ = 1 (v is a unit vector)

The maximal value is attained for cos θ = 1, i.e., θ = 0, so ∇f(a) and v* have the
same direction.
The Hessian Matrix

Hessian H: The matrix of all second-order partial derivatives of f

• Symmetric whenever the second derivatives are continuous (Schwarz’s theorem)

• Captures the curvature of the surface

• H has positive eigenvalues → local minimum
• H has negative eigenvalues → local maximum
• H has mixed eigenvalues → saddle point
Gradient and Hessian: Example

    z = f(w_1, w_2) = w_1²/a² + w_2²/b²

    ∇_w f = [∂f/∂w_1, ∂f/∂w_2]^T = [2w_1/a², 2w_2/b²]^T

    H = [ ∂²f/∂w_1²      ∂²f/∂w_1∂w_2 ]  =  [ 2/a²   0    ]
        [ ∂²f/∂w_2∂w_1   ∂²f/∂w_2²    ]     [ 0      2/b² ]
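
A quick symbolic verification of this example, sketched with sympy (assuming
sympy is available; a and b are kept as generic positive symbols):

```python
import sympy as sp

w1, w2, a, b = sp.symbols('w1 w2 a b', positive=True)
f = w1**2 / a**2 + w2**2 / b**2

# Gradient: [2*w1/a**2, 2*w2/b**2]
grad = sp.Matrix([sp.diff(f, w1), sp.diff(f, w2)])

# Hessian: the diagonal matrix diag(2/a**2, 2/b**2)
H = sp.hessian(f, (w1, w2))

print(grad, H)
```
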
Gradient Descent

Gradient Descent Algorithm

Gradient descent is one of the simplest, yet very general, optimisation algorithms
for finding a local minimum of a differentiable function.

• Recall our general optimisation goal for a loss function f with parameters w:

    w* = argmin_w f(w)

• Gradient descent is iterative:

    w_{t+1} = w_t − η_t g_t = w_t − η_t ∇f(w_t)

• It produces a new vector w_{t+1} at each iteration t

• At each iteration, w moves in the direction of the steepest descent

• Gradient descent may or may not reach w* = w_j after any number j of iterations

• η_t > 0 is the learning rate or step size
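
A minimal sketch of this update rule in Python; the toy objective, step size, and
iteration count are illustrative, not part of the slides:

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, T=1000):
    """Iterate w_{t+1} = w_t - eta * grad(w_t) with a constant step size."""
    w = np.asarray(w0, dtype=float)
    for t in range(T):
        w = w - eta * grad(w)
    return w

# Toy objective f(w) = ||w - (1, -2)||^2, whose minimiser is (1, -2)
grad = lambda w: 2 * (w - np.array([1.0, -2.0]))
print(gradient_descent(grad, w0=[0.0, 0.0]))   # ~ [ 1. -2.]
```
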
Gradient Descent: Convex vs. Non-Convex Functions

Gradient Descent for Least Squares Regression

    L(w) = (Xw − y)^T (Xw − y) = Σ_{i=1}^N (x_i^T w − y_i)²

We can compute the gradient of L with respect to w:

    ∇_w L = 2 (X^T X w − X^T y)

Gradient descent vs the closed-form solution for very large (N) and wide (D) datasets:

• In both cases we need to compute A = X^T X and X^T y

• Closed-form solution: also invert (a perturbation of) A

• Gradient descent: recompute Aw at each iteration

So the number of iterations must be small to pay off.
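
The trade-off above can be sketched in a few lines of numpy; the synthetic data
and the step-size choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 5
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

# In both approaches we precompute A = X^T X and X^T y once.
A, b = X.T @ X, X.T @ y

# Gradient descent: one A @ w product per iteration.
w = np.zeros(D)
eta = 0.5 / np.linalg.norm(A, 2)         # step size scaled by ||A||_2
for t in range(200):
    w = w - eta * 2 * (A @ w - b)        # grad L = 2 (X^T X w - X^T y)

# Closed-form solution: solve A w = X^T y (one linear solve).
w_closed = np.linalg.solve(A, b)
print(np.linalg.norm(w - w_closed))      # small after enough iterations
```
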
Choosing a Step Size

Choosing a good step size is key, and we may want a time-varying step size

• If the step size is too large, the algorithm may never converge

• If the step size is too small, convergence may be very slow
Choosing a Step Size

• Constant step size: η_t = c

• Decaying step size: η_t = c/t. Different rates of decay are common, e.g., c/√t

• Backtracking line search

  • Start with a large initial step size η_t

  • Check for a decrease: Is f(w_t − η_t ∇f(w_t)) < f(w_t)?

  • If the decrease condition is not met, multiply η_t by a decaying factor, e.g., 0.5

  • Repeat until the decrease condition is met
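
A sketch of backtracking line search using the simple decrease condition from
this slide (a common variant, Armijo search, additionally demands a sufficient
decrease proportional to η∥∇f∥²; all constants here are illustrative):

```python
import numpy as np

def backtracking_step(f, grad_f, w, eta0=1.0, factor=0.5, max_halvings=50):
    """One gradient step whose size is found by backtracking."""
    g = grad_f(w)
    eta = eta0                          # start with a large step size
    for _ in range(max_halvings):
        if f(w - eta * g) < f(w):       # decrease condition met?
            break
        eta *= factor                   # no: shrink the step and retry
    return w - eta * g

# Toy objective f(w) = w1^2 + 10*w2^2
f = lambda w: w[0]**2 + 10 * w[1]**2
grad_f = lambda w: np.array([2 * w[0], 20 * w[1]])

w = np.array([3.0, 1.0])
for t in range(20):
    w = backtracking_step(f, grad_f, w)
print(w)                                # approaches the minimiser (0, 0)
```
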
When to Stop? Test for Convergence

Fixed number of iterations: Terminate if t ≥ T

Small decrease: Terminate if f(w_t) − f(w_{t+1}) ≤ ϵ_1

Small change: Terminate if ∥w_{t+1} − w_t∥ ≤ ϵ_2
Stochastic Gradient Descent

Optimisation Algorithms for Machine Learning

We minimise the objective function over data points (x_1, y_1), …, (x_N, y_N):

    L(w) = (1/N) Σ_{i=1}^N ℓ(w; x_i, y_i) + λ R(w)

(the first term is the average loss per data point, the second the regularisation)

The gradient of the objective function is

    ∇_w L = (1/N) Σ_{i=1}^N ∇_w ℓ(w; x_i, y_i) + λ ∇_w R(w)

For Ridge Regression we have the square loss plus the ℓ2 regularisation w^T w:

    L_ridge(w) = (1/N) Σ_{i=1}^N (w^T x_i − y_i)² + λ w^T w

    ∇_w L_ridge = (1/N) Σ_{i=1}^N 2 (w^T x_i − y_i) x_i + 2λw
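
The ridge gradient above translates directly into vectorised numpy; a sketch on
made-up data (λ and the step size are illustrative):

```python
import numpy as np

def ridge_gradient(w, X, y, lam):
    """(1/N) * sum_i 2 (w^T x_i - y_i) x_i + 2 * lam * w, vectorised."""
    residuals = X @ w - y                     # w^T x_i - y_i for all i
    return (2.0 / len(y)) * (X.T @ residuals) + 2 * lam * w

# Illustrative run on random data
rng = np.random.default_rng(1)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w = np.zeros(3)
for t in range(500):
    w = w - 0.1 * ridge_gradient(w, X, y, lam=0.1)
print(w)
```
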
Stochastic Gradient Descent

As part of the learning algorithm, we calculate the following gradient:

    ∇_w L = (1/N) Σ_{i=1}^N ∇_w ℓ(w; x_i, y_i) + λ ∇_w R(w)

Suppose we pick a random data point (x_i, y_i) and evaluate g_i = ∇_w ℓ(w; x_i, y_i).

What is E[g_i]?

    E[g_i] = (1/N) Σ_{i=1}^N ∇_w ℓ(w; x_i, y_i)

In expectation, g_i points in the same direction as the entire gradient
(except for the regularisation term).

We compute the gradient at one data point instead of at all data points!

• Online learning

• Cheap to compute one gradient
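
A sketch of SGD for least squares; the data, step size, and epoch count are
illustrative, and the initial and final parameters echo the experiment on the
next slide:

```python
import numpy as np

def sgd(grad_point, w0, X, y, eta=0.01, epochs=10, seed=0):
    """SGD: each update uses the gradient at one random data point."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    N = len(y)
    for epoch in range(epochs):
        for _ in range(N):
            i = rng.integers(N)                    # pick a random data point
            w = w - eta * grad_point(w, X[i], y[i])
    return w

# Per-point square-loss gradient: 2 (w^T x_i - y_i) x_i
grad_point = lambda w, xi, yi: 2 * (w @ xi - yi) * xi

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
y = X @ np.array([1.0, 1.0])                       # noiseless toy labels
print(sgd(grad_point, w0=[-2.0, -3.0], X=X, y=y))  # ~ (1, 1)
```
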
Stochastic Gradient Descent vs (Batch) Gradient Descent

• 1000 data points for training and 1000 data points for test

• 2 features x_1 ∼ N(0, 5) and x_2 ∼ N(0, 8); centred labels

• Least-squares linear regression model f_w(x) = x_1 w_1 + x_2 w_2

• Parameters (w_1, w_2): initial (−2, −3) and final (1, 1)

In practice, mini-batch gradient descent significantly improves performance:

• it reduces the variance of the gradients and hence is more stable than SGD
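
Mini-batch gradient descent changes only how the stochastic gradient is formed:
average the per-point gradients over a small random batch. A sketch (the batch
size and with-replacement sampling are illustrative choices):

```python
import numpy as np

def minibatch_gradient(w, X, y, batch_size=32, rng=None):
    """Average of per-point square-loss gradients over a random mini-batch
    (sampled with replacement for simplicity)."""
    rng = rng or np.random.default_rng(0)
    idx = rng.integers(len(y), size=batch_size)
    Xb, yb = X[idx], y[idx]
    return (2.0 / batch_size) * (Xb.T @ (Xb @ w - yb))
```

In the SGD loop above, the per-point update is simply replaced by
w = w − η · minibatch_gradient(w, X, y).
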
Sub-Gradient Descent

Minimising the Lasso Objective

Linear model trained with least squares loss and ℓ1-regularisation:

    L_lasso(w) = Σ_{i=1}^N (w^T x_i − y_i)² + λ Σ_{i=1}^D |w_i|

• The quadratic part of the loss function can’t be framed as linear programming

• Lasso regularisation does not allow for closed-form solutions

• We typically resort to general optimisation methods

• We still have the problem that the objective function is not differentiable
  everywhere!

In these cases, we can use the sub-gradient descent approach:

• The function may have several sub-gradients at a given point

• Choose any of these sub-gradients in the gradient descent update formula
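
A sketch of sub-gradient descent for the lasso objective, picking sign(w_i) (with
0 at w_i = 0) as the sub-gradient of the non-differentiable term; the data, λ, and
the decaying step size are illustrative:

```python
import numpy as np

def lasso_subgradient(w, X, y, lam):
    """A valid sub-gradient of the lasso objective: np.sign(w) picks the
    sub-derivative 0 at w_i = 0 (any value in [-1, 1] would be valid there)."""
    return 2 * (X.T @ (X @ w - y)) + lam * np.sign(w)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.0])    # sparse ground truth

w = np.zeros(5)
for t in range(1, 2001):
    eta = 0.001 / np.sqrt(t)                    # decaying step size c/sqrt(t)
    w = w - eta * lasso_subgradient(w, X, y, lam=1.0)
print(w.round(3))                               # ~ (2, -1, 0, 0, 0)
```
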
Sub-gradient Descent

We discuss the case when f is convex:

    f(αx + (1 − α)y) ≤ α f(x) + (1 − α) f(y)   for all x, y and for α ∈ [0, 1]

Example: f_1(x) = 0.1x² and the piecewise function

    f_2(x) = f_1(x) if x < 2, and 2x − 3.6 otherwise

[Figure: graphs of f_1 and f_2 with tangent lines at x = 2]

A convex function lies above the tangent plane at any point.

Univariate case: f(x) ≥ f(x_0) + g·(x − x_0) for every g in the set f′(x_0) of
sub-derivatives g of f at x_0

Multivariate case: f(x) ≥ f(x_0) + g^T (x − x_0) for every g in the set ∇f(x_0) of
sub-gradients g of f at x_0
Sub-gradient Descent: Example 1

Compute the sub-derivatives of f(z) = |z| at points x_0 = 1, x_1 = −3, and x_2 = 0.

[Figure: graph of f(z) = |z|]

A sub-derivative g at x_i must satisfy f(z) ≥ f(x_i) + g·(z − x_i) for all z:

    f(z) ≥ f(x_0) + g·(z − x_0) = 1 + g·(z − 1)
    f(z) ≥ f(x_1) + g·(z − x_1) = 3 + g·(z + 3)
    f(z) ≥ f(x_2) + g·(z − x_2) = g·z

f is differentiable at x_0 = 1 and x_1 = −3, so there is a single derivative of f at
each of these points. For x_0 = 1, g = 1. For x_1 = −3, g = −1.

At x_2 = 0, f admits the sub-derivatives g ∈ [−1, 1].
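
A small numerical sanity check of the inequality at x_2 = 0 (an illustrative test,
not part of the slides):

```python
import numpy as np

# Check f(z) >= f(x2) + g*(z - x2) = g*z for f(z) = |z| at x2 = 0.
z = np.linspace(-2, 2, 401)
for g in [-1.0, -0.3, 0.0, 0.7, 1.0]:     # candidate sub-derivatives in [-1, 1]
    assert np.all(np.abs(z) >= g * z - 1e-12)

# A slope outside [-1, 1] violates the inequality somewhere:
print(np.any(np.abs(z) < 1.5 * z))        # True, e.g., at z = 1
```
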
Sub-gradient Descent: Example 2

Compute a sub-derivative of f(z) = max(z, 0) at point x_0 = 0.

[Figure: graph of f(z) = max(z, 0)]

    f(z) ≥ f(x_0) + g·(z − x_0) = g·z

The sub-derivatives g ∈ [0, 1] satisfy the above inequality.
Constrained Convex Optimisation

Constrained Convex Optimisation

Gradient descent

• Minimises f(w) by moving in the negative gradient direction at each step:

    w_{t+1} = w_t − η_t ∇f(w_t)

• There is no constraint on the parameters

Projected gradient descent

• Minimises f(w) subject to the additional constraint w ∈ C:

    z_{t+1} = w_t − η_t ∇f(w_t)
    w_{t+1} = argmin_{w_C ∈ C} ∥z_{t+1} − w_C∥

• Each gradient step is followed by a projection step
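
A sketch of projected gradient descent with projection onto an ℓ2 ball, i.e., the
ridge-style constraint ∥w∥ ≤ R (the toy objective and radius are illustrative):

```python
import numpy as np

def project_l2_ball(z, R):
    """Projection onto C = {w : ||w|| <= R}, the ridge-style constraint set."""
    norm = np.linalg.norm(z)
    return z if norm <= R else (R / norm) * z

def projected_gd(grad, w0, R, eta=0.1, T=500):
    w = np.asarray(w0, dtype=float)
    for t in range(T):
        z = w - eta * grad(w)         # gradient step
        w = project_l2_ball(z, R)     # projection step
    return w

# Toy problem: minimise ||w - (3, 4)||^2 subject to ||w|| <= 1
grad = lambda w: 2 * (w - np.array([3.0, 4.0]))
print(projected_gd(grad, w0=[0.0, 0.0], R=1.0))   # ~ (0.6, 0.8)
```
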
Constrained Convex Optimisation: Examples

Minimise (Xw − y)^T (Xw − y) subject to the ridge or lasso constraint:

    ridge: w^T w < R        lasso: Σ_{i=1}^D |w_i| < R
Second Order Methods

Newton’s Method

In calculus: Finds roots of a differentiable function f, i.e., solutions to f(x) = 0

In optimisation: Finds roots of f′, i.e., solutions to f′(x) = 0

• Function f needs to be twice differentiable

• The roots of f′ are stationary points of f, i.e., minima/maxima/saddle points
Newton’s Method in One Dimension

• Construct a sequence of points x_1, …, x_n starting with an initial guess x_0

• The sequence converges towards a minimiser x* of f using a sequence of
  second-order Taylor approximations of f around the iterates:

    f(x) ≈ f(x_k) + (x − x_k) f′(x_k) + ½ (x − x_k)² f″(x_k)

• x_{k+1} = x* is defined as the minimiser of this quadratic approximation

• If f″ is positive, then the quadratic approximation is convex,
  and a minimiser is obtained by setting the derivative to zero:

    0 = d/dx [ f(x_k) + (x − x_k) f′(x_k) + ½ (x − x_k)² f″(x_k) ]
      = f′(x_k) + (x* − x_k) f″(x_k)

  Then,

    x_{k+1} = x* = x_k − f′(x_k) [f″(x_k)]⁻¹
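
A minimal sketch of the one-dimensional iteration; it reuses f(x) = x³ − 9x from
the next slide (the starting point and iteration count are illustrative):

```python
def newton_1d(fprime, fsecond, x0, T=10):
    """Newton iteration x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for k in range(T):
        x = x - fprime(x) / fsecond(x)
    return x

# f(x) = x^3 - 9x, so f'(x) = 3x^2 - 9 and f''(x) = 6x; start at x0 = 1
x_star = newton_1d(lambda x: 3 * x**2 - 9, lambda x: 6 * x, x0=1.0)
print(x_star)   # ~ 1.7320... = sqrt(3), the local minimiser
```
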
Geometric Interpretation of Newton’s Method

At iteration k, we fit a paraboloid to the surface of f at x_k with the same slope
and curvature as the surface at x_k, and go for the extremum of that paraboloid.

Example: f(x) = x³ − 9x at a point x_0

• Gradient descent uses the first derivative: the local linear approximation
  around the current point

    g(x) = f(x_0) + (x − x_0) f′(x_0) = x_0³ − 9x_0 + (x − x_0)(3x_0² − 9)

• Newton’s method uses the second derivative: the degree-2 Taylor approximation

    r(x) = f(x_0) + (x − x_0) f′(x_0) + ½ (x − x_0)² f″(x_0)
         = x_0³ − 9x_0 + (x − x_0)(3x_0² − 9) + ½ (x − x_0)² · 6x_0
Newton’s Method in High Dimensions

First derivative → gradient        Second derivative → Hessian

• Approximate f around x_k using the second-order Taylor approximation

    f(x) ≈ f(x_k) + g_k^T (x − x_k) + ½ (x − x_k)^T H_k (x − x_k)

• The gradient of this approximation is given by:

    ∇_x f = g_k + H_k (x − x_k)

• Setting ∇_x f = 0, we obtain x* = x_k − H_k⁻¹ g_k

• We move directly to the (unique) stationary point x* of the approximation

• We repeat the above iteration with x_{k+1} = x*
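
A sketch of one multivariate Newton step; following the advice on the next slide,
it solves H_k d = −g_k with a linear solver instead of forming the inverse (the
quadratic test function is illustrative):

```python
import numpy as np

def newton_step(grad, hessian, x):
    """One Newton step: solve H_k d = -g_k rather than forming H_k^{-1}."""
    g, H = grad(x), hessian(x)
    d = np.linalg.solve(H, -g)          # cheaper and more stable than inversion
    return x + d

# Quadratic test function f(x) = 0.5 x^T A x - b^T x (its Hessian is A)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(np.allclose(A @ x, b))            # True: one step solves a quadratic
```
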
Newton’s Method: Computation and Convergence

Newton’s method

• Computational requirements at each Newton step:

  • D + (D choose 2) partial derivatives and the inverse of the Hessian

  • Instead: Compute x as the solution of the system of linear equations

        H_k (x − x_k) = −g_k

    using factorisations (e.g., Cholesky) of H_k

• For convex f

  • It converges to stationary points of the quadratic approximation

  • All stationary points are global minima

• For non-convex f

  • Stationary points may be neither minima nor in the decreasing direction of f

  • Not successful for training deep neural networks:
    abundance of saddle points for their non-convex objective functions
Summary

Convex Optimisation

• Convex optimisation is ‘efficient’ (i.e., polynomial time)

• Try to cast the learning problem as a convex optimisation problem

• Many, many extensions of standard gradient descent exist: Adagrad,
  momentum-based methods, BFGS, L-BFGS, Adam, etc.

• Books: Boyd and Vandenberghe; Nesterov

Non-Convex Optimisation

• Encountered frequently in deep learning

• (Stochastic) gradient descent gives local minima

• Book: Nonlinear Programming, Dimitri Bertsekas