1-Review of Linear Regression
Table 1: House area vs. selling price

Area (m²)   Price
30.0000     448.524
32.4138     509.248
34.8276     535.104
37.2414     551.432
39.6552     623.418
…           …
[Figure: visualization of Table 1 data — relationship between house selling price (price) and house area (m²)]
Example
Given the data in Table 1: what is the predicted selling price of a house of 50 m²?
Solution: Idea
Draw the line closest to the data points, then read off the price on that line at x = 50 m².
[Figure: data points with a fitted line; axes: price vs. m²]
Solution: Programming
• Step 1 – Training: find the line closest to the data points (called the model) → using the Gradient Descent algorithm
Formulating model
• Model formula: y = w1*x + w0 → a linear model
• The problem becomes: find w1 and w0
• Represent the input data points as: {(xi, yi), i = 1…30}
• Represent the estimated (predicted) price at xi as: ŷi = w1*xi + w0
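As a quick check of the notation, the model rendered in Python (a minimal sketch; the function name is illustrative):

```python
def predict(x, w1, w0):
    """Linear model: estimated price ŷ = w1 * x + w0."""
    return w1 * x + w0
```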
Model training
• Random parameter initialization: w1 = 1, w0 = 0 → the model becomes y = x
• Model fine-tuning: adjust (w0, w1) to reduce the difference between the true price and the estimated price
[Figure: data points and the line y = x; the gap between the true price y and the estimated price ŷ is marked; axes: price vs. m²]
Model training
• Problem: the estimated price is too far from the true price, e.g., at the data point x = 42
[Figure: difference between the true price and the estimated price at the data point x = 42 for the linear model y = x; axes: price vs. m²]
Model training
• We need a metric to evaluate the linear model with parameter set (w0, w1) = (0, 1)
[Figure: difference between the true price and the estimated price at the data point x = 42 for the linear model y = x; axes: price vs. m²]
Loss Function
• For each data point (xi, yi), the difference between the actual price and the predicted price is:
  $\frac{1}{2}(\hat{y}_i - y_i)^2$
• The difference across the entire data set is the average of the per-point differences:
  $J = \frac{1}{2} \cdot \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$
  where N is the number of data points (here N = 30)
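A minimal sketch of this loss in Python (assuming the data are NumPy arrays; names are illustrative):

```python
import numpy as np

def loss(w1, w0, x, y):
    """J = (1/2) * (1/N) * sum((y_hat - y)^2) for the linear model y_hat = w1*x + w0."""
    y_hat = w1 * x + w0                      # predicted prices
    return 0.5 * np.mean((y_hat - y) ** 2)   # mean halved squared error

```

For example, loss(1.0, 0.0, x, y) evaluates the initial model y = x from the previous slides.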
Loss Function
$J = \frac{1}{2} \cdot \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$
• J ≥ 0
• The smaller J is, the closer the model is to the actual data points
• If J = 0, the model passes through all data points exactly
Loss Function
$J = \frac{1}{2} \cdot \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$
• The problem is transformed from: finding the linear model y = w1*x + w0 closest to the data points
• → to: finding the parameters (w0, w1) for which J attains its minimum value
• → use the Gradient Descent algorithm to find the minimum of J (a sketch follows below)
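A minimal sketch of gradient descent applied to J (the algorithm itself is defined on the next slide). The gradient formulas follow from differentiating J with respect to w1 and w0; the learning rate and epoch count are assumed, illustrative hyperparameters, not values from the slides:

```python
import numpy as np

def train(x, y, lr=1e-4, epochs=1000):
    """Minimize J = (1/2)(1/N) * sum((y_hat - y)^2) over (w0, w1) by gradient descent."""
    w1, w0 = 1.0, 0.0                       # initialization used on the slides
    for _ in range(epochs):
        y_hat = w1 * x + w0                 # current predictions
        grad_w1 = np.mean((y_hat - y) * x)  # dJ/dw1
        grad_w0 = np.mean(y_hat - y)        # dJ/dw0
        w1 -= lr * grad_w1
        w0 -= lr * grad_w0
    return w1, w0
```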
Gradient Descent Algorithm
• Idea: use the derivative to find the minimum value of a function f(x)
• Algorithm:
(1) Random initialization: x = x0
(2) Update: x = x − learning_rate * f′(x)
(3) Re-compute f(x). Stop if f(x) is small enough; otherwise repeat step (2)
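A direct transcription of steps (1)–(3) into Python (a minimal sketch; tol and max_steps are assumed stopping criteria not given on the slide):

```python
def gradient_descent(f, f_prime, x0, learning_rate=0.1, tol=1e-6, max_steps=1000):
    """Generic 1-D gradient descent following steps (1)-(3) above."""
    x = x0                                   # (1) initialization
    for _ in range(max_steps):               # bounded number of repetitions
        x = x - learning_rate * f_prime(x)   # (2) gradient step
        if f(x) < tol:                       # (3) stop when f(x) is small enough
            break                            #     (assumes the minimum is near 0,
    return x                                 #      as in the slides' x**2 example)
```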
Gradient Descent Algorithm
Note:
• learning_rate is a non-negative constant
• step (2) is repeated until it has been run a sufficiently large number of times or until f(x) is small enough
Gradient Descent: example
[Figure: a curve f(x) with gradient descent steps at labeled points, starting from point A]
Step 1: Random initialization x = −2 (point A)
Step 2: Compute f′(x), then update x = xA − learning_rate * f′(xA)
Step 3: Compute f(x) → still large → move to point C, and repeat Step 2
Gradient Descent: example
In detail: if we choose the initial value x = 10 and learning_rate = 0.1 (here f(x) = x², so f′(x) = 2x), steps 2 and 3 produce the values in the following table:

Table 2: Gradient descent on f(x) = x² with x = 10, learning_rate = 0.1

Iteration   x      f(x)
1           8.00   64.00
2           6.40   40.96
3           5.12   26.21
4           4.10   16.78
5           3.28   10.74
6           2.62   6.87
7           2.10   4.40
8           1.68   2.81
9           1.34   1.80
10          1.07   1.15
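A short script that reproduces Table 2, assuming f(x) = x² as inferred from the tabulated values:

```python
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x

x = 10.0
learning_rate = 0.1
for step in range(1, 11):
    x = x - learning_rate * f_prime(x)       # step (2): gradient update
    print(f"{step:2d}  {x:6.2f}  {f(x):6.2f}")  # step (3): re-compute f(x)
```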
Gradient Descent: example
[Figure: visualization of Table 2 — the values of x and f(x) shrinking toward 0 over the iterations]
Effect of Learning Rate Selection
[Figure: loss curves over training epochs for different learning rates]
• Epoch: the number of times step (2) has been executed
• Loss: the function whose minimum value we are seeking
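To see this effect numerically, a small sketch on the earlier example f(x) = x² (the learning rates below are assumptions chosen for illustration): a too-small rate converges slowly, while a rate above 1 makes the updates diverge.

```python
def f_prime(x):
    return 2 * x                 # derivative of f(x) = x^2

for lr in (0.01, 0.1, 0.5, 1.1):
    x = 10.0                     # same initial value as Table 2
    for _ in range(20):          # 20 epochs of step (2)
        x -= lr * f_prime(x)
    print(f"lr={lr}: x={x:.4f}, f(x)={x*x:.4f}")
```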
Practical Work 01
• Set up a Python environment
– Anaconda
– Or virtualenv
– …
• Write Python code to plot the data in the table below (a possible solution sketch follows the table)
Practical Work 01
Data to plot (Table 2):

Iteration   x      f(x)
1           8.00   64.00
2           6.40   40.96
3           5.12   26.21
4           4.10   16.78
5           3.28   10.74
6           2.62   6.87
7           2.10   4.40
8           1.68   2.81
9           1.34   1.80
10          1.07   1.15
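A possible solution sketch for this practical work, assuming matplotlib is installed:

```python
import matplotlib.pyplot as plt

iterations = list(range(1, 11))
x_values = [8.00, 6.40, 5.12, 4.10, 3.28, 2.62, 2.10, 1.68, 1.34, 1.07]
f_values = [64.00, 40.96, 26.21, 16.78, 10.74, 6.87, 4.40, 2.81, 1.80, 1.15]

plt.plot(iterations, f_values, marker="o", label="f(x)")  # loss values from Table 2
plt.plot(iterations, x_values, marker="s", label="x")     # parameter values
plt.xlabel("Iteration")
plt.ylabel("Value")
plt.title("Gradient descent on f(x) = x^2")
plt.legend()
plt.show()
```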