Lecture 05
Spring 2024
Linear Regression
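Dependent variable (y) vs. independent variable (x), e.g. weight gain vs. intake of food
Positive relationship: y increases as x increases
Negative relationship: y decreases as x increases
Regression line: the line that minimizes the difference between the estimated and actual values
Error: the difference between an actual value and its estimated value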
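The idea of minimizing the difference between estimated and actual values can be made concrete with a small sketch. It assumes the error is totalled as a sum of squared differences (the usual least-squares choice, not stated on the slide) and uses five sample points. The helper name `sum_squared_error` and the second candidate line are hypothetical, chosen only for illustration.

```python
# Five sample (x, y) points, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_error(b0, b1):
    """Sum of squared gaps between each actual y and the estimate b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines: the one with the smaller total error fits better.
sse_a = sum_squared_error(2.2, 0.6)  # y = 2.2 + 0.6x
sse_b = sum_squared_error(1.0, 1.0)  # y = 1 + x (arbitrary alternative, an assumption)

print(f"SSE for y = 2.2 + 0.6x: {sse_a:.2f}")
print(f"SSE for y = 1 + x:      {sse_b:.2f}")
```

Of the two candidates, the first has the smaller total squared error, so it is the better fit to these points.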
Linear Regression
$\hat{y} = b_0 + b_1 x_1$ (the same form as $y = mx + C$)
Where
$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{6}{10} = 0.6$ (slope of the line)
The line passes through the mean point $(\bar{x}, \bar{y}) = (3, 4)$:
$4 = b_0 + 0.6 \cdot 3$, so $b_0 = 2.2$
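x   y   x - x̄    y - ȳ    (x - x̄)^2   (x - x̄)(y - ȳ)
1   2   1-3=-2   2-4=-2   4           4
2   4   2-3=-1   4-4=0    1           0
3   5   3-3=0    5-4=1    0           0
4   4   4-3=1    4-4=0    1           0
5   5   5-3=2    5-4=1    4           2
Mean: x̄ = 3, ȳ = 4. Sums: ∑(x - x̄)^2 = 10, ∑(x - x̄)(y - ȳ) = 6
[Plot: the five data points; x axis 0 to 5, y axis 0 to 6]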
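The slope and intercept calculation from the table can be sketched in plain Python. This is a minimal illustration of the two formulas, not a library implementation; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for y_hat = b0 + b1 * x using the least-squares formulas."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Numerator of b1: sum of (x - x_mean) * (y - y_mean)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator of b1: sum of (x - x_mean)^2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line passes through (x_mean, y_mean), which gives b0.
    b0 = y_mean - b1 * x_mean
    return b0, b1


# The five points from the lecture table.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")
```

On the lecture's five points this reproduces the slope 0.6 and intercept 2.2.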
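Linear Regression
$R^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = \frac{3.6}{6} = 0.6$
The range of $R^2$ is from 0 to 1.
$R^2 = 0.6$ means the line accounts for 60% of the variation in y around its mean, which is taken here as a good fit.
Fitted line: $\hat{y} = 2.2 + 0.6x$
[Plot: the five data points with the fitted line; x axis 0 to 5, y axis 0 to 6]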
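x   y   x - x̄    y - ȳ    (y - ȳ)^2   ŷ     ŷ - ȳ          (ŷ - ȳ)^2   (x - x̄)^2   (x - x̄)(y - ȳ)
1   2   1-3=-2   2-4=-2   4           2.8   2.8-4=-1.2     1.44        4           4
2   4   2-3=-1   4-4=0    0           3.4   -0.6           0.36        1           0
3   5   3-3=0    5-4=1    1           4.0   0              0           0           0
4   4   4-3=1    4-4=0    0           4.6   0.6            0.36        1           0
5   5   5-3=2    5-4=1    1           5.2   1.2            1.44        4           2
Mean: x̄ = 3, ȳ = 4. Sums: ∑(y - ȳ)^2 = 6, ∑(ŷ - ȳ)^2 = 3.6, ∑(x - x̄)^2 = 10, ∑(x - x̄)(y - ȳ) = 6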
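The goodness-of-fit ratio can be checked with a short sketch that recomputes the two sums from the table. It assumes the fitted line y_hat = 2.2 + 0.6x from the lecture; the function name `r_squared` is hypothetical.

```python
# The five points and the fitted line from the lecture.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6


def r_squared(xs, ys, b0, b1):
    """Ratio of explained variation to total variation around the mean of y."""
    y_mean = sum(ys) / len(ys)
    y_hat = [b0 + b1 * x for x in xs]
    # Sum of (y_hat - y_mean)^2: variation explained by the line
    explained = sum((yh - y_mean) ** 2 for yh in y_hat)
    # Sum of (y - y_mean)^2: total variation in the actual values
    total = sum((y - y_mean) ** 2 for y in ys)
    return explained / total


r2 = r_squared(xs, ys, b0, b1)
print(f"R^2 = {r2:.2f}")
```

The two sums are 3.6 and 6, matching the table totals, so the ratio is 0.6.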
Bias: Underfitting
Bias is the gap between the actual and estimated values.
High bias means the estimated values are far from the actual values; low bias means they are close.
It occurs when the algorithm has limited flexibility to learn.
A high-bias model pays little attention to the training data and oversimplifies.
Such models lead to high error on both training and test data.
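Underfitting can be illustrated with a rough sketch in which a straight line is fitted to data that actually follows a curve. The quadratic data set, the train/test split, and the helper names `fit_line` and `mse` are all assumptions made for this example, not part of the lecture.

```python
def fit_line(xs, ys):
    """Least-squares straight line: returns (b0, b1) for y_hat = b0 + b1 * x."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


def mse(xs, ys, b0, b1):
    """Mean squared error of the line on the given points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed data: y = x^2, a relationship a straight line cannot capture.
train_x = list(range(10))
train_y = [x ** 2 for x in train_x]
test_x = [x + 0.5 for x in range(9)]
test_y = [x ** 2 for x in test_x]

b0, b1 = fit_line(train_x, train_y)
train_mse = mse(train_x, train_y, b0, b1)
test_mse = mse(test_x, test_y, b0, b1)

# The oversimplified model is far off on both sets.
print(f"train MSE = {train_mse:.1f}, test MSE = {test_mse:.1f}")
```

The line is too simple for the curve, so the error stays large on the training points as well as on the unseen test points.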
Variance: Overfitting
Variance is how scattered the estimated values are.
A model with high variance pays too much attention to the training data and does not generalize to new data.
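Overfitting can be illustrated with a deliberately extreme sketch: a model that simply memorizes the training points. The noisy data (roughly y = 2x), the test points, and the nearest-neighbour lookup used as the "model" are all assumptions chosen for illustration; the lecture does not prescribe this technique.

```python
# Assumed training data: roughly y = 2x, with hand-picked noise of about +/- 1.
train = [(1, 2.9), (2, 3.2), (3, 7.1), (4, 6.8), (5, 11.2), (6, 10.9)]
# Assumed test data: unseen x values with noise-free y = 2x.
test = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2), (5.4, 10.8)]


def memorizing_model(x):
    """Predict the y of the closest training x (1-nearest neighbour lookup)."""
    nearest_x, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y


def mse(points, predict):
    """Mean squared error of a prediction function on (x, y) points."""
    return sum((y - predict(x)) ** 2 for x, y in points) / len(points)


train_mse = mse(train, memorizing_model)
test_mse = mse(test, memorizing_model)

# Zero error on the training data, noticeably worse on unseen data.
print(f"train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

The model reproduces every training value exactly, noise included, which is why its error rises as soon as it sees points it has not memorized.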
Anaconda is a free, open-source distribution of Python for machine learning, bundled with tools like Spyder.
Jupyter Notebook is a platform for writing and running code interactively.
Google provides free GPU access online; a paid tier is also available and is faster.
A GPU has a larger number of cores, which allows parallel computation.
Pandas Application Programming Interface (API)
Numerical Python (NumPy)
Pandas is an open-source Python library and data analysis tool, providing high performance