LINEAR REGRESSION

Nehal Khosla, Priyanshu Parida

National Institute of Science Education and Research

January 30, 2023


PART I: LINEAR REGRESSION: PROBLEM & SOLUTION

1 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Linear Regression: The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Regression Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Mathematical Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Loss Function: Mean Squared Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Linear Regression: The Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11


3.1 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Quality of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1 / 18
PART II: LINEAR REGRESSION: THE INDUCTIVE BIAS

1 Inductive Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.1 A List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2 Choice of Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 / 18
PART III: LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS

1 Applications and Shortcomings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 / 18
Part I

LINEAR REGRESSION: PROBLEM & SOLUTION

4 / 18
REGRESSION

▶ Regression is a statistical method that estimates the strength and nature of the relationship between a dependent variable and one or more independent variables.
▶ It does so by finding a curve that minimizes the error between the actual and predicted values of the dependent variable on the training data set.
▶ For proper interpretation of regression, several assumptions (the inductive bias) about the data and the model must hold.
▶ Linear regression is one of the most common forms of this method. It models a linear relationship between the variables.

5 / 18
LINEAR REGRESSION
INTRODUCTION

▶ Linear regression is a supervised machine learning algorithm.
▶ The model finds a best-fit line that establishes a linear relationship between the dependent variable (y) and the independent variable (x).
▶ The model then uses this fit to predict the appropriate y-values for unseen x-values.
▶ The best-fit line is obtained by minimizing the error between predicted and actual values, i.e. by minimizing the loss function, as sketched below.
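
As a minimal sketch of this fit-and-predict workflow (using NumPy's polyfit for the least-squares fit; the toy data below is made up for illustration):

import numpy as np

# Toy training data: y is roughly 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

# Degree-1 polyfit returns the least-squares slope and intercept,
# i.e. the best-fit line that minimizes the squared-error loss
a, b = np.polyfit(x, y, deg=1)

# Use the fitted line to predict y for an unseen x-value
print(a * 12.0 + b)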

6 / 18
LINEAR REGRESSION
REGRESSION LINE

The line showing the linear relationship between the dependent and independent variables is called
a regression line. An example of a regression line is shown below:

Figure: Regression line for log R (dependent variable) vs log d (independent variable).

The regression line may be positive, wherein the dependent variable increases as the independent
variable increases; or negative, wherein the dependent variable decreases as the independent
variable increases (as in the figure above).

7 / 18
LINEAR REGRESSION: THE PROBLEM
TYPES

Linear regression may be classified further into the following two types:
▶ Simple Linear Regression: Assumes a linear relationship between a single independent
variable and a dependent variable.
▶ Multiple Linear Regression: Assumes a linear relationship between two or more independent
variables and a dependent variable.

8 / 18
LINEAR REGRESSION: THE PROBLEM
MATHEMATICAL REPRESENTATION

Once a linear relationship has been determined by the algorithm, the general form of each model
may be represented as follows:
▶ Simple Linear Regression
y = ax + b + u
▶ Multiple Linear Regression
y = a1 x1 + a2 x2 + · · · + an xn + b + u
where:
y = Dependent variable
x = Independent variable
a = Slope(s) of the variable(s)
b = The y-intercept
u = The regression residual/error term
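
In code, both model forms reduce to a dot product plus an intercept; a minimal NumPy sketch (the function names here are illustrative, not from the lecture):

import numpy as np

def predict_simple(x, a, b):
    # Simple linear regression: y = a*x + b
    return a * x + b

def predict_multiple(X, a, b):
    # Multiple linear regression: y = a1*x1 + a2*x2 + ... + an*xn + b
    # X has shape (n_samples, n_features); a has shape (n_features,)
    return X @ a + b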

9 / 18
LINEAR REGRESSION: THE PROBLEM
LOSS FUNCTION: MEAN SQUARED ERROR

▶ The regression line is obtained by minimizing the mean squared error (the loss function) over all points in the training set. The loss function is given as:

MSE = (1/N) Σ (yi − f(xi))²

where f(x) = a1 x1 + a2 x2 + · · · + an xn + b
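
This loss translates directly into NumPy (a sketch; mse is an illustrative name):

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of squared residuals over all N points
    return np.mean((y_true - y_pred) ** 2)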

10 / 18
LINEAR REGRESSION: THE SOLUTION
SOLUTION

The best-fit line may be found in the following two ways:
▶ Closed-form (exact) solution:
• It expresses the solution in terms of simple functions and mathematical operators.
• The closed-form solution for linear regression is:

B = (X′X)⁻¹X′Y

where B = Matrix of regression parameters
X = Matrix of x values
X′ = Transpose of X
Y = Matrix of y values
• Although this method gives an exact solution, it is computationally expensive when the number of dimensions grows, since it requires inverting the matrix X′X.
▶ Gradient Descent:
• It minimizes the MSE by computing the gradient of the loss function.
• It is an iterative optimization algorithm: the parameters are repeatedly updated in the direction opposite to the gradient (see the sketch below).
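
Both methods can be sketched in a few lines of NumPy (illustrative code: the learning rate and iteration count are arbitrary choices, and np.linalg.solve is applied to the normal equations rather than forming an explicit inverse, for numerical stability):

import numpy as np

def closed_form(X, y):
    # B = (X'X)^(-1) X'Y, with a column of ones appended for the intercept b
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    # Iteratively step against the gradient of the MSE loss
    Xb = np.column_stack([X, np.ones(len(X))])
    B = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residuals = Xb @ B - y                      # predicted minus actual
        grad = (2.0 / len(y)) * (Xb.T @ residuals)  # gradient of MSE w.r.t. B
        B -= lr * grad
    return B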

11 / 18
LINEAR REGRESSION: THE SOLUTION
QUALITY OF FIT

▶ The goodness of the fit indicates how linearly the variables are correlated.
▶ The goodness of fit may be measured using the Pearson correlation coefficient, which is given by:

r = Σ (xi − x̄)(yi − ȳ) / √( Σ (xi − x̄)² · Σ (yi − ȳ)² )

▶ The closer |r| is to 1, the better the fit; the sign of r gives the direction of the relationship.
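
The coefficient is short to compute in NumPy, either written out from the formula above or taken from np.corrcoef (a sketch; pearson_r is an illustrative name):

import numpy as np

def pearson_r(x, y):
    # r = Σ(xi − x̄)(yi − ȳ) / sqrt(Σ(xi − x̄)² · Σ(yi − ȳ)²)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Equivalent: np.corrcoef(x, y)[0, 1] returns the same value
# (the off-diagonal entry of the 2x2 correlation matrix).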

12 / 18
Part II

LINEAR REGRESSION: THE INDUCTIVE BIAS

13 / 18
INDUCTIVE BIAS
A LIST

Linear regression makes the following assumptions, which form its inductive bias:

▶ Linearity: the dependent and independent variables are linearly related.
▶ Homoscedasticity: the variance of the error term is the same for all points.
▶ MSE is the most appropriate loss function for linear regression.

14 / 18
CHOICE OF LOSS FUNCTION

Let us analyse some candidate loss functions to justify the choice of MSE.
▶ L1 = (y − f(x)): This loss takes both positive and negative values, which cancel out to give a
near-zero total error on large data sets even when the fit is poor.
▶ L2 = |y − f(x)|: Although errors no longer cancel, outliers are penalised no more heavily than
typical points.
▶ L3 = (y − f(x))²: Errors do not cancel, and outliers are penalised more heavily, giving a more
appropriate regression line.
Hence, MSE is an appropriate choice of loss function. (A numerical comparison follows below.)
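
A quick numerical check of the three candidates (the residual values are toy numbers, chosen so the signed errors cancel):

import numpy as np

# Residuals y − f(x) for a poor fit whose signed errors happen to cancel
residuals = np.array([3.0, -3.0, 2.0, -2.0, 4.0, -4.0])
print(np.mean(residuals))            # L1: 0.0 -- looks like a perfect fit, but is not
print(np.mean(np.abs(residuals)))    # L2: 3.0 -- errors no longer cancel
print(np.mean(residuals ** 2))       # L3: 9.67

# Add a single outlier and compare how L2 and L3 react
with_outlier = np.append(residuals, 20.0)
print(np.mean(np.abs(with_outlier)))  # L2: 5.43 -- outlier weighted like any other point
print(np.mean(with_outlier ** 2))     # L3: 65.43 -- outlier dominates the loss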

15 / 18
Part III

LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS

16 / 18
APPLICATIONS AND SHORTCOMINGS

▶ Linear regression finds applications in several fields, such as market analysis, financial analysis,
environmental health, and medicine.
▶ However, it leaves something to be desired. A linear correlation does not indicate causation, i.e. a
connection between two variables does not imply that one causes the other.
▶ Linear regression is sensitive to noise and prone to overfitting.
▶ It is prone to multicollinearity, i.e. the occurrence of correlation between two or more independent
variables, which reduces the statistical significance of each independent variable.
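
One simple way to detect multicollinearity is to inspect the correlation matrix of the independent variables (a sketch on fabricated toy features, where x2 is nearly a copy of x1):

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=100)  # x2 nearly duplicates x1
X = np.column_stack([x1, x2])

# Off-diagonal entries near +/-1 flag highly correlated feature pairs
print(np.corrcoef(X, rowvar=False))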

17 / 18
