LINEAR REGRESSION
Nehal Khosla, Priyanshu Parida
National Institute of Science Education and Research
January 30, 2023
PART I: LINEAR REGRESSION: PROBLEM & SOLUTION
1 Regression
2 Linear Regression: The Problem
2.1 Introduction
2.2 Regression Line
2.3 Types
2.4 Mathematical Representation
2.5 Loss Function: Mean Squared Error
3 Linear Regression: The Solution
3.1 Solution
3.2 Quality of Fit
PART II: LINEAR REGRESSION: THE INDUCTIVE BIAS
1 Inductive Bias
1.1 A List
1.2 Choice of Loss Function
PART III: LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS
1 Applications and Shortcomings
Part I
LINEAR REGRESSION: THE PROBLEM & SOLUTION
REGRESSION
▶ Regression is a statistical method that estimates the strength and nature of the relationship
between a dependent variable and one or more independent variables.
▶ It does so by finding a curve that minimizes the error between the actual and predicted values of
the dependent variable on the training data set.
▶ For proper interpretation of regression, several assumptions (inductive bias) about the data and
the model must hold.
▶ Linear regression is one of the most common forms of this method. It assumes a linear
relationship between the dependent and independent variables.
LINEAR REGRESSION
INTRODUCTION
▶ Linear regression is a supervised machine learning algorithm.
▶ The model tries to find a best-fit line that captures the linear relationship between the dependent
(y) and independent (x) variables.
▶ The model then uses this fit to predict the appropriate y-values for unseen x-values.
▶ The best-fit line is obtained by minimizing the error between the predicted and actual values,
i.e. by minimizing a loss function.
LINEAR REGRESSION
REGRESSION LINE
The line showing the linear relationship between the dependent and independent variables is called
a regression line. An example of a regression line is shown below:
[Figure: regression line for log R (dependent variable) vs log d (independent variable).]
The regression line may be positive, where the dependent variable increases as the independent
variable increases; or it may be negative, where the dependent variable decreases as the
independent variable increases (as in the figure above).
LINEAR REGRESSION: THE PROBLEM
TYPES
Linear regression may be classified further into the following two types:
▶ Simple Linear Regression: Assumes a linear relationship between a single independent
variable and a dependent variable.
▶ Multiple Linear Regression: Assumes a linear relationship between two or more independent
variables and a dependent variable.
LINEAR REGRESSION: THE PROBLEM
MATHEMATICAL REPRESENTATION
Once a linear relationship has been determined by the algorithm, the general form of each model
may be represented as follows:
▶ Simple Linear Regression
y = ax + b + u
▶ Multiple Linear Regression
y = a1x1 + a2x2 + ... + anxn + b + u
where:
y = Dependent variable
x = Independent variable
a = Slope(s) of the variable(s)
b = The y-intercept
u = The regression residual/error term
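As a quick illustration, here is a minimal sketch (with made-up coefficient values, not taken from the slides) of how the two model forms produce predictions once the parameters a and b are known:

    import numpy as np

    # Simple linear regression: y = a*x + b (assumed slope a = 2, intercept b = 1)
    a, b = 2.0, 1.0
    x = np.array([0.0, 1.0, 2.0])
    y_simple = a * x + b                      # -> [1. 3. 5.]

    # Multiple linear regression: y = a1*x1 + a2*x2 + a3*x3 + b
    coeffs = np.array([0.5, -1.2, 3.0])       # assumed slopes a1, a2, a3
    X = np.array([[1.0, 2.0, 0.5],
                  [0.0, 1.0, 1.5]])           # two samples, three features
    y_multiple = X @ coeffs + b               # one prediction per sample

    print(y_simple)
    print(y_multiple)

The residual term u is not computed here: it represents the part of y that the model does not explain.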
LINEAR REGRESSION: THE PROBLEM
LOSS FUNCTION: MEAN SQUARED ERROR
▶ The regression line is obtained by minimizing the mean squared error (the loss function) over
all points in the training set. The loss function is given as:
MSE = (1/N) Σ (yi − f(xi))²
where f(x) = a1x1 + a2x2 + ... + anxn + b
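A minimal sketch of this loss on made-up data (the predictions are assumed to come from an already-fitted model):

    import numpy as np

    y_true = np.array([1.1, 2.0, 2.9, 4.2])   # observed values y
    y_pred = np.array([1.0, 2.0, 3.0, 4.0])   # model outputs f(x)

    mse = np.mean((y_true - y_pred) ** 2)     # (1/N) * sum of squared errors
    print(mse)                                # 0.015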
LINEAR REGRESSION: THE SOLUTION
SOLUTION
The best-fit line may be found in the following two ways:
▶ Closed form (Exact form) Solution:
• It solves the problem in terms of simple functions and mathematical operators.
• The closed form solution for linear regression is as follows:
B = (X′X)⁻¹ X′Y
where B = Matrix of regression parameters
X = Matrix of X values
X’ = Transpose of X
Y = Matrix of Y values
• Although this method gives an exact solution, inverting X′X is computationally expensive when
the number of dimensions (independent variables) grows large.
▶ Gradient Descent:
• It is an iterative optimization algorithm that minimizes the MSE by repeatedly stepping against
the gradient of the loss function (a sketch of both approaches is given below).
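The following is a minimal sketch of both approaches on synthetic one-dimensional data; the true slope 3, intercept 2, learning rate, and iteration count are all assumptions made for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=50)   # noisy line

    X = np.column_stack([x, np.ones_like(x)])            # column of ones -> intercept

    # Closed-form (normal equation): B = (X'X)^(-1) X'Y
    B_closed = np.linalg.solve(X.T @ X, X.T @ y)

    # Gradient descent: repeatedly step against the gradient of the MSE
    B = np.zeros(2)
    learning_rate = 0.01
    for _ in range(5000):
        gradient = (2.0 / len(y)) * X.T @ (X @ B - y)    # d(MSE)/dB
        B -= learning_rate * gradient

    print(B_closed)   # roughly [3, 2]
    print(B)          # should be close to the closed-form result

Using np.linalg.solve rather than an explicit matrix inverse is standard numerical practice; both give the same normal-equation solution.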
LINEAR REGRESSION: THE SOLUTION
QUALITY OF FIT
▶ The goodness of the fit measures how strongly the variables are linearly correlated.
▶ The goodness of fit may be calculated using the Pearson correlation coefficient, which is given
by:
r = Σ(xi − x̄)(yi − ȳ) / √( Σ(xi − x̄)² Σ(yi − ȳ)² )
▶ The closer |r| is to 1, the better the fit.
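A minimal sketch of this coefficient on made-up data, computed directly from the formula and cross-checked against NumPy's built-in np.corrcoef:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    r = num / den

    print(r)                        # close to 1: strong linear relationship
    print(np.corrcoef(x, y)[0, 1])  # same value from NumPy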
Part II
LINEAR REGRESSION: THE INDUCTIVE BIAS
INDUCTIVE BIAS
A LIST
Linear regression makes the following assumptions, or inductive biases:
▶ The assumption that the dependent and independent variables are linearly related.
▶ Homoscedasticity: The assumption that the variance of the error term is the same for all points.
▶ The assumption that MSE is the most appropriate loss function for linear regression.
CHOICE OF LOSS FUNCTION
Let us analyse some loss functions to justify the choice of MSE as an appropriate loss function.
▶ L1 = (y − f(x)): This loss takes both positive and negative values, which cancel out and give a
near-zero total error even on a poorly fitting line.
▶ L2 = |y − f(x)|: Errors no longer cancel out, but outliers are penalised only in proportion to their
size, the same as ordinary points.
▶ L3 = (y − f(x))²: Errors do not cancel out, and outliers are penalised more heavily, giving a
more appropriate regression line.
Hence, MSE is an appropriate choice for loss function.
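A minimal sketch of the three candidate losses on made-up residuals (y − f(x) values), illustrating the cancellation and outlier behaviour described above:

    import numpy as np

    residuals = np.array([2.0, -2.0, 0.5, -0.5, 10.0])   # last value is an outlier

    l1 = np.sum(residuals)             # signed errors: positives and negatives cancel
    l2 = np.sum(np.abs(residuals))     # absolute errors: outlier counted once
    l3 = np.sum(residuals ** 2)        # squared errors: outlier dominates the total

    print(l1, l2, l3)                  # 10.0 15.0 108.5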
Part III
LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS
APPLICATIONS AND SHORTCOMINGS
▶ Linear regression finds its applications in several fields, like market analysis, financial analysis,
environmental health, and medicine.
▶ However, it does leave some things to be desired. A linear correlation does not indicate
causation, i.e. a connection between two variables does not imply that one causes the other.
▶ Linear regression is sensitive to noise and prone to overfitting.
▶ It is prone to multicollinearity, i.e. the occurrence of correlation between two or more independent
variables, which reduces the statistical significance of the individual variables.