Linear Regression
Linear regression is a fundamental statistical method used to
model the relationship between a dependent variable and one or
more independent variables. It is widely applied in various fields,
including economics, biology, and social sciences, to predict
outcomes and understand data trends.
What is Linear Regression?
Linear regression is a predictive modeling technique that estimates
the relationship between variables by fitting a linear equation. It
aims to predict the dependent variable based on the independent
variable(s).
Definition
A predictive modeling technique to estimate relationships.
Equation
Y = a + bX + ε, where Y is the dependent variable, X is the
independent variable, a is the intercept, b is the slope, and ε is
the error term.
Types
Simple linear regression (one independent variable) and multiple
linear regression (two or more independent variables).
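The equation and the two types above can be illustrated with a minimal least-squares sketch. It assumes NumPy is available; the function names fit_simple and fit_multiple and the sample data are hypothetical, chosen only for illustration. Ordinary least squares is assumed as the fitting method, since the text does not name one.

```python
import numpy as np


def fit_simple(x, y):
    """Least-squares estimates (a, b) for Y = a + bX + ε with one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # Slope: co-variation of X and Y divided by the variation of X.
    b = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    # Intercept: the fitted line passes through the point of means.
    a = y.mean() - b * x.mean()
    return a, b


def fit_multiple(X, y):
    """Least-squares coefficients [a, b1, ..., bk] for several predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Illustrative, noise-free data (made up for this sketch).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x
a, b = fit_simple(x, y)

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 5.0]])
y_multi = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
coef = fit_multiple(X, y_multi)

print("simple:", a, b)
print("multiple:", coef)
```

Because the sample data contain no noise, the estimates should recover the generating values (intercept 2 and slope 3 in the simple case) up to floating-point error. With real data the error term ε is nonzero and the estimates are only approximations.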
Assumptions of Linear Regression
Linearity
The relationship between the dependent and independent variables should be linear.
Independence
Observations should be independent of each other.
Homoscedasticity
The residuals should have constant variance at all levels of the independent variable(s).
Normality
The residuals should ideally follow a normal distribution.
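Several of these assumptions can be probed numerically from the residuals of a fitted model. The sketch below is one rough way to do so, assuming NumPy and a single predictor. The function name residual_diagnostics, the choice of statistics, and the simulated data are assumptions made for illustration. These are crude indicators, not formal tests, and a residual plot is usually still worth inspecting.

```python
import numpy as np


def residual_diagnostics(x, y):
    """Rough numeric checks on the residuals of a simple linear fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    resid = y - (intercept + slope * x)

    # Homoscedasticity: compare residual variance for high vs. low X.
    # A ratio far from 1 hints at non-constant variance.
    ordered = resid[np.argsort(x)]
    half = len(ordered) // 2
    variance_ratio = ordered[half:].var(ddof=1) / ordered[:half].var(ddof=1)

    # Independence: Durbin-Watson statistic on residuals in observation
    # order. It lies between 0 and 4; values near 2 suggest little
    # first-order autocorrelation.
    durbin_watson = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

    # Normality: sample skewness, which is 0 for a symmetric distribution.
    skewness = np.mean((resid - resid.mean()) ** 3) / resid.std() ** 3

    return {
        "variance_ratio": variance_ratio,
        "durbin_watson": durbin_watson,
        "skewness": skewness,
    }


# Simulated data (hypothetical): one series with constant noise and one
# whose noise grows with X.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y_constant = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, x.size)
y_growing = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, x.size) * (0.1 + x)

homo = residual_diagnostics(x, y_constant)
hetero = residual_diagnostics(x, y_growing)
print("constant noise:", homo)
print("growing noise:", hetero)
```

In this simulation the variance ratio should stay close to 1 for the constant-noise series and be clearly larger for the series whose noise grows with X, which is the pattern that homoscedasticity rules out.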
Benefits of Linear Regression
Simplicity
Easy to understand and interpret, making data analysis accessible.
Efficiency
Requires little computational power, allowing quick analyses and predictions.
Insightful
Coefficients provide insights into the relationships between variables.
Versatile
Applicable in finance, healthcare, and marketing to solve diverse problems.
Limitations of Linear Regression
Linear regression is a widely used method; however, it has several limitations that can affect its predictive
accuracy and reliability in real-world applications.
Linearity Assumption
The method assumes a linear relationship, which may not always hold true for real-world data, leading to inaccurate predictions.
Sensitivity to Outliers
Linear regression is sensitive to outliers, which can disproportionately influence the results, skewing the model.
Multicollinearity
In multiple linear regression, high correlation among independent variables can lead to unreliable coefficient estimates and difficulties in interpretation.
Overfitting
Adding too many variables can lead to a model that fits the training data well but performs poorly on unseen data.
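Two of these limitations are easy to demonstrate on small made-up data. The sketch below assumes NumPy. The helper names fitted_slope and variance_inflation_factors and all data values are hypothetical. The variance inflation factor (VIF) is a common multicollinearity diagnostic that the text above does not mention, and it is brought in here only as an illustration.

```python
import numpy as np


def fitted_slope(x, y):
    """Slope of the least-squares line through (x, y)."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


def variance_inflation_factors(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        total = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - np.sum(resid ** 2) / total
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Sensitivity to outliers: a single shifted point changes the slope.
x = np.arange(10.0)
y = 2.0 + 3.0 * x
y_with_outlier = y.copy()
y_with_outlier[-1] += 50.0  # one corrupted observation
slope_clean = fitted_slope(x, y)
slope_outlier = fitted_slope(x, y_with_outlier)
print("slope without / with outlier:", slope_clean, slope_outlier)

# Multicollinearity: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print("VIFs:", vifs)
```

Changing one of ten observations shifts the fitted slope well away from its original value of 3. In the second part, the two nearly identical predictors should show very large VIFs, while the unrelated predictor stays near 1. A VIF above roughly 5 to 10 is often treated as a warning sign, although that threshold is a rule of thumb rather than a fixed standard.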