11, 12. Predictive Analysis
Multiple Regression
Dr. Rajarshi Debnath
Marketing Area
FORE School of Management,
New Delhi
Agenda
• Linear Regression
• Multiple Regression
Ŷ = a + bX
Product Moment Correlation
• A scatter diagram, or scattergram, is a plot of the values of two
variables for all the cases or observations.
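The strength of the linear pattern seen in a scatter diagram can be summarized by the product moment correlation r. The sketch below computes r from its definition; the five data points are hypothetical and serve only as an illustration.

```python
import math


def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # r = 0.775
```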
Conducting Bivariate Regression Analysis
Which Straight Line is Best?
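Under the usual least-squares criterion, the best line Ŷ = a + bX is the one that minimizes the sum of squared vertical distances between the observed and predicted values of Y. A minimal sketch, assuming that criterion and using hypothetical data; the comparison line (a = 2.0, b = 0.7) is an arbitrary alternative chosen for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y-hat = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared residuals for the line Y-hat = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
sse_best = sum_squared_errors(x, y, a, b)
sse_other = sum_squared_errors(x, y, 2.0, 0.7)  # arbitrary alternative line

print(f"a = {a:.2f}, b = {b:.2f}")               # a = 2.20, b = 0.60
print(f"SSE (least squares) = {sse_best:.2f}")   # 2.40
print(f"SSE (alternative)   = {sse_other:.2f}")  # 2.55
```

Any other choice of a and b gives a sum of squared errors at least as large as that of the fitted line.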
Assumptions
• The error term is normally distributed. For each fixed value of X, the
distribution of Y is normal.
• The means of all these normal distributions of Y, given X, lie on a
straight line with slope b.
• The mean of the error term is 0.
• The variance of the error term is constant. This variance does not
depend on the values assumed by X.
• The error terms are uncorrelated. In other words, the observations
have been drawn independently.
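Two of these assumptions can be checked directly from the residuals of a fitted line. The sketch below, on hypothetical data, verifies that the residuals average to zero and computes the Durbin-Watson statistic as one common check for correlated errors. The choice of Durbin-Watson is an assumption here, not something the slides prescribe; values near 2 suggest uncorrelated errors. Normality and constant variance are usually assessed with plots or formal tests, which are not shown.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y-hat = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Residual = observed Y minus predicted Y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# With an intercept in the model, least-squares residuals average to zero
mean_residual = sum(residuals) / len(residuals)

# Durbin-Watson: squared successive differences over squared residuals
durbin_watson = (
    sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    / sum(e ** 2 for e in residuals)
)

print(f"mean residual = {mean_residual:.6f}")
print(f"Durbin-Watson = {durbin_watson:.3f}")
```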
Model (fit) summary
Assumption checks
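r: correlation coefficient
r²: proportion of the variance in Y explained by the model
• A variance inflation factor (VIF) > 5 is commonly taken as a sign of
problematic multicollinearity among the independent variables.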
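For predictor j, VIF = 1 / (1 − R²j), where R²j comes from regressing that predictor on all the other predictors. With only two predictors, R²j is simply the squared correlation between them, which the sketch below exploits. The data are hypothetical, and the two-predictor shortcut is a simplifying assumption for illustration.

```python
import math


def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r12 = pearson_r(x1, x2)
    return 1.0 / (1.0 - r12 ** 2)


# Hypothetical predictor values, not taken from the slides
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.2f}")  # VIF = 2.78, below the cutoff of 5
```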
Linear regression makes several key assumptions:
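Ŷ = a + b1X1 + b2X2 + ... + bkXk
As before, the coefficient a represents the intercept, but the b's are
now the partial regression coefficients: each b gives the expected
change in Y for a one-unit change in its X, holding the other
independent variables constant.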
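The intercept and partial regression coefficients can be estimated by least squares, for example by solving the normal equations (X'X)b = X'y. The sketch below does this in plain Python with a small Gaussian elimination routine. The helper names and the data are hypothetical; the data were constructed so that Y = 1 + 2·X1 + 3·X2 holds exactly, which makes the recovered coefficients easy to verify.

```python
def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | rhs]
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Move the row with the largest pivot into position
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def fit_multiple_regression(X, y):
    """Return [a, b1, ..., bk] for Y-hat = a + b1*X1 + ... + bk*Xk."""
    rows = [[1.0] + list(obs) for obs in X]  # leading 1 for the intercept
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


# Hypothetical data built from Y = 1 + 2*X1 + 3*X2 (no error term)
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]

a, b1, b2 = fit_multiple_regression(X, y)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

In practice a statistics package would be used instead of hand-written elimination; the sketch only shows what the estimation step does.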
Statistics Associated with Multiple Regression
(1 of 2)
• Adjusted R². R², the coefficient of multiple determination, is
adjusted for the number of independent variables and the sample size
to account for diminishing returns: after the first few variables,
additional independent variables contribute little.
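The standard adjustment is Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k the number of independent variables. A short sketch with hypothetical values of R², n and k:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R² for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical values, not taken from the slides
r_squared = 0.75
n = 25  # sample size
k = 4   # number of independent variables

adj = adjusted_r_squared(r_squared, n, k)
print(f"Adjusted R² = {adj:.2f}")  # Adjusted R² = 0.70

# Adding predictors without raising R² lowers the adjusted value
adj_more = adjusted_r_squared(r_squared, n, k + 4)
print(f"Adjusted R² with 8 predictors = {adj_more:.3f}")  # 0.625
```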
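• F test. The F test is used to test the null hypothesis that the
coefficient of multiple determination in the population, R²pop, is zero.
This is equivalent to testing the null hypothesis that all partial
regression coefficients are zero, H0: β1 = β2 = ... = βk = 0. The test
statistic has an F distribution with k and (n − k − 1) degrees of
freedom, where k is the number of independent variables and n is the
sample size.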
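The overall F statistic can be written in terms of R² as F = (R²/k) / ((1 − R²)/(n − k − 1)). A short sketch with hypothetical values; obtaining the p-value requires the F distribution from a statistics library or table and is not shown.

```python
def overall_f_statistic(r_squared, n, k):
    """F statistic for H0: all k partial regression coefficients are zero."""
    explained = r_squared / k                      # per numerator df
    unexplained = (1.0 - r_squared) / (n - k - 1)  # per denominator df
    return explained / unexplained


# Hypothetical values, not taken from the slides
r_squared = 0.75
n = 25  # sample size
k = 4   # number of independent variables

f_stat = overall_f_statistic(r_squared, n, k)
df_numerator, df_denominator = k, n - k - 1

print(f"F({df_numerator}, {df_denominator}) = {f_stat:.1f}")  # F(4, 20) = 15.0
```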
Statistics Associated with Multiple Regression
(2 of 2)
• RMSEA (Root Mean Square Error of Approximation): a value below 0.08
is commonly taken to indicate a good fit.
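RMSEA is usually computed from a model chi-square statistic, its degrees of freedom and the sample size N as sqrt(max(χ² − df, 0) / (df · (N − 1))). The sketch below assumes that form; some software divides by N rather than N − 1. The input values are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical fit results, not taken from the slides
chi_square = 85.0
df = 40
n = 251

value = rmsea(chi_square, df, n)
print(f"RMSEA = {value:.3f}")  # RMSEA = 0.067
print("below 0.08 cutoff" if value < 0.08 else "at or above 0.08 cutoff")
```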