Statistics Micro Mini Multiple Regression: January 5-9, 2008 Beth Ayers
Multiple Regression
Beth Ayers
• Graphical Summary
‒ Scatter plot
• Numerical Summary
‒ Correlation
‒ R²
‒ Regression equation
‒ Response = β0 + β1 · explanatory
• Test of significance
‒ Test significance of regression equation coefficients
January 6, 2009 - morning session
Scatter plot
• Shows relationship between two
quantitative variables
‒ y-axis = response variable
‒ x-axis = explanatory variable
• Correlation² = R²
• β0 is the intercept
‒ the predicted value of the response variable when the
explanatory variable is 0
• β1 is the slope
‒ For each 1-unit increase in the explanatory
variable, the predicted response changes by β1
• Independence of errors
‒ Can often be checked by knowing how the data were
collected. If unsure, use an autocorrelation plot of the residuals.
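An autocorrelation plot shows the sample autocorrelation of the residuals (in collection order) at each lag. A minimal sketch of the values such a plot displays, using the standard estimator; the residuals are hypothetical and `autocorrelation` is an illustrative helper name, not a library function.

```python
from statistics import fmean


def autocorrelation(residuals, lag=1):
    """Sample autocorrelation of residuals at the given lag.

    Values near 0 are consistent with independent errors; large positive
    values suggest that neighbouring observations are correlated.
    """
    n = len(residuals)
    mean = fmean(residuals)
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Hypothetical residuals, listed in the order the data were collected
res = [0.5, 0.7, 0.4, -0.2, -0.6, -0.5, 0.1, 0.6]
for k in range(1, 4):
    print(k, round(autocorrelation(res, k), 3))
```

A common rough guide is to compare each value with ±2/√n; values well outside that band at small lags suggest the errors are not independent.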
• Normality of errors
‒ Look at a normal probability plot of the residuals
‒ If the errors are non-normal, confidence intervals
and p-values may be unreliable, especially in small samples
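A normal probability plot pairs each ordered residual with the standard normal quantile expected at its rank. A minimal sketch of those coordinates using `statistics.NormalDist`; the plotting positions (i − 0.5)/n are one common convention (software packages differ slightly), and the residuals are hypothetical.

```python
from statistics import NormalDist


def normal_prob_plot_points(residuals):
    """Return (theoretical normal quantile, ordered residual) pairs.

    Points lying close to a straight line suggest the errors are
    approximately normal.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), e)
            for i, e in enumerate(ordered, start=1)]


# Hypothetical residuals from a fitted regression
res = [0.3, -1.2, 0.8, 0.1, -0.4, 1.5, -0.7]
for q, e in normal_prob_plot_points(res):
    print(f"{q:6.3f}  {e:5.2f}")
```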
• R² = 89.44%
‒ 89.44% of the variability in efficiency can be
explained by words per minute typed
• Regression Equation
‒ Y = β0 + β1·X1 + β2·X2 + … + βN·XN
• Numerical Summary
‒ Look at the correlation matrix of the response
and all of the explanatory variables
• Step 1
‒ Does the data provide evidence that any of
the explanatory variables are important in
predicting Y?
‒ No – there is no evidence that any of the variables
are important; the model is useless
‒ Yes – at least one variable is important; move
to Step 2
• Step 2
‒ For each explanatory variable Xj: does the
data provide evidence that Xj has a significant
linear relationship with Y, controlling for all the
other variables?
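Step 2 uses the t-statistic t_j = b_j / SE(b_j) for each coefficient in the full model. A minimal plain-Python sketch, assuming a small dataset held in lists: it solves the least-squares normal equations by inverting X'X, estimates the error variance as s² = SSE/(n − k), and takes SE(b_j) = √(s² · [(X'X)⁻¹]_jj). The helper names `invert` and `ols_t_stats` and the data are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_stats(x_cols, y):
    """Fit Y = b0 + b1*X1 + ... + bN*XN by least squares.

    Returns (coefficients, standard errors, t-statistics); index 0 is the intercept.
    """
    n, k = len(y), len(x_cols) + 1            # k parameters, including the intercept
    rows = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    se = [math.sqrt(s2 * inv[j][j]) for j in range(k)]
    return beta, se, [b / s for b, s in zip(beta, se)]


# Hypothetical data with two explanatory variables
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.8, 15.2, 15.9]

beta, se, t = ols_t_stats([x1, x2], y)
for name, b, s, tj in zip(["intercept", "X1", "X2"], beta, se, t):
    print(f"{name:>9}: coef = {b:7.3f}  SE = {s:6.3f}  t = {tj:7.2f}")
```

The two-sided p-value compares |t_j| with a t distribution on n − k degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t_j), n - k)` if SciPy is available. Inverting X'X is fine for small teaching examples; statistical software typically uses a QR decomposition for better numerical stability.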
Step 1
• Test the overall hypothesis that at least
one of the variables is needed
‒ H0: β1 = β2 = … = βN = 0 (none of the explanatory
variables are important in predicting the response variable)
‒ H1: at least one βj ≠ 0 (at least one of the
explanatory variables is important in predicting the
response variable)
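The overall Step 1 test is an F test, and its statistic can be written in terms of the model's R²: F = (R²/p) / ((1 − R²)/(n − p − 1)), where n is the number of observations and p the number of explanatory variables. A minimal sketch of that arithmetic; `overall_f` is a hypothetical helper name and the inputs in the usage line are made up, since the slides do not report n.

```python
def overall_f(r2, n, p):
    """F statistic for H0: beta1 = ... = betap = 0.

    r2: R^2 of the full model as a proportion (0.8944, not 89.44)
    n:  number of observations
    p:  number of explanatory variables
    """
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Hypothetical inputs: R^2 = 0.80 from 30 observations and 2 explanatory variables
print(round(overall_f(0.80, 30, 2), 2))
```

The p-value is the upper-tail probability of an F distribution with p and n − p − 1 degrees of freedom, for example `scipy.stats.f.sf(F, p, n - p - 1)` if SciPy is available; a small p-value means rejecting H0 and moving to Step 2.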
• Conclusions
‒ Words per minute is significant but GPA is not
‒ In this case we ended up with a simple linear
regression with words per minute as the only
explanatory variable
Looking at R²adj
• R²adj (wpm and GPA) = 89.39%
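Adjusted R² penalizes R² for the number of explanatory variables: R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1). Unlike R², it can fall when a weak predictor is added. A minimal sketch; `adjusted_r2` is a hypothetical helper name, and the comparison below uses made-up values (n = 50), not the slides' data.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p explanatory variables fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: adding a second, weak predictor nudges R^2 from 0.8944 to 0.8945
one_var = adjusted_r2(0.8944, 50, 1)
two_var = adjusted_r2(0.8945, 50, 2)
print(f"1 variable: {one_var:.4f}   2 variables: {two_var:.4f}")
```

Here R² rises slightly with the extra variable, but adjusted R² drops, signalling that the added variable does not earn its place in the model.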
• Classroom variable
‒ Pre-test score
• Response variable
‒ Final exam score
• F-statistic = 95.56
• P-value = 0.0000
• Conclusions
‒ Pretest score and time are significant but number
correct is not
Example
• This is not surprising given the high
correlation (0.90) between pretest score
and number correct
• To show this formally, regress number correct on
the other explanatory variables
‒ Number Correct ~ Pretest + Time
‒ R² = 0.8044
‒ Tolerance = 1 – R² = 1 – 0.8044 = 0.1956, which is
lower than 0.20
‒ VIF = 1/Tolerance = 1/0.1956 = 5.11, which is
greater than 5
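The tolerance and VIF arithmetic above can be reproduced directly from the auxiliary R² (number correct regressed on pretest and time). A minimal sketch; `tolerance_and_vif` is a hypothetical helper name, and the 0.20 and 5 cut-offs are the rules of thumb used on this slide.

```python
def tolerance_and_vif(r2_aux):
    """r2_aux: R^2 from regressing one explanatory variable on all the others."""
    tolerance = 1 - r2_aux
    return tolerance, 1 / tolerance


# Auxiliary regression from the slide: Number Correct ~ Pretest + Time
tol, vif = tolerance_and_vif(0.8044)
print(f"tolerance = {tol:.4f}, VIF = {vif:.2f}")  # tolerance = 0.1956, VIF = 5.11

# Apply the slide's cut-offs
print("tolerance < 0.20:", tol < 0.20, "| VIF > 5:", vif > 5)
```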
• Step 2
‒ Test significance of pretest score
‒ T-statistic: 14.93
‒ P-value = 0.0000
• R²adj = 84.34%
‒ 84% of the variability in final exam score is
explained by pretest score and time
• Step 2
‒ Test significance of number correct score
‒ T-statistic: 12.09
‒ P-value = 0.0000
[Correlation matrix, only partially recovered: X3 row 1.00, 0.08; X4 row 1.00]
Exploratory Analysis
• It appears reasonable that each of the 4
explanatory variables has a linear
relationship with the response variable
• Step 2
‒ Test significance of X1
‒ T-statistic: -9.04
‒ P-value = 0.0000
‒ Test significance of X2
‒ T-statistic: 207.21
‒ P-value = 0.0000
‒ Test significance of X3
‒ T-statistic: 0.88
‒ P-value = 0.3817
‒ Test significance of X4
‒ T-statistic: 181.57
‒ P-value = 0.0000
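The Step 2 decision for each variable compares its p-value with a significance level. A minimal sketch applied to the t-statistics and p-values reported above for the full model; the level α = 0.05 is an assumption, since the slides do not state one, and `not_significant` is an illustrative helper name.

```python
ALPHA = 0.05  # assumed significance level; not stated on the slides

# (t-statistic, p-value) reported on the slide for the full model with X1..X4
step2 = {
    "X1": (-9.04, 0.0000),
    "X2": (207.21, 0.0000),
    "X3": (0.88, 0.3817),
    "X4": (181.57, 0.0000),
}


def not_significant(results, alpha=ALPHA):
    """Names of variables whose p-value is at least alpha."""
    return [name for name, (_t, p) in results.items() if p >= alpha]


print(not_significant(step2))  # ['X3']
```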
Conclusions
• Variable X3 is not significant in predicting
Y (p-value = 0.3817)
• Step 2
‒ Test significance of X1
‒ T-statistic: -42.62
‒ P-value = 0.0000
‒ Test significance of X2
‒ T-statistic: 208.82
‒ P-value = 0.0000
‒ Test significance of X4
‒ T-statistic: 181.46
‒ P-value = 0.0000
Things to Note
• When we reran the regression without X3,
the changes to the regression equation and
to Step 2 of the analysis mostly involved X1
(its t-statistic moved from -9.04 to -42.62)