
Basic Business Statistics

12th Edition

Chapter 14

Introduction to Multiple Regression

Copyright ©2012 Pearson Education, Inc. publishing as Prentice Hall



Learning Objectives

In this chapter, you learn:
- How to develop a multiple regression model
- How to interpret the regression coefficients
- How to determine which independent variables to include in the regression model
- How to determine which independent variables are more important in predicting a dependent variable
- How to use categorical variables in a regression model
- How to predict a categorical dependent variable using logistic regression

The Multiple Regression Model

DCOVA

Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).

Multiple regression model with k independent variables:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.

In this chapter we will use Excel or Minitab to obtain the regression slope coefficients and other regression summary measures.

Multiple Regression Equation (continued)

Two-variable model:

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$

[Figure: regression plane plotted over the X1 and X2 axes with Y on the vertical axis; b1 is the slope for variable X1 and b2 is the slope for variable X2.]

Example: 2 Independent Variables

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: Pie sales (units per week)
- Independent variables: Price (in $) and Advertising (in $100s)

Data are collected for 15 weeks.

Pie Sales Example

Week   Pie Sales   Price ($)   Advertising ($100s)
   1         350        5.50                   3.3
   2         460        7.50                   3.3
   3         350        8.00                   3.0
   4         430        8.00                   4.5
   5         350        6.80                   3.0
   6         380        7.50                   4.0
   7         430        4.50                   3.0
   8         470        6.40                   3.7
   9         450        7.00                   3.5
  10         490        5.00                   4.0
  11         340        7.20                   3.5
  12         300        7.90                   3.2
  13         440        5.90                   4.0
  14         450        5.00                   3.5
  15         300        7.00                   2.7

Multiple regression equation:

Sales = b0 + b1(Price) + b2(Advertising)
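
The chapter obtains b0, b1, and b2 from Excel or Minitab. As a rough sketch of what such software does, the Python below solves the least-squares normal equations (X'X)b = X'y for the pie sales data using only the standard library. The helper names `solve_linear_system` and `fit_ols` are illustrative assumptions and do not come from the chapter; in practice a statistics package would be used.

```python
# Pie sales data from the table: (sales, price in $, advertising in $100s).
DATA = [
    (350, 5.50, 3.3), (460, 7.50, 3.3), (350, 8.00, 3.0), (430, 8.00, 4.5),
    (350, 6.80, 3.0), (380, 7.50, 4.0), (430, 4.50, 3.0), (470, 6.40, 3.7),
    (450, 7.00, 3.5), (490, 5.00, 4.0), (340, 7.20, 3.5), (300, 7.90, 3.2),
    (440, 5.90, 4.0), (450, 5.00, 3.5), (300, 7.00, 2.7),
]


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] for rows of (y, x1, ..., xk)."""
    design = [[1.0] + list(r[1:]) for r in rows]  # leading 1 for the intercept
    y = [r[0] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = fit_ols(DATA)
print(f"Sales = {b0:.3f} + ({b1:.3f})(Price) + ({b2:.3f})(Advertising)")
```

The printed coefficients should agree, up to rounding, with the values that the Excel and Minitab output reports for this example.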

Excel Multiple Regression Output

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306    2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389         2.68285   0.01993    57.58835   555.46404
Price         -24.97509       10.83213        -2.30565   0.03979   -48.57626    -1.37392
Advertising    74.13096       25.96732         2.85478   0.01449    17.55303   130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

Minitab Multiple Regression Output

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

The regression equation is


Sales = 307 - 25.0 Price + 74.1 Advertising

Predictor Coef SE Coef T P


Constant 306.50 114.30 2.68 0.020
Price -24.98 10.83 -2.31 0.040
Advertising 74.13 25.97 2.85 0.014

S = 47.4634 R-Sq = 52.1% R-Sq(adj) = 44.2%

Analysis of Variance

Source DF SS MS F P
Regression 2 29460 14730 6.54 0.012
Residual Error 12 27033 2253
Total 14 56493


The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where
Sales is in number of pies per week
Price is in $
Advertising is in $100s

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

Using the Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Predicted Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
                = 306.526 - 24.975(5.50) + 74.131(3.5)
                = 428.62

Predicted sales is 428.62 pies. Note that Advertising is in $100s, so $350 means that X2 = 3.5.
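
The same prediction can be sketched in Python. The function name `predict_sales` and the choice to accept advertising in dollars (converting to $100s internally) are assumptions made for illustration; the coefficients are the rounded values from the regression equation.

```python
def predict_sales(price, advertising_dollars):
    """Predicted weekly pie sales from the fitted equation.

    Coefficients are the rounded values reported on the slides; the
    advertising argument is in dollars and is converted to $100s.
    """
    b0, b1, b2 = 306.526, -24.975, 74.131
    advertising_hundreds = advertising_dollars / 100.0
    return b0 + b1 * price + b2 * advertising_hundreds


predicted = predict_sales(price=5.50, advertising_dollars=350)
print(round(predicted, 2))  # 428.62
```

Doing the unit conversion inside the function guards against the most likely mistake here, which is entering 350 instead of 3.5 for X2.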

Predictions in Excel using PHStat

PHStat | regression | multiple regression …

Check the “confidence and prediction interval estimates” box.

Predictions in PHStat (continued)

[Figure: PHStat output worksheet, annotated with the input values, the predicted Y value, the confidence interval for the mean value of Y given these X values, and the prediction interval for an individual Y value given these X values.]

Predictions in Minitab

Predicted Values for New Observations

New
Obs     Fit   SE Fit           95% CI           95% PI
  1   428.6     17.2   (391.1, 466.1)   (318.6, 538.6)

Values of Predictors for New Observations

New
Obs   Price   Advertising
  1    5.50          3.50

Fit is the predicted value of Y. The 95% CI is the confidence interval for the mean value of Y, given these X values. The 95% PI is the prediction interval for an individual Y value, given these X values. The values of the predictors are the input values.

Coefficient of Multiple Determination

Reports the proportion of total variation in Y explained by all X variables taken together:

$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$

Multiple Coefficient of Determination in Excel

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306    2252.776
Total        14   56493.333

$r^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$

52.1% of the variation in pie sales is explained by the variation in price and advertising.

Multiple Coefficient of Determination in Minitab

The regression equation is
Sales = 307 - 25.0 Price + 74.1 Advertising

S = 47.4634   R-Sq = 52.1%   R-Sq(adj) = 44.2%

Analysis of Variance
Source           DF      SS      MS      F       P
Regression        2   29460   14730   6.54   0.012
Residual Error   12   27033    2253
Total            14   56493

$r^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$

52.1% of the variation in pie sales is explained by the variation in price and advertising.

Adjusted $r^2$

- $r^2$ never decreases when a new X variable is added to the model
  - This can be a disadvantage when comparing models
- What is the net effect of adding a new variable?
  - We lose a degree of freedom when a new X variable is added
  - Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted $r^2$ (continued)

Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$r^2_{adj} = 1 - \left[ (1 - r^2) \left( \frac{n - 1}{n - k - 1} \right) \right]$

(where n = sample size, k = number of independent variables)

- Penalizes excessive use of unimportant independent variables
- Smaller than $r^2$
- Useful in comparing among models
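
As a worked check of the formula, the values reported for the pie sales example in this chapter ($r^2 = .52148$, n = 15, k = 2) can be substituted directly. The intermediate figures below are rounded to five decimals, so the last digit may differ slightly from software output.

```latex
r^2_{adj} = 1 - \left[ (1 - .52148) \left( \frac{15 - 1}{15 - 2 - 1} \right) \right]
          = 1 - (.47852)(1.16667)
          = 1 - .55827
          \approx .4417
```

This agrees with the Adjusted R Square of 0.44172 in the Excel output once the rounding of $r^2$ is taken into account.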

Adjusted $r^2$ in Excel

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

$r^2_{adj} = .44172$

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Adjusted $r^2$ in Minitab

The regression equation is
Sales = 307 - 25.0 Price + 74.1 Advertising

S = 47.4634   R-Sq = 52.1%   R-Sq(adj) = 44.2%

$r^2_{adj} = .44172$

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Is the Model Significant?

- F Test for Overall Significance of the Model
- Shows if there is a linear relationship between all of the X variables considered together and Y
- Use F-test statistic
- Hypotheses:
  H0: β1 = β2 = … = βk = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)

F Test for Overall Significance

Test statistic:

$F_{STAT} = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$

where $F_{STAT}$ has numerator d.f. = k and denominator d.f. = (n – k – 1)
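
A minimal sketch of this calculation in Python, using the sums of squares that the ANOVA table reports for the pie sales example. The function name `overall_f_statistic` is an assumption for illustration. The p-value is not computed here because it requires the F distribution, which the Python standard library does not provide.

```python
def overall_f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F_STAT) for a model with k independent variables."""
    msr = ssr / k            # numerator d.f. = k
    mse = sse / (n - k - 1)  # denominator d.f. = n - k - 1
    return msr, mse, msr / mse


# Sums of squares from the ANOVA table for the pie sales example.
msr, mse, f_stat = overall_f_statistic(ssr=29460.027, sse=27033.306, n=15, k=2)
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F_STAT = {f_stat:.4f}")
```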

F Test for Overall Significance in Excel (continued)

ANOVA
             df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306    2252.776
Total        14   56493.333

$F_{STAT} = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$

with 2 and 12 degrees of freedom. Significance F (0.01201) is the p-value for the F test.

F Test for Overall Significance in Minitab

The regression equation is
Sales = 307 - 25.0 Price + 74.1 Advertising

Analysis of Variance
Source           DF      SS      MS      F       P
Regression        2   29460   14730   6.54   0.012
Residual Error   12   27033    2253
Total            14   56493

$F_{STAT} = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$

with 2 and 12 degrees of freedom. P (0.012) is the p-value for the F test.

F Test for Overall Significance (continued)

H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05
df1 = 2, df2 = 12

Critical value: F0.05 = 3.885

Test statistic: $F_{STAT} = \frac{MSR}{MSE} = 6.5386$

Decision: Since the FSTAT test statistic is in the rejection region (p-value < .05), reject H0.

Conclusion: There is evidence that at least one independent variable affects Y.

[Figure: F distribution with the rejection region of area α = .05 to the right of F0.05 = 3.885; do not reject H0 to the left of the critical value, reject H0 to the right.]

Residuals in Multiple Regression

Two-variable model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$

Residual: $e_i = (Y_i - \hat{Y}_i)$

[Figure: regression plane over the X1 and X2 axes, showing a sample observation Yi at (x1i, x2i), its fitted value $\hat{Y}_i$ on the plane, and the residual between them.]

The best fit equation is found by minimizing the sum of squared errors, $\sum e^2$.

Multiple Regression Assumptions

Errors (residuals) from the regression model: $e_i = (Y_i - \hat{Y}_i)$

Assumptions:
- The errors are normally distributed
- Errors have a constant variance
- The model errors are independent

Residual Plots Used in Multiple Regression

These residual plots are used in multiple regression:
- Residuals vs. $\hat{Y}_i$
- Residuals vs. X1i
- Residuals vs. X2i
- Residuals vs. time (if time series data)

Use the residual plots to check for violations of regression assumptions.

Are Individual Variables Significant?

- Use t tests of individual variable slopes
- Shows if there is a linear relationship between the variable Xj and Y, holding constant the effects of other X variables
- Hypotheses:
  H0: βj = 0 (no linear relationship)
  H1: βj ≠ 0 (linear relationship does exist between Xj and Y)

Are Individual Variables Significant? (continued)

H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)

Test statistic:

$t_{STAT} = \frac{b_j - 0}{S_{b_j}}$   (df = n – k – 1)
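
As a worked check, the test statistic can be computed for the pie sales example from the coefficients and standard errors that the Excel output in this chapter reports (Price: b1 = -24.97509, standard error 10.83213; Advertising: b2 = 74.13096, standard error 25.96732):

```latex
t_{STAT}(\text{Price}) = \frac{-24.97509 - 0}{10.83213} \approx -2.3057
\qquad
t_{STAT}(\text{Advertising}) = \frac{74.13096 - 0}{25.96732} \approx 2.8548
```

Both values match the t Stat column of the Excel output up to rounding, with df = 15 – 2 – 1 = 12.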

Are Individual Variables Significant? Excel Output (continued)

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389         2.68285   0.01993    57.58835   555.46404
Price         -24.97509       10.83213        -2.30565   0.03979   -48.57626    -1.37392
Advertising    74.13096       25.96732         2.85478   0.01449    17.55303   130.70888

t Stat for Price is tSTAT = -2.306, with p-value .0398
t Stat for Advertising is tSTAT = 2.855, with p-value .0145

Are Individual Variables Significant? Minitab Output

The regression equation is
Sales = 307 - 25.0 Price + 74.1 Advertising

Predictor       Coef   SE Coef       T       P
Constant      306.50    114.30    2.68   0.020
Price         -24.98     10.83   -2.31   0.040
Advertising    74.13     25.97    2.85   0.014

S = 47.4634   R-Sq = 52.1%   R-Sq(adj) = 44.2%

t Stat for Price is tSTAT = -2.31, with p-value .040
t Stat for Advertising is tSTAT = 2.85, with p-value .014

Inferences about the Slope: t Test Example

H0: βj = 0
H1: βj ≠ 0

d.f. = 15 - 2 - 1 = 12
α = .05
tα/2 = 2.1788

From the Excel output:
For Price, tSTAT = -2.306, with p-value .0398
For Advertising, tSTAT = 2.855, with p-value .0145

Decision: The test statistic for each variable falls in the rejection region (p-values < .05). Reject H0 for each variable.

Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05.

[Figure: t distribution with rejection regions of area α/2 = .025 in each tail, beyond -tα/2 = -2.1788 and tα/2 = 2.1788; do not reject H0 between the two critical values.]

Confidence Interval Estimate for the Slope

Confidence interval for the population slope βj:

$b_j \pm t_{\alpha/2} S_{b_j}$   where t has (n – k – 1) d.f.

              Coefficients   Standard Error
Intercept     306.52619      114.25389
Price         -24.97509       10.83213
Advertising    74.13096       25.96732

Here, t has (15 – 2 – 1) = 12 d.f.

Example: Form a 95% confidence interval for the effect of changes in price (X1) on pie sales:

-24.975 ± (2.1788)(10.832)

So the interval is (-48.576, -1.374).

(This interval does not contain zero, so price has a significant effect on sales.)
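
The interval arithmetic can be sketched in Python as follows. The function name `slope_confidence_interval` is an assumption for illustration, and the critical value 2.1788 is taken from the slides rather than computed, since the standard library has no t distribution.

```python
def slope_confidence_interval(b_j, se_b_j, t_critical):
    """Return (lower, upper) for b_j ± t_critical * S_bj."""
    margin = t_critical * se_b_j
    return b_j - margin, b_j + margin


# Price slope from the pie sales example; 2.1788 is the t value for
# alpha/2 = .025 with 12 d.f., taken from the slides rather than computed.
lower, upper = slope_confidence_interval(-24.975, 10.832, 2.1788)
print(f"({lower:.3f}, {upper:.3f})")
```

Because both endpoints are negative, the interval excludes zero, which is the same conclusion the t test reaches at α = .05.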

Confidence Interval Estimate for the Slope (continued)

Confidence interval for the population slope βj

              Coefficients   Standard Error   …   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        …    57.58835   555.46404
Price         -24.97509       10.83213        …   -48.57626    -1.37392
Advertising    74.13096       25.96732        …    17.55303   130.70888

Example: Excel output also reports these interval endpoints:

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price, holding the effect of advertising constant.

Testing Portions of the Multiple Regression Model

Contribution of a Single Independent Variable Xj

SSR(Xj | all variables except Xj)
    = SSR(all variables) – SSR(all variables except Xj)

Measures the contribution of Xj in explaining the total variation in Y (SST)

Testing Portions of the Multiple Regression Model (continued)

Contribution of a Single Independent Variable Xj, assuming all other variables are already included (consider here a 2-variable model):

SSR(X1 | X2) = SSR(all variables) – SSR(X2)

- SSR(all variables): from the ANOVA section of the regression for $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
- SSR(X2): from the ANOVA section of the regression for $\hat{Y} = b_0 + b_2 X_2$

Measures the contribution of X1 in explaining SST

The Partial F-Test Statistic

Consider the hypothesis test:

H0: variable Xj does not significantly improve the model after all other variables are included
H1: variable Xj significantly improves the model after all other variables are included

Test using the F-test statistic:

$F_{STAT} = \frac{SSR(X_j \mid \text{all variables except } j)}{MSE}$   (with 1 and n – k – 1 d.f.)

Testing Portions of Model: Example

Example: Frozen dessert pies

Test at the α = .05 level to determine whether the price variable significantly improves the model given that advertising is included.

Testing Portions of Model: Example (continued)

H0: X1 (price) does not improve the model with X2 (advertising) included
H1: X1 does improve model

α = .05, df = 1 and 12
F0.05 = 4.75

ANOVA (for X1 and X2)
             df   SS            MS
Regression    2   29460.02687   14730.01343
Residual     12   27033.30647    2252.775539
Total        14   56493.33333

ANOVA (for X2 only)
             df   SS
Regression    1   17484.22249
Residual     13   39009.11085
Total        14   56493.33333

Testing Portions of Model: Example (continued)

Using the two ANOVA tables (for X1 and X2, and for X2 only):

$F_{STAT} = \frac{SSR(X_1 \mid X_2)}{MSE(\text{all})} = \frac{29{,}460.03 - 17{,}484.22}{2252.78} = 5.316$

Conclusion: Since FSTAT = 5.316 > F0.05 = 4.75, reject H0; adding X1 does improve the model.

Relationship Between Test Statistics

- The partial F test statistic developed in this section and the t test statistic are both used to determine the contribution of an independent variable to a multiple regression model.
- The hypothesis tests associated with these two statistics always result in the same decision (that is, the p-values are identical).

$t^2_{STAT} = F_{STAT}$
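
The partial F statistic for the price variable, and its relationship to the t statistic, can be checked numerically. The sketch below uses the rounded sums of squares from the frozen dessert pie example and the t Stat for Price from the Excel output; the function name `partial_f_statistic` is an assumption for illustration.

```python
def partial_f_statistic(ssr_all, ssr_without_xj, mse_all):
    """Partial F statistic for adding a single variable X_j (1 numerator d.f.)."""
    ssr_xj_given_others = ssr_all - ssr_without_xj
    return ssr_xj_given_others / mse_all


# Values from the two ANOVA tables in the frozen dessert pie example.
f_stat = partial_f_statistic(ssr_all=29460.03, ssr_without_xj=17484.22, mse_all=2252.78)
f_critical = 4.75  # F(0.05; 1, 12), taken from the slide
print(f"F_STAT = {f_stat:.3f}, reject H0: {f_stat > f_critical}")

# The t statistic for Price reported in the Excel output is -2.30565;
# its square should match the partial F statistic up to rounding.
t_price = -2.30565
print(f"t^2 = {t_price ** 2:.3f}")
```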

Coefficient of Partial Determination for k Variable Model

$r^2_{Yj.(\text{all variables except } j)} = \frac{SSR(X_j \mid \text{all variables except } j)}{SST - SSR(\text{all variables}) + SSR(X_j \mid \text{all variables except } j)}$

Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other independent variables

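
As a sketch of how this formula is applied, the Python below computes the coefficient of partial determination for price (X1) given advertising (X2), using sums of squares reported in the ANOVA tables of the pie sales example. The function and variable names are assumptions for illustration.

```python
def partial_r_squared(sst, ssr_all, ssr_xj_given_others):
    """Coefficient of partial determination for X_j given the other variables."""
    return ssr_xj_given_others / (sst - ssr_all + ssr_xj_given_others)


# Sums of squares from the pie sales ANOVA tables (X1 = price, X2 = advertising).
sst = 56493.333
ssr_all = 29460.027
ssr_x2_only = 17484.222
ssr_x1_given_x2 = ssr_all - ssr_x2_only

r2_y1_2 = partial_r_squared(sst, ssr_all, ssr_x1_given_x2)
print(f"r^2_Y1.2 = {r2_y1_2:.4f}")
```

The result can be read as the share of the variation in pie sales not explained by advertising that price accounts for.
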
Coefficient of Partial Determination in Excel

Coefficients of Partial Determination can be found using Excel:
- PHStat | regression | multiple regression …
- Check the “coefficient of partial determination” box

Regression Analysis
Coefficients of Partial Determination

Intermediate Calculations
SSR(X1,X2)     29460.027
SST            56493.3333
SSR(X2)        17484.222
SSR(X1 | X2)   11975.8043
SSR(X1)        11100.438
SSR(X2 | X1)   18359.5888

Coefficients
r2 Y1.2        0.3070
r2 Y2.1        0.4045

Using Dummy Variables

- A dummy variable is a categorical independent variable with two levels:
  - yes or no, on or off, male or female
  - coded as 0 or 1
- Assumes the slopes associated with numerical independent variables do not change with the value for the categorical variable
- If more than two levels, the number of dummy variables needed is (number of levels - 1)

Dummy-Variable Example (with 2 Levels)

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$

Let:
Y = pie sales
X1 = price
X2 = holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)

Dummy-Variable Example (with 2 Levels) (continued)

Holiday:      $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$
No Holiday:   $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$

The two lines have different intercepts and the same slope.

[Figure: Y (sales) against X1 (Price), showing two parallel lines: Holiday (X2 = 1) with intercept b0 + b2, and No Holiday (X2 = 0) with intercept b0.]

If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales.

Interpreting the Dummy Variable Coefficient (with 2 Levels)

Example: Sales = 300 - 30(Price) + 15(Holiday)

Sales: number of pies sold per week
Price: pie price in $
Holiday: 1 if a holiday occurred during the week; 0 if no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
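
A small Python sketch makes the interpretation of b2 concrete: at any fixed price, switching the dummy from 0 to 1 changes the prediction by exactly 15 pies. The function name `predicted_sales` and the price of $6.00 are assumptions chosen only for illustration.

```python
def predicted_sales(price, holiday):
    """Sales = 300 - 30(Price) + 15(Holiday), with Holiday coded 0 or 1."""
    if holiday not in (0, 1):
        raise ValueError("holiday must be coded as 0 or 1")
    return 300 - 30 * price + 15 * holiday


price = 6.00  # hypothetical price, chosen only for illustration
with_holiday = predicted_sales(price, holiday=1)
without_holiday = predicted_sales(price, holiday=0)
print(with_holiday, without_holiday, with_holiday - without_holiday)
```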

Dummy-Variable Models (more than 2 Levels)

- The number of dummy variables is one less than the number of levels
- Example:
  Y = house price; X1 = square feet
- If style of the house is also thought to matter:
  Style = ranch, split level, colonial
- Three levels, so two dummy variables are needed

Dummy-Variable Models (more than 2 Levels) (continued)

Example: Let “colonial” be the default category, and let X2 and X3 be used for the other two categories:

Y = house price
X1 = square feet
X2 = 1 if ranch, 0 otherwise
X3 = 1 if split level, 0 otherwise

The multiple regression equation is:

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$

Interpreting the Dummy Variable Coefficients (with 3 Levels)

Consider the regression equation:

$\hat{Y} = 20.43 + 0.045 X_1 + 23.53 X_2 + 18.84 X_3$

For a colonial: X2 = X3 = 0
$\hat{Y} = 20.43 + 0.045 X_1$

For a ranch: X2 = 1; X3 = 0
$\hat{Y} = 20.43 + 0.045 X_1 + 23.53$

For a split level: X2 = 0; X3 = 1
$\hat{Y} = 20.43 + 0.045 X_1 + 18.84$

With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a colonial.

With the same square feet, a split-level will have an estimated average price of 18.84 thousand dollars more than a colonial.
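
The coding of the two dummy variables can be sketched in Python. The function name `house_price`, the style strings, and the 2,000 square feet used in the example are assumptions for illustration; the coefficients are those in the regression equation, with price in thousands of dollars.

```python
STYLES = ("colonial", "ranch", "split level")


def house_price(square_feet, style):
    """Estimated price (in $1000s) from the three-level dummy-variable equation."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}")
    # Colonial is the default category, so both dummies are 0 for it.
    x2 = 1 if style == "ranch" else 0
    x3 = 1 if style == "split level" else 0
    return 20.43 + 0.045 * square_feet + 23.53 * x2 + 18.84 * x3


square_feet = 2000  # hypothetical house size, chosen only for illustration
colonial = house_price(square_feet, "colonial")
for style in STYLES:
    difference = house_price(square_feet, style) - colonial
    print(f"{style}: {difference:.2f} thousand dollars vs. colonial")
```

Using one dummy per non-default level keeps the default category (colonial) as the baseline against which each coefficient is read.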

Interaction Between Independent Variables

- Hypothesizes interaction between pairs of X variables
  - Response to one X variable may vary at different levels of another X variable
- Contains two-way cross product terms

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 X_2)$

Effect of Interaction

Given: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$

- Without interaction term, effect of X1 on Y is measured by β1
- With interaction term, effect of X1 on Y is measured by β1 + β3 X2
- Effect changes as X2 changes

Interaction Example

Suppose X2 is a dummy variable and the estimated regression equation is $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1X_2$

X2 = 1:
Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1

X2 = 0:
Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1

[Figure: Y (0 to 12) against X1 (0 to 1.5), showing the two lines for X2 = 1 and X2 = 0.]

Slopes are different if the effect of X1 on Y depends on X2 value
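
The two lines can be reproduced with a short Python sketch. The function names `y_hat` and `slope_in_x1` are assumptions for illustration; the slope is obtained as the change in the prediction for a one-unit increase in X1 at a fixed X2.

```python
def y_hat(x1, x2):
    """Estimated equation from the slide: 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2):
    """Change in y_hat for a one-unit increase in X1 at a fixed X2."""
    return y_hat(1, x2) - y_hat(0, x2)


for x2 in (0, 1):
    print(f"X2 = {x2}: intercept = {y_hat(0, x2)}, slope in X1 = {slope_in_x1(x2)}")
```

The slope equals 2 + 4(X2), which is the b1 + b3 X2 pattern that an interaction term produces.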

Significance of Interaction Term

- Can perform a partial F test for the contribution of a variable to see if the addition of an interaction term improves the model
- Multiple interaction terms can be included
  - Use a partial F test for the simultaneous contribution of multiple variables to the model

Simultaneous Contribution of Independent Variables

- Use partial F test for the simultaneous contribution of multiple variables to the model
  - Let m variables be an additional set of variables added simultaneously
  - To test the hypothesis that the set of m variables improves the model:

$F_{STAT} = \frac{[SSR(\text{all}) - SSR(\text{all except new set of } m \text{ variables})] / m}{MSE(\text{all})}$

(where FSTAT has m and n – k – 1 d.f.)

Logistic Regression

- Used when the dependent variable Y is binary (i.e., Y takes on only two values)
- Examples
  - Customer prefers Brand A or Brand B
  - Employee chooses to work full-time or part-time
  - Loan is delinquent or is not delinquent
  - Person voted in last election or did not
- Logistic regression allows you to predict the probability of a particular categorical response

Logistic Regression (continued)

- Logistic regression is based on the odds ratio, which represents the probability of a success compared with the probability of failure

$\text{Odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}}$

- The logistic regression model is based on the natural log of this odds ratio

Logistic Regression (continued)

Logistic Regression Model:

$\ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

where k = number of independent variables in the model
εi = random error in observation i

Logistic Regression Equation:

$\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

Estimated Odds Ratio and Probability of Success

- Once you have the logistic regression equation, compute the estimated odds ratio:

$\text{Estimated odds ratio} = e^{\ln(\text{estimated odds ratio})}$

- The estimated probability of success is

$\text{Estimated probability of success} = \frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$
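
The two steps above can be sketched in Python. The chapter gives no fitted logistic regression equation, so the coefficients and X values below are hypothetical, invented purely for illustration; the function name `estimated_probability` is likewise an assumption.

```python
import math


def estimated_probability(coefficients, x_values):
    """Estimated probability of success from a logistic regression equation.

    coefficients: [b0, b1, ..., bk]; x_values: [X1, ..., Xk].
    """
    log_odds = coefficients[0] + sum(
        b * x for b, x in zip(coefficients[1:], x_values)
    )
    odds_ratio = math.exp(log_odds)          # estimated odds ratio
    return odds_ratio / (1.0 + odds_ratio)   # estimated probability of success


# Hypothetical coefficients and inputs, invented purely for illustration.
coefficients = [-1.0, 0.5, 0.25]
probability = estimated_probability(coefficients, [2.0, 0.0])
print(round(probability, 4))
```

With these inputs the log of the estimated odds ratio is 0, so the estimated odds ratio is 1 and the estimated probability of success is one half.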

Chapter Summary

- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Discussed adjusted $r^2$
- Discussed using residual plots to check model assumptions
- Tested individual regression coefficients
- Tested portions of the regression model
- Used dummy variables
- Evaluated interaction effects
- Discussed logistic regression
