Regression Analysis: Unified Concepts, Practical Applications, and Computer Implementation

Bruce L. Bowerman • Richard T. O'Connell • Emily S. Murphree

Business Expert Press
Quantitative Approaches to Decision Making Collection
Donald N. Stengel, Editor

This is a concise and innovative book that gives a complete presentation of applied regression analysis in approximately one-half the space of competing books. With only the modest prerequisite of a basic (non-calculus) statistics course, this text is appropriate for the widest possible audience.

After a short chapter, Chapter 1, introducing regression, the book covers simple linear regression and multiple regression in a single cohesive chapter, Chapter 2, by efficiently integrating the discussion of these two techniques. Chapter 2 also makes learning easier for students of all backgrounds by teaching the necessary statistical background topics (for example, hypothesis testing) and the necessary matrix algebra concepts as they are needed in teaching regression. Chapter 3 continues the integrative approach of the text by giving a unified presentation of more advanced regression models, including models using squared and interaction terms, models using dummy variables, and logistic regression models. The book concludes with Chapter 4, which organizes the techniques of model building, model diagnosis, and model improvement into an easy-to-understand six-step procedure.

Bruce L. Bowerman is professor emeritus of decision sciences at Miami University in Oxford, Ohio. He received his PhD degree in statistics from Iowa State University in 1974 and has over forty years of experience teaching basic statistics, regression analysis, time series forecasting, and other courses. He has been the recipient of an Outstanding Teaching award from his students at Miami and an Effective Educator award from the Richard T. Farmer School of Business Administration at Miami.

Richard T. O'Connell is professor emeritus of decision sciences at Miami University, Oxford, Ohio. He has more than 35 years of experience teaching basic statistics, regression analysis, time series forecasting, quality control, and other courses. Professor O'Connell has been the recipient of an Effective Educator award from the Richard T. Farmer School of Business Administration at Miami.

Emily S. Murphree is professor emeritus of statistics at Miami University, Oxford, Ohio. She received her PhD in statistics from the University of North Carolina with a research concentration in applied probability. Professor Murphree received Miami's College of Arts and Sciences Distinguished Education Award and has received various civic awards.

ISBN: 978-1-60649-950-4
Keywords

logistic regression, model building, model diagnostics, multiple regression, regression model, simple linear regression, statistical inference, time series regression

Contents

Preface
References
Index
Preface
Regression Analysis: Unified Concepts, Practical Applications, and Computer
Implementation is a concise and innovative book that gives a complete
presentation of applied regression analysis in approximately one-half the
space of competing books. With only the modest prerequisite of a basic
(non-calculus) statistics course, this text is appropriate for the widest pos-
sible audience—including college juniors, seniors, and first year graduate
students in business, the social sciences, the sciences, and statistics, as
well as professionals in business and industry. The reason that this text
is appropriate for such a wide audience is that it takes a unique and
integrative approach to teaching regression analysis. Most books, after a
short chapter introducing regression, cover simple linear regression and
multiple regression in roughly four chapters by beginning with a chapter
reviewing basic statistical concepts and then having chapters on simple
linear regression, matrix algebra, and multiple regression. In contrast, this
book, after a short chapter introducing regression, covers simple linear
regression and multiple regression in a single cohesive chapter, Chapter 2,
by efficiently integrating the discussion of the two techniques. In addi-
tion, the same Chapter 2 teaches both the necessary basic statistical con-
cepts (for example, hypothesis testing) and the necessary matrix algebra
concepts as they are needed in teaching regression. We believe that this
approach avoids the needless repetition of traditional approaches and
does the best job of getting a wide variety of readers (who might be stu-
dents with different backgrounds in the same class) to the same level of
understanding.
Chapter 3 continues the integrative approach of the book by discuss-
ing more advanced regression models, including models using squared
and interaction terms, models using dummy variables, and logistic regres-
sion models. The book concludes with Chapter 4, which organizes the
techniques of model building, model diagnosis, and model improvement
into a cohesive six step procedure. Whereas many competing texts spread
such modeling techniques over a fairly large number of chapters that can
seem unrelated to the novice, the six step procedure organizes both stan-
dard and more advanced modeling techniques into a unified presenta-
tion. In addition, each chapter features motivating examples (many real
world, all realistic) and concludes with a section showing how to use SAS
followed by a set of exercises. Excel, MINITAB, and SAS outputs are
used throughout the text, and the book’s website contains more exercises
for each chapter. The book’s website also houses Appendices B, C, and
D. Appendix B gives careful derivations of most of the applied results in
the text. These derivations are referenced in the main text as the applied
results are discussed. Appendix C includes an applied discussion extend-
ing the basic treatment of logistic regression given in the main text. This
extended discussion covers binomial logistic regression, generalized (mul-
tiple category) logistic regression, and Poisson regression. Appendix D
extends the basic treatment of modeling time series data given in the main
text. The Box-Jenkins methodology and its use in regression analysis are
discussed.
Author Bruce Bowerman would like to thank Professor David
Nickerson of the University of Central Florida for motivating the writing
of this book. All three authors would like to thank editor Scott Isenberg,
production manager Destiny Hadley, and permissions editor Marcy
Schneidewind, as well as the fine people at Exeter, for their hard work.
Most of all we are indebted to our families for their love and encourage-
ment over the years.
Bruce L. Bowerman
Richard T. O’Connell
Emily S. Murphree
CHAPTER 1
An Introduction to
Regression Analysis
The simple linear regression model relates the dependent variable, which
is denoted y, to a single independent variable, which is denoted x, and
assumes that the relationship between y and x can be approximated by a
straight line. We can tentatively decide whether there is an approximate
straight-line relationship between y and x by making a scatter diagram,
or scatter plot, of y versus x. First, data concerning the two variables are
observed in pairs. To construct the scatter plot, each value of y is plotted
against its corresponding value of x. If the y values tend to increase or
decrease in a straight-line fashion as the x values increase, and if there is a
scattering of the ( x , y ) points around the straight line, then it is reasonable
to describe the relationship between y and x by using the simple lin-
ear regression model. We illustrate this in the following example, which
shows how regression analysis can help a natural gas company improve its
gas ordering process.
Example 2.1
When the natural gas industry was deregulated in 1993, natural gas com-
panies became responsible for acquiring the natural gas needed to heat
the homes and businesses in the cities they serve. To do this, natural gas
before assessing a fine, the company would like the actual and predicted
weekly fuel consumptions to differ by no more than 10 percent. Our
experience suggests that weekly fuel consumption substantially depends
on the average hourly temperature (in degrees Fahrenheit) measured in
the city during the week. Therefore, we will try to predict the depen-
dent (response) variable weekly fuel consumption ( y) on the basis of the
independent (predictor) variable average hourly temperature (x) during the
week. To this end, we observe values of y and x for eight weeks. The data
are given in the Excel output of Figure 2.1, along with a scatter plot of
y versus x. This plot shows (1) a tendency for the fuel consumptions to
decrease in a straight line fashion as the temperatures increase and (2) a
scattering of points around the straight line.
To begin to find a regression model that represents characteristics (1)
and (2) of the data plot, consider a specific average hourly temperature x.
For example, consider the average hourly temperature 28°F, which was
observed in week one, or consider the average hourly temperature 45.9°F,
which was observed in week five (there is nothing special about these
two average hourly temperatures, but we will use them throughout this
example to help explain the idea of a regression model). For the specific
average hourly temperature x that we consider, there are, in theory, many
weeks that could have this temperature. However, although these weeks
each have the same average hourly temperature, other factors that affect
fuel consumption could vary from week to week. For example, these
weeks might have different average hourly wind velocities, different ther-
mostat settings, and so forth. Therefore, the weeks could have different
fuel consumptions. It follows that there is a population of weekly fuel
Figure 2.1 The fuel consumption data and a scatter plot of y (FUELCONS) versus x (TEMP)

TEMP    FUELCONS
28.0    12.4
28.0    11.7
32.5    12.4
39.0    10.8
45.9     9.4
57.8     9.5
58.1     8.0
62.5     7.5
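The same scatter plot can be reproduced with general-purpose software. The following is a minimal sketch in Python; the data values are taken from Figure 2.1, but the use of the matplotlib library is our assumption and not part of the book's Excel-based presentation.

    import matplotlib.pyplot as plt

    # Fuel consumption data from Figure 2.1
    temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]   # x, average hourly temperature (degrees F)
    fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]       # y, weekly fuel consumption (MMcf)

    # Plot y versus x to check for an approximate straight-line relationship
    plt.scatter(temp, fuel)
    plt.xlabel("TEMP (average hourly temperature)")
    plt.ylabel("FUELCONS (weekly fuel consumption)")
    plt.show()

The plot shows the fuel consumptions tending to decrease in a straight-line fashion as the temperatures increase, as described above.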
μy|x = β0 + β1x

μy|28 = β0 + β1(28) = 15.77 − .1281(28) = 12.18 MMcf of natural gas
As another example, it would also follow that the mean of the popu-
lation of all weekly fuel consumptions that could be observed when the
average hourly temperature is 45.9°F is

μy|45.9 = β0 + β1(45.9) = 15.77 − .1281(45.9) = 9.89 MMcf of natural gas
a straight line. For example, consider the eight mean weekly fuel con-
sumptions that correspond to the eight average hourly temperatures in
Figure 2.1. In Figure 2.2 we depict these mean weekly fuel consumptions
as triangles that lie exactly on the straight line defined by the equation μy|x = β0 + β1x.

[Figure 2.2 The simple linear regression model relating weekly fuel consumption (y) to average hourly temperature (x): (a) the line of means and the error terms, with the mean fuel consumptions at x = 28.0, 45.9, and 62.5 shown on the line; (b) the slope of the line of means, where β1 = the change in mean weekly consumption associated with a one-degree increase in average hourly temperature, that is, the difference between β0 + β1(c + 1) and β0 + β1c]
For the first week, suppose that the average hourly temperature is c. The mean weekly fuel consumption for all such weeks is

β0 + β1(c)
For the second week, suppose that the average hourly temperature is
(c + 1). The mean weekly fuel consumption for all such weeks is
b0 + b1 (c + 1)
It is easy to see that the difference between these mean weekly fuel con-
sumptions is b1. Thus, as illustrated in Figure 2.2(b), the slope b1 is the
change in mean weekly fuel consumption that is associated with a one-
degree increase in average hourly temperature. To interpret the meaning
of the y-intercept b0, consider a week having an average hourly tempera-
ture of 0°F. The mean weekly fuel consumption for all such weeks is
b0 + b1 (0) = b0
y = μy|x + ε = β0 + β1x + ε
This model says that the weekly fuel consumption y observed when
the average hourly temperature is x differs from the mean weekly fuel
consumption μy|x by an amount equal to ε (epsilon). Here ε is called an
error term. The error term describes the effect on y of all factors other than
the average hourly temperature. Such factors would include the average
hourly wind velocity and the average hourly thermostat setting in the city.
For example, Figure 2.2(a) shows that the error term for the first week is
positive. Therefore, the observed fuel consumption y = 12.4 in the first
week was above the corresponding mean weekly fuel consumption for all
weeks when x = 28. As another example, Figure 2.2(a) also shows that the
error term for the fifth week was negative. Therefore, the observed fuel
consumption y = 9.4 in the fifth week was below the corresponding mean
weekly fuel consumption for all weeks when x = 45.9. Of course, since
we do not know the true values of b0 and b1, the relative positions of the
quantities pictured in the figure are only hypothetical.
With the fuel consumption example as background, we are ready to
define the simple linear regression model relating the dependent variable y to
the independent variable x . We suppose that we have gathered n observa-
tions—each observation consists of an observed value of x and its corre-
sponding value of y. Then:
y = μy|x + ε = β0 + β1x + ε
Here μy|x = β0 + β1x is the mean value of y when x equals a specific value x0, β0 is the y-intercept, β1 is the slope, and ε is an error term.

[Figure: The simple linear regression model. An observed value of y when x equals x0 differs from the mean value of y when x equals x0 by an error term; the mean values lie on the straight line defined by the equation μy|x = β0 + β1x, which has y-intercept β0 and slope β1 (the change in the mean value of y associated with a one-unit change in x)]
[Figure: an estimated regression line ŷ = b0 + b1x, showing a predicted fuel consumption, an observed fuel consumption, and the residual between them]
ei = yi − ŷi = yi − (b0 + b1xi)

Then, the least squares line is the line that minimizes the sum of the squared prediction errors (that is, the sum of squared residuals)

SSE = Σ ei² = Σ(yi − ŷi)² = Σ(yi − (b0 + b1xi))²
To find the least squares line, we find the values of the y-intercept b0 and slope b1 that give values of ŷi = b0 + b1xi that minimize SSE. These values of b0 and b1 are called the least squares point estimates of β0 and β1. Using
calculus (see Section B.1 in Appendix B), we can show that the least
squares point estimates are as follows:
1. The least squares point estimate of the slope β1 is b1 = SSxy / SSxx,¹ where

¹ In order to simplify notation, we will often drop the limits on summations in this and subsequent chapters. That is, instead of using the summation Σ from i = 1 to n, we will simply write Σ.
SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n

and

SSxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n

2. The least squares point estimate of the y-intercept β0 is b0 = ȳ − b1x̄, where

ȳ = Σyi/n  and  x̄ = Σxi/n
Example 2.2
y = my |x + e
= b0 + b1 x + e
we first consider the summations that are shown in Table 2.1. Using these
summations, we calculate SSxy and SSxx as follows:
SSxy = Σxiyi − (Σxi)(Σyi)/n = 3413.11 − (351.8)(81.7)/8 = −179.6475

SSxx = Σxi² − (Σxi)²/n = 16,874.76 − (351.8)²/8 = 1404.355
Table 2.1 Calculation of the summations needed to compute SSxy and SSxx

yi      xi      xi²                     xi yi
12.4    28.0    (28.0)² = 784.00        (28.0)(12.4) = 347.2
11.7    28.0    (28.0)² = 784.00        (28.0)(11.7) = 327.6
12.4    32.5    (32.5)² = 1,056.25      (32.5)(12.4) = 403.0
10.8    39.0    (39.0)² = 1,521.00      (39.0)(10.8) = 421.2
9.4     45.9    (45.9)² = 2,106.81      (45.9)(9.4) = 431.46
9.5     57.8    (57.8)² = 3,340.84      (57.8)(9.5) = 549.1
8.0     58.1    (58.1)² = 3,375.61      (58.1)(8.0) = 464.8
7.5     62.5    (62.5)² = 3,906.25      (62.5)(7.5) = 468.75
Σyi = 81.7   Σxi = 351.8   Σxi² = 16,874.76   Σxiyi = 3,413.11
b1 = SSxy / SSxx = −179.6475 / 1404.355 = −.1279
Furthermore, because

ȳ = Σyi/8 = 81.7/8 = 10.2125  and  x̄ = Σxi/8 = 351.8/8 = 43.98

the least squares point estimate of the y-intercept β0 is b0 = ȳ − b1x̄ = 10.2125 − (−.1279)(43.98) = 15.84.
SSE = Σ ei² = 2.568
ŷ = b0 + b1x = 15.84 − .1279x

ŷ1 = 15.84 − .1279(28) = 12.2560

e1 = y1 − ŷ1 = 12.4 − 12.2560 = .1440
If we consider all of the residuals in Table 2.4 and add their squared
values, we find that SSE, the sum of squared residuals, is 2.568. If we
calculated SSE by using any point estimates of b0 and b1 other than the
least squares point estimates b0 = 15.84 and b1 = -.1279, we would obtain
a larger value of SSE. The SSE of 2.568 given by the least squares point
estimates will be used throughout this chapter.
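As a check on the hand calculations above, the least squares point estimates and SSE can be computed directly from the eight observations. The following is a minimal sketch in plain Python using only the formulas of this section; the variable names are ours, not the book's.

    # Fuel consumption data
    x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
    y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
    n = len(x)

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # about -179.6475
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                   # about 1404.355

    b1 = ss_xy / ss_xx            # slope estimate, about -.1279
    b0 = y_bar - b1 * x_bar       # y-intercept estimate, about 15.84
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))        # about 2.57
    print(b0, b1, sse)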
ŷ = b0 + b1x = 15.84 − .1279x
is the point estimate of μy|x = β0 + β1x, the mean of all weekly fuel consumptions that could be observed when the average hourly temperature is x. In addition, we predict the error term ε to be zero. Therefore, ŷ is also the point prediction of an individual value y = β0 + β1x + ε, which is the amount of fuel consumed in a single week that has an average hourly temperature of x. Note that the reason we predict the error term ε to be zero is that, because of several regression assumptions to be discussed in Section 2.3, ε has a 50 percent chance of being positive and a 50 percent chance of being negative.
Now suppose a weather forecasting service predicts that the average
hourly temperature in the next week will be 40°F. Because 40°F is in the
experimental region,
ŷ = 15.84 − .1279(40) = 10.72 MMcf of natural gas
is (1) the point estimate of the mean weekly fuel consumption when the
average hourly temperature is 40°F and (2) the point prediction of an
individual weekly fuel consumption when the average hourly tempera-
ture is 40°F. This says that (1) we estimate that the average of all possible
weekly fuel consumptions that could potentially be observed when the
average hourly temperature is 40°F equals 10.72 MMcf of natural gas,
and (2) we predict that the fuel consumption in a single week when the
average hourly temperature is 40°F will be 10.72 MMcf of natural gas.
To conclude this example, note that Figure 2.5 illustrates both the
point prediction ŷ = 10.72 and the potential danger of using the least
squares line to predict outside the experimental region. In the figure, we
extrapolate the least squares line far beyond the experimental region to
obtain a prediction for a temperature of -10°F. As shown in Figure 2.1,
for values of x in the experimental region, the observed values of y tend
to decrease in a straight-line fashion as the values of x increase. However,
for temperatures lower than 28°F the relationship between y and x might
become curved. If it does, extrapolating the straight-line prediction equa-
tion to obtain a prediction for x = -10 might badly underestimate mean
weekly fuel consumption (see Figure 2.5).
The previous example illustrates that when we are using a least squares regression line, we should not estimate a mean value or predict an individual value unless the corresponding value of x is in the experimental region—the range of the previously observed values of x.

[Figure 2.5: The least squares line ŷ = 15.84 − .1279x over the experimental region from 28 to 62.5, extrapolated to x = −10. The relationship between mean fuel consumption and x might become curved at low temperatures, so the estimated mean fuel consumption when x = −10 obtained by extrapolating the least squares line could differ greatly from the true mean fuel consumption when x = −10; the point prediction ŷ = 10.72 at x = 40 is also shown]

Often the value
x = 0 is not in the experimental region. For example, consider the fuel con-
sumption problem. Figure 2.5 illustrates that the average hourly tempera-
ture 0°F is not in the experimental region. In such a situation, it would not
be appropriate to interpret the y-intercept b0 as the estimate of the mean
value of y when x equals zero. In the case of the fuel consumption prob-
lem, it would not be appropriate to use b0 = 15.84 as the point estimate
of the mean weekly fuel consumption when average hourly temperature is
zero. Therefore, because it is not meaningful to interpret the y-intercept in
many regression situations, we often omit such interpretations.
Regression models that employ more than one independent variable are
called multiple regression models. We begin our study of these models by
considering the following example.
Example 2.3
Consider the fuel consumption problem in which the natural gas company
wishes to predict weekly fuel consumption for its city. In Section 2.1 we
used the single predictor variable x, average hourly temperature, to predict
y, weekly fuel consumption. We now consider predicting y on the basis
of average hourly temperature and a second predictor variable—the chill
index. The chill index for a given average hourly temperature expresses the
combined effects of all other major weather-related factors that influence
fuel consumption, such as wind velocity, cloud cover, and the passage of
weather fronts. The chill index is expressed as a whole number between
0 and 30. A weekly chill index near zero indicates that, given the average
hourly temperature during the week, all other major weather-related fac-
tors will only slightly increase weekly fuel consumption. A weekly chill
index near 30 indicates that, given the average hourly temperature during
the week, other weather-related factors will greatly increase weekly fuel
consumption.
The company has collected data concerning weekly fuel consumption
( y), average hourly temperature ( x1 ) , and the chill index ( x2 ) for the last
eight weeks. These data are given in Table 2.3. Figure 2.6 presents a scatter
plot of y versus x1. (Note that the y and x1 values given in Table 2.3 are
the same as the y and x values given in Figure 2.1). This plot shows that
y tends to decrease in a straight-line fashion as x1 increases. This suggests
that if we wish to predict y on the basis of x1 only, the simple linear regres-
sion model (having a negative slope)
y = b0 + b1 x1 + e
relates y to x1. Figure 2.6 also presents a scatter plot of y versus x2. This plot
shows that y tends to increase in a straight-line fashion as x2 increases.
This suggests that if we wish to predict y on the basis of x2 only, the sim-
ple linear regression model (having a positive slope)
y = b0 + b1 x2 + e
relates y to x2. Since we wish to predict y on the basis of both x1 and x2, it
seems reasonable to combine these models to form the model
y = b0 + b1 x1 + b2 x2 + e
[Figure 2.6 Scatter plots of y versus x1 and of y versus x2]
is the average fuel consumption for all weeks having an average
hourly temperature equal to 45.9 and a chill index equal to 8.
b0 + b1 x1 + b2 x2 = b0 + b1(0) + b2 (0) = b0
For the first week, suppose that the average hourly temperature is c and the chill index is d. The mean weekly fuel consumption for all such weeks is

β0 + β1(c) + β2(d)
For the second week, suppose that the average hourly temperature is c + 1
and the chill index is d . The mean weekly fuel consumption for all such
weeks is
b0 + b1(c + 1) + b2 (d )
It is easy to see that the difference between these mean fuel consumptions
is b1. Since weeks one and two differ only in that the average hourly tem-
perature during week two is one degree higher than the average hourly
temperature during week one, we can interpret the parameter b1 as the
change in mean weekly fuel consumption that is associated with a one-
degree increase in average hourly temperature when the chill index does
not change.
The interpretation of b2 can be established similarly. We can interpret
b2 as the change in mean weekly fuel consumption that is associated with
a one-unit increase in the chill index when the average hourly tempera-
ture does not change.
[Figure 2.7 The experimental region: the eight observed (x1, x2) combinations (28.0, 18), (28.0, 14), (32.5, 24), (39.0, 22), (45.9, 8), (57.8, 16), (58.1, 1), and (62.5, 0)]
Here the combinations of x1 and x2 values are the ordered pairs in the
figure.
We next write the mean value of y when the average hourly tempera-
ture is x1 and the chill index is x2 as μy|x1,x2 (pronounced mu of y given x1 and x2) and consider the equation

μy|x1,x2 = β0 + β1x1 + β2x2
which relates mean fuel consumption to x1 and x2. Since this is a linear
equation in two variables, geometry tells us that this equation is the equa-
tion of a plane in three-dimensional space. We sometimes refer to this
plane as the plane of means, and we illustrate the portion of this plane
corresponding to the ( x1 , x2 ) combinations in the experimental region in
Figure 2.8. As illustrated in this figure, the model
y = μy|x1,x2 + ε = β0 + β1x1 + β2x2 + ε
[Figure 2.8 The plane of means over the experimental region of (x1, x2) combinations, showing the error terms ε = y − μy|28.0,18 and ε = y − μy|45.9,8 for weeks one and five]
says that the eight error terms cause the eight observed fuel consumptions
(the dots in the upper portion of the figure) to deviate from the eight
mean fuel consumptions (the triangles in the figure), which exactly lie on
the plane of means
μy|x1,x2 = β0 + β1x1 + β2x2
For example, consider the data for week one in Table 2.3 ( y = 12.4,
x1 = 28.0, x2 = 18). Figure 2.8 shows that the error term for this week
is positive, causing y to be higher than μy|28.0,18 (mean fuel consumption
when x1 = 28 and x2 = 18). Here factors other than x1 and x2 (for instance,
thermostat settings that are higher than usual) have resulted in a positive
error term. As another example, the error term for week 5 in Table 2.3
( y = 9.4, x1 = 45.9, x2 = 8) is negative. This causes y for week five to be
lower than μy|45.9,8 (mean fuel consumption when x1 = 45.9 and x2 = 8).
Here factors other than x1 and x2 (for instance, lower-than-usual thermo-
stat settings) have resulted in a negative error term.
The fuel consumption model expresses the dependent variable as a
function of two independent variables. In general, we can use a multiple
regression model to express a dependent variable as a function of any num-
ber of independent variables. For example, the Cincinnati Gas and Elec-
tric Company predicts daily natural gas consumption as a function of four
independent variables—average temperature, average wind velocity, aver-
age sunlight, and change in average temperature from the previous day.
The general form of a multiple regression model expresses the dependent
variable y as a function of k independent variables x1 , x2 ,…, x k . We call
this general form the (multiple) linear regression model and express it as
shown in the following box.
is

ŷi = b0 + b1xi1 + b2xi2 + . . . + bkxik

ei = yi − ŷi small. We define the least squares point estimates to be the values b0, b1, b2, ..., bk that minimize the sum of squared residuals

SSE = Σ(yi − ŷi)²
Using calculus (see Section B.2), it can be shown that the least squares
point estimates can be calculated by using a formula involving matrix
algebra. We now discuss matrix algebra and explain the formula.
A matrix is a rectangular array of numbers (called elements) that is composed of rows and columns. Matrices are denoted by boldface letters. For example, we will use two matrices to calculate the least squares point estimates of the parameters β0, β1, and β2 in the fuel consumption model

y = β0 + β1x1 + β2x2 + ε
observed average hourly temperatures x11 = 28, x21 = 28, . . . , x81 = 62.5.
The independent variable x2 is multiplied by b2 , and thus the column of
the X matrix corresponding to b2 is a column containing the observed chill
indices x12 = 18, x22 = 14, . . . , x82 = 0.
The dimension of a matrix is determined by the number of rows and
columns in the matrix. Since the matrix X has eight rows and three col-
umns, this matrix is said to have dimension 8 by 3 (commonly written
8 × 3). In general, a matrix with m rows and n columns is said to have
dimension m × n. As another example, the matrix y has eight rows and
one column. In general, a matrix having one column is called a column
vector. In order to use the matrix X and column vector y to calculate the
least squares point estimates, we first define the transpose of X.
The transpose of a matrix is formed by interchanging the rows and
columns of the matrix. For example, the transpose of the matrix X, which
we denote as X′ is
1 1 1 1 1 1 1 1
X′ = 28.0 28.0 32.5 39.0 45.9 57.8 58.1 62.5
18 14 24 22 8 16 1 0
[Diagram: the product of A (m × r) and B (r × n) is C = AB (m × n); the element cij in row i and column j of C is formed by combining row i of A with column j of B]
We multiply X′ by X as follows:

                                                              1   28.0   18
                                                              1   28.0   14
              1     1     1     1     1     1     1     1     1   32.5   24
X′X =      28.0  28.0  32.5  39.0  45.9  57.8  58.1  62.5     1   39.0   22
             18    14    24    22     8    16     1     0     1   45.9    8
                                                              1   57.8   16
                                                              1   58.1    1
                                                              1   62.5    0

             8.0      351.8      103.0
   =       351.8   16874.76     3884.1
           103.0     3884.1     1901.0
We multiply X′ by y as follows:

                                                              12.4
                                                              11.7
              1     1     1     1     1     1     1     1     12.4
X′y =      28.0  28.0  32.5  39.0  45.9  57.8  58.1  62.5     10.8
             18    14    24    22     8    16     1     0      9.4
                                                               9.5
                                                               8.0
                                                               7.5

            81.7
   =      3413.11
          1157.4
1 0 0
I = 0 1 0
0 0 1
      3   1   2           2          1
A =   1  .5   0     c =   1     d =  0
      2   0   4           0          2
y = β0 + β1x1 + β2x2 + ε

   b0
   b1   =  b  =  (X′X)⁻¹X′y
   b2

          5.43405    −.085930    −.118856        81.7
   =     −.085930   .00147070   .00165094      3413.11
         −.118856   .00165094   .00359276      1157.4

          13.1087
   =     −.09001
          .08249
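The matrix computation above can be reproduced in a few lines of code. The following is a minimal sketch using Python with numpy (our choice of tool; the book itself uses Excel, Minitab, and SAS). It builds y and X from Table 2.3 and evaluates b = (X′X)⁻¹X′y.

    import numpy as np

    x1 = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]       # average hourly temperature
    x2 = [18, 14, 24, 22, 8, 16, 1, 0]                           # chill index
    y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])   # weekly fuel consumption

    # X has a column of 1's (for b0), a column of x1 values, and a column of x2 values
    X = np.column_stack([np.ones(8), x1, x2])

    XtX_inv = np.linalg.inv(X.T @ X)     # (X'X) inverse
    b = XtX_inv @ X.T @ y                # least squares point estimates
    print(b)                             # approximately [13.1087, -0.09001, 0.08249]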
y = β0 + β1x1 + β2x2 + ... + βkxk + ε

        y1               1   x11   x12   . . .   x1k
        y2               1   x21   x22   . . .   x2k
y =     .        X =     .    .     .             .
        .                .    .     .             .
        yn               1   xn1   xn2   . . .   xnk

(the columns of X correspond to β0 and to the independent variables x1, x2, ..., xk)
   b0
   b1
   b2    =  b  =  (X′X)⁻¹X′y
   .
   bk
        y1                    1   x1
        y2                    1   x2
y =     .         and   X =   .    .
        .                     .    .
        yn                    1   xn

        b0                            ȳ − b1x̄
b =          =  (X′X)⁻¹X′y  =
        b1                            SSxy/SSxx

These are the same formulas for b0 and b1 that we presented in Section 2.1.
Example 2.4
Figure 2.10 is the Minitab output of a regression analysis of the fuel con-
sumption data in Table 2.3 by using the model
y = b0 + b1 x1 + b2 x2 + e
This output shows that the least squares point estimates of b0 , b1 , and b2
are b0 = 13.1087, b1 = − .09001, and b2 = .08249 , as have been calcu-
lated previously using matrices.
The point estimate b1 = −.09001 of b1 says we estimate that mean
weekly fuel consumption decreases (since b1 is negative) by .09001 MMcf
of natural gas when average hourly temperature increases by one degree
and the chill index does not change. The point estimate b2 = .08249 of b2
says we estimate that mean weekly fuel consumption increases (since b2 is
positive) by .08249 MMcf of natural gas when there is a one-unit increase
in the chill index and average hourly temperature does not change.
The equation
ŷ = b0 + b1x1 + b2x2 = 13.1087 − .09001x1 + .08249x2

ŷ1 = 13.1087 − .09001(28.0) + .08249(18) = 12.0733
e1 = y1 − ŷ1 = 12.4 − 12.0733 = .3267
Table 2.4 gives the point prediction obtained using the least squares
prediction equation and the residual for each of the eight observed
fuel consumption values. In addition, this table tells us that the SSE equals .674.
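The fitted values, residuals, and SSE in Table 2.4 can be reproduced by continuing the numpy sketch given earlier (X, b, and y are the arrays defined there).

    y_hat = X @ b                          # point predictions for the eight observed weeks
    residuals = y - y_hat                  # e_i = y_i - y_hat_i; the first residual is about .3267
    sse = float(residuals @ residuals)     # sum of squared residuals, about .674
    print(y_hat[0], residuals[0], sse)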
The least squares prediction equation is the equation of a plane that
we sometimes call the least squares plane. For combinations of values of
x1 and x2 that are in the experimental region, the least squares plane is the
estimate of the plane of means (see Figure 2.8). This implies that the point
on the least squares plane corresponding to the average hourly tempera-
ture x1 and the chill index x2
ŷ = b0 + b1x1 + b2x2 = 13.1087 − .09001x1 + .08249x2

is the point estimate of μy|x1,x2, the mean of all the weekly fuel consumptions that could be observed when the average hourly temperature is x1 and the chill index is x2. In addition, since we predict the error term to be zero, ŷ is also the point prediction of y = μy|x1,x2 + ε, which is the amount of fuel consumed in a single week when the average hourly temperature is x1 and the chill index is x2.

Table 2.4 Predictions and residuals using the least squares point estimates b0 = 13.1, b1 = −.0900, and b2 = .0825
For example, suppose a weather forecasting service predicts that in the
next week the average hourly temperature will be 40°F and the chill index
will be 10. Since this combination is inside the experimental region (see
Figure 2.7), we see that
ŷ = 13.1087 − .09001(40) + .08249(10) = 10.333 MMcf of natural gas
is
1. The point estimate of the mean weekly fuel consumption when the
average hourly temperature is 40°F and the chill index is 10.
2. The point prediction of the amount of fuel consumed in a single week
when the average hourly temperature is 40°F and the chill index is 10.
Notice that ŷ = 10.333 is given at the bottom of the Minitab output in
Figure 2.10. Also, note that Figure 2.11 is the Minitab output that results
from using the data in Figure 2.1 and the simple linear regression model
y = b0 + b1 x + e
[Key to the annotations on the Minitab output of Figure 2.11: s = standard error; explained variation; SSE = unexplained variation; total variation; R²; F(model) statistic; p-value for F(model); ŷ when x = 40; sŷ; 95% confidence interval when x = 40; 95% prediction interval when x = 40]
ŷ = b0 + b1x01 + b2x02 + ... + bkx0k

is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k. In addition, ŷ is the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k. Here we predict the error term to be zero.
Example 2.5
y = b0 + b1 x1 + b2 x2 + b3 x3 + b4 x 4 + b5 x5 + e
1 x1 x2 x3
x4 x5
y1 3669.88
y 3473.95 1 43.10 74065.11 4582.88 2.51 .34
y = = X = 1
2
108.13 58117.30 5539.78 5.51 .15
y25 2799.97 1
21.14 22809.53 3552.00 9.14 −.74
Table 2.5 Sales territory performance data, data plots, and regression

(a) The data (five of the 25 observations shown)

Sales       Time     MktPoten     Adver      MktShare   Change
3,669.88     43.10   74,065.11    4,582.88    2.51       0.34
2,295.10     13.82   21,118.49    2,950.38   10.91      -0.72
4,675.56    186.18   68,521.27    2,243.07    8.27       0.17
6,125.96    161.79   57,805.11    7,747.08    9.15       0.50
3,367.45    220.32   35,602.08    2,086.16    7.07      -0.49

(b) [Data plots of Sales versus Time, Sales versus MktPoten, and Sales versus the other predictors]
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5     37862659ᵃ        7572532      40.91ᵈ    <.0001ᵉ
Error             19      3516890ᵇ         185099
Corrected Total   24     41379549ᶜ
Parameter Estimates

Variable    Label       DF   Parameter Estimateᵍ   Standard Errorʰ   t Valueⁱ   Pr > |t|ʲ
Intercept   Intercept    1       -1113.78788          419.88690       -2.65      0.0157
Time        Time         1           3.61210            1.18170        3.06      0.0065
MktPoten    MktPoten     1           0.04209            0.00673        6.25      <.0001
Adver       Adver        1           0.12886            0.03704        3.48      0.0025
MktShare    MktShare     1         256.95554           39.13607        6.57      <.0001
Change      Change       1         324.53345          157.28308        2.06      0.0530

Obs   Dep Var Sales   Predicted Valueˡ   Std Error Mean Predictᵒ   95% CL Meanᵐ   95% CL Predictⁿ
26         .               4182               141.8220             3885  4479      3234  5130
ᵃExplained variation  ᵇSSE = unexplained variation  ᶜTotal variation  ᵈF(model)  ᵉp-value for F(model)  ᶠR²  ᵍbj  ʰsbj  ⁱt statistic  ʲp-value for t statistic  ᵏs = standard error  ˡŷ  ᵐ95 percent confidence interval for mean  ⁿ95 percent prediction interval  ᵒsŷ
ŷ = −1113.7879 + 3.6121(85.42) + .0421(35,182.73) + .1289(7281.65) + 256.9555(9.64) + 324.5335(.28) = 4182 (that is, 418,200 units)
which is given on the SAS output. The actual sales for the question-
able sales representative were 3088. This sales figure is 1094 less than
the point prediction ŷ = 4182. However, we will have to wait until we study prediction intervals to determine whether there is strong evidence
that the actual sales figure is unusually low. In the exercises, the reader will
further analyze the sales territory performance data by using techniques
(including prediction intervals) that will be discussed in the rest of this
chapter.
y = μy|x1,x2,...,xk + ε = β0 + β1x1 + β2x2 + ... + βkxk + ε
Taken together, the first three regression assumptions say that at any
given combination of values of x1 , x2 , … , x k , the population of potential
error term values is normally distributed with mean zero and a variance
s 2 that does not depend on the combination of values of x1 , x2 , … , x k .
The model
y = b0 + b1 x1 + b2 x2 + ... + bk xk + e
[Figure: 12.4 = observed value of y when x = 32.5]
The least squares point estimates b0, b1, b2, ..., bk of the parameters β0, β1, β2, ..., βk of the linear regression model are calculated by using the matrix algebra equation b = (X′X)⁻¹X′y and thus depend upon the n observed
Table 2.6 Three samples of weekly fuel consumptions and their least
squares point estimates
Week   Average hourly temperature, x1   The chill index, x2   Sample 1   Sample 2   Sample 3
1 28.0 18 y1 = 12.4 y1 = 12.0 y1 = 10.7
2 28.0 14 y2 = 11.7 y2 = 11.8 y2 = 10.2
3 32.5 24 y3 = 12.4 y3 = 12.3 y3 = 10.5
4 39.0 22 y4 = 10.8 y4 = 11.5 y4 = 9.8
5 45.9 8 y5 = 9.4 y5 = 9.1 y5 = 9.5
6 57.8 16 y6 = 9.5 y6 = 9.2 y6 = 8.9
7 58.1 1 y7 = 8.0 y7 = 8.5 y7 = 8.5
8 62.5 0 y8 = 7.5 y8 = 7.2 y8 = 8.0
b0 = 13.1087 b0 = 12.949 b0 = 11.593
b1 = -.09001 b1 = -.0882 b1 = -.0548
b2 = .08249 b2 = .0876 b2 = .0256
y = β0 + β1x1 + β2x2 + ... + βkxk + ε

s² = SSE / [n − (k + 1)]

s = √(SSE / [n − (k + 1)])
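In code, the mean square error s² and the standard error s follow directly from SSE. A minimal sketch, continuing the numpy example (n = 8 observations and k = 2 independent variables for the two-variable fuel consumption model; sse is the value computed earlier):

    n, k = 8, 2
    s_squared = sse / (n - (k + 1))    # mean square error, about .135
    s = s_squared ** 0.5               # standard error, about .3671
    print(s_squared, s)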
y = β0 + β1x1 + β2x2 + ε

           81.7
X′y =    3413.11
         1157.40

It follows that

                                               81.7
b′X′y = [13.1087   −.09001   .08249]         3413.11
                                             1157.40

      = 13.1087(81.7) + (−.09001)(3413.11) + (.08249)(1157.40)
      = 859.236
Furthermore, the eight observed fuel consumptions (see Table 2.1) can be used to calculate

Σyi² = y1² + y2² + ... + y8² = 859.91

Therefore,

SSE = Σyi² − b′X′y = 859.91 − 859.236 = .674
and a point estimate of σ is s = √.428 = .6542. Here, SSE = 2.57, s² = .428, and s = .6542 are given on the Minitab output in Figure 2.11.
Moreover, notice that s = .3671 for the model using both the average hourly
temperature and the chill index is less than s = .6542 for the model using
only the average hourly temperature. Therefore, we have evidence that the
two independent variable model will give more accurate predictions of
future weekly fuel consumptions.

[Figure: the total deviation (yi − ȳ) equals the explained deviation (ŷi − ȳ) plus the unexplained deviation (yi − ŷi), where ŷi = b0 + b1xi lies on the least squares line]
Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
The sum of the squared total deviations, Σ(yi − ȳ)², is called the total variation and measures the variation of the yi values around their mean ȳ. The sum of the squared explained deviations, Σ(ŷi − ȳ)², is called the explained variation and measures the amount of the total variation that is explained by the linear regression model. The sum of the squared unexplained deviations, Σ(yi − ŷi)², is called the unexplained variation (this is another name for SSE) and measures the amount of the total variation that is left unexplained by the linear regression model. We now define the coefficient of determination, denoted by R², to be the ratio of the explained variation to the total variation. That is, R² = (explained variation)/(total variation), and we say that R² is the proportion of the total variation in the n observed values of y that is explained by the linear regression model.
Neither the explained variation nor the total variation can be negative
R² = Explained variation / Total variation
At the end of this section we will discuss some special facts about the
coefficient of determination, R 2, when using the simple linear regression
model. When using a multiple linear regression model (a model with
more than one independent variable), we sometimes refer to R 2 as the
multiple coefficient of determination, and we define the multiple correlation
coefficient to be R = R 2 . For example, consider the fuel consumption
model y = b0 + b1 x1 + b2 x2 + e .
Σyi² = 859.91,   b′X′y = 859.236,   ȳ = Σyi/8 = 10.2125

Unexplained variation = SSE = Σ(yi − ŷi)² = Σyi² − b′X′y = 859.91 − 859.236 = .674

Total variation = Σ(yi − ȳ)² = Σyi² − 8ȳ² = 859.91 − 8(10.2125)² = 25.549

Explained variation = Total variation − Unexplained variation = 25.549 − .674 = 24.875, or

Explained variation = Σ(ŷi − ȳ)² = b′X′y − 8ȳ² = 859.236 − 8(10.2125)² = 24.875
The Minitab output in Figure 2.10 tells us that the total, explained, and
unexplained variations for this model are, respectively, 25.549, 24.875, and
.674. This output also tells us that the multiple coefficient of determination is R² = 24.875/25.549 = .974.
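The three variations and R² can be verified by continuing the numpy sketch (y and y_hat are the arrays computed earlier; the variable names are ours):

    total_variation = float(((y - y.mean()) ** 2).sum())          # about 25.549
    explained_variation = float(((y_hat - y.mean()) ** 2).sum())  # about 24.875
    unexplained_variation = float(((y - y_hat) ** 2).sum())       # SSE, about .674
    r_squared = explained_variation / total_variation             # about .974
    print(total_variation, explained_variation, r_squared)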
2.4.2 Adjusted R²

The adjusted coefficient of determination (adjusted R²) is

R̄² = [R² − k/(n − 1)] [(n − 1)/(n − k − 1)]

For the fuel consumption model that uses both the average hourly temperature and the chill index (k = 2),

R̄² = [.974 − 2/(8 − 1)] [(8 − 1)/(8 − 2 − 1)] = .963

For the simple linear regression model that uses only the average hourly temperature (k = 1),

r̄² = [.899 − 1/(8 − 1)] [(8 − 1)/(8 − 1 − 1)] = .883
These quantities are shown on the Minitab output in Figure 2.11. They
are not as large as the R² of .974 and the adjusted R² of .963 given by the regression
model that uses both the average hourly temperature and the chill index
as predictor variables. We next define the simple correlation coefficient as
follows.
The simple correlation coefficient, denoted r, equals +√r² if b1 is positive and −√r² if b1 is negative, where b1 is the slope of the least squares line relating y to x. This correla-
tion coefficient measures the strength of the linear relationship between y
and x.
r = −√r² = −√.899 = −.948
This simple correlation coefficient says that x and y have a strong ten-
dency to move together in a linear fashion with a negative slope. We
have seen this tendency in Figure 2.1, which indicates that y and x are
negatively correlated.
If we have computed the least squares slope b1 and r 2, the method
given in the previous box provides the easiest way to calculate r. The sim-
ple correlation coefficient can also be calculated using the formula
r = SSxy / √(SSxx SSyy)

r = −179.6475 / √((1404.355)(25.549)) = −.948
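The simple correlation coefficient can be verified either from the SS quantities or with numpy's built-in corrcoef function. A minimal, self-contained sketch for the fuel consumption data:

    import numpy as np

    x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
    y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

    ss_xy = ((x - x.mean()) * (y - y.mean())).sum()
    ss_xx = ((x - x.mean()) ** 2).sum()
    ss_yy = ((y - y.mean()) ** 2).sum()
    r = ss_xy / (ss_xx * ss_yy) ** 0.5        # about -.948
    print(r, np.corrcoef(x, y)[0, 1])         # the two values agree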
It is important to point out that high correlation does not imply that a
cause-and-effect relationship exists. When r indicates that y and x are highly
correlated, this says that y and x have a strong tendency to move together
in a straight-line fashion. The correlation does not mean that changes in
x cause changes in y. Instead, some other variable (or variables) could be
causing the apparent relationship between y and x. For example, sup-
pose that college students’ grade point averages and college entrance exam
scores are highly positively correlated. This does not mean that earning a
high score on a college entrance exam causes students to receive a high
grade point average. Rather, other factors such as intellectual ability, study
habits, and attitude probably determine both a student’s score on a col-
lege entrance exam and a student’s college grade point average. In general,
while the simple correlation coefficient can show that variables tend to
move together in a straight-line fashion, scientific theory must be used to
establish cause-and-effect relationships.
y = β0 + β1x1 + β2x2 + ... + βkxk + ε

H0: β1 = β2 = ... = βk = 0

which says that no overall regression relationship exists, versus the alternative hypothesis

Ha: At least one of β1, β2, ..., βk does not equal zero

F(model) = [(Explained variation)/k] / [(Unexplained variation)/[n − (k + 1)]]
y = b0 + b1 x1 + b2 x2 + e
The Minitab output in Figure 2.10 tells us that the explained and unex-
plained variations for this model are, respectively, 24.875 and .674. It
follows, since there are k = 2 independent variables, that
F(model) = [(Explained variation)/k] / [(Unexplained variation)/[n − (k + 1)]]
         = (24.875/2) / (.674/[8 − (2 + 1)])
         = 12.438/.135
         = 92.30

[Figure: the curve of the F distribution having k numerator and n − (k + 1) denominator degrees of freedom, showing the rejection point F[α], the tail area α = the probability of a type I error, the area 1 − α, the p-value, and F(model)]
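The F(model) statistic and its p-value can be reproduced as follows. This minimal sketch assumes the scipy library is available and uses the explained and unexplained variations given above:

    from scipy.stats import f

    k, n = 2, 8
    explained, unexplained = 24.875, 0.674
    f_model = (explained / k) / (unexplained / (n - (k + 1)))   # about 92.30
    p_value = f.sf(f_model, k, n - (k + 1))                     # upper-tail area, far below .001
    print(f_model, p_value)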
To test

H0: β1 = β2 = ... = βk = 0

versus

Ha: At least one of β1, β2, ..., βk does not equal zero

we use the test statistic

F(model) = [(Explained variation)/k] / [(Unexplained variation)/[n − (k + 1)]]

Here the rejection point F[α] is the point on the horizontal axis under the curve of the F distribution having k numerator and n − (k + 1) denominator degrees of freedom so that the tail area to the right of this point is α.
2.6 Individual t Tests
Consider the linear regression model
y = b0 + b1 x1 + b2 x2 + ... + bk xk + e
sbj = s√cjj

t = bj/sbj = (bj − 0)/sbj
This test statistic measures the distance between bj and zero (the value that makes the null hypothesis H0: βj = 0 true). If the absolute value of t is large, this implies that the distance between bj and zero is large and provides evidence that we should reject H0: βj = 0. Before discussing how large in absolute value t must be in order to reject H0: βj = 0 at level of significance α, we first show how to calculate this test statistic.
Example 2.6
y = b0 + b1 x1 + b2 x2 + e
                        column 0       column 1       column 2
             row 0      5.43405       −.085930       −.118856
(X′X)⁻¹ =    row 1     −.085930      .00147070      .00165094
             row 2     −.118856      .00165094      .00359276

The diagonal elements of this matrix are c00 = 5.43405, c11 = .00147070, and c22 = .00359276.
It can be shown that, if the regression assumptions hold, then the population of all possible values of (bj − βj)/sbj is described by a probability distribution called the t-distribution. The curve of the t-distribution is symmetrical and bell-shaped and centered at zero (see Figure 2.16), and the spread of this curve is determined by a parameter called the number of degrees of freedom of the t-distribution. The t-distribution describing the population of all possible values of (bj − βj)/sbj has
Table 2.7 Calculations of the standard errors of the bj values and the t statistics for testing H0: β0 = 0, H0: β1 = 0, and H0: β2 = 0 in the fuel consumption model y = β0 + β1x1 + β2x2 + ε

Independent variable   bj             sbj = s√cjj                    t = bj/sbj                    p-value
Intercept              b0 = 13.1087   sb0 = .3671√5.434 = .8557      t = 13.1087/.8557 = 15.32     .000
x1                     b1 = −.09001   sb1 = .3671√.00147 = .01408    t = −.09001/.01408 = −6.39    .001
x2                     b2 = .08249    sb2 = .3671√.0036 = .0220      t = .08249/.0220 = 3.75       .013
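The standard errors, t statistics, and p-values in Table 2.7 can be reproduced from the diagonal elements of (X′X)⁻¹ by continuing the numpy sketch (XtX_inv, b, s, n, and k are as defined in the earlier sketches; scipy is assumed to be available for the p-values):

    import numpy as np
    from scipy.stats import t as t_dist

    c_jj = np.diag(XtX_inv)              # c00, c11, c22
    se_b = s * np.sqrt(c_jj)             # standard errors of b0, b1, b2
    t_stats = b / se_b                   # about 15.32, -6.39, 3.75
    p_values = 2 * t_dist.sf(np.abs(t_stats), n - (k + 1))   # two-sided p-values
    print(se_b, t_stats, p_values)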
[Figure: the curve of the t-distribution centered at zero, with rejection points −t[α/2] and t[α/2]]
y = b0 + b1 x1 + b2 x2 + e
t = bj/sbj

Here the rejection point t[α/2] is the point on the horizontal axis under the curve of the t-distribution having n − (k + 1) degrees of freedom so that the tail area to the right of this point is α/2.
We have seen in Section 2.5 that the intercept b0 is the mean value
of the dependent variable when all of the independent variables
x1 , x2 ,…, x k equal zero. In some situations it might seem logical that
b0 would equal zero. For example, if we were using the simple linear
regression model y = b0 + b1 x + e to relate x, the number of items pro-
cessed at a naval installation, to y, the number of labor hours required to
process the items, then it might seem logical that b0, the mean number
of hours required to process zero items, is zero. Therefore, if we fail to
reject H 0 : b0 = 0 and cannot conclude that the intercept is significant at
the .05 level of significance, it might be reasonable to set b0 equal to zero
and remove it from the regression model. This would give us the model
y = b1 x + e , and we would say that we are performing a regression anal-
ysis through the origin. We will give some specialized formulas for doing
this in Section 2.9. In general, to perform a regression analysis through
the origin in (multiple) linear regression (that is, to set the intercept b0
equal to zero), we would fit the model by leaving the column of 1’s out
of the X matrix. However, in general, logic seeming to indicate that b0
equals zero can be faulty. For example, the intercept b0 in the model
y = b0 + b1 x + e relating the number of items processed to processing
time might represent a mean basic “set up” time to process any number
of items. This would imply that b0 might not be zero. In fact, many
statisticians (including the authors) believe that leaving the intercept in
a regression model will give the model more “modeling flexibility” and
is appropriate, no matter what the t test of H 0 : b0 = 0 says about the
significance of the intercept.
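To illustrate the mechanics of a regression through the origin, the sketch below fits the simple fuel consumption model both with and without the column of 1's in X; leaving the column of 1's out forces the fitted line through the origin. This is only a mechanical illustration with our variable names; as noted above, nothing in the fuel consumption data suggests removing the intercept.

    import numpy as np

    x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
    y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

    # Usual model: X contains a column of 1's and a column of x values
    X_with = np.column_stack([np.ones(len(x)), x])
    b_with = np.linalg.lstsq(X_with, y, rcond=None)[0]        # [b0, b1]

    # Regression through the origin: leave the column of 1's out of X
    X_without = x.reshape(-1, 1)
    b_without = np.linalg.lstsq(X_without, y, rcond=None)[0]  # [b1] only
    print(b_with, b_without)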
We next consider how to calculate a confidence interval for a regres-
sion parameter.
bj ± t[α/2] sbj
y = b0 + b1 x1 + b2 x2 + e
This interval says we are 95 percent confident that if average hourly tem-
perature increases by one degree and the chill index does not change,
then mean weekly fuel consumption will decrease by at least .0538 MMcf
of natural gas and by at most .1262 MMcf of natural gas. Furthermore,
since this 95 percent confidence interval does not contain 0, we can reject
H0: β1 = 0 in favor of Ha: β1 ≠ 0 at the .05 level of significance.
To conclude this subsection, note that because we calculate the least
squares point estimates by using the matrix algebra equation b = (X ′ X )-1 X ′ y ,
the least squares point estimate bj of βj is a linear function of y1, y2, ..., yn. For this reason, we call the least squares point estimate bj a linear point estimate (which, since μbj = βj, is also an unbiased point estimate) of βj. An
important theorem called the Gauss-Markov Theorem says that if regres-
sion assumptions 1, 2, and 4 hold, then the variance (or spread around b j)
of all possible values (from all possible samples) of the least squares point
estimate b j is smaller than the variance of all possible values of any other
unbiased, linear point estimate of b j. This theorem is important because it
says that the actual value of the least squares point estimate b j that we obtain
from the actual sample we observe is likely to be nearer the true b j than
would be the actual value of any other unbiased, linear point estimate of b j
(we prove the Gauss-Markov Theorem in Sections B.6 and B.9).
t = b0/sb0   and   t = b1/sb1

where

sb0 = s√c00 = s√(1/n + x̄²/SSxx)   and   sb1 = s√c11 = s/√SSxx
F(model) = [(Explained variation)/k] / [(Unexplained variation)/[n − (k + 1)]] = (Explained variation) / [(Unexplained variation)/(n − 2)]
The Minitab output in Figure 2.11 gives t = b1 / sb1, F(model), and the
corresponding p-value, which Minitab says is .000 (meaning less than
.001). It follows that we can reject H 0 : b1 = 0 in favor of H a : b1 ≠ 0 at
the .001 level of significance. Therefore, we have extremely strong evi-
dence that x (average hourly temperature) is significantly related to y in
the simple linear regression model.
t = r√(n − 2) / √(1 − r²)
f(x, y) = [1/(2πσxσy√(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [((x − μx)/σx)² − 2ρ((x − μx)/σx)((y − μy)/σy) + ((y − μy)/σy)²] }
t = r√(n − 2)/√(1 − r²) = −.948√(8 − 2)/√(1 − (−.948)²) = −7.3277
Minitab output in Figure 2.11. Moreover, the p-value for both tests is
the same, and the Minitab output tells us that this p-value is less than
.001. It follows that we can reject H 0 : r = 0 in favor of H a : r ≠ 0 at the
.001 level of significance. Therefore, we have extremely strong evidence of
a nonzero population correlation coefficient between the average hourly
temperature and weekly fuel consumption. In Chapter 4 we will use tests
of population correlation coefficients between the dependent variable and
the potential independent variables and between just the potential inde-
pendent variables themselves to help us “build” an appropriate regression
model.
To conclude this section, note that it can be shown that for large samples (n ≥ 25), an approximate 100(1 − α) percent confidence interval for (1/2)ln[(1 + ρ)/(1 − ρ)] is

(1/2)ln[(1 + r)/(1 − r)] ± z[α/2](1/√(n − 3))
If we denote this interval as [a, b], the corresponding approximate 100(1 − α) percent confidence interval for ρ is

[(e^(2a) − 1)/(e^(2a) + 1), (e^(2b) − 1)/(e^(2b) + 1)]
Note that, in calculating the first interval, z[a /2 ] is the point on the
horizontal axis under the curve of the standard normal distribution so
that the tail area to the right of this point is a / 2. Table A3 in Appendix A
is a table of areas under the standard normal curve. For example, suppose
that the sample correlation coefficient between the productivities and
aptitude test scores of n = 250 word processing specialists is .84. To find
a 95 percent confidence interval for (1/2)ln[(1 + ρ)/(1 − ρ)], we use z[.025].
Because the standard normal curve tail area to the right of z[.025] is .025,
the standard normal curve area between 0 and z[.025] is .5 − .025 = .475.
Looking up .475 in the body of Table A3, we find that z[.025] = 1.96.
Therefore, the desired confidence interval is
(1/2)ln[(1 + r)/(1 − r)] ± z[.025](1/√(n − 3)) = (1/2)ln[(1 + .84)/(1 − .84)] ± 1.96(1/√(250 − 3)) = [1.0965, 1.3459]

and the corresponding 95 percent confidence interval for ρ is

[(e^(2(1.0965)) − 1)/(e^(2(1.0965)) + 1), (e^(2(1.3459)) − 1)/(e^(2(1.3459)) + 1)] = [.80, .87]
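The arithmetic of this example is easy to script. A minimal sketch using only Python's standard library (the numbers are the ones used above; the variable names are ours):

    import math

    r, n, z = 0.84, 250, 1.96
    center = 0.5 * math.log((1 + r) / (1 - r))
    half_width = z / math.sqrt(n - 3)
    low_z, high_z = center - half_width, center + half_width   # about 1.0965 and 1.3459

    # Transform the endpoints back to the correlation scale
    low_rho = (math.exp(2 * low_z) - 1) / (math.exp(2 * low_z) + 1)     # about .80
    high_rho = (math.exp(2 * high_z) - 1) / (math.exp(2 * high_z) + 1)  # about .87
    print(low_z, high_z, low_rho, high_rho)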
ŷ = b0 + b1x01 + b2x02 + ... + bkx0k

is

1. The point estimate of the mean value of the dependent variable y when the values of the independent variables are x01, x02, ..., x0k.
2. The point prediction of
the point estimate and point prediction ŷ. Unless we are extremely lucky, the value of ŷ that we calculate using the sample we observe will not
exactly equal the mean value of y or an individual value of y. Therefore, it
is important to calculate a confidence interval for the mean value of y and a
prediction interval for an individual value of y . Both of these intervals are
based on a quantity called the distance value. We first define this quantity,
show how to calculate it, and explain its intuitive meaning. Then, we
find the confidence interval and prediction interval based on the distance
value.
Example 2.7
ŷ = b0 + b1x01 + b2x02 = 13.1087 − .09001(40.0) + .08249(10) = 10.333 MMcf of natural gas
is the point estimate of the mean fuel consumption when x1 equals 40 and
x2 equals 10, and is the point prediction of the individual fuel consump-
tion in a single week when x1 equals 40 and x2 equals 10. To calculate the
and

        1          1
x0 =   x01   =    40
       x02         10
[Figure 2.17 The experimental region, the center of the region (x̄1, x̄2) = (43.98, 12.88), and the distances between (x01, x02) = (30, 18) and (x̄1, x̄2) and between (x01, x02) = (40, 10) and (x̄1, x̄2)]
point ( x01 , x02 ) = ( 40, 10) and the point ( x1 , x2 ) = ( 43.98, 12.88) is the
distance in two-dimensional space between these points. It can be shown
that the distance value x 0′ ( X ′ X )−1 x 0 = .2157 is reflective of this distance.
That is, in general, the greater the distance is between a point ( x01 , x02 )
and the center ( x1 , x2 ) = ( 43.98, 12.88) of the experimental region, the
greater is the distance value. For example, Figure 2.17 shows that the dis-
tance between the point ( x01 , x02 ) = (30, 18) and ( x1 , x2 ) = ( 43.98, 12.88)
is greater than the distance between the point ( x01 , x02 ) = ( 40, 10)
and ( x1 , x2 ) = ( 43.98, 12.88). Consequently, the distance value corre-
sponding to the point ( x01 , x02 ) = (30, 18), which is calculated using
x 0′ = [1 x01 x02 ] = [1 30 18] and equals x 0′ (X ′ X )−1 x 0 = .2701 , is greater
than the distance value corresponding to the point ( x01 , x02 ) = ( 40, 10),
which is calculated using x 0′ = [1 x01 x02 ] = [1 40 10] and equals .2157.
In general, let x01 , x02 , ..., x0k be the values of the independent vari-
ables x1 , x 2 ,…, x k for which we wish to estimate the mean value of the
dependent variable and predict an individual value of the dependent vari-
able. Also, define the center of the experimental region to be the point
( x1 , x2 ,..., xk ), where x1 is the average of the previously observed x1 values,
x2 is the average of the previously observed x2 values, and so forth. Then,
it can be shown that the greater the distance is (in k-dimensional space) between the point (x01, x02, …, x0k) and (x̄1, x̄2, …, x̄k), the greater is the distance value x0′(X′X)⁻¹x0, where x0′ = [1 x01 x02 … x0k].
It can also be shown (see Section B.7) that, if the regression assumptions hold, then the population of all possible values of the point estimate ŷ = b0 + b1x01 + b2x02 + … + bkx0k is normally distributed with mean my|x01, x02, …, x0k and standard deviation σŷ = σ√(Distance value). Since the standard error s is the point estimate of σ, the point estimate of σŷ is sŷ = s√(Distance value), which is called the standard error of the estimate ŷ. Using this standard error, we can form a confidence interval. Note that the t[a/2] point used in the confidence interval (and in the prediction interval to follow) is based on n − (k + 1) degrees of freedom.
A confidence interval for the mean value of y is

[ŷ ± t[a/2] s √(Distance value)]

and a prediction interval for an individual value of y is

[ŷ ± t[a/2] s √(1 + Distance value)]
Comparing the formula [ŷ ± t[a/2] s √(Distance value)] for a confidence interval for the mean value my|x01, x02, …, x0k with the formula [ŷ ± t[a/2] s √(1 + Distance value)] for a prediction interval for an individual value y = my|x01, x02, …, x0k + e, we note that the formula for the prediction interval has an “extra 1” under the radical. This makes the prediction interval longer than the confidence interval. Intuitively, the reason for the extra 1 under the radical is that, although we predict the error term to be zero when computing the point prediction ŷ of an individual value y = my|x01, x02, …, x0k + e, the error term will probably not be zero. The extra 1 under the radical accounts for the added uncertainty that the error term causes, and thus the prediction interval is longer. Also, note that the larger the distance value is, the longer are the confidence interval and the prediction interval. Said another way, when (x01, x02, …, x0k) is farther from the center of the observed data, ŷ = b0 + b1x01 + b2x02 + … + bkx0k is likely to be less accurate as a point estimate and point prediction.
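Both interval formulas can be evaluated directly once the distance value is known. The following Python sketch is illustrative only; it plugs in the fuel consumption quantities quoted in this section (ŷ = 10.333, s = .3671, distance value .2157, n = 8, k = 2) and lets scipy supply the t point.

import numpy as np
from scipy import stats

y_hat = 10.333          # point estimate / point prediction from the example
s = 0.3671              # standard error of the model
dist_value = 0.2157     # x0'(X'X)^(-1) x0 for x0' = [1 40 10]
n, k = 8, 2             # observations and independent variables
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - (k + 1))

ci = (y_hat - t_crit * s * np.sqrt(dist_value),
      y_hat + t_crit * s * np.sqrt(dist_value))        # interval for the mean value of y
pi = (y_hat - t_crit * s * np.sqrt(1 + dist_value),
      y_hat + t_crit * s * np.sqrt(1 + dist_value))    # interval for an individual value of y
print(np.round(ci, 3), np.round(pi, 3))                # the prediction interval is roughly [9.29, 11.37]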
Before considering an example, consider the simple linear regression model y = β0 + β1x + ε. For this model ŷ = b0 + b1x0 is the point estimate of the mean value of y when x is x0 and is the point prediction of an individual value of y when x is x0. Therefore, since 1 is multiplied by b0 and x0 is multiplied by b1 in the expression ŷ = b0 + b1x0, it follows that x0′ = [1 x0]. If we use x0′ to calculate the distance value, it can be shown that

Distance value = x0′(X′X)⁻¹x0 = 1/n + (x0 − x̄)²/SSxx
Example 2.8

ŷ = 13.1087 − .09001x01 + .08249x02
   = 13.1087 − .09001(40) + .08249(10)
   = 10.333 MMcf of natural gas

is the point estimate of mean fuel consumption and the point prediction of an individual week's fuel consumption when x1 equals 40 and x2 equals 10. Using the distance value of .2157, s = .3671, and t[.025] = 2.571 (based on n − (k + 1) = 8 − 3 = 5 degrees of freedom), a 95 percent prediction interval for an individual value of y is

[ŷ ± t[.025] s √(1 + Distance value)] = [10.333 ± 2.571(.3671)√(1.2157)] = [9.293, 11.374]

This interval says that we are 95 percent confident that the amount of fuel consumed in a single week (next week) when the average hourly temperature is 40°F and the chill index is 10 will be between 9.293 MMcf of natural gas and 11.374 MMcf of natural gas.
The point prediction ŷ = 10.333 of next week's fuel consumption would be the natural gas company's transmission nomination (order of natural gas from the pipeline transmission service) for next week. This
point prediction is the midpoint of the 95 percent prediction interval,
[9.293, 11.374], for next week’s fuel consumption. As previously calcu-
lated, the half-length of this interval is 1.04, and the 95 percent predic-
tion interval can be expressed as [10.333 ± 1.04]. Therefore, since 1.04 is
(1.04/10.333)100% = 10.07% of the transmission nomination of 10.333,
the model makes us 95 percent confident that the actual amount of natural
gas that will be used by the city next week will differ from the natural gas
company’s transmission nomination by no more than 10.07 percent. That
is, we are 95 percent confident that the natural gas company’s percentage
nomination error will be less than or equal to 10.07 percent. It follows
that this error will probably be within the 10 percent allowance granted
by the pipeline transmission system, and it is unlikely that the natural gas
company will be required to pay a transmission fine.
The bottom of the Minitab output in Figure 2.10 gives the point estimate and prediction ŷ = 10.333, along with the just calculated confidence and prediction intervals. Moreover, although the Minitab output does not directly give the distance value, it does give sŷ = s√(Distance value) under the heading “SE Fit.” Specifically, since the Minitab output tells us that sŷ equals .170 and also tells us that s equals .3671, the Minitab output tells us that the distance value equals (sŷ/s)² = (.170/.3671)² = .2144515. The reason that this value differs slightly from the value calculated using matrices is that the values of sŷ and s on the Minitab output are rounded.
In order to use the simple linear regression model y = β0 + β1x + ε to predict next week's fuel consumption on the basis of just the average hourly temperature of 40°F, recall from Example 2.2 that b0 = 15.84, b1 = −.1279, x̄ = 43.98, and SSxx = 1404.355. Also recall from Section 2.3 that s = .6542. The simple linear regression model's point prediction of next week's fuel consumption is ŷ = 15.84 − .1279(40) = 10.72 MMcf of natural gas. Furthermore, we compute the distance value to be (1/n) + (x0 − x̄)²/SSxx = (1/8) + (40 − 43.98)²/1404.355 = .1362.
[Figure: MINITAB fitted line plot of Fuelcons versus Temp for the simple linear regression model, showing the fitted regression line together with the 95% CI and 95% PI bands.]
… by the flow meter. If we consider fitting the simple linear regression model y = β0 + β1x + ε to these data, we find that x̄ = 5.5, ȳ = 5.45, SSxy = 74.35, and SSxx = 82.5. This implies that b1 = SSxy/SSxx = 74.35/82.5 = .9012 and b0 = ȳ − b1x̄ = 5.45 − .9012(5.5) = .4934. Moreover, we find that SSE = .0608, s² = SSE/(n − 2) = .0608/(10 − 2) = .0076, s = √.0076 = .0872, sb1 = s/√SSxx = .0872/√82.5 = .0096, and the t statistic for testing H0: β1 = 0 is t = b1/sb1 = .9012/.0096 = 93.87. The inverse prediction problem asks us to predict the x value that corresponds to a particular y value. That is, sometime in the future the liquid soap production line will be in operation, we will make a meter reading y of the flow rate, and we would like to know the actual flow rate x. The point prediction of and a 100(1 − a) percent prediction interval for x are as follows.
Inverse Prediction

If the regression assumptions are satisfied for the simple linear regression model, then the point prediction of the x value corresponding to an observed value of y is x̂ = (y − b0)/b1, and a 100(1 − a) percent prediction interval for x is [x̂L, x̂U], where

x̂L = x̄ + [(x̂ − x̄) − d]/(1 − c²)

x̂U = x̄ + [(x̂ − x̄) + d]/(1 − c²)

d = (t[a/2] s / b1) √[(1 − c²)((n + 1)/n) + (x̂ − x̄)²/SSxx]      and      c² = (t[a/2])² s² / (b1² SSxx)

where t = b1/sb1. To use the prediction interval, we require that t > t[a/2], which implies that c < 1, c² < 1, and (1 − c²) in the prediction interval
formula is greater than zero and less than one. For example, suppose that we
wish to have a point prediction of and a 100 (1 − a ) % = 95% prediction
interval for the actual flow rate x that corresponds to a meter reading of y = 4.
The point prediction of x is x̂ = (y − b0)/b1 = (4 − .4934)/.9012 = 3.8910.
Moreover, t[a/2] = t[.025] (based on n − 2 = 10 − 2 = 8 degrees of freedom)
is 2.306. Because t has been previously calculated to be 93.87 and because
t = 93.87 > 2.306 = t[.025], we can calculate a 95 percent prediction
interval for x as follows:
x̂U = 5.5 + (1/.9994) [ (3.8910 − 5.5) + (2.306(.0872)/.9012) √(.9994(11/10) + (3.8910 − 5.5)²/82.5) ]
    = 5.5 + (1/.9994)(−1.6090 + .2373) = 4.1274

x̂L = 5.5 + (1/.9994)(−1.6090 − .2373) = 3.6526
Therefore, we are 95 percent confident that the actual flow rate when the
meter reading is y = 4 is between 3.6526 and 4.1274.
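The inverse prediction formulas can be scripted as follows. This Python sketch is illustrative only; it uses the summary quantities from the flow meter example (b0 = .4934, b1 = .9012, s = .0872, x̄ = 5.5, SSxx = 82.5, n = 10) and a meter reading of y = 4.

import numpy as np
from scipy import stats

# summary quantities from the flow meter example
b0, b1, s = 0.4934, 0.9012, 0.0872
x_bar, ss_xx, n = 5.5, 82.5, 10
y_obs, alpha = 4.0, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)          # 2.306 for 8 degrees of freedom

x_hat = (y_obs - b0) / b1                              # point prediction of x, about 3.891
c2 = (t_crit ** 2 * s ** 2) / (b1 ** 2 * ss_xx)
d = (t_crit * s / b1) * np.sqrt((1 - c2) * (n + 1) / n + (x_hat - x_bar) ** 2 / ss_xx)

x_lower = x_bar + ((x_hat - x_bar) - d) / (1 - c2)
x_upper = x_bar + ((x_hat - x_bar) + d) / (1 - c2)
print(round(x_hat, 4), round(x_lower, 4), round(x_upper, 4))   # roughly 3.891, 3.65, 4.13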
2.11 Exercises
Exercise 2.1
Sales territory performance data (see Table 2.5):

DATA TERR;
INPUT Sales Time MktPoten Adver MktShare Change;
DATALINES;
3669.88 43.10 74065.11 4582.88 2.51 .34
3473.95 108.13 58117.30 5539.78 5.51 .15
.
.
2799.97 21.14 22809.53 3552.00 9.14 -.74
. 85.42 35182.73 7281.65 9.64 .28
PROC PRINT;
PROC REG DATA = TERR;
MODEL Sales = Time MktPoten Adver MktShare Change/P CLM CLI;
        4.3              1   4   .20                       12    66     3.6
y =     5.5        X =   1  12    …          X′X =         66   378    19.8
         ⋮               ⋮   ⋮    ⋮                        3.6  19.8    1.16
        6.5              1   7   .40
(c) ȳ = 5.50833; Explained variation = b′X′y − nȳ² = 31.12417.
(d) Total variation = Σᵢ₌₁¹² yᵢ² − nȳ² = 32.46917.
(e) R² = .958576; also calculate R̄² (adjusted R²).
(f) F(model) = 104.13; also test H0: β1 = β2 = 0 by setting a equal to .05 and using a rejection point; what does the test tell you?
(g) sb0 = s√c00 = .69423, t = b0/sb0 = .96; sb1 = s√c11 = .09981, t = b1/sb1 = 13.19; sb2 = s√c22 = …
Exercise 2.2
Recall that Figure 2.11 is the SAS output of a regression analysis of the
sales territory performance data in Table 2.5 by using the model
y = b0 + b1 x1 + b2 x2 + b3 x3 + b4 x 4 + b5 x5 + e
(a) Show how F(model) = 40.91 has been calculated by using other
quantities on the output. The SAS output tells us that the p-value
related to F(model) is less than .0001. What does this say?
(b) The SAS output tells us that the p-values for testing the significance
of the independent variables Time, MktPoten, Adver, MktShare, and
Change are, respectively, .0065, < .0001, .0025, < .0001, and .0530.
Interpret what these p-values say. Note: Although the p-value of
.0530 for testing the significance of Change is larger than .05, we
will see in Chapter 4 that retaining Change (x5) in the model makes
the model better.
(c) Consider a questionable sales representative for whom Time =
85.42, MktPoten = 35,182.73, Adver = 7281.65, MktShare =
9.64, and Change = .28. In Example 2.5 we have seen that the
point prediction of the sales corresponding to this combination of values of the independent variables is ŷ = 4182 (that is, 418,200 units). In addition to giving ŷ = 4182, the SAS output tells us that sŷ = s√(Distance value) (shown under the heading “Std Error Predict”) is 141.8220. Since the SAS output also tells us that s for the sales territory performance model equals 430.23188, the distance value equals (sŷ/s)² = (141.8220/430.23188)² = .109. Specify what row vector x0′ SAS used to calculate the distance value by the matrix algebra expression x0′(X′X)⁻¹x0. Then, use ŷ, the distance
value, s, and t[.025] based on n − (k + 1) = 25 − (5 + 1) = 19
degrees of freedom to verify that (within rounding) the 95 per-
cent prediction interval for the sales corresponding to the ques-
tionable sales representative’s values of the independent variables
is [3234, 5130]. This interval is given on the SAS output. Recall-
ing that the actual sales for the questionable representative were
3082, why does the prediction interval provide strong evidence
that these actual sales were unusually low?
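A Python sketch of the verification arithmetic asked for in part (c) (illustrative only; it uses the quantities quoted in the exercise and obtains t[.025] from scipy rather than from a table):

import numpy as np
from scipy import stats

y_hat, s, dist_value = 4182.0, 430.23188, 0.109   # values quoted in the exercise
n, k = 25, 5
t_crit = stats.t.ppf(0.975, df=n - (k + 1))       # 19 degrees of freedom, about 2.093

half_width = t_crit * s * np.sqrt(1 + dist_value)
print(np.round([y_hat - half_width, y_hat + half_width]))   # approximately [3234, 5130]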
Exercise 2.3
Consider a regression model with a single parameter β1, that is, y = β1x + ε, so that X is the column vector of the n observed xi values. Show that the matrix algebra formula b = (X′X)⁻¹X′y gives the least squares point estimate b1 = Σᵢ₌₁ⁿ xᵢyᵢ / Σᵢ₌₁ⁿ xᵢ² of β1.
CHAPTER 3
More Advanced
Regression Models
y = β0 + β1 x + β 2 x 2 + ε
where
Example 3.1
[Figure: six panels, (a) through (f), each plotting the mean value of y, my, against x to show the possible shapes of the quadratic regression curve.]
effects of this additive, mileage tests are carried out in a laboratory using
test equipment that simulates driving under prescribed conditions. The
amount of additive ST-3000 blended with the gasoline is varied, and the
gasoline mileage for each test run is recorded. Table 3.1 gives the results of
the test runs. Here the dependent variable y is gasoline mileage (in miles
per gallon, mpg) and the independent variable x is the amount of additive
ST-3000 used (measured as the number of units of additive added to each
gallon of gasoline). One of the study’s goals is to determine the number
of units of additive that should be blended with the gasoline to maximize
gasoline mileage. The company would also like to predict the maximum
mileage that can be achieved using additive ST-3000.
Figure 3.2 gives a scatter plot of y versus x. Since the scatter plot has
the appearance of a quadratic curve (that is, part of a parabola), it seems
reasonable to relate y to x by using the quadratic model
y = β0 + β1 x + β 2 x 2 + ε
ŷ = 25.7152 + 4.9762x − 1.01905x²
This is the equation of the best quadratic curve that can be fitted to the
data plotted in Figure 3.2. The MINITAB output also tells us that the
p-values related to x and x 2 are less than .001. This implies that we have
very strong evidence that each of these model components is significant.
The fact that x 2 seems significant confirms the graphical evidence that
there is a quadratic relationship between y and x. Once we have such
confirmation, we usually retain the linear term x in the model no mat-
ter what the size of its p-value. The reason is that geometrical consider-
ations indicate that it is best to use both x and x 2 to model a quadratic
relationship.
The oil company wishes to find the value of x that results in the highest predicted mileage. Using calculus, it can be shown that the value x = 2.44 maximizes predicted gas mileage (the fitted quadratic opens downward, so its maximum occurs where its derivative is zero, that is, at x = 4.9762/(2(1.01905)) ≈ 2.44). Therefore, the oil company can maximize predicted mileage by blending approximately 2.44 units of additive ST-3000 with each gallon of gasoline.
Analysis of Variance
Source DF SS MS F P
Regression 2 67.915 33.958 414.92 0.000
Residual Error 12 0.982 0.082
Total 14 68.897
Fit SE Fit 95% CI 95% PI
31.7901 0.1111 (31.5481, 32.0322) (31.1215, 32.4588)
Example 3.2
[Figure: scatter plots of demand y versus the price difference x4 and of y versus the advertising expenditure x3 for the Fresh demand data.]
y = b0 + b1 x 4 + b2 x3 + b3 x32 + b4 x 4 x3 + e
        y1        7.38
        y2        8.51
y =     y3    =   9.52
         ⋮          ⋮
        y30       9.26
        1    x4     x3     x3²        x4·x3
        1   −.05   5.50   (5.50)²   (−.05)(5.50)
        1    .25   6.75   (6.75)²    (.25)(6.75)
X =     ⋮     ⋮      ⋮       ⋮            ⋮
        1    .60   7.25   (7.25)²    (.60)(7.25)
        1    .55   6.80   (6.80)²    (.55)(6.80)

        1    x4     x3     x3²       x4·x3
        1   −.05   5.50   30.25      −.275
        1    .25   6.75   45.5625    1.6875
  =     ⋮     ⋮      ⋮       ⋮          ⋮
        1    .60   7.25   52.5625    4.35
        1    .55   6.80   46.24      3.74
b = (X′X)⁻¹X′y
Figure 3.6 presents the SAS output obtained by using the interaction model
to perform a regression analysis of the Fresh demand data. This output
shows that each of the p-values for testing the significance of the intercept
and the independent variables is less than .05. Therefore, we have strong
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 4 12.39419 3.09855 72.78 <.0001
Error 25 1.06440 0.04258
Corrected Total 29 13.45859
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > │t│
Intercept Intercept 1 29.11329 7.48321 3.89 0.0007
x4 PriceDif 1 11.13423 4.44585 2.50 0.0192
x3 AdvExp 1 -7.60801 2.46911 -3.08 0.0050
x3SQ x3 ** 2 1 0.67125 0.20270 3.31 0.0028
x4x3 x4 * x3 1 -1.47772 0.66716 -2.21 0.0361
Dep Var Predicted Std Error
Obs Demand Value Mean Predict 95% CL Mean 95% CL Predict
31 . 8.3272 0.0563 8.2112 8.4433 7.8867 8.7678
Figure 3.6  SAS output of a regression analysis of the Fresh demand data using the interaction model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + ε
evidence that the intercept and each of x 4 , x3 , x32 , and x 4 x3 are significant.
In particular, since the p-value related to x 4 x3 is .0361, we have strong evi-
dence that the interaction variable x 4 x3 is important. This confirms that the
interaction between x 4 and x3 that we suspected really does exist.
Suppose that Enterprise Industries wishes to predict demand for Fresh
in a future sales period when the price difference will be $.20 (x 4 = .20) and
when the advertising expenditure for Fresh will be $650,000 (x3 = 6.50).
Using the least squares point estimates in Figure 3.6, the needed point pre-
diction is
ŷ = 29.11329 + 11.13423(.20) − 7.60801(6.50) + .67125(6.50)² − 1.47772(.20)(6.50)
  = 8.3272 (832,720 bottles)
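The point prediction is simply the product of the row vector x0′ and the vector of least squares point estimates. A short Python sketch (illustrative only, using the estimates from the SAS output in Figure 3.6):

import numpy as np

# least squares point estimates from Figure 3.6
b = np.array([29.11329, 11.13423, -7.60801, 0.67125, -1.47772])

x4, x3 = 0.20, 6.50
x0 = np.array([1.0, x4, x3, x3 ** 2, x4 * x3])   # row vector [1, x4, x3, x3^2, x4*x3]

y_hat = x0 @ b
print(round(y_hat, 4))   # approximately 8.3273, i.e. about 832,700 bottles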
This point prediction is given on the SAS output of Figure 3.6, which
also tells us that the 95% confidence interval for mean demand when
x 4 equals .20 and x3 equals 6.50 is [8.2112, 8.4433] and that the 95%
prediction interval for an individual demand when x 4 equals .20 and x3
equals 6.50 is [7.8867, 8.7678]. Here, since
x0′ = [1  .20  6.50  (6.50)²  (.20)(6.50)] = [1  .20  6.50  42.25  1.3]
This interval says that we are 95 percent confident that the actual demand
in the future sales period will be between 788,670 bottles and 876,780
bottles. The upper limit of this interval can be used for inventory con-
trol. It says that if Enterprise Industries plans to have 876,780 bottles on
hand to meet demand in the future sales period, then the company can
be very confident that it will have enough bottles. The lower limit of the
Consider the prediction equation

ŷ = 29.11329 + 11.13423x4 − 7.60801x3 + .67125x3² − 1.47772x4x3
obtained from the least squares point estimates in Figure 3.6. Also, con-
sider the six combinations of price difference x 4 and advertising expendi-
ture x3 obtained by combining the x 4 values .10 and .30 with the x3 values
6.0, 6.4, and 6.8. When we use the prediction equation to predict the
demands for Fresh corresponding to these six combinations, we obtain
the predicted demands (ŷ) shown in Figure 3.7(a) (Note that we con-
sider two x 4 values because there is a linear relationship between y and x 4,
and we consider three x3 values because there is a quadratic relationship
between y and x3). Now
1. If we fix x3 at 6.0 in Figure 3.7(a) and plot the corresponding ŷ values 7.86 and 8.31 versus the x4 values .10 and .30, we obtain the two squares connected by the lowest line in Figure 3.7(b). Similarly, if we fix x3 at 6.4 and plot the corresponding ŷ values 8.08 and 8.42 versus the x4 values .10 and .30, we obtain the two squares connected by the middle line in Figure 3.7(b). Also, if we fix x3 at 6.8 and plot the corresponding ŷ values 8.52 and 8.74 versus the x4 values .10 and .30, we obtain the two squares connected by the highest line in Figure 3.7(b). Examining the three lines relating ŷ to x4, we see that the slopes of these lines decrease as x3 increases from 6.0 to 6.4 to 6.8. This says that as the price difference x4 increases from .10 to .30 (that is, as Fresh becomes less expensive compared to its competitors), the rate of increase of predicted demand ŷ is slower when advertising expenditure x3 is higher than when advertising expenditure x3 is lower. Moreover, this might be logical because it says that when a higher advertising expenditure makes more customers aware of Fresh's cleaning abilities and thus causes customer demand for Fresh to be higher, there is less opportunity for an increased price difference to increase demand for Fresh.
[Figure 3.7  (a) The predicted demands ŷ for the six combinations of x4 = .10, .30 and x3 = 6.0, 6.4, 6.8 (for x4 = .10: ŷ = 7.86, 8.08, 8.52; for x4 = .30: ŷ = 8.31, 8.42, 8.74). (b) Plots of ŷ versus x4 with x3 fixed at 6.8, 6.4, and 6.0. (c) Plots of ŷ versus x3 with x4 fixed at .30 and .10.]
2. If we fix x4 at .10 in Figure 3.7(a) and plot the corresponding ŷ values 7.86, 8.08, and 8.52 versus the x3 values 6.0, 6.4, and 6.8, we obtain the three squares connected by the lower quadratic curve in Figure 3.7(c). Similarly, if we fix x4 at .30 and plot the corresponding ŷ values 8.31, 8.42, and 8.74 versus the x3 values 6.0, 6.4, and 6.8, we obtain the three squares connected by the higher quadratic curve in Figure 3.7(c). The nonparallel quadratic curves in Figure 3.7(c) say that as advertising expenditure x3 increases from 6.0 to 6.4 to 6.8, the rate of increase of predicted demand ŷ is slower when the price difference x4 is larger (that is, x4 = .30) than when the price difference x4 is smaller (that is, x4 = .10). Moreover, this might be logical because it says that when a larger price difference causes customer demand for Fresh to be higher, there is less opportunity for an increased advertising expenditure to increase demand for Fresh.
y = β0 + β1 x1 + β 2 x2 + β3 x1 x2 + ε
Example 3.3
Suppose that Electronics World, a chain of stores that sells audio and video
equipment, has gathered the data in Table 3.3. These data concern store
sales volume in July of last year ( y, measured in thousands of dollars), the
number of households in the store’s area (x, measured in thousands), and
the location of the store (on a suburban street or in a suburban shopping
mall—a qualitative independent variable). Figure 3.8 gives a data plot of
y versus x. Stores having a street location are plotted as solid dots, while
stores having a mall location are plotted as asterisks. Notice that the line
relating y to x for mall locations has a higher y-intercept than does the
line relating y to x for street locations.
[Plot: sales volume y versus number of households x; stores with a mall location lie along the line m = (β0 + β2) + β1x and stores with a street location along the line m = β0 + β1x, with y-intercepts β0 + β2 and β0.]

Figure 3.8  Plot of the sales volume data and a geometrical interpretation of the model y = β0 + β1x + β2DM + ε
In order to model the effects of the street and shopping mall locations, we define a dummy variable denoted DM as follows: DM equals 1 if a store is in a mall location and 0 otherwise. The model describing the data is then

y = β0 + β1x + β2DM + ε

For a street location, DM = 0 and the mean sales volume is

β0 + β1x + β2DM = β0 + β1x + β2(0)
                = β0 + β1x

while for a mall location, DM = 1 and the mean sales volume is

β0 + β1x + β2DM = β0 + β1x + β2(1)
                = (β0 + β2) + β1x
In addition to the data concerning street and mall locations in Table 3.3,
Electronics World has also collected data concerning downtown locations.
The complete data set is given in Table 3.4 and plotted in Figure 3.9. Here,
stores having a downtown location are plotted as open circles. A model
describing these data is
y = b0 + b1 x + b2 DM + b3 DD + e
[Figure 3.9  Plot of the complete sales volume data: mall-location stores lie along the line m = (β0 + β2) + β1x, downtown-location stores along m = (β0 + β3) + β1x, and street-location stores along m = β0 + β1x, with y-intercepts β0 + β2, β0 + β3, and β0.]
It follows that the mean sales volume for a street location (DM = 0, DD = 0) is

β0 + β1x + β2DM + β3DD = β0 + β1x + β2(0) + β3(0) = β0 + β1x

the mean sales volume for a mall location (DM = 1, DD = 0) is

β0 + β1x + β2DM + β3DD = β0 + β1x + β2(1) + β3(0) = (β0 + β2) + β1x

and the mean sales volume for a downtown location (DM = 0, DD = 1) is

β0 + β1x + β2DM + β3DD = β0 + β1x + β2(0) + β3(1) = (β0 + β3) + β1x
b = (X′X)⁻¹X′y = [b0  b1  b2  b3]′ = [14.978  .8686  28.374  6.864]′
To compare the effects of the street, shopping mall, and downtown locations,
consider comparing three means, which we denote as mh ,S , mh , M , and mh ,D.
(columns of X: 1, x, DM, DD)

        157.27          1  161  0  0
         93.28          1   99  0  0
        136.81          1  135  0  0
        123.79          1  120  0  0
        153.51          1  164  0  0
        241.74          1  221  1  0
        201.54          1  179  1  0
y =     206.71     X =  1  204  1  0
        229.78          1  214  1  0
        135.22          1  101  1  0
        224.71          1  231  0  1
        195.29          1  206  0  1
        242.16          1  248  0  1
        115.21          1  107  0  1
        197.82          1  205  0  1
Figure 3.10  The column vector y and matrix X using the data in Table 3.4 and the model y = β0 + β1x + β2DM + β3DD + ε
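Because Figure 3.10 gives y and X explicitly, the least squares point estimates for the dummy variable model can be reproduced with a few lines of matrix algebra. The Python sketch below is illustrative only; it should give values close to the estimates reported in the text.

import numpy as np

# sales volumes y and design matrix X from Figure 3.10 (columns: 1, x, DM, DD)
y = np.array([157.27, 93.28, 136.81, 123.79, 153.51,
              241.74, 201.54, 206.71, 229.78, 135.22,
              224.71, 195.29, 242.16, 115.21, 197.82])
x = np.array([161, 99, 135, 120, 164, 221, 179, 204, 214, 101,
              231, 206, 248, 107, 205], dtype=float)
DM = np.array([0]*5 + [1]*5 + [0]*5, dtype=float)   # 1 for mall locations
DD = np.array([0]*10 + [1]*5, dtype=float)          # 1 for downtown locations

X = np.column_stack([np.ones_like(x), x, DM, DD])
b = np.linalg.solve(X.T @ X, X.T @ y)               # b = (X'X)^(-1) X'y
print(np.round(b, 4))   # should be close to [14.978, 0.8686, 28.374, 6.864]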
These means represent the mean sales volumes at stores having h households
in the area and located on streets, in shopping malls, and downtown, respec-
tively. If we set x = h, it follows that
µh,S = β0 + β1h,   µh,M = β0 + β1h + β2,   and   µh,D = β0 + β1h + β3. Therefore,

µh,M − µh,S = (β0 + β1h + β2) − (β0 + β1h) = β2
which is the difference between the mean sales volume for stores in mall
locations having h households in the area and the mean sales volume for
stores in street locations having h households in the area. Figure 3.11 gives
the MINITAB output of a regression analysis of the data in Table 3.4 by
using the dummy variable model. The output tells us that the least squares
point estimate of β2 is b2 = 28.374. This says that for any given number of households in a store's area, we estimate that the mean monthly sales volume in a mall location is $28,374 greater than the mean monthly sales volume in a street location. Using the standard error of b2 reported on the output, a 95 percent confidence interval for β2 is [18.554, 38.193].
This interval says we are 95 percent confident that for any given number
of households in a store’s area, the mean monthly sales volume in a mall
location is between $18,554 and $38,193 greater than the mean monthly
sales volume in a street location. The MINITAB output also shows that
the t-statistic for testing H 0 : β 2 = 0 versus H a : β 2 ≠ 0 equals 6.36 and
that the related p-value is less than .001. Therefore, we have very strong
evidence that there is a difference between the mean monthly sales vol-
umes in mall and street locations.
In order to compare downtown and street locations, we look at
µh ,D − µh ,S = ( β0 + β1h + β3 ) − ( β0 + β1h ) = β3
Since the MINITAB output in Figure 3.11 tells us that b3 = 6.864, we esti-
mate that for any given number of households in a store’s area, the mean
monthly sales volume in a downtown location is $6,864 greater than the
mean monthly sales volume in a street location. Furthermore, since the
output tells us that sb3 = 4.770, a 95 percent confidence interval for b3 is
This says we are 95 percent confident that for any given number of house-
holds in a store’s area, the mean monthly sales volume in a downtown
location is between $3,636 less than and $17,363 greater than the
mean monthly sales volume in a street location. The MINITAB output
also shows that the t-statistic and p-value for testing H 0 : β3 = 0 versus
H a : β3 ≠ 0 are t = 1.44 and p-value = .178. Therefore, we do not have
strong evidence that there is a difference between the mean monthly sales
volumes in downtown and street locations.
In order to compare mall and downtown locations, we look at
µh , M − µh ,D = ( β0 + β1h + β 2 ) − ( β0 + β1h + β3 ) = β 2 − β3
The point estimate of β2 − β3 is b2 − b3 = 28.374 − 6.864 = 21.51. This says that for any given number of households in a store's area we
estimate that the mean monthly sales volume in a mall location is $21,510
greater than the mean monthly sales volume in a downtown location.
There are two approaches for calculating a confidence interval for
µh , M − µh ,D and for testing the null hypothesis H 0 : µh , M − µh ,D = 0.
Because µh , M − µh ,D equals the linear combination b2 − b3 of the β j ’s in
the model y = β0 + β1 x + β 2 DM + β3 DD + ε , one approach shows how
to make statistical inferences about a linear combination of β j ’s. This
approach is discussed in Section 3.5. The other approach, discussed near
the end of this section, involves specifying an alternative dummy variable
For example, the point prediction of the sales volume for a store with x = 200 (that is, 200,000 households in its area) and a mall location is

ŷ = b0 + b1(200) + b2(1) + b3(0)
  = 14.978 + .8686(200) + 28.374(1)
  = 217.07
In modeling the sales volume data we might consider using the model
y = β0 + β1 x + β 2 DM + β3 DD + β 4 xDM + β5 xDD + ε
2. for a mall location, mean sales volume equals (since DM = 1 and DD= 0)
(a) [Plot: sales volume y versus number of households x under the interaction model; mall locations lie along m = (β0 + β2) + (β1 + β4)x, street locations along m = β0 + β1x, and downtown locations along m = (β0 + β3) + (β1 + β5)x.]
(b)
Root MSE 6.79953 R-Square 0.9877
Dependent Mean 176.98933 Adj R-Sq 0.9808
Coeff Var 3.841777
Parameter Standard
Variable Estimate Error t Value Pr > │t│
Intercept 7.90042 17.03513 0.46 0.6538
x 0.92070 0.12343 7.46 <.0001
DM 42.72974 21.50420 1.99 0.0782
DD 10.25503 21.28319 0.48 0.6414
XDM -0.09172 0.14163 -0.65 0.5334
XDD -0.03363 0.13819 -0.24 0.8132
SAS output tells us that the p-values related to the significance of xDM
and xDD are large—.5334 and .8132, respectively. Therefore, these inter-
action terms do not seem to be important. In addition, the SAS output
tells us that the standard error s for the interaction model is s = 6.79953,
which is larger than the s of 6.34941 for the no-interaction model
y = β0 + β1 x + β 2 DM + β3 DD + ε (see Figure 3.11). It follows that the
no-interaction model, which is sometimes called the parallel slopes model,
seems to be the better model describing the sales volume data. Recall that
This says we are 95 percent confident that for any given number of house-
holds in a store’s area, the mean monthly sales volume in a mall loca-
tion is between $12,563 and $30,457 greater than the mean monthly
sales volume in a downtown location. The Excel output also shows that
the t-statistic and p-value for testing the significance of µh , M − µh ,D are,
respectively, 5.29 and 0.000256. Therefore, we have very strong evidence
that there is a difference between the mean monthly sales volumes in mall
and downtown locations.
y = β0 + β1 x + β 2 DM + β3 DD + ε
describes the sales volume data better than does the interaction (or
unequal slopes) model
y = b0 + b1 x + b2 DM + b3 DD + b4 xDM + b5 xDD + e
The reasons for this decision were that the no-interaction model has the
smaller standard error s and the p-values related to the significance of xDM
and xDD in the interaction model are large—.5334 and .8132—indicat-
ing that these interaction terms are not important. Another way to decide
which of these models is best is to test the significance of the interaction
portion of the interaction model. We do this by testing the null hypothesis
H 0 : b4 = b5 = 0
which says that neither of the interaction terms significantly affects sales
volume, versus the alternative hypothesis
Ha: at least one of β4 and β5 does not equal 0
which says that at least one of the interaction terms significantly affects
sales volume.
In general, consider the regression model

y = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk + ε

(the complete model) and the null hypothesis

H0: βg+1 = βg+2 = … = βk = 0

which says that none of the independent variables xg+1, …, xk significantly affects y. Let SSEC denote the unexplained variation for the complete model, and let SSER denote the unexplained variation for the reduced model obtained by assuming that H0 is true. To test H0 versus the alternative that at least one of βg+1, …, βk does not equal zero, we use the partial F statistic

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}
Also define the p-value related to F to be the area under the curve of
the F distribution [having k − g and n − (k + 1) degrees of freedom]
to the right of F . Then, we can reject H 0 in favor of H a at level of
significance a if either of the following equivalent conditions holds:
1. F > F[α ]
2. p-value < α
If we are testing the significance of a single parameter (so that k − g = 1), the usual t statistic for that parameter and the partial F statistic

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}
are equivalent. It can also be shown that in this case the p-value related to
t equals the p-value related to F .
Example 3.4
Although the partial SAS output in Figure 3.12 (b) does not show the
unexplained variation for this complete model, SAS can be used to show
that this unexplained variation is 416.1027. That is, SSEC = 416.1027.
If the null hypothesis H 0 : β 4 = β5 = 0 is true, the complete model
becomes the following reduced model:
Reduced Model: y = b0 + b1 x + b2 DM + b3 DD + e
The unexplained variation for this reduced model is SSER = 443.4650, so the partial F statistic for testing H0: β4 = β5 = 0 is

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}
  = [(443.4650 − 416.1027)/2] / {416.1027/(15 − 6)}
  = .2959
Next, consider using the model

y = β0 + β1x + β2DM + β3DD + ε

to test the null hypothesis

H0: µh,S = µh,M = µh,D
which says that the street, mall, and downtown locations have the same
effects on mean sales volume (no differences between locations).
To carry out this test we consider the following:
Complete model: y = β0 + β1x + β2DM + β3DD + ε

Because β2 = µh,M − µh,S and β3 = µh,D − µh,S, the null hypothesis H0: µh,S = µh,M = µh,D is equivalent to H0: β2 = β3 = 0, and the alternative hypothesis, which says that at least two locations have different effects on mean sales volume, is equivalent to Ha: at least one of β2 and β3 does not equal 0. If H0 is true, the complete model becomes the

Reduced model: y = β0 + β1x + ε
For this model the unexplained variation is SSER = 2467.8067. Noting that two parameters (β2 and β3) are set equal to 0 in the statement of H0: β2 = β3 = 0, we have k − g = 2. Therefore, the needed partial F statistic is
F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}
  = [(2467.8067 − 443.4650)/2] / {443.4650/[15 − 4]}
  = 25.1066
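The partial F statistic and its p-value are simple to compute once SSER and SSEC are known. The following Python sketch is illustrative only; the function name partial_f_test is our own, and scipy supplies the F distribution.

from scipy import stats

def partial_f_test(sse_reduced, sse_complete, k, g, n):
    """Partial F statistic and p-value for H0: beta_{g+1} = ... = beta_k = 0."""
    f = ((sse_reduced - sse_complete) / (k - g)) / (sse_complete / (n - (k + 1)))
    p_value = stats.f.sf(f, k - g, n - (k + 1))   # area to the right of f
    return f, p_value

# H0: beta2 = beta3 = 0 for the sales volume data (values quoted in the text)
print(partial_f_test(2467.8067, 443.4650, k=3, g=1, n=15))   # F is about 25.1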
In general,

1. The partial coefficient of determination is

R²(xg+1, …, xk | x1, …, xg) = (SSER − SSEC)/SSER

   = the proportion of the unexplained variation in the reduced model that is explained by the extra independent variables in the complete model.

2. The partial coefficient of correlation is the square root of the partial coefficient of determination.

For example, for the sales volume data,

R²(DM, DD | x) = (SSER − SSEC)/SSER = (2467.8067 − 443.4650)/2467.8067 = .8206
Consider again the model

y = β0 + β1x + β2DM + β3DD + ε

In general, let

λ = λ0β0 + λ1β1 + λ2β2 + … + λkβk

be a linear combination of regression parameters. The point estimate of λ is

λ̂ = λ0b0 + λ1b1 + λ2b2 + … + λkbk

and the standard error of the estimate λ̂ is

sλ̂ = s √(m′(X′X)⁻¹m)

where m′ = [λ0  λ1  λ2  …  λk]. The t statistic for testing H0: λ = 0 is

t = λ̂ / sλ̂ = λ̂ / (s √(m′(X′X)⁻¹m))

and a 100(1 − a) percent confidence interval for λ is

[λ̂ ± t[a/2] sλ̂] = [λ̂ ± t[a/2] s √(m′(X′X)⁻¹m)]
Example 3.5

Consider the sales volume model

y = β0 + β1x + β2DM + β3DD + ε

and the linear combination λ = β2 − β3 = µh,M − µh,D. Since we have seen in Example 3.3 that the least squares point estimates of β2 and β3 are b2 = 28.374 and b3 = 6.864, the point estimate of λ = β2 − β3 is

λ̂ = b2 − b3 = 28.374 − 6.864 = 21.51
Noting that λ = β2 − β3 = (0)β0 + (0)β1 + (1)β2 + (−1)β3, it follows that

m′ = [0  0  1  −1]

and

m = [0  0  1  −1]′
This says that we are 95 percent confident that for any given number
of households in a store’s area the mean monthly sales volume in a mall
location is between $12,563 and $30,457 greater than the mean monthly
sales volume in a downtown location.
We next point out that almost all of the SAS regression outputs we
have looked at to this point were obtained by using a SAS procedure
called PROC REG. This procedure will not carry out statistical infer-
ence for linear combinations of regression parameters (such as b2 − b3).
However, another SAS procedure called PROC GLM (GLM stands
for “General Linear Model”) will do this. Figure 3.14 gives a partial
Figure 3.14  Partial SAS PROC GLM output for the model y = β0 + β1x + β2DM + β3DD + ε
and

p(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x))
p̂(5) = e^(−3.7456 + 1.1109(5)) / (1 + e^(−3.7456 + 1.1109(5))) = 6.1037/(1 + 6.1037) = .8593
That is, p̂(5) = .8593 is the point estimate of the probability that a household receiving a coupon having a price reduction of $50 will redeem the coupon. The middle of the right side of Table 3.5 gives the values of p̂(x) for x = 1, 2, 3, 4, 5, and 6.
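A Python sketch of this calculation (illustrative only; the function name logistic_prob is our own, and the estimates b0 = −3.7456 and b1 = 1.1109 are those quoted above):

import numpy as np

def logistic_prob(x, b0=-3.7456, b1=1.1109):
    """Estimated redemption probability p(x) = e^(b0 + b1 x) / (1 + e^(b0 + b1 x))."""
    eta = b0 + b1 * np.asarray(x, dtype=float)
    return np.exp(eta) / (1 + np.exp(eta))

print(np.round(logistic_prob([1, 2, 3, 4, 5, 6]), 4))   # p_hat(5) is about .8593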
The general logistic regression model relates the probability that an
event (such as redeeming a coupon) will occur to k independent variables
x1 , x2 , . . . , xk . This general model is
p(x1, x2, …, xk) = e^(β0 + β1x1 + β2x2 + … + βkxk) / (1 + e^(β0 + β1x1 + β2x2 + … + βkxk))
p(x1, x2) = e^(β0 + β1x1 + β2x2) / (1 + e^(β0 + β1x1 + β2x2))
[Figure 3.15  Scatterplots of Group versus Test 1 and of Group versus Test 2 for the performance data.]
(79, 77), (88, 75), (81, 85), (85, 83), (82, 72), (82, 81), (81, 77),
(86, 76), (81, 84), (85, 78), (83, 77), and (81, 71). The source of
the data for this example is Dielman (1996), and Figure 3.15 shows
scatterplots of Group versus x1 (the score on test 1) and Group versus
x2 (the score on test 2).
The MINITAB output in Figure 3.16 tells us that the point estimates
of b0 , b1 , and b2 are b0 = −56.17, b1 = .4833, and b2 = .1652. Consider,
therefore, a potential employee who scores a 93 on test 1 and an 84 on
test 2. It follows that a point estimate of the probability that the potential
employee will perform successfully in that position is
p̂(93, 84) = e^(−56.17 + .4833(93) + .1652(84)) / (1 + e^(−56.17 + .4833(93) + .1652(84))) = 14.206506/15.206506 = .9342
Odds 95% CI
Predictor Coef SE Coef Z P Ratio Lower Upper
Constant -56.1704 17.4516 -3.22 0.001
Test 1 0.483314 0.157779 3.06 0.002 1.62 1.19 2.21
Test 2 0.165218 0.102070 1.62 0.106 1.18 0.97 1.44
Log-Likelihood = -13.959
Test that all slopes are zero: G = 31.483, DF = 2, p-value = 0.000
1
Like the curve of the F-distribution, the curve of the chi-square distribution is skewed with a
tail to the right. The exact shape of a chi-square distribution curve is determined by the (single)
number of degrees of freedom associated with the chi-square distribution under consideration.
odds = p(x1, x2) / (1 − p(x1, x2))
Because

p(x1, x2, …, xk) = e^(β0 + β1x1 + β2x2 + … + βkxk) / (1 + e^(β0 + β1x1 + β2x2 + … + βkxk))

it follows that

1 − p(x1, x2, …, xk) = 1 − e^(β0 + β1x1 + … + βkxk) / (1 + e^(β0 + β1x1 + … + βkxk))
                     = [(1 + e^(β0 + β1x1 + … + βkxk)) − e^(β0 + β1x1 + … + βkxk)] / (1 + e^(β0 + β1x1 + … + βkxk))
                     = 1 / (1 + e^(β0 + β1x1 + … + βkxk))

so that the odds equal

p(x1, x2, …, xk) / [1 − p(x1, x2, …, xk)] = e^(β0 + β1x1 + β2x2 + … + βkxk)

Moreover, the odds when xj is increased by one unit (with the other independent variables held constant), divided by the odds at xj, are

e^(β0 + β1x1 + … + βj(xj + 1) + … + βkxk) / e^(β0 + β1x1 + … + βjxj + … + βkxk)
  = e^(βj(xj + 1)) / e^(βjxj)
  = e^(βj(xj + 1) − βjxj)
  = e^(βj)
This says that e^(bj) is the point estimate of the odds ratio for xj, which is the proportional change in the odds that is associated with a one unit increase in xj when the other independent variables stay constant. Also, note that the natural logarithm of the odds is (β0 + β1x1 + β2x2 + … + βkxk), which is called the logit. If b0, b1, b2, …, bk are the point estimates of β0, β1, β2, …, βk, the point estimate of the logit, denoted by lg, is (b0 + b1x1 + b2x2 + … + bkxk). It follows that the point estimate of the probability that the event will occur is

p̂(x1, x2, …, xk) = e^(lg) / (1 + e^(lg))
DATA DETR;
INPUT Y X4 X3 DB DC;
X3SQ = X3*X3;
X43 = X4*X3;
X3DB = X3*DB;
X3DC = X3*DC;
DATALINES;
7.38 -0.05 5.50 1 0
8.51 0.25 6.75 1 0
9.52 0.60 7.25 1 0
7.50 0.00 5.50 0 0
9.33 0.25 7.00 0 1
.
.
9.26 0.55 6.80 0 1
. 0.20 6.50 0 1 } Future sales period
PROC REG;
MODEL Y = X4 X3 X3SQ X43 DB DC/P CLM CLI;
T1: TEST DB=0, DC=0;}Performs partial F test of H0 : b5 = b6 = 0
PROC GLM;
MODEL Y = X4 X3 X3SQ X43 DB DC/P CLI;
ESTIMATE 'MUDAB-MUDAA' DB 1; }Estimates b5
ESTIMATE 'MUDAC-MUDAA' DC 1; }Estimates b6
ESTIMATE 'MUDAC-MUDAB' DB -1 DC 1; }Estimates b6−b5
PROC REG;
MODEL Y = X4 X3 X3SQ X43 DB DC X3DB X3DC/P CLM CLI;
T2: TEST DB=0, DC=0, X3DB=0, X3DC=0;}
Tests H0 : b5 = b6 = b7 = b8 = 0
data;
input Group Test1 Test2;
datalines;
1 96 85
1 96 85
.
.
1 87 82
2 93 74
2 90 84
.
.
2 81 71
. 93 84
. 85 82
(Note: the 0's (unsuccessful employees) must be a “higher number” than the 1's (successful employees) when using SAS, so 2's are used to represent the unsuccessful employees.)
proc logistic;
model Group = Test1 Test2;
output out=results P=PREDICT L=CLLOWER U=CLUPPER;
proc print;
Figure 3.19 SAS program for performing logistic regression using the
performance data
3.8 Exercises
Exercise 3.1
2 108.933 156.730
5 124.124 175.019
8 129.756 183.748
[Figure: plots of ŷ versus x1 for x2 fixed at 2, 5, and 8, and of ŷ versus x2 for x1 fixed at 13 and 22.]
Exercise 3.2
(a) Discuss why the data plot on the side of this exercise part indicates
that the model y = b0 + b1 x + b2 DS + e might
appropriately describe the obtained data. Here, DS
[Side data plot for this exercise part: the observed values plotted against Months.]
Exercise 3.3
Recall from Example 3.2 that Enterprise Industries has observed the
historical data in Table 3.2 concerning y(demand for Fresh liquid laun-
dry detergent), x 4(the price difference), and x3 (Enterprise Industries’
advertising expenditure for Fresh). To ultimately increase the demand
for Fresh, Enterprise Industries’ marketing department is comparing
the effectiveness of three different advertising campaigns. These cam-
paigns are denoted as campaigns A, B, and C. Campaign A consists
entirely of television commercials, campaign B consists of a balanced
mixture of television and radio commercials, and campaign C consists
of a balanced mixture of television, radio, newspaper, and magazine ads.
To conduct the study, Enterprise Industries has randomly selected one
advertising campaign to be used in each of the 30 sales periods in Table
3.2. Although logic would indicate that each of campaigns A, B, and
C should be used in 10 of the 30 sales periods, Enterprise Industries
has made previous commitments to the advertising media involved in
the study. As a result, campaigns A, B, and C were randomly assigned
to, respectively, 9, 11, and 10 sales periods. Furthermore, advertising
was done in only the first three weeks of each sales period, so that the
carryover effect of the campaign used in a sales period to the next sales
period would be minimized. Table 3.7 lists the campaigns used in the
sales periods.
To compare the effectiveness of advertising campaigns A, B, and C, we
define two dummy variables. Specifically, we define the dummy variable DB to equal 1 if campaign B is used in a sales period and 0 otherwise, and we define the dummy variable DC to equal 1 if campaign C is used in a sales period and 0 otherwise. The model is then

y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + ε

Because the mean demand in a sales period is

β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC

it follows that
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 6 13.06502 2.17750 127.25 <.0001
Error 23 0.39357 0.01711
Corrected Total 29 13.45859
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t value Pr > │t│
Intercept Intercept 1 25.61270 4.79378 5.34 <.0001
X4 X4 1 9.05868 3.03170 2.99 0.0066
X3 X3 1 -6.53767 1.58137 -4.13 0.0004
X3SQ X3 ** 2 1 0.58444 0.12987 4.50 0.0002
X4X3 X4 * X3 1 -1.15648 0.45574 -2.54 0.0184
DB DB 1 0.21369 0.06215 3.44 0.0022
DC DC 1 0.38178 0.06125 6.23 <.0001
Dep Var Predicted Std Error
Obs Y Value Mean Predict 95% CL Mean 95% CL Predict
31 . 8.5007 0.0469 8.4037 8.5977 8.2132 8.7881
Figure 3.21  SAS PROC REG output of a regression analysis of the Fresh demand data using the model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + ε
Parameter Estimates
Parameter Standard
Variable Estimate Error t value Pr > │t│
Intercept 25.82638 4.79456 5.39 <.0001
X3 -6.53767 1.58137 -4.13 0.0004
X4 9.05868 3.03170 2.99 0.0066
X3SQ 0.58444 0.12987 4.50 0.0002
X4X3 -1.15648 0.45574 -2.54 0.0184
DA -0.21369 0.06215 -3.44 0.0022
DC 0.16809 0.06371 2.64 0.0147
Figure 3.22  SAS PROC REG output for the Fresh demand model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DA + β6DC + ε
µ[d,a,B] − µ[d,a,A] = β5,   µ[d,a,C] − µ[d,a,A] = β6,   and   µ[d,a,C] − µ[d,a,B] = β6 − β5
(a) Use the least squares point estimates of the model parameters to find
a point estimate of each of the three differences in means. Also, find
a 95 percent confidence interval for and test the significance of each
of the first two differences in means.
(b) The prediction results at the bottom of the SAS output correspond
to a future period when the price difference will be x 4 = .20 , the
advertising expenditure x3 = 6.50, and campaign C will be used.
Show how ŷ = 8.5007 is calculated. Identify and interpret a 95
percent confidence interval for the mean demand and a 95 percent
y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DA + β6DC + ε
y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + β7x3DB + β8x3DC + ε
When there are many independent variables in a model, we
might not be able to trust the p-values to tell us what is import-
ant. This is because of a condition called multicollinearity, which
is discussed in Section 4.1. Note, however, that the p-value for
x3 DC is the smallest of the p-values for the independent variables
DB , DC , x3 DB , and x3 DC . This might be regarded as “some evidence”
that “some interaction” exists between advertising expenditure and
advertising campaign. To further investigate this interaction, note
that the model utilizing x3 DB and x3 DC implies that
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > │t│
Intercept Intercept 1 28.68734 5.12847 5.59 <.0001
X3 X3 1 -7.41146 1.66169 -4.46 0.0002
Figure 3.23  Partial SAS PROC REG output for the Fresh demand model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + β7x3DB + β8x3DC + ε
In Exercises 3.4 through 3.6 you will perform partial F tests by using the
following three Fresh detergent models:
Model 1: y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + ε
Model 2: y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + ε
Model 3: y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + β7x3DB + β8x3DC + ε
The values of SSE for models 1, 2, and 3 are, respectively, 1.0644, .3936,
and .3518.
Figure 3.24  Partial SAS PROC GLM output for the Fresh demand model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + ε
Exercise 3.7 Figure 3.24 presents a partial SAS PROC GLM output
obtained by using the model
y = β0 + β1 x 4 + β 2 x3 + β3 x32 + β 4 x 4 x3 + β5 DB + β 6 DC + ε
Exercise 3.9 Recall from Exercise 3.3 that we have used the Fresh deter-
gent demand model
y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + β7x3DB + β8x3DC + ε
(a) Using (X′X)⁻¹ and X′y, show how the least squares point estimates of the model parameters were calculated.
(b) The point prediction of demand in a future sales period in which the price difference will be x4 = .20, the advertising expenditure will be x3 = 6.50, and campaign C will be used is

ŷ = b0 + b1(.20) + b2(6.50) + b3(6.50)² + b4(.20)(6.50) + b5(0) + b6(1) + b7(6.50)(0) + b8(6.50)(1)
  = 8.5118
The SAS output also tells us that a 95 percent prediction interval for
demand for Fresh in this sales period is [8.2249, 8.7988]. What is
the row vector x 0′ that is used to calculate this prediction interval by
the formula [ŷ ± t[a/2] s √(1 + x0′(X′X)⁻¹x0)]?
(c) DIFF1, DIFF2, DIFF3, and DIFF4 on the SAS output are
DIFF1 = m[ d ,a ,C ] − m[ d ,a , A ] = b6 + b8 (6.2)
DIFF2 = m[ d ,a ,C ] − m[ d ,a , A ] = b6 + b8 (6.6)
DIFF3 = m[ d ,a ,C ] − m[ d ,a ,B ] = b6 − b5 + b8 (6.2) − b7 (6.2)
DIFF4 = m[ d ,a ,C ] − m[ d ,a ,B ] = b6 − b5 + b8 (6.6) − b7 (6.6)
(X′X)−1 = −108.8802964 −96.91280936 35.568798578 −2.890482718 14.502515125 0.0050715341 8.4814570462 −0.067312284 −1.279797485
−46.06948829 0.8689183154 11.401693141 −0.669108229 0.0050715341 31.892710178 21.543187306 −4.856450129 −3.282203044
−99.05615848 −57.76335611 28.239059064 −1.994271188 8.4814570462 21.543187306 41.6971799 −3.311305375 −6.410044917
7.1382273495 0.3095802122 −1.781312674 0.1055660411 −0.067313284 −4.856450129 −3.311305375 0.7448064027 0.5069898915
14.985137276 8.7293799345 −4.266453431 0.3003816013 −1.279797485 −3.282203044 −6.410044917 0.5069898915 0.9906624167
Figure 3.25  The matrix X and a partial SAS PROC GLM output using the model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + β5DB + β6DC + β7x3DB + β8x3DC + ε
λ̂ = b6 − b5 + b8(6.6) − b7(6.6)
  = −.93507 − (−.48068) + .20349(6.6) − .10722(6.6)
  = .18095

… the model (that is, s = √(SSE/(n − (k + 1)))), and m′(X′X)⁻¹m for … m′ for DIFF1, DIFF2, and DIFF3. Then, using the fact that m′(X′X)⁻¹m …
Exercise 3.11 Mendenhall and Sinicich (2011) present data that can
be used to investigate allegations of gender discrimination in the hiring
practices of a particular firm. Of the twenty-eight candidates who applied
for employment at the firm, nine were hired. The combinations of edu-
cation x1, (in years), experience x2, (in years), and gender x3 (a dummy
variable that equals 1 if the potential employee was a male and 0 if the
potential employee was a female) for the nine hired candidates were
(6, 6, 1), (6, 3, 1), (8, 3, 0), (8, 10, 0), (4, 5, 1), (6, 1, 1), (8, 5, 1),
(4, 10, 1), and (6, 12, 0). For the nineteen candidates that were not hired,
the combinations of values of x1 , x2 , and x3 were (6, 2, 0), (4, 0, 1), (4, 1, 0),
(4, 2, 1), (4, 4, 0), (6, 1, 0), (4, 2, 1), (8, 5, 0), (4, 2, 0), (6, 7, 0),
(6, 4, 0), (8, 0, 1), (4, 7, 0), (4, 1, 1), (4, 5, 0), (6, 0, 1), (4, 9, 0), (8, 1, 0),
and (6, 1, 0). If p( x1 , x2 , x3 ) denotes the probability of a poten-
tial employee being hired, and if we use the logistic regression model
p( x1 , x2 , x3 ) = e ( b0 + b1 x1 + b2 x2 + b3 x3 ) /[1 + e ( b0 + b1 x1 + b2 x2 + b3 x3 ) ] to analyze these
data, we find that the point estimates of the model parameters and their
associated p-values (given in parentheses) are
b0 = −14.2483 (.0191), b1 = 1.1549 (.0552), b2 = .9098 (.0341), and
b3 = 5.6037 (.0313).
Time is .758, which says that the Accts values increase as the Time values
increase. Such a relationship makes sense because it is logical that the lon-
ger a sales representative has been with the company the more accounts
he or she handles. Statisticians often regard multicollinearity in a dataset
to be severe if at least one simple correlation coefficient between the inde-
pendent variables is at least .9. Since the largest such simple correlation
coefficient in Figure 4.1 is .758, this is not true for the sales territory
performance data. Note, however, that even moderate multicollinearity
can be a potential problem. This will be demonstrated later using the sales
territory performance data.
Another way to measure multicollinearity is to use variance inflation
factors. Consider a regression model relating a dependent variable y to a
set of independent variables x1 ,...., x j −1 , x j , x j +1 ,..., xk . The variance infla-
tion factor for the independent variable x j in this set is denoted VIF j and
is defined by the equation
VIFj = 1/(1 − Rj²)

where Rj² is the multiple coefficient of determination for the regression model that relates xj to the other independent variables x1, …, xj−1, xj+1, …, xk.
output of the t-statistics, p-values, and variance inflation factors for the
sales territory performance model that relates y to all eight independent
variables. The largest variance inflation factor is VIF6 = 5.639. To calculate
VIF6 , SAS first calculates the multiple coefficient of determination for
the regression model that relates x6 to x1 , x2 , x3 , x 4 , x5 , x7 , and x8 to
be R62 = .822673. It then follows that
VIF6 = 1/(1 − R6²) = 1/(1 − .822673) = 5.639
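Variance inflation factors can be computed by regressing each independent variable on the others, exactly as in the definition above. The Python sketch below is illustrative only; variance_inflation_factors is our own helper name, and X is assumed to hold the columns of the k independent variables.

import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2 = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# usage: variance_inflation_factors(X), where X is the n-by-k matrix of independent variables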
xj is not related to the other independent variables x1, …, xj−1, xj+1, …, xk through a multiple regression model that relates xj to x1, …, xj−1, xj+1, …, xk, then the variance inflation factor VIFj = 1/(1 − Rj²) equals 1. In this case σ²bj = σ²/SSxjxj. If Rj² > 0, xj is related to the other independent variables. This implies that 1 − Rj² is less than 1, and VIFj = 1/(1 − Rj²) is greater than 1. Therefore, σ²bj = σ²(VIFj)/SSxjxj is inflated beyond the value of σ²bj when Rj² = 0. Usually, the multicollinearity between independent
variables is considered (1) severe if the largest variance inflation factor is
greater than 10 and (2) moderately strong if the largest variance inflation
factor is greater than five. Moreover, if the mean of the variance inflation
factors is substantially greater than one (sometimes a difficult criterion
y = b0 + b1 x 1 + b2 x2 + e
adequately describes this data, then calculating the least squares point esti-
mates is like fitting a plane to the points on the top of the picket fence.
Clearly, this plane would be quite unstable. That is, a slightly different height
of one of the pickets (a slightly different y value) could cause the slant of the
fitted plane (and the least squares point estimates that determine this slant)
to radically change. It follows that when strong multicollinearity exists, sam-
pling variation can result in least squares point estimates that differ substan-
tially from the true values of the regression parameters. In fact, some of the
least squares point estimates may have a sign (positive or negative) that dif-
fers from the sign of the true value of the parameter (we will see an example
of this in the exercises). Therefore, when strong multicollinearity exists, it is
dangerous to individually interpret the least squares point estimates.
.0025, .0001, and .0530. Note that Time (p-value = .0065) seems highly
significant and Change (p-value = .0530) seems somewhat significant in
the five-independent-variable model. However, when we consider the
model that uses all eight independent variables, Time (p-value = .3134)
seems insignificant and Change (p-value = .1390) seems somewhat insig-
nificant. The reason that Time and Change seem more significant in the
model with five independent variables is that since this model uses fewer
variables, Time and Change contribute less overlapping information and
thus have additional importance in this model.
s = √(SSE/(n − (k + 1)))

[ŷ ± t[a/2] s √(1 + Distance value)]
Example 4.1
Vars   R-Sq   R-Sq(adj)   Mallows C-P      S     (X columns, left to right: Time, MktPoten, Adver, MktShare, Change, Accts, WkLoad, Rating)
1 56.8 55.0 67.6 881.09 X
1 38.8 36.1 104.6 1049.3 X
2 77.5 75.5 27.2 650.39 X X
2 74.6 72.3 33.1 691.11 X X
3 84.9 82.7 14.0 545.51 X X X
3 82.8 80.3 18.4 582.64 X X X
4 90.0 88.1 5.4 453.84 X X X X
4 89.6 87.5 6.4 463.95 X X X X
5 91.5 89.3 4.4 430.23 X X X X X
5 91.2 88.9 5.0 436.75 X X X X X
6 92.0 89.4 5.4 428.00 X X X X X X
6 91.6 88.9 6.1 438.20 X X X X X X
7 92.2 89.0 7.0 435.67 X X X X X X X
7 92.0 88.8 7.3 440.30 X X X X X X X
8 92.2 88.3 9.0 449.03 X X X X X X X X
s² = SSE/(n − (k + 1))
the mean yi value for the k independent variable model. If the k independent variable model has been misspecified and the true model describing yi uses perhaps more independent variables, so that the true mean yi value is µyi(True), we would want to consider the expected value of the squared deviation (ŷi − µyi(True))².
This expected value, which is called the mean squared error of the fitted value ŷi, can be shown to equal

[µŷi − µyi(True)]² + σ²ŷi

where [µŷi − µyi(True)]² represents the squared bias of the k independent variable model and σ²ŷi is the variance of ŷi for the k independent variable model. The total mean squared error for all n fitted ŷi values is the sum of the n individual mean squared errors

Σᵢ₌₁ⁿ [µŷi − µyi(True)]² + Σᵢ₌₁ⁿ σ²ŷi

Dividing by σ², define

Γ = (1/σ²) { Σᵢ₌₁ⁿ [µŷi − µyi(True)]² + Σᵢ₌₁ⁿ σ²ŷi }

Because the variance of ŷi is σ²ŷi = σ² xi′(X′X)⁻¹xi, we have

Σᵢ₌₁ⁿ σ²ŷi = σ² Σᵢ₌₁ⁿ xi′(X′X)⁻¹xi
Here, it can be proven that Σᵢ₌₁ⁿ xi′(X′X)⁻¹xi = (k + 1) for a model that uses k independent variables. It can also be proven that if SSE denotes the unexplained variation for the model using k independent variables, then
µSSE = Σᵢ₌₁ⁿ [µŷi − µyi(True)]² + [n − (k + 1)]σ²

so that

Σᵢ₌₁ⁿ [µŷi − µyi(True)]² = µSSE − [n − (k + 1)]σ²

Therefore,

Γ = (1/σ²) { µSSE − [n − (k + 1)]σ² + (k + 1)σ² } = µSSE/σ² − [n − 2(k + 1)]
If we estimate µSSE by SSE, the unexplained variation for the model using k independent variables, and if we estimate σ² by sp², the mean square error for the model using all p potential independent variables, then the estimate of Γ for the model using k independent variables is called the C statistic and is defined by the equation:

C = SSE/sp² − [n − 2(k + 1)]
For example, consider the sales territory performance case. It can be ver-
ified that the mean square error for the model using all p = 8 indepen-
dent variables is 201,621.21 and that the SSE for the model using the
first k = 5 independent variables (Model 2 in the previous example) is
3,516,812.7933. It follows that the C statistic for this latter model is

C = 3,516,812.7933/201,621.21 − [25 − 2(5 + 1)] = 17.44 − 13 ≈ 4.4
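A one-line Python check of this arithmetic (illustrative only; mallows_c is our own helper name):

def mallows_c(sse_k, s2_p, n, k):
    """Mallows' C statistic: C = SSE/s_p^2 - [n - 2(k + 1)]."""
    return sse_k / s2_p - (n - 2 * (k + 1))

# model with the first k = 5 independent variables in the sales territory data
print(round(mallows_c(sse_k=3516812.7933, s2_p=201621.21, n=25, k=5), 2))   # about 4.44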
Since the C-statistic for a given model is a function of the model’s SSE ,
and since we want SSE to be small, we want C to be small. Although
adding an unimportant independent variable to a regression model will
decrease SSE , adding such a variable can increase C . This can happen
when the decrease in SSE caused by the addition of the extra independent
variable is not enough to offset the decrease in n − 2(k + 1) caused by the increase in k when the extra variable is added.
¹ The fact that C = p + 1 for the model using all p potential independent variables is not a recommendation for choosing this model as the best model but a consequence of estimating σ² by sp², which means that we are assuming that this model has no bias.
Accts 21.7 19.0 15.6 9.2 Time 2.0 2.0 2.3 3.6 3.8
T-Value 5.50 6.41 5.19 3.22 T-Value 1.04 1.10 1.34 3.06 3.01
P-Value 0.000 0.000 0.000 0.004 P-Value 0.313 0.287 0.198 0.006 0.007
Adver 0.227 0.216 0.175 MktPoten 0.0372 0.0373 0.0383 0.0421 0.0444
T-Value 4.50 4.77 4.74 T-Value 4.54 4.75 5.07 6.25 6.20
P-Value 0.000 0.000 0.000 P-Value 0.000 0.000 0.000 0.000 0.000
WkLoad 20 20
T-Value 0.59 0.61
P-Value 0.565 0.550
Rating 8
T-Value 0.06
P-Value 0.950
Figure 4.5 MINITAB iterative procedures for the sales territory performance problem
v ariables can improve a regression model. In Figure 4.6a we present the five
squared variables and the ten (pairwise) interaction variables that can be
formed using Time, MktPoten, Adver, MktShare, and Change. Consider
having MINITAB evaluate all possible models involving these squared and
interaction variables, where the five linear variables are included in each
possible model. If we have MINITAB do this and find the best model of
each size in terms of s, we obtain the output in Figure 4.6b. (Note that
we do not include values of the C statistic on the output because it can be
shown that this statistic can give misleading results when using squared
and interaction variables). Examining the output, we see that the model
that uses 12 squared and interaction variables (or a total of 17 variables,
including the 5 linear variables) has the smallest s (174.6) of any model. If
we desire a somewhat simpler model, note that s does not increase substan-
tially until we move from a model having seven squared and interaction
variables to a model having six such variables. Moreover, we might sub-
jectively conclude that the s of 210.70 for the model using seven squared
and interaction variables is not that much larger than the s of 174.6 for the
model using 12 squared and interaction variables. In addition, if we fit the
model having seven squared and interaction variables to the sales territory
performance data, it can be verified that the p-value for each and every
independent variable in this model is less than .05. Therefore, we might
subjectively conclude that this model represents a good balance between
having a small s, having small p-values, and being simple (having fewer
independent variables). Finally, note that the s of 210.70 for this model
is considerably smaller than the s of 430.23 for the model using only lin-
ear independent variables (see Table 2.5c). This smaller s yields shorter
95 percent prediction intervals, and thus more precise predictions for eval-
uating the performance of questionable sales representatives. For exam-
ple, consider the questionable sales representative discussed in Example
2.5. The 95 percent prediction interval for the sales of this representative
given by the model using only linear variables is [3234, 5130] (see Obs 26
in Table 2.5c), whereas the 95 percent prediction interval for the sales
of this representative given by the seven squared and interaction variable
model in Figure 4.6b is much shorter—[3979.4, 5007.8] (see Obs 26 in
Figure 4.6c).
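To make this all-possible-models search concrete, here is a minimal sketch (in Python rather than the MINITAB procedure used in the text); the file name and column names are assumptions, and the code simply illustrates searching for the smallest s at each model size:

import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

terr = pd.read_csv("territory.csv")            # hypothetical file holding the sales territory data
linear = ["Time", "MktPoten", "Adver", "MktShare", "Change"]   # assumed column names

# form the 5 squared and 10 pairwise interaction variables
extra = {"SQ_" + v: terr[v] ** 2 for v in linear}
for v1, v2 in itertools.combinations(linear, 2):
    extra[v1 + "_x_" + v2] = terr[v1] * terr[v2]
extra = pd.DataFrame(extra)

best = {}                                      # smallest s found for each number of extra variables
for r in range(len(extra.columns) + 1):
    for combo in itertools.combinations(extra.columns, r):
        X = sm.add_constant(pd.concat([terr[linear], extra[list(combo)]], axis=1))
        s = np.sqrt(sm.OLS(terr["Sales"], X).fit().mse_resid)   # residual standard error s
        if r not in best or s < best[r][0]:
            best[r] = (s, combo)

for r, (s, combo) in sorted(best.items()):
    print(r, round(s, 2), combo)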
(a) The five squared variables and the ten (pairwise) interaction variables

SQT = TIME*TIME
SQMP = MKTPOTEN*MKTPOTEN
SQA = ADVER*ADVER
SQMS = MKTSHARE*MKTSHARE
SQC = CHANGE*CHANGE
TMP = TIME*MKTPOTEN
TA = TIME*ADVER
TMS = TIME*MKTSHARE
TC = TIME*CHANGE
MPA = MKTPOTEN*ADVER
MPMS = MKTPOTEN*MKTSHARE
MPC = MKTPOTEN*CHANGE
AMS = ADVER*MKTSHARE
AC = ADVER*CHANGE
MSC = MKTSHARE*CHANGE
(b) MINITAB comparisons of the best model of each size (all models include the 5 linear variables; the columns of X's in the original output, which indicate the particular squared and interaction variables entering each model, are not reproduced)

Total Vars   Squared and interaction Vars   R-Sq   R-Sq(adj)   S
6            1                              94.2   92.2        365.87
7            2                              95.8   94.1        318.19
8            3                              96.5   94.7        301.61
9            4                              97.0   95.3        285.53
10           5                              97.5   95.7        272.05
11           6                              98.1   96.5        244.00
12           7                              98.7   97.4        210.70
13           8                              99.0   97.8        193.95
14           9                              99.2   98.0        185.44
15           10                             99.3   98.2        175.70
16           11                             99.4   98.2        177.09
17           12                             99.5   98.2        174.60
18           13                             99.5   98.1        183.22
19           14                             99.6   97.9        189.77
20           15                             99.6   97.4        210.78
(c) Predicted sales performance using the seven squared and interaction variable model

Obs   Dep Var SALES   Predicted Value   Std Err Predict   Lower 95% Mean   Upper 95% Mean   Lower 95% Predict   Upper 95% Predict
26    .               4493.6            106.306           4262.0           4725.2           3979.4              5007.8
Figure 4.6 Sales territory performance model building using squared and interaction variables
y = β0 + β1x1 + β2x2 + ... + βkxk + ε

are to be valid. The first three regression assumptions say that, at any
given combination of values of the independent variables x1, x2, ..., xk, the
population of error terms that could potentially occur has a mean of zero, a constant variance, and a normal distribution.
The fourth regression assumption says that any one value of the error
term is statistically independent of any other value of the error term. To
assess whether the regression assumptions hold in a particular situation,
note that the regression model implies that the error term ε is given by
the equation ε = y − (β0 + β1x1 + β2x2 + ... + βkxk). The point estimate of
this error term is the residual

e = y − ŷ = y − (b0 + b1x1 + b2x2 + ... + bkxk)

where ŷ = b0 + b1x1 + b2x2 + ... + bkxk is the predicted value of the depen-
dent variable y . Therefore, since the n residuals are the point estimates
of the n error terms in the regression analysis, we can use the residuals to
check the validity of the regression assumptions about the error terms.
One useful way to analyze residuals is to plot them versus various criteria.
The resulting plots are called residual plots. To construct a residual plot, we
compute the residual for each observed y value. The calculated residuals
are then plotted versus some criterion. To validate the regression assump-
tions, we make residual plots against (1) values of each of the independent
variables x1, x2, ..., xk; (2) values of ŷ, the predicted value of the dependent
variable; and (3) the time order in which the data have been observed (if
the regression data are time series data).
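As a brief illustration of constructing such residual plots (a Python sketch, not the MINITAB output discussed in the example that follows; the file and column names are assumptions):

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

qhic = pd.read_csv("qhic.csv")                 # hypothetical file with columns Value and Upkeep
fit = sm.OLS(qhic["Upkeep"], sm.add_constant(qhic[["Value"]])).fit()

residuals = fit.resid                          # the residuals are point estimates of the error terms
fitted = fit.fittedvalues                      # the predicted values y-hat

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(qhic["Value"], residuals)      # residual plot versus the independent variable
axes[0].set_xlabel("Value")
axes[0].set_ylabel("Residual")
axes[1].scatter(fitted, residuals)             # residual plot versus the predicted value
axes[1].set_xlabel("Fitted value")
axes[1].set_ylabel("Residual")
plt.show()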
Example 4.2
Figure 4.7 The QHIC data and residuals, and a scatter plot of upkeep expenditure (vertical axis) versus home value (horizontal axis)
For instance, for the first home, when y = 1,412.08 and x = 237.00, the
residual is y − ŷ = 1,412.08 − 1,371.816 = 40.264.
The MINITAB output in Figure 4.8a and 4.8b gives plots of the residuals
for the QHIC simple linear regression model against values of x (value)
and ŷ (predicted upkeep). To understand how these plots are constructed,
recall that for the first home y = 1,412.08, x = 237.00, ŷ = 1,371.816,
and the residual is 40.264. It follows that the point plotted in Figure 4.8a
corresponding to the first home has a horizontal axis coordinate of the x
value 237.00 and a vertical axis coordinate of the residual 40.264. It also
follows that the point plotted in Figure 4.8b corresponding to the first
home has a horizontal axis coordinate of the ŷ value 1,371.816, and a ver-
tical axis coordinate of the residual 40.264. Finally, note that the QHIC
data are cross-sectional data, not time series data. Therefore, we cannot
make a residual plot versus time.
Figure 4.8 Residual analysis for QHIC data models: (a) simple linear regression model residual plot versus x (value); (b) simple linear regression model residual plot versus ŷ (predicted upkeep); (c) simple linear regression model normal plot; (d) quadratic regression model residual plot versus x (value)
The curved appearance of the residual plots in Figures 4.8a and 4.8b indicates that the simple linear regression model does not adequately describe the relationship between y and x. One remedy for the simple linear regression model's
violation of the correct functional form assumption is to fit the quadratic
regression model y = β0 + β1x + β2x² + ε to the QHIC data. When we do
this and plot the model’s residuals versus x (value), we obtain the residual
plot in Figure 4.8d. The fact that this residual plot does not have any
curved appearance implies that the quadratic regression model has rem-
edied the violation of the correct functional form assumption. However,
note that the residuals fan out as x increases, indicating that the constant
variance assumption is still being violated.
If we generalize the above ideas to the multiple linear regression model,
we can say that if a residual plot against a particular independent variable
xj or against the predicted value ŷ of the dependent variable has a curved
appearance, then this indicates a violation of regression assumption 1 and
says that the multiple linear regression model does not have the correct
functional form. Specifically, the multiple linear regression model may
need additional squared or interaction variables, or both. To give an illus-
tration of using residual plots in multiple linear regression, consider the
sales territory performance data in Table 2.5a and recall that Table 2.5c
gives the SAS output of a regression analysis of these data using the model
y = β0 + β1 x1 + β 2 x2 + β3 x3 + β 4 x 4 + β5 x5 + ε
The least squares point estimates on the output give the prediction
equation
ŷ = −1113.7879 + 3.6121x1 + .0421x2 + .1289x3 + 256.9555x4 + 324.5335x5
(Residual plots for this model: residuals versus Advertising, residuals versus the predicted value, and a normal plot of the residuals.)
In general, it is best to remedy violations of the constant variance assumption
and incorrect functional form before making any final conclusions about
the normality assumption.
If the variances σ 12 ,σ 22 ,...,σ n2 of the error terms are unequal and known,
then the variances can be equalized by using the transformed model
yi/σi = β0(1/σi) + β1(xi1/σi) + β2(xi2/σi) + ... + βk(xik/σi) + ηi
SSE* = ∑_{i=1}^{n}(yi/si − ŷi/si)² = ∑_{i=1}^{n}(1/si)²[yi − ŷi]² = ∑_{i=1}^{n}(1/si)²[yi − (b0 + b1xi1 + b2xi2 + ... + bkxik)]²

yi = β0 + β1xi1 + β2xi2 + ... + βkxik + εi

SSEW = ∑_{i=1}^{n} wi[yi − {b0(w) + b1(w)xi1 + b2(w)xi2 + ... + bk(w)xik}]²
With respect to (2), statisticians have shown that the formula for the
weighted least squares point estimates is
[b0(w), b1(w), b2(w), ..., bk(w)]′ = (X′WX)^(−1)X′Wy

Here, y and X are defined in Section 2.2 for the original, untransformed
model, and W is the diagonal matrix with diagonal elements w1, w2, ..., wn:

W = diag(w1, w2, ..., wn)
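A minimal numerical sketch of this weighted least squares formula (generic Python; the X matrix, y vector, and weights below are made-up illustrations, not data from the text):

import numpy as np

def weighted_least_squares(X, y, w):
    # b(w) = (X'WX)^(-1) X'Wy, where W = diag(w1, ..., wn)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 7.0], [1.0, 9.0]])   # first column of ones for the intercept
y = np.array([3.1, 5.9, 10.2, 13.1])
w = np.array([1.0, 0.8, 0.5, 0.4])                               # e.g., w_i = (1/pabe_i)^2
print(weighted_least_squares(X, y, w))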
Step 2: Plot the residuals from the fitted regression model against each
independent variable. If the residual plot against increasing values of the
independent variable x j fans out, plot the absolute values of the residuals
versus the xij values. If the plot shows a straight line relationship, fit the
simple linear regression model |ei| = β0′ + β1′xij + εi′ to the absolute values
of the residuals and predict the absolute value of the ith residual to be
pabei = b0′ + b1′xij.
Step 3: Use pabei as the point estimate of si and use ordinary least squares
to fit the transformed model
yi/pabei = β0(1/pabei) + β1(xi1/pabei) + β2(xi2/pabei) + ... + βk(xik/pabei) + ηi

or, equivalently, use weighted least squares to fit the original, untrans-
formed model, where wi = (1/pabei)².
Note that if in step 2 the plot of the absolute values of the residuals
versus the xij values did not have a straight line appearance, but a plot of
the squared residuals versus the xij values did have a straight line appear-
ance, we would fit the simple linear regression model ei2 = β0′ + β1′ xij + ε i
to the squared residuals and predict the squared value of the ith residual
to be psqei = b0′ + b1′ xij . In this case we estimate σ i2 by psqei and si by
psqei , which implies that we should specify a transformed regression
model by dividing all terms in the original regression model by psqei .
Alternatively, we can fit the original regression model using weighted least
squares where wi = 1 / psqei .
For example, recall that Figure 4.8d shows that when we fit the qua-
dratic regression model y = b0 + b1 x + b2 x 2 + e to the QHIC data, the
model’s residuals fan out as x increases. A plot of the absolute values of the
model’s residuals versus the x values can be verified to have a straight line
appearance. Figure 4.11 shows that when we use the simple linear regres-
sion model to relate the model’s absolute residuals to x, we obtain the
equation pabei = 22.23055 + .49067 xi for predicting the absolute values
of the model’s residuals. For example, because the value x of the first home
in Figure 4.7 is 237, the prediction of the absolute value of the quadratic
model's residual for home 1 is pabe1 = 22.23055 + .49067(237) = 138.519.
This and the other predicted absolute residuals are shown in Figure 4.11.
Figures 4.12 and 4.13 are the partial SAS outputs that are obtained if we
use ordinary least squares to fit the transformed model
yi/pabei = β0(1/pabei) + β1(xi/pabei) + β2(xi²/pabei) + ηi
Figure 4.12 Partial SAS output when using ordinary least squares to fit the transformed model yi/pabei = β0(1/pabei) + β1(xi/pabei) + β2(xi²/pabei) + ηi
Figure 4.13 Partial SAS output when using weighted least squares to fit the original model yi = β0 + β1xi + β2xi² + εi, where wi = (1/pabei)²
For a home worth $220,000 (x0 = 220), the predicted absolute residual is pabe = 22.23055 + .49067(220) = 130.178, and

ŷ0/130.178 = b0(1/130.178) + b1(220/130.178) + b2((220)²/130.178)
           = −41.63220(1/130.178) + 3.23363(220/130.178) + .01178((220)²/130.178)
           = 9.5252

Figure 4.12 shows that ŷ0/130.178 = 9.5252. It follows that ŷ0 = 9.5252(130.178) = 1240, which is shown in Figure 4.13.
Because the point prediction ŷ0 = $1240 of the home's yearly upkeep
expenditure is at least $500, QHIC will send the home an advertising
brochure. Figure 4.12 also shows that a 95 percent confidence interval
for μ0/130.178 is [9.0045, 10.0459]. It follows that a 95 percent
confidence interval for μ0 is [9.0045(130.178), 10.0459(130.178)] = [$1172, $1308],
which is shown on the weighted least squares output
in Figure 4.13. Because this interval says that QHIC is 95 percent confi-
dent that μ0 is at least $1172, QHIC is more than 95 percent confident
that μ0 is at least $1000. Therefore, a home with a value of $220,000 will
also be sent the special, more elaborate advertising brochure.
(Figure 4.14: scatterplot of the square roots of upkeep versus value, and scatterplot of the quartic roots of upkeep versus value.)
Examining these plots, we conclude that the square root transformation best equalizes the error variance and straightens out
the curved data plot in Figure 4.7. Note that the natural logarithm trans-
formation seems to overtransform the data—the error variance tends to
decrease as the home value increases and the data plot seems to bend
down. The plot of the quartic roots indicates that the quartic root trans-
formation also seems to overtransform the data (but not by as much as
the logarithmic transformation). In general, as the fractional power gets
smaller, the transformation gets stronger. Different fractional powers are
best in different situations.
Because the plot in Figure 4.14 of the square roots of the upkeep
expenditures versus the home values has a straight-line appearance, we
consider the model y* = β0 + β1x + ε, where y* = y^.5. If we fit this model
to the QHIC data, we find that the least squares point estimates of b0 and
b1 are b0 = 7.201 and b1 = .127047. Moreover, a plot of the transformed
model’s residuals versus x has a horizontal band appearance. Consider a
home worth $220,000. Using the least squares point estimates, a point
prediction of y* for such a home is ŷ* = 7.201 + .127047(220) = 35.151.
This point prediction is given on the MINITAB output in Figure 4.15,
as is the 95 percent prediction interval for y ∗, which is [30.348, 39.954].
It follows that a point prediction of the upkeep expenditure for a home
worth $220,000 is (35.151)2 = $1,235.59 and that a 95 percent prediction
interval for this upkeep expenditure is [(30.348)2, (39.954)2] = [$921.00,
$1596.32]. Recall that QHIC will send an advertising brochure to any
home that has a predicted upkeep expenditure of at least $500. It follows
that a home worth $220,000 will be sent an advertising brochure. This
is because the predicted yearly upkeep expenditure for such a home is (as
just calculated) $1,235.59. Also, recall that QHIC will send a special,
more elaborate advertising brochure to a home if its value makes QHIC
95 percent confident that μ0, the mean yearly upkeep expenditure for
all homes having this value, is at least $1000. We were able to find a 95
percent confidence interval for μ0 using the transformed quadratic regres-
sion model of the previous subsection. However, although Figure 4.15
gives a 95 percent confidence interval for the mean of the square roots
of the upkeep expenditures, the mean of these square roots is not equal
to μ0, and thus we cannot square both ends of the confidence interval
in Figure 4.15 to find a 95 percent confidence interval for μ0. This is a
disadvantage of using the square root transformation in this situation.
For each y value we compute its deviation from the appropriate set mean of y values, square these deviations, and sum the squared deviations. The appropriate set mean of y values for a particular y value is the mean of all of the y values that correspond to
particular y value is the mean of all of the y values that correspond to
the same x value as does the particular y value. For the light data, the
optical readings corresponding to the x values 0 and 0 are 2.86 and 2.64,
which have a set mean of (2.86 + 2.64 ) / 2 = 2.75 and associated devia-
tions of 2.86 − 2.75 = .11 and 2.64 − 2.75 = − .11. The optical readings
corresponding to the x values 1 and 1 are 1.57 and 1.24, which have a
set mean of 1.405 and associated deviations of 1.57 − 1.405 = .165 and
1.24 − 1.405 = −.165. The optical readings corresponding to the x values
2 and 2 are .45 and 1.02, which have a set mean of .735 and associated
deviations of -.285 and .285. The optical readings corresponding to the
x values 3 and 3 are .65 and .18, which have a set mean of .415 and asso-
ciated deviations of .235 and -.235. The optical readings corresponding
to the x values 4 and 4 are .15 and .01, which have a set mean of .08 and
associated deviations of .07 and -.07. The optical readings corresponding
to the x values 5 and 5 are .04 and .36, which have a set mean of .20 and
associated deviations of -.16 and .16. The sum of squares due to pure
error for the light data, SSPE , is the sum of the squares of the 12 deviations
that we have calculated and equals .4126. Also, if we fit the simple linear
regression model to the data, we find that SSE , the sum of squared resid-
uals, is 2.3050. In general to perform a lack of fit test, we let the symbol
m denote the number of distinct x values for which there is at least one y
value ( m = 6 for the light data), and we let n denote the total number of
observations (n = 12 for the light data). We then calculate the following
lack of fit statistic, the value of which we show for the light data:
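A sketch of the standard lack of fit F statistic for simple linear regression (stated here as an assumption about the intended display) and its value for the light data:

F = [(SSE − SSPE)/(m − 2)] / [SSPE/(n − m)] = [(2.3050 − .4126)/(6 − 2)] / [.4126/(12 − 6)] = .4731/.0688 ≈ 6.88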
(Figure 4.16: upper plot, a scatterplot of y versus x for the light data; lower plot, a scatterplot of the natural logarithm of y versus x.)
Note that we use the special symbols b0′ , b1′ , and e ′ to represent the y-
intercept, slope, and the error term in the simple linear regression model
ln y = b0′ + b1′x + e ′ because, although this model is not appropriate, it
can lead us to find an appropriate model. The reason is that the model
ln y = b0′ + b1′x + e ′ is equivalent to the model
y = e^(β0′ + β1′x + ε′) = (e^(β0′))(e^(β1′x))(e^(ε′)) = β2e^(−β3x)η
Whereas the expression β0′ + β1′x models the straight line decreasing pattern in the natural logarithms of the y's,
the expression β2e^(−β3x) measures the curvilinear (or exponential) decreasing
pattern in the y's themselves (see the upper plot in Figure 4.16). However,
the error term η = e^(ε′) is multiplied by the expression β2e^(−β3x) in the model
y = β2e^(−β3x)η. Therefore, this model incorrectly assumes that as x increases
and thus β2e^(−β3x) decreases, the variation in the y's themselves decreases.
To model the fact that as x increases and thus β2e^(−β3x) decreases, the varia-
tion of the y's stays constant (as we can see is true from the upper plot in
Figure 4.16), we can change the multiplicative error term η = e^(ε′) to an additive
error term ε. In addition, although the upper plot in Figure 4.16 implies
that the mean amount of transmitted light μy|x might be approaching zero
as x increases, we will add an additional parameter β1 into the final model
to allow the possibility that μy|x might be approaching a nonzero value β1
as x increases. This gives us the final model

y = β1 + β2e^(−β3x) + ε
The final model is not linear in the parameters β1 , β 2 , and β3, and nei-
ther is the previously discussed similar model y = β2e^(−β3x)η. However, by
taking natural logarithms, the model y = β2e^(−β3x)η can be linearized to
the previously discussed logarithmic model as follows:

ln y = ln(β2e^(−β3x)η) = ln β2 + ln e^(−β3x) + ln(e^(ε′)) = ln β2 − β3x + ε′ = β0′ + β1′x + ε′
where β0′ = ln β2 and β1′ = −β3. If we fit this simple linear regres-
sion model to the natural logarithms of the transmitted light values, we
find that the least squares point estimates of β0′ and β1′ are b0′ = 1.02
and b1′ = −.7740. Considering the models ln y = β0′ + β1′x + ε′ and
y = β2e^(−β3x)η, since β0′ = ln β2, it follows that β2 = e^(β0′), and thus a point estimate of β2 is e^(1.02) ≈ 2.77.
(Figure 4.17a: SAS PROC NLIN iteration history, listing the estimates of beta1, beta2, and beta3 and the sum of squares at each iteration.)
Figure 4.17b shows that the final estimates obtained are
b1 = .0288, b2 = 2.7233, and b3 = .6828. Because the approximate 95 per-
cent confidence intervals for β 2 and β3 do not contain zero, we have strong
evidence that β 2 and β3 are significant in the model. Because the 95 percent
confidence interval for β1 does contain zero, we do not have strong evidence
that β1 is significant in the model. However, we will arbitrarily leave β1 in
the model and form the prediction equation ŷ = .0288 + 2.7233e^(−.6828x).
A practical use of this equation would be to pass a beam of light through a
solution of the chemical that has an unknown chemical concentration x,
make an optical reading (call it y ∗) of the amount of transmitted light,
set y ∗ equal to .0288 + 2.7233 e −.6828 x , and solve for the chemical concen-
tration x.
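For instance, solving y* = .0288 + 2.7233e^(−.6828x) for x by simple algebra (not a separate result from the text) gives

x = −(1/.6828) ln[(y* − .0288)/2.7233]

so that, for example, an optical reading of y* = 1.0 would correspond to an estimated concentration of x = −(1/.6828) ln(.9712/2.7233) ≈ 1.51.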
Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1     501  488  504  578  545  632  728  725  585  542  480  530
2     518  489  528  599  572  659  739  758  602  587  497  558
3     555  523  532  623  598  683  774  780  609  604  531  592
4     578  543  565  648  615  697  785  830  645  643  551  606
5     585  553  576  665  656  720  826  838  652  661  584  644
6     623  553  599  657  680  759  878  881  705  684  577  656
7     645  593  617  686  679  773  906  934  713  710  600  676
8     645  602  601  709  706  817  930  983  745  735  620  698
9     665  626  649  740  729  824  937  994  781  759  643  728
10    691  649  656  735  748  837  995  1040 809  793  692  763
11    723  655  658  761  768  885  1067 1038 812  790  692  782
12    758  709  715  788  794  893  1046 1075 812  822  714  802
13    748  731  748  827  788  937  1076 1125 840  864  717  813
14    811  732  745  844  833  935  1110 1124 868  860  762  877

Figure 4.18 Hotel room averages and a time series plot of the hotel room averages
The expression ( β0 + β1t ) models the linear trend evident in the middle
plot of Figure 4.19. Furthermore, M1, M2, ..., M11 are seasonal dummy
variables defined for months January (month 1) through November
(month 11). For example, M1 equals 1 if a monthly room average was
observed in January, and 0 otherwise; M 2 equals 1 if a monthly room
average was observed in February, and 0 otherwise. Note that we have
Figure 4.19 Time series plots of the square roots, quartic roots, and natural logarithms of the hotel room averages
not defined a dummy variable for December (month 12). It follows that
the regression parameters β M 1 , β M 2 ,…, β M 11 compare January through
November with December. Intuitively, for example, βM1 is the difference,
excluding trend, between the level of the time series (yt^.25) in January
and the level of the time series in December. A positive βM1 would
imply that, excluding trend, the value of the time series in January can
be expected to be greater than the value in December. A negative β M1
would imply that, excluding trend, the value of the time series in January
can be expected to be smaller than the value in December. In general,
a trend component such as β1t and seasonal dummy variables such as
M1 , M 2 ,…, M11 are called time series variables, whereas an independent
variable (such as Traveler’s Rest monthly advertising expenditure) that
might have a cause and effect relationship with the dependent variable
(monthly hotel room average) is called a causal variable. We should use
whatever time series variables and causal variables that we think might
significantly affect the dependent variable when analyzing time series
data. As another example, if we plot the demands for Fresh detergent in
Table 3.2 versus time (or the sales period number), there is a clear lack of
any trend or seasonal patterns. Therefore, it does not seem necessary to
add any time series variables into the previously discussed Fresh demand
model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + ε. Further verifying this
conclusion is Figure 4.20, which shows that a plot of the model’s residuals
versus time has no trend or seasonal patterns.
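As a small illustration of setting up a trend variable and seasonal dummy variables (a Python sketch, not the SAS program given later in Figure 4.32; the file and column names are assumptions, and autocorrelation in the errors is ignored here):

import numpy as np
import pandas as pd
import statsmodels.api as sm

room = pd.read_csv("hotel.csv")                 # hypothetical file with columns y and month (1 to 12)
room["t"] = np.arange(1, len(room) + 1)         # linear trend variable
for m in range(1, 12):                          # dummy variables M1, ..., M11; December is the base month
    room[f"M{m}"] = (room["month"] == m).astype(int)

X = sm.add_constant(room[["t"] + [f"M{m}" for m in range(1, 12)]])
fit = sm.OLS(room["y"] ** 0.25, X).fit()        # quartic root room average model
print(fit.params)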
Figure 4.20 Residual plot versus time for the Fresh detergent model y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + ε
Even when we think we have done our best to include the important
time series and causal variables in a regression model describing a dependent
variable that has been observed over time, the time-ordered error terms in
the regression model can still be autocorrelated. Intuitively, we say that error
terms occurring over time have positive autocorrelation when positive error
terms tend to be followed over time by positive error terms and when nega-
tive error terms tend to be followed over time by negative error terms. Pos-
itive autocorrelation in the error terms is depicted in Figure 4.21, which
illustrates that positive autocorrelation can produce a cyclical error term pattern
over time. Because the residuals are point estimates of the error terms, if a plot
of the residuals versus the data’s time sequence has a cyclical appearance, we
have evidence that the error terms are positively autocorrelated and thus that
the independence assumption is violated. Another type of autocorrelation
that sometimes exists is negative autocorrelation, where positive error terms
tend to be followed over time by negative error terms and negative error terms
tend to be followed over time by positive error terms. Negative autocorrela-
tion can produce an alternating error term pattern over time (see Figure 4.22)
and is suggested by an alternating pattern in a plot of the time ordered-re-
siduals. Both positive and negative autocorrelation can be caused by leaving
important independent variables out of a regression model. For example,
(Figure 4.21: error terms plotted against time, showing the cyclical pattern produced by positive autocorrelation. Figure 4.22: error terms plotted against time, showing the alternating pattern produced by negative autocorrelation.)
Consider testing the null hypothesis H0 that the error terms are not autocorrelated versus the alternative hypothesis Ha that the error terms are positively autocorrelated, and let e1, e2, ..., en be the time-ordered residuals. The Durbin–Watson statistic is

d = ∑_{t=2}^{n}(et − et−1)² / ∑_{t=1}^{n}et²

Durbin and Watson have shown that, based on setting the probability of a Type I error equal to α, there are points dL,α and dU,α such that:
1. If d < d L ,a , we reject H 0.
2. If d > dU ,a , we do not reject H 0.
3. If d L ,a ≤ d ≤ dU ,a , the test is inconclusive.
Table A4 gives values of dL,α and dU,α for α = .05 and different values of k,
the number of independent variables used by the regression model, and n,
the number of observations. (Tables of d L,α and dU ,α for different values of
a can be found in more detailed books of statistical tables). Since there are
n = 30 Fresh demands in Table 3.2 and k = 4 independent variables in the
Fresh demand model, Table A4 tells us that d L,.05 = 1.14 and dU,.05 = 1.74.
Since d = 1.512 for the Fresh demand model is between these points, the
test for positive autocorrelation is inconclusive (as is the residual plot in
Figure 4.20).
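A small computational sketch of d (generic Python; the residuals below are made up for illustration and are not the Fresh demand residuals):

import numpy as np

def durbin_watson(e):
    # d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

e = np.array([0.21, 0.15, -0.08, -0.19, 0.05, 0.24, 0.11, -0.14])   # made-up residuals
print(durbin_watson(e))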
It can be shown that the Durbin–Watson statistic d is always between
0 and 4. Large values of d (and hence small values of 4 − d ) lead us to
conclude that there is negative autocorrelation because if d is large, this
indicates that the differences (et − et − 1 ) are large. This says that the adja-
cent error terms ε t and ε t −1 are negatively autocorrelated. Consider testing
the null hypothesis H 0 that the error terms are not autocorrelated versus the
alternative hypothesis H a that the error terms are negatively autocorrelated.
Durbin and Watson have shown that based on setting the probability of a
Type I error equal to a , the points d L,α and dU ,α are such that
1. If ( 4 − d ) < d L ,a , we reject H 0.
2. If ( 4 − d ) > dU ,a , we do not reject H 0.
3. If d L ,a ≤ ( 4 − d ) ≤ dU ,a the test is inconclusive.
For example, for the Fresh demand model we see that (4 − d) = (4 − 1.512) = 2.488
is greater than dU,.05 = 1.74. Therefore, on the basis
of setting a equal to .05, we do not reject the null hypothesis of no auto-
correlation. That is, there is no evidence of negative (first-order) autocor-
relation.
We can also use the Durbin–Watson statistic to test for positive or neg-
ative autocorrelation. Specifically, consider testing the null hypothesis H 0
that the error terms are not autocorrelated versus the alternative hypothesis
H a that the error terms are positively or negatively autocorrelated. Durbin and
Watson have shown that, based on setting the probability of a Type I error
equal to a , we perform both the above described test for positive autocor-
relation and the above described test for negative autocorrelation by using the
critical values d L,α /2 and dU ,α /2 for each test. If either test says to reject H 0, then
we reject H 0. If both tests say to not reject H 0, then we do not reject H 0. Finally,
if either test is inconclusive, then the overall test is inconclusive.
As another example of testing for positive autocorrelation, consider
the n = 168 hotel room averages in Figure 4.18 and note that when we fit
the quartic root room average model
εt = φ1εt−1 + φ2εt−2 + ... + φqεt−q + at
where the at ’ s , which are called random shocks, are assumed to be numer-
ical values that have been randomly and independently selected from
a normally distributed population of numerical values having mean 0
and a variance that does not depend on t. Estimates of the autoregressive
model parameters are obtained by using all terms in the autoregressive
model. Then the error term with the smallest (in absolute value) t statistic
is selected. If the t statistic indicates that this term is significant at the
α stay level (that is, the related p-value is less than α stay), then the proce-
dure terminates by choosing the error structure including all q terms.
If this term is not significant at the a stay level, it is removed from the
model, and estimates of the model parameters are obtained by using an
autoregressive model containing all the remaining terms. The procedure
continues by removing terms one at a time from the model describing the
error structure. At each step a term is removed if it has the smallest (in
absolute value) t statistic of the terms remaining in the model and if it is
not significant at the a stay level. The procedure terminates when none of
the terms remaining can be removed. The experience of the authors
indicates that choosing αstay equal to .15 is effective and that, when monthly
data are being analyzed, choosing q = 18 is also effective. When we make
these choices to analyze the room average data, Figure 4.23 tells us that
SAS PROC AUTOREG chooses the autoregressive error term model εt = φ1εt−1 + φ2εt−2 + φ3εt−3 + φ12εt−12 + φ18εt−18 + at.
When we use SAS PROC ARIMA to fit the quartic root room average
model combined with this autoregressive error term model, we obtain the
SAS output of estimation, diagnostic checking, and forecasting that is given in
Figure 4.24. Without going into the theory of diagnostic checking, it can be
shown that because each of the chi-square p-values in Figure 4.24b is greater
than .05, the combined model has adequately accounted for the autocorrela-
tion in the data (see Bowerman et al. 2005). Using the least squares point
estimates in Figure 4.24a, we compute a point prediction of y169^.25, the
quartic root of the hotel room average in period 169 (January of next year), to be

b0 + b1t + bM1M1 + bM2M2 + ... + bM11M11 + ε̂t
= b0 + b1(169) + bM1(1) + bM2(0) + ... + bM11(0) + ε̂169
= b0 + b1(169) + bM1 + φ̂1ε̂168 + φ̂2ε̂167 + φ̂3ε̂166 + φ̂12ε̂157 + φ̂18ε̂151
= 4.80114 + .0035312(169) + (−.04589) + .30861ε̂168 + .12487ε̂167 + (−.26534)ε̂166 + .26437ε̂157 + (−.15846)ε̂151
= 5.3788
Figure 4.24 Partial SAS PROC ARIMA output of a regression analysis using the quartic root room average model combined with the autoregressive error term model
Here, the predictions ε̂168, ε̂167, ε̂166, ε̂157, and ε̂151 of the error terms ε168, ε167, ε166, ε157, and ε151
are the residuals e168, e167, e166, e157, and e151 obtained
by using the quartic root room average model to predict the quartic roots
of the room averages in periods 168, 167, 166, 157, and 151. For example,
because the quartic root of y167 = 762 (see Figure 4.18) is 5.253984, and
because period 167 is a November with bM11 = −.14464, we have
ε̂167 = e167 = 5.253984 − [4.80114 + .0035312(167) + (−.14464)] = .0077736. The
point prediction 5.3788 of y169^.25 is given in Figure 4.24c and implies that
the point prediction of y169 is (5.3788)^4 = 837.02 [see Figure 4.24d].
Figure 4.24c also tells us that a 95 percent prediction interval for y169^.25 is
[5.3318, 5.4257], which implies that a 95 percent prediction interval
for y169 is [(5.3318)^4, (5.4257)^4] = [808.17, 866.63] (see Figure 4.24d).
This interval says that Traveler's Rest can be 95 percent confident that the
monthly hotel room average in period 169 (January of next year) will be
no less than 808.17 rooms per day and no more than 866.63 rooms per
day. Lastly, note that Figures 4.24c and 4.24d also give point predictions
of and 95 percent prediction intervals for y170^.25, ..., y180^.25 and y170, ..., y180 (the
hotel room averages in February through December of next year).
In order to see how least squares point estimates like those in
Figure 4.24(a) are calculated, consider, in general, a regression model
that describes a time series of yt values by using k time series and/or
causal independent variables. We will call this model the original regres-
sion model, and to simplify discussions to follow, we will express it by
showing only an arbitrary one of its k independent variables. Therefore,
we will express this model as yt = β0 + ... + βjxtj + ... + εt. If the error
terms in the model are not statistically independent but are described
by the error term model εt = φ1εt−1 + φ2εt−2 + ... + φqεt−q + at, regression
assumption 4 is violated. To remedy this regression assumption viola-
tion, we can use the original regression model to write out expressions for
yt, φ1yt−1, φ2yt−2, ..., φqyt−q and then consider the transformed regres-
sion model

yt − φ1yt−1 − φ2yt−2 − ... − φqyt−q
  = (β0 − β0φ1 − β0φ2 − ... − β0φq) + ...
  + (βjxtj − βjφ1xt−1,j − βjφ2xt−2,j − ... − βjφqxt−q,j) + ...
  + (εt − φ1εt−1 − φ2εt−2 − ... − φqεt−q)

This transformed model can be written concisely as yt* = β0* + ... + βjxtj* + ... + εt*, where, for t = q + 1, q + 2, ..., n:

yt* = yt − φ1yt−1 − φ2yt−2 − ... − φqyt−q
β0* = β0(1 − φ1 − φ2 − ... − φq)
xtj* = xtj − φ1xt−1,j − φ2xt−2,j − ... − φqxt−q,j
εt* = εt − φ1εt−1 − φ2εt−2 − ... − φqεt−q = at
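A sketch of how the transformed values yt* and xtj* could be computed once estimates of the φ's are available (generic Python, not the SAS PROC ARIMA computations used in the text; the numerical values below are illustrative only, and only the first three lags of the hotel model are used):

import numpy as np

def transform_series(z, phi):
    # z*_t = z_t - phi_1 z_{t-1} - ... - phi_q z_{t-q}, for t = q+1, ..., n
    z = np.asarray(z, dtype=float)
    q = len(phi)
    out = z[q:].copy()
    for j, p in enumerate(phi, start=1):
        out -= p * z[q - j:len(z) - j]
    return out

phi = [0.30861, 0.12487, -0.26534]        # illustrative AR coefficients (first three lags only)
y = np.array([5.1, 5.0, 5.2, 5.3, 5.25, 5.4, 5.45, 5.5])
y_star = transform_series(y, phi)         # the same function would be applied to each x_tj
print(y_star)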
(Figure: a data plot of y versus x illustrating outlying observations 1, 2, and 3.)
Source: Procedures and Analysis for Staffing Standards Development: Regression Analysis Handbook (San Diego, CA: Navy Manpower and Material Analysis Center, 1979).
(a) Diagnostics for the original model and studentized deleted residuals for Options 1 and 2: for each observation, the output lists the residual, its standard error, the studentized residual, the hat diagonal h, the Option 1 and Option 2 Rstudent values, Cook's D, and Dffits. (b) Plot of residuals versus predicted values for the original model. (c) Plot of residuals versus predicted values for Option 1. (d) Plot of residuals versus predicted values for Option 2.
Figure 4.26 Partial SAS output of outlying and influential observation diagnostics
One option for dealing with the fact that hospital 14 is an outlier with
respect to its y value is to assume that hospital 14 has been run ineffi-
ciently. Because we need to develop a regression model using efficiently
run hospitals, based on this assumption we would remove hospital 14
from the data set. If we perform a regression analysis using a model
relating y to x1 , x2 , and x3 with hospital 14 removed from the data set
(we call this Option 1), we obtain a standard error of s = 387.16. This s
is considerably smaller than the large standard error of 614.779 caused
by hospital 14’s large residual when we use all 17 hospitals to relate y to
x1 , x2 , and x3.
A second option is motivated by the fact that large organizations
sometimes exhibit inherent inefficiencies. To assess whether there might be
general large hospital inefficiency, we define a dummy variable DL that equals
1 for the larger hospitals 14 to 17 and 0 for the smaller hospitals 1 to 13. If we
fit the resulting regression model y = b0 + b1 x1 + b2 x 2 + b3 x3 + b4 DL + e
to all 17 hospitals (we call this Option 2), we obtain a b4 of 2871.78 and a
p-value for testing H 0 : β 4 = 0 of .0003. This indicates the existence of a
large hospital inefficiency that is estimated to be an extra 2871.78 hours
per month. In addition, the dummy variable model’s s is 363.854, which
is slightly smaller than the s of 387.16 obtained using Option 1. In the
exercises the reader will use the studentized deleted residual for hospital
14 when using Option 2 (see Figure 4.26a) to show that hospital 14 is not
an outlier with respect to its y value. This means that if we remove hos-
pital 14 from the data set and predict y14 by using a newly fitted dummy
variable model having a large hospital inefficiency estimate based on the
remaining large hospitals 15, 16, and 17, the prediction obtained indi-
cates that hospital 14’s labor hours are not unusually large. This justifies
leaving hospital 14 in the data set when using the dummy variable model.
In summary, both Options 1 and 2 seem reasonable. The reader will fur-
ther compare these options in the exercises.
Obs   INTERCEP Dfbetas   X1 Dfbetas   X2 Dfbetas   X3 Dfbetas
16    0.9880             -1.4289      1.7339       -1.1029
17    0.0294             -3.0114      1.2688        0.3155
H = X(X′X)^(−1)X′
which has n rows and n columns. For i = 1, 2, ... , n we define the lever-
age value hi of the x values xi1 , xi 2 , ... , xik to be the ith diagonal element of
H. It can be shown that
ŷ(i) = b0(i) + b1(i)xi1 + b2(i)xi2 + ... + bk(i)xik

di = ei/(1 − hi)     and     di/sdi = ei[(n − k − 2)/(SSE(1 − hi) − ei²)]^(1/2)
Di = ei²hi / [(k + 1)s²(1 − hi)²]
gj(i)/sgj(i) = (di/sdi) rj,i / [(rj′rj)(1 − hi)]^(1/2)

where rj′ is row j of R.
Also, let fi = ŷi − ŷ(i). If sfi denotes the standard error of this differ-
ence, then the difference in fits statistic is defined to be fi/sfi. It can be
shown that

fi/sfi = (di/sdi)[hi/(1 − hi)]^(1/2)
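A small numerical sketch of the leverage values, studentized deleted residuals, and Cook's D values defined above (generic Python with made-up data; this is not the SAS INFLUENCE output shown in Figure 4.26):

import numpy as np

def regression_diagnostics(X, y):
    # returns residuals, leverages h_i, studentized deleted residuals, and Cook's D
    n, p = X.shape                       # p = k + 1 columns, including the column of ones
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y                        # residuals
    SSE = np.sum(e ** 2)
    s2 = SSE / (n - p)                   # mean square error s^2
    rstudent = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e ** 2))   # d_i / s_{d_i}
    cooks_d = (e ** 2 / (p * s2)) * (h / (1 - h) ** 2)
    return e, h, rstudent, cooks_d

X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 9.0]])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8, 12.0])
for name, vals in zip(["e", "h", "rstudent", "D"], regression_diagnostics(X, y)):
    print(name, np.round(vals, 3))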
Model 1: ln y = b0 + b1 x1 + b2 x2 + b3 x3 + b8 x8 + e
Model 2: ln y = b0 + b1 x1 + b2 x2 + b3 x3 + b6 x6 + b8 x8 + e
Model 3: ln y = b0 + b1 x1 + b2 x2 + b3 x3 + b5 x5 + b8 x8 + e
Note that although we did not discuss the PRESS statistic in Section 4.2,
it is another useful model building statistic.
Each model was fit to the remaining 54 observations (the validation
data) and also used to compute
MSPR = ∑_{i=1}^{n*}(yi′ − ŷi)² / n*
where n* is the number of observations in the validation data set, yi′ is the
value of the dependent variable for the ith observation in the validation
data set, and ŷi is the prediction of yi′ using the training data set model.
((a) Plot of e(2) versus e′(2), with horizontal axis BedDays and vertical axis Manhours; (b) plot of e(3) versus e′(3), with horizontal axis StayDays and vertical axis Manhours.)
The values of MSPR for the three above models, as well as the values of
PRESS, C , s 2 , and R 2 when the three models are fit to the validation data
set, are shown in Table 4.3. Model 3 was eliminated because the sign
of the age coefficient changed from a negative b5 = −.0035 to a positive
b5 = .0025 as we went from the training data set to the validation data
set. Model 1 was chosen as the final model because it had (1) the smallest
PRESS for the training data; (2) the smallest PRESS, C, and s 2 for the val-
idation data; (3) the second smallest MSPR; (4) all p-values less than .01
(it was the only model with all p-values less than .10); and (5) the fewest
independent variables. The final prediction equation was
y = 3.852 + .073 x + .0142 x + .0155 x + .353 x
ln 1 2 3 8
and thus y∧ = e ln y
y = β0 + β1 x1 + ... + β j −1 x j −1 + β j +1 x j +1 + ... + β k xk + ε
and let b0′, b1′,..., b ′j −1 , b ′j +1 ,..., bk′ be the least squares point estimates of the
parameters in the model
versus
of the model parameters than are the usual least squares point estimates.
We first show how to calculate ridge point estimates. Then we discuss the
advantage and disadvantages of these estimates.
To calculate the ridge estimates of the parameters in the model
yi′ = β1′xi1′ + β2′xi2′ + ... + βk′xik′ + εi′

where

yi′ = (1/√(n − 1))(yi − ȳ)/sy     and     xij′ = (1/√(n − 1))(xij − x̄j)/sxj
Here, ȳ and sy are the mean and the standard deviation of the n observed
values of the dependent variable y, and, for j = 1, 2, ..., k, x̄j and sxj are
the mean and the standard deviation of the n observed values of the jth
independent variable x j . If we form the matrices
X•′X• = | 1        rx1,x2   ...   rx1,xk |          X•′y• = | ry,x1 |
        | rx2,x1   1        ...   rx2,xk |                  | ry,x2 |
        | ...      ...      ...   ...    |                  | ...   |
        | rxk,x1   rxk,x2   ...   1      |                  | ry,xk |
Ridge Estimation

The ridge point estimates of the parameters β1′, ..., βk′ of the standardized
regression model are

[b′1,R, b′2,R, ..., b′k,R]′ = (X•′X• + cI)^(−1)X•′y•

where c ≥ 0 is the biasing constant. The corresponding ridge point estimates for the original regression model are then

bj,R = (sy/sxj)b′j,R     for j = 1, ..., k

b0,R = ȳ − b1,R x̄1 − b2,R x̄2 − ... − bk,R x̄k
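A minimal sketch of the ridge computation just described (generic Python; the data are simulated purely for illustration):

import numpy as np

def ridge_estimates(X, y, c):
    # ridge point estimates for the original model, computed via the standardized model
    n, k = X.shape
    x_bar, s_x = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_bar, s_y = y.mean(), y.std(ddof=1)
    Xs = (X - x_bar) / (s_x * np.sqrt(n - 1))          # standardized x's (correlation transformation)
    ys = (y - y_bar) / (s_y * np.sqrt(n - 1))          # standardized y
    b_std = np.linalg.solve(Xs.T @ Xs + c * np.eye(k), Xs.T @ ys)   # b'_{j,R}
    b = b_std * s_y / s_x                               # back-transform: b_{j,R}
    b0 = y_bar - np.sum(b * x_bar)                      # b_{0,R}
    return b0, b

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=20)          # strong multicollinearity, for illustration
y = 1 + X @ np.array([2.0, 1.0, 1.0]) + rng.normal(size=20)
print(ridge_estimates(X, y, c=0.1))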
This can be proven to be equal to the sum of the squared bias of the proce-
dure and the variance of the procedure. Here, the variance is the average
of the squared deviations of the different possible point estimates from
the mean of all possible point estimates. If the procedure is unbiased, the
mean of all possible point estimates is the parameter we are estimating. In
other words, when the bias is zero, the mean squared error and the vari-
ance of the procedure are the same, and thus the mean squared error of
the (unbiased) least squares estimation procedure for estimating β j is the
variance σ b2j . The mean squared error of the ridge estimation procedure is
[μbj,R − βj]² + σ²bj,R
It can be proved that as the biasing constant c increases from zero, the bias
of the ridge estimation procedure increases, and the variance of this proce-
dure decreases. It can further be proved that there is some c > 0 that makes
σ b2j ,R so much smaller than σ b2j that the mean squared error of the ridge esti-
mation procedure is smaller than the mean squared error of the least squares
estimation procedure. This is one advantage of ridge estimation. It implies
that the ridge point estimates are less affected by multicollinearity than the
least squares point estimates. Therefore, for example, they are less affected
by small changes in the data. One problem is that the optimum value of c
differs for different applications and is unknown.
Before discussing how to choose c, we note that, in addition to using
the standardized regression model to calculate ridge point estimates,
some statistical software systems automatically use this model to calculate
the usual least squares point estimates. The reason is that when strong
multicollinearity exists, the columns of the matrix X obtained from
the usual (multiple) linear regression model are close to being linearly
dependent and thus there can be serious rounding errors in calculating
(X′X)^(−1). Such errors can also occur when the elements of X′X have sub-
stantially different magnitudes. This occurs when the magnitudes of the
independent variables differ substantially. Use of the standardized regres-
sion model means that X•′X• consists of simple correlation coefficients, all
elements of which are between −1 and 1. Therefore these elements have
comparable magnitudes. This can help to eliminate serious rounding errors
in calculating (X•′X•)^(−1) and thus in calculating the least squares point
estimates.
(X•′X• + cI)^(−1) X•′X• (X•′X• + cI)^(−1)
Table 4.4 The ridge point estimates for the hospital labor needs model
Robust regression procedures produce point estimates that are less sensitive than the least squares point estimates to both
outlying observations and failures of the model assumptions. For exam-
ple, if the populations sampled are not normal but are heavy tailed, then
we are more likely to obtain a yi value that is far from the mean yi value.
This value will act much like an outlier, and its effect can be dampened
by minimizing the sum of absolute residuals. An excellent discussion of
robust regression procedures is given by Myers (1986).
MSE = ∑_{i=1}^{n}(yi − ȳ)² / n     and     MSPR = ∑_{i=1}^{n*}(yi′ − ȳ)² / n*
where yi is the ith GPA among the n = 352 GPA’s in the training data set
and yi′ is the ith GPA among the n∗ = 353 GPA’s in the validation data
set. In the second step, we find the dividing point in the ( x1 , x2 ) = (ACT,
H.S. Rank) space that gives the greatest reduction in MSE. As illustrated
in Figure 4.29b the dividing point is a high school rank of 81.5, and the
new MSE and MSPR are
(Figure 4.29 Regression tree construction for the GPA data. (a) Step 1: the prediction is ȳ, the average of all 352 GPA's in the training data set. (b) Step 2: the (ACT, H.S. Rank) space is split at a high school rank of 81.5, giving region averages ȳ1 and ȳ2. Steps 3 through 5 add splits at ACT 19.5, ACT 23.5, and H.S. rank 96.5, giving region averages ȳ1 = 2.362, ȳ2 = 2.794, ȳ3 = 2.950, ȳ4 = 3.261, and ȳ5 = 3.559. (f) The final regression tree: if H.S. rank < 81.5, predict 2.362 when ACT < 19.5 and 2.794 otherwise; if H.S. rank ≥ 81.5, predict 2.950 when ACT < 23.5, and otherwise predict 3.261 when H.S. rank < 96.5 and 3.559 when H.S. rank ≥ 96.5.)
MSE = [∑_{i=1}^{n1}(yi − ȳ1)² + ∑_{i=1}^{n2}(yi − ȳ2)²] / n

and

MSPR = [∑_{i=1}^{n1*}(yi′ − ȳ1)² + ∑_{i=1}^{n2*}(yi′ − ȳ2)²] / n*
DATA TERR;
INPUT SALES TIME MKTPOTEN ADVER MKTSHARE CHANGE
ACCTS WKLOAD RATING;
TMP = TIME*MKTPOTEN;
TA = TIME*ADVER;
TMS = TIME*MKTSHARE;
TC = TIME*CHANGE;
MPA = MKTPOTEN*ADVER;
MPMS = MKTPOTEN*MKTSHARE;
MPC= MKTPOTEN*CHANGE;
AMS= ADVER*MKTSHARE;
AC= ADVER*CHANGE;
MSC= MKTSHARE*CHANGE;
SQT= TIME*TIME;
SQMP= MKTPOTEN*MKTPOTEN;
SQA= ADVER*ADVER;
SQMS= MKTSHARE*MKTSHARE;
SQC= CHANGE*CHANGE;
DATALINES;
3669.88 43.10 74065.11 4582.88 2.51 0.34 24.86 15.05 4.9
3473.95 108.13 58117.30 5539.78 5.51 0.15 107.32 19.97 5.1
.
.
2799.97 21.14 22809.53 3552.00 9.14 -0.74 88.62 24.96 3.9
PROC PLOT;
PLOT SALES*(TIME MKTPOTEN ADVER MKTSHARE CHANGE ACCTS WKLOAD RATING);
PROC CORR;
PROC REG;
MODEL SALES = TIME MKTPOTEN ADVER MKTSHARE CHANGE ACCTS WKLOAD
RATING/VIF;
Figure 4.30 SAS program for model building using the sales territory
performance data
Figure 4.31 gives the SAS program needed to perform residual analysis and to fit the
transformed regression model and a weighted least squares regression
model when analyzing the QHIC data in Table 4.7. Figure 4.32 gives the
SAS program needed to analyze the hotel room average occupancy data in
Figure 4.18. Figure 4.33 gives the SAS program for model building and
data qhic;
input value upkeep;
val_sq = value**2;
datalines;
237.00 1412.08
153.08 797.20
.
.
122.02 390.16
198.02 1090.84
220 .
Proc reg;
model upkeep = value val_sq;
plot r.*value;
output out = new1 r=resid p = yhat;
(Note: This statement places the residuals and the ŷ values in a new data
set called "new1". The command "r=resid" says that we are giving the name
"resid" to the residuals (r). The command "p = yhat" says that we are giving the name
"yhat" to the predicted values (p).)
data new2;
set new1;
abs_res = abs(resid);
proc plot;
plot abs_res*value;
proc reg;
model abs_res = value;
Output out = new3 p = shat;
proc print;
var shat;
data new4;
set new3;
y_star = upkeep/shat;
inv_pabe = 1/shat;
value_star = value/shat;
val_sq_star = val_sq/shat;
wt = shat**(-2);
proc reg;
model y_star = inv_pabe value_star val_sq_star / noint clm cli;
plot r.*p.;
proc reg;
model upkeep = value val_sq / clm cli;
weight wt;
plot r.*p.;
DATA OCCUP;
INPUT Y M;
IF M = 1 THEN M1 = 1;
ELSE M1 = 0;
IF M = 2 THEN M2 = 1;          (these IF/ELSE statements define the dummy variables M1, M2, ..., M11)
ELSE M2 = 0;
•
•
IF M = 11 THEN M11 = 1;
ELSE M11 = 0;
TIME = _N_;
LNY = LOG(Y);
SRY = Y**.5;
QRY = Y**.25;
DATALINES;
501 1                          (hotel room average occupancy data)
488 2
•
•
877 12
. 1                            (the rows with missing Y are next year's 12 months, whose
. 2                            monthly room averages are to be predicted)
•
•
. 12
;
PROC PLOT;
PLOT Y*TIME;
PLOT LNY*TIME;
PLOT SRY*TIME;
PLOT QRY*TIME;
(These statements perform backward elimination on the quartic root room average model residuals
with q=18 and astay = .15. Recall that the error term model chosen is et = φ1 et–1 + φ2 et–2 + φ3 et–3 +
φ12 et–12 + φ18 et–18. The following commands fit the quartic root room average model combined
with this error term model.)
PROC ARIMA DATA = OCCUP;
IDENTIFY VAR = QRY CROSSCOR = (TIME M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11)
NOPRINT;
ESTIMATE INPUT = (TIME M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11)
P = (1,2,3,12,18) PRINTALL PLOT;
FORECAST LEAD = 12 OUT = FCAST3;
DATA FORE3;
SET FCAST3;
Y = QRY**4;
FY = FORECAST**4;
L95CI = L95**4;
U95CI = U95**4;
PROC PRINT DATA = FORE3;
VAR Y L95CI FY U95CI;
DATA HOSP;
INPUT Y X1 X2 X3 X4 X5 D;
DATALINES;
566.52 2463 472.92 4.45 15.57 18.0 0
696.82 2048 1339.75 6.92 44.02 9.5 0
.
.
4026.52 15543 3865.67 5.50 127.21 126.8 0
10343.81 36194 7684.10 7.00 252.90 157.7 1
11732.17 34703 12446.33 10.78 409.20 169.4 1
15414.94 39204 14098.40 7.05 463.70 331.4 1
18854.45 86533 15524.00 6.35 510.22 371.6 1
. 56194 14077.88 6.89 456.13 351.2 1
PROC PRINT;
PROC CORR;
PROC PLOT;
PLOT Y * (X1 X2 X3 X4 X5 D);
PROC REG;
MODEL Y = X1 X2 X3 X4 X5 D / VIF;
PROC REG;
MODEL Y = X1 X2 X3 X4 X5 D / SELECTION = RSQUARE ADJRSQ
MSE RMSE CP;
MODEL Y = X1 X2 X3 X4 X5 D / SELECTION = STEPWISE
SLENTRY = .10 SLSTAY = .10;
PROC REG;
MODEL Y = X1 X2 X3 D / P R INFLUENCE CLM CLI VIF;        (detects outlying and influential observations)
OUTPUT OUT = ONE PREDICTED = YHAT RESIDUAL = RESID;
PROC PLOT DATA = ONE;
PLOT RESID * (X1 X2 X3 D YHAT);                          (constructs residual plots)
PROC UNIVARIATE PLOT DATA = ONE;
VAR RESID;                                               (constructs a normal plot of the residuals)
RUN;
Figure 4.33 SAS program for model building and residual analysis
and for detecting outlying and influential observations using the
hospital labor needs data
4.11 Exercises
Exercise 4.1
DATA TRANSMIS;
INPUT CHEMCON LIGHT;
DATALINES;
0.0 2.86
0.0 2.64
1.0 1.57
. light data in Section 4.3
.
.
5.0 0.36
PROC NLIN; NLIN is SAS’s nonlinear regression procedure
PARAMETERS BETA1 = 0 BETA2 = 2.77 BETA3 = .774;
MODEL LIGHT = BETA1 + BETA2*EXP(-BETA3*CHEMCON);
Figure 4.34 SAS program for fitting the nonlinear regression model
y = β1 + β2e^(−β3x) + ε to the light data
stay, in days), Load (x 4, average daily patient load), and Pop (x5, eligible
population in the area, in thousands). Figure 4.35 gives MINITAB
and SAS outputs of multicollinearity analysis and model building for
these data.
(a) Discuss why Figure 4.35a and 4.35b indicate that BedDays, Load,
and Pop are most strongly involved in multicollinearity. Note that
the negative coefficient (that is, least squares point estimate) of
b3 = −394.3 for Length might be intuitively reasonable because it
might say that, when all other independent variables remain con-
stant, an increase in average length of patients’ stay implies less
patient turnover and thus fewer start-up hours needed for the initial
care of new patients. However, the negative coefficients for Load and
Pop do not seem to be intuitively reasonable—another indication
of extreme multicollinearity. The extremely strong multicollinearity
between BedDays, Load, and Pop implies that we may not need all
three in a regression model.
(b) Which model has the highest adjusted R 2, smallest C statistic, and
smallest s?
(c) (1) Which model is chosen by stepwise regression in Figure 4.35?
(2) If we start with all five potential independent variables and use
backward elimination with an α stay of .10, the procedure removes
(in order) Load and Pop and then stops. Which model is chosen by
backward elimination? (3) Discuss why the model that uses Xray,
BedDays, and Length seems to be the overall best model. (4) Which
of BedDays, Load, and Pop does this best model use?
(d) Consider a questionable hospital for which Xray = 56,194,
BedDays = 14,077.88, Length = 6.89, Load = 456.13, and Pop =
351.2. The least squares point estimates and associated p-values (given
in parentheses) of the parameters in the best model, y = b0 + b1 x1 + b2 x2 + b3 x3 + e,
y = b0 + b1 x1 + b2 x2 + b3 x3 + e, are b0 = 1523.3892 (.0749), b1 = .05299 (.0205) ,
b2 = .97898 (< .0001) and b3 = −320.9508 (.0563). Using this
model, a point prediction of and a 95 percent prediction interval
for the labor hours, y0, of an efficiently run hospital having the same
(Partial MINITAB stepwise output: the constants at steps 1 through 3 are −28.13, −68.31, and 1523.39; Length enters at step 3 with coefficient −321, t-value −2.10, and p-value 0.056.)
Exercise 4.2
Table 4.5 shows data concerning the time, y , required to perform ser-
vice (in minutes) and the number of laptop computers serviced, x, for
15 service calls. Figure 4.37 shows that the y values tend to increase in a
straight line fashion and with increasing variation as the x values increase.
If we fit the simple linear regression model y = b0 + b1 x + e to the data,
the model’s residuals fan out as x increases (we do not show the residual
(Figure 4.37: scatterplot of service time versus the number of laptops serviced.)
Figure 4.39 Partial SAS output when using least squares to fit the transformed model yi/pabei = β0(1/pabei) + β1(xi/pabei) + ηi
Figure 4.40 Partial SAS output when using weighted least squares to fit the original model yi = β0 + β1xi + εi
(a) Show how the predicted service time ŷ0/37.4275 = 5.0157 in Figure
4.39 and the predicted service time ŷ0 = 187.7256 in Figure 4.40
have been calculated by SAS.
(b) Letting μ0 represent the mean service time for all service calls on
which seven laptops will be serviced, Figure 4.39 says that a 95
percent confidence interval for μ0/37.4275 is [4.2809, 5.7506],
and Figure 4.40 says that a 95 percent confidence interval for μ0 is
[160.2224, 215.2288]. If the number of minutes we will allow for
the future service call is the upper limit of the 95 percent confidence
interval for μ0, how many minutes will we allow?
Exercise 4.3
Figure 4.41 Number of steakhouses in operation versus year
(Figure: the natural logarithms of the numbers of steakhouses in operation versus year.)
(a) Use the least squares point estimates to calculate the point prediction.
(b) By exponentiating the point prediction and prediction interval—that
is by calculating e 6.1802 and [e 5.9945 , e 6.3659]—find a point prediction of
and a 95 percent prediction interval for the number of steakhouses in
operation next year.
(c) The model ln yt = β0 + β1t + εt is called a growth curve model
because it implies that yt = e^(β0 + β1t + εt) = (e^(β0))(e^(β1t))(e^(εt)) = α0α1^t ηt
Exercise 4.4
In Section 4.4 we used ε̂166 to help compute a point prediction of y169^.25,
the quartic root of the hotel room average in period 169. Calculate ε̂166.
Exercise 4.5
In Exercise 4.1 you concluded that the best model describing the hos-
pital labor needs data in Table 4.2 is y = b0 + b1 x1 + b2 x2 + b3 x3 + e . In
Section 4.5 we concluded using the studentized deleted residual that
hospital 14 is an outlier with respect to its y value. Option 1 for deal-
ing with this outlier is to remove hospital 14 from the data and fit the
model y = b0 + b1 x1 + b2 x2 + b3 x3 + e to the remaining 16 observations.
Option 2 is to fit the model y = b0 + b1 x1 + b2 x2 + b3 x3 + b4 DL + e to
all 17 observations. Here, DL = 1 for the larger hospitals 14 to 17 and 0
otherwise.
(a) (1) Use the studentized deleted residuals in Figure 4.26a (see
Option 1 Rstudent and Option 2 Rstudent) to see if there are any
outliers with respect to their y values when using Options 1 and
2. (2) Is hospital 14 an outlier with respect to its y value when
using Option 2? (3) Consider a questionable large hospital (DL = 1)
for which Xray = 56,194, BedDays = 14,077.88, and Length =
6.89. Also, consider the labor needs in an efficiently run large hos-
pital described by this combination of values of the independent
variables. The 95 percent prediction intervals for these labor needs
given by the models of Options 1 and 2 are, respectively, [14,906,
16,886] and [15,175, 17,030]. By comparing these prediction
Statistical Tables
Table A1: An F table: Values of F[.05]
Table A2: A t-table: Values of t[γ]
Table A3: A table of areas under the standard normal curve
Table A4: Critical values for the Durbin—Watson d statistic (α = .05)
Table A1. An F table: Values of F[.05]
Source: Reproduced by permission from Merrington and Thompson (1943) © by the Biometrika Trustees.
Table A2. A t-table: Values of t[γ]
Table A3. A table of areas under the standard normal curve
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .49865 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
4.0 .4999683
Table A4. Critical values for the Durbin–Watson d statistic (α = .05)
Source: Reproduced by permission from Durbin and Watson (1951) © by the Biometrika Trustees.
References
Andrews, R.L., and S.T. Ferguson. 1986. “Integrating Judgment With a Regression
Appraisal.” The Real Estate Appraiser and Analyst 52, no. 2, pp. 71–74.
Bowerman, B.L., R.T. O’Connell, and A.B. Koehler. 2005. Forecasting, Time
Series, and Regression. 4th ed. Belmont, CA: Brooks Cole.
Cravens, D.W., R.B. Woodruff, and J.C. Stomper. January, 1972. “An Analytical
Approach for Evaluation of Sales Territory Performance.” Journal of Marketing
36, no. 1, pp. 31–37.
Dielman, T. 1996. Applied Regression Analysis for Business and Economics. Belmont,
CA: Duxbury Press.
Durbin, J., and G.S. Watson. 1951. “Testing for Serial Correlation in Least Squares Regression, II.” Biometrika 38, pp. 159–178.
Freund, R.J., and R.C. Littell. 1991. SAS System for Regression. 2nd ed. Cary, NC:
SAS Institute Inc.
Kennedy, W.J., and J.E. Gentle. 1980. Statistical Computing. New York, NY:
Dekker.
Kutner, M.H., C.S. Nachtsheim, J. Neter, and W. Li. 2005. Applied Linear Statistical Models. 5th ed. Burr Ridge, IL: McGraw-Hill/Irwin.
Mendenhall, W., and T. Sincich. 2011. A Second Course in Statistics: Regression
Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Merrington, M. 1941. “Table of Percentage Points of the t-Distribution.”
Biometrika 32, p. 300.
Merrington, M., and C.M. Thompson. April, 1943. “Tables of Percentage Points
of the Inverted Beta (F)-Distribution.” Biometrika 33, no. 1, pp. 73–88.
Myers, R. 1986. Classical and Modern Regression with Applications. Boston, MA:
Duxbury Press.
Neter, J., W. Wasserman, and G.A. Whitmore. 1972. Fundamental Statistics for
Business and Economics. 4th ed. Boston, MA: Allyn & Bacon, Inc.
Ott, R.L. 1984. An Introduction to Statistical Methods and Data Analysis. 2nd ed.
Boston, MA: Duxbury Press.
Ott, R.L., and M.L. Longnecker. 2010. An Introduction to Statistical Methods and
Data Analysis. 6th ed. Belmont, CA: Brooks/Cole.
Index
Adjusted coefficient of determination, 56–57
Autocorrelated errors, 208–216
Autoregressive model, 211

Backward elimination, 172–174, 211
Biasing constant, 231
Bonferroni procedure, 134
Box-Jenkins methodology, 216

Causal variable, 206
Chi-square p-values, 212
Cochrane-Orcutt procedure, 215
Coefficients of determination, 52–60
Conditional least squares method, 216
Confidence intervals, 81–89, 90
Constant variance assumption, 44, 181–184
Correct functional form, assumption of, 184–187
Correlation, 57–60
Correlation matrix, 159
Cross-sectional data, 2
C-statistic, 169
Curvature, rate of, 98

Dependent (response) variable, 7
  fractional power transformations of, 194–197
Distance value, 82
Dummy variables, 110–123, 137
Durbin–Watson d statistic, 258–259
Durbin–Watson statistic, 209, 210
Durbin–Watson test, 208–216

First-order autocorrelation, 208
F table, 254–255

Gauss–Markov theorem, 76
General logistic regression model, 136

Handling unequal variances, 188–194
Hildreth-Lu procedure, 216

Independence assumption, diagnosing and remedying violations of, 45
  autocorrelation, 202–208
  Durbin–Watson test, 208–216
  modeling autocorrelated errors, 208–216
  seasonal patterns, 202–208
  trend, 202–208
Independent (predictor) variable, 7, 74
Indicator variables, 111
Individual t tests, 66–69
  population correlation coefficient, 78–81
  simple linear regression model, 77–78
  using p-value, 72–76
  using rejection point, 69–72
Individual value
  point prediction of, 18
  prediction interval for, 86
Interaction model, 120
Interaction terms, 97–110, 174–177
Interaction variables, 101
Intercept, 23
Inverse prediction in simple linear regression, 89–91
For a given temperature within the experimental region, the point prediction of weekly fuel consumption is calculated from the prediction equation ŷ = b0 + b1x, where x is the given temperature. For example, for 40°F, ŷ = 15.84 − 0.1279(40) = 10.72 MMcf.
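As a quick numerical check, the arithmetic in this paragraph can be reproduced with a few lines of Python; the coefficient values are simply the ones quoted above.

```python
# Least squares estimates quoted above: b0 = 15.84, b1 = -0.1279
b0, b1 = 15.84, -0.1279

x = 40.0                 # average hourly temperature, degrees F
y_hat = b0 + b1 * x      # point prediction of weekly fuel consumption
print(round(y_hat, 2))   # 10.72 (MMcf)
```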
Residual plots can be used to check for violations of the regression assumptions by examining the patterns of the residuals. A residual plot with a pattern, such as fanning out (increasing error variance) or funneling in (decreasing error variance), indicates a violation of the constant variance assumption, whereas a horizontal band appearance suggests that the constant variance assumption holds. If a plot of the residuals against the predicted values or an independent variable shows a curved pattern, it may indicate a violation of the assumption of a correct functional form for the regression model. Residual plots also help identify outliers or influential observations: studentized residuals that are markedly different from the others suggest potential outliers. Finally, the lack of a straight-line appearance in a normal plot of the residuals suggests that the errors may not be normally distributed. Examining residual plots against these criteria therefore helps validate the regression assumptions and diagnose model problems.
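A minimal plotting sketch along these lines is shown below. It is only an illustration: the file name fuel_consumption.csv and the column names y and x are assumptions, and any fitted model with residuals and fitted values could be substituted.

```python
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.formula.api as smf

# Hypothetical data file with a response y and a single predictor x.
data = pd.read_csv("fuel_consumption.csv")
fit = smf.ols("y ~ x", data=data).fit()

resid = fit.resid            # residuals
fitted = fit.fittedvalues    # predicted values

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Residuals versus predicted values: fanning out or funneling in suggests
# non-constant variance; curvature suggests an incorrect functional form;
# a patternless horizontal band is what we hope to see.
axes[0].scatter(fitted, resid)
axes[0].axhline(0.0, linestyle="--")
axes[0].set_xlabel("Predicted value")
axes[0].set_ylabel("Residual")

# Normal probability plot: departures from a straight line suggest
# non-normal errors.
stats.probplot(resid, dist="norm", plot=axes[1])

plt.tight_layout()
plt.show()
```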
In a dummy variable model comparing store locations, the point predictions of mean sales volume at the different locations are interpreted through the parameter estimates. For example, if β2 represents the difference between the mall and street location means, the point estimate of β2 indicates that mall locations have higher mean monthly sales volumes than street locations by the amount of this estimate.
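A minimal sketch of such a dummy variable model is given below, under stated assumptions: the file store_sales.csv and the columns Sales (monthly sales volume) and Location (values such as "street", "mall", "downtown") are hypothetical names used only for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

sales = pd.read_csv("store_sales.csv")   # hypothetical file: Sales, Location

# Treatment (dummy variable) coding with "street" as the baseline location;
# the coefficient on the mall dummy then estimates the difference between
# mean mall sales and mean street sales (the role played by beta_2 above).
fit = smf.ols(
    "Sales ~ C(Location, Treatment(reference='street'))", data=sales
).fit()
print(fit.params)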
The weighted least squares method addresses heteroscedasticity by applying weights to the observations that are inversely proportional to their error variances. This stabilizes the variance across levels of the independent variable, ensuring that each observation contributes appropriately to the model fit.
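The sketch below shows one common way to carry this out with statsmodels. The particular weight choice (reciprocals of squared predicted absolute residuals from a preliminary fit) is an assumption rather than the only option, and x and y are assumed to be one-dimensional NumPy arrays.

```python
import numpy as np
import statsmodels.api as sm

# x and y are assumed to be one-dimensional NumPy arrays of observations.
X = sm.add_constant(x)

# Step 1: ordinary least squares fit and the absolute residuals.
ols_fit = sm.OLS(y, X).fit()
abs_resid = np.abs(ols_fit.resid)

# Step 2: model the error spread by regressing |residuals| on x, then take
# weights inversely proportional to the squared predicted absolute residuals.
spread_fit = sm.OLS(abs_resid, X).fit()
weights = 1.0 / spread_fit.fittedvalues ** 2

# Step 3: weighted least squares using those weights.
wls_fit = sm.WLS(y, X, weights=weights).fit()
print(wls_fit.params)
```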
SSE, or the sum of squares for error, measures the total deviation of the observed values from the fitted values in the regression model. It is calculated by summing the squared differences between each observed value and its corresponding predicted value: SSE = Σ(y_i − ŷ_i)².
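In code this is a one-line computation; y and y_hat are assumed to be arrays of observed and predicted values.

```python
import numpy as np

# Sum of squares for error: sum over i of (y_i - y_hat_i)^2
sse = float(np.sum((y - y_hat) ** 2))
```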
Multicollinearity inflates the variances of the coefficient estimates in a regression model, and this inflation is quantified by the variance inflation factor (VIF). A VIF greater than 1 indicates that multicollinearity is present, and the VIF increases as the multicollinearity becomes stronger. A VIF greater than 10 indicates severe multicollinearity, and a VIF greater than 5 indicates moderately strong multicollinearity. This inflation hinders the assessment of variable significance because it decreases the precision of the estimates and can make some variables appear less important than they really are. Thus the VIF is directly related to the degree of multicollinearity: as multicollinearity increases, the VIF increases, indicating inflated variances that can adversely affect the stability and reliability of the coefficient estimates.
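A minimal sketch for computing VIFs with statsmodels is given below; the DataFrame name predictors is an assumption and should contain only the independent variables.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(predictors)   # add the intercept column

# VIF for each independent variable (the constant column is skipped).
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=X.columns[1:],
)
print(vifs)   # values above 10 flag severe, above 5 moderately strong
```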
Finding that F(model) > F[α] in regression analysis implies that the null hypothesis, which states that the regression model provides no better fit to the data than a model with no independent variables, is rejected. This indicates that at least one of the independent variables contributes significantly to predicting the dependent variable, so the regression model improves on a model without these predictors. When F(model) exceeds the critical value F[α] for a given significance level, the data provide sufficient evidence that the model with predictors is statistically significant in explaining variability in the response variable.
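The comparison can be sketched with scipy; the numbers of predictors and observations and the F(model) value below are placeholders, not values taken from the text.

```python
from scipy.stats import f

alpha = 0.05
k, n = 3, 20            # placeholder: k predictors, n observations
f_model = 12.7          # placeholder overall F statistic from the ANOVA table

# Critical value F[alpha] with k numerator and n - (k + 1) denominator df.
f_crit = f.ppf(1 - alpha, dfn=k, dfd=n - (k + 1))

print(f_crit, f_model > f_crit)   # reject H0 when F(model) > F[alpha]
```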
The significance of the p-value in the context of a regression model's F test is that it determines whether to reject the null hypothesis H0 that all coefficients of the independent variables in the model are zero. The p-value is the probability of observing an F statistic as extreme as or more extreme than the observed value, assuming H0 is true. If the p-value is less than the chosen level of significance α, we reject H0, indicating that at least one independent variable significantly affects the dependent variable. This approach is more informative and convenient than comparing the statistic against F critical values from tables. A small p-value means the observed F statistic would be highly unlikely under the null hypothesis, pointing to the significance of the regression model.
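Continuing the placeholder example from the previous sketch, the p-value is the upper-tail area beyond the observed F statistic.

```python
from scipy.stats import f

k, n = 3, 20       # same placeholder degrees-of-freedom inputs as above
f_model = 12.7     # same placeholder observed F statistic

p_value = f.sf(f_model, dfn=k, dfd=n - (k + 1))   # P(F >= f_model) under H0
print(p_value)     # reject H0 if p_value < alpha
```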
The slope of the line of means, β1, represents the change in the mean value of the dependent variable y associated with a one-unit increase in the independent variable x. A positive slope indicates that the mean value of y increases as x increases, while a negative slope indicates that the mean value of y decreases as x increases. The slope is a parameter of the linear regression equation that helps in estimating or predicting the value of y from x.
The interpretation of the y-intercept, β0, often has dubious practical value because it represents the mean value of the dependent variable when the independent variable equals zero, which may not be meaningful or even possible in many contexts. This is especially true when x = 0 lies outside the range of the observed data or is not a reasonable value for the variable, making the interpretation of the y-intercept irrelevant or misleading.