Simple Regression: Multiple-Choice Questions
Multiple-Choice Questions
1. We can show that, when the null hypothesis H0: ρ = 0 is true and the random
variables have a joint normal distribution, then the random variable
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$,
which is used to test the hypothesis that there is no linear association in the
population between a pair of random variables, follows the
A) normal distribution
B) Student's t distribution
C) chi-square distribution
D) F distribution
2. In the assumptions of the linear regression model, which of the following is not
one of the assumptions regarding the error term?
4. Which of the following sum of squares is minimized by the least squares method?
A) Increasing SSE.
B) Decreasing the absolute value of b1 .
C) Increasing the variance of X.
D) Decreasing sample size.
6. A regression analysis between sales (in $1000) and advertising (in $) resulted in
the following least squares line: ŷ = 80,000 + 4x. This implies that:
9. What is the most accurate statement you can make for testing the hypotheses in
the previous question?
A) Reject H0 at α = 0.01
B) Reject H0 at α = 0.05
C) Reject H0 at α = 0.025
D) Reject H0 at α = 0.005
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.442
R Square            “A”
Adjusted R Square   0.133
Standard Error      “B”
Observations        15.000
ANOVA
             df    SS        MS       F       Significance F
Regression   1     44.397    44.397   3.154   “C”
Residual     13    “D”       14.076
Total        14    227.389
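The entries marked “A”, “B”, and “D” in the output above follow from the printed values through the usual identities: R Square is the square of Multiple R, the standard error is the square root of the residual mean square, and SSE = SST − SSR. A minimal Python sketch, assuming the printed figures are accurate to the digits shown (the variable names are illustrative and not part of the original output):

```python
import math

# Values printed in the summary output (15 observations).
multiple_r = 0.442
ssr = 44.397        # regression sum of squares
sst = 227.389       # total sum of squares
mse = 14.076        # residual mean square

r_square = multiple_r ** 2      # entry "A"
std_error = math.sqrt(mse)      # entry "B"
sse = sst - ssr                 # entry "D"

print(round(r_square, 3), round(std_error, 2), round(sse, 2))
```

As a consistency check, dividing the computed SSE by the 13 residual degrees of freedom should reproduce the printed residual mean square.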
A) 0.195
B) 0.805
C) 0.442
D) 0.67
A) 2.58
B) 6.67
C) 3.75
D) 3.95
12. What is the most accurate statement that can be made about the value of *C*?
A) <0.01
B) >0.05
C) <0.025
D) None of the above.
A) 172.25
B) 162.42
C) 140.03
D) 182.99
A) 9.35
B) 3.06
C) 9.82
D) 22.96
A) 1.136
B) -1.136
C) 0.278
D) -0.278
A) 0.025
B) 0.05
C) 0.10
D) 0.01
17. In order to estimate with 95% confidence the expected value of y in a simple
linear regression problem, a random sample of 10 observations is taken. Which
of the following t-table values listed below would be used?
A) 2.228
B) 1.860
C) 1.812
D) 2.306
18. Suppose you are trying to develop a forecast of $y_{n+1}$ based on $x_{n+1}$. Which of
the following will not reduce the prediction interval for your prediction?
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.187
R Square            0.035
Adjusted R Square   0.026
Standard Error      5652.090
Observations        115.000
ANOVA
             df     SS               MS              F       Significance F
Regression   1      130433116.219    130433116.219   4.083   0.046
Residual     113    3609911959.868   31946123.539
Total        114    3740345076.087
20. How would you best explain the y-intercept in this situation?
A) For each additional 1-year increase in the age of the driver, we would expect
damage to increase by $10,726.
B) For each additional 1-year increase in the age of the driver, we would expect
damage to increase by $70.
C) It makes no sense to explain the intercept in this situation, since we cannot
have a driver with age of zero.
D) The average amount of damage was $10,726.
21. On average, what would be the dollar value of an accident involving a 25-year-old
driver?
A) $11,836.56
B) $10,795.47
C) $13,372.58
D) $2,474.90
A) 3.669
B) 2.756
C) 2.787
D) 6.785
ANSWER: C
24. What is the most accurate statement that can be made about the p-value for this
test?
A) Fail to reject H0
B) Reject H0
C) Not enough information to draw a conclusion
D) None of the above
A) 4.83.
B) 7.11.
C) 50.50.
D) 58.92.
A) 0.691.
B) 0.806.
C) 0.749.
D) 0.209.
A) 2.53.
B) 2.34.
C) 2.20.
D) 2.77.
29. What are the most appropriate null and alternative hypotheses regarding the
population correlation?
A) H0 : 0 and H1 : 0
B) H0 : 0 and H1 : 0
C) H0 : 0 and H1 : 0
D) H0 : 0 and H1 : 1
31. Develop a 90% confidence interval for the population slope if the following
regression information is given: b1 = 23.5, p-value = 0.01 and n = 25
A) 23.5 ± 17.35
B) 23.5 ± 15.35
C) 23.5 ± 16.35
D) 23.5 ± 14.35
33. The vertical spread of the data points about the regression line is measured by the:
A) regression coefficient
B) y-intercept
C) standard error of estimate
D) F-ratio
34. Which of the following will not tend to increase the standard error of the slope?
A) increase by 1 pound
B) decrease by 1 pound
C) decrease by 24 pounds
D) increase by 5 pounds
A) 0.2929
B) 0.7122
C) 0.5408
D) 0.4671
A) -0.6834
B) 0.5412
C) 0.6834
D) –0.5412
A) 103.08
B) 179.23
C) 55.52
D) 74.37
40. Which of the following would most likely represent a 95% confidence interval for
the estimate of Y, given X = $\bar{X}$?
A) 55.52 ± 6.21
B) 55.52 ± 12.42
C) 55.52 ± 40.4
D) 55.52 ± 18.63
A) 0.043
B) 54.85
C) 0.978
D) 0.025
A) 2.038
B) 1.035
C) 3.832
D) 3.463
A) 0.678
B) 0.791
C) 0.905
D) 0.775
44. What is the standard deviation of the number of hours households spend
connected to the internet each month?
A) 12.817
B) 14.667
C) 15.679
D) 11.990
A) 136.313
B) 155.786
C) 181.750
D) 159.032
A) 0.678
B) 0.791
C) 0.905
D) 0.775
A) 0.6337
B) 0.9482
C) 0.5541
D) 0.6475
A) 7.87
B) 8.87
C) 8.37
D) 9.37
A) 691.062
B) 1033.601
C) 461.812
D) 437.918
A) 116.399
B) 458.938
C) 712.082
D) 688.188
A) 0.637
B) 0.575
C) 0.601
D) 0.664
52. What is the estimate of the variance of the population model error?
A) 118.347
B) 114.698
C) 19.399
D) 76.156
53. What is the standard error of the slope of the regression line of hours on income?
A) 0.256
B) 0.234
C) 0.211
D) 0.269
54. What is the value of the test statistic for testing H0 : 1 0 vs. H1 : 1 0 ?
A) 2.36
B) 3.00
C) 2.48
D) 2.71
56. Suppose we were to run a linear regression using the data in the following scatter
plot.
[Scatter plot not reproduced: vertical axis from 0 to 70, horizontal axis from 0 to 50]
What are the most reasonable values for the y-intercept b0 and the slope b1?
A) b0 = 45 and b1 = 2
B) b0 = 45 and b1 = -2
C) b0 = 45 and b1 = -20
D) b0 = 120 and b1 = -2
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.492
R Square            0.242
Adjusted R Square   0.208
Standard Error      40.975
Observations        24.000
ANOVA
             df   SS          MS          F       Significance F
Regression   1    11809.406   11809.406   7.034   *
Residual     *    *           *
Total        *    *
A) 21
B) 22
C) 23
D) 24
A) 1,678.9
B) 1,554.2
C) 1,493.6
D) 1,407.3
A) 21
B) 22
C) 23
D) 24
A) 10,945.2
B) 11,759.9
C) 10,130.5
D) 36,935.8
61. What is the value of total sum of squares?
A) 48,745.2
B) 46,538.7
C) 50,292.4
D) 52,644.8
A) 112.4
B) 102.3
C) 108.6
D) 105.5
A) 0.66
B) 0.76
C) 0.85
D) 0.61
65. A regression analysis between sales (in $1000) and advertising (in $100) resulted
in the following least squares line: ŷ = 75 + 5x. This implies that if advertising is
$800, then the predicted amount of sales (in dollars) is:
A) $4075
B) $115,000
C) $64,000
D) $79,000
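Question 65 turns on units: x is advertising in hundreds of dollars, and ŷ is sales in thousands of dollars. The conversion can be sketched in Python as follows (the function name is hypothetical, introduced only for illustration):

```python
def predicted_sales_dollars(advertising_dollars: float) -> float:
    """Evaluate y-hat = 75 + 5x, with x in $100 and y-hat in $1000."""
    x = advertising_dollars / 100       # dollars -> units of $100
    y_hat_thousands = 75 + 5 * x        # fitted line from the question
    return y_hat_thousands * 1000       # units of $1000 -> dollars

print(predicted_sales_dollars(800))
```

Keeping the unit conversions as separate steps makes it harder to substitute 800 directly for x, which is the error the distractors are built around.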
66. In publishing the results of some research work, the following values of the
coefficient of determination were listed. Which one would appear to be
incorrect?
A) 0.91
B) 0.06
C) 0.47
D) -0.64
69. The regression line ŷ = 3 + 2x has been fitted to the data points (4,8), (2,5), and
(1,2). The residual sum of squares will be:
A) 10
B) 15
C) 13
D) 22
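For question 69, the residual sum of squares is the sum of squared differences between each observed y and the value the given line predicts at the same x. A short Python sketch of that arithmetic (names are illustrative):

```python
points = [(4, 8), (2, 5), (1, 2)]

def fitted(x):
    """Fitted value from the line y-hat = 3 + 2x given in the question."""
    return 3 + 2 * x

# Residual = observed y minus fitted y, for each data point.
residuals = [y - fitted(x) for x, y in points]
sse = sum(e ** 2 for e in residuals)

print(residuals, sse)
```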
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.834298
R Square            0.696053
Adjusted R Square   0.645395
Standard Error      3.924283
Observations        8
ANOVA
             df   SS      MS      F          Significance F
Regression   1    211.6   211.6   13.74026   0.010007609
Residual     6    92.4    15.4
Total        7    304
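The entries in this output are linked by a few identities that can be checked numerically. The Python sketch below recomputes R Square, Adjusted R Square, the standard error, and F from the sums of squares. It also takes the square root of F as the magnitude of the slope t statistic, a relationship that holds for simple regression with a single predictor. Variable names are illustrative.

```python
import math

n = 8
ssr, sse, sst = 211.6, 92.4, 304.0
df_reg, df_res = 1, n - 2

r_square = ssr / sst                                   # SSR / SST
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_res   # penalised for df
mse = sse / df_res                                     # residual mean square
std_error = math.sqrt(mse)                             # standard error of estimate
f_stat = (ssr / df_reg) / mse                          # MSR / MSE
t_abs = math.sqrt(f_stat)                              # |t| for the slope

print(round(r_square, 6), round(adj_r_square, 6))
print(round(std_error, 6), round(f_stat, 5), round(t_abs, 3))
```

Small differences between a value computed this way and a printed answer choice can arise when the choice was worked from rounded intermediate figures.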
A) 0.696
B) 0.834
C) 0.645
D) 0.803
71. What is the value of the test statistic for testing H0 : 0 vs. H1 : 0 ?
A) 4.439
B) 4.113
C) 3.702
D) 3.305
72. What is (are) the critical value(s) for testing the hypotheses in the second question
at 0.05 level of significance?
A) +1.943
B) 2.447
C) +1.645
D) –1.645
73. Based on your answers to the second and third questions, what is your conclusion
at the 0.05 level of significance?
A) Since the test statistic equals 4.439 and the critical value equals 1.645, we
reject the null hypothesis and assume there is a positive correlation between
the variables
B) Since the test statistic equals 4.113 and the critical value equals -1.645, we fail
to reject the null hypothesis and assume there is no correlation between the
variables.
C) Since the test statistic equals 3.305 and the critical value equals 2.447, we
reject the null hypothesis and assume there is a correlation between the
variables.
D) Since the test statistic equals 3.702 and the critical value equals 1.943, we
reject the null hypothesis and assume there is a positive correlation between
the variables
A) 0.696
B) 0.645
C) 0.834
D) 1.241
A) 3.71
B) 1.24
C) 4.60
D) 3.92
A) 11.17
B) 64.30
C) 92.40
D) 13.74
77. What is the approximate point estimate of the final exam score for a student who
has studied 5.45 hours?
A) 81
B) 92
C) 85
D) 89
78. The following values are listed as coefficients of correlation (r). The one that
indicates an inverse relationship between the two variables x and y is:
A) 0.0
B) -0.8
C) 0.9
D) 1.3
79. Suppose we were to run a linear regression using the data in the following scatter
plot.
[Scatter plot not reproduced: vertical axis from 0 to 120, horizontal axis from 0 to 50]
What are the most reasonable values for the y-intercept b0 and the slope b1?
A) b0 = 0 and b1 = 1
B) b0 = 25 and b1 = 2.5
C) b0 = -25 and b1 = -2.5
D) b0 = 0 and b1 = -1
80. Given the least squares regression line ŷ = -2.88 + 1.77x and a coefficient of
determination of 0.81, the coefficient of correlation is:
A) -0.88
B) +0.88
C) +0.90
D) –0.90
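In simple regression the coefficient of correlation has the magnitude of the square root of the coefficient of determination and the sign of the slope b1. A brief Python sketch of that rule (the function name is hypothetical):

```python
import math

def correlation_from_r_square(r_square: float, slope: float) -> float:
    """Return r = sign(slope) * sqrt(R^2) for a simple linear regression."""
    return math.copysign(math.sqrt(r_square), slope)

# Values from question 80: R^2 = 0.81 and slope b1 = 1.77.
print(correlation_from_r_square(0.81, 1.77))
```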
81. Suppose that we are interested in exploring the determinants of successful high
schools. One possible measure of success might be the percentage of students
who go on to college. The teachers’ union argues that there should be a
relationship between the average teachers’ salary and high school success. The
following regression line is obtained: “% of students going on to college = 13 +
0.001 × Average Teachers’ Salary.” Which of the following statements is true?
83. Which of the following is used to plot the dependent variable versus the
independent variable?
A) Histogram
B) Bar chart
C) Pie chart
D) Scatter diagram
84. In a regression problem the following pairs of (x, y) are given: (2, 1), (2,-1), (2, 0),
(2,-2) and (2, 2). That indicates that the:
A) coefficient of correlation is –1
B) coefficient of correlation is 0
C) coefficient of correlation is 1
D) coefficient of determination is between –1 and 1
ANSWER: B
85. Suppose we were to run a linear regression using the data in the following scatter
plot.
[Scatter plot not reproduced: vertical axis from 0 to 40, horizontal axis from 0 to 120]
What are the most reasonable values for the y-intercept b0 and the slope b1?
A) b0 = -20 and b1 = 2
B) b0 = 0 and b1 = -2
C) b0 = -20 and b1 = 0.5
D) b0 = 20 and b1 = 2
86. Which of the following statements is true regarding the coefficient of correlation?
A) –0.40
B) –0.60
C) +0.40
D) +0.53
89. If the coefficient of correlation is 0.75, what does the coefficient of determination
equal?
A) 0.8660
B) 0.5625
C) 0.5916
D) 0.6123
A) predict the value of the dependent variable y for given value of the
independent variable X
B) predict the value of the independent variable X for a given value of the
dependent variable Y
C) measure the strength of the relationship between X and Y
D) All of the above.
91. Which of the following is true about the standard error of estimate?
92. If the least squares equation is Ŷ = 20 + 5X, then the value of 5 indicates
94. If all the points on a scatter diagram lie on a straight line, what is the standard
error of estimate?
A) –1
B) +1
C) 0
D)
95. Simple linear regression and correlation analysis require that the scales of
measurement be expressed in either
A) nominal or ordinal
B) ordinal or interval
C) interval or ratio
D) ratio or nominal
96. Which of the following table values would be appropriate for a 90% confidence
interval for the mean of y from a simple linear regression problem if the sample
size is 13?
A) 1.782
B) 1.796
C) 1.645
D) 2.179
97. The following values of the coefficient of determination were listed in some
research articles. Which one would appear to be incorrect?
A) -0.81
B) 0.96
C) 0.52
D) 0.00
98. The regression sum of squares (SSR) is 83.6. Which of the following must be
true?
A) H 0 : 0 0
B) H0 : 0
C) H0 : r 0
D) Any of the above
100. In a regression problem the following pairs of (x, y) are given: (-2, 4), (-1, 1), (0,
0), (1, 1) and (2, 4). What does this indicate about the value of coefficient of
determination?
A) It is –1
B) It is +1
C) It is 0
D) It is undefined
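The data in question 100 satisfy y = x² exactly, so they make a useful check that the coefficient of determination from a straight-line fit measures only linear association. A minimal Python sketch computing r and r² from their definitions (variable names are illustrative):

```python
import math

xs = [-2, -1, 0, 1, 2]
ys = [4, 1, 0, 1, 4]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of cross-products and squared deviations about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)
r_square = r ** 2

print(r, r_square)
```

The positive and negative cross-products cancel because the points are symmetric about x = 0.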
True-False Questions
101. Regression analysis is used to measure the strength of the association between
two numerical variables, while correlation analysis is used for prediction.
102. The residual and the model error are equivalent concepts.
103. In general, increasing the number of observations will lead to a higher coefficient
of determination.
105. We can always substitute any value for x into a least-squares regression line
ŷ = b0 + b1x and make a meaningful decision about the predicted value of y.
106. One of the standard assumptions of the linear regression model is that the
variance of the error terms is equal to one.
107. When the predicted values of y and the actual values of y are the same, the
standard error of estimate will be 0.0.
109. The coefficient of determination is a statistical test of the fit of linear regression.
111. The width of the confidence interval estimate for the average value of Y does not
depend on the standard error of the estimate.
112. If the correlation coefficient is greater than 0.5 then the slope of the simple
regression model is greater than 1.
113. You give a pre-employment examination to your applicants. The test is scored
from 1 to 100. You have data on their sales at the end of one year measured in
dollars. You want to know if there is any linear relationship between pre-
employment examination score and sales. An appropriate test to use is the t-test
on the population correlation coefficient.
114. If the correlation coefficient is greater than 0.5 then the coefficient of
determination R² from a simple regression model is greater than 0.25.
115. Suppose that for two random variables, X and Y, we test H0: ρ = 0 against a
two-sided alternative, and we are unable to reject the null hypothesis. We could
therefore conclude that there is no relationship between X and Y.
117. In the sample regression line ŷ = b0 + b1x, the term b0 is the y-intercept; this is the
value of y where the line intersects the y-axis whenever x = 0.
118. The value of the variation explained by the regression line can never be larger
than 1.0.
119. The coefficient of determination is the positive square root of the coefficient of
correlation.
120. The regression sum of squares (SSR) can never be greater than the total sum of
squares (SST).
121. The coefficient of determination is the proportion of the total variation in the
independent variable X that is explained or accounted for by its relationship with
the dependent variable Y.
122. When testing the strength of the relationship between two variables, the null
hypothesis of interest is H 0 : 0
123. The smaller the sample size, the smaller the standard error of estimate.
125. Regression analysis is the technique used to measure the strength of the
relationship between two variables using the coefficient of correlation and the
coefficient of determination.
126. The least squares method minimizes the sum of the vertical distances between the
actual values Y and the predicted values Ŷ.
127. For a given data set of (x, y) values, an infinite number of possible regression
equations can be fitted to the corresponding scatter diagram, and each equation
will have a unique combination of values for the y-intercept b0 and the slope
b1. However, only one equation will be the “best fit” as defined by the
least-squares criterion.
128. The coefficient of determination represents the ratio of SSR to SST.
129. In simple linear regression, the fit of the regression equation to the data is
improved as error sum of squares increases and regression sum of squares
decreases.
131. When we compute the sample correlation r from data, the result will be definitely
zero when the population correlation is zero.
133. When testing the strength of the relationship between two variables, the alternate
hypothesis is H1 : 0
134. The divisor of the standard error of estimate in simple linear regression is: n – 2.
135. The purpose of correlation analysis is to find how strong the relationship is
between two variables.
136. The strength of the correlation between two variables depends on the sign of the
coefficient of correlation.
137. When the predicted values of y and the actual values of y are the same, the
standard error of estimate will be -1.0.
138. The t-test for the true slope β1 = 0 is identical to the t-test for the true correlation
ρ = 0.
139. The regression sum of squares represents the variability that is explained by the
intercept of the regression equation.
140. When the predicted values of y and the actual values of y are the same value, the
standard error of the estimate will be 1.0.
141. Correlation coefficients of –0.95 and +0.95 represent relationships between two
variables that have equal strength but different directions.
142. The uniform variance assumption for the linear regression model states that the
error terms are random variables with a mean equal to one and the same variance.
143. The value of the variation explained by the regression line can never be smaller
than 0.0.
144. The regression sum of squares (SSR) can never be larger than the error sum of
squares (SSE).
146. The regression sum of squares (SSR) can never be larger than the total sum of
squares.
147. If the coefficient of determination is 0.64, then the correlation coefficient must be
0.80.
149. The hypothesis test for the population slope relies on the F distribution.
151. What factors would tend to reduce the variation of a prediction from a simple
linear regression?
ANSWER:
a) An increase in the variation of X
b) An increase in sample size
c) Predictions for X values closer to the mean of X
d) Higher coefficient of determination
e) Lower significance level associated with the prediction interval.
152. The management of a local hotel is interested in determining the optimal staffing
of the dining room. They believe that the number of overnight guests may
explain the number of dinners served on a particular evening. They collected
some data and ran a linear regression. The equation of the regression line is given
by:
Number of dinners served = 120 + 0.60 × Number of overnight guests.
Interpret these results for the hotel management.
153. For a random sample of 263 professionals, the correlation between their age and
their income was found to be 0.17. Use the 0.05 level of significance to test the null
hypothesis that there is no linear relationship between these two variables against
the alternative that there is a positive relationship.
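The test in question 153 uses the statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The Python sketch below computes it for r = 0.17 and n = 263. As an assumption made here for convenience, the one-sided critical value is approximated by the standard normal quantile, which is close to the t quantile at 261 degrees of freedom; a t table would give a marginally larger value.

```python
import math
from statistics import NormalDist

r, n = 0.17, 263

# Test statistic for H0: rho = 0 against H1: rho > 0.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: normal approximation to the t critical value (261 df),
# upper-tail area 0.05.
critical = NormalDist().inv_cdf(0.95)

print(round(t_stat, 3), round(critical, 3), t_stat > critical)
```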
154. What is the difference between a population linear regression model and an
estimated linear regression model?
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.187
R Square            *
Adjusted R Square
Standard Error      *
Observations        115.000
ANOVA
             df        SS               MS              F   Significance F
Regression   1.000     *                130433116.219   *   0.046
Residual     113.000   3609911959.868   *
Total        114.000   3740345076.087
161. What is the p-value associated with the independent variable age?
168. What is the correlation coefficient for the stock returns of the two years? What
sign does it have? Why?
173. Use the F-statistic in the eleventh question to test the hypothesis in the tenth
question at α = 0.05.
174. Have you noticed any relationship between the Student’s t-statistic computed for
the slope coefficient, b1, in the tenth question for α = 0.025 and the F-statistic
computed in the eleventh question for α = 0.05?
175. In a study it was shown that for a sample of 375 college faculty the
correlation was 0.15 between annual raises and teaching evaluations. What would
be the coefficient of determination of a regression of annual raises on teaching
evaluations for this sample? Interpret your results.
176. Suppose that we obtained an estimated equation for the regression of weekly sales
of ice cream on the price charged during the week. Interpret the constant b0 for
the product brand manager.
177. Suppose you are interested in understanding why some people send more e-mail
messages than others do. One possible explanation may be the age of the
individual. Older people tend to be less technologically savvy, and may have
fewer friends who use e-mail. A researcher examines the number of e-mails a
person sends weekly using regression analysis and comes up with the following
equation of the regression line:
Number of e-mails = 63 – 0.5 × Age. Interpret these results.
Years of Employment   14     23     14     19     19     12     10     4
Income                69.7   71.2   68.2   71.5   69.7   70.1   67.6   66.1
180. Use a computer to run the simple linear regression analysis of income on length of
employment.
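One way to carry out question 180 without a statistics package is to compute the least-squares quantities directly from the eight data pairs listed with the question. The following Python sketch is an illustration under that approach, not the output the question expects from any particular software; variable names are illustrative.

```python
years = [14, 23, 14, 19, 19, 12, 10, 4]
income = [69.7, 71.2, 68.2, 71.5, 69.7, 70.1, 67.6, 66.1]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(income) / n

# Sums of cross-products and squared deviations about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, income))
sxx = sum((x - x_bar) ** 2 for x in years)
sst = sum((y - y_bar) ** 2 for y in income)

b1 = sxy / sxx                      # slope
b0 = y_bar - b1 * x_bar             # intercept
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, income))
r_square = 1 - sse / sst            # coefficient of determination
std_error = (sse / (n - 2)) ** 0.5  # standard error of the estimate

print(f"income-hat = {b0:.3f} + {b1:.3f} * years")
print(f"R^2 = {r_square:.3f}, s_e = {std_error:.3f}")
```

The same quantities feed the later tests on the slope, since the standard error of b1 is s_e divided by the square root of `sxx`.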
185. Test the hypotheses in the previous question using the F-statistic.
187. Develop an approximation of the 95% confidence interval for the expected value
of Y, given X = $\bar{X}$.
190. Explain the difference between the residual $e_i$ and the model error $\varepsilon_i$.
191. Compute the coefficients for a least squares regression equation and write the
equation, given the following sample statistics:
$\bar{x} = 12$; $\bar{y} = 48$; $s_x = 90$; $s_y = 72$; $r_{xy} = 0.45$; $n = 50$.
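For question 191, the least-squares slope can be obtained from summary statistics as b1 = r·(s_y / s_x), and the intercept as b0 = ȳ − b1·x̄. The Python sketch below applies these two formulas, reading the statistics in the question as x̄ = 12, ȳ = 48, s_x = 90, s_y = 72 and r_xy = 0.45 (variable names are illustrative):

```python
x_bar, y_bar = 12, 48
s_x, s_y = 90, 72
r_xy = 0.45

b1 = r_xy * s_y / s_x       # slope from correlation and standard deviations
b0 = y_bar - b1 * x_bar     # line passes through (x_bar, y_bar)

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

The sample size n = 50 is not needed for the coefficients themselves; it matters only for inference about them.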
194. Calculate the covariance between instructor ratings and expected grades.
195. Calculate the sample correlation coefficient between instructor ratings and
expected grades.
197. Test at the 10% significance level the hypothesis that the population
correlation coefficient is zero against the alternative that it is positive.
198. What does the least-squares criterion have to do with obtaining a regression line
for a given set of data?
$\sum_{i=1}^{n} y_i = 24.8$, $\sum_{i=1}^{n} x_i = 27.6$, $\sum_{i=1}^{n} x_i^2 = 152.8$, and $\sum_{i=1}^{n} x_i y_i = 158.5$
approximately equal size, $\bar{x} = 30.0$, $\bar{y} = 18.8$, $\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = 412$, $\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1} = 212$
where Y = number of bottles sold per evening, and X = average cost, in dollars, of a
meal.
$\sum_{i=1}^{n}(y_i - \bar{y})^2 = 250$; $R^2 = 0.84$; n = 50.
$\hat{y}_i = 6 + 8x_i$ and also the following statistics: $s_e = 3.20$, $\bar{x} = 8$, n = 42, and $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 420$.
208. Find the 95% confidence interval for the point where x = 18.
209. Find the 95% prediction interval for the point where x = 18.
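The confidence interval in question 208 and the prediction interval in question 209 share the same centre, the fitted value at x = 18, and differ only in their half-widths. The Python sketch below computes both half-widths. It reads the statistics given with these questions as s_e = 3.20, x̄ = 8, n = 42 and Σ(x_i − x̄)² = 420, and assumes the table value t = 2.021 for 40 degrees of freedom and an upper-tail area of 0.025. Variable names are illustrative.

```python
import math

s_e = 3.20          # standard error of the estimate
x_bar = 8           # sample mean of x
n = 42              # sample size, so n - 2 = 40 degrees of freedom
sxx = 420           # sum of squared deviations of x about its mean
x_new = 18          # point at which the intervals are wanted
t_crit = 2.021      # assumed table value: t with 40 df, upper 2.5%

# Term shared by both intervals: 1/n + (x_new - x_bar)^2 / Sxx.
distance = 1 / n + (x_new - x_bar) ** 2 / sxx

ci_half_width = t_crit * s_e * math.sqrt(distance)       # mean response
pi_half_width = t_crit * s_e * math.sqrt(1 + distance)   # single new y

print(round(ci_half_width, 2), round(pi_half_width, 2))
```

The extra 1 under the square root is what makes the prediction interval wider: it accounts for the variability of an individual observation around the regression line, in addition to the uncertainty in the line itself.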