
Simple Regression: Multiple-Choice Questions

This document contains multiple-choice and true-false questions about simple linear regression. The questions cover topics such as the assumptions of the linear regression model; hypothesis testing of regression coefficients; interpreting R-squared, standard errors, t-statistics, and p-values from regression output; using the regression equation to make predictions; and testing for linear relationships between variables using the sample correlation coefficient. The questions are based on examples of regressions relating variables such as exam grades, car prices, insurance claims, and income. Correct answers are not provided for most questions.

Uploaded by

Nameera Alam

Simple Regression

Multiple-Choice Questions

1. We can show that, when the null hypothesis $H_0: \rho = 0$ is true and the random
variables have a joint normal distribution, then the random variable
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},$$
which is used to test the hypothesis that there is no linear association in the
population between a pair of random variables, follows the

A) normal distribution
B) Student's t distribution
C) chi-square distribution
D) F distribution
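
As a minimal sketch of the statistic in question 1, the Python function below evaluates t = r√(n − 2)/√(1 − r²). The function name and the sample inputs are illustrative assumptions and are not taken from any question in this set.

```python
import math

def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical inputs, chosen only for illustration.
t = correlation_t_statistic(r=0.60, n=27)
print(round(t, 3))
```

The result is compared with a Student's t table value for n − 2 degrees of freedom; the table lookup is left to the reader.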

2. In the assumptions of the linear regression model, which of the following is not
one of the assumptions regarding the error term?

A) The mean of the error term is 0.


B) The error term follows the t-distribution.
C) The variance of the error term is constant.
D) The error terms across observations are independent of one another.

3. Test H0 :   1 vs. H1 :   1 if the following regression information are given: b1 =


1.39, sb1 = 0.18, and n = 30.

A) Reject $H_0$ for $\alpha = 0.01$
B) Reject $H_0$ for $\alpha = 0.10$
C) Reject $H_0$ for $\alpha = 0.005$
D) Unable to reject $H_0$ for $\alpha < 0.10$

4. Which of the following sum of squares is minimized by the least squares method?

A) Error sum of squares


B) Regression sum of squares
C) Total sum of squares
D) All of the above

5. Which of the following will tend to make it more likely to reject $H_0: \beta_1 = 0$?

A) Increasing SSE.
B) Decreasing the absolute value of b1 .
C) Increasing the variance of X.
D) Decreasing sample size.

6. A regression analysis between sales (in $1000) and advertising (in $) resulted in
the following least squares line: ŷ = 80,000 + 4x. This implies that:

A) an increase of $1 in advertising is expected to result in an increase of $4 in
sales
B) an increase of $4 in advertising is expected to result in an increase of $4,000 in
sales
C) an increase of $1 in advertising is expected to result in an increase of $80,004
in sales
D) an increase of $1 in advertising is expected to result in an increase of $4,000
in sales

7. If you perform a linear regression analysis of hours on income, which of the
following statements is the most accurate statement regarding the p-value
associated with the hypothesis test for the population slope, assuming a two-sided
alternative?

A) 0.02< p-value < 0.5


B) 0.01< p-value < 0.02
C) 0.05< p-value < 0.10
D) p-value > 0.10

THE NEXT TWO QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


You want to explore the relationship between the grades students receive on their first
two exams. For a sample of 15 students, you find a correlation coefficient of 0.47.

8. What is the value of the test statistic for testing $H_0: \rho = 0$ vs. $H_1: \rho > 0$?


A) 2.80
B) 1.06
C) 1.39
D) 1.92

9. What is the most accurate statement you can make for testing the hypotheses in
the previous question?

A) Reject $H_0: \rho = 0$ at $\alpha = 0.01$
B) Reject $H_0: \rho = 0$ at $\alpha = 0.05$
C) Reject $H_0: \rho = 0$ at $\alpha = 0.025$
D) Reject $H_0: \rho = 0$ at $\alpha = 0.005$

THE NEXT SEVEN QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


The manager of a used-car dealership is very interested in the resale price of used cars.
The manager feels that the age of the car is important in determining the resale value. He
collects data on the age and resale value of 15 cars and runs a regression analysis with the
value of the car (in thousands of dollars) as the dependent variable and the age of the car
(in years) as the independent variable. Unfortunately, he spilled his coffee on the printout
and lost some of the results, identified by “A” through “G”. The partial results left are
displayed below.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.442
R Square              “A”
Adjusted R Square     0.133
Standard Error        “B”
Observations          15.000

ANOVA
              df    SS         MS        F        Significance F
Regression     1    44.397     44.397    3.154    “C”
Residual      13    “D”        14.076
Total         14    227.389

              Coefficients    Standard Error    t Stat    P-value
Intercept     “E”             3.835             5.988     0.000
Age           “F”             0.640             -1.776    “G”

10. What is the value of *A*?

A) 0.195
B) 0.805
C) 0.442
D) 0.67

11. What is the value of *B*?

A) 2.58
B) 6.67
C) 3.75
D) 3.95

12. What is the most accurate statement that can be made about the value of *C*?

A) <0.01
B) >0.05
C) <0.025
D) None of the above.

13. What is the value of *D*?

A) 172.25
B) 162.42
C) 140.03
D) 182.99

14. What is the value of *E*?

A) 9.35
B) 3.06
C) 9.82
D) 22.96

15. What is the value of *F*?

A) 1.136
B) -1.136
C) 0.278
D) -0.278

16. What is the approximate value of *G*?

A) 0.025
B) 0.05
C) 0.10
D) 0.01
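
The missing entries in output of this kind are tied together by a few identities: SST = SSR + SSE, R² = SSR/SST, MSE = SSE/(n − 2), the standard error of estimate is √MSE, F = MSR/MSE, and each coefficient equals its t statistic times its standard error. The sketch below applies those identities; the function names and the input values are hypothetical and chosen only for illustration.

```python
import math

def complete_anova(ssr: float, sst: float, n: int) -> dict:
    """Fill in simple-regression output from SSR, SST and the sample size n."""
    sse = sst - ssr               # SST = SSR + SSE
    mse = sse / (n - 2)           # residual df is n - 2 with one predictor
    return {
        "r_square": ssr / sst,
        "sse": sse,
        "mse": mse,
        "std_error": math.sqrt(mse),   # standard error of estimate
        "f_stat": ssr / mse,           # MSR / MSE; MSR = SSR since regression df = 1
    }

def coefficient_from_t(t_stat: float, std_error: float) -> float:
    """Recover a coefficient from its t statistic: b = t * s_b."""
    return t_stat * std_error

# Hypothetical values, chosen only to illustrate the identities.
table = complete_anova(ssr=60.0, sst=100.0, n=12)
print(table)
print(coefficient_from_t(t_stat=-2.5, std_error=0.4))
```

In simple regression the F statistic also equals the square of the slope's t statistic, which gives a quick consistency check on a partially legible printout.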

17. In order to estimate with 95% confidence the expected value of y in a simple
linear regression problem, a random sample of 10 observations is taken. Which
of the following t-table values listed below would be used?

A) 2.228
B) 1.860
C) 1.812
D) 2.306

18. Suppose you are trying to develop a forecast of $y_{n+1}$ based on $x_{n+1}$. Which of the
following will not reduce the width of the prediction interval for your prediction?

A) Lower the variation in the independent variable.
B) Lower the standard error of estimate.
C) Increase the sample size.
D) Choose a value of $x_{n+1}$ closer to the mean of x.
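
For reference, the usual prediction interval for a single new observation is ŷ ± t · s_e · √(1 + 1/n + (x_new − x̄)² / Σ(x − x̄)²). The sketch below computes that half-width so the effect of each input can be explored; the function name, argument names, and numbers are assumptions made for illustration, and the t value must be looked up separately for n − 2 degrees of freedom.

```python
import math

def prediction_half_width(t_crit, s_e, n, x_new, x_mean, sxx):
    """Half-width of the prediction interval for one new observation.

    sxx is the sum of squared deviations of x about its mean.
    """
    return t_crit * s_e * math.sqrt(1.0 + 1.0 / n + (x_new - x_mean) ** 2 / sxx)

# Hypothetical inputs: same setting, but the second call has more spread in x.
base = prediction_half_width(t_crit=2.0, s_e=3.0, n=20, x_new=12.0, x_mean=10.0, sxx=80.0)
more_spread = prediction_half_width(t_crit=2.0, s_e=3.0, n=20, x_new=12.0, x_mean=10.0, sxx=160.0)
print(round(base, 4), round(more_spread, 4))
```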

THE NEXT FOUR QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


An insurance company analyst is interested in analyzing the dollar value of damage in
automobile accidents. She collects data from 115 accidents, and records the amount of
damage as well as the age of the driver. The results of her regression analysis are listed
below.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.187
R Square              0.035
Adjusted R Square     0.026
Standard Error        5652.090
Observations          115.000

ANOVA
              df     SS                MS               F        Significance F
Regression      1    130433116.219     130433116.219    4.083    0.046
Residual      113    3609911959.868    31946123.539
Total         114    3740345076.087

              Coefficients    Standard Error    t Stat    P-value
Intercept     10725.802       1535.215          6.987     0.000
Age           69.964          34.625            2.021     0.046

19. Which of the following statements is true?

A) The F-statistic tells us that this model is statistically significant in explaining
variation in damage.
B) We should not put much faith in these results since the value of $R^2$ is less
than 10%.
C) The age of a driver is not statistically significant in explaining damage.
D) At $\alpha = 0.05$, there is no evidence of a linear relationship between age and
damage.

20. How would you best explain the y-intercept in this situation?

A) For each additional 1-year increase in the age of the driver, we would expect
damage to increase by $10,726.
B) For each additional 1-year increase in the age of the driver, we would expect
damage to increase by $70.
C) It makes no sense to explain the intercept in this situation, since we cannot
have a driver with age of zero.
D) The average amount of damage was $10,726.

21. On average, what would be the dollar value of an accident involving a 25-year-old
driver?

A) $11,836.56
B) $10,795.47
C) $13,372.58
D) $12,474.90

22. Which of the following statements is the best explanation of the $R^2$?

A) 3.5% of accident damage is explained by the age of the driver.


B) 3.5% of the variation in accident damage can be explained by variation in the
age of the driver.
C) 3.5% of the time, the amount of damage is explained by the age of the driver.
D) 3.5% of accident damage can be explained by variation in the age of the
driver.

THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


For a random sample of 263 professionals, the correlation between their age and their
income was found to be 0.17. You are interested in testing the null hypothesis that there
is no linear relationship between these two variables against the alternative that there is a
positive relationship.

23. What is the value of the test statistic?

A) 3.669
B) 2.756
C) 2.787
D) 6.785
ANSWER: C

24. What is the most accurate statement that can be made about the p-value for this
test?

A) p -value < 0.005


B) p -value < 0.01
C) p -value < 0.025
D) p -value < 0.05

25. What is your conclusion in testing $H_0: \rho = 0$ vs. $H_1: \rho > 0$ at $\alpha = 0.05$?

A) Fail to reject H 0
B) Reject H 0
C) Not enough information to draw a conclusion
D) None of the above

THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


You want to explore the relationship between the grades students receive on their first
quiz (X) and their first exam (Y). The first quiz and test scores for a sample of eight
students reveal the following summary statistics:
$\sum (x - \bar{x})(y - \bar{y}) = 353.5$, $s_x = 4.0311$, and $s_y = 18.1384$

26. What is the covariance between X and Y?

A) 4.83.
B) 7.11.
C) 50.50.
D) 58.92.

27. What is the sample correlation coefficient?

A) 0.691.
B) 0.806.
C) 0.749.
D) 0.209.

28. What is the value of the test statistic for testing $H_0: \rho = 0$?

A) 2.53.
B) 2.34.
C) 2.20.
D) 2.77.
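
The three quantities in this group follow one from another: the sample covariance is the sum of cross-deviations divided by n − 1, and the sample correlation is the covariance divided by the product of the two standard deviations. A minimal sketch, with hypothetical function names and made-up inputs:

```python
def sample_covariance(sum_cross_dev: float, n: int) -> float:
    """Sample covariance from the sum of (x - x_bar)(y - y_bar) over n pairs."""
    return sum_cross_dev / (n - 1)

def sample_correlation(cov_xy: float, s_x: float, s_y: float) -> float:
    """Sample correlation from the covariance and the two standard deviations."""
    return cov_xy / (s_x * s_y)

# Hypothetical summary statistics, for illustration only.
cov = sample_covariance(sum_cross_dev=120.0, n=11)
r = sample_correlation(cov, s_x=3.0, s_y=5.0)
print(cov, r)
```

The resulting r can then be passed to the t statistic r√(n − 2)/√(1 − r²) to test for a linear association.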

THE NEXT TWO QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


You want to explore the relationship between the scores students receive on their first
quiz and their first exam. You believe that there is a positive correlation between the two
scores.

29. What are the most appropriate null and alternative hypotheses regarding the
population correlation?

A) H0 :   0 and H1 :   0
B) H0 :   0 and H1 :   0
C) H0 :   0 and H1 :   0
D) H0 :   0 and H1 :   1

30. What is the appropriate decision rule?

A) Reject H 0 if r(n-2)/(1- r 2 )> tn  2, 


B) Reject H 0 if r n  2 / 1  r 2  tn  2, 
C) Reject H 0 if r  n  2  / 1  r 2  tn  2, 
D) Reject H 0 if r  n  2 / 1  r 2  tn  2,  / 2

31. Develop a 90% confidence interval for the population slope if the following
regression information is given: b1 = 23.5, p-value = 0.01 and n = 25

A) $23.5 \pm 17.35$
B) $23.5 \pm 15.35$
C) $23.5 \pm 16.35$
D) $23.5 \pm 14.35$

32. In a simple regression problem, if the standard error of estimate se = 18 and n =
10, then what is the error sum of squares, SSE?

A) 2916
B) 2592
C) 1800
D) 3240

33. The vertical spread of the data points about the regression line is measured by the:

A) regression coefficient
B) y-intercept
C) standard error of estimate
D) F-ratio

34. Which of the following will not tend to increase the standard error of the slope?

A) Increase the sample size.


B) Decrease the variation in X.
C) Decrease the SSE.
D) Increase the value of  .

35. The residual is defined as the difference between the:

A) actual value of y and the estimated value of y


B) actual value of x and the estimated value of x
C) actual value of y and the estimated value of x
D) actual value of x and the estimated value of y

36. A regression analysis between weight (y in pounds) and height (x in inches)
resulted in the following least squares line: ŷ = 120 + 5x. This implies that if the
height is increased by 1 inch, the weight is expected to:

A) increase by 1 pound
B) decrease by 1 pound
C) decrease by 24 pounds
D) increase by 5 pounds

THE NEXT FOUR QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Suppose we have the following information from a simple regression: b0 = 117.4, b1 = -14.39,
sb0 = 0.18, sb1 = 0.18, n = 300, $\bar{x}$ = 4.3, SST = 17045, and SSE = 12053.

37. What is the coefficient of determination?

A) 0.2929
B) 0.7122
C) 0.5408
D) 0.4671

38. What is the correlation coefficient?

A) -0.6834
B) 0.5412
C) 0.6834
D) –0.5412

39. What is the sample mean of Y?

A) 103.08
B) 179.23
C) 55.52
D) 74.37

40. Which of the following would most likely represent a 95% confidence interval for
the estimate of Y, given $X = \bar{X}$?

A) $55.52 \pm 6.21$
B) $55.52 \pm 12.42$
C) $55.52 \pm 40.4$
D) $55.52 \pm 18.63$
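
Several of the quantities asked for in this group follow from standard identities: R² = 1 − SSE/SST, the correlation is the square root of R² carrying the sign of the slope, and the least squares line passes through the point of means, so ȳ = b0 + b1·x̄. The sketch below encodes them; all names and inputs are hypothetical.

```python
import math

def r_square(sse: float, sst: float) -> float:
    """Coefficient of determination from the error and total sums of squares."""
    return 1.0 - sse / sst

def correlation_from_r_square(r2: float, slope: float) -> float:
    """Correlation has the magnitude sqrt(R^2) and the sign of the slope."""
    return math.copysign(math.sqrt(r2), slope)

def mean_of_y(b0: float, b1: float, x_mean: float) -> float:
    """The fitted line passes through (x_bar, y_bar)."""
    return b0 + b1 * x_mean

# Hypothetical inputs, for illustration only.
r2 = r_square(sse=36.0, sst=100.0)
print(round(r2, 4), round(correlation_from_r_square(r2, slope=-2.0), 4))
print(mean_of_y(b0=10.0, b1=-2.0, x_mean=3.0))
```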

THE NEXT TWO QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Consider a random sample of 25 observations of two variables X and Y. The following
summary statistics are available: $\sum y_i = 57.2$, $\sum x_i = 1253.4$, $\sum x_i^2 = 73296.4$, and
$\sum x_i y_i = 3133.7$.

41. What is the slope of the sample regression line?

A) 0.043
B) 54.85
C) 0.978
D) 0.025

42. What is the intercept of the sample regression line?

A) 2.038
B) 1.035
C) 3.832
D) 3.463
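
When only raw sums are available, the least squares estimates can be obtained from b1 = (Σxy − n·x̄·ȳ)/(Σx² − n·x̄²) and b0 = ȳ − b1·x̄. A short sketch under that assumption; the function name and the tiny data set behind the example sums are hypothetical.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xx, sum_xy):
    """Least squares intercept and slope from raw sums of x, y, x^2 and xy."""
    x_mean = sum_x / n
    y_mean = sum_y / n
    sxy = sum_xy - n * x_mean * y_mean    # sum of cross-deviations
    sxx = sum_xx - n * x_mean ** 2        # sum of squared deviations of x
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data x = 1, 2, 3, 4 and y = 3, 5, 7, 9, which lie exactly on y = 1 + 2x.
b0, b1 = fit_from_sums(n=4, sum_x=10, sum_y=24, sum_xx=30, sum_xy=70)
print(b0, b1)
```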

THE NEXT TWELVE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


A sample of 8 households was asked about their monthly income (X) and the number of
hours they spend connected to the internet each month (Y). The data yield the following
statistics:
$\sum x = 324$, $\sum y = 393$, $\sum (x - \bar{x})^2 = 1720.875$, $\sum (y - \bar{y})^2 = 1150$, $\sum (x - \bar{x})(y - \bar{y}) = 1090.5$

43. What is the standard deviation of the households’ monthly income?

A) 0.678
B) 0.791
C) 0.905
D) 0.775

44. What is the standard deviation of the number of hours households spend
connected to the internet each month?

A) 12.817
B) 14.667
C) 15.679
D) 11.990

45. What is the sample covariance between X and Y?

A) 136.313
B) 155.786
C) 181.750
D) 159.032

46. What is the sample correlation coefficient between X and Y?

A) 0.678
B) 0.791
C) 0.905
D) 0.775

47. What is the slope of the regression line of hours on income?

A) 0.6337
B) 0.9482
C) 0.5541
D) 0.6475

48. What is the y-intercept of the regression line of hours on income?

A) 7.87
B) 8.87
C) 8.37
D) 9.37

49. What is the regression sum of squares?

A) 691.062
B) 1033.601
C) 461.812
D) 437.918

50. What is the error sum of squares?

A) 116.399
B) 458.938
C) 712.082
D) 688.188

51. What is the value of the coefficient of determination?

A) 0.637
B) 0.575
C) 0.601
D) 0.664

52. What is the estimate of the variance of the population model error?

A) 118.347
B) 114.698
C) 19.399
D) 76.156

53. What is the standard error of the slope of the regression line of hours on income?

A) 0.256
B) 0.234
C) 0.211
D) 0.269

54. What is the value of the test statistic for testing H0 : 1  0 vs. H1 : 1  0 ?

A) 2.36
B) 3.00
C) 2.48
D) 2.71
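
Most of the quantities in this group can be derived from the three deviation sums Σ(x − x̄)², Σ(y − ȳ)² and Σ(x − x̄)(y − ȳ). The sketch below chains the standard formulas together; the function name, dictionary keys, and input values are hypothetical, and rounding at intermediate steps may produce slightly different figures from a hand calculation.

```python
import math

def regression_from_deviation_sums(n, sxx, syy, sxy):
    """Simple-regression summary from deviation sums about the means."""
    b1 = sxy / sxx                    # slope
    ssr = b1 * sxy                    # regression sum of squares
    sse = syy - ssr                   # error sum of squares (SST = syy)
    s2_e = sse / (n - 2)              # estimated variance of the model error
    se_b1 = math.sqrt(s2_e / sxx)     # standard error of the slope
    return {
        "slope": b1,
        "r": sxy / math.sqrt(sxx * syy),
        "ssr": ssr,
        "sse": sse,
        "r_square": ssr / syy,
        "s2_e": s2_e,
        "se_slope": se_b1,
        "t_stat": b1 / se_b1,         # tests H0: beta_1 = 0
    }

# Hypothetical deviation sums, for illustration only.
summary = regression_from_deviation_sums(n=10, sxx=50.0, syy=200.0, sxy=80.0)
for name, value in summary.items():
    print(name, round(value, 4))
```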

55. An indication of no linear relationship between two variables would be a:

A) coefficient of determination equal to 1


B) coefficient of determination equal to -1
C) coefficient of correlation of 0
D) coefficient of correlation equal to -1

56. Suppose we were to run a linear regression using the data in the following scatter
plot.

[Scatter plot not reproduced: vertical axis from 0 to 70, horizontal axis from 0 to 50]

What are the most reasonable values for y-intercept b0 and the slope b1 ?

A) b0 = 45 and b1 = 2
B) b0 = 45 and b1 = -2
C) b0 = 45 and b1 = -20
D) b0 = 120 and b1 = -2

THE NEXT EIGHT QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


A sales manager is interested in determining the relationship between the amount spent
on advertising and total sales. The manager collects data for the past 24 months and runs
a regression of sales on advertising expenditures. The results are presented below but,
unfortunately, some values identified by asterisks are missing.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.492
R Square              0.242
Adjusted R Square     0.208
Standard Error        40.975
Observations          24.000

ANOVA
              df    SS           MS           F        Significance F
Regression     1    11809.406    11809.406    7.034    *
Residual       *    *            *
Total          *    *

              Coefficients    Standard Error    t Stat    P-value
Intercept     *               26.239            4.021     0.001
Advertising   2.015           *                 2.652     0.015

57. What are the degrees of freedom for residuals?

A) 21
B) 22
C) 23
D) 24

58. What is the value of mean square error?

A) 1,678.9
B) 1,554.2
C) 1,493.6
D) 1,407.3

59. What are the total degrees of freedom?

A) 21
B) 22
C) 23
D) 24

60. What is the value of residual sum of squares?

A) 10,945.2
B) 11,759.9
C) 10,130.5
D) 36,935.8

61. What is the value of total sum of squares?

A) 48,745.2
B) 46,538.7
C) 50,292.4
D) 52,644.8

62. What is the value of significance F?

A) Larger than 0.10


B) Smaller than 0.01
C) Smaller than 0.05
D) None of the above.

63. What is the regression coefficient of y-intercept?

A) 112.4
B) 102.3
C) 108.6
D) 105.5

64. What is the standard error of estimate?

A) 0.66
B) 0.76
C) 0.85
D) 0.61

65. A regression analysis between sales (in $1000) and advertising (in $100) resulted
in the following least squares line: ŷ = 75 + 5x. This implies that if advertising is
$800, then the predicted amount of sales (in dollars) is:

A) $4075
B) $115,000
C) $64,000
D) $79,000
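
Questions of this kind turn on the units of each variable: the input must be converted to the units of x before it is substituted, and the fitted value converted back from the units of y. A small sketch using the line ŷ = 75 + 5x from the question; the function name and the $500 input are hypothetical.

```python
def predicted_sales_dollars(advertising_dollars: float,
                            b0: float = 75.0, b1: float = 5.0) -> float:
    """Predicted sales in dollars when x is in $100 and y is in $1000."""
    x = advertising_dollars / 100.0      # convert dollars to units of $100
    y_hat_thousands = b0 + b1 * x        # fitted value in units of $1000
    return y_hat_thousands * 1000.0      # convert back to dollars

# Hypothetical advertising spend of $500.
print(predicted_sales_dollars(500))
```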

66. In publishing the results of some research work, the following values of the
coefficient of determination were listed. Which one would appear to be
incorrect?

A) 0.91
B) 0.06
C) 0.47
D) -0.64

67. Correlation analysis is used to determine the:

A) strength of the relationship between x and y


B) least squares estimates of the regression parameters
C) predicted value of y for a given value of x
D) coefficient of determination

68. In a regression problem, if the coefficient of determination is 0.90, this means
that:

A) 90% of the y values are positive


B) 90% of the variation in y can be explained by the regression line
C) 90% of the x values are equal
D) 90% of the variation in x can be explained by regression line

69. The regression line ŷ = 3 + 2x has been fitted to the data points (4,8), (2,5), and
(1,2). The residual sum of squares will be:

A) 10
B) 15
C) 13
D) 22
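
The residual sum of squares is the sum of squared differences between each observed y and the value the fitted line gives at the same x. A minimal sketch; the function name, the line ŷ = 1 + x, and the three points are hypothetical.

```python
def residual_sum_of_squares(points, b0, b1):
    """Sum of squared residuals y - (b0 + b1 * x) over (x, y) pairs."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# Hypothetical line y_hat = 1 + x and three made-up points.
print(residual_sum_of_squares([(0, 2), (1, 1), (2, 4)], b0=1, b1=1))
```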

THE NEXT EIGHT QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


A professor of statistics is interested in studying the relationship between the number of
hours graduate students spent studying for his comprehensive final exam and the exam
score. The results of the regression of exam score on hours studied are
presented below.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.834298
R Square              0.696053
Adjusted R Square     0.645395
Standard Error        3.924283
Observations          8

ANOVA
              df    SS       MS       F           Significance F
Regression     1    211.6    211.6    13.74026    0.010007609
Residual       6    92.4     15.4
Total          7    304

                 Coefficients    Standard Error    t Stat      P-value
Intercept        64.3            5.75413           11.17458    0.000031
Hours studied    4.6             1.24097           3.70679     0.010008

70. What is the sample correlation?

A) 0.696
B) 0.834
C) 0.645
D) 0.803

71. What is the value of the test statistic for testing $H_0: \rho = 0$ vs. $H_1: \rho > 0$?

A) 4.439
B) 4.113
C) 3.702
D) 3.305

72. What is (are) the critical value(s) for testing the hypotheses in the second question
at 0.05 level of significance?

A) + 1.943
B) $\pm$ 2.447
C) + 1.645
D) – 1.645

73. Based on your answers to the second and third questions, what is your conclusion
at the 0.05 level of significance?

A) Since the test statistic equals 4.439 and the critical value equals 1.645, we
reject the null hypothesis and assume there is a positive correlation between
the variables
B) Since the test statistic equals 4.113 and the critical value equals -1.645, we fail
to reject the null hypothesis and assume there is no correlation between the
variables.
C) Since the test statistic equals 3.305 and the critical value equals 2.447, we
reject the null hypothesis and assume there is a correlation between the
variables.
D) Since the test statistic equals 3.702 and the critical value equals 1.943, we
reject the null hypothesis and assume there is a positive correlation between
the variables

74. What is the value of the coefficient of determination?

A) 0.696
B) 0.645
C) 0.834
D) 1.241

75. What is the slope of the sample regression line?

A) 3.71
B) 1.24
C) 4.60
D) 3.92

76. What is the intercept of the sample regression line?

A) 11.17
B) 64.30
C) 92.40
D) 13.74

77. What is the approximate point estimate of the final exam score for a student who
has studied 5.45 hours?

A) 81
B) 92
C) 85
D) 89

78. The following values are listed as coefficients of correlation (r). The one that
indicates an inverse relationship between the two variables x and y is:

A) 0.0
B) -0.8
C) 0.9
D) 1.3

79. Suppose we were to run a linear regression using the data in the following scatter
plot.

[Scatter plot not reproduced: vertical axis from 0 to 120, horizontal axis from 0 to 50]

What are the most reasonable values for the y-intercept b0 and the slope b1 ?

A) b0 = 0 and b1 = 1
B) b0 = 25 and b1 = 2.5
C) b0 = -25 and b1 = -2.5
D) b0 = 0 and b1 = -1

80. Given the least squares regression line ŷ = -2.88 + 1.77x and a coefficient of
determination of 0.81, the coefficient of correlation is:

A) -0.88
B) +0.88
C) +0.90
D) –0.90

81. Suppose that we are interested in exploring the determinants of successful high
schools. One possible measure of success might be the percentage of students
who go on to college. The teachers’ union argues that there should be a
relationship between the average teachers’ salary and high school success. The
following regression line is obtained: “% of students going on to college = 13 +
0.001 × Average Teachers’ Salary.” Which of the following statements is true?

A) Increase % of students going on to college by 0.001 percent, we would expect
average teacher’s salary to increase by one dollar.
B) Increase % of students going on to college by one percent, we would expect
average teacher’s salary to increase by 0.001 dollar.
C) Increase teacher’s average salary by 0.001 dollar, we would expect % of
students going on to college to increase by one percent.
D) Increase average teacher’s salary by one dollar, we would expect % of
students going on to college to increase by 0.001 percent.

82. In regression analysis, if the coefficient of determination is 1.0, then the:

A) residual sum of squares must be 0.0


B) error sum of squares must be 1.0
C) regression sum of squares must be 1.0
D) total sum of squares must be 0.0

83. Which of the following is used to plot the dependent variable versus the
independent variable?

A) Histogram
B) Bar chart
C) Pie chart
D) Scatter diagram

84. In a regression problem the following pairs of (x, y) are given: (2, 1), (2,-1), (2, 0),
(2,-2) and (2, 2). That indicates that the:

A) coefficient of correlation is –1
B) coefficient of correlation is 0
C) coefficient of correlation is 1
D) coefficient of determination is between –1 and 1
ANSWER: B

85. Suppose we were to run a linear regression using the data in the following scatter
plot.

[Scatter plot not reproduced: vertical axis from 0 to 40, horizontal axis from 0 to 120]

What are the most reasonable values for y-intercept, b0 and the slope b1 ?

A) b0 = -20 and b1 = 2
B) b0 = 0 and b1 = -2
C) b0 = -20 and b1 = 0.5
D) b0 = 20 and b1 = 2

86. Which of the following statements is true regarding the coefficient of correlation?

A) It takes on values from –1.0 to +1.0 inclusive.


B) It measures the strength of the relationship between two variables
C) A value of 0.00 indicates the dependent and independent variables are not
related
D) All of the above

87. Which of the following values of the coefficient of correlation indicates a stronger
correlation than 0.50?

A) –0.40
B) –0.60
C) +0.40
D) +0.53

88. What does a coefficient of correlation of 0.80 imply?

A) Almost no correlation because 0.80 is not close enough to 1.0


B) Eight percent of the variation in the independent variable is explained by the
dependent variable.
C) Sixty four percent of the variation in the dependent variable is explained by
the independent variable
D) None of the above

89. If the coefficient of correlation is 0.75, what does the coefficient of determination
equal?

A) 0.8660
B) 0.5625
C) 0.5916
D) 0.6123

90. Based on the regression equation, we can

A) predict the value of the dependent variable y for given value of the
independent variable X
B) predict the value of the independent variable X for a given value of the
dependent variable Y
C) measure the strength of the relationship between X and Y
D) All of the above.

91. Which of the following is true about the standard error of estimate?

A) It is a measure of the accuracy of the prediction
B) It is based on the squared vertical deviations between Y and $\hat{Y}$
C) It is always positive
D) All of the above

92. If the least squares equation is $\hat{Y} = 20 + 5X$, then the value of 5 indicates

A) where the regression line meets the Y-axis


B) for each unit increase in X, Y increases on average by 5
C) for each unit increase in Y, X increases on average by 5
D) None of the above

93. The covariance between two variables X and Y

A) has the same sign as the coefficient of correlation
B) is always greater than zero
C) is always smaller than zero
D) is computed by dividing $\sum XY$ by the number of (X,Y) pairs.

94. If all the points on a scatter diagram lie on a straight line, what is the standard
error of estimate?

A) –1
B) +1
C) 0
D) $\infty$

95. Simple linear regression and correlation analysis require that the scales of
measurement be expressed in either

A) nominal or ordinal
B) ordinal or interval
C) interval or ratio
D) ratio or nominal

96. Which of the following table values would be appropriate for a 90% confidence
interval for the mean of y from a simple linear regression problem if the sample
size is 13?

A) 1.782
B) 1.796
C) 1.645
D) 2.179

97. The following values of the coefficient of determination were listed in some
research articles. Which one would appear to be incorrect?

A) -0.81
B) 0.96
C) 0.52
D) 0.00

98. The regression sum of squares (SSR) is 83.6. Which of the following must be
true?

A) The correlation coefficient is 0.95.


B) The slope of the regression line is positive
C) The error sum of squares (SSE) is larger than or equal to 16.4
D) The total sum of squares (SST) is larger than or equal to 83.6.

99. In a simple linear model, testing $H_0: \beta_1 = 0$ is the same as testing

A) $H_0: \beta_0 = 0$
B) $H_0: \rho = 0$
C) $H_0: r = 0$
D) Any of the above

100. In a regression problem the following pairs of (x, y) are given: (-2, 4), (-1, 1), (0,
0), (1, 1) and (2, 4). What does this indicate about the value of coefficient of
determination?

A) It is –1
B) It is +1
C) It is 0
D) It is undefined
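
For questions that give raw (x, y) pairs, the sample correlation can be computed directly from deviations about the means, and its square is the coefficient of determination in simple regression. The sketch below does this; the function name and the five data points are hypothetical. It raises an error when either variable is constant, since the correlation is then undefined.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient from paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(r * r, 4))
```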

True-False Questions

101. Regression analysis is used to measure the strength of the association between
two numerical variables, while correlation analysis is used for prediction.

102. The residual and the model error are equivalent concepts.

103. In general, increasing the number of observations will lead to a higher coefficient
of determination.

104. The results of a linear regression can be strongly influenced by outliers.

105. We can always substitute any value for x into a least-squares regression line
$\hat{y} = b_0 + b_1 x$ and make a meaningful decision about the predicted value of y.

106. One of the standard assumptions of the linear regression model is that the
variance of the error terms is equal to one.

107. When the predicted values of y and the actual values of y are the same, the
standard error of estimate will be 0.0.

108. In performing a regression analysis involving two numerical variables, we are
assuming the variation around the line of regression is linear and depends on each
x value.

109. The coefficient of determination is a statistical test of the fit of linear regression.

110. In general, increasing the variation of X will lead to a higher coefficient of
determination.

111. The width of the confidence interval estimate for the average value of Y does not
depend on the standard error of the estimate.

112. If the correlation coefficient is greater than 0.5 then the slope of the simple
regression model is greater than 1.

113. You give a pre-employment examination to your applicants. The test is scored
from 1 to 100. You have data on their sales at the end of one year measured in
dollars. You want to know if there is any linear relationship between pre-
employment examination score and sales. An appropriate test to use is the t-test
on the population correlation coefficient.

114. If the correlation coefficient is greater than 0.5 then the coefficient of
determination $R^2$ from a simple regression model is greater than 0.25.

115. Suppose that for two random variables, X and Y, we test $H_0: \rho = 0$ against a
two-sided alternative, and we are unable to reject the null hypothesis. We could
therefore conclude that there is no relationship between X and Y.

116. A correlation coefficient equal to –1 or +1 indicates perfect correlation.

117. In the sample regression line $\hat{y} = b_0 + b_1 x$, the term $b_0$ is the y-intercept; this is the
value of y where the line intersects the y-axis whenever x = 0.

118. The value of the variation explained by the regression line can never be larger
than 1.0.

119. The coefficient of determination is the positive square root of the coefficient of
correlation.

120. The regression sum of squares (SSR) can never be greater than the total sum of
squares (SST).

121. The coefficient of determination is the proportion of the total variation in the
independent variable X that is explained or accounted for by its relationship with
the dependent variable Y.

122. When testing the strength of the relationship between two variables, the null
hypothesis of interest is H 0 :   0

123. The smaller the sample size, the smaller the standard error of estimate.

124. The coefficient of determination can only take on positive values.

125. Regression analysis is the technique used to measure the strength of the
relationship between two variables using the coefficient of correlation and the
coefficient of determination.

126. The least squares method minimizes the sum of the vertical distances between the
actual values Y and the predicted values $\hat{Y}$.

127. For a given data set of (x, y) values, an infinite number of possible regression
equations can be fitted to the corresponding scatter diagram, and each equation
will have a unique combination of values for the y-intercept ($b_0$) and the slope
($b_1$). However, only one equation will be the “best fit” as defined by the
least-squares criterion.

128. The coefficient of determination represents the ratio of SSR to SST.

129. In simple linear regression, the fit of the regression equation to the data is
improved as error sum of squares increases and regression sum of squares
decreases.

130. The coefficient of determination is often interpreted as the percent of variability in
the dependent variable Y that is explained by the regression equation.

131. When we compute the sample correlation r from data, the result will be definitely
zero when the population correlation $\rho$ is zero.

132. When we are seeking data to estimate a regression model, it is important to
choose the observations of the independent variable that provide the smallest
possible spread in X so that we obtain a regression model with the largest
coefficient of determination.

133. When testing the strength of the relationship between two variables, the alternate
hypothesis is H1 :   0

134. The divisor of the standard error of estimate in simple linear regression is: n – 2.

135. The purpose of correlation analysis is to find how strong the relationship is
between two variables.

136. The strength of the correlation between two variables depends on the sign of the
coefficient of correlation.

137. When the predicted values of y and the actual values of y are the same, the
standard error of estimate will be -1.0.

138. The t-test for the true slope β1 = 0 is identical to the t-test for the true correlation
ρ = 0.

139. The regression sum of squares represents the variability that is explained by the
intercept of the regression equation.

140. When the predicted values of y and the actual values of y are the same value, the
standard error of the estimate will be 1.0.

141. Correlation coefficients of –0.95 and +0.95 represent relationships between two
variables that have equal strength but different directions.

142. The uniform variance assumption for the linear regression model states that the
error terms are random variables with a mean equal to one and the same variance.

143. The value of the variation explained by the regression line can never be smaller
than 0.0.

144. The regression sum of squares (SSR) can never be larger than the error sum of
squares (SSE).

145. A t-test is used to test the significance of the coefficient of correlation.

146. The regression sum of squares (SSR) can never be larger than the total sum of
squares.

147. If the coefficient of determination is .64, then the correlation coefficient must be
0.80.

148. A coefficient of correlation of –0.98 indicates a very weak negative correlation.

149. The hypothesis test for the population slope relies on the F distribution.

150. If the coefficient of correlation is –0.80, then the coefficient of determination is
–0.64.

Short Answer and Applied Questions

151. What factors would tend to reduce the variation of a prediction from a simple
linear regression?

ANSWER:
a) An increase in the variation of X
b) An increase in sample size
c) Predictions for X values closer to the mean of X
d) Higher coefficient of determination
e) Lower significance level associated with the prediction interval.

152. The management of a local hotel is interested in determining the optimal staffing
of the dining room. They believe that the number of overnight guests may
explain the number of dinners served on a particular evening. They collected
some data and ran a linear regression. The equation of the regression line is given
by:
Number of dinners served = 120 + .60 Number of overnight guests.
Interpret these results for the hotel management.

153. For a random sample of 263 professionals, the correlation between their age and
their income was found to be 0.17. Use a 0.05 level of significance to test the null
hypothesis that there is no linear relationship between these two variables against
the alternative that there is a positive relationship.
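
One way to check an answer to question 153 is the usual t statistic for a sample correlation, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below is only an illustration: the variable names are made up here, and the critical value 1.65 is an assumed table value for a one-sided 5% test with a large number of degrees of freedom.

```python
import math

# Values taken from the question
n = 263      # sample size
r = 0.17     # sample correlation between age and income

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: one-sided 5% critical value; with 261 degrees of freedom the
# t distribution is close to the standard normal value of about 1.645
t_crit = 1.65

print(f"t = {t_stat:.3f}")
print("reject H0" if t_stat > t_crit else "do not reject H0")
```

Because the alternative is one-sided (a positive relationship), only large positive values of t count as evidence against the null hypothesis.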

154. What is the difference between a population linear regression model and an
estimated linear regression model?

THE NEXT SEVEN QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


The results of a regression analysis are listed below but, unfortunately, some values as
identified by asterisks are missing.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.187
R Square             *
Adjusted R Square
Standard Error       *
Observations         115.000

ANOVA
              Df         SS                MS               F     Significance F
Regression    1.000      *                 130433116.219    *     0.046
Residual      113.000    3609911959.868    *
Total         114.000    3740345076.087

              Coefficients    Standard Error    t Stat    P-value
Intercept     10725.802       1535.215          6.987     0.000
Age           *               34.625            2.021     *

155. Calculate the coefficient of determination

156. Determine the mean square error

157. Determine the standard error of estimate

158. Calculate the regression sum of squares

159. What is the value of the test statistic F?

160. Calculate the slope coefficient.

161. What is the p-value associated with the independent variable age?
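
The missing entries in this output can be recovered from the standard identities of simple regression: R² = (Multiple R)², MSE = SSE/(n − 2), SSR = SST − SSE, F = MSR/MSE, and b1 = t × s_b1. The following is a hedged sketch with illustrative variable names, using only the numbers printed in the output.

```python
import math

# Values read from the regression output in the question
multiple_r = 0.187
n = 115
sse = 3609911959.868   # residual sum of squares
sst = 3740345076.087   # total sum of squares
se_slope = 34.625      # standard error of the Age coefficient
t_slope = 2.021        # t statistic of the Age coefficient

r_square = multiple_r ** 2     # coefficient of determination
mse = sse / (n - 2)            # mean square error (113 residual df)
std_error = math.sqrt(mse)     # standard error of estimate
ssr = sst - sse                # regression sum of squares
f_stat = (ssr / 1) / mse       # MSR / MSE with 1 regression df
slope = t_slope * se_slope     # b1 = t * s_b1

print(f"R^2 = {r_square:.4f}, MSE = {mse:.2f}, s_e = {std_error:.2f}")
print(f"SSR = {ssr:.3f}, F = {f_stat:.3f}, b1 = {slope:.3f}")
```

In a regression with a single independent variable, the two-sided p-value for the slope coincides with Significance F, so the missing p-value for Age should match the 0.046 shown in the ANOVA block. As a consistency check, F is close to the square of the reported t statistic.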

THE NEXT THIRTEEN QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Suppose that you are interested in the relationship between the return rate on a stock in
2005 and its return rate in 2004. You believe that the return rates in both years
are positively correlated. A sample of 15 stocks yields the following regression results: b0
= 5.3, b1 = 1.04, s_b0 = 1.79, s_b1 = 0.2163, R² = 0.64, and MSE = 35.4.

162. How would you interpret the slope coefficient b1?


163. What is the least squares regression line?

164. What is the predicted value of Y when X = 5?

165. Calculate the error sum of squares.

166. Calculate the total sum of squares.

167. Calculate the regression sum of squares.

168. What is the correlation coefficient for the stock returns of the two years? What
sign does it have? Why?

169. What are the appropriate null and alternative hypotheses?

170. Test the hypothesis in the eighth question at α = 0.05.

171. Test H0: β1 = 0 vs. H1: β1 > 0 at α = 0.025.

172. Prepare the analysis of variance table for regression.

173. Use the F-statistic in the eleventh question to test the hypothesis in the tenth
question at α = 0.05.

174. Have you noticed any relationship between the Student’s t-statistic computed for
the slope coefficient, b1, in the tenth question for α = 0.025 and the F-statistic
computed in the eleventh question for α = 0.05?
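
Most of the quantities asked for in this block follow from the given summary values. The sketch below shows one possible route, assuming the standard identities SSE = MSE·(n − 2), SST = SSE/(1 − R²), and t = b1/s_b1; the variable names are illustrative only.

```python
import math

# Summary values given in the question
n = 15
b0, b1 = 5.3, 1.04
s_b1 = 0.2163
r_square = 0.64
mse = 35.4

y_hat_at_5 = b0 + b1 * 5       # predicted Y when X = 5
sse = mse * (n - 2)            # error sum of squares
sst = sse / (1 - r_square)     # total sum of squares
ssr = sst - sse                # regression sum of squares

# The correlation takes the sign of the slope
r = math.copysign(math.sqrt(r_square), b1)

t_stat = b1 / s_b1             # t statistic for the slope
f_stat = (ssr / 1) / mse       # F statistic with 1 and n - 2 df

print(y_hat_at_5, sse, round(sst, 3), round(ssr, 3), r)
print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```

The printed t² and F agree up to rounding in the reported inputs, which is the relationship question 174 is hinting at. The same link holds for the table critical values with 13 degrees of freedom: 2.160² is approximately 4.67.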

175. In a study it was shown that for a sample of 375 college faculty the
correlation was 0.15 between annual raises and teaching evaluations. What would
be the coefficient of determination of a regression of annual raises on teaching
evaluations for this sample? Interpret your results.

176. Suppose that we obtained an estimated equation for the regression of weekly sales
of ice cream on the price charged during the week. Interpret the constant b0 for
the product brand manager.

177. Suppose you are interested in understanding why some people send more e-mail
messages than others do. One possible explanation may be the age of the
individual. Older people tend to be less technologically savvy, and may have
fewer friends who use e-mail. A researcher examines the number of e-mails a
person sends weekly using regression analysis and comes up with the following
equation of the regression line:
Number of e-mails = 63 – 0.5 Age. Interpret these results.

THE NEXT TWO QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


In a recent survey, 250 adults were asked about their monthly expenditures (in dollars) on
the lottery. In addition, data were collected about the number of years of education they
had. A simple regression was run, and the following results were obtained: b0 = 35.6,
b1 = -2.37, s_b0 = 18.9, and s_b1 = 1.28.

178. How would you interpret the slope coefficient b1?

179. Test H0 : 1  0 vs. H1 : 1  0 at   0.05 .

THE NEXT SEVEN QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


You are interested in exploring the relationship between the income of professors
(measured in thousands of dollars) and the number of years they have been employed by
the university. You collect the following data from eight professors.

Years of Employment   14    23    14    19    19    12    10    4
Income                69.7  71.2  68.2  71.5  69.7  70.1  67.6  66.1

180. Use a computer to run the simple linear regression analysis of income on length of
employment.

181. What is the estimated regression line?



182. What is the estimated income for a professor with 20 years of employment?

183. How would you interpret the slope coefficient b1 ?

184. How would you interpret the coefficient of determination?

Test H0: β1 = 0 vs. H1: β1 ≠ 0 using the t-test.

185. Test the hypotheses in the previous question using the F-statistic.
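
For the "use a computer" step, any statistics package will do. As a package-free illustration, the sketch below fits the line with the textbook least-squares formulas b1 = Sxy/Sxx and b0 = ȳ − b1·x̄, then derives R², the slope t statistic, and F. The variable names are assumptions made for this example.

```python
import math

# Data from the question
years = [14, 23, 14, 19, 19, 12, 10, 4]
income = [69.7, 71.2, 68.2, 71.5, 69.7, 70.1, 67.6, 66.1]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(income) / n

# Sums of squares and cross products about the means
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, income))
sst = sum((y - y_bar) ** 2 for y in income)

b1 = s_xy / s_xx               # slope
b0 = y_bar - b1 * x_bar        # intercept

ssr = b1 * s_xy                # regression sum of squares
sse = sst - ssr                # error sum of squares
r_square = ssr / sst
mse = sse / (n - 2)

t_stat = b1 / math.sqrt(mse / s_xx)   # t statistic for the slope
f_stat = ssr / mse                    # F statistic, equal to t^2

income_at_20 = b0 + b1 * 20           # estimate for 20 years of employment

print(f"income = {b0:.3f} + {b1:.4f} * years")
print(f"R^2 = {r_square:.3f}, t = {t_stat:.3f}, F = {f_stat:.3f}")
print(f"estimated income at 20 years: {income_at_20:.2f}")
```

Income is measured in thousands of dollars, so the slope is read as thousands of dollars per additional year of employment.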

THE NEXT TWO QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Suppose you are interested in determining the relationship between X and Y. You have
the following information: b0 = 117.4, b1 = -14.39, s_b0 = 0.18, s_b1 = 0.18, n = 300, x̄ = 4.3,
SST = 17045, and SSE = 12053.

186. What is the sample mean of Y?

187. Develop an approximation of the 95% confidence interval for the expected value
of Y, given X = x̄.
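
A least-squares line always passes through the point of means, which gives ȳ = b0 + b1·x̄. At X = x̄ the standard error of the estimated mean response reduces to s_e/√n. The sketch below rests on two assumptions that are not stated outright in the question: that the value 4.3 is the sample mean of X, and that the normal value 1.96 is an adequate approximation for 298 degrees of freedom.

```python
import math

# Values given in the question
b0, b1 = 117.4, -14.39
n = 300
x_bar = 4.3      # assumed to be the sample mean of X
sse = 12053

# The fitted line passes through (x_bar, y_bar)
y_bar = b0 + b1 * x_bar

# Standard error of estimate
s_e = math.sqrt(sse / (n - 2))

# At X = x_bar the standard error of the mean response is s_e / sqrt(n)
z = 1.96         # assumed normal approximation for a 95% interval
half_width = z * s_e / math.sqrt(n)

print(f"y_bar = {y_bar:.3f}")
print(f"approx. 95% CI: {y_bar - half_width:.3f} to {y_bar + half_width:.3f}")
```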

188. Discuss the conceptual basis of using a t-test for inferences about the
population slope parameter. How does this t-test relate to the t-test for the
correlation coefficient and the F test for the significance of the model? Do all
these relationships make sense?

189. Using regression techniques, we can plot a scatter diagram and fit a line to the
data using the Least Squares method. Generally, the points do not fall directly
on the line. Why does this happen? Is it a problem?

190. Explain the difference between the residual e_i and the model error ε_i.

191. Compute the coefficients for a least squares regression equation and write the
equation, given the following sample statistics:
x̄ = 12; ȳ = 48; s_x = 90; s_y = 72; r_xy = 0.45; n = 50.
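
When only summary statistics are available, the slope can be obtained from b1 = r·(s_y/s_x) and the intercept from b0 = ȳ − b1·x̄. A short sketch, assuming the first two values in the list are the sample means of X and Y:

```python
# Summary statistics given in the question
x_bar, y_bar = 12, 48    # assumed to be the sample means
s_x, s_y = 90, 72        # sample standard deviations
r_xy = 0.45              # sample correlation

b1 = r_xy * s_y / s_x    # slope from the correlation and the two spreads
b0 = y_bar - b1 * x_bar  # intercept: the line passes through the means

print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

The sample size n is not needed for the coefficients themselves; it only matters for inference about them.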

THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Given the estimated regression equation: ŷ = 95 + 8X.

192. What is the change in Y when X changes by -3?

193. What is the predicted value of Y when X =10?

THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


A college administers a student evaluation questionnaire for all its courses. For a random
sample of 12 courses the accompanying table shows both the average student ratings of the
instructor (on a scale from 1 to 5), and the average expected grades of the students (on a
scale from 0 to 4).
Instructor Rating   3.5  4.3  3.2  3.6  4.1  2.9  3.9  4.4  3.4  4.2  3.8  4.9
Expected Grade      2.8  2.9  2.4  3.2  3.3  2.8  3.1  3.3  2.6  3.0  3.4  3.5

194. Calculate the covariance between instructor ratings and expected grades.

195. Calculate the sample correlation coefficient between instructor ratings and
expected grades.

196. Therefore, the sample correlation coefficient between instructor ratings
and expected grades is

197. Test at the 10% significance level the hypothesis that the population
correlation coefficient is zero against the alternative that it is positive.
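
The covariance, correlation, and test statistic for this block can be computed directly from the twelve pairs. The sketch below uses the sample (n − 1) divisor for the covariance. The critical value 1.372 is an assumed table value for a one-sided 10% test with 10 degrees of freedom.

```python
import math

# Data from the question
rating = [3.5, 4.3, 3.2, 3.6, 4.1, 2.9, 3.9, 4.4, 3.4, 4.2, 3.8, 4.9]
grade = [2.8, 2.9, 2.4, 3.2, 3.3, 2.8, 3.1, 3.3, 2.6, 3.0, 3.4, 3.5]

n = len(rating)
x_bar = sum(rating) / n
y_bar = sum(grade) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rating, grade))
s_xx = sum((x - x_bar) ** 2 for x in rating)
s_yy = sum((y - y_bar) ** 2 for y in grade)

cov = s_xy / (n - 1)                  # sample covariance
r = s_xy / math.sqrt(s_xx * s_yy)     # sample correlation coefficient

# t statistic for H0: rho = 0 against H1: rho > 0, with n - 2 df
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_crit = 1.372   # assumed table value: one-sided, alpha = 0.10, 10 df

print(f"cov = {cov:.3f}, r = {r:.3f}, t = {t_stat:.3f}")
print("reject H0" if t_stat > t_crit else "do not reject H0")
```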

198. What does the least-squares criterion have to do with obtaining a regression line
for a given set of data?

THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


For a sample of 20 monthly observations a financial analyst wants to regress the
percentage rate of return (Y) of the common stock of a corporation on the percentage rate
of return (X) of the Standard and Poor’s 500 Index. The following summary statistics are
available:
$\sum_{i=1}^{20} y_i = 24.8$, $\sum_{i=1}^{20} x_i = 27.6$, $\sum_{i=1}^{20} x_i^2 = 152.8$, and $\sum_{i=1}^{20} x_i y_i = 158.5$

199. Estimate the linear regression of Y on X.

200. Interpret the slope of the sample regression line.

201. Interpret the intercept of the sample regression line.
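
With raw sums rather than deviations, the computational forms Sxy = Σxy − n·x̄·ȳ and Sxx = Σx² − n·x̄² are convenient. The sketch below assumes the four summary values are, in order, Σy, Σx, Σx², and Σxy over the 20 observations; the variable names are illustrative.

```python
# Summary statistics from the question (sums over 20 observations)
n = 20
sum_y = 24.8
sum_x = 27.6
sum_x2 = 152.8
sum_xy = 158.5

x_bar = sum_x / n
y_bar = sum_y / n

s_xy = sum_xy - n * x_bar * y_bar   # corrected sum of cross products
s_xx = sum_x2 - n * x_bar ** 2      # corrected sum of squares of X

b1 = s_xy / s_xx                    # slope
b0 = y_bar - b1 * x_bar             # intercept

print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```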



THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


It was hypothesized that the number of bottles of an imported premium beer sold per
evening in the restaurants of a city depends linearly on the average costs of meals in the
restaurants. The following results were obtained for a sample of n = 20 restaurants of
approximately equal size:
$\bar{x} = 30.0$, $\bar{y} = 18.8$, $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = 412$, $\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = 212$,
where Y = number of bottles sold per evening and X = average cost, in dollars, of a
meal.

202. Determine the sample regression line

203. Interpret the slope of the sample regression line

204. Is it possible to provide a meaningful interpretation of the intercept of the sample
regression line? Explain.

THE NEXT THREE QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


The following statistics are computed from a random sample of pairs of X and Y
observations:
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 250$; $R^2 = 0.84$; n = 50.

205. Compute the regression sum of squares, SSR.

206. Compute the error sum of squares, SSE.

207. Compute s_e².

THE NEXT TWO QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Given a simple regression analysis, suppose that we have obtained the fitted regression
model $\hat{y}_i = 6 + 8x_i$ and also the following statistics: $s_e = 3.20$, $\bar{x} = 8$, n = 42, and
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 420$.

208. Find the 95% confidence interval for the point where x =18.

209. Find the 95% prediction interval for the point where x =18.
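
The two intervals differ only in the standard error: the confidence interval for the mean response uses s_e·sqrt(1/n + (x − x̄)²/Σ(x_i − x̄)²), while the prediction interval adds 1 under the square root. The sketch below makes several assumptions: that the fitted line is ŷ = 6 + 8x (the sign between the terms is not legible in the source), that the value 420 is Σ(x_i − x̄)², and that 2.021 is the table value of t for 40 degrees of freedom at the two-sided 5% level. The half-widths do not depend on the assumed sign, only the interval centre does.

```python
import math

# Statistics given in the question
s_e = 3.20
x_bar = 8
n = 42
s_xx = 420          # assumed to be the sum of squared deviations of x
x_new = 18

# Assumption: fitted line y_hat = 6 + 8x
y_hat = 6 + 8 * x_new

t_crit = 2.021      # assumed table value: 40 df, two-sided 95%

leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
ci_half = t_crit * s_e * math.sqrt(leverage)        # mean response
pi_half = t_crit * s_e * math.sqrt(1 + leverage)    # single new observation

print(f"95% CI: {y_hat} +/- {ci_half:.2f}")
print(f"95% PI: {y_hat} +/- {pi_half:.2f}")
```

The prediction interval is wider because it accounts for the variability of an individual observation around the line in addition to the uncertainty in the fitted line itself.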

210. A scatter diagram includes the data points (x = 2, y = 5), (x = 4, y = 12), (x = 6, y
= 20), (x = 8, y = 28), and (x = 10, y = 30). Two regression lines are proposed:
1 yˆ  .6  3x, and yˆ  1  5x. Using the least squares criterion, which of these
regression lines is the better fit to the data? Why?

THE NEXT FOUR QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


The following regression statistics are given: sample size is 40, SST = 1000, and the
correlation between X and Y is 0.70.

211. Compute the coefficient of determination

212. Compute the regression sum of squares, SSR

213. Compute the error sum of squares, SSE

214. Use a simple regression model to test the hypothesis: H 0 : 1  0 vs.
H1 : 1  0 , with α = 0.05.
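
Given only n, SST, and r, the sums of squares follow from R² = r², and the slope test can be run through either F = MSR/MSE or the equivalent t statistic. The sketch below assumes a null hypothesis of zero slope against a two-sided alternative, which is the case the F test addresses; the critical value of roughly 4.1 for F with 1 and 38 degrees of freedom is an assumed table value.

```python
import math

# Statistics given in the question
n = 40
sst = 1000
r = 0.70

r_square = r ** 2              # coefficient of determination
ssr = r_square * sst           # regression sum of squares
sse = sst - ssr                # error sum of squares

mse = sse / (n - 2)
f_stat = (ssr / 1) / mse       # F with 1 and n - 2 df

# Equivalent t statistic computed from the correlation
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_square)

f_crit = 4.1                   # assumed approximate table value, alpha = 0.05

print(f"R^2 = {r_square:.2f}, SSR = {ssr:.0f}, SSE = {sse:.0f}")
print(f"F = {f_stat:.2f}, t = {t_stat:.2f}")
print("reject H0" if f_stat > f_crit else "do not reject H0")
```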

THE NEXT SEVEN QUESTIONS ARE BASED ON THE FOLLOWING INFORMATION:


Consider the following values of variables x and y:
x 97 113 81 68 90 79
y 103 103 105 115 127 104

215. Perform a regression analysis on the data.
