PracticeProblems_FinalExam_Solutions

The document provides a comprehensive review of various statistical analyses, including hypothesis tests related to survival rates on the Titanic, gender disparities in NYPD ranks, and the relationship between acidity levels and biological diversity in streams. It includes detailed tables, test statistics, and conclusions drawn from Chi-square tests and linear regression analyses. The findings indicate significant relationships in each case, highlighting the impact of class, gender, and acidity on survival and diversity outcomes.

Uploaded by

justlikethatkr0

Final Exam Review Questions

1. Review questions for the older material can be found in the previous exams’ review folders.
Additionally, the actual exams are good sources of questions to review. Remember, if it was important
enough to be on one of the early exams, it is still important enough to be on the final.

2. Provided below is information on who survived the sinking of the Titanic based on whether they were
crew members, or passengers booked in first-, second-, or third-class staterooms:
              Crew     First    Second     Third
Alive          212       202       118       178
Dead           673       123       167       528

a) Is there evidence that the chance of surviving was dependent on what class passenger they were?
Conduct the appropriate hypothesis test. Be sure to include the hypotheses, check the conditions,
report the test statistic, degrees of freedom and p-value, reach a decision and provide a conclusion
within the context of the problem.

This output looks like:

Rows: Class   Columns: Status

             Alive      Dead       All

Crew           212       673       885
             285.5     599.5     885.0
            18.915     9.007         *

First          202       123       325
             104.8     220.2     325.0
            90.046    42.879         *

Second         118       167       285
              91.9     193.1     285.0
             7.390     3.519         *

Third          178       528       706
             227.7     478.3     706.0
            10.864     5.173         *

All            710      1491      2201
             710.0    1491.0    2201.0
                 *         *         *

Cell Contents: Count
               Expected count
               Contribution to Chi-square

Pearson Chi-Square = 187.793, DF = 3, P-Value = 0.000
Likelihood Ratio Chi-Square = 178.414, DF = 3, P-Value = 0.000

Ho: There is no relationship between Class and Survival for passengers
Ha: There is a relationship between Class and Survival for passengers

Conditions
Randomization – Probably not satisfied, but that doesn’t really matter here since we don’t want to
make conclusions for a larger population
Expected Cell Counts (middle number in each cell of the two-way table from Minitab) – all > 1 and all > 5

Test Statistic: Chi-Sq = 187.793
DF = 3
p-value ≈ 0 (Minitab reports 0.000, meaning p < 0.001)

Decision: According to the scale of p-values, there is very strong evidence against Ho, so Reject Ho

Conclusion: There is very strong evidence that there was a relationship between Class and Survival Status for
passengers on the Titanic.
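
As a cross-check on the Minitab output, the Pearson statistic can be recomputed from the observed counts alone. The following is a minimal sketch in plain Python, not part of the original solution; the function name `chi_square_table` is hypothetical. It builds each expected count as (row total × column total) / grand total and sums the (observed − expected)² / expected contributions.

```python
def chi_square_table(observed):
    """Return (statistic, df, expected, contributions) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under "no relationship": row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected, contributions


# Rows: Crew, First, Second, Third; columns: Alive, Dead
titanic = [[212, 673], [202, 123], [118, 167], [178, 528]]
statistic, df, expected, contributions = chi_square_table(titanic)

print(f"Chi-Sq = {statistic:.3f}, DF = {df}")
print(f"Smallest expected count = {min(min(row) for row in expected):.1f}")
```

The smallest expected count is what the "all > 5" condition is checked against. The p-value itself is not computed here because that needs a chi-square distribution function, which the standard library does not provide.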

b) Based on the Chi-Square statistic contributions, which cells (i.e., combinations of categories)
contribute the most towards the test statistic. Report the top two cells and their contributions.

The contributions to the Chi-Square test statistic are the bottom entries in each cell in the Two-way table from
the Minitab output.

The two cells that contribute the most to the test statistic are “First Class, Alive” and “First Class, Dead”. The
contribution of “First Class, Alive” is 90.046 and the contribution of “First Class, Dead” is 42.879.

c) Based on your answer to part b, provide a slightly more in-depth conclusion than what you
concluded in part a.

The number of First Class passengers that survived the sinking of the Titanic was much higher than expected (if
there had been no relationship between Class and Survival), while the number of First Class passengers that
died was much lower than expected (again, if there had been no relationship between Class and Survival). It is clear
that the First Class passengers were much more likely to survive the tragedy.

3. The table below shows the rank attained by the male and female officers in the New York City Police
Department (NYPD). Use these data to answer the following questions.

             Officer    Detective     Sergeant   Lieutenant      Captain  Higher Ranks        Total
Male          21,900        4,058        3,898        1,333          359           218       31,766
Female         4,281          806          415           89           12            10        5,613
Total         26,181        4,864        4,313        1,422          371           228       37,379

a) What proportion of the NYPD is female?

$\frac{\text{what we want to know about the group of interest}}{\text{group of interest}} = \frac{\text{\# of NYPD members that are female}}{\text{all NYPD members}} = \frac{5613}{37379} \approx 0.15$

b) What proportion of officers are male?

$\frac{\text{what we want to know about the group of interest}}{\text{group of interest}} = \frac{\text{\# of officers that are male}}{\text{all officers}} = \frac{21900}{26181} \approx 0.836$

c) What proportion of NYPD are female detectives?


$\frac{\text{what we want to know about the group of interest}}{\text{group of interest}} = \frac{\text{\# of NYPD members that are female detectives}}{\text{all NYPD members}} = \frac{806}{37379} \approx 0.022$
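
The three answers above differ only in which count goes on top and which group goes on the bottom. The following minimal sketch (the helper `proportion` is a hypothetical name used for illustration) makes that choice explicit for parts a) through c).

```python
# Counts taken from the NYPD table above
female_total = 5613
nypd_total = 37379
male_officers = 21900
officers_total = 26181
female_detectives = 806


def proportion(count_of_interest, group_size):
    """Count of interest divided by the size of the group it is compared against."""
    return count_of_interest / group_size


p_female = proportion(female_total, nypd_total)                  # part a: out of all NYPD
p_male_given_officer = proportion(male_officers, officers_total) # part b: out of officers only
p_female_detective = proportion(female_detectives, nypd_total)   # part c: out of all NYPD

print(round(p_female, 3), round(p_male_given_officer, 3), round(p_female_detective, 3))
```

Note that part b) uses the Officer column total as the denominator, while parts a) and c) use the grand total.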

d) Is there evidence of differences in ranks attained by males and females? Conduct the appropriate
hypothesis test. Be sure to include the hypotheses, check the conditions, report the test statistic,
degrees of freedom and p-value, reach a decision and provide a conclusion within the context of
the problem.

Tabulated statistics: Sex, Rank

Using frequencies in Count

Rows: Sex   Columns: Rank

              Captain    Detective Higher Ranks   Lieutenant      Officer     Sergeant          All

Female             12          806           10           89         4281          415         5613
                 55.7        730.4         34.2        213.5       3931.5        647.7       5613.0
                34.30         7.82        17.16        72.63        31.08        83.58            *

Male              359         4058          218         1333        21900         3898        31766
                315.3       4133.6        193.8       1208.5      22249.5       3665.3      31766.0
                 6.06         1.38         3.03        12.83         5.49        14.77            *

All               371         4864          228         1422        26181         4313        37379
                371.0       4864.0        228.0       1422.0      26181.0       4313.0      37379.0
                    *            *            *            *            *            *            *

Cell Contents: Count
               Expected count
               Contribution to Chi-square

Pearson Chi-Square = 290.131, DF = 5, P-Value = 0.000
Likelihood Ratio Chi-Square = 343.887, DF = 5, P-Value = 0.000

Ho: There is no relationship between rank and sex.


Ha: There is a relationship between rank and sex.

Conditions
Randomization – not a random sample; if we are only interested in the NYPD, then this is not a problem. If we
want to extend our conclusions to a larger population, we might not be able to do that.
Expected Cell Counts – OK (the smallest expected count is 34.2, so all are greater than 5)

Test Statistic: Chi-Sq = 290.131
df = 5
p-value ≈ 0 (Minitab reports 0.000, meaning p < 0.001)

Decision: According to the scale of p-values, there is very strong evidence against Ho, so Reject Ho

Conclusion: There is very strong evidence that there is a relationship between rank and sex (in the NYPD); there
is evidence of differences in ranks attained by males and females.
For practice (follow-up analysis): The cell that contributes the most to the test statistic is Female Sergeants
(contribution = 83.579). The observed count (415) is well below the expected count (647.7), which indicates that
there are fewer Female Sergeants than we would expect if the null hypothesis of no relationship between rank and
sex were true.
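
The follow-up analysis can be sketched as a ranking of cells by their contribution, along with the direction of each departure from the expected count. This is a minimal illustration in plain Python (variable names are hypothetical), not the Minitab procedure itself.

```python
ranks = ["Officer", "Detective", "Sergeant", "Lieutenant", "Captain", "Higher Ranks"]
counts = {
    "Male": [21900, 4058, 3898, 1333, 359, 218],
    "Female": [4281, 806, 415, 89, 12, 10],
}

col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(col_totals)

cells = []
for sex, row in counts.items():
    row_total = sum(row)
    for rank, observed, col_total in zip(ranks, row, col_totals):
        expected = row_total * col_total / grand_total
        contribution = (observed - expected) ** 2 / expected
        direction = "fewer" if observed < expected else "more"
        cells.append((contribution, sex, rank, direction))

cells.sort(reverse=True)  # largest contribution first
for contribution, sex, rank, direction in cells[:3]:
    print(f"{sex} {rank}: contribution {contribution:.2f} ({direction} than expected)")
```

The contribution alone says how far a cell is from its expected count; the comparison of observed with expected is what gives the direction needed for the in-context conclusion.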

4. Biologists studying the effects of acid rain on wildlife collected data from 163 streams in the
Adirondack Mountains. They recorded the pH (acidity) of the water and the BCI, a measure of
biological diversity. The data are located in acid_rain.mtw. Use Minitab to construct a linear
regression model that uses the acidity of the water to predict the biological diversity.

This says that “acidity” is the explanatory variable – it is being used to predict (explain) the other variable.
It also says that “biological diversity” is the response – the variable being predicted (explained).

a) What is the response variable? What is the explanatory variable?

Response Variable = BCI (the measure of biological diversity) – this is the variable being predicted
(explained)
Explanatory Variable = pH (acidity) – the variable used to explain/predict biological diversity

b) Provide an interpretation of the estimated slope within the context of the problem.

An increase of 1 in the pH level is associated with a decrease of 197.69 in the predicted BCI.

c) We wish to determine if there is evidence that the acidity level is linearly related to the biological
diversity for streams in the Adirondacks. State the appropriate null and alternative hypotheses.

Ho: β1 = 0 (There is no linear relationship between acidity level and biological diversity.)
Ha: β1 ≠ 0 (There is a linear relationship between acidity level and biological diversity.)

To get Minitab output to answer the remaining questions:


Stat > Regression > Regression > Fit Regression Model; BCI is the response and pH is the “continuous” predictor
Click “Graphs” and select “Histogram of Residuals” and “Residual versus Fits”.

d) Is there any evidence to suggest that inference on using a linear regression model is not appropriate?
Indicate what you use to justify your answer.

This means to check the conditions!


[Figure: Histogram of residuals (response is BCI); x-axis: Residual, y-axis: Frequency]

Linearity – Satisfied (no curve in versus fits)


Independence - ? It is not clear how the data were collected – could possibly have a violation of this
condition
Normality – Satisfied (there may be a couple of outliers on “the low end”, but doesn’t look too bad)
Equal S.D. – Maybe a slight funnel in versus fits

Regression Analysis: BCI versus pH


Model Summary

S R-sq R-sq(adj) R-sq(pred)


140.437 27.07% 26.62% 25.27%

Coefficients

Term Coef SE Coef T-Value P-Value VIF


Constant 2733 188 14.55 0.000
pH -197.7 25.6 -7.73 0.000 1.00

Regression Equation

BCI = 2733 - 197.7 pH

e) Report the test statistic, degrees of freedom, and p-value associated with testing the hypotheses
from part c.

Test statistic: t = -7.73
df = n – 2 = 163 – 2 = 161
p-value ≈ 0 (Minitab reports 0.000, meaning p < 0.001)
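
The t-value in the Coefficients table is the estimated slope divided by its standard error, which can be verified with a short sketch (plain Python; variable names are chosen here for illustration, and the inputs are the more precise values from the second Minitab table in part g).

```python
n = 163            # number of streams
slope = -197.69    # Coef for pH
se_slope = 25.57   # SE Coef for pH

t_statistic = slope / se_slope
df = n - 2         # two estimated parameters: intercept and slope

print(f"t = {t_statistic:.2f}, df = {df}")
```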

f) Use the p-value to reach a decision and provide a conclusion within the context of the problem.

According to the scale of p-values, we have very strong evidence to Reject Ho and conclude that there is
strong evidence that acidity level (pH) is linearly related to biological diversity (BCI).

g) Estimate, with 95% confidence, the population slope coefficient. Also provide an interpretation of
your confidence interval.
To construct a confidence interval for one of the regression parameters: $b_i \pm t^* \cdot SE_{b_i}$

Predictor      Coef   SE Coef       T       P
Constant     2733.4     187.9   14.55   0.000
pH          -197.69     25.57   -7.73   0.000

Here $b_1 = -197.69$ and $SE_{b_1} = 25.57$ (the pH row), so

$b_1 \pm t^* \cdot SE_{b_1} = -197.69 \pm 1.984(25.57) = (-248.42,\ -146.96)$

Note that t* = 1.984 comes from StatKey with df = 161.

With 95% confidence, as pH increases by 1, the average BCI decreases by between 146.96 and 248.42.
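
The interval arithmetic can be reproduced with a minimal sketch. The function name `slope_confidence_interval` is hypothetical, and t* = 1.984 is simply taken from the solution above rather than computed.

```python
def slope_confidence_interval(estimate, standard_error, t_star):
    """Return (lower, upper) for estimate ± t* × SE."""
    margin = t_star * standard_error
    return estimate - margin, estimate + margin


lower, upper = slope_confidence_interval(-197.69, 25.57, 1.984)
contains_zero = lower <= 0 <= upper

print(f"({lower:.2f}, {upper:.2f}), contains zero: {contains_zero}")
```

Checking whether the interval contains zero is the same question as the two-sided test of the slope at the matching significance level.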

h) Does your confidence interval from part g provide evidence that increased acidity is associated with
decreased biological diversity? Briefly explain why or why not.

“Increased acidity is associated with decreased biological diversity” = Negative Slope


This confidence interval does provide some evidence that there is a negative relationship between
acidity and biological diversity because it only contains negative numbers (we are 95% confident that
the true slope relating acidity level and biological diversity for the population is between -248.42 and
-146.96).

i) What proportion of variation in biological diversity does the linear regression explain?

R² = 27.1%
Interpretation of R²: About 27% of the variation in biological diversity (BCI) is explained by this
linear regression model.

Review: Recall that you can calculate r (the correlation between acidity and BCI) using R² (just pay
attention to the sign on the slope!).

$r = -\sqrt{0.271} = -0.521$

Note that you need to convert the percent given to you by Minitab to a proportion before you do
any calculations with it. Don’t forget the negative sign (always be sure that the sign on your
correlation matches the sign on the slope!)!
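
The two steps in the review note (convert the percent to a proportion, then attach the slope's sign) can be sketched as follows. The function name `correlation_from_r_squared` is hypothetical, and the shortcut applies only to simple linear regression with one predictor.

```python
import math


def correlation_from_r_squared(r_squared_percent, slope):
    """Recover r from R-sq (given as a percent) and the sign of the fitted slope."""
    r_squared = r_squared_percent / 100          # percent -> proportion
    return math.copysign(math.sqrt(r_squared), slope)


r = correlation_from_r_squared(27.1, -197.69)
print(round(r, 3))
```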

j) Predict the BCI for a neutral stream (i.e., a stream with pH of 7).

Using Minitab: Stat > Regression > Regression > Predict; Enter 7 in the “pH” box:

Regression Equation

BCI = 2733 - 197.7 pH

Variable Setting
pH 7

Fit SE Fit 95% CI 95% PI


1349.51 13.9225 (1322.02, 1377.00) (1070.82, 1628.21)

By Hand:

$\widehat{BCI} = 2733 - 198(7) = 1347$

5. Does your IQ depend on the size of your brain? A group of female college students took a test that
measured their verbal IQs and also underwent an MRI scan to measure the size of their brains (in
1000s of pixels). The data is located in brain_size.mtw. Perform the appropriate hypothesis test to
determine if brain size and IQ are linearly related. Be sure to report all the necessary parts of the
hypothesis test.
“Does your IQ depend on the size of your brain?” – this means “Can IQ be explained by” something else, which
implies that IQ is the Response variable!
“Determine if brain size and IQ are linearly related” – this means to perform a hypothesis test about the
slope (β1)!

[Figure: Histogram of residuals (response is VerbalIQ); x-axis: Residual, y-axis: Frequency]

Regression Analysis: VerbalIQ versus Brain Size


Model Summary

S R-sq R-sq(adj) R-sq(pred)


21.5291 6.50% 1.30% 0.00%

Coefficients

Term Coef SE Coef T-Value P-Value VIF


Constant 24.2 76.4 0.32 0.755
Brain Size 0.0988 0.0884 1.12 0.278 1.00

Regression Equation

VerbalIQ = 24.2 + 0.0988 Brain Size

Ho: β1 = 0 (There is no linear relationship between brain size and IQ)
Ha: β1 ≠ 0 (There is a linear relationship between brain size and IQ)

Conditions
Linearity - Satisfied (no major curved pattern in the “Versus Fits” plot – though there may be a couple
of outliers)
Independence - We don’t know much about how the data were collected, but it is probably safe to
assume that one individual’s IQ does not affect another’s
Normality – Not great! There is a bimodal pattern in the histogram. This condition could be violated,
but it really is hard to tell with such a small sample size (but we’ll proceed anyway)
Equal S.D. – Satisfied (no funnel pattern in the “Versus Fits” plot)

Test Statistic: t = 1.12


df = n – 2 = 20 – 2 = 18 (you need to look at the Minitab file to figure out how many observations there
are – there are 20 rows in the Minitab file)
p-value = 0.278

Decision: According to the scale of p-values, we have no evidence to reject Ho (Fail to Reject Ho)

Conclusion There is no evidence that brain size and IQ are linearly related.

6. Located in the file fifty_states.mtw are various measurements on the 50 United States from several
years ago. The murder rate is per 100,000, HS graduation rate is in %, income is per capita income in
dollars, illiteracy rate is per 1000, and life expectancy is in years.

The variable being predicted is always the response variable. Here, life expectancy is being
predicted by the remaining variables.

a) Fit a regression model that predicts the life expectancy based on all of the other available variables
and answer the following questions. Note that you may wish to read through these parts first so
you only have to get Minitab output once.

To obtain the appropriate regression output using Minitab:


Stat > Regression > Regression > Fit Regression – Enter Life Expectancy as the response and the
remaining variables as the “Continuous” predictors
Select “Graphs” and choose “Histogram of Residuals” and “Residuals versus Fits”.

i. Report the regression equation.

Regression Equation

Life exp = 69.48 - 0.2619 Murder + 0.0461 HSGrad + 0.000125 Income + 0.276 Illiteracy

ii. Which variables are statistically significant at the α = 0.05 level?

From the Minitab output:

Term Coef SE Coef T-Value P-Value VIF


Constant 69.48 1.33 52.43 0.000
Murder -0.2619 0.0445 -5.89 0.000 2.04
HSGrad 0.0461 0.0218 2.11 0.040 2.36
Income 0.000125 0.000242 0.52 0.608 1.67
Illiteracy 0.276 0.311 0.89 0.379 2.71
Of the four predictors in the model (boxed in output above), only Murder and HSGrad have p-values less
than α = 0.05. Thus Murder and HSGrad are the only variables that are statistically significant at the
α = 0.05 level.

iii. How many degrees of freedom are associated with these tests?

# of terms added together in the model = 5 (the constant plus four predictors)
df = n – (# of terms) = 50 – 5 = 45

iv. Predict the life expectancy for Vermont. Hint: you need to identify the value of each
predictor for Vermont (find Vermont in the Minitab worksheet).

First, locate Vermont in the Minitab worksheet and identify the value of each predictor for Vermont:
Murder = 5.5, HSgrad = 57.1, Income = 3907, Illiteracy = 0.6

You can use Minitab to make the prediction after fitting the model: Stat > Regression > Regression >
Predict, enter the values of the predictors for Vermont in the appropriate columns:

Regression Equation

Life exp = 69.48 - 0.2619 Murder + 0.0461 HSGrad + 0.000125 Income + 0.276 Illiteracy

Variable     Setting
Murder       5.5
HSGrad       57.1
Income       3907
Illiteracy   0.6

It is generally a good idea to look at this section of output to make sure that you entered the correct
value for each predictor (especially in Multiple regression, you want to make sure that you put the values
in the columns in the same order that they appear in the model).

Fit SE Fit 95% CI 95% PI


71.3313 0.242121 (70.8436, 71.8190) (69.6383, 73.0243)

The predicted life expectancy for Vermont is 71.331.

By Hand:

$\widehat{LifeExp} = 69.5 - 0.262(5.5) + 0.0461(57.1) + 0.000125(3907) + 0.276(0.6) = 71.35$

v. Calculate the residual for Vermont.

$e = y - \hat{y} = 71.64 - 71.331 = 0.309$


(You could also use the prediction made by hand.)
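
The by-hand prediction and residual can be sketched with the coefficients keyed by predictor name, which guards against the ordering mistake mentioned above. This is an illustrative sketch using the rounded coefficients from the printed regression equation, so its results differ slightly from Minitab's fit of 71.3313 and the residual of 0.309.

```python
intercept = 69.48
coefficients = {"Murder": -0.2619, "HSGrad": 0.0461, "Income": 0.000125, "Illiteracy": 0.276}

# Vermont's values from the worksheet, and its observed life expectancy
vermont = {"Murder": 5.5, "HSGrad": 57.1, "Income": 3907, "Illiteracy": 0.6}
observed_life_exp = 71.64

# Matching by name means each value is multiplied by the right coefficient
predicted = intercept + sum(coefficients[name] * vermont[name] for name in coefficients)
residual = observed_life_exp - predicted   # observed minus predicted

print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```

A positive residual means Vermont's observed life expectancy is higher than the model predicts.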

vi. Check the conditions for regression inference. Are they all met? Comment on the validity
of each condition.
[Figure: Histogram of residuals (response is Life exp); x-axis: Residual, y-axis: Frequency]

Linearity – Satisfied (no curve in the “Versus Fits” plot)


Independence – Satisfied (One state’s life expectancy should not seriously influence another’s)
Normality – Satisfied (Histogram looks a little funky, but not terribly bad)
Equal SD – Satisfied (No funnel pattern in the “Versus Fits” plot)

vii. What percent of variation in life expectancy can be explained by the regression?

R2 = 67%
Interpretation of R2: 67% of the variation in life expectancy can be explained by this
regression model.

b) Fit a regression that only uses the statistically significant explanatory variables as the predictors.

This model should only include Murder and HSGrad. These were the only variables that were significant
at the α = 0.05 level (both had p-values less than α = 0.05).

Regression Analysis: Life exp versus Murder, HSGrad


Model Summary

S R-sq R-sq(adj) R-sq(pred)


0.795872 66.28% 64.85% 60.71%

Coefficients

Term Coef SE Coef T-Value P-Value VIF


Constant 70.30 1.02 69.21 0.000
Murder -0.2371 0.0353 -6.72 0.000 1.31
HSGrad 0.0439 0.0161 2.72 0.009 1.31

Regression Equation

Life exp = 70.30 - 0.2371 Murder + 0.0439 HSGrad

i. What percent of variation in life expectancy can be explained by this regression? Has it
changed much from before?
R² = 66.3%
Interpretation of R²: About 66% of the variation in life expectancy can be explained by this
regression model. This is only slightly lower than the 67% explained by the model with all four
predictors, so it has not changed much.

ii. Predict the life expectancy for Vermont.

Regression Equation

Life exp = 70.30 - 0.2371 Murder + 0.0439 HSGrad

Variable Setting
Murder 5.5
HSGrad 57.1

Fit SE Fit 95% CI 95% PI


71.4991 0.130539 (71.2364, 71.7617) (69.8766, 73.1215)

The predicted life expectancy for Vermont (with Murder = 5.5 and HSGrad = 57.1) is 71.499.

iii. Is the prediction for Vermont based on this model much different than in the previous
model?

This is not much different from the prediction made with the larger model (containing all four
predictors). Because Income and Illiteracy were not statistically significant, removing them from
the regression model did not hurt our predictions.
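
One rough way to judge "not much different" is to compare the gap between the two fits with the size of a typical residual (S) and with the full model's confidence interval. The sketch below uses only numbers from the Minitab output above; the comparison itself is an informal heuristic added for illustration, not a formal test.

```python
full_fit = 71.3313                 # Fit from the four-predictor model
full_ci = (70.8436, 71.8190)       # 95% CI from the four-predictor model
reduced_fit = 71.4991              # Fit from the Murder + HSGrad model
reduced_s = 0.795872               # S from the reduced model's Model Summary

difference = reduced_fit - full_fit
ratio_to_s = difference / reduced_s
inside_full_ci = full_ci[0] <= reduced_fit <= full_ci[1]

print(f"difference = {difference:.4f} years ({ratio_to_s:.2f} of S)")
print(f"reduced fit inside full model's 95% CI: {inside_full_ci}")
```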

7. How accurately can we predict the number of calories in a fast food burger based on other nutritional
information? To answer this, we will investigate the data located in the file burgers.mtw. We have
information on 96 different hamburger sandwiches.

Calories is the response variable because we are using all of the remaining variables to
explain/predict the number of calories

a) Fit a regression model that predicts the number of calories in a burger from the serving size (in
grams), the grams of Total Fat, grams of Saturated Fat, grams of Trans Fats, Carbs (g) and Sodium
(mg). Write out the estimated regression equation.

To obtain the appropriate regression output using Minitab:


Stat > Regression > Regression > Fit Regression Model – Enter Calories as the response and the
remaining variables as the “Continuous” predictors
Select “Graphs” and choose “Histogram of Residuals” and “Residual versus Fits”.

The regression equation is


Calories = 11.3 + 0.545 Serving Size + 6.42 Total Fat + 7.14 Saturated Fat + 4.96 Trans Fat
+ 2.81 Carbs + 0.0205 Sodium

b) The dataset contains 96 different hamburgers; however, not all are being used in the regression
because several restaurants do not report trans fats. How many burgers are being used in this
regression? Hint: Although we didn’t cover this in class, the answer is already in your output, just
look through it.
The following phrase appears in the Minitab output, directly below the regression equation:

Rows unused 25

There are 96 rows in the dataset, 25 of which did not get used. Thus, 96 – 25 = 71 burgers were
used to fit this model.

c) Check the conditions for inference. Briefly discuss what tool you use to check each condition.

[Figure: Histogram of residuals (response is Calories); x-axis: Residual, y-axis: Frequency]

Linearity – Satisfied (no curved pattern appears in the “Versus Fits” plot)
Independence – Probably not! (Example: burgers from McDonald’s might be related)
Normality – Satisfied (Histogram is roughly unimodal and symmetric – a couple of low observations,
but not major outliers)
Equal SD – Satisfied (no funnel pattern in the “Versus Fits” plot)

d) Which predictor variables are not significant at the α = 0.10 level? Report the variable and the
corresponding p-value.

Term Coef SE Coef T-Value P-Value VIF


Constant 11.3 12.7 0.89 0.376
Serving Size 0.5452 0.0797 6.84 0.000 6.40
Total Fat 6.416 0.636 10.09 0.000 17.87
Saturated Fat 7.14 1.36 5.24 0.000 12.46
Trans Fat 4.96 3.94 1.26 0.212 2.09
Carbs 2.809 0.373 7.53 0.000 2.69
Sodium 0.0205 0.0116 1.77 0.082 3.38

There is one variable that is not significant at the α = 0.10 level: Trans Fat with p-value 0.212.

e) How many degrees of freedom are used for the above tests?

df = 71 – 7 = 64 (remember n is the number of observations used to fit the model, which in this case
is 71, because of the missing values for some variables and there are 7 terms added together in the
model).
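
The bookkeeping behind parts b) and e) is sketched below in plain Python (variable names are illustrative): rows with missing values are dropped first, and the degrees of freedom are then based on the rows actually used.

```python
rows_in_dataset = 96
rows_unused = 25   # "Rows unused" line in the Minitab output
predictors = ["Serving Size", "Total Fat", "Saturated Fat", "Trans Fat", "Carbs", "Sodium"]

n_used = rows_in_dataset - rows_unused
terms = len(predictors) + 1      # + 1 for the constant
df_error = n_used - terms

print(f"n used = {n_used}, terms = {terms}, df = {df_error}")
```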

f) Does this regression model seem to do a good job in predicting the number of calories? Support
your answer using the appropriate number.
R-Sq = 99.2%

Interpretation of R2: About 99% of the variation in the number of calories in burgers (from these
restaurants) is explained by this regression model.

The high value of R2 indicates that this is a pretty darn good model!

g) Now fit a new model removing trans fat as a predictor. Does it appear that this model can
accurately predict the number of calories in a burger? Briefly explain why or why not.

Regression Analysis: Calories versus Serving Size, Total Fat, ...

Model Summary

S R-sq R-sq(adj) R-sq(pred)


27.7610 98.94% 98.89% 98.74%

Coefficients

Term Coef SE Coef T-Value P-Value VIF


Constant 28.1 11.2 2.50 0.014
Serving Size 0.3666 0.0814 4.51 0.000 7.25
Total Fat 7.241 0.574 12.62 0.000 17.77
Saturated Fat 5.00 1.21 4.14 0.000 13.19
Carbs 3.396 0.281 12.08 0.000 1.73
Sodium 0.0275 0.0123 2.23 0.028 4.55

Regression Equation

Calories = 28.1 + 0.3666 Serving Size + 7.241 Total Fat + 5.00 Saturated Fat
+ 3.396 Carbs + 0.0275 Sodium

S = 27.7610 R-Sq = 98.9% R-Sq(adj) = 98.9%

For the new model, R2 = 98.9%, which is really close to what it was for the model containing Trans
Fat. This model seems to be able to accurately predict the number of calories in a burger.

h) Which model (the one with trans fat or the one without trans fat) do you prefer? Briefly explain
why you prefer the model you’ve chosen.

The model without trans fat would be preferred. It explains nearly as much of the variation in the
number of calories as the model with trans fat, but

1. It uses ALL of the data (more information and more degrees of freedom!)
2. It is easier to use for predictions (fewer numbers to plug in)

8. Are burgers at some restaurants healthier than at others? One way of answering this is to compare
the mean calories in a burger at the different restaurants. We will again use burgers.mtw for this
problem.
You are comparing means for several different (more
than 2) restaurants, that means ANOVA! Look at the
Minitab file to see how many restaurants there are!
a) Write the hypotheses associated with testing whether there is some difference in the mean
number of calories for burgers from the different restaurants.
Ho: $\mu_{AW} = \mu_{BK} = \mu_{C\,Jr} = \mu_{DQ} = \mu_{DT} = \mu_{H} = \mu_{In} = \mu_{Jack} = \mu_{McD} = \mu_{S} = \mu_{W} = \mu_{WC}$
Ha: not all of the means are equal

To obtain the ANOVA output for this problem:


Stat > ANOVA > One-Way
Enter Calories as the response and Fast Food Restaurant as the Factor (explanatory variable).
“Graphs” – select “Boxplots of Data”
“Comparisons” – select “Tukey” (You will use this if you reject your null hypothesis)

b) Check the conditions associated with the ANOVA F test. Be sure to indicate any concerns that you
may have about whether these data pass the conditions, but continue on with the later parts of the
problem assuming that the conditions passed.

Random and Independent Samples – Not Really!


Normality – Maybe not! Lots of outliers
Equal SD – No Way! (Look at the sample standard deviations)

[Figure: Boxplot of Calories by Fast Food Restaurant; x-axis: Fast Food Restaurant, y-axis: Calories (200 to 1600)]

c) Report the appropriate test statistic and the degrees of freedom associated with it.

Source                 DF    Adj SS   Adj MS   F-Value   P-Value
Fast Food Restaurant   11   2270669   206424      4.03     0.000
Error                  84   4300427    51196
Total                  95   6571096

Test Statistic: F = 4.03


df1 = 11, df2 = 84
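
The F statistic and both degrees of freedom can be rebuilt from the sums of squares in the ANOVA table. A minimal sketch in plain Python (variable names are illustrative):

```python
ss_between, df_between = 2270669, 11   # "Fast Food Restaurant" row
ss_error, df_error = 4300427, 84       # "Error" row

ms_between = ss_between / df_between   # mean square = SS / DF
ms_error = ss_error / df_error
f_statistic = ms_between / ms_error

groups = df_between + 1                # df1 = (number of groups) - 1
n_total = df_between + df_error + 1    # total DF = n - 1

print(f"F = {f_statistic:.2f} with df1 = {df_between}, df2 = {df_error}")
print(f"{groups} restaurants, {n_total} burgers")
```

Working backwards from the degrees of freedom recovers the 12 restaurants and 96 burgers in the data set, which is a quick sanity check that the right columns were entered.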

d) Report the p-value and use it to make a decision.

p-value ≈ 0 (Minitab reports 0.000) < α = 0.05, so Reject Ho

e) Provide a conclusion in the context of the problem.

There is strong evidence that at least one of the restaurants has a mean number of calories (in
its burgers) that is different from the others.
f) There is a Burger King, a McDonald's, and a Dairy Queen in Canton. For each pair of restaurants,
report the interval showing the estimated difference in the mean number of calories. Also indicate
which, if any, restaurant has the statistically significantly higher mean number of calories. If none
of these restaurants are significantly higher than the others, simply state that they are not
statistically different.

Because there are so many restaurants represented in this data set, there are LOTS of possible pairwise
comparisons. However, we just want to focus on comparisons of the mean number of calories in burgers for
Burger King, McDonalds and Dairy Queen (can ignore the rest of the multiple comparisons output).

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

Fast Food Restaurant    N    Mean   Grouping
Carl's Jr.             11   912.7   A
Hardee's               12   885.8   A
Jack in the Box        11   878.2   A B
Burger King            13   752.3   A B
Wendy's                 4   735     A B C
Dairy Queen            11   705.5   A B
A&W                     4   702.5   A B C
Sonic                  15   695.3   A B
In-N-Out Burger         1   670.0   A B C
Del Taco                2   585.0   A B C
McDonald's              7   515.7     B C
White Castle            5   288.0       C

Means that do not share a letter are significantly different.

Because each pair of BK, DQ, and McD’s has a letter in common (all three share the letter B), none of these
pairs have average calorie contents that are significantly different.
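
The "share a letter" rule can be sketched as a set intersection over the grouping letters. This is a minimal illustration in plain Python; the function name `significantly_different` is hypothetical, and the letters are copied from the Tukey grouping table above.

```python
from itertools import combinations

# Grouping letters for the three Canton restaurants
grouping = {"Burger King": "AB", "Dairy Queen": "AB", "McDonald's": "BC"}


def significantly_different(letters_a, letters_b):
    """Two means differ significantly only if they share no grouping letter."""
    return not set(letters_a) & set(letters_b)


results = {
    (a, b): significantly_different(grouping[a], grouping[b])
    for a, b in combinations(grouping, 2)
}

for (a, b), differs in results.items():
    verdict = "significantly different" if differs else "not significantly different"
    print(f"{a} vs {b}: {verdict}")
```

By the same rule, Carl's Jr. (A) and White Castle (C) share no letter, so that pair would be flagged as significantly different.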
