STAT7055 Spring Session 2017
Topic 1 Tutorial Questions
1. Data was collected on 105 homes in Canberra in 2003. For each house, the following
information was collected: the estimated price of the house (in dollars); the number of
bedrooms; the size of the house (in square metres); whether or not a pool was present (yes
or no); the distance from Civic; the rating of the insulation in the house (none, average
or high); the suburb; the number of bathrooms; and the type of internet connectivity
available (dialup, ADSL or the NBN, where dialup is the slowest connection and the NBN
is the fastest). Classify each variable as either nominal, ordinal, discrete or continuous.
2. You work in a country where every resident plays a sport every day. However, the only
two sports played are table tennis (when it is raining) and golf (when it is sunny). Your
job is to provide statistical analysis to the management of a company that sells
“ping-pong” (table tennis) balls directly through the internet. Over the past eight months you
have collected the following data:
Month   Marketing         Number of    Number of
        expenditure ($)   rainy days   sales
1       4150              6            778
2       3000              10           779
3       2500              25           4200
4       10600             2            250
5       12000             7            300
6       8000              20           6000
7       1500              18           1500
8       6850              9            500
For this data, the sample coefficients of variation for marketing expenditure, number of
rainy days per month, and number of sales have been calculated to be 0.642849, 0.656009,
and 1.194023, respectively.
(a) The marketing manager has told you that it simply makes sense that there is a
strong and positive correlation between marketing expenditure and the number
of sales made. Provide some analysis regarding this relationship. What do you
conclude from your results?
(b) Using the data above, calculate the correlation coefficient between the number of
rainy days per month and the number of sales. The covariance between the number
of rainy days per month and the number of sales has been calculated as 14012.23.
(c) What does the result in (b) suggest? Provide a potential reason for this
result.
Try using R to calculate the sample correlation coefficients from the raw data given in
the table.
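The tutorial suggests R, where the built-in cor() function returns the sample correlation directly. As an alternative illustration, the following is a minimal Python sketch of the same calculation from the raw data in the table. The helper names sample_cov and sample_corr are hypothetical, chosen only for this example, and the n − 1 denominator is assumed to match the sample definitions used in the lectures.

```python
from statistics import mean, stdev

# Raw data from the table (months 1 to 8).
marketing = [4150, 3000, 2500, 10600, 12000, 8000, 1500, 6850]
rainy_days = [6, 10, 25, 2, 7, 20, 18, 9]
sales = [778, 779, 4200, 250, 300, 6000, 1500, 500]


def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def sample_corr(x, y):
    """Sample correlation: covariance divided by the product of the sample SDs."""
    return sample_cov(x, y) / (stdev(x) * stdev(y))


cov_rain_sales = sample_cov(rainy_days, sales)
r_rain_sales = sample_corr(rainy_days, sales)
r_marketing_sales = sample_corr(marketing, sales)

print("cov(rainy days, sales):", round(cov_rain_sales, 2))
print("r(rainy days, sales):  ", round(r_rain_sales, 4))
print("r(marketing, sales):   ", round(r_marketing_sales, 4))
```

The covariance printed here can be compared against the value quoted in part (b) as a check that the data were entered correctly.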
3. A quality control officer in a chocolate factory records the number of minutes it takes
for the company’s signature chocolate bar to melt at room temperature. He recorded
the following 11 times for 11 different chocolate bars:
14 20 20 12 9 13 35 12 11 12 46
(a) Calculate the mean, mode and median of the times.
(b) It turned out that the quality control officer occasionally fell asleep while recording
the time for a chocolate bar to melt, leading to some incorrectly large melting times.
Based on this information, which would be a better measure of central tendency for
this data, the mean or the median?
(c) Calculate the IQR of the times.
(d) Calculate the difference between the 60th percentile and the 10th percentile.
(e) To what percentile does a time of 15.5 minutes correspond?
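As a way of checking hand calculations for part (a), the following minimal Python sketch computes the three measures with the standard library's statistics module. It also recomputes the mean and median after dropping the two largest times, which is an assumption made here purely to illustrate how each measure reacts to suspect values. Percentile conventions differ between textbooks and software, so parts (c) to (e) are best done with the method given in lectures.

```python
from statistics import mean, median, mode

# Melting times (minutes) for the 11 chocolate bars.
times = [14, 20, 20, 12, 9, 13, 35, 12, 11, 12, 46]

mean_all = mean(times)
median_all = median(times)
mode_all = mode(times)

# Illustrative assumption: treat the two largest times as suspect and drop them.
trimmed = sorted(times)[:-2]
mean_trimmed = mean(trimmed)
median_trimmed = median(trimmed)

print("all data:  mean", mean_all, "median", median_all, "mode", mode_all)
print("trimmed:   mean", mean_trimmed, "median", median_trimmed)
```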
4. There is a shortcut version for calculating the sample variance given by the following
formula:
$$ s^2 = \frac{1}{n-1}\left( \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} \right) $$
Show that this is equivalent to the definition given in the lectures. In other words, show
that:
$$ \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left( \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} \right) $$
Bonus: Show that the shortcut version of the sample covariance given below is equivalent
to the definition given in lectures.
$$ s_{XY} = \frac{1}{n-1}\left( \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n} \right) $$
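A numerical check is not a proof, but it can confirm that the two forms agree before the algebra is attempted. The following minimal Python sketch evaluates the definition and the shortcut on data from this tutorial. All function names are hypothetical, and pairing the melting times with the frog weights is an arbitrary choice made only to have two lists of equal length for the covariance check.

```python
from statistics import mean

def variance_definition(x):
    """Sample variance as the sum of squared deviations over n - 1."""
    xbar = mean(x)
    return sum((v - xbar) ** 2 for v in x) / (len(x) - 1)

def variance_shortcut(x):
    """Sample variance via the sum of squares and the squared sum."""
    n = len(x)
    return (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)

def covariance_definition(x, y):
    """Sample covariance as the sum of cross-deviations over n - 1."""
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)

def covariance_shortcut(x, y):
    """Sample covariance via the sum of products and the product of sums."""
    n = len(x)
    return (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)

# Data taken from questions 3 and 5; the pairing is arbitrary (illustration only).
times = [14, 20, 20, 12, 9, 13, 35, 12, 11, 12, 46]
weights = [13, 26, 22, 16, 18, 28, 14, 15, 15, 17, 25]

var_def = variance_definition(times)
var_short = variance_shortcut(times)
cov_def = covariance_definition(times, weights)
cov_short = covariance_shortcut(times, weights)

print("variance:   definition", var_def, "shortcut", var_short)
print("covariance: definition", cov_def, "shortcut", cov_short)
```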
5. The Hula painted frog is an extremely rare species of frog that was thought to be extinct
but was rediscovered in 2011. Only 11 are believed to be living in the wild. Suppose the
weights of these 11 frogs are known and given in the table below (in grams):
13 26 22 16 18 28 14 15 15 17 25
(a) Calculate the population variance of these 11 frogs.
Suppose now we take five random samples of size four from this population, with each
new sample being taken after returning the previous sample to the population. The five
samples, along with some sample statistics, are listed below:
Sample            $\bar{X}$   $\sum_{i=1}^{n} X_i^2$
13, 22, 18, 16    17.25       1233
26, 15, 17, 15    18.25       1415
14, 18, 15, 25    18          1370
25, 14, 16, 17    18          1366
13, 26, 25, 18    20.5        1794
(b) Calculate the sample variance for each of the five samples.
(c) Calculate the sample variance for each of the five samples, but this time using n as
the denominator, instead of n − 1. That is, calculate:
$$ s^{*2} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 $$
(d) Calculate the average of the five sample variances in part (b) and the average of
the five sample variances in part (c). What do you notice?
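The calculations in parts (a) to (d) can be checked with a short script. The following is a minimal Python sketch, not a prescribed method. The helper sum_sq_dev is a hypothetical name, and the script works from the raw sample values rather than the summary columns in the table.

```python
# Frog weights (grams): the full population of 11.
population = [13, 26, 22, 16, 18, 28, 14, 15, 15, 17, 25]

# The five samples of size four listed in the table.
samples = [
    [13, 22, 18, 16],
    [26, 15, 17, 15],
    [14, 18, 15, 25],
    [25, 14, 16, 17],
    [13, 26, 25, 18],
]

def sum_sq_dev(x):
    """Sum of squared deviations from the mean (hypothetical helper)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x)

# Part (a): population variance divides by the population size N.
pop_var = sum_sq_dev(population) / len(population)

# Part (b): sample variances with the n - 1 denominator.
s2 = [sum_sq_dev(s) / (len(s) - 1) for s in samples]

# Part (c): the same sums of squares divided by n instead.
s2_star = [sum_sq_dev(s) / len(s) for s in samples]

# Part (d): average each set of five values.
avg_s2 = sum(s2) / len(s2)
avg_s2_star = sum(s2_star) / len(s2_star)

print("population variance:", pop_var)
print("average with n - 1: ", avg_s2)
print("average with n:     ", avg_s2_star)
```

Comparing the two averages with the population variance is the observation part (d) asks for.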
6. The average score for a class of 30 students was 75. The 20 male students in the class
averaged 70. The boxplots for the scores for the male and female students are given
below.
[Figure: side-by-side boxplots of scores for Male and Female students; vertical axis from 40 to 100.]
(a) What was the average of the 10 female students in the class?
(b) Describe the relationship between the median and the mean for both male students
and female students.
(c) Did a greater proportion of male students or female students score above 83?