
Problem Set by Pritha Guha

This document contains 42 quantitative techniques problems on sampling methods, parameter estimation and statistical inference. The problems cover simple random sampling, stratified sampling, cluster sampling, proportional allocation, Neyman allocation, maximum likelihood estimation, method of moments estimation, confidence intervals, hypothesis tests, ANOVA and chi-square tests. The document provides data, population characteristics and sample values for each problem, and solutions for selected problems.

BM 21-23 (Term II)

Quantitative Technique II

PROBLEM SET
PRITHA GUHA

XLRI | Jamshedpur
1. Suppose the following are the values in a population:
5, 27, 4, 17, 4.5, 19, 2, 11, 3, 6, 13, 18
A sample of size 4 is taken, and is observed to be 3, 4, 4.5, 2.
Is it most likely to be (a) a simple random sample, (b) a stratified sample or (c) a
clustered sample? Give reason for your answer.

2. A sample of size 15 is to be chosen from a population of size 100, divided into two
strata of sizes 60 and 40, respectively. If proportional allocation is used and samples
are selected without replacement from each stratum, how many different samples can
be selected in total?
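A quick numerical check in base R, offered as a sketch: proportional allocation gives n_h = n N_h / N, and with SRSWOR inside each stratum the number of possible stratified samples is the product of the within-stratum counts choose(N_h, n_h).

```r
# Proportional allocation of n = 15 over strata of sizes 60 and 40
N_h <- c(60, 40)
n <- 15
n_h <- n * N_h / sum(N_h)          # 9 and 6

# SRSWOR within each stratum gives choose(N_h, n_h) possible samples per stratum;
# strata are sampled independently, so the counts multiply
total_samples <- prod(choose(N_h, n_h))
print(n_h)
print(total_samples)
```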

3. Suppose 10,000 payment vouchers were generated at XLRI in 2017. An auditor checks
the vouchers by drawing a probability sample (known as an audit sample).
a) Why might simple random sampling not be appropriate?
b) Which sampling design would you prefer?

4. A statistics student who is curious about the relationship between the amount of time
students spend on social networking sites and their performance at school decides to
conduct a survey. Various research strategies for collecting data are described below.
In each, name the sampling method proposed and any bias you might expect.
(a) He randomly samples 40 students from the study's population, gives them the
survey, asks them to fill it out and bring it back the next day.
(b) He gives out the survey only to his friends, making sure each one of them fills out
the survey.
(c) He posts a link to an online survey on Facebook and asks his friends to fill out the
survey.
(d) He randomly samples 5 classes and asks a random sample of students from those
classes to fill out the survey.

5. A university wants to determine what fraction of its undergraduate student body
supports a new Rs. 2500 annual fee to improve the student union. For each proposed
method below, indicate whether the method is reasonable or not.
(a) Survey a simple random sample of 500 students.

(b) Stratify students by their field of study, then sample 10% of students from each
stratum.
(c) Cluster students by their ages (e.g. 18 years old in one cluster, 19 years old in one
cluster, etc.), then randomly sample three clusters and survey all students in those
clusters.

6. A school in Jamshedpur is planning to conduct a sample survey of its teachers about
how many hours they require to create class notes for students using a mobile phone.
There are 20 pre-primary teachers, 25 primary teachers and 30 secondary teachers.
The school decides to choose a total of 15 teachers using a stratified sampling design,
treating the three types of teachers as three different strata. The mean and the
standard deviation of the time required to create class notes for each stratum were
as follows:
Stratum        Mean (in hours)   Standard deviation, SD (in hours)
Pre-primary    43.48             3.11
Primary        45.40             2.83
Secondary      53.05             6.80

How many teachers should be sampled from the stratum of secondary teachers if
Neyman allocation is used (mark the closest one)?
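Neyman allocation sets n_h proportional to N_h S_h. The base-R sketch below applies that formula to the table above; the variable names are illustrative only.

```r
# Neyman allocation: n_h = n * N_h * S_h / sum(N_h * S_h)
N_h <- c(pre_primary = 20, primary = 25, secondary = 30)
S_h <- c(3.11, 2.83, 6.80)          # stratum standard deviations (hours)
n <- 15
n_h <- n * N_h * S_h / sum(N_h * S_h)
print(round(n_h, 2))                # allocations before rounding to whole teachers
```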

7. TV watching time (TVWatching.csv): An advertising firm, interested in determining
how much to emphasize television advertising in a certain state, decides to conduct a
sample survey to estimate the average number of hours each week that households
within that state watch television. The state has two towns, A and B, and a rural area
C. Town A is built around a factory, and most households contain factory workers with
school-aged children. Town B contains mainly retirees, and the rural area C consists
mainly of farmers. There are 155 households in Town A, 62 in Town B and 93 in the
rural area C. The firm decides to choose a total of 40 households from Town A, Town
B and rural area C.
a) Using R, select a simple random sample with replacement (SRSWR) and without
replacement (SRSWOR) from the population.
b) Now the firm decides to use a stratified sampling design, treating Town A,
Town B and rural area C as three different strata. Find the number of households to
be sampled from each stratum if proportional allocation and Neyman allocation are
used. Then draw the samples for both allocations.
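A minimal base-R sketch, assuming hypothetical household IDs 1 to 310 in place of the rows of TVWatching.csv (the file itself is not reproduced here). It computes the proportional allocation and draws SRSWR, SRSWOR and a proportionally allocated stratified sample.

```r
# Stratum sizes: Town A, Town B, rural area C
N_h <- c(A = 155, B = 62, C = 93)
n <- 40
n_prop <- n * N_h / sum(N_h)        # proportional allocation: 20, 8, 12

# Hypothetical household IDs 1..310 stand in for the rows of TVWatching.csv
ids <- seq_len(sum(N_h))
stratum <- rep(names(N_h), times = N_h)

set.seed(2023)
srswr  <- sample(ids, size = n, replace = TRUE)    # SRSWR
srswor <- sample(ids, size = n, replace = FALSE)   # SRSWOR

# Stratified sample: SRSWOR of n_prop[h] households inside each stratum h
strat_sample <- unlist(lapply(names(N_h), function(h)
  sample(ids[stratum == h], size = n_prop[[h]])))
print(n_prop)
print(table(stratum[strat_sample]))
```

For Neyman allocation the same pattern applies with `n * N_h * S_h / sum(N_h * S_h)`, where the stratum standard deviations S_h would have to be computed from the data and the results rounded to whole households.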

8. It is known that 80% of all Brand A MP3 players work in a satisfactory manner
throughout the warranty period (are “successes”). Suppose that n = 10 players are
randomly selected. Let X = the number of successes in the sample. The statistic X/n is
the sample proportion (fraction) of successes. Obtain the sampling distribution of this
statistic. (Can you simulate this problem in R?)
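One way to answer the simulation prompt, sketched in base R: the exact distribution of X/n follows from X ~ Binomial(10, 0.8), and a simulation of many samples of 10 players should give relative frequencies close to it. The seed and the number of replications are arbitrary choices.

```r
n <- 10
p <- 0.8
x <- 0:n

# Exact sampling distribution of X/n, since X ~ Binomial(10, 0.8)
dist <- data.frame(p_hat = x / n, prob = dbinom(x, n, p))

# Simulation: 100,000 samples of 10 players each
set.seed(8)
sim_x <- rbinom(100000, size = n, prob = p)
dist$simulated <- tabulate(sim_x + 1, nbins = n + 1) / length(sim_x)
print(round(dist, 4))
```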

9. As part of a quality control process for computer chips, an engineer at a factory
randomly samples 212 chips during a week of production to test the current rate of
chips with severe defects. She finds that 27 of the chips are defective.
a) What population is under consideration in the data set?
b) What parameter is being estimated?
c) What is the point estimate for the parameter?
d) What is the name of the statistic that we can use to measure the uncertainty of the
point estimate?
e) Compute the value from part (d) for this context.
f) The historical rate of defects is 10%. Should the engineer be surprised by the
observed rate of defects during the current week?
g) Suppose the true population value was found to be 10%. If we use this proportion
to recompute the value in part (e) using p = 0.1 instead of p̂, does the resulting
value change much?
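The arithmetic for parts (c), (e) and (g) can be sketched in base R. The z value in the last line, measuring how many standard errors the estimate sits from the historical 10%, is an added aid for part (f), not part of the original question.

```r
x <- 27
n <- 212
p_hat <- x / n                               # point estimate of the defect rate
se_hat <- sqrt(p_hat * (1 - p_hat) / n)      # standard error using p-hat
se_p0  <- sqrt(0.1 * (1 - 0.1) / n)          # standard error using p = 0.1
z <- (p_hat - 0.1) / se_p0                   # distance from 10% in standard errors
print(round(c(p_hat = p_hat, se_hat = se_hat, se_p0 = se_p0, z = z), 4))
```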

10. Let X1, X2, X3, X4, X5 be an independent and identically distributed (IID) sample
from a population with mean μ and variance 1. Which of the following is not an
unbiased estimator of μ?
A) $\frac{1}{4}(X_1 + X_2 + 2X_3 + 2X_4 + 2X_5)$
B) $\frac{1}{5}(X_1 + X_2 + X_3 + X_4 + X_5)$
C) $\frac{1}{15}(X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5)$
D) $2X_1 + 2X_2 - X_3 - X_4 - X_5$
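For a linear estimator Σ a_i X_i of IID variables with mean μ, the expectation is μ Σ a_i, so it is unbiased exactly when the coefficients sum to 1. A small base-R check of the four options:

```r
# Coefficients of X1..X5 in each candidate estimator
coefs <- list(
  A = c(1, 1, 2, 2, 2) / 4,
  B = c(1, 1, 1, 1, 1) / 5,
  C = c(1, 2, 3, 4, 5) / 15,
  D = c(2, 2, -1, -1, -1)
)
# E(sum a_i X_i) = mu * sum(a_i): unbiased iff sum(a_i) == 1
bias_factor <- sapply(coefs, sum)
print(bias_factor)
```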

11. Sample variance is unbiased for population variance in SRSWR.
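A small exact illustration, not a proof: for a toy population (the values 2, 4, 9 are an arbitrary assumption) and SRSWR samples of size 2, every ordered sample is equally likely, so averaging the sample variance (divisor n − 1) over all of them gives its expectation. Here the population variance is taken with divisor N.

```r
pop <- c(2, 4, 9)                                 # toy population (assumed)
sigma2 <- mean((pop - mean(pop))^2)               # population variance, divisor N

# All N^2 equally likely ordered samples of size 2 drawn with replacement
samples <- expand.grid(first = pop, second = pop)
s2 <- apply(samples, 1, var)                      # sample variance, divisor n - 1
print(c(mean_s2 = mean(s2), sigma2 = sigma2))     # the two agree
```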

12. Let X1, X2, X3, X4, X5 be an independent and identically distributed (IID) sample
from a population with mean μ and variance 1. Which of the following has the lowest
variance?
A) $\frac{1}{5}(X_1 + X_2 + X_3 + X_4 + X_5)$
B) $2X_1 + 2X_2 - X_3 - X_4 - X_5$
C) $\frac{1}{4}(X_1 + X_2 + 2X_3 + 2X_4 + 2X_5)$
D) $\frac{1}{15}(X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5)$
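With independent X_i of variance 1, Var(Σ a_i X_i) = Σ a_i², so comparing the options reduces to summing squared coefficients. A base-R sketch:

```r
coefs <- list(
  A = c(1, 1, 1, 1, 1) / 5,
  B = c(2, 2, -1, -1, -1),
  C = c(1, 1, 2, 2, 2) / 4,
  D = c(1, 2, 3, 4, 5) / 15
)
# Var(sum a_i X_i) = sum(a_i^2) when the X_i are independent with variance 1
variances <- sapply(coefs, function(a) sum(a^2))
print(round(variances, 4))
```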

13. For an SRS drawn with replacement (WR) from a Poisson population, show that both the
sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
are unbiased estimators of the population mean λ.
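A Monte Carlo sanity check in base R (not a substitute for the proof), assuming λ = 3, n = 10 and 20,000 replications: the averages of the sample mean and of the sample variance over many samples should both be close to λ.

```r
set.seed(13)
lambda <- 3; n <- 10; reps <- 20000     # assumed values for the illustration
sims <- replicate(reps, {
  x <- rpois(n, lambda)
  c(mean = mean(x), var = var(x))       # var() uses divisor n - 1
})
print(rowMeans(sims))                   # both should be close to lambda
```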

14. Suppose that X1, X2, …, Xn is an IID random sample from a Bernoulli distribution
with probability of success p.
a) Show that $\bar{X} = \hat{p}$ is an unbiased estimator of p.
b) Also show that $\hat{p}$ (the sample proportion) is a consistent estimator of p (the
population proportion).
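Consistency means P(|p̂ − p| > ε) → 0 as n grows. The base-R sketch below computes that probability exactly from the binomial distribution for a few sample sizes, assuming p = 0.3 and ε = 0.05 purely for illustration.

```r
p <- 0.3; eps <- 0.05                   # assumed values for the illustration
prob_far <- sapply(c(10, 100, 1000), function(n) {
  x <- 0:n
  sum(dbinom(x, n, p)[abs(x / n - p) > eps])   # exact P(|p_hat - p| > eps)
})
print(prob_far)                         # shrinks towards 0 as n increases
```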

15. X is a discrete random variable with the following probability mass function,
where 0 ≤ θ ≤ 1:
X       0       1       2            3
P(X)    2θ/3    θ/3     2(1−θ)/3     (1−θ)/3
The following 10 samples were taken: 3, 0, 2, 1, 3, 2, 1, 0, 2, 1.
a) Find the MME of θ.
b) Find the MLE of θ.
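A base-R sketch for checking answers: from the table, E(X) = θ/3 + 4(1−θ)/3 + 3(1−θ)/3 = 7/3 − 2θ, which gives the MME, and the MLE is found here by numerically maximising the log-likelihood with `optimize()`.

```r
x <- c(3, 0, 2, 1, 3, 2, 1, 0, 2, 1)
pmf <- function(theta) c(2 * theta / 3, theta / 3, 2 * (1 - theta) / 3, (1 - theta) / 3)

# MME: set E(X) = 7/3 - 2*theta equal to the sample mean
theta_mme <- (7 / 3 - mean(x)) / 2

# MLE: maximise the log-likelihood over (0, 1); pmf(theta)[x + 1] picks P(X = x)
loglik <- function(theta) sum(log(pmf(theta)[x + 1]))
theta_mle <- optimize(loglik, interval = c(0, 1), maximum = TRUE, tol = 1e-8)$maximum
print(c(mme = theta_mme, mle = theta_mle))
```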

16. Let X1, X2, …, Xn be an IID random sample from a discrete distribution with
probability mass function
$p(x) = \begin{cases} p, & x = 0 \\ 2p, & x = 1 \\ 1 - 3p, & x = 2 \end{cases}$
Find the MME of p.

17. Let X1, X2, …, Xn be an IID random sample from a Uniform[−b, b] distribution. Find the MME of b.

18. Let X1, X2, …, Xn be an IID random sample from an Exponential(λ) distribution. Find the MLE of λ.
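A way to check a closed-form answer numerically, sketched in base R on simulated data (the rate λ = 2 and n = 200 are assumptions): the Exponential log-likelihood n log λ − λ Σ x_i is maximised with `optimize()` and compared with 1/x̄.

```r
set.seed(18)
x <- rexp(200, rate = 2)                      # simulated data, true lambda = 2
loglik <- function(lambda) length(x) * log(lambda) - lambda * sum(x)
lambda_numeric <- optimize(loglik, c(1e-6, 50), maximum = TRUE, tol = 1e-10)$maximum
lambda_closed  <- 1 / mean(x)                 # candidate closed-form MLE
print(c(numeric = lambda_numeric, closed_form = lambda_closed))
```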
19. Let X1, X2, …, Xn be an IID random sample from N(μ, σ²). Find the MLEs of μ and σ².

20. Suppose X1, X2, …, Xn is an independent and identically distributed (IID) sample
from a normal distribution with mean 0 and variance σ².
a) What is the likelihood function for σ², using the sample?
b) Showing all relevant steps, obtain a maximum likelihood estimator for the
parameter σ².

21. Obtain a maximum likelihood estimator (MLE) and a method of moments
estimator (MME) for the parameter a of a uniform distribution on the interval [a, 2a],
where a > 0.

22. Suppose a simple random sample from a uniform distribution on the interval
(0, b) is obtained as follows: 42, 46, 44, 47, 47, 43, 62, 64. Determine a method of
moments estimate of b.
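For Uniform(0, b) the mean is b/2, so equating it to the sample mean gives b̃ = 2X̄. The arithmetic in base R:

```r
x <- c(42, 46, 44, 47, 47, 43, 62, 64)
# E(X) = b / 2 for Uniform(0, b), so the method of moments gives b = 2 * mean
b_mme <- 2 * mean(x)
print(b_mme)
```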

23. Suppose the number of customers X that enter a store between the hours 9AM
and 10AM follows a Poisson distribution with parameter θ. Suppose a random sample
of the number of customers that enter the store between 9AM and 10AM for 10 days
results in the values 9, 7, 9, 15, 10, 13, 11, 7, 2, 12. Determine the MLE of θ.
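A base-R check: for a Poisson sample the log-likelihood is maximised at the sample mean, and a numerical maximisation with `dpois(..., log = TRUE)` should agree with it.

```r
x <- c(9, 7, 9, 15, 10, 13, 11, 7, 2, 12)
theta_closed <- mean(x)                             # closed-form Poisson MLE
loglik <- function(theta) sum(dpois(x, theta, log = TRUE))
theta_num <- optimize(loglik, c(0.01, 50), maximum = TRUE)$maximum
print(c(closed_form = theta_closed, numeric = theta_num))
```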

24. A marketing analyst wishes to obtain a sample of size 100 with replacement from
a population to estimate the population mean µ (unknown). However, due to a coding
error, her algorithm starts recording every value twice. After recording 50 values in
that manner, the coding error is discovered and corrected, although the recorded
values are left unchanged. Then 50 more observations are recorded to bring the total
to 100. The final sample therefore is as follows: X1, X1, X2, X2, …, X25, X25, X26, X27, …, X75.
Let the corresponding sample mean be X*. On the other hand, let the sample mean of
X1, X2, …, X25, X26, X27, …, X75, each used once (i.e., after removing the duplicates), be X̄.
For the following, wherever necessary, assume that X1, X2, …, X75 are independent and
identically distributed with mean µ and variance σ².
a) Among X* and X̄, which is/are unbiased for µ?
b) Which estimator, among X* and X̄, is more efficient, and why?
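Both estimators are weighted sums of X1, …, X75, so their expectations are µ times the sum of the weights and their variances are σ² times the sum of squared weights. A base-R sketch of that bookkeeping (variances reported in units of σ²):

```r
# X*: X1..X25 each appear twice and X26..X75 once among the 100 recorded values
w_star <- c(rep(2, 25), rep(1, 50)) / 100
# X-bar: each of X1..X75 used once
w_bar  <- rep(1, 75) / 75

print(c(sum_star = sum(w_star), sum_bar = sum(w_bar)))        # 1 means unbiased
print(c(var_star = sum(w_star^2), var_bar = sum(w_bar^2)))    # multiples of sigma^2
```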

25. The capture/recapture method is sometimes used to estimate the size of a
wildlife population. Suppose that 10 animals are captured, tagged, and released. On a
later occasion, 20 animals are captured, and it is found that 4 of them are tagged. How
large is the population?
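One standard approach is the Lincoln-Petersen estimate, which equates the tagged fraction in the second catch (4/20) with the tagged fraction in the population (10/N). A base-R sketch:

```r
tagged_first  <- 10   # animals captured, tagged and released
second_catch  <- 20   # animals captured on the later occasion
tagged_second <- 4    # tagged animals in the second catch

# Lincoln-Petersen: tagged_second / second_catch ~ tagged_first / N
N_hat <- tagged_first * second_catch / tagged_second
print(N_hat)
```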

26. The Pareto distribution is used in economics to model values exceeding a
threshold. For a fixed, known threshold value $x_0 > 0$, the density function is
$f(x \mid x_0, \theta) = \theta x_0^{\theta} x^{-\theta-1}, \quad x \ge x_0 \text{ and } \theta > 1.$
(Note that the cumulative distribution function of X is $P(X \le x) = F_X(x) = 1 - \left(\frac{x}{x_0}\right)^{-\theta}$.)
a) Find the method of moments (MME) estimate of θ.
b) Find the MLE of θ.
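A simulation sketch in base R for checking derived estimators, assuming x0 = 2, θ = 4 and n = 5000. Data are generated by inverting the CDF above; the candidate estimators are θ̃ = X̄/(X̄ − x0), from E(X) = θx0/(θ − 1), and θ̂ = n / Σ log(Xi/x0), from setting the derivative of the log-likelihood to zero.

```r
set.seed(26)
x0 <- 2; theta <- 4; n <- 5000                # assumed values for the illustration
x <- x0 * runif(n)^(-1 / theta)               # inverse-CDF draws: F(x) = 1 - (x0/x)^theta

# MME: E(X) = theta * x0 / (theta - 1)  =>  theta = mean / (mean - x0)
theta_mme <- mean(x) / (mean(x) - x0)
# MLE: theta_hat = n / sum(log(x / x0))
theta_mle <- n / sum(log(x / x0))
print(c(mme = theta_mme, mle = theta_mle))    # both should be near 4
```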

27. Suppose that the 95% confidence interval obtained for the difference of average
weekly sales in two stores is (−3.232, 12.102) (in lakhs of rupees). From this, which of
the following conclusions is NOT possible without additional information?
(a) At 10% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(b) At 1% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(c) At 2% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(d) At 5% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected

28. A manufacturer of light bulbs claims an average life of more than 750 hours per
bulb. A consumer group did not believe this claim and tested a sample of 35 bulbs. The
average lifetime of these 35 bulbs was 740 hours with a standard deviation of 30 hours.
The manufacturer responded that their claim was based on testing hundreds of bulbs.
a) What is the 95% confidence interval for the mean life of the light bulbs based on
the data collected by the consumer group?
b) True or false: Compared to the consumer group’s 95% confidence interval, the
manufacturer’s 95% confidence interval is more likely to contain the population
mean because it is based on a larger sample. Justify your answer.
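A base-R sketch for part (a), assuming a t-interval because the population standard deviation is unknown (a z-interval with 1.96 would be slightly narrower):

```r
n <- 35; xbar <- 740; s <- 30
se <- s / sqrt(n)
ci_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se   # 95% t-interval
print(round(ci_t, 2))
```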

c) The volume of soft drink in a bottle is known to follow a normal distribution with a
standard deviation of 4 ml. You have taken a sample of bottles and measured their
volumes. How many bottles do you have to sample to obtain a 90% confidence interval
for µ of width 1 ml?
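The width of a z-interval is 2 z σ / √n, so solving for n and rounding up gives the required sample size. In base R:

```r
sigma <- 4; width <- 1; conf <- 0.90
z <- qnorm(1 - (1 - conf) / 2)                  # about 1.645
# width = 2 * z * sigma / sqrt(n)  =>  n = (2 * z * sigma / width)^2, rounded up
n_required <- ceiling((2 * z * sigma / width)^2)
print(n_required)
```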

29. Suppose that against a certain opponent the number of points the BM basketball
team scores is normally distributed with unknown mean θ and unknown variance σ².
Suppose that over the course of the last 6 games between the two teams, BM scored
the following points: 59, 62, 59, 74, 70, and 61.
a) Compute a 95% t–confidence interval for θ. Does 95% confidence mean that the
probability θ is in the interval you just found is 95%?
b) Now suppose that you learn that σ² = 25. Compute a 95% z–confidence interval for
θ. How does this compare to the interval in part (a)?
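Both intervals computed side by side in base R; the t-interval uses the sample standard deviation with 5 degrees of freedom, and the z-interval uses the known σ = 5.

```r
pts <- c(59, 62, 59, 74, 70, 61)
n <- length(pts)
ci_t <- mean(pts) + c(-1, 1) * qt(0.975, n - 1) * sd(pts) / sqrt(n)   # part (a)
ci_z <- mean(pts) + c(-1, 1) * qnorm(0.975) * sqrt(25) / sqrt(n)      # part (b)
print(rbind(t = ci_t, z = ci_z))
```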

30. The production line for Glow toothpaste is designed to fill tubes of toothpaste
with a mean weight of 60 grams. Periodically, a sample of 50 tubes will be selected in
order to check the filling process. Quality assurance procedures call for the
continuation of the filling process if the sample results are consistent with the
assumption that the mean filling weight for the population of toothpaste tubes is 60
grams; otherwise the filling process will be stopped and adjusted.
a) Formulate the null and the alternative hypotheses to help determine when the
filling process should continue operating and when it should be stopped and
corrected.
b) Assume that a sample of 50 toothpaste tubes provides a sample mean of 61 grams
and standard deviation of 2 grams. Based on the information:
(i) Compute the p-value of the test; hence test the hypotheses formulated in
part (a) at a 5% level of significance.
(ii) Obtain a 95% confidence interval for the mean filling weight and then carry out
the same test as in part (i).
c) What is the power of the test if the true mean filling weight of the toothpaste tubes
is 60.5 grams, assuming that the true standard deviation of the filling weights is 2
grams?

31. A large hotel chain periodically runs various surveys on their website to
understand customer preferences. One such survey is about the preference of
smoking versus non-smoking rooms. In a random sample of 400 visitors to their
website five years ago, 166 had indicated their preference for the non-smoking rooms.
This year, 205 such visitors, in a sample of 380, preferred the non-smoking rooms.
a) What would be a 95% confidence interval for the true difference of proportions,
regarding the preference for non-smoking rooms, from five years ago to now?
b) Would you recommend that the hotel chain convert more rooms to non-smoking?
Support your recommendation, by testing the appropriate hypotheses, using the
data given above, at a 0.05 level of significance.

32. A controlled clinical trial was performed on 9 patients to investigate the effect of
a drug on the behavioural disorders of chronic schizophrenics. The following table
gives the behavioural rating scores for the patients at the beginning of the trial and
3 months after the end of the trial. Higher scores are better.
Patient 1 2 3 4 5 6 7 8 9
Before Drug 2.3 2.0 1.9 3.1 2.2 2.3 2.8 1.9 1.1
After Drug 3.1 2.1 2.45 3.7 2.54 3.72 4.54 1.61 1.63
Stating your assumptions, null and alternative hypotheses, test at a 5% level of
significance whether the drug improves the patients’ behavioural rating scores.
33. The screening process for detecting a rare disease is not perfect. Researchers
have developed a blood test that is considered fairly reliable. It gives a positive
reaction in 98% of the people who have that disease. However, it erroneously gives a
positive reaction in 3% of the people who do not have the disease. Suppose the null
hypothesis is “the individual does not have the disease” and the alternative hypothesis
is “the individual has the disease”.
a) What is the probability of Type I error?
b) What is the power of the test?

34. A study has been conducted to check whether average height changes from
generation to generation. The following table gives the heights (in centimetres) of a
sample of 8 fathers and their oldest adult sons.
Height of father 165.1 160 170.2 162.6 172.7 157.5 177.8 167.6
Height of son 172.7 167.6 172.7 175.3 167.6 165.1 180.3 170.2
Stating your assumptions, null and alternative hypotheses, test at a 5% level of
significance whether the average height changes from generation to generation. What
is the p-value of the test?

35. The mean lifetime of a sample of 100 light bulbs produced by a company is
computed to be 1590 hours with a standard deviation of 120 hours. The manager of
the company wants to test the null hypothesis µ = 1600 hours against the alternative
hypothesis µ ≠ 1600 hours, where µ is the mean lifetime of all the bulbs produced by
the company.
a) Compute the p-value of the test.
b) From (a), what is your conclusion at a 5% level of significance?
c) Obtain a 95% confidence interval for µ. Using this confidence interval, would you
arrive at the same conclusion for the test of hypotheses as in (b)? Justify.

36. An oil company wishes to study the effects of three different fuel additives on
mean fuel mileage. The company randomly selects three groups of six automobiles
each and assigns a group of six automobiles to each additive type (A, B, and C). All the
18 automobiles are of the same make and model. Each of the six automobiles assigned to
an additive is driven using that additive, and the fuel mileage (in km/litre) for the
test drive is recorded. Following are the results:

Additive A   12.5   15     14.4   11     14.9   13
Additive B   14.2   12.3   15.4   17.3   15.1   14.1
Additive C   16.4   12.6   14     17.8   11.2   13.6
Assume that the data are normally and independently distributed. Can we consider the
variances of each of the three groups to be the same? Perform a relevant test at 5%
level of significance by stating your hypotheses.

a) Suppose that we want to test at a 5% level of significance whether there is a
difference in the three types of fuel additives regarding mileage of the automobile.
Towards that, state the null and alternative hypotheses, the assumptions you make and
why they are justifiable, compute the appropriate test statistics and then perform the
test.
b) Now consider a situation when the data are not known to be normally distributed.
Suggest and perform an appropriate test at a 5% significance level to check whether
there is a difference in the three types of fuel additives regarding mileage of the
automobile. Remember to state your assumptions and hypotheses.
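A base-R sketch that runs the three tests this problem asks about (Bartlett's test for equal variances, one-way ANOVA, and the Kruskal-Wallis test) on the additive data. No solution values are given for this problem, so the sketch prints the results rather than asserting particular numbers.

```r
mileage <- c(12.5, 15, 14.4, 11, 14.9, 13,      # additive A
             14.2, 12.3, 15.4, 17.3, 15.1, 14.1, # additive B
             16.4, 12.6, 14, 17.8, 11.2, 13.6)   # additive C
additive <- factor(rep(c("A", "B", "C"), each = 6))

bt  <- bartlett.test(mileage ~ additive)       # H0: equal variances
tab <- anova(lm(mileage ~ additive))           # one-way ANOVA, H0: equal means
kw  <- kruskal.test(mileage ~ additive)        # rank-based alternative
print(bt)
print(tab)
print(kw)
```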

37. A software company develops software for GPS navigational system. From market
area A to residential area B, there are four routes possible, Route 1, Route 2, Route 3
and Route 4. The company obtains data for travelling along each route for one week
and obtains the following data. The time of travel from market area A to residential
area B is given in minutes.
Route 1 29.5 30.5 33 31 32
Route 2 27.5 32.5 28 30 29
Route 3 25 27 23.5 25.5 26
Route 4 24 26.5 28.5 31.5 24.5

a) Assume that the data are normally and independently distributed.
i) Can we consider the variances of the travel times for the four routes to be the same?
Perform a relevant test at 1% level of significance by stating your hypotheses.
ii) Suppose that we want to test at a 1% level of significance whether there is a
difference in the four different routes. Towards that, state the null and alternative
hypotheses, the assumptions you make and why they are justifiable, compute the
appropriate test statistics and then perform the test.
b) Now consider a situation when the data are not known to be normally distributed.
Suggest and perform an appropriate test at a 1% significance level to check whether there
is a difference in the four different routes. Remember to state your assumptions and
hypotheses. [Margin note: 10% level of significance]

38. The Scholastic Aptitude Test (SAT) contains three areas: critical reading,
mathematics, and writing. Each area is scored on an 800-point scale. A sample of SAT
scores for six students follows.

Student   Critical Reading   Mathematics   Writing
1         526                534           530
2         594                590           586
3         465                464           445
4         561                566           553
5         436                478           430
6         430                458           420
Assume that the data are normally and independently distributed. Using a .05 level of
significance, do students perform differently on the three areas of the SAT?

39. In an attempt to improve students’ performance on the GMAT, a university is
considering offering the following three GMAT preparation programs.
i) A three-hour review session, ii) A one-day program covering, iii) An intensive 10-week
course.
Scores on the GMAT range from 200 to 800, with higher scores implying higher aptitude.
The GMAT is usually taken by students from three colleges: the College of Business, the
College of Engineering, and the College of Arts and Sciences. Let us assume that the
randomly selected students participated in the preparation programs and then took the
GMAT. The scores obtained are reported in the following table:
                          College
Preparation Program       Business    Engineering    Arts and Sciences
3-hour review             500, 580    540, 460       480, 400
1-day program             460, 540    560, 620       420, 480
10-week course            560, 600    600, 580       480, 410
Assume that the data are normally and independently distributed. Test the following at
5% level of significance:
a) Do the preparation programs differ in terms of effect on GMAT scores?
b) Do the undergraduate colleges differ in terms of effect on GMAT scores?
c) Do students in some colleges do better on one type of preparation program
whereas others do better on a different type of preparation program?
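A base-R sketch of the two-way ANOVA with interaction that parts (a) to (c) call for, with two observations per program-college cell. It prints the ANOVA table; with a balanced design the sequential sums of squares from `anova()` coincide with type II ones.

```r
gmat <- c(500, 580, 540, 460, 480, 400,    # 3-hour review: Business, Engineering, Arts & Sciences
          460, 540, 560, 620, 420, 480,    # 1-day program
          560, 600, 600, 580, 480, 410)    # 10-week course
program <- factor(rep(c("3-hr review", "1-day", "10-week"), each = 6))
college <- factor(rep(rep(c("Business", "Engineering", "Arts & Sciences"), each = 2), times = 3))

# Main effects for (a) and (b), interaction term for (c)
tab <- anova(lm(gmat ~ program * college))
print(tab)
```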

40. Sales figures for a sample of 40 days from a store selling mobile phones are given
below:
26 29 33 30 27 29 24 30 30 34
39 24 29 31 31 32 27 26 31 28
36 30 33 25 33 31 36 35 30 24
34 36 41 33 34 29 31 31 32 26

Retracing the steps we followed in class, perform a chi-square goodness-of-fit test
at a 5% level of significance to test whether the data come from a normal distribution.
Clearly write your null and alternative hypotheses, group the data in a reasonable
number of classes, compute the value of the appropriate test statistic, perform the
test and write your final conclusion.

41. A farmer has sprayed insecticide on 6 of his apple trees to eliminate mites.
From each of the 6 apple trees, 25 leaves were selected, and the number of mites on
each leaf was counted. The following data on the 150 leaves give the number of mites
per leaf:

Number per leaf   0    1    2    3    4    5    6
Observed count    70   38   17   10   9    3    3

Retracing the steps we followed in class, perform a chi-square goodness-of-fit test
at a 5% level of significance to test whether the data come from a Poisson distribution.
Clearly write your null and alternative hypotheses, group the data in a reasonable
number of classes, compute the value of the appropriate test statistic, perform the
test and write your final conclusion.

42. An employment survey asked a sample of human resource executives how their
company planned to change its workforce over the next 12 months. A categorical
response variable showed three options: The company plans to hire and add to the
number of employees, the company plans no change in the number of employees, or

the company plans to lay off and reduce the number of employees. Another
categorical variable indicated if the company was private or public. Sample data for
180 companies are summarized as follows.
                       Company
Employment Plan        Private   Public
Add Employees          37        32
No Change              19        34
Lay Off Employees      16        42
Conduct a test of independence to determine whether the employment plan for the next
12 months is independent of the type of company. At a 5% level of significance, what
is your conclusion?


PROBLEM SET WITH SOLUTION

30. a) Solution: To test, H0: µ = 60, H1: µ ≠ 60, i.e., whether the mean weight of the
toothpaste tube is significantly different from 60gms.
b) i) Solution: p-value (0.0004069519) < α (0.05), we reject H0, i.e., from the given
sample, at 5% level of significance, we conclude that we need to stop the filling
process and correct it as the filling weight is significantly different from 60 gms.
ii) Solution: [60.44563, 61.55437]
c) Solution: 0.4224016
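The p-value and interval in parts (b)(i) and (b)(ii) appear to be z-based (σ replaced by s = 2 with n = 50). A base-R sketch that reproduces them under that assumption:

```r
n <- 50; xbar <- 61; s <- 2; mu0 <- 60
se <- s / sqrt(n)
z <- (xbar - mu0) / se
p_value <- 2 * pnorm(-abs(z))                  # two-sided z-test of H0: mu = 60
ci <- xbar + c(-1, 1) * qnorm(0.975) * se      # 95% z-interval
print(c(z = z, p_value = p_value))
print(ci)
```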
31. a) Solution: [−0.194067, −0.05488039]
b) Solution: To test H0: p1 = p2, H1: p1 < p2
We have α = 0.05
Pooled estimate of the common population proportion:
$\hat{p} = \frac{166 + 205}{400 + 380} = 0.475641$
$S_{\hat{p}_1 - \hat{p}_2} = 0.03577499$
Now, $Z_{obs} = \frac{-0.1244737}{0.03577499} = -3.47935$

As $Z_{obs} < -Z_{\alpha}$, we reject H0 for α = 0.05, i.e., we can recommend that the hotel chain
convert more rooms to non-smoking.
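A base-R sketch reproducing both parts: the interval uses the unpooled standard error, and the one-sided test of H1: p1 < p2 uses the pooled standard error, as in the solution above.

```r
x <- c(then = 166, now = 205)
n <- c(then = 400, now = 380)
p <- x / n
diff_p <- p[["then"]] - p[["now"]]

# (a) 95% CI with the unpooled standard error
se_unpooled <- sqrt(sum(p * (1 - p) / n))
ci <- diff_p + c(-1, 1) * qnorm(0.975) * se_unpooled

# (b) pooled z-test of H0: p1 = p2 against H1: p1 < p2
p_pool <- sum(x) / sum(n)
se_pooled <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z_obs <- diff_p / se_pooled
p_value <- pnorm(z_obs)
print(ci)
print(c(z_obs = z_obs, p_value = p_value))
```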
32. Solution: Using a parametric test:
Assumptions: 1. The population from which the samples are collected follows a
normal distribution (so that the paired differences are normally distributed).
2. Xi and Yi are related for the same i, but Xi and Yj are independent whenever i ≠ j.
We would be doing a paired t-test on the differences Di = Before − After, with H1: µD < 0.
As tobs (−3.094344) < −t8;0.05 (−1.859548), we reject H0, i.e., the drug has a positive
effect on the patients at the 5% level of significance.
Using non-parametric test:
Let Xi = Before drug scores, Yi = After drug scores
(X1, Y1), …, (X9, Y9), are matched pairs.
Assumption:
The distribution of the population from which the sample is chosen is not known to be
normal.

Suppose X1, …, X9 is from distribution F and Y1, …, Y9 is from distribution G.
Di = Xi - Yi
To test, whether the drug improves the patients scores.
H0: F and G are identical probability distributions, i.e., there is no change in the scores
H1: F is shifted to the left of G, i.e., the drug improves the scores
We would be performing a Wilcoxon Signed Rank Test.
As our H1: F is shifted to the left of G , the test statistic:
T = T+ = sum of the ranks corresponding to positive values of Di
Patient   Before (Xi)   After (Yi)   Di
1         2.3           3.1          -0.8
2         2.0           2.1          -0.1
3         1.9           2.45         -0.55
4         3.1           3.7          -0.6
5         2.2           2.54         -0.34
6         2.3           3.72         -1.42
7         2.8           4.54         -1.74
8         1.9           1.61         0.29
9         1.1           1.63         -0.53

We would be using R.
R Code: wilcox.test(X, Y, alternative = "less", paired = TRUE)
R outputs:
Test statistic value = 2
p-value = 0.005859
As α = 0.05 and p-value (0.005859) < α (0.05), we reject H0, i.e., there is some positive
effect of the drug on the patients at the 5% level of significance.
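Both tests can be run together in base R; the sketch below uses the same one-sided alternative (before − after < 0) and should reproduce the statistics quoted above.

```r
before <- c(2.3, 2.0, 1.9, 3.1, 2.2, 2.3, 2.8, 1.9, 1.1)
after  <- c(3.1, 2.1, 2.45, 3.7, 2.54, 3.72, 4.54, 1.61, 1.63)

# Paired t-test, H1: mean(before - after) < 0 (the drug raises scores)
tt <- t.test(before, after, paired = TRUE, alternative = "less")
# Wilcoxon signed-rank test with the same one-sided alternative (exact for n = 9)
wt <- wilcox.test(before, after, paired = TRUE, alternative = "less")
print(c(t = unname(tt$statistic), t_p = tt$p.value,
        V = unname(wt$statistic), w_p = wt$p.value))
```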
33. a) Solution: 0.03; b) Solution: 0.98
34. Solution: From two-sided paired t-test, p-value: 0.03995
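A base-R sketch of the two-sided paired t-test on the father-son heights:

```r
father <- c(165.1, 160, 170.2, 162.6, 172.7, 157.5, 177.8, 167.6)
son    <- c(172.7, 167.6, 172.7, 175.3, 167.6, 165.1, 180.3, 170.2)
res <- t.test(son, father, paired = TRUE)   # H1: mean difference != 0
print(res$p.value)
```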
35. a) Solution: 0.4066; b) Solution: Cannot reject H0; c) Solution: [1566.48, 1613.52]
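A base-R sketch for Problem 35. The quoted p-value 0.4066 appears to correspond to a t reference distribution with 99 degrees of freedom, while the quoted interval corresponds to the z quantile 1.96; both versions are computed so that they can be compared.

```r
n <- 100; xbar <- 1590; s <- 120; mu0 <- 1600
se <- s / sqrt(n)
stat <- (xbar - mu0) / se
p_t <- 2 * pt(-abs(stat), df = n - 1)          # t-based two-sided p-value
p_z <- 2 * pnorm(-abs(stat))                   # z-based two-sided p-value
ci_z <- xbar + c(-1, 1) * qnorm(0.975) * se    # 95% z-interval
print(c(p_t = p_t, p_z = p_z))
print(ci_z)
```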
36. Discussed in class
37. Solution: i) There are four different routes: Route 1, Route 2, Route 3 and Route 4.
To test whether the variances of the four routes are the same, i.e., to test
H0: σ1² = σ2² = σ3² = σ4², H1: Not all σj² are equal,
where σ1², σ2², σ3², σ4² are the variances of the four populations for Route 1,
Route 2, Route 3 and Route 4, respectively.
Assumptions:

1. The population from which the samples are collected is following a normal distribution
2. The data are independently distributed
We would be performing Bartlett’s test for homogeneity of variance.
Under H0, the test statistic (Bartlett's K²) approximately follows χ² with 3 degrees of freedom.
We reject H0 for α = 0.1 if p-value < α
We are using R to perform the test.
R Code: bartlett.test(Time~Route)
Relevant R outputs:
data: Time by Route
Bartlett's K-squared = 2.3838, df = 3, p-value = 0.4967
As p-value (0.4967) > α (0.1), we cannot reject H0.
Thus, we conclude that the variances of the four groups are not significantly different.

ii) Solution: We would like to test at the 10% level (α = 0.1) whether there is a difference
in travel time along the four routes from market area A to residential area B.
To test,
H0: µ1 = µ2 = µ3 = µ4, H1: At least one µi is different
where µ1, µ2, µ3 and µ4 are means corresponding to the four populations for Route 1,
Route 2, Route 3, Route 4 respectively.
Assumptions:
1. The populations from which the samples are collected follow normal distributions
with the same variance (i.e., σ1² = σ2² = σ3² = σ4², using part (i)).
2. The data are independently distributed.
With the above assumptions we can perform one-way ANOVA for comparing the means
of the four populations. We are using R to perform the test.
R Code: car::Anova(lm(Time ~ Route), type = "II") (Anova() comes from the car package)
We are filling up the ANOVA table from the outputs obtained from R.
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Sum of Squares   F-Statistic (Fobs)
Between Groups        3                    98.55            32.85                 7.763663
Error                 16                   67.70            4.23125
Total                 19                   166.25
Here, k = 4, n = 20, α = 0.1.
Under H0, the test statistic F ~ F3,16, and from the ANOVA table we get Fobs = 7.763663.
Using R or an F-distribution table to find F3,16;0.1:
R code: qf(0.90, 3, 16).
From R, F3,16;0.1 = 2.461811.
As Fobs (7.763663) > F3,16;0.1 (2.461811), we reject H0, i.e., there is a significant difference in
travel time along the four routes from market area A to residential area B.
Other approach using p-value:
From R output we get, p-value = 0.002015458
Our α = 0.10. As p-value < α, thus we reject H0.
b) Solution: Now suppose we do not know whether the data are normally distributed.
To test, H0: All the four populations are identical, H1: At least two of the populations are
different
Assumption:
1. The samples are independent
We would be performing the Kruskal-Wallis test (a non-parametric test) to test the
hypothesis.
As there are no ties, the test statistic is
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$

where, n1 = 5, n2 = 5, n3 = 5, n4 = 5, n = n1 + n2 + n3 + n4 = 20, k = 4,
Ti = sum of ranks for the i-th group.
We also have α = 0.1.

Under H0, H ~ χ² with 3 degrees of freedom (approximately).

Ranks of the travel times (all 20 observations ranked together):
Route 1    Route 2    Route 3    Route 4
13         9          4          2
15         19         8          7
20         10         1          11
16         14         5          17
18         12         6          3
T1 = 82    T2 = 64    T3 = 24    T4 = 40

As there are no ties,
$H_{obs} = \frac{12}{20 \times 21}\left(\frac{82^2}{5} + \frac{64^2}{5} + \frac{24^2}{5} + \frac{40^2}{5}\right) - 3 \times 21 = 11.263$
From the chi-square table, for α = 0.1, χ² with 3 degrees of freedom = 6.251389.
As Hobs > χ² with 3 degrees of freedom, we reject H0, i.e., there is a significant difference in
travel time along the four routes from market area A to residential area B.
Using R:
R Code: kruskal.test(Time~Route)
Output: Hobs = 11.263
As α = 0.1 and under H0, H ~ χ2 with 3 degrees of freedom.
R code: qchisq(0.9, 3) or look at chi-sq table
R output: for α = 0.1, χ2 with 3 degrees of freedom = 6.251389
As Hobs > χ2 with 3 degrees of freedom, we reject H0, i.e., there is significant difference in
time travelling along the four routes from market area A to residential area B.
Other approach using p-value:
From R output we get, p-value = 0.01039
Our α = 0.1. As p-value < α, thus we reject H0.
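All three tests for Problem 37 can be rerun in base R. For a one-way layout, `anova(lm(...))` gives the same F test as `car::Anova(..., type = "II")`. The ANOVA F and the Kruskal-Wallis H should match the values quoted above; Bartlett's K-squared printed by this sketch is worth comparing with the value quoted in part (i).

```r
time <- c(29.5, 30.5, 33, 31, 32,     # Route 1
          27.5, 32.5, 28, 30, 29,     # Route 2
          25, 27, 23.5, 25.5, 26,     # Route 3
          24, 26.5, 28.5, 31.5, 24.5) # Route 4
route <- factor(rep(paste("Route", 1:4), each = 5))

bt  <- bartlett.test(time ~ route)    # (i) equal variances
tab <- anova(lm(time ~ route))        # (ii) one-way ANOVA
kw  <- kruskal.test(time ~ route)     # (b) Kruskal-Wallis
print(bt)
print(tab)
print(kw)
```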
38. Solution: H0: µCR = µM = µW, H1: At least one µi is different
Two-way ANOVA Table (Without interaction)
Source of Variation   DF   SS      MS      F-stat
Subject               2    1348    674     5.6167
Student               5    63250   12650   105.4167
Error                 10   1200    120
Total                 17   65798
R Code: car::Anova(lm(Marks ~ Subject + Student), type = "II"), with α = 0.05
p-value: 0.0232 < α = 0.05, so we reject H0.
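A base-R sketch of the same two-way ANOVA without interaction, treating students as blocks. Because the design is balanced, the sequential sums of squares from `anova()` equal the type II ones from `car::Anova()`.

```r
score <- c(526, 594, 465, 561, 436, 430,   # critical reading
           534, 590, 464, 566, 478, 458,   # mathematics
           530, 586, 445, 553, 430, 420)   # writing
subject <- factor(rep(c("CR", "M", "W"), each = 6))
student <- factor(rep(1:6, times = 3))

tab <- anova(lm(score ~ subject + student))
print(tab)
```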
39. Discussed in class
40. Solution: H0: The data is from a normally distributed family
H1: The data is not from a normally distributed family
The total number of observations, n = 40.
Finding the number of classes: as $2^5 < 40 < 2^6$, we can divide the data into 5 groups with
equal probabilities.
Thus, the expected frequency for each group = $40 \times \frac{1}{5} = 8$.
From the sample, mean = 30.75, variance = 15.833
As we are dividing the data into 5 groups with equal probabilities, we would be looking
for 20th, 40th, 60th and 80th percentile of N(30.75, 15.833).
Obtaining the percentiles using R.
R code (after reading and attaching the file):
break.MS = qnorm(c(0.001, 0.2, 0.4, 0.6, 0.8, 0.999), mean(Mobile), sd(Mobile))
The percentiles (from R output):
18.45362, 27.40109, 29.74190, 31.75810, 34.09891, 43.04638
Dividing the data into 5 groups using the above cut-offs gives the following group frequencies:
Class          Oi    Ei
18.5 – 27.4    9     8
27.4 – 29.7    5     8
29.7 – 31.8    11    8
31.8 – 34.1    9     8
34.1 – 43.0    6     8

Under H0, the test statistic $\chi^2 \sim \chi^2_d$, where d = k − m − 1. Also, α = 0.05.
Here k = 5 classes and m = 2 estimated parameters (mean and variance), so d = 5 − 2 − 1 = 2.
$\chi^2_{obs} = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 3$
We reject H0 if $\chi^2_{obs} > \chi^2_{2;0.05}$.
Using R to obtain $\chi^2_{2;0.05}$ with the R code qchisq(0.95, 2).
From R output we get $\chi^2_{2;0.05} = 5.991465$.
As $\chi^2_{obs} < \chi^2_{2;0.05}$, we cannot reject H0, i.e., at the 5% level of significance, the
data are consistent with a normal distribution.
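The full goodness-of-fit calculation for Problem 40 in base R, following the equal-probability classes described above (outer cut-offs at the 0.1% and 99.9% normal quantiles, as in the quoted R code):

```r
mobile <- c(26, 29, 33, 30, 27, 29, 24, 30, 30, 34,
            39, 24, 29, 31, 31, 32, 27, 26, 31, 28,
            36, 30, 33, 25, 33, 31, 36, 35, 30, 24,
            34, 36, 41, 33, 34, 29, 31, 31, 32, 26)
breaks <- qnorm(c(0.001, 0.2, 0.4, 0.6, 0.8, 0.999), mean(mobile), sd(mobile))

observed <- as.vector(table(cut(mobile, breaks)))   # counts in the 5 classes
expected <- rep(length(mobile) / 5, 5)              # 8 per class
chi_obs <- sum((observed - expected)^2 / expected)
crit <- qchisq(0.95, df = 5 - 2 - 1)                # k - m - 1 = 2
print(observed)
print(c(chi_obs = chi_obs, critical = crit))
```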
41. Solution: H0: The data are from a Poisson distribution
H1: The data are not from a Poisson distribution

Estimated mean: $\hat{\lambda} = \bar{x} = 1.14$

Number per leaf   O    E
0                 70   47.97285
1                 38   54.68905
2                 17   31.17276
3                 10   11.84565
4                 9    3.37601
5                 3    0.7697303
≥6                3    0.1739

Grouping the classes with small expected frequencies (3 or more mites per leaf):
Number per leaf   O    E          (Oi − Ei)²/Ei
0                 70   47.97285   10.11396
1                 38   54.68905   5.092873
2                 17   31.17276   6.443675
≥3                25   16.16529   4.828376

Under H0, the test statistic $\chi^2 \sim \chi^2_d$, where d = k − m − 1.
Also, α = 0.05.
Here k = 4 classes and m = 1 estimated parameter (λ), so d = 4 − 1 − 1 = 2.
$\chi^2_{obs} = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 26.47888$
We reject H0 if $\chi^2_{obs} > \chi^2_{2;0.05}$.
Using R to obtain $\chi^2_{2;0.05}$ with the R code qchisq(0.95, 2).
From R output we get $\chi^2_{2;0.05} = 5.991465$.
As $\chi^2_{obs} > \chi^2_{2;0.05}$, we reject H0, i.e., at the 5% level of significance, the data do not
come from a Poisson distribution.
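A base-R sketch of the Poisson goodness-of-fit calculation: λ is estimated by the sample mean, counts of 3 or more are pooled into one class, and the degrees of freedom are classes − 1 − (number of estimated parameters).

```r
counts <- 0:6
observed_raw <- c(70, 38, 17, 10, 9, 3, 3)
n <- sum(observed_raw)                              # 150 leaves
lambda_hat <- sum(counts * observed_raw) / n        # 1.14

probs <- dpois(0:2, lambda_hat)
probs <- c(probs, 1 - sum(probs))                   # last class: 3 or more
observed <- c(observed_raw[1:3], sum(observed_raw[4:7]))
expected <- n * probs
chi_obs <- sum((observed - expected)^2 / expected)
dof <- length(observed) - 1 - 1                     # one estimated parameter
crit <- qchisq(0.95, dof)
print(round(cbind(observed, expected), 4))
print(c(chi_obs = chi_obs, df = dof, critical = crit))
```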

42. Solution: χ² = 9.44, df = 2; the p-value is less than 0.01.
Reject H0; thus, the employment plan is not independent of the type of company.
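The test of independence in base R; for a 3 × 2 table `chisq.test()` applies no continuity correction.

```r
plan <- matrix(c(37, 19, 16, 32, 34, 42), nrow = 3,
               dimnames = list(c("Add", "No change", "Lay off"),
                               c("Private", "Public")))
res <- chisq.test(plan)    # Pearson chi-square test of independence
print(res)
```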
