Week-7_GA_Solution_1

The document contains solutions to various statistical problems related to data science, including the application of Chebyshev's inequality, empirical distributions, sample means, variances, and the weak law of large numbers. It covers topics such as calculating sample sizes, variances of linear combinations of random variables, and bounds on probabilities. Each problem is presented with a detailed solution and relevant calculations.

Statistics for Data Science - 2

Graded Assignment Solution - Sept 2024


Week 7

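1. Suppose $X_1, X_2, \ldots, X_n$ are $n$ iid random variables with mean $\mu$ and variance $\sigma^2 = 16$.
Using Chebyshev's inequality, find the minimum value of $n$ such that

$$P(|\bar{X} - \mu| < 1) > 0.90.$$

Answer: 160
Solution:
The sample mean is $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$, so

$$E[\bar{X}] = \mu, \qquad \mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n}.$$

By Chebyshev's inequality, for any $\delta > 0$ we have

$$P(|\bar{X} - \mu| < \delta) \ge 1 - \frac{\sigma^2}{n\delta^2}.$$

The requirement $P(|\bar{X} - \mu| < 1) > 0.90$ is guaranteed whenever this lower bound exceeds 0.90.
With $\delta = 1$ and $\sigma^2 = 16$:

$$\begin{aligned}
1 - \frac{\sigma^2}{n\delta^2} &> 0.90 \\
\Rightarrow\ 1 - \frac{16}{n} &> 0.90 \\
\Rightarrow\ 1 - 0.90 &> \frac{16}{n} \\
\Rightarrow\ 0.10 &> \frac{16}{n} \\
\Rightarrow\ n &> \frac{16}{0.10} \\
\Rightarrow\ n &> 160
\end{aligned}$$

The answer key takes the boundary value $n = 160$, at which the Chebyshev bound equals 0.90 exactly.
If the inequality is read strictly, the smallest integer satisfying $n > 160$ is 161.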
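The search for the smallest n can also be checked numerically. The sketch below is only an illustration: the helper names `chebyshev_lower_bound` and `smallest_n`, and the `strict` flag, are not part of the assignment. It uses exact fractions so that the boundary case n = 160 is not blurred by floating-point rounding.

```python
from fractions import Fraction

def chebyshev_lower_bound(variance, n, delta):
    """Chebyshev lower bound on P(|sample mean - mu| < delta) for n iid draws."""
    return 1 - Fraction(variance) / (n * Fraction(delta) ** 2)

def smallest_n(variance, delta, target, strict=False):
    """Smallest n whose Chebyshev bound reaches (or strictly exceeds) target."""
    n = 1
    while True:
        bound = chebyshev_lower_bound(variance, n, delta)
        if bound > target or (not strict and bound == target):
            return n
        n += 1

print(smallest_n(16, 1, Fraction(9, 10)))               # 160
print(smallest_n(16, 1, Fraction(9, 10), strict=True))  # 161
```

The two calls correspond to reading the bound as "at least 0.90" and "strictly greater than 0.90" respectively.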

2. Consider a sample 1, 0, 1, 0, 1, 1, 0, 1, 1, 1 from the Bernoulli(0.6) distribution.

(a) Compute the empirical distribution of the sample.
A. p(0) = 0.4, p(1) = 0.6
B. p(0) = 0.3, p(1) = 0.7
C. p(0) = 0.5, p(1) = 0.5
D. p(0) = 0.7, p(1) = 0.3

Answer: B

Solution:
The empirical distribution is the discrete distribution with PMF

$$p(t) = \frac{\#(X_i = t)}{n},$$

where $\#(X_i = t)$ denotes the number of times $t$ occurs in the sample.
Here $n = 10$, the value 0 occurs 3 times and the value 1 occurs 7 times. Therefore,

$$p(0) = \frac{3}{10} = 0.3 \qquad \text{and} \qquad p(1) = \frac{7}{10} = 0.7.$$

Hence, option (B) is correct.

(b) Compute the sample mean. Enter the answer correct to one decimal place.
Answer: 0.7
Solution:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{n} = \frac{1+0+1+0+1+1+0+1+1+1}{10} = \frac{7}{10} = 0.7$$
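Both parts can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function name `empirical_pmf` is chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction

sample = [1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

def empirical_pmf(values):
    """Map each observed value t to #(X_i = t) / n."""
    n = len(values)
    counts = Counter(values)
    return {t: Fraction(c, n) for t, c in sorted(counts.items())}

pmf = empirical_pmf(sample)
sample_mean = Fraction(sum(sample), len(sample))

print({t: float(p) for t, p in pmf.items()})  # {0: 0.3, 1: 0.7}
print(float(sample_mean))                     # 0.7
```

For a 0/1 sample the sample mean coincides with the empirical probability p(1), which is why both parts give 0.7.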

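3. Let $X_1, X_2, X_3$ be three independent and identically distributed random variables with
mean $\mu$ and variance $\sigma^2$. Given below are 3 different formulations of the sample mean.
(Observe that $E[A] = E[B] = E[C]$.)

$$A = \frac{X_1 + X_2 + X_3}{3}$$

$$B = 0.1X_1 + 0.3X_2 + 0.6X_3$$

$$C = 0.2X_1 + 0.3X_2 + 0.5X_3$$

Choose the correct option from the following:

(a) $\mathrm{Var}(A) = \mathrm{Var}(B) = \mathrm{Var}(C)$
(b) $\mathrm{Var}(A) \ge \mathrm{Var}(B) \ge \mathrm{Var}(C)$
(c) $\mathrm{Var}(A) \le \mathrm{Var}(B) \le \mathrm{Var}(C)$
(d) $\mathrm{Var}(A) \le \mathrm{Var}(C) \le \mathrm{Var}(B)$

Solution:
Let $X_1, X_2, X_3 \sim$ i.i.d. $X$, where $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
Because the $X_i$ are independent, the variance of a weighted sum is the sum of the variances,
each multiplied by the square of its weight.

$$\begin{aligned}
\mathrm{Var}(A) &= \mathrm{Var}\left(\frac{X_1 + X_2 + X_3}{3}\right) \\
&= \frac{1}{9}\left(\mathrm{Var}[X_1] + \mathrm{Var}[X_2] + \mathrm{Var}[X_3]\right) \\
&= \frac{1}{9}(3\sigma^2) = \frac{\sigma^2}{3}
\end{aligned}$$

$$\begin{aligned}
\mathrm{Var}(B) &= \mathrm{Var}(0.1X_1 + 0.3X_2 + 0.6X_3) \\
&= 0.01\,\mathrm{Var}[X_1] + 0.09\,\mathrm{Var}[X_2] + 0.36\,\mathrm{Var}[X_3] \\
&= 0.46\sigma^2
\end{aligned}$$

$$\begin{aligned}
\mathrm{Var}(C) &= \mathrm{Var}(0.2X_1 + 0.3X_2 + 0.5X_3) \\
&= 0.04\,\mathrm{Var}[X_1] + 0.09\,\mathrm{Var}[X_2] + 0.25\,\mathrm{Var}[X_3] \\
&= 0.38\sigma^2
\end{aligned}$$

Since $\sigma^2/3 \approx 0.333\sigma^2$, we get $\mathrm{Var}(B) \ge \mathrm{Var}(C) \ge \mathrm{Var}(A)$,
that is, $\mathrm{Var}(A) \le \mathrm{Var}(C) \le \mathrm{Var}(B)$. Hence, option (d) is correct.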
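The comparison reduces to the sum of squared weights, since Var(w1 X1 + w2 X2 + w3 X3) = (w1² + w2² + w3²) σ² for iid X_i. The following sketch computes that factor exactly; the helper name `variance_factor` and the dictionary layout are illustrative choices, not part of the assignment.

```python
from fractions import Fraction

def variance_factor(weights):
    """Sum of squared weights: Var(sum w_i X_i) = factor * sigma^2 for iid X_i."""
    return sum(Fraction(w) ** 2 for w in weights)

estimators = {
    "A": [Fraction(1, 3)] * 3,
    "B": ["0.1", "0.3", "0.6"],  # decimal strings keep the weights exact
    "C": ["0.2", "0.3", "0.5"],
}

factors = {name: variance_factor(w) for name, w in estimators.items()}
ranking = sorted(factors, key=factors.get)

print(factors["A"], factors["B"], factors["C"])  # 1/3 23/50 19/50
print(ranking)                                   # ['A', 'C', 'B']
```

Among weights that sum to 1, equal weights minimise the sum of squares, which is why the ordinary sample mean A has the smallest variance.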

4. A random sample of size 25 is collected from a normal population with a mean of 50
and a standard deviation of 5. Find the variance of the sample mean.
Solution:
We know that the variance of the sample mean $\bar{X}$ is given by

$$\mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n} = \frac{5^2}{25} = 1$$

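5. A fair die is rolled 100 times. Let $X$ denote the number of times six is obtained. Find
a bound for the probability that $\dfrac{X}{100}$ differs from $\dfrac{1}{6}$ by less than 0.1 using the weak law
of large numbers.

(a) at least $\dfrac{5}{36}$
(b) at least $\dfrac{31}{36}$
(c) at most $\dfrac{5}{36}$
(d) at most $\dfrac{31}{36}$

Solution:
$X$ denotes the number of times six is obtained on rolling a fair die 100 times. Let
$X_1, X_2, \ldots, X_{100}$ be 100 i.i.d. samples such that

$$X_i = \begin{cases} 1 & \text{if six appears on rolling a fair die} \\ 0 & \text{otherwise} \end{cases}$$

$$E[X_i] = \mu = \frac{1}{6} \qquad \text{and} \qquad \mathrm{Var}(X_i) = \sigma^2 = \frac{1}{6}\cdot\frac{5}{6} = \frac{5}{36}$$

Notice that $X = X_1 + X_2 + X_3 + \cdots + X_{100}$, so $\dfrac{X}{100}$ is the sample mean $\bar{X}$ with $n = 100$.

To find: a bound on $P\left(\left|\dfrac{X}{100} - \dfrac{1}{6}\right| < 0.1\right)$. By the weak law of large numbers, we have

$$P(|\bar{X} - \mu| < \delta) \ge 1 - \frac{\sigma^2}{n\delta^2}$$

$$\Rightarrow P\left(\left|\frac{X}{100} - \frac{1}{6}\right| < 0.1\right) \ge 1 - \frac{5/36}{100 \times 0.01}$$

$$\Rightarrow P\left(\left|\frac{X}{100} - \frac{1}{6}\right| < 0.1\right) \ge 1 - \frac{5}{36} = \frac{31}{36}$$

Hence, option (b) is correct.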
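As a rough sanity check, the bound can be compared with the exact probability, which is available here because X is Binomial(100, 1/6). The sketch below is an illustration only: the helper name `wlln_lower_bound` is made up for this example, and the exact value is printed without a quoted figure.

```python
from fractions import Fraction
from math import comb

def wlln_lower_bound(variance, n, delta):
    """Lower bound 1 - sigma^2 / (n * delta^2) on P(|sample mean - mu| < delta)."""
    return 1 - Fraction(variance) / (n * Fraction(delta) ** 2)

n, p, delta = 100, Fraction(1, 6), Fraction(1, 10)
bound = wlln_lower_bound(p * (1 - p), n, delta)

# Exact probability: add the Binomial(n, p) mass of every count k
# whose proportion k/n lies strictly within delta of p.
exact = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
    if abs(Fraction(k, n) - p) < delta
)

print(bound)         # 31/36
print(float(exact))  # exact probability, never below the bound
```

The bound holds for any distribution with the same variance, so it is expected to be conservative for a specific one such as the binomial.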

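6. Let $X_1, X_2, \ldots, X_5$ be i.i.d. samples whose distribution has a mean of 20 and variance
of 4. Suppose the sample variance is defined as

$$S^2 = \frac{(X_1 - \bar{X})^2 + \cdots + (X_5 - \bar{X})^2}{5}$$

where $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_5}{5}$. Find the expected value of $S^2$.

Solution:

$$E[\bar{X}] = \mu = 20 \qquad \text{and} \qquad \mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n} = \frac{4}{5} = 0.8.$$

Using $\sum_{i=1}^{n} X_i = n\bar{X}$ and $E[Z^2] = \mathrm{Var}(Z) + (E[Z])^2$ for any random variable $Z$:

$$\begin{aligned}
E[S^2] &= E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}\left(X_i^2 + \bar{X}^2 - 2X_i\bar{X}\right)\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 + n\bar{X}^2 - 2n\bar{X}\bar{X}\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 + n\bar{X}^2 - 2n\bar{X}^2\right] \\
&= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right] \\
&= \frac{1}{n}\left[\sum_{i=1}^{n}E[X_i^2] - nE[\bar{X}^2]\right] \\
&= \frac{1}{n}\left[\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] \\
&= \frac{1}{n}\left[(n\sigma^2 + n\mu^2) - (\sigma^2 + n\mu^2)\right] \\
&= \frac{(n-1)\sigma^2}{n}
\end{aligned}$$

Here, $n = 5$, therefore, $E[S^2] = \dfrac{4}{5} \times 4 = 3.2$.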
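The result E[S²] = (n − 1)σ²/n does not depend on the shape of the distribution, so it can be checked exactly on any small example with the right mean and variance. The sketch below assumes a hypothetical two-point distribution (18 or 22, each with probability 1/2, giving mean 20 and variance 4). This distribution is not given in the problem and is used only because it allows exhaustive enumeration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution with mean 20 and variance 4 (an assumption for this check).
support = [(18, Fraction(1, 2)), (22, Fraction(1, 2))]
n = 5

def biased_sample_variance(xs):
    """S^2 with divisor n, as defined in the problem."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# Enumerate all 2^5 equally structured samples and average S^2 with exact weights.
expected_s2 = Fraction(0)
for outcome in product(support, repeat=n):
    prob = Fraction(1)
    for _, q in outcome:
        prob *= q
    values = [x for x, _ in outcome]
    expected_s2 += prob * biased_sample_variance(values)

print(expected_s2)  # 16/5
```

The value 16/5 = 3.2 matches (n − 1)σ²/n with n = 5 and σ² = 4.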
7. Suppose $X_i \sim \mathrm{Normal}\left(0, \dfrac{1}{i^2}\right)$, where $i = 1, 2, \ldots, 9$ and $X_1, X_2, \ldots, X_9$ are independent
of each other. Let $Y$ be a random variable defined as $Y = \sum_{i=1}^{9} iX_i$. Find the variance
of $Y$.
Solution:

$$\begin{aligned}
\mathrm{Var}(Y) &= \mathrm{Var}\left(\sum_{i=1}^{9} iX_i\right) \\
&= \mathrm{Var}(X_1 + 2X_2 + 3X_3 + \cdots + 9X_9) \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(2X_2) + \cdots + \mathrm{Var}(9X_9) \\
&= \mathrm{Var}(X_1) + 4\,\mathrm{Var}(X_2) + \cdots + 81\,\mathrm{Var}(X_9) \\
&= \frac{1}{1^2} + 4\left(\frac{1}{2^2}\right) + \cdots + 81\left(\frac{1}{9^2}\right) \\
&= 9
\end{aligned}$$
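Each term i² · Var(X_i) equals 1, so the nine terms add up to 9. A short check with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# Var(X_i) = 1 / i^2 for i = 1, ..., 9
variances = {i: Fraction(1, i * i) for i in range(1, 10)}

# Independence gives Var(sum i * X_i) = sum i^2 * Var(X_i)
var_y = sum(i * i * v for i, v in variances.items())

print(var_y)  # 9
```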

8. A random sample of size 50 is collected from a population $P$, where $P \sim \mathrm{Uniform}[0, 12]$.
Find a lower bound on the probability that the sample mean will be at most 3 units
away from the actual mean using the weak law of large numbers.

Solution:

$$P \sim \mathrm{Uniform}[0, 12]$$

$$E[P] = \mu = \frac{0 + 12}{2} = 6, \qquad \mathrm{Var}(P) = \sigma^2 = \frac{(12 - 0)^2}{12} = \frac{144}{12} = 12$$

By the weak law of large numbers, we have

$$P(|\bar{X} - \mu| < \delta) \ge 1 - \frac{\sigma^2}{n\delta^2}$$

$$P(|\bar{X} - \mu| < 3) \ge 1 - \frac{12}{50 \times 9} = \frac{73}{75} \approx 0.9733$$

9. Suppose a random sample is used to estimate the proportion of voters in a city. If the
sample proportion is roughly 0.45, what sample size is necessary so that the standard
deviation of the sample proportion is 0.02?
Solution:
Let the random variable $X$ indicate whether a selected person is a voter, and for the
$i$-th person in the sample let $X_i$ be defined as

$$X_i = \begin{cases} 1, & \text{if the selected person is a voter} \\ 0, & \text{otherwise} \end{cases}$$

Define an event $A$ as $A : X = 1$.
It is given that $P(A) = p = 0.45$.
Writing $S(A)$ for the sample proportion of $A$, we know that

$$\mathrm{Var}(S(A)) = \frac{P(A)(1 - P(A))}{n}.$$

The standard deviation is the square root of the variance, so we need

$$\sqrt{\frac{p(1-p)}{n}} = 0.02$$

$$\sqrt{\frac{(0.45)(0.55)}{n}} = 0.02 \implies n = \frac{0.2475}{0.0004} = 618.75 \approx 619$$
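Solving the equation for n gives n = p(1 − p)/sd². The sketch below wraps this in a small helper; the name `sample_size_for_sd` is hypothetical. It rounds up, so that the resulting standard deviation does not exceed the target.

```python
import math
from fractions import Fraction

def sample_size_for_sd(p, target_sd):
    """Return (exact n, n rounded up) so that sqrt(p(1-p)/n) <= target_sd."""
    p, target_sd = Fraction(p), Fraction(target_sd)
    exact = p * (1 - p) / target_sd ** 2
    return exact, math.ceil(exact)

exact, n = sample_size_for_sd("0.45", "0.02")
print(float(exact), n)  # 618.75 619
```

Passing the inputs as decimal strings keeps the arithmetic exact, which avoids a floating-point result that lands just above or below 618.75.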

10. The average life (in years) of an electronic watch follows an exponential distribution with
parameter $\dfrac{1}{2}$. Find the lower bound on the probability that the mean life of a random
sample of 50 such watches falls between 1 and 3 years. Enter your answer correct to two
decimals.
Hint: Use the weak law of large numbers.
Solution:
Let the random variable $X$ represent the life of an electronic watch.
It is given that $X \sim \mathrm{Exp}(1/2)$ and 50 such samples are taken.
For rate $\lambda = 1/2$, $E[X] = \mu = \dfrac{1}{\lambda} = 2$ and $\mathrm{Var}(X) = \sigma^2 = \dfrac{1}{\lambda^2} = 4$.
To find: a lower bound on $P(1 < \bar{X} < 3) = P(|\bar{X} - 2| < 1)$.
By the weak law of large numbers, we have

$$P(|\bar{X} - \mu| < \delta) \ge 1 - \frac{\sigma^2}{n\delta^2}$$

$$P(|\bar{X} - 2| < 1) \ge 1 - \frac{4}{50 \times 1} = \frac{23}{25} = 0.92$$

11. A university evaluates the final scores of students based on coursework and a final project.
The variance of the coursework scores is 15, and the variance of the final project scores
is 30. Coursework contributes 70% to the final evaluation, while the project contributes
30%. Assuming the scores of coursework and project are uncorrelated, what is the
variance of the final evaluation scores? Enter the answer correct to two decimal places.
Answer: 10.05; Range: 10.02 to 10.08
Solution:
To compute the variance of the final evaluation scores, we use the formula for the variance
of a linear combination of uncorrelated random variables:

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y),$$

where $X$ and $Y$ are uncorrelated random variables, and $a$ and $b$ are the weights of $X$
and $Y$, respectively.

In this case:

$a = 0.7$ (weight of coursework),
$b = 0.3$ (weight of the project),
$\mathrm{Var}(X) = 15$ (variance of coursework scores),
$\mathrm{Var}(Y) = 30$ (variance of project scores).

Substitute the values into the formula:

$$\begin{aligned}
\mathrm{Var}(\text{Final Evaluation}) &= (0.7)^2 \cdot 15 + (0.3)^2 \cdot 30 \\
&= 0.49 \cdot 15 + 0.09 \cdot 30 \\
&= 7.35 + 2.7 = 10.05
\end{aligned}$$

Therefore, the variance of the final evaluation scores is 10.05.

12. In a large population of students, an unknown proportion $p$ prefers Chips over Cookies.
A survey was conducted among 200 students and 120 preferred Chips. Assume the 200
samples are i.i.d. Bernoulli($p$).
Based on the given information, answer the following questions:

(i) What is the sample mean? Enter the answer correct to one decimal place.
Answer: 0.6
To determine the sample mean, we use the formula for the sample mean:

$$\bar{X} = \frac{\text{Number of successes}}{\text{Total number of samples}}$$

Given:

Number of successes = 120 (students who preferred Chips),
Total number of samples = 200 (total students surveyed).

Substituting the values into the formula:

$$\bar{X} = \frac{120}{200} = 0.6.$$

Thus, the sample mean is 0.6.
(ii) What is the variance of the sample mean? Assume $p$ is equal to the sample mean.
(a) 0.24
(b) 0.36
(c) 0.0018
(d) 0.0012
Answer: d
The variance of the sample mean is given by the formula:

$$\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{n},$$

where:

$p$ = proportion of successes (equal to the sample mean),
$n$ = total number of samples.

From the given data:

$p = 0.6$ (calculated sample mean),
$n = 200$ (total number of students surveyed).

Substituting the values into the formula:

$$\mathrm{Var}(\bar{X}) = \frac{0.6(1 - 0.6)}{200} = \frac{0.6 \cdot 0.4}{200} = \frac{0.24}{200} = 0.0012.$$

Thus, the variance of the sample mean is 0.0012.

(iii) If 30 more students join the survey and they all prefer Cookies, what will be the
new sample mean and variance of the sample mean?
(a) new sample mean = 0.24, new variance = 0.0012
(b) new sample mean = 0.6, new variance = 0.0012
(c) new sample mean = 0.6522, new variance = 0.0009
(d) new sample mean = 0.5217, new variance = 0.00108
Answer: d

New Sample Mean

The formula for the sample mean is:

$$\bar{X} = \frac{\text{Number of successes}}{\text{Total number of samples}}.$$

From the given data:

Original number of successes = 120,
Original total number of samples = 200.

If 30 more students join and they all prefer Cookies, the updated values are:

New number of successes = 120 (unchanged, as no additional students prefer Chips),
New total number of samples = 200 + 30 = 230.

The new sample mean is:

$$\bar{X} = \frac{120}{230} = \frac{12}{23} \approx 0.52174.$$

Thus, the new sample mean is 0.5217 (rounded to 4 decimal places).

New Variance of the Sample Mean

The formula for the variance of the sample mean is:

$$\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{n},$$

where:

$p$ = new sample mean,
$n$ = new total number of samples.

Substituting $p = 0.52174$ and $n = 230$ into the formula:

$$\mathrm{Var}(\bar{X}) = \frac{0.52174(1 - 0.52174)}{230} = \frac{0.52174 \cdot 0.47826}{230} \approx \frac{0.24953}{230} \approx 0.00108.$$

Thus, the new variance of the sample mean is 0.00108 (rounded to 5 decimal places).
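The before-and-after figures for the survey can be recomputed from the raw counts. This is a minimal sketch: the helper name `mean_and_variance` is illustrative, and, as in the solution, p is taken to be the sample mean.

```python
from fractions import Fraction

def mean_and_variance(successes, total):
    """Sample mean p = successes/total and the plug-in variance p(1-p)/total."""
    p = Fraction(successes, total)
    return p, p * (1 - p) / total

p_old, var_old = mean_and_variance(120, 200)
p_new, var_new = mean_and_variance(120, 200 + 30)  # 30 extra students, none prefer Chips

print(float(p_old), float(var_old))                      # 0.6 0.0012
print(round(float(p_new), 4), round(float(var_new), 5))  # 0.5217 0.00108
```

Working from the exact fraction 12/23 gives p(1 − p) = 132/529, about 0.24953, before dividing by 230.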
