Data Science Hypothesis Testing
Lecture 7
Centre for Data Science
Institute of Technical Education and Research
Siksha ‘O’ Anusandhan (Deemed to be University)
Bhubaneswar, Odisha, 751030
Overview
1 Statistical Hypothesis Testing
2 Example: Flipping a Coin
3 p-Values
4 Confidence Intervals
5 p-Hacking
6 Example: Running an A/B Test
7 Bayesian Inference
8 References
Statistical Hypothesis Testing
Example: Flipping a Coin
We can also do the reverse: find either the nontail region or the
(symmetric) interval around the mean that accounts for a certain
level of likelihood.
For example, if we want to find an interval centered at the mean and
containing 60% probability, we find the cutoffs where the upper
and lower tails each contain 20% of the probability (leaving the 60%
in between).
from scratch.probability import inverse_normal_cdf

def normal_upper_bound(probability: float,
                       mu: float = 0,
                       sigma: float = 1) -> float:
    """Returns the z for which P(Z <= z) = probability"""
    return inverse_normal_cdf(probability, mu, sigma)

def normal_lower_bound(probability: float,
                       mu: float = 0,
                       sigma: float = 1) -> float:
    """Returns the z for which P(Z >= z) = probability"""
    return inverse_normal_cdf(1 - probability, mu, sigma)
Example: Flipping a Coin
from typing import Tuple

def normal_two_sided_bounds(probability: float,
                            mu: float = 0,
                            sigma: float = 1) -> Tuple[float, float]:
    """Returns the symmetric (about the mean) bounds
    that contain the specified probability"""
    tail_probability = (1 - probability) / 2
    # upper bound should have tail_probability above it
    upper_bound = normal_lower_bound(tail_probability, mu, sigma)
    # lower bound should have tail_probability below it
    lower_bound = normal_upper_bound(tail_probability, mu, sigma)
    return lower_bound, upper_bound
Example: Flipping a Coin
Consider the test that rejects H0 if X falls outside the bounds given by:

lower_bound, upper_bound = normal_two_sided_bounds(0.95, mu_0, sigma_0)
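For the coin example these slides use mu_0 = 500 and sigma_0 ≈ 15.81, the normal approximation to Binomial(1000, 0.5). The following self-contained sketch checks the 95% bounds; its binary-search inverse_normal_cdf is an assumption about what scratch.probability provides:

```python
import math

def normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    """Normal CDF via the error function."""
    return (1 + math.erf((x - mu) / (math.sqrt(2) * sigma))) / 2

def inverse_normal_cdf(p: float, mu: float = 0, sigma: float = 1,
                       tolerance: float = 1e-5) -> float:
    """Find an approximate z with normal_cdf(z) == p, by binary search."""
    if mu != 0 or sigma != 1:
        # solve on the standard normal, then rescale
        return mu + sigma * inverse_normal_cdf(p, tolerance=tolerance)
    low_z, hi_z = -10.0, 10.0
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        if normal_cdf(mid_z) < p:
            low_z = mid_z
        else:
            hi_z = mid_z
    return mid_z

def normal_two_sided_bounds(probability: float, mu: float = 0,
                            sigma: float = 1):
    """Symmetric (about the mean) bounds containing the given probability."""
    tail = (1 - probability) / 2
    return (inverse_normal_cdf(tail, mu, sigma),
            inverse_normal_cdf(1 - tail, mu, sigma))

mu_0, sigma_0 = 500, math.sqrt(1000 * 0.5 * 0.5)  # 1000 fair flips
lower_bound, upper_bound = normal_two_sided_bounds(0.95, mu_0, sigma_0)
print(round(lower_bound, 1), round(upper_bound, 1))  # roughly 469.0 531.0
```

So with 1,000 flips of a fair coin we should see between roughly 469 and 531 heads 95% of the time, which is where the 469/531 cutoffs on the p-Hacking slide come from.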
p-Values
Note
Why did we use a value of 529.5 rather than 530? This is what's
called a continuity correction. It reflects the fact that
normal_probability_between(529.5, 530.5, mu_0, sigma_0) is a better
estimate of the probability of seeing 530 heads than
normal_probability_between(530, 531, mu_0, sigma_0) is.
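Where does the 0.062 figure come from? A self-contained sketch, using math.erf for the normal CDF rather than the slides' helper functions:

```python
import math

def normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    """Normal CDF via the error function."""
    return (1 + math.erf((x - mu) / (math.sqrt(2) * sigma))) / 2

mu_0, sigma_0 = 500, math.sqrt(1000 * 0.5 * 0.5)  # 1000 fair flips

# two-sided p-value for 530 heads, with the continuity correction:
# P(X >= 529.5) doubled, since the distribution is symmetric about the mean
p_value = 2 * (1 - normal_cdf(529.5, mu_0, sigma_0))
print(round(p_value, 3))  # roughly 0.062
```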
One way to convince yourself that this is a sensible estimate is with a
simulation:
import random

extreme_value_count = 0
for _ in range(1000):
    num_heads = sum(1 if random.random() < 0.5 else 0
                    for _ in range(1000))
    if num_heads >= 530 or num_heads <= 470:
        extreme_value_count += 1

# p-value was 0.062 => ~62 extreme values out of 1000
assert 59 < extreme_value_count < 65, f"{extreme_value_count}"
Confidence Intervals
Note
This is a statement about the interval, not about p. You should
understand it as the assertion that if you were to repeat the experiment
many times, 95% of the time the "true" parameter would lie within the
observed confidence interval.
Here we do not conclude that the coin is unfair, since 0.5 falls within our
confidence interval.
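As a concrete instance (the 525-heads-in-1,000-flips example from the book; z ≈ 1.96 cuts off the central 95% of a standard normal):

```python
import math

p_hat = 525 / 1000                             # observed fraction of heads
sigma = math.sqrt(p_hat * (1 - p_hat) / 1000)  # estimated standard error
lower, upper = p_hat - 1.96 * sigma, p_hat + 1.96 * sigma
print(round(lower, 4), round(upper, 4))  # roughly 0.494 0.556
```

Since 0.5 lies inside this interval, we do not conclude the coin is unfair.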
p-Hacking
import random
from typing import List

def run_experiment() -> List[bool]:
    """Flips a fair coin 1000 times, True = heads, False = tails"""
    return [random.random() < 0.5 for _ in range(1000)]

def reject_fairness(experiment: List[bool]) -> bool:
    """Using the 5% significance levels"""
    num_heads = len([flip for flip in experiment if flip])
    return num_heads < 469 or num_heads > 531

random.seed(0)
experiments = [run_experiment() for _ in range(1000)]
num_rejections = len([experiment for experiment in experiments
                      if reject_fairness(experiment)])
assert num_rejections == 46
Example: Running an A/B Test
Let's say that N_A people see ad A, and that n_A of them click it.
We can think of each ad view as a Bernoulli trial where p_A is the
probability that someone clicks ad A.
Then we know that n_A/N_A is approximately a normal random variable with
mean p_A and standard deviation σ_A = √(p_A(1 − p_A)/N_A).
Similarly, n_B/N_B is approximately a normal random variable with
mean p_B and standard deviation σ_B = √(p_B(1 − p_B)/N_B).

import math
from typing import Tuple

def estimated_parameters(N: int, n: int) -> Tuple[float, float]:
    p = n / N
    sigma = math.sqrt(p * (1 - p) / N)
    return p, sigma
For example, if "tastes great" gets 200 clicks out of 1,000 views and
"less bias" gets 180 clicks out of 1,000 views, the statistic equals:

z = a_b_test_statistic(1000, 200, 1000, 180)  # -1.14

which means there's a 0.254 probability we'd see such a large
difference if the ads were equally effective, which is too large to
conclude there's much of a difference. (Had "less bias" received only
150 clicks, the probability would be only 0.003.)
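a_b_test_statistic is used above but not shown on these slides; a sketch consistent with estimated_parameters, testing the null hypothesis that p_A and p_B are equal:

```python
import math
from typing import Tuple

def estimated_parameters(N: int, n: int) -> Tuple[float, float]:
    """Estimated click probability and its standard error."""
    p = n / N
    sigma = math.sqrt(p * (1 - p) / N)
    return p, sigma

def a_b_test_statistic(N_A: int, n_A: int, N_B: int, n_B: int) -> float:
    """z-statistic for the difference p_B - p_A under the null p_A == p_B."""
    p_A, sigma_A = estimated_parameters(N_A, n_A)
    p_B, sigma_B = estimated_parameters(N_B, n_B)
    return (p_B - p_A) / math.sqrt(sigma_A ** 2 + sigma_B ** 2)

z = a_b_test_statistic(1000, 200, 1000, 180)
print(round(z, 2))  # -1.14
```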
Bayesian Inference
For example, if alpha and beta are both 1, it’s just the uniform
distribution.
If alpha is much larger than beta, most of the weight is near 1.
If alpha is much smaller than beta, most of the weight is near 0.
[Figure: probability density of the Beta distribution for Beta(1,1), Beta(10,10), Beta(4,16), and Beta(16,4), plotted against values of the random variable X from 0.0 to 1.0]
Maybe we don’t want to take a stand on whether the coin is fair, and
we choose alpha and beta to both equal 1.
Then we flip our coin a bunch of times and see h heads and t tails.
Bayes’s theorem tells us that the posterior distribution for p is again a
Beta distribution, but with parameters alpha + h and beta + t.
Let’s say you flip the coin 10 times and see only 3 heads.
Your posterior distribution would be a Beta(4, 8), centered around
0.33. {(4,8) = (1+3, 1+7)}
If you started with a Beta(20, 20), your posterior distribution would
be a Beta(23, 27), centered around 0.46. {(23,27) = (20+3, 20+7)}
If you started with a Beta(30, 10), your posterior distribution would
be a Beta(33, 17), centered around 0.66. {(33,17) = (30+3, 10+7)}
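The posterior centers quoted above are just Beta means, (alpha + h) / (alpha + h + beta + t); a minimal check:

```python
def beta_posterior_mean(alpha: float, beta: float,
                        heads: int, tails: int) -> float:
    """Mean of the Beta(alpha + heads, beta + tails) posterior."""
    a, b = alpha + heads, beta + tails
    return a / (a + b)

# 10 flips with 3 heads and 7 tails, under each prior from the slide
print(round(beta_posterior_mean(1, 1, 3, 7), 2))    # 0.33
print(round(beta_posterior_mean(20, 20, 3, 7), 2))  # 0.46
print(round(beta_posterior_mean(30, 10, 3, 7), 2))  # 0.66
```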
If you flipped the coin more and more times, the prior would matter
less and less until eventually you’d have (nearly) the same posterior
distribution no matter which prior you started with.
Using Bayesian inference to test hypotheses is considered somewhat
controversial, in part because the mathematics can get complicated
and in part because of the subjective nature of choosing a prior.
References
[1] Joel Grus. Data Science from Scratch: First Principles with Python. 2nd ed. O'Reilly Media, May 2019.
Thank You