Hypothesis Testing in Python

The document describes using a hypothesis test to analyze crash frequency data from rear-end and side-swipe crashes at intersections. The null hypothesis is that there is no difference in the mean crash frequencies between the two types. A paired t-test is used to calculate the t-statistic and p-value, finding no statistically significant difference since the t-statistic is in the acceptance region and the p-value is above the significance level. Therefore, the analyst cannot reject the null hypothesis of no difference in the mean crash frequencies.

Uploaded by

Umair Durrani


Hypothesis Testing for Mean Difference (2 Samples) using Python

April 26, 2015

In [1]: # Telling IPython to render plots inside cells
%matplotlib inline
In [3]: # Importing required Libraries
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import ggplot as gg
from IPython.display import display
from IPython.display import Image
from IPython.display import HTML

Problem Statement

A traffic analyst in the city of Zreeha wants to find out whether there is any difference in the crash frequencies (number of crashes per year) between rear-end and side-swipe crashes. The transport department collects crash frequencies for a year at 10 sites of 4-legged intersections. The data are described below in the data frame df.
Statistically speaking, the analyst wants to answer the question:
Are the crash frequencies of rear-end and side-swipe crashes at 4-legged intersections statistically different?

In [14]: # Rear-end Crash

HTML(<img src="http://upload.wikimedia.org/wikipedia/commons/1/1f/Head_On_Collision.jpg" width
Out[14]: <IPython.core.display.HTML object>

In [13]: # Side-swipe Crash

HTML(<img src="http://upload.wikimedia.org/wikipedia/commons/5/50/Japanese_car_accident_blur.j
Out[13]: <IPython.core.display.HTML object>

1.1 Data Description

1.1.1 Reading Data

We will first read the data which is saved in a csv file:

In [21]: df = pd.read_csv("C:\\Users\\durraniu\\Documents\\HT2.csv")
df.head()

Out[21]:
  Unnamed: 0  Crash Frequency \n(Crashes per year)  Unnamed: 2
0     Site #                              Rear-end  Side-swipe
1          1                                    10          12
2          2                                     7           9
3          3                                     6           4
4          4                                     5           7

We can see that the first header row is unnecessary here, so we skip it when reading the file.
In [22]: df = pd.read_csv("C:\\Users\\durraniu\\Documents\\HT2.csv", skiprows = 2)
df.head()
Out[22]:
   Site #  Rear-end  Side-swipe
0       1        10          12
1       2         7           9
2       3         6           4
3       4         5           7
4       5         9           8

1.1.2 Summary Statistics

In [23]: df.describe()
Out[23]:
         Site #   Rear-end  Side-swipe
count  10.00000  10.000000   10.000000
mean    5.50000   8.200000    8.300000
std     3.02765   1.932184    2.311805
min     1.00000   5.000000    4.000000
25%     3.25000   7.000000    7.000000
50%     5.50000   8.500000    8.000000
75%     7.75000   9.750000    9.750000
max    10.00000  11.000000   12.000000

However, we are not really interested in the individual averages of rear-end and side-swipe crashes but in the difference
between them. Our main goal is to test whether the mean of the differences is statistically significant.
1.1.3 Hypothesis Testing

To assess the significance of the mean difference in crash frequencies, we'll first find the difference:
In [24]: df['d'] = df['Rear-end'] - df['Side-swipe']
df.head()
Out[24]:
   Site #  Rear-end  Side-swipe  d
0       1        10          12 -2
1       2         7           9 -2
2       3         6           4  2
3       4         5           7 -2
4       5         9           8  1

The mean of the differences of two samples is:

In [27]: dbar = df['d'].mean()
print(dbar)
-0.1

And the standard deviation is:

In [28]: s = df['d'].std()
print(s)
1.66332999332

Hypothesis

Our null hypothesis is that there is no difference between the crash frequencies of rear-end
and side-swipe crashes or, in other words, that the mean of the population of all these differences is zero:
$H_0: \mu_D = 0$
and the alternative hypothesis is:
$H_A: \mu_D \neq 0$
Level of significance = 0.05
In [64]: HTML('<img src="HT2.png" width=750 height=500/>')
Out[64]: <IPython.core.display.HTML object>
Critical Value

Because we have a sample size of only 10, we will use the t distribution instead of the Z distribution.
According to the CLT, the mean of the sampling distribution of mean differences in crash frequencies of rear-end and side-swipe crashes is equal to the population mean difference, which is assumed to be zero in this
case.
We can find the critical t for a 0.05 significance level (two-tailed) and 9 degrees of freedom using the following command:
In [73]: from scipy.stats import distributions as dists
tcritical = dists.t.ppf(1-0.05/2, 9)
print(tcritical)
2.26215716274
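
Because the test is two-tailed, the significance level is split equally between the two tails, which is why `1-0.05/2` appears in the call above. The following is a minimal sketch of both bounds of the acceptance region; the names `alpha`, `dof`, `t_lower` and `t_upper` are illustrative and not part of the original notebook.

```python
from scipy import stats

alpha = 0.05   # significance level used in this notebook
dof = 9        # degrees of freedom: n - 1 = 10 - 1

# A two-tailed test puts alpha / 2 in each tail of the t distribution
t_lower = stats.t.ppf(alpha / 2, dof)
t_upper = stats.t.ppf(1 - alpha / 2, dof)

print("Acceptance region: %.3f < t < %.3f" % (t_lower, t_upper))
```

The t distribution is symmetric about zero, so the lower bound is simply the negative of the upper bound.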
t-statistic

From our data we can compute t score using following formula:

$$t = \frac{\bar{d} - \mu_D}{s / \sqrt{n}}$$
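
As a quick check, the formula can be evaluated by hand from the mean and standard deviation of the differences printed earlier. This is a minimal sketch using only the standard library; `n`, `mu_d_null`, `standard_error` and `t_stat` are illustrative names, and `n = 10` is the number of sites given in the problem statement.

```python
import math

# Values reported earlier in this notebook
dbar = -0.1            # mean of the paired differences
s = 1.66332999332      # sample standard deviation of the differences
n = 10                 # number of sites (pairs)
mu_d_null = 0.0        # mean difference assumed under the null hypothesis

# Standard error of the mean difference
standard_error = s / math.sqrt(n)

# t-statistic for the paired differences
t_stat = (dbar - mu_d_null) / standard_error
print("t = %.3f" % t_stat)
```

The result is about -0.190, a small fraction of one standard error away from zero.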

We can use the following command in the stats module to find the t-statistic and p-value for a two-tailed test:
In [74]: paired_sample = stats.ttest_rel(df['Rear-end'], df['Side-swipe'])
print("The t-statistic is %.3f and the p-value is %.3f." % tuple(paired_sample))
The t-statistic is -0.190 and the p-value is 0.853.
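
The p-value can also be cross-checked without the raw data, since it depends only on the t-statistic and the degrees of freedom. The sketch below rebuilds the t-statistic from the summary values printed earlier and uses `scipy.stats.t.sf` (the survival function, i.e. the upper-tail probability); the variable names are illustrative and not from the original notebook.

```python
import math
from scipy import stats

# Summary values reported earlier in this notebook
dbar = -0.1            # mean of the paired differences
s = 1.66332999332      # sample standard deviation of the differences
n = 10                 # number of sites (pairs)
dof = n - 1            # degrees of freedom for a paired t-test

# t-statistic under the null hypothesis of zero mean difference
t_stat = dbar / (s / math.sqrt(n))

# Two-tailed p-value: probability of a |t| at least this large under H0
p_value = 2 * stats.t.sf(abs(t_stat), dof)
print("t = %.3f, p = %.3f" % (t_stat, p_value))
```

Both numbers should agree with the `ttest_rel` output above to roughly three decimal places.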

1.2 Conclusion

Because the t-statistic (-0.190) falls in the acceptance region, i.e. between the critical t-values of -2.262 and 2.262, we fail to
reject the null hypothesis.
Another way to interpret the result is that the p-value (0.853) is higher than the significance level (0.05): the probability
of getting the observed or a more extreme mean difference, given that the null hypothesis is true, is higher than the
probability we are willing to accept of rejecting the null hypothesis when it is in fact true. Therefore, we fail to reject the null
hypothesis. In the context of this example, we say that the mean difference between rear-end and side-swipe
crash frequencies is not statistically significant.
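
The two decision rules can be written side by side. This is a small sketch, not part of the original notebook: the helper `fails_to_reject` is a hypothetical name, and the inputs are the rounded values reported in the text.

```python
def fails_to_reject(t_stat, t_critical, p_value, alpha):
    """Return the decision under each rule (True means: fail to reject H0)."""
    # Rule 1: the t-statistic lies inside the acceptance region
    by_critical_value = abs(t_stat) < t_critical
    # Rule 2: the p-value exceeds the significance level
    by_p_value = p_value > alpha
    return by_critical_value, by_p_value


# Rounded values reported in this notebook
decisions = fails_to_reject(t_stat=-0.190, t_critical=2.262,
                            p_value=0.853, alpha=0.05)
print(decisions)  # (True, True)
```

For a two-tailed t-test the two rules are equivalent, so they lead to the same decision whenever they are computed from the same t-statistic and degrees of freedom.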

1.3 Resources

Learning Python for Data Analysis and Visualization
Data Analysis and Statistical Inference course
Caldwell, Sally. Statistics unplugged. Cengage Learning, 2012.
paired t test in python

In [67]: %reload_ext version_information

%version_information numpy, scipy, matplotlib, sympy, pandas, ggplot
Out[67]:
Software    Version
Python      2.7.9 64bit [MSC v.1500 64 bit (AMD64)]
IPython     3.0.0
OS          Windows 8 6.2.9200
numpy       1.9.2
scipy       0.15.1
matplotlib  1.4.3
sympy       0.7.6
pandas      0.16.0
ggplot      0.6.5
Sun Apr 26 17:40:56 2015 Eastern Daylight Time
