Statistics For Traffic Engineers
Topic: Statistics for Traffic Engineers
Nikolas Geroliminis
Ecole Polytechnique Fédérale de Lausanne
Role of Statistical Inference in Decision-Making Process
Real World
→ Data Collection
→ Estimation of Parameters, Choice of Distribution
→ Calculation of Probabilities (using the prescribed distributions and estimated parameters)
→ Information for Decision-Making and Design
Statistical Inference: information obtained from the sampled data is used to make generalizations about the populations from which the samples were obtained
Sample vs. Population
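Role of Sampling in Statistical Inferences
$$-\infty < x < +\infty$$
$$\mu \approx \bar{x}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\sigma^2 \approx s^2, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$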
Statistical Analysis
Used to address the following questions:
1. How many samples are required?
2. What confidence should I have in this estimate?
3. What statistical distribution best describes the
observed data mathematically?
4. Has a traffic engineering design resulted in a
change in the characteristics of the population?
Distributions
What is meant by distributional form?
It is the frequency of specific values occurring within
the measured data set
Considering a traffic stream along a signalized
arterial
What operational considerations are there for the
signal if:
traffic volume is constant per unit time (i.e., uniform) vs.
randomly varying (some other distribution)?
What design considerations are there for turn bays?
Describing a Distribution
Two types of statistical parameters that
describe a distribution
Central tendency
Dispersion
Common Statistical Measures
Measures of central tendency
Sample Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Sample Median
$\tilde{x}$ = middle value if the number of observations is odd
$\tilde{x}$ = average of the two middle values if the number of observations is even
Mode
Most frequent observation
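The three measures can be written out by hand in a minimal Python sketch. Each line mirrors one of the definitions above. The function name `central_tendency` and the speeds in the usage line are hypothetical.

```python
from collections import Counter


def central_tendency(data):
    """Return (sample mean, sample median, list of modes) for a list of observations."""
    n = len(data)
    mean = sum(data) / n  # x-bar = (1/n) * sum of x_i
    ordered = sorted(data)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]  # odd n: middle value
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values
    counts = Counter(data)
    top = max(counts.values())
    modes = [value for value, count in counts.items() if count == top]  # most frequent value(s)
    return mean, median, modes


# Hypothetical spot speeds (mph)
print(central_tendency([50, 52, 52, 55, 61]))  # (54.0, 52, [52])
```

In practice, `statistics.mean`, `statistics.median` and `statistics.multimode` from the standard library give the same results.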
Common Statistical Measures
Measures of dispersion (or variability)
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}$$
Sample Standard Deviation
$$s = \sqrt{s^2}$$
Sample Coefficient of Variation
$$\mathrm{cov} = \frac{s}{\bar{x}}$$
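The two forms of the sample variance give the same value, which this minimal sketch confirms. It also derives the standard deviation and the coefficient of variation. The function name and the speeds are hypothetical.

```python
import math


def dispersion(data):
    """Return (variance by definition, variance by computational form, std dev, coefficient of variation)."""
    n = len(data)
    mean = sum(data) / n
    # Definitional form: squared deviations from the mean, divided by n - 1
    s2_def = sum((x - mean) ** 2 for x in data) / (n - 1)
    # Computational form: (sum of x_i^2 - (sum of x_i)^2 / n) / (n - 1)
    s2_comp = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)
    s = math.sqrt(s2_def)
    cov = s / mean  # dimensionless; only meaningful when the mean is positive
    return s2_def, s2_comp, s, cov


s2_def, s2_comp, s, cov = dispersion([50, 52, 52, 55, 61])  # hypothetical speeds (mph)
print(s2_def, s2_comp, round(s, 3), round(cov, 4))  # 18.5 18.5 4.301 0.0797
```

The computational form needs only running sums. In floating point, however, it can lose precision when the values are large and close together, so the definitional form is the safer default in code.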
Distribution Terms
The mechanism for assigning probabilities to
events defined by random variables is to use
either a mass function (for discrete variables) or
a density function (for continuous variables)
Example (uniform density on [0, 5]):
$$f(x) = \begin{cases} \frac{1}{5} & 0 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
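For a continuous variable, probabilities come from areas under the density, not from its height. This small sketch applies that to the uniform example (density 1/5 on [0, 5]). The function names are hypothetical.

```python
def uniform_pdf(x, a=0.0, b=5.0):
    """Density of a uniform variable on [a, b]: constant 1/(b - a) inside, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_prob(lo, hi, a=0.0, b=5.0):
    """P(lo <= X <= hi) = area under the density = overlap length / (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


print(uniform_pdf(2.0))          # 0.2
print(uniform_prob(1.0, 3.0))    # 0.4
print(uniform_prob(-1.0, 10.0))  # 1.0
```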
Normal
Normal distribution function is continuous
p.d.f. is:
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
µ = mean, σ = standard deviation (for population, true)
x̄ = mean, s = standard deviation (for sample, estimated)
Normal
What does it mean, conceptually?
Distribution is centered about its mean
Spread is function of standard deviation
Mean, median, and mode are numerically equal
68.27% of observations will be within 1 std. dev.,
95.45% within 2 std. dev., 99.73% within 3 std. dev.
Values from −∞ to +∞ are theoretically possible, but in practice observations rarely fall beyond about ±4 standard deviations from the mean
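The percentages above can be checked with the error function from Python's standard library. For any normal variable, P(|X − µ| ≤ kσ) = erf(k/√2). This is a sketch, and `prob_within_k_sd` is a hypothetical helper name.

```python
import math


def prob_within_k_sd(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal variable, equal to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 4):
    print(k, f"{100 * prob_within_k_sd(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
# 4 99.99%
```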
Standard Normal
p.d.f. for standard normal dist. is:
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
To get a standard normal random variable for a measurement from a nonstandard normal dist., use:
$$z = \frac{x - \mu}{\sigma}$$
[Figure: Standard Normal Distribution]
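This minimal sketch standardizes a measurement and evaluates the standard normal p.d.f. and cumulative probability Φ(z) with `math.erf`. The helper names and the N(50 mph, 5 mph) speed distribution in the usage line are hypothetical.

```python
import math


def z_score(x, mu, sigma):
    """Standardize a measurement from N(mu, sigma) to the standard normal scale."""
    return (x - mu) / sigma


def std_normal_pdf(z):
    """f(z; 0, 1) = exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def std_normal_cdf(z):
    """Phi(z) = P(Z <= z), written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical: speeds ~ N(50 mph, 5 mph); share of vehicles at or below 60 mph
z = z_score(60, 50, 5)
print(z, round(std_normal_cdf(z), 4))  # 2.0 0.9772
```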
Poisson
Discrete distribution
Commonly referred to as ‘counting distribution’
Represents the count distribution of random
events
Poisson
For a sequence of events to be considered truly
random, two conditions must be met
Any point in time is as likely as any other for an
event to occur (e.g., vehicle arrival)
The occurrence of an event does not affect the
probability of the occurrence of another event (e.g.,
the arrival of one vehicle at a point in time does not
affect the arrival time of any other vehicle)
Poisson
p.m.f. for Poisson dist. is:
$$p(x) = \frac{(\lambda t)^x\, e^{-\lambda t}}{x!}$$
p(x) = probability of exactly x vehicles arriving in a time interval t
x = # of vehicles arriving in a specific time interval
λ = average rate of arrival (veh/unit time)
t = selected time interval (duration of each counting period (unit time))
Poisson
p.m.f. also commonly expressed as:
$$p(x) = \frac{m^x\, e^{-m}}{x!}$$
m = average number of occurrences during a specific time period t (i.e., m = λt)
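Both forms of the Poisson p.m.f. are the same function once m = λt. The sketch below shows this. The function names are hypothetical.

```python
import math


def poisson_pmf(x, m):
    """p(x) = m^x * e^(-m) / x!, with m = average number of arrivals per interval."""
    return m ** x * math.exp(-m) / math.factorial(x)


def poisson_pmf_rate(x, lam, t):
    """Same p.m.f. written with an arrival rate lam (veh/unit time) and interval length t."""
    return poisson_pmf(x, lam * t)


# The probabilities over all possible counts add up to 1 (truncated here at 59 arrivals)
print(round(sum(poisson_pmf(x, 3.0) for x in range(60)), 12))  # 1.0
```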
Poisson Example
A roadway has an average hourly volume of 360 vph. Assuming that vehicle arrivals are Poisson distributed, estimate the probabilities of having 0, 1, 2, 3, 4, and 5 or more vehicles arriving in each 20-second interval.
See board
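The example can be worked numerically as a sketch; the board solution is not reproduced here. With q = 360 veh/h and t = 20 s, m = λt = 360 × 20 / 3600 = 2 vehicles per interval. "5 or more" is the complement of 0 through 4 arrivals.

```python
import math

q = 360  # average hourly volume (veh/h)
t = 20   # counting interval (s)
m = q * t / 3600  # average arrivals per 20-s interval: 2.0

probs = {x: m ** x * math.exp(-m) / math.factorial(x) for x in range(5)}
p_5_or_more = 1 - sum(probs.values())  # complement of 0..4 arrivals

for x, p in probs.items():
    print(f"P({x}) = {p:.4f}")
print(f"P(>=5) = {p_5_or_more:.4f}")
# P(0) = 0.1353
# P(1) = 0.2707
# P(2) = 0.2707
# P(3) = 0.1804
# P(4) = 0.0902
# P(>=5) = 0.0527
```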
Negative Exponential
The assumption of Poisson distributed vehicle arrivals
also implies a distribution of the time intervals between
the arrivals of successive vehicles (i.e., time headway)
To demonstrate this, let the average arrival rate, λ, be in units of vehicles per second, so that, with q the hourly volume (veh/h),
$$\lambda = \frac{q}{3600}$$
Substituting into the Poisson equation yields
$$p(x) = \frac{(qt/3600)^x\, e^{-qt/3600}}{x!}$$
Negative Exponential
Note that the probability of having no vehicles arrive in
a time interval of length t (i.e., P(0)) is the equivalent of
the probability of a vehicle headway, h, being greater
than or equal to the time interval t.
$$P(0) = P(h \ge t) = \frac{(qt/3600)^0\, e^{-qt/3600}}{0!} = \frac{(1)\, e^{-qt/3600}}{1} = e^{-qt/3600}$$
This distribution of vehicle headways is known as the negative
exponential distribution
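The negative exponential result translates directly into code. In this minimal sketch, q is in veh/h and t in seconds, as in the derivation. The function names are hypothetical.

```python
import math


def prob_headway_at_least(t, q):
    """P(h >= t) = P(no arrivals in t seconds) = exp(-q * t / 3600), q in veh/h."""
    return math.exp(-q * t / 3600)


def prob_headway_less_than(t, q):
    """Complement: P(h < t) = 1 - exp(-q * t / 3600)."""
    return 1 - prob_headway_at_least(t, q)


print(round(prob_headway_at_least(10, 360), 4))  # 0.3679
```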
Negative Exponential Example
A roadway has an average hourly volume of 360 vph. Assume that vehicle arrivals are Poisson distributed. What is the probability that the gap between successive vehicles will be between 8 and 10 seconds?
See board
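Headways are continuous, so the probability of falling in an interval is a difference of two tail probabilities: P(8 ≤ h ≤ 10) = P(h ≥ 8) − P(h ≥ 10) = e^(−0.8) − e^(−1.0). A sketch of that arithmetic follows; the helper name `p_at_least` is hypothetical.

```python
import math

q = 360  # average hourly volume (veh/h)


def p_at_least(t):
    # P(h >= t) for Poisson arrivals at q veh/h (negative exponential headways)
    return math.exp(-q * t / 3600)


# P(8 <= h <= 10) = P(h >= 8) - P(h >= 10)
p = p_at_least(8) - p_at_least(10)
print(round(p, 4))  # 0.0814
```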
Expectation and Variance
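$$E[x] = \int_{-\infty}^{\infty} x\, f(x)\, dx$$
$$\mathrm{Var}[x] = E\left[(x - E[x])^2\right] \quad \text{or} \quad \mathrm{Var}[x] = E[x^2] - E[x]^2$$
Central Limit Theorem
$$\lim_{n \to \infty} f_{Z_n}(z) = N(0, 1) \quad \text{where} \quad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
Let S_n be the sum of n independent, identically distributed random variables with mean µ and finite standard deviation σ. Standardized as Z_n, this sum tends to the normal distribution as n grows, no matter what the initial, underlying distribution is.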
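A quick simulation illustrates the theorem. It sums n = 12 draws from a uniform distribution, which is clearly not normal, and standardizes each sum as Z_n. About 68.27% of the standardized values should then fall within ±1. This is a sketch; the uniform source distribution, the helper name, the seed and the trial count are arbitrary choices.

```python
import math
import random


def standardized_sums(n, trials, seed=1):
    """Simulate Z_n = (S_n - n*mu) / (sigma*sqrt(n)) for sums of n Uniform(0, 1) draws."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and std dev of Uniform(0, 1)
    zs = []
    for _ in range(trials):
        s_n = sum(rng.random() for _ in range(n))
        zs.append((s_n - n * mu) / (sigma * math.sqrt(n)))
    return zs


zs = standardized_sums(n=12, trials=20000)
mean_z = sum(zs) / len(zs)
share_within_1 = sum(abs(z) <= 1 for z in zs) / len(zs)
print(round(mean_z, 2), round(share_within_1, 3))  # close to 0 and to 0.683
```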
[Figure: probability mass function, Probability (0 to 0.2) vs. k = 0 to 14]
[Figure: frequency histogram by interval]
Step 4: Plot CDF
[Figure: cumulative percentage (0% to 100%) vs. Speed (20 to 80)]
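Plotting the cumulative distribution requires the cumulative percentage of observations at or below each class boundary. The sketch below computes those points. It assumes equal-width speed classes whose boundaries are multiples of `bin_width`. The function name and the speeds are hypothetical, not the data from the slides.

```python
import math


def cumulative_percent(speeds, bin_width=5.0):
    """Return (upper class boundary, cumulative % of observations <= boundary) pairs."""
    n = len(speeds)
    edge = math.floor(min(speeds) / bin_width) * bin_width  # lower boundary of the first class
    points = []
    while True:
        edge += bin_width
        count = sum(1 for v in speeds if v <= edge)
        points.append((edge, 100.0 * count / n))
        if count == n:  # stop once every observation is covered
            return points


# Hypothetical spot speeds (mph)
for edge, pct in cumulative_percent([42, 47, 48, 51, 53, 55, 58, 63]):
    print(edge, pct)
# 45.0 12.5
# 50.0 37.5
# 55.0 75.0
# 60.0 87.5
# 65.0 100.0
```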
Sample Size Example
Want to collect speed data from freeway segment
Previous studies determined s = 4 mph (use with
caution)
Want to estimate population mean (µ) within
± 1 mph at a 99% confidence level
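$$n = \left(\frac{z\,s}{\varepsilon}\right)^2 = \left(4 \cdot \frac{2.58}{1}\right)^2 = 106.5 \rightarrow 107 \text{ observations needed}$$
where z = 2.58 is the standard normal value for 99% confidence and ε = 1 mph is the allowable error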
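The sample-size formula fits in a few lines, shown here as a minimal sketch. The result is rounded up because a fractional observation still requires one more vehicle. The function name is hypothetical.

```python
import math


def required_sample_size(s, z, eps):
    """n = (z * s / eps)^2, rounded up to a whole number of observations.

    s   -- estimated standard deviation (mph)
    z   -- standard normal value for the confidence level (2.58 for 99%, 1.96 for 95%)
    eps -- allowable error in the estimate of the mean (mph)
    """
    return math.ceil((z * s / eps) ** 2)


print(required_sample_size(s=4, z=2.58, eps=1))  # 107
```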
Sample Size Example
Consider already collected speed data sample
Mean = 52.3 mph
Std. dev. = 6.3 mph
n = 200
Is the sample size adequate for a 99% confidence level and ε = 1 mph?
$$n = \left(6.3 \cdot \frac{2.58}{1}\right)^2 = 264.2 \rightarrow 265 > 200 \quad \therefore \text{not enough observations}$$
How about for a 95% confidence level?
$$n = \left(6.3 \cdot \frac{1.96}{1}\right)^2 = 152.5 \rightarrow 153 < 200 \quad \therefore \text{OK}$$
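For a sample that has already been collected, the same relation can be solved in two directions: for the required n, or for the tolerance ε = z·s/√n that the existing n actually achieves. The sketch below does both for n = 200 and s = 6.3 mph. The function name is hypothetical.

```python
import math


def check_sample(n_have, s, z, eps):
    """Compare a collected sample against the size needed for tolerance eps.

    Returns (required n, whether n_have is enough, tolerance actually achieved with n_have).
    """
    n_req = math.ceil((z * s / eps) ** 2)
    eps_achieved = z * s / math.sqrt(n_have)  # same formula solved for the tolerance
    return n_req, n_have >= n_req, eps_achieved


for label, z in (("99%", 2.58), ("95%", 1.96)):
    n_req, ok, eps = check_sample(n_have=200, s=6.3, z=z, eps=1)
    print(label, n_req, ok, round(eps, 2))
# 99% 265 False 1.15
# 95% 153 True 0.87
```

At 99% confidence, the 200 observations support a tolerance of about ±1.15 mph rather than ±1 mph.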
Hypothesis Testing
A theoretical proposition which can be tested
statistically
A statement about an event, the outcome of
which is unknown at the time of the prediction,
set forth in a way that it can be rejected
Possible Outcomes in the Testing of a Hypothesis
H0: Null hypothesis
H1: Alternative hypothesis
Only one of the two hypotheses is true, but we do not know which one

                      Test: H0 true      Test: H0 false
Reality: H0 true      OK                 Type I error
Reality: H0 false     Type II error      OK
Chi-squared Test
$$\chi^2 = \sum_{i=1}^{I} \frac{(f_0 - f_t)^2}{f_t}$$
χ² = chi-squared value
f0 = observed number or frequency of observations
in category i
ft = theoretical (or other observed) number or
frequency of expected observations in category i
i = category index
I = number of categories
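The statistic is a direct sum over the I categories, sketched below. The function name and the counts are hypothetical.

```python
def chi_squared(observed, expected):
    """chi^2 = sum over categories of (f0 - ft)^2 / ft."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have one entry per category")
    return sum((f0 - ft) ** 2 / ft for f0, ft in zip(observed, expected))


# Hypothetical counts in I = 3 categories
print(chi_squared([10, 20, 30], [20, 20, 20]))  # 10.0
```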
Chi-squared Test (cont.)
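Determine the reference (critical) chi-squared value from a chi-squared table for the chosen significance level and degrees of freedom
Compare the calculated chi-squared value to the reference value
If the computed value < reference value, do not reject the hypothesis that the empirical data fit the theoretical distribution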
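The decision rule is sketched below. It uses the common textbook rule for degrees of freedom: the number of categories minus 1, minus the number of distribution parameters estimated from the data. The reference value must still come from a chi-squared table. The counts and the computed value 5.2 are hypothetical. 7.815 is the tabulated value for 3 degrees of freedom at the 0.05 significance level.

```python
def degrees_of_freedom(num_categories, num_estimated_params):
    """Common textbook rule: I - 1 - (number of distribution parameters estimated from the data)."""
    return num_categories - 1 - num_estimated_params


def chi_squared_decision(chi2_computed, chi2_reference):
    """Do not reject H0 when the computed value is below the reference (critical) value."""
    return "do not reject H0" if chi2_computed < chi2_reference else "reject H0"


# Hypothetical: 6 speed classes, normal fitted with estimated mean and std dev (2 parameters)
dof = degrees_of_freedom(6, 2)
chi2_ref = 7.815  # chi-squared table value for 3 d.f. at the 0.05 level
print(dof, chi_squared_decision(5.2, chi2_ref))  # 3 do not reject H0
```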
Chi-Square Distribution
Example
Consider the spot speed data shown before
The computed mean is 48 mph and the computed standard deviation is 8.6 mph.
Consider the following hypothesis:
H0: The underlying distribution is normal with µ=48
mph and σ=8.6 mph.
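Testing H0 requires the theoretical frequencies f_t: the number of observations each speed class would receive if speeds were exactly N(48, 8.6). This minimal sketch computes them from the normal cumulative probability. The class boundaries and the sample size n = 100 are hypothetical placeholders, not the slide data.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def expected_frequencies(edges, n, mu, sigma):
    """Expected counts f_t for classes (-inf, e1], (e1, e2], ..., (ek, +inf) under H0.

    The two outer classes are open-ended so the expected counts add up to n.
    """
    cdf = [0.0] + [normal_cdf(e, mu, sigma) for e in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical class boundaries (mph) and sample size; mu and sigma from H0
ft = expected_frequencies([40, 45, 50, 55], n=100, mu=48, sigma=8.6)
print([round(f, 1) for f in ft], round(sum(ft), 6))
```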