Statistics For Traffic Engineers

This document discusses the role of statistics in traffic engineering decision making. It explains that statistical analysis is used to estimate parameters from sample data and make inferences about populations. It also discusses important statistical concepts like distributions, sampling, common measures of central tendency and dispersion, probability mass functions, probability density functions, and cumulative distribution functions. Finally, it outlines several common traffic-related distributions like uniform, normal, Poisson, and negative exponential.


Fundamentals of Traffic Operations and Control
Topic: Statistics for Traffic Engineers

Nikolas Geroliminis
Ecole Polytechnique Fédérale de Lausanne

Role of Statistical Inference in Decision-Making Process
Flow of the process:
Real World → Data Collection → Estimation of Parameters, Choice of Distribution → Calculation of Probabilities (using the prescribed distributions and estimated parameters) → Information for Decision-Making and Design
Statistical inference (sample vs. population): information obtained from the sampled data is used to make generalizations about the populations from which the samples were obtained.
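Role of Sampling in Statistical Inferences
The random variable ranges over $-\infty < x < +\infty$. The sample statistics estimate the population parameters:

$$\mu \approx \bar{x}, \qquad \bar{x} = \frac{1}{n}\sum x_i$$

$$\sigma^2 \approx s^2, \qquad s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$$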
Statistical Analysis
„ Used to address the following questions:
1. How many samples are required?
2. What confidence should I have in this estimate?
3. What statistical distribution best describes the
observed data mathematically?
4. Has a traffic engineering design resulted in a
change in the characteristics of the population?
Distributions
„ What is meant by distributional form?
„ It is the frequency of specific values occurring within
the measured data set
„ Considering a traffic stream along a signalized
arterial
„ What operational considerations are there for the
signal if:
„ traffic volume is constant per unit time (i.e., uniform) vs.
randomly varying (some other distribution)?
„ What design considerations are there for turn bays?
Describing a Distribution
„ Two types of statistical parameters that
describe a distribution
„ Central tendency
„ Dispersion
Common Statistical Measures
„ Measures of central tendency
„ Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
„ Sample Median
„ $\tilde{x}$ = middle value if odd number of observations
„ $\tilde{x}$ = average of two middle values if even number of observations
„ Mode
„ Most frequent observation
Common Statistical Measures
„ Measures of dispersion (or variability)
„ Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$
„ Sample Standard Deviation
$$s = \sqrt{s^2}$$
„ Sample Coefficient of Variation
$$\mathrm{cov} = \frac{s}{\bar{x}}$$
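The measures above can be checked numerically. The following is a minimal sketch, assuming the first seven spot speeds from the speed data used later in these notes as the sample; the variable names are illustrative. It computes the variance with both the definitional and the computational form to show that they agree.

```python
import statistics

# First row of the spot-speed sample that appears later in these notes (mph).
speeds = [55, 42, 53, 67, 58, 65, 63]

n = len(speeds)
mean = sum(speeds) / n                  # sample mean
median = statistics.median(speeds)      # middle value (n is odd here)

# Definitional form: squared deviations from the mean, divided by n - 1.
var_def = sum((x - mean) ** 2 for x in speeds) / (n - 1)
# Computational form: sum of squares minus (sum)^2 / n, divided by n - 1.
var_comp = (sum(x * x for x in speeds) - sum(speeds) ** 2 / n) / (n - 1)

std = var_def ** 0.5                    # sample standard deviation
cov = std / mean                        # sample coefficient of variation

print(f"mean={mean:.2f} median={median} s2={var_def:.2f} s={std:.2f} cov={cov:.3f}")
```

The computational form avoids a second pass over the data, which is why it appears alongside the definitional form.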
Distribution Terms
The mechanism for assigning probabilities to
events defined by random variables is to use
either a mass function (for discrete variables) or
a density function (for continuous variables)

„ Probability mass function (p.m.f.)


„ Probability density function (p.d.f.)
„ Cumulative distribution function (c.d.f.)
p.m.f.
„ For discrete data
„ Name refers to point masses
„ Probability mass is distributed in discrete points
along measurement axis.
p.d.f.
„ For continuous data
„ Two conditions must be met
„ $f(x) \ge 0$ for all $x$
„ $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (area under entire graph)
„ Thus, probability of value being between a and b is the area under the curve between those two points: $P(a \le X \le b) = \int_a^b f(x)\,dx$
p.d.f.
„ Name implies that probability density is
“smeared” in a continuous fashion along entire
interval of possible values.
„ Contrary to p.m.f., specific values along
measurement axis of continuous distribution
have probability of zero
c.d.f.
„ Cumulative probability that X takes a value no greater than x: $F(x) = P(X \le x)$
„ For p.m.f., c.d.f. is obtained by summing the
p.m.f. p(x) over all possible values x satisfying
X≤x
„ For p.d.f., c.d.f. is obtained by integrating f(x)
between the limits -∞ and x
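As an illustration of the discrete case, the sketch below builds the c.d.f. of a fair six-sided die by taking a running sum of its p.m.f. The die is an assumed example, chosen because it is easy to verify by hand.

```python
from fractions import Fraction
from itertools import accumulate

# p.m.f. of a fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# c.d.f.: running sum of the p.m.f., F(x) = P(X <= x).
cdf = list(accumulate(pmf))

for x, F in zip(faces, cdf):
    print(f"P(X <= {x}) = {F}")
```

Exact fractions are used so that the final cumulative value is exactly 1 rather than a rounded float.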
Common Traffic Distributions
„ Uniform
„ Normal
„ Poisson
„ Negative Exponential
Uniform
„ Examples (discrete):
„ Tossing a coin
„ Rolling a six-sided die
„ Examples (continuous):
„ D/D/1 queuing (deterministic arrivals and departures with one
departure channel)
„ Suppose I take a bus to work, and that a bus arrives at my stop every five minutes. Because the time I leave my house varies, I don’t always arrive at the bus stop at the same time, so my waiting time, X, for the next bus is a continuous random variable.
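Uniform Distribution
$$f(x; A, B) = \begin{cases} \dfrac{1}{B-A} & A \le x \le B \\ 0 & \text{otherwise} \end{cases}$$

For the bus example, the set of possible values of X is the interval [0, 5]. A possible probability density function for X is:

$$f(x) = \begin{cases} \dfrac{1}{5} & 0 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$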
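A short sketch of the bus example, assuming the waiting time is uniform on [0, 5] minutes as stated above. The function names are illustrative, and the 1-to-3-minute window is an arbitrary choice for the demonstration.

```python
def uniform_pdf(x, a, b):
    """Density of a continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi): area under the flat density, clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

A, B = 0.0, 5.0  # waiting time in minutes, from the bus example

print(uniform_pdf(2.0, A, B))        # density inside the interval
print(uniform_prob(1.0, 3.0, A, B))  # chance of waiting between 1 and 3 minutes
print((A + B) / 2)                   # mean waiting time
```

Because the density is flat, every probability reduces to the length of the interval divided by B − A.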
Normal
„ Normal distribution function is continuous
„ p.d.f. is:
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
„ $\mu$ = mean, $\sigma$ = standard deviation (for population, true)
„ $\bar{x}$ = mean, $s$ = standard deviation (for sample, estimated)
Normal
„ What does it mean, conceptually?
„ Distribution is centered about its mean
„ Spread is function of standard deviation
„ Mean, median, and mode are numerically equal
„ 68.27% of observations will be within 1 std. dev.,
95.45% within 2 std. dev., 99.73% within 3 std. dev.
„ Values of -∞ to ∞ are theoretically possible, but generally there are practical limits (about -4 to 4 standard deviations from the mean)
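The coverage percentages quoted above can be reproduced from the error function. This is a small sketch using only the Python standard library; `normal_within` is an illustrative name.

```python
import math

def normal_within(k):
    """P(|X - mu| <= k*sigma) for a normal variable, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} std. dev.: {100 * normal_within(k):.2f}%")
```

The result does not depend on the particular mean or standard deviation, which is why the same percentages apply to every normal distribution.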
Standard Normal
„ p.d.f. for standard normal dist. is:
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
„ To get a standard normal random variable for a measurement from a nonstandard normal dist., use:
$$z = \frac{x-\mu}{\sigma}$$
Standard Normal Distribution
Poisson

„ Discrete distribution
„ Commonly referred to as ‘counting distribution’
„ Represents the count distribution of random
events
Poisson
„ For a sequence of events to be considered truly
random, two conditions must be met
„ Any point in time is as likely as any other for an
event to occur (e.g., vehicle arrival)
„ The occurrence of an event does not affect the
probability of the occurrence of another event (e.g.,
the arrival of one vehicle at a point in time does not
affect the arrival time of any other vehicle)
Poisson
p.m.f. for Poisson dist. is:
$$p(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}$$
„ p(x) = probability of exactly x vehicles arriving in a time
interval t
„ x = # of vehicles arriving in a specific time interval
„ λ = average rate of arrival (veh/unit time)
„ t = selected time interval (duration of each counting period
(unit time))
Poisson
p.m.f. also commonly expressed as:
$$p(x) = \frac{e^{-m} m^x}{x!}$$
„ m = average number of occurrences during a specific time
period t (i.e., m = λt)
Poisson Example
„ A roadway has an average hourly volume of 360 vph. Assuming that vehicle arrivals are Poisson distributed, estimate the probabilities of 0, 1, 2, 3, 4, and 5 or more vehicles arriving in a 20-second interval.
„ See board
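One way to work this example is sketched below, using the Poisson p.m.f. with m = λt. Only the volume (360 vph) and the interval (20 s) come from the problem statement; the function and variable names are illustrative.

```python
import math

def poisson_pmf(x, m):
    """P(exactly x arrivals) when the mean number of arrivals is m."""
    return math.exp(-m) * m ** x / math.factorial(x)

q = 360          # hourly volume, veh/h
t = 20           # counting interval, s
lam = q / 3600   # arrival rate, veh/s
m = lam * t      # mean arrivals per interval

probs = [poisson_pmf(x, m) for x in range(5)]
p_5_or_more = 1 - sum(probs)   # complement of P(0) through P(4)

for x, p in enumerate(probs):
    print(f"P({x}) = {p:.4f}")
print(f"P(5 or more) = {p_5_or_more:.4f}")
```

With these inputs m = 2 vehicles per interval, and the "5 or more" case is obtained as a complement because the Poisson distribution has no upper limit on x.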
Negative Exponential
„ The assumption of Poisson distributed vehicle arrivals
also implies a distribution of the time intervals between
the arrivals of successive vehicles (i.e., time headway)
„ To demonstrate this, let the average arrival rate, λ, be in units of vehicles per second, so that
$$\lambda = \frac{q}{3600}$$
where q is the flow in vehicles per hour.
„ Substituting into Poisson equation yields
$$p(x) = \frac{(qt/3600)^x\, e^{-qt/3600}}{x!}$$
Negative Exponential
„ Note that the probability of having no vehicles arrive in
a time interval of length t (i.e., P(0)) is the equivalent of
the probability of a vehicle headway, h, being greater
than or equal to the time interval t.

$$P(0) = P(h \ge t) = \frac{(1)\, e^{-qt/3600}}{1} = e^{-qt/3600}$$
This distribution of vehicle headways is known as the negative
exponential distribution
Negative Exponential Example
A roadway has an average hourly volume of 360 vph. Assume that the arrival of vehicles is Poisson distributed. What is the probability that the gap between successive vehicles will be between 8 and 10 seconds?

See board
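A possible solution sketch: since P(h ≥ t) = e^(−qt/3600), the probability of a headway between 8 and 10 seconds is P(h ≥ 8) − P(h ≥ 10). The function name is illustrative.

```python
import math

def headway_at_least(t, q):
    """P(h >= t) for flow q in veh/h and t in seconds (negative exponential)."""
    return math.exp(-q * t / 3600)

q = 360  # veh/h
p_gap = headway_at_least(8, q) - headway_at_least(10, q)
print(f"P(8 <= h <= 10) = {p_gap:.4f}")
```

Under these assumptions roughly 8% of headways fall in the 8 to 10 second range.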
Expectation and Variance

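„ Expectation (Mean)
$$\bar{x} = E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$$
„ Variance
$$\sigma_x^2 = E[(x - \bar{x})^2] = \int_{-\infty}^{\infty} (x - E[x])^2 f(x)\,dx = E[x^2] - E^2[x]$$

| Distribution | pdf | mean | variance |
|---|---|---|---|
| Bernoulli | $P_0 = 1 - p,\ P_1 = p$ | $p$ | $p(1-p)$ |
| Binomial | $\frac{n!}{(n-k)!\,k!}\, p^k q^{n-k}$ | $np$ | $npq$ |
| Poisson | $\frac{\alpha^k}{k!}\, e^{-\alpha}$ | $\alpha$ | $\alpha$ |
| Uniform | $\frac{1}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Exponential | $\lambda e^{-\lambda x}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Normal | $\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$ | $m$ | $\sigma^2$ |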
Sum of Random Variables and Central Limit Theorem
Let $S_n = x_1 + x_2 + \cdots + x_n$, where $x_1, x_2, \ldots, x_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Then

$$\lim_{n \to \infty} f_{S_n}(s) = N(n\mu,\ n\sigma^2)$$

or

$$\lim_{n \to \infty} f_{Z_n}(z) = N(0, 1) \quad \text{where} \quad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$

The sum of n independent, identically distributed random variables tends to the normal distribution, no matter what the initial, underlying distribution is.

See board for an illustration
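The theorem can also be illustrated by simulation. The sketch below is one possible illustration, not the one given on the board: it sums n Uniform(0, 1) variables, standardizes each sum, and checks that the results behave like a standard normal. The values of n, the number of trials, and the seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)        # fixed seed so the run is repeatable
n, trials = 30, 5000          # terms per sum, number of sums (arbitrary choices)
mu, var = 0.5, 1 / 12         # mean and variance of Uniform(0, 1)

# Z_n = (S_n - n*mu) / (sigma * sqrt(n)) for each simulated sum S_n.
z = [
    (sum(rng.random() for _ in range(n)) - n * mu) / (n * var) ** 0.5
    for _ in range(trials)
]

within_1 = sum(abs(v) <= 1 for v in z) / trials
print(f"mean={statistics.mean(z):.3f} "
      f"std={statistics.stdev(z):.3f} "
      f"within 1 sd={within_1:.3f}")
```

Although each term is uniform rather than normal, the standardized sums should show a mean near 0, a standard deviation near 1, and about 68% of values within one standard deviation.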


Approximating a Normal Distribution
[Figure 11. Binomial probability distribution with parameters n = 100 and p = 0.07 (shaded) and normal approximation to it (unshaded). Vertical axis: probability, 0 to 0.2; horizontal axis: k = 0 to 14.]
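The comparison in the figure can be tabulated directly. This sketch assumes the normal approximation uses the binomial mean np and variance npq from the table of distributions; the function names are illustrative.

```python
import math

n, p = 100, 0.07
mu = n * p                            # binomial mean, np
sigma = math.sqrt(n * p * (1 - p))    # binomial std. dev., sqrt(npq)

def binom_pmf(k):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x):
    """Normal density with the same mean and standard deviation."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

for k in range(15):
    print(f"k={k:2d}  binomial={binom_pmf(k):.4f}  normal={normal_pdf(k):.4f}")
```

The two columns are closest near the mean (k = 7) and differ most in the tails, where the binomial distribution is skewed because p is small.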
Sample Size
„ How many observations do we need?
„ It depends…on several things (e.g., confidence bounds,
standard deviation of the underlying distribution, and
tolerance)
„ Although larger samples are likely to lead to better
estimates of distribution parameters…
„ Data collection is expensive
„ Usually only able to measure fraction of possible values in the
population
„ Therefore, we would like to collect only as much data as will give us our required level of statistical confidence
Sample Sizes
$$n = \left(\frac{s\, z_{\alpha/2}}{\varepsilon}\right)^2$$
n = minimum number of measured speeds
s = estimated sample standard deviation, mph
zα/2 = constant corresponding to the desired confidence level
ε = permitted error in the average speed estimate, mph
Normal
„ Speed data
55 42 53 67 58 65 63
31 51 66 54 49 55 44
49 47 69 76 20 46 62
30 69 56 45 25 64 54
74 44 35 83 64 78 65
45 33 75 48 56 50 66
72 49 63 58 70 37 55
68 29 38 34 47 39 53
64 41 59 89 42 44 51
79 38 54 54 77 58 61
Step 1: Sort Data
„ Rank all data in ascending order:
1 - 20
2 - 25
3 - 29
4 - 30
5 - 31
6 - 33
„ and so on...
Step 2: Group Data
Suggestion:
20-29 interval 1: 3
30-39 interval 2: 9
40-49 interval 3: 15
50-59 interval 4: 18
60-69 interval 5: 15
70-79 interval 6: 8
80-89 interval 7: 2
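Steps 1 and 2 can be reproduced with a few lines of code. The sketch below uses the 70 spot speeds listed above and 10 mph intervals; the variable names are illustrative.

```python
from collections import Counter

speeds = [
    55, 42, 53, 67, 58, 65, 63,  31, 51, 66, 54, 49, 55, 44,
    49, 47, 69, 76, 20, 46, 62,  30, 69, 56, 45, 25, 64, 54,
    74, 44, 35, 83, 64, 78, 65,  45, 33, 75, 48, 56, 50, 66,
    72, 49, 63, 58, 70, 37, 55,  68, 29, 38, 34, 47, 39, 53,
    64, 41, 59, 89, 42, 44, 51,  79, 38, 54, 54, 77, 58, 61,
]

ranked = sorted(speeds)                        # Step 1: ascending order
bins = Counter(v // 10 * 10 for v in ranked)   # Step 2: key = lower bound of 10 mph bin

for i, low in enumerate(sorted(bins), start=1):
    print(f"{low}-{low + 9}  interval {i}: {bins[low]}")
```

Integer division by the interval width assigns each speed to its bin without an explicit list of boundaries.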
Step 3: Plot Histogram
[Histogram of the grouped data: frequency (0 to 20) versus interval (1 to 7).]
Step 4: Plot CDF
[Plot of the cumulative distribution: cumulative percentage (0% to 100%) versus speed (20 to 80).]
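The points of the cumulative curve follow from the interval counts in Step 2. A minimal sketch, assuming each cumulative percentage is plotted at the upper bound of its interval:

```python
from itertools import accumulate

# Interval counts from Step 2 (20-29, 30-39, ..., 80-89 mph).
upper_bounds = [29, 39, 49, 59, 69, 79, 89]
counts = [3, 9, 15, 18, 15, 8, 2]

total = sum(counts)
cum_pct = [100 * c / total for c in accumulate(counts)]  # running total as a percentage

for ub, pct in zip(upper_bounds, cum_pct):
    print(f"speed <= {ub} mph: {pct:5.1f}%")
```

This is the empirical counterpart of summing a p.m.f. to obtain a c.d.f.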
Sample Size Example
„ Want to collect speed data from freeway segment
„ Previous studies determined s = 4 mph (use with
caution)
„ Want to estimate population mean (µ) within
± 1 mph at a 99% confidence level

$$n = \left(4 \cdot \frac{2.58}{1}\right)^2 = 106.5 \rightarrow 107 \text{ observations needed}$$
Sample Size Example
„ Consider already collected speed data sample
„ Mean = 52.3 mph
„ Std. dev. = 6.3 mph
„ n = 200
„ Want to calculate if we have an adequate sample size for a 99% confidence level and ε = 1 mph
$$n = \left(6.3 \cdot \frac{2.58}{1}\right)^2 \approx 264 > 200 \quad \therefore \text{not enough observations}$$
„ How about for 95% confidence level?
$$n = \left(6.3 \cdot \frac{1.96}{1}\right)^2 \approx 152 < 200 \quad \therefore \text{OK}$$
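Both sample size examples can be checked with a small helper. This is a sketch with an illustrative function name; it rounds up to the next whole observation, so the second and third cases print 265 and 153 where the slides quote the unrounded 264 and 152. The conclusions are unchanged.

```python
import math

def required_sample_size(s, z, eps):
    """Minimum n from n = (s * z / eps)^2, rounded up to a whole observation."""
    return math.ceil((s * z / eps) ** 2)

print(required_sample_size(4.0, 2.58, 1.0))   # s = 4 mph, 99% confidence
print(required_sample_size(6.3, 2.58, 1.0))   # s = 6.3 mph, 99% confidence
print(required_sample_size(6.3, 1.96, 1.0))   # s = 6.3 mph, 95% confidence
```

Because n grows with the square of s/ε, halving the permitted error quadruples the required number of observations.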
Hypothesis Testing
„ A theoretical proposition which can be tested
statistically
„ A statement about an event, the outcome of
which is unknown at the time of the prediction,
set forth in a way that it can be rejected
Possible Outcomes in the Testing of
a Hypothesis
H0: Null hypothesis
H1: Alternative hypothesis

Only one of the two hypotheses is true, but we don’t know which one.

| Reality \ Test result | H0 judged true | H0 judged false |
|---|---|---|
| H0 true | OK | Type I error |
| H0 false | Type II error | OK |

Type I error: Reject a correct null hypothesis (false positive)
Type II error: Fail to reject a false null hypothesis (false negative)
Hypothesis Testing Steps
„ Formulate a hypothesis (H0)
„ Design a test procedure by which a decision can be
made
„ Use statistics to refine the test procedure, recognizing
the tradeoff of Type I error versus Type II error
„ Apply the test
„ Make a decision
Examples
„ Before and after study
„ Speed reduction of 5mph (it happened, it didn’t)
„ Accident reduction of 10% (it happened, it didn’t)
„ Compare two distributions (i.e., do two data samples come from the same distribution?)
„ Whether observed pattern of data fits a particular
distribution (Chi-Square Test)
„ Significance of coefficients in a regression model (t
Test)
„ Etc.
Example
„ Spot speeds observed over a year on a freeway were found to be normally distributed with a mean of 47.25 mph and a standard deviation of 8.61 mph. However, some new equipment has indicated that the mean speed is 48.63 mph. Is there any evidence that (a) the new equipment is faulty and (b) the new equipment is indicating a speed that is lower than the actual speed?
Test for Significant Difference
„ Are two samples of data from the same
distribution?
„ How much difference is a significant difference?
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where all variables are as defined before, with subscripts 1 and 2 referring to samples 1 and 2, respectively.
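A minimal sketch of the test statistic follows. The before and after sample values are hypothetical and are not taken from these notes; the 1.96 threshold assumes a two-sided test at the 95% confidence level.

```python
import math

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z statistic for the difference between two sample means."""
    return (mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical before/after speed samples (mph); not taken from the notes.
z = two_sample_z(50.0, 8.0, 64, 47.0, 8.0, 64)

print(f"z = {z:.2f}")
print("significant at 95%" if abs(z) > 1.96 else "not significant at 95%")
```

Larger samples shrink the denominator, so the same difference in means becomes easier to detect as n1 and n2 grow.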
Distribution Fitting
„ How do we determine distributional form?
„ How confident can I be that the sample
distribution represents the population dist.?
Distribution Fitting
„ Plot the data
„ Use a histogram: a graphical representation of a
frequency distribution
„ Examine Plot
„ Can overlay with theoretical distributions for
comparison
Histogram w/theoretical normal
curve overlay
Goodness-of-Fit
„ If distributions look like a match, proceed to
statistical test
„ Statistical Testing
„ Different tests have been devised to compare fit of
empirical data with theoretical distribution
„ One of the most common tests is:
„ Chi-squared ($\chi^2$)
Chi-squared Test
„ How does Chi-squared test work?
„ Define categories (or ranges) and assign data to the
categories
„ There should be at least 5 categories and 5 data entries
per category
„ Compute the expected number of samples for each
category based upon the theorized distribution
„ Compute difference between actual
observations/class and theoretical distribution
observations/class
„ Compute Chi-squared value (see next page)
Chi-squared Statistic

$$\chi^2 = \sum_{i=1}^{I} \frac{(f_0 - f_t)^2}{f_t}$$

„ χ2 = chi-squared value
„ f0 = observed number or frequency of observations
in category i
„ ft = theoretical (or other observed) number or
frequency of expected observations in category i
„ i = category index
„ I = number of categories
Chi-squared Test (cont.)
„ Determine reference Chi-squared value
„ Compare calculated Chi-squared value to
reference value
„ If computed value < reference value, do not reject hypothesis that the empirical data fit the theoretical distribution
Chi-Square Distribution
Example
„ Consider the spot speed data shown before. The computed mean was 48 mph and the computed standard deviation was 8.6 mph.
„ Consider the following hypothesis:
„ H0: The underlying distribution is normal with µ = 48 mph and σ = 8.6 mph.
„ N = 7 categories, f = N − 1 − g = 7 − 1 − 2 = 4 degrees of freedom (g = 2 parameters estimated from the data, µ and σ), α = 0.05, reference Chi-squared value = 9.488
„ Computed Chi-squared value = 1.0209 < 9.488 => cannot reject H0
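The procedure can be sketched in code. The data behind the 1.0209 value are not reproduced in these notes, so this illustration instead applies the test to the 70 spot speeds and seven intervals from the grouping steps earlier, with µ and σ estimated from that sample. Treating the first and last categories as open-ended is an assumption, and those two categories hold fewer than 5 observations, so a careful analysis would merge them with their neighbours.

```python
from statistics import NormalDist, mean, stdev

speeds = [
    55, 42, 53, 67, 58, 65, 63,  31, 51, 66, 54, 49, 55, 44,
    49, 47, 69, 76, 20, 46, 62,  30, 69, 56, 45, 25, 64, 54,
    74, 44, 35, 83, 64, 78, 65,  45, 33, 75, 48, 56, 50, 66,
    72, 49, 63, 58, 70, 37, 55,  68, 29, 38, 34, 47, 39, 53,
    64, 41, 59, 89, 42, 44, 51,  79, 38, 54, 54, 77, 58, 61,
]

mu, sigma = mean(speeds), stdev(speeds)   # two parameters estimated from the data
dist = NormalDist(mu, sigma)

edges = [30, 40, 50, 60, 70, 80]          # boundaries between the 7 categories
observed = [3, 9, 15, 18, 15, 8, 2]       # counts from Step 2

# Expected counts: normal probability of each category times the sample size.
# The first and last categories are open-ended so the probabilities sum to 1.
cum = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
expected = [len(speeds) * (hi - lo) for lo, hi in zip(cum, cum[1:])]

chi2 = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))
dof = len(observed) - 1 - 2               # f = N - 1 - g, with g = 2
critical = 9.488                          # reference value for 4 d.o.f., alpha = 0.05

print(f"chi-squared = {chi2:.3f}, d.o.f. = {dof}")
print("cannot reject H0" if chi2 < critical else "reject H0")
```

The degrees of freedom and reference value match the example above because the number of categories and estimated parameters are the same.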
