CS30 5 System Modeling and Simulation Prof. Dr. Khaled Mahar
Lecture 8
Chapter 9
Input Modeling
Data Collection
Histograms [Identifying the distribution]
Histograms (cont.)
The number of class intervals depends on:
  The number of observations
  The dispersion of the data
Suggested: number of class intervals ≈ the square root of the sample size
For continuous data, the histogram corresponds to the probability density function of a theoretical distribution
For discrete data, the histogram corresponds to the probability mass function
If few data points are available, combine adjacent cells to eliminate the ragged appearance of the histogram
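The square-root rule can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lecture: the function name `histogram_counts` and the sample values are assumptions made for the example, and equal-width class intervals are assumed.

```python
import math

def histogram_counts(data, k=None):
    """Count observations in k equal-width class intervals.

    If k is not given, use the suggested rule k ~ sqrt(sample size).
    """
    n = len(data)
    if k is None:
        k = max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0  # guard against all-equal data
    counts = [0] * k
    for x in data:
        # The maximum value is placed in the last interval.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical sample of 16 observations, so k = sqrt(16) = 4 intervals.
sample = [1, 2, 2, 3, 5, 5, 6, 7, 8, 8, 9, 11, 12, 13, 15, 17]
edges, counts = histogram_counts(sample)
print(edges)
print(counts)
```

Passing a smaller `k` explicitly is one way to try the "combine adjacent cells" advice when the histogram looks ragged.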
Sample Histograms
[Figure: sample histogram; horizontal axis 0 to 24 in steps of 2, vertical axis 0 to 6.]
Sample Histograms (cont.)
[Figure: sample histogram with class intervals 0~7, 8~15, 16~24; vertical axis 0 to 25.]
Sample Histograms (cont.)
[Figure: sample histogram with class intervals 0~2, 3~5, 6~8, 9~11, 12~14, 15~17, 18~20, 21~24; vertical axis 0 to 12.]
Discrete Data Example
Discrete Data Example (cont.)
Histogram of number of arrivals per period
[Figure: histogram; horizontal axis: number of arrivals per period (0 to 11), vertical axis 0 to 20.]
Since the data are discrete and ample, the histogram may have a cell for each possible value in the data range.
Continuous Data Example
79.919 3.081 0.062 1.961 5.845 3.027 6.505 0.021 0.012 0.123
6.769 59.899 1.192 34.760 5.009 18.387 0.141 43.565 24.420 0.433
144.695 2.663 17.967 0.091 9.003 0.941 0.878 3.371 2.157 7.579
0.624 5.380 3.148 7.078 23.960 0.590 1.928 0.300 0.002 0.543
7.004 31.764 1.005 1.147 0.219 3.217 14.382 1.008 2.336 4.562
Continuous Data Example (cont.)
Continuous Data Example (cont.)
[Figure: histogram of the continuous data, with class intervals of width 3 starting at 0 (horizontal axis 0, 3, 6, ..., 36, ...).]
Selecting the Family of Distributions
[Identifying the distribution]
A family of distributions is selected based on:
  The context of the input variable
  The physical characteristics of the input process:
    Is it naturally discrete or continuous valued?
    Are the observable values inherently bounded, or is there no natural bound?
  The shape of the histogram
There is no “true” distribution for any stochastic input process
Goal: obtain a good approximation
Selecting the Family of Distributions
[Identifying the distribution]
Page 364: Use the physical basis of the distribution as a guide, for example:
  Binomial: number of successes in n trials
  Poisson: number of independent events that occur in a fixed amount of time or space
  Normal: distribution of a process that is the sum of a number of component processes (e.g., the time to assemble a product is the sum of the times required for each assembly operation)
  Exponential: time between independent events, or a process time that is memoryless
  Weibull: time to failure for components
  Discrete or continuous uniform: models complete uncertainty
  Triangular: a process for which only the minimum, most likely, and maximum values are known
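As a rough illustration of matching a physical description to a distribution family, the sketch below draws one sample from several of these families with Python's standard `random` module. All parameter values (rates, bounds, shapes, number of operations) are hypothetical and chosen only for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Exponential: time between independent events (assumed rate: 2 per minute).
interarrival = random.expovariate(2.0)

# Weibull: time to failure (assumed scale 100 hours, assumed shape 1.5).
time_to_failure = random.weibullvariate(100.0, 1.5)

# Triangular: only minimum, most likely, and maximum are known
# (assumed 2, 5, and 10). Argument order is low, high, mode.
task_time = random.triangular(2.0, 10.0, 5.0)

# Continuous uniform: complete uncertainty over an assumed range [0, 1].
u = random.uniform(0.0, 1.0)

# Normal: an assembly time modeled as the sum of 12 operation times, each
# assumed uniform on [1, 3]; such a sum is approximately normal.
assembly_time = sum(random.uniform(1.0, 3.0) for _ in range(12))

print(interarrival, time_to_failure, task_time, u, assembly_time)
```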
Parameter Estimation [Identifying the distribution]
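Sample mean and sample variance:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$
If the data are discrete and have been grouped in a frequency distribution:
$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$
where k is the number of distinct values of X and f_j is the observed frequency of the value X_j.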
Parameter Estimation [Identifying the distribution]
$\bar{X} = \frac{364}{100} = 3.64$
$S^2 = \frac{2080 - 100(3.64)^2}{99} = 7.63$
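The arithmetic can be checked with a short Python sketch of the grouped-data formulas. The frequency table itself is not reproduced on this slide, so the values and frequencies below are an assumption, chosen to be consistent with the totals used in the example (n = 100, sum of f_j X_j = 364, sum of f_j X_j^2 = 2080).

```python
# Distinct values X_j and their frequencies f_j (assumed; see the note above).
values = list(range(12))  # 0, 1, ..., 11 arrivals per period
freqs = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, values))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, values))

# Grouped-data sample mean and sample variance.
mean = sum_fx / n
variance = (sum_fx2 - n * mean**2) / (n - 1)

print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(variance, 2))
```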
Maximum Likelihood Method [Identifying the distribution]
Goodness-of-Fit Tests [Identifying the distribution]
Chi-Square test [Goodness-of-Fit Tests]
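For a continuous distribution, the probability of the ith class interval is
$p_i = \int_{a_{i-1}}^{a_i} f(x)\,dx = F(a_i) - F(a_{i-1})$
where a_{i-1} and a_i are the endpoints of the ith class interval, f(x) is the assumed pdf, and F(x) is the assumed cdf.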
Recommended number of class intervals (k):
  Sample Size, n     Number of Class Intervals, k
  20                 Do not use the chi-square test
  50                 5 to 10
  100                10 to 20
  > 100              $\sqrt{n}$ to n/5
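Vehicle Arrival Example:
H0: the random variable is Poisson distributed.
H1: the random variable is not Poisson distributed.
Expected frequencies: $E_i = n\,p(x_i) = n\,\frac{e^{-\alpha}\alpha^{x_i}}{x_i!}$, with n = 100 and α estimated by the sample mean, 3.64.

  x_i     Observed Frequency, O_i   Expected Frequency, E_i   (O_i - E_i)^2 / E_i
  0       12                          2.6                     7.87  (cells 0 and 1 combined)
  1       10                          9.6
  2       19                         17.4                     0.15
  3       17                         21.1                     0.80
  4       10                         19.2                     4.41
  5        8                         14.0                     2.57
  6        7                          8.5                     0.26
  7        5                          4.4                     11.62 (cells 7 to ≥ 11 combined)
  8        5                          2.0
  9        3                          0.8
  10       3                          0.3
  ≥ 11     1                          0.1
  Total  100                        100.0                     27.68

Cells are combined because of the minimum expected-frequency requirement on E_i.
$\chi_0^2 = 27.68 > \chi^2_{0.05,5} = 11.1$
The number of degrees of freedom is k - s - 1 = 7 - 1 - 1 = 5 (k = 7 cells after combining, s = 1 estimated parameter); hence, the hypothesis is rejected at the 0.05 level of significance.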
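The test can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own procedure: the helper names are hypothetical, cells are merged left to right until each has an expected frequency of at least 5 (a commonly used minimum, assumed here), the observed frequencies are entered so that they total n = 100, and the critical value 11.1 is the one quoted in the example.

```python
import math

def poisson_pmf(x, alpha):
    """Poisson probability mass function p(x) = e^(-alpha) alpha^x / x!."""
    return math.exp(-alpha) * alpha**x / math.factorial(x)

def combine_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells until every merged cell has E >= min_expected."""
    merged = []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged.append((acc_o, acc_e))
            acc_o, acc_e = 0.0, 0.0
    if acc_e > 0:
        # Leftover cells at the end are folded into the last merged cell.
        if merged:
            last_o, last_e = merged.pop()
            merged.append((last_o + acc_o, last_e + acc_e))
        else:
            merged.append((acc_o, acc_e))
    return merged

n, alpha = 100, 3.64
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]  # x = 0..10 and x >= 11

# Expected frequencies E_i = n p(x_i); the last cell takes the tail x >= 11.
probs = [poisson_pmf(x, alpha) for x in range(11)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

cells = combine_cells(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in cells)
dof = len(cells) - 1 - 1   # k - s - 1 with s = 1 estimated parameter
critical = 11.1            # chi-square(0.05, 5), as quoted in the example
reject = chi2 > critical
print(len(cells), round(chi2, 2), dof, reject)
```

Because the expected frequencies here are not rounded to one decimal place, the computed statistic may differ slightly from 27.68; the conclusion is the same.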
Kolmogorov-Smirnov Test
[Goodness-of-Fit Tests]
Recall from Chapter 7:
The test compares the continuous cdf, F(x), of the hypothesized distribution with the empirical cdf, S_N(x), of the N sample observations.
Based on the maximum-difference statistic (critical values tabulated in A.8):
$D = \max |F(x) - S_N(x)|$
A more powerful test, particularly useful when:
  Sample sizes are small,
  No parameters have been estimated from the data.
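A minimal Python sketch of the statistic D follows. Since S_N(x) is a step function, the largest gap occurs at a sample point, so the code checks the gap just after and just before each jump. The function names, the sample values, and the choice of a uniform(0, 1) hypothesized cdf are assumptions made for the example; the critical value would still be looked up in a table.

```python
def ks_statistic(data, cdf):
    """Return D = max |F(x) - S_N(x)| for a sample and a hypothesized cdf F."""
    xs = sorted(data)
    n = len(xs)
    # Gap just after each jump of the empirical cdf: S_N = (i + 1) / n.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Gap just before each jump of the empirical cdf: S_N = i / n.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

def uniform_cdf(x):
    """cdf of the uniform distribution on [0, 1] (assumed hypothesis)."""
    return min(max(x, 0.0), 1.0)

sample = [0.44, 0.81, 0.14, 0.05, 0.93]  # hypothetical observations
D = ks_statistic(sample, uniform_cdf)
print(round(D, 2))
```

Any other continuous cdf, for example an exponential cdf with a known rate, can be passed in place of `uniform_cdf`.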
Fitting a Non-stationary Poisson Process
Fitting an NSPP to arrival data is difficult; possible approaches:
  Fit a very flexible model with lots of parameters, or
  Approximate a constant arrival rate over some basic interval of time, but vary it from time interval to time interval (our focus).
Fitting a Non-stationary Poisson Process
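The estimated arrival rate during the ith time period is:
$\hat{\lambda}(t) = \frac{1}{n\,\Delta t} \sum_{j=1}^{n} C_{ij}$
where n is the number of observation periods, Δt is the length of the time interval, and C_ij is the number of arrivals during the ith time interval on the jth observation period.
Example (n = 3 observation periods, Δt = 1/2 hour): for the period 9:30 - 10:00 the observed counts are 20, 13, and 12, so the estimated rate is (20 + 13 + 12) / (3 × 0.5) = 30 arrivals per hour.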
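The estimate can be sketched as a one-line Python function. Reading the 9:30 - 10:00 row as three observed counts (20, 13, 12) for a half-hour period, with 30 as the resulting rate per hour, is an assumption about the table layout; the function name is hypothetical.

```python
def estimated_arrival_rate(counts, interval_length):
    """Estimate the arrival rate for one time period.

    counts: arrivals C_ij observed in this period on each of the n
            observation periods (for example, days).
    interval_length: length of the time period (here in hours).
    """
    n = len(counts)
    return sum(counts) / (n * interval_length)

# Counts for the 9:30 - 10:00 period; the period length is 0.5 hour.
rate = estimated_arrival_rate([20, 13, 12], 0.5)
print(rate)
```

Applying the same function to each time period in turn gives a piecewise-constant arrival-rate estimate.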
Covariance and Correlation
[Multivariate/Time Series]
Some Correlation Patterns
[Figure: four scatter plots illustrating correlation patterns]
  r = 0: No correlation
  r = .931: Strong positive correlation
  r = 1: Linear relationship
  r = -.67: Weaker negative correlation