System Simulation
Dr. Dessouky
Description
Simulation is a very powerful and widely used management
science technique for the analysis and study of complex systems.
Simulation may be defined as a technique that imitates the
operation of a real-world system as it evolves over time. This is
normally done by developing a simulation model. A simulation
model usually takes the form of a set of assumptions about the
operation of the system, expressed as mathematical or logical
relations between the objects of interest in the system.
Simulation has its advantages and disadvantages. We will focus
our attention on simulation models and the simulation technique.
Simulation
What is simulation:
The process of designing a mathematical
or logical model of a real system and then
conducting computer-based experiments
with the model to describe, explain, and
predict the behavior of the real system.
Simulation
Where simulation fits in
[Figure: simulation at the intersection of Programming, Modeling, Analysis, and Probability & Statistics]
Basic Terminology
In most simulation studies, we are concerned with
the simulation of some system.
Thus, in order to model a system, we must
understand the concept of a system.
Definition: A system is a collection of entities that
act and interact toward the accomplishment of some
logical end.
Systems generally tend to be dynamic; their status
changes over time. To describe this status, we use
the concept of the state of a system.
Example Simulation Model
Ford - # of Panels per day (throughput)
Emergency Room (beds, doctors, nurses), (minor, moderate, major,
critical)
TRW Ballistic Missile Survivability against Soviet Threat
Paramount Farms Pistachio
Miami University Parking
HMT Disks Throughput
Christopher Ranch Garlic Capacity
Power Integration Semiconductor Capacity, and random machine
down times
Value of Simulation
Empirical method versus mathematical
model
Allows you to calculate the extreme values,
not just the expected value
Simulation
What is simulation
Simulation is the actual running of the
model system to gain insight into its
performance.
Simulation
Why use simulation
Simulation is used to better understand the
expected performance of the real system
and to test the effectiveness of the system
design.
Simulation
Why use simulation
Without building them
experimental system
new concepts
Without disturbing them
costly experimentation
unsafe experimentation.
Without destroying them
Determine limits of stress
Queuing systems
Performance measures (output)
Data requirements (input)
Uses of model
Kendall's notation
Queuing systems
System performance measures (outputs)
Expected number of customers in system
Expected number of customers in queue
Expected time in system
Expected time in queue
Server utilization
Probability of n customers in system
Throughput
Queuing systems
Data requirements (inputs)
Interarrival time distribution
Service time distribution
Number of servers
Queue discipline
System capacity
Size of input population
Kendall's notation (M/M/s/FCFS/K/M)
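For comparison with simulation, the simplest queue in Kendall's notation, M/M/1 (exponential interarrival and service times, one server, FCFS, unlimited capacity and population), has closed-form steady-state values for the performance measures listed above. The sketch below assumes an arrival rate lam and a service rate mu with lam < mu; the function names are illustrative, and these standard formulas do not carry over to the other queue variants.

```python
def mm1_measures(lam: float, mu: float) -> dict:
    """Steady-state performance measures of an M/M/1 queue (requires lam < mu)."""
    if not 0 < lam < mu:
        raise ValueError("steady state requires 0 < lam < mu")
    rho = lam / mu                      # server utilization
    return {
        "utilization": rho,
        "L": rho / (1 - rho),           # expected number of customers in system
        "Lq": rho ** 2 / (1 - rho),     # expected number of customers in queue
        "W": 1 / (mu - lam),            # expected time in system
        "Wq": lam / (mu * (mu - lam)),  # expected time in queue
        "throughput": lam,              # every arrival is eventually served
    }


def mm1_prob_n(lam: float, mu: float, n: int) -> float:
    """Probability of n customers in the system: (1 - rho) * rho**n."""
    rho = lam / mu
    return (1 - rho) * rho ** n


if __name__ == "__main__":
    print(mm1_measures(lam=1.0, mu=2.0))
    print(mm1_prob_n(1.0, 2.0, 0))
```

Little's law, L = λW, holds for these values, which makes it a quick consistency check on a simulation of the same queue.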
Alternatives to simulation
Simulation
Analytic models
Physical experimentation
Visit other sites
Simulation vs. analytic modeling
Advantage:
various performance measures
greater realism
easier to understand
model the steady-state as well as the transient
behavior.
Disadvantage:
May not provide you with the optimal solution
time to construct the model will be longer.
Simulation vs. Physical
Advantage:
High Speed
Not disruptive
Replication easy
Control variations
Generally less costly
Disadvantage:
Realism
Validity
Simulation vs. Alternatives
[Figure: simulation compared with its alternatives in terms of realism and cost]
Representing system
System:
a collection of mutually interacting objects
designed to accomplish a goal (machine
repair system)
Entities:
denotes an element/object within the boundary of the
system (machines, operators, repairman)
Entity: the object on which work is performed
Resource: the object performing the work
Representing system
Attribute:
Characteristic or property of an entity
(machine ID, Type of breakdown, time that
machine went down)
Activity:
transforms the state of an object usually over
some time (repairman service time, machine
run time)
Representing system
State of the system:
Numeric values that contain all the
information necessary to describe the system
at any time.
Delays:
Processes that take a conditional length of
time in the system
Representing system
Events:
Change the state of the system (end of service
of machine, machine breaks down)
Queue:
a set used to model waiting
Ex. Elevator systems
Entities
Elevators, people
Sets
People waiting at each floor
Attributes
Elevators: capacity, speed, destination,
current location of each elevator
People: inter-arrival time at each floor,
destination of each person
Ex. Elevator systems
State of system:
# of people on each elevator
# of people on each floor
Activities
Loading/unloading passengers
Travel to next floor (speed and distance)
Persons travel to elevator
Ex. Elevator systems
Delays:
Persons waiting for elevator
Events:
Elevator arrival
End unloading
End Loading
Person Arrival
Static Simulation vs. Dynamic
Simulation
There are two types of simulation models,
static and dynamic.
Definition: A static simulation model is a
representation of a system at a particular
point in time.
We usually refer to a static simulation as a
Monte Carlo simulation.
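A static (Monte Carlo) simulation samples random inputs with no notion of time passing. As a toy illustration that is not taken from the text, the sketch below estimates the probability that two fair dice sum to 7 (exact value 1/6); the function name and the seed are illustrative.

```python
import random


def estimate_prob_sum_seven(n_trials: int, seed: int = 12345) -> float:
    """Monte Carlo estimate of P(die1 + die2 == 7) from n_trials samples."""
    rng = random.Random(seed)  # seeded so the run can be repeated exactly
    hits = 0
    for _ in range(n_trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(estimate_prob_sum_seven(100_000))  # should be close to 1/6
```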
Static Simulation vs. Dynamic
Simulation
Definition: A dynamic simulation is a
representation of a system as it evolves
over time.
Within these two classifications, a
simulation may be deterministic or
stochastic.
A deterministic simulation model is one
that contains no random variables; a
stochastic simulation model contains one
or more random variables.
Discrete Event vs. Continuous
Event Simulation
Discrete event:
state of system changes only at discrete points
in time (events)
ex. Machine repair problem
Programming
Look at system only when events occur; time is
advanced from event to event.
Discrete Event vs. Continuous
Event Simulation
Continuous event:
state of system changes continuously over
time
Ex. Level of fluid in tank
Programming:
Advances time in small intervals. Use differential
equations to represent flows.
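For the tank example, here is a minimal sketch assuming a constant inflow q_in and an outflow proportional to the current volume, so that dV/dt = q_in - k·V. Both flow assumptions and all names are ours, chosen because this equation has a known exact solution to check against. Time is advanced in small fixed steps dt, and each step applies the net flow (Euler's method).

```python
import math


def simulate_tank(q_in: float, k: float, v0: float, t_end: float, dt: float) -> float:
    """Euler integration of dV/dt = q_in - k*V from t = 0 to t_end; returns V(t_end)."""
    v = v0
    for _ in range(round(t_end / dt)):
        v += (q_in - k * v) * dt   # net flow over one small time increment
    return v


def tank_exact(q_in: float, k: float, v0: float, t: float) -> float:
    """Closed-form solution of the same equation, for checking the step size."""
    v_inf = q_in / k
    return v_inf + (v0 - v_inf) * math.exp(-k * t)


if __name__ == "__main__":
    print(simulate_tank(2.0, 0.5, 0.0, 4.0, 0.001), tank_exact(2.0, 0.5, 0.0, 4.0))
```

Shrinking dt reduces the error but increases the number of steps, which is the accuracy/cost trade-off of fixed time increments.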
An Example of a Discrete-Event
Simulation
To simulate a queuing system, we first have to
describe it.
We assume arrivals are drawn from an infinite
calling population.
There is unlimited waiting room capacity, and
customers will be served in the order of their arrival
(FCFS).
Arrivals occur one at a time in a random fashion.
All arrivals are eventually served with the
distribution of service times as shown in the book.
Service times are also assumed to be random.
After service, all customers return to the calling
population.
For this example, we use the following variables
to define the state of the system: (1) the number of
customers in the system; (2) the status of the
server that is, whether the server is busy or idle;
and (3) the time of the next arrival.
An event is defined as a situation that causes the
state of the system to change instantaneously.
All the information about them is maintained in a
list called the event list.
Time in a simulation is maintained using a variable
called the clock time.
We begin this simulation with an empty system and
arbitrarily assume that our first event, an arrival,
takes place at clock time 0.
Next we schedule the departure time of the first
customer.
Departure time = clock time now + generated service time
Also, we now schedule the next arrival into the system
by randomly generating an interarrival time from the
interarrival time distribution and setting the arrival time
as
Arrival time = clock time now + generated interarrival time
Both these events and their scheduled times are
maintained on the event list.
This approach of simulation is called the next-event
time-advance mechanism, because of the way the
clock time is updated. We advance the simulation clock
to the time of the most imminent event.
As we move from event to event, we carry out the
appropriate actions for each event, including any
scheduling of future events.
The jump to the next event in the next-event mechanism
may be a large one or a small one; that is, the jumps in
this method are variable in size.
We contrast this approach with the fixed-increment
time-advance method.
With this method, we advance the simulation clock in
increments of Δt time units, where Δt is some
appropriate time unit, usually 1 time unit.
For most models, however, the next event
mechanism tends to be more efficient
computationally.
Consequently, we use only the next-event
approach for the development of the models for
the rest of the chapter.
To demonstrate the simulation model, we need to
define several variables:
TM = clock time of the simulation
AT = scheduled time of the next arrival
DT = scheduled time of the next departure
SS = status of the server (1=busy, 0=idle)
WL = length of the waiting line
MX = length (in time units) of a simulation run
We now begin the simulation by initializing
all the variables. This simple example
illustrates some of the basic concepts in
simulation and the way in which simulation
can be used to analyze a particular problem.
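The next-event logic above can be sketched in Python using the variable names defined in the text (TM, AT, DT, SS, WL, MX). The interarrival and service time generators are passed in as functions because the text leaves their distributions to the book; the function name, the use of infinity for "no departure scheduled", and the returned statistics are our own choices. When an arrival and a departure fall at the same time, this sketch processes the arrival first.

```python
import math
import random
from typing import Callable


def simulate_single_server(interarrival: Callable[[], float],
                           service: Callable[[], float],
                           MX: float) -> dict:
    """Next-event time-advance simulation of a single-server FCFS queue."""
    TM = 0.0         # clock time of the simulation
    AT = 0.0         # first arrival at clock time 0, as in the text
    DT = math.inf    # no departure is scheduled while the server is idle
    SS = 0           # status of the server (1 = busy, 0 = idle)
    WL = 0           # length of the waiting line
    served = 0
    busy_area = 0.0  # integral of SS over time -> utilization
    wl_area = 0.0    # integral of WL over time -> average waiting line

    while min(AT, DT) <= MX:
        t_next = min(AT, DT)                 # most imminent event on the event list
        busy_area += SS * (t_next - TM)
        wl_area += WL * (t_next - TM)
        TM = t_next                          # advance the clock to that event

        if AT <= DT:                         # arrival event (processed first on ties)
            if SS == 0:
                SS = 1
                DT = TM + service()          # departure time = TM + service time
            else:
                WL += 1
            AT = TM + interarrival()         # arrival time = TM + interarrival time
        else:                                # departure event
            served += 1
            if WL > 0:
                WL -= 1
                DT = TM + service()
            else:
                SS = 0
                DT = math.inf

    busy_area += SS * (MX - TM)              # account for the time up to MX
    wl_area += WL * (MX - TM)
    return {"served": served, "utilization": busy_area / MX,
            "avg_waiting_line": wl_area / MX}


if __name__ == "__main__":
    random.seed(1)
    print(simulate_single_server(lambda: random.expovariate(1.0),
                                 lambda: random.expovariate(1.25), 10_000.0))
```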
World view: the structure, concepts, and views
that guide the development of the simulation model
Event orientation: defines the changes in state that
occur at each event time
Process orientation: describes the process through
which the entities in the system flow
Activity scanning orientation: describes the activities in
which the entities in the system engage
Discrete Event Simulation
Event scheduling
Write modules that describe changes in the
state of the system at each event
Main program advances time
One subprogram for each event
General-purpose programming language
Discrete Event Simulation
Process interaction
Write modules that describe the progress of
entities through the system
As entities move, the system changes state
Entities are held to represent activities and
delays
ProModel
Event scheduling
Time is advanced from event to event
Future events list: an ordered list of
upcoming events
As events are scheduled, they are added to the
list
As events occur, they are removed from the list
Activities in event (one per event type)
Event scheduling
A list is required to keep track of entities in
a set
Statistics: two types
Sample statistics: average of a set of observed values
(W)
W = (W1 + W2 + ... + Wn)/n = total wait / number of waits
Time-average statistics: time weighted (L)
L = (0(t1) + 1(t2-t1) + 2(t3-t2) + 1(t4-t3)) / t4
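The two kinds of statistics can be computed as follows. The sketch assumes the time-weighted quantity is piecewise constant and is recorded as the times at which it changes plus the level it holds after each change; the function names and the numbers in the usage lines (t1 = 1, t2 = 3, t3 = 4, t4 = 6) are hypothetical.

```python
def sample_average(values: list[float]) -> float:
    """Sample statistic, e.g. W = (W1 + ... + Wn) / n."""
    return sum(values) / len(values)


def time_average(change_times: list[float], levels: list[float], t_end: float) -> float:
    """Time-weighted statistic, e.g. L.

    levels[k] holds from change_times[k] until change_times[k + 1]
    (the last level holds until t_end); the average is taken over
    [change_times[0], t_end].
    """
    bounds = list(change_times) + [t_end]
    area = sum(level * (bounds[k + 1] - bounds[k]) for k, level in enumerate(levels))
    return area / (t_end - change_times[0])


if __name__ == "__main__":
    # L = (0*(t1 - 0) + 1*(t2 - t1) + 2*(t3 - t2) + 1*(t4 - t3)) / t4
    print(time_average([0, 1, 3, 4], [0, 1, 2, 1], t_end=6))  # 1.0
    print(sample_average([0.0, 2.0, 1.0]))                    # 1.0
```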
Activity scanning
Activity scanning
Time is modeled in fixed time increments to
check whether an activity has occurred
Small time increments are inefficient
Large time increments may miss activities
Describes the activities in which the entities in
the system engage.
Process Oriented
Process oriented:
Many simulation models include elements
which occur in defined patterns
The logic associated with such a system or
events can be generalized and defined by a
single statement
A simulation language could then translate
such statements into the appropriate sequence
of events
describes the processes through which the
entities in the system flow.
Process Oriented
Process oriented:
These statements define a sequence of events
which are automatically executed by the
simulation language as the entities move
through the process
Example: create arrival entities every t time units
However, since we are normally restricted to a
set of standardized statements provided by the
simulation language, our model flexibility is
not as great as with the event-scheduling approach
Features provided by a simulation language
Conceptual framework (entities, attributes,
resources, queues)
Maintenance of event list
Random variable generation
Animation
Debugging function
Output analysis
Input analysis
Report generation
Simulation Languages
One of the most important aspects of a simulation
study is the computer programming.
Several special-purpose computer simulation
languages have been developed to simplify
programming.
The best-known and most readily available
simulation languages include GPSS, GASP IV,
and SLAM.
Most simulation languages use one of two different
modeling approaches or orientations: event
scheduling or process interaction.
GPSS uses the process-interaction approach.
SLAM allows the modeler to use either approach
or even a mixture of the two, whichever is the
most appropriate for the model being analyzed.
Of the general-purpose languages, FORTRAN is
the most commonly used in simulation.
In fact, several simulation languages, including
GASP IV and SLAM, use a FORTRAN base.
To use GASP IV we must provide a main program,
an initialization routine, and the event routines.
For the rest of the program, we use the GASP
routines.
Because of these prewritten routines, GASP IV
provides a great deal of programming flexibility.
GPSS, in contrast to GASP, is a highly structured
special-purpose language.
GPSS does not require writing a program in the
usual sense; instead, it provides a set of predefined blocks.
Building a GPSS model then consists of
combining these blocks into a flow
diagram so that it represents the path an
entity takes as it passes through the system.
SLAM was developed by Pritsker and
Pegden (1979). It allows us to develop
simulation models as network models,
discrete-event models, continuous models, or
any combination of these.
The decision of which language to use is one of
the most important that a modeler or an analyst
must make in performing a simulation study.
Simulation languages offer several advantages.
The most important of these is that the special-purpose languages provide a natural framework
for simulation modeling and most of the features
needed in programming a simulation model.
The Simulation Modeling Steps
We now discuss the process for a complete
simulation study and present a systematic
approach to carrying out a simulation.
A simulation study normally consists of several
distinct stages. (See Figure in the book)
However, not all simulation studies consist of all
these stages or follow the order stated here.
On the other hand, there may even be considerable
overlap between some of these stages.
Problem/Model Formulation
State the objective of the study.
Identify the Problem. Determine any underlying
causes if possible.
Determine the input variables.
Controllable Variables.
Uncontrollable Variables.
State the assumptions and boundaries used to
simplify the model.
Determine the performance measures used to
measure the objective (output).
Data collection/acquisition
Determine the Data Collection System or
Estimates to be used.
Observe the system
Historical or Similar Systems
Theoretical Estimates
Engineering Estimates
Operator Estimates
Vendor Estimates
Identify the data collected.
How it was collected.
How it was represented in the model.
Model Construction or
Development
Identify The Real System
Determine Conceptual Model -Activities
and Events
Develop the Logical Model.
Identify the Programming Language used.
Computer Implementation (ProModel,
Arena, SLAM Systems).
Model Construction or
Development
Modeling Tips
Art vs. Science
Over Simplification vs. Unnecessary Detail
Start Simple
Add stronger assumptions
Model Verification and
Validation
Verification: determining whether the
simulation model works as intended.
Verifying the model.
Structure: walk-through of the model
Debugger.
Trace: print or write in-process calculations.
Animation.
Model testing
Analytical Model.
Model Verification and
Validation
Verification.
Logical Model.
Are events represented correctly?
Are mathematical formulas and relationships
correct?
Are statistical measures formulated correctly?
Computer Model/Simulation Model.
Does the code contain all aspects of the logical
model?
Are the statistics and formulas calculated correctly?
Does the model contain coding errors?
Model Verification and
Validation
Validation: determining whether the simulation
model is a credible representation
of the real system.
Compare the model with the actual system
by performing statistical tests (t-test and
confidence intervals).
Conceptual Model.
Does the model contain all relevant elements,
events and relationships?
Will the model answer the questions of concern?
Model Verification and
Validation
Logical Model.
Does the model contain all events included in the
conceptual model?
Does the model contain all the relationships of the
conceptual model?
Computer Model/Simulation Model.
Is the computer model a valid representation of the
real system?
Can the computer model duplicate the performance
of the real system?
Does the computer model output have credibility
with system experts and decision makers?
Experimentation and Analysis of
Results
Experimentation: the execution of the
simulation model to obtain output values
Analysis of results: the process of analyzing
the simulation outputs to draw inferences and
make recommendations for problem resolution
Implementation and Documentation
The process of implementing decisions
resulting from the simulation and
documenting the model and its use.
Manual Simulation Example
Given the following arrival times for a single
server system, what will be the average number
in the queue, average number in the system,
average time in system, average time in queue,
the number of completed jobs, number in the
queue, number in the system, and server
utilization at time 15 if the service time is 3 time
units for each entity?
1, 3, 5, 9, 13, 15, 17
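The hand calculation can be checked with a short script. This sketch assumes FCFS service and counts an arrival at exactly time 15 as present at time 15; the average time in system and in queue are taken over the jobs completed by time 15, which is one reasonable reading of the question. The function name is illustrative. Time averages are computed as the overlap of each customer's waiting, in-system, and in-service intervals with [0, 15].

```python
def manual_single_server(arrivals: list[float], service_time: float, t_end: float) -> dict:
    """FCFS single server with a constant service time, observed over [0, t_end]."""
    starts, departs = [], []
    server_free_at = 0.0
    for a in arrivals:
        start = max(a, server_free_at)       # wait if the server is still busy
        starts.append(start)
        departs.append(start + service_time)
        server_free_at = start + service_time

    def overlap(lo: float, hi: float) -> float:
        """Length of the interval [lo, hi) that falls inside [0, t_end]."""
        return max(0.0, min(hi, t_end) - max(lo, 0.0))

    # Areas under the queue-length, number-in-system and busy-server curves
    area_q = sum(overlap(a, s) for a, s in zip(arrivals, starts))
    area_sys = sum(overlap(a, d) for a, d in zip(arrivals, departs))
    busy = sum(overlap(s, d) for s, d in zip(starts, departs))

    done = [i for i, d in enumerate(departs) if d <= t_end]
    return {
        "avg_number_in_queue": area_q / t_end,
        "avg_number_in_system": area_sys / t_end,
        "avg_time_in_system": sum(departs[i] - arrivals[i] for i in done) / len(done),
        "avg_time_in_queue": sum(starts[i] - arrivals[i] for i in done) / len(done),
        "completed": len(done),
        "number_in_queue": sum(1 for a, s in zip(arrivals, starts) if a <= t_end < s),
        "number_in_system": sum(1 for a, d in zip(arrivals, departs) if a <= t_end < d),
        "utilization": busy / t_end,
    }


if __name__ == "__main__":
    print(manual_single_server([1, 3, 5, 9, 13, 15, 17], service_time=3, t_end=15))
```

For the arrival times above this gives 4 completed jobs, 1 in queue and 2 in system at time 15, utilization 14/15 (about 0.933), time-average numbers in queue and in system of 4/15 (about 0.267) and 18/15 = 1.2, and average time in system and in queue of 4.0 and 1.0 for the completed jobs.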
Data Collection
Activities may be represented as
Constants
Random variables
Collection of data
Design a data collection form
Record more than a single attribute in case you
need to use the data in a different way.
Use several sessions to get representative data
Use control charts
Data Collection
Sample form columns: Machine | Begin Repair | End Repair | Time Elapsed
Data Collection
Testing data
Independence
Randomness
Homogeneity
Data Collection
Test of Independence
H0: Measure A is independent of measure B
H1: Measure A is not independent of measure B
Example: inventory and day of week
Data Collection
Test of Randomness
H0: f(xi | xj) = f(xi): independent
H1: f(xi | xj) ≠ f(xi): dependent
For example, when simulating a production
process in which items can be defective or
good, it is important to know whether
successive items are randomly distributed or
occur in runs, with repetitions of good items followed by runs of
defective items.
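One common way to check a good/defective sequence for such runs is the runs test. The sketch below uses the standard normal approximation for the number of runs; it is our illustration rather than a procedure given in the text, and |z| > 1.96 would reject randomness at α = 0.05. The function name is illustrative.

```python
import math
from collections.abc import Sequence


def runs_test(sequence: Sequence) -> tuple[int, float]:
    """Runs test for a two-symbol sequence; returns (number of runs, z statistic)."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("need exactly two distinct symbols, e.g. 'G' and 'D'")
    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1
    # A new run starts every time the symbol changes
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if cur != prev)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return runs, (runs - mean) / math.sqrt(var)


if __name__ == "__main__":
    print(runs_test("GGGGDDDD"))   # too few runs: good items clustered together
    print(runs_test("GDGDGDGD"))   # too many runs: items alternate
```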
Data Collection
Test of Homogeneity
Tests of whether multiple sets of data can be
considered as coming from the same statistical
population are generally referred to as tests of
homogeneity (distribution-free).
H0: G(x) = H(x)
H1: G(x) ≠ H(x)
Two different workers working on the same
machine.
Random Variable
Two types
Discrete
Continuous
Random Variable
Probability mass function
Discrete
$P(X = x_i) = p(x_i)$
$\sum_i p(x_i) = 1$
Random Variable
Probability density function
Continuous
Example: $f(x) = \lambda e^{-\lambda x}, \quad x > 0$
$P(X = a) = 0$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
$P(a < X < b) = \int_a^b f(x)\,dx$
Random Variable
Cumulative distribution function (CDF)
$F(x) = P(X \le x)$
Discrete: $F(x) = \sum_{x_i \le x} p(x_i)$
Continuous: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
Random Variable
Expected value
$\mu = E(X)$
Discrete: $\mu = \sum_i x_i\, p(x_i)$
Continuous: $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
Random Variable
Variance
$V(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E(X^2) - (E(X))^2$
Discrete: $V(X) = \sum_i x_i^2\, p(x_i) - \left(\sum_i x_i\, p(x_i)\right)^2$
Random Variable
Standard deviation
$SD(X) = \sqrt{V(X)}$
Sums of R.V.
$Y = a_1 X_1 + a_2 X_2$
$E(Y) = a_1 E(X_1) + a_2 E(X_2)$
$V(Y) = a_1^2 V(X_1) + a_2^2 V(X_2)$ (when $X_1$ and $X_2$ are independent)
Random Variable
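Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$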
Poisson Probability Distribution
Consider a discrete r.v. which is often useful
when dealing with the number of occurrences
of an event over a specified interval of time.
Suppose we want to find the probability
distribution of the accidents at the intersection
of Rural and Apache during a one week
period.
The R.V. we are interested in is the number of
accidents.
Poisson Probability Distribution
i. The Poisson distribution provides a good model for the probability
distribution of the number of rare events that occur in space, time,
or volume, where λ is the average rate at which events occur.
ii. Definition: A r.v. X is said to have a Poisson distribution if the p.m.f. of
X is
$P(x) = f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$
where λ is the rate per unit time or per unit area.
iii. $E[X] = \lambda$, $V(X) = \lambda$
Exponential Distribution
Previously, we discussed the Poisson random variable,
which was the number of events occurring in a given
interval. This number was a discrete r.v. and the
probabilities associated with it could be described by the
Poisson Probability Distribution.
Not only is the number of events a r.v., but the waiting
time between events is also a random variable. This r.v. is a
continuous r.v., for it can assume any positive value.
This r.v. is an exponential r.v. which can be described by
the exponential distribution.
Exponential Distribution
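i. Pdf: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $\lambda > 0$; $f(x) = 0$ otherwise,
where λ = rate at which events occur
ii. Correspondingly,
$F(x) = P(X \le x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, \quad x \ge 0$
$E[X] = \frac{1}{\lambda}, \qquad V(X) = \frac{1}{\lambda^2}$
iii. An important application of the exponential distribution is to
model the distribution of component lifetime. A reason for its
popularity is the memoryless property of the exponential distribution:
$P(X > s + t \mid X > s) = P(X > t)$.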
The Uniform Distribution
o The simplest distribution is the one in which a continuous r.v. can assume
any value within an interval [a, b]
Def:
A continuous r.v. X is said to have a uniform distribution on the
interval [a, b] if the probability density function (pdf) of X is:
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$; $f(x) = 0$ otherwise
The Uniform Distribution
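The cumulative distribution is
$F(x) = P(X \le x) = \int_a^x \frac{1}{b - a}\,dt = \frac{x - a}{b - a}, \quad a \le x \le b$
(with $F(x) = 0$ for $x < a$ and $F(x) = 1$ for $x > b$)
$E[X] = \int_a^b x \left(\frac{1}{b - a}\right) dx = \frac{a + b}{2}$
$V(X) = \frac{(b - a)^2}{12}$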
The Uniform Distribution
Note:
An important uniform distribution is
the one with a = 0 and b = 1, namely
U(0, 1).
A U(0,1) r.v. can be used to simulate
observations of other random variables
of the discrete and continuous type.
The Triangular Distribution
Continuous Distribution
$f(x) = \frac{2(x - a)}{(b - a)(c - a)}, \quad a \le x \le b$
$f(x) = \frac{2(c - x)}{(c - b)(c - a)}, \quad b < x \le c$
$f(x) = 0$ elsewhere
The Triangular Distribution
$F(x) = 0, \quad x < a$
$F(x) = \frac{(x - a)^2}{(b - a)(c - a)}, \quad a \le x \le b$
$F(x) = 1 - \frac{(c - x)^2}{(c - b)(c - a)}, \quad b < x \le c$
$F(x) = 1, \quad x > c$
The Triangular Distribution
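$E(X) = \frac{a + b + c}{3}$
$V(X) = \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}$
Parameter estimates from data $x_1, \ldots, x_n$:
$\hat{a} = \min\{x_1, \ldots, x_n\}, \quad \hat{c} = \max\{x_1, \ldots, x_n\}, \quad \hat{b} = 3\bar{x} - \hat{a} - \hat{c}$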
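The estimation rules above (a from the minimum, c from the maximum, the mode b from the sample mean) can be sketched as follows. Clamping b into [a, c] is our own addition, since 3x̄ - a - c can fall outside that range for skewed samples; the function name is illustrative.

```python
def triangular_estimates(data: list[float]) -> tuple[float, float, float, float, float]:
    """Estimate (a, b, c) of a triangular distribution; also return E(X) and V(X)."""
    a = min(data)
    c = max(data)
    xbar = sum(data) / len(data)
    b = 3 * xbar - a - c          # mode from E(X) = (a + b + c)/3 = x̄
    b = min(max(b, a), c)         # our addition: keep the mode inside [a, c]
    mean = (a + b + c) / 3
    var = (a * a + b * b + c * c - a * b - a * c - b * c) / 18
    return a, b, c, mean, var


if __name__ == "__main__":
    print(triangular_estimates([2.0, 4.0, 3.0, 5.0, 6.0]))
```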
Normal Distribution
It is a fact that measurements on many random variables follow a bell-shaped distribution.
Random variables of this type are closely approximated by a normal
probability distribution.
A continuous r.v. X is said to have a normal distribution if the pdf of X is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < \mu < \infty,\ \sigma > 0,\ -\infty < x < \infty$
The distribution contains 2 parameters (μ and σ²). These are the expected
value and the variance and hence locate the center of the distribution and
measure its spread.
Normal Distribution
The Standard Normal Distribution
To compute $P(a \le X \le b)$ when $X \sim N(\mu, \sigma^2)$, we must evaluate
$\int_a^b f(x)\,dx = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx$
Note: None of the standard integration techniques can be used
to evaluate this integral in closed form. Instead, for μ = 0 and σ² = 1, the integral has
been evaluated numerically and the values tabulated. Using the
table, probabilities for any other values of μ and σ² can be
determined.
Normal Distribution
The normal distribution with parameter values μ = 0
and σ² = 1 is called the standard normal
distribution. A r.v. that has a standard normal
distribution is called a standard normal random
variable (denoted by Z). The pdf of Z is:
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$
Normal Distribution
The cumulative distribution of Z is
$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(y)\,dy$
Note: The N(0,1) table returns the cumulative
probability up to z, i.e., Φ(z)
Normal Distribution
Non-standard Normal Distribution
The table only provides probabilities for r.v.s
following the N(0,1) distribution. Thus, when
$X \sim N(\mu, \sigma^2)$ (i.e., not μ = 0, σ² = 1), probabilities
involving X are computed by standardizing
the r.v. to the N(0,1) scale: $Z = (X - \mu)/\sigma$.
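In code, the table lookup can be replaced by the error function from Python's standard library, using Φ(z) = ½[1 + erf(z/√2)]. The sketch standardizes exactly as described above; the function names are illustrative.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2), by standardizing Z = (X - mu) / sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)


if __name__ == "__main__":
    print(phi(1.96))                       # about 0.975, as in the N(0,1) table
    print(normal_prob(90, 110, 100, 10))   # about 0.6827 (within one sigma)
```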
Selecting a Distribution
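Theoretical prior knowledge
Random arrivals => exponential interarrival times (IAT)
Sum of a large number of independent quantities => normal (central limit theorem)
Compare a histogram of the data with the probability mass function
or probability density function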
Data Collection
Little variability: model as a constant.
Variability: model as a random variable.
Empirical vs. theoretical: select a
distribution, estimate the parameters of the
distribution, perform a goodness-of-fit test.
χ² goodness-of-fit test
Compare observed versus theoretical
density
Tests whether a collection of data can be viewed as a sample
from a specified p.d.f.
H0: the Xi are IID r.v. with density f(x)
H1: the Xi are not IID r.v. with density f(x)
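χ² goodness-of-fit test
Critical value
If H0 is true, $TS \sim \chi^2_{k - 1 - (\#\text{ of parameters estimated})}$
A large T.S. would cause rejection of H0
Reject H0 if T.S. > $\chi^2_{\text{critical}}$
$TS = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ and $E_i$ are the observed and expected counts in interval i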
X2 goodness of fit test
Issues: the test is an art
Number of intervals > 2
Size of intervals: choose them so that the expected counts Ei are about the same and each Ei ≥ 5
Requires a relatively large amount of data
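A sketch of the statistic for H0: "the data are exponential with a given rate", using k equal-probability intervals so that every Ei = n/k. Each observation's interval is found from F(x) = 1 - e^(-λx); the function name is illustrative, and the critical value still comes from a χ² table.

```python
import math


def chi_square_exponential(data: list[float], rate: float, k: int) -> tuple[float, list[int]]:
    """Chi-square GOF statistic for H0: data ~ exponential(rate), with k
    equal-probability intervals; returns (TS, observed counts)."""
    n = len(data)
    observed = [0] * k
    for x in data:
        p = 1.0 - math.exp(-rate * x)           # F(x) lies in [0, 1)
        observed[min(int(p * k), k - 1)] += 1   # interval i covers F in [i/k, (i+1)/k)
    expected = n / k                            # Ei, the same for every interval
    ts = sum((o - expected) ** 2 / expected for o in observed)
    return ts, observed
```

If λ is estimated as 1/x̄ from the same data, one parameter has been estimated, so the degrees of freedom are k - 2.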
K-S test
Compare observed with theoretical CDF
Limited to continuous distributions with known
parameters
H0: the Xi are IID r.v. with CDF F(x)
H1: the Xi are not IID r.v. with CDF F(x)
Test statistic: compare with the critical value from a table
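K-S test
Critical value
A large T.S. would cause rejection
α      Critical value
0.01   1.63/√n
0.05   1.36/√n
0.10   1.22/√n
$TS = \max\left\{\max_{1 \le i \le n}\left(\frac{i}{n} - \hat{F}(X_{(i)})\right),\ \max_{1 \le i \le n}\left(\hat{F}(X_{(i)}) - \frac{i - 1}{n}\right)\right\}$
where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the sorted observations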
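A sketch of the K-S statistic and the large-sample critical values; the hypothesized CDF is passed in as a function, and the names are illustrative. The example checks data against U(0,1), the same use the test has when checking random number generators. For small n the large-sample critical values are only approximate.

```python
import math
from typing import Callable


def ks_statistic(data: list[float], cdf: Callable[[float], float]) -> float:
    """K-S statistic: the larger of the two one-sided gaps between the ECDF and cdf."""
    xs = sorted(data)
    n = len(xs)
    # With 0-based i, (i + 1)/n and i/n correspond to i/n and (i - 1)/n in 1-based form
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def ks_critical(n: int, alpha: float) -> float:
    """Large-sample critical value for alpha in {0.01, 0.05, 0.10}."""
    return {0.01: 1.63, 0.05: 1.36, 0.10: 1.22}[alpha] / math.sqrt(n)


def uniform_cdf(x: float) -> float:
    """CDF of U(0, 1)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    data = [0.1, 0.3, 0.5, 0.7, 0.9]
    print(ks_statistic(data, uniform_cdf), ks_critical(len(data), 0.05))
```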
Parameter estimation
Set of data $x_1, x_2, \ldots, x_n$
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
Method of moments => equate E(X),
V(X) to $\bar{x}$ and $s^2$
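A sketch of the method of moments for three distributions from these notes: matching E(X) = 1/λ gives λ = 1/x̄ for the exponential; matching mean and variance gives μ = x̄ and σ² = s² for the normal; and solving (a + b)/2 = x̄, (b - a)²/12 = s² gives a = x̄ - √3·s, b = x̄ + √3·s for the uniform. Function names are illustrative.

```python
import math


def sample_mean_var(xs: list[float]) -> tuple[float, float]:
    """Sample mean x̄ and sample variance s² (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2


def moment_estimates(xs: list[float]) -> dict:
    """Method-of-moments parameter estimates for three candidate distributions."""
    xbar, s2 = sample_mean_var(xs)
    s = math.sqrt(s2)
    return {
        "exponential_rate": 1.0 / xbar,                 # E(X) = 1/λ
        "normal": (xbar, s2),                           # E(X) = μ, V(X) = σ²
        "uniform": (xbar - math.sqrt(3.0) * s,          # (a + b)/2 = x̄
                    xbar + math.sqrt(3.0) * s),         # (b - a)²/12 = s²
    }


if __name__ == "__main__":
    print(moment_estimates([1.0, 2.0, 3.0, 4.0, 5.0]))
```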
Parameter estimation
Maximum likelihood => find the parameter values
that maximize the likelihood of obtaining the
given sample
Produces efficient and consistent estimates
Not always unbiased
Superior properties to the method of moments
Common sense.
Statistical Analysis of Simulations
As previously mentioned, output data from
simulation always exhibit random variability, since
random variables are input to the simulation model.
We must utilize statistical methods to analyze
output from simulations.
The overall measure of variability is generally
stated in the form of a confidence interval at a given
level of confidence.
Thus, the purpose of the statistical analysis is to
estimate this confidence interval.
Output analysis
Need multiple observations to estimate
variability
Y1, Y2, Y3, ..., Yn
Estimate a confidence interval for the
measure of performance
Estimate the number of observations
required to obtain the desired precision
Output analysis
What is an observation?
Is observation a sample statistic or time
average statistic?
Is this a steady state simulation or
terminating simulation?
Are the observations independent or
correlated?
Terminating vs Steady State Simulation
Often, the type of model determines which
type of output analysis is appropriate for a
particular simulation.
However, the system or model may not
always be the best indicator of which
simulation would be the most appropriate.
It is quite possible to use the terminating
simulation approach for systems more
suited to steady-state simulations, and vice
versa.
Observation vs Time Based
Observation (Sample)
Average Time In System
Average Time In Queue
Time Based
Average Number in System
Average Number in Queue
Machine Utilization
Terminating simulation
Simulation in which the output measure of
performance is defined over a specific
interval of time with a specific starting
condition and a specific ending condition
Retail sales during a business day
Project network
Time to produce a batch of parts in a work cell
Military Simulations
Terminating simulation
Has a specified starting and ending
condition.
Each observation must have the same
starting and ending conditions.
Observations are obtained by replication.
Use a different seed for random number
generation in each replication.
Steady state simulation
Simulation in which the output measure of
performance is defined over an infinite
interval of time independent of the initial
state of the system and stopping condition
Average production from an assembly line of
well trained employees
Inventory simulation
Steady state simulation
Independent of starting and ending
condition.
Remove initial condition bias
Specify warm-up period (transient period) .
Set initial conditions to steady-state values.
Have a very long run length
Steady state simulation
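1. Individual: Yi is an individual observation; average over the individuals.
2. Replication: Yi is the average of each replication.
3. Batch means: Yi is the average of each batch, batched by time or by number of observations.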
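A sketch of batch means combined with deleting a warm-up period: drop the first observations, split the rest into equal-size batches, and treat each batch average as one Yi. Names are illustrative; choosing the warm-up length and the number of batches (batches large enough that their means are nearly independent) is left to the analyst, and leftover observations that do not fill a batch are discarded here.

```python
def batch_means(observations: list[float], warmup: int, num_batches: int) -> list[float]:
    """Delete the warm-up period, then return the mean of each equal-size batch."""
    kept = observations[warmup:]                 # remove initial-condition bias
    size = len(kept) // num_batches              # observations per batch
    if size == 0:
        raise ValueError("not enough observations for that many batches")
    return [sum(kept[k * size:(k + 1) * size]) / size for k in range(num_batches)]


if __name__ == "__main__":
    print(batch_means([100, 100, 1, 2, 3, 4, 5, 6], warmup=2, num_batches=3))
```

Here the call returns [1.5, 3.5, 5.5]; these three values then play the role of Y1, Y2, Y3 in a confidence interval.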
Terminating vs. Steady state simulation
Terminating
Observations are obtained by replication
Each observation must reflect the specified
starting and ending condition
Use a different seed for each replication
Y1, Y2, ..., YR => one independent
observation per replication
Confidence interval for steady state
simulation
Y1, Y2, ..., Yn
Trying to estimate a long run performance
measure independent of starting and
ending conditions
Two problems
Initial condition bias
Dependent observations
Confidence interval for steady state
simulation
Outline
Removing initial condition bias
Creating independent observations
Replication/ deletion
Batch means
Confidence interval for replication
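Let Y1, Y2, ..., YR be measures of
performance from R independent
replications.
Independent -> different seed for each run
$\bar{Y} \pm t_{R-1,\alpha/2}\sqrt{\frac{S^2}{R}}, \qquad S^2 = \frac{\sum_{r=1}^{R}(Y_r - \bar{Y})^2}{R - 1} = \frac{\sum_{r=1}^{R} Y_r^2 - R\bar{Y}^2}{R - 1}$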
Confidence interval for replication
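Approximate, due to the need for Yi ~ Normal
(1 - α) confidence interval => probability
of containing the true mean is 1 - α
$Var(\bar{Y}) = Var\left(\frac{1}{R}(Y_1 + \cdots + Y_R)\right) = \frac{1}{R^2}\left(Var(Y_1) + \cdots + Var(Y_R)\right) \approx \frac{R S^2}{R^2} = \frac{S^2}{R}$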
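A sketch of the confidence interval from R replications. Python's standard library has no t quantile function, so the sketch takes t_{R-1,α/2} as an argument read from a t table; the function name and the replication values in the usage line are hypothetical.

```python
import math


def replication_ci(ys: list[float], t_crit: float) -> tuple[float, float]:
    """(1 - alpha) confidence interval for the mean of R independent replications.

    t_crit is t_{R-1, alpha/2}, read from a t table.
    """
    R = len(ys)
    ybar = sum(ys) / R
    s2 = sum((y - ybar) ** 2 for y in ys) / (R - 1)   # S²
    half = t_crit * math.sqrt(s2 / R)                  # t * sqrt(estimated Var(Ȳ))
    return ybar - half, ybar + half


if __name__ == "__main__":
    # Hypothetical results of R = 5 replications; t_{4, 0.025} = 2.776
    print(replication_ci([10.0, 12.0, 11.0, 13.0, 14.0], t_crit=2.776))
```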
Number of replications needed
Suppose we desire a confidence interval
$\bar{Y} \pm I$ (I = half-length)
Based on a preliminary run of R0
replications, we have an estimate of S² and the
confidence interval
$\bar{Y} \pm t_{R_0-1,\alpha/2}\sqrt{\frac{S^2}{R_0}}$
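Number of replications needed
Find R such that
$I \ge t_{R-1,\alpha/2}\sqrt{\frac{S^2}{R}}$
If R is large, $t_{R-1,\alpha/2} \approx z_{\alpha/2}$, so
$R \ge \left(\frac{z_{\alpha/2}\, S}{I}\right)^2$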
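The large-R approximation reduces to a one-line calculation. The sketch assumes z_{α/2} = 1.96 (a 95% interval) by default; with a t table one could instead increase R until t_{R-1,α/2}·√(S²/R) ≤ I, which gives a slightly larger answer when R is small. The function name is illustrative.

```python
import math


def replications_needed(s2: float, half_length: float, z: float = 1.96) -> int:
    """Smallest R with z * sqrt(S² / R) <= I (large-R approximation)."""
    return math.ceil(z * z * s2 / half_length ** 2)


if __name__ == "__main__":
    print(replications_needed(s2=2.5, half_length=1.0))
```

With S² = 2.5 from a preliminary run and a desired half-length I = 1, this gives R = 10.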
Test for comparing two means
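H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
Two approaches:
Form a (1 - α) confidence interval on μ1 - μ2:
$\bar{Y}_1 - \bar{Y}_2 \pm t_{\alpha/2,\nu}\sqrt{\hat{V}(\bar{Y}_1 - \bar{Y}_2)}$
Reject H0 if the confidence interval does not contain 0.
Perform a t test:
$t = \frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\hat{V}(\bar{Y}_1 - \bar{Y}_2)}}$
Reject if $|t| > t_{\nu,\alpha/2}$, where ν is the degrees of freedom for the case being used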
Assumptions
Case 1: $Y_{11}, Y_{12}, \ldots, Y_{1R_1}$ with $\bar{Y}_1, S_1^2$
Case 2: $Y_{21}, Y_{22}, \ldots, Y_{2R_2}$ with $\bar{Y}_2, S_2^2$
Observations are independent
Observations are normally distributed
Variances are unknown/known.
Variances are equal/unequal.
Observations are paired/unpaired.
Test for comparing two means
Equal Variance
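1. Assumptions: independent, normal, unknown variance, unpaired, equal variance.
2. $S_p^2 = \frac{(R_1 - 1)S_1^2 + (R_2 - 1)S_2^2}{R_1 + R_2 - 2} = \frac{\sum_i (Y_{1i} - \bar{Y}_1)^2 + \sum_i (Y_{2i} - \bar{Y}_2)^2}{R_1 + R_2 - 2}$
3. $Var(\bar{Y}_1 - \bar{Y}_2) = Var(\bar{Y}_1) + Var(\bar{Y}_2) = \frac{S_p^2}{R_1} + \frac{S_p^2}{R_2}$
4. (1 - α) confidence interval: $\bar{Y}_1 - \bar{Y}_2 \pm t_{\alpha/2,\, R_1 + R_2 - 2}\sqrt{\frac{S_p^2}{R_1} + \frac{S_p^2}{R_2}}$
5. t-test: $t = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{R_1} + \frac{1}{R_2}}}$, t-crit = $t_{R_1 + R_2 - 2,\, \alpha/2}$
6. Note: Many simulations do not have equal variance.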
Test for comparing two means
One sided test
Need to state the hypothesis in advance
Use the t test, adjusting the critical value (use $t_{\alpha}$ instead of $t_{\alpha/2}$)
Test for comparing two means
Test for normal population with known variance
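Assumptions: independent, normal, known variance,
unpaired, unequal variance.
2 populations: $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$
Sample m from X1 and sample n from X2
Want to test whether μ1 = μ2
H0: μ1 = μ2
H1: μ1 ≠ μ2
Test statistic: $Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}$ under H0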
Test for comparing two means
Unequal Variance
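1. Assumptions: independent, normal, unknown variance,
unpaired, unequal variance.
2. $Var(\bar{Y}_1 - \bar{Y}_2) = Var(\bar{Y}_1) + Var(\bar{Y}_2) = \frac{S_1^2}{R_1} + \frac{S_2^2}{R_2}$
3. (1 - α) confidence interval: $\bar{Y}_1 - \bar{Y}_2 \pm t_{\alpha/2,\nu}\sqrt{\frac{S_1^2}{R_1} + \frac{S_2^2}{R_2}}$, with degrees of freedom
$\nu = \frac{\left(\frac{S_1^2}{R_1} + \frac{S_2^2}{R_2}\right)^2}{\frac{(S_1^2/R_1)^2}{R_1 - 1} + \frac{(S_2^2/R_2)^2}{R_2 - 1}}$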
Test for comparing two means
Paired Test
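Assumptions: independent, normal, unknown variance,
equal # of replications
Case 1: $Y_{11}, Y_{12}, \ldots, Y_{1R}$
Case 2: $Y_{21}, Y_{22}, \ldots, Y_{2R}$
Differences: $d_1, d_2, \ldots, d_R$, where $d_i = Y_{1i} - Y_{2i}$
$\bar{d} = \frac{\sum_{i=1}^{R} d_i}{R}, \qquad S_d^2 = \frac{\sum_{i=1}^{R} (d_i - \bar{d})^2}{R - 1}$
H0: μ1 - μ2 = 0, i.e., μd = 0
H1: μ1 - μ2 ≠ 0, i.e., μd ≠ 0
(1 - α) confidence interval: $\bar{d} \pm t_{\alpha/2,\, R-1}\sqrt{\frac{S_d^2}{R}}$
$t = \frac{\bar{d}}{\sqrt{S_d^2 / R}}, \qquad \hat{V}(\bar{d}) = \frac{S_d^2}{R}$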
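The three t statistics for comparing two means (pooled equal-variance, unequal-variance, and paired) can be computed side by side; each function returns the statistic and its degrees of freedom, to be compared with a t table. This is a sketch with illustrative names and made-up example data, not a replacement for a statistics package.

```python
import math


def _mean_var(ys: list[float]) -> tuple[float, float]:
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(ys)
    m = sum(ys) / n
    return m, sum((y - m) ** 2 for y in ys) / (n - 1)


def pooled_t(y1: list[float], y2: list[float]) -> tuple[float, int]:
    """Equal-variance t statistic; df = R1 + R2 - 2."""
    m1, v1 = _mean_var(y1)
    m2, v2 = _mean_var(y2)
    r1, r2 = len(y1), len(y2)
    sp2 = ((r1 - 1) * v1 + (r2 - 1) * v2) / (r1 + r2 - 2)   # pooled variance S_p²
    return (m1 - m2) / math.sqrt(sp2 / r1 + sp2 / r2), r1 + r2 - 2


def welch_t(y1: list[float], y2: list[float]) -> tuple[float, float]:
    """Unequal-variance t statistic with the approximate degrees of freedom."""
    m1, v1 = _mean_var(y1)
    m2, v2 = _mean_var(y2)
    a, b = v1 / len(y1), v2 / len(y2)                        # S1²/R1, S2²/R2
    df = (a + b) ** 2 / (a * a / (len(y1) - 1) + b * b / (len(y2) - 1))
    return (m1 - m2) / math.sqrt(a + b), df


def paired_t(y1: list[float], y2: list[float]) -> tuple[float, int]:
    """Paired t statistic on d_i = y1_i - y2_i; df = R - 1."""
    d = [u - v for u, v in zip(y1, y2)]
    dbar, sd2 = _mean_var(d)
    return dbar / math.sqrt(sd2 / len(d)), len(d) - 1


if __name__ == "__main__":
    y1 = [10.0, 12.0, 11.0, 13.0, 14.0]   # made-up replication results, case 1
    y2 = [9.0, 11.0, 11.0, 12.0, 12.0]    # made-up replication results, case 2
    print(pooled_t(y1, y2), welch_t(y1, y2), paired_t(y1, y2))
```

For these data the paired statistic (about 3.16 with 4 df) is much larger than the unpaired ones (about 1.12), because pairing removes the replication-to-replication variation the two cases share.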
Test for comparing two variances
F-test for equal variance
1. $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$
2. Test statistic: $F = S_1^2 / S_2^2$
3. Critical value: $F_{R_1 - 1,\, R_2 - 1,\, \alpha/2}$
4. Example
F = 5.4/2.55 = 2.12
α = 0.10, F_critical = F(9, 9, 0.05) = 3.18, cannot reject H0
Common Random Number
The process of comparing cases with the
same set of random numbers,
creating identical conditions
Observation
Confidence interval
$(\bar{Y}_1 - \bar{Y}_2) \pm t_{R-1,\alpha/2}\sqrt{\hat{V}(\bar{Y}_1 - \bar{Y}_2)}$
$V(\bar{Y}_1 - \bar{Y}_2) = V(\bar{Y}_1) + V(\bar{Y}_2) - 2\,Cov(\bar{Y}_1, \bar{Y}_2)$
Use the paired test
Random Numbers
Generation of U(0,1) random numbers:
algorithm used by the RND function
Generation of random variates from
various distributions: algorithms used by
EXPONENTIAL, UNIFORM, and so on
(these algorithms use U(0,1) random
numbers).
Random Number Generation
Desirable properties
Fast and efficient
Capable of repeating same sequence
Statistically equivalent to U(0,1)
Independent and dense
Large cycle length or period
Low storage requirements
Old method: tables
Random Number Generation
Pseudo random number generators
A nonrandom (deterministic) sequence of numbers, each
completely determined by its predecessor, the
algorithm, and initially the seed.
Linear Congruential Generator
Zi = ( a * Zi-1 + C ) mod m
Z0 = seed
Ui = Zi / m (Random Number)
If we choose a, C, and m correctly, => then
we achieve a maximum period
0<= Zi <= m-1
Linear Congruential Generator
Rules for a full period:
(a) c is relatively prime to m: the only positive integer
that exactly divides both c and m is 1.
(b) Every prime factor of m is also a prime factor
of a - 1.
(c) If m is exactly divisible by 4, then a - 1 must
be exactly divisible by 4.
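A minimal sketch of a linear congruential generator together with a check of rules (a) to (c). The toy parameters (a = 5, c = 3, m = 16) are chosen only so the cycle can be verified by hand; they are far too small for real use, and the helper names are hypothetical.

```python
from math import gcd


def prime_factors(n):
    """Set of distinct prime factors of n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a, c, m):
    """Check rules (a)-(c) for a full period of length m."""
    if gcd(c, m) != 1:                               # (a) c relatively prime to m
        return False
    if any((a - 1) % p for p in prime_factors(m)):   # (b) primes of m divide a - 1
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:              # (c) 4 | m implies 4 | a - 1
        return False
    return True


def lcg(a, c, m, seed):
    """Yield (Z_i, U_i) with Z_i = (a * Z_{i-1} + c) mod m and U_i = Z_i / m."""
    z = seed
    while True:
        z = (a * z + c) % m
        yield z, z / m


def cycle_length(a, c, m, seed):
    """Length of the cycle that the sequence eventually enters."""
    first_seen, z, i = {}, seed, 0
    while z not in first_seen:
        first_seen[z] = i
        z = (a * z + c) % m
        i += 1
    return i - first_seen[z]


print(has_full_period(5, 3, 16), cycle_length(5, 3, 16, seed=7))   # True 16
print(has_full_period(5, 2, 16), cycle_length(5, 2, 16, seed=1))   # False 8
```

With c = 2, rule (a) fails because 2 divides both c and m, and the cycle shrinks to half the modulus.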
Linear Congruential Generator
A full period does NOT always mean a
good random number generator
Multiplicative Generators
$Z_i = a Z_{i-1} \bmod m$
$Z_0$ = seed
Saves an addition; more popular
Multiplicative Generators
c = 0
Then m divides both m and c, so
condition (a) is violated
Not full period
p = m - 1 is the largest available period (attainable when m is prime and a is chosen appropriately)
Multiplicative Generators
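$m = 2^b$ is not a good choice for m: with c = 0 the period is at most $2^{b-2}$,
so only a quarter of the possible numbers can be generated
Let $m = 2^b - 1$ (for example, the prime $2^{31} - 1$)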
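A minimal sketch of a multiplicative generator. The default constants a = 16807 and m = 2^31 - 1 are the classic "minimal standard" values of Park and Miller, used here as an assumption; the slides do not prescribe them. The helper `multiplicative_period` is hypothetical and illustrates the short period obtained with $m = 2^b$.

```python
def multiplicative_generator(seed, a=16807, m=2**31 - 1):
    """Yield (Z_i, U_i) with Z_i = a * Z_{i-1} mod m and U_i = Z_i / m.

    The seed must satisfy 0 < seed < m: with c = 0, a zero seed stays at zero.
    """
    if not 0 < seed < m:
        raise ValueError("seed must lie in 1..m-1")
    z = seed
    while True:
        z = (a * z) % m
        yield z, z / m


def multiplicative_period(a, m, seed):
    """Steps until the sequence returns to its seed.

    Assumes a and seed are relatively prime to m, so the sequence is purely periodic.
    """
    z, steps = seed, 0
    while True:
        z = (a * z) % m
        steps += 1
        if z == seed:
            return steps


print(multiplicative_period(5, 2**4, seed=1))   # 4 = 2**(4 - 2): a quarter of 0..15
gen = multiplicative_generator(seed=1)
print([next(gen)[0] for _ in range(3)])         # [16807, 282475249, 1622650073]
```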
Testing a random number generator
Testing the distribution
Generate 1000 or more observations
Chi-square ($\chi^2$) test or K-S test for U(0, 1)
Use 100 intervals
Test for independence
Runs-up test
Tests designed to compare observed and
expected distribution
For U(a, b) with a = 0, b = 1: $E(X) = (a+b)/2 = 0.5$, $V(X) = (b-a)^2/12 = 1/12$
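A minimal sketch of the chi-square test for U(0,1) with 100 equal intervals. Because the standard library has no chi-square quantile function, this sketch swaps in the Wilson-Hilferty approximation for the critical value; a table value can be used instead. The function names and the seed are hypothetical.

```python
import random
from statistics import NormalDist


def chi_square_uniform(us, k=100):
    """Chi-square statistic for H0: the numbers us are U(0, 1), using k equal intervals."""
    counts = [0] * k
    for u in us:
        counts[min(int(u * k), k - 1)] += 1   # interval index; guards u == 1.0
    expected = len(us) / k
    return sum((c - expected) ** 2 for c in counts) / expected


def chi_square_critical(df, alpha=0.05):
    """Wilson-Hilferty approximation to the upper-alpha chi-square critical value."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2 / (9 * df)
    return df * (1 - h + z * h ** 0.5) ** 3


rng = random.Random(2024)
stat = chi_square_uniform([rng.random() for _ in range(10_000)])
print(stat, chi_square_critical(99))   # reject uniformity if stat exceeds the critical value
```

With k intervals the statistic has k - 1 degrees of freedom, hence `chi_square_critical(99)` for 100 intervals.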
Random variate generation
Assume a random number generator is
available to generate Ui ~ U(0, 1)
Goal: Generate Xi from a specified
distribution f(x) or p(x) or F(x)
Three methods
Inverse transformation method
Convolution method
Acceptance/rejection method
Random variate generation
Apply these methods to the five
distributions we are using in this class
Uniform
Triangular
Exponential
Normal
Poisson
Inverse transformation method
General idea: use the CDF
Select Ui
Find the corresponding xi such that F(xi) = Ui
That is, $x_i = F^{-1}(U_i)$
Advantage of the inverse transformation method:
One Ui per xi
Disadvantage:
The inverse of the CDF may not exist in closed form
Inverse transformation method
Exponential distribution
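$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$
$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0$
$U_i = F(X_i) = 1 - e^{-\lambda X_i}$
$1 - U_i = e^{-\lambda X_i}$
$\ln(1 - U_i) = -\lambda X_i$
$X_i = -\frac{1}{\lambda} \ln(1 - U_i)$, which has the same distribution as $-\frac{1}{\lambda} \ln(U_i)$, because $1 - U_i$ is also U(0, 1)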
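The derivation above translates into a one-line generator. This sketch uses $-\frac{1}{\lambda}\ln(1 - U)$ because Python's `random()` returns values in [0, 1), which keeps the logarithm's argument in (0, 1]. The function name and seed are hypothetical.

```python
import math
import random


def exponential_variate(lam, u):
    """Inverse transformation: X = -(1/lam) ln(1 - U)."""
    return -math.log(1.0 - u) / lam


rng = random.Random(42)
sample = [exponential_variate(2.0, rng.random()) for _ in range(100_000)]
print(sum(sample) / len(sample))   # should be close to 1/lam = 0.5
```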
Inverse transformation method
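Triangular distribution with minimum a, mode b, and maximum c:
$$u_i = F(x) = \begin{cases} \dfrac{(x - a)^2}{(b - a)(c - a)}, & a \le x \le b \\[2ex] 1 - \dfrac{(c - x)^2}{(c - b)(c - a)}, & b \le x \le c \end{cases}$$
Is $u_i \le \dfrac{b - a}{c - a}$?
Yes: $x_i = a + \sqrt{(b - a)(c - a)\,u_i}$
No: $x_i = c - \sqrt{(c - b)(c - a)(1 - u_i)}$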
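A minimal sketch of the two-branch inverse above, keeping the slide's notation (a = minimum, b = mode, c = maximum). The function name, test parameters, and seed are hypothetical.

```python
import math
import random


def triangular_variate(a, b, c, u):
    """Inverse transformation for a triangular distribution.

    a = minimum, b = mode, c = maximum.
    """
    if u <= (b - a) / (c - a):                          # "Yes" branch
        return a + math.sqrt((b - a) * (c - a) * u)
    return c - math.sqrt((c - b) * (c - a) * (1.0 - u))  # "No" branch


rng = random.Random(7)
sample = [triangular_variate(0.0, 1.0, 3.0, rng.random()) for _ in range(100_000)]
print(sum(sample) / len(sample))   # should be close to (a + b + c) / 3 = 4/3
```

At $u_i = (b-a)/(c-a)$ both branches return the mode b, so the generator is continuous at the switch point.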
Convolution Method
Applicable to situations where the random
variable of interest can be expressed as a
sum of other random variables that are IID
(independent and identically distributed)
$X = Y_1 + Y_2 + Y_3 + \cdots + Y_n$
Idea: generate $Y_1, \ldots, Y_n$ and add them up
to calculate X
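As an illustration of the convolution idea, an Erlang-k variate is the sum of k IID exponential variates, each generated by the inverse transformation. The choice of the Erlang distribution here is an example, not taken from this slide; the function name, parameters, and seed are hypothetical.

```python
import math
import random


def erlang_variate(k, lam, rng):
    """Erlang-k variate as the sum of k IID exponential(lam) variates."""
    return sum(-math.log(1.0 - rng.random()) / lam for _ in range(k))


rng = random.Random(11)
xs = [erlang_variate(3, 2.0, rng) for _ in range(50_000)]
print(sum(xs) / len(xs))   # should be close to k / lam = 1.5
```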
Convolution Method
Normal distribution
Focus: Generating Zi ~ N(0, 1)
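$Z_i = \dfrac{x_i - \mu}{\sigma}$, so $x_i = \mu + \sigma Z_i \sim N(\mu, \sigma^2)$
Generating Zi
$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$
Inverse transformation: F(x) does not exist in closed form
Acceptance/rejection: the domain is not bounded
Convolution Method
Normal distribution
Generate Ui
Generate Zi
Then $X_i = \mu + \sigma Z_i$, where $Z_i \sim N(0,1)$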
Acceptance/Rejection Method
Applicable to distribution functions that
are hard to integrate
Idea:
Find a majorizing function t(x) with t(x) ≥ f(x) for all x
Sample a value x* from the density proportional to t(x)
Generate Ui; if Ui ≤ f(x*) / t(x*), accept x*; otherwise reject it and try again
Simplification: for this class we will
always use a rectangular majorizing function
9.3 Random Numbers and Monte Carlo Simulation
The procedure of generating these times from the
given probability distributions is known as
sampling from probability distributions, or
random variate generation, or Monte Carlo
sampling.
We will discuss several different methods of
sampling from discrete distributions.
The principle of sampling from discrete
distributions is based on the frequency
interpretation of probability.
In addition to obtaining the right frequencies, the
sampling procedure should be independent; that is,
each generated service time should be independent
of the service times that precede it and follow it.
This procedure of segmentation and using a roulette
wheel is equivalent to generating integer random
numbers between 00 and 99.
This follows from the fact that each random number
in a sequence has an equal probability of showing
up, and each random number is independent of the
numbers that precede and follow it.
A random number, Ri, is defined as an
independent random sample drawn from a
continuous uniform distribution whose
probability density function (pdf) is given by
$$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Random Number Generators
Since our interest in random numbers is for use
within simulations, we need to be able to generate
them on a computer.
This is done by using mathematical functions called
random number generators.
Most random number generators use some form of a
congruential relationship. Examples of such
generators include the linear congruential generator, the
multiplicative generator, and the mixed generator.
The linear congruential generator is by far the most
widely used.
Each random number generated using these methods
will be a decimal number between 0 and 1.
Random numbers generated using congruential
methods are called pseudorandom numbers.
Random number generators must have these
important characteristics:
1. The routine must be fast.
2. The routine should not require a lot of core storage.
3. The random numbers should be replicable; and
4. The routine should have a sufficiently long cycle.
Most programming languages have built-in
library functions that provide random (or
pseudorandom) numbers directly.
Computer Generation of Random Numbers
We now take the method of Monte Carlo sampling
a stage further and develop a procedure using
random numbers generated on a computer.
The idea is to transform the U(0,1) random
numbers into integer random numbers between 00
and 99 and then to use these integer random
numbers to achieve the segmentation by numbers.
We now formalize this procedure and use it to
generate random variates for a discrete random
variable.
The procedure consists of two steps:
1. We develop the cumulative probability
distribution (cdf) for the given random
variable, and
2. We use the cdf to allocate the integer random
numbers directly to the various values of the
random variables.
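The two steps above can be sketched as follows. The service-time distribution (values 1 to 4 with probabilities 0.10, 0.30, 0.40, 0.20) is hypothetical and not taken from the text; so are the function names and the seed.

```python
import random

# Hypothetical service-time distribution (not taken from the text)
VALUES = [1, 2, 3, 4]
PROBS = [0.10, 0.30, 0.40, 0.20]


def allocate_integers(probs):
    """Step 1: use the cdf to give each value a block of the integers 00-99.

    Returns the largest integer assigned to each value.
    """
    upper, cumulative = [], 0.0
    for p in probs:
        cumulative += p
        upper.append(round(cumulative * 100) - 1)   # round() absorbs float error
    return upper


def sample_discrete(values, upper, u):
    """Step 2: turn a U(0,1) number into an integer 00-99 and find its block."""
    k = int(u * 100)                  # integer random number 00..99
    for value, last in zip(values, upper):
        if k <= last:
            return value
    return values[-1]


upper = allocate_integers(PROBS)      # [9, 39, 79, 99]: 00-09, 10-39, 40-79, 80-99
rng = random.Random(3)
draws = [sample_discrete(VALUES, upper, rng.random()) for _ in range(100_000)]
print(draws.count(3) / len(draws))    # should be close to 0.40
```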
9.4 An Example of Monte Carlo Simulation
The book uses a Monte Carlo simulation to
simulate a news vendor problem.
The procedure in this simulation is different from
the queuing simulation, in that the present
simulation does not evolve over time in the same
way.
Here, every day is an independent simulation.
Such simulations are commonly referred to as
Monte Carlo simulations.
9.5 Simulations with Continuous Random Variables
In many simulations, it is more realistic and
practical to use continuous random variables.
We present and discuss several procedures for
generating random variates from continuous
distributions.
The basic principle is similar to the discrete case.
We first generate a U(0,1) random number and then
transform it into a random variate from the
specified distribution.
The selection of a particular algorithm will
depend on the distribution from which we want
to generate, taking into account such factors as
the exactness of the random variates, the
computation and storage efficiencies, and the
complexity of the algorithm.
The two most commonly used algorithms are the
inverse transformation method (ITM) and the
acceptance-rejection method (ARM).
Inverse Transformation Method
The inverse transformation method is generally used
for distributions whose cumulative distribution
function can be obtained in closed form.
Examples include the exponential, the uniform, the
triangular, and the Weibull distributions.
For distributions whose cdf does not exist in closed
form, it may be possible to use some numerical
method, such as a power-series expansion, within
the algorithm to evaluate the cdf.
The ITM is relatively easy to describe and
execute.
It consists of the following steps:
Step 1: Given the probability density function f(x) for a
random variable X, obtain the cumulative distribution
function F(x) as
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
Step 2: Generate a random number r.
Step 3: Set F(x) = r and solve for x.
We consider the distribution given by the function
$$f(x) = \begin{cases} \dfrac{x}{2}, & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$
A function of this type is called a ramp function.
To obtain random variates from the distribution
using the inverse transformation method, we first
compute the cdf as
$$F(x) = \int_0^x \frac{t}{2}\,dt = \frac{x^2}{4}, \quad 0 \le x \le 2$$
In Step 2, we generate a random number r.
Finally, in Step 3, we set F(x) = r and solve for x:
$$\frac{x^2}{4} = r \quad \Rightarrow \quad x = \pm 2\sqrt{r}$$
Since the service times are defined only for positive
values of x, a service time of $x = -2\sqrt{r}$ is not feasible,
which leaves $x = 2\sqrt{r}$ as the solution
for x. This equation is called a random variate
generator or a process generator.
Thus, to obtain a service time, we first generate a
random number and then transform it using the
preceding equation.
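The process generator for the ramp function is short enough to check numerically. In this sketch the function name and seed are hypothetical; the sample mean is compared against $E[X] = \int_0^2 x \cdot \frac{x}{2}\,dx = \frac{4}{3}$.

```python
import math
import random


def ramp_variate(r):
    """Process generator x = 2*sqrt(r) for f(x) = x/2 on [0, 2]."""
    return 2.0 * math.sqrt(r)


rng = random.Random(5)
times = [ramp_variate(rng.random()) for _ in range(100_000)]
print(sum(times) / len(times))   # should be close to E[X] = 4/3
```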
As this example shows, the major
advantage of the inverse transformation
method is its simplicity and ease of
application.
Acceptance Rejection Method
There are several important distributions,
including the Erlang (used in queuing models) and
the beta (used in PERT), whose cumulative
distribution functions do not exist in closed form.
For these distributions, we must resort to other
methods of generating random variates, one of
which is the acceptance-rejection method
(ARM).
This method is generally used for distributions
whose domains are defined over finite intervals.
Given a distribution whose pdf, f(x), is defined
over the interval $a \le x \le b$, the algorithm consists
of the following steps:
Step 1: Select a constant M such that M is the largest
value of f(x) over the interval [a, b].
Step 2: Generate two random numbers, r1 and r2.
Step 3: Compute x* = a + (b - a)r1. (This ensures that
each member of [a, b] has an equal chance to be chosen
as x*.)
Step 4: Evaluate the function f(x) at the point x*. Let
this be f(x*).
Step 5: If
$$r_2 \le \frac{f(x^*)}{M},$$
deliver x* as a random variate from the distribution whose
pdf is f(x). Otherwise, reject x* and go back to Step 2.
Note that the algorithm continues looping back to
Step 2 until a random variate is accepted.
This may take several iterations. For this reason, the
algorithm can be relatively inefficient.
The efficiency, however, is highly dependent on the
shape of the distribution.
There are several ways by which the method can
be made more efficient.
One of these is to use a function in Step 1 instead
of a constant.
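A minimal sketch of Steps 2 to 5 with a constant M, applied (as a hypothetical test case) to the ramp pdf f(x) = x/2 on [0, 2], where M = f(2) = 1. It also counts attempts, which illustrates the efficiency remark: with a constant M, the probability of accepting is $1/(M(b-a))$, here 0.5. Function names and the seed are hypothetical.

```python
import random


def arm_variate(f, a, b, M, rng):
    """Acceptance-rejection with a constant M (Steps 2-5).

    Returns the accepted x* and the number of attempts it took.
    """
    attempts = 0
    while True:
        attempts += 1
        r1, r2 = rng.random(), rng.random()   # Step 2
        x_star = a + (b - a) * r1             # Step 3
        if r2 <= f(x_star) / M:               # Steps 4-5
            return x_star, attempts


def ramp_pdf(x):
    return x / 2.0   # f(x) = x/2 on [0, 2]; its largest value is M = 1


rng = random.Random(9)
results = [arm_variate(ramp_pdf, 0.0, 2.0, 1.0, rng) for _ in range(50_000)]
acceptance_rate = len(results) / sum(n for _, n in results)
print(acceptance_rate)   # should be close to 1 / (M * (b - a)) = 0.5
```

A peaked pdf on a wide interval would need a large M, lowering the acceptance rate; this is why the efficiency depends on the shape of the distribution.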
We now give an intuitive justification of the
validity of the ARM.
In particular, we want to show that the ARM does
generate observations from the given random
variable X.
Direct and Convolution Methods for the Normal Distribution
Both the inverse transformation method and the
acceptance-rejection method are inappropriate for
the normal distribution, because (1) the cdf does
not exist in closed form and (2) the distribution
is not defined over a finite interval.
Other methods are used instead: an algorithm based on
convolution techniques, and a direct
transformation algorithm that produces two
standard normal variates with mean 0 and
variance 1.
The Convolution Algorithm
In the convolution algorithm, we make direct use of
the Central Limit Theorem.
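The Central Limit Theorem states that the sum Y of
n independent and identically distributed random
variables (say $Y_1, Y_2, \ldots, Y_n$, each with mean $\mu$ and
finite variance $\sigma^2$) is approximately normally
distributed with mean $n\mu$ and variance $n\sigma^2$.
Applied to n U(0,1) random numbers $r_1, \ldots, r_n$ (each with mean 1/2 and variance 1/12), this gives the process generator
$$Z = \frac{\sum_{i=1}^{n} r_i - n/2}{\sqrt{n/12}},$$
which for n = 12 reduces to $Z = \sum_{i=1}^{12} r_i - 6$.
If we want to generate a normal variate X with
mean $\mu$ and variance $\sigma^2$, we first generate Z using
this process generator and then transform it using the
relation $X = \mu + \sigma Z$. (This step relies on the fact that a linear transformation of a normal variate is again normal.)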
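A minimal sketch of the convolution algorithm with the common choice n = 12, standardizing the sum of U(0,1) numbers and then applying $X = \mu + \sigma Z$. The function name, parameters, and seed are hypothetical. Note that with n = 12 the generated Z can never fall outside [-6, 6], so the tails are only approximated.

```python
import math
import random
from statistics import mean, stdev


def normal_by_convolution(mu, sigma, rng, n=12):
    """Approximate N(mu, sigma^2) variate from n U(0,1) numbers via the CLT."""
    total = sum(rng.random() for _ in range(n))
    z = (total - n / 2) / math.sqrt(n / 12)   # with n = 12: z = total - 6
    return mu + sigma * z


rng = random.Random(13)
xs = [normal_by_convolution(5.0, 2.0, rng) for _ in range(50_000)]
print(mean(xs), stdev(xs))   # should be close to 5 and 2
```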
The Direct Method
The direct method for the normal distribution
was developed by Box and Muller (1958).
Although it is not as efficient as some of the newer
techniques, it is easy to apply and execute.
The algorithm generates two U(0,1) random
numbers, r1 and r2, and then transforms them into
two normal variates, each with mean 0 and
variance 1, using the direct transformation
$$Z_1 = (-2 \ln r_1)^{1/2} \sin(2\pi r_2)$$
$$Z_2 = (-2 \ln r_1)^{1/2} \cos(2\pi r_2)$$
It is easy to transform these standardized normal
variates into normal variates X1 and X2 from the
distribution with mean $\mu$ and variance $\sigma^2$, using
the equations
$$X_1 = \mu + \sigma Z_1, \qquad X_2 = \mu + \sigma Z_2$$
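A minimal sketch of the Box-Muller transformation above. It draws r1 as `1.0 - rng.random()` so that r1 lies in (0, 1] and the logarithm is always defined; the function name, parameters, and seed are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def box_muller(mu, sigma, rng):
    """Two independent N(mu, sigma^2) variates from two U(0,1) numbers."""
    r1 = 1.0 - rng.random()   # in (0, 1], so log(r1) is defined
    r2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(r1))
    z1 = radius * math.sin(2.0 * math.pi * r2)
    z2 = radius * math.cos(2.0 * math.pi * r2)
    return mu + sigma * z1, mu + sigma * z2


rng = random.Random(17)
xs = [x for _ in range(25_000) for x in box_muller(5.0, 2.0, rng)]
print(mean(xs), stdev(xs))   # should be close to 5 and 2
```

Unlike the 12-uniform convolution, this transformation is exact (up to floating point), and each pair of random numbers yields two variates.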
9.6 An Example of a Stochastic Simulation
Cabot Inc. is a large mail order firm in Chicago.
Orders arrive into the warehouse via telephones. At
present, Cabot maintains 10 operators on-line 24
hours a day.
The operators take the orders and feed them directly
into a central computer, using terminals.
Each operator has one terminal. At present, the
company has a total of 11 terminals.
That is, if all terminals are working, there will be 1
spare terminal.
Cabot managers believe that the terminal system
needs evaluation, because the downtime of operators
due to broken terminals has been excessive.
They feel that the problem can be solved by the
purchase of some additional terminals for the spares
pool.
It has been determined that a new terminal will cost
a total of $75 per week.
It has also been determined that the cost of terminal
downtime, in terms of delays, lost orders, and so on
is $1000 per week.
Given this information, the Cabot managers would like
to determine how many additional terminals they
should purchase.
This model is a version of the machine repair problem.
It is easy to find an analytical solution to the problem
using the birth-death processes.
However, in analyzing the historical data for the
terminals, it has been determined that although the
breakdown times can be represented by the exponential
distribution, the repair times can be adequately
represented only by the triangular distribution.
This implies that analytical methods cannot be used
and that we must use simulation.
To simulate this system, we first require the
parameters of both the distributions.
The data show that the breakdown rate is
exponential and equal to 1 per week per terminal.
In other words, the time between breakdowns for a terminal
is exponential with a mean equal to 1 week.
Analysis of the repair times shows that this
distribution can be represented by the triangular
distribution, which has a mean of 0.075 week:
$$f(x) = \begin{cases} 400x - 10, & 0.025 \le x \le 0.075 \\ 50 - 400x, & 0.075 \le x \le 0.125 \end{cases}$$
The repair staff on average can repair 1/0.075 ≈ 13.33
terminals per week.
To find the optimal number of terminals, we must
balance the cost of the additional terminals against
the increased revenues generated as a result of the
increase in the number of terminals.
In this simulation we increase the number of
terminals in the system, n, from the present total of
11 in increments of 1.
For this fixed value of n, we then run our simulation
model to estimate the net revenue.
Net revenue here is defined as the difference
between the increase in revenues due to the
additional terminals and the cost of these additional
terminals.
We keep on adding terminals until the net revenue
position reaches a peak.
To calculate the net revenue, we first compute the
average number of on-line terminals, ELn, for a
fixed number of terminals in the system, n.
Once we have a value of ELn, we can
compute the expected weekly downtime
costs, given by 1000(10 - ELn).
Then the increase in revenue as a result of
increasing the number of terminals from 11
to n is 1000(ELn - EL11). Mathematically,
we compute
$$EL_n = \frac{\int_0^T N(t)\,dt}{T} = \frac{\sum_{i=1}^{m} A_i}{T}$$
where
T = length of simulation
N(t) = number of terminals on-line at time t ($0 \le t \le T$)
Ai = area of rectangle under N(t) between $e_{i-1}$ and $e_i$
(where $e_i$ is the time of the ith event)
m = number of events that occur in the interval [0, T]
Between time 0 and time e1, the time of the first
event, the total on-line time for all the terminals is
given by 10e1, since each of the 10 on-line terminals is on-line for a
period of e1 time units.
If we now run this simulation over T time units and
sum up the areas A1, A2, A3, ..., we can get an
estimate for ELn by dividing this sum by T. This
statistic is called a time-average statistic.
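The time-average statistic is the sum of rectangle areas divided by T. This sketch assumes N(t) is recorded as a list of event times (starting at 0) and the level of N(t) that holds after each event; the function name and the short history are hypothetical.

```python
def time_average(event_times, levels, T):
    """Time-average of a step function N(t) on [0, T].

    event_times[0] must be 0; levels[i] is N(t) from event_times[i]
    until the next event (or until T for the last one).
    """
    area = 0.0
    for i, (start, level) in enumerate(zip(event_times, levels)):
        end = event_times[i + 1] if i + 1 < len(event_times) else T
        area += level * (end - start)   # rectangle A_i
    return area / T


# Hypothetical history: N(t) drops to 9 between t = 0.5 and t = 0.6 weeks
print(time_average([0.0, 0.5, 0.6], [10, 9, 10], T=1.0))   # approximately 9.9
```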
We would like to set up the process in such a way that
it will be possible to collect the statistics to
compute the areas A1, A2, A3, ....
That is, as we move from event to event, we would
like to keep track of at least the number of terminals
on-line between the events and the time between
events.
We first define the state of the system as the
number of terminals in the repair facility.
The only time the state of the system will change is
when there is either a breakdown or a completion
of a repair.
Therefore, there are two events in this simulation:
breakdown and completion of repairs.
To set up the simulation, our first task is to
determine the process generators for both the
breakdown and the repair times.
We use the ITM to develop the process generators.
For the exponential distribution the process
generator is simply x = -ln r.
In the case of the repair times, applying the ITM gives
us
$$x = 0.025 + \sqrt{0.005\,r}, \quad 0 \le r \le 0.5$$
and
$$x = 0.125 - \sqrt{0.005\,(1 - r)}, \quad 0.5 \le r \le 1.0$$
as the process generators.
For each n, we start the simulation in the state
where there are no terminals in the repair facility.
In this state, all 10 operators are on-line and any
remaining terminals are in the spares pool.
Our first action in the simulation is to schedule the
first series of events, the breakdown times for the
terminals presently on-line.
Having scheduled these events, we next determine
the first event, the first breakdown, by searching
through the current event list.
We then move the simulation clock to the time
of this event and process this breakdown.
To process a breakdown, we take two separate
series of actions
1. Determine whether a spare is available.
2. Determine whether the repair staff is idle.
These actions are summarized in the system
flow diagram shown in the book in Figure 17.
If the next event is not a breakdown, we process the completion of a repair.
To process the completion of a repair, we also
undertake two series of actions.
1. At the completion of a repair, we have an additional
working terminal, so we determine whether the terminal
goes directly to an operator or to the spares pool.
2. We check the repair queue to see whether any terminals
are waiting to be repaired.
We proceed with the simulation by moving from
event to event until the termination time T.
At this time, we calculate all the relevant measures
of performance from the statistical counters.
Our key measure is the net revenue for the current
value of n.
If this revenue is greater than the revenue for a
system with n-1 terminals, we increase the value of
n by 1 and repeat the simulation with n + 1
terminals in the system.
Otherwise, the net revenue has reached a peak.
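The procedure described above can be sketched as a compact event-driven simulation. Several details are assumptions not fixed by the text: a single repair server (the text only gives a repair rate of 13.33 terminals per week), spares in the pool do not break down, and the run length T and seeds are arbitrary. All function and constant names are hypothetical. Reusing one seed for every n is only a loose form of common random numbers, since the streams are not synchronized across different n.

```python
import heapq
import math
import random

OPERATORS = 10          # operators on-line 24 hours a day
TERMINAL_COST = 75.0    # $ per week per additional terminal
DOWNTIME_COST = 1000.0  # $ per week per operator without a terminal


def breakdown_time(rng):
    # Exponential, mean 1 week: x = -ln r (1 - r keeps the argument in (0, 1])
    return -math.log(1.0 - rng.random())


def repair_time(rng):
    # Triangular on [0.025, 0.125] with mode 0.075 (ITM generators from the text)
    r = rng.random()
    if r <= 0.5:
        return 0.025 + math.sqrt(0.005 * r)
    return 0.125 - math.sqrt(0.005 * (1.0 - r))


def average_online(n, T, seed):
    """Estimate ELn, the time-average number of on-line terminals over [0, T]."""
    rng = random.Random(seed)
    online = min(OPERATORS, n)       # start with no terminals in repair
    spares = n - online
    waiting, repairing = 0, False    # single repair server (assumption)
    events = [(breakdown_time(rng), "breakdown") for _ in range(online)]
    heapq.heapify(events)
    clock = area = 0.0
    while events and events[0][0] <= T:
        t, kind = heapq.heappop(events)
        area += online * (t - clock)                 # rectangle A_i
        clock = t
        if kind == "breakdown":
            online -= 1
            if spares > 0:                           # a spare replaces it
                spares -= 1
                online += 1
                heapq.heappush(events, (clock + breakdown_time(rng), "breakdown"))
            if repairing:
                waiting += 1                         # join the repair queue
            else:
                repairing = True
                heapq.heappush(events, (clock + repair_time(rng), "repair"))
        else:                                        # completion of a repair
            if online < OPERATORS:                   # goes straight to an operator
                online += 1
                heapq.heappush(events, (clock + breakdown_time(rng), "breakdown"))
            else:
                spares += 1                          # goes to the spares pool
            if waiting > 0:
                waiting -= 1
                heapq.heappush(events, (clock + repair_time(rng), "repair"))
            else:
                repairing = False
    area += online * (T - clock)
    return area / T


def best_terminal_count(T=2000.0, seed=12345, start=11):
    """Add terminals one at a time until the net revenue stops increasing."""
    base = average_online(start, T, seed)            # EL11
    best_n, best_net = start, 0.0
    n = start
    while True:
        n += 1
        el = average_online(n, T, seed)
        net = DOWNTIME_COST * (el - base) - TERMINAL_COST * (n - start)
        if net <= best_net:
            return best_n, best_net
        best_n, best_net = n, net


print(best_terminal_count())
```

Because simulated revenues are noisy, a careful study would compare successive values of n with replications and a paired test rather than a single run.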
The simulation outlined in this example can be
used to analyze other policy options that
management may have.
The simulation model provides a very
flexible mechanism for evaluating
alternative policies.