Unit 6 Input Modeling: Collect Data From The Real System of Interest

Input modeling is crucial for simulation, requiring data collection, identification of probability distributions, parameter selection, and goodness-of-fit evaluation. The document outlines steps for effective data collection and emphasizes the importance of planning, analyzing data, and recognizing potential issues like data censoring. It also discusses various probability distributions and their applications in modeling different processes.

Uploaded by

h210154y

UNIT 6

INPUT MODELING

• Input data provide the driving force for a simulation model. In the simulation of a queuing system, typical input data are the distributions of time between arrivals and service times.

• For the simulation of a reliability system, the distribution of time-to-failure of a component is an example of input data.

However, coming up with an input model is a sophisticated task.

There are four steps in the development of a useful model of input data:

• Collect data from the real system of interest.

This often requires a substantial time and resource commitment. Unfortunately, in some situations it is not possible to collect data.

• Identify a probability distribution to represent the input process.

When data are available, this step typically begins by developing a frequency distribution, or histogram, of the data. Consider all of the probability distributions discussed, and note which of them give good results.

• Choose parameters that determine a specific instance of the distribution family.

When data are available, these parameters may be estimated from the data.

• Evaluate the chosen distribution and the associated parameters for goodness of fit.

Goodness of fit may be evaluated informally via graphical methods, or formally via statistical tests. The chi-square and the Kolmogorov-Smirnov tests are standard goodness-of-fit tests. If not satisfied that the chosen distribution is a good approximation of the data, the analyst returns to the second step, chooses a different family of distributions, and repeats the procedure. If several iterations of this procedure fail to yield a fit between an assumed distributional form and the data collected, the empirical form of the distribution can be used.
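
As an illustration of the fourth step, the following is a minimal sketch of the Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution of the data and the fitted distribution. The sample values, the function names, and the choice of an exponential candidate are assumptions made for the example; they do not come from the notes.

```python
import math


def ks_statistic(data, cdf):
    """Return D = max |F_n(x) - F(x)| over the sorted sample."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_cdf(rate):
    """Build the CDF of an exponential distribution with the given rate."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


# Hypothetical interarrival times (illustration only).
sample = [0.4, 1.9, 0.7, 3.2, 0.1, 1.1, 0.6, 2.4]
rate_hat = len(sample) / sum(sample)  # rate estimated as 1 / sample mean
d = ks_statistic(sample, exponential_cdf(rate_hat))
print(f"K-S statistic D = {d:.3f}")
```

The value of D would then be compared with a tabulated critical value for the sample size. Note that when the parameter is estimated from the same data, as it is here, the standard critical values are only approximate.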
6.1 Data Collection
Data collection is one of the biggest tasks in solving a real problem. It is one of the most
important and difficult problems in simulation. And, even when data are available, they have
rarely been recorded in a form that is directly useful for simulation input modeling.

The following suggestions may enhance and facilitate data collection, although they are not
all-inclusive.
1. A useful expenditure of time is in planning. This could begin with a practice or preobserving
session. Try to collect data while preobserving. Watch for unusual circumstances, and consider
how they will be handled. When possible, videotape the system and extract the data later by
viewing the tape. Planning is important, even if data will be collected automatically (e.g., via
computer data collection), to ensure that the appropriate data are available.
2. Try to analyze the data as they are being collected. Determine whether any data being
collected are useless to the simulation. There is no need to collect superfluous data. The data
must be adequate for fitting the distribution used in the input model.
3. Try to combine homogeneous data sets. Check data for homogeneity in successive time
periods and during the same time period on successive days.
4. Be aware of the possibility of data censoring, in which a quantity of interest is not observed
in its entirety. This problem most often occurs when the analyst is interested in the time required
to complete some process (for example, produce a part, treat a patient, or have a component
fail), but the process begins prior to, or finishes after the completion of, the observation period.
5. To determine whether there is a relationship between two variables, build a scatter diagram.
6. Consider the possibility that a sequence of observations which appear to be independent
may possess autocorrelation. Autocorrelation may exist in successive time periods or for
successive customers.
7. Keep in mind the difference between input data and output or performance data, and be sure
to collect input data. Input data typically represent the uncertain quantities that are largely
beyond the control of the system and will not be altered by changes made to improve the
system.
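
The autocorrelation check in suggestion 6 can be sketched as follows. This is one common estimator of the lag-k sample autocorrelation, not necessarily the one the course uses; the function name and the service-time values are hypothetical.

```python
from statistics import mean


def lag_autocorrelation(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    m = mean(xs)
    # Sum of squared deviations over the whole sequence.
    denom = sum((x - m) ** 2 for x in xs)
    # Products of deviations for observations that are `lag` apart.
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag))
    return num / denom


# Hypothetical service times for successive customers (illustration only).
service_times = [4.1, 4.3, 4.8, 5.2, 5.0, 4.6, 4.2, 3.9, 4.0, 4.4]
print(f"lag-1 autocorrelation = {lag_autocorrelation(service_times):.3f}")
```

A value near 0 is consistent with independent observations, while a value well away from 0 suggests that successive observations are related and should not be modeled as independent draws.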
6.2.1 Histogram

A frequency distribution or histogram is useful in identifying the shape of a distribution.

A histogram is constructed as follows:

1. Divide the range of the data into intervals (intervals are usually of equal width; however, unequal widths may be used if the heights of the frequencies are adjusted).

2. Label the horizontal axis to conform to the intervals selected.

3. Determine the frequency of occurrences within each interval.

4. Label the vertical axis so that the total occurrences can be plotted for each interval.

5. Plot the frequencies on the vertical axis.

• If the intervals are too wide, the histogram will be coarse, or blocky, and its shape and other details will not show well. If the intervals are too narrow, the histogram will be ragged and will not smooth the data.

• The histogram for continuous data corresponds to the probability density function of a theoretical distribution.

The number of class intervals depends on the number of observations and on the amount of scatter or dispersion in the data.

Choosing the number of class intervals approximately equal to the square root of the sample size often works well in practice.
Include an example and a diagram.

Discrete Example

The number of vehicles arriving at the northwest corner of an intersection in a 5-minute
period between 7:00 A.M. and 7:05 A.M. was monitored for five workdays over a 20-week
period. Table 9.1 shows the resulting data. The first entry in the table indicates that there
were 12 5-minute periods during which zero vehicles arrived, 10 periods during which one
vehicle arrived, and so on. The number of automobiles is a discrete variable, and there are
ample data, so the histogram may have a cell for each possible value in the range of the data.
The resulting histogram is shown in Figure 9.2.
Continuous Example
Life tests were performed on a random sample of electronic components at 1.5 times the
nominal voltage, and their lifetime (or time to failure), in days, was recorded:
 79.919    3.081    0.062    1.961    5.845
  3.027    6.505    0.021    0.013    0.123
  6.769   59.899    1.192   34.760    5.009
 18.387    0.141   43.565   24.420    0.433
144.95     2.663   17.967    0.091    9.003
  0.941    0.878    3.371    2.157    7.579
  0.624    5.380    3.148    7.078   23.960
  0.590    1.928    0.300    0.002    0.543
  7.004   31.764    1.005    1.147    0.219
  3.217   14.382    1.008    2.336    4.562
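
The histogram construction steps can be sketched on the lifetime data above. The sketch assumes equal-width intervals and uses the square-root rule for the number of intervals; the function name and the text-bar output are choices made for this example only.

```python
import math

# Lifetimes in days, copied from the table above.
lifetimes = [
    79.919, 3.081, 0.062, 1.961, 5.845,
    3.027, 6.505, 0.021, 0.013, 0.123,
    6.769, 59.899, 1.192, 34.760, 5.009,
    18.387, 0.141, 43.565, 24.420, 0.433,
    144.95, 2.663, 17.967, 0.091, 9.003,
    0.941, 0.878, 3.371, 2.157, 7.579,
    0.624, 5.380, 3.148, 7.078, 23.960,
    0.590, 1.928, 0.300, 0.002, 0.543,
    7.004, 31.764, 1.005, 1.147, 0.219,
    3.217, 14.382, 1.008, 2.336, 4.562,
]


def histogram_counts(data, num_intervals=None):
    """Return (edges, counts) for equal-width intervals over the data range."""
    if num_intervals is None:
        # Square-root rule: about sqrt(n) class intervals.
        num_intervals = max(1, round(math.sqrt(len(data))))
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_intervals
    if width == 0:
        # All observations are equal: a single interval holds everything.
        return [lo, hi], [len(data)]
    counts = [0] * num_intervals
    for x in data:
        # The maximum value is placed in the last interval.
        k = min(int((x - lo) / width), num_intervals - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(num_intervals + 1)]
    return edges, counts


edges, counts = histogram_counts(lifetimes)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"[{left:8.3f}, {right:8.3f})  {'*' * c}")
```

For these data, most of the 50 observations fall in the first interval, which illustrates the warning about intervals that are too wide: with strongly skewed data, a narrower interval width may show the shape near zero more clearly.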
6.2.2 Selecting the Family of Distributions
Additionally, the shapes of these distributions were displayed. The purpose of preparing a
histogram is to infer a known pdf or pmf. A family of distributions is selected on the basis of
what might arise in the context being investigated, along with the shape of the histogram.
Thus, if interarrival-time data have been collected, and the histogram has a shape similar to the pdf
in Figure [Link], the assumption of an exponential distribution would be warranted.
• Similarly, if measurements of weights of pallets of freight are being made, and the histogram
appears symmetric about the mean with a shape like that shown in Fig 5.12, the assumption of a
normal distribution would be warranted.
• The exponential, normal, and Poisson distributions are frequently encountered and are not
difficult to analyze from a computational standpoint. Although more difficult to analyze, the
gamma and Weibull distributions provide a wide array of shapes and should not be overlooked
when modeling an underlying probabilistic process. Perhaps an exponential distribution was
assumed, but it was found not to fit the data. The next step would be to examine where the lack
of fit occurred.
• If the lack of fit was in one of the tails of the distribution, perhaps a gamma or Weibull
distribution would more adequately fit the data.
• Literally hundreds of probability distributions have been created, many with some specific
physical process in mind. One aid to selecting distributions is to use the physical basis of the
distributions as a guide. Here are some examples:
Binomial : Models the number of successes in n trials, when the trials are independent with
common success probability, p; for example, the number of defective computer chips found in a
lot of n chips.
Negative Binomial (includes the geometric distribution) : Models the number of trials required to
achieve k successes; for example, the number of computer chips that we must inspect to find 4
defective chips.
Poisson : Models the number of independent events that occur in a fixed amount of time or
space; for example, the number of customers that arrive to a store during 1 hour, or the number
of defects found in 30 square meters of sheet metal.
Normal : Models the distribution of a process that can be thought of as the sum of a number of
component processes; for example, the time to assemble a product, which is the sum of the times
required for each assembly operation. Notice that the normal distribution admits negative values,
which may be impossible for process times.
Lognormal : Models the distribution of a process that can be thought of as the product of
(meaning to multiply together) a number of component processes; for example, the rate of return
on an investment, when interest is compounded, is the product of the returns for a number of
periods.
Exponential : Models the time between independent events, or a process time which is
memoryless (knowing how much time has passed gives no information about how much
additional time will pass before the process is complete); for example, the times between the
arrivals of a large number of customers who act independently of each other.
Gamma : An extremely flexible distribution used to model nonnegative random variables. The
gamma can be shifted away from 0 by adding a constant.
Beta : An extremely flexible distribution used to model bounded (fixed upper and lower limits)
random variables. The beta can be shifted away from 0 by adding a constant and can have a
larger range than [0,1] by multiplying by a constant.
Weibull : Models the time to failure for components; for example, the time to failure for a disk
drive. The exponential is a special case of the Weibull.
Discrete or Continuous Uniform : Models complete uncertainty, since all outcomes are equally
likely. This distribution is often overused when there are no data.
Triangular : Models a process when only the minimum, most-likely, and maximum values of the
distribution are known; for example, the minimum, most-likely, and maximum time required to
test a product.
Empirical : Resamples from the actual data collected; often used when no theoretical
distribution seems appropriate.
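
To make several of these families concrete, here is a minimal sketch that draws one value from each using Python's standard `random` module. All parameter values, the seed, and the `observed` list are arbitrary assumptions chosen for illustration. The sketch also shows the shifting and scaling mentioned for the gamma and beta entries.

```python
import random

# Hypothetical observations, used only for the empirical (resampling) case.
observed = [2.3, 4.1, 3.7, 5.0, 2.9]


def sample_examples(rng):
    """Draw one value from several of the families listed above."""
    return {
        # Exponential with rate 0.5, i.e. a mean of 2 time units.
        "exponential": rng.expovariate(0.5),
        # Normal with mean 10 and standard deviation 2 (can be negative).
        "normal": rng.gauss(10.0, 2.0),
        # Lognormal: exp of a normal with mean 0 and standard deviation 0.5.
        "lognormal": rng.lognormvariate(0.0, 0.5),
        # Gamma (shape 2, scale 3) shifted away from 0 by adding 1.
        "gamma_shifted": 1.0 + rng.gammavariate(2.0, 3.0),
        # Beta on [0, 1] rescaled to the range [5, 15].
        "beta_scaled": 5.0 + (15.0 - 5.0) * rng.betavariate(2.0, 5.0),
        # Weibull with scale 100 and shape 1.5.
        "weibull": rng.weibullvariate(100.0, 1.5),
        # Continuous uniform on [0, 1].
        "uniform": rng.uniform(0.0, 1.0),
        # Triangular with minimum 2, maximum 10, most likely value 4.
        "triangular": rng.triangular(2.0, 10.0, 4.0),
        # Empirical: resample directly from the collected data.
        "empirical": rng.choice(observed),
    }


rng = random.Random(12345)  # fixed seed so that runs are repeatable
for name, value in sample_examples(rng).items():
    print(f"{name:14s} {value:8.3f}")
```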
6.3 Parameter Estimation
After a family of distributions has been selected, the next step is to estimate the parameters of
the distribution.
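
As a minimal sketch of this step, the sample mean and sample variance give the usual estimates for two common families: the exponential rate is estimated by the reciprocal of the sample mean, and the normal mean and variance by the sample mean and sample variance. The function names and the data values are hypothetical.

```python
from statistics import mean, variance


def estimate_exponential_rate(data):
    """Estimate the exponential rate as 1 / (sample mean)."""
    return 1.0 / mean(data)


def estimate_normal(data):
    """Estimate (mean, variance) of a normal using the sample variance S^2."""
    return mean(data), variance(data)


# Hypothetical interarrival times in minutes (illustration only).
interarrivals = [1.2, 0.4, 2.7, 0.9, 1.8, 0.3, 3.1, 1.6]
print(f"estimated rate = {estimate_exponential_rate(interarrivals):.3f} per minute")

mu_hat, s2 = estimate_normal(interarrivals)
print(f"estimated mean = {mu_hat:.3f}, estimated variance = {s2:.3f}")
```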

Do not ignore physical characteristics of the process when selecting distributions. Is the
process naturally discrete or continuous valued? Is it bounded, or is there no natural bound?
This knowledge, which does not depend on data, can help narrow the family of distributions
from which to choose. And keep in mind that there is no "true" distribution for any stochastic
input process. An input model is an approximation of reality, so the goal is to obtain an
approximation that yields useful results from the simulation experiment.
The reader is encouraged to complete Exercises 6 through 11 to learn more about the