UNIVERSITY FOR DEVELOPMENT STUDIES
SCHOOL OF PUBLIC HEALTH
DEPARTMENT OF GLOBAL AND INTERNATIONAL
HEALTH
BSC PUBLIC HEALTH
Introductory statistics (MAT 101)
William Nkegbe
DATA
Data: The numerical result of any scientific
experiment
It is the information that you are gathering for
an inquiry
Variate: Each quantity or attribute recorded on
each respondent or subject
Observation: Each individual piece of data
• Data set / Data matrix: The collection of all
observations for particular variables
DATA Cont.
Data is categorized into two major categories. It is
either quantitative or qualitative
QUANTITATIVE DATA: observations that are
measured on a numerical
scale.
They are numbers that result from counting
or measuring attributes of a population
Eg. pulse rate, weight, amount of money, number of
students who take statistics, temperature, etc
DATA Cont.
Quantitative data may either be discrete or
continuous
Quantitative Discrete Data: they are the result of
counting, and they take only certain numerical values.
Eg.
• number of phone calls
• number of hospital visits,
• number of infected persons
• number of diarrhea episodes experienced in a year
• etc.
They take the values 0, 1, 2, 3, ...
DATA Cont
Quantitative Continuous Data: these data are
the result of measuring
Eg. height, weight, temperature etc.
These can take any numeric value within a range, and the
scale can be meaningfully divided into smaller
increments including fractional and decimal
values
DATA Cont
QUALITATIVE DATA:
➢They are the result of categorizing or describing
attributes of a population.
➢Their values CANNOT be put in any numerical
order.
➢They are categorical rather than numerical and
not capable of being measured
Eg.
Blood type, sex, hair colour, political affiliation, etc
DATA Cont
Qualitative data can either be:
➢Ordinal: if there is an intrinsic ordering of its
categories
(Eg. severity of a disease, or a variable with the
categories good, adequate or poor)
Or
➢Nominal: if its categories are unordered and
mutually exclusive.
Eg. Gender, marital status, flower colour, etc
SOURCES OF DATA
There are two main sources of data
❖Primary Data: data collected by the
researcher for the purpose of a specific
inquiry or study
Such data is original in character and is generated by
surveys conducted by individuals or research
institutions
Eg. if we want to know what women think
about the issue of abortion, we must undertake a
survey and collect data on the opinions of women by
asking relevant questions.
SOURCES OF DATA Cont
Secondary Data: data which has been collected by
other agencies and is used by a researcher for
their own purpose.
• It can be obtained from research organizations,
journals, reports , government publications, etc
Eg. Climate data can be accessed from the Ghana
Meteorological Agency to assess the weather
conditions of a particular area.
DATA COLLECTION
➢Data collection is the process of gathering and
measuring information on variables of interest, in
an established systematic fashion that enables
one to answer stated research questions, test
hypotheses, and evaluate outcomes.
➢The data collection component of research is
common to all fields of study including physical
and social sciences, humanities, business, etc
Data Collection Cont
➢ While methods vary by discipline, the emphasis on
ensuring accurate and honest collection remains the
same
▪ It is important to ensure accurate and appropriate
data collection
▪ Accurate data collection is essential to maintaining
the integrity of research
▪ Hence the integrity of data collection must be
maintained.
Data Collection Cont
Note the following:
➢ As you begin thinking about a research question, also
begin thinking about the type of data you will have to
collect to answer that question.
❖ Interview?
❖Questionnaire?
❖Paper and pencil?
❖Computer?
Find out how other people have done it in the past by
reading the relevant literature
Data Collection Cont
➢ Think about WHERE you will be getting the data
➢ Make sure that the data collection forms you use are
clear and easy to use.
▪ Run a pilot study and practise with the data it yields
➢ Always make a duplicate copy of the data file and the
data collection sheets and keep them in a separate
location
➢ Do not rely on other people to collect or transfer
your data unless you have personally trained them
Data Collection Cont
➢As much as possible, cultivate possible sources
for your subject pool.
➢Try to follow up on subjects who missed their
testing session or interview
➢Never discard the original data
➢The type or form of data collection depends on
the type of study you are undertaking
Data Collection Cont
Eg
• Surveys and Cross-Sectional Studies
• Retrospective Studies
• Prospective Studies
• Experimental Studies and Quality Control
• Clinical Trials
• Epidemiological Studies
• Pharmacoeconomic Studies and Quality of Life
• Etc
DATA ENTRY
As an investigator, once you know what information
you want and how you are going to obtain it, you
need to decide how you will assign names and
attributes to the data.
➢Each type of observation should be given a name.
Eg
If one is studying systolic blood pressure of males by
age, one could have a small data set that contains:
• ID
• Age
• Systolic (an abbreviation for systolic blood
pressure)
• Gender and
• Smoke (smoking status)
• Etc
Note that abbreviated words are often used to
identify the variables
Data Entry Cont.
Depending on the software the researcher is using,
data can be entered directly or could be entered
into one and imported to another.
The researcher could have a data set as shown:
Id   Age   Systolic   Sex   Smoke
1    57    123        1     1
2    71    137        2     2
3    35    128        1     3
4    60    155        2     1
Data Cont.
Note from the table that:
➢Each row represents a new case and each
column a different type of observation (called
variables)
➢ You will need to code each qualitative response
in advance.
Here, for Sex:
1 = male and 2 = female.
Also, for Smoke: 1 = current smoker, 2 = former smoker, and
3 = nonsmoker
➢Each person or item is given an ID
number
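A minimal sketch of how the coded table above might be stored and decoded. The dictionary and function names (SEX_CODES, SMOKE_CODES, decode) are illustrative assumptions; only the numeric codes and the four rows come from the slide.

```python
# Hypothetical code books; the numeric codes follow the slide.
SEX_CODES = {1: "male", 2: "female"}
SMOKE_CODES = {1: "current smoker", 2: "former smoker", 3: "nonsmoker"}

# Each row is one case; each key is one variable.
rows = [
    {"id": 1, "age": 57, "systolic": 123, "sex": 1, "smoke": 1},
    {"id": 2, "age": 71, "systolic": 137, "sex": 2, "smoke": 2},
    {"id": 3, "age": 35, "systolic": 128, "sex": 1, "smoke": 3},
    {"id": 4, "age": 60, "systolic": 155, "sex": 2, "smoke": 1},
]


def decode(row):
    """Return a copy of the row with coded values replaced by labels."""
    decoded = dict(row)
    decoded["sex"] = SEX_CODES[row["sex"]]
    decoded["smoke"] = SMOKE_CODES[row["smoke"]]
    return decoded


decoded_rows = [decode(r) for r in rows]
```

Keeping the codes in one place means the stored data stay numeric while reports can still show readable labels.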
Data Entry Cont.
Note also that:
❖In medical and many other studies, names are
often not used, for confidentiality reasons.
❖If the investigator wishes to compare the results for
two or more groups, a variable should be entered
that represents group identity.
ORGANISING / REPRESENTING DATA
Data can be organised into:
• Ordered Arrays
• The Frequency Table
• Relative Frequency Tables
• Stem and Leaf Tables
And/or represented on graphs:
• The Histograms
• The Frequency Polygon
• Box-and-whisker plot
• Dot plot
• Distribution Curves Problems
Organizing Data Cont.
The data below is a list of 120 values of Body Mass Index (BMI) data
from the 1998 National Health Interview Survey on US adults.
27.4 31.0 34.2 28.9 25.7 37.1 24.8 34.9 27.5 25.9
23.5 30.9 27.4 25.9 22.3 21.3 37.8 28.8 28.8 23.4
21.9 30.2 24.7 36.6 25.4 21.3 22.9 24.2 27.1 23.1
28.6 27.3 22.7 22.7 27.3 23.1 22.3 32.6 29.5 38.8
21.9 24.3 26.5 30.1 27.4 24.5 22.8 24.3 30.9 28.7
22.4 35.9 30.0 26.2 27.4 24.1 19.8 26.9 23.3 28.4
20.8 26.5 28.2 18.3 30.8 27.6 21.5 33.6 24.8 28.3
25.0 35.8 25.4 27.3 23.0 25.7 22.3 35.5 29.8 27.4
31.3 24.0 25.8 21.1 21.1 29.3 24.0 22.5 32.8 38.2
27.3 19.2 26.6 30.3 31.6 25.4 34.8 24.7 25.6 28.3
26.5 28.3 35.0 20.2 37.5 25.8 27.5 28.8 31.1 28.7
24.1 24.0 20.7 24.6 21.1 21.9 30.8 24.6 33.2 31.6
Organizing Data Cont
Ordered Array
18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.8
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8
By inspecting the ordered array, we find that the lowest and
highest values are 18.3 and 38.8, respectively
Frequency Distribution Tables
We will use the given data to help us create equally
spaced intervals for tabulating frequencies of data
Determine the width of the intervals
➢ Although the number of intervals that one may
choose for a frequency distribution is arbitrary, the
actual number should depend on the range of the
data and the number of cases
➢ For a data set of 100 to 150 observations, the
number (of intervals) chosen usually ranges from
about five to ten
Frequency Distribution Cont.
In the present example, the range of the data is
38.8 – 18.3 = 20.5
Suppose we divide the data set into seven intervals.
Then, we have
20.5 ÷ 7 = 2.93
Which rounds to 3.0
• Consequently, the intervals will have a width of
three (3).
Frequency Distribution Cont.
• These seven intervals are as follows:
• 18.0 – 20.9
• 21.0 – 23.9
• 24.0 – 26.9
• 27.0 – 29.9
• 30.0 – 32.9
• 33.0 – 35.9
• 36.0 – 38.9
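The interval construction above can be sketched as follows. The function name class_intervals and its parameters are assumptions for illustration; the starting point 18.0, the seven classes and the one-decimal recording precision are taken from the example.

```python
def class_intervals(lowest, highest, n_classes, start):
    """Build equal-width class intervals for data recorded to 1 decimal."""
    data_range = highest - lowest            # 38.8 - 18.3 = 20.5
    width = round(data_range / n_classes)    # 20.5 / 7 = 2.93, rounds to 3
    intervals = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 0.1          # last value before the next class
        intervals.append(f"{lower:.1f} - {upper:.1f}")
    return width, intervals


width, intervals = class_intervals(18.3, 38.8, 7, 18.0)
```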
Frequency Distribution Cont.
Cumulative Frequency of BMI
Class Interval    Frequency (f)   Cumulative       Relative
for BMI Levels                    Frequency (cf)   Frequency (%)
18.0 - 20.9         6               6                5.00
21.0 - 23.9        24              30               20.00
24.0 - 26.9        32              62               26.67
27.0 - 29.9        28              90               23.33
30.0 - 32.9        15             105               12.50
33.0 - 35.9         9             114                7.50
36.0 - 38.9         6             120                5.00
Total             120              --              100.00
Frequency Distribution Cont.
Relative Frequency Table for BMI Levels
Class Interval for   Relative Frequency   Cumulative Relative
BMI Levels           (%)                  Frequency (%)
18.0 - 20.9            5.00                 5.00
21.0 - 23.9           20.00                25.00
24.0 - 26.9           26.67                51.67
27.0 - 29.9           23.33                75.00
30.0 - 32.9           12.50                87.50
33.0 - 35.9            7.50                95.00
36.0 - 38.9            5.00               100.00
Total                100.00                 --
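As a check on the table, each relative frequency is the class frequency divided by the total of 120, and each cumulative figure is a running sum. A minimal sketch, with variable names chosen only for illustration:

```python
from itertools import accumulate

# Class frequencies for the seven BMI intervals, 18.0 - 20.9 up to 36.0 - 38.9.
frequencies = [6, 24, 32, 28, 15, 9, 6]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))

# Percentages rounded to two decimals, as in the table.
relative_pct = [round(100 * f / total, 2) for f in frequencies]
cumulative_pct = [round(100 * cf / total, 2) for cf in cumulative]
```

For example, the 27.0 - 29.9 class has 28 of 120 observations, which is 23.33%, and the last class has 6 of 120, which is 5.00%.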
GRAPHICAL REPRESENTATION OF DATA
[Figure: "BMI of 120 US Adults". Vertical axis: Frequency; horizontal axis: Class Interval for BMI Levels, 18.0 - 20.9 through 36.0 - 38.9.]
GRAPHICAL REPRESENTATION OF DATA
[Figure: "Relative Frequency (%)". Vertical axis: Relative Frequency; horizontal axis: Class Interval, 18.0 - 20.9 through 36.0 - 38.9.]
GRAPHICAL REPRESENTATION OF DATA
[Figure: "Cumulative Relative Frequency (%) of BMI". Vertical axis: Cumulative Rel Freq (%); horizontal axis: Class Interval, 18.0 - 20.9 through 36.0 - 38.9.]
GRAPHICAL REPRESENTATION OF DATA
Box plot (Box and Whisker plot)
Scatterplots
• Scatterplots display the relationship between two
continuous variables.
• Each dot on the graph has an X and Y coordinate that
corresponds with a pair of values.
Eg. BMI and %Fat
[Figure: scatterplot of BMI (vertical axis) against %Fat (horizontal axis).]
SAMPLING
Scenario:
In a certain community, an opinion poll was
conducted to determine public sentiment
toward a health policy in an upcoming
election. The objective of the survey was to
estimate the proportion of voters in the
community who favoured the policy
Sampling
Note the following:
➢ An Element is an object on which a measurement is
taken
In our scenario, an element is a registered voter in the
community.
The measurement taken on an element is the voter’s
preference on the health policy
Because measurements are usually considered to be
numbers, the experimenter can obtain numerical data
by recording a 1 for a voter in favour of the policy and
a 0 for a voter not in favour
Sampling
Population
➢ is a collection of elements about which we wish to
make an inference
➢ a collection of people or objects that share common
observable characteristics.
The population in our example is the collection of voters
in the community
The characteristic (numerical measurement) of interest
for each member of this population is his or her
preference on the health policy
Sampling
An important task for the investigator is to
carefully and completely define the population
before collecting a sample.
This definition must include:
➢a description of the elements to be included
➢a specification of the measurements to be taken
Eg; if the population in the health policy study consists of
registered voters, then we may want to collect information
on whether or not each sampled person plans to vote in the
upcoming election.
Sampling
The population may be finite or infinite
➢ Finite Population: If the number of objects or units
in the population is countable, it is said to be a finite
population.
Eg; the number of houses in a suburb is a finite
population
➢ Infinite Population: If the number of objects or units
in the population is infinite, it is said to be an infinite
population.
Eg; the number of stars in the sky forms an infinite
population.
Sampling
In general, the population is denoted by and its
size is denoted by N.
In the case of an infinite population, N → ∞
➢ Target Population: A finite or infinite population
about which we require information is called target
population.
• Eg, All 19 year old female students in
UDS.
Sampling
➢Study Population: the basic finite set of
individuals we intend to study.
• Eg, All 19 year old female students of UDS
who pursue Public Health
➢Sampling units are nonoverlapping collections of
elements from the population that cover the entire
population
In the health policy example, a sampling unit may
be a registered voter in the community
Sampling
If each sampling unit contains one and only one
element of the population, then a sampling unit and
an element from the population are identical.
This situation arises in our example if we sample
individual voters rather than households within the
community.
Sampling
A frame is a list of sampling units.
If we specify the individual voter as the sampling
unit, a list of all registered voters may serve as a
frame for a public opinion poll
Sampling
A sample
➢is a collection of sampling units drawn from a
single frame or from multiple frames.
➢It is a subset of the population, which represents
the entire population. The sample is denoted by
s and its size by n
Sampling
Data are obtained from the elements of the sample
and used in describing the population
Let the individual voter be our sampling unit and the list
of registered voters be our frame. In the public opinion
poll, we contact a number of voters (the sample) to
determine their preference for the upcoming health policy
(measurement). We then use the information obtained
from these voters to make an inference about voter
preference throughout the community
Sampling
Advantages of Sampling
➢Reduced Cost
➢Greater Speed
➢Greater Scope
➢Greater Accuracy
Sampling
• The procedure for selecting the sample is called the
sample survey design
• The objective of a sample survey is to make an inference
about population parameters from information contained
in a sample
• Since observations cost money, a design that provides a
precise estimator of the parameter for a fixed sample size
yields a savings in cost to the experimenter
Sampling
Steps to Sample Selection
1. The objective of the survey
2. Population to be sampled
3. Data to be collected
4. Degree of precision desired
5. Methods of measurement
6. Frame
7. Selection of the sample
8. The Pretest
9. Organization of the fieldwork
10. Summary and Analysis of the data
11. Information gained for future surveys
Sampling Designs
Some of the designs include
➢Simple random sampling,
➢Stratified random sampling,
➢Cluster sampling,
➢Systematic sampling.
➢Convenience sampling
➢Judgment Sampling
Etc
Sampling Methods:
Simple Random Sampling
➢If a sample of size n is drawn from a population of
size N such that every possible sample of size n
has the same chance of being selected, the
sampling procedure is called Simple Random
Sampling.
• The sample thus obtained is called a Simple
Random Sample.
Sampling Methods
Per this definition;
❖All individual elements in a population have the
same chance of being selected and that the
selection of individual elements is mutually
independent;
❖The presence or absence of a given element from
the sample does not affect the selection probability
of any other element
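A minimal sketch of drawing a simple random sample from a frame, using Python's standard random module. The frame of 200 voter labels and the function name simple_random_sample are hypothetical; sampling here is without replacement, so no unit appears twice.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)      # seed only makes the draw repeatable
    return rng.sample(frame, n)


# Hypothetical frame: a list of 200 registered voters.
voter_frame = [f"voter_{i:03d}" for i in range(1, 201)]
srs = simple_random_sample(voter_frame, 20, seed=1)
```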
Sampling Methods
Stratified Random Sampling
A stratified random sample is one obtained by
separating the population elements into
nonoverlapping groups, called strata, and then
selecting a simple random sample from each
stratum.
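The definition above can be sketched as one simple random sample per stratum. The strata ("urban", "rural"), their sizes and the per-stratum sample sizes are made-up values for illustration only.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, n_per_stratum[name])
        for name, units in strata.items()
    }


# Hypothetical nonoverlapping strata covering the whole population.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
stratified = stratified_sample(strata, {"urban": 6, "rural": 4}, seed=1)
```

Here the sample sizes are proportional to the stratum sizes (10% of each), which is one common choice among several.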
Sampling Methods
Cluster Sampling
• In cluster sampling, the elements in the population
are first divided into separate groups called clusters.
Each element of the population belongs to one and
only one cluster
• A simple random sample of the clusters is then
taken.
• All elements within each sampled cluster form the
sample
• Cluster sampling tends to provide the best results
when the elements within the clusters are not alike
Sampling Designs
Systematic Sampling
Systematic sampling is a probability
sampling method in which the researcher selects a
random starting point in the target population and then
selects every subsequent sample member
at a fixed ‘sampling interval’
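A minimal sketch of systematic sampling from an ordered frame. It assumes the sampling interval is k = N // n and that the random start falls within the first k units; the frame of 100 units and the function name are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then every k-th unit, where k = len(frame) // n."""
    k = len(frame) // n                 # the fixed sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)            # random start among the first k units
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame of 100 numbered units; sample 10 of them.
unit_frame = list(range(1, 101))
systematic = systematic_sample(unit_frame, 10, seed=1)
```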
Introduction to Probability
➢Probability is a numerical measure of the
likelihood that a particular event will occur.
The concept of probability is relevant to experiments
that have uncertain outcomes.
An experiment is any activity that results in the
collection of data pertaining to phenomena that exhibit
variation.
Probability
The domain of probability encompasses all phenomena for
which outcomes cannot be exactly predicted in advance
Eg.
➢ Tossing a coin
➢ Rolling a die
➢ Gender of the first two newborns in town tomorrow
➢ Tossing two coins
➢ Rolling two dice
➢ Planting a seed, etc
Probability
Event:
An outcome or a set of outcomes of an activity or a result of
a trial.
Usually denoted by capital letters
Eg. Getting two heads when tossing three fair
coins simultaneously.
Elementary Event (Simple Event): a single possible
outcome of an experiment.
Eg. If we toss a fair coin, then the event of a head coming
up is an elementary event
P(E): the probability of event E occurring
Probability
Joint Events (Compound event): has two or more
elementary events in it.
Eg. Drawing a black ace from a pack of cards.
(this contains two elementary events of black and ace)
Sample Space (S): The collection of all possible outcomes
of an experiment
Eg. If we toss a die, one sample space, or set of all possible
outcomes, is given by S = {1, 2, 3, 4, 5, 6}
P(S) = 1
Probability
Remember:
For every event A:
➢ $0 \le P(A) \le 1$
(a probability is between 0 and 1)
➢ The impossible event has probability zero:
$P(\emptyset) = 0$
➢ If $A'$ is the complement of A, then
$P(A') = 1 - P(A)$
➢ If $A = A_1 \cup A_2 \cup \dots \cup A_n$, where $A_1, A_2, \dots, A_n$ are mutually
exclusive events (no two can occur together as the outcome of
a single experiment), then:
$P(A) = P(A_1) + P(A_2) + \dots + P(A_n)$
Probability
❖When two or more events are mutually exclusive,
the probability that any one of the events will
occur is the sum of their separate probabilities
Eg. If we roll a single fair die, let event A be a face of 5
and event B a face of 6. The two events are mutually
exclusive, so we have:
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$$
$$P(5 \text{ or } 6) = P(5 \cup 6) = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$$
Probability
However, if events A and B are not mutually exclusive,
then the probability of occurrence of either event A
or event B or both is equal to:
$$P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$$
P(A and B) is written as $P(A \cap B)$ or simply as P(AB)
➢Event (A and B) consists of all those outcomes which
are contained in both A and B simultaneously.
Probability
Eg. In an experiment of drawing one card from a pack of 52 playing cards,
let:
Event A = An ace is drawn
Event B = A spade is drawn
Event AB = The ace of spades is drawn
Hence:
$$P(A \cup B) = P(A) + P(B) - P(AB) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$
This is because there are 4 aces and 13 spades, including the 1 ace of
spades, out of a total of 52 cards in the pack.
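The card example can be checked by listing all 52 cards and counting. This is only a sketch; the rank and suit labels and the helper name prob are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))          # 13 x 4 = 52 equally likely cards


def prob(event):
    """Probability of an event as (favourable cards) / (all cards)."""
    favourable = [card for card in deck if event(card)]
    return Fraction(len(favourable), len(deck))


p_ace = prob(lambda c: c[0] == "A")
p_spade = prob(lambda c: c[1] == "spades")
p_both = prob(lambda c: c[0] == "A" and c[1] == "spades")
p_either = prob(lambda c: c[0] == "A" or c[1] == "spades")
```

Counting directly gives p_either, and it agrees with p_ace + p_spade - p_both: the ace of spades is subtracted once because it is counted in both A and B.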
Probability
Note:
➢Independent events
➢Multiplication rule
➢Conditional Probability
➢Counting rule
➢Combination
➢Permutation
Probability
Counting Methods
• Suppose two operations A and B are carried out, and if
there are “m” different ways of carrying out A and “k”
different ways of carrying out B, then the combined
operation of A and B may be carried out in m × k = mk
different ways.
Eg. If we toss a coin 3 times, there are 2 possible
outcomes for each toss. Hence the total number of
possible outcomes is 2 x 2 x 2 = 8.
PROBABILITY DISTRIBUTION
Probability Distribution:
➢It’s a mathematical function that describes the probabilities of all
possible outcomes of a random variable
➢It’s the listing of all possible outcomes of an
experiment together with their probabilities.
Eg. If we toss a fair coin two times, the following are
the possible outcomes of this experiment
Outcomes Probabilities
TT ¼
TH ¼
HT ¼
HH ¼
Probability Distribution
The probability distribution of the number of heads
obtained in these two tosses of the coin is given as:
No. of heads (X) Probability P(X)
0 1/4
1 1/2
2 1/4
Total 1.0
All probabilities must add up to 1; remember!
Probability Distribution
Random Variables:
➢They are characteristics that can be observed but
cannot be controlled.
➢They can be characteristics, measurements or
counts that vary randomly according to a function.
Random: you don’t know the value of the next
observation, but you do know the probabilities
associated with values and ranges of values.
Probability Distribution
Notation:
➢Upper case letters such as X and Y are used to
denote random variables.
➢Lower case letters such as x and y denote the
values of random variables.
Random variables can either be discrete or
continuous
Probability Distribution
Probability Distribution Function (PDF) of Discrete
Random Variables
➢ The table that shows the values x in one column and the
corresponding probabilities, p(x) in another column is
called a probability distribution function (PDF)
In our earlier example on the probability distribution of the
number of heads:
No. of heads (X) Probability P(X)
0 1/4
1 ½
2 ¼
Total 1.0
Probability Distribution
Thus, p(x) satisfies the following characteristics:
➢ $0 \le p(x) \le 1$
Each probability is between zero and one, inclusive.
➢ $\sum_{i=1}^{N} p(x_i) = 1$
The sum of the probabilities is one
Mean/Expected Value and Variance of a Probability Dist
Let X be a random variable, and x1, x2, . . . xn the list
of possible outcomes for X.
Then the mean of the distribution and the expected
value of X are the same quantity, given by:
$$\mu = E(X) = \sum_{i=1}^{n} x_i P(x_i)$$
Eg. Assume that we toss three fair coins
simultaneously, the possible number of heads that
can appear as a result of the random experiment are
as shown:
Mean/Expected Value and Variance of a Probability Distribution
Outcome No. of Heads Probabilities
TTT 0 1/8
HTT 1 1/8
TTH 1 1/8
THT 1 1/8
THH 2 1/8
HHT 2 1/8
HTH 2 1/8
HHH 3 1/8
Mean
Summarizing by the number of heads and the corresponding
probabilities, we have:
No of Heads (X)   P(X)   X.P(X)
0                 1/8    0
1                 3/8    3/8
2                 3/8    3/4
3                 1/8    3/8
Total             1.0    3/2
The expected value is:
$$E(X) = \sum_{i=1}^{n} x_i P(x_i) = 0 + \frac{3}{8} + \frac{3}{4} + \frac{3}{8} = \frac{12}{8} = 1.5$$
Mean
The result means that on the average, 1.5 heads can
be expected to appear as a result of every random
experiment of tossing three fair coins at any one
time.
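The expected number of heads can be reproduced by enumerating the eight equally likely outcomes and weighting each count of heads by its probability. A minimal sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing three fair coins.
coin_outcomes = list(product("HT", repeat=3))

# Number of outcomes for each count of heads: {0: 1, 1: 3, 2: 3, 3: 1}.
head_counts = Counter(outcome.count("H") for outcome in coin_outcomes)

# P(X = x) for each x, then E(X) = sum of x * P(x).
heads_dist = {x: Fraction(c, len(coin_outcomes)) for x, c in head_counts.items()}
expected_heads = sum(x * p for x, p in heads_dist.items())
```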
Variance
Let X be a random variable, and x1, x2, . . . xn the list
of possible outcomes for X.
Then:
The variance of the random variable is:
$$\sigma^2 = Var(X) = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$$
And the standard deviation is
$$\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)}$$
Variance
From our random experiment of tossing three fair
coins once:
No of Heads (X)   P(X)   μ     (X − μ)   (X − μ)^2   (X − μ)^2 P(X)
0                 1/8    1.5   -1.5      2.25        0.281
1                 3/8    1.5   -0.5      0.25        0.094
2                 3/8    1.5    0.5      0.25        0.094
3                 1/8    1.5    1.5      2.25        0.281
Total                                                0.750
$$\sigma^2 = \sum (X - \mu)^2 P(X) = 0.75$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{0.75} \approx 0.87$$
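A sketch of the same variance calculation with exact fractions, which avoids rounding each term before adding. Working exactly gives a variance of 3/4 = 0.75; rounding each term to two decimals first would give 0.74. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of the number of heads in three tosses of a fair coin.
heads_pmf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

mu = sum(x * p for x, p in heads_pmf.items())                   # mean
variance = sum((x - mu) ** 2 * p for x, p in heads_pmf.items())
std_dev = sqrt(variance)                                        # square root of the variance
```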
Binomial Distribution
Binomial distribution is a special discrete probability
distribution.
➢It is one of the simplest and most frequently used
discrete probability distributions
❖There are four conditions that the experiment has
to meet to be considered a binomial experiment:
i. There are a fixed number of trials.
Think of Trials as repetitions of an experiment.
The letter n denotes the number of trials.
Binomial Distribution
➢ There are only two possible and mutually exclusive
outcomes, called "success" and "failure," for each trial.
➢ The n trials are independent and are repeated using
identical conditions.
Because the n trials are independent, the outcome
of one trial does not help in predicting the outcome of
another trial.
➢ The letter p denotes the probability of a success on one
trial, and q, which is also (1-p); denotes the probability of
a failure on one trial, so p + q = 1.
Since the trials are independent, p and q remain the
same for each trial.
Binomial Distribution
Notation:
➢The outcomes of a binomial experiment fit a
binomial probability distribution.
➢The random variable X = the number of successes
obtained in the n independent trials.
➢ $X \sim B(n, p)$
This notation states that the random variable X follows a
binomial distribution with n trials and
probability of success p.
Binomial Distribution
➢ $P(X = k) = {}^{n}C_{k}\, p^{k} q^{n-k}$; this is the Binomial
Formula
(the probability of k successes out of n trials)
Where:
$${}^{n}C_{k} = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
p = Probability of success
q = Probability of failure (1 – p)
k = number of successes desired
n = number of trials
Binomial Distribution
The formula is also expressed as
$$P(X = k) = \binom{n}{k} p^{k} q^{n-k}$$
Eg. If a new drug is found to be effective 40% of the
time, then what is the probability that in a random
sample of 4 patients, it will be effective on 2 of
them?
Soln.
Let us define effective as success and non-effective
as failure. Then;
Binomial Distribution
p = 0.4 (since the drug is effective 40% of the time)
q = (1 – p) = (1 – 0.4) = 0.6
k = 2
n = 4
$$P(X = 2) = \binom{n}{k} p^{k} q^{n-k} = \binom{4}{2} (0.4)^2 (0.6)^2 = \frac{4!}{2!\,(4-2)!} (0.4)^2 (0.6)^2 = 6 \times 0.16 \times 0.36 = 0.3456$$
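The drug example can be verified with a small function for the binomial formula. The name binomial_pmf is an illustrative choice; math.comb from the standard library supplies the number of combinations.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * q^(n-k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q ** (n - k)


# Drug effective 40% of the time; exactly 2 successes among 4 patients.
p_two_of_four = binomial_pmf(2, 4, 0.4)
```

The result is about 0.3456, matching the hand calculation, and the probabilities for k = 0, 1, ..., 4 sum to 1.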
Binomial Distribution
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$
From our earlier example:
$\mu = (4)(0.4) = 1.6$
$\sigma^2 = (4)(0.4)(0.6) = 0.96$
$\sigma = \sqrt{(4)(0.4)(0.6)} = \sqrt{0.96} \approx 0.9798$
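A minimal sketch computing the mean, variance and standard deviation of a binomial distribution from n and p. The function name binomial_summary is an assumption for illustration.

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of B(n, p)."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)


# The drug example: n = 4 trials, p = 0.4.
b_mean, b_var, b_sd = binomial_summary(4, 0.4)
```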
Binomial Distribution
Eg1. Suppose you play a game that you can only either win or lose. The
probability that you win any game is 55%, and the probability that you
lose is 45%. Each game you play is independent.
a. If you play the game 20 times, write the function that describes
the probability that you will win 15 of the 20 times.
b. Find the mean number of wins
c. Find the standard deviation of wins
Eg.2. A trainer is teaching a dolphin to do tricks. The probability that
the dolphin successfully performs the trick is 35%, and the probability
that the dolphin does not successfully perform the trick is 65%. Out of
ten attempts, you want to find the probability that the dolphin
succeeds at most 5 times. State the probability question
mathematically.
Poisson Distribution
It differs from the binomial distribution in the sense that:
➢ in the Binomial distribution we must be able to count
both the number of successes and the number of failures,
with the goal of finding the probability of a
specific number of successes in n trials;
➢ while in the Poisson distribution, all we need to know is
the average number of successes in a given unit of
time or space,
with the goal of finding the probability of a specific number of
occurrences in a specific amount of time or space.
Conditions of a Poisson Experiment
➢ The experiment consists of counting the number of events
occurring in a fixed interval of time or space if these
events happen with a known average rate and
independently of the time since the last event.
➢ The probability of the event remains constant for each
interval of equal length.
➢ The number of occurrences in one fixed interval is random
and independent of the number of occurrences in other
fixed intervals.
➢ The random variable X = the number of occurrences in the
interval of interest
Poisson Distribution
▪ We need to know the average number of events per unit
of time, $\lambda$ (lambda)
This could be:
➢The average number of cars passing under a
bridge in any given hour
➢The average number of machine breakdowns
per month
➢The average number of patients arriving at a
facility per day
➢Etc.
Poisson Distribution
➢The probability that exactly x events will occur in
a given time is given as:
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
Where:
$\lambda$ = Average number of occurrences per unit time
e = the base of the natural logarithms (2.71828…)
= np
=
Poisson Distribution
Eg.1: Assume that on average 3
persons enter the lab for service every 10
minutes. What is the probability that
exactly 5 persons will enter the lab in a
given 10 minute period, assuming that the
process can be described by a Poisson
distribution?
Poisson Distribution
Soln.
x = 5
$\lambda = 3$
$$P(5) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{(3)^{5} (2.71828)^{-3}}{5!} = \frac{(243)(0.0498)}{120} = 0.1008$$
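The Poisson formula can be checked numerically. This is a sketch: poisson_pmf and the parameter name lam (for the average rate λ) are illustrative names, not part of the slides.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with average rate lam."""
    return lam**x * exp(-lam) / factorial(x)


# On average 3 arrivals per 10 minutes; probability of exactly 5 arrivals.
p_five = poisson_pmf(5, 3)
```

Rounded to four decimals this gives 0.1008, in agreement with the worked solution.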
Poisson Distribution
Eg.2. Customers arrive at a photocopying machine at
an average rate of 2 every 10 minutes. The number
of arrivals is distributed according to a Poisson
distribution.
What is the probability that:
a. There will be no arrival during this time period
b. There will be exactly one arrival during this time
period
c. There will be more than two arrivals during this
time period
Poisson Distribution
Soln.
From our problem, we have:
$\lambda = 2$
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad \text{for } x = 0, 1, 2, \dots$$
a. For x = 0: $P(0) = \frac{(2)^{0} (2.71828)^{-2}}{0!} = 0.1353$
b. For x = 1: $P(1) = \frac{(2)^{1} (2.71828)^{-2}}{1!} = 0.2707$
c. Part c also needs the value for x = 2: $P(2) = \frac{(2)^{2} (2.71828)^{-2}}{2!} = 0.2707$
Poisson Distribution
Then, the probability of more than two arrivals in a
10 minute period is
$$P(x > 2) = 1 - [P(0) + P(1) + P(2)] = 1 - [0.1353 + 0.2707 + 0.2707] = 1 - 0.6767 = 0.3233$$
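A short sketch of the complement calculation for the photocopier example, with an average of 2 arrivals per 10 minutes. The function and variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with average rate lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 2  # average arrivals per 10 minutes

p_none = poisson_pmf(0, lam)
p_one = poisson_pmf(1, lam)
p_two = poisson_pmf(2, lam)

# More than two arrivals is the complement of 0, 1 or 2 arrivals.
p_more_than_two = 1 - (p_none + p_one + p_two)
```

The subtraction applies to the whole sum P(0) + P(1) + P(2), which is why the three probabilities are grouped in brackets before taking the complement.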
CONTINUOUS RANDOM VARIABLES