Mat 101 Basic Statistics PPP Iph

Uploaded by

climbercollins6

UNIVERSITY FOR DEVELOPMENT STUDIES

SCHOOL OF PUBLIC HEALTH

DEPARTMENT OF GLOBAL AND INTERNATIONAL
HEALTH
BSC PUBLIC HEALTH
Introductory statistics (MAT 101)

William Nkegbe
DATA
Data: The numerical result of any scientific experiment. It is the information that you are gathering for an inquiry.
Variate: Each quantity or attribute recorded on each respondent or subject
Observation: Each individual piece of data
Data set / Data matrix: The collection of all observations for particular variables
DATA Cont.
Data is categorized into two major categories. It is
either quantitative or qualitative
QUANTITATIVE DATA: they are observations that are
measured on a numerical
scale.
They are numbers, and as a result of counting
or measuring attributes of a population

Eg. pulse rate, weight, amount of money, number of

students who take statistics, temperature, etc
DATA Cont.
Quantitative data may either be discrete or
continuous

Quantitative Discrete Data: they are the result of

counting, and they take only certain numerical values.
Eg.
• number of phone calls
• number of hospital visits,
• number of infected persons
• number of diarrhea episodes experienced in a year
• etc.
They take the numbers 0,1,2,3……..
DATA Cont

Quantitative Continuous Data: these data are the result of measuring.

Eg. height, weight, temperature, etc.

These can take any numeric value, and the scale can be meaningfully divided into smaller increments, including fractional and decimal values
DATA Cont
QUALITATIVE DATA:
➢They are the result of categorizing or describing
attributes of a population.

➢Their values CANNOT be put in any numerical order.
➢They are categorical rather than numerical and are not capable of being measured
Eg.
Blood type, sex, hair colour, political affiliation, etc
DATA Cont
Qualitative data can either be:

➢Ordinal, if there is an intrinsic ordering of its categories.
Eg. Severity of a disease, or a variable with the categories good, adequate or poor
Or

➢Nominal, if its categories are unordered and mutually exclusive.
Eg. Gender, marital status, flower colour, etc
SOURCES OF DATA
There are two main sources of data

❖Primary Data: data collected by the researcher himself or herself for the purpose of a specific inquiry or study

Such data is original in character and is generated by surveys conducted by individuals or research institutions

Eg. If we want to know what women think about the issue of abortion, we must undertake a survey and collect data on the opinions of women by asking relevant questions.
SOURCES OF DATA Cont
Secondary Data: data which has been collected by
other agencies and used by a researcher for
his own purpose.
• It can be obtained from research organizations,
journals, reports , government publications, etc
Eg. Climate data can be accessed from the Ghana
Meteorological Agency to assess the weather
conditions of a particular area.
DATA COLLECTION
➢Data collection is the process of gathering and
measuring information on variables of interest, in
an established systematic fashion that enables
one to answer stated research questions, test
hypotheses, and evaluate outcomes.

➢The data collection component of research is

common to all fields of study including physical
and social sciences, humanities, business, etc
Data Collection Cont
➢ While methods vary by discipline, the emphasis on
ensuring accurate and honest collection remains the
same

▪ It is important to ensure accurate and appropriate data collection

▪ Accurate data collection is essential to maintaining the integrity of research

▪ Hence maintaining the integrity of data collection is very necessary.
Data Collection Cont
Note the following:

➢ As you begin thinking about a research question, also

begin thinking about the type of data you will have to
collect to answer that question.

❖ Interview?
❖Questionnaire?
❖Paper and pencil?
❖Computer?

Find out how other people have done it in the past by

reading the relevant literature
Data Collection Cont
➢ Think about WHERE you will be getting the data

➢ Make sure that the data collection forms you use are
clear and easy to use.
▪ Do a pilot and practice on the collected data

➢ Always make a duplicate copy of the data file and the

data collection sheets and keep them in a separate
location

➢ Do not rely on other people to collect or transfer

your data unless you have personally trained them
Data Collection Cont
➢As much as possible, cultivate possible sources
for your subject pool.

➢Try to follow up on subjects who missed their

testing session or interview

➢Never discard the original data

➢The type or form of data collection depends on

the type of study you are undertaking
Data Collection Cont
Eg
• Surveys and Cross-Sectional Studies
• Retrospective Studies
• Prospective Studies
• Experimental Studies and Quality Control
• Clinical Trials
• Epidemiological Studies
• Pharmacoeconomic Studies and Quality of Life
• Etc
DATA ENTRY
As an investigator, once you know what information
you want and how you are going to obtain it, you
need to decide how you will assign names and
attributes to the data.

➢Each type of observation should be given a name.

Eg
If one is studying systolic blood pressure of males by
age, one could have a small data set that contains
Data Entry
• ID
• Age,
• Systolic (an abbreviation for systolic blood
pressure),
• Gender and
• Smoke (smoking status)
• Etc


Note that abbreviated words are often used to identify the variables

Data Entry Cont.
Depending on the software the researcher is using,
data can be entered directly or could be entered
into one and imported to another.

The researcher could have a data set as shown:

Id   Age   Systolic   Sex   Smoke
1    57    123        1     1
2    71    137        2     2
3    35    128        1     3
4    60    155        2     1
Data Cont.
Note from the table that:
➢Each row represents a new case and each
column a different type of observation (called
variables)
➢ You will need to code each qualitative response
in advance.
Here;
1 = male and 2 = female.
Also, 1 = current smoker, 2 = former smoker, and
3 = nonsmoker
➢Each person or item number is given an ID
number
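The coding scheme above can be sketched in Python. This is only an illustration: the names `SEX_CODES`, `SMOKE_CODES` and `decode` are hypothetical and do not come from any particular statistics package. The rows are the four cases from the table.

```python
# Hypothetical code book for the small data set shown above.
SEX_CODES = {1: "male", 2: "female"}
SMOKE_CODES = {1: "current smoker", 2: "former smoker", 3: "nonsmoker"}

# Each row is one case: (Id, Age, Systolic, Sex, Smoke).
rows = [
    (1, 57, 123, 1, 1),
    (2, 71, 137, 2, 2),
    (3, 35, 128, 1, 3),
    (4, 60, 155, 2, 1),
]


def decode(row):
    """Return one case as a dictionary with the coded fields spelled out."""
    case_id, age, systolic, sex, smoke = row
    return {
        "id": case_id,
        "age": age,
        "systolic": systolic,
        "sex": SEX_CODES[sex],
        "smoke": SMOKE_CODES[smoke],
    }


decoded = [decode(r) for r in rows]
for case in decoded:
    print(case)
```

Keeping the numeric codes in the data file and the labels in a separate code book makes data entry faster and keeps the labels consistent.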
Data Entry Cont.

Note also that:

❖In medical and many other studies, names are

often not used, for confidentiality reasons.

❖If the investigator wishes to compare the results for

two or more groups, a variable should be entered
that represents group identity.
ORGANISING / REPRESENTING DATA
Data can be organised into:
• Ordered Arrays
• The Frequency Table
• Relative Frequency Tables
• Stem and Leaf Tables

and/or represented on graphs:

• The Histograms
• The Frequency Polygon
• Box-and-whisker plot
• Dot plot
• Distribution Curves Problems
Organizing Data Cont.
The data below is a list of 120 values of Body Mass Index (BMI) data
from the 1998 National Health Interview Survey on US adults.

27.4 31.0 34.2 28.9 25.7 37.1 24.8 34.9 27.5 25.9
23.5 30.9 27.4 25.9 22.3 21.3 37.8 28.8 28.8 23.4
21.9 30.2 24.7 36.6 25.4 21.3 22.9 24.2 27.1 23.1
28.6 27.3 22.7 22.7 27.3 23.1 22.3 32.6 29.5 38.8
21.9 24.3 26.5 30.1 27.4 24.5 22.8 24.3 30.9 28.7
22.4 35.9 30.0 26.2 27.4 24.1 19.8 26.9 23.3 28.4
20.8 26.5 28.2 18.3 30.8 27.6 21.5 33.6 24.8 28.3
25.0 35.8 25.4 27.3 23.0 25.7 22.3 35.5 29.8 27.4
31.3 24.0 25.8 21.1 21.1 29.3 24.0 22.5 32.8 38.2
27.3 19.2 26.6 30.3 31.6 25.4 34.8 24.7 25.6 28.3
26.5 28.3 35.0 20.2 37.5 25.8 27.5 28.8 31.1 28.7
24.1 24.0 20.7 24.6 21.1 21.9 30.8 24.6 33.2 31.6
Organizing Data Cont
Ordered Array
18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.8
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8

By inspection from the table of ordered array, we find that the lowest and
highest values are 18.3 and 38.8, respectively
Frequency Distribution Tables
We will use the given data to help us create equally
spaced intervals for tabulating frequencies of data

Determine the width of the intervals

➢ Although the number of intervals that one may

choose for a frequency distribution is arbitrary, the
actual number should depend on the range of the
data and the number of cases

➢ For a data set of 100 to 150 observations, the

number (of intervals) chosen usually ranges from
about five to ten
Frequency Distribution Cont.
In the present example, the range of the data is
38.8 – 18.3 = 20.5

Suppose we divide the data set into seven intervals.

Then, we have
20.5 ÷ 7 = 2.93
Which rounds to 3.0

• Consequently, the intervals will have a width of

three (3).
Frequency Distribution Cont.
• These seven intervals are as follows:
• 18.0 – 20.9
• 21.0 – 23.9
• 24.0 – 26.9
• 27.0 – 29.9
• 30.0 – 32.9
• 33.0 – 35.9
• 36.0 – 38.9
Frequency Distribution Cont.
Cumulative Frequency of BMI
Class Interval    Frequency   Cumulative       Relative
for BMI Levels    (f)         Frequency (cf)   Frequency (%)
18.0 - 20.9         6            6               5.00
21.0 - 23.9        24           30              20.00
24.0 - 26.9        32           62              26.67
27.0 - 29.9        28           90              23.33
30.0 - 32.9        15          105              12.50
33.0 - 35.9         9          114               7.50
36.0 - 38.9         6          120               5.00
Total             120           --             100.00
Frequency Distribution Cont.
Relative Frequency Table for BMI Levels
Class Interval    Relative         Cumulative Relative
for BMI Levels    Frequency (%)    Frequency (%)
18.0 - 20.9         5.00             5.00
21.0 - 23.9        20.00            25.00
24.0 - 26.9        26.67            51.67
27.0 - 29.9        23.33            75.00
30.0 - 32.9        12.50            87.50
33.0 - 35.9         7.50            95.00
36.0 - 38.9         5.00           100.00
Total             100.00
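The tabulation can be reproduced with a short Python sketch. It is an illustration only: the BMI values are transcribed from the ordered array, and the names `LOWER_TENTHS`, `WIDTH_TENTHS` and `freq` are hypothetical. Values are converted to whole tenths so that class limits such as 24.0 are not affected by floating-point rounding.

```python
# BMI values transcribed from the ordered array (row by row).
bmi = [
    18.3, 21.9, 23.0, 24.3, 25.4, 26.6, 27.5, 28.8, 30.9, 34.8,
    19.2, 21.9, 23.1, 24.3, 25.6, 26.9, 27.5, 28.8, 30.9, 34.9,
    19.8, 21.9, 23.1, 24.5, 25.7, 27.1, 27.6, 28.9, 31.0, 35.0,
    20.2, 22.3, 23.3, 24.6, 25.7, 27.3, 28.2, 29.3, 31.1, 35.5,
    20.7, 22.3, 23.4, 24.6, 25.8, 27.3, 28.3, 29.5, 31.3, 35.8,
    20.8, 22.3, 23.5, 24.7, 25.8, 27.3, 28.3, 29.8, 31.6, 35.9,
    21.1, 22.4, 24.0, 24.7, 25.9, 27.3, 28.3, 30.0, 31.6, 36.6,
    21.1, 22.5, 24.0, 24.8, 25.9, 27.4, 28.4, 30.1, 32.6, 37.1,
    21.1, 22.7, 24.0, 24.8, 26.2, 27.4, 28.6, 30.2, 32.8, 37.5,
    21.3, 22.7, 24.1, 25.0, 26.5, 27.4, 28.7, 30.3, 33.2, 37.8,
    21.3, 22.8, 24.1, 25.4, 26.5, 27.4, 28.7, 30.8, 33.6, 38.2,
    21.5, 22.9, 24.2, 25.4, 26.5, 27.4, 28.8, 30.8, 34.2, 38.8,
]

LOWER_TENTHS = 180  # lower limit of the first class (18.0), in tenths
WIDTH_TENTHS = 30   # class width (3.0), in tenths
N_CLASSES = 7

# Count how many values fall in each class interval.
freq = [0] * N_CLASSES
for x in bmi:
    tenths = round(x * 10)  # integer tenths avoid float edge cases
    freq[(tenths - LOWER_TENTHS) // WIDTH_TENTHS] += 1

n = len(bmi)
rel = [round(100 * f / n, 2) for f in freq]  # relative frequency (%)

# Print frequency, cumulative frequency and the two percentage columns.
cum = 0
for k, f in enumerate(freq):
    cum += f
    lo = (LOWER_TENTHS + k * WIDTH_TENTHS) / 10
    hi = lo + 2.9
    print(f"{lo:.1f} - {hi:.1f} {f:4d} {cum:4d} {rel[k]:7.2f} {100 * cum / n:7.2f}")
```

The relative frequencies always sum to 100%, which is a quick check on any hand-built table.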
GRAPHICAL REPRESENTATION OF DATA
[Figure: "BMI of 120 US Adults", frequency plotted against class interval for BMI levels]
GRAPHICAL REPRESENTATION OF DATA
[Figure: "Relative Frequency (%)", relative frequency plotted against class interval]
GRAPHICAL REPRESENTATION OF DATA
[Figure: "Cumulative Relative Frequency (%) of BMI", cumulative relative frequency (%) plotted against class interval]
GRAPHICAL REPRESENTATION OF DATA
Box plot (Box and Whisker plot)
Scatterplots
• Scatterplots display the relationship between two
continuous variables.

• Each dot on the graph has an X and Y coordinate that

corresponds with a pair of values.

Eg. BMI and %Fat

[Figure: scatterplot of BMI (vertical axis) against %Fat (horizontal axis)]
SAMPLING
Scenario:
In a certain community, an opinion poll was
conducted to determine public sentiment
toward a health policy in an upcoming
election. The objective of the survey was to
estimate the proportion of voters in the
community who favoured the policy
Sampling
Note the following:
➢ An Element is an object on which a measurement is
taken
In our scenario, an element is a registered voter in the
community.

The measurement taken on an element is the voter’s

preference on the health policy

Because measurements are usually considered to be

numbers, the experimenter can obtain numerical data
by recording a 1 for a voter in favour of the policy and
a 0 for a voter not in favour
Sampling
Population
➢ is a collection of elements about which we wish to
make an inference

➢ a collection of people or objects that share common

observable characteristics.

The population in our example is the collection of voters

in the community

The characteristic (numerical measurement) of interest

for each member of this population is his or her
preference on the health policy
Sampling
An important task for the investigator is to
carefully and completely define the population
before collecting a sample.

This definition must include:

➢a description of the elements to be included
➢a specification of the measurements to be taken

Eg; if the population in the health policy study consists of

registered voters, then we may want to collect information
on whether or not each sampled person plans to vote in the
upcoming election.
Sampling
The population may be finite or infinite
➢ Finite Population: If the number of objects or units
in the population is countable, it is said to be a finite
population.
Eg; the number of houses in a suburb is a finite
population
➢ Infinite Population: If the number of objects or units
in the population is infinite, it is said to be an infinite
population.
Eg; the number of stars in the sky forms an infinite
population.
Sampling
In general, the population is denoted by  and its
size is denoted by N.
In the case of an infinite population, N → ∞

➢ Target Population: A finite or infinite population

about which we require information is called target
population.
• Eg, all 19-year-old female students in the UDS.
Sampling
➢Study Population: the basic finite set of
individuals we intend to study.
• Eg, All 19 year old female students of UDS
who pursue Public Health
➢Sampling units are nonoverlapping collections of
elements from the population that cover the entire
population

In the health policy example, a sampling unit may

be a registered voter in the community
Sampling
If each sampling unit contains one and only one
element of the population, then a sampling unit and
an element from the population are identical.

In our example, this situation arises if we sample individual voters rather than households within the community.
Sampling
A frame is a list of sampling units.

If we specify the individual voter as the sampling

unit, a list of all registered voters may serve as a
frame for a public opinion poll
Sampling

A sample
➢is a collection of sampling units drawn from a
single frame or from multiple frames.

➢It is a subset of the population, which represents

the entire population. The sample is denoted by
s and its size by n
Sampling
Data are obtained from the elements of the sample
and used in describing the population

Let the individual voter be our sampling unit and the list
of registered voters be our frame. In the public opinion
poll, we contact a number of voters (the sample) to
determine their preference for the upcoming health policy
(measurement). We then use the information obtained
from these voters to make an inference about voter
preference throughout the community
Sampling

Advantages of Sampling
➢Reduced Cost
➢Greater Speed
➢Greater Scope
➢Greater Accuracy
Sampling
• The procedure for selecting the sample is called the
sample survey design
• The objective of a sample survey is to make an inference
about population parameters from information contained
in a sample
• Since observations cost money, a design that provides a
precise estimator of the parameter for a fixed sample size
yields a savings in cost to the experimenter
Sampling
Steps to Sample Selection
1. The objective of the survey
2. Population to be sampled
3. Data to be collected
4. Degree of precision desired
5. Methods of measurement
6. Frame
7. Selection of the sample
8. The Pretest
9. Organization of the fieldwork
10. Summary and analysis of the data
11. Information gained for future surveys
Sampling Designs
Some of the designs include
➢Simple random sampling,
➢Stratified random sampling,
➢Cluster sampling,
➢Systematic sampling.
➢Convenience sampling
➢Judgment Sampling
Etc
Sampling Methods:
Simple Random Sampling
➢If a sample of size n is drawn from a population of
size N such that every possible sample of size n
has the same chance of being selected, the
sampling procedure is called Simple Random
Sampling.

• The sample thus obtained is called a Simple

Random Sample.
Sampling Methods
Per this definition;
❖All individual elements in a population have the
same chance of being selected and that the
selection of individual elements is mutually
independent;

❖The presence or absence of a given element from

the sample does not affect the selection probability
of any other element
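Simple random sampling can be sketched with Python's standard `random` module. This is a minimal illustration: the frame of 100 voter labels and the function name `simple_random_sample` are hypothetical. `random.Random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct sampling units from the frame, all subsets equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical frame: a list of 100 registered voters.
frame = [f"voter_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```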
Sampling Methods
Stratified Random Sampling

A stratified random sample is one obtained by

separating the population elements into
nonoverlapping groups, called strata, and then
selecting a simple random sample from each
stratum.
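A stratified random sample can be sketched as one simple random sample per stratum. The strata names, their sizes and the function `stratified_sample` below are hypothetical and only illustrate the idea.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, n_per_stratum[name])
        for name, units in strata.items()
    }


# Hypothetical population split into two nonoverlapping strata.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
picked = stratified_sample(strata, {"urban": 6, "rural": 4}, seed=2)
print(picked)
```

Here the sample sizes are proportional to the stratum sizes (10% of each), which is one common allocation; other allocations are possible.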
Sampling Methods
Cluster Sampling
• In cluster sampling, the elements in the population
are first divided into separate groups called clusters.
Each element of the population belongs to one and
only one cluster

• A simple random sample of the clusters is then

taken.

• All elements within each sampled cluster form the

sample

• Cluster sampling tends to provide the best results

when the elements within the clusters are not alike
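The cluster sampling steps above can be sketched as follows. The household clusters and the function name `cluster_sample` are hypothetical. Clusters are chosen by simple random sampling, and every element of a chosen cluster enters the sample.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements


# Hypothetical clusters: households, each containing its voters.
clusters = {
    "household_A": ["A1", "A2", "A3"],
    "household_B": ["B1", "B2"],
    "household_C": ["C1", "C2", "C3", "C4"],
    "household_D": ["D1"],
}
chosen, elements = cluster_sample(clusters, 2, seed=3)
print(chosen, elements)
```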
Sampling Designs
Systematic Sampling
Systematic sampling is a probability sampling method in which the researcher selects a random starting point in the target population and then selects every subsequent sample member at a fixed ‘sampling interval’
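A minimal sketch of systematic sampling, assuming the frame is an ordered list and the sampling interval is taken as k = N // n (the function name `systematic_sample` is hypothetical):

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units: a random start, then every k-th unit after it."""
    k = len(frame) // n           # sampling interval (requires n <= len(frame))
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting point within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame of 100 numbered units.
frame = list(range(100))
sample = systematic_sample(frame, 10, seed=4)
print(sample)
```

Only the starting point is random. Once it is fixed, the rest of the sample is determined by the interval.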
Introduction to Probability
➢Probability is a numerical measure of the likelihood that a particular event will occur.
The concept of probability is relevant to experiments
that have some uncertain outcomes.

Experiment includes any activity that results in the

collection of data pertaining to phenomena that exhibit
variation.
Probability
The domain of probability encompasses all phenomena for
which outcomes cannot be exactly predicted in advance

Eg.
➢ Tossing a coin
➢ Rolling a die
➢ Gender of the first two newborns in town tomorrow
➢ Tossing two coins
➢ Rolling two dice
➢ Planting a seed, etc
Probability
Event:
An outcome or a set of outcomes of an activity, or a result of a trial.
Usually denoted by capital letters
Eg. Getting two heads in the trial of tossing three fair coins simultaneously.
Elementary Event (Simple Event): a single possible outcome of an experiment.
Eg. If we toss a fair coin, then the event of a head coming up is an elementary event
P(E): the probability of event E occurring
Probability
Joint Events (Compound event): has two or more
elementary events in it.
Eg. Drawing a black ace from a pack of cards.
(this contains two elementary events of black and ace)

Sample Space (S): The collection of all possible outcomes

of an experiment

Eg. If we toss a die, one sample space, or set of all possible

outcomes, is given by S = {1, 2, 3, 4, 5, 6}
P(S) = 1
Probability
Remember:
For every event A;
➢ $0 \le P(A) \le 1$
a probability is between 0 and 1

➢ the impossible event has probability zero
$P(\varnothing) = 0$
➢ If $A'$ is the complement of A, then
$P(A') = 1 - P(A)$
➢ If $A = A_1 \cup A_2 \cup \dots \cup A_n$, where $A_1, A_2, \dots, A_n$ are mutually exclusive events (they cannot all occur at the same time as the outcome of a single experiment), then:
$P(A) = P(A_1) + P(A_2) + \dots + P(A_n)$
Probability
❖When two or more events are mutually exclusive,
then the probability that either of the events will
occur is the sum of their separate probabilities
Eg. If we roll a single fair die, let event A be that it comes up with face 5 and event B that it comes up with face 6. These two events are mutually exclusive, so we have:
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$$
$$P(5 \text{ or } 6) = P(5 \cup 6) = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$$
Probability
However, if events A and B are not mutually exclusive, then the probability of occurrence of either event A or event B or both is equal to:
$$P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$$
P(A and B) is written as $P(A \cap B)$ or simply as P(AB)
➢Event (A and B) consists of all those outcomes which are contained in both A and B simultaneously.
Probability
Eg. In an experiment of drawing a card from a pack of 52 playing cards, we assume that:
Event A = An ace is drawn
Event B = A spade is drawn
Event AB = The ace of spades is drawn
Hence;
$$P(A \cup B) = P(A) + P(B) - P(AB) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$
This is because there are 4 aces and 13 spades, including 1 ace of spades, out of a total of 52 cards in the pack.
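The card example can be checked by listing the whole pack and counting. This is a small sketch using exact fractions; the helper name `prob` and the rank and suit labels are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event when each card is equally likely to be drawn."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


def is_ace(card):
    return card[0] == "A"


def is_spade(card):
    return card[1] == "spades"


p_a = prob(is_ace)                                    # 4/52
p_b = prob(is_spade)                                  # 13/52
p_ab = prob(lambda c: is_ace(c) and is_spade(c))      # 1/52
p_union_rule = p_a + p_b - p_ab                       # addition rule
p_union_direct = prob(lambda c: is_ace(c) or is_spade(c))  # direct count
print(p_union_rule, p_union_direct)
```

Both routes give the same probability, which illustrates why P(AB) must be subtracted: the ace of spades would otherwise be counted twice.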
Probability
Note:
➢Independent events
➢Multiplication rule
➢Conditional Probability
➢Counting rule
➢Combination
➢Permutation
Probability
Counting Methods
• Suppose two operations A and B are carried out, and if
there are “m” different ways of carrying out A and “k”
different ways of carrying out B, then the combined
operation of A and B may be carried out in m × k = mk
different ways.
Eg. If we toss a coin 3 times, there are 2 possible outcomes for each toss. Hence the total number of possible outcomes is 2 × 2 × 2 = 8.
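The counting rule for three coin tosses can be illustrated by listing every outcome with `itertools.product` (a small sketch; the variable names are arbitrary):

```python
from itertools import product

# Every sequence of H/T over three tosses.
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]
print(outcomes)
print(len(outcomes))  # equals 2 * 2 * 2
```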
PROBABILITY DISTRIBUTION
Probability Distribution:
➢It’s a mathematical function that describes the probabilities of all possible outcomes of a random variable
➢It’s the listing of all possible outcomes of an
experiment together with their probabilities.
Eg. If we toss a fair coin two times; the following will
be the possible outcome of this experiment
Outcomes Probabilities
TT ¼
TH ¼
HT ¼
HH ¼
Probability Distribution
The probability distribution of the number of heads obtained in these two tosses of the coin is given as:
No. of heads (X) Probability P(X)

0 1/4
1 1/2
2 1/4
Total 1.0

All probabilities must add up to 1; remember!

Probability Distribution
Random Variables:
➢They are characteristics that can be observed but
cannot be controlled.

➢They can be characteristics, measurements or

counts that vary randomly according to a function.

Random: you don’t know the value of the next

observation, but you do know the probabilities
associated with values and ranges of values.
Probability Distribution
Notation:
➢Upper case letters such as X and Y are used to
denote random variables.
➢Lower case letters such as x and y denote the
values of random variables.

Random variables can either be discrete or

continuous
Probability Distribution
Probability Distribution Function (PDF) of Discrete
Random Variables
➢ The table that shows the values x in one column and the
corresponding probabilities, p(x) in another column is
called a probability distribution function (PDF)
In our earlier example on the probability distribution of the
number of heads:
No. of heads (X) Probability P(X)
0 1/4
1 ½
2 ¼
Total 1.0
Probability Distribution
Thus, p(x) satisfies the following characteristics:
➢ $0 \le p(x) \le 1$
Each probability is between zero and one, inclusive.
➢ $\sum_{i=1}^{N} p(x_i) = 1$
The sum of the probabilities is one

Mean/Expected Value and Variance of a Probability Dist
Let X be a random variable, and x1, x2, . . . xn the list of possible outcomes for X.
Then the mean of the distribution and the expected value of X are the same quantity, given by:
$$\mu = E(X) = \sum_{i=1}^{n} x_i P(x_i)$$

Eg. Assume that we toss three fair coins

simultaneously, the possible number of heads that
can appear as a result of the random experiment are
as shown:
Mean/Expected Value and Variance of a Probability Distribution

Outcome No. of Heads Probabilities

TTT 0 1/8

HTT 1 1/8
TTH 1 1/8
THT 1 1/8
THH 2 1/8
HHT 2 1/8
HTH 2 1/8
HHH 3 1/8
Mean
Summarizing the number of heads occurring in the entire experiment and their corresponding probabilities, we have:
No. of Heads (X)   P(X)   X·P(X)
0                  1/8    0
1                  3/8    3/8
2                  3/8    3/4
3                  1/8    3/8
Total              1.0    3/2
The expected value is:
$$E(X) = \sum_{i=1}^{n} x_i P(x_i) = 0 + \frac{3}{8} + \frac{3}{4} + \frac{3}{8} = \frac{12}{8} = 1.5$$
Mean
The result means that, on average, 1.5 heads can be expected to appear each time three fair coins are tossed together.
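The expected value can be checked by enumerating the eight equally likely outcomes and applying E(X) = Σ x·P(x). This is a sketch with exact fractions; the names `dist` and `mean` are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give 0, 1, 2 or 3 heads.
counts = Counter(o.count("H") for o in outcomes)

# Probability distribution of X = number of heads.
dist = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

# Expected value: sum of x * P(x).
mean = sum(x * p for x, p in dist.items())
print(sorted(dist.items()))
print(mean)
```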

Eg.
Variance
Let X be a random variable, and x1, x2, . . . xn the list of possible outcomes for X.
Then:
The variance of the random variable is:
$$\sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$$

And the standard deviation is:
$$\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)}$$
Variance
From our random experiment of tossing three fair coins once:
No. of Heads (X)   P(X)   μ     (X − μ)   (X − μ)²   (X − μ)² P(X)
0                  1/8    1.5   −1.5      2.25       0.28125
1                  3/8    1.5   −0.5      0.25       0.09375
2                  3/8    1.5    0.5      0.25       0.09375
3                  1/8    1.5    1.5      2.25       0.28125
Total                                                0.75

Variance: $\sigma^2 = \sum (X - \mu)^2 P(X) = 0.75$
Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{0.75} \approx 0.866$
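The variance calculation can be verified with exact fractions. This is a sketch that takes the distribution of the number of heads for three fair coins as given; the variable names are arbitrary.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of heads in three tosses of a fair coin.
dist = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

mu = sum(x * p for x, p in dist.items())               # mean
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # variance
sd = sqrt(var)                                         # standard deviation

print(mu, var, round(sd, 4))
```

Exact arithmetic gives a variance of 3/4. Rounding each term to two decimal places before summing gives a slightly smaller total, so it is safer to round only at the end.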
Binomial Distribution
Binomial distribution is a special discrete probability
distribution.
➢It is one of the simplest and most frequently used
discrete probability distributions

❖There are four conditions that the experiment has

to meet to be considered a binomial experiment:
i. There are a fixed number of trials.
Think of Trials as repetitions of an experiment.
The letter n denotes the number of trials.
Binomial Distribution
➢ There are only two possible and mutually exclusive
outcomes, called "success" and "failure," for each trial.

➢ The n trials are independent and are repeated using

identical conditions.

Because the n trials are independent, the outcome

of one trial does not help in predicting the outcome of
another trial.

➢ The letter p denotes the probability of a success on one

trial, and q, which is also (1-p); denotes the probability of
a failure on one trial, so p + q = 1.

Since the trials are independent, p and q remain the

same for each trial.
Binomial Distribution

Notation:
➢The outcomes of a binomial experiment fit a
binomial probability distribution.

➢The random variable X = the number of successes

obtained in the n independent trials.
➢ $X \sim B(n, p)$

This notation states that the random variable X follows a binomial distribution with n trials and probability of success p.
Binomial Distribution
➢ $P(X = k) = {}^{n}C_{k}\, p^{k} q^{n-k}$; this is the Binomial Formula
(the probability of k successes out of n trials)
Where:
$${}^{n}C_{k} = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
p = Probability of success
q = Probability of failure (1 – p)
k = number of successes desired
n = number of trials
Binomial Distribution
The formula is also expressed as
$$P(X = k) = \binom{n}{k} p^{k} q^{n-k}$$

Eg. If a new drug is found to be effective 40% of the

time, then what is the probability that in a random
sample of 4 patients, it will be effective on 2 of
them?
Soln.
Let us define effective as success and non-effective
as failure. Then;
Binomial Distribution
p = 0.4 (since the drug is effective 40% of the time)
q = (1 – p) = (1 – 0.4) = 0.6
k = 2
n = 4
$$P(X = k) = \binom{n}{k} p^{k} q^{n-k}$$
$$P(X = 2) = \binom{4}{2} (0.4)^{2} (0.6)^{2} = \frac{4!}{2!\,(4-2)!} (0.4)^{2} (0.6)^{2} = 6 \times 0.16 \times 0.36 = 0.3456$$
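The worked example can be checked with a few lines of Python. This is a sketch; the function name `binomial_pmf` is hypothetical, while `math.comb` is the standard-library function for the number of combinations.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p
    return comb(n, k) * p**k * q ** (n - k)


# Drug example: n = 4 patients, p = 0.4, k = 2 successes.
prob = binomial_pmf(2, 4, 0.4)
print(round(prob, 4))
```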
Binomial Distribution
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$

From our earlier example:

$\mu = (4)(0.4) = 1.6$

$\sigma^2 = (4)(0.4)(0.6) = 0.96$

$\sigma = \sqrt{(4)(0.4)(0.6)} = 0.9798$

Binomial Distribution
Eg1. Suppose you play a game that you can only either win or lose. The
probability that you win any game is 55%, and the probability that you
lose is 45%. Each game you play is independent.
a. If you play the game 20 times, write the function that describes
the probability that you will win 15 of the 20 times.
b. Find the mean number of wins
c. Find the standard deviation of wins

Eg.2. A trainer is teaching a dolphin to do tricks. The probability that

the dolphin successfully performs the trick is 35%, and the probability
that the dolphin does not successfully perform the trick is 65%. Out of
ten attempts, you want to find the probability that the dolphin
succeeds at most 5 times. State the probability question
mathematically.
Poisson Distribution
It differs from the binomial distribution in the sense that:

➢ in the Binomial distribution we must be able to count

the number of successes and the number of failures;

….with the goal to look for the probability of a

specific value of success in n trials

➢ while in Poisson distribution , all we want is to know

the average number of successes in a given unit of
time.

… to look for the specific number of

occurrences in a specific amount of time or space.
Conditions of a Poisson Experiment
➢ The experiment consists of counting the number of events
occurring in a fixed interval of time or space if these
events happen with a known average rate and
independently of the time since the last event.

➢ The probability of the event remains constant for each

interval of equal length.

➢ The number of occurrences in one fixed interval is random

and independent of the number of occurrences in other
fixed intervals.

➢ The random variable X = the number of occurrences in the

interval of interest
Poisson Distribution

▪ We need to know the average number of events per unit of time, $\lambda$ (lambda)
This could be:
➢The average number of cars passing under a
bridge in any given hour
➢The average number of machine breakdowns
per month
➢The average number of patients arriving at a
facility per day
➢Etc.
Poisson Distribution
➢The probability that exactly x events will occur in a given time is given as:
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
Where:
$\lambda$ = Average number of occurrences per unit time
e = the base of the natural logarithms (2.71828…)
$\lambda = np = \mu$ (the mean)
Poisson Distribution

Eg.1: Assume that on average 3 persons enter the lab for service every 10 minutes. What is the probability that exactly 5 persons will enter the lab in a given 10-minute period, assuming that the process can be described by a Poisson distribution?
Poisson Distribution
Soln.
x = 5
$\lambda = 3$
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
$$P(5) = \frac{(3)^{5} (2.71828)^{-3}}{5!} = \frac{(243)(0.0498)}{120} = 0.1008$$
Poisson Distribution
Eg.2. Customers arrive at a photocopying machine at
an average rate of 2 every 10 minutes. The number
of arrivals is distributed according to a Poisson
distribution.

What is the probability that:

a. There will be no arrival during this time period
b. There will be exactly one arrival during this time
period
c. There will be more than two arrivals during this
time period
Poisson Distribution
Soln.
From our problem, we have:
$\lambda = 2$
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad \text{for } x = 0, 1, 2, \dots$$
a. For x = 0: $P(0) = \frac{(2)^{0} (2.71828)^{-2}}{0!} = 0.1353$
b. For x = 1: $P(1) = \frac{(2)^{1} (2.71828)^{-2}}{1!} = 0.2707$
For x = 2: $P(2) = \frac{(2)^{2} (2.71828)^{-2}}{2!} = 0.2707$
Poisson Distribution
c. Then, the probability of more than two arrivals in a 10-minute period is
$$P(x > 2) = 1 - [P(0) + P(1) + P(2)] = 1 - [0.1353 + 0.2707 + 0.2707] = 1 - 0.6767 = 0.3233$$
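Both Poisson examples can be checked with a short Python sketch. The function name `poisson_pmf` is hypothetical; `math.exp` and `math.factorial` are standard-library functions.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average rate is lam."""
    return lam**x * exp(-lam) / factorial(x)


# Eg.1: average of 3 arrivals per 10 minutes, exactly 5 arrivals.
p5 = poisson_pmf(5, 3)

# Eg.2: average of 2 arrivals per 10 minutes.
p0 = poisson_pmf(0, 2)
p1 = poisson_pmf(1, 2)
p2 = poisson_pmf(2, 2)
p_more_than_2 = 1 - (p0 + p1 + p2)  # complement of "at most two arrivals"

print(round(p5, 4))
print(round(p0, 4), round(p1, 4), round(p2, 4), round(p_more_than_2, 4))
```

Using the complement avoids summing the infinitely many terms P(3) + P(4) + ... directly.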
CONTINUOUS RANDOM VARIABLES

47 pages
Data Collection and Processing Methods
No ratings yet
Data Collection and Processing Methods
115 pages
Data Collection by DR Poonam
No ratings yet
Data Collection by DR Poonam
51 pages
Bio Statistics
No ratings yet
Bio Statistics
86 pages
PHS203 Biostatistics 2
No ratings yet
PHS203 Biostatistics 2
34 pages
Introduction to Biostatistics Techniques
No ratings yet
Introduction to Biostatistics Techniques
93 pages
Biostatistics: Dr. Naresh Manandhar Associate Professor Department of Community Medicine
100% (1)
Biostatistics: Dr. Naresh Manandhar Associate Professor Department of Community Medicine
76 pages
Share Mbbs - Lecture2 2024 (1) - 1
No ratings yet
Share Mbbs - Lecture2 2024 (1) - 1
39 pages
Chapter 3 Introduction To Health Data Collection
No ratings yet
Chapter 3 Introduction To Health Data Collection
15 pages
Intro Stat
No ratings yet
Intro Stat
47 pages
Introduction to Data Management Concepts
No ratings yet
Introduction to Data Management Concepts
54 pages
Data Management and Statistical Analysis
No ratings yet
Data Management and Statistical Analysis
13 pages
Session 1
No ratings yet
Session 1
46 pages
0 Ppt1 Introduction To Biostatistics123
No ratings yet
0 Ppt1 Introduction To Biostatistics123
59 pages
Bioe Prelims
No ratings yet
Bioe Prelims
76 pages
Data Collection and Analysis Guide
No ratings yet
Data Collection and Analysis Guide
35 pages
Social Research Methodology Guide
No ratings yet
Social Research Methodology Guide
36 pages
Understanding Business Statistics Basics
No ratings yet
Understanding Business Statistics Basics
38 pages
24 Data Collection Lecture DR - NJ 2023
No ratings yet
24 Data Collection Lecture DR - NJ 2023
52 pages
Introduction to Biostatistics Concepts
No ratings yet
Introduction to Biostatistics Concepts
283 pages
Data Collection Methods and Types
No ratings yet
Data Collection Methods and Types
70 pages
Data Collection Methods Explained
No ratings yet
Data Collection Methods Explained
57 pages
Biostatistics Lecture Notes Overview
No ratings yet
Biostatistics Lecture Notes Overview
208 pages
Introduction to Biostatistics Overview
No ratings yet
Introduction to Biostatistics Overview
54 pages
Biostatistics in Neonatal Nursing Guide
No ratings yet
Biostatistics in Neonatal Nursing Guide
19 pages
Grammar Worksheet
No ratings yet
Grammar Worksheet
8 pages
3GPP TS 34.108 Release 6 Testing Configurations
No ratings yet
3GPP TS 34.108 Release 6 Testing Configurations
459 pages
Drama Project TD
No ratings yet
Drama Project TD
15 pages
4700 00001-Up MM LNK 1160093A
No ratings yet
4700 00001-Up MM LNK 1160093A
307 pages
Class, Control, and Classical Music Anna Bull Digital Version 2025
No ratings yet
Class, Control, and Classical Music Anna Bull Digital Version 2025
87 pages
Maintenance Supervisor Job Description
No ratings yet
Maintenance Supervisor Job Description
12 pages
LWU 4 Unit 5 Challenge Test
100% (3)
LWU 4 Unit 5 Challenge Test
3 pages
Supreme Court Petition: Simon vs. Silver Films
100% (3)
Supreme Court Petition: Simon vs. Silver Films
7 pages
RSG - Alrawabi
No ratings yet
RSG - Alrawabi
6 pages
- إشكالية مفهوم أخلاق بين فلسفة أخلاق وعلم أخلاق the problematic concept of ethics between the philosophy of ethics and morality
No ratings yet
- إشكالية مفهوم أخلاق بين فلسفة أخلاق وعلم أخلاق the problematic concept of ethics between the philosophy of ethics and morality
6 pages
Recruitment and Selection Process of Robi
No ratings yet
Recruitment and Selection Process of Robi
15 pages
"Stages of Human Development" Sigmund Freud's Psychosexual Theory
No ratings yet
"Stages of Human Development" Sigmund Freud's Psychosexual Theory
2 pages
Social Impact Monitoring Guide
No ratings yet
Social Impact Monitoring Guide
20 pages
Advance Presentation Skills
No ratings yet
Advance Presentation Skills
22 pages
FM 5551 2016 - Strainers For Use With Water Spray Systems
100% (1)
FM 5551 2016 - Strainers For Use With Water Spray Systems
17 pages
Eco Friendly Bulb Front Page Created Merged
No ratings yet
Eco Friendly Bulb Front Page Created Merged
6 pages
Cloud Computing and DevOps Job Guarantee Program
No ratings yet
Cloud Computing and DevOps Job Guarantee Program
16 pages
Catalogue Mac 2024 - 240301 - 000709
No ratings yet
Catalogue Mac 2024 - 240301 - 000709
44 pages
Qristianobis Kvlevebi 2011 N6
No ratings yet
Qristianobis Kvlevebi 2011 N6
336 pages
Present Perfect (Just, Already, Yet)
No ratings yet
Present Perfect (Just, Already, Yet)
85 pages
33KV Feeder Control Panel Details
No ratings yet
33KV Feeder Control Panel Details
60 pages
Les Réseaux: Édition 2024-2026
No ratings yet
Les Réseaux: Édition 2024-2026
49 pages
For The Strength of Youth
No ratings yet
For The Strength of Youth
13 pages
Art of Joinery
100% (3)
Art of Joinery
29 pages
Leaders in Training Program Invitation
No ratings yet
Leaders in Training Program Invitation
10 pages
Intro Psych Ques Set 1
No ratings yet
Intro Psych Ques Set 1
5 pages
HVAC Market Entry Strategy in Brazil
No ratings yet
HVAC Market Entry Strategy in Brazil
12 pages
Level 1 ISO Previous Year Papers Class 2 - Olympiad Topper
No ratings yet
Level 1 ISO Previous Year Papers Class 2 - Olympiad Topper
10 pages
The Byzantine Gold Standard Explained
No ratings yet
The Byzantine Gold Standard Explained
27 pages
Job Advert - National Bank of Malawi PLC - Resourcing Officer
No ratings yet
Job Advert - National Bank of Malawi PLC - Resourcing Officer
2 pages