Mat 101 Basic Statistics PPP Iph

Uploaded by

climbercollins6
Copyright
© © All Rights Reserved
UNIVERSITY FOR DEVELOPMENT STUDIES

SCHOOL OF PUBLIC HEALTH


DEPARTMENT OF GLOBAL AND INTERNATIONAL HEALTH
BSC PUBLIC HEALTH
Introductory statistics (MAT 101)

William Nkegbe
DATA
Data: the numerical result of any scientific experiment.

It is the information that you gather for an inquiry.
Variate: each quantity or attribute recorded on each respondent or subject
Observation: each individual piece of data
• Data set/Data matrix: the collection of all observations for particular
variables
DATA Cont.
Data fall into two major categories: quantitative and qualitative.
QUANTITATIVE DATA: observations that are measured on a numerical scale.
They are numbers that result from counting or measuring attributes of a
population.

Eg. pulse rate, weight, amount of money, number of students who take
statistics, temperature, etc.
DATA Cont.
Quantitative data may be either discrete or continuous.

Quantitative Discrete Data: the result of counting; they take only certain
numerical values.
Eg.
• number of phone calls
• number of hospital visits
• number of infected persons
• number of diarrhea episodes experienced in a year
• etc.
They take the values 0, 1, 2, 3, ...
DATA Cont

Quantitative Continuous Data: the result of measuring.

Eg. height, weight, temperature, etc.

These can take any numeric value, and the scale can be meaningfully divided
into smaller increments, including fractional and decimal values.
DATA Cont
QUALITATIVE DATA:
➢They are the result of categorizing or describing attributes of a
population.
➢Their values CANNOT be put in any numerical order.
➢They are categorical rather than numerical and are not capable of being
measured.
Eg. blood type, sex, hair colour, political affiliation, etc.
DATA Cont
Qualitative data can be either:

➢Ordinal: if there is an intrinsic ordering of its categories.
Eg. severity of a disease, or a variable with the categories good, adequate
or poor.
Or

➢Nominal: if its categories are unordered and mutually exclusive.
Eg. gender, marital status, flower colour, etc.
SOURCES OF DATA
There are two main sources of data:

❖Primary Data: data collected by the researcher for the purpose of a
specific inquiry or study.

Such data is original in character and is generated by surveys conducted by
individuals or research institutions.

Eg. if we want to know what women think about the issue of abortion, we
must undertake a survey and collect data on the opinions of women by asking
relevant questions.
SOURCES OF DATA Cont
Secondary Data: data which has been collected by
other agencies and used by a researcher for
his own purpose.
• It can be obtained from research organizations,
journals, reports , government publications, etc
Eg. Climate data can be accessed from the Ghana
Meteorological Agency to assess the weather
conditions of a particular area.
DATA COLLECTION
➢Data collection is the process of gathering and
measuring information on variables of interest, in
an established systematic fashion that enables
one to answer stated research questions, test
hypotheses, and evaluate outcomes.

➢The data collection component of research is common to all fields of
study, including the physical and social sciences, humanities, business, etc.
Data Collection Cont
➢ While methods vary by discipline, the emphasis on ensuring accurate and
honest collection remains the same.

▪ It is important to ensure accurate and appropriate data collection.

▪ Accurate data collection is essential to maintaining the integrity of
research.

▪ Hence maintaining the integrity of data collection is very necessary.
Data Collection Cont
Note the following:

➢ As you begin thinking about a research question, also begin thinking about
the type of data you will have to collect to answer that question.

❖ Interview?
❖ Questionnaire?
❖ Paper and pencil?
❖ Computer?

Find out how other people have done it in the past by reading the relevant
literature.
Data Collection Cont
➢ Think about WHERE you will be getting the data

➢ Make sure that the data collection forms you use are
clear and easy to use.
▪ Do a pilot and practice on the collected data

➢ Always make a duplicate copy of the data file and the data collection
sheets and keep them in a separate location.

➢ Do not rely on other people to collect or transfer your data unless you
have personally trained them.
Data Collection Cont
➢As much as possible, cultivate possible sources
for your subject pool.

➢Try to follow up on subjects who missed their testing session or interview.

➢Never discard the original data.

➢The type or form of data collection depends on the type of study you are
undertaking.
Data Collection Cont
Eg
• Surveys and Cross-Sectional Studies
• Retrospective Studies
• Prospective Studies
• Experimental Studies and Quality Control
• Clinical Trials
• Epidemiological Studies
• Pharmacoeconomic Studies and Quality of Life
• Etc
DATA ENTRY
As an investigator, once you know what information
you want and how you are going to obtain it, you
need to decide how you will assign names and
attributes to the data.

➢Each type of observation should be given a name.

Eg
If one is studying systolic blood pressure of males by
age, one could have a small data set that contains
Data Entry
• ID
• Age,
• Systolic (an abbreviation for systolic blood
pressure),
• Gender and
• Smoke (smoking status)
• Etc


Note that abbreviated words are often used to identify the variables.


Data Entry Cont.
Depending on the software the researcher is using,
data can be entered directly or could be entered
into one and imported to another.

The researcher could have a data set such as the one shown:

Id   Age   Systolic   Sex   Smoke
1    57    123        1     1
2    71    137        2     2
3    35    128        1     3
4    60    155        2     1
Data Cont.
Note from the table that:
➢Each row represents a new case and each
column a different type of observation (called
variables)
➢ You will need to code each qualitative response
in advance.
Here;
1 = male and 2 = female.
Also, 1 = current smoker, 2 = former smoker, and
3 = nonsmoker
➢Each person or item is given an ID number.
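The coding scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary names (`SEX_CODES`, `SMOKE_CODES`) and the `decode` helper are hypothetical, while the codes and the four cases are taken from the table.

```python
# Hypothetical code books; the numeric codes follow the slide.
SEX_CODES = {1: "male", 2: "female"}
SMOKE_CODES = {1: "current smoker", 2: "former smoker", 3: "nonsmoker"}

# Each row is one case; each key is one variable.
rows = [
    {"id": 1, "age": 57, "systolic": 123, "sex": 1, "smoke": 1},
    {"id": 2, "age": 71, "systolic": 137, "sex": 2, "smoke": 2},
    {"id": 3, "age": 35, "systolic": 128, "sex": 1, "smoke": 3},
    {"id": 4, "age": 60, "systolic": 155, "sex": 2, "smoke": 1},
]

def decode(row):
    """Return a copy of the row with coded fields replaced by their labels."""
    out = dict(row)
    out["sex"] = SEX_CODES[row["sex"]]
    out["smoke"] = SMOKE_CODES[row["smoke"]]
    return out

decoded = [decode(r) for r in rows]
for r in decoded:
    print(r)
```

Keeping the numeric codes in the stored data and decoding only for display means the code book can be changed without retyping the data.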
Data Entry Cont.

Note also that:

❖In medical and many other studies, names are often not used, for
confidentiality reasons.

❖If the investigator wishes to compare the results for two or more groups,
a variable should be entered that represents group identity.
ORGANISING / REPRESENTING DATA
Data can be organised into:
• Ordered Arrays
• The Frequency Table
• Relative Frequency Tables
• Stem and Leaf Tables

and/or represented on graphs:

• Histograms
• Frequency polygons
• Box-and-whisker plots
• Dot plots
• Distribution curves
Organizing Data Cont.
The data below is a list of 120 values of Body Mass Index (BMI) data
from the 1998 National Health Interview Survey on US adults.

27.4 31.0 34.2 28.9 25.7 37.1 24.8 34.9 27.5 25.9
23.5 30.9 27.4 25.9 22.3 21.3 37.8 28.8 28.8 23.4
21.9 30.2 24.7 36.6 25.4 21.3 22.9 24.2 27.1 23.1
28.6 27.3 22.7 22.7 27.3 23.1 22.3 32.6 29.5 38.8
21.9 24.3 26.5 30.1 27.4 24.5 22.8 24.3 30.9 28.7
22.4 35.9 30.0 26.2 27.4 24.1 19.8 26.9 23.3 28.4
20.8 26.5 28.2 18.3 30.8 27.6 21.5 33.6 24.8 28.3
25.0 35.8 25.4 27.3 23.0 25.7 22.3 35.5 29.8 27.4
31.3 24.0 25.8 21.1 21.1 29.3 24.0 22.5 32.8 38.2
27.3 19.2 26.6 30.3 31.6 25.4 34.8 24.7 25.6 28.3
26.5 28.3 35.0 20.2 37.5 25.8 27.5 28.8 31.1 28.7
24.1 24.0 20.7 24.6 21.1 21.9 30.8 24.6 33.2 31.6
Organizing Data Cont
Ordered Array
18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.8
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8

By inspection from the table of ordered array, we find that the lowest and
highest values are 18.3 and 38.8, respectively
Frequency Distribution Tables
We will use the given data to help us create equally
spaced intervals for tabulating frequencies of data

Determine the width of the intervals

➢ Although the number of intervals that one may choose for a frequency
distribution is arbitrary, the actual number should depend on the range of
the data and the number of cases.

➢ For a data set of 100 to 150 observations, the number of intervals chosen
usually ranges from about five to ten.
Frequency Distribution Cont.
In the present example, the range of the data is
38.8 – 18.3 = 20.5

Suppose we divide the data set into seven intervals.

Then we have
20.5 ÷ 7 = 2.93,
which rounds to 3.0.

• Consequently, the intervals will have a width of three (3).
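The width calculation can be sketched in Python as follows. The lowest and highest values and the choice of seven intervals come from the example; starting the first interval at the whole number below the lowest value, and ending each interval 0.1 below the next lower limit, are assumptions made here to match data recorded to one decimal place.

```python
import math

lowest, highest = 18.3, 38.8    # from the ordered array
n_intervals = 7                 # chosen by the analyst

data_range = highest - lowest           # about 20.5
raw_width = data_range / n_intervals    # about 2.93
width = round(raw_width)                # rounds to 3

# Assumption: start at the whole number just below the lowest value.
start = math.floor(lowest)
intervals = [
    f"{start + i * width:.1f} - {start + (i + 1) * width - 0.1:.1f}"
    for i in range(n_intervals)
]
for label in intervals:
    print(label)
```

Rounding to the nearest whole number works here because 7 intervals of width 3 cover 21 units, which is more than the range of 20.5. With other data it may be necessary to round up instead so that the highest value is not left out.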
Frequency Distribution Cont.
• These seven intervals are as follows:
• 18.0 – 20.9
• 21.0 – 23.9
• 24.0 – 26.9
• 27.0 – 29.9
• 30.0 – 32.9
• 33.0 – 35.9
• 36.0 – 38.9
Frequency Distribution Cont.
Cumulative Frequency of BMI
Class Interval    Frequency   Cumulative       Relative
for BMI Levels    (f)         Frequency (cf)   Frequency (%)
18.0 - 20.9         6            6               5.00
21.0 - 23.9        24           30              20.00
24.0 - 26.9        32           62              26.67
27.0 - 29.9        28           90              23.33
30.0 - 32.9        15          105              12.50
33.0 - 35.9         9          114               7.50
36.0 - 38.9         6          120               5.00
Total             120           --             100.00
Frequency Distribution Cont.
Relative Frequency Table for BMI Levels
Class Interval    Relative        Cumulative Relative
for BMI Levels    Frequency (%)   Frequency (%)
18.0 - 20.9         5.00             5.00
21.0 - 23.9        20.00            25.00
24.0 - 26.9        26.67            51.67
27.0 - 29.9        23.33            75.00
30.0 - 32.9        12.50            87.50
33.0 - 35.9         7.50            95.00
36.0 - 38.9         5.00           100.00
Total             100.00             --
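As a check on the arithmetic, the cumulative and relative columns can be recomputed from the class frequencies. The sketch below is illustrative only; the variable names are hypothetical and the seven frequencies are those counted from the ordered array.

```python
from itertools import accumulate

labels = ["18.0 - 20.9", "21.0 - 23.9", "24.0 - 26.9", "27.0 - 29.9",
          "30.0 - 32.9", "33.0 - 35.9", "36.0 - 38.9"]
freqs = [6, 24, 32, 28, 15, 9, 6]

total = sum(freqs)                            # number of observations
cum = list(accumulate(freqs))                 # cumulative frequency
rel = [100 * f / total for f in freqs]        # relative frequency (%)
cum_rel = [100 * c / total for c in cum]      # cumulative relative frequency (%)

for row in zip(labels, freqs, cum, rel, cum_rel):
    print("{:<12} {:>4} {:>4} {:>7.2f} {:>7.2f}".format(*row))
```

Each relative frequency is the class frequency divided by the total (for example 28 ÷ 120 = 23.33%), and the last cumulative relative frequency must equal 100%.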
GRAPHICAL REPRESENTATION OF DATA
[Figure: histogram titled "BMI of 120 US Adults". Vertical axis: Frequency. Horizontal axis: Class Interval for BMI Levels, from 18.0 - 20.9 to 36.0 - 38.9.]


GRAPHICAL REPRESENTATION OF DATA
[Figure: chart titled "Relative Frequency (%)". Vertical axis: Relative Frequency. Horizontal axis: Class Interval, from 18.0 - 20.9 to 36.0 - 38.9.]
GRAPHICAL REPRESENTATION OF DATA
[Figure: chart titled "Cumulative Relative Frequency (%) of BMI". Vertical axis: Cumulative Rel Freq (%). Horizontal axis: Class Interval, from 18.0 - 20.9 to 36.0 - 38.9.]
GRAPHICAL REPRESENTATION OF DATA
Box plot (Box and Whisker plot)
Scatterplots
• Scatterplots display the relationship between two
continuous variables.

• Each dot on the graph has an X and Y coordinate that corresponds with a
pair of values.

Eg. BMI and %Fat

[Figure: scatterplot. Vertical axis: BMI. Horizontal axis: %Fat.]
SAMPLING
Scenario:
In a certain community, an opinion poll was
conducted to determine public sentiment
toward a health policy in an upcoming
election. The objective of the survey was to
estimate the proportion of voters in the
community who favoured the policy
Sampling
Note the following:
➢ An Element is an object on which a measurement is
taken
In our scenario, an element is a registered voter in the
community.

The measurement taken on an element is the voter’s preference on the health
policy.

Because measurements are usually considered to be numbers, the experimenter
can obtain numerical data by recording a 1 for a voter in favour of the
policy and a 0 for a voter not in favour.
Sampling
Population
➢ is a collection of elements about which we wish to
make an inference

➢ a collection of people or objects that share common observable
characteristics.

The population in our example is the collection of voters in the community.

The characteristic (numerical measurement) of interest for each member of
this population is his or her preference on the health policy.
Sampling
An important task for the investigator is to
carefully and completely define the population
before collecting a sample.

This definition must include:

➢a description of the elements to be included
➢a specification of the measurements to be taken

Eg. if the population in the health policy study consists of registered
voters, then we may want to collect information on whether or not each
sampled person plans to vote in the upcoming election.
Sampling
The population may be finite or infinite.
➢ Finite Population: if the number of objects or units in the population is
countable, it is said to be a finite population.
Eg. the houses in a suburb form a finite population.
➢ Infinite Population: if the number of objects or units in the population
is infinite, it is said to be an infinite population.
Eg. the stars in the sky are treated as an infinite population.
Sampling
In general, the population is denoted by  and its
size is denoted by N.
In the case of an infinite population, N → ∞

➢ Target Population: a finite or infinite population about which we require
information is called the target population.
• Eg. all 19-year-old female students at UDS.
Sampling
➢Study Population: the basic finite set of individuals we intend to study.
• Eg. all 19-year-old female students of UDS who pursue Public Health
➢Sampling units are nonoverlapping collections of elements from the
population that cover the entire population.

In the health policy example, a sampling unit may be a registered voter in
the community.
Sampling
If each sampling unit contains one and only one
element of the population, then a sampling unit and
an element from the population are identical.

In our example, this situation arises if we sample individual voters rather
than households within the community.
Sampling
A frame is a list of sampling units.

If we specify the individual voter as the sampling unit, a list of all
registered voters may serve as a frame for a public opinion poll.
Sampling

A sample
➢is a collection of sampling units drawn from a
single frame or from multiple frames.

➢It is a subset of the population, which represents the entire population.
The sample is denoted by s and its size by n.
Sampling
Data are obtained from the elements of the sample
and used in describing the population

Let the individual voter be our sampling unit and the list
of registered voters be our frame. In the public opinion
poll, we contact a number of voters (the sample) to
determine their preference for the upcoming health policy
(measurement). We then use the information obtained
from these voters to make an inference about voter
preference throughout the community
Sampling

Advantages of Sampling
➢Reduced Cost
➢Greater Speed
➢Greater Scope
➢Greater Accuracy
Sampling
• The procedure for selecting the sample is called the
sample survey design
• The objective of a sample survey is to make an inference
about population parameters from information contained
in a sample
• Since observations cost money, a design that provides a
precise estimator of the parameter for a fixed sample size
yields a savings in cost to the experimenter
Sampling
Steps to Sample Selection
1. The objective of the survey
2. Population to be sampled
3. Data to be collected
4. Degree of precision desired
5. Methods of measurement
6. Frame
7. Selection of the sample
8. The Pretest
9. Organization of the fieldwork
10. Summary and Analysis of the data
11. Information gained for future surveys
Sampling Designs
Some of the designs include
➢Simple random sampling,
➢Stratified random sampling,
➢Cluster sampling,
➢Systematic sampling.
➢Convenience sampling
➢Judgment Sampling
Etc
Sampling Methods:
Simple Random Sampling
➢If a sample of size n is drawn from a population of
size N such that every possible sample of size n
has the same chance of being selected, the
sampling procedure is called Simple Random
Sampling.

• The sample thus obtained is called a Simple Random Sample.
Sampling Methods
Per this definition;
❖All individual elements in a population have the same chance of being
selected, and the selection of individual elements is mutually independent;

❖The presence or absence of a given element in the sample does not affect
the selection probability of any other element.
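A simple random sample can be drawn in Python with the standard library. The sketch below is only an illustration: the frame of 500 voter ID numbers and the sample size of 20 are made up, and `random.sample` draws without replacement, so that every subset of size n is equally likely.

```python
import random

# Hypothetical frame: ID numbers for N = 500 registered voters.
frame = list(range(1, 501))
n = 20

rng = random.Random(42)          # fixed seed so the draw can be repeated
sample = rng.sample(frame, n)    # n distinct units, drawn without replacement
print(sorted(sample))
```

In practice the frame would be the actual list of sampling units, and the seed would be recorded so that the selection can be audited.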
Sampling Methods
Stratified Random Sampling

A stratified random sample is one obtained by separating the population
elements into nonoverlapping groups, called strata, and then selecting a
simple random sample from each stratum.
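The definition can be sketched in Python as one simple random sample per stratum. Everything named here is hypothetical: the `stratified_sample` helper, the urban and rural strata, and the sample sizes are illustrations, not part of the course material.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of sizes[name] units from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name])
            for name, units in strata.items()}

# Hypothetical strata: voters grouped by area of residence.
strata = {
    "urban": [f"U{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(200)],
}
sizes = {"urban": 12, "rural": 8}   # proportional to stratum size

sample = stratified_sample(strata, sizes, seed=1)
for name, units in sample.items():
    print(name, units)
```

Here the sample sizes are proportional to the stratum sizes (300:200 = 12:8), which is one common allocation; other allocations are possible.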
Sampling Methods
Cluster Sampling
• In cluster sampling, the elements in the population
are first divided into separate groups called clusters.
Each element of the population belongs to one and
only one cluster

• A simple random sample of the clusters is then taken.

• All elements within each sampled cluster form the sample.

• Cluster sampling tends to provide the best results when the elements
within the clusters are not alike.
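The steps above can be sketched in Python: sample whole clusters at random, then keep every element in the chosen clusters. The four neighbourhood clusters and their household labels are hypothetical.

```python
import random

# Hypothetical clusters: households grouped by neighbourhood.
clusters = {
    "north": ["N1", "N2", "N3"],
    "south": ["S1", "S2"],
    "east": ["E1", "E2", "E3", "E4"],
    "west": ["W1", "W2", "W3"],
}

rng = random.Random(7)
# Step 1: simple random sample of the clusters themselves.
chosen = rng.sample(sorted(clusters), 2)
# Step 2: every element of each chosen cluster enters the sample.
sample = [unit for name in chosen for unit in clusters[name]]
print(chosen, sample)
```

Note that the final sample size is not fixed in advance; it depends on how large the selected clusters happen to be.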
Sampling Designs
Systematic Sampling
Systematic sampling is a probability sampling method in which the
researcher chooses elements from a target population by selecting a random
starting point and then selecting sample members at a fixed ‘sampling
interval’.
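A minimal Python sketch of this procedure follows. It assumes the sampling interval is taken as k = N // n (frame size divided by sample size, rounded down) and that the starting point is drawn from the first k units; the helper name and the frame of 100 IDs are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first k units, then every k-th unit after it."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n               # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # random starting point in 0 .. k-1
    return frame[start::k][:n]

# Hypothetical frame of 100 ID numbers; sample 10 of them.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

With 100 units and a sample of 10, the interval is 10, so consecutive sampled IDs always differ by 10.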
Introduction to Probability
➢Probability is a numerical measure of the likelihood that a particular
event will occur.
The concept of probability is relevant to experiments that have uncertain
outcomes.

An experiment is any activity that results in the collection of data
pertaining to phenomena that exhibit variation.
Probability
The domain of probability encompasses all phenomena for
which outcomes cannot be exactly predicted in advance

Eg.
➢ Tossing a coin
➢ Rolling a die
➢ Gender of the first two newborns in town tomorrow
➢ Tossing two coins
➢ Rolling two dice
➢ Planting a seed, etc
Probability
Event:
An outcome or a set of outcomes of an activity or a result of
a trial.
Usually denoted by capital letters
Eg. Getting two heads in the trial of tossing three fair
coins simultaneously.
Elementary Event (Simple Event): a single possible
outcome of an experiment.
Eg. If we toss a fair coin, then the event of a head coming up is an
elementary event.
P(E): the probability of event E occurring
Probability
Joint Events (Compound event): has two or more
elementary events in it.
Eg. Drawing a black ace from a pack of cards.
(this contains two elementary events of black and ace)

Sample Space (S): the collection of all possible outcomes of an experiment.

Eg. If we toss a die, one sample space, or set of all possible outcomes, is
given by S = {1, 2, 3, 4, 5, 6}
P(S) = 1
Probability
Remember:
For every event A:
➢ $0 \le P(A) \le 1$
a probability is between 0 and 1

➢ the impossible event has probability zero:
$P(\emptyset) = 0$
➢ If $A'$ is the complement of A, then
$P(A') = 1 - P(A)$
➢ If $A = A_1 \cup A_2 \cup \dots \cup A_n$, where $A_1, A_2, \dots, A_n$ are
mutually exclusive events (no two can occur together as the outcome of a
single experiment), then
$P(A) = P(A_1) + P(A_2) + \dots + P(A_n)$
Probability
❖When two or more events are mutually exclusive, the probability that any
one of them will occur is the sum of their separate probabilities.
Eg. Suppose we roll a single fair die and want the probability that it comes
up with face 5 (event A) or face 6 (event B). The two events are mutually
exclusive, so we have:
$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$
$P(5 \text{ or } 6) = P(5 \cup 6) = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$
Probability
However, if events A and B are not mutually exclusive, then the probability
of occurrence of either event A or event B or both is equal to:
$P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$
P(A and B) is written as $P(A \cap B)$ or simply as P(AB)
➢The event (A and B) consists of all those outcomes which are contained in
both A and B simultaneously.
Probability
Eg. In an experiment of drawing a card from a pack of 52 playing cards, we
define:
Event A = an ace is drawn
Event B = a spade is drawn
Event AB = the ace of spades is drawn
Hence:

$P(A \cup B) = P(A) + P(B) - P(AB) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$

This is because there are 4 aces and 13 spades, including 1 ace of spades,
out of a total of 52 cards in the pack.
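The addition rule for this example can be checked by listing the whole pack and counting. The sketch below is illustrative; the rank and suit labels and the `prob` helper are hypothetical names, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))            # 52 (rank, suit) pairs

A = {card for card in deck if card[0] == "A"}        # an ace is drawn
B = {card for card in deck if card[1] == "spades"}   # a spade is drawn

def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))

direct = prob(A | B)                          # count the union directly
by_rule = prob(A) + prob(B) - prob(A & B)     # addition rule
print(direct, by_rule)
```

Both routes give 4/13: counting the 16 cards that are an ace or a spade directly, or adding the two probabilities and subtracting the overlap (the ace of spades) once.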
Probability
Note:
➢Independent events
➢Multiplication rule
➢Conditional Probability
➢Counting rule
➢Combination
➢Permutation
Probability
Counting Methods
• Suppose two operations A and B are carried out, and if
there are “m” different ways of carrying out A and “k”
different ways of carrying out B, then the combined
operation of A and B may be carried out in m × k = mk
different ways.
Eg. If we toss a coin 3 times, there are 2 possible outcomes for each toss.
Hence the total number of possible outcomes is 2 x 2 x 2 = 8.
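The counting rule for the three coin tosses can be illustrated by listing every outcome. This is a small sketch using Python's `itertools.product`; the variable names are illustrative.

```python
from itertools import product

# Each toss has 2 possible results; three tosses are combined.
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]
print(outcomes)
print(len(outcomes))
```

The list contains HHH, HHT, HTH, HTT, THH, THT, TTH and TTT, which agrees with 2 x 2 x 2.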
PROBABILITY DISTRIBUTION
Probability Distribution:
➢It is a mathematical function that gives the probabilities of all possible
outcomes of a random variable.
➢It is the listing of all possible outcomes of an experiment together with
their probabilities.
Eg. If we toss a fair coin two times, the following will be the possible
outcomes of this experiment:
Outcomes Probabilities
TT ¼
TH ¼
HT ¼
HH ¼
Probability Distribution
The probability distribution of the number of heads obtained in these two
tosses of the coin is given as:
No. of heads (X) Probability P(X)

0 1/4
1 1/2
2 1/4
Total 1.0

All probabilities must add up to 1; remember!


Probability Distribution
Random Variables:
➢They are characteristics that can be observed but
cannot be controlled.

➢They can be characteristics, measurements or counts that vary randomly
according to a function.

Random: you don’t know the value of the next observation, but you do know
the probabilities associated with values and ranges of values.
Probability Distribution
Notation:
➢Upper case letters such as X and Y are used to
denote random variables.
➢Lower case letters such as x and y denote the
values of random variables.

Random variables can be either discrete or continuous.
Probability Distribution
Probability Distribution Function (PDF) of Discrete
Random Variables
➢ The table that shows the values x in one column and the
corresponding probabilities, p(x) in another column is
called a probability distribution function (PDF)
In our earlier example on the probability distribution of the
number of heads:
No. of heads (X) Probability P(X)
0 1/4
1 ½
2 ¼
Total 1.0
Probability Distribution
Thus, p(x) satisfies the following characteristics:
➢ $0 \le p(x) \le 1$
Each probability is between zero and one, inclusive.
➢ $\sum_{i=1}^{N} p(x_i) = 1$
The sum of the probabilities is one.


Mean/Expected Value and Variance of a Probability Dist
Let X be a random variable, and x1, x2, . . ., xn the list of possible
outcomes for X.
Then the mean of the distribution and the expected value of X are the same
quantity, given by:
$\mu = E(X) = \sum_{i=1}^{n} x_i P(x_i)$

Eg. Assume that we toss three fair coins simultaneously. The possible numbers
of heads that can appear as a result of the random experiment are as shown:
Mean/Expected Value and Variance of a Probability Distribution

Outcome No. of Heads Probabilities


TTT 0 1/8

HTT 1 1/8
TTH 1 1/8
THT 1 1/8
THH 2 1/8
HHT 2 1/8
HTH 2 1/8
HHH 3 1/8
Mean
Summarizing to the number of heads occurring in the entire
experiment and their corresponding probabilities, we have;
No of Heads (X) P(X) X.P(X)
0 1/8 0
1 3/8 3/8
2 3/8 ¾
3 1/8 3/8
Total 1.0 3/2
The expected value is:
$E(X) = \sum_{i=1}^{n} X_i P(X_i) = 0 + \frac{3}{8} + \frac{3}{4} + \frac{3}{8} = \frac{12}{8} = 1.5$
Mean
The result means that on the average, 1.5 heads can
be expected to appear as a result of every random
experiment of tossing three fair coins at any one
time.
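The expected value for the three-coin experiment can be reproduced by listing the eight equally likely outcomes and applying the formula. The sketch below is illustrative; the names `dist` and `mean` are hypothetical, and exact fractions are used.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))              # 8 equally likely outcomes
counts = Counter(o.count("H") for o in outcomes)      # outcomes per number of heads

# Probability distribution of X = number of heads.
dist = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# E(X) = sum of x * P(x) over all possible values x.
mean = sum(x * p for x, p in dist.items())
print(dist)
print(mean)
```

The result is 3/2, that is 1.5 heads on average per experiment.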

Variance
Let X be a random variable, and x1, x2, . . ., xn the list of possible
outcomes for X.
Then:
The variance of the random variable is:
$\sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$

And the standard deviation is:
$\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)}$
Variance
From our random experiment of tossing three fair coins once:
No. of Heads (X)   P(X)   μ     (X − μ)   (X − μ)²   (X − μ)² P(X)
0                  1/8    1.5   −1.5      2.25       0.28125
1                  3/8    1.5   −0.5      0.25       0.09375
2                  3/8    1.5    0.5      0.25       0.09375
3                  1/8    1.5    1.5      2.25       0.28125
Total                                                0.75

Variance: $\sigma^2 = \sum (X - \mu)^2 P(X) = 0.75$
Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{0.75} \approx 0.866$
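The variance calculation can be checked with exact fractions, which avoids rounding error in the intermediate products. This is an illustrative sketch; the distribution is the one for the number of heads in three tosses, and the variable names are hypothetical.

```python
import math
from fractions import Fraction

# Distribution of X = number of heads in three tosses of a fair coin.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in dist.items())                     # expected value
variance = sum((x - mean) ** 2 * p for x, p in dist.items())   # sum of (x - mean)^2 * P(x)
sd = math.sqrt(variance)                                       # standard deviation

print(mean, variance, round(sd, 3))
```

The exact variance is 3/4 = 0.75 and the standard deviation is about 0.866.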
Binomial Distribution
Binomial distribution is a special discrete probability
distribution.
➢It is one of the simplest and most frequently used discrete probability
distributions.

❖There are four conditions that an experiment has to meet to be considered
a binomial experiment:
i. There are a fixed number of trials.
Think of Trials as repetitions of an experiment.
The letter n denotes the number of trials.
Binomial Distribution
➢ There are only two possible and mutually exclusive
outcomes, called "success" and "failure," for each trial.

➢ The n trials are independent and are repeated using identical conditions.

Because the n trials are independent, the outcome of one trial does not
help in predicting the outcome of another trial.

➢ The letter p denotes the probability of a success on one trial, and q,
which is (1 - p), denotes the probability of a failure on one trial, so
p + q = 1.

Since the trials are independent, p and q remain the same for each trial.
Binomial Distribution

Notation:
➢The outcomes of a binomial experiment fit a
binomial probability distribution.

➢The random variable X = the number of successes obtained in the n
independent trials.
➢ $X \sim B(n, p)$

This notation states that the random variable X follows a binomial
distribution with n trials and probability of success p.
Binomial Distribution
➢ $P(X = k) = {}^{n}C_{k}\, p^{k} q^{n-k}$; this is the Binomial Formula
(the probability of k successes out of n trials)
Where:
${}^{n}C_{k} = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
p = probability of success
q = probability of failure (1 – p)
k = number of successes desired
n = number of trials
Binomial Distribution
The formula is also expressed as
$p(x) = \binom{n}{k} p^{k} q^{n-k}$

Eg. If a new drug is found to be effective 40% of the time, what is the
probability that in a random sample of 4 patients it will be effective on
exactly 2 of them?
Soln.
Let us define effective as success and non-effective as failure. Then:
Binomial Distribution
p = 0.4 (since the drug is effective 40% of the time)
q = (1 – p) = (1 – 0.4) = 0.6
k = 2
n = 4
$P(x) = \binom{n}{k} p^{k} q^{n-k}$
$= \binom{4}{2} (0.4)^{2} (0.6)^{2}$
$= \frac{4!}{2!\,(4-2)!} (0.4)^{2} (0.6)^{2}$
$= 6 \times 0.16 \times 0.36$
$= 0.3456$
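The same calculation can be sketched in Python using `math.comb` for the number of combinations. The function name `binomial_pmf` is a hypothetical helper written for this illustration, not a library function.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Drug effective 40% of the time, 4 patients, effective on exactly 2.
prob = binomial_pmf(2, 4, 0.4)
print(round(prob, 4))

# The probabilities for k = 0, 1, ..., n must add up to 1.
total = sum(binomial_pmf(k, 4, 0.4) for k in range(5))
print(round(total, 4))
```

Summing the probabilities over all possible values of k is a quick check that the formula has been applied correctly.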
Binomial Distribution
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$

From our earlier example:

$\mu = (4)(0.4) = 1.6$
$\sigma^2 = (4)(0.4)(0.6) = 0.96$
$\sigma = \sqrt{(4)(0.4)(0.6)} = 0.9798$


Binomial Distribution
Eg.1. Suppose you play a game that you can only either win or lose. The
probability that you win any game is 55%, and the probability that you
lose is 45%. Each game you play is independent.
a. If you play the game 20 times, write the function that describes
the probability that you will win 15 of the 20 times.
b. Find the mean number of wins
c. Find the standard deviation of wins

Eg.2. A trainer is teaching a dolphin to do tricks. The probability that
the dolphin successfully performs the trick is 35%, and the probability
that the dolphin does not successfully perform the trick is 65%. Out of
ten attempts, you want to find the probability that the dolphin
succeeds at most 5 times. State the probability question
mathematically.
Poisson Distribution
It differs from the binomial distribution in the sense that:

➢ in the Binomial distribution we must be able to count the number of
successes and the number of failures, with the goal of finding the
probability of a specific number of successes in n trials;

➢ while in the Poisson distribution, all we need to know is the average
number of successes in a given unit of time, with the goal of finding the
probability of a specific number of occurrences in a specific amount of time
or space.
Conditions of a Poisson Experiment
➢ The experiment consists of counting the number of events
occurring in a fixed interval of time or space if these
events happen with a known average rate and
independently of the time since the last event.

➢ The probability of the event remains constant for each interval of equal
length.

➢ The number of occurrences in one fixed interval is random and independent
of the number of occurrences in other fixed intervals.

➢ The random variable X = the number of occurrences in the interval of
interest
Poisson Distribution

▪ We need to know the average number of events per unit of time,
$\lambda$ (lambda).
This could be:
➢The average number of cars passing under a bridge in any given hour
➢The average number of machine breakdowns per month
➢The average number of patients arriving at a facility per day
➢Etc.
Poisson Distribution
➢The probability that exactly x events will occur in a given time is given
as:
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
Where:
$\lambda$ = average number of occurrences per unit time
e = the base of the natural logarithms (2.71828…)
$\mu = np = \lambda$
Poisson Distribution

Eg.1: Assume that on average 3 persons enter the lab for service every 10
minutes. What is the probability that exactly 5 customers will enter the lab
in a given 10 minute period, assuming that the process can be described by
a Poisson distribution?
Poisson Distribution
Soln.
x = 5
$\lambda = 3$
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
$= \frac{(3)^{5} (2.71828)^{-3}}{5!}$
$= \frac{(243)(0.0498)}{120}$
$= 0.1008$
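The same Poisson probability can be computed in Python with the standard `math` module. The function name `poisson_pmf` is a hypothetical helper written for this illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average is lam."""
    return lam**x * exp(-lam) / factorial(x)

# On average 3 persons per 10 minutes; probability of exactly 5.
prob = poisson_pmf(5, 3)
print(round(prob, 4))
```

Using `exp(-lam)` instead of the rounded value 2.71828 keeps full precision until the final rounding.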
Poisson Distribution
Eg.2. Customers arrive at a photocopying machine at
an average rate of 2 every 10 minutes. The number
of arrivals is distributed according to a Poisson
distribution.

What is the probability that:


a. There will be no arrival during this time period
b. There will be exactly one arrival during this time
period
c. There will be more than two arrivals during this
time period
Poisson Distribution
Soln.
From our problem, we have:
$\lambda = 2$
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$, for x = 0, 1, 2, ...

a. For x = 0: $P(0) = \frac{(2)^{0} (2.71828)^{-2}}{0!} = 0.1353$

b. For x = 1: $P(1) = \frac{(2)^{1} (2.71828)^{-2}}{1!} = 0.2707$

For x = 2 (needed for part c): $P(2) = \frac{(2)^{2} (2.71828)^{-2}}{2!} = 0.2707$
Poisson Distribution
Then, the probability of more than two arrivals in a 10 minute period is
$P(x > 2) = 1 - [P(0) + P(1) + P(2)]$
$= 1 - [0.1353 + 0.2707 + 0.2707]$
$= 1 - 0.6767$
$= 0.3233$
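The three parts of the photocopier example can be checked with a short Python sketch. The helper `poisson_pmf` is hypothetical; the probability of more than two arrivals is obtained through the complement, as in the solution above.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average is lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2                                   # average arrivals per 10 minutes
p0 = poisson_pmf(0, lam)                  # a. no arrival
p1 = poisson_pmf(1, lam)                  # b. exactly one arrival
p2 = poisson_pmf(2, lam)
more_than_two = 1 - (p0 + p1 + p2)        # c. complement of "at most two"

print(round(p0, 4), round(p1, 4), round(p2, 4), round(more_than_two, 4))
```

The complement is used because "more than two" covers infinitely many values (3, 4, 5, ...), while "at most two" covers only three.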
CONTINUOUS RANDOM VARIABLES
