Updated Sta 101 Notes
Purpose
To introduce learners to the general concepts of probability and statistics, including Markov
chains.
Learning Outcomes
By the end of this course the student should be able to:
Obtain a frequency distribution from a given data set
Calculate simple probabilities, the mean, mode, median and standard deviation
Obtain the sample space for an experiment
Calculate the expected value of a random variable
Course Description
Frequency distributions, relative and cumulative distributions, various frequency curves, mean,
mode, median, quartiles and percentiles, standard deviation, symmetrical and skewed
distributions. Probability: sample space and events; definition of probability, properties of
probability; random variables; probability distributions; expected values of random variables.
Elements of Markov chains.
WEEK      CONTENT
1 and 2   Topic 1: Frequency distributions, relative and cumulative distributions, various frequency curves
Mode of Delivery
Online Lectures, Online Tutorials, Assignments, Online Class discussions & Illustrations.
Course Assessment
2 CATS- 20%
Assignment-10%
End of semester examination 70%
Total marks 100%
Topic 1: Frequency distributions, relative and cumulative distributions, various frequency curves
Frequency distributions
The term frequency distribution combines two words: frequency, meaning "the rate at which
something happens or is repeated" (Hornby A.S., 2010), that is, the number of occurrences of a
given phenomenon per unit time; and distribution, meaning "the spread of something over an
area" (Hornby A.S., 2010).
A frequency distribution can therefore be said to mean the spread of the rate of occurrence of a
phenomenon over an area. In statistics it shows how often each value, or group of values, occurs
in a data set.
Example
For a family planning body, a study of family size distribution revealed the number of children in
30 randomly chosen families.
Table 1: Number of children in 30 randomly chosen families:
1 2 4 0 2 3 1 4 2 3
5 2 2 3 2 2 3 1 2 3
2 0 1 1 2 0 3 2 3 3
A variable such as this, which takes only whole-number values, is called a discrete variable.
Otherwise the data set can be continuous, taking both whole-number and decimal values within
some range of numbers.
Other examples of discrete raw data forms can take the form of
To put our data in Table 1 in a more meaningful way, we count the number of times each value
occurs and form a frequency distribution:
Family size   0   1   2    3   4   5   Total
Frequency     3   5   11   8   2   1   30
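The counting step can be sketched in Python. This is a minimal illustration, assuming the 30 values of Table 1 are typed in by hand; the names family_sizes and frequency are chosen here for illustration only.

```python
from collections import Counter

# Number of children in the 30 families of Table 1
family_sizes = [
    1, 2, 4, 0, 2, 3, 1, 4, 2, 3,
    5, 2, 2, 3, 2, 2, 3, 1, 2, 3,
    2, 0, 1, 1, 2, 0, 3, 2, 3, 3,
]

# Count how many times each family size occurs
frequency = Counter(family_sizes)

# Print one row per distinct value, smallest first, then the total
for size in sorted(frequency):
    print(size, frequency[size])
print("Total", sum(frequency.values()))
```

The Counter plays the role of the tally: each distinct value becomes a key and its count becomes the frequency.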
Such a data set may also be represented by a bar graph in which the height of each bar represents
the frequency.
Figure 1: Bar graph for the distribution of family sizes among a group of 30 families
(horizontal axis: family size, 0 to 5; vertical axis: frequency, 0 to 10)
From the graph, this set of data clearly has a single mode, and the mode is 2 children per family.
Sometimes the data we handle can be very large, such as data on all employees in a county or a
country. Such data run into millions of data points; to handle this we group (order) the data.
Consider a dummy data set for the distribution of daily COVID-19 infections in a country over
a hundred days.
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
One may wish to answer a few questions from such a distribution, such as the average number of
infections.
To do this, the data set must first be fitted into a frequency distribution using tallies. The data is
ordered from the smallest value, 300, up to the largest, 440, and tallies are then obtained for
each value in the data set.
Value   Tally            Frequency   Relative frequency   Cumulative relative frequency
...
370     |||| |||| ||||   15          0.15                 0.68
...
440     |                1           0.01                 1
The number of tallies for each data point indicates how often the corresponding value occurs in
the sample and is called the absolute frequency, or simply the frequency, of the value. When the
frequency of a value is divided by the sample size, the value obtained is called the relative
frequency.
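The tallying procedure can be sketched in Python as follows. This is an illustrative sketch only: it assumes the 100 daily values listed above are pasted in as text, and the names infections and table are hypothetical.

```python
from collections import Counter

# The 100 daily infection counts listed above
data = """
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
"""
infections = [int(v) for v in data.split()]
n = len(infections)

counts = Counter(infections)

# Build rows of (value, frequency, relative frequency, cumulative relative frequency)
table = []
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    table.append((value, counts[value], counts[value] / n, cumulative / n))

for value, freq, rel, cum in table:
    print(f"{value:5d} {freq:4d} {rel:6.2f} {cum:6.2f}")
```

The relative frequencies add up to 1, so the last cumulative relative frequency is always 1.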
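Theorem 1:
Suppose that we are given a sample of size $n$ consisting of $m$ numerically different values
$x_1, x_2, \ldots, x_m$ with relative frequencies $\tilde{f}(x_1), \ldots, \tilde{f}(x_m)$. Then each relative
frequency satisfies $0 \le \tilde{f}(x_j) \le 1$, and $\sum_{j=1}^{m} \tilde{f}(x_j) = 1$.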
A histogram
This is a graph of a frequency distribution of numerical data for different categories of events,
individuals, or objects. A frequency distribution indicates the number of events, individuals, or
objects in each separate category. Most people easily understand histograms because they
resemble the bar graphs often seen in newspapers and magazines. An ogive is a graph of a
cumulative frequency distribution of the numerical data from the histogram.
A cumulative frequency distribution indicates the successive addition of the number of events,
individuals, or objects in the different categories of the histogram; expressed as a percentage, it
always reaches 100%. An ogive displays numerical data as a curve of increasing numbers or
percentages that eventually reach 100%, often S-shaped. Cumulative frequency distributions are
rarely used in newspapers and magazines, so ogives are less familiar than histograms. Frequency
data from a histogram, however, can easily be displayed in a cumulative frequency ogive.
Histograms and ogives have different shapes, which vary depending on the frequencies. An ogive
always increases from 0% to 100% for cumulative frequencies. The shape of a histogram
determines the shape of its related ogive:
A uniform histogram is flat; its ogive is a straight line sloping upward.
An increasing histogram has higher frequencies for successive categories; its ogive rises more
and more steeply (it is convex) and looks like part of a parabola.
A decreasing histogram has lower frequencies for successive categories; its ogive rises less and
less steeply (it is concave) and looks like part of a parabola.
A uni-modal histogram contains a single mound; its ogive is S-shaped.
A bi-modal histogram contains two mounds; its ogive can be either reverse S-shaped or double
S-shaped depending upon the data distribution.
A right-skewed histogram has a mound on the left and a long tail on the right; its ogive is
S-shaped with a large concave portion.
Example:
Using a table of classes and corresponding frequency provided, obtain a histogram for the data.
Solution
(a) Histogram
(b) Cumulative frequency curve
Exercise
Cumulative distributions
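The cumulative frequency function of a sample is a step function (also referred to as a piecewise
constant function) having jumps, at those values x that occur in the sample, of magnitude equal
to the relative frequency of x.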
When we are dealing with a large amount of continuous data, grouping the data set is unavoidable.
The procedure involves choosing class intervals in such a way that every class interval contains a
certain number of data points. The first class contains the smallest value and, progressively, the
last class contains the highest value. The mid-points of these classes are called class mid-points.
The number of data items in a class is the absolute class frequency.
Notice that if only a few classes are used the work is easy to do, but this comes at the expense of
losing a lot of information. The number of classes should be chosen in such a way that no useful
information is lost.
NB:
The classes are constructed so that they have equal length.
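Grouping into equal-length classes can be sketched in Python. This is only an illustration using the 100 values tabulated earlier in this topic; the lower limit of 300 and the class length of 20 are assumptions made here, not values prescribed by these notes, and the variable names are hypothetical.

```python
data = """
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
"""
values = [int(v) for v in data.split()]

lower = 300  # assumed lower limit of the first class
width = 20   # assumed common class length

# Assign every value to the class whose limits contain it
class_counts = {}
for v in values:
    start = lower + ((v - lower) // width) * width
    key = (start, start + width - 1)
    class_counts[key] = class_counts.get(key, 0) + 1

# Print class limits, class mid-point and absolute class frequency
for lo, hi in sorted(class_counts):
    midpoint = (lo + hi) / 2
    print(f"{lo}-{hi}  mid-point {midpoint}  frequency {class_counts[(lo, hi)]}")
```

Changing width shows the trade-off described above: a larger class length gives fewer classes and an easier table, but hides more of the detail in the data.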
Example:
The data set below represents farm sizes to the nearest hectare. Obtain a suitable table for
grouped data.
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
Solution:
Class     Class boundaries   Tally
360-375   360.5-375.5        ||
405-420   405.5-420.5        |
420-435   420.5-435.5
Topic 2: Mean, mode, median
A natural human tendency is to make comparisons with the “average”. For example, a student
scoring 40% in an examination will be happy with the result if the average score of the class is
25 %.
If the average class score is 90 %, then the student may not feel happy even if he got 70% right.
Some other examples of the use of “average” values in common life are mean body height, mean
temperature in July in some town, the most often selected study subject, the most popular TV
show in 2015, and average income. Various statistical concepts refer to the “average” of the data,
but the right choice depends upon the nature and scale of the data as well as the objective of the
study. Statistical functions (operations) that describe the average or centre of the data are called
location parameters or measures of central tendency. The most common of these are the mean,
the mode and the median.
The mean
Arithmetic Mean
The arithmetic mean is one of the most intuitive measures of central tendency. Suppose a
variable of size $n$ consists of the values $x_1, x_2, \ldots, x_n$. The arithmetic mean of this data is
defined as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
In informal language, we often speak of "the average" or just "the mean" when using this
formula.
This "typical value" of a data set is often denoted by $\bar{x}$. Therefore, if we have $n$ values
$x_1, x_2, \ldots, x_n$, then
$$\bar{x} = \frac{\sum x_i}{n}$$
The symbol $\sum$ means "the sum of" and is read as sigma, so $\sum x_i$ for $i = 1, 2, \ldots, n$ means
the sum of the values, which is $x_1 + x_2 + \cdots + x_n$.
And
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
The Median
After the mean, the most common measure of central tendency is the median. Like the mean, the
median provides a typical numerical value. The sample median, denoted by m, is the central
observation when all the data are arranged in increasing order, i.e. the value above and below
which lie an equal number of observations. When the number of observations is even, the median
is taken as the average of the two central observations.
The Mode
This is the third measure of average. It represents the most frequently occurring value. Consider
the simple data set {2, 0, 2, 3, 4, 4, 4, 7}. In this case the mode is 4, because 4 occurs most
often. On a relative frequency plot, if the data set is large enough, the mode takes approximately
a central position unless the distribution is skewed.
When the data are grouped into classes, the mode is represented by the midpoint of the interval
having the greatest class frequency; this class is then the modal class.
When the frequency distribution is portrayed as a smoothed curve, as in the figure below, the
mode corresponds to the observation value lying beneath the highest point on the frequency
curve, the location of the maximum clustering.
Example
A sample audit of a company's records showed the following numbers of plant accidents per
month.
0 1 4 4 7 2 2 6 7 2 0 1
Sol
Mean
$$\bar{x} = \frac{0+1+4+4+7+2+2+6+7+2+0+1}{12} = \frac{36}{12} = 3$$
Median
Sol
0 0 1 1 2 2 2 4 4 6 7 7
Median = 2
Mode
The mode of a set of data is the value that occurs most often. Here the value 2 occurs three
times, more than any other value, so the mode is 2.
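The three measures for the accident data can be checked with Python's standard statistics module. This is a minimal sketch; the variable name accidents is chosen here for illustration.

```python
import statistics

# Monthly plant accident counts from the example above
accidents = [0, 1, 4, 4, 7, 2, 2, 6, 7, 2, 0, 1]

mean = statistics.mean(accidents)      # sum of the values divided by their number
median = statistics.median(accidents)  # average of the two central values (n is even)
mode = statistics.mode(accidents)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Because there are 12 observations, the median is the average of the 6th and 7th values in the ordered list.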
Task 3
2076 2061 2595 1960 2980
a) Mean
b) Mode
c) Median
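The variance $s^2$ and standard deviation $s$ of $n$ values $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$ are
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
(some texts divide by $n - 1$ instead of $n$ when the data are a sample).
This formula is sometimes difficult to use, especially when $\bar{x}$ is not an integer. An alternative
formula can then be derived from
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$$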
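The agreement between the defining formula and the alternative formula can be checked numerically. The sketch below reuses the accident counts from the earlier example and assumes the variance is taken with divisor n (the mean squared deviation); the names var_definition and var_shortcut are hypothetical.

```python
import math

x = [0, 1, 4, 4, 7, 2, 2, 6, 7, 2, 0, 1]
n = len(x)
mean = sum(x) / n

# Defining formula: mean squared deviation from the mean (divisor n assumed)
var_definition = sum((xi - mean) ** 2 for xi in x) / n

# Alternative formula: mean of the squares minus the square of the mean
var_shortcut = sum(xi ** 2 for xi in x) / n - mean ** 2

sd = math.sqrt(var_definition)

print("variance (definition):", var_definition)
print("variance (alternative):", var_shortcut)
print("standard deviation:", sd)
```

The alternative form avoids subtracting the mean from every value, which is convenient for hand calculation when the mean is not a whole number.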
A distribution is said to be symmetrical when the distribution on either side of the mean is a
mirror image of the other, so that a clear line of symmetry exists and, along this line,
mean = median = mode.
Thus symmetry is said to exist in a distribution if the high values and low values balance
themselves out in their frequencies, that is, if the smoothed frequency polygon of the
distribution can be divided into two equal halves.
NB.
A symmetric distribution is not necessarily a normal distribution, yet all normal
distributions are symmetrical.
Skewness
i. Positive skewness
ii. Negative skewness
Measures of skewness
Based on tendency
Example
Use Pearson's second coefficient of skewness to calculate the skewness, if any, in the data set
Solution:
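As an illustration of the calculation, the sketch below applies Pearson's second coefficient of skewness, 3(mean - median)/standard deviation, to the accident counts from the earlier example rather than to this example's own data. The function name and the use of the divisor-n standard deviation are assumptions made here.

```python
import statistics

def pearson_second_skewness(values):
    """Return 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # divisor-n standard deviation, an assumption
    return 3 * (mean - median) / sd

# Accident counts reused from the earlier example (illustrative data)
sample = [0, 1, 4, 4, 7, 2, 2, 6, 7, 2, 0, 1]
skew = pearson_second_skewness(sample)

print("Pearson's second coefficient:", round(skew, 3))
```

A positive coefficient indicates positive skewness (mean above the median), a negative one indicates negative skewness, and a value near zero suggests an approximately symmetrical distribution.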
Topic 5: Probability: sample space and events
An experiment, or the observation of a phenomenon, may yield a set of data. The data set so
obtained is the result of some outcomes, each with some possibility of occurring. The set of all
possible outcomes of a statistical experiment is called the sample space for the experiment;
it is denoted by S. Each possible outcome of the statistical experiment is an element of the
sample space and is called a sample point.
A sample space that contains a finite number or a countable set (i.e., as many elements as there
are whole numbers) of sample points is a discrete sample space. Conversely, a sample space
that contains an infinite and uncountable set of sample points, with as many elements as there are
points on a line, is a continuous sample space.
The outcomes of two tosses of a coin, {HH, HT, TH, TT}, the sum of the outcomes in two rolls
of a die, drawings from a bag of mixed-color balls, and dealings from a regular 52-card deck are
all examples of discrete sample spaces. Another example is the number of roulette wheel spins
made before the ball lands on 25; the number can be 1, 2, 3, ... without upper limit, but it has
to be an integer, so it can take on as many values as there are whole numbers.
Sample spaces that contain the outcomes of temperature readings (e.g. temperature readings of
all employees entering a building, or of students sitting in a lecture hall), students' height
measurements, and workers' salaries are examples of continuous sample spaces.
An event is a subset of a sample space, e.g. the event that a student's temperature reads
37.1 °C. It may contain some, all or none of the outcomes comprising the sample space. If the
event contains only one sample point, it is a simple event. If the event contains two or more
sample points, it is a compound event, e.g. the event that an individual's body temperature falls
below 37.1 °C, whether the individual is male or female. If the event contains no sample points,
it is known as the null space; this is denoted by $\emptyset$.
Note
It can be shown that in any sample space the empty set { } is an event.
Example
(9,18),(9,20),
A= { }
B= { }
C= { }
A sample space like S above is clearly made up of simple events, which can in turn be grouped
to form subsets of the space. Such relationships are best described using set notation:
UNION
[Venn diagram: sets A and B]
INTERSECTION
[Venn diagram: sets A and B]
COMPLEMENT
The complement of a set A is the set of outcomes that appear in the sample space but are NOT
in A. It is commonly denoted by $A'$, $A^c$ or $\bar{A}$.
[Venn diagram: sets A and B]
We can obtain
and
A ={ }
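Union, intersection and complement can be illustrated with Python's built-in sets. The sample space and events below are hypothetical (one roll of a six-sided die) and are not the ones used in the example above.

```python
# Hypothetical sample space: one roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event: an even number
B = {4, 5, 6}  # event: a number greater than 3

union = A | B         # outcomes in A or in B (or in both)
intersection = A & B  # outcomes in both A and B
complement_A = S - A  # outcomes in S that are not in A

print("A union B:", sorted(union))
print("A intersection B:", sorted(intersection))
print("complement of A:", sorted(complement_A))
```

The complement is always taken relative to the sample space S, which is why it is computed as a set difference from S.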
Some events cannot occur together; such events are called mutually exclusive events. Take, for
example, the sets illustrated below.
[Venn diagram: non-overlapping sets A and B]
OR
The probability of an event is the proportion of times the event would occur in the long run if
the experiment were to be repeated over and over again. The capital letter P is commonly used
to stand for probability.
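The long-run idea can be illustrated with a small simulation. This is a sketch under stated assumptions: a fair coin is simulated with Python's pseudo-random number generator, the seed is fixed only so that a run can be repeated, and the function name is hypothetical.

```python
import random

random.seed(101)  # fixed seed so the run is repeatable

def relative_frequency_of_heads(tosses):
    """Simulate fair coin tosses and return the proportion that are heads."""
    heads = sum(1 for _ in range(tosses) if random.random() < 0.5)
    return heads / tosses

# The proportion of heads settles near 0.5 as the number of tosses grows
for tosses in (10, 100, 1000, 10000):
    print(tosses, relative_frequency_of_heads(tosses))
```

With few tosses the relative frequency can be far from 0.5; with many tosses it stays close to it, which is the sense in which probability is a long-run proportion.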
Thus given any experiment or Natural phenomenon
4.
5. Let $\emptyset$ denote the empty set. Then $P(\emptyset) = 0$.
Example
Let be the event a Student is late for class and let be the event it’s raining. Suppose
, and .
Find,
Solution
Random variables
4, 3, 4, 2, 5, 1, 6, 6, 5, 3, 2, 6, 5, 4, 6, 2, 1, 6, 2, 4
Such random variables can be discrete, as in the case above, or continuous, e.g. the heights of
students enrolling for a Statistics course, which might be 1.4 m, 1.7 m, 1.3 m, 1.6 m, etc.,
taking decimal values within a range.
Probability distributions
The probability distribution, or simply distribution, of a discrete random variable X is a list of
the distinct numerical values of X along with their associated probabilities, e.g.
Value of X   Probability
:            :
Example
Given that a variable X takes the values 0, 1, 2, 3, 4 with the following probabilities
Value (X)   Probability
0           0.02
1           0.23
2           0.40
3           0.25
4           0.10
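A table like this can be checked and queried in Python. The sketch below is illustrative only: the dictionary name distribution and the helper prob_at_most are hypothetical, and the cumulative probability computed is just one example of a question that can be asked of the table.

```python
# Probability distribution of X from the table above
distribution = {0: 0.02, 1: 0.23, 2: 0.40, 3: 0.25, 4: 0.10}

# A valid distribution has probabilities between 0 and 1 that sum to 1
total = sum(distribution.values())
all_valid = all(0 <= p <= 1 for p in distribution.values())

def prob_at_most(k):
    """Return P(X <= k) by adding the probabilities of all values up to k."""
    return sum(p for x, p in distribution.items() if x <= k)

print("sum of probabilities:", round(total, 10))
print("all between 0 and 1:", all_valid)
print("P(X <= 2):", round(prob_at_most(2), 10))
```

Probabilities of other events, such as X being at least 3, follow in the same way by adding the probabilities of the values that make up the event.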
Solution:
Example 2:
( )
Find
a)
b)
Solution
= = ( )
( )+ ( )
Topic 8: Expected values of random variables
(a) $\mu = E(X) = \sum_{j} x_j f(x_j)$ (discrete distribution)
Example
Upon examination of the claim records of 280 insurance policyholders over a period of five years,
the company now makes an empirical determination of the probability distribution of X = number
of claims per policyholder in 5 years.
Number of claims (x)   Probability
0                      0.307
1                      0.286
2                      0.204
3                      0.114
4                      0.064
5                      0.018
6                      0.007
(b) Calculate the standard deviation of X
Solution
$$E(X) = \sum_{x} x\,f(x)$$
And
$$\sigma = \sqrt{\sum_{x} (x - E(X))^2 f(x)}$$
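The expected value and standard deviation for the claims distribution can be computed directly from the table. This is a minimal sketch; the dictionary name claims is chosen here for illustration, and the calculation uses the usual discrete formulas (probability-weighted sum of the values, and probability-weighted sum of squared deviations from the mean).

```python
import math

# Probability distribution of X = number of claims per policyholder in 5 years
claims = {0: 0.307, 1: 0.286, 2: 0.204, 3: 0.114, 4: 0.064, 5: 0.018, 6: 0.007}

# Expected value: each value weighted by its probability
mean = sum(x * p for x, p in claims.items())

# Variance: squared deviations from the mean weighted by probability
variance = sum((x - mean) ** 2 * p for x, p in claims.items())
sd = math.sqrt(variance)

print("E(X):", round(mean, 3))
print("Var(X):", round(variance, 3))
print("SD(X):", round(sd, 3))
```

The expected value is about 1.42 claims per policyholder, with a standard deviation of about 1.35 claims.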
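(b) $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ (continuous distribution)
The function $f(x)$ is the probability function of the random variable $X$ considered; in the
discrete case the value $\mu$ is found by summing over all possible values. In the continuous case
the function $f(x)$ is called the density of $X$.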
Assumptions
NB
If the above are not satisfied then the distribution does not have a mean, a rare case.
If the distribution is symmetric about some number $c$, that is, $f(c - x) = f(c + x)$ for every
real value $x$, then it can be shown that $\mu = c$.
That is, a distribution that is symmetric about $c$ has mean $c$.
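The variance, on the other hand, is denoted by $\sigma^2$ and is obtained by
$\sigma^2 = \sum_{j} (x_j - \mu)^2 f(x_j)$ (discrete distribution)
$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$ (continuous distribution)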
Statistics, in my opinion, is the "lens" through which investigators see the greater picture of
world phenomena that would otherwise be difficult to grasp. The branch of statistics that helps
attain this is inferential statistics. This is done by performing experiments, or by observing
naturally occurring events where experiments are not possible to undertake.
The realizations are deemed to happen with some probability, and hence for the whole set we can
talk about the probability density or mass function (curve).
Bernoulli Distribution
Binomial distribution
Poisson Distribution
Normal Distribution
A Stochastic process
A stochastic process is a mathematical model that evolves over time in a probabilistic manner.
A Markov chain, named after the Russian mathematician A. A. Markov (1856-1922), is a special
kind of stochastic process where the outcome of an experiment or phenomenon depends only on
the outcome of the previous experiment or phenomenon. The next state of the system depends
only on the present state, not on the preceding states.
Suppose there is a physical or mathematical system that has n possible states and at any one time
the system is in one and only one of its n states. Assume also that at a given observation
period, say the k-th period, the probability of the system being in a particular state depends
only on its status at the (k-1)-th period. Such a system is called a Markov chain or Markov
process. Let us clarify this definition with the following example.
Example
Suppose a car rental agency has three locations in Ottawa: Downtown location (labeled A), East
end location (labeled B) and a West end location (labeled C). The agency has a group of delivery
drivers to serve all three locations. The agency's statistician has determined the following:
1. Of the calls to the Downtown location, 30% are delivered in Downtown area, 30% are
delivered in the East end, and 40% are delivered in the West end
2. Of the calls to the East end location, 40% are delivered in Downtown area, 40% are
delivered in the East end, and 20% are delivered in the West end
3. Of the calls to the West end location, 50% are delivered in Downtown area, 30% are
delivered in the East end, and 20% are delivered in the West end.
After making a delivery, a driver goes to the nearest location to make the next delivery. This
way, the location of a specific driver is determined only by his or her previous location.
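Writing these probabilities as a matrix, with one column for each starting location (A, B, C in
that order) and one row for each destination, gives
$$T = \begin{pmatrix} 0.3 & 0.4 & 0.5 \\ 0.3 & 0.4 & 0.3 \\ 0.4 & 0.2 & 0.2 \end{pmatrix}$$
T is called the transition matrix of the above system. In our example, a state is the location of a
particular driver in the system at a particular time. The entry $t_{ji}$ in the above matrix
represents the probability of transition from the state corresponding to i to the state
corresponding to j (e.g. the state corresponding to 2 is B).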
Assume that it takes each delivery person the same amount of time (say 15 minutes) to make a
delivery and then to get to their next location. According to the statistician's data, after 15
minutes, of the drivers that began in A, 30% will again be in A, 30% will be in B, and 40% will
be in C. Since all drivers are in one of those three locations after their delivery, each column
sums to 1. Because we are dealing with probabilities, each entry must be between 0 and 1,
inclusive. The most important fact that lets us model this situation as a Markov chain is that the
next location for delivery depends only on the current location, not previous history. It is also
true that our matrix of probabilities does not change during the time we are observing.
What is the probability (say, P) that a driver who begins in area C will be in area B after 2
deliveries? Think about how you can get from C to B in two steps. We can go from C to C, then
from C to B; we can go from C to B, then from B to B; or we can go from C to A, then from A
to B. To figure out P, let P(XY) denote the probability of going from X to Y in one delivery
(where X, Y can be A, B or C).
Remember that if two (or more) independent events must both (all) happen, we multiply their
probabilities together to obtain the probability of them both (all) happening. To obtain the
probability of any one of several mutually exclusive events happening, we add the probabilities
of those events together.
Thus, if we denote by P the probability that a delivery person goes from C to B in 2 deliveries,
then
P = [P(CA) and P(AB)] or [P(CB) and P(BB)] or [P(CC) and P(CB)]
This gives us
P = P(CA)P(AB) + P(CB)P(BB) + P(CC)P(CB)
for the probability that a delivery person goes from C to B in 2 deliveries.
Substituting into our formula using the statistician's data above gives
P = (0.5)(0.3) + (0.3)(0.4) + (0.2)(0.3) = 0.15 + 0.12 + 0.06 = 0.33
This tells us that if we begin at location C, we have a 33% chance of being in location B after 2
deliveries.
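The two-delivery calculation is the same as reading one entry of the matrix T multiplied by itself. The sketch below is illustrative: the matrix is built from the statistician's percentages with one column per starting location and one row per destination, and the helper names mat_mul and two_step are hypothetical.

```python
states = ["A", "B", "C"]

# T[j][i] = probability of moving to states[j] from states[i]; each column sums to 1
T = [
    [0.3, 0.4, 0.5],  # to A
    [0.3, 0.4, 0.3],  # to B
    [0.4, 0.2, 0.2],  # to C
]

def mat_mul(P, Q):
    """Multiply two square matrices given as lists of rows."""
    size = len(P)
    return [[sum(P[r][k] * Q[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

# Transition probabilities over two deliveries
T2 = mat_mul(T, T)

def two_step(start, end):
    """Probability of being at `end` two deliveries after starting at `start`."""
    return T2[states.index(end)][states.index(start)]

print("C to B in 2 deliveries:", round(two_step("C", "B"), 2))
```

The same function gives the two-delivery probability for any other pair of locations by changing its arguments.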
Task (try out):
Beginning at location B, what is the probability of being at location B after 2 deliveries?
Answer: 0.34
Considering the matrix T in the example above, we can describe the following elements of
the Markov chain:
(i) States
(ii) Transition probability
(iii) Transition probability matrix
(iv) Nature of the states
(v) Long run transition probability matrix
States
Transition probability
The entry $t_{ji}$ in the matrix T represents the probability of transition from the state
corresponding to i to the state corresponding to j.
The matrix T above is the transition probability matrix: it represents the transition
probabilities for the chain.
Nature of the states
A state can be described as transient, ergodic, accessible, communicating, closed,
recurrent, periodic or aperiodic, and a chain as irreducible, depending on the transition
probabilities.
The long run transition probability matrix represents convergence to a steady state after many
transitions. Consider the powers of T, which give the transition probabilities over 2, 3, 4, ...
deliveries.
What do you notice about these matrices as we take into account more and more
deliveries? The numbers in each row seem to be converging to a particular number.
Think about what this tells us about our long-term probabilities. This tells us that
after a large number of deliveries, it no longer matters which location we were in when
we started.
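The convergence can be seen by computing successive powers of the transition matrix. This is a sketch under stated assumptions: T is built from the statistician's percentages with one column per starting location (A, B, C), and the helper names mat_mul and mat_power are hypothetical.

```python
# T[j][i] = probability of moving to location j from location i (order A, B, C)
T = [
    [0.3, 0.4, 0.5],
    [0.3, 0.4, 0.3],
    [0.4, 0.2, 0.2],
]

def mat_mul(P, Q):
    """Multiply two square matrices given as lists of rows."""
    size = len(P)
    return [[sum(P[r][k] * Q[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

def mat_power(M, n):
    """Return M multiplied by itself n times (the identity matrix for n = 0)."""
    size = len(M)
    result = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result

# Print the transition probabilities after 1, 2, 4, 8 and 16 deliveries
for n in (1, 2, 4, 8, 16):
    print("after", n, "deliveries")
    for row in mat_power(T, n):
        print("  ", [round(p, 4) for p in row])
```

As n grows, every column approaches the same vector, roughly 0.389 for A, 0.333 for B and 0.278 for C, so the starting location stops mattering.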
Definitions
For a Markov chain with n states, the state vector is a column vector whose i-th
component represents the probability that the system is in the i-th state at that time. Note
that the sum of the entries of a state vector is 1. For example, vectors X0 and X1 in the
above example are state vectors. If $p_{ij}$ is the probability of movement (transition) from
state j to state i, then the matrix T = [$p_{ij}$] is called the transition matrix of the
Markov chain.
(b) The probability distribution of a random variable X , is given by
for x = 0, 1, 2, 3, 4. Given that t is a constant.
find:
(c) A random variable X, “the delay time in seconds before a School time keeper rings a
School bell” has a probability density function defined by
-----------------
(ii) the probability that the delay will be less than 4 seconds. [3marks]
(iii) the probability that the delay time will be between 2 and 6 seconds.
[3marks]
(f) In a vaccination exercise against a disease, it’s known that the probability of a child
reacting from injection of the serum is 0.001. In a Primary School of 2000 children,
what is the probability that out of 2000 pupils vaccinated?
(a) Given a discrete random variable R with its cumulative distribution F(r) shown in the
table below.
r 1 2 3 4
F( r ) 0.13 0.54 0.75 1
i. P(r = 2) [2 marks]
ii. P(r 1) [2marks]
iii. P(r 3) [2marks]
iv. P(r 2) [2marks]
v. [3marks]
(b) Show that for X and Y independent random variables with moment generating
functions respectively. Then
(9marks)
(a) Given that a variable X represents the ages of pupils in a class, and that the pupils' ages
follow a normal distribution with a mean of 12 years and a standard deviation of 4 years, find
i. [2marks]
ii. [2marks]
iii. [2marks]
(b) The intelligence quotients of 500 school children are assumed to be normally distributed
with mean 105 and standard deviation 12. How many children may be expected:
(a) The number of computer input errors per minute made by a particular computer
programmer has a Poisson distribution with an average of 0.75 errors per minute.
(iii) What is the probability that the programmer will make at least one error in a
particular minute? [3marks]
(iv) What is the probability that the programmer will make more than two errors in a
particular minute? [3marks]
Find
(i) The expectation of X [4marks]
References:
https://www.dartmouth.edu/chance/teachingaids/booksarticles/probabilitybook/pdf.html