Statistics for Engineers & Scientists
Chapter One
1.1 Introduction
Statistics is a branch of applied mathematics that deals with the collection, organization,
presentation, analysis and interpretation of numerical data.
Classification of Statistics
Depending on how the collected data are used, statistics is broadly divided into two main areas or
branches.
1. Descriptive Statistics: is concerned with summary calculations, graphs, charts and tables.
2. Inferential Statistics: is a method used to generalize from a sample to a population. For
example, the average income of all families (the population) in Ethiopia can be estimated from
figures obtained from a few thousand families (the sample).
• It is important because statistical data usually arise from a sample.
• Statistical techniques based on probability theory are required.
Examples:
a) From past figures, it has been predicted that 31% of registered voters will vote in the
November election.
b) The average age of a student in Hawassa University is 20.1 years.
i. Collection of data: the process of measuring, gathering and assembling the raw data upon which
the statistical investigation is to be based.
Data can be collected in a variety of ways and shall be discussed later.
iv. Analysis of data: The process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operation.
v. Inference of data:
The interpretation and further observation of the various statistical measures through the analysis
of the data by implementing those methods by which conclusions are formed and inferences
made.
Sample: a part/ portion of the population selected to draw conclusions about the population. It
should be selected using some pre-defined sampling technique in such a way that they represent
the population very well.
Parameter: any statistical measure that refers to a population or is computed from
population data.
Statistic: any statistical measure that refers to a sample or is computed from sample data.
Census: A survey in which there is complete coverage.
Sample survey: A survey in which there is partial coverage.
Sampling: The process or method of sample selection from the population.
Sample size: The number of elements or observations to be included in the sample.
Variable: a characteristic or attribute that can assume different values.
Data: a collection of related facts and figures from which conclusions can be drawn.
There are two types of variables.
Qualitative Variables: are nonnumeric variables that cannot be measured but can be placed into
distinct categories according to some characteristic or attribute.
Examples: gender, religious affiliation, political affiliation, etc.
Quantitative variables: are variables that can be quantified or can have numerical values.
Examples: height, income, temperature, etc.
Quantitative variables can be further classified as
Continuous variables
b) Continuous variables are variables that can have any value within an interval.
Examples: weight, length, volume, etc.
Applications of statistics:
• Almost all human beings in their daily life are subjected to obtaining numerical facts e.g.
about price.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:
Data reduction.
Limitations of statistics
As a science statistics has its own limitations. The following are some of the limitations:
• Deals with only aggregate of facts and not with individual data items.
The type of variable classification depends on how variables are categorized, counted or
measured. This classification uses measurement scales, of which four types are in common use:
nominal, ordinal, interval and ratio.
Nominal scale: the nominal level of measurement classifies data into mutually exclusive
(non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.
Ordinal scale: the ordinal level of measurement classifies data into categories that can be
ranked; however, precise differences between the ranks do not exist.
Interval Scale: the interval level of measurement ranks data and precise differences between
units of measure do exist, however, there is no true zero.
Ratio Scale: the ratio level of measurement possesses all the characteristics of interval
measurement and there exists a true zero. In addition, true ratios exist when the same variable is
measured on two different members of the population.
1.2 Methods of data collection and presentation
After clarifying the problem and formulating questions or hypotheses that can be answered with the
study, we collect the data. Having collected a set of data, a primary step is to organize and
present the information using frequency distributions, charts and graphs. In this chapter we learn
first how to collect data, then how to organize and present data by means of tables, charts and graphs.
Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is
statistical data when the numbers are
Comparable
Meaningful and
Collected for a well-defined objective
Raw data: are collected data, which have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: is an arrangement of raw numerical data in ascending or descending order of
magnitude.
It enables us to find the range of the data set easily and it also gives us some idea
about the general characteristics of the distribution.
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: Is a source of data that supplies firsthand information for the use of the
immediate purpose.
Primary data: are data originally collected for the immediate purpose.
- Primary data are more expensive than secondary data.
Introduction
Classification is necessary because it would not be possible to draw inferences and conclusions if
we have a large set of collected [raw] data.
A frequency distribution is the organization of raw data in table form using classes and
frequencies.
Frequency: - is the number of times a certain value or set of values occurs in a specific group.
This distribution is used for data that can be placed in categories, such as nominal or ordinal data.
Example: A social worker collected data on marital status for 25 persons (M= married, S=single,
W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution: there are four types of marital status M, S, D and W. These categories will be used as
classes for the frequency distribution. The following procedures shall be followed while
constructing categorical frequency distribution.
Step 2: Count the categories and place the result in column (2).
Step 3: Find the percentage of values in each class by using $\% = \frac{f}{n} \times 100$, where f = frequency
of the class and n = total number of observations.
- Percentages are not normally a part of a frequency distribution, but they can be added since they
are used in certain types of diagrams such as pie charts.
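The counting and percentage steps can be sketched in Python. This is only an illustration, assuming the 25 responses of the example are typed in as a list; the variable names are made up for the sketch.

```python
from collections import Counter

# Marital status of the 25 persons in the example (M, S, D, W).
responses = (
    "M S D W D "
    "S S M M M "
    "W D S M M "
    "W D D S S "
    "S W W D D"
).split()

n = len(responses)                                    # total number of observations
freq = Counter(responses)                             # Step 2: count each category
percent = {c: 100 * f / n for c, f in freq.items()}   # Step 3: % = f / n * 100

for c in "MSDW":
    print(f"{c}  {freq[c]:>2}  {percent[c]:.0f}%")
```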
Ungrouped frequency distribution is a table of all potential raw scored values that could possibly
occur in the data along with their corresponding frequencies. Ungrouped frequency distribution
is often constructed for small set of data.
First find the smallest and the largest raw scores in the collected data and calculate the range.
For small range values UFD is appropriate.
Arrange the data in order of magnitude to facilitate counting then count the frequency of the
values.
Then construct the UFD by putting the classes along with corresponding frequencies.
Solution:
STEP 1. Find the range of the data:
Range = Maximum observation − Minimum observation = 19 − 12 = 7. Since the range of the data
(7) is small, classes consisting of a single data value can be used. They are 12, 13, 14, 15, 16, 17,
18 and 19.
STEP 2. Arrange the data 12, 12, 12, 12, 12, 12, 12, 13, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16,
16, 16, 16, 17, 18, 18, 19
STEP 3. Then construct the UFD by putting the classes along with corresponding frequencies.
Finally,
Age Frequency Percent
12 7 28
13 1 4
14 2 8
15 5 20
16 6 24
17 1 4
18 2 8
19 1 4
Total 25 100
When the range of the data is large, the data must be grouped into classes. A grouped frequency
distribution is a frequency distribution in which several data values are grouped into one class.
– Class mark (M): the midpoint of a class interval,
i.e. $M = \frac{LCL_i + UCL_i}{2}$ or $M = \frac{LCB_i + UCB_i}{2}$
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary for the given class.
– Cumulative frequency (Cf) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.
Cumulative frequency distribution: A tabular arrangement of class intervals together with their
corresponding cumulative frequency (either less than or more than type; as defined above).
Relative frequency: the frequency of a class divided by the total frequency (i.e. sum of all
frequencies) and, if multiplied by 100, gives the percent of values falling in that class.
$\text{Relative frequency of a class} = \frac{\text{frequency of the class}}{\text{total frequency}}$
Note:
The relative frequency shows what fractional part or proportion of the total frequency
belongs to the corresponding class.
The sum of all the relative frequencies in the frequency distribution is always 1.
Formula: $k = 1 + 3.322 \log_{10} n$, where n is the total frequency; round this value of k up
to the next integer when it turns out to be a fraction.
STEP 4. Find the class width (W) by dividing the range by the number of classes and round the
number up to get an integer value: $W = \frac{R}{k}$
STEP 5. Pick a suitable starting point less than or equal to the minimum value. This starting point
is the lower limit of the first class. Continue to add the class width to this lower limit to
get the rest of the lower limits.
STEP 6. Find the upper class limits. To find the upper class limit of the first class, subtract one unit
of measurement from the lower limit of the second class. Then continue to add the class
width to this upper limit so as to get the rest of the upper limits.
STEP 7. Compute the class boundaries as: $LCB_i = LCL_i - \frac{1}{2}U$ and $UCB_i = UCL_i + \frac{1}{2}U$, where U is one unit of measurement.
STEP 8. Put the classes along with corresponding counts/ frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).
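The class-building steps can be sketched in Python. The sketch assumes a data set of 50 values ranging from 100 to 134 measured in whole units (as in the temperature example that follows), and takes the constant in the formula for k as 3.322, that is 1/log10(2); all variable names are illustrative.

```python
import math

n = 50                     # number of observations (assumed)
x_min, x_max = 100, 134    # smallest and largest observation (assumed)
unit = 1                   # one unit of measurement, U

k = math.ceil(1 + 3.322 * math.log10(n))   # number of classes, rounded up
width = math.ceil((x_max - x_min) / k)     # class width W = R / k, rounded up

lower = [x_min + i * width for i in range(k)]    # lower class limits
upper = [lcl + width - unit for lcl in lower]    # upper class limits
boundaries = [(lcl - unit / 2, ucl + unit / 2)   # (LCB, UCB) for each class
              for lcl, ucl in zip(lower, upper)]

for (lcl, ucl), (lcb, ucb) in zip(zip(lower, upper), boundaries):
    print(f"{lcl} - {ucl}   {lcb} - {ucb}")
```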
Example: The following data represent the record high temperatures in °F for each of the 50 US
states.
112 110 111 110 112 116 118 105 110 109
112 100 118 104 116 114 118 122 114 114
105 110 127 112 111 108 115 118 117 118
112 109 118 120 114 120 110 121 113 120
119 106 107 117 134 114 113 126 117 105
Class limits   Class boundaries   Frequency   Cf (less than type)   Cf (more than type)
100 – 104      99.5 – 104.5       2           2                     50
105 – 109      104.5 – 109.5      8           10                    48
110 – 114      109.5 – 114.5      18          28                    40
115 – 119      114.5 – 119.5      13          41                    22
120 – 124      119.5 – 124.5      7           48                    9
125 – 129      124.5 – 129.5      1           49                    2
130 – 134      129.5 – 134.5      1           50                    1
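The cumulative and relative frequencies can be derived from the class frequencies alone. A minimal Python sketch, using the seven class frequencies of the table (variable names are illustrative):

```python
from itertools import accumulate

freq = [2, 8, 18, 13, 7, 1, 1]    # class frequencies, lowest class first

cf_less = list(accumulate(freq))                   # less than type
cf_more = list(accumulate(reversed(freq)))[::-1]   # more than type
total = sum(freq)
rel_freq = [f / total for f in freq]               # relative frequencies

print(cf_less)
print(cf_more)
```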
The data presented in a frequency distribution can also be displayed diagrammatically or
graphically. Diagrams and graphs are visual aids which give a bird's eye view of a given set of
numerical data.
a. Diagrammatic presentation of data: Bar charts, pie-chart, pictogram, Stem and leaf plot
Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.
There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms. Stem and Leaf plot is also used some times.
i. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of
frequencies in each category of the distribution. The angle of the sector of a class is obtained by
multiplying the ratio of the frequency of the class to the total frequency by 360°,
i.e. $\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
Note that pie-charts are usually used for depicting nominal level data.
Example: for the data given in the UFD below, construct a pie chart.
Class Frequency
M 6
S 7
D 7
W 5
Total 25
[Figure: pie chart of the marital status data. Sectors: M 6 (24%), S 7 (28%), D 7 (28%), W 5 (20%).]
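The sector angles for this example follow directly from the formula. A short Python sketch (the dictionary name is illustrative):

```python
freq = {"M": 6, "S": 7, "D": 7, "W": 5}   # class frequencies from the table
total = sum(freq.values())

# sector angle = frequency of the class / total frequency * 360 degrees
angles = {c: 360 * f / total for c, f in freq.items()}

for c, a in angles.items():
    print(f"{c}: {a:.1f} degrees")
```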
Bar-diagram is a series of equally spaced bars having equal width and the height of each
bar representing the magnitude or frequency of observations in each group.
There are different types of bar charts. The most common being:
b. Component Bar Chart
When it is desired to show how a total (an aggregate) is divided into component parts, we use
component bar diagram. In such type of bar-diagrams, the bars represent aggregate value of a
variable with each aggregate broken into its component parts and different colors or designs are
used for identification.
c. Multiple bar-charts
Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.
Example: Use the same data given in the above example and depict it using multiple bar charts.
iii. Pictograms
In pictograms, we represent the data by means of some picture symbols. Here we decide a
suitable picture to represent a definite number of units in which the variable is measured.
Example: Draw a pictorial diagram to present the following data (number of students in a certain
school for four years).
[Pictogram: one row of symbols for each of the years 1992, 1993, 1994 and 1995. Key: one symbol = 1000 students.]
A stem and leaf plot is a device for presenting quantitative data in a graphical format similar to
histogram to assist in visualizing the shape of the distribution.
When constructing a stem and leaf plot, first arrange the observations in ascending order.
Example: 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
The leaf represents the ones place and the stem represents the rest.
Stem | Leaf
   4 | 4 6 7 9
   5 |
   6 | 3 4 6 8 8
   7 | 2 2 5 6
   8 | 1 4 8
   9 |
  10 | 6
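The same plot can be produced programmatically. A minimal Python sketch, assuming whole-number observations so that the leaf is the ones digit and the stem is everything else (the function name is made up for the illustration):

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one text line per stem, including stems that have no leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)   # stem = tens and above, leaf = ones digit
    lo, hi = min(leaves), max(leaves)
    lines = []
    for stem in range(lo, hi + 1):
        row = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
lines = stem_and_leaf(data)
print("\n".join(lines))
```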
i. Histogram
A histogram is another way of data presentation which is more suitable for frequency distributions with
continuous classes.
In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its respective
frequency on the vertical axis.
Example: Draw a histogram for the following data (temperature data of the 50 US states).
Solution:
Step 1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is
always the vertical axis.
Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each class.
ii. Frequency Polygon
The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the heights
of the points.
Example: Present the data in the previous example using a frequency polygon.
Solution
Step 1 Find the midpoints of each class. Recall that midpoints are found by adding the upper and
lower boundaries and dividing by 2.
For the first class $CM = \frac{99.5 + 104.5}{2} = 102$; the same procedure follows for the others, and the CMs are
given in the table below.
Class limits   Class boundaries   Class midpoint   Frequency
100 – 104      99.5 – 104.5       102              2
105 – 109      104.5 – 109.5      107              8
110 – 114      109.5 – 114.5      112              18
115 – 119      114.5 – 119.5      117              13
120 – 124      119.5 – 124.5      122              7
125 – 129      124.5 – 129.5      127              1
130 – 134      129.5 – 134.5      132              1
Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the
beginning and end of the graph
Cumulative frequency polygon can be traced on less than or more than cumulative frequency
basis. Place the class boundaries along the horizontal axis and the corresponding cumulative
frequencies (either less than or more than cumulative frequencies) along the vertical axis. Then
join the cross points by a free hand curve.
Example: the data in the previous example can be presented using either a less than or a more
than cumulative frequency polygon as given below (i) and (ii) respectively.
Construct an ogive (less than type) for the frequency distribution of the 50 US states.
Solution
Step 1 Find the cumulative frequency for each class.
Class boundaries     Cumulative frequency
Less than 99.5       0
Less than 104.5      2
Less than 109.5      10
Less than 114.5      28
Less than 119.5      41
Less than 124.5      48
Less than 129.5      49
Less than 134.5      50
Step 2 Draw the x and y axes. Label the x axis with the class boundaries. Use an appropriate
scale for the y axis to represent the cumulative frequencies.
Step 3 Plot the cumulative frequency at each upper class boundary
Step 4 Starting with the first upper class boundary, 104.5, connect adjacent points with line
segments
Chapter Two
Summarizing of Data
The most important aspect of studying the distribution of a sample measurement is the position
of the central value, that is, a representative value about which the measurements are distributed
and when it is convenient to have one figure that is representative of each group. This figure is
known as the average of the group. If the numbers of the group are arranged in order of
magnitude, the averages tend to fall around the central position in the group, so averages are
called measures of central tendency. In short, any measure intended to represent the center of
data set is called a measure of location or central tendency.
Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where n (the last
subscript) denotes the number of observations in the data and $x_i$ is the ith observation. Then the
sum
$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$
Similarly $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$
2. $\sum_{i=1}^{n} b\,x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number
3. $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$, where a and b are constant numbers
4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
We say a measure of central tendency is best if it possesses most of the following. It should:
Several types of averages or measures of central tendency can be defined; the most common are
- The mean
- the mode
- the median
The choice of average (measure of central tendency) depends upon which best represents the
property under discussion.
The arithmetic mean is defined as the sum of the measurements of the items divided by the total number of
items.
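Sample mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$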
Example 1: You measure the body lengths (in inches) of 10 infants at birth and record the
following:
17.5 19.5 17.5 19 20 21 18 19.5 18 10.75
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{17.5 + 19.5 + \cdots + 10.75}{10} = \frac{180.75}{10} = 18.075$
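The calculation is easy to check in Python; a minimal sketch using the ten body lengths of Example 1 (variable names are illustrative):

```python
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]   # inches

total = sum(lengths)            # sum of the measurements
mean = total / len(lengths)     # divided by the number of items

print(total, mean)
```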
When the data are arranged or given in the form of an ungrouped frequency distribution, the
formula for the mean is
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$
Note that $\sum_{i=1}^{k} f_i = n$.
xi fi xi f i
2 2 4
3 1 3
7 3 21
8 1 8
Total 7 36
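$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{2 \times 2 + 1 \times 3 + \cdots + 1 \times 8}{2 + 1 + \cdots + 1} = \frac{36}{7} = 5.14$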
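$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$
where $x_i$ = the class mark of the $i$th class, $f_i$ = the frequency of the $i$th class and k = the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.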
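$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{2 \times 102 + 8 \times 107 + \cdots + 1 \times 132}{2 + 8 + \cdots + 1} = \frac{5710}{50} = 114.2$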
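The grouped mean can be verified with a few lines of Python. The sketch uses the class marks and frequencies of the temperature frequency distribution; the variable names are illustrative.

```python
class_marks = [102, 107, 112, 117, 122, 127, 132]
freq = [2, 8, 18, 13, 7, 1, 1]

n = sum(freq)                                            # total number of observations
sum_fx = sum(f * x for f, x in zip(freq, class_marks))   # sum of f_i * x_i
mean = sum_fx / n

print(n, sum_fx, mean)
```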
The sum of the deviations of the items from their arithmetic mean is zero. This means the
algebraic sum of the deviations of a set of numbers $x_1, x_2, \ldots, x_n$ from their mean $\bar{x}$ is zero.
That is, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
The sum of the squares of the deviations of a set of observations from any number, say A, is
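When a set of observations is divided into k groups and $\bar{x}_i$ is the mean of the $n_i$ observations of
group i (i = 1, 2, ..., k), then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$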
Example: Last year there were three sections taking Stat 273 course in Alemaya University. At
the end of the semester, the three sections got average marks of 80, 83 and 76. There were 28, 32
and 35 students in each section respectively. Find the mean mark for all the students.
Solution:
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} = 79.54$
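As a quick check of the combined mean, here is a small Python sketch with the section sizes and section means from the example (names are illustrative):

```python
sizes = [28, 32, 35]   # number of students in each section
means = [80, 83, 76]   # average mark of each section

weighted_total = sum(n * m for n, m in zip(sizes, means))
combined_mean = weighted_total / sum(sizes)

print(weighted_total, round(combined_mean, 2))
```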
If a wrong figure has been used in calculating the mean, we can correct the mean if we know the
correct figure that should have been used. Let $\bar{x}_{wr}$ be the wrong mean calculated using the wrong
figure $x_{wr}$, and let $x_c$ be the correct figure; then the correct mean, $\bar{x}_{correct}$, is given by
$\bar{x}_{correct} = \frac{n \bar{x}_{wr} + x_c - x_{wr}}{n}$
a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ will be $\bar{x} \pm k$
Example: The average weight of 10 students was calculated to be 65 kg, but later it was
discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected
average weight.
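One way to work this example is to apply the correction rule directly: remove the wrong figure from the total and put the correct one in its place. A minimal Python sketch (variable names are illustrative):

```python
n = 10              # number of students
wrong_mean = 65     # mean computed with the misread value (kg)
wrong_value = 40    # value that was used by mistake (kg)
correct_value = 80  # value that should have been used (kg)

# corrected mean = (n * wrong mean + correct value - wrong value) / n
corrected_mean = (n * wrong_mean + correct_value - wrong_value) / n

print(corrected_mean)
```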
- It is highly affected by extreme (abnormal) observations in the series. For instance, the
monthly incomes of three boys are 37 birr, 53 birr and 48 birr and that of their father is 1026
birr. The average income becomes 291 birr, which is not at all a representative
figure.
- It can be a number which does not exist in the series.
- It sometimes gives results which appear almost absurd. For example, it is likely that we
can get an average of '3.6 children' per family.
- It gives greater importance to bigger items of a series and lesser importance to smaller items.
That means it is an upward-biased measure.
- It can't be calculated for open-ended classes.
In finding the arithmetic mean, all items were assumed to be of equal importance. When due
importance is to be given to each item, that is, when proper importance is required to be given to
different data, we find the weighted average. Weights are assigned to each item in proportion to
its relative importance.
If $x_1, x_2, \ldots, x_k$ represent values of the items and $w_1, w_2, \ldots, w_k$ are the corresponding weights, then
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$
Example: A student's final grades in Mathematics, Physics, Chemistry and Biology are
respectively A, B, B and C. If the respective credits received for these courses are 3, 3, 2 and 2,
determine the approximate average mark (GPA) the student has got.
Solution: We use a weighted arithmetic mean, weight associated with each course being taken as
the number of credits received for the corresponding course.
$x_i$:  4  3  3  2
$w_i$:  3  3  2  2
Therefore, $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 4) + (3 \times 3) + (2 \times 3) + (2 \times 2)}{3 + 3 + 2 + 2} = \frac{31}{10} = 3.1$
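The weighted mean of this example can be reproduced in Python; a minimal sketch using the grade points and credits shown above (names are illustrative):

```python
grade_points = [4, 3, 3, 2]   # A, B, B, C
credit_hours = [3, 3, 2, 2]   # weights

weighted_sum = sum(w * x for w, x in zip(credit_hours, grade_points))
gpa = weighted_sum / sum(credit_hours)

print(weighted_sum, gpa)
```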
The geometric mean (GM) is defined as the nth root of the product of n values. The formula is
$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$
The geometric mean is useful in finding the average of percentages, ratios, indexes, or growth
rates. Example: if a person receives a 20% raise after 1 year of service and a 10% raise after the
second year of service, the average percentage raise per year is not 15% but 14.89%, since
$GM = \sqrt{1.20 \times 1.10} = \sqrt{1.32} \approx 1.1489$.
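The raise example can be checked numerically. In this Python sketch each raise is expressed as a growth factor (1.20 and 1.10); that representation is an assumption of the sketch, and `math.prod` requires Python 3.8 or later.

```python
import math

factors = [1.20, 1.10]   # 20% and 10% raises written as growth factors

gm = math.prod(factors) ** (1 / len(factors))   # nth root of the product
avg_raise_percent = (gm - 1) * 100

print(round(avg_raise_percent, 2))
```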
The harmonic mean is defined as the number of values divided by the sum of the reciprocals of
each value. The formula is
$HM = \frac{n}{\sum (1/x_i)}$
Example: Suppose a person drove 100 miles at 40 miles per hour and returned driving 50 miles
per hour. The average speed is not 45 miles per hour, which is found by adding 40 and
50 and dividing by 2. The average is found as shown.
$HM = \frac{n}{\sum (1/x_i)} = \frac{2}{\frac{1}{40} + \frac{1}{50}} = 44.44$
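A short Python sketch of the same calculation, comparing the harmonic mean with the plain arithmetic mean of the two speeds (names are illustrative):

```python
speeds = [40, 50]   # miles per hour, going and returning

hm = len(speeds) / sum(1 / s for s in speeds)   # n / sum of reciprocals
am = sum(speeds) / len(speeds)                  # arithmetic mean, for comparison

print(round(hm, 2), am)
```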
Exercise: A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10
per pound. Find the average cost of 1 pound of nails.
The mode or the modal value is the most frequently occurring score/observation in a series and
denoted by x̂ . Note that the mode may not exist in the series or, even if it does exist, it may not be
unique.
$\Delta_1$ = the difference between the frequency of the modal class and the preceding class
$\Delta_2$ = the difference between the frequency of the modal class and the next class
115 – 119 114.5 – 119.5 117 13 41
120 – 124 119.5 – 124.5 122 7 48
125 – 129 124.5 – 129.5 127 1 49
130 – 134 129.5 – 134.5 132 1 50
Total 50
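Since the third class has the highest frequency, it is taken as the modal class. Then $LCB_{mod} = 109.5$,
$\Delta_1 = f_{mod} - f_p = 18 - 8 = 10$, $\Delta_2 = f_{mod} - f_s = 18 - 13 = 5$
$\hat{x} = LCB_{mod} + \frac{\Delta_1}{\Delta_1 + \Delta_2} w = 109.5 + \frac{10}{10 + 5} \times 5 = 112.83$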
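The mode calculation can be reproduced in Python with the values read from the frequency distribution; the variable names are illustrative.

```python
lcb_mod = 109.5    # lower class boundary of the modal class
w = 5              # class width
f_mod = 18         # frequency of the modal class
f_prev = 8         # frequency of the preceding class
f_next = 13        # frequency of the next class

d1 = f_mod - f_prev   # Delta_1
d2 = f_mod - f_next   # Delta_2
mode = lcb_mod + d1 / (d1 + d2) * w

print(round(mode, 2))
```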
Merits of mode
- Mode is not affected by extreme values.
- Mode can be calculated even in the case of open-end intervals. And it is not necessary to know all
observations.
Demerits of mode
- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency
- It may be unrepresentative in many cases.
2.2.3 The Median
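The median is the midpoint of the data array. The median is denoted by $\tilde{x}$. For ungrouped data
the median is the middle value of the ordered array (the average of the two middle values when n is even).
For grouped data the median is obtained by
$\tilde{x} = LCB_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - cf \right)$
cf = sum of the frequencies of all classes lower than the median class (in other words, it is the
cumulative frequency of the class preceding the median class).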
Then $n/2 = 50/2 = 25$.
Since 28 is the first cumulative frequency to be greater than 25, the third class is the median class.
$\tilde{x} = LCB_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - cf \right) = 109.5 + \frac{5}{18} \left( \frac{50}{2} - 10 \right) = 109.5 + \frac{75}{18} = 113.67$
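The steps of locating the median class and applying the formula can be sketched in Python. The sketch takes the median class as the first class whose cumulative frequency reaches n/2 (using "greater than or equal to" is an assumption of the sketch); names are illustrative.

```python
from itertools import accumulate

lower_bounds = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]   # LCB of each class
freq = [2, 8, 18, 13, 7, 1, 1]
w = 5                                                              # class width

n = sum(freq)
cum = list(accumulate(freq))                             # less than cumulative frequencies
m = next(i for i, c in enumerate(cum) if c >= n / 2)     # index of the median class
cf = cum[m - 1] if m > 0 else 0                          # cumulative frequency before it
median = lower_bounds[m] + w / freq[m] * (n / 2 - cf)

print(m + 1, round(median, 2))
```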
Merits of median
- Median can be calculated even in case of open-ended intervals.
Demerits of median
2.3 Quantiles
Quantiles are values which divide a data set arranged in order of magnitude into a certain number of equal
parts. They are measures of position (non-central tendency). Some of these quantiles are
quartiles, deciles and percentiles.
i. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$.
The first quartile is also called the lower quartile and the third quartile is the upper quartile. The
second quartile is the median.
$Q_j = LCB_{Q_j} + \frac{w}{f_{Q_j}} \left( \frac{j\,n}{4} - cf_{Q_j} \right); \quad j = 1, 2, 3.$
The $j$th quartile class is the class with the smallest cumulative frequency greater than or equal to $jn/4$.
It can be located by counting $jn/4$ of the frequencies beginning from the lowest class.
ii. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \ldots, D_9$. The fifth
decile is the median.
For ungrouped data
Let $D_j$ be the $j$th decile value for $j = 1, 2, \ldots, 9$. Then
$D_j = \left( \frac{j}{10}\, n \right)^{\text{th}}$ item; $j = 1, 2, \ldots, 9$
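For grouped data
We can apply the following formula:
$D_j = LCB_{D_j} + \frac{w}{f_{D_j}} \left( \frac{j\,n}{10} - cf_{D_j} \right); \quad j = 1, 2, \ldots, 9$
Similarly, for the $j$th percentile:
$P_j = LCB_{P_j} + \frac{w}{f_{P_j}} \left( \frac{j\,n}{100} - cf_{P_j} \right); \quad j = 1, 2, 3, \ldots, 99$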
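The quartile, decile and percentile formulas for grouped data share one pattern, so a single helper can cover all three. The Python sketch below is an illustration only: the function name and its `parts` argument (4, 10 or 100) are made up, and it is applied to the temperature frequency distribution.

```python
from itertools import accumulate

def grouped_quantile(lower_bounds, freq, width, j, parts):
    """j-th quantile of grouped data split into `parts` equal parts (4, 10 or 100)."""
    n = sum(freq)
    target = j * n / parts                                  # j*n/4, j*n/10 or j*n/100
    cum = list(accumulate(freq))
    i = next(k for k, c in enumerate(cum) if c >= target)   # quantile class
    cf = cum[i - 1] if i > 0 else 0                         # cumulative frequency before it
    return lower_bounds[i] + width / freq[i] * (target - cf)

lower_bounds = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
freq = [2, 8, 18, 13, 7, 1, 1]

q1 = grouped_quantile(lower_bounds, freq, 5, 1, 4)       # first quartile
d5 = grouped_quantile(lower_bounds, freq, 5, 5, 10)      # fifth decile (the median)
p75 = grouped_quantile(lower_bounds, freq, 5, 75, 100)   # 75th percentile (equals Q3)

print(round(q1, 2), round(d5, 2), round(p75, 2))
```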
Interpretations
1. $Q_j$ is the value below which $(j \times 25)$ percent of the observations in the series are found (where
$j = 1, 2, 3$). For instance $Q_3$ means the value below which 75 percent of observations in the given
series are found.
2. $D_j$ is the value below which $(j \times 10)$ percent of the observations in the series are found (where
$j = 1, 2, \ldots, 9$). For instance $D_4$ is the value below which 40 percent of the values are found in the
series.
3. $P_j$ is the value below which j percent of the total observations are found (where $j = 1, 2, 3, \ldots, 99$).
For example 73 percent of the observations in a given series are below $P_{73}$.
Exercise: The temperature data calculate
Measures of variation are statistical measures, which provide ways of measuring the extent to
which the data are dispersed or spread out.
A good measure of dispersion should:
- be rigidly defined by a mathematical formula,
- be simple to understand and easy to calculate,
- be unique,
- be based on all observations in the series,
- not be affected by some extreme values existing in the series,
- be capable of further algebraic treatment as well as further statistical analysis.
In case the two sets of data are expressed in different units, however, such as quintals of sugar
versus tons of sugarcane, or if the average sizes are very different, such as manager's salary versus
worker's salary, the absolute measures of dispersion are not comparable. In such cases measures
of relative dispersion should be used.
Range (R) is defined as the difference between the largest and the smallest observation in a given
set of data. That is, $R = x_{max} - x_{min}$, where $x_{max}$ and $x_{min}$ are the largest and the smallest
observations in the series respectively.
In the case of grouped data, the range is found by taking the difference between the class mark of the last
class and that of the first class. That is, $R = CM_{last} - CM_{first}$, where $CM_{last}$ and $CM_{first}$ are the
class marks of the last class and that of the first class respectively.
$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}}$ ........ for ungrouped data
$RR = \frac{CM_{last} - CM_{first}}{CM_{last} + CM_{first}} = \frac{R}{CM_{last} + CM_{first}}$ ......... for grouped data
- Range and relative range are easy to calculate and simple to understand.
- Both cannot be computed for grouped data with open ended classes.
- They do not tell us anything about the distribution of values in the series.
Example: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.
462 480 534 624 498 552 606 588 516 570
Solution:
$x_{max} = 624$ birr, $x_{min} = 462$ birr
$R = x_{max} - x_{min} = 624 - 462 = 162$ birr
$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 - 462}{624 + 462} = \frac{162}{1086} = 0.149$
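The same result can be obtained with a few lines of Python applied to the ten salaries (variable names are illustrative):

```python
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]   # birr

x_max, x_min = max(salaries), min(salaries)
data_range = x_max - x_min                    # R
relative_range = data_range / (x_max + x_min) # RR

print(data_range, round(relative_range, 3))
```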
Example: Find the values of the range and relative range for the following frequency
distribution: which shows the distribution of the maximum loads supported by a certain number
of cables.
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
Solution:
Variance is the arithmetic mean of the square of the deviation of observations from their
arithmetic mean.
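Population Variance ($\sigma^2$)
For raw data
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \cdots = \frac{1}{N} \left[ \sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N} \right]$
Where $\mu$ is the population arithmetic mean and N is the total number of observations in the
population.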
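For ungrouped FD
$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} = \cdots = \frac{1}{N} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{N} \right]$
Where $\mu$ is the population arithmetic mean, $x_i$ is the value/reading of the $i$th class, k is the number of
classes and $f_i$ is the frequency of the $i$th class.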
For grouped data
$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} = \cdots = \frac{1}{N} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{N} \right]$
Where $\mu$ is the population arithmetic mean, $x_i$ is the class mark of the $i$th class, k is the number of
classes and $f_i$ is the frequency of the $i$th class.
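Sample Variance ($S^2$)
For raw data
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$
Where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.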
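For ungrouped FD
$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right]$
Where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the data value of the $i$th class, k is the number of
classes and $f_i$ is the frequency of the $i$th class.
For grouped data
$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right]$
Where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the $i$th class, k is the number of
classes and $f_i$ is the frequency of the $i$th class.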
Example: Find the variance and standard deviation of the following sample data.
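$x_i$                  5    10   12   17   Total
$(x_i - \bar{x})^2$    36   1    1    36   74
Here $\bar{x} = (5 + 10 + 12 + 17)/4 = 11$.
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{74}{3} = 24.67$
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{74}{3}} = 4.97$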
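A Python sketch of the same computation, written out with the definition and cross-checked against the standard library's `statistics.variance`, which also uses the n − 1 divisor (variable names are illustrative):

```python
import math
import statistics

data = [5, 10, 12, 17]

n = len(data)
mean = sum(data) / n
sq_dev = [(x - mean) ** 2 for x in data]   # squared deviations from the mean
s2 = sum(sq_dev) / (n - 1)                 # sample variance
s = math.sqrt(s2)                          # sample standard deviation

print(sq_dev, round(s2, 2), round(s, 2))
```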
Example: Calculate the variance and the standard deviation of the temperature data.
(Assume the data are a sample.)
114.5 – 119.5 117 13 1521 177957
119.5 – 124.5 122 7 854 104188
124.5 – 129.5 127 1 127 16129
129.5 – 134.5 132 1 132 17424
Total 50 5710 653890
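$S^2 = \frac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right] = \frac{1}{49} \left[ 653890 - \frac{(5710)^2}{50} \right] = \frac{1}{49} (653890 - 652082) = 36.9$
$S = \sqrt{S^2} = \sqrt{36.9} = 6.07$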
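The grouped-data computation can be checked in Python using the class marks and frequencies of the temperature distribution and the shortcut formula (variable names are illustrative):

```python
import math

class_marks = [102, 107, 112, 117, 122, 127, 132]
freq = [2, 8, 18, 13, 7, 1, 1]

n = sum(freq)
sum_fx = sum(f * x for f, x in zip(freq, class_marks))        # sum of f_i * x_i
sum_fx2 = sum(f * x * x for f, x in zip(freq, class_marks))   # sum of f_i * x_i^2

s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)   # sample variance, shortcut formula
s = math.sqrt(s2)

print(sum_fx, sum_fx2, round(s2, 1), round(s, 2))
```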
Coefficient of variation is used in such problems where we want to compare the variability of
two or more than two different series. Coefficient of variation is the ratio of the standard
deviation to the arithmetic mean, usually expressed in percent.
$CV = \frac{S}{\bar{x}} \times 100$, where S is the standard deviation of the observations and $\bar{x}$ is their arithmetic mean.
A distribution having less coefficient of variation is said to be less variable or more consistent or
more uniform or more homogeneous.
Example: Last semester, students of Biology and Chemistry Departments took Stat 273 course.
At the end of the semester, the following information was recorded.
Department Biology Chemistry
Mean score 79 64
Standard deviation 23 11
Compare the relative dispersions of the two departments' scores using the appropriate measure.
Solution:
Biology Department Chemistry Department
S S
CV 100 CV 100
x x
23 11
100 29.11% 100 17.19%
79 64
40
Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students‟ scores compared with that of Chemistry students.
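The comparison can be reproduced with a small helper. This is a minimal sketch; the function name `coefficient_of_variation` is an illustrative choice, not something defined in these notes.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation in percent."""
    return std_dev / mean * 100

cv_biology = coefficient_of_variation(23, 79)
cv_chemistry = coefficient_of_variation(11, 64)

# The series with the larger CV is relatively more variable
more_variable = "Biology" if cv_biology > cv_chemistry else "Chemistry"
print(cv_biology, cv_chemistry, more_variable)
```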
– Its unit is the square of the unit of measurement of the values. For example, if the variable is
measured in kg, the unit of variance is kg².
– It is calculated based on all the observations/data in the series.
– It gives more weight to extreme values and less to those which are near to the mean.
Standard Deviation
A standard score is a measure that describes the relative position of a single score in the entire
distribution of scores in terms of the mean and standard deviation. It gives the number of
standard deviations by which a particular observation lies above or below the mean.
Population standard score: $Z = \frac{x-\mu}{\sigma}$, where x is the value of the observation, and $\mu$ and $\sigma$ are the
mean and standard deviation of the population respectively.
Sample standard score: $Z = \frac{x-\bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean
and standard deviation of the sample respectively.
Example: Two sections were given an exam in a course. The average score was 72 with a standard
deviation of 6 for section 1, and 85 with a standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?
Z-score of student A: $Z = \frac{x_A-\bar{x}_1}{S_1} = \frac{84-72}{6} = 2.00$
Z-score of student B: $Z = \frac{x_B-\bar{x}_2}{S_2} = \frac{90-85}{5} = 1.00$
From these two standard scores, we can conclude that student A performed better relative to
his/her section, because his/her score is two standard deviations above the mean score of
section 1 while the score of student B is only one standard deviation above the mean score of
section 2.
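The two standard scores in this example can be computed with a one-line helper. A minimal sketch, with the function name `z_score` chosen here for illustration:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mean) / std_dev

z_a = z_score(84, 72, 6)   # student A, section 1
z_b = z_score(90, 85, 5)   # student B, section 2

better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```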
Chapter 3
Elementary Probability
Introduction
• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is the measure of how
likely an outcome is to occur.
3.2 Review of set theory: sets, union, intersection, complementation, De Morgan's rules
(Reading assignment)
Random experiment: a process of measurement or observation that can be repeated any
number of times under similar conditions, whose possible outcomes can be enumerated, but
whose outcome on any single trial cannot be predicted with certainty.
e.g. tossing a coin
Outcome: - a particular result of an experiment (result of single trial of an experiment)
Sample space: - is the set of all possible outcomes of a random experiment. Each possible
outcome is called sample point.
Event: - is a subset of a sample space (one or more outcomes of an experiment)
Example 1: If we toss a coin, the sample space (S) of this experiment is
S = {head, tail}, where head and tail are the two faces of the coin. If we are interested in the outcome
that a head turns up, then the event is E = {head}.
Example 2: Find the sample space of tossing a coin twice.
S = {HH, HT, TH, TT}
Elementary or simple event: - an event having only one sample point.
Mutually exclusive events: two events E1 and E2 are said to be mutually exclusive if there is
no sample point which is common to E1 and E2,
i.e. E1 ∩ E2 = ∅.
Independent events: two events E1 and E2 are said to be independent if the occurrence of E1
does not affect the probability of occurrence of E2.
Dependent events: two events are dependent if the occurrence of the first event changes the
probability of occurrence of the second event.
Definition of probability
Finite sample space: a sample space in which the number of outcomes of the experiment is finite.
Equally likely outcomes: outcomes are equally likely if each outcome in the sample space has the
same chance of occurring.
Example: In throwing a fair die all possible outcomes are equally likely. That means the
elements of the sample space have the same chance of occurring.
1. Addition rule: if k actions can be done in n1, n2, …, nk ways respectively, and no two actions can be done
at the same time, then the total number of ways of performing one of the actions is n1 + n2 + … + nk.
2. Multiplication rule: in a sequence of k events in which the first event has n1 possibilities, the second
event has n2 possibilities, …, and the kth event has nk possibilities, the total number of possibilities of the
sequence is n1 × n2 × … × nk.
Example: The digits 0, 1, 2, 3, and 4 are to be used in 4-digit identification cards. How many
different cards are possible if
a. Repetitions are permitted.
b. Repetitions are not permitted.
Solutions
a.
1st digit    2nd digit    3rd digit    4th digit
5            5            5            5
By the multiplication rule there are 5 × 5 × 5 × 5 = 625 different cards.
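The card count can also be checked by brute-force enumeration. The sketch below assumes, as the solution to part (a) does, that a card may start with the digit 0; the variable names are illustrative.

```python
from itertools import permutations, product

digits = "01234"

# (a) repetitions permitted: each of the 4 positions can take any of the 5 digits
with_repetition = sum(1 for _ in product(digits, repeat=4))

# (b) repetitions not permitted: ordered selections of 4 distinct digits
without_repetition = sum(1 for _ in permutations(digits, 4))

print(with_repetition, without_repetition)
```

Under that assumption the enumeration agrees with the multiplication rule: 5 × 5 × 5 × 5 for part (a) and 5 × 4 × 3 × 2 for part (b).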
c) The number of permutations of n objects in which n1 are alike, n2 are alike, …, nk are alike is
\[ \frac{n!}{n_1!\, n_2! \cdots n_k!} \]
Example: How many different permutations can be made from the letters in the word
“CORRECTION”?
\[ \frac{n!}{n_1!\, n_2! \cdots n_k!} = \frac{10!}{2!\,2!\,2!\,1!\,1!\,1!\,1!} = 453600 \text{ permutations} \]
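The same count can be obtained programmatically by dividing n! by the factorial of each letter's multiplicity. A minimal sketch; the helper name `distinct_permutations` is illustrative.

```python
from collections import Counter
from math import factorial

def distinct_permutations(word):
    """Count arrangements of the letters of word, treating equal letters as alike."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)   # divide out rearrangements of identical letters
    return total

result = distinct_permutations("CORRECTION")
print(result)
```

In "CORRECTION" the letters C, O and R each occur twice, which is where the three 2! factors in the denominator come from.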
Note: 0! =1! =1
Example: A photographer wants to arrange 3 persons in a row for a photograph. How many
different arrangements are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3*2! = 6, or 3P3 = 3!/(3-3)! = 3!/0! = 6, there are 6 possible arrangements: ALY,
AYL, LAY, LYA, YLA and YAL.
Example 2: Fifteen athletes, including Haile, entered a race.
a) In how many different ways could prizes for the first, the second and the third place be
awarded?
b) How many of the triplets counted above have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: 15P3 = 15! / (15-3)! = 2730
b) With Haile fixed in first place, the other two places are filled from the remaining 14 athletes: 14P2 = 14! / (14-2)! = 182
4. Combination: a counting technique in which the order of the objects is immaterial, that is, the selection
of r objects from a collection of n objects, where r ≤ n, without regard to order. The number of combinations
of n objects taken r at a time is given by
nCr = n!/((n-r)! r!)
Example: In a club containing 7 members, a committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / ((7-3)! 3!) = 35
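The permutation and combination counts in the last few examples can be verified with Python's standard library (`math.perm` and `math.comb`, available from Python 3.8). The variable names below are illustrative.

```python
import math

# Ordered selections (permutations): nPr = n! / (n - r)!
prize_orders = math.perm(15, 3)        # first, second, third among 15 athletes
haile_first = math.perm(14, 2)         # Haile fixed first, 2 places from the other 14

# Unordered selections (combinations): nCr = n! / ((n - r)! r!)
committees = math.comb(7, 3)           # committee of 3 from 7 members

print(prize_orders, haile_first, committees)
```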
i. Classical approach: uses the sample space to determine the numerical probability that an event
will happen. If there are n equally likely outcomes of an experiment, and event A occurs in
f of them, then the probability of event A, denoted by P (A), is defined as
P (A) = n (A)/ n(S) = f/n
The classical approach cannot be used:
- If the total number of outcomes is infinite, or if it is not possible to enumerate all elements of the
sample space.
- If the outcomes are not equally likely.
Example: A fair die is tossed once. What is the probability of getting
a) the number 4?
b) an odd number?
c) an even number?
Solutions:
First identify the sample space, say S = {1, 2, 3, 4, 5, 6}
n(S) = 6
a) A = {4}, n(A) = 1, P(A) = n(A)/n(S) = 1/6
b) A = {1, 3, 5}, n(A) = 3, P(A) = n(A)/n(S) = 3/6 = 0.5
c) A = {2, 4, 6}, n(A) = 3, P(A) = n(A)/n(S) = 3/6 = 0.5
In other words, given a frequency distribution, the probability of an event (A) being in a given
class is
P(A) = (frequency of the class) / (total frequency in the distribution)
Example: The National Center for Health Statistics reported that of every 539 deaths in recent
years, 24 resulted from automobile accidents, 182 from cancer, and 353 from other diseases.
What is the probability that a particular death is due to an automobile accident?
1. P (A) ≥ 0
2. P (S) = 1, where S is the sure event.
3. If A and B are mutually exclusive events, then the probability that either A or B occurs equals the sum of
the two probabilities: P (A ∪ B) = P (A) + P (B)
4. P(A') = 1 - P(A)
5. 0 ≤ P(A) ≤ 1
6. P(∅) = 0, where ∅ is the impossible event.
Chapter 4: Conditional probability and independence
4.2 Multiplication Theorem, Bayes' Theorem, and Total probability
i. Multiplication Theorem
If A and B are any two events of a sample space such that P(A) ≠ 0 and P(B) ≠ 0, then
P(A ∩ B) = P(A)P(B/A) = P(B)P(A/B)
Theorem 2: Let {E1, E2, …, En} be a partition of the sample space S, and suppose each of E1, E2, …, En
has non-zero probability, that is P(Ei) ≠ 0 for i = 1, 2, …, n. Let E be any event with P(E) > 0.
Then for each integer k, 1 ≤ k ≤ n, we have
\[ P(E_k/E) = \frac{P(E_k \cap E)}{\sum_{i=1}^{n} P(E_i \cap E)} = \frac{P(E_k)\,P(E/E_k)}{\sum_{i=1}^{n} P(E_i)\,P(E/E_i)} \]
Example: Suppose that three machines A1, A2 and A3 produce 60%, 30%, and 10%
respectively of the total production, and that the percentages of defective output of these machines are 2%, 4%, and 6% respectively.
i. If an item is selected at random, find the probability that the item is defective.
ii. Assuming that an item selected at random is found to be defective, find the probability that the item
was produced on machine A1.
Solution: Let B be the event of selecting a defective item at random and let E1, E2, E3 be the events that the item was
produced on machines A1, A2, A3 respectively. Then P (E1) = 0.6, P (E2) = 0.3, P (E3) = 0.1,
P (B/E1) = 2% = 0.02, P (B/E2) = 4% = 0.04 and P (B/E3) = 6% = 0.06
P (B) = P (B ∩ [E1 ∪ E2 ∪ E3])
= P ([B ∩ E1] ∪ [B ∩ E2] ∪ [B ∩ E3])
= P (B ∩ E1) + P (B ∩ E2) + P (B ∩ E3)
= P (E1)*P (B/E1) + P (E2)*P (B/E2) + P (E3)*P (B/E3)
= 0.6*0.02 + 0.3*0.04 + 0.1*0.06
= 0.03
We use Bayes' formula:
\[ P(E_1/B) = \frac{P(E_1 \cap B)}{P(B)} = \frac{P(E_1)\,P(B/E_1)}{\sum_{i=1}^{n} P(E_i)\,P(B/E_i)} = \frac{0.6 \times 0.02}{0.03} = 0.4 \]
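The machine example can be reproduced in a few lines. The sketch below stores the production shares and defect rates in dictionaries (names chosen here for illustration), applies the total probability rule, and then Bayes' formula for every machine.

```python
# Share of total production for each machine: P(Ei)
priors = {"A1": 0.60, "A2": 0.30, "A3": 0.10}
# Probability that an item is defective given the machine: P(B/Ei)
defect_rates = {"A1": 0.02, "A2": 0.04, "A3": 0.06}

# Total probability: P(B) = sum of P(Ei) * P(B/Ei)
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' formula: P(Ei/B) = P(Ei) * P(B/Ei) / P(B)
posteriors = {m: priors[m] * defect_rates[m] / p_defective for m in priors}

print(p_defective, posteriors)
```

The posterior probabilities sum to 1, which is a quick sanity check on the calculation.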
\[ P(A) = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(A/E_i)\,P(E_i) \]
P(A/B) = P(A), P(B/A) = P(B).
Chapter Five
Example: Flip a coin three times, and let X be the number of heads in the three tosses.
S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let the variable of interest, X, be the number of heads observed. Then the relevant events are
X (HHH) = 3
X (HHT) = X (HTH) = X (THH) = 2
X (HTT) = X (THT) = X (TTH) = 1
X (TTT) = 0
X = {0, 1, 2, 3}
The relevant question is to find the probability of each of these events.
Note that X takes integer values even though the sample space consists of H's and T's. The
variable X transforms the problem of calculating probabilities from that of set theory to calculus.
Definition. A random variable (r.v.) is a rule that assigns a numerical value to each possible
outcome of a random experiment.
Interpretation:
-random: the value of the r.v. is unknown until the outcome is observed
- Variable: it takes a numerical value
Notation: We use X, Y , etc. to represent r.v.s.
Random Variables are of two types:
5.2 Discrete random variable: are variables which can assume only a specific number of values
which are clearly separated and they can be counted.
Example:
Toss coin n times and count the number of heads.
Number of Children in a family.
Number of car accidents per week.
Number of defective items in a given company.
Definition: A probability distribution is a complete list of all possible values of a random
variable and their corresponding probabilities.
Discrete probability distribution: is a distribution whose random variable is discrete.
Example: Consider the possible outcomes for the experiment of tossing three coins together.
Sample space, S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let the r.v. X be the number of heads that turn up when three coins are tossed.
X = {0, 1, 2, 3}
P(X = 0) = P (TTT) = 1/8,
P(X = 1) = P (HTT) + P (THT) + P (TTH) = 1/8 + 1/8 + 1/8 = 3/8,
P(X = 2) = P (HHT) + P (HTH) + P (THH) = 1/8 + 1/8 + 1/8 = 3/8,
P(X = 3) = P (HHH) = 1/8
X = x       0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8
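The distribution in the table can be built by enumerating the sample space and counting heads in each outcome. This is a minimal sketch that assumes fair coins, so that all eight outcomes are equally likely; exact fractions are used to match the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of tossing three coins
outcomes = list(product("HT", repeat=3))

# X maps each outcome to its number of heads
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X as exact fractions
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```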
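If X is a discrete r.v.:
1. $P(x) \ge 0$
2. $\sum_{x} P(x) = 1$
Note: If X is a discrete r.v. taking integer values, then
\[ P(a < X < b) = \sum_{x=a+1}^{b-1} P(x) \]
\[ P(a \le X < b) = \sum_{x=a}^{b-1} P(x) \]
\[ P(a < X \le b) = \sum_{x=a+1}^{b} P(x) \]
\[ P(a \le X \le b) = \sum_{x=a}^{b} P(x) \]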
5.3 Continuous random variable: are variables that can assume any value in an interval.
Example:
Height of students at certain college.
Mark of students.
Remarks.
(i) In data analysis we described a set of data (sample) by dividing it into classes and calculating
relative frequencies.
(ii) In Probability we described a random experiment (population) in terms of events and
probabilities of events.
(iii) Here, we describe a random experiment (population) by using random variables, and
probability distribution functions.
\[ P(a \le X \le b) = P(a < X < b) = \int_{a}^{b} f(x)\,dx \]
(area of shaded region)
Note: 1. If X is a continuous r.v. then $P(a < X < b) = \int_{a}^{b} f(x)\,dx$
\[ P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b) \]
The cumulative distribution function F(x) of a discrete random variable X with probability
The cumulative distribution function F(x) of a continuous random variable X with density
function f(x) is
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad \text{for } -\infty < x < \infty. \]
Frequently in statistics, one encounters the need to derive the probability distribution of a
function of one or more random variables. For example, suppose that X is a discrete random
variable with probability distribution f(x), and suppose further that Y = u(X) defines a one-to-one
transformation between the values of X and Y . We wish to find the probability distribution of Y.
It is important to note that the one-to-one transformation implies that each value x is related to
one, and only one, value y = u(x) and that each value y is related to one, and only one, value x =
w(y), where w(y) is obtained by solving y = u(x) for x in terms of y.
It is clear that the random variable Y assumes the value y when X assumes the value w(y).
Consequently, the probability distribution of Y is given by
Theorem: Suppose that X is a discrete random variable with probability distribution f(x). Let Y
= u(X) define a one-to-one transformation between the values of X and Y so that the equation y
= u(x) can be uniquely solved for x in terms of y, say x = w(y). Then the probability distribution
of Y is g(y) = f[w(y)].
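Example: Let X be a geometric random variable with pmf
\[ f(x) = \frac{3}{4}\left(\frac{1}{4}\right)^{x-1}, \quad x = 1, 2, 3, \dots \]
Find the probability distribution of $Y = X^2$.
Solution: Since the values of X are all positive, the transformation defines a one-to-one
correspondence between the x and y values, $y = x^2$ and $x = \sqrt{y}$. Hence
\[ g(y) = \begin{cases} f(\sqrt{y}) = \frac{3}{4}\left(\frac{1}{4}\right)^{\sqrt{y}-1}, & y = 1, 4, 9, \dots \\ 0, & \text{otherwise.} \end{cases} \]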
Suppose that X is a continuous random variable with probability distribution f(x). Let Y = u(X)
define a one-to-one correspondence between the values of X and Y so that the equation y = u(x)
can be uniquely solved for x in terms of y, say x = w(y).
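Then the probability distribution of Y is g(y) = f[w(y)]|J|, where J = w'(y) is called the
Jacobian of the transformation.
For example, let X have the density
\[ f(x) = \begin{cases} \frac{x}{12}, & 1 < x < 5, \\ 0, & \text{elsewhere,} \end{cases} \]
and let Y = 2X - 3, so that x = w(y) = (y + 3)/2 and J = w'(y) = 1/2. We find the density of
Y to be
\[ g(y) = \begin{cases} \frac{(y+3)/2}{12}\cdot\frac{1}{2} = \frac{y+3}{48}, & -1 < y < 7, \\ 0, & \text{otherwise.} \end{cases} \]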
Chapter Seven
Two dimensional Random Variables
7.1 Two dimensional random variables
Our study of random variables and their probability distributions in the preceding sections is
restricted to one-dimensional sample spaces, in that we recorded outcomes of an experiment as
values assumed by a single random variable. There will be situations, however, where we may
find it desirable to record the simultaneous outcomes of several random variables. For example,
we might measure the amount of precipitate P and volume V of gas released from a controlled
chemical experiment, giving rise to a two-dimensional sample space consisting of the outcomes
(p, v), or we might be interested in the hardness H and tensile strength T of cold-drawn copper,
resulting in the outcomes (h, t). In a study to determine the likelihood of success in college based
on high school data, we might use a three dimensional sample space and record for each
individual his or her aptitude test score, high school class rank, and grade-point average at the
end of freshman year in college.
If X and Y are two discrete random variables, the probability distribution for their simultaneous
occurrence can be represented by a function with values f(x, y) for any pair of values (x, y)
within the range of the random variables X and Y . It is customary to refer to this function as the
joint probability distribution of X and Y.
that is, the values f(x, y) give the probability that outcomes x and y occur at the same time.
i. The function f(x, y) is a joint probability distribution or probability mass function of the
discrete random variables X and Y if
1. $f(x, y) \ge 0$ for all (x, y),
2. $\sum_{x}\sum_{y} f(x, y) = 1$,
3. $P(X = x, Y = y) = f(x, y)$.
For any region A in the xy plane, $P[(X, Y) \in A] = \sum\sum_{A} f(x, y)$.
Example: Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red
pens, and 3 green pens. If X is the number of blue pens selected and Y is the number of red pens
selected, find
Solution: The possible pairs of values (x, y) are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0).
(a) Now, f(0, 1), for example, represents the probability that a red and a green pen are selected.
The total number of equally likely ways of selecting any 2 pens from the 8 is $\binom{8}{2} = 28$. The
number of ways of selecting 1 red from 2 red pens and 1 green from 3 green pens is $\binom{2}{1}\binom{3}{1} = 6$.
Hence, f(0, 1) = 6/28 = 3/14. Similar calculations yield the probabilities for the other cases, which
are presented in the table below. Note that the probabilities sum to 1. Equivalently,
\[ f(x, y) = \frac{\binom{3}{x}\binom{2}{y}\binom{3}{2-x-y}}{\binom{8}{2}}, \quad \text{for } x = 0, 1, 2;\ y = 0, 1, 2;\ \text{and } 0 \le x + y \le 2. \]
f(x, y)          x = 0    x = 1    x = 2    Row totals
y = 0            3/28     9/28     3/28     15/28
y = 1            3/14     3/14     0        3/7
y = 2            1/28     0        0        1/28
Column totals    5/14     15/28    3/28     1
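The joint table and its row and column totals can be regenerated from the counting formula. The following is a sketch under the example's setup (3 blue, 2 red, 3 green pens, 2 drawn); the function name `f` mirrors the notation of the notes, while `joint`, `g` and `h` are illustrative names for the table and the two sets of totals.

```python
from fractions import Fraction
from math import comb

def f(x, y):
    """Joint probability of x blue and y red pens among the 2 selected."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    ways = comb(3, x) * comb(2, y) * comb(3, 2 - x - y)
    return Fraction(ways, comb(8, 2))

joint = {(x, y): f(x, y) for x in range(3) for y in range(3)}

# Column totals (distribution of X alone) and row totals (distribution of Y alone)
g = {x: sum(joint[(x, y)] for y in range(3)) for x in range(3)}
h = {y: sum(joint[(x, y)] for x in range(3)) for y in range(3)}

print(g, h, sum(joint.values()))
```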
ii. The function f(x, y) is a joint density function of the continuous random variables X and Y if
1. $f(x, y) \ge 0$ for all (x, y),
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$,
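Example: A privately owned business operates both a drive-in facility and a walk-in facility.
On a randomly selected day, let X and Y, respectively, be the proportions of the time that the
drive-in and the walk-in facilities are in use, and suppose that the joint density function of these
random variables is
\[ f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y), & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a. Verify condition two of the definition for the joint pdf of continuous variables.
b. Find P[(X, Y ) ∈ A], where $A = \{(x, y) \mid 0 < x < \frac{1}{2},\ \frac{1}{4} < y < \frac{1}{2}\}$.
Solution: (a) The integration of f(x, y) over the whole region is
\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} \frac{2}{5}(2x + 3y)\,dx\,dy = \int_{0}^{1}\left[\frac{2x^2}{5} + \frac{6xy}{5}\right]_{x=0}^{x=1} dy \]
\[ = \int_{0}^{1}\left(\frac{2}{5} + \frac{6y}{5}\right) dy = \left[\frac{2y}{5} + \frac{3y^2}{5}\right]_{y=0}^{y=1} = \frac{2}{5} + \frac{3}{5} = 1 \]
(b)
\[ P[(X, Y) \in A] = \int_{1/4}^{1/2}\int_{0}^{1/2} \frac{2}{5}(2x + 3y)\,dx\,dy = \int_{1/4}^{1/2}\left(\frac{1}{10} + \frac{3y}{5}\right) dy \]
\[ = \left[\frac{y}{10} + \frac{3y^2}{10}\right]_{1/4}^{1/2} = \frac{1}{10}\left[\left(\frac{1}{2} + \frac{3}{4}\right) - \left(\frac{1}{4} + \frac{3}{16}\right)\right] = \frac{13}{160} \]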
Given the joint probability distribution f(x, y) of the discrete random variables X and Y, the
probability distribution g(x) of X alone is obtained by summing f(x, y) over the values of Y .
Similarly, the probability distribution h(y) of Y alone is obtained by summing f(x, y) over the
values of X. We define g(x) and h(y) to be the marginal distributions of X and Y , respectively.
When X and Y are continuous random variables, summations are replaced by integrals. We can
now make the following general definition.
Marginal distributions
The term marginal is used here because, in the discrete case, the values of g(x) and h(y) are just
the marginal totals of the respective columns and rows when the values of f(x, y) are displayed in
a rectangular table.
Conditional distributions
Let X and Y be two random variables, discrete or continuous. The conditional distribution of the
random variable Y given X = x is
\[ f(y \mid x) = \frac{f(x, y)}{g(x)}, \quad \text{provided } g(x) > 0. \]
Similarly, the conditional distribution of X given Y = y is
\[ f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad \text{provided } h(y) > 0. \]
Let X and Y be two random variables, discrete or continuous, with joint probability distribution
f(x, y) and marginal distributions g(x) and h(y), respectively. The random variables X and Y are
said to be statistically independent if and only if
f(x, y) = g(x)h(y) for all (x, y) within their range.
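$y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$
may be uniquely solved for $x_1$ and $x_2$ in terms of $y_1$ and $y_2$, say $x_1 = w_1(y_1, y_2)$ and
$x_2 = w_2(y_1, y_2)$. The Jacobian is the 2 × 2 determinant
\[ J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} \]
and $\frac{\partial x_1}{\partial y_1}$ is simply the derivative of $x_1 = w_1(y_1, y_2)$ with respect to $y_1$ with $y_2$ held constant,
referred to in calculus as the partial derivative of $x_1$ with respect to $y_1$. The other partial
derivatives are defined in a similar manner.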
Chapter Eight
Expected Value
Let X be a discrete random variable whose possible values are x1, x2, …, xn with
probabilities P(x1), P(x2), P(x3), …, P(xn) respectively.
Then the expected value of X, E(X), is defined as:
E(X) = x1P(x1) + x2P(x2) + … + xnP(xn)
\[ E(X) = \sum_{i=1}^{n} x_i P(X = x_i) \quad \text{if X is discrete,} \]
\[ \text{and } E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{if X is continuous.} \]
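Example: What is the expected value of the r.v. X (the number of heads in three tosses) from the above example?
Solution: X = 0, 1, 2, 3, so x1 = 0, x2 = 1, x3 = 2, x4 = 3, with
P(X = x1) = 1/8, P(X = x2) = 3/8, P(X = x3) = 3/8, P(X = x4) = 1/8
\[ E(X) = \sum_{i=1}^{4} x_i P(X = x_i) = 0\left(\tfrac{1}{8}\right) + 1\left(\tfrac{3}{8}\right) + 2\left(\tfrac{3}{8}\right) + 3\left(\tfrac{1}{8}\right) = \tfrac{12}{8} = 1.5 \]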
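The expected value of the number of heads in three tosses can be computed directly from the definition. A minimal sketch using exact fractions; the dictionary `pmf` holds the probabilities 1/8, 3/8, 3/8, 1/8 from the three-coin example.

```python
from fractions import Fraction

# Probability distribution of X = number of heads in three tosses of a fair coin
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# E(X) = sum of x * P(X = x) over all possible values x
expected_value = sum(x * p for x, p in pmf.items())

print(expected_value, float(expected_value))
```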
\[ \mu_{g(X)} = E[g(X)] = \sum_{x} g(x) f(x) \quad \text{if X is discrete, and} \]
\[ \mu_{g(X)} = E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx \quad \text{if X is continuous.} \]
Variance
If X is a discrete random variable with expected value $\mu$ (i.e. E(X) = $\mu$), then the variance of
X, denoted by Var (X), is defined by
\[ Var(X) = E[(X - \mu)^2] = E(X^2) - \mu^2 = \sum_{i=1}^{n} x_i^2\, P(X = x_i) - \mu^2 \]
Alternatively,
\[ Var(X) = \sum_{i=1}^{n} (x_i - \mu)^2\, P(X = x_i) \]
Properties of Variances
For any r.v. X and constant C, it can be shown that
Var (CX) = C² Var (X)
Var (X + C) = Var (X) + 0 = Var (X)
If X and Y are independent random variables, then
Var (X + Y) = Var (X) + Var (Y)
More generally, if X1, X2, …, Xk are independent random variables,
then Var (X1 + X2 + … + Xk) = Var (X1) + Var (X2) + … + Var (Xk),
i.e. $Var\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} Var(X_i)$
E(X) = 0(1 - P) + 1(P) = P
\[ Var(X) = E(X^2) - \mu^2 = \sum_{i} x_i^2\, P(X = x_i) - \mu^2 \]
\[ = [0^2\, P(X = 0) + 1^2\, P(X = 1)] - P^2 \]
\[ = [0(1 - P) + 1(P)] - P^2 \]
\[ = P - P^2 = P(1 - P) \]
2. Two fair coins are tossed. Determine Var (X) where X is the number of heads that appear.
A) Use the definition of the variance
B) Use the fact that the variance of the sum of independent variables is equal to the
sum of the respective variances.
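For B), let X and Y here denote the number of heads on the first and on the second coin respectively. Then
Var (X) = ½ - (½)² = ¼ and Var (Y) = ½ - (½)² = ¼
X and Y are independent (i.e. the outcome of one coin does not influence the outcome of the
second), so
Var (X + Y) = Var (X) + Var (Y) = ¼ + ¼ = ½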
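The result Var = ½ can be cross-checked from the definition of the variance by enumerating the four outcomes of two fair coins. This is a sketch with illustrative variable names; it computes the variance of the total number of heads in two ways, from squared deviations and from E(X²) minus the squared mean.

```python
from fractions import Fraction
from itertools import product

# Distribution of the total number of heads in two tosses of a fair coin
outcomes = list(product("HT", repeat=2))
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, 0) + Fraction(1, len(outcomes))

mean = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: E(X^2) minus the squared mean
variance_shortcut = sum(x * x * p for x, p in pmf.items()) - mean ** 2

print(mean, variance, variance_shortcut)
```

Both routes agree with the sum of the two single-coin variances, ¼ + ¼.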
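The rth moment about the origin of the random variable X is given by
\[ \mu'_r = E(X^r) = \begin{cases} \sum_{x} x^r f(x), & \text{if X is discrete,} \\ \int_{-\infty}^{\infty} x^r f(x)\,dx, & \text{if X is continuous.} \end{cases} \]
The moment generating function of the random variable X is given by $E(e^{tX})$ and is denoted
by $M_X(t)$:
\[ M_X(t) = E(e^{tX}) = \begin{cases} \sum_{x} e^{tx} f(x), & \text{if X is discrete,} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if X is continuous.} \end{cases} \]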
(Chebyshev's Theorem) The probability that any random variable X will assume a value within
k standard deviations of the mean is at least 1 - 1/k². That is,
\[ P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}. \]
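Let X and Y be random variables with joint probability distribution f(x, y). The covariance of X and Y is
\[ \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y)\,dx\,dy \quad \text{if X and Y are continuous.} \]
Let X and Y be random variables with covariance $\sigma_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$,
respectively. The correlation coefficient of X and Y is
\[ \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}. \]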
Chapter Nine
Common Discrete Probability Distributions
9.1 Common Discrete Probability Distributions
9.1.1 Binomial Distribution
The binomial experiment originates from the Bernoulli trial. A Bernoulli trial is an experiment
having only two mutually exclusive outcomes, which are designated “success (s)” and “failure
(f)”. The sample space of a Bernoulli trial is {s, f}.
Notation: Let the probabilities of success and failure be p and q respectively: P (success) = P(s) = p
and P (failure) = P (f) = q, where q = 1 - p.
Definition: Let X be the number of successes in n repeated Bernoulli trials with probability of
success p on each trial. Then the probability distribution of the discrete random variable X is
called the binomial distribution.
Let p = the probability of success and
q = 1 - p = the probability of failure on any given trial.
A binomial random variable with parameters n and p represents the number of successes r in n
independent trials, when each trial has probability of success p. Its probability distribution is
\[ P(X = r) = \frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}, \quad r = 0, 1, \dots, n. \]
Assumptions of a binomial distribution
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success
or a failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent.
Examples of binomial experiments
Tossing a coin 20 times to see how many tails occur.
Asking 200 people whether or not they listen to the BBC news.
b) $P(X = 2) = \frac{3!}{2!\,(3-2)!}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{3-2} = 3 \times \frac{1}{8} = \frac{3}{8}$
2. The probability that a student entering a college will graduate is 0.4. Determine the probability
that out of 5 students (a) none, (b) one, (c) at least one, (d) at most three will graduate.
Solution: Let X be the number of students who will graduate.
X = 0, 1, 2, 3, 4, 5, p = 0.4, q = 1 - p = 0.6
a) P (none will graduate) = P (X = 0) = $\frac{5!}{0!\,(5-0)!}(0.4)^0(0.6)^5 = 0.08$
b) P (one will graduate) = P (X = 1) = $\frac{5!}{1!\,(5-1)!}(0.4)^1(0.6)^4 = 0.26$
c) P (at least one will graduate) = P (X ≥ 1) = 1 - P (X = 0)
= 1 - 0.08 = 0.92
d) P (at most three will graduate) = P (X ≤ 3)
= 1 - P (X > 3)
= 1 - [P (X = 4) + P (X = 5)]
= 1 - [5!/(4!(5-4)!) (0.4)^4 (0.6)^1 + 5!/(5!(5-5)!) (0.4)^5 (0.6)^0]
= 0.91296
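The four probabilities in the graduation example can be recomputed with a small binomial helper. A minimal sketch; `binomial_pmf` is an illustrative name, and the unrounded values are printed so they can be compared with the rounded figures in the solution.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for a binomial random variable with parameters n and p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, 0.4

p_none = binomial_pmf(0, n, p)                                  # (a)
p_one = binomial_pmf(1, n, p)                                   # (b)
p_at_least_one = 1 - p_none                                     # (c)
p_at_most_three = sum(binomial_pmf(r, n, p) for r in range(4))  # (d)

print(p_none, p_one, p_at_least_one, p_at_most_three)
```

Part (d) is summed directly over r = 0, 1, 2, 3 here, which gives the same value as the complement route used in the solution.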
If X is a binomial random variable with two parameters n and P, then
1. E (X) = n.p.
2. Var ( X) = npq
9.1.2 Poisson distribution
- It is a discrete probability distribution used to model rare events, such as the
number of car accidents in a day, the arrival of telephone calls over an interval of time, the number
of misprints on a typed page, or natural disasters like earthquakes.
- The expected number of occurrences can be estimated from past trials (records).
- The numbers of events occurring in different regions or time intervals are
independent of one another.
Definition: Let X be the number of occurrences in a Poisson process and λ be the actual
average number of occurrences of an event in a unit length of interval. The probability function of the
Poisson distribution is
\[ P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & \text{for } x = 0, 1, 2, \dots \\ 0, & \text{otherwise} \end{cases} \]
Remarks
The Poisson distribution possesses only one parameter, λ.
If X has a Poisson distribution with parameter λ, then E (X) = λ and
Var (X) = λ, i.e. E (X) = Var (X) = λ.
$\sum_{x=0}^{\infty} P(X = x) = 1$
Example 1: A company manufacturing light bulbs finds from past experience that 2 defective
bulbs are manufactured per 30 working hours. What is the probability that 4 defective bulbs will be
manufactured in 30 working hours?
Solution: Let X be the r.v. denoting the number of defective bulbs, with λ = 2.
\[ P(X = 4) = \frac{e^{-2}\, 2^{4}}{4!} = 0.09 \]
Example 2: In a small city, 10 accidents took place in a period of 50 days. Find the probability that
there will be
a) two accidents in a day
b) three or more accidents in a day
Solution: In 50 days we have 10 accidents, so the number of accidents per day is
10/50 = 0.2, i.e. λ = 0.2.
Let X be the r.v. denoting the number of accidents per day.
X ~ Poisson (λ = 0.2), X = 0, 1, 2, …
a) $P(X = 2) = \frac{e^{-0.2}(0.2)^{2}}{2!} = 0.0164$
b) P (X ≥ 3) = P (X = 3) + P (X = 4) + P (X = 5) + …
= 1 - [P (X = 0) + P (X = 1) + P (X = 2)], because $\sum_{x=0}^{\infty} P(X = x) = 1$
= 1 - [0.8187 + 0.1637 + 0.0164]
= 0.0012
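Both Poisson examples can be checked with a direct implementation of the probability function. This is a sketch; `poisson_pmf` is an illustrative name, and the results are left unrounded.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

# Example 1: lambda = 2 defective bulbs per 30 working hours
p_four_defects = poisson_pmf(4, 2)

# Example 2: lambda = 0.2 accidents per day
p_two_accidents = poisson_pmf(2, 0.2)
p_three_or_more = 1 - sum(poisson_pmf(x, 0.2) for x in range(3))

print(p_four_defects, p_two_accidents, p_three_or_more)
```

Without intermediate rounding the last value is about 0.00115; the 0.0012 in the worked solution comes from subtracting terms already rounded to four decimals.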
3. a) Referring to Example 1, what is the expected number of defective light bulbs in 30 working hours? What
about the variance?
b) Referring to Example 2, find the mean and the variance of the number of accidents in a day.
Solution: a) E (X) = Var (X) = λ = 2
b) E (X) = Var (X) = λ = 0.2
Example 3: Suppose the number of typographical errors on a single page of your book has a
Poisson distribution with parameter λ = 1/2. Calculate the probability that there is at least one
error on this page.
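\[ P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \binom{2000}{0}(0.001)^{0}(0.999)^{2000} = 0.8648 \]
Since n = 2000 > 50 and np = 2, we can use the Poisson approximation with λ = np = 2:
\[ P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \frac{e^{-2}\, 2^{0}}{0!} = 0.8647 \]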
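The closeness of the two answers can be seen numerically. The sketch below takes n = 2000 and p = 0.001 from the surviving lines of the calculation above and compares the exact binomial probability of at least one success with its Poisson approximation using λ = np.

```python
from math import exp

n, p = 2000, 0.001
lam = n * p                          # Poisson parameter, lambda = np

# Exact binomial: P(X >= 1) = 1 - P(X = 0) = 1 - (1 - p)^n
exact = 1 - (1 - p) ** n

# Poisson approximation: P(X >= 1) = 1 - e^(-lambda)
approx = 1 - exp(-lam)

print(exact, approx, abs(exact - approx))
```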
If repeated independent trials can result in a success with probability p and a failure with
probability q=1-p, then the probability distribution of the random variable X, the number of the
trial on which the first success occurs, is
\[ g(x; p) = p\,q^{x-1}, \quad x = 1, 2, 3, \dots \]
The mean and variance of the geometric distribution are $\frac{1}{p}$ and $\frac{1-p}{p^2}$ respectively.
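The density function of the continuous uniform random variable X on the interval [a, b] is
\[ f(x; a, b) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{elsewhere.} \end{cases} \]
Its mean and variance are
\[ \mu = \frac{a+b}{2} \quad \text{and} \quad \sigma^2 = \frac{(b-a)^2}{12}. \]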
The normal distribution is the most important distribution for describing a continuous random variable and is used as an
approximation of other distributions. A random variable X is said to have a normal distribution if
its probability density function is given by
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \]
where x is a real value of X,
i.e. -∞ < x < ∞, -∞ < µ < ∞ and σ > 0,
and where µ = E(X) and σ² = Var(X).
µ and σ² are the parameters of the normal distribution.
Properties of Normal Distribution:
72
1. It is bell shaped and is symmetrical about its mean. The maximum coordinate is at
x = X
2. The curve approaches the horizontal x-axis as we go either direction from the mean.
1
1 x 2
3. Total area under the curve sums to 1, that is
f ( x)dx
2 e 2
dx 1
4. The Probability that a random variable will have a value between any two points is equal
to the area under the curve between those points.
5. The height of the normal curve attains its maximum at X this implies the mean and
mode coincides(equal)
The standard normal distribution is a normal distribution with mean 0 and variance 1. A normal distribution can be converted to the
standard normal distribution as follows. If X has a normal distribution with mean µ and standard
deviation σ, then the standard normal deviate Z is given by $Z = \frac{X-\mu}{\sigma}$, with density
\[ P(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}} \]
The continuous random variable X has an exponential distribution, with parameter β, if its
density function is given by
\[ f(x; \beta) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x > 0, \\ 0, & \text{elsewhere,} \end{cases} \quad \text{where } \beta > 0. \]