
Statistics for Engineers & Scientists

The document provides an introduction to statistics, defining its branches as descriptive and inferential statistics, and outlining the stages of statistical investigation including data collection, organization, presentation, analysis, and inference. It also explains key statistical terms such as population, sample, parameter, and statistic, and discusses the applications, uses, and limitations of statistics in various fields. Additionally, it covers measurement scales and methods for data collection and presentation, including frequency distributions.

Uploaded by

hagos mekonen
Copyright
© © All Rights Reserved

Statistics for Engineers and Scientists

Chapter One

1.1 Introduction

1.1.1 Definition and Classification of Statistics

Statistics is a branch of applied mathematics that deals with the collection, organization,
presentation, analysis and interpretation of numerical data.

Classification of Statistics

Depending on how the collected data are used, statistics is divided into two main areas or
branches.

1. Descriptive Statistics: is concerned with summarizing data by means of summary calculations, graphs, charts and tables.
2. Inferential Statistics: is a set of methods used to generalize from a sample to a population. For
example, the average income of all families (the population) in Ethiopia can be estimated from
figures obtained from a few thousand families (the sample).
• It is important because statistical data usually arise from a sample.
• Statistical techniques based on probability theory are required.

Examples:
a) From past figures, it has been predicted that 31% of registered voters will vote in the
November election. (Inferential statistics)
b) The average age of a student in Hawassa University is 20.1 years. (Descriptive statistics)

1.1.2 Stages in statistical investigation

There are five stages or steps in any statistical investigation.

i. Collection of data: the process of measuring, gathering and assembling the raw data upon which
the statistical investigation is to be based. Data can be collected in a variety of ways, which are
discussed later.

ii. Organization of data: summarization of the data in some meaningful way, e.g. in table form.

iii. Presentation of data: the process of re-organizing, classifying, compiling and summarizing the data to present
them in a meaningful form.

iv. Analysis of data: the process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operations.

v. Inference from data: the interpretation and further observation of the various statistical measures obtained through the analysis
of the data, using methods by which conclusions are formed and inferences
made. Statistical techniques based on probability theory are required at this stage.

1.1.3 Definition of some terms

Population: the totality of all subjects under study.

Sample: a part or portion of the population selected to draw conclusions about the population. It
should be selected using a pre-defined sampling technique so that it represents
the population well.

Parameter: any statistical measure that refers to a population or is computed from
population data.
Statistic: any statistical measure that refers to a sample or is computed from sample data.
Census: a survey in which there is complete coverage of the population.
Sample survey: a survey in which there is partial coverage of the population.
Sampling: the process or method of selecting a sample from the population.
Sample size: the number of elements or observations included in the sample.
Variable: a characteristic or attribute that can assume different values.
Data: a collection of related facts and figures from which conclusions can be drawn.

There are two types of variables.
Qualitative variables: are nonnumeric variables that cannot be measured but can be placed into
distinct categories according to some characteristic or attribute.
Examples: gender, religious affiliation, political affiliation, etc.
Quantitative variables: are variables that can be quantified, that is, variables that take numerical values.
Examples: height, income, temperature, etc.
Quantitative variables can be further classified as

• Discrete variables, and

• Continuous variables

a) Discrete variables are variables whose values are counts.

Examples: number of students, family size, number of pages of a book.

b) Continuous variables are variables that can take any value within an interval.
Examples: weight, length, volume, etc.

Variables are measured on four types of measurement scales, which are described in Section 1.1.5.

1.1.4 Applications, uses and limitations of statistics

Applications of statistics:

• Statistics is applied in almost all fields of human endeavor.

• Almost all human beings obtain numerical facts in their daily life, e.g.
about prices.

• It is applicable in processes such as the invention of certain drugs and the assessment of the extent of environmental
pollution.

• It is used in industry, especially in the area of quality control.

Uses of statistics:

The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:

• Presenting facts in a definite and precise form.

• Data reduction.

• Measuring the magnitude of variations in data.

• Furnishing a technique of comparison.

• Estimating unknown population characteristics.

• Testing and formulating hypotheses.

• Studying the relationship between two or more variables.

• Forecasting future events.

Limitations of statistics

As a science, statistics has its own limitations. The following are some of them:

• It deals only with quantitative information.

• It deals only with aggregates of facts and not with individual data items.

• Statistical data are only approximately, not mathematically, correct.

• Statistics can easily be misused and therefore should be used by experts.

1.1.5 Scales of Measurement

The classification of a variable depends on how it is categorized, counted or
measured. This classification uses measurement scales, of which four
types are common: nominal, ordinal, interval and ratio.

Nominal scale: the nominal level of measurement classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.

Examples: Residence (rural, urban)
          Political affiliation (Democrat, Republican)

Ordinal scale: the ordinal level of measurement classifies data into categories that can be
ranked; however, precise differences between the ranks do not exist.

Examples: Letter grades (A, B, C, D and E)
          Rating scale (poor, good, excellent)

Interval scale: the interval level of measurement ranks data, and precise differences between
units of measure do exist; however, there is no true zero.

Examples: Temperature (in °C or °F), IQ test scores of persons

Ratio scale: the ratio level of measurement possesses all the characteristics of interval
measurement, and in addition there exists a true zero. True ratios therefore exist when the same variable is
measured on two different members of the population.

Examples: Height, weight, time, salary, etc.

Summary of Characteristics of the Four Levels of Measurement

Level      Different    Ranked       Distance between      True zero
           categories   categories   categories measured
Nominal    Yes
Ordinal    Yes          Yes
Interval   Yes          Yes          Yes
Ratio      Yes          Yes          Yes                   Yes

1.2 Methods of data collection and presentation
After clarifying the problem and formulating questions or hypotheses that can be answered by the
study, we collect the data. Having collected a set of data, a primary step is to organize and
present the information using frequency distributions, charts and graphs. In this section we learn
first how to collect data and then how to organize and present them by means of tables, charts and graphs.

1.2.1 Methods of Data Collection

Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is
statistical data when the numbers are

• Comparable,
• Meaningful, and
• Collected for a well-defined objective.
Raw data: are collected data which have not been organized numerically.
Example: 25, 10, 32, 18, 6, 93, 4.
An array: is an arrangement of raw numerical data in ascending or descending order of
magnitude.

• It enables us to find the range of the data set easily, and it also gives us some idea
about the general characteristics of the distribution. For example, the raw data above form the ascending array 4, 6, 10, 18, 25, 32, 93, whose range is 93 - 4 = 89.

Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.

1.2.2 Sources of Data

Primary source: is a source of data that supplies firsthand information for the
immediate purpose.

• Primary data: are data originally collected for the immediate purpose.
- Primary data are usually more expensive to obtain than secondary data.

The process of data collection from a primary source may be through:

• Field trials
• Laboratory experiments
• Surveys, including census surveys (personal interview, telephone interview and mailed questionnaire)

Secondary source: an individual or agency that supplies data originally collected for other
purposes, by them or by others.

- Secondary sources are usually published or unpublished materials, records, reports, etc.

• Secondary data: are data collected from a secondary source.

1.2.3 Methods of Data Presentation

Introduction

Classification: is the process of arranging items or data into classes or categories according to
their similarities.

Classification is necessary because it is difficult to draw inferences and conclusions directly from
a large set of collected raw data.

1.2.3.1 Frequency Distributions

A frequency distribution is the organization of raw data in table form using classes and
frequencies.
Frequency: is the number of times a certain value or set of values occurs in a specific group.

Generally, there are three basic types of frequency distributions: categorical (qualitative),
ungrouped and grouped frequency distributions.

i. Categorical Frequency Distribution:

This distribution is used for data that can be placed in categories, such as nominal or ordinal data.

Example: A social worker collected data on marital status for 25 persons (M = married, S = single,
W = widowed, D = divorced).

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution: There are four types of marital status: M, S, D and W. These categories are used as the
classes of the frequency distribution. The following steps are followed when
constructing a categorical frequency distribution.

Step 1: Make a table as shown.

Class   Frequency   Percent
(1)     (2)         (3)
M
S
D
W

Step 2: Count the observations in each category and place the results in column (2).

Step 3: Find the percentage of values in each class by using $\% = \frac{f}{n} \times 100$, where f = frequency
of the class and n = total number of observations.
- Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of diagrams such as pie charts.

Step 4: Find the totals for columns (2) and (3).

Combining all the steps, one can construct the following frequency distribution.

Class   Frequency   Percent
(1)     (2)         (3)
M       6           6/25*100 = 24
S       7           28
D       7           28
W       5           20
Total   25          100
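
The four steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `categorical_frequency_table` is hypothetical, and the tallying is done with the standard library's `collections.Counter`.

```python
from collections import Counter

# Marital-status observations from the example, read row by row.
observations = (
    "M S D W D "
    "S S M M M "
    "W D S M M "
    "W D D S S "
    "S W W D D"
).split()


def categorical_frequency_table(values):
    """Return (class, frequency, percent) rows, one per category."""
    counts = Counter(values)   # Step 2: tally each category
    n = len(values)            # total number of observations
    # Step 3: percent = f / n * 100
    return [(cls, f, 100 * f / n) for cls, f in counts.items()]


table = categorical_frequency_table(observations)
for cls, f, pct in table:
    print(f"{cls:<6}{f:>4}{pct:>8.0f}")

# Step 4: totals of the frequency and percent columns
total_f = sum(f for _, f, _ in table)
total_pct = sum(pct for _, _, pct in table)
print(f"{'Total':<6}{total_f:>4}{total_pct:>8.0f}")
```

The classes are listed in the order in which they first appear in the data (M, S, D, W), which happens to match the table above.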

ii. Ungrouped frequency distribution

An ungrouped frequency distribution (UFD) is a table of all the potential raw score values that could
occur in the data along with their corresponding frequencies. It
is often constructed for a small set of data.

– Raw data: data collected in original form.
– Array: data arranged in ascending or descending order.
– Class: different, non-overlapping groups of data.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution:

• First find the smallest and the largest raw scores in the collected data and calculate the range.
A UFD is appropriate when the range is small.

• Arrange the data in order of magnitude to facilitate counting, then count the frequency of each
value.

• Then construct the UFD by listing the classes along with their corresponding frequencies.

Example: Given the following data, construct an appropriate frequency distribution.

12 12 12 16 18
12 18 12 17 15
15 16 12 16 16
12 14 15 15 15
19 13 16 16 14

Solution:
STEP 1. Find the range of the data:
Range = Maximum observation - Minimum observation = 19 - 12 = 7. Since the range of the data
(7) is small, classes consisting of a single data value can be used. They are 12, 13, 14, 15, 16, 17,
18 and 19.
STEP 2. Arrange the data: 12, 12, 12, 12, 12, 12, 12, 13, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16,
16, 16, 16, 17, 18, 18, 19
STEP 3. Then construct the UFD by listing the classes along with their corresponding frequencies.

Age     Frequency   Percent
12      7           28
13      1           4
14      2           8
15      5           20
16      6           24
17      1           4
18      2           8
19      1           4
Total   25          100
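
The three steps of this example can be reproduced with a short Python sketch. It assumes integer-valued data (unit of measurement 1), and the function name `ungrouped_frequency_distribution` is hypothetical, chosen only for this illustration.

```python
from collections import Counter

# The 25 observations from the example, read row by row.
data = [
    12, 12, 12, 16, 18,
    12, 18, 12, 17, 15,
    15, 16, 12, 16, 16,
    12, 14, 15, 15, 15,
    19, 13, 16, 16, 14,
]


def ungrouped_frequency_distribution(values):
    """Return (range, rows); each row is (value, frequency, percent)."""
    data_range = max(values) - min(values)   # STEP 1: range of the data
    counts = Counter(sorted(values))         # STEP 2: arrange, then count
    n = len(values)
    # STEP 3: one class per integer value from the minimum to the maximum
    rows = [
        (v, counts[v], 100 * counts[v] / n)
        for v in range(min(values), max(values) + 1)
    ]
    return data_range, rows


data_range, ufd = ungrouped_frequency_distribution(data)
print("Range:", data_range)
for value, f, pct in ufd:
    print(f"{value:>3}{f:>5}{pct:>7.0f}")
```

A value between the minimum and the maximum that never occurs would simply be listed with frequency 0.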

iii. Grouped Frequency Distribution

When the range of the data is large, the data must be grouped into classes. A grouped frequency
distribution is a frequency distribution in which several data values are grouped into one class.

Some Important Definitions

– Class limits: separate one class in a grouped frequency distribution from another.
o The limits can actually appear in the collected data, and there are gaps between the
upper limit of one class and the lower limit of the next class.
– Unit of measurement (U): the smallest difference between two possible values of the variable
being measured. It is usually taken as 1, 0.1, 0.01, 0.001, etc.
– Class boundaries: separate one class in a grouped frequency distribution from another.
o The boundaries have one more decimal place than the raw data and therefore do not
appear in the collected data.
o There is no gap between the upper boundary of one class and the lower boundary of
the next class.
o The lower class boundaries (LCBs) are found by subtracting 0.5 units of
measurement from the lower class limits (LCLs), and the upper class boundaries
(UCBs) are found by adding 0.5 units of measurement to the upper class limits
(UCLs). That is, $LCB_i = LCL_i - \tfrac{1}{2}U$ and $UCB_i = UCL_i + \tfrac{1}{2}U$.
– Class width (W): the difference between the upper and lower boundaries of any class,
o or the difference between the lower limits of two consecutive classes,
o or the difference between the upper limits of two consecutive classes,
o or the difference between the lower boundaries of two consecutive classes,
o or the difference between the upper boundaries of two consecutive classes.
N.B: Class width is not equal to the difference between the UCL and the LCL of the same
class.
– Class mark (M): the midpoint of a class interval,
i.e. $M = \frac{LCL_i + UCL_i}{2}$ or $M = \frac{LCB_i + UCB_i}{2}$
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary of the given class.
– Cumulative frequency (Cf) more than type: the total frequency of all values (observations)
greater than or equal to the lower class boundary of the given class.

Cumulative frequency distribution: a tabular arrangement of class intervals together with their
corresponding cumulative frequencies (either less than or more than type, as defined above).
Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all
frequencies); multiplied by 100, it gives the percent of values falling in that class.
$$\text{Relative frequency of a class} = \frac{\text{frequency of the class}}{\text{total frequency}}$$
Note:
• The relative frequency shows what fractional part or proportion of the total frequency
belongs to the corresponding class.
• The sum of all the relative frequencies in a frequency distribution is always 1.

Guidelines to construct a grouped frequency distribution

STEP 1. Determine the unit of measurement, U.
STEP 2. Find the maximum (Max) and the minimum (Min) observations, and then compute their
range, $R = \text{Max} - \text{Min}$.
STEP 3. Select the number of classes desired (K), usually between 5 and 20, or use Sturges'
formula, $K = 1 + 3.322 \log_{10} n$, where n is the total frequency; round this value of K up
to the next integer when it turns out to be a fraction.
STEP 4. Find the class width (W) by dividing the range by the number of classes, $W = \frac{R}{K}$, and round the
result up to get an integer value.
STEP 5. Pick a suitable starting point less than or equal to the minimum value. This starting point
is the lower limit of the first class. Continue to add the class width to this lower limit to
get the rest of the lower limits.
STEP 6. Find the upper class limits. To find the upper class limit of the first class, subtract one unit
of measurement from the lower limit of the second class. Then continue to add the class
width to this upper limit to get the rest of the upper limits.
STEP 7. Compute the class boundaries as $LCB_i = LCL_i - \tfrac{1}{2}U$ and $UCB_i = UCL_i + \tfrac{1}{2}U$.
STEP 8. List the classes along with their corresponding counts/frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).

Example: The following data represent the record high temperatures in °F for each of the 50 US
states.
112 110 111 110 112 116 118 105 110 109
112 100 118 104 116 114 118 122 114 114
105 110 127 112 111 108 115 118 117 118
112 109 118 120 114 120 110 121 113 120
119 106 107 117 134 114 113 126 117 105

Construct a suitable frequency distribution.

STEP 1. Unit of measurement: U = 1
STEP 2. Max = 134, Min = 100, so that R = 134 - 100 = 34
STEP 3. $K = 1 + 3.322 \log_{10} 50 = 6.644 \approx 7$
STEP 4. Class width $W = \frac{34}{7} = 4.9 \approx 5$
STEP 5. Let 100 (the smallest observation) be the starting point, or lower limit of the first class.
Hence the lower class limits become
100 105 110 115 120 125 130
STEP 6. Upper limit of the first class = 105 - 1 = 104. Hence the upper class limits become
104 109 114 119 124 129 134
STEP 7. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units
of measurement to the upper class limits, we get the lower and upper class boundaries.
Hence, the LCBs are 99.5 104.5 109.5 114.5 119.5 124.5 129.5 and the UCBs
are 104.5 109.5 114.5 119.5 124.5 129.5 134.5

STEPS 8 and 9 are displayed in the following table.

Class limits   Class boundaries   Frequency   Cumulative frequency   Cumulative frequency
                                              (less than type)       (more than type)
100 – 104      99.5 – 104.5       2           2                      50
105 – 109      104.5 – 109.5      8           10                     48
110 – 114      109.5 – 114.5      18          28                     40
115 – 119      114.5 – 119.5      13          41                     22
120 – 124      119.5 – 124.5      7           48                     9
125 – 129      124.5 – 129.5      1           49                     2
130 – 134      129.5 – 134.5      1           50                     1
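
STEPS 2 to 7 of the guidelines are mechanical enough to sketch in Python. The sketch below makes two assumptions that the guidelines leave open: the starting point is taken to be the minimum observation itself, and Sturges' formula (with the constant 3.322) is always used to choose the number of classes. The function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Number of classes from Sturges' formula, rounded up (STEP 3)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def class_structure(minimum, maximum, n, unit=1):
    """Return (K, W, class limits, class boundaries) following STEPS 2 to 7."""
    data_range = maximum - minimum                     # STEP 2
    k = sturges_classes(n)                             # STEP 3
    width = math.ceil(data_range / k)                  # STEP 4
    lower = [minimum + i * width for i in range(k)]    # STEP 5
    upper = [lcl + width - unit for lcl in lower]      # STEP 6
    limits = list(zip(lower, upper))
    boundaries = [(lcl - unit / 2, ucl + unit / 2)     # STEP 7
                  for lcl, ucl in limits]
    return k, width, limits, boundaries


# Temperature example: Min = 100, Max = 134, n = 50, U = 1
k, width, limits, boundaries = class_structure(100, 134, 50)
print("K =", k, " W =", width)
for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lcl} - {ucl}    {lcb} - {ucb}")
```

For the temperature example this gives the same seven classes as the table above. Counting the observations in each class (STEP 8) is left out of the sketch.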

1.2.3.2 Diagrammatic and/or graphical presentation of data:

The data presented in a frequency distribution can also be displayed diagrammatically or
graphically. There are several techniques for presenting data in a visual display.

Diagrams and graphs:

• are visual aids which give a bird's-eye view of a given set of numerical data;

• have greater attraction than mere figures (numbers);

• facilitate comparison of data;

• are easily understood even by someone who has no statistical background.

a. Diagrammatic presentation of data: bar charts, pie charts, pictograms, stem and leaf plots

Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.

There are three common diagrammatic presentations of data: bar diagrams/charts, pie charts and
pictograms. The stem and leaf plot is also sometimes used.

i. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of
frequencies in each category of the distribution. The angle of the sector of a class is obtained by
multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.
$$\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$$

Note that pie-charts are usually used for depicting nominal level data.

Example: For the data given in the categorical frequency distribution below, construct a pie chart.

Class   Frequency
M       6
S       7
D       7
W       5
Total   25

How to draw a pie-chart

- First find the percentage of each class.
- Next calculate the degree measure for each class.
- Finally, using a protractor, mark off each sector (degree measure) in a circle and give a key for
explanation.

Class   Frequency   Percent         Degree
M       6           6/25*100 = 24   86.4
S       7           28              100.8
D       7           28              100.8
W       5           20              72
Total   25          100             360

Now we can draw the pie-chart for the data.

[Figure: pie chart with sectors M (6, 24%), S (7, 28%), D (7, 28%) and W (5, 20%)]
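
The percent and degree columns of the table can be checked with a small Python sketch of the sector-angle formula. The function name `sector_angles` is hypothetical and the drawing itself is not attempted here.

```python
def sector_angles(frequencies):
    """Map each class to (percent, sector angle in degrees)."""
    total = sum(frequencies.values())
    return {
        cls: (100 * f / total, 360 * f / total)
        for cls, f in frequencies.items()
    }


marital_status = {"M": 6, "S": 7, "D": 7, "W": 5}
angles = sector_angles(marital_status)
for cls, (pct, angle) in angles.items():
    print(f"{cls}: {pct:.0f}%  {angle:.1f} degrees")
```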

ii. Bar-diagrams/ Bar-charts

• A bar-diagram is a series of equally spaced bars of equal width, in which the height of each
bar represents the magnitude or frequency of observations in each group.

• They are useful for comparing aggregates over time or space.

• Bar-diagrams can be drawn either horizontally or vertically.

There are different types of bar charts, the most common being:

o Simple bar chart
o Component bar chart
o Multiple bar chart

a. Simple bar chart

– Simple bar charts are used to display data on one variable.
– They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height/length of the bar.

Example: The following data represent the sales by product, 1957-1959, of a given company for three
products A, B and C.

Product   Sales ($)   Sales ($)   Sales ($)
          in 1957     in 1958     in 1959
A         12          14          18
B         24          21          18
C         24          35          54

The horizontal bar chart for sales in 1957 is then as follows.

[Figure: horizontal bar chart of 1957 sales for products A, B and C]

b. Component Bar Chart

When it is desired to show how a total (an aggregate) is divided into its component parts, we use a
component bar diagram. In this type of bar-diagram, the bars represent the aggregate value of a
variable, with each aggregate broken into its component parts; different colors or designs are
used for identification.

Example: Represent the data given above using component bar-charts.

c. Multiple bar-charts

Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.

Example: Use the same data given in the above example and depict them using multiple bar charts.
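
As a rough illustration of the three chart types, the sales table can be turned into text-only charts in Python. This is a sketch under simple assumptions: one `#` character stands for one dollar of sales, the function name `text_bar_chart` is hypothetical, and a real chart would normally be drawn with a plotting tool.

```python
# Sales ($) by product and year, taken from the table above.
sales = {
    1957: {"A": 12, "B": 24, "C": 24},
    1958: {"A": 14, "B": 21, "C": 35},
    1959: {"A": 18, "B": 18, "C": 54},
}


def text_bar_chart(values, symbol="#"):
    """Simple bar chart: one horizontal bar per category."""
    return [f"{label} | {symbol * value} {value}"
            for label, value in values.items()]


# Simple bar chart for 1957 (one variable).
simple_1957 = text_bar_chart(sales[1957])
print("\n".join(simple_1957))

# Component bar chart: each year's bar is the total, split by product.
totals = {year: sum(by_product.values()) for year, by_product in sales.items()}
for year, by_product in sales.items():
    parts = "".join(product * amount for product, amount in by_product.items())
    print(f"{year} | {parts} {totals[year]}")

# Multiple bar chart: the bars of all products side by side for each year.
for year, by_product in sales.items():
    print(year)
    print("\n".join("  " + line for line in text_bar_chart(by_product)))
```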

iii. Pictograms
In pictograms, we represent the data by means of picture symbols. We choose a
suitable picture to represent a definite number of units in which the variable is measured.

Example: Draw a pictorial diagram to present the following data (number of students in a certain
school for four years).

Year              1992   1993   1994   1995
No. of students   2000   3000   5000   7000

Let a single picture symbol represent one thousand students.

[Figure: pictogram with 7 symbols for 1995, 5 for 1994, 3 for 1993 and 2 for 1992; key: one symbol = 1000 students]

iv. Stem and Leaf Plot

A stem and leaf plot is a device for presenting quantitative data in a graphical format similar to a
histogram, to assist in visualizing the shape of the distribution.
• When constructing a stem and leaf plot, first arrange the observations in ascending order.
Example: 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
The leaf represents the ones place and the stem represents the rest.

Stem | Leaf
   4 | 4 6 7 9
   5 |
   6 | 3 4 6 8 8
   7 | 2 2 5 6
   8 | 1 4 8
   9 |
  10 | 6
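
The construction can be sketched in Python as follows. The sketch assumes non-negative whole numbers, so that the leaf is the ones digit and the stem is the value with its last digit dropped; the function name `stem_and_leaf` is hypothetical.

```python
def stem_and_leaf(values):
    """Return {stem: [leaves]} with every stem from the smallest to the largest."""
    ordered = sorted(values)                      # arrange in ascending order
    stems = range(ordered[0] // 10, ordered[-1] // 10 + 1)
    plot = {stem: [] for stem in stems}           # keep empty stems such as 5 and 9
    for v in ordered:
        plot[v // 10].append(v % 10)              # stem = leading digits, leaf = ones digit
    return plot


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")
```

Empty stems are kept on purpose, because gaps in the data are part of the shape of the distribution.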

b. Graphical Presentation of Data

Graphs are used to present continuous data. The three common graphic presentations of data are the
histogram, the frequency polygon, and the cumulative frequency polygon (ogive).

i. Histogram
A histogram is another way of presenting data which is more suitable for frequency distributions with
continuous classes.

In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its respective
frequency on the vertical axis.

Example: Draw a histogram for the following data (temperature data of the 50 US states).

Class limits   Class boundaries   Frequency   Cumulative frequency   Cumulative frequency
                                              (less than type)       (more than type)
100 – 104      99.5 – 104.5       2           2                      50
105 – 109      104.5 – 109.5      8           10                     48
110 – 114      109.5 – 114.5      18          28                     40
115 – 119      114.5 – 119.5      13          41                     22
120 – 124      119.5 – 124.5      7           48                     9
125 – 129      124.5 – 129.5      1           49                     2
130 – 134      129.5 – 134.5      1           50                     1

Solution:
Step 1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is
always the vertical axis.
Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each class.

ii. Frequency Polygon

The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the heights
of the points.
Example: Present the data in the previous example using a frequency polygon.

Solution
Step 1 Find the midpoint of each class. Recall that midpoints are found by adding the upper and
lower boundaries and dividing by 2.
For the first class, $CM = \frac{99.5 + 104.5}{2} = 102$. The same procedure is followed for the other classes, and the class midpoints (CMs) are
given in the table below.

Class limits   Class boundaries   Class midpoint   Frequency
100 – 104      99.5 – 104.5       102              2
105 – 109      104.5 – 109.5      107              8
110 – 114      109.5 – 114.5      112              18
115 – 119      114.5 – 119.5      117              13
120 – 124      119.5 – 124.5      122              7
125 – 129      124.5 – 129.5      127              1
130 – 134      129.5 – 134.5      132              1

Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the
beginning and end of the graph.

iii. Cumulative Frequency Polygon (Ogive)

A cumulative frequency polygon can be drawn on a less than or a more than cumulative frequency
basis. Place the class boundaries along the horizontal axis and the corresponding cumulative
frequencies (either less than or more than type) along the vertical axis. Then
join the plotted points by a freehand curve.

Example: The data in the previous example can be presented using either a less than or a more
than cumulative frequency polygon, as given below in (i) and (ii) respectively.

(i) Less than type cumulative frequency polygon

Construct an ogive (less than type) for the frequency distribution of the temperature data of the 50 US states.
Solution
Step 1 Find the cumulative frequency for each class.

Class boundaries   Cumulative frequency
Less than 99.5     0
Less than 104.5    2
Less than 109.5    10
Less than 114.5    28
Less than 119.5    41
Less than 124.5    48
Less than 129.5    49
Less than 134.5    50

Step 2 Draw the x and y axes. Label the x axis with the class boundaries. Use an appropriate
scale for the y axis to represent the cumulative frequencies.
Step 3 Plot the cumulative frequency at each upper class boundary.
Step 4 Starting with the first upper class boundary, 104.5, connect adjacent points with line
segments.
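
Both types of cumulative frequency can be computed directly from the class frequencies. The following Python sketch uses `itertools.accumulate` from the standard library. The function names are hypothetical, and the frequencies are those of the temperature frequency distribution.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies of the temperature distribution.
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]


def less_than_cumulative(freqs):
    """Running total from the first class to the last."""
    return list(accumulate(freqs))


def more_than_cumulative(freqs):
    """Running total from the last class back to the first."""
    return list(accumulate(reversed(freqs)))[::-1]


less_than = less_than_cumulative(frequencies)
more_than = more_than_cumulative(frequencies)

# Points of the less than ogive, starting from 0 at the first lower boundary.
ogive_points = [(99.5, 0)] + list(zip(upper_boundaries, less_than))
for boundary, cf in ogive_points:
    print(f"Less than {boundary}: {cf}")
```

The list `ogive_points` holds the (x, y) pairs that are plotted in Step 3 and joined in Step 4.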

Chapter Two
Summarizing Data

2.1 Measures of Central Tendency

The most important aspect of studying the distribution of sample measurements is the position
of the central value, that is, a representative value about which the measurements are distributed.
It is often convenient to have one figure that is representative of each group. This figure is
known as the average of the group. If the numbers of the group are arranged in order of
magnitude, the averages tend to fall around the central position in the group, so averages are
called measures of central tendency. In short, any measure intended to represent the center of a
data set is called a measure of location or central tendency.

Objectives of Measuring Central Tendency

The most important objectives of measuring central tendency are:

• To determine a single value around which the other data concentrate
• To summarize/reduce the volume of the data
• To facilitate comparison within one group or between groups of data
The Summation Notation (∑)

Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where n (the last
subscript) denotes the number of observations in the data and $x_i$ is the ith observation. Then the
sum
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$

Similarly,
$$x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$$

Some Properties of the Summation Notation

1. $\sum_{i=1}^{n} c = n \cdot c$, where c is a constant number.

2. $\sum_{i=1}^{n} b\,x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number.

3. $\sum_{i=1}^{n} (a + b\,x_i) = n \cdot a + b \sum_{i=1}^{n} x_i$, where a and b are constant numbers.

4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
Important characteristics / desirable properties of a measure of central tendency

We say a measure of central tendency is best if it possesses most of the following properties. It should:

- be simple to understand and easy to calculate/interpret,
- exist and be unique,
- be rigidly defined by a mathematical formula,
- be based on all observations,
- not be seriously affected by extreme observations,
- be capable of further statistical analysis and/or algebraic manipulation.

2.2 Types of Measures of Central Tendency

Several types of averages or measures of central tendency can be defined. The most common are
- the mean
- the mode
- the median
The choice of average (measure of central tendency) depends upon which best represents the
property under discussion.

2.2.1 The Mean

a. The Arithmetic Mean (The Mean)

The arithmetic mean is defined as the sum of the measurements of the items divided by the total number of
items.

i. Arithmetic Mean for Raw Data

For raw data, the sample mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
and the population mean is
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$

Example 1: You measure the body lengths (in inches) of 10 infants at birth and record the
following:
17.5 19.5 17.5 19 20 21 18 19.5 18 10.75
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{17.5 + 19.5 + \cdots + 10.75}{10} = \frac{180.75}{10} = 18.075$$
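
The calculation in Example 1 can be reproduced with a couple of lines of Python. The function name `arithmetic_mean` is hypothetical (the standard library's `statistics.mean` would serve the same purpose).

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


# Body lengths (in inches) of the 10 infants in Example 1.
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]
total = sum(lengths)
mean_length = arithmetic_mean(lengths)
print(total, mean_length)
```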

ii. Arithmetic Mean for Ungrouped Frequency Distribution

When the data are arranged or given in the form of an ungrouped frequency distribution with k distinct values, the
formula for the mean is
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Note that $\sum_{i=1}^{k} f_i = n$.

Example 2: Calculate the mean for the following data.

x_i     f_i   x_i f_i
2       2     4
3       1     3
7       3     21
8       1     8
Total   7     36

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{2 \times 2 + 1 \times 3 + \cdots + 1 \times 8}{2 + 1 + \cdots + 1} = \frac{36}{7} = 5.14$$

iii. Arithmetic Mean for Grouped Frequency Distribution


If data are given in the form of continuous frequency distribution, the sample mean can be
computed as
k

f x  f 2 x 2  ...  f n x n fx i i
x 1 1  i 1

f1  f 2  ...  f n k

f
i 1
i

Where x i = the class mark of the i class; i = 1, 2, …, k


th

th
fi
= the frequency of the i class and k = the number of classes
k
Note that  f i  n = the total number of observations.
i 1

Example: Calculate the average temperature for the 50 US states.

Class limits   Class boundaries   Class mid points (x_i)   Frequency (f_i)   x_i f_i
100 – 104      99.5 – 104.5       102                      2                 204
105 – 109      104.5 – 109.5      107                      8                 856
110 – 114      109.5 – 114.5      112                      18                2016
115 – 119      114.5 – 119.5      117                      13                1521
120 – 124      119.5 – 124.5      122                      7                 854
125 – 129      124.5 – 129.5      127                      1                 127
130 – 134      129.5 – 134.5      132                      1                 132
Total                                                      50                5710

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{f_1 + f_2 + \dots + f_k} = \frac{2 \times 102 + 8 \times 107 + \dots + 1 \times 132}{2 + 8 + \dots + 1} = \frac{5710}{50} = 114.2$$
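
The frequency-weighted formula can be sketched in Python as follows. The helper name `grouped_mean` is hypothetical; the data are the class marks and frequencies from the temperature table and from Example 2.

```python
def grouped_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for f, x in zip(freqs, values)) / sum(freqs)

# Temperature data: class marks and class frequencies.
midpoints = [102, 107, 112, 117, 122, 127, 132]
freqs = [2, 8, 18, 13, 7, 1, 1]
temp_mean = grouped_mean(midpoints, freqs)
print(temp_mean)  # 114.2

# Example 2 (ungrouped frequency distribution) uses the same formula.
ex2_mean = grouped_mean([2, 3, 7, 8], [2, 1, 3, 1])
print(round(ex2_mean, 2))  # 5.14
```

For grouped data the result is only an approximation of the mean of the raw observations, because every observation in a class is represented by the class mark.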

iv. Properties of the Arithmetic Mean

- The sum of the deviations of the items from their arithmetic mean is zero. This means the
algebraic sum of the deviations of a set of numbers $x_1, x_2, \dots, x_n$ from their mean $\bar{x}$ is zero.
That is, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

- The sum of the squares of the deviations of a set of observations from any number, say A, is
the least only when $A = \bar{x}$. That is, $\sum (x_i - \bar{x})^2 \le \sum (x_i - A)^2$

- When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of $n_1$ observations of
group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$ observations
of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by

$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$

Example: Last year there were three sections taking Stat 273 course in Alemaya University. At
the end of the semester, the three sections got average marks of 80, 83 and 76. There were 28, 32
and 35 students in each section respectively. Find the mean mark for all the students.
Solution:

$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} = 79.54$$
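
As a quick check of the combined-mean formula, the section example can be reproduced in Python. The function name `combined_mean` is an illustrative choice, not something defined in the notes.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Three sections with 28, 32 and 35 students and means 80, 83 and 76.
overall = combined_mean([28, 32, 35], [80, 83, 76])
print(round(overall, 2))  # 79.54
```

Note that the simple average of the three section means, (80 + 83 + 76) / 3, would ignore the different section sizes.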

- If a wrong figure has been used in calculating the mean, we can correct it if we know the
correct figure that should have been used. Let

  $x_{wr}$ denote the wrong figure used in calculating the mean,
  $x_c$ be the correct figure that should have been used, and
  $\bar{x}_{wr}$ be the wrong mean calculated using $x_{wr}$; then the correct mean, $\bar{x}_{correct}$, is given by

$$\bar{x}_{correct} = \frac{n \bar{x}_{wr} + x_c - x_{wr}}{n}$$

- If the mean of $x_1, x_2, \dots, x_n$ is $\bar{x}$, then

a) the mean of $x_1 \pm k, x_2 \pm k, \dots, x_n \pm k$ will be $\bar{x} \pm k$

b) the mean of $kx_1, kx_2, \dots, kx_n$ will be $k\bar{x}$.

Example: The average weight of 10 students was calculated to be 65 kg, but later it was
discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected
average weight.

Solution:

$$\bar{x}_{correct} = \frac{n \bar{x}_{wr} + x_c - x_{wr}}{n} = \frac{10(65) + 80 - 40}{10} = 69 \text{ kg}$$
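
The correction rule can be sketched as a small Python function. The name `corrected_mean` and its parameter names are hypothetical; the numbers are those of the weight example (wrong mean 65 kg, 40 kg recorded instead of 80 kg).

```python
def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Remove the wrong figure from the total, add the correct one, re-average."""
    return (n * wrong_mean + correct_value - wrong_value) / n

fixed = corrected_mean(n=10, wrong_mean=65, wrong_value=40, correct_value=80)
print(fixed)  # 69.0
```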
Merits of Arithmetic Mean
- Arithmetic mean is rigidly defined and its value is always definite.
- It is calculated based on all observations.
- Arithmetic mean is simple to calculate and easy to understand. It doesn't need arraying
(arranging in increasing or decreasing order) of the data.
- Arithmetic mean is also capable of further algebraic treatment.
- It affords a good standard of comparison.

Drawbacks of Arithmetic Mean

- It is highly affected by extreme (abnormal) observations in the series. For instance, the
monthly incomes of three boys are 37 birr, 53 birr and 48 birr and that of their father is 1026
birr. The average income becomes 291 birr, which is not at all a representative
figure.
- It can be a number which does not exist in the series.
- It sometimes gives results which appear almost absurd. For example, it is likely that we
can get an average of '3.6 children' per family.
- It gives greater importance to bigger items of a series and lesser importance to smaller items.
That means it is an upward-biased measure.
- It can't be calculated for open-ended classes.

b. Weighted Arithmetic Mean

In finding the arithmetic mean, all items were assumed to be of equal importance. When due
importance is to be given to each item, that is, when proper importance is required to be given to
different data, then we find the weighted average. Weights are assigned to each item in proportion to
its relative importance.

If $x_1, x_2, \dots, x_k$ represent values of the items and $w_1, w_2, \dots, w_k$ are the corresponding weights, then
the weighted mean ($\bar{x}_w$) is given by

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_k x_k}{w_1 + w_2 + \dots + w_k} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

Example: A student's final grades in Mathematics, Physics, Chemistry and Biology are
respectively A, B, B and C. If the respective credits received for these courses are 3, 3, 2 and 2,
determine the approximate average mark (GPA) the student has got.
Solution: We use a weighted arithmetic mean, the weight associated with each course being taken as
the number of credits received for the corresponding course.

x_i    4    3    3    2
w_i    3    3    2    2

Therefore,

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 4) + (3 \times 3) + (2 \times 3) + (2 \times 2)}{3 + 3 + 2 + 2} = \frac{31}{10} = 3.1$$
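
The GPA computation can be reproduced with a short Python sketch. The grade-point mapping A = 4, B = 3, C = 2 is taken from the x_i row of the table above; the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

grade_points = [4, 3, 3, 2]  # A, B, B, C (assumes A = 4, B = 3, C = 2)
credit_hours = [3, 3, 2, 2]  # weights: credits of each course

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.1
```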

c. The Geometric Mean (GM)

The geometric mean (GM) is defined as the nth root of the product of n values. The formula is

$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$

The geometric mean is useful in finding the average of percentages, ratios, indexes, or growth
rates. Example: if a person receives a 20% raise after 1 year of service and a 10% raise after the
second year of service, the average percentage raise per year is not 15% but 14.89%, as shown.

$$GM = \sqrt{1.2 \times 1.1} = 1.1489$$
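
A minimal Python sketch of the geometric mean, applied to the two growth factors of the raise example. The function name is illustrative, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return prod(values) ** (1 / len(values))

# A 20% raise followed by a 10% raise, expressed as growth factors.
gm = geometric_mean([1.20, 1.10])
print(round(gm, 4))              # 1.1489
print(round((gm - 1) * 100, 2))  # 14.89 (average percentage raise per year)
```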

d. The Harmonic Mean (HM)

The harmonic mean is defined as the number of values divided by the sum of the reciprocals of
each value. The formula is

$$HM = \frac{n}{\sum (1 / x_i)}$$

Example: Suppose a person drove 100 miles at 40 miles per hour and returned driving 50 miles
per hour. The average speed is not 45 miles per hour, which is found by adding 40 and
50 and dividing by 2. The average is found as shown.

$$HM = \frac{n}{\sum (1 / x_i)} = \frac{2}{\frac{1}{40} + \frac{1}{50}} = 44.44$$
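
The driving example can be verified in Python, both with the harmonic-mean formula and directly as total distance over total time. The function name is an illustrative choice.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

hm = harmonic_mean([40, 50])
print(round(hm, 2))  # 44.44

# Cross-check: 200 miles in total, 100/40 + 100/50 hours in total.
direct = 200 / (100 / 40 + 100 / 50)
print(round(direct, 2))  # 44.44
```

The cross-check shows why the harmonic mean is the right average here: equal distances, not equal times, were driven at each speed.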

Exercise: A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10
per pound. Find the average cost of 1 pound of nails.

2.2.2 The Mode

The mode or the modal value is the most frequently occurring score/observation in a series and
denoted by x̂ . Note that the mode may not exist in the series or, even if it does exist, it may not be
unique.

For grouped data, the mode is found by the following formula:

$$\hat{x} = LCB_{mod} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) w$$

Where $LCB_{mod}$ = lower class boundary of the modal class
$\Delta_1$ = the difference between the frequency of the modal class and the preceding class
$\Delta_2$ = the difference between the frequency of the modal class and the next class
$w$ = the class width

The modal class is the class with the highest frequency in the distribution.
Example: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70, 75,
73, 80, 70, 83 and 86. Find the mode of the students' marks.
Solution: The mark 70 occurs three times and every other mark occurs once, so the mode is 70.
Example: Calculate the modal temperature reading for the 50 US states.

Class limits   Class boundaries   Class mid points   Frequency   Cum. freq. (less than type)
100 – 104      99.5 – 104.5       102                2           2
105 – 109      104.5 – 109.5      107                8           10
110 – 114      109.5 – 114.5      112                18          28
115 – 119      114.5 – 119.5      117                13          41
120 – 124      119.5 – 124.5      122                7           48
125 – 129      124.5 – 129.5      127                1           49
130 – 134      129.5 – 134.5      132                1           50
Total                                                50

Since the third class has the highest frequency, it is taken as the modal class. Then $LCB_{mod} = 109.5$,
$\Delta_1 = f_{mod} - f_p = 18 - 8 = 10$, $\Delta_2 = f_{mod} - f_s = 18 - 13 = 5$ and $w = 5$, so

$$\hat{x} = LCB_{mod} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) w = 109.5 + \left( \frac{10}{10 + 5} \right) 5 = 112.83$$

Merits of mode
- Mode is not affected by extreme values.
- Mode can be calculated even in the case of open-end intervals. And it is not necessary to know all
observations.

Demerits of mode

- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency
- It may be unrepresentative in many cases.
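
The grouped-data mode formula given earlier in this section can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption made here, not something stated in the notes.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: LCB_mod + d1 / (d1 + d2) * w."""
    # Index of the modal class (first class with the highest frequency).
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    # Assumption: a missing neighbouring class counts as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev
    d2 = freqs[m] - f_next
    return lower_boundaries[m] + d1 / (d1 + d2) * width

# Temperature data: lower class boundaries and class frequencies.
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

mode_temp = grouped_mode(boundaries, freqs, width=5)
print(round(mode_temp, 2))  # 112.83
```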
2.2.3 The Median

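The median is the midpoint of the data array. The median is denoted by $\tilde{x}$. For ungrouped data,
with the observations arranged in order of magnitude, the median is obtained by

$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if the number of items, } n \text{, is odd} \\ \frac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right) & \text{if the number of items, } n \text{, is even} \end{cases}$$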

For grouped data the median, obtained by the interpolation method, is given by

$$\tilde{x} = LCB_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - cf \right)$$

Where $LCB_{med}$ = lower class boundary of the median class
$cf$ = sum of frequencies of all classes lower than the median class (in other words, it is the
cumulative frequency preceding the median class)
$f_{med}$ = frequency of the median class and $w$ = class width

The median class is the class with the smallest cumulative frequency greater than or equal to $n/2$. It
can be located by counting $n/2$ of the frequencies beginning from the lowest class.
Example: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2, 6.4,
10.5, 8.1 and 7.8. Find the median weight of these five babies.
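Solution: Arranged in increasing order, the weights are 6.4, 7.8, 8.1, 9.2 and 10.5. The middle (third) value is 8.1, so the median is 8.1 pounds.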

Example: Calculate the median temperature for the 50 US states.

Class limits   Class boundaries   Class mid points   Frequency   Cum. freq. (less than type)
100 – 104      99.5 – 104.5       102                2           2
105 – 109      104.5 – 109.5      107                8           10
110 – 114      109.5 – 114.5      112                18          28
115 – 119      114.5 – 119.5      117                13          41
120 – 124      119.5 – 124.5      122                7           48
125 – 129      124.5 – 129.5      127                1           49
130 – 134      129.5 – 134.5      132                1           50
Total                                                50

Then $n/2 = 50/2 = 25$.

Since 28 is the first cumulative frequency to be greater than 25, the third class is the median class.

Then $LCB_{med} = 109.5$, $w = 5$, $f_{med} = 18$, $cf = 10$, and

$$\tilde{x} = LCB_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - cf \right) = 109.5 + \frac{5}{18} \left( \frac{50}{2} - 10 \right) = 109.5 + \frac{75}{18} = 113.67$$
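
The interpolation steps above can be sketched in Python. The function name `grouped_median` is hypothetical; it walks through the classes, accumulating frequencies until the median class is reached.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data: LCB_med + (w / f_med) * (n/2 - cf)."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lower_boundaries, freqs):
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lcb + width / f * (n / 2 - cf)
        cf += f
    raise ValueError("frequencies must contain at least one observation")

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

median_temp = grouped_median(boundaries, freqs, width=5)
print(round(median_temp, 2))  # 113.67
```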

Merits of median

- Median is a positional average and hence it is not influenced by extreme values.


- Median is rigidly defined so that its value is always definite.

- Median can be calculated even in case of open-ended intervals.

Demerits of median

- It is not capable of further algebraic treatment.


- It is not a good representative of the data if the number of items (data) is small.
- The arrangement of items in order of magnitude is sometimes a very tedious process if the number
of items is very large.

2.3 Quantiles

Quantiles are values which divide a data set, arranged in order of magnitude, into a certain number of equal
parts. They are measures of position (non-central tendency). Some of these quantiles are
quartiles, deciles and percentiles.

i. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$.
The first quartile is also called the lower quartile and the third quartile is the upper quartile. The
second quartile is the median.

- For ungrouped data:

Let $Q_j$ be the $j^{th}$ quartile value for j = 1, 2, 3. Then

$$Q_j = \left( \frac{j \times n}{4} \right)^{th} \text{ item}; \quad j = 1, 2, 3.$$

- For grouped data:

We can apply the following formula:

$$Q_j = LCB_{Q_j} + \frac{w}{f_{Q_j}} \left( \frac{j \times n}{4} - cf_{Q_j} \right); \quad j = 1, 2, 3.$$

Where $Q_j$ = the $j^{th}$ quartile which is to be worked out
$LCB_{Q_j}$ = lower class boundary of the $j^{th}$ quartile class
$cf_{Q_j}$ = sum of frequencies of all classes lower than the $j^{th}$ quartile class
$f_{Q_j}$ = frequency of the $j^{th}$ quartile class and $w$ = class width

The $j^{th}$ quartile class is the class with the smallest cumulative frequency greater than or equal to $j \times n / 4$.
It can be located by counting $j \times n / 4$ of the frequencies beginning from the lowest class.

ii. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \dots, D_9$. The fifth
decile is the median.

- For ungrouped data:

Let $D_j$ be the $j^{th}$ decile value for j = 1, 2, ..., 9. Then

$$D_j = \left( \frac{j \times n}{10} \right)^{th} \text{ item}; \quad j = 1, 2, \dots, 9$$

- For grouped data:

We can apply the following formula:

$$D_j = LCB_{D_j} + \frac{w}{f_{D_j}} \left( \frac{j \times n}{10} - cf_{D_j} \right); \quad j = 1, 2, \dots, 9$$

The symbols are defined in the same way as in the case of quartiles.

The $j^{th}$ decile class is the class with the smallest cumulative frequency greater than or equal to $j \times n / 10$.
It can be located by counting $j \times n / 10$ of the frequencies beginning from the lowest class.

iii. Percentiles: are values which divide the data into one hundred equal parts, denoted by
$P_1, P_2, \dots, P_{99}$. The fiftieth percentile is the median.

- For ungrouped data:

Let $P_j$ be the $j^{th}$ percentile value for j = 1, 2, 3, ..., 99. Then

$$P_j = \left( \frac{j \times n}{100} \right)^{th} \text{ item}; \quad j = 1, 2, 3, \dots, 99$$

- For grouped data:

We can use the following formula:

$$P_j = LCB_{P_j} + \frac{w}{f_{P_j}} \left( \frac{j \times n}{100} - cf_{P_j} \right); \quad j = 1, 2, 3, \dots, 99$$

The symbols are defined in the same way as in the case of quartiles.

The $j^{th}$ percentile class is the class with the smallest cumulative frequency greater than or equal to
$j \times n / 100$. It can be located by counting $j \times n / 100$ of the frequencies beginning from the lowest class.
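
Because the grouped-data formulas for quartiles, deciles and percentiles differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below is illustrative; the name `grouped_quantile` and the `parts` parameter are assumptions, not notation from the notes.

```python
def grouped_quantile(lower_boundaries, freqs, width, j, parts):
    """j-th quantile of grouped data when the data are cut into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(freqs)
    target = j * n / parts  # position j*n/parts counted from the lowest class
    cf = 0                  # cumulative frequency before the current class
    for lcb, f in zip(lower_boundaries, freqs):
        if cf + f >= target:
            return lcb + width / f * (target - cf)
        cf += f
    raise ValueError("j must not exceed parts")

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

q3 = grouped_quantile(boundaries, freqs, 5, j=3, parts=4)   # third quartile
d4 = grouped_quantile(boundaries, freqs, 5, j=4, parts=10)  # fourth decile
print(round(q3, 2), round(d4, 2))  # 118.15 112.28
```

Calling the function with `j=2, parts=4`, `j=5, parts=10` or `j=50, parts=100` returns the grouped median in each case, since all three use the same position n/2.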

Interpretations

1. $Q_j$ is the value below which $(j \times 25)$ percent of the observations in the series are found (where
j = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the observations in the given
series are found.
2. $D_j$ is the value below which $(j \times 10)$ percent of the observations in the series are found (where
j = 1, 2, ..., 9). For instance, $D_4$ is the value below which 40 percent of the values are found in the
series.
3. $P_j$ is the value below which j percent of the total observations are found (where j = 1, 2, 3, ..., 99).
For example, 73 percent of the observations in a given series are below $P_{73}$.

Exercise: For the temperature data, find
a) $Q_1$ and $P_{25}$ and check that they are equal
b) $P_{50}$ and check that it is equal to $\tilde{x}$.

2.4 Measures of Dispersion (Variation)
Variation (dispersion) is the scatter or spread of observations /values/ in a distribution. The
average or central value is of little use unless the degree of variation, which occurs about it, is
given. If the scatter about the measure of central tendency is very large, the average is not a
typical value. Therefore it is necessary to develop a quantitative measure of the dispersion (or
variation) of the values about the average.

Measures of variation are statistical measures, which provide ways of measuring the extent to
which the data are dispersed or spread out.

Objectives of measuring variation

Measures of variation are needed for the following basic objectives.

- To judge the reliability of a measure of central tendency
- To compare two or more sets of data with regard to their variability
- To control variability itself, as in quality control, body temperature, etc.
- To make further statistical analysis or to facilitate the use of other statistical measures.

Properties of a good measure of dispersion

A good measure of dispersion should:
- be rigidly defined by a mathematical formula,
- be simple to understand and easy to calculate,
- be unique,
- be based on all observations in the series,
- not be affected by some extreme values existing in the series,
- be capable of further algebraic treatment as well as further statistical analysis.

Absolute and Relative Measures of Dispersion

Measures of dispersion (variation) may be either absolute or relative. Absolute measures of
dispersion are expressed in the same unit of measurement in which the original data are given.
These values may be used to compare the variation in two distributions provided that the
variables are in the same units and of the same average.

In case the two sets of data are expressed in different units, however, such as quintals of sugar
versus tons of sugarcane, or if the average sizes are very different, such as manager's salary versus
worker's salary, the absolute measures of dispersion are not comparable. In such cases measures
of relative dispersion should be used.

A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate
measure of central tendency. It is sometimes called a coefficient of dispersion because the word
"coefficient" represents a pure number (that is, independent of any unit of measurement). Note
also that the value of a relative dispersion is a unitless quantity.

Some types of measures of variation are discussed below.

2.4.1 The Range, Variance, Standard Deviation and Coefficient of Variation

i. The Range and Relative Range

Range (R) is defined as the difference between the largest and the smallest observation in a given
set of data. That is, $R = x_{max} - x_{min}$, where $x_{max}$ and $x_{min}$ are the largest and the smallest
observations in the series respectively.

In the case of grouped data, the range is found by taking the difference between the class mark of the last
class and that of the first class. That is, $R = CM_{last} - CM_{first}$, where $CM_{last}$ and $CM_{first}$ are the
class marks of the last class and that of the first class respectively.

A relative range (RR), also known as coefficient of range, is given by

$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data}$$

$$RR = \frac{CM_{last} - CM_{first}}{CM_{last} + CM_{first}} = \frac{R}{CM_{last} + CM_{first}} \quad \text{for grouped data}$$

Properties of Range and Relative Range

- Range and relative range are easy to calculate and simple to understand.
- Both cannot be computed for grouped data with open ended classes.
- They do not tell us anything about the distribution of values in the series.

Example: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.

462 480 534 624 498 552 606 588 516 570

Solution:
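$x_{max} = 624$ birr, $x_{min} = 462$ birr

$R = x_{max} - x_{min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr}$

$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 \text{ birr} - 462 \text{ birr}}{624 \text{ birr} + 462 \text{ birr}} = \frac{162 \text{ birr}}{1086 \text{ birr}} = 0.149$$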

Example: Find the values of the range and relative range for the following frequency
distribution, which shows the distribution of the maximum loads supported by a certain number
of cables.

Maximum load (in kilo-Newton)   Number of cables
93 – 97                         2
98 – 102                        5
103 – 107                       12
108 – 112                       17
113 – 117                       14
118 – 122                       6
123 – 127                       3
128 – 132                       1

Solution:

$CM_{first} = 95$ kN, $CM_{last} = 130$ kN

$R = CM_{last} - CM_{first} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$

$$RR = \frac{CM_{last} - CM_{first}}{CM_{last} + CM_{first}} = \frac{130 \text{ kN} - 95 \text{ kN}}{130 \text{ kN} + 95 \text{ kN}} = \frac{35 \text{ kN}}{225 \text{ kN}} = 0.156$$

ii. The Variance

Variance is the arithmetic mean of the squares of the deviations of observations from their
arithmetic mean.

- Population Variance ($\sigma^2$)

For raw data

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \dots = \frac{1}{N} \left[ \sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N} \right]$$

Where $\mu$ is the population arithmetic mean and N is the total number of observations in the
population.

For ungrouped FD

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} = \dots = \frac{1}{N} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{N} \right]$$

Where $\mu$ is the population arithmetic mean, $x_i$ is the value/reading of the $i^{th}$ class, k is the number of
classes, $f_i$ is the frequency of the $i^{th}$ class and $N = \sum f_i$.

For grouped data

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} = \dots = \frac{1}{N} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{N} \right]$$

Where $\mu$ is the population arithmetic mean, $x_i$ is the class mark of the $i^{th}$ class, k is the number of
classes, $f_i$ is the frequency of the $i^{th}$ class and $N = \sum f_i$.

- Sample Variance ($S^2$)

For raw data

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \dots = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$$

Where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.

For ungrouped FD

$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} = \dots = \frac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right]$$

Where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the data value of the $i^{th}$ class, k is the number of
classes, $f_i$ is the frequency of the $i^{th}$ class and $n = \sum f_i$.

For grouped data

$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} = \dots = \frac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right]$$

Where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the $i^{th}$ class, k is the number of
classes, $f_i$ is the frequency of the $i^{th}$ class and $n = \sum f_i$.

iii. The Standard Deviation

Standard deviation is the positive square root of the variance.

- Population Standard Deviation ($\sigma$)

$\sigma = \sqrt{\sigma^2}$ where $\sigma^2$ is the population variance.

- Sample Standard Deviation ($S$)

$S = \sqrt{S^2}$ where $S^2$ is the sample variance.

Example: Find the variance and standard deviation of the following sample data.

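x_i                  5     10    12    17    Total
(x_i - x̄)^2          36    1     1     36    74

Here $\bar{x} = (5 + 10 + 12 + 17)/4 = 11$, so

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{74}{3} = 24.67$$

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{74}{3}} = 4.97$$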
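
The sample variance and standard deviation of this small data set can be checked in Python, both from the definition and with the standard library's `statistics` module (whose `variance` and `stdev` functions use the n - 1 divisor). Variable names are illustrative.

```python
from statistics import stdev, variance

data = [5, 10, 12, 17]
n = len(data)
x_bar = sum(data) / n  # 11.0

# Definition: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
s = s2 ** 0.5

print(round(s2, 2), round(s, 2))                          # 24.67 4.97
print(round(variance(data), 2), round(stdev(data), 2))    # 24.67 4.97
```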

Example: Calculate the variance and the standard deviation of the temperature data.
(Treat the data as a sample.)

Class boundaries   Class mid points (x_i)   Frequency (f_i)   f_i x_i   f_i x_i^2
99.5 – 104.5       102                      2                 204       20808
104.5 – 109.5      107                      8                 856       91592
109.5 – 114.5      112                      18                2016      225792
114.5 – 119.5      117                      13                1521      177957
119.5 – 124.5      122                      7                 854       104188
124.5 – 129.5      127                      1                 127       16129
129.5 – 134.5      132                      1                 132       17424
Total                                       50                5710      653890

$$S^2 = \frac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right] = \frac{1}{49} \left[ 653890 - \frac{(5710)^2}{50} \right] = \frac{1}{49} (653890 - 652082) = 36.9$$

$$S = \sqrt{S^2} = \sqrt{36.9} = 6.07$$
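
The computational (shortcut) form of the grouped sample variance can be sketched in Python. The function name `grouped_sample_variance` is hypothetical; the inputs are the class marks and frequencies of the temperature table.

```python
def grouped_sample_variance(values, freqs):
    """S^2 = [sum(f*x^2) - (sum(f*x))^2 / n] / (n - 1) for a frequency distribution."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, values))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, values))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

midpoints = [102, 107, 112, 117, 122, 127, 132]
freqs = [2, 8, 18, 13, 7, 1, 1]

s2_temp = grouped_sample_variance(midpoints, freqs)
s_temp = s2_temp ** 0.5
print(round(s2_temp, 1), round(s_temp, 2))  # 36.9 6.07
```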

iv. Coefficient of Variation


The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).

Coefficient of variation is used in problems where we want to compare the variability of
two or more different series. Coefficient of variation is the ratio of the standard
deviation to the arithmetic mean, usually expressed in percent.

$$CV = \frac{S}{\bar{x}} \times 100$$

where S is the standard deviation of the observations and $\bar{x}$ is their arithmetic mean.
A distribution having a smaller coefficient of variation is said to be less variable, or more consistent,
more uniform or more homogeneous.

Example: Last semester, students of Biology and Chemistry Departments took Stat 273 course.
At the end of the semester, the following information was recorded.
Department            Biology   Chemistry
Mean score            79        64
Standard deviation    23        11

Compare the relative dispersions of the two departments' scores using the appropriate way.

Solution:

Biology Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 = 29.11\%$

Chemistry Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 = 17.19\%$

Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students' scores compared with that of Chemistry students.

Properties of the Variance and the Standard Deviation


Variance

– Its unit is the square of the unit of measurement of values. For example, if the variable is
measured in kg, the unit of variance is kg².
– It is calculated based on all the observations/data in the series.
– It gives more weight to extreme values and less to those which are near to the mean.

Standard Deviation

– It is considered to be the best measure of dispersion.
– [Demerit] If the values of two series have different units of measurement, then we cannot
compare their variability just by comparing the values of their respective standard deviations.
– It is calculated based on all the observations/data in the series. Standard deviation is capable of
further algebraic treatment.
– Standard deviation is neither easy to calculate nor easy to understand.
– Similar to the variance, standard deviation gives more weight to extreme values and less to
those which are near to the mean.

2.5 The Standard Scores (Z-Scores)

A standard score is a measure that describes the relative position of a single score in the entire
distribution of scores in terms of the mean and standard deviation. It also gives us the number of
standard deviations a particular observation lies above or below the mean.

Population standard score: $Z = \frac{x - \mu}{\sigma}$, where x is the value of the observation, and $\mu$ and $\sigma$ are the
mean and standard deviation of the population respectively.

Sample standard score: $Z = \frac{x - \bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean
and standard deviation of the sample respectively.

Example: Two sections were given an exam in a course. The average score was 72 with standard
deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?

Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$ and the score of student A from Section 1 is $x_A = 84$
Section 2: $\bar{x}_2 = 85$, $S_2 = 5$ and the score of student B from Section 2 is $x_B = 90$

Z-score of student A: $Z = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$

Z-score of student B: $Z = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$

From these two standard scores, we can conclude that student A has performed better relative to
his/her section because his/her score is two standard deviations above the mean score of
section 1, while the score of student B is only one standard deviation above the mean score of
section 2 students.
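
The comparison of the two students can be reproduced with a short Python sketch. The function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(84, mean=72, sd=6)  # student A, section 1
z_b = z_score(90, mean=85, sd=5)  # student B, section 2
print(z_a, z_b)  # 2.0 1.0

# The larger standard score indicates the better relative performance.
better = "A" if z_a > z_b else "B"
print(better)  # A
```

Although student B has the higher raw score (90 against 84), the standard scores put each mark on a common scale relative to its own section.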

Chapter 3
Elementary Probability
Introduction

• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is the measure of how
likely an outcome is to occur.

Generally, probability can be divided into two types:

i) Subjective probability: the probability of an event in a certain experiment occurring,
based on an individual's belief or attitude.
ii) Objective probability: the probability of an event in a certain experiment based on
experimental evidence.
3.1 Deterministic and Non deterministic models

Deterministic model      Stochastic (probabilistic) model
- Certain                - Uncertain
- Mathematical           - Non-mathematical (e.g. econometric model)

3.2 Review of set theory: sets, union, intersection complementation, De Morgan’s rules-
(Reading assignment)

3.3. Random experiments, sample space and events

Random experiment: a process of measurement or observation which can be repeated any
number of times under similar conditions, whose possible outcomes can be enumerated, and whose
outcome can't be predicted with certainty.
e.g. tossing a coin
Outcome: a particular result of an experiment (the result of a single trial of an experiment).
Sample space: the set of all possible outcomes of a random experiment. Each possible
outcome is called a sample point.
Event: a subset of a sample space (one or more outcomes of an experiment).
Example 1: If we toss a coin, the sample space (S) of this experiment is
S = {head, tail}, where head and tail are the two faces of a coin. If we are interested in the outcome
that head turns up, then the event is E = {head}.
Example 2: Find the sample space of tossing a coin twice.
S = {HH, HT, TH, TT}
Elementary or simple event: an event having only one sample point.
Mutually exclusive events: two events E1 and E2 are said to be mutually exclusive if there is
no sample point which is common to E1 and E2,
i.e. E1 ∩ E2 = ∅
Independent events: two events E1 and E2 are said to be independent if the occurrence of E1
does not affect the occurrence of E2.
Dependent events: two events are dependent if the first event affects the outcome or
occurrence of the second event in such a way that its probability is changed.
Definition of probability

Probability: the chance (likelihood) of occurrence of an event. It is expressed by a numerical
value between 0 and 1 inclusive.

3.4 Finite sample spaces and equally likely outcomes

Finite sample space: a sample space in which the number of outcomes of the experiment is finite.

Equally likely outcomes: outcomes are equally likely if each outcome in the sample space has the same
chance of occurring.

Example: In throwing a fair die all possible outcomes are equally likely. That means all the
elements of the sample space have the same chance of occurring.

3.5 Counting techniques:

In order to calculate probabilities, we have to know

• The number of elements of an event
• The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule
1. Addition rule: a basic counting principle used when there are n1 ways of doing
one thing and n2 ways of doing another thing, and we cannot do both at the same time; then
there are n1 + n2 possible ways in total.

If k actions can be done in n1, n2, …, nk ways respectively, and no two actions can be done
at the same time, then the total number of ways of performing one of the actions is n1 + n2 + … + nk.

2. Multiplication rule: in a sequence of k events in which the first event has n1 possibilities,
the second event has n2 possibilities, …, and the kth event has nk possibilities, the total number of
possibilities of the sequence is n1 × n2 × … × nk.
Example: The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification card. How many
different cards are possible if
a. Repetitions are permitted.
b. Repetitions are not permitted.

Solutions
a.
1st digit   2nd digit   3rd digit   4th digit
5           5           5           5

There are four steps

1. Selecting the 1st digit, this can be made in 5 ways.
2. Selecting the 2nd digit, this can be made in 5 ways.
3. Selecting the 3rd digit, this can be made in 5 ways.
4. Selecting the 4th digit, this can be made in 5 ways.
Therefore, 5*5*5*5 = 625 different cards can be prepared.
b.
1st digit   2nd digit   3rd digit   4th digit
5           4           3           2

There are four steps

1. Selecting the 1st digit, this can be made in 5 ways.
2. Selecting the 2nd digit, this can be made in 4 ways.
3. Selecting the 3rd digit, this can be made in 3 ways.
4. Selecting the 4th digit, this can be made in 2 ways.
Therefore, 5*4*3*2 = 120 different cards can be prepared.
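
Both counts can be confirmed by brute-force enumeration in Python. This is only a sketch for checking the multiplication rule on a small case; it lists every card with `itertools` instead of multiplying.

```python
from itertools import permutations, product

digits = "01234"

# a. Repetitions permitted: every 4-tuple of digits is a valid card.
with_repetition = sum(1 for _ in product(digits, repeat=4))

# b. Repetitions not permitted: ordered selections of 4 distinct digits.
without_repetition = sum(1 for _ in permutations(digits, 4))

print(with_repetition, without_repetition)  # 625 120
```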

3. Permutation: is an arrangement of n objects in a specific order. In this case order is crucial.

a) The number of permutations of n objects taken all together is n!
i.e. n! / (n-n)! = n!
b) The number of arrangements of n distinct objects in a specific order using r objects at a time is given by
nPr = n!/(n-r)! = n(n-1)(n-2)…(n-r+1)(n-r)!/(n-r)! = n(n-1)(n-2)…(n-r+1)

c) The number of permutations of n objects in which n1 are alike, n2 are alike, …, nk are alike is

n! / (n1! * n2! * … * nk!)

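Example: How many different permutations can be made from the letters in the word
"CORRECTION"?
Solution: The word has n = 10 letters, in which C, O and R each appear twice and E, T, I and N each appear once, so

n! / (n1! * n2! * … * nk!) = 10! / (2! * 2! * 2! * 1! * 1! * 1! * 1!) = 453600 permutations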
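
The rule for permutations with repeated objects can be sketched in Python by counting how often each letter occurs. The function name `distinct_arrangements` is hypothetical.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """n! / (n1! * n2! * ... * nk!), where n_i counts each repeated letter."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # divide out arrangements of identical letters
    return total

print(distinct_arrangements("CORRECTION"))  # 453600
```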

Note: 0! = 1! = 1
Example: A photographer wants to arrange 3 persons in a row for a photograph. How many
different types of photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3*2*1 = 6, or 3P3 = 3!/(3-3)! = 3!/0! = 6, there are 6 possible arrangements: ALY,
AYL, LAY, LYA, YLA and YAL.
Example 2: Fifteen athletes including Haile were entered in a race.
a) In how many different ways could prizes for the first, the second and the third place be
awarded?
b) How many of the triplets counted above have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: 15P3 = 15! / (15-3)! = 2730
b) With Haile fixed in first place, the other two places are filled from the remaining 14 athletes: 14P2 = 14! / (14-2)! = 182
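
These permutation counts can be checked with Python's standard library. `math.perm(n, r)` (Python 3.8 or later) returns n!/(n-r)!, and `itertools.permutations` lists the arrangements explicitly.

```python
from itertools import permutations
from math import perm

# Photograph example: all orderings of Aster (A), Lemma (L) and Yared (Y).
photos = ["".join(p) for p in permutations("ALY")]
print(len(photos), photos)  # 6 arrangements

# Race example: prizes for first, second and third place among 15 athletes.
all_triplets = perm(15, 3)
# Haile fixed in first place: fill the other two places from 14 athletes.
haile_first = perm(14, 2)
print(all_triplets, haile_first)  # 2730 182
```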

4. Combination: a counting technique in which the order of the objects is immaterial. It is the selection
of r objects from a collection of n objects, where r <= n, without regard to order. The number of combinations
of n objects taking r objects at a time is given by

nCr = n! / ((n-r)! r!)
Example: In a club containing 7 members a committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / ((7-3)! 3!) = 35
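
The committee count can be checked in Python. `math.comb(n, r)` (Python 3.8 or later) returns n!/((n-r)! r!); the member labels 1 to 7 used for the enumeration are placeholders.

```python
from itertools import combinations
from math import comb

committees = comb(7, 3)
print(committees)  # 35

# Cross-check by listing every unordered selection of 3 members out of 7.
members = range(1, 8)  # placeholder labels for the 7 club members
listed = sum(1 for _ in combinations(members, 3))
print(listed)  # 35
```

Compare this with the ordered count 7P3 = 210: each committee of 3 corresponds to 3! = 6 orderings, and 210 / 6 = 35.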

3.6 Definitions of probability / approaches of probability

i. Classical approach: uses the sample space to determine the numerical probability that an event
will happen. If there are n equally likely outcomes of an experiment, and event A occurs in f of
those n outcomes, then the probability of the event A, denoted by P(A), is defined as
P(A) = n(A) / n(S) = f / n

Deficiencies of classical approach

The classical approach cannot be used:
- If the total number of outcomes is infinite or if it is not possible to enumerate all elements of the
sample space.
- If the outcomes are not equally likely.
Example: A fair die is tossed once. What is the probability of getting
a) Number 4?
b) An odd number?
c) An even number?
Solutions:
First identify the sample space, say S = {1, 2, 3, 4, 5, 6}
n(S) = 6
a) A = {4}, n(A) = 1, P(A) = n(A)/n(S) = 1/6
b) A = {1, 3, 5}, n(A) = 3, P(A) = n(A)/n(S) = 3/6 = 0.5
c) A = {2, 4, 6}, n(A) = 3, P(A) = n(A)/n(S) = 3/6 = 0.5
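
The classical definition P(A) = n(A)/n(S) translates directly into Python sets. In this sketch the function name `classical_probability` is hypothetical, and `Fraction` is used so that the answers stay exact.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = n(A) / n(S), valid only when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

S = {1, 2, 3, 4, 5, 6}  # sample space of one toss of a fair die

p_four = classical_probability({4}, S)
p_odd = classical_probability({1, 3, 5}, S)
p_even = classical_probability({2, 4, 6}, S)
print(p_four, p_odd, p_even)  # 1/6 1/2 1/2
```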

ii. Relative frequency approach (empirical approach): suppose we repeat a certain
experiment n times, let A be an event of the experiment, and let f be the number of times that
event A occurs. Then the ratio f/n is called the relative frequency of event A, and

$$P(A) = \frac{\text{number of times event } A \text{ has occurred}}{\text{total number of observations}} = \frac{f}{n}$$

In other words, given a frequency distribution, the probability of an event A being in a given
class is

$$P(A) = \frac{\text{frequency of the class}}{\text{total frequency in the distribution}}$$

Example: The National Center for Health Statistics reported that of every 539 deaths in recent
years, 24 resulted from automobile accidents, 182 from cancer, and 333 from other diseases.
What is the probability that a particular death is due to an automobile accident?

Solution: P(automobile) = number of deaths due to automobile accidents / total deaths = f/n = 24/539 ≈ 0.045


iii. The Axiomatic Approach
Let E be a random experiment and S be a sample space associated with E. With each event A we associate a
real number P(A), called the probability of A, which satisfies the following properties, called the axioms of
probability or postulates of probability.

1. P(A) ≥ 0
2. P(S) = 1, where S is the sure event.
3. If A and B are mutually exclusive events, then the probability that either A or B occurs equals the sum of
the two probabilities: P(A ∪ B) = P(A) + P(B)

The following properties follow from these axioms:

4. P(A') = 1 - P(A)
5. 0 ≤ P(A) ≤ 1
6. P(∅) = 0, where ∅ is the impossible event.

3.7 Derived theorems of probability


Theorem 1: Let A be an event and A' be the complement of A with respect to a given sample
space of an experiment. Then P(A') = 1 - P(A).
Proof: Let S be the sample space. Then S = A ∪ A', and A and A' are mutually exclusive (A ∩ A' = ∅), so
P(S) = P(A ∪ A') = P(A') + P(A) and P(S) = 1.
Hence 1 = P(A') + P(A) => P(A') = 1 - P(A).
Theorem 2: Let A and B be events of a sample space S. Then P(A' ∩ B) = P(B) - P(A ∩ B).
Proof: B = S ∩ B = (A ∪ A') ∩ B = (A ∩ B) ∪ (A' ∩ B), where A ∩ B and A' ∩ B are mutually exclusive.
Case 1: if A ∩ B ≠ ∅, then P(B) = P(A ∩ B) + P(A' ∩ B), so
P(A' ∩ B) = P(B) - P(A ∩ B).
Case 2: if A ∩ B = ∅, then P(B) = P(A ∩ B) + P(A' ∩ B) and P(A ∩ B) = P(∅) = 0,
so P(B) = P(A' ∩ B).
Theorem 3: Suppose A and B are two events of a sample space. Then
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Example: A fair die is thrown twice. Calculate the probability that the sum of the spots on the faces
that turn up is divisible by 2 or 3.
Solution:
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6*6 = 36 elements. Let A be the event that the sum of the spots is
divisible by 2 and B be the event that the sum of the spots is divisible by 3. Then
n(A) = 18, n(B) = 12 and n(A ∩ B) = 6 (sums divisible by 6), so
P(A or B) = P(A ∪ B)
= P(A) + P(B) - P(A ∩ B)
= 18/36 + 12/36 - 6/36 = 24/36 = 2/3
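The counts used in this example can be reproduced by brute-force enumeration. The following is a minimal sketch; the helper `prob` is a hypothetical name introduced here, and exact fractions are used so that the addition rule can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of throwing a fair die twice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

p_a = prob(lambda o: sum(o) % 2 == 0)    # sum divisible by 2
p_b = prob(lambda o: sum(o) % 3 == 0)    # sum divisible by 3
p_ab = prob(lambda o: sum(o) % 6 == 0)   # divisible by both 2 and 3
p_union = prob(lambda o: sum(o) % 2 == 0 or sum(o) % 3 == 0)

print(p_a, p_b, p_ab, p_union)  # 1/2 1/3 1/6 2/3
```

Counting the union directly gives the same value as P(A) + P(B) - P(A ∩ B), which is what Theorem 3 states.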

Chapter 4: Conditional probability and independence

4.1 Conditional probability:


The conditional probability of an event A in relation to B is defined as the probability that event
A occurs given that event B has already occurred:
P(A/B) = P(A ∩ B)/P(B), where P(B) > 0
Remark: (i) P(A/B) depends on both P(A ∩ B) and P(B)
(ii) P(S/B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
(iii) P(B/S) = P(B), because P(B/S) = P(B ∩ S)/P(S) = P(B)/1 = P(B)
(iv) if A and B are independent events, then P(A/B) = P(A) and P(B/A) = P(B)
o Two events A and B are independent if the occurrence of B doesn't affect the occurrence of
A, i.e. P(A ∩ B) = P(A) * P(B). In that case
P(A/B) = P(A ∩ B)/P(B) = P(A) * P(B)/P(B) = P(A).
Example: Suppose that an office has 100 calculating machines. Some of them use electric power
(E) while others are manual (M), and some machines are new (N) while others are used (U). The
table below gives the number of machines in each category. A person enters the office, picks a
machine at random and discovers that it is new. What is the probability that it operates with
electric power?

            E     M     Total
N           40    30    70
U           20    10    30
Total       60    40    100

Solution: P(E/N) = P(E ∩ N)/P(N) = (40/100)/(70/100) = 4/7
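The same conditional probability can be read off the table programmatically. This is only a sketch: the dictionary layout and key names are assumptions made for illustration, while the counts come from the table above.

```python
from fractions import Fraction

# Machine counts keyed by (condition, power source), taken from the table.
counts = {("N", "E"): 40, ("N", "M"): 30, ("U", "E"): 20, ("U", "M"): 10}
total = sum(counts.values())

p_e_and_n = Fraction(counts[("N", "E")], total)                  # P(E ∩ N)
p_n = Fraction(counts[("N", "E")] + counts[("N", "M")], total)   # P(N)
p_e_given_n = p_e_and_n / p_n                                    # P(E/N)

print(p_e_given_n)  # 4/7
```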

4.2 Multiplication Theorem, Bayes' Theorem, and Total Probability
i. Multiplication Theorem

If A and B are any two events of a sample space such that P(A) ≠ 0 and P(B) ≠ 0, then
P(A ∩ B) = P(A)P(B/A) = P(B)P(A/B)

ii. Bayes' theorem

Theorem 1: Let {E1, E2, ..., En} be a partition of the sample space S, and suppose each of E1, E2, ..., En has
non-zero probability, that is P(Ei) ≠ 0 for i = 1, 2, ..., n. Let E be any event. Then

$$P(E) = P(E_1)P(E/E_1) + P(E_2)P(E/E_2) + \cdots + P(E_n)P(E/E_n) = \sum_{i=1}^{n} P(E_i)\,P(E/E_i)$$

Theorem 2: Let {E1, E2, ..., En} be a partition of the sample space S, and suppose each of E1, E2, ..., En
has non-zero probability, that is P(Ei) ≠ 0 for i = 1, 2, ..., n. Let E be any event with P(E) > 0.
Then for each integer k, 1 ≤ k ≤ n, we have

$$P(E_k/E) = \frac{P(E_k \cap E)}{P(E)} = \frac{P(E_k \cap E)}{\sum_{i=1}^{n} P(E_i \cap E)} = \frac{P(E_k)\,P(E/E_k)}{\sum_{i=1}^{n} P(E_i)\,P(E/E_i)}$$

Example: Suppose that three machines A1, A2 and A3 produce 60%, 30%, and 10%
respectively of the total production, and that the percentages of defective output of these machines are 2%, 4%, and 6% respectively.
i. If an item is selected at random, find the probability that the item is defective.
ii. Assuming that an item selected at random is found to be defective, find the probability that the item
was produced on machine A1.
Solution: i. Let B be the event of selecting a defective item at random and let E1, E2, E3 be the events that the item
was produced on machines A1, A2, A3 respectively. Then
P(E1) = 0.6, P(E2) = 0.3, P(E3) = 0.1,
P(B/E1) = 2% = 0.02, P(B/E2) = 4% = 0.04 and P(B/E3) = 6% = 0.06
P(B) = P(B ∩ [E1 ∪ E2 ∪ E3])
= P([B ∩ E1] ∪ [B ∩ E2] ∪ [B ∩ E3])
= P(B ∩ E1) + P(B ∩ E2) + P(B ∩ E3)
= P(E1)*P(B/E1) + P(E2)*P(B/E2) + P(E3)*P(B/E3)
= 0.6*0.02 + 0.3*0.04 + 0.1*0.06
= 0.03

ii. We use Bayes' formula:

$$P(E_1/B) = \frac{P(E_1 \cap B)}{P(B)} = \frac{P(E_1)\,P(B/E_1)}{\sum_{i=1}^{3} P(E_i)\,P(B/E_i)} = \frac{0.6 \times 0.02}{0.03} = 0.4$$
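The machine example combines the total probability theorem with Bayes' formula, and both steps can be sketched in a few lines. The dictionary names below are illustrative; the shares and defect rates are those given in the example.

```python
# Share of total production per machine, P(Ei), and defect rate per machine, P(B/Ei).
priors = {"A1": 0.60, "A2": 0.30, "A3": 0.10}
defect_rates = {"A1": 0.02, "A2": 0.04, "A3": 0.06}

# Total probability: P(B) = sum of P(Ei) * P(B/Ei).
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' formula: P(Ei/B) = P(Ei) * P(B/Ei) / P(B).
posteriors = {m: priors[m] * defect_rates[m] / p_defective for m in priors}

print(round(p_defective, 4))       # 0.03
print(round(posteriors["A1"], 4))  # 0.4
```

The posteriors for A2 and A3 come out of the same dictionary, and the three posterior values sum to 1.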

iii. Total Probability Theorem:


Let {E1, E2, ..., En} be a partition of the sample space S. Then for any event A of the same
probability space

$$P(A) = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(A/E_i)\,P(E_i)$$

Probability of Independent Events

Two events A and B are independent if and only if P(A ∩ B) = P(A) * P(B); in other words,

P(A/B) = P(A), P(B/A) = P(B).

Chapter Five

One-dimensional Random Variables

5.1 Definition of Random Variable

Definition: A random variable is a variable whose values are determined by chance. It is a
numerical description of the outcomes of the experiment, or a numerically valued function defined
on the sample space, usually denoted by capital letters.
If X is a random variable, then it is a function from the elements of the sample space to the set of
real numbers, i.e. X is a function X: S→R.

Example: Flip a coin three times and let X be the number of heads in the three tosses.
S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
With the variable of interest, X, being the number of heads observed, the relevant events are
X(HHH) = 3
X(HHT) = X(HTH) = X(THH) = 2
X(HTT) = X(THT) = X(TTH) = 1
X(TTT) = 0
so the possible values of X are {0, 1, 2, 3}.
The relevant question is to find the probability of each of these events.
Note that X takes integer values even though the sample space consists of H's and T's. The
variable X transforms the problem of calculating probabilities from that of set theory to calculus.
Definition. A random variable (r.v.) is a rule that assigns a numerical value to each possible
outcome of a random experiment.
Interpretation:
- Random: the value of the r.v. is unknown until the outcome is observed.
- Variable: it takes a numerical value.
Notation: We use X, Y, etc. to represent r.v.s.
Random variables are of two types: discrete and continuous.
5.2 Discrete random variable: a variable which can assume only a specific number of values
which are clearly separated and can be counted.
Example:
- The number of heads when a coin is tossed n times.
- Number of children in a family.
- Number of car accidents per week.
- Number of defective items in a given company.
Definition: A probability distribution is a complete list of all possible values of a random
variable and their corresponding probabilities.
Discrete probability distribution: is a distribution whose random variable is discrete.
Example: Consider the possible outcomes of the experiment of tossing three coins together.
Sample space: S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let the r.v. X be the number of heads that turn up when the three coins are tossed, so X takes the values 0, 1, 2, 3.
P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 3) = P(HHH) = 1/8

X = x       0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8
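The distribution table above can be rebuilt by listing the sample space and counting heads in each outcome. This is a minimal sketch, with `pmf` as an illustrative name for the resulting table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# X maps each outcome to its number of heads; tally how often each value occurs.
head_counts = Counter(o.count("H") for o in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(head_counts.items())}

for x, p in pmf.items():
    print(x, p)
```

The probabilities are non-negative and sum to 1, as a discrete probability distribution requires.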

If X is a discrete r.v.:
1. P(x) ≥ 0
2. $\sum_{x} P(x) = 1$
Note: If X is a discrete r.v. taking integer values and a < b are integers, then

$$P(a < X < b) = \sum_{x=a+1}^{b-1} P(x)$$

$$P(a \le X < b) = \sum_{x=a}^{b-1} P(x)$$

$$P(a < X \le b) = \sum_{x=a+1}^{b} P(x)$$

$$P(a \le X \le b) = \sum_{x=a}^{b} P(x)$$

5.3 Continuous random variable: a variable that can assume any value in an interval.
Example:
- Height of students at a certain college.
- Marks of students.
Remarks.
(i) In data analysis we described a set of data (sample) by dividing it into classes and calculating
relative frequencies.
(ii) In Probability we described a random experiment (population) in terms of events and
probabilities of events.
(iii) Here, we describe a random experiment (population) by using random variables, and
probability distribution functions.

Probability density function (continuous probability distribution): a probability
distribution whose random variable is continuous. The probability of a single value is zero, and the
probability of an interval is the area bounded by the curve of the probability density function and the
interval on the x-axis. Let a and b be any two values with a < b. The probability that X assumes a value that
lies between a and b is equal to the area under the curve between a and b,
i.e. P(a ≤ X ≤ b) = area under the curve between a and b.
Note: P(X = a) = 0 for any point a.

Fig. Probability density function of X; the shaded region between a and b has area

$$P(a \le X \le b) = P(a < X < b) = \int_{a}^{b} f(x)\,dx$$

If f(x) is a probability density function:

1) f(x) ≥ 0

2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Note: 1. If X is a continuous rv, then $P(a < X < b) = \int_{a}^{b} f(x)\,dx$.

2. The probability of a fixed value of a continuous rv is zero, implying

P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)

5.4 Cumulative distribution function and its properties

The cumulative distribution function F(x) of a discrete random variable X with probability
distribution f(x) is

$$F(x) = P(X \le x) = \sum_{t \le x} f(t), \quad -\infty < x < \infty$$

The cumulative distribution function F(x) of a continuous random variable X with density
function f(x) is

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$

6. Functions of Random Variables

6.1 Equivalent events: events having same probability of occurring.

6.2 Functions of discrete random variables and their distributions

Frequently in statistics, one encounters the need to derive the probability distribution of a
function of one or more random variables. For example, suppose that X is a discrete random
variable with probability distribution f(x), and suppose further that Y = u(X) defines a one-to-one
transformation between the values of X and Y . We wish to find the probability distribution of Y.
It is important to note that the one-to-one transformation implies that each value x is related to
one, and only one, value y = u(x) and that each value y is related to one, and only one, value x =
w(y), where w(y) is obtained by solving y = u(x) for x in terms of y.

It is clear that the random variable Y assumes the value y when X assumes the value w(y).
Consequently, the probability distribution of Y is given by

g(y) = P(Y = y) = P[X = w(y)] = f[w(y)].

Theorem: Suppose that X is a discrete random variable with probability distribution f(x). Let Y
= u(X) define a one-to-one transformation between the values of X and Y so that the equation y
= u(x) can be uniquely solved for x in terms of y, say x = w(y). Then the probability distribution
of Y is g(y) = f[w(y)].

Example: Let X be a geometric random variable with pmf

$$f(x) = \frac{3}{4}\left(\frac{1}{4}\right)^{x-1}, \quad x = 1, 2, 3, \ldots$$

Find the probability distribution of Y = X^2.

Solution: Since the values of X are all positive, the transformation defines a one-to-one
correspondence between the x and y values, y = x^2 and x = √y. Hence

$$g(y) = \begin{cases} f(\sqrt{y}) = \dfrac{3}{4}\left(\dfrac{1}{4}\right)^{\sqrt{y}-1}, & y = 1, 4, 9, \ldots \\ 0, & \text{otherwise} \end{cases}$$

6.3 Functions of continuous random variables and their distributions

Suppose that X is a continuous random variable with probability distribution f(x). Let Y = u(X)
define a one-to-one correspondence between the values of X and Y so that the equation y = u(x)
can be uniquely solved for x in terms of y, say x = w(y).

Then the probability distribution of Y is g(y) = f[w(y)]|J|, where J = w'(y) is called the
Jacobian of the transformation.

Example: Let X be a continuous rv with pdf

$$f(x) = \begin{cases} \dfrac{x}{12}, & 1 < x < 5, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the pdf of the rv Y = 2X - 3.

Solution: The inverse solution of y = 2x - 3 yields x = (y + 3)/2, from which we obtain
J = w'(y) = dx/dy = 1/2. Therefore, using the theorem above, we find the density function of
Y to be

$$g(y) = \begin{cases} \dfrac{(y+3)/2}{12} \cdot \dfrac{1}{2} = \dfrac{y+3}{48}, & -1 < y < 7, \\ 0, & \text{otherwise.} \end{cases}$$
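A quick numerical sanity check of the transformed density is sketched below. It assumes a simple midpoint-rule integrator (the helper `integrate` is hypothetical, written here only for this check) and compares the probability of one event computed from f with the probability of the equivalent event computed from g.

```python
def f(x):
    """Density of X: x/12 on (1, 5), zero elsewhere."""
    return x / 12 if 1 < x < 5 else 0.0

def g(y):
    """Density of Y = 2X - 3 from the change of variable: (y + 3)/48 on (-1, 7)."""
    return (y + 3) / 48 if -1 < y < 7 else 0.0

def integrate(func, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

total_g = integrate(g, -1, 7)   # should be close to 1
p_x = integrate(f, 2, 3)        # P(2 < X < 3)
p_y = integrate(g, 1, 3)        # P(1 < Y < 3), the same event since Y = 2X - 3

print(total_g, p_x, p_y)
```

Both probabilities come out near 5/24, which is what the equivalent-events argument predicts.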
Chapter Seven
Two dimensional Random Variables
7.1 Two dimensional random variables

Our study of random variables and their probability distributions in the preceding sections is
restricted to one-dimensional sample spaces, in that we recorded outcomes of an experiment as
values assumed by a single random variable. There will be situations, however, where we may
find it desirable to record the simultaneous outcomes of several random variables. For example,
we might measure the amount of precipitate P and volume V of gas released from a controlled
chemical experiment, giving rise to a two-dimensional sample space consisting of the outcomes
(p, v), or we might be interested in the hardness H and tensile strength T of cold-drawn copper,
resulting in the outcomes (h, t). In a study to determine the likelihood of success in college based
on high school data, we might use a three dimensional sample space and record for each
individual his or her aptitude test score, high school class rank, and grade-point average at the
end of freshman year in college.

If X and Y are two discrete random variables, the probability distribution for their simultaneous
occurrence can be represented by a function with values f(x, y) for any pair of values (x, y)
within the range of the random variables X and Y . It is customary to refer to this function as the
joint probability distribution of X and Y.

Hence, in the discrete case,

f(x, y) = P(X = x, Y = y);

that is, the values f(x, y) give the probability that outcomes x and y occur at the same time.

7.2 Joint distributions for discrete and continuous random variables

i. The function f(x, y) is a joint probability distribution or probability mass function of the
discrete random variables X and Y if

1. f(x, y) ≥ 0 for all (x, y),

2. $\sum_{x}\sum_{y} f(x, y) = 1$,

3. P(X = x, Y = y) = f(x, y).

For any region A in the xy plane, $P[(X, Y) \in A] = \sum\sum_{A} f(x, y)$.

Example: Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red
pens, and 3 green pens. If X is the number of blue pens selected and Y is the number of red pens
selected, find

(a) the joint probability function f(x, y),

(b) P[(X, Y ) ∈ A], where A is the region {(x, y)|x + y ≤ 1}.

Solution: The possible pairs of values (x, y) are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0).

(a) Now, f(0, 1), for example, represents the probability that a red and a green pen are selected.
The total number of equally likely ways of selecting any 2 pens from the 8 is $\binom{8}{2} = 28$. The
number of ways of selecting 1 red from 2 red pens and 1 green from 3 green pens is $\binom{2}{1}\binom{3}{1} = 6$.
Hence, f(0, 1) = 6/28 = 3/14. Similar calculations yield the probabilities for the other cases, which
are presented in the table below. Note that the probabilities sum to 1.

The joint probability distribution becomes

$$f(x, y) = \frac{\binom{3}{x}\binom{2}{y}\binom{3}{2-x-y}}{\binom{8}{2}}$$

for x = 0, 1, 2; y = 0, 1, 2; and 0 ≤ x + y ≤ 2. Or, in tabular form:

f(x, y)          x = 0    x = 1    x = 2    Row totals
y = 0            3/28     9/28     3/28     15/28
y = 1            3/14     3/14     0        3/7
y = 2            1/28     0        0        1/28
Column totals    5/14     15/28    3/28     1

(b) P[(X, Y) ∈ A] = P(X + Y ≤ 1) = f(0, 0) + f(0, 1) + f(1, 0) = 3/28 + 3/14 + 9/28 = 9/14.
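The joint table for the pen example can be generated from the counting formula. The sketch below uses exact fractions; `f`, `g`, `h` and `p_region` are illustrative names for the joint pmf, the column totals, the row totals and the probability of the region x + y ≤ 1.

```python
from fractions import Fraction
from math import comb

def f(x, y):
    """Joint pmf: x blue (of 3), y red (of 2), the rest green (of 3), 2 pens drawn from 8."""
    green = 2 - x - y
    if x < 0 or y < 0 or green < 0:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(3, green), comb(8, 2))

support = [(x, y) for x in range(3) for y in range(3)]
total = sum(f(x, y) for x, y in support)

g = {x: sum(f(x, y) for y in range(3)) for x in range(3)}  # column totals
h = {y: sum(f(x, y) for x in range(3)) for y in range(3)}  # row totals

# P[(X, Y) in A] for the region A = {(x, y) : x + y <= 1}.
p_region = sum(f(x, y) for x, y in support if x + y <= 1)

print(total, p_region)  # 1 9/14
```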

ii. The function f(x, y) is a joint density function of the continuous random variables X and Y if

1. f(x, y) ≥ 0 for all (x, y),

2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$,

3. $P[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy$ for any region A in the xy plane.

Example: A privately owned business operates both a drive-in facility and a walk-in facility.
On a randomly selected day, let X and Y, respectively, be the proportions of the time that the
drive-in and the walk-in facilities are in use, and suppose that the joint density function of these
random variables is

$$f(x, y) = \begin{cases} \dfrac{2}{5}(2x + 3y), & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a. Verify condition two of the definition of a joint pdf of continuous variables.
b. Find P[(X, Y) ∈ A], where A = {(x, y) | 0 < x < 1/2, 1/4 < y < 1/2}.
Solution: (a) The integration of f(x, y) over the whole region is

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} \frac{2}{5}(2x + 3y)\,dx\,dy = \int_{0}^{1} \left[\frac{2x^2}{5} + \frac{6xy}{5}\right]_{x=0}^{x=1} dy$$

$$= \int_{0}^{1} \left(\frac{2}{5} + \frac{6y}{5}\right) dy = \left[\frac{2y}{5} + \frac{3y^2}{5}\right]_{y=0}^{y=1} = \frac{2}{5} + \frac{3}{5} = 1$$

(b) To calculate the probability, we use

$$P[(X, Y) \in A] = P\left(0 < X < \tfrac{1}{2},\ \tfrac{1}{4} < Y < \tfrac{1}{2}\right) = \int_{1/4}^{1/2}\int_{0}^{1/2} \frac{2}{5}(2x + 3y)\,dx\,dy$$

$$= \int_{1/4}^{1/2} \left[\frac{2x^2}{5} + \frac{6xy}{5}\right]_{x=0}^{x=1/2} dy = \int_{1/4}^{1/2} \left(\frac{1}{10} + \frac{3y}{5}\right) dy$$

$$= \left[\frac{y}{10} + \frac{3y^2}{10}\right]_{1/4}^{1/2} = \frac{1}{10}\left[\left(\frac{1}{2} + \frac{3}{4}\right) - \left(\frac{1}{4} + \frac{3}{16}\right)\right] = \frac{13}{160}$$
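Both integrals in this example can be cross-checked numerically. The sketch below assumes a plain midpoint rule on a grid (the helper `double_integral` is hypothetical, written only for this check); since the integrand is linear in x and y, the midpoint rule should agree with the exact values up to floating-point rounding.

```python
def f(x, y):
    """Joint density (2/5)(2x + 3y) on the unit square, zero elsewhere."""
    return 0.4 * (2 * x + 3 * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def double_integral(func, x_lo, x_hi, y_lo, y_hi, steps=400):
    """Midpoint-rule approximation of a double integral over a rectangle."""
    dx = (x_hi - x_lo) / steps
    dy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * dx
        for j in range(steps):
            y = y_lo + (j + 0.5) * dy
            total += func(x, y)
    return total * dx * dy

total_mass = double_integral(f, 0, 1, 0, 1)         # condition 2: should be about 1
p_region = double_integral(f, 0, 0.5, 0.25, 0.5)    # should be about 13/160

print(total_mass, p_region, 13 / 160)
```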
Given the joint probability distribution f(x, y) of the discrete random variables X and Y, the
probability distribution g(x) of X alone is obtained by summing f(x, y) over the values of Y .
Similarly, the probability distribution h(y) of Y alone is obtained by summing f(x, y) over the
values of X. We define g(x) and h(y) to be the marginal distributions of X and Y , respectively.
When X and Y are continuous random variables, summations are replaced by integrals. We can
now make the following general definition.

7.3 Marginal and conditional distributions

Marginal distributions

The marginal distributions of X alone and Y alone are

$$g(x) = \sum_{y} f(x, y) \quad \text{and} \quad h(y) = \sum_{x} f(x, y)$$

for the discrete case, and

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

for the continuous case.

The term marginal is used here because, in the discrete case, the values of g(x) and h(y) are just
the marginal totals of the respective columns and rows when the values of f(x, y) are displayed in
a rectangular table.

Conditional distributions

Let X and Y be two random variables, discrete or continuous. The conditional distribution of the
random variable Y given X = x is

$$f(y \mid x) = \frac{f(x, y)}{g(x)}, \quad \text{provided } g(x) > 0.$$

Similarly, the conditional distribution of X given Y = y is

$$f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad \text{provided } h(y) > 0.$$

7.4 Independent random variables

Let X and Y be two random variables, discrete or continuous, with joint probability distribution
f(x, y) and marginal distributions g(x) and h(y), respectively. The random variables X and Y are
said to be statistically independent if and only if

f(x, y) = g(x)h(y) for all (x, y) within their range.

7.5 Distributions of functions of two dimensional random variables


Suppose that X1 and X2 are discrete random variables with joint probability distribution
f(x1, x2). Let Y1 = u1(X1, X2) and Y2 = u2(X1, X2) define a one-to-one transformation between
the points (x1, x2) and (y1, y2) so that the equations
y1 = u1(x1, x2) and y2 = u2(x1, x2)
may be uniquely solved for x1 and x2 in terms of y1 and y2, say x1 = w1(y1, y2) and
x2 = w2(y1, y2). Then the joint probability distribution of Y1 and Y2 is

g(y1, y2) = f[w1(y1, y2), w2(y1, y2)].

Suppose that X1 and X2 are continuous random variables with joint probability distribution
f(x1, x2). Let Y1 = u1(X1, X2) and Y2 = u2(X1, X2) define a one-to-one transformation between
the points (x1, x2) and (y1, y2) so that the equations y1 = u1(x1, x2) and y2 = u2(x1, x2) may be
uniquely solved for x1 and x2 in terms of y1 and y2, say x1 = w1(y1, y2) and x2 = w2(y1, y2).
Then the joint probability distribution of Y1 and Y2 is

g(y1, y2) = f[w1(y1, y2), w2(y1, y2)] |J|, where the Jacobian is the 2x2 determinant

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix}$$

and ∂x1/∂y1 is simply the derivative of x1 = w1(y1, y2) with respect to y1 with y2 held constant,
referred to in calculus as the partial derivative of x1 with respect to y1. The other partial
derivatives are defined in a similar manner.

Chapter Eight
Expected Value

8.1 Expectation of a random variable

Let X be a discrete random variable whose possible values are x1, x2, ..., xn with
probabilities P(x1), P(x2), ..., P(xn) respectively.
Then the expected value of X, E(X), is defined as
E(X) = x1 P(x1) + x2 P(x2) + ... + xn P(xn), that is,

$$E(X) = \sum_{i=1}^{n} x_i\,P(X = x_i) \quad \text{if } X \text{ is discrete,}$$

and

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{if } X \text{ is continuous.}$$


Example: What is the expected value of the r.v. X, the number of heads in three tosses of a coin (the example in Section 5.2)?

Solution: X = 0, 1, 2, 3, so x1 = 0, x2 = 1, x3 = 2, x4 = 3, with
P(X = x1) = 1/8, P(X = x2) = 3/8, P(X = x3) = 3/8, P(X = x4) = 1/8.

$$E(X) = \sum_{i=1}^{4} x_i\,P(X = x_i) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 12/8 = 1.5$$


8.2 Expectation of a function of a random variable
Let X be a random variable with probability distribution f(x). The expected value of the
random variable g(X) is

$$\mu_{g(X)} = E[g(X)] = \sum_{x} g(x) f(x) \quad \text{if } X \text{ is discrete, and}$$

$$\mu_{g(X)} = E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx \quad \text{if } X \text{ is continuous.}$$


8.3 Properties of expected value

- If C is a constant, then E(C) = C.
- E(CX) = CE(X), where C is a constant.
- E(X + C) = E(X) + C, where C is a constant.
- E(X + Y) = E(X) + E(Y)

8.4 Variance of a random variable and its Properties

Variance
If X is a discrete random variable with expected value μ (i.e. E(X) = μ), then the variance of
X, denoted by Var(X), is defined by

$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2 = \sum_{i=1}^{n} x_i^2\,P(x_i) - \mu^2$$

Alternatively,

$$\operatorname{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2\,P(x_i)$$

Properties of Variances
- For any r.v. X and constant C, it can be shown that
Var(CX) = C^2 Var(X)
Var(X + C) = Var(X) + 0 = Var(X)
- If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
More generally, if X1, X2, ..., Xk are independent random variables,
then Var(X1 + X2 + ... + Xk) = Var(X1) + Var(X2) + ... + Var(Xk), i.e.

$$\operatorname{Var}\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} \operatorname{Var}(X_i)$$

- If X and Y are not independent, then
Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y)
Var(X - Y) = Var(X) - 2Cov(X, Y) + Var(Y)
Examples
1. Consider a random variable X that takes the value
1 or 0 with respective probabilities P and 1-P. Find the expected value as well as the variance of
the r.v.
Solution: X takes values in {0, 1}, with
P(X = 1) = P and P(X = 0) = 1-P.

E(X) = Σ x_i P(X = x_i) = 0·P(X = 0) + 1·P(X = 1)
= 0(1-P) + 1(P) = P

Var(X) = E(X^2) - μ^2
= Σ x_i^2 P(X = x_i) - μ^2
= [0^2·P(X = 0) + 1^2·P(X = 1)] - P^2
= [0(1-P) + 1(P)] - P^2
= P - P^2 = P(1-P)
2. Two fair coins are tossed. Determine Var(X), where X is the number of heads that appear.
A) Use the definition of the variance.
B) Use the fact that the variance of the sum of independent variables is equal to the
sum of the respective variances.

Solution: A) X = number of heads = 0, 1, 2, with sample space {HH, TH, HT, TT}.
P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4
E(X) = 0·P(X = 0) + 1·P(X = 1) + 2·P(X = 2) = 0(1/4) + 1(1/2) + 2(1/4) = 1
E(X^2) = 0^2·P(X = 0) + 1^2·P(X = 1) + 2^2·P(X = 2) = 0(1/4) + 1(1/2) + 4(1/4) = 3/2
Var(X) = E(X^2) - μ^2 = 3/2 - 1 = 1/2

B) Let X = number of heads on the first coin (0 for T, 1 for H) and
Y = number of heads on the second coin (0 for T, 1 for H).
P(X = 0) = 1/2, P(X = 1) = 1/2 and P(Y = 0) = 1/2, P(Y = 1) = 1/2
E(X) = 0·P(X = 0) + 1·P(X = 1) = 0(1/2) + 1(1/2) = 1/2
E(Y) = 0·P(Y = 0) + 1·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2
E(X^2) = 0^2·P(X = 0) + 1^2·P(X = 1) = 0(1/2) + 1(1/2) = 1/2
E(Y^2) = 0^2·P(Y = 0) + 1^2·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2
Var(X) = E(X^2) - [E(X)]^2 = 1/2 - (1/2)^2 = 1/4
Var(Y) = E(Y^2) - [E(Y)]^2 = 1/2 - (1/2)^2 = 1/4
X and Y are independent (i.e. the outcome of one coin does not influence the outcome of the
second), so
Var(X + Y) = Var(X) + Var(Y) = 1/4 + 1/4 = 1/2
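The expectation and variance calculations in these examples all follow the same two formulas, so they can be sketched as two small helpers. The function names `mean` and `variance` and the dictionary representation of a pmf are assumptions made for illustration.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) = sum of x * P(X = x) over the support."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mu = mean(pmf)
    return sum(x**2 * p for x, p in pmf.items()) - mu**2

# Number of heads in three tosses, in two tosses, and on a single coin.
three_coin_heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
two_coin_heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
one_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

print(mean(three_coin_heads))                               # 3/2
print(mean(two_coin_heads), variance(two_coin_heads))       # 1 1/2
print(variance(one_coin) + variance(one_coin))              # 1/2
```

The last line adds the variances of two independent single coins and matches the variance obtained directly from the two-coin distribution.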

8.5 Moments and moment generating function

The rth moment about the origin of the random variable X is given by

$$\mu'_r = E(X^r) = \begin{cases} \sum_{x} x^r f(x), & \text{if } X \text{ is discrete,} \\[1ex] \int_{-\infty}^{\infty} x^r f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

The moment generating function of the random variable X is given by E(e^{tX}) and is denoted
by M_X(t):

$$M_X(t) = E(e^{tX}) = \begin{cases} \sum_{x} e^{tx} f(x), & \text{if } X \text{ is discrete,} \\[1ex] \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

8.6 Chebyshev’s Inequality

(Chebyshev's Theorem) The probability that any random variable X will assume a value within
k standard deviations of the mean is at least 1 - 1/k^2. That is,

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

8.7 Covariance and Correlation Coefficient

Let X and Y be random variables with joint probability distribution f(x, y). The covariance of X and Y is

$$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{x}\sum_{y} (x - \mu_X)(y - \mu_Y) f(x, y)$$

if X and Y are discrete, and

$$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y)\,dx\,dy$$

if X and Y are continuous.

Let X and Y be random variables with covariance σ_XY and standard deviations σ_X and σ_Y,
respectively. The correlation coefficient of X and Y is

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

Chapter Nine
Common Discrete Probability Distributions
9.1 Common Discrete Probability Distributions
9.1.1 Binomial Distribution
The origin of the binomial experiment lies in the Bernoulli trial. A Bernoulli trial is an experiment
having only two mutually exclusive outcomes, which are designated "success (s)" and "failure
(f)". The sample space of a Bernoulli trial is {s, f}.

Notation: Let the probabilities of success and failure be p and q respectively: P(success) = P(s) = p
and P(failure) = P(f) = q, where q = 1 - p.

Definition: Let X be the number of successes in n repeated Bernoulli trials with probability of
success p on each trial. Then the probability distribution of the discrete random variable X is
called the binomial distribution.

A binomial random variable with parameters n and p represents the number of successes r in n
independent trials, when each trial has probability p of success.

If X is a binomial random variable, then for r = 0, 1, 2, ..., n

$$P(X = r) = \frac{n!}{r!\,(n-r)!}\,p^{r} q^{n-r}, \quad q = 1 - p,$$

and P(X = r) is called a binomial probability distribution.
Assumptions of a binomial distribution
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success
or a failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent.
Examples of binomial experiments
- Tossing a coin 20 times to see how many tails occur.
- Asking 200 people whether or not they listen to the BBC news.

Example 1. A fair coin is flipped 3 times. What is the probability of getting
a) no heads?
b) 2 heads?

Solution: Let X = number of heads, with possible values 0, 1, 2, 3.
P(getting a head) = p = 1/2, q = 1 - p = 1/2, n = 3.

a) $P(X = 0) = \dfrac{3!}{0!\,(3-0)!}\left(\dfrac{1}{2}\right)^{0}\left(\dfrac{1}{2}\right)^{3-0} = \dfrac{1}{8}$

b) $P(X = 2) = \dfrac{3!}{2!\,(3-2)!}\left(\dfrac{1}{2}\right)^{2}\left(\dfrac{1}{2}\right)^{3-2} = \dfrac{3}{8}$
2. The probability that a student entering a college will graduate is 0.4. Determine the probability
that out of 5 students (a) none, (b) one, (c) at least one, (d) at most three will graduate.
Solution: X = number of students who will graduate,
X = 0, 1, 2, 3, 4, 5, p = 0.4, q = 1 - p = 0.6

a) P(none will graduate) = $P(X = 0) = \dfrac{5!}{0!\,(5-0)!}(0.4)^{0}(0.6)^{5} \approx 0.08$

b) P(one will graduate) = $P(X = 1) = \dfrac{5!}{1!\,(5-1)!}(0.4)^{1}(0.6)^{4} \approx 0.26$

c) P(at least one will graduate) = P(X ≥ 1)
= P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)
= 1 - P(X < 1) = 1 - P(X = 0)
= 1 - 0.08 = 0.92
d) P(at most three will graduate) = P(X ≤ 3)
= 1 - P(X > 3)
= 1 - [P(X = 4) + P(X = 5)]
= 1 - [5!/(4!(5-4)!) (0.4)^4 (0.6)^1 + 5!/(5!(5-5)!) (0.4)^5 (0.6)^0]
= 0.91296
If X is a binomial random variable with two parameters n and P, then
1. E (X) = n.p.
2. Var ( X) = npq
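The graduation example and the formulas E(X) = np and Var(X) = npq can be checked with a direct implementation of the binomial probability function. This is a minimal sketch; `binomial_pmf` is a name chosen here, not a library function.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for a binomial random variable with parameters n and p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 5, 0.4  # five students, each graduating with probability 0.4

p_none = binomial_pmf(0, n, p)
p_one = binomial_pmf(1, n, p)
p_at_least_one = 1 - p_none
p_at_most_three = sum(binomial_pmf(r, n, p) for r in range(4))

# Mean and variance computed from the distribution itself.
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
var = sum(r**2 * binomial_pmf(r, n, p) for r in range(n + 1)) - mean**2

print(round(p_none, 5), round(p_one, 4), round(p_at_least_one, 5), round(p_at_most_three, 5))
print(round(mean, 6), round(var, 6))
```

The mean and variance computed from the distribution agree with np = 2 and npq = 1.2 for these parameters.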
9.1.2 Poisson distribution
- It is a discrete probability distribution used to model rare events, such as the
number of car accidents in a day, the arrival of telephone calls over an interval of time, the number
of misprints on a typed page, or natural disasters like earthquakes.

A Poisson model has the following assumptions:

- The expected number of occurrences of the event can be estimated from past trials (records).
- The numbers of successes or events occurring in different regions or time intervals are
independent of one another.

Definition: Let X be the number of occurrences in a Poisson process and λ be the actual
average number of occurrences of an event in a unit length of interval. The probability function of the
Poisson distribution is

$$P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\[1ex] 0, & \text{otherwise} \end{cases}$$

Remarks
- The Poisson distribution possesses only one parameter, λ.
- If X has a Poisson distribution with parameter λ, then E(X) = λ and
Var(X) = λ, i.e. E(X) = Var(X) = λ.
- $\sum_{x=0}^{\infty} P(X = x) = 1$

Example 1: A company manufacturing light bulbs discovers from past experience that 2 defective
bulbs are manufactured per 30 working hours. What is the probability that 4 defective bulbs will be
manufactured in 30 working hours?
Solution: Let X be the r.v. denoting the number of defective bulbs in 30 working hours, with λ = 2.

$$P(X = 4) = \frac{e^{-2}\,2^{4}}{4!} \approx 0.09$$
Example 2: In a small city, 10 accidents took place in a period of 50 days. Find the probability that
there will be
a) two accidents in a day
b) three or more accidents in a day
Solution: In 50 days we have 10 accidents, so the average number of accidents per day is
10/50 = 0.2, i.e. λ = 0.2.
Let X be the r.v. denoting the number of accidents per day; X ~ Poisson(λ = 0.2), X = 0, 1, 2, ...

a) $P(X = 2) = \dfrac{e^{-0.2}(0.2)^{2}}{2!} \approx 0.0164$

b) P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) + ...
= 1 - [P(X = 0) + P(X = 1) + P(X = 2)], because $\sum_{x=0}^{\infty} P(X = x) = 1$
= 1 - [0.8187 + 0.1637 + 0.0164]
≈ 0.0012 (using the rounded values; without intermediate rounding the result is about 0.0011)
3. a) Referring to Example 1, what is the expected number of defective light bulbs in 30 working hours? What
about the variance?
b) Referring to Example 2, find the mean and the variance of the number of accidents in a day.
Solution: a) E(X) = Var(X) = λ = 2
b) E(X) = Var(X) = λ = 0.2

Example 3: Suppose the number of typographical errors on a single page of your book has a
Poisson distribution with parameter λ = 1/2. Calculate the probability that there is at least one
error on this page.

Solution. Letting X denote the number of errors on a single page, we have

P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−0.5) ≈ 0.393
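The three Poisson examples above can be re-computed with a direct implementation of the probability function. In this sketch `poisson_pmf` is an illustrative helper rather than a library call, and the printed values are rounded for comparison with the worked solutions.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

p_four_defects = poisson_pmf(4, 2)                                # Example 1, lambda = 2
p_two_accidents = poisson_pmf(2, 0.2)                             # Example 2(a), lambda = 0.2
p_three_or_more = 1 - sum(poisson_pmf(x, 0.2) for x in range(3))  # Example 2(b)
p_some_typo = 1 - poisson_pmf(0, 0.5)                             # Example 3, lambda = 1/2

print(round(p_four_defects, 4), round(p_two_accidents, 4))
print(round(p_three_or_more, 4), round(p_some_typo, 4))
```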
Approximating Binomial with Poisson
Poisson distribution can approximate binomial distn, when the number of trials, n is
comparatively large and the probability of an occurrence a success, P is small; preferably with np
≤ 7.
The approximating formula is
e np  np 
, X  0,1, 2,...., Generally we use poisson to approximate
P(X=x) = x!
a Binomial when n  50 and np  5
Example: Suppose that an insurance company has 2000 policy holders & that the probability of
any one of policy holders will file at least one claim in any given year is 1/1000. Find the
probability that in any given year one or more of the policy holders will file at least one claim.

Solution: Let X = number of policy holders who will file at least one claim.

X = 0, 1, ..., 2000, n = 2000, p = 0.001

$$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \binom{2000}{0}(0.001)^{0}(0.999)^{2000} = 0.8648$$

Since n = 2000 > 50 and np = 2, we can use the Poisson approximation:

$$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \frac{e^{-2}\,2^{0}}{0!} = 0.8647$$
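
To see how close the Poisson approximation is to the exact binomial answer in the insurance example, the two can be computed side by side. This is a small sketch using only the standard library; the helper names `binomial_pmf` and `poisson_pmf` are illustrative, and `math.comb` assumes Python 3.8 or later.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n, p = 2000, 0.001
lam = n * p  # Poisson parameter np = 2

# P(X >= 1) = 1 - P(X = 0) under each model.
exact = 1 - binomial_pmf(0, n, p)
approx = 1 - poisson_pmf(0, lam)

# Largest pointwise gap between the two pmfs over x = 0, ..., 10.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))

print(round(exact, 4), round(approx, 4))
print(max_gap)
```

The exact and approximate probabilities round to 0.8648 and 0.8647, matching the worked solution, and the pointwise gap between the two probability functions stays small for this choice of n and p.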

9.2.3 Geometric Distribution

If repeated independent trials can result in a success with probability p and a failure with
probability q=1-p, then the probability distribution of the random variable X, the number of the
trial on which the first success occurs, is

$$g(x; p) = p\,q^{x-1}, \quad x = 1, 2, 3, \ldots$$

The mean and variance of the geometric distribution are $\dfrac{1}{p}$ and $\dfrac{1-p}{p^{2}}$, respectively.
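
The stated mean and variance can be checked against the probability function directly. The sketch below assumes a hypothetical success probability p = 0.25 and truncates the infinite sums at 500 trials, beyond which the remaining probability is negligible.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(first success occurs on trial x), for x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)

p = 0.25            # hypothetical success probability
trials = range(1, 501)  # truncation point is an assumption

total = sum(geometric_pmf(x, p) for x in trials)
mean = sum(x * geometric_pmf(x, p) for x in trials)
var = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in trials)

print(total, mean, var)
print(1 / p, (1 - p) / p**2)  # closed-form mean and variance
```

For p = 0.25 the closed forms give a mean of 4 and a variance of 12, and the truncated sums agree with these to within floating-point error.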

9.2 Common Continuous Probability Distributions

9.2.1 Uniform distribution

The density function of the continuous uniform random variable X on the interval [a, b] is

$$f(x; a, b) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{elsewhere.} \end{cases}$$

The mean and variance of the uniform distribution are

$$\mu = \frac{a+b}{2} \quad \text{and} \quad \sigma^{2} = \frac{(b-a)^{2}}{12}.$$
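
As a quick numerical illustration, the mean and variance can be recovered by integrating against the density. The sketch below uses a simple midpoint rule; the interval [2, 8] and the number of subintervals are hypothetical choices made only for the example.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 8.0   # hypothetical interval
n = 100_000       # number of midpoint-rule subintervals
h = (b - a) / n
midpoints = [a + (i + 0.5) * h for i in range(n)]

# Midpoint-rule approximations of the integrals defining area, mean, variance.
area = sum(uniform_pdf(x, a, b) * h for x in midpoints)
mean = sum(x * uniform_pdf(x, a, b) * h for x in midpoints)
var = sum((x - mean) ** 2 * uniform_pdf(x, a, b) * h for x in midpoints)

print(area, mean, var)
print((a + b) / 2, (b - a) ** 2 / 12)  # closed-form mean and variance
```

On [2, 8] the closed forms give a mean of 5 and a variance of 3, which the numerical integrals reproduce closely.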

9.2.2 Normal Distributions

It is the most important distribution for describing a continuous random variable and is used as an approximation of other distributions. A random variable X is said to have a normal distribution if its probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}},$$
where x is a real value of X, i.e. $-\infty < x < \infty$, $-\infty < \mu < \infty$ and $\sigma > 0$,
and where $\mu = E(X)$ and $\sigma^{2} = \operatorname{Var}(X)$.
µ and σ² are the parameters of the normal distribution.
Properties of Normal Distribution:

1. It is bell shaped and symmetrical about its mean. The maximum ordinate is at x = µ.
2. The curve approaches the horizontal x-axis as we go in either direction from the mean.
3. The total area under the curve is 1, that is
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} dx = 1$$
4. The probability that a random variable will have a value between any two points is equal to the area under the curve between those points.
5. The height of the normal curve attains its maximum at x = µ; this implies that the mean and the mode coincide (are equal).

Standard Normal Distribution

It is a normal distribution with mean 0 and variance 1. A normal distribution can be converted to the standard normal distribution as follows. If X has a normal distribution with mean µ and standard deviation σ, then the standard normal variate Z is given by
$$Z = \frac{X - \mu}{\sigma},$$
and its density function is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}.$$

Properties of the standard normal distribution:

• The same as the normal distribution, but the mean is zero and the variance is one.
• Areas under the standard normal distribution curve have been tabulated in various ways. The most common ones are the areas between Z = 0 and a positive value of Z.

Given a normally distributed random variable X with mean µ and standard deviation σ,
$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$$
$$P(X < a) = P\left(\frac{X-\mu}{\sigma} < \frac{a-\mu}{\sigma}\right) = P\left(Z < \frac{a-\mu}{\sigma}\right),$$
where $Z = \dfrac{X-\mu}{\sigma}$ is the standard normal random variable.
Note: i) P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
ii) $P(-\infty < Z < \infty) = 1$
iii) The mean is zero and the variance is 1 for the standard normal distribution.
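
In place of a printed table, the standardization step can be carried out in code. The sketch below is one possible way to do it, using the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$ for the standard normal cumulative distribution function; the function names and the values µ = 50, σ = 10 are hypothetical.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(mu, sigma^2), obtained by standardizing to Z."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)

mu, sigma = 50.0, 10.0  # hypothetical mean and standard deviation

# P(40 < X < 60) is the same as P(-1 < Z < 1).
within_one_sd = normal_prob_between(40.0, 60.0, mu, sigma)

# Table-style area between Z = 0 and Z = 1.
area_0_to_1 = std_normal_cdf(1.0) - 0.5

print(round(within_one_sd, 4), round(area_0_to_1, 4))
```

The two printed values are approximately 0.6827 and 0.3413; the second is the entry a standard normal table gives for the area between Z = 0 and Z = 1.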
9.2.3 Exponential distribution

The continuous random variable X has an exponential distribution, with parameter β, if its density function is given by
$$f(x; \beta) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x > 0, \\ 0, & \text{elsewhere,} \end{cases}$$
where $\beta > 0$.

The mean and variance of the exponential distribution are $\mu = \beta$ and $\sigma^{2} = \beta^{2}$.
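
A simulation offers an informal check of these two results. The sketch below draws samples with the standard library's `random.expovariate`, which takes the rate 1/β rather than β itself; the value β = 3, the seed, and the sample size are hypothetical choices. The cumulative distribution function $F(x) = 1 - e^{-x/\beta}$ used for comparison follows from integrating the density above.

```python
import math
import random

beta = 3.0                 # hypothetical parameter (the mean of the distribution)
rng = random.Random(42)    # fixed seed so the run is reproducible

# expovariate expects the rate, which is 1 / beta in this parameterization.
samples = [rng.expovariate(1.0 / beta) for _ in range(200_000)]

n = len(samples)
sample_mean = sum(samples) / n
sample_var = sum((s - sample_mean) ** 2 for s in samples) / n

def exponential_cdf(x: float, beta: float) -> float:
    """F(x) = P(X <= x), obtained by integrating the density from 0 to x."""
    return 1.0 - math.exp(-x / beta) if x > 0 else 0.0

# Compare the empirical proportion of samples at or below 2 with F(2).
empirical = sum(s <= 2.0 for s in samples) / n

print(sample_mean, sample_var)
print(empirical, exponential_cdf(2.0, beta))
```

With a sample this large, the sample mean and variance should land close to β = 3 and β² = 9, and the empirical proportion should be close to F(2); the agreement is approximate because the samples are random.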
