Electronic Statistics and Probabilities
Prepared
By
2025
Preface
* Types of Data.
* Data Graph.
* Measures of Central Tendency.
* Measures of Dispersion.
* Probability Theory.
* Probability Distributions.
* Estimation Theory.
* Analysis of Variance.
Contents
First Part: Statistics
* Types of Data.
* Data Graph.
* Measures of Central Tendency.
* Measures of Dispersion.
CHAPTER 1
An Introduction
Descriptive Statistics:
Descriptive statistics is a set of procedures for
organizing and summarizing information in order to
make it readable and comprehensible.
Inferential Statistics:
It is used to make generalizations and draw
conclusions about the population. In other words,
statistical inference is the process of making an estimate,
prediction, or decision about a population based on
sample data. Since populations are usually large and the
investigation of every member would be expensive, it is
easier and cheaper to draw a sample from the
population under consideration and deduce conclusions
about the population from the sample.
Statistical Analysis:
When we use statistics to make a decision, we carry
out a statistical analysis. Statistical analysis can be divided
into four phases:
1. Study design,
2. Data collection,
3. Data analysis, and
4. Interpretation of the conclusions of the analysis in
order to take an appropriate action.
The observations or measurements you collect from
an investigation or survey are referred to as data, the
data set, or the database. A single observation or
measurement is called a datum or data point. Data may
be either quantitative (numerical) or qualitative
(categorical).
Biostatistics:
Biostatistics is the science dealing with the application of
statistics to the medical, health and biological fields.
Population:
A population is the entire set of observations or
measurements under study. That is, it is the set of all
items of interest in a statistical problem. In other words,
a population is a collection of all the elements about
which we want to draw conclusions.
Sample:
A sample is a set of observations selected from the
population, and is therefore only a part of the entire
population. That is, it is the collection of elements we are
actually studying.
Parameter:
A summary numerical result calculated from census
data is called a parameter. Statisticians assign Greek
letters to parameters. For example, μ (mu) is the symbol
used for the population average, π (pi) is the symbol
used for the population proportion and σ² (sigma squared)
is the symbol used for the population variance. Briefly,
we can say that a parameter is a descriptive measure of a
population.
Statistic:
A summary numerical result calculated from sample
data is called a statistic. x̄ (x-bar) is the symbol used
for the sample average, p is the symbol used for the
sample proportion, and s² is the symbol used for the
sample variance. Inferential statistics provides the tools
for drawing inferences about a parameter from a
statistic.
Variable:
There are several ways to classify variables.
Variables can be divided into qualitative variables and
quantitative variables. In turn, quantitative variables
can be divided into two types: discrete variables and
continuous variables.
Qualitative Variable:
A variable that cannot be measured numerically is
called a qualitative variable. That is, we cannot assign
numerical values to a qualitative variable. In other
words, the qualitative variables are categorical
observations. The presence or absence of pain, name of
patient, sex and religion all are examples of qualitative
variables.
Quantitative Variable:
A variable that can be measured numerically is called a
quantitative variable. Ages, heights, blood pressure,
heart rates and temperatures all are examples of
quantitative variables.
Discrete Variable:
A variable characterized by integer values
is called a discrete variable. That is, there is a gap
between any two values of a discrete variable. The
Continuous Variable:
A variable that can take any value in a
given interval is called a continuous variable. That is,
there are no gaps between the values of a continuous
variable. Variables such as height, weight and
temperature are examples of continuous variables.
Levels of Data:
Data may be classified as nominal, ordinal, interval
or ratio.
Nominal Data:
Nominal data name or denote differences in kind.
Nominal data are also referred to as categorical data. For
example, name, sex, religion, presence of pain and
occupation. Numerical data can also be nominal, such as the
serial number on a patient sheet. Such numbers serve
only as names. Consequently, you should not perform
any calculations with the nominal numbers themselves.
Ordinal Data:
Ordinal data note differences in degree and can be
ordered or ranked. Good / better / best and small /
medium / large are examples of ordinal data. Ordinal
Ratio Data:
Ratio data permit the measurement of the exact difference
between any according to whether your data are
nominal, ordinal or ratio.
Exercises 1
1. Consider the following record of a patient admitted
to a hospital today.
Name:
Age:
Gender:
Occupation:
City:
Marital status:
Number of children:
Weight:
Height:
Are you in pain?
Severity of pain:
Systolic blood pressure:
Temperature:
Pulse:
Blood group:
Determine the type of data in each item.
2. In your own words, define and give an example of each
of the following statistical terms.
(a) Population.
(b) Sample.
(c) Parameter.
(d) Statistic.
(e) Statistical inference.
3. Provide two examples each of qualitative data and
quantitative data.
4. Discuss the difference between qualitative and
quantitative data.
CHAPTER 2
29 69 61 63 42 47 41 36 45 49
51 60 30 34 43 37 45 55 43 35
53 57 36 42 31 48 28 46 45 39
49 38 21 33 49 38 44 27 31 50
27 43 22 23 47 59 24 16 49 48
Class width = $\dfrac{\text{Range}}{\text{Number of classes}} = \dfrac{69 - 16}{7} = \dfrac{53}{7} = 7.57 \approx 8$
Notes:
(a) For each class (taking the first class as an example):
16 is called the lower limit.
24 is called the upper limit.
Histogram:
The information in a frequency distribution is often
graphed. The common way of presenting a frequency
distribution graphically is the histogram, which is
constructed by marking off the class limits along the
horizontal axis and erecting over each class interval a
rectangle whose height equals the frequency of that
class. The histogram corresponding to the frequency
distribution in the last table is shown in the following
figure:
Histogram of C1 N = 50
Midpoint Count
20.00 4 ****
28.00 8 ********
36.00 9 *********
44.00 13 *************
52.00 10 **********
60.00 5 *****
68.00 1 *
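The class frequencies in the histogram above can be reproduced with a minimal Python sketch. It assumes half-open classes (16 up to but not including 24, and so on), as the class notation 16-, 24-, ... suggests, and rounds the class width up with `math.ceil`; the helper name `frequency_table` is illustrative, not from the text.

```python
import math

# Ages of the 50 patients listed at the start of this chapter.
ages = [29, 69, 61, 63, 42, 47, 41, 36, 45, 49,
        51, 60, 30, 34, 43, 37, 45, 55, 43, 35,
        53, 57, 36, 42, 31, 48, 28, 46, 45, 39,
        49, 38, 21, 33, 49, 38, 44, 27, 31, 50,
        27, 43, 22, 23, 47, 59, 24, 16, 49, 48]

def frequency_table(data, start, width, n_classes):
    """Count the values in each half-open class [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for value in data:
        k = (value - start) // width  # index of the class containing value
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

n_classes = 7
width = math.ceil((max(ages) - min(ages)) / n_classes)  # 53 / 7 = 7.57, rounded up to 8
freqs = frequency_table(ages, start=min(ages), width=width, n_classes=n_classes)

for k, f in enumerate(freqs):
    lower = min(ages) + k * width
    print(f"{lower}-{lower + width}  midpoint {lower + width / 2:g}  {'*' * f}")
```

The printed star counts match the text histogram above.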
Frequency Polygon:
Another common way of presenting a frequency
distribution graphically is the frequency polygon. A
frequency polygon is obtained by plotting the frequency
of each class above the midpoint of that class and then
joining the points with straight lines. The polygon is
usually closed by considering one additional class (with
zero frequency) at each end of the distribution and then
extending a straight line to the midpoint of each of these
classes. Frequency polygons are useful for obtaining a
general idea of the shape of the distribution.
[Figure: Frequency polygon, number of patients (vertical axis) against ages (horizontal axis)]
Stem | Leaf
1 | 6
2 | 97123847
3 | 806431786159
4 | 9322397781546535998
5 | 137950
6 | 9013
Note:
If the data contain decimal points, define the stem as
the digits to the left of the decimal point and the leaf as
the digit to the right of the decimal point.
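A minimal Python sketch of the same display, assuming whole-number data so that the stem is the tens digit and the leaf is the units digit (the decimal case in the note would need a different split). Unlike the display above, which lists the leaves in the order they were read from the data, the sketch sorts the leaves, which gives an ordered stem-and-leaf display.

```python
from collections import defaultdict

ages = [29, 69, 61, 63, 42, 47, 41, 36, 45, 49,
        51, 60, 30, 34, 43, 37, 45, 55, 43, 35,
        53, 57, 36, 42, 31, 48, 28, 46, 45, 39,
        49, 38, 21, 33, 49, 38, 44, 27, 31, 50,
        27, 43, 22, 23, 47, 59, 24, 16, 49, 48]

def stem_and_leaf(data):
    """Split each two-digit value into a stem (tens digit) and a leaf (units digit)."""
    plot = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    # Sort stems and leaves for an ordered display.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```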
Pie Chart:
A pie chart is simply a circle subdivided into a number
of slices (or sectors) that represent the various
categories. It should be drawn so that the size of each slice is
proportional to the percentage corresponding to that
category.
Example:
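Given the following table that shows the blood groups of 200
patients:
Blood group | A | B | AB | O | Total
Number of patients | 50 | 25 | 40 | 85 | 200
Percentage for blood group A = $\dfrac{50}{200} \times 100 = 25\%$
Percentage for blood group B = $\dfrac{25}{200} \times 100 = 12.5\%$
Percentage for blood group AB = $\dfrac{40}{200} \times 100 = 20\%$
Percentage for blood group O = $\dfrac{85}{200} \times 100 = 42.5\%$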
[Figure: Pie chart of the number of patients by blood group: A 50 (25%), B 25 (12.5%), AB 40 (20%), O 85 (42.5%)]
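The percentages above, and the angle each slice needs when the pie chart is drawn by hand (share of the total × 360°), can be computed with a short Python sketch; the dictionary simply restates the counts in the table.

```python
# Number of patients in each blood group (200 patients in total).
counts = {"A": 50, "B": 25, "AB": 40, "O": 85}
total = sum(counts.values())

# Each slice is proportional to its share of the total.
percentages = {group: 100 * n / total for group, n in counts.items()}
angles = {group: 360 * n / total for group, n in counts.items()}

for group in counts:
    print(f"{group:>2}: {percentages[group]:5.1f}%  slice angle {angles[group]:5.1f} degrees")
```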
EXERCISES 2
86 75 66 77 66 64 73 91 65 59
78 61 86 61 58 70 77 80 58 94
55 62 79 83 54 52 45 82 48 67
CHAPTER 3
MEASURES OF CENTRAL TENDENCY
Arithmetic Mean:
The most commonly used measure of location is the
arithmetic mean (or mean). It is obtained by adding all
values of the set of data and dividing by the number of
these values. In other words, the mean is the sum of
values divided by the number of data points. The mean
of a set of measurements is defined as follows:
$\bar{x} = \dfrac{\text{Sum of measurements}}{\text{Number of measurements}}$
In symbols,
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
where:
$\bar{x}$, called "x-bar", is the mean;
$x_i$ is the $i$th data point.
Example 1:
The following is a set of measured heights of 8 children
(in cm).
Solution:
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{810}{8} = 101.25$ cm
x | x − x̄ (with x̄ = 68/6 = 11.33)
5 | 5 − 11.33 = −6.33
10 | 10 − 11.33 = −1.33
12 | 12 − 11.33 = 0.67
17 | 17 − 11.33 = 5.67
20 | 20 − 11.33 = 8.67
4 | 4 − 11.33 = −7.33
Total 68 | 0.02 ≈ 0
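The table above can be checked with a small Python sketch using the same six values. It shows that the deviations from the mean sum to zero (up to floating-point error), so the 0.02 in the table comes only from rounding each deviation to two decimals.

```python
data = [5, 10, 12, 17, 20, 4]

def mean(values):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(values) / len(values)

x_bar = mean(data)                       # 68 / 6 = 11.33...
deviations = [x - x_bar for x in data]
rounded = [round(d, 2) for d in deviations]

print(round(x_bar, 2))                   # 11.33
print(abs(sum(deviations)) < 1e-9)       # True: exact deviations sum to zero
print(round(sum(rounded), 2))            # 0.02, the rounding residue in the table
```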
Frequency 4 8 9 13 10 5 1
Solution:
To calculate the arithmetic mean, we use two methods
as follows.
1. Direct Method
Classes | f | x | fx
16- | 4 | 20 | 80
24- | 8 | 28 | 224
32- | 9 | 36 | 324
40- | 13 | (44) | 572
48- | 10 | 52 | 520
56- | 5 | 60 | 300
64-72 | 1 | 68 | 68
Total | 50 | | 2088
$\bar{x} = \dfrac{\sum f x}{\sum f} = \dfrac{2088}{50} = 41.76$
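2. Short-cut Method
Classes | f | x | d = x − a | d' = d / l | fd'
16- | 4 | 20 | −24 | −3 | −12
24- | 8 | 28 | −16 | −2 | −16
32- | 9 | 36 | −8 | −1 | −9
40- | 13 | (44) = a | 0 | 0 | 0
48- | 10 | 52 | 8 | 1 | 10
56- | 5 | 60 | 16 | 2 | 10
64-72 | 1 | 68 | 24 | 3 | 3
Total | 50 | | | | −14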
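Notes:
* $d = x - a$, where $a$ is the assumed mean (here $a = 44$).
* $d' = d / l$, where $l$ is the class length (here $l = 8$).
$\bar{x} = a + \dfrac{\sum f d'}{\sum f} \times l = 44 + \dfrac{-14}{50} \times 8 = 44 - 2.24 = 41.76$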
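Both methods can be checked with a short Python sketch. The midpoints and frequencies restate the tables above; `a` is the assumed mean (44) and `l` the class length (8). The function names are illustrative.

```python
# Midpoints and frequencies of the age distribution (classes 16-, 24-, ..., 64-72).
midpoints = [20, 28, 36, 44, 52, 60, 68]
freqs = [4, 8, 9, 13, 10, 5, 1]

def grouped_mean_direct(x, f):
    """Direct method: sum(f * x) / sum(f)."""
    return sum(xi * fi for xi, fi in zip(x, f)) / sum(f)

def grouped_mean_shortcut(x, f, a, l):
    """Short-cut method: a + (sum(f * d') / sum(f)) * l, with d' = (x - a) / l."""
    d_prime = [(xi - a) / l for xi in x]
    return a + sum(di * fi for di, fi in zip(d_prime, f)) / sum(f) * l

print(grouped_mean_direct(midpoints, freqs))                      # 41.76
print(round(grouped_mean_shortcut(midpoints, freqs, a=44, l=8), 2))  # 41.76
```

The short-cut method only changes the arithmetic, not the result, so both calls agree.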
Properties of Mean:
Median:
The median is the middle value when the data are
arranged in order. Consequently, the median is greater
than or equal to half of the observations and less than or
equal to the other half.
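For an odd number of observations (here $n = 7$):
$\text{Position of median} = \dfrac{n + 1}{2} = \dfrac{7 + 1}{2} = 4$
For an even number of observations (here $n = 8$), the median is the mean of the two middle values:
$\text{Position of the first value} = \dfrac{n}{2} = \dfrac{8}{2} = 4$
$\text{Median} = \dfrac{12 + 13}{2} = \dfrac{25}{2} = 12.5$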
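The two position rules can be expressed as a short Python sketch. The data values below are hypothetical (the example data are not reproduced in the text); the result agrees with the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Middle value of the sorted data; the mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                     # position (n + 1) / 2, counting from 1
    return (s[mid - 1] + s[mid]) / 2      # positions n / 2 and n / 2 + 1

odd_data = [7, 3, 12, 9, 5, 15, 10]          # hypothetical, n = 7
even_data = [13, 4, 20, 12, 8, 15, 2, 18]    # hypothetical, n = 8
print(median(odd_data), median(even_data))   # 9 12.5
```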
Frequency 4 8 9 13 10 5 1
Solution:
* Using Ascending Cumulative Frequency
The median of this frequency distribution may be found by
carrying out the following steps:
1. Construct the ascending cumulative frequency
distribution table of the original distribution.
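Upper limits | CF
Less than 24 | 4
Less than 32 | 12
Less than 40 | 21 (f1)
Less than 48 | 34 (f2)
Less than 56 | 44
Less than 64 | 49
Less than 72 | 50
$\text{Position of median} = \dfrac{\sum f}{2} = \dfrac{50}{2} = 25$
$\text{Median} = M_0 + \dfrac{\text{Pos.} - f_1}{f_2 - f_1} \times l = 40 + \dfrac{25 - 21}{34 - 21} \times 8 = 42.46$
where $M_0 = 40$ is the lower limit of the median class and $l = 8$ is the class length.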
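* Using Descending Cumulative Frequency
Lower limits | CF
16 or more | 50
24 or more | 46
32 or more | 38
40 or more | 29 (f2)
48 or more | 16 (f1)
56 or more | 6
64 or more | 1
$\text{Position of median} = \dfrac{\sum f}{2} = \dfrac{50}{2} = 25$
$\text{Median} = M_u - \dfrac{\text{Pos.} - f_1}{f_2 - f_1} \times l = 48 - \dfrac{25 - 16}{29 - 16} \times 8 = 42.46$
where $M_u = 48$ is the upper limit of the median class.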
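Both interpolation formulas can be sketched in Python. The lists restate the class limits and frequencies of this example; the function names are illustrative, and the sketch assumes equal class lengths.

```python
from itertools import accumulate

lowers = [16, 24, 32, 40, 48, 56, 64]   # lower class limits
width = 8
freqs = [4, 8, 9, 13, 10, 5, 1]

def median_ascending(lowers, width, freqs):
    """M0 + (Pos - f1) / (f2 - f1) * l, using 'less than' cumulative frequencies."""
    pos = sum(freqs) / 2
    f1 = 0
    for lower, f2 in zip(lowers, accumulate(freqs)):
        if f2 >= pos:                   # first class whose CF reaches the position
            return lower + (pos - f1) / (f2 - f1) * width
        f1 = f2

def median_descending(lowers, width, freqs):
    """Mu - (Pos - f1) / (f2 - f1) * l, using 'or more' cumulative frequencies."""
    pos = sum(freqs) / 2
    uppers = [lower + width for lower in reversed(lowers)]
    f1 = 0
    for upper, f2 in zip(uppers, accumulate(reversed(freqs))):
        if f2 >= pos:
            return upper - (pos - f1) / (f2 - f1) * width
        f1 = f2

print(round(median_ascending(lowers, width, freqs), 2))   # 42.46
print(round(median_descending(lowers, width, freqs), 2))  # 42.46
```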
Median graphically
To find the median graphically, we carry out the
following:
$\text{Position of median} = \dfrac{\sum f}{2}$
Example 7:
Given the following frequency distribution:
Frequency 4 8 9 13 10 5 1
Solution:
Ascending cumulative frequency distribution table
Upper limits | CF
Less than 24 | 4
Less than 32 | 12
Less than 40 | 21
Less than 48 | 34
Less than 56 | 44
Less than 64 | 49
Less than 72 | 50
$\text{Position of median} = \dfrac{\sum f}{2} = \dfrac{50}{2} = 25$
[Figure: Ascending cumulative frequency curve, CF (vertical axis) against the upper class limits 24 to 72 (horizontal axis)]
Median = 42.46
Properties of Median:
1. Uniqueness: The median of a set of data is a unique
value.
2. Simplicity: The median is easy to obtain.
3. The median is not affected by extreme values.
4. The median does not use all information available, and
therefore it is usually less efficient than the mean
because it wastes information.
Quartiles
Firstly: Ungrouped Data
Example 8:
Given the following set of data:
11 9 14 8 15 19 21 24 17 30 27
Find
(a) First quartile.
(b) Third quartile.
(c) Quartile deviation.
Solution:
(a) First Quartile (Q1)
Ascending order of data:
8 9 (11) 14 15 17 19 21 (24) 27 30
$\text{Position of } Q_1 = \dfrac{n + 1}{4} = \dfrac{12}{4} = 3$, so $Q_1 = 11$.
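A minimal Python sketch of the (n + 1)/4 position rule used above. The linear interpolation for fractional positions is an assumption (the text only shows whole-number positions), and conventions for quartiles differ between books and software, so other tools may give slightly different values.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2 or 3) at position k(n + 1)/4 in the sorted data.
    Fractional positions are interpolated linearly (an assumption)."""
    s = sorted(values)
    pos = k * (len(s) + 1) / 4          # 1-based position
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return s[0]
    if whole >= len(s):
        return s[-1]
    return s[whole - 1] + frac * (s[whole] - s[whole - 1])

data = [11, 9, 14, 8, 15, 19, 21, 24, 17, 30, 27]
q1, q3 = quartile(data, 1), quartile(data, 3)
qd = (q3 - q1) / 2                       # quartile deviation
print(q1, q3, qd)                        # 11.0 24.0 6.5
```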
Profit | 5- | 10- | 15- | 20- | 25- | 30- | 35-40
Number of companies | 4 | 11 | 21 | 37 | 17 | 7 | 3
Compute:
(a) The first quartile.
(b) The third quartile.
(c) Quartile Deviation.
Solution:
(a) The first quartile
1. Ascending cumulative frequency distribution table
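Upper limits | CF
Less than 10 | 4
Less than 15 | 15 (f1)
Less than 20 | 36 (f2)
Less than 25 | 73
Less than 30 | 90
Less than 35 | 97
Less than 40 | 100
$\text{Position of } Q_1 = \dfrac{\sum f}{4} = \dfrac{100}{4} = 25$
$Q_1 = M_0 + \dfrac{\text{Pos.} - f_1}{f_2 - f_1} \times l = 15 + \dfrac{25 - 15}{36 - 15} \times 5 = 17.381$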
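(b) The third quartile
Upper limits | CF
Less than 10 | 4
Less than 15 | 15
Less than 20 | 36
Less than 25 | 73 (f1)
Less than 30 | 90 (f2)
Less than 35 | 97
Less than 40 | 100
$\text{Position of } Q_3 = \dfrac{3 \sum f}{4} = \dfrac{300}{4} = 75$
$Q_3 = M_0 + \dfrac{\text{Pos.} - f_1}{f_2 - f_1} \times l = 25 + \dfrac{75 - 73}{90 - 73} \times 5 = 25.588$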
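Lower limits | CF
5 or more | 100
10 or more | 96
15 or more | 85
20 or more | 64 (f2)
25 or more | 27 (f1)
30 or more | 10
35 or more | 3
$\text{Position of median} = \dfrac{\sum f}{2} = \dfrac{100}{2} = 50$
$\text{Median} = M_u - \dfrac{\text{Pos.} - f_1}{f_2 - f_1} \times l = 25 - \dfrac{50 - 27}{64 - 27} \times 5 = 21.892$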
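A minimal Python sketch that computes Q1, the median, Q3 and the quartile deviation of the profit distribution with one interpolation helper. The helper name `grouped_quantile` is illustrative; it assumes equal class lengths (here 5).

```python
from itertools import accumulate

def grouped_quantile(lowers, width, freqs, fraction):
    """Value below which `fraction` of the observations lie, interpolating linearly
    inside the class whose cumulative frequency first reaches fraction * sum(freqs)."""
    pos = fraction * sum(freqs)
    prev = 0
    for lower, cf in zip(lowers, accumulate(freqs)):
        if cf >= pos:
            return lower + (pos - prev) / (cf - prev) * width
        prev = cf
    raise ValueError("fraction must lie between 0 and 1")

lowers = [5, 10, 15, 20, 25, 30, 35]
freqs = [4, 11, 21, 37, 17, 7, 3]
q1 = grouped_quantile(lowers, 5, freqs, 0.25)
median = grouped_quantile(lowers, 5, freqs, 0.50)
q3 = grouped_quantile(lowers, 5, freqs, 0.75)
qd = (q3 - q1) / 2                       # quartile deviation
print(round(q1, 3), round(median, 3), round(q3, 3), round(qd, 3))
```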
Mode:
The mode of a set of measurements is the value that occurs
most frequently. If all measurements are different, then
the set has no mode. A set of data may have more than
one mode. If a set of data has two modes, it is called a
bimodal set. In addition, if the set of data has more than
two modes, it is called a multimodal set.
Firstly: Ungrouped Data
The mode of an ungrouped set of data is the most frequently
occurring value.
Example 10:
A set of data:
15, 19 , 21, 19, 16, 14
has one mode (19). It is called a unimodal set of data.
Example 11:
A set of data:
13, 12, 18, 17, 12, 18, 12
has one mode (12). Note that 12 occurs three times, more
often than 18, which occurs twice.
Example 12:
A set of data:
20, 12, 21, 20, 25, 28, 25
has two modes (20 and 25). It is called a bimodal set of
data because 20 and 25 occur with equal, highest frequency.
Example 13:
A set of data:
20, 18, 15, 12, 42, 30
has no mode because all values are different.
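The four examples can be checked with a short Python sketch. It follows the convention used here that a set in which every value occurs once has no mode; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """All values with the highest frequency; [] when every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                          # all values different: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([15, 19, 21, 19, 16, 14]))      # [19]
print(modes([13, 12, 18, 17, 12, 18, 12]))  # [12]
print(modes([20, 12, 21, 20, 25, 28, 25]))  # [20, 25]
print(modes([20, 18, 15, 12, 42, 30]))      # []
```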
Properties of Mode:
1. It is not unique.
2. It is easy to obtain.
Example 14:
Find the mode for the frequency distribution in
example (3).
Solution:
1. Determine the modal class (the class with the
largest frequency).
2. Determine the preceding and next classes, and put the
data in a table like the following one:
Class | Frequency
32 - 40 | 9
40 - 48 | 13 (modal class)
48 - 56 | 10
$\Delta_1 = 13 - 9 = 4, \quad \Delta_2 = 13 - 10 = 3$
$\text{Mode} = 40 + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times 8 = 40 + \dfrac{4}{4 + 3} \times 8 = 40 + 4.57 = 44.57$
Example 15:
Mode in the case of frequency distributions with unequal classes
If the frequency distribution has unequal classes, the
frequencies must first be adjusted in proportion to the
lengths of the classes, where
$\text{Modified frequency} = \dfrac{\text{frequency of the class}}{\text{length of the class}}$
Example 16:
Given the following frequency distribution:
Classes | 0- | 5- | 12- | 20- | 30- | 45-50
Frequency | 3 | 8 | 16 | 11 | 7 | 5
Classes | Modified f
5 - 12 | 8/7 = 1.143
12 - 20 | 16/8 = 2 (modal class)
20 - 30 | 11/10 = 1.1
$\Delta_1 = 2 - 1.143 = 0.857, \quad \Delta_2 = 2 - 1.1 = 0.9$
$\text{Mode} = 12 + \dfrac{0.857}{0.857 + 0.9} \times 8 = 15.9$
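Both grouped-mode examples can be handled by one Python sketch that always divides the frequencies by the class lengths; with equal class lengths this rescaling does not change the result, because only the ratio Δ1 / (Δ1 + Δ2) matters. Treating a missing neighbouring class (modal class at either end) as having frequency 0 is an assumption, and the name `grouped_mode` is illustrative.

```python
def grouped_mode(bounds, freqs):
    """Mode = L + d1 / (d1 + d2) * (length of the modal class).

    bounds lists the class boundaries (one more entry than freqs). Frequencies are
    divided by class length (the 'modified frequency') before the modal class is chosen.
    """
    lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    modified = [f / l for f, l in zip(freqs, lengths)]
    k = max(range(len(modified)), key=modified.__getitem__)   # modal class index
    before = modified[k - 1] if k > 0 else 0.0
    after = modified[k + 1] if k + 1 < len(modified) else 0.0
    d1, d2 = modified[k] - before, modified[k] - after
    return bounds[k] + d1 / (d1 + d2) * lengths[k]

equal = grouped_mode([16, 24, 32, 40, 48, 56, 64, 72], [4, 8, 9, 13, 10, 5, 1])
unequal = grouped_mode([0, 5, 12, 20, 30, 45, 50], [3, 8, 16, 11, 7, 5])
print(round(equal, 2), round(unequal, 2))  # 44.57 15.9
```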
Mode graphically
The mode can also be obtained graphically.
This will be demonstrated in the lecture.
Notes:
1. When the distribution has extreme values on one side (that is,
it is skewed), the median is the preferred measure of central location.
2. When the data are ranked, the median is preferred.
3. The mode is preferred in the case of qualitative data. For
example, suppose the severity of pain of patients seen in
a pain clinic is classified as no pain / mild / moderate /
severe / very severe; the mode can describe the most
frequent degree of severity of pain.
4. The mode is the only measure of location that can
be used for variables measured on a nominal scale, such as
sex.
EXERCISES 3
0 3
1 10
2 4
3 2
4 1
Find:
(a) The Arithmetic mean
(b) The Median.
(c) The Mode.
Number of 3 11 18 35 19 12 2
patients
Find :
(d) The Arithmetic mean
(e) The Median.
(f) The Mode.
CHAPTER 4
MEASURES OF DISPERSION & SKEWNESS
Range:
The first and simplest measure of dispersion is the
range. The range of a set of measurements is the
numerical difference between the largest and smallest
measurements.
Firstly: Ungrouped Data
Example 1:
For the set of data:
20, 12, 15, 17, 19
Solution:
Arrange the data in its ascending order:
12, 15, 17, 19, 20
Smallest value = 12
Largest value = 20
Range = largest value – smallest value
Range = 20 - 12 = 8
Secondly: Grouped Data
The range for the grouped data can be obtained as
the difference between the midpoint of the last class and
the midpoint of the first class.
Since the range depends only on the smallest and the
largest values and ignores most of the information,
it is seldom used analytically as a measure of
variation.
Mean Deviation:
The mean deviation or the average deviation takes
into account every observation and is therefore a better
measure of the variability of the data than the range.
We can calculate the mean deviation using the
formula
$\text{Mean Deviation} = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
where $|x_i - \bar{x}|$ is called the absolute value of the deviation of $x_i$ from the mean.
Example 2:
Compute the mean deviation of the following set of
observations:
10 12 18 20 25
Solution:
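x | x − x̄ | |x − x̄|
10 | −7 | 7
12 | −5 | 5
18 | 1 | 1
20 | 3 | 3
25 | 8 | 8
Total 85 | | 24
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{85}{5} = 17$
$\text{Mean Deviation} = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \dfrac{24}{5} = 4.8$
This means that, on average, the data points differ from
their mean by 4.8.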
Note:
As another method of obtaining the mean deviation, we
can use
$\dfrac{\sum_{i=1}^{n} |x_i - \text{median}|}{n}$ instead of $\dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$.
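Class | 10- | 20- | 30- | 40- | 50- | 60-70
Frequency | 7 | 11 | 19 | 15 | 9 | 4
Class | f | x | fx | |x − x̄| f
10- | 7 | 15 | 105 | 161.56
20- | 11 | 25 | 275 | 143.88
30- | 19 | 35 | 665 | 58.52
40- | 15 | 45 | 675 | 103.80
50- | 9 | 55 | 495 | 152.28
60-70 | 4 | 65 | 260 | 107.68
Total | 65 | | 2475 | 727.72
$\bar{x} = \dfrac{\sum f x}{\sum f} = \dfrac{2475}{65} = 38.08$
$\text{Mean Deviation} = \dfrac{\sum |x_i - \bar{x}| f}{\sum f} = \dfrac{727.72}{65} = 11.20$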
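The mean deviation for ungrouped and grouped data, including the median-based variant from the note above, can be sketched in Python. The function names are illustrative; the grouped version uses the unrounded mean, so it differs from 11.20 only in the third decimal.

```python
import statistics

def mean_deviation(values, center=None):
    """Average absolute deviation from `center` (the mean by default)."""
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

def grouped_mean_deviation(midpoints, freqs):
    """sum(|x - mean| * f) / sum(f) for a frequency distribution."""
    n = sum(freqs)
    x_bar = sum(x * f for x, f in zip(midpoints, freqs)) / n
    return sum(abs(x - x_bar) * f for x, f in zip(midpoints, freqs)) / n

data = [10, 12, 18, 20, 25]
md_mean = mean_deviation(data)                                    # about the mean: 4.8
md_median = mean_deviation(data, center=statistics.median(data))  # about the median (18)
md_grouped = grouped_mean_deviation([15, 25, 35, 45, 55, 65], [7, 11, 19, 15, 9, 4])
print(round(md_mean, 2), round(md_median, 2), round(md_grouped, 2))
```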
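$S^2 = \dfrac{\sum (x - \bar{x})^2}{n}$
and the standard deviation is the positive square root
of the variance, that is,
$S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
Equivalently,
$S^2 = \dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2$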
Example 3:
The following is the set of height measurements for
six students:
150, 165, 165, 180, 190, 170
Find the mean, variance and standard deviation.
Solution:
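x | x − x̄ | (x − x̄)²
150 | −20 | 400
165 | −5 | 25
165 | −5 | 25
180 | 10 | 100
190 | 20 | 400
170 | 0 | 0
Total 1020 | 0 | 950
The mean:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{1020}{6} = 170$
The variance:
$S^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{950}{6} = 158.33$
The standard deviation:
$S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{950}{6}} = \sqrt{158.33} = 12.58$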
Another Solution:
x | x²
150 | 22500
165 | 27225
165 | 27225
180 | 32400
190 | 36100
170 | 28900
Total 1020 | 174350
$S^2 = \dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2 = \dfrac{174350}{6} - \left(\dfrac{1020}{6}\right)^2 = 158.33$
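A short Python sketch confirms that the definition and the computational formula give the same variance for Example 3. Like the text, it divides by n (the population form); the sample variance that divides by n − 1 would give a slightly larger value.

```python
import math

def variance_definition(values):
    """S^2 = sum((x - mean)^2) / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n

def variance_computational(values):
    """S^2 = sum(x^2) / n - (sum(x) / n)^2."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

heights = [150, 165, 165, 180, 190, 170]     # Example 3
v1, v2 = variance_definition(heights), variance_computational(heights)
print(round(v1, 2), round(v2, 2), round(math.sqrt(v1), 2))  # 158.33 158.33 12.58
```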
Example 4:
Given the following set of data:
15 18 20 14 17 21 22 24 30 19
It is required to compute:
(a) The arithmetic mean.
(b) The variance.
(c) The standard deviation.
Solution:
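x | x − x̄ | (x − x̄)²
15 | −5 | 25
18 | −2 | 4
20 | 0 | 0
14 | −6 | 36
17 | −3 | 9
21 | 1 | 1
22 | 2 | 4
24 | 4 | 16
30 | 10 | 100
19 | −1 | 1
Total 200 | 0 | 196
The mean:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{200}{10} = 20$
The variance:
$S^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{196}{10} = 19.6$
The standard deviation:
$S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{196}{10}} = 4.43$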
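The First Method
$S^2 = \dfrac{\sum (x - \bar{x})^2 f}{\sum f}$
$S = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{\sum f}}$
The Second Method
By using the midpoints of the classes.
The Variance
$S^2 = \dfrac{\sum x^2 f}{\sum f} - \left(\dfrac{\sum x f}{\sum f}\right)^2$
$S = \sqrt{\dfrac{\sum x^2 f}{\sum f} - \left(\dfrac{\sum x f}{\sum f}\right)^2}$
The Third Method
By using the coded deviations $d' = (x - a) / l$.
The Variance
$S^2 = l^2 \left[\dfrac{\sum d'^2 f}{\sum f} - \left(\dfrac{\sum d' f}{\sum f}\right)^2\right]$
$S = l \sqrt{\dfrac{\sum d'^2 f}{\sum f} - \left(\dfrac{\sum d' f}{\sum f}\right)^2}$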
Example 5:
The following table presents the frequency distribution
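of the weekly wages of 100 workers in a certain company:
The mean:
$\bar{x} = \dfrac{\sum x f}{\sum f} = \dfrac{39700}{100} = 397$ LE
The Variance
$S^2 = \dfrac{\sum (x - \bar{x})^2 f}{\sum f} = \dfrac{2009100}{100} = 20091$ LE²
$S = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{\sum f}} = \sqrt{20091} = 141.74$ LE
classes | f | x | fx | fx²
The Variance
$S^2 = \dfrac{\sum x^2 f}{\sum f} - \left(\dfrac{\sum x f}{\sum f}\right)^2 = \dfrac{17770000}{100} - \left(\dfrac{39700}{100}\right)^2 = 20091$ LE²
$S = \sqrt{\dfrac{\sum x^2 f}{\sum f} - \left(\dfrac{\sum x f}{\sum f}\right)^2} = 141.74$ LE
Third Method
The Variance
$S^2 = l^2 \left[\dfrac{\sum d'^2 f}{\sum f} - \left(\dfrac{\sum d' f}{\sum f}\right)^2\right] = 10000 \left[\dfrac{223}{100} - \left(\dfrac{47}{100}\right)^2\right] = 20091$ LE²
$S = l \sqrt{\dfrac{\sum d'^2 f}{\sum f} - \left(\dfrac{\sum d' f}{\sum f}\right)^2} = \sqrt{20091} = 141.74$ LE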
Notes:
* $d = x - a$
* $d' = d / l$
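The three methods for grouped data can be compared with a Python sketch. Because the wage table of Example 5 is not reproduced here, the sketch uses the age distribution from Chapter 3 (midpoints 20 to 68, assumed mean a = 44, class length l = 8); all three functions should agree.

```python
def var_deviations(x, f):
    """First method: sum((x - mean)^2 f) / sum(f)."""
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n
    return sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n

def var_midpoints(x, f):
    """Second method: sum(x^2 f) / sum(f) - (sum(x f) / sum(f))^2."""
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n
    return sum(xi * xi * fi for xi, fi in zip(x, f)) / n - mean ** 2

def var_coded(x, f, a, l):
    """Third method: l^2 [sum(d'^2 f) / sum(f) - (sum(d' f) / sum(f))^2], d' = (x - a) / l."""
    n = sum(f)
    d = [(xi - a) / l for xi in x]
    mean_d = sum(di * fi for di, fi in zip(d, f)) / n
    return l ** 2 * (sum(di * di * fi for di, fi in zip(d, f)) / n - mean_d ** 2)

midpoints = [20, 28, 36, 44, 52, 60, 68]
freqs = [4, 8, 9, 13, 10, 5, 1]
results = [var_deviations(midpoints, freqs),
           var_midpoints(midpoints, freqs),
           var_coded(midpoints, freqs, a=44, l=8)]
print([round(v, 4) for v in results])  # [143.4624, 143.4624, 143.4624]
```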
Coefficient of Variation:
The coefficient of variation of a set of measurements
is the standard deviation of the measurements divided
by their mean. This measure is a relative one, and it
allows us to compare the relative variability of two
data sets, because it adjusts for differences in the
magnitudes of the means of the data sets. For example,
we may wish to know, for a certain population, whether
serum cholesterol levels, measured in mg per 100 ml, are
more variable than body weight, measured in pounds. To
make such a comparison, we use
$CV = \dfrac{S}{\bar{x}} \times 100$
Example 6:
Compute the coefficient of variation for the data of
Example 4.
Solution:
We have $\bar{x} = 20$ and $S = 4.43$, so
$CV = \dfrac{S}{\bar{x}} \times 100 = \dfrac{4.43}{20} \times 100 = 22.15\%$
Measures of Skewness:
Two characteristics of a distribution have been
studied, namely measures of central tendency and
measures of dispersion. Another characteristic that can be
measured is skewness. As we previously mentioned, a
symmetrical distribution has no skewness; that is, its
skewness is zero. Karl Pearson developed the coefficient
of skewness (SK) to measure the amount and direction
of skewness:
$SK = \dfrac{3(\text{mean} - \text{median})}{\sigma}$
where:
SK: coefficient of skewness,
$\sigma$: standard deviation,
and
$-3 \le SK \le 3$
Example 7:
The following is the number of patients admitted to the
University Hospital during one week:
Find:
(a) The arithmetic mean.
(b) The standard deviation.
(c) The median.
(d) Pearson's coefficient of skewness.
Solution:
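To calculate the mean and the standard deviation,
construct a table like the following:
X_i | X_i − X̄ | (X_i − X̄)²
10 | −2 | 4
5 | −7 | 49
2 | −10 | 100
15 | 3 | 9
9 | −3 | 9
20 | 8 | 64
23 | 11 | 121
Total 84 | 0 | 356
a. The arithmetic mean:
$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{84}{7} = 12$ patients
b. The standard deviation:
$\sigma = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n}} = \sqrt{\dfrac{356}{7}} = \sqrt{50.857} = 7.13 \approx 7$ patients
c. The median: in ascending order the data are 2, 5, 9, 10, 15, 20, 23, so the median (the 4th value) is 10 patients.
d. Pearson's coefficient of skewness:
$SK = \dfrac{3(\text{mean} - \text{median})}{\sigma} = \dfrac{3(12 - 10)}{7} = 0.857$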
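The coefficient of variation (Example 6) and Pearson's coefficient of skewness (Example 7) can be sketched in Python. The sketch uses the unrounded standard deviations, so it gives about 22.14% and 0.841 instead of the 22.15% and 0.857 obtained above with S = 4.43 and σ ≈ 7. The function names are illustrative.

```python
import math
import statistics

def population_sd(values):
    """sqrt(sum((x - mean)^2) / n), the standard deviation used in this chapter."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / n)

def coefficient_of_variation(values):
    """CV = S / mean * 100 (in percent)."""
    return population_sd(values) / (sum(values) / len(values)) * 100

def pearson_skewness(values):
    """SK = 3 (mean - median) / standard deviation."""
    x_bar = sum(values) / len(values)
    return 3 * (x_bar - statistics.median(values)) / population_sd(values)

example_4 = [15, 18, 20, 14, 17, 21, 22, 24, 30, 19]
admissions = [10, 5, 2, 15, 9, 20, 23]                  # Example 7
print(round(coefficient_of_variation(example_4), 2))   # 22.14
print(round(pearson_skewness(admissions), 3))          # 0.841
```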
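1. Variance: $\sigma^2 = \dfrac{\sum f (X_i - \bar{X})^2}{\sum f} = \dfrac{\sum f X_i^2}{\sum f} - \left(\dfrac{\sum f X_i}{\sum f}\right)^2$
2. Standard deviation: $\sigma = \sqrt{\dfrac{\sum f (X_i - \bar{X})^2}{\sum f}} = \sqrt{\dfrac{\sum f X_i^2}{\sum f} - \left(\dfrac{\sum f X_i}{\sum f}\right)^2}$
3. Coefficient of variation: $CV = \dfrac{\sigma}{\bar{X}} \times 100$
4. Coefficient of skewness: $SK = \dfrac{3(\text{mean} - \text{median})}{\sigma}$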
Example 8:
Find:
(a) The range (R).
(b) The mean deviation (MD).
(c) The variance (σ²).
(d) The standard deviation (σ).
(e) The coefficient of variation (CV).
(f) The coefficient of skewness (SK).
Solution:
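a. Range (R) = midpoint of last class − midpoint of first
class
= 11 − 1 = 10
b. Mean Deviation (MD):
Number of DMF | Number of children (f) | x | fx | |X − X̄| | f|X − X̄|
$\bar{X} = \dfrac{\sum f X}{\sum f} = \dfrac{594}{100} = 5.94$
$MD = \dfrac{\sum f |X - \bar{X}|}{\sum f} = \dfrac{239.64}{100} = 2.396$
c. The variance:
Number of DMF | Number of children (f) | x | fx | fx²
0- | 8 | 1 | 8 | 8
2- | 19 | 3 | 57 | 171
4- | 26 | 5 | 130 | 650
6- | 21 | 7 | 147 | 1029
8- | 17 | 9 | 153 | 1377
10-12 | 9 | 11 | 99 | 1089
Total | 100 | | 594 | 4324
$\sigma^2 = \dfrac{\sum f X_i^2}{\sum f} - \left(\dfrac{\sum f X_i}{\sum f}\right)^2 = \dfrac{4324}{100} - \left(\dfrac{594}{100}\right)^2 = 7.96$
d. The standard deviation:
$\sigma = \sqrt{7.96} = 2.82$
e. The coefficient of variation:
$CV = \dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{2.82}{5.94} \times 100 = 47.47\%$
The median:
$\text{Position of median} = \dfrac{\sum f}{2} = \dfrac{100}{2} = 50$
$\text{Median} = M_0 + \dfrac{\text{Pos.} - CF}{f} \times w = 4 + \dfrac{50 - 27}{26} \times 2 = 4 + 1.77 = 5.77$
where $M_0 = 4$ is the lower limit of the median class, CF = 27 is the cumulative frequency before it, f = 26 is its frequency and w = 2 is the class width.
f. The coefficient of skewness:
$SK = \dfrac{3(\text{mean} - \text{median})}{\sigma} = \dfrac{3(5.94 - 5.77)}{2.82} = \dfrac{3(0.17)}{2.82} = 0.18$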
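All six measures of Example 8 can be reproduced with one Python sketch. It assumes the classes 0-, 2-, ..., 10-12 of width 2 shown in the table and uses unrounded intermediate values, so the last digits may differ slightly from the hand calculation.

```python
import math
from itertools import accumulate

lowers = [0, 2, 4, 6, 8, 10]         # classes 0-, 2-, ..., 10-12
width = 2
freqs = [8, 19, 26, 21, 17, 9]
mid = [lo + width / 2 for lo in lowers]
n = sum(freqs)

mean = sum(x * f for x, f in zip(mid, freqs)) / n
value_range = mid[-1] - mid[0]        # midpoint of last class - midpoint of first class
md = sum(abs(x - mean) * f for x, f in zip(mid, freqs)) / n
var = sum(x * x * f for x, f in zip(mid, freqs)) / n - mean ** 2
sd = math.sqrt(var)
cv = sd / mean * 100

# Median: interpolate inside the class where the cumulative frequency reaches n / 2.
pos, cf_before = n / 2, 0
for lo, cf in zip(lowers, accumulate(freqs)):
    if cf >= pos:
        median = lo + (pos - cf_before) / (cf - cf_before) * width
        break
    cf_before = cf

sk = 3 * (mean - median) / sd
print(value_range, round(mean, 2), round(md, 3), round(var, 2), round(sd, 2),
      round(cv, 2), round(median, 2), round(sk, 2))
```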
EXERCISES 4
Number of patients 3 7 29 19 8 3 1
Find:
(e) The range.
(f) The mean deviation.
(g) The variance.
(h) The standard deviation.
(i) The coefficient of variation.
(j) Pearson's coefficient of skewness.
Chapter 5
Correlation & Regression
Firstly: Correlation
Coefficient of Linear Correlation
The coefficient of linear correlation measures the type
and degree of the relation between two variables; one
of them is called the independent variable (say x) and
the other is called the dependent variable (say y),
where the value of y depends on the value of x.
The coefficient of linear correlation between x and y can be
obtained by
$r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$
where $-1 \le r \le 1$.
* If $r = 1$, there is a perfect positive relation between x
and y.
* If $r = -1$, there is a perfect negative relation between
x and y.
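Example 1:
The income x and consumption y of six persons (in
thousands of pounds) are presented in the following table:
Income (x) | 4 | 6 | 7 | 9 | 11 | 13
Consumption (y) | 2 | 3 | 5 | 8 | 9 | 10
Compute the coefficient of linear correlation between x and
y. Comment on the result.
Solution:
x | y | xy | x² | y²
4 | 2 | 8 | 16 | 4
6 | 3 | 18 | 36 | 9
7 | 5 | 35 | 49 | 25
9 | 8 | 72 | 81 | 64
11 | 9 | 99 | 121 | 81
13 | 10 | 130 | 169 | 100
Total 50 | 37 | 362 | 472 | 283
$r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}} = \dfrac{6(362) - (50)(37)}{\sqrt{\left[6(472) - 50^2\right]\left[6(283) - 37^2\right]}}$
$= \dfrac{2172 - 1850}{\sqrt{[2832 - 2500][1698 - 1369]}} = \dfrac{322}{\sqrt{(332)(329)}} = 0.97$
Since r = 0.97 is close to 1, there is a strong positive relation between income and consumption.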
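The computational formula for r translates directly into Python. A minimal sketch, using the income and consumption data of Example 1; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """r = [n sum(xy) - sum(x) sum(y)] / sqrt([n sum(x^2) - (sum x)^2][n sum(y^2) - (sum y)^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

income = [4, 6, 7, 9, 11, 13]
consumption = [2, 3, 5, 8, 9, 10]
r = pearson_r(income, consumption)
print(round(r, 2))  # 0.97
```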
$\hat{Y} = a + bX$
where
$b = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$
and
$a = \dfrac{\sum y - b \sum x}{n}$
[Figures: three scatter diagrams of y against x]
[Figure: Scatter diagram of volume of sales against advertising cost]
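The scatter diagram shows that there is a positive relation between the
advertising cost and the volume of sales, so the regression
coefficient b will be positive, as we will see.
$n = 7, \quad \sum x = 25, \quad \sum y = 53, \quad \sum xy = 216, \quad \sum x^2 = 107, \quad \sum y^2 = 449$
$b = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} = \dfrac{7(216) - 25(53)}{7(107) - 25^2} = \dfrac{187}{124} = 1.508$
$a = \dfrac{\sum y - b \sum x}{n} = \dfrac{53 - 1.508(25)}{7} = 2.186$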
Example 3:
Compute the sum of the squares of the deviations of the original values
about the estimated values (the sum of squares of error):
$\sum e^2 = \sum (y - \hat{y})^2$
Solution:
$\hat{y} = 2.186 + 1.508x$
$\hat{y}_1 = 2.186 + 1.508(1) = 3.694$
$\hat{y}_2 = 2.186 + 1.508(2) = 5.202$
$\hat{y}_3 = 2.186 + 1.508(3) = 6.710$
$\hat{y}_4 = 2.186 + 1.508(4) = 8.218$
$\hat{y}_5 = 2.186 + 1.508(5) = 9.726$
$\hat{y}_6 = 2.186 + 1.508(6) = 11.234$
$\hat{y}_7 = 2.186 + 1.508(4) = 8.218$
x | y | ŷ | y − ŷ | (y − ŷ)²
1 | 3 | 3.694 | −0.694 | 0.481636
2 | 5 | 5.202 | −0.202 | 0.040804
3 | 8 | 6.710 | 1.290 | 1.6641
4 | 7 | 8.218 | −1.218 | 1.483524
5 | 9 | 9.726 | −0.726 | 0.527076
6 | 11 | 11.234 | −0.234 | 0.054756
4 | 10 | 8.218 | 1.782 | 3.175524
Total 25 | 53 | 53.002 | −0.002 | 7.42742
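The standard error of the estimate can be computed as
$S_{y/x} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$ or $S_{y/x} = \sqrt{\dfrac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$
Here,
$S_{y/x} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\dfrac{7.42742}{7 - 2}} = 1.2188$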
n2
Second Part
Probabilities
* Probability Theory.
* Some Discrete Distributions.
* Probability Density Functions.
* Some Continuous Distributions.
* Estimation Theory.
* Chi Square Test.
* Analysis of Variance.
CHAPTER 6
PROBABILITY THEORY
Experiment:
An experiment is a process that gives rise to an
outcome. Rolling a die and tossing a coin are examples
of experiments.
Outcome:
An outcome is a single possible result of an experiment.
Obtaining a 2 is an outcome when rolling a die.
Sample Space:
The sample space, denoted by S, is the set of all
possible outcomes of an experiment. When rolling a die,
the outcomes 1, 2, 3, 4, 5 and 6 dots showing form the
sample space.
Event:
An event is a collection of one or more specific
outcomes. Obtaining an even number of dots when
rolling a die is an example of an event. In this example,
the event comprises three outcomes.
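The probability of an event A is equal to the sum of
the probabilities assigned to the simple events contained
in A. In other words, when all outcomes are equally likely,
the probability of an event A, denoted by P(A), can be
obtained using the following law:
$P(A) = \dfrac{n(A)}{n(S)}$
where n(A) is the number of outcomes in A and n(S) is the number of outcomes in S.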
Solution:
The sample space: S = {H, T}
Example 2:
Tossing a balanced coin twice, write the sample space
of this experiment and find the probability of getting:
(a) Exactly two heads.
(b) No heads.
(c) Exactly one tail.
Solution:
The sample space:
S = {(H, H), (H,T), (T,H), (T,T)}
Suppose that:
A = Event of getting two heads.
B = Event of getting no heads.
C = Event of getting one tail
(a) A = {(H, H)}
$P(A) = \dfrac{n(A)}{n(S)} = \dfrac{1}{4}$
(b) B = {(T, T)}
$P(B) = \dfrac{n(B)}{n(S)} = \dfrac{1}{4}$
(c) C = {(H, T), (T, H)}
$P(C) = \dfrac{n(C)}{n(S)} = \dfrac{2}{4} = \dfrac{1}{2}$
Example 3:
Tossing a balanced coin three times, write the sample
space of this experiment and find the probability of:
(i) Getting exactly one head.
(ii) Getting at least one head.
(iii) Getting at most one head.
Solution:
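The sample space:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Let A, B and C be the events in (i), (ii) and (iii), respectively.
(i) A = {HTT, THT, TTH}
$P(A) = \dfrac{n(A)}{n(S)} = \dfrac{3}{8}$
(ii) B = {HHH, HHT, HTH, HTT, THH, THT, TTH}
$P(B) = \dfrac{n(B)}{n(S)} = \dfrac{7}{8}$
(iii) C = {HTT, THT, TTH, TTT}
$P(C) = \dfrac{n(C)}{n(S)} = \dfrac{4}{8} = \dfrac{1}{2}$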
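Enumerating the sample space with a short Python sketch reproduces these probabilities exactly; `itertools.product` builds all 2³ outcomes and `Fraction` keeps the answers as fractions. The helper `probability` assumes equally likely outcomes, as the law P(A) = n(A)/n(S) does.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(event) = n(event) / n(S), assuming equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Sample space of three tosses: ('H', 'H', 'H'), ('H', 'H', 'T'), ...
S = list(product("HT", repeat=3))

p_exactly_one = probability(lambda o: o.count("H") == 1, S)   # 3/8
p_at_least_one = probability(lambda o: o.count("H") >= 1, S)  # 7/8
p_at_most_one = probability(lambda o: o.count("H") <= 1, S)   # 1/2
print(len(S), p_exactly_one, p_at_least_one, p_at_most_one)  # 8 3/8 7/8 1/2
```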
P(A and B) = 0
And therefore
P(A or B) = P(A) + P(B) − P(A and B)
Conditional Probability:
Let A and B be two events such that P(B) ≠ 0. The
conditional probability that A occurs, given that B has
occurred, is
$P(A / B) = \dfrac{P(A \cap B)}{P(B)}$
Independent Events:
Two events are statistically independent if the
occurrence of either event has no effect on the probability
of the occurrence of the other. That is, two events A and
B are said to be independent if
P(A/B) = P(A) or P(B/A) = P(B)
Otherwise, the events A and B are said to be
dependent.
Complement Rule :
If A' is the event complementary to A, then
P (A') = 1 – P (A)
Example 4:
Let A and B be any two events where P(A) = 0.4 and
P(B) = 0.7. If you know that P(A ∩ B) = 0.24, find:
(a) P(A')
(b) P(A or B)
(c) Are A and B independent events or not?
(d) Are A and B mutually exclusive events or not?
Solution:
(a) P(A') = 1 − P(A)
= 1 − 0.4 = 0.6
(b) P(A or B) = P(A) + P(B) − P(A ∩ B)
= 0.40 + 0.70 − 0.24
= 1.10 − 0.24 = 0.86
(c) P(A) = 0.4, P(B) = 0.7
P(A) × P(B) = 0.4 × 0.7 = 0.28
P(A ∩ B) = 0.24
Since P(A ∩ B) ≠ P(A) × P(B), A and B are not
independent.
(d) P(A ∩ B) = 0.24 ≠ 0, so A and B are not mutually exclusive.
AIDS status | Test result: Positive | Negative | Total
Infected | 49 | 1 | 50
Not infected | 199 | 9751 | 9950
Total | 248 | 9752 | 10000
Find:
(a) P (infected).
(b) P ( Not infected).
(c) P (infected and positive).
(d) P (infected and negative).
(e) P (infected or positive).
(f) P (infected or negative).
(g) P (infected / positive).
(h) P (Negative / Not infected).
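Solution:
(a) P(infected) = $\dfrac{50}{10000} = 0.005$
(b) P(not infected) = $\dfrac{9950}{10000} = \dfrac{995}{1000} = 0.995$
(c) P(infected and positive) = $\dfrac{49}{10000} = 0.0049$
(d) P(infected and negative) = $\dfrac{1}{10000} = 0.0001$
(e) P(infected or positive) = P(infected) + P(positive) − P(infected and positive)
$= \dfrac{50}{10000} + \dfrac{248}{10000} - \dfrac{49}{10000} = \dfrac{249}{10000} = 0.0249$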
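All eight probabilities asked for in this example can be read off the two-way table with a small Python sketch; `Fraction` keeps the results exact. The helper `p` is illustrative: passing `None` for a margin means "any value", so `p("infected")` is a marginal and `p("infected", "positive")` a joint probability.

```python
from fractions import Fraction

table = {
    ("infected", "positive"): 49, ("infected", "negative"): 1,
    ("not infected", "positive"): 199, ("not infected", "negative"): 9751,
}
N = sum(table.values())  # 10000

def p(status=None, result=None):
    """Joint or marginal probability from the table; None means 'any value'."""
    count = sum(c for (s, r), c in table.items()
                if (status is None or s == status) and (result is None or r == result))
    return Fraction(count, N)

p_inf_or_pos = p("infected") + p(result="positive") - p("infected", "positive")      # (e)
p_inf_or_neg = p("infected") + p(result="negative") - p("infected", "negative")      # (f)
p_inf_given_pos = p("infected", "positive") / p(result="positive")                   # (g)
p_neg_given_not = p("not infected", "negative") / p("not infected")                  # (h)
print(float(p_inf_or_pos), float(p_inf_or_neg),
      round(float(p_inf_given_pos), 4), round(float(p_neg_given_not), 4))
```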
Example 6:
A dentist has 50 children under 12 years old in his
clinic. He counted the number of decayed, missing or
filled (DMF) teeth each child has and listed them in a table like
the following.
The probability distribution of number of DMF teeth per
child will be:
x_i | Frequency of occurrence of x_i | P(x_i)
0 | 1 | 1/50
1 | 4 | 4/50
2 | 6 | 6/50
3 | 4 | 4/50
4 | 9 | 9/50
5 | 10 | 10/50
6 | 7 | 7/50
7 | 4 | 4/50
8 | 2 | 2/50
9 | 2 | 2/50
10 | 1 | 1/50
Total | 50 | 1
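(d) $1 - \left(\dfrac{4}{50} + \dfrac{1}{50}\right) = 1 - \dfrac{5}{50} = \dfrac{45}{50} = \dfrac{9}{10}$
(e) $P(x \le 8) = 1 - P(x > 8)$
$= 1 - P(x = 9 \text{ or } x = 10)$
$= 1 - [P(x = 9) + P(x = 10)]$
$= 1 - \left(\dfrac{2}{50} + \dfrac{1}{50}\right) = 1 - \dfrac{3}{50} = \dfrac{47}{50}$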
Expected Value:
Given a discrete random variable X with values x_i that
occur with probabilities P(x_i), the expected value of X,
denoted by E(X), is:
$E(X) = \sum x_i P(x_i)$
Laws of expected value:
If X and Y are random variables and C is any constant,
the following identities hold.
1. E(C) = C
2. E(CX) = C E(X)
3. E(X + Y) = E(X) + E(Y)
E(X − Y) = E(X) − E(Y)
4. E(XY) = E(X) E(Y),
if X and Y are independent random variables.
Variance:
Let X be a discrete random variable with possible
values x_i that occur with probabilities P(x_i), and let E(X)
= μ. The variance of X, denoted by σ², is defined to be:
$\sigma^2 = E(X - \mu)^2 = E(X^2) - \mu^2 = \sum x_i^2 P(x_i) - \mu^2$
Laws of Variance:
If X and Y are random variables and C is a constant, the
following identities hold:
1. V(C) = 0
2. V(CX) = C² V(X)
3. V(X + C) = V(X)
4. V(X + Y) = V(X) + V(Y)
V(X − Y) = V(X) + V(Y),
if X and Y are independent.
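The definitions of E(X) and σ² can be applied to the DMF distribution of Example 6 with a short Python sketch; exact fractions avoid rounding, and the same dictionary also checks result (e) above.

```python
from fractions import Fraction

# Number of children (out of 50) with each number of DMF teeth.
freq = {0: 1, 1: 4, 2: 6, 3: 4, 4: 9, 5: 10, 6: 7, 7: 4, 8: 2, 9: 2, 10: 1}
n = sum(freq.values())
pmf = {x: Fraction(f, n) for x, f in freq.items()}

mu = sum(x * p for x, p in pmf.items())                 # E(X) = sum x P(x)
var = sum(x * x * p for x, p in pmf.items()) - mu ** 2  # E(X^2) - mu^2
p_at_most_8 = sum(p for x, p in pmf.items() if x <= 8)

print(float(mu), float(var), p_at_most_8)  # 4.56 5.1264 47/50
```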
EXERCISES 6
1. If you roll two balanced dice, what is the probability
of obtaining:
(a) A total of 3 dots?
(b) A total of 6 dots?
(c) A total of 10 dots?
x 0 1 2 3 4 5
P (x) .50 .21 .19 .09 .02 .01
x -10 -5 0 5 10
P (x) .01 .02 .02 .02 .03
CHAPTER 7
Some Discrete Probability Distributions
Binomial Distribution
The binomial distribution is one of the most important
discrete distributions, since many statistical
phenomena follow it. Any random experiment that
consists of n independent trials, each of which has two
outcomes (a success, whose probability p is fixed from
one trial to another, and a failure), follows the binomial
distribution. The probability of exactly r successes is
$P(X = r) = C^n_r \, p^r (1 - p)^{n - r} = \dfrac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$
where $r = 0, 1, 2, \ldots, n$ and $0 \le p \le 1$.
Example 1:
A balanced coin is tossed three consecutive times. If the
random variable X is defined as the number of heads
appearing in the outcomes, find the probability
distribution of X using:
(a) The sample space of this experiment.
(b) The binomial distribution function.
(c) Graph the probability distribution of X.
Solution:
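(a) The Probability Distribution of X Using the Sample
Space
Outcome | X
HHH | 3
HHT | 2
HTH | 2
HTT | 1
THH | 2
THT | 1
TTH | 1
TTT | 0
X | 0 | 1 | 2 | 3
P(X) | 1/8 | 3/8 | 3/8 | 1/8
(b) The Probability Distribution of X Using the Binomial Distribution Function (n = 3, p = 1/2)
$P(X = 0) = \dfrac{3!}{0!\,3!} \left(\dfrac{1}{2}\right)^0 \left(\dfrac{1}{2}\right)^3 = \dfrac{1}{8}$
$P(X = 1) = \dfrac{3!}{1!\,2!} \left(\dfrac{1}{2}\right)^1 \left(\dfrac{1}{2}\right)^2 = \dfrac{3}{8}$
$P(X = 2) = \dfrac{3!}{2!\,1!} \left(\dfrac{1}{2}\right)^2 \left(\dfrac{1}{2}\right)^1 = \dfrac{3}{8}$
$P(X = 3) = \dfrac{3!}{3!\,0!} \left(\dfrac{1}{2}\right)^3 \left(\dfrac{1}{2}\right)^0 = \dfrac{1}{8}$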
[Figure: (c) Probability distribution of X, P(X) against X = 0, 1, 2, 3, with heights 1/8, 3/8, 3/8, 1/8]
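The binomial formula can be evaluated exactly with a short Python sketch: `math.comb(n, r)` gives the number of combinations and `Fraction` keeps the probabilities as fractions. With n = 3 and p = 1/2 it reproduces the distribution obtained from the sample space.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) p^r (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Three tosses of a balanced coin: n = 3, p = 1/2.
dist = [binomial_pmf(r, 3, Fraction(1, 2)) for r in range(4)]
print([str(p) for p in dist])  # ['1/8', '3/8', '3/8', '1/8']
```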
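The mean of the binomial distribution:
$E(X) = \sum_{x=0}^{n} x \, \dfrac{n!}{x!\,(n - x)!} \, p^x (1 - p)^{n - x}$
$= \sum_{x=1}^{n} \dfrac{n\,(n - 1)!}{(x - 1)!\,(n - x)!} \, p^x (1 - p)^{n - x}$
$= n p \sum_{x=1}^{n} \dfrac{(n - 1)!}{(x - 1)!\,(n - x)!} \, p^{x - 1} (1 - p)^{n - x}$
$= n p \sum_{x=1}^{n} \binom{n - 1}{x - 1} p^{x - 1} (1 - p)^{n - x}$
$= n p \, [p + (1 - p)]^{n - 1} = n p$
Lemma:
$\sum_{x=0}^{n} \binom{n}{x} a^x b^{n - x} = (a + b)^n$
The variance of the binomial distribution:
Proof:
$V(X) = E(X - \mu)^2 = E(X^2) - \mu^2 \qquad (1)$
$E(X^2) = E[X(X - 1) + X] = E[X(X - 1)] + E(X) \qquad (2)$
$E[X(X - 1)] = \sum_{x=0}^{n} x(x - 1) P(x) = \sum_{x=0}^{n} x(x - 1) \dfrac{n!}{x!\,(n - x)!} \, p^x (1 - p)^{n - x}$
$= n(n - 1) \sum_{x=2}^{n} \dfrac{(n - 2)!}{(x - 2)!\,(n - x)!} \, p^x (1 - p)^{n - x}$
$= n(n - 1) p^2 \sum_{x=2}^{n} \dfrac{(n - 2)!}{(x - 2)!\,(n - x)!} \, p^{x - 2} (1 - p)^{n - x}$
Putting $y = x - 2$:
$E[X(X - 1)] = n(n - 1) p^2 \sum_{y=0}^{n - 2} \binom{n - 2}{y} p^y (1 - p)^{n - 2 - y} = n(n - 1) p^2 \, [p + (1 - p)]^{n - 2} = n(n - 1) p^2$
From (2), we have
$E(X^2) = n(n - 1) p^2 + n p = n p \, [(n - 1) p + 1]$
From (1), we have
$V(X) = E(X^2) - \mu^2 = n(n - 1) p^2 + n p - (n p)^2$
$= n p \, [(n - 1) p + 1 - n p] = n p \, [n p - p + 1 - n p] = n p (1 - p)$
Since the standard deviation is the positive square root
of the variance, then
$\text{Std}(X) = \sqrt{n p (1 - p)}$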
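The results E(X) = np and V(X) = np(1 − p) can be checked numerically with a Python sketch that computes both moments directly from the binomial probabilities. The values n = 7 and p = 3/10 are arbitrary illustrative choices; with exact fractions the two sides agree exactly.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Mean and variance computed from the definitions E(X) and E(X^2) - [E(X)]^2."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2
    return mean, var

n, p = 7, Fraction(3, 10)   # arbitrary illustrative values
mean, var = binomial_moments(n, p)
print(mean == n * p, var == n * p * (1 - p))  # True True
```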
Example 2:
In some factory, if the percentage of defective units was
15 %. If a sample of five units was selected from this
factory production, compute:
(a) The probability that this sample contains exactly a
defective unit.
(b) The probability that this sample has no defective units.
(c) The probability that this sample contains at least a
defective unit.
(d) The probability that this sample contains at most a
defective unit.
(e) The expected value and the standard deviation of the
number of defective units.
Solution:
$$p = 0.15, \qquad 1 - p = 0.85, \qquad n = 5 \text{ units}$$
(b) $$P(X=0) = \frac{5!}{0!\,5!}(0.15)^0(0.85)^5 = 0.444$$
(c) $$P(X \ge 1) = 1 - P(X=0) = 1 - 0.444 = 0.556$$
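The worked solution covers parts (b) and (c); the same binomial formula handles every part. A minimal Python sketch (the helper `P` is an illustrative name) with p = 0.15 and n = 5 as in the solution:

```python
from math import comb, sqrt

n, p = 5, 0.15

def P(x):
    """Binomial probability of exactly x defective units in the sample."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

exactly_one = P(1)             # (a)
none = P(0)                    # (b)
at_least_one = 1 - P(0)        # (c)
at_most_one = P(0) + P(1)      # (d)
mean = n * p                   # (e) expected number of defective units
std = sqrt(n * p * (1 - p))    # (e) standard deviation

for label, value in [("a", exactly_one), ("b", none), ("c", at_least_one),
                     ("d", at_most_one), ("mean", mean), ("std", std)]:
    print(label, round(value, 4))
```

Rounded to three decimals this gives about 0.392 for (a), 0.835 for (d), and a mean of 0.75 with a standard deviation of about 0.798 for (e).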
Example 3:
An urn contains five white balls and three red balls. If
three balls are drawn consecutively with replacement,
compute:
(a) The probability of getting exactly one white ball.
(b) The probability of getting no white ball.
(c) The probability of getting at least two white balls.
(d) The mean, variance, standard deviation and
coefficient of variation of the number of white balls selected.
Solution:
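Since the balls are drawn with replacement, the draws are
independent trials, and the probability of getting a
white ball stays fixed from one trial to the next. That is, the
binomial distribution function can be applied to compute
the required probabilities.
$$n = 3, \qquad p = \frac{5}{8}, \qquad 1 - p = \frac{3}{8}$$
(a) The probability of getting exactly one white ball.
$$P(X=1) = \frac{3!}{1!\,2!}\left(\frac{5}{8}\right)\left(\frac{3}{8}\right)^2 = 0.264$$
(b) The probability of getting no white ball.
$$P(X=0) = \left(\frac{3}{8}\right)^3 = 0.053$$
(c) The probability of getting at least two white balls.
$$P(X \ge 2) = 1 - P(X=0) - P(X=1) = 1 - 0.053 - 0.264 = 0.683$$
Another Solution:
$$P(X \ge 2) = P(X=2) + P(X=3)$$
Try to complete!
(d)
* The mean
$$E(X) = n p = 3 \times \frac{5}{8} = 1.875$$
* The variance
$$V(X) = n p(1-p) = 3 \times \frac{5}{8} \times \frac{3}{8} = 0.703$$
* The standard deviation
$$\sigma = \sqrt{0.703} \approx 0.839$$
* The coefficient of variation
$$CV = \frac{\sigma}{\mu} \times 100\% = \frac{0.839}{1.875} \times 100\% \approx 44.7\%$$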
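The numbers in part (d) can be reproduced exactly with Python's `fractions` module. This is a minimal sketch; the helper `P` is an illustrative name, and the coefficient of variation is taken as σ/μ × 100%.

```python
from fractions import Fraction
from math import comb, sqrt

n, p = 3, Fraction(5, 8)   # three draws with replacement, P(white) = 5/8

def P(x):
    """Binomial probability of exactly x white balls."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

mean = n * p                          # 15/8 = 1.875
variance = n * p * (1 - p)            # 45/64, about 0.703
std = sqrt(float(variance))           # about 0.839
cv = std / float(mean) * 100          # coefficient of variation in percent

print(P(1), P(0), 1 - P(0) - P(1))
print(float(mean), float(variance), round(std, 4), round(cv, 1))
```

Keeping p as an exact fraction shows that the variance is 45/64, so the arithmetic error 3 × 5/8 × 3/8 ≠ 5.625 is easy to spot.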
Bernoulli Distribution
The Bernoulli distribution is a special case of the binomial
distribution in which the number of independent trials
equals one. That is, if the random variable X has a
Bernoulli distribution, the probability distribution of X
is
$$P(X=x) = p^x(1-p)^{1-x}, \qquad x = 0, 1$$
$$E(X) = \sum_{x=0}^{1} x\,P(x) = \sum_{x=0}^{1} x\,p^x(1-p)^{1-x} = p$$
$$E(X^2) = \sum_{x=0}^{1} x^2\,p^x(1-p)^{1-x} = p$$
$$V(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p)$$
Poisson Distribution
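The Poisson distribution is one of the important discrete
distributions; it describes many practical
phenomena in our lives that are characterized by rareness and
independence, such as:
(a) The number of mistakes in a book.
(b) The number of accidents on a highway in a day.
(c) The number of patients who arrive at a clinic in
a day.
(d) The number of cars that arrive at a service station in
an hour.
(e) The number of calls arriving at an academy's switchboard.
The Poisson probability function with parameter λ,
$$P(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$
satisfies:
1. $P(x) \ge 0$
2. $\sum_{x=0}^{\infty} P(x) = 1$
Proof:
$$\sum_{x=0}^{\infty} \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda}\left[1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right] = e^{-\lambda}e^{\lambda} = 1$$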
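$$E(X) = \sum_{x=0}^{\infty} x\,P(x) = \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty} x\,\frac{\lambda^x}{x!}$$
$$= \lambda e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda}e^{\lambda} = \lambda$$
That is, the mean of the Poisson distribution equals the
parameter λ of the distribution.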
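$$V(X) = E(X^2) - [E(X)]^2$$
$$E(X^2) = \sum_{x=0}^{\infty} x^2\,P(x) = \sum_{x=0}^{\infty} x^2\,\frac{e^{-\lambda}\lambda^x}{x!}$$
$$= e^{-\lambda}\sum_{x=0}^{\infty} \frac{(x^2 - x + x)\,\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty} \frac{[x(x-1) + x]\,\lambda^x}{x!}$$
$$= e^{-\lambda}\left[\sum_{x=0}^{\infty} x(x-1)\frac{\lambda^x}{x!} + \sum_{x=0}^{\infty} x\,\frac{\lambda^x}{x!}\right]$$
$$= e^{-\lambda}\left[\lambda^2\sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} + \lambda\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}\right]$$
$$= e^{-\lambda}\left[\lambda^2 e^{\lambda} + \lambda e^{\lambda}\right] = \lambda^2 + \lambda$$
The variance of the Poisson distribution will be
$$V(X) = \lambda^2 + \lambda - \lambda^2 = \lambda$$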
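The three results (total probability 1, mean λ, variance λ) can be checked numerically by truncating the infinite sums. This is a minimal Python sketch; λ = 2.5 and the 100-term cutoff are illustrative assumptions (the neglected tail is far below floating-point precision for this λ).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam**x / x!"""
    return exp(-lam) * lam**x / factorial(x)

def poisson_moments(lam, terms=100):
    """Truncated sums for total probability, E(X) and V(X)."""
    probs = [poisson_pmf(x, lam) for x in range(terms)]
    total = sum(probs)
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum(x * x * px for x, px in enumerate(probs)) - mean**2
    return total, mean, variance

total, mean, variance = poisson_moments(2.5)
print(total, mean, variance)   # each close to 1, 2.5 and 2.5
```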
Example 4:
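If the average number of accidents on a certain highway is
4 accidents every two weeks, compute the probability of the
occurrence of:
(a) One accident within a week.
(b) No accidents within a week.
(c) At most two accidents within a week.
(d) At least two accidents within a week.
Solution:
The average number of accidents per week is λ = 4/2 = 2.
(a) $$P(X=1) = \frac{e^{-2}\,2^1}{1!} = 0.271$$
(b) $$P(X=0) = \frac{e^{-2}\,2^0}{0!} = 0.135$$
(c) $$P(X \le 2) = P(X=0) + P(X=1) + P(X=2)$$
$$P(X=0) = 0.135, \qquad P(X=1) = 0.271, \qquad P(X=2) = \frac{e^{-2}\,2^2}{2!} = 0.271$$
$$P(X \le 2) = 0.135 + 0.271 + 0.271 = 0.677$$
(d) $$P(X \ge 2) = 1 - P(X < 2) = 1 - [P(X=0) + P(X=1)] = 1 - 0.406 = 0.594$$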
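The four probabilities follow from the Poisson formula with λ = 2 accidents per week. In this minimal Python sketch the helper `P` is an illustrative name, and parts (c) and (d) use the same sums as the worked solution.

```python
from math import exp, factorial

lam = 4 / 2   # 4 accidents per two weeks -> 2 accidents per week

def P(x):
    """Poisson probability of exactly x accidents in one week."""
    return exp(-lam) * lam**x / factorial(x)

a = P(1)
b = P(0)
c = P(0) + P(1) + P(2)     # P(X <= 2)
d = 1 - (P(0) + P(1))      # P(X >= 2)
print(round(a, 3), round(b, 3), round(c, 3), round(d, 3))
```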
Example 5:
Patients arrive at a certain clinic at a rate of 12 patients
per day. Compute the probability of the arrival of:
(a) three patients within an hour.
(b) five patients within 3 hours.
(c) four patients within half a day.
Solution:
(a) The average number of patients within an hour is
$$\lambda = \frac{12}{24} = 0.5, \qquad P(X=3) = \frac{e^{-0.5}(0.5)^3}{3!} = 0.01264$$
(b) Within 3 hours, λ = 3 × 0.5 = 1.5:
$$P(X=5) = \frac{e^{-1.5}(1.5)^5}{5!} = 0.01412$$
(c) Within half a day, λ = 12/2 = 6:
$$P(X=4) = \frac{e^{-6}(6)^4}{4!} = 0.13385$$
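The key step in this example is scaling the Poisson mean to the length of each interval. This minimal Python sketch follows the solution in treating a day as 24 hours; `RATE_PER_HOUR` and `poisson_pmf` are illustrative names.

```python
from math import exp, factorial

RATE_PER_HOUR = 12 / 24   # 12 patients per 24-hour day

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam**x / x!"""
    return exp(-lam) * lam**x / factorial(x)

for hours, x in [(1, 3), (3, 5), (12, 4)]:
    lam = RATE_PER_HOUR * hours   # mean number of arrivals in this interval
    print(hours, x, lam, round(poisson_pmf(x, lam), 5))
```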
Example 6:
During a production inspection in a certain factory, it was
found that every 250 units contain 3 defective units. A
sample of 250 units is randomly selected; compute the
probability of getting:
(a) no defective units in the sample.
(b) one defective unit in the sample.
(c) two defective units in the sample.
(d) at least three defective units in the sample.
(e) more than one defective unit.
Solution:
The number of defective units is a binomial random variable
with p = 3/250 = 0.012. Since p is small (p < 0.05) and n is large
(n > 25), the binomial distribution can be approximated by the
Poisson distribution with λ = np = 250 × 0.012 = 3.
(a) The probability of no defective units
$$P(X=0) = \frac{e^{-3}(3)^0}{0!} = 0.0498$$
(b) The probability of one defective unit
$$P(X=1) = \frac{e^{-3}(3)^1}{1!} = 0.1494$$
(c) The probability of two defective units
$$P(X=2) = \frac{e^{-3}(3)^2}{2!} = 0.2240$$
(d) The probability of getting at least 3 defective units
$$P(X \ge 3) = 1 - P(X < 3) = 1 - [P(X=0) + P(X=1) + P(X=2)] = 1 - 0.4232 = 0.5768$$
(e) The probability of getting more than one defective unit
$$P(X > 1) = 1 - P(X \le 1) = 1 - [P(X=0) + P(X=1)] = 1 - [0.0498 + 0.1494] = 1 - 0.1992 = 0.8008$$
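All five answers come from the Poisson formula with the mean λ = 3 used in the solution. A minimal Python sketch (the helper `P` is an illustrative name):

```python
from math import exp, factorial

lam = 3   # mean number of defective units, as in the solution

def P(x):
    """Poisson probability of exactly x defective units."""
    return exp(-lam) * lam**x / factorial(x)

answers = {
    "a": P(0),
    "b": P(1),
    "c": P(2),
    "d": 1 - (P(0) + P(1) + P(2)),   # at least three
    "e": 1 - (P(0) + P(1)),          # more than one
}
for part, value in answers.items():
    print(part, round(value, 4))
```

Without rounding the intermediate terms, part (e) comes out as 0.8009 rather than 0.8008; the difference is only rounding.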
Chapter 8
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a \le X \le b) = \int_a^b f(x)\,dx$
Notes:
Example 1:
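$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Solution:
a. 1. f(0) = 1 ≥ 0, f(1) = 1 ≥ 0
2. $\int_0^1 f(x)\,dx = [x]_0^1 = 1 - 0 = 1$
b. The Mean
$$E(X) = \int_0^1 x\,f(x)\,dx = \frac{1}{2}\left[x^2\right]_0^1 = \frac{1}{2}(1 - 0) = \frac{1}{2}$$
The Variance
$$\sigma^2 = E(X^2) - (E(X))^2 = \int_0^1 x^2 f(x)\,dx - \left(\frac{1}{2}\right)^2 = \frac{1}{3}\left[x^3\right]_0^1 - \frac{1}{4} = \frac{1}{3}(1 - 0) - \frac{1}{4} = \frac{1}{12}$$
$$\sigma = \sqrt{\frac{1}{12}} = 0.29$$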
Example 2:
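$$f(x) = \begin{cases} 3x^2 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
d. compute P(X > 0.7)
Solution:
a. 1. f(0) = 3(0)^2 = 0 ≥ 0, f(1) = 3(1)^2 = 3 ≥ 0
2. $\int_0^1 f(x)\,dx = \int_0^1 3x^2\,dx = [x^3]_0^1 = 1 - 0 = 1$
b. The Mean
$$E(X) = \int_0^1 x\,f(x)\,dx = \int_0^1 x \cdot 3x^2\,dx = 3\int_0^1 x^3\,dx = \frac{3}{4}[x^4]_0^1 = \frac{3}{4}(1 - 0) = \frac{3}{4}$$
The Variance
$$\sigma^2 = E(X^2) - (E(X))^2 = \int_0^1 x^2 f(x)\,dx - \frac{9}{16} = \int_0^1 x^2 \cdot 3x^2\,dx - \frac{9}{16} = 3\int_0^1 x^4\,dx - \frac{9}{16}$$
$$= \frac{3}{5}[x^5]_0^1 - \frac{9}{16} = \frac{3}{5} - \frac{9}{16} = \frac{3}{80}$$
$$\sigma = \sqrt{\frac{3}{80}} = 0.19$$
c. $$P(0.2 < X < 0.5) = \int_{0.2}^{0.5} f(x)\,dx = \int_{0.2}^{0.5} 3x^2\,dx = [x^3]_{0.2}^{0.5} = 0.125 - 0.008 = 0.117$$
d. $$P(X > 0.7) = \int_{0.7}^{\infty} f(x)\,dx = \int_{0.7}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx = \int_{0.7}^{1} 3x^2\,dx + 0$$
$$= [x^3]_{0.7}^{1} = (1)^3 - (0.7)^3 = 1 - 0.343 = 0.657$$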
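Each integral in this example can be checked numerically. The sketch below uses the midpoint rule, a simple numerical integration method chosen here for illustration (it is not part of the text); `integrate` and `pdf` are illustrative names.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    """f(x) = 3x^2 on [0, 1]; zero elsewhere."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0

total = integrate(pdf, 0, 1)                                 # condition 2
mean = integrate(lambda x: x * pdf(x), 0, 1)                 # E(X)
variance = integrate(lambda x: x**2 * pdf(x), 0, 1) - mean**2
p_c = integrate(pdf, 0.2, 0.5)                               # P(0.2 < X < 0.5)
p_d = integrate(pdf, 0.7, 1)                                 # P(X > 0.7)
print(round(total, 6), round(mean, 6), round(variance, 6), round(p_c, 6), round(p_d, 6))
```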
Example 3:
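$$f(x) = \begin{cases} x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
then compute:
b. P(X < 0.4).
Solution:
First condition:
f(0) = 0 ≥ 0, f(1) = 1 ≥ 0
Second condition:
$$\int_0^1 f(x)\,dx = \int_0^1 x\,dx = \frac{1}{2}[x^2]_0^1 = \frac{1}{2}(1 - 0) = \frac{1}{2} \ne 1$$
Since the second condition fails, f(x) must be multiplied by a constant a such that
$$\int_0^1 a\,f(x)\,dx = \int_0^1 a\,x\,dx = 1$$
$$\frac{a}{2}[x^2]_0^1 = 1 \;\Rightarrow\; \frac{a}{2}(1 - 0) = 1 \;\Rightarrow\; a = 2$$
so that
$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
a. The mean
$$E(X) = \int_0^1 x\,f(x)\,dx = \int_0^1 2x^2\,dx = \frac{2}{3}[x^3]_0^1 = \frac{2}{3}(1 - 0) = \frac{2}{3}$$
The variance
$$E(X^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 2x^3\,dx = \frac{1}{2}[x^4]_0^1 = \frac{1}{2}(1 - 0) = \frac{1}{2}$$
$$\sigma^2 = E(X^2) - (E(X))^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{2} - \frac{4}{9} = \frac{1}{18}$$
$$\sigma = \sqrt{\frac{1}{18}} = 0.24$$
b. $$P(X < 0.4) = \int_{-\infty}^{0.4} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_0^{0.4} f(x)\,dx = 0 + \int_0^{0.4} 2x\,dx = [x^2]_0^{0.4} = (0.4)^2 - (0)^2 = 0.16$$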
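Because every density here is a power of x, the integrals can be done exactly with the power rule. This minimal Python sketch uses `fractions.Fraction`; the helper `integral_of_power` is an illustrative name, and 0.4 is entered as 2/5 to keep the arithmetic exact.

```python
from fractions import Fraction

def integral_of_power(k, lower, upper):
    """Exact value of the integral of x**k from lower to upper (power rule)."""
    lower, upper = Fraction(lower), Fraction(upper)
    return (upper**(k + 1) - lower**(k + 1)) / (k + 1)

# f(x) = a*x on [0, 1]; choose a so that the total area equals 1
a = 1 / integral_of_power(1, 0, 1)                         # 1 / (1/2) = 2
mean = a * integral_of_power(2, 0, 1)                      # E(X) = 2/3
second_moment = a * integral_of_power(3, 0, 1)             # E(X^2) = 1/2
variance = second_moment - mean**2                         # 1/18
p_below = a * integral_of_power(1, 0, Fraction(2, 5))      # P(X < 0.4)
print(a, mean, variance, p_below)
```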
Example 4:
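$$f(x) = \begin{cases} a e^{-x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
c. P(0.2 < X < 0.7).
d. P(−10 < X < 1).
Solution:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \;\Rightarrow\; \int_0^{\infty} a e^{-x}\,dx = 1$$
$$a\left[-e^{-x}\right]_0^{\infty} = 1 \;\Rightarrow\; -a\left(e^{-\infty} - e^{0}\right) = 1 \;\Rightarrow\; -a(0 - 1) = 1 \;\Rightarrow\; a = 1$$
$$f(x) = \begin{cases} e^{-x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
b. The mean (integrating by parts)
$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_0^{\infty} x e^{-x}\,dx = \left[-x e^{-x}\right]_0^{\infty} + \int_0^{\infty} e^{-x}\,dx = 0 + \left[-e^{-x}\right]_0^{\infty} = 0 - (0 - 1) = 1$$
The variance
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^{\infty} x^2 e^{-x}\,dx = \left[-x^2 e^{-x}\right]_0^{\infty} + 2\int_0^{\infty} x e^{-x}\,dx$$
$$= \left[-x^2 e^{-x}\right]_0^{\infty} + 2\left\{\left[-x e^{-x}\right]_0^{\infty} + \int_0^{\infty} e^{-x}\,dx\right\} = \left[-x^2 e^{-x}\right]_0^{\infty} + 2\left\{\left[-x e^{-x}\right]_0^{\infty} + \left[-e^{-x}\right]_0^{\infty}\right\}$$
Try to complete!
c. $$P(0.2 < X < 0.7) = \int_{0.2}^{0.7} f(x)\,dx = \int_{0.2}^{0.7} e^{-x}\,dx = e^{-0.2} - e^{-0.7} \approx 0.322$$
d. $$P(-10 < X < 1) = \int_{-10}^{1} f(x)\,dx = \int_{-10}^{0} f(x)\,dx + \int_0^{1} f(x)\,dx = 0 + \int_0^{1} e^{-x}\,dx$$
$$= \left[-e^{-x}\right]_0^{1} = -(e^{-1} - e^{0}) = -(0.3679 - 1) = 0.6321$$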
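A numerical check of this example, including the "Try to complete!" step, can be sketched in Python with the midpoint rule (a method chosen here for illustration). The cutoff at x = 50 standing in for infinity is an assumption; the neglected tail is of order e^(−50).

```python
from math import exp

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    """f(x) = e^(-x) for x >= 0 (the density with a = 1), zero otherwise."""
    return exp(-x) if x >= 0 else 0.0

UPPER = 50.0   # stands in for infinity

area = integrate(pdf, 0, UPPER)                                  # close to 1
mean = integrate(lambda x: x * pdf(x), 0, UPPER)                 # E(X)
second_moment = integrate(lambda x: x * x * pdf(x), 0, UPPER)    # E(X^2)
variance = second_moment - mean**2
p_c = exp(-0.2) - exp(-0.7)        # part c, from the antiderivative -e^(-x)
p_d = integrate(pdf, -10, 1)       # part d; the density is zero on [-10, 0)
print(round(area, 6), round(mean, 6), round(second_moment, 6), round(variance, 6))
print(round(p_c, 4), round(p_d, 4))
```

The numbers suggest E(X²) = 2, so the derivation left as an exercise should end with V(X) = 2 − 1² = 1.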
Example 5:
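$$f(x) = \begin{cases} a e^{-3x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
c. P(0.3 < X < 0.5).
d. P(−1 < X < 1).
Solution:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \;\Rightarrow\; \int_0^{\infty} a e^{-3x}\,dx = 1$$
$$\frac{a}{3}\left[-e^{-3x}\right]_0^{\infty} = 1 \;\Rightarrow\; -\frac{a}{3}\left(e^{-\infty} - e^{0}\right) = 1 \;\Rightarrow\; -\frac{a}{3}(0 - 1) = 1 \;\Rightarrow\; a = 3$$
$$f(x) = \begin{cases} 3 e^{-3x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$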
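The constant a = 3 can be confirmed numerically, and parts (c) and (d) then follow from the closed form F(x) = 1 − e^(−3x) for x ≥ 0. In this Python sketch the midpoint rule and the cutoff at x = 20 (standing in for infinity, tail about e^(−60)) are illustrative assumptions.

```python
from math import exp

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# a is the reciprocal of the area under e^(-3x) on [0, infinity)
a = 1 / integrate(lambda x: exp(-3 * x), 0, 20)

def cdf(x):
    """F(x) = 1 - e^(-3x) for x >= 0 and 0 for x < 0 (the closed form for a = 3)."""
    return 1 - exp(-3 * x) if x >= 0 else 0.0

p_c = cdf(0.5) - cdf(0.3)    # part c
p_d = cdf(1) - cdf(-1)       # part d: only [0, 1) carries probability
print(round(a, 6), round(p_c, 4), round(p_d, 4))
```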
The cumulative distribution function is
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
Example 7:
$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Solution:
$$F(x) = \int_0^x dt = x, \qquad 0 \le x \le 1$$
CHAPTER 9
Uniform Distribution
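$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: graph of the uniform density, constant at height 1/(b − a) between x = a and x = b.]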
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a \le x \le b) = \int_a^b f(x)\,dx$
Example 1:
Solution:
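1. $f(a) = \frac{1}{b-a} \ge 0$ and $f(b) = \frac{1}{b-a} \ge 0$
2. $$\int_a^b \frac{1}{b-a}\,dx = \left[\frac{x}{b-a}\right]_a^b = \frac{b}{b-a} - \frac{a}{b-a} = \frac{b-a}{b-a} = 1$$
$$P(1 \le x \le 2) = \int_1^2 \frac{1}{b-a}\,dx = \left[\frac{x}{b-a}\right]_1^2 = \frac{2}{b-a} - \frac{1}{b-a} = \frac{1}{b-a}$$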
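$$E(x) = \int_a^b x\,f(x)\,dx = \int_a^b \frac{x}{b-a}\,dx = \left[\frac{x^2}{2(b-a)}\right]_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)}$$
That is
$$E(x) = \frac{a+b}{2}$$
$$E(x^2) = \int_a^b x^2 f(x)\,dx = \int_a^b \frac{x^2}{b-a}\,dx = \left[\frac{x^3}{3(b-a)}\right]_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{(b-a)(b^2 + ab + a^2)}{3(b-a)} = \frac{b^2 + ab + a^2}{3}$$
$$\operatorname{Var}(x) = E(x^2) - [E(x)]^2 = \frac{b^2 + ab + a^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{b^2 + ab + a^2}{3} - \frac{a^2 + 2ab + b^2}{4}$$
$$= \frac{4(b^2 + ab + a^2) - 3(a^2 + 2ab + b^2)}{12} = \frac{a^2 - 2ab + b^2}{12} = \frac{(a-b)^2}{12}$$
That is
$$\operatorname{Var}(x) = \frac{(a-b)^2}{12}, \qquad \operatorname{Std}(x) = \sqrt{\frac{(a-b)^2}{12}} = \frac{b-a}{\sqrt{12}}$$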
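The formulas E(x) = (a + b)/2 and Var(x) = (a − b)²/12 can be compared against direct numerical integration. This is a minimal Python sketch; the endpoints a = 2, b = 5 are arbitrary illustrative values and the midpoint rule is a method chosen here for illustration.

```python
def uniform_moments(a, b, steps=100_000):
    """E(X) and Var(X) for f(x) = 1/(b - a) on [a, b], by the midpoint rule."""
    h = (b - a) / steps
    density = 1 / (b - a)
    xs = [a + (i + 0.5) * h for i in range(steps)]
    mean = h * sum(x * density for x in xs)
    second_moment = h * sum(x * x * density for x in xs)
    return mean, second_moment - mean**2

a, b = 2.0, 5.0   # illustrative endpoints, not taken from the text
mean, var = uniform_moments(a, b)
print(mean, (a + b) / 2)          # numerical vs (a + b) / 2
print(var, (a - b) ** 2 / 12)     # numerical vs (a - b)^2 / 12
```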
Example 2:
Exponential Distribution
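$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^{x} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^{x} = 1 - e^{-\lambda x}, \qquad x \ge 0$$
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
Mean
$$E(X) = \frac{1}{\lambda}$$
Proof:
Using the moment generating function,
$$M(t) = E\left[e^{tX}\right] = \frac{\lambda}{\lambda - t}, \qquad t < \lambda$$
$$E(X) = \left.\frac{d}{dt}M(t)\right|_{t=0} = \left.\frac{\lambda}{(\lambda - t)^2}\right|_{t=0} = \frac{1}{\lambda}, \qquad E(X^2) = \left.\frac{d^2}{dt^2}M(t)\right|_{t=0} = \left.\frac{2\lambda}{(\lambda - t)^3}\right|_{t=0} = \frac{2}{\lambda^2}$$
Variance
$$V(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$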
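The mean 1/λ, the variance 1/λ² and the CDF can be checked numerically for a particular rate. In this Python sketch λ = 0.5 is an illustrative value, the midpoint rule is a method chosen here for illustration, and the cutoff at x = 100 (λx = 50) stands in for infinity.

```python
from math import exp

LAM = 0.5   # illustrative rate parameter lambda, not from the text

def pdf(x):
    return LAM * exp(-LAM * x) if x >= 0 else 0.0

def cdf(x):
    """F(x) = 1 - e^(-lambda x) for x >= 0, and 0 for x < 0."""
    return 1 - exp(-LAM * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

UPPER = 100.0   # LAM * UPPER = 50, so the neglected tail is about e^(-50)
mean = integrate(lambda x: x * pdf(x), 0, UPPER)
variance = integrate(lambda x: x * x * pdf(x), 0, UPPER) - mean**2
area_to_3 = integrate(pdf, 0, 3)

print(round(mean, 6), 1 / LAM)          # E(X) = 1/lambda
print(round(variance, 6), 1 / LAM**2)   # V(X) = 1/lambda^2
print(round(area_to_3, 6), round(cdf(3), 6))
```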
CHAPTER 10
NORMAL DISTRIBUTION
$$Z = \frac{X - \mu}{\sigma}$$ is normal with μ = 0, σ = 1.
This standardization converts any normal distribution
into a normal distribution with a mean of 0 and a standard
deviation of 1. Therefore the standard normal distribution
is a normal distribution with μ = 0, σ = 1.
Note:
If X is normally distributed with mean μ and standard
deviation σ, then:
Example 1:
Given the standard normal distribution, find the area
under the curve, above the X-axis between 0 and Z = 2 .
Solution:
It will be helpful to draw a picture of the standard
normal distribution and shade the desired area as in the
figure below. If we locate Z = 2 in Table (1) and read the
corresponding entry in the body of the table, we find the
desired area to be 0.4772. We may interpret this area in
several ways. We may interpret it as the probability that
a Z picked at random from the population of Z's will have a
value between 0 and 2. We may also interpret it as the
relative frequency of occurrence (or proportion) of values
of Z between 0 and 2, or we may say that 47.72 percent of the
Z's have a value between 0 and 2.
[Figure: standard normal curve with the area between Z = 0 and Z = 2 shaded.]
Example 2:
What is the probability that a Z picked at random from the
population of Z's will have a value between −2.55 and +2.55?
Solution:
The figure shows the desired area. From Table (1), the area
between 0 and 2.55 is 0.4946. If we look at the picture
we drew, we see that this is only part of the desired area,
and if we recall that the normal distribution is symmetric
about the mean, we realize that we have found exactly one
half of the desired area. To obtain the total desired area, we
double 0.4946 to get 0.9892.
[Figure: standard normal curve with the area between Z = −2.55 and Z = 2.55 shaded.]
Example 3:
What proportion of Z values are between -2.74 and 1.53?
Solution:
The figure shows the desired area. We find in Table (1)
that the area between 0 and 1.53 is 0.4370. Because of
symmetry, we know that the area between 0 and −2.74 is
the same as the area between 0 and +2.74, which, from the
table, we find is 0.4969. To obtain the total desired area,
we add these two areas to get 0.4370 + 0.4969 = 0.9339.
[Figure: standard normal curve with the area between Z = −2.74 and Z = 1.53 shaded.]
Example 4:
Given the standard normal distribution, find P(Z ≥ 2.71).
Solution:
The desired area is shown in the figure below. We
obtain the area to the right of Z = 2.71 by subtracting
the area between 0 and 2.71 from 0.5. Thus
P(Z ≥ 2.71) = 0.5 − P(0 ≤ Z ≤ 2.71)
= 0.5 − 0.4966 = 0.0034
[Figure: standard normal curve with the area to the right of Z = 2.71 shaded.]
Example 5:
Given the standard normal distribution, find P(0.84 ≤ Z ≤ 2.45).
Solution:
The area we are looking for is shown in the figure below. We
first obtain the area between 0 and 2.45, and from that
subtract the area between 0 and 0.84. In other words,
P(0.84 ≤ Z ≤ 2.45) = P(0 ≤ Z ≤ 2.45) − P(0 ≤ Z ≤ 0.84)
= 0.4929 − 0.2995 = 0.1934
[Figure: standard normal curve with the area between Z = 0.84 and Z = 2.45 shaded.]
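The table look-ups in Examples 1 to 5 can be reproduced with `statistics.NormalDist` from the Python standard library, whose `cdf` method gives P(Z ≤ z). In this minimal sketch `area_between` is an illustrative helper name.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_between(z1, z2):
    """P(z1 <= Z <= z2) for the standard normal distribution."""
    return Z.cdf(z2) - Z.cdf(z1)

print(round(area_between(0, 2), 4))           # Example 1
print(round(area_between(-2.55, 2.55), 4))    # Example 2
print(round(area_between(-2.74, 1.53), 4))    # Example 3
print(round(1 - Z.cdf(2.71), 4))              # Example 4
print(round(area_between(0.84, 2.45), 4))     # Example 5
```

The computed values can differ from the text in the fourth decimal place (for Example 5 the unrounded area is about 0.1933), because Table (1) entries are themselves rounded.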
Application:
Although its importance in the field of statistics is
indisputable, one should realize that the normal
distribution is not a law that is adhered to by all
measurable characteristics occurring in nature. It is true,
however, that many of these characteristics are
Example 6:
A physical therapist feels that scores on a certain
manual dexterity test are approximately normally
distributed with a mean of 10 and a standard deviation of
2.5. If a randomly selected individual takes the test, what is
the probability that he or she will make a score of 15 or
better?
Solution:
First let us draw a picture of the distribution and shade
the area corresponding to the probability of interest. This
has been done in the figure below. If our distribution were the
standard normal distribution with a mean of 0 and a
standard deviation of 1, we could make use of Table (1)
and find the probability with little effort. Fortunately, it is
$$Z = \frac{X - \mu}{\sigma}, \qquad Z = \frac{15 - 10}{2.5} = \frac{5}{2.5} = 2$$
[Figure: standard normal curve with the area to the right of Z = 2 shaded.]
$$P(X \ge 15) = P\left(Z \ge \frac{15 - 10}{2.5}\right) = P(Z \ge 2) = 0.5 - P(0 \le Z \le 2) = 0.5 - 0.4772 = 0.0228$$
Example 7:
Suppose it is known that the weights of a certain group
of individuals are approximately normally distributed with
a mean of 140 pounds and a standard deviation of 25
pounds. What is the probability that a person picked at
random from this group will weigh between 100 and 170
pounds?
Solution:
The figure below shows the distribution of weights and
the Z distribution to which we transform the original
values to determine the desired probabilities. We find the
Z value corresponding to an x of 100 by
$$Z = \frac{100 - 140}{25} = -1.6$$
and the Z value corresponding to an x of 170 by
$$Z = \frac{170 - 140}{25} = 1.2$$
[Figure: standard normal curve with the area between Z = −1.6 and Z = 1.2 shaded.]
P(100 ≤ X ≤ 170) = P(−1.6 ≤ Z ≤ 1.2)
= P(−1.6 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 1.2)
= 0.4452 + 0.3849 = 0.8301
The probability asked for in our original question, then,
is 0.8301.
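Examples 6 and 7 standardize a general normal variable before using the table. `statistics.NormalDist` can do both the standardization (`zscore`, available since Python 3.9) and the probability (`cdf`) directly, as this minimal sketch shows.

```python
from statistics import NormalDist

scores = NormalDist(mu=10, sigma=2.5)     # Example 6: dexterity test scores
weights = NormalDist(mu=140, sigma=25)    # Example 7: weights in pounds

z_15 = scores.zscore(15)                  # (15 - 10) / 2.5
p_score = 1 - scores.cdf(15)              # P(X >= 15)

z_100, z_170 = weights.zscore(100), weights.zscore(170)
p_weight = weights.cdf(170) - weights.cdf(100)   # P(100 <= X <= 170)

print(z_15, round(p_score, 4))
print(z_100, z_170, round(p_weight, 4))
```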
EXERCISES 10
CHAPTER 11
ESTIMATION Theory
Estimation:
Estimation is a technique by which we can draw
inferences about a population.
Estimator:
Estimate:
An estimate is a specific value of the estimator (a random
variable) calculated from the sample.
Point Estimator:
A point estimator draws inferences about a population
by estimating the value of an unknown parameter using
a single value or point.
Interval Estimator:
An interval estimator draws inferences about a
population by estimating the value of an unknown
parameter using an interval that is likely to include the
value of the population parameter.
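$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
$$1 - \alpha = P\left(-Z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}\right)$$
$$1 - \alpha = P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
$$1 - \alpha = P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
$\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the lower confidence limit (LCL), and
$\bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the upper confidence limit (UCL).

Confidence level (1 − α)    Significance level (α)    α/2      Z_{α/2}
0.90                        0.10                      0.05     1.645
0.95                        0.05                      0.025    1.960
0.99                        0.01                      0.005    2.576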
Example 1:
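In a random sample of size n = 100 from a population
whose variance is σ² = 81, we calculated X̄ = 30. Find
the 90% confidence interval estimate of the population
mean μ.
Solution:
X̄ = 30, σ = 9
n = 100, Z_{α/2} = 1.645
The 90% confidence interval estimate will be as follows:
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$30 - 1.645\frac{9}{\sqrt{100}} \le \mu \le 30 + 1.645\frac{9}{\sqrt{100}}$$
$$30 - 1.4805 \le \mu \le 30 + 1.4805 \quad\Rightarrow\quad 28.5195 \le \mu \le 31.4805$$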
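The interval X̄ ± Z_{α/2} σ/√n can be wrapped in a small function. This is a minimal Python sketch: `z_interval` is an illustrative name, σ = 9 is the value used in the solution, and Z_{α/2} comes from `NormalDist().inv_cdf` instead of the table, so the limits differ from the text only in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for mu when the population standard deviation is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, about 1.645 for 90%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=30, sigma=9, n=100, confidence=0.90)
print(round(low, 4), round(high, 4))
```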
Example 2:
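The following observations were drawn from a
normal population whose variance is 100:
140, 170, 150, 180, 130, 120, 100
Determine the 95% confidence interval of the
population mean.
Solution:
$$\bar{X} = \frac{990}{7} = 141.43, \qquad \sigma = 10, \qquad n = 7, \qquad Z_{\alpha/2} = 1.96$$
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\bar{X} - 1.96\frac{10}{\sqrt{7}} \le \mu \le \bar{X} + 1.96\frac{10}{\sqrt{7}}$$
$$141.43 - 7.4081 \le \mu \le 141.43 + 7.4081 \quad\Rightarrow\quad 134.02 \le \mu \le 148.84$$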
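The same calculation can start from the raw observations. In this minimal Python sketch `statistics.fmean` computes X̄ and `NormalDist().inv_cdf(0.975)` supplies Z_{0.025} ≈ 1.96.

```python
from math import sqrt
from statistics import NormalDist, fmean

data = [140, 170, 150, 180, 130, 120, 100]
sigma = sqrt(100)                      # known population variance of 100
xbar = fmean(data)                     # 990 / 7
z = NormalDist().inv_cdf(0.975)        # Z_{alpha/2} for 95% confidence
margin = z * sigma / sqrt(len(data))
low, high = xbar - margin, xbar + margin
print(round(xbar, 2), round(margin, 4), round(low, 2), round(high, 2))
```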
EXERCISES 11
CHAPTER 12
CHI-SQUARE TESTS
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Example 1:
Consider a subset of the data from a case-control
study on smoking and lung cancer.
$$E_{11} = \frac{63 \times 92}{106} = 54.68, \qquad E_{12} = \frac{63 \times 14}{106} = 8.32$$
$$E_{21} = \frac{43 \times 92}{106} = 37.32, \qquad E_{22} = \frac{43 \times 14}{106} = 5.68$$
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where r = number of rows and c = number of columns (here r = 2 and c = 2), so
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Statistical decision:
Since the calculated χ² is greater than the tabulated χ²,
we reject H0.
Clinical decision:
On the basis of these data, we may conclude that there
is an association between smoking and lung cancer.
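Each expected frequency is (row total × column total) / grand total. The observed counts for this example are not reproduced above, so this minimal Python sketch only recomputes the expected table from the margins used in the solution; `expected_counts` is an illustrative name.

```python
def expected_counts(row_totals, col_totals):
    """E_ij = (row i total) * (column j total) / grand total."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must agree")
    return [[r * c / n for c in col_totals] for r in row_totals]

E = expected_counts([63, 43], [92, 14])
for row in E:
    print([round(e, 2) for e in row])
```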
Example 2:
The following data pertain to a cohort of 609 healthy
individuals between the ages of 40 and 76. The cohort
was followed for 7 years, after which new cases of
coronary heart disease (CHD) were identified. The level
of circulating catecholamine (CAT) is the exposure
variable of interest. The following table presents the
results.
$$E_{11} = \frac{71 \times 122}{609} = 14.22, \qquad E_{12} = \frac{71 \times 487}{609} = 56.78$$
$$E_{21} = \frac{538 \times 122}{609} = 107.78, \qquad E_{22} = \frac{538 \times 487}{609} = 430.22$$
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$\chi^2_{1,\,0.05} = 3.84$$
Statistical decision:
Since the calculated χ² > tabulated χ², we reject H0.
Clinical decision:
On the basis of these data, we may conclude that CHD
and CAT are related variables.
Example 3:
A certain drug is claimed to be effective in curing
colds. In an experiment on 164 people with colds, half of
them were given the drug and half of them were given
sugar pills. The patients' reactions to the treatment are
recorded in the following table. Test the hypothesis that
the drug and the sugar pills yield similar reactions with
α = 0.01.
Solution:
The hypotheses:
H0: The drug and the sugar pills yield similar
reactions.
H1: The drug and the sugar pills do not yield similar
reactions.
$$E_{11} = \frac{82 \times 96}{164} = 48, \qquad E_{12} = \frac{82 \times 22}{164} = 11, \qquad E_{13} = \frac{82 \times 46}{164} = 23$$
$$E_{21} = \frac{82 \times 96}{164} = 48, \qquad E_{22} = \frac{82 \times 22}{164} = 11, \qquad E_{23} = \frac{82 \times 46}{164} = 23$$
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$= \frac{(52-48)^2}{48} + \frac{(10-11)^2}{11} + \frac{(20-23)^2}{23} + \frac{(44-48)^2}{48} + \frac{(12-11)^2}{11} + \frac{(26-23)^2}{23} = 1.63$$
Degrees of freedom: ν = (2 − 1)(3 − 1) = 2
$$\chi^2_{2,\,0.01} = 9.21 \quad (\text{tabulated } \chi^2)$$
The decision:
Since the calculated χ² is smaller than the tabulated
χ², we accept H0. That is, the drug and the sugar pills yield
similar reactions.
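The whole test of independence can be sketched in a few lines of Python. The observed counts are those appearing in the solution's χ² sum (row 1: 52, 10, 20; row 2: 44, 12, 26); `chi_square_statistic` is an illustrative name, and the critical value 9.21 is the tabulated one quoted in the text.

```python
observed = [
    [52, 10, 20],   # row 1 of the table
    [44, 12, 26],   # row 2 of the table
]

def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square_statistic(observed)
CRITICAL = 9.21   # tabulated chi-square, 2 df, alpha = 0.01 (from the text)
print(round(stat, 2), dof, "reject H0" if stat > CRITICAL else "accept H0")
```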
Example 4:
In a breeding experiment with flowers, the results were
recorded as follows:

Flower colour   Observed (O)   Expected (E)   O − E   (O − E)²   (O − E)²/E
Red             149            134            15      225        1.68
Yellow          211            268            −57     3249       12.12
Pink            191            134            57      3249       24.25
White           119            134            −15     225        1.68
Total           670            670                               39.73

The sum of the last column is the chi-square statistic. That is,
the calculated χ² = 39.73.
The critical value of chi-square with α = 0.05 and 4 − 1 = 3
degrees of freedom is
$$\chi^2_{3,\,0.05} = 7.81 \quad (\text{tabulated } \chi^2)$$
The decision:
Since the calculated χ² is greater than the tabulated χ²,
we reject the null hypothesis H0. That is, the observed
distribution and the expected distribution are not equal.
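A goodness-of-fit test compares one list of observed counts with one list of expected counts, with k − 1 degrees of freedom when no parameters are estimated from the data. A minimal Python sketch (`goodness_of_fit` is an illustrative name; 7.81 is the tabulated value for 3 degrees of freedom at α = 0.05):

```python
def goodness_of_fit(observed, expected):
    """Chi-square statistic sum((O - E)^2 / E) and its degrees of freedom, k - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

observed = [149, 211, 191, 119]   # red, yellow, pink, white
expected = [134, 268, 134, 134]
stat, dof = goodness_of_fit(observed, expected)
CRITICAL = 7.81                   # tabulated chi-square, 3 df, alpha = 0.05
print(round(stat, 2), dof, "reject H0" if stat > CRITICAL else "accept H0")
```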
Example 5:
The number of field mice captured at five traps
during a period of 3 months was recorded as follows:

Traps    A    B    C    D    E    Total
Number   23   7    25   19   21   95

Trap    Observed (O)   Expected (E)   O − E   (O − E)²   (O − E)²/E
A       23             19             4       16         0.84
B       7              19             −12     144        7.58
C       25             19             6       36         1.89
D       19             19             0       0          0.00
E       21             19             2       4          0.21
Total   95             95                                10.52

With α = 0.05 and 5 − 1 = 4 degrees of freedom, the tabulated value is
$$\chi^2_{4,\,0.05} = 9.49$$
The decision:
Since the calculated χ² (10.52) is greater than the tabulated χ² (9.49),
we reject the null hypothesis H0. That is, the observed
distribution and the expected distribution are not equal.
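With five traps and an expected count of 95/5 = 19 per trap, the statistic and its degrees of freedom can be checked directly. In this minimal Python sketch the critical value 9.488 is the standard chi-square table entry for 4 degrees of freedom at α = 0.05.

```python
observed = {"A": 23, "B": 7, "C": 25, "D": 19, "E": 21}
total = sum(observed.values())               # 95 mice
expected = total / len(observed)             # 19 per trap under H0

stat = sum((o - expected) ** 2 / expected for o in observed.values())
dof = len(observed) - 1                      # 5 traps -> 4 degrees of freedom
CRITICAL = 9.488                             # chi-square table, 4 df, alpha = 0.05
print(round(stat, 2), dof, "reject H0" if stat > CRITICAL else "accept H0")
```

The unrounded statistic is 200/19 ≈ 10.53; the table's 10.52 comes from adding the rounded terms.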
EXERCISES 12
1. A management behavior analyst has been
studying the relationship between male/female
supervisory structures in the workplace and the
level of employees' job satisfaction. The results of a
recent survey are shown in the table below.
Conduct a test with α = 0.05 to determine whether
the level of job satisfaction depends on the
boss/employee gender relationship.
Level of Boss / Employee
Satisfaction F/M F/F M/M M/F
Satisfied 21 25 59 71
Neutral 39 49 50 38
Dissatisfied 31 48 10 11
CHAPTER 13
ANALYSIS OF VARIANCE
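Let $x_{ij}$ be the i-th observation in the j-th sample (j = 1, …, k; i = 1, …, n_j), let $\bar{x}_j$ be the mean of the j-th sample, and let $\bar{x}$ be the grand mean. Then
SSTOT = SST + SSE
$$\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
Proof:
$$\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left[(x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{x})\right]^2$$
$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 + 2\sum_{j=1}^{k} (\bar{x}_j - \bar{x})\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j) + \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$$
$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 + \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$$
Notice that, for every j,
$$\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j) = 0,$$
so the middle (cross-product) term vanishes.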
The test of the equality of two or more population
means can be outlined as follows:
1. State the hypotheses:
H0: $\mu_1 = \mu_2 = \cdots = \mu_k$
H1: not all of the means $\mu_1, \mu_2, \ldots, \mu_k$ are equal
2. Draw independent random samples from the populations
and calculate the sample means $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$.
3. Set α and determine the degrees of freedom
of the numerator and the denominator as (k − 1) and
(n − k), respectively, where k is the number of means
being tested and n is the total number of
observations; then look up the critical value
$F^{\alpha}_{k-1,\,n-k}$ in Table (3).
4. Calculate:
(a) The treatment sum of squares, SST.
(b) The error sum of squares, SSE.
(c) The mean square for treatments, MST, where MST = SST/(k − 1).
(d) The mean square for error, MSE, where MSE = SSE/(n − k).
5. Calculate the F statistic:
F = MST / MSE
6. Reject H0 when $F > F^{\alpha}_{k-1,\,n-k}$.
Example 1:
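Five methods of teaching were used on five
homogeneous groups. At the end of the semester, we
have the scores in the final examination as follows:

Treatments
A    B    C    D    E
30   30   40   45   55
35   40   45   40   45
25   45   55   35   60
40   45   35   40   50

Solution:
$$\bar{X}_1 = \frac{30 + 35 + 25 + 40}{4} = \frac{130}{4} = 32.5$$
$$\bar{X}_2 = \frac{30 + 40 + 45 + 45}{4} = \frac{160}{4} = 40$$
$$\bar{X}_3 = \frac{40 + 45 + 55 + 35}{4} = \frac{175}{4} = 43.75$$
$$\bar{X}_4 = \frac{45 + 40 + 35 + 40}{4} = \frac{160}{4} = 40$$
$$\bar{X}_5 = \frac{55 + 45 + 60 + 50}{4} = \frac{210}{4} = 52.5$$
$$\bar{X} = \frac{130 + 160 + 175 + 160 + 210}{20} = \frac{835}{20} = 41.75$$
$$SST = \sum_{j=1}^{5} n_j(\bar{X}_j - \bar{X})^2 = 4\left[(32.5 - 41.75)^2 + (40 - 41.75)^2 + (43.75 - 41.75)^2 + (40 - 41.75)^2 + (52.5 - 41.75)^2\right] = 845$$
$$SSE = \sum_{j=1}^{5}\sum_{i=1}^{4} (X_{ij} - \bar{X}_j)^2 = 668.75$$
Total degrees of freedom: 19, SSTOT = SST + SSE = 1513.75
$$F = \frac{MST}{MSE} = \frac{845/4}{668.75/15} = \frac{211.25}{44.58} = 4.74$$
Critical value of F (tabulated F):
$$F^{0.05}_{4,\,15} = 3.06$$
Decision
Since the calculated F is greater than the tabulated F, we
reject H0. That is, the responses of the treatments are
different (not all of $\mu_1, \mu_2, \mu_3, \mu_4, \mu_5$ are equal).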
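The one-way ANOVA steps above can be sketched in Python to check SST, SSE and F. The data are the table above; the first score in column D is taken as 45, which is the value that reproduces the worked mean X̄4 = 40. `one_way_anova` is an illustrative name, and the critical value is not computed here (it still comes from Table (3)).

```python
groups = {
    "A": [30, 35, 25, 40],
    "B": [30, 40, 45, 45],
    "C": [40, 45, 55, 35],
    "D": [45, 40, 35, 40],
    "E": [55, 45, 60, 50],
}

def one_way_anova(samples):
    """Return SST, SSE, SSTOT, F and the (numerator, denominator) degrees of freedom."""
    all_obs = [x for s in samples for x in s]
    n, k = len(all_obs), len(samples)
    grand_mean = sum(all_obs) / n
    means = [sum(s) / len(s) for s in samples]
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mst, mse = sst / (k - 1), sse / (n - k)
    return {"SST": sst, "SSE": sse, "SSTOT": sst + sse,
            "F": mst / mse, "df": (k - 1, n - k)}

result = one_way_anova(list(groups.values()))
print(result)
```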
EXERCISES 13
Dusting method
1 2 3 4
5.3 4.4 8.4 7.4
3.7 5.1 6.0 4.3
14.3 5.4 4.9 3.5
6.5 12.1 9.5 3.8
Control I II III IV V VI
89.8 84.4 64.4 75.2 88.4 56.4 65.6
93.8 116.0 79.8 62.8 90.2 83.2 79.4
88.4 84.0 88.0 62.4 73.2 90.4 65.4
112.6 68.6 69.4 73.8 87.8 85.6 70.2
Test to see if there are significant differences in
weights due to the different solutions, using a level of
significance of 0.01.
Treatments
Feed I II III
240 185 115
210 195 125
195 210 135
220 200 130
6.
Treatments
I II III
12 10 9
11 10 7
6 4 3
7 8 5
Test H0: μ1 = μ2 = μ3 using α = 0.01.
General Exercises
1. Tossing a balanced coin three consecutive times, the
probability of getting no heads is ……………..
2. Tossing a balanced coin ten consecutive times, the
number of outcomes in its sample space is ….........
3. Rolling a balanced die, the probability of getting an
odd prime number is equal to……………
4. Rolling a balanced die, the probability of not getting an
even prime number is equal to……………
5. Rolling a balanced die twice, the probability of
getting a sum that is an even prime number is …………
6. Rolling a balanced die twice, the probability of
getting a sum that is an even prime number is ………….
7. The number of outcomes when rolling a balanced die
three consecutive times is…………..
8. Rolling a balanced die twice, the probability of
getting a sum that is a prime number divisible by 3 is …………..
9. The quotient of the number of elements in an event
and the number of elements in the sample
space is …..
10. The mean of a distribution is 14 and the standard
deviation is 5. What is the value of the coefficient of
variation?
11. The sum of deviations about the mean is always…..