BST 121

The document is a set of lecture notes on Medical Statistics prepared by Dr. Mahmoud Mokhtar and Dr. Marwa Hani Maneea, outlining the vision and mission of the College of Oral and Dental Medicine. It covers fundamental concepts of statistics, including descriptive and inferential statistics, types of data, measures of central tendency and dispersion, correlation and regression, and data presentation methods. The notes aim to equip future dentists with the statistical knowledge necessary for research and professional practice in the field of dentistry.

Copyright © All Rights Reserved

Lecture Notes on

Medical Statistics

Prepared By
Dr. Mahmoud Mokhtar
Dr. Marwa Hani Maneea
Vision
The College of Oral and Dental Medicine - Modern
University for Technology and Information aspires to
be one of the most distinguished colleges at the local
and regional levels in the field of dentistry.

Mission
The college is committed to preparing dentists who are
distinguished by professional merit and are able to
comply with the requirements of the labor market and
keep pace with scientific development and contribute
to it through research activities while meeting the needs
of the surrounding community within the framework of
ethical values.
Contents
Chapter 1
1.1 Introduction
1.2 Some Basic Concepts
1.3 Data Presentation
1-3-1 Frequency Distribution for Qualitative Data
1-3-2 Charts and Graphs
Chapter 2 Measures of Central Tendency
2-1 Mean
2-2 Median
2-3 Mode
Chapter 3 Measures of Dispersion
3-1 The Range
3-2 The Standard Deviation
Chapter 4 Correlation and Regression
4-1 Correlation
4-2 Regression
Chapter 5 Normal Distribution
Chapter 6 Test of Hypotheses about Population Mean (𝝁)

Chapter 1
1.1 Introduction:
Statistics is a field of study concerned with:
1- the collection, organization, summarization and analysis of data;
2- the drawing of inferences about a body of data when only a part of the data
is observed.
Statisticians interpret these results and communicate them to others.
Types of statistics:
Applied statistics can be divided into two areas: descriptive statistics and
inferential statistics.
Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.
Inferential Statistics
Inferential statistics consists of methods that use sample results to help
make decisions or predictions about a population.
1.2 Some Basic Concepts
Data:
Data is the raw material of statistics.
Statistics:
Statistics is the field of study concerned with:
1-The collection, organization, summarization, and analysis of data.
(Descriptive Statistics)
2-The drawing of inferences and conclusions about a body of data
(population) when only a part of the data (sample) is observed. (Inferential
Statistics)

Biostatistics (also known as biometry):


When the data is obtained from the biological sciences and medicine, we
use the term "biostatistics".

Sources of Data:
1. Routinely kept records.
2. Surveys.
3. Experiments.
4. External sources (published reports, data banks, . . .).

Population:
A population is the largest collection of entities (elements or individuals) in
which we are interested at a particular time and about which we want to draw
some conclusions. When we take a measurement of some variable on each of
the entities in a population, we generate a population of values of that
variable.
There are two kinds of populations: finite and infinite.

Example:
If we are interested in the weights of students enrolled in the faculty of
medicine, then our population consists of the weights of all of these students,
and our variable of interest is the weight.

Population Size (N):

The number of elements in the population is called the population size
and is denoted by N.

Sample:

A sample is a part of a population.
From the population, we select various elements on which we collect our
data. This part of the population on which we collect data is called the
sample.

Sample Size (n):
The number of elements in the sample is called the sample size and is
denoted by n.

Variables:
The characteristic to be measured on the elements is called a variable.
The value of the variable varies from element to element.

Examples of Variables:
(1) No. of patients (2) Height (3) Sex (4) Educational Level

Independent Variable:
The variable whose effect is being studied; it is the presumed cause of the
outcome of the study.

Dependent Variable:
The variable being affected by the independent variable; it is the effect
(outcome) measured in the study.

Parameter:
A numerical value summarizing all the data of an entire population.

Types of Variables:

(1) Quantitative Variables (numerical variables):
A quantitative variable is a characteristic that can be measured. The
values of a quantitative variable are numbers indicating how much or
how many of something.

Examples
(i) Family size (ii) No. of patients (iii) Weight (iv) Height (v) Body
temperature.

Types of Quantitative Variables:

(a) Discrete Variables:
There are jumps or gaps between the values (they cannot be fractions).

Examples:
Family size (x = 1, 2, 3, . . .)
Number of patients (x = 0, 1, 2, 3, . . .)

(b) Continuous Variables:
There are no gaps between the values. These variables can assume an
infinite number of values between any two specific values. They are
obtained by measuring, and they often include fractions and decimals.

Examples
Height (140 < x < 190)
Blood sugar level (10 < x < 15)

Interval and Ratio

The difference between interval and ratio scales lies in whether the scale
has a true zero. An interval scale has no true zero (its zero point is
arbitrary), so it can represent values below zero: for example, temperature
can be measured below 0 degrees Celsius, such as -10 degrees. A ratio scale
has a true zero that means "none of the quantity", so its values never fall
below zero: height and weight are measured from 0 upward.

(2) Qualitative Variables (categorical):
A variable that cannot be measured in quantitative (numerical) form, but can
only be identified by name or category.

Examples
place of birth, types of drug, stages of breast cancer (I, II, III, or IV),
degree of pain (minimal, moderate, severe), gender (male or female),
hair color (blond, brown, red, gray, black), nationality, students' grades,
educational level.
Types of Qualitative Variables:

(a) Nominal Qualitative Variables:
i- Data that represent categories or names.
ii- There is no implied order to the categories of nominal data.
iii- No arithmetic or relational (ordering) operations can be applied.

Examples
• Blood type (A, B, O and AB)
• Eye color (brown, black, blue, etc.)
• Sex (Male, Female)
• Nationality (Saudi, Egyptian, British, . . .)
• Sick - well
• Married - single - divorced

(b) Ordinal Qualitative Variables:
i- Categories that can be ranked, but the differences between ranks are not
meaningful (they cannot be measured).
ii- Arithmetic operations are not applicable, but relational operations
(greater than, less than) are.

Examples:
• Degree of pain (minimal, moderate, severe)
• Rating scales (Excellent, Very good, Good, Fair, Poor)
• Letter grade (A, B, C, D and F)
• Educational level (elementary, intermediate, . . .)
• Student's grade (A, B, C, D, F)
• Military rank

Types of variables
• Qualitative: nominal, ordinal
• Quantitative: discrete, continuous; measured on an interval or ratio scale
1.3 Data presentation
• Tables
– Simplest way to summarize data
– Data is presented as absolute numbers or percentages
• Charts and graphs
– Visual representation of data
– Usually, data is presented using percentages

1-3-1-Frequency Distribution for Qualitative Data


A frequency distribution for qualitative data lists all categories and the
number of elements that belong to each of the categories.

Example

A sample of 30 employees from large companies was selected, and these
employees were asked how stressful their jobs were. The responses of
these employees are recorded below, where very represents very stressful,
somewhat means somewhat stressful, and none stands for not stressful at
all.

somewhat none somewhat very very none very somewhat somewhat very
somewhat somewhat very somewhat none very none somewhat somewhat very
somewhat somewhat very none somewhat very very somewhat none somewhat

Construct a frequency distribution table for these data.

Solution

Frequency Distribution of Stress on Job
Stress on job Frequency
Very 10
Somewhat 14
None 6
Total 30

Calculating Relative Frequency of a Category

$$\text{Relative frequency} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

Calculating Percentage

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$

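Example:
For the job-stress data of the previous example:
Stress on job Frequency Relative frequency Percentage
Very 10 10/30 = 0.3333 33.33%
Somewhat 14 14/30 = 0.4667 46.67%
None 6 6/30 = 0.2000 20.00%
Total 30 1.0000 100%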
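As a sketch of the two formulas above, the following Python snippet (standard library only; the variable names are illustrative and not part of the notes) tallies the 30 job-stress responses with `collections.Counter` and prints the frequency, relative frequency and percentage of each category.

```python
from collections import Counter

# The 30 job-stress responses, in the order they are listed in the example
responses = (
    "somewhat none somewhat very very none very somewhat somewhat very "
    "somewhat somewhat very somewhat none very none somewhat somewhat very "
    "somewhat somewhat very none somewhat very very somewhat none somewhat"
).split()

freq = Counter(responses)        # frequency of each category
total = sum(freq.values())       # sum of all frequencies

print(f"{'Category':<10}{'f':>4}{'R.F':>8}{'%':>9}")
for category in ("very", "somewhat", "none"):
    rel = freq[category] / total              # relative frequency
    print(f"{category:<10}{freq[category]:>4}{rel:>8.4f}{rel * 100:>8.2f}%")
```

The relative frequencies always add up to 1 (and the percentages to 100%), which is a quick check on the hand-made table.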

Frequency and Frequency Tables

The frequency of a particular data value is the number of times the data
value occurs.
Ungrouped data
A frequency table is constructed by arranging collected data values in
ascending order of magnitude with their corresponding frequencies.
We use the following steps to construct a frequency table:
Step 1:
Construct a table with three columns. Then in the first column, write
down all of the data values in ascending order of magnitude.
Step 2:
To complete the second column, go through the list of data values and
place one tally mark at the appropriate place in the second column for
every data value. When a value receives its fifth tally mark, draw a line
across the first four tally marks (as for the value 7 in the example
below), so that the tallies are grouped in fives. Continue this process
until all data values in the list are tallied.
Step 3:
Count the number of tally marks for each data value and write it in the
third column.
Example
The marks awarded for an assignment set for a Year 8 class of 20
students were as follows:
6 7 5 7 7 8 7 6 9 7
4 10 6 8 8 9 5 6 4 8
Present this information in a frequency table.

Solution
Mark Frequency Relative frequency Percentage
4 2 2/20 = 0.1 10%
5 2 2/20 = 0.1 10%
6 4 4/20 = 0.2 20%
7 5 5/20 = 0.25 25%
8 4 4/20 = 0.2 20%
9 2 2/20 = 0.1 10%
10 1 1/20 = 0.05 5%
Total 20 1 100%
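The three steps for an ungrouped frequency table can be sketched in Python as below. This is a minimal illustration assuming small integer data; `Counter` performs the tallying and counting of Steps 2 and 3, and the tally column is drawn as plain `|` strokes without the line through every fifth mark.

```python
from collections import Counter

# Assignment marks of the 20 students, from the example above
marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7,
         4, 10, 6, 8, 8, 9, 5, 6, 4, 8]

counts = Counter(marks)          # Steps 2-3: tally and count each value
n = len(marks)

table = []
for mark in sorted(counts):      # Step 1: distinct values in ascending order
    f = counts[mark]
    table.append((mark, f, f / n, 100 * f / n))

for mark, f, rel, pct in table:
    tally = "|" * f              # plain strokes; no line through every fifth
    print(f"{mark:>4}  {tally:<6}{f:>3}{rel:>7.2f}{pct:>6.0f}%")
```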

Class Intervals (or Groups)


When the data values are spread out, it is difficult to set up a frequency
table for every data value, as there will be too many rows in the table.
So, we group the data into class intervals (or groups) to help us
organise, interpret and analyse the data.
Ideally, we should have between five and ten rows in a frequency
table. Bear this in mind when deciding the size of the class interval (or
group).
Each group starts at a data value that is a multiple of the group size. For
example, if the size of the group is 5, then the groups should start at 5,
10, 15, 20 etc. Likewise, if the size of the group is 10, then the groups
should start at 10, 20, 30, 40 etc.
The frequency of a group is the number of data values that fall in the
range specified by that group (or class interval).
Example
The number of calls from motorists per day for roadside service was
recorded for the month of December 2003. The results were as
follows:
28 122 217 130 120 86 80 90 120 140
70 40 145 187 113 90 68 174 194 170
100 75 104 97 75 123 100 82 109 120
81

Set up a frequency table for this set of data values.


Solution:
To construct a frequency table, we proceed as follows:
Smallest data value = 28
Highest data value = 217
Difference = Highest value - Smallest value = 217 - 28 = 189.
Let the width of the class interval be 40.

$$\text{Number of class intervals} = \frac{189}{40} \approx 4.7 \rightarrow 5 \quad \text{(round up to the next integer)}$$

There are at least 5 class intervals. This is reasonable for the given data.
Step 1: Construct a table with three columns, and then write the data
groups or class intervals in the first column. The size of each group is
40. So, the groups will start at 0, 40, 80, 120, 160 and 200 to include all
of the data. Note that in fact we need 6 groups (1 more than we first
thought).
Step 2: Go through the list of data values. For the first data value in
the list, 28, place a tally mark against the group 0-39 in the second
column. For the second data value in the list, 122, place a tally mark
against the group 120-159 in the second column. For the third data
value in the list, 217, place a tally mark against the group 200-239 in the
second column.
We continue this process until all of the data values in the set are
tallied.
Step 3: Count the number of tally marks for each group and write it in
the third column. The finished frequency table is as follows:
Number of calls Frequency
0-39 1
40-79 5
80-119 12
120-159 8
160-199 4
200-239 1
Total 31
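A minimal Python sketch of the grouping procedure above, assuming a fixed class width of 40 and groups that start at multiples of the width. It also shows why the first estimate of 5 intervals becomes 6 groups once the groups are aligned at 0, 40, 80, and so on. The variable names are illustrative only.

```python
import math

# Daily roadside-service calls for December 2003 (31 values), from the example above
calls = [28, 122, 217, 130, 120, 86, 80, 90, 120, 140,
         70, 40, 145, 187, 113, 90, 68, 174, 194, 170,
         100, 75, 104, 97, 75, 123, 100, 82, 109, 120,
         81]

width = 40                                   # chosen class width
spread = max(calls) - min(calls)             # 217 - 28 = 189
estimate = math.ceil(spread / width)         # 189 / 40 = 4.725, rounded up to 5

# Groups start at multiples of the width and must cover both the smallest
# and the largest value, which can need one extra group.
first_start = (min(calls) // width) * width
last_start = (max(calls) // width) * width

freq = {}
for start in range(first_start, last_start + width, width):
    end = start + width - 1                  # e.g. 0-39, 40-79, ...
    freq[(start, end)] = sum(start <= x <= end for x in calls)

for (start, end), f in freq.items():
    print(f"{start}-{end}: {f}")
```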

The Cumulative Frequency:
It is computed by adding successive frequencies.
The Cumulative Relative Frequency:
It is computed by adding successive relative frequencies.
The Mid-interval:
It is computed by adding the lower bound of the interval to its upper bound
and dividing by 2:
Mid-interval = (Lower bound + Upper bound)/2
Example:
Class 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 11 46 70 45 16 1

Present the cumulative frequency, the relative frequency, the cumulative
relative frequency and the mid-interval.

Solution
$$R.F = \frac{\text{Freq}}{n}$$

Class interval Mid-interval Frequency (f) Cumulative frequency Relative frequency (R.F) Cumulative relative frequency
30-39 34.5 11 11 0.0582 0.0582
40-49 44.5 46 57 0.2434 0.3016
50-59 54.5 70 127 0.3704 0.6720
60-69 64.5 45 172 0.2381 0.9101
70-79 74.5 16 188 0.0847 0.9947
80-89 84.5 1 189 0.0053 1
Total 189 1

Sum of frequencies = sample size = n
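The columns of this table can be reproduced with a short Python sketch (standard library only; names are illustrative). One design choice to note: here the cumulative relative frequency is computed as cumulative frequency divided by n, rather than by adding rounded relative frequencies, so it can differ in the last decimal place from a hand calculation that sums rounded values.

```python
# Class limits and frequencies from the example above
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [11, 46, 70, 45, 16, 1]

n = sum(freqs)                      # sum of frequencies = sample size
rows = []
cum_f = 0
for (lower, upper), f in zip(classes, freqs):
    cum_f += f                      # cumulative frequency
    mid = (lower + upper) / 2       # mid-interval
    rows.append((f"{lower}-{upper}", mid, f, cum_f, f / n, cum_f / n))

for label, mid, f, cf, rf, crf in rows:
    print(f"{label:<6}{mid:>6}{f:>5}{cf:>5}{rf:>8.4f}{crf:>8.4f}")
```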

1-3-2-Charts and graphs
Data may be presented diagrammatically or visually by using bar graphs,
histograms, frequency polygons, ogives or pie charts. These diagrams give
the statistician a visual impression of the data before going on to analyze
it and draw conclusions.

Bar Graph
This is also called a bar chart. The frequency of each category (or class) is
shown as a bar whose height equals the frequency. Since the categories are
separate and do not share common limits, the bars are drawn with spaces
between them.

Example
Twenty-five students are asked their blood type. Their responses are as follows:
A; B; O; A; AB; O; O; A; O; B; A; A; A; O; O; O; B; O; AB; B; O; B; O; A; A
Create a bar chart.
Solution
Data value Frequency
A 8
AB 2
B 5
O 10
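The frequency column, and a rough text version of the bar chart, can be produced with the Python sketch below. It is a minimal illustration using only the standard library: each `#` stands for one student. If matplotlib happens to be installed, `matplotlib.pyplot.bar(x, height)` would draw the graphical version from the same counts.

```python
from collections import Counter

# Blood types of the 25 students, from the example above
blood = "A B O A AB O O A O B A A A O O O B O AB B O B O A A".split()

counts = Counter(blood)
for group in sorted(counts):         # A, AB, B, O
    # One '#' per student: a text stand-in for a drawn bar
    print(f"{group:<3}| {'#' * counts[group]} {counts[group]}")
```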

Example
Treatment group Frequency
1 15
2 25
3 20

Histogram:
A histogram is a specific type of bar chart in which the categories are
ranges of numbers (class intervals). Histograms therefore show continuous
data grouped into intervals, and adjacent bars touch each other.

Example

You have been given a list of ages in years, and you need to show them in a graph.

The ages are:


5, 12, 23, 22, 28, 17, 11, 21, 25, 23, 7, 16, 13, 39, 35, 42, 24, 31, 35, 36, 35, 34, 37,
44, 51, 53, 46, 45, and 57.

You can choose to group them into ten-year age categories, 0–10, 11–20, 21–30
and so on:

Age Number of people
0-10 2
11-20 5
21-30 7
31-40 8
41-50 4
51-60 3
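Counting the ages into the ten-year groups of this table can be sketched in Python as follows. The group limits (0-10, then 11-20, 21-30, and so on) are copied from the table; the `#` strings are only a text stand-in for the histogram bars.

```python
# Ages from the example above
ages = [5, 12, 23, 22, 28, 17, 11, 21, 25, 23, 7, 16, 13, 39, 35,
        42, 24, 31, 35, 36, 35, 34, 37, 44, 51, 53, 46, 45, 57]

# Ten-year groups as in the table: 0-10, 11-20, ..., 51-60
groups = [(0, 10)] + [(lo, lo + 9) for lo in range(11, 61, 10)]
counts = [sum(lo <= a <= hi for a in ages) for lo, hi in groups]

for (lo, hi), c in zip(groups, counts):
    print(f"{lo:>2}-{hi:<2} {'#' * c}")
```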

The difference between a bar graph and a histogram

Polygon
A graph formed by joining the midpoints of the tops of successive bars in a
histogram with straight lines is called a frequency polygon.
Example
A frequency polygon was constructed from the frequency table below.
(Midpoint = (upper limit + lower limit)/2)
Scores 40-49 50-59 60-69 70-79 80-89 90-99 100-109
Frequency 0 5 10 30 40 15 0
Mid point 44.5 54.5 64.5 74.5 84.5 94.5 104.5

Pie Chart
A circle divided into portions that represent the relative frequencies or percentages
of a population or a sample belonging to different categories is called a pie chart.
Example
The following chart shows how a household budget is used.

Stem-and-Leaf Graph:
One simple graph, the stem-and-leaf graph or stem plot, comes from the
field of exploratory data analysis. It is a good choice when the data sets
are small.
To create the stem-and-leaf plot:
1.) Divide each observation of data into a stem and a leaf. The leaf consists
of a final significant digit.
For example:
1-The number 23 has stem two and leaf three. The number 432 has stem
43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two.
The decimal 9.3 has stem nine and leaf three.

2-Write the stems in a vertical line from smallest to largest. Draw a
vertical line to the right of the stems. Then write the leaves in increasing
order next to their corresponding stem.
Example:
For Susan Dean's spring pre-calculus class, scores for the first exam were
as follows:
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80;
83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 96; 100
Create a stem-and-leaf plot.
Solution
Stem Leaf
3 3
4 2 9 9
5 3 5 5
6 1 3 7 8 8 9 9
7 2 3 4 8
8 0 3 8 8 8
9 0 2 4 4 4 4 6
10 0
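The two construction steps can be sketched in Python. This is a minimal version that assumes non-negative whole numbers, taking the last digit as the leaf and the remaining digits as the stem (so 100 has stem 10 and leaf 0, as in the plot above); `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# First-exam scores from the example above
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

def stem_and_leaf(values):
    """Split each whole number into stem (all but last digit) and leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves in increasing order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    # List every stem from smallest to largest, including empty ones
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```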
The Cumulative Frequency Curve
This is also called an ogive. It is obtained by plotting the cumulative
frequencies against the upper class boundaries and joining the points.

Example

Draw an ogive for the following distribution

Height (cm) 0-20 20-40 40-60 60-80 80-100
Frequency 4 6 5 3 2
Solution

Height (cm) 0-20 20-40 40-60 60-80 80-100
Frequency 4 6 5 3 2
Cumulative frequency 4 10 15 18 20
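The points to plot for this ogive can be computed with a short Python sketch. Two assumptions are made here: each cumulative frequency is plotted against the upper boundary of its class, and the curve is started at (0, 0) at the lowest boundary, a common convention. Drawing the curve itself is left to any graphing tool.

```python
from itertools import accumulate

# Height classes (cm) and frequencies from the ogive example above
boundaries = [0, 20, 40, 60, 80, 100]     # class boundaries, lowest to highest
freqs = [4, 6, 5, 3, 2]

cum = list(accumulate(freqs))             # cumulative frequencies
# Ogive points: (lowest boundary, 0), then (upper boundary of each class, cumulative f)
points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cum))

for x, y in points:
    print(f"upper boundary {x:>3} cm -> cumulative frequency {y}")
```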

Exercise (1)
(1) Indicate which of the following variables are quantitative and which
are qualitative.

1. The temperature in Barrow, Alaska at 12:00 pm on any given day.

2. The make of automobile driven by each faculty member.

3. Whether or not a 6-volt lantern battery is defective.

4. The weight of a lead pencil.

5. The length of time billed for a long-distance telephone call.

6. The brand of cereal children eat for breakfast.

7. The type of book taken out of the library by an adult.

8. Number of persons in a family

9. Colors of cars

10. Marital status of people

11. Time to commute from home to work

12. Number of errors in a person’s credit report

13. Number of typographical errors in newspapers

14. Monthly TV cable bills

15. Spring break locations favored by college students

16. Number of cars owned by families

(2) Identify each of the following as examples of (1) nominal, (2)
ordinal, (3) discrete, or (4) continuous variables:

1. The length of time until a pain reliever begins to work.

2. The number of chocolate chips in a cookie.

3. The number of colors used in a statistics textbook.

4. The brand of refrigerator in a home.

5. The overall satisfaction rating of a new car.

6. The number of files on a computer’s hard disk.

7. The pH level of the water in a swimming pool.

8. The number of staples in a stapler.

(3) The following data give the results of a sample survey. The letters A,
B, and C represent the three categories.
A B B A C B C C C A
C B C A C C B C C A
A B C C B C B A C A

a. Prepare a frequency distribution table.

b. Calculate the relative frequencies and percentages for all categories.

c. What percentage of the elements in this sample belong to category B?

d. What percentage of the elements in this sample belong to category A or C?

e. Draw a bar graph for the frequency distribution.

(4) The following data give the results of a sample survey. The letters Y,
N, and D represent the three categories.

D N N Y Y Y N Y D Y
Y Y Y Y N Y Y N N Y
N Y Y N D N Y Y Y Y
Y Y N N Y Y N N D Y

a. Prepare a frequency distribution table.

b. Calculate the relative frequencies and percentages for all categories.

c. What percentage of the elements in this sample belong to category Y?

d. What percentage of the elements in this sample belong to category N or D?

e. Draw a pie chart for the percentage distribution.

(5) For the Park City basketball team, scores for the last 30 games were
as follows (smallest to largest):
32; 32; 33; 34; 38; 40; 42; 42; 43; 44; 46; 47; 47; 48; 48; 48; 49; 50; 50;
51; 52; 52; 52; 53; 54; 56; 57; 57; 60; 61.
Construct a stem plot for the data.

(6) The following scores were made on a 53-item test:
25 30 34 37 41 42 46 49 53
26 31 34 37 41 42 46 50 53
28 31 35 37 41 43 47 51 54
29 32 36 38 41 44 48 52 54
30 33 36 39 41 44 48 52 55
30 33 37 40 42 45 48 52

1- Set up a frequency table for the above data, then calculate the Relative
Frequency and the percentage frequency.

2- Draw a histogram for the data using class interval of size 6.


3- Draw the frequency polygon for this histogram
4- Find the cumulative frequencies table.
5- Draw the cumulative frequency curve.

Chapter 2
Measures of Central Tendency
These include the mean, median and mode. Each of these measures gives a
single value that locates the center (typical value) of the data on the
number line.
2-1-Mean
The mean, also called the arithmetic mean, is the most frequently used
measure of central tendency.

For ungrouped data


The mean is obtained by dividing the sum of all values by the number of
values in the data set:

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

The mean calculated for sample data is denoted by 𝑋̅ (read as "x bar"),
and the mean calculated for population data is denoted by 𝜇 (Greek letter
mu).

$$\text{Mean} = \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

Example 1
Determine the mean mark of a class test using the data 28, 35, 18, 40, 62,
50 and 70.
Solution:
$$\bar{X} = \frac{28 + 35 + 18 + 40 + 62 + 50 + 70}{7} = \frac{303}{7} \approx 43.3$$
Example 2
The following are the ages (in years) of eight patients: 53, 32, 61, 27, 39,
44, 49, 57. Find the mean age of these patients.

Solution

$$\bar{X} = \frac{53 + 32 + 61 + 27 + 39 + 44 + 49 + 57}{8} = \frac{362}{8} = 45.25 \text{ years}$$
Example 3
The heights of 5 people are 142 cm, 150 cm, 149 cm, 156 cm, and 153 cm.
Find the mean height.

Solution
$$\text{Mean height} = \bar{X} = \frac{142 + 150 + 149 + 156 + 153}{5} = \frac{750}{5} = 150 \text{ cm}$$
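The ungrouped-mean formula translates directly into Python. The sketch below (the function name `mean` is illustrative) reproduces Examples 1 to 3; the standard library's `statistics.mean` would give the same results.

```python
def mean(values):
    """Arithmetic mean: the sum of all values divided by the number of values."""
    return sum(values) / len(values)

print(round(mean([28, 35, 18, 40, 62, 50, 70]), 1))   # Example 1: 43.3
print(mean([53, 32, 61, 27, 39, 44, 49, 57]))         # Example 2: 45.25
print(mean([142, 150, 149, 156, 153]))                # Example 3: 150.0
```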

For grouped data

$$\text{Mean} = \bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i},$$

where: $x_i$ is the mid-point of the $i$th class,
$f_i$ is the frequency of the $i$th class.

Example 1
Calculate the mean

Classes 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64
Frequency 6 6 4 3 3 6 5 1 6

Solution:
Classes Frequency (fi) Mid-point (xi) fi xi
20-24 6 22 132
25-29 6 27 162
30-34 4 32 128
35-39 3 37 111
40-44 3 42 126
45-49 6 47 282
50-54 5 52 260
55-59 1 57 57
60-64 6 62 372
Sum 40 1630

$$\bar{X} = \frac{1630}{40} = 40.75$$
Example 2
The following table gives the number of patients visiting a hospital per day
over a month.

Find the average number of patients visiting the hospital in a day.

Number of patients visiting hospital Number of days
0-10 2
10-20 6
20-30 9
30-40 7
40-50 4
50-60 2

Solution

In this case, we find the class mark (also called the mid-point of a class) for
each class:

$$\text{Class mark (mid-point)} = \frac{\text{lower limit} + \text{upper limit}}{2}$$

Hence, we get the following table:

Classes Class mark (xi) Frequency (fi) xi fi
0-10 5 2 10
10-20 15 6 90
20-30 25 9 225
30-40 35 7 245
40-50 45 4 180
50-60 55 2 110
Sum 30 860

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{860}{30} \approx 28.67$$
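Both grouped-data examples can be checked with the Python sketch below. It is an illustration under the assumption that each class is given by its (lower, upper) limits, from which the mid-point is computed exactly as in the tables; `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), where x_i is the class mid-point."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)

# Example 1: classes 20-24, 25-29, ..., 60-64
ex1 = grouped_mean([(lo, lo + 4) for lo in range(20, 61, 5)],
                   [6, 6, 4, 3, 3, 6, 5, 1, 6])
# Example 2: classes 0-10, 10-20, ..., 50-60 (patients per day)
ex2 = grouped_mean([(lo, lo + 10) for lo in range(0, 51, 10)],
                   [2, 6, 9, 7, 4, 2])

print(ex1)               # 40.75
print(round(ex2, 2))     # 28.67
```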

Advantages and disadvantages of the mean:

Advantages:

• Uniqueness. For a given set of data there is one and only one
mean.
• Simplicity. It is easy to understand and to compute.
• The mean takes into account all values of the data.

Disadvantages:

• It is affected by extreme values, since all values enter into the
computation.

Example:
Sample Data Mean
A 2, 4, 5, 7, 7, 10 5.83
B 2, 4, 5, 7, 7, 100 20.83

• The mean can only be found for quantitative variables.
2-2 Median
The median is the value of the middle term in a data set that has been
ranked in increasing order, i.e., the value of the variable which divides
the ranked data into two equal parts, such that half of the data lie
before it and the other half lie after it.

For ungrouped data


The calculation of the median consists of the following two steps:
Step 1: Arrange the data in ascending or descending order.

Step 2: Let the total number of observations be n. To find the median, we
need to consider whether n is odd or even:

(i) If the number of observations in a data set is odd, then the median is
given by the value of the middle term in the ranked data.

(ii) If the number of observations is even, then the median is given by
the average of the values of the two middle terms.

$$\tilde{X} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & n \text{ is odd} \\ \frac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right] & n \text{ is even} \end{cases}$$

where $x_{(k)}$ denotes the $k$th value in the ranked data.

Example 1
As part of upgrading a hospital, the following data give the prices of seven
new pieces of medical equipment: 312, 257, 421, 289, 526, 374, 497. Find the
median.

Solution
First, we rank the given data in increasing order as follows:

257, 289, 312, 374, 421, 497, 526.

Since there are seven values, the middle term is the fourth term, so
the median = the value of the fourth term in the ranked data = 374.

Example 2
Determine the median of
(i) 12 15 10 11 16 18 14
(ii) 3 7 9 10 13 12 8 14

Solution

The data are arranged in ascending order.

(i) 10, 11, 12, 14, 15, 16, 18 (n = 7, odd)
$$\text{Order of the median} = \frac{n+1}{2} = \frac{7+1}{2} = 4,$$
so the median is the fourth term. The middle value is 14.
Median = 14

(ii) 3, 7, 8, 9, 10, 12, 13, 14 (n = 8, even)
$$\text{Orders of the median: } \frac{n}{2} = \frac{8}{2} = 4 \quad \text{and} \quad \frac{n}{2} + 1 = \frac{8}{2} + 1 = 5$$
(the fourth and fifth terms). The middle values are 9 and 10.
$$\tilde{X} = \frac{9 + 10}{2} = 9.5$$

Example 3
Consider the data: 82, 56, 67, 54, 34, 78, 29, 43, 23. What is the
median?

Solution

Arranging in ascending order, we get: 23, 29, 34, 43, 54, 56, 67, 78, 82.

n (no. of observations) = 9.
$$\text{Order of the median} = \frac{n+1}{2} = \frac{9+1}{2} = 5,$$
so the median is the fifth term.
$$\tilde{X} = 54$$

Example 4
Consider the data: 50, 67, 24, 34, 78, 43. What is the median?

Solution

Arranging in ascending order, we get: 24, 34, 43, 50, 67, 78.

n (no. of observations) = 6.
$$\text{Orders of the median: } \frac{n}{2} = \frac{6}{2} = 3 \quad \text{and} \quad \frac{n}{2} + 1 = \frac{6}{2} + 1 = 4,$$
so the median is the average of the third and fourth terms.

$$\tilde{X} = \frac{43 + 50}{2} = 46.5$$
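The odd/even rule for ungrouped data can be written as a small Python function. This is a sketch for illustration (the name `median` is ours, not the notes'); note the `- 1` because Python lists are indexed from 0 while the ranked terms are counted from 1. The standard library's `statistics.median` implements the same rule.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule above."""
    x = sorted(values)                       # Step 1: rank the data
    n = len(x)
    if n % 2 == 1:                           # n odd: the ((n + 1) / 2)-th term
        return x[(n + 1) // 2 - 1]
    # n even: average of the (n/2)-th and (n/2 + 1)-th terms
    return (x[n // 2 - 1] + x[n // 2]) / 2

print(median([312, 257, 421, 289, 526, 374, 497]))   # 374
print(median([3, 7, 9, 10, 13, 12, 8, 14]))          # 9.5
print(median([50, 67, 24, 34, 78, 43]))              # 46.5
```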

For grouped data

The main steps are:

(i) Construct the cumulative frequency distribution.

(ii) Find N/2, the order of the median.

(iii) The class whose cumulative frequency first reaches or exceeds N/2 is
called the median class.

(iv) Find the median by using the formula:

$$\tilde{X} = L + \frac{\frac{N}{2} - cf}{f} \times c$$

where
L = lower limit of the median class,
f = frequency of the median class,
c = width of the median class,
N = total frequency (∑f),
cf = cumulative frequency of the class preceding the median class.

Example 1

Find the median marks for the following distribution:

Classes 0-10 10-20 20-30 30-40 40-50
Number of students 2 12 22 8 6

Solution

We need to calculate the cumulative frequencies to find the median.

Classes Frequency Cumulative frequency
0-10 2 2
10-20 12 2 + 12 = 14
20-30 22 14 + 22 = 36
30-40 8 36 + 8 = 44
40-50 6 44 + 6 = 50

N = 50, so N/2 = 25.
The median class is 20-30 (the first class whose cumulative frequency, 36,
reaches 25).

L = 20, f = 22, cf = 14, c = 10

$$\tilde{X} = L + \frac{\frac{N}{2} - cf}{f} \times c = 20 + \frac{25 - 14}{22} \times 10 = 25$$
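The four steps for the grouped median can be sketched in Python as below. The sketch assumes classes of equal width, given by their lower limits in increasing order; `grouped_median` is a hypothetical helper, not a library function. The second call uses the data of Example 3 in the Mode section, whose median is 27.2.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data: L + ((N/2 - cf) / f) * c."""
    half = sum(freqs) / 2                    # N/2
    cf = 0                                   # cumulative frequency before the current class
    for L, f in zip(lower_limits, freqs):
        if cf + f >= half:                   # first class whose cumulative f reaches N/2
            return L + (half - cf) / f * width
        cf += f
    raise ValueError("empty frequency table")

print(grouped_median([0, 10, 20, 30, 40], 10, [2, 12, 22, 8, 6]))               # 25.0
print(round(grouped_median([0, 10, 20, 30, 40], 10, [8, 16, 36, 34, 6]), 2))    # 27.22
```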

Advantages and disadvantages of the median:

Advantages:
• Uniqueness. For a given set of data there is one and only one
median.
• Simplicity. It is easy to calculate.
• Unlike the mean, it is not affected by extreme values.

Disadvantages
• The median does not take into account all values of the sample.
2-3 Mode
The mode is the value that appears most often in the given data, i.e., the
observation with the highest frequency.

Note:

1- A data set may have no mode, 1 mode, or more than 1 mode.

2- Depending upon the number of modes the data has, it can be called
unimodal, bimodal, trimodal, or multimodal.

For ungrouped data:

For ungrouped data, we just need to identify the observation which
occurs the maximum number of times.

Mode = Observation with maximum frequency

Example 1
In the data 6, 8, 9, 3, 4, 6, 7, 6, 3, the value 6 appears the most
times (three times).
Thus, mode = 6.

Example 2
Find the mode of 5, 3, 5, 8, 9
Solution
Mode =5.

Example 3
Find the mode of 8, 9, 9, 7, 8, 2, and 5.
Solution
The data are bimodal: 8 and 9 each appear twice.

Example 4:
Find the mode of 4, 12, 3, 6, and 7.
Solution
No mode for this data.

Example 5:
Find the mode of {19, 8, 29, 35, 19, 28, 15}
Solution
19 appears twice, all the rest appear only once, so 19 is the mode.

Example 6:
Find the mode of: {1, 3, 3, 3, 4, 4, 6, 6, 6, 9}
Solution
3 appears three times, as does 6.
So there are two modes: at 3 and 6.
Having two modes is called "bimodal".
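A Python sketch of the ungrouped mode that handles all three cases above (one mode, several modes, no mode). The rule "no mode when every value occurs only once" is the convention assumed here, matching Example 4. The standard library's `statistics.multimode` returns the same lists except in that case, where it returns every value instead of an empty list.

```python
from collections import Counter

def modes(values):
    """All values with the highest frequency; [] when every value occurs once (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([6, 8, 9, 3, 4, 6, 7, 6, 3]))        # [6]
print(modes([8, 9, 9, 7, 8, 2, 5]))              # [8, 9]  (bimodal)
print(modes([4, 12, 3, 6, 7]))                   # []      (no mode)
print(modes([1, 3, 3, 3, 4, 4, 6, 6, 6, 9]))     # [3, 6]
```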

For grouped data


When the data is continuous, the mode can be found using the
following steps:

Step 1: Find the modal class, i.e., the class with the maximum frequency.

Step 2: Find the mode using the following formula:

$$\text{Mode} = \hat{X} = L + \left[\frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)}\right] h$$

where:
L = lower limit of the modal class,
𝑓𝑚 = frequency of the modal class,
𝑓1 = frequency of the class preceding the modal class,
𝑓2 = frequency of the class succeeding the modal class,
h = width of the modal class.
Example 1

Find the mode of the given data:

Marks obtained 0-20 20-40 40-60 60-80 80-100
Number of students 5 10 12 6 3

Solution

The highest frequency = 12, so the modal class is 40-60.

L = lower limit of modal class = 40
𝑓𝑚 = frequency of modal class = 12
𝑓1 = frequency of class preceding modal class = 10
𝑓2 = frequency of class succeeding modal class = 6
h = class width = 20

Using the mode formula,

$$\text{Mode} = L + \left[\frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)}\right] h = 40 + \left[\frac{12 - 10}{(12 - 10) + (12 - 6)}\right] \times 20 = 40 + \frac{2}{8} \times 20 = 45$$

Example 2
The heights, in cm, of 50 students are recorded:

Height (in cm) 125-130 130-135 135-140 140-145 145-150
Number of students 7 14 10 10 9

Calculate the mode.

Solution

Here, the maximum frequency is 14 and the corresponding class is 130-135.
So, 130-135 is the modal class.

L = 130, h = 5, 𝑓𝑚 = 14, 𝑓1 = 7 and 𝑓2 = 10.

$$\text{Mode} = L + \left[\frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)}\right] h = 130 + \left[\frac{14 - 7}{(14 - 7) + (14 - 10)}\right] \times 5 = 130 + \frac{35}{11} \approx 133.18$$

Hence, the modal height ≈ 133.18 cm.
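The grouped-mode formula can be sketched in Python as follows. Two assumptions, not taken from the notes: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and if two classes tie for the highest frequency the first one is used. `grouped_mode` is a hypothetical helper name.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of grouped data: L + [(fm - f1) / ((fm - f1) + (fm - f2))] * h."""
    i = freqs.index(max(freqs))                      # modal class (first one if tied)
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                # preceding class (0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # succeeding class (0 if none)
    return lower_limits[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * width

print(grouped_mode([0, 20, 40, 60, 80], 20, [5, 10, 12, 6, 3]))                     # 45.0
print(round(grouped_mode([125, 130, 135, 140, 145], 5, [7, 14, 10, 10, 9]), 2))     # 133.18
```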

Example 3
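Find the mean, mode and median for the following data:
Class 0-10 10-20 20-30 30-40 40-50
Frequency 8 16 36 34 6

Solution:
Class fi xi Cumulative frequency xi fi
0-10 8 5 8 40
10-20 16 15 24 240
20-30 36 25 60 900
30-40 34 35 94 1190
40-50 6 45 100 270
Sum 100 2640

$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{2640}{100} = 26.4$$

Here, N = 100 ⇒ N/2 = 50.
The cumulative frequency just greater than 50 is 60, and the corresponding
class is 20-30. Thus, the median class is 20-30.
Hence, L = 20, c = 10, f = 36, cf of the preceding class = 24 and N/2 = 50.

$$\text{Median} = \tilde{X} = L + \frac{\frac{N}{2} - cf}{f} \times c = 20 + \frac{50 - 24}{36} \times 10 \approx 27.2$$

Median = 27.2.
Mode = 28.8. (This value agrees with the empirical relation
Mode ≈ 3 × Median − 2 × Mean = 3(27.2) − 2(26.4) = 81.6 − 52.8 = 28.8;
the grouped-data mode formula given above, with modal class 20-30,
𝑓𝑚 = 36, 𝑓1 = 16 and 𝑓2 = 34, gives 20 + [20/(20 + 2)] × 10 ≈ 29.09.)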

Advantages and disadvantages of the mode

Advantages:
• Simplicity. It is easy to calculate.
• It may be used for both quantitative and qualitative data.
• It is not affected by extreme values.

Disadvantages:
• There might be no mode or more than one mode.
• The mode does not take into account all the values of the sample.
Chapter 3
Measures of Dispersion
The measures of central tendency, such as the mean, median, and mode,
do not reveal the whole picture of the distribution of a data set. Two data
sets with the same mean may have completely different spreads. The
variation among the values of observations for one data set may be much
larger or smaller than for the other data set. (Note that the words
dispersion, spread, and variation have the same meaning.)

Thus, the mean, median, or mode by itself is usually not a sufficient
measure to reveal the shape of the distribution of a data set.

We also need a measure that can provide some information about the
variation among data values. The measures that help us learn about the
spread of a data set are called the measures of dispersion. The measures
of central tendency and dispersion taken together give a better picture of
a data set than the measures of central tendency alone.

Measures of dispersion are used to describe how the observations are spread
out around the mean. They include the range, mean deviation, quartiles,
percentiles, deciles, variance and standard deviation.

3-1-The range
It is the simplest measure of dispersion to calculate. It is obtained by
taking the difference between the largest and the smallest values in a
data set.

The Range for Ungrouped Data

Range = Largest value − Smallest value

Example 1
For each set S below, find the range.
(i) S = {12, 17, 21, 14, 23, 19}
(ii) S = {43, 50, 64, 74, 85, 67, 79, 38}

Solution:
(i) The range of S = 23 − 12 = 11.
(ii) The range of S = 85 − 38 = 47.
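The range is easy to compute by hand, but a short Python sketch (Python is chosen only for illustration; the notes contain no code) makes the definition concrete and also shows the point made at the start of this chapter: two data sets can share a mean yet differ widely in spread. The function name `data_range` and the two made-up data sets `tight` and `wide` are hypothetical.

```python
def data_range(values):
    """Range of ungrouped data: largest value minus smallest value."""
    return max(values) - min(values)


def mean(values):
    return sum(values) / len(values)


# The two sets from Example 1
s1 = [12, 17, 21, 14, 23, 19]
s2 = [43, 50, 64, 74, 85, 67, 79, 38]
print(data_range(s1), data_range(s2))  # 11 47

# Two made-up sets with the same mean (50) but very different spreads
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]
print(mean(tight), mean(wide))              # 50.0 50.0
print(data_range(tight), data_range(wide))  # 4 80
```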

3-2-The standard deviation

The standard deviation is the most-used measure of dispersion. The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

In general, a lower value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively smaller range around the mean.

In contrast, a larger value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively larger range around the mean.

The standard deviation (σ) is obtained by taking the positive square root of the variance.

The variance calculated for population data is denoted by σ² (read as "sigma squared").

For Ungrouped data

The Variance

The variance is a measure that uses the mean as a point of reference.

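$$\text{Variance } (\sigma^2) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n},$$

where:
x₁, x₂, …, xₙ are the sample values,
X̄ is the sample mean,
n is the sample size.

Dividing by n, as here, gives the population variance; when a sample is used to estimate the variance of a larger population, many books and statistical programs divide by n − 1 instead (the sample variance s²).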

The standard deviation is the square root of the variance:

$$\text{Standard deviation } (\sigma) = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n}}$$
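A minimal Python sketch of the ungrouped formulas above, assuming the data are held in a plain list; the helper names `population_variance` and `population_sd` are illustrative, not from the notes. It divides by n exactly as the formula does.

```python
import math


def population_variance(values):
    """sigma^2 = sum((x_i - mean)^2) / n, the formula given above."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_sd(values):
    """sigma = positive square root of the variance."""
    return math.sqrt(population_variance(values))


data = [43, 46, 50, 53, 57, 61]  # the data of Example 2
print(round(population_variance(data), 4))  # 37.8889
print(round(population_sd(data), 4))        # 6.1554
```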

Example 2
Find the variance of 43, 46, 50, 53, 57, 61.
Solution

x (x − x̄) (x − x̄)²
43 -8.66 74.9956
46 -5.66 32.0356
50 -1.66 2.7556
53 1.34 1.7956
57 5.34 28.5156
61 9.34 87.2356
Sum 227.3336

$$\text{Mean} = \bar{x} = \frac{43+46+50+53+57+61}{6} = \frac{310}{6} = 51.666\ldots$$

$$\text{Variance} = \frac{227.3336}{6} \approx 37.889.$$

The standard deviation = √37.889 ≈ 6.155.

Example 3
Consider the following set of data: 12,15,11,17,18,20,19. Find the standard
deviation and the variance.

Solution

xᵢ xᵢ − X̄ (xᵢ − X̄)²
12 -4 16
15 -1 1
11 -5 25
17 1 1
18 2 4
20 4 16
19 3 9
Sum 112 72
$$\bar{X} = \frac{112}{7} = 16.$$

$$\text{Variance } (\sigma^2) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} = \frac{72}{7} \approx 10.3$$

Standard deviation = √(72/7) ≈ 3.21.
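Python's standard `statistics` module can cross-check Example 3. Note the naming: `pvariance` and `pstdev` divide by n, matching these notes, while `variance` and `stdev` divide by n − 1 (the sample variance). A sketch:

```python
import statistics

data = [12, 15, 11, 17, 18, 20, 19]  # Example 3

# pvariance and pstdev divide by n, as in the formulas of these notes.
print(round(statistics.pvariance(data), 4))  # 10.2857
print(round(statistics.pstdev(data), 2))     # 3.21

# variance and stdev divide by n - 1 (the sample variance s^2): 72 / 6.
print(round(float(statistics.variance(data)), 2))  # 12.0
```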

Coefficient of variation
It is sometimes useful to describe variability by expressing the standard deviation as a proportion of the mean, usually as a percentage:

$$\text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100.$$
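As a sketch, the coefficient of variation can be computed with the population standard deviation used throughout this chapter. Because it has no units, it is often used to compare the variability of measurements made on different scales. The function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population SD, as in these notes)."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


data = [12, 15, 11, 17, 18, 20, 19]  # Example 3: mean 16, SD about 3.21
print(round(coefficient_of_variation(data), 2))  # 20.04
```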
For Grouped data
The variance:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{X})^2$$

The standard deviation:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{X})^2}$$

where
xᵢ is the midpoint of each class,
X̄ is the arithmetic mean,
fᵢ is the frequency of each class, and
N = Σfᵢ is the total frequency.
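The grouped formulas translate directly into code once each class is represented by its midpoint. This Python sketch (function name hypothetical) assumes the midpoints have already been computed and checks itself against Example 4:

```python
import math


def grouped_mean_sd(midpoints, freqs):
    """Mean, variance and SD of grouped data from class midpoints x_i and frequencies f_i."""
    N = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / N
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / N
    return mean, variance, math.sqrt(variance)


# Example 4: classes 1-3, 3-5, 5-7, 7-9 have midpoints 2, 4, 6, 8.
print(grouped_mean_sd([2, 4, 6, 8], [40, 30, 20, 10]))  # (4.0, 4.0, 2.0)
```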

Example 4
Calculate the standard deviation and the variance of the following distribution of marks:

Marks 1-3 3-5 5-7 7-9
No. of students 40 30 20 10

Solution
Marks fᵢ xᵢ xᵢfᵢ (xᵢ − X̄)² fᵢ(xᵢ − X̄)²
1-3 40 2 80 4 160
3-5 30 4 120 0 0
5-7 20 6 120 4 80
7-9 10 8 80 16 160
Sum 100 400 400

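$$\bar{X} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{400}{100} = 4.$$

The variance:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{X})^2 = \frac{400}{100} = 4.$$

The standard deviation:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{X})^2} = \sqrt{4} = 2.$$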

Example 5
For the distribution given below, find the standard deviation.
Classes 10-12 13-15 16-18 19-21 22-24 25-27

f 3 8 12 13 10 4

Solution

Classes f x xf (xᵢ − X̄)² fᵢ(xᵢ − X̄)²
10-12 3 11 33 61.7796 185.3388
13-15 8 14 112 23.6196 188.9568
16-18 12 17 204 3.4596 41.5152
19-21 13 20 260 1.2996 16.8948
22-24 10 23 230 17.1396 171.396
25-27 4 26 104 50.9796 203.9184
Sum 50 943 808.02

$$\bar{X} = \frac{943}{50} = 18.86.$$

$$\sigma^2 = \frac{808.02}{50} = 16.1604, \qquad \sigma = \sqrt{16.1604} \approx 4.02.$$
Example 6
Calculate the standard deviation and the variance from the following data
Classes 10-14 15-19 20-24 25-29 30-34 35-39
Frequency 2 5 8 12 7 6
Solution
Classes fᵢ xᵢ xᵢfᵢ (xᵢ − X̄) (xᵢ − X̄)² fᵢ(xᵢ − X̄)²
10-14 2 12 24 -14.375 206.64 413.281
15-19 5 17 85 -9.375 87.89 439.453
20-24 8 22 176 -4.375 19.14 153.125
25-29 12 27 324 0.625 0.39 4.688
30-34 7 32 224 5.625 31.64 221.484
35-39 6 37 222 10.625 112.89 677.344
Sum 40 1055 1909.375

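$$\bar{X} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{1055}{40} = 26.375.$$

The variance:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{X})^2 = \frac{1909.375}{40} = 47.734375.$$

The standard deviation:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{X})^2} = \sqrt{47.734375} \approx 6.909.$$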

Example 7:
Calculate the standard deviation from the frequency table:
Class 5-10 10-15 15-20 20-25
Frequency 5 6 15 10
Solution

Class Freq. Midpoint x f·x x − x̄ f(x − x̄)²
5-10 5 7.5 37.5 -9.17 420.14
10-15 6 12.5 75 -4.17 104.17
15-20 15 17.5 262.5 0.83 10.42
20-25 10 22.5 225 5.83 340.28
Sum 36 600 875

$$\bar{X} = \frac{600}{36} \approx 16.67$$

$$\sigma^2 = \frac{875}{36} \approx 24.31, \qquad \sigma = \sqrt{\frac{875}{36}} \approx 4.93.$$
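A sketch that also derives the class midpoints from class labels written as "lower-upper" strings, and recomputes Examples 5, 6 and 7 in one go. It assumes each label contains exactly one hyphen and non-negative limits; the function name `grouped_stats` is hypothetical. For Example 7 it uses the exact mean 600/36 rather than a truncated value, so the variance comes out as 875/36.

```python
import math


def grouped_stats(classes, freqs):
    """Mean, variance and SD of grouped data; classes are "lower-upper" strings."""
    # Class midpoint = (lower limit + upper limit) / 2
    mids = [sum(float(part) for part in c.split("-")) / 2 for c in classes]
    N = sum(freqs)
    mean = sum(m * f for m, f in zip(mids, freqs)) / N
    var = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / N
    return mean, var, math.sqrt(var)


examples = {
    "Example 5": (["10-12", "13-15", "16-18", "19-21", "22-24", "25-27"], [3, 8, 12, 13, 10, 4]),
    "Example 6": (["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"], [2, 5, 8, 12, 7, 6]),
    "Example 7": (["5-10", "10-15", "15-20", "20-25"], [5, 6, 15, 10]),
}
for name, (classes, freqs) in examples.items():
    mean, var, sd = grouped_stats(classes, freqs)
    print(name, round(mean, 3), round(var, 3), round(sd, 2))
```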

Exercise (2)
(I) Find the arithmetic mean, median, mode, variance and standard deviation of the following ungrouped data:

1- 43, 47, 56, 66, 78, 88, 95, 101, 105 and 110
2- 21, 30, 38, 45, 50, 56, 71, 82 and 87.

3- 60, 63, 69, 71, 78, 80 and 81


4- 15, 33,19, 28, 32, 33 and 35.

5- 3,6,5,8,6,5,5,4,96.

(II) Find the arithmetic mean, median, mode, variance and standard deviation of the following grouped data:

1-

Class 16-18 18-20 20-22 22-24 24-26 26-28


Frequency 40 120 140 70 19 11

2-

Class 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64

Frequency 6 6 4 3 3 6 5 1 6
3-

Class 35-40 40-45 45-50 50-55 55-60


Frequency 5 15 38 29 29

4-

Class 15-25 25-35 35-45 45-55 55-65


Frequency 5 25 40 20 10

5-

Class 10-20 20-30 30-40 40-50 50-60


Frequency 8 32 60 40 10

Chapter 4
CORRELATION AND REGRESSION

4-1-CORRELATION
Correlation coefficients measure the strength of the relationship between
two variables. A correlation between variables indicates that as one
variable changes in value, the other variable tends to change in a specific
direction. Understanding that relationship is useful because we can use
the value of one variable to predict the value of the other variable. For
example, height and weight are correlated—as height increases, weight
also tends to increase. Consequently, if we observe an individual who is
unusually tall, we can predict that his weight is also above the average.

In statistics, a correlation coefficient is a quantitative assessment that measures both the direction and the strength of this tendency to vary together.

Types of Correlation (Type I):


Positive (Direct) Correlation:
If y tends to increase as x increases, there is a positive correlation.
Negative(Inverse)Correlation:
If y tends to decrease as x increases, there is a negative correlation.

Zero Correlation:
If there is no relationship between x and y then there is zero or no
correlation.

Examples of Positive, Negative and Zero Correlation Coefficients

Examples of positive correlation

1- The speed of a wind turbine and the amount of energy it produces: as the turbine speed increases, electricity production also increases.

2- Water consumption and temperature.

3- Study time and grades.

4- The more time you spend on a project, the more effort you will have put in.

5- The more overtime you work, the more money you will earn.

Examples of negative correlation

1- Outdoor temperature and heating costs: as the temperature increases, heating costs decrease.

2- Alcohol consumption and driving ability.

3- Price and quantity demanded.

4- As the number of your employees decreases, the number of open job positions increases.
5- The more you work in the office, the less time you'll spend at home.

Examples of No correlation

1- The more nicely you treat your employees, the higher their pay will be.

2- The smarter you are, the later you will arrive at work.

3- The wealthier you are, the happier you will be.

4- The earlier you arrive at work, the more supplies you need.

5- The more funds you invest in your business, the more employees will leave work early.

Scatter diagrams give a visual impression of correlation: if the pairs of values of the two variables are plotted as points in the xy-plane, we get a scatter diagram.

Important Notes:
1- The correlation coefficient lies between −1 and 1.
2- The correlation coefficient lies between 0 and 1 for a positive correlation and between −1 and 0 for a negative correlation.
3- If r = +1, the correlation between the two variables is said to be perfect and positive. If r = −1, the correlation between the two variables is said to be perfect and negative.

Some properties of the correlation coefficient r, r ∈ [−1, 1]:

r = 0 No correlation
r = 1 Perfect correlation
r = −1 Perfect inverse correlation
0 < r < 0.4 Direct weak correlation
−0.4 < r < 0 Inverse weak correlation
0.4 ≤ r ≤ 0.6 Direct moderate correlation
−0.6 ≤ r ≤ −0.4 Inverse moderate correlation
0.6 < r < 1 Direct strong correlation
−1 < r < −0.6 Inverse strong correlation
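The interpretation table above can be encoded as a small function. This is a sketch using exactly the cut-offs given in these notes (0.4 and 0.6); other textbooks use different thresholds, so the labels are a convention rather than a rule. The function name is hypothetical.

```python
def describe_correlation(r):
    """Label r with the thresholds used in these notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    if r == 1:
        return "perfect correlation"
    if r == -1:
        return "perfect inverse correlation"
    direction = "direct" if r > 0 else "inverse"
    size = abs(r)
    if size < 0.4:
        strength = "weak"
    elif size <= 0.6:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{direction} {strength} correlation"


print(describe_correlation(-0.92))  # inverse strong correlation
print(describe_correlation(0.14))   # direct weak correlation
print(describe_correlation(0.5))    # direct moderate correlation
```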

While correlation studies how two entities relate to one another, a correlation coefficient measures the strength of the relationship between the two variables.
In statistics, there are three commonly used types of correlation coefficients:

1- Pearson correlation: the most commonly used measure of a linear relationship between two variables. The stronger the correlation between the two data sets, the closer the coefficient is to +1 or −1.
2- Spearman correlation: measures the monotonic relationship or association between two data sets. Unlike the Pearson correlation coefficient, it is based on the ranked values of each data set, so it can be used with skewed or ordinal variables rather than only with normally distributed ones.
3- Kendall correlation: also rank-based; it measures the strength of the ordinal association (dependence) between two data sets by comparing the numbers of concordant and discordant pairs of observations.

Types of Correlation (Type II):

1- Simple correlation:
In simple correlation, only two variables are studied.

2- Multiple correlation:
In multiple correlation, three or more variables are studied.
3- Partial correlation:
The analysis recognizes more than two variables but considers only two of them, keeping the others constant.
4- Total correlation:
It is based on all the relevant variables, which is normally not feasible.

Types of Correlation (Type III):

1- Linear correlation:
Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other.

The graph of variables having a linear relationship forms a straight line.

2- Non-linear correlation:
The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
Types of Correlation
1- Based on the direction of change of the variables: positive correlation and negative correlation.
2- Based upon the number of variables studied: simple, multiple, partial and total correlation.
3- Based upon the constancy of the ratio of change between the variables: linear and non-linear correlation.

Pearson’s Correlation Coefficients Measure Linear Relationships

Pearson’s correlation coefficients measure only linear relationships. Consequently, if your data follow a curvilinear relationship, the correlation coefficient may fail to detect it.

Pearson’s coefficient of correlation, denoted by r, measures the degree of linear relationship between two variables, say x and y:

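$$r = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$

$$-1 \le r \le 1$$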
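The raw-sums formula can be coded directly. This sketch (function name hypothetical) follows the formula term by term and is checked on the data of Example 1. For large values, the centered form in the Important Note is numerically safer in floating-point arithmetic, because subtracting two large sums can lose precision.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the raw-sums formula above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


# Data of Example 1
x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]
print(round(pearson_r(x, y), 3))  # -0.92
```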

Example 1

Calculate correlation coefficient for the following data:

X 2 4 5 6 8 11
Y 18 12 10 8 7 5

Solution
X y XY X2 Y2
2 18 36 4 324
4 12 48 16 144
5 10 50 25 100
6 8 48 36 64
8 7 56 64 49
11 5 55 121 25
Sum 36 60 293 266 706

$$r = \frac{6 \times 293 - 36 \times 60}{\sqrt{6 \times 266 - (36)^2}\,\sqrt{6 \times 706 - (60)^2}} = \frac{-402}{\sqrt{300}\,\sqrt{636}} \approx -0.920.$$

The relation between x and y is inverse and strong.

Example 2:

Calculate Pearson’s correlation coefficient between the variables x and y for the following data, then determine its type.

x 2 -1 7 -8 5 -4 0 -5 8 -3
y 3 -4 7 -8 1 0 -3 -5 4 -1

Solution
x y xy x² y²
2 3 6 4 9
-1 -4 4 1 16
7 7 49 49 49
-8 -8 64 64 64
5 1 5 25 1
-4 0 0 16 0
0 -3 0 0 9
-5 -5 25 25 25
8 4 32 64 16
-3 -1 3 9 1
Sum 1 -6 188 257 190

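$$r = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$

$$r = \frac{10(188) - (1)(-6)}{\sqrt{10(257) - (1)^2}\,\sqrt{10(190) - (-6)^2}} = \frac{1886}{\sqrt{2569}\,\sqrt{1864}} \approx 0.86.$$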
The relation between x and y is direct and strong.

Important Note
Pearson’s correlation coefficient r does not change if we add a constant to (or subtract a constant from) all values of x, and also if we add or subtract another constant to or from all values of y.
So, if we put
X = x − x̄ and Y = y − ȳ,

where x̄ = Σx/n is the mean of x and ȳ = Σy/n is the mean of y, then

$$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\,\sqrt{n \sum Y^2 - (\sum Y)^2}}.$$

Since ΣX = ΣY = 0, this reduces to

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}},$$

which can make the calculation of r easier.
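The invariance property can be checked numerically. This sketch centers both variables and applies r = ΣXY / √(ΣX² ΣY²), the form the formula takes when ΣX = ΣY = 0, to the data of Example 3; shifting x and y by arbitrary constants (here −80 and +1000, chosen only for illustration) leaves r unchanged. Worked through, these data give ΣXY = −819 and r ≈ −0.74. The function name is hypothetical.

```python
import math


def centered_r(x, y):
    """r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)) with X = x - mean(x), Y = y - mean(y)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    X = [a - mx for a in x]
    Y = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(X, Y))
    return sxy / math.sqrt(sum(a * a for a in X) * sum(b * b for b in Y))


# Data of Example 3
x = [56, 62, 76, 77, 83, 86, 90, 92, 98]
y = [67, 44, 53, 48, 55, 42, 41, 34, 39]
r = centered_r(x, y)

# Shifting every x by one constant and every y by another leaves r unchanged.
r_shifted = centered_r([a - 80 for a in x], [b + 1000 for b in y])
print(round(r, 4), round(r_shifted, 4))  # -0.7365 -0.7365
```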

Example 3
Calculate Pearson’s correlation coefficient between the variables x and y for the
following data, then determine its type.

x 56 62 76 77 83 86 90 92 98
y 67 44 53 48 55 42 41 34 39

Solution
$$\bar{x} = \frac{\sum x}{n} = \frac{720}{9} = 80, \qquad X = x - 80.$$

$$\bar{y} = \frac{\sum y}{n} = \frac{423}{9} = 47, \qquad Y = y - 47.$$

x y X = x − 80 Y = y − 47 X² Y² XY
56 67 -24 20 576 400 -480
62 44 -18 -3 324 9 54
76 53 -4 6 16 36 -24
77 48 -3 1 9 1 -3
83 55 3 8 9 64 24
86 42 6 -5 36 25 -30
90 41 10 -6 100 36 -60
92 34 12 -13 144 169 -156
98 39 18 -8 324 64 -144
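Sum 720 423 0 0 1538 804 -819

$$r = \frac{9(-819) - (0)(0)}{\sqrt{9(1538) - 0^2}\,\sqrt{9(804) - 0^2}} = \frac{-819}{\sqrt{1538 \times 804}} \approx -0.74.$$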
The relation between x and y is inverse and strong.

Spearman’s Rank correlation coefficient


1- It is used to determine the correlation coefficient between two variables.
2- It depends on the ranks of the values, not on the values themselves.
3- It can therefore be used to determine the correlation between descriptive (qualitative, ordinal) data.
4- Spearman’s rank correlation coefficient is given by:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where d = rank of x − rank of y, and n is the number of pairs of observations.
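A sketch of Spearman's coefficient in Python, assuming the average-rank convention for ties described in the Remark after Example 4. Ranks are assigned from the largest value (rank 1), as in Example 4; ranking from the smallest instead gives the same rₛ, because every d only changes sign. With tied ranks, 1 − 6Σd²/(n(n² − 1)) is the usual approximation; computing Pearson's r on the averaged ranks gives the exact tie-corrected value. Function names are hypothetical.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1             # first position of v in the sorted list
        count = order.count(v)                 # how many times v is tied
        ranks.append(first + (count - 1) / 2)  # mean of first, ..., first + count - 1
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) with d = rank(x) - rank(y)."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Example 4 (no ties)
x4 = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
y4 = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]
print(round(spearman_rs(x4, y4), 3))  # 0.733

# Example 5 (ties in both x and y)
x5 = [68, 71, 75, 80, 77, 54, 65, 54, 50, 70]
y5 = [46, 60, 40, 36, 41, 36, 25, 31, 52, 58]
print(round(spearman_rs(x5, y5), 2))  # 0.14
```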

Example 4
Find the rank correlation coefficient from the following data:
X 17 13 15 16 6 11 14 9 7 12
Y 36 46 35 24 12 18 27 22 2 8

Solution
x y Rank x = R₁ Rank y = R₂ d = R₁ − R₂ d²
17 36 1 2 -1 1
13 46 5 1 4 16
15 35 3 3 0 0
16 24 2 5 -3 9
6 12 10 8 2 4
11 18 7 7 0 0
14 27 4 4 0 0
9 22 8 6 2 4
7 2 9 10 -1 1
12 8 6 9 -3 9
Sum 44
$$r_s = 1 - \frac{6(44)}{10(100 - 1)} = 0.733.$$
The correlation is direct and strong.
Remark:
If there are equal (tied) values, each of them is given the mean of the ranks they would occupy; with ties, the formula above is then an approximation.

Example 5
The following table gives the scores of 10 students in statistics (x) and anatomy (y). Find Spearman’s rank correlation coefficient and determine its type.
X 68 71 75 80 77 54 65 54 50 70
Y 46 60 40 36 41 36 25 31 52 58

Solution

x y Rank x = R₁ Rank y = R₂ d = R₁ − R₂ d²


68 46 5 7 -2 4
71 60 7 10 -3 9
75 40 8 5 3 9
80 36 10 3.5 6.5 42.25
77 41 9 6 3 9
54 36 2.5 3.5 -1 1
65 25 4 1 3 9
54 31 2.5 2 0.5 0.25

50 52 1 8 -7 49
70 58 6 9 -3 9
Sum 141.5

x has two equal values, 54 and 54, whose ranks are 2 and 3; their mean = (2 + 3)/2 = 2.5.

Also, y has two equal values, 36 and 36, whose ranks are 3 and 4; their mean = (3 + 4)/2 = 3.5.

$$r_s = 1 - \frac{6(141.5)}{10(100 - 1)} = 0.14.$$
There is a direct weak relation between the scores in statistics and anatomy.
Example 6
The following table gives the grades of 10 students in English (x) and biophysics (y). Find Spearman’s rank correlation coefficient and determine its type.
X pass good pass good good v.good pass v.good pass Good
Y good pass pass good v.good Exc. Pass good pass v.good

Solution
x y Rank x = R₁ Rank y = R₂ d = R₁ − R₂ d²
Pass Good 2.5 6 -3.5 12.25
Good Pass 6.5 2.5 4 16
Pass Pass 2.5 2.5 0 0
Good Good 6.5 6 0.5 0.25
Good v.good 6.5 8.5 -2 4
v.good Exc. 9.5 10 -0.5 0.25
Pass Pass 2.5 2.5 0 0
v.good Good 9.5 6 3.5 12.25
Pass pass 2.5 2.5 0 0
good v.good 6.5 8.5 -2 4
Sum 49
In the grades of English (x):
There are four equal values (pass); their ranks are 1, 2, 3 and 4, and their mean = (1 + 2 + 3 + 4)/4 = 2.5.

(good) is repeated four times; its ranks are 5, 6, 7 and 8, and their mean = (5 + 6 + 7 + 8)/4 = 6.5.

(v.good) is repeated twice; its ranks are 9 and 10, and their mean = (9 + 10)/2 = 9.5.

In the grades of biophysics (y):
There are four equal values (pass); their ranks are 1, 2, 3 and 4, and their mean = (1 + 2 + 3 + 4)/4 = 2.5.

(good) is repeated three times; its ranks are 5, 6 and 7, and their mean = (5 + 6 + 7)/3 = 6.

(v.good) is repeated twice; its ranks are 8 and 9, and their mean = (8 + 9)/2 = 8.5.

The single (Exc.) grade takes rank 10.

$$r_s = 1 - \frac{6(49)}{10(100 - 1)} \approx 0.7.$$
There is a direct and strong relation between scores of English and
biophysics.
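For ordinal data such as grades, the categories must first be put in order. This sketch assumes the ordering pass < good < v.good < Exc. implied by the ranks in Example 6, codes each grade by its position, ranks from the smallest (as the example does), and reproduces Σd² = 49. The mapping `GRADE_ORDER` and the helper name are assumptions of the sketch.

```python
# Assumed ordering of the grades, lowest to highest (pass < good < v.good < Exc.)
GRADE_ORDER = {"pass": 1, "good": 2, "v.good": 3, "exc.": 4}


def ascending_average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    order = sorted(values)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]


english = ["pass", "good", "pass", "good", "good", "v.good", "pass", "v.good", "pass", "Good"]
biophysics = ["good", "pass", "pass", "good", "v.good", "Exc.", "Pass", "good", "pass", "v.good"]

# Convert each grade to its position in the ordering (case-insensitive).
x = [GRADE_ORDER[g.lower()] for g in english]
y = [GRADE_ORDER[g.lower()] for g in biophysics]

rx, ry = ascending_average_ranks(x), ascending_average_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rx)                    # [2.5, 6.5, 2.5, 6.5, 6.5, 9.5, 2.5, 9.5, 2.5, 6.5]
print(sum_d2, round(rs, 2))  # 49.0 0.7
```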

4-2- REGRESSION
Regression analysis is a powerful statistical tool for predicting the value of one variable from the value of another (an unknown variable from a known one) when the two variables are related to each other.

Types of regression analysis


1-Simple linear regression.
2-Multiple linear regression.

3-Non-linear regression.
Simple linear regression
It is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know the value of the dependent variable at a certain value of the independent variable.

Determination of the regression line equation

Method of least squares to find the equation of the regression line of y on x

This method fits the best straight line to a set of points such that the sum of the squares of the vertical deviations of the points from the line (the differences between the observed and the fitted values of y) is as small as possible.

We have the values of the variables x and y, say:

(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)


Suppose that the equation of the regression line of y on x is
y = ax + b,
where:
1- b is a constant that gives the value of y when x = 0. It is called the y-intercept.

2- a is a constant indicating the slope of the regression line; it gives the change in y for a unit change in x. It is also called the regression coefficient of y on x.

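3- The calculation formulas for a and b are:

$$a = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2},$$

$$b = \frac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n}$$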
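The least-squares formulas for a and b can be sketched in Python as follows (function name hypothetical), using the data of Example 7 as a check:

```python
def least_squares_line(x, y):
    """Return (a, b) for the regression line y = a*x + b of y on x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    b = (sy - a * sx) / n                          # intercept
    return a, b


# Data of Example 7
x = [0, 1, 2, 3, 4]
y = [2, 3, 5, 4, 6]
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 0.9 2.2
print(round(a * 10 + b, 2))      # 11.2  (prediction at x = 10)
```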
Remark
• From basic algebra, recall that the slope is a number that describes the steepness of the line.
• The sign of the slope a determines whether the line slopes upward or downward, as shown in the figure below:
(i) If a > 0, the line slopes upward to the right.
(ii) If a = 0, the line is horizontal.
(iii) If a < 0, the line slopes downward to the right.

• More specifically, when the slope a is positive, an increase in x results in an increase in y, and when the slope a is negative, an increase in x results in a decrease in y.

Example 7

For the given data:


x 0 1 2 3 4
y 2 3 5 4 6

(a) Find the least squares regression line.

(b) Use the least squares regression line to estimate the value of y at x = 10.
Solution

x y xy x²
0 2 0 0
1 3 3 1
2 5 10 4
3 4 12 9
4 6 24 16
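Sum 10 20 49 30

$$a = \frac{5(49) - (10)(20)}{5(30) - (10)^2} = \frac{45}{50} = 0.9, \qquad b = \frac{20 - 0.9(10)}{5} = 2.2.$$
(a) The least squares regression line is:
y = 2.2 + 0.9x.
(b) At x = 10, y = 2.2 + 0.9(10) = 11.2.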

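Important Remarks:
1- If the equation of the regression line of y on x is y = ax + b, then a (the regression coefficient of y on x) and b are:

$$a = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad (1)$$

$$b = \frac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n}$$

And if the equation of the regression line of x on y is x = cy + d, then c (the regression coefficient of x on y) and d are:

$$c = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2} \qquad (2)$$

$$d = \frac{\sum_{i=1}^{n} x_i - c \sum_{i=1}^{n} y_i}{n}$$

2- We know that Pearson’s linear correlation coefficient between x and y is:

$$r = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}} \qquad (3)$$

From (1), (2) and (3): (1) and (2) have the same numerator, so their product is that numerator squared divided by the product of the two denominators, which is exactly the square of (3). Hence

$$r^2 = ac.$$

The correlation coefficient has the same sign as the two regression coefficients:
r = +√(ac) if a and c are positive,
r = −√(ac) if a and c are negative.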
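A numerical check of r² = ac on the data of Example 8. The sketch computes both regression coefficients from formulas (1) and (2) and attaches to √(ac) the common sign of a and c; the function name is illustrative.

```python
import math


def regression_slopes(x, y):
    """Return (a, c): slope of y on x and slope of x on y, from formulas (1) and (2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    numerator = n * sxy - sx * sy
    return numerator / (n * sxx - sx ** 2), numerator / (n * syy - sy ** 2)


# Data of Example 8
x = [5, 6, 4.5, 6.5, 7.5, 5.5, 4, 8]
y = [15, 12, 14, 13, 9, 13, 17, 9]
a, c = regression_slopes(x, y)
r = math.copysign(math.sqrt(a * c), a)  # r takes the common sign of a and c
print(round(a, 2), round(c, 2), round(r, 2))  # -1.86 -0.48 -0.95
```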
Example 8
x 5 6 4.5 6.5 7.5 5.5 4 8
y 15 12 14 13 9 13 17 9
From the previous table, find:
(i) The regression coefficient of y on x.
(ii) The regression coefficient of x on y.
(iii) From (i) and (ii), Pearson’s correlation coefficient between x and y.

Solution
n=8
x y x² y² xy
5 15 25 225 75
6 12 36 144 72
4.5 14 20.25 196 63
6.5 13 42.25 169 84.5
7.5 9 56.25 81 67.5
5.5 13 30.25 169 71.5
4 17 16 289 68
8 9 64 81 72
Sum 47 102 290 1354 573.5
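(i) The regression coefficient of y on x:

$$a = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2} = \frac{8(573.5) - 47(102)}{8(290) - 47^2} = \frac{-206}{111} \approx -1.86.$$

(ii) The regression coefficient of x on y:

$$c = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum y_i^2 - (\sum y_i)^2} = \frac{8(573.5) - 47(102)}{8(1354) - 102^2} = \frac{-206}{428} \approx -0.48.$$

(iii) Pearson’s correlation coefficient between x and y:

$$r = -\sqrt{ac} \approx -0.95.$$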
Example 9
The following scores represent a nurse’s assessment (x) and a physician’s assessment (y) of the condition of 10 patients at the time of admission to a trauma center.
x 18 13 18 15 10 12 8 4 7 3
y 23 20 18 16 14 11 10 7 6 4
(a) Draw a scatter diagram.
(b) Compute the linear regression equation by the least squares method.
(c) Predict the physician’s assessment for x = 9.
(d) Compute the sample correlation coefficient.

Solution
(a)

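(b) Σx = 108, Σy = 129, Σxy = 1672, Σx² = 1424, Σy² = 2027.

$$a = \frac{10(1672) - 108(129)}{10(1424) - 108^2} = \frac{2788}{2576} \approx 1.082, \qquad b = \frac{129 - 1.082(108)}{10} \approx 1.21,$$

y = 1.21 + 1.082x.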

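(c) At x = 9, y = 1.21 + 1.082(9) = 10.948.

(d)
$$r = \frac{10(1672) - 108(129)}{\sqrt{10(1424) - 108^2}\,\sqrt{10(2027) - 129^2}} = \frac{2788}{\sqrt{2576}\,\sqrt{3629}} \approx 0.912.$$
It is a strong direct correlation.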
Example 10
Use the following table to compute:
x 42 45 51 58 60 61 66 69 73 75
y 22 23 25 26 28 29 31 31 32 33
(i) Pearson’s correlation coefficient between the variables x and y, and determine
its type.
(ii) The equation of the regression line of y on x.
Solution
x y x² y² xy
42 22 1764 484 924
45 23 2025 529 1035
51 25 2601 625 1275
58 26 3364 676 1508
60 28 3600 784 1680
61 29 3721 841 1769
66 31 4356 961 2046
69 31 4761 961 2139
73 32 5329 1024 2336
75 33 5625 1089 2475
∑ 600 280 37146 7974 17187

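n = 10

(i)
$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - (\sum x_i)^2}\,\sqrt{n \sum y_i^2 - (\sum y_i)^2}} = \frac{10(17187) - 600(280)}{\sqrt{10(37146) - 600^2}\,\sqrt{10(7974) - 280^2}} = \frac{3870}{\sqrt{11460}\,\sqrt{1340}} \approx 0.99.$$

It is a direct strong correlation.

(ii)
$$a = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{3870}{11460} \approx 0.338,$$

$$b = \frac{(\sum y) - a(\sum x)}{n} = \frac{280 - 0.3377(600)}{10} \approx 7.74.$$

The equation of the regression line of y on x is
y = ax + b ≈ 0.338x + 7.74.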
Example 11
The following data give the relation between two variables, x (age) and y (number of cases of hypertension reported in clinic A), where:

𝑛 = 8, ∑ 𝑥 = 760, ∑ 𝑦 = 654, ∑ 𝑥 2 = 99000

∑ 𝑦 2 = 54464 𝑎𝑛𝑑 ∑ 𝑥𝑦 = 57620.

Find the regression coefficient of y on x.

Solution

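$$a = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{8(57620) - 760(654)}{8(99000) - 760^2} = \frac{-36080}{214400} \approx -0.17.$$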
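When only summary sums are given, as in Example 11, the coefficients can be computed without the raw data. This sketch (function name hypothetical) takes n, Σx, Σy, Σx² and Σxy; applied to the sums of Example 10 it gives a = 3870/11460 ≈ 0.338 and b ≈ 7.74.

```python
def slope_intercept_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Regression line of y on x (y = a*x + b) computed from summary sums only."""
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


# Example 11: only the sums are given
a11, b11 = slope_intercept_from_sums(8, 760, 654, 99000, 57620)
print(round(a11, 2))  # -0.17

# Example 10: sums from its table
a10, b10 = slope_intercept_from_sums(10, 600, 280, 37146, 17187)
print(round(a10, 3), round(b10, 2))  # 0.338 7.74
```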

Exercise (3)
For each of the following tables, calculate Pearson's correlation coefficient between x and y and the rank correlation coefficient, and determine the linear regression equation:

1-

x 2 5 1 3 4 1 5
y 24 28 22 26 25 24 26

2-

x 0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4


y 10 15 30 35 25 30 50 45

3-

x 30 20 10 30 10
y 0.9 0.8 0.5 1 0.8

4-

x 45 30 90 60 105 65 90 80 55 75
y 40 35 75 65 90 50 90 80 45 65
Chapter 5
Normal Distribution
Continuous Distribution
A continuous random variable is a variable whose possible values form
some interval of numbers.
Typically, a continuous variable involves a measurement of something, such as the height of a person, the weight of a newborn baby, or the length of time a car battery lasts. Continuous curves such as the one shown in the graph are the graphs of functions called probability densities or, informally, continuous distributions.
Probability densities are characterized by the fact that the area under the
curve between any two values a and b gives the probability that a random
variable having this continuous distribution will take on a value on the
interval from a to b.

Normal distribution
The normal distribution is the most widely known and used of all
distributions. Because the normal distribution approximates many natural
phenomena so well, it has developed into a standard of reference
for many probability problems.

Characteristics of the Normal distribution

(1) It is the distribution of a continuous random variable X whose range is (−∞, ∞); its probability density function is a bell-shaped curve that depends on two values: μ (the mean) and σ (the standard deviation) of X.

(2) Normal distributions are symmetric about their mean: the line x = μ divides the area under the curve into two equal parts.

(3) The mean, median, and mode of a normal distribution are equal.

(4) The area under the normal curve is equal to 1.0.

(5) Normal distributions are denser in the center and less dense in the tails.

(6) Normal distributions are defined by two parameters, the mean (μ) and
the standard deviation (σ).

(7) Note that the normal distribution is actually a family of distributions, since each pair of values of μ and σ gives a different normal curve (μ sets its center and σ its spread).

(8) The normal curve is continuous and positive for all values of X between −∞ and ∞, so each interval of real numbers (of positive length) has a probability other than zero.

(9) −∞ < X < ∞: the normal curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis as it does so.

(10) About 68% of the area of a normal distribution lies within one standard deviation of the mean, i.e., about 2/3 of all cases fall within one standard deviation of the mean:

P(μ − σ ≤ X ≤ μ + σ) = 0.6826.

(11) Approximately 95% of the area of a normal distribution lies within two standard deviations of the mean, i.e.,
P(μ − 2σ ≤ X ≤ μ + 2σ) = 0.9544.

(12) The normal density function is

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
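The density formula in (12) can be evaluated directly, and cumulative probabilities can be written with the error function as P(X ≤ x) = ½[1 + erf((x − μ)/(σ√2))]. This Python sketch (function names hypothetical) checks properties (10) and (11): it gives 0.6827 and 0.9545, while the values 0.6826 and 0.9544 quoted above come from doubling rounded four-decimal table entries (0.3413 and 0.4772).

```python
import math


def normal_pdf(x, mu, sigma):
    """f(x; mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), written with the error function math.erf."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Properties (10) and (11): areas within 1 and 2 standard deviations of the mean
within_1sd = normal_cdf(1) - normal_cdf(-1)
within_2sd = normal_cdf(2) - normal_cdf(-2)
print(round(within_1sd, 4), round(within_2sd, 4))  # 0.6827 0.9545
print(round(normal_pdf(0, 0, 1), 4))               # 0.3989 (peak of the standard curve)
```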

The standardized normal distribution
The normal distribution with μ = 0 and σ = 1 is called the standard normal distribution.
As you might suspect from the formula for the normal density function, it would be difficult and tedious to do the calculus every time we had a new set of parameters μ and σ.
So instead, we usually work with the standardized normal distribution, where μ = 0 and σ = 1, i.e., N(0, 1). That is, rather than directly solving a problem involving a normally distributed variable X with mean μ and standard deviation σ, an indirect approach is used.
1. We first convert the problem into an equivalent one dealing with a normal variable measured in standardized deviation units, called a standardized normal variable. To do this, if X ∼ N(μ, σ²), then

$$z = \frac{x - \mu}{\sigma} \sim N(0, 1).$$

2. A table of standardized normal values can then be used to obtain an answer in terms of the converted problem.
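The two-step approach can be sketched with `statistics.NormalDist` from Python's standard library. The helper `table_area` is hypothetical; it returns the area between 0 and z, which is assumed here to be what the standard normal table used in these notes lists (the examples read values such as P(0 ≤ Z ≤ 1.5) = 0.4332 from it).

```python
from statistics import NormalDist

STANDARD = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """Convert x from N(mu, sigma^2) to standardized units: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def table_area(z):
    """Area under the standard normal curve between 0 and z (what the table lists)."""
    return abs(STANDARD.cdf(z) - 0.5)


z = z_score(89.2, 82.0, 4.8)
print(round(z, 2))              # 1.5
print(round(table_area(z), 4))  # 0.4332
```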

Example 1
If a random variable has the normal distribution with μ = 82.0 and 𝜎 =
4.8, find the probabilities that it will take on a value
(a) Less than 89.2
(b) Greater than 78.4

(c) Between 83.2 and 88.0
(d) Between 73.6 and 90.4
Solution

(a) We have
$$z = \frac{89.2 - 82}{4.8} = 1.5.$$
Therefore, the probability is P(Z < 1.5) = 0.5 + 0.4332 = 0.9332.
(b) We have
$$z = \frac{78.4 - 82}{4.8} = -0.75.$$
Therefore, the probability is P(Z > −0.75) = 0.2734 + 0.5 = 0.7734.
(c) We have
$$z_1 = \frac{83.2 - 82}{4.8} = 0.25, \qquad z_2 = \frac{88 - 82}{4.8} = 1.25.$$
Therefore, the probability is P(0.25 < Z < 1.25) = 0.3944 − 0.0987 = 0.2957.
(d) We have
$$z_1 = \frac{73.6 - 82}{4.8} = -1.75, \qquad z_2 = \frac{90.4 - 82}{4.8} = 1.75.$$
Therefore, the probability is P(−1.75 < Z < 1.75) = 0.4599 + 0.4599 = 0.9198.
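The four answers can be checked without a table by letting `NormalDist` handle the standardization. A sketch; the results agree with the table-based answers to within 0.0001, the small differences coming from rounding in the table.

```python
from statistics import NormalDist

X = NormalDist(mu=82.0, sigma=4.8)

p_a = X.cdf(89.2)                # P(X < 89.2)
p_b = 1 - X.cdf(78.4)            # P(X > 78.4)
p_c = X.cdf(88.0) - X.cdf(83.2)  # P(83.2 < X < 88.0)
p_d = X.cdf(90.4) - X.cdf(73.6)  # P(73.6 < X < 90.4)
print([round(p, 4) for p in (p_a, p_b, p_c, p_d)])  # [0.9332, 0.7734, 0.2956, 0.9199]
```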
Example 2
As reported in Runner's World magazine, the times of the finishers in the
New York City 10-Km run are normally distributed with mean 61 minutes
and standard deviation 9 minutes.

(a) Determine the percentage of finishers with times between 50 and 70 minutes.
(b) Determine the percentage of finishers with times less than 75 minutes.
Solution
(a) We have
$$z_1 = \frac{50 - 61}{9} \approx -1.22, \qquad z_2 = \frac{70 - 61}{9} = 1.$$
Therefore, the probability is 0.3888 + 0.3413 = 0.7301 ⟹ 73.01%.
(b) We have
$$z = \frac{75 - 61}{9} \approx 1.5556 \approx 1.56.$$
Therefore, the probability is 0.5 + 0.4406 = 0.9406 ⟹ 94.06%.
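A sketch of the same calculation with `NormalDist`. It gives about 73.05% and 94.01%; the answers above differ slightly because z was rounded to two decimals (−1.22 and 1.56) before the table lookup.

```python
from statistics import NormalDist

times = NormalDist(mu=61, sigma=9)  # finishing times in minutes

share_a = times.cdf(70) - times.cdf(50)  # between 50 and 70 minutes
share_b = times.cdf(75)                  # less than 75 minutes
print(f"{share_a:.2%} {share_b:.2%}")    # 73.05% 94.01%
```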

Example 3
If Z is a standard normal random variable, find the positive real number "a" that satisfies:
(i) P(Z ≥ a) = 0.4013 (ii) P(Z ≤ a) = 0.648
(iii) P(Z ≥ −a) = 0.8577 (iv) P(Z ≤ −a) = 0.2643
Solution
(i) P(Z ≥ a) = P(Z ≥ 0) − P(0 ≤ Z ≤ a)

0.4013 = 0.5 − P(0 ≤ Z ≤ a) ⟹ P(0 ≤ Z ≤ a) = 0.0987

a = 0.25

(ii) P(Z ≤ a) = P(Z ≤ 0) + P(0 ≤ Z ≤ a)

0.648 = 0.5 + P(0 ≤ Z ≤ a) ⟹ P(0 ≤ Z ≤ a) = 0.148

a = 0.38

(iii) P(Z ≥ −a) = P(Z ≥ 0) + P(0 ≤ Z ≤ a) (by symmetry, P(−a ≤ Z ≤ 0) = P(0 ≤ Z ≤ a))

0.8577 = 0.5 + P(0 ≤ Z ≤ a) ⟹ P(0 ≤ Z ≤ a) = 0.3577

a = 1.07

(iv) P(Z ≤ −a) = P(Z ≤ 0) − P(0 ≤ Z ≤ a)

0.2643 = 0.5 − P(0 ≤ Z ≤ a) ⟹ P(0 ≤ Z ≤ a) = 0.2357

a = 0.63
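Finding a from a given area is the inverse of a table lookup. This sketch uses `NormalDist.inv_cdf`, which returns the value with a given cumulative probability, after rewriting each condition as P(Z ≤ something) using the rules above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0, sigma = 1

a_i = Z.inv_cdf(1 - 0.4013)  # P(Z >= a) = 0.4013
a_ii = Z.inv_cdf(0.648)      # P(Z <= a) = 0.648
a_iii = Z.inv_cdf(0.8577)    # P(Z >= -a) = P(Z <= a) = 0.8577
a_iv = -Z.inv_cdf(0.2643)    # P(Z <= -a) = 0.2643
print([round(a, 2) for a in (a_i, a_ii, a_iii, a_iv)])  # [0.25, 0.38, 1.07, 0.63]
```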

Example 4

If Z is a standard normal random variable, find the positive real number "a" that satisfies:

(𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 𝑎) = 0.9010

(𝑖𝑖)𝑃(−1.4 ≤ 𝑍 ≤ 𝑎) = 0.7270

(𝑖𝑖𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 0.64) = 0.7290

Solution

(𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 𝑎) = 𝑃(−𝑎 ≤ 𝑍 ≤ 0) + 𝑃(0 ≤ 𝑍 ≤ 𝑎)

Due to symmetry:
= 𝑃(0 ≤ 𝑍 ≤ 𝑎) + 𝑃(0 ≤ 𝑍 ≤ 𝑎) = 2𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.9010

𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.4505 ⟹ 𝑎 = 1.65

(𝑖𝑖)𝑃(−1.4 ≤ 𝑍 ≤ 𝑎) = 𝑃(−1.4 ≤ 𝑍 ≤ 0) + 𝑃(0 ≤ 𝑍 ≤ 𝑎)

Due to symmetry:
⟹ 𝑃(0 ≤ 𝑍 ≤ 1.4) + 𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.7270

0.4192 + 𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.7270 ⟹ 𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.3078

𝑎 = 0.87

(𝑖𝑖𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 0.64) = 𝑃(−𝑎 ≤ 𝑍 ≤ 0) + 𝑃(0 ≤ 𝑍 ≤ 0.64)

Due to symmetry:
𝑃(0 ≤ 𝑍 ≤ 𝑎) + 0.2389 = 0.7290 ⟹ 𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.4901

𝑎 = 2.33

Example (5)
Using the table of the area under the standard normal curve where
Z is a standard normal random variable, find:

(𝑖)𝑃(0.86 ≤ 𝑍 ≤ 1.42)

(𝑖𝑖)𝑃(−1.12 ≤ 𝑍 ≤ 0.64)

(𝑖𝑖𝑖)𝑃(−1.92 ≤ 𝑍 ≤ −0.83)

Solution
(𝑖)𝑃(0.86 ≤ 𝑍 ≤ 1.42) = 𝑃(0 ≤ 𝑍 ≤ 1.42) − 𝑃(0 ≤ 𝑍 ≤ 0.86)

= 0.4222 − 0.3051 = 0.1171

(𝑖𝑖)𝑃(−1.12 ≤ 𝑍 ≤ 0.64) = 𝑃(−1.12 ≤ 𝑍 ≤ 0) + 𝑃(0 ≤ 𝑍 ≤ 0.64)

Due to symmetry:
⟹ 𝑃(0 ≤ 𝑍 ≤ 1.12) + 𝑃(0 ≤ 𝑍 ≤ 0.64) = 0.3686 + 0.2389
= 0.6075

(iii) By symmetry, P(−1.92 ≤ Z ≤ −0.83) = P(0.83 ≤ Z ≤ 1.92).

𝑃(0.83 ≤ 𝑍 ≤ 1.92) = 𝑃(0 ≤ 𝑍 ≤ 1.92) − 𝑃(0 ≤ 𝑍 ≤ 0.83)

= 0.4726 − 0.2967 = 0.1759.

Exercise (4)
(1) The weights of 1000 persons are normally distributed with mean 80 kg and standard deviation 5 kg. Find:
(i) The number of persons whose weights are more than 92 kg.
(ii) The number of persons whose weights are less than 75 kg.
(iii) The number of persons whose weights are between 60 kg and 98 kg.

(2) If Z is a standard normal random variable, find the positive real number "a" that satisfies:

(𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 𝑎) = 0.5160 (𝑖𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 𝑎) = 0.9010

(𝑖𝑖𝑖)𝑃(−0.77 ≤ 𝑍 ≤ 𝑎) = 0.42 (𝑖𝑣)𝑃(−𝑎 ≤ 𝑍 ≤ 1.62) = 0.55

(𝑣)𝑃(𝑍 ≤ −𝑎) = 0.9920 (𝑣𝑖)𝑃(𝑍 ≥ 𝑎) = 0.0781

(𝑣𝑖𝑖)𝑃(0 ≤ 𝑍 ≤ 𝑎) = 0.195 (𝑣𝑖𝑖𝑖)𝑃(−𝑎 ≤ 𝑍 ≤ 0) = 0.437

(3) If Z is a standard normal random variable find:


(𝑖)𝑃(𝑍 ≤ 1.5) (𝑖𝑖)𝑃(𝑍 ≤ −1.03)

(𝑖𝑖𝑖)𝑃(𝑍 ≥ 0.78) (𝑖𝑣)𝑃(𝑍 ≥ −3.04)

(𝑣 )𝑃(−0.63 ≤ 𝑍 ≤ 0.63) (𝑣𝑖)𝑃(−1.75 ≤ 𝑍 ≤ −0.8)

(4) When revising 100 books, we find that the numbers of misprints are normally distributed with mean 16 and standard deviation 5. Find the number of books with fewer than 10 misprints.

(5) A factory produces car tyres whose diameters (X) are normally distributed with mean μ = 24 and standard deviation σ = 1.5. Calculate the following probabilities:

(i) P(X ≤ 21) (ii) P(X ≥ 25)

(iii) The percentage of tyres whose diameters a satisfy 21 ≤ a ≤ 27.

Chapter 6
Test of Hypotheses about Population Mean (μ)
This chapter covers hypothesis testing, the second of two general areas of
statistical inference. Hypothesis testing is a topic with which you as a
student are likely to have some familiarity.
Hypotheses are assumptions about the parameters of one or more
populations. We test hypotheses to assess their correctness.
The purpose of hypothesis testing is to help the researcher or administrator
in reaching a decision concerning a population by examining a sample
from that population.
The main steps

1- Set up the statistical hypotheses, which are:

(i) the null hypothesis (H0),

(ii) the alternative hypothesis (H1).

2- Test statistic

It is a mathematical expression of the sample values that provides a basis for testing a statistical hypothesis, that is, for deciding whether to reject or to accept the null hypothesis.

3- Determine the acceptance region and the rejection region.

4- The decision.
Hypothesis

Null hypothesis (H0)

1-It is the hypothesis to be tested.

2-It is the hypothesis of equality or the hypothesis of no difference.


Alternative hypothesis (H1) or (HA)
1-It is the hypothesis adopted when the null hypothesis has to be rejected.
2-It is the hypothesis of difference.
The level of significance (α) is a probability; in fact, it is the probability of rejecting a true null hypothesis.

For example, if α = 0.05, there is a 5% chance that you will accept your alternative hypothesis when your null hypothesis is actually true.

Critical values
The values of the test statistic that separate the rejection region from the acceptance region.
Acceptance region
The set of values of the test statistic that lead to acceptance (non-rejection) of the null hypothesis, i.e., the values not included in the rejection (critical) region.
Rejection region
The set of values of the test statistic that lead to rejection of the null hypothesis; it is also called the critical region.

Statistical decision
The decision consists of rejecting or not rejecting H0. H0 is rejected if the computed value of the test statistic falls in the rejection region, and it is not rejected if the computed value falls in the acceptance region.
Conclusion
State whether or not H0 can be rejected. If H0 is rejected, the statistical conclusion is that the data support the alternative hypothesis H1.
Two-sided test
If the rejection region is divided between the two tails of the sampling distribution, the test is called a two-sided (two-tailed) test.

One-sided test
If the rejection region lies in only one tail, the test is called a one-sided (one-tailed) test.
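The practical difference between the two kinds of test lies in the critical values: a one-sided test puts all of α in one tail, while a two-sided test splits it as α/2 per tail. In the sketch below, the observed statistic z = 1.80 is a hypothetical value chosen to show that the same result can be significant in a one-sided test but not in a two-sided test at α = 0.05.

```python
from statistics import NormalDist

alpha = 0.05
z_obs = 1.80                      # hypothetical observed test statistic
std_normal = NormalDist()

# One-sided (right-tailed): the whole of alpha lies in the upper tail
one_sided_crit = std_normal.inv_cdf(1 - alpha)       # about 1.645
# Two-sided: alpha / 2 lies in each tail
two_sided_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.960

reject_one_sided = z_obs > one_sided_crit
reject_two_sided = abs(z_obs) > two_sided_crit
print(reject_one_sided, reject_two_sided)  # True False
```

For this reason the choice between a one- and a two-sided test should follow from the form of H1, fixed before the data are examined.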

Errors
There are two possible errors that lead to a wrong conclusion.

Types of Errors

| | Type I error | Type II error |
|---|---|---|
| What happens | Rejecting H0 when H0 is actually true | Accepting (not rejecting) H0 when H0 is actually false |
| Probability | α, called the significance level (e.g., 0.01, 0.05, 0.10) | β |

Decision table

| | Accept H0 | Reject H0 |
|---|---|---|
| H0 is true | Correct decision; probability = 1 − α | Incorrect decision (Type I error); probability = α |
| H0 is false | Incorrect decision (Type II error); probability = β | Correct decision; probability = 1 − β |
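The table gives β only symbolically. As an illustration that is not part of the original notes, β can be computed for a two-sided z test once a specific true mean μ1 is assumed: with δ = (μ1 − μ0)/(σ/√n), β = Φ(z_{α/2} − δ) − Φ(−z_{α/2} − δ), where Φ is the standard normal cumulative distribution function. The function name `beta_two_sided` and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_sided(mu0, mu1, sigma, n, alpha):
    """Probability of a Type II error for a two-sided z test of H0: mu = mu0
    when the true population mean is mu1."""
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1 - alpha / 2)     # critical value z_{alpha/2}
    delta = (mu1 - mu0) / (sigma / sqrt(n))   # mean of the z statistic under mu1
    # Type II error: the z statistic still lands in the acceptance region [-c, c]
    return std_normal.cdf(c - delta) - std_normal.cdf(-c - delta)


# Hypothetical numbers: H0: mu = 3000, true mean 3200, sigma = 500, alpha = 0.05
for n in (16, 64):
    beta = beta_two_sided(3000, 3200, 500, n, 0.05)
    print(n, round(beta, 3), "power =", round(1 - beta, 3))
```

When μ1 = μ0 the function returns 1 − α, matching the "H0 is true, accept H0" cell; a larger n makes β smaller for the same α.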

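To perform a test of hypothesis about the population mean μ, the choice of test statistic depends on whether the population variance σ² is known and on the sample size n. There are three possible cases:

| Variance σ² | Sample size | Test statistic |
|---|---|---|
| Known | Any n | $z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ |
| Unknown | n > 30 | $z = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ |
| Unknown | n ≤ 30 | $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ |

Each of these has the general form

$$\text{Test value} = \frac{(\text{observed value}) - (\text{expected value})}{\text{standard error}}$$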
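The three cases in the table can be expressed as a small decision function. This is a hypothetical helper written for illustration: it only computes the statistic and names the reference distribution. For the t case, the n − 1 degrees of freedom are the usual choice for a one-sample test and are an addition not stated in the table; the second usage line uses a made-up sample standard deviation.

```python
from math import sqrt


def mean_test_statistic(xbar, mu0, n, sigma=None, s=None):
    """Return (distribution, value) following the three cases of the table.

    sigma: population standard deviation, if known
    s:     sample standard deviation, used when sigma is unknown
    """
    if sigma is not None:
        # Case 1: sigma known -> z, whatever the sample size
        return "z", (xbar - mu0) / (sigma / sqrt(n))
    if s is None:
        raise ValueError("provide sigma (known) or s (sample estimate)")
    value = (xbar - mu0) / (s / sqrt(n))
    if n > 30:
        # Case 2: sigma unknown, large sample -> z computed with s
        return "z", value
    # Case 3: sigma unknown, small sample -> t with n - 1 degrees of freedom
    return f"t (df = {n - 1})", value


print(mean_test_statistic(79, 80, 100, s=4))      # large sample, sigma unknown
print(mean_test_statistic(60.75, 65, 16, s=3.2))  # hypothetical small sample
```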

Level of significance table

| Level of significance α | 0.10 | 0.05 | 0.01 | 0.005 | 0.002 |
|---|---|---|---|---|---|
| Critical values of z for a one-tailed test | −1.28 or 1.28 | −1.645 or 1.645 | −2.33 or 2.33 | −2.58 or 2.58 | −2.88 or 2.88 |
| Critical values of z for a two-tailed test | −1.645 and 1.645 | −1.96 and 1.96 | −2.58 and 2.58 | −2.81 and 2.81 | −3.09 and 3.09 |
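The entries of the table are quantiles of the standard normal distribution, so they can be regenerated with `statistics.NormalDist.inv_cdf` from the Python standard library as a check on rounding. A one-tailed test at level α uses the (1 − α) quantile, and a two-tailed test uses the (1 − α/2) quantile.

```python
from statistics import NormalDist

std_normal = NormalDist()
levels = (0.10, 0.05, 0.01, 0.005, 0.002)

# alpha -> (one-tailed critical value, two-tailed critical value), both positive;
# the negative values in the table are the same numbers with a minus sign.
critical = {
    a: (std_normal.inv_cdf(1 - a), std_normal.inv_cdf(1 - a / 2))
    for a in levels
}

print("alpha   one-tailed  two-tailed")
for a, (one, two) in critical.items():
    print(f"{a:<7} ±{one:.3f}      ±{two:.3f}")
```

Rounded to two decimals these reproduce the table, for example 1.645 and 1.96 at α = 0.05, and about 3.09 for the two-tailed test at α = 0.002.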

Example 1
Does the evidence support the idea that the average lecture consists of
3000 words if a random sample of the lectures of 16 professors had a
mean of 3472 words, given the population standard deviation is 500
words? Use 𝛼 = 0.01. Assume that lecture lengths are approximately
normally distributed. Show all steps.
Solution
𝜇 = 3000,
𝜎 = 500,
𝑋̅ = 3472,
𝑛 = 16,
𝛼 = 0.01
1- 𝐻0 : 𝜇 = 3000

2- 𝐻1 : 𝜇 ≠ 3000

3- 𝛼 = 0.01

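4- $z_{\text{calculated}} = z_c = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{3472 - 3000}{500/\sqrt{16}} = \dfrac{472}{125} = 3.78$

5- $z_{\text{table}} = z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.576$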

6-Reject 𝐻0 𝑖𝑓 𝑧 < −2.576 𝑜𝑟 𝑧 > 2.576

7-Reject 𝐻0 because 3.78 > 2.576

8-At 𝛼 = 0.01 , the population mean is not equal to 3000 words.
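The arithmetic of Example 1 can be checked with a few lines of Python. This is an illustrative sketch, not part of the original solution; `NormalDist().inv_cdf` supplies the two-tailed critical value z_{0.005} instead of the table.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n, alpha = 3000, 500, 3472, 16, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))          # 472 / 125
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical = ±{z_crit:.3f}, reject H0: {reject}")
# z = 3.776, critical = ±2.576, reject H0: True
```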

Example 2

A certain breed of rats shows a mean weight gain of 65 g during the first 3 months of life. Sixteen of these rats were fed a new diet from birth until the age of 3 months, and their mean weight gain was 60.75 g. If the population variance is 10 g², is there reason to believe, at the 5% level of significance, that the new diet causes a change in the average amount of weight gained?

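Solution
$H_0: \mu = 65$, $H_1: \mu \neq 65$, $\bar{X} = 60.75$, $\sigma = \sqrt{10}$, $n = 16$, $\alpha = 0.05$

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{60.75 - 65}{\sqrt{10}/\sqrt{16}} = -5.38$$

This is a two-tailed test, so the critical values are ±1.96. Since the calculated value −5.38 < −1.96 falls in the rejection region, we reject H0 and accept H1: the new diet changes the average weight gain.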
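A point that is easy to miss in Example 2 is that the problem gives the variance, so σ = √10 must be used in the standard error. The sketch below (an illustration, not part of the original notes) also shows that mistakenly using 10 as σ would give z = −1.7 and the opposite decision.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, n, alpha = 65, 60.75, 16, 0.05
variance = 10
sigma = sqrt(variance)                         # standard error uses sigma, not sigma**2

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 (two-tailed)
print(round(z, 2), abs(z) > z_crit)            # -5.38 True

z_wrong = (xbar - mu0) / (variance / sqrt(n))  # mistake: variance used as sigma
print(round(z_wrong, 2), abs(z_wrong) > z_crit)  # -1.7 False
```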

Example 3
Grain millers claim that the average weight of a bag of maize flour is 80 kg. A random sample of 100 bags had a mean of 79 kg and a standard deviation of 4 kg. Test whether the average weight of the bags is less than 80 kg at the 5% level of significance.

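Solution:
Since n = 100 is large, the sampling distribution of $\bar{X}$ is approximately normal, so a z test is used, with the sample standard deviation s in place of the unknown σ.
$H_0: \mu = 80$ kg
$H_1: \mu < 80$ kg
$s = 4$, $\bar{X} = 79$, $n = 100$, $\alpha = 0.05$.
This is a left-tailed test; the critical z value with 5% of the area to its left is −1.645.

$$z = \frac{79 - 80}{4/\sqrt{100}} = -2.5$$

Since −2.5 < −1.645, it lies in the rejection region, so we reject the null hypothesis, i.e., we accept the alternative hypothesis that the average weight of the bags is less than 80 kg.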
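A quick check of Example 3 in Python (an illustrative sketch); because the test is left-tailed, the critical value is the α quantile of the standard normal distribution, which is negative.

```python
from math import sqrt
from statistics import NormalDist

mu0, s, xbar, n, alpha = 80, 4, 79, 100, 0.05

z = (xbar - mu0) / (s / sqrt(n))      # (79 - 80) / 0.4
z_crit = NormalDist().inv_cdf(alpha)  # left tail: about -1.645
print(round(z, 2), round(z_crit, 3), z < z_crit)  # -2.5 -1.645 True
```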
Example 4

The average mark in an examination at a secondary school is 58%, with a standard deviation of 2%. Is there reason to believe that there has been a change in performance if a random sample of 40 students has an average of 60%? Test this claim at the 2% level of significance.
Solution:
𝐻0 : 𝜇 = 58

𝐻1 : 𝜇 ≠ 58

𝜎=2

This is a two-tailed test, i.e., the rejection region has an area of 1% in each tail.

The critical z values that cut off an area of 1% in each tail are ±2.33.

Evaluating the sample:

$$z = \frac{60 - 58}{2/\sqrt{40}} = 6.3246$$

Since 6.3246 is greater than 2.33, we reject H0 and conclude that there is a significant change in the performance of students at the 2% level of significance.
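Example 4 checked in Python (an illustrative sketch). At α = 0.02 the two-tailed critical value is the 0.99 quantile of the standard normal distribution, about 2.326, which the notes round to 2.33.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n, alpha = 58, 2, 60, 40, 0.02

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1% in each tail
print(f"z = {z:.4f}, critical = ±{z_crit:.3f}, reject H0: {abs(z) > z_crit}")
# z = 6.3246, critical = ±2.326, reject H0: True
```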
Example 5

A researcher reports that the average salary of assistant professors is more than $42,000. A sample of 30 assistant professors has a mean salary of $43,260. At α = 0.05, test the claim that assistant professors earn more than $42,000 a year. The standard deviation of the population is $5230.

Solution

Step 1: State the hypotheses and identify the claim.

𝐻0 : 𝜇 ≤ $42,000 𝐻1 : 𝜇 > $42,000 (claim)


Step 2: Find the critical value. Since α = 0.05 and the test is right-tailed, the critical value is z = +1.645.

Step 3: Compute the test value.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{43{,}260 - 42{,}000}{5230/\sqrt{30}} = 1.32$$

Step 4: Make the decision. Since the test value, +1.32, is less than the critical value, +1.645, and is not in the critical region, the decision is “Do not reject the null hypothesis.”

Step 5: Summarize the results. There is not enough evidence to support the claim that assistant professors earn more than $42,000 a year on average.
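Example 5 checked in Python (an illustrative sketch; amounts are in dollars and written without the currency symbol).

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n, alpha = 42_000, 5_230, 43_260, 30, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed: about 1.645
print(f"z = {z:.2f}, critical = {z_crit:.3f}, reject H0: {z > z_crit}")
# z = 1.32, critical = 1.645, reject H0: False
```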

Example 6
In a certain community, a claim is made that the average income of all employed individuals is $35,500. A group of citizens suspects that this value is incorrect and gathers a random sample of 140 employed individuals in the hope of showing that $35,500 is not the correct average. The mean of the sample is $34,325, and the population standard deviation is $4,200. Test at α = 0.10. Show all steps.
Solution
$\mu = 35{,}500$, $\sigma = 4{,}200$, $\bar{x} = 34{,}325$, $n = 140$, $\alpha = 0.10$

(1) $H_0: \mu = 35{,}500$

(2) $H_1: \mu \neq 35{,}500$

(3) $\alpha = 0.10$

(4) Reject $H_0$ if $z < -1.645$ or $z > 1.645$.

(5) $z = \dfrac{34{,}325 - 35{,}500}{4200/\sqrt{140}} = -3.31$

(6) Reject $H_0$, because $-3.31 < -1.645$.

(7) At $\alpha = 0.10$, the population mean income is not equal to 35,500.
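Example 6 checked in Python (an illustrative sketch). Note that the two-tailed rejection rule uses "<" on the lower side and ">" on the upper side.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n, alpha = 35_500, 4_200, 34_325, 140, 0.10

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed: about 1.645
# Reject H0 if z < -z_crit or z > z_crit
reject = z < -z_crit or z > z_crit
print(f"z = {z:.2f}, reject H0: {reject}")  # z = -3.31, reject H0: True
```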

p-Value
Assuming that the null hypothesis is true, the p-value is the probability of obtaining a value of the sample statistic (such as the sample mean) that is at least as far from the hypothesized value, in the direction of the alternative hypothesis, as the value actually obtained from the sample data under consideration.
Note that the p-value is the smallest significance level at which the null hypothesis would be rejected.

Using the p-value approach, we reject the null hypothesis if

p-value < α (equivalently, α > p-value),

and we do not reject the null hypothesis if

p-value ≥ α (equivalently, α ≤ p-value).
For a one-tailed test, the p-value is given by the area in the tail of the
sampling distribution curve beyond the observed value of the sample
statistic.
The next figure shows the p-value for a right-tailed test about 𝜇.
For a left-tailed test, the p-value will be the area in the lower tail of
the sampling distribution curve to the left of the observed value of 𝑥̅ .

Figure: The p-value for a right-tailed test.

For a two-tailed test, the p-value is twice the area in the tail of the
sampling distribution curve beyond the observed value of the sample
statistic. Each of the areas in the two tails gives one-half the p-value.
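The three tail rules above translate directly into code for a z statistic. A minimal sketch, using the standard normal CDF from Python's `statistics.NormalDist`; the function name `z_p_value` is hypothetical.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value of an observed z statistic.

    tail: "right" (H1: mu > mu0), "left" (H1: mu < mu0) or "two" (H1: mu != mu0)
    """
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)             # area in the upper tail beyond z
    if tail == "left":
        return cdf(z)                 # area in the lower tail below z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # twice the area beyond |z|
    raise ValueError("tail must be 'two', 'right' or 'left'")


for tail in ("left", "right", "two"):
    print(tail, round(z_p_value(-2.0, tail), 4))
```

The decision then follows the rule stated earlier: reject H0 when the returned p-value is less than α.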

Figure: The p-value for a two-tailed test.

A test of hypothesis procedure that uses the p-value approach involves the following four steps.
Steps to Perform a Test of Hypothesis Using the p-Value Approach
1. State the null and alternative hypothesis.
2. Select the distribution to use.
3. Calculate the p-value.
4. Make a decision.

Example 7
The mean GPA at a certain university is 2.80 with a population standard
deviation of 0.3. A random sample of 16 business students from this
university had a mean of 2.91. Test to determine whether the mean GPA
for business students is greater than the university mean at the 0.10 level
of significance. Show all steps.
Solution
𝜇 = 2.80, 𝜎 = 0.3, 𝑥̅ = 2.91, 𝑛 = 16, 𝛼 = 0.10

(1)𝐻0 : 𝜇 = 2.8

(2)𝐻1 : 𝜇 > 2.8

(3)𝛼 = 0.10

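(4) $z = \dfrac{2.91 - 2.80}{0.3/\sqrt{16}} = \dfrac{0.11}{0.075} = 1.47$.

(5) This is a right-tailed test, so $p\text{-value} = P(Z > 1.47) \approx 0.071$.

(6) $0.071 < 0.10$, so reject $H_0$.

(7) At $\alpha = 0.10$, the mean GPA of business students is greater than the university mean of 2.80.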
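Example 7 checked in Python (an illustrative sketch). Computing the right-tail area directly gives z ≈ 1.47 and a p-value of about 0.071, which is still below α = 0.10.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n, alpha = 2.80, 0.3, 2.91, 16, 0.10

z = (xbar - mu0) / (sigma / sqrt(n))  # 0.11 / 0.075
p_value = 1 - NormalDist().cdf(z)     # right-tailed test: area beyond z
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {p_value < alpha}")
```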

Example 8
A study by the Web metrics firm Experian showed that in August of 2011,
the mean time spent per visit to Facebook was 20.8 minutes with a
population standard deviation of 8 minutes. Suppose a simple random
sample of 60 visits in August 2013 has a mean of 21.5 minutes. A social
scientist is interested to know whether the mean time of Facebook visits
has changed. Use α = 0.05. Show all steps.
Solution
𝜇 = 20.8, 𝜎 = 8, 𝑥̅ = 21.5, 𝑛 = 60, 𝛼 = 0.05

(1)𝐻0 : 𝜇 = 20.8

(2)𝐻1 : 𝜇 ≠ 20.8

(3)𝛼 = 0.05

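(4) $z = \dfrac{21.5 - 20.8}{8/\sqrt{60}} = 0.68$.

(5) This is a two-tailed test, so $p\text{-value} = 2P(Z > |z|) \approx 0.498$.

(6) $0.498 > 0.05$, so we do not reject $H_0$.

(7) At $\alpha = 0.05$, there is not enough evidence to conclude that the mean time per visit has changed from 20.8 minutes.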
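Example 8 checked in Python (an illustrative sketch); for the two-tailed test the p-value is twice the area beyond |z|.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n, alpha = 20.8, 8, 21.5, 60, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed test
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {p_value < alpha}")
# z = 0.68, p-value = 0.498, reject H0: False
```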

Example 9
The management of Priority Health Club claims that its members lose an
average of 10 pounds or more within the first month after joining the
club. A consumer agency that wanted to check this claim took a random
sample of 36 members of this health club and found that they lost
an average of 9.2 pounds within the first month of membership. The
population standard deviation is known to be 2.4 pounds. Find the p-
value for this test. What will your decision be if 𝛼 = 0.01? What if
𝛼 = 0.05?
Solution
Let μ be the mean weight lost during the first month of membership by all members of this health club, and let x̄ be the corresponding mean for the sample. From the given information,
𝑛 = 36, 𝑥̅ = 9.2 𝑝𝑜𝑢𝑛𝑑𝑠, 𝑎𝑛𝑑 𝜎 = 2.4 𝑝𝑜𝑢𝑛𝑑𝑠.
The claim of the club is that its members lose, on average, 10 pounds or
more within the first month of membership. To perform the test using
the p-value approach, we apply the following four steps.
Step 1. State the null and alternative hypotheses.

$H_0: \mu \ge 10$ (the mean weight lost is 10 pounds or more)

$H_1: \mu < 10$ (the mean weight lost is less than 10 pounds)

Step 2. Select the distribution to use.

Here, the population standard deviation is known and the sample size is large ($n > 30$). Hence, the sampling distribution of $\bar{x}$ is approximately normal, with its mean equal to $\mu$ and its standard deviation equal to $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Consequently, we will use the normal distribution to find the p-value and perform the test.
Step 3. Calculate the p-value.
The < sign in the alternative hypothesis indicates that the test is left-
tailed.
The p-value is given by the area to the left of 𝑥̅ = 9.2 under the
sampling distribution curve of 𝑥̅ , as shown in the next figure.
To find this area, we first find the z value for $\bar{x} = 9.2$ as follows:

$$z = \frac{9.2 - 10}{2.4/\sqrt{36}} = -2.00$$

The area to the left of $\bar{x} = 9.2$ under the sampling distribution of $\bar{x}$ is equal to the area under the standard normal curve to the left of $z = -2.00$. From the normal distribution table, the area to the left of $z = -2.00$ is 0.0228:

$$P(z < -2) = 0.5 - P(0 < z < 2) = 0.5 - 0.4772 = 0.0228$$

Consequently,

$$p\text{-value} = 0.0228.$$
Step 4. Make a decision.
Thus, based on the p-value of 0.0228, we can state that for any α (significance level) greater than 0.0228 we will reject the null hypothesis stated in Step 1, and for any α less than or equal to 0.0228 we will not reject it.
Since α = 0.01 is less than the p-value of 0.0228, we do not reject the null hypothesis at this significance level. Consequently, the data do not provide enough evidence to conclude that the mean weight lost within the first month of membership by the members of this club is less than 10 pounds, so the club's claim of 10 pounds or more is not rejected.
Now, because α = 0.05 is greater than the p-value of 0.0228, we reject the null hypothesis at this significance level. Therefore, we conclude that the mean weight lost within the first month of membership by the members of this club is less than 10 pounds.
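Example 9 checked in Python (an illustrative sketch): one p-value is computed and then compared with both significance levels, which is the main convenience of the p-value approach.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar, n = 10, 2.4, 9.2, 36

z = (xbar - mu0) / (sigma / sqrt(n))  # (9.2 - 10) / 0.4
p_value = NormalDist().cdf(z)         # left-tailed test: area to the left of z

for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.4f} -> {decision}")
```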

Exercise (5)

