Lectures 3 - 6 - 2017


Uploaded by

Stanley Ogili


Summarisation and Presentation of Data

(Descriptive Statistics)

Rotimi F. Afolabi, PhD

Department of Epidemiology & Medical Statistics
College of Medicine, University of Ibadan
Session Objective

• Introduce students to the following methods of organising, summarising and presenting data:
– frequency tables (tabular descriptive),
– graphs/charts (graphical descriptive) and
– summary (numerical descriptive) statistics

Descriptive Statistics_Lectures 3-6_2017 2

Tabular Presentation of Data
(Lecture 3)
Tables
♦ Definition
– Data presented in rows and columns according to one or more classification variables
• Uses
– demonstrate patterns, exceptions, differences and
other relationships
– serve as the basis for preparing more visual
displays of data, such as graphs and charts, where
some of the detail may be lost


Examples:
Represent the data on the age distribution of
adult admissions into UCH in June 1998.

Age (years) Frequency

10-19 16
20-29 27
30-39 23
40-49 24
50-59 23
60-69 19
70-79 15

The two-variable table
Table 1. Cases of Salmonella Typhimurium infection by age group and sex, Herøy, Norway, 1999

Age group (years)   Male   Female   Total
0 - 9                  7        5      12
10 - 19                5        5      10
20 - 29                5        5      10
30 - 39                1        4       5
40 - 49                2        3       5
50 - 59                0        3       3
60 - 69                2        1       3
70 and over            2        4       6
Total                 24       30      54

Contingency Table
The 2x2 table for a cohort study
Table 5. Association between fish consumption and
gastrointestinal illness among customers at Uncle Mike's Fish &
Chips, Cambridge, October 1 2000

Ill Well Total Attack rate

Ate fish 42 16 58 0.72
Did not eat fish 5 59 64 0.078

Relative risk: 9.3 (95% confidence interval 3.9 - 22)

Frequency Distribution Table
• Arrangement of data by rows & columns
• Useful to summarize data.
• Has two main columns.
• Column 1 lists all values of the variable.
• Column 2 lists the frequency with which each value occurs.
• Useful for initial data exploration
• Construction depends on type of variable


Frequency Table - Qualitative variable

• 1st column: the different (mutually exclusive) categories of the variable
• 2nd column: the frequency (count) with which each category occurred

Reasons for physicians not smoking

Reasons Frequency

• Health 25
• Religious 15
• Social 12
• Profession 5
• Others 3


Frequency Intervals
• Must not overlap. Example:
  Correct     Incorrect
  5 - 9       5 - 10
  10 - 14     10 - 15
  15 - 19     15 - 20

• Equal intervals are easier to interpret, but unequal intervals may be used to illustrate specific attributes of interest

Relative Frequency

• Proportion of the total observations that falls in a given class (or takes a given value)
• Obtained by dividing the frequency in the class interval by the total number of observations

Cumulative Frequency
• Number (or, as a cumulative relative frequency, proportion) of the total observations with a certain value or less
• Must correspond to the end of the class interval
• Obtained by adding each frequency (or relative frequency) to the sum of those preceding it

Frequency Distribution Table with One Variable
Table 1. Number of Cases of Primary and Secondary Syphilis by Age Group, USA, 1989

Age group   Frequency   Cumulative   Relative frequency   Cumulative relative
(years)     (number)    frequency    (%)                  frequency (%)
≤ 14             230          230          0.5                  0.5
15-19          4,378        4,608          9.9                 10.5
20-24         10,405       15,013         23.6                 34.1
25-29          9,610       24,623         21.8                 55.9
30-34          8,648       33,271         19.6                 75.5
35-44          6,901       40,172         15.7                 91.1
45-54          2,631       42,803          6.0                 97.1
≥ 55           1,278       44,081          2.9                100.0
Total         44,081                     100.0
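The cumulative and relative frequency columns can be recomputed directly from the case counts, which is a quick check on hand-rounded percentages. This is a minimal Python sketch, not part of the original lecture; the age-group labels are plain strings chosen for illustration.

```python
from itertools import accumulate

# Case counts by age group (primary and secondary syphilis, USA, 1989)
age_groups = ["<=14", "15-19", "20-24", "25-29", "30-34", "35-44", "45-54", ">=55"]
cases = [230, 4378, 10405, 9610, 8648, 6901, 2631, 1278]

total = sum(cases)
cum_freq = list(accumulate(cases))                   # running total of frequencies
rel_pct = [100 * f / total for f in cases]           # relative frequency (%)
cum_rel_pct = [100 * cf / total for cf in cum_freq]  # cumulative relative frequency (%)

for group, f, cf, rf, crf in zip(age_groups, cases, cum_freq, rel_pct, cum_rel_pct):
    print(f"{group:>6} {f:>6} {cf:>6} {rf:6.1f} {crf:6.1f}")
print(f"{'Total':>6} {total:>6}")
```

The cumulative relative frequencies are computed from the cumulative counts rather than by adding rounded percentages, so rounding errors do not accumulate.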

Qualities of Frequency Tables

Simple Information (not more than 3 variables)

• Clear title to indicate what? when? where?
• Good labeling of rows & columns
• Indicate units of measurements
• Row, column & grand totals MUST add up


Terms used in constructing frequency table
• Classes (Class Intervals): categories for grouping data
• Frequency: the number of times a value or group of values of a variable occurs, i.e. the number of observations that fall in a class
• Frequency distribution: a listing of all classes and their frequencies
• Relative frequency: the ratio of the frequency of a class to the total number of observations
• Relative-frequency distribution: a listing of all classes and their relative frequencies
Terms used in constructing frequency table …
• Class Limits – the end numbers of each class. The upper class limit is the largest number in the class and the lower class limit is the smallest number in the class
• Class Boundaries – values that close the gaps between classes, so that the upper boundary of each class is the same as the lower boundary of the next. Each is obtained as the midpoint between the upper limit of one class and the lower limit of the next (e.g. 19.5 for the classes 10-19 and 20-29)
• Class Mark – the midpoint between the lower and upper class limits of a class
• Class Size (or Width) – the difference between the upper and lower class boundaries
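The definitions above can be applied mechanically to a list of class limits. This minimal Python sketch (not from the lecture) uses the UCH age classes and assumes ages are recorded in whole years, so consecutive classes are separated by a gap of 1 and each boundary lies 0.5 beyond its limit.

```python
# Class limits of the UCH age classes (years)
limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
gap = 1  # assumed distance between the upper limit of one class and the lower limit of the next

# Boundaries: half the gap below the lower limit and above the upper limit
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]
# Class mark: midpoint of the class limits
class_marks = [(lo + hi) / 2 for lo, hi in limits]
# Class width: upper boundary minus lower boundary
widths = [ub - lb for lb, ub in boundaries]

for (lo, hi), (lb, ub), mark, w in zip(limits, boundaries, class_marks, widths):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, class mark {mark}, width {w}")
```

Each upper boundary coincides with the next lower boundary, which is exactly the property that distinguishes boundaries from limits.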
Graphical Presentation of Data
(Lecture 4)
Graphical Presentation
Need:
– To aid in visually exploring the data
– Diagrams make a better visual impression than numbers
• As the adage says, a picture is worth a thousand words

Graphs/charts
Depends on type of data
• Quantitative or numerical data
– Histogram,
– frequency polygon,
– Cumulative frequency curve (known as “ogive”)
– Box plot
• Qualitative or categorical data
– Bar chart
– Pie chart

Histogram

• Plot of the frequency distribution
• Used to show data on an interval scale or continuous variables
• Slender rectangles adjoin each other
• The area of each bar is proportional to its class frequency, so the total area under the histogram represents the total frequency

Histogram – Purpose/uses
• Provides information on range of data
values
• Shows the location of the highest
concentration of measurement
• Reveals the presence or absence of
symmetry


Example 1: Represent the data on the age distribution
of adult admissions into UCH in June 1998.

Age (years) Frequency

10-19 16
20-29 27
30-39 23
40-49 24
50-59 23
60-69 19
70-79 15
[Figure: Histogram of ages of adult admissions at UCH, June 1998; x-axis: Age (years), classes 10-19 to 70-79; y-axis: Frequency]

Frequency polygon
• Special line graph for frequency distribution
• Obtained by plotting frequencies against the
class marks
• From the histogram:
– Plot each class frequency at the midpoint of its class
– Join consecutive points with straight lines

Lecture 3 - 2015/2016 Session 25

[Figure: Frequency polygon; x-axis: Age (Yrs), 10 to 80; y-axis: Frequency]
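The points of a frequency polygon are simply (class mark, frequency) pairs. A minimal Python sketch, assuming the UCH age classes and frequencies from the histogram example; it produces the coordinates only and leaves the drawing to any plotting tool.

```python
# UCH adult admissions, June 1998: class limits and frequencies
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [16, 27, 23, 24, 23, 19, 15]

# Each frequency is plotted at the class mark (midpoint of the class);
# consecutive points are then joined with straight lines.
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(classes, freqs)]
for x, y in points:
    print(f"({x}, {y})")
```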

Cumulative frequency curve -ogive
• Plot of cumulative frequency against the upper class boundaries, joining all the consecutive points
• An additional point is obtained by plotting a cumulative frequency of zero against the lowest lower class boundary
• Used to show how many data values have accumulated up to and including a specific class
• It may be used to obtain measures of partition such as
– Quartiles
– Deciles
– Percentiles

Ogive
[Figure: Ogive of the ages of adult admissions; x-axis: Age (Yrs), 19 to 89; y-axis: Cumulative frequency]
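The ogive construction and its use for measures of partition can be sketched in Python. This is an illustration, not the lecture's method: it assumes the UCH age data with boundaries at limits ± 0.5, and reads a value off the ogive by straight-line interpolation between consecutive points (the hypothetical helper `read_ogive`).

```python
from itertools import accumulate

# Lowest lower boundary, then each upper class boundary (ages in whole years assumed)
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [16, 27, 23, 24, 23, 19, 15]

# Cumulative frequency, starting from zero at the lowest lower boundary
cum = [0] + list(accumulate(freqs))
ogive = list(zip(boundaries, cum))

def read_ogive(target):
    """Interpolate linearly the x value at which the ogive reaches `target`."""
    for (x0, c0), (x1, c1) in zip(ogive, ogive[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target outside the range of the ogive")

n = cum[-1]
median_estimate = read_ogive(n / 2)  # the median: half the observations lie below it
print(ogive)
print(median_estimate)
```

Reading the ogive at n/4 or 3n/4 instead gives the quartiles in the same way; the straight-line interpolation matches the grouped-data formulas for the median and quartiles.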

Box and Whisker Plot (Box Plot)
• A simple but excellent tool for conveying location
and variation information in data sets
• It helps to display the symmetry properties of a
sample
• It can be used to visually describe the spread of
a sample
• It can also help identify possible outlying values
– that is, values that seem inconsistent with the rest of
the points in the sample.


Box and whiskers plot

[Figure: Box and whiskers plot of systolic blood pressure (mmHg), marking the 25th percentile, median, mean and 75th percentile]

Box Plot …
• The box plot is interpreted as follows:
• The midline of a box plot is the median or 50th
percentile.
• The body or box portion of the plot is the
interquartile range going from the 25th percentile to
the 75th percentile.
– The interquartile range is the middle 50% of the data
– The width of the interquartile range is equal to:
• 75th Percentile – 25th Percentile
– The interquartile range is a robust measure of variability
Box Plot …
• How can the median, upper quartile, and lower
quartile be used to judge the symmetry of a
distribution?
1. If the distribution is symmetric, then the upper and lower
quartiles should be approximately equally spaced from
the median.
2. If the upper quartile is farther from the median than the
lower quartile, then the distribution is positively skewed.
3. If the lower quartile is farther from the median than the
upper quartile, then the distribution is negatively skewed
• Uses the 5-number summary indices
5-Number Summary Indices

• Arrange the data in ascending order, then
– Find Q1: one quarter (1/4) of the data lie below this point
– Find the median: half (1/2) of the data lie below this point
– Find Q3: three quarters (3/4) of the data lie below this point
– Find the maximum score and
– Find the minimum score
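A five-number summary is easy to compute with Python's standard library (3.8+). This sketch uses the 20 student ages from the mean example later in these notes. `statistics.quantiles` defaults to the "exclusive" method, which places quartiles at positions (n + 1)p in the ordered data; other software may use a different rule and report slightly different quartiles.

```python
import statistics

# Ages (years) of 20 students, from the raw data set in the mean example
ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)  # ascending order
    q1, median, q3 = statistics.quantiles(ordered, n=4)  # (n + 1)p positions by default
    return ordered[0], q1, median, q3, ordered[-1]

summary = five_number_summary(ages)
print(summary)
```

Here Q1 falls a quarter of the way between the 5th and 6th ordered values (28 and 29), giving 28.25.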
Bar Chart
• Slender rectangles to represent frequency of
values of variable
• Rectangles are separate and distinct
• Height of each rectangle corresponds to the frequency
• Types of bar chart
– Simple: consisting of a set of non-joining bars
– Component: like a simple bar chart except that each bar is split up into its constituent parts
– Multiple: the component values are shown as separate, adjoining bars, always in the same sequence

Example : Reasons For Physicians Not Smoking

Reasons        Frequency
• Health           25
• Religious        15
• Social           12
• Profession        5
• Others            3

[Figure: Bar chart of reasons for UCH physicians not smoking; categories: Health, Religious, Social, Profession, Others]

Pie Chart

• Graphical device consisting of a circle subdivided into sectors whose areas are proportional to the parts of the whole quantity they represent
• Used to show the components of a total
• Sometimes gives a more intuitive visual impression
• Procedures:
– Draw a circle of any convenient radius, with a marked centre, to represent the total observations.
– Divide the circle into sectors according to the frequency of each attribute.
– Use (n/N) × 360° to obtain the angle of each sector.
– Shade the sectors in different colours to distinguish them.
Pie Chart …
[Figure: Pie chart of reasons for physicians not smoking: Health 42%, Religious 25%, Social 20%, Profession 8%, Others 5%]
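The sector angles for the pie chart follow from (n/N) × 360°. A minimal Python sketch using the physicians' reasons for not smoking; the rounded percentages reproduce the labels on the chart.

```python
# Reasons for physicians not smoking: category -> frequency
reasons = {"Health": 25, "Religious": 15, "Social": 12, "Profession": 5, "Others": 3}
N = sum(reasons.values())

# Sector angle (degrees) and rounded percentage for each category
angles = {k: n / N * 360 for k, n in reasons.items()}
percents = {k: round(100 * n / N) for k, n in reasons.items()}

for k in reasons:
    print(f"{k:<10} {angles[k]:6.1f} deg  {percents[k]}%")
```

Health, for example, gets 25/60 × 360° = 150°, and the angles of all categories add up to 360°.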

Numerical Summarisation of Data
(Lectures 5 & 6)
Types of Numerical Measures
Central Location / Position / Tendency – a single value that represents (is a good summary of) an entire distribution of data

Spread / Dispersion / Variability – how much the distribution is spread or dispersed from its central location

Central Location
[Figure: Histogram of number of people by age group (0-9 to 90-99), illustrating central location and spread]

Measures of central tendency
(Lecture 5)
Measures of Central Location
• Definition:
– a single value that represents (a good summary of) an entire distribution of data

• Also known as:
– "Measure of central tendency"
– "Measure of central position"

• Common measures:
– Arithmetic mean
– Median
– Mode
Arithmetic Mean (Average)
• Most useful measure of central tendency
• The mean uses all of the original measurements
• Not a good summary when data are skewed
• Sensitive to extreme values
– A single data point can change the sample mean greatly

• Calculation steps:
– Add up the data
– Divide by the number of observations (pieces of data), i.e. the sample size (n)

Raw data set: ages of students in a class (in years)

s/n   Age     s/n   Age
1     27      11    30
2     30      12    30
3     28      13    27
4     31      14    30
5     28      15    28
6     36      16    31
7     29      17    32
8     37      18    30
9     29      19    29
10    34      20    29

Mean = Sum of all observations / Number of observations

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

– Usually pronounced "x bar"
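The two calculation steps (add up the data, divide by n) map directly onto code. A minimal Python sketch using the 20 student ages listed above.

```python
# Ages (years) of 20 students in a class
ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

n = len(ages)           # sample size
total = sum(ages)       # step 1: add up the data
x_bar = total / n       # step 2: divide by the number of observations
print(f"n = {n}, sum = {total}, mean = {x_bar} years")
```

The mean age is 605/20 = 30.25 years; `statistics.mean(ages)` gives the same value.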
Arithmetic Mean
Five systolic blood pressures (mmHg) (n = 5)
120, 80, 90, 110, 95
– Can be represented with mathematical notation:
x1 = 120, x2 = 80, . . . , x5 = 95

– The sample mean is computed by adding up the five values and dividing by five; in statistical notation the sample mean is usually written as a letter with a line over it:

$$\bar{x} = \frac{120 + 80 + 90 + 110 + 95}{5} = \frac{495}{5} = 99 \text{ mmHg}$$

PROCEDURE FOR CALCULATING MEAN
(Grouped data)
i. Find the class mark for each interval
ii. Multiply the class mark of each interval by its corresponding frequency
iii. Add the results in (ii) across all intervals
iv. Divide the result in (iii) by the number of observations, i.e. the total frequency

Mean – Grouped Data
• The mean of the given data is calculated by
dividing the sum of all observations by the
number of observations
• If x1, x2, x3, ..., xn are the given observations (class marks) with respective frequencies f1, f2, f3, ..., fn, then
• Sum of observations = f1x1 + f2x2 + ... + fnxn
• Sum of frequencies = f1 + f2 + f3 + ... + fn
• Mean = Sum of observations / Sum of frequencies:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

Example
• Compute the mean of the data below
Age Interval Frequency (f)
20 -29 2
30 -39 3
40-49 3
50 -59 2
Total 10


Solution
Age Interval   Frequency (f)   Class Mark (x)   fx
20-29          2               24.5             49
30-39          3               34.5             103.5
40-49          3               44.5             133.5
50-59          2               54.5             109
Total          10                               395

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{395}{10} = 39.5 \text{ years}$$
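The four-step procedure for grouped data can be written as a short Python sketch; it reproduces the worked solution for the age intervals above.

```python
# Age intervals (class limits) and their frequencies
intervals = [(20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [2, 3, 3, 2]

marks = [(lo + hi) / 2 for lo, hi in intervals]         # step i: class marks
fx = [f * x for f, x in zip(freqs, marks)]              # step ii: f times class mark
sum_fx = sum(fx)                                        # step iii: add across intervals
grouped_mean = sum_fx / sum(freqs)                      # step iv: divide by total frequency
print(marks, sum_fx, grouped_mean)
```

Because each observation is replaced by its class mark, the grouped mean is an approximation to the mean of the raw data.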

Median
• The median is the middle number (also called the 50th percentile or second quartile)
– Other percentiles/quartiles can be computed as well, but they are not measures of centre
– e.g. for the ordered blood pressures 80, 90, 95, 110, 120 mmHg, the median is 95 mmHg

• Best measure of central tendency when data are skewed
• Concentrates on the ranks of values rather than their absolute values
• Unaffected by extreme values
– For example, if 120 became 200, the median would remain 95, but the mean would change from 99 to 115
• It is determined mainly by the middle value(s) of a sample and is insensitive to the other values, unlike the arithmetic mean, which uses all the observations
Calculation of median values for
ungrouped data
• Steps
– Arrange the observations in ascending or descending order
– If n is odd: pick the observation in the middle, at position (n + 1)/2, as the median
– If n is even: take the arithmetic mean of the two middle observations

EXAMPLE ON MEDIAN
Find the Median of age at marriage of ten
pregnant women seen at ANC:
42, 52, 31, 35, 50, 40, 27, 43, 35, 28

Step 1: Arrange in Ascending Order

27, 28, 31, 35, 35, 40, 42, 43, 50, 52

Step 2: n = 10 is even, so take the mean of the two middle (5th and 6th) observations

(35 + 40)/2 = 37.5 years

The Median (P50) is the value that separates the lower 50%
from the upper 50% of the observations
Median – Grouped Data
Step 1: Construct the cumulative frequency distribution
Step 2: Decide which class contains the median. The median class is the first class whose cumulative frequency is at least n/2
Step 3: Find the median using the following formula:

$$\text{Median} = l_b + \left(\frac{\frac{n}{2} - cf}{f_m}\right) w$$

n = total frequency
lb = the lower class boundary of the median class
cf = cumulative frequency of the class preceding the median class
fm = the frequency of the median class
w = class width or class size

If n is odd, replace n/2 with (n+1)/2 in the equation
Recall the example on the mean, and find its median
Age        Class          Frequency   Cumulative
Interval   Boundaries     (f)         frequency (cf)
20-29      19.5 – 29.5    2           2
30-39      29.5 – 39.5    3           5
40-49      39.5 – 49.5    3           8
50-59      49.5 – 59.5    2           10
Total                     10

n/2 = 5, so the median class is 30-39 (the first class with cf ≥ 5):

$$\text{Median} = 29.5 + \left(\frac{\frac{10}{2} - 2}{3}\right) 10 = 39.5 \text{ years}$$
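The grouped-data median formula can be sketched in Python as below. The function name and arguments are illustrative; it assumes classes of equal width and always uses n/2 (it does not apply the (n+1)/2 variant for odd n).

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    """Median = lb + ((n/2 - cf) / fm) * w, where the median class is the
    first class whose cumulative frequency is at least n/2."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))
    for i, c in enumerate(cum):
        if c >= half:
            cf_before = cum[i - 1] if i > 0 else 0   # cf of the preceding class
            return lower_boundaries[i] + (half - cf_before) / freqs[i] * width
    raise ValueError("empty distribution")

# Age intervals 20-29, ..., 50-59 with lower class boundaries 19.5, 29.5, 39.5, 49.5
print(grouped_median([19.5, 29.5, 39.5, 49.5], 10, [2, 3, 3, 2]))  # 39.5
```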

MODE
Definition: Mode is the value that occurs most
frequently
• Least used measure of central tendency
• The observation that occurs most frequently
• Easiest measure to understand, explain and identify
• Its mathematical properties are rather intractable
• A mode may not exist when there is a large number of possible values (e.g. when every value occurs only once)
• There may be more than one mode
• Insensitive to extreme values (outliers)
• Does not use all the data
Method for identification:

1. Arrange the data into a frequency distribution or histogram, showing all values of the variable and the frequency with which each value occurs
2. Identify the value that occurs most often

Obs   Age     Obs   Age
1     27      11    30
2     27      12    30
3     28      13    30
4     28      14    30
5     28      15    31
6     29      16    31
7     29      17    32
8     29      18    34
9     29      19    36
10    30      20    37

Mode: the most frequent value of the variable
Mode = 30

[Figure: Bar chart of frequency by age (years), 27 to 37]
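The two-step method (build a frequency distribution, then pick the most frequent value) can be sketched with Python's standard library. Returning a list allows for data with more than one mode; `statistics.multimode` (Python 3.8+) gives the same answer.

```python
from collections import Counter
import statistics

# Ages (years) of the 20 students, in ascending order
ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]

counts = Counter(ages)            # step 1: frequency distribution (value -> frequency)
top = max(counts.values())        # highest frequency
modes = sorted(v for v, c in counts.items() if c == top)  # step 2: most frequent value(s)
print(counts, modes, statistics.multimode(ages))
```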

Mode – Grouped Data
Mode
• Mode is the value that has the highest frequency in a data set.
• For grouped data, the class mode (or modal class) is the class with the highest frequency.
• To find the mode for grouped data, use the following formula:

$$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$$

where:
w is the class width
Δ1 = fm – f1 is the difference between the frequency of the modal class (fm) and the frequency of the class preceding it
Δ2 = fm – f2 is the difference between the frequency of the modal class (fm) and the frequency of the class succeeding it
Lmo is the lower boundary of the modal class

06/21/2025 Lecture 4 - Numerical measure of data I 61
Calculation of Grouped Data - Mode
Example: Based on the grouped data below, find the mode
Time to travel to work Frequency
1 – 10 8
11 – 20 14
21 – 30 12
31 – 40 9
41 – 50 7

Solution:
Based on the table, the modal class is 11 – 20 (frequency 14), so

Lmo = 10.5, Δ1 = (14 – 8) = 6, Δ2 = (14 – 12) = 2 and w = 10

$$\text{Mode} = 10.5 + \left(\frac{6}{6 + 2}\right) 10 = 18$$
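The grouped-data mode formula can be sketched in Python as below. The function is illustrative; the handling of a modal class at either end of the table (a missing neighbour treated as frequency 0) and the choice of the first class when frequencies tie are our assumptions, not rules stated in the lecture.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode = L_mo + (d1 / (d1 + d2)) * w, with d1 = fm - f(previous class)
    and d2 = fm - f(next class). Assumption: a missing neighbour counts as 0."""
    m = max(range(len(freqs)), key=freqs.__getitem__)  # modal class (first one if tied)
    fm = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = fm - f_prev, fm - f_next
    return lower_boundaries[m] + d1 / (d1 + d2) * width

# Travel time to work: classes 1-10, ..., 41-50 (lower boundaries 0.5, 10.5, ...)
print(grouped_mode([0.5, 10.5, 20.5, 30.5, 40.5], 10, [8, 14, 12, 9, 7]))  # 18.0
```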
Mode can also be obtained from a histogram.
Step 1: Identify the modal class and the bar representing it
Step 2: Draw two crossing lines: one from the top-left corner of the modal bar to the top-left corner of the bar after it, and one from the top-right corner of the modal bar to the top-right corner of the bar before it.
Step 3: Drop a perpendicular from the intersection of the two lines until it touches the horizontal axis.
Step 4: Read the mode from the horizontal axis

Geometric Mean
• Useful for laboratory data in which values are concentrations of one substance in another, assessed by dilution techniques in multiples of a standard number
• The geometric mean of n positive observations is the nth root of the product of the observations. That is:

$$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
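The nth-root definition can be checked numerically in Python (3.8+). The titres below are hypothetical values chosen only to illustrate serial two-fold dilutions; they are not data from the lecture. Averaging logarithms gives the same result and avoids overflow when the product is very large.

```python
import math
import statistics

# Hypothetical reciprocal antibody titres from two-fold dilutions (1:20, 1:40, 1:80, 1:160)
titres = [20, 40, 80, 160]

gm_root = math.prod(titres) ** (1 / len(titres))                   # nth root of the product
gm_logs = math.exp(statistics.fmean(math.log(x) for x in titres))  # antilog of the mean log
gm_lib = statistics.geometric_mean(titres)                          # standard-library version
print(gm_root, gm_logs, gm_lib)
```

The geometric mean (about 56.6) sits below the arithmetic mean (75) because the dilution series grows multiplicatively.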

Relationship among the three main
measures of location

• The relationships among the mean, median and mode are useful in assessing the symmetry of the distribution of data
– Mean = Median = Mode implies symmetry
– Mean > Median > Mode implies positive skewness
– Mean < Median < Mode implies negative skewness

Comparison of Mode, Median and Mean

Symmetrical:
Mode = Median = Mean

Skewed right:
Mode < Median < Mean

Skewed left:
Mean < Median < Mode


Pearson’s Measure of Skewness

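• Skewness = (Mean − Median) / Standard Deviation

$$\text{Skewness} = \frac{\bar{x} - \text{Median}}{s}$$

(Pearson's second coefficient of skewness multiplies this ratio by 3; the sign, and hence the interpretation below, is the same.)
• Values:
1. Zero, for a perfectly symmetrical distribution
2. Negative, when negatively skewed (skewed to the left)
3. Positive, when positively skewed (skewed to the right)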
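A minimal Python sketch of the skewness measure as given above, applied to the 20 student ages from the mean example. It uses the sample standard deviation (divisor n − 1); multiply the result by 3 for Pearson's second coefficient.

```python
import statistics

# Ages (years) of 20 students, from the mean example
ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

mean = statistics.mean(ages)
median = statistics.median(ages)
sd = statistics.stdev(ages)          # sample SD, divisor n - 1
skewness = (mean - median) / sd      # (mean - median) / SD, as defined above
print(round(mean, 2), median, round(sd, 3), round(skewness, 3))
```

The mean (30.25) lies slightly above the median (30), so the measure is small and positive: the ages are slightly skewed to the right.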

Measures of Partition
• These are descriptive measures commonly
used for ORDERED observations
– Quartiles
– Deciles
– Percentiles


Measures of Partition
• Percentiles: divide a set of ordered observations into 100 equal parts
– The 20th percentile is the value below which 20% of the observations lie.
• Deciles: divide a set of ordered observations into 10 equal parts
• Quartiles: divide a set of ordered observations into 4 equal parts
– Q1 (1st quartile, 25th percentile), Q2 (2nd quartile, median, 50th percentile), Q3 (3rd quartile, 75th percentile)
For grouped data

Quartile   Percentile   Position    Position   Description
                        (n odd)     (n even)
Q1         P25          (n+1)/4     n/4        Middle observation of the lower half of the observations, i.e. the value that separates the lower 25% from the upper 75% of the observations
Q2         P50          (n+1)/2     n/2        Middle observation
Q3         P75          3(n+1)/4    3n/4       Middle observation of the upper half of the observations, i.e. the value that separates the lower 75% from the upper 25% of the observations

$$P_k = l_b + \left(\frac{\frac{kn}{100} - cf}{f_p}\right) w, \quad k = 1, 2, \ldots, 99$$

$$Q_k = l_b + \left(\frac{\frac{kn}{4} - cf}{f_q}\right) w, \quad k = 1, 2, 3$$
Recall that the interquartile range is a function of Q1 and Q3
[Figure: Frequency distribution (N on the vertical axis) marking Q1, Q3 and the interquartile interval between them]
Quartiles
Using the same method of calculation as for the median, we can get the Q1 and Q3 equations as follows:

$$Q_1 = l_b + \left(\frac{\frac{n}{4} - cf}{f_q}\right) w \qquad Q_3 = l_b + \left(\frac{\frac{3n}{4} - cf}{f_q}\right) w$$

Example: Based on the grouped data below, find the interquartile range

Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7

Solution:
1st Step: Construct the cumulative frequency distribution

Time to travel   Frequency   Cumulative
to work                      frequency
1 – 10           8           8
11 – 20          14          22
21 – 30          12          34
31 – 40          9           43
41 – 50          7           50

2nd Step: Determine Q1 and Q3

Class of Q1: n/4 = 50/4 = 12.5, so the Q1 class is the 2nd class (the first with cumulative frequency ≥ 12.5). Therefore,

$$Q_1 = 10.5 + \left(\frac{12.5 - 8}{14}\right) 10 = 13.7143 \text{ mins}$$

3n 3 50  3n  cf
Class Q3 
4

4
37.5
Q3 lb  ( 4 )w
fq
Class Q3 is the 4th class
Therefore,
37.5  34
Q 3 30.5  ( )10 34.3889 min s
9
Interquartile Range

IQR = Q3 – Q1

Therefore IQR = Q3 – Q1

= 34.3889 –
13.7143
= 20.6746
Descriptive Statistics_Lectures 3-6_2017 74
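The Pk and Qk formulas share one structure, so a single Python function can compute any grouped quantile. This is a sketch with an illustrative name and signature that assumes classes of equal width. Here `p` is the fraction of observations below the value: 0.25 for Q1, 0.75 for Q3, or k/100 for the kth percentile.

```python
from itertools import accumulate

def grouped_quantile(p, lower_boundaries, width, freqs):
    """lb + ((p*n - cf) / f) * w, using the first class whose
    cumulative frequency is at least p*n."""
    n = sum(freqs)
    target = p * n
    cum = list(accumulate(freqs))
    for i, c in enumerate(cum):
        if c >= target:
            cf_before = cum[i - 1] if i > 0 else 0   # cf of the preceding class
            return lower_boundaries[i] + (target - cf_before) / freqs[i] * width
    raise ValueError("p must be between 0 and 1")

# Travel time to work (mins): lower class boundaries, width, frequencies
lbs, w, f = [0.5, 10.5, 20.5, 30.5, 40.5], 10, [8, 14, 12, 9, 7]
q1 = grouped_quantile(0.25, lbs, w, f)
q3 = grouped_quantile(0.75, lbs, w, f)
print(round(q1, 4), round(q3, 4), round(q3 - q1, 4))
```

With p = 0.5 the same function returns the grouped median.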
Measures of Variation/Dispersion/Spread
(Lecture 6)
Distributional Spread from the Centre
[Figure: Two distributions with high and low spread about the centre]
Measures of dispersion / spread / variation

• Range

• Interquartile range

• Variance

• Standard Deviation

• Coefficient of Variation


Range
• The difference between the largest observation (maximum value) and the smallest observation (minimum value) in a set of data
• range = maximum – minimum
• Relies on only the two extreme values
• Easy to calculate
• But it is affected by outliers

Range
[Figure: Range shown as the distance between the minimum and the maximum]
Interquartile Range

• The interquartile range is the difference between the 25th percentile (1st quartile) and the 75th percentile (3rd quartile) in a set of data
– i.e. the 3rd quartile minus the 1st quartile
• Concentrates on the middle 50% of the ordered observations
– i.e. it gives an idea of the spread of the middle 50 percent of the observations
• Not affected by outliers.
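A short Python sketch contrasts the range and the interquartile range on the 20 student ages from the mean example. The outlier is hypothetical: the largest age (37) is replaced by 90, as if a data-entry error had occurred. The quartiles use the default "exclusive" method of `statistics.quantiles`.

```python
import statistics

ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

def spread(data):
    """Return (range, IQR) of ungrouped data."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles at (n + 1)p positions
    return max(data) - min(data), q3 - q1

data_range, iqr = spread(ages)

with_outlier = ages[:]                            # copy the data
with_outlier[with_outlier.index(37)] = 90         # hypothetical extreme value
range_out, iqr_out = spread(with_outlier)
print(data_range, iqr, range_out, iqr_out)
```

The single extreme value raises the range from 10 to 63 years, while the IQR stays at 2.75 years.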
[Figure: Frequency distribution marking the minimum, 1st quartile, median, mode, 3rd quartile and maximum, with the interquartile interval and the range]

Variance

• Mean of the squared deviations from the mean value (for a sample, the sum is divided by n − 1)
• Square of the standard deviation
• Units of measurement are the square of the original units

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation

• Square root of the variance
• Best measure of variation or dispersion (for roughly symmetric data)
• Same units as the original observations
• Amenable to mathematical and statistical manipulation
Steps to Calculate Variance and Standard Deviation
Notation: x̄ = mean; xi = value; n = number of observations; s² = variance; s = standard deviation

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

1. Calculate the arithmetic mean x̄
2. Subtract the mean from each observation: xi − x̄
3. Square each difference: (xi − x̄)²
4. Sum the squared differences: Σ(xi − x̄)²
5. Divide the sum of the squared differences by n – 1
6. Take the square root of the variance: s = √s²
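The six steps can be followed one line at a time in Python. This sketch uses the five systolic blood pressures from the mean example; `statistics.variance` and `statistics.stdev` give the same results.

```python
import math

bp = [120, 80, 90, 110, 95]  # systolic blood pressures (mmHg)

x_bar = sum(bp) / len(bp)                 # step 1: arithmetic mean
deviations = [x - x_bar for x in bp]      # step 2: subtract the mean
squared = [d ** 2 for d in deviations]    # step 3: square each difference
ss = sum(squared)                         # step 4: sum of squared differences
variance = ss / (len(bp) - 1)             # step 5: divide by n - 1
sd = math.sqrt(variance)                  # step 6: square root
print(x_bar, ss, variance, round(sd, 2))
```

The variance is 1020/4 = 255 mmHg² and the standard deviation is about 15.97 mmHg.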
Example:
The frequency distribution of the weight of 100 patients with Rheumatoid Arthritis is as follows:

Weight (kg)   Frequency   Class mark
60 - 69       5           64.5
70 - 79       15          74.5
80 - 89       20          84.5
90 - 99       25          94.5
100 - 109     20          104.5
110 - 119     15          114.5

Calculate the mean, variance and standard deviation
SOLUTION

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{5(64.5) + 15(74.5) + 20(84.5) + 25(94.5) + 20(104.5) + 15(114.5)}{100}$$

$$= \frac{322.5 + 1117.5 + 1690 + 2362.5 + 2090 + 1717.5}{100} = \frac{9300}{100} = 93 \text{ kg}$$

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1} = \frac{5(64.5 - 93)^2 + \cdots + 15(114.5 - 93)^2}{100 - 1} = \frac{20275}{99} = 204.798 \text{ kg}^2$$

$$s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}} = \sqrt{\frac{20275}{99}} = 14.31 \text{ kg}$$
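The grouped-data calculation above can be reproduced in Python. As in the worked solution, each patient's weight is represented by the class mark of their interval, and the sum of frequencies minus one is the divisor for the variance.

```python
import math

marks = [64.5, 74.5, 84.5, 94.5, 104.5, 114.5]   # class marks (kg)
freqs = [5, 15, 20, 25, 20, 15]                   # number of patients

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))  # sum of f(x - mean)^2
variance = ss / (n - 1)
sd = math.sqrt(variance)
print(n, mean, ss, round(variance, 3), round(sd, 2))
```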

Example 2: Find the variance and standard deviation for the following data:
No. of orders   f
10 – 12         4
13 – 15         12
16 – 18         20
19 – 21         14
Total           n = 50

Solution:
No. of orders   f    x    fx    fx²
10 – 12         4    11   44    484
13 – 15         12   14   168   2352
16 – 18         20   17   340   5780
19 – 21         14   20   280   5600
Total           n = 50    832   14216

Variance,

$$s^2 = \frac{\sum f x^2 - \frac{\left(\sum f x\right)^2}{n}}{n - 1} = \frac{14216 - \frac{832^2}{50}}{50 - 1} = 7.5820$$

Standard deviation, $s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$

Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.
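The computing formula used above gives the same variance as the definitional formula. This Python sketch checks that on the mail-order data.

```python
import math

marks = [11, 14, 17, 20]   # class marks (number of orders)
freqs = [4, 12, 20, 14]    # number of days

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))

# Computing ("shortcut") formula: (sum fx^2 - (sum fx)^2 / n) / (n - 1)
var_shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Definitional formula: sum f (x - mean)^2 / (n - 1)
mean = sum_fx / n
var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)

print(sum_fx, sum_fx2, round(var_shortcut, 4), round(math.sqrt(var_shortcut), 2))
```

The shortcut avoids computing each deviation from the mean, which is convenient by hand. In floating-point arithmetic it can lose precision when the data are large relative to their spread.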

– Often abbreviated S, SD or sd
– The smaller s is, the less the variability and the better the mean represents the data
– s measures the spread about the mean
– s can equal 0 only if there is no spread, i.e. all n observations have the same value
– The units of s are the same as the units of the original observations
COEFFICIENT OF VARIATION:

It is a measure of spread that corrects for differences in the magnitude or units of the observations
– It is dimensionless and therefore useful for comparing the spread of two or more data sets, even when their units differ
– The lower the coefficient of variation, the smaller the relative spread
It is defined as the ratio of the standard deviation to the mean of a data set, often expressed as a percentage; mathematically:

$$CV = \frac{s}{\bar{x}} \times 100\%$$
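The coefficient of variation makes the two worked examples comparable even though one is measured in kilograms and the other in orders per day. In this Python sketch the standard deviations come from the solutions above; the mean number of orders, 832/50 = 16.64, is computed from the fx column of the mail-order table.

```python
def coefficient_of_variation(sd, mean):
    """CV = s / mean, expressed as a percentage (dimensionless)."""
    return sd / mean * 100

cv_weight = coefficient_of_variation(14.31, 93)    # weights of RA patients (kg)
cv_orders = coefficient_of_variation(2.75, 16.64)  # daily mail orders
print(round(cv_weight, 1), round(cv_orders, 1))
```

The mail-order counts have a much smaller standard deviation (2.75 against 14.31), but relative to their mean they are slightly more variable (CV about 16.5% against 15.4%).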

Choosing appropriate descriptive statistics
For single-peaked and symmetric distribution
– Position
• mean, median and mode are identical or nearly
equal
– Dispersion
• Standard Deviation
For data with significant outliers
– Position
• median is more informative than the mean
– Dispersion
• Range
• Interquartile interval


Review question
• What summary tools are the most appropriate
to use for the following sets of data?
– Salaries of physicians in a clinic
– Test scores of all students in a qualifying exam
– Serum sodium levels of healthy individuals
– Presence of diarrhoea in a group of children
– Disease stage of cervical cancer patients in UCH

Exercise:
Based on the grouped data below, find the mean,
median, standard deviation, coefficient of variation,
and Pearson measure of skewness

Time to travel to work Frequency

1 – 10 8
11 – 20 14
21 – 30 12
31 – 40 9
41 – 50 7


Assignment:
Based on the frequency distribution table below, find
the mean, median, interquartile range, standard
deviation, coefficient of variation, and Pearson
measure of skewness
Distribution of the number of previous pregnancies of a
group of women aged 30–34 attending an antenatal clinic
No. of previous pregnancies   No. of women
0                             18
1                             27
2                             31
3                             19
4                             5

No ratings yet
Sec Registration of Representative Office: Basic Requirements To Have
8 pages
STEP 7 V56 - Compatibility List
No ratings yet
STEP 7 V56 - Compatibility List
31 pages
HR Interview Questions
No ratings yet
HR Interview Questions
8 pages
Material Safety Data Sheet Avafulflow
No ratings yet
Material Safety Data Sheet Avafulflow
4 pages
Impulse Invariance and Bilinear
No ratings yet
Impulse Invariance and Bilinear
8 pages
Lesson 6 Presentation of Data
No ratings yet
Lesson 6 Presentation of Data
73 pages
Quantitative data analysis
No ratings yet
Quantitative data analysis
42 pages
Unit 2
No ratings yet
Unit 2
11 pages
Lecture 4 Graphical Presentation of Data
No ratings yet
Lecture 4 Graphical Presentation of Data
14 pages
(Spmsoalan) Soalan KBAT Bio 2
No ratings yet
(Spmsoalan) Soalan KBAT Bio 2
5 pages
Lecture 3_6_2025_edit
No ratings yet
Lecture 3_6_2025_edit
90 pages
Dpp06dstructuralisomerism Emerge
No ratings yet
Dpp06dstructuralisomerism Emerge
4 pages
Lecture 2 organizing and displaying of data
No ratings yet
Lecture 2 organizing and displaying of data
35 pages
LEVEL 3: Scope and Sequence: Big Question
No ratings yet
LEVEL 3: Scope and Sequence: Big Question
4 pages
STA 111 Lecture Two
No ratings yet
STA 111 Lecture Two
9 pages
Lecture 5 introduction to statistics
No ratings yet
Lecture 5 introduction to statistics
54 pages
Cement Outline.05
No ratings yet
Cement Outline.05
2 pages
Frequency Distribution
100% (2)
Frequency Distribution
25 pages
0 - Ritu Sharma Old CV
No ratings yet
0 - Ritu Sharma Old CV
2 pages
PNL Account Cashflow Forecast: Missing Values
No ratings yet
PNL Account Cashflow Forecast: Missing Values
5 pages
Graphical Representation of Data Word
No ratings yet
Graphical Representation of Data Word
10 pages
Learner Cala Gu-Wps Office
0% (1)
Learner Cala Gu-Wps Office
3 pages
Szymanowski List of Compositions
No ratings yet
Szymanowski List of Compositions
12 pages
FLC Provider Database
0% (1)
FLC Provider Database
15 pages
Basic Technology Exam Questions For Jss2 Second Term
No ratings yet
Basic Technology Exam Questions For Jss2 Second Term
6 pages
SanyaMidha FullStackWebDeveloper Resume
100% (1)
SanyaMidha FullStackWebDeveloper Resume
1 page
Class 12 Physics Electricity Experiment
No ratings yet
Class 12 Physics Electricity Experiment
18 pages
Grainger Shows Strong End Market
No ratings yet
Grainger Shows Strong End Market
26 pages
Manual7298631 Dell Color Management User S Guide For Macos
No ratings yet
Manual7298631 Dell Color Management User S Guide For Macos
13 pages
De Thi 100 - Fix
No ratings yet
De Thi 100 - Fix
6 pages
Prolegomenon To Geisha As A Cultural Performer: Miyako Odori, The Gion School and Representation of A Traditional" Japan - Mariko Okada
No ratings yet
Prolegomenon To Geisha As A Cultural Performer: Miyako Odori, The Gion School and Representation of A Traditional" Japan - Mariko Okada
7 pages
Radiant July 2018
No ratings yet
Radiant July 2018
18 pages
CHEMISTRY Exam
No ratings yet
CHEMISTRY Exam
8 pages
Document 1
No ratings yet
Document 1
4 pages
Changing Levels of Meaning and Experience - Steve Andreas
No ratings yet
Changing Levels of Meaning and Experience - Steve Andreas
5 pages