2. Methods of Data Collection and Presentation
Source and Types of Data
There are two types of data:
1. Primary Data:
Data collected by the investigator directly from the source.
Examples: observing signs, measuring characteristics, recording symptoms, interviewing respondents, etc.
2. Secondary Data:
Data gathered or compiled from published and unpublished sources or files.
Examples: hospital records, vital statistics, registers, etc.
When the source is secondary data, check that:
The type and objective of the original data collection are known.
The purpose for which the data were collected is compatible with the present problem.
The nature and classification of the data are appropriate to our problem.
There are no biases or misreporting in the published data.
1 Basics Statistics (By:Admasu M.) 11/6/2024
1.2.2. Methods of Data Collection
There are three major methods of data collection:
1. Observation or measurement.
2. Interviews with questionnaires:
Face-to-face interview.
Telephone interview.
Self-administered questionnaires returned by mail (mailed questionnaire).
3. The use of documentary sources:
Official Records: Government or institutional reports, census data, court records, and public health records.
Personal Documents: Letters, diaries, autobiographies, and personal notes.
Media: Newspapers, magazines, radio, and television transcripts.
Academic Literature: Previous research studies, academic journals, ..
Digital Sources: Websites, online databases, social media content, and
digital archives.
1.2.3 Methods of Data Presentation
After the data have been collected, the next important step is to organize them, that is, to present them in a condensed, readily comprehensible form that helps in drawing inferences from them.
The presentation of data is broadly classified into the following three categories:
Tabular presentation (frequency distribution),
Diagrammatic presentation, and
Graphical presentation.
1 Tabular Presentation of Data (Frequency Distribution)
Definitions:
Raw data: data in the original form in which they were collected (e.g., from a survey), whether counts or measurements.
Frequency (f): the number of values in a specific class of a distribution.
Frequency distribution (FD): the organization of raw data in table form, using classes and frequencies.
Depending on the type of data, there are two basic types of frequency distributions:
Qualitative (categorical) frequency distribution, and
Quantitative frequency distribution, which is either:
Ungrouped frequency distribution, or
Grouped frequency distribution.
1. Categorical (Qualitative) Frequency Distribution:
It is often constructed for data sets whose values can be placed in specific categories, such as nominal or ordinal data.
Example: A social worker collected the following data on
marital status for 25 persons. (𝑀 = 𝑚𝑎𝑟𝑟𝑖𝑒𝑑, 𝑆 = 𝑠𝑖𝑛𝑔𝑙𝑒,
𝑊 = 𝑤𝑖𝑑𝑜𝑤𝑒𝑑, 𝐷 = 𝑑𝑖𝑣𝑜𝑟𝑐𝑒𝑑). Construct a frequency
distribution for the following data.
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are qualitative (categorical), discrete classes can be used.
There are four types of marital status: M, S, D and W.
These types will be used as the classes for the distribution.
Class   Tally     Frequency (f)
M       //// /    6
S       //// //   7
D       //// //   7
W       ////      5
Total             25
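The tally-and-count step above can be reproduced with a short Python sketch. It assumes the 25 codes are typed in row by row exactly as listed; `collections.Counter` does the counting, and the hypothetical `tally` helper writes each count in the same "//// /" style as the table (one "////" group per five).

```python
from collections import Counter

# Marital status of the 25 persons, row by row as listed in the example
# (M = married, S = single, W = widowed, D = divorced).
rows = [
    "M S D W D",
    "S S M M M",
    "W D S M M",
    "W D D S S",
    "S W W D D",
]
data = [code for row in rows for code in row.split()]

freq = Counter(data)


def tally(f):
    """Write a count as tally marks: '////' for each group of five, '/' for the rest."""
    return ("//// " * (f // 5) + "/" * (f % 5)).strip()


# Print the classes in the same order as the table.
for status in ["M", "S", "D", "W"]:
    print(f"{status}  {tally(freq[status]):<8}  {freq[status]}")
print("Total", sum(freq.values()))
```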
2. Quantitative Frequency Distribution
a) Ungrouped frequency distribution:
It is constructed for data sets in which the number of distinct values is small, typically small data sets on a discrete variable.
Example: Construct an ungrouped frequency distribution for
the following data.
0 2 2 1 1 2
3 5 3 2 2 2
1 0 1 2 4 2
0 1 0 1 4 4
2 2 0 1 1 5
Solution:
First arrange the data in order of magnitude (ascending order), then count the frequency of each value.
The distinct values for these data are 0, 1, 2, 3, 4 and 5, so the number of distinct values is small.
No. of cups   Frequency (f)
0             5
1             8
2             10
3             2
4             3
5             2
Total         30
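The same count for the ungrouped data can be obtained with `collections.Counter`; sorting the (value, count) pairs lists the distinct values in ascending order, just as the solution above does by hand. This is a sketch that assumes the 30 values are entered row by row as given.

```python
from collections import Counter

# Number of cups for the 30 observations in the example, row by row.
data = [
    0, 2, 2, 1, 1, 2,
    3, 5, 3, 2, 2, 2,
    1, 0, 1, 2, 4, 2,
    0, 1, 0, 1, 4, 4,
    2, 2, 0, 1, 1, 5,
]

freq = Counter(data)

# Distinct values in ascending order together with their frequencies.
table = sorted(freq.items())
for value, f in table:
    print(value, f)
print("Total", len(data))
```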
b) Grouped Frequency Distribution
When the number of distinct values in the data is too large, the data must be grouped into classes.
That is, we divide the values into groups or class intervals and then count the number of data values falling in each class interval.
Class intervals (CI): non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals.
Grouped frequency distribution: a frequency distribution in which several values are grouped into one class.
Class limits: the values that separate one class in a grouped frequency distribution from another. The limits can actually appear in the data, and there is a gap between the upper limit of one class and the lower limit of the next.
Unit of measurement (U): the distance between two consecutive possible measurements. It is usually 1, 0.1, 0.01, 0.001, and so on.
Class boundaries: the values that separate one class in a grouped frequency distribution from another without leaving gaps.
The lower class boundary is found by subtracting U/2 from the corresponding lower class limit, and the upper class boundary is found by adding U/2 to the corresponding upper class limit.
Class width: the difference between the upper and lower class boundaries
of any class. It is also the difference between the lower limits of any two
consecutive classes or the difference between any two consecutive class
marks.
Class mark (Mid points): it is the average of the lower and upper class
limits or the average of upper and lower class boundary.
Cumulative frequency: is the number of observations less than/more than
or equal to a specific value.
Cumulative frequency above (more than cumulative frequency): the total frequency of all values greater than or equal to the lower class boundary of a given class.
Cumulative frequency below (less than cumulative frequency): the total frequency of all values less than or equal to the upper class boundary of a given class.
Cumulative frequency distribution (CFD): the tabular arrangement of class intervals together with their corresponding cumulative frequencies.
Relative frequency (rf): it is the frequency divided by the total frequency.
Relative cumulative frequency (rcf): it is the cumulative frequency
divided by the total frequency.
Guidelines for classes
There should be between 5 and 20 classes.
The classes must be mutually exclusive. This means that no data value can fall into two different classes.
The classes must be all inclusive or exhaustive. This means that all data values must be included.
The classes must be continuous. There should be no gaps in a frequency distribution.
The classes must be equal in width. The exception here is the first or last class: it is possible to have a "below ..." or "... and above" class. This is often used with ages.
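The class guidelines above can be checked mechanically for a proposed set of classes. The following is a sketch with a hypothetical helper `check_classes`, assuming whole-number data (unit u = 1) and closed classes listed in ascending order; open-ended "below ..." or "... and above" classes are not handled.

```python
def check_classes(limits, data, unit=1):
    """Check the class guidelines for a list of (lower, upper) class limits.

    Hypothetical helper: limits must be in ascending order and unit is the
    unit of measurement of the data.
    """
    # For every observation, count how many classes contain it.
    hits = [sum(lo <= x <= hi for lo, hi in limits) for x in data]
    pairs = list(zip(limits, limits[1:]))
    return {
        "five_to_twenty": 5 <= len(limits) <= 20,
        "mutually_exclusive": all(h <= 1 for h in hits),
        "exhaustive": all(h >= 1 for h in hits),
        # Continuous: each class starts one unit after the previous one ends.
        "continuous": all(nxt[0] == cur[1] + unit for cur, nxt in pairs),
        # Equal width: consecutive lower limits are always the same distance apart.
        "equal_width": len({nxt[0] - cur[0] for cur, nxt in pairs}) <= 1,
    }


classes = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
print(check_classes(classes, data))
```

Dropping the last class, for instance, leaves 38 and 39 unclassified, so `exhaustive` becomes `False`.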
Steps for Constructing a Grouped Frequency Distribution
1. Arrange the data in ascending order.
2. Find the range (R): $R = \text{Maximum} - \text{Minimum}$.
3. Find the number of class intervals (k). It should be between 5 and 20, i.e. $5 \le k \le 20$, or use Sturges' formula:
$k = 1 + 3.322 \log_{10} n$,
where k is the number of class intervals desired and n is the total number of observations.
k must be rounded to a whole number (the worked example below rounds 5.32 up to 6).
4. Find the class width (w): the difference between the lower limits of two consecutive classes.
$w = \frac{R}{k}$, and it is always rounded up.
When the data are given as:
Whole numbers, w is rounded up to the next whole number, e.g. $w = 4.13 \approx 5$.
Tenths (one decimal place), w is rounded up to the next tenth, e.g. $w = 0.325 \approx 0.4$.
Hundredths (two decimal places), w is rounded up to the next hundredth, e.g. $w = 2.532 \approx 2.54$; $w = 0.981 \approx 0.99$.
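Steps 2 to 4 can be sketched in Python as follows. `sturges_k` and `round_width_up` are hypothetical helper names; `sturges_k` rounds up, following the worked example, and `round_width_up` uses `decimal.Decimal` so that units such as 0.1 and 0.01 are handled exactly. When R/k is already an exact multiple of the unit, the sketch takes the next multiple; that is an assumption (so the last class still reaches the maximum), not a rule stated in the text.

```python
import math
from decimal import Decimal, ROUND_FLOOR


def sturges_k(n):
    """Number of classes from Sturges' formula, k = 1 + 3.322 log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def round_width_up(raw_width, unit):
    """Round R/k up to the next multiple of the unit of measurement (Decimal inputs)."""
    steps = (raw_width / unit).to_integral_value(rounding=ROUND_FLOOR)
    return (steps + 1) * unit


# Data of the worked example: n = 20 whole-number observations.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
R = max(data) - min(data)                          # 39 - 6 = 33
k = sturges_k(len(data))                           # 5.32 rounded up to 6
w = round_width_up(Decimal(R) / k, Decimal(1))     # 5.5 rounded up to 6
print(R, k, w)
```

The rounding examples in the text can be checked the same way, e.g. `round_width_up(Decimal("0.325"), Decimal("0.1"))` gives `Decimal("0.4")`.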
5. Find the class limits (CL): these are the extreme values of each class, called the lower and upper class limits.
Lower class limit (LCL): the lcl of the first class interval should be equal to or smaller than the smallest observation in the data, i.e. $lcl_1 \le$ the smallest observation; in practice take $lcl_1$ = the smallest observation.
Keep adding the class width to this lower limit to get the rest of the lower limits, i.e. $lcl_{i+1} = lcl_i + w$, $i = 1, 2, \ldots, k-1$.
Upper class limit (UCL): to find the upper class limit of the first class, subtract u from the lower limit of the second class, i.e. $ucl_1 = lcl_2 - u$.
Then keep adding the class width to this upper limit to get the rest of the upper class limits, i.e. $ucl_{i+1} = ucl_i + w$, $i = 1, 2, \ldots, k-1$.
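Step 5 amounts to two arithmetic sequences with common difference w. A minimal sketch with a hypothetical `class_limits` helper, assuming the first lower limit is the smallest observation; for decimal units, passing `decimal.Decimal` values avoids floating-point drift.

```python
def class_limits(lcl1, w, k, u):
    """Return k (lower, upper) class limits starting from lcl1.

    lcl_{i+1} = lcl_i + w, and ucl_i = lcl_{i+1} - u = lcl_i + w - u,
    which also gives ucl_{i+1} = ucl_i + w.
    """
    lower = [lcl1 + i * w for i in range(k)]
    upper = [lcl + w - u for lcl in lower]
    return list(zip(lower, upper))


# Worked example: lcl_1 = 6, w = 6, k = 6 classes, u = 1 (whole numbers).
for lcl, ucl in class_limits(6, 6, 6, 1):
    print(f"{lcl} - {ucl}")
```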
6. Find the class boundaries and class marks.
Class boundaries (CB): the exact (true) limits of the classes, called the lower and upper class boundaries.
Lower class boundary (LCB): obtained by subtracting half the unit of measurement from the lcl of the class, i.e.
$lcb_i = lcl_i - \frac{u}{2}$. Note: $lcb_{i+1} = lcb_i + w$.
Upper class boundary (UCB): obtained by adding half the unit of measurement to the ucl of the class, i.e.
$ucb_i = ucl_i + \frac{u}{2}$. Note: $ucb_{i+1} = ucb_i + w$.
Class marks (midpoints) (m): the average of lcl and ucl, or of lcb and ucb:
$m_i = \frac{lcl_i + ucl_i}{2}$ or $m_i = \frac{lcb_i + ucb_i}{2}$. Note: $m_{i+1} = m_i + w$.
7. Find the frequency of each class, i.e. count the observations that fall within each class.
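Steps 6 and 7 can be sketched as three small hypothetical helpers, `boundaries`, `class_marks` and `frequencies`, applied to the class limits of the worked example that follows. Counting against the class limits assumes whole-number data (u = 1), so no observation can fall in a gap between limits.

```python
def boundaries(limits, u):
    """lcb_i = lcl_i - u/2 and ucb_i = ucl_i + u/2 for each (lcl, ucl) pair."""
    return [(lcl - u / 2, ucl + u / 2) for lcl, ucl in limits]


def class_marks(limits):
    """m_i = (lcl_i + ucl_i) / 2."""
    return [(lcl + ucl) / 2 for lcl, ucl in limits]


def frequencies(limits, data):
    """Step 7: count the observations that fall within each class's limits."""
    return [sum(lcl <= x <= ucl for x in data) for lcl, ucl in limits]


limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
for (lcb, ucb), m, f in zip(boundaries(limits, 1), class_marks(limits),
                            frequencies(limits, data)):
    print(f"{lcb} - {ucb}   m = {m}   f = {f}")
```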
Example:
Construct a complete grouped frequency distribution for the
following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
Step 1: Obtain the minimum and maximum observations: Min = 6 and Max = 39.
Step 2: Find the range (R): $R = Max - Min = 39 - 6 = 33$.
Step 3: Select the number of classes using Sturges' formula:
$k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 20 = 5.32 \approx 6$ (rounding up).
Step 4: Find the class width:
$w = \frac{R}{k} = \frac{33}{6} = 5.5 \approx 6$ (rounding up).
Step 5: Find the lower and upper class limits.
Select the starting point; let it be the smallest observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Find the upper class limits, e.g. the first upper class limit is
$ucl_1 = lcl_2 - u = 12 - 1 = 11$, where $u = 1$ since the data are given as whole numbers.
11, 17, 23, 29, 35, 41 are the upper class limits.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 6: Find the class boundaries:
$lcb_1 = 6 - \frac{u}{2} = 6 - \frac{1}{2} = 5.5$
$ucb_1 = 11 + \frac{u}{2} = 11 + \frac{1}{2} = 11.5$
Step 7: Find the frequencies.
The complete grouped frequency distribution is given as
follows:
Class limit   Class boundary   Class mark   f    Lcf   Mcf   rf     %rf    %Lcf
6 – 11        5.5 – 11.5       8.5          2    2     20    0.1    10%    10%
12 – 17       11.5 – 17.5      14.5         2    4     18    0.1    10%    20%
18 – 23       17.5 – 23.5      20.5         7    11    16    0.35   35%    55%
24 – 29       23.5 – 29.5      26.5         4    15    9     0.2    20%    75%
30 – 35       29.5 – 35.5      32.5         3    18    5     0.15   15%    90%
36 – 41       35.5 – 41.5      38.5         2    20    2     0.1    10%    100%
Total                                       20               1.00   100%
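As a cross-check of the table, the cumulative and relative columns can be recomputed from the f column alone. This is a sketch: `itertools.accumulate` gives the running totals for Lcf, and Mcf is the same running total taken from the last class upward.

```python
from itertools import accumulate

f = [2, 2, 7, 4, 3, 2]            # class frequencies from the table
n = sum(f)                         # 20

lcf = list(accumulate(f))                    # less than cumulative frequency
mcf = list(accumulate(reversed(f)))[::-1]    # more than cumulative frequency
rf = [fi / n for fi in f]                    # relative frequency
pct_rf = [100 * fi / n for fi in f]          # relative frequency in percent
pct_lcf = [100 * c / n for c in lcf]         # relative cumulative frequency in percent

for i in range(len(f)):
    print(f[i], lcf[i], mcf[i], rf[i], f"{pct_rf[i]:.0f}%", f"{pct_lcf[i]:.0f}%")
```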
1.3.2 DIAGRAMMATIC PRESENTATION
These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
They attract attention more readily.
They facilitate comparison.
They are easy to understand.
Diagrams are appropriate for presenting discrete data.
The two most commonly used diagrammatic presentations for discrete as well as qualitative data are:
Bar charts and pie charts.
1. Bar chart
There are three types of bar charts. These are:
Simple bar chart, Component bar chart & Multiple bar chart
a) Simple Bar chart:
It is a chart which is used to present data that has only one
variable.
It shows changes in the totals of different categories.
Example: Construct a simple bar chart for the following table
showing annual cases of HIV patients reported in Ethiopia as of
July 31, 1993.
Year 1986 1987 1988 1989 1990 1991 1992 1993
Cases 2 17 87 190 448 885 3256 2814
Solution:
2. Pie Chart
It shows how a total is partitioned into its component parts using a circle.
The circle is divided into sectors proportional to the frequencies of the categories they represent; the central angle of each sector is (f / total) × 360°.
Example
Draw the pie chart for the following data
Wards   Frequency   Percentage (rf)   Central angle (°)
Medical A 55 27.5% 99
Medical B 30 15% 54
Surgical A 40 20% 72
Surgical B 25 12.5% 45
Pediatrics 50 25% 90
Total 200 100% 360
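The percentages and central angles in the table follow from percentage = (f / total) × 100 and angle = (f / total) × 360°. A minimal sketch that recomputes them; multiplying before dividing keeps these particular values exact in floating point.

```python
wards = {
    "Medical A": 55,
    "Medical B": 30,
    "Surgical A": 40,
    "Surgical B": 25,
    "Pediatrics": 50,
}
total = sum(wards.values())   # 200

# Each sector's share of the whole, as a percentage and as a central angle.
percent = {ward: f * 100 / total for ward, f in wards.items()}
angles = {ward: f * 360 / total for ward, f in wards.items()}

for ward in wards:
    print(f"{ward:<11} {wards[ward]:>3}  {percent[ward]:>5}%  {angles[ward]:>5}°")
```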
Solution:
1.3.3 Graphical presentation of data
a) Histogram
It presents a grouped frequency distribution of a continuous variable.
It is drawn by marking the class boundaries on the x-axis and the frequencies on the y-axis; each class is drawn as a bar whose height is its frequency, so adjacent bars touch.
Example: Draw a histogram for the following grouped age data.
Class limit Class boundaries Mid point Frequency
15-19 14.5-19.5 17 2
20-24 19.5-24.5 22 8
25-29 24.5-29.5 27 6
30-34 29.5-34.5 32 12
35-39 34.5-39.5 37 7
40-44 39.5-44.5 42 6
45-49 44.5-49.5 47 4
50-54 49.5-54.5 52 3
55-59 54.5-59.5 57 1
60-64 59.5-64.5 62 1
Solution:
[Figure: histogram of the age data with its frequency polygon]
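Since the figure is not reproduced here, the hypothetical `text_histogram` helper below sketches the same histogram in plain text, with one horizontal bar per class whose length equals the class frequency. It assumes equal class widths (5 years here), so bar length alone is proportional to frequency.

```python
def text_histogram(bounds, freqs, symbol="#"):
    """Return one text row per class: 'lcb-ucb | ###...' with f symbols."""
    rows = []
    for (lcb, ucb), f in zip(bounds, freqs):
        rows.append(f"{lcb:>4}-{ucb:<4} | {symbol * f}")
    return rows


# Class boundaries 14.5-19.5, 19.5-24.5, ..., 59.5-64.5 and their frequencies.
bounds = [(14.5 + 5 * i, 19.5 + 5 * i) for i in range(10)]
freqs = [2, 8, 6, 12, 7, 6, 4, 3, 1, 1]

# In a histogram adjacent bars touch: each ucb equals the next lcb.
assert all(a[1] == b[0] for a, b in zip(bounds, bounds[1:]))

for row in text_histogram(bounds, freqs):
    print(row)
```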
b) Frequency polygon
It is a multi-sided figure drawn by plotting the class marks (midpoints) on the x-axis against the frequencies on the y-axis, and then connecting the points with straight lines.
Example: Draw the frequency polygon for the following age data.
Class limit Mid point Frequency
15-19 17 2
20-24 22 8
25-29 27 6
30-34 32 12
35-39 37 7
40-44 42 6
45-49 47 4
50-54 52 3
55-59 57 1
60-64 62 1
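The vertices of the frequency polygon are simply (class mark, frequency) pairs. The sketch below also closes the polygon on the x-axis by adding a zero-frequency point one class width before the first class mark and one after the last; that closing step is a common convention assumed here, not something stated in the text. `polygon_points` is a hypothetical helper name.

```python
def polygon_points(mids, freqs, close=True):
    """Return the (class mark, frequency) vertices of a frequency polygon."""
    points = list(zip(mids, freqs))
    if close:
        # Assumed convention: extend to zero-frequency classes at both ends.
        w = mids[1] - mids[0]
        points = [(mids[0] - w, 0)] + points + [(mids[-1] + w, 0)]
    return points


mids = [17, 22, 27, 32, 37, 42, 47, 52, 57, 62]
freqs = [2, 8, 6, 12, 7, 6, 4, 3, 1, 1]
pts = polygon_points(mids, freqs)
print(pts)
```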
Solution:
c) Ogives or cumulative frequency polygon (curve)
It is plotted with the class boundaries on the x-axis and the cumulative frequencies on the y-axis; the points are then connected with straight lines.
Less than ogive: plotted using the upper class boundaries (UCB) on the x-axis against the less than cumulative frequencies (LCF) on the y-axis.
More than ogive: plotted using the lower class boundaries (LCB) on the x-axis against the more than cumulative frequencies (MCF) on the y-axis.
Example: Draw the less than and more than ogives for the following age data.
Class limit   Frequency   LCF                    MCF
23-26         3           ≤ 26.5 (≤ 26) = 3      ≥ 22.5 (≥ 23) = 20
27-30         4           ≤ 30.5 (≤ 30) = 7      ≥ 26.5 (≥ 27) = 17
31-34         3           ≤ 34.5 (≤ 34) = 10     ≥ 30.5 (≥ 31) = 13
35-38         5           ≤ 38.5 (≤ 38) = 15     ≥ 34.5 (≥ 35) = 10
39-42         5           ≤ 42.5 (≤ 42) = 20     ≥ 38.5 (≥ 39) = 5
[Figure: more than (MC) and less than (LC) ogive curves]
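The points plotted for the two ogives can be listed directly from the table: (UCB, LCF) for the less than ogive and (LCB, MCF) for the more than ogive. A short sketch, assuming u = 1 for these whole-number ages:

```python
from itertools import accumulate

limits = [(23, 26), (27, 30), (31, 34), (35, 38), (39, 42)]
freqs = [3, 4, 3, 5, 5]
u = 1  # unit of measurement (ages recorded as whole numbers)

lcb = [lo - u / 2 for lo, _ in limits]          # lower class boundaries
ucb = [hi + u / 2 for _, hi in limits]          # upper class boundaries
lcf = list(accumulate(freqs))                   # less than cumulative frequencies
mcf = list(accumulate(reversed(freqs)))[::-1]   # more than cumulative frequencies

less_than_ogive = list(zip(ucb, lcf))   # plotted as (UCB, LCF)
more_than_ogive = list(zip(lcb, mcf))   # plotted as (LCB, MCF)
print(less_than_ogive)
print(more_than_ogive)
```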