CHAPTER TWO
METHODS OF DATA COLLECTION AND PRESENTATION
2.1 DATA COLLECTION
Once the type of study has been decided, it becomes necessary to collect
information about the study, mostly in the form of data. To draw valid
conclusions from data, the information has to be collected in a systematic manner. Whatever the
quality of the sampling and analysis methods, a haphazardly collected dataset is unlikely to produce
valuable and generalizable information.
2.1.1 Type of data: data can be classified as primary and secondary
2.1.2 Sources of data
Data may be derived from several sources. Depending on the source, data can be classified
as Primary or Secondary data.
1. Primary Data
Data measured or collected by the investigator or the user directly from the source.
The data are gathered for the first time by the researcher for a given purpose.
2. Secondary Data
Data gathered or compiled from published and unpublished sources or files.
Secondary data are usually obtained from yearbooks, census reports, survey reports,
official records or published experimental reports.
For example, assume a researcher is interested in studying the prevalence of family
planning utilization among women of reproductive age in a given woreda. The researcher can
either conduct a survey (primary data) or use the records of family planning clinics in the woreda
(secondary data).
Note: Data which are primary for one may be secondary for the other.
2.1.3 Data collection techniques/method
1) The questionnaire is the main data collection instrument in a formal sample survey. Before
examining the steps in designing a questionnaire, we need to review the types of questions used in
questionnaires. Depending on the amount of freedom given to the respondent in offering responses,
there are two basic types of questions that can be used in questionnaires: open-ended questions
and closed-ended questions.
The type of questions for use will be determined by the form of responses wanted, the
nature of the respondents and their ability to answer the questions.
Open-ended questions: allow the respondent to answer freely in his or her own
words.
Example: What do you think are the reasons for a high drop-out rate of village health
committee members?
Closed-ended questions:
A predetermined list of alternative responses is presented to the respondent, who checks the
appropriate one(s). This implies that the respondent's answers are restricted to a limited
range of alternatives. Closed-ended questions fall into one of two categories: dichotomous
questions and multiple-choice questions.
A dichotomous question contains two alternatives in the predetermined list of responses.
Example: - Yes-no, true –false, agree-disagree, like-dislike, fair-unfair and so on.
A multiple choice question offers more than two responses in the predetermined list of
alternate responses.
Example 1: How many children have you ever given birth to?
a. 1-2 b. 3-4 c. 5-6 d. 7-8 e. More than 8
Example 2: Which type of soft drink(s) do you consume?
1. Coca-cola 2. Fanta 3. Pepsi-cola 4. Miranda 5. Sprite
6. Seven-up 7. Others (specify) _____________
2) Focus Group Discussion (FGD):-
3) In-depth interview:- A qualitative method that relies on person-to-person discussion.
4) Observation:- A qualitative method that involves critical observation and recording of
the practice (behavior, culture…) of individuals or a group.
2.1.4 Level/scale of measurement
A measurement scale describes the nature of the values assigned to data, based on the properties of
order, distance and fixed zero. Measurement is the assignment of numbers to objects or events in a
systematic fashion. Four levels of measurement scales are commonly distinguished: nominal, ordinal,
interval, and ratio, and each possesses a different combination of these properties.
Properties of the measurement system.
Order
The property of order exists when an object that has more of the attribute than another
object, is given a bigger number by the rule system. This relationship must hold for all objects in
the "real world".
The property of ORDER exists when, for all i, j: if $O_i > O_j$, then $M(O_i) > M(O_j)$.
Distance
The property of distance is concerned with the relationship of differences between objects.
If a measurement system possesses the property of distance, it means that the unit of measurement
means the same thing throughout the scale of numbers. That is, an inch is an inch, no matter where
it falls: immediately ahead or a mile down the road.
More precisely, an equal difference between two numbers reflects an equal difference in
the "real world" between the objects that were assigned the numbers. In order to define the
property of distance in mathematical notation, four objects are required: $O_i$, $O_j$, $O_k$, and $O_l$.
The difference between objects is represented by the "-" sign; $O_i - O_j$ refers to the actual
"real world" difference between object i and object j, while $M(O_i) - M(O_j)$ refers to differences
between numbers.
The property of DISTANCE exists if, for all i, j, k, l:
if $O_i - O_j \geq O_k - O_l$, then $M(O_i) - M(O_j) \geq M(O_k) - M(O_l)$.
Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of
the attribute in question is assigned the number zero by the system of rules. The object does not
need to really exist in the "real world", as it is somewhat difficult to visualize a "man with no
height". The requirement for a rational zero is this: if objects with none of the attribute did exist,
would they be given the value zero? Defining $O_0$ as the object with none of the attribute in question,
the definition of a rational zero becomes:
The property of FIXED ZERO exists if $M(O_0) = 0$. The property of fixed zero is necessary
for ratios between numbers to be meaningful.
1. Nominal Scales
Nominal scales are measurement systems that possess none of the three properties stated above.
Level of measurement which classifies data into mutually exclusive, all inclusive categories
in which no order or ranking can be imposed on the data.
No arithmetic and relational operation can be applied.
No quantitative information is conveyed
Thus only gives names or labels to various categories.
Examples:
Political party preference (Republican, Democrat, or Other,)
Sex (Male or Female.)
Marital status (married, single, widowed, divorced)
Country code
Regional differentiation of Ethiopia.
2. Ordinal Scales
Ordinal Scales are measurement systems that possess the property of order, but not the property of
distance. The property of fixed zero is not important if the property of distance is not satisfied.
Level of measurement which classifies data into categories that can be ranked. Precise differences
between the ranks are not defined.
Ordering is the sole property of ordinal scale.
Examples:
Letter grades (A, B, C, D, F).
Rating scales (Excellent, Very good, Good, Fair, poor).
Military status.
3. Interval Scales
Interval scales are measurement systems that possess the properties of Order and distance, but not
the property of fixed zero.
Level of measurement which classifies data that can be ranked and differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
Examples:
IQ
Temperature in °F.
Temperature in °C.
4. Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted; e.g. the
statement that the ratio of Bekele's height to Martha's height is 1.32 is meaningful, whereas such a
statement is not possible with interval scales.
Level of measurement which classifies data that can be ranked, differences are meaningful,
and there is a true zero. True ratios exist between the different units of measure.
Examples:
Weight
Height
Number of students
Age
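The difference between interval and ratio scales can be checked numerically. The sketch below is only an illustration: the temperatures (10 and 20 °C) and heights (150 and 180 cm) are assumed values, not data from this chapter. It shows that a ratio of temperatures changes when the unit changes, while a ratio of heights does not.

```python
# Sketch: ratios survive a change of units only on a ratio scale.
# All numeric values below are assumed for illustration.

def celsius_to_fahrenheit(c):
    """Convert Celsius to Fahrenheit (interval scale: zero is arbitrary)."""
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert centimetres to inches (ratio scale: zero means no length)."""
    return cm / 2.54

# Interval scale: the ratio depends on the unit chosen.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: the ratio is the same in any unit.
height_ratio_cm = 180 / 150
height_ratio_in = cm_to_inches(180) / cm_to_inches(150)

print("Temperature ratio in C:", temp_ratio_c)
print("Temperature ratio in F:", round(temp_ratio_f, 2))
print("Height ratio in cm:", round(height_ratio_cm, 2))
print("Height ratio in inches:", round(height_ratio_in, 2))
```

Differences, by contrast, keep their meaning on both scales up to a constant factor, which is why the distance property holds for temperature.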
2.2 METHODS OF DATA PRESENTATION
The presentation of data is broadly classified into the following three categories:
• Tabular presentation
• Diagrammatic presentation
• Graphic presentation
Having collected and edited the data, the next important step is to organize it, that is, to
present it in a readily comprehensible, condensed form that helps in drawing inferences from it.
It is also necessary that like items be separated from unlike ones. The process of arranging data
into classes or categories according to similarities is technically called classification. Classification is
a preliminary step that prepares the ground for proper presentation of data. Mainly, the purpose of
classification is to divide the data into homogeneous groups or classes.
The classification of data is generally done on a geographical, chronological, qualitative or
quantitative basis, along the following lines:
1) In geographical classification, data are arranged according to places, areas or regions.
2) In chronological classification, data are arranged according to time i.e., weekly,
monthly, quarterly, half yearly, annually, etc.
3) In qualitative classification, the data are arranged according to attributes like sex,
marital status, educational standard, stage or intensity of diseases etc.
4) In quantitative classification, the data are arranged according to a characteristic
that has been measured, like height, weight, income of persons, vitamin content of a substance,
etc.
Frequency Distribution: is the organization of raw data in table form with classes and
frequencies.
Where:
Raw Data is data collected in original form.
Frequency is the number of times a certain value or class of values occurs
2.2.1 Types of frequency distributions
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
1) Categorical frequency Distribution:
-Used for data that can be placed in specific categories, such as nominal or ordinal data.
Example 2.1: a social worker collected the following data on marital status for 25
persons. (M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital
status: M, S, D, and W. These types will be used as the classes of the distribution. We follow the
procedure below to construct the frequency distribution.
Step 1: Make a table as shown
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using
$\% = \frac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class   Tally       Frequency   Percent
(1)     (2)         (3)         (4)
M       //// //     6           24
S       //// ///    7           28
D       //// ///    7           28
W       //// /      5           20
Total               25          100
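The tallying and percentage steps can be sketched in Python. This is a minimal illustration using the standard library's `collections.Counter`; the variable names (`raw`, `counts`, `table`) and the helper `percent` are chosen here and are not part of the original procedure.

```python
from collections import Counter

# Raw data from Example 2.1 (M=married, S=single, W=widowed, D=divorced).
raw = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

counts = Counter(raw)        # Steps 2-3: tally and count each class
n = sum(counts.values())     # total number of values

def percent(f, n):
    """Step 4: percentage of a class, f / n * 100."""
    return f / n * 100

# One row per class: (frequency, percent)
table = {cls: (counts[cls], percent(counts[cls], n)) for cls in "MSDW"}

for cls, (f, pct) in table.items():
    print(f"{cls:<6}{f:>3}{pct:>8.1f}")
print(f"{'Total':<6}{n:>3}{sum(p for _, p in table.values()):>8.1f}")  # Step 5
```

Counting by program is a convenient check on a hand tally, since a stroke is easily placed in the wrong row.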
2) Ungrouped frequency Distribution:
It is a table of all the potential raw score values that could possibly occur in the data,
along with the number of times each actually occurred.
It is often constructed for a small data set or for data on a discrete variable.
Constructing ungrouped frequency distribution:
• First find the smallest and largest raw score in the collected data.
• Arrange the data in order of magnitude and count the frequency.
• To facilitate counting one may include a column of tallies.
Example 2.2:
The following data represent the mark of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct an ungrouped frequency distribution.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Tally the data.
Step 3: Compute the frequency.
Step 4: Find the totals of the frequency and percentage columns.
Mark    Tally   Frequency   Percentage
60      //      2           10
62      /       1           5
63      /       1           5
65      /       1           5
70      ////    4           20
74      /       1           5
75      /       1           5
76      //      2           10
80      ///     3           15
85      ///     3           15
90      /       1           5
Total           20          100
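As a cross-check on the hand tally, the ungrouped distribution for the marks in Example 2.2 can be computed as follows. This is a sketch; the names `marks`, `freq`, and `rows` are illustrative choices.

```python
from collections import Counter

# Marks of the 20 students in Example 2.2.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

n = len(marks)
value_range = max(marks) - min(marks)   # Step 1: Range = Max - Min
freq = Counter(marks)                   # Steps 2-3: tally and count

# One row per distinct mark, in order of magnitude: (mark, frequency, percent)
rows = [(m, freq[m], freq[m] / n * 100) for m in sorted(freq)]

print("Range:", value_range)
for mark, f, pct in rows:
    print(f"{mark:>4}{f:>4}{pct:>7.1f}")
print("Total frequency:", sum(f for _, f, _ in rows))   # Step 4
```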
3) Grouped frequency Distribution:
A frequency distribution in which several values are grouped into one class.
When the range of the data is large, the data must be grouped into classes that
are more than one unit in width.
Definition of some common terms
Class limits: Separate one class in a grouped frequency distribution from
another. The limits can actually appear in the data, and there are gaps between the upper limit of one
class and the lower limit of the next.
Units of measurement (U): the distance between two possible consecutive
measures. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
Class boundaries: Separate one class in a grouped frequency distribution from
another. The boundaries have one more decimal place than the raw data and therefore do not
appear in the data. There is no gap between the upper boundary of one class and the lower boundary of
the next class.
The lower class boundary is found by subtracting 0.5U from the corresponding lower class
limit and the upper class boundary is found by adding 0.5U to the corresponding upper class limit.
Class width: the difference between the upper and lower class boundaries of any
class. It is also the difference between the lower limits of any two consecutive classes or the
difference between any two consecutive class marks.
Class mark (Mid points): it is the average of the lower and upper class limits or
the average of upper and lower class boundary.
Cumulative frequency: is the number of observations less than/more than or
equal to a specific value.
Cumulative frequency above: it is the total frequency of all values greater than
or equal to the lower class boundary of a given class.
Cumulative frequency below: it is the total frequency of all values less than or
equal to the upper class boundary of a given class.
Cumulative Frequency Distribution (CFD): it is the tabular arrangement of
class intervals together with their corresponding cumulative frequencies. It can be of the more than
or less than type, depending on the type of cumulative frequency used.
Relative frequency (rf): it is the frequency divided by the total frequency. This
gives the proportion of values falling in that class; multiplied by 100, it gives the percent.
Relative cumulative frequency (rcf): it is the cumulative frequency of each
class divided by the total frequency. It gives the proportion of the values which are less than or
equal to the upper class boundary (less than type) or greater than or equal to the lower class
boundary (more than type).
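The relations among class limits, class boundaries, class width and class marks can be illustrated with a short sketch. The three classes used here (6-11, 12-17, 18-23, with U = 1) are assumed purely for illustration.

```python
# Sketch: derive boundaries, widths and class marks from class limits.
# The class limits and the unit of measurement U are assumed values.
U = 1
limits = [(6, 11), (12, 17), (18, 23)]

# Lower boundary = lower limit - 0.5U; upper boundary = upper limit + 0.5U.
boundaries = [(low - 0.5 * U, up + 0.5 * U) for low, up in limits]

# Class width = upper boundary - lower boundary.
widths = [ub - lb for lb, ub in boundaries]

# Class mark = average of the class limits (or of the class boundaries).
marks_from_limits = [(low + up) / 2 for low, up in limits]
marks_from_boundaries = [(lb + ub) / 2 for lb, ub in boundaries]

for lim, bnd, w, m in zip(limits, boundaries, widths, marks_from_limits):
    print(lim, bnd, w, m)
```

Note that the width (6 here) also equals the difference between consecutive lower limits and between consecutive class marks, as stated in the definition.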
Guidelines for classes
1. There should be between 5 and 20 classes.
2. It is preferable for the class width to be an odd number. For whole-number data, this
guarantees that the class midpoints are integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
4. The classes must be all inclusive or exhaustive. This means that all data values must
be included.
5. The classes must be continuous. There are no gaps in a frequency distribution.
Classes that have no values in them must be included (unless it's the first or last classes which are
dropped).
6. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.
Example2.3:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
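Step 3: Select the number of classes desired using Sturges' formula, where n is the number of observations;
$k = 1 + 3.32 \log_{10} n = 1 + 3.32 \log_{10}(20) = 5.32 \approx 6$ (rounding up)
Step 4: Find the class width; $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)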
Step 5: Select the starting point, let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limit;
E.g. the first upper class limit = 12 - U = 12 - 1 = 11
11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries;
E.g. for class 1 Lower class boundary=6-U/2=5.5
Upper class boundary =11+U/2=11.5
• Then keep adding w to both boundaries to obtain the remaining boundaries. By doing so one
can obtain the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency or/and relative cumulative frequency.
The complete frequency distribution follows:
Class     Class          Class   Tally      Freq.   CF            CF            RF     RCF           RCF
limit     boundary       mark                       (less than    (more than           (less than    (more than
                                                    type)         type)                type)         type)
6 – 11    5.5 – 11.5     8.5     //         2       2             20            0.10   0.10          1.00
12 – 17   11.5 – 17.5    14.5    //         2       4             18            0.10   0.20          0.90
18 – 23   17.5 – 23.5    20.5    //// ///   7       11            16            0.35   0.55          0.80
24 – 29   23.5 – 29.5    26.5    ////       4       15            9             0.20   0.75          0.45
30 – 35   29.5 – 35.5    32.5    ///        3       18            5             0.15   0.90          0.25
36 – 41   35.5 – 41.5    38.5    //         2       20            2             0.10   1.00          0.10
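The eleven steps of Example 2.3 can be traced in a short Python sketch. It follows the same choices as the worked solution (Sturges' formula with the constant 3.32, rounding up, starting at the minimum observation); the variable names are illustrative, and the approach of always rounding the width up is taken from the example rather than being a general rule.

```python
import math
from itertools import accumulate

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

U = 1                                        # unit of measurement (whole numbers)
n = len(data)
low, high = min(data), max(data)             # Step 1
R = high - low                               # Step 2: range
k = math.ceil(1 + 3.32 * math.log10(n))      # Step 3: Sturges' formula, rounded up
w = math.ceil(R / k)                         # Step 4: class width, rounded up

# Steps 5-6: lower and upper class limits, starting at the minimum.
limits = [(low + i * w, low + (i + 1) * w - U) for i in range(k)]
# Step 7: class boundaries.
boundaries = [(lo - U / 2, up + U / 2) for lo, up in limits]

# Steps 8-9: frequency of each class.
freq = [sum(lo <= x <= up for x in data) for lo, up in limits]

# Step 10: cumulative frequencies (less than and more than type).
cf_less = list(accumulate(freq))
cf_more = list(accumulate(freq[::-1]))[::-1]

# Step 11: relative and relative cumulative frequencies.
rf = [f / n for f in freq]
rcf_less = [c / n for c in cf_less]
rcf_more = [c / n for c in cf_more]

for row in zip(limits, boundaries, freq, cf_less, cf_more, rf, rcf_less, rcf_more):
    print(row)
```

The more than type column is obtained by accumulating the frequencies from the last class backwards, so its first entry always equals n.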
2.3 Diagrammatic and Graphic presentation of data.
-These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance: -
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
2.3.1 Diagrammatic presentation of data
-Diagrams are appropriate for presenting discrete as well as qualitative data.
-The three most commonly used diagrammatic presentations for discrete as well as qualitative
data are:
I. Pie chart
II. Pictogram
III. Bar chart
I. Pie chart
A Pie Chart is a circular chart divided into sectors, illustrating relative magnitudes or
frequencies of classes of a given variable. Pie chart usually represents categorical data but it is also
possible to use it for discrete quantitative data. The angle of each sector has to be proportional to the
relative frequency of a given class.
$\text{Angle of sector (in degrees)} = \dfrac{\text{value of the part}}{\text{the whole quantity}} \times 360$
Example 2.4: Draw a suitable diagram to represent the following population in a town.
Men Women Girls Boys
2500 2000 4000 1500
Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and write its name with its
corresponding percentage.
Class Frequency Percent Degree
Men 2500 25 90
Women 2000 20 72
Girls 4000 40 144
Boys 1500 15 54
[Figure: Pie chart of the town population: Men 25%, Women 20%, Girls 40%, Boys 15%]
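Steps 1 and 2 of Example 2.4 amount to the following arithmetic, sketched here in Python. The dictionary names are illustrative; drawing the chart itself (Step 3) is left out.

```python
# Population figures from Example 2.4.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}

total = sum(population.values())   # the whole quantity

# Step 1: percentage of each class.
percent = {cls: value / total * 100 for cls, value in population.items()}

# Step 2: angle of each sector = (value of the part / the whole quantity) * 360.
degrees = {cls: value / total * 360 for cls, value in population.items()}

for cls in population:
    print(f"{cls:<6}{population[cls]:>6}{percent[cls]:>7.1f}{degrees[cls]:>7.1f}")
print("Sum of angles:", sum(degrees.values()))
```

A quick sanity check is that the percentages add up to 100 and the angles to 360.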
II. Pictogram
Data can also be presented with the help of pictures. Such a presentation is known as a pictorial
diagram or pictogram. Here the magnitudes of the variable are shown with the help of
pictures that depict the variable approximately. In a pictogram, each symbol in the picture
represents a fixed quantity of the variable.
III. Bar Charts:
- A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common being:
A. Simple bar chart
B. Deviation or two-way bar chart
C. Broken bar chart
D. Component or sub divided bar chart.
E. Multiple bar charts.
A. Simple Bar Chart
-They are used to display data on one variable. That is, one variable is represented by one bar.
-They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height/length of the bar.
Example 2.5: The following data represent the sales by product, 1957-1959, of a given company
for three products A, B, C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Draw a Simple Bar chart for sale by product in year 1957.
[Figure: Simple bar chart, "Sales ($) in 1957"; bars: A = 12, B = 24, C = 24]
D. Component Bar chart
-When there is a desire to show how a total (or aggregate) is divided into its component parts, we use
a component bar chart. Each bar represents the total value of a variable for a given period, with the
total broken into its component parts; different colors or patterns are used to identify the components.
Example 2.6:
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: Component bar chart, "Sales by product in 1957-1959"; x-axis: Years of production (1957, 1958, 1959); y-axis: Sales in $; segments: product A, product B, product C]
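The heights needed to draw a component bar chart can be worked out before any drawing is done. The sketch below uses the sales data of Example 2.5 and stacks the products in the order A, B, C (an assumed stacking order); for each year it computes where each segment starts and ends, and the total height of the bar.

```python
# Sales ($) by product and year, from Example 2.5.
sales = {
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}
years = [1957, 1958, 1959]

# For each year, stack the products: each segment starts where the previous ended.
segments = {}
for year in years:
    bottom = 0
    segments[year] = []
    for product in sales:                  # stacking order A, B, C (assumed)
        top = bottom + sales[product][year]
        segments[year].append((product, bottom, top))
        bottom = top

# The total bar height for a year is the top of its last segment.
totals = {year: segments[year][-1][2] for year in years}

for year in years:
    print(year, segments[year], "total =", totals[year])
```

The same `sales` table, read row by row without stacking, gives the bar heights for a multiple bar chart.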
E. Multiple Bar charts
- These are used to display data on more than one variable.
- They are used for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:
[Figure: Multiple bar chart, "Sales by product in 1957-1959"; x-axis: Years of production (1957, 1958, 1959); y-axis: Sales in $; bars: product A, product B, product C]
2.3.2 Graphical Presentation of data
- Commonly applied graphical representation for continuous data are:
I. The histogram
II. Frequency polygon
III. Cumulative frequency graph or Ogive
I. Histogram
A graph which displays the data by using vertical bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class limits are
sometimes used as the quantity on the X axis. Unlike in a bar graph, in a histogram the categories
(bars) must be adjacent.
Example 2.7: the following table summarizes the Biostatistics mid exam score of 38 students
out of 35 marks.
If we want to draw Histogram for this data it would be like this:
II. Frequency Polygon:
A frequency polygon depicts a frequency distribution for discrete or continuous numeric data. Frequency
polygons are a graphical device for understanding the shapes of distributions. A histogram can easily
be changed to a frequency polygon by joining the midpoints of the tops of the adjacent rectangles of the
histogram with a line. It is also possible to draw a frequency polygon without drawing a histogram.
Example 2.8: - the following Frequency Distribution represents the ages (in years) of 60
patients at a psychiatric counseling center.
Note that two artificial class marks at both ends with frequencies of zero have been added to “tie
down” the graph on the X-Axis.
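The points of a frequency polygon, including the two artificial class marks, can be computed as in the sketch below. Because the table for Example 2.8 is not reproduced here, the sketch uses the class marks and frequencies of Example 2.3 instead; that substitution is an assumption made only for illustration.

```python
# Class marks and frequencies taken from Example 2.3 (assumed stand-in data).
class_marks = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]
freq = [2, 2, 7, 4, 3, 2]

# Class width = difference between two consecutive class marks.
w = class_marks[1] - class_marks[0]

# Add one artificial class mark at each end, with frequency zero,
# so that the polygon is tied down to the X axis.
xs = [class_marks[0] - w] + class_marks + [class_marks[-1] + w]
ys = [0] + freq + [0]

# The polygon is the line joining these points in order.
points = list(zip(xs, ys))
for x, y in points:
    print(x, y)
```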
III. Ogive (cumulative frequency polygon): A graph showing the cumulative frequency (less
than or more than type) plotted against upper or lower class boundaries respectively.
1. Less than Ogive :- is a line graph obtained from less than cumulative frequency plotted against
upper boundaries of their respective class intervals
2. More than Ogive :- is a line graph obtained from more than cumulative frequency plotted
against the lower boundaries of their respective class intervals.
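The two kinds of ogive differ only in which cumulative frequency is paired with which boundary. The following sketch computes the plotting points for both, reusing the class boundaries and frequencies of Example 2.3; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from Example 2.3.
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
freq = [2, 2, 7, 4, 3, 2]

cf_less = list(accumulate(freq))                # less than cumulative frequency
cf_more = list(accumulate(freq[::-1]))[::-1]    # more than cumulative frequency

# Less than ogive: cumulative frequency against UPPER class boundaries.
less_than_ogive = list(zip(boundaries[1:], cf_less))

# More than ogive: cumulative frequency against LOWER class boundaries.
more_than_ogive = list(zip(boundaries[:-1], cf_more))

print("Less than ogive points:", less_than_ogive)
print("More than ogive points:", more_than_ogive)
```

In practice the less than ogive is often started at the lowest boundary with a cumulative frequency of zero, and the more than ogive ended at the highest boundary with zero; those extra points are a common convention rather than part of the definitions above.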