
Unit 3 - Types and Sources of Errors in Demographic Data

The document discusses the various types and sources of errors in demographic data, categorizing them into coverage errors, content errors, sampling errors, and non-sampling errors. It highlights factors contributing to these errors, such as human mistakes, poorly designed data collection instruments, and low literacy levels, which can lead to inaccuracies in data collection and representation. Additionally, it outlines different sampling methods, both probability and non-probability, emphasizing the importance of proper sampling techniques to minimize errors and improve data reliability.

Uploaded by

kabwebwalya64

Types and Sources of Errors in Demographic Data

Introduction: definition of an error and how errors are caused.


 Errors in demographic data refer to inaccurate and/or unrepresentative data.
 Data is unrepresentative if it is not a true representation or reflection of the
population it was collected from.
 No matter how carefully demographic data is collected, it always contains errors.
 There are various factors which lead to errors in demographic data.
 Human errors: mistakes made by the enumerator, wrong responses given by
the respondent (deliberately or due to memory loss) or when processing the
data e.g. entering a wrong figure during data entry, etc.
 Design of data collection instruments/tools: A poorly designed data
collection instrument such as a questionnaire would lead to errors as
questions may not be fully understood, hence wrong answers may be given
by the respondent. In other words, a poorly designed questionnaire may lead
to collection of inaccurate data.
 Low literacy levels: Some errors are more evident in developing countries
because of low literacy levels.
 As such, some respondents may not fully understand what they are
being asked.
 Errors in data may be large or small. The quantity of errors depends on the obstacles
encountered in the enumeration area to accurately record the data, the methods
used in compiling the data, and the relative efficiency with which the methods are
applied.
 Some errors cannot be avoided and will always be part of the enumeration process.
For example, age misreporting due to digit preference or other reasons, omission of
respondents, etc.
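Digit preference can be checked with a simple tally of the final digit of reported ages. The sketch below is illustrative only: the reported_ages list is made-up data and terminal_digit_share is a hypothetical helper name. If ages were reported accurately, roughly 20% of them would be expected to end in 0 or 5, so a much larger share suggests age heaping.

```python
from collections import Counter

def terminal_digit_share(ages, digits=(0, 5)):
    """Return the share of ages whose last digit is one of `digits`."""
    if not ages:
        raise ValueError("ages must not be empty")
    counts = Counter(age % 10 for age in ages)
    preferred = sum(counts[d] for d in digits)
    return preferred / len(ages)

# Made-up reported ages, for illustration only.
reported_ages = [30, 35, 40, 41, 45, 50, 50, 52, 60, 65]
share = terminal_digit_share(reported_ages)
print(f"Share of ages ending in 0 or 5: {share:.0%}")
```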

TYPES OF ERRORS IN DEMOGRAPHIC DATA


There are two main types of errors found in demographic data.
i. Coverage errors
 They are also referred to as errors of omission.
 These arise from the omission of eligible respondents in a defined territory.
 Coverage errors can only arise in situations where a complete canvass (enumeration)
of people in a defined territory is needed.
 Coverage errors can be caused by a number of factors:
 Poor mapping of boundaries: If boundaries have not been clearly identified,
some respondents may be deemed to be outside the enumeration area when
in fact they are within.
 Poor map reading by the enumerator: If an enumerator reads the map
incorrectly, he or she may omit some people or enumerate people outside
the map. Poor map reading is often a result of inadequate training of
enumerators or simply incompetence despite receiving adequate training.
 Non-response: The enumerator may fail to enumerate some people in the
defined territory. This could be due to several reasons:
 Intentionally: Where an enumerator deliberately skips some homes
for whatever reason.
 Failure to make a return visit: When an enumerator does not find
anyone at home, he/she is supposed to go back to the household
some other time to attempt to enumerate the household. He or she is
supposed to make at least one more return visit or as guided during
training. However, some enumerators don’t revisit a household if they
do not find anyone at home when they go there the first time.
Sometimes an enumerator may make several return visits to
enumerate a household but with no success (no one at home).

ii. Content errors


 These are errors that are inherent or contained in the data as a result of collecting
wrong data.
 Content errors can be caused by the following:
 The enumerator: The enumerator may enter a wrong response in the
questionnaire contrary to the answer that the respondent gave. This can be
deliberate due to bias or by mistake. Errors made by the enumerator are
called enumerator errors. Examples:
 By mistake, the enumerator ticks male when the respondent is
female. However, such mistakes can be rectified later during data
cleaning, e.g. by checking against questions that relate to females only.
For example, a male person cannot answer questions on pregnancy.
Therefore, if the answer to the question on sex/gender was male but
the respondent answered pregnancy-related questions, then there
was a mistake in entering the respondent as male. The use of
computer-assisted personal interviewing (CAPI), which can apply such
checks during the interview, has also greatly reduced such errors.
 In some situations, the enumerator may deliberately record a
different answer from the one the respondent gave. For example, an
elderly man may state that he is 30 years old when he looks to be over
50, or a respondent may say they are 45 when they look like
they are below 40, and the enumerator records the age he or she
believes to be correct instead. The enumerator is supposed to enter the
age given by the respondent. If in doubt, they can probe further so that
they are sure that the respondent is giving correct information.
 The respondent: The respondent can sometimes give an incorrect answer
deliberately or due to loss of memory. This is called respondent error. It is
classified as misreporting.
 The respondent can genuinely forget their true age and therefore give
wrong age.
 Some people choose not to disclose certain information, and may
therefore give a wrong answer.
 A poorly designed data collection instrument: Some content errors arise
from the data collection method used such as an unclear questionnaire.
 A questionnaire should be formatted and worded in such a way that it
is easy to understand the questions so that correct responses can be
entered or ticked.
 It should also not be unnecessarily long to avoid interviewee fatigue
which may result in the interviewee giving wrong responses.
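The consistency check mentioned under enumerator errors (sex against pregnancy-related questions) can be sketched as a simple data cleaning rule. The record layout and the field names sex and currently_pregnant are assumptions made for illustration; they are not taken from any particular questionnaire.

```python
def flag_sex_pregnancy_conflicts(records):
    """Return the ids of records coded male that also report a pregnancy."""
    flagged = []
    for rec in records:
        if rec["sex"] == "male" and rec.get("currently_pregnant") == "yes":
            flagged.append(rec["id"])
    return flagged

# Hypothetical records, for illustration only.
records = [
    {"id": 1, "sex": "female", "currently_pregnant": "yes"},
    {"id": 2, "sex": "male", "currently_pregnant": "yes"},  # inconsistent
    {"id": 3, "sex": "male", "currently_pregnant": None},
]
conflicts = flag_sex_pregnancy_conflicts(records)
print(conflicts)
```

A flagged record only shows that the two answers disagree. Which of the two was entered wrongly still has to be decided by looking at the rest of the questionnaire.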
OTHER TYPES OF DEMOGRAPHIC ERRORS
i. Sampling Errors
 These are errors which arise due to the method of sampling used to select
respondents and also as a result of not enumerating everyone in the population.
 Sampling is the process of selecting units (e.g., individuals, groups) from the
population of interest to study these units in detail with the aim of drawing
conclusions about the larger population. This is called generalization.
Generalizability of research or survey findings depends on the sampling
method used.
 This is because some sampling methods, such as non-probability
sampling methods, are prone to selection bias. If a sample is biased, the
findings of the survey cannot be generalized to the whole population
from which the sample was drawn.
 People have unique characteristics. Therefore, it is not possible that the
responses given by those selected in the survey can 100% represent the rest
of the population.
 Since not everyone in the population is enumerated when a survey is conducted, it
means that sampling errors cannot be eliminated but only reduced by:
 Increasing the sample size (i.e. number of people to participate in the
survey).
 Using probability sampling methods so that every person in the population
has a known, non-zero chance of being selected to participate in the survey.
 The only way to eliminate sampling error is by enumerating everyone in the
population, which in most cases is not possible and actually defeats the purpose of
using a sample. A sample is used in order to reduce the cost associated with complete
enumeration (census).
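The effect of sample size on sampling error can be illustrated with the standard error of a sample proportion. This is a minimal sketch under stated assumptions: simple random sampling without replacement, an assumed true proportion of 0.5, and an illustrative population of 1,536. It uses the usual formula with the finite population correction, SE = sqrt(p(1 - p)/n * (N - n)/(N - 1)).

```python
import math

def standard_error_of_proportion(p, n, N):
    """Standard error of a sample proportion under simple random
    sampling without replacement (finite population correction applied)."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    fpc = (N - n) / (N - 1)
    return math.sqrt(p * (1 - p) / n * fpc)

N = 1536  # illustrative population size
for n in (100, 300, 768, 1536):
    se = standard_error_of_proportion(0.5, n, N)
    print(f"n = {n:4d}  standard error = {se:.4f}")
```

The standard error shrinks as n grows and reaches zero only when n equals N, that is, when everyone is enumerated.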
ii. Non-sampling errors
 These are errors that arise from factors other than the sampling method used.
 They can occur at the planning stage, the fieldwork stage, and during tabulation (entry)
and processing of data.
 The main sources of non-sampling errors include:
 coverage errors,
 content errors,
 non-response errors,
 processing errors,
 interview errors,
 faulty definition of terms in the data collection instrument,
 defective methods of data collection. Data collection methods such as
interview, observation, focus group discussion, etc. need to be carefully
selected depending on the type of data needed.
 Therefore, among others, the following (falling under the above sources) can lead to
non-sampling errors:
 Imprecise definition of the boundaries (poor mapping)
 Inappropriate or inaccurate data collection methods
 Ambiguous questionnaire, definitions and instructions
 Poor design of the data collection instrument (e.g. questionnaire), for
example, unclear or biased questions.
 Poorly trained, inexperienced or biased enumerators.
 Non-response
 Giving false information due to poor memory (recall errors) or deliberately.
 Erroneous coding (coding errors) and data entry (data entry errors).

SOURCES OF SAMPLING ERRORS IN DEMOGRAPHIC DATA


 As stated above, sampling errors arise from the type of sampling method used to
select the sample.
 There are two main methods of selecting the sample:
 Probability sampling methods, and
 Non-probability sampling methods.

Probability sampling methods


 These methods give every member of the population a known, non-zero chance of being
included or picked in the sample (an equal chance in the case of simple random sampling).
 Probability sampling methods improve generalization of results of the sample
survey.
 There are several types of probability sampling methods. These are discussed below.

i. Simple random sampling

 This is a sampling method which gives each unit in the target population an equal
chance of being selected in the sample by randomly selecting the desired number of
respondents from the target population.
 This approach is fair and reduces selection bias, which undermines the accuracy of
the predictions being made about the target population. Ideally, a sample should be
representative of the entire target population.
 In order to select a random sample, a sampling frame is required. A sampling frame
is a list of all the units or people in the target population.
How selection is done using simple random sampling
 Each unit is assigned a unique identification number and then using a random
number table or generator, the required number of units (people) is randomly
selected.
 Example:
 An area has a total population size of 1,536. A study on “assessing
people’s attitude towards homosexuality” needs 300 people to participate.
 To select the 300 people, assign everyone in the population a four-digit
number: 0001, 0002, 0003, 0004 and so on up to the last person,
1536.
 Then, starting at any point in the random number table, manually pick
four-digit numbers successively, skipping repeats and any number outside
the range 0001 to 1536, until the 300th person is picked.
 Alternatively, use software such as Microsoft Excel to randomly select 300
people from the population of 1,536.
How to select a random sample of 300 respondents from a total population of 1,536 using
Microsoft Excel
Step 1: Click on cell A1, type =RANDBETWEEN(1,1536) and press Enter.
Step 2: To generate a list of random numbers, select cell A1, click on the fill handle at the
lower right corner of the cell and drag it down to cell A300. Because RANDBETWEEN can
return the same number more than once, replace any duplicates until there are 300
different numbers.
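The same selection can be sketched in Python. This is one possible approach rather than part of the Excel procedure: random.sample draws without replacement, so no identification number can appear twice. The function name simple_random_sample and the seed argument (used only to make the draw repeatable) are assumptions for illustration.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)
    ids = range(1, population_size + 1)
    return sorted(rng.sample(ids, sample_size))

sample = simple_random_sample(1536, 300, seed=42)
print(len(sample), sample[:5])
```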

ii. Systematic random sampling


 Systematic random sampling is a technique where people are selected in the sample
successively at a determined regular interval.
 The total population in the sampling frame is divided by the sample size to
determine the selection interval.
 Then a starting number is randomly selected using simple random sampling. The
starting number is normally selected near the beginning of the sampling frame list,
within the first interval (for example, between 1 and 5 when the interval is 5).
 Once the first number is picked, pick the next number depending on the interval.
How to select a sample using systematic random sampling
Step 1: Estimate the number of units in the population (for example, population size of
1,536).

Step 2: Determine the sample size (for example, 300 people).
Step 3: Divide step 1 by step 2 (k = N/n) to get the skip/interval number. Example: k =
1,536/300 = 5.12, or 5 when rounded off to a whole number.
Step 4: Randomly select the starting point in the sampling frame.
Step 5: Select every 5th number after the randomly selected number until the 300th person is
selected.
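The five steps can be sketched as follows. This is a minimal illustration that assumes the interval is rounded down to a whole number and that the random start falls within the first interval. The name systematic_sample and the seed argument are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th ID number after a random start, where k = N // n."""
    k = population_size // sample_size  # interval, rounded down
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start within the first interval
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(1536, 300, seed=1)
print(sample[:5], sample[-1])
```

With k = 5 the last number selected is at most 1,500, so under this rounding the final 36 people on the list can never be picked.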

iii. Stratified random sampling


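 When the sample from each stratum is drawn in proportion to the stratum’s size, it is
also referred to as proportional (proportionate) stratified sampling. It should not be
confused with quota sampling, which is a non-random method.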
 The technique divides the sampling frame into two or more subpopulations called
strata depending on the desired population characteristics such as age, sex, tribe,
etc.
 Thus, stratified random sampling is useful when comparing several groups of people
based on their different population characteristics.
 Then, a simple random sample from each stratum is taken proportional to size.
 For example, if a population consists of children, youth, young adults, adults and the
elderly, how many of each should be included in the sample for a study?
If the total population and the number of people in each
subpopulation group are known, then the sample can be drawn proportional to the size of
each stratum (population group).
How to select a sample using stratified random sampling
Step 1: Divide the population into the strata of interest. For example, from a total
population of 1,536 people, there are 142 children, 157 youth, 413 young adults, 701 adults
and 123 elderly people.
Step 2: Select a simple random sample from each stratum according to the stratum’s
proportional size to the total.
 Children: 142/1,536 = 0.0924, and 0.0924 * 300 = 27.7 (taking the sample size of 300
used in the earlier examples).
 This means that about 28 children are to be selected using simple random sampling from
the population of 142 children.
 The same is done for the rest of the population subgroups.
 Note: The number of units (people) selected from each stratum should be proportional
to the stratum’s share of the total population.
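Proportional allocation gives each stratum (stratum size / total population) * sample size units. The sketch below applies this to the five strata listed in Step 1, assuming a total sample size of 300 as in the earlier examples. Allocating the units lost to rounding by largest remainder is a choice made here so that the stratum samples add up to exactly 300; the notes themselves do not prescribe a rounding rule.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split `sample_size` across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    # Whole-number share for each stratum, rounded down.
    allocation = {name: size * sample_size // total
                  for name, size in strata_sizes.items()}
    # Hand out the units lost to rounding, largest remainder first.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(
        strata_sizes,
        key=lambda name: strata_sizes[name] * sample_size % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

strata = {"children": 142, "youth": 157, "young adults": 413,
          "adults": 701, "elderly": 123}
allocation = proportional_allocation(strata, 300)
print(allocation)
```

Each stratum sample is then drawn by simple random sampling from that stratum alone.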
iv. Cluster random sampling
 Simple, systematic and stratified random sampling techniques all require a sampling
frame, which sometimes may not be available.
 When a sampling frame is not available or the units on the list are so widely
dispersed that it would be too time-consuming and expensive to conduct a simple
random sample, cluster sampling is a useful alternative.
 Cluster random sampling divides the population into several clusters or areas such as
wards or provinces.

 Some clusters are then randomly selected using simple random sampling based on
the desired number of clusters to enumerate.
 Everyone in the selected clusters then gets to be enumerated. The units in the
selected clusters constitute the sample.
Disadvantage of cluster random sampling
 The selected clusters may differ in important characteristics from the ones not included
in the sample.
 This could bias the results, i.e. it may be difficult to
generalize the results to the whole population.
How to select a sample using cluster random sampling
A Mulungushi student wants to conduct research on sexuality in Kabwe District. She
wants to use cluster random sampling to select her sample. Kabwe District is made up of 10
Wards, each with a different population size. She wants to select 4 Wards to participate in
the survey.
Step 1: Identify the population of interest: Her population of interest is Kabwe District with
1,536 people.
Step 2: Divide the population into a large number of clusters. Kabwe District has 10 Wards.
Step 3: Select the clusters to be included in the sample using simple random sampling. She
needs to select 4 Wards from the 10 Wards.
Note: Her sample size will be determined by the total population of the 4 Wards she
randomly selected.
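The three steps can be sketched as below. The ward names and ward populations are hypothetical figures invented for illustration (chosen only so that they add up to 1,536); the example does not give the actual population of each Ward.

```python
import random

# Hypothetical ward populations summing to 1,536 (not given in the example).
ward_populations = {
    "Ward 1": 210, "Ward 2": 95, "Ward 3": 180, "Ward 4": 140, "Ward 5": 160,
    "Ward 6": 120, "Ward 7": 201, "Ward 8": 130, "Ward 9": 175, "Ward 10": 125,
}

def select_clusters(clusters, number_to_select, seed=None):
    """Pick clusters by simple random sampling, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(clusters), number_to_select)

selected = select_clusters(ward_populations, 4, seed=7)
# Everyone in the selected wards is enumerated, so the sample size is
# the combined population of those wards.
sample_size = sum(ward_populations[ward] for ward in selected)
print(selected, sample_size)
```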
v. Multistage random sampling
 Multistage random sampling is a technique that combines two or more random
sampling methods sequentially.
 The process usually begins by taking a cluster random sample, followed by a simple
random sample or a stratified random sample.
 Multistage random sampling can also combine random and non-random sampling
techniques.
 Example: Instead of enumerating everyone in the 4 Wards selected by the student,
she uses simple random sampling to select respondents from each Ward in
proportion to its size. In this case, she needs to know the population size of each of the 4
Wards, then use proportion to size to pick her sample.
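A two-stage version of this example might look like the sketch below. The ward populations, the target of 100 respondents, and the numbering of residents from 1 upwards within each Ward are all assumptions made for illustration. Because each Ward's share is rounded separately, the total can differ from the target by a unit or two.

```python
import random

# Hypothetical ward populations (not given in the example).
ward_populations = {
    "Ward 1": 210, "Ward 2": 95, "Ward 3": 180, "Ward 4": 140, "Ward 5": 160,
    "Ward 6": 120, "Ward 7": 201, "Ward 8": 130, "Ward 9": 175, "Ward 10": 125,
}

def multistage_sample(populations, wards_to_select, target_sample, seed=None):
    """Stage 1: random wards. Stage 2: random residents within each ward."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of wards (clusters).
    wards = rng.sample(sorted(populations), wards_to_select)
    selected_total = sum(populations[ward] for ward in wards)
    if target_sample > selected_total:
        raise ValueError("target sample exceeds population of selected wards")
    sample = {}
    for ward in wards:
        size = populations[ward]
        # Stage 2: the ward's share of the target, proportional to its size.
        n_ward = round(target_sample * size / selected_total)
        # Residents are assumed to be numbered 1..size within each ward.
        sample[ward] = sorted(rng.sample(range(1, size + 1), n_ward))
    return sample

sample = multistage_sample(ward_populations, 4, 100, seed=3)
for ward, residents in sample.items():
    print(ward, len(residents))
```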

Non-random sampling
i. Purposive/purposeful sampling
 Purposeful sampling is when a sample is selected using a predetermined criterion
that will facilitate the collection of required data to answer the objectives of
research or purpose of conducting the research.
 Unlike random sampling, this sampling technique is mainly used with a limited
number of persons who have the required information.

Advantages of purposive sampling
 Suitable when resources and time to collect data are limited
 In emergency settings such as conflict-affected societies, this approach may also be
more appropriate, as taking a random sample may face the risk of aggravating
tensions.
Disadvantage of purposive sampling
 Unlike random sampling methods, purposeful sampling is deliberately biased in
order to select the most appropriate cases to answer the questions posed.
 Thus, if this sampling technique is used, it is necessary to be transparent and
rigorous when selecting a sample to control for and identify any potential
bias in the data that will be collected.

ii. Snowball sampling


 It is a form of purposive sampling that is used to collect information from
populations that are hard to locate or difficult to access.
 The process of collecting information starts with identifying a suitable candidate to
interview and then after the interview asking the respondent to identify other
potential respondents to talk to.
 This creates a chain of respondents who all recommend the next person the
interviewer can collect data from.
 This process is continued until the researcher collects enough data to a point of
saturation.
 Saturation is a point in the data collection process where no new or relevant
information emerges that addresses the research questions.
iii. Convenience sampling
 This is a sampling technique where respondents are selected to be part of the
sample based on the availability or self-selection of participants (volunteers) or the
researcher’s convenience.
Advantage of convenience sampling
 It is inexpensive, simple and convenient because the interviewer collects information
from people willing to be interviewed or those found wherever the interviewer is.
Disadvantage of convenience sampling
 This sampling technique is the least reliable of the non-random sampling
approaches presented above. Interviewing whoever is convenient means that the most
readily available people will be over-represented.
