Unit 3 - Types and Sources of Errors in Demographic Data
the map. Poor map reading is often a result of inadequate training of
enumerators or simply incompetence despite adequate training.
Non-response: The enumerator may fail to enumerate some people in the
defined territory. This could be due to several reasons:
They can occur at the planning stage, the fieldwork stage, and during tabulation
(entry) and processing of data.
The main sources of non-sampling errors include:
coverage errors,
content errors,
non-response errors,
processing errors,
interview errors,
faulty definition of terms in the data collection instrument,
defective methods of data collection. Data collection methods such as
interview, observation, focus group discussion, etc. need to be carefully
selected depending on the type of data needed.
Therefore, among others, the following (falling under the above sources) can lead to
non-sampling errors:
Imprecise definition of the boundaries (poor mapping)
Inappropriate or inaccurate data collection methods
Ambiguous questionnaire, definitions and instructions
Poor design of the data collection instrument (e.g. questionnaire), for
example, unclear or biased questions.
Poorly trained, inexperienced or biased enumerators.
Non-response
Respondents giving false information, either deliberately or due to poor memory (recall errors).
Erroneous coding (coding errors) and data entry (data entry errors).
This is a sampling method which gives each unit in the target population an equal
chance of being selected in the sample by randomly selecting the desired number of
respondents from the target population.
This approach is fair and reduces selection bias, which undermines the accuracy of
the predictions being made about the target population. Ideally, a sample should be
representative of the entire target population.
In order to select a random sample, a sampling frame is required. A sampling frame
is a complete list of all the units or people in the target population.
How selection is done using simple random sampling
Each unit is assigned a unique identification number and then using a random
number table or generator, the required number of units (people) is randomly
selected.
Example:
An area has a total population size of 1,536. A study on “assessing
people’s attitude towards homosexuality” needs 300 people to participate.
To select the 300 people, assign everyone in the population a four-digit
number beginning with 0001, 0002, 0003, 0004 and so on until the last
person, 1536.
Then starting at any point in the random number table, manually pick the
numbers successively until the 300th person is picked.
Alternatively, use software such as Microsoft Excel to randomly select 300
people from the population of 1,536.
How to select a random sample of 300 respondents from a total population of 1,536 using
Microsoft Excel
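Step 1: Click on cell A1, type =RANDBETWEEN(1,1536) and press Enter.
Step 2: To generate, for example, a list of 300 random numbers, select cell A1, click on the
lower right corner of cell A1 and drag it down to cell A300.
Note: RANDBETWEEN can return the same number more than once, so check the list for
duplicates and replace any repeated numbers.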
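The same selection can also be sketched in Python. This is only an illustration, not part of the original procedure: the function name simple_random_sample and the seed value are assumptions chosen for the example. It draws 300 distinct identification numbers from 1 to 1,536, so that no person is picked twice.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    # random.sample selects without replacement, so no ID appears twice
    ids = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(ids)


# Population of 1,536 and sample of 300, as in the example above
selected = simple_random_sample(1536, 300, seed=42)

# Show the first few selected IDs as four-digit numbers (0001 ... 1536)
print(len(selected))
print([f"{i:04d}" for i in selected[:5]])
```

Because every identification number has the same chance of being drawn, each unit in the sampling frame has an equal chance of selection.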
Step 2: Determine the sample size (for example, 300 people).
Step 3: Divide the population size (N) from Step 1 by the sample size (n) from Step 2 to get
the skip/interval number (k = N/n). Example: k = 1,536/300 = 5.12, or 5 when rounded off to a
whole number.
Step 4: Randomly select the starting point in the sampling frame.
Step 5: Select every 5th number after the randomly selected number until the 300th person is
selected.
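The steps above can be sketched in Python as follows. This is a minimal illustration under two assumptions that go beyond the steps: the interval is rounded down to a whole number (which gives 5 here, the same as rounding off 5.12), and the random starting point is taken from the first interval (1 to k) so that the selection never runs past the end of the sampling frame. The function name systematic_sample is hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th ID number after a random start, where k = N // n."""
    rng = random.Random(seed)
    k = population_size // sample_size  # skip interval, rounded down (assumption)
    start = rng.randint(1, k)           # random start within the first interval (assumption)
    # Take the start, then every k-th ID after it, until sample_size IDs are chosen
    return [start + i * k for i in range(sample_size)]


# N = 1,536 and n = 300, so k = 5
sample = systematic_sample(1536, 300, seed=7)
print(sample[:5])
print(len(sample))
```

With k = 5 the last selected number is at most 5 + 299 × 5 = 1,500, so all 300 selections fall inside the frame of 1,536.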
Some clusters are then randomly selected using simple random sampling based on
the desired number of clusters to enumerate.
Everyone in the selected clusters then gets to be enumerated. The units in the
selected clusters constitute the sample.
Disadvantage of cluster random sampling
Clusters may differ in important characteristics from the ones not included in the
sample.
This could bias the results, i.e. it may be difficult to generalize the results
to the whole population.
How to select a sample using cluster random sampling
A Mulungushi student wants to conduct research on sexuality in Kabwe District. She
wants to use cluster random sampling to select her sample. Kabwe District is made up of 10
Wards, each with a different population size. She wants to select 4 Wards to participate in
the survey.
Step 1: Identify the population of interest: Her population of interest is Kabwe District with
1,536 people.
Step 2: Divide the population into a large number of clusters. Kabwe District has 10 Wards.
Step 3: Select the clusters to be included in the sample using simple random sampling. She
needs to select 4 Wards from the 10 Wards.
Note: Her sample size will be determined by the total population of the 4 Wards she
randomly selected.
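A minimal Python sketch of Step 3 is given below. The Ward names and Ward population sizes are hypothetical figures invented for illustration (they are chosen only so that they add up to 1,536); the function name select_clusters is also an assumption.

```python
import random

# Hypothetical Ward populations (illustrative only); together they sum to 1,536
ward_populations = {
    "Ward 1": 210, "Ward 2": 134, "Ward 3": 98, "Ward 4": 176, "Ward 5": 152,
    "Ward 6": 187, "Ward 7": 121, "Ward 8": 165, "Ward 9": 143, "Ward 10": 150,
}


def select_clusters(clusters, n_clusters, seed=None):
    """Pick n_clusters cluster names by simple random sampling (no repeats)."""
    rng = random.Random(seed)
    return rng.sample(list(clusters), n_clusters)


# Select 4 of the 10 Wards
chosen_wards = select_clusters(ward_populations, 4, seed=1)

# Everyone in the chosen Wards is enumerated, so the sample size is
# the combined population of those Wards
sample_size = sum(ward_populations[w] for w in chosen_wards)
print(chosen_wards, sample_size)
```

The sample size is not fixed in advance: it depends on which Wards happen to be drawn.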
v. Multistage random sampling
Multistage random sampling is a technique that combines two or more random
sampling methods sequentially.
The process usually begins by taking a cluster random sample, followed by a simple
random sample or a stratified random sample.
Multistage random sampling can also combine random and non-random sampling
techniques.
Example: Instead of enumerating everyone in the 4 Wards selected by the student,
she uses simple random sampling to select respondents from each Ward in
proportion to its size. In this case, she needs to know the population size of each
of the 4 Wards, then allocate her sample to the Wards in proportion to their sizes.
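The second stage of this example can be sketched in Python. Everything numeric here is an assumption made for illustration: the 4 Ward populations and the overall sample size of 200 are hypothetical, and giving the leftover places to the Wards with the largest fractional shares is just one reasonable way to make the rounded allocations add up.

```python
import random


def proportional_allocation(cluster_sizes, total_sample):
    """Split total_sample across clusters in proportion to their sizes."""
    total = sum(cluster_sizes.values())
    exact = {c: total_sample * n / total for c, n in cluster_sizes.items()}
    alloc = {c: int(share) for c, share in exact.items()}  # round each share down
    leftover = total_sample - sum(alloc.values())
    # Hand the remaining places to the clusters with the largest fractional parts
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


def second_stage_sample(cluster_sizes, total_sample, seed=None):
    """Within each cluster, draw its allocated number of IDs at random."""
    rng = random.Random(seed)
    alloc = proportional_allocation(cluster_sizes, total_sample)
    return {
        c: sorted(rng.sample(range(1, cluster_sizes[c] + 1), alloc[c]))
        for c in cluster_sizes
    }


# Hypothetical populations of the 4 Wards chosen at the first stage
selected_wards = {"Ward 1": 210, "Ward 4": 176, "Ward 6": 187, "Ward 8": 165}

allocation = proportional_allocation(selected_wards, 200)  # 200 is an assumed sample size
respondents = second_stage_sample(selected_wards, 200, seed=3)
print(allocation)
```

Larger Wards receive more of the sample, so each person in the 4 selected Wards has roughly the same chance of being chosen at the second stage.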
Non-random sampling
i. Purposive/purposeful sampling
Purposeful sampling is when a sample is selected using predetermined criteria
that will facilitate the collection of the data required to meet the objectives
or purpose of the research.
Unlike random sampling, this sampling technique is mainly used with a limited
number of persons who have the required information.
Advantages of purposive sampling
Suitable when resources and time to collect data are limited
In emergency settings such as conflict-affected societies, this approach may also be
more appropriate, as taking a random sample may face the risk of aggravating
tensions.
Disadvantage of purposive sampling
Unlike random sampling methods, purposeful sampling is deliberately biased in
order to select the most appropriate cases to answer the questions posed.
Thus, if this sampling technique is used, it is necessary to be transparent and
rigorous when selecting a sample to control for and identify any potential
bias in the data that will be collected.