National Guidelines for Data Quality
ICMR-National Institute of Medical Statistics (ICMR-NIMS)
July 2021
References..................................................................................................79-82
Selected Definitions.....................................................................................83-86
List of Contributors .......................................................................................... 87
Acknowledgements
I would like to express my profound sense of reverence and gratitude to Prof. (Dr.)
Balram Bhargava, Secretary, Department of Health Research and Director General,
Indian Council of Medical Research, New Delhi, as Steering Committee Chair
for his guidance, supervision and leadership in the formulation and finalisation of
the guidelines.
The guidance and expertise extended by other members of the Steering Committee-
Dr. Pronab Sen - Director, IGC India, New Delhi and Ex-Secretary, MoSPI and Chief
Statistician of India; Dr. Nivedita Gupta - Chief Director (Stats) MoHFW, New Delhi;
Shri. Shankar Lal Menaria - Addl. Director General, National Statistics Office, MoSPI,
New Delhi; Dr. P. Ashok Babu - Director, POSHAN Abhiyaan, Anganwadi Services
(ICDS), MWCD, New Delhi; Dr. Shekhar Shah - Director, NCAER, New Delhi; Dr. T. K.
Roy - Ex-Director, International Institute for Population Sciences (IIPS), Mumbai; Shri.
Pankaj Shreyaskar - Director, Survey Coordination Division, National Statistics Office,
MoSPI, New Delhi; Dr. K. S. James - Director and Senior Professor, International
Institute for Population Sciences, Mumbai; Dr. Suneeta Krishnan - Country Lead
(MLE), Bill & Melinda Gates Foundation, New Delhi; Shri. Sanjeev Kumar - Deputy
Director General, Office of the Registrar General of India, New Delhi; and Mr. D. K.
Ojha - DDG (Stats, HMIS), MoHFW, New Delhi, at all stages of the preparation of the
guidelines is gratefully acknowledged.
The publication of this document would not have been achieved without the
commitment and contributions of a team of scientists from ICMR-National Institute
of Medical Statistics (NIMS), New Delhi, researchers of the Population Council,
New Delhi; and NDQF project staff. I express my gratitude for their persistent and
unstinted support in bringing out this document in a short time frame. I would also
like to thank Mr. Vaibhav Malhotra, Project Officer, NDQF at the ICMR-NIMS, and
Ms. Radhika Dhingra – Assistant Program Officer and Ms. Ramandeep Kaur –
Senior Executive Assistant at the Population Council, for their invaluable support at
different stages of preparation and production of this document.
Last, but not the least, I am extremely grateful to the Bill & Melinda Gates
Foundation for its technical and financial support through the Population Council.
I am especially thankful to the Population Council for its facilitation right from
the initiation till the completion of the document.
Director
ICMR-NIMS, New Delhi
About National Data Quality Forum
The National Data Quality Forum (NDQF) envisions providing a framework for advanced data quality monitoring,
process audits and analytics and building the capacity of data collectors for
improving the quality of surveys and administrative data. The goals of NDQF are:
Aim
To generate quality survey data by mitigating errors and biases that may creep in
during survey design, data collection and analysis.
Objectives
Targeted Stakeholders
This document provides insight into the crucial steps that need to be followed right from
the beginning to ensure data quality. The document is divided into three parts based on
the three broad phases of a survey:
• Preparatory phase
• Data collection phase
• Post data collection phase
It lists out the points that need to be borne in mind during the preparatory phase,
including the study design, sampling, survey tools and manuals. It guides readers
on key quality considerations while designing and developing a data entry package,
quality assurance protocols that ensure quality of survey, anthropometry and
biomarker data, recruitment and training of survey investigators, and assessment of
trained health/research investigators.
The guideline document describes the quality assurance activities in the data
collection phase, explains how to monitor the quality of data collection using multiple
tools, and suggests the use of data entry parameters and a quality dashboard during a field survey.
The document also guides on post data collection quality checks, reviewing the
data and using appropriate data quality analytics. It outlines different techniques
that can be employed for assessing quality of data, and estimation of sampling and
non-sampling errors. The document also eases the application of different machine
learning techniques in assessing survey quality and provides technology tips and
checklists for quick guidance, wherever needed.
These guidelines are useful for a wide range of audiences viz., government and
private data producers and users, national and regional level policy makers, and
technical staff at the ministries and organisations, besides academic, research
institutions and survey agencies. They are immensely helpful in guiding the planning,
designing and execution of sample surveys and in achieving high-quality data.
1.1 General Principles of Data
Quality Assurance in Surveys
Building a quality assurance mechanism is key to data quality in surveys. The efforts
towards this start with survey planning. The basic quality requirements are:
Every survey conducted in the fields of demography, health, and nutrition (or
likewise) must prepare a quality assurance plan during the planning process. The
quality assurance plan should list detailed steps and activities. To build such a
plan, the survey implementing agencies or the survey coordinating organisations
may consider:
• Describing the structure of the team for quality assurance during
survey implementation
• Creating tools that help measure data quality on an ongoing basis
• Developing both a quality management plan and quality assurance
procedures/activities for the field
• Perform analytics on paradata to monitor the progress in data quality.
• Perform analytics on key study indicators to examine investigators’ bias or indicative patterns.
• Present dashboards on data consistency.
• Report data quality measures, including but not limited to sampling error, non-sampling error and investigator bias.
1.2 Data Quality Framework
for Surveys
The following diagram provides the specific set of principles at each stage of
the survey.
Quality dimensions related to survey data and processes, associated attributes,
metrics/indicators, and sections where these issues are dealt with are provided in
the table below.
SURVEY PROCESS

Methodological Soundness
• Attributes: Appropriate study design, ethical procedures, sampling approaches, sample selection and study tools, interview approach, survey monitoring, and data recording
• Metrics/Indicators: Engagement of the right experts for various components of the survey; adoption of internationally accepted standards for data collection and monitoring
• Sections where these issues are dealt with: Types of survey error (1.3); quality criteria for review of survey protocols/bids (1.6); study design (2.1); sampling design (2.2); survey tools (2.3); quality assurance of anthropometric and biological data (2.7, 2.8); calculation of sampling weight, sampling error (4.2) and non-sampling errors/bias (4.3)

SURVEY OUTPUT

Relevance
• Attributes: Satisfy the data needs of the users
• Metrics/Indicators: Tools used for the study cover the study objectives; indicators are standardised
• Sections where these issues are dealt with: Type of survey error (1.3); study design (2.1); sampling design (2.2); survey tools (2.3)

Accuracy
• Attributes: Closeness of estimate to reality
• Metrics/Indicators: Assess coverage error, measurement error, non-response error, sampling error (CV, Variance, SE) of key estimates
• Sections where these issues are dealt with: Type of survey error (1.3); study design (2.1); sampling design (2.2); survey tools (2.3); data profiling (4.1); calculation of sampling weight, sampling error (4.2) and non-sampling errors/bias (4.3)

Reliability
• Attributes: Closeness of the initial estimate to the final one
• Metrics/Indicators: Test-retest reliability, alternate-form reliability and internal consistency reliability of key estimates (Cronbach’s Alpha, Kappa statistics)
• Sections where these issues are dealt with: Tools to monitor data quality (3.6); use of paradata (3.7, 3.8); data quality dashboard (3.9)

Accessibility/Clarity
• Attributes: Data accessible to the public with relevant metadata and data use guidelines
• Metrics/Indicators: Open access data; clearly defined metadata; relevant documentation of data quality and data profiling
• Sections where these issues are dealt with: Ethics (1.7); data profiling (4.1); documentation on data quality (3.11)

Coherence
• Attributes: Consistency of different indicators within the same dataset; option of integration at various levels (for example, district, state, national, gender, age)
• Metrics/Indicators: Tools contain questions to assess internal consistency; key geographic characteristics are provided in the unit level data
• Sections where these issues are dealt with: Survey tools (2.3); designing data entry application (2.4); data profiling (4.1)

Comparability
• Attributes: Allow comparability across geographies, time, domains
• Metrics/Indicators: Standard design used in the survey and standardised indicators are available for comparison
• Sections where these issues are dealt with: Study design (2.1); survey tools (2.3)

Completeness
• Attributes: All data items recorded
• Metrics/Indicators: Percent of missing values on different indicators
• Sections where these issues are dealt with: Data profiling (4.1)

Note: The quality dimensions, quality attributes and metrics/indicators presented in this table were sourced from multiple publications [2-9]
1.3 Sources of Errors
and Biases in Surveys
Errors are an inherent part of survey estimates. In a sample survey, errors occur
because not all members of the sampling frame are observed; rather, estimates
are drawn from a small, representative segment of the target population [10-12].
Estimates from the sample population are used to infer the characteristics
of the target population, assuming that they are generalisable. The difference
between the estimates drawn from the sample and the corresponding population
parameter is termed survey error. There are other types of errors that stem from
inappropriate implementation of the sampling design or from problematic data collection,
entry or processing. Further, some biases can occur, making a population estimate
unusable. Two types of survey errors are observed; some are measurable and
some are not:
Sampling Error: The deviation between a sample estimate and the population
parameter under study, caused by sample selection, is generally referred to as
the sampling error. These errors occur in the preparatory phase of a survey while
conceptualising the sampling strategy. While sampling errors are inevitable in a
survey, they can be reduced by either increasing the size of the sample
or by using stratification. The higher the sample size, the closer the sample estimate
will be to the population value. If a population is heterogeneous, stratification can make the
sample more representative of the population and thereby reduce error.
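To illustrate the point, the minimal Python sketch below shows how the standard error of an estimated proportion shrinks as the sample size grows, assuming simple random sampling; the prevalence value used is hypothetical and is not taken from any survey cited in these guidelines.

```python
# Illustrative only: the sampling error (standard error) of a proportion under
# simple random sampling falls as the sample size increases.
import math

p = 0.40  # hypothetical prevalence of the indicator of interest
for n in (200, 800, 3200, 12800):
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    print(f"n = {n:>6}: SE = {se:.4f}, 95% CI half-width = {1.96 * se:.4f}")
```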
Biases in Survey: While errors in survey occur randomly, deviation of sample
mean from the population mean can also be caused by different kinds of biases
that occur systematically during the survey. For example, response bias is caused
by a deliberate attempt to alter the true response, whereas coverage bias is caused by
intentionally leaving out certain communities during the survey.
1.4 Data Quality Assurance –
Management Plan and Teams
One of the ways data quality can be improved is through setting up an independent
Data Quality Management Structure (DQMS) at the institution level. DQMS can
be implemented at various levels depending on the overall objective of the survey
and its implementation strategy. At the institution level, DQMS can provide overall
guidance and strategic inputs to ensure the quality of data. A core team of experts
can provide the required guidance to each component related to the survey data
quality and monitor adherence to good practices. At the project level, depending on
whether data collection is outsourced to a survey agency or is implemented by the
institute/organisation/agency by itself, an appropriate data quality assurance team
consisting of senior management and field level quality monitors can be constituted.
1.5 Procedures for Monitoring the
Management Plan Implementation
The monitoring process oversees all the tasks and metrics necessary to ensure
that the protocols are implemented as planned, with specific reference to the data
quality assurance team’s scope, time and budget so that project risks
are minimised.
Key dimensions to monitor in the quality assurance management plan include, but are not
limited to:
The survey manager will have the responsibility to ensure proper implementation of
the data quality management plan.
1.6 Quality Criteria for Reviewing
Survey Protocols
1.7 Ethics and Data Quality
Compliance with research ethics helps ensure data quality and the credibility of research.
Ethical principles to be followed in a survey have been laid out in the available
literature and guidelines. These ethical principles serve as a guide through planning,
funding, and conducting research, as well as for data storing, analysis, sharing
of data, use of data and dissemination of research findings. While it is imperative
for any survey to seek approval from an Institutional Review Board (IRB) and to
conform to standard operating guidelines and procedures of survey research ethics
throughout the life cycle of the survey, the following key measures should
be practiced for quality control and monitoring of implementation of ethical
guidelines [13-18]:
1.
Ensure the survey uses participant information sheets and informed
consents that adhere to national/international ethical guidelines.
Additionally, for an ethically appropriate data quality monitoring
system, a statement seeking participant consent for a follow up visit
by field supervisors and other data quality monitors should be added
to the informed consent. The purposes of these monitoring visits are
to assess adherence to ethical protocols in the field and the quality of
information elicited during the interviews.
2.
Design and implement a quality monitoring checklist to ensure
that ethical guidelines are being followed during the informed
consent process.
4.
De-identify respondent’s information from the publicly
available dataset.
6.
Regular monitoring by the Institutional Review Boards to ensure
adherence to the ethical protocols.
Technology Tip:
2. Quality Assurance During
the Preparatory Phase
The preparatory phase forms the foundation of quality assurance for the whole
survey. This section of the document outlines ‘what’ quality assurance parameters
can be considered during the preparatory phase of the survey and ‘how’ they can
be implemented. It lists out the quality assurance steps in developing study design,
sampling, survey tools and manuals, recruitment of survey investigators, designing of
data entry application, and training of survey investigators. Below are some of the key
principles that can be followed during the preparatory phase of the survey:
• Review different dimensions of the survey tools to avoid redundancy and eliminate bias
• Ensure quality of training
2.1 Study Design – Quality Assurance
Assessment and Guidance
There are a number of study designs available that can be adapted according
to the objectives of the study. Broadly, these study designs can be categorised
as observational and experimental. Details of different study designs can be
found elsewhere [19].
The following parameters can help determine the quality of the study design:
Irrespective of the study design, the guidance provided in this document applies to
any survey-based data collection.
2.2 Sampling Design –
Quality Assurance
Adopting a proper sampling design is one of the most important steps in ensuring
quality of survey estimates. Details on various sampling methods are available
elsewhere [20-22]. Following are the principles for a good quality sampling design:
1.
Decide on an appropriate sampling design that is scientific as well as
cost-effective. In population-based surveys, having more than one
stage of selection often saves money.
2.
Identify a proper sampling frame. If not readily available, create one
through a listing exercise (for example, listing of households) or from
registers (for example, ASHA registers of pregnant women).
3.
Check the sampling frame for any exclusion or duplication of target sample units. If found defective, correct the frame before sample selection.
4.
If information on variables that affect the outcome of interest is available in the sampling frame, use them to stratify the frame, for example, female literacy, ethnicity, occupation. This helps reduce sampling error and improve design efficiency.
5.
Wherever possible, stick to EPSEM (Equal Probability of Selection Method) design.
Checklist
• Sampling design documented
• Adequate sample size
• Appropriate sampling method chosen
• Stratification used (if applicable)
• Accurate sampling frame created
• Sample selection process is based on a scientific method and as per the design
6.
Keep records of probability of selection at each stage to calculate
sampling weights. If a listing exercise is employed for preparing a
sampling frame, keep a record of all details regarding the frame,
including selection and size of segments, if any.
7.
Proper care needs to be taken to ensure that there is no deviation
from the proposed design in any form. Field monitoring of listing
exercise, centralised system to select segments and sampling units,
and use of geo-referenced location data may help achieve this.
8.
Consult a statistician to decide and develop an appropriate
sampling strategy.
Technology Tips:
• Digital devices can help improve the sampling frame boundary and
geo-referenced location. Field teams may save time in reaching PSUs by using
PSU geolocation. Online listing (using Google spreadsheet/other alternative tools)
of targeted respondents can minimise the listing error and reduce time gap
between listing and sampling
• Select PSUs using Google maps
• Use statistical software such as Stata, SAS, R, online sample size calculators or
readily available Excel templates for sample size calculation
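As an illustration of the software-based calculation suggested above, the minimal Python sketch below applies the standard sample size formula for estimating a proportion. All inputs (prevalence, precision, design effect and response rate) are hypothetical placeholders to be replaced with survey-specific values.

```python
# Minimal sketch of a sample size calculation for estimating a proportion,
# using n = deff * z^2 * p * (1 - p) / d^2, adjusted for expected non-response.
import math

z = 1.96              # z-value for 95% confidence
p = 0.30              # assumed prevalence of the key indicator (hypothetical)
d = 0.05              # desired absolute precision (margin of error)
deff = 1.5            # assumed design effect for a multi-stage cluster design
response_rate = 0.90  # expected response rate

n_srs = (z ** 2) * p * (1 - p) / (d ** 2)    # size under simple random sampling
n = math.ceil(n_srs * deff / response_rate)  # adjust for design effect and non-response
print(f"Required sample size: {n}")
```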
2.3 Survey Tools
Survey tools consist of both manuals and questionnaires, including a checklist for
survey monitoring and reporting.
A well-designed survey questionnaire should follow the BRUSO model: Brief,
Relevant, Unambiguous, Specific and Objective. The quality assurance team shall
review the questionnaire’s content, formulation, type, sequencing and length of
questions to ensure it follows the standard and recommended procedures. The
details on standard and common practices followed for developing questionnaires
are available elsewhere [27].
Some key points to keep in mind while developing survey questionnaires are:
• Develop a tool appropriate for the mode of data collection and the
type of respondents.
• Pre-test questionnaires before implementing in the field.
• Translate and back-translate questionnaires to ensure consistency
in how the questions are worded.
• Include instructions for administering questions where the
interviewer may require clarity.
• Examine and remove redundant questions, if any.
• Use only validated question items/scales.
• In a structured questionnaire, review the response codes to ensure
the categories are exclusive. Also, ensure the response categories
are explained in the survey manual. Use appropriate filters and skips
to avoid asking inapplicable questions to respondents and minimise
errors during data collection.
Checklist for quality assurance of survey manuals
• Manual includes DQA checklists for pre, during and post data collection
2.4 Quality Considerations in Designing
Data Entry Applications
Most surveys now use handheld devices such as mobile phones and tablets to
collect data from the field. It is important to note that collecting quality data
depends not only on a good survey tool but also on how well the data entry
application has been designed and developed. Some key quality considerations
while designing and developing a data entry package are:
• Monitor the response time for each question. If a respondent
answers a question faster than the expected response time, include
warning messages for the interviewer to improve questioning.
• Ensure that the supervisor has the necessary access for review of
data before it is uploaded on the server.
• Collect GPS data at the start and the end of the interview.
• Ensure that both paradata and raw data are linked to the quality
monitoring dashboard for real-time monitoring.
Technology Tips:
• Some of the freely available software for developing data entry applications
are: CSPro, SurveyCTO, Epi-Info, KoBo Toolbox, ODK and ONA
• These software packages can also be used to build data quality assurance applications
• Use dedicated servers for data security
2.5 Quality Considerations When
Recruiting Survey Investigators
Technology Tip:
2.6 Training
Training of the field staff is an important part of the survey to familiarise them with
the SOPs before they start data collection. In case of large-scale surveys, it may not
be possible to train all individuals in one batch. It is recommended that not more
than 40 trainees should be trained in one batch. In such a scenario, a cascade
training model may be adopted.
The quality assurance for field staff training can be done as follows [28-32]:
1. Undertake pre and post-training assessments with investigators
and supervisors to gauge the level of knowledge on survey tools
and processes.
2. Conduct mock tests and role plays at the end of each
questionnaire section.
3. Orient investigators on key terminologies used in the survey and
ethical issues in data collection.
4. Trainees should be asked to demonstrate their learnings through
group discussions on key topics, mock interviews and role plays.
5. Observe difficulty in the reading of questions and make necessary
changes to the questionnaire.
6. Observe whether the investigators have trouble in understanding the
questions. Reorient investigators, if required.
7. If possible, have an independent observer to assess the training
quality at the end of the day’s training.
8. Conduct daily debriefing session with investigators and supervisors
to assess the gaps in training delivery.
Checklist
• Trainees attended all sessions on all days
• Training covers all topics/sections, including demo sessions
• Trainer trained in TOT or is part of the core project team
• Training agenda followed completely in terms of topics and time
• Lectures on ethics, sexual harassment and sensitive subjects organised
• Training includes mock interviews, role plays and adequate practice sessions
• Pre and post-assessment conducted with participants
• Field practice and feedback sessions conducted
• Pilot testing of the procedures for biological sample collection, storage, transportation and analysis during training
Technology Tips:
2.7 Preparatory Steps for Quality Assurance
of Anthropometric Measurements
Checklist
• Monitoring and reporting systems have been defined
• Selected qualified personnel are trained and standardised to take anthropometric measurements
• Job aids and manuals (including videos) are prepared
• Manuals have detailed instructions for measurement, equipment calibration, care and maintenance
• Standardised equipment procured for the survey
• Equipment calibrated as per protocol
Technology Tip:
2.8 Preparatory Steps for Quality Assurance
of Biological Sample Collection
Key steps to be taken prior to initiating data collection for ensuring data quality in
case of biomarker measures are:
5. Use standard, internationally recognised analytical methodologies
in the laboratory for biochemical analysis. Please see Biomarkers
of Nutrition for Development (BOND) [33], European Registration
of Cancer Care (EURECCA) [34] and National Health and Nutrition
Examination Survey (NHANES) [35] for appropriate methodologies.
7. Ensure that the data collection teams are well-acquainted with the
standard operating procedures.
Checklist
• Laboratories selected have internal and external quality control procedures
• Phlebotomists qualified with a Diploma in Medical Laboratory Technology recruited
• Standard and uniform materials are procured for use across survey locations
• Standard internationally recognised analytical methodologies are used in the laboratory for biochemical analysis
• SOPs are prepared with detailed instructions for sample collection, storage and transportation
• If possible, employ quality control laboratories for comparison testing
• Appropriate database templates/formats and information systems established for data capture
3. Implementation of Quality Assurance
Activities During the Data Collection Phase
Quality control during data collection is the most important part of a survey. This
section of the document outlines ‘what’ quality assurance parameters should be
considered during the data collection phase of the survey and ‘how’ they can
be implemented. It lists out the quality assurance steps, tools to monitor survey,
anthropometric and biological data collection, and provides guidance on the
coordination mechanism between the quality assurance team and the survey team.
It also describes how to use paradata and data quality dashboards to monitor data
quality during field survey.
Below are some of the key principles that can be followed during the data
collection phase:
Summary of considerations for quality assurance during data collection:
• Monitor quality of data intensely through field visits
• Monitor quality of anthropometric data through re-measurement in a sub-sample
• Employ on-field checklists, review Field Check Tables (FCTs) and use paradata and data quality dashboards for effective monitoring of field data collection
3.1 Summary of Considerations for Quality
Assurance During Data Collection
Data quality can be enhanced by considering the following points during data
collection; these should be a part of the field operation strategies. The field operation staff
(field coordinator and team supervisor) should strictly monitor the following aspects
during data collection.
3.2 Steps for Monitoring Survey Data
Collection Quality
Monitoring data collection in the field is necessary for improving its quality. With
the technological improvements in surveys for data collection, there are online and
offline checks recommended to further improve data quality. Following are the
recommended steps for monitoring survey data collection:
• Each team member in the survey has a brief outline (one page or
less) highlighting their responsibilities in the survey.
• Review the completed interview while the survey team is still in the
vicinity of the PSU.
• Field check tables generated by the online system (or from the
central data quality assurance team) should be regularly reviewed
and discussed with the field staff.
Daily data quality checks
Checklist for data collector
Technology Tips:
3.3 Steps for Monitoring of
Anthropometric Data Quality
• Re-measurement to assess accuracy: (1) Blinded re-measurement is
recommended on randomly selected sub-samples that have already
been measured as part of the survey sample. (2) Flagged
re-measurement is recommended for flagged data/implausible
values. Re-measurement should be done using the same type of
calibrated equipment and standard measurement methods used for
the initial measurement.
Checklist
• Age of the respondent verified
• Set up of anthropometric equipment for measurement as per the protocols followed
• Routine calibration of anthropometric equipment done as per the schedule and calibration log maintained
• Job aids and manuals are available
• Parallax error avoided
• Re-measurement done to assess accuracy (blinded random re-measurement of sub-sample and/or flagged re-measurement) using the same type of calibrated equipment and standard measurement methods
• Measurement errors detected within acceptable limits
Technology Tip:
3.4 Steps for Monitoring Biomarker
Sample Collection, Storage and
Transportation Processes
Collection of samples for biomarker analyses in field surveys and obtaining reliable
results is challenging as samples have to be collected and transported from the
field to the laboratory under various conditions. Although in many large-scale
surveys biomarker tests are done in the field itself, if the survey includes biomarker
indicators, a rigorous quality assurance procedure needs to be established using
standard internal and external quality control procedures. Some important steps
are as below:
• Several biomarkers are adversely affected when the blood samples
are stored for a long time period before plasma/serum collection,
due to partial/complete cell lysis. Therefore, a careful sample
processing plan (often specific to the biomarker of interest) needs to
be thought out before the sample collection.
• In case of spot testing of samples, such as random blood
glucose testing or haemoglobin testing, check if the results are
recorded accurately.
• Process control:
• External quality assurance:
• At the laboratory:
   • Check if the procedures being used for sample processing are in line with the SOPs prepared for the study.
   • Review laboratory registers to check whether the samples collected match those that have been sent to the QC laboratories.
   • Check the calibration log in the laboratory to ensure devices were calibrated as per the SOPs.
   • Review the results of the comparison analysis carried out between the results from the laboratory and the QC laboratories.
   • Review the data logger (if used) results every week and give feedback to the laboratory personnel. The laboratory personnel, in turn, should provide feedback to the phlebotomists.
   • Check if the internal quality assurance systems are being implemented.
Technology-based monitoring during data collection
Checklist
• SOPs with detailed instructions are prepared for sample collection, storage and transportation
• Laboratory runs internal quality checks
• A subset of samples is sent to QC laboratories for comparison testing
• Samples collected from selected eligible respondents
• Standard equipment and consumables used for sample collection
• Cool boxes are used for sample transportation to maintain adequate temperatures for at least 12-16 hours
• Appropriate instructions are given to the respondents
• Job aids available with the phlebotomists
• Each sample and aliquot are appropriately labelled
• Results are recorded correctly in case of spot testing of samples
• Correct procedures are followed for sample processing
• Calibration log is maintained, and devices are calibrated as per the SOPs
• Comparison analysis carried out between results from the laboratory and from the QC laboratories
• Time and temperature monitoring are undertaken using a data logger and feedback is given based on results
Technology Tips:
• SMS-based alert system (for example, RapidPro) can be used to track the journey
of biological samples from the point of collection in the field to the laboratory
where they are analysed. This system should include automatic alerts sent
whenever there is a breach of time limits
• A data logger can be used to monitor time and temperature during the sample’s
journey from field to laboratory. Automated programmes can be developed to
analyse data logger data to maintain a log of any breach in temperature and
provide immediate feedback
• Cool bags containing biological samples can be geotagged
3.5 Mechanism for Coordination Within
and Between the Quality Assurance
Team and the Main Survey Team
The DQA teams and main survey teams must maintain a clear understanding of
their roles and responsibilities during data collection. The following points should be
considered to have the teams work in tandem to ensure data quality:
Technology Tip:
3.6 Tools to Monitor Quality of
Field Data Collection
There are several tools that can be designed for monitoring data quality
depending on the type of survey, the issues that are examined and the levels at
which each survey is conducted. Following are the recommended tools for data
quality measurement:
5. Field Check Tables (FCTs): FCTs are created considering the key
indicators pertaining to study objectives and can be developed for
each team and interviewer. FCTs play an instrumental role in data
quality assurance. FCTs help monitor the response rate, negative
screening, estimates of critical indicators, investigator efficiency and
bias. FCT data should be reviewed and discussed with all levels of field
staff for better understanding and to provide feedback.
3.7 Use of Paradata to
Improve Data Quality
7. Enabling the identification of crisis on the field and implementing
evidence-based rapid response
8. Comparing the data quality metric to ensure quality output
9. Error-cost trade off report that includes survey performance
reporting indicators such as time per unit, cost per unit and
completion rate at various levels of aggregation
3.8 Type of Analytics on Paradata to
Present Data Quality Metrics
3. GPS data: The Global Positioning System data can be used to track
team movement, increase the efficiency of the survey implementation strategy,
and identify coverage bias and any clustering issues in the estimates.
The results from the analysis of paradata may be documented in the data quality
reports as a part of the survey documentation.
Technology Tip:
• Use software such as Stata, Excel, SPSS, Python, R, CSPro for analysis
of paradata
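As an illustration of such paradata analytics, the minimal Python sketch below flags unusually short interviews per interviewer. It assumes a paradata export with hypothetical column names (interviewer_id, start_time, end_time) that would need to be mapped to the actual platform's output.

```python
# Minimal sketch of paradata analysis: flag unusually short interviews per
# interviewer. Column names are hypothetical placeholders.
import pandas as pd

paradata = pd.read_csv("paradata.csv", parse_dates=["start_time", "end_time"])
paradata["duration_min"] = (
    paradata["end_time"] - paradata["start_time"]
).dt.total_seconds() / 60

# An interview far shorter than the overall median duration may indicate
# curtailed questioning and deserves a follow-up check.
median_duration = paradata["duration_min"].median()
flagged = paradata[paradata["duration_min"] < 0.5 * median_duration]

print(flagged.groupby("interviewer_id")["duration_min"].describe())
```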
3.9 Using Dashboards to Monitor
Data Quality
A data quality dashboard should have options to demonstrate the time trend of the
above quality monitoring indicators and should also enable filtering at various levels
of data collection. Additionally, it should be prepared in a way that a non-technical
user can navigate it easily and be able to clearly understand the data quality issues
as and when they creep in during data collection. The data quality dashboard
should be made available to all relevant supervisory positions starting from the field
supervisor with viewing controls for different levels. For example, a field supervisor
should be able to view only the performance of his/her team members; whereas
someone sitting at the central office should be able to view details of all teams.
For a real-time data quality dashboard, it is important to choose the right data
collection platform. One should prefer a platform that allows connection between
the data storage server and the dashboard creating platform. While there are
several dashboard creating platforms such as Power BI, Dashit and Google
Dashboard, it is important to choose one which can be smoothly managed and
easily handled by the study team.
Technology Tip:
3.10 Indicators to Measure Data Quality for
Providing Feedback to Investigators
During Data Collection
During data collection, the survey agency/nodal institute must ensure that regular
quality check reports are prepared and sent to the field teams. Different indicators
should be assessed and feedback on quality aspects must be sent to observers in
the field so that the investigators are appropriately debriefed.
In population and health surveys, some key indicators for feedback include [38-40]:
• Household completion rate: Out of the total number of eligible
households, the percentage of households completed, refused,
dwelling vacant or destroyed or not found
• Number of household members at home with their
age-sex composition
• Completeness of age and age heaping in all collected age variables –
age, age at marriage, age at first child, age at death (if any)
• In case of a child, the percentage of date of birth information
obtained from birth certificate, vaccination card, caretaker’s recall, or
other sources
• Completeness of length and height in case of anthropometric
measurements. Standing/lying position for length/height in case of a
child. Digit heaping in height and weight measurement
• Average time taken per schedule
• Frequency at which equipment is calibrated
• Number of interview schedules filled per investigator per day
• Missing information and skipping pattern followed and outliers
identified
• Pattern in reported days between onset of symptoms and
diagnosis of disease
• Pattern in reported days between diagnosis and treatment seeking
• Treatment seeking (government vs private health facility)
• Number of new and relapse cases of infectious diseases
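Several of the indicators listed above, such as age heaping and digit heaping in height and weight, can be computed directly from the raw data. The minimal Python sketch below checks terminal-digit preference in reported ages; the file and column names are hypothetical.

```python
# Minimal sketch of a terminal-digit preference (heaping) check on reported
# ages. With no heaping, each terminal digit holds roughly 10% of cases;
# spikes at 0 and 5 suggest age heaping. The column name 'age' is hypothetical.
import pandas as pd

df = pd.read_csv("household_members.csv")
digit_share = (
    df["age"].dropna().astype(int) % 10
).value_counts(normalize=True).sort_index() * 100

print(digit_share.round(1))  # percent of cases ending in each digit 0-9
print("Share at 0 and 5:",
      round(digit_share.get(0, 0) + digit_share.get(5, 0), 1), "%")
```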
The data quality feedback provided to the investigators must contain the
following measurements:
• Review of filled interview schedules for missing information, outliers
and skipping pattern
• Re-measurement of sub-samples can be performed while the survey
team is in the field
• List of investigators with missing information, outliers and
skipped questions
Technology Tip:
• For real-time monitoring of data quality, a dashboard for supervisors and field
coordinators can be prepared utilising information on completed cases,
non-response, missing values, negative screening rates, interview completion
rates, average time taken to complete interviews and so on
• Use decision trees, neural networks, SVM, K-means for classifying the number
of times ‘don’t knows’ and ‘skips’ are used by either the interviewer or the
respondent
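As one possible illustration of the clustering approach mentioned in the tip above, the minimal Python sketch below groups interviewers by their 'don't know' and skip rates using K-means. The input summary table and its column names are hypothetical.

```python
# Minimal sketch: group interviewers by their 'don't know' and skip rates with
# K-means. The summary file and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans

summary = pd.read_csv("interviewer_summary.csv")
features = summary[["dont_know_rate", "skip_rate"]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
summary["cluster"] = kmeans.fit_predict(features)

# Clusters with unusually high rates can be prioritised for feedback/debriefing.
print(summary.groupby("cluster")[["dont_know_rate", "skip_rate"]].mean())
```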
3.11 Documentation of Data
Quality Assurance
A document on data quality helps data users understand the strengths and
limitations of data and also enables them to derive appropriate conclusions from
the data. At the same time, it also helps other data producers reproduce similar
data by implementing similar data quality assurance mechanisms. It further helps
carry out a comparison of data across surveys.
Documentation of survey data quality should start from the preparatory phase. It
could be designed as a process document where details are noted on each step of
survey planning and implementation from the perspective of data quality. Moreover,
one person should be exclusively assigned to document all quality aspects by
interacting with field teams, survey managers and other research team members.
Technology Tip:
• Use digital platforms (for example, WhatsApp, Slack, Chanty, Flock, Hangout)
to share audios/videos, photos/screenshots, text messages, conduct group
discussions and also to document DQA reviews and actions taken within the
project team
4. Data Quality Assessments
Post Data Collection
Once data is collected, it is important to review the data, and undertake analytics to
examine data quality parameters. This section of the document outlines the different
techniques that can be employed for assessing quality of data, and estimation of
sampling and non-sampling errors. It also guides the computation of sampling weights
and the application of different machine learning techniques in surveys. Below are some
of the key take-aways from this section of the document:
• Compute appropriate sampling weights and report sampling errors on key indicators
• Review the data thoroughly for outliers, missing values, and inconsistencies
• Report data quality measures, including non-sampling errors
4.1 Post Survey: Profiling
Survey Data
One of the most critical steps in data quality assurance is processing of raw data to
make it suitable for analysis. There are two critical aspects in processing raw data:
(a) checking errors in raw data and (b) preparing metadata. Even before checking
errors in raw data, a key step involves reconciliation of field data. Handling a large
quantity of data can get tricky and tedious, both for the data analysts as well as
the analytical tools or software in which the data is being analysed. It is, therefore,
of utmost importance to perform the systematic checks mentioned below on
snippets of data every time the dataset is updated, to avoid any error creeping in.
Time and again, one must also compare the new version of the file with the older
ones and check the total number of rows and columns in each to make sure no
data is lost, converted or corrupted in the process.
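A minimal Python sketch of such a version comparison is given below; the file names and the identifier column are hypothetical and should be adapted to the survey's own naming conventions.

```python
# Minimal sketch: compare a new version of the dataset with the previous one
# to make sure no records or variables were lost in the update. File names and
# the identifier column ('case_id') are hypothetical.
import pandas as pd

old = pd.read_csv("survey_data_v1.csv")
new = pd.read_csv("survey_data_v2.csv")

print("Rows   :", old.shape[0], "->", new.shape[0])
print("Columns:", old.shape[1], "->", new.shape[1])

dropped_cols = set(old.columns) - set(new.columns)
missing_ids = set(old["case_id"]) - set(new["case_id"])
print("Variables missing in new file:", dropped_cols or "none")
print("Records missing in new file  :", len(missing_ids))
```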
• Data type: Check whether data expected in a numeric format is available in string (text) format or vice-versa. The data analyst should review
all the variables and their data types. If any variable is not as per the
expected data type, it should be transformed while finalising the data.
Similarly, certain fields can also have a limit to the length of the strings
which might not permit characters beyond that thereby leading to
incomplete data entry.
• Missing value: Check whether the values that are missing from
the data are the ones which were specified at the planning phase of
the survey itself and were expected to have missing values (i.e., as
per skip patterns in the interview tools). Given that most data is now
collected on handheld devices, the chances of missing values are rare
these days. However, if there are still missing values, the data analyst
needs to check with the field team the reason behind them. Such
reasons should be documented (if possible, in the data itself). In some
instances, people prefer imputing missing values; in such cases, it is
better to use a multivariate imputation method (illustrated in the sketch
after this list). For such imputation, one should create relevant variables
which can help determine the outcome of a missing observation.
• Summary statistics: Generate descriptive statistics for each variable (for example,
mean, median, interquartile range and coefficient of variation). This would facilitate
understanding the performance of the data.
• Sanity checks: While several range, skip and simple validation checks
are included in the data entry program during data collection, it is
not possible to include all relational checks at that stage. Before
finalising the data, one should check for any out-of-range values
and internal inconsistencies in the data. The data analyst should
verify if the data in a variable is consistent with other related
variables in the same dataset. To check the consistency of the data,
one should write programs in data processing software and document
any inconsistencies found. If the data analyst decides to change
any value, the change should be documented and the rationale behind
it should be mentioned.
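The minimal Python sketch below illustrates the checks described above: documenting missing values, flagging out-of-range and internally inconsistent records, and, where imputation is preferred, applying a multivariate imputation. Variable names and range limits are hypothetical, and scikit-learn's IterativeImputer is shown only as one possible implementation of multivariate imputation.

```python
# Minimal sketch of post-collection profiling checks; variable names, limits
# and the imputation choice are illustrative assumptions, not prescriptions.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("final_survey_data.csv")

# 1. Document missing values per variable before deciding on any imputation.
missing_report = df.isna().sum().sort_values(ascending=False)
print(missing_report[missing_report > 0])

# 2. Range check: flag implausible values (example limits are hypothetical).
out_of_range = df[(df["age"] < 0) | (df["age"] > 110)]

# 3. Relational consistency check: e.g., age at marriage cannot exceed age.
inconsistent = df[df["age_at_marriage"] > df["age"]]
print("Out-of-range:", len(out_of_range), "| Inconsistent:", len(inconsistent))

# 4. Optional multivariate imputation of related numeric variables.
numeric_cols = ["age", "age_at_marriage", "household_size"]
df[numeric_cols] = IterativeImputer(random_state=0).fit_transform(df[numeric_cols])
```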
Preparing metadata
Metadata is the documentation of the data including important details about the
data, the instruments, protocol information, analysis approach and survey tool
details. However, in most cases metadata is not well documented and therefore
the potential of having good metadata is often not realised. Metadata should ideally
answer all the what, why, where, who, how questions about the survey. Metadata
for survey data should include definition of all variables, description of all coding
values, date of data creation, information about data custodian, and documentation
of specific data issues.
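A minimal sketch of machine-readable metadata kept alongside the dataset is shown below; the structure and field values are illustrative rather than a prescribed format.

```python
# Minimal sketch of variable-level metadata stored with the dataset so that the
# what/why/who/how of the survey is documented. All values are illustrative.
import json

metadata = {
    "dataset": "district_household_survey_2021",  # hypothetical name
    "created_on": "2021-07-15",
    "data_custodian": "Survey data management unit",
    "variables": {
        "age": {"label": "Age in completed years", "type": "integer",
                "valid_range": [0, 110]},
        "v101": {"label": "State code", "type": "categorical",
                 "codes": {"1": "State A", "2": "State B"}},
    },
    "known_issues": ["Ages above 95 verified against household records"],
}

with open("metadata.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```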
Whenever there is human intervention, the possibility of errors creeps in; therefore,
while performing these data checks and preparing metadata, one must be
extremely careful that no further distortion to the data is introduced. Efficient data
analysts should perform these checks, reconcile carefully, and document all the
changes made to the original raw data for future reference.
Technology Tip:
• Use of open-source tools such as Python and R, and statistical packages such as
Stata, SPSS and SAS, can help in cleaning and summarising the data
4.2 Sample Weights and
Sampling Errors
In a sampling design, if units are not selected with EPSEM, the sample mean
will not be an unbiased estimate of the population mean, and it becomes
imperative to use weights to take care of the bias in estimation. In probability
sampling, the probability of selection of a unit is known, and the sample weight
for each unit can be taken as the reciprocal (inverse) of its selection probability.
Multiplying the variate values by their respective weights will provide an unbiased
estimate of a parameter. Besides, computing sampling errors of key outcome
indicators and reporting them are important steps in the documentation of data quality.
3. It is a good practice to incorporate sample response rates into the weight calculation to adjust for any bias due to differential response.
4. Finally, one may choose to normalise weights before applying them to the data for analysis.
5. To calculate sampling errors, add information on the sample design, including stages of selection and stratification, to the dataset.
6. Another important indicator to report is the design effect, defined as the ratio between the standard error under the sample design adopted in the survey and the standard error that would result if a simple random sample had been used.
Checklist
• All required information for calculating weights available with the project team
• Considered all probabilities of selection at each stage
• Compared weighted population with census population on important stratification variables (for example, rural/urban, ST/SC, literacy)
• Normalisation of weights
Detailed discussion on sampling weights can be found elsewhere [20, 22, 41].
Technology Tip:
• Sampling errors can be calculated using statistical software (for example, Stata,
SPSS, SAS) by specifying the sampling design
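The minimal Python sketch below illustrates the weight calculation described above for a hypothetical two-stage design: the design weight as the inverse of the overall selection probability, an adjustment for differential response, and normalisation. Column names are hypothetical.

```python
# Minimal sketch of design weight calculation, assuming a two-stage design with
# recorded selection probabilities and response rates. Column names
# (prob_stage1, prob_stage2, response_rate) are hypothetical.
import pandas as pd

df = pd.read_csv("sample_with_probabilities.csv")

# Design weight = inverse of the overall probability of selection,
# adjusted for differential response.
df["design_weight"] = 1.0 / (df["prob_stage1"] * df["prob_stage2"])
df["adjusted_weight"] = df["design_weight"] / df["response_rate"]

# Normalised weight: the average weight equals 1, so the weighted and
# unweighted sample sizes match.
df["normalised_weight"] = df["adjusted_weight"] * len(df) / df["adjusted_weight"].sum()

print(df["normalised_weight"].describe())
```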
4.3 Data Quality Metrics: Calculation
of Non-Sampling Errors
Non-sampling errors arise at the time of data collection and data processing,
for example, from failure to locate and interview the correct household,
misunderstanding of the questions, and data entry errors. Non-sampling errors
can be classified into three categories [42]:
Non-Sampling Error: Non-Response Error
Non-response arises when households or other units of observation which have been selected for inclusion in the survey fail to yield all or some of the data that were to be collected. Two major categories of non-response may be identified: non-contact and refusal. Non-contact occurs due to difficulties in accessing sample units, failing to contact respondents or failing to gain cooperation. Refusal occurs when respondents decline to provide information.
Assessment Method: Non-response error is assessed by the response rate. This is calculated as “the number of eligible sample units who responded to the survey divided by the total number of eligible sample units”.

Non-Sampling Error: Response Error
This occurs due to the collection of invalid or inappropriate data from sample elements, which leads to inconsistency in data, missing values and outliers.
Assessment Method: Response error can be reduced by shortening the length of recall, which is the time elapsed between the date of a particular event or transaction that occurred during the reference period and the date on which a respondent is asked to recall it. Missing data in general hampers the reliability of estimates and may be treated as a response error. Methods to impute the missing values can be used to minimise the response error. Missing data can be replaced by imputation using various methods:
• Cold deck imputation [43]
• Hot deck imputation [44]
• Random imputation [45]
• Mean value imputation [46]
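The minimal Python sketch below illustrates two of the measures above: the response rate as defined for non-response error, and a simple hot-deck imputation in which a missing value is replaced by a randomly chosen donor value from the same stratum. File, column and stratum names are hypothetical; see references [43-46] for the formal imputation methods.

```python
# Minimal sketch: response rate as defined above, plus a simple hot-deck
# imputation using a random donor from the same stratum. Column names are
# hypothetical placeholders.
import pandas as pd

df = pd.read_csv("fieldwork_status.csv")

# Response rate: eligible sample units that responded / all eligible sample units.
eligible = df[df["eligible"] == 1]
response_rate = (eligible["interview_completed"] == 1).mean()
print(f"Response rate: {response_rate:.1%}")

def hot_deck(values):
    # Replace missing values with a randomly drawn donor value from the group.
    donors = values.dropna()
    if donors.empty:
        return values
    return values.apply(lambda v: donors.sample(1).iloc[0] if pd.isna(v) else v)

df["income"] = df.groupby("stratum")["income"].transform(hot_deck)
```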
5. Use of Machine Learning Techniques
in Improving Data Quality
In the last decade, there have been significant technological innovations, especially
due to the application of Artificial Intelligence (AI) including Machine Learning (ML).
Even in survey data collection and management, there is a range of machine
learning and artificial intelligence techniques that help improve data quality in a
cost-effective way. For example, in a field survey, identifying the right household to
interview was earlier a manual job and often led to confusion and wrong selection of
households. With the use of geo-spatial and ML algorithms for image recognition,
one can improve the accuracy of household selection as well as save the cost of
listing each household. Similarly, once data collection is over, one can use an isolation
forest to find potential anomalies in the data. This chapter discusses some of the
applications of ML algorithms that can help improve data quality.
PRE-SURVEY:
DURING SURVEY:
POST SURVEY:
Though these techniques have been suggested to ensure good quality data, it
is advisable to first try each of them either on a snippet of the data or on a copy
of the original data. Only once they have been reviewed and tested, and the results
obtained are convincing and desirable, should these machine learning techniques
be applied to the whole dataset to obtain data of improved quality.
Examples of application of machine learning techniques to
assess data quality
Two machine learning tools that automate data quality checks and labelling are now
available in the public domain.
Outlier Detection Tool
The NDQF data science lab has developed an outlier detection tool to identify
potential outliers in a dataset using machine learning techniques. The tool works
for any survey dataset, using multiple data science approaches such as silhouette
score calculations, k-means clustering and isolation forest to flag observations that
are potential outliers. Unlike most methods of outlier detection, which help in
identifying outliers within one variable (one-dimensional), this tool helps solve the
bigger challenge of finding outliers in multidimensional space. The outlier detection
tool is available in the public domain (https://ndqf001.pythonanywhere.com/).
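The sketch below is not the NDQF tool itself; it is a minimal Python illustration of the isolation forest approach the tool builds on, applied to a few hypothetical numeric variables.

```python
# Minimal sketch of multidimensional outlier flagging with an isolation forest.
# This is an illustration, not the NDQF tool; column names are hypothetical.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("survey_data.csv")
features = df[["height_cm", "weight_kg", "age", "haemoglobin"]].dropna().copy()

# contamination is the assumed share of outliers; tune it to the survey context.
model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
features["outlier_flag"] = model.fit_predict(features)  # -1 = potential outlier

print(features[features["outlier_flag"] == -1])
```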
References
15. Spatz ES, Suter LG, George E, Perez M, Curry L, Desai V, Bao H, Geary
LL, Herrin J, Lin Z, et al. An instrument for assessing the quality of
informed consent documents for elective procedures: development and
testing. BMJ open 2020;10(5):e033297.
16. Welch BM, Marshall E, Qanungo S, Aziz A, Laken M, Lenert L, Obeid J.
Teleconsent: a novel approach to obtain informed consent for research.
Contemporary clinical trials communications 2016 Aug 15;3:74-9.
17. National Drug Abuse Treatment Clinical Trial Network. Internet: https://gcp.nidatraining.org/resources.
18. Guideline IHT. Guideline for good clinical practice. J Postgrad Med
2001;47(3):199-203.
19. UNFPA and the Population Council. Operations Research Methodology
Options: Assessing Integration of Sexual and Reproductive Health and HIV
Services for Key Affected Populations. August 2013.
20. Kish L. Survey Sampling. New York: John Wiley & Sons, Inc., 1995.
21. Martínez LI. Technical report on Improving the use of GPS, GIS and RS for
setting up a master sampling frame. Technical Report Series GO-06-2015:
FAO, 2013.
22. Roy TK, Acharya R, Roy A. Statistical Survey Design and Evaluating
Impact. New Delhi, London and New York: Cambridge University Press,
2016.
23. Measure DHS. DHS Survey Organization Manual 2012.
24. International Institute for Population Sciences (IIPS) and ICF. National
Family Health Survey (NFHS-4), 2015–16: Interviewer’s manual, Mumbai:
IIPS. Mumbai: IIPS, 2014.
25. Centers for Disease Control and Prevention. Global Adult Tobacco Survey
(GATS): Core Questionnaire with Optional Questions, Version 2.0. Atlanta,
GA: Global Adult Tobacco Survey Collaborative Group., 2010:2010–56.
26. Macro International Inc. AIDS Indicator Survey: Household Listing Manual
MEASURE DHS Calverton. Maryland, USA, 2007.
27. Peterson RA. Constructing effective questionnaires. Thousand Oaks, CA:
Sage, 2000.
28. Macro ICF. Training field staff for DHS surveys. Calverton, MD. ICF Macro,
2009:724.
29. World Health Organization. Health in all policies: training manual. World
Health Organization. 2015.
30. Measure D. Demographic and health survey sampling and household
listing manual. Calverton: ICF International, 2012.
31. Measure D. Demographic and health survey biomarker field manual.
Calverton: ICF International, 2012.
32. World Bank. Internet: https://dimewiki.worldbank.org/wiki/Enumerator_Training.
33. Raiten DJ, Namasté S, Brabin B, Combs GJ, L’Abbe MR, Wasantwisut
E, Darnton-Hill I. Executive summary--Biomarkers of Nutrition for
Development: Building a Consensus. Am J Clin Nutr 2011;94:633S-50S.
34. EURECCA. Internet: www.eurecca.org 2020.
35. National Health and Nutrition Examination Survey. Internet: http://www.cdc.gov/NCHS/NHANES.htm.
36. UNICEF and Population Council. Comprehensive national nutrition
survey: anthropometric measurement manual. March 2016.
37. National Health and Nutrition Examination Survey. Anthropometry
Procedures Manual. Centers for Disease Control and Prevention, 2011.
38. World Health Organization. Data quality review: a toolkit for facility data
quality assessment. Geneva: World Health Organization, 2017.
39. Ehling M, Körner T. Handbook on data quality assessment methods and
tools. European Commission, Eurostat, 2007.
40. Measure evaluation. Internet: https://www.measureevaluation.org.
41. Kish L. Weighting for Unequal P_i. Journal of Official Statistics
1992;8(2):183-200.
42. National Household Survey Capability Programme UN. Household
Surveys in Developing and Transition Countries (Studies in Methods.
Series F; No 96), 2005.
43. Singh S. A new method of imputation in survey sampling. Statistics
2009;43(5):499-511.
44. Andridge RR, Little RJA. A Review of Hot Deck Imputation for Survey
Non-response. International Statistical Review 2010;78(1):40-64.
45. Kalton G, Kish L. Some efficient random imputation methods.
Communications in Statistics - Theory and Methods 1984;13(16):
1919-39.
81
National Guidelines for Data Quality in Surveys
46 Jadhav A, Pramod D, Ramanathan K. Comparison of Performance of Data
Imputation Methods for Numeric Dataset. Applied Artificial Intelligence
2019;33(10):913-33.
47. Chew RF, Amer S, Jones K. Residential scene classification for gridded
population sampling in developing countries using deep convolutional
neural networks on satellite imagery. International Journal of Health
Geographics 2018;17(12).
48. Shah N, Mohan D, Bashingwa JJH, Ummer O, Chakraborty A, LeFevre
AE. Using Machine Learning to Optimize the Quality of Survey Data:
Protocol for a Use Case in India. JMIR Res Protoc 2020;9(8):e17619.
49. Brownlee J. 4 Automatic Outlier Detection Algorithms in Python. 2020.
50. Kumar D. Fraud Analytics using Extended Isolation Forest Algorithm.
2020.
51. Hawkins S, He H, Williams G, Baxter R. Outlier Detection Using Replicator
Neural Networks. In: Kambayashi Y., Winiwarter W., Arikawa M. (eds)
Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture
Notes in Computer Science,. In: Springer B, Heidelberg, ed., 2002.
52. Fuertes. T. Internet: https://quantdare.com/outliers-detection-with-
autoencoder-neural-network/.
53. Ryan M. Internet: https://medium.com/datadriveninvestor/how-to-
clustering-and-detect-outlier-at-the-same-time-30576acd75d0.
54. Infosys. Using machine learning in data quality management. In: Infosys,
ed., 2020.
55. Kumar S. 7 Ways to Handle Missing Values in Machine Learning. 2020.
56. Gupta A. Internet: https://medium.com/airbnb-engineering/overcoming-
missing-values-in-a-random-forest-classifier-7b1fc1fc03ba.
57. Hu W, Zaveri A, Qiu H. Cleaning by clustering: methodology for
addressing data quality issues in biomedical metadata. BMC
Bioinformatics 2017;18:415.
82
National Guidelines for Data Quality in Surveys
Selected Definitions
• Decision Tree
It is a decision support tool that looks like a tree structure and is used for classification and prediction modelling. Decision trees are used in operations research (decision analysis) and machine learning.

• Equal Probability of Selection Method (EPSEM)
EPSEM is a sampling technique that results in the population elements having equal probabilities of being included in the sample.

• External Validity
It refers to how well the outcome of a study can be generalised with respect to different measures, persons, settings, and times.

• Field Check Table (FCT)
These are a set of tools used to track the progress of survey work in the field, detect any significant departure from expected distributions of important population parameters, and identify problematic survey teams or individual investigators as sources of bias/systematic error, if any. FCTs are usually generated and discussed at a one- or two-week lag.

• Interquartile Range
It is a measure of dispersion in data, computed by taking the difference between the 75th (3rd quartile) and 25th (1st quartile) percentiles (see the illustrative sketches that follow these definitions).

• Isolation Forest
It is an unsupervised machine learning algorithm used for anomaly detection; it works on the principle of isolating anomalies/outliers (see the sketch that follows these definitions).

• K-means Clustering
It is a type of unsupervised machine learning algorithm that partitions the available observations into several clusters, where each observation belongs to the cluster with the nearest mean.

• Kurtosis
It is a measure of the tailedness or peakedness of the frequency distribution of a variable, that is, how tall or sharp the central peak of the distribution is when compared with a normal distribution.

• Listing
It refers to the process of identifying the target population or households for developing a sampling frame.
• Negative Screening Rate
It is defined as the ratio of the number of screening questions marked ‘No’ by the interviewer to the total number of valid screening questions in the instrument (see the sketch that follows these definitions).

• Neural Network
A neural network is a computational learning system that uses a network of functions to understand and translate data into a desired output, usually in a different form.

• Non-sampling Error
It is a type of error, not related to the sampling of units, that arises during the survey process, such as failure to locate and interview the correct household, asking questions incorrectly, and data entry errors.

• Optical Character Recognition
This is a technology used to convert virtually any image containing text (typed, handwritten or printed) into machine-readable text data.

• Paradata
In a survey, it refers to auxiliary data collected about interviews and survey processes. Examples of paradata include (but are not limited to) the duration of the interview, the time taken to ask each question, and negative screening.

• Parallax Error
This is an error caused in reading a measurement (for example, in anthropometry) due to a viewing angle other than one perpendicular to the object being measured.

• Precision
It refers to how closely repeated measurements (or observations) of an object (or indicator) come to duplicating the measured or observed values.

• Pre-testing
It refers to the stage in survey research when survey questions and questionnaires are tested on members of the target/study population to evaluate the reliability and validity of the survey instruments prior to the start of the survey.

• Probability Sampling
It is a sampling technique in which samples are drawn from the target population using methods based on the theory of probability. Each sample selected by this method has a specified probability of selection.

• Primary Sampling Units (PSU)
It refers to the set of sampling units from which units are selected at the first (primary) stage of a multi-stage sampling design.

• Random Forest
Random forest is an ensemble learning method used for classification, regression and other tasks. It operates by constructing numerous decision trees at training time and outputting the class that is the mode of the classes, or the mean/average prediction, of the individual trees.

• Random Imputation
It refers to the process in which observed values of an attribute are drawn randomly from the dataset to impute the missing values of that attribute (see the sketch that follows these definitions).

• Response Error
It represents inaccuracies in responses to questions asked during sample surveys and arises for a number of reasons, including problems with the survey instrument or its implementation and the respondent’s understanding of the questions.
• Response Rate
It refers to the proportion of the sample that responded to a survey out of the total targeted sample.

• Sampling Design
It refers to the methodology used to select sample units for measurement from a specified population and is described by defining the sampling universe, sampling frame, stages of sampling and method of sampling at each stage.

• Sampling Error
It is the deviation between a sample estimate and the population parameter under study, caused by the sampling design or sample selection.

• Sampling Frame
It refers to the list of target population units from which samples are drawn for data collection.

• Semi-structured Questionnaires
It refers to a type of questionnaire containing both open-ended and closed-ended questions, that is, it allows some questions to have pre-specified answers and others to have unspecified answers in text form.

• Simple Random Sampling
A simple random sample is a sampling method in which a sample is selected randomly from a specified population in such a way that each member of the population has an exactly equal chance of being selected into the sample.

• Skewness
It is a measure of the symmetry (or asymmetry) of the frequency distribution of a variable, with respect to its central point.

• Spot-check
A spot-check is a way to ensure data quality in field surveys, in which senior survey staff physically observe interviewers conducting interviews.

• Standard Deviation
It is a statistical quantity that measures the amount of variation (or dispersion) in a set of values of a particular variable; in other words, it measures how far the values disperse from the mean value of the variable.

• Structured Questionnaires
It refers to a type of questionnaire with questions that allow only a pre-specified set of responses for each question.

• Support-vector Machines (SVM)
These are supervised machine learning models with learning algorithms that analyse data for two-group classification problems as well as regression.

• Technical Error of Measurement (TEM)
It is an accuracy index that expresses the error margin in anthropometry and captures both inter-rater and intra-rater variability in anthropometric measurements. It is used to evaluate the accuracy of anthropometry measurers during a training session (see the sketch that follows these definitions).

• Total Survey Error
It refers to the accumulation of all the errors that arise in the design, collection, processing, and analysis of survey data.

• Z-Score
It is a statistical quantity that indicates how far a particular data point is from the mean. Given a set of values, it measures the number of standard deviations below or above the sample mean that a particular value lies.
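The sketches below are illustrative only; they are not drawn from these guidelines, and all dataset, variable and column names used in them (for example, height_cm, monthly_income, screen_q1) are hypothetical. This first sketch, written in Python, computes the dispersion and shape measures defined above (interquartile range, standard deviation, skewness, kurtosis and z-score) for a single survey variable and flags potential outliers using the conventional 1.5 x IQR and |z| > 3 rules.

# Illustrative sketch: dispersion and shape measures for one survey variable,
# plus simple outlier flags. The column name "height_cm" is hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

df = pd.DataFrame({"height_cm": [150.2, 162.5, 158.1, 171.0, 149.8, 210.4, 160.3]})
x = df["height_cm"].dropna()

q1, q3 = x.quantile(0.25), x.quantile(0.75)
iqr = q3 - q1                      # interquartile range
sd = x.std()                       # standard deviation
skw = skew(x)                      # skewness (asymmetry of the distribution)
kur = kurtosis(x)                  # excess kurtosis relative to the normal
z = (x - x.mean()) / sd            # z-scores

# Flag values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR, or with |z| > 3
iqr_outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
z_outliers = x[z.abs() > 3]

print(f"IQR={iqr:.1f}, SD={sd:.1f}, skewness={skw:.2f}, kurtosis={kur:.2f}")
print("IQR-rule outliers:", iqr_outliers.tolist())

The 1.5 x IQR and |z| > 3 thresholds are common defaults; a survey may adopt stricter or looser cut-offs depending on the variable being checked.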
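This second sketch illustrates the Isolation Forest and K-means Clustering entries as an unsupervised check that flags anomalous numeric records, assuming scikit-learn is available. The simulated height/weight data and the contamination and cluster-count settings are assumptions chosen for illustration, not recommendations.

# Illustrative sketch: unsupervised anomaly detection on numeric survey
# responses using an Isolation Forest, with a K-means distance check as an
# alternative. The simulated features (height, weight) are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=[160, 55], scale=[8, 6], size=(200, 2))  # height, weight
X[:3] = [[210, 20], [120, 110], [300, 5]]                   # implausible records

Xs = StandardScaler().fit_transform(X)

# Isolation Forest: anomalies are isolated with short average path lengths
iso = IsolationForest(contamination=0.02, random_state=0).fit(Xs)
iso_flags = iso.predict(Xs) == -1        # True where a record looks anomalous

# K-means check: records far from their nearest cluster centre are suspect
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xs)
dist = np.linalg.norm(Xs - km.cluster_centers_[km.labels_], axis=1)
km_flags = dist > np.quantile(dist, 0.98)

print("Records flagged by both checks:", np.where(iso_flags & km_flags)[0])

Records flagged by such checks would normally be referred back to field check tables or spot-checks for verification rather than deleted automatically.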
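This third sketch illustrates Random Imputation and the Negative Screening Rate: observed values of an attribute are drawn at random to fill its missing entries, and a per-interview negative screening rate is computed from screening responses. All values shown are hypothetical.

# Illustrative sketch: (a) random imputation, drawing observed values of an
# attribute at random to fill its missing entries, and (b) a negative
# screening rate computed from screening responses. Column names are
# hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "monthly_income": [12000, np.nan, 15500, np.nan, 9800, 20000],
    "screen_q1": ["Yes", "No", "No", "Yes", "No", "No"],
    "screen_q2": ["No", "No", "Yes", "No", "No", "No"],
})

# (a) Random imputation: sample donor values from the observed distribution
observed = df["monthly_income"].dropna().to_numpy()
missing = df["monthly_income"].isna()
df.loc[missing, "monthly_income"] = rng.choice(observed, size=missing.sum())

# (b) Negative screening rate per interview: 'No' answers / valid screening questions
screen_cols = ["screen_q1", "screen_q2"]
valid = df[screen_cols].notna().sum(axis=1)
negative = (df[screen_cols] == "No").sum(axis=1)
df["neg_screening_rate"] = negative / valid

print(df)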
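Finally, a sketch for the Technical Error of Measurement (TEM) entry. It uses the commonly cited intra-rater form TEM = sqrt(sum of d squared / 2N), where d is the difference between two repeated measurements on each of N subjects; the measurement values are hypothetical, and the formula is quoted as a general convention rather than a prescription from these guidelines.

# Illustrative sketch: intra-rater Technical Error of Measurement (TEM) from
# two repeated measurements per subject, using the commonly cited form
# TEM = sqrt(sum(d^2) / (2N)). The measurement values are hypothetical.
import numpy as np

round1 = np.array([152.3, 160.1, 148.7, 171.2, 158.4])  # first measurement (cm)
round2 = np.array([152.0, 160.5, 148.9, 170.8, 158.6])  # repeat measurement (cm)

d = round1 - round2
tem = np.sqrt(np.sum(d ** 2) / (2 * len(d)))
relative_tem = 100 * tem / np.mean(np.concatenate([round1, round2]))

print(f"TEM = {tem:.2f} cm, relative TEM = {relative_tem:.2f}%")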
List of Contributors
Population Council