Data Management Training
April, 2017
Part-I
Data Management
Part-II
Data Capturing Tools
Data Management overview outline
Definitions and Principles
Unforeseen Problems and Solution Tools
DM Process and Questionnaire Development
Questionnaire Handling
Procedures for Completion of Questionnaire
Archiving of Questionnaire
Tsegaye Hailu
Definitions and Principles
“Data management” is a general term covering
procedures both for:
– the collection of data at study sites and
– the quality control of those data before and after they
have been submitted to a statistical analysis or
coordinating centre.
Data management includes all aspects of data
planning, handling, analysis, documentation and
storage, and takes place during all stages of a study.
The data management team is responsible for producing
high-quality databases containing high-quality data that
meet operational, clinical and regulatory requirements.
WHY COLLECT THE DATA ?
What do you think is the importance of collecting
health data, or of carrying out a study in general?
WHY COLLECT THE DATA ? (1)
To meet the objectives of the study and of the health and
patient management strategy
Hopefully the study will lead to improvement in:
Government policies on the TB/HIV strategy etc.
Reduction in transmission (spread) of the disease
And therefore a reduction in disease and mortality
WHY COLLECT THE DATA ? (2)
It is very important therefore to pay attention to:
• Data Content
• Accuracy of the Data
• Relevance of the Data
• Completeness of the Data
To Achieve Data Quality:
All Health/Study data/information should be:
- Recorded
- Handled
- Stored
In a way that allows its accurate reporting,
interpretation, and verification.
Leading to:
Credible health research data.
When does Data Management Begin?
Begins with the overall planning process of the research
/ survey, or whatever the purpose is.
Hence, Data Management Team should be involved in
the overall planning of the research or survey.
This will help with important decisions on:
Rate of data collection
Rate of data processing
Project timetable
Calculation of budget
When does Data Management Begin? cont’d
Contributing to Protocol Writing
Contributing to Questionnaire Development
Contributing to Database Design and Writing of Data
Dictionaries
Writing of Data Management and other SOPs
Data Entry, Validation, Cleaning and Archiving of
Research Data & Documents
Preparing data for, and carrying out, Statistical Analysis
Completion of Research / Study Report
Data Management motto: G I G O
GARBAGE IN GARBAGE OUT
• It means that if your data are of poor quality, the
results of any analysis will be unreliable.
• One of the major roles of Data Management is to
minimise error at all stages of the study, not
just at the computing stage.
PROBLEMS
WHAT POSSIBLE PROBLEMS CAN YOU ENVISAGE?
UNFORESEEN PROBLEMS
Data and/or software occasionally get corrupted for
unknown reasons.
Hardware problems include breakdown of the computer
(hard disk) or of the CD or USB drive containing backup
data, or equipment being struck by lightning.
Printer breakdown or faults (most annoying during
report production).
UNFORESEEN PROBLEMS cont’d
Power cuts or UPS breakdown – can lead to loss of data
(an hour's, a day's, a week's or even months' work).
Flooding or fire outbreak destroying forms
Termites eating questionnaire forms
Data security and confidentiality broken.
FIELD AND DATA ENTRY PROBLEMS
Difficulty getting exact dates from subjects.
Dates: _ _ /01/ 98, 1 month ago, weeks ago.
D.O.B and/or age not entered;
time is wasted if forms are poorly filled in.
Weeks recorded for days, months for weeks, and vice versa.
Missing values though a box is provided;
we don’t know whether the question was asked or not.
An unanswered Y or N does not mean the answer is NO.
FIELD AND DATA ENTRY PROBLEMS
Some results recorded in units per µL, others per L
A skip (jump) condition applies, yet people fill in the next question
Consistency, e.g. 1st Name before 2nd Name, not mixed
up; 1 for Y, 2 for N, not mixed up
Writing legibly
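The mixed-units problem (some counts recorded per µL, others per L) can be reduced by storing the unit next to each value and converting everything to one reporting unit. A minimal sketch in Python; the function name `to_per_litre` and the unit labels are illustrative assumptions, not part of any study SOP:

```python
# Conversion factors to "count per litre" (1 L = 1,000 mL = 1,000,000 uL).
# The unit labels below are hypothetical codes chosen for this example.
FACTORS_TO_PER_LITRE = {"per_uL": 1_000_000, "per_mL": 1_000, "per_L": 1}


def to_per_litre(value, unit):
    """Convert a count recorded per uL, mL or L to a count per litre."""
    if unit not in FACTORS_TO_PER_LITRE:
        # An unknown unit is flagged instead of being silently accepted.
        raise ValueError(f"Unknown unit: {unit!r}")
    return value * FACTORS_TO_PER_LITRE[unit]


print(to_per_litre(6000, "per_uL"))  # a count of 6000 per uL, expressed per litre
```

Rejecting unknown units, instead of guessing, keeps the decision with the data manager and the field team.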
How Can Data Management Solve this Problem?
Use tools at its disposal to prevent as much as
it possibly can, the unforeseen problems.
“PREVENTION IS BETTER THAN CURE”
TOOLS FOR THE PROBLEMS (1)
DETECTING ERRORS IN DATA
Manual Checking: manually going through forms, by
interviewers (field workers) and data supervisors (if any)
Checking During Data Entry: check files or programs
written (interactive checking)
Checking After Data Entry: (batch checking)
Validation (and/or Verification)
TOOLS FOR THE PROBLEMS (2)
DETECTING ERRORS IN DATA
VERIFICATION:
• Used to ensure that data entered is actually data
on the questionnaire. This is normally
accomplished by double entry (entry by two
different clerks).
• Verified data does not necessarily mean accurate
data. If data is invalid from the field it will be
verified correctly using double entry but will still
remain invalid.
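Verification by double entry amounts to comparing the two clerks' files field by field and listing every disagreement for checking against the paper form. A minimal sketch in Python, assuming each entry file has already been loaded into a dictionary keyed by subject ID; the names `compare_entries`, `clerk_a` and `clerk_b` are hypothetical:

```python
def compare_entries(first, second):
    """Return (record_id, field, value_1, value_2) for every disagreement."""
    discrepancies = []
    # Union of IDs so that a record missing from one file is also reported.
    for record_id in sorted(set(first) | set(second)):
        rec1 = first.get(record_id, {})
        rec2 = second.get(record_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            if rec1.get(field) != rec2.get(field):
                discrepancies.append(
                    (record_id, field, rec1.get(field), rec2.get(field))
                )
    return discrepancies


# Example data: the second clerk transposed the age of subject 001.
clerk_a = {"001": {"age": "19", "sex": "F"}, "002": {"age": "45", "sex": "M"}}
clerk_b = {"001": {"age": "91", "sex": "F"}, "002": {"age": "45", "sex": "M"}}

for row in compare_entries(clerk_a, clerk_b):
    print(row)
```

An empty list only shows that both clerks typed the same thing. If the form itself was wrong, both entries agree and the value is still invalid.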
TOOLS FOR THE PROBLEMS (3)
DETECTING ERRORS IN DATA
VALIDATION: A means of ensuring that data entered
into the data file is valid according to some criteria
arrived at by an expert in the field.
TOOLS FOR THE PROBLEMS (4)
DATA PROCESSING PROCEDURES
Good reception and storage of questionnaires.
Ensure they are not damaged or dirty
Good data entry – good choice of personnel,
training and software
Verification or Double Entry (if used)
TOOLS FOR THE PROBLEMS (5)
DATA PROCESSING PROCEDURES
Frequent checking and/or editing of data in
preparation for analysis
Frequent Backing up of data (having copies
elsewhere)
Archiving of data
DM: ISSUES IN DEVELOPING COUNTRIES
Questionnaire designs, data capturing tools, storage and
archiving are not optimal and need improvement
Our calendar dates differ from the rest of the world and need
synchronizing with those of other countries
Use of unlicensed software for entering and analyzing
data.
No formal training of staff in the latest database technologies
and statistical software
CONCLUSION
Data management plays the singular and central role
as the link between all aspects and disciplines of any
project from the field work to the laboratory, clinic,
regulatory bodies, sponsors and the statistics.
Do not start any study without first consulting a Data
Management Unit.
Next: additional slides on
“DM Process and Questionnaire Development”
DATA MANAGEMENT IN OR FORMS
Questionnaire / Form:
Participants’ data are collected in the study questionnaire with
unique identifiers on each form and specimen label.
Data management and quality assurance:
Raw data from questionnaires are double entered with
programmed computer checks to identify data entry errors.
Responsibility:
It is the investigators’ responsibility to ensure accuracy,
legibility and completeness of data entry in the questionnaire
and in all other required report forms and logs.
DATA MANAGEMENT PROCESS
[Flow chart, boxes in order: Protocol; pQES development; eQES design and
validation, with Data Entry Instructions; Data Entry; Data Validation;
Clean File; Statistical Analysis; Completion of Study Report; QES Archiving.
Documentation is ongoing throughout. Queries pass between Data Entry and
Data Validation on one side and the Monitor and Investigator on the other.]
Purpose of Designing Questionnaire
Collects relevant data in a specific format,
in accordance with the protocol and in
compliance with regulatory requirements (IEC/IRB)
Allows for efficient and complete data processing,
analysis and reporting
Facilitates the exchange of data across projects
and organizations esp. through standardization.
Questionnaire Relationship to Protocol
Protocol determines what data should be collected
on the questionnaire
All data must be collected on the questionnaire if
specified in the protocol
Data that will not be analyzed should not appear
on the questionnaire/CRF
Questionnaire Development Process
Designer: Drafts questionnaire from protocol
Reviewers: questionnaire review meeting; comments
back to designer
Designer: Finalizes and prints questionnaire for use
Key identifying information: MUST HAVES
Study Number
Site/Center Number
Subject identification number
Questionnaire Development cont’d
Guidelines(SOP will be shown)
• Collect data with all users in mind
• Collect data outlined in the protocol
• Be clear and concise with your data questions
• Avoid duplication
• Request minimal free text responses
Questionnaire Development cont’d
Guidelines (cont..)
• Provide units to ensure comparable values
• Provide instructions to reduce misinterpretation
• Provide “choices” for each question;
this allows for computer summarization
• Use “None” and “Not done” where appropriate
Poorly Designed Questionnaire
• Data not collected
• Collected too much data – Wasted resources in
collection and processing
• Database may require modification
• Data Entry process impeded
• Need to edit data
• Target dates are missed
Guidelines for filling CRF
Have clear instructions on how to complete the CRF
• All sections must be completed at the time of subject visit
or as soon as results are available
• All entries must be attributable, accurate, legible and
complete
• Entries are made in BLACK BALL POINT PEN
• Incorrect entries are crossed out with a single line, dated
and initialed, and explained where necessary
• No “WHITE OUT”, erasers, ink pens or pencils
Submitting Questionnaire
“The investigator should ensure the accuracy, completeness,
legibility, and timeliness of the data reported to the sponsor in
the CRFs (Questionnaire) and in all required reports.”
ICH-GCP 4.9.1
Common errors in submitted questionnaires include:
corrections not dated / initialed
incorrect data
dates in wrong format
missing entries
wrong units
use of wrong colour pen
data for wrong subject
missing signatures
lack of consistency between parts of the CRF
DM AND Questionnaire (3)
QES HANDLING: Reception and Data Entry (I)
• DM/DEC checks the number of forms and the corresponding PINs
in the log book.
• DM/DEC signs for the number of forms and the date received.
Field Co-ordinator signs to log out the forms.
• Data are entered using the agreed software.
– Double entry is done by two data entry clerks using two
computers.
Questionnaire Safety and Precautions
• Keep questionnaires in a well-protected location.
• Do not give questionnaires to study participants.
• Store questionnaire binders in metal cabinets.
• Only authorized study personnel should have access to
questionnaires.
PROCEDURE
1. State of the Questionnaires:
• Verify that each questionnaire page conforms to the
procedures to be performed on that study day.
• Each time a questionnaire page is completed, verify
that it corresponds to the correct study participant and
record the participant study number on all CRF pages.
• Verify that all questionnaire pages are present
• Verify that questionnaire pages are not damaged
PROCEDURE 2 CONT’D
Missing information should be recorded in the entry
field as follows:
NA = data not applicable
ND = evaluation or assessment was
applicable but not done
NK = information requested is unknown
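At data entry, these codes let a program tell a deliberately coded missing value apart from a field that was simply left blank. A minimal sketch in Python; the function name `classify_entry` and the category labels "blank" and "value" are assumptions made for illustration:

```python
# Missing-data codes as defined on the slide.
MISSING_CODES = {
    "NA": "data not applicable",
    "ND": "applicable but not done",
    "NK": "information unknown",
}


def classify_entry(raw):
    """Classify a raw field entry as 'blank', a missing-data code, or 'value'."""
    text = raw.strip().upper()
    if text == "":
        # A blank needs a query: we cannot tell whether the question was asked.
        return "blank"
    if text in MISSING_CODES:
        return text
    return "value"


for entry in ["37.5", "nd", "   "]:
    print(repr(entry), "->", classify_entry(entry))
```

Only the "blank" category should generate a query back to the field, since the three codes already explain why the value is absent.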
PROCEDURE 3 CONT’D
Use the 24-hour clock to record time (e.g. 22:40 instead
of 10:40 PM). One day runs from 00:00 (midnight)
to 23:59 (one minute before the next midnight).
A time specification of 24:00 is invalid.
Numbers should be right-justified and recorded using
leading zeros when necessary.
Numbers should contain zeros in the tens, hundreds
and thousands columns as necessary (i.e. there should
not be blank fields).
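The time and number rules above are easy to enforce in an entry program. A minimal sketch in Python, assuming times are typed as HH:MM text; the helper names `is_valid_time` and `pad_number` are hypothetical:

```python
import re

# Hours 00-23 and minutes 00-59, always two digits each.
TIME_PATTERN = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")


def is_valid_time(text):
    """True for HH:MM between 00:00 and 23:59; 24:00 is rejected."""
    return TIME_PATTERN.fullmatch(text) is not None


def pad_number(value, width):
    """Right-justify a non-negative whole number using leading zeros."""
    return str(value).zfill(width)


print(is_valid_time("22:40"), is_valid_time("24:00"))
print(pad_number(7, 3))
```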
PROCEDURE 4 CONT’D
• Record numbers using decimals only (i.e., not fractions).
• Values should be recorded using the units specified on
the questionnaire.
• Data reported on the questionnaires that are derived
from source documents should be consistent with the
source documents or the discrepancies should be
explained.
MAKING CORRECTIONS:
a. Authorized actions:
• Cross out the wrong entry with a single line
• Write the correct entry alongside/above/under the
wrong entry
• Initial the correction
• Date the correction
b. Prohibited actions:
• Using correction fluid
• Erasing or overwriting entries
• Intentionally entering false data
• Making illegible entries
Examples of Entering Data onto CRFs
1. Entering the Data:
Specify: This is text
– The text field, represented by a thin line. This
is where you can enter text.
Temperature: °C
– The numerical data field. Use these fields to
enter numerical data. Enter leading zeros if
there are extra boxes.
Examples of Data onto CRFs 1 cont’d
__/__/2004
dd mm yyyy
– The date field. Please record the date in the
European format (i.e. day/month/year).
___ ___:___ ___
24 hours
– The time field. Please record the time in 24-hour
clock format.
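A date written in day/month/year order can be checked at entry so that incomplete or impossible dates are flagged instead of stored. A minimal sketch in Python using the standard `datetime` module; the function name `parse_ddmmyyyy` is hypothetical:

```python
from datetime import datetime


def parse_ddmmyyyy(text):
    """Return a date for a complete dd/mm/yyyy entry, or None if it is invalid."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        # Covers blanks, partial dates and impossible dates such as 31/02/2004.
        return None


print(parse_ddmmyyyy("09/05/2016"))
print(parse_ddmmyyyy("31/02/2004"))
```

Returning None, instead of guessing a missing day or month, leaves the decision on partial dates to the study's own rules.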
Examples of Data onto CRFs 1 cont’d
2. Correction procedure
_08_/_05_/2016 TH
09/05/2016
If an error has occurred, please use the following correction
procedure:
· Draw a single line through the error, so the original data
can still be seen
· Write the correct value next to the original entry
· Initial and date the correction
Data Entry and Validation
Data processing errors are errors that occur
after data have been collected. Examples of
data processing errors include:
• Transpositions (e.g., 19 becomes 91)
• Copying errors (e.g., 0 (zero) becomes O)
• Coding errors (e.g., the wrong code assigned to a racial group)
• Routing errors (e.g., the interviewer asks the wrong
question or asks questions in the wrong order)
• Consistency errors (contradictory responses, such as
the reporting of a hysterectomy after the respondent
has identified himself as a male)
• Range errors (responses outside the allowed range)
Data Entry and Validation cont’d
To prevent such errors, you must identify the stage
at which they occur and correct the problem.
Methods to prevent data entry errors include:
• Manual checks during data collection (e.g., checks
for completeness, handwriting legibility)
• Range and consistency checking during data entry
(e.g., preventing impossible results, such as ages
greater than 110)
• Double entry and validation following data entry
• Screening for outliers during data
analysis
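Range and consistency checks of the kind listed above can be written as a small validation routine that runs on each record. A minimal sketch in Python, using the two examples from the slides (age above 110, hysterectomy reported for a male); the field names and the function `validate_record` are assumptions for illustration:

```python
def validate_record(record):
    """Return a list of problems found in one participant record."""
    problems = []

    # Range check: age must be present and between 0 and 110.
    age = record.get("age")
    if age is None:
        problems.append("age missing")
    elif not 0 <= age <= 110:
        problems.append(f"age out of range: {age}")

    # Consistency check: contradictory answers across two fields.
    if record.get("sex") == "M" and record.get("hysterectomy") == "Y":
        problems.append("hysterectomy reported for a male participant")

    return problems


print(validate_record({"age": 130, "sex": "M", "hysterectomy": "Y"}))
```

The limits themselves should come from an expert in the field, as the validation slide notes; the code only applies them.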
DM AND Questionnaire (6)
BACK-UPs & ARCHIVING :
• Data entered should be backed up on the DM’s computer/
CD/USB at the end of each day/week/month as appropriate.
• After entry, questionnaires should be filed and kept in a
locked cabinet. The DM & PI keep the keys to the cabinets.
• Final cleaned data are sent to the Institute’s or a designated
statistician for analysis and presentation.
• At the end of the study, all questionnaires should be
archived by the Investigator and/or Sponsor.
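A backup is only useful if the copy is intact, so a backup routine can compare checksums of the original and the copy. A minimal sketch in Python using only the standard library; the function names and the folder layout are assumptions, not a prescribed procedure:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup_file(source, backup_dir):
    """Copy source into backup_dir and confirm the copy is byte-identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also keeps the file's timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

In practice the backup folder would sit on a different device (CD, USB or another computer), in line with the slide's advice to keep copies elsewhere.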
Data Capturing Tools
Data Capturing Tools outline
Web based system
REDCap
OpenClinica
Personal Digital Assistant (PDA)
Stand alone Database
Microsoft Access
SPSS
Epi Info
EpiData
REDCap
• REDCap was developed by an informatics team at
Vanderbilt University with ongoing support from NCRR
and NIH grants
• Research Electronic Data Capture
• browser-based
• designed to address common problems for academic
biomedical researchers hoping to use electronic databases
• And also clinical & translational research databases
• widely used in the academic research community
• Commercial systems are often too expensive for investigator-initiated studies or other
studies at a smaller scale
REDCap provides
• User-friendly, web-based case report forms
• Real-time data entry validation (e.g. for data
types and range checks)
• Audit trails, and the ability to set up a calendar to
schedule and track critical study events such as
blood draws, participant visits, etc.
• Also, designated users can assign different levels
of access for each member of the research team.
Advantages of REDCap:
• Secure and web-based. Input data from anywhere in the world with secure web authentication,
data logging, and Secure Sockets Layer (SSL) encryption.
• Fast and flexible. Conception to production-level database in less than one day.
• Multisite access. Projects can be used by researchers from multiple sites and institutions.
• Fully customizable. You are in total control of shaping your database or survey.
• Advanced question features. Auto-validation, branching logic, and stop actions.
• Mid-study modifications. You may modify the database or survey at any time during the study.
• Data import functions. Data may be imported from external data sources to begin a study or to
provide mid-study data uploads.
• Data comparison functions. Double data entry / Blinded data entry.
• Export survey results to common data analysis packages. Export your
data to Microsoft Excel, SAS, STATA, R, or SPSS for analysis.
• Save your survey or forms as PDFs. Generate a PDF version for printing in order to collect
Login interface
After login
Different Features
OpenClinica
• The world’s most widely-used,
open-source software for clinical research
• 1st released in 2005
• Designed to meet the diverse needs of
modern research environments
• Built as a lightweight, extensible, and modular
application
• Web browser-based
Important Features of OpenClinica
• Organization of research by study protocol and site.
• Dynamic generation of web-based CRFs from portable Excel
templates.
• Management of longitudinal data for recurring patient visits
• Data import/export tools for migration of study datasets.
• Interfaces for data query and retrieval across subjects, time, and
clinical parameters
• Compliance with regulatory guidelines e.g. 21 CFR Part 11
• Built on robust and scalable technology infrastructure interoperable
with relational databases
Login Interface
After login and different features
After login and different project
Working with OpenClinica
• Policy determination needed
• Required human and material resources allocated
• When know-how is established, utilization requires only
5 main steps:
– Designing
– Creating CRFs
– Event definitions
– Data Entry
– Data Extraction
Designing CRFs
Done in Excel using a blank CRF template
provided by OpenClinica
Uploading CRFs
The Excel sheet is then uploaded onto
OpenClinica
Event Definition
Data Entry
EpiData
EpiData is a Windows-based program for:
Designing data structures
Simple data entry
Entering data and applying validation rules
Editing / correcting data already entered
Asserting that the data are consistent across variables
Printing or listing data for documentation of error-
checking and error-tracking
Comparing data entered twice
Exporting data for further use in statistics
EpiData Main Features
• Questionnaire design
• Make data file
• Check
• Data entry
• Data export
• Documentation
How to work with EpiData?
Use the “Work Process” toolbar
Define Data
Point at the “Define data” part and choose “new qes file”
Save the empty file and give it the name first.qes
EpiData(1)
1. Now write the lines shown in the Epi-Editor.
Explanation: Each line has three elements:
A. Name of variable (e.g. v1)
B. Text describing the variable
(e.g. sex or "day of birth")
C. An input definition, e.g. ## for two digits
2. Save the file again as done in point 1
3. Now preview the data form
EpiData(2)
Close the form as well as the Epi-Editor
Proceed to next section
Create DataFile
Accept the “first.qes” and “first.rec” names
for “make datafile”
Data form saved as first.qes
Data file which will contain the data, saved as
first.rec.
Add checks of Data Entry
Click “Add checks of Data Entry”
Checks specify rules for data entry, for example:
• Range: 10-80 plus the single value 99
• Jumps: on value 1 go to s2, written 1>s2
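The logic behind these two checks can be illustrated outside EpiData. The sketch below is plain Python, not EpiData check-file syntax; the field names `v1`, `v2` and `s2` and both function names are assumptions used only to show how a legal range with one extra code and a jump rule behave:

```python
def in_legal_range(value, low=10, high=80, extra=(99,)):
    """Accept values from low to high inclusive, plus listed special codes."""
    return low <= value <= high or value in extra


def next_field(current, value, order, jumps):
    """Return the field to visit next, honouring any jump rule for this value."""
    target = jumps.get((current, value))
    if target is not None:
        return target
    position = order.index(current)
    # Fall through to the following field, or None after the last one.
    return order[position + 1] if position + 1 < len(order) else None


order = ["v1", "v2", "s2"]        # hypothetical entry order of the fields
jumps = {("v1", 1): "s2"}         # on value 1 in v1, jump to s2

print(in_legal_range(45), in_legal_range(99), in_legal_range(85))
print(next_field("v1", 1, order, jumps), next_field("v1", 2, order, jumps))
```

Enforcing the jump in software prevents the field problem noted earlier, where a skip condition applies yet the next question is still filled in.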
Now add value labels to a variable
Data Entry
• Continue with Enter Data
• Simply activate “Enter data” on the toolbar
and accept first.rec for data entry
• Double Entry of Data:
Tools -> prepare double data entry
Export, Analysis and options
• Export to any data format
Data management and analysis using Stata
• Running Stata
• Stata windows shown below
Data management using Stata(6)
• Simple linear regression – regress, rvfplot,
other diagnostics
• Correlation – corr, spearman, ktau – I tend not
to use corr because of the sensitivity to the
normality assumption for tests and confidence
intervals
• Only pwcorr, and not corr, provides a test of
significance
THANK YOU