Data Analysis
Contents
◦ Data Processing: Editing, Coding, Classification,
Tabulation, and presentation
◦ Employing Statistical Tools for Data Analysis
Overview of descriptive and inferential statistics
Parametric and non-parametric tests
◦ Interpretation of Data
◦ Utilizing Computers for Data Processing (using
SPSS): An Overview
Plan for processing and analysis:
Quantitative data
• Data Processing: Editing, Coding,
Classification, Tabulation, and presentation
• Levels of measurement [Nominal, Ordinal,
Interval and Ratio]
• Employing Statistical Tools for Data Analysis
Descriptive vs. inferential statistics
Parametric and non-parametric tests
• Interpretation of Data
• Utilizing Computers for Data Processing
(using SPSS, Stata, etc.)
SEM, meta-analysis
Qualitative Data Analysis
Best used for in-depth understanding of
the intervention
Used for any non-numerical data collected:
– unstructured observations
– open-ended interviews
– analysis of written documents
– focus groups transcripts
– diaries, observations
Analysis is challenging
Take care to ensure accuracy (validity concern)
Computer help for qualitative data
analysis
Software packages to help you organize
data [e.g., Qualpro, Hyperqual,
Anthropac, ATLAS.ti, NVivo, etc.]
Search, organize, categorize, and annotate
textual and visual data
Help you visualize the relationships
among data
Linking qualitative and quantitative
data
Should qualitative and quantitative data
and associated methods be linked during
study design?
– How?
– Why?
Refer to the discussion on mixed-methods
design
Data Processing:
Once the data have been collected, the next
step is data processing, generally consisting
of:
◦ Editing,
◦ Coding/recoding
◦ Classification and
◦ Tabulation, including producing tables, graphs,
coefficients, etc.
Data processing requires careful attention
and understanding; otherwise it results in what is
known as GIGO: Garbage In, Garbage Out.
Questionnaire Editing
Editing of data is a process of examining
the collected raw data (especially in
surveys) to detect errors and omissions and
to correct these.
It involves a careful scrutiny of the
completed questionnaires and/or schedules.
It is done to ensure that the information received
is as complete as possible and well arranged
to facilitate coding and tabulation.
Editing requires checking for the following:
a. Completeness: Whether every question has been
answered. Missing answers can be imputed (if possible).
b. Accuracy: Whether every question has an
appropriate answer. Inaccuracy often arises from
carelessness on the part of the enumerator, deliberately
misleading answers, ticking wrong boxes or circling
wrong codes.
c. Uniformity: Failure to give explicit
instructions, or lack of a clear understanding of the
questions, can lead to the same answer being
recorded in different ways. A check on
uniformity helps eliminate this source of error.
Data Coding
Coding is the process of converting
answers to numbers and classifying them
so that responses can be put into a limited
number of categories or classes.
Coding is the primary task in the reduction
of qualitative data.
Coding decisions should usually be made
at the questionnaire design stage.
Six main steps in Coding and
Classifying quantitative data:
a. Classifying responses
b. Allocating codes to each variable
c. Allocating column numbers to each
variable
d. Producing a codebook
e. Checking for coding errors
f. Entering data into computer
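The coding steps can be illustrated with a minimal Python sketch. The variable names, response labels, column numbers and the missing-value code 9 are hypothetical; the codebook here is a plain dictionary recording each variable's column number and the numeric code allocated to each response, and answers not listed in it are flagged as coding errors.

```python
# Hypothetical codebook: variable -> (column number, {response label: numeric code})
CODEBOOK = {
    "sex": (1, {"Male": 1, "Female": 2}),
    "satisfaction": (2, {
        "Very dissatisfied": 1,
        "Dissatisfied": 2,
        "Neutral": 3,
        "Satisfied": 4,
        "Very satisfied": 5,
    }),
}
MISSING = 9  # hypothetical code reserved for "no answer / not codable"


def code_record(answers):
    """Turn one respondent's answers into a row of numeric codes.

    The row is ordered by the column numbers in CODEBOOK; answers that are
    absent or not listed in the codebook get MISSING and are reported.
    """
    row = [MISSING] * len(CODEBOOK)
    errors = []
    for variable, (column, codes) in CODEBOOK.items():
        answer = answers.get(variable)
        if answer is None:
            errors.append(f"{variable}: no answer")
        elif answer in codes:
            row[column - 1] = codes[answer]
        else:
            errors.append(f"{variable}: answer {answer!r} not in codebook")
    return row, errors


print(code_record({"sex": "Female", "satisfaction": "Satisfied"}))  # ([2, 4], [])
print(code_record({"sex": "F"}))  # both columns coded 9, two problems reported
```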
Data Entry
Requirements for Data Entry
1. Definition of Data Dictionary – Giving
names and explanations for each of the
variables to be entered into the database.
2. Defining range: To regulate the magnitude
of the answers entered for each question on
the questionnaire, the researcher needs to limit
the scope of permissible answers and their
flow patterns.
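A minimal sketch of a data dictionary with defined ranges, assuming each entered record is a Python dict. The variable names, labels and limits are hypothetical; in practice such rules are usually set up in the data-entry software itself.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """One data-dictionary entry: name, explanation and permitted range."""
    name: str
    label: str
    low: int
    high: int


# Hypothetical data dictionary; names, labels and ranges are illustrative only
DATA_DICTIONARY = [
    Variable("age", "Age of respondent in completed years", 15, 99),
    Variable("hh_size", "Number of household members", 1, 30),
]


def check_entry(record):
    """Return the problems found in one entered record (empty list if it passes)."""
    problems = []
    for var in DATA_DICTIONARY:
        value = record.get(var.name)
        if value is None:
            problems.append(f"{var.name}: missing")
        elif not var.low <= value <= var.high:
            problems.append(f"{var.name}={value} outside {var.low}-{var.high}")
    return problems


print(check_entry({"age": 34, "hh_size": 5}))   # []
print(check_entry({"age": 340, "hh_size": 5}))  # ['age=340 outside 15-99']
```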
Data editing and cleaning
Data editing and cleaning after data entry is
like drying and ironing washed clothes
before putting them on.
Wrong entries, made either in the field or during data
coding and entry, need to be checked and removed
before data analysis begins.
Cleaning can be done by examining patterns in
the data: identifying outliers and unexpected
responses by running frequencies and
cross-tabulating related variables.
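These checks can be sketched with Python's standard library. The records, codes and the "male and pregnant" rule below are hypothetical examples of an unexpected code (revealed by a frequency run) and an impossible combination (revealed by cross-tabulating related variables).

```python
from collections import Counter

# Hypothetical coded records: (sex, pregnant)
# sex: 1 = Male, 2 = Female; pregnant: 1 = Yes, 2 = No
records = [(1, 2), (2, 1), (2, 2), (1, 1), (3, 2), (2, 2)]
VALID_SEX = {1, 2}


def frequencies(values):
    """Frequency count of each distinct code, sorted by code."""
    return dict(sorted(Counter(values).items()))


def crosstab(pairs):
    """Count of each (row code, column code) combination, sorted."""
    return dict(sorted(Counter(pairs).items()))


sex_freq = frequencies(sex for sex, _ in records)
table = crosstab(records)

# Unexpected response: a sex code that is not in the codebook
unexpected = [code for code in sex_freq if code not in VALID_SEX]
# Impossible combination: male (1) and pregnant (1); report record positions
impossible = [i for i, (sex, preg) in enumerate(records) if sex == 1 and preg == 1]

print(sex_freq)    # {1: 2, 2: 3, 3: 1}
print(table)
print(unexpected)  # [3]
print(impossible)  # [3]
```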
Four broad considerations of data
analysis
Identification of the level of measurement of
each variable
Number of variables that each particular
piece of analysis requires
Types of analysis required: descriptive vs
analytic
Application of ethical principles of full,
fair, appropriate and challenging analysis
to the selection of data to be analyzed and
reported.
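Because the level of measurement determines which summaries are meaningful, one common rule of thumb (nominal: mode; ordinal: median; interval and ratio: mean) can be sketched as below. The function name and example values are hypothetical, and this is a simplification rather than a complete decision rule.

```python
from statistics import mean, median_low, multimode


def central_tendency(values, level):
    """Pick a measure of central tendency permitted by the level of measurement.

    median_low is used for ordinal codes so the result is always an
    observed category rather than an average of two categories.
    """
    if level == "nominal":
        return "mode", multimode(values)
    if level == "ordinal":
        return "median", median_low(values)
    if level in ("interval", "ratio"):
        return "mean", mean(values)
    raise ValueError(f"unknown level of measurement: {level!r}")


# Hypothetical coded variables
print(central_tendency([1, 2, 2, 3], "nominal"))      # ('mode', [2])
print(central_tendency([1, 2, 3, 4], "ordinal"))      # ('median', 2)
print(central_tendency([1200, 1800, 2400], "ratio"))  # ('mean', 1800)
```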
Tabulation and data analysis
Tabulation ranges from producing simple
frequency and contingency tables to
constructing complex, multi-dimensional tables.
Tabulation is often described as the skeleton
of survey research.
A researcher should have some knowledge
of quantitative data analysis procedures to
make sense of this skeleton.
A researcher who lacks sufficient knowledge
of data analysis can consult someone who
has sufficient knowledge of data processing.
Descriptive vs. inferential analysis
Analysis of data means computing
certain indices or measures and
searching for patterns of relationship that
exist among data groups.
Analysis, particularly in case of survey or
experimental data, involves estimating the
values of unknown parameters of the
population and testing of hypotheses for
drawing inferences.
Descriptive Analysis
Descriptive statistics are the numerical, graphical
and tabular techniques for organizing, presenting
and analyzing data.
Descriptive Statistics are used to describe the
basic features of the data in a study. They provide
simple summaries about the sample and the
measures. Together with simple graphic analysis,
they form the basis of virtually every quantitative
analysis of data. With descriptive statistics you
are simply describing what is and what the data
shows.
• Inferential Statistics investigate questions,
models and hypotheses. In many cases, the
conclusions from inferential statistics extend
beyond the immediate data alone. For
instance, we use inferential statistics to try to
infer from the sample data what the population
looks like.
• Further, we use inferential statistics to make
judgments as to whether an observed
difference between groups is a dependable one
or one that might have happened by chance
alone in a study.
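To make the idea of a "dependable" difference concrete, the sketch below computes Welch's two-sample t statistic, one common parametric test, using only Python's standard library. The group data are hypothetical. Turning t into a p-value requires the t distribution (via tables, or for example `scipy.stats.ttest_ind(a, b, equal_var=False)` in SciPy), which this sketch does not compute.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom.

    Uses sample variances (n - 1 denominator) and does not assume the two
    groups have equal variances.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical scores for two groups
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
print(welch_t(group_a, group_b))  # (-2.0, 8.0)
```

A larger |t| means the observed difference is large relative to its sampling variability, and so less likely to have arisen by chance alone.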
Types of Descriptive Statistics
Type | Function | Examples
Tables | Provide a frequency distribution for a variable or variables | Frequency table, cross-tabulations
Example – Frequency table
Class | Frequency | Relative frequency
0 to less than 100 | 30 | 0.163
100 to less than 200 | 38 | 0.207
200 to less than 300 | 50 | 0.272
300 to less than 400 | 31 | 0.168
400 to less than 500 | 22 | 0.120
500 to less than 600 | 13 | 0.070
Total | 184 | 1.000
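A frequency table of this kind can be produced from raw values with a short sketch. It assumes equal-width classes starting at 0 ("0 to less than 100", and so on); the input values are hypothetical.

```python
from collections import Counter


def frequency_table(values, width, n_classes):
    """Group values into classes [k*width, (k+1)*width) for k = 0..n_classes-1.

    Returns rows of (lower bound, upper bound, frequency, relative frequency).
    """
    upper_limit = width * n_classes
    if any(v < 0 or v >= upper_limit for v in values):
        raise ValueError(f"all values must lie in [0, {upper_limit})")
    counts = Counter(int(v // width) for v in values)  # class index per value
    total = len(values)
    return [
        (k * width, (k + 1) * width, counts[k], round(counts[k] / total, 3))
        for k in range(n_classes)
    ]


# Hypothetical raw values
for row in frequency_table([5, 50, 150, 250, 260], width=100, n_classes=3):
    print(row)
# (0, 100, 2, 0.4)
# (100, 200, 1, 0.2)
# (200, 300, 2, 0.4)
```

Dividing the class counts 30, 38, 50, 31, 22 and 13 by 184 and rounding to three decimals gives 0.071 for the last class rather than 0.070; the table's 0.070 appears to have been adjusted so that the relative frequencies total exactly 1.000.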
Cross Tabulations
A table representing the cross-classification of two or
more categorical variables. The levels of each variable
are arranged in a grid, and the number of observations
falling into each category (i.e., cell) is recorded.
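Monthly salary by sex (independent variable: sex)
Monthly Salary | M | F | Row Total
Less than 1000 | 65 (21.0) | 105 (31.8) | 170 (26.6)
1001 – 2500 | 97 (31.3) | 123 (37.3) | 220 (34.4)
2501 – 5000 | 103 (33.2) | 77 (23.3) | 180 (28.1)
5001 and above | 45 (14.5) | 25 (7.6) | 70 (10.9)
Column Total | 310 (100.0) | 330 (100.0) | 640 (100.0)
Figures in parentheses are column percentages.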
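The percentages in parentheses can be reproduced from the cell counts: each count is divided by its column total (310 for M, 330 for F, 640 overall) and rounded to one decimal. A minimal sketch; the function name is illustrative, and the third row label assumes 2501 as its lower bound.

```python
def column_percentages(table):
    """Add a row-total column, then express each cell as a % of its column total.

    table maps a row label to a list of cell counts (one per column).
    """
    with_totals = {label: counts + [sum(counts)] for label, counts in table.items()}
    n_cols = len(next(iter(with_totals.values())))
    col_totals = [sum(row[j] for row in with_totals.values()) for j in range(n_cols)]
    return {
        label: [round(100 * row[j] / col_totals[j], 1) for j in range(n_cols)]
        for label, row in with_totals.items()
    }


# Cell counts from the salary-by-sex cross-tabulation (columns: M, F)
salary_by_sex = {
    "Less than 1000": [65, 105],
    "1001 - 2500": [97, 123],
    "2501 - 5000": [103, 77],
    "5001 and above": [45, 25],
}
for label, pcts in column_percentages(salary_by_sex).items():
    print(label, pcts)
```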
[Figures: pie chart (19.0% "Enjoy job, but it is not on my career path"; 23.0% "My job just pays the bills"); quarterly chart from 1Q 2003 to 1Q 2004; histogram of relative frequency against sales; chart in millions of tons marked with the mean (average)]
Example - Mode
Sorted sales values (n = 20): 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
[Dot plot of these values on a number line from 6 to 24]
Mode = 16 (the most frequent value; it occurs three times)
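The mode can be found by counting occurrences. The sketch below uses the same 20 sales figures as the mean and range examples, with Python's `statistics.multimode` (Python 3.8+) as a cross-check.

```python
from collections import Counter
from statistics import multimode

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def modes(values):
    """Return every value that occurs most often (a distribution can have several modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes(sales))                      # [16]
print(Counter(sales)[16])                # 3
print(multimode(sales) == modes(sales))  # True
```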
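Population mean (a parameter, i.e., a measure of a population feature):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample mean (a statistic, i.e., a measure that characterizes a sample):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$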
Example – Mean
Sales (n = 20): 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$$
Mean is a computed average.
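The same calculation as a Python sketch. The function gives the population mean μ when the list holds the whole population (dividing by N) and the sample mean x̄ when it holds a sample (dividing by n); the arithmetic is identical.

```python
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


print(sum(sales), len(sales), mean(sales))  # 317 20 15.85
```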
Measures of Dispersion
Range
Difference between maximum and minimum
values (Max – Min)
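Variance (s²)
Average* of the squared deviations from the
mean
Standard Deviation (s)
Square root of the variance (s = √s²)
* Definitions of population variance and sample variance differ slightly: the population variance (σ²) divides the sum of squared deviations by N, the sample variance (s²) by n − 1.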
Example - Range
Sales | Sorted sales | Rank
9 | 6 | 1 (Minimum)
6 | 9 | 2
12 | 10 | 3
10 | 12 | 4
13 | 13 | 5
15 | 14 | 6
16 | 14 | 7
14 | 15 | 8
14 | 16 | 9
16 | 16 | 10
17 | 16 | 11
16 | 17 | 12
24 | 17 | 13
21 | 18 | 14
22 | 18 | 15
18 | 19 | 16
19 | 20 | 17
18 | 21 | 18
20 | 22 | 19
17 | 24 | 20 (Maximum)
Range: Maximum - Minimum = 24 - 6 = 18
Variance and Standard Deviation
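Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} = \frac{\sum_{i=1}^{N}x_i^2 - \frac{\left(\sum_{i=1}^{N}x_i\right)^2}{N}}{N}$$
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n-1}$$
Standard deviation:
$$\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}$$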
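The dispersion measures for the sales example can be checked with the sketch below. It applies both the definitional formula and the computational (shortcut) formula for the sample variance and reports the result in the mean ± s form; the function names are illustrative.

```python
import math

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def dispersion(values, population=False):
    """Range, variance and standard deviation via the definitional formula.

    population=True divides by N (sigma squared); otherwise by n - 1 (s squared).
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    var = ss / n if population else ss / (n - 1)
    return {"range": max(values) - min(values), "variance": var, "sd": math.sqrt(var)}


def shortcut_sample_variance(values):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


stats = dispersion(sales)
print(stats["range"])                             # 18
print(round(stats["variance"], 4))                # 19.9237
print(round(shortcut_sample_variance(sales), 4))  # 19.9237
print(f"{sum(sales) / len(sales):.2f} ± {stats['sd']:.2f}")  # 15.85 ± 4.46
```

Both variance formulas agree; the shortcut form avoids computing each deviation but can lose precision with very large values.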
Standard Deviation
The most commonly used method of
summarizing dispersion in statistics
Measures the typical amount by which values
deviate from the mean
Reflects the degree to which the values in
the distribution differ from the arithmetic
mean
Is usually presented in tandem with the
mean (x̄ ± s), as its meaning is difficult to
interpret without the mean value.
Measurement of Shape of Distribution:
Skewness and Kurtosis
Skewness
◦ Measure of asymmetry of a frequency distribution
Skewed to left
Symmetric or unskewed
Skewed to right
Kurtosis
◦ Measure of flatness or peakedness of a frequency
distribution
Platykurtic (relatively flat)
Mesokurtic (normal)
Leptokurtic (relatively peaked)
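The shape measures can be computed from central moments. This sketch uses the simple moment coefficients, skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3, where m_k is the average of (x - x̄)^k. Statistical packages often report bias-adjusted versions, so their values differ slightly for small samples. The data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n


def skewness(values):
    """Moment coefficient of skewness: < 0 skewed to left, 0 symmetric, > 0 skewed to right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """m4 / m2**2 - 3: < 0 platykurtic, about 0 mesokurtic (normal), > 0 leptokurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical data
print(skewness([1, 2, 3, 4, 5]))                   # 0.0 (symmetric)
print(skewness([1, 1, 1, 2, 10]) > 0)              # True (skewed to right)
print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))  # -1.3 (platykurtic)
```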
[Figures: skewness illustrated by distributions skewed to left, symmetric, and skewed to right]
Kurtosis