Data Analysis

Uploaded by

Moti Gurmessa
Copyright
© © All Rights Reserved
Processing and Analysis of Data

Contents
◦ Data Processing: Editing, Coding, Classification,
Tabulation, and presentation
◦ Employing Statistical Tools for Data Analysis
– Overview of descriptive and inferential statistics
– Parametric and non-parametric tests
◦ Interpretation of Data
◦ Utilizing Computers for Data Processing (using
SPSS): An Overview
Plan for processing and analysis:
Quantitative data
• Data Processing: Editing, Coding,
Classification, Tabulation, and presentation
• Level of measurements [Nominal, Ordinal,
Interval and Ratio]
• Employing Statistical Tools for Data Analysis
– Descriptive vs. inferential statistics
– Parametric and non-parametric tests
• Interpretation of Data
• Utilizing Computers for Data Processing
(using SPSS, STATA, etc)
SEM, Meta Analysis
Qualitative Data Analysis
Best used for in-depth understanding of
the intervention
Used for any non-numerical data collected:
– unstructured observations
– open-ended interviews
– analysis of written documents
– focus group transcripts
– diaries, observations
Analysis is challenging
Take care to ensure accuracy (validity concern)
Computer help for qualitative data
analysis
Software packages help you organize
data [for example, QUALPRO, HyperQual,
ANTHROPAC, ATLAS.ti, NVivo, etc.]
Search, organize, categorize, and annotate
textual and visual data
Help you visualize the relationships
among data
Linking qualitative and quantitative
data
Should qualitative and quantitative data
and associated methods be linked during
study design?
– How?
– Why?
Refer to the discussion on mixed-methods
design
Data Processing:
Once the data have been collected, the next
step is data processing, generally consisting
of:
◦ Editing,
◦ Coding/recoding
◦ Classification and
◦ Tabulations including producing tables, graphs,
coefficients etc.
Data processing requires careful attention
and understanding. Otherwise it results in what is
known as GIGO: garbage in, garbage out.
Questionnaire Editing
Editing of data is a process of examining
the collected raw data (especially in
surveys) to detect errors and omissions and
to correct these.
It involves a careful scrutiny of the
completed questionnaires and/or schedules.
It is done to ensure that the information received
is as complete as possible and well arranged to
facilitate coding and tabulation.
Editing requires checking for the following:
a. Completeness: Check whether every question has
an answer. Missing answers can be
imputed (if possible).
b. Accuracy: Check whether every question has an
appropriate answer. Inaccuracy often arises from
carelessness on the part of the enumerator, deliberately
misleading answers, and ticking wrong boxes or circling
wrong codes.
c. Uniformity: Failure to give explicit
instructions, or an unclear understanding of the
questions, can lead to the same answer being
recorded in different ways. A check on
uniformity helps to eliminate this
source of error.
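The three editing checks above can be sketched in code. The following is only a minimal illustration in Python; the questionnaire fields (`age`, `sex`, `satisfaction`), the valid ranges, and the helper names are hypothetical and do not come from the slides.

```python
# Hypothetical completed questionnaires; None marks an unanswered question.
responses = [
    {"id": 1, "age": 34, "sex": "F", "satisfaction": 4},
    {"id": 2, "age": None, "sex": "male", "satisfaction": 2},
    {"id": 3, "age": 210, "sex": "M", "satisfaction": 9},
]

# Assumed valid values for each question (illustrative only).
VALID_AGE = range(15, 100)
VALID_SATISFACTION = range(1, 6)  # a 1..5 scale
SEX_SYNONYMS = {"m": "M", "male": "M", "f": "F", "female": "F"}


def edit_check(record):
    """Return a list of editing problems found in one questionnaire."""
    problems = []
    # a. Completeness: every question should have an answer.
    for field, value in record.items():
        if value is None:
            problems.append(f"{field}: missing")
    # b. Accuracy: answers should fall inside the allowed range.
    if record["age"] is not None and record["age"] not in VALID_AGE:
        problems.append("age: out of range")
    if record["satisfaction"] not in VALID_SATISFACTION:
        problems.append("satisfaction: out of range")
    # c. Uniformity: the same answer must be recorded the same way.
    if record["sex"] not in ("M", "F"):
        problems.append("sex: non-uniform recording")
    return problems


def make_uniform(record):
    """Return a copy with 'sex' recorded in one standard form."""
    fixed = dict(record)
    fixed["sex"] = SEX_SYNONYMS.get(str(record["sex"]).lower(), record["sex"])
    return fixed


report = {r["id"]: edit_check(r) for r in responses}
```

Here `report` lists the problems per questionnaire, and `make_uniform` shows one possible way to standardize answers that were recorded differently.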
Data Coding
Coding is the process of converting
answers to numbers and classifying
answers accordingly so that responses can
be put into a limited number of categories
or classes.
Coding is the primary task in the reduction
of qualitative data.
Coding decisions should usually be taken
at the design stage of the questionnaire.
Six main steps in coding and
classifying quantitative data:
a. Classifying responses
b. Allocating codes to each variable
c. Allocating column numbers to each
variable
d. Producing a codebook
e. Checking for coding errors
f. Entering data into the computer
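As a rough illustration of steps b to e, a codebook can be written down as a simple mapping. The sketch below is in Python; the variables, the codes, and the missing-value code 9 are assumptions made for the example, not part of the slides.

```python
# Hypothetical codebook: variable name -> column position and value codes.
CODEBOOK = {
    "sex": {"column": 1, "codes": {"Male": 1, "Female": 2}},
    "job_satisfaction": {
        "column": 2,
        "codes": {"Low": 1, "Medium": 2, "High": 3},
    },
}
MISSING = 9  # assumed code for a missing or unclassifiable answer


def code_response(raw):
    """Convert one respondent's verbal answers into a row of numeric codes."""
    row = [MISSING] * len(CODEBOOK)
    for variable, spec in CODEBOOK.items():
        answer = raw.get(variable)
        row[spec["column"] - 1] = spec["codes"].get(answer, MISSING)
    return row


def coding_errors(row):
    """List the variables whose code is not defined in the codebook."""
    errors = []
    for variable, spec in CODEBOOK.items():
        allowed = set(spec["codes"].values()) | {MISSING}
        if row[spec["column"] - 1] not in allowed:
            errors.append(variable)
    return errors


coded = code_response({"sex": "Female", "job_satisfaction": "High"})
```

Keeping the codes and column positions in one place means the same codebook can drive both coding and the check for coding errors.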
Data Entry
Requirements for Data Entry
1. Definition of Data Dictionary – Giving
names and explanations for each of the
variables to be entered into the database.
2. Defining ranges: To regulate the
magnitude of the answers entered for each
question on the questionnaire, the
researcher needs to limit the range of valid
answers and define their flow patterns.
Data editing and cleaning
• Data editing and cleaning after data entry is
like drying and ironing washed clothes
before putting them on.
• Wrong entries made either in the field or during data
coding and entry need to be checked and corrected or removed
before data analysis begins.
• Cleaning can be done by examining patterns in
the data, identifying outliers and
unexpected responses by running frequencies
and cross-tabulating related variables.
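Running frequencies and cross-tabulating related variables can be sketched as follows. This is a minimal Python illustration; the two variables and their codes are hypothetical and chosen only to show how an undefined code and an impossible combination surface.

```python
from collections import Counter

# Hypothetical coded records: (sex, currently_pregnant).
# Assumed codes: sex 1 = male, 2 = female; pregnant 1 = yes, 2 = no.
records = [(1, 2), (2, 1), (2, 2), (1, 1), (7, 2), (2, 2)]

# Frequencies of a single variable reveal codes that were never defined.
sex_frequencies = Counter(sex for sex, _ in records)
undefined_sex_codes = [code for code in sex_frequencies if code not in (1, 2)]

# Cross-tabulating related variables reveals impossible combinations.
crosstab = Counter(records)
males_recorded_pregnant = crosstab[(1, 1)]
```

Any record flagged this way should be traced back to the original questionnaire before it is corrected or dropped.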
Four broad considerations of data
analysis
Identification of Level of measurement of
each variable
Number of variables that each of the
particular pieces of analysis requires.
Types of analysis required: descriptive vs
analytic
Application of ethical principles of full,
fair, appropriate and challenging analysis
to the selection of data to be analyzed and
reported.
Tabulation and data analysis
Tabulation ranges from the production of simple
frequency and contingency tables to the
construction of complex, multi-dimensional tables.
Tabulation is often described as the skeleton
of survey research.
A researcher needs some knowledge
of quantitative data analysis procedures to
make sense of this skeleton.
Even if the researcher does not have
sufficient knowledge of data analysis, he/she
can consult someone who has sufficient
knowledge of data processing.
Descriptive Vs inferential analysis
Analysis of data means the computation of
certain indices or measures, along with
searching for patterns of relationship that
exist among the data groups.
Analysis, particularly in the case of survey or
experimental data, involves estimating the
values of unknown population parameters
and testing hypotheses in order to
draw inferences.
Descriptive Analysis
Descriptive statistics are the numerical, graphical
and tabular techniques for organizing, presenting
and analyzing data.
Descriptive statistics are used to describe the
basic features of the data in a study. They provide
simple summaries about the sample and the
measures. Together with simple graphic analysis,
they form the basis of virtually every quantitative
analysis of data. With descriptive statistics you
are simply describing what is, or what the data
show.
• Inferential statistics investigate questions,
models and hypotheses. In many cases, the
conclusions from inferential statistics extend
beyond the immediate data alone. For
instance, we use inferential statistics to try to
infer from the sample data what the population
looks like.
• Further, we use inferential statistics to
judge whether an observed
difference between groups is a dependable one
or one that might have happened by chance
alone in a study.
Types of Descriptive Statistics
Type | Function | Examples
Tables | Provide a frequency distribution for a variable or variables | Frequency table; cross-tabulations
Graphs | Provide a visual representation of the distribution of a variable(s) | Pie, bar, histogram, polygon; scatter plot
Numerical measures | Mathematical operations used to quantify (in a single number) particular features of a distribution | Measures of central tendency; measures of dispersion; measures of association (correlation and regression)
Frequency Table
A frequency table provides the number of observations
belonging to each of the categories for the variable in question. It
tells the number of cases in each of the categories.
Relative frequency is the proportion of cases contained within
each category.

Spending Class ($), x | Frequency (number of customers), f(x) | Relative Frequency, f(x)/n
0 to less than 100 | 30 | 0.163
100 to less than 200 | 38 | 0.207
200 to less than 300 | 50 | 0.272
300 to less than 400 | 31 | 0.168
400 to less than 500 | 22 | 0.120
500 to less than 600 | 13 | 0.070
Total | 184 | 1.000
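The relative frequency column is simply each frequency divided by the total number of cases. A minimal Python sketch using the frequencies from the spending-class table (the variable names are illustrative):

```python
# Frequencies from the spending-class table.
frequencies = {
    "0 to less than 100": 30,
    "100 to less than 200": 38,
    "200 to less than 300": 50,
    "300 to less than 400": 31,
    "400 to less than 500": 22,
    "500 to less than 600": 13,
}

n = sum(frequencies.values())  # total number of customers
relative = {cls: f / n for cls, f in frequencies.items()}  # f(x) / n

for cls, share in relative.items():
    print(f"{cls:<22}{frequencies[cls]:>5}{share:>8.3f}")
```

Computed directly, the last class rounds to 0.071 rather than the 0.070 shown in the table; the table value may have been adjusted so that the column adds to exactly 1.000.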
Cross Tabulations
A table representing the cross-classification of two or
more categorical variables. The levels of each variable
are arranged in a grid, and each cell shows the number of
observations falling into that category.

Monthly Salary | Sex: M | Sex: F | Row Total
Less than 1000 | 65 (21.0) | 105 (31.8) | 170 (26.6)
1001 – 2500 | 97 (31.3) | 123 (37.3) | 220 (34.4)
2501 – 5000 | 103 (33.2) | 77 (23.3) | 180 (28.1)
5001 and above | 45 (14.5) | 25 (7.6) | 70 (10.9)
Column Total | 310 (100.0) | 330 (100.0) | 640 (100.0) (Grand Total)

Sex is the independent variable; figures in parentheses are column percentages.
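The percentages in the salary-by-sex table can be reproduced from the cell counts. The following Python sketch assumes the rows are listed in table order; the variable names are illustrative.

```python
# Observed counts [male, female], one row per salary class in table order.
counts = [
    [65, 105],
    [97, 123],
    [103, 77],
    [45, 25],
]

column_totals = [sum(col) for col in zip(*counts)]  # totals for M and F
row_totals = [sum(row) for row in counts]           # totals per salary class
grand_total = sum(row_totals)

# Column percentages: each cell as a share of its column (sex) total.
column_pct = [
    [round(100 * cell / total, 1) for cell, total in zip(row, column_totals)]
    for row in counts
]
# Row totals as a share of the grand total.
row_total_pct = [round(100 * t / grand_total, 1) for t in row_totals]
```

Percentaging down the columns of the independent variable, as done here, lets the salary distributions of men and women be compared directly.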
Graphical methods of displaying data
• Pie Charts
Categories represented as percentages of total
• Bar Graphs
Heights of rectangles represent group frequencies
Bars do not touch each other
• Frequency Polygons
Height of line represents frequency
• Histogram
A chart made of bars of different
heights that touch each other
• Time Plots
Represent values over time
Pie Chart
[Figure 1-1: Extent of job satisfaction. Pie chart with five categories: Do not like my job, but it is on my career path; Job is OK, but it is not on my career path; Enjoy job, but it is not on my career path; My job just pays the bills; Happy with career. Slice labels read 6.0% (Do not like my job, but it is on my career path), 19.0%, 19.0%, 23.0% and 33.0%.]

NB: Use different colors for each of the slices to
distinguish between categories
Bar Chart
[Figure 1-2: Quarterly net income for General Motors (in billions). Bar chart; horizontal axis: quarters from 1Q 2003 to 1Q 2004; vertical axis: 0.0 to 1.5.]

A bar chart is useful for presentations to those
who are not familiar with statistical material
Histogram Example
Frequency Histogram
Frequency Polygon
Relative Frequency Polygon
[Figure: relative frequency polygon. Horizontal axis: Sales (0 to 50); vertical axis: Relative Frequency (0.0 to 0.3).]

It visualizes gradual shifts in frequency from one
category to another
Time Plot/Line Graph
[Figure: Monthly Steel Production. Line graph; horizontal axis: Month; vertical axis: Millions of Tons (5.5 to 8.5).]

Summary Measures:
Population Parameters and Sample Statistics

• Measures of central tendency
– Median
– Mode
– Mean
• Measures of dispersion
– Range
– Variance
– Standard Deviation
• Other summary measures:
– Skewness
– Kurtosis
Measures of central tendency

• Median: middle value when sorted in order of
magnitude; the 50th percentile
• Mode: most frequently occurring value
• Mean: average
Example - Mode

[Figure: dot plot of the 20 sales values 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

Mode = 16

The mode is the most frequently occurring value.
It is the value with the highest frequency.
Arithmetic Mean or Average
The mean of a set of observations is their average:
the sum of the observed values divided by the
total number of observations.

Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

A parameter is a measure of a population feature.
A statistic is a measure that characterizes a sample.
Example – Mean
Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17 (sum = 317)

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$

The mean is a computed average.
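The three measures of central tendency can be checked for the 20 sales values with Python's standard `statistics` module. This is only a small sketch; the variable names are illustrative.

```python
import statistics
from collections import Counter

# The 20 sales values used in the mode and mean examples.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
total = sum(sales)
mean = total / n                    # sum of observations divided by n

median = statistics.median(sales)   # middle value of the sorted data
mode = statistics.mode(sales)       # most frequently occurring value
frequency_of_mode = Counter(sales)[mode]
```

With an even number of observations, the median is taken as the average of the two middle values of the sorted data.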
Measures of Dispersion
• Range
Difference between maximum and minimum
values (Max – Min)
• Variance (s²)
Average* of the squared deviations from the
mean
• Standard Deviation (s)
Square root of the variance (√s²)

* Definitions of population variance and sample variance differ slightly.
Example - Range
Sales | Sorted Sales | Rank
9 | 6 | 1 (Minimum)
6 | 9 | 2
12 | 10 | 3
10 | 12 | 4
13 | 13 | 5
15 | 14 | 6
16 | 14 | 7
14 | 15 | 8
14 | 16 | 9
16 | 16 | 10
17 | 16 | 11
16 | 17 | 12
24 | 17 | 13
21 | 18 | 14
22 | 18 | 15
18 | 19 | 16
19 | 20 | 17
18 | 21 | 18
20 | 22 | 19
17 | 24 | 20 (Maximum)

Range: Maximum - Minimum = 24 - 6 = 18
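Variance and Standard Deviation

Population variance:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$, with $\sigma = \sqrt{\sigma^2}$

Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$, with $s = \sqrt{s^2}$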
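For the 20 sales values, the range, variance and standard deviation can be worked through as below. This Python sketch computes the sum of squared deviations both from the definition and from the computational shortcut; the variable names are illustrative.

```python
import math
import statistics

# The 20 sales values used in the mean and range examples.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
mean = sum(sales) / n

value_range = max(sales) - min(sales)  # maximum minus minimum

# Definitional form: sum of squared deviations from the mean.
ss_deviations = sum((x - mean) ** 2 for x in sales)
# Computational form: sum of squares minus (sum of values)^2 / n.
ss_shortcut = sum(x * x for x in sales) - sum(sales) ** 2 / n

population_variance = ss_deviations / n    # divides by N
sample_variance = ss_deviations / (n - 1)  # divides by n - 1
sample_sd = math.sqrt(sample_variance)
```

The only difference between the two variances is the divisor, which is why the sample variance is slightly larger than the population variance for the same data.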
Standard Deviation
The most commonly used measure for
summarizing dispersion in statistics
Measures the typical amount of
deviation from the mean
Reflects the degree to which the values in
the distribution differ from the arithmetic
mean
Is usually presented in tandem with the
mean (x̄ ± s), as it is difficult to interpret
without the mean.
Measurement of Shape of Distribution:
Skewness and Kurtosis
• Skewness
◦ Measure of asymmetry of a frequency distribution
– Skewed to left
– Symmetric or unskewed
– Skewed to right
• Kurtosis
◦ Measure of flatness or peakedness of a frequency
distribution
– Platykurtic (relatively flat)
– Mesokurtic (normal)
– Leptokurtic (relatively peaked)
Skewness
Skewed to left
Symmetric
Skewed to right
Kurtosis
Platykurtic - flat distribution
Mesokurtic - neither too flat nor too peaked
Leptokurtic - peaked distribution
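The slides do not give formulas for skewness and kurtosis. One common choice, assumed here, is the moment-based coefficients: skewness as the third central moment divided by the 1.5th power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. A minimal Python sketch with made-up data sets:

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Negative: skewed to left; zero: symmetric; positive: skewed to right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Negative: platykurtic; near zero: mesokurtic; positive: leptokurtic."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Made-up illustrative data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
left_skewed = [-x for x in right_skewed]
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]
```

Statistical packages often apply small-sample corrections to these coefficients, so their output can differ slightly from the plain moment formulas used here.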


Type of Tests
Parametric tests are statistical tests which
make certain assumptions about the
parameters of the full population from which
the sample is taken.
These tests normally involve data expressed in
absolute numbers (interval or ratio) rather
than categories and ranks (nominal or ordinal).
Such tests include analysis of variance
(ANOVA), t-tests, Z-tests, etc.
Some assumptions for parametric
tests include:
• The observations must be independent.
• The observations should be drawn from
normally distributed populations.
• These populations should have equal
variances.
t-test
Is a parametric test most suitable for a
small sample
It is used to:
◦ Test the significance of the mean of a random
sample
◦ Examine whether the mean of a
sample from a normal population deviates
significantly from the hypothesized population
mean
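A one-sample t-test of this kind can be sketched with the 20 sales values from the mean example. The hypothesized population mean of 15 is an assumption made purely for illustration, and the critical value is the usual two-sided 5% value for 19 degrees of freedom from a t table.

```python
import math
import statistics

# The 20 sales values from the mean example.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mu_0 = 15  # hypothesized population mean (assumed for illustration)

n = len(sales)
sample_mean = statistics.mean(sales)
sample_sd = statistics.stdev(sales)        # uses n - 1 in the denominator
standard_error = sample_sd / math.sqrt(n)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_statistic = (sample_mean - mu_0) / standard_error
degrees_of_freedom = n - 1

# Two-sided 5% critical value of t for 19 degrees of freedom (t table).
T_CRITICAL = 2.093
reject_null = abs(t_statistic) > T_CRITICAL
```

Because the absolute value of `t_statistic` is below the critical value, the sample mean does not differ significantly from the assumed value of 15 at the 5% level.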
ANOVA
• ANOVA (ANalysis Of VAriance) is a statistical
method for determining the existence of
differences among several population means.
ANOVA is designed to detect differences
among means from populations subject to
different treatments.
Analysis of variance is a method of splitting
the total variation into meaningful components
that measure different sources of variation.
Who is living a better life: people living in Bole,
people living in Kolfe, or people living in area X?
This can be examined by analysis of variance.
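Splitting the total variation into between-group and within-group components can be sketched for a one-way ANOVA as follows. The well-being scores for the three areas are made-up numbers, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Made-up well-being scores for three areas (illustrative only).
bole = [1, 2, 3]
kolfe = [2, 3, 4]
area_x = [3, 4, 5]

f_ratio, df_between, df_within = one_way_anova_f([bole, kolfe, area_x])
```

A large F ratio means the variation between the area means is large relative to the variation within areas; the ratio is compared with the critical F value for the two degrees of freedom.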
Non-parametric test
• Non-parametric tests are used to test
hypotheses with nominal and ordinal data.
• The use of non-parametric methods may be
necessary when data have a ranking but no
clear numerical interpretation, such as
when assessing preferences; that is, in terms of
levels of measurement, for data on an
ordinal scale.
• Such tests include the Chi-square (χ²) test,
the Mann-Whitney test, the Kruskal-Wallis test, etc.
Chi-Square (χ²)
If the two variables whose degree of
association we want to test are categorical
in nature (for example, job satisfaction
versus income), the appropriate
non-parametric statistic for testing the
relationship is the Chi-square test.
The χ² test helps to test whether
there really is a relationship between
two variables or not.
It assumes probability sampling
and is the most commonly used of the
non-parametric tests.
It is basically used:
◦ to test goodness of fit
◦ to test homogeneity/association
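As a worked illustration of the test of association, the χ² statistic can be computed for the salary-by-sex counts from the cross-tabulation slide. This is a sketch only; the critical value is the standard 5% value for 3 degrees of freedom from a χ² table, and the variable names are illustrative.

```python
# Observed counts [male, female], one row per salary class in table order.
observed = [
    [65, 105],
    [97, 123],
    [103, 77],
    [45, 25],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_square += (o - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# 5% critical value of chi-square for 3 degrees of freedom (chi-square table).
CHI2_CRITICAL = 7.815
association = chi_square > CHI2_CRITICAL
```

For these counts the statistic is well above the critical value, so the hypothesis that salary class and sex are independent would be rejected at the 5% level.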