Introduction Qr1
Teaching Notes: Quantitative Data Analysis ~ © Copyright 2017 W. Mertens, A. Pugliese & J. Recker. All Rights Reserved. ~ 1
Copyright Notice
What these materials are about
Offering a guide through the essential steps required in quantitative data analysis
1. Introduction
2. Comparing Differences Across Groups
3. Assessing (Innocuous) Relationships
4. Models with Latent Concepts and Multiple Relationships: Structural Equation Modeling
5. Nested Data and Multilevel Models: Hierarchical Linear Modeling
6. Analyzing Longitudinal and Panel Data
7. Causality: Endogeneity Biases and Possible Remedies
8. How to Start Analyzing, Test Assumptions and Deal with that Pesky p-Value
9. Keeping Track and Staying Sane
Part 1:
Exploring Data and Testing
Assumptions
Warning
There are three kinds of lies: lies, damned lies, and statistics.
Benjamin Disraeli
Henry Clay
Agenda
Structuring data
1. Exploring data
Structuring data
1. Exploring data
Nested data
Structuring data
1. Exploring data
Cleaning data
= Taking out unreliable (not inconvenient) cases
Missing data (or use listwise/pairwise deletion)
Extreme tendencies (e.g. all 6/all 1)
Improbable response time (e.g. outliers)
Inconsistent responses (e.g. age < tenure)
≠ Introducing bias
Consistent application of rules
Mindful of hypotheses and method (IV/DV)
Consider power and credibility
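The cleaning rules above can be sketched as a small, explicit rule set that is applied to every case in the same way. This is only an illustration: the record layout, the 6-point scale and the function name `exclusion_reasons` are assumptions, not part of the original materials.

```python
SCALE_MIN, SCALE_MAX = 1, 6  # assumed 6-point response scale


def exclusion_reasons(record):
    """Return the list of cleaning rules that a survey record violates."""
    fields = [record["age"], record["tenure"], *record["items"]]
    if any(value is None for value in fields):
        # The remaining rules need complete data, so stop here.
        return ["missing data"]
    reasons = []
    items = record["items"]
    # Extreme tendency: every item answered with the same scale endpoint.
    if len(set(items)) == 1 and items[0] in (SCALE_MIN, SCALE_MAX):
        reasons.append("extreme response pattern")
    # Inconsistent response: tenure cannot exceed age.
    if record["age"] < record["tenure"]:
        reasons.append("inconsistent response (age < tenure)")
    return reasons


# Hypothetical survey records, for illustration only.
records = [
    {"id": 1, "age": 34, "tenure": 5, "items": [4, 5, 3, 4]},
    {"id": 2, "age": 29, "tenure": None, "items": [2, 3, 3, 2]},
    {"id": 3, "age": 41, "tenure": 10, "items": [6, 6, 6, 6]},
    {"id": 4, "age": 25, "tenure": 30, "items": [3, 4, 2, 5]},
    {"id": 5, "age": 52, "tenure": 20, "items": [1, 2, 2, 3]},
]

# Apply the same rules to every case and keep a record of why cases were dropped.
report = {r["id"]: exclusion_reasons(r) for r in records}
kept_ids = [record_id for record_id, reasons in report.items() if not reasons]
print(report)
print(kept_ids)
```

Keeping the reasons per case, rather than silently dropping rows, makes it easier to show that the rules were applied consistently and were not chosen to suit the hypotheses.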
Cleaning data and eliminating outliers
1. Exploring data
Eliminating outliers
Outliers are highly improbable or erroneous values
They can influence statistics --> introduce bias
They affect generalizability
The decision to exclude depends on the RQs
How to find outliers
Box-plots
Histograms
Scatter plots
z-scores < -3.29 or > 3.29 (see slide 16)
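A minimal sketch of the z-score rule, using only the Python standard library. The data are hypothetical (ages 65 to 83 plus one implausible entry), and `find_outliers` is an illustrative helper name.

```python
import statistics


def find_outliers(values, cutoff=3.29):
    """Return the values whose z-score lies beyond +/- cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / sd) > cutoff]


# Hypothetical ages with one implausible entry (e.g. a data-entry error).
ages = list(range(65, 84)) + [300]
outliers = find_outliers(ages)
print(outliers)
```

Whether a flagged value is then excluded remains a judgment call that depends on the research questions; the code only identifies candidates.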
Visualising data
1. Exploring data
Histograms
[Figure: histogram; horizontal axis ticks from 65 to 83]
Visualising data
1. Exploring data
Box plots
[Figure: box plots of the age of senior league bowls players, with Q1, median and Q3 marked]
Visualising data
1. Exploring data
Scatter plots
[Figure: scatter plot; axis label "# points scored"]
Frequency distributions
[Figure: frequency distribution (n) of the population of senior bowls players; horizontal axis ticks from 65 to 83]
Normalization:
Models
Attempt to explain/summarise data
Vary in how well they “fit” the data
E.g.: the mean is a model; the standard deviation (s) illustrates its fit
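To make the idea of "the mean as a model" concrete, the sketch below treats the mean as a single-number prediction for every score and uses the deviations from it to quantify fit. The scores are hypothetical.

```python
import statistics

scores = [1, 2, 3, 4, 5]  # hypothetical observations

# The model: one number that is used to predict every observation.
model = statistics.mean(scores)

# Error of the model for each observation (these always sum to zero).
errors = [x - model for x in scores]

# Total misfit: sum of squared errors.
sum_squared_error = sum(e ** 2 for e in errors)

# s expresses the typical error in the original units: sqrt(SS / (n - 1)).
s = statistics.stdev(scores)

print(model, sum_squared_error, round(s, 3))
```

A smaller s relative to the scale of the data means that the mean summarises the observations more closely.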
Fit
Significance
Hypothesis testing involves comparing two models (H0 vs. H1)
Comparing models is done using test statistics:
variance explained by the model/variance not explained by the model
If the probability of observing this test statistic, or anything more extreme, under H0 is smaller
than .05/.01/.001, then we conclude statistical significance (i.e. H1 explains the data
better than H0)
Significance ≠ importance
Non-significance does not say anything about H0 (it does not show that H0 is true)
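The logic of a test statistic and its p-value can be illustrated with a small sketch. It computes the ratio of explained to unexplained variance (an F-ratio) for two groups and then, as a stand-in for looking up the F distribution, uses an exact permutation test: it counts how many of all possible regroupings of the data give a ratio at least as large as the one observed. The permutation approach is a swapped-in illustration, not the procedure used in these materials, and the data and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def f_ratio(group_a, group_b):
    """Variance explained by group membership / variance not explained."""
    groups = [group_a, group_b]
    all_values = group_a + group_b
    grand_mean = mean(all_values)
    ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_residual = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_model = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    return (ss_model / df_model) / (ss_residual / df_residual)


def permutation_p_value(group_a, group_b):
    """Share of all regroupings whose F-ratio is at least the observed one."""
    observed = f_ratio(group_a, group_b)
    pooled = group_a + group_b
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(group_a)):
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen]
        total += 1
        if f_ratio(first, second) >= observed - 1e-12:
            hits += 1
    return observed, hits / total


# Hypothetical scores for two groups (lists, no ties within a group).
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
f_observed, p_value = permutation_p_value(group_a, group_b)
print(f_observed, p_value)
```

For these values only 2 of the 252 possible regroupings are as extreme as the observed one, so p is below .01. The sketch assumes every regrouping has some within-group variance; otherwise the ratio is undefined.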
Correlations and differences
2. Understanding data
[Figure: frequency distribution (n) of the population of senior bowls players; horizontal axis ticks from 65 to 83; mean and standard deviation marked]
Correlations and differences
2. Understanding data
[Figure: plot of "# points scored", with H0 marked]
Independence
Data was collected from independent sources
Variable measurements were independent (e.g. regression)
Homoscedasticity/homogeneity of variance
Variance is equal in different (sub-)samples
Normality
Sampling distribution/errors/data follow a normal distribution --> have limited skew and kurtosis
Independence
3. Testing assumptions
Homoscedasticity/homogeneity of variance
3. Testing assumptions
One variable, multiple groups (e.g. t-test): spread of values is equal across
different groups
Visual test: scatter- or boxplot
Statistical test: Levene’s test for equality of variance
When significant (p < .05): no homoscedasticity (i.e. heteroscedasticity)
Levene’s test will usually be significant in large samples; use other tests (e.g. Hartley’s Fmax)
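The two checks named above can be sketched with the standard library. The code computes Levene's W statistic in its mean-centred form and Hartley's Fmax (largest group variance divided by smallest). It stops at the statistics: W would still have to be compared against an F distribution with (k - 1, N - k) degrees of freedom, which in practice is usually left to a statistics package (for example `scipy.stats.levene` with `center="mean"`). The data and function names are hypothetical.

```python
from statistics import mean, variance


def levene_w(*groups):
    """Levene's W statistic (mean-centred version) for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_group_means = [mean(zg) for zg in z]
    z_grand_mean = mean([value for zg in z for value in zg])
    between = sum(
        len(zg) * (m - z_grand_mean) ** 2 for zg, m in zip(z, z_group_means)
    )
    within = sum(
        (value - m) ** 2 for zg, m in zip(z, z_group_means) for value in zg
    )
    return (n_total - k) / (k - 1) * between / within


def hartley_f_max(*groups):
    """Ratio of the largest to the smallest group variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical scores: the second group is twice as spread out as the first.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
w_statistic = levene_w(group_a, group_b)
f_max = hartley_f_max(group_a, group_b)
print(round(w_statistic, 3), f_max)
```

An Fmax close to 1 indicates similar variances; how large a value is tolerable depends on the number of groups and the group sizes.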
Homoscedasticity/homogeneity of variance
3. Testing assumptions
Normality
3. Testing assumptions
Skew (+)
must be ≤ 1.96
Increase to 2.58 in larger samples and do not use in very large samples (n > 200)
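The thresholds above are read here as cut-offs for the z-score of skewness, i.e. the sample skewness divided by its standard error. That reading is an assumption, since the quantity being compared to 1.96 is not spelled out. Under that assumption, a minimal sketch using the usual adjusted sample skewness and its standard error looks as follows; the data and the function name are hypothetical.

```python
import math


def skewness_z(values):
    """z-score of skewness: adjusted sample skewness / its standard error."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    # Adjusted (sample) skewness and its standard error.
    sample_skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    standard_error = math.sqrt(
        6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))
    )
    return sample_skew / standard_error


# Hypothetical data: one symmetric sample and one with a long right tail.
symmetric = list(range(1, 51))
right_skewed = [1] * 30 + [2] * 10 + [5] * 5 + [20] * 3 + [60] * 2

z_symmetric = skewness_z(symmetric)
z_skewed = skewness_z(right_skewed)
print(abs(z_symmetric) <= 1.96, abs(z_skewed) <= 1.96)
```

Because the standard error shrinks as n grows, even trivial skew becomes "significant" in large samples, which is why the cut-off is relaxed and eventually abandoned there.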
Normality
3. Testing assumptions
In regression-based models
Errors/residuals, not indicators, need to be normally distributed
Same visual principles as Q-Q plot apply
[Figure: two plots of residuals]
Please note: in this case, the two graphs do not represent the same data
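As a sketch of how the residual check works, the code below fits a simple one-predictor regression by ordinary least squares, takes the residuals, and pairs the sorted standardised residuals with the quantiles expected under a normal distribution. These pairs are the points a Q-Q plot would show; if the residuals are roughly normal, the pairs lie close to the diagonal. The data and function names are hypothetical, and the plotting itself is left out.

```python
from statistics import NormalDist, mean, stdev


def ols_residuals(x, y):
    """Residuals of a simple least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def qq_points(residuals):
    """(theoretical normal quantile, standardised residual) pairs."""
    n = len(residuals)
    scale = stdev(residuals)
    observed = sorted(r / scale for r in residuals)
    theoretical = [
        NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, observed))


# Hypothetical predictor and outcome values.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

residuals = ols_residuals(x, y)
points = qq_points(residuals)
for theoretical, observed in points:
    print(round(theoretical, 2), round(observed, 2))
```

Note that the check is run on the residuals, not on x or y themselves, in line with the point above.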
What if assumptions are violated?
3. Testing assumptions
Correct data
Exclude outliers
Transform data, e.g.:
Log-, square root and reciprocal (1/x) transformations shorten the right tail (i.e. correct
positive skew)
The same transformations applied to the reverse score (highest score + 1 – score)
correct for negative skew
The same transformation has to be applied to variables that are compared directly
Turn to tests that are robust against violations or to non-parametric tests, e.g.
Mann–Whitney U for group comparisons
Kendall's tau for dependence between two variables
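The transformations listed above can be sketched as follows. The example applies log, square root and reciprocal transformations to a positively skewed variable, and reverse-scores a negatively skewed one (highest score + 1 minus score) so that the same transformations can be used. The data and helper names are hypothetical, and the simple moment-based `skewness` function is only there to show the effect.

```python
import math


def skewness(values):
    """Moment-based skewness (positive = long right tail)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def reverse_scores(values):
    """Reverse-score so that a long left tail becomes a long right tail."""
    highest = max(values)
    return [highest + 1 - x for x in values]


# Hypothetical scores; all values must be positive for log and 1/x.
right_skewed = [1, 2, 2, 3, 4, 9, 25]
left_skewed = [25, 24, 24, 23, 22, 17, 1]

log_scores = [math.log(x) for x in right_skewed]
sqrt_scores = [math.sqrt(x) for x in right_skewed]
reciprocal_scores = [1 / x for x in right_skewed]  # also reverses the order

# Negative skew: reverse first, then transform as for positive skew.
reversed_scores = reverse_scores(left_skewed)
log_reversed = [math.log(x) for x in reversed_scores]

print(round(skewness(right_skewed), 2))
print(round(skewness(sqrt_scores), 2))
print(round(skewness(log_scores), 2))
```

After reverse-scoring, high values of the transformed variable correspond to low values of the original one, so the direction of any effect has to be interpreted accordingly.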
Scales and factors - basics
4. Scales and factors
Visualisation of scale with three indicators measuring one latent variable / factor:
Principal component analysis
4. Scales and factors
Run PCA with no restriction on the number of factors and with a scree plot
Decide how many factors to retain based on eigenvalues, scree plot and R²
Separate mountain from scree
Eigenvalue > 1
Eigenvalue: amount of variance explained by factor (sum = # variables)
Cumulative R² > .6
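The two numeric retention rules can be sketched as a small helper that takes the eigenvalues from a PCA and reports how many factors each rule would keep. The eigenvalues below are hypothetical (six variables, so they sum to 6), and `factors_to_retain` is an illustrative name; the scree plot still has to be inspected by eye.

```python
def factors_to_retain(eigenvalues, min_cumulative=0.6):
    """Return (factors with eigenvalue > 1, factors needed to pass the
    cumulative share of variance given by min_cumulative)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)  # equals the number of variables in a PCA on correlations
    by_eigenvalue = sum(1 for value in ordered if value > 1)
    cumulative = 0.0
    by_cumulative = len(ordered)
    for count, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative > min_cumulative:
            by_cumulative = count
            break
    return by_eigenvalue, by_cumulative


# Hypothetical eigenvalues for a PCA on six variables.
eigenvalues = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]
by_eigenvalue, by_cumulative = factors_to_retain(eigenvalues)
print(by_eigenvalue, by_cumulative)
```

When the rules disagree, the choice is a judgment call that should also take the scree plot and the interpretability of the factors into account.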
Principal component analysis
4. Scales and factors
Once a stable solution has been reached, evaluate reliability and unidimensionality of scales
Inter-item correlation when # indicators for factor is 2
Should be significant
Cronbach’s alpha when # indicators for factor is > 2
Should be higher than .7
“Alpha if item deleted” should be lower than Alpha
If not: exclude item and run PCA again
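A minimal sketch of the reliability check, using the standard formula for Cronbach's alpha, k / (k - 1) × (1 - sum of item variances / variance of the total score), and recomputing it with each item left out. The item scores and function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per indicator, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [
        cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))
    ]


# Hypothetical scores of five respondents on three indicators.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(items)
deleted = alpha_if_item_deleted(items)
# Items whose removal would raise alpha are candidates for exclusion.
flagged = [i for i, value in enumerate(deleted) if value > alpha]
print(round(alpha, 3), [round(value, 3) for value in deleted], flagged)
```

In this made-up example the scale is reliable overall, yet removing the third indicator would raise alpha slightly, which is the situation in which the item would be excluded and the PCA rerun.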
End of Part 1
© Copyright 2017 W. Mertens, A. Pugliese & J. Recker. All Rights
Reserved.
26.11.2021 Teaching Notes: Quantitative Data Analysis ~ © Copyright 2017 W. Mertens, A. Pugliese & J. Recker. All Rights Reserved. ~ 34