Quantitative Data Analysis:
A Companion for Accounting and Information Systems Research
Teaching Materials

Created by Willem Mertens, Amedeo Pugliese & Jan Recker

Teaching Notes: Quantitative Data Analysis ~ © Copyright 2017 W. Mertens, A. Pugliese & J. Recker. All Rights Reserved. ~ 1
Copyright Notice

© Copyright 2017 W. Mertens, A. Pugliese & J. Recker. All Rights Reserved.

What these materials are about
Offering a guide through the essential steps required in quantitative data analysis

1. Introduction
2. Comparing Differences Across Groups
3. Assessing (Innocuous) Relationships
4. Models with Latent Concepts and Multiple Relationships: Structural Equation Modeling
5. Nested Data and Multilevel Models: Hierarchical Linear Modeling
6. Analyzing Longitudinal and Panel Data
7. Causality: Endogeneity Biases and Possible Remedies
8. How to Start Analyzing, Test Assumptions and Deal with that Pesky p-Value
9. Keeping Track and Staying Sane

Part 1: Exploring Data and Testing Assumptions

Warning

There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli

Statistics are no substitute for judgment.

Henry Clay

Agenda

1. Exploring data
▪ Structuring data
  ▪ Basics
  ▪ Variable types
▪ Cleaning data and eliminating outliers
▪ Visualising data

2. Understanding data
▪ Distributions, means and standard deviations
▪ Models and significance
▪ Correlations and differences

3. Testing assumptions
▪ Independence
▪ Homoscedasticity
▪ Normality
▪ Skew and kurtosis
▪ Transformations

4. Scales and factors
▪ Basics
▪ PCA/EFA vs. CFA
Structuring data
1. Exploring data

▪ One row per case, one variable per column

            Age   Gender   Role        …
Person 1    19    F        Student     …
Person 2    53    F        Professor   …
Person 3    27    M        Admin       …
…           …     …        …           …

▪ What counts as a case depends on the unit of analysis (e.g. a person)
Structuring data
1. Exploring data

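▪ Nested data

Wide format (one row per person, one column per role):

            Age   Gender   Role 1      Role 2           Role 3
Person 1    19    F        Student     Tutor            -
Person 2    53    F        Professor   Head of School   Supervisor
Person 3    27    M        Admin       -                -
…           …     …        …           …                …

Long format (one row per person-role combination):

            Age   Gender   Role
Person 1    19    F        Student
Person 1    19    F        Tutor
Person 2    53    F        Professor
Person 2    53    F        Head of School
Person 2    53    F        Supervisor
Person 3    27    M        Admin
…           …     …        …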
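The two layouts above differ only in how the role columns are arranged, so turning the wide table into the long one (one row per person-role combination) is a simple stacking step. A minimal Python sketch, assuming each record is a dict; `wide_to_long` and the field names are hypothetical and only mirror the table, with "-" marking an empty cell.

```python
def wide_to_long(rows, id_fields, repeated_fields, new_field):
    """Stack repeated columns so that each (case, value) pair becomes its own row.

    Empty cells (None or "-") are skipped, so a person with one role yields one row.
    """
    long_rows = []
    for row in rows:
        for field in repeated_fields:
            value = row.get(field)
            if value in (None, "-"):
                continue  # no role recorded in this column
            new_row = {key: row[key] for key in id_fields}  # copy person-level data
            new_row[new_field] = value
            long_rows.append(new_row)
    return long_rows


# Hypothetical wide-format data mirroring the table above.
wide = [
    {"person": "Person 1", "age": 19, "gender": "F",
     "role1": "Student", "role2": "Tutor", "role3": "-"},
    {"person": "Person 2", "age": 53, "gender": "F",
     "role1": "Professor", "role2": "Head of School", "role3": "Supervisor"},
    {"person": "Person 3", "age": 27, "gender": "M",
     "role1": "Admin", "role2": "-", "role3": "-"},
]

long_table = wide_to_long(wide, ["person", "age", "gender"],
                          ["role1", "role2", "role3"], "role")
for row in long_table:
    print(row["person"], row["age"], row["gender"], row["role"])
```

Person-level values (age, gender) repeat on every row of the long table, which is one reason nested data are not independent (see the HLM module).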
Structuring data
1. Exploring data

▪ Recoding data: variable types
  ▪ Categorical variables
    ▪ Nominal (e.g. role)
    ▪ Dichotomous (e.g. gender)
    ▪ Ordinal (e.g. hierarchical level)
  ▪ Continuous variables
    ▪ Interval (e.g. temperature in degrees): equal differences mean the same thing (the step from 5 to 10 equals the step from 15 to 20)
    ▪ Ratio (e.g. weight): 0 means none, so ratios are meaningful (10 = 2 × 5)

            Age   Gender   Role 1      Role 2           Role 3
Person 1    19    1 (F)    Student     Tutor            -
Person 2    53    1 (F)    Professor   Head of School   Supervisor
Person 3    27    2 (M)    Admin       -                -
…           …     …        …           …                …
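Recoding can be sketched as a lookup from labels to numeric codes. A minimal Python sketch, assuming the codebook F = 1, M = 2 shown in the table; `recode` and `dummy_code` are hypothetical helpers, and the dummy (0/1) coding of the nominal role variable is a common follow-up step that the slide itself does not show.

```python
def recode(values, codebook):
    """Map category labels to numeric codes; an unknown label raises KeyError."""
    return [codebook[v] for v in values]


def dummy_code(values, reference):
    """Create one 0/1 column per category, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [int(v == c) for v in values] for c in categories}


genders = ["F", "F", "M"]
roles = ["Student", "Professor", "Admin"]  # Role 1 of Persons 1 to 3

gender_coded = recode(genders, {"F": 1, "M": 2})
role_dummies = dummy_code(roles, reference="Admin")
print(gender_coded)   # [1, 1, 2]
print(role_dummies)   # {'Professor': [0, 1, 0], 'Student': [1, 0, 0]}
```

Numeric codes for nominal variables are only labels: averaging them is meaningless, which is why dummy variables are typically used when such variables enter a regression.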
Cleaning data and eliminating outliers
1. Exploring data

▪ Cleaning data
  = Taking out unreliable (not inconvenient) cases
  ▪ Missing data (or exclude listwise/pairwise)
  ▪ Extreme response tendencies (e.g. all 6s/all 1s)
  ▪ Improbable response times (e.g. outliers)
  ▪ Inconsistent responses (e.g. age < tenure)
  ≠ Introducing bias
  ▪ Apply rules consistently
  ▪ Be mindful of hypotheses and method (IV/DV)
  ▪ Consider power and credibility
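Applying cleaning rules consistently is easiest when the rules are written down once and run over every case. A minimal Python sketch; the field names, the item keys, the helper `flag_unreliable` and the 60-second minimum response time are hypothetical assumptions, not thresholds from the source.

```python
def flag_unreliable(case, item_keys, min_seconds):
    """Return the reasons (if any) why a case looks unreliable."""
    reasons = []
    answers = [case[k] for k in item_keys]
    if any(a is None for a in answers):
        reasons.append("missing data")
    elif len(set(answers)) == 1:
        reasons.append("extreme tendency (same answer everywhere)")
    if case["response_seconds"] < min_seconds:
        reasons.append("improbable response time")
    if case["age"] < case["tenure"]:
        reasons.append("inconsistent: age < tenure")
    return reasons


ITEMS = ["q1", "q2", "q3", "q4"]  # hypothetical questionnaire items
cases = [  # hypothetical survey responses
    {"id": 1, "q1": 4, "q2": 5, "q3": 3, "q4": 4, "response_seconds": 310, "age": 34, "tenure": 6},
    {"id": 2, "q1": 6, "q2": 6, "q3": 6, "q4": 6, "response_seconds": 290, "age": 41, "tenure": 12},
    {"id": 3, "q1": 2, "q2": None, "q3": 3, "q4": 2, "response_seconds": 35, "age": 25, "tenure": 3},
    {"id": 4, "q1": 5, "q2": 4, "q3": 5, "q4": 3, "response_seconds": 400, "age": 22, "tenure": 30},
]

flags = {c["id"]: flag_unreliable(c, ITEMS, min_seconds=60) for c in cases}
kept = [c["id"] for c in cases if not flags[c["id"]]]
for case_id, reasons in flags.items():
    print(case_id, reasons or "kept")
```

Recording the reasons, rather than silently dropping cases, makes it easy to report how many cases each rule removed.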
Cleaning data and eliminating outliers
1. Exploring data

▪ Eliminating outliers
  ▪ Outliers are highly improbable or erroneous values
  ▪ They can distort statistics and thereby introduce bias
  ▪ They affect generalizability
  ▪ Whether to exclude them depends on the research questions
  ▪ How to find outliers
    ▪ Box plots
    ▪ Histograms
    ▪ Scatter plots
    ▪ z-scores < -3.29 or > 3.29 (see the normal distribution under "Distributions, means and standard deviations")
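The z-score rule can be sketched with Python's standard `statistics` module. This is a minimal illustration on made-up ages; `z_scores` and `flag_outliers` are hypothetical names, and the cutoff of 3.29 is the one given above.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values, cutoff=3.29):
    """Return the values whose absolute z-score exceeds the cutoff."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]


# Hypothetical ages of senior league bowls players; 120 is a likely entry error.
ages = [66, 67, 68, 68, 69, 70, 70, 71, 71, 72,
        72, 73, 73, 74, 75, 76, 77, 78, 80, 120]
print(flag_outliers(ages))  # [120]
```

Because an extreme value also inflates s, the largest possible |z| in a sample of n cases is (n - 1)/√n: with 12 or fewer cases no value can exceed 3.29, so the visual checks matter most in small samples.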
Visualising data
1. Exploring data

▪ Histograms

[Figure: histogram of the age of senior league bowls players (x-axis: age, 65 to 83)]
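A histogram is a count of values per bin. A minimal text-only sketch in Python with made-up ages; `histogram`, the start value of 65 and the bin width of 3 years are assumptions for illustration.

```python
from collections import Counter


def histogram(values, start, bin_width):
    """Count values per bin [start + k*width, start + (k+1)*width); assumes values >= start."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    bins = []
    for k in range(max(counts) + 1):  # include empty bins so gaps stay visible
        lower = start + k * bin_width
        bins.append((lower, lower + bin_width, counts.get(k, 0)))
    return bins


# Hypothetical ages of senior league bowls players (whole years).
ages = [65, 66, 66, 67, 68, 68, 68, 69, 70, 71, 71, 72, 74, 75, 78, 83]
for lower, upper, n in histogram(ages, start=65, bin_width=3):
    print(f"{lower}-{upper - 1}: {'#' * n}")  # label assumes whole-number ages
```

Changing the bin width can change the apparent shape of the distribution, so it is worth trying more than one.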


Visualising data
1. Exploring data

▪ Box plots

[Figure: box plot of the age of senior league bowls players with Q1, median and Q3 marked, shown alongside the corresponding probability density; image: "Boxplot vs PDF" by Jhguch - Wikipedia]
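The elements of a box plot can be computed with `statistics.quantiles` (Python 3.8+). This sketch assumes Tukey's common convention that whiskers extend to the most extreme values within 1.5 × IQR of the quartiles, with points beyond plotted as outliers; the slide does not state which convention it uses, packages also differ in how they interpolate quartiles, and `box_plot_summary` is a hypothetical name.

```python
from statistics import quantiles


def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends and outliers using Tukey's 1.5 x IQR convention."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical ages of senior league bowls players.
ages = [65, 66, 66, 67, 68, 68, 68, 69, 70, 71, 71, 72, 74, 75, 78, 83]
print(box_plot_summary(ages))
```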
Visualising data
1. Exploring data

▪ Scatter plots

[Figure: scatter plot of the number of points scored (y-axis) against the age of senior league bowls players (x-axis)]
Distributions, means and standard deviations
2. Understanding data

▪ Frequency distributions

[Figure: frequency distribution (n) of age (65 to 83) in the population of senior bowls players]
Distributions, means and standard deviations
2. Understanding data

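▪ Probability distributions, e.g. the normal distribution

Normalization: $z = (X - \bar{X}) / s$

[Figure: normal distribution centred on the mean with standard deviation s; 95% of values lie between z = -1.96 and z = 1.96, and virtually all (99.9%) between z = -3.29 and z = 3.29]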
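The areas in the figure can be checked with `statistics.NormalDist` from Python's standard library (3.8+). A minimal sketch; `share_within` and `standardize` are hypothetical helper names, and the example player is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(z):
    """Proportion of a normal distribution lying between -z and +z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


def standardize(x, mean, s):
    """z = (X - mean) / s."""
    return (x - mean) / s


for cutoff in (1.96, 2.58, 3.29):
    print(f"|z| <= {cutoff}: {share_within(cutoff):.4f}")

# Hypothetical: a 79-year-old player when the mean age is 72 and s = 3.5.
print(standardize(79, mean=72, s=3.5))
```

The three cutoffs cover roughly 95%, 99% and 99.9% of the distribution, which is why they reappear as thresholds for outliers and for skew and kurtosis z-scores.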
Models and significance
2. Understanding data

▪ Models
  ▪ Attempt to explain/summarise data
  ▪ Vary in how well they “fit” the data
  ▪ E.g. the mean is a model; the standard deviation s illustrates its fit
▪ Significance
  ▪ Hypothesis testing involves comparing two models (H0 vs. H1)
  ▪ Models are compared using test statistics: variance explained by the model / variance not explained by the model
  ▪ If the probability of observing this test statistic, or a more extreme one, assuming H0 is true, is smaller than .05/.01/.001, we conclude statistical significance (i.e. H1 explains the data better than H0)
Significance ≠ importance
Non-significance does not show that H0 is true
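The idea that the mean is a model whose fit is described by s, and that test statistics compare explained with unexplained variance, can be sketched in a few lines. The second function is a one-way ANOVA F ratio, used here as one concrete instance of such a test statistic; the data and the function names are hypothetical.

```python
from statistics import mean, stdev


def fit_of_mean(values):
    """Use the mean as a model; the sum of squared errors and s describe its fit."""
    model = mean(values)
    sse = sum((v - model) ** 2 for v in values)
    return model, sse, stdev(values)


def variance_ratio(groups):
    """One-way ANOVA F: variance explained by group means / variance not explained."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_residual = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_model = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    return (ss_model / df_model) / (ss_residual / df_residual)


tight = [68, 69, 70, 71, 72]    # hypothetical: the mean fits well
spread = [60, 65, 70, 75, 80]   # same mean, poorer fit
print(fit_of_mean(tight))
print(fit_of_mean(spread))

older = [73, 74, 75, 76, 77]    # hypothetical second group
print(variance_ratio([tight, older]))  # 25.0
```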
Correlations and differences
2. Understanding data

▪ Example of a model/hypothesis test: difference between means = t-test

[Figure: frequency distribution (n) of age (65 to 83) in the population of senior bowls players, with the mean and standard deviation marked]
Correlations and differences
2. Understanding data

▪ Example of a model/hypothesis test: difference between means = t-test

[Figure: Student t distribution, with 95% of values between -1.96 and 1.96]

Values depend on the degrees of freedom (df): the number of values that are free to vary when calculating the statistic. For a one-sample or paired t-test this is n - 1 (for two independent groups, n1 + n2 - 2); the cutoffs shown (±1.96) hold approximately for large samples (n > 100).
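A sketch of the two t statistics and their degrees of freedom, using only the standard library. The data and the helper names are hypothetical; obtaining a p-value would additionally require the CDF of the t distribution (e.g. from SciPy), which is not shown.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(values, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t, n - 1


def independent_t(a, b):
    """Pooled-variance t for two independent groups, with df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


group_a = [68, 69, 70, 71, 72]   # hypothetical ages, group A
group_b = [73, 74, 75, 76, 77]   # hypothetical ages, group B
print(one_sample_t(group_a, mu0=68))
print(independent_t(group_a, group_b))
```

With df = 8 the two-tailed .05 critical value is about 2.31, so the large-sample cutoff of 1.96 would be too lenient for samples this small.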
Correlations and differences
2. Understanding data

▪ Example of a model/hypothesis test: correlation

[Figure: scatter plot of the number of points scored against the age of senior league bowls players, with the model under H0 indicated]
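Pearson's r and the t statistic commonly used to test H0: ρ = 0 (df = n - 2) can be sketched as below. The ages and points are made up, and `pearson_r` and `t_for_r` are hypothetical helper names.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by both spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """Test statistic for H0: rho = 0, with df = n - 2 (undefined when |r| = 1)."""
    return r * sqrt((n - 2) / (1 - r * r))


age = [66, 70, 74, 78, 82]      # hypothetical ages of bowls players
points = [2, 4, 5, 4, 5]        # hypothetical points scored
r = pearson_r(age, points)
print(round(r, 3), round(t_for_r(r, len(age)), 3))
```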
Most common assumptions for linear analyses
3. Testing assumptions

▪ Independence
  ▪ Data was collected from independent sources
  ▪ Variable measurements were independent (e.g. regression)
▪ Homoscedasticity/homogeneity of variance
  ▪ Variance is equal in different (sub-)samples
▪ Normality
  ▪ Sampling distribution/errors/data follow a normal distribution --> have limited skew and kurtosis
Independence
3. Testing assumptions

▪ Data was collected from independent sources
  ▪ No repeated measures
  ▪ No mutual influence between participants
  ▪ No nested structures (see HLM module)
▪ Variable measurements were independent
  ▪ No priming, framing, context or other question order effects
  ▪ In regression-based models:
    ▪ Variables are unrelated to external (exogenous) variables
    ▪ Errors are independent
Homoscedasticity/homogeneity of variance
3. Testing assumptions

▪ One variable, multiple groups (e.g. t-test): the spread of values is equal across groups
  ▪ Visual test: scatter plot or box plot
  ▪ Statistical test: Levene’s test for equality of variances
    ▪ When significant (p < .05): no homoscedasticity (i.e. heteroscedasticity)
Levene’s test will usually be significant in large samples; use other tests (e.g. Hartley’s Fmax)
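Both statistics can be sketched with the standard library. Levene's W is shown in its original mean-centred form, i.e. a one-way ANOVA on absolute deviations from each group mean; SciPy's `scipy.stats.levene` centres on the median by default (the Brown-Forsythe variant), so results can differ. The data and the names `levene_w` and `hartley_fmax` are hypothetical.

```python
from statistics import mean, variance


def levene_w(groups):
    """Levene's W (mean-centred): one-way ANOVA F on absolute deviations from group means."""
    deviations = [[abs(v - mean(g)) for v in g] for g in groups]
    n_total = sum(len(d) for d in deviations)
    k = len(groups)
    grand = mean([z for d in deviations for z in d])
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (n_total - k) / (k - 1) * between / within


def hartley_fmax(groups):
    """Largest group variance divided by the smallest group variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


tight = [68, 69, 70, 71, 72]    # hypothetical group with little spread
wide = [60, 65, 70, 75, 80]     # hypothetical group with much more spread
print(levene_w([tight, wide]), hartley_fmax([tight, wide]))
```

W is compared against an F distribution with (k - 1, N - k) degrees of freedom; for Fmax, values further above 1 indicate more unequal variances.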
Homoscedasticity/homogeneity of variance
3. Testing assumptions

▪ Two variables (e.g. regression): the spread of errors/residuals is equal across different values of x
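Checking this assumption means fitting the model first and then looking at how the residuals spread across x. A rough descriptive sketch, assuming simple linear regression fitted by ordinary least squares; `spread_ratio` (mean squared residual in the upper half of x over the lower half) is a hypothetical helper, not a formal test, and the data are constructed so that the noise grows with x.

```python
from statistics import mean


def ols_residuals(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return the residuals."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def spread_ratio(x, residuals):
    """Mean squared residual for the upper half of x divided by that for the lower half."""
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2
    lower = [r * r for _, r in pairs[:half]]
    upper = [r * r for _, r in pairs[-half:]]
    return mean(upper) / mean(lower)


# Hypothetical data whose noise grows with x (heteroscedastic by construction).
x = list(range(1, 11))
noise = [0.5 * i * (-1) ** (i + 1) for i in x]
y = [2 * xi + e for xi, e in zip(x, noise)]
res = ols_residuals(x, y)
print(round(spread_ratio(x, res), 2))
```

A ratio near 1 suggests a similar spread; in a plot of residuals against x, a growing spread shows up as a funnel shape.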
Normality
3. Testing assumptions

▪ In many statistical tests
  ▪ The sampling distribution is assumed to be normal --> test the normality of the sample
  ▪ Visually testing normality of (sub-)sample data
    ▪ Histograms (see "Visualising data")
    ▪ Q-Q plots: theoretical vs. actual quantiles

[Figure: Q-Q plots showing the patterns produced by positive skew and positive kurtosis; image: "Normal normal qq" by Skbkekas - Wikipedia]
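The points of a Q-Q plot can be computed by pairing each sorted observation with the quantile that a normal distribution (with the sample's mean and s) would predict. A minimal sketch using `statistics.NormalDist.inv_cdf`; the plotting positions (i - 0.5)/n are one common convention among several, `qq_points` is a hypothetical name, and the ages are made up.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted value with the quantile a fitted normal distribution predicts.

    Uses plotting positions (i - 0.5) / n, one common convention among several.
    """
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]


sample = [61, 64, 66, 67, 68, 69, 70, 71, 72, 74, 76, 79]  # hypothetical ages
for theoretical, actual in qq_points(sample):
    print(f"{theoretical:6.1f}  {actual}")
```

Points close to the diagonal indicate normality; a bowed pattern suggests skew and an S-shaped one suggests kurtosis.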
Normality
3. Testing assumptions

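▪ Statistical tests for normality of (sub-)sample data
  ▪ Compute descriptives including skew and kurtosis
  ▪ Convert skew and kurtosis to z-scores, e.g.:

$z_{\text{skewness}} = \frac{S - 0}{SE_{\text{skewness}}}$, $z_{\text{kurtosis}} = \frac{K - 0}{SE_{\text{kurtosis}}}$ (S = skew, K = excess kurtosis)

The absolute z-value must be ≤ 1.96. Increase the cutoff to 2.58 in larger samples, and do not use this check in very large samples (n > 200).

  ▪ Shapiro-Wilk test: significant (p < .05) when the data are NOT normal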
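A minimal sketch of the z-score check, assuming moment-based estimates of skew and excess kurtosis with the large-sample standard errors sqrt(6/n) and sqrt(24/n). Statistics packages typically report bias-adjusted estimates with exact standard errors, so their values will differ somewhat in small samples; `skew_kurtosis_z`, `looks_normal` and the data are hypothetical.

```python
from math import sqrt


def skew_kurtosis_z(values):
    """Moment-based skew (g1) and excess kurtosis (g2), converted to z-scores.

    Uses the large-sample standard errors sqrt(6/n) and sqrt(24/n).
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew / sqrt(6 / n), kurtosis / sqrt(24 / n)


def looks_normal(values, cutoff=1.96):
    """True when both |z| values stay within the cutoff (2.58 in larger samples)."""
    z_skew, z_kurt = skew_kurtosis_z(values)
    return abs(z_skew) <= cutoff and abs(z_kurt) <= cutoff


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]                    # hypothetical
right_skewed = [1, 1, 1, 1, 1, 2, 2, 3, 4, 6, 10, 25]      # hypothetical
print(skew_kurtosis_z(symmetric), looks_normal(symmetric))
print(skew_kurtosis_z(right_skewed), looks_normal(right_skewed))
```

For the Shapiro-Wilk test, a library implementation such as `scipy.stats.shapiro` is the practical route.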
Normality
3. Testing assumptions

▪ In regression-based models
  ▪ The errors/residuals, not the indicators, need to be normally distributed
  ▪ The same visual principles as for Q-Q plots apply

[Figure: plots of regression residuals. Please note: in this case, the two graphs do not represent the same data.]
What if assumptions are violated?
3. Testing assumptions

▪ Correct erroneous data
▪ Exclude outliers
▪ Transform data, e.g.:
  ▪ Log, square root and reciprocal (1/x) transformations shorten the right tail (i.e. correct positive skew)
  ▪ The same transformations applied to the reversed score (highest score + 1 - score) correct negative skew
  The same transformation has to be applied to all variables that are compared directly
▪ Turn to tests that are robust against violations, or to non-parametric tests, e.g.
  ▪ Mann–Whitney U for group comparisons
  ▪ Kendall's tau for dependence between two variables
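The remedies above can be sketched in plain Python: one skewness measure before and after log and square-root transformations, the reverse-score step for negative skew, and simple versions of the Mann-Whitney U statistic and Kendall's tau. This is a minimal sketch with made-up data and hypothetical helper names; `kendall_tau_a` is the simplest tau variant and ignores ties (statistics packages usually report tau-b), and neither non-parametric function computes a p-value.

```python
import math


def skewness(values):
    """Moment-based sample skewness (g1)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def reverse_score(values):
    """Reflect scores as highest score + 1 - score, turning negative skew into positive."""
    highest = max(values)
    return [highest + 1 - v for v in values]


def mann_whitney_u(group_a, group_b):
    """U for group_a: count of (a, b) pairs with a > b, counting ties as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; ignores ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


right = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 21, 40]   # hypothetical, positively skewed
print(skewness(right))
print(skewness([math.log(v) for v in right]))   # log needs values > 0
print(skewness([math.sqrt(v) for v in right]))
# Note: 1/x reverses the order of scores, so interpret results with care.

left = [1, 19, 27, 32, 35, 36, 37, 37, 38, 38, 39, 39]  # hypothetical, negatively skewed
print(skewness(left), skewness([math.log(v) for v in reverse_score(left)]))

print(mann_whitney_u([3, 4, 5], [1, 2, 3]))     # 8.5
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```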
Scales and factors - basics
4. Scales and factors

▪ Scales are sets of indicators that measure the same latent variable / factor
  ≠ response scales!
▪ E.g. "To aid me in my teaching, overall, I feel PowerPoint … is:" (factor: Ease of use)
  ▪ Easy to learn
  ▪ Easy to manipulate
  ▪ Clear to interact with
  ▪ Flexible to interact with
  ▪ Difficult to master (reverse scored)
  ▪ Very cumbersome (reverse scored)
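A scale score is often computed as the mean of its items after reversing the reverse-scored ones. A minimal sketch, assuming a 7-point response scale (1 to 7) and hypothetical item keys based on the ease-of-use example; `scale_score` is a hypothetical helper. On a scale starting at 1, min + max - x equals the "highest score + 1 - score" rule used for transformations.

```python
SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point response scale

# Item key -> True when the item is reverse scored (keys are hypothetical).
EASE_OF_USE_ITEMS = {
    "easy_to_learn": False,
    "easy_to_manipulate": False,
    "clear_to_interact_with": False,
    "flexible_to_interact_with": False,
    "difficult_to_master": True,
    "very_cumbersome": True,
}


def scale_score(responses, items=EASE_OF_USE_ITEMS):
    """Average item score after reversing reverse-scored items (min + max - x)."""
    scored = [
        SCALE_MIN + SCALE_MAX - responses[item] if reverse else responses[item]
        for item, reverse in items.items()
    ]
    return sum(scored) / len(scored)


respondent = {  # hypothetical answers of one respondent
    "easy_to_learn": 6, "easy_to_manipulate": 7, "clear_to_interact_with": 6,
    "flexible_to_interact_with": 5, "difficult_to_master": 2, "very_cumbersome": 1,
}
print(round(scale_score(respondent), 2))
```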
Scales and factors - basics
4. Scales and factors

▪ Visualisation of a scale with three indicators measuring one latent variable / factor:
Principal component analysis
4. Scales and factors

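▪ Run a PCA with no restriction on the number of factors and with a scree plot
▪ Decide how many factors to retain based on eigenvalues, the scree plot and R²
  ▪ Separate the mountain from the scree (retain the factors before the plot levels off)
  ▪ Eigenvalue > 1
    ▪ Eigenvalue: variance explained by a factor, in units of standardized variables (eigenvalues sum to the number of variables, so eigenvalue / # variables = proportion of variance explained)
  ▪ Cumulative R² > .6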
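Where the eigenvalues come from can be illustrated without any library: they are the eigenvalues of the items' correlation matrix. This pure-Python sketch uses the cyclic Jacobi rotation method on a hypothetical 4-item correlation matrix and then applies the eigenvalue > 1 and cumulative R² > .6 rules; in practice a PCA routine or `numpy.linalg.eigvalsh` would be used, loadings and rotation are not shown, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (largest first)."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that makes element (p, q) of J^T A J zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                j = [[float(r == col) for col in range(n)] for r in range(n)]
                j[p][p] = j[q][q] = c
                j[p][q], j[q][p] = s, -s
                a = matmul(transpose(j), matmul(a, j))
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix: items 1-2 and items 3-4 correlate strongly.
R = [[1.0, 0.6, 0.1, 0.1],
     [0.6, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.5],
     [0.1, 0.1, 0.5, 1.0]]

eigenvalues = jacobi_eigenvalues(R)
retained = [e for e in eigenvalues if e > 1]          # eigenvalue > 1 rule
cumulative_r2 = sum(retained) / len(R)                # eigenvalues sum to # variables
for k, e in enumerate(eigenvalues, start=1):          # crude text "scree plot"
    print(f"factor {k}: {e:.3f} {'*' * round(e * 10)}")
print(len(retained), round(cumulative_r2, 3))
```

Here two factors have eigenvalues above 1 and together explain 77.5% of the variance (3.1 of 4 units), so both rules point to two factors.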
Principal component analysis
4. Scales and factors

▪ Run the PCA again
  ▪ Restrict the number of extracted factors
  ▪ Rotate factors orthogonally or obliquely based on theory (or trial and error/inspection of the component correlation matrix)
  ▪ Study the component matrix (orthogonal) or pattern matrix (oblique) to interpret factors, and exclude indicators when
    ▪ The loading is small (< .4/.7) on all factors
    ▪ Loadings are high (> .4/.7) on multiple factors
    ▪ The difference between loadings on different factors is < .2
  ▪ Run the PCA again after each exclusion
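The three exclusion rules can be applied mechanically to a loading matrix, which helps apply them consistently. A sketch with a hypothetical two-factor pattern matrix and the .4 threshold (pass `threshold=0.7` for the stricter variant); `items_to_review` is a hypothetical helper that only flags items, since the PCA should be re-run after removing each one.

```python
def items_to_review(loadings, threshold=0.4, min_gap=0.2):
    """Apply the exclusion rules to a loading matrix {item: [loading per factor]}."""
    flagged = {}
    for item, row in loadings.items():
        sizes = sorted((abs(value) for value in row), reverse=True)
        reasons = []
        if sizes[0] < threshold:
            reasons.append("small loading on all factors")
        if sum(size > threshold for size in sizes) > 1:
            reasons.append("high loadings on multiple factors")
        if len(sizes) > 1 and sizes[0] - sizes[1] < min_gap:
            reasons.append("loadings differ by less than min_gap")
        if reasons:
            flagged[item] = reasons
    return flagged


# Hypothetical pattern matrix for two factors.
pattern = {
    "eou1": [0.82, 0.10],
    "eou2": [0.78, 0.15],
    "eou3": [0.35, 0.22],
    "use1": [0.12, 0.80],
    "use2": [0.55, 0.48],
}
for item, reasons in items_to_review(pattern).items():
    print(item, reasons)
```

In this made-up matrix, eou3 loads weakly everywhere and use2 cross-loads; removing one of them and re-running the PCA may change the other's loadings.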
Principal component analysis
4. Scales and factors

▪ Once a stable solution has been reached, evaluate the reliability and unidimensionality of scales
  ▪ Inter-item correlation when a factor has 2 indicators
    ▪ Should be significant
  ▪ Cronbach’s alpha when a factor has > 2 indicators
    ▪ Should be higher than .7
    ▪ “Alpha if item deleted” should be lower than alpha
    ▪ If not: exclude the item and run the PCA again
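Cronbach's alpha and "alpha if item deleted" follow directly from item and total-score variances: alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). A minimal sketch with made-up responses (six respondents, four items); `cronbach_alpha` and `alpha_if_item_deleted` are hypothetical helper names.

```python
from statistics import variance


def cronbach_alpha(items):
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # one total per respondent
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs at least 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores of six respondents on four items (one list per item).
items = [
    [1, 2, 3, 4, 5, 6],
    [2, 2, 3, 5, 5, 6],
    [1, 3, 3, 4, 6, 6],
    [5, 1, 4, 2, 6, 3],
]
alpha = cronbach_alpha(items)
for i, a in enumerate(alpha_if_item_deleted(items), start=1):
    note = "  <- higher than alpha: consider excluding" if a > alpha else ""
    print(f"item {i}: alpha if deleted = {a:.3f}{note}")
print(f"alpha = {alpha:.3f}")
```

In this made-up data the fourth item is unrelated to the others: alpha rises from about .81 to about .98 without it, so by the rule above it would be excluded and the PCA re-run.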
End of Part 1
© Copyright 2017 W. Mertens, A. Pugliese & J. Recker. All Rights Reserved.
