Summarising univariate categorical data
tabulate [varcat]
graph bar ([count/percent]), over([varcat])
graph pie, over([varcat])
Summarising univariate numeric data
summarize [varnum], detail
histogram [varnum], frequency normal
graph vbox [varnum]
Summarising bivariate categorical data
tabulate [varcat1] [varcat2], [row/column/cell]
graph bar ([count/percent]), over([varcat1]) over([varcat2])
graph pie, over([varcat1]) by([varcat2])
Summarising bivariate (one categorical variable, one numeric variable) data
by [varcat], sort: summarize [varnum], detail
graph box [varnum], over([varcat])
Summarising bivariate numeric data
See Pearson Correlation.
|r|
0.0 – 0.1    negligible
0.1 – 0.3    weak
0.3 – 0.5    moderate
0.5 – 1.0    strong
ONE-SAMPLE T-/Z-TEST (comparing one numerical variable to a known μ)
histogram [varnum], frequency normal
swilk [varnum] → (Shapiro-Wilk) Is [varnum] normally distributed?
ttest [varnum] == [μ] or ztest [varnum] == [μ], sd([σ])
Manually calculate effect size: d = t/(n ^ 0.5)
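The effect size does not have to be copied by hand: `ttest` leaves its results in `r()`. A minimal sketch, assuming Stata's bundled `auto` dataset with `mpg` standing in for [varnum] and a hypothetical known μ of 20; the scalar name `one_d` is illustrative.

```stata
* Stata's bundled example data; mpg stands in for [varnum]
sysuse auto, clear

* One-sample t-test against a hypothetical known mean of 20
ttest mpg == 20

* ttest leaves t in r(t) and the sample size in r(N_1)
* scalar() forces Stata to read the scalar, not a variable with a similar name
scalar one_d = r(t) / sqrt(r(N_1))
display "Cohen's d = " %6.3f scalar(one_d)
```

Because t = (mean − μ)/(s/√n), this d is the same as (mean − μ)/s.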
χ2 GOODNESS OF FIT TEST (comparing one categorical variable to expected proportions)
csgof [varcat], expperc([n1],[n2],...) → Is every category's expected count 5 or more?
Manually calculate effect size: W = (χ2 / n) ^ 0.5
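The W formula is simple arithmetic once the χ2 is read off the `csgof` output. A minimal sketch with illustrative numbers only (χ2 = 7.5, n = 120, not real results); it prints W = 0.250, which falls in the "small" band of the W table.

```stata
* Illustrative numbers only: use the chi2 printed by csgof and your own n
scalar gof_chi2 = 7.5
scalar gof_n    = 120

* W = (chi2 / n) ^ 0.5
scalar gof_w = (scalar(gof_chi2) / scalar(gof_n)) ^ 0.5
display "W = " %5.3f scalar(gof_w)
```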
PEARSON CORRELATION (comparing two+ numeric variables)
graph twoway (lfit [varnumy] [varnumx]) (scatter [varnumy] [varnumx]) → Non-linear relationship?
graph matrix [varnum1] [varnum2] [varnum3] … → Generates multiple scatterplots
pwcorr [varnum1] [varnum2] [varnum3] …, sig
ci2 [varnum1] [varnum2], corr → Confidence interval for r (one pair of variables at a time)
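To attach the |r| interpretation bands to a result, `correlate` stores Pearson's r in `r(rho)`. A minimal sketch using Stata's bundled `auto` data, with `mpg` and `weight` as stand-ins; the scalar `abs_r` and local `size` are illustrative names, and putting a value that sits exactly on a band boundary into the higher band is a choice made here, not a rule from the sheet.

```stata
sysuse auto, clear

* correlate stores Pearson's r in r(rho); mpg and weight stand in for two [varnum]s
quietly correlate mpg weight
scalar abs_r = abs(r(rho))

* Apply the |r| bands (a boundary value goes to the higher band)
local size "strong"
if scalar(abs_r) < 0.5 local size "moderate"
if scalar(abs_r) < 0.3 local size "weak"
if scalar(abs_r) < 0.1 local size "negligible"
display "r = " %6.3f r(rho) " (`size')"
```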
INDEPENDENT SAMPLES T-TEST (comparing one numeric and one between-subjects categorical variable)
histogram [varnum], by([varcat]) freq
by [varcat], sort: swilk [varnum] → (Shapiro-Wilk) Are both normally distributed?
robvar [varnum], by([varcat]) → (Levene’s, reported as W0) Fail to reject H0 of equal variances?
ttest [varnum], by([varcat]) → Add the unequal option if Levene’s test rejects equal variances
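For this test the effect size is d = t * ((1/n1 + 1/n2) ^ 0.5), and `ttest` stores t and both group sizes. A minimal sketch using Stata's bundled `auto` data, comparing `mpg` between domestic and foreign cars; the scalar name `ind_d` is illustrative. The sign of d follows the group order (group 1 minus group 2), so report |d|. The formula assumes the default pooled-variance test, not the `unequal` variant.

```stata
sysuse auto, clear

* Two-sample t-test: mpg compared between domestic (0) and foreign (1) cars
ttest mpg, by(foreign)

* ttest leaves t in r(t) and the group sizes in r(N_1) and r(N_2)
scalar ind_d = r(t) * sqrt(1/r(N_1) + 1/r(N_2))
display "Cohen's d = " %6.3f scalar(ind_d)
```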
Manually calculate effect size: d = t * ((1/n1 + 1/n2) ^ 0.5)
|d|
< 0.2        negligible
0.2 – 0.5    small
0.5 – 0.8    moderate
0.8+         large
PAIRED T-TEST (comparing one numeric and one within-subjects categorical variable)
generate [diff] = [varnum1a] - [varnum1b]
histogram [diff]
swilk [diff] → (Shapiro-Wilk) Are differences normally distributed?
ttest [varnum1a] == [varnum1b]
Manually calculate effect size: d = t/(n ^ 0.5), where n is the number of pairs
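The paired effect size can also be taken from `ttest`'s stored results. A minimal sketch using Stata's bundled `bpwide` data (each patient measured before and after); it assumes `r(N_1)` holds the number of pairs after a paired test, which `return list` lets you confirm. The scalar name `pair_d` is illustrative.

```stata
* Stata's bundled blood-pressure example: each patient measured twice
sysuse bpwide, clear

* Paired t-test of the before/after readings
ttest bp_before == bp_after

* Assumption: r(N_1) is the number of pairs here (check with return list)
scalar pair_d = r(t) / sqrt(r(N_1))
display "Cohen's d = " %6.3f scalar(pair_d)
```

This d equals the mean of the differences divided by their standard deviation.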
χ2 TEST OF INDEPENDENCE (comparing two unrelated categorical variables)
tabulate [varcat1] [varcat2], row expected → Is every cell's expected count 5 or more?
tabulate [varcat1] [varcat2], chi2 expected row
Manually calculate effect size: W = (χ2 / n) ^ 0.5
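After `tabulate ..., chi2` the statistic and sample size are stored, so W can be computed directly. A minimal sketch using Stata's bundled `nlsw88` data, with `married` and `collgrad` standing in for the two [varcat]s (a large sample, so expected counts are well above 5); the scalar name `tab_w` is illustrative.

```stata
* Stata's bundled NLSW 1988 extract; married and collgrad stand in for the two [varcat]s
sysuse nlsw88, clear

* Pearson chi2 test of independence, showing expected counts
tabulate married collgrad, chi2 expected

* tabulate leaves chi2 in r(chi2) and the number of observations in r(N)
scalar tab_w = sqrt(r(chi2) / r(N))
display "W = " %6.3f scalar(tab_w)
```

For a 2x2 table this W equals the absolute value of Cramér's V (phi).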
W
0.0 – 0.1    negligible
0.1 – 0.3    small
0.3 – 0.5    moderate
0.5 – 1.0    large
McNEMAR’S TEST (comparing two related categorical variables, 2x2 table)
mcci [#cellA] [#cellB] [#cellC] [#cellD]
Manually calculate effect size: W = (χ2 / n) ^ 0.5
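McNemar's χ2 uses only the two discordant cells, χ2 = (b − c)² / (b + c), and here n is the total number of pairs. A minimal sketch with hypothetical counts (a = 30, b = 15, c = 5, d = 50, not real data): it gives χ2 = 5 and W ≈ 0.224 (small). It assumes `mcci` reports the uncorrected statistic, so its printed McNemar χ2 should match.

```stata
* Hypothetical counts, in the order mcci expects: a b c d
local a 30
local b 15
local c 5
local d 50

* Only the discordant cells b and c enter McNemar's chi2 (no continuity correction)
scalar mc_chi2 = (`b' - `c')^2 / (`b' + `c')
scalar mc_n    = `a' + `b' + `c' + `d'
scalar mc_w    = (scalar(mc_chi2) / scalar(mc_n)) ^ 0.5
display "chi2 = " %5.2f scalar(mc_chi2) "   W = " %5.3f scalar(mc_w)

* Compare with the McNemar chi2 that mcci prints for the same table
mcci `a' `b' `c' `d'
```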
STATA FORMATTING COMMANDS
import excel "[path]", sheet("[sheetname]") firstrow → Reads one sheet, using the first row as variable names
label define [labelset] 0 "[label0]" 1 "[label1]" … → Creates label set
label values [var1] [var2] [var3] [labelset] → Assigns label set to multiple variables
recode [varnum] (min/[n1] = 0) ([n1]/[n2] = 1) ([n2]/max = 2), generate([varcat])
→ Generates new categorical variable from ranges of numeric variable (a value on a boundary, such as [n1], takes the first rule that matches)
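Because `recode` applies the first rule that matches, overlapping range ends are safe: a boundary value goes to the earlier category. A minimal sketch using Stata's bundled `auto` data; the cut points 20 and 30 and the name `mpg_cat` are illustrative.

```stata
sysuse auto, clear

* Illustrative cut points at 20 and 30 mpg
recode mpg (min/20 = 0) (20/30 = 1) (30/max = 2), generate(mpg_cat)

* Cars exactly on a boundary (20 or 30) are coded by the first rule that matches
list make mpg mpg_cat if inlist(mpg, 20, 30)
```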
generate [var1] = 0 → Generates new variable, with value 0 for all subjects
generate [difference] = [varnum1] - [varnum2]
replace [var1] = 1 if [var2] >= 0.5 & !missing([var2]) → Replaces values where the condition is met (Stata treats missing as larger than any number, so exclude it)
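The missing-value guard matters because Stata sorts missing above every number, so a bare `>=` condition would also be true for missing values. A minimal sketch using Stata's bundled `auto` data, where `rep78` has some missing values; the name `good_rep` is illustrative.

```stata
sysuse auto, clear

* rep78 contains missing values, which Stata treats as larger than any number
generate good_rep = 0
replace good_rep = 1 if rep78 >= 4 & !missing(rep78)

* The missing option shows that cars with missing rep78 stayed at 0
tabulate rep78 good_rep, missing
```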
rename [oldvar] [newvar]