Stata Tests

This document provides guidance on summarizing univariate and bivariate data using Stata commands. It outlines how to summarize categorical and numeric variables individually using commands like tabulate, graph bar, graph pie, summarize, histogram, and graph box. It also provides instructions for summarizing relationships between variables, including correlations between numeric variables, differences between groups for categorical and numeric variables using t-tests, chi-square tests for categorical variables, and effect size calculations. Formatting commands like import, label, recode, generate, replace, and rename are also included.

Summarising univariate categorical data

tabulate [varcat]
graph bar ([count/percent]), over([varcat])
graph pie, over([varcat])
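
As a minimal sketch of the three commands above, the lines below use the auto dataset that ships with Stata. The dataset and the variable foreign are illustrative assumptions, not part of the original sheet.

```stata
* Load Stata's bundled example dataset (assumption: chosen only for illustration)
sysuse auto, clear

* Frequency table for the categorical variable foreign (Domestic / Foreign)
tabulate foreign

* Bar chart of counts per category; use (percent) instead of (count) for percentages
graph bar (count), over(foreign)

* Pie chart with one slice per category
graph pie, over(foreign)
```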

Summarising univariate numeric data

summarize [varnum], detail
histogram [varnum], frequency normal
graph box [varnum]
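
A short worked example may help here. It assumes the bundled auto dataset and its numeric variable mpg, neither of which appears in the original sheet, and shows that summarize leaves its results in r() for later use.

```stata
* Load Stata's bundled example dataset (assumption: chosen only for illustration)
sysuse auto, clear

* Detailed summary: percentiles, mean, SD, skewness, kurtosis
summarize mpg, detail

* Stored results from summarize, detail can be reused in later calculations
display "mean = " r(mean) "   median = " r(p50) "   sd = " r(sd)

* Histogram of frequencies with a normal curve overlaid
histogram mpg, frequency normal

* Vertical box plot
graph box mpg
```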

Summarising bivariate categorical data

tabulate [varcat1] [varcat2], [row/column/cell]
graph bar ([count/percent]), over([varcat1]) over([varcat2])
graph pie, over([varcat1]) by([varcat2])

Summarising bivariate (one categorical variable, one numeric variable) data

by [varcat], sort: summarize [varnum], detail
graph box [varnum], over([varcat])
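
The group-wise summary above can be sketched on the bundled auto dataset, taking mpg as the numeric variable and foreign as the categorical one (both are illustrative assumptions). The tabstat line is an optional compact alternative, not part of the original sheet.

```stata
* Load Stata's bundled example dataset (assumption: chosen only for illustration)
sysuse auto, clear

* Detailed summary of mpg within each category of foreign
by foreign, sort: summarize mpg, detail

* Side-by-side box plots, one per category
graph box mpg, over(foreign)

* Optional: a compact table of the same key statistics per group
tabstat mpg, by(foreign) statistics(n mean sd median)
```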

Summarising bivariate numeric data

See Pearson Correlation

|r|
0.0 – 0.1 negligible
0.1 – 0.3 weak
0.3 – 0.5 moderate
0.5 – 1.0 strong

ONE-SAMPLE T-/Z-TEST (comparing one numerical variable to a known μ)

histogram [varnum], frequency normal
swilk [varnum] → (Shapiro-Wilk) Is normally distributed?
ttest [varnum] == [μ] or ztest [varnum] == [μ], sd([σ])
Manually calculate effect size: d = t/(n ^ 0.5)
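
The effect size does not have to be computed by hand, because ttest stores t and the sample size in r(). The sketch below assumes the bundled auto dataset and an arbitrary comparison value of 20 for μ. Both are illustrative choices.

```stata
* Load Stata's bundled example dataset (assumption: chosen only for illustration)
sysuse auto, clear

* Check the shape of the distribution, then test normality
histogram mpg, frequency normal
swilk mpg

* One-sample t-test of H0: mean(mpg) = 20 (20 is an arbitrary illustrative value)
ttest mpg == 20

* Cohen's d from the stored results: d = t / sqrt(n)
display "Cohen's d = " r(t) / sqrt(r(N_1))
```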

χ2 GOODNESS OF FIT TEST (comparing one categorical variable to expected proportions)

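csgof [varcat], expperc([n1] [n2] ...) → Are all categories expected to have counts of 5+? (csgof is user-written; locate it with findit csgof; expected percentages are separated by spaces)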
Manually calculate effect size: W = (χ2 / n) ^ 0.5

PEARSON CORRELATION (comparing two+ numeric variables)

graph twoway (lfit [varnumy] [varnumx]) (scatter [varnumy] [varnumx]) → Non-linear relationship?
graph matrix [varnum1] [varnum2] [varnum3] … → Generates multiple scatterplots
pwcorr [varnum1] [varnum2] [varnum3] …, sig
ci2 [varnum1] [varnum2], corr → Confidence interval for r, one pair of variables at a time (ci2 is user-written; locate it with findit ci2)
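
The correlation steps can be sketched as follows on the bundled auto dataset (an illustrative assumption). Because ci2 may not be installed, the last lines compute an approximate 95% confidence interval for r with the Fisher z transformation instead. That is a swapped-in technique, and the scalar names are hypothetical.

```stata
* Load Stata's bundled example dataset (assumption: chosen only for illustration)
sysuse auto, clear

* Scatterplot with fitted line, to check that the relationship looks linear
graph twoway (lfit mpg weight) (scatter mpg weight)

* Scatterplot matrix for several numeric variables
graph matrix price mpg weight

* Pairwise correlations with p-values
pwcorr price mpg weight, sig

* correlate stores r and n for the first pair of variables
correlate mpg weight
scalar rho_hat  = r(rho)
scalar n_obs    = r(N)

* Fisher z approximation for a 95% CI (assumes bivariate normality)
scalar z_fisher = atanh(rho_hat)
scalar se_z     = 1 / sqrt(n_obs - 3)
display "r = " rho_hat
display "95% CI: " tanh(z_fisher - invnormal(0.975) * se_z) " to " tanh(z_fisher + invnormal(0.975) * se_z)
```

The scalar names are deliberately long: Stata resolves a name as a variable (including abbreviations) before a scalar, so a scalar called r would be read as rep78 in this dataset.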

INDEPENDENT SAMPLES T-TEST (comparing one numeric and one between-subjects categorical variable)
histogram [varnum], by([varcat]) freq
by [varcat], sort: swilk [varnum] → (Shapiro-Wilk) Are both normally distributed?
robvar [varnum], by([varcat]) → (Levene’s) Accept H0 of equal variance?
ttest [varnum], by([varcat])
Manually calculate effect size: d = t * ((1/n1 + 1/n2) ^ 0.5)

|d|
<0.2 negligible
0.2 – 0.5 small
0.5 – 0.8 moderate
0.8+ large

PAIRED T-TEST (comparing one numeric and one within-subjects categorical variable)

generate [diff] = [varnum1a] - [varnum1b]
histogram [diff]
swilk [diff] → (Shapiro-Wilk) Are differences normally distributed?
ttest [varnum1a] == [varnum1b]
Manually calculate effect size: d = t/(nd ^ 0.5), where nd is the number of pairs (differences)
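
Both two-sample procedures above can be sketched with datasets bundled with Stata: auto for the independent-samples case and bpwide for the paired case. The datasets and variable names are illustrative assumptions. The effect sizes are taken from the results that ttest stores in r(), and the esize line is an optional cross-check available in Stata 13 or later.

```stata
* --- Independent samples: mpg by foreign (assumption: bundled auto dataset) ---
sysuse auto, clear
histogram mpg, by(foreign) freq
by foreign, sort: swilk mpg
robvar mpg, by(foreign)

* Pooled-variance t-test; the d formula below applies to this version of the test
ttest mpg, by(foreign)
display "Cohen's d (independent) = " r(t) * sqrt(1/r(N_1) + 1/r(N_2))

* Optional cross-check with Stata's own effect-size command
esize twogroup mpg, by(foreign)

* --- Paired samples: blood pressure before and after (assumption: bundled bpwide) ---
sysuse bpwide, clear
generate diff = bp_before - bp_after
histogram diff
swilk diff
ttest bp_before == bp_after

* For the paired test, r(N_1) holds the number of pairs
display "Cohen's d (paired) = " r(t) / sqrt(r(N_1))
```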

χ2 TEST OF INDEPENDENCE (comparing two unrelated categorical variables)

tabulate [varcat1] [varcat2], row expected → Are all combinations expected to have 5+ samples?
tabulate [varcat1] [varcat2], chi2 expected row
Manually calculate effect size: W = (χ2 / n) ^ 0.5
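
As a sketch of the test and the effect-size step, the lines below use the bundled nlsw88 dataset with collgrad and union as the two categorical variables (illustrative assumptions). tabulate with the chi2 option stores the statistic and the sample size in r(), so W can be computed directly.

```stata
* Load a bundled dataset with several categorical variables (assumption)
sysuse nlsw88, clear

* Inspect expected counts first: each cell should have an expected count of 5+
tabulate collgrad union, row expected

* Chi-squared test of independence
tabulate collgrad union, chi2 expected row

* Effect size W = sqrt(chi2 / n), using the stored results
display "W = " sqrt(r(chi2) / r(N))
```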

W
0.0 – 0.1 negligible
0.1 – 0.3 small
0.3 – 0.5 moderate
0.5 – 1.0 large

McNEMAR’S TEST (comparing two related categorical variables, 2x2 table)

mcci [#cellA] [#cellB] [#cellC] [#cellD]
Manually calculate effect size: W = (χ2 / n) ^ 0.5
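
A minimal sketch with made-up cell counts is shown below; the four numbers are purely hypothetical. It assumes n in the formula is the total number of pairs, and it relies on mcci leaving the McNemar statistic in r(chi2). The hand calculation is included as a cross-check on that assumption.

```stata
* Hypothetical 2x2 counts of paired outcomes (illustrative numbers only):
* cells A and D are concordant pairs, cells B and C are discordant pairs
mcci 30 12 5 53

* Effect size W = sqrt(chi2 / n), with n = total number of pairs
display "W = " sqrt(r(chi2) / (30 + 12 + 5 + 53))

* Cross-check: McNemar's statistic computed by hand from the discordant cells
display "chi2 by hand = " (12 - 5)^2 / (12 + 5)
```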

STATA FORMATTING COMMANDS

import excel "[path]", sheet("[sheetname]") firstrow → Imports a worksheet, using the first row as variable names
label define [labelset] 0 "[label0]" 1 "[label1]" … → Creates label set
label values [var1] [var2] [var3] [labelset] → Assigns label set to multiple variables
recode [varnum] (min/[n1] = 0) ([n1]/[n2] = 1) ([n2]/max = 2), generate([varcat])
→ Generates new categorical variable from ranges of numeric variable
generate [var1] = 0 → Generates new variable, with value 0 for all subjects
generate [difference] = [varnum1] - [varnum2]
replace [var1] = 1 if ([var2] >= 0.5) → Replaces variable values if condition is met. Missing values count as larger than any number, so add & !missing([var2]) to exclude them.
rename [oldvar] [newvar]
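
The formatting commands can be combined as in the sketch below, which uses the bundled auto dataset. The cut points, label names, and new variable names are hypothetical and chosen only for illustration. import excel is left out because it needs a file on disk.

```stata
* Load Stata's bundled example dataset (assumption: chosen only for illustration)
sysuse auto, clear

* Create a label set for a three-level categorical variable
label define sizelbl 0 "light" 1 "medium" 2 "heavy"

* Cut weight into three ranges; a value on a boundary (e.g. 2500) takes the first matching rule
recode weight (min/2500 = 0) (2500/3500 = 1) (3500/max = 2), generate(weightcat)

* Attach the label set to the new variable
label values weightcat sizelbl

* Build a 0/1 indicator: start at 0, then switch to 1 where the condition holds
generate highmpg = 0
replace highmpg = 1 if mpg >= 25 & !missing(mpg)

* Rename the indicator and check the result
rename highmpg efficient
tabulate weightcat efficient
```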