Unit 545 Differences Between Two or More Groups Non Parametric With Answers
09 January 2024
Selecting tests
Which test would you select?
1. Two groups of students were asked the question: “How likely is it that you will go on
holiday this year?” The three answering categories of this question were “not very
much”, “somewhat” and “very much”. Suppose you want to test the (null-)
hypothesis that in the population the two groups do not differ in the way they
respond to the question. Which variant of the linear model or non-parametric test
would you use for this?
The Mann-Whitney-Wilcoxon (MWW) test. Because the dependent variable is ordinal, a non-parametric test
should be used, and because there are only two groups, the MWW test is preferred.
It tests whether the answers in one group tend to be higher than in the other, which is often summarised as testing whether the medians of the groups differ.
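A minimal sketch of how this could look in R, assuming made-up answers for two hypothetical groups of eight students (the data and the names `holiday`, `answer` and `answer_num` are illustrative and not part of the exercise). The ordered answer categories are converted to the numbers 1 to 3 before calling `wilcox.test`:

```r
# Hypothetical answers of two groups of 8 students (illustrative data only)
levels_answer <- c("not very much", "somewhat", "very much")
holiday <- data.frame(
  group  = rep(c("A", "B"), each = 8),
  answer = factor(
    c("not very much", "not very much", "somewhat", "somewhat",
      "somewhat", "not very much", "very much", "somewhat",
      "somewhat", "very much", "very much", "very much",
      "somewhat", "very much", "very much", "not very much"),
    levels = levels_answer, ordered = TRUE
  )
)

# wilcox.test needs a numeric response, so use the category order (1, 2, 3)
holiday$answer_num <- as.integer(holiday$answer)

# exact = FALSE because the many ties rule out an exact p-value
mww <- wilcox.test(answer_num ~ group, data = holiday, exact = FALSE)
mww
```

Only the order of the categories is used here: any other increasing coding (for example 1, 2, 10) would give the same ranks and therefore the same test result.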
2. Three groups of 3 x 60 students were randomly assigned to three teaching methods.
The grade afterwards was used to test the difference between the teaching methods.
Which variant of the linear model or non-parametric test would you use for this?
Because there are three groups and the sample is rather large, a (Welch) ANOVA or a linear
model with a nominal independent variable (dummies), which is the same as a (non-Welch)
ANOVA, is preferred. Parametric tests are generally more powerful when their assumptions are reasonably met.
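The equivalence mentioned in this answer can be checked with a small sketch. It assumes simulated grades for three hypothetical teaching methods; the data, the seed and the names `grades`, `method` and `grade` are illustrative only:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Simulated grades for 3 x 60 students (hypothetical data)
grades <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 60)),
  grade  = c(rnorm(60, 6.5, 1), rnorm(60, 7.0, 1), rnorm(60, 6.8, 1))
)

# Linear model with a nominal predictor (R creates the dummies itself)
fit  <- lm(grade ~ method, data = grades)
f_lm <- anova(fit)[["F value"]][1]

# Classical ANOVA assuming equal variances: same F as the linear model
f_classic <- unname(oneway.test(grade ~ method, data = grades,
                                var.equal = TRUE)$statistic)

# Welch ANOVA: does not assume equal variances across the groups
welch <- oneway.test(grade ~ method, data = grades, var.equal = FALSE)

all.equal(f_lm, f_classic)
welch$p.value
```

The Welch variant is the safer default when the group variances may differ; with equal group sizes and similar variances the two versions usually lead to the same conclusion.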
3. Five relatively small groups of politicians have responded to the statement: “Global
warming is fake news”. The politicians were able to respond to this question on a 7-
point Likert scale (with 1 = “not at all agree” to 7 = “totally agree”). Suppose a
researcher wants to test the hypothesis that in the population the groups do not
differ in the way they respond to the question. The researcher has no problem in
treating the Likert scale as an interval-level (quantitative) variable instead of
ordinal. However, she finds that when she runs a linear model, the residuals do not
show normal distributions in each group. Even after trying several different ways of
transforming the dependent variable, the residuals are far removed from showing a
normal distribution.
What would you recommend the researcher? Here are the options:
A. Consider a Kruskal-Wallis test.
B. Remove data points that cause the non-normality and run the linear model again.
C. Collect more data until the residuals show a more normal distribution and run the linear
model again.
D. There might be errors in the data: check if the problem goes away if you change a few
values in the data matrix. Then run the linear model again.
Option A: consider a Kruskal-Wallis test, the non-parametric alternative for comparing more than two groups.
Removing data points is often arbitrary and reduces the possibilities to generalize to the
population of politicians (so not B). Collecting more data requires more politicians, and why would we
expect the distribution to change if we do that? (so not C). Checking data is always good, but
checking or changing only the values you do not like is confirmation bias (so not D).
You will now do some of the tests covered in this unit.
# geom_histogram(aes(y =..density..))
# +
# stat_function(fun = dnorm,
# args = list(mean = mean(asthma),
# sd = sd(asthma)))
We indeed have a skewed dependent variable with a small number of cases, so we opt for the
(non-parametric) Mann–Whitney–Wilcoxon test.
7. Create a scatter (or jitter) plot with these data. What do you think: are the groups
really different?
data %>%
  ggplot(aes(x = group, y = asthma)) +
  geom_point()
The groups look different! However, the sample size is very small, so the difference may
simply be the result of random sampling.
8. Check whether the groups are really different by using the Mann–Whitney–
Wilcoxon test. Check the RHelpdesk file for this when needed.
test <- wilcox.test(asthma ~ group,
                    data = data,
                    exact = FALSE)
test
##
## Wilcoxon rank sum test with continuity correction
##
## data: asthma by group
## W = 22, p-value = 0.05855
## alternative hypothesis: true location shift is not equal to 0
Under the null hypothesis the two groups come from the same distribution (read: they come from the
same population; read: there is no difference between the medians of the two groups).
Any difference between the groups in the sample is then due to chance alone.
The W statistic and its sampling distribution give a p-value. This p-value
is the probability of finding a W at least as extreme as the one observed if the null hypothesis is true.
It is not the probability that the null hypothesis is true. In this case the p-value (0.059) is low, but still above the cut-off point of 0.05, so we
CANNOT exclude the possibility that there is actually NO difference between the two groups
(so: the groups may well not be different).
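What "by chance" means can be illustrated by reshuffling the group labels. The sketch below assumes hypothetical scores for two groups of five (not the asthma data of the exercise); the helper `w_stat` and all other names are made up for this illustration. It approximates the p-value as the share of random relabelings that give a W at least as extreme as the observed one:

```r
set.seed(2024)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical scores for two groups of five (not the exercise data)
score <- c(9, 6, 8, 4, 7, 3, 5, 2, 4, 2)
group <- rep(c("C", "T"), each = 5)

# W as reported by wilcox.test: rank sum of the first group minus n1(n1 + 1)/2
w_stat <- function(y, g) {
  r  <- rank(y)
  n1 <- sum(g == "C")
  sum(r[g == "C"]) - n1 * (n1 + 1) / 2
}

w_obs <- w_stat(score, group)

# Under the null hypothesis the labels are exchangeable, so reshuffle them
w_perm <- replicate(5000, w_stat(score, sample(group)))

# Two-sided p-value: share of shuffles at least as far from the expected W
w_null <- 5 * 5 / 2
p_perm <- mean(abs(w_perm - w_null) >= abs(w_obs - w_null))
p_perm
```

This permutation p-value will be close to, but not identical to, the one from `wilcox.test(..., exact = FALSE)`, which uses a normal approximation instead of reshuffling.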
10. Let us ignore some observations about the skewness of the data and do a simple
parametric test, assuming equal variances for the groups. What do you see?
test_2 <- data %>%
  lm(asthma ~ group, .)
summary(test_2)
##
## Call:
## lm(formula = asthma ~ group, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.80 -1.65 -0.50 0.65 5.20
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.800 1.158 5.874 0.000372 ***
## groupT -3.600 1.637 -2.199 0.059081 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.588 on 8 degrees of freedom
## Multiple R-squared: 0.3767, Adjusted R-squared: 0.2988
## F-statistic: 4.836 on 1 and 8 DF, p-value: 0.05908
The outcome is similar: with p = 0.059 we again do NOT reject the null hypothesis.
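With only two groups, this linear model is the same test as a two-sample t-test that assumes equal variances. A short sketch, assuming hypothetical scores (not the asthma data; the names `toy`, `p_lm` and `p_t` are illustrative):

```r
# Hypothetical scores for two groups of five (not the exercise data)
toy <- data.frame(
  group = factor(rep(c("C", "T"), each = 5)),
  score = c(9, 6, 8, 4, 7, 3, 5, 2, 4, 2)
)

# Linear model with a two-level dummy predictor
fit_toy <- lm(score ~ group, data = toy)
p_lm <- summary(fit_toy)$coefficients["groupT", "Pr(>|t|)"]

# Two-sample t-test assuming equal variances
p_t <- t.test(score ~ group, data = toy, var.equal = TRUE)$p.value

# Both approaches give the same p-value
all.equal(p_lm, p_t)
```

Dropping `var.equal = TRUE` gives the Welch t-test, which no longer matches the linear model exactly.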
The outcome is the APGAR score measured 5 minutes after birth.
APGAR scores range from 0 to 10, with scores of 7 or higher considered normal (healthy),
4-6 low and 0-3 critically low (see footnote 1).
The data of the study can be created using the following commands.
# Example again stolen from https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/bs704_nonparametric4.html
11. Why would you opt for a (non parametric) Mann–Whitney–Wilcoxon test in this
context?
Because the samples are small, there are two groups and the APGAR scores are not normally
distributed. Check this by creating a histogram.
12. Test whether the groups are really different.
test_4 <- wilcox.test(apgar ~ group,
                      data = data_2,
                      exact = FALSE)
test_4
##
## Wilcoxon rank sum test with continuity correction
##
## data: apgar by group
## W = 47.5, p-value = 0.02609
## alternative hypothesis: true location shift is not equal to 0
Footnote 1: The APGAR is based on 5 criteria: Appearance of the skin, Pulse rate, Grimace (reflex or reaction to
stimulation), Activity (or muscle tone), and Respiration. Each of the 5 criteria is rated as 0 (very unhealthy), 1
or 2 (healthy) based on specific clinical criteria. The APGAR score is the sum of the 5 component scores and
ranges from 0 to 10. Infants with scores of 7 or higher are considered normal, 4-6 low and 0-3 critically
low. Sometimes the APGAR score is measured repeatedly, for example at 1, 5 and 10 minutes
after birth, and then analyzed.
The p-value (0.026) is below the cut-off point of 0.05, so we reject the null hypothesis that
there is no difference between the two groups. This supports
the alternative hypothesis, so it seems there REALLY is a difference.
Suppose we now have not two but several groups. Again the sample is small and the
distribution of the dependent variable is not close to normal, so the standard (and
powerful) parametric methods for testing group differences should not be used.
set.seed(19640304)
n <- 40
my_data <- c(1:n) %>% as.data.frame() %>% rename(RNR = ., .)
# Four classes of ten participants each
my_data$class <- rep(1:4, length.out = n)
mu <- 100
sigma <- 0.65
# Right-skewed (log-normal) scores
my_data$int_mot <- rlnorm(n, mu, sigma)
my_data %>%
  ggplot(aes(x = int_mot)) +
  geom_histogram(aes(y = after_stat(density))) +
  stat_function(fun = dnorm,
                args = list(mean = mean(my_data$int_mot),
                            sd = sd(my_data$int_mot)))
shapiro.test(my_data$int_mot)
##
## Shapiro-Wilk normality test
##
## data: my_data$int_mot
## W = 0.88437, p-value = 0.0006918
14. Would you recommend we use this linear model, or would you recommend a non-
parametric alternative? Why? And if you recommend a non-parametric alternative:
which test would you recommend?
There are clear deviations from normality in the plots. There are only ten participants per
group so we cannot assume that a linear model would be robust to these deviations from
normality. A non-parametric test is more appropriate. As we are comparing several groups,
the Kruskal-Wallis test for group comparisons is recommended.
Example 4: intrinsic motivation for health-related behavior
A total of 21 people were randomly assigned to participate in either an aerobics class, a
spinning class or a pilates class. After class they were asked to report how depressed they
felt at that moment. Depressed mood was measured using 1 item with 5 indicating high
depression and 1 indicating low depression. Check the output below.
set.seed(19640304)
n <- 21
my_data <- c(1:n) %>% as.data.frame() %>% rename(RNR = ., .)
my_data$group <- rep(1:3, len = 7)
my_data$type <- "other"
my_data$type <- ifelse(my_data$group == 1, "aerobics", my_data$type)
my_data$type <- ifelse(my_data$group == 2, "spinning", my_data$type)
my_data$type <- ifelse(my_data$group == 3, "pilates", my_data$type)
mu <- 100
sigma <- 0.3
my_data$depress <- rlnorm(n, mu + 0.5*my_data$group, sigma)
my_data %>%
  ggplot(aes(x = type, y = depress)) +
  geom_boxplot()
kw_test
15. What was tested here? Explain in your own words and give the null hypothesis in
words.
Explanation: It was tested whether the distribution of depressed mood differs
after participating in a spinning class, an aerobics class or a pilates class. Null hypothesis in
words: the three distributions of depressed mood are equal, OR: the medians in all three
groups are the same.
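The call that created `kw_test` is not shown above. A sketch of how such an object could be created, assuming a Kruskal-Wallis test via `kruskal.test` with the mood score as outcome and the class type as grouping variable (the data below are simulated for illustration and are not the exercise data; `mood` and `kw_sketch` are made-up names):

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Simulated 1-5 mood scores for 3 x 7 participants (illustrative only)
mood <- data.frame(
  type    = rep(c("aerobics", "spinning", "pilates"), each = 7),
  depress = sample(1:5, 21, replace = TRUE)
)

# Kruskal-Wallis rank sum test: compares the three groups in one test
kw_sketch <- kruskal.test(depress ~ type, data = mood)
kw_sketch

# Components that can be reused when reporting the result
kw_sketch$statistic  # Kruskal-Wallis chi-squared
kw_sketch$parameter  # degrees of freedom: number of groups minus 1
kw_sketch$p.value
```

A significant result only says that at least one group differs from the others; it does not say which one.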
16. Which group reported the more depressed moods?
groups <- my_data %>%
  group_by(type) %>%
  summarise(median = median(depress))
kw_test$p.value, " we do ", NOT, " reject the null hypothesis of there
being no differences in the medians of the groups.", sep = "")