
Performing ANOVA for Multiple Variables in R

Last Updated : 20 Sep, 2024

Analysis of Variance (ANOVA) is a statistical technique for testing whether there are significant differences between the means of three or more independent groups. With only two groups, a one-way ANOVA is equivalent to a two-sample t-test. When dealing with multiple variables, ANOVA can be extended to assess the effect of several independent variables on one or more dependent variables at the same time. This article shows how to perform ANOVA for multiple variables using the R programming language.

ANOVA for Multiple Variables

ANOVA compares means across groups to determine whether at least one group's mean differs significantly from the others. It splits the total variance of a continuous dependent variable into variance between groups, which is explained by the categorical independent variables, and variance within groups. It then compares the two with an F test.
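The sketch below shows the variance partitioning behind the F test on a small hypothetical one-way dataset. The values in `y` and the labels in `g` are invented for illustration and are not taken from this article's examples. The code computes the between-group and within-group sums of squares by hand and checks the resulting F statistic against `aov()`.

```r
# Hypothetical one-way data (invented for illustration): three groups of four observations
y <- c(4, 5, 6, 5,  7, 8, 6, 7,  9, 8, 10, 9)
g <- factor(rep(c("g1", "g2", "g3"), each = 4))

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)    # one mean per group
n_i         <- tapply(y, g, length)  # group sizes

# Between-group sum of squares: spread of the group means around the grand mean
ss_between <- sum(n_i * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of each observation around its own group mean
ss_within <- sum((y - group_means[g])^2)

k <- nlevels(g)
n <- length(y)
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_val  <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# aov() partitions the variance the same way and reports the same F value
aov_table <- summary(aov(y ~ g))[[1]]
print(c(manual = f_stat, aov = aov_table[["F value"]][1]))
```

For these values the group means are 5, 7 and 9. That gives SS between = 32, SS within = 6, and F = (32 / 2) / (6 / 9) = 24 on 2 and 9 degrees of freedom.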

  • Two-Way ANOVA: Extends one-way ANOVA to two independent variables (factors). It tests the main effect of each factor and the interaction between them.
  • MANOVA (Multivariate Analysis of Variance): Handles multiple dependent variables at the same time, which is helpful when you have several response variables that might be correlated.

Before performing ANOVA, ensure that your data meets the following assumptions:

  • The dependent variable should be continuous, and the residuals (its values within each group) should be approximately normally distributed.
  • The groups should have similar variances (homogeneity of variance).
  • Observations should be independent.
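The sketch below shows one way to check the first two assumptions in base R. It reuses the two-way example data from the next section. Normality is checked with a Shapiro-Wilk test on the model residuals. Homogeneity of variance is checked with a Fligner-Killeen test across the four group1/group2 cells. Fligner-Killeen is used here as a rank-based test available in base R; Levene's test is a common alternative but lives in add-on packages such as car. The names `fit`, `cell`, `normality` and `homogeneity` are illustrative.

With only two or three observations per cell, neither test has much power, so treat the results as rough checks. Independence cannot be tested this way; it follows from the study design.

```r
# Same example data as the two-way ANOVA section
data <- data.frame(
  group1 = factor(rep(c("A", "B"), each = 5)),
  group2 = factor(rep(c("X", "Y"), times = 5)),
  response = c(5, 6, 7, 6, 8, 4, 5, 6, 5, 7)
)
fit <- aov(response ~ group1 * group2, data = data)

# Normality: Shapiro-Wilk test applied to the residuals of the fitted model
normality <- shapiro.test(residuals(fit))

# Homogeneity of variance: Fligner-Killeen test across the four group1/group2 cells
data$cell <- interaction(data$group1, data$group2)
homogeneity <- fligner.test(response ~ cell, data = data)

print(c(shapiro_p = normality$p.value, fligner_p = homogeneity$p.value))
```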

1: Performing Two-Way ANOVA in R

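A two-way ANOVA tests the effects of two independent variables and their interaction on a dependent variable. Before running it, make sure the data is in long format. That means one row per observation, with each independent variable stored as a factor and the dependent variable stored as a numeric column.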
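The sketch below checks that structure on the same example data used in the code that follows. It also counts how many observations fall into each group1/group2 combination. The name `cell_counts` is illustrative.

```r
data <- data.frame(
  group1 = factor(rep(c("A", "B"), each = 5)),
  group2 = factor(rep(c("X", "Y"), times = 5)),
  response = c(5, 6, 7, 6, 8, 4, 5, 6, 5, 7)
)

# One row per observation: two factor columns and one numeric response
str(data)
stopifnot(is.factor(data$group1), is.factor(data$group2), is.numeric(data$response))

# Observations per cell; unequal counts mean the design is unbalanced
cell_counts <- table(data$group1, data$group2)
print(cell_counts)
```

For this data the counts are 3 (A:X), 2 (A:Y), 2 (B:X) and 3 (B:Y), so the design is unbalanced.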

R
# Sample dataset for two-way ANOVA
data <- data.frame(
  group1 = factor(rep(c("A", "B"), each = 5)),
  group2 = factor(rep(c("X", "Y"), times = 5)),
  response = c(5, 6, 7, 6, 8, 4, 5, 6, 5, 7)
)

# Two-way ANOVA in R
anova_model <- aov(response ~ group1 * group2, data = data)
summary(anova_model)

Output:

              Df Sum Sq Mean Sq F value Pr(>F)
group1         1  2.500   2.500   1.607  0.252
group2         1  0.000   0.000   0.000  1.000
group1:group2  1  1.067   1.067   0.686  0.439
Residuals      6  9.333   1.556

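In the above code, the aov() function fits the two-way ANOVA model. The formula group1 * group2 expands to group1 + group2 + group1:group2, so the table reports the main effect of each factor and their interaction effect on response. None of the effects is significant at the 0.05 level; the smallest p-value is 0.252.

The cells in this example contain unequal numbers of observations. Because aov() reports sequential (Type I) sums of squares, the order in which the factors appear in the formula therefore affects the main-effect rows.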
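The sketch below, using the example data, checks both points. First, `group1 * group2` gives exactly the same table as the expanded formula. Second, swapping the factor order in this unbalanced design changes the main-effect rows, while the interaction and residual rows stay the same. The helper `ss()` and the `fit_*` names are illustrative.

```r
data <- data.frame(
  group1 = factor(rep(c("A", "B"), each = 5)),
  group2 = factor(rep(c("X", "Y"), times = 5)),
  response = c(5, 6, 7, 6, 8, 4, 5, 6, 5, 7)
)

fit_star     <- aov(response ~ group1 * group2, data = data)
fit_expanded <- aov(response ~ group1 + group2 + group1:group2, data = data)
fit_swapped  <- aov(response ~ group2 * group1, data = data)

# Illustrative helper: the "Sum Sq" column of an aov() summary table
ss <- function(fit) summary(fit)[[1]][["Sum Sq"]]

# Identical sums of squares: the * operator only expands the formula
print(ss(fit_star))
print(ss(fit_expanded))

# Sequential SS: with group2 entered first, group2 gets 0.1 and group1 gets 2.4
print(summary(fit_swapped))
```

In the original order the two main effects get 2.5 (group1) and 0 (group2). If you need order-independent (Type II or Type III) tests for unbalanced data, add-on packages such as car provide them.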

2: Performing MANOVA in R

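MANOVA is useful when you have multiple, possibly correlated, dependent variables. In base R you fit it with the manova() function, combining the responses into a matrix with cbind() on the left-hand side of the formula. summary() then reports Pillai's trace by default.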
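The `test` argument of `summary()` for a MANOVA fit accepts `"Pillai"` (the default), `"Wilks"`, `"Hotelling-Lawley"` and `"Roy"`. The sketch below fits the same 20-observation data used in the example that follows and requests both Pillai's trace and Wilks' lambda. The names `pillai` and `wilks` are illustrative.

With a single two-level factor the hypothesis has one degree of freedom. In that case Wilks' lambda equals 1 minus Pillai's trace, and both lead to the same F statistic.

```r
data_manova <- data.frame(
  group = factor(rep(c("Control", "Treatment"), each = 10)),
  response1 = c(7, 8, 6, 9, 10, 6, 7, 5, 8, 6, 6, 8, 9, 5, 7, 6, 5, 8, 9, 6),
  response2 = c(70, 80, 60, 90, 100, 60, 70, 50, 80, 60, 55, 85, 95, 45, 65, 70, 60, 75, 85, 55)
)
manova_model <- manova(cbind(response1, response2) ~ group, data = data_manova)

# $stats is a matrix: row 1 is the group term; column 2 is the test statistic,
# column 3 the approximate F, and the last column the p-value
pillai <- summary(manova_model, test = "Pillai")$stats
wilks  <- summary(manova_model, test = "Wilks")$stats

print(pillai)
print(wilks)
```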

R
# Sample dataset for MANOVA with more observations
data_manova <- data.frame(
  group = factor(rep(c("Control", "Treatment"), each = 10)),
  response1 = c(7, 8, 6, 9, 10, 6, 7, 5, 8, 6, 6, 8, 9, 5, 7, 6, 5, 8, 9, 6),
  response2 = c(70, 80, 60, 90, 100, 60, 70, 50, 80, 60, 55, 85, 95, 45, 65, 70, 60, 75, 85, 55)
)

# Check the structure of the dataset
str(data_manova)

# Fit the MANOVA model
manova_model <- manova(cbind(response1, response2) ~ group, data = data_manova)

# Summarize the MANOVA model
summary(manova_model)

Output:

'data.frame':	20 obs. of  3 variables:
 $ group    : Factor w/ 2 levels "Control","Treatment": 1 1 1 1 1 1 1 1 1 1 ...
 $ response1: num  7 8 6 9 10 6 7 5 8 6 ...
 $ response2: num  70 80 60 90 100 60 70 50 80 60 ...

          Df   Pillai approx F num Df den Df Pr(>F)
group      1 0.010539 0.090533      2     17 0.9139
Residuals 18
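MANOVA's test statistics are built from the residual covariance matrix of the responses, so that matrix needs full rank. The sketch below checks this on the model above. The check matters for this data. In the first 10 rows, every response2 value is exactly 10 times response1. A dataset with only that relationship would have a rank-1 residual matrix, and the multivariate tests could not be computed. The ten Treatment rows break the exact relationship. The residual correlation also shows how strongly the two responses move together, which is the situation MANOVA is designed for. The names `res` and `first10` are illustrative.

```r
data_manova <- data.frame(
  group = factor(rep(c("Control", "Treatment"), each = 10)),
  response1 = c(7, 8, 6, 9, 10, 6, 7, 5, 8, 6, 6, 8, 9, 5, 7, 6, 5, 8, 9, 6),
  response2 = c(70, 80, 60, 90, 100, 60, 70, 50, 80, 60, 55, 85, 95, 45, 65, 70, 60, 75, 85, 55)
)
manova_model <- manova(cbind(response1, response2) ~ group, data = data_manova)

res <- residuals(manova_model)   # 20 x 2 matrix of residuals
# Full column rank (2) is required for the multivariate test statistics
print(qr(res)$rank)
# Residual correlation between the two responses
print(cor(res)[1, 2])

# The first 10 rows on their own are exactly collinear: response2 = 10 * response1
first10 <- data_manova[1:10, ]
print(all(first10$response2 == 10 * first10$response1))
```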

3: Post-hoc Tests for Multiple Variables

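Once ANOVA shows significant results, you may want to find out which specific groups differ. For two-way ANOVA, you can use the TukeyHSD() function. It compares every pair of levels for each main effect and for the interaction, while holding the family-wise error rate at the chosen level (95% confidence by default). None of the effects tested above is significant, so the comparisons below only demonstrate the workflow: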

R
# Post-hoc test after two-way ANOVA
TukeyHSD(anova_model)

Output:

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = response ~ group1 * group2, data = data)

$group1
    diff      lwr     upr     p adj
B-A   -1 -2.93015 0.93015 0.2518678

$group2
    diff      lwr     upr p adj
Y-X    0 -1.93015 1.93015     1

$`group1:group2`
              diff       lwr      upr     p adj
B:X-A:X -1.6666667 -5.607998 2.274665 0.5098871
A:Y-A:X -0.6666667 -4.607998 3.274665 0.9328804
B:Y-A:X -1.0000000 -4.525234 2.525234 0.7646960
A:Y-B:X  1.0000000 -3.317513 5.317513 0.8514822
B:Y-B:X  0.6666667 -3.274665 4.607998 0.9328804
B:Y-A:Y -0.3333333 -4.274665 3.607998 0.9903812
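TukeyHSD() also accepts a `which` argument to restrict the output to particular terms and a `conf.level` argument to change the family-wise confidence level. The sketch below keeps only the interaction comparisons at a 99% level and filters for pairs with an adjusted p-value below 0.05. As the output above suggests, no pair qualifies for this data. The names `tk`, `cells` and `significant` are illustrative.

```r
data <- data.frame(
  group1 = factor(rep(c("A", "B"), each = 5)),
  group2 = factor(rep(c("X", "Y"), times = 5)),
  response = c(5, 6, 7, 6, 8, 4, 5, 6, 5, 7)
)
anova_model <- aov(response ~ group1 * group2, data = data)

# Only the group1:group2 comparisons, at a 99% family-wise confidence level
tk <- TukeyHSD(anova_model, which = "group1:group2", conf.level = 0.99)

# Each term is a matrix with columns diff, lwr, upr and "p adj"
cells <- tk[["group1:group2"]]
significant <- cells[cells[, "p adj"] < 0.05, , drop = FALSE]
print(significant)  # zero rows for this data
```

Changing `conf.level` widens or narrows the intervals, but the adjusted p-values stay the same.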

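For MANOVA, a common follow-up is to run a separate univariate ANOVA on each dependent variable, which summary.aov() does directly on the fitted MANOVA object. These are follow-up ANOVAs rather than pairwise post-hoc tests. With more than two groups, you would still apply a post-hoc test such as TukeyHSD() to each response.

Note that the output below shows 8 residual degrees of freedom, which corresponds to 10 observations in two groups. Refitting with the 20-observation data_manova from the MANOVA section gives 18 residual degrees of freedom, along with different sums of squares and p-values.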

R
# Univariate follow-up ANOVA for each response variable after MANOVA
summary.aov(manova_model)

Output:

 Response response1 :
            Df Sum Sq Mean Sq F value Pr(>F)
group        1    6.4     6.4  3.3684 0.1038
Residuals    8   15.2     1.9

 Response response2 :
            Df Sum Sq Mean Sq F value Pr(>F)
group        1    640     640  3.3684 0.1038
Residuals    8   1520     190
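When each response is tested separately after a MANOVA, the resulting p-values are often adjusted for the number of responses tested. The sketch below assumes the 20-observation data from the MANOVA section. It extracts the p-value for `group` from each univariate table returned by summary.aov() and applies a Bonferroni correction with p.adjust(). Other methods, such as `"holm"`, can be passed the same way. The names `univariate`, `raw_p` and `adjusted_p` are illustrative.

```r
data_manova <- data.frame(
  group = factor(rep(c("Control", "Treatment"), each = 10)),
  response1 = c(7, 8, 6, 9, 10, 6, 7, 5, 8, 6, 6, 8, 9, 5, 7, 6, 5, 8, 9, 6),
  response2 = c(70, 80, 60, 90, 100, 60, 70, 50, 80, 60, 55, 85, 95, 45, 65, 70, 60, 75, 85, 55)
)
manova_model <- manova(cbind(response1, response2) ~ group, data = data_manova)

# One ANOVA table per response; row 1 of each table is the group term
univariate <- summary.aov(manova_model)
raw_p <- sapply(univariate, function(tab) tab[["Pr(>F)"]][1])

# Bonferroni: multiply each p-value by the number of tests, capped at 1
adjusted_p <- p.adjust(raw_p, method = "bonferroni")
print(cbind(raw_p, adjusted_p))
```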

Conclusion

Performing ANOVA for multiple variables in R lets you test for group differences across a range of experimental designs. Whether you are dealing with two independent variables (two-way ANOVA) or multiple dependent variables (MANOVA), base R's aov(), manova() and TukeyHSD() functions provide a straightforward framework for running these tests.

