Linear Regression on Group Data in R
Last Updated: 11 Apr, 2025
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. In R, it is performed with the lm() function, whose name stands for "linear model". Sometimes analysts need to fit a separate regression to each subset of the data defined by a grouping variable. This is known as grouped regression.
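As a quick illustration of lm() before any grouping is involved, here is a minimal sketch that fits a single model to R's built-in mtcars dataset. The choice of mtcars and of the variables mpg and wt is an assumption made purely for illustration; the rest of this article uses its own generated data.

```r
# Fit one linear model to the whole dataset (no grouping yet).
# mtcars ships with base R; mpg and wt are illustrative choices only.
fit <- lm(mpg ~ wt, data = mtcars)

# coef() returns a named numeric vector: "(Intercept)" and "wt"
print(coef(fit))
```

The formula mpg ~ wt reads as "model mpg as a function of wt", the same pattern as the y ~ x formula used later.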
Implementing Grouped Regression in R
Grouping data in R allows you to analyse subsets of your data that share a common attribute. For example, you might want to perform linear regression separately for each group defined by a categorical variable. Below is the step-by-step implementation:
1. Install and Load Necessary Packages
First, install and load the dplyr package.
R
# Install once; skip this line if dplyr is already installed
install.packages("dplyr")
library(dplyr)
2. Prepare Your Data
Assume you have a dataset where you want to perform linear regression on each group defined by a categorical variable, such as group. We will generate a sample dataset.
set.seed(123): ensures reproducibility.
data.frame(): creates a dataset with group, x and y variables.
rnorm(30): creates 30 random values drawn from a standard normal distribution.
R
# Sample data
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)
print(data)
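Before fitting anything, it can be worth confirming that the generated data has the shape described above. The following is a minimal sketch using only base R; the variable name group_counts is introduced here for illustration. Note that x and y are drawn independently, so any slope fitted later reflects random noise rather than a real relationship.

```r
# Recreate the sample data so this snippet runs on its own
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

# Show column types and the first few values
str(data)

# Count observations per group; each of A, B and C should have 10 rows
group_counts <- table(data$group)
print(group_counts)
```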
3. Group Data and Apply Linear Regression
Using the group_by() function, you can group the data by a categorical variable and then apply the lm() function to each group using do().
group_by(group): groups the dataset by the categorical variable group, so that operations are applied separately to each subset.
do(model = lm(y ~ x, data = .)): fits a linear regression within each group, modeling y as a function of x, and stores the fitted models in a list-column named model.
R
# Perform linear regression by group
models <- data %>%
  group_by(group) %>%
  do(model = lm(y ~ x, data = .))
print(models)
Here, models is a data frame with one row per group. Its model column is a list that holds the fitted linear regression model for each group.
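do() is marked as superseded in recent dplyr releases, so it can help to know an equivalent that needs no packages. The following is a minimal base R sketch, offered as an alternative rather than as the method this article prescribes: split() breaks the data frame into one piece per group and lapply() fits lm() on each piece. The names pieces and models_list are introduced here for illustration.

```r
# Recreate the sample data so this snippet runs on its own
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

# split() returns a named list of data frames, one per value of group
pieces <- split(data, data$group)

# Fit y ~ x separately on each piece; the result is a named list of lm objects
models_list <- lapply(pieces, function(d) lm(y ~ x, data = d))

# Each model can be looked up by its group name
print(names(models_list))
print(coef(models_list[["A"]]))
```

Because the list is named by group, models_list[["B"]] returns the model for group B directly, with no filtering step.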
4. Extract and Summarise Results
Here we extract and summarise the coefficients or other statistics from the models.
summarise(): computes one row of summary values per group.
coef(model)[1]: extracts the intercept from the linear regression model for each group.
coef(model)[2]: extracts the slope (the coefficient of x) from the model for each group.
R
# Summarize coefficients by group
coefficients_summary <- models %>%
  summarise(
    intercept = coef(model)[1],
    slope = coef(model)[2]
  )
print(coefficients_summary)
This will provide a summary table of the intercept and slope for each group, allowing you to compare how the relationship between x and y differs from one group to another. This approach is useful in scenarios where relationships between variables differ across categories, such as analyzing customer behavior, medical outcomes or vehicle performance.
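Beyond the coefficients, statistics such as R-squared and the p-value of the slope are often useful when comparing groups. The following is a minimal sketch in base R, assuming the per-group models are held in a named list; the names models_list, stats_list and stats_table are hypothetical and introduced only for this example.

```r
# Recreate the sample data and per-group models so this snippet runs on its own
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)
models_list <- lapply(split(data, data$group), function(d) lm(y ~ x, data = d))

# Build one single-row data frame of statistics per group
stats_list <- lapply(names(models_list), function(g) {
  s <- summary(models_list[[g]])   # summary.lm object
  ctab <- coef(s)                  # coefficient matrix with estimates and p-values
  data.frame(
    group = g,
    intercept = ctab["(Intercept)", "Estimate"],
    slope = ctab["x", "Estimate"],
    slope_p_value = ctab["x", "Pr(>|t|)"],
    r_squared = s$r.squared
  )
})

# Stack the per-group rows into one table
stats_table <- do.call(rbind, stats_list)
print(stats_table)
```

Since x and y in the sample data are generated independently, the slopes and R-squared values here describe noise. With real data, the same table gives a quick side-by-side view of fit quality per group.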