Regression Analysis in R Programming
Regression analysis is a statistical method used to determine the relationship between a dependent variable and one or more independent variables. It is commonly used for prediction, forecasting and quantifying relationships between variables. R provides several regression techniques, each suited to a different type of data and relationship.
Types of Regression Analysis
We will explore various types of regression in this section.
1. Linear Regression
Linear regression is one of the most common regression techniques used to model the relationship between a dependent variable and one independent variable. The relationship is modeled as:
y = ax+b
Where:
- y is the dependent variable (response variable)
- x is the independent variable (predictor)
- a is the slope (coefficient)
- b is the intercept
Example: We are going to implement linear regression in R using the lm() function.
R
# Example data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit a simple linear regression of y on x
model <- lm(y ~ x)
summary(model)
Output:
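The summary shows the estimated intercept and slope, their standard errors and the R-squared of the fit.
Once the model is fitted, predict() can be used to estimate y for new values of x. The sketch below assumes two illustrative new x values (6 and 7) that are not part of the original example.
R
# Predict y for new, unseen values of x (illustrative values)
new_data <- data.frame(x = c(6, 7))
predict(model, newdata = new_data)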
2. Logistic Regression
Logistic regression is used for classification tasks, where the response variable is categorical (often binary). It estimates the probability of an event occurring using a logistic function:
y = \frac{1}{1 + e^{-z}}
Where:
- y is the predicted probability (response variable)
- z is a linear combination of independent variables.
Despite its name, logistic regression is used for classification, not regression tasks, because it predicts a probability (which lies between 0 and 1) rather than a continuous value. However, it is still referred to as logistic regression due to the mathematical form of the model.
Example: We are implementing logistic regression in R using the glm() function with a binomial family.
R
# Simulated predictor: 20 IQ scores (mean 30, sd 2)
IQ <- rnorm(20, 30, 2)

# Binary outcome for each of the 20 observations
result <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0)

df <- data.frame(IQ, result)

# Fit a logistic regression with glm() and the binomial family
model <- glm(result ~ IQ, family = binomial, data = df)
summary(model)
Output:
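The summary reports the estimated intercept and the coefficient for IQ on the log-odds scale, along with their significance.
To turn the fitted model into class predictions, predict() with type = "response" returns probabilities, which can then be thresholded. The sketch below uses an illustrative cutoff of 0.5.
R
# Predicted probability of result = 1 for each observation
probs <- predict(model, type = "response")

# Classify with a 0.5 cutoff (illustrative threshold)
predicted_class <- ifelse(probs > 0.5, 1, 0)
table(predicted_class, df$result)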
3. Polynomial Regression
Polynomial regression is used when the relationship between the independent and dependent variables is non-linear. It is a form of linear regression where we model the data using polynomial equations. The general equation for polynomial regression of degree n is:
y = a_nx^n + a_{n-1}x^{n-1} + \dots + a_1x + b
Where:
- y is the dependent variable (response variable)
- x is the independent variable (predictor)
- a_n, \dots, a_1 are the coefficients
- b is the intercept
- n is the degree of the polynomial
Example: We are implementing Polynomial regression in R by adding polynomial terms to the linear regression model.
R
# Example data with a quadratic relationship (y = x^2)
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 25)

# Fit a degree-2 polynomial using poly() inside lm()
model <- lm(y ~ poly(x, 2))
summary(model)
Output:
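The summary shows the coefficients of the polynomial terms produced by poly().
The fitted curve can be evaluated at new x values with predict(); the grid of values below is illustrative.
R
# Evaluate the fitted quadratic on a finer grid of x values
grid <- data.frame(x = seq(1, 5, by = 0.5))
predict(model, newdata = grid)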
4. Lasso Regression
Lasso regression is a type of linear regression that uses L1 regularization, which helps in feature selection by shrinking some coefficients to zero. This technique is especially useful when there are many features, as it automatically selects the most significant predictors. The model for Lasso regression is represented as:
\text{Lasso (L1)}: \min_{\beta} \left( \text{Loss} + \lambda \|\beta\|_1 \right)
Where:
- Loss = squared error ( \sum (y_i - \hat{y}_i)^2 )
- \|\beta\|_1 = \sum |\beta_j|
Example: We are implementing Lasso regression in R using the glmnet package with \alpha = 1 to apply L1 regularization.
R
install.packages("glmnet")
library(glmnet)
x <- matrix(rnorm(100), ncol=10)
y <- rnorm(10)
model <- glmnet(x, y, alpha = 1)
print(model)
Output:
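print(model) lists, for each value of lambda on the regularization path, the number of non-zero coefficients and the deviance explained.
To inspect which coefficients survive the L1 penalty, coef() can be evaluated at a chosen lambda. The value 0.1 below is purely illustrative; in practice lambda is tuned, for example with cv.glmnet().
R
# Coefficients at an illustrative penalty value; some are shrunk exactly to zero
coef(model, s = 0.1)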
5. Ridge Regression
Ridge regression is another regularized linear regression technique, but instead of L1 regularization (as in Lasso), it applies L2 regularization. This technique reduces the magnitude of the coefficients but does not set them to zero, which helps address multicollinearity in the data. The model for Ridge regression is represented as:
\text{Ridge (L2)}: \min_{\beta} \left( \text{Loss} + \lambda \|\beta\|_2^2 \right)
Where:
- Loss = squared error \sum (y_i - \hat{y}_i)^2
- \|\beta\|_2^2 = \sum \beta_j^2
Example: We are implementing Ridge regression in R using the glmnet package with \alpha = 0 to apply L2 regularization.
R
library(glmnet)

# Simulated data: 10 observations with 10 predictors
x <- matrix(rnorm(100), ncol = 10)
y <- rnorm(10)

# alpha = 0 applies the L2 (Ridge) penalty
model <- glmnet(x, y, alpha = 0)
print(model)
Output:
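As with Lasso, print(model) shows the regularization path over lambda; under the L2 penalty all predictors keep non-zero (but shrunken) coefficients.
Predictions at a chosen penalty value can be obtained with predict(); the value s = 0.1 below is illustrative.
R
# Predict the response for the training data at an illustrative lambda
predict(model, newx = x, s = 0.1)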
6. Elastic Net Regression
Elastic Net regression combines both L1 and L2 regularization. It is useful when there are many correlated predictors and helps improve prediction accuracy.
The model for Elastic Net regression is a mix of Lasso and Ridge:
\text{Elastic Net}: \min_{\beta} \left( \text{Loss} + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2 \right)
Where:
- Loss = residual sum of squares \sum (y_i - \hat{y}_i)^2
- \|\beta\|_1 = \sum |\beta_j|
- \|\beta\|_2^2 = \sum \beta_j^2
Example: We are implementing Elastic Net regression in R using the glmnet package with a value of \alpha between 0 and 1, which mixes the Lasso and Ridge penalties.
R
library(glmnet)

# Simulated data: 10 observations with 10 predictors
x <- matrix(rnorm(100), ncol = 10)
y <- rnorm(10)

# alpha = 0.5 gives an equal mix of the L1 and L2 penalties
model <- glmnet(x, y, alpha = 0.5)
print(model)
Output:
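The printed path again shows how the number of non-zero coefficients and the deviance explained change with lambda.
The mixing parameter alpha controls the blend of penalties: alpha = 1 is pure Lasso and alpha = 0 is pure Ridge. The sketch below refits the model for a few illustrative alpha values and counts the non-zero coefficients at an illustrative lambda of 0.1.
R
# Compare sparsity for different L1/L2 mixes (illustrative values)
for (a in c(0.2, 0.5, 0.8)) {
  fit <- glmnet(x, y, alpha = a)
  nonzero <- sum(as.vector(coef(fit, s = 0.1)) != 0)
  cat("alpha =", a, "-> non-zero coefficients:", nonzero, "\n")
}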
In this article, we have covered multiple regression techniques in R. Each method serves a specific purpose depending on the nature of the data and the problem.