Dummy Variables in R Programming
Last Updated :
17 Apr, 2025
Dummy variables are binary variables used to represent categorical data in numerical form. Each one represents a characteristic of an observation: for example, gender can be represented as 1 for male and 0 for female, or vice versa. New columns are created to hold these binary values, such as gender_m for male and gender_f for female.
Here's the original dataframe:
After creating dummy variable: 
Dummy variables are essential in statistical models and machine learning algorithms because most algorithms require numerical input. By converting categories into binary values, dummy variables allow these models to process and analyze categorical features effectively. In this article, we will create dummy variables in R using two methods: the ifelse() function and the dummy_cols() function.
1. Using ifelse() function
The ifelse() function evaluates a test and, for each element, returns one of two supplied values depending on whether the test is true or false. A dummy variable can be created by testing whether a column equals a given category.
Syntax: ifelse(test, yes, no)
Parameters:
- test: the condition to evaluate
- yes: the value returned where the test condition is TRUE
- no: the value returned where the test condition is FALSE
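As a minimal sketch of how the three arguments interact, the snippet below applies ifelse() to a small made-up vector (the name `sizes` and its values are illustrative and do not come from the datasets used in this article). Because ifelse() is vectorised, the test is evaluated for every element at once, and a missing value in the test stays missing in the result.

```r
# Hypothetical example vector; the values are made up for illustration.
sizes <- c("small", "large", "small", NA)

# test: sizes == "large"   yes: 1   no: 0
is_large <- ifelse(sizes == "large", 1, 0)

print(is_large)  # 0 1 0 NA
```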
Example 1:
In this example, we load the built-in PlantGrowth dataset and create a dummy variable group_ctr1, which is 1 if the group is "ctrl" (control group) and 0 otherwise. This transformation makes the categorical group variable suitable for numerical analysis.
R
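# Work on a copy of the built-in PlantGrowth dataset
pg <- PlantGrowth
cat("Original dataset:\n")
head(pg, 5)
# 1 if the observation belongs to the control group, 0 otherwise
pg$group_ctr1 <- ifelse(pg$group == "ctrl", 1, 0)
cat("After creating dummy variable:\n")
head(pg, 5)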
Output:
Original Data
Data with dummy variables
Example 2:
In this example, we create a data frame df with categorical and numerical variables. We then generate two dummy variables: gender_m, which is 1 if gender is "m" and 0 otherwise, and gender_f, which is 1 if gender is "f" and 0 otherwise. This allows the gender variable to be represented in a numerical format suitable for analysis.
R
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))
head(df)
df$gender_m <- ifelse(df$gender == "m", 1, 0)
df$gender_f <- ifelse(df$gender == "f", 1, 0)
head(df)
Output:
Original Data Frame
After creating dummy variables
2. Using dummy_cols() function
The dummy_cols() function is provided by the fastDummies package. It creates dummy variables according to the arguments passed to it. If no columns are selected in the function call, dummy variables are created for all character and factor columns in the data frame.
Syntax: dummy_cols(.data, select_columns = NULL)
Parameters:
- .data: the object (data frame) for which dummy columns are to be created
- select_columns: the columns for which dummy variables are to be created
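To make the behaviour described above more concrete, here is a rough base-R approximation of the idea: one 0/1 column per category, falling back to every character or factor column when nothing is selected. The helper `make_dummies` and the data frame `toy` are hypothetical names introduced only for this sketch; this is not how fastDummies is implemented, it assumes there are no missing values, and the column order it produces may differ from what dummy_cols() returns.

```r
# Hypothetical helper, NOT part of fastDummies: a rough base-R approximation
# of building one 0/1 column per category. Assumes no missing values.
make_dummies <- function(df, select_columns = NULL) {
  # With no selection, fall back to every character or factor column
  if (is.null(select_columns)) {
    is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
    select_columns <- names(df)[is_cat]
  }
  for (col in select_columns) {
    for (lvl in unique(as.character(df[[col]]))) {
      # New column named <column>_<category>, e.g. gender_m
      df[[paste(col, lvl, sep = "_")]] <- ifelse(df[[col]] == lvl, 1, 0)
    }
  }
  df
}

# Made-up data for illustration
toy <- data.frame(gender = c("m", "f", "m"), age = c(19, 20, 20))
toy_all <- make_dummies(toy)            # every categorical column
toy_sel <- make_dummies(toy, "gender")  # only the named column
print(toy_all)
```

In this toy case both calls give the same result, because gender is the only categorical column; the numeric age column is left untouched either way.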
Example 1:
In this example, we use the fastDummies package to automatically create dummy variables for the group column in the PlantGrowth dataset. The dummy_cols() function generates a separate binary column for each category in group, enabling easy use of categorical data in numerical analysis.
R
install.packages("fastDummies")  # only needed once
library(fastDummies)
data <- PlantGrowth
data <- dummy_cols(data, select_columns = "group")
head(data, 5)
Output:
Using dummy_cols function
Example 2:
In this example, we create a data frame df and use the dummy_cols() function from the fastDummies package to automatically generate dummy variables for all categorical columns (gender and city). This converts each category into a separate binary column, making the data ready for numerical analysis.
R
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))
df <- dummy_cols(df)
head(df)
Output:
Using dummy_cols function
In this article, we explored how to create dummy variables in R using two approaches, manually with the ifelse() function and automatically with the dummy_cols() function from the fastDummies package, to convert categorical data into a numerical format suitable for analysis.