dplyr Package in R Programming

Last Updated : 02 May, 2025

The dplyr package for R offers efficient data manipulation functions. It makes data transformation and summarization simple with concise, readable syntax.

Key Features of dplyr

Data Frame and Tibble

dplyr works on data frames: tables in which each column stores one type of information, such as names, ages, or scores. To create a data frame, specify the column names and their values.

df <- data.frame(Name = c("vipul", "jayesh", "anurag"),
                 Age = c(25, 23, 22), Score = c(95, 89, 78))
df

Output:

    Name Age Score
1  vipul  25    95
2 jayesh  23    89
3 anurag  22    78

On the other hand, tibbles, introduced through the tibble package, share similar functionality but offer enhanced user-friendly features. The syntax for creating a tibble is comparable to that of a data frame.
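
As a minimal sketch of that comparison, the same table can be built as a tibble. This assumes the tibble package is installed; the object names `tb`, `df`, and `tb2` are only illustrative.

```r
library(tibble)

# Same columns as the data frame example, built directly as a tibble
tb <- tibble(Name = c("vipul", "jayesh", "anurag"),
             Age = c(25, 23, 22),
             Score = c(95, 89, 78))
print(tb)

# An existing data frame can also be converted to a tibble
df <- data.frame(Name = c("vipul", "jayesh", "anurag"),
                 Age = c(25, 23, 22), Score = c(95, 89, 78))
tb2 <- as_tibble(df)
```

Printing a tibble also shows each column's type under its name, which is one of the user-friendly differences from a plain data frame.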

Pipes (`%>%`)

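The pipe operator (%>%), available when dplyr is loaded, passes the result of one expression as the first argument of the next function. This lets us chain multiple operations together and read them from top to bottom, improving code readability.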

library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

print(result)

Output:

    cyl mean_hp
  <dbl>   <dbl>
1     4    82.6
2     6   110

Important dplyr Functions

dplyr provides several core functions for data manipulation. The most commonly used are:

filter()

filter() keeps only the rows (cases) whose values satisfy a given condition.

# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

print(d)

# Rows where ht is NA
rows_with_na <- d %>% filter(is.na(ht))
print(rows_with_na)

# Rows where ht is not NA
rows_without_na <- d %>% filter(!is.na(ht))
print(rows_without_na)

Output:

     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no

     name age ht school
1 Bhavesh   5 NA    yes
2  Chaman   9 NA     no

   name age ht school
1  Abhi   7 46    yes
2 Dimri  16 69     no
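
filter() also accepts several conditions at once. The sketch below reuses the same data frame; the result names `young_in_school` and `older_or_missing` are made up for illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Conditions separated by commas are combined with AND
young_in_school <- d %>% filter(age < 10, school == "yes")
print(young_in_school)

# Use | when either condition is enough (OR)
older_or_missing <- d %>% filter(age > 10 | is.na(ht))
print(older_or_missing)
```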

arrange()

arrange() reorders the rows (cases) by the values of one or more columns, in ascending order by default.

# Create a data frame with missing data 
d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                 age = c(7, 5, 9, 16), 
                 ht = c(46, NA, NA, 69),
                 school = c("yes", "yes", "no", "no") )
d

# Sort rows by age in ascending order
d_sorted <- arrange(d, age)
print(d_sorted)

Output:

     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no

     name age ht school
1 Bhavesh   5 NA    yes
2    Abhi   7 46    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no
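
For descending order, wrap the column in desc(). The following sketch also sorts on a column with missing values; arrange() places NA values at the end in both directions. The names `d_desc` and `d_ht` are illustrative.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Oldest first
d_desc <- arrange(d, desc(age))
print(d_desc)

# Tallest first; rows with a missing ht still come last
d_ht <- arrange(d, desc(ht))
print(d_ht)
```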

select() and rename()

select() chooses columns (variables) by name, by position, or with helper functions such as starts_with(). rename() changes column names while keeping every column.

d <- data.frame(name=c("Abhi", "Bhavesh",
                        "Chaman", "Dimri"),
                 age=c(7, 5, 9, 16),
                 ht=c(46, NA, NA, 69),
                 school=c("yes", "yes", "no", "no"))

# starts_with() selects columns whose names begin with "ht"
select(d, starts_with("ht"))

# Everything except columns starting with "ht"
select(d, -starts_with("ht"))

# Columns 1 and 2, selected by position
select(d, 1:2)

# Columns whose names contain "a"
select(d, contains("a"))

# Columns whose names match the regular expression "na"
select(d, matches("na"))

Output:

  ht
1 46
2 NA
3 NA
4 69

     name age school
1    Abhi   7    yes
2 Bhavesh   5    yes
3  Chaman   9     no
4   Dimri  16     no

     name age
1    Abhi   7
2 Bhavesh   5
3  Chaman   9
4   Dimri  16

     name age
1    Abhi   7
2 Bhavesh   5
3  Chaman   9
4   Dimri  16

     name
1    Abhi
2 Bhavesh
3  Chaman
4   Dimri
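
The examples above only use select(). A short sketch of rename() on the same data follows; the new column names `height`, `in_school`, and `student` are chosen purely for illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# rename() uses new_name = old_name and keeps all columns
renamed <- rename(d, height = ht, in_school = school)
print(names(renamed))

# select() can rename too, but keeps only the columns listed
picked <- select(d, student = name, age)
print(picked)
```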

mutate() and transmute()

mutate() adds new variables computed from existing ones and keeps all columns. transmute() computes the same variables but keeps only the new ones.

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                age = c(7, 5, 9, 16), 
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Add 'x3' as sum of height and age, keeping all columns
mutate(d, x3 = ht + age)

# Add 'x3' as sum of height and age, keeping only 'x3'
transmute(d, x3 = ht + age)

Output:

     name age ht school x3
1    Abhi   7 46    yes 53
2 Bhavesh   5 NA    yes NA
3  Chaman   9 NA     no NA
4   Dimri  16 69     no 85

  x3
1 53
2 NA
3 NA
4 85
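
A single mutate() call can create several columns, and later expressions may use columns created earlier in the same call. A small sketch, where `x3_half` and `out` are hypothetical names added for illustration:

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# x3 is created first, then reused to compute x3_half
out <- mutate(d, x3 = ht + age, x3_half = x3 / 2)
print(out)
```

Rows with a missing ht give NA in both new columns, because arithmetic on NA returns NA.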

summarise()

summarise() condenses many values into a single summary value, such as a mean, minimum, maximum, or median.

d <- data.frame( name = c("Abhi", "Bhavesh",
                          "Chaman", "Dimri"), 
                 age = c(7, 5, 9, 16), 
                 ht = c(46, NA, NA, 69),
                 school = c("yes", "yes", "no", "no") )

summarise(d, mean = mean(age))
summarise(d, min = min(age))
summarise(d, max = max(age))
summarise(d, med = median(age))

Output:

  mean
1 9.25

  min
1   5

  max
1  16

  med
1   8
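
summarise() is most often combined with group_by(), as in the pipe example earlier, to get one summary row per group. The sketch below also passes na.rm = TRUE so that missing heights are skipped; the names `by_school`, `n`, `mean_age`, and `mean_ht` are illustrative.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# One row per value of school, with several summaries at once
by_school <- d %>%
  group_by(school) %>%
  summarise(n = n(),
            mean_age = mean(age),
            mean_ht = mean(ht, na.rm = TRUE))  # ignore missing heights
print(by_school)
```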

sample_n() and sample_frac()

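sample_n() draws a fixed number of random rows and sample_frac() draws a fixed fraction of the rows. Because the selection is random, the rows returned can differ from run to run.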

d <- data.frame( name = c("Abhi", "Bhavesh",
                          "Chaman", "Dimri"), 
                 age = c(7, 5, 9, 16), 
                 ht = c(46, NA, NA, 69),
                 school = c("yes", "yes", "no", "no") )

# Printing three rows
sample_n(d, 3)

# Printing 50 % of the rows
sample_frac(d, 0.50)

Output:

    name age ht school
1 Chaman   9 NA     no
2  Dimri  16 69     no
3   Abhi   7 46    yes

   name age ht school
1  Abhi   7 46    yes
2 Dimri  16 69     no
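
The dplyr documentation marks sample_n() and sample_frac() as superseded by slice_sample(). A minimal sketch of the equivalent calls follows, assuming dplyr 1.0.0 or later; the seed value 42 and the names `three_rows` and `half_rows` are arbitrary choices for illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Fix the random seed so the same rows are drawn on every run
set.seed(42)

# Three random rows (counterpart of sample_n(d, 3))
three_rows <- slice_sample(d, n = 3)
print(three_rows)

# 50 % of the rows (counterpart of sample_frac(d, 0.50))
half_rows <- slice_sample(d, prop = 0.5)
print(half_rows)
```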
