
dplyr Package in R Programming

Last Updated : 02 May, 2025

The dplyr package for R offers efficient data manipulation functions. It makes data transformation and summarization simple with concise, readable syntax.

Key Features of dplyr

Data Frame and Tibble

A data frame in R is a table in which each column stores one type of information, such as names, ages, or scores. dplyr functions operate on data frames. To create one, specify the column names and their respective values.

R
df <- data.frame(Name = c("vipul", "jayesh", "anurag"),
                 Age = c(25, 23, 22), Score = c(95, 89, 78))
df                 

Output:

    Name Age Score
1  vipul  25    95
2 jayesh  23    89
3 anurag  22    78

On the other hand, tibbles, introduced through the tibble package, share similar functionality but offer enhanced user-friendly features. The syntax for creating a tibble is comparable to that of a data frame.
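
As a minimal sketch, assuming the tibble package is installed (it is a dependency of dplyr), the data frame above could be recreated as a tibble. The variable name tb is only illustrative.

```r
library(tibble)

# Same columns as the data frame example, built with tibble()
tb <- tibble(Name = c("vipul", "jayesh", "anurag"),
             Age = c(25, 23, 22),
             Score = c(95, 89, 78))

# Printing a tibble also shows its dimensions and each column's type
print(tb)

# A tibble is still a data frame, so data frame code keeps working
is.data.frame(tb)

# An existing data frame can be converted with as_tibble()
as_tibble(data.frame(x = 1:3))
```

The printed tibble lists the column types (for example <chr> and <dbl>) under the column names, which is one of the user-friendly features mentioned above.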

Pipes (%>%)

The pipe operator (%>%), which dplyr re-exports from the magrittr package, passes the result of one operation as the first argument of the next. This allows us to chain multiple operations together, improving code readability.

R
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

print(result)

Output:

# A tibble: 2 × 2
    cyl mean_hp
  <dbl>   <dbl>
1     4    82.6
2     6   110
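
To illustrate what the pipe saves, here is a small sketch that writes the same kind of chain once with %>% and once as nested function calls. The names piped and nested are illustrative only. On R 4.1 or later, the native pipe |> can be used in the same way for simple chains like this one.

```r
library(dplyr)

# Chained with the pipe: read top to bottom
piped <- mtcars %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

# The same steps as nested calls: read from the inside out
nested <- summarise(group_by(filter(mtcars, mpg > 20), cyl),
                    mean_hp = mean(hp))

# Both forms produce the same summary table
all.equal(piped, nested)
```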

Important dplyr Functions

dplyr provides several core functions for data manipulation. The most important ones are described below.

filter()

filter() selects rows (cases) based on conditions on their values.

R
# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

print(d)

# Rows where ht is NA
rows_with_na <- d %>% filter(is.na(ht))
print(rows_with_na)

# Rows where ht is not NA
rows_without_na <- d %>% filter(!is.na(ht))
print(rows_without_na)

Output: 

     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no

     name age ht school
1 Bhavesh   5 NA    yes
2  Chaman   9 NA     no

   name age ht school
1  Abhi   7 46    yes
2 Dimri  16 69     no
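
filter() also accepts several conditions at once. The sketch below reuses the same example data. The names older_no_school and young_or_tall are illustrative only.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Comma-separated conditions are combined with AND
older_no_school <- d %>% filter(age > 6, school == "no")
print(older_no_school)

# Use | for OR; rows where the condition evaluates to NA are dropped
young_or_tall <- d %>% filter(age < 6 | ht > 60)
print(young_or_tall)
```

The first call keeps Chaman and Dimri. The second keeps Bhavesh and Dimri; Chaman is dropped because the condition evaluates to NA when ht is missing and age is not below 6.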

arrange()

arrange() reorders rows (cases) by the values of one or more columns.

R
# Create a data frame with missing data 
d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                 age = c(7, 5, 9, 16), 
                 ht = c(46, NA, NA, 69),
                 school = c("yes", "yes", "no", "no") )
d

# Sort the rows by age in ascending order
d_sorted <- arrange(d, age)
print(d_sorted)

Output: 

     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no

     name age ht school
1 Bhavesh   5 NA    yes
2    Abhi   7 46    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no
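
arrange() can also sort in descending order and by more than one column. A short sketch on the same example data follows. The result names are illustrative only.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# desc() sorts a column from largest to smallest
oldest_first <- arrange(d, desc(age))
print(oldest_first)

# Several columns: sort by school first, then by descending age within school
by_school_age <- arrange(d, school, desc(age))
print(by_school_age)

# Missing values are always placed at the end
by_height <- arrange(d, ht)
print(by_height)
```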

select() and rename()

select() chooses columns (variables) by name, by position, or with a helper such as starts_with(). rename() changes column names while keeping every column.

R
d <- data.frame(name=c("Abhi", "Bhavesh",
                        "Chaman", "Dimri"),
                 age=c(7, 5, 9, 16),
                 ht=c(46, NA, NA, 69),
                 school=c("yes", "yes", "no", "no"))

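# starts_with() keeps only the columns whose names start with "ht"
select(d, starts_with("ht"))

# Everything except the columns whose names start with "ht"
select(d, -starts_with("ht"))

# The first two columns, selected by position
select(d, 1:2)

# Columns whose names contain "a"
select(d, contains("a"))

# Columns whose names match the regular expression "na"
select(d, matches("na"))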

Output: 

  ht
1 46
2 NA
3 NA
4 69

     name age school
1    Abhi   7    yes
2 Bhavesh   5    yes
3  Chaman   9     no
4   Dimri  16     no

     name age
1    Abhi   7
2 Bhavesh   5
3  Chaman   9
4   Dimri  16

     name age
1    Abhi   7
2 Bhavesh   5
3  Chaman   9
4   Dimri  16

     name
1    Abhi
2 Bhavesh
3  Chaman
4   Dimri
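
The examples above only call select(). As a brief sketch of rename() on the same data, the new name goes on the left and the old name on the right. The new column names height and student are illustrative only.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# rename() uses new_name = old_name and keeps all columns
renamed <- rename(d, height = ht)
names(renamed)

# select() can rename while selecting, but drops columns that are not listed
picked <- select(d, student = name, age)
names(picked)
```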

mutate() and transmute()

mutate() adds new variables that are functions of existing variables and keeps the existing columns. transmute() does the same but keeps only the new variables.

R
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                age = c(7, 5, 9, 16), 
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Add 'x3' as sum of height and age, keeping all columns
mutate(d, x3 = ht + age)

# Add 'x3' as sum of height and age, keeping only 'x3'
transmute(d, x3 = ht + age)

Output: 

     name age ht school x3
1    Abhi   7 46    yes 53
2 Bhavesh   5 NA    yes NA
3  Chaman   9 NA     no NA
4   Dimri  16 69     no 85

  x3
1 53
2 NA
3 NA
4 85
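
Two further behaviours of mutate() are sketched below, assuming dplyr 1.0.0 or later for the .keep argument. The column names age_months and is_teen are illustrative only.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# A later expression can use a column created earlier in the same call
d2 <- mutate(d,
             age_months = age * 12,
             is_teen = age_months >= 13 * 12)
print(d2)

# .keep = "none" keeps only the new columns, much like transmute()
only_new <- mutate(d, x3 = ht + age, .keep = "none")
print(only_new)
```

Recent dplyr releases mark transmute() as superseded in favour of mutate(.keep = "none"), although transmute() continues to work.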

summarise()

summarise() condenses many values into a single summary value, such as a mean or a median.

R
d <- data.frame( name = c("Abhi", "Bhavesh",
                          "Chaman", "Dimri"), 
                 age = c(7, 5, 9, 16), 
                 ht = c(46, NA, NA, 69),
                 school = c("yes", "yes", "no", "no") )

summarise(d, mean = mean(age))
summarise(d, min = min(age))
summarise(d, max = max(age))
summarise(d, med = median(age))

Output: 

  mean
1 9.25

  min
1   5

  max
1  16

  med
1   8
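
summarise() is most useful together with group_by(), which yields one summary row per group. The sketch below, on the same data, also passes na.rm = TRUE because ht contains missing values. The names by_school, mean_age, and mean_ht are illustrative only.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# One row per value of school
by_school <- d %>%
  group_by(school) %>%
  summarise(n = n(),                           # number of rows in the group
            mean_age = mean(age),
            mean_ht = mean(ht, na.rm = TRUE))  # ignore missing heights

print(by_school)
```

Without na.rm = TRUE, mean(ht) would return NA for both groups, since each group contains one missing height.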

sample_n() and sample_frac()

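sample_n() takes a random sample of a fixed number of rows, and sample_frac() takes a random sample of a fixed fraction of the rows.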

R
d <- data.frame( name = c("Abhi", "Bhavesh",
                          "Chaman", "Dimri"), 
                 age = c(7, 5, 9, 16), 
                 ht = c(46, NA, NA, 69),
                 school = c("yes", "yes", "no", "no") )

# Randomly sample three rows
sample_n(d, 3)

# Randomly sample 50% of the rows
sample_frac(d, 0.50)

Output (the rows selected vary from run to run because the sampling is random):

    name age ht school
1 Chaman   9 NA     no
2  Dimri  16 69     no
3   Abhi   7 46    yes

   name age ht school
1  Abhi   7 46    yes
2 Dimri  16 69     no
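
To make a random sample reproducible, set a seed before sampling. The sketch below does this with slice_sample(), which recent dplyr releases (1.0.0 and later) recommend over sample_n() and sample_frac(). The seed value 42 and the result names are arbitrary choices.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Fix the random number generator so the same rows are drawn on every run
set.seed(42)

# n = 3 plays the role of sample_n(d, 3)
three_rows <- slice_sample(d, n = 3)
print(three_rows)

# prop = 0.5 plays the role of sample_frac(d, 0.50)
half_rows <- slice_sample(d, prop = 0.5)
print(half_rows)
```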
