Replace Missing Values by Column Mean in R DataFrame
Last Updated :
21 Dec, 2023
In this article, we will see how to replace missing values with the column mean in the R programming language. Missing values in a dataset are usually represented as NA or NaN. Before most analyses, such values need to be either removed or replaced with another value. Replacing missing data with a substitute value is known as data imputation.
Creating a data frame with missing values
R
# creating a dataframe
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
marks2 = c(81, 14, NA, 61, 12),
marks3 = c(78.5, 19.325, NA, 28, 48.002))
data
Output:
marks1 marks2 marks3
1 NA 81 78.500
2 22 14 19.325
3 NA NA NA
4 49 61 28.000
5 75 12 48.002
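Before imputing, it can help to check how many values are missing in each column. The following is a minimal sketch using base R's is.na() and colSums(); it rebuilds the same data frame so that it runs on its own. Note that is.na() also returns TRUE for NaN, so both representations are counted.

```r
# same example data frame as above
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
                   marks2 = c(81, 14, NA, 61, 12),
                   marks3 = c(78.5, 19.325, NA, 28, 48.002))

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(data))
na_counts
```

Here marks1 has two missing values, while marks2 and marks3 have one each.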
Replace missing values using mean() function
Let's see how to impute missing values with each column's mean using a data frame and the mean() function. mean() calculates the arithmetic mean of the elements of the numeric vector passed to it as an argument.
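Syntax of mean() : mean(x, trim = 0, na.rm = FALSE, …)
Arguments:
- x - an R object, typically a numeric vector
- trim - the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed
- na.rm - TRUE to remove NA values before computing the mean (the default is FALSE, in which case any NA in x makes the result NA)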
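The effect of the na.rm and trim arguments can be seen on small vectors. This is only an illustrative sketch; the vectors x and y are made up for the example.

```r
x <- c(NA, 22, NA, 49, 75)

mean_default <- mean(x)                # NA, because na.rm defaults to FALSE
mean_no_na   <- mean(x, na.rm = TRUE)  # mean of 22, 49 and 75

# trim = 0.2 drops 20% of the values from each end before averaging
y <- c(1, 2, 3, 4, 100)
mean_plain   <- mean(y)                # 22
mean_trimmed <- mean(y, trim = 0.2)    # 3, the mean of 2, 3 and 4

mean_default
mean_no_na
mean_plain
mean_trimmed
```

For imputation, na.rm = TRUE is the important argument: without it, the mean of any column that contains NA is itself NA.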
Replacing NA in a single column using mean() function
R
# replace NA in marks2 with the mean of the non-missing marks2 values
data$marks2[is.na(data$marks2)] <- mean(data$marks2, na.rm = TRUE)
data
Output:
marks1 marks2 marks3
1 NA 81 78.500
2 22 14 19.325
3 NA 42 NA
4 49 61 28.000
5 75 12 48.002
In this code, we fill the missing value in the marks2 column with that column's mean, 42. The marks1 and marks3 columns are left unchanged.
Replacing Missing Data in all columns Using for-Loop
With the help of a for loop in R, we can replace the missing data in every column with that column's mean.
R
# replacing NA with each column's mean
for (i in colnames(data)) {
  data[, i][is.na(data[, i])] <- mean(data[, i], na.rm = TRUE)
}
data
Output:
marks1 marks2 marks3
1 48.66667 81 78.50000
2 22.00000 14 19.32500
3 48.66667 42 43.45675
4 49.00000 61 28.00000
5 75.00000 12 48.00200
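The loop above assumes that every column is numeric. As a hedged sketch, the same idea can be wrapped in a small helper that skips non-numeric columns. The function name impute_col_mean and the scores data frame are hypothetical and exist only for this illustration.

```r
# hypothetical helper: replace NA in numeric columns with that column's mean
impute_col_mean <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      df[[col]][missing] <- mean(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# illustrative data with one non-numeric column
scores <- data.frame(name   = c("a", "b", "c", "d"),
                     marks1 = c(NA, 22, 49, 75),
                     marks2 = c(81, NA, 61, 12))

imputed <- impute_col_mean(scores)
imputed
```

The name column is returned untouched, while the NA values in marks1 and marks2 are replaced. A column that is entirely NA would receive NaN, since the mean of zero values is undefined.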
Replacing NA using colMeans() function
The colMeans() function computes the mean of each column of a matrix, array or numeric data frame.
Syntax of colMeans() : colMeans(x, na.rm = FALSE, dims = 1)
Arguments:
- x: an array of two or more dimensions, or a numeric data frame
- dims: which dimensions are regarded as 'columns' to average over
- na.rm: TRUE to ignore NA values
Here we use colMeans() to compute all the column means in one call and then replace the NA values in each column.
R
# creating a dataframe
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
marks2 = c(81, 14, NA, 61, 12),
marks3 = c(78.5, 19.325, NA, 28, 48.002))
data
# using colMeans()
mean_val <- colMeans(data,na.rm = TRUE)
# replacing NA with mean value of each column
for(i in colnames(data))
data[,i][is.na(data[,i])] <- mean_val[i]
data
Output :
marks1 marks2 marks3
1 NA 81 78.500
2 22 14 19.325
3 NA NA NA
4 49 61 28.000
5 75 12 48.002
marks1 marks2 marks3
1 48.66667 81 78.50000
2 22.00000 14 19.32500
3 48.66667 42 43.45675
4 49.00000 61 28.00000
5 75.00000 12 48.00200
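The replacement loop works because colMeans() returns a named numeric vector, so mean_val[i] looks up the mean by column name. A short sketch that inspects this vector for the same example data:

```r
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
                   marks2 = c(81, 14, NA, 61, 12),
                   marks3 = c(78.5, 19.325, NA, 28, 48.002))

# one mean per column, named after the columns
mean_val <- colMeans(data, na.rm = TRUE)
mean_val

# the names match colnames(data), so indexing by column name works
names(mean_val)
mean_val["marks2"]
```

The three means are approximately 48.66667, 42 and 43.45675, which are the values that appear in the imputed output above.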
Replacing NA using apply() function
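In this method, we use the apply() function to compute the mean of every column and then replace the NA values in each column.
Syntax of apply() : apply(X, MARGIN, FUN, …)
Arguments:
- X - an array, including a matrix (a data frame is coerced to a matrix)
- MARGIN - the dimension to apply the function over: 1 for rows, 2 for columns
- FUN - the function to be applied
- … - optional arguments passed on to FUN, such as na.rm = TRUE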
R
# creating a dataframe
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
marks2 = c(81, 14, NA, 61, 12),
marks3 = c(78.5, 19.325, NA, 28, 48.002))
data
# computing mean of all columns using apply()
all_column_mean <- apply(data, 2, mean, na.rm=TRUE)
# imputing NA with the mean calculated
for(i in colnames(data))
data[,i][is.na(data[,i])] <- all_column_mean[i]
data
Output :
marks1 marks2 marks3
1 NA 81 78.500
2 22 14 19.325
3 NA NA NA
4 49 61 28.000
5 75 12 48.002
marks1 marks2 marks3
1 48.66667 81 78.50000
2 22.00000 14 19.32500
3 48.66667 42 43.45675
4 49.00000 61 28.00000
5 75.00000 12 48.00200
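Because apply() first coerces the data frame to a matrix, a common alternative is lapply(), which works on the columns directly. The following is a sketch of a loop-free variant, not the method used above; it relies only on base R's lapply() and replace().

```r
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
                   marks2 = c(81, 14, NA, 61, 12),
                   marks3 = c(78.5, 19.325, NA, 28, 48.002))

# assigning into data[] keeps the data frame structure
# while replacing the contents of every column
data[] <- lapply(data, function(col) {
  replace(col, is.na(col), mean(col, na.rm = TRUE))
})
data
```

This computes the mean and fills the gaps in a single pass per column, so no separate vector of means is needed.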
Using na.aggregate() Function of zoo Package
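We can also replace the missing values using the na.aggregate() function of the zoo package in R. By default it replaces each NA with the mean of the non-missing values, which for a data frame is computed column by column.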
R
# install the zoo package (only needed once), then load it
install.packages("zoo")
library(zoo)
# creating a dataframe
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
marks2 = c(81, 14, NA, 61, 12),
marks3 = c(78.5, 19.325, NA, 28, 48.002))
data
# using na.aggregate function to replace missing values
data<- na.aggregate(data)
data
Output:
marks1 marks2 marks3
1 NA 81 78.500
2 22 14 19.325
3 NA NA NA
4 49 61 28.000
5 75 12 48.002
marks1 marks2 marks3
1 48.66667 81 78.50000
2 22.00000 14 19.32500
3 48.66667 42 43.45675
4 49.00000 61 28.00000
5 75.00000 12 48.00200