
IDS Unit-3

This document provides an overview of vectors in R programming, explaining their definition, types, and how to create and manipulate them. It covers vector arithmetic operations, accessing and modifying vector elements, and subsetting techniques. Key functions such as c(), length(), sort(), and rep() are highlighted for vector management.

Uploaded by

upender
Copyright
© All Rights Reserved

UNIT-3

VECTORS
• A vector is an ordered collection of items that are all of the same type.
• A vector is a basic data structure which plays an important role in R programming.
• In R, a sequence of elements that share the same data type is known as a vector. A vector
supports the logical, integer, double, character, complex, or raw data types. The elements
contained in a vector are known as its components. We can check the type of a vector
with the typeof() function.
• To combine a list of items into a vector, use the c() function and separate the items with commas.
• Vectors in R are similar to arrays in the C language: they hold multiple data values
of the same type. One key difference is that in R, vector indexing starts from 1
and not from 0. We can create numeric vectors and character vectors as well.
• Length is an important property of a vector. A vector's length is the number
of elements in the vector, and it is returned by the length() function.
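The properties listed above can be checked directly at the console. A minimal sketch; the vector name `scores` and its values are illustrative only.

```r
# An illustrative numeric vector built with c()
scores <- c(72, 85, 90)

typeof(scores)   # "double": plain numbers are stored as doubles by default
length(scores)   # 3: the number of elements
scores[1]        # 72: indexing starts at 1, not 0
scores[0]        # numeric(0): index 0 selects nothing (it is not the first element)
```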

Single Element Vector:
Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of
the atomic vector types listed above (logical, integer, double, character, complex, or raw).
# Atomic vector of type character.
> print("abc")
[1] "abc"
# Atomic vector of type double.
> print(12.5)
[1] 12.5
# Atomic vector of type integer.
> print(63L)
[1] 63
# Atomic vector of type logical.
> print(TRUE)
[1] TRUE
# Atomic vector of type complex.
> print(2+3i)
[1] 2+3i
# Atomic vector of type raw.
> print(charToRaw('hello'))
[1] 68 65 6c 6c 6f
Types of vectors:
R uses several types of vectors.
1. Numeric vectors: Numeric vectors contain numeric values, either integers or double-precision
(floating-point) numbers.
# R program to create numeric Vectors
# creation of vectors using c() function.
> v1 <- c(4, 5, 6, 7)
# Print the values of v1
> v1
[1] 4 5 6 7
# display type of vector
> typeof(v1)
[1] "double"
# by using 'L' we can specify that we want integer values.
> v2 <- c(1L, 4L, 2L, 5L)
# display type of vector
> typeof(v2)
[1] "integer"
2. Character vectors: Character vectors contain alphanumeric values and special characters.
Ex1:
# R program to create a character vector: numbers mixed with strings are converted to characters
> v1 <- c('geeks', '2', 'hello', 57)
# Displaying type of vector
> typeof(v1)
[1] "character"
EX2:
# Vector of strings
> fruits <- c("banana", "apple", "orange")
# Print values of fruits
> fruits
[1] "banana" "apple" "orange"
> typeof(fruits)
[1] "character"
3. Logical vectors: Logical vectors contain the Boolean values TRUE and FALSE, plus NA for missing
values.
# R program to create Logical Vectors
# Creating logical vector using c() function
> log_values <- c(TRUE, FALSE, TRUE, NA)
# Displaying values of vector
> log_values
[1] TRUE FALSE TRUE NA
# Displaying type of vector
> typeof(log_values)
[1] "logical"

CREATING AND NAMING VECTORS


• We use the c() function to create a vector. This function returns a one-dimensional array, or simply a
vector. The c() function is a generic function which combines its arguments.
• All arguments are coerced to a common data type, which is the type of the returned value.
Besides c(), there are other ways to create a vector.
Types:
1. Using the c() function
2. Using the colon (:) operator
3. Using the seq() function
4. Using the assign() function
Using c() Function:
The c function in R programming stands for 'combine'. It combines the values passed to it as
arguments into a single vector.
EX:
# R program to create a vector: the c() function combines the values into a single vector.
# By default, the type will be double.
> X <- c(61, 4, 21, 67, 89, 2)
> cat('using c function', X)
using c function 61 4 21 67 89 2
# print the values, option 1
> X
[1] 61 4 21 67 89 2
# print the values, option 2
> print(X)
[1] 61 4 21 67 89 2
# Print type of vector
> typeof(X)
[1] "double"
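The bullet on c() says all arguments are coerced to a common type. A small sketch of that rule (the variable names are illustrative): R picks the most flexible type present, in the order logical < integer < double < complex < character.

```r
a <- c(TRUE, 2L)   # logical + integer  -> integer: 1 2
b <- c(1L, 2.5)    # integer + double   -> double: 1.0 2.5
d <- c(1, "a")     # double + character -> character: "1" "a"

typeof(a)   # "integer"
typeof(b)   # "double"
typeof(d)   # "character"
```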
Using the colon(:) operator:
Colon operator (":") in R is a function that generates regular sequences. It is most commonly
used in for loops, to index and to create a vector with increasing or decreasing sequence. It is a
binary operator i.e. it takes two arguments.
Syntax : z<- x:y
EX1:
# Vector with numerical values in a sequence
> numbers <- 1:10
# print the values
> numbers
[1] 1 2 3 4 5 6 7 8 9 10
EX2:
# Vector with numerical decimals in a sequence
> numbers1 <- 1.5:6.5
> numbers1
[1] 1.5 2.5 3.5 4.5 5.5 6.5
# The sequence stops at the last value that does not exceed 6.3
> numbers2 <- 1.5:6.3
> numbers2
[1] 1.5 2.5 3.5 4.5 5.5
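A short sketch of two further properties of the colon operator: it also counts down, and with a whole-number start it returns an integer vector. The last two lines show a common pitfall when the end value can be 0; seq_len() is the base R function usually used instead. Variable names are illustrative.

```r
up   <- 1:5    # 1 2 3 4 5
down <- 5:1    # 5 4 3 2 1: the colon operator also counts down
typeof(up)     # "integer": a whole-number start gives an integer vector

n <- 0
1:n            # 1 0: counts down, which is rarely what a loop over 1..n intends
seq_len(n)     # integer(0): a safe way to build 1..n that is empty when n is 0
```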
Using the seq() function:
In R, we can create a vector with the seq() function, which generates a
sequence of elements as a vector. The seq() function is used in two ways: by setting the step size
with the ‘by’ argument, or by specifying the length of the vector (sequence) with the 'length.out' argument.
Ex1:
# Assigning a vector using seq() function
> seq_vec<-seq(1,4,by=0.5)
# Printing the vector
> seq_vec
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
> class(seq_vec)
[1] "numeric"
EX2:
> V = seq(1, 3, by=0.2)
> print(V)
[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
EX3:
> seq_vec<-seq(1,4,length.out=10)
> seq_vec
[1] 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333
[9] 3.666667 4.000000
> class(seq_vec)
[1] "numeric"
EX4:
> seq_vec<-seq(1,4,length.out=5)
> seq_vec
[1] 1.00 1.75 2.50 3.25 4.00
> class(seq_vec)
[1] "numeric"
EX5:
# Creating a sequence from 5 to 13.
> v <- 5:13
> print(v)
[1] 5 6 7 8 9 10 11 12 13
# Creating a sequence from 6.6 to 12.6.
> v <- 6.6:12.6
> print(v)
[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6
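A few more seq() forms, as a sketch: a negative `by` produces a decreasing sequence, and seq_along() (a related base function, not covered above) generates one index per element of an existing vector.

```r
seq(10, 1, by = -3)          # 10 7 4 1: a negative step counts down
seq(0, 1, length.out = 3)    # 0.0 0.5 1.0: three evenly spaced values

fruits <- c("banana", "apple", "orange")
seq_along(fruits)            # 1 2 3: one index per element
```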
Using assign() function:
The assign() function takes the following two mandatory parameters:
x : the variable name, given as a character string.
value : the value to be assigned to the variable named by x.
EX1:
> assign("vec2",c(6,7,8,9,10))
> vec2
[1] 6 7 8 9 10
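Because assign() takes the variable name as a string, it is handy when names are built at run time. A minimal sketch; the names var_1 to var_3 are hypothetical, and get() (the base function that reads a variable by name) is used to read one back.

```r
# Build the names "var_1", "var_2", "var_3" as strings and assign a value to each
for (i in 1:3) {
  assign(paste0("var_", i), i * 10)
}

var_2          # 20
get("var_3")   # 30: get() looks a variable up by its name
```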

VECTOR ARITHMETIC OPERATIONS


• We can perform arithmetic operations on vectors, like addition, subtraction, multiplication
and division. These operations are applied element by element.
• Ideally, the two vectors should have the same length. One of the operands can also be a single
value, which is then applied to every element. If the types differ, the values are coerced to a
common type (for example, integer and double give double).
• If the vectors are not of the same length, vector recycling happens implicitly.
Vector Recycling:
If two vectors are of unequal length, the shorter one will be recycled in order to match the longer
vector. For example, the following vectors u and v have different lengths, and their sum is
computed by recycling values of the shorter vector u.
Ex: > u = c(10, 20, 30)
> v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> u+v
[1] 11 22 33 14 25 36 17 28 39
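Recycling is silent when the longer length is a multiple of the shorter one, as above. When it is not, R still recycles but issues a warning. The sketch below (illustrative names) catches that warning with withCallingHandlers() so that the result is still returned.

```r
short <- c(1, 2)
long  <- c(10, 20, 30)

# 3 is not a multiple of 2: short is recycled as 1 2 1 and R warns
res <- withCallingHandlers(
  short + long,
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # continue without printing the warning again
  }
)
res   # 11 22 31
```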
Types:
1. Addition: The addition operator takes two vectors as operands and returns their element-wise
sum, i.e. a + b.
In the following program, we create two numeric vectors and add them using the
addition operator.
> a <- c(10, 20, 30, 40, 50)
> b <- c(1, 3, 5, 7, 9)
> result <- a + b
> print(result)
[1] 11 23 35 47 59
2. Subtraction: The subtraction operator takes two vectors as operands and returns their
element-wise difference, i.e. a - b.
In the following program, we create two numeric vectors and find their difference using the
subtraction operator.
> a <- c(10, 20, 30, 40, 50)
> b <- c(1, 3, 5, 7, 9)
> result <- a - b
> print(result)
[1] 9 17 25 33 41
3. Multiplication: The multiplication operator takes two vectors as operands and returns their
element-wise product, i.e. a * b.
In the following program, we create two numeric vectors and find their product using the
multiplication operator.
> a <- c(10, 20, 30, 40, 50)
> b <- c(1, 3, 5, 7, 9)
> result <- a * b
> print(result)
[1] 10 60 150 280 450
4. Division: The division operator takes two vectors as operands and returns their element-wise
quotient, i.e. a / b.
In the following program, we create two numeric vectors and divide them using the
division operator.
> a <- c(10, 20, 30, 40, 50)
> b <- c(1, 3, 5, 7, 9)
> result <- a / b
> print(result)
[1] 10.000000 6.666667 6.000000 5.714286 5.555556
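As the earlier note says, one operand may be a single value; it is recycled across the whole vector. A short sketch, which also shows that an integer operand is promoted to double, and that dividing by zero yields Inf, -Inf or NaN rather than an error.

```r
a <- c(10, 20, 30, 40, 50)

a * 2               # 20 40 60 80 100: the single value 2 is applied to every element
a - 5L              # 5 15 25 35 45: the integer 5L is promoted to double
typeof(a - 5L)      # "double"
c(1, -1, 0) / 0     # Inf -Inf NaN: no error is raised
```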
Accessing Vector Elements:
Elements of a vector are accessed using indexing, with [ ] brackets. Indexing starts
at position 1 and increases by 1 for every element, so an index denotes the position where a value
is stored in the vector. Indexing can be performed with integers, character names, or logical values
(TRUE/FALSE; a vector of 0s and 1s must first be turned into a logical vector, e.g. by comparing it with 0).
A negative index drops that element from the result.
# Accessing vector elements using position.
> days <- c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")
> u <- days[c(2,3,6)]
> u
[1] "Mon" "Tue" "Fri"
# Accessing vector elements using logical indexing.
> v <- days[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
> v
[1] "Sun" "Fri"
# Accessing vector elements using negative indexing.
> x <- days[c(-2,-5)]
> x
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
# Accessing vector elements using 0/1 indexing.
> y <- days[c(0,0,0,0,0,0,1)>0]
> y
[1] "Sat"
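The paragraph above also mentions character indexing, which the examples do not show. A sketch with an illustrative named vector `temps`: names() attaches labels that can then be used as indices, and which() (base R) turns a logical condition into positions.

```r
temps <- c(21, 23, 19)
names(temps) <- c("mon", "tue", "wed")

temps["tue"]             # tue 23: select by name
temps[c("mon", "wed")]   # mon 21, wed 19
temps[temps > 20]        # mon 21, tue 23: select by a logical condition
which(temps > 20)        # mon 1, tue 2: the positions where the condition holds
```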
Vector Length: To find out how many items a vector has, use the length() function
# Creating a character vector
> fruits <- c("banana", "apple", "orange")
# Get the length of vector using length()
> length(fruits)
[1] 3
Sort a Vector: To sort items in a vector alphabetically or numerically, use the sort() function
# Creating a character vector
> fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Creating a numeric vector
> numbers <- c(13, 3, 5, 7, 20, 2)
# Sort a string
> sort(fruits)
[1] "apple" "banana" "lemon" "mango" "orange"
# Sort numbers
> sort(numbers)
[1] 2 3 5 7 13 20
Change an Item: To change the value of a specific item, refer to the index number
> fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Change "banana" to "pear"
> fruits[1] <- "pear"
# Print fruits
> fruits
[1] "pear" "apple" "orange" "mango" "lemon"
Repeat Vectors: To repeat vectors, use the rep() function.
• Repeat each value:
> repeat_each <- rep(c(1,2,3), each = 3)
> repeat_each
[1] 1 1 1 2 2 2 3 3 3
• Repeat the sequence of the vector:
> repeat_times <- rep(c(1,2,3), times = 3)
> repeat_times
[1] 1 2 3 1 2 3 1 2 3
• Repeat each value independently:
> repeat_independent <- rep(c(1,2,3), times = c(5,2,1))
> repeat_independent
[1] 1 1 1 1 1 2 2 3

VECTOR SUB SETTING


In the R programming language, subsetting allows the user to access elements of an object. It extracts
a portion of the object based on the condition provided. There are 4 ways of subsetting in R
programming; which one to use depends on the task and the type of object.
1. Subsetting in R Using [ ] Operator: Vectors are basic objects in R and they can be subsetted
using the [ ] operator. Using the [ ] operator, elements of vectors and observations from data frames
can be accessed. To exclude some indexes, prefix them with '-'; all other elements of the vector or data
frame are returned.
> x <- c("a", "b", "c", "c", "d", "a")
> x[1] ## Extract the first element
[1] "a"
> x[2] ## Extract the second element
[1] "b"
The [ ] operator can be used to extract multiple elements of a vector by passing the operator an
integer sequence. Here we extract the first four elements of the vector.
> x[1:4]
[1] "a" "b" "c" "c"
The sequence does not have to be in order; you can specify any arbitrary integer vector.
> x[c(1, 3, 4)]
[1] "a" "c" "c"
To exclude some indexes, prefix them with '-'; all other elements of the vector are returned.
> print(x[-c(1, 2, 3)])
[1] "c" "d" "a"
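Subsetting "based on the condition provided" usually means passing a logical vector inside [ ]. A sketch using the same vector x; %in% is the base operator that tests membership in a set.

```r
x <- c("a", "b", "c", "c", "d", "a")

x[x == "c"]              # "c" "c": keep the elements equal to "c"
x[x != "a"]              # "b" "c" "c" "d": drop every "a"
x[x %in% c("a", "d")]    # "a" "d" "a": keep elements found in the set
```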
2. Subsetting in R Using [[ ]] Operator: The [[ ]] operator is used for subsetting list objects. It is
similar to the [ ] operator, but [[ ]] selects only one element and returns that element itself, whereas
the [ ] operator can select more than one element in a single command.
# Create list
> ls <- list(a = 1, b = 2, c = 10, d = 20)
# Print list
> print(ls)
$a
[1] 1
$b
[1] 2
$c
[1] 10
$d
[1] 20
# Select first element of list
> ls[[1]]
[1] 1
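The difference between the two operators is easiest to see on the same list: [ ] returns a smaller list, while [[ ]] returns the element inside it. A sketch reusing the list above (its name `ls` temporarily masks the base function ls(); any other name would work the same way).

```r
ls <- list(a = 1, b = 2, c = 10, d = 20)

one_list  <- ls[1]    # [ ] keeps the container: a list of length 1
one_value <- ls[[1]]  # [[ ]] returns the element itself: the number 1

class(one_list)       # "list"
class(one_value)      # "numeric"
ls[c("a", "d")]       # [ ] can select several elements at once (still a list)
```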
3. Subsetting in R Using $ Operator: The $ operator can be used for lists and data frames in R. Unlike
the [ ] operator, it selects only a single element at a time. It can be used to access an element of a named
list or a column of a data frame. The $ operator only applies to recursive (list-like) objects.
# Create list
> ls <- list(a = 1, b = 2, c = "Hello", d = "Data Science")
# Print list
> print(ls)
$a
[1] 1
$b
[1] 2
$c
[1] "Hello"
$d
[1] "Data Science"
# Print "Data Science" using $ operator
> print(ls$d)
[1] "Data Science"
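$ takes the element name literally, so it cannot use a name stored in a variable; [[ ]] can. A sketch with the list from the example above; `key` is an illustrative variable.

```r
ls <- list(a = 1, b = 2, c = "Hello", d = "Data Science")
key <- "d"

ls[[key]]   # "Data Science": [[ ]] evaluates key and uses its value "d"
ls$key      # NULL: $ looks for an element literally named "key"
```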
4. Subsetting in R Using subset() Function: subset() function in R programming is used to create a
subset of vectors, matrices, or data frames based on the conditions provided in the parameters.
Syntax: subset(x, subset, select)
Parameters:
• x: indicates the object
• subset: indicates the logical expression on the basis of which subsetting has to be done
• select: indicates columns to select
# Subsetting
> airq <- subset(airquality, Temp < 65, select = c(Month))
# Print subset
> print(airq)
Month
4 5
5 5
8 5
9 5
15 5
16 5
18 5
20 5
21 5
23 5
24 5
25 5
26 5
27 5
144 9
148 9
# Subsetting
> mtc <- subset(mtcars, gear == 5 & hp > 200, select = c(gear, hp))
# Print subset
> print(mtc)
gear hp
Ford Pantera L 5 264
Maserati Bora 5 335
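subset() also works on plain vectors. One difference from [ ], shown in this sketch with an illustrative vector v: subset() drops elements whose condition is NA, whereas [ ] keeps them as NA.

```r
v <- c(4, 12, 7, NA, 15)

subset(v, v > 6)   # 12 7 15: elements whose condition is NA are dropped
v[v > 6]           # 12 7 NA 15: [ ] returns NA for the NA condition
```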

MATRICES
• A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix,
rows run horizontally and columns run vertically.
• In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created
by passing a vector as input to the matrix() function. On R matrices, we can perform addition,
subtraction, multiplication, and division operations.
• In an R matrix, elements are arranged in a fixed number of rows and columns. Matrix
elements are usually numbers, but in any case all the elements must share a common basic (atomic)
type. Internally, R stores a matrix as a single vector together with a dim attribute that gives
the number of rows and columns.
CREATING AND NAMING MATRICES
• To create a matrix in R, use the matrix() function. Its first argument is the vector of elements.
• You also specify how many rows and how many columns you want the matrix to
have. Note: By default, matrices are filled column-wise.
• Matrices are R objects in which the elements are arranged in a two-dimensional rectangular
layout. They contain elements of the same atomic type. Although we can create a matrix
containing only characters or only logical values, such matrices are of limited use; we mostly use
matrices with numeric elements in mathematical calculations.
• A Matrix is created using the matrix() function.
Syntax: matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used:
➢ data is the input vector which becomes the data elements of the matrix.
➢ nrow is the number of rows to be created.
➢ ncol is the number of columns to be created.
➢ byrow is a logical value. Matrices are filled column-wise by default; if byrow is TRUE,
the input vector elements are arranged row-wise instead.
➢ dimnames is the list of names assigned to the rows and columns (a list of two character
vectors: the row names and the column names).
Creating a matrix in R:
Just as c() creates vectors and list() creates lists, R provides the matrix() function to create a matrix.
This function plays an important role in data analysis. Its syntax is:
matrix(data, nrow, ncol, byrow, dimnames)
data: The first argument is the input vector, whose values become the elements of the matrix.
nrow: The second argument is the number of rows we want in the matrix.
ncol: The third argument is the number of columns we want in the matrix.
byrow: A logical value. If it is TRUE, the input vector elements are arranged by row.
dimnames: A list giving the names assigned to the rows and columns.
The following examples show how the matrix() function creates a matrix and arranges the elements
sequentially by row or by column.
Ex1: > # Create a matrix
> thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
> # Print the matrix
> thismatrix
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
Ex2: > # Create a matrix and Arranging elements sequentially by row.
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE)
> # Print the matrix
> A
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
[3,] 11 12 13
[4,] 14 15 16
Ex3: > # Create a matrix and Arranging elements sequentially by column.
> B <- matrix(c(5:16), nrow = 4, byrow = FALSE)
> # Print the matrix
> B
[,1] [,2] [,3]
[1,] 5 9 13
[2,] 6 10 14
[3,] 7 11 15
[4,] 8 12 16
Ex4: > # Create a matrix with number of columns are 4
> C <- matrix(c(5:16), ncol = 4, byrow = TRUE)
> # Print the matrix
> C
[,1] [,2] [,3] [,4]
[1,] 5 6 7 8
[2,] 9 10 11 12
[3,] 13 14 15 16
Ex5: > # Create a matrix and Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> D<-matrix(c(3:14), nrow=4, byrow=TRUE, dimnames=list(row_names, col_names))
> # Print the matrix
> D
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14
Diagonal matrix: A diagonal matrix is a matrix in which the entries outside the main diagonal are all
zero. To create such a matrix the syntax is given below:
Ex: > print(diag(c(5, 3, 3), 3, 3))
[,1] [,2] [,3]
[1,] 5 0 0
[2,] 0 3 0
[3,] 0 0 3
Identity matrix: A square matrix in which all the elements of the principal diagonal are ones and all
other elements are zeros. To create such a matrix the syntax is given below:
Ex: > print(diag(1, 3, 3))
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
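diag() has two more base R uses worth knowing, shown in this sketch: diag(n) with a single number builds the n x n identity matrix, and diag() applied to a matrix extracts its main diagonal. The matrix M is illustrative.

```r
I3 <- diag(3)                  # same as diag(1, 3, 3): the 3 x 3 identity matrix
M  <- matrix(1:9, nrow = 3)    # rows: 1 4 7 / 2 5 8 / 3 6 9

diag(M)                        # 1 5 9: the main diagonal of M
```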
Accessing matrix elements in R:
There are three ways to access the elements of a matrix.
1. We can access the element present in the nth row and mth column. Access the item using
[ ] brackets: in A[n, m], the first number specifies the row position and the second
number specifies the column position:
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
> A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> #Accessing element present on 3rd row and 2nd column
> print(A[3,2])
[1] 12
2. We can access all the elements of the matrix which are present on the nth row. The whole
row can be accessed if you specify a comma after the number in the bracket:
> # Accessing element present in 3rd row
> print(A[3,])
col1 col2 col3
11 12 13
3. We can also access all the elements of the matrix which are present on the mth column. The
whole column can be accessed if you specify a comma before the number in the bracket:
> #Accessing element present in 2nd column
> print(A[,2])
row1 row2 row3 row4
6 9 12 15
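Because A has row and column names, the same elements can also be selected by name, and a logical condition on one column can select whole rows. A sketch that rebuilds A from the example above:

```r
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
A <- matrix(5:16, nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))

A["row3", "col2"]        # 12: the same cell as A[3, 2], selected by name
A[c("row1", "row4"), ]   # rows row1 and row4, all columns
A[A[, "col1"] > 8, ]     # rows whose col1 value exceeds 8: row3 and row4
```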
Modification of the matrix:
R allows us to modify a matrix. There are several ways to do so,
which are as follows:

Assign a single element: The first method of matrix modification is to assign a single element to the
matrix at a particular position. The new value replaces the old value at that position.
This is the simplest way to modify a matrix.
The basic syntax for it is as follows: matrix[n, m] <- y
Here, n and m are the row and column indices of the element, respectively, and y is the value we assign
to modify the matrix.
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
> A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> # Assigning value 20 to the element at 3rd row and 2nd column
> A[3,2]<-20
> A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 20 13
row4 14 15 16

Use of Relational Operator: R provides another way to modify a matrix: combining assignment with
relational operators like >, <, and ==. Like the first method, this method is quite
simple to use. Let us see an example to understand how it modifies the matrix.
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
> A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> # Replacing the element equal to 12
> A[A==12]<-0
> A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 0 13
row4 14 15 16
> # Replacing elements whose values are greater than 10
> A[A>10]<-0
> A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 0 0 0
row4 0 0 0

Adding of Rows and Columns: The third method of matrix modification is adding
rows and columns with the cbind() and rbind() functions: cbind() adds a column and rbind() adds
a row. Both return a new, larger matrix; R itself is unchanged unless the result is assigned back.
Let us see an example to understand how cbind() and rbind() work.
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
> R
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> # Adding row
> rbind(R,c(17,18,19))
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
17 18 19
> # Adding column
> cbind(R,c(17,18,19,20))
col1 col2 col3
row1 5 6 7 17
row2 8 9 10 18
row3 11 12 13 19
row4 14 15 16 20
> # Transpose of the matrix using the t() function:
> t(R)
row1 row2 row3 row4
col1 5 8 11 14
col2 6 9 12 15
col3 7 10 13 16
> # Modifying the dimension of the matrix using the dim() function
> dim(R)<-c(1,12)
> R
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 5 8 11 14 6 9 12 15 7 10 13 16
Combine Matrices using rbind() Function: We will create two matrices and combine them with the help
of the rbind() function.
Ex: > # create one matrix
> a1<-matrix(1:9,3,3)
> a1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # Create second matrix
> a2<-matrix(10:18,3,3)
> a2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
[3,] 12 15 18
> # combine both matrices
> rbind(a1,a2)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[4,] 10 13 16
[5,] 11 14 17
[6,] 12 15 18
Combine Matrices using cbind() Function: We will create two matrices and combine them with the help
of the cbind() function.
Ex: > # create one matrix
> a1<-matrix(1:9,3,3)
> a1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # Create second matrix
> a2<-matrix(10:18,3,3)
> a2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
[3,] 12 15 18
> # combine both matrices
> cbind(a1, a2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18

Matrix operations: In R, we can perform mathematical operations on matrices, such as
addition, subtraction, multiplication, etc. The operators +, -, * and / work element by element,
so both matrices must have the same dimensions.

Ex: > # create first matrix
> R <- matrix(c(5:16), nrow = 4,ncol=3)
> R
[,1] [,2] [,3]
[1,] 5 9 13
[2,] 6 10 14
[3,] 7 11 15
[4,] 8 12 16
> # create second matrix
> S <- matrix(c(1:12), nrow = 4,ncol=3)
> S
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> # Addition
> sum=R+S
> sum
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28

> # Subtraction
> sub<-R-S
> sub
[,1] [,2] [,3]
[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4

> # Multiplication
> mul=R*S
> mul
[,1] [,2] [,3]
[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192

> # Multiplication by constant
> mul1=R*12
> mul1
[,1] [,2] [,3]
[1,] 60 108 156
[2,] 72 120 168
[3,] 84 132 180
[4,] 96 144 192

> # Division
> div=R/S
> div
[,1] [,2] [,3]
[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333
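Note that * multiplies element by element. True matrix multiplication uses the %*% operator and only requires the number of columns of the left matrix to equal the number of rows of the right one. A sketch with two illustrative 2 x 2 matrices, followed by the R and S matrices from the example above:

```r
P <- matrix(1:4, nrow = 2)   # columns: (1, 2) and (3, 4)
Q <- matrix(5:8, nrow = 2)   # columns: (5, 6) and (7, 8)

P * Q      # element-wise product, rows: 5 21 / 12 32
P %*% Q    # matrix product, rows: 23 31 / 34 46

# R and S are both 4 x 3, so R %*% S fails; R %*% t(S) is 4 x 4
R <- matrix(5:16, nrow = 4, ncol = 3)
S <- matrix(1:12, nrow = 4, ncol = 3)
dim(R %*% t(S))   # 4 4
```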
Matrix metrics: Once a matrix is created, we often need to know:
• What is the dimension of the matrix?
• How many rows are there in the matrix?
• How many columns are there in the matrix?
Ex: > # Create A Matrix
> A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
> A
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> # Dimension of the matrix
> print(dim(A))
[1] 3 3
> # Number of rows
> print(nrow(A))
[1] 3
> # Number of columns
> print(ncol(A))
[1] 3
> # Number of elements
> print(length(A))
[1] 9
> print(prod(dim(A)))
[1] 9

Applications of matrix:
1. In geology, matrices are used in conducting surveys and in plotting graphs and statistics; they are
also used for study in many other fields.
2. Matrices provide a convenient representation for plotting common survey data.
3. In robotics and automation, matrices are used to describe the movements of a robot.
4. In economics, matrices are used to calculate the gross domestic product, and they
also help in calculating the production capacity for goods and products.
5. In computer-based applications, matrices play a crucial role in creating realistic-looking
motion.

MATRIX SUBSETTING
A matrix is subset with two arguments within single brackets, [ ], and separated by a comma.
The first argument specifies the rows, and the second the columns.
Ex: > A<-matrix(1:16,4)
> A
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> colnames(A)<-c("C1","C2","C3","C4")
> rownames(A)<-c("R1","R2","R3","R4")
> A
C1 C2 C3 C4
R1 1 5 9 13
R2 2 6 10 14
R3 3 7 11 15
R4 4 8 12 16
> # all rows with 1st column
> A[,1,drop=FALSE]
C1
R1 1
R2 2
R3 3
R4 4
> # 1st row with all column
> A[1,,drop=FALSE]
C1 C2 C3 C4
R1 1 5 9 13
> # display the cell value at the 1st row and 1st column
> A[1,1,drop=FALSE]
C1
R1 1
> # display the 1st and 2nd rows, 2nd and 3rd columns
> A[1:2,2:3]
C2 C3
R1 5 9
R2 6 10
> # display the 1st and 2nd rows, 2nd and 4th columns
> A[1:2,c(2,4)]
C2 C4
R1 5 13
R2 6 14
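drop = FALSE matters because, by default (drop = TRUE), selecting a single row or column turns the result into a plain vector. A sketch using the same A:

```r
A <- matrix(1:16, 4)
colnames(A) <- c("C1", "C2", "C3", "C4")
rownames(A) <- c("R1", "R2", "R3", "R4")

col_vec <- A[, 1]                 # default drop = TRUE: a named vector
col_mat <- A[, 1, drop = FALSE]   # stays a 4 x 1 matrix

dim(col_vec)   # NULL: a vector has no dim attribute
dim(col_mat)   # 4 1
```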

Note:
The only difference between vectors, matrices, and arrays is the number of dimensions:
✓ Vectors are uni-dimensional arrays
✓ Matrices are two-dimensional arrays
✓ Arrays can have more than two dimensions
CLASS
Classes and objects are basic concepts of object-oriented programming that revolve around real-life
entities. A class is a blueprint for creating objects: it defines their member variables and attributes,
that is, the set of properties and methods that are common to all objects of one
type. Everything in R is an object. An object is simply a data structure that has some methods and
attributes. Unlike most other programming languages, R has three class systems: S3, S4,
and Reference Classes.
S3 Class:
• The S3 class system is somewhat primitive in nature. It lacks a formal definition, and an object of
this class can be created simply by adding a class attribute to it.
• This simplicity accounts for the fact that it is widely used in the R programming language. In fact,
most of R's built-in classes are of this type.
Example: > # create a list with required components
> s <- list(name = "John", age = 21, GPA = 6.5)
> # give a name to your class
> class(s) <- "student"
> s
$name
[1] "John"
$age
[1] 21
$GPA
[1] 6.5
attr(,"class")
[1] "student"
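Because an S3 class is only a class attribute, behaviour is added by writing methods for generic functions. A minimal sketch: print() is an S3 generic, so a function named print.student (named to match the class above) is used automatically when a "student" object is printed.

```r
s <- list(name = "John", age = 21, GPA = 6.5)
class(s) <- "student"

# Method for the generic print(), dispatched on class "student"
print.student <- function(x, ...) {
  cat("Student: ", x$name, " | Age: ", x$age, " | GPA: ", x$GPA, "\n", sep = "")
  invisible(x)   # print methods conventionally return their input invisibly
}

print(s)                 # Student: John | Age: 21 | GPA: 6.5
inherits(s, "student")   # TRUE
```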
S4 Class
• S4 classes are an improvement over S3 classes. They have a formally defined structure, which
helps make objects of the same class look more or less alike.
• Class components are defined using the setClass() function, and objects are created using
the new() function.
Example: > # definition of S4 class
> setClass("student", slots=list(name="character", age="numeric", GPA="numeric"))
> # creating an object using new() by passing class name and slot values
> studentlist<-new("student", name="john", age=25, GPA=6.5)
> studentlist
An object of class "student"
Slot "name":
[1] "john"

Slot "age":
[1] 25

Slot "GPA":
[1] 6.5
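Slots of an S4 object are read and written with @ (or the slot() function), and the formal definition is checked when an object is created. A short sketch reusing the student class above; the second student and its invalid age are hypothetical.

```r
setClass("student", slots = list(name = "character", age = "numeric", GPA = "numeric"))
st <- new("student", name = "john", age = 25, GPA = 6.5)

st@name            # "john": slots are read with @
slot(st, "GPA")    # 6.5: the same kind of access through slot()
st@age <- 26       # slots can be reassigned with @ as well

# The formal definition is enforced: a character value for a numeric slot fails
bad <- tryCatch(
  new("student", name = "amy", age = "twenty", GPA = 7),
  error = function(e) "invalid slot type"
)
bad   # "invalid slot type"
```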
Reference Class
• Reference classes were introduced later than the other two. The Reference Class system is an
improvement over S4 classes: here the methods belong to the class. It is closer to the
object-oriented programming found in other major programming languages.
• Reference classes are basically S4 classes with an environment added to them.
• Defining a reference class is similar to defining an S4 class. We use setRefClass() instead
of setClass() and “fields” instead of “slots”.
Example: > # setRefClass returns a generator
> students <- setRefClass("students", fields = list(name = "character", age = "numeric", GPA = "numeric"))
> #now we can use the generator to create objects
> studentlist<-students(name="john", age=25, GPA=6.5)
> studentlist
Reference class object of class "students"
Field "name":
[1] "john"
Field "age":
[1] 25
Field "GPA":
[1] 6.5
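Since reference class objects carry an environment, methods can change fields in place and plain assignment does not copy the object. A minimal sketch extending the generator above with a hypothetical birthday() method:

```r
students <- setRefClass("students",
  fields = list(name = "character", age = "numeric", GPA = "numeric"),
  methods = list(
    birthday = function() {
      age <<- age + 1   # <<- updates the field stored in the object's environment
      invisible(.self)
    }
  )
)

s1 <- students(name = "john", age = 25, GPA = 6.5)
s2 <- s1          # not a copy: s1 and s2 refer to the same object
s2$birthday()
s1$age            # 26: the change made through s2 is visible through s1

s3 <- s1$copy()   # copy() creates an independent object
s3$birthday()
s1$age            # still 26; s3$age is 27
```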
ARRAYS
• In R, arrays are data objects which allow us to store data in more than two dimensions. An
array is created with the array() function, which takes a vector as input and uses the values
of the dim parameter to set the dimensions.
• For example, if we create an array of dimension (2, 3, 4), it will contain 4 rectangular
matrices, each with 2 rows and 3 columns.
• R Array Syntax: The syntax of R arrays is:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
✓ data: The data is the first argument of the array() function. It is the input vector
given to the array.
✓ row_size: This parameter defines the number of rows in each matrix.
✓ column_size: This parameter defines the number of columns in each matrix.
✓ matrices: The number of matrices (the size of the third dimension).
✓ dim_names: This parameter is used to change the default names of the rows,
columns, and matrices.
TYPES of Arrays:
Uni-Dimensional Array: A vector is a uni-dimensional array, specified by a single
dimension: its length. A vector can be created using the c() function, to which a list of values
is passed.
Ex: > # create a vector
> vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> print (vec1)
[1] 1 2 3 4 5 6 7 8 9
> # cat is used to concatenate strings and print it.
> cat ("Length of vector : ", length(vec1))
Length of vector : 9
Multi-Dimensional Array: A two-dimensional matrix is an array specified by a fixed number of
rows and columns, each containing the same data type. Arrays with two or more dimensions can be
created with the array() function, to which the values and the dimensions are passed.
Ex: > # arranges data from 2 to 13 in two matrices of dimensions 2x3
> arr = array(2:13, dim = c(2, 3, 2))
> print(arr)
, , 1
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 3 5 7
, , 2
[,1] [,2] [,3]
[1,] 8 10 12
[2,] 9 11 13
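Elements of a multi-dimensional array are addressed with one index per dimension, separated by commas. A short sketch with the array above; leaving an index empty selects everything along that dimension.

```r
arr <- array(2:13, dim = c(2, 3, 2))

dim(arr)        # 2 3 2: rows, columns, matrices
arr[2, 3, 1]    # 7: row 2, column 3 of the first matrix
arr[, , 2]      # the whole second 2 x 3 matrix
arr[1, , ]      # row 1 of every matrix, returned as a 3 x 2 matrix
```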
Creation of arrays: In R, array creation is quite simple. We can create an array using a vector
and the array() function; the data is stored in the form of matrices. Creating an array takes only
two steps:
1. In the first step, we create the input vectors (here, two vectors of different lengths).
2. Once the vectors are created, we pass them as input to the array() function.
Ex: > # Creating two vectors of different lengths
> vec1 <-c(1,3,5)
> vec2 <-c(10,11,12,13,14,15)
> # Taking these vectors as input to the array
> res <- array(c(vec1,vec2))
> print(res)
[1] 1 3 5 10 11 12 13 14 15
Naming rows and columns: In R, we can give names to the rows, columns, and matrices of an array. This is done with the dimnames parameter of the array() function, which takes a list of name vectors. Naming rows and columns is optional; it only helps to tell them apart for better understanding.
Ex: > # Creating two vectors of different lengths
> vec1 <-c(1,3,5)
> vec2 <-c(10,11,12,13,14,15)
> #Initializing names for rows, columns and matrices
> col_names <- c("Col1","Col2","Col3")
> row_names <- c("Row1","Row2","Row3")
> matrix_names <- c("Matrix1","Matrix2")
> #Taking the vectors as input to the array
> Res<-array(c(vec1,vec2), dim=c(3,3,2), dimnames=list(row_names,col_names, matrix_names))
> print(Res)
, , Matrix1
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15
, , Matrix2
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15
Accessing arrays: The arrays can be accessed by using indices for different dimensions separated
by commas. Different components can be specified by any combination of elements’ names or positions.
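Since names and positions can be combined, here is a small sketch that rebuilds the named array from the previous example and reads the same element both ways. The variable names follow that example; everything else is illustrative.

```r
# Rebuild the named 3 x 3 x 2 array from the naming example
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
Res <- array(c(vec1, vec2), dim = c(3, 3, 2),
             dimnames = list(c("Row1", "Row2", "Row3"),
                             c("Col1", "Col2", "Col3"),
                             c("Matrix1", "Matrix2")))

# The same element, selected by position and by name
print(Res[2, 3, 1])                     # row 2, column 3, matrix 1
print(Res["Row2", "Col3", "Matrix1"])   # same element via names

# Names and positions can be mixed in one call
print(Res["Row3", 2, "Matrix2"])
```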
Accessing Uni-Dimensional Array: The elements can be accessed by using indexes of the
corresponding elements.
Ex: > # creating a vector
> vec <- c(1:10)
> # accessing entire vector
> cat ("Vector is : ", vec)
Vector is : 1 2 3 4 5 6 7 8 9 10
> # accessing elements
> cat ("Third element of vector is : ", vec[3])
Third element of vector is : 3
Access Entire Row or Column:
Ex: > # create an array of two 2 x 3 matrices
> array1 <- array(c(1:12), dim = c(2,3,2))
> print(array1)
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
> # access entire elements at 2nd column of 1st matrix
> cat("\n2nd Column Elements of 1st matrix:",array1[,c(2),1])
2nd Column Elements of 1st matrix: 3 4
> # access entire elements at 1st row of 2nd matrix
> cat("\n1st Row Elements of 2nd Matrix:", array1[c(1), ,2])
1st Row Elements of 2nd Matrix: 7 9 11

Manipulating Array Elements: As an array is made up of matrices in multiple dimensions, operations on its elements are carried out by accessing the elements of those matrices.
Ex: > # Create two vectors of different lengths.
> vector1 <- c(5,9,3)
> vector2 <- c(10,11,12,13,14,15)
> # Take these vectors as input to the array.
> array1 <- array(c(vector1,vector2),dim = c(3,3,2))
> array1
, , 1
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
, , 2
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
> # Create two vectors of different lengths.
> vector3 <- c(9,1,0)
> vector4 <- c(6,0,11,3,14,1,2,6,9)
> array2 <- array(c(vector3,vector4),dim = c(3,3,2))
> array2
, , 1
[,1] [,2] [,3]
[1,] 9 6 3
[2,] 1 0 14
[3,] 0 11 1
, , 2
[,1] [,2] [,3]
[1,] 2 9 6
[2,] 6 1 0
[3,] 9 0 11
> # create matrices from these arrays.
> matrix1 <- array1[,,2]
> matrix1
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
> matrix2 <- array2[,,2]
> matrix2
[,1] [,2] [,3]
[1,] 2 9 6
[2,] 6 1 0
[3,] 9 0 11
> # Add the matrices.
> result <- matrix1+matrix2
> print(result)
[,1] [,2] [,3]
[1,] 7 19 19
[2,] 15 12 14
[3,] 12 12 26
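Arithmetic operators in R work element by element on arrays of the same dimensions, so the two arrays above can also be added in one step before extracting a matrix. A sketch reusing the vectors from the example; array_sum is an illustrative name.

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
vector3 <- c(9, 1, 0)
vector4 <- c(6, 0, 11, 3, 14, 1, 2, 6, 9)
array1 <- array(c(vector1, vector2), dim = c(3, 3, 2))
array2 <- array(c(vector3, vector4), dim = c(3, 3, 2))

# Element-wise addition of the two 3 x 3 x 2 arrays
array_sum <- array1 + array2

# Its second matrix equals matrix1 + matrix2 from the example above
print(array_sum[, , 2])
print(identical(array_sum[, , 2], array1[, , 2] + array2[, , 2]))
```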

Calculations across array elements:
• For calculations across array elements, R provides the apply() function. It takes three arguments: the array (X), the margin (MARGIN), and the function to apply (FUN).
• The first argument is the array on which the calculations are performed.
Using apply() function: apply() is a base R function, one of the "apply family" of functions, that helps to write code in an easier and more efficient way without explicit loops. The examples below use it to calculate sums across the rows, the columns, and the matching positions of the matrices in an array.
The syntax for apply() is apply(X, MARGIN, FUN)
The arguments above indicate that:
✓ X: An array or two-dimensional data such as a matrix.
✓ MARGIN: The dimension(s) over which the function is applied: 1 for rows, 2 for columns, and c(1,2) for both rows and columns.
✓ FUN: The R built-in or user-defined function to be applied to the given data.
Ex1: > #Creating two vectors of different lengths
> vec1 <-c(1,3,5)
> vec2 <-c(10,11,12,13,14,15)
> #Taking the vectors as input to the array1
> res1 <- array(c(vec1,vec2),dim=c(3,3,2))
> print(res1)
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
> #using apply function
> result <- apply(res1,c(1),sum)
> print(result)
[1] 48 56 64
> result <- apply(res1,c(2),sum)
> print(result)
[1] 18 66 84

Ex2: > #Creating two vectors of different lengths
> vec1 <-c(1,3,5)
> vec2 <-c(10,11,12,13,14,15)
> #Taking the vectors as input to the array1
> res1 <- array(c(vec1,vec2),dim=c(3,3,2))
> print(res1)
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
> #using apply function
> result <- apply(res1,c(1,2),sum)
> print(result)
[,1] [,2] [,3]
[1,] 2 20 26
[2,] 6 22 28
[3,] 10 24 30
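apply() also accepts a user-defined function, and the margin can be any dimension of the array. The sketch below assumes the same res1 array as the examples; value_range is a hypothetical helper, and MARGIN = 3 (one result per matrix) goes slightly beyond the margins listed earlier.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))

# A user-defined function: the range (max - min) of the values it receives
value_range <- function(v) max(v) - min(v)

# MARGIN = 3: apply the function to each whole matrix
print(apply(res1, 3, value_range))

# An anonymous function: count values above 10 in each column, across both matrices
print(apply(res1, 2, function(v) sum(v > 10)))
```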
Adding elements to array:
Elements can be added at different positions in a one-dimensional array (vector). The elements keep the order in which they are added. The time complexity of adding new elements is O(n), where n is the length of the array. The length of the array increases by the number of elements added. R offers several built-in ways to add new values:
• c(vector, values): the c() function appends values to the end of the array. Multiple values can be added together.
• append(vector, values): this function can insert values at any position in the vector. By default, it adds them at the end.
• append(vector, values, after = n): inserts the new values after the n-th element, where n is given in the last argument. If n is greater than or equal to the length of the vector, the values are added at the end.
Using the length of the array: Elements can be assigned at index length + x, where x > 0. Any positions skipped in between are filled with NA.
Ex: > # creating a uni-dimensional array
> x <- c(1, 2, 3, 4, 5)
> # add element using c() function
> x <- c(x, 10)
> cat("Array after 1st modification: ", x)
Array after 1st modification: 1 2 3 4 5 10
> # addition of element using append function
> x <- append(x, 7)
> cat("Array after 2nd modification: ", x)
Array after 2nd modification: 1 2 3 4 5 10 7
> # adding elements after computing the length
> len <- length(x)
> cat("Length of the array is: ", len)
Length of the array is: 7
> x[len + 1] <- 8
> cat("Array after 3rd modification: ", x)
Array after 3rd modification: 1 2 3 4 5 10 7 8
> # adding on length + 3 index
> x[len + 3]<-9
> cat("Array after 4th modification: ", x)
Array after 4th modification: 1 2 3 4 5 10 7 8 NA 9
> # append a vector of values; since after = length(x)+3 exceeds the length, they are added at the end
> x <- append(x, c(10, 11, 12), after = length(x)+3)
> cat("Array after 5th modification: ", x)
Array after 5th modification: 1 2 3 4 5 10 7 8 NA 9 10 11 12
> # adds new elements after 3rd index
> x <- append(x, c(-1, -1), after = 3)
> cat("Array after 6th modification: ", x)
Array after 6th modification: 1 2 3 -1 -1 4 5 10 7 8 NA 9 10 11 12
Removing Elements from Array
Elements can be removed from arrays in R, either one at a time or several together. A condition on the array values is used as the index: the values that satisfy the condition are retained and the rest are removed. Multiple conditions can be combined with & (and) and | (or) to remove a range of elements. Another way to remove elements is the %in% operator, which returns TRUE for each element found in a given set of values; indexing with this result keeps those elements, and negating it with ! removes them.
Ex: > # creating an array of length 9
> m <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
> cat("Original array is: ", m)
Original array is: 10 20 30 40 50 60 70 80 90
> # remove the single element with value 30
> m <- m[m != 30]
> cat("After 1st modification: ", m)
After 1st modification: 10 20 40 50 60 70 80 90
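The paragraph above mentions combined conditions and the %in% operator, but the example only drops one value. A minimal sketch of both on the same vector m; the names kept_range, kept_range2, to_remove, and kept_set are illustrative.

```r
m <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)

# Combine conditions with | to keep only values outside the range 30 to 60
kept_range <- m[m < 30 | m > 60]
print(kept_range)

# The same result: remove everything between 30 and 60 (inclusive) with !( ... & ... )
kept_range2 <- m[!(m >= 30 & m <= 60)]
print(identical(kept_range, kept_range2))

# %in% marks the elements that belong to a set; negate it with ! to remove them
to_remove <- c(20, 50, 90)
kept_set <- m[!(m %in% to_remove)]
print(kept_set)
```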

FACTORS
INTRODUCTION TO FACTORS
• Factors in R are data structures used to categorize data, that is, to represent categorical data and store it as a set of levels.
• They are stored as integers, with a corresponding label for every unique integer. Though factors may look similar to character vectors, they are integers underneath, and care must be taken when using them as strings. A factor accepts only a restricted set of distinct values. For example, a data field such as gender may contain only the values female and male.
Attributes of a factor
The factor() function accepts the following arguments:
1. x: the input vector which is to be transformed into a factor.
2. levels: an optional vector of the unique values that x can take.
3. labels: an optional character vector of names for the levels, one per level.
4. exclude: the values to be excluded from the levels.
5. ordered: a logical argument which determines whether the levels are ordered.
6. nmax: an upper bound on the number of levels.
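A short sketch of the levels, labels, exclude, and ordered arguments listed above (in R code they are written in lowercase). The input vector sizes and the names f, g, and h are illustrative.

```r
sizes <- c("lo", "hi", "lo", NA, "mid", "hi")

# levels: the allowed values and their order; labels: display names for them
f <- factor(sizes, levels = c("lo", "mid", "hi"), labels = c("Low", "Medium", "High"))
print(f)
print(levels(f))

# exclude: values that should not become levels (here "mid" becomes NA)
g <- factor(sizes, exclude = c(NA, "mid"))
print(levels(g))

# ordered = TRUE records that the levels have a meaningful order
h <- factor(sizes, levels = c("lo", "mid", "hi"), ordered = TRUE)
print(is.ordered(h))
```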
Creating a Factor in R Programming Language: A factor is created or modified with the factor() function, which takes a vector as input. Creating a factor takes two steps:
✓ Creating a vector
✓ Converting the created vector into a factor using the factor() function.
The syntax of the factor() function is: factor_name <- factor(vector)
The function is.factor() checks whether a variable is a factor and returns TRUE if it is.
The function class() can also be used: it returns "factor" when the variable is a factor.
Ex: > # Create a vector as input.
> data <- c("East","West","East","North","North","East","West","West","West","East","North")
> print(data)
[1] "East" "West" "East" "North" "North" "East" "West" "West" "West" "East" "North"
> # Checking variable is factor or not
> is.factor(data)
[1] FALSE
> # Apply the factor function.
> factor_data <- factor(data)
> print(factor_data)
[1] East West East North North East West West West East North
Levels: East North West
> # Checking variable is factor or not
> is.factor(factor_data)
[1] TRUE
> # Checking type of a variable
> class(factor_data)
[1] "factor"
Accessing elements of a Factor in R
Elements of a factor are accessed the same way as elements of a vector: by index or with a logical vector. If gender is a factor, then gender[i] accesses the ith element of the factor. The example below shows the different ways of accessing its components.
Ex: > # Creating a vector as input.
> data <-c("female", "male", "male", "female")
> # Applying the factor function.
> gender <- factor(data)
> #Printing all elements of factor
> print(gender)
[1] female male male female
Levels: female male
> #Accessing 4th element of factor
> print(gender[4])
[1] female
Levels: female male
> #Accessing 2nd and 4th element
> print(gender[c(2,4)])
[1] male female
Levels: female male
> #Accessing all element except 3rd one
> print(gender[-3])
[1] female male female
Levels: female male
> #Accessing elements using logical vector
> print(gender[c(FALSE,TRUE,FALSE,TRUE)])
[1] male female
Levels: female male
FACTOR LEVELS
By default, factor() creates a factor whose levels are the sorted unique values of the input. However, sometimes you will want to change the names of these levels for clarity or other reasons. R allows you to do this with the function levels(). The new names replace the existing levels by position, so they must be given in the same order as levels(factor_name).
Syntax: levels(factor_name) <- c("name1", "name2", ...)
A new level can be added to an existing factor with the following syntax:
levels(factor_name) <- c(levels(factor_name), "levelname1", "levelname2", ...)
Ex: > # Code to build factor_compass_vector
> compass_vector<-c("E", "W", "N")
> factor_compass_vector <- factor(compass_vector)
> factor_compass_vector
[1] E W N
Levels: E N W
> # Specify the levels of factor_compass_vector
> levels(factor_compass_vector)<-c("East", "North", "West")
> # Print factor_compass_vector
> factor_compass_vector
[1] East West North
Levels: East North West
> # Add new value to factor_compass_vector level
> levels(factor_compass_vector) <- c(levels(factor_compass_vector),"South")
> # Print factor_compass_vector
> factor_compass_vector
[1] East West North
Levels: East North West South
Generating Factor Levels: We can generate factor levels with the gl() function. It takes two integers as input, which indicate the number of levels and the number of times each level is repeated.
Syntax: gl(n, k, labels)
The parameters are described below:
• n is an integer, it indicates the number of levels.
• k is an integer, it indicates the number of replications.
• labels is a vector of labels for the resulting factor levels.
Ex: > factor_data <- gl(3, 4, labels = c("A", "B","C"))
> print(factor_data)
[1] A A A A B B B B C C C C
Levels: A B C
Changing the Order of Levels: The order of the levels in a factor can be changed by applying the factor() function again with the new order of the levels.
Syntax: new_factor_name <- factor(old_factor_name, levels = c("name1", "name2", ...))
Ex: > # Create a vector as input.
> data <- c("East","West","East","North","North","East","West","West","West","East","North")
> # Create the factors
> factor_data <- factor(data)
> print(factor_data)
[1] East West East North North East West West West East North
Levels: East North West
> # Apply the factor function with required order of the level.
> new_order_data <- factor(factor_data, levels = c("East","West","North"))
> print(new_order_data)
[1] East West East North North East West West West East North
Levels: East West North
Modification of factor: Like data frames, factors can be modified in R. We can change a value of a factor by simply re-assigning it. However, a factor cannot take a value outside its predefined levels: assigning a value whose level does not exist produces NA with a warning. To use such a value, we first have to add it as a new level, and then we can assign it.
Let's see an example to understand how the modification is done in factors.
Ex: > # Creating a vector as input.
> data <- c("Shubham","Nishka","Arpita","Nishka","Shubham")
> # Applying the factor function.
> factor_data<- factor(data)
> #Printing all elements of factor
> print(factor_data)
[1] Shubham Nishka Arpita Nishka Shubham
Levels: Arpita Nishka Shubham
> #change 4th element of factor with "Gunjan"
> factor_data[4] <- "Gunjan" # cannot assign values outside levels
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
invalid factor level, NA generated
> #Change 4th element of factor with Arpita
> factor_data[4] <-"Arpita"
> print(factor_data)
[1] Shubham Nishka Arpita Arpita Shubham
Levels: Arpita Nishka Shubham
> #Adding the new value to the level
> levels(factor_data) <- c(levels(factor_data),"Gunjan") #Adding new level
> print(factor_data)
[1] Shubham Nishka Arpita Arpita Shubham
Levels: Arpita Nishka Shubham Gunjan
> #Change 4th element of factor with Gunjan
> factor_data[4] <- "Gunjan"
> print(factor_data)
[1] Shubham Nishka Arpita Gunjan Shubham
Levels: Arpita Nishka Shubham Gunjan

SUMMARIZING A FACTOR
The summary() function in R returns basic statistics for its argument. For a numeric vector, these are the minimum, 1st quartile, median, mean, 3rd quartile, and maximum; missing values are left out of these calculations and reported separately as a count of NA's. The general form is summary(x), where x is the object to summarize. For a factor, summary() instead returns the number of elements at each level.
Ex: > # Creating a factor
> v <- gl(3, 4, labels = c("A", "B","C"))
> print(v)
[1] A A A A B B B B C C C C
Levels: A B C
> # summary of a factor is
> summary(v)
A B C
4 4 4
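To see how summary() treats the two kinds of input, the sketch below calls it on a numeric vector containing a missing value and on a small factor. The data are made up for illustration.

```r
# A numeric vector containing one missing value
nums <- c(4, 8, NA, 15, 16, 23, 42)
s <- summary(nums)   # six statistics plus an NA's count
print(s)

# A factor: summary() returns the count of each level
fac <- factor(c("A", "B", "A", "C", "A"))
print(summary(fac))
```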
ORDERED FACTORS
Level Ordering of Factors: Factors are data objects used to categorize data and store it as levels. They can store strings as well as integers. They are useful for columns that have a limited number of unique values. Factors in R are created using the factor() function, which takes a vector as input. The c() function is used to create a vector with explicitly provided values.
Ex: > # Creating a vector as input.
> x <- c("Pen", "Pencil", "Brush", "Pen", "Brush", "Brush", "Pencil", "Pencil")
> print(x)
[1] "Pen" "Pencil" "Brush" "Pen" "Brush" "Brush" "Pencil" "Pencil"
> # Checking whether x is a factor or not
> print(is.factor(x))
[1] FALSE
> # Convert vector to factor
> factor_x = factor(x)
> # print levels of factor
> levels(factor_x)
[1] "Brush" "Pen" "Pencil"
In the above code, x is a vector with 8 elements. It is converted to a factor with the factor() function. The factor has 8 elements and 3 levels. Levels are the unique values in the data and can be listed with the levels() function.
Ordering Factor Levels: Ordered factors are an extension of factors in which the levels have a defined order, from lowest to highest. To create one, we call factor() with the levels argument listing the levels in increasing order and the argument ordered = TRUE.
Syntax: factor(data, levels = c("name1", "name2", ...), ordered = TRUE)
The parameters are:
• data: input vector with explicitly defined values.
• levels: the list of levels, given in increasing order with the c() function.
• ordered: set to TRUE to enable ordering.
Ex: > # creating a vector
> size = c("small", "large", "large", "small", "medium", "large", "medium", "medium", "large")
> # converting to factor
> size_factor <- factor(size)
> print(size_factor)
[1] small large large small medium large medium medium large
Levels: large medium small
> # ordering the levels
> ordered_size <- factor(size, levels = c("small", "medium", "large"), ordered = TRUE)
> print(ordered_size)
[1] small large large small medium large medium medium large
Levels: small < medium < large
> summary(ordered_size)
small medium large
2 3 4
In the above code, the size vector is created using the c() function and then converted to a factor, whose levels are sorted alphabetically. To order the levels, factor() is called again with the levels and ordered arguments described above. Thus, the levels are arranged in the order small < medium < large.
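Base R also provides ordered(), a shorthand for factor(..., ordered = TRUE). A sketch using the size vector from the example; ordered_size2 is an illustrative name. Because the levels have an order, functions such as max() and min() become meaningful.

```r
size <- c("small", "large", "large", "small", "medium",
          "large", "medium", "medium", "large")

# ordered(x, levels) is shorthand for factor(x, levels, ordered = TRUE)
ordered_size2 <- ordered(size, levels = c("small", "medium", "large"))
print(ordered_size2)
print(identical(ordered_size2,
                factor(size, levels = c("small", "medium", "large"), ordered = TRUE)))

# Because the levels are ordered, min() and max() are meaningful
print(min(ordered_size2))
print(max(ordered_size2))
```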

COMPARING ORDERED FACTORS


During a day at work, ‘data analyst number two’ enters your office and starts complaining that ‘data analyst number five’ is slowing down the entire project. Since you know that ‘data analyst number two’ has a reputation for being a smarty-pants, you first decide to check whether his statement is true.
The fact that factor_speed_vector is ordered enables us to compare its elements (the data analysts in this case) simply by using the familiar comparison operators.
Instructions
• Use [2] to select from factor_speed_vector the factor value for the second data analyst. Store
it as da2.
• Use [5] to select the factor_speed_vector factor value for the fifth data analyst. Store it as da5.
• Check if da2 is greater than da5; simply print out the result. Remember that you can use
the > operator to check whether one element is larger than the other.
Ex: > # Create factor_speed_vector
> speed_vector <- c("medium", "slow", "slow", "medium", "fast")
> factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
> factor_speed_vector
[1] medium slow slow medium fast
Levels: slow < medium < fast
> # Factor value for second data analyst
> da2 <- factor_speed_vector[2]
> da2
[1] slow
Levels: slow < medium < fast
> # Factor value for fifth data analyst
> da5 <- factor_speed_vector[5]
> da5
[1] fast
Levels: slow < medium < fast
> # Is data analyst 2 faster than data analyst 5?
> da2>da5
[1] FALSE
> # Is data analyst 5 faster than data analyst 2?
> da2<da5
[1] TRUE
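Ordered factors can also be compared against a level written as a string, which checks every element at once. For contrast, the sketch also shows that comparing elements of an unordered factor with > gives NA (R issues a warning, suppressed here). faster_than_slow, unordered_speed, and unordered_result are illustrative names.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Compare every analyst against a level given as a string
faster_than_slow <- factor_speed_vector > "slow"
print(faster_than_slow)

# For an unordered factor, > is not meaningful: R warns and returns NA
unordered_speed <- factor(speed_vector)
unordered_result <- suppressWarnings(unordered_speed[2] > unordered_speed[5])
print(unordered_result)
```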

DATA FRAMES
INTRODUCTION TO DATA FRAME
A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column. A data frame is a special case of a list in which every component has the same length. Data frames can also be thought of as matrices in which each column can have a different data type.
A data frame is used to store a data table. In simple terms, it is a list of vectors of equal length. A matrix can contain only one type of data, but a data frame can contain different data types such as numeric, character, factor, etc.
Following are the characteristics of a data frame.
• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or character type.
• Each column should contain same number of data items.
Creating Data Frame: In R, data frames are created with the data.frame() function. This function takes vectors of any type, such as numeric, character, or integer. In the example below, we create a data frame that contains student id (integer vector), student name (character vector), age (numeric vector), and date of birth (Date vector).
Ex: > # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
Getting the structure of R Data Frame: In R, we can inspect the structure of a data frame with the built-in function str(), which displays the number of observations and variables and, for each column, its type and first few values. In the example below, we create a data frame from vectors of different data types and display its structure.
Ex: > # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # Printing the structure of data frame.
> str(student.data)
'data.frame' : 5 obs. of 4 variables:
$ student_id : int 1 2 3 4 5
$ student_name : chr "Rohan" "Rohith" "David" "Mary" ...
$ age : num 18 20 19 21 22
$ date_of_birth : Date, format: "2003-01-01" "2002-12-23" "2003-09-25" "2001-10-05"...
Extract Data from Data Frame: To manipulate the data in a data frame, we often need to extract parts of it. We can extract the data in three ways:
1. We can extract specific columns from a data frame using the column names.
2. We can extract specific rows from a data frame.
3. We can extract specific rows corresponding to specific columns.
Let's see an example of each one to understand how data is extracted from a data frame in these ways.
• Extract specific column from a data frame using column name.
> # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
> # Extract Specific columns.
> result <-data.frame(student.data$student_id, student.data$student_name)
> print(result)
student.data.student_id student.data.student_name
1 1 Rohan
2 2 Rohith
3 3 David
4 4 Mary
5 5 James
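The example above builds a new data frame from $ columns, which changes the column names. Selecting columns by name with [ , ] keeps the original names. A sketch using a trimmed version of student.data (the date_of_birth column is left out for brevity).

```r
student.data <- data.frame(
  student_id   = 1:5,
  student_name = c("Rohan", "Rohith", "David", "Mary", "James"),
  age          = c(18, 20, 19, 21, 22)
)

# Selecting columns by name keeps the original column names
result <- student.data[, c("student_id", "student_name")]
print(result)

# [[ ]] or $ return a single column as a plain vector
print(student.data[["age"]])
print(student.data$student_name)
```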
• Extracting the specific rows from a data frame
> # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
> # Extracting first row from a data frame
> result<-student.data[1,]
> result
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
> result<-student.data[3:5, ]
> result
student_id student_name age date_of_birth
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
• Extracting specific rows corresponding to specific columns
> # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
> #Extract 3rd and 5th row with 2nd and 4th column
> result<-student.data[c(3,5),c(2,4)]
> result
student_name date_of_birth
3 David 2003-09-25
5 James 2003-09-07
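Rows can also be extracted by a condition rather than by position: a logical vector in the row position keeps the rows where it is TRUE. A minimal sketch on a trimmed student.data (date_of_birth omitted); older is an illustrative name.

```r
student.data <- data.frame(
  student_id   = 1:5,
  student_name = c("Rohan", "Rohith", "David", "Mary", "James"),
  age          = c(18, 20, 19, 21, 22)
)

# A logical vector in the row position keeps only rows where it is TRUE
older <- student.data[student.data$age > 19, ]
print(older)

# Combine a row condition with columns chosen by name
print(student.data[student.data$age > 19, c("student_name", "age")])
```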
Summary of Data in Data Frame: R provides the summary() function to obtain a statistical summary of the data. It takes the data frame as a parameter and returns statistics for each column: the minimum, quartiles, median, mean, and maximum for numeric and date columns, and the length, class, and mode for character columns.
Ex: > # Create the data frame.
> emp.data<-data.frame (
emp_id=c(1:15),
emp_name=c("Rick", "Dan", "Michelle", "Ryan", "Gary", "Chetan", "Nikhil", "Chithra",
"Shan", "Vidya", "Deepak", "Eshan", "Visha", "Benny", "Adi"),
salary=c(623.3, 515.2, 611.0, 729.0, 843.25, 623.3, 515.2, 611.0, 729.0, 843.25, 623.3, 515.2,
611.0, 729.0, 843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27",
"2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27", "2012-01-01",
"2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
stringsAsFactors = FALSE
)
> # Print the emp data.
> print(emp.data)
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
6 6 Chetan 623.30 2012-01-01
7 7 Nikhil 515.20 2013-09-23
8 8 Chithra 611.00 2014-11-15
9 9 Shan 729.00 2014-05-11
10 10 Vidya 843.25 2015-03-27
11 11 Deepak 623.30 2012-01-01
12 12 Eshan 515.20 2013-09-23
13 13 Visha 611.00 2014-11-15
14 14 Benny 729.00 2014-05-11
15 15 Adi 843.25 2015-03-27
> # Print the summary.
> print(summary(emp.data))
emp_id emp_name salary start_date
Min. :1.0 Length:15 Min. :515.2 Min. :2012-01-01
1st Qu. :4.5 Class :character 1st Qu. :611.0 1st Qu. :2013-09-23
Median :8.0 Mode :character Median :623.3 Median :2014-05-11
Mean :8.0 Mean :664.4 Mean :2014-01-14
3rd Qu. :11.5 3rd Qu. :729.0 3rd Qu. :2014-11-15
Max. :15.0 Max. :843.2 Max. :2015-03-27
> # display number of rows
> nrow(emp.data)
[1] 15
> # display number of columns
> ncol(emp.data)
[1] 4
> # display both no. of rows and col
> dim(emp.data)
[1] 15 4
> # print the first 6 rows in the data frame
> head(emp.data)
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
6 6 Chetan 623.30 2012-01-01
> # print the last 6 rows in the data frame
> tail(emp.data)
emp_id emp_name salary start_date
10 10 Vidya 843.25 2015-03-27
11 11 Deepak 623.30 2012-01-01
12 12 Eshan 515.20 2013-09-23
13 13 Visha 611.00 2014-11-15
14 14 Benny 729.00 2014-05-11
15 15 Adi 843.25 2015-03-27
SUBSETTING OF A DATA FRAME
The subset() function in R is used to create subsets of a data frame by selecting the rows that satisfy a condition. With its select argument, it can also be used to keep or drop columns.
Ex: > # Create the data frame.
> emp.data<-data.frame (
emp_id=c(1:5),
emp_name=c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
stringsAsFactors = FALSE
)
> # print the data frame
> emp.data
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
> subset(emp.data, emp_id == 3)
emp_id emp_name salary start_date
3 3 Michelle 611 2014-11-15
> subset(emp.data, emp_id %in% 1:3)   # %in% tests membership; == would recycle c(1:3)
emp_id emp_name salary start_date
1 1 Rick 623.3 2012-01-01
2 2 Dan 515.2 2013-09-23
3 3 Michelle 611.0 2014-11-15
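subset() can also drop columns through its select argument. A sketch on a trimmed emp.data (start_date omitted); high_paid and no_id are illustrative names.

```r
emp.data <- data.frame(
  emp_id   = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Row condition plus select: keep only two columns
high_paid <- subset(emp.data, salary > 600, select = c(emp_name, salary))
print(high_paid)

# A negative select drops columns instead of keeping them
no_id <- subset(emp.data, select = -emp_id)
print(names(no_id))
```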

EXTENDING DATA FRAME / EXPAND DATA FRAME
R allows us to extend (modify) a data frame. As with matrices, we can modify a data frame through re-assignment. Not only can we add rows and columns, we can also delete them. The data frame is expanded by adding rows and columns. We can
1. Add a column by assigning a vector to a new column name, or by using the cbind() function.
2. Add rows that have the same structure as the existing data frame by using the rbind() function.
3. Delete columns by assigning NULL to them.
4. Delete rows by re-assigning the data frame without them (using negative indices).
Add Column: To add a column permanently to an existing data frame, assign a vector to a new column name with $, or combine the data frame and the new column vector with cbind() and assign the result back to the data frame. The new vector must have one value per row.
> # Create the data frame.
> emp.data<-data.frame (
emp_id=c(1:5),
emp_name=c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
stringsAsFactors = FALSE
)
> # print the data frame
> print(emp.data)
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
> # Add the "dept" column.
> emp.data$dept <- c("IT","Operations","IT","HR","Finance")
> v <- emp.data
> print(v)
emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
> # Add the "Location" column using cbind().
> L <- c("Hyderabad", "Bengaluru", "Chennai", "Mumbai", "Delhi")
> emp.data <- cbind(emp.data, location = L)
> print(emp.data)
emp_id emp_name salary start_date dept location
1 1 Rick 623.30 2012-01-01 IT Hyderabad
2 2 Dan 515.20 2013-09-23 Operations Bengaluru
3 3 Michelle 611.00 2014-11-15 IT Chennai
4 4 Ryan 729.00 2014-05-11 HR Mumbai
5 5 Gary 843.25 2015-03-27 Finance Delhi
Add Row: To add more rows permanently to an existing data frame, we need to bring in the new rows
in the same structure as the existing data frame and use the rbind() function.
In the example below we create a data frame with new rows and merge it with the existing data frame to
create the final data frame.
> # Create the first data frame.
> emp.data<-data.frame (
emp_id=c(1:5),
emp_name=c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
dept = c("IT", "Operations", "IT", "HR", "Finance"),
stringsAsFactors = FALSE
)
> # print the data frame
> emp.data
emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
> # Create the second data frame
> emp.newdata <- data.frame(
emp_id = c(6:8),
emp_name = c("Rasmi", "Pranab", "Tusar"),
salary = c(578.0, 722.5, 632.8),
start_date = as.Date(c("2013-05-21", "2013-07-30", "2014-06-17")),
dept = c("IT", "Operations", "Finance"),
stringsAsFactors = FALSE
)
> # print the data frame
> emp.newdata
emp_id emp_name salary start_date dept
1 6 Rasmi 578.0 2013-05-21 IT
2 7 Pranab 722.5 2013-07-30 Operations
3 8 Tusar 632.8 2014-06-17 Finance
> # Bind the two data frames.
> emp.finaldata <- rbind(emp.data,emp.newdata)
> # print the data frame
> emp.finaldata
emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
6 6 Rasmi 578.00 2013-05-21 IT
7 7 Pranab 722.50 2013-07-30 Operations
8 8 Tusar 632.80 2014-06-17 Finance
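The sketch below shows how rbind() treats column names when binding data frames. The data is hypothetical. Columns are matched by name, so the new rows may list their columns in a different order. A data frame with a different set of column names is rejected with an error, which the sketch captures with tryCatch().

```r
# Minimal sketch: rbind() on data frames matches columns by name
old_rows <- data.frame(emp_id = 1:2, emp_name = c("Rick", "Dan"),
                       stringsAsFactors = FALSE)
# Same columns in a different order are still matched correctly
new_rows <- data.frame(emp_name = "Rasmi", emp_id = 6L, stringsAsFactors = FALSE)
combined <- rbind(old_rows, new_rows)
print(combined)
# Different column names cause an error; capture its message instead of stopping
bad_rows <- data.frame(emp_id = 7L, dept = "IT", stringsAsFactors = FALSE)
res <- tryCatch(rbind(old_rows, bad_rows), error = function(e) conditionMessage(e))
print(res)
```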
Remove Rows and Columns: Use negative indices to drop rows or columns from a data frame. Wrap several indices in c() to remove more than one at a time.
Syntax to remove rows and columns: data_frame <- data_frame[c(-rowno1, -rowno2), c(-colno1, -colno2)]
Syntax to remove a row: data_frame <- data_frame[-rowno, ]
Syntax to remove a column: data_frame <- data_frame[, -colno] or data_frame$columnname <- NULL
> # Create the first data frame.
> emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
dept = c("IT", "Operations", "IT", "HR", "Finance"),
location = c("Hyderabad", "Bengaluru", "Chennai", "Mumbai", "Delhi"),
stringsAsFactors = FALSE
)
> # print the data frame
> emp.data
emp_id emp_name salary start_date dept location
1 1 Rick 623.30 2012-01-01 IT Hyderabad
2 2 Dan 515.20 2013-09-23 Operations Bengaluru
3 3 Michelle 611.00 2014-11-15 IT Chennai
4 4 Ryan 729.00 2014-05-11 HR Mumbai
5 5 Gary 843.25 2015-03-27 Finance Delhi
> # Remove the first row and column
> emp.data<-emp.data[-1, -1]
> # Print the new data frame
> emp.data
emp_name salary start_date dept location
2 Dan 515.20 2013-09-23 Operations Bengaluru
3 Michelle 611.00 2014-11-15 IT Chennai
4 Ryan 729.00 2014-05-11 HR Mumbai
5 Gary 843.25 2015-03-27 Finance Delhi
> # Remove the 2nd and 4th rows and the 3rd and 5th columns (by position, not by row name)
> emp.data<-emp.data[c(-2, -4), c(-3, -5)]
> # Print the new data frame
> emp.data
emp_name salary dept
2 Dan 515.20 Operations
4 Ryan 729.00 HR
> # Remove 2nd row from data frame
> emp.data<-emp.data[-2, ]
> emp.data
emp_name salary dept
2 Dan 515.20 Operations
> # Remove 3rd column from data frame
> emp.data<-emp.data[, -3]
> emp.data
emp_name salary
2 Dan 515.20
> # Remove the salary column (now the 2nd column) by assigning NULL
> emp.data$salary<-NULL
> emp.data
emp_name
2 Dan
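Positional indices shift after every removal, as the steps above show. It is often safer to drop columns by name and rows by a condition. The sketch below uses a hypothetical three-row data frame. drop = FALSE keeps the result a data frame even when only one column remains.

```r
# Minimal sketch: removing columns by name and rows by condition
df <- data.frame(emp_name = c("Rick", "Dan", "Ryan"),
                 salary   = c(623.3, 515.2, 729.0),
                 dept     = c("IT", "Operations", "HR"),
                 stringsAsFactors = FALSE)
# Drop columns by name: keep every column whose name is not in the drop list
df2 <- df[, !(names(df) %in% c("dept")), drop = FALSE]
# Drop rows by a logical condition: keep only rows with salary above 600
df3 <- df2[df2$salary > 600, , drop = FALSE]
print(df3)
```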
SORTING DATA
To sort a data frame in R, use the order() function as the row index. By default, sorting is in ascending order. For a numeric column, prefixing the sorting variable with a minus sign gives descending order. For a column of any type, set decreasing = TRUE instead.
Syntax: data_frame[order(data_frame$columnname, decreasing = TRUE), ]
The decreasing parameter is optional. Omit it (or set it to FALSE) for ascending order, and set it to TRUE for descending order.
> # Create the first data frame.
> data = data.frame(rollno = c(1, 5, 4, 2, 3), subjects = c("java", "python", "php", "sql", "c"))
> print(data)
rollno subjects
1 1 java
2 5 python
3 4 php
4 2 sql
5 3 c
> # sort the data in increasing order based on rollno
> print(data[order(data$rollno), ] )
rollno subjects
1 1 java
4 2 sql
5 3 c
3 4 php
2 5 python
> # sort the data in decreasing order based on rollno
> print(data[order(data$rollno, decreasing = TRUE), ] )
rollno subjects
2 5 python
3 4 php
5 3 c
4 2 sql
1 1 java
> # sort the data in decreasing order based on subjects
> print(data[order(data$subjects, decreasing = TRUE), ] )
rollno subjects
4 2 sql
2 5 python
3 4 php
1 1 java
5 3 c
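The sketch below illustrates the minus-sign form and sorting on more than one column. It extends the data frame above with a hypothetical marks column. order() accepts several keys: rows are sorted by the first key, and ties are broken by the next one.

```r
# Minimal sketch: minus sign for numeric descending order, and two sort keys
data <- data.frame(rollno   = c(1, 5, 4, 2, 3),
                   subjects = c("java", "python", "php", "sql", "c"),
                   marks    = c(80, 80, 95, 70, 95),  # hypothetical column
                   stringsAsFactors = FALSE)
# The minus sign works only for numeric columns
by_roll_desc <- data[order(-data$rollno), ]
# Sort by marks (highest first), then by rollno (lowest first) to break ties
by_marks <- data[order(-data$marks, data$rollno), ]
print(by_marks)
```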
LISTS
INTRODUCTION TO LIST
• Lists are one-dimensional, heterogeneous data structures. Atomic vectors hold elements of a single type. A list, by contrast, is a generic (recursive) vector: its elements can be of different types, such as numbers, vectors, strings, and even other lists.
• A list can also contain functions or matrices as its elements. A list of vectors, a list of matrices, a list of characters, and a list of functions are all possible.
• A list in R is created with the list() function, and its elements are accessed by index. As with vectors, list indexing in R starts at 1, not at 0 as in many other programming languages.
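The sketch below checks the properties described above. A single list can hold a number, a string, a vector, a logical, a function, a matrix and another list. R still reports the list itself as a vector. The element values are arbitrary examples.

```r
# Minimal sketch: one list holding elements of many different types
x <- list(42, "text", c(1, 2, 3), TRUE, mean, matrix(1:4, nrow = 2), list("nested"))
print(typeof(x))        # the container type is "list"
print(is.vector(x))     # a list is a (generic) vector
print(length(x))        # number of top-level components
print(sapply(x, typeof))  # storage type of each component
```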
CREATING A LIST
To create a list in R, use the list() function. In other words, a list is a generic vector containing other objects. Creating a list works the same way as creating a vector: just as c() combines values into a vector, list() combines objects into a list. Unlike c(), which coerces all elements to a single data type, list() keeps each element's own type. This means one list can store elements of different data types.
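The sketch below compares the two functions on the same three mixed values, taken from the examples in this section. c() coerces all the values to character. list() keeps a character, a numeric and a logical element.

```r
# Minimal sketch: c() coerces mixed values to one type, list() does not
v <- c("Shubham", 22.5, TRUE)     # all elements become character strings
l <- list("Shubham", 22.5, TRUE)  # each element keeps its own type
print(v)
print(class(v))
print(sapply(l, class))
```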
Ex1: > # Creating list with same data type
> list(1,2,3)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> list("Shubham","Arpita","Vaishali")
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"
> list(c(1,2,3))
[[1]]
[1] 1 2 3
> list(TRUE,FALSE,TRUE)
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
Ex2: > # Creating the list with different data type
> list_data<-list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,22.5,12L)
> list_data
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12
Ex3: Create a list of employee details with three components: the employee IDs, the employee names, and the number of employees.
> empId = c(1, 2, 3, 4)
> empName = c("Debi", "Sandeep", "Subham", "Shiba")
> numberOfEmp = 4
> empList = list(empId, empName, numberOfEmp)
> empList
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
Ex4: # Creating a list containing a vector, a matrix and a list.
> list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
> print(list_data)
[[1]]
[1] "Shubham" "Nishka" "Gunjan"
[[2]]
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
[[3]]
[[3]][[1]]
[1] "BCA"
[[3]][[2]]
[1] "MCA"
[[3]][[3]]
[1] "B.tech"

CREATING A NAMED LIST (GIVING A NAME TO LIST ELEMENTS)


List elements are easier to access when each element has a name, because we can then refer to an element by name instead of by position. Printing list data by name takes three steps:
1. Creating a list.
2. Assign a name to the list elements with the help of names() function.
3. Print the list data.
Ex: > # Creating a list containing a vector, a matrix and a list.
> list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
> # Giving names to the elements in the list.
> names(list_data) <- c("Students", "Marks", "Course")
> # Show the list.
> print(list_data)
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
Ex: > # Creating a named list
> my_named_list <- list(name = "Sudheer", age = 25, city = "Delhi")
> # Printing the named list
> print(my_named_list)
$name
[1] "Sudheer"
$age
[1] 25
$city
[1] "Delhi"
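The two ways of naming shown above give the same result. The sketch below names the elements afterwards with names() and also inside the list() call, then compares the two lists with identical(). The values are shortened from the earlier example.

```r
# Minimal sketch: naming list elements afterwards vs. inside list()
list_data <- list(c("Shubham", "Nishka", "Gunjan"), c(40, 80, 60))
names(list_data) <- c("Students", "Marks")   # assign names after creation
print(names(list_data))
# Equivalent one-step form: name the elements inside list()
same <- list(Students = c("Shubham", "Nishka", "Gunjan"), Marks = c(40, 80, 60))
print(identical(list_data, same))
```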
ACCESSING LIST ELEMENTS
R provides two ways to access the elements of a list.
• Access components by indices: The first way is indexing, performed in the same way as for a vector. Single brackets "[ ]" return a sub-list containing the selected components. Double brackets "[[ ]]" return a top-level component itself. To reach the inner elements of a component, follow "[[ ]]" with single brackets "[ ]". For example, list[[2]][3] is the 3rd element of the 2nd component.
Ex: > # Creating a list
> empId = c(1, 2, 3, 4)
> empName = c("Debi", "Sandeep", "Subham", "Shiba")
> numberOfEmp = 4
> empList = list("ID" = empId, "Names" = empName, "Total Staff" = numberOfEmp)
> # Printing data in list
> empList
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
> # Single brackets return a sub-list containing the Names component
> print(empList[2])
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
> # Double brackets return the Names component itself (a character vector)
> print(empList[[2]])
[1] "Debi" "Sandeep" "Subham" "Shiba"
> # Accessing inner level components by indices
> # Accessing the 2nd value of the Names component
> print(empList[[2]][2])
[1] "Sandeep"
> # Accessing the 4th value of the ID component
> empList[[1]][4]
[1] 4
• Access components by names: The second way is to access the elements of a list by their names. This works only for a named list. The elements of an unnamed list cannot be accessed by name. Once the components are named, we can access them with the dollar operator "$" (list$name) or with a name inside brackets (list["name"] or list[["name"]]).
Ex: > # Creating a list
> empId = c(1, 2, 3, 4)
> empName = c("Debi", "Sandeep", "Subham", "Shiba")
> numberOfEmp = 4
> empList = list("ID" = empId, "Names" = empName, "Total Staff" = numberOfEmp)
> # Printing data in list
> empList
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
> # Accessing ID element of the list.
> print(empList["ID"])
$ID
[1] 1 2 3 4
> # Accessing Names element of the list.
> print(empList$Names)
[1] "Debi" "Sandeep" "Subham" "Shiba"
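The sketch below puts the access forms from this section side by side on the same empList. Single brackets return a list. Double brackets and $ return the component itself. A name that contains a space, such as "Total Staff", must be written in backticks after $.

```r
# Minimal sketch: [ ] vs [[ ]] vs $ on a named list
empList <- list(ID = c(1, 2, 3, 4),
                Names = c("Debi", "Sandeep", "Subham", "Shiba"),
                "Total Staff" = 4)
sub_list  <- empList[2]             # single brackets: a list of length 1
component <- empList[[2]]           # double brackets: the character vector itself
by_dollar <- empList$Names          # $ with a name: same as empList[["Names"]]
total     <- empList$`Total Staff`  # names containing spaces need backticks
print(class(sub_list))
print(class(component))
print(empList[["Names"]][2])        # inner element, accessed by name then index
```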

MANIPULATING LIST ELEMENTS


R allows us to add, delete, or update elements in a list. Any element can be updated by overwriting it with a new value. New elements are usually added by assigning to a new index or name, which appends them at the end of the list. To remove an element, assign NULL to it, or drop it with a negative index (list <- list[-index]). The example below shows how to add, delete, and update elements in a list.
Ex: > # Creating a list
> empId = c(1, 2, 3, 4)
> empName = c("Debi", "Sandeep", "Subham", "Shiba")
> numberOfEmp = 4
> empList = list("ID" = empId, "Names" = empName, "Total Staff" = numberOfEmp)
> # Printing data in list
> cat("Before modifying the list\n")
Before modifying the list
> empList
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
> # Modifying the top-level component
> empList$"Total Staff" = 5
> cat("After Modifying the top-level component \n")
After Modifying the top-level component
> empList
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 5
> # Modifying inner level component
> empList[[1]][5] = 5
> empList[[2]][5] = "Kamala"
> cat("After Modifying the inner level component \n")
After Modifying the inner level component
> empList
$ID
[1] 1 2 3 4 5
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba" "Kamala"
$`Total Staff`
[1] 5
> # Removing the last component (equivalently: empList <- empList[-3])
> empList$`Total Staff` <- NULL
> # Removing "Kamala" (5th element) from the Names component; assign the result to keep it
> empList[[2]] <- empList[[2]][-5]
> cat("After removing elements, the list is: \n")
After removing elements, the list is:
> empList
$ID
[1] 1 2 3 4 5
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
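New components do not have to go at the end of a list. The sketch below uses base R's append() with its after argument to insert a component at a chosen position. It then removes that component from the middle by assigning NULL and updates one inner element. The "Dept" component and the name "Sandy" are hypothetical.

```r
# Minimal sketch: inserting, removing and updating list components at any position
staff <- list(ID = 1:4, Names = c("Debi", "Sandeep", "Subham", "Shiba"))
# append() with after = 1 places the new component between ID and Names
staff <- append(staff, list(Dept = "IT"), after = 1)
after_insert <- names(staff)
print(after_insert)
# Assigning NULL removes a component from any position, not only the end
staff$Dept <- NULL
print(names(staff))
# Overwrite a single inner element in place
staff$Names[2] <- "Sandy"
print(staff$Names)
```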
MERGING LISTS
R allows us to merge two or more lists into one, using either the list() function or the c() function. Passing the lists to c() concatenates their elements into a single flat list. Passing them to list() instead creates a new list whose components are the original lists (a nested list). The example below shows both approaches.
Ex: > # Creating two lists.
> # Creating first lists.
> Even_list <- list(2,4,6,8,10)
> # print first list of elements.
> Even_list
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 8
[[5]]
[1] 10
> # Creating second lists.
> Odd_list <- list(1,3,5,7,9)
> # print second list of elements.
> Odd_list
[[1]]
[1] 1
[[2]]
[1] 3
[[3]]
[1] 5
[[4]]
[1] 7
[[5]]
[1] 9
> # Merging with list(): each list becomes one component of a nested list
> mergedlist <- list(Even_list, Odd_list)
> print(mergedlist)
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 6
[[1]][[4]]
[1] 8
[[1]][[5]]
[1] 10

[[2]]
[[2]][[1]]
[1] 1
[[2]][[2]]
[1] 3
[[2]][[3]]
[1] 5
[[2]][[4]]
[1] 7
[[2]][[5]]
[1] 9
> # Merging with c(): the elements of both lists are combined into one flat list
> newlist <- c(Even_list, Odd_list)
> print(newlist)
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 8
[[5]]
[1] 10
[[6]]
[1] 1
[[7]]
[1] 3
[[8]]
[1] 5
[[9]]
[1] 7
[[10]]
[1] 9
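The sketch below confirms the difference between the two merges by counting components. It also shows that c() keeps element names when the lists are named. The named values reuse the earlier my_named_list example.

```r
# Minimal sketch: list() nests, c() flattens one level and keeps names
Even_list <- list(2, 4, 6, 8, 10)
Odd_list  <- list(1, 3, 5, 7, 9)
nested <- list(Even_list, Odd_list)  # 2 components, each itself a list
flat   <- c(Even_list, Odd_list)     # 10 components
print(length(nested))
print(length(flat))
# Names are preserved when named lists are combined with c()
info <- c(list(name = "Sudheer"), list(age = 25, city = "Delhi"))
print(names(info))
```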

CONVERTING LISTS TO VECTORS


Lists have a drawback: arithmetic operations cannot be applied to them directly. To overcome this, R provides the unlist() function, which converts a list into a vector. In many cases a list must be converted into a vector so that its elements can be used for further manipulation.
The unlist() function takes a list as its argument and returns a vector. The example below shows how unlist() is used.
Ex: > # Creating list1.
> list1 <- list(10:20)
> print(list1)
[[1]]
[1] 10 11 12 13 14 15 16 17 18 19 20
> # Creating list2.
> list2 <-list(5:14)
> list2
[[1]]
[1] 5 6 7 8 9 10 11 12 13 14
> # perform arithmetic operations on list1 and list2
> list1+list2
Error in list1 + list2 : non-numeric argument to binary operator
> # Converting the lists to vectors.
> v1 <- unlist(list1)
> v1
[1] 10 11 12 13 14 15 16 17 18 19 20
> v2 <- unlist(list2)
> v2
[1] 5 6 7 8 9 10 11 12 13 14
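> # perform arithmetic operations on v1 and v2
> # v1 has 11 elements and v2 has 10, so R recycles v2 (the 11th sum is 20 + 5) and warns
> result <- v1+v2
Warning message:
In v1 + v2 :
  longer object length is not a multiple of shorter object length
> result
[1] 15 17 19 21 23 25 27 29 31 33 25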
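unlist() also behaves in a few ways that are worth knowing. The sketch below shows three of them. Mixed element types are coerced to a single type (here character). Element names are carried over, with numeric suffixes added for multi-element components. Nested lists are flattened recursively by default. The input lists are hypothetical.

```r
# Minimal sketch: coercion, naming and recursive flattening in unlist()
mixed <- list(1, "two", TRUE)
print(unlist(mixed))            # every element is coerced to character
scores <- list(a = 1:2, b = 3)
print(unlist(scores))           # names become a1, a2, b
nested <- list(1, list(2, list(3)))
print(unlist(nested))           # inner lists are flattened into one vector
```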
