IDS-Unit 3

R22 JNTUH Introduction to Data Science


UNIT- III

Vectors: Creating and Naming Vectors, Vector Arithmetic, Vector Subsetting.
Matrices: Creating and Naming Matrices, Matrix Subsetting, Arrays, Class.
Factors: Introduction to Factors: Factor Levels, Summarizing a Factor,
Ordered Factors, Comparing Ordered Factors.
Data Frames: Introduction to Data Frames, Subsetting of Data Frames,
Extending Data Frames, Sorting Data Frames.
Lists: Introduction, Creating a List: Creating a Named List, Accessing List
Elements, Manipulating List Elements, Merging Lists, Converting Lists to
Vectors.
=====================================================

INTRODUCTION TO DATA SCIENCE UNIT-3 AUTHOR: SHANMUGAM V 1


1. Vectors:
• A vector is the basic data structure in R programming.
• In R, a sequence of elements that share the same data type is known as a
vector.
• A vector can hold logical, integer, double, character, complex, or raw data.
• The elements contained in a vector are known as the components of the
vector.
• We can check the type of a vector with the typeof() function.
• Length is an important property of a vector.
• The length of a vector is the number of elements in it, and it is returned by
the length() function.
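The two functions named above can be tried on a small vector. This is a minimal sketch; the vector v and its values are made up for illustration.

```r
# A small numeric vector (illustrative values)
v <- c(2.5, 4, 7.25)

v_type <- typeof(v)   # storage type of the vector
v_len  <- length(v)   # number of elements

print(v_type)  # "double"
print(v_len)   # 3
```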



Vectors are classified into two kinds: atomic vectors and lists.
• They have three common properties: type (typeof()), length (length()),
and attributes (attributes()).
• The key difference between atomic vectors and lists is the element type.
• In an atomic vector, all the elements are of the same type, but in a list, the
elements can be of different data types.
How to create a vector in R?
• In R, we use the c() function to create a vector.
• This function returns a one-dimensional vector.
• The c() function is a generic function which combines its arguments.
• All arguments are coerced to a common data type, which is the type of
the returned value.
• There are various other ways to create a vector in R, which are as follows:



1) Using the colon (:) operator
We can create a vector with the colon operator.
The syntax of the colon operator is:
z<-x:y
This operator creates a vector with elements from x to y and assigns it to z.
Example:
a<-4:-10
a
Output:
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10



2) Using the seq() function
• In R, we can also create a vector with the seq() function.
• The seq() function creates a sequence of elements as a vector.
• The seq() function is used in two ways: by setting the step size with the 'by'
parameter, or by specifying the length of the vector with the 'length.out' parameter.
Example:
seq_vec<-seq(1,4,by=0.5)
seq_vec
class(seq_vec)
Output:
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"
Example:
seq_vec<-seq(1,4,length.out=6)
seq_vec
class(seq_vec)
Output:
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] "numeric"



Vector Sub-setting:
• Use base R bracket notation [] to subset a vector.
• With this notation, we can subset a vector by index, by name, by value, by
a condition, by a range, etc.
• R also provides the subset() function, which keeps the elements of a vector
that satisfy a logical condition.
• subset() is a generic function that can also be used to subset
data frames and matrices.
• An atomic vector can be subset with [] using positive integers, negative
integers, a logical vector, or a character vector of names.
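The subsetting styles listed above can be sketched on one small named vector. The vector marks and its values are hypothetical and are used only for illustration.

```r
# Hypothetical named vector of marks
marks <- c(maths = 72, physics = 65, chemistry = 88, biology = 54)

by_position  <- marks[c(1, 3)]            # positive integers: keep 1st and 3rd
without_2nd  <- marks[-2]                 # negative integer: drop the 2nd
by_condition <- marks[marks > 60]         # logical vector built from a condition
by_name      <- marks["chemistry"]        # character vector of names
via_subset   <- subset(marks, marks > 60) # same condition through subset()

print(by_position)
print(without_2nd)
print(by_condition)
print(by_name)
print(via_subset)
```

Here the bracket form with a condition and the subset() call select the same elements.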



Atomic vectors in R
• R has six types of atomic vectors (logical, integer, double, character,
complex, and raw); the four most commonly used are described here.
• Atomic vectors play an important role in data science.
• Atomic vectors can be created with the c() function.
• The four common atomic vectors are as follows:



Numeric vector
• Decimal values are known as the numeric data type in R.
• If we assign a decimal value to a variable d, then d will be of
numeric type.
• A vector which contains numeric elements is known as a numeric vector.
Example:
d<-45.5
num_vec<-c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output:
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"



Integer vector
• A numeric value without a fractional part can be stored as integer data.
• In R, this data type is called "integer".
• R stores each integer as a 32-bit (4-byte) signed value.
• There are two ways to assign an integer value to a variable: by using the
as.integer() function, or by appending L to the value.
• A vector which contains integer elements is known as an integer vector.
Example:
d<-as.integer(5)
e<-5L
int_vec<-c(1,2,3,4,5)
int_vec<-as.integer(int_vec)
int_vec1<-c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)



Output:
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
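As a small sketch of how R stores integers, the snippet below compares a value written with the L suffix against the same value written without it, and looks up the largest integer R can represent. The variable names are illustrative only.

```r
# Largest value an R integer can hold (32-bit signed)
int_max <- .Machine$integer.max
print(int_max)          # 2147483647

x <- 5L                 # integer, because of the L suffix
y <- 5                  # double, because no suffix is given

print(typeof(x))        # "integer"
print(typeof(y))        # "double"
print(is.integer(y))    # FALSE
```

Note that class(y) reports "numeric" while typeof(y) reports "double"; both describe the same non-integer storage.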
Character vector
• R stores character data as strings; there is no separate single-character type.
• In R, there are two ways to create a character value: by using the
as.character() function, or by typing a string between double quotes ("")
or single quotes ('').
• A vector which contains character elements is known as a character vector.
Example:
d<-'shubham'
e<-"Arpita"
f<-65
f<-as.character(f)
d
e
f
char_vec<-c(1,2,3,4,5)
char_vec<-as.character(char_vec)
char_vec1<-c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)



Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"



Logical vector
• The logical data type has only two values: TRUE and FALSE.
• These values are typically produced by evaluating a condition.
• A vector which contains Boolean values is known as a logical vector.
Example:
d<-as.integer(5)
e<-as.integer(6)
f<-as.integer(7)
g<-d>e
h<-e<f
g
h
log_vec<-c(d<e, d<f, e<d,e<f,f<d,f<e)



log_vec
class(g)
class(h)
class(log_vec)
Output:
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"



Accessing elements of vectors:
• We can access the elements of a vector with vector indexing.
• An index denotes the position at which a value is stored in the vector.
• Indexing can be performed with an integer, character, or logical vector.



1) Indexing with an integer vector
• Integer indexing works much as it does in C, C++, and Java.
• There is one difference: in C, C++, and Java indexing starts from
0, but in R indexing starts from 1.
• As in other programming languages, we index by specifying an
integer value in square brackets [] next to the vector.
Example:
seq_vec<-seq(1,4,length.out=6)
seq_vec
seq_vec[2]
Output:
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] 1.6



2) Indexing with a character vector
• In character vector indexing, we assign a unique name (key) to each element
of the vector.
• Each name identifies one element, so elements can be accessed easily by
name.
• Let's see an example to understand how it is performed.
Example:
char_vec<-c("shubham"=22,"arpita"=23,"vaishali"=25)
char_vec
char_vec["arpita"]
Output:
shubham arpita vaishali
22 23 25
arpita
23



3) Indexing with a logical vector
• Logical indexing returns the values at those positions where the
logical index vector is TRUE.
• Let's see an example to understand how it is performed on vectors.
Example:
a<-c(1,2,3,4,5,6)
a[c(TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)]
Output:
[1] 1 3 4 6



Vector Operation:
• In R, there are various operations which can be performed on vectors.
• We can add, subtract, multiply, or divide two or more vectors.
• R plays an important role in data science, where these operations are
required for data manipulation.
• The following types of operations are performed on vectors.



1) Combining vectors:
• The c() function is used not only to create a vector but also to
combine two vectors.
• Combining two or more vectors forms a new vector which contains all
the elements of each vector.
• When the vectors have different types, the elements are coerced to a common
type; in the example, the numbers become character strings.
• Let's see an example of how the c() function combines vectors.
Example:
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
r
Output
[1] "1" "2" "4" "5" "7" "8"
[7] "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
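The output above shows numbers turning into strings when combined with character data. As a minimal sketch of the same coercion rule for other type pairs (the variable names are illustrative), c() moves towards the more general type: logical, then integer, then double, then character.

```r
mix_lgl_int <- c(TRUE, 2L)     # logical + integer   -> integer
mix_int_dbl <- c(2L, 3.5)      # integer + double    -> double
mix_dbl_chr <- c(3.5, "a")     # double + character  -> character

print(typeof(mix_lgl_int))     # "integer"
print(typeof(mix_int_dbl))     # "double"
print(typeof(mix_dbl_chr))     # "character"
print(mix_dbl_chr)             # "3.5" "a"
```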



2) Arithmetic operations:
• We can perform all the arithmetic operations on vectors.
• The arithmetic operations are performed member-by-member on vectors.
• We can add, subtract, multiply, or divide two vectors.
• Let's see an example to understand how arithmetic operations are performed
on vectors.
Example:
a<-c(1,3,5,7)
b<-c(2,4,6,8)
a+b
a-b
a*b
a/b
a%%b
Output:
[1] 3 7 11 15
[1] -1 -1 -1 -1
[1] 2 12 30 56
[1] 0.5000000 0.7500000 0.8333333 0.8750000
[1] 1 3 5 7
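Member-by-member arithmetic also applies when the two operands differ in length: R repeats (recycles) the shorter one. The following is a minimal sketch with made-up values, not part of the original example.

```r
a <- c(1, 3, 5, 7)

scaled <- a * 10            # the single value 10 is recycled to length 4
paired <- a + c(100, 200)   # c(100, 200) is recycled as 100, 200, 100, 200

print(scaled)  # 10 30 50 70
print(paired)  # 101 203 105 207
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.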



3) Logical Index vector:
• With the help of a logical index vector in R, we can form a new vector
from a given vector.
• The logical index vector has the same length as the original vector.
• Its members are TRUE where the corresponding members of the
original vector are to be included in the slice, and FALSE otherwise.
• Let's see an example to understand how a new vector is formed with the help
of a logical index vector.
Example:
a<-c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
b<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
a[b]
Output: [1] "Shubham" "Nishka" "Vaishali"



4) Numeric Index:
• In R, we specify the index between square brackets [ ] to select an element
by position.
• If the index is negative, R returns all the values except the one at that
position.
• For example, specifying [-3] tells R to drop the third element and return
the rest.
• An index beyond the length of the vector returns NA.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]
Output:



[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
5) Duplicate Index:
• An index vector allows duplicate values which means we can access one
element twice in one operation.
• Let see an example to understand how duplicate index works.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,4,4,3)]
Output:
[1] "arpita" "gunjan" "gunjan" "nishka"



6) Range Indexes:
• A range index is used to slice a vector to form a new vector.
• For slicing, we use the colon (:) operator.
• Range indexes are very helpful in situations involving a large vector.
• Let's see an example to understand how slicing is done with the help of the
colon operator to form a new vector.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
b<-q[2:5]
b
Output:
[1] "arpita" "nishka" "gunjan" "vaishali"



7) Out-of-order Indexes:
• In R, the index vector can be out of order.
• Below is an example of a vector slice with the order of the first and second
values reversed.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]
Output:
[1] "arpita" "shubham" "nishka" "gunjan" "vaishali" "sumit"



8) Named vectors members:
We first create our vector of characters as:
z=c("TensorFlow","PyTorch")
z
Output:
[1] "TensorFlow" "PyTorch"
• Once our vector of characters is created, we name the first vector member as
"Start" and the second member as "End" as:
names(z)=c("Start","End")
z
Output:
Start End
"TensorFlow" "PyTorch"



• We retrieve the first member by its name as follows:
z["Start"]
Output:
Start
"TensorFlow"
• We can reverse the order with the help of a character string index vector.
z[c("End","Start")]
Output:
End Start
"PyTorch" "TensorFlow"



Applications of vectors:
• In machine learning, vectors are used in principal component analysis.
• There they are extended to eigenvalues and eigenvectors, which are used to
perform decomposition in vector spaces.
• The inputs provided to a deep learning model are in the form of
vectors.
• These vectors consist of standardized data which is supplied to the input layer
of the neural network.
• Vectors are used in the development of support vector machine algorithms.
• Vector operations are used in neural networks for tasks like
image recognition and text processing.



Matrices:
• In R, a two-dimensional rectangular data set is known as a matrix.
• A matrix is created by passing a vector to the matrix() function.
• On R matrices, we can perform addition, subtraction, multiplication, and
division.
• In an R matrix, elements are arranged in a fixed number of rows and
columns.
• Matrix elements are most often numeric, but any atomic type is allowed.
• R stores a matrix in memory column by column, and matrix() fills the matrix
in that order unless byrow = TRUE is given.
• In an R matrix, all the elements must share a common basic type.



Example:
matrix1<-matrix(c(11, 13, 15, 12, 14, 16),nrow =2, ncol =3, byrow = TRUE)
matrix1
Output:
[,1] [,2] [,3]
[1,] 11 13 15
[2,] 12 14 16



Creating Matrices:
• As with vectors and lists, R provides a function to create a matrix.
• That function is matrix().
• This function plays an important role in data analysis.
• The syntax of the matrix() function in R is:
matrix(data, nrow, ncol, byrow, dimnames)
data:
• The first argument in matrix function is data.
• It is the input vector which is the data elements of the matrix.
nrow:
• The second argument is the number of rows which we want to create in the
matrix.
ncol:



• The third argument is the number of columns which we want to create in
the matrix.
byrow:
• The byrow parameter is a logical flag.
• If its value is TRUE, the input vector elements are arranged by row; otherwise
(the default, FALSE) they are arranged by column.
dimnames:
• The dimnames parameter is a list holding the names assigned to the rows and columns.



Let's see an example to understand how matrix function is used to create a matrix
and arrange the elements sequentially by row or column.
Example
#Arranging elements sequentially by row.
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)
Output:
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
[3,] 11 12 13
[4,] 14 15 16



# Arranging elements sequentially by column.
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(Q)
Output:
[,1] [,2] [,3]
[1,] 3 7 11
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14

# Defining the column and row names.


row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")



R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output:
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14



Accessing (or) Sub-Setting matrix elements in R:
• As in C and C++, we can access the elements of a matrix by using
the index of the element.
• There are three ways to access elements of a matrix:
• We can access the element in the nth row and mth column.
• We can access all the elements of the matrix in the nth
row.
• We can also access all the elements of the matrix in the
mth column.

Let's see an example to understand how elements are accessed from the matrix
by nth row and mth column, by nth row, or by mth column.



Example
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
#Creating matrix
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output:
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
#Accessing element present on 3rd row and 2nd column
print(R[3,2])
Output: [1] 12
#Accessing element present in 3rd row
print(R[3,])
Output:
col1 col2 col3
11 12 13
#Accessing element present in 2nd column
print(R[,2])
Output:
row1 row2 row3 row4
6 9 12 15
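Beyond a single cell, row, or column, the same bracket notation accepts vectors of indexes, names, and logical conditions. The following is a minimal sketch that rebuilds the matrix R from the example; the result names block, one_col, and big are illustrative only.

```r
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
R <- matrix(5:16, nrow = 4, byrow = TRUE,
            dimnames = list(row_names, col_names))

block   <- R[c(1, 3), c("col1", "col3")]  # rows 1 and 3, two columns chosen by name
one_col <- R[, 2, drop = FALSE]           # drop = FALSE keeps a 4 x 1 matrix
big     <- R[R[, "col1"] > 8, ]           # rows whose col1 value is greater than 8

print(block)
print(one_col)
print(big)
```

Without drop = FALSE, selecting a single row or column returns a plain vector, as in the outputs above.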



Modification of the matrix:
• R allows us to modify a matrix after it is created.
• There are several methods to modify a matrix, which are as
follows:



Assign a single element:
• In matrix modification, the first method is to assign a single element at
a particular position of the matrix.
• By assigning a new value to that position, the old value is replaced
with the new one.
• This is the simplest way to modify a matrix.
• The basic syntax is:
matrix[n, m]<-y
• Here, n and m are the row and column of the element, respectively.
• And y is the value which we assign to modify our matrix.



Let see an example to understand how modification will be done:
Example:
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output:
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
#Assigning value 20 to the element at 3rd row and 2nd column
Y = R[3,2]<-20
print(Y)
print(R)
Output:
[1] 20
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 20 13
row4 14 15 16



Use of Relational Operator:
• R provides another way to perform matrix modification.
• In this method, we use relational operators like >, <, ==.
• Like the first method, this second method is quite simple to use.
Let's see an example to understand how this method modifies the matrix.



Example 1
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R = matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16



#Replacing the element equal to 12 with 0
R[R==12] = 0
print(R)
Output
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 0 13
row4 14 15 16



Example 2
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16



#Replacing elements whose values are greater than 12 with 0
R[R>12] = 0
print(R)
Output
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 0
row4 0 0 0



Addition of Rows and Columns
• The third method of matrix modification is adding rows
and columns using the rbind() and cbind() functions.
• The cbind() and rbind() functions add a column and a row
respectively.
• Both functions return a new matrix and leave the original unchanged, so
assign the result if you want to keep it.
• Let's see an example to understand the working of the cbind() and rbind()
functions.



Example
# Defining the column and row names.
rows = c("row1", "row2", "row3", "row4")
cols = c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(rows, cols))
print(R)
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16



#Adding row
rbind(R,c(17,18,19))
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
17 18 19



#Adding column
cbind(R,c(17,18,19,20))
col1 col2 col3
row1 5 6 7 17
row2 8 9 10 18
row3 11 12 13 19
row4 14 15 16 20
#transpose of the matrix using the t() function:
t(R)
row1 row2 row3 row4
col1 5 8 11 14
col2 6 9 12 15
col3 7 10 13 16



#Modifying the dimension of the matrix using the dim() function
dim(R)<-c(1,12)
print(R)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 5 8 11 14 6 9 12 15 7 10 13 16



Matrix operations
• In R, we can perform mathematical operations on matrices, such as
addition, subtraction, multiplication, etc.
• The operators +, -, * and / work element by element, so both matrices
must have the same dimensions.



Let see an example to understand how mathematical operations are performed on
the matrix.
Example:
R <- matrix(c(5:16), nrow = 4,ncol=3)
S <- matrix(c(1:12), nrow = 4,ncol=3)
#Addition
sum<-R+S
print(sum)
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28



#Subtraction
sub<-R-S
print(sub)
[,1] [,2] [,3]
[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4



#Multiplication
mul<-R*S
print(mul)
[,1] [,2] [,3]
[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192



#Multiplication by constant
mul1<-R*12
print(mul1)
[,1] [,2] [,3]
[1,] 60 108 156
[2,] 72 120 168
[3,] 84 132 180
[4,] 96 144 192



#Division
div<-R/S
print(div)
[,1] [,2] [,3]
[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333
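The multiplication shown earlier with * is element by element. For the matrix product of linear algebra, R uses the %*% operator, which requires the number of columns of the first matrix to equal the number of rows of the second. The sketch below uses two small made-up matrices A and B to show the difference.

```r
A <- matrix(1:6, nrow = 2)   # 2 x 3
B <- matrix(1:6, nrow = 3)   # 3 x 2

elementwise <- A * A         # same dimensions: multiplies cell by cell
product     <- A %*% B       # (2 x 3) times (3 x 2) gives a 2 x 2 matrix

print(elementwise)
print(product)
```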



Applications of matrix
• In geology, matrices are used to record surveys, plot graphs, and compute
statistics, and they are used for study in different fields.
• A matrix is a representation method which helps in plotting common
survey data.
• In robotics and automation, matrices are the basic elements used to describe
robot movements.
• In economics, matrices are used in calculating gross domestic product,
and they also help in calculating the efficiency of goods and
products.
• In computer-based applications, matrices play a crucial role in
creating realistic-seeming motion.



Arrays:
• In R, arrays are data objects which allow us to store data in more than
two dimensions.
• In R, an array is created with the array() function.
• The array() function takes a vector as input and uses the values in the
dim parameter to shape it into an array.
For example, if we create an array of dimension (2, 3, 4), it will create 4
rectangular matrices of 2 rows and 3 columns each.
R Array Syntax
The syntax of R arrays is:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)



data
• The data is the first argument of the array() function.
• It is the input vector which supplies the elements of the array.
matrices
• This value defines the number of matrices in the array (the third dimension).
row_size
• This value defines the number of rows in each matrix of the array.
column_size
• This value defines the number of columns in each matrix of the array.
dim_names:
• This parameter (dimnames in the array() call) is a list used to change the
default names of rows, columns, and matrices.



Creating an Array:
• In R, array creation is quite simple.
• We can create an array from a vector using the array() function.
• In a three-dimensional array, the data is stored as a set of matrices.
• If the input has fewer elements than the array, R recycles the values to fill it.
There are only two steps to create an array:
1. In the first step, we create two vectors of different lengths.
2. Once the vectors are created, we pass them as input to the array() function.
Let's see an example to understand how we can implement an array with the help
of the vectors and the array() function.
Example:
#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)
#Taking these vectors as input to the array
res <- array(c(vec1,vec2),dim=c(3,3,2))
print(res)
Output
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15



Naming rows and columns:
• In R, we can give names to the rows, columns, and matrices of the
array.
• This is done with the dimnames parameter of the array()
function.
• It is not necessary to name the rows and columns.
• Names are only used to tell rows and columns apart for better
understanding.
• Below is an example in which we create an array and give names to
its rows, columns, and matrices.



Example
#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)
#Initializing names for rows, columns and matrices
col_names <- c("Col1","Col2","Col3")
row_names <- c("Row1","Row2","Row3")
matrix_names <- c("Matrix1","Matrix2")
#Taking the vectors as input to the array
res <- array(c(vec1,vec2),dim=c(3,3,2),dimnames=list(row_names,col_names,matrix_names))
print(res)



Output
, , Matrix1
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

, , Matrix2
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15



Accessing array elements
• As in C or C++, we can access the elements of an array by index.
• An element is selected by giving one index per dimension inside square
brackets, for example [row, column, matrix].
Let's see an example to understand how we can access the elements of the array
using the indexing method.
Example:
# An array with one dimension with values ranging from 1 to 24
thisarray <- c(1:24)
thisarray
Output:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24



# An array with more than one dimension
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray
Output:
, , 1
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

, , 2
[,1] [,2] [,3]
[1,] 13 17 21
[2,] 14 18 22
[3,] 15 19 23
[4,] 16 20 24
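With the array built, individual elements and slices can be picked out by index. This is a minimal sketch that recreates multiarray from the example; the result names single, row_vals, and first are illustrative only.

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

single   <- multiarray[2, 3, 2]   # row 2, column 3 of the second matrix
row_vals <- multiarray[1, , 2]    # entire first row of the second matrix
first    <- multiarray[, , 1]     # the whole first matrix (4 x 3)

print(single)      # 22
print(row_vals)    # 13 17 21
print(dim(first))  # 4 3
```

Leaving an index position empty selects everything along that dimension.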



Manipulation of elements:
• An array is made up of matrices in multiple dimensions, so operations
on the elements of an array are carried out by accessing elements of the matrices.
Example
#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)
#Taking the vectors as input to the array1
res1 <- array(c(vec1,vec2),dim=c(3,3,2))
print(res1)



Output:

, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15



#Creating two vectors of different lengths
vec1 <-c(8,4,7)
vec2 <-c(16,73,48,46,36,73)
#Taking the vectors as input to the array2
res2 <- array(c(vec1,vec2),dim=c(3,3,2))
print(res2)
Output:
, , 1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73

, , 2
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73



#Creating matrices from these arrays
mat1 <- res1[,,2]
mat2 <- res2[,,2]
res3 <- mat1+mat2
print(res3)
Output:
[,1] [,2] [,3]
[1,] 9 26 59
[2,] 7 84 50
[3,] 12 60 88



Calculations across array elements
• For calculations across array elements, R provides the apply() function.
• The apply() function takes three arguments: X, MARGIN, and FUN.
• X is the array on which we have to perform the calculations.
• The basic syntax of the apply() function is:
apply(X, MARGIN, FUN)
• Here, X is an array, MARGIN selects the dimension over which the function is
applied (1 for rows, 2 for columns, 3 for matrices), and FUN is the function
which is applied to the elements of the array.
Example
#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)



#Taking the vectors as input to the array1
res1 <- array(c(vec1,vec2),dim=c(3,3,2))
print(res1)
Output
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15



#using apply function
result <- apply(res1,c(1),sum)
print(result)
Output:
[1] 48 56 64
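The result above sums over rows (margin 1). As a minimal sketch on the same array, the other margins give column totals and per-matrix totals; the result names are illustrative only.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))

row_sums    <- apply(res1, 1, sum)   # one total per row, across columns and matrices
col_sums    <- apply(res1, 2, sum)   # one total per column
matrix_sums <- apply(res1, 3, sum)   # one total per matrix

print(row_sums)     # 48 56 64
print(col_sums)     # 18 66 84
print(matrix_sums)  # 84 84
```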



Class in R:
• Classes and objects are basic concepts of object-oriented programming that
revolve around real-life entities.
• Everything in R is an object.
• An object is simply a data structure that has some methods and attributes.
• A class is a blueprint or sketch of these objects. It represents the set of
properties or methods that are common to all objects of one type.
• Unlike most other programming languages, R has three class systems.
• These are S3, S4, and Reference Classes.
S3 Class:
• S3 is the simplest yet most popular OOP system in R; it lacks a formal
definition and structure.
• An object of this type can be created by just adding a class attribute to it.



The following example makes this clearer:
Example:
# create a list with required components
movieList <- list(name = "Iron man", leadActor = "Robert Downey Jr")
# give a name to your class
class(movieList) <- "movie"
movieList
Output:
$name
[1] "Iron man"

$leadActor
[1] "Robert Downey Jr"

attr(,"class")
[1] "movie"
• In S3, methods don't belong to the class.
• They belong to generic functions.
• This means that methods are not defined inside the class, as they are in
programming languages like C++ or Java.
• But we can define what a generic function (for example print) does when
applied to our objects.
print(movieList)
Output:
$name
[1] "Iron man"

$leadActor
[1] "Robert Downey Jr"

attr(,"class")
[1] "movie"

Example: Creating a user-defined print function
# now let us write our own print method for objects of class "movie"
print.movie <- function(x, ...)
{
cat("The name of the movie is", x$name,".\n")
cat(x$leadActor, "is the lead actor.\n")
invisible(x)
}
# printing the object now calls print.movie
print(movieList)
Output:
The name of the movie is Iron man .
Robert Downey Jr is the lead actor.
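
Besides writing methods for existing generics such as print, S3 also lets us define a generic function of our own with UseMethod(). The sketch below is illustrative only: the generic name describe and its methods are assumptions made for this example and are not part of base R or of the original slides.

```r
# Hypothetical generic; the name "describe" is an assumption for this sketch
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method used for objects of class "movie"
describe.movie <- function(x, ...) {
  paste(x$name, "stars", x$leadActor)
}

# Fallback used for objects of any other class
describe.default <- function(x, ...) {
  "No description available"
}

movieList <- list(name = "Iron man", leadActor = "Robert Downey Jr")
class(movieList) <- "movie"

print(describe(movieList))  # "Iron man stars Robert Downey Jr"
print(describe(42))         # "No description available"
```

The generic only dispatches; the behaviour for each class lives in a separate function named generic.class.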

S4 Class
• Programmers coming from languages like C++ or Java may find S3 very
different from their usual idea of classes, because it lacks the structure that classes
are expected to provide.
• S4 improves on S3: its classes have a formal definition, which
gives a proper structure to its objects.
Example:
library(methods)
# definition of S4 class
setClass("movies", slots=list(name="character", leadActor = "character"))
# creating an object using new() by passing class name and slot values
movieList <- new("movies", name="Iron man", leadActor = "Robert Downey Jr")
movieList

Output:
An object of class "movies"
Slot "name":
[1] "Iron man"
Slot "leadActor":
[1] "Robert Downey Jr"
• As shown in the above example, setClass() is used to define a class
and new() is used to create the objects.
• The concept of methods in S4 is similar to S3, i.e., they belong to generic
functions.
The following example shows how to create a method:
Example:
# using setMethod to set a method

setMethod("show", "movies",
function(object)
{
cat("The name of the movie is ", object@name, ".\n")
cat(object@leadActor, "is the lead actor.\n")
}
)
movieList
Output:
[1] "show"
The name of the movie is Iron man .
Robert Downey Jr is the lead actor.
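
The formal definition of an S4 class also means that slot types are checked. As a minimal sketch (the class name film is a hypothetical name chosen here so that it does not clash with the movies class of the slides), slots are read with @ and a value of the wrong type is rejected when the object is created:

```r
library(methods)

# Hypothetical class "film", defined only for this sketch
setClass("film", slots = list(name = "character", leadActor = "character"))

f <- new("film", name = "Iron man", leadActor = "Robert Downey Jr")

# Slots are read with @ (or with the slot() function)
print(f@name)
print(slot(f, "leadActor"))
print(is(f, "film"))  # TRUE

# A slot value of the wrong type raises an error in new()
result <- tryCatch(
  new("film", name = 42, leadActor = "Robert Downey Jr"),
  error = function(e) "invalid slot type"
)
print(result)  # "invalid slot type"
```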

Reference Class
• Reference classes build on S4 classes.
• Here the methods belong to the classes.
• They are very similar to the object-oriented classes of other languages.
• Defining a reference class is similar to defining an S4 class.
• We use setRefClass() instead of setClass() and "fields" instead of "slots".
Example:
library(methods)
# setRefClass returns a generator
movies <- setRefClass("movies", fields = list(name = "character",
leadActor = "character", rating = "numeric"))
# now we can use the generator to create objects
movieList <- movies(name = "Iron Man", leadActor = "Robert downey Jr", rating = 7)
movieList
Output:
Reference class object of class "movies"
Field "name":
[1] "Iron Man"
Field "leadActor":
[1] "Robert downey Jr"
Field "rating":
[1] 7
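
Since the methods of a reference class belong to the class itself, they are declared in the methods argument of setRefClass(). The following is a sketch under stated assumptions: the class name Film and the method rate are hypothetical names introduced for this example. It also shows that reference objects are not copied on assignment.

```r
library(methods)

# Hypothetical class "Film" with one method; all names are illustrative
Film <- setRefClass("Film",
  fields = list(name = "character", rating = "numeric"),
  methods = list(
    # Methods are defined inside the class and modify fields with <<-
    rate = function(new_rating) {
      rating <<- new_rating
      invisible(.self)
    }
  )
)

f1 <- Film$new(name = "Iron Man", rating = 7)
f2 <- f1            # f2 refers to the same object, not to a copy
f2$rate(8)
print(f1$rating)    # 8: the change is visible through f1 as well

f3 <- f1$copy()     # copy() creates an independent object
f3$rate(5)
print(f1$rating)    # still 8
```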

Factors
Introduction to Factors:
• A factor is a data structure used for fields that take only a
predefined, finite number of values.
• Such fields are categorical variables, which take a limited number of different values.
• Factors are data objects used to categorize data and to store
it as levels.
• A factor can be built from both integer and string values, and is useful for a column that
has a limited number of unique values.
• Internally, a factor stores integer codes, and each code is associated with a label.
• The set of predefined values is known as the levels; by default, R sorts the
levels in alphabetical order.

Arguments of the factor() function
The factor() function in R takes the following arguments:
1. x
It is the input vector which is to be transformed into a factor.
2. levels
It is an optional vector of the unique values which x can take.
3. labels
It is a character vector of labels for the levels, given in the same order as the levels.
4. exclude
It is used to specify the values which we want to be excluded when the levels are formed.
5. ordered
It is a logical value which determines whether the levels are ordered.
6. nmax
It is used to specify an upper bound on the number of levels.
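
The arguments listed above can be sketched together in a short example. The data and variable names here are made up for illustration; the point is how levels, labels and exclude shape the resulting factor.

```r
# Raw codes, including a value ("U") that is not listed among the levels
x <- c("M", "F", "F", "U", "M")

gender <- factor(x,
                 levels = c("F", "M"),          # allowed values and their order
                 labels = c("Female", "Male"))  # display names, same order as levels

print(gender)
print(levels(gender))      # "Female" "Male"
print(is.na(gender[4]))    # TRUE: "U" is not among the levels, so it becomes NA
print(is.ordered(gender))  # FALSE, because ordered = TRUE was not requested

# exclude removes a value from the set of levels
f2 <- factor(c("a", "b", "c"), exclude = "c")
print(levels(f2))          # "a" "b"
```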
Creating a Factor:
• The function used to create or modify a factor in R is
factor(), with a vector as input.
The two steps to creating an R factor:
• Creating a vector
• Converting the vector created into a factor using the factor() function
Example: Let us create a factor gender with levels female and male.
# Creating a vector
x <-c("female", "male", "male", "female")
print(x)
Output:
[1] "female" "male"   "male"   "female"

# Converting the vector x into a factor named gender
gender <-factor(x)
print(gender)
Output:
[1] female male male female
Levels: female male

Factor Levels:
• A factor is a data structure in R for storing categorical data.
• The levels of a factor are the distinct values, or categories, that the factor
can take.
• Levels are stored in a specific order, which can affect analyses, especially in
modelling, because R distinguishes between unordered and ordered factors.
Syntax:
levels(x) <- value

Creating Factors with Levels:
Example:
# Create a character vector
gender <- c("Male", "Female", "Female", "Male", "Male")
# Convert the character vector to a factor
gender_factor <- factor(gender)
# Print the factor
print(gender_factor)
# Print the levels of the factor
levels(gender_factor)
Output:
[1] Male   Female Female Male   Male
Levels: Female Male
[1] "Female" "Male"
Specifying Levels in Order
• When creating factors, you can specify the levels explicitly and control
their order.
• This is especially useful for ordinal data.
Example:
# Creating a factor with ordered levels
education <- factor(c("High School", "College", "Master", "PhD", "College"),
levels = c("High School", "College", "Master", "PhD"),
ordered = TRUE)
# Print the factor and levels
print(education)
levels(education)
Output:
[1] High School College     Master      PhD         College
Levels: High School < College < Master < PhD
[1] "High School" "College"     "Master"      "PhD"

Changing Levels:
You can also modify levels using the levels() function.
# Changing levels of a factor
levels(gender_factor) <- c("F", "M")
print(gender_factor)
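
Assigning a whole vector to levels() renames the levels by position, so it relies on knowing the current level order. A slightly more defensive sketch, offered as one possible approach rather than the only one, renames each level by its current name:

```r
gender_factor <- factor(c("Male", "Female", "Female", "Male", "Male"))

# Positional renaming relies on the current level order: "Female", "Male"
print(levels(gender_factor))

# Renaming by name does not depend on the order of the levels
levels(gender_factor)[levels(gender_factor) == "Female"] <- "F"
levels(gender_factor)[levels(gender_factor) == "Male"] <- "M"

print(gender_factor)
print(as.character(gender_factor))  # "M" "F" "F" "M" "M"
```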

Summarizing a Factor:
• In R, summarizing a factor variable gives you a quick view of the frequency
of each level within that factor.
• This can be particularly useful in exploring categorical data to understand the
distribution of categories or levels in your dataset.
• R provides simple and effective ways to summarize factors.
Basic Summary of a Factor
• When you use the summary() function on a factor, it returns a table showing
the count (frequency) of each level in the factor.

Example:
# Create a factor variable
color <- factor(c("Red", "Blue", "Green", "Blue", "Red", "Green", "Red"))
# Summarize the factor
summary(color)
Output:
Blue Green Red
2 2 3

Calculating Proportions
• To get proportions instead of counts, you can use prop.table() with table().
prop.table(table(color))
Output:
color
Blue Green Red
0.2857143 0.2857143 0.4285714

Converting Factor Levels into Data Frames


• You can also convert the summary into a data frame for further
analysis or visualization.
# Convert factor summary to a data frame
color_summary <- as.data.frame(table(color))
print(color_summary)
Output:
color Freq
1 Blue 2
2 Green 2
3 Red 3
Ordered Factors:
• In R, ordered factors are a special type of factor where the levels have a
specific, meaningful order.
• They are particularly useful for ordinal categorical data, where the categories
have an inherent ranking, such as education level, satisfaction scores, or
income brackets.

Difference Between Factors and Ordered Factors
• Factors: Levels are treated as categories without any order.
• Ordered Factors: Levels have a specific order, and this order is considered in
analysis.
Creating Ordered Factors:
# Create a vector of education levels
education <- c("High School", "College", "Master", "PhD", "College")
# Convert to an ordered factor
education_factor <- factor(education,
levels = c("High School", "College", "Master", "PhD"),
ordered = TRUE)
# Print the ordered factor
print(education_factor)
Output:
[1] High School College Master PhD College
Levels: High School < College < Master < PhD

Comparing Ordered Factor Levels:


• Ordered factors allow comparisons between levels, unlike regular factors.
education_factor[1] < education_factor[2] # Is "High School" < "College"?
education_factor[3] > education_factor[2] # Is "Master" > "College"?
Output:
[1] TRUE
[1] TRUE
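
Comparisons on ordered factors are also useful for filtering. The sketch below rebuilds the same factor (the helper names at_least_master and plain are illustrative) and contrasts it with an unordered factor, for which R returns NA together with a warning:

```r
education <- c("High School", "College", "Master", "PhD", "College")
education_factor <- factor(education,
                           levels = c("High School", "College", "Master", "PhD"),
                           ordered = TRUE)

# Element-wise comparison against a level
at_least_master <- education_factor >= "Master"
print(at_least_master)               # FALSE FALSE TRUE TRUE FALSE
print(education[at_least_master])    # "Master" "PhD"

# min() and max() respect the level order
print(as.character(max(education_factor)))  # "PhD"
print(as.character(min(education_factor)))  # "High School"

# An unordered factor cannot be compared with <; the result is NA
plain <- factor(education)
print(suppressWarnings(plain[1] < plain[2]))  # NA
```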

Visualization of Ordered Factors
• You can visualize ordered factors using bar plots or box plots. Here's an
example:
# Plot the ordered factor
barplot(table(education_factor),
main = "Education Levels",
xlab = "Levels",
ylab = "Frequency",
col = "skyblue")

Data Frames
Introduction to Data Frame:
• A data frame is a two-dimensional, table-like structure in which each
column contains the values of one variable, and each row contains one set of values
from each column.
• A data frame is a special case of a list in which every component has equal
length; put simply, it is a list of equal-length vectors.
• A matrix can contain only one type of data, but a data frame can contain different
data types such as numeric, character, factor, etc.

A data frame has the following characteristics:
• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of factor, numeric, or character
type.
• Each column contains the same number of data items.
Creating Data Frame:
• In R, data frames are created with the data.frame() function.
• This function takes vectors of any type, such as numeric, character, or
integer.
• In the example below, we create a data frame that contains employee id (integer
vector), employee name (character vector), salary (numeric vector), and
starting date (Date vector).

Example:
# Creating the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,915.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE )
# Printing the data frame.
print(emp.data)

Output:
emp_id emp_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 915.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27

Sub setting of Data Frames:
• To work with the data stored in a data frame, we first need to extract it from the
data frame.
We can extract the data in three ways, which are as follows:
1. We can extract the specific columns from a data frame using the column
name.
2. We can extract the specific rows also from a data frame.
3. We can extract the specific rows corresponding to specific columns.
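
The three ways of extracting data listed above can be sketched as follows. The data frame is a shortened version of the employee example, and the result names (name_sal, first_two, subset_rc, high_sal) are illustrative only.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 915.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# 1. Specific columns, selected by name
name_sal <- emp.data[, c("emp_name", "sal")]
print(name_sal)

# 2. Specific rows, selected by position
first_two <- emp.data[1:2, ]
print(first_two)

# 3. Specific rows of specific columns
subset_rc <- emp.data[c(3, 5), c("emp_name", "sal")]
print(subset_rc)

# Rows can also be selected with a logical condition
high_sal <- emp.data[emp.data$sal > 700, ]
print(high_sal$emp_name)  # "Arpita" "Gunjan" "Sumit"
```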
• In some cases, we need a statistical summary of the data in the data frame.
• R provides the summary() function for this purpose.
• This function takes the data frame as an argument and returns summary
statistics for each column.
An example to understand how this function is used in R:
# Creating the data frame.
emp.data<- data.frame(
emp_id = c (1:5),
emp_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,915.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE )
# Printing the data frame.
print(emp.data)

Output:
emp_id emp_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 915.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
#Printing the summary
print(summary(emp.data))
Output:
     emp_id    emp_name              sal        starting_date
 Min.   :1   Length:5           Min.   :611.0   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:623.3   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :729.0   Median :2014-05-11
 Mean   :3                      Mean   :744.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:843.2   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :915.2   Max.   :2015-03-27

Extending Data Frames:


• R allows us to modify a data frame.
• As with matrices, we can modify a data frame through re-assignment.
• We can not only add rows and columns, but also delete them. The data
frame is expanded by adding rows and columns.
We can
1. Add a column by supplying a column vector under a new column
name, for example with the cbind() function.
2. Add rows by supplying new rows with the same structure as the existing data frame
and using the rbind() function.
3. Delete columns by assigning a NULL value to them.
4. Delete rows by re-assigning the data frame without those rows.

1. Adding a New Column


• You can add a new column to a data frame by using the $ operator or
indexing.
# Create a sample data frame
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
# Add a new column using $
df$Gender <- c("Female", "Male")
# Add a new column using indexing

df["Salary"] <- c(50000, 60000)
print(df)
Output:
Name Age Gender Salary
1 Alice 25 Female 50000
2 Bob 30 Male 60000

2. Adding a New Row


• You can add a new row to a data frame using the rbind() function.
# Create a sample data frame
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
# Add a new row
new_row <- data.frame(Name = "Charlie", Age = 35)

df <- rbind(df, new_row)
print(df)
Output:
Name Age
1 Alice 25
2 Bob 30
3 Charlie 35

3. Combining Two Data Frames (Row-wise)


• Use rbind() or cbind() to combine data frames.
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
df2 <- data.frame(Name = c("Charlie", "Diana"), Age = c(35, 28))
# Combine row-wise

df_combined <- rbind(df1, df2)
print(df_combined)
Output:
Name Age
1 Alice 25
2 Bob 30
3 Charlie 35
4 Diana 28
Combining Column-wise:
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
df2 <- data.frame(Gender = c("Female", "Male"))
# Combine column-wise
df_combined <- cbind(df1, df2)

print(df_combined)
Output:
Name Age Gender
1 Alice 25 Female
2 Bob 30 Male
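
Deleting columns and rows, which this section mentions but does not illustrate, can be sketched in the same style. The data below is made up for the example.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"),
                 stringsAsFactors = FALSE)

# Delete a column by assigning NULL to it
df$Gender <- NULL
print(names(df))   # "Name" "Age"

# Delete the second row by re-assigning the data frame without it
df <- df[-2, ]
print(df)

# Delete rows by condition: keep only the rows with Age below 30
df <- df[df$Age < 30, ]
print(df$Name)     # "Alice"
```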
Sorting Data Frames:
• Sorting a data frame in R is a common task during data manipulation and
analysis.
• You can sort by one or more columns, either in ascending or descending
order.
Here's how to sort data frames in R with different approaches:

1. Using order() Function
• The order() function is a powerful tool for sorting. It can be applied to sort
rows of a data frame by one or more columns.
Example 1: Sort by a Single Column
# Create a sample data frame
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
Salary = c(50000, 60000, 55000))
# Sort by Age (ascending)
df_sorted <- df[order(df$Age), ]
print(df_sorted)

Output:
Name Age Salary
3 Charlie 22 55000
1 Alice 25 50000
2 Bob 30 60000

Example 2: Sort by a Column in Descending Order
# Sort by Age (descending)
df_sorted <- df[order(-df$Age), ]
print(df_sorted)

Output:
Name Age Salary
2 Bob 30 60000
1 Alice 25 50000
3 Charlie 22 55000
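
Sorting by more than one column works by passing several keys to order(). The following sketch uses a made-up data frame with a hypothetical Dept column; ties in the first key are broken by the second.

```r
df <- data.frame(Dept = c("HR", "IT", "HR", "IT"),
                 Name = c("Alice", "Bob", "Charlie", "Diana"),
                 Age = c(25, 30, 22, 30),
                 stringsAsFactors = FALSE)

# Sort by Dept (ascending), then by Age (descending) within each Dept
by_dept_age <- df[order(df$Dept, -df$Age), ]
print(by_dept_age)

# The minus sign only works for numeric columns; for a character column,
# use the decreasing argument instead
by_name_desc <- df[order(df$Name, decreasing = TRUE), ]
print(by_name_desc$Name)  # "Diana" "Charlie" "Bob" "Alice"
```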

Sorting Row Names
• If you want to sort rows based on their row names, you can use order() on
rownames().
Example:
# Create a data frame with row names
df <- data.frame(Age = c(25, 30, 22), Salary = c(50000, 60000, 55000))
rownames(df) <- c("Alice", "Bob", "Charlie")
# Sort by row names
df_sorted <- df[order(rownames(df)), ]
print(df_sorted)
Output:
        Age Salary
Alice    25  50000
Bob      30  60000
Charlie  22  55000

Using data.table Package
• The data.table package is optimized for high-performance sorting,
especially with large datasets.
Example:
library(data.table)
# Convert to data.table
dt <- as.data.table(df)
# Sort by Age
dt_sorted <- dt[order(Age)]
print(dt_sorted)
Output:
     Age Salary
   <num>  <num>
1:    22  55000
2:    25  50000
3:    30  60000

Custom Sorting
• For custom sorting (e.g., sorting categorical variables based on a predefined
order), you can use factors with levels.
Example:
df <- data.frame(Name = c("Charlie", "Alice", "Bob"),
Age = c(22, 25, 30))
# Custom sorting order
custom_order <- c("Alice", "Bob", "Charlie")
# Convert Name to a factor with custom levels
df$Name <- factor(df$Name, levels = custom_order)
# Sort by Name
df_sorted <- df[order(df$Name), ]
print(df_sorted)
Output:
     Name Age
2   Alice  25
3     Bob  30
1 Charlie  22

Lists
Introduction:
• In R, lists are the second type of vector.
• Lists are the objects of R which contain elements of different types such as
number, vectors, string and another list inside it.
• It can also contain a function or a matrix as its elements.
• A list is a data structure which has components of mixed data types.
• We can say, a list is a generic vector which contains other objects.
• A list in R is created with the use of the list() function.
• R allows accessing elements of an R list with the use of the index value.
• In R, the indexing of a list starts with 1 instead of 0.

Example
vec <- c(3,4,5,6)
char_vec<-c("shubham","nishka","gunjan","sumit")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
out_list
Output:
[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE

creating a List:
• The process of creating a list is similar to that of creating a vector.
• In R, a vector is created with the help of the c() function.
• Like the c() function, there is another function, list(), which is used to create
a list in R.
• A list avoids the main limitation of a vector, which is that all its elements must
share one data type.
• We can add elements of different data types to a list.
Syntax
list()

Example:
list1<-list(1,2,3)
list2<-list("Shubham","Arpita","Vaishali")
list3<-list(c(1,2,3))
list4<-list(TRUE,FALSE,TRUE)
list1
list2
list3
list4

Output:
list1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
list2
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"
list3
[[1]]
[1] 1 2 3
list4
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE

Example 2: Creating the list with different data type
list_data<-list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,22.5,12L)
print(list_data)
• In the above example, the list function will create a list with character,
logical, numeric, and vector element.
• It will give the following output
Output:
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12

Creating a Named List:
• R provides a very easy way for accessing elements, i.e., by giving the name to
each element of a list.
• By assigning names to the elements, we can access the element easily.
There are only three steps to print the list data corresponding to the name:
1. Creating a list.
2. Assign a name to the list elements with the help of names() function.
3. Print the list data.
Example:
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80),
nrow = 2), list("BCA","MCA","B.tech"))

# Giving names to the elements in the list.
names(list_data) <- c("Students", "Marks", "Course")
# Show the list.
print(list_data)
Output:
$Students
[1] "Shubham" "Nishka"  "Gunjan"

$Marks
     [,1] [,2] [,3]
[1,]   40   60   90
[2,]   80   70   80

$Course
$Course[[1]]
[1] "BCA"

$Course[[2]]
[1] "MCA"

$Course[[3]]
[1] "B.tech"

Accessing List Elements:
• R provides two ways through which we can access the elements of a list.
• The first is indexing, performed in the same way as for a vector.
• The second is access by name.
• Access by name is possible only with a named list; we cannot access the elements
of a list using names if the list is unnamed.

Example 1: Accessing elements using index
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80),
nrow = 2), list("BCA","MCA","B.tech"))
# Accessing the first element of the list.
print(list_data[1])
[[1]]
[1] "Shubham" "Arpita" "Nishka"
# Accessing the third element. The third element is also a list, so all its elements
# will be printed.
print(list_data[3])
[[1]]
[[1]][[1]]

[1] "BCA"
[[1]][[2]]
[1] "MCA"
[[1]][[3]]
[1] "B.tech"
Example 2: Accessing elements using names:
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80),
nrow = 2),list("BCA","MCA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Accessing the first element of the list.
print(list_data["Student"])

$Student
[1] "Shubham" "Arpita" "Nishka"
print(list_data$Marks)
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
print(list_data)
$Student
[1] "Shubham" "Arpita" "Nishka"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80

$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
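
The examples in this section use single brackets, which always return a list. A short sketch of the difference between single brackets, double brackets and $ may help here; the variable names sub_list and students are illustrative.

```r
list_data <- list(Student = c("Shubham", "Arpita", "Nishka"),
                  Marks = matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  Course = list("BCA", "MCA", "B.tech"))

# Single brackets return a list that contains the selected element
sub_list <- list_data["Student"]
print(class(sub_list))        # "list"

# Double brackets and $ return the element itself
students <- list_data[["Student"]]
print(class(students))        # "character"
print(identical(students, list_data$Student))  # TRUE

# Indexing can be chained to reach nested values
print(list_data[["Student"]][2])   # "Arpita"
print(list_data$Marks[2, 3])       # 80
print(list_data$Course[[2]])       # "MCA"
```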

Manipulating List Elements:
• R allows us to add, delete, or update elements in the list.
• We can update an element at any position; new elements are most simply added
at the end of the list.
• To remove an element at a specified index, we assign it a NULL value.
• We can update an element of a list by overwriting it with a new value.
Let us see an example to understand how we can add, delete, or update the elements
in the list.
Example
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80),
nrow = 2), list("BCA","MCA","B.tech"))

# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Adding element at the end of the list.
list_data[4] <- "Moradabad"
print(list_data[4])
# Removing the last element.
list_data[4] <- NULL
# Printing the 4th Element.
print(list_data[4])
# Updating the 3rd Element.
list_data[3] <- "Masters of computer applications"
print(list_data[3])

Output:
[[1]]
[1] "Moradabad"

$<NA>
NULL

$Course
[1] "Masters of computer applications"

Merging Lists:
• R allows us to combine two or more lists into one list.
• In the example below, this is done with the help of the list() function.
• When we pass the lists to the list() function as arguments, it returns a list whose
elements are the original lists, so the result is a nested list.
• To combine all the elements into a single flat list instead, the c() function
can be used.
Let us see an example to understand how the merging process is done.

Example:
# Creating two lists.
Even_list <- list(2,4,6,8,10)
Odd_list <- list(1,3,5,7,9)
# Merging the two lists.
merged.list <- list(Even_list,Odd_list)
# Printing the merged list.
print(merged.list)

Output:
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 6
[[1]][[4]]
[1] 8
[[1]][[5]]
[1] 10

[[2]]
[[2]][[1]]
[1] 1
[[2]][[2]]
[1] 3
[[2]][[3]]
[1] 5
[[2]][[4]]
[1] 7
[[2]][[5]]
[1] 9
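
The output above is a list of two lists. As a minimal sketch of the alternative, c() concatenates the elements of both lists into one flat list, and append() inserts one list into another at a chosen position (the names nested, merged and inserted are illustrative):

```r
Even_list <- list(2, 4, 6, 8, 10)
Odd_list <- list(1, 3, 5, 7, 9)

# list() nests: the result has 2 elements, each of which is a list
nested <- list(Even_list, Odd_list)
print(length(nested))    # 2

# c() concatenates: the result is one flat list with 10 elements
merged <- c(Even_list, Odd_list)
print(length(merged))    # 10
print(merged[[6]])       # 1, the first element of Odd_list

# append() inserts one list into another after a given position
inserted <- append(Even_list, Odd_list, after = 1)
print(unlist(inserted))  # 2 1 3 5 7 9 4 6 8 10
```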

Converting Lists to Vectors:
• Lists have a drawback: we cannot perform arithmetic
operations directly on a list.
• To overcome this drawback, R provides the unlist() function.
• This function converts a list into a vector.
• In some cases, it is required to convert a list into a vector so that we can use
the elements of the vector for further manipulation.
• The unlist() function takes the list as an argument and returns a vector.
Let us see an example to understand how the unlist() function is used in R.

Example:
# Creating lists.
list1 <- list(10:20)
print(list1)

list2 <-list(5:14)
print(list2)
# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

Output:
[[1]]
[1] 10 11 12 13 14 15 16 17 18 19 20

[[1]]
[1] 5 6 7 8 9 10 11 12 13 14

[1] 10 11 12 13 14 15 16 17 18 19 20
[1] 5 6 7 8 9 10 11 12 13 14
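
To connect this back to the motivation for unlist(), the sketch below shows that arithmetic on the list itself fails, while the unlisted vectors support it. The last lines use a made-up named list (scores) to show how unlist() builds element names.

```r
list1 <- list(10:20)
list2 <- list(5:14)

# Arithmetic on the list itself raises an error
attempt <- tryCatch(list1 + 1, error = function(e) "error")
print(attempt)  # "error"

v1 <- unlist(list1)
v2 <- unlist(list2)

print(sum(v1))         # 165
print(mean(v2))        # 9.5
print(v1[1:10] + v2)   # element-wise sum of the first ten values of v1 and v2

# unlist() builds element names from the names of a named list
scores <- unlist(list(a = 1, b = c(2, 3)))
print(names(scores))   # "a" "b1" "b2"
```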
