IDS Unit-3
VECTORS
• A vector is simply a list of items that are of the same type.
• A vector is a basic data structure which plays an important role in R programming.
• In R, a sequence of elements that share the same data type is known as a vector. A vector
can hold logical, integer, double, character, complex, or raw data. The elements
contained in a vector are known as its components. We can check the type of a vector
with the typeof() function.
• To combine a list of items into a vector, use the c() function and separate the items by commas.
• Vectors in R are similar to arrays in the C language: they hold multiple data values
of the same type. One key difference is that indexing of a vector in R starts from 1
and not from 0. We can create numeric vectors and character vectors as well.
• The length is an important property of a vector. A vector length is basically the number
of elements in the vector, and it is calculated with the help of the length() function.
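A minimal sketch of the points above, using made-up values. The names nums, words, flags, first_word, and mixed are illustrative only and do not come from the notes.

```r
# Vectors built with c(); every element of a vector has the same type
nums  <- c(10.5, 20, 30)
words <- c("red", "green", "blue")
flags <- c(TRUE, FALSE, TRUE)

typeof(nums)    # "double"
typeof(words)   # "character"
typeof(flags)   # "logical"
typeof(1:3)     # "integer"

# length() gives the number of elements
length(words)   # 3

# Indexing starts at 1, not 0
first_word <- words[1]   # "red"

# Mixing types coerces every element to one common type
mixed <- c(1, "a", TRUE)
typeof(mixed)   # "character"
```

The last two lines show why a vector is described as holding a single type: c() silently converts the numbers and logical values to character strings.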
MATRICES
• A matrix is a rectangular arrangement of values in rows and columns. Rows run
horizontally and columns run vertically.
• In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created
by passing a vector to the matrix() function. On R matrices, we can perform addition,
subtraction, multiplication, and division.
• In an R matrix, elements are arranged in a fixed number of rows and columns. The
elements are usually numbers, but in every case all the elements must share a
common basic type. Internally, a matrix is a vector with a dimension attribute, and
matrix() fills it column by column by default.
CREATING AND NAMING MATRICES
• To create a matrix in R, use the matrix() function. Its first argument is the vector
that supplies the elements.
• You also pass the number of rows and the number of columns you want to
have in your matrix. Note: By default, matrices are filled in column-wise order.
• Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular
layout. They contain elements of the same atomic types. Though we can create a matrix
containing only characters or only logical values, they are not of much use. We use
matrices containing numeric elements to be used in mathematical calculations.
• A Matrix is created using the matrix() function.
Syntax: matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used –
➢ data is the input vector which becomes the data elements of the matrix.
➢ nrow is the number of rows to be created.
➢ ncol is the number of columns to be created.
➢ byrow is a logical value. Matrices are filled column-wise by default; if byrow is
TRUE, the input vector elements are arranged row-wise.
➢ dimnames is the names assigned to the rows and columns (a list of two character
vectors: the row names, then the column names).
Creating a matrix in R:
As with vectors and lists, R provides a function that creates a matrix: matrix(). This function
plays an important role in data analysis. Its syntax is:
matrix(data, nrow, ncol, byrow, dimnames)
data: The first argument is the input vector that supplies the data elements of the matrix.
nrow: The second argument is the number of rows we want in the matrix.
ncol: The third argument is the number of columns we want in the matrix.
byrow: A logical value. If it is TRUE, the input vector elements are arranged by row.
dimnames: The names assigned to the rows and columns.
Example to understand how matrix function is used to create a matrix and arrange the elements
sequentially by row or column.
Ex1: > # Create a matrix
> thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
> # Print the matrix
> thismatrix
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
Ex2: > # Create a matrix and Arranging elements sequentially by row.
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE)
> # Print the matrix
>A
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
[3,] 11 12 13
[4,] 14 15 16
Ex3: > # Create a matrix and Arranging elements sequentially by column.
> B <- matrix(c(5:16), nrow = 4, byrow = FALSE)
> # Print the matrix
>B
[,1] [,2] [,3]
[1,] 5 9 13
[2,] 6 10 14
[3,] 7 11 15
[4,] 8 12 16
Ex4: > # Create a matrix with 4 columns
> C <- matrix(c(5:16), ncol = 4, byrow = TRUE)
> # Print the matrix
>C
[,1] [,2] [,3] [,4]
[1,] 5 6 7 8
[2,] 9 10 11 12
[3,] 13 14 15 16
Ex5: > # Create a matrix and Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> D<-matrix(c(3:14), nrow=4, byrow=TRUE, dimnames=list(row_names, col_names))
> # Print the matrix
>D
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14
Diagonal matrix: A diagonal matrix is a matrix in which the entries outside the main diagonal are all
zero. To create such a matrix the syntax is given below:
Ex: > print(diag(c(5, 3, 3), 3, 3))
[,1] [,2] [,3]
[1,] 5 0 0
[2,] 0 3 0
[3,] 0 0 3
Identity matrix: A square matrix in which all the elements of the principal diagonal are ones and all
other elements are zeros. To create such a matrix the syntax is given below:
Ex: > print(diag(1, 3, 3))
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
Accessing matrix elements in R:
There are three ways to access the elements from the matrix.
1. We can access the element in the nth row and mth column by using [ ] brackets. The
first number in the brackets specifies the row position, while the second number specifies
the column position:
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
>A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> #Accessing element present on 3rd row and 2nd column
> print(A[3,2])
[1] 12
2. We can access all the elements of the matrix which are present on the nth row. The whole
row can be accessed if you specify a comma after the number in the bracket:
> # Accessing element present in 3rd row
> print(A[3,])
col1 col2 col3
11 12 13
3. We can also access all the elements of the matrix which are present on the mth column. The
whole column can be accessed if you specify a comma before the number in the bracket:
> #Accessing element present in 2nd column
> print(A[,2])
row1 row2 row3 row4
6 9 12 15
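Because the matrix above has row and column names, the same three kinds of access can also be written with names instead of positions. The following is a small sketch under that assumption; the variable names elem, row3, col2, and corners are illustrative.

```r
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
A <- matrix(5:16, nrow = 4, byrow = TRUE,
            dimnames = list(row_names, col_names))

elem <- A["row3", "col2"]                # single element: 12
row3 <- A["row3", ]                      # whole row, as a named vector
col2 <- A[, "col2"]                      # whole column, as a named vector
corners <- A[c("row1", "row4"), "col3"]  # rows 1 and 4 of column 3: 7 16
```

Name-based access keeps working if rows or columns are later reordered, whereas numeric positions would then point at different data.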
Modification of the matrix:
R allows us to modify a matrix. There are several ways to do so, which are as follows:
Assign a single element: The first method is to assign a new value to a particular position
of the matrix. The old value at that position is replaced with the new one.
The basic syntax is: matrix[n, m] <- y
Here, n and m are the row and column of the element, respectively, and y is the new value
assigned to that position.
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
>A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> # Assigning value 20 to the element at 3rd row and 2nd column
> A[3,2]<-20
>A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 20 13
row4 14 15 16
Use of Relational Operators: R provides another way to modify a matrix. A relational operator
such as >, <, or == selects every element that satisfies a condition, and the assignment replaces
all of the selected elements at once. Let us see an example of how this method modifies the matrix.
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> A <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
>A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> # Replacing element that equal to the 12
> A[A==12]<-0
>A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 0 13
row4 14 15 16
> # Replacing elements whose values are greater than 10
> A[A>10]<-0
>A
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 0 0 0
row4 0 0 0
Adding Rows and Columns: The third method of matrix modification is to add rows and columns
with the rbind() and cbind() functions, which add a row and a column respectively. Both
functions return a new matrix and leave the original unchanged. Let us see an example to
understand the working of the cbind() and rbind() functions.
Ex: > # Defining the column and row names.
> row_names = c("row1", "row2", "row3", "row4")
> col_names = c("col1", "col2", "col3")
> R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
> R
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
> # Adding row
> rbind(R,c(17,18,19))
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
17 18 19
> # Adding column
> cbind(R,c(17,18,19,20))
col1 col2 col3
row1 5 6 7 17
row2 8 9 10 18
row3 11 12 13 19
row4 14 15 16 20
> # Transpose of the matrix using the t() function:
> t(R)
row1 row2 row3 row4
col1 5 8 11 14
col2 6 9 12 15
col3 7 10 13 16
> # Modifying the dimension of the matrix using the dim() function
> dim(R)<-c(1,12)
>R
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 5 8 11 14 6 9 12 15 7 10 13 16
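In the example above, rbind() and cbind() print a larger matrix but R itself stays 4 by 3, because the result is never assigned. A short sketch of keeping the result and naming the new row or column follows; the names R2, R3, row5, and col4 are illustrative.

```r
R <- matrix(5:16, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

# Assign the result to keep it; a named argument becomes the row/column name
R2 <- rbind(R, row5 = c(17, 18, 19))
R3 <- cbind(R, col4 = c(17, 18, 19, 20))

dim(R)    # 4 3  (the original is unchanged)
dim(R2)   # 5 3
dim(R3)   # 4 4
rownames(R2)[5]   # "row5"
colnames(R3)[4]   # "col4"
```

By contrast, dim(R) <- c(1, 12) does change R in place, and it also discards the row and column names, which is why the reshaped output above shows only default indices.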
Combine Matrices using rbind() Function: We will create two matrices and combine them with
the help of the rbind() function.
Ex: > # create one matrix
> a1<-matrix(1:9,3,3)
> a1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # Create second matrix
> a2<-matrix(10:18,3,3)
> a2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
[3,] 12 15 18
> # combine both matrices
> rbind(a1,a2)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[4,] 10 13 16
[5,] 11 14 17
[6,] 12 15 18
Combine Matrices using cbind() Function: We will create two matrices and combine them with
the help of the cbind() function.
Ex: > # create one matrix
> a1<-matrix(1:9,3,3)
> a1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # Create second matrix
> a2<-matrix(10:18,3,3)
> a2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
[3,] 12 15 18
> # combine both matrices
> cbind(a1, a2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
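Matrix arithmetic: The arithmetic operators work element by element on two matrices of the same
dimensions. The definitions of R and S do not appear in this part of the notes; the outputs below
are consistent with R <- matrix(c(5:16), nrow = 4, ncol = 3) and S <- matrix(c(1:12), nrow = 4, ncol = 3).
Ex: > # Subtraction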
> sub<-R-S
> sub
[,1] [,2] [,3]
[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4
> # Multiplication
> mul=R*S
> mul
[,1] [,2] [,3]
[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192
> # Division
> div=R/S
> div
[,1] [,2] [,3]
[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333
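The sketch below repeats the arithmetic with explicit definitions. It assumes that R and S are the column-wise 4 by 3 matrices shown here; this is an inference from the printed results, not something stated in the notes. It also contrasts the element-wise operator * with the matrix product operator %*%.

```r
# Assumed definitions, chosen to match the outputs above
R <- matrix(5:16, nrow = 4, ncol = 3)
S <- matrix(1:12, nrow = 4, ncol = 3)

add_rs <- R + S   # element-wise addition
sub_rs <- R - S   # every entry is 4
mul_rs <- R * S   # element-wise product, NOT the matrix product
div_rs <- R / S   # element-wise division

# True matrix multiplication uses %*% and needs conformable shapes:
# (4 x 3) %*% (3 x 4) gives a 4 x 4 matrix
mat_prod <- R %*% t(S)
dim(mat_prod)   # 4 4
```

Here mul_rs[1, 2] is 9 * 5 = 45, matching the printed table, while mat_prod[1, 1] is 5*1 + 9*5 + 13*9 = 167.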
Matrix metrics: Once a matrix is created, matrix metrics answer questions such as:
• What is the dimension of the matrix?
• How many rows are there in the matrix?
• How many columns are there in the matrix?
Ex: > # Create A Matrix
> A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
>A
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> # Dimension of the matrix
> print(dim(A))
[1] 3 3
> # Number of rows
> print(nrow(A))
[1] 3
> # Number of columns
> print(ncol(A))
[1] 3
> # Number of elements
> print(length(A))
[1] 9
> print(prod(dim(A)))
[1] 9
Applications of matrix:
1. In geology, matrices are used to record surveys, plot graphs, and compute statistics, and they
are used for similar studies in other fields.
2. A matrix is a representation method that helps in plotting common survey data.
3. In robotics and automation, matrices are the basic elements used to describe robot movements.
4. In economics, matrices are mainly used in calculating gross domestic product, and they
also help in calculating the capability of goods and products.
5. In computer-based applications, matrices play a crucial role in the creation of realistic-seeming
motion.
MATRIX SUBSETTING
A matrix is subset with two arguments within single brackets, [ ], and separated by a comma.
The first argument specifies the rows, and the second the columns.
Ex: > A<-matrix(1:16,4)
>A
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> colnames(A)<-c("C1","C2","C3","C4")
> rownames(A)<-c("R1","R2","R3","R4")
>A
C1 C2 C3 C4
R1 1 5 9 13
R2 2 6 10 14
R3 3 7 11 15
R4 4 8 12 16
> # all rows with 1st column
> A[,1,drop=FALSE]
C1
R1 1
R2 2
R3 3
R4 4
> # 1st row with all column
> A[1,,drop=FALSE]
C1 C2 C3 C4
R1 1 5 9 13
> # display 1st row and 1st column (a single cell value)
> A[1,1,drop=FALSE]
C1
R1 1
> # display 1st , 2nd rows and 2nd ,3rd column
> A[1:2,2:3]
C2 C3
R1 5 9
R2 6 10
> #display 1st , 2nd rows and 2nd, 4th column
> A[1:2,c(2,4)]
C2 C4
R1 5 13
R2 6 14
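The examples above pass drop=FALSE in some calls but not in others. The sketch below shows what that argument changes, and adds two other common forms of subsetting (negative and logical indices). The variable names v, m, without_first, and big_rows are illustrative.

```r
A <- matrix(1:16, 4,
            dimnames = list(c("R1", "R2", "R3", "R4"),
                            c("C1", "C2", "C3", "C4")))

# By default a single row or column is simplified to a vector
v <- A[, 1]
is.matrix(v)   # FALSE

# drop = FALSE keeps the result as a (4 x 1) matrix
m <- A[, 1, drop = FALSE]
is.matrix(m)   # TRUE
dim(m)         # 4 1

# A negative index removes rows or columns
without_first <- A[-1, ]      # rows R2, R3, R4

# A logical condition on one column selects matching rows
big_rows <- A[A[, "C1"] > 2, ]   # rows R3 and R4
```

Using drop = FALSE is a reasonable habit in code that must always receive a matrix, for example when the number of selected columns is not known in advance.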
Note:
The only difference between vectors, matrices, and arrays is the number of dimensions:
✓ Vectors are uni-dimensional arrays
✓ Matrices are two-dimensional arrays
✓ Arrays can have more than two dimensions
CLASS
Classes and objects are basic concepts of object-oriented programming that revolve around real-life
entities. A class is a blueprint for creating objects: it defines the member variables (attributes)
and the methods that are common to all objects of one type. Everything in R is an object. An object
is simply a data structure that has some attributes and methods. Unlike most other programming
languages, R has three class systems: S3, S4, and Reference Classes.
S3 Class:
• The S3 class is somewhat primitive in nature. It lacks a formal definition, and an object of this
class can be created simply by adding a class attribute to it.
• This simplicity is why it is widely used in the R programming language. In fact,
most of R's built-in classes are of this type.
Example: > # create a list with required components
> s <- list(name = "John", age = 21, GPA = 6.5)
> # give a name to your class
> class(s) <- "student"
>s
$name
[1] "John"
$age
[1] 21
$GPA
[1] 6.5
attr(,"class")
[1] "student"
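What makes the class attribute useful is method dispatch: a generic function such as print() looks for a function named after the class. The following is a minimal sketch, assuming we want a one-line display for the "student" class from the example; the function body is illustrative, not part of the notes.

```r
s <- list(name = "John", age = 21, GPA = 6.5)
class(s) <- "student"

# print() is a generic; for an object of class "student"
# it dispatches to a function called print.student, if one exists
print.student <- function(x, ...) {
  cat("Student:", x$name, "| age:", x$age, "| GPA:", x$GPA, "\n")
  invisible(x)
}

print(s)                  # uses print.student
inherits(s, "student")    # TRUE
unclass(s)                # the underlying plain list
```

Nothing checks that an S3 object actually has the expected components, which is the informality the bullet points above refer to.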
S4 Class
• S4 classes are an improvement over S3 classes. They have a formally defined structure, which
ensures that objects of the same class have the same components.
• Class components are properly defined using the setClass() function and objects are created using
the new() function.
Example: > # definition of S4 class
> setClass("student", slots=list(name="character", age="numeric", GPA="numeric"))
> # creating an object using new() by passing class name and slot values
> studentlist<-new("student", name="john", age=25, GPA=6.5)
> studentlist
An object of class "student"
Slot "name":
[1] "john"
Slot "age":
[1] 25
Slot "GPA":
[1] 6.5
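A short sketch of working with the S4 object above: slots are read with @, slot types are checked when an object is created, and behaviour is attached through generics. The generic name describe is hypothetical and chosen only for illustration.

```r
setClass("student",
         slots = list(name = "character", age = "numeric", GPA = "numeric"))
st <- new("student", name = "john", age = 25, GPA = 6.5)

# Slots are accessed (and reassigned) with @
st@age <- 26
isVirtualClass("student")   # FALSE: objects can be created from it

# The formal definition rejects a value of the wrong type
bad <- tryCatch(new("student", name = 42),
                error = function(e) "rejected")

# Methods are defined on generics, not stored inside the object
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "student", function(obj) {
  paste(obj@name, "is", obj@age, "years old")
})
msg <- describe(st)   # "john is 26 years old"
```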
Reference Class
• Reference classes were introduced later than the other two systems. They build on S4
classes, but here the methods belong to the classes. This makes them more similar to the
object-oriented programming we are used to seeing in other major programming languages.
• Reference classes are basically S4 classes with an environment added to them.
• Defining a Reference class is similar to defining an S4 class. We use setRefClass() instead
of setClass() and "fields" instead of "slots".
Example: > # setRefClass returns a generator
> students<-setRefClass("students", fields=list(name="character", age="numeric", GPA="numeric"))
> #now we can use the generator to create objects
> studentlist<-students(name="john", age=25, GPA=6.5)
> studentlist
Reference class object of class "students"
Field "name":
[1] "john"
Field "age":
[1] 25
Field "GPA":
[1] 6.5
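The statement that methods belong to the class can be illustrated with a small sketch. The class name Student and the method birthday are hypothetical; they are used here so as not to redefine the "students" generator from the example.

```r
Student <- setRefClass("Student",
  fields  = list(name = "character", age = "numeric", GPA = "numeric"),
  methods = list(
    # Methods live in the class and modify fields with <<-
    birthday = function() {
      age <<- age + 1
      invisible(.self)
    }
  )
)

a <- Student$new(name = "john", age = 25, GPA = 6.5)

# Assignment does NOT copy: a and b refer to the same object
b <- a
b$birthday()
a$age            # 26

# copy() makes an independent object
a_copy <- a$copy()
a_copy$birthday()
a$age            # still 26
a_copy$age       # 27
```

This reference behaviour is the main practical difference from S3 and S4 objects, which are copied when they are modified.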
ARRAYS
• In R, arrays are data objects that allow us to store data in more than two dimensions. An
array is created with the help of the array() function. This function takes a vector as
input and arranges its values according to the dim parameter.
• For example, if we create an array of dimension (2, 3, 4), it will consist of 4 rectangular
matrices, each with 2 rows and 3 columns.
• R Array Syntax: The syntax of R arrays is as follows:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
✓ data: The data is the first argument in the array() function. It is an input vector which is
given to the array.
✓ row_size: This parameter defines the number of rows in each matrix of the array.
✓ column_size: This parameter defines the number of columns in each matrix of the
array.
✓ matrices: This parameter defines the number of matrices the array consists of.
✓ dim_names: This parameter is a list of names used to replace the default names of the
rows, columns, and matrices.
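The syntax above can be sketched as follows for the (2, 3, 4) case mentioned earlier. The dimension names r1, c1, m1 and so on are made up for the illustration.

```r
# 24 values arranged as 4 matrices of 2 rows and 3 columns
arr <- array(1:24,
             dim = c(2, 3, 4),
             dimnames = list(c("r1", "r2"),
                             c("c1", "c2", "c3"),
                             paste0("m", 1:4)))

dim(arr)                       # 2 3 4
one_value <- arr["r2", "c3", "m1"]   # 6: values fill column by column
fourth    <- arr[, , "m4"]           # the 4th matrix, 2 x 3
dim(fourth)                    # 2 3
```

Note that dim takes a vector built with c(), and dimnames takes a list with one character vector per dimension.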
TYPES of Arrays:
Uni-Dimensional Array: A vector is a uni-dimensional array, which is specified by a single
dimension, length. A Vector can be created using ‘c()‘ function. A list of values is passed to the c()
function to create a vector.
Ex: > # create a vector
> vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> print (vec1)
[1] 1 2 3 4 5 6 7 8 9
> # cat is used to concatenate strings and print it.
> cat ("Length of vector : ", length(vec1))
Length of vector : 9
Multi-Dimensional Array: A two-dimensional matrix is an array specified by a fixed number of
rows and columns, with all elements of the same data type. A multi-dimensional array is created
by using the array() function, to which the values and the dimensions are passed.
Ex: > # arranges data from 2 to 13 in two matrices of dimensions 2x3
> arr = array(2:13, dim = c(2, 3, 2))
> print(arr)
,,1
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 3 5 7
,,2
[,1] [,2] [,3]
[1,] 8 10 12
[2,] 9 11 13
Creation of arrays: In R, array creation is quite simple. We can easily create an array using vectors
and the array() function. In an array, data is stored in the form of matrices. There are only two
steps to create an array, which are as follows:
1. In the first step, we create the vectors (here, two vectors of different lengths).
2. Once our vectors are created, we pass them as input to array(). If no dim argument is given,
as in the example below, the result is a one-dimensional array.
Ex: > # Creating two vectors of different lengths
> vec1 <-c(1,3,5)
> vec2 <-c(10,11,12,13,14,15)
> # Taking these vectors as input to the array
> res <- array(c(vec1,vec2))
> print(res)
[1] 1 3 5 10 11 12 13 14 15
Naming rows and columns: In R, we can give names to the rows, columns, and matrices of
an array. This is done with the help of the dimnames parameter of the array() function. It is not
necessary to name the rows and columns; names are only used to tell the rows and
columns apart for better understanding.
Ex: > # Creating two vectors of different lengths
> vec1 <-c(1,3,5)
> vec2 <-c(10,11,12,13,14,15)
> #Initializing names for rows, columns and matrices
> col_names <- c("Col1","Col2","Col3")
> row_names <- c("Row1","Row2","Row3")
> matrix_names <- c("Matrix1","Matrix2")
> #Taking the vectors as input to the array
> Res<-array(c(vec1,vec2), dim=c(3,3,2), dimnames=list(row_names,col_names, matrix_names))
> print(Res)
, , Matrix1
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15
, , Matrix2
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15
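Matrix1 and Matrix2 above are identical because only 9 values are supplied for an array with 3 x 3 x 2 = 18 cells, so R recycles the input from the start. A minimal sketch of that behaviour, reusing the vectors from the example (the name values is illustrative):

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
values <- c(vec1, vec2)

length(values)    # 9 values supplied

Res <- array(values, dim = c(3, 3, 2))
length(Res)       # 18 cells to fill

# The 9 values are reused, so both matrices hold the same data
identical(Res[, , 1], Res[, , 2])   # TRUE
```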
Accessing arrays: The arrays can be accessed by using indices for different dimensions separated
by commas. Different components can be specified by any combination of elements’ names or positions.
Accessing Uni-Dimensional Array: The elements can be accessed by using indexes of the
corresponding elements.
Ex: > # creating a vector
> vec <- c(1:10)
> # accessing entire vector
> cat ("Vector is : ", vec)
Vector is : 1 2 3 4 5 6 7 8 9 10
> # accessing elements
> cat ("Third element of vector is : ", vec[3])
Third element of vector is : 3
Access Entire Row or Column:
Ex: > # create an array of two 2 by 3 matrices
> array1 <- array(c(1:12), dim = c(2,3,2))
> print(array1)
,,1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
,,2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
> # access entire elements at 2nd column of 1st matrix
> cat("\n2nd Column Elements of 1st matrix:",array1[,c(2),1])
2nd Column Elements of 1st matrix: 3 4
> # access entire elements at 1st row of 2nd matrix
> cat("\n1st Row Elements of 2nd Matrix:", array1[c(1), ,2])
1st Row Elements of 2nd Matrix: 7 9 11
Manipulating Array Elements: As an array is made up of matrices in multiple dimensions, operations
on the elements of an array are carried out by accessing the elements of its matrices.
Ex: > # Create two vectors of different lengths.
> vector1 <- c(5,9,3)
> vector2 <- c(10,11,12,13,14,15)
> # Take these vectors as input to the array.
> array1 <- array(c(vector1,vector2),dim = c(3,3,2))
> array1
,,1
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
,,2
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
> # Create two vectors of different lengths.
> vector3 <- c(9,1,0)
> vector4 <- c(6,0,11,3,14,1,2,6,9)
> array2 <- array(c(vector3,vector4),dim = c(3,3,2))
> array2
,,1
[,1] [,2] [,3]
[1,] 9 6 3
[2,] 1 0 14
[3,] 0 11 1
,,2
[,1] [,2] [,3]
[1,] 2 9 6
[2,] 6 1 0
[3,] 9 0 11
> # create matrices from these arrays.
> matrix1 <- array1[,,2]
> matrix1
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
> matrix2 <- array2[,,2]
> matrix2
[,1] [,2] [,3]
[1,] 2 9 6
[2,] 6 1 0
[3,] 9 0 11
> # Add the matrices.
> result <- matrix1+matrix2
> print(result)
[,1] [,2] [,3]
[1,] 7 19 19
[2,] 15 12 14
[3,] 12 12 26
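Besides extracting matrices one at a time, calculations can be run across a whole array with apply(), which takes the array, the dimension(s) to keep, and a function. This is a sketch using the same values as array1 above; the result names are illustrative.

```r
array1 <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

# Keep dimension 3: one total per matrix
matrix_totals <- apply(array1, 3, sum)      # 92 92

# Keep dimensions 1 and 2: add the matrices cell by cell
cell_sums <- apply(array1, c(1, 2), sum)    # a 3 x 3 matrix

# Arithmetic on the whole array is element-wise
doubled <- array1 * 2
```

Here cell_sums is the same as array1[, , 1] + array1[, , 2], which is the manual approach used in the example above.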
FACTORS
INTRODUCTION TO FACTORS
• Factors in the R programming language are data structures used to represent categorical
data and store it as levels.
• They are stored as integers with a corresponding label for every unique integer. Though factors
may look similar to character vectors, they are integers underneath, and care must be taken
when using them as strings. A factor accepts only a restricted number of distinct values. For
example, a data field such as gender may contain only the values female and male.
Attributes of a factor
There are the following attributes of a factor in R
SUMMARIZING A FACTOR
The summary() function in R returns the results of basic statistical calculations (minimum, 1st quartile,
median, mean, 3rd quartile, and maximum) for a numeric vector. For a factor, summary(x) instead
returns the number of elements in each level, as the example below shows.
Ex: > # Creating a factor
> v <- gl(3, 4, labels = c("A", "B","C"))
> print(v)
[1] A A A A B B B B C C C C
Levels: A B C
> # summary of a factor is
> summary(v)
A B C
4 4 4
ORDERED FACTORS
Level Ordering of Factors: Factors are data objects used to categorize data and store it as levels. The
input values can be strings or integers. Factors are useful for columns that have a limited number of
unique values. Factors in R are created using the factor() function, which takes a vector as input. The
c() function is used to create a vector with explicitly provided values.
Ex: > # Creating a vector as input.
> x <- c("Pen", "Pencil", "Brush", "Pen", "Brush", "Brush", "Pencil", "Pencil")
> print(x)
[1] "Pen" "Pencil" "Brush" "Pen" "Brush" "Brush" "Pencil" "Pencil"
> # Checking whether x is a factor or not
> print(is.factor(x))
[1] FALSE
> # Convert vector to factor
> factor_x = factor(x)
> # print levels of factor
> levels(factor_x)
[1] "Brush" "Pen" "Pencil"
In the above code, x is a vector with 8 elements. To convert it to a factor, the function factor() is used.
The resulting factor has 8 elements and 3 levels. Levels are the unique values in the data and can be
found using the levels() function.
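Since a factor is stored as integer codes plus labels, it helps to look at both. The sketch below reuses x from the example and then shows a common pitfall with number-like labels; the factor f is a made-up illustration.

```r
x <- c("Pen", "Pencil", "Brush", "Pen", "Brush", "Brush", "Pencil", "Pencil")
factor_x <- factor(x)

# Levels are sorted alphabetically: Brush = 1, Pen = 2, Pencil = 3
codes <- as.integer(factor_x)   # 2 3 1 2 1 1 3 3
nlevels(factor_x)               # 3
counts <- table(factor_x)       # Brush 3, Pen 2, Pencil 3

# Pitfall: converting a factor with numeric-looking labels
f <- factor(c("10", "20", "10"))
wrong <- as.numeric(f)                 # 1 2 1   (the codes)
right <- as.numeric(as.character(f))   # 10 20 10 (the labels)
```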
Ordering Factor Levels: An ordered factor is an extension of a factor in which the levels are arranged in
increasing order. We use the factor() function with the levels argument and ordered = TRUE.
Syntax: factor(data, levels = c("name1", "name2", ...), ordered = TRUE)
The parameters are
• data: input vector with explicitly defined values.
• levels: the list of levels, in increasing order, given with the c() function.
• ordered: set to TRUE to enable ordering.
Ex: > # creating a vector
> size = c("small", "large", "large", "small", "medium", "large", "medium", "medium", "large")
> # converting to factor
> size_factor <- factor(size)
> print(size_factor)
[1] small large large small medium large medium medium large
Levels: large medium small
> # ordering the levels
> ordered_size <- factor(size, levels = c("small", "medium", "large"), ordered = TRUE)
> print(ordered_size)
[1] small large large small medium large medium medium large
Levels: small < medium < large
> summary(ordered_size)
small medium large
2 3 4
In the above code, the size vector is created using the c() function. Then it is converted to a factor.
For ordering, the factor() function is used along with the arguments described above. Thus, the sizes
are arranged in order.
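The practical benefit of an ordered factor is that its values can be compared and sorted by level. This is a small sketch reusing the size data from the example; the result names are illustrative.

```r
size <- c("small", "large", "large", "small", "medium",
          "large", "medium", "medium", "large")
ordered_size <- factor(size,
                       levels = c("small", "medium", "large"),
                       ordered = TRUE)

is.ordered(ordered_size)                     # TRUE

# Comparisons follow the level order, not alphabetical order
at_least_medium <- ordered_size >= "medium"
n_medium_up <- sum(at_least_medium)          # 7

largest <- as.character(max(ordered_size))   # "large"
sorted  <- sort(ordered_size)                # small ... medium ... large
```

With an unordered factor, the comparison ordered_size >= "medium" would not be meaningful; R returns NA with a warning in that case.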
DATA FRAMES
INTRODUCTION TO DATA FRAME
A data frame is a table, a two-dimensional array-like structure in which each column contains the values
of one variable and each row contains one set of values from each column. A data frame is a special case
of a list in which every component has the same length. Unlike a matrix, which can contain only one
type of data, a data frame can hold a different data type in each column, such as numeric, character,
or factor.
A data frame is used to store a data table. In simple terms, it is a list of equal-length vectors.
Following are the characteristics of a data frame.
• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or character type.
• Each column should contain same number of data items.
Creating Data Frame: In R, data frames are created with the help of the data.frame() function. This
function accepts vectors of any type, such as numeric, character, or integer. In the example below, we
create a data frame that contains student id (integer vector), student name (character vector), age
(numeric vector), and date of birth (Date vector).
Ex: > # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
Getting the structure of R Data Frame: In R, we can inspect the structure of a data frame. R provides
a built-in function called str() which displays the data along with its complete structure. In the example
below, we create a data frame from vectors of different data types and display its structure.
Ex: > # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # Printing the structure of data frame.
> str(student.data)
'data.frame' : 5 obs. of 4 variables:
$ student_id : int 1 2 3 4 5
$ student_name : chr "Rohan" "Rohith" "David" "Mary" ...
$ age : num 18 20 19 21 22
$ date_of_birth : Date, format: "2003-01-01" "2002-12-23" "2003-09-25" "2001-10-05"...
Extract Data from Data Frame: To manipulate the data in a data frame, we first need to extract it.
We can extract the data in three ways, which are as follows:
1. We can extract specific columns from a data frame using the column name.
2. We can extract specific rows from a data frame.
3. We can extract specific rows corresponding to specific columns.
Let's see an example of each one to understand how data is extracted from the data frame with the help
of these ways.
• Extract specific column from a data frame using column name.
> # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
> # Extract Specific columns.
> result <-data.frame(student.data$student_id, student.data$student_name)
> print(result)
student.data.student_id student.data.student_name
1 1 Rohan
2 2 Rohith
3 3 David
4 4 Mary
5 5 James
• Extracting the specific rows from a data frame
> # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
> # Extracting first row from a data frame
> result<-student.data[1,]
> result
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
> # Extracting 3rd to 5th rows from a data frame
> result<-student.data[3:5, ]
> result
student_id student_name age date_of_birth
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
• Extracting specific rows corresponding to specific columns
> # Creating the data frame.
> student.data <- data.frame (
student_id = c(1:5),
student_name= c("Rohan", "Rohith", "David", "Mary", "James"),
age=c(18, 20, 19, 21, 22),
date_of_birth = as.Date(c("2003-01-01", "2002-12-23", "2003-09-25", "2001-10-05", "2003-09-07"))
)
> # print the data frame
> print(student.data)
student_id student_name age date_of_birth
1 1 Rohan 18 2003-01-01
2 2 Rohith 20 2002-12-23
3 3 David 19 2003-09-25
4 4 Mary 21 2001-10-05
5 5 James 22 2003-09-07
> #Extract 3rd and 5th row with 2nd and 4th column
> result<-student.data[c(3,5),c(2,4)]
> result
student_name date_of_birth
3 David 2003-09-25
5 James 2003-09-07
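Two further ways of extracting data, sketched under the assumption that the same student.data frame is used (the date column is left out here for brevity). Selecting columns by name inside [ ] keeps the original column names, unlike the data.frame(student.data$...) form above, and a logical condition can select rows. The result names are illustrative.

```r
student.data <- data.frame(
  student_id   = 1:5,
  student_name = c("Rohan", "Rohith", "David", "Mary", "James"),
  age          = c(18, 20, 19, 21, 22)
)

# Columns by name: the result keeps the names student_id, student_name
id_name <- student.data[, c("student_id", "student_name")]
names(id_name)

# A single column as a plain vector ($ and [[ ]] are equivalent here)
ages <- student.data$age
same <- identical(ages, student.data[["age"]])   # TRUE

# Rows that satisfy a condition, restricted to two columns
older <- student.data[student.data$age > 19, c("student_name", "age")]
older   # Rohith, Mary, James
```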
Summary of Data in Data Frame: The statistical summary and nature of the data can be obtained by
applying the summary() function. This function takes the data frame as a parameter and returns
summary statistics for each column.
Ex: > # Create the data frame.
> emp.data<-data.frame (
emp_id=c(1:15),
emp_name=c("Rick", "Dan", "Michelle", "Ryan", "Gary", "Chetan", "Nikhil", "Chithra",
"Shan", "Vidya", "Deepak", "Eshan", "Visha", "Benny", "Adi"),
salary=c(623.3, 515.2, 611.0, 729.0, 843.25, 623.3, 515.2, 611.0, 729.0, 843.25, 623.3, 515.2,
611.0, 729.0, 843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27",
"2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27", "2012-01-01",
"2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
stringsAsFactors = FALSE
)
> # Print the emp data.
> print(emp.data)
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
6 6 Chetan 623.30 2012-01-01
7 7 Nikhil 515.20 2013-09-23
8 8 Chithra 611.00 2014-11-15
9 9 Shan 729.00 2014-05-11
10 10 Vidya 843.25 2015-03-27
11 11 Deepak 623.30 2012-01-01
12 12 Eshan 515.20 2013-09-23
13 13 Visha 611.00 2014-11-15
14 14 Benny 729.00 2014-05-11
15 15 Adi 843.25 2015-03-27
> # Print the summary.
> print(summary(emp.data))
emp_id emp_name salary start_date
Min. :1.0 Length:15 Min. :515.2 Min. :2012-01-01
1st Qu. :4.5 Class :character 1st Qu. :611.0 1st Qu. :2013-09-23
Median :8.0 Mode :character Median :623.3 Median :2014-05-11
Mean :8.0 Mean :664.4 Mean :2014-01-14
3rd Qu. :11.5 3rd Qu. :729.0 3rd Qu. :2014-11-15
Max. :15.0 Max. :843.2 Max. :2015-03-27
> # display number of rows
> nrow(emp.data)
[1] 15
> # display number of columns
> ncol(emp.data)
[1] 4
> # display both no. of rows and col
> dim(emp.data)
[1] 15 4
> # print the first 6 rows in the data frame
> head(emp.data)
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
6 6 Chetan 623.30 2012-01-01
> # print the last 6 rows in the data frame
> tail(emp.data)
emp_id emp_name salary start_date
10 10 Vidya 843.25 2015-03-27
11 11 Deepak 623.30 2012-01-01
12 12 Eshan 515.20 2013-09-23
13 13 Visha 611.00 2014-11-15
14 14 Benny 729.00 2014-05-11
15 15 Adi 843.25 2015-03-27
SUBSETTING OF A DATA FRAME
The subset() function in R is used to create subsets of a data frame. It
can also be used to drop columns from a data frame.
Ex: > # Create the data frame.
> emp.data<-data.frame (
emp_id=c(1:5),
emp_name=c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
stringsAsFactors = FALSE
)
> # print the data frame
> emp.data
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
> subset(emp.data, emp_id == 3)
emp_id emp_name salary start_date
3 3 Michelle 611 2014-11-15
> # rows whose emp_id is 1, 2 or 3 (%in% tests membership; == would recycle c(1:3))
> subset(emp.data, emp_id %in% c(1:3))
emp_id emp_name salary start_date
1 1 Rick 623.3 2012-01-01
2 2 Dan 515.2 2013-09-23
3 3 Michelle 611.0 2014-11-15
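The text mentions that subset() can also drop columns, which the example does not show. A minimal sketch using the select argument follows, with the same emp.data values; the result names are illustrative.

```r
emp.data <- data.frame(
  emp_id     = 1:5,
  emp_name   = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary     = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Filter rows and keep only two columns
well_paid <- subset(emp.data, salary > 600, select = c(emp_name, salary))

# Drop a column with a minus sign inside select
no_date <- subset(emp.data, select = -start_date)

# Conditions can be combined with & and |
picked <- subset(emp.data, emp_id %in% c(1, 3) & salary > 600)
```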
[[2]]
[[2]][[1]]
[1] 1
[[2]][[2]]
[1] 3
[[2]][[3]]
[1] 5
[[2]][[4]]
[1] 7
[[2]][[5]]
[1] 9
> # Merging the two lists using c() function.
> newlist <- c(Even_list, Odd_list)
> print(newlist)
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 8
[[5]]
[1] 10
[[6]]
[1] 1
[[7]]
[1] 3
[[8]]
[1] 5
[[9]]
[1] 7
[[10]]
[1] 9
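The definitions of Even_list and Odd_list are not shown in this part of the notes. Assuming they are plain lists of the even and odd numbers seen in the output, the merge can be reproduced as below. The sketch also contrasts c(), which gives one flat list, with list(), which nests the two lists; the names nested and flat_vector are illustrative.

```r
# Assumed definitions, inferred from the printed output
Even_list <- list(2, 4, 6, 8, 10)
Odd_list  <- list(1, 3, 5, 7, 9)

# c() joins the elements into one flat list of 10 items
newlist <- c(Even_list, Odd_list)
length(newlist)     # 10
newlist[[6]]        # 1

# list() keeps the two lists as separate components
nested <- list(Even_list, Odd_list)
length(nested)      # 2
nested[[2]][[1]]    # 1

# unlist() flattens a list into an ordinary vector
flat_vector <- unlist(newlist)
```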