Unit I - Introduction to R
Banking
A large amount of customer data is generated every day in banks. When dealing with millions of
customers on a regular basis, it becomes hard to track their mortgages.
Solution
With R, we can build a custom model that tracks the loans provided to each individual customer, which
helps us to decide the amount to be paid by the customer over time.
Insurance
Insurance depends extensively on forecasting. It is difficult to decide which policy to accept or
reject.
Solution
By using the continuous credit report as input, we can create a model in R that not only assesses
risk appetite but also makes a predictive forecast.
Healthcare
Every year millions of people are admitted to hospitals, and billions are spent annually on the
admission process alone.
Solution
Given the patient history and medical history, a predictive model can be built to identify who is
at risk for hospitalization and to what extent the medical equipment should be scaled.
Business Analytics
Business analytics is the process of examining large sets of data to uncover hidden
patterns, correlations and other insights. It helps us to understand all the data that we
have gathered, be it organizational data, market or product research data or any other kind of
data. This makes it easier for us to make better decisions, build better products, design better
marketing strategies etc. Refer to the image below for a better understanding:
If we look at the above figure, our data in the first image is scattered. Now, if we want something
specific, such as a particular record in a database, finding it becomes cumbersome. To simplify this, we
need analysis. With analysis, it becomes easy to find correlations in the data. Once we
have established what to do, it becomes quite easy for us to make decisions such as which path
we want to follow or, in terms of business analytics, which path will lead to the betterment of our
organization. But we can’t expect people further up the chain to always understand the raw data
that we provide them after analytics. To bridge this gap, we have the concept of data
visualization.
Data visualization
Data visualization provides visual access to the huge amounts of data that we have generated
after analytics. The human mind processes visual images and graphics better than
raw data. It is always easier for us to understand a pie chart or a bar graph than
raw numbers.
R
PowerBI
Spark
Qlikview etc.
Features of R
R supports procedural programming with functions and object-oriented
programming with generic functions. Procedural programming includes procedures,
records, modules, and procedure calls, while object-oriented programming
includes classes, objects, and functions.
Packages are part of R programming. They are useful for collecting sets of R functions into
a single unit.
R is a well-developed, simple, and effective programming language that includes
conditionals, loops, user-defined recursive functions and input and output facilities.
R has an effective data handling and storage facility.
R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
R provides a large, coherent and integrated collection of tools for data analysis. It
provides graphical facilities for data analysis and display, either directly on the computer
screen or printed on paper.
R programming features include database input, exporting data, viewing data, variable
labels, missing data, etc.
R is an interpreted language. So we can access it through a command line interpreter.
R supports matrix arithmetic.
R, SAS, and SPSS are three statistical languages. Of these three, R is
the only open-source one.
In conclusion, R is one of the world’s most widely used statistical programming languages. It is a good
choice for data scientists and is supported by a vibrant and talented community of contributors.
Free tools of R
RStudio
StatET
ESS (Emacs Speaks Statistics)
R Commander
JGR (Java GUI for R)
CRAN
CRAN (the Comprehensive R Archive Network) is the centralized repository that holds R itself
and its contributed tools and packages.
R Studio
R Studio is an Integrated Development Environment (IDE) for the R language with an advanced and
more user-friendly GUI. It allows the user to run R in a more convenient environment. It
is open source (i.e. free) and available at http://www.rstudio.com/
USE CASES OF R
The Consumer Financial Protection Bureau uses R for data analysis.
Statisticians at John Deere use R for time series modelling and geospatial analysis in a
reliable and reproducible way.
Bank of America uses R for reporting.
R is part of the technology stack behind Foursquare’s famed recommendation engine.
ANZ, the fourth largest bank in Australia, uses R for credit risk analysis.
Google uses R to predict economic activity.
Mozilla, the foundation responsible for the Firefox web browser, uses R to visualize web
activity.
Numeric
Decimal values are called numeric in R. Numeric is the default data type for numbers in R. Real
numbers, with or without a decimal point, are represented using this data type, which stores values as
double-precision floating-point numbers.
Eg: 3, 6.7, 121
Integer
The integer data type consists of the set of all integers. We can create a value of, or convert a value
to, the integer type using the as.integer() function. We can also use the capital ‘L’ suffix to denote
that a particular value is of the integer data type.
Eg: 2L, 42L
Logical
Logical data types take a value of either TRUE or FALSE. A logical value is often created via a
comparison between variables; the two possible values are represented by TRUE and FALSE.
Complex
The complex data type is used to store numbers with an imaginary component.
Eg: 7 + 5i
Character
R character data types store character values or strings. Strings in R can contain letters, numbers, and
symbols.
A string is denoted by wrapping the value in single or double quotes.
Eg: "J", "Jesus Loves you", '10'
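The basic data types described above can be checked with class(). The following is a minimal sketch; the variable names and values are illustrative choices, not taken from the original notes.

```r
# One illustrative value for each basic data type (names and values are arbitrary)
num_val <- 6.7                 # numeric (double precision)
int_val <- 42L                 # integer, written with the L suffix
lgl_val <- 3 < 5               # logical, created by a comparison
cpx_val <- 7 + 5i              # complex
chr_val <- "Jesus Loves you"   # character

print(class(num_val))   # "numeric"
print(class(int_val))   # "integer"
print(class(lgl_val))   # "logical"
print(class(cpx_val))   # "complex"
print(class(chr_val))   # "character"

# A whole number typed without the L suffix is still stored as numeric
print(class(3))               # "numeric"
print(class(as.integer(3)))   # "integer"
```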
Input/Output statements in R
To get input from the user, there are two methods in R.
1. Using readline() method
2. Using scan() method
Using readline() method
In R, the readline() method reads input as a string. To convert the input to the
desired data type, R provides functions such as,
as.integer(n) -> convert to integer
as.numeric(n) -> convert to numeric type (float, double etc.)
as.complex(n) -> convert to complex number (e.g. 3+2i)
as.Date(n) -> convert to date, etc.
Example 1.1 R program to illustrate getting input from the user
# R program to illustrate getting input from the user
print("Enter the value")
var = readline();
# convert the inputted value to an integer
var = as.integer(var);
# print the value
print("The integer value is ");
print(var);
output
> source("C:/Christy/BCA/jk13.R")
[1] "Adding two numbers"
Enter the first value : 12
Enter the first value : 23
[1] "The Result is "
[1] 35
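The output above appears to come from a program that adds two numbers, which is not reproduced in these notes. A minimal sketch of such a program is given here as an assumption: add_inputs() is a hypothetical helper, and the prompt text is a guess based on the output. The readline() calls only run in an interactive session.

```r
# Convert two pieces of text input to integers and add them.
# add_inputs() is a hypothetical helper, not part of the original notes.
add_inputs <- function(first, second) {
  as.integer(first) + as.integer(second)
}

# readline() only waits for the user in an interactive session
if (interactive()) {
  print("Adding two numbers")
  a <- readline(prompt = "Enter the first value : ")
  b <- readline(prompt = "Enter the second value : ")
  print("The Result is ")
  print(add_inputs(a, b))
}
```

Keeping the conversion in a small function makes it possible to check the arithmetic without typing at the console, for example add_inputs("12", "23").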
Using scan() method
Another way to get user input in R is the scan() method. This method
gets input from the console. It is very handy when inputs need to be taken quickly
for a mathematical calculation or for a dataset. This method reads data in the form of a vector or list,
and it can also read input from a file.
scan() takes input continuously; to terminate the input process, we need to press the Enter key
twice on the console.
# double input using 'scan()'
print("Enter double value")
d = scan(what = double())
# string input using 'scan()'
print("Enter a string")
s = scan(what = " ")
output :
> source("C:/Christy/BCA/14.R")
[1] "Enter double value"
1: 23.78
2:
Read 1 item
[1] "Enter a string"
1: Karthikajesus
2:
Read 1 item
[1] "Enter a character"
1: J
2:
Read 1 item
[1] 23.78
[1] "Karthikajesus"
[1] "J"
1.4 Data Structures in R
A data structure is a particular way of organizing data in a computer so that it can be used
effectively. The idea is to reduce the space and time complexities of different tasks. The data
structures in R are as follows:
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Factors
6. Data Frames
R - Vector
A vector is a one-dimensional array, specified by a single dimension, its length. A vector can
be created using the c() function. A list of values is passed to c() to create the vector.
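As a minimal sketch of vector creation (the values are chosen to match the output printed in these notes; the variable name is an assumption):

```r
# Create a numeric vector by passing the values to c()
X <- c(1, 3, 5, 7, 8)
print(X)

# The length is the only dimension of a vector
print(length(X))
```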
output
> source("C:/Christy/BCA/first.R")
[1] 1 3 5 7 8
R- Lists
A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous
data structures. They are also one-dimensional data structures. A list can be a list of vectors, a list of
matrices, a list of characters, a list of functions and so on.
# numeric vector for employee IDs
empId = c(1, 2, 3, 4)
# character vector for employee name
empName = c("Kiran", "Irfan", "Bala", "Chaitra")
# number of employees
numberOfEmp = 4
# We can combine all these three different data types into a list
# containing the details of employees which can be done using a list command
empList = list(empId, empName, numberOfEmp)
print(empList)
output
> source("C:/Christy/BCA/17.R")
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Kiran" "Irfan" "Bala" "Chaitra"
[[3]]
[1] 4
R – Matrices
Creating a Matrix in R
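matrix() is used to create matrices in R. The general syntax is,
matrix(data, nrow, ncol, byrow, dimnames)
where
data The elements of the matrix, given as a vector
nrow Number of rows
ncol Number of columns
byrow If TRUE, the matrix is filled row by row; the default is FALSE (column by column)
dimnames Names for the rows and columns; default value = NULL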
Example 1.7 Matrix Example
# Matrix Example
Mat = matrix(c(1:16), nrow = 4, ncol = 4)
print(Mat)
Output
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
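By default matrix() fills its values column by column; with byrow = TRUE they are filled row by row. A minimal sketch, assuming the same six letters used elsewhere in these notes and a hypothetical variable name, that gives a character matrix with 2 rows and 3 columns:

```r
# Fill a 2 x 3 character matrix row by row
M_byrow <- matrix(c('a', 'a', 'b', 'c', 'b', 'a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M_byrow)
```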
Output
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "c" "b" "a"
M = matrix( c('a','a','b','c','b','a'),nrow=3,byrow=FALSE)
print(M)
output
[,1] [,2]
[1,] "a" "c"
[2,] "a" "b"
[3,] "b" "a"
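Row and column names can be attached to a matrix through the dimnames argument. The sketch below is an assumption about how a named 4 x 3 matrix of the values 3 to 14 could be built; the variable names are hypothetical.

```r
# Names for the rows and columns (hypothetical variable names)
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")

# Fill the values 3..14 row by row and attach the names
P <- matrix(c(3:14), nrow = 4, byrow = TRUE,
            dimnames = list(row_names, col_names))
print(P)

# Elements can then be looked up by name
print(P["row2", "col3"])
```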
output
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14
R – Array
Arrays are data storage structures defined by a fixed number of dimensions. Arrays are used for
the allocation of space at contiguous memory locations. In R, one-dimensional
arrays are called vectors, with the length being their only dimension. Two-dimensional arrays are called
matrices, consisting of fixed numbers of rows and columns. While matrices are confined to two
dimensions, arrays can have any number of dimensions. All elements of an R array are of the same
data type.
Creating an Array
An R array can be created with the array() function. A list of elements is passed to
array() along with the required dimensions. It takes the general syntax,
array(data, dim = c(nrow, ncol, nmat), dimnames = NULL)
where
nrow Number of rows
ncol Number of columns
nmat Number of matrices of dimensions nrow * ncol
dimnames Default value = NULL
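A minimal sketch of array(), assuming the values 2 to 13 arranged as two matrices of 2 rows and 3 columns each (the variable name is hypothetical):

```r
# 12 values arranged as 2 rows x 3 columns x 2 matrices
arr <- array(2:13, dim = c(2, 3, 2))
print(arr)

# Index order is [row, column, matrix]
print(arr[2, 3, 2])
```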
,,1
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 3 5 7
,,2
[,1] [,2] [,3]
[1,] 8 10 12
[2,] 9 11 13
R Factors
Factors in R are data structures used to categorize data, or to represent categorical
data, and store it at multiple levels.
A factor accepts only a restricted number of distinct values. These distinct values are known as
levels. After a factor is created, its levels are by default sorted alphabetically.
For example, a data field such as gender may contain only the values female, male, or transgender.
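A minimal sketch of creating a factor with factor(), assuming a small made-up gender vector (the data and variable names are illustrative only):

```r
# A character vector with repeated categories (made-up data)
gender <- c("female", "male", "male", "transgender", "female")

# Convert it to a factor; the distinct values become the levels
gender_factor <- factor(gender)
print(gender_factor)

# Levels are sorted alphabetically by default
print(levels(gender_factor))

# Count how many observations fall in each level
print(table(gender_factor))
```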
R – Data Frames
Data frames in R are generic data objects that are used to store tabular data. A data
frame can be thought of as a matrix in which each column can be of a different data type.
A data frame is made up of three principal components: the data, the rows, and the columns.
Unlike in a matrix, each column of a data frame can contain a different
type of data. The first column can be numeric while the second column can be character and the third
column can be logical. A data frame is a list of vectors of equal length. Data frames are created using the
data.frame() function.
Output
1 Male 152.0 81 42
2 Male 171.5 93 38
3 Female 165.0 78 26
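A data frame is built by passing equal-length vectors to data.frame(), one per column. The following is a minimal sketch, assuming three columns named std_id, std_name and marks; the variable name students is hypothetical.

```r
# Each argument becomes one column; all vectors must have the same length
students <- data.frame(
  std_id   = c(1:5),
  std_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  marks    = c(623.30, 515.20, 611.00, 729.00, 843.25)
)
print(students)

# str() shows that each column keeps its own data type
str(students)
```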
Output
std_id std_name marks
1 1 Rick 623.30
2 2 Dan 515.20
3 3 Michelle 611.00
4 4 Ryan 729.00
5 5 Gary 843.25
Operators
An operator is a symbol that tells the interpreter to perform specific mathematical or logical
manipulations. The R language is rich in built-in operators and provides the following types of
operators.
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
Miscellaneous Operators
Arithmetic Operators:
The following table shows the arithmetic operators supported by the R language. The operators act on each
element of the vector.
Addition +
Subtraction -
Multiplication *
Division /
Exponentiation ^
Modulo %%
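The arithmetic operators act element by element on vectors. A minimal sketch, assuming two short vectors vec1 and vec2 (the names and values are illustrative assumptions):

```r
# Two numeric vectors of the same length
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Each operator is applied to the corresponding elements
cat("Addition of vectors :", vec1 + vec2, "\n")
cat("Subtraction of vectors :", vec1 - vec2, "\n")
cat("Multiplication of vectors :", vec1 * vec2, "\n")
cat("Division of vectors :", vec1 / vec2, "\n")
cat("Modulo of vectors :", vec1 %% vec2, "\n")
cat("Power operator :", vec1 ^ vec2, "\n")
```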
Output
Addition of vectors : 2 5
Subtraction of vectors : -2 -1
Multiplication of vectors : 0 6
Division of vectors : 0 0.6666667
Modulo of vectors : 0 2
Power operator : 0 8
Logical Operators
Logical operators in R perform element-wise decision operations, based on the specified operator
between the operands, which are then evaluated to either a TRUE or FALSE logical value. Any
non-zero numeric value is considered TRUE, be it a real or complex number.
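A minimal sketch of the logical operators, assuming one numeric and one logical vector (names and values are illustrative). The operators & and | work element by element, while && and || compare single values; recent versions of R require their operands to have length one, so the sketch passes the first elements explicitly.

```r
vec1 <- c(0, 2)          # 0 is treated as FALSE, non-zero as TRUE
vec2 <- c(TRUE, FALSE)

# Element-wise operators return one result per pair of elements
cat("Element wise AND :", vec1 & vec2, "\n")
cat("Element wise OR :", vec1 | vec2, "\n")

# && and || work on single values only
cat("Logical AND :", vec1[1] && vec2[1], "\n")
cat("Logical OR :", vec1[1] || vec2[1], "\n")

# Negation flips each element
cat("Negation :", !vec1, "\n")
```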
Output
Element wise AND : FALSE FALSE
Element wise OR : TRUE TRUE
Logical AND : FALSE
Logical OR : TRUE
Negation : TRUE FALSE
Relational Operators
(<, <=, >, >=, ==, !=)
The relational operators in R carry out comparison operations between the corresponding
elements of the operands. Each comparison returns TRUE if the element of the first operand satisfies
the relation with respect to that of the second, and FALSE otherwise.
Less than (<)
Returns TRUE if the corresponding element of the first operand is less than that of the second
operand. Else returns FALSE.
Less than or equal to (<=)
Returns TRUE if the corresponding element of the first operand is less than or equal to that of the
second operand. Else returns FALSE.
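The comparisons work element by element and return a logical vector. A minimal sketch, using two made-up vectors (the names and values are assumptions for illustration):

```r
vec1 <- c(0, 2, 5)
vec2 <- c(2, 2, 3)

print(vec1 < vec2)    # TRUE FALSE FALSE
print(vec1 <= vec2)   # TRUE TRUE FALSE
print(vec1 > vec2)    # FALSE FALSE TRUE
print(vec1 >= vec2)   # FALSE TRUE TRUE
print(vec1 == vec2)   # FALSE TRUE FALSE
print(vec1 != vec2)   # TRUE FALSE TRUE
```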
Miscellaneous Operators
%in% Operator
Checks whether an element belongs to a vector or list and returns TRUE if the value is present, else
FALSE.
%*% Operator
This operator performs matrix multiplication; for example, it can be used to multiply a matrix by its transpose.
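A minimal sketch of both miscellaneous operators, assuming a small made-up vector and matrix (all names and values are illustrative):

```r
# %in% checks membership
vals <- c(1, 3, 5)
print(3 %in% vals)   # TRUE
print(4 %in% vals)   # FALSE

# %*% multiplies matrices; here a 2 x 3 matrix by its 3 x 2 transpose
mat <- matrix(1:6, nrow = 2, ncol = 3)
product <- mat %*% t(mat)
print(product)       # a 2 x 2 matrix
```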
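# Illustrative values, one of each basic data type
numeric.var = 6.7
character_var = "J"
logical_var = TRUE
complex_var = 7 + 5i
# Print each value together with its class
print(numeric.var)
print(class(numeric.var))
print(character_var)
print(class(character_var))
print(logical_var)
print(class(logical_var))
print(complex_var)
print(class(complex_var))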
Coercion
In R, coercion is the process of converting a value from one data type to another. There are two
types of coercion:
1) implicit coercion
2) explicit coercion
Implicit coercion happens automatically when an operation requires it. Explicit coercion requires
the programmer to specify the type of conversion using specific functions. Coercing atomic vectors
removes attributes. The hierarchy for coercion, from least to most general, is logical, integer, numeric,
complex, and character.
Examples of coercion
c(1.5, "hello"): The numeric 1.5 is coerced into the character data type, resulting in c("1.5", "hello").
c(TRUE, 1.5): TRUE is coerced to the numeric 1, resulting in c(1, 1.5).
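The two examples above can be checked directly with class(). A minimal sketch; the variable names are illustrative assumptions, and the explicit conversions at the end are added for contrast.

```r
# Implicit coercion: c() converts everything to the most general type present
mixed_chr <- c(1.5, "hello")
print(mixed_chr)          # "1.5" "hello"
print(class(mixed_chr))   # "character"

mixed_num <- c(TRUE, 1.5)
print(mixed_num)          # 1.0 1.5
print(class(mixed_num))   # "numeric"

mixed_int <- c(TRUE, 2L)
print(class(mixed_int))   # "integer"

# Explicit coercion: the programmer names the target type
print(as.integer("7"))    # 7
print(as.character(42))   # "42"
```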
There are pre-defined methods for coercing any object to one of the basic data types.
For example, as(x, "numeric") uses the as.numeric function.
Function      Description
as.logical    Converts the value to logical type. 0 is converted to FALSE; any other value is converted to TRUE
as.integer    Converts the object to integer type
as.double     Converts the object to double-precision type
as.complex    Converts the object to complex type
as.list       Converts the object to a list; a vector is converted element by element
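# Creating a numeric vector
x <- c(0, 1, 0, 3)
print(class(x))
# Converting it to integer, double, logical and list types
print(as.integer(x))
print(as.double(x))
print(as.logical(x))
print(as.list(x))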
Output
> source("C:/Christy/BCA/jk18.R")
[1] "numeric"
[1] 0 1 0 3
[1] 0 1 0 3
[1] FALSE TRUE FALSE TRUE
[[1]]
[1] 0
[[2]]
[1] 1
[[3]]
[1] 0
[[4]]
[1] 3
Plotting in R is a fundamental skill for any data analyst or researcher. The plot() function in R is a versatile
command that allows for a wide variety of plot types, including scatter plots, line plots, and more. The function can
handle simple plots of two vectors, as well as more complex plotting structures.
plot()
The plot() function is used to draw points (markers) in a diagram. This function takes parameters for specifying
points in the diagram. Parameter 1 specifies points on the x-axis. Parameter 2 specifies points on the y-axis.
For example, draw one point in the diagram, at position 1 on the x-axis and position 3 on the y-axis:
plot(1, 3)
# Create two vectors
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 25)
# Plot y against x
plot(x, y)
3) We can add titles and labels to the plot using the main, xlab, and ylab arguments.
4) The type argument allows us to change the type of plot, and the col argument lets us change the color:
plot(x, y, type="b", col="blue")
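The main, xlab, ylab, type and col arguments can be combined in one call. The following is a minimal sketch; the title and label text are hypothetical, and the plot is written to a temporary PDF file only so that the code also runs without a display.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 25)

# Write the plot to a temporary PDF; in RStudio the pdf() and dev.off()
# lines can be dropped and the plot appears in the Plots pane instead.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(x, y,
     main = "Squares of x",   # plot title (hypothetical text)
     xlab = "x",              # x-axis label
     ylab = "x squared",      # y-axis label
     type = "b",              # draw both points and lines
     col = "blue")            # colour of the points and lines
invisible(dev.off())
```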
Pie Charts
A pie chart is a circular graphical view of data. The pie() function is used to draw pie charts.
Example
# Create a vector of pies
x <- c(10,20,30,40)
# Display the pie chart
pie(x)
Bar Charts
A bar chart uses rectangular bars to visualize data. Bar charts can be displayed horizontally or vertically. The
height or length of each bar is proportional to the value it represents. The barplot() function draws a vertical bar
chart by default.
Example
# x-axis values
x <- c("A", "B", "C", "D")
# y-axis values
y <- c(2, 4, 6, 8)
barplot(y, names.arg = x)
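For a horizontal bar chart, barplot() accepts the horiz argument. A minimal sketch using the same data; the variable bar_positions is a hypothetical name, and the null graphics device is used only so that the code also runs without a display.

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

# pdf(NULL) opens a device that creates no file; omit it (and dev.off())
# in RStudio to see the chart in the Plots pane.
pdf(NULL)
# horiz = TRUE draws the bars horizontally; barplot() returns the bar midpoints
bar_positions <- barplot(y, names.arg = x, horiz = TRUE)
invisible(dev.off())

print(bar_positions)
```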