
4 Overview of R Part 2

Uploaded by BAINS AWAAN
Copyright © All Rights Reserved

Module 4.

Introduction to R – II
Recommended Readings
• DA: “Data Analytics using R” by Seema Acharya, 1st Edition,
McGraw Hill Education, India. Ch 3, 4
Loading and Handling Data in R
Expression, Variables and Functions
Expressions
Operation          Operator         Description
Addition           x + y            y added to x
Subtraction        x - y            y subtracted from x
Multiplication     x * y            x multiplied by y
Division           x / y            x divided by y
Exponentiation     x ^ y, x ** y    x raised to the power y
Modulus            x %% y           Remainder of (x divided by y)
Integer division   x %/% y          x divided by y, rounded down
Square root        sqrt(x)          Computing the square root of x
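You can try each operator from the table at the R console. This is a minimal sketch, and the values 17 and 5 are arbitrary. The last two lines show what "rounded down" means for negative numbers: %/% rounds toward minus infinity, and %% returns a result with the sign of the divisor.

```r
# Arithmetic operators applied to two arbitrary numeric values
x <- 17
y <- 5

x + y       # addition: 22
x - y       # subtraction: 12
x * y       # multiplication: 85
x / y       # division: 3.4
x ^ y       # exponentiation: 1419857
x ** y      # R translates ** to ^, so this is also 1419857
x %% y      # modulus (remainder): 2
x %/% y     # integer division: 3
sqrt(16)    # square root: 4

-7 %/% 2    # -4: integer division rounds down, not toward zero
-7 %% 2     # 1: the remainder takes the sign of the divisor
```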
Expression, Variables and Functions
Logical values
Logical values are TRUE and FALSE or T and F. Note that these are case sensitive.
The equality operator is ==.
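The short sketch below shows the logical constants and the == operator. T and F are ordinary variables that can be reassigned, so TRUE and FALSE are the safer choice in scripts. String comparison with == is case sensitive.

```r
a <- TRUE
b <- F        # F is shorthand for FALSE
a == b        # FALSE
5 == 5.0      # TRUE: integer-valued and decimal forms compare equal
"R" == "r"    # FALSE: comparison is case sensitive
x <- 3        # a single = or <- assigns a value
x == 3        # == tests for equality: TRUE
```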
Vector Creation and Related functions
Dates
Variables
Manipulating Text in Data
substr(a, start, stop)
• Arguments: a is a character vector; start and stop are integers giving the first and last character positions.
• Description: returns the part of each string that starts at position start and ends at position stop.

strsplit(a, split, …)
• Arguments: a is a character vector; split is a character vector containing the regular expression used for splitting.
• Description: splits each string in a into substrings and returns the pieces as a list.

paste(…, sep = " ")
• Arguments: the dots "…" are the R objects to join; sep is the character string used to separate them (a single space by default).
• Description: converts the objects into strings and concatenates them.

grep(pattern, a)
• Arguments: pattern is the text pattern (a regular expression) to search for; a is a character vector.
• Description: searches a for the pattern and returns the indices of the matching elements (or the matching strings themselves when value = TRUE).

toupper(a)
• Arguments: a is a character vector.
• Description: converts the strings into uppercase.

tolower(a)
• Arguments: a is a character vector.
• Description: converts the strings into lowercase.
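Here is a minimal sketch of the six text functions. The sample strings are hypothetical. Note that strsplit() returns a list, so `[[1]]` extracts the pieces of the first string.

```r
s <- "Data Analytics using R"

substr(s, 1, 4)                  # "Data"
strsplit(s, " ")[[1]]            # "Data" "Analytics" "using" "R"
paste("Module", 4, sep = "-")    # "Module-4"
paste("Introduction", "to", "R") # default sep is one space: "Introduction to R"

words <- c("apple", "banana", "cherry")
grep("an", words)                # 2: index of the matching element
grep("an", words, value = TRUE)  # "banana"

toupper(s)                       # "DATA ANALYTICS USING R"
tolower(s)                       # "data analytics using r"
```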
Missing Values Treatment in R
• During analytical data processing, users come across problems caused by missing and infinite values.
– To get an accurate output, users should remove or clean the missing values.
• In R
– NA represents Not Available (a missing value)
– Inf represents infinity (-Inf for negative infinity)
Missing Values Treatment in R
is.na(x)
• Arguments: x is an R object to be tested.
• Description: checks each element of the object and returns TRUE where data is missing, FALSE elsewhere.

na.omit(x, …)
• Arguments: x is an R object from which NA needs to be removed; the dots "…" define other optional arguments.
• Description: returns the object after removing missing values (for a data frame, the rows containing an NA).

na.exclude(x, …)
• Arguments: x is an R object from which NA needs to be removed; the dots "…" define other optional arguments.
• Description: also returns the object after removing missing values. It differs from na.omit() only when used as the na.action of a model fit, where residuals and predictions are padded with NA back to the original length.
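The sketch below applies these functions to a small hypothetical vector and data frame. na.omit() removes NA but keeps Inf. Treating the infinite value takes a separate step, for example with is.finite().

```r
scores <- c(12, NA, 7, Inf, NA, 3)

is.na(scores)                # FALSE TRUE FALSE FALSE TRUE FALSE
sum(is.na(scores))           # 2 missing values
clean <- na.omit(scores)
as.vector(clean)             # 12 7 Inf 3 (Inf is not removed)
mean(scores, na.rm = TRUE)   # Inf, because Inf is still present
mean(scores[is.finite(scores)])  # mean of 12, 7 and 3

# On a data frame, na.omit() drops every row that contains an NA
df <- data.frame(id = 1:3, age = c(25, NA, 31))
na.omit(df)                  # rows 1 and 3 remain
```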
na.fail(x, …)
• Arguments: x is an R object to be checked; the dots "…" define other optional arguments.
• Description: signals an error if the object contains any missing values; otherwise it returns the object unchanged.

na.pass(x, …)
• Arguments: x is an R object; the dots "…" define other optional arguments.
• Description: returns the object unchanged, including any missing values.
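A minimal sketch of na.fail() and na.pass() on two hypothetical vectors. Because na.fail() stops with an error when NA is present, the call is wrapped in tryCatch() so the script keeps running.

```r
complete <- c(1, 2, 3)
partial  <- c(1, NA, 3)

na.fail(complete)   # no NA: the object is returned unchanged
na.pass(partial)    # always returns the object unchanged, NA included

# na.fail() raises an error here; capture its message instead of stopping
result <- tryCatch(
  na.fail(partial),
  error = function(e) conditionMessage(e)
)
print(result)
```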
Vectors
• Vectors are stored like arrays in C
• Vector indices begin at 1
• All vector elements must have the same mode, such as integer, numeric (floating point number), character (string), logical (Boolean), complex or raw.

Create a vector of numbers

The c function (c is short for combine) creates a new vector consisting of three values: 4, 6,
and 7.
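The c() call described above, as a minimal sketch:

```r
x <- c(4, 6, 7)   # combine three values into one numeric vector
x                 # 4 6 7
length(x)         # 3
class(x)          # "numeric"
```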
Vectors
A vector cannot hold values of different data types. For example, suppose we try to place integer, string and boolean values together in a vector.

Note: All the values are converted to the same data type, i.e. “character”.
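This sketch shows the conversion described above, with hypothetical values. R coerces to the most general type present, in the order logical, integer, numeric, character.

```r
mixed <- c(10L, "Ten", TRUE)
mixed          # "10" "Ten" "TRUE": every element became a string
class(mixed)   # "character"

c(1L, 2.5)     # integer promoted to numeric: 1.0 2.5
c(TRUE, 2L)    # logical promoted to integer: 1 2
```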
Vectors
Accessing the value(s) in the vector
Create a variable named "VariableSeq" and assign to it a vector of string values.

• To access values in a vector, specify the indices at which the values are present in the vector. Indices start at 1.
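Here is a minimal sketch of index-based access. The string values in VariableSeq are hypothetical, because the original slide's values are not shown. A negative index excludes an element rather than selecting it.

```r
# Hypothetical string values for VariableSeq
VariableSeq <- c("R", "is", "a", "programming", "language")

VariableSeq[1]                    # "R": indices start at 1
VariableSeq[c(2, 4)]              # "is" "programming"
VariableSeq[2:3]                  # "is" "a"
VariableSeq[-1]                   # every element except the first
VariableSeq[length(VariableSeq)]  # last element: "language"
```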
Vectors Maths
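Arithmetic on vectors works element by element. When the lengths differ, the shorter vector is recycled. A minimal sketch with arbitrary values:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

x + y         # 11 22 33 44: element by element
x * 2         # 2 4 6 8: the scalar is recycled
x + c(1, 0)   # 2 2 4 4: the length-2 vector is recycled
sum(x)        # 10
mean(y)       # 25
```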
Matrices
Create a matrix, "mat", 3 rows high and 4 columns wide, using a vector.

Access the element present in the 2nd row and 3rd column of the matrix, "mat".
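A minimal sketch of the two steps above. Using the values 1 to 12 is an assumption, since the slide's vector is not shown. matrix() fills column by column unless byrow = TRUE.

```r
# Fill a 3 x 4 matrix column by column with the values 1 to 12
mat <- matrix(1:12, nrow = 3, ncol = 4)
mat
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

mat[2, 3]   # element in row 2, column 3: 8
dim(mat)    # 3 4
```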
Matrices
To access the 2nd column of the matrix, simply provide the column number and
omit the row number.

To access the 2nd and 3rd columns of the matrix, simply provide the column
numbers and omit the row number.
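Leaving the row index empty selects whole columns, as the sketch below shows. It uses the same hypothetical 1 to 12 matrix. A single selected column is simplified to a vector unless drop = FALSE.

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

mat[, 2]                # 2nd column: 4 5 6
mat[, c(2, 3)]          # 2nd and 3rd columns, as a 3 x 2 matrix
mat[2, ]                # omitting the column index gives the 2nd row: 2 5 8 11
mat[, 2, drop = FALSE]  # keeps the result as a 3 x 1 matrix
```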
List
Create a list, "emp", having three elements.

To display the elements of the list "emp", type its name at the console.

Retrieve the names of the elements in the list "emp".
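A minimal sketch of the three list steps. The names EmpName and EmpSal and their values are hypothetical. "Deptt" = "CSE" is taken from the deletion example on the next slide.

```r
# Hypothetical contents for the list "emp"
emp <- list(EmpName = "Alex", EmpSal = 55000, Deptt = "CSE")

emp              # prints each element with its name
names(emp)       # "EmpName" "EmpSal" "Deptt"
emp$EmpName      # "Alex"
emp[["EmpSal"]]  # 55000
```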


List
Add an element with the name "EmpDesg" and value "Faculty" to the list, "emp".
Delete an element with the name "Deptt" and value "CSE" from the list, "emp".
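Assigning to a new name adds an element, and assigning NULL deletes one. In this sketch, the elements other than EmpDesg and Deptt are hypothetical.

```r
emp <- list(EmpName = "Alex", EmpSal = 55000, Deptt = "CSE")

emp$EmpDesg <- "Faculty"   # add a new named element
names(emp)                 # "EmpName" "EmpSal" "Deptt" "EmpDesg"

emp$Deptt <- NULL          # assigning NULL removes the element
names(emp)                 # "EmpName" "EmpSal" "EmpDesg"
```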
Recursive list
A recursive list means a list within a list.

Let us begin with two lists, "emp" and "emp1".

Combine both the lists into a single list by the name "EmpList".
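A minimal sketch, with hypothetical contents for "emp" and "emp1". list(emp, emp1) produces a recursive list whose two elements are themselves lists. By contrast, c(emp, emp1) merges the elements into one flat list.

```r
emp  <- list(EmpName = "Alex", EmpUnit = "IT")
emp1 <- list(EmpName = "Sam",  EmpUnit = "HR")

EmpList <- list(emp, emp1)   # a list whose elements are lists
length(EmpList)              # 2
EmpList[[2]]$EmpName         # "Sam"
str(EmpList)                 # shows the nested structure

flat <- c(emp, emp1)         # flattens instead: 4 top-level elements
length(flat)                 # 4
```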
Exploring a Dataset
names(dataset)
• Arguments: dataset is the name of the dataset.
• Description: displays the variables of the given dataset.

summary(dataset)
• Arguments: dataset is the name of the dataset.
• Description: displays a summary of each variable in the given dataset.

str(dataset)
• Arguments: dataset is the name of the dataset.
• Description: displays the structure of the given dataset.

head(dataset, n)
• Arguments: dataset is the name of the dataset; n is the number of top rows to display.
• Description: displays the top n rows. If n is not provided, the top 6 rows are displayed.

tail(dataset, n)
• Arguments: dataset is the name of the dataset; n is the number of bottom rows to display.
• Description: displays the bottom n rows. If n is not provided, the bottom 6 rows are displayed.

dim(dataset)
• Arguments: dataset is the name of the dataset.
• Description: returns the dimensions of the dataset, i.e. the number of rows and columns.

table(dataset$variablename)
• Arguments: dataset is the name of the dataset; variablename is the name of a variable in it.
• Description: counts how many times each distinct (categorical) value of the variable occurs and returns these counts.
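Here is a minimal sketch of these functions on iris, a dataset that ships with base R. The slides do not name a particular dataset, so iris is only a convenient stand-in.

```r
data(iris)            # 150 flowers, 4 measurements and a Species factor

names(iris)           # the five variable names
dim(iris)             # 150 5
str(iris)             # type and first values of each variable
summary(iris)         # quartiles and mean per numeric column; counts for Species
head(iris)            # first 6 rows (default n)
tail(iris, 3)         # last 3 rows
table(iris$Species)   # 50 rows for each of the three species
```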
Data Frames
Think of a data frame as something akin to a database table or an Excel spreadsheet.
Create a data frame
• First create three vectors, “EmpNo”, “EmpName” and “ProjName”
• Then create a data frame, “Employee”
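A minimal sketch of the two steps. The employee numbers, names and project names are hypothetical. stringsAsFactors = FALSE keeps the text columns as character vectors in every R version.

```r
# Step 1: three vectors of equal length (hypothetical values)
EmpNo    <- c(1000, 1001, 1002)
EmpName  <- c("Jack", "Jane", "Margaritta")
ProjName <- c("PO1", "PO2", "PO3")

# Step 2: combine them; each vector becomes a column
Employee <- data.frame(EmpNo, EmpName, ProjName, stringsAsFactors = FALSE)
Employee
```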
Data Frame Access
There are two ways to access the content of data frames:
• By providing the index number in square brackets.
• By providing the column name as a string in double brackets.
A few R functions for understanding data in data frames
• dim()
The dim() function is used to obtain the dimensions of a data frame.
• nrow()
The nrow() function returns the number of rows in a data frame.
• ncol()
The ncol() function returns the number of columns in a data frame.
• str()
The str() function compactly displays the internal structure of R objects.
• summary()
The summary() function returns result summaries for each column of the dataset.
• head()
The head() function is used to obtain the first n observations, where n is 6 by default.

• tail()
The tail() function is used to obtain the last n observations, where n is 6 by default.

• Negative values in head and tail
A negative n drops rows: head(x, -n) returns all rows except the last n, and tail(x, -n) returns all rows except the first n.
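A minimal sketch of these functions on a hypothetical 10-row data frame. The comments state which row ids each call returns.

```r
df <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

dim(df)        # 10 2
nrow(df)       # 10
ncol(df)       # 2
head(df)       # rows 1 to 6 (default n = 6)
tail(df, 3)    # rows 8 to 10
head(df, -3)   # all but the last 3 rows: rows 1 to 7
tail(df, -3)   # all but the first 3 rows: rows 4 to 10
```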



• edit()
– The edit() function will invoke the text editor on the R object.
Task: Create a csv and an excel file
Reading from a CSV file
Task: Create a Tab separated values file
Reading from a Tab separated file
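Here is a minimal sketch of writing and reading back a CSV file and a tab-separated file with base R. The data and the temporary file paths are hypothetical. Excel files need an add-on package, such as readxl, and are not covered here.

```r
emp <- data.frame(EmpNo = c(1000, 1001), EmpName = c("Jack", "Jane"))

# Comma-separated values
csv_path <- tempfile(fileext = ".csv")
write.csv(emp, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)        # header = TRUE by default

# Tab-separated values
tsv_path <- tempfile(fileext = ".tsv")
write.table(emp, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
from_tsv <- read.delim(tsv_path)      # read.delim() expects sep = "\t"
# equivalent: read.table(tsv_path, header = TRUE, sep = "\t")

from_csv
from_tsv
```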
Data Summary
Finding the Missing Values
Data Visualization
• Histograms
• Density Plot
• Bar Charts
Data Visualization - Histogram
• A histogram is a graphical illustration of the distribution of numerical data in successive numerical intervals of equal sizes. It looks similar to a bar graph; however, in a histogram the values are grouped into continuous ranges. The height of a histogram bar represents the number of values occurring in a particular range.

• R uses the hist(x) function to create simple histograms, where x is a numeric vector of the values to be plotted.
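A minimal sketch with a hypothetical vector. The breaks are set explicitly so the counts are easy to check. hist() draws the plot and also returns the interval boundaries and the count in each interval. By default each interval is closed on the right, and the first one also includes its left end.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

h <- hist(x, breaks = 0:5,
          main = "Histogram of x", xlab = "x", col = "lightblue")
h$breaks   # 0 1 2 3 4 5: interval boundaries
h$counts   # 1 2 3 2 1: number of values in each interval
```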
Data Visualization – Density Plot
Data visualization – Bar Charts

Three types: Simple Bar Chart, Grouped Bar Chart and Stacked Bar Chart.


Data visualization – Bar Charts - Simple
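Here is a minimal sketch of the three bar chart types using base R's barplot(). The sales figures are hypothetical. A named vector gives a simple chart. A matrix gives one group per column, with bars side by side when beside = TRUE and stacked otherwise. barplot() returns the bar midpoints, which the test uses to check the bar count.

```r
# Simple bar chart: one bar per element of a named vector
sales <- c(North = 12, South = 7, East = 9)
simple <- barplot(sales, main = "Simple Bar Chart", col = "steelblue")

# Rows are the series (years), columns are the groups (regions)
m <- matrix(c(5, 7, 3, 6, 4, 8), nrow = 2,
            dimnames = list(c("2023", "2024"), c("North", "South", "East")))

grouped <- barplot(m, beside = TRUE, legend.text = TRUE,
                   main = "Grouped Bar Chart")
stacked <- barplot(m, legend.text = TRUE,   # beside = FALSE is the default
                   main = "Stacked Bar Chart")
```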
