Unit 4
Data Analytics with R: Take your first steps with R, data types, missing
values, basics of R syntax, The R workspace, Vectors, System- and user-
defined objects, Matrices, Lists, Functions, Statistics methodology, Factors
and Data frames, Basic Graphics.
• R is a powerful open-source programming language and software environment
designed for statistical computing, data analysis, and visualization. It is
widely used in research, academia, and industries like healthcare, finance,
marketing, and agriculture.
FEATURES OF R
• Statistical Analysis: R is designed for statistical work and supports tests and models such as regression, ANOVA, and
hypothesis testing.
• Data Visualization: R can create high-quality plots and graphs using packages like ggplot2, lattice, and
plotly.
• Data Manipulation: Packages like dplyr, tidyr, and data.table allow easy and fast data cleaning and
transformation.
• Wide Package Ecosystem: With over 19,000 packages on CRAN, R offers tools for data science,
machine learning, and domain-specific work.
• Reproducible Reporting: With tools such as R Markdown, R lets you create documents that combine code,
analysis, and results in one file, which is useful for reports and research papers.
• Open Source & Free: R is completely free to use, making it accessible for anyone with an internet
connection.
• Cross-Platform Support: R works on Windows, macOS, and Linux, ensuring flexibility for different systems.
• Community Support: R has a large and active community for learning, troubleshooting, and improving
your skills.
R ENVIRONMENT
DATA TYPES
VARIABLE DECLARATION
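The original content of this slide is not preserved. As a minimal sketch, assuming the slide covered assignment with the `<-` operator, the following shows how variables of the basic types can be created and inspected. All variable names and values are hypothetical.

```r
# Hypothetical variables, chosen only for illustration
age <- 25L            # integer (the L suffix requests an integer)
height <- 5.6         # numeric (double)
name <- "Alice"       # character
is_student <- TRUE    # logical

# class() reports the type of each object
class(age)         # "integer"
class(height)      # "numeric"
class(name)        # "character"
class(is_student)  # "logical"
```

R does not require a type declaration; the type follows from the value assigned.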
DATA TYPES
Vectors are one-dimensional arrays that hold elements of the same type (numeric,
character, or logical).
To create a vector with more than one element, use the c() function, which
combines its arguments into a single vector.
# Numeric vector (stored as double; write 1L, 2L, ... for integers)
numbers <- c(1, 2, 3, 4, 5)
# Character vector
names <- c("Alice", "Bob", "Charlie")
# Logical vector
flags <- c(TRUE, FALSE, TRUE)
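Because a vector holds a single type, c() coerces mixed inputs to a common type. The sketch below checks the type of a few vectors with class(); the names `ints` and `mixed` are introduced here only for illustration.

```r
numbers <- c(1, 2, 3, 4, 5)
class(numbers)   # "numeric"

ints <- c(1L, 2L, 3L)     # the L suffix creates integers
class(ints)      # "integer"

# Mixing types: everything is coerced to character
mixed <- c(1, "a", TRUE)
class(mixed)     # "character"
mixed            # "1" "a" "TRUE"
```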
DATA TYPES
Accessing Elements:
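The slide's own example is not preserved. A minimal sketch of the usual indexing forms, using a hypothetical `scores` vector (R indexes from 1):

```r
scores <- c(80, 90, 85, 70)   # hypothetical data

scores[1]            # first element: 80
scores[c(2, 4)]      # second and fourth elements: 90 70
scores[-1]           # everything except the first: 90 85 70
scores[scores > 80]  # logical indexing: 90 85
```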
VECTOR OPERATIONS
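The original content of this slide is not preserved. As a sketch, assuming it covered arithmetic on vectors, the example below shows that R applies operators element by element. The vectors `a` and `b` are hypothetical.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

a + b    # element-wise addition: 5 7 9
a * b    # element-wise multiplication: 4 10 18
a * 2    # the scalar is recycled: 2 4 6
sum(a)   # 6
a > 1    # element-wise comparison: FALSE TRUE TRUE
```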
DATA TYPES
LISTS
A list is an R object that can hold elements of different types, including
vectors, strings, numbers, functions, and even other lists.
LIST EXAMPLE
my_list <- list(Name = "Alice", Age = 25, Marks = c(80, 90, 85))
Accessing Elements:
my_list$Name            # Using name (names are case-sensitive)
my_list[[2]]            # Second item
my_list$Marks[1]        # First score
my_list$Age <- 26       # Modify
my_list$Passed <- TRUE  # Add new element
my_list$Passed <- NULL  # Remove element
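Two details of list access are easy to miss: element names are case-sensitive, and single brackets return a sub-list while double brackets return the element itself. A short sketch using the same list:

```r
my_list <- list(Name = "Alice", Age = 25, Marks = c(80, 90, 85))

my_list$name               # NULL: there is no element called "name"
my_list[["Name"]]          # "Alice"

class(my_list["Name"])     # "list": single brackets keep the list wrapper
class(my_list[["Name"]])   # "character": double brackets extract the element

my_list$Passed <- TRUE     # add an element
names(my_list)             # "Name" "Age" "Marks" "Passed"
my_list$Passed <- NULL     # remove it again
length(my_list)            # 3
```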
DATA TYPES
MATRICES
A matrix is a two-dimensional rectangular data set whose elements all share the
same type. It can be created by passing a vector to the matrix() function,
together with the number of rows and columns.
Example:
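The slide's own example is not preserved. A minimal sketch of creating a matrix from a vector, with hypothetical values; note that matrix() fills column by column unless byrow = TRUE is given.

```r
# 2 x 3 matrix filled column by column (the default)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(m)
dim(m)       # 2 3
m[1, 2]      # row 1, column 2: 3

# The same values filled row by row
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
m_byrow[1, 2]  # 2
```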
MATRIX
MATRIX OPERATIONS
Example:
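The slide's own example is not preserved. As a sketch with two hypothetical 2 x 2 matrices, the code below contrasts element-wise operators with the matrix product operator %*%.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)  # columns are (1, 2) and (3, 4)
B <- matrix(c(5, 6, 7, 8), nrow = 2)  # columns are (5, 6) and (7, 8)

A + B      # element-wise addition
A * B      # element-wise multiplication
A %*% B    # matrix multiplication
t(A)       # transpose
```

Using * where %*% is intended is a common mistake, since both run without error on matrices of the same shape.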
DATAFRAMES
Data frames are tabular data objects. Unlike a matrix, each column of a data frame can
hold a different mode of data: the first column can be numeric, the second character,
and the third logical. Internally, a data frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.
Example:
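The slide's own example is not preserved. The sketch below builds a small data frame named `sales_data` with a `Total_Sales` column; the other column names and all values are hypothetical and chosen only for illustration.

```r
# Hypothetical sales data: each column has its own type
sales_data <- data.frame(
  Product     = c("Laptop", "Phone", "Tablet"),  # character
  Units_Sold  = c(30, 50, 20),                   # numeric
  Total_Sales = c(45000, 25000, 12000),          # numeric
  In_Stock    = c(TRUE, TRUE, FALSE)             # logical
)

str(sales_data)        # structure: column names and types
nrow(sales_data)       # 3
sales_data$Product     # access one column by name

# Keep only rows whose Total_Sales exceeds 20,000
sales_data[sales_data$Total_Sales > 20000, ]
```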
DATA FRAME
# Filter products with Total_Sales greater than 20,000
high_sales <- sales_data[sales_data$Total_Sales > 20000, ]
print(high_sales)
FUNCTIONS IN R
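The original content of this slide is not preserved. As a minimal sketch of a user-defined function, the hypothetical `average_marks()` below takes a vector and an optional argument with a default value, and returns the value of its last expression.

```r
# Hypothetical function: mean of marks plus an optional bonus
average_marks <- function(marks, bonus = 0) {
  mean(marks) + bonus   # the last expression is the return value
}

average_marks(c(80, 90, 85))             # 85
average_marks(c(80, 90, 85), bonus = 5)  # 90
```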
FACTORS
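The original content of this slide is not preserved. As a sketch, a factor stores categorical data as a set of levels plus integer codes. The vectors below are hypothetical.

```r
grades <- c("B", "A", "C", "A", "B", "A")   # hypothetical data
grade_factor <- factor(grades)

levels(grade_factor)       # "A" "B" "C"
table(grade_factor)        # counts per level: A 3, B 2, C 1
as.integer(grade_factor)   # underlying codes: 2 1 3 1 2 1

# An ordered factor allows comparisons between levels
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]        # TRUE: small < large
```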
FUNCTION
BASIC GRAPH
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
plot(x, y, type = "p", main = "Scatter Plot")
slices <- c(10, 20, 30)
labels <- c("Math", "Science", "Arts")
pie(slices, labels = labels, main = "Pie Chart")
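Other base graphics functions follow the same pattern as plot() and pie(). The sketch below draws a bar chart, a histogram, and a line plot; the `marks` and `subjects` data are hypothetical.

```r
marks <- c(72, 85, 90, 66, 78, 95, 88, 81)        # hypothetical marks
subjects <- c(Math = 10, Science = 20, Arts = 30) # hypothetical counts

# Bar chart: one bar per named element; returns the bar midpoints
bar_mid <- barplot(subjects, main = "Bar Chart", ylab = "Students")

# Histogram: distribution of a numeric vector; returns the bin counts
h <- hist(marks, main = "Histogram of Marks", xlab = "Marks")

# Line plot: type = "l" joins the points with lines
plot(1:4, c(2, 4, 6, 8), type = "l", main = "Line Plot")
```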
BUILT-IN FUNCTIONS
BUILT-IN FUNCTIONS - EXAMPLE
scores <- c(88, 92, 79, 85, 90, 95, 87, 78, 94, 89)
# Apply built-in functions
cat("Scores:", scores, "\n")
cat("Total number of scores:", length(scores), "\n")
cat("Sum of scores:", sum(scores), "\n")
cat("Mean (average):", mean(scores), "\n")
cat("Median:", median(scores), "\n")
cat("Standard Deviation:", sd(scores), "\n")
cat("Minimum score:", min(scores), "\n")
cat("Maximum score:", max(scores), "\n")
cat("Range of scores:", range(scores), "\n")
cat("Sorted scores:", sort(scores), "\n")
cat("Summary of scores:\n")
print(summary(scores))
Output:
Scores: 88 92 79 85 90 95 87 78 94 89
Total number of scores: 10
Sum of scores: 877
Mean (average): 87.7
Median: 88.5
Standard Deviation: 5.735852
Minimum score: 78
Maximum score: 95
Range of scores: 78 95
Sorted scores: 78 79 85 87 88 89 90 92 94 95
Summary of scores:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  78.00   85.50   88.50   87.70   91.50   95.00