
Unit 4

The document provides an overview of data analytics using R, covering its features, environment, and various data types such as vectors, lists, matrices, and data frames. It also discusses variable declaration, functions, factors, and basic graphics capabilities in R. Additionally, it highlights built-in functions for statistical analysis and data manipulation, making R a powerful tool for data science.

Uploaded by

sanketraikar78


Data Analytics with R: Take your first steps with R, data types, missing
values, basics of R syntax, The R workspace, Vectors, System- and user-
defined objects, Matrices, Lists, Functions, Statistics methodology, Factors
and Data frames, Basic Graphics.

• R is a powerful open-source programming language and software environment
designed for statistical computing, data analysis, and visualization. It is
widely used in research, academia, and industries like healthcare, finance,
marketing, and agriculture.

FEATURES OF R

• Statistical Analysis: R is designed for statistical work such as regression, ANOVA, and hypothesis testing.
• Data Visualization: R can create high-quality plots and graphs using libraries like ggplot2, lattice, and plotly.
• Data Manipulation: Packages like dplyr, tidyr, and data.table allow easy and fast data cleaning and transformation.
• Wide Package Ecosystem: With over 19,000 packages on CRAN, R offers tools for data science, machine learning, and domain-specific work.
• Reproducible Reporting: R allows you to create documents that include code, analysis, and results in one file, which is useful for reports and research papers.
• Open Source & Free: R is completely free to use, making it accessible to anyone with an internet connection.
• Cross-Platform Support: R works on Windows, macOS, and Linux, ensuring flexibility for different systems.
• Community Support: R has a large and active community for learning, troubleshooting, and improving your skills.

R ENVIRONMENT

R is an integrated suite of software facilities for data manipulation, calculation, and graphical display.
It includes:
• An effective data handling and storage facility
• A suite of operators for calculations on arrays, in particular matrices
• A large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis and display either on-screen or on hardcopy
• A well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output
facilities

DATA TYPES

The most frequently used data structures in R are:

• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames

VARIABLE DECLARATION

• In R, variables are declared by assigning a value using the assignment operator <- or =.

• Numeric:     x <- 3.14
• Integer:     x <- 5L
• Character:   name <- "Bob"
• Logical:     flag <- FALSE
• Vector:      nums <- c(1, 2, 3)
• List:        mylist <- list(1, "a", TRUE)
• Data Frame:  df <- data.frame(x=1:3, y=c("a", "b", "c"))
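
The declarations above can be checked with class() and typeof(). The following is a minimal sketch; the variable names x_num and x_int are illustrative and are used only to keep the numeric and integer values apart.

```r
# Illustrative names (x_num, x_int) so both values can coexist
x_num  <- 3.14
x_int  <- 5L
name   <- "Bob"
flag   <- FALSE
nums   <- c(1, 2, 3)
mylist <- list(1, "a", TRUE)
df     <- data.frame(x = 1:3, y = c("a", "b", "c"))

class(x_num)   # "numeric"
typeof(x_num)  # "double" (the storage type behind "numeric")
class(x_int)   # "integer" (the L suffix requests integer storage)
class(name)    # "character"
class(flag)    # "logical"
class(nums)    # "numeric"
class(mylist)  # "list"
class(df)      # "data.frame"
```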

DATA TYPES
Vectors are one-dimensional arrays that hold elements of the same type (numeric,
character, or logical).
To create a vector with more than one element, use the c() function, which
combines its arguments into a vector.

# Numeric vector (stored as double; use 1:5 or the L suffix for integers)
numbers <- c(1, 2, 3, 4, 5)

# Character vector
names <- c("Alice", "Bob", "Charlie")

# Logical vector
flags <- c(TRUE, FALSE, TRUE)
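
Because a vector holds a single type, c() converts mixed inputs to a common type. A small sketch of that behaviour follows; the names mixed and num_log are illustrative.

```r
# Mixing numbers, strings, and logicals: everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"
mixed           # "1" "a" "TRUE"

# Mixing numbers and logicals: TRUE/FALSE become 1/0
num_log <- c(1, TRUE, FALSE)
class(num_log)  # "numeric"
num_log         # 1 1 0
```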
DATA TYPES
Accessing Elements:

numbers[1]   # First element
names[2:3]   # Second to third elements
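
A short worked sketch of vector indexing, reusing the numbers and names vectors from the previous slide. The negative and multi-position forms are additions for illustration.

```r
numbers <- c(1, 2, 3, 4, 5)
names   <- c("Alice", "Bob", "Charlie")

numbers[1]        # 1 (R indexing starts at 1)
names[2:3]        # "Bob" "Charlie"
numbers[c(1, 5)]  # 1 5 (pick several positions)
numbers[-1]       # 2 3 4 5 (a negative index drops that position)
```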

VECTOR OPERATIONS

v <- c(10, 20, 30, 40)

v + 2       # Add 2 to each element
v * 3       # Multiply each element by 3
sum(v)      # Sum of elements
mean(v)     # Mean (average)
length(v)   # Number of elements
v > 25      # Which elements are greater than 25
v[v > 25]   # Filter elements > 25
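
For reference, here is a sketch of what these operations return for the same vector v. The last line, which adds a shorter vector, is an extra illustration of R's recycling rule.

```r
v <- c(10, 20, 30, 40)

v + 2         # 12 22 32 42
v * 3         # 30 60 90 120
sum(v)        # 100
mean(v)       # 25
length(v)     # 4
v > 25        # FALSE FALSE TRUE TRUE
v[v > 25]     # 30 40

# Recycling: the shorter vector c(1, 2) is repeated to match length 4
v + c(1, 2)   # 11 22 31 42
```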

DATA TYPES
LISTS
A list is an R object that can hold elements of different types, including
vectors, strings, numbers, functions, and even other lists.

my_list <- list(
  id = 101,
  name = "Akash",
  scores = c(89, 95, 78),
  passed = TRUE
)

LIST EXAMPLE
my_list <- list(Name="Alice", Age=25, Marks=c(80, 90, 85))

Accessing Elements
my_list$Name            # Using name (element names are case-sensitive)
my_list[[2]]            # Second item
my_list$Marks[1]        # First score
my_list$Age <- 26       # Modify
my_list$Passed <- TRUE  # Add new element
my_list$Passed <- NULL  # Remove element
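
A point that often trips up beginners is the difference between single and double brackets on a list. The sketch below uses the same my_list; it is an illustration, not part of the original slides.

```r
my_list <- list(Name = "Alice", Age = 25, Marks = c(80, 90, 85))

class(my_list["Age"])     # "list": single brackets return a sub-list
class(my_list[["Age"]])   # "numeric": double brackets return the element itself
my_list[["Marks"]][2]     # 90
names(my_list)            # "Name" "Age" "Marks"

# Element names are case-sensitive, so a lower-case name finds nothing
is.null(my_list$name)     # TRUE
```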

DATA TYPES
MATRICES
A matrix is a two-dimensional rectangular data set. It can be
created using a vector input to the matrix() function.

MATRIX

# Create a matrix with numbers 1 to 9, 3 rows and 3 columns
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)

# Fill the matrix by row instead of by column (the default)
mat_by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
print(mat_by_row)
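
To make the fill order concrete, here is a small sketch with a 2x3 matrix. The names by_col and by_row are illustrative.

```r
by_col <- matrix(1:6, nrow = 2)                # default: fill column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(by_col)   # 2 3
```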

MATRIX OPERATIONS

rownames(mat) <- c("Row1", "Row2", "Row3")
colnames(mat) <- c("Col1", "Col2", "Col3")
print(mat)
mat[1, 2]   # Element at 1st row, 2nd column
mat[ , 2]   # Entire 2nd column
mat[3, ]    # Entire 3rd row

mat2 <- matrix(2, nrow = 3, ncol = 3)
mat + mat2     # Addition
mat - mat2     # Subtraction
mat * mat2     # Element-wise multiplication
mat %*% mat2   # Matrix multiplication (matrix product)
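
The difference between * and %*% is easiest to see on a small case. The sketch below uses two 2x2 matrices; A and B are illustrative names.

```r
A <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
B <- matrix(2, nrow = 2, ncol = 2)

A * B     # element-wise: each entry of A times the matching entry of B
#      [,1] [,2]
# [1,]    2    6
# [2,]    4    8

A %*% B   # matrix product: rows of A combined with columns of B
#      [,1] [,2]
# [1,]    8    8
# [2,]   12   12
```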
ARRAYS
While matrices are confined to two dimensions, arrays can have any number
of dimensions. The array() function takes a dim argument that sets the
size of each dimension. For example, dim = c(3, 3, 2) creates an array
made of two 3x3 matrices.
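
The original example for this slide did not survive extraction. The following is a minimal sketch of an array made of two 3x3 matrices; the values 1 to 18 and the name arr are assumptions chosen for illustration.

```r
# 18 values arranged as 3 rows x 3 columns x 2 layers
arr <- array(1:18, dim = c(3, 3, 2))

dim(arr)       # 3 3 2
arr[, , 1]     # first 3x3 matrix (values 1 to 9)
arr[, , 2]     # second 3x3 matrix (values 10 to 18)
arr[2, 3, 2]   # 17: row 2, column 3 of the second matrix
```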

DATAFRAMES
Data frames are tabular data objects. Unlike a matrix, each column of a data frame can
contain a different mode of data: the first column can be numeric, the second
character, and the third logical. A data frame is a list of vectors of equal
length.
Data frames are created using the data.frame() function.
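
As a minimal sketch of columns with different modes, the data frame below mixes numeric, character, and logical columns. The name students and its values are made up for illustration; stringsAsFactors = FALSE is set explicitly so the result is the same on older R versions.

```r
students <- data.frame(
  id     = c(1, 2, 3),
  name   = c("Asha", "Ben", "Chen"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

col_types <- sapply(students, class)
col_types        # id: "numeric", name: "character", passed: "logical"
nrow(students)   # 3
ncol(students)   # 3
str(students)    # compact overview of the structure
```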

DATA FRAME

# Create the sales data frame
sales_data <- data.frame(
  Product = c("Laptop", "Tablet", "Smartphone", "Monitor", "Keyboard"),
  Units_Sold = c(50, 70, 100, 40, 85),
  Unit_Price = c(600, 300, 400, 150, 50)
)

# View the data
print(sales_data)

# Total Sales = Units_Sold * Unit_Price
sales_data$Total_Sales <- sales_data$Units_Sold * sales_data$Unit_Price

# Filter products with Total_Sales greater than 20,000
high_sales <- sales_data[sales_data$Total_Sales > 20000, ]
print(high_sales)

# Total revenue from all products
total_revenue <- sum(sales_data$Total_Sales)

# Average unit price
average_price <- mean(sales_data$Unit_Price)

print(paste("Total Revenue:", total_revenue))
print(paste("Average Price:", average_price))
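
For the figures above, the sketch below rebuilds the same data and checks the results by hand. Sorting with order() is an extra step added for illustration, and sorted_sales is an illustrative name.

```r
sales_data <- data.frame(
  Product    = c("Laptop", "Tablet", "Smartphone", "Monitor", "Keyboard"),
  Units_Sold = c(50, 70, 100, 40, 85),
  Unit_Price = c(600, 300, 400, 150, 50),
  stringsAsFactors = FALSE
)
sales_data$Total_Sales <- sales_data$Units_Sold * sales_data$Unit_Price

sales_data$Total_Sales         # 30000 21000 40000 6000 4250
sum(sales_data$Total_Sales)    # 101250
mean(sales_data$Unit_Price)    # 300

# Sort rows from highest to lowest total sales
sorted_sales <- sales_data[order(-sales_data$Total_Sales), ]
sorted_sales$Product           # "Smartphone" "Laptop" "Tablet" "Monitor" "Keyboard"
```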

FUNCTIONS IN R

• Functions are reusable blocks of code that perform specific tasks.

my_function <- function(a, b) {
  return(a + b)
}
my_function(5, 3)
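
Two features worth knowing early are default argument values and the fact that a function returns the value of its last expression even without return(). The sketch below shows both; add_with_default is a hypothetical name, not from the slides.

```r
# b defaults to 1 when the caller does not supply it
add_with_default <- function(a, b = 1) {
  a + b   # the last evaluated expression is the return value
}

add_with_default(5)               # 6
add_with_default(5, 3)            # 8
add_with_default(b = 2, a = 10)   # 12 (arguments can be passed by name)
```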

FACTORS

• Factors are used to store categorical data like gender or product types.

gender <- factor(c("male", "female", "female", "male"))
levels(gender)

FACTORS

• Factors are used to represent categorical variables (like gender, status,
product type). They store both the values and the levels.

fruits <- factor(c("apple", "orange", "banana", "apple", "banana"))
print(fruits)
levels(fruits)    # Shows unique categories
summary(fruits)   # Counts per category
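
The sketch below shows what the fruits factor stores internally, plus an ordered factor as an extra illustration. The sizes vector is an assumption added for the example.

```r
fruits <- factor(c("apple", "orange", "banana", "apple", "banana"))

levels(fruits)       # "apple" "banana" "orange" (sorted alphabetically)
table(fruits)        # apple 2, banana 2, orange 1
as.integer(fruits)   # 1 3 2 1 2 (each value stored as a level index)

# Ordered factor: levels carry a ranking, so comparisons are allowed
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]  # TRUE: "small" ranks below "large"
```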

FUNCTION

• A function is a block of code designed to perform a specific task. R supports
user-defined and built-in functions.

Syntax:
function_name <- function(arg1, arg2, ...) {
  # code block
  return(result)
}

Example:
add_numbers <- function(a, b) {
  result <- a + b
  return(result)
}
add_numbers(5, 3) # Output: 8

BASIC GRAPH

x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
plot(x, y, type = "p", main = "Scatter Plot")

plot(x, y, type = "l", col = "blue", main = "Line Graph")

slices <- c(10, 20, 30)
labels <- c("Math", "Science", "Arts")
pie(slices, labels = labels, main = "Pie Chart")

BASIC GRAPH

values <- c(10, 20, 15)
names <- c("A", "B", "C")
barplot(values, names.arg = names, col = "green", main = "Bar Chart")

data <- c(2, 3, 3, 4, 5, 6, 6, 7)
hist(data, col = "purple", main = "Histogram")

scores <- c(65, 70, 80, 85, 90, 95)
boxplot(scores, main = "Boxplot", col = "orange")
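
The plots above are drawn on screen. To write a plot to a file instead, one option is to open a file device such as pdf(), draw, and then close the device. The sketch below assumes a temporary file path; out_file is an illustrative name.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)

# Write to a temporary file so nothing in the working directory is touched
out_file <- file.path(tempdir(), "scatter.pdf")

pdf(out_file, width = 6, height = 4)   # open a PDF graphics device
plot(x, y, type = "p", main = "Scatter Plot")
invisible(dev.off())                   # close the device to finish the file

file.exists(out_file)                  # TRUE once the device is closed
```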

BUILT-IN FUNCTIONS

Function    Purpose
mean()      Calculate average
median()    Middle value
sum()       Total sum
sd()        Standard deviation
length()    Number of elements
min()       Minimum value
max()       Maximum value
sort()      Sort values
round()     Round numbers
sqrt()      Square root

BUILT-IN FUNCTIONS - EXAMPLE

scores <- c(88, 92, 79, 85, 90, 95, 87, 78, 94, 89)

# Apply built-in functions
cat("Scores:", scores, "\n")
cat("Total number of scores:", length(scores), "\n")
cat("Sum of scores:", sum(scores), "\n")
cat("Mean (average):", mean(scores), "\n")
cat("Median:", median(scores), "\n")
cat("Standard Deviation:", round(sd(scores), 2), "\n")
cat("Minimum score:", min(scores), "\n")
cat("Maximum score:", max(scores), "\n")
cat("Range of scores:", range(scores), "\n")
cat("Sorted scores:", sort(scores), "\n")
cat("Summary of scores:\n")
print(summary(scores))

Output:
Scores: 88 92 79 85 90 95 87 78 94 89
Total number of scores: 10
Sum of scores: 877
Mean (average): 87.7
Median: 88.5
Standard Deviation: 5.74
Minimum score: 78
Maximum score: 95
Range of scores: 78 95
Sorted scores: 78 79 85 87 88 89 90 92 94 95
Summary of scores:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   78.0    85.5    88.5    87.7    91.5    95.0
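
R's sd() computes the sample standard deviation, dividing by n - 1 rather than n. The sketch below recomputes it by hand for the same scores; manual_sd is an illustrative name.

```r
scores <- c(88, 92, 79, 85, 90, 95, 87, 78, 94, 89)
n <- length(scores)

# Sample standard deviation: squared deviations summed, divided by n - 1
manual_sd <- sqrt(sum((scores - mean(scores))^2) / (n - 1))

all.equal(manual_sd, sd(scores))   # TRUE
round(manual_sd, 2)                # 5.74

# First quartile with the default quantile method
unname(quantile(scores, 0.25))     # 85.5
```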
