CSV Files in R

Uploaded by

Vandana Monappa

Function   Syntax                                            Works On                    Output
apply()    apply(X, MARGIN, FUN, ...)                        Matrix / Array              Vector / Array
lapply()   lapply(X, FUN, ...)                               List / Vector               List
sapply()   sapply(X, FUN, ..., simplify=TRUE)                List / Vector               Vector / Matrix
mapply()   mapply(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE)    Multiple vectors / lists    Vector / List
tapply()   tapply(X, INDEX, FUN, ..., simplify=TRUE)         Vector + Factor (groups)    Array / List

A vector contains the numbers from 1 to 5. Write a program in R to apply the square
function to each element using lapply().

# Input vector
v <- 1:5

# Apply square function using lapply


result <- lapply(v, function(x) x^2)

print(result)
Given a 3x3 matrix of numbers, calculate the sum of each row using
the apply() function.

# Create a 3x3 matrix


m <- matrix(1:9, nrow = 3, byrow = TRUE)
print(m)

# Apply sum function across rows (MARGIN = 1 → rows)
row_sums <- apply(m, 1, sum)

print(row_sums)
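
The same call works column-wise by changing MARGIN. A minimal sketch, assuming the same 3x3 matrix as above (the name col_sums is illustrative):

```r
# Same 3x3 matrix as in the row example
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# MARGIN = 2 applies the function to each column
col_sums <- apply(m, 2, sum)
print(col_sums)  # 12 15 18
```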
Given a vector of numbers from 1 to 5, find the square of each element
using sapply().

v <- 1:5

# Use sapply to square each number


result <- sapply(v, function(x) x^2)

print(result)
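
The lapply() and sapply() examples compute the same values; only the shape of the result differs. A small sketch to compare them side by side (the variable names as_list and as_vector are illustrative):

```r
v <- 1:5

as_list   <- lapply(v, function(x) x^2)  # always returns a list
as_vector <- sapply(v, function(x) x^2)  # simplifies to a vector when it can

print(class(as_list))    # "list"
print(class(as_vector))  # "numeric"

# Flattening the list gives the same values as sapply()
print(identical(unlist(as_list), as_vector))  # TRUE
```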
# tapply()
ages <- c(21, 23, 25, 30, 40, 45)
gender <- c("F", "M", "F", "M", "F", "M")
tapply(ages, gender, mean)
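
The table also lists mapply(), which has no example above. A minimal sketch, assuming two illustrative vectors x and y of equal length:

```r
# mapply() walks over several vectors in parallel
x <- 1:3
y <- 4:6

# The function receives one element of x and one element of y per call
result <- mapply(function(a, b) a + b, x, y)
print(result)  # 5 7 9
```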
Getting and Setting the Working Directory

•Before working with CSV files, it is important to know and set the working directory where your CSV files are stored.

print(getwd())
[1] "C:/Users/VANDANA/OneDrive/Desktop/2025 odd/prog"

setwd("/Example_Path/")

print(getwd())
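
setwd() stops with an error if the folder does not exist, so it can help to check first. A cautious sketch; tempdir() is only a stand-in for your own data folder, and the names old_wd and target are illustrative:

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Stand-in folder (assumption): replace with the folder holding your CSV files
target <- tempdir()

# Only switch if the folder really exists
if (dir.exists(target)) {
  setwd(target)
}
print(getwd())

# Restore the original working directory
setwd(old_wd)
```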
Sample CSV File Example
Consider the following sample CSV data saved as sample.csv:
id,name,department,salary,projects
1,A,IT,60754,4
2,B,Tech,59640,2
3,C,Marketing,69040,8
4,D,Marketing,65043,5
5,E,Tech,59943,2
6,F,IT,65000,5
7,G,HR,69000,7
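
To follow along without creating the file by hand, the sample data can be written from R first. A sketch that writes it to a temporary folder (the tempdir() location and the names csv_lines and csv_path are assumptions for illustration):

```r
# The sample rows from the slide, one string per line of the file
csv_lines <- c(
  "id,name,department,salary,projects",
  "1,A,IT,60754,4",
  "2,B,Tech,59640,2",
  "3,C,Marketing,69040,8",
  "4,D,Marketing,65043,5",
  "5,E,Tech,59943,2",
  "6,F,IT,65000,5",
  "7,G,HR,69000,7"
)

# Write the file into R's temporary folder and read it back
csv_path <- file.path(tempdir(), "sample.csv")
writeLines(csv_lines, csv_path)

csv_data <- read.csv(csv_path)
str(csv_data)
```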
Reading CSV Files into R

•We can load a CSV file into R as a data frame using the read.csv() function.

•The ncol() and nrow() functions return the number of columns and rows in the data frame, respectively.

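# Read the CSV file into a data frame
csv_data <- read.csv(file = 'C:\\Users\\GFG19565\\Downloads\\sample.csv')

# Display the data (return() only works inside a function, so use print())
print(csv_data)

# Number of columns and rows
print(ncol(csv_data))
print(nrow(csv_data))

# Smallest number of projects handled by any employee
min_pro <- min(csv_data$projects)
print(min_pro)

# Employees earning more than 60000: keep only the name and salary columns
result <- csv_data[csv_data$salary > 60000, c("name", "salary")]
print(result)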
1. Calculate average salary per department and write to CSV

result <- tapply(csv_data$salary, csv_data$department, mean)

result_df <- data.frame(Department = names(result),
                        AverageSalary = as.vector(result))

write.csv(result_df, "Mean_salary.csv", row.names = FALSE)


2. Calculate total number of projects handled per department and write to CSV
•The tapply() function is used to compute the total number of projects handled in each department.
•The result is converted into a data frame for better structure and then written to a CSV file named department_project_totals.csv.

total_projects <- tapply(csv_data$projects, csv_data$department, sum)

# as.vector() drops the names so the column holds plain numbers
projects_df <- data.frame(Department = names(total_projects),
                          TotalProjects = as.vector(total_projects))

write.csv(projects_df, "department_project_totals.csv", row.names = FALSE)
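
Both summaries can be checked against the sample data without touching the disk. A sketch that reads the same rows from an inline string (the names csv_text, avg_salary and summary_df are illustrative):

```r
# Sample data from the slide, supplied inline instead of from a file
csv_text <- "id,name,department,salary,projects
1,A,IT,60754,4
2,B,Tech,59640,2
3,C,Marketing,69040,8
4,D,Marketing,65043,5
5,E,Tech,59943,2
6,F,IT,65000,5
7,G,HR,69000,7"

csv_data <- read.csv(text = csv_text)

# Group by department: mean salary and total projects
avg_salary     <- tapply(csv_data$salary,   csv_data$department, mean)
total_projects <- tapply(csv_data$projects, csv_data$department, sum)

# Both results are ordered by department name, so they line up row by row
summary_df <- data.frame(Department    = names(avg_salary),
                         AverageSalary = as.vector(avg_salary),
                         TotalProjects = as.vector(total_projects))
print(summary_df)
# HR 69000.0 7; IT 62877.0 9; Marketing 67041.5 13; Tech 59791.5 4
```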


Calculate

CTR (%) = (Clicks ÷ Impressions) × 100


Engagement Score = Likes + Shares + Comments
Engagement Rate (%) = (Engagement Score ÷ Reach) × 100
Sentiment Distribution = Count of Positive, Neutral, Negative posts
CPC (₹) = Total Ad Cost ÷ Clicks (only for Paid = Yes)

•ifelse(condition, value_if_true, value_if_false) → vectorized conditional function in R
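
A small sketch of how ifelse() evaluates the condition element by element; the vectors clicks and ad_cost hold made-up numbers for illustration:

```r
# Hypothetical values for three posts
clicks  <- c(10, 0, 25)
ad_cost <- c(50, 20, 100)

# Where clicks > 0 the cost per click is computed; elsewhere the result is NA
cpc <- ifelse(clicks > 0, ad_cost / clicks, NA)
print(cpc)  # 5 NA 4
```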
# Read CSV
df <- read.csv("data.csv", stringsAsFactors = FALSE)

# Step 1: CTR (Click Through Rate) = (Clicks / Impressions) * 100


df$CTR_pct <- (df$Clicks / df$Impressions) * 100

# Step 2: Engagement Score = Likes + Shares + Comments


df$Engt_Score <- df$Likes + df$Shares + df$Comments

# Step 3: Engagement Rate (%) = (Engagement Score / Reach) * 100


df$Engt_Rate_pct <- (df$Engt_Score / df$Reach) * 100
# Step 4: CPC (Cost per Click) = Ad_Cost / Clicks (only for Paid posts)
df$CPC <- ifelse(df$Paid == "Yes" & df$Clicks > 0, df$Ad_Cost / df$Clicks, NA)

# Step 5: Round off values


df$CTR_pct <- round(df$CTR_pct, 2)
df$Engt_Rate_pct <- round(df$Engt_Rate_pct, 2)
df$CPC <- round(df$CPC, 2)

# Final Output
print(df)
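
The steps above do not cover the sentiment distribution from the list of metrics. One way to count it is with table(); this sketch assumes the data has a column named Sentiment holding Positive, Neutral or Negative, and uses a made-up data frame:

```r
# Hypothetical data: the Sentiment column name and its values are assumptions
df <- data.frame(
  Post_ID   = 1:5,
  Sentiment = c("Positive", "Neutral", "Positive", "Negative", "Positive"),
  stringsAsFactors = FALSE
)

# Fixing the levels keeps a category in the output even when its count is 0
sentiment_levels <- c("Positive", "Neutral", "Negative")
sentiment_counts <- table(factor(df$Sentiment, levels = sentiment_levels))
print(sentiment_counts)  # Positive 3, Neutral 1, Negative 1

# Share of each category in percent
sentiment_pct <- round(100 * prop.table(sentiment_counts), 1)
print(sentiment_pct)     # 60, 20, 20
```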
Problem Statement
You are given a dataset containing Post_ID and Comment_Text from social media.
Your task is to:
•Perform sentiment analysis on each comment.
•Categorize each comment as Positive, Negative, or Neutral.
•Calculate the percentage of positive engagement per post.

Post_ID Comment_Text
101 Love this post! Very inspiring ❤️
101 Not useful at all, waste of time.
101 Nice effort, keep going.
102 This update is terrible 😡
102 I enjoyed reading this, very helpful.
103 Okay post, nothing special.
# Install once (if not already installed)
install.packages("tidyverse") # For data manipulation
install.packages("tidytext") # For text mining
install.packages("textdata") # For sentiment lexicons

# Load libraries
library(tidyverse)
library(tidytext)
library(textdata)
1. tidyverse
•The tidyverse is a collection of R packages designed for data
analytics.
•All packages in the tidyverse share a common philosophy: tidy
data (where each column is a variable, each row is an
observation, and each cell is a single value).
•When you load library(tidyverse), it automatically loads several
core packages for data manipulation (filter, select, group,
summarize, etc.) and visualization.
2. tidytext

•The tidytext package is for text mining using tidy data principles.

•It transforms unstructured text into structured formats.

•Key functions:

•unnest_tokens() → Breaks text into tokens (words, bigrams, sentences).

•get_sentiments() → Fetches sentiment lexicons.


3. textdata
•The textdata package provides datasets and lexicons commonly
used in text analysis and NLP.
•Instead of downloading sentiment dictionaries manually, you can use textdata
to fetch them directly from R.
•Examples of resources it provides:
•Sentiment lexicons:
•AFINN (numerical score −5 to +5)
•Bing (positive/negative)
The AFINN lexicon is a list of English words rated for valence with an integer
between -5 (very negative) and +5 (very positive). It’s widely used in sentiment
analysis.

Word Score
love +3
happy +3
good +3
nice +3
excellent +5
win +4
wow +3
bad -3

Example sentence: “I love this product, it is excellent!”

love = +3, excellent = +5 → total sentiment = +8 (positive)
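
The arithmetic above can be reproduced in base R. This sketch uses only the scores listed in the table above, not the full AFINN lexicon, and the name afinn_subset is illustrative:

```r
# Scores copied from the table above (a small subset, for illustration only)
afinn_subset <- c(love = 3, happy = 3, good = 3, nice = 3,
                  excellent = 5, win = 4, wow = 3, bad = -3)

sentence <- "I love this product, it is excellent!"

# Lower-case the text, drop punctuation, and split on spaces
words <- strsplit(gsub("[^a-z ]", "", tolower(sentence)), " +")[[1]]

# Look up each word; words missing from the table give NA and are ignored
scores <- afinn_subset[words]
total  <- sum(scores, na.rm = TRUE)
print(total)  # 8
```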
What is Bing Lexicon?
It is a dictionary-based sentiment lexicon that classifies words as either:
Positive 👍
Negative 👎
👉 It does not assign numeric strength (like AFINN does) or multiple emotions (like
NRC).
It’s a binary classification: just positive or negative.
Step 2: Prepare Dataset
Imagine we have a CSV file comments.csv like this:

Post_ID Comment_Text
101 Love this post! Very inspiring ❤️
101 Not useful at all, waste of time.
101 Nice effort, keep going.
102 This update is terrible 😡
102 I enjoyed reading this, very helpful.
103 Okay post, nothing special.
comments <- read.csv("comments.csv", stringsAsFactors = FALSE)
print(comments)

🔹 Step 3: Load Sentiment Dictionary


We use the Bing lexicon, which has a list of positive & negative words.

bing <- get_sentiments("bing")


head(bing)
tidytext
🔹 Step 4: Break Comments into Words
We split each comment into individual words (called tokenization).

# The first argument of unnest_tokens() is named tbl
comments_words <- unnest_tokens(tbl = comments,
                                output = word,
                                input = Comment_Text)
print(comments_words)
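
To see what tokenization produces without installing any package, here is a rough base R approximation. It is not how unnest_tokens() is implemented; it only mimics the lower-casing and punctuation removal on two of the sample comments:

```r
# Two of the sample comments (emoji left out for simplicity)
comments <- data.frame(
  Post_ID      = c(101, 102),
  Comment_Text = c("Nice effort, keep going.", "This update is terrible"),
  stringsAsFactors = FALSE
)

# Lower-case, then turn anything that is not a letter, digit or apostrophe into a space
clean  <- gsub("[^a-z0-9']+", " ", tolower(comments$Comment_Text))
tokens <- strsplit(trimws(clean), " +")

# One row per word, repeating the Post_ID for every word of a comment
comments_words <- data.frame(
  Post_ID = rep(comments$Post_ID, lengths(tokens)),
  word    = unlist(tokens),
  stringsAsFactors = FALSE
)
print(comments_words)
```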
🔹 Step 5: Match Words with Sentiments
Now, we join our comment words with the Bing sentiment dictionary.

sentiment_data <- inner_join(comments_words, bing, by = "word")

print(sentiment_data)

How an inner join works (only rows with a matching key in both tables are kept):

Left table     Right table     Inner join on ID
ID Name        ID Score        ID Name  Score
1  Asha        2  88           2  Ravi  88
2  Ravi        3  92           3  Meera 92
3  Meera       4  76           4  Kiran 76
4  Kiran       5  85

comments_words     bing (subset)
word               word    sentiment
happy              happy   positive
sad                joy     positive
exam               sad     negative
joy                failure negative
failure

inner_join(comments_words, bing, by = "word") →

word sentiment
happy positive
sad negative
joy positive
failure negative
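
The problem statement also asks for a category per comment and the percentage of positive comments per post. One possible way to finish, sketched in base R: it assumes each comment has a Comment_ID (not present in the original data), and the matched words and their labels below are hand-typed for illustration rather than taken from the real Bing lexicon.

```r
# Hypothetical input: one row per comment, with an assumed Comment_ID column
comments <- data.frame(
  Comment_ID = 1:6,
  Post_ID    = c(101, 101, 101, 102, 102, 103)
)

# Hypothetical join result: one row per matched word (labels are illustrative)
matched <- data.frame(
  Comment_ID = c(1, 1, 2, 2, 3, 4, 5, 5),
  word       = c("love", "inspiring", "useful", "waste",
                 "nice", "terrible", "enjoyed", "helpful"),
  sentiment  = c("positive", "positive", "positive", "negative",
                 "positive", "negative", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Net score per comment: +1 for each positive word, -1 for each negative word
matched$value <- ifelse(matched$sentiment == "positive", 1, -1)
net <- aggregate(value ~ Comment_ID, data = matched, FUN = sum)

# Left join keeps comments with no matched words; they get a net score of 0
scored <- merge(comments, net, by = "Comment_ID", all.x = TRUE)
scored$value[is.na(scored$value)] <- 0

# Categorize each comment from its net score
scored$Category <- ifelse(scored$value > 0, "Positive",
                   ifelse(scored$value < 0, "Negative", "Neutral"))

# Percentage of positive comments per post
pct_positive <- tapply(scored$Category == "Positive", scored$Post_ID, mean) * 100
print(round(pct_positive, 2))  # 101: 66.67, 102: 50, 103: 0
```

Counting words this way ignores negation, so a comment such as "Not useful at all" is not recognized as negative by its wording alone.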
