B.TECH 5th AIML
R Programming Lab Manual
Sr. No. | Name of Experiment | Date of Exp. | Date of Submission | Remark
1 Importing and cleaning data :
In this experiment, students will learn how to
import data from a variety of sources, such as CSV
files, Excel files, and databases. They will also
learn how to clean data by removing missing
values, outliers, and duplicate rows.
2 Data wrangling
In this experiment, students will learn how to
transform data by changing the data types,
merging data sets, and creating new variables.
They will also learn how to explore data by using
statistical methods such as descriptive statistics
and hypothesis testing.
3 Data visualization
In this experiment, students will learn how to
create effective data visualizations using R. They
will learn how to choose the right type of plot for
the data, how to customize plots, and how to save
plots.
4 Statistical analysis
In this experiment, students will learn how to
conduct descriptive and inferential statistical
analysis using R. They will learn how to calculate
descriptive statistics, such as mean, median, and
standard deviation. They will also learn how to
conduct hypothesis testing to determine if there is
a statistically significant difference between two
groups.
5 Machine learning
In this experiment, students will learn how to
apply machine learning algorithms to solve real-
world problems. They will learn how to train and
evaluate machine learning models, and how to use
machine learning models to make predictions.
6 Design an experiment to determine the effect of
different types of fertilizer on plant growth. This
experiment allows students to explore the factors
that affect plant growth.
7 This experiment allows students to explore the
relationship between food and energy.
8 Design an experiment to determine the effect of
different types of light on the growth of plants.
This experiment allows students to explore the role
of light in plant growth.
9 Design an experiment to determine the effect of
different types of soil on the growth of plants. This
experiment allows students to explore the role of
soil in plant growth.
Computer Science & Engineering Department
RSR Rungta College of Engineering & Technology Bhilai
Experiment 1:
Importing and cleaning data
In this experiment, students will learn how to import data from a variety of sources, such
as CSV files, Excel files, and databases. They will also learn how to clean data by removing
missing values, outliers, and duplicate rows.
Importing Data from CSV Files
CSV files are commonly used for storing data, and they can be easily imported into R using
read_csv() from the readr package or the base R function read.csv().
# Load the 'readr' library for CSV import
library(readr)
# Import a CSV file
data_csv <- read_csv("path_to_file.csv")
# Display the first few rows of the dataset
head(data_csv)
Alternatively, you can use the base R function read.csv():
data_csv <- read.csv("path_to_file.csv")
# Display the first few rows of the dataset
head(data_csv)
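The import step can be tried without an existing file. The sketch below is only an illustration: it writes a small, hypothetical data frame (sample_df, with made-up ID and Score columns) to a temporary CSV file and reads it back with base R.

```r
# Hypothetical sample data, written to a temporary file so the example is self-contained
sample_df <- data.frame(ID = 1:3, Score = c(85, 90, 87))
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_df, csv_path, row.names = FALSE)

# stringsAsFactors = FALSE keeps any text columns as character
data_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect column names and types before doing any cleaning
str(data_csv)
```

Checking str() right after import is a cheap way to catch columns that were read with an unexpected type.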
Importing Data from Excel Files
To import Excel files in R, you will need the readxl or openxlsx package.
# Load the 'readxl' library
library(readxl)
# Import an Excel file (first sheet)
data_excel <- read_excel("path_to_file.xlsx", sheet = 1)
# Display the first few rows of the dataset
head(data_excel)
Alternatively, you can use the openxlsx package for more advanced Excel file manipulation:
library(openxlsx)
# Import data from an Excel file
data_excel <- read.xlsx("path_to_file.xlsx", sheet = 1)
# Display the first few rows of the dataset
head(data_excel)
Importing Data from a Database (e.g., MySQL, SQLite)
To import data from a database, you can use the DBI and RMySQL (or RSQLite for SQLite
databases) packages.
# Install the necessary libraries (needed only once)
install.packages("DBI")
install.packages("RMySQL")
library(DBI)
library(RMySQL)
# Connect to a MySQL database
con <- dbConnect(RMySQL::MySQL(), dbname = "your_database_name", host = "localhost",
                 user = "your_username", password = "your_password")
# Query data from a table
data_db <- dbGetQuery(con, "SELECT * FROM your_table_name")
# Display the first few rows of the dataset
head(data_db)
# Close the connection
dbDisconnect(con)
Cleaning the Data
Handling Missing Values
Handling missing values is crucial to ensure that the analysis is not biased or incomplete. There
are various strategies for dealing with missing values, such as removing or imputing them.
# Count the missing values in the dataset
sum(is.na(data_csv))
# Option 1: Remove rows with any missing values
data_no_missing <- na.omit(data_csv)
# Option 2: Impute missing values (e.g., using the mean or median)
data_imputed <- data_csv
data_imputed$column_name[is.na(data_imputed$column_name)] <-
  mean(data_imputed$column_name, na.rm = TRUE)
# Alternatively, for median imputation (use instead of the mean imputation above):
data_imputed$column_name[is.na(data_imputed$column_name)] <-
  median(data_imputed$column_name, na.rm = TRUE)
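To see what the two options do, the following minimal sketch applies them to a small, hypothetical data frame (scores_df is an assumption made for illustration, not part of the lab data).

```r
# Hypothetical data frame with two missing scores
scores_df <- data.frame(ID = 1:5, Score = c(80, NA, 90, NA, 100))

# Count missing values per column
print(colSums(is.na(scores_df)))

# Mean imputation: the mean of the observed values (80, 90, 100) is 90
mean_imputed <- scores_df
mean_imputed$Score[is.na(mean_imputed$Score)] <- mean(scores_df$Score, na.rm = TRUE)

# Listwise deletion keeps only the complete rows
complete_rows <- na.omit(scores_df)

print(mean_imputed)
print(nrow(complete_rows))  # 3 complete rows remain
```

Deletion shrinks the sample, while imputation keeps every row but reduces the variability of the imputed column, so the choice depends on how much data is missing.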
Removing Duplicate Rows:
# Check for duplicate rows
duplicates <- duplicated(data_csv)
sum(duplicates)  # Number of duplicated rows
# Remove duplicate rows
data_no_duplicates <- data_csv[!duplicated(data_csv), ]
Detecting and Handling Outliers:
# Calculate the first and third quartiles (ignoring missing values)
Q1 <- quantile(data_csv$column_name, 0.25, na.rm = TRUE)
Q3 <- quantile(data_csv$column_name, 0.75, na.rm = TRUE)
# Interquartile range (named iqr_value to avoid confusion with the base function IQR())
iqr_value <- Q3 - Q1
# Define the lower and upper bounds for outliers
lower_bound <- Q1 - 1.5 * iqr_value
upper_bound <- Q3 + 1.5 * iqr_value
# Identify outliers
outliers <- data_csv$column_name < lower_bound | data_csv$column_name > upper_bound
sum(outliers)  # Number of outliers
# Remove outliers
data_no_outliers <- data_csv[!outliers, ]
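The 1.5 x IQR rule can be checked by hand on a short vector. This is a small sketch with made-up numbers (values is hypothetical); the single extreme value 100 is the only point outside the bounds.

```r
# Hypothetical measurements with one obvious extreme value
values <- c(10, 12, 11, 13, 12, 11, 100)

q1 <- quantile(values, 0.25, names = FALSE)   # 11
q3 <- quantile(values, 0.75, names = FALSE)   # 12.5
iqr_value <- q3 - q1                          # 1.5

lower_bound <- q1 - 1.5 * iqr_value           # 8.75
upper_bound <- q3 + 1.5 * iqr_value           # 14.75

# Logical vector marking the values outside the bounds
is_outlier <- values < lower_bound | values > upper_bound

print(values[is_outlier])    # flagged values
print(values[!is_outlier])   # retained values
```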
Saving the Cleaned Data:
# Save the cleaned data to a CSV file
write.csv(data_no_duplicates, "cleaned_data.csv", row.names = FALSE)
# Save the cleaned data to an Excel file
library(openxlsx)
write.xlsx(data_no_duplicates, "cleaned_data.xlsx")
Experiment 2:
Data wrangling
In this experiment, students will learn how to transform data by changing the data types,
merging data sets, and creating new variables. They will also learn how to explore data by
using statistical methods such as descriptive statistics and hypothesis testing.
Data Transformation
1. Changing Data Types
Sometimes, the data types of your variables might need to be changed for effective analysis. In
R, you can use functions like as.numeric(), as.character(), and as.factor() to change data types.
# Example dataset
data <- data.frame(
  ID = c(1, 2, 3, 4),
  Date = c('2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01'),
  Score = c('85', '90', '87', '88')
)
# Change 'Score' from character to numeric
data$Score <- as.numeric(data$Score)
# Change 'Date' from character to Date type
data$Date <- as.Date(data$Date)
# Change 'ID' to factor
data$ID <- as.factor(data$ID)
# View the data types
str(data)
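One detail worth knowing when converting types: calling as.numeric() directly on a factor returns its internal level codes, not the values it displays. The short sketch below uses a hypothetical factor (score_factor) to show the difference.

```r
# Hypothetical scores that were read in as a factor
score_factor <- factor(c("85", "90", "87", "90"))

# Direct conversion returns the level codes (levels are "85", "87", "90")
codes <- as.numeric(score_factor)

# Converting through character first recovers the actual numbers
scores <- as.numeric(as.character(score_factor))

print(codes)   # 1 3 2 3
print(scores)  # 85 90 87 90
```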
2. Merging Datasets:
# Example data frames to merge
df1 <- data.frame(ID = c(1, 2, 3, 4), Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = c(1, 2, 3, 5), Score = c(85, 90, 87, 88))
# Merge on the 'ID' column; all = FALSE (the default) gives an inner join
merged_data <- merge(df1, df2, by = "ID", all = FALSE)
# View the merged data
print(merged_data)
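The inner join above drops ID 4 and ID 5 because each appears in only one data frame. As a brief sketch, the all.x and all arguments of merge() give the other common join types on the same two data frames.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4), Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = c(1, 2, 3, 5), Score = c(85, 90, 87, 88))

# Inner join: only IDs present in both data frames (1, 2, 3)
inner_join <- merge(df1, df2, by = "ID")

# Left join: every row of df1; Score is NA for ID 4
left_join <- merge(df1, df2, by = "ID", all.x = TRUE)

# Full outer join: IDs 1 to 5; unmatched cells are NA
full_join <- merge(df1, df2, by = "ID", all = TRUE)

print(left_join)
print(full_join)
```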
3. Creating New Variables:
# Create a new variable 'TotalScore' by adding 10 to each Score
data$TotalScore <- data$Score + 10
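# Create a new categorical variable based on a condition
# (none of the example scores exceeds 90, so every row is labelled "Low";
# lower the threshold to see both categories)
data$Performance <- ifelse(data$Score > 90, "High", "Low")
# View the updated dataset
head(data)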
Data Exploration with Statistical Methods :
1. Descriptive Statistics
Descriptive statistics help summarize the main characteristics of a dataset. In R, you can use
functions like summary(), mean(), median(), sd(), and table() to explore data.
# Summary of the data
summary(data)
# Mean and standard deviation of 'Score'
mean_score <- mean(data$Score)
sd_score <- sd(data$Score)
# Median of 'Score'
median_score <- median(data$Score)
# Frequency table of 'Performance'
table(data$Performance)
2. Visualizing Data (Descriptive Exploration):
# Basic histogram of 'Score'
hist(data$Score, main = "Histogram of Scores", xlab = "Score", col = "lightblue",
     border = "black")
# Boxplot of 'Score' to detect outliers
boxplot(data$Score, main = "Boxplot of Scores", ylab = "Score", col = "lightgreen")
# Bar plot for 'Performance' category
barplot(table(data$Performance), main = "Performance Distribution", col = c("blue", "red"))
If you are using the ggplot2 package for visualization:
# Install (once) and load the ggplot2 package
install.packages("ggplot2")
library(ggplot2)
# Scatter plot of Score vs TotalScore
ggplot(data, aes(x = Score, y = TotalScore)) +
  geom_point() +
  ggtitle("Score vs TotalScore") +
  xlab("Score") +
  ylab("Total Score")
Experiment 3:
Data visualization
In this experiment, students will learn how to create effective data visualizations using R.
They will learn how to choose the right type of plot for the data, how to customize plots,
and how to save plots.
Step 1: Installing and Loading Required Libraries
To get started with data visualization in R, we’ll use two plotting systems:
Base R plotting functions (e.g., plot(), hist(), boxplot())
ggplot2: A powerful and flexible package for creating visually appealing plots.
# Install ggplot2 if not already installed
install.packages("ggplot2")
# Load the ggplot2 library
library(ggplot2)
Creating Basic Plots in R
1. Histogram (for Distribution of a Single Variable)
# Example data for the histogram
data <- c(85, 90, 87, 88, 92, 95, 91, 89, 88, 86)
# Basic histogram in Base R
hist(data, main = "Histogram of Scores", xlab = "Scores", col = "lightblue", border = "black")
# Histogram with ggplot2
ggplot(data = data.frame(Scores = data), aes(x = Scores)) +
  geom_histogram(binwidth = 2, fill = "lightblue", color = "black", alpha = 0.7) +
  ggtitle("Histogram of Scores") +
  xlab("Scores") +
  ylab("Frequency")
2. Box Plot (for Distribution and Outliers)
# Box plot using Base R
boxplot(data, main = "Boxplot of Scores", ylab = "Scores", col = "lightgreen")
# Box plot using ggplot2
ggplot(data = data.frame(Scores = data), aes(y = Scores)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  ggtitle("Boxplot of Scores") +
  ylab("Scores")
Customizing Plots:
1. Customizing Base R Plots
# Customize a histogram (breaks suggests the number of bins)
hist(data, main = "Customized Histogram of Scores", xlab = "Scores", col = "lightblue",
     border = "black", breaks = 5)
# Example x and y values for the scatter plot (illustrative data)
x <- 1:10
y <- x^2
# Scatter plot with a custom title, axis labels, point style, and gridlines
plot(x, y, main = "Customized Scatter Plot", xlab = "X Values", ylab = "Y Values", pch = 19,
     col = "blue")
grid()
Saving Plots:
Plots can be saved to a file in various formats such as PNG, JPEG, or PDF using the ggsave()
function (for ggplot2 plots) or base R functions like png(), jpeg(), or pdf().
# Example values for the plots below (illustrative data)
x <- 1:10
y <- x^2
time <- 1:10
value <- cumsum(1:10)
# Save as PNG
png("scatter_plot.png")
plot(x, y, main = "Scatter Plot", xlab = "X", ylab = "Y", pch = 19, col = "blue")
dev.off()  # Close the device so the file is written
# Save as PDF
pdf("line_plot.pdf")
plot(time, value, type = "o", main = "Line Plot Example", xlab = "Time", ylab = "Value",
     col = "blue")
dev.off()
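The open-device, plot, close-device pattern can be verified by checking that the file really exists afterwards. The sketch below is illustrative only: it writes to a temporary path (plot_path is a hypothetical name) and sets the page size in inches.

```r
# Temporary output path so the example does not clutter the working directory
plot_path <- tempfile(fileext = ".pdf")

# Open the PDF device; width and height are given in inches
pdf(plot_path, width = 6, height = 4)
x <- 1:10
plot(x, x^2, type = "b", main = "Saved plot", xlab = "x", ylab = "x squared")

# Nothing is guaranteed to be on disk until the device is closed
invisible(dev.off())

print(file.exists(plot_path))
```

If a saved file turns out empty or unreadable, a missing dev.off() call is the usual cause.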
Experiment 4:
Statistical analysis
In this experiment, students will learn how to conduct descriptive and inferential statistical
analysis using R. They will learn how to calculate descriptive statistics, such as mean,
median, and standard deviation. They will also learn how to conduct hypothesis testing to
determine if there is a statistically significant difference between two groups.
Calculating Descriptive Statistics
Descriptive statistics include measures of central tendency (mean, median), dispersion (standard
deviation, variance), and shape (skewness, kurtosis).
# Example data
data <- c(23, 45, 56, 67, 45, 23, 56, 78, 90, 34, 56, 45)
# Mean
mean_data <- mean(data)
cat("Mean:", mean_data, "\n")
# Median
median_data <- median(data)
cat("Median:", median_data, "\n")
# Standard Deviation
sd_data <- sd(data)
cat("Standard Deviation:", sd_data, "\n")
# Variance
variance_data <- var(data)
cat("Variance:", variance_data, "\n")
# Minimum and Maximum values
min_data <- min(data)
max_data <- max(data)
cat("Min:", min_data, "Max:", max_data, "\n")
# Summary (gives min, 1st quartile, median, mean, 3rd quartile, max)
summary_data <- summary(data)
cat("Summary:\n")
print(summary_data)  # print() keeps the labels that cat() would drop
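It can help to reproduce these numbers from their definitions. The sketch below recomputes the mean, variance, and standard deviation of the same example data by hand and compares them with the built-in functions; note that var() and sd() use the sample (n - 1) denominator.

```r
x <- c(23, 45, 56, 67, 45, 23, 56, 78, 90, 34, 56, 45)
n <- length(x)

# Mean: sum of the values divided by the number of values
mean_manual <- sum(x) / n                        # 51.5

# Sample variance: squared deviations divided by n - 1
var_manual <- sum((x - mean_manual)^2) / (n - 1)
sd_manual <- sqrt(var_manual)

# Both comparisons should print TRUE
print(all.equal(var_manual, var(x)))
print(all.equal(sd_manual, sd(x)))
print(median(x))                                 # 50.5
```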
Inferential Statistics :
One-Sample t-Test
A one-sample t-test is used to determine if the sample mean is significantly different from a
known value (typically the population mean).
# One-sample t-test to test if the mean is different from 50
t_test_one_sample <- t.test(data, mu = 50)
cat("One-Sample t-Test Results:\n")
print(t_test_one_sample)
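The experiment's stated goal also covers comparing two groups. As a minimal sketch, the example data is split into two halves that are treated as two hypothetical groups (group_a and group_b are assumptions made for illustration), and Welch's two-sample t-test, the default in t.test(), is checked against the formula for its statistic.

```r
# Two hypothetical groups: the first and second halves of the example data
group_a <- c(23, 45, 56, 67, 45, 23)
group_b <- c(56, 78, 90, 34, 56, 45)

# Welch's two-sample t-test (var.equal = FALSE is the default)
welch <- t.test(group_a, group_b)

# Same statistic by hand: difference in means over its standard error
se <- sqrt(var(group_a) / length(group_a) + var(group_b) / length(group_b))
t_manual <- (mean(group_a) - mean(group_b)) / se

print(welch)

# Decision at the 5% significance level
decision <- if (welch$p.value < 0.05) "reject H0" else "fail to reject H0"
print(decision)
```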
Chi-Square Test for Independence
A chi-square test is used to determine whether there is an association between two categorical
variables.
# Contingency table for gender and smoking status
smoking_data <- data.frame(
  Gender = c("Male", "Female"),
  Non_Smoker = c(40, 60),
  Smoker = c(10, 20)
)
# Perform the Chi-Square test on the count columns (the Gender column is dropped)
chisq_test <- chisq.test(smoking_data[, -1])
cat("Chi-Square Test Results:\n")
print(chisq_test)
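The test compares the observed counts with the counts expected under independence, which are (row total x column total) / grand total. The sketch below rebuilds that calculation for the same table; correct = FALSE turns off the continuity correction that chisq.test() applies to 2 x 2 tables by default, so that the hand-computed statistic matches.

```r
# Same counts as above, stored as a matrix (filled column by column)
counts <- matrix(c(40, 60, 10, 20), nrow = 2,
                 dimnames = list(Gender = c("Male", "Female"),
                                 Status = c("Non_Smoker", "Smoker")))

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((counts - expected)^2 / expected)

# Built-in test without the continuity correction
result <- chisq.test(counts, correct = FALSE)

print(expected)
print(chi_sq)
print(result$p.value)
```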
Experiment 5:
Machine learning
In this experiment, students will learn how to apply machine learning algorithms to solve
real-world problems. They will learn how to train and evaluate machine learning models,
and how to use machine learning models to make predictions.
1. Setting Up the Environment
Install necessary libraries:
install.packages(c("caret", "randomForest", "e1071", "ggplot2"))
library(caret)
library(randomForest)
library(e1071)
library(ggplot2)
2. Understanding the Data
Students will begin by loading a dataset and performing basic exploration.
Example: Using the `iris` dataset:
data(iris)
str(iris)      # Check the structure of the data
summary(iris)  # Summary statistics of the dataset
3. Data Preprocessing
Clean the data by checking for missing values and normalizing or scaling if necessary.
sum(is.na(iris))  # Check for missing values
4. Splitting the Data
Split the dataset into training and testing sets (typically 80% training and 20% testing).
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]
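createDataPartition() samples within each class so that the class proportions are preserved. As a rough sketch of the same idea in base R (not caret's actual implementation), the following draws 80% of the row indices separately for each species; train_index, train_data, and test_data are names chosen for this example.

```r
set.seed(123)

# Row indices grouped by species, then 80% sampled within each group
index_by_species <- split(seq_len(nrow(iris)), iris$Species)
train_index <- unlist(
  lapply(index_by_species, function(idx) sample(idx, size = round(0.8 * length(idx)))),
  use.names = FALSE
)

train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# Each species contributes 40 training rows and 10 test rows
print(table(train_data$Species))
print(table(test_data$Species))
```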
5. Training a Model
Example: Training a random forest on the training data with randomForest().
model_rf <- randomForest(Species ~ ., data = trainData)
print(model_rf)  # Print model summary
6. Evaluating the Model
predictions <- predict(model_rf, newdata = testData)
confusionMatrix(predictions, testData$Species)
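The headline numbers reported by confusionMatrix() come from a cross-tabulation of predicted against actual labels. The sketch below builds that table with base R for a small set of hypothetical labels (actual and predicted are made up for illustration) and derives accuracy and per-class recall from it.

```r
species_levels <- c("setosa", "versicolor", "virginica")

# Hypothetical true and predicted labels for six test flowers
actual <- factor(c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"),
                 levels = species_levels)
predicted <- factor(c("setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"),
                    levels = species_levels)

# Rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(cm)) / sum(cm)   # 5 of 6 correct

# Recall per class: correct predictions over the true count of each class
recall <- diag(cm) / colSums(cm)

print(cm)
print(accuracy)
print(recall)
```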
7. Making Predictions
new_data <- data.frame(Sepal.Length = 5.1, Sepal.Width = 3.5, Petal.Length = 1.4,
Petal.Width = 0.2)
prediction <- predict(model_rf, new_data)
print(prediction)
8. Model Tuning (Optional)
# Tune the random forest with 10-fold cross-validation using caret
tune_rf <- train(Species ~ ., data = trainData, method = "rf",
                 trControl = trainControl(method = "cv", number = 10))
print(tune_rf)
9. Visualizing the Results
varImpPlot(model_rf)  # Plot variable importance
Experiment 6:
Design an experiment to determine the effect of different types of fertilizer on plant
growth. This experiment allows students to explore the factors that affect plant growth.
Step 1: Set up the Environment and Simulate Data
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Set seed for reproducibility
set.seed(123)
# Create a dataset for plant growth simulation (4 weeks of data)
weeks <- rep(1:4, times = 3)  # 4 weeks, repeated for 3 fertilizer types
fertilizer_type <- rep(c("Organic", "Inorganic", "Control"), each = 4)  # Fertilizer types
growth_data <- data.frame(Week = weeks,
                          Fertilizer = fertilizer_type,
                          Height = numeric(12),
                          Leaves = numeric(12))
# Simulate plant height and leaf number based on fertilizer type
growth_data$Height <- ifelse(growth_data$Fertilizer == "Organic",
                             rnorm(12, mean = 20 + growth_data$Week * 5, sd = 2),
                             ifelse(growth_data$Fertilizer == "Inorganic",
                                    rnorm(12, mean = 25 + growth_data$Week * 6, sd = 2),
                                    rnorm(12, mean = 15 + growth_data$Week * 3, sd = 2)))
growth_data$Leaves <- ifelse(growth_data$Fertilizer == "Organic",
                             rnorm(12, mean = 10 + growth_data$Week * 3, sd = 1),
                             ifelse(growth_data$Fertilizer == "Inorganic",
                                    rnorm(12, mean = 12 + growth_data$Week * 4, sd = 1),
                                    rnorm(12, mean = 8 + growth_data$Week * 2, sd = 1)))
# View simulated data
head(growth_data)
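The simulation above has one plant per fertilizer per week. A designed experiment would normally include replicate plants, and a possible way to simulate that is sketched below. The intercepts and slopes are the ones used in the simulated means above; the choice of 5 plants per fertilizer and the names params and design are assumptions made for this example.

```r
set.seed(123)

# Per-treatment growth parameters, taken from the simulation means above
params <- data.frame(
  Fertilizer = c("Organic", "Inorganic", "Control"),
  intercept = c(20, 25, 15),
  slope = c(5, 6, 3)
)

# Assumed design: 5 plants per fertilizer, each measured in weeks 1 to 4
design <- expand.grid(Week = 1:4, Plant = 1:5,
                      Fertilizer = params$Fertilizer,
                      stringsAsFactors = FALSE)
design <- merge(design, params, by = "Fertilizer")

# One normal draw per row, centred on that row's expected height
design$Height <- rnorm(nrow(design),
                       mean = design$intercept + design$slope * design$Week,
                       sd = 2)

# Every fertilizer-by-week cell holds 5 plants
print(table(design$Fertilizer, design$Week))
```

Looking up the parameters in a small table keeps the simulation to a single rnorm() call and makes it easy to add another treatment.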
Step 2: Visualize the Growth Data
# Visualize plant height by fertilizer type over time
ggplot(growth_data, aes(x = Week, y = Height, color = Fertilizer)) +
  geom_line() +
  geom_point() +
  labs(title = "Plant Height Over Time by Fertilizer Type", x = "Week",
       y = "Plant Height (cm)") +
  theme_minimal()
# Visualize number of leaves by fertilizer type over time
ggplot(growth_data, aes(x = Week, y = Leaves, color = Fertilizer)) +
  geom_line() +
  geom_point() +
  labs(title = "Number of Leaves Over Time by Fertilizer Type", x = "Week",
       y = "Number of Leaves") +
  theme_minimal()
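Step 3: Statistical Analysis (ANOVA Test)
# Store the grouping variable as a factor, as aov() and TukeyHSD() expect
growth_data$Fertilizer <- factor(growth_data$Fertilizer)
# ANOVA for Plant Height
anova_height <- aov(Height ~ Fertilizer + Week + Fertilizer:Week, data = growth_data)
summary(anova_height)
# ANOVA for Number of Leaves
anova_leaves <- aov(Leaves ~ Fertilizer + Week + Fertilizer:Week, data = growth_data)
summary(anova_leaves)
Step 4: Post-Hoc Test (If ANOVA is significant)
# Post-hoc test for Plant Height (Week is numeric, so only Fertilizer levels are compared)
tukey_height <- TukeyHSD(anova_height, which = "Fertilizer")
print(tukey_height)  # print() shows the pairwise differences
# Post-hoc test for Number of Leaves
tukey_leaves <- TukeyHSD(anova_leaves, which = "Fertilizer")
print(tukey_leaves)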
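The "if ANOVA is significant" condition can be written into the code. The sketch below is a simplified one-way version on hypothetical data: 10 plants per fertilizer, with group means equal to the week-4 means of the simulation above (27, 40, and 49 cm). It extracts the p-value from the ANOVA table and runs Tukey's HSD only when that value is below 0.05.

```r
set.seed(123)

# Hypothetical final-week heights (cm), 10 plants per fertilizer
heights <- data.frame(
  Fertilizer = factor(rep(c("Control", "Organic", "Inorganic"), each = 10)),
  Height = c(rnorm(10, mean = 27, sd = 2),
             rnorm(10, mean = 40, sd = 2),
             rnorm(10, mean = 49, sd = 2))
)

# One-way ANOVA: does mean height differ between fertilizers?
fit <- aov(Height ~ Fertilizer, data = heights)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Run the post-hoc comparison only when the overall F test is significant
if (p_value < 0.05) {
  tukey <- TukeyHSD(fit)
  print(tukey)  # one row per pair of fertilizers: diff, lwr, upr, p adj
}
```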
Experiment 7 :
This experiment allows students to explore the relationship between food and energy.
Step 1: Simulating Data for Food Types and Energy Levels
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Set seed for reproducibility
set.seed(123)
# Define food types and their caloric values per 100g (in kcal)
food_data <- data.frame(
  Food = c('Carbohydrates', 'Proteins', 'Fats', 'Fruits'),
  Calories = c(250, 200, 300, 100),  # Approximate calories for 100g portion
  EnergyBefore = c(5, 6, 5, 7),      # Energy level before consumption (scale 1-10)
  EnergyAfter = c(7, 7, 6, 8),       # Energy level after consumption (scale 1-10)
  DurationEnergy = c(3, 2.5, 2, 3)   # Duration of energy in hours
)
# View the simulated data
print(food_data)
Step 2: Visualizing Energy Levels After Eating
# Boxplot of energy level after eating
# (with one value per food type, each box collapses to a single line)
ggplot(food_data, aes(x = Food, y = EnergyAfter, fill = Food)) +
  geom_boxplot() +
  labs(title = "Energy After Eating Different Foods", y = "Energy Level (1-10)",
       x = "Food Type") +
  theme_minimal()
# Boxplot of energy duration (how long energy lasts)
ggplot(food_data, aes(x = Food, y = DurationEnergy, fill = Food)) +
  geom_boxplot() +
  labs(title = "Duration of Energy After Eating Different Foods",
       y = "Duration of Energy (hours)", x = "Food Type") +
  theme_minimal()
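Step 3: Statistical Analysis
# Note: food_data has one observation per food type, so these models have no residual
# degrees of freedom and cannot produce F or p-values. Record repeated measurements
# for each food type to make the tests below meaningful.
food_data$Food <- factor(food_data$Food)
# ANOVA for Energy Levels After Eating
anova_energy <- aov(EnergyAfter ~ Food, data = food_data)
summary(anova_energy)
# ANOVA for Duration of Energy
anova_duration <- aov(DurationEnergy ~ Food, data = food_data)
summary(anova_duration)
Step 4: Post-Hoc Analysis (Tukey's HSD)
# Tukey's HSD test for energy level after eating
tukey_energy <- TukeyHSD(anova_energy)
print(tukey_energy)  # print() shows the pairwise differences
# Tukey's HSD test for energy duration
tukey_duration <- TukeyHSD(anova_duration)
print(tukey_duration)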
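With a single before and after value for each food, one analysis that the data does support is a paired t-test of whether energy rises after eating at all. The sketch below uses the EnergyBefore and EnergyAfter values from Step 1; treating the four foods as four paired observations is an assumption made for illustration.

```r
# Values from food_data: Carbohydrates, Proteins, Fats, Fruits
energy_before <- c(5, 6, 5, 7)
energy_after <- c(7, 7, 6, 8)

# Paired test: each food contributes one before/after pair
paired <- t.test(energy_after, energy_before, paired = TRUE)

# The same statistic from the differences: mean(d) / (sd(d) / sqrt(n))
d <- energy_after - energy_before
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

print(paired)
print(t_manual)  # 5
```

With only four pairs the test has little power, so the result is best read as a demonstration of the method.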
Experiment 8:
Design an experiment to determine the effect of different types of light on the growth of
plants. This experiment allows students to explore the role of light in plant growth.
Step 1: Set up the Environment and Simulate Data
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Set seed for reproducibility
set.seed(123)
# Define the light conditions and simulate plant growth data over 4 weeks
weeks <- rep(1:4, times = 3)  # 4 weeks repeated for each light condition
light_condition <- rep(c("Sunlight", "LED", "Fluorescent"), each = 4)  # Light conditions
# Set up the data frame for height and number of leaves over time
growth_data <- data.frame(Week = weeks,
                          Light = light_condition,
                          Height = numeric(12),  # Plant height in cm
                          Leaves = numeric(12))  # Number of leaves
# Simulate plant height and leaf number based on light condition
growth_data$Height <- ifelse(growth_data$Light == "Sunlight",
                             rnorm(12, mean = 10 + growth_data$Week * 2, sd = 1),
                             ifelse(growth_data$Light == "LED",
                                    rnorm(12, mean = 9 + growth_data$Week * 1.8, sd = 1),
                                    rnorm(12, mean = 8 + growth_data$Week * 1.5, sd = 1)))
growth_data$Leaves <- ifelse(growth_data$Light == "Sunlight",
                             rnorm(12, mean = 5 + growth_data$Week * 1, sd = 1),
                             ifelse(growth_data$Light == "LED",
                                    rnorm(12, mean = 4 + growth_data$Week * 0.8, sd = 1),
                                    rnorm(12, mean = 3 + growth_data$Week * 0.6, sd = 1)))
# View simulated data
head(growth_data)
Step 2: Visualize the Data
# Line plot for plant height over time by light condition
ggplot(growth_data, aes(x = Week, y = Height, color = Light)) +
  geom_line() +
  geom_point() +
  labs(title = "Plant Height Over Time by Light Condition", x = "Week",
       y = "Plant Height (cm)") +
  theme_minimal()
# Line plot for number of leaves over time by light condition
ggplot(growth_data, aes(x = Week, y = Leaves, color = Light)) +
  geom_line() +
  geom_point() +
  labs(title = "Number of Leaves Over Time by Light Condition", x = "Week",
       y = "Number of Leaves") +
  theme_minimal()
Step 3: Statistical Analysis (ANOVA)
# Store the grouping variable as a factor, as aov() and TukeyHSD() expect
growth_data$Light <- factor(growth_data$Light)
# ANOVA for Plant Height
anova_height <- aov(Height ~ Light + Week + Light:Week, data = growth_data)
summary(anova_height)
# ANOVA for Number of Leaves
anova_leaves <- aov(Leaves ~ Light + Week + Light:Week, data = growth_data)
summary(anova_leaves)
Step 4: Post-Hoc Test (If ANOVA is significant)
# Post-hoc test for Plant Height (Week is numeric, so only Light levels are compared)
tukey_height <- TukeyHSD(anova_height, which = "Light")
print(tukey_height)  # print() shows the pairwise differences
# Post-hoc test for Number of Leaves
tukey_leaves <- TukeyHSD(anova_leaves, which = "Light")
print(tukey_leaves)
Output:
ANOVA for Plant Height (example):
            Df  Sum Sq  Mean Sq  F value  Pr(>F)
Light        2   2.456    1.228     5.43   0.015
Week         3   3.872    1.290     6.17   0.004
Tukey's HSD test for Plant Height (example):
                       diff    lwr    upr   p adj
Sunlight-LED           0.45  -0.21   1.11   0.32
Sunlight-Fluorescent   1.15   0.72   1.58   0.001 *
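The example ANOVA table lists 3 degrees of freedom for Week, which corresponds to Week being treated as a factor with four levels. A numeric Week column, as created in Step 1, is instead fitted as a single linear trend with 1 degree of freedom. The sketch below shows the difference on freshly simulated data; the model without an interaction term and the name growth are simplifications chosen for this example.

```r
set.seed(123)

# 3 light conditions x 4 weeks, one plant per cell (as in Step 1)
growth <- data.frame(
  Light = factor(rep(c("Sunlight", "LED", "Fluorescent"), each = 4)),
  Week = rep(1:4, times = 3)
)
growth$Height <- rnorm(12, mean = 10 + 2 * growth$Week, sd = 1)

# Week as a number: one slope is estimated
fit_numeric <- aov(Height ~ Light + Week, data = growth)

# Week as a factor: a separate effect for each week
fit_factor <- aov(Height ~ Light + factor(Week), data = growth)

# Degrees of freedom for Light, Week, and Residuals
df_numeric <- summary(fit_numeric)[[1]]$Df
df_factor <- summary(fit_factor)[[1]]$Df

print(df_numeric)  # 2 1 8
print(df_factor)   # 2 3 6
```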
Experiment 9 :
Design an experiment to determine the effect of different types of soil on the growth of
plants. This experiment allows students to explore the role of soil in plant growth.
Step 1: Set up the Environment and Simulate Data
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Set seed for reproducibility
set.seed(123)
# Define the soil types and simulate plant growth data over 4 weeks
weeks <- rep(1:4, times = 3)  # 4 weeks repeated for each soil condition
soil_type <- rep(c("Loamy", "Sandy", "Clay"), each = 4)  # Soil types
# Set up the data frame for height and number of leaves over time
growth_data <- data.frame(Week = weeks,
                          Soil = soil_type,
                          Height = numeric(12),  # Plant height in cm
                          Leaves = numeric(12))  # Number of leaves
# Simulate plant height and leaf number based on soil type
growth_data$Height <- ifelse(growth_data$Soil == "Loamy",
                             rnorm(12, mean = 10 + growth_data$Week * 2, sd = 1),
                             ifelse(growth_data$Soil == "Sandy",
                                    rnorm(12, mean = 8 + growth_data$Week * 1.5, sd = 1),
                                    rnorm(12, mean = 7 + growth_data$Week * 1.2, sd = 1)))
growth_data$Leaves <- ifelse(growth_data$Soil == "Loamy",
                             rnorm(12, mean = 5 + growth_data$Week * 1, sd = 1),
                             ifelse(growth_data$Soil == "Sandy",
                                    rnorm(12, mean = 4 + growth_data$Week * 0.8, sd = 1),
                                    rnorm(12, mean = 3 + growth_data$Week * 0.6, sd = 1)))
# View simulated data
head(growth_data)
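Before plotting, a per-group summary shows what the simulation is expected to produce. The sketch below uses the noise-free means from the height formulas above (for example 10 + 2 x Week for loamy soil) as a hypothetical data set, then computes the mean height and the weekly growth rate for each soil type with base R.

```r
# Noise-free heights from the simulation means: Loamy, Sandy, Clay over weeks 1 to 4
growth <- data.frame(
  Soil = rep(c("Loamy", "Sandy", "Clay"), each = 4),
  Week = rep(1:4, times = 3),
  Height = c(12, 14, 16, 18,
             9.5, 11, 12.5, 14,
             8.2, 9.4, 10.6, 11.8)
)

# Mean height per soil type
mean_height <- aggregate(Height ~ Soil, data = growth, FUN = mean)

# Growth rate per soil type: slope of a straight-line fit of Height on Week
slopes <- sapply(split(growth, growth$Soil),
                 function(d) coef(lm(Height ~ Week, data = d))[["Week"]])

print(mean_height)
print(slopes)
```

On real or noisy data the same two summaries give a quick numeric check against what the line plots show.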
Step 2: Visualize the Data
# Line plot for plant height over time by soil type
ggplot(growth_data, aes(x = Week, y = Height, color = Soil)) +
  geom_line() +
  geom_point() +
  labs(title = "Plant Height Over Time by Soil Type", x = "Week", y = "Plant Height (cm)") +
  theme_minimal()
# Line plot for number of leaves over time by soil type
ggplot(growth_data, aes(x = Week, y = Leaves, color = Soil)) +
  geom_line() +
  geom_point() +
  labs(title = "Number of Leaves Over Time by Soil Type", x = "Week",
       y = "Number of Leaves") +
  theme_minimal()
Step 3: Statistical Analysis (ANOVA)
# Store the grouping variable as a factor, as aov() and TukeyHSD() expect
growth_data$Soil <- factor(growth_data$Soil)
# ANOVA for Plant Height
anova_height <- aov(Height ~ Soil + Week + Soil:Week, data = growth_data)
summary(anova_height)
# ANOVA for Number of Leaves
anova_leaves <- aov(Leaves ~ Soil + Week + Soil:Week, data = growth_data)
summary(anova_leaves)
Step 4: Post-Hoc Test (If ANOVA is significant)
# Post-hoc test for Plant Height (Week is numeric, so only Soil levels are compared)
tukey_height <- TukeyHSD(anova_height, which = "Soil")
print(tukey_height)  # print() shows the pairwise differences
# Post-hoc test for Number of Leaves
tukey_leaves <- TukeyHSD(anova_leaves, which = "Soil")
print(tukey_leaves)