11/29/24, 12:41 PM R Notebook
R Notebook
This is an R Markdown (http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the
results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and
pressing Ctrl+Shift+Enter.
plot(cars)
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
Assignment-06
Exercise-75
Step-01:Load the data
students <- read.delim("E:\\Statistics\\Datasets\\Students.txt", stringsAsFactors = FALSE)
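For readers without the original Students.txt, the loading step can be tried on a throwaway file. This is only a sketch: the three rows and the temporary path are invented for illustration, and only the column names Sex, Size_cm and Weight_kg are borrowed from the data set.

```r
# Hypothetical three-row stand-in for Students.txt (all values are invented)
demo_path <- tempfile(fileext = ".txt")
writeLines(c("ID\tSex\tSize_cm\tWeight_kg",
             "1\tM\t180\t75",
             "2\tF\t165\t58",
             "3\tF\t170\t62"), demo_path)

# read.delim() expects a header row and tab-separated fields by default
demo_students <- read.delim(demo_path, stringsAsFactors = FALSE)

# Text columns stay character, whole numbers are read as integer
str(demo_students)

# Remove the temporary file again
unlink(demo_path)
```

Writing FALSE in full rather than F is the safer habit, because F is an ordinary variable that can be reassigned.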
Step-02:Check the dataset structure
file:///E:/Statistics/Exercises/Assignment-06.html 1/16
summary(students)
## ID Sex Sex_coded Blood_group
## Min. : 1.00 Length:82 Min. :0.0000 Length:82
## 1st Qu.:21.25 Class :character 1st Qu.:0.0000 Class :character
## Median :41.50 Mode :character Median :1.0000 Mode :character
## Mean :41.50 Mean :0.6585
## 3rd Qu.:61.75 3rd Qu.:1.0000
## Max. :82.00 Max. :1.0000
## Blood_group_coded Rhesus_factor Rhesus_factor_coded Smoking
## Min. :0.0000 Length:82 Min. :0.0000 Length:82
## 1st Qu.:0.0000 Class :character 1st Qu.:1.0000 Class :character
## Median :1.0000 Mode :character Median :1.0000 Mode :character
## Mean :0.9512 Mean :0.8415
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :3.0000 Max. :1.0000
## Smoking_coded Size_cm Weight_kg Points_exam
## Min. :0.0000 Min. :157.0 Min. :46.00 Min. : 1.000
## 1st Qu.:0.0000 1st Qu.:167.0 1st Qu.:56.25 1st Qu.: 6.250
## Median :0.0000 Median :170.0 Median :61.00 Median : 8.000
## Mean :0.3171 Mean :173.2 Mean :65.84 Mean : 7.988
## 3rd Qu.:1.0000 3rd Qu.:179.0 3rd Qu.:75.75 3rd Qu.:10.000
## Max. :1.0000 Max. :194.0 Max. :98.00 Max. :12.000
## Grade
## Min. :1.000
## 1st Qu.:2.000
## Median :3.000
## Mean :3.122
## 3rd Qu.:4.750
## Max. :5.000
Step-03:Calculate the Pearson correlation coefficient
correlation <- cor(students$Weight_kg, students$Size_cm, method = "pearson")
cat("Pearson Correlation Coefficient:", correlation, "\n")
## Pearson Correlation Coefficient: 0.7790491
Conclusion
# **Is there any linear relationship between the variables?**
# Hypotheses for Pearson Correlation:
# Null Hypothesis (H0): There is no linear relationship between body weight and body height (ρ = 0).
# Alternative Hypothesis (H1): There is a linear relationship between body weight and body height (ρ ≠ 0).
# With n = 82 and r = 0.7790491, the test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) ≈ 11.11 on 80 degrees of freedom, so p < 0.05 and we reject the null hypothesis.
# There is sufficient evidence to conclude that body weight and body height are significantly positively linearly related, with a correlation coefficient of r = 0.7790491.
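The significance claim in the conclusion can be checked from r and n alone. The sketch below assumes the usual t-test for a Pearson coefficient with n - 2 degrees of freedom; the helper name `pearson_t_test` is made up for illustration and is not part of the notebook.

```r
# Hypothetical helper: t-test for H0: rho = 0, given only r and the sample size
pearson_t_test <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  # Two-sided p-value from the t distribution
  p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  list(t = t_stat, df = df, p_value = p_value)
}

# Values taken from the notebook: r = 0.7790491, 82 students
result <- pearson_t_test(r = 0.7790491, n = 82)
print(result)
```

Running `cor.test(students$Weight_kg, students$Size_cm)` on the real data should report the same statistic along with a confidence interval.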
Step-04:Visualize the Relationship (Scatter Plot)
library(ggpubr)
## Loading required package: ggplot2
ggscatter(
students, x = "Weight_kg", y = "Size_cm",
color = "#1f77b4",
add = "reg.line",
conf.int = TRUE,
add.params = list(color = "#ff7f0e"),
cor.coef = TRUE, cor.method = "pearson",
xlab = "Weight (kg)", ylab = "Height (cm)"
)
Conclusion
#The scatter plot reveals a strong positive linear relationship between body weight and body height in the students data set.
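The regression line drawn through a scatter plot is tied directly to the correlation coefficient: for a simple linear fit, the slope equals r multiplied by sd(y) / sd(x). The sketch below illustrates this with the built-in `cars` data set rather than the students data, so the numbers are not the ones from this exercise.

```r
# Built-in example data (stopping distance vs. speed), used as a stand-in
x <- cars$speed
y <- cars$dist

# Least-squares line, the same kind of fit a scatter plot's regression line uses
fit <- lm(y ~ x)

# Slope reconstructed from the Pearson coefficient and the two standard deviations
r <- cor(x, y, method = "pearson")
slope_from_r <- r * sd(y) / sd(x)

cat("Slope from lm():      ", unname(coef(fit)[2]), "\n")
cat("Slope from r * sy/sx: ", slope_from_r, "\n")
```

A positive r therefore always goes with an upward-sloping line, but the steepness of the line also depends on the units of the two variables.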
Step-05:Shapiro-Wilk tests
# Shapiro-Wilk test for body weight
shapiro_weight <- shapiro.test(students$Weight_kg)
cat("Shapiro-Wilk Test for Weight:\n")
## Shapiro-Wilk Test for Weight:
cat("W-statistic:", shapiro_weight$statistic, "\n")
## W-statistic: 0.9195322
cat("p-value:", shapiro_weight$p.value, "\n")
## p-value: 7.40539e-05
# Shapiro-Wilk test for body height
shapiro_height <- shapiro.test(students$Size_cm)
cat("Shapiro-Wilk Test for Height:\n")
## Shapiro-Wilk Test for Height:
cat("W-statistic:", shapiro_height$statistic, "\n")
## W-statistic: 0.958204
cat("p-value:", shapiro_height$p.value, "\n")
## p-value: 0.009213035
Step-06:Q-Q Plots
library(ggpubr)
# Q-Q plot for body weight
plot1 <- ggqqplot(students$Weight_kg, ylab = "Body Weight (kg)", color = "#FFA500")
# Q-Q plot for body height
plot2 <- ggqqplot(students$Size_cm, ylab = "Body Height (cm)", color = "#FFA500")
# Arrange the plots side by side
ggarrange(plot1, plot2, ncol = 2, nrow = 1,
labels = c("A", "B"), # Add labels to the plots
common.legend = TRUE, legend = "bottom") # Shared legend
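To see what a normal Q-Q plot actually compares, its coordinates can be computed by hand in base R. This is a sketch on the built-in `cars` data set, not on the students data: the sorted sample values are paired with normal quantiles at the probability points given by `ppoints()`.

```r
# Built-in example variable, used as a stand-in for body weight or height
x <- cars$dist
n <- length(x)

# Theoretical quantiles of the standard normal at n plotting positions
theoretical <- qnorm(ppoints(n))

# Sample quantiles are simply the sorted observations
sample_q <- sort(x)

# Base R computes the same pairs; plot.it = FALSE returns them without drawing
qq <- qqnorm(x, plot.it = FALSE)

head(data.frame(theoretical = theoretical, sample = sample_q))
```

Points that bend away from a straight line in such a plot are what suggest a departure from normality.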
Conclusion
# **Test for normality of the variables**
# Hypotheses for the Shapiro-Wilk test:
# Null Hypothesis (H0): The data are normally distributed.
# Alternative Hypothesis (H1): The data are not normally distributed.
# Shapiro-Wilk test for weight: W = 0.9195322, p-value = 7.40539e-05. As p < 0.05, we reject the null hypothesis (body weight is not normally distributed).
# Shapiro-Wilk test for height: W = 0.958204, p-value = 0.009213035. As p < 0.05, we reject the null hypothesis (body height is not normally distributed).
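When several variables need the same normality check, the repeated `shapiro.test()` calls can be wrapped in a small helper that returns one table. The function name `check_normality` and the demo data below are assumptions for illustration; the demo columns are constructed values, not measurements.

```r
# Hypothetical helper: Shapiro-Wilk test for each named column of a data frame
check_normality <- function(data, columns, alpha = 0.05) {
  results <- lapply(columns, function(col) {
    test <- shapiro.test(data[[col]])
    data.frame(
      variable = col,
      W = unname(test$statistic),
      p_value = test$p.value,
      reject_normality = test$p.value < alpha
    )
  })
  do.call(rbind, results)
}

# Constructed demo data: one column of exact normal quantiles, one strongly skewed
demo <- data.frame(
  normal_like = qnorm(ppoints(100)),
  skewed = exp(seq(0, 5, length.out = 100))
)

normality_table <- check_normality(demo, c("normal_like", "skewed"))
print(normality_table)
```

On the students data the call would be `check_normality(students, c("Weight_kg", "Size_cm"))`.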
Exercise-76
Step 01:Load the data:
# Load the ICM dataset
ICM <- read.delim("E:\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = FALSE)
# View the structure of the data to identify the columns for negative and positive mood
str(ICM)
## 'data.frame': 199 obs. of 23 variables:
## $ ID : int 75 90 173 189 100 155 63 48 76 165 ...
## $ Gender : chr "female" "female" "female" "female" ...
## $ Age : int 22 22 37 17 19 16 17 19 27 19 ...
## $ Englishfluent : chr "yes" "yes" "yes" "yes" ...
## $ Germanfluent : chr "no" "no" "yes" "yes" ...
## $ Transport : chr "PublicTransport" "PublicTransport" "Car" "Car" ...
## $ Highest_level_of_education: chr "College" "College" "University" "none" ...
## $ Do_you_smoke : chr "No" "No" "No" "No" ...
## $ Socialmediahours : chr "1.5-3hrs/day" "1.5-3hrs/day" "<1.5hrs/day" "1.5-3hrs/day" ...
## $ Timewithfriends : chr "2-5hrs/week" "2-5hrs/week" "5-10hrs/week" "10-20hrs/week" ...
## $ Pet : chr "No" "No" "Yes" "Yes" ...
## $ Siblings : chr "Yes" "Yes" "No" "Yes" ...
## $ Children : chr "No" "No" "Yes" "No" ...
## $ Relationshipstatus : chr "Relationship" "Relationship" "Relationship" "Single" ...
## $ Activitieshours : int 10 10 20 40 20 10 10 20 10 20 ...
## $ NegativeMood : num NA NA NA 4 2.82 ...
## $ PositiveMood : num NA NA NA 0 0.333 ...
## $ Mentalhealth : num 2.667 2.667 3.5 1 0.833 ...
## $ Socialization : num NA NA NA 1 2.5 ...
## $ Activity : num 2.8 2.8 3.4 3.2 1.2 2.6 1.6 1.8 1.2 0.4 ...
## $ SocialSupport : num 4 4 2.333 0.667 2.333 ...
## $ Communication_open_direct : num NA NA 3.38 3.62 3.15 ...
## $ OHS : num 4.59 4.59 5.1 3.14 2.76 ...
# View the first few rows to check the data
head(ICM)
  ID    Gen…    A…     Englishfluent  Germanfluent  Transport        Highest_level_of_education
  <int> <chr>   <int>  <chr>          <chr>         <chr>            <chr>
1 75    female  22     yes            no            PublicTransport  College
2 90    female  22     yes            no            PublicTransport  College
3 173   female  37     yes            yes           Car              University
4 189   female  17     yes            yes           Car              none
5 100   female  19     yes            yes           Walk             HighSchool
6 155   female  16     yes            no            Walk             none
Step 02:Check for missing values
# Check the number of missing values in both columns
sum(is.na(ICM$NegativeMood))
## [1] 5
sum(is.na(ICM$PositiveMood))
## [1] 3
ICM_clean <- na.omit(ICM[, c("NegativeMood", "PositiveMood")])
correlation <- cor(ICM_clean$NegativeMood, ICM_clean$PositiveMood, method = "pearson")
cat("Pearson Correlation Coefficient:", correlation, "\n")
## Pearson Correlation Coefficient: -0.6433565
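Dropping incomplete rows with `na.omit()` before calling `cor()` has a built-in equivalent: the `use = "complete.obs"` argument. The toy vectors below are invented to show that both routes give the same coefficient, and that `cor()` returns NA when missing values are left in.

```r
# Invented toy data with one missing value in each column
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, 1, 4, NA, 6, 5)
)

# Route 1: remove incomplete rows first, as done in the notebook
toy_clean <- na.omit(toy)
r_omit <- cor(toy_clean$a, toy_clean$b, method = "pearson")

# Route 2: let cor() skip incomplete pairs itself
r_complete <- cor(toy$a, toy$b, method = "pearson", use = "complete.obs")

# Without either step the result is NA
r_default <- cor(toy$a, toy$b)

cat("na.omit route:     ", r_omit, "\n")
cat("complete.obs route:", r_complete, "\n")
cat("default:           ", r_default, "\n")
```

Keeping an explicit cleaned data frame, as the notebook does, has the advantage that the later plots and tests all use exactly the same rows.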
Conclusion
# **Is there any linear relationship between the variables?**
# Null Hypothesis (H0): There is no linear relationship between Negative Mood and Positive Mood (ρ = 0).
# Alternative Hypothesis (H1): There is a linear relationship between Negative Mood and Positive Mood (ρ ≠ 0).
# The coefficient r = -0.6433565 indicates a negative linear relationship. The significance test in Step 03 gives p-value < 2.2e-16; as p < 0.05, we reject the null hypothesis. There is a statistically significant negative linear relationship between Negative Mood and Positive Mood.
Step 03:Test for Significance
cor_test <- cor.test(ICM_clean$NegativeMood, ICM_clean$PositiveMood, method = "pearson")
cat("Pearson Correlation Test:\n")
## Pearson Correlation Test:
print(cor_test)
##
## Pearson's product-moment correlation
##
## data: ICM_clean$NegativeMood and ICM_clean$PositiveMood
## t = -11.644, df = 192, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7190609 -0.5525618
## sample estimates:
## cor
## -0.6433565
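The 95 percent confidence interval in this output can be reproduced by hand. The sketch assumes the interval comes from Fisher's z-transformation (atanh of r, with standard error 1 / sqrt(n - 3)), which is consistent with the printed numbers; the helper name `fisher_ci` is made up for illustration.

```r
# Hypothetical helper: confidence interval for a Pearson correlation via Fisher's z
fisher_ci <- function(r, n, conf_level = 0.95) {
  z <- atanh(r)                      # transform r to the z scale
  se <- 1 / sqrt(n - 3)              # standard error on the z scale
  crit <- qnorm(1 - (1 - conf_level) / 2)
  # Back-transform both limits to the correlation scale
  tanh(c(lower = z - crit * se, upper = z + crit * se))
}

# Values from the output: r = -0.6433565, df = 192, so n = 194 complete pairs
ci <- fisher_ci(r = -0.6433565, n = 194)
print(ci)
```

Because the whole interval lies below zero, it leads to the same decision as the p-value: the correlation is significantly negative.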
Step 04:Visualize the Relationship (Scatter Plot)
library(ggpubr)
ggscatter(
ICM_clean, x = "NegativeMood", y = "PositiveMood",
color = "#1f77b4",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Negative Mood", ylab = "Positive Mood"
)
Step-05:Shapiro-Wilk tests
# Shapiro-Wilk Test for Normality
shapiro_negative <- shapiro.test(ICM_clean$NegativeMood)
cat("Shapiro-Wilk Test for Negative Mood:\n")
## Shapiro-Wilk Test for Negative Mood:
print(shapiro_negative)
##
## Shapiro-Wilk normality test
##
## data: ICM_clean$NegativeMood
## W = 0.97664, p-value = 0.002498
shapiro_positive <- shapiro.test(ICM_clean$PositiveMood)
cat("Shapiro-Wilk Test for Positive Mood:\n")
## Shapiro-Wilk Test for Positive Mood:
print(shapiro_positive)
##
## Shapiro-Wilk normality test
##
## data: ICM_clean$PositiveMood
## W = 0.98441, p-value = 0.03015
Step-06:Q-Q Plot
# Q-Q plot for Negative Mood
ggqqplot(ICM_clean$NegativeMood, ylab = "Negative Mood", color = "#1f77b4")
# Q-Q plot for Positive Mood
ggqqplot(ICM_clean$PositiveMood, ylab = "Positive Mood", color = "#1f77b4")
Conclusion
# Test for normality of the variables.
# Null Hypothesis (H0): The data are normally distributed.
# Alternative Hypothesis (H1): The data are not normally distributed.
# Shapiro-Wilk test for Negative Mood: W = 0.97664, p-value = 0.002498. As p < 0.05, we reject the null hypothesis. Negative Mood is not normally distributed.
# Shapiro-Wilk test for Positive Mood: W = 0.98441, p-value = 0.03015. As p < 0.05, we reject the null hypothesis. Positive Mood is not normally distributed.
Exercise-79
Step-01:Load the Dataset
# Load the students dataset
students <- read.delim("E:\\Statistics\\Datasets\\Students.txt", stringsAsFactors = FALSE)
# View the structure of the dataset to identify the columns for weight and height
str(students)
## 'data.frame': 82 obs. of 13 variables:
## $ ID : int 24 5 54 9 34 52 12 16 32 59 ...
## $ Sex : chr "M" "M" "F" "M" ...
## $ Sex_coded : int 0 0 1 0 1 1 0 0 1 1 ...
## $ Blood_group : chr "0" "0" "A" "0" ...
## $ Blood_group_coded : int 0 0 1 0 1 0 0 1 0 1 ...
## $ Rhesus_factor : chr "+" "+" "+" "+" ...
## $ Rhesus_factor_coded: int 1 1 1 1 1 1 1 1 1 0 ...
## $ Smoking : chr "no" "no" "no" "no" ...
## $ Smoking_coded : int 0 0 0 0 0 1 1 1 0 0 ...
## $ Size_cm : int 190 187 171 185 166 164 184 187 163 170 ...
## $ Weight_kg : int 98 81 54 70 53 55 74 75 46 63 ...
## $ Points_exam : int 1 2 2 3 3 3 4 4 4 4 ...
## $ Grade : int 5 5 5 5 5 5 5 5 5 5 ...
# View the first few rows of the dataset to check the data
head(students)
  ID    S…    Sex_co…  Blood_group  Blood_group_coded  Rhesus_factor  Rhesus_factor_coded  Sm
  <int> <chr> <int>    <chr>        <int>              <chr>          <int>                <c
1 24    M     0        0            0                  +              1                    no
2 5     M     0        0            0                  +              1                    no
3 54    F     1        A            1                  +              1                    no
4 9     M     0        0            0                  +              1                    no
5 34    F     1        A            1                  +              1                    no
6 52    F     1        0            0                  +              1                    ye
Step-02:Calculate Spearman’s rho
# Calculate Spearman's rank correlation coefficient between body weight and body height
spearman_corr <- cor(students$Weight_kg, students$Size_cm, method = "spearman")
# Display the Spearman correlation coefficient
cat("Spearman's rho:", spearman_corr, "\n")
## Spearman's rho: 0.7740172
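Spearman's rho is the Pearson correlation computed on the ranks of the data, with tied values sharing their average rank. The sketch below uses invented weight and height values, including ties, to show that both computations agree.

```r
# Invented toy values with ties in both variables
weight <- c(60, 72, 72, 55, 90, 81)
height <- c(165, 175, 180, 160, 190, 175)

# Direct Spearman coefficient
rho_direct <- cor(weight, height, method = "spearman")

# Pearson coefficient of the ranks; rank() gives tied values their average rank
rho_ranks <- cor(rank(weight), rank(height), method = "pearson")

cat("Spearman's rho:         ", rho_direct, "\n")
cat("Pearson on ranked data: ", rho_ranks, "\n")
```

Because only the ordering of the values enters the calculation, the coefficient does not rely on normally distributed data and is less sensitive to outliers.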
Step-03:Test for Significance
# Perform the Spearman correlation test
cor_test <- cor.test(students$Weight_kg, students$Size_cm, method = "spearman")
## Warning in cor.test.default(students$Weight_kg, students$Size_cm, method =
## "spearman"): Cannot compute exact p-value with ties
# Print the result of the correlation test
cat("Spearman's rank correlation test result:\n")
## Spearman's rank correlation test result:
print(cor_test)
##
## Spearman's rank correlation rho
##
## data: students$Weight_kg and students$Size_cm
## S = 20764, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.7740172
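The S statistic in this output can be recovered from rho and the sample size. The sketch assumes S = (n^3 - n)(1 - rho) / 6, which is consistent with the printed values; the helper name `spearman_S` is made up for illustration.

```r
# Hypothetical helper: S statistic from Spearman's rho and the number of pairs
spearman_S <- function(rho, n) {
  (n^3 - n) * (1 - rho) / 6
}

# Values from the output: rho = 0.7740172, 82 students
S <- spearman_S(rho = 0.7740172, n = 82)
cat("S:", S, "\n")
```

S shrinks toward 0 as rho approaches 1, so a small S relative to (n^3 - n) / 6 indicates a strong positive monotonic relationship.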
Conclusion
# **Test for significance of the correlation**
# Null Hypothesis (H0): There is no monotonic relationship between body weight and body height (ρ = 0).
# Alternative Hypothesis (H1): There is a monotonic relationship between body weight and body height (ρ ≠ 0).
# S = 20764, p-value < 2.2e-16. As the p-value is less than 0.05, we reject the null hypothesis. Therefore, we conclude that there is a statistically significant positive monotonic relationship between body weight and body height, with a Spearman's rho of ρ = 0.7740172.
Exercise-80
Step-01:Load and view the Dataset
ICM <- read.delim("E:\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = FALSE)
# View the structure of the data to identify the columns for NegativeMood and OHS
str(ICM)
## 'data.frame': 199 obs. of 23 variables:
## $ ID : int 75 90 173 189 100 155 63 48 76 165 ...
## $ Gender : chr "female" "female" "female" "female" ...
## $ Age : int 22 22 37 17 19 16 17 19 27 19 ...
## $ Englishfluent : chr "yes" "yes" "yes" "yes" ...
## $ Germanfluent : chr "no" "no" "yes" "yes" ...
## $ Transport : chr "PublicTransport" "PublicTransport" "Car" "Car" ...
## $ Highest_level_of_education: chr "College" "College" "University" "none" ...
## $ Do_you_smoke : chr "No" "No" "No" "No" ...
## $ Socialmediahours : chr "1.5-3hrs/day" "1.5-3hrs/day" "<1.5hrs/day" "1.5-3hrs/day" ...
## $ Timewithfriends : chr "2-5hrs/week" "2-5hrs/week" "5-10hrs/week" "10-20hrs/week" ...
## $ Pet : chr "No" "No" "Yes" "Yes" ...
## $ Siblings : chr "Yes" "Yes" "No" "Yes" ...
## $ Children : chr "No" "No" "Yes" "No" ...
## $ Relationshipstatus : chr "Relationship" "Relationship" "Relationship" "Single" ...
## $ Activitieshours : int 10 10 20 40 20 10 10 20 10 20 ...
## $ NegativeMood : num NA NA NA 4 2.82 ...
## $ PositiveMood : num NA NA NA 0 0.333 ...
## $ Mentalhealth : num 2.667 2.667 3.5 1 0.833 ...
## $ Socialization : num NA NA NA 1 2.5 ...
## $ Activity : num 2.8 2.8 3.4 3.2 1.2 2.6 1.6 1.8 1.2 0.4 ...
## $ SocialSupport : num 4 4 2.333 0.667 2.333 ...
## $ Communication_open_direct : num NA NA 3.38 3.62 3.15 ...
## $ OHS : num 4.59 4.59 5.1 3.14 2.76 ...
# View the first few rows of the dataset to check the data
head(ICM)
  ID    Gen…    A…     Englishfluent  Germanfluent  Transport        Highest_level_of_education
  <int> <chr>   <int>  <chr>          <chr>         <chr>            <chr>
1 75    female  22     yes            no            PublicTransport  College
2 90    female  22     yes            no            PublicTransport  College
3 173   female  37     yes            yes           Car              University
4 189   female  17     yes            yes           Car              none
5 100   female  19     yes            yes           Walk             HighSchool
6 155   female  16     yes            no            Walk             none
Step-02:Calculate Spearman’s rho
# Remove rows with missing values in either NegativeMood or OHS
cleaned_data <- na.omit(ICM[, c("NegativeMood", "OHS")])
# Calculate Spearman's correlation on the cleaned data
spearman_corr <- cor(cleaned_data$NegativeMood, cleaned_data$OHS, method = "spearman")
# Display the Spearman correlation coefficient
cat("Spearman's rho:", spearman_corr, "\n")
## Spearman's rho: -0.5725575
Step-03:Test for Significance
# Perform the Spearman correlation test; cor.test() drops incomplete pairs itself, so passing ICM gives the same rho as cleaned_data
cor_test <- cor.test(ICM$NegativeMood, ICM$OHS, method = "spearman")
## Warning in cor.test.default(ICM$NegativeMood, ICM$OHS, method = "spearman"):
## Cannot compute exact p-value with ties
# Print the result of the correlation test
cat("Spearman's rank correlation test result:\n")
## Spearman's rank correlation test result:
print(cor_test)
##
## Spearman's rank correlation rho
##
## data: ICM$NegativeMood and ICM$OHS
## S = 1453320, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5725575
Conclusion
# **Test for significance of the correlation**
# Null Hypothesis (H0): There is no monotonic relationship between negative mood and OHS (ρ = 0).
# Alternative Hypothesis (H1): There is a monotonic relationship between negative mood and OHS (ρ ≠ 0).
# S = 1453320, p-value < 2.2e-16. As the p-value is less than 0.05, we reject the null hypothesis and conclude that there is a statistically significant negative monotonic relationship between negative mood and OHS (ρ = -0.5725575).
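The warning "Cannot compute exact p-value with ties" seen in both Spearman tests is not an error: with tied values R falls back to an approximate p-value. Requesting the approximation explicitly with `exact = FALSE` should avoid the warning. The sketch below uses invented vectors with ties, not the ICM data.

```r
# Invented toy data containing tied values in both variables
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(2, 1, 3, 3, 5, 4, 6, 6)

# exact = FALSE asks for the asymptotic p-value directly,
# so no exact computation is attempted on the tied ranks
test_asymptotic <- cor.test(x, y, method = "spearman", exact = FALSE)
print(test_asymptotic)
```

The estimate of rho itself is unaffected by this argument; it only controls how the p-value is obtained.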