Name of student:
Practical 2: Test for independence and homogeneity of proportions
Q.1
# Step 1: Create the observed data table
observed = matrix(c(30, 50, 20, 60, 40, 40), nrow = 2, ncol = 3, byrow = TRUE)
        Yoga  Strength Training  Cardio
Male      30                 50      20
Female    60                 40      40
# Assign row and column names
rownames(observed) = c("Male", "Female")
colnames(observed) = c("Yoga", "Strength Training", "Cardio")
# Display the observed data
observed
# Step 2: Calculate the expected frequencies
# Total sum
total_sum = sum(observed)
# Row and column totals
row_totals = rowSums(observed)
col_totals = colSums(observed)
# Calculate expected frequencies for each cell
expected = outer(row_totals, col_totals, FUN = "*") / total_sum
# Display the expected frequencies
expected
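# Hand check of one expected count (a minimal sketch; the variable names below
# are illustrative and not part of the original working).
# Under independence, E(row, column) = row total * column total / grand total.
male_total  = 30 + 50 + 20                    # row total for Male
yoga_total  = 30 + 60                         # column total for Yoga
grand_total = sum(c(30, 50, 20, 60, 40, 40))  # all observations
expected_male_yoga = male_total * yoga_total / grand_total
expected_male_yoga  # 100 * 90 / 240 = 37.5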
# Step 3: Calculate the Chi-square statistic manually
chi_square_stat = sum((observed - expected)^2 / expected)
# Degrees of freedom: (rows - 1) * (columns - 1)
df = (nrow(observed) - 1) * (ncol(observed) - 1)
# Step 4: Get the critical value from the Chi-square distribution table
alpha = 0.05 # significance level
critical_value = qchisq(1 - alpha, df) # Chi-square critical value for 0.05 significance level
# Step 5: Display the results
cat("Chi-square Statistic:", chi_square_stat, "\n")
Chi-square Statistic: 11.42857
cat("Degrees of Freedom:", df, "\n")
Degrees of Freedom: 2
cat("Critical Value from the Chi-square table:", critical_value, "\n")
Critical Value from the Chi-square table: 5.991465
# Step 6: Interpretation using the table value
if (chi_square_stat > critical_value) {
  cat("There is a significant relationship between gender and workout type (reject the null hypothesis).\n")
} else {
  cat("There is no significant relationship between gender and workout type (fail to reject the null hypothesis).\n")
}
There is a significant relationship between gender and workout type (reject the null hypothesis).
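# Optional cross-check (a sketch, not part of the original working): base R's
# chisq.test() runs the same Pearson test on the Q.1 table. The names obs_q1
# and check_q1 are illustrative. correct = FALSE is passed for clarity; the
# continuity correction only applies to 2 x 2 tables in any case.
obs_q1 = matrix(c(30, 50, 20, 60, 40, 40), nrow = 2, byrow = TRUE,
                dimnames = list(c("Male", "Female"),
                                c("Yoga", "Strength Training", "Cardio")))
check_q1 = chisq.test(obs_q1, correct = FALSE)
check_q1$statistic  # should agree with the manual statistic, about 11.43
check_q1$parameter  # degrees of freedom, 2
check_q1$p.value    # below alpha = 0.05, the same decision as the critical-value rule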
Q.2
# Step 1: Create the observed data table
observed = matrix(c(4, 6, 7, 12, 8, 5, 10, 14, 6, 7, 4, 12), nrow = 4, ncol = 3, byrow = TRUE)
# Assign row and column names
rownames(observed) = c("D1", "D2", "D3", "D4")
colnames(observed) = c("P1", "P2", "P3")
# Step 2: Calculate the expected frequencies
# Total sum
total_sum = sum(observed)
# Row and column totals
row_totals = rowSums(observed)
col_totals = colSums(observed)
# Calculate expected frequencies for each cell
expected = outer(row_totals, col_totals, FUN = "*") / total_sum
# Display the expected frequencies
expected
# Step 3: Calculate the Chi-square statistic manually
chi_square_stat = sum((observed - expected)^2 / expected)
# Degrees of freedom: (rows - 1) * (columns - 1)
df = (nrow(observed) - 1) * (ncol(observed) - 1)
# Step 4: Get the critical value from the Chi-square distribution table
alpha = 0.05 # significance level
critical_value = qchisq(1 - alpha, df) # Chi-square critical value for 0.05 significance level
# Step 5: Display the results
cat("Chi-square Statistic:", chi_square_stat, "\n")
Chi-square Statistic: 11.28832
cat("Degrees of Freedom:", df, "\n")
Degrees of Freedom: 6
cat("Critical Value from the Chi-square table:", critical_value, "\n")
Critical Value from the Chi-square table: 12.59159
# Step 6: Interpretation using the table value
if (chi_square_stat > critical_value) {
  cat("There is a significant difference between the processes (reject the null hypothesis).\n")
} else {
  cat("There is no significant difference between the processes (fail to reject the null hypothesis).\n")
}
There is no significant difference between the processes (fail to reject the null hypothesis).
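# Optional p-value view of the Q.2 decision, plus a check of the expected
# counts (a sketch; names ending in _q2 are illustrative). Comparing the
# p-value with alpha gives the same decision as comparing the statistic with
# the critical value. The "expected counts of at least 5" guideline is a
# commonly quoted rule of thumb, assumed here rather than taken from the
# practical itself.
obs_q2 = matrix(c(4, 6, 7, 12, 8, 5, 10, 14, 6, 7, 4, 12), nrow = 4, byrow = TRUE)
exp_q2 = outer(rowSums(obs_q2), colSums(obs_q2)) / sum(obs_q2)
stat_q2 = sum((obs_q2 - exp_q2)^2 / exp_q2)
df_q2 = (nrow(obs_q2) - 1) * (ncol(obs_q2) - 1)
# Upper-tail probability of the chi-square distribution at the observed statistic
p_value_q2 = pchisq(stat_q2, df_q2, lower.tail = FALSE)
p_value_q2 > 0.05  # TRUE, so we fail to reject the null hypothesis
min(exp_q2)        # smallest expected count, about 5.37
all(exp_q2 >= 5)   # TRUE, so the chi-square approximation is reasonable here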