Kendall Correlation Testing in R Programming
Last Updated: 10 Oct, 2024
Correlation is a statistical measure of how strongly two variables are related. For instance, to find out whether the heights of fathers and sons are related, one can calculate a correlation coefficient. The coefficient lies between -1 and +1. It is a scaled, dimensionless version of covariance that gives both the direction and the strength of the relationship. There are mainly two types of correlation:
- Parametric Correlation - Pearson correlation (r): It measures the linear dependence between two variables (x and y). It is known as a parametric correlation test because it depends on the distribution of the data.
- Non-Parametric Correlation - Kendall (tau) and Spearman (rho): These are rank-based correlation coefficients that do not assume a particular distribution of the data.
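To make the distinction concrete, here is a minimal sketch using made-up vectors `u` and `v` (illustration data, not taken from this article). `v` always increases with `u`, but not linearly, so the rank-based coefficients reach 1 while Pearson's r stays below 1.

```r
# Illustrative data (an assumption for this sketch):
# v grows monotonically but non-linearly with u
u <- 1:10
v <- exp(u)

pearson  <- cor(u, v, method = "pearson")   # measures linear dependence
kendall  <- cor(u, v, method = "kendall")   # rank-based
spearman <- cor(u, v, method = "spearman")  # rank-based

print(c(pearson = pearson, kendall = kendall, spearman = spearman))
```

Because Kendall and Spearman only look at the ordering of the values, any strictly increasing transformation of `v` would leave them unchanged.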
What is Kendall's Tau?
Kendall’s tau is a rank-based (non-parametric) measure of correlation that assesses the ordinal relationship between two variables. It is based on the difference between the number of concordant and discordant pairs in the dataset. The formula for calculating the Kendall rank correlation is as follows:
\[
\tau = \frac{\text{Number of concordant pairs} - \text{Number of discordant pairs}}{\frac{n(n - 1)}{2}}
\]
where,
- Concordant Pair: A pair of observations (x1, y1) and (x2, y2) that follows the property
- x1 > x2 and y1 > y2 or
- x1 < x2 and y1 < y2
- Discordant Pair: A pair of observations (x1, y1) and (x2, y2) that follows the property
- x1 > x2 and y1 < y2 or
- x1 < x2 and y1 > y2
- n: Total number of observations
Note: Pairs for which x1 = x2 or y1 = y2 are tied. They are classified as neither concordant nor discordant and do not contribute to the numerator.
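The definitions above can be sketched as a brute-force pair count. This is only an illustration: `kendall_tau_manual` is a hypothetical helper name (not part of base R), the vectors `a` and `b` are made up, and the loop is far slower than R's built-in implementation.

```r
# Count concordant and discordant pairs by brute force.
# kendall_tau_manual is a hypothetical helper, not a base R function.
kendall_tau_manual <- function(x, y) {
  n <- length(x)
  concordant <- 0
  discordant <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Positive product: both differences have the same sign (concordant)
      # Negative product: the differences have opposite signs (discordant)
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
      # s == 0 means a tie in x or in y; the pair is counted as neither
    }
  }
  (concordant - discordant) / (n * (n - 1) / 2)
}

# Made-up example without ties: 8 concordant and 2 discordant pairs
a <- c(1, 2, 3, 4, 5)
b <- c(2, 1, 4, 3, 5)

kendall_tau_manual(a, b)        # (8 - 2) / 10 = 0.6
cor(a, b, method = "kendall")   # agrees when there are no ties
```

With 5 observations there are 10 pairs. Only the pairs formed by observations 1 and 2, and by observations 3 and 4, are discordant, which gives (8 - 2) / 10 = 0.6.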
R’s base functions include support for calculating Kendall's tau using the cor() and cor.test() functions. Optionally, you can install additional visualization packages such as ggpubr for enhanced plots.
install.packages("ggpubr")
Let's walk through Kendall correlation testing in the R programming language step by step:
Step 1: Creating a Dataset
Let’s create a sample dataset to work with.
R
# Sample data (fixed values, so no random seed is needed)
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)
data <- data.frame(x, y)
Step 2: Computing Kendall Correlation using the cor() function
You can compute Kendall’s tau using the cor() function by specifying the method as "kendall".
R
# Calculate Kendall correlation
kendall_corr <- cor(data$x, data$y, method = "kendall")
kendall_corr
Output:
[1] 1
Step 3: Hypothesis Testing with cor.test()
To conduct a hypothesis test and obtain the p-value for Kendall correlation, use the cor.test() function:
R
# Perform Kendall correlation test
kendall_test <- cor.test(data$x, data$y, method = "kendall")
kendall_test
Output:
Kendall's rank correlation tau
data: data$x and data$y
T = 45, p-value = 5.511e-07
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
1
The output shows a Kendall's tau of 1. Both x and y are strictly increasing in this dataset, so all 45 pairs are concordant (T = 45). The p-value (5.511e-07) is far below 0.05, so the association is statistically significant. When interpreting tau in general:
- Values close to +1 indicate a strong positive monotonic relationship.
- Values close to -1 indicate a strong negative monotonic relationship.
- Values near 0 indicate little to no monotonic relationship.
- Always consider the p-value to assess the significance of the relationship.
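The object returned by cor.test() is a list, so the individual numbers can be pulled out for reporting or further checks. The sketch below repeats the Step 1 vectors so it runs on its own. The variable names (`res`, `tau_hat`, `p_val`, `t_stat`, `res_greater`) are chosen here for illustration. It also shows a one-sided test through the `alternative` argument.

```r
# Same vectors as in Step 1, repeated so this block is self-contained
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)

res <- cor.test(x, y, method = "kendall")

tau_hat <- unname(res$estimate)   # estimated tau
p_val   <- res$p.value            # two-sided p-value
t_stat  <- unname(res$statistic)  # test statistic T

# One-sided test: is the association positive?
res_greater <- cor.test(x, y, method = "kendall", alternative = "greater")

cat("tau =", tau_hat, " T =", t_stat, " p =", p_val, "\n")
cat("one-sided p =", res_greater$p.value, "\n")
```

A one-sided alternative is only appropriate when the direction of the association was hypothesized before looking at the data.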
Step 4: Visualizing Kendall Correlation
We can visualize the correlation using a scatter plot and annotate it with Kendall’s tau using the ggpubr package.
R
# Load required library
library(ggpubr)
# Scatter plot with Kendall correlation coefficient
ggscatter(data, x = "x", y = "y",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "kendall",
          xlab = "X Values", ylab = "Y Values",
          title = "Kendall Correlation Plot")
Output:
This will produce a scatter plot with a trendline and display the Kendall correlation coefficient.
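If ggpubr is not available, a roughly comparable plot can be sketched with base graphics. This is an alternative to the ggscatter() call rather than a reproduction of it: there is no confidence band, and the legend text and styling are choices made for this sketch.

```r
# Same vectors as in Step 1, repeated so this block is self-contained
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)

tau_hat <- cor(x, y, method = "kendall")

# Scatter plot of the raw data
plot(x, y, pch = 19,
     xlab = "X Values", ylab = "Y Values",
     main = "Kendall Correlation Plot")

# Least-squares trend line (a visual guide only; tau itself is rank-based)
abline(lm(y ~ x), col = "blue")

# Annotate the plot with the Kendall coefficient
legend("topleft", legend = paste("tau =", round(tau_hat, 2)), bty = "n")
```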
Conclusion
Kendall correlation, or Kendall’s tau, provides a robust non-parametric measure of association between two variables. It is particularly useful for ordinal data and datasets with ties, where a rank-based measure is often more appropriate than Pearson's r. R offers convenient functions like cor() and cor.test() to compute Kendall’s tau and test its significance.
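As a closing illustration of the point about ties, here is a minimal sketch with made-up ordinal ratings (`rating_a` and `rating_b` are illustration data, not from this article). When ties are present, cor.test() cannot compute an exact p-value and falls back to a normal approximation with a warning. Passing `exact = FALSE` requests the approximation directly.

```r
# Made-up ordinal ratings with repeated values (ties)
rating_a <- c(1, 2, 2, 3, 3, 3, 4, 5)
rating_b <- c(1, 1, 2, 2, 3, 4, 4, 5)

# cor() still returns a coefficient when ties are present
tau_ties <- cor(rating_a, rating_b, method = "kendall")

# exact = FALSE asks for the normal approximation up front,
# so the reported statistic is z rather than T
res_ties <- cor.test(rating_a, rating_b, method = "kendall", exact = FALSE)

print(tau_ties)
print(res_ties$statistic)
print(res_ties$p.value)
```

No pair in this example is discordant, yet the coefficient is below 1 because the tied pairs do not count as concordant.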