How Can I Label the Points of a Quantile-Quantile Plot Composed with ggplot2?
A quantile-quantile (Q-Q) plot is a graphical tool for comparing the distribution of a dataset with a theoretical distribution, such as the normal distribution. When creating Q-Q plots with ggplot2 in R, it is often useful to label specific points, especially when identifying outliers or highlighting particular observations. This article explains how to label points on a Q-Q plot created with ggplot2 in the R programming language.
Creating a Basic Q-Q Plot with ggplot2
Before labeling points, let’s start with a basic Q-Q plot using ggplot2. This plot compares the quantiles of a dataset to the quantiles of a normal distribution.
library(ggplot2)
# Generate sample data
set.seed(123)
sample_data <- rnorm(100)
# Create Q-Q plot
qq_plot <- ggplot(data = data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Basic Q-Q Plot")
# Display the plot
print(qq_plot)
Output:

In this example, stat_qq() draws the Q-Q points and stat_qq_line() adds a reference line, making it easier to assess how well the data follow the normal distribution.
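To make it clearer what the points on the plot represent, the coordinates can be reproduced in base R. The following is a minimal sketch, assuming the default settings of stat_qq() (a normal reference distribution and probabilities taken from ppoints()); the names theoretical, sample_q, and qq_coords are illustrative and not part of ggplot2.

```r
# Reproduce the coordinates of a normal Q-Q plot by hand
set.seed(123)
sample_data <- rnorm(100)

n <- length(sample_data)
theoretical <- qnorm(ppoints(n))  # x-axis: theoretical normal quantiles
sample_q    <- sort(sample_data)  # y-axis: ordered sample values

# One row per plotted point, in ascending order
qq_coords <- data.frame(theoretical = theoretical, sample = sample_q)
head(qq_coords, 3)
```

Each row pairs the i-th smallest observation with the i-th theoretical quantile, which is why the points in a Q-Q plot always run from the lower left to the upper right.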
1: Labeling Specific Points by Index
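In this example, we label specific points based on their index in the plot data. Because ggplot_build() returns the points sorted by sample value, the first and last rows are the smallest and largest observations. This approach is useful if you know exactly which points you want to label.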
library(ggplot2)
# Generate sample data
set.seed(123)
sample_data <- rnorm(100)
# Create Q-Q plot
qq_plot <- ggplot(data = data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Q-Q Plot with Labeled Points")
# Extract the data used for the Q-Q plot (rows are sorted by sample value)
plot_data <- ggplot_build(qq_plot)$data[[1]]
# Label specific points by row index (here the first and last points)
label_idx <- c(1, nrow(plot_data))
plot_data$label <- ifelse(seq_len(nrow(plot_data)) %in% label_idx,
                          "Extreme", "")
# Add labels to the Q-Q plot; inherit.aes = FALSE keeps the plot-level
# `sample` mapping out of the text layer
qq_plot_labeled <- qq_plot +
  geom_text(data = plot_data, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, vjust = -1, hjust = 0.5, color = "red")
# Display the plot
print(qq_plot_labeled)
Output:

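The data returned by ggplot_build() is ordered by sample value, so a row number there is a rank rather than a position in the original vector. If the goal is to label points by their index in sample_data, one option is to compute the Q-Q coordinates by hand and carry the index along. This is a sketch under that assumption; qq_df, idx_to_label, and the chosen indices 5, 18, and 44 are hypothetical.

```r
library(ggplot2)

set.seed(123)
sample_data <- rnorm(100)

# Build the Q-Q coordinates by hand so each point keeps its original index
ord <- order(sample_data)
qq_df <- data.frame(
  index       = ord,                                  # position in sample_data
  theoretical = qnorm(ppoints(length(sample_data))),  # x-axis values
  sample      = sample_data[ord]                      # y-axis values
)

# Hypothetical choice: label observations 5, 18, and 44 of sample_data
idx_to_label <- c(5, 18, 44)
label_df <- qq_df[qq_df$index %in% idx_to_label, ]

qq_plot_indexed <- ggplot(qq_df, aes(x = theoretical, y = sample)) +
  geom_point() +
  geom_text(data = label_df, aes(label = index),
            vjust = -1, color = "red") +
  ggtitle("Q-Q Plot with Points Labeled by Original Index")

print(qq_plot_indexed)
```

Passing only the selected rows to geom_text() also avoids drawing empty labels for every other point.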
2: Labeling Points Based on a Condition
This example demonstrates how to label points that meet a specific condition, such as being greater than or less than a certain value.
library(ggplot2)
# Generate sample data
set.seed(123)
sample_data <- rnorm(100)
# Create Q-Q plot
qq_plot <- ggplot(data = data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Q-Q Plot with Conditional Labels")
# Extract the data used for the Q-Q plot
plot_data <- ggplot_build(qq_plot)$data[[1]]
# Label points whose sample value exceeds a specific threshold
plot_data$label <- ifelse(plot_data$y > 1.5, "High", "")
# Add labels to the Q-Q plot; inherit.aes = FALSE keeps the plot-level
# `sample` mapping out of the text layer
qq_plot_labeled <- qq_plot +
  geom_text(data = plot_data, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, vjust = -1, hjust = 0.5, color = "blue")
# Display the plot
print(qq_plot_labeled)
Output:

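A threshold on the sample value is only one possible condition. When the aim is to spot departures from normality, it can be more informative to label the points that lie farthest from the reference line. The sketch below assumes the line is drawn through the first and third quartiles, which is the default of stat_qq_line(); the names qq_df, deviation, and flagged, and the choice of three points, are illustrative.

```r
library(ggplot2)

set.seed(123)
sample_data <- rnorm(100)

# Q-Q coordinates computed by hand (normal reference distribution)
qq_df <- data.frame(
  theoretical = qnorm(ppoints(length(sample_data))),
  sample      = sort(sample_data)
)

# Line through the first and third quartiles
probs     <- c(0.25, 0.75)
q_sample  <- quantile(qq_df$sample, probs, names = FALSE)
q_theory  <- qnorm(probs)
slope     <- diff(q_sample) / diff(q_theory)
intercept <- q_sample[1] - slope * q_theory[1]

# Vertical distance of every point from that line
qq_df$deviation <- qq_df$sample - (intercept + slope * qq_df$theoretical)

# Condition (an arbitrary choice): the three points farthest from the line
flagged <- qq_df[order(abs(qq_df$deviation), decreasing = TRUE)[1:3], ]

qq_plot_dev <- ggplot(qq_df, aes(x = theoretical, y = sample)) +
  geom_point() +
  geom_abline(slope = slope, intercept = intercept) +
  geom_text(data = flagged, aes(label = round(deviation, 2)),
            vjust = -1, color = "blue") +
  ggtitle("Q-Q Plot Labeled by Distance from the Reference Line")

print(qq_plot_dev)
```

Each label shows the signed vertical distance from the line, so positive values mark points above it and negative values mark points below it.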
3: Labeling All Points with Their Quantile Values
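If you want to label every point on the Q-Q plot, this example attaches each point's sample quantile (its y value), rounded to two decimal places. With many observations the labels will overlap, so this works best for small datasets.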
library(ggplot2)
# Generate sample data
set.seed(123)
sample_data <- rnorm(100)
# Create Q-Q plot
qq_plot <- ggplot(data = data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Q-Q Plot with Quantile Labels")
# Extract the data used for the Q-Q plot
plot_data <- ggplot_build(qq_plot)$data[[1]]
# Label every point with its sample quantile, rounded to two decimals
plot_data$label <- round(plot_data$y, 2)
# Add labels to the Q-Q plot; inherit.aes = FALSE keeps the plot-level
# `sample` mapping out of the text layer
qq_plot_labeled <- qq_plot +
  geom_text(data = plot_data, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, vjust = -1, hjust = 0.5, size = 3)
# Display the plot
print(qq_plot_labeled)
Output:

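With 100 points, labeling everything tends to produce a crowded plot. One way to reduce the clutter, sketched here, is the check_overlap argument of geom_text(), which skips any label that would overlap one already drawn. The object name qq_plot_sparse is illustrative.

```r
library(ggplot2)

set.seed(123)
sample_data <- rnorm(100)

qq_plot <- ggplot(data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Q-Q Plot with Non-Overlapping Quantile Labels")

# Extract the plotted coordinates and build the labels as before
plot_data <- ggplot_build(qq_plot)$data[[1]]
plot_data$label <- round(plot_data$y, 2)

# check_overlap = TRUE drops labels that would collide with earlier ones
qq_plot_sparse <- qq_plot +
  geom_text(data = plot_data, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, check_overlap = TRUE,
            vjust = -1, size = 3)

print(qq_plot_sparse)
```

Which labels survive depends on the size of the plotting device, so the result can differ between an interactive window and a saved file.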
4: Labeling Points with Custom Text
This example labels selected points with custom text, which is useful for highlighting particular data points. Here, points above 1.5 are marked "High" and points below -1.5 are marked "Low".
library(ggplot2)
# Generate sample data
set.seed(123)
sample_data <- rnorm(100)
# Create Q-Q plot
qq_plot <- ggplot(data = data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Q-Q Plot with Custom Labels")
# Extract the data used for the Q-Q plot
plot_data <- ggplot_build(qq_plot)$data[[1]]
# Custom labels for specific points; all other points get an empty label
plot_data$label <- ""
plot_data$label[plot_data$y > 1.5] <- "High"
plot_data$label[plot_data$y < -1.5] <- "Low"
# Add labels to the Q-Q plot; inherit.aes = FALSE keeps the plot-level
# `sample` mapping out of the text layer
qq_plot_labeled <- qq_plot +
  geom_text(data = plot_data, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, vjust = -1, hjust = 0.5, color = "green")
# Display the plot
print(qq_plot_labeled)
Output:

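The same custom labels can also be drawn with geom_label(), which places each label in a box and can be easier to read against a busy background. The following sketch assumes that only the labeled rows are passed to the layer, since empty strings can otherwise show up as small empty boxes; labelled and qq_plot_boxed are illustrative names.

```r
library(ggplot2)

set.seed(123)
sample_data <- rnorm(100)

qq_plot <- ggplot(data.frame(sample_data), aes(sample = sample_data)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Q-Q Plot with Boxed Custom Labels")

# Extract the plotted coordinates and assign custom labels
plot_data <- ggplot_build(qq_plot)$data[[1]]
plot_data$label <- ""
plot_data$label[plot_data$y > 1.5]  <- "High"
plot_data$label[plot_data$y < -1.5] <- "Low"

# Keep only the rows that actually carry a label
labelled <- plot_data[plot_data$label != "", ]

qq_plot_boxed <- qq_plot +
  geom_label(data = labelled, aes(x = x, y = y, label = label),
             inherit.aes = FALSE, vjust = -0.5, color = "darkgreen")

print(qq_plot_boxed)
```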
Conclusion
Labeling points on a Q-Q plot in ggplot2 is a straightforward process that adds valuable information to your visualizations. Whether you are labeling a few specific points or all of them, geom_text() and geom_label() provide flexible options for customizing the appearance of labels. By carefully choosing which points to label and how to display those labels, you can enhance the interpretability and clarity of your Q-Q plots in R. This approach can be particularly useful for identifying and communicating the behavior of outliers or specific data points that warrant further investigation in your data analysis process.