Cross-correlation is a method used to see how similar two sets of data are, especially when one is shifted in time. It helps us find out if a change in one set happens before or after a change in the other, and how closely they are related. This is useful in areas like time series analysis, finance, and science to understand if one signal can help predict another.
Mathematically, the cross-correlation of two sequences x and y at lag \tau is given by:
R_{xy}(\tau) = \sum_{t} x(t) y(t + \tau)
Where:
- R_{xy}(\tau) - The cross-correlation value between the two signals x and y at a time lag of \tau.
- \sum_{t} - The sum over all time points t at which both x(t) and y(t + \tau) are defined.
- x(t) - The value of the first signal x at time t.
- y(t + \tau) - The value of the second signal y at time t + \tau, that is, y shifted in time by \tau. A peak at a positive \tau means y lags behind x by \tau steps.
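The sum above can be evaluated term by term. The following is a minimal sketch, not a reference implementation: the helper name cross_correlation_direct and the three-sample signals are illustrative assumptions. It also checks the result against np.correlate, whose argument order is (y, x) for this definition.

```python
import numpy as np

def cross_correlation_direct(x, y, tau):
    """Evaluate R_xy(tau) = sum_t x(t) * y(t + tau) over the overlapping samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = 0.0
    for t in range(len(x)):
        # Skip terms where y(t + tau) falls outside the recorded samples.
        if 0 <= t + tau < len(y):
            total += x[t] * y[t + tau]
    return total

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0])

lags = range(-2, 3)
direct = np.array([cross_correlation_direct(x, y, tau) for tau in lags])
print(direct)

# np.correlate(a, v) computes sum_n a[n + k] * v[n], so passing (y, x)
# reproduces the definition above with k playing the role of tau.
library = np.correlate(y, x, mode="full")
print(np.allclose(direct, library))
```

The explicit loop costs O(N) per lag and is only meant to make the formula concrete; for real data the library routine is the practical choice.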
Uses of Cross-Correlation
Cross-correlation is helpful for:
- Finding Time Delays: Identifying whether one signal leads or lags another.
- Signal Matching: Comparing signals for similarity.
- Feature Extraction: In fields like speech or image processing.
- Detecting Periodicity: In repetitive signals or time series data.
Visualizing Cross-Correlation with Python
Let’s use Python to demonstrate cross-correlation between two simple signals.
Python
import numpy as np
import matplotlib.pyplot as plt
# Generate two signals
np.random.seed(0)
x = np.random.randn(100)
y = np.roll(x, 5) + np.random.randn(100) * 0.1 # Shift x by 5 and add noise
# Compute cross-correlation
# np.correlate(a, v) computes sum_n a[n + k] * v[n], so y goes first to match R_xy(tau)
correlation = np.correlate(y - np.mean(y), x - np.mean(x), mode='full')
lags = np.arange(-len(x) + 1, len(x))
# Plot
plt.figure(figsize=(10, 5))
plt.plot(lags, correlation)
plt.title('Cross-Correlation between x and y')
plt.xlabel('Lag')
plt.ylabel('Cross-Correlation')
plt.grid(True)
plt.show()
Output
[Figure: Cross-Correlation]
In this example:
- We generate a random signal x.
- We create y by shifting x by 5 time steps with np.roll (a circular shift) and adding a little noise.
- The peak in the plot should occur at lag = 5, indicating that y lags x by 5 time steps.
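Reading the delay off the plot can also be done numerically by taking the lag at which the cross-correlation is largest. This is a minimal sketch under the same setup as the example (white noise, a circular shift of 5, mild noise); the variable names and the longer signal length are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_delay = 200, 5

x = rng.standard_normal(n)
# y is x delayed by `true_delay` samples (circular shift) plus mild noise.
y = np.roll(x, true_delay) + 0.1 * rng.standard_normal(n)

# Argument order (y, x) matches R_xy(tau) = sum_t x(t) * y(t + tau).
corr = np.correlate(y - y.mean(), x - x.mean(), mode="full")
lags = np.arange(-n + 1, n)

# The estimated delay is the lag with the largest correlation value.
estimated_delay = lags[np.argmax(corr)]
print(estimated_delay)
```

With noise this small the estimate should match the true delay of 5; for noisier or strongly autocorrelated signals the peak can be broader and the estimate less reliable.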
Normalized Cross-Correlation
To compare signals of different energy (magnitude), we often normalize the cross-correlation.
Normalized cross-correlation is calculated as:
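R_{xy}^{\text{norm}}(\tau) = \frac{1}{N \sigma_x \sigma_y} \sum_t (x(t) - \mu_x)(y(t+\tau) - \mu_y)
Here N is the number of samples, and \mu and \sigma are the mean and (population) standard deviation of each signal. With this scaling the value at \tau = 0 equals the Pearson correlation coefficient.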
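As a sanity check on the normalization, the lag-0 term can be compared with the Pearson coefficient from np.corrcoef. This sketch assumes the sum is divided by the number of samples N as well as by \sigma_x \sigma_y, and uses made-up test signals.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
y = 0.6 * x + 0.8 * rng.standard_normal(50)

n = len(x)
# Standardise with the population standard deviation (np.std default, ddof=0).
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()

# Lag-0 term of the normalised cross-correlation, including the 1/N factor.
ncc_at_zero = np.sum(xs * ys) / n

pearson = np.corrcoef(x, y)[0, 1]
print(ncc_at_zero, pearson)
```

The two printed numbers should agree to floating-point precision, which is why the normalized curve can be read on the familiar -1 to 1 scale.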
Here’s how you can compute it in Python:
Python
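def normalized_cross_correlation(x, y):
    # Standardise both signals to zero mean and unit variance
    x = (x - np.mean(x)) / np.std(x)
    y = (y - np.mean(y)) / np.std(y)
    # y goes first so that the lag axis matches R_xy(tau) = sum_t x(t) y(t + tau)
    return np.correlate(y, x, mode='full') / len(x)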
ncc = normalized_cross_correlation(x, y)
lags = np.arange(-len(x) + 1, len(x))
plt.figure(figsize=(10, 5))
plt.plot(lags, ncc)
plt.title('Normalized Cross-Correlation')
plt.xlabel('Lag')
plt.ylabel('Correlation Coefficient')
plt.grid(True)
plt.show()
Output
[Figure: Normalized Cross-Correlation]
Cross-Correlation in the Frequency Domain
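Using the Fast Fourier Transform (FFT), cross-correlation over all lags can be computed in O(N log N) time instead of the O(N^2) of the direct sum. For real signals and the definition above, the cross-correlation theorem (a corollary of the convolution theorem) gives:
R_{xy}(\tau) = \mathcal{F}^{-1} \left\{ \mathcal{F}(x)^* \cdot \mathcal{F}(y) \right\}
Conjugating \mathcal{F}(y) instead of \mathcal{F}(x) yields R_{xy}(-\tau), the same curve mirrored in lag. Without zero-padding the result is circular: lags wrap around modulo the signal length.
This approach is useful when dealing with very large signals.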
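The frequency-domain identity can be checked numerically on a short signal by comparing it with the circular sum evaluated term by term. This is a sketch only: the helper names are hypothetical, and np.fft is used here in place of scipy.fft to keep the example dependent on NumPy alone.

```python
import numpy as np

def circular_cross_correlation_direct(x, y):
    """R_xy(tau) = sum_t x(t) * y((t + tau) mod N), evaluated term by term."""
    n = len(x)
    return np.array([
        sum(x[t] * y[(t + tau) % n] for t in range(n))
        for tau in range(n)
    ])

def circular_cross_correlation_fft(x, y):
    """Same quantity via the FFT: conjugate the spectrum of x, not of y."""
    X = np.fft.fft(x)
    Y = np.fft.fft(y)
    return np.real(np.fft.ifft(np.conj(X) * Y))

rng = np.random.default_rng(2)
x = rng.standard_normal(16)
y = np.roll(x, 3)  # y(t) = x(t - 3), a circular delay of 3 samples

direct = circular_cross_correlation_direct(x, y)
via_fft = circular_cross_correlation_fft(x, y)

print(np.allclose(direct, via_fft))
print(int(np.argmax(via_fft)))
```

To obtain the linear (non-wrapping) correlation that np.correlate returns, both signals would first be zero-padded to at least len(x) + len(y) - 1 samples before transforming.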
Python
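from scipy.fft import fft, ifft
def fft_cross_correlation(x, y):
    # Circular cross-correlation; conj(X) * Y matches R_xy(tau) = sum_t x(t) y(t + tau).
    # X * conj(Y) would return the lag-reversed curve R_xy(-tau).
    X = fft(x)
    Y = fft(y)
    corr = ifft(np.conj(X) * Y)
    return np.real(corr)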
fft_corr = fft_cross_correlation(x, y)
plt.figure(figsize=(10, 5))
plt.plot(fft_corr)
plt.title('Cross-Correlation using FFT')
plt.xlabel('Lag')
plt.ylabel('Correlation')
plt.grid(True)
plt.show()
Output
[Figure: Cross Correlation using FFT]
Cross-Correlation Matrix
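In multivariate analysis, we deal with multiple time series. A cross-correlation matrix helps examine all pairwise correlations. Note that pandas' DataFrame.corr() computes Pearson correlations at lag 0 only, so series that are related through a time shift can still show values near zero.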
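Pairwise correlations can be taken at lag 0 or searched over a range of lags, and for shifted signals the two views differ a lot. The sketch below uses only NumPy; the helper best_lag_correlation, the signal length, and the max_lag value are assumptions made for illustration rather than an established API.

```python
import numpy as np

def best_lag_correlation(a, b, max_lag):
    """Return (lag, r): the lag in [-max_lag, max_lag] whose Pearson r between
    a(t) and b(t + lag) has the largest magnitude. Hypothetical helper."""
    best = (0, 0.0)
    n = len(a)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a_part, b_part = a[:n - lag], b[lag:]
        else:
            a_part, b_part = a[-lag:], b[:n + lag]
        r = np.corrcoef(a_part, b_part)[0, 1]
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

rng = np.random.default_rng(3)
n = 300
x = rng.standard_normal(n)
signals = {
    "x": x,
    "y": np.roll(x, 5) + 0.1 * rng.standard_normal(n),   # x delayed by 5
    "z": np.roll(x, -3) + 0.2 * rng.standard_normal(n),  # x advanced by 3
}

names = list(signals)
# Lag-0 Pearson matrix: off-diagonal entries are expected to be small here.
zero_lag = np.corrcoef([signals[k] for k in names])
print(np.round(zero_lag, 2))

for i, a in enumerate(names):
    for b in names[i + 1:]:
        lag, r = best_lag_correlation(signals[a], signals[b], max_lag=10)
        print(f"{a} -> {b}: best lag {lag}, r = {r:.2f}")
```

A positive best lag means the second series trails the first. Searching over lags inflates the chance of finding a large value by accident, so a strong result at some lag is a starting point for investigation rather than proof of a relationship.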
Python
import pandas as pd
# Simulate multiple signals
data = pd.DataFrame({
    'x': x,
    'y': y,
    'z': np.roll(x, -3) + np.random.randn(100) * 0.2
})
# Compute correlation matrix
corr_matrix = data.corr()
print(corr_matrix)
Output
[Figure: Cross-Correlation Matrix]
Cross-Correlation vs Auto-Correlation
- Cross-Correlation: Measures similarity between two different signals.
- Auto-Correlation: Measures similarity between a signal and a time-shifted copy of itself, which is cross-correlation with y = x.
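The relationship between the two can be seen by passing the same signal twice to a cross-correlation routine. This is a small sketch with an arbitrary random signal; it illustrates two standard properties of auto-correlation, a maximum at lag 0 and symmetry in the lag.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(64)
xc = x - x.mean()

# Auto-correlation is cross-correlation with both arguments set to the same signal.
auto = np.correlate(xc, xc, mode="full")
lags = np.arange(-len(x) + 1, len(x))

peak_lag = lags[np.argmax(auto)]          # a signal matches itself best unshifted
is_symmetric = np.allclose(auto, auto[::-1])  # R_xx(tau) equals R_xx(-tau)
print(peak_lag, is_symmetric)
```

Cross-correlation between two different signals has neither guarantee: its peak can sit at any lag, and the curve is generally not symmetric.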
Limitations of Cross-Correlation
- Stationarity Assumption: It assumes the signals are stationary (their properties do not change over time).
- Sensitivity to Noise: High levels of noise can obscure meaningful correlations.
- Ambiguity in Causality: A high cross-correlation doesn't necessarily mean one signal causes the other.
- Windowing Effects: For very long signals, the choice of analysis window affects the result.
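The stationarity point above can be illustrated with two unrelated series that both drift upward. This is a sketch with made-up trend slopes; first differencing is used as one common way to remove a linear trend, not as the only option.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
t = np.arange(n)

# Two unrelated noise series that happen to share an upward trend.
a = 0.05 * t + rng.standard_normal(n)
b = 0.03 * t + rng.standard_normal(n)

# The shared trend dominates, so the raw correlation looks strong.
raw_r = np.corrcoef(a, b)[0, 1]

# First differences remove the linear trend, leaving only the noise.
diff_r = np.corrcoef(np.diff(a), np.diff(b))[0, 1]

print(f"raw: {raw_r:.2f}, differenced: {diff_r:.2f}")
```

The raw value should come out high even though the noise terms are independent, while the differenced value should be close to zero. Checking a correlation both before and after detrending is a cheap guard against this kind of spurious result.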
Applications of Cross-Correlation
- Signal Processing: To detect a known signal in a noisy recording.
- Economics: To find relationships between two financial indicators.
- Weather Forecasting: To detect time shifts in climate patterns.
- Neuroscience: To understand connectivity between brain signals.
- Image Matching: Comparing templates in computer vision.
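The first application above, detecting a known signal in a noisy recording, can be sketched by sliding the template along the recording and taking the offset with the highest score. The template shape, noise level, and offset below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Known template: a short windowed sine burst (an illustrative choice).
template = np.sin(2 * np.pi * np.arange(32) / 8) * np.hanning(32)

# Noisy recording with the template buried at a known offset.
true_offset = 140
recording = 0.15 * rng.standard_normal(400)
recording[true_offset:true_offset + len(template)] += template

# mode="valid" slides the template across the recording without zero-padding;
# entry k is the dot product of the template with recording[k:k + len(template)].
score = np.correlate(recording, template, mode="valid")
estimated_offset = int(np.argmax(score))
print(estimated_offset)
```

The same idea carries over to two dimensions for template matching in images, where the normalized form is usually preferred so that bright regions do not dominate the score.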