Cross-correlation is a method used to see how similar two sets of data are, especially when one is shifted in time. It helps us find out if a change in one set happens before or after a change in the other, and how closely they are related. This is useful in areas like time series analysis, finance, and science to understand if one signal can help predict another.
Mathematically, the cross-correlation of two sequences x(t) and y(t) is given by:
R_{xy}(\tau) = \sum_{t} x(t) y(t + \tau)
Where:
- R_{xy}(\tau) - This is the cross-correlation value between the two signals x and y at a time lag of \tau.
- \sum_{t} - This symbol means we are adding up values over all points in time t.
- x(t) - This is the value of the first signal x at time t.
- y(t + \tau) - This is the value of the second signal y, but shifted in time by \tau.
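The definition above can be evaluated directly, one lag at a time. The following is a minimal sketch, not an optimized implementation: the helper name cross_correlation_at_lag and the two short impulse signals are made up for illustration, and samples outside the recorded range are assumed to be zero.

```python
import numpy as np

def cross_correlation_at_lag(x, y, tau):
    """Evaluate R_xy(tau) = sum_t x[t] * y[t + tau] by direct summation.

    Terms whose index t + tau falls outside y are skipped, which treats
    both signals as zero outside their recorded range.
    """
    total = 0.0
    for t in range(len(x)):
        if 0 <= t + tau < len(y):
            total += x[t] * y[t + tau]
    return total

# Illustrative signals: a single impulse, and the same impulse one step later.
x = np.array([0.0, 1.0, 0.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 0.0])

taus = range(-3, 4)
direct = [cross_correlation_at_lag(x, y, tau) for tau in taus]

# np.correlate(a, v) sums a[n + k] * v[n], so y is passed first to match R_xy.
library = np.correlate(y, x, mode="full")

for tau, value in zip(taus, direct):
    print(tau, value)
```

The only non-zero value appears at tau = 1, the delay of y relative to x, and the direct sums agree with np.correlate(y, x, mode="full").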
Uses of Cross-Correlation
Cross-correlation is helpful for:
- Finding Time Delays: Identifying whether one signal leads or lags another.
- Signal Matching: Comparing signals for similarity.
- Feature Extraction: In fields like speech or image processing.
- Detecting Periodicity: In repetitive signals or time series data.
Visualizing Cross-Correlation with Python
Let’s use Python to demonstrate cross-correlation between two simple signals.
Python
import numpy as np
import matplotlib.pyplot as plt
# Generate two signals
np.random.seed(0)
x = np.random.randn(100)
y = np.roll(x, 5) + np.random.randn(100) * 0.1 # Shift x by 5 and add noise
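# Compute cross-correlation R_xy(lag) = sum_t x[t] * y[t + lag].
# np.correlate(a, v) sums a[n + lag] * v[n], so the shifted signal y goes first.
correlation = np.correlate(y - np.mean(y), x - np.mean(x), mode='full')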
lags = np.arange(-len(x) + 1, len(x))
# Plot
plt.figure(figsize=(10, 5))
plt.plot(lags, correlation)
plt.title('Cross-Correlation between x and y')
plt.xlabel('Lag')
plt.ylabel('Cross-Correlation')
plt.grid(True)
plt.show()
Output
[Figure: Cross-Correlation]
In this example:
- We generate a random signal x.
- We create y by shifting x by 5 time steps (circularly, using np.roll) and adding a little noise.
- The peak in the plot should occur at lag = 5, indicating the time delay.
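Rather than reading the delay off the plot, the lag of the peak can be picked out numerically. This is a small sketch under the same setup as the example; the variable names (signal, delayed, estimated_delay) and the seed are illustrative choices, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
signal = rng.standard_normal(100)
delayed = np.roll(signal, 5) + 0.1 * rng.standard_normal(100)

# np.correlate(a, v) evaluates sum_n a[n + k] * v[n]. Passing the delayed
# signal first therefore gives R_xy(k) = sum_t x[t] * y[t + k] with
# x = signal and y = delayed.
corr = np.correlate(delayed - delayed.mean(), signal - signal.mean(), mode="full")
lags = np.arange(-len(signal) + 1, len(signal))

# The lag with the largest correlation is the estimated delay.
estimated_delay = int(lags[np.argmax(corr)])
print("Estimated delay:", estimated_delay)
```

A positive result means the second signal lags the first. Swapping the two arguments of np.correlate flips the sign of the reported lag.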
Normalized Cross-Correlation
To compare signals of different energy (magnitude), we often normalize the cross-correlation.
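Normalized cross-correlation is calculated as:
R_{xy}^{\text{norm}}(\tau) = \frac{1}{N \sigma_x \sigma_y} \sum_t (x(t) - \mu_x)(y(t+\tau) - \mu_y)
Here N is the number of samples, \mu_x and \mu_y are the means of the two signals, and \sigma_x and \sigma_y are their standard deviations. With this scaling the values lie between -1 and 1, and the value at \tau = 0 is the Pearson correlation coefficient.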
Here’s how you can compute it in Python:
Python
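def normalized_cross_correlation(x, y):
    # Standardize both signals (zero mean, unit standard deviation)
    x = (x - np.mean(x)) / np.std(x)
    y = (y - np.mean(y)) / np.std(y)
    # y goes first so that the result follows R_xy(lag) = sum_t x[t] * y[t + lag]
    return np.correlate(y, x, mode='full') / len(x)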
ncc = normalized_cross_correlation(x, y)
lags = np.arange(-len(x) + 1, len(x))
plt.figure(figsize=(10, 5))
plt.plot(lags, ncc)
plt.title('Normalized Cross-Correlation')
plt.xlabel('Lag')
plt.ylabel('Correlation Coefficient')
plt.grid(True)
plt.show()
Output
[Figure: Normalized Cross-Correlation]
Cross-Correlation in the Frequency Domain
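Using the Fast Fourier Transform (FFT), cross-correlation can be computed efficiently. According to the cross-correlation theorem (a close relative of the convolution theorem), for real-valued signals:
R_{xy}(\tau) = \mathcal{F}^{-1} \left\{ \mathcal{F}(x)^* \cdot \mathcal{F}(y) \right\}
The conjugate falls on \mathcal{F}(x) because x is the unshifted signal in R_{xy}(\tau) = \sum_{t} x(t) y(t + \tau). The result is a circular correlation: lags wrap around the end of the signal, so index N - k corresponds to lag -k. This approach is useful when dealing with very large signals, since it needs on the order of N log N operations instead of N^2.
Python
from scipy.fft import fft, ifft
def fft_cross_correlation(x, y):
    # Circular cross-correlation: R_xy[k] = sum_t x[t] * y[(t + k) % N]
    X = fft(x)
    Y = fft(y)
    corr = ifft(np.conj(X) * Y)
    return np.real(corr)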
fft_corr = fft_cross_correlation(x, y)
plt.figure(figsize=(10, 5))
plt.plot(fft_corr)
plt.title('Cross-Correlation using FFT')
plt.xlabel('Lag')
plt.ylabel('Correlation')
plt.grid(True)
plt.show()
Output
[Figure: Cross-Correlation using FFT]
Cross-Correlation Matrix
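In multivariate analysis, we deal with multiple time series. A cross-correlation matrix helps examine all pairwise correlations at once. The example below uses the pandas method DataFrame.corr(), which computes the Pearson correlation of each pair of columns at lag 0 only, so signals that are related only through a time shift (as y and z are to x here) can show values near zero.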
Python
import pandas as pd
# Simulate multiple signals
data = pd.DataFrame({
    'x': x,
    'y': y,
    'z': np.roll(x, -3) + np.random.randn(100) * 0.2
})
# Compute correlation matrix
corr_matrix = data.corr()
print(corr_matrix)
Output
[Figure: Cross-Correlation Matrix]
Cross-Correlation vs Auto-Correlation
- Cross-Correlation: Measures similarity between two different signals.
- Auto-Correlation: Measures similarity of a signal with itself over time.
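Auto-correlation can be seen as the special case where both inputs are the same signal. The sketch below illustrates two properties that follow from this for a real-valued signal: the result is symmetric in the lag, and its largest value sits at lag 0. The signal s and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
s = rng.standard_normal(200)
s = s - s.mean()

# Auto-correlation: correlate the signal with itself.
auto = np.correlate(s, s, mode="full")
lags = np.arange(-len(s) + 1, len(s))

peak_lag = int(lags[np.argmax(auto)])
is_symmetric = bool(np.allclose(auto, auto[::-1]))

print("Peak lag:", peak_lag)
print("Symmetric in lag:", is_symmetric)
```

For a cross-correlation between two different signals neither property is guaranteed, which is why the location of its peak carries information about the delay.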
Limitations of Cross-Correlation
- Stationarity Assumption: It assumes the signals are stationary (their properties do not change over time).
- Sensitivity to Noise: High levels of noise can obscure meaningful correlations.
- Ambiguity in Causality: A high cross-correlation doesn't necessarily mean one signal causes the other.
- Windowing Effects: For very long signals, the choice of analysis window affects the result.
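The sensitivity to noise can be illustrated with a small experiment. This is a sketch under simple assumptions: a unit-variance random signal, independent Gaussian noise, and three arbitrarily chosen noise levels. It measures the correlation at lag 0, where the clean and noisy copies are aligned.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
clean = rng.standard_normal(2000)

peak_by_noise = {}
for noise_std in (0.1, 1.0, 5.0):
    noisy = clean + noise_std * rng.standard_normal(clean.size)
    # Pearson correlation at lag 0, where the two signals are aligned.
    peak_by_noise[noise_std] = float(np.corrcoef(clean, noisy)[0, 1])

for noise_std, peak in peak_by_noise.items():
    print(f"noise std {noise_std}: correlation {peak:.2f}")
```

Under these assumptions the expected peak height is 1 / sqrt(1 + noise_std^2), so the peak shrinks toward the background level as the noise grows, even though the underlying relationship has not changed.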
Applications of Cross-Correlation
- Signal Processing: To detect a known signal in a noisy recording.
- Economics: To find relationships between two financial indicators.
- Weather Forecasting: To detect time shifts in climate patterns.
- Neuroscience: To understand connectivity between brain signals.
- Image Matching: Comparing templates in computer vision.
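As an illustration of the signal-processing use case, the sketch below looks for a known template inside a noisy recording. Everything in it is hypothetical: the template is a Barker-13 code (chosen only because its auto-correlation has a sharp, unambiguous peak), and the recording length, noise level, and start position are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Hypothetical known template: the Barker-13 code.
template = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)

# Hypothetical recording: Gaussian noise with the template buried at sample 300.
true_start = 300
recording = 0.3 * rng.standard_normal(1000)
recording[true_start:true_start + len(template)] += template

# mode="valid" slides the template over the recording without padding, so
# score[k] = sum_n recording[n + k] * template[n] and k is the start sample.
score = np.correlate(recording, template, mode="valid")
detected_start = int(np.argmax(score))

print("Detected start:", detected_start)
```

The same idea extends to two dimensions for template matching in images, where the template is slid over the image in both directions.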