Similar Series

Data Engineering Silicon Valley
Conducting real-time similarity search using an approximate nearest neighbor technique.

Problem Statement:
- Query historical price data for approximate nearest neighbors in real time.

Motivation
To let researchers of financial time series find, in real time, past periods that are “similar” to the latest period.
- Helps algorithm developers gain insight into how the time series developed in “similar” periods in the past.
- Allows cross-referencing other time series (commodities, other currencies) over those “similar” periods.
- Can be used as a signal in quantitative trading.
- Can be used as a signal in quantitative trading.

Overview of challenges
- Compute the distance between unevenly spaced time series.
- Compute an approximate nearest neighbor in near-constant time.
- Construct a data structure that allows reliable processing, storage and retrieval of data, so that queries are answered quickly.

Distance metric between non-uniform time series.
- L1 Analogue
- Satisfies triangle inequality
- Easy to visualize
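
The slides do not give the exact formula for this metric, so the following is only a minimal sketch of one possible L1 analogue. It assumes each series is a time-sorted list of (time, value) pairs, treats it as a step function (last observation carried forward), and integrates the absolute difference over a shared window. The names `step_value` and `l1_distance`, the step-function interpolation, and the explicit `start`/`end` window are assumptions made for illustration.

```python
from bisect import bisect_right


def step_value(times, values, t):
    """Last observed value at or before t (the first value if t precedes all observations)."""
    i = bisect_right(times, t) - 1
    return values[max(i, 0)]


def l1_distance(series_a, series_b, start, end):
    """Integrate |a(t) - b(t)| over [start, end], treating both series as step functions.

    Each series is a time-sorted list of (time, value) pairs; the two series
    do not need to share timestamps.
    """
    times_a, values_a = zip(*series_a)
    times_b, values_b = zip(*series_b)
    # Both step functions are constant between consecutive points of the merged grid.
    inner = {t for t in times_a + times_b if start < t < end}
    grid = sorted(inner | {start, end})
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        gap = abs(step_value(times_a, values_a, left) - step_value(times_b, values_b, left))
        total += gap * (right - left)
    return total


# Hypothetical data: two series observed at different times.
a = [(0, 1.0), (2, 3.0)]
b = [(0, 1.0), (1, 2.0)]
print(l1_distance(a, b, 0, 4))
```

Because this is the ordinary L1 norm of the difference of two functions on a fixed window, it is symmetric and satisfies the triangle inequality, and it can be visualized as the area between the two step curves.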

Finding the nearest neighbor quickly
- LSH for a generic metric space.
- N pivots
- Use the distance ordering to pivots as a permutation.
- Example permutation: 32154
- Permutation is used to index the historical data and perform fast queries.
On Locality-sensitive Indexing in Generic Metric Spaces.
Novak, Kyselak, Zezula 2010
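
The permutation step above can be sketched as follows. This is an illustrative sketch rather than the deck's implementation: `pivot_permutation` is a hypothetical helper, pivots are numbered from 1, ties are broken by pivot number, and the toy pivots are plain numbers compared by absolute difference instead of time series compared by the distance metric.

```python
def pivot_permutation(query, pivots, distance):
    """Return pivot numbers (1-based) ordered from nearest to farthest from the query."""
    dists = [distance(query, pivot) for pivot in pivots]
    # Sort pivot indices by distance; ties are broken by pivot index (an assumption).
    order = sorted(range(len(pivots)), key=lambda i: (dists[i], i))
    return tuple(i + 1 for i in order)


# Toy example: numeric "pivots" and absolute difference as the metric.
pivots = [10.0, 7.0, 5.0, 20.0, 13.0]
perm = pivot_permutation(5.5, pivots, lambda x, y: abs(x - y))
print("".join(str(p) for p in perm))  # prints 32154
```

Objects that are close in the metric tend to see the pivots in a similar order, which is why the permutation can serve as a locality-sensitive key.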

Applying the idea to unevenly spaced time series.
Query: [figure]
Resulting permutation: 13245

Data structure for fast querying of similar permutations:
- Use a nested key-value store.
- Store the full permutations and timestamps in the leaves.
- The total possible number of leaf nodes is n!, where n is the number of pivots.
- Implemented a persistent version using Cassandra tables.
Example: to query permutation 13245, descend the nested keys one pivot at a time.
The desired timestamp is at the leaf.
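
A minimal in-memory sketch of such a nested key-value store, using Python dictionaries as a stand-in for the Cassandra tables mentioned above. The function names, the `"timestamps"` leaf key, the sample timestamps, and the optional prefix lookup (returning everything that shares the first few pivots) are assumptions for illustration, not details taken from the slides.

```python
def insert(index, permutation, timestamp):
    """Store a timestamp at the leaf reached by following the permutation's pivot numbers."""
    node = index
    for pivot_id in permutation:
        node = node.setdefault(pivot_id, {})
    node.setdefault("timestamps", []).append(timestamp)


def collect(node):
    """Gather every timestamp stored at or below this node."""
    found = []
    for key, child in node.items():
        if key == "timestamps":
            found.extend(child)
        else:
            found.extend(collect(child))
    return found


def query(index, permutation, prefix_len=None):
    """Return timestamps whose permutation matches exactly, or shares the first prefix_len pivots."""
    prefix = permutation if prefix_len is None else permutation[:prefix_len]
    node = index
    for pivot_id in prefix:
        if pivot_id not in node:
            return []
        node = node[pivot_id]
    return collect(node)


# Hypothetical historical windows, keyed by their pivot permutations.
index = {}
insert(index, (1, 3, 2, 4, 5), "2015-01-05T10:00")
insert(index, (1, 3, 2, 5, 4), "2015-02-11T14:30")
insert(index, (3, 2, 1, 5, 4), "2015-03-02T09:15")

print(query(index, (1, 3, 2, 4, 5)))                # exact match
print(query(index, (1, 3, 2, 4, 5), prefix_len=3))  # all permutations starting with 1, 3, 2
```

An exact lookup touches one key per pivot, so its cost depends on the number of pivots rather than on the amount of stored history; shortening the prefix trades precision for more candidates.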

Further Directions
- Optimize pivot selection.
- Optimize algorithm to find more exact results.
- Consider different distance functions.
- Benchmark accuracy.
- Use the obtained nearest neighbors for research.

Yevgeniy Grechka
MA Statistics UC Berkeley
