Approaches to online quantile estimation
Joe Ross
Principal Data Scientist, Splunk
October 23, 2020
Data Con LA
The core problem
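Given a stream of numbers $x_1, x_2, \ldots$, build a data structure that can answer rank queries, i.e.,
$\mathrm{rank}(x)$ = number of stream elements $\le x$
$\mathrm{quantile}(q)$ = some number $x$ such that $\mathrm{rank}(x) = \lceil qn \rceil$, where $n$ is the current stream length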
Requirements:
Online operation: process the stream exactly once
Stream length not known in advance
Size of the data structure should have mild dependence on $n$
Update (adding an element) and query operations should be fast
...
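To make the requirements concrete, here is a minimal Python sketch of the exact baseline that stores the whole stream. The class and method names (`ExactRankStructure`, `add`, `rank`, `quantile`) are hypothetical. Its size grows linearly with $n$, which is exactly the dependence the requirements rule out.

```python
import bisect
import math


class ExactRankStructure:
    """Baseline that stores the whole stream (space grows linearly with n)."""

    def __init__(self):
        self._sorted = []  # every element seen so far, kept in sorted order

    def add(self, x):
        bisect.insort(self._sorted, x)

    def rank(self, x):
        # number of stream elements <= x
        return bisect.bisect_right(self._sorted, x)

    def quantile(self, q):
        # smallest stored element whose rank is at least ceil(q * n)
        n = len(self._sorted)
        r = max(1, math.ceil(q * n))
        return self._sorted[r - 1]
```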
ApplicationsApplications
quantiles are fundamental summary statistics, especially for non-normal
distributions
time series anomaly detection
"observability" SLA monitoring: agree to serve 99.95% of requests within 50ms, calculated per month
"high cardinality" applications
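First results
"Exact answers" and "mild dependence on $n$" are incompatible:
Munro-Paterson proved an $\Omega(n^{1/p})$ storage lower bound for a $p$-pass algorithm.
To answer queries exactly online ($p = 1$), one needs to store the whole stream.
White-box attack: given storage size $s$, a carefully designed input leaves a problem as hard as the rank problem on a smaller input whose size depends on $n$ and $s$. Idea: always replace a discarded element by an indistinguishable element, so the next pass must compute on some large set.
Formulate approximate versions: return the rank/quantile within bounded error $\varepsilon$:
$|\widehat{\mathrm{rank}}(x) - \mathrm{rank}(x)| \le \varepsilon n$, and $\mathrm{quantile}(q)$ returns some $x$ with $|\mathrm{rank}(x) - qn| \le \varepsilon n$.
Then we can also ask for tunability (required size as a function of $\varepsilon$).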
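As an illustration of the $\varepsilon$-approximate quantile condition, the hypothetical helper `is_eps_approx_quantile` below checks an estimate against a fully stored, sorted stream. It counts the estimate's rank as the number of elements $\le$ the estimate and does not treat ties specially. With $\varepsilon = 0.01$, this is the "between the 94th and 96th percentiles" reading of a 95th-percentile estimate.

```python
import bisect


def is_eps_approx_quantile(sorted_stream, q, estimate, eps):
    """True if `estimate` satisfies |rank(estimate) - q*n| <= eps*n."""
    n = len(sorted_stream)
    rank = bisect.bisect_right(sorted_stream, estimate)  # elements <= estimate
    return abs(rank - q * n) <= eps * n
```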
Comments about the $\varepsilon$-approximate conditions
Formulated in quantile space, not value space, i.e.,
the estimated 95th percentile is required to be between the actual 94th and 96th percentiles;
not: the estimated 95th percentile is required to be within 1% of the true 95th percentile
a formulation that makes sense for streams drawn from arbitrary ordered sets
invariant under monotonic transformations ("1%" guarantees are not preserved under, e.g., translation)
the problem for guarantees in value space has several simple solutions (e.g., enhancements of fixed bins)
guarantees in quantile space can be arbitrarily bad in value space, and vice versa: these are different problems
Now, imagine we could store the whole stream and then produce a compact read-only data
structure:
[Figure: ideal_samples()]
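One way to read the "ideal" read-only structure, assuming the full stream is available offline: sort it and keep every $m$-th element with $m = \lfloor \varepsilon n \rfloor$, each kept value standing for $m$ stream elements. The function names are hypothetical. Since the kept elements are those of rank $m, 2m, 3m, \ldots$, the estimate is $m \lfloor \mathrm{rank}(x)/m \rfloor$, which undercounts by less than $m \le \varepsilon n$ while storing about $1/\varepsilon$ values regardless of $n$.

```python
import bisect


def ideal_summary(stream, eps):
    """Offline summary: sort the whole stream, keep every m-th element.

    Needs the entire stream up front, which an online sketch cannot assume.
    Returns the kept values and the weight m that each one stands for.
    """
    xs = sorted(stream)
    m = max(1, int(eps * len(xs)))  # gap between kept elements
    return xs[m - 1::m], m          # elements of rank m, 2m, 3m, ...


def summary_rank(kept, m, x):
    """Estimated number of stream elements <= x (undercounts by less than m)."""
    return m * bisect.bisect_right(kept, x)
```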
Mergeability (another requirement)
Let $D_1$ and $D_2$ be data structures formed on separate streams $S_1$ and $S_2$.
Mergeability means we can define a new data structure $D_1 \oplus D_2$ with error guarantees similar to those of $D_1$ and $D_2$ (i.e., hard to distinguish from $D$ constructed on the combined stream $S_1 \cup S_2$).
Applications: distributed computing (separate machines), separate windows of time (or
other dimensions) to be re-assembled at query time
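Mergeability can be illustrated with weighted summaries: a sorted list of `(value, weight)` pairs, where each weight is the number of stream elements the value stands for. Merging is a sorted merge, and rank estimates sum weights. This is a hypothetical minimal sketch. Because the true rank over $S_1 \cup S_2$ is the sum of the ranks over each stream, additive errors of $\varepsilon n_1$ and $\varepsilon n_2$ combine into $\varepsilon(n_1 + n_2)$. Real sketches also re-compress after merging so that the size stays bounded. That step is omitted here.

```python
import heapq


def merge_summaries(a, b):
    """Merge two summaries, each a sorted list of (value, weight) pairs."""
    return list(heapq.merge(a, b))


def weighted_rank(summary, x):
    """Estimated number of elements <= x: total weight of values <= x."""
    return sum(w for v, w in summary if v <= x)
```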
First approaches, continued
Munro-Paterson also provided an algorithm that succeeds with high probability (for finding the median, say).
Maintain $s$ consecutive elements, plus counts $L$ and $H$ of the elements below and above the $s$-element set.
View the progression as a random walk of $L - H$; the algorithm will find the median if $|L - H|$ always stays below $s$.
Assuming equal probabilities, the first $n$ steps stay within roughly $\sqrt{n}$ (up to logarithmic factors) of the origin, with high probability.
(Choosing $s$ on the order of $\sqrt{n}$ enables a reduction to a smaller problem, hence a favorable asymptotic size.)
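The buffer-and-counts idea can be sketched as follows. This illustrative version works under stated assumptions and is not necessarily Munro and Paterson's exact procedure. The buffer is kept sorted. An element falling inside the buffer's range is inserted, and the buffer then evicts from whichever side has fewer discards, keeping $L - H$ balanced. The function returns `None` when the median was pushed out. `one_pass_median` is a hypothetical name.

```python
import bisect


def one_pass_median(stream, s):
    """Illustrative one-pass median finder with an s-element buffer.

    Returns the (lower) median if it is still in the buffer at the end,
    otherwise None.
    """
    buf = []        # sorted buffer of up to s consecutive (in value) elements
    low = high = 0  # counts L and H of discarded elements below / above buf
    n = 0
    for x in stream:
        n += 1
        if len(buf) < s:
            bisect.insort(buf, x)
            continue
        if x < buf[0]:
            low += 1
        elif x > buf[-1]:
            high += 1
        else:
            bisect.insort(buf, x)
            # evict from the side with fewer discards to keep L - H balanced
            if low <= high:
                buf.pop(0)
                low += 1
            else:
                buf.pop()
                high += 1
    k = (n - 1) // 2 - low  # position of the lower median inside the buffer
    return buf[k] if 0 <= k < len(buf) else None
```

On a randomly ordered stream, a buffer of a few multiples of $\sqrt{n}$ tends to succeed. On a sorted stream, every new element lands above the buffer and the method fails, which is why the analysis needs the equal-probabilities assumption.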
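Greenwald-Khanna sketch: maintains an ordered set of stream elements, together with bounds on their possible ranks.
Denote by $r_{\min}(v) \le r_{\max}(v)$ bounds on the current rank of some stream element $v$.
The sketch consists of tuples $(v_i, g_i, \Delta_i)$ with $v_1 \le v_2 \le \cdots$, where $g_i = r_{\min}(v_i) - r_{\min}(v_{i-1})$ and $\Delta_i = r_{\max}(v_i) - r_{\min}(v_i)$.
The error for rank queries is bounded by $\max_i (g_i + \Delta_i)/2$.
Insert an element $v$ by adding the tuple $(v, 1, \lfloor 2\varepsilon n \rfloor)$ (with $\Delta = 0$ for a new minimum or maximum).
Compression tries to merge tuples so that $g_i + \Delta_i \le 2\varepsilon n$ for all $i$ (the $g$'s add under merging consecutive elements).
Requires $O(\frac{1}{\varepsilon}\log(\varepsilon n))$ space.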
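Here is a compact Python sketch of the G-K idea. It simplifies the paper's algorithm in two ways. First, compression is a greedy right-to-left pass without G-K's band structure, so the $O(\frac{1}{\varepsilon}\log(\varepsilon n))$ space bound is not guaranteed. Second, a new interior tuple takes $\Delta = g_{\text{succ}} + \Delta_{\text{succ}} - 1$ from its successor instead of $\lfloor 2\varepsilon n \rfloor$. This tighter choice still bounds the new element's rank by the successor's $r_{\max}$. Queries keep the $\varepsilon n$ rank guarantee because every tuple keeps $g_i + \Delta_i \le \max(1, \lfloor 2\varepsilon n \rfloor)$. `GKSketch` and its methods are hypothetical names.

```python
import bisect
import math


class GKSketch:
    """Simplified Greenwald-Khanna summary (no band-based compression).

    Tuple i stores value v_i, g_i = rmin(v_i) - rmin(v_{i-1}) and
    delta_i = rmax(v_i) - rmin(v_i).
    """

    def __init__(self, eps):
        self.eps = eps
        self.n = 0
        self.values, self.g, self.delta = [], [], []
        self._compress_every = max(1, int(1 / (2 * eps)))

    def insert(self, x):
        self.n += 1
        i = bisect.bisect_right(self.values, x)
        if i == 0 or i == len(self.values):
            d = 0  # new minimum or maximum: its rank is known exactly
        else:
            # the new element's rank is bounded above by the successor's rmax
            d = self.g[i] + self.delta[i] - 1
        self.values.insert(i, x)
        self.g.insert(i, 1)
        self.delta.insert(i, d)
        if self.n % self._compress_every == 0:
            self._compress()

    def _compress(self):
        cap = math.floor(2 * self.eps * self.n)
        # right to left; never merge away the first tuple (the exact minimum)
        for i in range(len(self.values) - 2, 0, -1):
            if self.g[i] + self.g[i + 1] + self.delta[i + 1] <= cap:
                self.g[i + 1] += self.g[i]  # g's add under merging
                del self.values[i], self.g[i], self.delta[i]

    def quantile(self, q):
        r = max(1, math.ceil(q * self.n))
        slack = self.eps * self.n
        rmin = 0
        for v, g, d in zip(self.values, self.g, self.delta):
            rmin += g
            # both rank bounds of v must lie within eps*n of the target rank
            if r - rmin <= slack and rmin + d - r <= slack:
                return v
        return self.values[-1]
```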
Essentially optimal solution: KLL sketch (careful sampling)
A compactor holds $k$ items, each of weight $w$; it can compact them into $k/2$ items, each of weight $2w$ (sort, then keep the even- or odd-indexed elements, with equal probability).
Hierarchy of compactors of increasing size. Fix $c \in (1/2, 1)$.
$k$ elements of weight $2^{H}$
...
$kc^{H-1}$ elements of weight $2$
$kc^{H}$ elements of weight $1$
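Express the number of levels and compactions in terms of the weights and the stream size.
A single compaction at weight $w$ changes any rank estimate by $wX$, where $\mathbb{E}[X] = 0$ and $|X| \le 1$ (whether even or odd elements are selected); sum over all compactions and levels, and use Hoeffding's lemma.
Matches the G-K size, with a simplified construction and arguments. Also mergeable.
Replace the lower levels with samplers, and keep the capacity constant in the higher levels.
Optionally pass the whole device to G-K (loses mergeability); this gets $O(\frac{1}{\varepsilon}\log\log\frac{1}{\delta})$ space for failure probability $\delta$, which is optimal.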
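The compactor hierarchy can be sketched in Python as below, assuming a capacity of $\lceil k c^{d} \rceil$ (at least 2) for a level $d$ steps below the top, and $c = 2/3$. This is a simplified illustration, not the full KLL algorithm: it has no samplers at the bottom and no G-K at the top. When a buffer has odd length, one item stays behind. As a result, the total weight always equals $n$, and each compaction shifts any rank estimate by $0$, or by $\pm w$ with equal probability. `KLLSketch` is a hypothetical name.

```python
import math
import random


class KLLSketch:
    """Simplified KLL-style sketch: a hierarchy of randomized compactors.

    Level h holds items of weight 2**h; a level d steps below the top has
    capacity about k * c**d, so the top level holds about k items.
    """

    def __init__(self, k=200, c=2 / 3, seed=None):
        self.k, self.c = k, c
        self.levels = [[]]
        self.n = 0
        self._rng = random.Random(seed)

    def _capacity(self, h):
        depth = len(self.levels) - 1 - h
        return max(2, math.ceil(self.k * self.c ** depth))

    def update(self, x):
        self.levels[0].append(x)
        self.n += 1
        h = 0
        while h < len(self.levels):  # cascade compactions upward
            if len(self.levels[h]) >= self._capacity(h):
                self._compact(h)
            h += 1

    def _compact(self, h):
        if h + 1 == len(self.levels):
            self.levels.append([])  # grow a new top level
        buf = sorted(self.levels[h])
        # compact an even number of items; an odd one out stays at level h
        leftover = [buf.pop()] if len(buf) % 2 else []
        offset = self._rng.randint(0, 1)  # keep even- or odd-indexed items
        self.levels[h + 1].extend(buf[offset::2])
        self.levels[h] = leftover

    def rank(self, x):
        # estimated number of stream elements <= x
        return sum((2 ** h) * sum(1 for v in level if v <= x)
                   for h, level in enumerate(self.levels))

    def size(self):
        return sum(len(level) for level in self.levels)
```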
Relative error
For skewed distributions (e.g., latency), we care more about accuracy near the tails.
The $t$-digest prescribes the desired accuracy as a function of position in quantile space:
maintain a sorted list of centroids $(m_i, w_i)$: centroid $i$ represents $w_i$ points near $m_i$; insertion and merging mechanics
the permissible centroid size is governed by a scale function $k: [0, 1] \to \mathbb{R}$ (non-decreasing)
if a cluster occupies $[q_{\mathrm{left}}, q_{\mathrm{right}}]$ in quantile space, then the interval $[k(q_{\mathrm{left}}), k(q_{\mathrm{right}})]$ has length at most 1 (or the cluster consists of one point)
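To make the size condition concrete, here is an offline Python illustration using the arcsine scale function $k_1(q) = \frac{\delta}{2\pi}\arcsin(2q - 1)$ from the $t$-digest literature. The choice of $k_1$ and of $\delta$ are assumptions, since the slide does not fix a scale function. The code greedily groups sorted data into clusters whose $k$-size is at most 1. It is not the $t$-digest's incremental insertion/merging algorithm, and the names are hypothetical.

```python
import math


def k1(q, delta):
    """Arcsine scale function k1(q) = delta / (2*pi) * asin(2q - 1)."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def build_clusters(xs, delta, k=k1):
    """Greedily group sorted data into centroids obeying the k-size bound.

    A cluster covering quantiles [q_left, q_right] must satisfy
    k(q_right) - k(q_left) <= 1 unless it holds a single point.
    Returns a list of (mean, weight) pairs.
    """
    xs = sorted(xs)
    n = len(xs)
    clusters = []
    start = 0
    while start < n:
        end = start + 1  # a single point is always allowed
        # extend while the cluster [start/n, (end+1)/n] stays within 1 k-unit
        while end < n and k((end + 1) / n, delta) - k(start / n, delta) <= 1:
            end += 1
        chunk = xs[start:end]
        clusters.append((sum(chunk) / len(chunk), len(chunk)))
        start = end
    return clusters
```

With $\delta = 100$ and $n = 10^4$, clusters near $q = 0$ and $q = 1$ hold only a handful of points, while clusters near the median hold a few hundred. This is the variable accuracy the scale function is meant to produce.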
Examples
[Figure: scale_functions()]
A non-linear scale function makes the accuracy vary across quantile space; the error at $q$ is proportional to (something like) the cluster width there, roughly $1/k'(q)$
Latency distributions have significant positive skew
Desire asymmetric accuracy: higher accuracy towards $q = 1$, lower towards $q = 0$
Whether clusters still satisfy the size condition after insertion/merging is a property of the scale function.
Characterization
The scale function is decent ( accepts insertions for all ) if and only if
for all and all , we have:
for (moves to right) and
for (moves to left).
( is proportion occupied by inserted cluster)
A decent scale function must be continuous, in fact differentiable.
[Figure: discont()]
Differentiability suggests a tangent-line construction to produce an asymmetric $t$-digest.
[Figure: glued_scale_functions()]
To verify decency, use one-variable characterization of decency:
and are non-increasing on
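One way to realize the tangent-line construction is sketched below under assumptions. Start from $k_1$ and pick a glue point $p$ (here $1/2$). Use the tangent line to $k_1$ at $p$ on $[0, p)$, and $k_1$ itself on $[p, 1]$. The result is continuous and differentiable at $p$. It keeps $k_1$'s steep slope (small clusters, high accuracy) towards $q = 1$, and has constant slope, hence roughly constant cluster size, towards $q = 0$. This is an illustration, not necessarily the exact construction in the referenced asymmetric-scale-function work, and it does not check decency.

```python
import math


def k1(q, delta=100):
    """Arcsine scale function k1(q) = delta / (2*pi) * asin(2q - 1)."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k1_prime(q, delta=100):
    """Derivative of k1: delta / (pi * sqrt(1 - (2q - 1)^2)), for 0 < q < 1."""
    return delta / (math.pi * math.sqrt(1 - (2 * q - 1) ** 2))


def glued(q, p=0.5, delta=100):
    """Tangent line to k1 at p on [0, p); k1 itself on [p, 1]."""
    if q < p:
        return k1(p, delta) + k1_prime(p, delta) * (q - p)
    return k1(q, delta)
```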
[Figure: errors and centroid counts for the usual (first row) and glued (second row) variants of the scale functions]
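Relative error KLL
Instead of $|\widehat{\mathrm{rank}}(x) - \mathrm{rank}(x)| \le \varepsilon n$, require $|\widehat{\mathrm{rank}}(x) - \mathrm{rank}(x)| \le \varepsilon \, \mathrm{rank}(x)$.
Corresponds to a logarithmic scale function ($k(q) \propto \log q$).
Motivation: near $q = 0$, an error of $\varepsilon n$ is not so helpful.
Uses a hierarchy of relative-compactors: only compact in the larger half, and "how close" compaction gets to the median is controlled by an exponential distribution. (Vary sampling across the distribution.)
Worse space than usual KLL (provably needed): $O(\varepsilon^{-1}\log^{1.5}(\varepsilon n))$ for constant failure probability.
In a given level:
[Figure: relative_compactor()]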
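A single relative-compactor step can be sketched as follows, loosely following the description above. The smaller half of the sorted buffer is protected. The number of sections compacted from the top is drawn from a geometric distribution, a discrete stand-in for the exponential distribution on the slide. This is a hypothetical illustration, not the exact procedure from the paper. `relative_compact` and `section_size` are assumed names.

```python
import random


def relative_compact(buffer, section_size, rng):
    """One compaction step of a relative-compactor (illustrative sketch).

    The smaller half of the sorted buffer is never compacted. From the
    larger half, j sections are compacted, with j geometric (at least 1),
    so compactions usually touch only the largest items and only rarely
    reach towards the middle of the buffer.
    Returns (items kept at this level, items promoted with doubled weight).
    """
    buf = sorted(buffer)
    half = len(buf) // 2
    num_sections = (len(buf) - half) // section_size
    if num_sections < 1:
        raise ValueError("buffer too small for one section")
    j = 1
    while j < num_sections and rng.random() < 0.5:
        j += 1
    m = j * section_size
    m -= m % 2                  # compact an even number of items
    start = len(buf) - m        # start >= half, so the smaller half is kept
    region = buf[start:]
    offset = rng.randint(0, 1)  # keep even- or odd-indexed items
    return buf[:start], region[offset::2]


if __name__ == "__main__":
    kept, promoted = relative_compact(list(range(64)), 8, random.Random(0))
    print(len(kept), len(promoted))
```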
Moment-based quantile sketch
Motivation: why not just keep $(\mu, \sigma)$ and use the $z$-score to answer rank/quantile queries?
Idea: the sketch consists of several (log-)moments. Trivial to merge!
To extract quantiles: among all distributions realizing the empirical moments, pick one via the principle of maximum entropy and use its quantiles.
The solution comes from an exponential family; efficient numerical methods solve the optimization problem.
In the case of two moments, this amounts to assuming a normal distribution.
Aimed at the high-cardinality scenario in which answering a quantile query may require merging millions of subsketches; for the sketches mentioned earlier, this amounts to merging millions of sorted lists. Adding moments (even over millions of records) can be made very fast because it is vectorizable.
Example: understand application performance across {user device type, geography, software version, time}.
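A minimal Python sketch of the moment-sketch state and its merge follows, assuming positive inputs (needed for the log-moments). For quantile extraction, only the two-moment special case is implemented, via a normal distribution (`statistics.NormalDist`). The maximum-entropy solver for more moments is omitted. `MomentSketch` and its methods are hypothetical names. Merging is elementwise addition of the stored sums, which is what makes it cheap to vectorize.

```python
import math
from statistics import NormalDist


class MomentSketch:
    """Count, min, max and power sums of x and log(x); assumes x > 0."""

    def __init__(self, k=4):
        self.k = k
        self.count = 0
        self.min, self.max = math.inf, -math.inf
        self.power_sums = [0.0] * (k + 1)  # sum of x**i, i = 0..k
        self.log_sums = [0.0] * (k + 1)    # sum of log(x)**i, i = 0..k

    def add(self, x):
        self.count += 1
        self.min, self.max = min(self.min, x), max(self.max, x)
        lx = math.log(x)
        for i in range(self.k + 1):
            self.power_sums[i] += x ** i
            self.log_sums[i] += lx ** i

    def merge(self, other):
        """Merging is elementwise addition (plus min/max)."""
        out = MomentSketch(self.k)
        out.count = self.count + other.count
        out.min, out.max = min(self.min, other.min), max(self.max, other.max)
        out.power_sums = [a + b for a, b in zip(self.power_sums, other.power_sums)]
        out.log_sums = [a + b for a, b in zip(self.log_sums, other.log_sums)]
        return out

    def quantile_two_moments(self, q):
        """Two-moment special case: treat the data as normally distributed."""
        mean = self.power_sums[1] / self.count
        var = max(0.0, self.power_sums[2] / self.count - mean ** 2)
        return NormalDist(mean, math.sqrt(var)).inv_cdf(q)
```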
References
J. Ian Munro and Mike S. Paterson. "Selection and sorting with limited storage." Theoretical Computer Science, 12(3):315–323, 1980.
Michael Greenwald and Sanjeev Khanna. "Space-efficient online computation of quantile summaries." ACM SIGMOD Record, 30(2):58–66, 2001.
Zohar Karnin, Kevin Lang, and Edo Liberty. "Optimal quantile approximation in streams." In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 71–78. IEEE, 2016.
Ted Dunning and Otmar Ertl. "Computing extremely accurate quantiles using t-digests." arXiv:1902.04023, 2019.
Ted Dunning. "Conservation of the t-digest scale invariant." arXiv:1903.09919, 2019.
Joe Ross. "Asymmetric scale functions for t-digests." Submitted, 2019; branch of Dunning's t-digest repo.
Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, and Pavel Veselý. "Relative error streaming quantiles." arXiv preprint arXiv:2004.01668, 2020.
Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, and Peter Bailis. "Moment-based quantile sketches for efficient high cardinality aggregation queries." Proceedings of the VLDB Endowment, 11(11):1647–1660, 2018.
https://github.com/signalfx/t-digest/tree/asymmetric/docs/asymmetric