December 12-14 2023
The Building Blocks of a
Time-Series database
Javier Ramirez
Database advocate
@supercoco9
Timestamp problems are hard
https://twitter.com/rshidashrf/status/975566803052134400
https://stackoverflow.com/questions/6841333/why-is-subtracting-these-two-epoch-milli-times-in-year-1927-giving-a-strange-r
1553983200
* This is a date. Which one?
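As a quick illustration (not part of the original talk), the same integer maps to very different instants depending on the assumed unit and time zone. A minimal Python sketch using only the standard library; the fixed UTC+01:00 offset is an arbitrary example zone.

```python
from datetime import datetime, timedelta, timezone

raw = 1553983200

# Interpreted as seconds since the Unix epoch, in UTC
as_seconds = datetime.fromtimestamp(raw, tz=timezone.utc)

# Interpreted as milliseconds since the Unix epoch, in UTC
as_millis = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)

# Same instant as `as_seconds`, shown at a fixed UTC+01:00 offset
as_utc_plus_1 = as_seconds.astimezone(timezone(timedelta(hours=1)))

print(as_seconds.isoformat())     # 2019-03-30T22:00:00+00:00
print(as_millis.isoformat())      # a date in January 1970
print(as_utc_plus_1.isoformat())  # 2019-03-30T23:00:00+01:00
```

Three readings, three different answers: the unit and the zone have to be known, not guessed.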
When is this?
1/4/19
* Is this April 19th? January 4th? April 1st?
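The ambiguity is easy to reproduce: parsing the same string with two common format patterns yields two different dates. A small Python sketch; the format strings are our own example, not something from the talk. Storing timestamps in an unambiguous form (for example ISO 8601, or epoch values with a documented unit) avoids the guesswork.

```python
from datetime import datetime

text = "1/4/19"

# Day-first reading (common in Europe): 1 April 2019
day_first = datetime.strptime(text, "%d/%m/%y").date()

# Month-first reading (common in the US): 4 January 2019
month_first = datetime.strptime(text, "%m/%d/%y").date()

print(day_first, month_first)  # 2019-04-01 2019-01-04
```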
Working with timestamped data in a
database is tricky*
* especially when running analytics over data changing over time or at a high rate
If you can use only one
database for everything, go
with PostgreSQL*
* Or any other major and well supported RDBMS
Some things an RDBMS is not designed for
● Writing data faster than it is read (several million inserts per day, or more)
● Aggregations scoped to different time units (per year/minute/microsecond)
● Identifying gaps or missing data for a given interval
● Joining tables by approximate timestamp
● Sparse data (tables with hundreds or thousands of columns)
● Aggregates over billions of records
QuestDB: The building blocks of a fast open-source time-series database
● a factory floor with 500 machines, or
● a fleet with 500 vehicles, or
● 50 trains, with 10 cars each, or
● 500 users with a mobile phone
Sending data every second
How I made my first billion
86,400
* Seconds in one day
604,800
* Seconds in one week
2,628,288
* Seconds in one month. Well, in the average month of 30.42 days anyway
43,200,000 rows a day…….
302,400,000 rows a week….
1,314,144,000 rows a month
How I made my first billion
* See? With streaming data, it is quite easy to reach your first billion data points
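The row counts above follow from simple multiplication: 500 sources, one row per source per second. A Python check of the arithmetic, using the same 30.42-day average month as the slides (written as an integer ratio to avoid floating-point rounding):

```python
DEVICES = 500  # machines, vehicles, train cars, or phones from the list above

seconds_per_day = 24 * 60 * 60                     # 86,400
seconds_per_week = 7 * seconds_per_day              # 604,800
seconds_per_month = 3042 * seconds_per_day // 100   # 30.42-day month: 2,628,288

rows_per_day = DEVICES * seconds_per_day            # one row per device per second
rows_per_week = DEVICES * seconds_per_week
rows_per_month = DEVICES * seconds_per_month

print(f"{rows_per_day:,} {rows_per_week:,} {rows_per_month:,}")
# 43,200,000 302,400,000 1,314,144,000
```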
Not all data
problems are
the same
Time-series database basics
● Optimised for fast append-only ingestion
● Data lifecycle policies
● Analytics over chunks of time
● Time-based aggregations
● Often power real-time dashboards
QuestDB would like to be known for:
● Performance
○ Also on smaller machines
● Developer Experience
● Proudly Open Source (Apache 2.0)
Fast streaming
ingestion
* You can try ingesting streaming data using https://github.com/javier/questdb-quickstart
QuestDB ingestion and storage layer
● Data is always stored in ascending timestamp order.
● Data partitioned by time units and stored in columnar format.
● No indexes needed. Data is immediately available after writing.
● Predictable ingestion rate, even under demanding workloads (millions/second).
● Built-in event deduplication.
● Optimized data types (Symbol, geohash, ipv4, uuid).
● Row updates and upserts supported.
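To make the storage bullets concrete, here is a deliberately simplified Python model: each row goes to a per-day partition, each column is its own array, and a tuple of "upsert key" columns decides when an incoming row replaces an existing one instead of being appended. `ToyTable` and its parameters are invented for illustration; this is not QuestDB's on-disk format, and real QuestDB also handles out-of-order writes, persistence, and much more.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ToyTable:
    """Toy model of a time-partitioned, columnar table with upsert-key deduplication."""

    def __init__(self, columns, upsert_keys):
        self.columns = columns          # e.g. ["ts", "symbol", "price"]; "ts" is the timestamp
        self.upsert_keys = upsert_keys  # e.g. ("ts", "symbol")
        # partition name -> {column name -> list of values}: one array per column
        self.partitions = defaultdict(lambda: {c: [] for c in self.columns})
        # (partition name, upsert key values) -> row position inside that partition
        self.positions = {}

    def insert(self, row):
        part_name = row["ts"].strftime("%Y-%m-%d")  # one partition per day
        part = self.partitions[part_name]
        key = (part_name, *(row[k] for k in self.upsert_keys))
        if key in self.positions:
            # Duplicate event: overwrite the existing row instead of appending
            pos = self.positions[key]
            for c in self.columns:
                part[c][pos] = row[c]
            return
        # Simplification: always append, i.e. assume rows arrive in timestamp order
        self.positions[key] = len(part["ts"])
        for c in self.columns:
            part[c].append(row[c])


table = ToyTable(["ts", "symbol", "price"], ("ts", "symbol"))
t0 = datetime(2023, 12, 12, 10, 0, tzinfo=timezone.utc)
table.insert({"ts": t0, "symbol": "BTC-USD", "price": 100.0})
table.insert({"ts": t0, "symbol": "BTC-USD", "price": 101.0})  # same keys: upserted
table.insert({"ts": datetime(2023, 12, 13, 9, 0, tzinfo=timezone.utc),
              "symbol": "ETH-USD", "price": 50.0})
print(sorted(table.partitions))                 # ['2023-12-12', '2023-12-13']
print(table.partitions["2023-12-12"]["price"])  # [101.0]
```

Keeping one array per column means a query touching two columns reads only those two arrays, and time-based partitions let a time filter skip whole days.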
Lifecycle policies
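-- Drop specific daily partitions
ALTER TABLE my_table DROP PARTITION LIST '2021-01-01', '2021-01-02';
-- Delete days before 2021-01-03
ALTER TABLE my_table DROP PARTITION WHERE timestamp < to_timestamp('2021-01-03', 'yyyy-MM-dd');
-- Detach partitions: their data becomes unavailable for reads and can be moved elsewhere
ALTER TABLE x DETACH PARTITION LIST '2019-02-01', '2019-02-02';
-- It is also possible to use a WHERE clause to define the partition list
-- (assuming the designated timestamp column is named timestamp)
ALTER TABLE sensors DETACH PARTITION WHERE timestamp < '2019-02-03T00';
-- Create a table on a secondary storage volume (an alias configured on the server)
CREATE TABLE my_table (i symbol, ts timestamp) IN VOLUME SECONDARY_VOLUME;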
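A retention policy usually boils down to "drop every partition older than a cutoff". Assuming daily partitions named `YYYY-MM-DD`, the Python sketch below computes that list and renders the same `DROP PARTITION LIST` statement as the first example. `partitions_before` and `drop_statement` are hypothetical helpers; in practice the `DROP PARTITION WHERE` form lets the database do the comparison itself.

```python
from datetime import date


def partitions_before(partitions, cutoff):
    """Daily partition names (YYYY-MM-DD) strictly older than `cutoff`, oldest first."""
    # date.fromisoformat also rejects anything that is not a valid date string
    return sorted(p for p in partitions if date.fromisoformat(p) < cutoff)


def drop_statement(table, partitions):
    """Render an ALTER TABLE ... DROP PARTITION LIST statement for the given names."""
    names = ", ".join(f"'{p}'" for p in partitions)
    return f"ALTER TABLE {table} DROP PARTITION LIST {names};"


existing = ["2021-01-03", "2021-01-01", "2021-01-02"]
old = partitions_before(existing, date(2021, 1, 3))
print(drop_statement("my_table", old))
# ALTER TABLE my_table DROP PARTITION LIST '2021-01-01', '2021-01-02';
```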
Connectivity, protocols, and interfaces
● REST API and web console: Query execution, CSV imports/exports. Basic charts.
● PGWire: well suited for querying, DDL, and DML. Ingestion is supported, at moderate
throughput. Compatible with standard PostgreSQL drivers and client libraries.
● InfluxDB Line Protocol (ILP): socket-based, ingestion only, very high throughput. Official clients
available for C/C++, Java, Python, Rust, Go, Node.js, and .NET.
● Health/Metrics: HTTP endpoint exposing metrics in Prometheus format
● Integrations with: Apache Kafka, Apache Flink, Apache Spark, Python Pandas, Grafana,
Superset, Telegraf, Redpanda, qStudio, SQLAlchemy, Cube…
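ILP is a plain-text format: one line per row, made of the table name, comma-separated tags, a space, comma-separated fields, a space, and a timestamp (nanoseconds by default). The Python sketch below formats a single line to show the shape of the protocol; `ilp_line` is a hypothetical helper and covers only the basic escaping rules. In practice, the official clients listed above handle formatting, buffering, and error handling for you.

```python
def escape_name(value):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def escape_table(name):
    """Escape commas and spaces in the table (measurement) name."""
    return name.replace(",", "\\,").replace(" ", "\\ ")


def ilp_line(table, tags, fields, ts_nanos):
    """Format one ILP row: table,tag=value field=value timestamp."""
    tag_part = "".join(f",{escape_name(k)}={escape_name(v)}" for k, v in tags.items())
    rendered = []
    for k, v in fields.items():
        if isinstance(v, bool):           # check bool first: bool is a subclass of int
            text = "t" if v else "f"
        elif isinstance(v, int):
            text = f"{v}i"                # integers carry an `i` suffix
        elif isinstance(v, float):
            text = repr(v)                # floats are written as-is
        else:                             # strings are quoted, with " and \ escaped
            text = '"' + str(v).replace("\\", "\\\\").replace('"', '\\"') + '"'
        rendered.append(f"{escape_name(k)}={text}")
    return f"{escape_table(table)}{tag_part} {','.join(rendered)} {ts_nanos}\n"


print(ilp_line("trades", {"symbol": "BTC-USD", "side": "buy"},
               {"price": 100.5, "amount": 2}, 1702339200000000000), end="")
# trades,symbol=BTC-USD,side=buy price=100.5,amount=2i 1702339200000000000
```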
The query engine
QuestDB Query engine internals
● Our Java codebase has zero dependencies. No garbage collection on
the hot path. As close to the hardware as possible.
● We keep up with the latest research. Our code takes advantage of the
state-of-the-art in CPU, storage design, and data structures.
● We implemented our own Just-in-Time (JIT) compiler to make query execution
as parallel and fast as possible.
● We spend weeks of development to shave microseconds, or even
nanoseconds, off many operations.
The query language: SQL with
time-series extensions
LATEST ON … PARTITION BY …
Retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple
time series are stored in the same table.
SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol;
Try it live on
https://demo.questdb.io
SELECT * FROM trades
WHERE symbol in ('BTC-USD', 'ETH-USD')
LATEST ON timestamp PARTITION BY symbol, side;
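Semantically, `LATEST ON … PARTITION BY …` keeps one row per key combination: the one with the newest timestamp. The naive Python model below scans every row, so it shows the result rather than how QuestDB computes it efficiently. `latest_on` is a hypothetical helper, and the tie-breaking rule is a choice made for this sketch.

```python
def latest_on(rows, ts_col, keys):
    """One row per distinct combination of `keys`: the one with the greatest timestamp."""
    latest = {}
    for row in rows:
        k = tuple(row[c] for c in keys)
        # >= so that, on equal timestamps, the row seen later wins
        if k not in latest or row[ts_col] >= latest[k][ts_col]:
            latest[k] = row
    return sorted(latest.values(), key=lambda r: r[ts_col])  # ordered by time for readability


trades = [
    {"timestamp": 1, "symbol": "BTC-USD", "side": "buy"},
    {"timestamp": 2, "symbol": "ETH-USD", "side": "sell"},
    {"timestamp": 3, "symbol": "BTC-USD", "side": "sell"},
    {"timestamp": 4, "symbol": "BTC-USD", "side": "buy"},
]
by_symbol = latest_on(trades, "timestamp", ["symbol"])
by_symbol_side = latest_on(
    [r for r in trades if r["symbol"] in ("BTC-USD", "ETH-USD")],  # the WHERE filter
    "timestamp", ["symbol", "side"],
)
print([r["timestamp"] for r in by_symbol])       # [2, 4]
print([r["timestamp"] for r in by_symbol_side])  # [2, 3, 4]
```

Partitioning by a combination of keys simply widens the grouping tuple, which is why the second query returns one row per (symbol, side) pair.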
SAMPLE BY
Aggregates data into regular time chunks (buckets)
SELECT
timestamp,
sum(price * amount) / sum(amount) AS vwap_price,
sum(amount) AS volume
FROM trades
WHERE symbol = 'BTC-USD' AND timestamp > dateadd('d', -1, now())
SAMPLE BY 15m ALIGN TO CALENDAR;
SELECT timestamp, min(tempF),
max(tempF), avg(tempF)
FROM weather SAMPLE BY 1M;
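The first query above computes a volume-weighted average price (VWAP) per 15-minute bucket: sum(price × amount) / sum(amount). A Python sketch of the same bucketing, assuming UTC timestamps and a fixed-size bucket; aligning to multiples of the bucket since the Unix epoch matches calendar boundaries for units like 15m, but not for months (1M), which vary in length. `floor_to_bucket` and `vwap_by_bucket` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_bucket(ts, bucket):
    """Align `ts` down to a multiple of `bucket`, counted from the Unix epoch."""
    return EPOCH + ((ts - EPOCH) // bucket) * bucket


def vwap_by_bucket(trades, bucket):
    """{bucket start: (vwap, volume)} for (timestamp, price, amount) tuples."""
    notional = defaultdict(float)
    volume = defaultdict(float)
    for ts, price, amount in trades:
        b = floor_to_bucket(ts, bucket)
        notional[b] += price * amount
        volume[b] += amount
    return {b: (notional[b] / volume[b], volume[b]) for b in sorted(volume)}


def at(minute, second=0):
    return datetime(2023, 12, 12, 10, minute, second, tzinfo=timezone.utc)


trades = [(at(1), 100.0, 1.0), (at(14, 59), 110.0, 3.0), (at(15), 120.0, 2.0)]
result = vwap_by_bucket(trades, timedelta(minutes=15))
for start, (vwap, vol) in result.items():
    print(start.isoformat(), vwap, vol)
# 2023-12-12T10:00:00+00:00 107.5 4.0
# 2023-12-12T10:15:00+00:00 120.0 2.0
```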
How do you ask your database to
return which data is not stored?
I am sending data every second or
so. Tell me which devices went more
than 1.5 seconds without sending
any data
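Outside the database, answering that question means grouping readings by device, sorting each group by time, and flagging consecutive readings that are further apart than the threshold. A Python sketch with timestamps as float seconds; `devices_with_gaps` is a hypothetical helper, not a QuestDB feature.

```python
from collections import defaultdict


def devices_with_gaps(readings, max_gap):
    """{device: [(prev_ts, ts), ...]} for consecutive readings more than `max_gap` apart."""
    by_device = defaultdict(list)
    for device, ts in readings:
        by_device[device].append(ts)
    gaps = {}
    for device, stamps in by_device.items():
        stamps.sort()
        found = [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
        if found:
            gaps[device] = found
    return gaps


readings = [("m1", 0.0), ("m1", 1.0), ("m1", 2.0), ("m2", 0.0), ("m2", 1.2), ("m2", 3.0)]
print(devices_with_gaps(readings, 1.5))  # {'m2': [(1.2, 3.0)]}
```

This only sees gaps between readings that exist; a device that stopped sending altogether would also need its last timestamp compared against the current time.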
SAMPLE BY … FILL
Fills missing time chunks using one of several strategies: NULL, a constant, LINEAR interpolation, or the PREVious value
SELECT
timestamp,
sum(price * amount) / sum(amount) AS vwap_price,
sum(amount) AS volume
FROM trades
WHERE symbol = 'BTC-USD' AND timestamp > dateadd('d', -1, now())
SAMPLE BY 1s FILL(NULL) ALIGN TO CALENDAR;
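The four FILL strategies can be modelled over a list of bucket starts and the subset of buckets that actually received data. A Python sketch whose strategy names mirror NULL, constant, PREV and LINEAR from the slide; `fill` is a hypothetical helper, and it leaves edge buckets with no neighbour on one side as `None` for LINEAR, which may differ from QuestDB's edge handling.

```python
def fill(buckets, values, strategy, constant=None):
    """Return one value per bucket, filling gaps according to `strategy`.

    buckets:  every expected bucket start, as ordered numbers (e.g. epoch seconds)
    values:   {bucket: aggregated value} for the buckets that actually had data
    strategy: "null", "const", "prev" or "linear"
    """
    known = [b for b in buckets if b in values]
    out = []
    for b in buckets:
        if b in values:
            out.append(values[b])
            continue
        before = [k for k in known if k < b]
        after = [k for k in known if k > b]
        if strategy == "null":
            out.append(None)
        elif strategy == "const":
            out.append(constant)
        elif strategy == "prev":
            out.append(values[before[-1]] if before else None)
        elif strategy == "linear":
            if before and after:
                x0, x1 = before[-1], after[0]
                y0, y1 = values[x0], values[x1]
                out.append(y0 + (y1 - y0) * (b - x0) / (x1 - x0))
            else:
                out.append(None)  # nothing to interpolate from at the edges
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return out


buckets = [0, 1, 2, 3, 4]    # five 1-second buckets
values = {0: 10.0, 3: 40.0}  # only two of them received data
for s in ("null", "const", "prev", "linear"):
    print(s, fill(buckets, values, s, constant=0.0))
# null [10.0, None, None, 40.0, None]
# const [10.0, 0.0, 0.0, 40.0, 0.0]
# prev [10.0, 10.0, 10.0, 40.0, 40.0]
# linear [10.0, 20.0, 30.0, 40.0, None]
```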
WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';
SELECT * from trips WHERE pickup_datetime in '2018;2M' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;10s' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;-3d' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59:58;4s;1d;7';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59:58;4s;-1d;7';
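The interval strings can be read as time ranges. The Python sketch below models two of the forms above: a partial timestamp such as '2018-06' matching the whole month, and our simplified reading of '2018-06-21T23:59:58;4s;1d;7' as a 4-second window starting at 23:59:58, repeated every day 7 times (with -1d stepping backwards instead). Both helpers are hypothetical, we model ranges as half-open [start, end) while QuestDB's exact boundary handling may differ, and the ';2M', ';10s' and ';-3d' modifier forms are not modelled.

```python
from datetime import datetime, timedelta

# Partial-timestamp formats, coarsest to finest, and the span each one covers
_FORMATS = [
    ("%Y", "year"),
    ("%Y-%m", "month"),
    ("%Y-%m-%d", timedelta(days=1)),
    ("%Y-%m-%dT%H", timedelta(hours=1)),
    ("%Y-%m-%dT%H:%M", timedelta(minutes=1)),
    ("%Y-%m-%dT%H:%M:%S", timedelta(seconds=1)),
]


def prefix_interval(prefix):
    """Half-open [start, end) covered by a partial timestamp such as '2018' or '2018-06'."""
    for fmt, span in _FORMATS:
        try:
            start = datetime.strptime(prefix, fmt)
        except ValueError:
            continue  # the string does not have exactly this precision
        if span == "year":
            end = start.replace(year=start.year + 1)
        elif span == "month":
            # Day 28 plus 4 days always lands in the next month
            end = (start.replace(day=28) + timedelta(days=4)).replace(day=1)
        else:
            end = start + span
        return start, end
    raise ValueError(f"unsupported timestamp prefix: {prefix!r}")


def repeating_intervals(start, length, period, count):
    """Simplified reading of 'start;length;period;count': `count` windows of `length`."""
    return [(start + i * period, start + i * period + length) for i in range(count)]


print(prefix_interval("2018-06"))  # June 1 00:00 up to (not including) July 1 00:00
windows = repeating_intervals(datetime(2018, 6, 21, 23, 59, 58),
                              timedelta(seconds=4), timedelta(days=1), 7)
print(windows[0], windows[-1])
```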
What if I have two tables, where
data is (obviously) not sent at the
same exact timestamps and I want
to join by closest matching
timestamp?
ASOF JOIN (LT JOIN and SPLICE JOIN variations)
ASOF JOIN joins two time series whose measurements are taken at different timestamps. For each row in the first time series, ASOF JOIN takes
the row from the second time series whose timestamp meets both of the following criteria:
● The timestamp is the closest to the first timestamp.
● The timestamp is prior or equal to the first timestamp (LT JOIN requires it to be strictly prior).
WITH trips2016 AS (
SELECT * from trips WHERE pickup_datetime in '2016'
)
SELECT pickup_datetime, timestamp, fare_amount, tempF, windDir
FROM trips2016
ASOF JOIN weather;
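With both inputs sorted by time, the matching rule above is a binary search: for each left row, find the last right row whose timestamp is at or before it. The Python sketch below uses `bisect` for that, and a `strict` flag to mimic the LT JOIN variant (strictly before). `asof_join` is a hypothetical helper that matches on timestamp only, like the query above; it is not QuestDB's implementation.

```python
from bisect import bisect_left, bisect_right


def asof_join(left, right, strict=False):
    """Attach to each left row the payload of the closest right row at or before it.

    left, right: lists of (timestamp, payload), each sorted by timestamp.
    strict=True only accepts right rows strictly before the left timestamp (LT JOIN-like).
    Left rows with no earlier right row get None.
    """
    right_ts = [ts for ts, _ in right]
    find = bisect_left if strict else bisect_right
    joined = []
    for ts, payload in left:
        i = find(right_ts, ts)  # count of right rows with timestamp < ts (strict) or <= ts
        joined.append((ts, payload, right[i - 1][1] if i > 0 else None))
    return joined


trips = [(10, "trip A"), (20, "trip B"), (30, "trip C")]
weather = [(5, "cloudy"), (20, "rain"), (25, "sunny")]
print(asof_join(trips, weather))
# [(10, 'trip A', 'cloudy'), (20, 'trip B', 'rain'), (30, 'trip C', 'sunny')]
print(asof_join(trips, weather, strict=True))
# [(10, 'trip A', 'cloudy'), (20, 'trip B', 'cloudy'), (30, 'trip C', 'sunny')]
```

The only difference between the two variants is which side of an exact timestamp match the search lands on.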
Some things we are trying out next for performance
● Compression, and exploring data formats like Apache Arrow / Parquet
● Our own ingestion protocol
● Second-level partitioning
● Improved vectorization of some operations (group by multiple columns or by expressions)
● Specific join optimizations (index nested loop joins, for example)
QuestDB OSS
Open Source. Self-managed. Suitable for
production workloads.
https://github.com/questdb/questdb
QuestDB Enterprise
Licensed. Self-managed. Enterprise features like
RBAC, compression, replication, TLS on all
protocols, cold storage, K8s operator…
https://questdb.io/enterprise/
QuestDB Cloud
Fully managed, pay-per-use environment,
with enterprise-grade features.
https://questdb.io/cloud/
OSA CON | December 12-14 2023
Q&A
● github.com/questdb/questdb
● https://questdb.io
● https://demo.questdb.io
● https://github.com/javier/questdb-quickstart
● https://slack.questdb.io/
Javier Ramirez
@supercoco9
We 💕 contributions
and GitHub ⭐ stars
