Time Series Database
Time series data are simply measurements or events that are tracked, monitored, downsampled,
and aggregated over time.
This could be server metrics, application performance monitoring, network data, sensor data,
events, clicks, trades in a market, and many other types of analytics data.
A time series database is built specifically for handling metrics and events or measurements that
are time-stamped.
Properties that make time series data very different from other data workloads are data lifecycle
management, summarization, and large range scans of many records.
Financial data is no longer the only application of time series data; it is now only
one among numerous applications across various industries.
The fundamental conditions of computing have changed dramatically over the last decade.
Everything has become compartmentalized.
Today, everything that can be a component is a component. In addition, we are witnessing the
instrumentation of every available surface in the material world — streets, cars, factories, power
grids, ice caps, satellites, clothing, phones, microwaves, milk containers, planets, human bodies.
Everything has, or will have, a sensor.
So now, everything inside and outside the company is emitting a relentless stream of metrics
and events or time series data.
Time series databases have key architectural design properties that make them very different
from other databases. These include time-stamped data storage and compression, data lifecycle
management, data summarization, the ability to handle large, time-dependent scans of many
records, and time-series-aware queries.
For example: With a time series database, it is common to request a summary of data over a
large time period.
This requires going over a range of data points to perform some computation, such as the
percentage increase of a metric this month over the same period in the last six months,
summarized by month.
This kind of workload is very difficult to optimize for with a distributed key value store.
TSDBs are optimized for exactly this use case, giving millisecond-level query times over months
of data.
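To make the shape of this workload concrete, here is a minimal sketch in plain Python. It is not how any particular time series database implements the query. The function names, the sample data, and the choice to compare the latest month against the average of all earlier months are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def monthly_means(points):
    """Group (timestamp, value) pairs by calendar month and average each group."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[(ts.year, ts.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


def percent_change_vs_baseline(means):
    """Compare the latest month with the mean of all earlier months, in percent."""
    months = list(means)
    latest = means[months[-1]]
    earlier = [means[m] for m in months[:-1]]
    baseline = sum(earlier) / len(earlier)
    return (latest - baseline) / baseline * 100.0


# Hypothetical data: one reading per day from Jan 1 to Jul 31, 2023 (212 days).
# The metric sits at 100.0 for six months and rises to 150.0 in July.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
points = []
for day in range(212):
    ts = start + timedelta(days=day)
    points.append((ts, 150.0 if ts.month == 7 else 100.0))

means = monthly_means(points)               # one summary value per month
change = percent_change_vs_baseline(means)  # July vs. the January-June average
print(means)
print(f"{change:.1f}% change")
```

Even in this toy form, the query has to touch every point in the range before it can return seven summary values, which is why storage layouts built around time-ordered scans matter for this kind of request.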
Another example: With time series databases, it’s common to keep high precision data around
for a short period of time. This data is aggregated and downsampled into longer term trend data.
This means that every data point that goes into the database will have to be deleted after its
period of time is up. This kind of data lifecycle management is difficult for application
developers to implement on top of regular databases. They must devise schemes for cheaply
evicting large sets of data and constantly summarizing that data at scale. With a time series
database, this functionality is provided out of the box.
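As a rough illustration of what application developers would otherwise have to build themselves, the following Python sketch downsamples raw points into hourly averages and evicts points that have aged out of a retention window. The function names, the bucket width, and the retention period are hypothetical, and a real database would do this incrementally and on disk rather than over in-memory lists.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (epoch_seconds, value) pairs into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def evict_expired(points, now, retention_seconds):
    """Keep only the points that are still inside the retention window."""
    cutoff = now - retention_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


# Hypothetical workload: one reading every 10 seconds for two hours.
raw = [(t, float(t // 3600)) for t in range(0, 7200, 10)]
now = 7200

hourly = downsample(raw, bucket_seconds=3600)            # long-term trend data
recent = evict_expired(raw, now, retention_seconds=600)  # high-precision window

print(len(raw), "raw points ->", len(hourly), "hourly points")
print(len(recent), "raw points kept after eviction")
```

The two steps are independent on purpose: the trend data is computed before eviction, so the summary survives after the high-precision points are dropped.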
1. InfluxDB
2. Kdb+
3. Prometheus
4. Graphite
5. TimescaleDB
6. DolphinDB
7. RRDTool
8. OpenTSDB
9. Apache Druid
10. TDengine
11. GridDB
12. QuestDB
13. Fauna
14. Amazon Timestream
15. VictoriaMetrics
To see trends over time, the following graphic shows the top 10 time series databases and their
historical changes:
Time series – the fastest growing database category
DB-Engines also ranks time series database management systems (Time Series DBMS)
according to their popularity. Time series databases are the fastest growing segment of the
database industry over the past year.
InfluxDB is part of a comprehensive platform that supports the collection, storage, monitoring,
visualization and alerting of time series data.
Other time series solutions don’t support multiple fields, which can make their network protocols
bloated when transmitting data with shared tag sets. Most other time series solutions only
support float64 values, which means the user is unable to encode additional metadata along with
the time series. Even OpenTSDB and KairosDB, which support tags (unlike Graphite and RRD),
have limitations on the number of tags that can be used. At around 5 to 6 tags, the user will start
seeing hot spots within their cluster of HBase or Cassandra machines.
InfluxDB doesn’t have this limitation because the InfluxDB data model is designed for time
series specifically. It pushes the developer in the right direction to get good performance out of
the database by indexing tags and keeping fields unindexed. It’s flexible in that many data types
are supported, and the user can have many fields and tags. Because of all these factors, a
purpose-built time series database like InfluxDB is the best solution for working with time series
data.
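The distinction between indexed tags and unindexed fields can be sketched in a few lines of Python. This is a toy model written for illustration, not InfluxDB's storage engine or API: the Point and TinyStore classes and their methods are hypothetical names. It only shows why a lookup by tag can go straight to the matching points while a filter on a field has to scan everything.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Point:
    measurement: str
    tags: dict       # indexed metadata, e.g. host or region
    fields: dict     # unindexed values; several fields share one tag set
    timestamp: int


class TinyStore:
    """Toy in-memory store that indexes tags and leaves fields unindexed."""

    def __init__(self):
        self.points = []
        # (tag key, tag value) -> positions of matching points in self.points
        self.tag_index = defaultdict(list)

    def write(self, point):
        position = len(self.points)
        self.points.append(point)
        for key, value in point.tags.items():
            self.tag_index[(key, value)].append(position)

    def by_tag(self, key, value):
        """Index lookup: touches only the matching points."""
        return [self.points[i] for i in self.tag_index.get((key, value), [])]

    def by_field(self, key, predicate):
        """No index on fields: every stored point is scanned."""
        return [p for p in self.points if key in p.fields and predicate(p.fields[key])]


store = TinyStore()
store.write(Point("cpu", {"host": "a", "region": "eu"}, {"usage": 0.42, "state": "ok"}, 1))
store.write(Point("cpu", {"host": "b", "region": "eu"}, {"usage": 0.91, "state": "hot"}, 2))
store.write(Point("cpu", {"host": "a", "region": "eu"}, {"usage": 0.55, "state": "ok"}, 3))

host_a = store.by_tag("host", "a")                 # served from the tag index
busy = store.by_field("usage", lambda v: v > 0.9)  # full scan over all points
print(len(host_a), len(busy))
```

Note that each point carries two fields of different types (a float and a string) under a single tag set, so the shared metadata is written once per point rather than once per value.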
Here’s a brief time series database definition: A time series database (TSDB) is a database
optimized for time-stamped (time series) data and for measuring change over time.
What is the best time series database?
Visit this page to learn about what makes a powerful time series database and which database is
best for storing large volumes of time series data.
Visit the What is time series data page to view time series data examples.
InfluxDB is an open source time series database with a large and vibrant community.
There are thousands of use cases utilizing InfluxDB and Grafana. Visit our Community
Showcase to read about them.
View InfluxDB benchmarking tests comparing its performance to other databases (such as
Cassandra, Elasticsearch, MongoDB, OpenTSDB, Graphite and Splunk) based on parameters
such as write throughput, query throughput, and on-disk storage.
Is a time series database better than a relational database for handling time series data?
If you’re debating time series database vs. relational database, a time series database (TSDB) is
built specifically for storing and querying time series data, and tends to be more efficient than a
relational database, which is more generic.
Transmitting data from the edge to cloud in a reliable way continues to be a challenge for many
businesses. Read the Edge Computing & Data Replication with InfluxDB e-book to learn what
‘the edge’ is, edge computing use cases and benefits, and how InfluxDB time series database can
be used for edge computing.
What is the difference between a time series database and a data warehouse?
While a time series database is a database optimized for time-stamped or time series data, a data
warehouse stores and organizes data from multiple sources in a central location.