
Logstash Integration with Elasticsearch Data Streams #12178

Closed
@acchen97


Overview

This is an overview of the Logstash integration with Elasticsearch data streams. The integration will be added as a feature of the existing Elasticsearch output plugin, which will gain new data stream options recommended for indexing any time series dataset (logs, metrics, etc.) into Elasticsearch. The existing options will continue to be used for non-time series use cases. This feature will be available in both the default and OSS Logstash distributions.

Indexing Strategy

The data stream integration will adopt the new indexing strategy, which uses the {type}-{dataset}-{namespace} naming format and the composable templates bundled with Elasticsearch starting in 7.9.

The default data stream name will be logs-generic-default. This default lets users easily correlate Logstash data with data from other sources in Elasticsearch (e.g. through logs-* and logs-generic-*). Because of the new indexing strategy, the type, dataset, and namespace parts of the data stream name can each be configured separately.

As Logstash will not be fully ECS compliant until 8.0, there are caveats we need to document (or enforce with bootstrap checks) so that users avoid ECS conflicts:

  • Beats input, TCP input, UDP input, and grok filter: these plugins are being updated to support an ECS compatibility mode (work in progress for the 7.9 / 7.10 timeframe). Users of these plugins should enable it to avoid ECS conflicts.
  • Users should not introduce any fields that conflict with ECS in their pipelines when using data streams. This should become more systematic in the future once we add ECS validation.
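
As an illustration, the pipeline below enables ECS compatibility on a Beats input and a grok filter. It assumes plugin versions that expose an ecs_compatibility option accepting "v1", as later releases of these plugins do. The port and grok pattern are placeholders.

input {
    beats {
        port => 5044                   # placeholder port
        ecs_compatibility => "v1"      # assumes a plugin version with ECS compatibility mode
    }
}
filter {
    grok {
        # placeholder pattern; in ECS mode grok writes ECS-style field names
        match => { "message" => "%{COMBINEDAPACHELOG}" }
        ecs_compatibility => "v1"
    }
}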

Example Configuration

Basic default configuration

output {
    elasticsearch {
        hosts => "hostname"
        data_stream => "true"
    }
}

Minimal settings to get started in Logstash 7.x. Events that carry the data_stream.* fields are automatically routed to the matching data streams. If the fields are missing, events default to logs-generic-default.
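
A minimal sketch of auto routing under the behavior proposed here: a mutate filter sets the data_stream.* fields, so these events would be routed to logs-nginx-production. The dataset and namespace values are hypothetical. Events that do not receive the fields would still fall back to the defaults.

filter {
    mutate {
        # replace sets each field, adding it if it is not already present
        replace => {
            "[data_stream][type]"      => "logs"
            "[data_stream][dataset]"   => "nginx"        # hypothetical dataset
            "[data_stream][namespace]" => "production"   # hypothetical namespace
        }
    }
}
output {
    elasticsearch {
        hosts => "hostname"
        data_stream => "true"
    }
}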

Customize data stream name

output {
    elasticsearch {
        hosts => "hostname"
        data_stream => "true"
        data_stream_timestamp => "@timestamp"
        data_stream_type => "metrics"
        data_stream_dataset => "foo"
        data_stream_namespace => "bar"
    }
}

Configuration Settings

These are the new data stream settings that will be added to the Elasticsearch output plugin:

  • data_stream (string, optional) - defines whether data will be indexed into an Elasticsearch data stream. The data_stream_* settings will only be used if this setting is enabled. This setting supports the values true, false, and auto. Defaults to false in Logstash 7.x and auto starting in Logstash 8.0. More details on the auto behavior can be found in this issue.
  • data_stream_timestamp (string, optional) - the name of the event field used as the data stream timestamp. Defaults to @timestamp.
  • data_stream_type (string, optional) - the data stream type used to construct the data stream name at index time. Only logs and metrics are allowed. Hyphens (-) are not supported. Defaults to logs.
  • data_stream_dataset (string, optional) - the data stream dataset used to construct the data stream name at index time. Hyphens (-) are not supported. Defaults to generic.
  • data_stream_namespace (string, optional) - the data stream namespace used to construct the data stream name at index time. Hyphens (-) are not supported. Defaults to default.
  • data_stream_auto_routing (boolean, optional) - automatically routes events by deriving the data stream name from event fields, using the %{data_stream.type}-%{data_stream.dataset}-%{data_stream.namespace} format. If enabled, the data_stream.* event fields take precedence over the data_stream_type, data_stream_dataset, and data_stream_namespace settings, which are used as fallbacks for any fields missing from the event. Defaults to true.
  • data_stream_sync_fields (boolean, optional) - automatically populates any data_stream.* fields that are missing from the event, so that the data_stream.* fields match the name of the data stream the event is indexed into. Details on how field syncing interacts with the data_stream_auto_routing setting can be found in this issue. Defaults to true.
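
For example, a sketch that disables auto routing so every event is written to one fixed data stream, using only the settings proposed above. The dataset and namespace values are hypothetical. With these settings, events should be indexed into logs-app-production regardless of any data_stream.* fields they carry.

output {
    elasticsearch {
        hosts => "hostname"
        data_stream => "true"
        data_stream_auto_routing => false     # ignore data_stream.* event fields for routing
        data_stream_type => "logs"
        data_stream_dataset => "app"          # hypothetical dataset
        data_stream_namespace => "production" # hypothetical namespace
    }
}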

Elastic Agent Compatibility

Logstash often acts as an intermediary that receives data from other systems, such as the Elastic Agent and Kafka. For these use cases, Logstash will by default derive the data stream name from the data_stream.type, data_stream.dataset, and data_stream.namespace event fields. This allows events from the Elastic Agent to be routed automatically to the appropriate Elasticsearch data stream even with Logstash in between. This behavior can be disabled by setting data_stream_auto_routing to false.

Format: %{data_stream.type}-%{data_stream.dataset}-%{data_stream.namespace}
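
A sketch of the Elastic Agent passthrough case, assuming the Agent ships events to Logstash over the Beats protocol (port 5044 is a placeholder). Because data_stream_auto_routing defaults to true, no routing settings are needed. For instance, an Agent event with data_stream.type logs, data_stream.dataset system.syslog, and data_stream.namespace default would be written to logs-system.syslog-default.

input {
    beats {
        port => 5044    # assumption: Agent output points at this Beats-protocol listener
    }
}
output {
    elasticsearch {
        hosts => "hostname"
        data_stream => "true"   # auto routing on by default; name comes from data_stream.* fields
    }
}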

Events received from the Elastic Agent should generally have all of the data_stream.* fields populated. If any of these fields are missing, the data_stream_sync_fields setting determines whether they are populated before indexing.

Limitations

The primary limitation of data streams is that they are designed for append-only data and do not support updating documents through the data stream. Logstash users have historically relied on the existing Elasticsearch output plugin's capabilities to perform document updates and to achieve exactly-once delivery semantics.
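
For comparison, an update workflow of this kind would keep using the existing index-based options rather than data streams. The sketch below uses the plugin's existing index, document_id, action, and doc_as_upsert options. The inventory index and item_id field are hypothetical.

output {
    elasticsearch {
        hosts => "hostname"
        index => "inventory"               # hypothetical regular (non-data-stream) index
        document_id => "%{[item_id]}"      # hypothetical field used as a stable document ID
        action => "update"
        doc_as_upsert => true              # create the document if it does not exist yet
    }
}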

Future Considerations

  • logs-generic-default is the default data stream for generic data from both Logstash and the Elastic Agent. If users report that it is difficult to distinguish Logstash-sourced data in this shared data stream, we could consider adding a from-logstash tag to the ECS tags base field for events coming from Logstash.
  • We want to guide users toward the new indexing strategy, but if users need more flexibility, we could later introduce a free-form option for specifying the data stream name, in which case template and ILM management would be manual.
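
Until a from-logstash tag is added automatically, users who want to mark Logstash-sourced events could add one themselves. A minimal sketch using the mutate filter's standard add_tag option; the tag name simply mirrors the proposal above.

filter {
    mutate {
        # appends to the ECS tags field on every event passing through this pipeline
        add_tag => ["from-logstash"]
    }
}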
