OM 2.0: OM protobuf future #296
A few thoughts and remarks: First of all, please take #256 into account. It fixes a few things in the existing OM proto spec, adds native histograms, and also adds classic float histograms. While we might want to change things in detail, I think the basic ideas in this PR should be considered more or less part of OM already when deciding about OMv2 proto vs. Prometheus proto.

With that said, I think the most important next step is to check for payload differences between OM and classic Prometheus that have nothing to do with the proto version. OMv1 has enforced units in the metric name, and enforced addition of a …

Once we are at that point, I think both approaches described above would converge towards something very similar, and whether we call it "OMv2.X proto" or "Prometheus vX.Y proto" is mostly cosmetic. With a versioned protobuf protocol, we could even make "formally breaking" changes in the Prometheus proto that we have shied away from so far. I don't think it would be hard for standalone generators and parsers to have parallel implementations. It is a bit harder for code that also uses the proto spec as its internal data model, like prometheus/client_golang or prometheus/pushgateway, where the current proto spec is present throughout the code. But that shouldn't stop us if getting rid of legacy structures in the proto spec has a tangible benefit.
About a few specific features of OM proto vs. Prometheus proto:
I guess the desired technical outcome is an OMv2 text format with no adoption blockers (neither for classic Prometheus usage nor for "modern" OTel interoperation), where exposers can be negotiated to use protobuf instead in a transparent fashion (i.e. without any change to the exposed metrics). In that case, I would call that protobuf the "OMv2 proto format". This is a non-technical preference: to avoid confusion, increase consistency in terminology, and keep the "brand" of OM.
Problem Statement
OM proto is not currently adopted (neither the Prometheus client libraries nor the main Prometheus binary are aware of it).
The Prometheus ecosystem still uses and invests in the Prometheus proto, even though there was an attempt in the past to deprecate it (the proto3 version). It is currently on its way to becoming the default scrape configuration (it is already the default for native histograms and a number of other feature flags).
Given that, it's not clear whether, as part of the OM 2.0 WG, we should continue OM proto, improve it, or remove it from OM completely and recommend the existing Prometheus proto. Note that this is a separate topic from the OM text format, which is the main focus of OM 2.0.
OM Proto vs Prometheus Proto
The protocols are pretty similar: both use a similar MetricFamily abstraction and have similar gauge, counter, histogram and summary structures. They do differ a little, though:
OM proto:
- Has a repeated `Metric.MetricPoint` field on the already repeated `Metric`. The `repeated` part is interesting because it potentially encourages sending multiple points (e.g. historical ones too), not only current values; not sure if that is intended.
- Wraps everything in a top-level `MetricSet`, which blocks major optimizations possible with the PrometheusProto delimited format.

Prometheus Proto (proto3):
- Uses one `Metric` for each metric value.
- Has no `Info` and `StateSet` MetricTypes (both are interpreted as gauges in Prometheus as of now).
- Is inconsistent about timestamps: some fields use `google.protobuf.Timestamp timestamp = 3; // OpenMetrics-style.`, some use `int64 timestamp_ms = 6;`. The latter is easier (and faster) to use, but `0` means not set, which blocks the use of the exact 0 millisecond timestamp (implicitly accepted in many places in Prometheus, e.g. Remote Write).

To sum up, PrometheusProto is closer to what Prometheus implements now, including native histograms. It also allows somewhat more efficient parsing. On the other hand, OM proto is consistent with OM 1.0 types and makes it a bit easier (?) to send historical samples for the same series. OM proto is also strictly versioned (see below for why that's important).
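A minimal Go sketch of the `timestamp_ms` problem, assuming hand-written structs rather than generated code (the type names are hypothetical). A proto3 scalar `int64` has no presence information, so a consumer cannot tell an explicit Unix epoch (0 ms) from "not set". A field with explicit presence, roughly what proto3 `optional` or a message-typed `google.protobuf.Timestamp` gives you in Go (unset is `nil`), can.

```go
package main

import "fmt"

// sampleScalar mimics a proto3 scalar field such as `int64 timestamp_ms = 6;`:
// the zero value doubles as "not set".
type sampleScalar struct {
	Value       float64
	TimestampMs int64
}

// samplePresence is a hypothetical variant with explicit presence:
// nil means "not set", a pointer to 0 means the Unix epoch.
type samplePresence struct {
	Value       float64
	TimestampMs *int64
}

// hasTimestamp reports what a consumer can infer about the timestamp.
func (s sampleScalar) hasTimestamp() bool { return s.TimestampMs != 0 }

func (s samplePresence) hasTimestamp() bool { return s.TimestampMs != nil }

func main() {
	epoch := int64(0) // 1970-01-01T00:00:00Z, a valid millisecond timestamp
	a := sampleScalar{Value: 1, TimestampMs: epoch}
	b := samplePresence{Value: 1, TimestampMs: &epoch}
	fmt.Println(a.hasTimestamp(), b.hasTimestamp()) // false true
}
```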
Protobuf versioning
During WG discussions, a point was made about protobuf versioning: that protobuf does not need strict minor/patch versioning, since we can make a lot of changes without breaking users or user interactions.
I would argue that in the world of data-heavy network protocols like OM or Remote Write, that's not true in practice. Generally, we need to use the same versioning structure as for the text format.
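To illustrate what "the same versioning structure as the text format" could mean in practice: the OpenMetrics text format already carries its version in the media type (`application/openmetrics-text; version=1.0.0`). The Go sketch below negotiates on such a `version` parameter. The `version` parameter on `application/vnd.google.protobuf` and the `2.0.0` value are hypothetical, not part of today's Prometheus protobuf content type, and for brevity the sketch follows header order instead of honouring `q` weights.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// supported maps media types a hypothetical exposer can produce to the
// versions it supports for each.
var supported = map[string][]string{
	"application/openmetrics-text": {"1.0.0"},
	// Hypothetical: a versioned protobuf media type. Today's Prometheus
	// protobuf content type carries no version parameter.
	"application/vnd.google.protobuf": {"2.0.0"},
}

// negotiate returns the first Accept entry (in header order; q-values are
// ignored) whose media type and requested version the exposer supports.
func negotiate(accept string) (mediaType, version string, ok bool) {
	// Naive split: fine for a sketch, but breaks on quoted commas.
	for _, entry := range strings.Split(accept, ",") {
		mt, params, err := mime.ParseMediaType(strings.TrimSpace(entry))
		if err != nil {
			continue // skip malformed entries
		}
		versions, known := supported[mt]
		if !known {
			continue
		}
		want := params["version"]
		for _, v := range versions {
			// No requested version: take the first one we support.
			if want == "" || want == v {
				return mt, v, true
			}
		}
	}
	return "", "", false
}

func main() {
	accept := "application/vnd.google.protobuf;version=3.0.0," +
		"application/openmetrics-text;version=1.0.0;q=0.5"
	fmt.Println(negotiate(accept)) // application/openmetrics-text 1.0.0 true
}
```

Because the requested protobuf version is unknown to this exposer, it falls through to the next acceptable entry instead of silently sending a payload the scraper may misinterpret.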
Examples:
- Imagine we add a `schemaURL` attribute to MetricFamily one day. Adding a field with this new information is not a breaking change. However, without a concrete minor version bump, this change won't be well announced. The same applies if our text format makes skipping unknown lines a MUST.
- If an `Info` type is added, a producer with an info metric has to decide where to put it: (a) as the new `Info` type, (b) as the old `Gauge` type, deprecated for info metrics, or (c) both. To not break users it would need to be (c), but that's not practical for complexity and efficiency reasons (duplicated data that is not easily compressible sent over the network, and detecting duplicates on parse).

To sum up, some versioning and content negotiation might be needed for protobuf protocols as well.
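The `schemaURL` example can be made concrete with a hand-rolled Go sketch of protobuf wire-format decoding (varint and length-delimited wire types only, no generated code). An "old" consumer that only knows field 1 decodes a message containing a new field without any error and simply drops it: the change is non-breaking, but also invisible to that consumer without a version bump. Field number 100 for `schema_url` is an arbitrary, hypothetical choice.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Protobuf wire types used in this sketch.
const (
	wireVarint = 0
	wireBytes  = 2
)

// appendString appends a length-delimited (wire type 2) string field.
func appendString(buf []byte, field uint64, s string) []byte {
	buf = binary.AppendUvarint(buf, field<<3|wireBytes)
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// decodeOld plays an "old" consumer that only knows field 1 (name).
// Like protobuf decoders in general, it skips unknown fields without error.
func decodeOld(buf []byte) (name string, skipped []uint64, err error) {
	for len(buf) > 0 {
		key, n := binary.Uvarint(buf)
		if n <= 0 {
			return "", nil, errors.New("malformed field key")
		}
		buf = buf[n:]
		field, wireType := key>>3, key&7
		var payload []byte
		switch wireType {
		case wireVarint:
			_, m := binary.Uvarint(buf)
			if m <= 0 {
				return "", nil, errors.New("malformed varint")
			}
			buf = buf[m:]
		case wireBytes:
			l, m := binary.Uvarint(buf)
			if m <= 0 || uint64(len(buf)-m) < l {
				return "", nil, errors.New("truncated length-delimited field")
			}
			payload, buf = buf[m:m+int(l)], buf[m+int(l):]
		default:
			return "", nil, fmt.Errorf("wire type %d not handled in this sketch", wireType)
		}
		if field == 1 && wireType == wireBytes {
			name = string(payload)
		} else {
			skipped = append(skipped, field) // unknown here: dropped silently
		}
	}
	return name, skipped, nil
}

func main() {
	msg := appendString(nil, 1, "http_requests")
	// Hypothetical new field: schema_url, arbitrary field number 100.
	msg = appendString(msg, 100, "https://example.com/schema")
	name, skipped, err := decodeOld(msg)
	fmt.Println(name, skipped, err) // http_requests [100] <nil>
}
```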
Proposed solution
Implementing Protobuf support efficiently was a big task, and PrometheusProto unblocks streaming and is already adopted. There are also not many differences vs. OM proto that would motivate the ecosystem to adopt OM proto instead.
Perhaps the best course of action would be:
Pros:
Cons:
Alternatives considered
We could add native histograms in OM 2.0. For efficiency, we could introduce a delimited format. But then we would essentially reimplement PrometheusProto under the OM umbrella, which is the Prometheus umbrella now. Perhaps not worth it?
Iterating on an adopted protocol feels better for the ecosystem, too.
Interesting, but do we have the resources for this? The only benefit I see is the opportunity to rethink the "MetricFamily" concept, which does not exist (and does not make sense) in Prometheus. That would only be a readability improvement, nothing more 🤔
At some point that was the intention. However, protobuf was useful for experiments (it's the only protocol that has had practical native histogram support for the last few years), and it's likely to be more efficient once Prometheus switches to complex types and we finalize the gogo/custom generator aspect.