Skip to content

Only publish desired balance gauges on master #115383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nicktindall
Copy link
Contributor

@nicktindall nicktindall commented Oct 23, 2024

Closes ES-9834

@@ -168,6 +171,7 @@ public String toString() {
if (event.localNodeMaster() == false) {
onNoLongerMaster();
}
nodeIsMaster.set(event.localNodeMaster());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there is an alternative pattern for doing this kind of thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can reduce the number of atomic set calls with a check of event.localNodeMaster() != event.previousState().nodes().isLocalNodeElectedMaster().

Alternatively, we can register the metrics in this class since we are already accessing the fields like desiredBalanceReconciler.unassignedShards directly in onNoLongerMater. This way we can keep a volatile field for master check and no need to pass it down. Not necessarily better than what you have. Happy for you to decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a fairly significant refactor to move all the metrics and their state & logic to a dedicated class, it means we can unit test, get rid of all the Atomics, and make a nicer interface for setting whether we're master and getting and setting the metric values.

@nicktindall nicktindall added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Oct 23, 2024
@nicktindall nicktindall requested review from idegtiarenko and DaveCTurner and removed request for DaveCTurner October 23, 2024 05:49
@elasticsearchmachine
Copy link
Collaborator

Hi @nicktindall, I've created a changelog YAML for you.

@nicktindall nicktindall requested a review from ywangd October 23, 2024 05:50
@nicktindall nicktindall marked this pull request as ready for review October 23, 2024 06:06
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 23, 2024
"es.allocator.desired_balance.allocations.undesired.current",
"Total number of shards allocated on undesired nodes excluding shutting down nodes",
"{shard}"
"{shard}",
this::getUndesiredAllocations
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These return a Closeable which (in the APM implementation at least) deregisters the metric when closed, we were ignoring it previously when we used the synchronous API and these metrics live for the life of the node so I assume we don't need to worry about it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine since a node may become master again. So we don't want to close it since there is no way to re-open. Muting with empty result is what we want here.

);
undesiredAllocations = LongGaugeMetric.create(
meterRegistry,
meterRegistry.registerLongsGauge(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am new to this one.
Does it emit a single document with array of metrics or number of separate documents?
In other words, if not an elected master, would we get empty array or no documents?

Copy link
Contributor Author

@nicktindall nicktindall Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The places I've seen it used is where you have a bunch of values for a metric that have different labels. So a number of separate documents. In this case we're just using it to be able to publish nothing when we are not the master (i.e. no documents)

pr: 115383
summary: Only publish desired balance gauges on master
area: Allocation
type: bug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely convinced this is bug. Sounds more like an enhancement when we only publish metric from elected master and do not publish zero metrics from everywhere else.

@elasticsearchmachine
Copy link
Collaborator

Hi @nicktindall, I've updated the changelog YAML for you.

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"es.allocator.desired_balance.allocations.undesired.current",
"Total number of shards allocated on undesired nodes excluding shutting down nodes",
"{shard}"
"{shard}",
this::getUndesiredAllocations
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine since a node may become master again. So we don't want to close it since there is no way to re-open. Muting with empty result is what we want here.

@@ -168,6 +171,7 @@ public String toString() {
if (event.localNodeMaster() == false) {
onNoLongerMaster();
}
nodeIsMaster.set(event.localNodeMaster());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can reduce the number of atomic set calls with a check of event.localNodeMaster() != event.previousState().nodes().isLocalNodeElectedMaster().

Alternatively, we can register the metrics in this class since we are already accessing the fields like desiredBalanceReconciler.unassignedShards directly in onNoLongerMater. This way we can keep a volatile field for master check and no need to pass it down. Not necessarily better than what you have. Happy for you to decide.

@nicktindall nicktindall requested a review from ywangd October 24, 2024 02:38
Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the extra refinement.

@@ -168,6 +170,7 @@ public String toString() {
if (event.localNodeMaster() == false) {
onNoLongerMaster();
}
desiredBalanceMetrics.setNodeIsMaster(event.localNodeMaster());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I still think checking event.localNodeMaster() != event.previousState().nodes().isLocalNodeElectedMaster() before calling this method is helpful since otherwise most of the time it ends up setting the same value.

@nicktindall nicktindall merged commit cc9a08a into elastic:main Oct 24, 2024
15 of 16 checks passed
@nicktindall nicktindall deleted the ES-9834_only_publish_desired_balance_metrics_on_current_master branch October 24, 2024 07:43
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Oct 24, 2024
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants