Do not recommend increasing max_shards_per_node
#120458
Conversation
Today if the `shards_capacity` health indicator detects a problem then it recommends increasing the limit, which goes against the advice in the manual about not increasing these limits and also makes it rather pointless having a limit in the first place. This commit improves the recommendation to suggest either adding nodes or else reducing the shard count.
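For illustration, here is a minimal sketch of the behavioural change described above. This is not the actual `ShardsCapacityHealthIndicatorService` code; the threshold check and the message strings are assumptions made purely to show the before/after recommendation.

```java
// Hypothetical sketch of the recommendation change; not the real indicator code.
class ShardCapacityAdviceSketch {
    static String advise(long currentShards, long clusterShardCap) {
        if (currentShards < clusterShardCap) {
            return "green: no action needed";
        }
        // Before this PR the suggested action was to raise cluster.max_shards_per_node.
        // After this PR the limit stays put and the cluster itself is adjusted instead:
        return "Add data nodes to the cluster, or reduce the shard count (for example "
            + "by deleting unneeded indices or shrinking over-sharded ones).";
    }

    public static void main(String[] args) {
        System.out.println(advise(3000, 3000)); // at capacity -> new-style recommendation
    }
}
```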
Pinging @elastic/es-data-management (Team:Data Management)
Hi @DaveCTurner, I've created a changelog YAML for you.
```diff
 static final Diagnosis SHARDS_MAX_CAPACITY_REACHED_DATA_NODES = SHARD_MAX_CAPACITY_REACHED_FN.apply(
-    "increase_max_shards_per_node",
+    "decrease_shards_per_non_frozen_node",
     ShardLimitValidator.SETTING_CLUSTER_MAX_SHARDS_PER_NODE,
-    "data"
+    "non-frozen"
 );
 static final Diagnosis SHARDS_MAX_CAPACITY_REACHED_FROZEN_NODES = SHARD_MAX_CAPACITY_REACHED_FN.apply(
-    "increase_max_shards_per_node_frozen",
+    "decrease_shards_per_frozen_node",
     ShardLimitValidator.SETTING_CLUSTER_MAX_SHARDS_PER_NODE_FROZEN,
```
The bad "increase the limit" advice was baked into the actual diagnosis IDs - fixed here, and see also https://github.com/elastic/telemetry/pull/4362 for the corresponding change to the telemetry cluster
Hey @DaveCTurner, you are bringing up a very good point here. I do have one concern, though. If I am not mistaken, the current limit is quite low, so it may well make sense to increase the limit first before expanding the cluster or reducing the shards. So I am thinking of two options to make this more useful to users:
Does this make sense?
The default of 1000 shards per node is still rather relaxed IMO, at least for high-segment-count or high-field-count indices, and we do want users to stick to it for now. We do get support cases involving egregiously high shard-per-node counts sometimes, and we need to be able to point at the guidance in the manual when telling users to scale up their clusters. It rather weakens that argument when the health API told them specifically to keep on relaxing the limit each time they got close. A better limit would be nice ofc, maybe one based on #111123, but that won't be a quick process and I don't think we can in good conscience block this change on that work.
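To make the scaling argument concrete, here is a rough sketch of the arithmetic behind the limit: the cluster-wide cap is the per-node setting multiplied by the number of matching nodes, so adding a node raises the cap without touching the setting. The node counts below are made up for illustration.

```java
// Illustrative arithmetic only; node counts are hypothetical.
class ShardCapacityMath {
    static long clusterShardCap(long maxShardsPerNode, int matchingNodes) {
        // The cluster-wide cap scales with the node count, so adding a data node
        // raises the cap by maxShardsPerNode while the setting stays at its default.
        return maxShardsPerNode * matchingNodes;
    }

    public static void main(String[] args) {
        System.out.println(clusterShardCap(1000, 3)); // 3000 shards on a 3-data-node cluster
        System.out.println(clusterShardCap(1000, 4)); // 4000 after adding one more data node
    }
}
```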
LGTM! Thanks for raising and addressing this, @DaveCTurner
Thanks @gmarouli
💔 Backport failed
You can use sqren/backport to manually backport by running
Backported to 8.x in de5be24