-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Use the system index descriptor in the snapshot blob cache cleanup task #120937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the system index descriptor in the snapshot blob cache cleanup task #120937
Conversation
Hi @pxsalehi, I've created a changelog YAML for you. |
4b3409b
to
dd132d8
Compare
* Mimics migration of the {@link SearchableSnapshots#SNAPSHOT_BLOB_CACHE_INDEX} as done in | ||
* {@link org.elasticsearch.upgrades.SystemIndexMigrator}, where the index is re-indexed, and replaced by an alias. | ||
*/ | ||
private void migrateTheSystemIndex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did have a quick look at SystemIndexMigrator
hoping I could trigger this migration step by creating/running such a migrator. But that doesn't seem straight-forward and it would still be mimicking a proper upgrade test which is what is missing here. Considering the lack of automated upgrade tests that include the migration steps is a bigger issue and not addressed currently, I kept the test here short just trying to reindex/alias/remove, which is what the migrator is doing.
I have also manually verified this by creating/mounting a searchable snapshot index in 7.x, and upgrading to 8.x (with migration) and then 9.x and removing the searchable snapshot index in 9.x which cleans up the migrated snapshot blob cache index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have no actual upgrade tests that run system index migrations? That indeed sounds problematic and something we should ensure happens (but fine outside this PR, for another team too I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that we don't have that. which is why we resorted to manual testing in the first place. I can follow up and ask around, and create a follow up issue or PR if it turns out to be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #121517 Christoph reused the upgrade tests we made for N-2 support in order to test the upgrade of the async-search
system index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll look into that. back then I asked core/infra about v7->v8->v9 and they mentioned there is currently no option. But this could be partially useful.
Hi @pxsalehi, I've created a changelog YAML for you. |
a9d07aa
to
fb359a3
Compare
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hi @pxsalehi, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
* Mimics migration of the {@link SearchableSnapshots#SNAPSHOT_BLOB_CACHE_INDEX} as done in | ||
* {@link org.elasticsearch.upgrades.SystemIndexMigrator}, where the index is re-indexed, and replaced by an alias. | ||
*/ | ||
private void migrateTheSystemIndex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have no actual upgrade tests that run system index migrations? That indeed sounds problematic and something we should ensure happens (but fine outside this PR, for another team too I think).
if (indexRoutingTable != null) { | ||
return indexRoutingTable.shard(0).primaryShard(); | ||
private boolean systemIndexPrimaryShardActiveAndAssignedToLocalNode(final ClusterState state) { | ||
for (IndexMetadata indexMetadata : state.metadata()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use getMatchingIndices
instead and then assert that it has <= 1 results? I think that simplifies the code here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@henningandersen I wasn't sure that's safe. During migration there will be a brief period after reindexing and before alias/remove that there will be two indices matching the pattern. So that assertion might not hold.
💚 Backport successful
|
…sk (elastic#120937) Clean up of the `.snapshot-blob-cache*` system index is done only on the node that hosts the primary of the shard 0 of that index. When the index is migrated as part of an upgrade test e.g. v7 -> v8, the index is reindexed to a new index `.snapshot-blob-cache-reindexed-for-9`. The code scheduling this clean up task is not able to locate the shard and would never trigger a clean up after the upgrade. This change uses the system index descriptor to find the matching shard and would work for future versions too. Closes elastic#120518
…sk (#120937) (#121053) Clean up of the `.snapshot-blob-cache*` system index is done only on the node that hosts the primary of the shard 0 of that index. When the index is migrated as part of an upgrade test e.g. v7 -> v8, the index is reindexed to a new index `.snapshot-blob-cache-reindexed-for-9`. The code scheduling this clean up task is not able to locate the shard and would never trigger a clean up after the upgrade. This change uses the system index descriptor to find the matching shard and would work for future versions too. Closes #120518
Clean up of the
.snapshot-blob-cache*
system index is done only on the node that hosts the primary of the shard 0 of that index. When the index is migrated as part of an upgrade test e.g. v7 -> v8, the index is reindexed to a new index.snapshot-blob-cache-reindexed-for-9
. The code scheduling this clean up task is not able to locate the shard and would never trigger a clean up after the upgrade. This change uses the system index descriptor to find the matching shard and would work for future versions too.Closes #120518