Skip to content

Optimize IngestCtxMap construction #120833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 27, 2025

Conversation

joegallo
Copy link
Contributor

This change is 99% tests, and 1% removing a call to new HashMap(...) that shallowly copied an incoming map into a new map. It's certainly been convenient be able to write tests with Map.of(...) for the source of the document, but in reality the cost of re-hashing at scale at runtime is not zero (though it is small), and the idea that it's protecting us from much of anything to rehash in this way seems mistaken, especially since it's a only shallow re-hashing.

@joegallo joegallo added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v9.0.0 v8.18.0 labels Jan 24, 2025
@joegallo joegallo requested a review from masseyke January 24, 2025 19:40
@elasticsearchmachine
Copy link
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

joegallo and others added 4 commits January 24, 2025 15:03
Honestly, shallowly rehashing the map is misleading at best -- the
maps that back an ingest document are typically deeply nested, so a
shallow rehash at the top-level doesn't do much for you.
@joegallo joegallo force-pushed the optimize-ingest-ctx-map-construction branch from f3bd49c to 1cde31e Compare January 24, 2025 20:03
@@ -43,7 +45,7 @@ final class IngestCtxMap extends CtxMap<IngestDocMetadata> {
ZonedDateTime timestamp,
Map<String, Object> source
) {
super(new HashMap<>(source), new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
super(source, new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way this can burn us in any existing scripts, is there? I don't think they can ever call this constructor, so it couldn't be a problem. Just trying to think through potential problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the ctx of a script processor, so by the time the script processor has a handle on it, it would have already been duplicated off the once (at the very very beginning new IngestService.newIngestDocument). It's not something you can call from a script processor.

I spent a fair amount of time reading through the various constructors and copy constructors of the IngestDocument and its friends, and I really do think I've covered all the bases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But philosophically, if it's a bug to not rehash, then what about the bug of re-hashing but only one layer deep -- these maps are almost always deeply nested, if this code had a job to do then it never actually did it.

Copy link
Member

@masseyke masseyke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joegallo joegallo merged commit 022b841 into elastic:main Jan 27, 2025
16 checks passed
@joegallo joegallo deleted the optimize-ingest-ctx-map-construction branch January 27, 2025 16:03
joegallo added a commit to joegallo/elasticsearch that referenced this pull request Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants