-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Optimize IngestCtxMap construction #120833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize IngestCtxMap construction #120833
Conversation
Hi @joegallo, I've created a changelog YAML for you. |
Pinging @elastic/es-data-management (Team:Data Management) |
Honestly, shallowly rehashing the map is misleading at best -- the maps that back an ingest document are typically deeply nested, so a shallow rehash at the top-level doesn't do much for you.
f3bd49c
to
1cde31e
Compare
@@ -43,7 +45,7 @@ final class IngestCtxMap extends CtxMap<IngestDocMetadata> { | |||
ZonedDateTime timestamp, | |||
Map<String, Object> source | |||
) { | |||
super(new HashMap<>(source), new IngestDocMetadata(index, id, version, routing, versionType, timestamp)); | |||
super(source, new IngestDocMetadata(index, id, version, routing, versionType, timestamp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way this can burn us in any existing scripts, is there? I don't think they can ever call this constructor, so it couldn't be a problem. Just trying to think through potential problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the ctx
of a script
processor, so by the time the script processor has a handle on it, it would have already been duplicated off the once (at the very very beginning new IngestService.newIngestDocument
). It's not something you can call from a script
processor.
I spent a fair amount of time reading through the various constructors and copy constructors of the IngestDocument
and its friends, and I really do think I've covered all the bases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But philosophically, if it's a bug to not rehash, then what about the bug of re-hashing but only one layer deep -- these maps are almost always deeply nested, if this code had a job to do then it never actually did it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This change is 99% tests, and 1% removing a call to
new HashMap(...)
that shallowly copied an incoming map into a new map. It's certainly been convenient be able to write tests withMap.of(...)
for the source of the document, but in reality the cost of re-hashing at scale at runtime is not zero (though it is small), and the idea that it's protecting us from much of anything to rehash in this way seems mistaken, especially since it's a only shallow re-hashing.