-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ES|QL - Add scoring for full text functions disjunctions #121793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL - Add scoring for full text functions disjunctions #121793
Conversation
…corers for expressions
…ner when an Filter is being planned
Hi @carlosdelest, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach because most expressions won't need to worry about score. They can participate in a scorable expression and they just make 0.0
. You can't cast a regular expression into something that quacks like a scorable expression. But AND, OR, MATCH, the other full text functions you can.
* Evaluates a tree of functions for every position in the block, resulting in a | ||
* new block which is appended to the page. | ||
*/ | ||
public class ScoreOperator extends AbstractPageMappingOperator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if you could adapt ExpressionScorer.Factory
into an ExpressionEvaluator.Factory
so you could use EvalOperator
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I see is that ExpressionEvaluators
evaluate expressions - we've made scoring a separate concern from expressions in this PR, so it cannot be directly combined with expressions via evaluation.
That's what the ScoreOperator
is for - it evaluates the scores using their own scoring tree and interface, and then adds it to the _score
metadata.
To make scoring compatible with evaluations I think we could create an expression evaluation tree that would map the expressions tree to their corresponding expression scoring tree. Then, we could use the EvalOperator to sum that evaluation to the _score
metadata.
I like more the idea of keeping expressions and scores separate. ScoringOperator
seems like a small price to pay in order to have that, and also has a single, well defined purpose.
|
||
Block[] blocks = new Block[page.getBlockCount()]; | ||
blocks[0] = page.getBlock(0); | ||
try (DoubleBlock evalScores = scorer.score(page); DoubleBlock existingScores = page.getBlock(1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to move forward with your "replace a scorable expression in a WHERE with evaluating it's score and then checking the score" then this'll have to be just like EVAL is - just adding a new column. So you'd not be able to combine the scores. In fact, you could use EVAL
to combine the scores later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using EVAL for the combination would also let you use AddDoublesEvaluator
without much trouble which would get you error handling (NaN and stuff) for free too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, for as "free" as the rest of the code has it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"replace a scorable expression in a WHERE with evaluating it's score and then checking the score" then this'll have to be just like EVAL is - just adding a new column.
Correct. A new column would be added for each expression to be scored, and then each expression could check its corresponding column for it being > 0.
So you'd not be able to combine the scores. In fact, you could use EVAL to combine the scores later.
The global score would need to be combined as a separate, global column. This is due to scores being part of other non-scorable expressions. The global column would resolve the scoring expression and would be the one added to the _score. But yes, it could be combined via EVAL.
I see this step more as an optimization, as it would build on the new scoring interface and scoring mappers.
…e-disjunctions-scoreing-operator # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java
…ses to avoid code duplication
…junctions-scoreing-operator' into enhancement/esql-score-disjunctions-scoreing-operator
@elasticmachine run elasticsearch-ci/part-4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest , it looks pretty good to me, I added some comments around the tests.
required_capability: metadata_score | ||
|
||
from books metadata _score | ||
| where length(title) > 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo? It seems like this query is the same as scoresNonPushableFunctions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a mistake - I corrected the test to actually use a pushable function in 7457544. Thanks!
@@ -284,6 +284,11 @@ public final void test() throws Throwable { | |||
"CSV tests cannot currently handle the _source field mapping directives", | |||
testCase.requiredCapabilities.contains(EsqlCapabilities.Cap.SOURCE_FIELD_MAPPING.capabilityName()) | |||
); | |||
assumeFalse( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we need to skip metadata_score
tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the scoring tests is that they cannot be done via CsvTests, that are not real integration tests and thus have not a Lucene implementation.
Up until now, every test in the scoring spec required a full text function capability (match, kql, qstr), which effectively disable CsvTests.
Right now we have added more tests that do not depend on full text functions (like scoring for pushable conditions). That's the reason why I needed to add this capability to CsvTests.
I've added some capabilities to the tests for completeness in e3e068c, but it's necessary to keep metadata_score
as a capability that disables CsvTests, which makes sense as Lucene is needed for scoring to work when we have pushed down functions.
@@ -1466,11 +1466,12 @@ public void testFullTextFunctionsDisjunctions() { | |||
private void checkWithFullTextFunctionsDisjunctions(String functionInvocation) { | |||
|
|||
// Disjunctions with non-pushable functions - scoring | |||
checkdisjunctionScoringError("1:35", functionInvocation + " or length(first_name) > 10"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checkdisjunctionScoringError
method can be safely removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Removed in e4758d3
…e-disjunctions-scoreing-operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest! LGTM. Just a couple of more things that I can think of.
If I remember it correctly, the search functions in ES|QL
are still under tech preview, we are free to change the scores later, as you mentioned the non-full text function will have no score. At some point, if we can include some score related examples in the docs that will be great.
I would inform Kibana with the ui label, as more limitation of the full text functions are removed, I'm not sure if any action need to be taken on Kibana side, but it is nice to just FYI.
Pinging @elastic/kibana-esql (ES|QL-ui) |
Thanks @fang-xing-esql for your review! 👍
Yes! We're already in conversations with the docs team about this. Expect a PR soon 🙂
Sure! Added the label. I'm not sure that any action was taken on the UI side for this, but it doesn't hurt. |
…e-disjunctions-scoreing-operator # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great job @carlosdelest
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
) (cherry picked from commit 2b40e73) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Adds scoring in FTFs disjunctions. This one takes a separate path for evaluating scores for expressions, creating a separate interface and Operator to retrieve them instead of doing it as part of expression evaluation (#121322)
Changes
LuceneQueryScoreEvaluator
that produces just scores. For now, it's a copy ofLuceneQueryExpressionEvaluator
that we can refactor afterwards.ExpressionScorer
class that provides the way for scoring expressions.ExpressionScoreMapper
interface that allows expressions that implement it to provide factories forExpressionScorer
.ScoreMapper
class that provides factories forExpressionScorer
. This is the same concept thatEvalMapper
but applied to scores, keeping those concerns separate. This class provides a default evaluation for scores, and delegates toExpressionScoreMapper
interfaces in case the expressions implement them.ScoreOperator
that retrieves scores from a filter expression via theScoreMapper
. This operator blends the scores retrieved from Lucene with the ones provided by the filter expressions.LocalExecutionMapper
adds aScoreOperator
on top of theFilterOperator
when scoring is active.Pros
Cons
View previous PoCs at #121153, #121322