Skip to content

ES|QL - Add scoring for full text functions disjunctions #121793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Feb 5, 2025

Adds scoring in FTFs disjunctions. This one takes a separate path for evaluating scores for expressions, creating a separate interface and Operator to retrieve them instead of doing it as part of expression evaluation (#121322)

Changes

  • A new LuceneQueryScoreEvaluator that produces just scores. For now, it's a copy of LuceneQueryExpressionEvaluator that we can refactor afterwards.
  • A new ExpressionScorer class that provides the way for scoring expressions.
  • A new ExpressionScoreMapper interface that allows expressions that implement it to provide factories for ExpressionScorer.
  • A ScoreMapper class that provides factories for ExpressionScorer. This is the same concept that EvalMapper but applied to scores, keeping those concerns separate. This class provides a default evaluation for scores, and delegates to ExpressionScoreMapper interfaces in case the expressions implement them.
  • A ScoreOperator that retrieves scores from a filter expression via the ScoreMapper. This operator blends the scores retrieved from Lucene with the ones provided by the filter expressions.
  • LocalExecutionMapper adds a ScoreOperator on top of the FilterOperator when scoring is active.

Pros

  • Separate logic for scoring and expression evaluation
  • The ScoreMapper / ExpressionScorer may be reused for other scoring mechanisms (score function, WHERE .. AS s1 construct)
  • It can be further optimized - scoring could be done before expression evaluation so expression evaluation uses scores as a way of understanding if expression is true or not

Cons

  • A separate "phase" for scoring, which is less efficient than doing evaluation / scoring in one go. This can be optimized afterwards, so evaluation can use scores for determining if a result is true or not.

View previous PoCs at #121153, #121322

@carlosdelest carlosdelest added >enhancement auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.19.0 v9.1.0 labels Feb 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach because most expressions won't need to worry about score. They can participate in a scorable expression and they just make 0.0. You can't cast a regular expression into something that quacks like a scorable expression. But AND, OR, MATCH, the other full text functions you can.

* Evaluates a tree of functions for every position in the block, resulting in a
* new block which is appended to the page.
*/
public class ScoreOperator extends AbstractPageMappingOperator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if you could adapt ExpressionScorer.Factory into an ExpressionEvaluator.Factory so you could use EvalOperator for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I see is that ExpressionEvaluators evaluate expressions - we've made scoring a separate concern from expressions in this PR, so it cannot be directly combined with expressions via evaluation.

That's what the ScoreOperator is for - it evaluates the scores using their own scoring tree and interface, and then adds it to the _score metadata.

To make scoring compatible with evaluations I think we could create an expression evaluation tree that would map the expressions tree to their corresponding expression scoring tree. Then, we could use the EvalOperator to sum that evaluation to the _score metadata.

I like more the idea of keeping expressions and scores separate. ScoringOperator seems like a small price to pay in order to have that, and also has a single, well defined purpose.


Block[] blocks = new Block[page.getBlockCount()];
blocks[0] = page.getBlock(0);
try (DoubleBlock evalScores = scorer.score(page); DoubleBlock existingScores = page.getBlock(1)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to move forward with your "replace a scorable expression in a WHERE with evaluating it's score and then checking the score" then this'll have to be just like EVAL is - just adding a new column. So you'd not be able to combine the scores. In fact, you could use EVAL to combine the scores later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using EVAL for the combination would also let you use AddDoublesEvaluator without much trouble which would get you error handling (NaN and stuff) for free too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, for as "free" as the rest of the code has it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"replace a scorable expression in a WHERE with evaluating it's score and then checking the score" then this'll have to be just like EVAL is - just adding a new column.

Correct. A new column would be added for each expression to be scored, and then each expression could check its corresponding column for it being > 0.

So you'd not be able to combine the scores. In fact, you could use EVAL to combine the scores later.

The global score would need to be combined as a separate, global column. This is due to scores being part of other non-scorable expressions. The global column would resolve the scoring expression and would be the one added to the _score. But yes, it could be combined via EVAL.

I see this step more as an optimization, as it would build on the new scoring interface and scoring mappers.

@carlosdelest carlosdelest changed the title [PoC 3] ES|QL - Add scoring for full text functions disjunctions via a ScoreMapper ES|QL - Add scoring for full text functions disjunctions via a ScoreMapper Mar 3, 2025
@carlosdelest carlosdelest changed the title ES|QL - Add scoring for full text functions disjunctions via a ScoreMapper ES|QL - Add scoring for full text functions disjunctions Mar 6, 2025
@carlosdelest
Copy link
Member Author

@elasticmachine run elasticsearch-ci/part-4

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @carlosdelest , it looks pretty good to me, I added some comments around the tests.

required_capability: metadata_score

from books metadata _score
| where length(title) > 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo? It seems like this query is the same as scoresNonPushableFunctions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a mistake - I corrected the test to actually use a pushable function in 7457544. Thanks!

@@ -284,6 +284,11 @@ public final void test() throws Throwable {
"CSV tests cannot currently handle the _source field mapping directives",
testCase.requiredCapabilities.contains(EsqlCapabilities.Cap.SOURCE_FIELD_MAPPING.capabilityName())
);
assumeFalse(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we need to skip metadata_score tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the scoring tests is that they cannot be done via CsvTests, that are not real integration tests and thus have not a Lucene implementation.

Up until now, every test in the scoring spec required a full text function capability (match, kql, qstr), which effectively disable CsvTests.

Right now we have added more tests that do not depend on full text functions (like scoring for pushable conditions). That's the reason why I needed to add this capability to CsvTests.

I've added some capabilities to the tests for completeness in e3e068c, but it's necessary to keep metadata_score as a capability that disables CsvTests, which makes sense as Lucene is needed for scoring to work when we have pushed down functions.

@@ -1466,11 +1466,12 @@ public void testFullTextFunctionsDisjunctions() {
private void checkWithFullTextFunctionsDisjunctions(String functionInvocation) {

// Disjunctions with non-pushable functions - scoring
checkdisjunctionScoringError("1:35", functionInvocation + " or length(first_name) > 10");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checkdisjunctionScoringError method can be safely removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Removed in e4758d3

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @carlosdelest! LGTM. Just a couple of more things that I can think of.

If I remember it correctly, the search functions in ES|QL are still under tech preview, we are free to change the scores later, as you mentioned the non-full text function will have no score. At some point, if we can include some score related examples in the docs that will be great.

I would inform Kibana with the ui label, as more limitation of the full text functions are removed, I'm not sure if any action need to be taken on Kibana side, but it is nice to just FYI.

@carlosdelest carlosdelest added the ES|QL-ui Impacts ES|QL UI label Mar 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@carlosdelest
Copy link
Member Author

Thanks @fang-xing-esql for your review! 👍

At some point, if we can include some score related examples in the docs that will be great.

Yes! We're already in conversations with the docs team about this. Expect a PR soon 🙂

I would inform Kibana with the ui label, as more limitation of the full text functions are removed, I'm not sure if any action need to be taken on Kibana side, but it is nice to just FYI.

Sure! Added the label. I'm not sure that any action was taken on the UI side for this, but it doesn't hurt.

…e-disjunctions-scoreing-operator

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
@carlosdelest carlosdelest enabled auto-merge (squash) March 11, 2025 13:24
Copy link
Contributor

@tteofili tteofili left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great job @carlosdelest

@carlosdelest carlosdelest merged commit 2b40e73 into elastic:main Mar 11, 2025
17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 121793

@carlosdelest
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x
9.0

Questions ?

Please refer to the Backport tool documentation

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request Mar 11, 2025
)

(cherry picked from commit 2b40e73)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this pull request Mar 13, 2025
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants