ESQL - Remove restrictions for disjunctions in full text functions #118544

Merged
Changes from 1 commit
23 commits
0766c5d
Remove restrictions for disjunctions in full text functions
carlosdelest Dec 12, 2024
ab91f4c
Fix CSV test
carlosdelest Dec 12, 2024
6097a0d
Update docs/changelog/118544.yaml
carlosdelest Dec 12, 2024
68e88ef
Merge remote-tracking branch 'carlosdelest/enhancement/esql-match-dis…
carlosdelest Dec 12, 2024
c198d91
Fixing tests
carlosdelest Dec 12, 2024
c91cf7c
Refactor detection method for full text functions
carlosdelest Dec 12, 2024
47be335
Add capabilities to tests
carlosdelest Dec 12, 2024
3a9e468
Fix test
carlosdelest Dec 13, 2024
da32a06
Fix mixed cluster test
carlosdelest Dec 13, 2024
60221ba
Check that all elements on each disjunction side have full text funct…
carlosdelest Dec 13, 2024
191e710
Add CSV tests
carlosdelest Dec 13, 2024
9860680
Merge branch 'main' into enhancement/esql-match-disjunction-restrictions
carlosdelest Dec 13, 2024
1056909
Fix merge
carlosdelest Dec 13, 2024
b3e9a64
Fix tests
carlosdelest Dec 13, 2024
657c447
Merge branch 'refs/heads/main' into enhancement/esql-match-disjunctio…
carlosdelest Dec 17, 2024
6449d22
Fix merge
carlosdelest Dec 17, 2024
8901591
Simplify disjunction algorithm
carlosdelest Dec 17, 2024
d213d76
Checkstyle
carlosdelest Dec 17, 2024
4968e9e
Fix test
carlosdelest Dec 17, 2024
317f7d8
Merge branch 'main' into enhancement/esql-match-disjunction-restrictions
carlosdelest Dec 17, 2024
1cd80aa
Merge branch 'main' into enhancement/esql-match-disjunction-restrictions
carlosdelest Dec 17, 2024
d4789dd
Fix test
carlosdelest Dec 17, 2024
d3b5dce
Spotless
carlosdelest Dec 17, 2024
Remove restrictions for disjunctions in full text functions
carlosdelest committed Dec 12, 2024
commit 0766c5dd720bd241cd3673b81771f887ded125a7
@@ -115,6 +115,62 @@ book_no:keyword | title:text
7140 |The Lord of the Rings Poster Collection: Six Paintings by Alan Lee (No. 1)
;

matchWithDisjunction
required_capability: match_function

from books
| where match(author, "Vonnegut") or match(author, "Guinane")
| keep book_no, author;
ignoreOrder:true

book_no:keyword | author:text
2464 | Kurt Vonnegut
6970 | Edith Vonnegut
8956 | Kurt Vonnegut
3950 | Kurt Vonnegut
4382 | Carole Guinane
;

matchWithDisjunctionAndFiltersConjunction
required_capability: match_function

from books
| where (match(author, "Vonnegut") or match(author, "Guinane")) and year > 1997
| keep book_no, author, year;
ignoreOrder:true

book_no:keyword | author:text | year:integer
6970 | Edith Vonnegut | 1998
4382 | Carole Guinane | 2001
;

matchWithDisjunctionAndConjunction
required_capability: match_function

from books
| where (match(author, "Vonnegut") or match(author, "Marquez")) and match(description, "realism")
| keep book_no;

book_no:keyword
4814
;

matchWithDisjunctionIncludingConjunction
required_capability: match_function

from books
| where match(author, "Vonnegut") or (match(author, "Marquez") and match(description, "realism"))
| keep book_no;
ignoreOrder:true

book_no:keyword
2464
6970
4814
8956
3950
;

Member

Can we add a few more queries with nested AND/OR (in the CSV and YAML tests), similar to the one below, and perhaps make them a bit more complicated? That would help catch some surprises.

curl -u elastic:password -v -X POST "localhost:9200/_query?format=txt&pretty" -H 'Content-Type: application/json' -d'
{
  "query": "from books | where (match(name, \"Space\") and length(name) > 0) or (match(author, \"Neal\") and length(author) > 0)"
}'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "ql_illegal_argument_exception",
        "reason" : "Unsupported expression [match(name, \"Space\")]"
      }
    ],
    "type" : "ql_illegal_argument_exception",
    "reason" : "Unsupported expression [match(name, \"Space\")]"
  },
  "status" : 500
}

Contributor

perhaps we can add an IT which programmatically adds a random number of nested disjunctions?

Member Author

Thanks for checking this. I added some more tests, as this test case needed some code changes.

> perhaps we can add an IT which programmatically adds a random number of nested disjunctions?

Agreed, we need to change the tests for full text functions to include this kind of testing. Let's do that in a separate PR and issue.
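
As a starting point for that follow-up, here is a minimal sketch of how randomly nested AND/OR/NOT conditions over `match` could be generated. It is not part of this PR: the class name, the `TERMS` table and the `randomQuery` helper are hypothetical, and the field names are simply the ones that appear in the `books` CSV tests above. It assumes that only full text functions are combined, so every generated query should satisfy the relaxed verifier rule.

```java
import java.util.Random;

final class NestedDisjunctionQueries {
    // Hypothetical field/term pairs; the fields mirror the `books` index used in the CSV tests.
    private static final String[][] TERMS = {
        { "author", "Vonnegut" },
        { "author", "Marquez" },
        { "description", "realism" },
        { "title", "Rings" }
    };

    /** Builds a random boolean expression made only of match() calls, nested up to `depth` levels. */
    public static String randomCondition(Random random, int depth) {
        // Stop at a leaf when the depth budget is used up, or randomly one time in three.
        if (depth == 0 || random.nextInt(3) == 0) {
            String[] term = TERMS[random.nextInt(TERMS.length)];
            String match = "match(" + term[0] + ", \"" + term[1] + "\")";
            return random.nextBoolean() ? match : "not " + match;
        }
        String operator = random.nextBoolean() ? " or " : " and ";
        return "(" + randomCondition(random, depth - 1) + operator + randomCondition(random, depth - 1) + ")";
    }

    /** Wraps a random condition in a complete ES|QL query; the same seed always yields the same query. */
    public static String randomQuery(long seed, int depth) {
        return "from books | where " + randomCondition(new Random(seed), depth) + " | keep book_no";
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 3; seed++) {
            System.out.println(randomQuery(seed, 3));
        }
    }
}
```

Seeding the generator keeps a failing query reproducible. A real integration test would also need an oracle for the expected rows, which this sketch does not attempt.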

matchWithFunctionPushedToLucene
required_capability: match_function

@@ -102,6 +102,64 @@ book_no:keyword | title:text
7140 |The Lord of the Rings Poster Collection: Six Paintings by Alan Lee (No. 1)
;


matchWithDisjunction
required_capability: match_function
Contributor

We might need to add a new esql capability if the bwc tests fail 🤔

Member Author

@carlosdelest carlosdelest Dec 12, 2024

I was too optimistic...

image

Contributor

I love this ! 🤟


from books
| where author : "Vonnegut" or author : "Guinane"
| keep book_no, author;
ignoreOrder:true

book_no:keyword | author:text
2464 | Kurt Vonnegut
6970 | Edith Vonnegut
8956 | Kurt Vonnegut
3950 | Kurt Vonnegut
4382 | Carole Guinane
;

matchWithDisjunctionAndFiltersConjunction
required_capability: match_function

from books
| where (author : "Vonnegut" or author : "Guinane") and year > 1997
| keep book_no, author, year;
ignoreOrder:true

book_no:keyword | author:text | year:integer
6970 | Edith Vonnegut | 1998
4382 | Carole Guinane | 2001
;

matchWithDisjunctionAndConjunction
required_capability: match_function

from books
| where (author : "Vonnegut" or author : "Marquez") and description : "realism"
| keep book_no;

book_no:keyword
4814
;

matchWithDisjunctionIncludingConjunction
required_capability: match_function

from books
| where author : "Vonnegut" or (author : "Marquez" and description : "realism")
| keep book_no;
ignoreOrder:true

book_no:keyword
2464
6970
4814
8956
3950
;


matchWithFunctionPushedToLucene
required_capability: match_operator_colon

@@ -28,6 +28,7 @@
import org.elasticsearch.xpack.esql.core.expression.predicate.logical.Or;
import org.elasticsearch.xpack.esql.core.expression.predicate.operator.comparison.BinaryComparison;
import org.elasticsearch.xpack.esql.core.type.DataType;
import org.elasticsearch.xpack.esql.core.util.Holder;
import org.elasticsearch.xpack.esql.expression.function.UnsupportedAttribute;
import org.elasticsearch.xpack.esql.expression.function.aggregate.AggregateFunction;
import org.elasticsearch.xpack.esql.expression.function.aggregate.FilteredExpression;
@@ -773,35 +774,53 @@ private static void checkRemoteEnrich(LogicalPlan plan, Set<Failure> failures) {
      * @param typeNameProvider provider for the type name to add in the failure message
      * @param failures failures collection to add to
      */
-    private static void checkNotPresentInDisjunctions(
+    private static void checkFullTextSearchDisjunctions(
         Expression condition,
         java.util.function.Function<FullTextFunction, String> typeNameProvider,
         Set<Failure> failures
     ) {
         condition.forEachUp(Or.class, or -> {
-            checkNotPresentInDisjunctions(or.left(), or, typeNameProvider, failures);
-            checkNotPresentInDisjunctions(or.right(), or, typeNameProvider, failures);
+            boolean left = checkFullTextSearchInDisjunctions(or.left());
+            boolean right = checkFullTextSearchInDisjunctions(or.right());
+            if (left ^ right) {
+                Holder<String> elementName = new Holder<>();
+                if (right) {
+                    or.left().forEachDown(FullTextFunction.class, ftf -> elementName.set(typeNameProvider.apply(ftf)));
+                } else {
+                    or.right().forEachDown(FullTextFunction.class, ftf -> elementName.set(typeNameProvider.apply(ftf)));
+                }
+                failures.add(
+                    fail(
+                        or,
+                        "Invalid condition [{}]. {} can be used as part of an OR condition, "
+                            + "but only if other full text functions are used as part of the condition",
+                        or.sourceText(),
+                        elementName.get()
+                    )
+                );
+            }
         });
     }
 
     /**
-     * Checks whether a condition contains a disjunction with the specified typeToken. Adds to failure if it does.
+     * Checks whether an expression contains just full text functions or negations (NOT) and combinations (AND, OR) of full text functions
      *
-     * @param parentExpression parent expression to add to the failure message
-     * @param or disjunction that is being checked
-     * @param failures failures collection to add to
+     * @param expression parent expression to add to the failure message
+     * @return true if all children are full text functions or negations of full text functions, false otherwise
      */
-    private static void checkNotPresentInDisjunctions(
-        Expression parentExpression,
-        Or or,
-        java.util.function.Function<FullTextFunction, String> elementName,
-        Set<Failure> failures
-    ) {
-        parentExpression.forEachDown(FullTextFunction.class, ftp -> {
-            failures.add(
-                fail(or, "Invalid condition [{}]. {} can't be used as part of an or condition", or.sourceText(), elementName.apply(ftp))
-            );
-        });
+    private static boolean checkFullTextSearchInDisjunctions(Expression expression) {
+        if (expression instanceof FullTextFunction) {
+            return false;
+        } else if (expression instanceof Not || expression instanceof BinaryLogic) {
+            for (Expression child : expression.children()) {
+                if (checkFullTextSearchInDisjunctions(child)) {
+                    return true;
+                }
+            }
+            return false;
+        }
+
+        return true;
     }
 
     /**
@@ -871,7 +890,7 @@ private static void checkFullTextQueryFunctions(LogicalPlan plan, Set<Failure> f
                 m -> "[" + m.functionName() + "] " + m.functionType(),
                 failures
             );
-            checkNotPresentInDisjunctions(condition, ftf -> "[" + ftf.functionName() + "] " + ftf.functionType(), failures);
+            checkFullTextSearchDisjunctions(condition, ftf -> "[" + ftf.functionName() + "] " + ftf.functionType(), failures);
             checkFullTextFunctionsParents(condition, failures);
         } else {
             plan.forEachExpression(FullTextFunction.class, ftf -> {
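
To make the new rule easier to follow, here is a minimal, self-contained sketch of the same idea on a toy expression tree. The record types and method names are illustrative stand-ins (assumptions), not the ES|QL `Expression` classes, and the sketch only mirrors this first commit. As in the code of the hunk above, the helper returns true when a subtree contains something that is not a full text function, and an OR is flagged when exactly one of its sides does. It assumes Java 16 or later for records and pattern matching.

```java
import java.util.ArrayList;
import java.util.List;

final class DisjunctionRuleSketch {
    // Toy expression tree; these types are illustrative, not the ES|QL classes.
    public interface Expr {}
    public record FullText(String text) implements Expr {}
    public record Other(String text) implements Expr {}
    public record Not(Expr child) implements Expr {}
    public record And(Expr left, Expr right) implements Expr {}
    public record Or(Expr left, Expr right) implements Expr {}

    /** True when the subtree contains anything that is not a full text function. */
    public static boolean containsNonFullText(Expr e) {
        if (e instanceof FullText) {
            return false;
        } else if (e instanceof Not n) {
            return containsNonFullText(n.child());
        } else if (e instanceof And a) {
            return containsNonFullText(a.left()) || containsNonFullText(a.right());
        } else if (e instanceof Or o) {
            return containsNonFullText(o.left()) || containsNonFullText(o.right());
        }
        return true;
    }

    /** Collects every OR node where exactly one side contains a non full text element. */
    public static List<Or> invalidDisjunctions(Expr e) {
        List<Or> failures = new ArrayList<>();
        collect(e, failures);
        return failures;
    }

    private static void collect(Expr e, List<Or> failures) {
        if (e instanceof Not n) {
            collect(n.child(), failures);
        } else if (e instanceof And a) {
            collect(a.left(), failures);
            collect(a.right(), failures);
        } else if (e instanceof Or o) {
            collect(o.left(), failures);
            collect(o.right(), failures);
            if (containsNonFullText(o.left()) ^ containsNonFullText(o.right())) {
                failures.add(o);
            }
        }
    }
}
```

Under this sketch, `match or match` and `(match or match) and length(...)` produce no failures, while `match or length(...)` and `match or (match and length(...))` each produce one, which lines up with the verifier test cases added in this PR.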
@@ -1453,6 +1453,47 @@ private void checkWithDisjunctions(String functionName, String functionInvocatio
);
}

public void testFullTextFunctionsDisjunctions() {
checkWithFullTextFunctionsDisjunctions("MATCH", "match(last_name, \"Smith\")", "function");
checkWithFullTextFunctionsDisjunctions(":", "last_name : \"Smith\"", "operator");
checkWithFullTextFunctionsDisjunctions("QSTR", "qstr(\"last_name: Smith\")", "function");

assumeTrue("KQL function capability not available", EsqlCapabilities.Cap.KQL_FUNCTION.isEnabled());
checkWithFullTextFunctionsDisjunctions("KQL", "kql(\"last_name: Smith\")", "function");
}

private void checkWithFullTextFunctionsDisjunctions(String functionName, String functionInvocation, String functionType) {
passes("from test | where " + functionInvocation + " or match(first_name, \"Anna\")");
passes("from test | where " + functionInvocation + " or not match(first_name, \"Anna\")");
passes("from test | where (" + functionInvocation + " or match(first_name, \"Anna\")) and length(first_name) > 10");
passes("from test | where (" + functionInvocation + " or match(first_name, \"Anna\")) and match(last_name, \"Smith\")");
passes("from test | where " + functionInvocation + " or (match(first_name, \"Anna\") and match(last_name, \"Smith\"))");

assertEquals(
LoggerMessageFormat.format(
null,
"1:19: Invalid condition [{} or length(first_name) > 10]. [{}] {} can be used as part of an OR condition, "
+ "but only if other full text functions are used as part of the condition",
functionInvocation,
functionName,
functionType
),
error("from test | where " + functionInvocation + " or length(first_name) > 10")
);
assertEquals(
LoggerMessageFormat.format(
null,
"1:19: Invalid condition [{} or (match(last_name, \"Anneke\") and length(first_name) > 10)]."
+ " [{}] {} can be used as part of an OR condition, "
+ "but only if other full text functions are used as part of the condition",
functionInvocation,
functionName,
functionType
),
error("from test | where " + functionInvocation + " or (match(last_name, \"Anneke\") and length(first_name) > 10)")
);
}

public void testQueryStringFunctionWithNonBooleanFunctions() {
checkFullTextFunctionsWithNonBooleanFunctions("QSTR", "qstr(\"first_name: Anna\")", "function");
}
@@ -21,6 +21,7 @@
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.license.XPackLicenseState;
@@ -57,6 +58,7 @@
import org.elasticsearch.xpack.esql.plan.physical.EvalExec;
import org.elasticsearch.xpack.esql.plan.physical.ExchangeExec;
import org.elasticsearch.xpack.esql.plan.physical.FieldExtractExec;
import org.elasticsearch.xpack.esql.plan.physical.FilterExec;
import org.elasticsearch.xpack.esql.plan.physical.LimitExec;
import org.elasticsearch.xpack.esql.plan.physical.LocalSourceExec;
import org.elasticsearch.xpack.esql.plan.physical.PhysicalPlan;
@@ -1543,6 +1545,46 @@ public void testMultipleMatchFilterPushdown() {
assertThat(actualLuceneQuery.toString(), is(expectedLuceneQuery.toString()));
}

public void testFullTextFunctionsDisjunctionPushdown() {
String query = """
from test
| where (match(first_name, "Anna") or qstr("first_name: Anneke")) and last_name: "Smith"
| sort emp_no
""";
var plan = plannerOptimizer.plan(query);
var topNExec = as(plan, TopNExec.class);
var exchange = as(topNExec.child(), ExchangeExec.class);
var project = as(exchange.child(), ProjectExec.class);
var fieldExtract = as(project.child(), FieldExtractExec.class);
var actualLuceneQuery = as(fieldExtract.child(), EsQueryExec.class).query();
var expectedLuceneQuery = new BoolQueryBuilder().must(
new BoolQueryBuilder().should(new MatchQueryBuilder("first_name", "Anna").lenient(true))
.should(new QueryStringQueryBuilder("first_name: Anneke"))
).must(new MatchQueryBuilder("last_name", "Smith").lenient(true));
assertThat(actualLuceneQuery.toString(), is(expectedLuceneQuery.toString()));
}

public void testFullTextFunctionsDisjunctionWithFiltersPushdown() {
String query = """
from test
| where (first_name:"Anna" or first_name:"Anneke") and length(first_name) > 5
| sort emp_no
""";
var plan = plannerOptimizer.plan(query);
var topNExec = as(plan, TopNExec.class);
var exchange = as(topNExec.child(), ExchangeExec.class);
var project = as(exchange.child(), ProjectExec.class);
var fieldExtract = as(project.child(), FieldExtractExec.class);
var secondTopNExec = as(fieldExtract.child(), TopNExec.class);
var secondFieldExtract = as(secondTopNExec.child(), FieldExtractExec.class);
var filterExec = as(secondFieldExtract.child(), FilterExec.class);
var thirdFilterExtract = as(filterExec.child(), FieldExtractExec.class);
var actualLuceneQuery = as(thirdFilterExtract.child(), EsQueryExec.class).query();
var expectedLuceneQuery = new BoolQueryBuilder().should(new MatchQueryBuilder("first_name", "Anna").lenient(true))
.should(new MatchQueryBuilder("first_name", "Anneke").lenient(true));
assertThat(actualLuceneQuery.toString(), is(expectedLuceneQuery.toString()));
}

/**
* Expecting
* LimitExec[1000[INTEGER]]
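
The expected queries in the two pushdown tests above follow a simple shape: a conjunction becomes `must` clauses and a disjunction becomes `should` clauses of a bool query. The sketch below illustrates that mapping on a toy tree. The record types and the rendered string format are invented for illustration only, and it does not use or reproduce the Elasticsearch `BoolQueryBuilder` API or the actual ES|QL translation code. It assumes Java 16 or later.

```java
final class BoolQuerySketch {
    // Toy condition tree; a Leaf stands for one pushable full text query.
    public interface Expr {}
    public record Leaf(String query) implements Expr {}
    public record And(Expr left, Expr right) implements Expr {}
    public record Or(Expr left, Expr right) implements Expr {}

    /** Renders AND as must clauses and OR as should clauses, in an invented textual format. */
    public static String render(Expr e) {
        if (e instanceof Leaf l) {
            return l.query();
        } else if (e instanceof And a) {
            return "bool(must=[" + render(a.left()) + ", " + render(a.right()) + "])";
        } else if (e instanceof Or o) {
            return "bool(should=[" + render(o.left()) + ", " + render(o.right()) + "])";
        }
        throw new IllegalArgumentException("unsupported expression: " + e);
    }

    public static void main(String[] args) {
        // Same shape as the condition in testFullTextFunctionsDisjunctionPushdown.
        Expr condition = new And(
            new Or(new Leaf("match(first_name, Anna)"), new Leaf("qstr(first_name: Anneke)")),
            new Leaf("match(last_name, Smith)")
        );
        System.out.println(render(condition));
    }
}
```

In the second test, the `length(first_name) > 5` filter is not part of the expected Lucene query and the plan keeps a `FilterExec` for it, so only the disjunction of the two matches is pushed down.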