-
Notifications
You must be signed in to change notification settings - Fork 163
Description
Please, track first design draft and discussion in #1752
TL;DR
Current behavior:
def Optimize:
for node in PlanTree: # Traverse the Logical Plan Tree
for rule in rules: # Enumerate rules
tryApplyRule()
New behavior:
def Optimize:
for rule in rules: # Enumerate rules
for node in PlanTree: # Traverse the Logical Plan Tree
tryApplyRule()
No new features, all tests pass, nothing changed for the end-user.
Background
Currently each storage engine adds its own logical operator as concrete implementation for TableScanOperator
abstraction. Typically each data source needs to add 2 logical operators for table scan with without aggregation. Take OpenSearch for example, there are OpenSearchLogicalIndexScan
and OpenSearchLogicalIndexAgg
and a bunch of pushdown optimization rules for each accordingly.
class LogicalPlanOptimizer:
/*
* OpenSearch rules include:
* PUSH_DOWN_PAGE_SIZE
* PUSH_DOWN_FILTER
* PUSH_DOWN_AGGREGATION
* PUSH_DOWN_SORT
* PUSH_DOWN_HIGHLIGHT
* PUSH_DOWN_NESTED
* PUSH_DOWN_PROJECT
* PUSH_DOWN_LIMIT
*
* that return *OpenSearchLogicalIndexAgg*
* or *OpenSearchLogicalIndexScan* finally
*/
val rules: List<Rule>
def optimize(plan: LogicalPlan):
for rule in rules: # Enumerate rules
for node in plan: # Traverse the Logical Plan Tree
tryApplyRule()
Optimization Protocol
There are optimizaion guidelines which should be strictly followed to ensure that search query built completely matches user request.
1.
Optimizer
should apply rules in the strict order they are defined. For example, PUSH_DOWN_LIMIT
should be applied last.
Violation of that causes bugs, for example #1764, #1774, #1788.
Sample queries
- Pagination with
LIMIT
{
"query": "SELECT * from `calcs` limit 10",
"fetch_size": 3
}
TODO ❗ expected behavior once LIMIT
supported in pagination.
stateDiagram-v2
state "Before" as Before {
state "LogicalPaginate" as Paginate
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalRelation" as Relation
Paginate --> ProjectB
ProjectB --> LimitB
LimitB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> TableScanBuilder
}
NESTED
withLIMIT
SELECT nested(message.*) from nested-type limit 4
TODO ❗ expected behavior once #1764 fixed - need to reorder tree nodes
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalNested" as NestedB
state "LogicalLimit" as LimitB
state "LogicalRelation" as Relation
ProjectB --> NestedB
NestedB --> LimitB
LimitB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalNested" as NestedA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> NestedA
NestedA --> TableScanBuilder
}
2.
Optimizer should be able to apply a rule matching Something
even when plan tree has something in between of LogicalSomething
and TableScanBuilder
, unless exception specified. TableScanBuilder
could be wrapped by another tree node, for example in join implementation (see #1623).
Sample queries
PPL
withSORT
thenLIMIT
source=calcs | sort - int0 | head 10 | fields int0;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as Limit
state "LogicalSort" as Sort
state "LogicalRelation" as Relation
ProjectB --> Limit
Limit --> Sort
Sort --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
PPL
withLIMIT
thenSORT
source=calcs | head 10 | sort - int0 | fields int0;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalSort" as Sort
state "LogicalLimit" as Limit
state "LogicalRelation" as Relation
ProjectB --> Sort
Sort --> Limit
Limit --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
- An
SQL
query; likely plan trees of all SQL queries are always in the same order
select * from calcs where int0 > 0 order by int2 limit 10;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as Limit
state "LogicalSort" as Sort
state "LogicalFilter" as Filter
state "LogicalRelation" as Relation
ProjectB --> Limit
Limit --> Sort
Sort --> Filter
Filter --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
3.
A query might have multiple highlight
s backed by LogicalHighlight
(and filter
s and sort
s) - all of them should be pushed down.
A corresponding rule should be attempted multiple times.
Sample queries
- Multiple
highlight
s
SELECT highlight(Title), highlight(Body, pre_tags='<mark style="background-color: green;">', post_tags='</mark>') FROM
beer.stackexchange WHERE multi_match([Title, Body], 'IPA') ORDER BY Id LIMIT 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalHighlight" as Highlight1
state "LogicalHighlight" as Highlight2
state "LogicalLimit" as Limit
state "LogicalSort" as Sort
state "LogicalFilter" as Filter
state "LogicalRelation" as Relation
ProjectB --> Highlight1
Highlight1 --> Highlight2
Highlight2 --> Limit
Limit --> Sort
Sort --> Filter
Filter --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
- Multiple
filter
s
source=account | where age > 30 | where age < 35 | fields age;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1
state "LogicalFilter" as Filter2
state "LogicalRelation" as Relation
ProjectB --> Filter1
Filter1 --> Filter2
Filter2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
- Multiple
sort
s
source=account | sort age | sort lastname | head 20;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as Limit
state "LogicalSort" as Sort1
state "LogicalSort" as Sort2
state "LogicalRelation" as Relation
ProjectB --> Limit
Limit --> Sort1
Sort1 --> Sort2
Sort2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
4.
Dislike 3 most of the rules should be applied once only.
TODO ❗ optimization doesn't work correctly in that cases, #917
Sample queries
PPL
source=account | fields firstname, lastname | head 10;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalLimit" as LimitB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> LimitB
LimitB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalLimit" as LimitA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> LimitA
LimitA --> Project2A
Project2A --> TableScanBuilder
}
5.
PUSH_DOWN_PROJECT
should not happen if there is a LogicalEval
between LogicalProject
and TableScanBuilder
.
Sample queries
source=bank | eval f = abs(age) | fields f;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalEval" as EvalB
state "LogicalRelation" as Relation
ProjectB --> EvalB
EvalB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalEval" as EvalA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> EvalA
EvalA --> TableScanBuilder
}
6.
Similar to 5, LogicalWindow
in the plan tree between LogicalProject
and TableScanBuilder
should block PUSH_DOWN_PROJECT
operation.
Sample queries
SELECT avg(date0) OVER(PARTITION BY datetime1) from calcs;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalWindow" as WindowB
state "LogicalSort" as Sort
state "LogicalRelation" as Relation
ProjectB --> WindowB
WindowB --> Sort
Sort --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalWindow" as WindowA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> WindowA
WindowA --> TableScanBuilder
}
7.
Some push down operations could be rejected (e.g. pushDownWhatever
returns false), so corresponding LogicalSomething
node remains in the tree. Avoid infinite re-trying to apply a rule for that node.
Sample queries
OpenSearchIndexScanAggregationBuilder
rejectspushDownFilter
SELECT SUM(1) AS `cnt_` FROM calcs HAVING COUNT(2) > 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
OpenSearchIndexScanAggregationBuilder
rejectspushDownSort
andpushDownLimit
SELECT COUNT(*) FROM account GROUP BY age ORDER BY COUNT(*) LIMIT 5;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalSort" as SortB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> LimitB
LimitB --> SortB
SortB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalSort" as SortA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> SortA
SortA --> TableScanBuilder
}
8.
PUSH_DOWN_LIMIT
should be blocked if there is a LogicalSort
or LogicalFilter
between LogicalLimit
and TableScanBuilder
.
Sample queries
PUSH_DOWN_SORT
can't be performed due to implementation restrictions (#1471), soPUSH_DOWN_LIMIT
shouldn't be performed too.
SELECT CAST(balance AS FLOAT) AS jdbc_float_alias FROM account ORDER BY jdbc_float_alias LIMIT 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalSort" as SortB
state "LogicalRelation" as Relation
ProjectB --> LimitB
LimitB --> SortB
SortB --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalSort" as SortA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> SortA
SortA --> TableScanBuilder
}
OpenSearchIndexScanAggregationBuilder
rejectspushDownFilter
, soLogicalFilter
remains in the tree andPUSH_DOWN_LIMIT
shouldn't be performed.
SELECT gender from account GROUP BY gender HAVING count(*) > 5 LIMIT 1;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalLimit" as LimitB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> LimitB
LimitB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> FilterA
FilterA --> TableScanBuilder
}
9.
PUSH_DOWN_SORT
and PUSH_DOWN_FILTER
should be after PUSH_DOWN_AGGREGATION
if LogicalSort
or LogicalFilter
are on top of LogicalAggregation
, see rules in OpenSearchIndexScanAggregationBuilder
.
Sample queries
SQL
:LogicalFilter
is on top ofLogicalAggregation
SELECT gender from account GROUP BY gender HAVING count(*) > 500;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
PPL
:LogicalFilter
is on top ofLogicalAggregation
source=account | stats sum(balance) as a by state | where a > 780000;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as FilterB
state "LogicalAggregation" as Aggregation
state "LogicalRelation" as Relation
ProjectB --> FilterB
FilterB --> Aggregation
Aggregation --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
10.
PUSH_DOWN_FILTER
should be before PUSH_DOWN_AGGREGATION
if LogicalFilteris
under of LogicalAggregation
.
Sample queries
LogicalFilter
is underLogicalAggregation
SELECT gender from account WHERE age > 20 GROUP BY gender;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalAggregation" as Aggregation
state "LogicalFilter" as Filter
state "LogicalRelation" as Relation
ProjectB --> Aggregation
Aggregation --> Filter
Filter --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
11.
As combination of 9 and 10, PUSH_DOWN_FILTER
should be attempted before and after PUSH_DOWN_AGGREGATION
.
Sample queries
LogicalAggregation
surrounded by twoLogicalFilter
s
SELECT gender from account WHERE age > 20 GROUP BY gender HAVING count(*) > 80;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1B
state "LogicalAggregation" as Aggregation
state "LogicalFilter" as Filter2B
state "LogicalRelation" as Relation
ProjectB --> Filter1B
Filter1B --> Aggregation
Aggregation --> Filter2B
Filter2B --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalFilter" as FilterA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> FilterA
FilterA --> TableScanBuilder
}
12.
Subqueries: don’t PUSH_DOWN_AGGREGATION
, PUSH_DOWN_LIMIT
, PUSH_DOWN_FILTER
, PUSH_DOWN_SORT
for the outer query, only for inner one (under most bottom LogicalProject
).
Sample queries
- Aggregation
SELECT COUNT(*) FILTER(WHERE age > 35) FROM (SELECT * FROM bank) as a;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalAggregation" as AggergationB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> AggergationB
AggergationB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalAggregation" as AggergationA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> AggergationA
AggergationA --> Project2A
Project2A --> TableScanBuilder
}
- Filter
SELECT origin FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights) AS f WHERE f.price > 100;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalFilter" as FilterB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> FilterB
FilterB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalFilter" as FilterA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> FilterA
FilterA --> Project2A
Project2A --> TableScanBuilder
}
- Sort
SELECT origin FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights) AS f ORDER BY f.price;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalSort" as SortB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> SortB
SortB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalSort" as SortA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> SortA
SortA --> Project2A
Project2A --> TableScanBuilder
}
- Sort and Aggregation
SELECT Origin, MIN(AvgTicketPrice) FROM (SELECT * FROM flights) AS flights GROUP BY Origin ORDER BY MAX(AvgTicketPrice)
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalSort" as SortB
state "LogicalAggregation" as AggergationB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> SortB
SortB --> AggergationB
AggergationB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalSort" as SortA
state "LogicalAggregation" as AggergationA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> SortA
SortA --> AggergationA
AggergationA --> Project2A
Project2A --> TableScanBuilder
}
- Sort and Limit
SELECT price FROM (SELECT AvgTicketPrice AS price FROM flights LIMIT 10) AS flights ORDER BY price LIMIT 5, 5;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalLimit" as Limit1B
state "LogicalSort" as SortB
state "LogicalProject" as Project2B
state "LogicalLimit" as Limit2B
state "LogicalRelation" as Relation
Project1B --> Limit1B
Limit1B --> SortB
SortB --> Project2B
Project2B --> Limit2B
Limit2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalLimit" as LimitA
state "LogicalSort" as SortA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> LimitA
LimitA --> SortA
SortA --> Project2A
Project2A --> TableScanBuilder
}
❗ TODO add IT
❗ TODO nested, pagination, highlight, window
13.
Push down absolutely identical tree nodes (LogicalSort
, LogicalFilter
).
Sample queries
- Sort
SELECT FlightDelayMin, AvgTicketPrice, STDDEV_SAMP(AvgTicketPrice) OVER (ORDER BY FlightDelayMin) AS num FROM flights ORDER BY FlightDelayMin;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalWindow" as WindowB
state "LogicalSort" as Sort1
state "LogicalSort" as Sort2
state "LogicalRelation" as Relation
ProjectB --> WindowB
WindowB --> Sort1
Sort1 --> Sort2
Sort2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalWindow" as WindowA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> WindowA
WindowA --> TableScanBuilder
}
- Filter
source=account | where age > 38 | where age > 38 | fields firstname, age;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1
state "LogicalFilter" as Filter2
state "LogicalRelation" as Relation
ProjectB --> Filter1
Filter1 --> Filter2
Filter2 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> TableScanBuilder
}
❗ TBD merge them (remove duplicates) or push down (current behavior - push down)?
❗ TODO add ITs (includehighlight)
14.
Optimize subqueries for SQL
and complex queries in PPL
. ❗ TODO not implemented.
Sample queries
- Example from 4 - not optimized:
source=account | fields firstname, lastname | head 10;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalLimit" as LimitB
state "LogicalProject" as Project2B
state "LogicalRelation" as Relation
Project1B --> LimitB
LimitB --> Project2B
Project2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalLimit" as LimitA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> LimitA
LimitA --> Project2A
Project2A --> TableScanBuilder
}
- Relevance search with subquery:
SELECT *, highlight(origin), _score
FROM (SELECT Origin AS origin, AvgTicketPrice AS price FROM flights WHERE AvgTicketPrice > 100) AS f
WHERE score(origin = match_query('Base'));
This query fails because LogicalFilter
from outer query wasn’t pushed down, so V2 tried to do apply relevance search in memory.
UnsupportedOperationException: OpenSearch defined function [match_query] is only supported in WHERE and HAVING clause.
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as Project1B
state "LogicalHighlight" as Highlight
state "LogicalFilter" as Filter1B
state "LogicalProject" as Project2B
state "LogicalFilter" as Filter2B
state "LogicalRelation" as Relation
Project1B --> Highlight
Highlight --> Filter1B
Filter1B --> Project2B
Project2B --> Filter2B
Filter2B --> Relation
}
state "After" as After {
state "LogicalProject" as Project1A
state "LogicalFilter" as FilterA
state "LogicalProject" as Project2A
state "TableScanBuilder" as TableScanBuilder
Project1A --> FilterA
FilterA --> Project2A
Project2A --> TableScanBuilder
}
- Complex PPL query returns incorrect results
source=account | where age > 30 | head 1000 | sort +balance | where age < 40 | head 100 | sort -balance | where balance > 10000 | fields age;
stateDiagram-v2
state "Before" as Before {
state "LogicalProject" as ProjectB
state "LogicalFilter" as Filter1
state "LogicalSort" as Sort1
state "LogicalLimit" as Limit1
state "LogicalFilter" as Filter2
state "LogicalSort" as Sort2
state "LogicalLimit" as Limit2
state "LogicalFilter" as Filter3
state "LogicalRelation" as Relation
ProjectB --> Filter1
Filter1 --> Sort1
Sort1 --> Limit1
Limit1 --> Filter2
Filter2 --> Sort2
Sort2 --> Limit2
Limit2 --> Filter3
Filter3 --> Relation
}
state "After" as After {
state "LogicalProject" as ProjectA
state "LogicalLimit" as LimitA
state "TableScanBuilder" as TableScanBuilder
ProjectA --> LimitA
LimitA --> TableScanBuilder
}
Optimizer rule update
To satisfy requirement listed in 2, new format rule format was created. See example for PUSH_DOWN_FILTER
below; ...
matches any amount of Logical Plan Tree nodes of any types, except LogicalFilter
.
stateDiagram-v2
state "Old PUSH_DOWN_FILTER implementation" as OldFilter {
state "LogicalFilter" as FilterOld
state "ScanBuilder" as RelationOld
FilterOld --> RelationOld
}
state "New PUSH_DOWN_FILTER implementation" as NewFilter {
state "LogicalFilter" as FilterNew
state "..." as dots
state "ScanBuilder" as RelationNew
FilterNew --> dots
dots --> RelationNew
}
This new format was applied to PUSH_DOWN_PAGE_SIZE
, PUSH_DOWN_FILTER
, PUSH_DOWN_AGGREGATION
, PUSH_DOWN_FILTER
, PUSH_DOWN_SORT
, PUSH_DOWN_HIGHLIGHT
, PUSH_DOWN_NESTED
, PUSH_DOWN_PROJECT
and PUSH_DOWN_LIMIT
.
CreateTableScanBuilder
, CreateTableWriteBuilder
and Prometheum related rules are not changed.
stateDiagram-v2
state "CreateTableScanBuilder" as Builder {
state "LogicalRelation" as Scan
}
PushDownRule
class used to build PUSH_DOWN_*
rules. The class architecture follows:
classDiagram
class PushDownRule~T~ {
-Class~T~ clazz
-BiFunction pushDownFunction
-List~Function~ exceptions
-boolean canBeAppliedMultipleTimes*
-getDefaultException() Function
+PushDownRule(Class~T~, boolean, BiFunction)
+PushDownRule(Class~T~, boolean, BiFunction, Function)
+pattern()* Pattern~T~
+apply(T, Captures)* LogicalPlan
-findTableScanBuilder(LogicalPlan) Optional~TableScanBuilder~
}
The following rule configurations are created:
Rule | Tree Node Type | Can be applied multiple times | Push Down function | Exception |
---|---|---|---|---|
PUSH_DOWN_FILTER |
LogicalFilter |
true |
pushDownFilter |
LogicalAggregation , LogicalProject |
PUSH_DOWN_AGGREGATION |
LogicalAggregation |
false |
pushDownAggregation |
LogicalProject |
PUSH_DOWN_SORT |
LogicalSort |
true |
pushDownSort |
LogicalProject |
PUSH_DOWN_LIMIT |
LogicalLimit |
false |
pushDownLimit |
LogicalSort , LogicalFilter , LogicalProject |
PUSH_DOWN_PROJECT |
LogicalProject |
false |
pushDownProject |
LogicalEval , LogicalWindow |
PUSH_DOWN_HIGHLIGHT |
LogicalHighlight |
true |
pushDownHighlight |
none |
PUSH_DOWN_NESTED |
LogicalNested |
false |
pushDownNested |
none |
PUSH_DOWN_PAGE_SIZE |
LogicalPaginate |
false |
pushDownPageSize |
none |
Note that default exception is applied to all rules.
Rules are defined, checked and applied in the following order:
PUSH_DOWN_PAGE_SIZE
PUSH_DOWN_FILTER
PUSH_DOWN_AGGREGATION
PUSH_DOWN_FILTER
PUSH_DOWN_SORT
PUSH_DOWN_HIGHLIGHT
PUSH_DOWN_NESTED
PUSH_DOWN_PROJECT
PUSH_DOWN_LIMIT
To satisfy requirement 11, PUSH_DOWN_FILTER
is listed twice.
Code
Please, see code MVP PoC in Bit-Quill:dev-optimizer-rework
branch: https://github.com/opensearch-project/sql/compare/main..Bit-Quill:opensearch-project-sql:dev-optimizer-rework?expand=1
This code passes all IT and contains only Optimizer rework. There're no new features, so things marked as not implemented are still not implemented.
References
Current optimizer doc: https://github.com/opensearch-project/sql/blob/main/docs/dev/query-optimizer-improvement.md
Last optimizer update was done in #1091
Purpose
Proposed changes will unblock other fixes and features:
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Status