Description
Description
Meta ticket around the main implementation topics required for lookup join
-
add grammar / parser
LOOKUP JOIN
-
Make it execute correctly on data nodes, too, and enable csv tests
...OnTheDataNode
. See (reproducible) failed runs from ESQL: Fix lookup join output #117639 branch: -
ON
condition-
rewrite into JOIN equi-joinmaybe later - make
ON field
alias toUSING field
(syntax already used in enrich) - alternative is to deprecate this syntax - change index name grammar for identifier to source (to allow date math and others)
- enhance the field matching grammar against various declarations such as
parent.subfield
& co.
-
-
CSV tests (some suggestions to start with: Esql/lookup join grammar #116515 (comment))
- Cover primary use cases: Enhance LOOKUP JOIN csv-spec tests to cover more cases and fix several bugs found #117843
- Add tests for using timeseries indices in the
FROM
-
Known bugs
- Make
PushDownAndCombineLimits
work forJoin
s ESQL: LookupJoin fails to push down limits #117698 - Additional
null
rows ESQL: LOOKUP JOIN produces additional null rows #117702 - EsqlSession.fieldNames does not handle lookup references that are also mentioned in aliases (erases them)
- EsqlSession calls
preAnalyzeLookupIndices
asking for all the lookup index' fields all the time, c.f. Enhance LOOKUP JOIN csv-spec tests to cover more cases and fix several bugs found #117843 (comment) - Using an index alias makes it fail with a 500
- Filtering right side columns removes all data ESQL: Skip lookup fields when eliminating missing fields #118658
- using same lookup index in FROM breaks an internal assumption ESQL: using LOOKUP JOIN with same index as in FROM not working #118865
- duplicate rows in some specific cases ESQL lookup join with lookup index results in duplicate set of rows. #118852
- ESQL: LOOKUP JOIN and Sort: wrong optimizations or unplannable with "unknown physical plan node [OrderExec] #120817
- ESQL: No rows with JOIN/ENRICH/SAMPLE + EVAL + KEEP #120272
- Make
-
field resolution
- establish shadowing rules
- clarify in USING/ON pattern the rules around the join key which are following downstream (left vs right)
- make the other side fields nullable (this changes the equality for existing fields causing side-effects) create new fields or update the entire field declaration.
- check resolution exception to differentiate between 500 and 400 ESQL: Don't throw VerificationException on illegal JOIN state during resolution #118826
-
clarify output order rules in case of common keys (join keys and columns with same names) for consistency
- SQL JOIN:
- columns are kept in place and in lacking qualifiers, lead to ambiguation errors
- JOIN USING moves the join keys in front followed by the columns in table declaration order
- in ESQL the columns override each other:
- in EVAL, the latest declaration wins alongside its position - all previous declarations are hidden as if they were removed
- existing LOOKUP and INLINESTATS return left columns, join keys, right columns using EVAL rules for non-joining keys
- STATS BY x, returns grouping keys last (declaration order)
- proposal:
- a. return join keys first (like SQL first)
- b. return join keys last (declaration order, similar to STATS BY - clashes with USING)
- c. we use
ON
, which is not a thing in SQL, and have it behave similar to ENRICH: join columns inON
remain in place, right hand side columns are added on the right (like ENRICH) and shadow any left hand side columns with conflicting names.
- (c.f. output computation in Analyzer (Esql/lookup join grammar #116515 (comment)))
- We decided on Proposal c.
- SQL JOIN:
-
Filter pushdown (on the left/main branch) - see ESQL: Push down filter passed lookup join #118410
-
Fetch only the required fields (relates to field extraction + fieldNames), resp. prune irrelevant lookup index fields from the plan. ESQL: Prune lookup join cols #118808
-
Look whether the LookupJoin output could be computed inside the class ESQL: Fix lookup join output #117639
-
Resolve lookup table mappings - LOOKUP JOIN using field-caps for field mapping #117246
- Update lookup table mappings solution to support multiple join commands - Craig
-
Update verifier and add corresponding tests Esql/lookup join grammar #116515 (comment).
- disallow lookup joins on remote clusters/validate against CCQ
- disallow lookup joins on more than 1 join field
- Have validation for stuff that the grammar permits but we can't actually handle yet, like
LOOKUP JOIN idx ON a == b AND to_upper(c) LIKE d
-
Enhance tests
- Cover primary use cases: Enhance LOOKUP JOIN csv-spec tests to cover more cases and fix several bugs found #117843
- Add negative tests (missing indexes, indexes with wrong mode, unresolvable field references, etc.) ESQL: Small LOOKUP JOIN cleanups #117922
Compute
-
Make field extractor work with LookupOperators (keep field extraction in one place) -
Bifurcate join operator/allow other operators between join and Lookup source (eval, filter, etc..) -
Align LookupJoin and LookupJoinExec - the first allows branching, the second does not.
Update: decided to postpone all this until we either evolve the way we perform the actual lookup join, or allow more features like actual condition expressions.
Later
- Pushdown optimizations ESQL: LOOKUP JOIN push down optimizations #119082
Maybe
- obtain IndexMode inside IndexResolver (FieldCaps)
- ReplaceMissingFieldWithNull rule does not have access to SearchStats for lookup indices ESQL: SearchStats for lookup indices in LocalLogicalPlanOptimizer #118690
- refactor JoinConfig into general Expression
- rewrite into JOIN equi-join
- name qualifiers (for disambiguation) (look to descope)