Joins and aggregations in a 
distributed NoSQL DB 
NoSQLmatters, Barcelona, 22 November 2014 
Max Neunhöffer 
www.arangodb.com
Documents and collections
{
  "_key": "123456",
  "_id": "chars/123456",
  "name": "Duck",
  "firstname": "Donald",
  "dob": "1934-11-13",
  "hobbies": ["Golf", "Singing", "Running"],
  "home": {"town": "Duck town", "street": "Lake Road", "number": 17},
  "species": "duck"
}
When I say “document”, I mean “JSON”.
A “collection” is a set of documents in a DB.
The DB can inspect the values, allowing for secondary indexes.
Or one can just treat the DB as a key/value store.
Sharding: the data of a collection is distributed between multiple servers.
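To make the difference between key/value access and a secondary index concrete, here is a minimal in-memory sketch. The `Collection` class, its method names and the choice of `species` as the indexed attribute are assumptions made for illustration only; they are not the database's API.

```python
from collections import defaultdict


class Collection:
    """Toy collection: primary lookup by _key plus one secondary index."""

    def __init__(self, indexed_field):
        self.by_key = {}                    # key/value view of the collection
        self.indexed_field = indexed_field  # attribute the DB "inspects"
        self.secondary = defaultdict(set)   # attribute value -> set of _key

    def insert(self, doc):
        self.by_key[doc["_key"]] = doc
        if self.indexed_field in doc:
            self.secondary[doc[self.indexed_field]].add(doc["_key"])

    def get(self, key):
        """Key/value access: the document is treated as an opaque value."""
        return self.by_key.get(key)

    def find(self, value):
        """Secondary-index access: look up documents by attribute value."""
        return [self.by_key[k] for k in sorted(self.secondary.get(value, ()))]


# Hypothetical sample data modelled on the slide's document.
chars = Collection("species")
chars.insert({"_key": "123456", "firstname": "Donald", "species": "duck"})
chars.insert({"_key": "123457", "firstname": "Mickey", "species": "mouse"})
```

With this sketch, `chars.get("123456")` never looks inside the document, whereas `chars.find("duck")` relies on the database having inspected the `species` value at insert time.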
Graphs
[Figure: a graph on the vertices A, B, C, D, E, F and G, with edges labelled "likes" and "hates"]
A graph consists of vertices and edges.
Graphs model relations and can be directed or undirected.
Vertices and edges are documents.
Every edge has a _from and a _to attribute.
The database offers queries and transactions dealing with graphs.
For example, paths in the graph are interesting.
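As a rough illustration of "edges are documents with _from and _to" and of why paths are interesting, the sketch below stores a few made-up edge documents and finds a shortest path by breadth-first search. The vertex ids, the `type` attribute and the function name are hypothetical; the database's own graph functions are not shown here.

```python
from collections import deque

# Hypothetical edge documents; only _from and _to are required by the model.
edges = [
    {"_from": "v/A", "_to": "v/B", "type": "likes"},
    {"_from": "v/B", "_to": "v/C", "type": "hates"},
    {"_from": "v/A", "_to": "v/D", "type": "likes"},
    {"_from": "v/D", "_to": "v/C", "type": "likes"},
    {"_from": "v/C", "_to": "v/E", "type": "likes"},
]


def shortest_path(edges, start, goal):
    """Breadth-first search that follows edges from _from to _to."""
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e["_from"], []).append(e["_to"])
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in outgoing.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no directed path exists
```

Because the edges are treated as directed, a path from `v/A` to `v/E` exists while the reverse direction yields `None`.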
Query 1 
Fetch all documents in a collection 
FOR p IN people 
RETURN p 
[ { "name": "Schmidt", "firstname": "Helmut", 
"hobbies": ["Smoking"]}, 
{ "name": "Neunhöffer", "firstname": "Max", 
"hobbies": ["Piano", "Golf"]}, 
... 
] 
(Actually, a cursor is returned.) 
Query 2 
Use filtering, sorting and limit
FOR p IN people 
FILTER p.age >= @minage 
SORT p.name, p.firstname 
LIMIT @nrlimit 
RETURN { name: CONCAT(p.name, ", ", p.firstname), 
age : p.age } 
[ { "name": "Neunhöffer, Max", "age": 44 }, 
{ "name": "Schmidt, Helmut", "age": 95 }, 
... 
] 
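For readers who do not know the query language, the following sketch mirrors the semantics of Query 2 in plain Python. It is only an analogy: the sample documents (including the ages) are made up, and `minage` and `nrlimit` stand in for the bind parameters `@minage` and `@nrlimit`.

```python
# Hypothetical sample data; only the attribute names come from the query.
people = [
    {"name": "Schmidt", "firstname": "Helmut", "age": 95},
    {"name": "Neunhöffer", "firstname": "Max", "age": 44},
    {"name": "Duck", "firstname": "Donald", "age": 12},
]


def query2(people, minage, nrlimit):
    """FILTER, then SORT, then LIMIT, then build the RETURN object."""
    kept = [p for p in people if p["age"] >= minage]          # FILTER
    kept.sort(key=lambda p: (p["name"], p["firstname"]))       # SORT
    return [
        {"name": f'{p["name"]}, {p["firstname"]}', "age": p["age"]}
        for p in kept[:nrlimit]                                # LIMIT
    ]
```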
Query 3 
Aggregation and functions 
FOR p IN people 
COLLECT a = p.age INTO L 
FILTER a >= @minage 
RETURN { "age": a, "number": LENGTH(L) } 
[ { "age": 18, "number": 10 }, 
{ "age": 19, "number": 17 }, 
{ "age": 20, "number": 12 }, 
... 
] 
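The grouping in Query 3 can be pictured as building one list per distinct age and then counting its members. The sketch below is a plain-Python analogy under that reading; the function name and sample ages are invented, and sorting the groups by age is an assumption chosen to match the output shown on the slide.

```python
from collections import defaultdict


def count_by_age(people, minage):
    groups = defaultdict(list)
    for p in people:                       # COLLECT a = p.age INTO L
        groups[p["age"]].append(p)
    return [
        {"age": a, "number": len(members)}  # LENGTH(L)
        for a, members in sorted(groups.items())
        if a >= minage                      # FILTER a >= @minage
    ]


# Hypothetical sample data.
sample_people = [{"age": a} for a in (18, 18, 19, 17, 20, 19, 18)]
```

Note that the filter is applied to the groups after collecting, exactly as in the query text, so groups below the minimum age are built and then discarded.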
Query 4 
Joins 
FOR p IN @@peoplecollection 
FOR h IN houses 
FILTER p._key == h.owner 
SORT h.streetname, h.housename 
RETURN { housename: h.housename, 
streetname: h.streetname, 
owner: p.name, 
value: h.value } 
[ { "housename": "Firlefanz", 
"streetname": "Meyer street", 
"owner": "Hans Schmidt", "value": 423000 
}, 
... 
] 
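Read literally, the two nested FOR loops with a FILTER describe a nested-loop join. The sketch below spells that out in Python; the sample people and houses are made up, and a real engine is free to execute the join differently (for example via an index on `owner`).

```python
def nested_loop_join(people, houses):
    rows = []
    for p in people:                          # FOR p IN @@peoplecollection
        for h in houses:                      # FOR h IN houses
            if p["_key"] == h["owner"]:       # FILTER p._key == h.owner
                rows.append({
                    "housename": h["housename"],
                    "streetname": h["streetname"],
                    "owner": p["name"],
                    "value": h["value"],
                })
    # SORT h.streetname, h.housename
    rows.sort(key=lambda r: (r["streetname"], r["housename"]))
    return rows


# Hypothetical sample data.
owners = [
    {"_key": "1", "name": "Hans Schmidt"},
    {"_key": "2", "name": "Max Neunhöffer"},
]
houses = [
    {"housename": "Firlefanz", "streetname": "Meyer street", "owner": "1", "value": 423000},
    {"housename": "Alpha", "streetname": "Lake Road", "owner": "2", "value": 100000},
    {"housename": "Orphan", "streetname": "Zoo Lane", "owner": "9", "value": 1},
]
joined = nested_loop_join(owners, houses)
```

The house whose owner key matches no person simply drops out, which is the inner-join behaviour the FILTER implies.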
Query 5 
Modifying data 
FOR e IN events 
FILTER e.timestamp < "2014-09-01T09:53+0200" 
INSERT e IN oldevents 
FOR e IN events 
FILTER e.timestamp < "2014-09-01T09:53+0200" 
REMOVE e._key IN events 
Query 6 
Graph queries 
FOR x IN GRAPH_SHORTEST_PATH( 
"routeplanner", "germanCity/Cologne", 
"frenchCity/Paris", {weight: "distance"} ) 
RETURN { begin : x.startVertex, 
end : x.vertex, 
distance : x.distance, 
nrPaths : LENGTH(x.paths) } 
[ { "begin": "germanCity/Cologne", 
"end" : {"_id": "frenchCity/Paris", ... }, 
"distance": 550, 
"nrPaths": 10 }, 
... 
]
Life of a query
Text and query parameters come from the user
Parse text, produce abstract syntax tree (AST)
Substitute query parameters
First optimisation: constant expressions, etc.
Translate AST into an execution plan (EXP)
Optimise one EXP, produce many, potentially better EXPs
Reason about distribution in cluster
Optimise distributed EXPs
Estimate costs for all EXPs, and sort by ascending cost
Instantiate “cheapest” plan, i.e. set up execution engine
Distribute and link up engines on different servers
Execute plan, provide cursor API
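Two of the steps above lend themselves to a tiny sketch: folding constant expressions in the AST, and picking the plan with the lowest estimated cost. The tuple-based AST encoding, the operator table and both function names are assumptions made for this illustration, not the engine's data structures.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Replace operator nodes whose operands are all constants by their value.

    A node is a number, a variable name (str), or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


def cheapest(plans, estimate_cost):
    """Sort candidate plans by ascending estimated cost and take the first."""
    return sorted(plans, key=estimate_cost)[0]
```

For example, `fold_constants(("+", "x", ("*", 2, 3)))` keeps the variable `x` but replaces the constant subtree by its value.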
Execution plans
Query → EXP
FOR a IN collA
LET xx = a.x
FOR b IN collB
FILTER xx == b.y
RETURN {x: a.x, z: b.z}
Singleton
EnumerateCollection a
Calculation xx
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Calc {x: a.x, z: b.z}
Return {x: a.x, z: b.z}
Black arrows are dependencies
Think of a pipeline
Each node provides a cursor API
Blocks of “Items” travel through the pipeline
What is an “item”?
Pipeline and items
Query            Plan node
                 Singleton               (items have no vars)
FOR a IN collA   EnumerateCollection a
LET xx = a.x     Calculation xx          (items have vars a, xx)
FOR b IN collB   EnumerateCollection b
Items are the thingies traveling through the pipeline.
An item holds the values of those variables in the current frame.
Thus: items look different in different parts of the plan.
We always deal with blocks of items for performance reasons.
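One way to picture the pipeline is as a chain of Python generators, each pulling blocks of items from the node it depends on. This is only a sketch under that analogy: the function names, the dict-per-item representation and the block size of 2 are assumptions, not the engine's implementation.

```python
def singleton():
    """Produce one block holding a single item with no variables."""
    yield [{}]


def enumerate_collection(upstream, var, collection, block_size=2):
    """For every incoming item, emit one item per document, in blocks."""
    block = []
    for in_block in upstream:
        for item in in_block:
            for doc in collection:
                block.append({**item, var: doc})   # item gains variable `var`
                if len(block) == block_size:
                    yield block
                    block = []
    if block:
        yield block


def calculation(upstream, var, func):
    """Add a computed variable to every item of every block."""
    for in_block in upstream:
        yield [{**item, var: func(item)} for item in in_block]


# FOR a IN collA  LET xx = a.x   (hypothetical collection contents)
coll_a = [{"x": 1}, {"x": 2}, {"x": 3}]
plan = calculation(
    enumerate_collection(singleton(), "a", coll_a),
    "xx",
    lambda item: item["a"]["x"],
)
blocks = list(plan)
```

The item coming out of `singleton` is an empty dict, while items leaving `calculation` carry both `a` and `xx`, which is the sense in which items look different in different parts of the plan.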
Move filters up
FOR a IN collA 
FOR b IN collB 
FILTER a.x == 10 
FILTER a.u == b.v 
RETURN {u:a.u,w:b.w} 
Singleton 
EnumColl a 
EnumColl b 
Calc a.x == 10 
Filter a.x == 10 
Calc a.u == b.v 
Filter a.u == b.v 
Return {u:a.u,w:b.w} 
Move filters up
FOR a IN collA
FILTER a.x == 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour do not change if the first FILTER is pulled out of the inner FOR.
However, the number of items travelling in the pipeline is decreased.
Note that the two FOR statements could be interchanged!
Singleton
EnumColl a
Calc a.x == 10
Filter a.x == 10
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {u:a.u,w:b.w}
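The effect of moving the filter can be checked with a small counting experiment. The sketch below uses the condition `a.x == 10` from the plan nodes, made-up collection contents, and a counter for how many items reach the inner part of the pipeline; the function names are hypothetical.

```python
def filters_inside(coll_a, coll_b):
    """Both FILTERs inside the inner FOR: every (a, b) pair travels down."""
    travelled, out = 0, []
    for a in coll_a:
        for b in coll_b:
            travelled += 1
            if a["x"] == 10 and a["u"] == b["v"]:
                out.append({"u": a["u"], "w": b["w"]})
    return out, travelled


def filter_moved_up(coll_a, coll_b):
    """First FILTER pulled out: only matching a's enter the inner FOR."""
    travelled, out = 0, []
    for a in coll_a:
        if a["x"] != 10:
            continue
        for b in coll_b:
            travelled += 1
            if a["u"] == b["v"]:
                out.append({"u": a["u"], "w": b["w"]})
    return out, travelled


# Hypothetical sample data.
coll_a = [{"x": 10, "u": 1}, {"x": 5, "u": 2}, {"x": 10, "u": 3}]
coll_b = [{"v": 1, "w": "p"}, {"v": 2, "w": "q"}, {"v": 3, "w": "r"}]
before = filters_inside(coll_a, coll_b)
after = filter_moved_up(coll_a, coll_b)
```

Both variants return the same rows, but in this sample the second one sends 6 instead of 9 items through the inner loop.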
Remove unnecessary calculations 
FOR a IN collA 
LET L = LENGTH(a.hobbies) 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {h:a.hobbies,w:b.w} 
Singleton 
EnumColl a 
Calc L = ... 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {...} 
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The calculation of L is unnecessary: its value is never used, and it cannot throw an exception.
Therefore we can just leave it out.
Singleton
EnumColl a
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
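This rule can be sketched as a backward pass over the plan that tracks which variables are still needed. The node encoding below (dicts with `type`, `out`, `uses` and an optional `can_throw` flag) is an assumption made for illustration; the real optimiser's node classes are not shown in the slides.

```python
def remove_unused_calculations(plan):
    """Drop Calc nodes whose output is never read and that cannot throw."""
    kept = []
    needed = set()
    for node in reversed(plan):          # walk from Return back to Singleton
        is_dead = (
            node["type"] == "Calc"
            and node["out"] not in needed
            and not node.get("can_throw", False)
        )
        if not is_dead:
            kept.append(node)
            needed |= node.get("uses", set())
    kept.reverse()
    return kept


# Hypothetical encoding of the plan from the slide; "cond" names a.u == b.v.
plan = [
    {"type": "Singleton"},
    {"type": "EnumColl", "out": "a"},
    {"type": "Calc", "out": "L", "uses": {"a"}},
    {"type": "EnumColl", "out": "b"},
    {"type": "Calc", "out": "cond", "uses": {"a", "b"}},
    {"type": "Filter", "uses": {"cond"}},
    {"type": "Return", "uses": {"a", "b"}},
]
optimised = remove_unused_calculations(plan)
```

The `can_throw` check matters because a calculation that might raise an error has an observable effect even when its value is unused, so it must stay in the plan.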
Use index for FILTER and SORT 
FOR a IN collA 
FILTER a.x > 17 && 
a.x <= 23 && 
a.y == 10 
SORT a.y, a.x 
RETURN a 
Singleton 
EnumColl a 
Calc ... 
Filter ... 
Sort a.y, a.x 
Return a 
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y” and “x” (in this order). Then we can read off the half-open interval between { y: 10, x: 17 } (excluded) and { y: 10, x: 23 } (included) from the skiplist index.
The result will automatically be sorted by y and then by x, so the Sort node is no longer needed.
Singleton
IndexRange a
Return a
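The range read can be imitated with a sorted list and binary search. In this sketch a Python list sorted by `(y, x)` stands in for the skiplist index, which is an assumption for illustration; the sample documents and the function name are made up.

```python
import bisect

# Hypothetical documents in collA.
docs = [
    {"y": 10, "x": 17}, {"y": 10, "x": 23}, {"y": 9, "x": 20},
    {"y": 10, "x": 18}, {"y": 11, "x": 19}, {"y": 10, "x": 24},
    {"y": 10, "x": 21},
]

# The "index": documents ordered by (y, x), plus the matching key list.
entries = sorted(docs, key=lambda d: (d["y"], d["x"]))
keys = [(d["y"], d["x"]) for d in entries]


def index_range(y, x_above, x_up_to):
    """Return documents with the given y and x_above < x <= x_up_to."""
    lo = bisect.bisect_right(keys, (y, x_above))   # skip keys <= (y, x_above)
    hi = bisect.bisect_right(keys, (y, x_up_to))   # include keys == (y, x_up_to)
    return entries[lo:hi]


result = index_range(10, 17, 23)
```

The slice comes back already ordered by `y` and then `x`, which is why no separate sort step is required afterwards.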
Data distribution in a cluster
[Figure: requests arrive at two Coordinators, which sit in front of three DBservers holding the numbered shards]
The shards of a collection are distributed across the DB servers.
The coordinators receive queries and organise their execution.
Scatter/gather
[Figure: in the cluster plan, the single EnumerateCollection node becomes a Scatter node feeding three parallel branches of Remote, EnumShard, Remote (one per shard), whose results are gathered by a Concat/Merge node]
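A rough way to think about the gather step: each shard is enumerated independently, and the coordinator either concatenates the partial results or merges them if they arrive sorted. The sketch below simulates this in one process; the shard contents and function names are hypothetical, and no real network communication is involved.

```python
import heapq

# Hypothetical shard contents on three DBservers.
shards = {
    "s1": [{"name": "Schmidt"}, {"name": "Duck"}],
    "s2": [{"name": "Neunhöffer"}],
    "s3": [{"name": "Maus"}, {"name": "Gans"}],
}


def enum_shard(docs):
    """Stands in for EnumShard on a DBserver; here it returns sorted output."""
    return sorted(docs, key=lambda d: d["name"])


def scatter_gather(shards, merge_sorted=False):
    partial = [enum_shard(docs) for docs in shards.values()]   # scatter
    if merge_sorted:
        # Merge: combine already-sorted streams without re-sorting everything.
        return list(heapq.merge(*partial, key=lambda d: d["name"]))
    # Concat: simply append the partial results one after another.
    return [d for part in partial for d in part]


merged = scatter_gather(shards, merge_sorted=True)
concatenated = scatter_gather(shards)
```

Merging keeps a global order at the cost of comparing stream heads, while concatenation is cheaper when the order of results does not matter.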
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query is executed on the DBservers.
To this end, we either scatter the items to all DBservers or, if possible, distribute each item to the shard that is responsible for the modification.
Sometimes, we can even optimise away a gather/scatter combination and parallelise completely.
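The difference between scattering and distributing items can be sketched by counting how many item copies are sent. The hash-based `responsible_shard` function below (CRC32 modulo the shard count) is purely an assumption for illustration; the slides do not say how the responsible shard is determined.

```python
import zlib


def responsible_shard(key, num_shards):
    """Hypothetical placement rule: stable hash of the key modulo shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def scatter(items, num_shards):
    """Every shard receives every item and must pick out its own."""
    return {s: list(items) for s in range(num_shards)}


def distribute(items, num_shards):
    """Each item goes only to the shard responsible for it."""
    out = {s: [] for s in range(num_shards)}
    for item in items:
        out[responsible_shard(item["_key"], num_shards)].append(item)
    return out


# Hypothetical items to be modified.
items = [{"_key": str(k)} for k in range(100, 110)]
sent_scatter = sum(len(v) for v in scatter(items, 3).values())
sent_distribute = sum(len(v) for v in distribute(items, 3).values())
```

With three shards, scattering sends three copies of every item, whereas distributing sends each item exactly once.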

More Related Content

PDF
Understanding Graph Databases with Neo4j and Cypher
Ruhaim Izmeth
 
KEY
Hadoop london
Yahoo Developer Network
 
PDF
What is the best full text search engine for Python?
Andrii Soldatenko
 
PPTX
La recherche sur internet, l'E-réputation
Bernard André
 
PDF
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
Ruby Meditation
 
KEY
MongoDB Aggregation Framework
Tyler Brock
 
PDF
MongoDB Aggregation Framework
Caserta
 
PPTX
Java Performance Tips (So Code Camp San Diego 2014)
Kai Chan
 
Understanding Graph Databases with Neo4j and Cypher
Ruhaim Izmeth
 
What is the best full text search engine for Python?
Andrii Soldatenko
 
La recherche sur internet, l'E-réputation
Bernard André
 
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
Ruby Meditation
 
MongoDB Aggregation Framework
Tyler Brock
 
MongoDB Aggregation Framework
Caserta
 
Java Performance Tips (So Code Camp San Diego 2014)
Kai Chan
 

What's hot (7)

PDF
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Kai Chan
 
PDF
Search Engine-Building with Lucene and Solr
Kai Chan
 
PDF
Regular Expressions in Google Analytics
Shivani Singh
 
PPTX
Google code search
mona zavichi tork
 
PDF
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Kai Chan
 
PPTX
TextMining with R
Aleksei Beloshytski
 
PDF
Aggregation Framework MongoDB Days Munich
Norberto Leite
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Kai Chan
 
Search Engine-Building with Lucene and Solr
Kai Chan
 
Regular Expressions in Google Analytics
Shivani Singh
 
Google code search
mona zavichi tork
 
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Kai Chan
 
TextMining with R
Aleksei Beloshytski
 
Aggregation Framework MongoDB Days Munich
Norberto Leite
 
Ad

Similar to Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL matters Barcelona 2014 (20)

PDF
Complex queries in a distributed multi-model database
Max Neunhöffer
 
PPTX
Powerful Analysis with the Aggregation Pipeline
MongoDB
 
PPTX
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
MongoDB
 
PDF
Doing More with MongoDB Aggregation
MongoDB
 
PPTX
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
MongoDB
 
PDF
06. ElasticSearch : Mapping and Analysis
OpenThink Labs
 
PPTX
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas
 
PPTX
The Aggregation Framework
MongoDB
 
PDF
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Paul Leclercq
 
PDF
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
PDF
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
PDF
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
PDF
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
PDF
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB
 
PDF
01 ElasticSearch : Getting Started
OpenThink Labs
 
PPTX
Python crush course
Mohammed El Rafie Tarabay
 
PDF
Dynamic languages, for software craftmanship group
Reuven Lerner
 
PPT
A Survey Of R Graphics
Dataspora
 
PPTX
Webinar: Strongly Typed Languages and Flexible Schemas
MongoDB
 
Complex queries in a distributed multi-model database
Max Neunhöffer
 
Powerful Analysis with the Aggregation Pipeline
MongoDB
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
MongoDB
 
Doing More with MongoDB Aggregation
MongoDB
 
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
MongoDB
 
06. ElasticSearch : Mapping and Analysis
OpenThink Labs
 
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas
 
The Aggregation Framework
MongoDB
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Paul Leclercq
 
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB
 
01 ElasticSearch : Getting Started
OpenThink Labs
 
Python crush course
Mohammed El Rafie Tarabay
 
Dynamic languages, for software craftmanship group
Reuven Lerner
 
A Survey Of R Graphics
Dataspora
 
Webinar: Strongly Typed Languages and Flexible Schemas
MongoDB
 
Ad

More from NoSQLmatters (20)

PDF
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
NoSQLmatters
 
PDF
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
NoSQLmatters
 
PDF
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
NoSQLmatters
 
PDF
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
NoSQLmatters
 
PDF
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
NoSQLmatters
 
PDF
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
NoSQLmatters
 
PDF
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
NoSQLmatters
 
PDF
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
NoSQLmatters
 
PDF
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
NoSQLmatters
 
PDF
Chris Ward - Understanding databases for distributed docker applications - No...
NoSQLmatters
 
PDF
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
NoSQLmatters
 
PDF
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
NoSQLmatters
 
PDF
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
NoSQLmatters
 
PDF
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
NoSQLmatters
 
PDF
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
NoSQLmatters
 
PDF
David Pilato - Advance search for your legacy application - NoSQL matters Par...
NoSQLmatters
 
PDF
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
NoSQLmatters
 
PDF
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
NoSQLmatters
 
PDF
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
NoSQLmatters
 
PDF
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
NoSQLmatters
 
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
NoSQLmatters
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
NoSQLmatters
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
NoSQLmatters
 
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
NoSQLmatters
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
NoSQLmatters
 
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
NoSQLmatters
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
NoSQLmatters
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
NoSQLmatters
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
NoSQLmatters
 
Chris Ward - Understanding databases for distributed docker applications - No...
NoSQLmatters
 
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
NoSQLmatters
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
NoSQLmatters
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
NoSQLmatters
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
NoSQLmatters
 
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
NoSQLmatters
 
David Pilato - Advance search for your legacy application - NoSQL matters Par...
NoSQLmatters
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
NoSQLmatters
 
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
NoSQLmatters
 
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
NoSQLmatters
 
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
NoSQLmatters
 

Recently uploaded (20)

PPTX
Fuzzy_Membership_Functions_Presentation.pptx
pythoncrazy2024
 
PPTX
Employee Salary Presentation.l based on data science collection of data
barridevakumari2004
 
PPTX
Introduction to Data Analytics and Data Science
KavithaCIT
 
PDF
D9110.pdfdsfvsdfvsdfvsdfvfvfsvfsvffsdfvsdfvsd
minhn6673
 
PPTX
Introduction to computer chapter one 2017.pptx
mensunmarley
 
PDF
Research about a FoodFolio app for personalized dietary tracking and health o...
AustinLiamAndres
 
PPTX
World-population.pptx fire bunberbpeople
umutunsalnsl4402
 
PDF
TIC ACTIVIDAD 1geeeeeeeeeeeeeeeeeeeeeeeeeeeeeer3.pdf
Thais Ruiz
 
PDF
Chad Readey - An Independent Thinker
Chad Readey
 
PDF
Mastering Financial Analysis Materials.pdf
SalamiAbdullahi
 
PDF
The_Future_of_Data_Analytics_by_CA_Suvidha_Chaplot_UPDATED.pdf
CA Suvidha Chaplot
 
PDF
An Uncut Conversation With Grok | PDF Document
Mike Hydes
 
PPTX
Future_of_AI_Presentation for everyone.pptx
boranamanju07
 
PPTX
Probability systematic sampling methods.pptx
PrakashRajput19
 
PDF
202501214233242351219 QASS Session 2.pdf
lauramejiamillan
 
PPTX
Data-Driven Machine Learning for Rail Infrastructure Health Monitoring
Sione Palu
 
PPTX
IP_Journal_Articles_2025IP_Journal_Articles_2025
mishell212144
 
PDF
Blue Futuristic Cyber Security Presentation.pdf
tanvikhunt1003
 
PPTX
Multiscale Segmentation of Survey Respondents: Seeing the Trees and the Fores...
Sione Palu
 
PDF
Key_Statistical_Techniques_in_Analytics_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
Fuzzy_Membership_Functions_Presentation.pptx
pythoncrazy2024
 
Employee Salary Presentation.l based on data science collection of data
barridevakumari2004
 
Introduction to Data Analytics and Data Science
KavithaCIT
 
D9110.pdfdsfvsdfvsdfvsdfvfvfsvfsvffsdfvsdfvsd
minhn6673
 
Introduction to computer chapter one 2017.pptx
mensunmarley
 
Research about a FoodFolio app for personalized dietary tracking and health o...
AustinLiamAndres
 
World-population.pptx fire bunberbpeople
umutunsalnsl4402
 
TIC ACTIVIDAD 1geeeeeeeeeeeeeeeeeeeeeeeeeeeeeer3.pdf
Thais Ruiz
 
Chad Readey - An Independent Thinker
Chad Readey
 
Mastering Financial Analysis Materials.pdf
SalamiAbdullahi
 
The_Future_of_Data_Analytics_by_CA_Suvidha_Chaplot_UPDATED.pdf
CA Suvidha Chaplot
 
An Uncut Conversation With Grok | PDF Document
Mike Hydes
 
Future_of_AI_Presentation for everyone.pptx
boranamanju07
 
Probability systematic sampling methods.pptx
PrakashRajput19
 
202501214233242351219 QASS Session 2.pdf
lauramejiamillan
 
Data-Driven Machine Learning for Rail Infrastructure Health Monitoring
Sione Palu
 
IP_Journal_Articles_2025IP_Journal_Articles_2025
mishell212144
 
Blue Futuristic Cyber Security Presentation.pdf
tanvikhunt1003
 
Multiscale Segmentation of Survey Respondents: Seeing the Trees and the Fores...
Sione Palu
 
Key_Statistical_Techniques_in_Analytics_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL matters Barcelona 2014

  • 1. Joins and aggregations in a distributed NoSQL DB NoSQLmatters, Barcelona, 22 November 2014 Max Neunhöffer www.arangodb.com
  • 7. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. Sharding: the data of a collection is distributed between multiple servers.
  • 14. Graphs [Figure: a graph with the vertices A to G and the edge labels "likes" and "hates".] A graph consists of vertices and edges. Graphs model relations and can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. For example, paths in the graph are interesting.
  • 17. Query 1 Fetch all documents in a collection FOR p IN people RETURN p Result: [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] (Actually, a cursor is returned.)
  • 19. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } Result: [ { "name": "Neunhöffer, Max", "age": 44 }, { "name": "Schmidt, Helmut", "age": 95 }, ... ]
  • 21. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } Result: [ { "age": 18, "number": 10 }, { "age": 19, "number": 17 }, { "age": 20, "number": 12 }, ... ]
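As a rough illustration of what Query 3 computes, the following Python sketch groups an in-memory list by age and counts each group. The sample `people` list and the function name `count_by_age` are made up for this example. It mirrors the order of the AQL clauses (group first, then filter on the group key) and makes no claim about how the database evaluates COLLECT internally.

```python
from itertools import groupby

# Hypothetical sample data standing in for the "people" collection.
people = [
    {"name": "A", "age": 18},
    {"name": "B", "age": 19},
    {"name": "C", "age": 18},
    {"name": "D", "age": 17},
]


def count_by_age(docs, minage):
    """Mimic: COLLECT a = p.age INTO L  FILTER a >= @minage  RETURN {...}."""
    result = []
    by_age = sorted(docs, key=lambda p: p["age"])  # groupby needs sorted input
    for a, group in groupby(by_age, key=lambda p: p["age"]):
        members = list(group)      # plays the role of L
        if a >= minage:            # FILTER on the group key
            result.append({"age": a, "number": len(members)})
    return result


print(count_by_age(people, 18))
```

The person aged 17 forms a group of its own and is then dropped by the filter, just as in the AQL version.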
  • 23. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } Result: [ { "housename": "Firlefanz", "streetname": "Meyer street", "owner": "Hans Schmidt", "value": 423000 }, ... ]
  • 24. Query 5 Modifying data (two queries) First: FOR e IN events FILTER e.timestamp < "2014-09-01T09:53+0200" INSERT e IN oldevents Second: FOR e IN events FILTER e.timestamp < "2014-09-01T09:53+0200" REMOVE e._key IN events
  • 26. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } Result: [ { "begin": "germanCity/Cologne", "end" : {"_id": "frenchCity/Paris", ... }, "distance": 550, "nrPaths": 10 }, ... ]
  • 38. Life of a query Text and query parameters come from the user. Parse text, produce abstract syntax tree (AST). Substitute query parameters. First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP). Optimise one EXP, produce many, potentially better EXPs. Reason about distribution in cluster. Optimise distributed EXPs. Estimate costs for all EXPs, and sort by ascending cost. Instantiate “cheapest” plan, i.e. set up execution engine. Distribute and link up engines on different servers. Execute plan, provide cursor API.
  • 44. Execution plans Query: FOR a IN collA LET xx = a.x FOR b IN collB FILTER xx == b.y RETURN {x: a.x, z: b.z} Query → EXP, plan nodes in pipeline order: Singleton; EnumerateCollection a; Calculation xx; EnumerateCollection b; Calculation xx == b.y; Filter xx == b.y; Calc {x: a.x, z: b.z}; Return {x: a.x, z: b.z}. Black arrows are dependencies. Think of a pipeline. Each node provides a cursor API. Blocks of “Items” travel through the pipeline. What is an “item”???
  • 48. Pipeline and items [Figure: plan nodes next to the query clauses they come from: Singleton; EnumerateCollection a (FOR a IN collA); Calculation xx (LET xx = a.x); EnumerateCollection b (FOR b IN collB). Annotations: “Items have no vars”, “Items have vars a, xx”.] Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame. Thus: items look different in different parts of the plan. We always deal with blocks of items for performance reasons.
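The pipeline idea can be sketched with Python generators. This is a toy model under stated assumptions, not the database's engine: every node pulls blocks from its dependency, a block is a list of items, and an item is a dict of the variables bound so far. The data in `coll_a` and `coll_b`, the `BLOCK_SIZE` of 2 and all function names are invented for the example.

```python
# Hypothetical data standing in for collA and collB.
coll_a = [{"x": 1}, {"x": 2}, {"x": 3}]
coll_b = [{"y": 2, "z": "p"}, {"y": 3, "z": "q"}, {"y": 3, "z": "r"}]

BLOCK_SIZE = 2  # arbitrary; only here to show that items travel in blocks


def blocks(items):
    """Cut an iterable of items into lists of at most BLOCK_SIZE items."""
    block = []
    for item in items:
        block.append(item)
        if len(block) == BLOCK_SIZE:
            yield block
            block = []
    if block:
        yield block


def singleton():
    yield [{}]  # a single item with no variables


def enumerate_collection(upstream, var, coll):
    # Every incoming item is extended by each document of the collection.
    yield from blocks(
        {**item, var: doc} for blk in upstream for item in blk for doc in coll
    )


def calculation(upstream, var, fn):
    for blk in upstream:
        yield [{**item, var: fn(item)} for item in blk]


def filter_node(upstream, var):
    for blk in upstream:
        kept = [item for item in blk if item[var]]
        if kept:
            yield kept


def run_query():
    plan = singleton()
    plan = enumerate_collection(plan, "a", coll_a)           # FOR a IN collA
    plan = calculation(plan, "xx", lambda i: i["a"]["x"])    # LET xx = a.x
    plan = enumerate_collection(plan, "b", coll_b)           # FOR b IN collB
    plan = calculation(plan, "cond", lambda i: i["xx"] == i["b"]["y"])
    plan = filter_node(plan, "cond")                         # FILTER xx == b.y
    plan = calculation(
        plan, "out", lambda i: {"x": i["a"]["x"], "z": i["b"]["z"]}
    )
    return [item["out"] for blk in plan for item in blk]     # RETURN


print(run_query())
```

After `singleton()` the items carry no variables, after the first calculation they carry `a` and `xx`, and so on, which is why items look different in different parts of the plan.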
  • 51. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} Plan: Singleton; EnumColl a; EnumColl b; Calc a.x == 10; Filter a.x == 10; Calc a.u == b.v; Filter a.u == b.v; Return {u:a.u,w:b.w}. The result and behaviour do not change if the first FILTER is pulled out of the inner FOR.
  • 53. Move filters up FOR a IN collA FILTER a.x == 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} Plan: Singleton; EnumColl a; Calc a.x == 10; Filter a.x == 10; EnumColl b; Calc a.u == b.v; Filter a.u == b.v; Return {u:a.u,w:b.w}. The result and behaviour are unchanged. However, the number of items traveling in the pipeline is decreased. Note that the two FOR statements could be interchanged!
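To make the saving concrete, this small Python sketch runs both forms of the query as nested loops and counts how many items leave the inner enumeration. The sample collections and the function names are hypothetical, and plain loops stand in for the plan nodes.

```python
# Hypothetical data standing in for collA and collB.
coll_a = [{"x": 10, "u": 1}, {"x": 5, "u": 2}, {"x": 10, "u": 3}, {"x": 7, "u": 1}]
coll_b = [{"v": 1, "w": "p"}, {"v": 3, "w": "q"}, {"v": 4, "w": "r"}]


def filter_inside():
    """Both FILTERs sit inside the inner FOR (the query as written)."""
    items_seen, out = 0, []
    for a in coll_a:
        for b in coll_b:
            items_seen += 1  # one item leaves EnumColl b
            if a["x"] == 10 and a["u"] == b["v"]:
                out.append({"u": a["u"], "w": b["w"]})
    return out, items_seen


def filter_moved_up():
    """FILTER a.x == 10 is applied before the inner FOR starts."""
    items_seen, out = 0, []
    for a in coll_a:
        if a["x"] != 10:
            continue  # this item never reaches EnumColl b
        for b in coll_b:
            items_seen += 1
            if a["u"] == b["v"]:
                out.append({"u": a["u"], "w": b["w"]})
    return out, items_seen


print(filter_inside())
print(filter_moved_up())
```

Both functions return the same result list, while the second one sends only half as many items through the inner loop for this sample data.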
  • 55. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} Plan: Singleton; EnumColl a; Calc L = ...; EnumColl b; Calc a.u == b.v; Filter a.u == b.v; Return {...}. The calculation of L is unnecessary, because L is never used.
  • 57. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} Plan: Singleton; EnumColl a; EnumColl b; Calc a.u == b.v; Filter a.u == b.v; Return {...}. Since the calculation of L cannot throw an exception, we can just leave it out.
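One way to picture this rule is a backwards pass over the plan that drops every calculation whose result is never read and which cannot throw. The sketch below is an assumption-laden toy: the tuple layout of a plan node, the variable name `cond` and the function `remove_unused_calculations` are all invented here and are not taken from the actual optimiser.

```python
# Each node: (kind, variable it sets or None, variables it reads, can_throw).
# This layout is hypothetical and only serves the illustration.
plan = [
    ("EnumColl", "a", set(), False),
    ("Calc", "L", {"a"}, False),           # LET L = LENGTH(a.hobbies)
    ("EnumColl", "b", set(), False),
    ("Calc", "cond", {"a", "b"}, False),   # a.u == b.v
    ("Filter", None, {"cond"}, False),
    ("Return", None, {"a", "b"}, False),
]


def remove_unused_calculations(nodes):
    """Drop Calc nodes whose output is never read and that cannot throw."""
    kept = []
    needed = set()  # variables read by nodes further down the pipeline
    for kind, sets, reads, can_throw in reversed(nodes):
        if kind == "Calc" and sets not in needed and not can_throw:
            continue  # nobody reads the result and skipping it is safe
        needed |= reads
        kept.append((kind, sets, reads, can_throw))
    kept.reverse()
    return kept


for node in remove_unused_calculations(plan):
    print(node[0], node[1])
```

The calculation of `cond` survives because the Filter reads it, whereas the calculation of `L` disappears.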
  • 59. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Plan: Singleton; EnumColl a; Calc ...; Filter ...; Sort a.y, a.x; Return a.
  • 61. Use index for FILTER and SORT Assume collA has a skiplist index on “y” and “x” (in this order). Then we can read off the half-open interval between { y: 10, x: 17 } (excluded) and { y: 10, x: 23 } (included) from the skiplist index, so a single IndexRange node replaces EnumColl, Calc and Filter (plan: Singleton; IndexRange a; Sort a.y, a.x; Return a). The result will automatically be sorted by y and then by x, so the Sort node can be dropped as well (plan: Singleton; IndexRange a; Return a).
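The range lookup can be imitated with a sorted Python list and the standard `bisect` module. Note the swap: a sorted list stands in for the skiplist, and the index entries and the name `index_range` are made up for this sketch.

```python
from bisect import bisect_right

# Hypothetical index entries: (y, x) keys kept in sorted order, the way a
# sorted index on ["y", "x"] would keep them.
index = sorted(
    [(10, 17), (10, 18), (9, 20), (10, 23), (10, 24), (11, 19), (10, 21)]
)


def index_range(idx, y, x_low_excl, x_high_incl):
    """Return all keys with the given y and x_low_excl < x <= x_high_incl."""
    lo = bisect_right(idx, (y, x_low_excl))   # skips x == x_low_excl
    hi = bisect_right(idx, (y, x_high_incl))  # keeps x == x_high_incl
    return idx[lo:hi]


print(index_range(index, 10, 17, 23))
```

The slice comes back already ordered by y and then x, which is the reason no separate sort step is needed afterwards.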
  • 63. Data distribution in a cluster [Figure: requests, two coordinators and three DBservers.] The shards of a collection are distributed across the DB servers. The coordinators receive queries and organise their execution.
  • 66. Scatter/gather [Figure: execution plan with a Scatter node, three EnumShard nodes each placed between Remote nodes, and a Concat/Merge node.]
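The slide only shows the diagram, so the following Python sketch is one possible reading of it rather than the actual implementation: the same sub-plan runs on every shard, and the partial results are merged on the way back. The shard contents, the filter on `age`, the sort on `name` and the function names are all assumptions made for the example.

```python
import heapq

# Hypothetical shards of one collection, each living on some DBserver.
shards = {
    "s1": [{"name": "Duck", "age": 80}, {"name": "Adams", "age": 30}],
    "s2": [{"name": "Schmidt", "age": 95}, {"name": "Baker", "age": 12}],
    "s3": [{"name": "Neunhöffer", "age": 44}],
}


def enum_shard(docs, minage):
    """Runs next to the data: filter and sort one shard locally."""
    matching = (d for d in docs if d["age"] >= minage)
    return sorted(matching, key=lambda d: d["name"])


def scatter_gather(minage):
    # Scatter: every shard evaluates the same sub-plan.
    partials = [enum_shard(docs, minage) for docs in shards.values()]
    # Gather: merge the sorted partial results into one sorted stream.
    return list(heapq.merge(*partials, key=lambda d: d["name"]))


print([d["name"] for d in scatter_gather(18)])
```

If no ordering were required, the gather step could simply concatenate the partial results instead of merging them.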
  • 70. Modifying queries Fortunately: there can be at most one modifying node in each query, and there can be no modifying nodes in subqueries. Modifying nodes: the modifying node in a query is executed on the DBservers. To this end, we either scatter the items to all DBservers or, if possible, distribute each item to the shard that is responsible for the modification. Sometimes, we can even optimise away a gather/scatter combination and parallelise completely.
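The difference between scattering and distributing can be shown in a few lines of Python. This is only a sketch: the CRC32-modulo rule in `responsible_shard`, the shard count and the sample items are assumptions chosen for the example and do not describe the database's real sharding function.

```python
import zlib

NUM_SHARDS = 3  # hypothetical number of shards


def responsible_shard(key):
    """Assumed placement rule: a stable hash of the key, modulo shard count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def scatter(items):
    """Every shard receives every item and must decide what concerns it."""
    return {shard: list(items) for shard in range(NUM_SHARDS)}


def distribute(items):
    """Each item is sent only to the shard responsible for its key."""
    buckets = {shard: [] for shard in range(NUM_SHARDS)}
    for item in items:
        buckets[responsible_shard(item["_key"])].append(item)
    return buckets


items = [{"_key": str(n)} for n in range(10)]
print({shard: len(docs) for shard, docs in scatter(items).items()})
print({shard: len(docs) for shard, docs in distribute(items).items()})
```

With scatter, the total number of items sent grows with the number of shards. With distribute, each item travels exactly once.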