Custom Pregel Algorithms in ArangoDB

Feature Preview: Custom Pregel
Complex Graph Algorithms made Easy
@arangodb @joerg_schad @hkernbach

tl;dr
● “Many practical computing problems concern large
graphs.”
● ArangoDB is a “Beyond Graph Database”
supporting multiple data models around a scalable
graph foundation
● Pregel is a framework for distributed graph
processing
○ ArangoDB supports predefined Pregel algorithms, e.g.
PageRank, Single-Source Shortest Path and Connected
components.
● Programmable Pregel Algorithms (PPA) allow
adding/modifying algorithms on the fly
Disclaimer
This is an experimental
feature and especially the
language specification
(front-end) is still under
development!

Jörg Schad, PhD
Head of Engineering and ML
@ArangoDB
● Suki.ai
● Mesosphere
● Architect @SAP Hana
● PhD Distributed DB
Systems
● Twitter: @joerg_schad

Heiko Kernbach
Core Engineer (Graphs Team)
@ArangoDB
● Graph
● Custom Pregel
● Geo / UI
● Twitter: @hkernbach
● Slack:
hkernbach.ArangoDB

● Open Source
● Beyond Graph Database
○ Stores K/V and Documents, connected by
scalable Graph Processing
● Scalable
○ Distributed Graphs
● AQL - SQL-like multi-model query language
● ACID Transactions including Multi Collection
Transactions

https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/

Pregel Max Value
While not converged (each iteration is one superstep):
  Communicate: send own value to neighbours
  Compute: own value = max of all received messages and the own value
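
The loop above can be sketched as a toy, single-process simulation. This is only an illustration of the superstep idea, not ArangoDB code: the function name `pregelMaxValue`, the adjacency-list format and the sample graph are assumptions made for this example, and a real Pregel run distributes the vertices across workers.

```javascript
// Toy single-process simulation of the "max value" example (not ArangoDB code).
// neighbours: hypothetical adjacency list, e.g. { a: ["b"], b: ["a"] }.
function pregelMaxValue(neighbours, initialValues, maxSupersteps = 100) {
  const values = { ...initialValues };
  let active = Object.keys(values); // every vertex sends in the first superstep
  let supersteps = 0;

  while (active.length > 0 && supersteps < maxSupersteps) {
    // Communicate: each active vertex sends its own value to its neighbours.
    const inbox = {};
    for (const vertex of active) {
      for (const neighbour of neighbours[vertex]) {
        if (!inbox[neighbour]) inbox[neighbour] = [];
        inbox[neighbour].push(values[vertex]);
      }
    }

    // Compute: own value = max of all received messages and the own value.
    active = [];
    for (const [vertex, messages] of Object.entries(inbox)) {
      const best = Math.max(values[vertex], ...messages);
      if (best > values[vertex]) {
        values[vertex] = best;
        active.push(vertex); // changed vertices send again in the next superstep
      }
    }
    supersteps += 1;
  }
  return { values, supersteps };
}

// Sample path graph a - b - c - d with arbitrary start values.
const sampleNeighbours = { a: ["b"], b: ["a", "c"], c: ["b", "d"], d: ["c"] };
const sampleResult = pregelMaxValue(sampleNeighbours, { a: 3, b: 6, c: 2, d: 1 });
console.log(sampleResult.values); // every vertex ends up with the maximum, 6
```

The run converges once no vertex changes its value, which is why only changed vertices are kept active between supersteps.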

ArangoDB and Pregel: Status Quo
● https://www.arangodb.com/docs/stable/graphs-pregel.html
● https://www.arangodb.com/pregel-community-detection/
Available Algorithms
● PageRank
● Seeded PageRank
● Single-Source Shortest Path
● Connected Components
○ Component
○ WeaklyConnected
○ StronglyConnected
● Hyperlink-Induced Topic Search (HITS)
● Vertex Centrality
● Effective Closeness
● LineRank
● Label Propagation
● Speaker-Listener Label Propagation
var pregel = require("@arangodb/pregel");
pregel.start("pagerank", "graphname", {maxGSS: 100, threshold: 0.00000001, resultField: "rank"});
● Pregel support since 2014
● Predefined algorithms
○ Could be extended via C++
● Same platform used for PPA
Challenges
Add and modify Algorithms

Programmable Pregel Algorithms (PPA)
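const pregel = require("@arangodb/pregel");
// "air" is the algorithm name used for PPA; the third argument is the custom algorithm definition
const pregelID = pregel.start("air", graphName, "<custom-algorithm>");
const status = pregel.status(pregelID);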
● Add/Modify algorithms on-the-fly
○ Without C++ code
○ Without restarting the Database
● Efficiency (as Pregel) depends on Sharding
○ Smart Graphs
○ Required: Collocation of vertices and edges

Custom Algorithm
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
Accumulators
Accumulators consume and process the messages that are sent to them during
the computational phase (initProgram, updateProgram, onPreStep, onPostStep)
of a superstep. Once a superstep is done, all messages have been processed.
● max: stores the maximum of all messages received.
● min: stores the minimum of all messages received.
● sum: sums up all messages received.
● and: computes the logical AND of all messages received.
● or: computes the logical OR of all messages received.
● store: holds the last received value (non-deterministic).
● list: stores all received values in list (order is non-deterministic).
● custom
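
To make the built-in accumulator types more concrete, here is a minimal sketch that folds a list of messages with each of them. It is not the ArangoDB implementation: the `accumulators` table, the `accumulate` helper and the chosen initial values (for example `-Infinity` for max) are assumptions for illustration only.

```javascript
// Hypothetical model of the built-in accumulator types as (initial, update) pairs.
const accumulators = {
  max:   { initial: () => -Infinity, update: (state, msg) => Math.max(state, msg) },
  min:   { initial: () => Infinity,  update: (state, msg) => Math.min(state, msg) },
  sum:   { initial: () => 0,         update: (state, msg) => state + msg },
  and:   { initial: () => true,      update: (state, msg) => state && msg },
  or:    { initial: () => false,     update: (state, msg) => state || msg },
  store: { initial: () => undefined, update: (state, msg) => msg },
  list:  { initial: () => [],        update: (state, msg) => [...state, msg] },
};

// Feed all messages of one superstep into an accumulator of the given type.
function accumulate(type, messages) {
  const acc = accumulators[type];
  if (!acc) throw new Error(`unknown accumulator type: ${type}`);
  let state = acc.initial();
  for (const msg of messages) {
    state = acc.update(state, msg);
  }
  return state;
}

console.log(accumulate("max", [3, 7, 5]));       // 7
console.log(accumulate("sum", [3, 7, 5]));       // 15
console.log(accumulate("and", [true, false]));   // false
console.log(accumulate("list", [1, 2, 3]));      // [ 1, 2, 3 ]
```

In this sketch the messages arrive in a fixed order; in a distributed run the arrival order is not guaranteed, which is why store and list are described as non-deterministic.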

Custom Algorithm
● resultField (string, optional): Name of the document attribute to store the result in. The
computation result of each vertex is written to this attribute of that vertex.
● maxGSS (number, required): The maximum number of global supersteps. Once this number of
supersteps is reached, the Pregel execution stops.
● dataAccess (object, optional): Defines writeVertex, readVertex and readEdge.
○ writeVertex: A program that is used to write the results into vertices. If writeVertex is
used, the resultField will be ignored.
○ readVertex: An array that consists of strings and/or nested arrays (each nested array
represents a path).
■ string: Represents a single attribute at the top level.
■ array of strings: Represents a nested path.
○ readEdge: An array that consists of strings and/or nested arrays (each nested array
represents a path).
■ string: Represents a single attribute at the top level.
■ array of strings: Represents a nested path.
● vertexAccumulators (object, optional): Definition of all used vertex accumulators.
● globalAccumulators (object, optional): Definition of all used global accumulators. Global
accumulators are able to access variables at the shared global level.
● customAccumulators (object, optional): Definition of all used custom accumulators.
● phases (array): Array of a single or multiple phase definitions.
● debug (optional): See Debugging.
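
As a rough aid for reading the field list above, the following sketch checks a definition object against the types and required fields named on this slide. It is not part of ArangoDB: `validateAlgorithm` and `isPath` are hypothetical helpers, treating `phases` as required and non-empty is this example's reading of the slide, and the contents of a phase are deliberately left as a placeholder.

```javascript
// A path entry is either a top-level attribute (string) or a nested path (array of strings).
function isPath(entry) {
  return typeof entry === "string" ||
    (Array.isArray(entry) && entry.every((part) => typeof part === "string"));
}

function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the shape matches the slide.
function validateAlgorithm(def) {
  const errors = [];
  if (typeof def.maxGSS !== "number") errors.push("maxGSS must be a number (required)");
  if ("resultField" in def && typeof def.resultField !== "string") {
    errors.push("resultField must be a string");
  }
  if (!Array.isArray(def.phases) || def.phases.length === 0) {
    errors.push("phases must be an array with at least one phase");
  }
  for (const key of ["vertexAccumulators", "globalAccumulators", "customAccumulators"]) {
    if (key in def && !isPlainObject(def[key])) errors.push(`${key} must be an object`);
  }
  if ("dataAccess" in def && !isPlainObject(def.dataAccess)) {
    errors.push("dataAccess must be an object");
  }
  const access = isPlainObject(def.dataAccess) ? def.dataAccess : {};
  for (const key of ["readVertex", "readEdge"]) {
    if (key in access && !(Array.isArray(access[key]) && access[key].every(isPath))) {
      errors.push(`dataAccess.${key} must be an array of strings or string arrays`);
    }
  }
  return errors;
}

// Hypothetical definition; attribute names are made up for the example.
const exampleDefinition = {
  resultField: "result",
  maxGSS: 50,
  dataAccess: { readVertex: ["name", ["address", "city"]], readEdge: ["weight"] },
  phases: [{ /* phase definition omitted */ }],
};
console.log(validateAlgorithm(exampleDefinition)); // []
```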

Phases - Execution order
Step 1: Initialization
1. onPreStep (Conductor, executed on Coordinator instances)
2. initProgram (Worker, executed on DB-Server instances)
3. onPostStep (Conductor)
Steps 2 to n: Computation
1. onPreStep (Conductor)
2. updateProgram (Worker)
3. onPostStep (Conductor)
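
The ordering above can be written down as a small trace generator. This is only a sketch of the sequence described on the slide; `executionOrder` is a hypothetical helper and runs none of the programs itself.

```javascript
// Lists which program runs where, for a phase that takes `steps` supersteps.
function executionOrder(steps) {
  const trace = [];
  for (let step = 1; step <= steps; step++) {
    // Only the first step runs initProgram; all later steps run updateProgram.
    const workerProgram = step === 1 ? "initProgram" : "updateProgram";
    trace.push(`step ${step}: onPreStep (Conductor)`);
    trace.push(`step ${step}: ${workerProgram} (Worker)`);
    trace.push(`step ${step}: onPostStep (Conductor)`);
  }
  return trace;
}

console.log(executionOrder(3).join("\n"));
```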

Program - Arango Intermediate Representation (AIR)

Lisp-like intermediate representation, expressed in
JSON and supporting JSON's data types
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Forms
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice statements
○ quasi-quote, unquote and unquote-splice statements
○ cons statement
○ and and or statements
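
To give a feel for what a Lisp-like representation in JSON means, here is a toy evaluator in which every expression is a JSON array whose first element names the operation. It is not the AIR interpreter: the handful of forms implemented here and their argument layout (for example the three-argument `if`) are simplifications chosen for this sketch and may differ from the actual specification.

```javascript
// Toy evaluator for JSON-encoded, Lisp-like expressions (illustration only).
function evaluate(expr, env = {}) {
  if (!Array.isArray(expr)) return expr; // JSON literals evaluate to themselves
  const [op, ...args] = expr;
  switch (op) {
    case "+":
      return args.reduce((sum, arg) => sum + evaluate(arg, env), 0);
    case ">":
      return evaluate(args[0], env) > evaluate(args[1], env);
    case "var-ref":
      if (!(args[0] in env)) throw new Error(`unbound variable: ${args[0]}`);
      return env[args[0]];
    case "let": {
      // ["let", [[name, valueExpr], ...], body]
      const [bindings, body] = args;
      const scope = { ...env };
      for (const [name, value] of bindings) scope[name] = evaluate(value, env);
      return evaluate(body, scope);
    }
    case "if": {
      // ["if", condition, thenExpr, elseExpr]
      const [condition, thenBranch, elseBranch] = args;
      return evaluate(evaluate(condition, env) ? thenBranch : elseBranch, env);
    }
    case "seq":
      // Evaluate all expressions in order and return the last result.
      return args.reduce((_, arg) => evaluate(arg, env), null);
    default:
      throw new Error(`unknown operator: ${op}`);
  }
}

const program = [
  "let", [["x", 2], ["y", 5]],
  ["if", [">", ["var-ref", "y"], ["var-ref", "x"]],
    ["+", ["var-ref", "x"], ["var-ref", "y"], 1],
    0],
];
console.log(evaluate(program)); // 8
```

Because programs are plain JSON, they can be stored, sent and modified like any other document, which is what makes adding algorithms without recompiling practical.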

Pregelator
Simple Foxx-service-based IDE
https://github.com/arangodb-foxx/pregelator

PPA: What is next?
- Gather Feedback
- In particular use-cases
- Missing functions & functionality
- User-friendly Front-End language
- Improve Scale/Performance of underlying
Pregel platform
- Algorithm library
- Blog Post (including Jupyter example)
ArangoDB 3.8 (end of year)
- Experimental Feature
- Initial Library
ArangoDB 3.9 (Q1 21)
- Draft for Front-End
- Extended Library
- Platform Improvements
ArangoDB 4.0 (Mid 21)
- GA

Pregel vs AQL
When to (not) use Pregel…
- Can the algorithm be efficiently
expressed in Pregel?
- Counter example: Topological Sort
- Is the graph size worth the loading?
● AQL
○ All models (Graph, Document, Key-Value, Search, …)
○ Online queries
● Pregel
○ Iterative graph processing
○ Large graphs, multiple iterations

How can I start?
● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3
● Check existing algorithms
● Preview documentation
● Give Feedback
○ https://slack.arangodb.com/ -> custom-pregel

Thanks for listening!
Reach out with Feedback/Questions!
• @arangodb
• https://www.arangodb.com/
• docker pull arangodb
Test-drive Oasis
14-days for free
