SlideShare a Scribd company logo
Feature Preview: Custom Pregel
Complex Graph Algorithms made Easy
@arangodb @joerg_schad @hkernbach
2
tl;dr
● “Many practical computing problems concern large
graphs.”
● ArangoDB is a “Beyond Graph Database”
supporting multiple data models around a scalable
graph foundation
● Pregel is a framework for distributed graph
processing
○ ArangoDB supports predefined Prgel algorithms, e.g.
PageRank, Single-Source Shortest Path and Connected
components.
● Programmable Pregel Algorithms (PPA) allows
adding/modifying algorithms on the flight
Disclaimer
This is an experimental
feature and especially the
language specification
(front-end) is still under
development!
Jörg Schad, PhD
Head of Engineering and ML
@ArangoDB
● Suki.ai
● Mesosphere
● Architect @SAP Hana
● PhD Distributed DB
Systems
● Twitter: @joerg_schad
4
Heiko Kernbach
Core Engineer (Graphs Team)
@
● Graph
● Custom Pregel
● Geo / UI
● Twitter: @hkernbach
● Slack:
hkernbach.ArangoDB
5
● Open Source
● Beyond Graph Database
○ Stores, K/V, Documents connected by
scalable Graph Processing
● Scalable
○ Distributed Graphs
● AQL - SQL-like multi-model query language
● ACID Transactions including Multi Collection
Transactions
https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
Pregel Max Value
While not converged:
Communicate: send own value to neighbours
Compute: Own value = Max Value from all messages (+ own value) Superstep
ArangoDB and Pregel: Status Quo
● https://www.arangodb.com/docs/stable/graphs-pregel.html
● https://www.arangodb.com/pregel-community-detection/
Available Algorithms
● Page Rank
● Seeded PageRank
● Single-Source Shortest Path
● Connected Components
○ Component
○ WeaklyConnected
○ StronglyConnected
● Hyperlink-Induced Topic Search
(HITS)Permalink
● Vertex Centrality
● Effective Closeness
● LineRank
● Label Propagation
● Speaker-Listener Label Propagation 8
var pregel = require("@arangodb/pregel");
pregel.start("pagerank", "graphname", {maxGSS: 100,
threshold: 0.00000001, resultField: "rank"})
● Pregel support since 2014
● Predefined algorithms
○ Could be extended via C++
● Same platform used for PPA
Challenges
Add and modify Algorithms
Programmable Pregel Algorithms (PPA)
const pregel = require("@arangodb/pregel");
let pregelID = pregel.start("air", graphName, "<custom-algorithm>");
var status = pregel.status(pregelID);
● Add/Modify algorithms on-the-fly
○ Without C++ code
○ Without restarting the Database
● Efficiency (as Pregel) depends on Sharding
○ Smart Graphs
○ Required: Collocation of vertices and edges
9
Custom Algorithm
10
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
Accumulators
Accumulators are used to consume and process messages which are being
sent to them during the computational phase (initProgram, updateProgram,
onPreStep, onPostStep) of a superstep. After a superstep is done, all messages
will be processed.
● max: stores the maximum of all messages received.
● min: stores the minimum of all messages received.
● sum: sums up all messages received.
● and: computes and on all messages received.
● or: computes or and all messages received.
● store: holds the last received value (non-deterministic).
● list: stores all received values in list (order is non-deterministic).
● custom
Custom Algorithm
11
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
● resultField (string, optional): Name of the document attribute to store the result in. The
vertex computation results will be in all vertices pointing to the given attribute.
● maxGSS (number, required): The max amount of global supersteps After the amount of max
defined supersteps is reached, the Pregel execution will stop.
● dataAccess (object, optional): Allows to define writeVertex, readVertex and readEdge.
○ writeVertex: A program that is used to write the results into vertices. If writeVertex is
used, the resultField will be ignored.
○ readVertex: An array that consists of strings and/or additional arrays (that represents
a path).
■ string: Represents a single attribute at the top level.
■ array of strings: Represents a nested path
○ readEdge: An array that consists of strings and/or additional arrays (that represents
a path).
■ string: Represents a single path at the top level which is not nested.
■ array of strings: Represents a nested path
● vertexAccumulators (object, optional): Definition of all used vertex accumulators.
● globalAccumulators (object, optional): Definition all used global accumulators. Global
Accumulators are able to access variables at shared global level.
● customAccumulators (object, optional): Definition of all used custom accumulators.
● phases (array): Array of a single or multiple phase definitions.
● debug (optional): See Debugging.
Phases - Execution order
12
Step 1: Initialization
1. onPreStep (Conductor, executed on Coordinator
instances)
2. initProgram (Worker, executed on DB-Server instances)
3. onPostStep (Conductor)
Step {2, ...n} Computation
1. onPreStep (Conductor)
2. updateProgram (Worker)
3. onPostStep (Conductor)
Program - Arango Intermediate Representation (AIR)
13
Program - Arango Intermediate Representation (AIR)
Lisp-like intermediate representation, represented in
JSON and supports its data types
14
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Form
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice
statements
○ quasi-quote, unquote and
unquote-splice statements
○ cons statement
○ and and or statements
Program - Arango Intermediate Representation (AIR)
Lisp-like intermediate representation,
represented in JSON and supports its data types
15
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Form
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice
statements
○ quasi-quote, unquote and
unquote-splice statements
○ cons statement
○ and and or statements
Pregelator
Simple Foxx service based IDE
16https://github.com/arangodb-foxx/pregelator
Custom Pregel Algorithms in ArangoDB
PPA: What is next?
- Gather Feedback
- In particular use-cases
- Missing functions & functionality
- User-friendly Front-End language
- Improve Scale/Performance of underlying
Pregel platform
- Algorithm library
- Blog Post (including Jupyter example)
18
ArangoDB 3.8 (end of year)
- Experimental Feature
- Initial Library
ArangoDB 3.9 (Q1 21)
- Draft for Front-End
- Extended Library
- Platform Improvements
ArangoDB 4.0 (Mid 21)
- GA
Pregel vs AQL
When to (not) use Pregel…
- Can the algorithm be efficiently be
expressed in Pregel?
- Counter example: Topological Sort
- Is the graph size worth the loading?
19
AQL Pregel
All Models (Graph, Document, Key-Value, Search, …) Iterative Graph Processing
Online Queries Large Graphs, multiple iterations
How can I start?
● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3
● Check existing algorithms
● Preview documentation
● Give Feedback
○ https://slack.arangodb.com/ -> custom-pregel
20
Thanks for listening!
21
Reach out with Feedback/Questions!
• @arangodb
• https://www.arangodb.com/
• docker pull arangodb
Test-drive Oasis
14-days for free

More Related Content

Similar to Custom Pregel Algorithms in ArangoDB (20)

Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) Pregel
ArangoDB Database
 
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
NoSQLmatters
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1
ArangoDB Database
 
Pregel - Ezequiel Aguilar
Pregel - Ezequiel AguilarPregel - Ezequiel Aguilar
Pregel - Ezequiel Aguilar
Ezequiel Aguilar Gonzalez
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
sscdotopen
 
Andrea Iacono - Graphs are everywhere!
 Andrea Iacono - Graphs are everywhere! Andrea Iacono - Graphs are everywhere!
Andrea Iacono - Graphs are everywhere!
Codemotion
 
Pregel
PregelPregel
Pregel
Weiru Dai
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph
sscdotopen
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
Riyad Parvez
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
Max Neunhöffer
 
Pregel and giraph
Pregel and giraphPregel and giraph
Pregel and giraph
Cao Manh Dat
 
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph ProcessingMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Zuhair khayyat
 
Graphs
GraphsGraphs
Graphs
Steve Loughran
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
ArangoDB Database
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Big Data Spain
 
Processing Large Graphs in Hadoop
Processing Large Graphs in HadoopProcessing Large Graphs in Hadoop
Processing Large Graphs in Hadoop
Dani Solà Lagares
 
Graph processing
Graph processingGraph processing
Graph processing
yeahjs
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
ArangoDB Database
 
Pregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph ProcessingPregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph Processing
Chris Bunch
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
Yu Liu
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) Pregel
ArangoDB Database
 
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
NoSQLmatters
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1
ArangoDB Database
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
sscdotopen
 
Andrea Iacono - Graphs are everywhere!
 Andrea Iacono - Graphs are everywhere! Andrea Iacono - Graphs are everywhere!
Andrea Iacono - Graphs are everywhere!
Codemotion
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph
sscdotopen
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
Riyad Parvez
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
Max Neunhöffer
 
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph ProcessingMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Zuhair khayyat
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
ArangoDB Database
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Big Data Spain
 
Processing Large Graphs in Hadoop
Processing Large Graphs in HadoopProcessing Large Graphs in Hadoop
Processing Large Graphs in Hadoop
Dani Solà Lagares
 
Graph processing
Graph processingGraph processing
Graph processing
yeahjs
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
ArangoDB Database
 
Pregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph ProcessingPregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph Processing
Chris Bunch
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
Yu Liu
 

More from ArangoDB Database (20)

ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
ArangoDB Database
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB Database
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
ArangoDB Database
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
ArangoDB Database
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB Database
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
ArangoDB Database
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB Database
 
3.5 webinar
3.5 webinar 3.5 webinar
3.5 webinar
ArangoDB Database
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
ArangoDB Database
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
ArangoDB Database
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
ArangoDB Database
 
Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?
ArangoDB Database
 
Are you a Tortoise or a Hare?
Are you a Tortoise or a Hare?Are you a Tortoise or a Hare?
Are you a Tortoise or a Hare?
ArangoDB Database
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed Database
ArangoDB Database
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
ArangoDB Database
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB Database
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
ArangoDB Database
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
ArangoDB Database
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB Database
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
ArangoDB Database
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB Database
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
ArangoDB Database
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
ArangoDB Database
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
ArangoDB Database
 
Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?
ArangoDB Database
 
Are you a Tortoise or a Hare?
Are you a Tortoise or a Hare?Are you a Tortoise or a Hare?
Are you a Tortoise or a Hare?
ArangoDB Database
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed Database
ArangoDB Database
 
Ad

Recently uploaded (20)

apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays
 
Mining Presentation Online Courses for Student
Mining Presentation Online Courses for StudentMining Presentation Online Courses for Student
Mining Presentation Online Courses for Student
Rizki229625
 
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays
 
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays
 
Chronic constipation presentaion final.ppt
Chronic constipation presentaion final.pptChronic constipation presentaion final.ppt
Chronic constipation presentaion final.ppt
DrShashank7
 
Али махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptxАли махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptx
palr19411
 
apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...
apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...
apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...
apidays
 
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays
 
THE FRIEDMAN TEST ( Biostatics B. Pharm)
THE FRIEDMAN TEST ( Biostatics B. Pharm)THE FRIEDMAN TEST ( Biostatics B. Pharm)
THE FRIEDMAN TEST ( Biostatics B. Pharm)
JishuHaldar
 
Report_Government Authorities_Index_ENG_FIN.pdf
Report_Government Authorities_Index_ENG_FIN.pdfReport_Government Authorities_Index_ENG_FIN.pdf
Report_Government Authorities_Index_ENG_FIN.pdf
OlhaTatokhina1
 
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays
 
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays
 
apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...
apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...
apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...
apidays
 
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdfAG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
Anass Nabil
 
Managed Cloud services - Opsio Cloud Man
Managed Cloud services - Opsio Cloud ManManaged Cloud services - Opsio Cloud Man
Managed Cloud services - Opsio Cloud Man
Opsio Cloud
 
EPC UNIT-V forengineeringstudentsin.pptx
EPC UNIT-V forengineeringstudentsin.pptxEPC UNIT-V forengineeringstudentsin.pptx
EPC UNIT-V forengineeringstudentsin.pptx
ExtremerZ
 
egc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPTegc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPT
huyenmy200809
 
apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...
apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...
apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...
apidays
 
Tableau Cloud - what to consider before making the move update 2025.pdf
Tableau Cloud - what to consider before making the move update 2025.pdfTableau Cloud - what to consider before making the move update 2025.pdf
Tableau Cloud - what to consider before making the move update 2025.pdf
elinavihriala
 
apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...
apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...
apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...
apidays
 
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays
 
Mining Presentation Online Courses for Student
Mining Presentation Online Courses for StudentMining Presentation Online Courses for Student
Mining Presentation Online Courses for Student
Rizki229625
 
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays
 
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays
 
Chronic constipation presentaion final.ppt
Chronic constipation presentaion final.pptChronic constipation presentaion final.ppt
Chronic constipation presentaion final.ppt
DrShashank7
 
Али махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptxАли махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptx
palr19411
 
apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...
apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...
apidays New York 2025 - Building Green Software by Marissa Jasso & Katya Drey...
apidays
 
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays
 
THE FRIEDMAN TEST ( Biostatics B. Pharm)
THE FRIEDMAN TEST ( Biostatics B. Pharm)THE FRIEDMAN TEST ( Biostatics B. Pharm)
THE FRIEDMAN TEST ( Biostatics B. Pharm)
JishuHaldar
 
Report_Government Authorities_Index_ENG_FIN.pdf
Report_Government Authorities_Index_ENG_FIN.pdfReport_Government Authorities_Index_ENG_FIN.pdf
Report_Government Authorities_Index_ENG_FIN.pdf
OlhaTatokhina1
 
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays
 
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays
 
apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...
apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...
apidays New York 2025 - The FINOS Common Domain Model for Capital Markets by ...
apidays
 
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdfAG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
Anass Nabil
 
Managed Cloud services - Opsio Cloud Man
Managed Cloud services - Opsio Cloud ManManaged Cloud services - Opsio Cloud Man
Managed Cloud services - Opsio Cloud Man
Opsio Cloud
 
EPC UNIT-V forengineeringstudentsin.pptx
EPC UNIT-V forengineeringstudentsin.pptxEPC UNIT-V forengineeringstudentsin.pptx
EPC UNIT-V forengineeringstudentsin.pptx
ExtremerZ
 
egc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPTegc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPT
huyenmy200809
 
apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...
apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...
apidays New York 2025 - Unifying OpenAPI & AsyncAPI by Naresh Jain & Hari Kri...
apidays
 
Tableau Cloud - what to consider before making the move update 2025.pdf
Tableau Cloud - what to consider before making the move update 2025.pdfTableau Cloud - what to consider before making the move update 2025.pdf
Tableau Cloud - what to consider before making the move update 2025.pdf
elinavihriala
 
apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...
apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...
apidays Singapore 2025 - Building Finance Innovation Ecosystems by Umang Moon...
apidays
 
Ad

Custom Pregel Algorithms in ArangoDB

  • 1. Feature Preview: Custom Pregel Complex Graph Algorithms made Easy @arangodb @joerg_schad @hkernbach
  • 2. 2 tl;dr ● “Many practical computing problems concern large graphs.” ● ArangoDB is a “Beyond Graph Database” supporting multiple data models around a scalable graph foundation ● Pregel is a framework for distributed graph processing ○ ArangoDB supports predefined Prgel algorithms, e.g. PageRank, Single-Source Shortest Path and Connected components. ● Programmable Pregel Algorithms (PPA) allows adding/modifying algorithms on the flight Disclaimer This is an experimental feature and especially the language specification (front-end) is still under development!
  • 3. Jörg Schad, PhD Head of Engineering and ML @ArangoDB ● Suki.ai ● Mesosphere ● Architect @SAP Hana ● PhD Distributed DB Systems ● Twitter: @joerg_schad
  • 4. 4 Heiko Kernbach Core Engineer (Graphs Team) @ ● Graph ● Custom Pregel ● Geo / UI ● Twitter: @hkernbach ● Slack: hkernbach.ArangoDB
  • 5. 5 ● Open Source ● Beyond Graph Database ○ Stores, K/V, Documents connected by scalable Graph Processing ● Scalable ○ Distributed Graphs ● AQL - SQL-like multi-model query language ● ACID Transactions including Multi Collection Transactions
  • 7. https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/ Pregel Max Value While not converged: Communicate: send own value to neighbours Compute: Own value = Max Value from all messages (+ own value) Superstep
  • 8. ArangoDB and Pregel: Status Quo ● https://www.arangodb.com/docs/stable/graphs-pregel.html ● https://www.arangodb.com/pregel-community-detection/ Available Algorithms ● Page Rank ● Seeded PageRank ● Single-Source Shortest Path ● Connected Components ○ Component ○ WeaklyConnected ○ StronglyConnected ● Hyperlink-Induced Topic Search (HITS)Permalink ● Vertex Centrality ● Effective Closeness ● LineRank ● Label Propagation ● Speaker-Listener Label Propagation 8 var pregel = require("@arangodb/pregel"); pregel.start("pagerank", "graphname", {maxGSS: 100, threshold: 0.00000001, resultField: "rank"}) ● Pregel support since 2014 ● Predefined algorithms ○ Could be extended via C++ ● Same platform used for PPA Challenges Add and modify Algorithms
  • 9. Programmable Pregel Algorithms (PPA) const pregel = require("@arangodb/pregel"); let pregelID = pregel.start("air", graphName, "<custom-algorithm>"); var status = pregel.status(pregelID); ● Add/Modify algorithms on-the-fly ○ Without C++ code ○ Without restarting the Database ● Efficiency (as Pregel) depends on Sharding ○ Smart Graphs ○ Required: Collocation of vertices and edges 9
  • 10. Custom Algorithm 10 { "resultField": "<string>", "maxGSS": "<number>", "dataAccess": { "writeVertex": "<program>", "readVertex": "<array>", "readEdge": "<array>" }, "vertexAccumulators": "<object>", "globalAccumulators": "<object>", "customAccumulators": "<object>", "phases": "<array>" } Accumulators Accumulators are used to consume and process messages which are being sent to them during the computational phase (initProgram, updateProgram, onPreStep, onPostStep) of a superstep. After a superstep is done, all messages will be processed. ● max: stores the maximum of all messages received. ● min: stores the minimum of all messages received. ● sum: sums up all messages received. ● and: computes and on all messages received. ● or: computes or and all messages received. ● store: holds the last received value (non-deterministic). ● list: stores all received values in list (order is non-deterministic). ● custom
  • 11. Custom Algorithm 11 { "resultField": "<string>", "maxGSS": "<number>", "dataAccess": { "writeVertex": "<program>", "readVertex": "<array>", "readEdge": "<array>" }, "vertexAccumulators": "<object>", "globalAccumulators": "<object>", "customAccumulators": "<object>", "phases": "<array>" } ● resultField (string, optional): Name of the document attribute to store the result in. The vertex computation results will be in all vertices pointing to the given attribute. ● maxGSS (number, required): The max amount of global supersteps After the amount of max defined supersteps is reached, the Pregel execution will stop. ● dataAccess (object, optional): Allows to define writeVertex, readVertex and readEdge. ○ writeVertex: A program that is used to write the results into vertices. If writeVertex is used, the resultField will be ignored. ○ readVertex: An array that consists of strings and/or additional arrays (that represents a path). ■ string: Represents a single attribute at the top level. ■ array of strings: Represents a nested path ○ readEdge: An array that consists of strings and/or additional arrays (that represents a path). ■ string: Represents a single path at the top level which is not nested. ■ array of strings: Represents a nested path ● vertexAccumulators (object, optional): Definition of all used vertex accumulators. ● globalAccumulators (object, optional): Definition all used global accumulators. Global Accumulators are able to access variables at shared global level. ● customAccumulators (object, optional): Definition of all used custom accumulators. ● phases (array): Array of a single or multiple phase definitions. ● debug (optional): See Debugging.
  • 12. Phases - Execution order 12 Step 1: Initialization 1. onPreStep (Conductor, executed on Coordinator instances) 2. initProgram (Worker, executed on DB-Server instances) 3. onPostStep (Conductor) Step {2, ...n} Computation 1. onPreStep (Conductor) 2. updateProgram (Worker) 3. onPostStep (Conductor)
  • 13. Program - Arango Intermediate Representation (AIR) 13
  • 14. Program - Arango Intermediate Representation (AIR) Lisp-like intermediate representation, represented in JSON and supports its data types 14 Specification ● Language Primitives ○ Basic Algebraic Operators ○ Logical operators ○ Comparison operators ○ Lists ○ Sort ○ Dicts ○ Lambdas ○ Reduce ○ Utilities ○ Functional ○ Variables ○ Debug operators ● Math Library ● Special Form ○ let statement ○ seq statement ○ if statement ○ match statement ○ for-each statement ○ quote and quote-splice statements ○ quasi-quote, unquote and unquote-splice statements ○ cons statement ○ and and or statements
  • 15. Program - Arango Intermediate Representation (AIR) Lisp-like intermediate representation, represented in JSON and supports its data types 15 Specification ● Language Primitives ○ Basic Algebraic Operators ○ Logical operators ○ Comparison operators ○ Lists ○ Sort ○ Dicts ○ Lambdas ○ Reduce ○ Utilities ○ Functional ○ Variables ○ Debug operators ● Math Library ● Special Form ○ let statement ○ seq statement ○ if statement ○ match statement ○ for-each statement ○ quote and quote-splice statements ○ quasi-quote, unquote and unquote-splice statements ○ cons statement ○ and and or statements
  • 16. Pregelator Simple Foxx service based IDE 16https://github.com/arangodb-foxx/pregelator
  • 18. PPA: What is next? - Gather Feedback - In particular use-cases - Missing functions & functionality - User-friendly Front-End language - Improve Scale/Performance of underlying Pregel platform - Algorithm library - Blog Post (including Jupyter example) 18 ArangoDB 3.8 (end of year) - Experimental Feature - Initial Library ArangoDB 3.9 (Q1 21) - Draft for Front-End - Extended Library - Platform Improvements ArangoDB 4.0 (Mid 21) - GA
  • 19. Pregel vs AQL When to (not) use Pregel… - Can the algorithm be efficiently be expressed in Pregel? - Counter example: Topological Sort - Is the graph size worth the loading? 19 AQL Pregel All Models (Graph, Document, Key-Value, Search, …) Iterative Graph Processing Online Queries Large Graphs, multiple iterations
  • 20. How can I start? ● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3 ● Check existing algorithms ● Preview documentation ● Give Feedback ○ https://slack.arangodb.com/ -> custom-pregel 20
  • 21. Thanks for listening! 21 Reach out with Feedback/Questions! • @arangodb • https://www.arangodb.com/ • docker pull arangodb Test-drive Oasis 14-days for free