SlideShare a Scribd company logo
Feature Preview: Custom Pregel
Complex Graph Algorithms made Easy
@arangodb @joerg_schad @hkernbach
2
tl;dr
● “Many practical computing problems concern large
graphs.”
● ArangoDB is a “Beyond Graph Database”
supporting multiple data models around a scalable
graph foundation
● Pregel is a framework for distributed graph
processing
○ ArangoDB supports predefined Prgel algorithms, e.g.
PageRank, Single-Source Shortest Path and Connected
components.
● Programmable Pregel Algorithms (PPA) allows
adding/modifying algorithms on the flight
Disclaimer
This is an experimental
feature and especially the
language specification
(front-end) is still under
development!
Jörg Schad, PhD
Head of Engineering and ML
@ArangoDB
● Suki.ai
● Mesosphere
● Architect @SAP Hana
● PhD Distributed DB
Systems
● Twitter: @joerg_schad
4
Heiko Kernbach
Core Engineer (Graphs Team)
@
● Graph
● Custom Pregel
● Geo / UI
● Twitter: @hkernbach
● Slack:
hkernbach.ArangoDB
5
● Open Source
● Beyond Graph Database
○ Stores, K/V, Documents connected by
scalable Graph Processing
● Scalable
○ Distributed Graphs
● AQL - SQL-like multi-model query language
● ACID Transactions including Multi Collection
Transactions
https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
Pregel Max Value
While not converged:
Communicate: send own value to neighbours
Compute: Own value = Max Value from all messages (+ own value) Superstep
ArangoDB and Pregel: Status Quo
● https://www.arangodb.com/docs/stable/graphs-pregel.html
● https://www.arangodb.com/pregel-community-detection/
Available Algorithms
● Page Rank
● Seeded PageRank
● Single-Source Shortest Path
● Connected Components
○ Component
○ WeaklyConnected
○ StronglyConnected
● Hyperlink-Induced Topic Search
(HITS)Permalink
● Vertex Centrality
● Effective Closeness
● LineRank
● Label Propagation
● Speaker-Listener Label Propagation 8
var pregel = require("@arangodb/pregel");
pregel.start("pagerank", "graphname", {maxGSS: 100,
threshold: 0.00000001, resultField: "rank"})
● Pregel support since 2014
● Predefined algorithms
○ Could be extended via C++
● Same platform used for PPA
Challenges
Add and modify Algorithms
Programmable Pregel Algorithms (PPA)
const pregel = require("@arangodb/pregel");
let pregelID = pregel.start("air", graphName, "<custom-algorithm>");
var status = pregel.status(pregelID);
● Add/Modify algorithms on-the-fly
○ Without C++ code
○ Without restarting the Database
● Efficiency (as Pregel) depends on Sharding
○ Smart Graphs
○ Required: Collocation of vertices and edges
9
Custom Algorithm
10
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
Accumulators
Accumulators are used to consume and process messages which are being
sent to them during the computational phase (initProgram, updateProgram,
onPreStep, onPostStep) of a superstep. After a superstep is done, all messages
will be processed.
● max: stores the maximum of all messages received.
● min: stores the minimum of all messages received.
● sum: sums up all messages received.
● and: computes and on all messages received.
● or: computes or and all messages received.
● store: holds the last received value (non-deterministic).
● list: stores all received values in list (order is non-deterministic).
● custom
Custom Algorithm
11
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
● resultField (string, optional): Name of the document attribute to store the result in. The
vertex computation results will be in all vertices pointing to the given attribute.
● maxGSS (number, required): The max amount of global supersteps After the amount of max
defined supersteps is reached, the Pregel execution will stop.
● dataAccess (object, optional): Allows to define writeVertex, readVertex and readEdge.
○ writeVertex: A program that is used to write the results into vertices. If writeVertex is
used, the resultField will be ignored.
○ readVertex: An array that consists of strings and/or additional arrays (that represents
a path).
■ string: Represents a single attribute at the top level.
■ array of strings: Represents a nested path
○ readEdge: An array that consists of strings and/or additional arrays (that represents
a path).
■ string: Represents a single path at the top level which is not nested.
■ array of strings: Represents a nested path
● vertexAccumulators (object, optional): Definition of all used vertex accumulators.
● globalAccumulators (object, optional): Definition all used global accumulators. Global
Accumulators are able to access variables at shared global level.
● customAccumulators (object, optional): Definition of all used custom accumulators.
● phases (array): Array of a single or multiple phase definitions.
● debug (optional): See Debugging.
Phases - Execution order
12
Step 1: Initialization
1. onPreStep (Conductor, executed on Coordinator
instances)
2. initProgram (Worker, executed on DB-Server instances)
3. onPostStep (Conductor)
Step {2, ...n} Computation
1. onPreStep (Conductor)
2. updateProgram (Worker)
3. onPostStep (Conductor)
Program - Arango Intermediate Representation (AIR)
13
Program - Arango Intermediate Representation (AIR)
Lisp-like intermediate representation, represented in
JSON and supports its data types
14
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Form
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice
statements
○ quasi-quote, unquote and
unquote-splice statements
○ cons statement
○ and and or statements
Program - Arango Intermediate Representation (AIR)
Lisp-like intermediate representation,
represented in JSON and supports its data types
15
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Form
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice
statements
○ quasi-quote, unquote and
unquote-splice statements
○ cons statement
○ and and or statements
Pregelator
Simple Foxx service based IDE
16https://github.com/arangodb-foxx/pregelator
Custom Pregel Algorithms in ArangoDB
PPA: What is next?
- Gather Feedback
- In particular use-cases
- Missing functions & functionality
- User-friendly Front-End language
- Improve Scale/Performance of underlying
Pregel platform
- Algorithm library
- Blog Post (including Jupyter example)
18
ArangoDB 3.8 (end of year)
- Experimental Feature
- Initial Library
ArangoDB 3.9 (Q1 21)
- Draft for Front-End
- Extended Library
- Platform Improvements
ArangoDB 4.0 (Mid 21)
- GA
Pregel vs AQL
When to (not) use Pregel…
- Can the algorithm be efficiently be
expressed in Pregel?
- Counter example: Topological Sort
- Is the graph size worth the loading?
19
AQL Pregel
All Models (Graph, Document, Key-Value, Search, …) Iterative Graph Processing
Online Queries Large Graphs, multiple iterations
How can I start?
● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3
● Check existing algorithms
● Preview documentation
● Give Feedback
○ https://slack.arangodb.com/ -> custom-pregel
20
Thanks for listening!
21
Reach out with Feedback/Questions!
• @arangodb
• https://www.arangodb.com/
• docker pull arangodb
Test-drive Oasis
14-days for free

More Related Content

Similar to Custom Pregel Algorithms in ArangoDB (20)

Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
Apache Apex
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
Yogi Devendra Vyavahare
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
James Tan
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
Mao Geng
 
Oracle to Postgres Schema Migration Hustle
Oracle to Postgres Schema Migration HustleOracle to Postgres Schema Migration Hustle
Oracle to Postgres Schema Migration Hustle
EDB
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
Lucian Neghina
 
Java 8
Java 8Java 8
Java 8
vilniusjug
 
Dart the better Javascript 2015
Dart the better Javascript 2015Dart the better Javascript 2015
Dart the better Javascript 2015
Jorg Janke
 
BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2 BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2
Linaro
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
Petr Zapletal
 
CS267_Graph_Lab
CS267_Graph_LabCS267_Graph_Lab
CS267_Graph_Lab
JaideepKatkar
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
Omid Vahdaty
 
Apache Hive for modern DBAs
Apache Hive for modern DBAsApache Hive for modern DBAs
Apache Hive for modern DBAs
Luis Marques
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17
Daniel Eriksson
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
fnothaft
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
Martin Zapletal
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream API
Apache Apex
 
Tech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboardsTech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboards
Appsilon Data Science
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
datamantra
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Ran Silberman
 
Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
Apache Apex
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
Yogi Devendra Vyavahare
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
James Tan
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
Mao Geng
 
Oracle to Postgres Schema Migration Hustle
Oracle to Postgres Schema Migration HustleOracle to Postgres Schema Migration Hustle
Oracle to Postgres Schema Migration Hustle
EDB
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
Lucian Neghina
 
Dart the better Javascript 2015
Dart the better Javascript 2015Dart the better Javascript 2015
Dart the better Javascript 2015
Jorg Janke
 
BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2 BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2
Linaro
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
Petr Zapletal
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
Omid Vahdaty
 
Apache Hive for modern DBAs
Apache Hive for modern DBAsApache Hive for modern DBAs
Apache Hive for modern DBAs
Luis Marques
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17
Daniel Eriksson
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
fnothaft
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
Martin Zapletal
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream API
Apache Apex
 
Tech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboardsTech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboards
Appsilon Data Science
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
datamantra
 

More from ArangoDB Database (20)

ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
ArangoDB Database
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB Database
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
ArangoDB Database
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
ArangoDB Database
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
ArangoDB Database
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
ArangoDB Database
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB Database
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
ArangoDB Database
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB Database
 
3.5 webinar
3.5 webinar 3.5 webinar
3.5 webinar
ArangoDB Database
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
ArangoDB Database
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
ArangoDB Database
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
ArangoDB Database
 
Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?
ArangoDB Database
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
ArangoDB Database
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB Database
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
ArangoDB Database
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
ArangoDB Database
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
ArangoDB Database
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
ArangoDB Database
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB Database
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
ArangoDB Database
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB Database
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
ArangoDB Database
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
ArangoDB Database
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
ArangoDB Database
 
Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?
ArangoDB Database
 
Ad

Recently uploaded (20)

Ch01_Introduction_to_Information_Securit
Ch01_Introduction_to_Information_SecuritCh01_Introduction_to_Information_Securit
Ch01_Introduction_to_Information_Securit
KawukiDerrick
 
Chapter 5.1.pptxsertj you can get it done before the election and I will
Chapter 5.1.pptxsertj you can get it done before the election and I willChapter 5.1.pptxsertj you can get it done before the election and I will
Chapter 5.1.pptxsertj you can get it done before the election and I will
SotheaPheng
 
Tableau Finland User Group June 2025.pdf
Tableau Finland User Group June 2025.pdfTableau Finland User Group June 2025.pdf
Tableau Finland User Group June 2025.pdf
elinavihriala
 
BODMAS-Rule-&-Unit-Digit-Concept-pdf.pdf
BODMAS-Rule-&-Unit-Digit-Concept-pdf.pdfBODMAS-Rule-&-Unit-Digit-Concept-pdf.pdf
BODMAS-Rule-&-Unit-Digit-Concept-pdf.pdf
SiddharthSean
 
Mining Presentation Online Courses for Student
Mining Presentation Online Courses for StudentMining Presentation Online Courses for Student
Mining Presentation Online Courses for Student
Rizki229625
 
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays
 
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays
 
apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...
apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...
apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...
apidays
 
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdfAG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
Anass Nabil
 
FYE 2025 Business Results and FYE 2026 Management Plan.pptx
FYE 2025 Business Results and FYE 2026 Management Plan.pptxFYE 2025 Business Results and FYE 2026 Management Plan.pptx
FYE 2025 Business Results and FYE 2026 Management Plan.pptx
Herianto80
 
1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf
1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf
1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf
elinavihriala
 
Али махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptxАли махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptx
palr19411
 
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays
 
apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...
apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...
apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...
apidays
 
"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov
"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov
"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov
Fwdays
 
Math arihant handbook.pdf all formula is here
Math arihant handbook.pdf all formula is hereMath arihant handbook.pdf all formula is here
Math arihant handbook.pdf all formula is here
rdarshankumar84
 
LONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptx
LONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptxLONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptx
LONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptx
vemuripraveena2622
 
原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理
原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理
原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理
Taqyea
 
egc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPTegc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPT
huyenmy200809
 
apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...
apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...
apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...
apidays
 
Ch01_Introduction_to_Information_Securit
Ch01_Introduction_to_Information_SecuritCh01_Introduction_to_Information_Securit
Ch01_Introduction_to_Information_Securit
KawukiDerrick
 
Chapter 5.1.pptxsertj you can get it done before the election and I will
Chapter 5.1.pptxsertj you can get it done before the election and I willChapter 5.1.pptxsertj you can get it done before the election and I will
Chapter 5.1.pptxsertj you can get it done before the election and I will
SotheaPheng
 
Tableau Finland User Group June 2025.pdf
Tableau Finland User Group June 2025.pdfTableau Finland User Group June 2025.pdf
Tableau Finland User Group June 2025.pdf
elinavihriala
 
BODMAS-Rule-&-Unit-Digit-Concept-pdf.pdf
BODMAS-Rule-&-Unit-Digit-Concept-pdf.pdfBODMAS-Rule-&-Unit-Digit-Concept-pdf.pdf
BODMAS-Rule-&-Unit-Digit-Concept-pdf.pdf
SiddharthSean
 
Mining Presentation Online Courses for Student
Mining Presentation Online Courses for StudentMining Presentation Online Courses for Student
Mining Presentation Online Courses for Student
Rizki229625
 
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays
 
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays
 
apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...
apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...
apidays New York 2025 - Spring Modulith Design for Microservices by Renjith R...
apidays
 
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdfAG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
AG-FIRMA FINCOME ARTICLE AI AGENT RAG.pdf
Anass Nabil
 
FYE 2025 Business Results and FYE 2026 Management Plan.pptx
FYE 2025 Business Results and FYE 2026 Management Plan.pptxFYE 2025 Business Results and FYE 2026 Management Plan.pptx
FYE 2025 Business Results and FYE 2026 Management Plan.pptx
Herianto80
 
1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf
1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf
1022_ExtendEnrichExcelUsingPythonWithTableau_04_16+04_17 (1).pdf
elinavihriala
 
Али махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptxАли махмуд to The teacm of ghsbh to fortune .pptx
Али махмуд to The teacm of ghsbh to fortune .pptx
palr19411
 
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays New York 2025 - Why an SDK is Needed to Protect APIs from Mobile Apps...
apidays
 
apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...
apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...
apidays New York 2025 - The Challenge is Not the Pattern, But the Best Integr...
apidays
 
"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov
"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov
"Machine Learning in Agriculture: 12 Production-Grade Models", Danil Polyakov
Fwdays
 
Math arihant handbook.pdf all formula is here
Math arihant handbook.pdf all formula is hereMath arihant handbook.pdf all formula is here
Math arihant handbook.pdf all formula is here
rdarshankumar84
 
LONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptx
LONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptxLONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptx
LONGSEM2024-25_CSE3015_ETH_AP2024256000125_Reference-Material-I.pptx
vemuripraveena2622
 
原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理
原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理
原版英国威尔士三一圣大卫大学毕业证(UWTSD毕业证书)如何办理
Taqyea
 
egc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPTegc.pdf tài liệu tiếng Anh cho học sinh THPT
egc.pdf tài liệu tiếng Anh cho học sinh THPT
huyenmy200809
 
apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...
apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...
apidays New York 2025 - Building Agentic Workflows with FDC3 Intents by Nick ...
apidays
 
Ad

Custom Pregel Algorithms in ArangoDB

  • 1. Feature Preview: Custom Pregel Complex Graph Algorithms made Easy @arangodb @joerg_schad @hkernbach
  • 2. 2 tl;dr ● “Many practical computing problems concern large graphs.” ● ArangoDB is a “Beyond Graph Database” supporting multiple data models around a scalable graph foundation ● Pregel is a framework for distributed graph processing ○ ArangoDB supports predefined Prgel algorithms, e.g. PageRank, Single-Source Shortest Path and Connected components. ● Programmable Pregel Algorithms (PPA) allows adding/modifying algorithms on the flight Disclaimer This is an experimental feature and especially the language specification (front-end) is still under development!
  • 3. Jörg Schad, PhD Head of Engineering and ML @ArangoDB ● Suki.ai ● Mesosphere ● Architect @SAP Hana ● PhD Distributed DB Systems ● Twitter: @joerg_schad
  • 4. 4 Heiko Kernbach Core Engineer (Graphs Team) @ ● Graph ● Custom Pregel ● Geo / UI ● Twitter: @hkernbach ● Slack: hkernbach.ArangoDB
  • 5. 5 ● Open Source ● Beyond Graph Database ○ Stores, K/V, Documents connected by scalable Graph Processing ● Scalable ○ Distributed Graphs ● AQL - SQL-like multi-model query language ● ACID Transactions including Multi Collection Transactions
  • 7. https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/ Pregel Max Value While not converged: Communicate: send own value to neighbours Compute: Own value = Max Value from all messages (+ own value) Superstep
  • 8. ArangoDB and Pregel: Status Quo ● https://www.arangodb.com/docs/stable/graphs-pregel.html ● https://www.arangodb.com/pregel-community-detection/ Available Algorithms ● Page Rank ● Seeded PageRank ● Single-Source Shortest Path ● Connected Components ○ Component ○ WeaklyConnected ○ StronglyConnected ● Hyperlink-Induced Topic Search (HITS)Permalink ● Vertex Centrality ● Effective Closeness ● LineRank ● Label Propagation ● Speaker-Listener Label Propagation 8 var pregel = require("@arangodb/pregel"); pregel.start("pagerank", "graphname", {maxGSS: 100, threshold: 0.00000001, resultField: "rank"}) ● Pregel support since 2014 ● Predefined algorithms ○ Could be extended via C++ ● Same platform used for PPA Challenges Add and modify Algorithms
  • 9. Programmable Pregel Algorithms (PPA) const pregel = require("@arangodb/pregel"); let pregelID = pregel.start("air", graphName, "<custom-algorithm>"); var status = pregel.status(pregelID); ● Add/Modify algorithms on-the-fly ○ Without C++ code ○ Without restarting the Database ● Efficiency (as Pregel) depends on Sharding ○ Smart Graphs ○ Required: Collocation of vertices and edges 9
  • 10. Custom Algorithm 10 { "resultField": "<string>", "maxGSS": "<number>", "dataAccess": { "writeVertex": "<program>", "readVertex": "<array>", "readEdge": "<array>" }, "vertexAccumulators": "<object>", "globalAccumulators": "<object>", "customAccumulators": "<object>", "phases": "<array>" } Accumulators Accumulators are used to consume and process messages which are being sent to them during the computational phase (initProgram, updateProgram, onPreStep, onPostStep) of a superstep. After a superstep is done, all messages will be processed. ● max: stores the maximum of all messages received. ● min: stores the minimum of all messages received. ● sum: sums up all messages received. ● and: computes and on all messages received. ● or: computes or and all messages received. ● store: holds the last received value (non-deterministic). ● list: stores all received values in list (order is non-deterministic). ● custom
  • 11. Custom Algorithm 11 { "resultField": "<string>", "maxGSS": "<number>", "dataAccess": { "writeVertex": "<program>", "readVertex": "<array>", "readEdge": "<array>" }, "vertexAccumulators": "<object>", "globalAccumulators": "<object>", "customAccumulators": "<object>", "phases": "<array>" } ● resultField (string, optional): Name of the document attribute to store the result in. The vertex computation results will be in all vertices pointing to the given attribute. ● maxGSS (number, required): The max amount of global supersteps After the amount of max defined supersteps is reached, the Pregel execution will stop. ● dataAccess (object, optional): Allows to define writeVertex, readVertex and readEdge. ○ writeVertex: A program that is used to write the results into vertices. If writeVertex is used, the resultField will be ignored. ○ readVertex: An array that consists of strings and/or additional arrays (that represents a path). ■ string: Represents a single attribute at the top level. ■ array of strings: Represents a nested path ○ readEdge: An array that consists of strings and/or additional arrays (that represents a path). ■ string: Represents a single path at the top level which is not nested. ■ array of strings: Represents a nested path ● vertexAccumulators (object, optional): Definition of all used vertex accumulators. ● globalAccumulators (object, optional): Definition all used global accumulators. Global Accumulators are able to access variables at shared global level. ● customAccumulators (object, optional): Definition of all used custom accumulators. ● phases (array): Array of a single or multiple phase definitions. ● debug (optional): See Debugging.
  • 12. Phases - Execution order 12 Step 1: Initialization 1. onPreStep (Conductor, executed on Coordinator instances) 2. initProgram (Worker, executed on DB-Server instances) 3. onPostStep (Conductor) Step {2, ...n} Computation 1. onPreStep (Conductor) 2. updateProgram (Worker) 3. onPostStep (Conductor)
  • 13. Program - Arango Intermediate Representation (AIR) 13
  • 14. Program - Arango Intermediate Representation (AIR) Lisp-like intermediate representation, represented in JSON and supports its data types 14 Specification ● Language Primitives ○ Basic Algebraic Operators ○ Logical operators ○ Comparison operators ○ Lists ○ Sort ○ Dicts ○ Lambdas ○ Reduce ○ Utilities ○ Functional ○ Variables ○ Debug operators ● Math Library ● Special Form ○ let statement ○ seq statement ○ if statement ○ match statement ○ for-each statement ○ quote and quote-splice statements ○ quasi-quote, unquote and unquote-splice statements ○ cons statement ○ and and or statements
  • 15. Program - Arango Intermediate Representation (AIR) Lisp-like intermediate representation, represented in JSON and supports its data types 15 Specification ● Language Primitives ○ Basic Algebraic Operators ○ Logical operators ○ Comparison operators ○ Lists ○ Sort ○ Dicts ○ Lambdas ○ Reduce ○ Utilities ○ Functional ○ Variables ○ Debug operators ● Math Library ● Special Form ○ let statement ○ seq statement ○ if statement ○ match statement ○ for-each statement ○ quote and quote-splice statements ○ quasi-quote, unquote and unquote-splice statements ○ cons statement ○ and and or statements
  • 16. Pregelator Simple Foxx service based IDE 16https://github.com/arangodb-foxx/pregelator
  • 18. PPA: What is next? - Gather Feedback - In particular use-cases - Missing functions & functionality - User-friendly Front-End language - Improve Scale/Performance of underlying Pregel platform - Algorithm library - Blog Post (including Jupyter example) 18 ArangoDB 3.8 (end of year) - Experimental Feature - Initial Library ArangoDB 3.9 (Q1 21) - Draft for Front-End - Extended Library - Platform Improvements ArangoDB 4.0 (Mid 21) - GA
  • 19. Pregel vs AQL When to (not) use Pregel… - Can the algorithm be efficiently be expressed in Pregel? - Counter example: Topological Sort - Is the graph size worth the loading? 19 AQL Pregel All Models (Graph, Document, Key-Value, Search, …) Iterative Graph Processing Online Queries Large Graphs, multiple iterations
  • 20. How can I start? ● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3 ● Check existing algorithms ● Preview documentation ● Give Feedback ○ https://slack.arangodb.com/ -> custom-pregel 20
  • 21. Thanks for listening! 21 Reach out with Feedback/Questions! • @arangodb • https://www.arangodb.com/ • docker pull arangodb Test-drive Oasis 14-days for free