These are the slides from the webinar on custom Pregel algorithms in ArangoDB: https://youtu.be/DWJ-nWUxsO8. The webinar provides a brief introduction to the capabilities and use cases of Pregel.
In this session, Kevin will dive into the unique challenges of keeping your Kubernetes workloads highly available while keeping costs low. You will learn how to leverage cloud-native autoscaling, right-size pod resource requirements, define resource buffers, allocate costs, and more.
Exported PDF slides from our talk at PyData London 2016. The online version is available at http://pydata2016.cfapps.io/.
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg (Anant Corporation)
In this talk, Dremio Developer Advocate, Alex Merced, discusses strategies for migrating your existing data over to Apache Iceberg. He'll go over the following:
- How to migrate Hive, Delta Lake, JSON, and CSV sources to Apache Iceberg
- Pros and cons of an in-place or shadow migration
- Migrating between Apache Iceberg catalogs: Hive/Glue and Arctic/Nessie
The document discusses Pregel, a graph-parallel processing platform developed at Google for large-scale graph processing. Pregel is inspired by the bulk synchronous parallel (BSP) model and uses a vertex-centric programming model where computation is viewed as messages passed between graph vertices. In Pregel, applications run as a series of supersteps in which vertices can update themselves and pass messages to other vertices, with global synchronization between supersteps. This model is better suited to graph problems than more general data-parallel systems are.
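The superstep model described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not Google's implementation; the function name `pregel_max_value` and the dict-based graph format are assumptions. It uses the maximum-value propagation example from the Pregel paper: each loop iteration is one superstep, and replacing `inbox` with `outbox` plays the role of the global barrier, so messages sent in superstep S are only read in superstep S+1.

```python
from collections import defaultdict

def pregel_max_value(values, edges):
    """Propagate the maximum vertex value through a graph, Pregel style.

    values: dict vertex -> initial value (hypothetical input format)
    edges:  dict vertex -> list of out-neighbours
    Returns (final values, number of supersteps executed).
    """
    state = dict(values)
    # Superstep 0: every vertex is active and sends its value to its neighbours.
    inbox = defaultdict(list)
    for v in state:
        for w in edges.get(v, []):
            inbox[w].append(state[v])
    supersteps = 1
    # Each iteration is one superstep; only vertices with messages do work.
    while inbox:
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > state[v]:
                # Value changed: update local state and notify neighbours.
                state[v] = best
                for w in edges.get(v, []):
                    outbox[w].append(best)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1
    return state, supersteps
```

For example, on the undirected chain a-b-c-d with values 3, 6, 2, 1, every vertex ends up holding 6. A vertex that receives no messages does nothing, which mirrors Pregel's rule that only an incoming message reactivates a halted vertex.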
The document summarizes the Pregel paper, which introduces a system for large-scale graph processing. Pregel presents a programming model where computation proceeds through supersteps and messages are passed between vertices. It allows for writing graph algorithms in a simple, scalable, and fault-tolerant way. The paper describes Pregel's API, architecture, applications including PageRank and shortest paths, and experimental results showing it can process graphs with billions of vertices and edges.
This document discusses a lecture on Pregel, a framework for large-scale graph processing. It introduces Pregel and describes its computation model, API, execution process, and fault tolerance features. The key points covered are that Pregel is inspired by the Bulk Synchronous Parallel model and uses a vertex-centric programming model where computation occurs through message passing between vertices over supersteps. Fault tolerance is achieved through checkpointing state at each superstep.
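Checkpoint-based recovery can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not Pregel's actual mechanism: state is kept in memory rather than on persistent storage, only vertex values are saved (the Pregel paper also saves edge values and incoming messages), the `step` function is assumed to be deterministic, and `fail_at` is a hypothetical injected failure. A checkpoint is taken at the start of every `checkpoint_every`-th superstep; on failure, computation rolls back to the last checkpoint and repeats the lost supersteps.

```python
import copy

def run_with_checkpoints(state, step, num_supersteps, checkpoint_every, fail_at=None):
    """Run `step(state, superstep)` repeatedly, recovering once from a simulated failure.

    state: dict of vertex values; step must return a new state and be deterministic.
    fail_at: superstep at which a (hypothetical) worker failure is injected.
    """
    checkpoint = (0, copy.deepcopy(state))
    superstep, failed = 0, False
    while superstep < num_supersteps:
        if superstep % checkpoint_every == 0:
            # Checkpoint at the start of the superstep (in memory in this sketch).
            checkpoint = (superstep, copy.deepcopy(state))
        if superstep == fail_at and not failed:
            # Simulated failure: discard progress and reload the last checkpoint.
            failed = True
            superstep, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue
        state = step(state, superstep)
        superstep += 1
    return state
```

With `checkpoint_every=3` and a failure at superstep 5, supersteps 3 and 4 are recomputed and the final state matches a failure-free run. Checkpointing less often saves I/O but means more recomputation after a failure.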
This document discusses large-scale graph processing. The goal is to run graph algorithms such as shortest path on huge graphs with terabytes of data that cannot fit on one machine. It introduces Pregel, Google's graph processing system from 2010 that uses a vertex-centric programming model. Pregel partitions graphs across machines and executes synchronous iterations in which vertices send messages and perform computations to solve problems like PageRank. Several open-source systems built on this or closely related vertex-centric models now exist, such as Apache Giraph and GraphLab.
Processing large-scale graphs with Google(TM) Pregel (ArangoDB Database)
This document discusses processing large-scale graphs using Google's Pregel framework. It provides an overview of Pregel, including its map-reduce-like approach with multiple iterations. An example of using Pregel to calculate connected components in a graph is shown step by step. The document also discusses graph algorithms such as PageRank, bipartite matching, and shortest paths that can be implemented with Pregel, and examples of Pregel implementations in systems such as Giraph, TinkerPop and ArangoDB.
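The connected-components example can be sketched with minimum-label propagation (sometimes called HashMin), a common way to express this algorithm in Pregel; the slides may use a different variant. Assumptions: vertex ids are comparable, the graph is undirected and given as an edge list, and `connected_components` is a hypothetical name. Each vertex starts with its own id as its label, and in every superstep vertices whose label changed broadcast it; a vertex adopts the smallest label it receives.

```python
from collections import defaultdict

def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    edges: iterable of undirected (u, v) pairs with comparable ids.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    label = {v: v for v in nbrs}   # superstep 0: each vertex uses its own id
    active = set(nbrs)             # every vertex starts active
    while active:
        inbox = defaultdict(list)
        for v in active:           # active vertices broadcast their current label
            for w in nbrs[v]:
                inbox[w].append(label[v])
        active = set()
        for v, msgs in inbox.items():  # next superstep: read messages after the barrier
            smallest = min(msgs)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)      # only vertices whose label changed stay active
    return label
```

The number of supersteps grows with the graph's diameter, since a label travels one hop per superstep.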
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m... (NoSQLmatters)
Frank Celler – Processing large-scale graphs with Google(TM) Pregel
Many popular graph databases are optimized to run on a single machine, using efficient traversals to query the stored graphs. This boosts the performance of algorithms that start at a single vertex and iterate through the graph, e.g. finding shortest paths or neighbors. However, graphs are getting bigger, and traversals perform poorly when they require a large depth. If you need to distribute a large-scale graph across several machines, traversals are not the best choice performance-wise for processing it. Google therefore introduced its Pregel framework, which offers an environment for querying distributed graphs; Pregel is also known as the map-reduce for graphs. In this talk I want to present the architecture and requirements of the Pregel framework and introduce you to the different mindset required to write a Pregel algorithm. Furthermore, I will give a short introduction to three implementations of Pregel: Giraph, TinkerPop3 and ArangoDB.
The document outlines an agenda for a workshop on ArangoDB and Ashikawa. The agenda includes introducing ArangoDB, installing it, performing CRUD operations, using the query language, and building a small example with the Ruby driver Ashikawa. It also provides information on importing data and performing queries on ArangoDB.
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices, trillions of edges) poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
Introducing Apache Giraph for Large Scale Graph Processing (sscdotopen)
This document introduces Apache Giraph, an open source implementation of Google's Pregel framework for large scale graph processing. Giraph allows for distributed graph computation using the bulk synchronous parallel (BSP) model. Key points:
- Giraph uses the vertex-centric programming model where computation is defined in terms of messages passed between vertices.
- It runs on Hadoop and uses its master-slave architecture, with the master coordinating workers that hold vertex partitions.
- PageRank is given as an example algorithm, where each vertex computes its rank based on messages from its neighbors in each superstep until convergence.
- Giraph handles fault tolerance, uses ZooKeeper for coordination, and allows graph algorithms
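The PageRank example from the list above can be sketched in vertex-centric style. This is an illustrative single-process version, not Giraph's API; `pagerank` and its parameters are hypothetical. It follows the update used in the Pregel paper, rank = (1 - d)/N + d * (sum of incoming messages) with d = 0.85, but stops on a tolerance instead of after a fixed number of supersteps, and it assumes every vertex has at least one outgoing edge (no dangling vertices).

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, max_supersteps=200):
    """Vertex-centric PageRank over a dict of out-edges.

    out_edges: dict vertex -> list of out-neighbours; every vertex must be a key
    and have at least one out-edge (this sketch does not handle dangling vertices).
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(max_supersteps):
        # Each vertex sends rank / out_degree along every outgoing edge.
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets)
            for w in targets:
                inbox[w] += share  # summing on arrival acts like a sum combiner
        new_rank = {v: (1 - damping) / n + damping * inbox[v] for v in out_edges}
        # Global convergence test; a Pregel system could compute this with an aggregator.
        delta = sum(abs(new_rank[v] - rank[v]) for v in out_edges)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

On a three-vertex cycle every vertex keeps rank 1/3; on less symmetric graphs, vertices with more incoming rank end up higher, and the ranks still sum to 1 because no rank leaks through dangling vertices.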
GraphX is Apache Spark's API for graph-parallel computation. This talk will briefly introduce the concept of a graph and illustrate some of the problems that can be modeled with it. It will then present the GraphX API, which will be used to illustrate the practical solution of a problem.
Pregel is a system for large-scale graph processing that was developed by Google. It provides a scalable and fault-tolerant platform for graph algorithms using the bulk synchronous parallel (BSP) model. In Pregel, computation is expressed as a series of iterations called supersteps where each vertex performs computation and sends messages to other vertices. This vertex-centric approach allows graph algorithms to be naturally expressed by focusing on local operations. Pregel was designed for scalability across thousands of machines and provides features like checkpointing and recovery for fault tolerance. It has been used for applications such as PageRank, shortest paths, and clustering on large graphs with billions of vertices and edges.
Large Scale Graph Processing with Apache Giraph (sscdotopen)
This document summarizes a talk on large scale graph processing using Apache Giraph. It begins with an introduction of the speaker and their research interests. It then provides an overview of graphs and challenges with graph processing using Hadoop/MapReduce. It describes Google's Pregel framework for graph processing and how Apache Giraph is an open source implementation of Pregel. Example graph algorithms like PageRank and connected components are demonstrated in Giraph. Experimental results show Giraph providing a 10x performance improvement over Hadoop for PageRank. The talk concludes that many problems can be modeled as networks and solved using graph processing frameworks like Giraph.
Pregel: A System For Large Scale Graph Processing (Riyad Parvez)
Pregel is a distributed system for large-scale graph processing that uses a vertex-centric programming model based on the Bulk Synchronous Parallel (BSP) model. In Pregel's message-passing model, computations are organized into supersteps in which each vertex performs computations and sends messages to other vertices. A barrier synchronization occurs between supersteps. Pregel provides fault tolerance through checkpointing, as well as the ability to dynamically mutate graph topology during processing. The paper demonstrates that Pregel can efficiently process large graphs and that computation scales nearly linearly with the size of the graph.
Processing large-scale graphs with Google Pregel (Max Neunhöffer)
Graphs are a very popular data structure for storing relations such as friendships, or web pages and their links. Graph databases have therefore become popular recently, and some of them even allow sharding, i.e. automatic distribution of the data across multiple machines.
On the other hand, very computation-intensive graph algorithms are known and used in practice, and they often access very large data sets, which leads to heavy communication loads.
It is therefore natural to run such graph algorithms on the database servers, close to the data, making use of the computational power of the storage nodes.
Google's Pregel framework makes it possible to implement many graph algorithms in one general system and plays a role similar to the map-reduce skeleton, but for graphs.
In this talk I will explain the framework and describe its implementation in the multi-model database ArangoDB.
The document discusses Pregel, a system for large-scale graph processing. Pregel uses a message passing model where computation is organized into supersteps. In each superstep, vertices can send messages to other vertices and modify their own state. The document also discusses Giraph, an open source implementation of Pregel built on Hadoop. Giraph runs as a single Map-only job to avoid disk I/O between supersteps. It uses a master node to coordinate workers and assign graph partitions.
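The partition assignment mentioned above can be sketched as follows. The Pregel paper describes hash(ID) mod N as the default partitioning function; the round-robin mapping of partitions to workers is a hypothetical master-side policy chosen for illustration, not Giraph's documented behaviour. CRC32 is used instead of Python's built-in `hash()` because string hashing in Python is randomized per process, and a partitioner must give every worker the same answer.

```python
import zlib
from collections import defaultdict

def partition_vertices(vertex_ids, num_partitions):
    """Assign each vertex to partition hash(id) mod N, using a stable CRC32 hash."""
    parts = defaultdict(list)
    for vid in vertex_ids:
        parts[zlib.crc32(str(vid).encode()) % num_partitions].append(vid)
    return dict(parts)

def assign_partitions(num_partitions, workers):
    """Hypothetical master policy: map partitions to workers round-robin."""
    return {p: workers[p % len(workers)] for p in range(num_partitions)}
```

Using more partitions than workers lets the master move whole partitions between workers later without rehashing individual vertices.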
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing (Zuhair khayyat)
- Mizan is a system for dynamic load balancing in large-scale graph processing using the Pregel framework. It monitors runtime characteristics of vertices and performs efficient fine-grained vertex migration to balance computation and communication across workers.
- Existing Pregel implementations focus on static graph partitioning, but this is insufficient for highly dynamic algorithms whose workload changes frequently. Mizan adapts by migrating vertices as needed.
- In an evaluation on a 21-machine cluster, Mizan provided up to 84% improvement over static partitioning techniques, and reduced overhead by 40% even with inefficient initial partitioning. It also demonstrated linear scalability on a 1024-CPU supercomputer.
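The idea of runtime vertex migration can be illustrated with a simple greedy rebalancer. This is explicitly NOT Mizan's published algorithm; it is a hypothetical sketch that assumes per-vertex costs (for example, messages sent plus received in the last superstep) have already been measured, and that `threshold` marks a worker as overloaded when its load exceeds that multiple of the mean.

```python
def plan_migrations(worker_load, vertex_cost, placement, threshold=1.1):
    """Greedy sketch: move costly vertices off overloaded workers.

    worker_load: dict worker -> total measured cost in the last superstep
    vertex_cost: dict vertex -> measured cost of that vertex
    placement:   dict vertex -> worker currently holding it
    Returns a list of (vertex, from_worker, to_worker) moves.
    """
    load = dict(worker_load)
    mean = sum(load.values()) / len(load)
    moves = []
    # Consider the most expensive vertices first.
    for v in sorted(vertex_cost, key=vertex_cost.get, reverse=True):
        src = placement[v]
        dst = min(load, key=load.get)  # currently least-loaded worker
        cost = vertex_cost[v]
        # Move only if the source is overloaded and the target stays at or below the mean.
        if load[src] > threshold * mean and load[dst] + cost <= mean:
            load[src] -= cost
            load[dst] += cost
            moves.append((v, src, dst))
    return moves
```

For instance, with loads of 8 and 2 on two workers, moving a cost-3 vertex balances both at 5, while a cost-4 vertex is left in place because moving it would overload the target.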
The document discusses graph algorithms and their implementation using MapReduce. It describes how transitive closure, PageRank, and other graph algorithms can be computed in a distributed manner using MapReduce. While graph processing with MapReduce has challenges, systems like Pregel and Apache Hamburg aim to provide easier programming models for graph algorithms on large datasets.
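Transitive closure in a MapReduce style can be sketched as repeated joins, where each round corresponds to one job. This is an illustration under assumptions (in-memory sets instead of a distributed job, hypothetical function name), not the document's exact algorithm. The map phase keys every known path by the vertex where it could be joined, and the reduce phase joins paths (x to k) with (k to y). Because the full closure is joined with itself, path lengths double each round, so roughly log2(diameter) rounds suffice; iteration stops at a fixpoint.

```python
from collections import defaultdict

def transitive_closure(edges):
    """Transitive closure of a directed graph given as (source, target) pairs."""
    closure = set(edges)
    while True:
        # Map phase: key each path by the vertex where two paths can be joined.
        groups = defaultdict(lambda: ([], []))
        for x, y in closure:
            groups[y][0].append(x)  # path ending at y
            groups[x][1].append(y)  # path starting at x
        # Reduce phase: for each join key k, combine (x -> k) with (k -> y).
        new_pairs = {(x, y) for ins, outs in groups.values() for x in ins for y in outs}
        if new_pairs <= closure:  # fixpoint: this round found nothing new
            return closure
        closure |= new_pairs
```

Each round shuffles the whole closure, which is exactly the repeated data movement that makes iterative graph algorithms expensive on plain MapReduce.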
This document discusses using graphs and graph databases for machine learning. It provides an overview of graph analytics algorithms that can be used to solve problems with graph data, including recommendations, fraud detection, and network analysis. It also discusses using graph embeddings and graph neural networks for tasks like node classification and link prediction. Finally, it discusses how graphs can be used for machine learning infrastructure and metadata tasks like data provenance, audit trails, and privacy.
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... (Big Data Spain)
This talk will give a good overview of the complex architecture of the Pregel framework and some insight into potential bottlenecks when writing a Pregel algorithm.
This document discusses processing large graphs in Hadoop. It describes Google's Pregel framework, which takes a vertex-centric approach to iterative graph processing. Pregel computations occur over multiple supersteps, with vertices sending messages to each other between steps. The document also covers Apache Giraph, an open-source implementation of Pregel built on Hadoop. Giraph allows graph processing jobs to leverage Hadoop features like HDFS and MapReduce. An example shortest path algorithm is provided to illustrate Pregel's message passing model.
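The shortest-path example follows the message-passing pattern from the Pregel paper, and it can be sketched in Python. This is a single-process illustration; `sssp` and the adjacency-list format are assumptions, and edge weights are assumed non-negative. Only the source is active at first; a vertex that learns a shorter distance updates itself and sends `distance + weight` to its neighbours, and the computation ends when no messages remain.

```python
import math
from collections import defaultdict

def sssp(edges, source):
    """Single-source shortest paths in the Pregel message-passing style.

    edges: dict vertex -> list of (neighbour, non-negative weight); source must be a key.
    """
    vertices = set(edges) | {w for out in edges.values() for w, _ in out}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # superstep 0: only the source receives a message
    while inbox:
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            candidate = min(msgs)
            if candidate < dist[v]:  # improvement: update and propagate
                dist[v] = candidate
                for w, weight in edges.get(v, []):
                    outbox[w].append(candidate + weight)
            # No improvement: the vertex sends nothing and stays halted.
        inbox = outbox
    return dist
```

Unreachable vertices keep a distance of infinity. Taking the minimum over all incoming messages is also what a Pregel min-combiner would do before delivery.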
This document discusses graph processing and the need for distributed graph frameworks. It provides examples of real-world graph sizes that are too large for a single machine to process. It then summarizes some of the key challenges in parallel graph processing like irregular structure and data transfer issues. Several graph processing frameworks are described including Pregel, GraphLab, PowerGraph, and LFGraph. LFGraph is presented as a simple and fast distributed graph analytics framework that aims to have low pre-processing, load-balanced computation and communication, and low memory footprint compared to previous frameworks. The document provides examples and analyses to compare the computation and communication characteristics of different frameworks. It concludes by discussing some open questions and potential areas for improvement in LFGraph.
Webinar: ArangoDB 3.8 Preview - Analytics at Scale (ArangoDB Database)
The ArangoDB community and team are proud to preview the next version of ArangoDB, an open-source, highly scalable graph database with multi-model capabilities. Join our CTO, Jörg Schad, Ph.D., and Developer Relations Engineer Chris Woodward in this webinar to learn more about ArangoDB 3.8 and the roadmap for upcoming releases.
Pregel: A System for Large-Scale Graph Processing (Chris Bunch)
These are the slides for a presentation I recently gave at a seminar on Tools for High-Performance Computing with Big Graphs. It covers Google's Pregel system, which is used to run graph algorithms at scale.
Start From A MapReduce Graph Pattern-recognize Algorithm (Yu Liu)
This document summarizes a presentation on developing a MapReduce algorithm to recognize patterns in large graphs by finding connected components. It discusses:
- Motivation to study parallel graph algorithms and frameworks like MapReduce and Pregel
- The problem of finding link patterns in graphs by extracting connected components
- Background on semantic web and linked open data modeled as RDF graphs
- A naive O(2Ck)-iteration MapReduce algorithm to find connected components between pairs of datasets
- Examples and analysis of the algorithm's complexity and communication costs
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)... (ArangoDB Database)
This document summarizes techniques for combining machine learning and graph databases for better recommendations. It discusses collaborative filtering with AQL, content-based recommendations with TF-IDF and FAISS, and graph neural networks with PyTorch. The document also describes ArangoFlix, a demo project that combines these techniques in a movie recommendation system using ArangoDB as the backend graph database.
Note: You have to download the slides and open them in PowerPoint or Google Slides to make the links clickable.
Machine Learning + Graph Databases for Better Recommendations
Presented by Chris Woodward
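The collaborative-filtering idea can be illustrated as a two-hop traversal over a user-movie graph: from a user to the movies they watched, to other users who watched those movies, to the movies those users watched. The talk does this with AQL; the Python below is only an analogue of the traversal idea, not the ArangoFlix query. The data layout, the overlap-size weighting, and the function name `recommend` are assumptions.

```python
from collections import Counter

def recommend(watched, user, k=3):
    """Recommend up to k movies via a user -> movie -> user -> movie traversal.

    watched: dict user -> set of movies (hypothetical in-memory stand-in for edges).
    A candidate's score is the summed overlap size of the users who watched it.
    """
    mine = watched[user]
    scores = Counter()
    for other, movies in watched.items():
        overlap = len(mine & movies)
        if other == user or overlap == 0:
            continue  # only follow users reachable through a shared movie
        for m in movies - mine:  # last hop: movies the user has not seen yet
            scores[m] += overlap  # users with more shared taste count more
    # Highest score first; ties broken by title so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [m for m, _ in ranked[:k]]
```

A user who shares no movies with anyone gets an empty list; in practice, such cold-start cases are where content-based methods like TF-IDF help.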
The ArangoML Group had a detailed discussion on the topic "GraphSage Vs PinSage", in which they shared their thoughts on the differences between the working principles of these two popular graph ML algorithms. The following slide deck collects their thoughts on how the two algorithms compare.
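One commonly cited difference between the two algorithms is how each node's neighbourhood is chosen. GraphSAGE samples a fixed-size set of direct neighbours uniformly at random, while PinSage runs short random walks from the node and keeps the most frequently visited nodes, using normalised visit counts as importance weights. The sketch below contrasts only this neighbour-selection step. `num_walks`, `walk_len`, the function names, and normalising weights over the selected nodes are illustrative assumptions, not values or details taken from either paper or from the slides.

```python
import random
from collections import Counter

def graphsage_neighbors(adj, node, k, rng):
    """GraphSAGE-style: sample up to k direct neighbours uniformly at random."""
    nbrs = sorted(adj[node])  # sorted so results are reproducible for a given seed
    return rng.sample(nbrs, min(k, len(nbrs)))

def pinsage_neighbors(adj, node, t, rng, num_walks=200, walk_len=3):
    """PinSage-style: top-t nodes by visit count over short random walks from node.

    Returns (neighbour, weight) pairs; weights are visit counts normalised over
    the selected t nodes (an assumption of this sketch).
    """
    visits = Counter()
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            if not adj[cur]:
                break
            cur = rng.choice(sorted(adj[cur]))
            if cur != node:
                visits[cur] += 1  # count visits to nodes other than the start
    top = visits.most_common(t)
    total = sum(c for _, c in top)
    return [(v, c / total) for v, c in top]
```

Because the walks can reach nodes several hops away, a PinSage-style neighbourhood need not be limited to direct neighbours, and the weights let an aggregator favour the most relevant ones.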
These are the slides from the Getting Started with ArangoDB Oasis webinar: https://www.arangodb.com/events/getting-started-with-arangodb-oasis/
Get your own Oasis with a free 14-day trial (no credit card required) at https://cloud.arangodb.com/home.
Hacktoberfest 2020 'Intro to Knowledge Graph' with Chris Woodward of ArangoDB and reKnowledge. Accompanying video is available here: https://youtu.be/ZZt6xBmltz4
A Graph Database That Scales - ArangoDB 3.7 Release Webinar (ArangoDB Database)
Jörg Schad (Head of Engineering and ML) and Chris Woodward (Developer Relations Engineer) introduce the new capabilities for working with graphs in a distributed setting. In addition, they explain and showcase the new fuzzy search within ArangoDB's search engine as well as JSON schema validation.
Get started with ArangoDB: https://www.arangodb.com/arangodb-tra...
Explore ArangoDB Cloud for free with 1-click demos: https://cloud.arangodb.com/home
ArangoDB is a native multi-model database written in C++ supporting graph, document and key/value needs with one engine and one query language. Full-text search and ranking are supported via ArangoSearch, the fully integrated C++-based search engine in ArangoDB.
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space? (ArangoDB Database)
View the video of this webinar here: https://www.arangodb.com/arangodb-events/gvisor-kata-containers-firecracker-docker/
Containers* have revolutionized the IT landscape and for a long time. Docker seemed to be the default whenever people were talking about containerization technologies**. But traditional container technologies might not be suitable if strong isolation guarantees are required. So recently new technologies such as gVisor, Kata Container, or firecracker have been introduced to close the gap between the strong isolation of virtual machines and the small resource footprint of containers.
In this talk, we will provide an overview of the different containerization technologies, discuss their tradeoffs, and provide guidance for different use cases.
* We will define the term container in more detailed during the talk
** and yes we will also cover some of the pre-docker container space!
We all know good training data is crucial for data scientists to build quality machine learning models. But when productionizing Machine Learning, Metadata is equally important. Consider for example:
- Provenance of model allowing for reproducible builds
- Context to comply with GDPR, CCPA requirements
- Identifying data shift in your production data
This is the reason we built ArangoML Pipeline, a flexible Metadata store which can be used with your existing ML Pipeline.
Today we are happy to announce a release of ArangoML Pipeline Cloud. Now you can start using ArangoML Pipeline without having to even start a separate docker container.
In this webinar, we will show how to leverage ArangoML Pipeline Cloud with your Machine Learning Pipeline by using an example notebook from the TensorFlow tutorial.
Find the video here: https://www.arangodb.com/arangodb-events/arangoml-pipeline-cloud/
Find the recording of this webinar here: https://www.arangodb.com/arangodb-events/3-7-roadmap-performance-at-scale/
After the release of ArangoDB 3.6 we are starting to work on the next version with even more exciting features. As an open-source project we would love to hear your ideas and discuss the roadmap with our community.
Would you like to learn more about Satellite Graphs, Schema Validation, a number of performance and security improvements?
Than join Jörg Schad, Head of Engineering and Machine Learning at ArangoDB, who will share the latest plans for the upcoming ArangoDB 3.7 release as well as the long term roadmap.
The long-awaited Managed Service for ArangoDB is finally here! Users have a fully managed document, graph, and key/value store, plus a search engine, in one place. As we thought of such a powerful service — something that gives you room to breathe, relax, and having someone else taking care of everything —, we called it Oasis.
In this live webinar, Ewout Prangsma, Architect & Teamlead of ArangoDB Oasis, walks you through all the main capabilities of the new service, including high availability, elastic scalability, enterprise-grade security, and also demo the different deployment modes you have at your fingertips.
Before the Q&A part, Ewout also shares what you will be capable of in the future.
The new ArangoDB 3.5 release is here and includes a number of minor and major new features. For example, the ability to perform distributed JOIN operations with SmartJoins, new text search features in ArangoSearch, new consistent backup mechanism, and extended graph database features including k-shortest path queries and the new PRUNE keyword for more efficient queries. Jörg Schad, our Head of Engineering and Machine Learning, will discuss these new features and provide a hands-on demo on how to leverage them for your use case.
This document summarizes new features in ArangoDB version 3.5 including distributed joins, streaming transactions, expanded graph and search capabilities, hot backups, data masking, and time-to-live indexes. It also previews upcoming features like fuzzy search, autocomplete, and faceted search in ArangoSearch as well as k-shortest paths and pruning in graphs.
These are the slides from the webinar, where Chris & Jan walked through the basic concepts, key features and query options you have within ArangoDB as well as discuss scalability considerations for different data models. Chris is the hands-on guy and will showcase a variety of query options you have with a native multi-model database like ArangoDB
In these slides, Jan Steemann, core member of the ArangoDB project, introduced to the idea of native multi-model databases and how this approach can provide much more flexibility for developers, software architects & data scientists.
Running complex data queries in a distributed systemArangoDB Database
With the always-growing amount of data, it is getting increasingly hard to store and get it back efficiently. While the first versions of distributed databases have put all the burden of sharding on the application code, there are now some smarter solutions that handle most of the data distribution and resilience tasks inside the database.
This poses some interesting questions, e.g.
- how are other than by-primary-key queries actually organized and executed in a distributed system, so that they can run most efficiently?
- how do the contemporary distributed databases actually achieve transactional semantics for non-trivial operations that affect different shards/servers?
This talk will give an overview of these challenges and the available solutions that some open source distributed databases have picked to solve them.
Guacamole Fiesta: What do avocados and databases have in common?ArangoDB Database
First, our CTO, Frank Celler, does a quick overview of the latest feature developments and what is new with ArangoDB.
Then, Senior Graph Specialist, Michael Hackstein talks about multi-model database movement, diving deeper into main advantages and technological benefits. He introduces three data-models of ArangoDB (Documents, Graphs and Key-Values) and the reasons behind the technology. We have a look at the ArangoDB Query language (AQL) with hands-on examples. Compare AQL to SQL, see where the differences are and what makes AQL better comprehensible for developers. Finally, we touch the Foxx Microservice framework which allows to easily extend ArangoDB and include it in your microservices landscape.
Different applications need different performance guarantees. Some applications need fastest possible ingest of a dataset (a hare). Other applications need write rate guarantees for every single record (a tortoise). The choice between the two becomes critical as CPU cores, memory size, and disk throughput decrease to save money on cloud virtual machines / AWS instances. This presentation demonstrates how to convert RocksDB's natural "hare mode" into "tortoise mode" for timing sensitive applications and/or lower cost, lower capability hardware. RocksDB's stalls and stops are explained.
The Computer Science Behind a modern Distributed DatabaseArangoDB Database
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are several different necessary components which are anything but trivial to combine, and, of course, even more challenging when attempting to optimize for performance. Over the past years there has been significant progress in both the science and practical implementations of such data stores. In this talk Dan Larkin-York will introduce the audience to some of the challenges, address the difficulties of their interplay, and cover key approaches taken by some of the industry’s leaders (ArangoDB, Cassandra, CockroachDB, MarkLogic, and more).
2. tl;dr
● “Many practical computing problems concern large graphs.”
● ArangoDB is a “Beyond Graph Database” supporting multiple data models around a scalable graph foundation
● Pregel is a framework for distributed graph processing
○ ArangoDB supports predefined Pregel algorithms, e.g. PageRank, Single-Source Shortest Path and Connected Components.
● Programmable Pregel Algorithms (PPA) allow adding and modifying algorithms on the fly
Disclaimer
This is an experimental feature, and the language specification (front-end) in particular is still under development!
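The superstep model can be illustrated without a cluster. The following is a minimal single-process sketch in plain JavaScript; it does not use any ArangoDB API, and the function name and graph format are made up for illustration. It computes weakly connected components by min-label propagation: in superstep 0 every vertex sends its own id to its neighbours; afterwards a vertex wakes up only when it receives messages, adopts the smallest label it has seen, and forwards that label only if it changed. Messages sent in one superstep become visible only in the next, which models the global synchronisation barrier.

```javascript
// Single-process sketch of the Pregel superstep model (illustration only,
// not ArangoDB's implementation). Computes weakly connected components
// by min-label propagation.
function connectedComponents(vertices, edges, maxGSS) {
  // Treat edges as undirected for weak connectivity.
  const neighbours = new Map(vertices.map((v) => [v, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to);
    neighbours.get(to).push(from);
  }
  const label = new Map(vertices.map((v) => [v, v])); // start with own id
  let inbox = new Map(); // messages delivered in the current superstep
  let gss = 0;
  while (gss < maxGSS) {
    const outbox = new Map(); // messages for the next superstep
    for (const v of vertices) {
      const msgs = inbox.get(v) || [];
      // After superstep 0, halted vertices only wake up on incoming messages.
      if (gss > 0 && msgs.length === 0) continue;
      const smallest = msgs.reduce((a, b) => (b < a ? b : a), label.get(v));
      if (gss === 0 || smallest < label.get(v)) {
        label.set(v, smallest);
        for (const n of neighbours.get(v)) {
          if (!outbox.has(n)) outbox.set(n, []);
          outbox.get(n).push(smallest);
        }
      }
      // Otherwise the label is unchanged and the vertex votes to halt.
    }
    // Global barrier: messages sent now become visible in the next superstep.
    inbox = outbox;
    gss++;
    if (outbox.size === 0) break; // every vertex halted, no messages in flight
  }
  return { label, gss };
}

const result = connectedComponents(
  ["a", "b", "c", "d", "e"],
  [["b", "a"], ["c", "b"], ["e", "d"]],
  100
);
console.log(Object.fromEntries(result.label), result.gss);
// { a: 'a', b: 'a', c: 'a', d: 'd', e: 'd' } 4
```

Setting maxGSS lower than the number of supersteps the graph needs stops propagation early, which is exactly the role of the maxGSS option in ArangoDB's Pregel calls.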
3. Jörg Schad, PhD
Head of Engineering and ML
@ArangoDB
● Suki.ai
● Mesosphere
● Architect @SAP HANA
● PhD Distributed DB Systems
● Twitter: @joerg_schad
8. ArangoDB and Pregel: Status Quo
● https://www.arangodb.com/docs/stable/graphs-pregel.html
● https://www.arangodb.com/pregel-community-detection/
Available Algorithms
● PageRank
● Seeded PageRank
● Single-Source Shortest Path
● Connected Components
○ Component
○ WeaklyConnected
○ StronglyConnected
● Hyperlink-Induced Topic Search (HITS)
● Vertex Centrality
● Effective Closeness
● LineRank
● Label Propagation
● Speaker-Listener Label Propagation
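var pregel = require("@arangodb/pregel");
// Built-in PageRank: at most 100 global supersteps, convergence threshold 1e-8,
// each vertex's score is written to its "rank" attribute
var execution = pregel.start("pagerank", "graphname",
  {maxGSS: 100, threshold: 0.00000001, resultField: "rank"});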
● Pregel support since 2014
● Predefined algorithms
○ Could be extended via C++
● Same platform used for PPA
Challenge
Adding and modifying algorithms
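To show what maxGSS and threshold control in the call above, here is a hedged sketch of PageRank written in the same message-passing style, in plain JavaScript. It is not ArangoDB's implementation: the damping factor of 0.85 and the stopping rule (largest per-vertex change below threshold) are assumptions, and rank held by vertices without outgoing edges is simply not redistributed.

```javascript
// Sketch of PageRank in Pregel style (assumptions: damping 0.85, stop when
// the largest per-vertex change drops below threshold, no special handling
// for vertices without outgoing edges). Not ArangoDB's implementation.
function pageRank(vertices, edges, { maxGSS = 100, threshold = 1e-8, damping = 0.85 } = {}) {
  const n = vertices.length;
  const out = new Map(vertices.map((v) => [v, []]));
  for (const [from, to] of edges) out.get(from).push(to);
  let rank = new Map(vertices.map((v) => [v, 1 / n]));
  let gss = 0;
  while (gss < maxGSS) {
    // Each vertex sends rank / outDegree along its outgoing edges;
    // a sum per target vertex collects the incoming contributions.
    const incoming = new Map(vertices.map((v) => [v, 0]));
    for (const v of vertices) {
      const targets = out.get(v);
      for (const t of targets) {
        incoming.set(t, incoming.get(t) + rank.get(v) / targets.length);
      }
    }
    // Each vertex computes its new rank from the summed messages.
    let maxDelta = 0;
    const next = new Map();
    for (const v of vertices) {
      const r = (1 - damping) / n + damping * incoming.get(v);
      maxDelta = Math.max(maxDelta, Math.abs(r - rank.get(v)));
      next.set(v, r);
    }
    rank = next;
    gss++;
    if (maxDelta < threshold) break; // converged before reaching maxGSS
  }
  return { rank, gss };
}
```

On a directed 3-cycle every vertex keeps a rank of 1/3 and the loop stops after one superstep; on less symmetric graphs the run continues until either the change falls below threshold or maxGSS supersteps have been executed.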
9. Programmable Pregel Algorithms (PPA)
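const pregel = require("@arangodb/pregel");
const graphName = "<graph-name>"; // placeholder: name of an existing graph
// "air" runs a custom algorithm; the third argument is its definition
const pregelID = pregel.start("air", graphName, "<custom-algorithm>");
const status = pregel.status(pregelID); // query the state of the execution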
● Add/Modify algorithms on-the-fly
○ Without C++ code
○ Without restarting the Database
● Efficiency (as with Pregel in general) depends on sharding
○ SmartGraphs
○ Required: collocation of vertices and their edges
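Why collocation matters can be sketched with a simple message count. Assuming that every edge carries one message per superstep and that an edge is stored with its source vertex, a message stays local only if both endpoints sit on the same shard. The toy graph and both placements below are hypothetical; grouping by key prefix only stands in for the idea behind a SmartGraph's smart attribute and is not how ArangoDB computes shards.

```javascript
// Hypothetical sketch: count Pregel messages that must cross shards.
// Assumptions: one message per edge per superstep; a message is remote
// whenever source and target vertices live on different shards.
function remoteMessages(edges, shardOf) {
  return edges.filter(([from, to]) => shardOf.get(from) !== shardOf.get(to)).length;
}

const people = ["eu:alice", "eu:bob", "eu:carol", "us:dave", "us:erin", "us:frank"];
const links = [
  ["eu:alice", "eu:bob"], ["eu:bob", "eu:carol"], ["eu:carol", "eu:alice"],
  ["us:dave", "us:erin"], ["us:erin", "us:frank"], ["us:frank", "us:dave"],
  ["eu:alice", "us:dave"],
];

// Placement 1: round-robin over two shards, ignoring graph structure.
const roundRobin = new Map(people.map((v, i) => [v, i % 2]));
// Placement 2: shard by key prefix (a stand-in for a smart attribute),
// so densely connected vertices end up on the same shard.
const byPrefix = new Map(people.map((v) => [v, v.startsWith("eu:") ? 0 : 1]));

console.log(remoteMessages(links, roundRobin)); // 5
console.log(remoteMessages(links, byPrefix)); // 1
```

With the structure-aware placement only the single edge between the two groups causes network traffic in each superstep, while the round-robin placement sends five of seven messages across shards.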
10. Custom Algorithm
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
Accumulators
Accumulators consume and process the messages sent to them during the computational phases of a superstep (initProgram, updateProgram, onPreStep, onPostStep). All messages are processed once the superstep is done.
● max: stores the maximum of all messages received.
● min: stores the minimum of all messages received.
● sum: sums up all messages received.
● and: computes the logical AND of all messages received.
● or: computes the logical OR of all messages received.
● store: holds the last received value (non-deterministic).
● list: stores all received values in a list (order is non-deterministic).
● custom: user-defined accumulator (see customAccumulators)
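The semantics of the built-in accumulator types can be modelled as folds over the incoming messages. The sketch below is plain JavaScript, not ArangoDB code; makeAccumulator and its receive method are hypothetical names, and custom is left out because its behaviour is user-defined.

```javascript
// Sketch of the built-in accumulator semantics (plain JavaScript model,
// not ArangoDB's implementation). Each accumulator folds the messages it
// receives during a superstep into a single value.
function makeAccumulator(type) {
  const ops = {
    max: { init: -Infinity, update: (acc, m) => Math.max(acc, m) },
    min: { init: Infinity, update: (acc, m) => Math.min(acc, m) },
    sum: { init: 0, update: (acc, m) => acc + m },
    and: { init: true, update: (acc, m) => acc && m },
    or: { init: false, update: (acc, m) => acc || m },
    store: { init: null, update: (_acc, m) => m }, // last message wins
    list: { init: [], update: (acc, m) => [...acc, m] }, // arrival order
  };
  const op = ops[type];
  if (!op) throw new Error(`unknown accumulator type: ${type}`);
  let value = op.init;
  return {
    receive(message) {
      value = op.update(value, message);
    },
    get value() {
      return value;
    },
  };
}

const messages = [3, 7, 5];
for (const type of ["max", "min", "sum", "store", "list"]) {
  const acc = makeAccumulator(type);
  messages.forEach((m) => acc.receive(m));
  console.log(type, acc.value);
}
// max 7, min 3, sum 15, store 5, list [ 3, 7, 5 ]
```

In a distributed run, messages for one vertex arrive from several workers in no fixed order, which is why store and list are marked non-deterministic; max, min, and, or and sum do not depend on arrival order (up to floating-point rounding for sum).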
11. Custom Algorithm
● resultField (string, optional): Name of the vertex attribute in which the result is stored; the computation results are written to this attribute of every vertex.
● maxGSS (number, required): Maximum number of global supersteps. Once this number is reached, the Pregel execution stops.
● dataAccess (object, optional): Defines writeVertex, readVertex and readEdge.
○ writeVertex: A program that writes the results into the vertices. If writeVertex is used, resultField is ignored.
○ readVertex: An array of strings and/or nested arrays (each representing a path).
■ string: Represents a single attribute at the top level.
■ array of strings: Represents a nested path.
○ readEdge: An array of strings and/or nested arrays (each representing a path).
■ string: Represents a single attribute at the top level (not nested).
■ array of strings: Represents a nested path.
● vertexAccumulators (object, optional): Definition of all vertex accumulators used.
● globalAccumulators (object, optional): Definition of all global accumulators used. Global accumulators hold values shared at the global level.
● customAccumulators (object, optional): Definition of all custom accumulators used.
● phases (array): Array of one or more phase definitions.
● debug (optional): See Debugging.
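Before submitting a definition, its shape can be checked against the rules above. The following is a hypothetical client-side helper in plain JavaScript, not part of ArangoDB: it encodes the listed types (maxGSS required, readVertex/readEdge as strings or arrays of strings, and so on), treats phases as required and non-empty (an assumption based on "one or more phase definitions"), and reports the writeVertex/resultField interaction as a warning. It does not check the AIR programs themselves; the server remains the authority.

```javascript
// Hypothetical shape check for a custom algorithm definition, following
// the field rules on this slide. AIR programs are not validated.
function checkAlgorithm(def) {
  const errors = [];
  const warnings = [];
  const isObject = (x) => typeof x === "object" && x !== null && !Array.isArray(x);
  // readVertex/readEdge: strings (top-level attribute) or arrays of strings (nested path)
  const isPathList = (x) =>
    Array.isArray(x) &&
    x.every((p) => typeof p === "string" ||
      (Array.isArray(p) && p.every((s) => typeof s === "string")));

  if (typeof def.maxGSS !== "number") errors.push("maxGSS is required and must be a number");
  if (def.resultField !== undefined && typeof def.resultField !== "string") {
    errors.push("resultField must be a string");
  }
  // Assumption: at least one phase is required.
  if (!Array.isArray(def.phases) || def.phases.length === 0) {
    errors.push("phases must be an array with at least one phase");
  }
  for (const key of ["vertexAccumulators", "globalAccumulators", "customAccumulators"]) {
    if (def[key] !== undefined && !isObject(def[key])) errors.push(`${key} must be an object`);
  }
  const access = def.dataAccess;
  if (access !== undefined) {
    if (!isObject(access)) {
      errors.push("dataAccess must be an object");
    } else {
      for (const key of ["readVertex", "readEdge"]) {
        if (access[key] !== undefined && !isPathList(access[key])) {
          errors.push(`dataAccess.${key} must contain strings or arrays of strings`);
        }
      }
      if (access.writeVertex !== undefined && def.resultField !== undefined) {
        warnings.push("resultField is ignored because dataAccess.writeVertex is set");
      }
    }
  }
  return { errors, warnings };
}

const report = checkAlgorithm({
  resultField: "rank",
  maxGSS: 50,
  dataAccess: { readVertex: ["name", ["address", "city"]] },
  vertexAccumulators: {},
  phases: [{}], // hypothetical placeholder for a real phase definition
});
console.log(report); // { errors: [], warnings: [] }
```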
14. Program - Arango Intermediate Representation (AIR)
A Lisp-like intermediate representation, expressed in JSON and supporting JSON's data types
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Form
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice statements
○ quasi-quote, unquote and unquote-splice statements
○ cons statement
○ and / or statements
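The core idea of a Lisp-like language encoded in JSON fits in a few lines. The toy evaluator below is purely illustrative: its operators (+, *, <, list, var) and the shapes of its if and let forms are assumptions and do not reproduce the actual AIR specification. It does show the distinction the slide draws between primitives, whose arguments are all evaluated first, and special forms such as if and let, which decide for themselves which sub-expressions to evaluate.

```javascript
// Toy evaluator for a Lisp-like language encoded in JSON (illustration only;
// operator names and form shapes are assumptions, not the AIR spec).
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

const PRIMITIVES = {
  "+": (...xs) => xs.reduce((a, b) => a + b, 0),
  "*": (...xs) => xs.reduce((a, b) => a * b, 1),
  "<": (a, b) => a < b,
  list: (...xs) => xs,
};

function evaluate(expr, env = {}) {
  if (!Array.isArray(expr)) return expr; // JSON literals evaluate to themselves
  const [op, ...args] = expr;
  // Special forms control how (and whether) their arguments are evaluated.
  switch (op) {
    case "if": {
      const [cond, thenExpr, elseExpr] = args;
      return evaluate(cond, env) ? evaluate(thenExpr, env) : evaluate(elseExpr, env);
    }
    case "let": {
      // ["let", [[name, valueExpr], ...], body]
      const [bindings, body] = args;
      const scope = { ...env };
      for (const [name, valueExpr] of bindings) scope[name] = evaluate(valueExpr, env);
      return evaluate(body, scope);
    }
    case "var": {
      if (!has(env, args[0])) throw new Error(`unbound variable: ${args[0]}`);
      return env[args[0]];
    }
  }
  // Primitives: evaluate all arguments first, then apply.
  if (!has(PRIMITIVES, op)) throw new Error(`unknown operator: ${op}`);
  return PRIMITIVES[op](...args.map((a) => evaluate(a, env)));
}

const program = ["let", [["x", 4]],
  ["if", ["<", ["var", "x"], 10],
    ["list", ["*", ["var", "x"], 2], "small"],
    ["list", ["var", "x"], "large"]]];
console.log(evaluate(program)); // [ 8, 'small' ]
```

Because a JSON array always denotes a call here, a literal list has to be built with an operator such as list; in the Lisp tradition, quote-style forms serve exactly that purpose.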
16. Pregelator
A simple IDE built as a Foxx service
https://github.com/arangodb-foxx/pregelator
18. PPA: What is next?
- Gather Feedback
- In particular use-cases
- Missing functions & functionality
- User-friendly Front-End language
- Improve scale/performance of the underlying Pregel platform
- Algorithm library
- Blog post (including a Jupyter example)
ArangoDB 3.8 (end of year)
- Experimental Feature
- Initial Library
ArangoDB 3.9 (Q1 2021)
- Draft for Front-End
- Extended Library
- Platform Improvements
ArangoDB 4.0 (mid-2021)
- General availability (GA)
19. Pregel vs AQL
When (not) to use Pregel…
- Can the algorithm be expressed efficiently in Pregel?
- Counterexample: topological sort
- Is the graph large enough to be worth the cost of loading it?
AQL:
- All models (graph, document, key-value, search, …)
- Online queries
Pregel:
- Iterative graph processing
- Large graphs, multiple iterations
20. How can I start?
● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3
● Check existing algorithms
● Preview documentation
● Give Feedback
○ Slack: https://slack.arangodb.com/, channel custom-pregel
21. Thanks for listening!
Reach out with Feedback/Questions!
• @arangodb
• https://www.arangodb.com/
• docker pull arangodb
Test-drive Oasis free for 14 days