WHITE PAPER

Graph Done Right


W H I T E P A P E R — Graph Done Right

Table of Contents
Introduction
Graph Basics
Graph Components
Why Graph?
ArangoDB as a Graph Database
Identifying Graph Use Cases
Use Cases for Graph Databases
Graph Done Right
Features Beyond Graph
Graphs at Scale
Additional Resources

Abstract
In this white paper, we will discuss:
• What a graph database is
• ArangoDB as a graph database
• How ArangoDB is graph done right


Introduction
A graph database uses graph structures, made up of nodes, edges, and properties, to represent
and store data and to answer semantic queries about it. Graph databases are often used where
data is hard to model with the tables of rows and columns found in traditional relational
databases.

For example, graph databases are used to model social networks of who is friends with
whom. Each node can be a person, and each edge can be a relationship, such as whether
they’re friends with another person or have liked one of their posts. Data for each node can
be their demographics: their age, where they live, etc. With this data, it’s relatively easy to
do things like suggest new friends based on who your friends have friended, whose posts
they’ve liked, and how similar their demographics are.

Another example is supply chains. Nodes can be locations such as factories, roads, ports,
warehouses, and retail stores. Edges can indicate which sites are connected to which,
essentially forming a supply chain “road graph.” Once modeled, it’s easy to find the shortest
path through a supply chain, or which parts of the supply chain are poorly connected and
thus have potential bottlenecks.

Most real-world interactions form a graph. Cellular networks are simply graphs of cell phones
and the cell towers to which they connect. Datacenters are graphs of networking hardware,
servers, applications, and the connections between them. Customer 360 data is a graph of
all customers and the various touchpoints — online and off — they’ve had with a company.
In each of these examples, companies are making cellular service more reliable, applications
snappier, and customer service more satisfying, all thanks to graph data models stored in
graph databases.

A quick aside to clear up any potential confusion: when we say “graph” in this document, we
don’t mean a plotted chart, such as a line chart or bar chart. That kind of information is best
stored in a relational database or spreadsheet. Instead, we mean a graph in the sense of graph
theory, as used in mathematics and computer science: a model of the relationships between objects.

Graph Basics
To summarize the examples above, graph databases store objects — also called nodes or
vertices. The relations between the nodes are called edges. These nodes and edges form a
network of data points called a “graph.” Using the graph data model allows you to represent
data alongside the inherent connections that exist within it. A graph database efficiently
leverages this representation with built-in graph queries, referred to as traversals.

One last point about nodes and edges: each has a set of properties. In the social media
example above, each user can have properties, such as an age of 21 years old or a hometown
of Chicago. In the supply chain example, each location might have latitude and longitude
properties. These properties are schema-free, meaning we can easily add new properties to any
node or edge. For example, some users might list their favorite movie; this becomes a new
property for that user node. While ArangoDB is natively schema-less, if your use case does
require a defined schema, you can enforce one by enabling the built-in JSON Schema validation.
Schema validation in ArangoDB offers several levels of validation control and follows the
widely used JSON Schema specification.
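As a sketch of what enabling validation can look like, the dictionary below mirrors the general shape of an ArangoDB collection schema property: a JSON Schema `rule`, a validation `level`, and an error `message`. The `users` collection and its fields are hypothetical, and the exact option names should be checked against the ArangoDB documentation before use.

```python
import json

# Hypothetical schema for a "users" node collection. An ArangoDB collection
# schema pairs a JSON Schema "rule" with a validation "level" and a "message".
user_schema = {
    "rule": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer", "minimum": 0},
            "hometown": {"type": "string"},
        },
        "required": ["name"],
        # Still allow new optional properties such as "favoriteMovie".
        "additionalProperties": True,
    },
    # One of "none", "new", "moderate", or "strict": how strictly writes are checked.
    "level": "moderate",
    "message": "A user needs a name; age must be a non-negative integer.",
}

print(json.dumps(user_schema, indent=2))
```

Leaving `additionalProperties` open keeps the schema-free flexibility described above while still guaranteeing the fields every user must have.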


Graph Components
A graph consists of nodes and edges.

Nodes
In a graph database, each object is called a node. We represent nodes as circles and edges
as lines or arcs. The terms node and vertex are synonymous. Nodes may be:
• Connected with more than one other node via multiple edges
• Connected to themselves
• Disconnected from the graph, having no connecting edges

Edges
The connections between the nodes are called edges. In other words, edges store
information about the relationships between the nodes. Like nodes, edges are documents and can
have properties; what sets them apart is that each edge describes the relationship between the
two nodes it connects.

[Figure: Vertex and Edge]
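To make the three kinds of nodes listed above concrete, here is a minimal sketch that stores a small, hypothetical graph as an edge list and computes each node's degree: `a` and `b` are joined by two parallel edges, `c` has an edge to itself, and `d` is disconnected.

```python
from typing import Dict, List, Tuple

nodes = ["a", "b", "c", "d"]
# Each edge is a (from, to, properties) triple; like nodes, edges carry properties.
edges: List[Tuple[str, str, Dict[str, str]]] = [
    ("a", "b", {"type": "follows"}),
    ("a", "b", {"type": "likes"}),      # a second edge between the same two nodes
    ("b", "c", {"type": "follows"}),
    ("c", "c", {"type": "bookmarks"}),  # a self-loop: c is connected to itself
]

def degrees(nodes: List[str], edges: List[Tuple[str, str, Dict[str, str]]]) -> Dict[str, int]:
    """Count edge endpoints per node; a self-loop counts twice, as usual in graph theory."""
    counts = {n: 0 for n in nodes}
    for src, dst, _props in edges:
        counts[src] += 1
        counts[dst] += 1
    return counts

print(degrees(nodes, edges))
```

The disconnected node `d` still appears in the result, with a degree of zero.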

Graph databases offer specialized algorithms to analyze the relationships among data. The
most basic is graph traversal (also known as graph search): the process of visiting, and
checking or updating, the nodes of a graph, starting at a given start node and continuing
along edges until a given depth is reached.
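A minimal sketch of a depth-bounded traversal, loosely modeled on the `min..max` depth range that AQL traversals accept: starting from one node, it follows outgoing edges and yields every path whose length falls within the range. The adjacency list is hypothetical, and refusing to revisit a node already on the current path is just one of several possible uniqueness rules. This sketch happens to explore paths depth-first.

```python
from typing import Dict, Iterator, List

Graph = Dict[str, List[str]]

def traverse(graph: Graph, start: str, min_depth: int, max_depth: int) -> Iterator[List[str]]:
    """Yield paths (as node lists) from start that have min_depth..max_depth edges."""
    def walk(path: List[str]) -> Iterator[List[str]]:
        depth = len(path) - 1
        if depth >= min_depth:
            yield list(path)  # emit a copy, because path is mutated below
        if depth == max_depth:
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # skip nodes already on this path to avoid looping
                path.append(nxt)
                yield from walk(path)
                path.pop()
    yield from walk([start])

graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["dave"], "dave": ["alice"]}
for p in traverse(graph, "alice", 1, 2):
    print(" -> ".join(p))
```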

Why Graph?
Graphs are a good data model for representing relationships in data. In many real-world
cases, a graph is a natural data model. It captures relations and, using JSON, can store
complex data on edges and nodes.

A graph database excels at navigational queries. A crucial requirement for a graph database
is that its query language implement traversal algorithms such as breadth-first and
depth-first traversal, shortest path(s), k paths, and more. All of these algorithms depend on
one fundamental capability: rapidly accessing the list of all outgoing or incoming edges of a node.

Breadth-first algorithms explore every node at the current depth before moving on to nodes at
the next depth. If the node you are looking for is probably close to your starting node, a
breadth-first search is likely to work best. For example, when building social capabilities into
an app, looking for friends-of-friends (just two levels deep) calls for a breadth-first search.
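The friends-of-friends case can be sketched with a breadth-first search that stops expanding at a depth limit and records each person's hop distance; people exactly two hops away become friend suggestions. The names and friendships are hypothetical.

```python
from collections import deque
from typing import Dict, List

def within_hops(graph: Dict[str, List[str]], start: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first search: return each reachable node with its hop distance from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(node, []):
            if friend not in dist:  # in BFS, the first visit is along a shortest route
                dist[friend] = dist[node] + 1
                queue.append(friend)
    return dist

friends = {
    "ana": ["ben", "cai"],
    "ben": ["ana", "dev"],
    "cai": ["ana", "dev", "eli"],
    "dev": ["ben", "cai", "fay"],
    "eli": ["cai"],
    "fay": ["dev"],
}
dist = within_hops(friends, "ana", 2)
suggestions = sorted(n for n, d in dist.items() if d == 2)
print(suggestions)
```

Because the queue processes people level by level, the search never touches `fay`, who is three hops away.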

Depth-first algorithms explore each path as far as possible before backtracking and
exploring another. If each node in a graph has many edges, a breadth-first search might
consume too much memory, making a depth-first search necessary. Depth-first search can also
be the better choice when exploring deep within a graph, such as when looking for money
laundering cycles.
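A depth-first sketch of the money-laundering case: it follows transfers as deep as possible from one account, backtracks, and reports every simple cycle that leads back to the starting account. The accounts and transfers are made up, and a real investigation would also weigh amounts, timing, and thresholds.

```python
from typing import Dict, List

def cycles_through(graph: Dict[str, List[str]], start: str, max_len: int) -> List[List[str]]:
    """Depth-first search for simple cycles that start and end at `start`."""
    found: List[List[str]] = []
    path = [start]

    def dfs(node: str) -> None:
        for nxt in graph.get(node, []):
            if nxt == start and len(path) > 1:
                found.append(path + [start])  # closed a loop back to the origin
            elif nxt not in path and len(path) < max_len:
                path.append(nxt)
                dfs(nxt)    # go deeper before trying the next sibling
                path.pop()  # backtrack

    dfs(start)
    return found

transfers = {
    "acct1": ["acct2", "acct5"],
    "acct2": ["acct3"],
    "acct3": ["acct4", "acct1"],
    "acct4": ["acct1"],
    "acct5": [],
}
for cycle in cycles_through(transfers, "acct1", max_len=5):
    print(" -> ".join(cycle))
```

Only the current path is kept in memory, which is why depth-first search scales better than breadth-first search on graphs with many edges per node.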

Shortest path algorithms find the shortest path from one node to another. Finding the
shortest route is a common real-world task; for instance, deciding which train route to take.
To go from Paris to Berlin, we can take many different trains, but their routes take different
amounts of time, which in the graph world are called weights. A route with more stops (hops, in
graph speak) might be cheaper, and it may even cover a shorter distance overall, yet still take
longer. The time it takes to travel from one stop to the next is the weight of the edge between
those two stations, and the weight of a path is the sum of its edge weights. You can add
complexity to your search by accepting only paths below a certain total weight (total travel
time). Or, you can also consider each route's cost and filter your results further. Refining
a search this way is straightforward with graph traversals and is an example of where a graph
database shines.
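The weighted search described above is classically solved with Dijkstra's algorithm. A minimal sketch follows; the stations and travel times in minutes are made up for illustration, and the optional `max_weight` cut-off mirrors the idea of accepting only routes below a total travel time.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]

def shortest_path(graph: Graph, src: str, dst: str,
                  max_weight: Optional[float] = None) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: return (total weight, path), or None if no acceptable path."""
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        if node == dst:
            break
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best or (max_weight is not None and best[dst] > max_weight):
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]

# Edge weights are hypothetical travel times in minutes between neighboring stations.
trains: Graph = {
    "Paris": [("Brussels", 82), ("Strasbourg", 105)],
    "Brussels": [("Cologne", 110)],
    "Cologne": [("Berlin", 260)],
    "Strasbourg": [("Frankfurt", 120)],
    "Frankfurt": [("Berlin", 240)],
}
print(shortest_path(trains, "Paris", "Berlin"))
print(shortest_path(trains, "Paris", "Berlin", max_weight=400))
```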

K path algorithms generalize shortest path algorithms by finding additional paths through
a graph. These alternate paths may be the same length as the shortest path or longer.
K path algorithms can be useful in supply chain problems when looking for alternative (if
slightly more expensive) routes to ship goods.
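For small graphs, the idea can be sketched by enumerating every simple path and keeping the k lightest. This brute-force approach grows exponentially with graph size and only illustrates the concept; dedicated algorithms exist for this problem (Yen's algorithm is a well-known example). The shipping lanes and costs below are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]

def k_shortest_paths(graph: Graph, src: str, dst: str, k: int) -> List[Tuple[float, List[str]]]:
    """Brute force: enumerate all simple src->dst paths, return the k with least total weight."""
    results: List[Tuple[float, List[str]]] = []

    def dfs(node: str, path: List[str], cost: float) -> None:
        if node == dst:
            results.append((cost, list(path)))
            return
        for nxt, weight in graph.get(node, []):
            if nxt not in path:  # simple paths only: no repeated nodes
                path.append(nxt)
                dfs(nxt, path, cost + weight)
                path.pop()

    dfs(src, [src], 0.0)
    return heapq.nsmallest(k, results)

# Hypothetical shipping lanes weighted by cost per container.
lanes: Graph = {
    "factory": [("port_a", 4), ("port_b", 6), ("rail_hub", 3)],
    "port_a": [("warehouse", 5)],
    "port_b": [("warehouse", 2)],
    "rail_hub": [("port_a", 2), ("warehouse", 9)],
}
for cost, path in k_shortest_paths(lanes, "factory", "warehouse", k=3):
    print(cost, " -> ".join(path))
```

The second and third results are the slightly more expensive alternatives a planner could fall back on if the cheapest lane is unavailable.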

ArangoDB as a Graph Database


In ArangoDB, each edge has a single direction; it can’t point both ways simultaneously.
This model is also known as a directed graph.

Edges are always directed, but users can ignore the direction (follow in ANY direction) when
they walk through the graph or follow edges in the reverse direction (INBOUND) instead of
going in the direction they point to (OUTBOUND).

[Figure: OUTBOUND, INBOUND, and ANY traversal directions]


Graph Store
In ArangoDB, a data model can be implemented by storing a JSON document for each node
and a JSON document for each edge. Edges are kept in special edge collections that ensure
every edge has _from and _to attributes, which reference the edge's start and end nodes and
thus encode the direction of the relationship. ArangoDB makes graph queries efficient and
scalable by maintaining a special hash index on the _from and _to attributes, called the edge
index, which provides constant-time lookups of a node's outgoing and incoming edges.
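To illustrate why an index on `_from` and `_to` matters, the sketch below keeps two hash maps, one keyed by `_from` and one by `_to`, so that OUTBOUND, INBOUND, and ANY neighbor lookups never scan the full edge list. It is a simplified in-memory model with hypothetical document IDs, not ArangoDB's actual index implementation.

```python
from collections import defaultdict
from typing import Dict, List

class EdgeIndex:
    """Toy edge index: hash maps from a node ID to the edges leaving or entering it."""

    def __init__(self, edges: List[Dict[str, str]]):
        self.by_from: Dict[str, List[Dict[str, str]]] = defaultdict(list)
        self.by_to: Dict[str, List[Dict[str, str]]] = defaultdict(list)
        for edge in edges:
            self.by_from[edge["_from"]].append(edge)
            self.by_to[edge["_to"]].append(edge)

    def neighbors(self, node: str, direction: str = "OUTBOUND") -> List[str]:
        """Return adjacent node IDs; finding a node's edges is a hash lookup, not a scan."""
        out = [e["_to"] for e in self.by_from.get(node, [])]
        inc = [e["_from"] for e in self.by_to.get(node, [])]
        if direction == "OUTBOUND":
            return out
        if direction == "INBOUND":
            return inc
        if direction == "ANY":
            return out + inc
        raise ValueError(f"unknown direction: {direction}")

edges = [
    {"_from": "persons/alice", "_to": "persons/bob", "type": "knows"},
    {"_from": "persons/carol", "_to": "persons/alice", "type": "knows"},
]
index = EdgeIndex(edges)
print(index.neighbors("persons/alice", "OUTBOUND"))
print(index.neighbors("persons/alice", "INBOUND"))
print(index.neighbors("persons/alice", "ANY"))
```

Each edge is stored once in each map, so the same document answers both "where does this edge go?" and "what points here?" without a second copy of the data.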

ArangoDB Query Language (AQL)


AQL is the query language used in ArangoDB. It allows users to express document queries,
key/value lookups, graph queries, full-text search, and arbitrary combinations of these. The
example below combines a full-text search for sci-fi movies with a one-step graph traversal
that retrieves each movie's director:

FOR d IN v_imdb
  // Full-text search on movie descriptions; matches on 'galaxy' are boosted 5x
  SEARCH ANALYZER(
    d.description IN TOKENS('amazing action world alien sci-fi science documental', 'text_en')
    || BOOST(d.description IN TOKENS('galaxy', 'text_en'), 5),
    'text_en')
  SORT BM25(d) DESC
  LIMIT 10
  // Follow each matching movie's incoming edges one step to find its director
  FOR vertex, edge, path IN 1..1 INBOUND d imdb_edges
    FILTER path.edges[0].`$label` == "DIRECTED"
    RETURN DISTINCT {
      "director": vertex.name,
      "movie": d.title
    }

Identifying Graph Use Cases


Graph databases tend to be most appropriate for highly connected data. For instance,
if you are using a relational database and your queries have many JOINs, that's an
indicator to consider a graph database. Another indicator is when your relational queries
need to follow JOINs across multiple levels. Yet another is when you're trying to uncover
hidden patterns in your data.

In other words, graph databases are most useful when the connections between data points
are as interesting as the data itself, if not more so. This contrasts with relational data,
where what's in each row, such as a customer ID or a transaction amount, is what's most
interesting.

Use Cases for Graph Databases


Specific use cases include:

Identity and Access Management


When determining who can see which information in an organization, a manager often has
permission to view data about their team. For instance, each sales manager can see the
travel expenditures of their team, each sales director can see the expenditures of their
managers and teams, and so on. Accountants, however, cut across these hierarchies, which
allows them to audit a set of sales teams by viewing their travel purchases. This web of
permissions is naturally represented as a graph, and modeling it correctly is crucial to
granting access to all appropriate employees, but no one else.
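A minimal sketch of such a permission check, assuming a hypothetical model in which `manages` and `audits` edges grant visibility: a viewer may see an employee's expenditures if that employee is reachable from the viewer along those edges. The people, relations, and edges are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# (from, to, relation) edges; names and relations are hypothetical.
EDGES: List[Tuple[str, str, str]] = [
    ("director_dana", "manager_max", "manages"),
    ("director_dana", "manager_mia", "manages"),
    ("manager_max", "rep_raj", "manages"),
    ("manager_mia", "rep_rosa", "manages"),
    ("accountant_ali", "manager_mia", "audits"),  # cuts across the hierarchy
]
GRANTING = {"manages", "audits"}

def visible_to(viewer: str) -> Set[str]:
    """Everyone reachable from `viewer` via permission-granting edges (breadth-first)."""
    adjacency: Dict[str, List[str]] = {}
    for src, dst, rel in EDGES:
        if rel in GRANTING:
            adjacency.setdefault(src, []).append(dst)
    seen: Set[str] = set()
    queue = deque([viewer])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(visible_to("accountant_ali")))
print("rep_raj" in visible_to("accountant_ali"))
```

The accountant sees the audited team and nothing else, while the director's access follows the full reporting hierarchy.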

Fraud Detection
Detecting fraud involves complex pattern matching that considers the graph structure of
connections (e.g., an unusual number of connections between different entities and
accounts, IP addresses, etc.), as well as statistical analysis, associative queries, and joins.
Because this requires assembling and integrating a huge amount of connected data, it can
often be modeled sensibly as a graph.
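One such pattern, accounts linked through shared IP addresses, can be sketched as follows: accounts and IPs form a bipartite graph, and groups of accounts connected through shared IPs become candidates for review. The login data and the group-size threshold are hypothetical, and a real system would combine this signal with the statistical checks mentioned above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (account, ip) login observations; values are made up for illustration.
LOGINS: List[Tuple[str, str]] = [
    ("acct_1", "10.0.0.5"), ("acct_2", "10.0.0.5"),
    ("acct_2", "10.0.0.9"), ("acct_3", "10.0.0.9"),
    ("acct_4", "10.0.0.7"),
]

def suspicious_rings(logins: List[Tuple[str, str]], min_size: int = 3) -> List[Set[str]]:
    """Group accounts connected through shared IPs; return groups of at least min_size."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for account, ip in logins:
        # Prefix IP nodes so they cannot collide with account names.
        graph[account].add("ip:" + ip)
        graph["ip:" + ip].add(account)
    seen: Set[str] = set()
    rings: List[Set[str]] = []
    for start in graph:
        if start in seen or start.startswith("ip:"):
            continue
        component: Set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:  # iterative depth-first walk over the account-IP graph
            node = stack.pop()
            if not node.startswith("ip:"):
                component.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            rings.append(component)
    return rings

print(suspicious_rings(LOGINS))
```

Note that `acct_1` and `acct_3` never share an IP directly; they are grouped only because the graph links them through `acct_2`, which is exactly the kind of indirect connection that row-by-row checks tend to miss.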

Knowledge Graph
Enterprise Knowledge Graphs (EKGs) have been on the rise and are valuable tools for
harmonizing internal and external data relevant to an organization. EKGs bring data into a
common semantic model to improve enterprise operational efficiency and increase business
units’ competitive advantage.

Research
Research teams use graphs to uncover and catalog valuable insights across projects.
Citation networks describe the contributions within individual papers but can also connect
to large EKGs as described above. For example, one research organization has nodes for each
portion of the human genome, for every published medical research paper, and for each author,
with edges describing which authors wrote which papers on which genome segment. This
makes it easier for researchers to collaborate with others working on similar genomic topics.

Recommendation Engine
There are many different approaches and techniques for generating recommendations, and
most fit naturally with graph databases. One approach enriches graph traversals by storing
inferences derived from machine learning as document attributes, usually on edges. It is also
possible to eliminate the need for complex ML pipelines and instead use built-in AQL functions
and graph algorithms to offer on-the-fly recommendations using just the data stored in the
database.
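As an example of recommending on the fly from stored data alone, here is a sketch of simple co-occurrence ("people who liked this also liked") recommendations. Conceptually it walks from a user to the items they liked, across to other users who liked the same items, and on to those users' other items. The users, items, and the scoring by number of shared likes are assumptions for illustration, not a description of ArangoDB's built-in functions.

```python
from collections import Counter
from typing import Dict, List, Set

# User -> liked items: an adjacency list of hypothetical "liked" edges.
LIKES: Dict[str, Set[str]] = {
    "u1": {"Alien", "Arrival"},
    "u2": {"Alien", "Arrival", "Dune"},
    "u3": {"Alien", "Gravity"},
    "u4": {"Notting Hill"},
}

def recommend(user: str, likes: Dict[str, Set[str]], top_n: int = 2) -> List[str]:
    """Score unseen items by how many likes their fans share with `user`."""
    mine = likes.get(user, set())
    scores: Counter = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # hops 1-2: items both users liked
        if overlap == 0:
            continue
        for item in theirs - mine:  # hop 3: their items the target has not liked yet
            scores[item] += overlap
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend("u1", LIKES))
```

Users who share more likes with the target contribute more weight, so `Dune` (liked by a user who shares two movies with `u1`) ranks above `Gravity`.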

Network and IT Operations


Computer networks, the associated hosts and their components, and the virtualized resources
of software-defined infrastructure form a graph. Managing such an infrastructure involves
queries about the graph structure as well as queries about sets of hosts and their properties.

Social Media Management


Social networks are the prime example of large, highly connected graphs. They typically
involve graph algorithms and graph traversal queries.

Traffic Management
Street networks are naturally modeled as a graph. Traffic flow produces a high volume
of time-based data that is closely tied to the street network. Making good traffic management
decisions involves querying all this data and running intelligent algorithms that use
aggregations, graph traversals, and joins.


Graph Done Right


ArangoDB is more than a graph database: it also offers document and key/value store
capabilities, full-text search, integrations for machine learning (ML), and more.

Features Beyond Graph


As shown in the image below, ArangoDB provides access to a rich set of features that includes:
• Managed Service (ArangoGraph Insights Platform)
• Built-in Search Engine (ArangoSearch)
• Big Data Graph Processing (Pregel)
• Machine Learning-as-a-Service (ArangoGraphML)
• Kubernetes Integration (Kube-Arangodb)

[Figure: ArangoDB feature overview: Managed Cloud Service (ArangoGraph), Document JSON Support, Kubernetes Integration (Kube-Arango), Scalable Graph Technology, Single Query Language (AQL), Full Text (ArangoSearch), GraphML and Analytics (ArangoGraphML), Iterative Graph Processing (Pregel)]

ArangoGraph Insights Platform


ArangoGraph Insights Platform (ArangoGraph) is a cloud-based, next-generation graph data
and analytics platform that natively integrates graph, JSON, search, and machine learning. It
is a fully managed service that lets users take advantage of the complete functionality
of an ArangoDB cluster deployment without running or managing the system in-house.
ArangoGraph runs in the data centers of your preferred cloud provider: Google Cloud Platform
(GCP), Amazon Web Services (AWS), or Microsoft Azure. Because the service is fully managed,
your databases are kept available, up to date, and encrypted.

ArangoSearch
ArangoSearch is a search and similarity ranking engine integrated natively into ArangoDB
and AQL. It supports relevance-based searching, phrase and prefix matching, complex
Boolean searches, fuzzy search, and query-time relevance tuning. You can
combine ArangoSearch with all supported data models in a single query. Language-specific
analyzers are available out of the box for English, German, French, Chinese, Spanish, and
many other languages.

Pregel
Pregel is a system for large-scale graph processing. This system can perform distributed
graph processing without needing distributed global locking. Distributed graph processing
enables users to conduct online analytical processing directly on graphs stored in ArangoDB.
ArangoDB implements Pregel to discover hidden patterns, identify communities, and
perform in-depth analytics of large graph data sets.
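Pregel programs are written from the point of view of a single vertex: in each superstep, every vertex reads the messages sent to it, updates its value, and sends messages along its outgoing edges. A minimal single-process sketch of PageRank in that style follows. The graph, the damping factor of 0.85, and the fixed number of supersteps are assumptions, and the distribution across servers that ArangoDB's implementation provides is not modeled here.

```python
from typing import Dict, List

def pagerank_pregel(graph: Dict[str, List[str]], supersteps: int = 30,
                    damping: float = 0.85) -> Dict[str, float]:
    """Vertex-centric PageRank simulated with synchronous supersteps.

    Assumes every edge target is also a key and every vertex has outgoing edges
    (vertices without outgoing edges would leak rank in this simplified sketch).
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Message phase: each vertex sends rank / out-degree along every outgoing edge.
        inbox: Dict[str, List[float]] = {v: [] for v in graph}
        for v, targets in graph.items():
            for t in targets:
                inbox[t].append(rank[v] / len(targets))
        # Compute phase: each vertex combines only the messages it received.
        rank = {v: (1 - damping) / n + damping * sum(msgs) for v, msgs in inbox.items()}
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_pregel(links)
print({v: round(r, 3) for v, r in ranks.items()})
```

Because each vertex only needs its own inbox and outgoing edges, the compute phase can run on many servers at once without global locking, with synchronization happening only between supersteps.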

Kube-Arangodb
The ArangoDB Kubernetes Operator (kube-arangodb) is a set of operators deployed
in a Kubernetes cluster to:
• Manage ArangoDB database deployments
• Provide PersistentVolumes on local storage nodes for optimal storage performance
• Configure ArangoDB datacenter-to-datacenter replication

ArangoGraphML
ArangoGraph Insights Platform offers support for both analytics tasks and graph-powered
machine learning. ArangoGraphML is backed by the graph capabilities of ArangoDB.
These graph capabilities are especially useful in a machine-learning platform for feature
engineering. They enable users to combine different data aspects into features that can
be used by machine learning frameworks such as TensorFlow or PyTorch to train models.
ArangoGraphML offers a simple interface for accessing machine learning frameworks and
tools. In a production-grade machine learning infrastructure, ArangoGraphML provides
support for common metadata storage across the entire machine learning lifecycle,
enabling reproducibility, monitoring, and auditing for machine learning models.

Graphs at Scale
The graph capabilities of ArangoDB are similar to a property graph database, but they provide
more flexibility in data modeling because nodes and edges are both full JSON documents.

As an application grows, so does its graph. To keep graph traversals as performant as
possible, even when the graph is sharded across multiple servers in a cluster, ArangoDB
provides EnterpriseGraph and SmartGraphs. For larger datasets, EnterpriseGraph and
SmartGraphs reduce the number of network hops a query needs by sharding data intelligently.
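A rough illustration of why the sharding strategy matters, under assumptions that do not describe ArangoDB's actual sharding code: the sketch places vertices on shards either by their key (a simple stand-in for hashing the key) or by a shared attribute, here a hypothetical `region`, similar in spirit to partitioning by a chosen attribute. It then counts how many edges cross shard boundaries, since each crossing can cost a network hop during a traversal.

```python
from typing import Callable, Dict, List, Tuple

VERTICES: Dict[str, str] = {  # vertex key -> region (hypothetical data)
    "v1": "eu", "v2": "eu", "v3": "eu",
    "v4": "us", "v5": "us", "v6": "us",
}
EDGES: List[Tuple[str, str]] = [
    ("v1", "v2"), ("v2", "v3"), ("v1", "v3"),
    ("v4", "v5"), ("v5", "v6"),
    ("v3", "v4"),  # the only edge linking the two regions
]
NUM_SHARDS = 2
REGION_SHARD = {"eu": 0, "us": 1}  # hypothetical region-to-shard assignment

def by_key(v: str) -> int:
    # Stand-in for hashing the document key: spreads consecutive keys across shards.
    return int(v[1:]) % NUM_SHARDS

def by_region(v: str) -> int:
    # Keep every vertex of a region, and hence most of its edges, on one shard.
    return REGION_SHARD[VERTICES[v]]

def cross_shard_edges(shard_of: Callable[[str], int]) -> int:
    """Count edges whose endpoints live on different shards."""
    return sum(1 for a, b in EDGES if shard_of(a) != shard_of(b))

print("by key:   ", cross_shard_edges(by_key))
print("by region:", cross_shard_edges(by_region))
```

The trade-off is balance: attribute-based placement only helps when most edges stay within a group, and a very large group makes its shard correspondingly large.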

Additional Resources
For more information, see:
• Scaling with Graphs
• SmartGraphs
• EnterpriseGraph

About ArangoDB
ArangoDB is the company behind ArangoGraph Insights Platform: a next-generation graph data and analytics
platform that uncovers insights in data that are difficult or impossible to find with traditional SQL, document,
or even other graph databases, making it easier to drive value from connected data, faster. ArangoGraph Insights
Platform is the scalable backbone for graph analytics and complex data architectures for thousands of
organizations, from Fortune 500 enterprises to innovative startups, across many different industries, including
financial services, healthcare, and telecommunications.

Founded in 2015 in Cologne, Germany, ArangoDB Inc. is a venture-backed, next-generation graph data
and analytics company headquartered in San Francisco, California, with offices and employees worldwide.
Learn more at arangodb.com.

© 2022 ArangoDB, Inc. All rights reserved. All trademarks are the property of their respective owner(s).

11/22
