SAP_HANA_distributed_in-memory_database_system_Transaction_session_and_metadata_management

The document provides an overview of the SAP HANA distributed in-memory database system, focusing on its transaction, session, and metadata management capabilities. It highlights the system's main-memory centric architecture, which allows for efficient data management and scalability across multiple nodes, while maintaining ACID properties. Additionally, the paper discusses the challenges and solutions related to distributed query processing, data distribution, and metadata management within the SAP HANA landscape.

Uploaded by

jerry0222forwork

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

21 views9 pages

SAP_HANA_distributed_in-memory_database_system_Transaction_session_and_metadata_management

Uploaded by

jerry0222forwork

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

SAP HANA Distributed In-Memory Database

System: Transaction, Session, and Metadata

Management
Juchang Lee#1, Yong Sik Kwon#2, Franz Färber*3, Michael Muehle*4, Chulwon Lee#5,
Christian Bensberg*6, Joo Yeon Lee#7, Arthur H. Lee+#8, Wolfgang Lehner^*9
#
SAP Labs, Korea
1 2 5 7
[email protected], [email protected], [email protected], [email protected]
*
SAP AG, Germany
3 4 6
[email protected], [email protected], [email protected]
+
Claremont McKenna College, USA
8
[email protected]
^
Dresden University of Technology, Germany
9
[email protected]

Abstract— One of the core principles of the SAP HANA database compression making it possible to store even large datasets
system is the comprehensive support of distributed query facility. within the main memory of the database server. Recent
Supporting scale-out scenarios was one of the major design developments in the hardware sector economically allow
principles of the system from the very beginning. Within this having off-the-shelf servers with 2 TByte of DRAM. The
paper, we first give an overview of the overall functionality with
main-memory centric approach therefore turns the classical
respect to data allocation, metadata caching and query routing.
We then dive into some level of detail for specific topics and architectural paradigm upside-down: While traditional disk-
explain features and methods not common in traditional disk- centric systems try to guess the hot data to be cached in main
based database systems. In summary, the paper provides a memory, the SAP HANA approach defaults to have
comprehensive overview of distributed query processing in SAP everything in main memory; only “cold” data—usually
HANA database to achieve scalability to handle large databases determined by complex business rules and not by buffer pool
and heterogeneous types of workloads. replacement strategies working without any knowledge of the
application domain and corresponding business objects—can
be staged out onto disk infrastructures. This allows SAP
I. INTRODUCTION
HANA to support even very large databases in terms of a
An efficient and holistic data management infrastructure is large number of tables and data volumes sufficient to serve all
one of the key requirements for making the right decisions at SAP customers with existing and future applications.
an operational, tactical, and strategic level. The SAP HANA In addition to performance, the SAP HANA database also
database is the core component of SAP’s HANA roadmap targets to support business processes from a holistic
playing the foundation to efficiently support all SAP and non- perspective. For example, the system may hold text
SAP business processes from a data management perspective documents of products within an order together with
[1]. In opposite to the traditional architecture of a database structured information of the customer and spatial information
system, the SAP HANA database takes a different approach to of the current delivery route. As outlined in [2], the SAP
provide support for a wide range of data management tasks. HANA database provides multiple engines exposing special
For example, the system is organized in a main-memory services. Data entered for example in text format can be
centric fashion to reflect the shift within the memory extracted, semantically enriched, and transformed into
hierarchy [9] and to consistently provide high performance structural data for combination with information coming from
without any slow disk interactions. an engine optimized for graph-structured analytics.
Completely transparent for the application, data is Combining heterogeneous datasets seamlessly within a single
organized along its life cycle either in column or row format, query processing environment and providing support for the
providing the best performance for different workload complete life cycle of data on a business object level are some
characteristics [10]. Transactional workloads with a high of the unique features of SAP HANA.
update rate and point queries are routed against a row store; Finally, SAP HANA is positioned to act as a consolidation
analytical workloads with range scans over large datasets are platform for many different use cases, from an application
supported by column oriented data structures. In addition to a perspective and from a data management perspective.
high scan performance over columns, the column-oriented Multiple SAP and non-SAP applications may run on top of
representation offers an extremely high potential for one single SAP HANA instance providing the right degree of

Authorized licensed use limited©to:2013

978-1-4673-4910-9/13/$31.00 IEEE
Huawei 1165
Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore.ICDE Conference
Restrictions apply. 2013
“isolation”—strict isolation, if required, e.g., from security at different nodes pursuing two different strategies. One the
perspective, deep integration if different datasets are supposed one hand, specialized reorganization tools exist to provide
to be merged like in typical data-warehouse scenarios. Having advice for the DBA reaching optimal partitioning schemes.
a single SAP HANA landscape reduces operational expenses For example, the toolset checks incoming workloads on a
and therefore TCO in general. However, scalability is required table usage level to come up with a proposal to either spread
to provide such a degree of service. Therefore, from the very out partitions of a table or co-locate different tables in order to
beginning on, the SAP HANA database was designed for avoid multi-node joins or expensive commit protocols. As of
scalability in different directions: now, the toolset is optimized to support specific SAP
• Scale Up: Due to main memory requirements SAP applications, especially SAP Business Warehouse also
HANA was designed to run on “big machines” offering considering CPU and memory usage of all active nodes.
multiple CPUs and a fairly large number of threads. Based on the reference behavior and current system usage, the
• Scale Out: The SAP HANA database runs in a multi- reorganization tool makes a proposal of a revised allocation
node environment to balance the need of CPU power and scheme. Future versions of the toolset will act in an
main memory capacity providing the same level of application-agnostic way supporting any arbitrary SQL-based
transactional guarantees like in a single node scenario. workload.
• Scale In: Scale in typically denotes multi-tenancy On the other hand, the DBA may directly assign partitions
support and therefore ability to host multiple logical of a table to individual HANA nodes. This manual task is
databases within a single physical instance offering a especially beneficial to achieve certain performance
certain level of schema and data sharing. characteristics of certain tables. For example, a DBA might
Contributions: Within this paper, we focus on some core want to avoid distributed transactions with network traffic and
concepts of distributed query processing in order to provide a protocol delay to improve query performance. For example, a
robust and efficient scale-out solution. We outline the need to single landscape may consist of one very large machine and
balance the gain of larger main memory capacities and larger multiple smaller nodes as shown in Figure 1. The large
number of computing units against the complexity coming machine node will then host all “transactionally hot” tables or
with a distributed environment. Therefore, we start with an partition of tables avoiding distributed transactions with
overview of distributed query processing in SAP HANA network traffic and protocol delay. More analytically oriented
following the life cycle of an individual query pinpointing applications targeting multiple partitions of historical data or
specific problems and solutions along the way. Thereafter, we databases coming from external data sources will then hit the
dive into detail for some selected problems and give insights parallel nodes to improve query performance. Since all
into the conceptual solution design. In summary, the paper datasets are part of one single SAP HANA landscape, the
provides a comprehensive overview of distributed query database system is able to run cross-joins within multi-node
processing in SAP HANA and describes some procedures and transactions, if the query demands it—the allocation and
their optimizations in detail. deployment scheme just tries to reduce the communication
within the cluster.
II. DISTRIBUTED QUERY PROCESSING IN HANA
As already mentioned, scaling database services over
multiple nodes connected via a high-speed network
infrastructure implies a variety of challenges. Every single
component of a database system has to be “distribution-
enabled”, i.e., not only working correctly but also efficiently
in a distributed environment. From that perspective, the fact of
distribution affects functional as well as non-functional
service primitives ranging from distributed (multi-node) query
processing to caching strategies of metadata repositories.
The overall goal of the SAP HANA database approach
consists in scaling over a reasonably large number of nodes
without sacrificing overall system performance and all well-
known transactional guarantees, i.e., ACID properties. Fig. 1. Asymmetric deployment of an SAP HANA landscape
The core database challenges can be classified into four B. Distributed Transaction Management
major categories: distribution of data, distributed transaction
management, distributed metadata management, and In opposite to scale-out solutions like Hadoop, SAP HANA
distributed query optimization and execution. follows the traditional semantics of providing full ACID
support. In order to make good the promise of supporting both
A. Deployment Schemes and Data Distribution OLTP and OLAP-style query processing within a single
As in all high-end database systems, a single table can be platform, the SAP HANA database relaxes neither any
split into multiple partitions using hash, round-robin, range consistency constraints nor any degree of atomicity or
partitioning strategies. Individual partitions are then allocated durability. On the one side, the SAP HANA database applies
traditional locking and logging schemes for distributed

1166
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
scenarios with some very specific optimizations like connects to a particular server and starts the query compilation
optimizing the two-phase commit protocol (subsection III-D) process. Figure 3 shows the different steps.
or providing sophisticated mechanisms for session
management (subsections III-C.1, III-C.2, and III-C.3). As
mentioned the deployment of the system usually reflects the
intended use in order to have a benefit of a large node for
heavy transaction processing and a number of usually smaller
nodes for analytical workloads where the additional overhead
of distributed synchronization reflects a relatively small
portion of the overall query runtime. Since SAP HANA relies
on MVCC as the underlying concurrency control mechanism,
the system provides distributed snapshot isolation and
distributed locking to synchronize multiple writers. Therefore,
the system relies on a distributed locking scheme with a global
deadlock detection mechanism avoiding a centralized lock
server as a potential single point of failure.
C. Distributed Metadata Management
Within an SAP HANA database landscape, a coordinator Fig. 3. Query compilation and execution in a single node scenario
node stores and manages all the persistent metadata such as
table/view schema, user information, privileges on DB objects, The session layer forwards an incoming query request to
etc. To satisfy requirements for consistent metadata access, the optimizer (1). After consulting metadata (2) and checking
the metadata object container provides both MVCC based the plan cache (3), the query will eventually be optimized and
access and transactional update (ACID) on its contents. It also compiled. In opposite to other systems, the optimizer embeds
provides index-based fast object lookup. substantial metadata into the query plan and returns it to the
client. For example, metadata flowing back to the client
contains information about the most optimal node to actually
start and orchestrate the query execution. Within a single node
case, the client sends the query plan to the execution
component (4), puts the plan into the (current) query plan
cache and starts executing the query in a combination of
column- and row-store. In the case of an update, additional log
information will eventually be written to the persistency layer
to ensure atomicity and durability.
Fig. 2. Distributed metadata management within an SAP HANA landscape The query flow is a bit more complicated in a multi-node
deployment as illustrate in Figure 4. As before, a client may
In order to improve access to metadata at worker nodes, the send a query to a dedicated coordinator node (1). After the
concept of metadata caches enables local access to “remote” initial query compilation, the returned query plan contains
metadata in a distributed environment. Figure 2 shows the information about the node where the query should be
metadata object container and cache in the coordinator and executed (4). This recommendation is based on data locality
worker nodes. When a component in a worker node requires for the particular query and the current system status. The
access to a metadata object located at the (remote) coordinator, client then sends the query to the recommended (worker) node
the metadata manager first tries to locate it in the cache. If (5) and re-compiles the query with respect to the node-
there is no result object in the cache, a corresponding retrieval specific properties and statistics (6). As before, the
request is sent to the coordinator. The result is placed within optimization step requires access to metadata, which is either
the cache and access is granted to the requesting component. already available at the current node (7) or requested on-the-
In order to reduce potential round-trips to fetch different fly from the coordinator (8). After re-compilation, the query is
entries of metadata, the system applies group caching of executed (10) by extracting the plan from the plan cache (11),
tightly related metadata, e.g., a cache request for metadata passing it to the execution component which again routes the
related for a table also returns metadata about columns, individual requests to the local storage structures (12) and
existing indexes, etc. within a single request. For consistent potentially local persistency layer (13). Figure 5 illustrates the
query processing, the access to the metadata cache is tightly benefits of using statement re-routing with a simple single-
coupled with the transaction management. table select. The figure shows three cases: (i) single-node case;
(ii) multi-mode case with statement routing turned on; and (iii)
D. Distributed Query Compilation and Execution multi-node case with statement routing turned off. As we can
In order to illustrate the key concepts of distributed query see, cases (i) and (ii) are virtually identical. Case (iii) however
processing within SAP HANA, we will follow a query in is significantly and consistently slower than the other two
different scenarios. Within a single node setup, the client cases. In addition to statement routing, this scenario also

1167
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
Fig. 6. Distributed query spanning multiple nodes

E. Some Experimental Results

Without diving into too much detail at this point, we give
some overall performance numbers showing the potential (e.g.,
Fig. 4. Query re-compilation at remote node
read scalability, write behavior, etc.) of the SAP HANA
exploits the optimization of using a local transaction token to database system with respect to different workload scenarios.
reduce the communication cost between nodes, effectively The reported numbers are based on an SAP HANA landscape
behaving like a single-node scenario. with 31 nodes, each consisting of 0.5 TByte main memory, 20
physical CPU cores, and local disks with 700 MB/s
throughput for local logging. A 10 Gbps network was used for
the interconnect. The numbers are taken from a real customer
SAP Business Warehouse installation on top of SAP HANA
database; the SAP HANA database (after compression)
showed an overall size of 100 GByte.

Fig. 5. Sample scenario showing benefits of client-side statement routing

The most general case of a distributed query execution

spans some of the operators over multiple nodes. While the
initial compilation and client-side routing including re-
The first workload consists of typical analytical queries
compilation, metadata access, etc. are equivalent to the former
issued against the warehouse from Business Intelligence tools.
case, a multi-node query requires two additional steps (Figure
While Table I shows the average execution times over 200
6). Before starting the execution of the query, the transaction
single queries comprising the two different types of workloads,
manager consults the coordinator (1), registering the
Figure 7 depicts the speedup with a growing number of nodes:
transaction and retrieving a global transaction token holding
1) Query 1 is a representative set of queries requiring a full
the visibility information for that particular query (in general
table scan over the fact table, which can be parallelized
for all queries within a transaction). For execution, the over multiple worker nodes. As we can see,
executor component distributes the query plan over multiple
performance increases with the number of nodes but
nodes by shipping sub-query plans with the transaction token
drops as the landscape uses a partitioning scheme
and triggering the individual processing steps (5). The
stretching the fact table over all 31 nodes implying a
decision where to run what part of the global query plan and
larger number of local joins. The performance drop is
when to move intermediate data between the nodes is
primarily caused by the size of the resulting partitions.
determined on a cost-based basis during query compilation.
If the partitions are too small, the resulting overhead
Obviously, the distributed query optimizer is a non-trivial
and the large number of local join operations with
piece of the SAP HANA database in order to reduce network
distribution of the join partner is counterproductive with
traffic and decide which part of a query can be parallelized
respect to the overall query performance.
over multiple nodes.
2) In the opposite, query set 2 touches a small piece of
data located only at a single node. As can be seen, the
scale-out does not significantly affect query
performance, neither in the positive nor in the negative
way; the speedup therefore remains constant
independent of the number of nodes.

1168
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
HANA landscape. Optimally partitioning and allocating
10 partitions to nodes within a landscape is therefore of utmost
importance. The SAP HANA database therefore comes with a
8
variety of tools optimized for SAP applications to support the
administration task. Second, the SAP HANA database
speedup

6 ●

●
query #1
provides full ACID support in the scale-out scenario which is
4 ●
therefore different to “Big Data”-systems like Hadoop and the
2 ● like. Third, the system provides a sophisticated and MVCC-
query #2
●
based metadata caching infrastructure reducing the overall
0
1 3 7 15 31
communication cost in a landscape. Finally, the SAP HANA
database pursues client-side statement routing as the core
# nodes
principle of load balancing—this paradigm is therefore
Fig. 7. Speedup for analytical workload orthogonal to traditional parallel database architectures relying
on page caching strategies. While page caching may result in
As outlined in the introduction, the SAP HANA database
having the same page in multiple nodes and thus reducing the
system is designed to deliver extremely good performance for
net main memory capacity, client-side statement routing
OLAP- and OLTP-style query workloads. We therefore also
brings the query to the node with the highest data locality thus
consider a typical update-heavy workload in the context of a
reducing network traffic and in cases of transactions against a
real SAP BW scenario. After loading extracted raw data into
small number of “related” tables even the need for two-phase-
the BW system and applying transformations with respect to
commit. This mechanism is especially useful for asymmetric
data cleansing and harmonization requirements, new data is
deployment schemes (Figure 1) by routing transactional
eventually “merged” with the existing state of the fact table
workload queries to the large node of an SAP HANA
implying a mix of insert, delete, and update operations on the
landscape.
potentially very large fact table. Following this “activation
step”, the system is applying reorganization steps to better III. SELECTED OPTIMIZATIONS
reflect the multidimensional model. Again this “compression
step” within the overall SAP BW loading process is a write- In this section we describe in detail some selected
intensive step which can nicely be parallelized over multiple optimization problems in the SAP HANA distributed
nodes. landscape and provide insights into the conceptual solution
1) The loading chain type 1 shown in Table I reflects an design for them.
activation step. As can be seen in Figure 8, the system A. Client-Side Statement Routing
scales very nicely with the number of nodes, despite the
In a distributed database environment a query statement
heavy write process.
issued by a client may be executed in any one of the available
2) The compression step (loading chain type 2) finally
nodes. The problem is determining the optimal node to run it
shows excellent scalability with a speedup of 13.6 for
on and route it to the node, which in HANA would be any one
15 nodes or 25.2 for the 31 node case and therefore
of the worker nodes in the system. HANA uses what we call
demonstrates the scale-out capability in an optimal
client-side statement routing to achieve this. Affinity
setting with respect to the physical database design.
information between a given compiled query and its accessed
data location is cached transparently at the client library. So,
30
loading chain #1 without changing the existing client programs and also
25 ●
without requiring any more information on clients, any
20 arbitrary query given by a client program can be routed to a
speedup

15 loading chain #2
desirable server node.
●
Figure 9 shows a conceptual view of the solution on how a
10
● query is routed by client library and how the affinity
5
● information is maintained at the client side. First, at compile
0
●
time the desired server location of a given query can be
1 3 7 15 31 decided and it is returned to the client library. On a
# nodes subsequent repeated execution of such a compiled query, it is
routed to the associated server node directly by the client
Fig. 8. Speedup for write-intensive workload
library. Sometimes, by DDL operations such as table
movement or repartitioning, however, the desired location of a
given query can be changed as for Table T2 in Figure 9. In
Summary such a case, on the next query execution following the DDL
As outlined, distributed query processing in SAP HANA is operation, such inefficiency is detected automatically by
based on four fundamental building blocks. First of all, the comparing the metadata timestamp of the client-side cached
SAP HANA system is working in a shared-nothing fashion information with the server-side latest metadata timestamp
with respect to the nodes' main memory within an SAP

1169
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
value and then the client-cached information is updated with transaction isolation mode, etc.). Without client-side
the latest information. statement routing, there is only 1-to-1 mapping between a
logical session and a client-server physical connection, which
means that the session context information can be stored in the
server-side connection. With client-side statement routing,
however, it should be shared across multiple client-server
physical connections. In HANA, some of the session context
information is also cached at the client library side. The
cached information can then be used when it has to make a
new physical connection to a different server within the same
logical session without having to contact the initial physical
connection of the logical session.
In this subsection, we primarily focused on the cases in
which the desired execution location of a query is decided at
query compilation time. However, there are other types of
queries whose desired location can be decided dynamically at
run time. Such cases are described in the subsection.
B. Dynamic Resolution in Statement Routing
1) Table partitioning: Using the partitioning feature of the
Fig. 9. Client-side statement routing
SAP HANA database, a table can be partitioned horizontally
How is the optimal target location decided given a client- into disjunctive sub-tables or “partitions”, each of which may
side statement then? A server extracts tables and/or views be used by each node of a distributed HANA database system.
needed to run the statement (or a stored procedure if that is the Problem is how to partition a table optimally so that each
case) and returns the locations (nodes) of the target tables partition is shipped to the optimal worker node. Partitioning
and/or views. The client then does the following: If the may be done by using one of three strategies: hash, round-
number of nodes returned is 1, route the query to that node. If robin, and range. Both hash partitioning and round-robin are
the number is greater than 1, then do the following: If the used to equally distribute the partitions to worker nodes.
table returned is a hash partitioned table or a round-robin Range partitioning can be used to create dedicated partitions
partitioned table, and furthermore the query is an INSERT for certain values or certain value ranges to be distributed to
query, then evaluate the input value of the partitioning key worker nodes.
and use the result as the target location. Otherwise, route it to Let us consider hash partitioning in more detail. To decide
the returned nodes in a round-robin fashion. a desirable partition of a given query to a partitioned table, we
The routing decision can be resolved at compile time need to consider the table's partitioning specification (or the
(which we call static resolution) in some cases or at run time partitioning function) and the execution-time input values of
(which we call dynamic resolution) in other cases. We now its partitioning keys. The partitioning specification can be
discuss static resolution optimizations in the rest of this given to the client library on the query compilation, but the
subsection and dynamic resolution optimizations will be execution-time input values can be known only during the
discussed in the subsection that follows. query execution time. This input value interpretation logic is
Let us consider stored procedures first. How is it decided normally done by the server-side query processing layer. For
at which node a stored procedure is executed? Within a stored this optimization though, such logic is also shipped together to
procedure there may be a variety of multiple queries. Usage the client library as well.
pattern of each query in a stored procedure is used in making This partitioning optimization technique is used to support
the routing decision. For example, if one query executes client-side routing of various statements such as inserts,
multiple times in a procedure, we can be a little smarter. selects, updates, and deletes as long as their WHERE clause
Suppose there are 10 queries and one of them is in a loop, in includes partitioning key information.
which case a higher priority is given to the query in a loop in 2) Load balancing: Each node in a distributed database
making the decision. system may be utilizing the key resources, i.e., CPU and
In some cases analyzing a stored procedure at a server memory, differently with different workloads at any given
alone may not be sufficient, in which case the server could get time. For example, one node may be busy with CPU-bound
some hints from the client code. Application programs could tasks while there may be at least one other node that is not
pass some domain knowledge into the application code. busy at all at that point in time. With HANA as a main-
With the client-side statement routing mechanism in place memory database system, memory is being used not only for
there is no more 1-to-1 mapping between a session and a processing but also for storage, i.e., holding table data. Again
physical connection or between a transaction and a physical for example, one node may almost be out of memory while
connection. This poses a technical challenge though, which there may be at least one other node with plenty of available
has to do with session context management. A session context memory. It is a correctness concern as well as performance
is a property of a session (for example, user name, locale, concern, i.e., we cannot ever get into a situation where a node

1170
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
fails because it ran out of memory. Therefore, it is important isolation mode) starts, it copies the current transaction token
to balance not only the processing load but also storage load into its context (called snapshot token). And, the transaction
among different nodes in the system. (or statement) decides which versions should be visible to
In addition to improving affinity between data location and itself based on the copied ‘snapshot token’.
processing location, we can extend client-side statement Now we describe distributed snapshot isolation (or
routing to achieve better load balancing across HANA server distributed MVCC). In our transaction protocol every
nodes. When returning a query result to a client library, a transaction started at a worker node should access the
HANA server can return its memory and CPU resource transaction coordinator to get its snapshot transaction token.
consumption status together with the result, i.e., without This could cause (1) a throughput bottleneck at the transaction
making any additional round trip. Alternatively, the client coordinator and (2) additional network delay to the worker-
library can periodically collect the resource status of HANA side local transactions. To remedy these situations we use
server nodes. If the client detects that one of computing nodes three techniques: (i) one that enables local read-only
does not have enough CPU or memory resource at that point transactions to run without accessing the global coordinator;
in time, the client library tries to temporarily re-route the (ii) another that enables local read or write transactions to run
current query to other nodes. Furthermore, at query without accessing the global coordinator; and (iii) third that
compilation time the client library can attach its expected uses “write TID buffering” to enable multi-node write
memory consumption to the query plan. If this information is transactions to run with only a single access to the global
then cached and attached to the compiled query at the client coordinator. We now describe all three techniques in order.
side, the client library can perform more efficient re-routing. 1) Optimization for worker-node local read transactions or
statements: Every update transaction accesses the transaction
C. Distributed Snapshot Isolation (Distributed MVCC) coordinator to access and update the global transaction token.
For distributed snapshot isolation figuring out how to Read-only statements on the other hand just start with its
reduce the overhead of synchronizing transaction ID or cached local transaction token. This local transaction token is
commit timestamp across multiple servers belonging to a refreshed
same transaction domain has been a challenging problem [3], • by the transaction token of an update transaction when it
[4], [5]. SAP HANA database focuses on optimization for a commits on the node, or
single-node query which is executed without accessing any • by the transaction token of a ‘global statement’ when it
other node. If the partitioning and table placement is done comes in to (or started at) the node.
optimally, then most of queries can be processed within a If the statement did not need to access any other node, it
single node. For long-running analytical queries that have to can just finish with the cached transaction token, i.e., without
be spanned across multiple nodes, the overhead incurred by any access to the transaction coordinator. If it detects that the
communicating such transactional information will be statement should also be executed in another node, however, it
relatively ignorable compared to the overall execution time of is switched to the 'global statement' type, after which the
the query. So, our optimization choice that favors single-node current statement is retried with the global transaction token
queries can make sense for many applications. obtained from the coordinator.
The question is how is it figured out whether a transactional Single-node read-only statements/transactions do not need
snapshot boundary will only touch one node or not in advance? to access the transaction coordinator at all. This is significant
It is feasible especially under the Read-Committed isolation to avoid the performance bottleneck at the coordinator and to
mode [6], which is the default isolation mode in many SAP reduce performance overhead of single-node statements (or
applications. transactions).
In Read-Committed isolation mode the MVCC snapshot 2) Optimization for worker-side local write transactions:
boundary is the lifetime of the query, which means that the Each node manages its own local transaction token
transaction does not need to consider any other query. So, the independently of the global transaction token. Even the
isolation boundary for the query can be determined at compile update transaction can just update its own local transaction
time, i.e., at that time it can figure out exactly what parts of token if it is a single-node transaction. The difference is that
which tables to access to execute the query. When the query each database record has two TID (or Transaction ID)
finishes, the snapshot finishes its life as well, i.e., the snapshot columns for MVCC version filtering: one for global TID and
is meaningful only on that local node while the query is another for local TID. (In the existing other schemes, there is
executing. The entire transaction context that is needed for a only one TID/Commit ID column.) If it is a local-only
query to execute is captured in a data structure called transaction, it reads/updates the local TID. If it is a global
transaction token (described below) and is cached on a local transaction, however, it reads either global or local TID (reads
node. For the “Repeatable Read” or “Serializable” isolation a global TID if there is a value in its global TID column;
level, the transaction token can be reused for the queries otherwise, reads a local TID) and updates both global and
belonging to the same transaction, which means that its local TIDs. So, the global transactions carry two snapshot
communication cost with the coordinator node is less transaction tokens: one for global transaction token and
important. another for the current worker node's local transaction token.
Whenever a transaction (in a transaction-level snapshot
isolation mode) or a statement (in a statement-level snapshot

1171
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
In the log record both global and local TIDs are also
recorded if it is a global transaction. On recovery a local
transaction's commit can be decided by its own local commit
log record. Only for global transactions it is required to check
the global coordinator. Here again statement type switch
protocol is necessary as it is the case in case 1) above.
3) Optimization for multi-node write transactions: A multi-
node write transaction needs a global write TID as all multi-
node transactions do. The optimization described in case 2)
above would not help. It can however be handled by the write
TID buffering technique which we describe now.
In a HANA scale-out system one of the server nodes
becomes the transaction coordinator which manages the
distributed transaction and controls two phase commit.
Executing a distributed transaction involves multiple network
communications between the coordinator node and worker Fig. 10. Returning commit ack early after first commit phase
nodes. Each write transaction is assigned a globally unique
For this optimization three things are considered.
write transaction ID. A worker-node-first transaction, one that
starts at a worker node first, should be assigned a TID from 1) Writing the commit log entries on the worker nodes
the coordinator node as well, which causes an additional can be done asynchronously. During crash recovery
network communication. Such a communication might then some committed transactions can be classified as
significantly affect the performance of distributed transaction in-doubt transactions, which will be resolved as
execution. This extra network communication is eliminated to committed finally by checking the transaction's status
improve the worker-node-first write transaction performance. in the coordinator.
We solve this problem by buffering such global write TIDs 2) If transaction tokens are cached asynchronously on the
in a worker node. When a request for a TID assignment is worker nodes, the data is visible by a transaction but
made the very first time, the coordinator node returns a range not by the next (local-only) transaction in the same
session. This situation can be detected by storing the
of TIDs which gets buffered in the worker node. The next
last transaction token information for each session at
transaction which needs a TID gets it from the local buffer
thus eliminating the extra network communication with the the client side. And then, until the second commit
coordinator node. A few key challenges are determining phase of the previous transaction is done, the next
optimal buffer size as a parameter and deciding when to flush query can be stalled.
the buffer if they are not being used. 3) If transactional locks are released after sending a
Therefore, by combination of the optimizations described commit acknowledgement to the client, a 'false lock
in this subsection SAP HANA provides transparency of conflict' may arise by the next transaction in the same
transaction performance regardless of where it is started and session. This situation can however be detected by the
committed without losing or mitigating any transactional same problem with (2) above. If this is detected, the
consistency. transaction can wait for a short time period until the
commit notification arrives to the worker node.
D. Optimizing Two-Phase Commit Protocol 2) Skipping writes of prepare logs: The second optimiza-
Two-phase commit (2PC) protocol is widely used to ensure tion is to remove additional log I/Os for writing prepare-
atomicity of distributed multi-node update transactions. A commit log entries.
series of optimization techniques we describe in this In a typical 2-phase-commit the prepare-commit log entry
subsection are our attempts to reduce the network and log I/O is used to ensure that the transaction's previous update logs are
delays during a two-phase commit thus increasing throughput. written to disk and to identify in-doubt transactions at the
1) Early commit acknowledgement after the first commit recovery time.
phase: Our first optimization is to return commit • Writing the transaction's previous update logs to disk can
acknowledgement early after the first commit phase [7], [8], be ensured without writing any additional log entry, by
as shown in Figure 10. just comparing the transaction-last-LSN (log sequence
Right after the first commit phase and the commit log is number) with the log-disk-last-LSN. If the log-disk-last-
written to the disk, the commit acknowledgement can be LSN is larger than the transaction-last-LSN, it means
returned to the client. And then, the second commit phase can that the transaction's update logs are already flushed to
be done asynchronously. disk.
• If we do not write the prepare-commit log entry, we can
handle all the uncommitted transactions at recovery time
as in-doubt transactions.

1172
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.
Their commitance can be decided by checking with the IV. SUMMARY
transaction coordinator. So, the size of in-doubt The SAP HANA database is primarily designed to cover
transaction list can increase, but with less run-time the three difference scaling principles: scale-in, scale-up, and
overhead. scale-out. In this paper we outlined some of the hard problems
3) Group two-phase commit protocol: This is a similar idea in multi-node scenarios, showed the core architectural designs,
to the one described in Subsection III-C.3, but instead of and give optimization details for some of the problems.
sending commit requests to the coordinator node individually Specifically, we discuss optimization for query routing in
(i.e., one for each write transaction), we can group multiple single and multiple node scenarios, we show optimizations
concurrent commit requests into one and send it to the techniques for client-based statement routing, two-phase-
coordinator node in one shot. commit protocol, and finally give some insights into caching
Also, when the coordinator node multicasts a "prepare- strategies and techniques for metadata catalogue of the SAP
commit" request to multiple-related worker nodes of a HANA database.
transaction, we can group multiple "prepare-commit" requests
of multiple concurrent transactions which will go to the same ACKNOWLEDGMENT
worker node. We explicitly want to express our Thanks to the SAP
By this optimization we can get better throughput of HANA team, especially to all the readers of a preliminary
concurrent transactions. version of this paper. We also thank Sang K. Cha for
Two-phase commit itself cannot be avoided fundamentally suggesting the idea of writing this paper.
for ensuring global atomicity to multi-node write transactions.
However, by combination of the optimizations described in REFERENCES
this subsection, we can reduce its overhead significantly. [1] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner,
“SAP HANA database: data management for modern business
applications,” SIGMOD Record, vol. 40, no. 4, pp. 45–51, 2011.
E. Metadata Cache Management [2] V. Sikka, F. Färber, W. Lehner, S. K. Cha, T. Peh, and C. Bornhövd,
“Efficient transaction processing in SAP HANA database: the end of a
The system-wide metadata catalogue is kept in the column store myth,” in SIGMOD Conference, 2012, pp. 731–742.
coordinator node. As a worker node processes a query, it [3] C. Mohan, H. Pirahesh, and R. Lorie, “Efficient and flexible methods
would have to obtain the necessary metadata information from for transient versioning of records to avoid locking by read-only trans-
the coordinator node. To reduce the overhead caused by the actions,” in Proceedings of ACM SIGMOD International Conference
on Management of Data, 1992.
network latency in the communication between the [4] H. V. Jagadish, I. S. Mumick, and M. Rabinovich, “Scalable versioning
coordinator node and a worker node, HANA supports a in distributed databases with commuting updates,” in Proceedings of
worker node requesting multiple metadata objects with a the International Conference on Data Engineering (ICDE), 1997.
single request when beneficial. [5] H. V. Jagadish, I. S. Mumick, and M. Rabinovich,, “Asynchronous
version advancement in a distributed three version database,” in
When a worker node needs to obtain metadata information Proceedings of the International Conference on Data Engineering
from the coordinator node because it does not have that (ICDE), 1998.
information already cached on it, it will incur IPC. An [6] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P.
example can be seen in Step 8 of Figure 4. Since each single O’Neil, “A critique on ansi sql isolation levels,” in Proceedings of
ACM SIGMOD International Conference on Management of Data,
metadata cache miss will cause IPC via network 1995.
communication, it can be a considerable performance penalty [7] C. Mohan, B. Lindsay, and R. Obermarck, “Transaction management
for the cases where there are many metadata cache misses for in the R* distributed database management system,” ACM
a single query execution. Group caching for metadata is Transactions on Database Systems, vol. 11, no. 4, 1986.
[8] R. Gupta, J. Haritsa, and K. Ramamritham, “Revisiting commit
introduced to minimize cache miss penalties of this kind. processing in distributed data-base systems,” in Proceedings of ACM
Instead of checking cache in an on-demand manner, a worker SIGMOD International Conference on Management of Data, 1997.
node collects the entire required metadata object IDs first and [9] J. Gray, “Tape is dead, disk is tape, flash is disk. ram locality is king,”
sends a single request for all the missing objects. This 2006.
(http://research.microsoft.com/en-us/um/people/gray/jimgraytalks.htm).
optimization is particularly good for a query with a complex [10] H. Plattner, “A common database approach for OLTP and OLAP using
query plan which requires accessing multiple metadata objects an in-memory column database,” in Proceedings of ACM SIGMOD
of various kinds such as tables, indexes, views, and privileges. International Conference on Management of Data, 2009.

1173
Authorized licensed use limited to: Huawei Technologies Co Ltd. Downloaded on January 17,2024 at 01:55:18 UTC from IEEE Xplore. Restrictions apply.

Enterprise FW 03-Security Fabric
No ratings yet
Enterprise FW 03-Security Fabric
31 pages
Abap On Hana: 1. Introduction and Technical Concepts of SAP HANA 2. Introduction To HANA Studio
No ratings yet
Abap On Hana: 1. Introduction and Technical Concepts of SAP HANA 2. Introduction To HANA Studio
29 pages
SAP - HANA Sales and Distribution
63% (16)
SAP - HANA Sales and Distribution
14 pages
Sap Hana DB Architecture
No ratings yet
Sap Hana DB Architecture
21 pages
Operational Acceptance Test - White Paper, 2015 Capgemini
100% (6)
Operational Acceptance Test - White Paper, 2015 Capgemini
12 pages
Hana Selected Concepts For Interview
No ratings yet
Hana Selected Concepts For Interview
29 pages
Performance Analysis and Architectural O
No ratings yet
Performance Analysis and Architectural O
6 pages
The SAP HANA Database - An Architecture Overview
No ratings yet
The SAP HANA Database - An Architecture Overview
6 pages
Sap Hana
No ratings yet
Sap Hana
4 pages
Hana PDF
No ratings yet
Hana PDF
7 pages
HA200
No ratings yet
HA200
19 pages
S4HANA Embedded Analytics2
No ratings yet
S4HANA Embedded Analytics2
10 pages
SAP HANA Architecture and Implementation: Definitive Reference for Developers and Engineers
From Everand
SAP HANA Architecture and Implementation: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
SAP HANA Database - Data Management For Modern Business Applications
No ratings yet
SAP HANA Database - Data Management For Modern Business Applications
7 pages
What Is Sap HANA?
No ratings yet
What Is Sap HANA?
58 pages
What Is Sap HANA?
100% (1)
What Is Sap HANA?
58 pages
Column-Oriented In-Memory Database Olap Oltp Main Memory Disk
No ratings yet
Column-Oriented In-Memory Database Olap Oltp Main Memory Disk
2 pages
Sap Hana Database: Mihnea Andrei SAP Products & Innovation HANA Platform July 8, 2014 Public
No ratings yet
Sap Hana Database: Mihnea Andrei SAP Products & Innovation HANA Platform July 8, 2014 Public
32 pages
SAP HANA's Defining Capabilities
No ratings yet
SAP HANA's Defining Capabilities
11 pages
SAP S4 HANA Overview
No ratings yet
SAP S4 HANA Overview
2 pages
Introduction To HANA - Deep Dive
No ratings yet
Introduction To HANA - Deep Dive
106 pages
Sap Hana:: OLTP: Simple Queries Like INSERT, UPDATE, DELETE Etc
No ratings yet
Sap Hana:: OLTP: Simple Queries Like INSERT, UPDATE, DELETE Etc
6 pages
Introduction To SAP HANA
No ratings yet
Introduction To SAP HANA
29 pages
The SAP HANA Database - An Architecture Overview
No ratings yet
The SAP HANA Database - An Architecture Overview
6 pages
SAP HANA Database
No ratings yet
SAP HANA Database
9 pages
Features of SAP HANA
No ratings yet
Features of SAP HANA
18 pages
Seminar Report
No ratings yet
Seminar Report
30 pages
An Insight Into SAP HANA Architecture
No ratings yet
An Insight Into SAP HANA Architecture
116 pages
SAP HANA Architecture
No ratings yet
SAP HANA Architecture
10 pages
Sap Hana®:: The In-Memory Platform For Digital Transformation
No ratings yet
Sap Hana®:: The In-Memory Platform For Digital Transformation
13 pages
1SAP HANA Introduction
No ratings yet
1SAP HANA Introduction
22 pages
HBase Configuration and Operations: Definitive Reference for Developers and Engineers
From Everand
HBase Configuration and Operations: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
SAP - HANA - Tuitorial - Chapter 1
No ratings yet
SAP - HANA - Tuitorial - Chapter 1
8 pages
SAP HANA Definitions
No ratings yet
SAP HANA Definitions
158 pages
Live Webinar On SAP S/4Hana: Evolving Technologies Need Smart Finance Solutions
No ratings yet
Live Webinar On SAP S/4Hana: Evolving Technologies Need Smart Finance Solutions
6 pages
Sap Hana
100% (1)
Sap Hana
22 pages
Hana Basics
No ratings yet
Hana Basics
153 pages
The 6 Key Business Benefits of SAP HANA
No ratings yet
The 6 Key Business Benefits of SAP HANA
4 pages
Introduction To SAP HANA
No ratings yet
Introduction To SAP HANA
12 pages
Sap Hana Docs
No ratings yet
Sap Hana Docs
9 pages
Sap Hana Quick Guide
No ratings yet
Sap Hana Quick Guide
110 pages
SAP HANA Overview: /hana/shared/ Exe Global Profile /hana/shared/ /hdbclient
No ratings yet
SAP HANA Overview: /hana/shared/ Exe Global Profile /hana/shared/ /hdbclient
7 pages
SAP HANA Questions 01
100% (1)
SAP HANA Questions 01
24 pages
SAP HANA Guide Book
No ratings yet
SAP HANA Guide Book
110 pages
11 - Different Information Systems
No ratings yet
11 - Different Information Systems
4 pages
SAP HANA Cloud - Foundation - Unit 2
No ratings yet
SAP HANA Cloud - Foundation - Unit 2
11 pages
Things To Know ABAP Coding
No ratings yet
Things To Know ABAP Coding
17 pages
www_smartlogicacademy
No ratings yet
www_smartlogicacademy
5 pages
Sap Hana Q&A: Answer
No ratings yet
Sap Hana Q&A: Answer
2 pages
SAP Hanna - The Game Changer
100% (1)
SAP Hanna - The Game Changer
103 pages
Abap On Hana
No ratings yet
Abap On Hana
10 pages
p731 Sikka PDF
No ratings yet
p731 Sikka PDF
11 pages
Hana Basic Interview Questionary 1726556337
No ratings yet
Hana Basic Interview Questionary 1726556337
52 pages
SAP HANA Cloud Guide
No ratings yet
SAP HANA Cloud Guide
30 pages
Hana basic knowledge for beginners
No ratings yet
Hana basic knowledge for beginners
11 pages
Hana Interview Questions with answers
No ratings yet
Hana Interview Questions with answers
50 pages
Sap Hana
No ratings yet
Sap Hana
2 pages
Differences~Haritha
No ratings yet
Differences~Haritha
25 pages
SAP HANA Interview Questions You'll Most Likely Be Asked
From Everand
SAP HANA Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
ABAP On HANA
No ratings yet
ABAP On HANA
23 pages
Differences Between ECC and SAP S4 HANA With Frequently Asked Questions
80% (5)
Differences Between ECC and SAP S4 HANA With Frequently Asked Questions
24 pages
Advanced Apache Tez Techniques: Definitive Reference for Developers and Engineers
From Everand
Advanced Apache Tez Techniques: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Analysis of Large Web Sequences Using Aprioriall - Set Algorithm
No ratings yet
Analysis of Large Web Sequences Using Aprioriall - Set Algorithm
5 pages
Main Content
No ratings yet
Main Content
74 pages
CSIDL Values
No ratings yet
CSIDL Values
4 pages
jhv
No ratings yet
jhv
24 pages
Content
No ratings yet
Content
21 pages
Amol Patil - Android Developer
No ratings yet
Amol Patil - Android Developer
3 pages
How-To Install PowerDesigner 16.5 V1.2
No ratings yet
How-To Install PowerDesigner 16.5 V1.2
21 pages
KrishnaChaitanya Gattupalli 4.7yrsexp UI
No ratings yet
KrishnaChaitanya Gattupalli 4.7yrsexp UI
5 pages
Home Work Day 1
No ratings yet
Home Work Day 1
11 pages
Copy IDOC From One SAP System To Other SAP System
No ratings yet
Copy IDOC From One SAP System To Other SAP System
4 pages
Prophaze WAF - Native Cloud Security Platform k8s
No ratings yet
Prophaze WAF - Native Cloud Security Platform k8s
15 pages
Data Structures and Computer Algorithms
No ratings yet
Data Structures and Computer Algorithms
7 pages
Topic 1 Background of Google and Google Workspace
No ratings yet
Topic 1 Background of Google and Google Workspace
9 pages
Information Technology Risk Frameworks and Audits
No ratings yet
Information Technology Risk Frameworks and Audits
43 pages
Cisco Virtual WAAS: Cloud-Ready WAN Optimization Solution
No ratings yet
Cisco Virtual WAAS: Cloud-Ready WAN Optimization Solution
5 pages
A Practical Training Seminar Report
No ratings yet
A Practical Training Seminar Report
42 pages
Django ORM Basics
No ratings yet
Django ORM Basics
6 pages
Dupms (4 Files Merged)
No ratings yet
Dupms (4 Files Merged)
25 pages
Backend Challenge
No ratings yet
Backend Challenge
6 pages
Superior University Lahore: Faculty of Computer Science & IT
No ratings yet
Superior University Lahore: Faculty of Computer Science & IT
11 pages
05 - IaaS, PaaS, SaaS Case Study
No ratings yet
05 - IaaS, PaaS, SaaS Case Study
20 pages
Compute Sanitizer
No ratings yet
Compute Sanitizer
50 pages
How To Use The Emsisoft Decrypter For Jsworm 2.0
No ratings yet
How To Use The Emsisoft Decrypter For Jsworm 2.0
4 pages
Stack
No ratings yet
Stack
5 pages
Getting Started With Oracle Cloud Infrastructure
No ratings yet
Getting Started With Oracle Cloud Infrastructure
12 pages
DBMS PBL
No ratings yet
DBMS PBL
36 pages
Week 09 (Role of IT in SCM)
No ratings yet
Week 09 (Role of IT in SCM)
37 pages
CSS - 05-Week 14 - Module 14 - Security Firewall and VPN
No ratings yet
CSS - 05-Week 14 - Module 14 - Security Firewall and VPN
5 pages

SAP_HANA_distributed_in-memory_database_system_Transaction_session_and_metadata_management

Uploaded by

SAP_HANA_distributed_in-memory_database_system_Transaction_session_and_metadata_management

Uploaded by

SAP HANA Distributed In-Memory Database

System: Transaction, Session, and Metadata

Authorized licensed use limited©to:2013

E. Some Experimental Results

Fig. 5. Sample scenario showing benefits of client-side statement routing

The most general case of a distributed query execution

You might also like