High Performance With Distributed Caching Couchbase
High-Performance Applications with Distributed Caching
Get integrated caching from a complete NoSQL solution
Contents
EXECUTIVE SUMMARY
KEY REQUIREMENTS
COUCHBASE ALTERNATIVES
Limitations of Redis
Limitations of Memcached
W H I T E P AP ER 2
EXECUTIVE SUMMARY
For many web, mobile, and Internet of Things (IoT) applications, distributed
caching is a key requirement for improving performance and reducing cost. By
caching frequently accessed data—rather than making round trips to the backend
database—applications can deliver highly responsive experiences. And by reducing
workloads on backend resources and network calls to the backend, caching can
significantly lower capital and operating costs.
High performance is a given, because the primary goal of caching is to alleviate the
bottlenecks that come with traditional databases. This is not limited to relational
databases, however. NoSQL databases like MongoDB™ also have to make up for
their performance problems by recommending a third-party cache, such as Redis, to
service large numbers of requests in a timely manner.
Caching solutions must be easy to manage, but often are not. Routine tasks such as
adding a new node or resizing existing services need to be quick and easy
to configure. The best solutions provide command line, GUI, DBaaS
(database-as-a-service), and REST API options to help keep things manageable.
Elastic scalability refers not only to the ability to grow a cluster as needed, but also
refers to the ability to replicate across multiple data centers (cloud and/or on-prem).
Cross data center replication (XDCR) is a feature that is often missing or performs
poorly across many caching technologies. To achieve this scalability, several products
often have to be glued together, thereby decreasing manageability and greatly
increasing cost.
CACHING CAN BOOST APPLICATION PERFORMANCE AS WELL AS REDUCE COSTS.
Based on Couchbase’s experience with leading enterprises, the remainder
of this document:
• Explains the value of caching and describes common caching use cases
IMPORTANCE OF A CACHE IN ENTERPRISE ARCHITECTURES
Today’s web, mobile, and IoT applications often need to operate at large
scale: thousands to millions of users, terabytes (or even petabytes) of data,
submillisecond response times, multiple device types, and global reach. To meet
these requirements, modern applications are built to run on clusters in distributed
computing environments, either in enterprise data centers or on public clouds such
as Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP).
Caching is a technology to boost application performance as well as reduce costs.
By caching frequently used data in memory, rather than making database round
trips and incurring disk IO overhead, application response times can be dramatically
improved, typically by orders of magnitude.
In addition, caching can substantially lower capital and operating costs by reducing
workloads on backend systems and reducing network usage. In particular, if the
application runs on a relational database like Oracle (which requires high-end, costly
hardware in order to scale vertically), a distributed, horizontally scaling caching
solution that runs on low-cost commodity servers can reduce the need to buy and
manage expensive resources.
Caching vs Buffering
Caching and buffering are techniques that are often conflated. While many databases
make heavy use of memory for buffering, this does not mean they have a managed
cache. Buffering stores transitory data in memory temporarily while it’s being read
or written. Caching stores data in memory until it’s evicted.
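The round-trip savings described above can be illustrated with a minimal cache-aside sketch in Python. It is not Couchbase code: `SlowDatabase` and `CacheAside` are hypothetical names, and a plain dictionary stands in for both the backend database and the in-memory cache. The point is only to show how repeated reads stop reaching the backend once the data is cached.

```python
class SlowDatabase:
    """Stand-in for a backend database; counts how often it is actually queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def fetch(self, key):
        self.reads += 1                  # each call represents one backend round trip
        return self.rows.get(key)

    def store(self, key, value):
        self.rows[key] = value


class CacheAside:
    """Cache-aside: check memory first, fall back to the database on a miss."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.database.fetch(key)
        if value is not None:
            self.cache[key] = value      # populate so the next read is a hit
        return value

    def put(self, key, value):
        self.database.store(key, value)  # write to the backend first
        self.cache.pop(key, None)        # invalidate so readers do not see stale data


db = SlowDatabase({"sku-1": {"name": "kettle", "stock": 12}})
catalog = CacheAside(db)
for _ in range(1000):
    catalog.get("sku-1")
print(catalog.hits, catalog.misses, db.reads)   # 999 1 1
```

In this sketch, 1,000 reads of the same key cost a single backend read. Invalidating on write is one of several possible policies, chosen here only for simplicity.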
• Speeding up RDBMS – Many web and mobile applications need to access data
from a backend relational database management system (RDBMS)—for example,
inventory data for an online product catalog. However, relational systems struggle
with large scale, and can be easily overwhelmed by the volume of requests from
web and mobile applications, particularly as usage grows over time. Caching data
from the RDBMS in memory is a cost-effective technique to speed up the backend.
• Managing usage spikes – Web and mobile applications often experience spikes in
usage (for example, seasonal surges such as Black Friday and Cyber Monday, or peaks
during prime-time television). Caching can prevent the application from being
overwhelmed and can help avoid the need to add expensive backend resources.
• Mainframe offloading – Mainframes are still widely used in many industries,
including financial services, government, retail, airlines, and heavy manufacturing.
A cache is used to offload workloads from a backend mainframe, thereby reducing
MIPS costs (i.e., mainframe usage fees charged on a “millions of instructions per
second” basis), as well as enabling completely new services otherwise not possible
or cost prohibitive utilizing just the mainframe.
• Token caching – In this use case, tokens are cached in memory in order to deliver
high-performance user authentication and validation. eBay, for example, deploys
Couchbase Server to cache token data for its buyers and sellers (over 100 million
active users globally, who are served more than 2 billion page views a day).
• Web session store – Session data and web history are kept in memory: for
example, as inputs to a shopping cart, a real-time recommendation engine on an
e-commerce site, or player history in a game.
THE USE OF A CACHE SHOULD NOT PLACE UNDUE BURDEN ON THE OPERATIONS TEAM. IT
SHOULD BE REASONABLY QUICK TO DEPLOY AND EASY TO MONITOR AND MANAGE.
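The token caching and web session store use cases above both depend on entries that expire after a period of time. The following Python sketch shows one way such an in-memory store could behave. `TokenCache`, its method names, and the 30-second lifetime are assumptions made for illustration; they do not describe eBay's deployment or any Couchbase API.

```python
import time


class TokenCache:
    """In-memory token store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock             # injectable so expiry can be shown without sleeping
        self._entries = {}             # token -> (user_id, expires_at)

    def issue(self, token, user_id):
        self._entries[token] = (user_id, self.clock() + self.ttl_seconds)

    def validate(self, token):
        """Return the user id for a live token, or None if unknown or expired."""
        entry = self._entries.get(token)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[token]   # lazily evict expired tokens on access
            return None
        return user_id


# A fake clock keeps the example deterministic.
now = [0.0]
tokens = TokenCache(ttl_seconds=30, clock=lambda: now[0])
tokens.issue("tok-abc", "user-42")
print(tokens.validate("tok-abc"))      # user-42 (token is still live)
now[0] = 31.0
print(tokens.validate("tok-abc"))      # None (token has expired)
```

Validation here never touches a backend system, which is what makes the pattern suitable for high request volumes. A production store would also need replication and a bound on memory use.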
KEY REQUIREMENTS
Enterprises generally factor six key criteria into their evaluation. How you weigh
them depends on your specific situation.
DISTRIBUTED CACHING WITH COUCHBASE SERVER
Couchbase Server (and Couchbase Capella™ DBaaS) has become an attractive
alternative to caching tools like Redis and Memcached. It’s the only solution that fully ...
IN DESIGNING COUCHBASE SERVER, THE MEMCACHED ...
Architectural advantages
Couchbase Server was built for distributed caching with a focus on agility,
manageability, and scalability for mission-critical applications.
A BENCHMARK RUN ON GOOGLE CLOUD PLATFORM SHOWED 50 NODES OF COUCHBASE SERVER
SUSTAINED 1.1 MILLION OPERATIONS PER SECOND. TO DELIVER COMPARABLE PERFORMANCE,
APACHE CASSANDRA NEEDED 300 NODES.
PERFORM AT ANY SCALE
• Memory and network-centric: Couchbase’s memory-first architecture, with
integrated document cache, was designed to deliver high-throughput rates in
distributed computing environments while providing submillisecond latency and
resource efficiency. The network-centric architecture with a high-performance
replication backbone allows new workloads to be added while maintaining
performance at scale.
• Always-on, edge-to-cloud: Couchbase is designed to be fault tolerant and
highly resilient at any scale and on any platform (physical or virtual), delivering
always-on availability in case of hardware failures, network outages, or planned
maintenance windows.
• Workload isolation and optimization: Adding or removing nodes can be done
without any downtime or code changes. Couchbase’s Multi-Dimensional Scaling
(MDS) allows users to isolate their workloads while incrementally increasing access
to specific services on the cluster resources as needed.
• Full-featured SQL for JSON: Standard SQL has been extended for JSON querying
and analytics to allow developers to use familiar database skills with Couchbase.
• Versatile data access patterns: Couchbase’s set of data access methods includes
key-value lookup, SQL++ querying, full-text search, real-time analytics, and
server-side eventing, all available across cloud, mobile, and edge devices.
• No hassle scale out: Application code using Couchbase does not need to
change when a cluster grows in size—from development laptop to a multi-node
production deployment. No manual re-sharding or re-balancing is required by
any application, and cluster configuration information is all managed behind the
scenes by the topology-aware clients.
• Simplicity and ease of development: Couchbase is easy for developers to work with
through the officially supported SDKs that are available for all popular languages
(Java, .NET, Python, PHP, Node.js, Go, and C). Rich integration is available via
frameworks and components such as Spring Data, Apache Spark, LINQ, and more.
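The idea behind topology-aware clients in the "no hassle scale out" point can be sketched as follows. This is a simplified illustration, not Couchbase's actual partitioning algorithm: the partition count of 64, the CRC32 hash, the round-robin placement, and all names (`partition_for`, `build_partition_map`, `TopologyAwareClient`) are assumptions chosen to keep the example short. What it shows is that keys map to a fixed set of partitions, and only the partition-to-node map changes when the cluster grows.

```python
import zlib

NUM_PARTITIONS = 64   # illustrative value; fixed for the life of the dataset


def partition_for(key):
    """Hash a key to a partition; this never changes when nodes are added."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    """Spread partitions across nodes round-robin (a hypothetical placement policy)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


class TopologyAwareClient:
    """Routes each key using the current partition map, not a hard-coded node list."""

    def __init__(self, partition_map):
        self.partition_map = partition_map

    def update_topology(self, partition_map):
        # In a real system the cluster would push this map to the client.
        self.partition_map = partition_map

    def node_for(self, key):
        return self.partition_map[partition_for(key)]


client = TopologyAwareClient(build_partition_map(["node-a"]))
print(client.node_for("user::1001"))   # node-a: a single-node development cluster

# The cluster grows to three nodes; application code calls node_for() exactly as before.
client.update_topology(build_partition_map(["node-a", "node-b", "node-c"]))
print(client.node_for("user::1001"))
```

Because the application only ever calls `node_for`, moving from one node to three requires no change to application logic in this sketch.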
[Figure: architecture diagram. Labels: user requests; application tier; read-write requests; cache misses and write requests; replication; RDBMS.]
Caching and document performance benchmarking
Couchbase supports typical caching use cases as well as more challenging
document database scenarios; in both of these scenarios, it outperforms
the competition.
[Figure: benchmark chart of throughput (ops/sec) and latency (ms) for Couchbase, MongoDB, and DataStax, plotted against nodes x records.]
In addition to caching, there are other workloads in the benchmark that serve as
examples of how Couchbase solves other common scenarios such as serving as a
database for an enterprise source of truth or system of record solution.
SYSTEM OF RECORD
Operating as a system of record for enterprise data is another distinct role that
Couchbase can serve. In this case, Couchbase operates as both a cache and the
authoritative primary database for applications, providing the durability and stability
that is needed for any primary database application. This is the domain of traditional
relational databases but has become increasingly popular for NoSQL databases to
address, especially on cloud and web platforms.
To learn more about how well these types of queries perform on Couchbase, versus
other NoSQL products, see the charts, queries, and testing approaches used in
benchmark reports at couchbase.com/benchmarks.
Couchbase is a great fit for many caching scenarios. Many of the world’s leading
enterprises have deployed Couchbase Server for mission-critical applications,
including:
• LinkedIn – With over 300 million members, LinkedIn uses Couchbase to cache
over 8 million real-time metrics (over 12TB of data). Over 16 million entries are
loaded into Couchbase every 5 minutes.
• eBay – The world’s largest online auction marketplace uses Couchbase to cache
over 100 million authentication tokens per day to ensure session validity. eBay
achieves over 300,000 writes per second with Couchbase.
So why have these enterprises chosen Couchbase over the alternatives?
Many caching solutions are simple key-value stores with in-memory capabilities and
some ability to scale out. Couchbase is instead built from the ground up to deliver
elastic performance at scale—the very foundation of a superior caching tier.
Other features include SQL querying using the Couchbase NoSQL query language
(SQL++)—effectively letting you query JSON data without having to enforce a schema
or transform your data to behave a certain way just to get answers to queries.
Advanced real-time analytic queries are also possible—as well as full-text searches.
Many developer-centric features exist in Couchbase, including server-side event
processing, operation tracing, ACID transactions, scope/collection organization, and
automatic application failover between clusters.
These are all features that the most demanding teams require. The remainder of
this paper explores these concepts further and contrasts them with other solutions
within the overall context of caching solutions.
COUCHBASE ALTERNATIVES
Memcached and Redis are two examples of solutions that are part of the broader
landscape including both key-value databases and caching solutions. Many other
caching-related products exist, including GemFire, Hazelcast, and Oracle Coherence.
They attempt to solve similar problems, but do not necessarily aim to be a
comprehensive database solution to service caching and other use cases. This paper
focuses on Memcached and Redis; however, the architectural considerations apply
to all NoSQL databases and caching solutions.
REDIS
For businesses using MongoDB, Redis is often recommended as a caching add-on
to solve caching-related performance challenges. Redis is a popular data structure
server. It runs in-memory and has some snapshot persistence, but is not designed to
be a highly persistent database and has limitations around its partitioning model and
workload isolation.
MEMCACHED
At the other end of the spectrum, Memcached is a free, open source product that’s
used in thousands of web, mobile, and IoT applications around the world. It’s
simple to install and deploy, and it delivers reliable high performance. However,
Memcached has no enterprise support available, nor does it include a management
console for monitoring. Many companies that deploy Memcached find they want
additional capabilities not included in Memcached, such as automatic failover to
avoid downtime and automatic rebalance to avoid cold caches.
Couchbase has some shared lineage with Memcached and addresses many of its
limitations while also serving as a complete document database solution.
Couchbase Ephemeral Mode
Couchbase automatically persists to disk, enabling larger-than-memory data.
However, Couchbase also has a memory-only Ephemeral mode, for situations where you
do not ever want to invoke the overhead of disk access.
Limitations of Redis
Redis is a key-value data structure server that is popular for in-memory caching
solutions. Companies who employ Redis typically use it on top of other products
such as MongoDB or MySQL to improve performance. It solves other use cases
but is not generally recognized as a document database. Common concerns with
Redis include:
• Complexity – Redis data can be sharded across several nodes, but scripts and
command line utilities have to be run to redistribute data when adding/removing
nodes. It also runs in a primary/secondary architecture (historically known as
master-slave), where the secondaries are read-only. Couchbase uses a “masterless”
approach. Couchbase tasks such as rebalancing, adding nodes, and failing over nodes
can all be done automatically.
• Lacks built-in features – As Redis is optimized for key-value lookups, the concept
of querying is different than most database users expect. Ad hoc query and
indexing is not possible with the core product. If applications need a change to the
data model, then rehashing of data may be required. Couchbase provides an array
of built-in query and indexing services and allows them to run on different nodes,
providing powerful workload isolation.
• Persistence – While Redis has the ability to persist data, it is still primarily an
in-memory focused layer. The persistence capabilities are designed to back up
data and speed up the “cold” restart process, but this impacts performance as it
saves its snapshots to disk. It is not designed for real-time storage and swapping
of disk or in-memory datasets. Couchbase is a complete database solution, able to
efficiently load and persist data from/to disk as expected from a database.
• Memory limitations – Redis datasets must fit into memory. This makes it very
challenging for larger datasets, as operators must scale up the machine or scale out
the cluster of Redis nodes to shard the data across nodes. Since Redis requires all
data to be in memory, Redis does not efficiently support rotating through a hot
working set as requests shift over the course of a day. This requires more hardware
and increased licensing costs when data volumes start to exceed memory. In contrast,
Couchbase can load data that is larger than memory. Memory quotas can be set
to determine how much of the dataset is kept in RAM, with most used data being
read as needed into the cache for quick access.
Couchbase Memcached support
Memcached still appears as an option in Couchbase, for purposes of backwards
compatibility. However, it is deprecated and not recommended for any new
development.
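The memory-quota behavior described in the memory limitations point can be sketched with a small least-recently-used cache in front of a larger store. This is an illustration under stated assumptions, not Couchbase's eviction implementation: `QuotaCache` is a hypothetical class, the quota is counted in items rather than bytes, and a dictionary stands in for disk.

```python
from collections import OrderedDict


class QuotaCache:
    """Keeps at most `quota` items in memory; the full dataset lives in `disk`."""

    def __init__(self, quota, disk):
        self.quota = quota
        self.disk = disk                      # dict standing in for persistent storage
        self.memory = OrderedDict()           # ordered from least to most recently used

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)      # mark as most recently used
            return self.memory[key]
        value = self.disk[key]                # miss: read from "disk" (KeyError if absent)
        self._admit(key, value)
        return value

    def set(self, key, value):
        self.disk[key] = value                # persist first, then cache
        self._admit(key, value)

    def _admit(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.quota:
            self.memory.popitem(last=False)   # evict least recently used; disk copy remains


disk = {f"doc-{i}": i for i in range(10)}     # a dataset of 10 items
hot = QuotaCache(quota=3, disk=disk)          # only 3 items fit in memory
for key in ["doc-1", "doc-2", "doc-3", "doc-1", "doc-7"]:
    hot.get(key)
print(list(hot.memory))                       # ['doc-3', 'doc-1', 'doc-7']
```

The dataset is larger than the quota, yet every key stays readable. Only the recently used items occupy memory, so the working set can shift over time without adding RAM.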
Limitations of Memcached
Memcached is a simple, open source cache used by many companies, including
YouTube, Reddit, Craigslist, Facebook, Twitter, Tumblr, and Wikipedia. It’s an in-
memory, key-value store for small chunks of arbitrary data (strings, objects) from
results of database calls, API calls, or page rendering.
• Low initial cost – Licensed under the Revised BSD license, Memcached is free
open source software.
Memcached does not include advanced features that many enterprises require, such
as automatic failover, load rebalancing to add capacity without downtime, and cross
data center replication.
With its many integrated features, including a built-in managed cache, disk
persistence, high availability, geographic replication, structured query language,
real-time analytics, full-text search, eventing, and mobile synchronization, Couchbase
consolidates multiple layers into a single platform that otherwise would require
separate solutions to work together.
How and where you deploy Couchbase is entirely up to you. Some use Couchbase
just as a cache or just as a system of record. Others start with Couchbase as a
cache and eventually evolve it to become a source of truth and system of record.
Regardless of your strategy, Couchbase gives you the flexibility to choose any starting
point and easily evolve over time.
• Customer 360
• Content management
• Operational dashboarding
• Session store
• Shopping cart
Modern customer experiences need a flexible database platform that can
power applications spanning from cloud to edge and everything in between.
Couchbase’s mission is to simplify how developers and architects develop,
deploy and consume modern applications wherever they are. We have
reimagined the database with our fast, flexible and affordable cloud database
platform Capella, allowing organizations to quickly build applications that
deliver premium experiences to their customers—all with best-in-class price
performance. More than 30% of the Fortune 100 trust Couchbase to power
their modern applications.