
A MongoDB White Paper

MongoDB Operations Best Practices


June 2018
Table of Contents

Introduction
Preparing for a MongoDB Deployment
Continuous Availability
Scaling a MongoDB System
Managing MongoDB
Security
MongoDB Atlas: Database as a Service For MongoDB
MongoDB Stitch: Backend as a Service
Conclusion
We Can Help
Resources
Introduction

MongoDB is designed to meet the demands of modern apps with a technology foundation that enables you through:

1. The document data model – presenting you the best way to work with data.

2. A distributed systems design – allowing you to intelligently put data where you want it.

3. A unified experience that gives you the freedom to run anywhere – allowing you to future-proof your work and eliminate vendor lock-in.

While some aspects of MongoDB are different from traditional relational databases, the concepts of the system, its operations, policies, and procedures will be familiar to staff who have deployed and operated other database systems. Organizations have found that DBAs and operations teams are able to preserve existing investments by integrating MongoDB into their production environments, without needing to customize established operational processes or tools.

This paper provides guidance on best practices for deploying and managing MongoDB. It assumes familiarity with the architecture of MongoDB and an understanding of concepts related to the deployment of enterprise software.

This guide is aimed at users managing the database themselves. A dedicated guide is provided for users of the MongoDB database as a service – MongoDB Atlas Best Practices. MongoDB Atlas is the best way to run MongoDB in the cloud.

While this guide is broad in scope, it is not exhaustive. You should refer to the MongoDB documentation, starting with the Production Notes, which detail system configurations that affect MongoDB. Also consider the no-cost, online training classes offered by MongoDB University. In addition, MongoDB offers a range of consulting services to work with you at every stage of your application lifecycle.
Preparing for a MongoDB Deployment

MongoDB Pluggable Storage Engines

MongoDB exposes a storage engine API, enabling the integration of pluggable storage engines that extend MongoDB with new capabilities and enable optimal use of specific hardware architectures. MongoDB ships with multiple supported storage engines:

• The default WiredTiger storage engine. For most applications, WiredTiger's granular concurrency control and native compression will provide the best all-around performance and storage efficiency for the broadest range of applications.

• The Encrypted storage engine, protecting highly sensitive data without the performance or management overhead of separate file system encryption. The Encrypted storage engine is based upon WiredTiger, so throughout this document statements regarding WiredTiger also apply to the Encrypted storage engine. This engine is part of MongoDB Enterprise Advanced.

• The In-Memory storage engine, delivering predictable latency coupled with real-time analytics for the most demanding applications. This engine is part of MongoDB Enterprise Advanced.

• The MMAPv1 storage engine, which is provided for backwards compatibility only. This engine is deprecated with the MongoDB 4.0 release.

MongoDB uniquely allows users to mix and match multiple storage engines within a single MongoDB cluster. This flexibility provides a simpler and more reliable approach to meeting diverse application needs for data. Traditionally, multiple database technologies would need to be managed to meet these needs, with complex, custom integration code to move data between the technologies and to ensure consistent, secure access. While each storage engine is optimized for different workloads, users still leverage the same MongoDB query language, data model, scaling, security, and operational tooling independent of the engine they use. As a result, most best practices in this guide apply to all of the supported storage engines. Any differences in recommendations between the storage engines are noted.

WiredTiger is the default storage engine for new MongoDB deployments from MongoDB 3.2; if another engine is preferred then start mongod using the --storageEngine option. If a 3.2+ mongod process is started and one or more databases already exist, then it will use whichever storage engine those databases were created with.

Schema Design

Developers and data architects should work together to develop the right data model, and they should invest time in this exercise early in the project. The requirements of the application should drive the data model, updates, and queries of your MongoDB system. Given MongoDB's dynamic schema, developers and data architects can continue to iterate on the data model throughout the development and deployment processes to optimize performance and storage efficiency, as well as support the addition of new application features. All of this can be done without expensive schema migrations.

The topic of schema design is significant, and a full discussion is beyond the scope of this guide. For more information, please see Data Modeling Considerations for MongoDB in the MongoDB Documentation. A number of additional resources are available online, including conference presentations from MongoDB Solutions Architects and users, as well as the no-cost, web-based training provided by MongoDB University. MongoDB Global Consulting Services offers assistance in schema design as part of the Development Rapid Start service.

The key schema design concepts to keep in mind are as follows.

Document Model

MongoDB stores data as documents in a binary representation called BSON. The BSON encoding extends the popular JSON representation to include additional types such as int, long, decimal, and date. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays,
sub-documents, and binary data. It may be helpful to think of documents as roughly equivalent to rows in a relational database, and fields as roughly equivalent to columns. However, MongoDB documents tend to have all related data for a given object in a single document, whereas in a relational database that data is usually normalized across rows in many tables. For example, data that belongs to parent-child relationships in multiple RDBMS tables can frequently be collapsed (embedded) into a single document in MongoDB. For operational applications, the document model makes JOINs redundant in many cases.

Collections

Collections are groupings of documents. Typically all documents in a collection have similar or related purposes for an application. It may be helpful to think of collections as being analogous to tables in a relational database.

Dynamic Schema & Schema Validation

MongoDB documents can vary in structure. For example, documents that describe users might all contain the user id and the last date they logged into the system, but only some of these documents might contain the user's shipping address, and perhaps some of those contain multiple shipping addresses. MongoDB does not require that all documents conform to the same structure. Furthermore, there is no need to declare the structure of documents to the system – documents are self-describing.

While MongoDB's flexible schema is a powerful feature, there are situations where strict guarantees on the schema's data structure and content are required. Unlike NoSQL databases that push enforcement of these controls back into application code, MongoDB provides schema validation within the database via syntax derived from the proposed IETF JSON Schema standard.

Using Schema Validation, DevOps and DBA teams can define a prescribed document structure for each collection, with the database rejecting any documents that do not conform to it. Administrators have the flexibility to tune schema validation according to use case – for example, if a document fails to comply with the defined structure, it can either be rejected or written to the collection while logging a warning message. Structure can be imposed on just a subset of fields – for example, requiring a valid customer name and address, while other fields can be freeform.

With schema validation, DBAs can apply data governance standards to their schema, while developers maintain the benefits of a flexible document model.

As an example, you can add a JSON Schema to enforce these rules:

• Each document must contain a field named lineItems
• The document may optionally contain other fields
• lineItems must be an array where each element:
  ◦ Must contain a title (string) and a price (number no smaller than 0)
  ◦ May optionally contain a boolean named purchased
  ◦ Must contain no further fields

    db.createCollection("orders", {
      validator: {
        $jsonSchema: {
          properties: {
            lineItems: {
              type: "array",
              items: {
                properties: {
                  title: { type: "string" },
                  price: { type: "number", minimum: 0.0 },
                  purchased: { type: "boolean" }
                },
                required: ["title", "price"],
                additionalProperties: false
              }
            }
          },
          required: ["lineItems"]
        }
      }
    })

Indexes

MongoDB uses B-tree indexes to optimize queries. Indexes are defined on a collection's document fields. MongoDB includes support for many indexes, including compound, geospatial, TTL, text search, sparse, partial, unique, and others. For more information see the section on indexing below.
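The lineItems validation rules defined above can also be sanity-checked offline before being deployed. The following is an illustrative sketch in plain JavaScript, runnable without a database; the validateOrder helper is our own invention, not a MongoDB API – it simply mirrors what the server-side $jsonSchema validator enforces:

```javascript
// Offline sketch of the lineItems rules above (our own helper, not MongoDB's validator).
function validateOrder(doc) {
  if (!Array.isArray(doc.lineItems)) return false;            // lineItems is required and must be an array
  return doc.lineItems.every(function (item) {
    if (typeof item.title !== "string") return false;         // title: required string
    if (typeof item.price !== "number" || item.price < 0) return false; // price: number >= 0
    if ("purchased" in item && typeof item.purchased !== "boolean") return false; // optional boolean
    var allowed = ["title", "price", "purchased"];
    return Object.keys(item).every(function (k) {             // no further fields permitted
      return allowed.indexOf(k) !== -1;
    });
  });
}

console.log(validateOrder({ lineItems: [{ title: "pen", price: 1.5, purchased: false }] })); // true
console.log(validateOrder({ lineItems: [{ title: "pen", price: -1 }] }));                    // false: negative price
console.log(validateOrder({ customer: "ada" }));                                             // false: lineItems missing
```

With the validator attached to the collection, documents failing these rules are rejected by MongoDB itself; the sketch is only a convenient way to reason about the rules.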
Transactions

Because documents can bring together related data that would otherwise be modelled across separate parent-child tables in a tabular schema, MongoDB's atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. MongoDB's existing document atomicity guarantees will meet 80-90% of an application's transactional needs. They remain the recommended way of enforcing your app's data integrity requirements.

MongoDB 4.0 adds support for multi-document ACID transactions, making it even easier for developers to address more use cases with MongoDB. They feel just like the transactions developers are familiar with from relational databases – multi-statement, similar syntax, and easy to add to any application. Through snapshot isolation, transactions provide a consistent view of data, enforce all-or-nothing execution, and do not impact performance for workloads that do not require them. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long-running transactions, or attempting to perform an excessive number of operations in a single ACID transaction, can result in high pressure on WiredTiger's cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction always uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.

2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches.

3. In MongoDB 4.0, a transaction is represented in a single oplog entry, and therefore must be within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.

4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.

You can review all best practices in the MongoDB documentation for multi-document transactions.

Visualizing your Schema and Adding Validation Rules: MongoDB Compass

The MongoDB Compass GUI allows users to understand the structure of existing data in the database and perform ad hoc queries against it – all with zero knowledge of MongoDB's query language. Typical users could include architects building a new MongoDB project or a DBA who
has inherited a database from an engineering team, and who must now maintain it in production. You need to understand what kind of data is present, define what indexes might be appropriate, and identify if Document Validation rules should be added to enforce a consistent document structure.

Figure 1: View schema & interactively build and execute database queries with MongoDB Compass

Without MongoDB Compass, users wishing to understand the shape of their data would have to connect to the MongoDB shell and write queries to reverse engineer the document structure, field names, and data types. Similarly, anyone wanting to run custom queries on the data would need to understand MongoDB's query language.

MongoDB Compass provides users with a graphical view of their MongoDB schema by sampling a subset of documents from a collection. By using sampling, MongoDB Compass minimizes database overhead and can present results to the user almost instantly.

Document Size

The maximum BSON document size in MongoDB is 16 MB. Users should avoid certain application patterns that would allow documents to grow unbounded. For example, in an e-commerce application it would be difficult to estimate how many reviews each product might receive from customers. Furthermore, it is typically the case that only a subset of reviews is displayed to a user, such as the most popular or the most recent reviews. Rather than modeling the product and customer reviews as a single document, it would be better to model each review or groups of reviews as a separate document with a reference to the product document, while also storing the key reviews in the product document for fast access.

GridFS

For files larger than 16 MB, MongoDB provides a convention called GridFS, which is implemented by all MongoDB drivers. GridFS automatically divides large data into 256 KB pieces called chunks and maintains the metadata for all chunks. GridFS allows for retrieval of individual chunks as well as entire documents. For example, an application could quickly jump to a specific timestamp in a video. GridFS is frequently used to store large binary files such as images and videos in MongoDB.

Data Lifecycle Management

MongoDB provides features to facilitate the management of data lifecycles, including Time to Live indexes and capped collections. In addition, by using MongoDB Zones, administrators can build highly efficient tiered storage models to support the data lifecycle. By assigning shards to Zones, administrators can balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices:

• Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression enabled.

• Older, less frequently accessed data is tagged to lower-throughput hard disk drives where it is compressed with zlib to attain maximum storage density with a lower cost-per-bit.

• As data ages, MongoDB automatically migrates it between storage tiers, without administrators having to build tools or ETL processes to manage data movement.

You can learn more about sharding using Zones later in this guide.

Time to Live (TTL)

If documents in a collection should only persist for a pre-defined period of time, the TTL feature can be used to
automatically delete documents of a certain age, rather than scheduling a process to check the age of all documents and run a series of deletes. For example, if user sessions should only exist for one hour, the TTL can be set to 3600 seconds for a date field called lastActivity that exists in documents used to track user sessions and their last interaction with the system. A background thread will automatically check all these documents and delete those that have been idle for more than 3600 seconds. Another example use case for TTL is a price quote that should automatically expire after a period of time.

Capped Collections

In some cases a rolling window of data should be maintained in the system based on data size. Capped collections are fixed-size collections that support high-throughput inserts and reads based on insertion order. A capped collection behaves like a circular buffer: data is inserted into the collection, that insertion order is preserved, and when the total size reaches the threshold of the capped collection, the oldest documents are deleted to make room for the newest documents. For example, store log information from a high-volume system in a capped collection to quickly retrieve the most recent log entries.

Dropping a Collection

It is very efficient to drop a collection in MongoDB. If your data lifecycle management requires periodically deleting large volumes of documents, it may be best to model those documents as a single collection. Dropping a collection is much more efficient than removing all documents or a large subset of a collection, just as dropping a table is more efficient than deleting all the rows in a table in a relational database. WiredTiger automatically reclaims disk space after a collection is dropped.

Indexing

Like most database management systems, indexes are a crucial mechanism for optimizing system performance in MongoDB. While indexes will improve the performance of some operations by one or more orders of magnitude, they incur overhead to writes, disk space, and memory usage.

Users should always create indexes to support queries, but should not maintain indexes that queries do not use. This is particularly important for deployments that support insert-heavy (or writes which modify indexed values) workloads.

For operational simplicity, the Performance Advisor in the MongoDB Ops Manager and Cloud Manager platforms can identify missing indexes, enabling the administrator to then automate the process of rolling them out – while avoiding any application impact. Ops Manager and Cloud Manager are discussed later in this guide.

To understand the effectiveness of the existing indexes being used, an $indexStats aggregation stage can be used to determine how frequently each index is used. MongoDB Compass visualizes index coverage, enabling you to determine which specific fields are indexed, their type, size, and how often they are used.

Query Optimization

Queries are automatically optimized by MongoDB to make evaluation of the query as efficient as possible. Evaluation normally includes the selection of data based on predicates, and the sorting of data based on the sort criteria provided. The query optimizer selects the best indexes to use by periodically running alternate query plans and selecting the index with the best performance for each query type. The results of this empirical test are stored as a cached query plan and periodically updated.

MongoDB provides an explain plan capability that shows information about how a query will be, or was, resolved, including:

• The number of documents returned
• The number of documents read
• Which indexes were used
• Whether the query was covered, meaning no documents needed to be read to return results
• Whether an in-memory sort was performed, which indicates an index would be beneficial
• The number of index entries scanned
• How long the query took to resolve in milliseconds (when using the executionStats mode)
• Which alternative query plans were rejected (when using the allPlansExecution mode)

The explain plan will show 0 milliseconds if the query was resolved in less than 1 ms, which is typical in well-tuned systems. When the explain plan is called, prior cached query plans are abandoned, and the process of testing multiple indexes is repeated to ensure the best possible plan is used. The query plan can be calculated and returned without first having to run the query. This enables DBAs to review which plan will be used to execute the query, without having to wait for the query to run to completion.

MongoDB Compass provides the ability to visualize explain plans, presenting key information on how a query performed – for example the number of documents returned, execution time, index usage, and more. Each stage of the execution pipeline is represented as a node in a tree, making it simple to view explain plans from queries distributed across multiple nodes.

Figure 2: MongoDB Compass visual query plan for performance optimization across distributed clusters

If the application will always use indexes, MongoDB can be configured through the notablescan setting to throw an error if a query is issued that requires scanning the entire collection.

Profiling

MongoDB provides a profiling capability called Database Profiler, which logs fine-grained information about database operations. The profiler can be enabled to log information for all events, or only those events whose duration exceeds a configurable threshold (whose default is 100 ms). Profiling data is stored in a capped collection where it can easily be searched for relevant events. It may be easier to query this collection than parsing the log files.

MongoDB Ops Manager and Cloud Manager can be used to visualize output from the profiler when identifying slow queries. The Visual Query Profiler provides a quick and convenient way for operations teams and DBAs to analyze specific queries or query families. The Visual Query Profiler (as shown in Figure 3) displays how query and write latency varies over time – making it simple to identify slower queries with common access patterns and characteristics, as well as identify any latency spikes. A single click in the Ops Manager UI activates the profiler, which then consolidates and displays metrics from every node in a single screen.

The Visual Query Profiler will analyze the data – recommending additional indexes and optionally adding them through an automated, rolling index build.

Figure 3: Visual Query Profiling in MongoDB Ops Manager

As noted above, the Performance Advisor can automatically notify you of missing indexes.

Primary and Secondary Indexes

A unique index on the _id attribute is created for all documents. MongoDB will automatically create the _id field and assign a unique value if the value is not specified when the document is inserted. All user-defined indexes are secondary indexes. MongoDB includes support for many types of secondary indexes that can be declared on any field(s) in the document, including fields within arrays and sub-documents. Index options include:

• Compound indexes
• Geospatial indexes
• Text search indexes
• Unique indexes
• Array indexes
• TTL indexes
• Sparse indexes
• Partial indexes
• Hash indexes
• Collated indexes for different languages

You can learn more about each of these indexes from the MongoDB Architecture Guide.

Index Creation Options

Indexes and data are updated synchronously in MongoDB, thus ensuring queries on indexes never return stale or deleted data. The appropriate indexes should be determined as part of the schema design process, and can be added or removed at any time. By default, creating an index is a blocking operation in MongoDB. Because the creation of indexes can be time and resource intensive, MongoDB provides an option for creating new indexes as a background operation on both the primary and secondary members of a replica set. When the background option is enabled, the total time to create an index will be greater than if the index was created in the foreground, but it will still be possible to query the database while creating indexes.

A common practice is to build the indexes in the foreground, first on the secondaries and then on the demoted primary. Ops Manager and Cloud Manager automate this process.

In addition, multiple indexes can be built concurrently in the background. Refer to the Build Index on Replica Sets documentation to learn more about considerations for index creation and on-going maintenance.

Managing Indexes with the MongoDB WiredTiger Storage Engine

The WiredTiger storage engine offers optimizations that you can take advantage of:

• By default, WiredTiger uses prefix compression to reduce index footprint on both persistent storage and in RAM. This enables administrators to dedicate more of the working set to manage frequently accessed documents. Compression ratios of around 50% are typical, but users are encouraged to evaluate the actual ratio they can expect by testing their own workloads.

• Administrators can place indexes on their own separate storage volume, allowing for faster disk paging and lower contention.

Index Limitations

As with any database, indexes consume disk space and memory, so they should only be used as necessary. Indexes can impact update performance. An update must first locate the data to change, so an index will help in this regard, but index maintenance itself has overhead and this work will reduce update performance.

There are several index limitations that should be observed when deploying MongoDB:

• A collection cannot have more than 64 indexes.
• Index entries cannot exceed 1024 bytes.
• The name of an index must not exceed 125 characters (including its namespace).
• In-memory sorting of data without an index is limited to 32MB. This operation is very CPU intensive, and in-memory sorts indicate an index should be created to optimize these queries.

Common Mistakes Regarding Indexes

The following tips may help to avoid some common mistakes regarding indexes:

• Use a compound index rather than index intersection: For best performance when querying via multiple predicates, compound indexes will generally be a better option.

• Compound indexes: Compound indexes are defined and ordered by field. So, if a compound index is defined for last name, first name, and city, queries that specify last name, or last name and first name, will be able to use this index, but queries that try to
search based on city will not be able to benefit from through its internal cache but it also benefits from pages
this index. Remove indexes that are prefixes of other held in the filesystem cache.
indexes.
The set of data and indexes that are accessed during
• Low selectivity indexes
indexes: An index should radically normal operations is called the working set. It is best
reduce the set of possible documents to select from. practice that the working set fits in RAM. It may be the
For example, an index on a field that indicates gender is case the working set represents a fraction of the entire
not as beneficial as an index on zip code, or even better, database, such as in applications where data related to
phone number. recent events or popular products is accessed most
• Regular expr
expressions
essions: Indexes are ordered by value, commonly.
hence leading wildcards are inefficient and may result in
Page faults occur when MongoDB attempts to access data
full index scans. Trailing wildcards can be efficient if
that has not been loaded in RAM. If there is free memory
there are sufficient case-sensitive leading characters in
then the operating system can locate the page on disk and
the expression.
load it into memory directly. However, if there is no free
• Negation
Negation: Inequality queries can be inefficient with memory, the operating system must write a page that is in
respect to indexes. Like most database systems, memory to disk, and then read the requested page into
MongoDB does not index the absence of values and negation conditions may require scanning all documents. If negation is the only condition and it is not selective (for example, querying an orders table, where 99% of the orders are complete, to identify those that have not been fulfilled), all records will need to be scanned.

• Eliminate unnecessary indexes: Indexes are resource-intensive: they consume RAM, and as fields are updated their associated indexes must be maintained, incurring additional disk I/O overhead. To understand the effectiveness of existing indexes use the strategies described earlier.

• Partial indexes: If only a subset of documents need to be included in a given index then the index can be made partial by specifying a filter expression. For example, if an index on the userID field is only needed for querying open orders then it can be made conditional on the order status being set to in progress. In this way, partial indexes improve query performance while minimizing overheads.

Working Sets

MongoDB makes extensive use of RAM to speed up database operations. In MongoDB, all data is read and manipulated through in-memory representations of the data. The WiredTiger storage engine manages data, loading it into memory when it is required by the application. This process can be time consuming and will be significantly slower than accessing data that is already resident in memory.

Some operations may inadvertently purge a large percentage of the working set from memory, which adversely affects performance. For example, a query that scans all documents in the database, where the database is larger than available RAM on the server, will cause documents to be read into memory and may lead to portions of the working set being written out to disk. Other examples include various maintenance operations such as compacting or repairing a database, and rebuilding indexes.

If your database working set size exceeds the available RAM of your system, consider increasing RAM capacity or sharding the database across additional servers. For a discussion on this topic, refer to the section on Sharding Best Practices. It is far easier to implement sharding before the system's resources are consumed, so capacity planning is an important element in successful project delivery.

Refer to the documentation for configuring the WiredTiger internal cache size.
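The WiredTiger internal cache size mentioned above defaults, per this guide, to 60% of RAM minus 1 GB (see the Memory section). A rough helper for capacity planning might look like this; the small floor for tiny hosts is an assumption for illustration, not a documented value:

```python
def default_wiredtiger_cache_gb(total_ram_gb):
    """Approximate the default WiredTiger internal cache size
    described in this guide: 60% of RAM minus 1 GB, with an
    assumed small floor so very small hosts still get a cache."""
    return max(0.6 * total_ram_gb - 1.0, 0.25)

# A 16 GB server would default to roughly 8.6 GB of cache,
# leaving the remainder to the OS filesystem cache and other processes.
print(default_wiredtiger_cache_gb(16))
```

Comparing this estimate against the actual working set size is a quick first check of whether a host has enough RAM.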
MongoDB Setup and Configuration

Setup

MongoDB provides repositories for .deb and .rpm packages for consistent setup, upgrade, system integration, and configuration. This software uses the same binaries as the tarball packages provided from the MongoDB Downloads Page. The MongoDB Windows package is available via the downloadable binary installed via its MSI. Binaries for OS X are also provided in a tarball1.

Database Configuration

Users should store configuration options in mongod's configuration file. This allows sysadmins to implement consistent configurations across entire clusters. The configuration files support all options provided as command line options for mongod. Popular tools such as Ansible, Chef, and Puppet can be used to provision MongoDB instances. The provisioning of complex topologies comprising replica sets and sharded clusters can be automated by the Ops Manager and Cloud Manager platforms, which are discussed later in this guide.

Data Migration

Users should assess how best to model their data for their applications rather than simply importing the flat file exports of their legacy systems. In a traditional relational database environment, data tends to be moved between systems using delimited flat files such as CSV. While it is possible to ingest data into MongoDB from CSV files, this may in fact only be the first step in a data migration process. It is typically the case that MongoDB's document data model provides advantages and alternatives that do not exist in a relational data model.

The mongoimport and mongoexport tools are provided with MongoDB for simple loading or exporting of data in JSON or CSV format. These tools may be useful in moving data between systems as an initial step. Other tools such as mongodump and mongorestore, or Ops Manager and Cloud Manager backups are useful for moving data between different MongoDB systems.

There are many options to migrate data from flat files into rich JSON documents, including mongoimport, custom scripts, ETL tools, and from within an application itself which can read from the existing RDBMS and then write a JSON version of the document back to MongoDB.
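As a sketch of the "custom scripts" option above, the following converts delimited flat-file rows into richer documents before loading them. The column names and the grouping of address fields into a subdocument are illustrative assumptions, not part of any particular schema:

```python
import csv
import io
import json

def rows_to_documents(csv_text):
    """Turn flat CSV rows into nested documents -- e.g. grouping
    address columns into a subdocument -- rather than importing
    the flat export with one top-level field per column."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        docs.append({
            "name": row["name"],
            "address": {                  # remodeled, not flat
                "city": row["city"],
                "country": row["country"],
            },
        })
    return docs

csv_text = "name,city,country\nAda,London,UK\n"
print(json.dumps(rows_to_documents(csv_text)))
# Each document could then be inserted with a driver, or written
# out as JSON for loading with mongoimport.
```

The same reshaping could equally be done in an ETL tool or inside the application itself, as the paragraph above notes.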
Upgrades

Users should upgrade software as often as possible so that they can take advantage of the latest features as well as any stability updates or bug fixes. Upgrades should be tested in non-production environments to validate correct application behavior.

Customers can deploy rolling upgrades without incurring any downtime, as each member of a replica set can be upgraded individually without impacting database availability. It is possible for each member of a replica set to run under different versions of MongoDB, and with different storage engines. As a precaution, the MongoDB release notes should be consulted to determine if there is a particular order of upgrade steps that needs to be followed, and whether there are any incompatibilities between two specific versions. Upgrades can be automated with Ops Manager and Cloud Manager.

Hardware

The following recommendations are only intended to provide high-level guidance for hardware for a MongoDB deployment. The specific configuration of your hardware will be dependent on your data, queries, performance SLA, availability requirements, and the capabilities of the underlying hardware infrastructure. MongoDB has extensive experience helping customers to select hardware and tune their configurations and we frequently work with customers to plan for and optimize their MongoDB systems. The Health Check, Operations Rapid Start, and Production Readiness consulting packages can be especially valuable in helping select the appropriate hardware for your project.

MongoDB was specifically designed with commodity hardware in mind and has few hardware requirements or limitations. Generally speaking, MongoDB will take advantage of more RAM and faster CPU clock speeds.

1. OS X is intended as a development rather than a production environment
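As described in the Database Configuration subsection above, options belong in mongod's configuration file so that they can be applied consistently across a cluster. A minimal illustrative sketch follows; the paths, bind address, and replica set name are assumptions for illustration, not recommendations:

```yaml
# /etc/mongod.conf -- illustrative sketch only
storage:
  dbPath: /var/lib/mongodb            # assumed data directory
  journal:
    enabled: true
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log   # assumed log location
net:
  bindIp: 127.0.0.1                   # localhost / internal interface only
  port: 27017
replication:
  replSetName: rs0                    # assumed replica set name
```

A file like this can be templated by Ansible, Chef, or Puppet so every node in a cluster is provisioned identically.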
Memory

MongoDB makes extensive use of RAM to increase performance. Ideally, the working set fits in RAM. As a general rule of thumb, the more RAM, the better. As workloads begin to access data that is not in RAM, the performance of MongoDB will degrade, as it will for any database. The default WiredTiger storage engine gives more control of memory by allowing users to configure how much RAM to allocate to the WiredTiger internal cache – defaulting to 60% of RAM minus 1 GB. WiredTiger also exploits the operating system's filesystem cache which will grow to utilize the remaining memory available.

Storage

MongoDB does not require shared storage (e.g., storage area networks). MongoDB can use local attached storage as well as solid state drives (SSDs). Most disk access patterns in MongoDB do not have sequential properties, and as a result, customers may experience substantial performance gains by using SSDs. Good results and strong price to performance have been observed with SATA SSD and with PCIe. Commodity SATA spinning drives are comparable to higher cost spinning drives due to the non-sequential access patterns of MongoDB: rather than spending more on expensive spinning drives, that budget may be more effectively invested on increasing RAM or using SSDs. Another benefit of using SSDs is the performance benefit of flash over hard disk drives if the working set no longer fits in memory.

While data files benefit from SSDs, MongoDB's journal files are good candidates for fast, conventional disks due to their high sequential write profile. See the section on journaling later in this guide for more information.

Most MongoDB deployments should use RAID-10. RAID-5 and RAID-6 have limitations and may not provide sufficient performance. RAID-0 provides good read and write performance, but insufficient fault tolerance. MongoDB's replica sets allow deployments to provide stronger availability for data, and should be considered with RAID and other factors to meet the desired availability SLA.

If using Amazon EC2 then select the required IOPS rate using the Provisioned-IOPS option when configuring storage to provide consistent storage performance. As with networking, use paravirtualized drivers for your storage when running on VMs.

Compression

MongoDB natively supports compression when using the default WiredTiger storage engine. Compression reduces storage footprint by as much as 80%, and enables higher storage I/O scalability as fewer bits are read from disk. As with any compression algorithm, administrators trade storage efficiency for CPU overhead, and so it is important to test the impacts of compression in your own environment.

MongoDB offers administrators a range of compression options for documents, indexes, and the journal. The default Snappy compression algorithm provides a good balance between high document and journal compression ratio (typically around 70%, dependent on the data) with low CPU overhead, while the optional zlib library will achieve higher compression, but incur additional CPU cycles as data is written to and read from disk. Indexes in WiredTiger use prefix compression, which serves to reduce the in-memory footprint of index storage, freeing up more of the working set for frequently accessed documents. Administrators can modify the default compression settings for all collections. Compression can also be specified on a per-collection basis during collection creation.

CPU

MongoDB will deliver better performance on faster CPUs. The MongoDB WiredTiger storage engine is able to saturate multi-core processor resources. The Encrypted Storage engine adds an average of 10% overhead compared to WiredTiger due to a portion of available CPU being used for encryption/decryption – the actual impact will be dependent on your data set and workload.

Process Per Host

For best performance, users should run one mongod process per host. With appropriate sizing and resource allocation using virtualization or container technologies, multiple MongoDB processes can run on a single server without contending for resources. Using the WiredTiger
storage engine, administrators will need to calculate the appropriate cache size for each instance by evaluating what portion of total RAM each of them should use, and splitting the default cache_size between each.

For availability, multiple members of the same replica set should never be co-located on the same physical hardware or share any single point of failure such as a power supply.

When running in the cloud, make use of your provider's ability to deploy across availability zones to ensure that members from each replica set are geographically dispersed and do not share the same power, hypervisor or network. The MongoDB Atlas database service will take care of all of this for you.

Sizing for mongos and Config Server Processes

For sharded systems, additional processes must be deployed alongside the mongod data storing processes: mongos query routers and config servers. Shards are physical partitions of data spread across multiple servers. For more on sharding, please see the section on horizontal scaling with shards. Queries are routed to the appropriate shards using a query router process called mongos. The metadata used by mongos to determine where to route a query is maintained by the config servers. Both mongos and config server processes are lightweight, but each has somewhat different sizing requirements.

Within a shard, MongoDB further partitions documents into chunks. MongoDB maintains metadata about the relationship of chunks to shards in the config database. Three or more config servers are maintained in sharded deployments to ensure availability of the metadata at all times. Shard metadata access is infrequent: each mongos maintains a cache of this data, which is periodically updated by background processes when chunks are split or migrated to other shards, typically during balancing operations as the cluster expands and contracts. The hardware for a config server should therefore be focused on availability: redundant power supplies, redundant network interfaces, redundant RAID controllers, and redundant storage should be used. Config servers can be deployed as a replica set with up to 50 members.

Typically multiple mongos instances are used in a sharded MongoDB system. It is not uncommon for MongoDB users to deploy a mongos instance on each of their application servers. The optimal number of mongos servers will be determined by the specific workload of the application: in some cases mongos simply routes queries to the appropriate shard, and in other cases mongos must route them to multiple shards and merge the result sets. To estimate the memory requirements for each mongos, consider the following:

• The total size of the shard metadata that is cached by mongos

• 1MB for each application connection

The mongos process uses limited RAM and will benefit more from fast CPUs and networks.

Operating System and File System Configurations for Linux

Only 64-bit versions of operating systems are supported for use with MongoDB.

Version 2.6.36 of the Linux kernel or later should be used for MongoDB in production.

Use XFS file systems; avoid EXT3. EXT3 is quite old and is not optimal for most database workloads. With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues that have been observed when using EXT4 with WiredTiger.

For MongoDB on Linux use the following recommended configurations:

• Turn off atime for the storage volume with the database files.

• Do not use Huge Pages virtual memory pages, MongoDB performs better with normal virtual memory pages.

• Disable NUMA in your BIOS or invoke mongod with NUMA disabled.

• Ensure that readahead settings for the block devices that store the database files are relatively small as most access is non-sequential. For example, setting readahead to 32 (16 KB) is a good starting point.
• Synchronize time between your hosts – for example, using NTP. This is especially important in sharded MongoDB clusters. This also applies to VM guests running MongoDB processes.

Linux provides controls to limit the number of resources and open files on a per-process and per-user basis. The default settings may be insufficient for MongoDB. Generally MongoDB should be the only process on a system, VM, or container to ensure there is no contention with other processes.

While each deployment has unique requirements, the following configurations are a good starting point for mongod and mongos instances. Use ulimit to apply these settings:

• -f (file size): unlimited
• -t (CPU time): unlimited
• -v (virtual memory): unlimited
• -n (open files): above 20,000
• -m (memory size): unlimited
• -u (processes/threads): above 20,000

For more on using ulimit to set the resource limits for MongoDB, see the MongoDB Documentation page on Linux ulimit Settings.

Networking

Always run MongoDB in a trusted environment with network rules that prevent access from all unknown entities. There are a finite number of predefined processes that communicate with a MongoDB system: application servers, monitoring processes, and other MongoDB processes running in a replica set or sharded cluster.

From the MongoDB 3.6 release onwards, MongoDB binds to localhost by default. As a result, all networked connections to the database will be denied unless explicitly configured by an administrator. Review the documentation. If your system has more than one network interface, bind MongoDB processes to the private or internal network interface.

Detailed information on default port numbers for MongoDB, configuring firewalls for MongoDB, VPN, and other topics is available in the MongoDB Security Tutorials. Review the Security section later in this guide for more information on best practices on securing your deployment.

MongoDB offers IP whitelisting, allowing administrators to configure MongoDB to only accept external connections from approved IP addresses or CIDR ranges that have been explicitly added to the whitelist.

When running on virtual machines, use paravirtualized drivers to implement optimized network and storage interfaces that pass instructions between the virtual machine and the hypervisor with minimal overhead.

Network Compression

As a distributed database, MongoDB relies on efficient network transport during query routing and inter-node replication. MongoDB compresses all network traffic between client and the database, and traffic between nodes of the cluster. Based on the snappy compression algorithm, network traffic can be compressed by up to 70%, providing major performance benefits in bandwidth-constrained environments, and reducing networking costs.

Compressing and decompressing network traffic requires CPU resources – typically low single digit percentage overhead. Compression is ideal for those environments where performance is bottlenecked by bandwidth, and sufficient CPU capacity is available.

Production-Proven Recommendations

The latest recommendations on specific configurations for operating systems, file systems, storage devices, and other system-related topics are maintained in the MongoDB Production Notes.

Continuous Availability

Under normal operating conditions, a MongoDB system will perform according to the performance and functional goals of the system. However, from time to time certain inevitable failures or unintended actions can affect a system in adverse ways. Hard drives, network cards, power supplies,
and other hardware components will fail. These risks can be mitigated with redundant hardware components. Similarly, a MongoDB system provides configurable redundancy throughout its software components as well as configurable data redundancy.

Journaling

MongoDB implements write-ahead journaling of operations to enable fast crash recovery and durability in the storage engine. In the case of a server crash, journal entries are recovered when the server process is restarted.

The WiredTiger journal ensures that writes are persisted to disk between checkpoints. WiredTiger uses checkpoints to flush data to disk by default every 60 seconds after the prior flush or after 2GB of data has been written. Thus, by default, WiredTiger can lose more than 60 seconds of writes if running without journaling – though the risk of this loss will typically be much less if using replication to other nodes for additional durability. The WiredTiger write ahead log is not necessary to keep the data files in a consistent state in the event of an unclean shutdown, and so it is safe to run without journaling enabled, though to ensure durability the "replica safe" write concern should be used (see the Write Availability section later in the guide for more information).

WiredTiger provides the ability to compress the journal on disk, thereby reducing storage space.

For additional guarantees, the administrator can configure the journaled write concern, whereby MongoDB acknowledges the write operation only after committing the data to the journal. When using a write concern greater than 1 and the v1 replication protocol2, the application will not receive an acknowledgement until the write has been journaled on the specified number of secondaries, and when using a write concern of "majority" it must also be journaled on the primary.

Locating MongoDB's journal files and data files on separate storage arrays can help performance. The I/O patterns for the journal are very sequential in nature and are well suited for storage devices that are optimized for fast sequential writes, whereas the data files are well suited for storage devices that are optimized for random reads and writes. Simply placing the journal files on a separate storage device normally provides some performance enhancements by reducing disk contention.

Learn more about journaling from the documentation.

Data Redundancy

MongoDB maintains multiple copies of data, called replica sets, using native replication. Users should use replica sets to help prevent database downtime. Replica failover is fully automated in MongoDB, so it is not necessary to manually intervene to recover nodes in the event of a failure.

A replica set consists of multiple replica nodes. At any given time, one member acts as the primary replica and the other members act as secondary replicas. If the primary replica set member suffers an outage (e.g., a power failure, hardware fault, network partition), one of the secondary members is automatically elected to primary, typically within several seconds, and the client connections automatically failover to that new primary. Any writes that could not be serviced during the election can be automatically retried by the drivers once a new primary is established, with the MongoDB server enforcing exactly-once processing semantics. Retryable writes enable MongoDB to ensure write availability, without sacrificing data consistency.

Sophisticated algorithms control the election process, ensuring only the most suitable secondary member is promoted to primary, and reducing the risk of unnecessary failovers (also known as "false positives"). The election algorithm processes a range of parameters including analysis of histories to identify those replica set members that have applied the most recent updates from the primary, heartbeat and connectivity status, and user-defined priorities assigned to replica set members. For example, administrators can configure all replicas located in a secondary data center to be candidates for election only if the primary data center fails. Once the new primary replica set member has been elected, remaining secondary members automatically start replicating from the new primary. If the original primary comes back on-line, it will recognize that it is no longer the primary and will reconfigure itself to become a secondary replica set member.

2. Enhanced (v1) replication protocol – earlier versions are referred to as v0
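The retryable writes and durability guarantees discussed above are typically requested from the client side via connection string options. The sketch below assembles such a string; the host names and replica set name are illustrative assumptions, not recommendations:

```python
from urllib.parse import urlencode

def build_mongo_uri(hosts, replica_set):
    """Build a connection string that opts into retryable writes
    and a journaled, majority write concern, in line with the
    guarantees discussed above. Hosts and replica set name are
    placeholders for illustration."""
    options = {
        "replicaSet": replica_set,
        "retryWrites": "true",   # retry writes interrupted by a failover
        "w": "majority",         # majority write concern
        "journal": "true",       # wait for the journal before acknowledging
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)

uri = build_mongo_uri(["db0.example.net:27017", "db1.example.net:27017"], "rs0")
print(uri)
```

The same options can usually also be set programmatically in the driver, on a per-connection, per-database, or per-operation basis, as the Write Guarantees section below describes.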
The number of replica nodes in a MongoDB replica set is configurable, and a larger number of replica nodes provides increased protection against database downtime in case of multiple machine failures. While a node is down MongoDB will continue to function. The DBA or sysadmin should work to recover or replace the failed replica in order to mitigate the temporarily reduced resilience of the system.

Replica sets also provide operational flexibility by providing sysadmins with an option for performing hardware and software maintenance without taking down the entire system. Using a rolling upgrade, secondary members of the replica set can be upgraded in turn, before the administrator demotes the master to complete the upgrade. This process is fully automated when using Ops Manager or Cloud Manager – discussed later in this guide.

Consider the following factors when developing the architecture for your replica set:

• Ensure that the members of the replica set will always be able to elect a primary. A strict majority of voting cluster members must be available and in contact with each other to elect a new primary. Therefore you should run an odd number of members. There should be at least three replicas with copies of the data in a replica set.

• Best practice is to have a minimum of 3 data centers so that a majority is maintained after the loss of any single site. If only 2 sites are possible then know where the majority of members will be in the case of any network partitions and attempt to ensure that the replica set can elect a primary from the members located in that primary data center.

• Consider including a hidden member in the replica set. Hidden replica set members can never become a primary and are typically used for backups, or to run applications such as analytics and reporting that require isolation from regular operational workloads. Delayed replica set members can also be deployed that apply changes on a fixed time delay to provide recovery from unintentional operations, such as accidentally dropping a collection.

More information on replica sets can be found on the Replication MongoDB documentation page.

Multi-Data Center Replication

MongoDB replica sets allow for flexible deployment designs both within and across data centers that account for failure at the server, rack, and regional levels. In the case of a natural or human-induced disaster, the failure of a single data center can be accommodated with no downtime when MongoDB replica sets are deployed across data centers. Multi-data center replication is also fully supported as a managed service in MongoDB Atlas.

Write Guarantees

MongoDB allows administrators to specify the level of persistence guarantee when issuing writes to the database, which is called the write concern. The following options can be configured on a per connection, per database, per collection, or even per operation basis. The options are as follows:

• Write Acknowledged: This is the default write concern. The mongod will confirm the execution of the write operation, allowing the client to catch network, duplicate key, Document Validation, and other exceptions.

• Journal Acknowledged: The mongod will confirm the write operation only after it has flushed the operation to the journal on the primary. This confirms that the write operation can survive a mongod crash and ensures that the write operation is durable on disk.

• Replica Acknowledged: It is also possible to wait for acknowledgment of writes to other replica set members. MongoDB supports writing to a specific number of replicas. This also ensures that the write is written to the journal on the secondaries. Because replicas can be deployed across racks within data centers and across multiple data centers, ensuring writes propagate to additional replicas can provide extremely robust durability.

• Majority: This write concern waits for the write to be applied to a majority of replica set members. This also ensures that the write is recorded in the journal on these replicas – including on the primary.

• Data Center Awareness: Using tag sets, sophisticated policies can be created to ensure data is written to specific combinations of replicas prior to
acknowledgment of success. For example, you can create a policy that requires writes to be written to at least three data centers on two continents, or two servers across two racks in a specific data center. For more information see the MongoDB Documentation on Data Center Awareness.

Read Preferences

Reading from the primary replica is the default configuration as it guarantees consistency. If higher read throughput is required, it is recommended to take advantage of MongoDB's auto-sharding to distribute read operations across multiple primary members. With MongoDB's read concern levels, discussed below, administrators can tune MongoDB read consistency across members of the replica set.

Distributing read operations across replica set members can improve read scalability of the MongoDB deployment. For example, analytics and Business Intelligence (BI) applications can execute queries against a secondary replica, thereby reducing overhead on the primary and enabling MongoDB to serve operational and analytical workloads from a single deployment. Another configuration option directs reads to the replica nearest to the user based on ping distance, which can significantly decrease the latency of read operations in globally distributed applications at the expense of potentially reading slightly stale data.

A very useful option is primaryPreferred, which issues reads to a secondary replica only if the primary is unavailable. This configuration allows for the continuous availability of reads during the short failover process.

For more on the subject of configurable reads, see the MongoDB Documentation page on replica set Read Preference.

Read Concerns

To ensure isolation and consistency, the readConcern can be set to majority to indicate that data should only be returned to the application if it has been replicated to a majority of the nodes in the replica set, and so cannot be rolled back in the event of a failure.

MongoDB offers a readConcern level of "Linearizable". The linearizable read concern ensures that a node is still the primary member of the replica set at the time of the read, and that the data it returns will not be rolled back if another node is subsequently elected as the new primary member. Configuring this read concern level can have a significant impact on latency, therefore a maxTimeMS value should be supplied in order to timeout long running operations.

Causal Consistency

Causal consistency guarantees that every read operation within a client session will always see the previous write operation, regardless of which replica is serving the request. By enforcing strict, causal ordering of operations within a session, causal consistency ensures every read is always logically consistent, enabling monotonic reads from a distributed system – guarantees that cannot be met by most multi-node databases. Causal consistency allows developers to maintain the benefits of strict data consistency enforced by legacy single node relational databases, while modernizing their infrastructure to take advantage of the scalability and availability benefits of modern distributed data platforms.

Scaling a MongoDB System

Horizontal Scaling with Automatic Sharding

To meet the needs of apps with large data sets and high throughput requirements, MongoDB provides horizontal scale-out for databases on low-cost, commodity hardware or cloud infrastructure using a technique called sharding. Sharding automatically partitions and distributes data across multiple physical instances called shards. Each shard is backed by a replica set to provide always-on availability and workload isolation. Sharding allows developers to seamlessly scale the database as their apps grow beyond the hardware limits of a single server, and it does this without adding complexity to the application. To respond to workload demand, nodes can be added or removed from the cluster in real time, and MongoDB will automatically rebalance the data accordingly, without manual intervention.
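To make the automatic partitioning described above concrete, here is a toy model – not MongoDB code – of how ranges of a shard key might map to shards, and how a query router could target exactly one shard for a query that includes the shard key. The ranges and shard names are invented for illustration:

```python
# Toy model of range-based partitioning: each chunk covers a
# half-open range of shard key values and lives on one shard.
CHUNK_MAP = [
    ((0, 1000), "shardA"),
    ((1000, 2000), "shardB"),
    ((2000, 3000), "shardC"),
]

def route(shard_key_value):
    """Return the shard holding the chunk for a key value,
    mimicking how a query router targets a single shard when
    the query includes the shard key."""
    for (low, high), shard in CHUNK_MAP:
        if low <= shard_key_value < high:
            return shard
    raise KeyError("no chunk covers this key")

print(route(1500))  # a key-value query is dispatched to exactly one shard
```

A query that does not include the shard key would instead have to be broadcast to every shard in the map – the scatter/gather pattern discussed under Sharding Best Practices.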
Sharding is transparent to applications; whether there is one or a thousand shards, the application code for querying MongoDB remains the same. Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don't use the shard key, the query router will broadcast the query to all shards, aggregating and sorting the results as appropriate. Multiple query routers can be used within a MongoDB cluster, with the appropriate number governed by the performance and availability requirements of the application.

MongoDB exposes multiple sharding policies. As a result, data can be distributed according to query patterns or data placement requirements, giving developers much higher scalability across a diverse set of workloads:

• Ranged Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range based queries, such as co-locating data for all customers in a specific region on a specific shard.

• Hashed Sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach guarantees a uniform distribution of writes across shards, which is often optimal for ingesting streams of time-series and event data.

• Zoned Sharding. Provides the ability for developers to define specific rules governing data placement in a sharded cluster. Zones are discussed in more detail in the following Data Locality section of the guide.

Thousands of organizations use MongoDB to build high-performance systems at scale. You can read more about them on the MongoDB scaling page.

Users should consider deploying a sharded cluster in the following situations:

• RAM Limitation: The size of the system's active working set plus indexes is expected to exceed the capacity of the maximum amount of RAM in the system.

• Disk I/O Limitation: The system will have a large amount of write activity, and the operating system will not be able to write data fast enough to meet demand, or I/O bandwidth will limit how fast the writes can be flushed to disk.

• Storage Limitation: The data set will grow to exceed the storage capacity of a single node in the system.

• Data placement requirements: The data set needs to be assigned to a specific data center to support low latency local reads and writes, or for data sovereignty to meet privacy regulations such as the GDPR. Alternatively, data placement might be required to create multi-temperature storage infrastructures that separate hot and cold data onto specific volumes. MongoDB gives you this flexibility.

Applications that meet these criteria, or that are likely to do so in the future, should be designed for sharding in advance rather than waiting until they have consumed available capacity. Applications that will eventually benefit from sharding should consider which collections they will want to shard and the corresponding shard keys when designing their data models. If a system has already reached or exceeded its capacity, it will be challenging to deploy sharding without impacting the application's performance.

Sharding Best Practices

Users who choose to shard should consider the following best practices:

Select a good shard key. When selecting fields to use as a shard key, there are at least three key criteria to consider:

1. Cardinality: Data partitioning is managed in 64 MB chunks by default. Low cardinality (e.g., a user's home country) will tend to group documents together on a small number of shards, which in turn will require frequent rebalancing of the chunks and a single country is likely to exceed the 64 MB chunk size. Instead, a shard key should exhibit high cardinality.
2. Insert Scaling: Writes should be evenly distributed across all shards based on the shard key. If the shard key is monotonically increasing, for example, all inserts will go to the same shard even if they exhibit high cardinality, thereby creating an insert hotspot. Instead, the key should be evenly distributed.

3. Query Isolation: Queries should be targeted to a specific shard to maximize scalability. If queries cannot be isolated to a specific shard, all shards will be queried in a pattern called scatter/gather, which is less efficient than querying a single shard.

For more on selecting a shard key, see Considerations for Selecting Shard Keys.

Add capacity before it is needed. Cluster maintenance is lower risk and simpler to manage if capacity is added before the system is over-utilized.

Run three or more configuration servers to provide redundancy. Production deployments must use three or more config servers. Config servers should be deployed in a topology that is robust and resilient to a variety of failures.

Use replica sets. Sharding and replica sets are fully compatible. Replica sets should be used in all deployments, and sharding should be used when appropriate. Sharding allows a database to make use of multiple servers for data capacity and system throughput. Replica sets maintain redundant copies of the data across servers, server racks, and even data centers.

Use multiple mongos instances.

Apply best practices for bulk inserts. Pre-split data into multiple chunks so that no balancing is required during the insert process. Alternatively, disable the balancer during bulk loads. Also, use multiple mongos instances to load in parallel for greater throughput. For more information see Create Chunks in a Sharded Cluster in the MongoDB Documentation.

Dynamic Data Balancing

As data is loaded into MongoDB, the system may need to dynamically rebalance chunks across shards in the cluster using a process called the balancer. It is possible to disable the balancer or to configure when balancing is performed to further minimize the impact on performance.

Geographic Distribution

Shards can be configured such that specific ranges of shard key values are mapped to a physical shard location. Zoned sharding allows a MongoDB administrator to control the physical location of documents in a MongoDB cluster, even when the deployment spans multiple data centers in different regions.

It is possible to combine the features of replica sets, zoned sharding, read preferences, and write concerns in order to provide a deployment that is geographically distributed, enabling users to read and write to their local data centers. An administrator can restrict sharded collections to a specific set of shards, effectively federating those shards for different users. For example, one can tag all USA data and assign it to shards located in the United States.

To learn more, download the MongoDB Multi-Datacenter Deployments Guide.

Managing MongoDB: Provisioning, Monitoring and Disaster Recovery

If you are running your apps and databases in the public cloud, MongoDB offers the fully managed, on-demand and elastic MongoDB Atlas service. Atlas enables customers to deploy, operate, and scale MongoDB databases on AWS, Azure, or GCP in just a few clicks or programmatic API calls. MongoDB Atlas is available through a pay-as-you-go model and billed on an hourly basis. A fuller description of MongoDB Atlas is included later in this guide.

If you are running MongoDB yourself, Ops Manager is the simplest way to run the database on your own infrastructure, making it easy for operations teams to deploy, monitor, backup, and scale MongoDB. Many of the capabilities of Ops Manager are also available in the MongoDB Cloud Manager service hosted in the cloud. Today, Cloud Manager supports thousands of deployments, including systems from one to hundreds of servers.
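To make the earlier shard key guidance concrete, the following sketch simulates why hash-based distribution avoids the insert hotspot created by a monotonically increasing key. The md5-mod-N routing and four-shard cluster here are illustrative assumptions for the demo, not MongoDB's actual routing (MongoDB derives a 64-bit hash of the BSON shard key value and assigns hash ranges to chunks):

```python
import hashlib

def hashed_shard(key_value, num_shards):
    # Illustrative only: mimic hash-based routing by taking an MD5
    # digest of the shard key and reducing it modulo the shard count.
    digest = hashlib.md5(str(key_value).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Monotonically increasing keys (e.g., timestamps) would all land on
# one shard under ranged sharding; hashing spreads them evenly.
counts = [0] * 4
for key in range(100_000):
    counts[hashed_shard(key, 4)] += 1

print(counts)  # roughly 25,000 documents per shard
```

The same property is why the guide recommends hashed sharding for ingesting time-series and event streams: write load stays uniform even though the keys arrive in order.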
Organizations who run their deployments with MongoDB Enterprise Advanced can choose between Ops Manager and Cloud Manager.

Ops Manager and Cloud Manager incorporate best practices to help keep managed databases healthy and optimized. They ensure operational continuity by converting complex manual tasks into reliable, automated procedures with the click of a button or via an API call:

• Deploy. Any topology, at any scale.

• Upgrade. In minutes, with no downtime.

• Scale. Add capacity, without taking the application offline.

• Point-in-time, Scheduled Backups. Restore complete running clusters to any point in time with just a few clicks, because disasters aren't predictable.

• Queryable Backups. Allow partial restores of selected data, and the ability to query a backup file in-place, without having to restore it.

• Performance Alerts. Monitor 100+ system metrics and get custom alerts before the system degrades.

• Roll Out Indexes. Avoid impact to the application by introducing new indexes node by node – starting with the secondaries and then the demoted primary.

• Manage Zones. Configure sharding Zones to mandate what data is stored where.

• Data Explorer. Examine the database's schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics.

The Operations Rapid Start service gives your operations and devops teams the skills and tools to run and manage MongoDB with all the best practices accumulated over many years working with some of the world's largest companies. This engagement offers introductory administrator training and custom consulting to help you set up and use either MongoDB Ops Manager or MongoDB Cloud Manager.

Deployments and Upgrades

Ops Manager coordinates critical operational tasks across the servers in a MongoDB system. It communicates with the infrastructure through agents installed on each server. The servers can reside in the public cloud or a private data center. Ops Manager reliably orchestrates the tasks that administrators have traditionally performed manually – deploying a new cluster, upgrades, creating point-in-time backups, rolling out new indexes, and many other operational tasks.

Ops Manager is designed to adapt to problems as they arise by continuously assessing state and making adjustments as needed. Here's how:

• Ops Manager agents are installed on servers (where MongoDB will be deployed), either through configuration tools such as Ansible, Chef or Puppet, or by an administrator.

• The administrator creates a new design goal for the system, either as a modification to an existing deployment (e.g., upgrade, oplog resize, new shard), or as a new system.

• The agents periodically check in with the Ops Manager central server and receive the new design instructions.

• Agents create and follow a plan for implementing the design. Using a sophisticated rules engine, agents continuously adjust their individual plans as conditions change. In the face of many failure scenarios – such as server failures and network partitions – agents will revise their plans to reach a safe state.

• Minutes later, the system is deployed – safely and reliably.

Beyond deploying new databases, Ops Manager can "attach to" or import existing MongoDB deployments and take over their control.

In addition to initial deployment, Ops Manager makes it possible to dynamically resize capacity by adding shards and replica set members. Other maintenance tasks such as upgrading MongoDB or resizing the oplog can be reduced from dozens or hundreds of manual steps to the click of a button, all with zero downtime.

A common DBA task is to roll out new indexes in production systems. In order to minimize the impact to the live system, the best practice is to perform a rolling index build – starting with each of the secondaries and finally applying changes to the original primary, after swapping its
role with one of the secondaries. While this rolling process can be performed manually, Ops Manager and Cloud Manager can automate the process across MongoDB replica sets, reducing operational overhead and the risk of failovers caused by incorrectly sequencing management processes.

Administrators can use the Ops Manager interface directly, or invoke the Ops Manager RESTful API from existing enterprise tools, including popular monitoring and orchestration frameworks. Specific integration is provided with the leading Application Performance Management (APM) tools. Details are included later in this section of the guide.

Cloud Native Integration. Ops Manager can be integrated with Pivotal Cloud Foundry, Red Hat OpenShift, and Kubernetes. With Ops Manager, you can rapidly deploy MongoDB Enterprise powered applications by abstracting away the complexities of managing, scaling and securing hybrid clouds. Ops Manager coordinates orchestration with your cloud native platform: the platform handles the underlying infrastructure, while Ops Manager handles the MongoDB instances, automatically configured and managed with operational best practices.

With this integration, you can consistently and effortlessly run workloads wherever they need to be, standing up the same database configuration in different environments, all controlled from a single pane of glass.

Ops Manager features such as server pooling make it easier to build a database as a service within a private cloud environment. Ops Manager will maintain a pool of globally provisioned servers that have agents already installed. When users want to create a new MongoDB deployment, they can request servers from this pool to host the MongoDB cluster. Administrators can even associate certain properties with the servers in the pool and expose server properties as selectable options when a user initiates a request for new instances.

Monitoring & Capacity Planning

System performance and capacity planning are two important topics that should be addressed as part of any MongoDB deployment. Part of your planning should involve establishing baselines on data volume, system load, performance, and system capacity utilization. These baselines should reflect the workloads you expect the system to perform in production, and they should be revisited periodically as the number of users, application features, performance SLA, or other factors change.

Baselines will help you understand when the system is operating as designed, and when issues begin to emerge that may affect the quality of the user experience or other factors critical to the system. It is important to monitor your MongoDB system for unusual behavior so that actions can be taken to address issues proactively. The following represents the most popular tools for monitoring MongoDB, and also describes different aspects of the system that should be monitored.

Monitoring with Ops Manager and Cloud Manager

Featuring charts, custom dashboards, and automated alerting, Ops Manager tracks 100+ key database and systems health metrics including operations counters, memory and CPU utilization, replication status, open connections, queues, and node status. Ops Manager allows telemetry data to be collected every 10 seconds.

Figure 4: Ops Manager: simple, intuitive, and powerful. Deploy and upgrade entire clusters with a single click.

The metrics are securely reported to Ops Manager and Cloud Manager where they are processed, aggregated, alerted on, and visualized in a browser, letting administrators easily determine the health of MongoDB in real time. Views can be based on explicit permissions, so project team visibility can be restricted to their own applications, while
systems administrators can monitor all the MongoDB deployments in the organization.

Historic performance can be reviewed in order to create operational baselines and to support capacity planning. Integration with existing monitoring tools is also straightforward via the Ops Manager RESTful API, making the deep insights from Ops Manager part of a consolidated view across your operations.

Ops Manager allows administrators to set custom alerts when key metrics are out of range. Alerts can be configured for a range of parameters affecting individual hosts, replica sets, agents, and backup. Alerts can be sent via SMS, email, webhooks, Flowdock, HipChat, and Slack, or integrated into existing incident management systems such as PagerDuty to proactively warn of potential issues before they escalate to costly outages.

If using Cloud Manager, access to monitoring data can also be shared with MongoDB support engineers, providing fast issue resolution by eliminating the need to ship logs between different teams.

Figure 5: Ops Manager provides real time & historic visibility into the MongoDB deployment.

Free MongoDB Monitoring Cloud Service

With the 4.0 release, the MongoDB database can natively push monitoring metadata directly to the MongoDB Monitoring Cloud. Once enabled, you will be shown a unique URL that you can navigate to in a web browser, and instantly see monitoring metrics and topology information collected for your environment. You can share the URL to provide visibility to anyone on your team.

The free monitoring service is available to all MongoDB users, without needing to install an agent, navigate a paywall, or complete a registration form. You will be able to see the metrics and topology of your environment from the moment free monitoring is enabled. You can enable free monitoring easily with the db.enableFreeMonitoring() command in the MongoDB shell, from MongoDB Compass, or by starting the mongod process with the --enableFreeMonitoring command-line option, and you can opt out at any time.

With the Monitoring Cloud Service, the collected metrics enable you to quickly assess database health and optimize performance, all from the convenience of a powerful browser-based GUI. Monitoring features include:

• Environment information: topology (standalone, replica sets including primary and secondary nodes) and MongoDB version.

• Charts with 24 hours of data for the following metrics, updated every minute: database operations per second (averaged to the minute), including commands, queries, updates, deletes, getMores, inserts, and replication operations for replica set secondaries.

• Operation execution time.

• Queues.

• Replication lag.

• Network I/O.

• Memory (resident and virtual).

• Hardware: process CPU, disk % utilization, disk % free space.

Learn more at the MongoDB Cloud page.

mongotop

mongotop is a utility that ships with MongoDB. It tracks and reports the current read and write activity of a MongoDB cluster. mongotop provides collection-level statistics.

mongostat

mongostat is a utility that ships with MongoDB. It shows real-time statistics about all servers in your MongoDB system. mongostat provides a comprehensive overview of
all operations, including counts of updates, inserts, page faults, index misses, and many other important measures of system health. mongostat is similar to the Linux tool vmstat.

Other Popular Tools

There are a number of popular open-source monitoring tools for which MongoDB plugins are available. If MongoDB is configured with the WiredTiger storage engine, ensure the tool is using a WiredTiger-compatible driver:

• Nagios

• Ganglia

• Cacti

• Scout

• Zabbix

• Datadog

Linux Utilities

Other common utilities that can be used to monitor different aspects of a MongoDB system:

• iostat: provides usage statistics for the storage subsystem

• vmstat: provides usage statistics for virtual memory

• netstat: provides usage statistics for the network

• sar: captures a variety of system statistics periodically and stores them for analysis

Windows Utilities

Performance Monitor, a Microsoft Management Console snap-in, is a useful tool for measuring a variety of stats in a Windows environment.

Things to Monitor

Ops Manager and Cloud Manager can be used to monitor database-specific metrics, including page faults, ops counters, queues, connections and replica set status. Alerts can be configured against each monitored metric to proactively warn administrators of potential issues before users experience a problem.

Application Logs And Database Logs

Application and database logs should be monitored for errors and other system information. It is important to correlate your application and database logs in order to determine whether activity in the application is ultimately responsible for other issues in the system. For example, a spike in user writes may increase the volume of writes to MongoDB, which in turn may overwhelm the underlying storage system. Without the correlation of application and database logs, it might take more time than necessary to establish that the application is responsible for the increase in writes rather than some process running in MongoDB.

In the event of errors, exceptions or unexpected behavior, the logs should be saved and uploaded to MongoDB when opening a support case. Logs for mongod processes running on primary and secondary replica set members, as well as mongos and config processes, will enable the support team to root cause any issues more quickly.

Page Faults

When a working set ceases to fit in memory, or other operations have moved working set data out of memory, the volume of page faults may spike in your MongoDB system. Page faults are part of the normal operation of a MongoDB system, but the volume of page faults should be monitored in order to determine if the working set is growing to the level that it no longer fits in available memory and if alternatives such as more memory or sharding across multiple servers are appropriate. In most cases, the underlying issue for problems in a MongoDB system tends to be page faults.

Disk

Beyond memory, disk I/O is also a key performance consideration for a MongoDB system because writes are journaled and regularly flushed to disk. Under heavy write load the underlying disk subsystem may become overwhelmed, other processes could be contending with MongoDB, or the RAID configuration may be inadequate
for the volume of writes. Other potential issues could be the root cause, but the symptom is typically visible through iostat as high disk utilization and high queuing for writes.

CPU

A variety of issues could trigger high CPU utilization. This may be normal under most circumstances, but if high CPU utilization is observed without other issues such as disk saturation or page faults, there may be an unusual issue in the system. For example, a MapReduce job with an infinite loop, or a query that sorts and filters a large number of documents from the working set without good index coverage, might cause a spike in CPU without triggering issues in the disk system or page faults.

Connections

MongoDB drivers implement connection pooling to facilitate efficient use of resources. Each connection consumes 1 MB of RAM, so be careful to monitor the total number of connections so they do not overwhelm RAM and reduce the available memory for the working set. This typically happens when client applications do not properly close their connections, or with Java in particular, which relies on garbage collection to close the connections.

Op Counters

The utilization baselines for your application will help you determine a normal count of operations. If these counts start to substantially deviate from your baselines it may be an indicator that something has changed in the application, or that a malicious attack is underway.

Queues

If MongoDB is unable to complete all requests in a timely fashion, requests will begin to queue up. A healthy deployment will exhibit very low queues. If metrics start to deviate from baseline performance, caused by a long-running query for example, requests from applications will start to queue. The queue is therefore a good first place to look to determine if there are issues that will affect user experience.

System Configuration

It is not uncommon to make changes to hardware and software in the course of a MongoDB deployment. For example, a disk subsystem may be replaced to provide better performance or increased capacity. When components are changed it is important to ensure their configurations are appropriate for the deployment. MongoDB is very sensitive to the performance of the operating system and underlying hardware, and in some cases the default values for system configurations are not ideal. For example, the default readahead for the file system could be several MB whereas MongoDB is optimized for readahead values closer to 32 KB. If the new storage system is installed without changing the readahead from the default to the appropriate setting, the application's performance is likely to degrade substantially. Remember to review the Production Notes for the latest best practices.

Shard Balancing

One of the goals of sharding is to uniformly distribute data across multiple servers. If the utilization of server resources is not approximately equal across servers there may be an underlying issue that is problematic for the deployment. For example, a poorly selected shard key can result in uneven data distribution. In this case, most if not all of the queries will be directed to the single mongod that is managing the data. Furthermore, MongoDB may be attempting to redistribute the documents to achieve a more ideal balance across the servers. While redistribution will eventually result in a more desirable distribution of documents, there is substantial work associated with rebalancing the data and this activity itself may interfere with achieving the desired performance SLA. By running db.currentOp() you will be able to determine what work is currently being performed by the cluster, including rebalancing of documents across the shards.

If in the course of a deployment it is determined that a new shard key should be used, it will be necessary to reload the data with a new shard key because the designation and values of the shard keys are immutable. To support the use of a new shard key, it is possible to write a script that reads each document, updates the shard key, and writes it back to the database.
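A minimal sketch of such a migration script, run here over plain dicts rather than a live collection (with a real driver such as pymongo the loop would read from the old collection and insert into one sharded on the new key; the country-to-region mapping is a made-up example):

```python
def derive_region(doc):
    # Hypothetical rule for computing the new shard key from an
    # existing field; a real migration would use whatever mapping
    # the new data model calls for.
    return "US" if doc["country"] in ("US", "CA") else "INTL"

def reshard(source_docs, new_shard_key="region"):
    # Shard key values are immutable, so each document is read,
    # given a value for the new key, and written back as a new
    # document in the collection sharded on that key.
    migrated = []
    for doc in source_docs:
        doc = dict(doc)  # copy, leaving the source document untouched
        doc[new_shard_key] = derive_region(doc)
        migrated.append(doc)
    return migrated

docs = [{"_id": 1, "country": "US"}, {"_id": 2, "country": "DE"}]
print(reshard(docs))  # doc 1 gets region "US", doc 2 gets region "INTL"
```

In production this would be batched, rate-limited, and ideally run against a secondary read preference to limit the impact on the live workload.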
Replication Lag

Replication lag is the amount of time it takes a write operation on the primary replica set member to replicate to a secondary member. A small amount of delay is normal, but as replication lag grows, issues may arise. Typical causes of replication lag include network latency or connectivity issues, and disk latencies such as the throughput of the secondaries being inferior to that of the primary.

Config Server Availability

In sharded environments it is required to run three or more config servers. Config servers are critical to the system for understanding the location of documents across shards. If a config server becomes unavailable, the database will remain operational, but the balancer will be unable to move chunks and other maintenance activities will be blocked until all config servers are available. Config servers are deployed as a MongoDB replica set.

Disaster Recovery: Backup & Recovery

A backup and recovery strategy is necessary to protect your mission-critical data against catastrophic failure, such as a fire or flood in a data center, or human error such as code errors or accidentally dropping collections. With a backup and recovery strategy in place, administrators can restore business operations without data loss, and the organization can meet regulatory and compliance requirements. Taking regular backups offers other advantages, as well. The backups can be used to seed new environments for development, staging, or QA without impacting production systems.

Ops Manager and Cloud Manager backups are maintained continuously, just a few seconds behind the operational system. If the MongoDB cluster experiences a failure, the most recent backup is only moments behind, minimizing exposure to data loss. MongoDB Atlas, Ops Manager, and Cloud Manager are the only MongoDB solutions that offer point-in-time backup of replica sets and cluster-wide snapshots of sharded clusters. You can restore to precisely the moment you need, quickly and safely. Ops teams can automate their database restores reliably and safely using Ops Manager and Cloud Manager. Complete development, test, and recovery clusters can be built in a few simple clicks. Operations teams can configure backups against specific collections only, rather than the entire database, speeding up backups and reducing the requisite storage space.

Queryable Backups allow partial restores of selected data, and the ability to query a backup file in-place, without having to restore it.

Ops Manager supports cross-project restores, allowing users to perform restores into a different Ops Manager Project than the backup snapshot source. This allows DevOps teams to easily execute tasks such as creating multiple staging or test environments that match recent production data, while configured with different user access privileges or running in different regions.

Because Ops Manager only reads the oplog, the ongoing performance impact is minimal – similar to that of adding an additional replica to a replica set.

By using MongoDB Enterprise Advanced you can deploy Ops Manager to control backups in your local data center, or use the Cloud Manager service that offers a fully managed backup solution with a pay-as-you-go model. Dedicated MongoDB engineers monitor user backups on a 24x365 basis, alerting operations teams if problems arise.

Ops Manager and Cloud Manager are not the only mechanisms for backing up MongoDB. Other options include:

• File system copies

• The mongodump tool packaged with MongoDB

File System Backups

File system backups, such as those provided by Linux LVM, quickly and efficiently create a consistent snapshot of the file system that can be copied for backup and restore purposes. For databases with a single replica set it is possible to stop operations temporarily so that a consistent snapshot can be created by issuing the db.fsyncLock() command. This will flush all pending writes to disk and lock the entire mongod instance to prevent additional writes until the lock is released with db.fsyncUnlock().
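Point-in-time restore, whether from continuous Ops Manager backups or a snapshot plus captured oplog, rests on one idea: start from a base copy of the data and replay oplog entries up to the desired moment. The toy model below illustrates only that replay concept; the (timestamp, key, value) entry format is a deliberate simplification of real oplog documents:

```python
def restore_to(snapshot, oplog, target_ts):
    # Point-in-time restore sketch: apply oplog entries, in timestamp
    # order, on top of a base snapshot, stopping at the target moment.
    state = dict(snapshot)
    for ts, key, value in sorted(oplog):
        if ts > target_ts:
            break
        state[key] = value
    return state

snapshot = {"balance": 100}
oplog = [(1, "balance", 120), (2, "balance", 80), (3, "balance", 200)]
print(restore_to(snapshot, oplog, target_ts=2))  # {'balance': 80}
```

Choosing target_ts just before a bad operation (say, an accidental collection drop at ts=3) is exactly the "restore to precisely the moment you need" capability described above.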
For more on how to use file system snapshots to create a backup of MongoDB, please see Backup and Restore with Filesystem Snapshots in the MongoDB Documentation.

Only MongoDB Atlas, Ops Manager, and Cloud Manager provide an automated method for taking a consistent backup across all shards.

For more on backup and restore in sharded environments, see the MongoDB Documentation page on Backup and Restore Sharded Clusters and the tutorial on Backup a Sharded Cluster with Filesystem Snapshots.

mongodump

mongodump is a tool bundled with MongoDB that performs a live backup of the data in MongoDB. mongodump may be used to dump an entire database, collection, or result of a query. mongodump can produce a dump of the data that reflects a single moment in time by dumping the oplog entries created during the dump and then replaying them during mongorestore, a tool that imports content from BSON database dumps produced by mongodump.

Integrating MongoDB with External Monitoring Solutions

The Ops Manager API provides integration with external management frameworks through programmatic access to automation features and monitoring data.

In addition to Ops Manager, MongoDB Enterprise Advanced can report system information to SNMP traps, supporting centralized data collection and aggregation via external monitoring solutions. Review the documentation to learn more about SNMP integration.

APM Integration

Many operations teams use Application Performance Monitoring (APM) platforms to gain global oversight of their complete IT infrastructure from a single management UI. Issues that risk affecting customer experience can be quickly identified and isolated to specific components – whether attributable to devices, hardware infrastructure, networks, APIs, application code, databases, and more.

Figure 6: MongoDB integrated into a single view of application performance.

The MongoDB drivers include an API that exposes query performance metrics to APM tools. Administrators can monitor time spent on each operation, and identify slow-running queries that require further analysis and optimization.

In addition, Ops and Cloud Manager now provide packaged integration with the New Relic platform. Key metrics from Ops Manager are accessible to the APM for visualization, enabling MongoDB health to be monitored and correlated with the rest of the application estate.

As shown in Figure 6, summary metrics are presented within the APM's UI. Administrators can also run New Relic Insights for analytics against monitoring data to generate dashboards that provide real-time tracking of Key Performance Indicators (KPIs).

Security

As with all software, MongoDB administrators must consider security and risk exposure for a MongoDB deployment. There are no magic solutions for risk mitigation, and maintaining a secure MongoDB deployment is an ongoing process.

Defense in Depth

A Defense in Depth approach is recommended for securing MongoDB deployments, and it addresses a number of different methods for managing risk and reducing risk exposure.
The intention of a Defense in Depth approach is to layer MongoDB also extends existing support for authenticating
your environment to ensure there are no exploitable single users via LDAP to now include LDAP authorization. This
points of failure that could allow an intruder or untrusted enables existing user privileges stored in the LDAP server
party to access the data stored in the MongoDB database. to be mapped to MongoDB roles, without users having to
The most effective way to reduce the risk of exploitation is be recreated in MongoDB itself.
to run MongoDB in a trusted environment, to limit access,
to follow a system of least privileges, to institute a secure
development lifecycle, and to follow deployment best
Auditing
practices. MongoDB Enterprise Advanced enables security
administrators to construct and filter audit trails for any
MongoDB Enterprise Advanced features extensive
operation against MongoDB, whether DML, DCL or DDL.
capabilities to defend, detect and control access to
For example, it is possible to log and audit the identities of
MongoDB, offering among the most complete security
users who retrieved specific documents, and any changes
controls of any modern database:
made to the database during their session. The audit log
• Access Contr
Control.
ol. Control access to sensitive data using can be written to multiple destinations in a variety of
industry standard mechanisms for authentication and formats including to the console and syslog (in JSON
authorization to the database, collection, and down to format), and to a file (JSON or BSON), which can then be
the level of individual fields within a document. loaded to MongoDB and analyzed to identify relevant
• Auditing. Ensure regulatory and internal compliance. events

• Encryption. Protect data in motion over the network


and at rest in persistent storage. Encryption
• Administrative Contr
Controls.
ols. Identify potential exploits MongoDB data can be encrypted on the network and on
faster and reduce their impact. disk.
• Network PrProtection.
otection. Refer to the earlier Networking
Support for TLS allows clients to connect to MongoDB
session for details.
over an encrypted channel. MongoDB supports FIPS
Review the MongoDB Security Reference Architecture to 140-2 encryption when run in FIPS Mode with a FIPS
learn more about each of the security features discussed validated Cryptographic module.
below.
Data at rest can be protected using:

• The MongoDB Encrypted storage engine


Authentication
• Certified database encryption solutions from MongoDB
Authentication can be managed from within the database partners such as IBM and Vormetric
itself or via MongoDB Enterprise Advanced integration with
• Logic within the application itself
external security mechanisms including LDAP, Windows
Active Directory, and Kerberos. With the Encrypted storage engine, protection of data
at-rest now becomes an integral feature of the database.
By natively encrypting database files on disk,
Authorization administrators eliminate both the management and
MongoDB allows administrators to define permissions for a performance overhead of external encryption mechanisms.
user or application, and what data it can access when This new storage engine provides an additional level of
querying the database. MongoDB provides the ability to defense, allowing only those staff with the appropriate
configure granular role-based access control, making it database credentials access to encrypted data.
possible to realize a separation of duties between different
entities accessing and managing the database.
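A JSON-format audit log, as described above, can be loaded back into MongoDB or analyzed directly with a simple script. The sketch below filters audit records for a single user; the field names (atype, ts, users, result) follow MongoDB's documented audit message format, but the sample records and the eventsByUser helper are invented for illustration.

```javascript
// Filter JSON-format audit records for one user and spot failed logins.
// Field names follow the MongoDB audit message format; the entries
// themselves are fabricated sample data.
const sampleLog = [
  '{"atype":"authenticate","ts":{"$date":"2018-06-01T10:00:00.000Z"},"users":[{"user":"app","db":"admin"}],"result":0}',
  '{"atype":"dropCollection","ts":{"$date":"2018-06-01T10:05:00.000Z"},"users":[{"user":"jsmith","db":"admin"}],"param":{"ns":"sales.orders"},"result":0}',
  '{"atype":"authenticate","ts":{"$date":"2018-06-01T10:06:00.000Z"},"users":[{"user":"jsmith","db":"admin"}],"result":18}'
];

// All audit events attributed to the given user.
function eventsByUser(lines, userName) {
  return lines
    .map(line => JSON.parse(line))
    .filter(event => (event.users || []).some(u => u.user === userName));
}

const jsmithEvents = eventsByUser(sampleLog, "jsmith");
// A non-zero result marks a failed operation, e.g. a rejected login.
const failedLogins = jsmithEvents.filter(
  event => event.atype === "authenticate" && event.result !== 0
);
console.log(jsmithEvents.length, failedLogins.length); // prints: 2 1
```

In practice, much of this filtering is done server side instead: either at capture time with the auditLog filter setting, or with an ordinary aggregation query once the file has been imported into a collection.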

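Several of the controls above are enabled through the mongod configuration file. The following is one plausible combination for MongoDB Enterprise, sketched from the documented option names; the file paths and the audit filter are placeholders to adapt to your environment.

```yaml
# mongod.conf sketch (MongoDB Enterprise). Paths and the audit
# filter are illustrative placeholders, not recommendations.
security:
  authorization: enabled            # enforce role-based access control
  enableEncryption: true            # Encrypted storage engine (Enterprise only)
  encryptionKeyFile: /secure/mongodb-keyfile   # local key management
net:
  ssl:
    mode: requireSSL                # clients must connect over TLS
    PEMKeyFile: /secure/mongodb.pem
    CAFile: /secure/ca.pem
auditLog:                           # Enterprise auditing
  destination: file
  format: JSON
  path: /var/log/mongodb/audit.json
  filter: '{ atype: "authenticate" }'   # audit authentication events only
```

With local key management, only the keyfile has to be secured outside MongoDB; integrating with a KMIP appliance instead keeps the master key off the database host entirely, which is why that option is recommended.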
Using the Encrypted storage engine, the raw database “plain text” content is encrypted using an algorithm that takes a random encryption key as input and generates “ciphertext” that can only be read if decrypted with the decryption key. The process is entirely transparent to the application. MongoDB supports a variety of encryption algorithms – the default is AES-256 (256-bit encryption) in CBC mode. AES-256 in GCM mode is also supported. Encryption can be configured to meet FIPS 140-2 requirements.

The storage engine encrypts each database with a separate key. The key-wrapping scheme in MongoDB wraps all of the individual internal database keys with one external master key for each server. The Encrypted storage engine supports two key management options – in both cases, the only key being managed outside of MongoDB is the master key:

• Local key management via a keyfile

• Integration with a third-party key management appliance via the KMIP protocol (recommended)

Read-Only, Redacted Views

DBAs can define non-materialized views that expose only a subset of data from an underlying collection, i.e. a view that filters out specific fields. DBAs can define a view of a collection that's generated from an aggregation over another collection(s) or view. Permissions granted against the view are specified separately from permissions granted to the underlying collection(s).

Views are defined using the standard MongoDB Query Language and aggregation pipeline. They allow the inclusion or exclusion of fields, masking of field values, filtering, schema transformation, grouping, sorting, limiting, and joining of data using $lookup and $graphLookup to another collection.

You can learn more about MongoDB read-only views from the documentation.

MongoDB Atlas: Database as a Service For MongoDB

An increasing number of companies are moving to the public cloud to not only reduce the operational overhead of managing infrastructure, but also provide their teams with access to on-demand services that give them the agility they need to meet faster application development cycles. This move from building IT to consuming IT as a service is well aligned with parallel organizational shifts, including agile and DevOps methodologies and microservices architectures. Collectively, these seismic shifts in IT help companies prioritize developer agility, productivity, and time to market.

MongoDB offers the fully managed, on-demand, and elastic MongoDB Atlas service in the public cloud. Atlas enables customers to deploy, operate, and scale MongoDB databases on AWS, Azure, or GCP in just a few clicks or programmatic API calls. MongoDB Atlas is available through a pay-as-you-go model and billed on an hourly basis. It's easy to get started – use a simple GUI to select the public cloud provider, region, instance size, and features you need. MongoDB Atlas provides:

• Automated database and infrastructure provisioning so teams can get the database resources they need, when they need them, and can elastically scale whenever they need to.

• Security features to protect your data, with network isolation, fine-grained access control, auditing, and end-to-end encryption, enabling you to comply with industry regulations such as HIPAA.

• Built-in replication both within and across regions for always-on availability.

• Global clusters that allow you to deploy a fully managed, globally distributed database providing low-latency, responsive reads and writes to users anywhere, with strong data placement controls for regulatory compliance.

• Fully managed, continuous, and consistent backups with point-in-time recovery to protect against data corruption, and the ability to query backups in place without full restores.

• Fine-grained monitoring and customizable alerts for comprehensive performance visibility.

• Automated patching and single-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features.

• Live migration to move your self-managed MongoDB clusters into the Atlas service, or to move Atlas clusters between cloud providers.

• Widespread coverage on the major cloud platforms, with availability in over 50 cloud regions across Amazon Web Services, Microsoft Azure, and Google Cloud Platform. MongoDB Atlas delivers a consistent experience across each of the cloud platforms, ensuring developers can deploy wherever they need to, without compromising critical functionality or risking lock-in.

MongoDB Atlas can be used for everything from a quick Proof of Concept, to dev/test/QA environments, to powering production applications. The user experience across MongoDB Atlas, Cloud Manager, and Ops Manager is consistent, ensuring that you can easily move from on-premises to the public cloud, and between providers, as your needs evolve.

Built and run by the same team that engineers the database, MongoDB Atlas is the best way to run MongoDB in the cloud. Learn more or deploy a free cluster now.

MongoDB Stitch: Backend as a Service

The MongoDB Stitch serverless platform facilitates application development with simple, secure access to data and services from the client – getting your apps to market faster while reducing operational costs.

Stitch represents the next stage in the industry's migration to a more streamlined, managed infrastructure. Virtual Machines running in public clouds (notably AWS EC2) led the way, followed by hosted containers, and serverless offerings such as AWS Lambda and Google Cloud Functions. These still required backend developers to implement and manage access controls and REST APIs to provide access to microservices, public cloud services, and of course data. Frontend developers were held back by needing to work with APIs that weren't suited to rich data queries.

The Stitch serverless platform addresses these challenges by providing four services:

• Stitch QueryAnywhere. Brings MongoDB's rich query language safely to the edge. An intuitive SDK provides full access to your MongoDB database from mobile and IoT devices. Authentication and declarative or programmable access rules empower you to control precisely what data your users and devices can access.

• Stitch Functions. Stitch's HTTP service and webhooks let you create secure APIs or integrate with microservices and server-side logic. The same SDK that accesses your database also connects you with popular cloud services, enriching your apps with a single method call. Your custom, hosted JavaScript functions bring everything together.

• Stitch Triggers. Real-time notifications let your application functions react to database changes as they happen, without the need for wasteful, laggy polling.

• Stitch Mobile Sync (coming soon). Automatically synchronizes data between documents held locally in MongoDB Mobile and your backend database, helping resolve any conflicts – even after the mobile device has been offline.

Whether building a mobile, IoT, or web app from scratch, adding a new feature to an existing app, safely exposing your data to new users, or adding service integrations, Stitch can take the place of your application server and save you writing thousands of lines of boilerplate code.

Conclusion

MongoDB is a modern database used by the world's most sophisticated organizations, from cutting-edge startups to the largest companies, to create applications never before possible, at a fraction of the cost of legacy databases. MongoDB is the fastest-growing database ecosystem, with over 35 million downloads, thousands of customers, and over 1,000 technology and service partners. MongoDB users rely on the best practices discussed in this guide to maintain the highly available, secure, and scalable operations demanded by organizations today.

We Can Help

We are the MongoDB experts. Over 6,600 organizations rely on our commercial products. We offer software and services to make your life easier:

MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It's a finely-tuned package of advanced software, support, certifications, and other services designed for the way you do business.

MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

MongoDB Stitch is a serverless platform which accelerates application development with simple, secure access to data and services from the client – getting your apps to market faster while reducing operational costs and effort.

MongoDB Mobile (Beta) lets you store data where you need it, from IoT, iOS, and Android mobile devices to your backend – using a single database and query language.

MongoDB Cloud Manager is a cloud-based tool that helps you manage MongoDB on your own infrastructure. With automated provisioning, fine-grained monitoring, and continuous backups, you get a full management suite that reduces operational overhead, while maintaining full control over your databases.

MongoDB Consulting packages get you to production faster, help you tune performance in production, help you scale, and free you up to focus on your next release.

MongoDB Training helps you become a MongoDB expert, from design to operating mission-critical systems at scale. Whether you're a developer, DBA, or architect, we can make you better at MongoDB.

Resources

For more information, please visit mongodb.com or contact us at [email protected].

Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB (mongodb.com/cloud)
MongoDB Stitch backend as a service (mongodb.com/cloud/stitch)

US 866-237-8815 • INTL +1-650-440-4474 • [email protected]


© 2018 MongoDB, Inc. All rights reserved.

