MongoDB Best Practices
Introduction
MongoDB is designed to meet the demands of modern apps with a technology foundation that enables you through:

1. The document data model – presenting you the best way to work with data.

2. A distributed systems design – allowing you to intelligently put data where you want it.

3. A unified experience that gives you the freedom to run anywhere – allowing you to future-proof your work and eliminate vendor lock-in.

While some aspects of MongoDB are different from traditional relational databases, the concepts of the system, its operations, policies, and procedures will be familiar to staff who have deployed and operated other database systems. Organizations have found that DBAs and operations teams are able to preserve existing investments by integrating MongoDB into their production environments, without needing to customize established operational processes or tools.

This paper provides guidance on best practices for deploying and managing MongoDB. It assumes familiarity with the architecture of MongoDB and an understanding of concepts related to the deployment of enterprise software.

This guide is aimed at users managing the database themselves. A dedicated guide is provided for users of the MongoDB database as a service – MongoDB Atlas Best Practices. MongoDB Atlas is the best way to run MongoDB in the cloud.

While this guide is broad in scope, it is not exhaustive. You should refer to the MongoDB documentation, starting with the Production Notes, which detail system configurations that affect MongoDB. Also consider the no cost, online training classes offered by MongoDB University. In addition, MongoDB offers a range of consulting services to work with you at every stage of your application lifecycle.
Preparing for a MongoDB Deployment

MongoDB Pluggable Storage Engines

MongoDB exposes a storage engine API, enabling the integration of pluggable storage engines that extend MongoDB with new capabilities, and enable optimal use of specific hardware architectures. MongoDB ships with multiple supported storage engines. Differences in recommendations between the storage engines are noted.

WiredTiger is the default storage engine for new MongoDB deployments from MongoDB 3.2; if another engine is preferred then start the mongod using the --storageEngine option. If a 3.2+ mongod process is started and one or more databases already exist, then it will use whichever storage engine those databases were created with.
sub-documents, and binary data. It may be helpful to think of documents as roughly equivalent to rows in a relational database, and fields as roughly equivalent to columns. However, MongoDB documents tend to have all related data for a given object in a single document, whereas in a relational database that data is usually normalized across rows in many tables. For example, data that belongs to parent-child relationships in multiple RDBMS tables can frequently be collapsed (embedded) into a single document in MongoDB. For operational applications, the document model makes JOINs redundant in many cases.

Collections are groupings of documents. Typically all documents in a collection have similar or related purposes for an application. It may be helpful to think of collections as being analogous to tables in a relational database.

Dynamic Schema & Schema Validation

just a subset of fields – for example, requiring a valid customer name and address, while other fields can be freeform.

With schema validation, DBAs can apply data governance standards to their schema, while developers maintain the benefits of a flexible document model.

As an example, you can add a JSON Schema to enforce these rules:

• Each document must contain a field named lineItems
  ◦ Must contain a title (string), price (number no smaller than 0)
  ◦ May optionally contain a boolean named purchased
  ◦ Must contain no further fields
Transactions

Because documents can bring together related data that would otherwise be modelled across separate parent-child tables in a tabular schema, MongoDB's atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. MongoDB's existing document atomicity guarantees will meet 80-90% of an application's transactional needs. They remain the recommended way of enforcing your app's data integrity requirements.

MongoDB 4.0 adds support for multi-document ACID transactions, making it even easier for developers to address more use cases with MongoDB. They feel just like the transactions developers are familiar with from relational databases – multi-statement, similar syntax, and easy to add to any application. Through snapshot isolation, transactions provide a consistent view of data, enforce all-or-nothing execution, and do not impact performance for workloads that do not require them. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long running transactions, or attempting to perform an excessive number of operations in a single ACID transaction, can result in high pressure on WiredTiger's cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction always uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.

2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches.

3. In MongoDB 4.0, a transaction is represented in a single oplog entry, and therefore must be within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.

4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election, as shown in the sketch below. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.

You can review all best practices in the MongoDB documentation for multi-document transactions.
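As a sketch, retry logic of this kind can be written in the mongo shell as follows; the database and collection names are illustrative, and production code should also handle commit-time errors such as those labelled UnknownTransactionCommitResult:

function runWithRetry(session, txnFunc) {
    while (true) {
        session.startTransaction({writeConcern: {w: "majority"}});
        try {
            txnFunc(session);
            session.commitTransaction();
            break;
        } catch (error) {
            // Roll back, then retry only when the server labels the
            // error as transient (e.g., a network blip or an election)
            try { session.abortTransaction(); } catch (e) {}
            if (error.hasOwnProperty("errorLabels") &&
                error.errorLabels.includes("TransientTransactionError")) {
                continue;
            }
            throw error;
        }
    }
}

var session = db.getMongo().startSession();
runWithRetry(session, function(s) {
    s.getDatabase("test").orders.insertOne({status: "new"});
});
session.endSession();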
Visualizing your Schema and Adding Validation Rules: MongoDB Compass

The MongoDB Compass GUI allows users to understand the structure of existing data in the database and perform ad hoc queries against it – all with zero knowledge of MongoDB's query language. Typical users could include architects building a new MongoDB project or a DBA who
has inherited a database from an engineering team, and who must now maintain it in production. You need to understand what kind of data is present, define what key indexes might be appropriate, and identify if Document Validation rules should be added to enforce a consistent document structure.

groups of reviews as a separate document with a reference to the product document; while also storing the key reviews in the product document for fast access.

GridFS

For files larger than 16 MB, MongoDB provides a convention called GridFS, which is implemented by all MongoDB drivers. GridFS automatically divides large data into 256 KB pieces called chunks and maintains the metadata for all chunks. GridFS allows for retrieval of individual chunks as well as entire documents. For example, an application could quickly jump to a specific timestamp in a video. GridFS is frequently used to store large binary files such as images and videos in MongoDB.
TTL indexes can be used to automatically delete documents of a certain age rather than scheduling a process to check the age of all documents and run a series of deletes. For example, if user sessions should only exist for one hour, the TTL can be set to 3600 seconds for a date field called lastActivity that exists in documents used to track user sessions and their last interaction with the system. A background thread will automatically check all these documents and delete those that have been idle for more than 3600 seconds. Another example use case for TTL is a price quote that should automatically expire after a period of time.
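A minimal sketch of such a TTL index in the mongo shell, assuming the session documents live in a collection named sessions:

// Documents are removed once lastActivity is older than 3600 seconds
db.sessions.createIndex(
    {lastActivity: 1},
    {expireAfterSeconds: 3600}
)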
Users should always create indexes to support queries, but should not maintain indexes that queries do not use. This is particularly important for deployments that support insert-heavy (or writes which modify indexed values) workloads.

For operational simplicity, the Performance Advisor in the MongoDB Ops Manager and Cloud Manager platforms can identify missing indexes, enabling the administrator to then automate the process of rolling them out – while avoiding any application impact. Ops Manager and Cloud Manager are discussed later in this guide.
• Which alternative query plans were rejected (when using the allPlansExecution mode)

The explain plan will show 0 milliseconds if the query was resolved in less than 1 ms, which is typical in well-tuned systems. When the explain plan is called, prior cached query plans are abandoned, and the process of testing multiple indexes is repeated to ensure the best possible plan is used. The query plan can be calculated and returned without first having to run the query. This enables DBAs to review which plan will be used to execute the query, without having to wait for the query to run to completion.
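For example, plans can be reviewed from the mongo shell without executing the query; the collection name and filter below are illustrative:

// Return the winning plan only, without running the query
db.orders.find({status: "open"}).explain("queryPlanner")

// Run the query with all candidate plans and report statistics
// for each, including the rejected ones
db.orders.find({status: "open"}).explain("allPlansExecution")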
MongoDB Compass provides the ability to visualize explain plans, presenting key information on how a query performed – for example the number of documents returned, execution time, index usage, and more. Each stage of the execution pipeline is represented as a node in a tree, making it simple to view explain plans from queries distributed across multiple nodes.

If the application will always use indexes, MongoDB can be configured through the notablescan setting to throw an error if a query is issued that requires scanning the entire collection.

Profiling

MongoDB provides a profiling capability called Database Profiler, which logs fine-grained information about database operations. The profiler can be enabled to log information for all events or only those events whose duration exceeds a configurable threshold (whose default is 100 ms). Profiling data is stored in a capped collection where it can easily be searched for relevant events. It may be easier to query this collection than parsing the log files.
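A minimal sketch of working with the profiler from the mongo shell; the threshold mirrors the 100 ms default described above:

// Profile operations slower than 100 ms (level 2 profiles everything)
db.setProfilingLevel(1, 100)

// Recent entries are stored in the system.profile capped collection
db.system.profile.find().sort({ts: -1}).limit(5).pretty()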
MongoDB Ops Manager and Cloud Manager can be used to visualize output from the profiler when identifying slow queries. The Visual Query Profiler provides a quick and convenient way for operations teams and DBAs to analyze specific queries or query families. The Visual Query Profiler (as shown in Figure 3) displays how query and write latency varies over time – making it simple to identify slower queries with common access patterns and characteristics, as well as identify any latency spikes. A single click in the Ops Manager UI activates the profiler, which then consolidates and displays metrics from every node in a single screen.

The Visual Query Profiler will analyze the data – recommending additional indexes and optionally adding them through an automated, rolling index build.

Figure 3: Visual Query Profiling in MongoDB Ops Manager

Primary and Secondary Indexes

A unique index on the _id attribute is created for all documents. MongoDB will automatically create the _id field and assign a unique value if the value is not specified when the document is inserted. All user-defined indexes are secondary indexes. MongoDB includes support for many types of secondary indexes that can be declared on any field(s) in the document, including fields within arrays and sub-documents. Index options include:

• Compound indexes
• Geospatial indexes
• Text search indexes
• Unique indexes
• Array indexes
• TTL indexes
• Sparse indexes
• Partial indexes
• Hash indexes

Managing Indexes with the MongoDB WiredTiger Storage Engine

The WiredTiger storage engine offers optimizations that you can take advantage of:

• By default, WiredTiger uses prefix compression to reduce index footprint on both persistent storage and in RAM. This enables administrators to dedicate more of the working set to manage frequently accessed documents. Compression ratios of around 50% are typical, but users are encouraged to evaluate the actual ratio they can expect by testing their own workloads.

• Administrators can place indexes on their own separate storage volume, allowing for faster disk paging and lower contention.

• Compound indexes: Compound indexes are defined and ordered by field. So, if a compound index is defined for last name, first name, and city, queries that specify last name or last name and first name will be able to use this index, but queries that try to
search based on city will not be able to benefit from this index. Remove indexes that are prefixes of other indexes.

• Low selectivity indexes: An index should radically reduce the set of possible documents to select from. For example, an index on a field that indicates gender is not as beneficial as an index on zip code, or even better, phone number.

• Regular expressions: Indexes are ordered by value, hence leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the expression.

• Negation: Inequality queries can be inefficient with respect to indexes. Like most database systems, MongoDB does not index the absence of values and negation conditions may require scanning all documents. If negation is the only condition and it is not selective (for example, querying an orders table, where 99% of the orders are complete, to identify those that have not been fulfilled), all records will need to be scanned.

• Eliminate unnecessary indexes: Indexes are resource-intensive: even with compression they consume RAM, and as fields are updated their associated indexes must be maintained, incurring additional disk I/O overhead. To understand the effectiveness of existing indexes, use the strategies described earlier.

• Partial indexes: If only a subset of documents need to be included in a given index then the index can be made partial by specifying a filter expression. e.g., if an index on the userID field is only needed for querying open orders then it can be made conditional on the order status being set to in progress, as sketched below. In this way, partial indexes improve query performance while minimizing overheads.
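A sketch of the partial index from the last bullet, using the field names from that example:

// Index userID only for documents whose status is "in progress"
db.orders.createIndex(
    {userID: 1},
    {partialFilterExpression: {status: "in progress"}}
)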
Working Sets

MongoDB makes extensive use of RAM to speed up database operations. In MongoDB, all data is read and manipulated through in-memory representations of the data. The WiredTiger storage engine manages data through its internal cache, but it also benefits from pages held in the filesystem cache.

The set of data and indexes that are accessed during normal operations is called the working set. It is best practice that the working set fits in RAM. It may be the case that the working set represents a fraction of the entire database, such as in applications where data related to recent events or popular products is accessed most commonly.

Page faults occur when MongoDB attempts to access data that has not been loaded in RAM. If there is free memory then the operating system can locate the page on disk and load it into memory directly. However, if there is no free memory, the operating system must write a page that is in memory to disk, and then read the requested page into memory when it is required by the application. This process can be time consuming and will be significantly slower than accessing data that is already resident in memory.

Some operations may inadvertently purge a large percentage of the working set from memory, which adversely affects performance. For example, a query that scans all documents in the database, where the database is larger than available RAM on the server, will cause documents to be read into memory and may lead to portions of the working set being written out to disk. Other examples include various maintenance operations such as compacting or repairing a database, and rebuilding indexes.

If your database working set size exceeds the available RAM of your system, consider increasing RAM capacity or sharding the database across additional servers. For a discussion on this topic, refer to the section on Sharding Best Practices. It is far easier to implement sharding before the system's resources are consumed, so capacity planning is an important element in successful project delivery.

Refer to the documentation for configuring the WiredTiger internal cache size.
MongoDB Setup and Configuration

Setup

MongoDB provides repositories for .deb and .rpm packages for consistent setup, upgrade, system integration, and configuration. This software uses the same binaries as the tarball packages provided from the MongoDB Downloads Page. The MongoDB Windows package is available via the downloadable binary installed via its MSI. Binaries for OS X are also provided in a tarball¹.

Upgrades

Users should upgrade software as often as possible so that they can take advantage of the latest features as well as any stability updates or bug fixes. Upgrades should be tested in non-production environments to validate correct application behavior.

Customers can deploy rolling upgrades without incurring any downtime, as each member of a replica set can be upgraded individually without impacting database availability. It is possible for each member of a replica set to run under different versions of MongoDB, and with different storage engines. As a precaution, the MongoDB release notes should be consulted to determine if there is a particular order of upgrade steps that needs to be followed, and whether there are any incompatibilities between two specific versions. Upgrades can be automated with Ops Manager and Cloud Manager.

Data Migration

Users should assess how best to model their data for their applications rather than simply importing the flat file exports of their legacy systems. In a traditional relational database environment, data tends to be moved between systems using delimited flat files such as CSV. While it is possible to ingest data into MongoDB from CSV files, this may in fact only be the first step in a data migration process. It is typically the case that MongoDB's document data model provides advantages and alternatives that do not exist in a relational data model.

Hardware

The following recommendations are only intended to provide high-level guidance for hardware for a MongoDB deployment. The specific configuration of your hardware will be dependent on your data, queries, performance SLA, availability requirements, and the capabilities of the underlying hardware infrastructure. MongoDB has extensive experience helping customers to select hardware and tune their configurations, and we frequently work with customers to plan for and optimize their MongoDB systems. The Health Check, Operations Rapid Start, and Production Readiness consulting packages can be especially valuable in helping select the appropriate hardware for your project.

MongoDB was specifically designed with commodity hardware in mind and has few hardware requirements or limitations. Generally speaking, MongoDB will take advantage of more RAM and faster CPU clock speeds.

1. OS X is intended as a development rather than a production environment.
Memory

MongoDB makes extensive use of RAM to increase performance. Ideally, the working set fits in RAM. As a general rule of thumb, the more RAM, the better. As workloads begin to access data that is not in RAM, the performance of MongoDB will degrade, as it will for any database. The default WiredTiger storage engine gives more control of memory by allowing users to configure how much RAM to allocate to the WiredTiger internal cache – defaulting to 60% of RAM minus 1 GB. WiredTiger also exploits the operating system's filesystem cache, which will grow to utilize the remaining memory available.
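The configured cache size and current usage can be checked from the mongo shell; a minimal sketch (statistic names may vary slightly across versions):

// Inspect WiredTiger cache statistics via serverStatus
var cache = db.serverStatus().wiredTiger.cache
print("configured: " + cache["maximum bytes configured"])
print("in use: " + cache["bytes currently in the cache"])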
As with networking, use paravirtualized drivers for your storage when running on VMs.

Compression

MongoDB natively supports compression when using the default WiredTiger storage engine. Compression reduces storage footprint by as much as 80%, and enables higher storage I/O scalability as fewer bits are read from disk. As with any compression algorithm, administrators trade storage efficiency for CPU overhead, and so it is important to test the impacts of compression in your own environment.
storage engine, administrators will need to calculate the appropriate cache size for each instance by evaluating what portion of total RAM each of them should use, and splitting the default cache_size between each.

For availability, multiple members of the same replica set should never be co-located on the same physical hardware or share any single point of failure such as a power supply. When running in the cloud, make use of your provider's ability to deploy across availability zones to ensure that members from each replica set are geographically dispersed and do not share the same power, hypervisor or network. The MongoDB Atlas database service will take care of all of this for you.

Sizing for mongos and Config Server Processes

For sharded systems, additional processes must be deployed alongside the mongod data storing processes: mongos query routers and config servers. Shards are physical partitions of data spread across multiple servers. For more on sharding, please see the section on horizontal scaling with shards. Queries are routed to the appropriate shards using a query router process called mongos. The metadata used by mongos to determine where to route a query is maintained by the config servers. Both mongos and config server processes are lightweight, but each has somewhat different sizing requirements.

Within a shard, MongoDB further partitions documents into chunks. MongoDB maintains metadata about the relationship of chunks to shards in the config database. Three or more config servers are maintained in sharded deployments to ensure availability of the metadata at all times. Shard metadata access is infrequent: each mongos maintains a cache of this data, which is periodically updated by background processes when chunks are split or migrated to other shards, typically during balancing operations as the cluster expands and contracts. The hardware for a config server should therefore be focused on availability: redundant power supplies, redundant network interfaces, redundant RAID controllers, and redundant storage should be used. Config servers can be deployed as a replica set with up to 50 members.

Typically multiple mongos instances are used in a sharded MongoDB system. It is not uncommon for MongoDB users to deploy a mongos instance on each of their application servers. The optimal number of mongos servers will be determined by the specific workload of the application: in some cases mongos simply routes queries to the appropriate shard, and in other cases mongos must route them to multiple shards and merge the result sets. To estimate the memory requirements for each mongos, consider the following:

• The total size of the shard metadata that is cached by mongos

• 1MB for each application connection

The mongos process uses limited RAM and will benefit more from fast CPUs and networks.

Operating System and File System Configurations for Linux

Only 64-bit versions of operating systems are supported for use with MongoDB.

Version 2.6.36 of the Linux kernel or later should be used for MongoDB in production.

Use XFS file systems; avoid EXT3. EXT3 is quite old and is not optimal for most database workloads. With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues that have been observed when using EXT4 with WiredTiger.

For MongoDB on Linux use the following recommended configurations:

• Turn off atime for the storage volume with the database files.

• Do not use Huge Pages virtual memory pages; MongoDB performs better with normal virtual memory pages.

• Disable NUMA in your BIOS or invoke mongod with NUMA disabled.

• Ensure that readahead settings for the block devices that store the database files are relatively small, as most access is non-sequential. For example, setting readahead to 32 (16 KB) is a good starting point.
• Synchronize time between your hosts – for example, using NTP. This is especially important in sharded MongoDB clusters. This also applies to VM guests running MongoDB processes.

Linux provides controls to limit the number of resources and open files on a per-process and per-user basis. The default settings may be insufficient for MongoDB. Generally MongoDB should be the only process on a system, VM, or container to ensure there is no contention with other processes.

While each deployment has unique requirements, the following configurations are a good starting point for mongod and mongos instances. Use ulimit to apply these settings:

• -f (file size): unlimited
• -t (CPU time): unlimited
• -v (virtual memory): unlimited
• -n (open files): above 20,000
• -m (memory size): unlimited
• -u (processes/threads): above 20,000

For more on using ulimit to set the resource limits for MongoDB, see the MongoDB Documentation page on Linux ulimit Settings.

Networking

other topics is available in the MongoDB Security Tutorials. Review the Security section later in this guide for more information on best practices on securing your deployment.

MongoDB offers IP whitelisting, allowing administrators to configure MongoDB to only accept external connections from approved IP addresses or CIDR ranges that have been explicitly added to the whitelist.

When running on virtual machines, use paravirtualized drivers to implement optimized network and storage interfaces that pass instructions between the virtual machine and the hypervisor with minimal overhead.

Network Compression

As a distributed database, MongoDB relies on efficient network transport during query routing and inter-node replication. MongoDB compresses all network traffic between the client and the database, and traffic between nodes of the cluster. Based on the snappy compression algorithm, network traffic can be compressed by up to 70%, providing major performance benefits in bandwidth-constrained environments, and reducing networking costs.

Compressing and decompressing network traffic requires CPU resources – typically low single digit percentage overhead. Compression is ideal for those environments where performance is bottlenecked by bandwidth, and sufficient CPU capacity is available.
and other hardware components will fail. These risks can be mitigated with redundant hardware components. Similarly, a MongoDB system provides configurable redundancy throughout its software components as well as configurable data redundancy.

reads and writes. Simply placing the journal files on a separate storage device normally provides some performance enhancements by reducing disk contention.

Learn more about journaling from the documentation.
The number of replica nodes in a MongoDB replica set is configurable, and a larger number of replica nodes provides increased protection against database downtime in case of multiple machine failures. While a node is down MongoDB will continue to function. The DBA or sysadmin should work to recover or replace the failed replica in order to mitigate the temporarily reduced resilience of the system.

Replica sets also provide operational flexibility by giving sysadmins an option for performing hardware and software maintenance without taking down the entire system. Using a rolling upgrade, secondary members of the replica set can be upgraded in turn, before the administrator demotes the master to complete the upgrade. This process is fully automated when using Ops Manager or Cloud Manager – discussed later in this guide.

Consider the following factors when developing the architecture for your replica set:

• Ensure that the members of the replica set will always be able to elect a primary. A strict majority of voting cluster members must be available and in contact with each other to elect a new primary. Therefore you should run an odd number of members. There should be at least three replicas with copies of the data in a replica set.

• Best practice is to have a minimum of 3 data centers so that a majority is maintained after the loss of any single site. If only 2 sites are possible then know where the majority of members will be in the case of any network partitions and attempt to ensure that the replica set can elect a primary from the members located in that primary data center.

• Consider including a hidden member in the replica set. Hidden replica set members can never become a primary and are typically used for backups, or to run applications such as analytics and reporting that require isolation from regular operational workloads. Delayed replica set members can also be deployed that apply changes on a fixed time delay to provide recovery from unintentional operations, such as accidentally dropping a collection.

More information on replica sets can be found on the Replication MongoDB documentation page.

Multi-Data Center Replication

MongoDB replica sets allow for flexible deployment designs both within and across data centers that account for failure at the server, rack, and regional levels. In the case of a natural or human-induced disaster, the failure of a single data center can be accommodated with no downtime when MongoDB replica sets are deployed across data centers. Multi-data center replication is also fully supported as a managed service in MongoDB Atlas.

Write Guarantees

MongoDB allows administrators to specify the level of persistence guarantee when issuing writes to the database, which is called the write concern. The following options can be configured on a per connection, per database, per collection, or even per operation basis. The options are as follows:

• Write Acknowledged: This is the default write concern. The mongod will confirm the execution of the write operation, allowing the client to catch network, duplicate key, Document Validation, and other exceptions.

• Journal Acknowledged: The mongod will confirm the write operation only after it has flushed the operation to the journal on the primary. This confirms that the write operation can survive a mongod crash and ensures that the write operation is durable on disk.

• Replica Acknowledged: It is also possible to wait for acknowledgment of writes to other replica set members. MongoDB supports writing to a specific number of replicas. This also ensures that the write is written to the journal on the secondaries. Because replicas can be deployed across racks within data centers and across multiple data centers, ensuring writes propagate to additional replicas can provide extremely robust durability.

• Majority: This write concern waits for the write to be applied to a majority of replica set members. This also ensures that the write is recorded in the journal on these replicas – including on the primary.

• Data Center Awareness: Using tag sets, sophisticated policies can be created to ensure data is written to specific combinations of replicas prior to
acknowledgment of success. For example, you can create a policy that requires writes to be written to at least three data centers on two continents, or two servers across two racks in a specific data center. For more information see the MongoDB Documentation on Data Center Awareness.
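Write concerns are expressed as an option on individual operations (or at the connection or collection level). A minimal sketch from the mongo shell; the collection and documents are illustrative:

// Wait for a majority of replica set members, with a 5 second timeout
db.orders.insertOne(
    {item: "book", qty: 1},
    {writeConcern: {w: "majority", wtimeout: 5000}}
)

// Wait for the write to be journaled on the primary
db.orders.insertOne(
    {item: "pen", qty: 2},
    {writeConcern: {w: 1, j: true}}
)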
Read Preferences

Reading from the primary replica is the default configuration as it guarantees consistency. If higher read throughput is required, it is recommended to take advantage of MongoDB's auto-sharding to distribute read operations across multiple primary members. With MongoDB's read concern levels, discussed below, administrators can tune MongoDB read consistency across members of the replica set.

Distributing read operations across replica set members can improve read scalability of the MongoDB deployment. For example, analytics and Business Intelligence (BI) applications can execute queries against a secondary replica, thereby reducing overhead on the primary and enabling MongoDB to serve operational and analytical workloads from a single deployment. Another configuration option directs reads to the replica nearest to the user based on ping distance, which can significantly decrease the latency of read operations in globally distributed applications at the expense of potentially reading slightly stale data.

A very useful option is primaryPreferred, which issues reads to a secondary replica only if the primary is unavailable. This configuration allows for the continuous availability of reads during the short failover process.

For more on the subject of configurable reads, see the MongoDB Documentation page on replica set Read Preference.
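A minimal sketch of applying read preferences from the mongo shell; the collection and query are illustrative:

// Route this connection's reads to the primary, falling back to a
// secondary only if the primary is unavailable
db.getMongo().setReadPref("primaryPreferred")

// Route a single query to the lowest-latency member
db.orders.find({status: "open"}).readPref("nearest")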
Read Concerns

To ensure isolation and consistency, the readConcern can be set to majority to indicate that data should only be returned to the application if it has been replicated to a majority of the nodes in the replica set, and so cannot be rolled back in the event of a failure.

MongoDB offers a readConcern level of "Linearizable". The linearizable read concern ensures that a node is still the primary member of the replica set at the time of the read, and that the data it returns will not be rolled back if another node is subsequently elected as the new primary member. Configuring this read concern level can have a significant impact on latency, therefore a maxTimeMS value should be supplied in order to timeout long running operations.
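Both levels can be requested per operation from the mongo shell. A minimal sketch; the collection, filter, and time limit are illustrative:

// Return only majority-committed data
db.orders.find({status: "open"}).readConcern("majority")

// Linearizable read against the primary, bounded at 10 seconds
db.orders.find({_id: 42}).readConcern("linearizable").maxTimeMS(10000)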
Causal Consistency

Causal consistency guarantees that every read operation within a client session will always see the previous write operation, regardless of which replica is serving the request. By enforcing strict, causal ordering of operations within a session, causal consistency ensures every read is always logically consistent, enabling monotonic reads from a distributed system – guarantees that cannot be met by most multi-node databases. Causal consistency allows developers to maintain the benefits of strict data consistency enforced by legacy single node relational databases, while modernizing their infrastructure to take advantage of the scalability and availability benefits of modern distributed data platforms.
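A minimal sketch of a causally consistent session in the mongo shell; the database, collection, and fields are illustrative:

// Start a causally consistent session; reads within the session
// observe the session's own earlier writes
var session = db.getMongo().startSession({causalConsistency: true});
var inventory = session.getDatabase("test").inventory;

inventory.updateOne({sku: "abc123"}, {$set: {qty: 49}});
// This read reflects the update above, whichever member serves it
inventory.findOne({sku: "abc123"});
session.endSession();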
Scaling a MongoDB System

Horizontal Scaling with Automatic Sharding

To meet the needs of apps with large data sets and high throughput requirements, MongoDB provides horizontal scale-out for databases on low-cost, commodity hardware or cloud infrastructure using a technique called sharding. Sharding automatically partitions and distributes data across multiple physical instances called shards. Each shard is backed by a replica set to provide always-on availability and workload isolation. Sharding allows developers to seamlessly scale the database as their apps grow beyond the hardware limits of a single server, and it does this without adding complexity to the application. To respond to workload demand, nodes can be added or removed from the cluster in real time, and MongoDB will automatically rebalance the data accordingly, without manual intervention.

Sharding is transparent to applications; whether there is one or a thousand shards, the application code for querying MongoDB remains the same. Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don't use the shard key, the query router will broadcast the query to all shards, aggregating and sorting the results as appropriate. Multiple query routers can be used within a MongoDB cluster, with the appropriate number governed by the performance and availability requirements of the application.

MongoDB exposes multiple sharding policies. As a result, data can be distributed according to query patterns or data placement requirements, giving developers much higher scalability across a diverse set of workloads:

• Ranged Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range based queries, such as co-locating data for all customers in a specific region on a specific shard.

• Hashed Sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach guarantees a uniform distribution of writes across shards, which is often optimal for ingesting streams of time-series and event data.

• Zoned Sharding. Provides the ability for developers to define specific rules governing data placement in a sharded cluster. Zones are discussed in more detail in the following Data Locality section of the guide.

Thousands of organizations use MongoDB to build high-performance systems at scale. You can read more about them on the MongoDB scaling page.

Users should consider deploying a sharded cluster in the following situations:

• RAM Limitation: The size of the system's active working set plus indexes is expected to exceed the capacity of the maximum amount of RAM in the system.

• Disk I/O Limitation: The system will have a large amount of write activity, and the operating system will not be able to write data fast enough to meet demand, or I/O bandwidth will limit how fast the writes can be flushed to disk.

• Storage Limitation: The data set will grow to exceed the storage capacity of a single node in the system.

• Data placement requirements: The data set needs to be assigned to a specific data center to support low latency local reads and writes, or for data sovereignty to meet privacy regulations such as the GDPR. Alternatively, data placement might be required to create multi-temperature storage infrastructures that separate hot and cold data onto specific volumes. MongoDB gives you this flexibility.

Applications that meet these criteria, or that are likely to do so in the future, should be designed for sharding in advance rather than waiting until they have consumed available capacity. Applications that will eventually benefit from sharding should consider which collections they will want to shard and the corresponding shard keys when designing their data models. If a system has already reached or exceeded its capacity, it will be challenging to deploy sharding without impacting the application's performance.

Sharding Best Practices

Users who choose to shard should consider the following best practices:

Select a good shard key. When selecting fields to use as a shard key, there are at least three key criteria to consider:

1. Cardinality: Data partitioning is managed in 64 MB chunks by default. Low cardinality (e.g., a user's home country) will tend to group documents together on a small number of shards, which in turn will require frequent rebalancing of the chunks, and a single country is likely to exceed the 64 MB chunk size. Instead, a shard key should exhibit high cardinality.
2. Insert Scaling: Writes should be evenly distributed across all shards based on the shard key. If the shard key is monotonically increasing, for example, all inserts will go to the same shard even if they exhibit high cardinality, thereby creating an insert hotspot. Instead, the key should be evenly distributed.

3. Query Isolation: Queries should be targeted to a specific shard to maximize scalability. If queries cannot be isolated to a specific shard, all shards will be queried in a pattern called scatter/gather, which is less efficient than querying a single shard.

For more on selecting a shard key, see Considerations for Selecting Shard Keys.
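A minimal sketch of sharding collections with ranged and hashed keys from the mongo shell; the database, collection, and field names are illustrative:

// Enable sharding for a database
sh.enableSharding("records")

// Ranged sharding: good for range queries on the key
sh.shardCollection("records.orders", {customerId: 1})

// Hashed sharding: spreads monotonically increasing keys evenly
sh.shardCollection("records.events", {timestamp: "hashed"})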
Add capacity before it is needed. Cluster maintenance is lower risk and simpler to manage if capacity is added before the system is over utilized.

Run three or more configuration servers to provide redundancy. Production deployments must use three or more config servers. Config servers should be deployed in a topology that is robust and resilient to a variety of failures.

Use replica sets. Sharding and replica sets are absolutely compatible. Replica sets should be used in all deployments, and sharding should be used when appropriate. Sharding allows a database to make use of multiple servers for data capacity and system throughput. Replica sets maintain redundant copies of the data across servers, server racks, and even data centers.

Use multiple mongos instances.

Apply best practices for bulk inserts. Pre-split data into multiple chunks so that no balancing is required during the insert process. Alternately, disable the balancer during bulk loads. Also, use multiple mongos instances to load in parallel for greater throughput. For more information see Create Chunks in a Sharded Cluster in the MongoDB Documentation.

Dynamic Data Balancing

As data is loaded into MongoDB, the system may need to dynamically rebalance chunks across shards in the cluster using a process called the balancer. It is possible to disable the balancer or to configure when balancing is performed to further minimize the impact on performance.

Geographic Distribution

Shards can be configured such that specific ranges of shard key values are mapped to a physical shard location. Zoned sharding allows a MongoDB administrator to control the physical location of documents in a MongoDB cluster, even when the deployment spans multiple data centers in different regions.

It is possible to combine the features of replica sets, zoned sharding, read preferences, and write concerns in order to provide a deployment that is geographically distributed, enabling users to read and write to their local data centers. An administrator can restrict sharded collections to a specific set of shards, effectively federating those shards for different users. For example, one can tag all USA data and assign it to shards located in the United States, as sketched below.

To learn more, download the MongoDB Multi-Datacenter Deployments Guide.
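As a sketch of the USA example above; the shard name, namespace, and key fields are illustrative, and assume the collection is sharded on {country, userId}:

// Associate a shard with the USA zone
sh.addShardToZone("shard0000", "USA")

// Route documents whose country field is "US" to that zone
sh.updateZoneKeyRange(
    "records.users",
    {country: "US", userId: MinKey},
    {country: "US", userId: MaxKey},
    "USA"
)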
Managing MongoDB: Provisioning, Monitoring and Disaster Recovery

If you are running your apps and databases in the public cloud, MongoDB offers the fully managed, on-demand and elastic MongoDB Atlas service. Atlas enables customers to deploy, operate, and scale MongoDB databases on AWS, Azure, or GCP in just a few clicks or programmatic API calls. MongoDB Atlas is available through a pay-as-you-go model and billed on an hourly basis. A fuller description of MongoDB Atlas is included later in this guide.

If you are running MongoDB yourself, Ops Manager is the simplest way to run the database on your own infrastructure, making it easy for operations teams to deploy, monitor, backup, and scale MongoDB. Many of the capabilities of Ops Manager are also available in the MongoDB Cloud Manager service hosted in the cloud. Today, Cloud Manager supports thousands of deployments, including systems from one to hundreds of servers.
Organizations who run their deployments with MongoDB Enterprise Advanced can choose between Ops Manager and Cloud Manager.

Ops Manager and Cloud Manager incorporate best practices to help keep managed databases healthy and optimized. They ensure operational continuity by converting complex manual tasks into reliable, automated procedures with the click of a button or via an API call:

• Deploy. Any topology, at any scale.

• Upgrade. In minutes, with no downtime.

• Scale. Add capacity, without taking the application offline.

• Point-in-time, Scheduled Backups. Restore complete running clusters to any point in time with just a few clicks, because disasters aren't predictable.

• Queryable Backups. Allow partial restores of selected data, and the ability to query a backup file in-place, without having to restore it.

• Performance Alerts. Monitor 100+ system metrics and get custom alerts before the system degrades.

• Roll Out Indexes. Avoid impact to the application by introducing new indexes node by node – starting with the secondaries and then the demoted primary.

• Manage Zones. Configure sharding Zones to mandate what data is stored where.

• Data Explorer. Examine the database's schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics.

The Operations Rapid Start service gives your operations and devops teams the skills and tools to run and manage MongoDB with all the best practices accumulated over many years working with some of the world's largest companies. This engagement offers introductory administrator training and custom consulting to help you set up and use either MongoDB Ops Manager or MongoDB Cloud Manager.

Deployments and Upgrades

Ops Manager coordinates critical operational tasks across the servers in a MongoDB system. It communicates with the infrastructure through agents installed on each server. The servers can reside in the public cloud or a private data center. Ops Manager reliably orchestrates the tasks that administrators have traditionally performed manually – deploying a new cluster, upgrades, creating point in time backups, rolling out new indexes, and many other operational tasks.

Ops Manager is designed to adapt to problems as they arise by continuously assessing state and making adjustments as needed. Here's how:

• Ops Manager agents are installed on servers (where MongoDB will be deployed), either through configuration tools such as Ansible, Chef or Puppet, or by an administrator.

• The administrator creates a new design goal for the system, either as a modification to an existing deployment (e.g., upgrade, oplog resize, new shard), or as a new system.

• The agents periodically check in with the Ops Manager central server and receive the new design instructions.

• Agents create and follow a plan for implementing the design. Using a sophisticated rules engine, agents continuously adjust their individual plans as conditions change. In the face of many failure scenarios – such as server failures and network partitions – agents will revise their plans to reach a safe state.

• Minutes later, the system is deployed – safely and reliably.

Beyond deploying new databases, Ops Manager can "attach to" or import existing MongoDB deployments and take over their control.

In addition to initial deployment, Ops Manager makes it possible to dynamically resize capacity by adding shards and replica set members. Other maintenance tasks such as upgrading MongoDB or resizing the oplog can be reduced from dozens or hundreds of manual steps to the click of a button, all with zero downtime.

A common DBA task is to roll out new indexes in production systems. In order to minimize the impact to the live system, the best practice is to perform a rolling index build – starting with each of the secondaries and finally applying changes to the original primary, after swapping its
role with one of the secondaries. While this rolling process can be performed manually, Ops Manager and Cloud Manager can automate the process across MongoDB replica sets, reducing operational overhead and the risk of failovers caused by incorrectly sequencing management processes.

Administrators can use the Ops Manager interface directly, or invoke the Ops Manager RESTful API from existing enterprise tools, including popular monitoring and orchestration frameworks. Specific integration is provided with the leading Application Performance Management (APM) tools. Details are included later in this section of the guide.

Cloud Native Integration. Ops Manager can be integrated with Pivotal Cloud Foundry, Red Hat OpenShift, and Kubernetes. With Ops Manager, you can rapidly deploy MongoDB Enterprise powered applications by abstracting away the complexities of managing, scaling and securing hybrid clouds. Ops Manager coordinates orchestration between your cloud native platform, which handles the underlying infrastructure, while Ops Manager handles the MongoDB instances, automatically configured and managed with operational best practices.

With this integration, you can consistently and effortlessly run workloads wherever they need to be, standing up the same database configuration in different environments, all controlled from a single pane of glass.

performance, and system capacity utilization. These baselines should reflect the workloads you expect the system to perform in production, and they should be revisited periodically as the number of users, application features, performance SLA, or other factors change.

Baselines will help you understand when the system is operating as designed, and when issues begin to emerge that may affect the quality of the user experience or other factors critical to the system. It is important to monitor your MongoDB system for unusual behavior so that actions can be taken to address issues proactively. The following represents the most popular tools for monitoring MongoDB, and also describes different aspects of the system that should be monitored.

Monitoring with Ops Manager and Cloud Manager

Featuring charts, custom dashboards, and automated alerting, Ops Manager tracks 100+ key database and systems health metrics including operations counters, memory and CPU utilization, replication status, open connections, queues, and any node status. Ops Manager allows telemetry data to be collected every 10 seconds.
systems administrators can monitor all the MongoDB deployments in the organization.

Historic performance can be reviewed in order to create operational baselines and to support capacity planning. Integration with existing monitoring tools is also straightforward via the Ops Manager RESTful API, making the deep insights from Ops Manager part of a consolidated view across your operations.

Ops Manager allows administrators to set custom alerts when key metrics are out of range. Alerts can be configured for a range of parameters affecting individual hosts, replica sets, agents, and backup. Alerts can be sent via SMS, email, webhooks, Flowdock, HipChat, and Slack, or integrated into existing incident management systems such as PagerDuty to proactively warn of potential issues before they escalate to costly outages.

If using Cloud Manager, access to monitoring data can also be shared with MongoDB support engineers, providing fast issue resolution by eliminating the need to ship logs between different teams.

Figure 5: Ops Manager provides real time & historic visibility into the MongoDB deployment.

Free MongoDB Monitoring Cloud Service

With the 4.0 release, the MongoDB database can natively push monitoring metadata directly to the MongoDB Monitoring Cloud. Once enabled, you will be shown a unique URL that you can navigate to in a web browser, and instantly see monitoring metrics and topology information collected for your environment. You can share the URL to provide visibility to anyone on your team.

The free monitoring service is available to all MongoDB users, without needing to install an agent, navigate a paywall, or complete a registration form. You will be able to see the metrics and topology about your environment from the moment free monitoring is enabled. You can enable free monitoring easily using the db.enableFreeMonitoring() command in the MongoDB shell or from MongoDB Compass, or by starting the mongod process with the --enableFreeMonitoring command line option, and you can opt out at any time.
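A minimal sketch from the mongo shell:

// Opt in to free cloud monitoring; the status includes the unique URL
db.enableFreeMonitoring()
db.getFreeMonitoringStatus()

// Opt out again at any time
db.disableFreeMonitoring()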
With the Monitoring Cloud Service, the collected metrics enable you to quickly assess database health and optimize performance, all from the convenience of a powerful browser-based GUI. Monitoring features include:

• Environment information: Topology (standalone, replica sets including primary and secondary nodes). MongoDB version.

• Charts with 24 hours of data for the following metrics, updated every minute: Database operations per second (averaged to the minute), including commands, queries, updates, deletes, getMores, inserts and replication operations for replica set secondaries.

• Queues.

• Replication lag.

• Network I/O.

mongotop

mongotop is a utility that ships with MongoDB. It tracks and reports the current read and write activity of a MongoDB cluster. mongotop provides collection-level statistics.

mongostat

mongostat is a utility that ships with MongoDB. It shows real-time statistics about all servers in your MongoDB system. mongostat provides a comprehensive overview of
all operations, including counts of updates, inserts, page proactively warn administrators of potential issues before
faults, index misses, and many other important measures of users experience a problem.
the system health. mongostat is similar to the Linux tool
vmstat.
Application Logs And Database Logs
Application and database logs should be monitored for
Other Popular Tools
errors and other system information. It is important to
There are a number of popular open-source monitoring correlate your application and database logs in order to
tools for which MongoDB plugins are available. If determine whether activity in the application is ultimately
MongoDB is configured with the WiredTiger storage responsible for other issues in the system. For example, a
engine, ensure the tool is using a WiredTiger-compatible spike in user writes may increase the volume of writes to
driver: MongoDB, which in turn may overwhelm the underlying
storage system. Without the correlation of application and
• Nagios database logs, it might take more time than necessary to
• Ganglia establish that the application is responsible for the
increase in writes rather than some process running in
• Cacti
MongoDB.
• Scout
In the event of errors, exceptions or unexpected behavior,
• Zabbix
the logs should be saved and uploaded to MongoDB when
• Datadog opening a support case. Logs for mongod processes
running on primary and secondary replica set members, as
Linux Utilities well as mongos and config processes will enable the
support team to more quickly root cause any issues.
Other common utilities that can be used to monitor
different aspects of a MongoDB system:
• iostat: Provides usage statistics for the storage subsystem
• vmstat: Provides usage statistics for virtual memory
• netstat: Provides usage statistics for the network
• sar: Captures a variety of system statistics periodically and stores them for analysis
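As an illustrative sketch, storage and memory behavior might be sampled with these utilities (the 5-second interval is arbitrary):

    # Extended per-device disk statistics, reported in megabytes, every 5 seconds
    iostat -xm 5

    # Virtual memory, swap, and CPU summary every 5 seconds
    vmstat 5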
Windows Utilities

Performance Monitor, a Microsoft Management Console snap-in, is a useful tool for measuring a variety of stats in a Windows environment.

Application Logs And Database Logs

Application and database logs should be monitored for errors and other system information. It is important to correlate your application and database logs in order to determine whether activity in the application is ultimately responsible for other issues in the system. For example, a spike in user writes may increase the volume of writes to MongoDB, which in turn may overwhelm the underlying storage system. Without the correlation of application and database logs, it might take more time than necessary to establish that the application is responsible for the increase in writes rather than some process running in MongoDB.

In the event of errors, exceptions or unexpected behavior, the logs should be saved and uploaded to MongoDB when opening a support case. Logs for mongod processes running on primary and secondary replica set members, as well as mongos and config server processes, will enable the support team to more quickly root cause any issues.

Page Faults

When a working set ceases to fit in memory, or other operations have moved working set data out of memory, the volume of page faults may spike in your MongoDB system. Page faults are part of the normal operation of a MongoDB system, but the volume of page faults should be monitored in order to determine if the working set is growing to the level that it no longer fits in available memory, and if alternatives such as more memory or sharding across multiple servers are appropriate. In most cases, the underlying issue for problems in a MongoDB system tends to be page faults.
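On Linux, the cumulative page-fault count for a mongod process can be read from serverStatus; a minimal sketch:

    # Total page faults incurred since the mongod process started
    mongo --quiet --eval "db.serverStatus().extra_info.page_faults"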
Disk

Under heavy write load, the underlying disk subsystem may be unable to keep up with the volume of writes. Other potential issues could be the root cause, but the symptom is typically visible through iostat as high disk utilization and high queuing for writes.

CPU

A variety of issues could trigger high CPU utilization. This may be normal under most circumstances, but if high CPU utilization is observed without other issues such as disk saturation or page faults, there may be an unusual issue in the system. For example, a MapReduce job with an infinite loop, or a query that sorts and filters a large number of documents from the working set without good index coverage, might cause a spike in CPU without triggering issues in the disk system or page faults.

Connections

System Configuration

It is not uncommon to make changes to hardware and software in the course of a MongoDB deployment. For example, a disk subsystem may be replaced to provide better performance or increased capacity. When components are changed it is important to ensure their configurations are appropriate for the deployment.

MongoDB is very sensitive to the performance of the operating system and underlying hardware, and in some cases the default values for system configurations are not ideal. For example, the default readahead for the file system could be several MB whereas MongoDB is optimized for readahead values closer to 32 KB. If the new storage system is installed without changing the readahead from the default to the appropriate setting, the application's performance is likely to degrade substantially. Remember to review the Production Notes for the latest best practices.
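For example, readahead on a Linux data volume might be checked and set with blockdev (the device name is a placeholder; blockdev counts in 512-byte sectors, so 64 sectors equals 32 KB):

    # Inspect the current readahead for the MongoDB data volume
    sudo blockdev --getra /dev/xvdb

    # Set readahead to 32 KB (64 x 512-byte sectors); /dev/xvdb is a placeholder
    sudo blockdev --setra 64 /dev/xvdb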
Replication Lag

Replication lag is the amount of time it takes a write operation on the primary replica set member to replicate to a secondary member. A small amount of delay is normal, but as replication lag grows, issues may arise. Typical causes of replication lag include network latency or connectivity issues, and disk latencies such as the throughput of the secondaries being inferior to that of the primary.
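Replication lag can be checked from the mongo shell; a minimal sketch (the hostname is a placeholder):

    # Report how far each secondary is behind the primary's most recent oplog entry
    mongo --host mongodb0.example.com --eval "rs.printSlaveReplicationInfo()"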
Config Server Availability

In sharded environments it is required to run three or more config servers. Config servers are critical to the system for understanding the location of documents across shards. Should a config server become unavailable, the database will remain operational, but the balancer will be unable to move chunks and other maintenance activities will be blocked until all config servers are available. Config servers are deployed as a MongoDB replica set.
Disaster Recovery: Backup & Recovery

A backup and recovery strategy is necessary to protect your mission-critical data against catastrophic failure, such as a fire or flood in a data center, or human error such as code errors or accidentally dropping collections. With a backup and recovery strategy in place, administrators can restore business operations without data loss, and the organization can meet regulatory and compliance requirements. Taking regular backups offers other advantages, as well. The backups can be used to seed new environments for development, staging, or QA without impacting production systems.

Ops Manager and Cloud Manager backups are maintained continuously, just a few seconds behind the operational system. If the MongoDB cluster experiences a failure, the most recent backup is only moments behind, minimizing exposure to data loss. MongoDB Atlas, Ops Manager, and Cloud Manager are the only MongoDB solutions that offer point-in-time backup of replica sets and cluster-wide snapshots of sharded clusters. You can restore to precisely the moment you need, quickly and safely. Ops teams can automate their database restores reliably and safely using Ops Manager and Cloud Manager. Complete development, test, and recovery clusters can be built in a few simple clicks. Operations teams can configure backups against specific collections only, rather than the entire database, speeding up backups and reducing the requisite storage space.

Queryable Backups allow partial restores of selected data, and the ability to query a backup file in-place, without having to restore it.

Ops Manager supports cross-project restores, allowing users to perform restores into a different Ops Manager Project than the backup snapshot source. This allows DevOps teams to easily execute tasks such as creating multiple staging or test environments that match recent production data, while configured with different user access privileges or running in different regions.

Because Ops Manager only reads the oplog, the ongoing performance impact is minimal – similar to that of adding an additional replica to a replica set.

By using MongoDB Enterprise Advanced you can deploy Ops Manager to control backups in your local data center, or use the Cloud Manager service that offers a fully managed backup solution with a pay-as-you-go model. Dedicated MongoDB engineers monitor user backups on a 24x365 basis, alerting operations teams if problems arise.

Ops Manager and Cloud Manager are not the only mechanisms for backing up MongoDB. Other options include:

• File system copies
• The mongodump tool packaged with MongoDB

File System Backups

File system backups, such as those provided by Linux LVM, quickly and efficiently create a consistent snapshot of the file system that can be copied for backup and restore purposes. For a database with a single replica set, it is possible to stop operations temporarily so that a consistent snapshot can be created by issuing the db.fsyncLock() command. This will flush all pending writes to disk and lock the entire mongod instance to prevent additional writes until the lock is released with db.fsyncUnlock().
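As a sketch of that sequence, assuming an LVM volume group and logical volume named vg0 and mongodb (both placeholders):

    # Quiesce writes so the snapshot is consistent
    mongo --eval "db.fsyncLock()"

    # Create an LVM snapshot of the volume holding the dbPath
    lvcreate --size 1G --snapshot --name mdb-snapshot /dev/vg0/mongodb

    # Release the lock so writes can resume
    mongo --eval "db.fsyncUnlock()"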
For more on how to use file system snapshots to create a
backup of MongoDB, please see Backup and Restore with
Filesystem Snapshots in the MongoDB Documentation.
Integrating MongoDB with External Monitoring Solutions

The Ops Manager API provides integration with external management frameworks through programmatic access to automation features and monitoring data.

As shown in Figure 6, summary metrics can also be presented within the UI of an Application Performance Management (APM) platform such as New Relic. Administrators can also run New Relic Insights for analytics against monitoring data to generate dashboards that provide real-time tracking of Key Performance Indicators (KPIs).
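As an illustrative sketch, monitoring data might be pulled from the API with curl (the URL, project ID, and credentials are all placeholders; the public API uses HTTP digest authentication):

    # List the monitored hosts in an Ops Manager project
    curl --user "admin@example.com:{API-KEY}" --digest \
      "https://opsmanager.example.com/api/public/v1.0/groups/{PROJECT-ID}/hosts"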
Security

The intention of a Defense in Depth approach is to layer your environment to ensure there are no exploitable single points of failure that could allow an intruder or untrusted party to access the data stored in the MongoDB database. The most effective way to reduce the risk of exploitation is to run MongoDB in a trusted environment, to limit access, to follow a system of least privileges, to institute a secure development lifecycle, and to follow deployment best practices.

MongoDB Enterprise Advanced features extensive capabilities to defend, detect and control access to MongoDB, offering among the most complete security controls of any modern database:

• Access Control. Control access to sensitive data using industry standard mechanisms for authentication and authorization to the database, collection, and down to the level of individual fields within a document.
• Auditing. Ensure regulatory and internal compliance.

MongoDB also extends existing support for authenticating users via LDAP to now include LDAP authorization. This enables existing user privileges stored in the LDAP server to be mapped to MongoDB roles, without users having to be recreated in MongoDB itself.

Auditing

MongoDB Enterprise Advanced enables security administrators to construct and filter audit trails for any operation against MongoDB, whether DML, DCL or DDL. For example, it is possible to log and audit the identities of users who retrieved specific documents, and any changes made to the database during their session. The audit log can be written to multiple destinations in a variety of formats, including to the console and syslog (in JSON format), and to a file (JSON or BSON), which can then be loaded into MongoDB and analyzed to identify relevant events.
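For example, a mongod might be started with auditing enabled as follows (paths are placeholders; the filter shown limits the log to authentication events and is purely illustrative):

    # Write a JSON audit log, capturing only authentication events
    mongod --dbpath /data/db \
      --auditDestination file --auditFormat JSON \
      --auditPath /var/log/mongodb/audit.json \
      --auditFilter '{ atype: "authenticate" }'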
Encryption

Using the Encrypted storage engine, the raw database "plain text" content is encrypted using an algorithm that takes a random encryption key as input and generates "ciphertext" that can only be read if decrypted with the decryption key. The process is entirely transparent to the application. MongoDB supports a variety of encryption algorithms – the default is AES-256 (256 bit encryption) in CBC mode. AES-256 in GCM mode is also supported. Encryption can be configured to meet FIPS 140-2 requirements.

The storage engine encrypts each database with a separate key. The key-wrapping scheme in MongoDB wraps all of the individual internal database keys with one external master key for each server. The Encrypted storage engine supports two key management options – in both cases, the only key being managed outside of MongoDB is the master key:

• Local key management via a keyfile
• Integration with a third party key management appliance via the KMIP protocol (recommended)
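As a minimal sketch, the two options might look as follows (file paths and the KMIP server name are placeholders):

    # Encrypted storage engine with local key management via a keyfile
    mongod --enableEncryption --encryptionKeyFile /secure/mongodb-keyfile

    # Encrypted storage engine with keys managed by a KMIP appliance
    mongod --enableEncryption --kmipServerName kmip.example.com \
      --kmipServerCAFile ca.pem --kmipClientCertificateFile client.pem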
Read-Only, Redacted Views

DBAs can define non-materialized views that expose only a subset of data from an underlying collection, i.e. a view that filters out specific fields. DBAs can define a view of a collection that's generated from an aggregation over another collection(s) or view. Permissions granted against the view are specified separately from permissions granted to the underlying collection(s).

Views are defined using the standard MongoDB Query Language and aggregation pipeline. They allow the inclusion or exclusion of fields, masking of field values, filtering, schema transformation, grouping, sorting, limiting, and joining of data using $lookup and $graphLookup to another collection, as shown in the sketch below.

You can learn more about MongoDB read-only views from the documentation.
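A sketch of such a view (the database, collection, and field names are hypothetical):

    # Expose only non-sensitive customer fields through a read-only view
    mongo mydb --eval 'db.createView(
      "customerContacts",    // view name
      "customers",           // source collection
      [ { $project: { name: 1, city: 1, _id: 0 } } ]  // redact everything else
    )'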
MongoDB Atlas: Database as a Service For MongoDB

An increasing number of companies are moving to the public cloud to not only reduce the operational overhead of managing infrastructure, but also provide their teams with access to on-demand services that give them the agility they need to meet faster application development cycles. This move from building IT to consuming IT as a service is well aligned with parallel organizational shifts including agile and DevOps methodologies and microservices architectures. Collectively these seismic shifts in IT help companies prioritize developer agility, productivity and time to market.

MongoDB offers the fully managed, on-demand and elastic MongoDB Atlas service in the public cloud. Atlas enables customers to deploy, operate, and scale MongoDB databases on AWS, Azure, or GCP in just a few clicks or programmatic API calls. MongoDB Atlas is available through a pay-as-you-go model and billed on an hourly basis. It's easy to get started – use a simple GUI to select the public cloud provider, region, instance size, and features you need. MongoDB Atlas provides:

• Automated database and infrastructure provisioning so teams can get the database resources they need, when they need them, and can elastically scale whenever they need to.
• Security features to protect your data, with network isolation, fine-grained access control, auditing, and end-to-end encryption, enabling you to comply with industry regulations such as HIPAA.
• Built-in replication both within and across regions for always-on availability.
• Global clusters that allow you to deploy a fully managed, globally distributed database providing low latency, responsive reads and writes to users anywhere, with strong data placement controls for regulatory compliance.
• Fine-grained monitoring and customizable alerts for comprehensive performance visibility.
MongoDB Stitch: Backend as a Service

Stitch represents the next stage in the industry's migration to a more streamlined, managed infrastructure. Virtual machines running in public clouds (notably AWS EC2) led the way, followed by hosted containers, and serverless offerings such as AWS Lambda and Google Cloud Functions. These still required backend developers to implement and manage access controls and REST APIs to provide access to microservices, public cloud services, and of course data. Frontend developers were held back by needing to work with APIs that weren't suited to rich data queries.

The Stitch serverless platform addresses these challenges by providing four services: Stitch QueryAnywhere, Stitch Functions, Stitch Triggers, and Stitch Mobile Sync.

Conclusion

MongoDB is a modern database used by the world's most sophisticated organizations, from cutting-edge startups to the largest companies, to create applications never before possible at a fraction of the cost of legacy databases. MongoDB is the fastest-growing database ecosystem, with over 35 million downloads, thousands of customers, and over 1,000 technology and service partners. MongoDB users rely on the best practices discussed in this guide to maintain the highly available, secure and scalable operations demanded by organizations today.
We Can Help

We are the MongoDB experts. Over 6,600 organizations rely on our commercial products. We offer software and services to make your life easier:

MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It's a finely-tuned package of advanced software, support, certifications, and other services designed for the way you do business.

MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

Resources

For more information, please visit mongodb.com or contact us at [email protected].

Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB (mongodb.com/cloud)
MongoDB Stitch backend as a service (mongodb.com/cloud/stitch)