NoSQL with MongoDB
Acquisitions Editor
Mark Taber
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or
transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without
written permission from the publisher. No patent liability is assumed with respect to the use of
the information contained herein. Although every precaution has been taken in the preparation of
this book, the publisher and author assume no responsibility for errors or omissions. Nor is any
liability assumed for damages resulting from the use of the information contained herein.
Managing Editor
Kristy Hart
ISBN-13: 9780672337130
Copy Editor
Krista Hansing Editorial Services, Inc.
ISBN-10: 0672337134
Library of Congress Control Number: 2014942748
Printed in the United States of America
First Printing: September 2014
Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have been
appropriately capitalized. Pearson cannot attest to the accuracy of this information. Use of a term
in this book should not be regarded as affecting the validity of any trademark or service mark.
Special Sales
For information about buying this title in bulk quantities, or for special sales opportunities (which
may include electronic versions; custom cover designs; and content particular to your business,
training goals, marketing focus, or branding interests), please contact our corporate sales department at [email protected] or (800) 382-3419.
For government sales inquiries, please contact [email protected].
For questions about sales outside of the U.S., please contact [email protected].
Project Editors
Melissa Schirmer
Elaine Wiley
Indexer
WordWise Publishing Services
Proofreader
Kathy Ruiz
Technical Editor
Russell Kloepfer
Publishing Coordinator
Vanessa Evans
Cover Designer
Mark Shirar
Compositor
Gloria Schurick
Contents at a Glance
Introduction
Table of Contents
Introduction
How This Book Is Organized
Code Examples
Special Elements
Q&A, Quiz, and Exercises
Using Operators
Implementing Looping
Creating Functions
Understanding Variable Scope
Summary
Q&A
Workshop
HOUR 6: Finding Documents in the MongoDB Collection from the
Summary
Q&A
Workshop
HOUR 9: Utilizing the Power of Grouping, Aggregation, and Map Reduce
Grouping Results of Find Operations in the MongoDB Shell
Using Aggregation to Manipulate the Data During Requests from the MongoDB Shell
Applying Map Reduce to Generate New Data Results Using the MongoDB Shell
Summary
Q&A
Workshop
Summary
Q&A
Workshop
HOUR 14: Accessing Data from MongoDB in PHP Applications
Limiting Result Sets Using PHP
Finding Distinct Field Values in PHP
Grouping Results of Find Operations in PHP Applications
Using Aggregation to Manipulate the Data During Requests from PHP Applications
Summary
Q&A
Workshop
HOUR 15: Working with MongoDB Data in PHP Applications
Adding Documents from PHP
Removing Documents from PHP
Saving Documents from PHP
Updating Documents from PHP
Upserting Documents from PHP
Summary
Q&A
Workshop
HOUR 16: Implementing MongoDB in Python Applications
Understanding MongoDB Driver Objects in Python
Finding Documents Using Python
Counting Documents in Python
Sorting Result Sets in Python
Summary
Q&A
Workshop
Summary
Q&A
Workshop
HOUR 21: Working with MongoDB Data in Node.js Applications
Adding Documents from Node.js
Removing Documents from Node.js
Saving Documents from Node.js
Updating Documents from Node.js
Upserting Documents from Node.js
Summary
Q&A
Workshop
Index
Dedication
For D!
A&F
Acknowledgments
I'd like to take this page to thank all those who made this title possible. First, I thank my
wonderful wife and boys for giving me the inspiration and support I need. I'd never make
it far without you. Thanks to Mark Taber for getting this title rolling in the right direction,
Russell Kloepfer for his technical review, and Melissa Schirmer for managing everything on
the production end.
Mail:
Sams Publishing
800 East 96th Street
Indianapolis, IN 46240 USA
Reader Services
Visit our website and register this book at informit.com/register for convenient access to
any updates, downloads, or errata that might be available for this book.
Introduction
With billions of people using the Internet today, traditional RDBMS database solutions have difficulty meeting the rapidly growing need to handle large amounts of data. The growing trend is
to introduce specialized databases that are not restricted to the conventions and the legacy overhead of traditional SQL databases. These databases are given the term NoSQL, meaning Not
Only SQL. They are designed not to replace SQL databases, but to provide a different perspective in storing data.
This book teaches you the concepts of NoSQL through the MongoDB perspective. MongoDB is a
NoSQL database that has a reputation for being easy to implement while still robust and scalable. It is currently the most popular NoSQL database in use. MongoDB has matured into a stable platform that several companies have leveraged to provide the data scalability they require.
Each hour in the book provides fundamentals for implementing and using MongoDB as backend storage for high-performing applications. As you complete the 24 one-hour lessons in this
book, you will gain practical understanding of how to build, use, and maintain a MongoDB
database.
So pull up a chair, sit back, and enjoy the process of learning NoSQL through the perspective of
MongoDB development.
the drivers for Java, PHP, Python, and Node.js. Each programming language section is isolated,
so if you have no interest in a particular language, you can skip its corresponding hour.
Part IV, "Additional MongoDB Concepts," rounds out your knowledge of MongoDB. In this part,
you learn some of the basics of administering MongoDB databases and look at more advanced
MongoDB concepts such as replication, sharding, and GridFS storage.
Code Examples
Two types of code examples appear in this book. The most common are code snippets that
appear in-line with the text to illustrate talking points. Try It Yourself sections also provide code
examples. These examples are more robust and are designed to run as standalone mini applications. To keep the code examples small and easy to follow, they are compressed; for example,
they include little or no error checking.
The Try It Yourself examples are presented in listings that include line numbers to make them
easier to follow. They also include a filename in the listing title to indicate which file the listing
came from. If the code listing in the Try It Yourself section has specific output, a follow-up listing
shows you the console output of the code so that you can follow along as you are reading the
book.
Special Elements
As you complete each lesson, margin notes help you immediately apply what you just learned to
your own web pages.
Whenever a new term is used, it is clearly highlighted, so there is no flipping back and forth to a glossary.
TIP
Tips and tricks to save you precious time are set aside in Tip boxes so that you can spot them
quickly.
NOTE
Note boxes highlight interesting information you want to be sure not to miss.
CAUTION
When you need to watch out for something, you're warned about it in Caution boxes.
HOUR 1: Introducing NoSQL and MongoDB
At the core of most large-scale applications and services is a high-performance data storage
solution. The back-end data store is responsible for storing important data such as user account
information, product data, accounting information, and blogs. Good applications require the
capability to store and retrieve data with accuracy, speed, and reliability. Therefore, the data
storage mechanism you choose must be capable of performing at a level that satisfies your
application's demand.
Several data storage solutions are available to store and retrieve the data your applications
need. The three most common are direct file system storage in files, relational databases, and
NoSQL databases. The NoSQL data store chosen for this book is MongoDB because it is the most
widely used and the most versatile.
The following sections describe NoSQL and MongoDB and discuss the design considerations to
review before deciding how to implement the structure of data and the database configuration.
The sections cover the questions to ask and then address the mechanisms built into MongoDB
that satisfy the resulting demands.
What Is NoSQL?
A common misconception is that the term NoSQL stands for No SQL. NoSQL actually stands for
Not only SQL, to emphasize the fact that NoSQL databases are an alternative to SQL and can,
in fact, apply SQL-like query concepts.
NoSQL covers any database that is not a traditional relational database management system
(RDBMS). The motivation behind NoSQL is mainly simplified design, horizontal scaling, and
finer control over the availability of data. NoSQL databases are more specialized for types of
data, which makes them more efficient and better performing than RDBMS servers in most
instances.
NoSQL seeks to break away from the traditional structure of relational databases, and enable
developers to implement models in ways that more closely fit the data flow needs of their system.
This means that NoSQL databases can be implemented in ways that traditional relational databases could never be structured.
Several different NoSQL technologies exist, including the HBase column structure, the Redis key/value
structure, and the Virtuoso graph structure. However, this book uses MongoDB and the
document model because of the great flexibility and scalability offered in implementing back-end storage for web applications and services. In addition, MongoDB is by far the most popular
and well-supported NoSQL database currently available. The following sections describe some of
the NoSQL database types.
Key-Value Databases
The simplest type of NoSQL database is the key-value store. These databases store data in a
completely schema-less way, meaning that no defined structure governs what is being stored. A
key can point to any type of data, from an object, to a string value, to a programming language
function.
The advantage of key-value stores is that they are easy to implement and add data to. That
makes them great to implement as simple storage for storing and retrieving data based on a key.
The downside is that you cannot find elements based on the stored values.
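As a minimal sketch of that trade-off, the following plain JavaScript uses the built-in Map as a stand-in for a key-value store. The store and its keys are hypothetical, and a real key-value database adds persistence and network access on top of this idea.

```javascript
// In-memory stand-in for a key-value store (illustration only).
const store = new Map();

// A key can point to any type of value.
store.set("user:1", { name: "Brad", languages: ["JavaScript"] });
store.set("greeting", "hello");
store.set("double", (n) => n * 2);

// Retrieving by key is a single direct lookup.
const user = store.get("user:1");
console.log(user.name);               // Brad
console.log(store.get("double")(21)); // 42

// Finding by stored value has no direct lookup: the only option is
// to scan every entry and inspect each value yourself.
function findKeysByValue(kvStore, predicate) {
  const keys = [];
  for (const [key, value] of kvStore) {
    if (predicate(value)) keys.push(key);
  }
  return keys;
}

console.log(findKeysByValue(store, (v) => v === "hello")); // [ 'greeting' ]
```

The scan in findKeysByValue touches every entry, which is why key-value stores suit data that is always fetched by a known key.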
What does my data look like? Your data might favor a table/row structure of RDBMS, a
document structure, or a simple key-value pair structure.
How is the current data stored? If your data is stored in an RDBMS database, you must
evaluate what it would take to migrate all or part to NoSQL. Also consider whether it
is possible to keep the legacy data as is and move forward with new data in a NoSQL
database.
How important is the speed of the database? If speed is the most critical factor for your
database, NoSQL might fit your data well and can provide a huge performance boost.
What happens when the data is not available? Consider how critical it is for customers when data is not available. Keep in mind that customers view situations in which
your database is too slow to respond as unavailability. Many NoSQL solutions, including
MongoDB, provide a good high availability plan using replication and sharding.
How is the database being used? Specifically, consider whether most operations on the
database are writes to store data or whether they are reads. You can also use this exercise
as an opportunity to define the boundaries of how to split up data, enabling you to gear
some data toward writes and other data toward reads.
Should I split up the data to leverage the advantages of both RDBMS and NoSQL?
After you have looked at the previous questions, you might want to consider putting some
of the data, such as critical transactions, in an RDBMS while putting other data, such as
blog posts, in a NoSQL database.
Understanding MongoDB
MongoDB is an agile and scalable NoSQL database. The name Mongo comes from the word
humongous. MongoDB is based on the NoSQL document store model, in which data objects are
stored as separate documents inside a collection instead of in the traditional columns and rows
of a relational database. The documents are stored as binary JSON (BSON) objects.
The motivation behind MongoDB is to implement a data store that provides high performance, high availability, and automatic scaling. MongoDB is extremely simple to install and
implement, as you will see in upcoming hours. MongoDB offers great website back-end storage
for high-traffic websites that need to store data such as user comments, blogs, or other items
because it is fast, scalable, and easy to implement.
The following are some additional reasons MongoDB has become the most popular NoSQL
database:
Document oriented: Because MongoDB is document oriented, the data is stored in the
database in a format that is very close to what you will be dealing with in both server-side
and client-side scripts. This eliminates the need to transfer data from rows to objects and
back.
High scalability: MongoDB's structure makes it easy to scale horizontally by sharding the
data across multiple servers.
No SQL injection: MongoDB is not susceptible to SQL injection (putting SQL statements
in web forms or other input from the browser that compromises the DB security) because
queries are expressed as objects, not as SQL strings. Input that ends up inside a query
object still needs to be validated.
Understanding Collections
MongoDB groups data through collections. A collection is simply a grouping of documents that
have the same or a similar purpose. A collection acts similarly to a table in a traditional SQL
database. However, it has a major difference: In MongoDB, a collection does not enforce a
strict schema. Instead, documents in a collection can have a slightly different structure from one
another, as needed. This reduces the need to break items in a document into several different
tables, as is often done in SQL implementations.
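The following sketch models a collection as a plain JavaScript array to show what "slightly different structure" can look like in practice. The documents and field names are hypothetical, and no MongoDB server is involved.

```javascript
// A collection modeled as an array of documents (illustration only).
const projects = [
  { name: "New Project", version: 1, languages: ["JavaScript", "HTML", "CSS"] },
  { name: "Docs Site", version: 2 },                           // no languages field
  { name: "Prototype", languages: ["Python"], archived: true } // extra field, no version
];

// Code that reads the collection has to allow for missing fields.
const languageCounts = projects.map((doc) => ({
  name: doc.name,
  languageCount: (doc.languages || []).length
}));

console.log(languageCounts);
```

Because nothing forces the documents to match, the application takes on the job of handling fields that might be absent.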
Understanding Documents
A document is a representation of a single entity of data in the MongoDB database. A collection
consists of one or more related objects. A major difference exists between MongoDB and SQL, in
that documents are different from rows. Row data is flat, with one column for each value in the
row. However, in MongoDB, documents can contain embedded subdocuments, providing a much
closer inherent data model to your applications.
In fact, the records in MongoDB that represent documents are stored as BSON, a lightweight
binary form of JSON. It uses field:value pairs that correspond to JavaScript property:value
pairs that define the values stored in the document. Little translation is necessary to convert
MongoDB records back into JSON strings that you might be using in your application.
For example, a document in MongoDB might be structured similar to the following, with name,
version, languages, admin, and paths fields:
{
name: "New Project",
version: 1,
languages: ["JavaScript", "HTML", "CSS"],
admin: {name: "Brad", password: "****"},
paths: {temp: "/tmp", project:"/opt/project", html: "/opt/project/html"}
}
Notice that the document structure contains fields/properties that are strings, integers, arrays,
and objects, just as in a JavaScript object. Table 1.1 lists the different data types for field values
in the BSON document.
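To illustrate how little translation is involved, the sketch below takes the sample document shown earlier, written as a JavaScript object, and round-trips it through a JSON string. It uses only the standard JSON functions, not a MongoDB driver.

```javascript
// The sample document, written as a JavaScript object literal.
const project = {
  name: "New Project",
  version: 1,
  languages: ["JavaScript", "HTML", "CSS"],
  admin: { name: "Brad", password: "****" },
  paths: { temp: "/tmp", project: "/opt/project", html: "/opt/project/html" }
};

// Serialize to a JSON string, as an application might before sending
// the data to a browser.
const json = JSON.stringify(project);

// Parse it back; nested arrays and subdocuments survive unchanged.
const restored = JSON.parse(json);
console.log(restored.languages[0]); // JavaScript
console.log(restored.paths.html);   // /opt/project/html
```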
The field names cannot contain null characters, dots (.), or dollar signs ($). In addition, the
_id field name is reserved for the Object ID. The _id field is a unique ID for the system that consists of the following parts:
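A 4-byte value representing the seconds since the Unix epoch
A 3-byte machine identifier
A 2-byte process ID
A 3-byte counter, starting with a random value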
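The following sketch splits an Object ID, given as a 24-character hexadecimal string, into its parts. It assumes the 12-byte layout of a 4-byte timestamp, 3-byte machine identifier, 2-byte process ID, and 3-byte counter. The function name parseObjectId and the sample ID are illustrative and are not part of any MongoDB driver.

```javascript
// Split a 24-character hex Object ID into its parts, assuming the layout:
// 4-byte timestamp, 3-byte machine ID, 2-byte process ID, 3-byte counter.
// (Newer MongoDB versions replace the machine and process fields with a
// 5-byte random value; the leading timestamp is unchanged.)
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    timestamp: new Date(seconds * 1000),
    machineId: hex.slice(8, 14),
    processId: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16)
  };
}

const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(parts.timestamp.toISOString()); // 2012-10-17T21:13:27.000Z
console.log(parts.machineId);               // bcf86c
```

Because the timestamp comes first, Object IDs created later sort after earlier ones, at one-second resolution.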
The maximum size of a document in MongoDB is 16MB, which prevents queries that result in an
excessive amount of RAM use or intensive hits to the file system. You might never come close to
this limit, but you still need to keep the maximum document size in mind when designing complex
data types that contain file data.
TABLE 1.1
Type                      Number
Double                    1
String                    2
Object                    3
Array                     4
Binary data               5
Object ID                 7
Boolean                   8
Date                      9
Null                      10
Regular expression        11
JavaScript                13
Symbol                    14
JavaScript (with scope)   15
32-bit integer            16
Timestamp                 17
64-bit integer            18
Min key                   255
Max key                   127
Another point to be aware of when working with the different data types in MongoDB is the
order in which they are compared when querying to find and update data. When comparing
values of different BSON types, MongoDB uses the following comparison order, from lowest to
highest:
How will groups of object types be accessed: by common ID, common property value,
or other?
When you have the answers to these questions, you are ready to consider the structure of collections and documents inside MongoDB. The following sections discuss different methods of document, collection, and database modeling you can use in MongoDB to optimize data storage and
access.
stores collection. The application can then use the reference ID favoriteStore to link data
from the Users collection to FavoriteStore documents in the FavoriteStores collection.
Figure 1.1 illustrates the structure of the Users and FavoriteStores collections just described.
FIGURE 1.1
Defining normalized MongoDB documents by adding a reference to documents in another collection.
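A minimal sketch of the normalized approach follows, with the Users and FavoriteStores collections modeled as plain arrays. Only the favoriteStore reference field comes from the text. The other fields, the values, and the getFavoriteStore helper are hypothetical.

```javascript
// Two collections modeled as arrays (illustration only).
const favoriteStores = [
  { _id: 1, name: "Corner Market", city: "Springfield" },
  { _id: 2, name: "Book Nook", city: "Shelbyville" }
];

const users = [
  { _id: 100, name: "Brad", favoriteStore: 2 },
  { _id: 101, name: "Caleb", favoriteStore: 2 }
];

// Resolving the reference costs a second lookup in the other collection.
function getFavoriteStore(user) {
  return favoriteStores.find((store) => store._id === user.favoriteStore) || null;
}

console.log(getFavoriteStore(users[0]).name); // Book Nook

// Updating the store once is visible to every user that references it.
favoriteStores[1].city = "Capital City";
console.log(getFavoriteStore(users[1]).city); // Capital City
```

The store data exists in one place, so it is never duplicated, but every read of a user's favorite store needs the extra lookup.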
The work property takes a bit more thinking. How many people are you really going to get who
have the same work contact information? If the answer is not many, the work object should be
embedded with the User object. How often are you querying the User and need the work contact information? If you will do so rarely, you might want to normalize work into its own collection. However, if you will do so frequently or always, you will likely want to embed work with
the User object.
Figure 1.2 illustrates the structure of Users with the home and work contact information embedded, as described previously.
FIGURE 1.2
Defining denormalized MongoDB documents by implementing embedded objects inside a document.
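By contrast, a denormalized User document carries its contact information inside itself. The sketch below uses the home and work properties named in the text. All values are hypothetical, and the documents are ordinary JavaScript objects rather than records in a database.

```javascript
// A denormalized User document: home and work contact details are
// embedded subdocuments.
const userDoc = {
  _id: 100,
  name: "Brad",
  home: { phone: "555-0100", email: "brad@example.com" },
  work: { phone: "555-0199", email: "brad@work.example.com" }
};

// One read returns everything; no second collection is consulted.
console.log(userDoc.work.email); // brad@work.example.com

// The trade-off: a second user with the same work contact details
// holds a separate copy of them.
const coworker = {
  _id: 101,
  name: "Caleb",
  home: { phone: "555-0111", email: "caleb@example.com" },
  work: { ...userDoc.work }
};

// Changing one copy leaves the other untouched.
userDoc.work.phone = "555-0142";
console.log(coworker.work.phone); // 555-0199
```

Embedding fits best when the data is read together often and seldom shared, which matches the questions about the work property discussed above.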
Capped collections guarantee that the insert order is preserved. Queries do not need to
use an index to return documents in the order they were stored, eliminating the indexing
overhead.
Capped collections guarantee that the insertion order is identical to the order on disk by
prohibiting updates that increase the document size. This eliminates the overhead of relocating and managing the new location of documents.
Capped collections automatically remove the oldest documents in the collection. Therefore,
you do not need to implement deletion in your application code.
You cannot update documents to a larger size after they have been inserted into the
capped collection. You can update them, but the data must be the same size or smaller.
You cannot delete documents from a capped collection. The data will take up space on
disk even if it is not being used. You can explicitly drop the capped collection, which effectively deletes all entries, but you also need to re-create it to use it again.
A great use of capped collections is as a rolling log of transactions in your system. You can
always access the last X number of log entries without needing to explicitly clean up the oldest.
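The rolling-log behavior can be imitated in a few lines of plain JavaScript. The CappedLog class below is a hypothetical in-memory stand-in that caps by document count only; it is not the MongoDB API. In the MongoDB shell, a capped collection is created with db.createCollection() and the capped, size, and max options.

```javascript
// In-memory imitation of a capped collection limited by document count.
// This only mimics the behavior described above; it does not talk to MongoDB.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // When the cap is exceeded, the oldest document is dropped automatically.
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Documents come back in insertion order without any index.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedLog(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i, msg: `transaction ${i}` });
}

console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

The application code never deletes anything; inserting the fourth and fifth entries is what pushes the first two out.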
One way to mitigate document growth is to use normalized objects for properties that can grow
frequently. For example, instead of using an array to store items in a Cart object, you could create a collection for CartItems; then you could store new items that get placed in the cart as new
documents in the CartItems collection and reference the user's cart item within them.
Indexing: Indexes improve performance for frequent queries by building a lookup index
that can be easily sorted. The _id property of a collection is automatically indexed
because looking up items by ID is common practice. However, you also need to consider
other ways users access data and implement indexes that enhance those lookup methods
as well.
Sharding: Sharding is the process of slicing up large collections of data among multiple
MongoDB servers in a cluster. Each MongoDB server is considered a shard. This provides
the benefit of utilizing multiple servers to support a high number of requests to a large system. This approach provides horizontal scaling to your database. You should look at the
size of your data and the number of requests that will be accessing it to determine whether
to shard your collections and how much to do so.
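The indexing idea above can be sketched without a database: build a lookup structure once, then answer repeated queries from it instead of scanning every document. The collection, the field names, and the buildIndex helper are hypothetical. MongoDB maintains its own index structures on the server, so the Map below is only a simplification of the concept.

```javascript
// Hypothetical collection of user documents.
const users = [
  { _id: 1, name: "Brad", city: "Provo" },
  { _id: 2, name: "Caleb", city: "Orem" },
  { _id: 3, name: "Brendan", city: "Provo" }
];

// Without an index: every query scans every document.
function findByCityScan(docs, city) {
  return docs.filter((doc) => doc.city === city);
}

// With an index: group the documents by field value once.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

const cityIndex = buildIndex(users, "city");

// Each later query is a direct lookup instead of a scan.
console.log(cityIndex.get("Provo").map((doc) => doc.name)); // [ 'Brad', 'Brendan' ]
console.log(findByCityScan(users, "Orem").map((doc) => doc.name)); // [ 'Caleb' ]
```

The index has to be updated whenever documents change, which is the cost to weigh against faster lookups.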
Summary
At the core of most large-scale web applications and services is a high-performance data storage solution. The back-end data store is responsible for storing everything from user account
information, to shopping cart items, to blog and comment data. Good web applications require
the capability to store and retrieve data with accuracy, speed, and reliability. Therefore, the data
storage mechanism you choose must perform at a level to satisfy user demand.
Several data storage solutions are available to store and retrieve the data your web applications
need. The three most common are direct file system storage in files, relational databases, and
NoSQL databases. The data store chosen for this book is MongoDB, which is a NoSQL database.
In this hour, you learned about the design considerations to review before deciding how to
implement the structure of data and configuration of a MongoDB database. You also learned
which design questions to ask and then how to explore the mechanisms built into MongoDB to
answer those questions.
Q&A
Q. What types of distributions are available for MongoDB?
A. General distributions for MongoDB support Windows, Linux, Mac OS X, and Solaris.
Enterprise subscriptions also are available for professional and commercial applications
that require enterprise-level capabilities, uptime, and support. If the MongoDB data is
critical to your application and you have a high volume of database traffic, you might want to
consider the paid subscription route. For information on the subscription, go to
https://www.mongodb.com/products/mongodb-subscriptions.
Q. Does MongoDB have a schema?
A. Sort of. MongoDB implements dynamic schemas, enabling you to create collections without
having to define the structure of the documents. This means you can store documents that
do not have identical fields.
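As a small illustration of that answer, the sketch below stores two differently shaped documents side by side. It assumes a plain Python list as a stand-in for a collection, not a real driver; the `people` name and the fields are hypothetical.

```python
# A plain list stands in for a collection (assumption: no real driver).
people = []

# Documents in the same collection need not share the same fields.
people.append({"name": "Ada", "email": "ada@example.com"})
people.append({"name": "Lin", "languages": ["Python", "Java"], "age": 31})

# Code that reads the data has to tolerate missing fields.
emails = [doc.get("email", "(none)") for doc in people]
field_sets = [sorted(doc) for doc in people]
print(emails)
print(field_sets)
```

The flexibility moves responsibility for consistency into the application, which must handle fields that some documents lack.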
Workshop
The workshop consists of a set of questions and answers designed to solidify your understanding
of the material covered in this hour. Try answering the questions before looking at the answers.
Quiz
1. What is the difference between normalized and denormalized documents?
2. True or false: JavaScript is a supported data type in a MongoDB document.
3. What is the purpose of a capped collection?
Quiz Answers
1. Denormalized documents have subdocuments within them, whereas subdocuments of normalized documents are stored in a separate collection.
2. True.
3. A capped collection enables you to limit the total size or number of documents that can be
stored in a collection, keeping only the most recent.
Exercises
1. Go to the MongoDB documentation website and browse the FAQ page. This page answers
several questions on a variety of topics that can give you a good jump-start. You can find
the FAQ page at http://docs.mongodb.org/manual/faq/.