
Module-3

This document provides an overview of document databases, focusing on MongoDB, including its features, commands, and use cases. It highlights the flexibility of document structures compared to relational databases, emphasizing schema-less design and support for various data types. Key functionalities such as ad-hoc queries, indexing, replication, and sharding are also discussed, along with practical examples of MongoDB commands for data manipulation.

Uploaded by

sujitagrahari555
MODULE 3: DOCUMENT DATABASES

OUTLINE

• What Is a Document Database?
• MongoDB Features
• MongoDB Commands: Create, Drop
• MongoDB: Datatypes, Query Features, Using the Dot Notation, Using the Sort, Limit, and Skip Functions, Retrieving a Single Document, Returning the Number of Documents with count(), Retrieving Unique Values with distinct(), Renaming a Collection, Removing Data
• Consistency, Transactions, Availability, Scaling
• Suitable Use Cases
• When Not to Use: Complex Transactions Spanning Different Operations, Queries against Varying Aggregate Structure
DOCUMENT DATABASES
 Documents are the main concept in document
databases.
 The database stores and retrieves documents, which
can be XML, JSON, BSON, and so on.
 These documents are self-describing, hierarchical tree
data structures which can consist of maps, collections,
and scalar values.
 The documents stored are similar to each other but do
not have to be exactly the same.
 Document databases store documents in the value
part of the key-value store.
COLLECTIONS
ORACLE VS MONGODB
DIFFERENCE BETWEEN DOCUMENT
DATABASES & RELATIONAL DATABASES
DOCUMENT DATA MODEL
• A document database is a type of non-relational database designed to store and query data as JSON-like documents, which makes it easier for developers to store and query data.
• It works well for use cases such as catalogs and user profiles.
• In a document store, each document holds its data as a collection of key-value pairs.
• The flexible, semi-structured, and hierarchical nature of documents lets them evolve with an application's needs.
WHAT IS A DOCUMENT DATABASE?
{
"firstname": "Martin",
"likes": [ "Biking", "Photography" ],
"lastcity": "Boston"
}

• The above document can be considered a row in a traditional RDBMS.
DOCUMENT 2
{
"firstname": "Pramod",
"citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
"addresses": [
{ "state": "AK",
"city": "DILLINGHAM",
"type": "R"
},
{ "state": "MH",
"city": "PUNE",
"type": "R" }
]
}
 Looking at the documents, we can see that they are similar, but
have differences in attribute names. This is allowed in document
databases.
 The schema of the data can differ across documents, but these
documents can still belong to the same collection —unlike an
RDBMS where every row in a table has to follow the same schema.
 This different representation of data is not the same as in RDBMS
where every column has to be defined, and if it does not have data
it is marked as empty or set to null.
 In documents, there are no empty attributes; if a given attribute is
not found, we assume that it was not set or not relevant to the
document.
 Documents allow for new attributes to be created without the need
to define them or to change the existing documents.
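The points above can be tried out with plain JavaScript objects. This is only an in-memory sketch: the array `people` stands in for a collection, and the `lastcity` value added to the second document is a hypothetical value chosen for illustration.

```javascript
// In-memory stand-in for a collection: a plain array of documents.
// (Illustrative only; a real collection is reached through a shell or driver.)
const people = [
  { firstname: "Martin", likes: ["Biking", "Photography"], lastcity: "Boston" },
  {
    firstname: "Pramod",
    citiesvisited: ["Chicago", "London", "Pune", "Bangalore"],
    addresses: [
      { state: "AK", city: "DILLINGHAM", type: "R" },
      { state: "MH", city: "PUNE", type: "R" },
    ],
  },
];

// An attribute that was never set is simply absent; it is not stored as null.
const hasLastCity = people
  .filter((doc) => "lastcity" in doc)
  .map((doc) => doc.firstname);

// A new attribute can be added to one document without touching the others.
people[1].lastcity = "Chicago"; // hypothetical value, for illustration only

console.log(hasLastCity);
console.log(Object.keys(people[0]));
```

Both documents live in the same array even though their attribute names differ, which is the behaviour the slides describe for a collection.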
SOME OF THE POPULAR DOCUMENT
DATABASES

 MongoDB
 CouchDB

 Terrastore

 OrientDB

 RavenDB

 Lotus Notes
KEY FEATURES OF DOCUMENT
DATABASES
 Schema Flexibility – No predefined structure; each
document can have different fields.
 Hierarchical Data Storage – Supports nested objects and
arrays within documents.
 Rich Querying – Allows filtering, indexing, full-text search,
and aggregation.
 Horizontal Scalability – Uses sharding to distribute data
across multiple servers.
 High Performance – Optimized for fast reads and writes,
making them ideal for real-time applications.
DIFFERENCE BETWEEN KEY-VALUE AND
DOCUMENT DATABASE
Feature | Document Database | Key-Value Store
Data Format | Stores data as structured JSON, BSON, or XML documents | Stores data as simple key-value pairs
Schema | Each document can have different fields | Values can be anything: text, JSON, binary, etc.
Query Language | NoSQL query languages (e.g., MongoDB Query Language) | Only supports simple GET/PUT operations
Filtering & Searching | Supports filtering, sorting, and aggregation | No advanced searching, only retrieves by key
Scalability | Horizontally scalable with sharding | Horizontally scalable, often faster than document DBs
Performance | Fast for reads/writes but slower than key-value stores | High-speed performance for key-based retrieval
Use Case Example | E-commerce catalog (structured product details), social media posts, IoT data | Caching user sessions, session storage, leaderboards
USE CASES OF DOCUMENT DATABASES
 E-Commerce – Storing dynamic product catalogs with
varying attributes.
 Content Management Systems – Managing blog posts,
comments, and metadata.
 Real-Time Chat Applications – Storing chat history and
user messages.
 IoT Data Storage – Managing large-scale sensor data from
connected devices.
 Personalization & Recommendations – Storing user
preferences and behavior data.
MONGO DB FEATURES
Ad-hoc Queries
• Generally, when we design the schema of a database, we do not know in advance which queries we will need to run.
• Ad-hoc queries are queries that were not known when the database was structured.
• MongoDB supports ad-hoc queries: a query can filter on any field, with its values supplied at run time.
• Ad-hoc queries are evaluated against the current data, and indexes can be added later to improve their performance.
AD-HOC QUERIES
Ad-hoc query:
SELECT LastName, FirstName
FROM Person.Person;

Stored procedure (dynamic query):
DECLARE @SQL NVARCHAR(MAX);
DECLARE @ID INT;
DECLARE @Param NVARCHAR(MAX);
SET @SQL = N'SELECT LastName, FirstName
FROM Person.Person
WHERE BusinessEntityID = @ID';
SET @ID = 1;
SET @Param = N'@ID INT';
EXEC sp_executesql @SQL, @Param, @ID = @ID;
ADHOC QUERIES IN MONGODB
const results = await db.collection.find({ name: req.query.name });

• In the above example, req.query.name is known only at the time of execution, which makes our query an ad-hoc query.
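To make the idea concrete without a running server, here is a minimal in-memory sketch. The helper names `matches` and `buildFilter` and the sample `users` data are hypothetical, and `matches` covers only simple equality, a small subset of what a real MongoDB filter can express.

```javascript
// Returns true when every key in `filter` equals the same key in `doc`
// (simple equality only; an illustration, not MongoDB's matcher).
function matches(doc, filter) {
  return Object.entries(filter).every(([key, value]) => doc[key] === value);
}

// Hypothetical sample data standing in for a collection.
const users = [
  { name: "Martin", city: "Boston" },
  { name: "Pramod", city: "Pune" },
  { name: "Martin", city: "Chicago" },
];

// The filter is assembled only when the request arrives, so the shape of
// the query is not known when the data is designed.
function buildFilter(requestQuery) {
  const filter = {};
  if (requestQuery.name) filter.name = requestQuery.name;
  if (requestQuery.city) filter.city = requestQuery.city;
  return filter;
}

const byName = users.filter((doc) => matches(doc, buildFilter({ name: "Martin" })));
const byNameAndCity = users.filter((doc) =>
  matches(doc, buildFilter({ name: "Martin", city: "Chicago" }))
);

console.log(byName.length, byNameAndCity.length);
```

The same `buildFilter` result could, in principle, be passed to `find()` as the query document; here it is applied to an array so the example runs on its own.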
MONGO DB FEATURES
Schema-Less Database
• In MongoDB, one collection can hold different kinds of documents.
• Because no schema is enforced, a document's fields, content, and size can differ from those of other documents in the same collection.
• This is what gives MongoDB its flexibility in handling data.
SQL(SCHEMA)
NOSQL
MONGO DB FEATURES
Document-Oriented
• In relational databases, data is arranged in tables and rows.
• Every row has a fixed number of columns, and each column stores a specific type of data.
• NoSQL document stores are more flexible: data is held in documents made up of fields, instead of in tables and rows.
• Different documents can store different types of data.
• Similar documents are grouped into collections.
• Each document has a unique key (_id), which can be either user-defined or system-generated (an ObjectId).
DOCUMENT ORIENTED
INDEXING
• Indexing is very important for improving the performance of search queries.
• In MongoDB, any field of a document can be indexed, using primary and secondary indexes.
• By making query searches faster, indexing improves MongoDB's performance.
REPLICATION
• This feature copies data to multiple machines.
• A replica set has one primary node and one or more secondary nodes.
• When the primary node goes down for any reason, one of the secondary nodes becomes the primary.
• This saves maintenance time and keeps operations running smoothly.
AGGREGATION
• Aggregation lets us batch-process data and obtain a single result, even after performing several operations on grouped data.
• MongoDB offers three ways to aggregate data: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
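The staged nature of an aggregation pipeline can be sketched in plain JavaScript. The `orders` data below is hypothetical, and the two steps only imitate the roles that a `$match` stage and a `$group` stage with `$sum` play in a real pipeline; this is not the MongoDB implementation.

```javascript
// Hypothetical order documents.
const orders = [
  { customer: "A", status: "paid", amount: 50 },
  { customer: "B", status: "paid", amount: 20 },
  { customer: "A", status: "paid", amount: 30 },
  { customer: "A", status: "cancelled", amount: 99 },
];

// Stage 1: keep only paid orders (the role of a $match stage).
const paid = orders.filter((order) => order.status === "paid");

// Stage 2: group by customer and sum the amounts (the role of a $group stage).
const totals = paid.reduce((acc, order) => {
  acc[order.customer] = (acc[order.customer] || 0) + order.amount;
  return acc;
}, {});

console.log(totals);
```

Each stage consumes the output of the previous one, so many input documents are reduced to a single summary result per group.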
GRIDFS
• GridFS is a feature for storing and retrieving files.
• It is especially useful for files larger than 16 MB, the maximum size of a single BSON document.
• GridFS divides a file into parts called chunks and stores each chunk as a separate document.
• Each chunk has a default size of 255 kB, except the last chunk, which is only as large as necessary.
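The chunk arithmetic can be worked through with a short sketch. The function name `chunkLayout` is hypothetical, and 255 kB is taken as 255 × 1024 bytes; the code only computes sizes and does not talk to GridFS.

```javascript
const CHUNK_SIZE = 255 * 1024; // default chunk size in bytes (255 kB)

// Returns how many chunks a file of `fileSize` bytes needs
// and how large the final chunk is.
function chunkLayout(fileSize, chunkSize = CHUNK_SIZE) {
  if (fileSize <= 0) return { chunks: 0, lastChunkSize: 0 };
  const chunks = Math.ceil(fileSize / chunkSize);
  const lastChunkSize = fileSize - (chunks - 1) * chunkSize;
  return { chunks, lastChunkSize };
}

const layout = chunkLayout(16 * 1024 * 1024); // a file of 16 × 1024 × 1024 bytes
console.log(layout);
```

For this file size the result is 65 chunks: 64 full chunks of 261,120 bytes and a final chunk of 65,536 bytes.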
SHARDING
• Sharding comes into play when we need to deal with larger datasets.
• This feature distributes data across multiple MongoDB instances.
• The documents of a large collection are partitioned, and the partitions are spread across these instances.
• Each instance that holds a subset of the data is called a “shard”.
• Together, the shards form a sharded cluster.
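How a document finds its shard can be illustrated with a range-based routing sketch. Everything here is an assumption made for illustration: the shard names, the choice of the first letter of `title` as the shard key, and the ranges. A real sharded cluster maintains and rebalances its own ranges.

```javascript
// Hypothetical shard map: each shard owns a range of shard-key values
// (here, the first letter of the title).
const shards = [
  { name: "shardA", from: "A", to: "H" },
  { name: "shardB", from: "I", to: "P" },
  { name: "shardC", from: "Q", to: "Z" },
];

// Picks the shard whose range contains the document's shard-key value.
function routeToShard(doc) {
  const key = doc.title[0].toUpperCase();
  const shard = shards.find((s) => key >= s.from && key <= s.to);
  if (!shard) throw new Error(`no shard owns key ${key}`);
  return shard.name;
}

// Hypothetical documents.
const placement = ["Definitive Guide", "MongoDB Basics", "Zookeeper"].map((title) => ({
  title,
  shard: routeToShard({ title }),
}));

console.log(placement);
```

Because every document maps to exactly one shard, reads and writes for different key ranges can be served by different machines.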
HIGH PERFORMANCE
• MongoDB is an open-source database with high performance.
• It offers high availability and scalability.
• It answers queries faster thanks to indexing and replication.
• This makes it a good choice for big data and real-time applications.
WORKING OF MONGODB
 Each MongoDB instance has multiple databases, and
each database can have multiple collections.
 When we store a document, we have to choose which
database and collection this document belongs in —for
example,
database.collection.insert(document), which is
usually represented as
db.collection.insert(document)
MongoDB
• MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability.
• MongoDB works on the concepts of collections and documents.

Database
 Database is a physical container for collections.
 A single MongoDB server typically has multiple databases.
Collection
 Collection is a group of MongoDB documents.
 It is the equivalent of an RDBMS table.
 A collection exists within a single database. Collections do not
enforce a schema.
 Documents within a collection can have different fields.
 Typically, all documents in a collection are of similar or related
purpose.

Document
 A document is a set of key-value pairs.
 Documents have dynamic schema.
 Dynamic schema means that documents in the same collection
do not need to have the same set of fields or structure, and
common fields in a collection's documents may hold different types
of data.
MONGODB - DATATYPES
• String − The most commonly used datatype for storing data. Strings in MongoDB must be valid UTF-8.
• Integer − Used to store a numerical value. Integers can be 32-bit or 64-bit, depending on your server.
• Boolean − Used to store a boolean (true/false) value.
• Double − Used to store floating-point values.
• Arrays − Used to store an array, a list, or multiple values under one key.
• Timestamp − Used to store a timestamp. This can be handy for recording when a document has been modified or added.
• Object − Used for embedded documents.
• Null − Used to store a null value.
• Symbol − Used identically to a string; however, it is generally reserved for languages that use a specific symbol type.
• Date − Used to store a date or time in UNIX time format. You can specify your own date and time by creating a Date object and passing the day, month, and year into it.
• Object ID − Used to store the document's ID.
• Binary data − Used to store binary data.
• Code − Used to store JavaScript code in the document.
• Regular expression − Used to store a regular expression.
MONGODB COMMANDS
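• Command to start MongoDB
sudo service mongodb start
• Command to stop MongoDB
sudo service mongodb stop
• Command to restart MongoDB
sudo service mongodb restart
• Command to open the MongoDB shell
mongo
• The service name depends on how MongoDB was installed (it may be mongod rather than mongodb), and recent MongoDB versions start the shell with mongosh instead of mongo.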
MongoDB - Create Database
1. use Command
 The command will create a new database if it doesn't exist,
otherwise it will return the existing database.
use DATABASE_NAME
2. db Command
• To check your currently selected database, use the command db.
>db
3. show dbs Command
• To list your databases, use the command show dbs. A newly created database does not appear in the list until it contains at least one document.
>show dbs
4. insert Command
>db.movie.insert({"name":"tutorials point"})
• In MongoDB, the default database is test.
• If you did not create any database, collections are stored in the test database.
5. dropDatabase() Method
• The db.dropDatabase() command is used to drop an existing database.
>db.dropDatabase()
• If you have not selected any database, it deletes the default 'test' database.
QUERY FEATURES
How to query for data in your collection

 The find() function provides the easiest way to retrieve data from
multiple documents within one of your collections.
> db.media.find( { "Author" : "Membrey, Peter" } )

 You can use the limit() function to specify the maximum number
of results returned.
 The following example returns only the first ten items in your
media collection:
> db.media.find().limit(10)
 The following example skips the first twenty documents in your
media collection:
> db.media.find().skip(20)
USING THE SORT, LIMIT, AND SKIP
FUNCTIONS
 You can use the sort function to sort the results returned
from a query.
 You can sort the results in ascending or descending order
using 1 or -1, respectively.
 The function itself is analogous to the ORDER BY
statement in SQL, and it uses the key’s name and sorting
method as criteria, as in this example:
> db.media.find().sort( { Title: 1 })
 This will sort the results based on the Title key’s value in
ascending order.
 This is the default sorting order when no parameters are
specified.
 You would add the -1 flag to sort in descending order.
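The way sort, skip, and limit combine can be sketched on an in-memory array. The `media` titles and the helper `page` are hypothetical; the helper applies the three steps in the order sort, then skip, then limit, which is how a chained find().sort().skip().limit() behaves.

```javascript
// Hypothetical media documents.
const media = [
  { Title: "Definitive Guide to MongoDB" },
  { Title: "Nevermind" },
  { Title: "Matrix, The" },
  { Title: "Blade Runner" },
  { Title: "Toy Story" },
];

// Imitates find().sort({ Title: direction }).skip(skip).limit(limit):
// sort first, then skip, then limit. direction is 1 (ascending) or -1 (descending).
function page(docs, { direction = 1, skip = 0, limit = docs.length } = {}) {
  const compare = (a, b) => (a.Title < b.Title ? -1 : a.Title > b.Title ? 1 : 0);
  return [...docs]
    .sort((a, b) => direction * compare(a, b))
    .slice(skip, skip + limit);
}

const secondPage = page(media, { direction: 1, skip: 2, limit: 2 }).map((d) => d.Title);
const lastTitle = page(media, { direction: -1, limit: 1 }).map((d) => d.Title);

console.log(secondPage, lastTitle);
```

With a page size of 2, skipping 2 documents yields the second page of the ascending list; flipping the direction to -1 and limiting to 1 yields the title that sorts last.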
RETRIEVING A SINGLE DOCUMENT
 If you want to receive only one result, however, querying for all
documents —which is what you generally do when executing a
find() function —would be a waste of CPU time and memory.
 For this case, you can use the findOne() function to retrieve a
single item from your collection.
 Overall, the result is identical to what occurs when you append
the limit(1) function.
 The syntax of the findOne() function is identical to the syntax of
the find() function:

> db.media.findOne()

 It’s generally advised that you use the findOne() function if you
expect only one result.
• The count() function returns the number of documents in the specified collection.
>db.media.count()
2

Renaming a Collection
• You might have a collection that you named incorrectly but have already inserted some data into.
• Removing the data and re-adding it from scratch would be troublesome.
• Instead, you can use the renameCollection() function to rename your existing collection:
> db.media.renameCollection("newname")
{ "ok" : 1 }
• The insert() function inserts a document, update() modifies a document, and remove() removes a document.
• To remove a single document from your collection, you need to specify the criteria you'll use to find the document:
> db.newname.remove( { "Title" : "Different Title" } )
• You can use the following snippet to remove all documents from the newname collection:
> db.newname.remove({ })
TRANSACTIONS
• Transactions, in the traditional RDBMS sense, mean that you can start modifying the database with insert, update, or delete commands over different tables and then decide whether to keep the changes by using commit or rollback.
• These constructs are generally not available in NoSQL solutions: a write either succeeds or fails.
• Transactions at the single-document level are known as atomic transactions.
• Transactions involving more than one operation have traditionally not been possible, although some products, such as RavenDB, do support transactions across multiple operations. (MongoDB itself added multi-document transactions in version 4.0.)
• Finer control over a write can be achieved by using the WriteConcern parameter.
AVAILABILITY
 The CAP theorem dictates that we can have only two of
Consistency, Availability, and Partition Tolerance.
 Document databases try to improve on availability by replicating
data using the master-slave setup.
 The same data is available on multiple nodes and the clients can
get to the data even when the primary node is down.
 MongoDB implements replication, providing high availability
using replica sets.
 In a replica set, there are two or more nodes participating in an
asynchronous master-slave replication.
• The replica-set nodes elect the master, or primary, among themselves.
• By default all the nodes have equal voting rights, but users can influence the election by assigning a priority (a number between 0 and 1000) to a node.
• All requests go to the master node, and the data is replicated to the slave nodes.
• If the master node goes down, the remaining nodes in the replica set vote among themselves to elect a new master; all future requests are routed to the new master, and the slave nodes start getting data from the new master.
• When the node that failed comes back online, it joins in as a slave and catches up with the rest of the nodes by pulling all the data it needs to get current.
• We have two nodes, mongo A and mongo B, running the MongoDB
database in the primary data-center, and mongo C in the secondary
datacenter.
• If we want nodes in the primary datacenter to be elected as primary
nodes, we can assign them a higher priority than the other nodes.
 When the primary node goes down, the driver talks to the new
primary elected by the replica set.
 Replica sets are generally used for
– Data redundancy
– Automated failover
– Read scaling
– Server maintenance without downtime
– Disaster recovery.

• Similar availability setups can be achieved with CouchDB, RavenDB, Terrastore, and other products.
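The priority-based failover described in this section can be sketched as a toy election. This is a deliberate simplification and not MongoDB's actual election protocol: it only requires that a majority of nodes be reachable and then picks the reachable node with the highest priority. The priority values and the `up` flag are assumptions made for illustration.

```javascript
// Hypothetical replica set: mongo A and mongo B in the primary datacenter
// get a higher priority than mongo C in the secondary datacenter.
const nodes = [
  { name: "mongo A", priority: 2, up: true },
  { name: "mongo B", priority: 2, up: true },
  { name: "mongo C", priority: 1, up: true },
];

// Simplified election: needs a majority of nodes to be reachable, then
// picks the reachable node with the highest priority (ties go to the first listed).
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (reachable.length <= members.length / 2) return null; // no majority, no primary
  return reachable.reduce((best, m) => (m.priority > best.priority ? m : best)).name;
}

const initialPrimary = electPrimary(nodes);
nodes[0].up = false; // mongo A fails
const primaryAfterFailover = electPrimary(nodes);

console.log(initialPrimary, primaryAfterFailover);
```

In this sketch mongo A is the initial primary, and mongo B takes over once mongo A is marked as down, because it has the highest priority among the remaining nodes.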
SCALING
• The idea of scaling is to add nodes or change data storage without simply migrating the database to a bigger box.
• The point is not to make application changes to handle more load, but to use the features of the database that let it handle more load.
 Scaling for heavy-read loads can be achieved by adding more
read slaves, so that all the reads can be directed to the slaves.
 Given a heavy-read application, with our 3-node replica-set
cluster, we can add more read capacity to the cluster as the read
load increases just by adding more slave nodes to the replica set
to execute reads with the slaveOk flag.
 Once the new node, mongo D, is started, it needs to be added to
the replica set.
 When a new node is added, it will sync up with the existing
nodes, join the replica set as secondary node, and start serving
read requests.
 An advantage of this setup is that we do not have to restart any
other nodes, and there is no downtime for the application either.
 When we want to scale for write, we can start sharding the data.
Sharding is similar to partitions in RDBMS.
 We can add more nodes to the cluster and increase the number of
writable nodes, enabling horizontal scaling for writes.
 When we add a new shard to this existing sharded cluster, the
data will now be balanced across shards.
SUITABLE USE CASES
1. Event Logging
 Applications have different event logging needs; within the
enterprise, there are many different applications that want to
log events.
 Document databases can store all these different types of events
and can act as a central data store for event storage.
 Events can be sharded by the name of the application where the
event originated or by the type of event such as order_processed
or customer_logged.
2. Content Management Systems
• Since document databases have no predefined schemas and usually understand JSON documents, they work well in content management systems or applications for publishing websites and managing user comments, user registrations, and profiles.
3. Web Analytics or Real-Time Analytics
 Document databases can store data for real-time analytics;
since parts of the document can be updated, it’s very easy to
store page views or unique visitors, and new metrics can be
easily added without schema changes.

4. E-Commerce Applications
• E-commerce applications often need a flexible schema for products and orders, as well as the ability to evolve their data models without expensive data migration.
WHEN NOT TO USE
1. Complex Transactions Spanning Different Operations
 If you need to have atomic cross-document operations, then
document databases may not be for you.
 However, there are some document databases that do support
these kinds of operations, such as RavenDB.

2. Queries against Varying Aggregate Structure
 Flexible schema means that the database does not enforce any
restrictions on the schema.
 Since the data is saved as an aggregate, if the design of the
aggregate is constantly changing, you need to save the
aggregates at the lowest level of granularity —basically, you
need to normalize the data.
 In this scenario, document databases may not work.
USE CASES: REAL-TIME APPLICATIONS OF
DOCUMENT DATABASES
 Document databases store data in JSON-like documents,
making them flexible and ideal for real-time applications.
1. Content Management Systems (CMS) and Blogging
Platforms: Storing and retrieving articles, images, user
comments, and metadata dynamically.

2. E-Commerce and Product Catalogs: Managing large, frequently changing product catalogs with different attributes.
 Example:
 Amazon & eBay: Use DynamoDB & MongoDB to store
product descriptions, pricing, reviews, and availability.
 Shopify: Uses document databases to handle dynamic
product variations (size, color, brand).
3. Real-Time Chat Applications & Messaging Systems:
Storing user messages, chat history, and metadata in a scalable
way.
 Example:
 WhatsApp: Use Firestore & DynamoDB to store
conversations in real-time.
 Facebook Messenger: Uses Cassandra & MongoDB for
message history and fast retrieval.

4. Personalization & Recommendation Systems: Providing tailored recommendations based on user behavior.
 Example:
 Netflix & YouTube: Use MongoDB & Couchbase to store
personalized watchlists and recommendations.
 Spotify & Apple Music: Track listening habits and create
dynamic playlists using document stores.
5. IoT (Internet of Things) & Sensor Data Storage: Storing time-
series data from smart devices, sensors, and logs.
 Example:
 Smart Home Systems (Google Nest, Amazon Alexa): Store
device configurations and user preferences in MongoDB.

6. Healthcare & Electronic Medical Records (EMR): Storing patient records, prescriptions, and treatment histories dynamically.
 Example:
 IBM Watson Health: Uses NoSQL for predictive analytics and
patient care recommendations.

7. Fraud Detection & Security Analytics: Storing transaction logs, user behaviors, and anomaly patterns for fraud detection.
 Ex:
 Banks & Payment Platforms (PayPal, Stripe, Visa): Use
document databases to detect fraudulent transactions in real
time.
8. Travel & Booking Systems: Managing flight, hotel, and
transportation bookings dynamically.
 Example:
 Expedia & Airbnb: Store user preferences, travel itineraries, and
booking details in MongoDB.
 Uber & Lyft: Store real-time ride data, driver availability, and
dynamic pricing in document databases.

9. Social Media & User-Generated Content: Storing user posts, comments, likes, and interactions in a dynamic format.
 Example:
 Facebook, Twitter, Instagram: Use Cassandra & MongoDB to
store billions of posts, stories, and user interactions.

10. Gaming & Leaderboards: Managing player profiles, in-game purchases, and live leaderboards.
 Example:
 Fortnite & PUBG: Store real-time player stats, game sessions, and
rewards in Firebase & MongoDB.
