
Bda Unit12

NoSQL databases, short for 'not only SQL', are non-relational databases that store data in flexible, non-tabular formats, supporting various data models like document, key-value, column-family, and graph. They offer features such as schema flexibility, horizontal scalability, high performance, and are optimized for large-scale data processing. Examples of NoSQL databases include MongoDB, Redis, Cassandra, and Neo4j, each catering to different use cases and data structures.

Uploaded by

Ganesh Gaitonde


What is a NoSQL database?

The term NoSQL, short for “not only SQL,” refers to non-relational databases that store data in a
non-tabular format rather than in the fixed, rule-based tables used by relational databases. NoSQL
databases use a flexible schema model that supports a variety of data models, such as document,
key-value, wide-column, and graph.

Features of NoSQL DB:

• Schema Flexibility: No predefined schema; allows dynamic and unstructured data storage.
• Scalability: Horizontal scaling (adding more servers) is easier than in traditional relational
databases.
• Data Models: Supports various models like key-value, document, column-family, and graph
databases.
• High Performance: Optimized for specific use cases like large-scale read and write
operations.
• Distributed Architecture: Data is often distributed across multiple servers, enhancing fault
tolerance and availability.
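• Eventual Consistency: Many NoSQL systems prioritize availability over strict consistency when a
network partition occurs (CAP theorem), so replicas may briefly disagree before converging.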
• Replication and Sharding: Built-in mechanisms for data replication and partitioning.
• Flexible Transactions: Some NoSQL databases offer eventual or limited transactional
capabilities instead of ACID compliance.
• Big Data Compatibility: Ideal for processing large volumes of data.
• Open Source Options: Many NoSQL databases are open source (e.g., MongoDB, Cassandra).
• High Availability: Automatic failover and backup mechanisms.
• API-Based Access: Commonly accessed through REST APIs or proprietary protocols.

Types:

1. Document-Based Database ( Ex – MongoDB, CouchDB )

A document-based database is a non-relational database. Instead of storing data in rows and
columns (tables), it stores data as documents, typically in JSON, BSON, or XML format.

Documents can be stored and retrieved in a form that is much closer to the data objects used in
applications, which means less translation is required to use the data in application code. In a
document database, individual fields can be indexed so that queries on those fields run faster.

A collection is a group of documents that have similar contents.

Key features of documents database:

• Flexible schema: Documents in the database have a flexible schema, so documents in the same
collection need not share the same structure.

• Faster creation and maintenance: Documents are easy to create, and little maintenance is
required once a document exists.

• No foreign keys: Documents are self-contained and the database does not enforce relationships
between them, so no foreign keys are required in a document database.

• Open formats: To build a document we use XML, JSON, and others.
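
The idea of indexing a document field for faster querying can be sketched in plain Python. This is only an illustration under assumed names: `TinyCollection`, `create_index`, and `find_by` are hypothetical and are not the API of any real document database.

```python
from collections import defaultdict


class TinyCollection:
    """Hypothetical in-memory collection of JSON-like documents."""

    def __init__(self):
        self.docs = {}       # _id -> document (a dict)
        self.indexes = {}    # field name -> {value -> set of _ids}
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = dict(doc, _id=doc_id)
        # Keep every existing index up to date for the new document.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        return doc_id

    def create_index(self, field):
        # Build the index from the documents already stored.
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def find_by(self, field, value):
        if field in self.indexes:
            # Indexed lookup: jump straight to the matching ids.
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index on this field: scan every document.
        return [d for d in self.docs.values() if d.get(field) == value]


books = TinyCollection()
books.insert({"title": "Dune", "author": "Herbert", "year": 1965})
books.insert({"title": "Emma", "author": "Austen"})  # no "year" field
books.insert({"title": "Persuasion", "author": "Austen", "tags": ["novel"]})
books.create_index("author")
austen_titles = [d["title"] for d in books.find_by("author", "Austen")]
```

The three documents have different fields, which also shows the flexible schema: the collection accepts all of them, and a query on the unindexed `year` field simply falls back to a scan.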

2. Key-Value Stores ( Ex – Redis, Amazon DynamoDB )

A key-value store is the simplest form of NoSQL database. Every data element is stored as a
key-value pair, and a value is retrieved by using the unique key assigned to it. Values can be
simple data types such as strings and numbers, or complex objects. Conceptually, a key-value store
is like a relational table with only two columns: the key and the value.

Key features of the key-value store:

• Simplicity: The data model is just keys and values, and retrieval is fast because it goes
directly by key.

• Scalability: Designed for horizontal scaling and distributed storage.

• Speed: Ideal for caching and real-time applications.
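
A minimal sketch of the key-value idea, assuming a single process and an in-memory dictionary. The class `KeyValueStore` and its method names are hypothetical; real stores such as Redis add networking, persistence, and expiry on top of this basic model.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values may be simple types or complex objects.
        self._data[key] = value

    def get(self, key, default=None):
        # Direct lookup by key; no query language involved.
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


cache = KeyValueStore()
cache.put("session:42", {"user": "asha", "cart": ["book", "pen"]})
cache.put("visits", 17)

session = cache.get("session:42")
missing = cache.get("session:99")
removed = cache.delete("visits")
```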

3. Column Oriented Databases ( Ex – Apache Cassandra, HBase)

A column-oriented database is a non-relational database that stores data by column instead of by
row. When we run analytics on a small number of columns, we can read just those columns without
loading unwanted data into memory. Columnar databases are therefore designed to read and retrieve
data efficiently, and they are used to store large amounts of data. Strictly speaking, Cassandra
and HBase are wide-column (column-family) stores: data is grouped into column families, and each
row may hold a different set of columns.

Key features of column-oriented databases:

• High Scalability: Supports distributed data processing.

• Compression: Columnar storage enables efficient data compression.

• Faster Query Performance: Best for analytical queries.
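
The difference between a row layout and a column layout can be sketched with ordinary Python lists. The sample data and the helper `run_length_encode` are made up for illustration; run-length encoding is used here as one simple example of a compression scheme that suits columns with repeated values.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]

# Column-oriented layout: one list per column, same order in every list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query such as SUM(amount) touches only one column.
total_amount = sum(columns["amount"])


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Values within one column are similar, so they compress well once sorted.
city_rle = run_length_encode(sorted(columns["city"]))
```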

4. Graph-Based Databases ( Ex – Amazon Neptune, Neo4j)

Graph-based databases focus on the relationships between elements. They store data as nodes, and
the connections between nodes are called links or relationships, which makes these databases ideal
for complex relationship-based queries.

• Data is represented as nodes (objects) and edges (connections).

• Fast graph traversal algorithms help retrieve relationships quickly.

• Used in scenarios where relationships are as important as the data itself.

Key features of Graph Database

• Relationship-Centric Storage: Well suited to social networks, fraud detection, and
recommendation engines.

• Real-Time Query Processing: Traversals follow stored relationships directly, so queries over
connected data can return with low latency.

• Schema Flexibility: Easily adapts to evolving relationship structures.
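
To make the node-and-edge model concrete, here is a small sketch that stores a graph as an adjacency list and walks it breadth-first. The names (`follows`, `within_hops`) and the sample data are invented for illustration; a real graph database would expose this through its own query language.

```python
from collections import deque

# Adjacency list: node -> set of nodes it has an edge to.
follows = {
    "asha": {"bala", "chen"},
    "bala": {"dev"},
    "chen": {"dev", "esha"},
    "dev": set(),
    "esha": {"asha"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning {node: hop count} up to max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]  # the start node is not part of the answer
    return distance


reachable = within_hops(follows, "asha", 2)
```

A "friends of friends" question like this one needs repeated self-joins in a relational schema, whereas here it is a short traversal over the stored edges.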

Aggregate data model

In NoSQL databases, aggregate data models are designed to store and retrieve related sets of data
that are often grouped together for efficient processing. Instead of using traditional relational
database schemas with tables and foreign keys, aggregate models in NoSQL databases focus on
encapsulating related entities into a single, self-contained unit, which is called an "aggregate."
This approach promotes data denormalization and typically results in faster read operations and
easier scaling, at the cost of more complexity in handling updates and potential data duplication.
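
As a rough illustration of an aggregate, the sketch below starts from three flat, relational-style tables and assembles one self-contained order document. All table contents and the function `build_order_aggregate` are hypothetical.

```python
import json

# Relational style: the order is spread over several tables linked by keys.
orders = [{"order_id": 1001, "customer_id": 7}]
customers = [{"customer_id": 7, "name": "Asha"}]
order_items = [
    {"order_id": 1001, "sku": "PEN-1", "qty": 3, "price": 10},
    {"order_id": 1001, "sku": "BOOK-9", "qty": 1, "price": 250},
]


def build_order_aggregate(order_id):
    """Assemble one self-contained order aggregate from the flat tables."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = next(
        c for c in customers if c["customer_id"] == order["customer_id"]
    )
    items = [
        {"sku": i["sku"], "qty": i["qty"], "price": i["price"]}
        for i in order_items
        if i["order_id"] == order_id
    ]
    return {
        "order_id": order_id,
        # Denormalized copy: the customer name is duplicated into the order.
        "customer": {"id": customer["customer_id"], "name": customer["name"]},
        "items": items,
        "total": sum(i["qty"] * i["price"] for i in items),
    }


aggregate = build_order_aggregate(1001)
print(json.dumps(aggregate, indent=2))
```

Reading the order now takes a single lookup, but if the customer's name changes, every order that embeds a copy of it has to be updated, which is the update cost mentioned above.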

Schema Less DB

In the context of NoSQL databases, a schema-less database refers to a type of database that does not
require or enforce a predefined schema for the data being stored. In other words, there is no strict
structure or format that the data must adhere to when being inserted into the database.
• Flexible Data Model:
o No fixed schema required before data insertion.
o Each record (or document) can have a different structure.
• Dynamic Fields:
o New fields can be added without altering existing data.
o Fields can vary across different records in the same collection.
• No Fixed Schema:
o Unlike traditional relational databases where the structure of data (tables, columns,
data types) must be defined beforehand, NoSQL schema-less databases allow data
to be stored without any predefined schema. The structure can be different for each
entry.
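
The points above can be sketched with a plain Python list standing in for one collection. The field names and values are made up; the point is only that records with different shapes can live side by side.

```python
users = []  # stands in for a single schema-less collection

# Two records in the same collection with different structures.
users.append({"name": "Asha", "email": "asha@example.com"})
users.append({"name": "Bala", "phones": ["555-0100"], "address": {"city": "Pune"}})

# A new field is added to one record without altering the others.
users[0]["last_login"] = "2024-01-15"

# The set of fields in use is discovered from the data, not declared up front.
all_fields = sorted({field for doc in users for field in doc})

# Because no structure is enforced, readers must tolerate missing fields.
cities = [doc.get("address", {}).get("city") for doc in users]
```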

Types of Schema less Databases

1. Document Stores – Store semi-structured data as JSON or BSON (e.g., MongoDB).

2. Key-Value Stores – Map unique keys to simple values (e.g., Redis).

3. Column-Family Stores – Store data in flexible column-based structures (e.g., Cassandra).

4. Graph Databases – Represent relationships using nodes and edges (e.g., Neo4j).

5. Time-Series Databases – Manage time-stamped data efficiently (e.g., InfluxDB).

6. Object-Oriented Databases – Store data as objects with attributes and methods (e.g., db4o).

Materialized View

A materialized view in NoSQL is a precomputed, stored query result that is updated periodically or on-
demand. It is a snapshot of the data from a query that gets stored separately from the original data to
provide faster access to specific patterns or queries. In simple terms, a materialized view is a stored
result of a computation (such as an aggregation or complex filtering) that makes querying data more
efficient.

Working

o Pre-computation: A materialized view stores the result of a query (like aggregation or filtering) so
that you do not have to compute it again each time.
o Faster Reads: The stored result helps reduce the computation time for frequent or expensive
queries.
o Data Refresh: The materialized view may be refreshed automatically or manually when the
underlying data changes. How and when it is updated depends on the NoSQL database being
used.
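
The three steps can be sketched in a few lines of Python. This assumes a manually refreshed view over an in-memory list; the class `SalesByCityView` and the sample rows are hypothetical, and real databases differ in how and when they refresh.

```python
from collections import defaultdict


class SalesByCityView:
    """Hypothetical materialized view: total sales amount per city."""

    def __init__(self, base_rows):
        self.base_rows = base_rows  # reference to the underlying data
        self.totals = {}
        self.refresh()

    def refresh(self):
        # Pre-computation: run the aggregation once and store the result.
        totals = defaultdict(int)
        for row in self.base_rows:
            totals[row["city"]] += row["amount"]
        self.totals = dict(totals)

    def read(self, city):
        # Faster reads: answer from the stored result, no re-aggregation.
        return self.totals.get(city, 0)


sales = [
    {"city": "Pune", "amount": 120},
    {"city": "Delhi", "amount": 80},
]
view = SalesByCityView(sales)
before = view.read("Pune")

sales.append({"city": "Pune", "amount": 200})
stale = view.read("Pune")   # the view has not been refreshed yet

view.refresh()              # data refresh after the base data changed
fresh = view.read("Pune")
```

Between the append and the refresh, the view returns the old total. That window of staleness is the trade-off a materialized view makes in exchange for cheaper reads.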

Characteristics

1. Optimized Read Operations: By storing the result of complex queries, it allows for faster read
operations, especially when dealing with large datasets.

2. Data Consistency: Because the materialized view is a snapshot of the data at a certain point in
time, it can become stale when the underlying data changes. Some NoSQL databases refresh views
automatically, while others require the view to be refreshed manually.

3. Space Usage: Storing materialized views requires additional storage, as they keep copies of data.

4. Use Case: Materialized views are used in NoSQL databases when you need to run complex queries
frequently, like filtering, aggregations, or joining data.

MongoDB

MongoDB is a widely used NoSQL database that stores data in a document-oriented format. It is a
non-relational database, meaning it doesn't follow the traditional table structure (like SQL databases)
and instead organizes data in documents that are stored in collections. MongoDB is designed to be
flexible, scalable, and high-performing, making it suitable for modern applications that require fast
read/write operations, handling large datasets, and evolving data structures.

Features

Document-Oriented Storage: In MongoDB, data is stored as documents in a format known as BSON
(Binary JSON), which is similar to JSON. Each document is a set of key-value pairs (fields and values),
and it can contain arrays, nested documents, or complex structures.

This document structure allows for more flexibility compared to the rigid row-column structure of SQL
databases.

Collections: MongoDB organizes documents into collections, which are roughly equivalent to tables
in a relational database. However, unlike tables in SQL databases, collections do not enforce a
schema by default, meaning each document in the collection can have different fields and data types.

Schema-less: MongoDB does not require a predefined structure for the documents in a collection.
This allows the structure of your data to change quickly without schema migrations, making MongoDB
highly flexible when working with evolving or unstructured data.

Scalability: One of MongoDB's core strengths is its ability to scale horizontally. It supports sharding,
which distributes data across multiple machines or servers to handle large-scale applications. This
makes MongoDB a good fit for applications that require high availability and fault tolerance and
that must handle large amounts of traffic and data.

Indexing: MongoDB supports indexing, which improves query performance. Indexes can be created
on any field, and it also supports advanced indexing types such as geospatial indexes and text
indexes.

Replication: MongoDB supports replication, which means it can copy data from one server to others.
This is used to ensure data availability and fault tolerance.

ACID Transactions: MongoDB supports multi-document ACID transactions (Atomicity, Consistency,
Isolation, Durability) starting from version 4.0 on replica sets, with sharded clusters supported
from version 4.2. This allows operations across multiple documents to be grouped into a single
transaction.

MongoDB Use Cases (in the context of NoSQL):

1. Real-Time Applications: MongoDB is great for applications that need real-time data access, such
as social media platforms, messaging systems, and analytics tools.

2. Content Management Systems: Its flexibility and scalability make MongoDB a popular choice for
content management systems that manage large volumes of varying data (such as blogs, articles, or
media files).

3. Mobile and Web Apps: MongoDB is widely used in web and mobile applications due to its ability to
handle large amounts of data with flexibility.

4. Big Data: MongoDB can handle big data workloads effectively and can scale across multiple
machines to accommodate large datasets.

MongoDB vs SQL (Relational Databases):

Data Structure: MongoDB stores data as documents (BSON format), while relational databases use
rows and columns in tables.

Schema: MongoDB is schema-less, whereas relational databases require a predefined schema with a
fixed structure.

Scaling: MongoDB supports horizontal scaling (sharding), while SQL databases typically scale
vertically (adding more power to a single machine).

Joins: MongoDB does not support SQL-style joins between tables, but its aggregation pipeline
provides the $lookup stage, which performs a left outer join between collections. This can be less
efficient than a join in a relational database.
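
The join behaviour of $lookup can be illustrated without a running server. The function below is a plain-Python emulation, not MongoDB code: it only mirrors the general shape of the stage (a left outer join in which the matching documents from the other collection are attached as an array). The parameter names loosely follow the stage's `localField`, `foreignField`, and `as` options, and the sample collections are invented.

```python
def lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    """Emulate a left outer join: attach matching right documents as a list."""
    # Group the right-hand documents by their join key.
    by_key = {}
    for doc in right_docs:
        by_key.setdefault(doc.get(foreign_field), []).append(doc)
    # Every left document is kept; unmatched ones get an empty list.
    return [
        {**doc, as_field: by_key.get(doc.get(local_field), [])}
        for doc in left_docs
    ]


orders = [
    {"_id": 1, "item": "pen", "customer_id": 7},
    {"_id": 2, "item": "book", "customer_id": 9},
]
customers = [{"_id": 7, "name": "Asha"}]

joined = lookup(orders, customers, "customer_id", "_id", "customer")
```

The second order has no matching customer, yet it still appears in the output with an empty `customer` list, which is what makes this a left outer join rather than an inner join.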

Some popular open-source tools for big data analysis

1. Apache Hadoop - A framework that allows for the distributed processing of large data sets across
clusters of computers. It includes Hadoop Distributed File System (HDFS) and the MapReduce
processing model.

2. Apache Spark - A fast and general-purpose cluster computing system, Spark is used for big data
processing and analytics. It supports real-time data streaming, machine learning, and graph
processing.

3. Apache Flink - A stream-processing framework that supports high-throughput, low-latency, and
fault-tolerant processing of data.

4. Apache Kafka - A distributed event streaming platform used for building real-time data pipelines
and streaming applications.

5. Apache Hive - A data warehouse system built on top of Hadoop, enabling users to query and
manage large datasets using a SQL-like language.

6. Apache HBase - A NoSQL database that runs on top of HDFS, designed to handle large amounts of
sparse data.

7. Elasticsearch - A search and analytics engine used to index and search large datasets in real time.

8. D3.js - A JavaScript library for visualizing data through interactive charts and graphs, often used for
big data analysis in web applications.

9. Jupyter Notebooks - A web-based tool for interactive computing that is commonly used for big
data analysis, particularly with Python libraries like Pandas, NumPy, and Matplotlib.

MapReduce in Hadoop

MapReduce in Hadoop is a programming model and processing technique used to process and
generate large datasets in a distributed computing environment. It splits the task into two phases:

1. Map Phase: In this phase, the input data is split into smaller chunks, which are processed by the
"map" function. The map function processes the data and outputs key-value pairs. For example, in a
word count task, the input text is split into words, and each word is assigned a key (the word itself)
with a value of 1.

2. Reduce Phase: The reduce phase takes the output from the map phase (key-value pairs) and
processes them. It groups the pairs by their key and performs an aggregation operation, such as
summing the values for each key. Continuing the word count example, the reduce function will sum
up the counts for each word and output the final result.

o Map: Processes input data, producing intermediate key-value pairs.

o Reduce: Aggregates and processes the intermediate data based on keys, producing the final
result.

MapReduce runs on the Hadoop Distributed File System (HDFS), ensuring scalability and data
redundancy for efficient processing.
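
The word count example described above can be sketched in a single Python process. This is not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for illustration, and the grouping step that Hadoop performs between the two phases is written out by hand.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return (key, sum(values))


lines = ["big data big ideas", "data beats ideas"]

# In Hadoop each line (or block) could be mapped on a different machine.
intermediate = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(
    reduce_phase(key, values) for key, values in shuffle(intermediate).items()
)
```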
How It Helps Process Large-Scale Data in NoSQL

• Parallel Processing: Enables processing massive datasets by dividing workloads across
multiple machines.

• Scalability: Can handle petabytes of data efficiently.

• Fault Tolerance: Automatically recovers from hardware failures.

• Schema Flexibility: Works well with NoSQL databases, which have a flexible schema.

• Optimized for Read-Heavy Workloads: Suitable for analytics, indexing, and aggregation
operations.

MapReduce Partition and Combining

1. Partitioning (Shuffle & Sort)

o Distributes key-value pairs to reducers based on keys.

o Uses a function like hash(key) % num_of_reducers.

2. Combining (Local Aggregation)

o Optional step to reduce data size before sending to reducers.

o Acts like a mini reducer on the Mapper side.

o Example:
Before: (Apple, 1), (Apple, 1)
After: (Apple, 2)

Flow
Map → Combiner (Optional) → Partition → Reduce
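
The flow above can be traced end to end in a small single-process sketch. The names (`mapper`, `combiner`, `partition`, `reducer`, `NUM_REDUCERS`) and the input splits are assumptions made for illustration. The partition function uses `zlib.crc32` as a stand-in for hash(key), because Python's built-in `hash()` for strings changes between runs.

```python
import zlib
from collections import Counter, defaultdict

NUM_REDUCERS = 2  # assumed number of reducers for this sketch


def mapper(split):
    """Map: emit (word, 1) for each word in one input split."""
    return [(word, 1) for word in split.split()]


def combiner(pairs):
    """Combiner: pre-aggregate one mapper's output before it is sent on."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partition: pick a reducer with a stable hash(key) % num_reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)


splits = ["apple apple banana", "banana apple cherry"]

# One bucket per reducer; each bucket groups values by key.
reducer_inputs = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for split in splits:
    for key, value in combiner(mapper(split)):
        reducer_inputs[partition(key, NUM_REDUCERS)][key].append(value)

result = {}
for bucket in reducer_inputs:
    for key, values in bucket.items():
        word, total = reducer(key, values)
        result[word] = total
```

For the first split the combiner turns (apple, 1), (apple, 1), (banana, 1) into (apple, 2), (banana, 1), so fewer pairs cross the network. Because the partition function depends only on the key, every partial count for a given word reaches the same reducer.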
