DBMS - Unit 6 (Advances in Databases)

The document provides an overview of Big Data, its handling steps, and the limitations of traditional RDBMS systems. It discusses the need for NoSQL databases, the CAP theorem, and various NoSQL data models and processing tools, highlighting MongoDB's advantages over RDBMS. Practical applications of NoSQL are also mentioned, including social networking sites and e-commerce platforms.

Uploaded by

sohel5101shaikh

Contents

What is BIG DATA
Handling Steps of Big Data
Dimensions (V’s) of Big Data
Cons of RDBMS
Need of Unstructured Data
NoSQL
CAP Theorem
NoSQL Data Models and Processing Tools
MongoDB Vs RDBMS
Practical Examples of NoSQL
What is BIG DATA
• “A massive volume of both structured and unstructured data that is so large that it's difficult to process with traditional database and software techniques.” [1]
• Criteria for considering data as big data:
  o Size
  o Type of data
  o Latency
  o Data complexity
• Examples:
  o Web sites with 300+ million unique visitors/month
  o Digital data from sensors used to gather climate information
  o Cell phone GPS signals
  o Posts to social networking sites

Handling Big Data
Storage
Processing
Analysis
Security
3 Dimensions of Big Data [2]
Cons of RDBMS
• Rigid schema design.
• Harder to scale.
• Replication is harder to manage.
• Joins across multiple nodes are hard.
• Handling data growth using an RDBMS is difficult.
• Need for a DBA.
• Object-relational mapping (ORM) does not work well.
• Only structured data in table form is handled.
• ACID transactions add overhead, hence slower processing.
Need of unstructured data
• Need for databases that can store and process big data effectively.
• Demand for high performance when reading and writing.
• High-concurrency applications.
• Easy to expand.
• Big data analysis.
• High scalability.
• Data format.
• Manageability.
NoSQL (continued..) [2]

• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema
• Scale well for both reads and writes
• BASE properties
• Auto-sharding
• Support for mass storage
• Flexible schema and data types
• Fast key-value lookups
• Easy maintenance
• Large scalability
CAP Theorem
• Also known as Brewer's theorem, after Prof. Eric Brewer of the University of California, Berkeley, who presented it in 2000. [2]
• “Of three properties of a shared data system: data consistency, system availability and tolerance to network partitions, only two can be achieved at any given moment.” [2]
• Consistency: all nodes see the same data at the same time.
  o Strict consistency: RDBMS
  o Tunable consistency: Cassandra
  o Eventual consistency: Amazon Dynamo
• Availability: every request receives a response, even when some nodes are down.
• Partition tolerance: the system keeps operating when the network splits the nodes into groups that cannot communicate.
• NoSQL databases typically provide BASE properties: weaker (eventual) consistency, best effort, simple and fast, optimistic.
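Tunable consistency can be illustrated with the quorum rule commonly used by Dynamo-style stores: with N replicas, a read that contacts R replicas must overlap the latest write acknowledged by W replicas whenever R + W > N. The sketch below only illustrates that arithmetic; the function name and the example settings are assumptions made here, not part of the slides or of any database API.

```python
def read_sees_latest_write(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True when every read quorum must overlap every write quorum."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_acks <= n_replicas):
        raise ValueError("quorum sizes must be between 1 and the replica count")
    return read_acks + write_acks > n_replicas


# Three replicas with hypothetical settings:
strict = read_sees_latest_write(3, write_acks=2, read_acks=2)    # quorum reads and writes
eventual = read_sees_latest_write(3, write_acks=1, read_acks=1)  # fastest, weakest
```

Lowering R and W favours speed and availability; raising them favours consistency, which is the trade-off the theorem describes.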
BASE Properties
Basically available:
Nodes in a distributed environment can go down, but the whole system shouldn't be affected.

Soft state (scalable):
The state of the system and its data changes over time.

Eventual consistency:
Given enough time, data will be consistent across the distributed system.
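Soft state and eventual consistency can be sketched with a toy simulation. The example below is an illustration under stated assumptions, not how any particular database works: the `Replica` class, the last-write-wins rule and the `anti_entropy_round` helper are all hypothetical names introduced here.

```python
class Replica:
    """One node holding a value and the logical timestamp of its last write."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.timestamp = 0

    def write(self, value, timestamp):
        # Last-write-wins: keep only the value with the highest timestamp.
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp


def anti_entropy_round(replicas):
    """Copy the newest write to every replica (one full sync round)."""
    newest = max(replicas, key=lambda r: r.timestamp)
    for replica in replicas:
        replica.write(newest.value, newest.timestamp)


replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[0].write("v1", timestamp=1)            # write accepted by one node only
stale_reads = [r.value for r in replicas]       # soft state: nodes disagree
anti_entropy_round(replicas)
converged_reads = [r.value for r in replicas]   # given time, all nodes agree
```

Before the sync round a read may return stale data depending on which node answers; after it, every node returns the same value.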
NoSQL Data Models
• Key-value type (Redis): each value corresponds to a key.
• Column-based (Cassandra): stores data in tables organized by column; more suitable for aggregation and data warehouse applications.
• Document-type (MongoDB): no table structure is used.
• Graph-based (Neo4j): stores information about networks.
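The four models can be compared by writing small records in each shape. The sketch below uses plain Python containers as stand-ins; all names and values (ages, cities, friendships) are made-up illustration data, and real systems add storage, indexing and query layers that are not shown.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "will", "city": "NY"}'}

# Column-based: values of one column are stored together, which suits aggregation.
column_store = {
    "name": {"row1": "will", "row2": "jeff"},
    "age": {"row1": 31, "row2": 27},  # hypothetical ages
}
average_age = sum(column_store["age"].values()) / len(column_store["age"])

# Document: nested, self-describing records without a table structure.
document = {"name": "will", "aliases": ["bill"], "address": {"city": "NY"}}

# Graph: nodes plus edges that describe a network.
graph = {"will": ["jeff", "ben"], "jeff": ["ben"], "ben": []}
friends_of_will = graph["will"]
```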
NoSQL Data Processing Tools
Key-value database: Redis (CP)
o The maximum size of a single string value is 512 MB.
o Suitable for high-performance access to small amounts of data.
o Main drawback: the capacity of the database is limited by physical memory.
o Does not support SQL queries; data is accessed by key through Redis commands.
o Stores simple values or data structures by key.
Column-oriented database: Cassandra (AP)
o Multi-datacenter replication
o Support for map/reduce; good for analytics and data warehousing
o Tunable consistency, with strong availability and partition tolerance
o No single point of failure
o Probably the easiest of this list to manage in big/growing clusters
o Fast reads from the database
Document database: MongoDB
• General purpose
• Rich data model
• Support for complex data types
• Sophisticated query language, reducible to SQL
• Easy to use
• Simple to set up and manage
• Easy mapping to object-oriented code
• High-speed
• Fast and scalable
• Open source and no cost to use / download
• Auto-sharding built in
• Dynamically add / remove capacity with no downtime
MongoDB is easy to use

MySQL:   SELECT * FROM emp;
MongoDB: db.emp.find({});

MySQL:   CREATE TABLE log (<col1> size, <col2> size);
MongoDB: db.createCollection("log", { capped: true, size: 5242880, max: 5000 });

MySQL:   INSERT INTO products VALUES ('book', 40);
MongoDB: db.products.insertOne({ item: "book", qty: 40 });
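The `createCollection` call above creates a capped collection: once its limits are reached (here `max: 5000` documents), the oldest documents are discarded to make room for new ones. A rough in-memory analogue, offered only as a sketch, is Python's `collections.deque` with `maxlen`. The cap of 3 is an assumption chosen to keep the example short, and the byte limit `size` is not modelled.

```python
from collections import deque

log = deque(maxlen=3)  # hypothetical cap of 3 entries instead of 5000

for i in range(5):
    log.append({"event": i})  # appending beyond the cap evicts the oldest entry

remaining = [entry["event"] for entry in log]
```

After five inserts only the three most recent events remain, in insertion order.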
Schema Free
• MongoDB does not need any pre-defined data schema [5]
• Every document could have different data!

{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], loc: [32.7, 63.4], boss: "ben"}
{name: "jeff", eyes: "blue", loc: [40.7, 73.4], boss: "ben"}
{name: "brendan", aliases: ["el diablo"]}
{name: "matt", pizza: "DiGiorno", height: 72, loc: [44.6, 71.3]}
{name: "ben", hat: "yes"}
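Because documents need not share fields, a query simply skips documents that lack the field it asks for. The sketch below imitates that behaviour with Python dictionaries; `find` is a hypothetical helper written for this illustration that handles equality matches only, not MongoDB's actual query engine.

```python
people = [
    {"name": "will", "eyes": "blue", "boss": "ben"},
    {"name": "jeff", "eyes": "blue", "boss": "ben"},
    {"name": "brendan", "aliases": ["el diablo"]},
    {"name": "ben", "hat": "yes"},
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


blue_eyed = [doc["name"] for doc in find(people, {"eyes": "blue"})]
with_hat = [doc["name"] for doc in find(people, {"hat": "yes"})]
everyone = find(people, {})  # an empty query matches every document
```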
• NoSQL is popular for development and deployment of distributed system applications.
• 10gen (now MongoDB, Inc.) is the company behind MongoDB.
• MongoDB makes it easy to code, scale, and operate NoSQL.
SQL Vs MongoDB
SQL                                  MongoDB
Database                             Database
Table                                Collection
Row                                  JSON document or BSON document
Column                               Field
Table joins                          Embedded documents and linking
Primary key (any unique column)      Primary key (automatically the _id field)
Aggregation (e.g. GROUP BY)          Aggregation framework
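The "table joins" row is the mapping that changes data design the most. The sketch below contrasts the two styles with plain Python data; the customer and order records are invented for illustration, and `join_orders` is a hypothetical helper, not a database call.

```python
# Relational style: two "tables" linked by a foreign key.
customers = [{"id": 1, "name": "will"}]
orders = [
    {"customer_id": 1, "item": "book", "qty": 40},
    {"customer_id": 1, "item": "pen", "qty": 2},
]


def join_orders(customer_id):
    """Emulate a join by scanning the orders table for matching rows."""
    return [o["item"] for o in orders if o["customer_id"] == customer_id]


# Document style: the orders are embedded in the customer document.
customer_doc = {
    "_id": 1,
    "name": "will",
    "orders": [{"item": "book", "qty": 40}, {"item": "pen", "qty": 2}],
}
embedded_items = [o["item"] for o in customer_doc["orders"]]
total_qty = sum(o["qty"] for o in customer_doc["orders"])  # GROUP BY style sum
```

Embedding lets one read return the customer together with the orders, at the cost of duplicating data if the same order must appear in several documents.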


Sharding with MongoDB
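Sharding splits a collection across several servers according to a shard key. The sketch below shows generic hash-based routing as one way to picture the idea; it is an assumption-laden illustration, not MongoDB's actual chunk and balancer mechanism. The `shard_for` function and the three-shard layout are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


shards = {index: [] for index in range(3)}  # three hypothetical shards
for user in ["will", "jeff", "brendan", "matt", "ben"]:
    shards[shard_for(user, 3)].append(user)
```

Every key always lands on the same shard, so a lookup by shard key only needs to contact one server.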
Practical examples of NoSQL
Social networking sites
Session Store
User Profile Information
Content and Metadata store
Mobile Application
Online shopping sites
E-commerce
Ad-targeting
