NoSQL Databases (MongoDB-Cassandra)
Cassandra
Session One
CAP Theorem
Three properties characterize any distributed system:
– Consistency: every read sees the most recent write, so all users see the
same data at the same time
– Availability: every request receives a response indicating whether it
succeeded or failed
– Partition Tolerance: the system continues to operate despite arbitrary
message loss or failure of part of the system
Definition: A distributed system can guarantee at most two of these three
properties at the same time.
Picture the three properties as the vertices of a triangle. Its three sides
are the possible pairings: CA, AP, and CP.
CA -> RDBMS, Teradata, Greenplum, etc.
AP -> Cassandra, Voldemort, DynamoDB, Riak, CouchDB, etc.
CP -> HBase, Bigtable, MongoDB, Hypertable, etc.
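The CP versus AP trade-off can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not how any of the listed systems is implemented: `TwoNodeStore`, `Replica`, and the `partitioned` flag are hypothetical names invented for this example. It shows what each choice gives up once the two replicas can no longer talk to each other.

```python
class Replica:
    """One node holding a copy of a small key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (labels for this sketch only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True: a and b cannot exchange messages

    def write(self, key, value):
        """Client writes through replica a."""
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let replicas disagree:
                # consistency is kept, availability is given up.
                raise RuntimeError("unavailable: cannot reach replica b")
            # AP: accept the write locally and let b lag behind:
            # availability is kept, consistency is given up.
            self.a.data[key] = value
            return
        self.a.data[key] = value
        self.b.data[key] = value  # replication works while the network is healthy

    def read(self, key, replica):
        node = self.a if replica == "a" else self.b
        return node.data.get(key)


cp = TwoNodeStore("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"

ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)

print(cp_result, cp.read("x", "a"), cp.read("x", "b"))  # rejected 1 1
print(ap.read("x", "a"), ap.read("x", "b"))             # 2 1
```

In the CP run both replicas still agree on the old value, but the client was turned away. In the AP run the client was served, but the two replicas now return different values until the partition heals.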
NoSQL
NoSQL means “Not only SQL”, i.e. not only relational.
A NoSQL database provides a simple, lightweight mechanism for storing
and retrieving data, with higher scalability and availability than a
traditional RDBMS.
Horizontal scalability (scale out).
Schema-free / flexible schema.
High write/read throughput.
Multiple data models.
Different interfaces such as CLI, HQL, CQL, language APIs, REST APIs, etc.
Handles all varieties of data.
Programmer friendly.
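Horizontal scalability rests on spreading keys across machines. The following is a minimal sketch of that idea using plain hash-modulo placement; the node names and the `node_for` and `distribute` helpers are hypothetical, and this is not the placement scheme of any particular database (Cassandra, for instance, uses consistent hashing so that adding a node moves far fewer keys).

```python
import hashlib


def node_for(key, nodes):
    """Pick the node that owns `key` by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name to the list of keys it owns."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]

# Scale out: the same data set spread over three nodes, then over four.
three_nodes = distribute(keys, ["node1", "node2", "node3"])
four_nodes = distribute(keys, ["node1", "node2", "node3", "node4"])

for node, owned in four_nodes.items():
    print(node, len(owned))
```

Each node ends up with a share of the keys, so adding a machine reduces the load on every existing one. That is the core of "scale out" as opposed to buying a bigger single server.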
NoSQL vs. SQL
SQL Databases | NoSQL Databases
Scale up (vertical scaling). | Scale out (horizontal scaling).
Consistency, Availability. | Consistency or Availability, plus Partition tolerance (CP or AP).
Single data model, i.e. relational. | Multiple data models, e.g. columnar, document, key-value, graph, and others.
Single query language, i.e. SQL. | Multiple query languages and interfaces, e.g. simple CLI, HQL, CQL, REST, Thrift, DSLs.
Rigid schema. | Schema-free / flexible schema.
Joins are expensive. | Free from joins.
Good for real-time querying, i.e. point queries. | Good for real-time querying as well as real-time decisioning.
Scales up to a few terabytes. | Scales up to petabytes.
Good for low data traffic. | Good for high volumes of data traffic.
Managing distributed databases is complex, i.e. adding/removing machines is complicated. | Most are distributed by nature; it is very easy to add machines to or remove them from existing clusters.
Good for non-volatile data. | Good for volatile data.
Hard to implement, i.e. schema design, data integrity. | Simple to implement.
Good for transactions, i.e. OLTP. | Good for decisioning, i.e. OLAP.
MongoDB term | RDBMS equivalent
Collection | Table
Document | Record/Tuple
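To make the collection/document mapping concrete, here is a small sketch in plain Python with no database involved. The student and address data, and all field names, are made up for illustration. It contrasts a relational record (a fixed tuple of columns, with related data in a second table) with a document (a self-describing, nestable structure).

```python
import json

# Relational style: each record is a fixed tuple of columns. The address lives
# in a second table and is reached through the address_id foreign key (a join).
student_row = (1, "Asha", 42)           # (id, name, address_id)
address_row = (42, "Pune", "411001")    # (id, city, pin)

# Document style: one self-describing document carries its own field names
# and can embed related data, so no join is needed to read the address.
student_doc = {
    "_id": 1,
    "name": "Asha",
    "address": {"city": "Pune", "pin": "411001"},
    "courses": ["MongoDB", "Cassandra"],
}

# Flexible schema: a second document in the same collection may have
# different fields.
other_doc = {"_id": 2, "name": "Ravi", "email": "ravi@example.com"}

collection = [student_doc, other_doc]   # a collection is a group of documents

print(json.dumps(student_doc, indent=2))

# Queries must tolerate missing fields, since the schema is not enforced.
cities = [doc["address"]["city"] for doc in collection if "address" in doc]
print(cities)  # ['Pune']
```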
CRUD Operations
Create databases
Create collections
Insert documents
Update documents
Delete documents
Drop collections
Drop databases
Create indexes
Drop indexes
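The operations listed above might look as follows through the PyMongo driver. This is a sketch under assumptions: the `crud_walkthrough` function, the `training_demo` database, the `students` collection, and the field values are all hypothetical names chosen for the example. The method calls themselves (`insert_one`, `insert_many`, `create_index`, `update_one`, `find_one`, `delete_one`, `count_documents`, `drop_index`, `drop`, `drop_database`) are standard PyMongo calls.

```python
def crud_walkthrough(client, db_name="training_demo"):
    """Run each listed operation once against `client`, then clean up.

    `client` is expected to behave like pymongo.MongoClient. Database,
    collection, and field names are made up for this example.
    """
    db = client[db_name]        # create database: materializes on first write
    students = db["students"]   # create collection: also created lazily

    # Insert documents
    students.insert_one({"name": "Asha", "city": "Mumbai"})
    students.insert_many([
        {"name": "Ravi", "city": "Delhi"},
        {"name": "Meera", "city": "Chennai"},
    ])

    # Create an index on the name field
    students.create_index("name", name="name_idx")

    # Update a document, then read it back
    students.update_one({"name": "Asha"}, {"$set": {"city": "Pune"}})
    updated = students.find_one({"name": "Asha"})

    # Delete a document and count what is left
    students.delete_one({"name": "Ravi"})
    remaining = students.count_documents({})

    # Drop the index, the collection, and finally the database
    students.drop_index("name_idx")
    students.drop()
    client.drop_database(db_name)

    return updated, remaining
```

To try it, assuming PyMongo is installed and a MongoDB server is listening on the default local port, call `crud_walkthrough(pymongo.MongoClient("mongodb://localhost:27017"))`. Note that MongoDB has no explicit "create database" step in this flow: the database and collection appear when the first document is inserted.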
Administration
Create users
Change database permissions
Dump data
Export data
Import data
Check load
Loading files to GridFS
Checking stats
Setting mongo cluster
Replication
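A few of these administration tasks can also be driven from PyMongo through database commands. The sketch below is illustrative only: `admin_snapshot`, the user name `report_user`, the placeholder password, and the database name are hypothetical. It uses the `createUser`, `dbStats`, and `serverStatus` commands. Dumping, exporting, and importing data are normally done with the separate `mongodump`, `mongoexport`, and `mongoimport` command-line tools and are not shown here.

```python
def admin_snapshot(client, db_name="training_demo"):
    """Create a user, then collect a few statistics.

    `client` is expected to behave like pymongo.MongoClient connected with
    sufficient privileges. All names here are placeholders for the example.
    """
    db = client[db_name]

    # Create users / change database permissions: a user limited to
    # read and write access on this one database.
    db.command(
        "createUser",
        "report_user",
        pwd="change-me",
        roles=[{"role": "readWrite", "db": db_name}],
    )

    # Checking stats: size and object counts for the database.
    db_stats = db.command("dbStats")

    # Check load: server-wide status, including connection counts.
    server_status = client.admin.command("serverStatus")

    return {
        "collections": db_stats.get("collections"),
        "data_size_bytes": db_stats.get("dataSize"),
        "current_connections": server_status.get("connections", {}).get("current"),
    }
```

Called with a real `pymongo.MongoClient`, the function returns a small dictionary that can be printed or logged as a quick health check.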
Questions & Answers