Introduction to Cassandra
Feb 2017
Objectives
● Vertical vs Horizontal Scaling
● Hardware Deployment Considerations
● ACID, BASE and CAP
● Key Features of a NoSQL Database
● NoSQL Database Categories
● Apache Cassandra
● Key Concepts, Data Structures and Algorithms
● Cassandra Cluster Ring
● Cassandra Write Path
● Cassandra Node Write Path
● Cassandra Read Path
● Cassandra Node Read Path
Vertical vs Horizontal Scaling
● Vertical Scaling - Also known as scaling up: adding resources (CPU, memory, storage) to a single machine.
● Horizontal Scaling - Also known as scaling out: adding more machines (nodes) to the system.
New and emerging databases prefer to scale horizontally because:
● Capacity can be increased on the fly by adding nodes.
● It is more cost effective than vertical scaling.
● In theory it scales without a fixed upper limit, since each added node increases capacity roughly proportionally.
Hardware Deployment Considerations
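• Shared Memory, i.e. the traditional deployment architecture: all processors share the same memory and disk
• Shared Disk: each node has its own memory but all nodes share the same storage
• Shared Nothing: each node has its own CPU, memory and disk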
ACID, BASE & the CAP Theorem
• ACID
– Atomicity
– Consistency
– Isolation
– Durability
• BASE
– Basically Available
– Soft State
– Eventually Consistent
• CAP
– Consistency
– Availability
– Partition Tolerance
Key Features of NoSQL Databases
• Based on distributed computing
• Commodity Hardware
• Favor BASE over strict ACID, with trade-offs framed by the CAP Theorem
• Provide a flexible schema
NoSQL Database Categories
• Key-Value databases
– Key-value stores provide a simple form of storage that holds only pairs of keys and values
– Riak, Redis, Amazon DynamoDB, FoundationDB, MemcacheDB
• Document Databases
– Document stores are an advanced form of key-value store in which the value is not an opaque blob but a document in a well-known format
– The document format is generally XML, JSON or BSON
– Apache CouchDB, Couchbase, MarkLogic and MongoDB
• Column Family Databases
– Column-family databases (not to be confused with column-oriented databases) are again an evolution of the key-value store, in which the value contains a collection of columns
– Apache Cassandra, HBase and Hypertable
• Graph Databases
– A graph database uses a graph structure (nodes and edges) to store data
– Neo4j
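To make the four categories concrete, here is a minimal Python sketch of how one record might be shaped in each model. The record, the keys such as `user:42`, and the helper `columns_for` are hypothetical; plain dictionaries stand in for the stores, and no real database client is used.

```python
# One hypothetical "user" record expressed in each of the four data models.

# Key-value: the store sees only an opaque value behind a key.
key_value = {"user:42": b'{"name": "Ada", "city": "London"}'}

# Document: the value is a structured document the store can inspect and query.
document = {"user:42": {"name": "Ada", "address": {"city": "London"}}}

# Column family: each row key maps to a collection of named columns,
# and different rows may carry different columns.
column_family = {
    "users": {
        "42": {"name": "Ada", "city": "London"},
        "43": {"name": "Grace"},
    }
}

# Graph: nodes plus explicit, labeled edges between them.
graph = {
    "nodes": {"42": {"name": "Ada"}, "43": {"name": "Grace"}},
    "edges": [("42", "FOLLOWS", "43")],
}


def columns_for(row_key):
    """Return the sorted column names stored for one row of the 'users' family."""
    return sorted(column_family["users"][row_key])
```

Calling `columns_for("42")` and `columns_for("43")` returns different column sets, which is the flexible-schema property the column-family model is known for.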
Apache Cassandra
• Distributed Storage System
• Runs on Commodity Hardware
• Fault Tolerant
• Linearly Scalable
• AID Support
– Delivers atomicity, isolation and durability for individual writes, but not within the bounds of a multi-statement transaction as in an RDBMS
• Elastically Scalable
• Multi Data Center
Key Concepts, Data Structures and Algorithms
• Key Concepts
– Data Partitioning
– Consistent Hashing
– Data Replication
– Eventual Consistency
– Tunable Consistency
– Consistency Level
– Data Centre, Racks, Nodes
• Data Structures
– Bloom Filters
– Merkle Tree
– SSTable (Sorted String Table)
– Write Back Cache
– Memtable
– Cassandra Keyspace (RDBMS Schema)
– Column Family
– Row Key
• Algorithms
– Snitches and Replication Strategies
– Gossip Protocol
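Tunable Consistency and Consistency Level come down to how many replicas must acknowledge a read or a write. A rough sketch of the usual overlap rule follows, assuming N replicas, W write acknowledgements and R read acknowledgements. The function names are illustrative and not part of any Cassandra API.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_overlap(replication_factor, write_acks, read_acks):
    """True when every read set must share at least one replica
    with every write set, i.e. R + W > N."""
    return read_acks + write_acks > replication_factor
```

With a replication factor of 3, quorum reads and quorum writes (2 acknowledgements each) overlap, so a read sees the latest acknowledged write. Reading and writing with a single acknowledgement each does not overlap and gives only eventual consistency.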
Cassandra Cluster/Ring
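As a minimal sketch of how a ring can place data with consistent hashing, the example below hashes each node name to a token, sorts the tokens into a ring, and walks clockwise from a row key's token to pick replicas. It assumes one token per node (no virtual nodes) and ignores racks and data centres. MD5 is used only as a stable stand-in hash, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a large integer position on the ring (stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sorted (token, node) pairs form the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, row_key):
        """Return the nodes holding row_key: the first node whose token is
        >= the key's token, then the next nodes clockwise."""
        start = bisect.bisect_left(self._tokens, token(row_key))
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

For example, `Ring(["node-a", "node-b", "node-c", "node-d"]).replicas("user:42")` returns three distinct nodes, and adding a fifth node only moves the keys that now fall on the new node's token range.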
Cassandra Write Path
Write Path Per Node
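A simplified sketch of what a single node does with a write, assuming the commonly described sequence: append to a commit log for durability, update the in-memory memtable, and flush the memtable to an immutable SSTable once it fills up. The `Node` class, the `memtable_limit` threshold and the in-memory lists standing in for disk files are all hypothetical.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log; stands in for a file on disk
        self.memtable = {}     # in-memory writes, latest value per row key
        self.sstables = []     # immutable, sorted tables; oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Record the write in the commit log so it survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches its (assumed) size limit.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs the log (simplified here to a full clear).
        self.commit_log.clear()
```

Because SSTables are never modified after a flush, an update to an existing row simply lands in a newer table, and readers resolve which version wins.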
Cassandra Read Path
Read Path Per Node
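A rough sketch of a per-node read, assuming the node checks its memtable first and then its SSTables from newest to oldest, using a Bloom filter to skip tables that cannot contain the key. A real node merges versions by timestamp rather than stopping at the first hit. The `BloomFilter`, `SSTable` and `read` names, and the filter sizes, are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.filter = BloomFilter()
        for key in self.rows:
            self.filter.add(key)

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # skip the (expensive) table lookup entirely
        return self.rows.get(key)


def read(key, memtable, sstables):
    """Return the newest value for key, or None. sstables is oldest first."""
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):
        value = table.get(key)
        if value is not None:
            return value
    return None
```

A Bloom filter can report a false positive but never a false negative, so a negative answer safely avoids touching the table.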
Questions?
Thank you.