CASSANDRASUMMIT2013
Jonathan Ellis | DataStax CTO | Project Chair, Apache Cassandra
Five Years of Cassandra
[Figure: release timeline from Jul-08 to Jul-13 (ticks: Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13) showing releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... 2.0, and DSE]
Core Values
*Massive scalability
*High performance
*Reliability / Availability
VLDB Benchmark (RWS)
[Chart: throughput (ops/sec, 0 to 80000) against number of nodes (0 to 12) for Cassandra, HBase, VoltDB, Redis, and MySQL; the Cassandra series is labeled]
Endpoint Benchmark (RW)
[Chart: y-axis 0 to 35000, x-axis 1, 2, 4, 8, 16, 32, for Cassandra, HBase, and MongoDB; the Cassandra series is labeled]
Vox Populi
#Cassandra13
*Massive scalability
*High performance
*Reliability / Availability
*Ease of use
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state='Texas'
AND birth_date > 1950;
New Core Value
CQL is working
"Coming from a relational database background we found
the transition to Cassandra to be very straightforward. There are a
few simple key concepts one must grasp at first but ever since it's
been smooth sailing for us."
Boris Wolf, Comcast
*Key concepts?
*The next Top Data Model (Tomorrow, 11:00, Festival)
*The State of CQL (Tomorrow, 3:10, Marina)
1.2 for Developers
*CQL3
Thrift compatibility
Collections
Data dictionary
Auth support
Hadoop support
Native drivers
*Tracing
*Atomic batches
CQL/Thrift compatibility
*http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
*http://www.datastax.com/dev/blog/thrift-to-cql3
*http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
*TLDR: Yes
Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE TABLE users_addresses (
user_id uuid REFERENCES users,
email text
);
SELECT *
FROM users NATURAL JOIN users_addresses;
[Crossed out on the slide: CQL has no REFERENCES or JOIN]
Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int,
email_addresses set<text>
);
Collections
UPDATE users
SET email_addresses = email_addresses +
{'jbellis@gmail.com', 'jbellis@datastax.com'};
Data Dictionary
cqlsh:system> use system;
cqlsh:system> select columnfamily_name from schema_columnfamilies
where keyspace_name = 'system';
columnfamily_name
-----------------------
batchlog
hints
local
peer_events
peers
schema_columnfamilies
schema_columns
schema_keyspaces
Authentication
[cassandra.yaml]
authenticator: PasswordAuthenticator
# DSE offers KerberosAuthenticator as well
CREATE USER robin WITH PASSWORD 'manager' SUPERUSER;
ALTER USER cassandra WITH PASSWORD 'newpassword';
LIST USERS;
DROP USER cassandra;
Authorization
[cassandra.yaml]
authorizer: CassandraAuthorizer
GRANT select ON audit TO jonathan;
GRANT modify ON users TO robin;
GRANT all ON ALL KEYSPACES TO lara;
Native Drivers
*CQL native protocol: efficient, lightweight, asynchronous
*Java (GA): https://github.com/datastax/java-driver
*.NET (Beta): https://github.com/datastax/csharp-driver
*Coming soon: Python, PHP, Ruby
*Java and .NET Client Drivers (Tomorrow, 4:10, Marina)
Tracing
cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2);
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
activity | timestamp | source | source_elapsed
-------------------------------------+--------------+-----------+----------------
Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540
Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779
Message received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63
Applying mutation | 00:02:37,016 | 127.0.0.2 | 220
Acquiring switchLock | 00:02:37,016 | 127.0.0.2 | 250
Appending to commitlog | 00:02:37,016 | 127.0.0.2 | 277
Adding to memtable | 00:02:37,016 | 127.0.0.2 | 378
Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 710
Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 888
Message received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2334
Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550
Tracing an Antipattern
CREATE TABLE queues (
id text,
created_at timeuuid,
value blob,
PRIMARY KEY (id, created_at)
);
id created_at value
myqueue 3092e86f 9b0450d30de9
myqueue 0867f47c fc7aee5f6a66
myqueue 5fc74be0 668fdb3a2196
10000 events, 9999 dequeued
cqlsh:foo> SELECT * FROM queues WHERE id = 'myqueue' ORDER BY created_at LIMIT 1;
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
activity | timestamp | source | source_elapsed
------------------------------------------+--------------+-----------+---------------
execute_cql3_query | 19:31:05,650 | 127.0.0.1 | 0
Sending message to /127.0.0.3 | 19:31:05,651 | 127.0.0.1 | 541
Message received from /127.0.0.1 | 19:31:05,651 | 127.0.0.3 | 39
Executing single-partition query | 19:31:05,652 | 127.0.0.3 | 943
Acquiring sstable references | 19:31:05,652 | 127.0.0.3 | 973
Merging memtable contents | 19:31:05,652 | 127.0.0.3 | 1020
Merging data from memtables and sstables | 19:31:05,652 | 127.0.0.3 | 1081
Read 1 live cells and 19998 tombstoned | 19:31:05,686 | 127.0.0.3 | 35072
Enqueuing response to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35220
Sending message to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35314
Message received from /127.0.0.3 | 19:31:05,687 | 127.0.0.1 | 36908
Processing response from /127.0.0.3 | 19:31:05,688 | 127.0.0.1 | 37650
Request complete | 19:31:05,688 | 127.0.0.1 | 38047
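The trace above can be reproduced in miniature. The following is a minimal Python sketch, not Cassandra code: `QueuePartition`, `TOMBSTONE`, and `simulate` are hypothetical names, and a delete simply leaves a marker in place of the row. It shows why reading the head of a mostly consumed queue is slow: the read has to step over every deleted entry before it reaches the single live one. The sketch counts rows, whereas the trace counts cells, so the numbers differ.

```python
TOMBSTONE = object()  # marker left behind by a delete (hypothetical stand-in)


class QueuePartition:
    """Toy model of one partition of the `queues` table, ordered by created_at."""

    def __init__(self):
        self.rows = {}  # created_at -> value, or TOMBSTONE once deleted

    def insert(self, created_at, value):
        self.rows[created_at] = value

    def delete(self, created_at):
        # The row is not removed; a tombstone shadows it until it is purged.
        self.rows[created_at] = TOMBSTONE

    def first_live(self):
        """Return (value, tombstones_scanned) for the oldest live row."""
        scanned = 0
        for created_at in sorted(self.rows):
            value = self.rows[created_at]
            if value is TOMBSTONE:
                scanned += 1
            else:
                return value, scanned
        return None, scanned


def simulate(total, dequeued):
    queue = QueuePartition()
    for i in range(total):
        queue.insert(i, b"event-%d" % i)
    for i in range(dequeued):
        queue.delete(i)
    return queue.first_live()


value, tombstones = simulate(10000, 9999)
print(value, tombstones)  # one live row found after scanning 9999 tombstones
```

The cost of the read grows with the number of entries already dequeued, not with the number still waiting, which is what makes the queue layout an antipattern.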
1.2 for Operators
*Concurrent CREATE TABLE
*Virtual nodes
*“Fat node” support (5-10TB)
*JBOD improvements
Off-heap bloom filters, compression metadata
Improved compaction throttle
Parallel leveled compaction
Memory Usage
[Diagram: the Java process contains the JVM. Java heap: on-heap, managed by GC. Native memory: off-heap, not managed by GC.]
Read Path (per SSTable)
[Diagram, built up over several slides and split between memory and disk: bloom filter, partition key cache, partition summary, partition index, compression offsets, data]
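The stages named in the read path diagram can be strung together in a small Python sketch. This is a toy model under stated assumptions, not Cassandra's implementation: `SSTable`, `summary_interval`, and `index_reads` are hypothetical names, the bloom filter is replaced by an exact set, and the compression offsets stage is left out. The memory/disk split in the comments follows the usual description of the read path.

```python
import bisect


class SSTable:
    """Toy lookup over one SSTable, following the stages named on the slide."""

    def __init__(self, rows, summary_interval=4):
        keys = sorted(rows)
        # "Disk": the data file and a partition index of (key, position) entries.
        self.data = [rows[k] for k in keys]
        self.index = [(k, pos) for pos, k in enumerate(keys)]
        # "Memory": bloom filter stand-in, key cache, and a sampled index summary.
        self.bloom = set(keys)  # exact set; a real bloom filter allows false positives
        self.key_cache = {}
        self.interval = summary_interval
        self.summary_keys = keys[::summary_interval]
        self.index_reads = 0  # counts simulated reads of index entries

    def get(self, key):
        if key not in self.bloom:
            return None  # this SSTable cannot contain the key
        pos = self.key_cache.get(key)
        if pos is None:
            pos = self._search_index(key)
            if pos is None:
                return None
            self.key_cache[key] = pos
        return self.data[pos]

    def _search_index(self, key):
        # The summary says which slice of the index to scan.
        sample = bisect.bisect_right(self.summary_keys, key) - 1
        if sample < 0:
            return None
        start = sample * self.interval
        for k, pos in self.index[start:start + self.interval]:
            self.index_reads += 1
            if k == key:
                return pos
        return None


table = SSTable({f"k{i:02d}": i for i in range(10)})
print(table.get("k05"), table.index_reads)  # first read scans part of the index
print(table.get("k05"), table.index_reads)  # second read is served by the key cache
print(table.get("zz"))                      # rejected by the bloom filter stand-in
```

Each in-memory structure exists to avoid or shorten a disk access, which is why their memory footprint matters for large nodes.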
Off Heap in 1.2+
*Partition key bloom filter
1-2GB per billion partitions
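The 1-2GB figure is in line with textbook bloom filter sizing. As a rough check, assuming an optimally sized filter and false-positive rates of 1% and 0.1% (both chosen here for illustration, not taken from the slides), the standard formula m = -n ln(p) / (ln 2)^2 bits can be evaluated as follows. `bloom_filter_bytes` is a hypothetical helper name.

```python
import math


def bloom_filter_bytes(num_keys, false_positive_rate):
    """Size in bytes of an optimally sized bloom filter.

    Uses the textbook formula m = -n * ln(p) / (ln 2)^2 bits.
    """
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8


for p in (0.01, 0.001):  # illustrative false-positive rates (assumption)
    gb = bloom_filter_bytes(1_000_000_000, p) / 1e9
    print(f"p={p}: {gb:.2f} GB per billion partitions")
```

Both results (about 1.2 GB and 1.8 GB) land inside the quoted range; Cassandra's actual filter sizing may differ from the textbook optimum.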
Off Heap in 1.2+
*Compression metadata
~1-3GB per TB compressed
Not off Heap until 2.0
*Partition index summary
(Size cut in ~half in 1.2.5+)
Compaction Throttling
[Two charts of MB/s over time, each with partitions of 10000 rows and 1000 rows: throttling on partition boundaries vs. throttling using a constant RateLimiter]
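One plausible reading of the two charts: if the throttle only runs between partitions, a large partition is written at full disk speed and followed by a long pause, while a constant rate limiter spreads the same work evenly. The Python sketch below models that contrast with made-up numbers. The function names, the rows-per-tick unit, and the rates are all assumptions for illustration, not Cassandra's compaction code.

```python
def throttle_on_boundaries(partitions, target_rate, disk_rate):
    """Per-tick throughput when the throttle only runs between partitions."""
    timeline = []
    for size in partitions:
        remaining = size
        ticks_writing = 0
        while remaining > 0:
            written = min(disk_rate, remaining)  # unthrottled inside a partition
            timeline.append(written)
            remaining -= written
            ticks_writing += 1
        # Pause until this partition's average rate falls to the target.
        ticks_allowed = -(-size // target_rate)  # ceiling division
        timeline.extend([0] * max(0, ticks_allowed - ticks_writing))
    return timeline


def throttle_constant(partitions, target_rate):
    """Per-tick throughput when every tick is capped at the target rate."""
    remaining = sum(partitions)
    timeline = []
    while remaining > 0:
        written = min(target_rate, remaining)
        timeline.append(written)
        remaining -= written
    return timeline


partitions = [10000, 1000, 1000, 1000]  # rows per partition, as on the charts
bursty = throttle_on_boundaries(partitions, target_rate=500, disk_rate=2000)
smooth = throttle_constant(partitions, target_rate=500)
print("peak with boundary throttling:", max(bursty))
print("peak with constant throttling:", max(smooth))
```

Both strategies move the same number of rows in the same number of ticks here; only the peak differs, and the peak is what competes with foreground reads and writes.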
DSE 3.1
*Cassandra 1.2 shipping in DataStax Enterprise 3.1 on June 30
*Updated with CQL and composite column support for Hive and Solr
*Includes Solr 4.3
DataStax DevCenter
Cassandra 2.0
Removed in 2.0
*Token range bisection on bootstrap
*Supercolumns (only internally)
public List<ColumnOrSuperColumn> get_slice(...)
*Disk compatibility for < 1.2.5
*Network compatibility for < 1.2
New in 2.0
*CAS (Compare-and-set = lightweight transactions)
*Eager retries
*Improved compaction
*Triggers (experimental)
*CQL cursors
CAS: The Problem
Session 1
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users (...)
VALUES ('jbellis', ...)
Session 2
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users (...)
VALUES ('jbellis', ...)
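The race between the two sessions can be replayed deterministically. This is a minimal Python sketch with an in-memory dictionary standing in for the `users` table; `Table` and `register_without_cas` are hypothetical names, and the interleaving is fixed by hand so that both reads happen before either write.

```python
class Table:
    """In-memory stand-in for the users table, keyed by username."""

    def __init__(self):
        self.rows = {}

    def select(self, username):
        return self.rows.get(username)

    def insert(self, username, owner):
        self.rows[username] = owner  # plain write: last one wins


def register_without_cas(users):
    """Replay the slide: both sessions SELECT before either INSERTs."""
    seen_by_1 = users.select("jbellis")
    seen_by_2 = users.select("jbellis")
    winners = []
    if seen_by_1 is None:
        users.insert("jbellis", "session 1")
        winners.append("session 1")
    if seen_by_2 is None:
        users.insert("jbellis", "session 2")
        winners.append("session 2")
    return winners


users = Table()
print(register_without_cas(users))  # both sessions believe they registered
print(users.select("jbellis"))      # but only the later write survives
```

Neither session did anything wrong in isolation; the check and the write are simply not one atomic step.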
Why Locking Doesn’t Work
[Diagram, built up over four slides: Client (locks) sends a request to the Coordinator, which sends an internal request to a Replica. Later builds add an X, a hint, and a timeout response.]
Paxos
*All operations are quorum-based
*Each replica sends information about unfinished operations to the leader during prepare
*Paxos made Simple
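The two properties listed for Paxos can be seen in a small single-value sketch. This is basic textbook Paxos in Python under simplifying assumptions (in-memory acceptors, integer ballots, no networking or durability), not Cassandra's implementation; `Acceptor` and `propose` are hypothetical names. During prepare, each acceptor reports any proposal it has accepted, and the leader adopts the one with the highest ballot instead of its own value.

```python
class Acceptor:
    """One replica's Paxos state."""

    def __init__(self):
        self.promised = 0     # highest ballot this replica has promised
        self.accepted = None  # (ballot, value) of the last accepted proposal

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted  # report any unfinished operation
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Run prepare and accept; return the chosen value, or None on failure."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [accepted for ok, accepted in replies if ok]
    if len(granted) < quorum:
        return None
    unfinished = [p for p in granted if p is not None]
    if unfinished:
        # Finish the earlier operation rather than overwrite it.
        value = max(unfinished, key=lambda p: p[0])[1]
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
# An earlier leader got "A" accepted on one replica only, then stopped.
acceptors[0].prepare(1)
acceptors[0].accept(1, "A")
print(propose(acceptors, 2, "B"))  # the new leader completes "A"
print(propose(acceptors, 1, "C"))  # a stale ballot is rejected
```

Because every step needs only a majority of replicas, progress does not depend on any single node, which is the property that locking lacks.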
CAS Details
*3 round trips vs 1 for normal updates
*Paxos state is durable
*Immediate consistency with no leader election or failover
*ConsistencyLevel.SERIAL
Use with Caution
*Great for 1% of your application
*Eventual consistency is your friend
Eventual Consistency != Hopeful Consistency (Today, 1:30, Golden Gate)
Using CAS
UPDATE USERS
SET email = 'jonathan@datastax.com', ...
WHERE username = 'jbellis'
IF email = 'jbellis@datastax.com';
INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', 'jbellis@datastax.com', ... )
IF NOT EXISTS;
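The semantics of the two conditional statements can be modeled in a few lines of Python. This is a single-process sketch in which a lock stands in for the atomicity that Cassandra obtains through Paxos; `Users`, `insert_if_not_exists`, and `update_email_if` are hypothetical names, and the boolean return value stands in for the statement reporting whether its condition held.

```python
import threading


class Users:
    """In-memory model of conditional writes on the users table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # makes check-and-write one atomic step

    def insert_if_not_exists(self, username, email):
        with self._lock:
            if username in self._rows:
                return False  # condition failed, nothing written
            self._rows[username] = {"email": email}
            return True

    def update_email_if(self, username, expected_email, new_email):
        with self._lock:
            row = self._rows.get(username)
            if row is None or row["email"] != expected_email:
                return False  # condition failed, nothing written
            row["email"] = new_email
            return True


users = Users()
print(users.insert_if_not_exists("jbellis", "jbellis@datastax.com"))  # first writer
print(users.insert_if_not_exists("jbellis", "other@example.com"))     # second writer
print(users.update_email_if("jbellis", "jbellis@datastax.com",
                            "jonathan@datastax.com"))
```

The caller has to inspect the result and decide what to do when the condition fails; a failed conditional write is a normal outcome, not an error.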
Triggers
CREATE TRIGGER <name> ON <table> EXECUTE <classname>;
Trigger Implementation
class MyTrigger implements ITrigger
{
public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
{
...
}
}
Experimental!
*Relies on internal RowMutation, ColumnFamily classes
*[partition] key is a ByteBuffer
*Expect changes in 2.1
Follow Up Discussion
*After What were they Thinking? (DataStax Lounge)
*Meet the Experts (Today, 3:00, C370)
*Happy Hour (Tonight, 6:15)
Thank You

Cassandra Summit 2013 Keynote