Acunu Analytics: Simpler Real-Time Cassandra Apps
Tim Moreton CTO
@timmoreton
Monday, 29 April 13
•Scalable. No single point of {failure, bottleneck}
•Fast. Especially for writes
•Available. Effortless Multi-DC support
•Maturing fast. Lots of production deployments
WE C*
WE C*
Virtual nodes CQL Support
•Spartan queries
•Thrift (and CQL, a bit)
•Denormalization hurts agility
•Weak update semantics
Challenges remain, of course.
WE C*
C*: Two uses
Real-time analytics
[Illustration: a stream of web-server log lines such as "02:44:02 241.24.41 0.0.1 GET /index.html"]
•Many more writes than reads
•Almost all reads are to results
•Almost no writes are ‘updates’
•Distribute for availability, performance, capacity
Session storage
[Illustration: the same stream of web-server log lines]
•Many more reads than writes
•Updates to existing records (ideally, transactionally)
•Probably fits in RAM: distribute for availability
C*on
Supplement Cassandra with:
•Rich, SQL-like queries
•RESTful HTTP APIs, JSON-based
•Automated denormalization
•Update semantics: less critical for analytics
Analytics: Two patterns
Exploratory Analytics
•Unstructured Warehouses
•Data Mining
•Machine Learning
•Complex analysis, data variety: query richness
Operational Intelligence
•Dashboards
•Real-time Decisions
•Alerting
•Data freshness, response time: query speed
[Architecture diagram. Labels: event stream, Ingest, event store, Processing, roll-up cubes, API, dashboard queries, programmatic interface]
Who uses Acunu?
•Location Data
•Web and Visitor
•Market/Tick Data
•Infrastructure
•Sensor Data
•Social Media
•Social Gaming
•Smart Grid
•Production Line
[Architecture diagram, repeated with annotations:]
•Cassandra stores raw events and intermediate aggregates
•Acunu Analytics is a Cassandra client that maps new events, queries and schema changes to aggregate reads and writes
•Acunu Dashboards provides embeddable, custom data visualization using the HTTP API
(Loosely) Define a schema
CREATE TABLE APICalls (
  time TIME('PST', HOUR, MIN, SEC),
  path PATH(/),
  useragent STRING,
  latitude DOUBLE(0.1, 0.01),
  longitude DOUBLE(0.1, 0.01)
);
CREATE CUBE SELECT COUNT, AVG(respTime) FROM APICalls
  WHERE time, path
  GROUP BY time, path;
CREATE CUBE SELECT COUNT FROM APICalls
  WHERE latitude, longitude
  GROUP BY latitude, longitude;
• Tables have an HTTP endpoint and map to a set of ColumnFamilies
• Dimensions map to keys in events and allow hierarchical aggregation
• Cubes define the dimensions and aggregates to maintain
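One way to picture a hierarchical dimension is as a function that expands a single event value into one key per level of the hierarchy. The sketch below is an illustration only, not Acunu's implementation; the helper names `time_levels` and `path_levels` and the bucket formats are hypothetical.

```python
from datetime import datetime


def time_levels(ts: datetime) -> list[str]:
    """Expand a timestamp into hour, minute and second buckets (coarse to fine)."""
    return [
        ts.strftime("%Y-%m-%d %H"),
        ts.strftime("%Y-%m-%d %H:%M"),
        ts.strftime("%Y-%m-%d %H:%M:%S"),
    ]


def path_levels(path: str, sep: str = "/") -> list[str]:
    """Expand a path into every prefix: /usr/lib -> /, /usr, /usr/lib."""
    parts = [p for p in path.split(sep) if p]
    return [sep] + [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    print(time_levels(datetime(2013, 4, 29, 22, 2, 15)))
    print(path_levels("/usr/lib/python"))
```

Each level is a separate aggregation key, which is why a single event can touch several aggregates.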
Aggregation
CREATE CUBE SELECT SUM(a) FROM t
  WHERE x, y GROUP BY g, h, i;
[Diagram: a cube with axes x, y and (g, h, i). New event {A: v’, X: x, Y: y, Z: z}: apply SUM(v, v’) on the cell holding v]
• Hierarchical dimensions cause multiple writes per event
(That’s OK: Cassandra’s good at writes)
• Most aggregates result in atomic counter increments
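The update step can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the product's code: a nested dictionary stands in for Cassandra rows and counter columns, the function name `apply_event` is hypothetical, and the dimensions are flat (a hierarchical dimension would repeat the increment once per level).

```python
from collections import defaultdict

# cube[row_key][column_key] -> running SUM(a); stands in for counter columns.
cube = defaultdict(lambda: defaultdict(int))


def apply_event(event: dict) -> None:
    """Fold one event into the cube: v <- SUM(v, v') on the matching cell."""
    row_key = (event["x"], event["y"])                 # WHERE dimensions
    column_key = (event["g"], event["h"], event["i"])  # GROUP BY dimensions
    cube[row_key][column_key] += event["a"]            # counter-style increment


apply_event({"x": 1, "y": 2, "g": "u", "h": "v", "i": "w", "a": 5})
apply_event({"x": 1, "y": 2, "g": "u", "h": "v", "i": "w", "a": 3})
print(cube[(1, 2)][("u", "v", "w")])  # 8
```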
Queries
SELECT SUM(a) FROM t
  WHERE x = .. AND y = .. GROUP BY g, h, i;
[Diagram: the cube with axes x, y and (g, h, i), cells holding values v]
New query:
•Locate the slice that matches WHERE
•Return all mappings from GROUP BY tuples to cell values
• WHEREs map to a Cassandra row and GROUP BY to a
compound column key in that row (very roughly)
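Read this way, a query is roughly a single row lookup. The sketch below assumes a hypothetical pre-aggregated cube held in a plain dictionary, with made-up keys and values; it only illustrates the row and compound-column mapping described above.

```python
# Hypothetical pre-aggregated cube:
# row key = WHERE values, column key = GROUP BY tuple, value = SUM(a).
cube = {
    ("x1", "y1"): {("g1", "h1", "i1"): 10, ("g1", "h2", "i1"): 4},
    ("x2", "y1"): {("g1", "h1", "i1"): 7},
}


def query(x, y):
    """Answer SELECT SUM(a) ... WHERE x = .. AND y = .. GROUP BY g, h, i."""
    # One row lookup; the row already holds one cell per GROUP BY tuple.
    return dict(cube.get((x, y), {}))


print(query("x1", "y1"))
```

No aggregation happens at read time in this sketch: the work was done when the events were written.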
A concrete example
21:00      all→1345   :00→45     :01→62     :02→87 ...
22:00      all→3221   :00→22     :01→19     :02→104 ...
...
UK         all→228    user01→1   user14→12  user99→7 ...
US         all→354    user01→4   user04→8   user56→17 ...
...
UK, 22:00  all→1904 ...
∅          all→87314  UK→238     US→354 ...
A concrete example
Each event updates multiple aggregates:
{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}
21:00      all→1345   :00→45     :01→62     :02→87 ...
22:00      all→3222   :00→22     :01→19     :02→105 ...
...
UK         all→229    user01→2   user14→12  user99→7 ...
US         all→354    user01→4   user04→8   user56→17 ...
...
UK, 22:00  all→1905 ...
∅          all→87315  UK→239     US→354 ...
Queries served by these rows:
•WHERE time IN (22:00,23:00) GROUP BY minute
•WHERE geography=US GROUP BY user
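The increments in this example can be reproduced with a short sketch. The starting counts are copied from the rows shown above; the set of (row, column) pairs updated per event is inferred from which counts change, and the function name `record` and the in-memory layout are hypothetical.

```python
from collections import defaultdict

# Counts before the event, copied from the example rows.
rows = defaultdict(lambda: defaultdict(int))
rows["22:00"].update({"all": 3221, ":02": 104})
rows["UK"].update({"all": 228, "user01": 1})
rows["UK, 22:00"].update({"all": 1904})
rows["∅"].update({"all": 87314, "UK": 238})


def record(event: dict) -> None:
    """Increment every aggregate the event contributes to."""
    hour, minute = event["time"].split(":")
    hour_key = f"{hour}:00"
    geo = event["geography"]
    updates = [
        (hour_key, "all"), (hour_key, f":{minute}"),
        (geo, "all"), (geo, event["cust_id"]),
        (f"{geo}, {hour_key}", "all"),
        ("∅", "all"), ("∅", geo),
    ]
    for row, column in updates:
        rows[row][column] += 1


record({"cust_id": "user01", "session_id": 102, "geography": "UK",
        "browser": "IE", "time": "22:02"})
print(dict(rows["22:00"]), dict(rows["UK"]))
```

One event produces seven increments here, which is the write amplification that the design accepts in exchange for cheap reads.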
Rich queries
Arithmetic expressions
  SELECT SUM(x)/(MAX(y) - MIN(y) + 0.5) AS 'spread'
  FROM ...
Fast inner joins
  SELECT a - b AS lbound, a + b AS ubound
  FROM
    (SELECT AVG(score) AS a FROM scores WHERE year = 2012)
  JOIN
    (SELECT STDDEV(score) AS b FROM scores)
  USING (school)
Time zone support
  SELECT COUNT UNIQUE (visitors) GROUP BY time(DAY('US/Pacific'))
Hierarchical aggregation
  SELECT SUM(size) FROM ..
  WHERE path MATCHES /usr/*
Drill down to raw events
  SELECT DRILL FROM errors WHERE category IN ("warn", "error")
Limits
  SELECT COUNT (items) FROM ..
  GROUP BY category LIMIT 3, country
Query-time filtering
  ... HAVING AVG(rating) < 2.0 AND COUNT >= 10
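Query-time filtering can be pictured as a pass over groups that are already aggregated. The sketch below assumes, purely for illustration, that each group stores a count and a sum of ratings so that AVG can be derived at read time; the data, the `having` helper and that storage layout are all hypothetical.

```python
# Hypothetical pre-aggregated groups: category -> (count, sum of ratings).
groups = {
    "books": (40, 60.0),  # AVG 1.5
    "games": (5, 5.0),    # AVG 1.0, but too few ratings
    "music": (20, 70.0),  # AVG 3.5
    "tools": (10, 19.0),  # AVG 1.9
}


def having(groups, max_avg=2.0, min_count=10):
    """Keep groups with AVG(rating) < max_avg AND COUNT >= min_count."""
    return {
        key: (count, total / count)
        for key, (count, total) in groups.items()
        if count > 0 and count >= min_count and total / count < max_avg
    }


print(having(groups))
```

With the sample data only "books" and "tools" pass both conditions.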
Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos
are trademarks of the Apache Software Foundation.
Thank You.
Tim Moreton CTO
@timmoreton