MongoDB Revs You Up: What Storage Engine is Right for You?
Jon Tobin, Dir. of Solution Eng.
---------------------
Jon.Tobin@percona.com
@jontobs
Linkedin.com/in/jonathanetobin
www.percona.com
Agenda
• How did we get here?
• What storage engines are available?
• Why does the data structure matter?
• What makes them unique?
• Where should I start my evaluation?
• How can I evaluate the engines?
First: Background
Let’s Level Set
{
  "_id" : ObjectId("507f1f77bcf86cd799439011"),
  "studentID" : 100,
  "firstName" : "Jonathan",
  "middleName" : "Eli",
  "lastName" : "Tobin",
  "classes" : [
    {
      "courseID" : "PHY101",
      "grade" : "B",
      "courseName" : "Physics 101",
      "credits" : 3
…
MongoDB History
• MMAP rules the universe; concurrency suffers
• Per mongod lock
• v1.2.0 – December 10th 2009
• v2.0.7 – August 9th 2012
• Per database lock
• v2.2.0 – August 29th 2012
• v2.6.8 – August 25th 2015
• Concurrency!!
• Per document lock
• MongoDB, Inc acquires wiredTiger – December 16th, 2014
• v3.0 – March 3rd 2015
• First implementation of a storage engine API
Storage Engines & Data Structures
MongoDB Storage Engines
MongoDB, Inc. & Percona Server for MongoDB
• MMAP
• wiredTiger
MongoDB Enterprise Advanced Only
• In Memory
• Encrypted (wiredTiger)
Percona Server for MongoDB
• PerconaFT
• RocksDB
B-tree Overview
B-tree Insert
Pivot Rule >=
B-tree Search
B-tree - Importance of I/O
AWS – Insert 200M Rows – Predictable I/O Response VS Not
15 hours VS 91 hours (about 6x)
What’s the Problem?
Performance is I/O limited when data is > RAM
Each insert/update requires at least 1 I/O
plus an I/O for every extra index
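The rule of thumb above can be turned into a quick back-of-the-envelope estimate. This is only a sketch: the helper names `ios_per_write` and `total_ios` are made up for illustration, and the model assumes the worst case where the data is larger than RAM, so every touched B-tree costs a real I/O.

```python
def ios_per_write(extra_indexes: int) -> int:
    """Lower bound from the slide: 1 I/O for the write, plus 1 per extra index."""
    if extra_indexes < 0:
        raise ValueError("extra_indexes must be non-negative")
    return 1 + extra_indexes


def total_ios(writes: int, extra_indexes: int) -> int:
    """Minimum I/Os for a batch of writes when nothing is served from cache."""
    return writes * ios_per_write(extra_indexes)


if __name__ == "__main__":
    # 200M rows is the insert count quoted on the earlier B-tree slide;
    # the index counts are arbitrary illustration values.
    for extra in (0, 2, 4):
        print(f"{extra} extra indexes -> at least {total_ios(200_000_000, extra):,} I/Os")
```

The point is the multiplier: every extra index adds a full set of I/Os to the same batch of writes.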
What’s Up With: MMAP
Overview
• Very basic “storage engine”
• Collection level lock
• Highly reliant on the OS for caching
• Uses b-tree indexes to point to disk offset
• At the offset is the “record”
• In the record is the document
Best Use
• In place updates
• Record migration should be minimized
• $inc, $set, etc
• Read only*
What’s Up With: MMAP
Problems
• Record allocation is fixed size
• Space inefficient (power-of-2 record sizes)
• What if document grows bigger than record?
Probably not for you. Going the “way of the dodo”
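To see why fixed power-of-2 records cost space, and what happens when a document outgrows its record, here is a toy model. It is a sketch only: the function names are hypothetical, and it ignores any minimum or maximum record sizes a real MMAP implementation applies.

```python
def power_of_2_allocation(doc_bytes: int) -> int:
    """Smallest power of two >= doc_bytes (toy model of power-of-2 record sizing)."""
    if doc_bytes <= 0:
        raise ValueError("doc_bytes must be positive")
    return 1 << (doc_bytes - 1).bit_length()


def wasted_fraction(doc_bytes: int) -> float:
    """Share of the allocated record that holds padding instead of data."""
    allocated = power_of_2_allocation(doc_bytes)
    return (allocated - doc_bytes) / allocated


def fits_in_place(record_bytes: int, new_doc_bytes: int) -> bool:
    """True if an updated document still fits its record; otherwise it must move."""
    return new_doc_bytes <= record_bytes


if __name__ == "__main__":
    for size in (1000, 1025, 3000):
        record = power_of_2_allocation(size)
        print(f"{size} B document -> {record} B record, "
              f"{wasted_fraction(size):.0%} padding")
    # A document that grows past its record cannot be updated in place.
    print(fits_in_place(power_of_2_allocation(1000), 1100))
```

A document just over a power of two wastes close to half of its record, and a document that grows past its record has to be migrated, which is the case the "Best Use" slide says to minimize.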
What’s Up With: wiredTiger
Overview
• Concurrency: Document level
• Supports multiple data structures
• B-tree (v3.0 +)
• LSM tree (v3.2 +)
• Controls cache
Best Use
• Depends on data structure
• B-tree: reads (point or small range) / dataset close to cache
• LSM: random updates
Promising but still a bit of a “black box”
What’s Up With: RocksDB
Overview
• Written & maintained by Facebook
• Cut its teeth at Parse
• Data Structure = LSM Trees
• Uses Google’s LevelDB API
• Space efficient + compression
• Excellent core scaling
Best Use
• Point queries
• Updates
• Easy incremental backups
Has very advanced functionality. Lots of potential
What’s Up With: LSMs
[Diagram: memtbl above Level 0 through Level 4]
• Writes go to the memtable + journal
• The memtable fills up and overflows (flushes) to file(s)
• Files are read only
• Acts like layers of logs
• Files are eventually merged and old files are marked for deletion
• Files are like small structured trees
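The write path in the bullets above can be sketched as a toy LSM store. This is an illustration under loud assumptions, not how RocksDB or wiredTiger implement it: the class `ToyLSM` and its methods are hypothetical, files are plain in-memory tuples, there is a single level, and the journal is never truncated.

```python
class ToyLSM:
    """Toy log-structured merge store: memtable + journal + immutable sorted files."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit
        self.journal = []   # append-only record of every write
        self.memtable = {}  # in-memory buffer of the most recent writes
        self.files = []     # immutable sorted runs, newest first

    def put(self, key, value):
        # Writes go to the journal and the memtable.
        self.journal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A full memtable overflows into a new read-only file (a sorted tuple).
        if not self.memtable:
            return
        self.files.insert(0, tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # Check the newest data first: memtable, then files from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.files:
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all files into one; newer values win, old files are dropped.
        merged = {}
        for run in reversed(self.files):  # oldest first, so newer runs overwrite
            merged.update(run)
        self.files = [tuple(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
        db.put(key, value)
    print(len(db.files), db.get("a"))
    db.compact()
    print(db.files)
```

Note that an update never rewrites an existing file; the new value simply lands in a newer layer and shadows the old one until a merge cleans it up.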
What’s Up With: LSMs – Range Ops
[Diagram: memtbl above Level 0 through Level 4]
• Range scans are tough
• Each file is its own tree
• No good way to tell if data lies in any file
• Read amplification is H-I-G-H
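A small sketch of why range scans hurt: because any file may hold keys in the requested range, every file has to be probed and the results merged. The function `range_scan` and its statistics are hypothetical, and each "file" is just a sorted Python list of (key, value) pairs, listed newest first.

```python
import heapq
from bisect import bisect_left, bisect_right


def _slice_run(run, age, lo, hi):
    """Entries of one sorted run with lo <= key <= hi, tagged with the run's age."""
    keys = [k for k, _ in run]
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return [(k, age, v) for k, v in run[start:end]]


def range_scan(runs, lo, hi):
    """Merge every run (newest first) and keep the newest value per key."""
    slices = [_slice_run(run, age, lo, hi) for age, run in enumerate(runs)]
    results = []
    last_key = object()  # sentinel that equals no real key
    # Tuples sort by key, then by age, so the newest version of a key comes first.
    for key, _age, value in heapq.merge(*slices):
        if key != last_key:
            results.append((key, value))
            last_key = key
    stats = {
        "runs_probed": len(runs),                    # every file must be consulted
        "entries_read": sum(len(s) for s in slices), # includes shadowed old versions
    }
    return results, stats


if __name__ == "__main__":
    runs = [
        [("b", "b2"), ("d", "d2")],               # newest file
        [("a", "a1"), ("b", "b1"), ("c", "c1")],
        [("c", "c0"), ("e", "e0")],               # oldest file
    ]
    print(range_scan(runs, "b", "d"))
```

In this toy run the scan reads five entries from three files to return three rows; that gap between work done and rows returned is the read amplification.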
What’s Up With: LSMs – Point Ops
[Diagram: memtbl above Level 0 through Level 4]
• Point operations are tough too
• However, Bloom filters work well
• Filter determines if the required info exists in a set
• Can have false positives
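A minimal Bloom filter sketch shows how a point lookup can skip files that cannot contain a key. The class below is illustrative only: the sizes, the SHA-256-based hashing, and the names are assumptions, not what any particular engine uses.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, kept simple for clarity

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely not here"; True means "possibly here".
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    file_filter = BloomFilter()
    for student_id in (100, 101, 102):
        file_filter.add(f"studentID:{student_id}")
    # An added key is always reported as possibly present.
    print(file_filter.might_contain("studentID:100"))
    # A key that was never added is usually, but not always, rejected.
    print(file_filter.might_contain("studentID:999"))
```

With one filter per file, a point query only opens the files whose filter answers "possibly here"; a false positive costs one wasted file read, nothing worse.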
Fractal Tree Indexes
Fractal - Insert
Fractal – Message Injection
What’s Up With: PerconaFT
Overview
• Developed by MIT, SUNY Stony Brook & Rutgers
• Concurrency: Document level
• Unique data structure
• Fractal Tree
• Controls cache
• Compresses well (quicklz, zlib, lzma)
Best Uses
• Best compression
• CPU efficient (relatively)
• Sequential workloads
Still developing as a pluggable engine. Needs to learn API
Benchmarks
Disclaimer: They’re just benchmarks. It’s all made up.
(like economics & meteorology)
Insert Workload
collections = 8
database name = sbtest
writer threads = 16
documents per collection = 10,000,000
feedback seconds = 20
auto commit = N
run seconds = 1200
oltp range size = 100
oltp point selects = 0
oltp simple ranges = 0
oltp sum ranges = 0
oltp order ranges = 0
oltp distinct ranges = 0
oltp index updates = 0
oltp non index updates = 0
oltp inserts = 20
Applies to all benchmarks in this presentation
What’s Up With: Writes
[Chart: Mongo Engines - Write TPS. Y-axis: TPS (0 to 800); x-axis: Elapsed Seconds (20 to 1180); series: PerconaFT, wiredTiger, RocksDB]
Read Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 10
oltp simple ranges = 1
oltp sum ranges = 1
oltp order ranges = 1
oltp distinct ranges = 1
oltp index updates = 0
oltp non index updates = 0
oltp inserts = 0
What’s Up With: Reads
[Chart: Mongo Engines - Read TPS. Y-axis: 0 to 1600 (untitled); x-axis: 20 to 1180 (untitled); series: PerconaFT, wiredTiger, RocksDB]
Update Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 0
oltp simple ranges = 0
oltp sum ranges = 0
oltp order ranges = 0
oltp distinct ranges = 0
oltp index updates = 50
oltp non index updates = 5
oltp inserts = 0
What’s Up With: Updates
[Chart: Mongo Engines - Updates. Y-axis: TPS (0 to 500); x-axis: Elapsed Seconds (20 to 1180); series: PerconaFT, wiredTiger, RocksDB]
Mixed Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 10
oltp simple ranges = 1
oltp sum ranges = 1
oltp order ranges = 1
oltp distinct ranges = 1
oltp index updates = 50
oltp non index updates = 5
oltp inserts = 10
What’s Up With: Mixed Workloads
[Chart: Mongo Engines - Mixed. Y-axis: TPS (0 to 250); x-axis: Elapsed Seconds (20 to 1180); series: PerconaFT, wiredTiger, RocksDB]
Evaluation Resources
• Flashback – replay Mongo operations in real time, or as fast as possible, with your workload
• benchRun – JavaScript benchmark harness in MongoDB. Cuts out driver problems
• Sysbench & iiBench for Mongo
• Yahoo! Cloud Serving Benchmark (YCSB)
• Mongo-perf
*Whenever possible, run with YOUR workload, or a workload that accurately simulates yours.
Percona Live
Data Performance Conference
• April 18-21 in Santa Clara, CA at the Santa Clara Convention Center
• Register with code “WebinarPL” to receive 15% off at registration
• MySQL, NoSQL, Data in the Cloud
www.perconalive.com
