Netflix's Transition to High-Availability Storage (QCon SF 2010)
Coordinates
Twitter @r39132
#netflixcloud
Blog http://practicalcloudcomputing.com
LinkedIn http://www.linkedin.com/in/siddharthanand
@r39132 - #netflixcloud
Why Are You Here?
"What I need is an exact list of specific unknown problems we might encounter."
-- anonymous
Motivation
• Circa late 2008, Netflix had a single data center
  • Single point of failure (a.k.a. SPOF)
  • Approaching limits on cooling, power, space, and traffic capacity
• Alternatives
  • Build more data centers
  • Outsource the majority of our capacity planning and scale out
Motivation
• Winner: Outsource the majority of our capacity planning and scale out
  • Leverage a leading Infrastructure-as-a-Service provider
  • Amazon Web Services
• Footnote: Because it has taken us a while (i.e. ~2+ years) to realize our vision of running on the cloud, we needed an interim solution to handle growth
  • We did build a second data center along the way
  • We did outgrow it
Cloud Migration Strategy
• Components
  • Applications and Software Infrastructure
  • Data
• Migration Considerations
  • Security
    • PII and PCI DSS data stay in our DC; the rest can go to the cloud
  • Scalability and Availability for Business Success
Cloud Migration Strategy
• Scalability and Availability for Business Success
  • High Growth or High Traffic Growth Data
    • Video starts, Personalized Video choosing
  • High Traffic Growth Applications
    • Same as above
    • Log Processing
  • Time-to-market Critical Batch Processing
    • Video encoding
• Not Included
  • DVD inventory and shipment
  • We are a streaming company that also ships DVDs
Cloud Migration Strategy
Examples of Data that can be moved
• Video-centric data
  • Critics’ reviews
  • Metadata
• User-video-centric data – some of our largest data sets
  • User-video queue
  • Previously streamed and shipped video history
  • Ratings (i.e. a 5-star rating system)
  • Video streaming metadata (e.g. streaming bookmarks)
Cloud Migration Strategy
• High-level Requirements for our Site
  • No big-bang migrations
  • New functionality needs to launch in the cloud when possible
• High-level Requirements for our Data
  • Data needs to migrate before applications
  • Data needs to be shared between applications running in the cloud and our data center during the transition period
Cloud Migration Strategy
• Low-level Requirements for our Data
  • Pick a (key-value) data store in the cloud
  • Challenges
    • Translate RDBMS concepts to KV store concepts
    • Work-around Issues specific to the chosen KV store
    • Create a bi-directional DC-Cloud data replication pipeline
Pick a Data Store in the Cloud
An ideal storage solution should have the following features:
• Hosted
• Managed Distribution Model
• Works in AWS
• AP from CAP
• Handles a majority of use-cases accessing high-growth, high-traffic data
  • Specifically, key access by customer id, movie id, or both
Pick a Data Store in the Cloud
• We picked SimpleDB and S3
  • SimpleDB was targeted as the AP equivalent of the RDBMS databases in our Data Center
  • S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. data that does not require secondary indices or complex query semantics)
    • Video encodes
    • Streaming device activity logs (i.e. CLOB, BLOB, etc.)
    • Compression of old Rental History
Technology Overview : SimpleDB
Terminology
SimpleDB  | Hash Table              | Relational Databases
Domain    | Hash Table              | Table
Item      | Entry                   | Row
Item Name | Key                     | Mandatory Primary Key
Attribute | Part of the Entry Value | Column
Technology Overview : SimpleDB
Soccer Players
Key         | Value
ab12ocs12v9 | First Name = Harold; Last Name = Kewell; Nickname = Wizard of Oz; Teams = Leeds United, Liverpool, Galatasaray
b24h3b3403b | First Name = Pavel; Last Name = Nedved; Nickname = Czech Cannon; Teams = Lazio, Juventus
cc89c9dc892 | First Name = Cristiano; Last Name = Ronaldo; Teams = Sporting, Manchester United, Real Madrid
SimpleDB’s salient characteristics
• SimpleDB offers a range of consistency options
• SimpleDB domains are sparse and schema-less
• The Key and all Attributes are indexed
• Each item must have a unique Key
• An item contains a set of Attributes
• Each Attribute has a name
• Each Attribute has a set of values
• All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
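The characteristics above can be pictured with a small in-memory model. This is only a sketch: the `domain` structure and the `put` helper are hypothetical stand-ins, not the SimpleDB API, and the "Shirt Number" attribute is invented to show that numbers end up stored as strings.

```python
from collections import defaultdict

# Toy in-memory model of one domain (hypothetical, not the AWS SDK):
# item name -> attribute name -> set of string values.
domain = defaultdict(lambda: defaultdict(set))

def put(item_name, attribute, value):
    """Add one value to a multi-valued attribute; everything is stored as a string."""
    domain[item_name][attribute].add(str(value))

put("ab12ocs12v9", "First Name", "Harold")
put("ab12ocs12v9", "Last Name", "Kewell")
put("ab12ocs12v9", "Nickname", "Wizard of Oz")
for team in ("Leeds United", "Liverpool", "Galatasaray"):
    put("ab12ocs12v9", "Teams", team)

put("cc89c9dc892", "First Name", "Cristiano")
put("cc89c9dc892", "Last Name", "Ronaldo")
put("cc89c9dc892", "Shirt Number", 7)   # invented attribute; stored as the string "7"

# Sparse and schema-less: the second item simply has no "Nickname" attribute.
has_nickname = "Nickname" in domain["cc89c9dc892"]
```

Each attribute holds a set of values, which is why "Teams" can carry several entries for a single item.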
Technology Overview : SimpleDB
What does the API look like?
• Manage Domains
  • CreateDomain
  • DeleteDomain
  • ListDomains
  • DomainMetadata
• Access Data
  • Retrieving Data
    • GetAttributes – returns a single item
    • Select – returns multiple items using SQL syntax
  • Writing Data
    • PutAttributes – put single item
    • BatchPutAttributes – put multiple items
  • Removing Data
    • DeleteAttributes – delete single item
    • BatchDeleteAttributes – delete multiple items
Technology Overview : SimpleDB
• Options available on reads and writes
  • Consistent Read
    • Read the most recently committed write
    • May have lower throughput / higher latency / lower availability
  • Conditional Put/Delete
    • i.e. Optimistic Locking
    • Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy
    • We do not use this currently, so we don’t know how it performs
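A conditional put used as an optimistic lock can be sketched as a read-modify-write loop that retries when another writer got in first. The `ToyDomain` class, the `add_to_queue` helper, and the "version" and "queue" attribute names are all hypothetical; this is an in-memory illustration under those assumptions, not the SimpleDB client.

```python
class ConditionFailed(Exception):
    """Raised when the expected value does not match what is stored."""

class ToyDomain:
    """Hypothetical in-memory stand-in for a domain; not the AWS SDK."""
    def __init__(self):
        self.items = {}

    def get(self, name):
        return dict(self.items.get(name, {}))

    def conditional_put(self, name, attrs, expected_attr, expected_value):
        """Write attrs only if expected_attr currently equals expected_value.
        expected_value=None means the attribute must not exist yet."""
        current = self.items.get(name, {}).get(expected_attr)
        if current != expected_value:
            raise ConditionFailed(
                f"{expected_attr}: expected {expected_value!r}, found {current!r}")
        self.items.setdefault(name, {}).update(attrs)

def add_to_queue(domain, customer_id, video_id, max_attempts=5):
    """Read-modify-write with optimistic locking; retry if the version moved."""
    for _ in range(max_attempts):
        item = domain.get(customer_id)
        version = item.get("version")            # None on the first write
        queue = item.get("queue", "")
        new_queue = f"{queue},{video_id}" if queue else video_id
        new_version = str(int(version or 0) + 1)
        try:
            domain.conditional_put(
                customer_id, {"queue": new_queue, "version": new_version},
                "version", version)
            return new_version
        except ConditionFailed:
            continue                             # another writer won; re-read and retry
    raise RuntimeError("too much contention")
```

The write succeeds only when the version read earlier is still the stored one, so a concurrent update is detected rather than silently overwritten.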
Translate RDBMS Concepts to Key-Value Store Concepts
• Relational Databases are known for relations
• First, a quick refresher on Normal forms
Normalization
NF1: All occurrences of a record type must contain the same number of fields; variable repeating fields and groups are not allowed
NF2: Second normal form is violated when a non-key field is a fact about a subset of a key
Violated here:
  Part | Warehouse | Quantity | Warehouse-Address
Fixed here:
  Part | Warehouse | Quantity
  Warehouse | Warehouse-Address
Normalization
• Issues with the table that violates NF2
  • Wastes Storage
    • The warehouse address is repeated for every Part-WH pair
  • Update Performance Suffers
    • If the address of the warehouse changes, I must update many Part-WH pairs
  • Data Inconsistencies Possible
    • I can update the warehouse address for one Part-WH pair and miss other Parts for the same WH
  • Data Loss Possible
    • If at some point in time there are no parts, the WH address will be lost
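The update and data-loss issues can be reproduced with a few rows. The part names, warehouse id, and addresses below are invented for illustration.

```python
# Violates second normal form: the address rides along with every (part, warehouse) row.
inventory_flat = [
    {"part": "bolt", "warehouse": "WH1", "quantity": 10, "warehouse_address": "1 Main St"},
    {"part": "nut",  "warehouse": "WH1", "quantity": 25, "warehouse_address": "1 Main St"},
]

# Update anomaly: changing only one row leaves two different addresses for WH1.
inventory_flat[0]["warehouse_address"] = "9 Oak Ave"
addresses_seen = {row["warehouse_address"]
                  for row in inventory_flat if row["warehouse"] == "WH1"}

# Fixed: the address is a fact about the warehouse alone, so it gets its own table.
inventory = [
    {"part": "bolt", "warehouse": "WH1", "quantity": 10},
    {"part": "nut",  "warehouse": "WH1", "quantity": 25},
]
warehouses = {"WH1": {"address": "1 Main St"}}

warehouses["WH1"]["address"] = "9 Oak Ave"   # a single update, nothing to miss
inventory.clear()                             # no parts left, the address survives
```

After the partial update, `addresses_seen` holds two addresses for the same warehouse, which is the inconsistency the slide warns about.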
Normalization
• RDBMS → KV Store migrations can’t simply accept denormalization!
  • Especially for many-to-many and many-to-one entity relationships
• Instead, pick your data set candidates carefully!
  • Keep relational data in RDBMS
  • Move key look-ups to KV stores
• Luckily for Netflix, most data is accessed by Customer, Video, or both: i.e. Key Lookups
Translate RDBMS Concepts to Key-Value Store Concepts
• Aside from relations, relational databases typically offer the following:
  • Transactions
  • Locks
  • Sequences
  • Triggers
  • Clocks
  • A structured query language (i.e. SQL)
  • Database server-side coding constructs (e.g. PL/SQL)
  • Constraints
Translate RDBMS Concepts to Key-Value Store Concepts
• Partial or no SQL support. Loosely speaking, SimpleDB supports a subset of SQL
  • BEST PRACTICE
    • Do GROUP BY and JOIN operations in the application layer, on smallish data sets
• No relations between domains
  • BEST PRACTICE
    • Compose relations in the application layer
• No transactions
  • BEST PRACTICE
    • Use SimpleDB’s Optimistic Concurrency Control API: conditional puts and conditional deletes
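Doing JOIN and GROUP BY in the application layer might look like the sketch below. The two lists stand in for the results of separate Select calls against two domains; the attribute names and values are invented.

```python
from collections import Counter

# Hypothetical results of two separate Select calls, one per domain.
ratings = [
    {"customer_id": "c1", "video_id": "v1", "stars": "5"},
    {"customer_id": "c2", "video_id": "v1", "stars": "3"},
    {"customer_id": "c1", "video_id": "v2", "stars": "4"},
]
videos = [
    {"video_id": "v1", "title": "Title A"},
    {"video_id": "v2", "title": "Title B"},
]

# JOIN: build a hash index on the smaller side, then probe it for each rating.
title_by_id = {v["video_id"]: v["title"] for v in videos}
joined = [{**r, "title": title_by_id[r["video_id"]]}
          for r in ratings if r["video_id"] in title_by_id]

# GROUP BY video_id with COUNT(*).
ratings_per_video = Counter(r["video_id"] for r in ratings)
```

Both steps hold the full result sets in memory, which is why the advice is limited to smallish data sets.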
Translate RDBMS Concepts to Key-Value Store Concepts
• No schema. This is non-obvious: a query for a misspelled attribute name will not fail with an error
  • BEST PRACTICE
    • Implement a schema validator in a common data access layer
• No sequences
  • BEST PRACTICE
    • Sequences are often used as primary keys
      • In this case, use a naturally occurring unique key
      • If no naturally occurring unique key exists, use a UUID
    • Sequences are also often used for ordering
      • Use a distributed sequence generator
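A schema validator in a shared data access layer, combined with UUID item names, could be sketched as follows. The `SCHEMAS` registry, the `put_item` helper, and the "ratings" attribute names are assumptions made for illustration; `store` is a plain dict standing in for the data store.

```python
import uuid

# Hypothetical schema registry: domain name -> allowed attribute names (case-sensitive).
SCHEMAS = {"ratings": {"customer_id", "video_id", "stars", "version"}}

def validate(domain, attributes):
    """Reject attribute names the domain does not declare, since the store will not."""
    unknown = set(attributes) - SCHEMAS[domain]
    if unknown:
        raise ValueError(f"unknown attributes for {domain}: {sorted(unknown)}")

def new_item_name():
    """Random (version 4) UUID for items with no naturally occurring unique key."""
    return str(uuid.uuid4())

def put_item(store, domain, attributes, item_name=None):
    """Validate first, then write; generate an item name when none is supplied."""
    validate(domain, attributes)
    item_name = item_name or new_item_name()
    store.setdefault(domain, {})[item_name] = dict(attributes)
    return item_name
```

With this in place, a mis-cased name such as "Stars" raises an error in the application instead of being written as a new attribute.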
Translate RDBMS Concepts to Key-Value Store Concepts
• No clock operations, PL/SQL, or triggers
  • BEST PRACTICE
    • Do without
• No constraints. Specifically,
  • No uniqueness constraints
  • No foreign key or referential constraints
  • No integrity constraints
  • BEST PRACTICE
    • Read Repair and Anti-entropy processes using Conditional Put/Delete
Work-around Issues specific to the chosen KV store
• Missing / Strange Functionality
  • No back-up and recovery
  • No native support for types (e.g. Number, Float, Date, etc.)
  • You cannot update one attribute and null out another one for an item in a single API call
  • Mis-cased or misspelled attribute names in operations fail silently. Why is SimpleDB case-sensitive?
  • Neglecting "limit N" returns a subset of information. Why does the absence of an optional parameter not return all of the data?
  • Users need to deal with data set partitioning
• Beware of Nulls
• Poor Performance
Work-around Issues specific to the chosen KV store
No Native Types – Sorting, Inequality Conditions, etc.
• Since sorting is lexicographical, if you plan on sorting by certain attributes, then
  • zero-pad logically numeric attributes
    • e.g.
      • 000000000000000111111 ← this is bigger
      • 000000000000000011111
  • use Joda-Time to store logical dates (as ISO 8601 strings)
    • e.g.
      • 2010-02-10T01:15:32.864Z ← this is more recent
      • 2010-02-10T01:14:42.864Z
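Both encodings can be sketched in a few lines. Python's `datetime` stands in for Joda-Time here, and `WIDTH` is an assumed fixed width; pick one that fits the largest value you expect to store.

```python
from datetime import datetime, timezone

WIDTH = 21  # assumed fixed width for padded numbers

def encode_number(n):
    """Zero-pad a non-negative integer so that string order equals numeric order."""
    if n < 0:
        raise ValueError("negative numbers need an offset before padding")
    return str(n).zfill(WIDTH)

def encode_timestamp(dt):
    """Fixed-width ISO 8601 UTC timestamp with millisecond precision."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

later = encode_timestamp(datetime(2010, 2, 10, 1, 15, 32, 864000, tzinfo=timezone.utc))
earlier = encode_timestamp(datetime(2010, 2, 10, 1, 14, 42, 864000, tzinfo=timezone.utc))

# Without padding, string comparison gets numbers wrong: "9" sorts after "10".
unpadded_is_wrong = "9" > "10"
padded_is_right = encode_number(9) < encode_number(10)
```

Keeping every timestamp in UTC and at the same precision is what makes the string comparison agree with chronological order.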
Work-around Issues specific to the chosen KV store
• Anti-pattern: Avoid Select SOME_FIELD_1 from MY_DOMAIN where SOME_FIELD_2 is null, as this is a full domain scan
  • Nulls are not indexed in a sparse table
  • BEST PRACTICE
    • Instead, replace this check with an (indexed) flag column called IS_FIELD_2_NULL: Select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y'
• Anti-pattern: When selecting data from a domain and sorting by an attribute, items missing that attribute will not be returned
  • In Oracle, rows with null columns are still returned
  • BEST PRACTICE
    • Use a flag column as shown previously
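Maintaining the flag column is easiest at write time, in the data access layer. The `with_null_flags` helper below is a hypothetical sketch of that idea; only the attribute and flag names come from the slide.

```python
def with_null_flags(attributes, flags):
    """Return a copy of attributes with a 'Y'/'N' flag per nullable attribute.

    `flags` maps a nullable attribute name to its flag attribute name.
    Attributes whose value is None are left out of the item (sparse storage)
    and flagged 'Y'; attributes that are present are flagged 'N'.
    """
    item = {k: v for k, v in attributes.items() if v is not None}
    for attribute, flag in flags.items():
        item[flag] = "N" if attribute in item else "Y"
    return item

item = with_null_flags(
    {"SOME_FIELD_1": "abc", "SOME_FIELD_2": None},
    flags={"SOME_FIELD_2": "IS_FIELD_2_NULL"},
)

# The null check then becomes an equality test on an indexed attribute.
query = "select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y'"
```

Because every item now carries the flag, it can also serve as the attribute to filter on when missing values would otherwise drop items from a sorted result.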
Work-around Issues specific to the chosen KV store
• BEST PRACTICE: Aim for high index selectivity when you formulate your select expressions for best performance
  • SimpleDB select performance is sensitive to index selectivity
• Index Selectivity
  • Definition: # of distinct attribute values in specified attribute / # of items in domain
  • e.g. Good Index Selectivity (i.e. 1 is the best)
    • If a table has 100 records and one of its indexed columns has 88 distinct values, then the selectivity of this index is 88 / 100 = 0.88
  • e.g. Bad Index Selectivity
    • If an index on a table of 1000 records has only 5 distinct values, then the index's selectivity is 5 / 1000 = 0.005
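The definition translates directly into code. The sample data below is synthetic and only reproduces the two ratios from the slide.

```python
def index_selectivity(values):
    """Distinct attribute values divided by the number of items (1.0 is the best)."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty domain")
    return len(set(values)) / len(values)

good = index_selectivity([i % 88 for i in range(100)])   # 88 distinct values in 100 items
bad = index_selectivity([i % 5 for i in range(1000)])    # 5 distinct values in 1000 items
```

Running this kind of check on a sample of a domain before writing a select expression gives a rough idea of which attribute to filter on first.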
Work-around Issues specific to the chosen KV store
Sharding Domains
• There are 2 reasons to shard domains
  • You are trying to avoid running into one of the sizing limits
    • e.g. 10GB of space or 1 Billion Attributes
  • You are trying to scale your writes
    • To scale your writes further, use BatchPutAttributes and BatchDeleteAttributes where possible
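One way to route keys to sharded domains is a stable hash of the key, sketched below. The slides do not say how keys were assigned to shards, so the hashing scheme, the `RATINGS` base name, and both helpers are assumptions.

```python
import hashlib

def shard_domain(base_name, key, num_shards):
    """Map a key to one of num_shards domains using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to keep the mapping stable across processes and hosts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{base_name}_{int(digest, 16) % num_shards}"

def group_by_shard(base_name, keys, num_shards):
    """Bucket keys per shard so that each bucket can be written as a batch."""
    buckets = {}
    for key in keys:
        buckets.setdefault(shard_domain(base_name, key, num_shards), []).append(key)
    return buckets

buckets = group_by_shard("RATINGS", [f"customer-{i}" for i in range(100)], num_shards=4)
```

Note that changing `num_shards` remaps most keys with this scheme, so the shard count is best chosen with growth in mind.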
Create a Bi-directional DC-Cloud Data Replication Pipeline
• Home-grown Data Replication Framework known as IR, for Item Replication
• 2 schemes in use currently
  • Polls the main table (a.k.a. Simple IR)
    • Doesn’t capture deletes, but easy to implement
  • Polls a journal table that is populated via a trigger on the main table (a.k.a. Trigger-journaled IR)
    • Captures every CRUD operation, but requires the development of triggers
Create a Bi-directional DC-Cloud Data Replication Pipeline
• How often do we poll Oracle?
  • Every 5 seconds
• What does the poll query look like?
    select *
    from QLOG_0
    where LAST_UPDATE_TS > :CHECKPOINT        -- get recent
      and LAST_UPDATE_TS < :NOW_MINUS_30s     -- exclude most recent
    order by LAST_UPDATE_TS                   -- process in order
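The polling window can be sketched with an in-memory SQLite table standing in for Oracle. Only `QLOG_0` and `LAST_UPDATE_TS` come from the slide; the `ITEM_KEY` column, the `poll_once` helper, and the way the checkpoint is advanced are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def poll_once(conn, checkpoint, now, lag=timedelta(seconds=30)):
    """Return rows changed after `checkpoint` but older than `now - lag`,
    oldest first, plus the checkpoint to use for the next poll."""
    upper = (now - lag).isoformat()
    rows = conn.execute(
        "select ITEM_KEY, LAST_UPDATE_TS from QLOG_0 "
        "where LAST_UPDATE_TS > ? and LAST_UPDATE_TS < ? "
        "order by LAST_UPDATE_TS",
        (checkpoint, upper),
    ).fetchall()
    new_checkpoint = rows[-1][1] if rows else checkpoint
    return rows, new_checkpoint

# In-memory SQLite stands in for Oracle; timestamps are stored as ISO 8601 strings.
conn = sqlite3.connect(":memory:")
conn.execute("create table QLOG_0 (ITEM_KEY text, LAST_UPDATE_TS text)")
now = datetime(2010, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
for key, age in [("a", 120), ("b", 60), ("c", 10)]:    # seconds before `now`
    ts = (now - timedelta(seconds=age)).isoformat()
    conn.execute("insert into QLOG_0 values (?, ?)", (key, ts))

rows, checkpoint = poll_once(conn, checkpoint="", now=now)
```

Row "c" is only 10 seconds old, so the first poll skips it; it is picked up by a later poll once it falls outside the 30-second exclusion window.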
Create a Bi-directional DC-Cloud Data Replication Pipeline
• Data Replication Challenges & Best Practices
  • SimpleDB throttles traffic aggressively via 503 HTTP Response codes (“Service Unavailable”)
    • With singleton writes, I see 70-120 write TPS/domain
  • IR
    • Shards domains (i.e. partitions data sets) to work around these limits
    • Employs slow ramp-up
    • Uses BatchPutAttributes instead of (singleton) PutAttributes calls
    • Exercises an exponential bounded back-off algorithm
    • Uses attribute-level replace=false when fork-lifting data
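An exponential bounded back-off can be sketched as below. The delays, the retry count, and the added jitter are assumptions; the slides do not give IR's actual parameters, and `ServiceUnavailable` is a stand-in for a 503 response.

```python
import random
import time

class ServiceUnavailable(Exception):
    """Stand-in for an HTTP 503 response from the store."""

def backoff_delays(base=0.1, cap=5.0, max_retries=6):
    """Yield exponentially growing delays, bounded by `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))

def call_with_backoff(operation, sleep=time.sleep, jitter=random.random, **kwargs):
    """Run `operation`, retrying on ServiceUnavailable with bounded back-off."""
    for delay in backoff_delays(**kwargs):
        try:
            return operation()
        except ServiceUnavailable:
            # Jitter (an addition, not from the slides) spreads out retrying writers.
            sleep(delay * (0.5 + 0.5 * jitter()))
    return operation()   # final attempt; let the exception propagate
```

The cap keeps a long throttling episode from pushing the wait time ever higher, while the doubling gives the service room to recover.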
Create a Bi-directional DC-Cloud Data Replication Pipeline
• Data Replication Challenges & Best Practices
  • Implementing Multi-mastering and an Eventually-consistent Replication Pipeline
    • SimpleDB offers optimistic concurrency control in the form of conditional puts (and deletes)
    • For our data, it is ok to be “consistent, but not accurate”
    • With this relaxation, we do not need to be concerned with synchronizing logical clocks
    • We simply need to ensure that each conditional put puts a larger, strictly increasing value into the “version” column
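The version rule can be sketched as "apply a replicated change only if its version is larger than the stored one". The `apply_if_newer` helper, the plain dict standing in for a domain, and the integer versions are assumptions; the slides do not say how version values were generated.

```python
def apply_if_newer(store, item_name, attrs, version):
    """Apply a replicated change only if its version is larger than the stored one.

    Models a conditional put on a "version" attribute. Returns True if the
    write was applied, False if it was stale or a duplicate.
    """
    current = store.get(item_name)
    if current is not None and int(current["version"]) >= version:
        return False
    # Versions are stored as zero-padded strings, since the store has no numeric type.
    store[item_name] = {**attrs, "version": f"{version:020d}"}
    return True

cloud = {}
updates = [
    ("c1", {"stars": "4"}, 2),   # newest change arrives first
    ("c1", {"stars": "3"}, 1),   # older change arrives late and is dropped
    ("c1", {"stars": "4"}, 2),   # replayed duplicate is dropped
]
results = [apply_if_newer(cloud, *u) for u in updates]
```

Because stale and duplicate updates are ignored, both sides converge on the value with the largest version regardless of delivery order.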
@r39132 - #netflixcloud 44
Netflix's Transition to High-Availability Storage (QCon SF 2010)
Netflix's Transition to High-Availability Storage (QCon SF 2010)

More Related Content

PDF
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Sid Anand
 
PDF
Introduction to Spark 2.0 Dataset API
datamantra
 
PDF
Hadoop Spark - Reuniao SouJava 12/04/2014
soujavajug
 
PDF
Creating Reusable Geospatial Pipelines
Databricks
 
PPTX
Policy 2012 presentation
bdemchak
 
PDF
Productionalizing a spark application
datamantra
 
PDF
Spark at Bloomberg: Dynamically Composable Analytics
Jen Aman
 
PDF
Stardog talk-dc-march-17
Clark & Parsia LLC
 
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Sid Anand
 
Introduction to Spark 2.0 Dataset API
datamantra
 
Hadoop Spark - Reuniao SouJava 12/04/2014
soujavajug
 
Creating Reusable Geospatial Pipelines
Databricks
 
Policy 2012 presentation
bdemchak
 
Productionalizing a spark application
datamantra
 
Spark at Bloomberg: Dynamically Composable Analytics
Jen Aman
 
Stardog talk-dc-march-17
Clark & Parsia LLC
 

What's hot (18)

PDF
Sprache als Werkzeug: DSLs mit Kotlin (JAX 2020)
Frank Scheffler
 
PPTX
Spark r under the hood with Hossein Falaki
Databricks
 
PDF
How We Scaled Bert To Serve 1+ Billion Daily Requests on CPU
Databricks
 
PDF
PelletServer: REST and Semantic Technologies
Clark & Parsia LLC
 
PDF
Stardog 1.1: An Easier, Smarter, Faster RDF Database
kendallclark
 
PPTX
Multi Source Data Analysis using Spark and Tellius
datamantra
 
PDF
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Garindra Prahandono
 
PPTX
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Chris Fregly
 
PDF
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Databricks
 
PDF
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Databricks
 
PDF
When Apache Spark Meets TiDB with Xiaoyu Ma
Databricks
 
PDF
introduction to ldap
Thevakumar Presanth
 
PDF
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
 
PPTX
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Chris Fregly
 
PDF
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Chris Fregly
 
PDF
Interactive workflow management using Azkaban
datamantra
 
PDF
Memory Optimization and Reliable Metrics in ML Pipelines at Netflix
Databricks
 
PDF
Improving Mobile Payments With Real time Spark
datamantra
 
Sprache als Werkzeug: DSLs mit Kotlin (JAX 2020)
Frank Scheffler
 
Spark r under the hood with Hossein Falaki
Databricks
 
How We Scaled Bert To Serve 1+ Billion Daily Requests on CPU
Databricks
 
PelletServer: REST and Semantic Technologies
Clark & Parsia LLC
 
Stardog 1.1: An Easier, Smarter, Faster RDF Database
kendallclark
 
Multi Source Data Analysis using Spark and Tellius
datamantra
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Garindra Prahandono
 
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Chris Fregly
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Databricks
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Databricks
 
When Apache Spark Meets TiDB with Xiaoyu Ma
Databricks
 
introduction to ldap
Thevakumar Presanth
 
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Chris Fregly
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Chris Fregly
 
Interactive workflow management using Azkaban
datamantra
 
Memory Optimization and Reliable Metrics in ML Pipelines at Netflix
Databricks
 
Improving Mobile Payments With Real time Spark
datamantra
 
Ad

Similar to Netflix's Transition to High-Availability Storage (QCon SF 2010) (20)

PPTX
Svccg nosql 2011_v4
Sid Anand
 
PPTX
Linked in nosql_atnetflix_2012_v1
Sid Anand
 
PPT
Amazon Simpledb
Biswajeet Dasmajumdar
 
PDF
Simply Business' Data Platform
Dani Solà Lagares
 
PPT
Big Data
NGDATA
 
PPTX
NoSQL: An Analysis
Andrew Brust
 
PPT
Database Management Myths & Reality for the future
A B M Moniruzzaman
 
PDF
Preparing yourdataforcloud
Inphina Technologies
 
PDF
Big Data Architecture and Design Patterns
John Yeung
 
PPTX
A Practical Look at the NOSQL and Big Data Hullabaloo
Andrew Brust
 
PPTX
NoSQL and The Big Data Hullabaloo
Andrew Brust
 
PDF
Anything data (revisited)
Ahmet Akyol
 
PDF
Prepare Your Data For The Cloud
IndicThreads
 
PDF
Preparing your data for the cloud
Inphina Technologies
 
PDF
Big data and Analytics on AWS
2nd Watch
 
PDF
From Data To Insights
Orit Alul
 
PDF
Event-Driven Architectures Done Right | Tim Berglund, Confluent
HostedbyConfluent
 
PDF
AWS Cloud Experience CA: Bases de Datos en AWS: distintas necesidades, distin...
Amazon Web Services LATAM
 
PPTX
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
Omid Vahdaty
 
PDF
Migrating Netflix from Datacenter Oracle to Global Cassandra
Adrian Cockcroft
 
Svccg nosql 2011_v4
Sid Anand
 
Linked in nosql_atnetflix_2012_v1
Sid Anand
 
Amazon Simpledb
Biswajeet Dasmajumdar
 
Simply Business' Data Platform
Dani Solà Lagares
 
Big Data
NGDATA
 
NoSQL: An Analysis
Andrew Brust
 
Database Management Myths & Reality for the future
A B M Moniruzzaman
 
Preparing yourdataforcloud
Inphina Technologies
 
Big Data Architecture and Design Patterns
John Yeung
 
A Practical Look at the NOSQL and Big Data Hullabaloo
Andrew Brust
 
NoSQL and The Big Data Hullabaloo
Andrew Brust
 
Anything data (revisited)
Ahmet Akyol
 
Prepare Your Data For The Cloud
IndicThreads
 
Preparing your data for the cloud
Inphina Technologies
 
Big data and Analytics on AWS
2nd Watch
 
From Data To Insights
Orit Alul
 
Event-Driven Architectures Done Right | Tim Berglund, Confluent
HostedbyConfluent
 
AWS Cloud Experience CA: Bases de Datos en AWS: distintas necesidades, distin...
Amazon Web Services LATAM
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
Omid Vahdaty
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Adrian Cockcroft
 
Ad

More from Sid Anand (20)

PDF
Building High Fidelity Data Streams (QCon London 2023)
Sid Anand
 
PDF
Low Latency Fraud Detection & Prevention
Sid Anand
 
PDF
YOW! Data Keynote (2021)
Sid Anand
 
PDF
Big Data, Fast Data @ PayPal (YOW 2018)
Sid Anand
 
PDF
Building Better Data Pipelines using Apache Airflow
Sid Anand
 
PPTX
Cloud Native Predictive Data Pipelines (micro talk)
Sid Anand
 
PDF
Cloud Native Data Pipelines (GoTo Chicago 2017)
Sid Anand
 
PDF
Cloud Native Data Pipelines (DataEngConf SF 2017)
Sid Anand
 
PDF
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Sid Anand
 
PDF
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Sid Anand
 
PDF
Introduction to Apache Airflow - Data Day Seattle 2016
Sid Anand
 
PDF
Airflow @ Agari
Sid Anand
 
PDF
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Sid Anand
 
PDF
Resilient Predictive Data Pipelines (QCon London 2016)
Sid Anand
 
PPTX
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Sid Anand
 
PPTX
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
Sid Anand
 
PPTX
Building a Modern Website for Scale (QCon NY 2013)
Sid Anand
 
PDF
Hands On with Maven
Sid Anand
 
PDF
Learning git
Sid Anand
 
PDF
LinkedIn Data Infrastructure Slides (Version 2)
Sid Anand
 
Building High Fidelity Data Streams (QCon London 2023)
Sid Anand
 
Low Latency Fraud Detection & Prevention
Sid Anand
 
YOW! Data Keynote (2021)
Sid Anand
 
Big Data, Fast Data @ PayPal (YOW 2018)
Sid Anand
 
Building Better Data Pipelines using Apache Airflow
Sid Anand
 
Cloud Native Predictive Data Pipelines (micro talk)
Sid Anand
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Sid Anand
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Sid Anand
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Sid Anand
 
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Sid Anand
 
Introduction to Apache Airflow - Data Day Seattle 2016
Sid Anand
 
Airflow @ Agari
Sid Anand
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Sid Anand
 
Resilient Predictive Data Pipelines (QCon London 2016)
Sid Anand
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Sid Anand
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
Sid Anand
 
Building a Modern Website for Scale (QCon NY 2013)
Sid Anand
 
Hands On with Maven
Sid Anand
 
Learning git
Sid Anand
 
LinkedIn Data Infrastructure Slides (Version 2)
Sid Anand
 

Recently uploaded (20)

PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
PDF
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
PDF
Software Development Methodologies in 2025
KodekX
 
PDF
Doc9.....................................
SofiaCollazos
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
PDF
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
PDF
REPORT: Heating appliances market in Poland 2024
SPIUG
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PPTX
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
Software Development Methodologies in 2025
KodekX
 
Doc9.....................................
SofiaCollazos
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
REPORT: Heating appliances market in Poland 2024
SPIUG
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 

Netflix's Transition to High-Availability Storage (QCon SF 2010)

  • 2. Coordinates Twitter @r39132 #netflixcloud Blog http://practicalcloudcomputing.com Linked In http://www.linkedin.com/in/siddharthanand 2@r39132 - #netflixcloud
  • 3. Why Are You Here? ”What I need is an exact list of specific unknown problems we might encounter." -- anonymous @r39132 - #netflixcloud 3
  • 5. Motivation  Circa late 2008, Netflix had a single data center  Single-point-of-failure (a.k.a. SPOF)  Approaching limits on cooling, power, space, traffic capacity  Alternatives  Build more data centers  Outsource the majority of our capacity planning and scale out @r39132 - #netflixcloud 5
  • 6. Motivation  Winner : Outsource the majority of our capacity planning and scale out  Leverage a leading Infrastructure-as-a-service provider  Amazon Web Services  Footnote : As it has taken us a while (i.e. ~2+ years) to realize our vision of running on the cloud, we needed a interim solution to handle growth  We did build a second data center along the way  We did outgrow it 6@r39132 - #netflixcloud
  • 8. Cloud Migration Strategy  Components  Applications and Software Infrastructure  Data  Migration Considerations  Security  PII and PCI DSS stays in our DC, rest can go to the cloud  Scalability and Availability for Business Success @r39132 - #netflixcloud 8
  • 9. Cloud Migration Strategy  Scalability and Availability for Business Success  High Growth or High Traffic Growth Data  Video starts, Personalized Video choosing  High Traffic Growth Applications  Same as above  Log Processing  Time-to-market Critical Batch Processing  Video encoding  Not Included  DVD inventory and shipment  We are a streaming company that also ships DVD @r39132 - #netflixcloud 9
  • 10. Cloud Migration Strategy Examples of Data that can be moved  Video-centric data  Critics’ reviews  Metadata  User-video-centric data – some of our largest data sets  User-video queue  Previously streamed and shipped video history  Ratings (i.e. a 5-star rating system)  Video streaming metadata (e.g. streaming bookmarks) @r39132 - #netflixcloud 10
  • 12. Cloud Migration Strategy  High-level Requirements for our Site  No big-bang migrations  New functionality needs to launch in the cloud when possible  High-level Requirements for our Data  Data needs to migrate before applications  Data needs to be shared between applications running in the cloud and our data center during the transition period @r39132 - #netflixcloud 12
  • 13. Cloud Migration Strategy @r39132 - #netflixcloud 13
  • 14. Cloud Migration Strategy  Low-level Requirements for our Data  Pick a (key-value) data store in the cloud  Challenges  Translate RDBMS concepts to KV store concepts  Work-around Issues specific to the chosen KV store  Create a bi-directional DC-Cloud data replication pipeline @r39132 - #netflixcloud 14
  • 16. Pick a Data Store in the Cloud An ideal storage solution should have the following features:  Hosted  Managed Distribution Model  Works in AWS  AP from CAP  Handles a majority of use-cases accessing high-growth, high-traffic data  Specifically, key access by customer id, movie id, or both @r39132 - #netflixcloud 16
  • 17. Pick a Data Store in the Cloud  We picked SimpleDB and S3  SimpleDB was targeted as the AP equivalent of our RDBMS databases in our Data Center  S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. does not require secondary indices and complex query semantics)  Video encodes  Streaming device activity logs (i.e. CLOB, BLOB, etc…)  Compression of old Rental History @r39132 - #netflixcloud 17
  • 19. Technology Overview : SimpleDB SimpleDB Hash Table Relational Databases Domain Hash Table Table Item Entry Row Item Name Key Mandatory Primary Key Attribute Part of the Entry Value Column @r39132 - #netflixcloud 19 Terminology
  • 20. Technology Overview : SimpleDB @r39132 - #netflixcloud 20 Soccer Players Key Value ab12ocs12v9 First Name = Harold Last Name = Kewell Nickname = Wizard of Oz Teams = Leeds United, Liverpool, Galatasaray b24h3b3403b First Name = Pavel Last Name = Nedved Nickname = Czech Cannon Teams = Lazio, Juventus cc89c9dc892 First Name = Cristiano Last Name = Ronaldo Teams = Sporting, Manchester United, Real Madrid SimpleDB’s salient characteristics • SimpleDB offers a range of consistency options • SimpleDB domains are sparse and schema-less • The Key and all Attributes are indexed • Each item must have a unique Key • An item contains a set of Attributes • Each Attribute has a name • Each Attribute has a set of values • All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
  • 21. Technology Overview : SimpleDB What does the API look like?  Manage Domains  CreateDomain  DeleteDomain  ListDomains  DomainMetaData  Access Data  Retrieving Data  GetAttributes – returns a single item  Select – returns multiple items using SQL syntax  Writing Data  PutAttributes – put single item  BatchPutAttributes – put multiple items  Removing Data  DeleteAttributes – delete single item  BatchDeleteAttributes – delete multiple items @r39132 - #netflixcloud 21
  • 22. Technology Overview : SimpleDB @r39132 - #netflixcloud 22  Options available on reads and writes  Consistent Read  Read the most recently committed write  May have lower throughput/higher latency/lower availability  Conditional Put/Delete  i.e. Optimistic Locking  Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy  We do not use this currently, so we don’t know how it performs
  • 24. Translate RDBMS Concepts to Key-Value Store Concepts  Relational Databases are known for relations  First, a quick refresher on Normal forms @r39132 - #netflixcloud 24
  • 25. Normalization
      • NF1 : All occurrences of a record type must contain the same number of fields; variable repeating fields and groups are not allowed
      • NF2 : Second normal form is violated when a non-key field is a fact about a subset of a key
          • Violated here: (Part, Warehouse, Quantity, Warehouse-Address)
          • Fixed here: (Part, Warehouse, Quantity) and (Warehouse, Warehouse-Address)
  • 26. Normalization
      • Issues
          • Wastes Storage
              • The warehouse address is repeated for every Part-WH pair
          • Update Performance Suffers
              • If the address of the warehouse changes, I must update many Part-WH pairs
          • Data inconsistencies possible
              • I can update the warehouse address for one Part-WH pair and miss Parts for the same WH
          • Data Loss Possible
              • If at some point in time there are no parts, the WH address will be lost
  • 27. Normalization
      • RDBMS to KV Store migrations can’t simply accept denormalization!
          • Especially many-to-many and many-to-one entity relationships
      • Instead, pick your data set candidates carefully!
          • Keep relational data in RDBMS
          • Move key-look-ups to KV stores
      • Luckily for Netflix, most data is accessed by Customer, Video, or both : i.e. Key Lookups
  • 28. Translate RDBMS Concepts to Key-Value Store Concepts
      • Aside from relations, relational databases typically offer the following:
          • Transactions
          • Locks
          • Sequences
          • Triggers
          • Clocks
          • A structured query language (i.e. SQL)
          • Database server-side coding constructs (i.e. PL/SQL)
          • Constraints
  • 29. Translate RDBMS Concepts to Key-Value Store Concepts
      • Partial or no SQL support. Loosely speaking, SimpleDB supports a subset of SQL
          • BEST PRACTICE
              • For smallish data sets, do GROUP BY and JOIN operations in the application layer
      • No relations between domains
          • BEST PRACTICE
              • Compose relations in the application layer
      • No transactions
          • BEST PRACTICE
              • Use SimpleDB’s Optimistic Concurrency Control API: ConditionalPut and ConditionalDelete
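As a rough illustration of doing a JOIN and GROUP BY in the application layer, the sketch below joins two small in-memory result sets on a shared key and counts matches per group. The record shapes and the names `customers`, `rentals`, and `join_and_count` are hypothetical and not from the talk; in practice the two lists would come from separate Select calls.

```python
from collections import defaultdict


def join_and_count(customers, rentals):
    """Join rentals to customers on customer_id, then count rentals per customer name."""
    # Build a lookup table on the join key (the "hash" side of a hash join).
    by_id = {c["customer_id"]: c for c in customers}
    counts = defaultdict(int)
    for rental in rentals:
        customer = by_id.get(rental["customer_id"])
        if customer is None:
            # No referential constraints in the store, so orphans can exist; skip them.
            continue
        counts[customer["name"]] += 1  # the GROUP BY ... COUNT(*) step
    return dict(counts)


# Hypothetical sample data standing in for two Select results.
customers = [
    {"customer_id": "c1", "name": "Ann"},
    {"customer_id": "c2", "name": "Bob"},
]
rentals = [
    {"customer_id": "c1", "video": "v9"},
    {"customer_id": "c1", "video": "v3"},
    {"customer_id": "c3", "video": "v1"},  # orphan: no matching customer
]
print(join_and_count(customers, rentals))
```

This only makes sense while both sides fit comfortably in memory, which is the "smallish data sets" caveat on the slide.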
  • 30. Translate RDBMS Concepts to Key-Value Store Concepts
      • No schema - This is non-obvious. A query for a misspelled attribute name will not fail with an error
          • BEST PRACTICE
              • Implement a schema validator in a common data access layer
      • No sequences
          • BEST PRACTICE
              • Sequences are often used as primary keys
                  • In this case, use a naturally occurring unique key
                  • If no naturally occurring unique key exists, use a UUID
              • Sequences are also often used for ordering
                  • Use a distributed sequence generator
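A minimal sketch of the two practices above, assuming a hand-maintained set of allowed attribute names per domain. The names `ALLOWED_ATTRIBUTES`, `validate_attributes`, and `new_item_name` are hypothetical; the slides do not show how Netflix's data access layer does this.

```python
import uuid

# Hypothetical schema for one domain; a real data access layer would load this per domain.
ALLOWED_ATTRIBUTES = {"FIRST_NAME", "LAST_NAME", "NICKNAME", "TEAMS"}


def validate_attributes(attributes):
    """Raise on unknown names instead of letting a misspelling pass silently."""
    unknown = set(attributes) - ALLOWED_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attribute(s): {sorted(unknown)}")
    return attributes


def new_item_name():
    """Random UUID to use as the item key when no natural unique key exists."""
    return str(uuid.uuid4())


item_name = new_item_name()
validate_attributes({"FIRST_NAME": "Pavel", "LAST_NAME": "Nedved"})  # passes
try:
    validate_attributes({"first_name": "Pavel"})  # mis-cased name is rejected
except ValueError as err:
    print(err)
```

The check is case-sensitive on purpose, since the store itself treats `first_name` and `FIRST_NAME` as different attributes.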
  • 31. Translate RDBMS Concepts to Key-Value Store Concepts
      • No clock operations, PL/SQL, Triggers
          • BEST PRACTICE
              • Do without
      • No constraints. Specifically,
          • No uniqueness constraints
          • No foreign key or referential constraints
          • No integrity constraints
          • BEST PRACTICE
              • Read Repair and Anti-entropy processes using Conditional Put/Delete
  • 33. Work-around Issues specific to the chosen KV store
      • Missing / Strange Functionality
          • No back-up and recovery
          • No native support for types (e.g. Number, Float, Date, etc…)
          • You cannot update one attribute and null out another one for an item in a single API call
          • Mis-cased or misspelled attribute names in operations fail silently. Why is SimpleDB case-sensitive?
          • Neglecting "limit N" returns a subset of information. Why does the absence of an optional parameter not return all of the data?
          • Users need to deal with data set partitioning
          • Beware of Nulls
      • Poor Performance
  • 34. Work-around Issues specific to the chosen KV store
      No Native Types – Sorting, Inequality Conditions, etc…
      • Since sorting is lexicographical, if you plan on sorting by certain attributes, then
          • zero-pad logically-numeric attributes, e.g.
              • 000000000000000111111 (this is bigger)
              • 000000000000000011111
          • use Joda time to store logical dates, e.g.
              • 2010-02-10T01:15:32.864Z (this is more recent)
              • 2010-02-10T01:14:42.864Z
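The encoding rules above can be sketched as follows. The fixed width of 21 digits mirrors the slide's example and is otherwise an assumption; `encode_number` and `encode_timestamp` are hypothetical helper names, and Python's `datetime` stands in for Joda time.

```python
from datetime import datetime, timezone

WIDTH = 21  # assumed fixed width; choose one wide enough for the attribute's largest value


def encode_number(n, width=WIDTH):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0:
        raise ValueError("negative numbers need an offset before padding")
    text = str(n).zfill(width)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text


def encode_timestamp(dt):
    """Format a timezone-aware datetime as ISO 8601 UTC with millisecond precision."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


# Unpadded strings sort incorrectly; padded strings sort like the numbers they encode.
print("9" > "10")                              # True: wrong order without padding
print(encode_number(9) < encode_number(10))    # True: correct order with padding
print(encode_timestamp(datetime(2010, 2, 10, 1, 15, 32, 864000, tzinfo=timezone.utc)))
```

Fixed-width UTC timestamps in this format compare correctly as strings, which is why they also work in inequality conditions.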
  • 35. Work-around Issues specific to the chosen KV store
      • Anti-pattern : Avoid the following query, as it is a full domain scan
          • Select SOME_FIELD_1 from MY_DOMAIN where SOME_FIELD_2 is null
          • Nulls are not indexed in a sparse table
          • BEST PRACTICE
              • Instead, replace this check with an (indexed) flag column called IS_FIELD_2_NULL
              • Select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y'
      • Anti-pattern : When selecting data from a domain and sorting by an attribute, items missing that attribute will not be returned
          • In Oracle, rows with null columns are still returned
          • BEST PRACTICE
              • Use a flag column as shown previously
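One way to keep such a flag column in step with the data is to derive it at write time. This is a sketch under assumptions: `with_null_flags` is a hypothetical helper, and the mapping from attribute name to flag name is supplied by the caller.

```python
def with_null_flags(item, flag_names):
    """Return a copy of item with a 'Y'/'N' flag for each attribute listed in flag_names.

    flag_names maps an attribute name to the name of its flag column,
    e.g. {"SOME_FIELD_2": "IS_FIELD_2_NULL"}.
    """
    flagged = dict(item)
    for attribute, flag in flag_names.items():
        missing = item.get(attribute) in (None, "")
        flagged[flag] = "Y" if missing else "N"
        if missing:
            # A sparse store has no nulls to write; leave the attribute out entirely.
            flagged.pop(attribute, None)
    return flagged


flags = {"SOME_FIELD_2": "IS_FIELD_2_NULL"}
print(with_null_flags({"SOME_FIELD_1": "a"}, flags))
print(with_null_flags({"SOME_FIELD_1": "a", "SOME_FIELD_2": "b"}, flags))
```

Because the flag is always present, it is indexed for every item, so the rewritten query no longer needs a full domain scan.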
  • 36. Work-around Issues specific to the chosen KV store
      • BEST PRACTICE : Aim for high index selectivity when you formulate your select expressions for best performance
      • SimpleDB select performance is sensitive to index selectivity
      • Index Selectivity
          • Definition : # of distinct attribute values in specified attribute / # of items in domain
          • e.g. Good Index Selectivity (i.e. 1 is the best)
              • If a table has 100 records and one of its indexed columns has 88 distinct values, then the selectivity of this index is 88 / 100 = 0.88
          • e.g. Bad Index Selectivity
              • If an index on a table of 1000 records has only 5 distinct values, then the index's selectivity is 5 / 1000 = 0.005
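The definition above is a simple ratio, sketched here with the slide's two examples. `index_selectivity` is a hypothetical helper name, and the sample value lists are made up to match the slide's counts.

```python
def index_selectivity(values, item_count):
    """Distinct values of one attribute divided by the number of items in the domain."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return len(set(values)) / item_count


# 100 items whose attribute takes 88 distinct values: good selectivity.
good = index_selectivity([i % 88 for i in range(100)], item_count=100)
# 1000 items whose attribute takes only 5 distinct values: bad selectivity.
bad = index_selectivity([i % 5 for i in range(1000)], item_count=1000)
print(good, bad)  # 0.88 0.005
```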
  • 37. Work-around Issues specific to the chosen KV store
      Sharding Domains
      • There are 2 reasons to shard domains
          • You are trying to avoid running into one of the sizing limits
              • e.g. 10GB of space or 1 Billion Attributes
          • You are trying to scale your writes
      • To scale your writes further, use BatchPutAttributes and BatchDeleteAttributes where possible
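The editor's note for this slide mentions modulo addressing. A minimal sketch of that idea follows, assuming a stable CRC32 hash of the key and a `<base>_<n>` naming scheme; both are illustrative choices, and `shard_domain` is a hypothetical helper rather than anything shown in the talk.

```python
import zlib


def shard_domain(base_name, key, shard_count):
    """Map a key to one of shard_count domains using a stable hash and modulo."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # zlib.crc32 is stable across processes, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(key.encode("utf-8")) % shard_count
    return f"{base_name}_{bucket}"


# The same key always lands in the same domain, so reads know where to look.
print(shard_domain("MY_DOMAIN", "customer-12345", shard_count=8))
```

Plain modulo addressing remaps most keys when `shard_count` changes, which may be one reason the note pairs it with forwarding tables.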
  • 39. Create a Bi-directional DC-Cloud Data Replication Pipeline
      • Home-grown Data Replication Framework known as IR for Item Replication
      • 2 schemes in use currently
          • Polls the main table (a.k.a. Simple IR)
              • Doesn’t capture deletes but easy to implement
          • Polls a journal table that is populated via a trigger on the main table (a.k.a. Trigger-journaled IR)
              • Captures every CRUD, but requires the development of triggers
  • 40. Create a Bi-directional DC-Cloud Data Replication Pipeline
  • 41. Create a Bi-directional DC-Cloud Data Replication Pipeline
      • How often do we poll Oracle?
          • Every 5 seconds
      • What does the poll query look like?
          • select * from QLOG_0
          • where LAST_UPDATE_TS > :CHECKPOINT (get recent)
          • and LAST_UPDATE_TS < :NOW_MINUS_30s (exclude most recent)
          • order by LAST_UPDATE_TS (process in order)
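One poll cycle of the query above can be sketched as follows. This uses an in-memory SQLite table as a stand-in for Oracle, integer epoch seconds for `LAST_UPDATE_TS`, and a hypothetical `ITEM_KEY` column; `poll_once` is an illustrative name, and advancing the checkpoint to the last row's timestamp is an assumption about how the checkpoint moves.

```python
import sqlite3

POLL_SQL = """
    SELECT ITEM_KEY, LAST_UPDATE_TS FROM QLOG_0
    WHERE LAST_UPDATE_TS > :checkpoint   -- get recent
      AND LAST_UPDATE_TS < :cutoff       -- exclude most recent
    ORDER BY LAST_UPDATE_TS              -- process in order
"""


def poll_once(conn, checkpoint, now, lag_seconds=30):
    """Return (rows, new_checkpoint) for one poll of the journal table."""
    cutoff = now - lag_seconds
    rows = conn.execute(POLL_SQL, {"checkpoint": checkpoint, "cutoff": cutoff}).fetchall()
    # Advance the checkpoint only as far as the last row actually processed.
    new_checkpoint = rows[-1][1] if rows else checkpoint
    return rows, new_checkpoint


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE QLOG_0 (ITEM_KEY TEXT, LAST_UPDATE_TS INTEGER)")
conn.executemany("INSERT INTO QLOG_0 VALUES (?, ?)", [("a", 100), ("b", 130), ("c", 175)])

rows, checkpoint = poll_once(conn, checkpoint=90, now=200)
print(rows, checkpoint)  # [('a', 100), ('b', 130)] 130
```

Row `c` is skipped because it falls inside the 30-second exclusion window; it is picked up by a later poll once it is old enough.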
  • 42. Create a Bi-directional DC-Cloud Data Replication Pipeline
      • Data Replication Challenges & Best Practices
          • SimpleDB throttles traffic aggressively via 503 HTTP Response codes (“Service Unavailable”)
          • With Singleton writes, I see 70-120 write TPS/domain
          • IR
              • Shards domains (i.e. partitions data sets) to work around these limits
              • Employs slow ramp-up
              • Uses BatchPutAttributes instead of (Singleton) PutAttributes calls
              • Exercises an exponential bounded back-off algorithm
              • Uses attribute-level replace=false when fork-lifting data
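A minimal sketch of an exponential bounded back-off loop, assuming throttling surfaces as an exception. `ServiceUnavailable` and `call_with_backoff` are hypothetical names, and the delay parameters are illustrative defaults, not values from the talk.

```python
import time


class ServiceUnavailable(Exception):
    """Stand-in for a throttled call (HTTP 503)."""


def call_with_backoff(operation, max_attempts=6, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Retry operation on throttling, doubling the wait each time up to max_delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServiceUnavailable:
            if attempt == max_attempts - 1:
                raise  # bounded: give up after the last attempt
            # Exponential growth, capped so a long outage does not produce huge waits.
            sleep(min(max_delay, base_delay * 2 ** attempt))


# Usage: an operation that is throttled twice, then succeeds.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ServiceUnavailable()
    return "ok"


waits = []
print(call_with_backoff(flaky_write, sleep=waits.append), waits)  # ok [0.1, 0.2]
```

Passing `sleep` as a parameter keeps the example testable without real waiting; a production version would typically also add random jitter so that many writers do not retry in lockstep.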
  • 43. Create a Bi-directional DC-Cloud Data Replication Pipeline
  • 44. Create a Bi-directional DC-Cloud Data Replication Pipeline
      • Data Replication Challenges & Best Practices
          • Implementing Multi-mastering and an Eventually-consistent Replication Pipeline
              • SimpleDB offers optimistic concurrency control in the form of conditional puts (and deletes)
              • For our data, it is ok to be “consistent, but not accurate”
              • With this relaxation, we do not need to be concerned with synchronizing logical clocks
              • We simply need to ensure that each conditional put puts a larger, strictly increasing value into the “version” column
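The version rule above can be sketched against an in-memory stand-in for a domain. This is not the SimpleDB API: `VersionedStore`, `conditional_put`, and `ConditionalCheckFailed` are hypothetical names used only to show how a conditional put on a strictly increasing "version" value rejects stale writers.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored version is not the one the writer expected."""


class VersionedStore:
    """In-memory stand-in for a domain that supports conditional puts."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def conditional_put(self, key, value, new_version, expected_version):
        current = self._items.get(key)
        current_version = current["version"] if current else None
        # Optimistic check: only write if nobody changed the item since we read it.
        if current_version != expected_version:
            raise ConditionalCheckFailed(f"expected {expected_version}, found {current_version}")
        if current_version is not None and new_version <= current_version:
            raise ValueError("version must be strictly increasing")
        self._items[key] = {"value": value, "version": new_version}


store = VersionedStore()
store.conditional_put("item-1", "from DC", new_version=1, expected_version=None)
store.conditional_put("item-1", "from cloud", new_version=2, expected_version=1)
try:
    # A stale writer still believes the item is at version 1 and is rejected.
    store.conditional_put("item-1", "stale", new_version=3, expected_version=1)
except ConditionalCheckFailed as err:
    print("rejected:", err)
print(store.get("item-1"))
```

The rejected writer would re-read the item and either retry or drop its update, which is the kind of repair work the earlier slides leave to read repair and anti-entropy processes.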

Editor's Notes

  • #13, #14, #15: Existing functionality needs to move in phases. This limits the risk and exposure to bugs, and limits conflicts with new product launches.
  • #34: Dynamo storage doesn’t suffer from this!
  • #36: This is an issue with any SQL-like Query layer over a Sparse-data model. It can happen in other technologies.
  • #37: Cannot treat SimpleDB like a black-box for performance critical applications.
  • #38: We found the write availability was affected by the right partitioning scheme. We use a combination of forwarding tables and modulo addressing
  • #41: Mention trickle lift
  • #46: We like that it is available, hosted, and managed. We don’t like the performance issues. We are looking into Cassandra and other KV stores.