A NETFLIX ORIGINAL SERVICE
Peter Bakas | @peter_bakas
@ Netflix : Cloud Platform Engineering - Real Time Data Infrastructure
@ Ooyala : Analytics, Discovery, Platform Engineering & Infrastructure
@ Yahoo : Display Advertising, Behavioral Targeting, Payments
@ PayPal : Site Engineering and Architecture
@ Play : Advisor to Startups (Data, Security, Containers)
Who is this guy?
A common data pipeline to collect, transport, aggregate, process, and visualize events
Why are we here?
● Architectural design and principles for Keystone
● Technologies that Keystone is leveraging
● Best practices
What should I expect?
Let’s get down to business
Netflix is a logging company
that occasionally streams movies
600+ billion events ingested per day
11 million events per second (24 GB per second) at peak
Hundreds of event types
Over 1.3 petabytes per day
Numbers Galore!
But wait, there’s more
1+ trillion events processed every day
1 trillion events ingested per day during holiday season
Numbers Galore - Part Deux
How did we get here?
Chukwa
Chukwa/Suro + Real-Time Branch
Keystone
[Architecture diagram: Event Producers (directly or via an HTTP proxy) publish to Fronting Kafka; a Samza Router routes events to Consumer Kafka and EMR sinks; Stream Consumers read from Consumer Kafka; a Control Plane manages the pipeline.]
Kafka Primer
Kafka is a distributed, partitioned, replicated commit log service.
Kafka Terminology
● Producer
● Consumer
● Topic
● Partition
● Broker
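To make these terms concrete, here is a minimal, self-contained Java example using the standard Apache Kafka client API. The broker address, topic name, and group id are placeholders, not anything from the Keystone deployment: a producer appends a record to a partition of a topic hosted on a broker, and a consumer reads that partition's log in order, tracking its offset.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaTermsDemo {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // broker (placeholder address)
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // Producer appends a record to a partition of the "events" topic.
            producer.send(new ProducerRecord<>("events", "device-123", "{\"type\":\"play\"}"));
        }

        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "demo");
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            // Consumer reads the partition's log in order, tracking its offset.
            consumer.subscribe(Collections.singletonList("events"));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }
        }
    }
}
```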
Netflix Kafka Producer
● Best effort delivery
● Prefer dropping messages to disrupting the producer app
● Wraps Apache Kafka Producer
● Integration with Netflix ecosystem: Eureka, Atlas, etc.
Producer Impact
● A Kafka outage does not prevent existing instances from serving their purpose
● A Kafka outage should never prevent new instances from starting up
● After the Kafka cluster is restored, event producing should resume automatically
Prefer Drop over Block
● Drop when the buffer is full
● Handle potential blocking of the first metadata request
● acks=1 (vs. 2) (see the config sketch below)
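The deck does not show the actual producer configuration; the snippet below is only a sketch of what a "drop rather than block" setup could look like with standard Kafka 0.9-era producer settings. The broker address, buffer size, and linger value are assumptions; the wrapping Netflix producer would catch the send-time exception and count the message as dropped.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

// Sketch of "drop rather than block" producer settings; values are illustrative,
// not Netflix's production configuration.
public final class DropNotBlockProducerConfig {
    public static Producer<byte[], byte[]> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "fronting-kafka:9092"); // placeholder address
        props.put("acks", "1");                  // wait for the partition leader only
        props.put("max.block.ms", "0");          // never block the caller on metadata or a full buffer
        props.put("buffer.memory", "33554432");  // bounded in-memory buffer (32 MB, assumed)
        props.put("linger.ms", "100");           // small linger so the sticky partitioner can batch
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // With max.block.ms=0 a full buffer makes send() throw instead of blocking;
        // the wrapping producer catches that, drops the message, and bumps a drop counter.
        return new KafkaProducer<>(props);
    }

    private DropNotBlockProducerConfig() {}
}
```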
Sticky Partitioner
● Batching is important to reduce CPU and network I/O on brokers
● Stick to one partition for a while when producing non-keyed messages
● linger.ms works well with the sticky partitioner (see the sketch below)
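Netflix's partitioner itself is not shown in the slides; the sketch below only illustrates the idea against Kafka's public Partitioner interface. The class name and the 30-second stickiness window are assumptions. Non-keyed messages stay on one randomly chosen partition for a while so batches fill up and brokers see fewer, larger requests; it would be plugged in through the standard partitioner.class producer property.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// Illustrative sticky partitioner: non-keyed messages stick to one partition for a
// configurable window so the producer builds larger batches per partition.
public class StickyPartitionerSketch implements Partitioner {

    private static final long STICK_MS = 30_000L;   // assumed stickiness window

    // Not strictly thread-safe; good enough for a sketch.
    private volatile int stickyPartition = -1;
    private volatile long lastSwitchMs = 0L;

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // Keyed messages keep the usual hash-by-key behaviour.
        if (keyBytes != null) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
        List<PartitionInfo> partitions = cluster.availablePartitionsForTopic(topic);
        if (partitions.isEmpty()) {
            partitions = cluster.partitionsForTopic(topic);
        }
        long now = System.currentTimeMillis();
        if (stickyPartition < 0 || now - lastSwitchMs > STICK_MS) {
            // Rotate to a new random partition once the window expires.
            stickyPartition = partitions.get(
                    ThreadLocalRandom.current().nextInt(partitions.size())).partition();
            lastSwitchMs = now;
        }
        return stickyPartition;
    }

    @Override
    public void close() { }
}
```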
Producing events to Keystone
● Using Netflix Platform logging API
○ LogManager.logEvent(Annotatable): majority of the cases
○ KeyValueSerializer with ILog#log(String)
● REST endpoint that proxies Platform logging
○ ksproxy
○ Prana sidecar
Injected Event Metadata
● GUID
● Timestamp
● Host
● App
Keystone Extensible Wire Protocol
● Invisible to sources & sinks
● Backwards and forwards compatibility
● Supports JSON; Avro on the horizon
● Efficient - ~10 bytes of overhead per message (see the envelope sketch below)
○ message size - hundreds of bytes to 10MB
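The wire format is internal to Netflix; purely to illustrate how a fixed envelope of roughly 10 bytes can stay extensible while leaving the payload untouched, here is a hypothetical layout. Every field name, id, and size below is an assumption, not the actual Keystone protocol.

```java
import java.nio.ByteBuffer;

// Hypothetical envelope sketch: a ~10-byte fixed header in front of an opaque,
// immutable event payload. Field names and sizes are assumptions for illustration.
public final class EnvelopeSketch {
    private static final byte MAGIC = 0x4B;   // 'K' (assumed)
    private static final byte VERSION = 1;    // bump to evolve the protocol independently

    // magic(1) + version(1) + payload format(1) + flags(1) + metadata length(2) + payload length(4)
    private static final int HEADER_BYTES = 10;

    public static byte[] wrap(byte payloadFormat, byte[] metadata, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + metadata.length + payload.length);
        buf.put(MAGIC)
           .put(VERSION)
           .put(payloadFormat)               // e.g. 0 = JSON, 1 = Avro (assumed ids)
           .put((byte) 0)                    // flags, unused in this sketch
           .putShort((short) metadata.length)
           .putInt(payload.length)
           .put(metadata)                    // injected metadata (GUID, timestamp, host, app)
           .put(payload);                    // payload is passed through untouched
        return buf.array();
    }

    private EnvelopeSketch() {}
}
```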
Keystone Extensible Wire Protocol
● Packaged as a jar
● Why? Evolve Independently
○ event metadata & traceability metadata
○ event payload serialization
Max message size: 10 MB
● Keystone drops any message larger than 10 MB
○ Event payloads are immutable, so oversized messages cannot be truncated
Fronting Kafka Clusters
[Keystone architecture diagram repeated, highlighting the Fronting Kafka clusters.]
Fronting Kafka Clusters
● Normal-priority (majority)
● High-priority (streaming activities etc.)
Fronting Kafka Instances
● 3 ASGs per cluster, 1 ASG per zone
● 3000 d2.xl AWS instances across 3 regions for regular & failover traffic
Partition Assignment
● All replica assignments are zone aware (see the assignment sketch below)
○ Improved availability
○ Reduced cost of maintenance
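The actual assignment tooling is not shown in the deck; the sketch below only illustrates the zone-aware idea: spread each partition's replicas across distinct availability zones so a single-zone outage never removes all copies. Class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of zone-aware replica assignment; not Kafka's or Netflix's actual code.
public class ZoneAwareAssignmentSketch {

    /**
     * brokersByZone.get(z) = broker ids in zone z; returns replica broker ids per partition.
     * Assumes replicationFactor <= number of zones so each replica lands in a different zone.
     */
    public static List<List<Integer>> assign(List<List<Integer>> brokersByZone,
                                             int numPartitions, int replicationFactor) {
        int zones = brokersByZone.size();
        List<List<Integer>> assignment = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Each replica of this partition goes to a different zone.
                List<Integer> zoneBrokers = brokersByZone.get((partition + r) % zones);
                replicas.add(zoneBrokers.get(partition % zoneBrokers.size()));
            }
            assignment.add(replicas);
        }
        return assignment;
    }
}
```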
Kafka Fault Tolerance
● Instance failure
○ With replication factor of N, guarantee no data loss with N-1 failures
○ With zone-aware replica assignment, guarantee no data loss with multiple instance failures in the same zone
● Sink failure
○ No data loss during retention period
● Replication is the key
○ Data loss can happen if leader dies while follower AND consumer cannot catch up
○ Usually indicated by UncleanLeaderElection metric
Kafka Auditor as a Service
● Broker monitoring
● Consumer monitoring
● Heart-beat & Continuous message latency
● On-demand Broker performance testing
● Built as a service deployable on single or multiple instances
Current Issues
● With the d2.xl instance type there is a trade-off between cost and performance
● Performance deteriorates as the number of partitions increases
● Replication lag during peak traffic
Routing Service
[Keystone architecture diagram repeated, highlighting the Samza Router.]
[Diagram: routing infrastructure = brokers plus a checkpointing cluster (0.9.1 noted on the slide).]
[Diagram: a Job Manager (control plane) assigns router Jobs to ksnode EC2 instances in an ASG; ZooKeeper handles instance-id assignment, offsets are checkpointed to the checkpointing cluster, and state is reconciled every minute.]
Routing Layer
● Total of 13,000 containers on 1,300 AWS C3.4XL instances
○ S3 sink: ~7,000 containers
○ Consumer Kafka sink: ~4,500 containers
○ Elasticsearch sink: ~1,500 containers
Routing Layer
● Total of ~1400 streams across all regions
○ ~1000 S3 streams
○ ~250 Consumer Kafka streams
○ ~150 Elasticsearch streams
Router Job Details
● One Job per sink and Kafka source topic
○ Separate Jobs for the S3, Elasticsearch & Kafka sinks
○ Provides better isolation & better QoS
● Messages are batched into requests to the sinks (see the loop sketch below)
● Offsets are checkpointed only after the batch request succeeds
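The routers run as Samza jobs; as a simplified stand-in for the batch-then-checkpoint behaviour described above, here is a plain Kafka consumer sketch (topic name and sink interface are hypothetical). Committing offsets only after the sink accepts the whole batch is what yields at-least-once delivery: a failed batch is retried and can produce duplicates, but nothing is acknowledged before it has been written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Simplified router loop: read a batch from the fronting Kafka topic, write it to
// the sink, and only then checkpoint offsets. Topic and sink are placeholders.
public class RouterLoopSketch {

    /** Hypothetical sink interface (S3, Consumer Kafka, or Elasticsearch behind it). */
    interface Sink {
        void writeBatch(List<ConsumerRecord<byte[], byte[]>> batch) throws Exception;
    }

    public static void run(KafkaConsumer<byte[], byte[]> consumer, Sink sink) {
        consumer.subscribe(Collections.singletonList("keystone-source-topic")); // placeholder topic
        while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(500);
            if (records.count() == 0) {
                continue;
            }
            List<ConsumerRecord<byte[], byte[]>> batch = new ArrayList<>();
            for (ConsumerRecord<byte[], byte[]> r : records) {
                batch.add(r);
            }
            try {
                sink.writeBatch(batch);   // e.g. one S3 upload or one bulk request
                consumer.commitSync();    // checkpoint offsets only after the batch succeeded
            } catch (Exception e) {
                // Batch failed: do not commit; the records will be re-read and retried,
                // which is where at-least-once duplicates can come from.
            }
        }
    }
}
```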
Processing Semantics
Data Loss & Duplicates
Backpressure
Producer ⇐ Kafka Cluster ⇐ Samza router job ⇐ Sink
● Keystone - at-least-once delivery
Data Loss - Producer
● buffer full
● network error
● partition leader change
● partition migration
Data Loss - Kafka
● Lose all Kafka replicas of data
○ Safeguards:
■ AZ isolation / broker replacement automation
■ Alerts and monitoring
● Unclean partition leader election
○ ack = 1 could cause loss
Data Loss - Router
● Checkpointed offset lost & the router was down for the full retention period
● Messages not processed within the retention period (8h / 24h)
● Unclean leader election causes offsets to go backwards
● Safeguard
○ alerts when lag > 0.1% of traffic for 10 minutes
● A concern only if router instances cannot be launched
Duplicates Router - Sink
● Duplicates possible
○ messages reprocessed - retry after batch S3 upload failure
○ Loss of checkpointed offset (message processed marker)
○ Event GUID helps dedup
Measure Duplicates
● Diff the producer sent count against Kafka messages received
● Monitor the router checkpointed offset over time
Note: the GUID can be used to dedup at the sink (see the sketch below)
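As the note says, the injected GUID makes deduplication at the sink possible. A minimal sketch follows; the bounded window size and class name are assumptions, and in practice the GUID only helps measure or reduce duplicates since Keystone itself stays at-least-once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of GUID-based dedup at a sink: remember the last N GUIDs seen and skip any
// event whose GUID is already in the window. The window size is an assumption.
public class GuidDeduper {
    private static final int WINDOW = 1_000_000;

    private final Map<String, Boolean> seen =
        new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > WINDOW;   // bound memory by evicting the oldest GUIDs
            }
        };

    /** Returns true if this GUID has not been seen yet and the event should be processed. */
    public synchronized boolean firstTime(String guid) {
        return seen.put(guid, Boolean.TRUE) == null;
    }
}
```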
End to End metrics
● Producer to Router to Sink Average Latencies
○ Batch processing [S3 sink]: ~3 sec
○ Stream processing [Consumer Kafka sink]: ~1 sec
○ Log analysis [Elasticsearch]: ~400 seconds (with backpressure)
End to End metrics
● End to End latencies
○ S3:
■ 50th percentile under 1 second
■ 80th percentile under 8 seconds
○ Consumer Kafka:
■ 50th percentile under 800 ms
■ 80th percentile under 4 seconds
○ Elasticsearch:
■ 50th percentile under 13 seconds
■ 80th percentile under 53 seconds
Alerts
● Producer drop rate over 1%
● Consumer lag > 0.1%
● Next offset after the checkpointed offset not found
● Consumer stuck at the partition level
Keystone Dashboard (dashboard screenshots)
And Then???
There’s more in the pipeline...
● Self-service tools
● Better management of Kafka scaling
● More capable control plane
● JSON support exists; support for Avro on the horizon
● Multi-tenant Messaging as a Service - MaaS
● Multi-tenant Stream Processing as a Service - SPaaS
???s
