natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil
Polyglot, Fault-Tolerant, and Performant
Event-Driven Programming with Kafka,
Kubernetes and gRPC
Natan Silnitsky
Backend Infra Developer, Wix.com
About Wix
180M registered users from 190 countries
5% of all internet websites run on Wix
Wix
Editor
Service
Metasite
Service
Restaurant app
Service
1,500
Microservices
Publish
Site
Scala
Python
NodeJS
1,500
Microservices
Wix
1510M Kafka
messages a day
Wix
1,075M Kafka
messages a day
So, we need our
message flows to be
performant,
fault-tolerant, and
polyglot.
Agenda
Event-driven programming with Kafka
+ Performance with Greyhound
+ Fault-tolerance with Greyhound & Kubernetes
+ Polyglot with Kubernetes & gRPC
HTTP
* coupled
Request-Reply
Communication
New App
installed
Site Apps
Service
ECom Catalog
Service
Classic.
HTTP
New App
installed
What if Network
is Unreliable
Request-Reply
Communication
Site Apps
Service
ECom Catalog
Service
* 1500 hard
HTTP
New App
installed
Request-Reply
Communication
Cascading Failures
can happen.
Message
Consumer
Message
Producer
Broker
Event-driven
Communication
Introduce
a Broker.
* clusters and replications
Kafka Broker
Event-driven
Communication
Site Apps
Topic
[Diagram: ten partitions, each an ordered log of messages numbered by offset]
Where you store
messages (events!)
A Kafka Broker
Looks Like This:
* vs message queues
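A minimal sketch of the idea on this slide, assuming a toy in-memory model rather than a real Kafka client: a topic is a set of partitions, each partition is an append-only log, and every message gets an offset. Unlike a classic message queue, reading does not remove anything, so each consumer keeps its own position. The `Topic` class and the `site-apps` name are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A toy topic: a fixed number of partitions, each an append-only list."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append a message and return its offset within the partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> list:
        """Return all messages from `offset` onward; nothing is removed."""
        return self.partitions[partition][offset:]


topic = Topic("site-apps", num_partitions=2)
first = topic.append(0, "app-installed:1")
second = topic.append(0, "app-installed:2")

# Two independent consumers read the same partition from their own offsets.
catalog_view = topic.read(0, offset=0)
late_joiner_view = topic.read(0, offset=1)
```

Because reads are non-destructive, a second service can start consuming the same events later without the producer knowing about it.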
Kafka Broker
Event-driven
Communication
Site Apps
Topic
Kafka
Producer
Site Apps
Service
There’s the
Event
Producer,
Kafka Broker
Event-driven
Communication
Site Apps
Topic
Kafka
Consumer
Ecom Catalog
Service
and Event
Consumer.
* scale
Site Apps
Service
Kafka
Consumer
Kafka
Producer
Greyhound
wraps Kafka
Ecom Catalog
Service
Kafka Broker
Greyhound
Producer
Greyhound
Consumer
Simplify
APIs, with additional
features
Greyhound
wraps Kafka
Site Apps
Service
Kafka
Consumer
Kafka
Producer
Ecom Catalog
Service
Kafka Broker
Abstract
so that it is easy to
change for
everyone
Kafka
Consumer
Kafka
Producer
Kafka Broker
Greyhound
wraps Kafka
Performant
Event-driven Programming
with Kafka and Greyhound
(Wix OSS)
Wix
1,075M Kafka
messages a day
Kafka Broker
Topic
Greyhound
Consumer
Kafka
Consumer
[Diagram: topic partitions, each an ordered log of messages]
Performant
message handling
A Service
Message Handler
Kafka Broker
Topic
Greyhound
Consumer
Kafka
Consumer
SCALA ZIO FIBERS + QUEUES
[Diagram: topic partitions, each an ordered log of messages]
(Thread-safe)
Parallel Message Consumption
Performant
message handling
A Service
Message Handler
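The "fibers + queues" idea can be sketched as follows. This is not Greyhound's implementation (that is Scala with ZIO); it is a stand-in using Python's asyncio, assuming one queue and one worker task per partition so that partitions are handled in parallel while messages within a partition stay in order. The `consume` function and the record format are hypothetical.

```python
import asyncio


async def consume(records, handler, num_partitions):
    """Dispatch (partition, value) records to one queue and worker per partition."""
    queues = [asyncio.Queue() for _ in range(num_partitions)]

    async def worker(queue):
        # Each worker drains a single partition's queue, preserving its order.
        while True:
            value = await queue.get()
            if value is None:  # sentinel: no more records for this partition
                return
            await handler(value)

    workers = [asyncio.create_task(worker(q)) for q in queues]
    for partition, value in records:  # stands in for the Kafka poll loop
        queues[partition].put_nowait(value)
    for q in queues:
        q.put_nowait(None)
    await asyncio.gather(*workers)


handled = []


async def handler(value):
    await asyncio.sleep(0.01)  # simulated non-blocking IO
    handled.append(value)


records = [(0, "a0"), (1, "b0"), (0, "a1"), (1, "b1"), (0, "a2")]
asyncio.run(consume(records, handler, num_partitions=2))
```

The user-supplied handler never has to deal with locks: each partition is processed by exactly one worker at a time.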
80 partitions
Kafka Broker
Topic
[Diagram: topic partitions, each an ordered log of messages]
Performant
message handling
Maximum throughput: 800 messages per second
Message processing: non-blocking IO (Netty gRPC client)
with a response latency of 100 ms
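The 800 figure follows from the numbers on the slide, assuming each partition handles one message at a time and every message waits for the full 100 ms response. A small worked calculation (the function name is hypothetical):

```python
def max_throughput(partitions: int, latency_seconds: float) -> float:
    """Messages per second when each partition handles one message at a time."""
    per_partition = 1 / latency_seconds  # 100 ms per message -> 10 messages/s
    return partitions * per_partition


throughput = max_throughput(partitions=80, latency_seconds=0.1)
```

With 80 partitions and 100 ms per message, that is 80 x 10 = 800 messages per second, provided all 80 partitions are actually processed in parallel.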
[Diagram: a row of Kafka Consumers]
Performant
message handling
80 Kafka Consumers
with 80 Java threads
80 partitions
Kafka Broker
Topic
[Diagram: topic partitions, each an ordered log of messages]
Message processing: non-blocking IO (Netty gRPC client)
with a latency of 100 ms
Kafka
Consumer
Greyhound
Performant
message handling
or 1 Greyhound Consumer
with 80 fibers running on a
small thread pool
Kafka
Consumer
80 partitions
Kafka Broker
Topic
[Diagram: topic partitions, each an ordered log of messages]
Message processing: non-blocking IO (Netty gRPC client)
with a latency of 100 ms
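To illustrate why 80 threads are not needed when the handler does non-blocking IO, here is a rough analogy using asyncio tasks in place of ZIO fibers. It is an assumption-laden sketch, not Greyhound code: 80 handlers run concurrently on a single thread, and the total time stays close to one call's latency instead of 80 times that. The latency is shortened so the example runs quickly.

```python
import asyncio
import threading
import time

LATENCY = 0.05  # shortened stand-in for the 100 ms call in the slides


async def handle(partition, seen_threads):
    seen_threads.add(threading.get_ident())  # record which thread ran this task
    await asyncio.sleep(LATENCY)  # simulated non-blocking gRPC call
    return partition


async def run(partitions):
    seen_threads = set()
    started = time.perf_counter()
    done = await asyncio.gather(
        *(handle(p, seen_threads) for p in range(partitions))
    )
    return done, seen_threads, time.perf_counter() - started


done, seen_threads, elapsed = asyncio.run(run(80))
```

While one task waits on IO, the thread is free to run the others, which is the property the fiber-based consumer relies on.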
Fault-tolerant
Event-driven Programming
with Kafka, Greyhound &
Kubernetes
Wix
1,075M Kafka
messages a day
Kafka Broker
renew-sub-topic
[Diagram: topic partitions, each an ordered log of messages]
Greyhound Consumer
Kafka Consumer
Fails To Read
Fault-tolerant
message handling
renew-sub-topic
renew-sub-topic-retry-0
renew-sub-topic-retry-1
[Diagram: each of the three topics shown as partitions of ordered messages]
Greyhound Consumer
Kafka Consumer
RETRY
PRODUCER
Fault-tolerant
message handling
Inspired by Uber
RETRY!
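The retry-topic flow can be sketched with in-memory queues. This is a simplified model under stated assumptions, not Greyhound's API: a message whose handler fails is re-published to the next retry topic (named as on the slide) instead of blocking the messages behind it, and it is given up on once the retry topics are exhausted. A real consumer would typically wait before reading each retry topic; delays are omitted here. `run_with_retries` and the message values are hypothetical.

```python
from collections import defaultdict, deque


def retry_topic(topic, attempt):
    """Name of the retry topic for a given attempt, following the slide's naming."""
    return f"{topic}-retry-{attempt}"


def run_with_retries(topic, messages, handler, max_retries):
    """Process messages; failures move to the next retry topic instead of blocking."""
    topics = defaultdict(deque)
    topics[topic].extend(messages)
    chain = [topic] + [retry_topic(topic, i) for i in range(max_retries)]
    gave_up = []
    for position, current in enumerate(chain):
        while topics[current]:
            message = topics[current].popleft()
            try:
                handler(message)
            except Exception:
                if position + 1 < len(chain):
                    # Plays the role of the retry producer.
                    topics[chain[position + 1]].append(message)
                else:
                    gave_up.append(message)
    return gave_up


attempts = defaultdict(int)
processed = []


def flaky_handler(message):
    attempts[message] += 1
    if message == "renew:2" and attempts[message] < 3:
        raise RuntimeError("payment service unavailable")
    processed.append(message)


gave_up = run_with_retries(
    "renew-sub-topic", ["renew:1", "renew:2", "renew:3"], flaky_handler, max_retries=2
)
```

Note that "renew:3" is processed before the failing "renew:2" finally succeeds, so this pattern trades strict ordering for progress.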
Kafka Broker
Producer
Wix Payments
Service
Subscription
renewal
Job
Scheduler
Fault-tolerant
message handling
Use Case: Guarantee Completion
Kafka Broker
+ Retry
on failure
Kafka Broker
Producer Consumer
Wix Payments
Service
Subscription
renewal
Job
Scheduler
Fault-tolerant
message handling
Use Case: Guarantee Completion
Producer
Fault-tolerant
message handling
The Resilient Producer
Kafka Broker
Save
message
to disk
Kafka Broker
Producer
Fault-tolerant
message handling
The Resilient Producer: when producing fails, the message is saved to disk
Kafka Broker
Producer
Fault-tolerant
message handling
The Resilient Producer: when producing fails, the message is saved to disk
and producing is retried.
+ Retry
on failure
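A minimal sketch of a resilient producer, assuming a JSON-lines file as the on-disk buffer and a plain `send` callable standing in for the real Kafka producer. The class, file name, and record layout are hypothetical and are not Greyhound's actual implementation.

```python
import json
import os
import tempfile


class ResilientProducer:
    """Wraps a send function; messages that fail to send are appended to a local file."""

    def __init__(self, send, buffer_path):
        self.send = send
        self.buffer_path = buffer_path

    def produce(self, topic, value):
        try:
            self.send(topic, value)
        except Exception:
            with open(self.buffer_path, "a", encoding="utf-8") as buffer:
                buffer.write(json.dumps({"topic": topic, "value": value}) + "\n")

    def retry_buffered(self):
        """Re-send buffered messages; keep the ones that still fail."""
        if not os.path.exists(self.buffer_path):
            return 0
        with open(self.buffer_path, encoding="utf-8") as buffer:
            pending = [json.loads(line) for line in buffer if line.strip()]
        still_failing = []
        for record in pending:
            try:
                self.send(record["topic"], record["value"])
            except Exception:
                still_failing.append(record)
        with open(self.buffer_path, "w", encoding="utf-8") as buffer:
            for record in still_failing:
                buffer.write(json.dumps(record) + "\n")
        return len(pending) - len(still_failing)


broker = {"up": False}
delivered = []


def send(topic, value):
    if not broker["up"]:
        raise ConnectionError("broker unreachable")
    delivered.append((topic, value))


buffer_path = os.path.join(tempfile.mkdtemp(), "unsent.jsonl")
producer = ResilientProducer(send, buffer_path)
producer.produce("renew-sub-topic", "renew:1")  # fails, so it goes to disk
broker["up"] = True
resent = producer.retry_buffered()
```

The caller of `produce` is never blocked by a broker outage; the cost is that delivery becomes delayed rather than immediate.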
Kube
Fault-tolerant
message handling
What if pod is killed?
pod
Kube Node
Fault-tolerant
message handling
Kubernetes
Node 1
DaemonSet pod
pod 1
pod 2
Kubernetes
Node 2
DaemonSet pod
pod 1
pod 2
DaemonSet
pod
Fault-tolerant
message handling
Scavenger
DaemonSet
Kube Node
Kafka Broker
pod
Scavenger
Kube Node
DaemonSet
What if pod is killed?
Fault-tolerant
message handling
Flush out
messages
* small
footprint
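One way to picture the Scavenger is sketched below. The slides only say that a DaemonSet on the Kube node flushes out messages when a pod is killed; everything else here is an assumption made for illustration: that each pod buffers unsent messages in a per-pod file in a directory shared on the node, and that the scavenger flushes the files of pods that are no longer running. The `scavenge` function, file layout, and pod names are hypothetical.

```python
import json
import os
import tempfile


def scavenge(buffer_dir, live_pods, send):
    """Flush buffer files left behind by pods that are no longer running."""
    flushed = 0
    for file_name in sorted(os.listdir(buffer_dir)):
        pod_name, _ = os.path.splitext(file_name)
        if pod_name in live_pods:
            continue  # a live pod can still retry on its own
        path = os.path.join(buffer_dir, file_name)
        with open(path, encoding="utf-8") as buffer:
            records = [json.loads(line) for line in buffer if line.strip()]
        for record in records:
            send(record["topic"], record["value"])
        os.remove(path)  # only reached if every send succeeded
        flushed += len(records)
    return flushed


# Hypothetical node state: pod-a was killed, pod-b is still running.
buffer_dir = tempfile.mkdtemp()
leftovers = {"pod-a": ["renew:1", "renew:2"], "pod-b": ["renew:3"]}
for pod, values in leftovers.items():
    with open(os.path.join(buffer_dir, pod + ".jsonl"), "w", encoding="utf-8") as f:
        for value in values:
            f.write(json.dumps({"topic": "renew-sub-topic", "value": value}) + "\n")

delivered = []
flushed = scavenge(
    buffer_dir,
    live_pods={"pod-b"},
    send=lambda topic, value: delivered.append((topic, value)),
)
```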
Polyglot
Event-driven Programming
with Kubernetes & gRPC
Wix
1,075M Kafka
messages a day
Polyglot
message handling
Motivation: code reuse
Kafka Broker
Greyhound
Scala/Java
services
Greynode
NodeJS
services
Producer Consumer Producer Consumer
Polyglot
message handling
Partial implementation is a problem
Kafka Broker
Greyhound Greynode
NodeJS
services
Producer Consumer Producer Consumer
Polyglot
message handling
Experiment #1
Greyhound on GraalVM
Kafka Broker
Greyhound
NodeJS
services
GraalVM
Producer Consumer
Greyhound
Polyglot
message handling
Experiment #2
Greyhound Sidecar
Kafka Broker
NodeJS
services
Consumer
gRPC
Producer
Kafka Broker
Greyhound
Polyglot
message handling
Experiment #2
Greyhound Sidecar
gRPC
.proto
Sidecar written in Scala
Service written in JS & TS
Kafka Broker
Greyhound
Polyglot
message handling
Experiment #2
Greyhound Sidecar
gRPC
Sidecar written in Scala
.proto
Service written in Python
gRPC
Service written in JS & TS
Polyglot
message handling
pod
Kube Node
Containerized app
Volume
The Sidecar resides with the app in the same pod.
Kubernetes pod
with main and
sidecar
containers
Polyglot
message handling
pod
Kube Node
pod
pod
Kafka Broker
There’s a
memory issue.
pod
Kube Node
pod
pod
Polyglot
message handling
Kafka Broker
Kafka Broker
DaemonSet
Optimization:
Greyhound in
DaemonSet
Polyglot
message handling
Design Dilemmas
Sidecar
✔ Simple design (simple state)
✔ App and sidecar lifecycles are in sync
✔ Failure footprint is small
✘ Memory overhead/footprint
DaemonSet
✘ Complex design (multi-tenant state)
✘ DaemonSet GA means downtime
✘ Failure affects more consumers
✔ Memory usage per Kube Node
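The memory side of this trade-off can be made concrete with a small calculation. All numbers below are hypothetical and chosen only to show the shape of the difference: a sidecar costs memory once per pod, while a DaemonSet costs memory once per Kube node. In practice a multi-tenant DaemonSet process would likely need more memory than a single sidecar, which this sketch ignores.

```python
def sidecar_memory_mb(pods, sidecar_mb):
    """Total memory when every pod carries its own Greyhound sidecar."""
    return pods * sidecar_mb


def daemonset_memory_mb(nodes, daemon_mb):
    """Total memory when each node runs a single shared Greyhound pod."""
    return nodes * daemon_mb


# Hypothetical cluster: 10 nodes, 30 pods per node, 200 MB per Greyhound process.
nodes, pods_per_node, process_mb = 10, 30, 200
sidecar_total = sidecar_memory_mb(nodes * pods_per_node, process_mb)
daemonset_total = daemonset_memory_mb(nodes, process_mb)
```

Under these assumed numbers the sidecar layout uses 60,000 MB against 2,000 MB for the DaemonSet layout, which is the gap the DaemonSet optimization targets.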
Standalone
Producer Service
Mitigates Daemonset Greyhound
downtime
gRPC gRPC
* network hop
Kafka Broker
Greyhound
Producer
Greyhound
Producer
So, we use Kubernetes’ flexibility
to deploy Greyhound producers and consumers
in different patterns, in order to meet
different requirements.
Wix harnesses Kafka, Kubernetes and gRPC to
achieve a polyglot, fault-tolerant, scalable
event-driven distributed system.
A Java/Scala high-level SDK for Apache Kafka.
0.1 is out!
github.com/wix/greyhound
Thank You
Slides & More
slideshare.net/NatanSilnitsky
medium.com/@natansil
twitter.com/NSilnitsky
natansil.com
Q&A

Polyglot, Fault-Tolerant Event-Driven Programming with Kafka, Kubernetes and gRPC