Sharding Containers: Make Go Apps Computer-Friendly Again
Andrey Sibiryov
SRE, Uber New York
The Problem
«It’s complicated» –
John von Neumann.
…and we can do better than this.
At Uber, we run more than a thousand microservices in production,
written in different languages. At this scale and fanout, the performance
of each one of them matters.
• The team I work on runs a very CPU- and memory-intensive
Go service processing millions of requests per second.
• We noticed that a relatively large slice of its run time is
dedicated to doing useless things – GC, context switching,
CPU stalling for memory access and so on.
It’s Not Fast Enough
…and unexpected numbers.
Benchmarks!
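[Chart: RPS (axis 0 to 200, labelled "1000s") for nginx, go 1.5 alloc, go 1.5 compute, go 1.6 alloc and go 1.6 compute, before magic vs. after magic. Bar labels as extracted: 192, 99, 188, 98, 179, 169, 9199, 83, 155; their assignment to individual bars is not recoverable from the text.]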
Sockets, Cores, HTs & NUMA.
In just a few years, modern hardware switched from growing
per-core CPU power to growing the number of cores and the size of caches.
• Massively multi-core, multi-socket, with deep cache
hierarchies and cunning out-of-order execution pipelines.
• The same code can have different latency and throughput even
when running on the same CPU.
• Also, almost nobody uses PMU, PEBS and so on except
Brendan Gregg.
It’s Complicated
“Crooked Moore’s law
doesn’t work anymore!”
— Donald T.
I: A Cryptic Diagram
II: A Cryptic Diagram
III: A Cryptic Diagram
Devices, Interrupts & Latency.
The growing gap between the performance characteristics of different
buses and components, together with multi-level caching, introduces
more and more hidden lag into all operations.
• A single core is not capable of processing input from a NIC.
• Some applications are even forced to switch to userspace-
based polling to achieve full performance.
• Computers grow in complexity: adding queues, buffers &
offload techniques at the expense of transparency.
It’s Complicated
“Computers are networks-
on-a-chip, literally!”
— Donald T.
The “Solution”
The challenge of
avoiding challenges.
…somebody already thought this through, right?
Modern language runtimes and VMs chose the path of least
resistance in the hope that OSes will take care of the complexity of the
underlying hardware. You’ll never believe what happened next:
• Golang Issue #14406 – «… GC makes sufficient accesses to
memory to trick Linux's NUMA logic into moving physical
memory pages to be closer to the GC workers».
Wishful Thinking
IV: A Cryptic Diagram
We’ve put a VM into your VM so you can stall while you stall.
Multi-layer abstractions, over-engineering and multiple indirections
are the current trends of software development. The level of
abstraction engineers work at today is as remote from real hardware
as we are from planting potatoes on Mars.
• VMs & transpilation, garbage collectors, abstract hardware
models.
• Running in multiple nested virtual machines with virtualized
networking.
Wishful Thinking
The Workaround
Computer-Friendly
Engineering.
Databases are not the only thing you can shard.
Shard (n.) – A shard is a horizontal partition of data in a database
or search engine. Each shard is held on a separate server instance,
to spread load.
• In fact, we can shard whatever we want.
• In fact, we already do this: load balancing is essentially
sharding of your whole backend infrastructure.
Sharding
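To make "we can shard whatever we want" concrete, here is a minimal sketch of the usual key-to-shard mapping: hash the key and take it modulo the shard count. The `shardFor` function and the example keys are hypothetical; the talk does not prescribe a hashing scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n shards using an FNV-1a hash.
// n must be greater than zero. Hypothetical helper for this sketch.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	// The same key always lands on the same shard.
	for _, k := range []string{"rider:42", "driver:7", "trip:1001"} {
		fmt.Printf("%s -> shard %d\n", k, shardFor(k, 4))
	}
}
```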
It’s not only for networks.
A load balancer distributes workloads across multiple computing
resources, such as computers, a computer cluster or network links.
It aims to optimize resource use, maximize throughput, minimize
response time, and avoid overload of any single resource.
• Normally, load balancers are used to distribute traffic across
network nodes.
• In fact, we can use a network LB to distribute load across
physical CPU cores.
Load Balancing
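The simplest policy a local balancer could apply across per-core instances is round-robin. The sketch below only picks the next backend address and does no actual proxying; the `roundRobin` type and the loopback addresses are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out backends in rotation and is safe for concurrent use.
// The counter comes first in the struct so it stays 64-bit aligned on
// 32-bit platforms, which the atomic operations require.
type roundRobin struct {
	next     uint64
	backends []string
}

// pick returns the next backend in rotation.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1) - 1
	return r.backends[n%uint64(len(r.backends))]
}

func main() {
	// Hypothetical addresses: one app instance per core, all on localhost.
	lb := &roundRobin{backends: []string{
		"127.0.0.1:9001", "127.0.0.1:9002", "127.0.0.1:9003",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.pick())
	}
}
```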
Tactics 101: use the terrain.
Network topology is the arrangement of the various elements
(links, nodes, etc.) of a computer network.
• In modern computers, each core is essentially a separate
network node.
• Docker supports CPU pinning. This way, we can spin up
multiple instances of the same container and pin them to
separate cores.
• We can even pin linked components closer to each other.
Hardware Locality
Let’s shard all the things!
Tesson is a tool that automatically analyzes your
hardware topology to utilize it as much as possible by
spawning & pinning multiple instances of your app
behind a local load balancer.
• Supports different granularities: core, NUMA
node, etc.
• Integrates with Gorb for seamless local load
balancer setup & configuration.
Project Tesson
github://kobolog/tesson
“Make programs
computer-friendly again!”
— Donald T.
Thank you!
