AllReduce for distributed learning
SungMin Han
Gopher
Agenda
● A short impression of I/O 2019
● Distributed learning
● AllReduce
● Cloud TPU Pods
Speaker
SungMin Han
Clova Research Engineer
Gopher
@pignose
A short impression of I/O 2019
Google I/O 2019
schedule: 05.07 - 09
attended sessions: 26
happy hours: 1
sandboxes: 8
community meetups joined: 4
snacks: 6
Uber drives: 8
Key Announcements
TensorFlow 2.0, Fairness Learning, ML Kit, AI Hub,
Federated Learning, TPU v3, Cloud TPU Pods, TensorFlow on Swift,
TensorFlow Lite for IoT Devices, TensorFlow Agent, TensorFlow Extended (TFX),
TensorFlow.js, Google Coral, Firebase Prediction, Edge TPU
Key Announcements, highlighted by group
● TensorFlow
● TPU / Device
● ML Kit
In this session, we will talk about
Distributed Learning
Distributed Learning
SGD with single GPU
[Diagram labels: Model, FP, BP, loss, ∆𝒘, AVG, WP; devices: GPU 1, CPU]
Previous learning environment
Simple version SGD
[Diagram labels: Model, loss, Gradient, AVG, Update, ∆𝒘; devices: GPU 1, CPU]
Previous learning environment
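The loss, Gradient, AVG, and Update steps on the slide above can be sketched in a few lines. This is only a toy illustration, not the speaker's code: the one-parameter model y = w * x, the squared loss, the sample data, and the name `sgd_step` are all assumptions made for the example.

```python
def sgd_step(w, batch, lr=0.1):
    """One mini-batch SGD step for the toy model y = w * x (squared loss)."""
    # Gradient: d/dw of 0.5 * (w*x - y)^2 for each example in the batch.
    grads = [(w * x - y) * x for x, y in batch]
    # AVG: average the per-example gradients to get delta_w.
    delta_w = sum(grads) / len(grads)
    # Update: step against the averaged gradient.
    return w - lr * delta_w


# Toy data sampled from y = 2x (made up for this sketch).
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
for _ in range(200):
    w = sgd_step(w, batch)
print(round(w, 6))
```

With this data the weight approaches 2. Every step runs on one device, which is the setting the next slide criticizes.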
The problem
● Learning time depends on the batch size and the GPU model
● The model update process works on only a single GPU
● A high-spec GPU machine is too expensive
● A single GPU has practical limitations
● There is no way to support scalability
Previous learning environment
SGD with multiple GPUs
[Diagram labels: Model, loss, Gradient on each of GPU 1, GPU 2, GPU 3; Gather ∆𝒘𝟏, ∆𝒘𝟐, ∆𝒘𝟑; Aggregate (AVG), Update, ∆𝒘 on CPU]
Previous learning environment
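A minimal sketch of the Gather, Aggregate (AVG), and Update flow, assuming the same toy model y = w * x as a stand-in for a real network. The GPUs are simulated as plain Python lists, and `local_gradient` and `data_parallel_step` are hypothetical names used only here.

```python
def local_gradient(w, shard):
    """Average squared-loss gradient for y = w * x on one GPU's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr=0.01):
    # Each simulated GPU computes a gradient on its own shard of the batch.
    grads = [local_gradient(w, shard) for shard in shards]
    # Gather + Aggregate (AVG): combine the per-GPU gradients on the CPU.
    delta_w = sum(grads) / len(grads)
    # Update: apply one step; the new w would be copied back to every GPU.
    return w - lr * delta_w


shards = [
    [(1.0, 2.0), (2.0, 4.0)],    # GPU 1
    [(3.0, 6.0), (4.0, 8.0)],    # GPU 2
    [(5.0, 10.0), (6.0, 12.0)],  # GPU 3
]
w = data_parallel_step(0.0, shards)
```

When the shards have equal size, averaging the per-GPU gradients gives the same result as one step over the whole batch. The cost is that every gradient has to travel to the CPU and back.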
The issues we can find
● Data transfer between GPU memory and the CPU is slow
● There is a GPU stickiness issue (*GPU balancing issue)
● This solution works only on a single bare-metal server (node)
Previous learning environment
TW gradient CPU model
To avoid the imbalance problem,
we need to find a better way
imbalance = bottleneck
The definition of Distribution
Increase efficiency by dividing the problem into smaller parts
Problem
Worker Worker Worker Worker
Answer
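The Problem, Worker, Answer picture can be sketched with Python's standard thread pool. The task (a sum of squares) and the names `solve_part` and `solve_distributed` are made up for illustration. Threads are used only to keep the example short, and in CPython they do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_part(chunk):
    """The work one worker does on its part of the problem."""
    return sum(x * x for x in chunk)


def solve_distributed(problem, num_workers=4):
    # Divide the problem into at most num_workers smaller parts.
    size = max(1, (len(problem) + num_workers - 1) // num_workers)
    parts = [problem[i:i + size] for i in range(0, len(problem), size)]
    # Hand each part to a worker and collect the partial answers.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_answers = list(pool.map(solve_part, parts))
    # Combine the partial answers into the final answer.
    return sum(partial_answers)


answer = solve_distributed(list(range(10)))
```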
Three ways of distribution
Parallel, Concurrent, Parallel + Concurrent
To build a distributed environment,
we should understand the differences among these three categories of distributed solutions
Well-known distributed solutions
● DistBelief (Google Brain's first distributed environment for deep learning)
● Horovod (Uber's distributed TensorFlow environment)
● AllReduce (Today's topic!)
● Federated Learning (announced by Google in 2017)
● CollectiveAllReduce (Google TensorFlow tf.contrib.distribute.CollectiveAllReduceStrategy)
The basic theory
PS1
∆𝒘
GPU 1 GPU 2 GPU 3 GPU 4
Broadcast
∆𝒘𝟏 ∆𝒘𝟐 ∆𝒘𝟑 ∆𝒘𝟒
Downpour SGD
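A rough sketch of the parameter-server pattern in the diagram above, under heavy simplifying assumptions. The workers are simulated in a plain loop instead of running concurrently, the model is the toy y = w * x, and `ParameterServer`, `pull`, and `push` are hypothetical names, not an API from DistBelief or any other library.

```python
class ParameterServer:
    """Holds the shared weight; workers pull it and push gradients to it."""

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, delta_w):
        # Apply each incoming gradient immediately, without waiting for
        # the other workers (the asynchronous part of Downpour SGD).
        self.w -= self.lr * delta_w


def gradient(w, shard):
    return sum((w * x - y) * x for x, y in shard) / len(shard)


ps = ParameterServer(w=0.0, lr=0.01)
shards = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0)]]  # GPU 1..4
for _ in range(100):
    # All workers pull first, so later pushes are computed from stale weights.
    stale = [ps.pull() for _ in shards]
    for w_local, shard in zip(stale, shards):
        ps.push(gradient(w_local, shard))
```

Even in this toy version every gradient passes through the single `ps` object, which is where the bottleneck discussed on the following slides comes from.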
Use case of Uber (horovod)
https://eng.uber.com/horovod/
Parameter Server Scenario
https://eng.uber.com/horovod/
bottleneck (simple), overhead (complex)
Use case of Uber (horovod)
https://eng.uber.com/horovod/
http://www.cs.fsu.edu/~xyuan/paper/09jpdc.pdf
Ring AllReduce
horovod Architecture
https://eng.uber.com/horovod/
http://www.cs.fsu.edu/~xyuan/paper/09jpdc.pdf
Ring AllReduce
[Diagram: TensorFlow, Baidu Ring-AllReduce, NVIDIA NCCL2, Open MPI]
Use case of Uber (horovod)
https://eng.uber.com/horovod/
Federated Learning
https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
Federated Learning
https://arxiv.org/abs/1902.01046 - Towards Federated Learning at Scale: System Design
Secure Aggregation
https://eprint.iacr.org/2017/281.pdf
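The core idea behind secure aggregation can be sketched with pairwise masks that cancel in the sum. This is a simplified illustration and not the protocol from the linked paper: it leaves out key agreement and dropout handling, a seeded `random.Random` stands in for the secrets that each pair of clients would share, and `mask_updates` and `server_aggregate` are hypothetical names.

```python
import random

MOD = 2 ** 32  # all arithmetic is done modulo a fixed integer (assumption)


def mask_updates(updates, seed=0):
    """Return masked integer updates whose sum (mod MOD) is unchanged."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    masked = list(updates)
    n = len(updates)
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.randrange(MOD)
            # Client i adds the pair's mask and client j subtracts it,
            # so the two contributions cancel when the server sums.
            masked[i] = (masked[i] + m) % MOD
            masked[j] = (masked[j] - m) % MOD
    return masked


def server_aggregate(masked):
    # The server only sees masked values, yet recovers the true sum.
    return sum(masked) % MOD


updates = [3, 5, 7]
total = server_aggregate(mask_updates(updates))
```

The server learns the total, 15 here, without being handed any individual update in the clear.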
Federated Learning Architecture
https://eng.uber.com/horovod/
http://www.cs.fsu.edu/~xyuan/paper/09jpdc.pdf
[Diagram: TensorFlow, Actor Programming (Message Passing), FL Server, Secure Aggregation]
AllReduce
What is AllReduce
[Diagram: Downpour. Four workers exchange ∆𝒘𝟏, ∆𝒘𝟐, ∆𝒘𝟑, ∆𝒘𝟒 and ∆𝒘 with the PS]
What is AllReduce
[Diagram: AllReduce among four nodes exchanging 𝜹𝟏, 𝜹𝟐, 𝜹𝟑, 𝜹𝟒]
With Hamiltonian circuit
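A small simulation of ring AllReduce may make the idea concrete. The ring (worker r sends only to worker (r + 1) % n) is a Hamiltonian circuit over the workers. This sketch assumes a sum reduction over equal-length Python lists and runs the two usual phases, scatter-reduce and allgather, in one process. `ring_allreduce` is a hypothetical name, not a Horovod or NCCL function.

```python
def ring_allreduce(worker_data):
    """Simulate ring AllReduce (sum) over one equal-length list per worker."""
    n = len(worker_data)
    length = len(worker_data[0])
    bufs = [list(v) for v in worker_data]  # copy; inputs stay untouched
    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(length * c) // n for c in range(n + 1)]

    def ring_pass(offset, combine):
        for step in range(n - 1):
            # Snapshot what every worker sends before anyone receives.
            outgoing = []
            for rank in range(n):
                c = (rank + offset - step) % n
                lo, hi = bounds[c], bounds[c + 1]
                outgoing.append((lo, bufs[rank][lo:hi]))
            # Each worker delivers its chunk to its right-hand neighbour.
            for rank, (lo, payload) in enumerate(outgoing):
                right = (rank + 1) % n
                for k, value in enumerate(payload):
                    bufs[right][lo + k] = combine(bufs[right][lo + k], value)

    # Phase 1, scatter-reduce: worker r ends up owning the full sum
    # of chunk (r + 1) % n.
    ring_pass(0, lambda mine, received: mine + received)
    # Phase 2, allgather: the finished chunks travel around the ring.
    ring_pass(1, lambda mine, received: received)
    return bufs


result = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
```

Every worker ends with the same summed vector, and no node plays the role of a parameter server. Dividing the result by the number of workers would give the averaged gradient.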
AllReduce Strategy
https://preferredresearch.jp/2018/07/10/technologies-behind-distributed-deep-learning-allreduce/
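As a rough comparison of communication volume, assume N workers, a gradient of size M, and a single unsharded parameter server. The symbols and this simplified cost model are assumptions for illustration, not figures from the slides.

```latex
% V_PS:   data that passes through the single parameter server per step
%         (N gradients in, N parameter copies out).
% V_ring: data each worker sends per ring AllReduce
%         (2(N-1) messages of size M/N).
V_{\mathrm{PS}} = 2 N M
\qquad
V_{\mathrm{ring}} = 2 (N - 1) \frac{M}{N} < 2 M
```

Under this model the load on the parameter server grows linearly with N, while the amount each ring worker sends stays below 2M however many workers join.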
Cloud TPU Pods
The world scale
TPU v2: 180 TFLOPS
TPU v3: 100 PetaFLOPS
TPU v3 architecture (H/W)
https://cloud.google.com/tpu/docs/system-architecture
TPU v3 architecture (S/W)
https://cloud.google.com/tpu/docs/system-architecture
TPU v3 architecture
https://cloud.google.com/blog/products/ai-machine-learning/what-makes-tpus-fine-tuned-for-deep-learning?hl=ko
TPU Pods Overview
https://arxiv.org/pdf/1811.06992.pdf
2-D AllReduce
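One way to picture 2-D AllReduce is as two rounds of 1-D AllReduce on a mesh of devices: first inside each row, then inside each column. The sketch below is a simplified illustration of that idea and not the exact algorithm used on TPU Pods. Each device holds a single number, and `allreduce_1d` and `allreduce_2d` are hypothetical names.

```python
def allreduce_1d(values):
    """Stand-in for a 1-D (ring) AllReduce: everyone ends with the sum."""
    total = sum(values)
    return [total] * len(values)


def allreduce_2d(grid):
    """grid[r][c] is the gradient value held by the device at (r, c)."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: AllReduce inside every row of the mesh.
    after_rows = [allreduce_1d(row) for row in grid]
    # Phase 2: AllReduce inside every column, on the row results.
    result = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        column = allreduce_1d([after_rows[r][c] for r in range(rows)])
        for r in range(rows):
            result[r][c] = column[r]
    return result


mesh = allreduce_2d([[1, 2], [3, 4]])
```

After both phases every device holds the global sum, and each ring only spans one row or one column instead of the whole mesh.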
Summary
● The TPU interconnect design provides high-speed communication between units
● TPU v3 and TPU Pods basically follow AllReduce
(1-D ring AllReduce, 2-D AllReduce)
● TPU Pods are not available yet (Alpha ‘19 06 30)
tan 𝑞!

AllReduce for distributed learning I/O Extended Seoul