Large-Scale Deep Learning for
Intelligent Computer Systems
Jeff Dean
Google Brain team in collaboration with many other teams
Growing Use of Deep Learning at Google
[Chart: number of directories containing model description files, growing over time]
Across many products/areas:
Android
Apps
GMail
Image Understanding
Maps
NLP
Photos
Robotics
Speech
Translation
YouTube
many research uses...
many others ...
Outline
Two generations of deep learning software systems:
1st generation: DistBelief [Dean et al., NIPS 2012]
2nd generation: TensorFlow (unpublished)
An overview of how we use these in research and products
Plus, ...a new approach for training (people, not models)
Google Brain project started in 2011, with a focus on
pushing the state of the art in neural networks. Initial
emphasis:
use large datasets, and
large amounts of computation
to push the boundaries of what is possible in perception and
language understanding
Plenty of raw data
Text: trillions of words of English + other languages
Visual data: billions of images and videos
Audio: tens of thousands of hours of speech per day
User activity: queries, marking messages spam, etc.
Knowledge graph: billions of labelled relation triples
...
How can we build systems that truly understand this data?
Text Understanding
This movie should have NEVER been made. From the poorly
done animation, to the beyond bad acting. I am not sure at what
point the people behind this movie said "Ok, looks good! Lets
do it!" I was in awe of how truly horrid this movie was.
Turnaround Time and Effect on Research
Minutes, Hours:
Interactive research! Instant gratification!
1-4 days:
Tolerable
Interactivity replaced by running many experiments in parallel
1-4 weeks:
High value experiments only
Progress stalls
>1 month:
Don't even try
Important Property of Neural Networks
Results get better with
more data +
bigger models +
more computation
(Better algorithms, new insights and improved
techniques always help, too!)
How Can We Train Large, Powerful Models Quickly?
Exploit many kinds of parallelism
Model parallelism
Data parallelism
Model Parallelism
[Diagrams: a single large model is partitioned across multiple machines/devices, with communication only where the partitions connect]
Data Parallelism
[Diagram: parameter servers hold the parameters p; many model replicas, each reading its own shard of the data, repeatedly fetch the current parameters p, compute a gradient Δp on a mini-batch, and send it back; the parameter servers apply the update p' = p + Δp]
Data Parallelism Choices
Can do this synchronously:
N replicas equivalent to an N times larger batch size
Pro: No noise
Con: Less fault tolerant (requires recovery if any single machine fails)
Can do this asynchronously:
Con: Noise in gradients
Pro: Relatively fault tolerant (a failure in one model replica doesn't block other
replicas)
(Or hybrid: M asynchronous groups of N synchronous replicas)
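To make the asynchronous choice concrete, here is a minimal sketch (not DistBelief code) in which several replica threads share one parameter vector: each fetches possibly stale parameters, computes a gradient on its own data shard, and applies an update of the form p' = p + Δp. The toy objective and all sizes are made up for illustration.

import threading
import numpy as np

params = np.zeros(10)                          # shared "parameter server" state
lock = threading.Lock()                        # serialize individual writes

def replica(data_shard, learning_rate=0.1, steps=100):
    for _ in range(steps):
        p = params.copy()                      # fetch current parameters (possibly stale)
        x = data_shard[np.random.randint(len(data_shard))]
        grad = 2.0 * (p - x)                   # gradient of the toy objective ||p - x||^2
        with lock:
            params[:] -= learning_rate * grad  # the update p' = p + delta p

shards = [np.random.randn(50, 10) for _ in range(4)]   # one data shard per replica
threads = [threading.Thread(target=replica, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(params)

Because each thread reads parameters that other threads may already have updated, gradients are computed against stale values; that staleness is exactly the "noise" trade-off of the asynchronous variant noted above.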
Data Parallelism Considerations
Want model computation time to be large relative to time to send/receive
parameters over network
Models with fewer parameters that reuse each parameter multiple times in the
computation
Mini-batches of size B reuse parameters B times
Certain model structures reuse each parameter many times within each example:
Convolutional models tend to reuse hundreds or thousands of times per
example (for different spatial positions)
Recurrent models (LSTMs, RNNs) tend to reuse tens to hundreds of times
(for unrolling through T time steps during training)
What are some ways that
deep learning is having
a significant impact at Google?
Sequence to Sequence Models
Oriol Vinyals, Ilya Sutskever & Quoc Le started looking at how to map one
sequence to another sequence:
[Diagram: the input sequence "The lazy brown fox jumped over the ..." is mapped to a vector, e.g. [0.1, 0.5, 1.0, 0.0, 2.4]]
Sequence to Sequence Model
Deep LSTM
[Diagram: a deep LSTM reads "The lazy brown fox jumped over the ..." and produces a high-dimensional representation of the whole sequence, e.g. [0.1, 0.5, 1.0, 0.0, 2.4]]
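As a rough illustration of what "high-dimensional representation of a sequence" means, here is a toy recurrent encoder in plain NumPy that folds a token sequence into one fixed-length vector. It uses a simple tanh RNN cell rather than the deep LSTM of the actual model, and all sizes and weights are made up.

import numpy as np

hidden_size, vocab_size = 5, 8
W_xh = 0.1 * np.random.randn(hidden_size, vocab_size)   # input-to-hidden weights
W_hh = 0.1 * np.random.randn(hidden_size, hidden_size)  # hidden-to-hidden weights

def encode(token_ids):
    h = np.zeros(hidden_size)
    for t in token_ids:
        x = np.zeros(vocab_size)
        x[t] = 1.0                                       # one-hot encoding of the token
        h = np.tanh(W_xh.dot(x) + W_hh.dot(h))           # one recurrent step
    return h                                             # fixed-size summary of the whole sequence

print(encode([3, 1, 4, 1, 5]))                           # a 5-dimensional representation

A decoder conditioned on this vector then emits the output sequence one token at a time, which is the "connect two" step on the next slide.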
Connect two of them and you get a machine translation
system
[Diagram: "The quick brown fox jumped over the ..." is encoded by one LSTM and decoded by the other as "Le renard brun rapide saute par dessus ..."]
It works well
WMT14 BLEU:
State-of-the-art: 37.0
Neural Translation Model: 37.3
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le (NIPS 2014)
Addressing the Rare Word Problem in Neural Machine Translation (arxiv.org/abs/1410.8206)
Thang Luong, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, Wojciech Zaremba
or a chatbot.
[Diagram: the same model maps the input "Hi Shiva, how are you?" to the reply "I'm fine, thank you!"]
It produces sensible conversations.
User> hello , i have a problem with my machine
BrainStop> hi
BrainStop> this is shiva .
BrainStop> how are you doing today ?
User> hi shiva , how are you ?
BrainStop> i ' m fine , thank you !
BrainStop> how may i assist you today ?
User> i want to access using vpn
BrainStop> currently the computer is connected to corp network ?
User> no
BrainStop> check this solution :
BrainStop> https : / / INTERNAL_HOSTNAME_REDACTED / articles / connecting - to disconnecting - from - ssl - vpn # heading - 4
User> thanks , bye !
BrainStop> thank you .
or a parser.
[Example: the input sentence "Allen is locked in, regardless of his situ..." is mapped to a linearized parse tree: "n:(S.17 n:(S.17 n:(NP.11 p:NNP.53 n:) ..."]
It works well
Completely learned parser with no parsing-specific code
State-of-the-art results on the WSJ section 23 parsing task
Grammar as a Foreign Language, Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav
Petrov, Ilya Sutskever, and Geoffrey Hinton (to appear in NIPS 2015)
http://arxiv.org/abs/1412.7449
or something that can learn graph algorithms
[Diagram: input is a collection of points; output is their Convex Hull (or Delaunay Triangulation, or Travelling Salesman tour)]
Pointer Networks, Oriol Vinyals, Meire Fortunato, &
Navdeep Jaitly (to appear in NIPS 2015)
Object Recognition Improvement Over Time
[Chart: ImageNet Challenge winners by year, with error rates steadily improving toward predicted human performance]
Image Models
[Diagram: a cat image classified by a model that is 24 layers deep, built from modules each containing 6 separate convolutional layers]
Going Deeper with Convolutions
Szegedy et al. CVPR 2015
Good Fine-Grained Classification
Good Generalization
Both recognized as meal
Sensible Errors
Works in practice for real users
Connect sequence and image models and you get a
captioning system
A close up of a child holding a stuffed animal
It works well (BLEU scores)
Dataset                  Previous SOTA   Show & Tell   Human
MS COCO                  N/A             67            69
FLICKR                   49              63            68
PASCAL (xfer learning)   25              59            68
SBU (weak label)         11              27            N/A
Show and Tell: A Neural Image Caption Generator,
Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan (CVPR
2015)
TensorFlow:
Second Generation Deep Learning System
Motivations
DistBelief (1st system) was great for scalability
Not as flexible as we wanted for research purposes
Better understanding of problem space allowed us to
make some dramatic simplifications
TensorFlow: Expressing High-Level ML Computations
Core in C++
Very low overhead
Different front ends for specifying/driving the computation
Python and C++ today, easy to add more
[Diagram: Python and C++ front ends (with more easily added) sit on top of the Core TensorFlow Execution System, which runs on CPU, GPU, Android, iOS, ...]
TensorFlow Example (Batch Logistic Regression)

# Create new computation graph
graph = tf.Graph()
with graph.as_default():
  # Training data/labels
  examples = tf.constant(train_dataset)
  labels = tf.constant(train_labels)
  # Variables
  W = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
  b = tf.Variable(tf.zeros([num_labels]))
  # Training computation
  logits = tf.matmul(examples, W) + b
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, labels))
  # Optimizer to use
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  # Predictions for training data
  prediction = tf.nn.softmax(logits)

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  for step in xrange(num_steps):
    # Run & return 3 values
    _, l, predictions = session.run([optimizer, loss, prediction])
    if step % 100 == 0:
      print 'Loss at step', step, ':', l
      print 'Training accuracy: %.1f%%' % accuracy(predictions, labels)
Computation is a dataflow graph
Graph of Nodes, also called Operations or ops.
[Diagram: a graph whose nodes are ops such as MatMul, Add, Relu, and Xent, connected to examples, weights, biases, and labels]
Computation is a dataflow graph ... with tensors
Edges are N-dimensional arrays: Tensors
[Diagram: the same graph (MatMul, Add, Relu, Xent over examples, weights, biases, labels), with tensors flowing along the edges]
Computation is a dataflow graph ... with state
'Biases' is a variable
Some ops compute gradients
'=' updates biases
[Diagram: a Mul node combines the gradient with the learning rate, and an assignment ('=') node adds the result into the biases variable]
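In graph terms (using the same TF 0.x-era Python API as the example earlier in the deck), the "state" is a Variable, one op computes the gradient, and an assignment op writes the update back. A minimal sketch with a toy loss, purely for illustration:

import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
biases = tf.Variable(tf.zeros([2]))
loss = tf.reduce_sum(tf.square(x + biases))        # a toy loss, just for illustration
grad = tf.gradients(loss, [biases])[0]             # an op that computes the gradient
update = tf.assign_sub(biases, 0.1 * grad)         # the '=' node: writes new state into biases

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    session.run(update)                            # one gradient step mutates the variable
    print(session.run(biases))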
Computation is a dataflow graph ... distributed
Devices: Processes, Machines, GPUs, etc.
[Diagram: the same graph, including the biases update (learning rate, Mul, Add), partitioned across Device A and Device B]
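Placement onto devices is expressed directly while constructing the graph. A sketch; the device strings and shapes are illustrative, and which device plays "Device A" vs "Device B" in the diagram is not specified in the slides:

import tensorflow as tf

examples = tf.placeholder(tf.float32, shape=[None, 784])
with tf.device('/cpu:0'):                          # e.g. "Device A": holds the parameters
    weights = tf.Variable(tf.truncated_normal([784, 10]))
    biases = tf.Variable(tf.zeros([10]))
with tf.device('/gpu:0'):                          # e.g. "Device B": runs the matrix math
    logits = tf.matmul(examples, weights) + biases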
TensorFlow: Expressing High-Level ML Computations
Automatically runs models on a range of platforms:
from phones ...
to single machines (CPU and/or GPUs)
to distributed systems of many 100s of GPU cards
What is in a name?
Tensor: N-dimensional array
1-dimensional: Vector
2-dimensional: Matrix
Represents multi-dimensional data flowing through the graph
e.g. an image represented as a 3-d tensor (rows, cols, color)
Flow: Computation based on data flow graphs
Lots of operations (nodes in the graph) applied to data flowing through
Tensors flow through the graph, hence the name TensorFlow
Edges represent the tensors (data)
Nodes represent the processing
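A tiny sketch of the vocabulary above: a 3-d image tensor and a few ops it flows through (shapes are made up; same TF 0.x-era API as the earlier example):

import tensorflow as tf

image = tf.placeholder(tf.float32, shape=[480, 640, 3])   # 3-d tensor: rows, cols, color
gray = tf.reduce_mean(image, 2)                            # 2-d tensor (matrix): average over color
batch = tf.expand_dims(image, 0)                           # 4-d tensor: a batch of one image
flat = tf.reshape(image, [480 * 640 * 3])                  # 1-d tensor (vector)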
Flexible
General computational infrastructure
Deep Learning support is a set of libraries on top of the core
Also useful for other machine learning algorithms
Possibly even for high performance computing (HPC) work
Abstracts away the underlying devices/computational hardware
Extensible
Core system defines a number of standard operations
and kernels (device-specific implementations of
operations)
Easy to define new operators and/or kernels
Deep Learning in TensorFlow
Typical neural net layer maps to one or more tensor operations
e.g. Hidden Layer: activations = Relu(weights * inputs + biases) (see the sketch after this list)
Library of operations specialized for Deep Learning
Dozens of high-level operations: 2D and 3D convolutions, Pooling, Softmax, ...
Standard losses e.g. CrossEntropy, L1, L2
Various optimizers e.g. Gradient Descent, AdaGrad, L-BFGS, ...
Auto Differentiation
Easy to experiment with (or combine!) a wide variety of different models:
LSTMs, convolutional models, attention models, reinforcement learning,
embedding models, Neural Turing Machine-like models, ...
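The hidden-layer bullet above, written as the corresponding tensor operations (a sketch; the 784/256 sizes are made up):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 784])
weights = tf.Variable(tf.truncated_normal([784, 256]))
biases = tf.Variable(tf.zeros([256]))
activations = tf.nn.relu(tf.matmul(inputs, weights) + biases)   # Relu(weights * inputs + biases)

Auto differentiation then gives the gradients of any loss built on top of these activations, with respect to weights and biases, through the same graph machinery.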
No distinct Parameter Server subsystem
Parameters are now just stateful nodes in the graph
Data parallel training just a more complex graph
[Diagram, asynchronous variant: several model-computation subgraphs each read the shared parameter nodes and apply their own update]
Synchronous Variant
[Diagram: each model-computation subgraph produces a gradient; an add node combines the gradients, and a single update is applied to the shared parameters]
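A sketch of the synchronous variant as one graph, in the same TF 0.x-era API: each replica ("model computation") contributes a gradient, the gradients are added, and one update is applied to the shared parameters. The tiny model, the tower_loss helper, and the stand-in data shards are hypothetical, not from the slides:

import tensorflow as tf

num_replicas = 4
weights = tf.Variable(tf.truncated_normal([10, 1]))                      # shared parameters
data_shards = [tf.random_normal([32, 10]) for _ in range(num_replicas)]  # stand-in input shards

def tower_loss(examples):
    # Hypothetical tiny "model computation": a linear model with a squared loss.
    return tf.reduce_mean(tf.square(tf.matmul(examples, weights)))

opt = tf.train.GradientDescentOptimizer(0.5)
tower_grads = [opt.compute_gradients(tower_loss(shard), [weights]) for shard in data_shards]

# Add the per-replica gradients together and apply a single update to the shared parameters.
grads = [grads_and_vars[0][0] for grads_and_vars in tower_grads]   # one gradient per replica
train_op = opt.apply_gradients([(tf.add_n(grads) / num_replicas, weights)])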
Nurturing Great Researchers
We're always looking for people with the potential to become excellent
machine learning researchers
The resurgence of deep learning in the last few years has caused a surge of
interest from people who want to learn more and conduct research in this area
Google Brain Residency Program
New one year immersion program in deep learning research
Learn to conduct deep learning research w/experts in our team
Fixed one-year employment with salary, benefits, ...
Goal after one year is to have conducted several research projects
Interesting problems, TensorFlow, and access to computational resources
Google Brain Residency Program
Who should apply?
people with BSc or MSc, ideally in computer science, mathematics or statistics
completed coursework in calculus, linear algebra, and probability, or equiv.
programming experience
motivated, hard working, and have a strong interest in Deep Learning
Google Brain Residency Program
Program Application & Timeline
Google Brain Residency Program
For more information:
g.co/brainresidency
Contact us:
brain-residency@google.com
Questions?