Introduction to PyTorch
RTSS, Jun Young Park
Objective
 Understanding AutoGrad
 Review
 Logistic Classifier
 Loss Function
 Backpropagation
 Chain Rule
 Example : Find gradient from a matrix
 AutoGrad
 Solve the example with AutoGrad
 Data Parallelism in PyTorch
 Why should we use GPUs?
 Inside CUDA
 How to parallelize our models
 Experiment
Simple but powerful implementation of backpropagation
Understanding AutoGrad
Logistic Classifier (Fully-Connected)
𝑊𝑋 + b = y
Example: the logits y = (2.0, 1.0, 0.1) for classes A, B, C are mapped by S(y) to the probabilities p = (0.7, 0.2, 0.1).
X : Input
W, b : To be trained
y : Prediction (logits)
S(y) : Softmax function (can be other activation functions)
S(y)_i = e^{y_i} / Σ_j e^{y_j} represents the probabilities of the elements in vector y.
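As a quick check, the softmax mapping above can be reproduced with a short PyTorch sketch (the logits 2.0, 1.0, 0.1 are the ones from the slide; the probabilities round to 0.7, 0.2, 0.1):

```python
import torch

# Softmax of the slide's example logits for classes A, B, C.
logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)   # e^{y_i} / sum_j e^{y_j}
print(probs)                           # ~[0.66, 0.24, 0.10] -> rounded to 0.7, 0.2, 0.1 on the slide
```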
One-Hot Encoding
The classifier's probabilities S(y) = (0.7, 0.2, 0.1) for classes A, B, C are compared against the one-hot encoded label L = (1, 0, 0) for instance A.
The predicted label is the class with the maximum probability; the distance between the two vectors is the loss.
Find W, b that minimize the loss (error).
Loss Function
 The probability vector can be very large when there are a lot of classes.
 How can we find the distance between the vector S (prediction) and L (label)? Use cross-entropy:
D(S, L) = − Σ_i L_i log(S_i)
Example: S(y) = (0.7, 0.2, 0.1), L = (1.0, 0.0, 0.0)
※ D(S, L) ≠ D(L, S)
No need to worry about taking log(0): since S(y)_i = e^{y_i} / Σ_j e^{y_j}, every softmax output is strictly positive.
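A minimal sketch of this distance computation, using the example vectors above (the names S, L, D are mine):

```python
import torch

S = torch.tensor([0.7, 0.2, 0.1])   # predicted probabilities S(y)
L = torch.tensor([1.0, 0.0, 0.0])   # one-hot label for class A

# Cross-entropy D(S, L) = -sum_i L_i * log(S_i); asymmetric, so D(S, L) != D(L, S).
D = -(L * torch.log(S)).sum()
print(D)                            # ~0.357: a small distance means a good prediction
```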
In-depth of the Classifier
Let there be the following equations:
1. Affine Sum: σ(x) = Wx + B
2. Activation Function: y(σ) = ReLU(σ)
3. Loss Function: E(y) = (1/2)(y_target − y)^2
4. Gradient Descent: w ← w − α ∂E/∂w,  b ← b − α ∂E/∂b
• Gradient descent requires ∂E/∂w and ∂E/∂b.
• How can we find them? -> Use the chain rule!
(y_target : training data, y : prediction result)
Chain Rule
• Let y(x) be defined as below, where x influences g(x) and g(x) influences f(g(x)):
y(x) = f(g(x)) = (f ∘ g)(x)
• The derivative of y(x) is
y′(x) = f′(g(x)) g′(x)
• In Leibniz notation:
dy/dx = (dy/df)(df/dg)(dg/dx) = 1 ∗ f′(g(x)) ∗ g′(x)
Chain Rule
Applying the chain rule to the equations:
∂E/∂w = (∂E/∂y)(∂y/∂σ)(∂σ/∂w) = x(y − y_target) if σ > 0, and 0 if σ ≤ 0
where
∂E/∂y = y − y_target,  ∂y/∂σ = 1 if σ > 0 and 0 if σ ≤ 0,  ∂σ/∂w = x
Recall the equations:
1. Affine Sum: σ(x) = Wx + B
2. Activation Function: y(σ) = ReLU(σ)
3. Loss Function: E(y) = (1/2)(y_target − y)^2
4. Gradient Descent: w ← w − α ∂E/∂w,  b ← b − α ∂E/∂b
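The hand-derived gradient can be checked against autograd; the sketch below uses made-up scalar values (x = 2, y_target = 1, w = 0.5, b = 0.1 are illustrative, not from the slides):

```python
import torch

x, y_target = torch.tensor(2.0), torch.tensor(1.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

sigma = w * x + b                    # 1. affine sum
y = torch.relu(sigma)                # 2. activation
E = 0.5 * (y_target - y) ** 2        # 3. loss
E.backward()                         # autograd applies the chain rule

# Hand-derived gradient: dE/dw = x * (y - y_target) when sigma > 0, else 0.
manual = (x * (y - y_target)).item() if sigma > 0 else 0.0
print(w.grad.item(), manual)         # both ~0.2 with these numbers
```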
Example : Finding the gradient of 𝑋
 Let the input tensor 𝑋 be initialized with the following square matrix of order 3:
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
 And let 𝑌, 𝑍 be defined as follows:
Y = X + 3
Z = 6Y^2 = 6(X + 3)^2
 And the output δ is the average of tensor 𝑍:
δ = mean(Z) = (1/9) Σ_i Σ_j Z_ij
Example : Finding the gradient of 𝑋
 Since every operation is element-wise, each scalar element can be written from its definition:
Z_ij = 6(Y_ij)^2,  Y_ij = X_ij + 3
 To find the gradient, we use the chain rule so that we can combine the partial gradients:
∂δ/∂Z_ij = 1/9,  ∂Z_ij/∂Y_ij = 12 Y_ij,  ∂Y_ij/∂X_ij = 1
∂δ/∂X_ij = (∂δ/∂Z_ij)(∂Z_ij/∂Y_ij)(∂Y_ij/∂X_ij) = (1/9) ∗ 12 Y_ij ∗ 1 = (4/3)(X_ij + 3)
Example : Finding the gradient of 𝑋
 Thus, we can get the gradient of the (1,1) element of 𝑋:
∂δ/∂X_ij |_(i,j)=(1,1) = (4/3)(X_ij + 3) = (4/3)(1 + 3) = 16/3
 In the same way, we can get the whole gradient matrix of 𝑋:
∂δ/∂X = [[∂δ/∂X_11, ∂δ/∂X_12, ∂δ/∂X_13],
         [∂δ/∂X_21, ∂δ/∂X_22, ∂δ/∂X_23],
         [∂δ/∂X_31, ∂δ/∂X_32, ∂δ/∂X_33]]
      = [[16/3, 20/3, 24/3],
         [28/3, 32/3, 36/3],
         [40/3, 44/3, 48/3]]
AutoGrad : Finding the gradient of 𝑋
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
Y = X + 3
Z = 6Y^2 = 6(X + 3)^2
δ = mean(Z) = (1/9) Σ_i Σ_j Z_ij
∂δ/∂X = [[16/3, 20/3, 24/3],
         [28/3, 32/3, 36/3],
         [40/3, 44/3, 48/3]]
Each operation has its gradient function.
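A minimal sketch of this example with AutoGrad (the original slide showed the code as a screenshot; the variable names here are mine):

```python
import torch

X = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]], requires_grad=True)

Y = X + 3
Z = 6 * Y ** 2
delta = Z.mean()      # delta = (1/9) * sum_ij Z_ij

delta.backward()      # each operation's gradient function is applied in reverse
print(X.grad)         # (4/3) * (X + 3) = [[16/3, 20/3, 24/3], [28/3, ...], ...]
```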
Back Propagation
 Get derivatives using 'back propagation': starting from the output signal L, each node multiplies the incoming gradient ∂L/∂z by its own local derivative and passes the product backward.
Addition node: z = x + y, so ∂z/∂x = ∂z/∂y = 1.
The backward pass sends (∂L/∂z)(∂z/∂x) = ∂L/∂z and (∂L/∂z)(∂z/∂y) = ∂L/∂z, i.e. the incoming gradient passes through unchanged.
Multiplication node: z = xy, so ∂z/∂x = y and ∂z/∂y = x.
The backward pass sends (∂L/∂z)(∂z/∂x) = (∂L/∂z) ∙ y and (∂L/∂z)(∂z/∂y) = (∂L/∂z) ∙ x, i.e. the incoming gradient is multiplied by the value of the other input.
Back Propagation
 How about the exponentiation function?
Power node: z = x^n, so ∂z/∂x = n x^(n−1) and ∂z/∂n = x^n ln x.
From the output signal L, the backward pass sends (∂L/∂z)(∂z/∂x) = (∂L/∂z)(n x^(n−1)) and (∂L/∂z)(∂z/∂n) = (∂L/∂z)(x^n ln x).
Derivation of ∂z/∂n: from z = x^n, ln z = n ln x, so (1/z) dz = ln x dn and dz/dn = z ln x = x^n ln x.
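The local derivatives of the addition, multiplication, and power nodes can be verified with autograd; the values x = 2, y = 3, n = 4 below are arbitrary:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
n = torch.tensor(4.0, requires_grad=True)

z = x + y                     # addition node: dz/dx = dz/dy = 1
z.backward()
print(x.grad, y.grad)         # 1, 1

x.grad = y.grad = None        # clear accumulated gradients
z = x * y                     # multiplication node: dz/dx = y, dz/dy = x
z.backward()
print(x.grad, y.grad)         # 3, 2

x.grad = None
z = x ** n                    # power node: dz/dx = n*x^(n-1), dz/dn = x^n * ln(x)
z.backward()
print(x.grad, n.grad)         # 32, 16*ln(2) ~= 11.09
```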
Appendix : Operation Graph of 𝛿 (Matrix)
Each element flows through the same chain: X_ij → (+3) → Y_ij → (^2) → (×6) → Z_ij; all Z_ij (Z_11 … Z_33) are then summed and multiplied by 1/9 to produce δ.
Z_ij = 6(Y_ij)^2
δ = mean(Z)
Appendix : Operation Graph of 𝛿 (Scalar) - Backpropagation
Written per element, the graph is X_ij → (+3) → Y_ij → (^2) → α_ij → (×6) → Z_ij → (sum) → β_ij (= Z_sum) → (×1/9) → δ, so the chain rule can be expanded node by node:
∂δ/∂X_ij = (∂δ/∂Z_ij)(∂Z_ij/∂Y_ij)(∂Y_ij/∂X_ij)
         = (∂δ/∂β_ij)(∂β_ij/∂Z_ij)(∂Z_ij/∂α_ij)(∂α_ij/∂Y_ij)(∂Y_ij/∂X_ij)
         = (1/9) ∗ 1 ∗ 6 ∗ 2Y_ij ∗ 1 = (4/3)(X_ij + 3)
Step by step, starting from the output with ∂δ/∂δ = 1:
∂δ/∂β_ij = 1/9
∂δ/∂Z_ij = (∂δ/∂β_ij)(∂β_ij/∂Z_ij) = (1/9) ∗ 1
∂δ/∂α_ij = (∂δ/∂β_ij)(∂β_ij/∂Z_ij)(∂Z_ij/∂α_ij) = (1/9) ∗ 1 ∗ 6
∂δ/∂Y_ij = (∂δ/∂β_ij)(∂β_ij/∂Z_ij)(∂Z_ij/∂α_ij)(∂α_ij/∂Y_ij) = (1/9) ∗ 1 ∗ 6 ∗ 2Y_ij
∂δ/∂X_ij = (∂δ/∂β_ij)(∂β_ij/∂Z_ij)(∂Z_ij/∂α_ij)(∂α_ij/∂Y_ij)(∂Y_ij/∂X_ij) = (4/3)(X_ij + 3)
Comparison
Raw implementation vs. AutoGrad
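The original slide showed the two implementations side by side as screenshots; a minimal sketch of the same comparison, reusing the example matrix (names are mine):

```python
import torch

X = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]], requires_grad=True)

# Raw: apply the hand-derived formula d(delta)/dX = (4/3) * (X + 3).
raw_grad = (4.0 / 3.0) * (X.detach() + 3)

# AutoGrad: build the graph and call backward().
delta = (6 * (X + 3) ** 2).mean()
delta.backward()

print(torch.allclose(raw_grad, X.grad))   # True: both give the same gradient matrix
```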
Data Parallelism
in PyTorch
Why GPU? (CUDA)
CPU: a handful of cores at a high clock speed (3.6 GHz) - good for a few huge tasks.
GPU: thousands of cores (3584 CUDA cores in this example) at a lower clock speed (1.6 GHz, 2.0 GHz overclocked) - good for an enormous number of small tasks.
Dataflow Diagram
The GPU works as a co-processor: hello.cu is compiled by NVCC, device memory is allocated with cudaMalloc(), data is moved with cudaMemcpy(), and the __global__ kernel (sum()) runs on the GPU.
1. Memcpy : copy the host buffers (h_a, h_b) to the device buffers (d_a, d_b)
2. Kernel call (cuBLAS) : run sum on the GPU
3. Memcpy : copy the result (d_out) back to the host (h_out)
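In PyTorch the same memcpy / kernel / memcpy pattern is hidden behind Tensor.to(); a hedged sketch of the equivalent flow (no custom kernel, just an element-wise sum):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

h_a = torch.randn(1024)      # host (CPU) tensors, like h_a / h_b in the diagram
h_b = torch.randn(1024)

d_a = h_a.to(device)         # 1. Memcpy: host -> device (cudaMemcpy under the hood)
d_b = h_b.to(device)
d_out = d_a + d_b            # 2. Kernel call: the sum runs on the GPU
h_out = d_out.cpu()          # 3. Memcpy: device -> host
```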
CUDA on a Multi-GPU System
Quad SLI : 14,336 CUDA cores, 48GB of VRAM
How can we use multiple GPUs in PyTorch?
Problem
- Low Utilization
Only a single GPU is allocated; the remaining GPUs sit at zero utilization while holding redundant memory.
Problem
- Duration & Memory Allocation
 A large batch size causes a lack of memory.
 An out-of-memory error from PyTorch -> the Python kernel dies.
 Can't set a large batch size.
 Can only afford batch_size = 5, num_workers = 2.
 Can't divide up the work with the other GPUs.
 Elapsed time : 25m 44s (10 epochs)
 Reached 99% accuracy in 9 epochs (on the training set)
 It takes too much time.
Data Parallelism in PyTorch
 Implemented using torch.nn.DataParallel()
 Can be used to wrap a module or model.
 Also supports the underlying primitives (torch.nn.parallel.*):
 Replicate : replicate the model on multiple devices (GPUs).
 Scatter : distribute the input in the first dimension.
 Gather : gather and concatenate the input in the first dimension.
 Parallel-apply : apply a set of already-distributed inputs to a set of already-distributed models.
 PyTorch Tutorials – Multi-GPU examples
 https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
Easy to Use : nn.DataParallel(model)
- Practical Example
1. Define the model.
2. Wrap the model with nn.DataParallel().
3. Access layers through the 'module' attribute (see the sketch below).
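A minimal sketch of those three steps (the model architecture here is a placeholder, not the network used in the experiment):

```python
import torch
import torch.nn as nn

# 1. Define the model (placeholder architecture).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# 2. Wrap the model with nn.DataParallel(); inputs are scattered across all visible GPUs,
#    the model is replicated, and the outputs are gathered back on the default device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

# 3. Access the wrapped layers through the 'module' attribute.
if isinstance(model, nn.DataParallel):
    first_layer = model.module[0]
```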
After Parallelism
- GPU Utilization
 Hyperparameters
 Batch size : 128
 Number of workers : 16
 High utilization.
 Can use a large memory space.
 All GPUs are allocated.
After Parallelism
- Training Performance
 Hyperparameters
 Batch size : 128
 A large batch size needs more memory space.
 Number of workers : 16
 Recommended to be set to 4 * NUM_GPUs (from the forum).
 Elapsed time : 7m 50s (10 epochs)
 Reached 99% accuracy in 4 epochs (on the training set) - it took just 3m 10s.
Q & A
Editor's Notes
  • #3: To understand AutoGrad, the automatic differentiation feature provided by PyTorch, we first review the basics of deep learning and take a closer look at backpropagation. We then compare a raw implementation of backpropagation with AutoGrad to understand the difference. We also look at why GPUs are used and how CUDA computation proceeds, see how to use the data-parallelism method provided by PyTorch, and compare the performance of multiple GPUs against a single GPU.
  • #4: A module that provides an easy implementation of backpropagation.
  • #5: The basic form of a logistic classifier is a first-order linear function (WX + b = y). X is the input, and W, b are the weights and bias (training means finding appropriate weights and biases). y is the prediction result -> this result (the logits) is converted into probabilities with the softmax function. Why? Because logits can become very large, we convert them into simple values between 0 and 1 and classify by the highest probability. Two classes? Logistic classification. Several classes? Softmax/multinomial classification.
  • #6: How do we express a class as numbers? Set the entry of the corresponding class (the class with the highest probability) to a true value in a vector. E.g. class A? -> [1 0 0 0 0 …]: only the index corresponding to class A is 1, the rest are 0.
  • #7: The distance between the answer and the prediction: cross-entropy. The softmax output will not be 0; mind the order of the arguments. A small value (a short distance) means a correct decision. Since the entries of S(y) sum to 1 and each one is greater than 0, the log(0) problem does not occur.
  • #10: By the chain rule, the derivative of the loss function E with respect to w is as follows; that is, how much E changes when w changes equals the product of the changes through the composed functions. We decompose it so that y affects E, σ affects y, and w affects σ, and compute each derivative. Since ReLU is a non-linear function, we differentiate it piecewise.
  • #11: Define the operations as above…
  • #12: Computing directly on the matrix is cumbersome, so we use a scalar expression for a single element. Computing the partial derivatives gives the results above, and expressing them as a composite function gives the final formula.
  • #13: Substituting the (1,1) element of X, which is 1, gives the result above. Likewise, putting the other elements back into the original matrix representation gives the full gradient matrix.
  • #14: A gradient function is, in the end, the backpropagation of the most basic computation node.
  • #15: Now that composite functions are clear, let's move on to backpropagation. How much did x and y affect the value of z? In other words, how does z change when x and y change? Backpropagation: multiply the incoming signal by the node's local derivative and pass it to the next node (in reverse). The backward pass of an addition node propagates the incoming signal unchanged. The backward pass of a multiplication node propagates the incoming signal multiplied by the value of the opposite input.
  • #16: The power node and its forward and backward passes look like this. Again, the question is how x and n affect z; computing this gives the results above.
  • #17: The computation graph for the matrix looks like this. Each element is computed, and then the mean is obtained from their sum and the number of elements.
  • #18: The matrix representation is hard to follow, so we write each element as a scalar. By the backpropagation principle covered earlier, the gradient is derived as above.