
GR5245 Fall 2024

Python for Deep Learning


Instructor: Ka Yi Ng

Tensors &
Auto differentiation
Tensors
• Multi-dimensional array
• Slicing / vectorization / broadcasting
• All elements share the same data type

• ‘cuda’ device: available on supported NVIDIA GPUs (see the sketch below)
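
A minimal sketch of these points in PyTorch (the tensor values are illustrative):

    import torch

    # A tensor is a multi-dimensional array; all elements share one dtype.
    x = torch.arange(12, dtype=torch.float32).reshape(3, 4)

    print(x[1, :])                                 # slicing: the second row
    print(x * 2.0)                                 # vectorized elementwise multiply
    print(x + torch.tensor([1.0, 2.0, 3.0, 4.0]))  # broadcasting a (4,) row across (3, 4)

    print(torch.cuda.is_available())               # True only with a supported NVIDIA GPU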


Tensors
• Moving tensors between CPU and GPU
• Tensors are created on the CPU by default
• Defining a default device (see the sketch below)
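
A minimal sketch, assuming PyTorch >= 2.0 for torch.set_default_device:

    import torch

    cpu_t = torch.ones(2, 2)              # tensors are created on the CPU by default
    if torch.cuda.is_available():
        gpu_t = cpu_t.to('cuda')          # copy CPU -> GPU
        back = gpu_t.cpu()                # copy GPU -> CPU
        # Operands must share a device: cpu_t + gpu_t raises a RuntimeError.

    # Defining a default device for newly created tensors:
    torch.set_default_device('cuda' if torch.cuda.is_available() else 'cpu')
    print(torch.zeros(3).device)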


Inside GPU
• CUDA cores
  • General-purpose processing units
  • Floating point and integer operations (e.g. FP32, FP16, FP64, ...)
• Tensor cores
  • Specialized processing units for matrix multiplications
  • Optimized for mixed-precision arithmetic (e.g. FP16 and FP32)
• Streaming Multiprocessor (SM):
  • Contains many CUDA cores / tensor cores
  • Single-instruction, multiple-thread (SIMT) execution model
  • Warp: a group of 32 threads executing the same instruction on different data elements
• DRAM: dynamic RAM for data storage
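
Some of these hardware details can be inspected from PyTorch; a small sketch (device index 0 is assumed):

    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(props.name)                   # GPU model
        print(props.multi_processor_count)  # number of SMs
        print(props.total_memory)           # DRAM size in bytes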
Inside Tensor Core
• “Matrix-multiply-and-accumulate” operation on 4x4 matrices (D = A·B + C)
• Automatic mixed-precision arithmetic
  • FP16 inputs and FP32 output
• Benefits:
  • Lower memory usage
  • Larger neural networks
  • Faster data transfer
  • Faster math calculations
• Math-bound vs. memory-bound operations
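
A minimal sketch of automatic mixed precision via torch.autocast, which lets matmuls take FP16 inputs on tensor cores while accumulating in FP32 (matrix sizes are illustrative):

    import torch

    if torch.cuda.is_available():
        a = torch.randn(256, 256, device='cuda')
        b = torch.randn(256, 256, device='cuda')
        # Matmuls inside the autocast region run with FP16 inputs.
        with torch.autocast(device_type='cuda', dtype=torch.float16):
            c = a @ b
        print(c.dtype)  # torch.float16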
CUDA C++ libraries
• APIs that access GPUs for intensive math calculations
• cuBLAS: linear algebra, matrix multiplication
• cuDNN: deep neural networks (e.g. CNNs)
• Tiled outer-product approach to matrix multiplication
  • Tile sizes: 256x128, 128x256, 128x128, etc.

• General guidelines (illustrated below):
  • Dimensions of input/output matrices: multiples of 8
  • Batch size: a multiple of 8
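
A small illustration of the multiple-of-8 guideline; pad_to_multiple_of_8 is a hypothetical helper, not a library function:

    # Round a dimension up to the next multiple of 8 (hypothetical helper).
    def pad_to_multiple_of_8(n: int) -> int:
        return ((n + 7) // 8) * 8

    print(pad_to_multiple_of_8(1000))  # 1000 (already a multiple of 8)
    print(pad_to_multiple_of_8(1003))  # 1008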
Auto differentiation (AD)
• Evaluates partial derivatives of outputs with respect to inputs
• Tensors act as variables
• torch.autograd package
  • Autograd engine implemented in C++
• Reverse-mode AD: a forward pass followed by a backward pass
  • Forward pass: builds a computational graph of operations
  • Backward pass: computes the gradient at each node, using the chain rule to propagate gradients back to the leaf nodes
• Forward-mode AD vs. reverse-mode AD (see the sketch below)
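
A minimal sketch of reverse-mode AD with torch.autograd:

    import torch

    x = torch.tensor(2.0, requires_grad=True)  # tensor as a variable
    y = x ** 2 + 3 * x                         # forward pass records the graph

    y.backward()                               # backward pass: chain rule to the leaves
    print(x.grad)                              # dy/dx = 2x + 3 = 7.0 at x = 2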
Reverse mode Auto Differentiation
• Forward pass
• Backward pass
• Disabling gradient calculations
• Dynamic graph (see the sketch below)
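
A sketch of the last two points (the tensors are illustrative):

    import torch

    w = torch.randn(3, requires_grad=True)
    x = torch.ones(3)

    # Disabling gradient calculations, e.g. during evaluation:
    with torch.no_grad():
        y = (w * x).sum()       # no graph is recorded here
    print(y.requires_grad)      # False

    # Dynamic graph: rebuilt on every forward pass, so ordinary Python
    # control flow can change the graph between iterations.
    z = (w * x).sum() if x.sum() > 0 else (w - x).sum()
    z.backward()
    print(w.grad)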
Using auto differentiation
• Example: MNIST images with 28 x 28 pixels
• Image → 𝑥 = vector of 784 values
• Output: 𝑦 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑥𝑊 + 𝑏)
• Parameters:
• 𝑊: (784,10) weight parameters
• 𝑏: vector of 10 bias parameters
• Goal: find the optimal 𝑊, 𝑏 that minimize the average cross-entropy loss
• Forward pass
• Backward pass
• Gradient step
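
A minimal sketch of one such training step; the batch is random data standing in for real MNIST images, and F.cross_entropy applies the softmax internally:

    import torch
    import torch.nn.functional as F

    x = torch.randn(64, 784)              # stand-in batch of flattened images
    labels = torch.randint(0, 10, (64,))  # stand-in labels in 0..9

    W = torch.zeros(784, 10, requires_grad=True)  # weight parameters
    b = torch.zeros(10, requires_grad=True)       # bias parameters
    lr = 0.1

    # Forward pass: compute the average cross-entropy loss.
    loss = F.cross_entropy(x @ W + b, labels)

    # Backward pass: fills W.grad and b.grad via reverse-mode AD.
    loss.backward()

    # Gradient step: update the parameters with gradient tracking disabled.
    with torch.no_grad():
        W -= lr * W.grad
        b -= lr * b.grad
        W.grad.zero_()
        b.grad.zero_()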
Key elements
• Datasets
• Defining model
• Loss function
• Optimization method
• Performance evaluation
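
A sketch tying these key elements together, again with random data standing in for a real dataset:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Dataset (random stand-in for MNIST).
    data = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))
    loader = DataLoader(data, batch_size=64, shuffle=True)

    model = nn.Linear(784, 10)                         # defining the model
    loss_fn = nn.CrossEntropyLoss()                    # loss function
    opt = torch.optim.SGD(model.parameters(), lr=0.1)  # optimization method

    for epoch in range(2):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

    # Performance evaluation: accuracy with gradients disabled.
    with torch.no_grad():
        xb, yb = data.tensors
        acc = (model(xb).argmax(dim=1) == yb).float().mean()
        print(acc.item())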
