2024_GR5245 class2 notes
Tensors & Automatic Differentiation
Tensors
• Multi-dimensional array
• Slicing / vectorization / broadcasting
• All elements share the same data type (dtype)
• Tensors on the CPU (the default device)
• General guidelines (for efficient matrix multiplication on GPUs):
  • Dimensions of input/output matrices: multiples of 8
  • Batch size: a multiple of 8
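The broadcasting rule and the multiple-of-8 guideline can be sketched in plain Python, assuming NumPy/PyTorch broadcasting semantics: shapes are compared from the trailing dimension, and each pair of sizes must be equal or contain a 1. `broadcast_shape` and `round_up_to_multiple` are hypothetical helpers for illustration, not PyTorch functions; in PyTorch the arithmetic operators broadcast automatically, and `torch.broadcast_shapes` returns the resulting shape.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Result shape of broadcasting two shapes (hypothetical helper).

    Sizes are aligned from the last dimension; a missing leading
    dimension behaves like size 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension is stretched to match the other size.
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


def round_up_to_multiple(n, k=8):
    """Smallest multiple of k that is >= n (e.g. to pad a dimension to 8)."""
    return -(-n // k) * k


print(broadcast_shape((64, 784), (784,)))  # (64, 784)
print(broadcast_shape((64, 10), (10,)))    # (64, 10): bias added to every row
print(round_up_to_multiple(784))           # 784 is already a multiple of 8
print(round_up_to_multiple(10))            # 16
```

The second call mirrors adding a bias vector of length 10 to a batch of 64 outputs: the bias is reused for every row without being copied.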
Automatic differentiation (AD)
• Evaluate partial derivatives
• Tensors as variables: operations on tensors with requires_grad=True are tracked
• torch.autograd package
• Autograd engine implemented in C++
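• Reverse-mode AD: a forward pass followed by a backward pass
  • Forward pass:
    • Records a computational graph of the operations
  • Backward pass:
    • Computes the local gradient at each node
    • Uses the chain rule to propagate gradients back to the leaf nodes
• Forward-mode AD vs reverse-mode AD: forward mode needs one pass per input, reverse mode one backward pass per scalar output, so reverse mode suits a scalar loss with many parameters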
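The forward-mode vs reverse-mode trade-off can be made concrete with dual numbers, a standard way to implement forward-mode AD. In this sketch the `Dual` class, `sin`, and `forward_mode_gradient` are hypothetical names, not part of torch.autograd. Computing the gradient of f(x1, x2) = x1·x2 + sin(x1) takes one forward pass per input, with that input "seeded" with derivative 1.

```python
import math


class Dual:
    """Number val + dot*eps with eps**2 = 0; `dot` carries a derivative."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(x):
    # Chain rule for sin: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def f(x1, x2):
    return x1 * x2 + sin(x1)


def forward_mode_gradient(func, point):
    """One forward pass per input: seed input i with dot=1, the others with 0."""
    grads = []
    for i in range(len(point)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(func(*args).dot)
    return grads


print(forward_mode_gradient(f, [2.0, 3.0]))
# [3 + cos(2), 2] ≈ [2.5839, 2.0]  (df/dx1 = x2 + cos(x1), df/dx2 = x1)
```

With n inputs and one scalar output, this loop runs n passes, whereas reverse mode gets all n partial derivatives from one backward pass; that is why training a loss over many parameters uses reverse mode.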
Reverse-mode automatic differentiation
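• Forward pass:
  • Dynamic graph: rebuilt on every forward pass as the operations execute, so ordinary Python control flow can change it between iterations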
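Below is a minimal scalar reverse-mode AD in plain Python, assuming only `+` and `*` are needed. The `Value` class and `model` function are hypothetical and are not PyTorch's API, but they follow the same idea: the graph is recorded while the operations run (so a loop or branch can produce a different graph on each call), and `backward()` applies the chain rule from the output back to the leaf nodes.

```python
class Value:
    """A scalar node in a computational graph recorded during the forward pass."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Topological order: every node appears after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(out)/d(out)
        # Walk from the output back to the leaves, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def model(x, w, n_steps):
    # Plain Python control flow shapes the graph: a different n_steps
    # records a different graph on each call (a dynamic graph).
    h = x
    for _ in range(n_steps):
        h = h * w
    return h + x


x, w = Value(2.0), Value(3.0)
out = model(x, w, n_steps=2)     # out = x * w**2 + x
out.backward()
print(out.data, x.grad, w.grad)  # 20.0 10.0 12.0
```

Processing nodes in reverse topological order guarantees that a node's gradient is complete before it is passed on to its parents; gradients accumulate with `+=` because a leaf such as `x` can feed several operations.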
Using automatic differentiation
• Example: MNIST images with 28 × 28 pixels
• Image flattened → 𝑥 = vector of 784 (= 28 × 28) pixel values
• Output: 𝑦 = softmax(𝑥𝑊 + 𝑏), a vector of 10 class probabilities (digits 0 to 9)
• Parameters:
  • 𝑊: 784 × 10 weight matrix
  • 𝑏: vector of 10 bias parameters
• Goal: find the 𝑊, 𝑏 that minimize the average cross-entropy loss over the training images
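• Forward pass: compute 𝑦 and the loss
• Backward pass: compute the gradients of the loss with respect to 𝑊 and 𝑏
• Gradient step: update 𝑊 and 𝑏 in the direction of the negative gradient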
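The forward pass, backward pass, and gradient step for softmax regression can be sketched in plain Python. To stay runnable without MNIST or PyTorch, this sketch assumes a tiny synthetic dataset (4 inputs and 3 classes instead of 784 and 10) and writes the backward pass by hand with the standard softmax + cross-entropy gradient, dL/dz = (y - onehot(t)) / N. With torch.autograd one would instead create W and b with requires_grad=True and call loss.backward() to fill W.grad and b.grad. All function names here are illustrative.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(X, W, b):
    """Forward pass: y = softmax(xW + b) for each input row x."""
    n_in, n_out = len(W), len(W[0])
    return [softmax([sum(x[i] * W[i][k] for i in range(n_in)) + b[k]
                     for k in range(n_out)])
            for x in X]


def cross_entropy(Y, labels):
    """Average negative log-probability assigned to the correct class."""
    return -sum(math.log(y[t]) for y, t in zip(Y, labels)) / len(labels)


def backward(X, Y, labels):
    """Backward pass: gradients of the average loss w.r.t. W and b.

    Uses dL/dz_k = (y_k - [k == t]) / N, then the chain rule through z = xW + b.
    """
    N, n_in, n_out = len(X), len(X[0]), len(Y[0])
    gW = [[0.0] * n_out for _ in range(n_in)]
    gb = [0.0] * n_out
    for x, y, t in zip(X, Y, labels):
        for k in range(n_out):
            dz = (y[k] - (1.0 if k == t else 0.0)) / N
            gb[k] += dz
            for i in range(n_in):
                gW[i][k] += x[i] * dz
    return gW, gb


def gradient_step(W, b, gW, gb, lr):
    """Gradient step: move every parameter against its gradient (in place)."""
    for i in range(len(W)):
        for k in range(len(b)):
            W[i][k] -= lr * gW[i][k]
    for k in range(len(b)):
        b[k] -= lr * gb[k]


# Tiny synthetic stand-in for MNIST: 4 "pixels" per image, 3 classes.
X = [[1.0, 0.0, 0.0, 0.5],
     [0.0, 1.0, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.5]]
labels = [0, 1, 2]
W = [[0.0] * 3 for _ in range(4)]  # shape (4, 3), playing the role of (784, 10)
b = [0.0] * 3

for step in range(100):
    Y = forward(X, W, b)                 # forward pass
    loss = cross_entropy(Y, labels)
    gW, gb = backward(X, Y, labels)      # backward pass
    gradient_step(W, b, gW, gb, lr=1.0)  # gradient step
    if step % 25 == 0:
        print(step, round(loss, 4))      # step 0 prints log(3) ≈ 1.0986
```

Starting from all-zero parameters every class gets probability 1/3, so the first loss is log(3); the printed loss should then fall as the three steps repeat.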
Key elements
• Datasets
• Defining the model
• Loss function
• Optimization method
• Performance evaluation
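The five key elements map onto a training script roughly as sketched below. This is a hypothetical plain-Python skeleton, not the course's code: to keep it short it uses a toy 1-D linear regression with mean squared error instead of MNIST with cross-entropy, and it computes gradients by hand where PyTorch would use autograd and an optimizer from torch.optim.

```python
import random


def make_dataset(n, seed=0):
    """Dataset: synthetic pairs with y = 2x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return data


def model(params, x):
    """Model: a 1-D linear function w*x + b."""
    w, b = params
    return w * x + b


def mse_loss(params, batch):
    """Loss function: mean squared error over a batch."""
    return sum((model(params, x) - y) ** 2 for x, y in batch) / len(batch)


def gd_step(params, batch, lr):
    """Optimization method: one full-batch gradient-descent step."""
    w, b = params
    n = len(batch)
    residuals = [(model(params, x) - y, x) for x, y in batch]
    gw = sum(2.0 * r * x for r, x in residuals) / n
    gb = sum(2.0 * r for r, _ in residuals) / n
    return (w - lr * gw, b - lr * gb)


data = make_dataset(200)
train, test = data[:160], data[160:]  # hold out 40 points for evaluation
params = (0.0, 0.0)
for _ in range(200):
    params = gd_step(params, train, lr=0.5)

# Performance evaluation: loss on held-out data the optimizer never saw.
print("test MSE:", mse_loss(params, test))
print("w, b:", params)
```

Evaluating on a held-out split, rather than on the training data, is what makes the final number a measure of performance on unseen inputs.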