
GR5245 Fall 2024

Python for Deep Learning


Instructor: Ka Yi Ng

Tensors &
Auto differentiation
Tensors
• Multi-dimensional array
• Slicing / vectorization / broadcasting
• All elements share the same data type

• ‘cuda’ device: available on supported NVIDIA GPUs (see the sketch below)
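
A minimal sketch of these points in PyTorch (the tensor values are illustrative):

    import torch

    # A tensor is a multi-dimensional array; all elements share one dtype.
    x = torch.arange(12, dtype=torch.float32).reshape(3, 4)

    print(x[1, :])                                 # slicing: the second row
    print(x * 2.0)                                 # vectorized elementwise multiply
    print(x + torch.tensor([1.0, 2.0, 3.0, 4.0]))  # broadcasting a (4,) row across (3, 4)

    print(torch.cuda.is_available())               # True only with a supported NVIDIA GPU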


Tensors
• Moving tensors between CPU and GPU
• Tensors are created on the CPU by default
• Defining a default device (see the sketch below)
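
A minimal sketch, assuming PyTorch >= 2.0 for torch.set_default_device:

    import torch

    cpu_t = torch.ones(2, 2)              # tensors are created on the CPU by default
    if torch.cuda.is_available():
        gpu_t = cpu_t.to('cuda')          # copy CPU -> GPU
        back = gpu_t.cpu()                # copy GPU -> CPU
        # Operands must share a device: cpu_t + gpu_t raises a RuntimeError.

    # Defining a default device for newly created tensors:
    torch.set_default_device('cuda' if torch.cuda.is_available() else 'cpu')
    print(torch.zeros(3).device)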


Inside GPU
• CUDA cores
  • General-purpose processing units
  • Floating point and integer operations (e.g. FP32, FP16, FP64, ...)
• Tensor cores
  • Specialized processing units for matrix multiplications
  • Optimized for mixed-precision arithmetic (e.g. FP16 and FP32)
• Streaming Multiprocessor (SM):
  • Contains many CUDA cores / tensor cores
  • Single-instruction, multiple-thread (SIMT) execution model
  • Warp: a group of 32 threads executing the same instruction on different data elements
• DRAM: dynamic RAM for data storage
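
Some of these hardware details can be inspected from PyTorch; a small sketch (device index 0 is assumed):

    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(props.name)                   # GPU model
        print(props.multi_processor_count)  # number of SMs
        print(props.total_memory)           # DRAM size in bytes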
Inside Tensor Core
• “Matrix-multiply-and-accumulate” operation on 4x4 matrices (D = A·B + C)
• Automatic mixed-precision arithmetic
  • FP16 inputs and FP32 output
• Benefits:
  • Lower memory usage
  • Larger neural networks
  • Faster data transfer
  • Faster math calculations
• Math-bound vs. memory-bound operations
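
A minimal sketch of automatic mixed precision via torch.autocast, which lets matmuls take FP16 inputs on tensor cores while accumulating in FP32 (matrix sizes are illustrative):

    import torch

    if torch.cuda.is_available():
        a = torch.randn(256, 256, device='cuda')
        b = torch.randn(256, 256, device='cuda')
        # Matmuls inside the autocast region run with FP16 inputs.
        with torch.autocast(device_type='cuda', dtype=torch.float16):
            c = a @ b
        print(c.dtype)  # torch.float16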
CUDA C++ libraries
• APIs that access GPUs for intensive math calculations
• cuBLAS: linear algebra, matrix multiplication
• cuDNN: deep neural networks (e.g. CNNs)
• Tiled outer-product approach to matrix multiplication
  • Tile sizes: 256x128, 128x256, 128x128, etc.

• General guidelines (illustrated below):
  • Dimensions of input/output matrices: multiples of 8
  • Batch size: a multiple of 8
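
A small illustration of the multiple-of-8 guideline; pad_to_multiple_of_8 is a hypothetical helper, not a library function:

    # Round a dimension up to the next multiple of 8 (hypothetical helper).
    def pad_to_multiple_of_8(n: int) -> int:
        return ((n + 7) // 8) * 8

    print(pad_to_multiple_of_8(1000))  # 1000 (already a multiple of 8)
    print(pad_to_multiple_of_8(1003))  # 1008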
Auto differentiation (AD)
• Evaluates partial derivatives of outputs with respect to inputs
• Tensors act as variables
• torch.autograd package
  • Autograd engine implemented in C++
• Reverse-mode AD: a forward pass followed by a backward pass
  • Forward pass: builds a computational graph of operations
  • Backward pass: computes the gradient at each node, using the chain rule to propagate gradients back to the leaf nodes
• Forward-mode AD vs. reverse-mode AD (see the sketch below)
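
A minimal sketch of reverse-mode AD with torch.autograd:

    import torch

    x = torch.tensor(2.0, requires_grad=True)  # tensor as a variable
    y = x ** 2 + 3 * x                         # forward pass records the graph

    y.backward()                               # backward pass: chain rule to the leaves
    print(x.grad)                              # dy/dx = 2x + 3 = 7.0 at x = 2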
Reverse mode Auto Differentiation
• Forward pass
• Backward pass
• Disabling gradient calculations
• Dynamic graph (see the sketch below)
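
A sketch of the last two points (the tensors are illustrative):

    import torch

    w = torch.randn(3, requires_grad=True)
    x = torch.ones(3)

    # Disabling gradient calculations, e.g. during evaluation:
    with torch.no_grad():
        y = (w * x).sum()       # no graph is recorded here
    print(y.requires_grad)      # False

    # Dynamic graph: rebuilt on every forward pass, so ordinary Python
    # control flow can change the graph between iterations.
    z = (w * x).sum() if x.sum() > 0 else (w - x).sum()
    z.backward()
    print(w.grad)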
Using auto differentiation
• Example: MNIST images with 28 x 28 pixels
• Image → 𝑥 = vector of 784 values
• Output: 𝑦 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑥𝑊 + 𝑏)
• Parameters:
• 𝑊: (784,10) weight parameters
• 𝑏: vector of 10 bias parameters
• Goal: find the optimal 𝑊, 𝑏 that minimize the average cross-entropy loss
• Forward pass
• Backward pass
• Gradient step
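
A minimal sketch of one such training step; the batch is random data standing in for real MNIST images, and F.cross_entropy applies the softmax internally:

    import torch
    import torch.nn.functional as F

    x = torch.randn(64, 784)              # stand-in batch of flattened images
    labels = torch.randint(0, 10, (64,))  # stand-in labels in 0..9

    W = torch.zeros(784, 10, requires_grad=True)  # weight parameters
    b = torch.zeros(10, requires_grad=True)       # bias parameters
    lr = 0.1

    # Forward pass: compute the average cross-entropy loss.
    loss = F.cross_entropy(x @ W + b, labels)

    # Backward pass: fills W.grad and b.grad via reverse-mode AD.
    loss.backward()

    # Gradient step: update the parameters with gradient tracking disabled.
    with torch.no_grad():
        W -= lr * W.grad
        b -= lr * b.grad
        W.grad.zero_()
        b.grad.zero_()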
Key elements
• Datasets
• Defining model
• Loss function
• Optimization method
• Performance evaluation
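
A sketch tying these key elements together, again with random data standing in for a real dataset:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Dataset (random stand-in for MNIST).
    data = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))
    loader = DataLoader(data, batch_size=64, shuffle=True)

    model = nn.Linear(784, 10)                         # defining the model
    loss_fn = nn.CrossEntropyLoss()                    # loss function
    opt = torch.optim.SGD(model.parameters(), lr=0.1)  # optimization method

    for epoch in range(2):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

    # Performance evaluation: accuracy with gradients disabled.
    with torch.no_grad():
        xb, yb = data.tensors
        acc = (model(xb).argmax(dim=1) == yb).float().mean()
        print(acc.item())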
