
INTRODUCTION TO DEEP LEARNING (IT3320E)

5 – Hardware and Software for Deep Learning

Hung Son Nguyen

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY


SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

October 18, 2023


Agenda

1 HARDWARE FOR DEEP LEARNING (CPU, GPU AND TPU)


CPU, GPU, TPU comparison

2 DEEP LEARNING FRAMEWORKS

3 ACCELERATOR AND COMPRESSION TOOLS

Section 1

Hardware for deep learning (CPU, GPU and TPU)


Introduction

WHAT ARE THE DIFFERENCES BETWEEN CPU AND GPU?


CPU (central processing unit) is a generalized processor that is designed to
carry out a wide variety of tasks.
GPU (graphics processing unit) is a specialized processing unit with
enhanced mathematical computation capability, ideal for computer graphics
and machine-learning tasks.
TPUs (Tensor Processing Units) are application-specific integrated circuits
(ASICs). TPUs were designed from the ground up by Google; they started
using TPUs in 2015 and made them publicly available in 2018.

Central Processing Unit (CPU)
CPU is the heart of any and every computer.
The CPU handles the core processing tasks in a
computer — the literal computation that drives
every single action in a computer system.
Computers work through the processing of binary
data, or ones and zeroes.
To translate that information into the software, graphics, animations, and
every other process executed on a computer, those ones and zeroes must
work through the logical structure of the CPU.
That includes the basic arithmetic, logical functions (AND, OR, NOT) and
input and output operations. The CPU is the brain, taking information,
calculating it, and moving it where it needs to go.
Within every CPU, there are a few standard components, which include the
following: Core, Cache, Memory Management Unit (MMU), CPU Clock and
Control Unit.
Components of CPU: Core
A core, or CPU core, is the “brain” of a CPU. It receives instructions, and
performs calculations, or operations, to satisfy those instructions. A CPU can
have multiple cores.
A core typically functions through what is called the “instruction cycle,”
where instructions are pulled from memory (fetch), decoded into processing
language (decode), and executed through the logical gates of the core
(execute).
Initially, all CPUs were single-core, but with the
proliferation of multi-core CPUs, we’ve seen an
increase in processing power.
As of 2019, the majority of consumer CPUs feature
between two and twelve cores. Workstation and
server CPUs may feature as many as 48 cores.
Cache
Cache is a small amount of memory which is a part of the CPU – closer to the
CPU than RAM. It is used to temporarily hold instructions and data that the CPU
is likely to reuse.

Since CPUs work so fast to complete millions of calculations per second,
they require ultra-fast (and expensive) memory to do it – memory that is
much faster than hard drive storage or even the fastest RAM.
In any CPU configuration, you will see some L1, L2,
and/or L3 cache arrangement, with L1 being the fastest
and L3 the slowest.
The CPU will store the most immediately needed information in L1, and as
the data loses priority, it will move out into L2, then L3, and then out to
RAM or the hard disk.
Other components

Memory Management Unit (MMU): The MMU controls data movement
between the CPU and RAM during the instruction cycle.
CPU Clock and Control Unit: Every CPU works on synchronizing processing
tasks through a clock. The CPU clock determines the frequency at which the
CPU can generate electrical pulses, its primary way of processing and
transmitting data, and how rapidly the CPU can work. So, the higher the CPU
clock rate, the faster it will run and the quicker processor-intensive tasks can
be completed.

CPU

All these components work together to provide an environment where
high-speed task switching can take place.
As the CPU clock drives activities, the CPU cores switch rapidly between
hundreds of different tasks per second. That’s why your computer can run
multiple programs, display a desktop, connect to the internet, and more all
at the same time.
The CPU is responsible for all activity on a computer.
When you close or open programs, the CPU must send the correct instructions
to pull information from the hard drive and run executable code from RAM.
When playing a game, the CPU handles processing graphical information to
display on the screen.
When compiling code, the CPU handles all the computation and mathematics
involved.

Graphics Processing Unit (GPU)
One of these tasks, graphical processing, is generally considered one of the
more complex processing tasks for the CPU. Solving that complexity has led
to technology with applications far beyond graphics.
The challenge in processing graphics is that graphics call on complex
mathematics to render, and those complex mathematics must compute in
parallel to work correctly.

For example:
A graphically intense video game might contain
hundreds or thousands of polygons on the screen
at any given time, each with its individual
movement, color, lighting, and so on.
CPUs aren’t made to handle that kind of workload.
That’s where graphics processing units (GPUs)
come into play.
Graphics Processing Unit (GPU)

GPUs are similar in function to CPUs: they contain cores, memory, and other
components.
Instead of emphasizing context switching to manage multiple tasks, GPU
acceleration emphasizes parallel data processing through a large number of
cores.
These cores are usually less powerful individually than the core of a CPU.
GPUs also typically have less interoperability with different hardware APIs
and house less memory than a CPU.
Where they shine is pushing large amounts of processed data in parallel.
Instead of switching through multiple tasks to process graphics, a GPU
simply takes batch instructions and pushes them out at high volume to
speed processing and display.

TPU: Tensor Processing Unit
TPU stands for tensor processing unit; it is a processor architecture designed
specifically for deep learning and machine learning applications.
Invented by Google, TPUs are application-specific integrated circuits (ASICs)
designed specifically to handle the computational demands of machine
learning and accelerate AI calculations and algorithms.
Google began using TPUs internally in 2015, and in 2018 they made them
publicly available to others.
When Google designed the TPU, they created a domain-specific architecture.
What that means is that instead of designing a general-purpose processor
like a GPU or CPU, Google designed it as a matrix processor that was
specialized for neural network workloads.
By designing the TPU as a matrix processor instead of a general purpose
processor, Google solved the memory access problem that slows down GPUs
and CPUs and requires them to use more processing power.
How Do TPUs Work?

Here’s how a TPU works:
The TPU loads the parameters from memory into the matrix of multipliers
and adders.
The TPU loads the data from memory.
As each multiplication is executed, its result is passed on to the next
multiplier while a running summation is accumulated.
The output of these steps is the summation of all the multiplication results
between the data and the parameters.
No memory access at all is required throughout this entire sequence of
calculations and data passing (a rough sketch of the dataflow follows).
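To make this dataflow concrete, here is a rough NumPy sketch (illustrative only; a real TPU implements this as a hardware systolic array, not Python loops, and the function name and shapes below are made up for this example): the weights stay resident in the matrix unit, input data streams through it, and every product feeds an on-chip accumulator instead of being written back to memory.

import numpy as np

def systolic_matmul(weights: np.ndarray, activations: np.ndarray) -> np.ndarray:
    # Illustrative simulation of a TPU-style matrix unit: weights stay resident
    # while activations stream through, and partial sums accumulate on-chip.
    rows, cols = weights.shape
    out = np.zeros((rows, activations.shape[1]), dtype=np.float32)
    for j in range(activations.shape[1]):        # stream one input column at a time
        acc = np.zeros(rows, dtype=np.float32)   # on-chip accumulators
        for k in range(cols):                    # each product flows into the running sum
            acc += weights[:, k] * activations[k, j]
        out[:, j] = acc                          # only the final result leaves the array
    return out

W = np.random.randn(4, 3).astype(np.float32)     # parameters loaded into the matrix unit
X = np.random.randn(3, 2).astype(np.float32)     # data loaded from memory
assert np.allclose(systolic_matmul(W, X), W @ X, atol=1e-5)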

Understanding Matrix Multiplication with TPU

TPU: pros and cons

TPUs are extremely valuable and bring a lot to the table. Their only real
downside is that they are more expensive than GPUs and CPUs, but their
advantages far outweigh the high price tag.
TPUs are a great choice for those who want to:
Accelerate machine learning applications
Scale applications quickly
Cost-effectively manage machine learning workloads
Start with well-optimized, open source reference models

Agenda
1 HARDWARE FOR DEEP LEARNING (CPU, GPU AND TPU)
CPU, GPU, TPU comparison
2 DEEP LEARNING FRAMEWORKS
3 ACCELERATOR AND COMPRESSION TOOLS

Advantages of CPU architecture

Flexibility: CPUs are flexible and resilient and can handle a variety of tasks
outside of graphics processing. Because of their serial processing
capabilities, the CPU can multitask across multiple activities in your
computer. Because of this, a strong CPU can provide more speed for typical
computer use than a GPU.
Contextual Power: In specific situations, the CPU will outperform the GPU.
For example, the CPU is significantly faster when handling several different
types of system operations (random access memory, mid-range
computational operations, managing an operating system, I/O operations).
Precision: CPUs can work on mid-range mathematical equations with a
higher level of precision. CPUs can handle the computational depth and
complexity more readily, becoming increasingly crucial for specific
applications.

Advantages of CPU architecture (cont.)

Access to Memory: CPUs usually contain significant local cache memory,
which means they can handle a larger
set of linear instructions and, hence,
more complex system and
computational operations.
Cost and Availability: CPUs are more
readily available, more widely
manufactured, and cost-effective for
consumer and enterprise use.
Additionally, hardware manufacturers
still create thousands of motherboard
designs to house a wide range of CPUs.
Disadvantage of CPU

Parallel Processing: CPUs cannot handle parallel processing the way a GPU
can, so large tasks that require thousands or millions of identical operations
will choke a CPU’s capacity to process data.
Slow Evolution: As Moore’s Law reaches its limits, the development of more
powerful CPUs will eventually slow, which means less improvement year after
year. The expansion of multi-core CPUs has mitigated this somewhat.
Compatibility: Not every system or software is compatible with every
processor. For example, applications written for x86 Intel Processors will not
run on ARM processors. This is less of a problem as more computer
manufacturers use standard processor sets (see Apple’s move to Intel
processors), but it still presents issues between PCs and mobile devices.

Advantages of GPU

Some of the advantages of a GPU include the following:

High Data Throughput: a GPU consists of hundreds of cores performing the
same operation on multiple data items in parallel. Because of that, a GPU
can push vast volumes of processed data through a workload, speeding up
specific tasks beyond what a CPU can handle.
Massive Parallel Computing: Whereas CPUs excel in more complex
computations, GPUs excel in extensive calculations with numerous similar
operations, such as computing matrices or modeling complex systems.

These two advantages were the main reasons GPUs were created because both
contribute to complex graphics processing. However, the GPU structure quickly
led developers and engineers to apply GPU technology to other
high-performance applications:

High-performance Application of GPU
Bitcoin Mining: Mining bitcoins involves using computational power to solve
complex cryptographic hashes. The growth of Bitcoin and the increasing
difficulty of mining have led miners to use GPUs to handle the immense
volume of cryptographic data in the hope of earning bitcoins.
Machine Learning: Neural networks, particularly those used for
deep-learning algorithms, function through the ability to process large
amounts of training data through smaller nodes of operations. GPUs for
machine learning have emerged to help process the enormous data sets
used to train machine-learning algorithms and AI.
Analytics and Data Science: GPUs are uniquely suited to help analytics
programs process large amounts of base data from different sources.
Furthermore, these same GPUs can power the computation necessary for
deep data sets associated with research areas like life sciences (genomic
sequencing).
Disadvantages of GPU

Multitasking: GPUs aren’t built for multitasking, so they don’t have much
impact in areas like general-purpose computing.
Cost: While the price of GPUs has fallen somewhat over the years, they are
still significantly more expensive than CPUs. This cost rises more when
talking about a GPU built for specific tasks like mining or analytics.
Power and Complexity: While GPUs can handle large amounts of parallel
computing and data throughput, they struggle when the processing
requirements become more chaotic. Branching logic paths, sequential
operations, and other approaches to computing impede the effectiveness of
a GPU.

Hardware comparison

When To Use CPU, GPU, Or TPU
To Run Your Machine Learning Models?

CPUs are general-purpose processors, while GPUs and TPUs are specialized
accelerators for machine learning workloads.
CPUs:

Prototypes that require the highest flexibility


Training simple models that do not require a long time
Training small models with small effective batch sizes
Models dominated by custom TensorFlow operations written in C++
Models limited by available I/O or by the networking bandwidth of the host system

When To Use CPU, GPU, Or TPU
To Run Your Machine Learning Models?
GPUs:
Models whose source does not exist or is too onerous to change
Models with numerous custom TensorFlow operations that a GPU must
support
Models that are not available on Cloud TPU
Medium or larger size models with bigger effective batch sizes
TPUs:
Training models using mostly matrix computations
Training models without custom TensorFlow operations inside the main
training loop
Training models that require weeks or months to complete
Training huge models with very large effective batch sizes
Section 2

Deep learning frameworks


PyTorch, TensorFlow and Keras

https://github.com/397090770/spark-ai-summit-europe-2018-10/blob/master/ppt/a-tale-of-three-deep-learning-frameworks-tensorflow-keras-pytorch.pdf

What is Keras?

Keras is an effective high-level neural network Application Programming
Interface (API) written in Python. This open-source neural network library is
designed to provide fast experimentation with deep neural networks, and it
can run on top of CNTK, TensorFlow, and Theano.
Keras focuses on being modular, user-friendly, and extensible. It doesn’t
handle low-level computations; instead, it hands them off to another library
called the Backend.
Keras was adopted and integrated into TensorFlow in mid-2017. Users can
access it via the tf.keras module. However, the Keras library can still operate
separately and independently.
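As a minimal sketch of this high-level style (assuming TensorFlow 2.x with the bundled tf.keras; the layer sizes and dataset are arbitrary placeholders), a small classifier can be defined, compiled, and trained in a few lines while the backend handles the low-level computation:

import tensorflow as tf
from tensorflow import keras

# A small fully connected classifier for 28x28 grayscale images.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# The user only chooses the optimizer, loss, and metrics; Keras delegates
# the actual tensor computation to the backend.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, epochs=5)  # assumes training data is available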

What is PyTorch?

PyTorch is a relatively new deep learning framework based on Torch.
Developed by Facebook’s AI research group and open-sourced on GitHub in
2017, it is used for applications such as natural language processing.
PyTorch has a reputation for simplicity, ease of use, flexibility, efficient
memory usage, and dynamic computational graphs.
It also feels native, making coding more manageable and increasing
processing speed.
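A minimal sketch of the PyTorch style (illustrative; the network and hyperparameters are arbitrary): a model is an ordinary Python class, and the computational graph is built dynamically on every forward pass, which is part of what makes debugging feel native.

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # The graph is (re)built dynamically here on every call.
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)            # a dummy batch of inputs
y = torch.randint(0, 10, (32,))     # dummy labels
loss = criterion(model(x), y)
loss.backward()                     # autograd traces the dynamic graph
optimizer.step()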

What is TensorFlow?
TensorFlow is an end-to-end open-source deep learning framework
developed by Google and released in 2015. It is known for documentation
and training support, scalable production and deployment options, multiple
abstraction levels, and support for different platforms, such as Android.
TensorFlow is a symbolic math library used for neural networks and is best
suited for dataflow programming across a range of tasks. It offers multiple
abstraction levels for building and training models.
A promising and fast-growing entry in the world of deep learning,
TensorFlow offers a flexible, comprehensive ecosystem of community
resources, libraries, and tools that facilitate building and deploying machine
learning apps. Also, as mentioned before, TensorFlow has adopted Keras,
which makes comparing the two seem problematic. Nevertheless, we will
still compare the two frameworks for the sake of completeness, especially
since Keras users don’t necessarily have to use TensorFlow.
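To make the phrase “multiple abstraction levels” concrete, here is a hedged sketch of a training step written one level below Keras, using tf.GradientTape for automatic differentiation (the model and data are arbitrary placeholders, not taken from the lecture):

import tensorflow as tf

# A linear model built directly from tf.Variable, below the Keras layer API.
W = tf.Variable(tf.random.normal([3, 1]))
b = tf.Variable(tf.zeros([1]))

x = tf.random.normal([8, 3])     # dummy batch
y = tf.random.normal([8, 1])     # dummy targets

with tf.GradientTape() as tape:
    pred = tf.matmul(x, W) + b
    loss = tf.reduce_mean(tf.square(pred - y))

# The tape records the forward pass, so gradients and the update rule
# are fully under the user's control.
grads = tape.gradient(loss, [W, b])
W.assign_sub(0.1 * grads[0])
b.assign_sub(0.1 * grads[1])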
PyTorch vs TensorFlow

Both TensorFlow and PyTorch offer useful abstractions that ease the
development of models by reducing boilerplate code. They differ because
PyTorch has a more “pythonic” approach and is object-oriented, while
TensorFlow offers a variety of options.
PyTorch is used for many deep learning projects today, and its popularity is
increasing among AI researchers, although of the three main frameworks, it
is the least popular. Trends show that this may change soon.
When researchers want flexibility, debugging capabilities, and short training
duration, they choose PyTorch. It runs on Linux, macOS, and Windows.

Thanks to its well-documented framework and abundance of trained models
and tutorials, TensorFlow is the favorite tool of many industry professionals
and researchers. TensorFlow offers better visualization, which allows
developers to debug better and track the training process. PyTorch, however,
provides only limited visualization.
TensorFlow also beats PyTorch in deploying trained models to production,
thanks to the TensorFlow Serving framework. PyTorch offers no such
framework, so developers need to use Django or Flask as a back-end server.
In the area of data parallelism, PyTorch gains optimal performance by
relying on native support for asynchronous execution through Python.
However, with TensorFlow, you must manually code and optimize every
operation run on a specific device to allow distributed training. In summary,
you can replicate everything from PyTorch in TensorFlow; you just need to
work harder at it.
If you’re just starting to explore deep learning, you should learn PyTorch first
due to its popularity in the research community. However, if you’re familiar
with machine learning and deep learning and focused on getting a job in the
industry as soon as possible, learn TensorFlow first.
PyTorch vs Keras
Both of these choices are good if you’re just starting to work with deep
learning frameworks. Mathematicians and experienced researchers will find
PyTorch more to their liking. Keras is better suited for developers who want
a plug-and-play framework that lets them build, train, and evaluate their
models quickly. Keras also offers more deployment options and easier
model export.
However, remember that PyTorch is faster than Keras and has better
debugging capabilities.
Both platforms enjoy sufficient levels of popularity that they offer plenty of
learning resources. Keras has excellent access to reusable code and
tutorials, while PyTorch has outstanding community support and active
development.
Keras is the best when working with small datasets, rapid prototyping, and
multiple back-end support. It’s the most popular framework thanks to its
comparative simplicity. It runs on Linux, macOS, and Windows.
TensorFlow vs Keras
TensorFlow is an open-sourced end-to-end platform, a library for multiple
machine learning tasks, while Keras is a high-level neural network library
that runs on top of TensorFlow. Both provide high-level APIs used for easily
building and training models, but Keras is more user-friendly because it’s
built in Python.
Researchers turn to TensorFlow when working with large datasets and object
detection and need excellent functionality and high performance.
TensorFlow runs on Linux, macOS, Windows, and Android. The framework
was developed by Google Brain and is currently used for Google’s research
and production needs.
The reader should bear in mind that comparing TensorFlow and Keras isn’t
the best way to approach the question, since Keras functions as a wrapper
around TensorFlow’s framework. Thus, you can define a model with Keras’
interface, which is easier to use, and then drop down into TensorFlow when
you need a feature that Keras doesn’t have or want more specific low-level
control.
Which is Better PyTorch or TensorFlow or Keras?

                               Keras                       PyTorch                   TensorFlow
API Level                      High                        Low                       High and Low
Architecture                   Simple, concise, readable   Complex, less readable    Not easy to use
Datasets                       Smaller datasets            Large datasets,           Large datasets,
                                                           high performance          high performance
Debugging                      Simple networks, so         Good debugging            Difficult to conduct
                               debugging is rarely needed  capabilities              debugging
Does It Have Trained Models?   Yes                         Yes                       Yes
Popularity                     Most popular                Third most popular        Second most popular
Speed                          Slow, low performance       Fast, high performance    Fast, high performance
Written In                     Python                      Python, C++, CUDA         C++, CUDA, Python
Section 3

Accelerator and compression tools


Introduction

DNNs are powerful but consume considerable storage and memory bandwidth.
To reduce the storage and energy required to run inference, so that models can
be deployed on mobile devices, the authors proposed “deep compression”: a
three-stage pipeline that reduces the required storage while preserving the
original accuracy. The three stages are as follows:
1 Prune the network by removing redundant connections, keeping only
the most informative connections
2 Quantize the weights so that multiple connections share the same weight,
thus only the codebook and the indices need to be stored
3 Apply Huffman coding to take advantage of the biased distribution of
effective weights

General presentation of techniques

Deep Compression: exemplary pipeline

Network Pruning
The detailed pruning procedure is described in the original paper. The pruned
structure is stored using the compressed sparse column (CSC) or compressed
sparse row (CSR) format. To compress further, the index differences are stored
instead of absolute indices, with zero padding inserted whenever a difference
exceeds the representable range.
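A rough sketch of these two ideas in NumPy/SciPy (illustrative only, not the paper's implementation; the function names and the index bound are placeholders): weights below a magnitude threshold are removed, the survivors are stored in CSR format, and column indices are encoded as differences with filler entries when a gap exceeds the representable range.

import numpy as np
from scipy import sparse

def magnitude_prune(weights: np.ndarray, threshold: float) -> np.ndarray:
    # Zero out (remove) connections whose magnitude is below the threshold.
    return weights * (np.abs(weights) >= threshold)

W = np.random.randn(256, 256).astype(np.float32)
W_pruned = magnitude_prune(W, threshold=1.0)
W_csr = sparse.csr_matrix(W_pruned)              # store only surviving connections
print(W_csr.nnz, "non-zero weights out of", W.size)

def relative_indices(col_indices, max_diff=255):
    # Toy version of the index-difference trick: store differences instead of
    # absolute positions, inserting filler entries when a gap exceeds max_diff.
    out, prev = [], 0
    for idx in col_indices:
        diff = idx - prev
        while diff > max_diff:
            out.append(max_diff)                 # zero-padding filler entry
            diff -= max_diff
        out.append(diff)
        prev = idx
    return out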

Quantization and Weight Sharing

Quantization and Weight Sharing
Multiple connections are made to share the same weight, and those shared
weights are then fine-tuned to save storage. Refer to the figure on the previous
slide for a detailed illustration. The weights are first quantized into k bins, so
only a small index needs to be stored for each weight. During the update, the
gradients of all weights in the same bin are summed, multiplied by the learning
rate, and subtracted from the shared centroids of the previous iteration. The
compression rate is

r = \frac{nb}{n \log_2(k) + kb}

where n is the number of weights, b the number of bits per original weight, and
k the number of clusters.
k-means clustering is used to identify the shared weights for each layer of a
trained network; weights are not shared across layers. The n weights
W = \{w_1, \ldots, w_n\} are partitioned into k clusters
C = \{c_1, c_2, \ldots, c_k\}, with n \gg k, by minimizing the within-cluster
sum of squares (WCSS):

\arg\min_{C} \sum_{i=1}^{k} \sum_{w \in c_i} |w - c_i|^2
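A hedged sketch of per-layer weight sharing with k-means (using scikit-learn for the clustering; the function names and sizes are illustrative, not the paper's code): each layer's weights are replaced by log2(k)-bit indices into a codebook of k shared centroids.

import numpy as np
from sklearn.cluster import KMeans

def share_weights(weights: np.ndarray, k: int = 16):
    # Cluster one layer's weights into k bins; store only the codebook
    # (k centroids) plus a small index per weight.
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10).fit(flat)
    codebook = km.cluster_centers_.ravel()       # the shared weights
    indices = km.labels_.astype(np.uint8)        # 4-bit indices when k = 16
    return codebook, indices

W = np.random.randn(64, 64).astype(np.float32)
codebook, idx = share_weights(W, k=16)
W_shared = codebook[idx].reshape(W.shape)        # reconstructed (quantized) layer

# Compression rate r = nb / (n*log2(k) + kb) with n = 4096, b = 32, k = 16:
n, b, k = W.size, 32, 16
print(n * b / (n * np.log2(k) + k * b))          # about 7.8x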
Huffman Coding

The probability distribution of the quantized weights and of the sparse-matrix
indices is biased. Huffman coding these non-uniformly distributed values saves
20% to 30% of network storage.
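A minimal, standard-library sketch of the idea (illustrative; the helper name and the toy index stream are made up): the most frequent quantization indices receive the shortest codes, which is where the saving comes from.

import heapq
from collections import Counter

def huffman_code(symbols):
    # Build a Huffman code for an iterable of symbols (e.g. quantization indices).
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # merge the two least frequent groups
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

indices = [0] * 50 + [1] * 25 + [2] * 15 + [3] * 10   # biased distribution
code = huffman_code(indices)
bits = sum(len(code[s]) for s in indices)
print(code, bits, "bits vs", 2 * len(indices), "bits with fixed 2-bit codes")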
Example of Huffman Coding

The End
Thank You!

