Lecture 2

CSC 391/691: GPU Programming Fall 2011

Introduction to GPUs for HPC

Copyright © 2011 Samuel S. Cho


High Performance Computing

• Speed: Many problems that are interesting to scientists and engineers would take a long time to execute on a PC or laptop: months, years, “never”.

• Size: Many problems that are interesting to scientists and engineers can’t fit on a PC or laptop with a few GB of RAM or a few hundred GB of disk space.

• Supercomputers or clusters of computers can make these problems numerically solvable in practice.
Scientific and Engineering Problems

• Simulations of physical phenomena such as:
• Weather forecasting
• Earthquake forecasting
• Galaxy formation
• Oil reservoir management
• Molecular dynamics

• Data mining: finding needles of critical information in a haystack of data, such as:
• Bioinformatics
• Signal processing
• Detecting storms that might turn into hurricanes

• Visualization: turning a vast sea of data into pictures that scientists can understand.

• At its most basic level, all of these problems involve many, many floating point operations.
Hardware Accelerators

• In HPC, an accelerator is a hardware component whose role is to speed up some aspect of the computing workload.

• In the olden days (1980s), supercomputers sometimes had array processors, which did vector operations on arrays.

• PCs sometimes had floating point accelerators: little chips that did the floating point calculations in hardware rather than software.

(Slide images, each captioned “Not an accelerator*” — *Okay, I lied.)
To Accelerate Or Not To Accelerate

• Pro:
• They make your code
run faster.
• Cons:
• They’re expensive.
• They’re hard to program.
• Your code may not be
cross-platform.
Why GPU for HPC?

• Graphics Processing Units (GPUs) were originally designed to accelerate graphics tasks like image rendering.

• They became very popular with video gamers, because they’ve produced better and better images, lightning fast.

• And prices have been extremely good, ranging from three figures at the low end to four figures at the high end.

• Chips are expensive to design (hundreds of millions of $$$) and expensive to build the factory for (billions of $$$), but cheap to produce.

• For example, in 2006–2007, GPUs sold at a rate of about 80 million cards per year, generating about $20 billion per year in revenue.

• This means that the GPU companies have been able to recoup their huge fixed costs.

• Remember: GPUs mostly do stuff like rendering images. This is done mostly through floating point arithmetic — the same stuff people use supercomputing for!
What are GPUs?

• GPUs have developed from graphics cards into a platform for high performance computing (HPC) — perhaps the most important development in HPC for many years.

• Co-processors — a very old idea that appeared in the 1970s and 1980s with floating point co-processors attached to microprocessors that did not then have floating point capability.

• These co-processors simply executed floating point instructions that were fetched from memory.

• Around the same time, there was interest in providing hardware support for displays, especially with the increasing use of graphics and PC games.

• This led to graphics processing units (GPUs) attached to the CPU to create the video display.

(Figure: early design — CPU and memory, with a graphics card driving the display.)
Modern GPU Design

• By the late 1990s, graphics chips needed to support 3-D graphics, especially for games and graphics APIs such as DirectX and OpenGL.

• Graphics chips generally had a pipeline structure, with individual stages performing specialized operations, finally leading to loading the frame buffer for display.

• Individual stages may have access to graphics memory for storing intermediate computed data.

(Figure: the graphics pipeline — input stage, vertex shader stage, geometry shader stage, rasterizer stage, pixel shading stage, frame buffer; each stage can access graphics memory.)
General Purpose GPU (GPGPU) Designs

• High performance pipelines call for high-speed (IEEE) floating point operations.

• Using graphics hardware for non-graphics computation is known as GPGPU (general-purpose computing on graphics processing units) — difficult to do with specialized graphics pipelines, but possible.

• By the mid-2000s, it was recognized that the individual stages of the graphics pipeline could be implemented by a more general purpose processor core (although with a data-parallel paradigm).

• 2006 — first GPU for general high performance computing as well as graphics processing: the NVIDIA G80 chip / GeForce 8800 card.

• Unified processors that could perform vertex, geometry, pixel, and general computing operations.

• Could now write programs in C rather than graphics APIs.

• Single-instruction multiple-thread (SIMT) programming model.


NVIDIA Tesla Platform

• The NVIDIA Tesla series was their first platform for the high performance computing market.

• Named for Nikola Tesla, a pioneering mechanical and electrical engineer and inventor.

(Images: GeForce GTX 480 and Tesla C2070 cards.)


NVIDIA GTX 480 Specs

• 3 billion transistors

• 480 compute cores

• 1.401 GHz shader clock

• Single precision floating point performance: 1.35 TFLOPs (2 single precision flops per clock per core)

• Double precision floating point performance: 168 GFLOPs (capped at 1/8 of the single precision rate on GeForce cards)

• Internal RAM: 1.5 GB GDDR5 VRAM

• Internal RAM speed: 177.4 GB/sec (compared to 21–25 GB/sec for regular RAM)

• PCIe slot (at most 8 GB/sec per GPU card)

• 250 W thermal design power


Coming: Kepler and Maxwell

• NVIDIA’s 20-series is also known by the codename “Fermi.” It runs at about 0.5 TFLOPs per GPU card (peak).

• The next generation, to be released in 2011, is codenamed “Kepler” and will be capable of something like 1.4 TFLOPs double precision per GPU card.

• After “Kepler” will come “Maxwell” in 2013, capable of something like 4 TFLOPs double precision per GPU card.

• So the increase in performance is likely to be roughly 2.5x – 3x per generation, roughly every two years.
Maryland CPU/GPU Cluster Infrastructure

http://www.umiacs.umd.edu/research/GPU/facilities.html
Intel’s Response to NVIDIA GPUs
Does it work?

Example Application                  URL                                             Speedup

Seismic Database                     http://www.headwave.com                          66x – 100x
Mobile Phone Antenna Simulation      http://www.accelware.com                         45x
Molecular Dynamics                   http://www.ks.uiuc.edu/Research/vmd              21x – 100x
Neuron Simulation                    http://www.evolvedmachines.com                   100x
MRI Processing                       http://bic-test.beckman.uiuc.edu                 245x – 415x
Atmospheric Cloud Simulation         http://www.cs.clemson.edu/~jesteel/clouds.html   50x

• These look like remarkable speedups compared to traditional CPU-based HPC approaches.
