
Lecture 3: Fundamentals of CUDA (Part 1), 2025

The document provides an overview of CUDA (Compute Unified Device Architecture), a parallel computing platform and API model designed for managing computations on GPUs. It discusses the architecture of the graphics pipeline, memory management, and error checking in CUDA programming. Key features include the ability to program GPUs in C, a scalable thread execution model, and the separation of host and device memory.


SEE3001 Parallel Computer Architecture and Programming

Fundamentals of CUDA 1

Prof. Seokin Hong

Slide Credit: Slides are modified from Prof. Baek’s slides


Agenda
▪ History
▪ What is CUDA?
▪ Device Global Memory and Data Transfer
▪ Error Checking
▪ A Vector Addition Kernel
▪ Kernel Functions and Threading
▪ Kernel Launch

2
Agenda

History

3
3D Graphics Pipeline

4
3D Graphics Pipeline

[Figure: the fixed-function 3D graphics pipeline]
Host (main board): API, Vertex Buffer Objects
GPU (graphics card): Primitive Processing → (vertices) → Transform and Lighting → Primitive Assembly → (triangles/lines/points) → Rasterizer → Texture Environment → Color Sum → Fog → Alpha Test → Depth/Stencil Test → Color Buffer Blend → Dither → Frame Buffer
5
3D Graphics Pipeline

[Figure: the same pipeline, annotated. User input enters through the API on the host; the output of Transform and Lighting is vertex-shaded geometry.]

6
3D Graphics Pipeline

[Figure: the same pipeline, annotated. Vertex-shaded primitives are assembled and passed to the Rasterizer, whose output is rasterized fragments.]

7
3D Graphics Pipeline

[Figure: the same pipeline, annotated. Rasterized fragments pass through the texture, color-sum, and fog stages to produce pixel-shaded output for the frame buffer.]

8
The Graphics Pipeline – 1st Gen.
▪ One chip/board per stage
▪ Fixed data flow through pipeline

[Figure: the same fixed-function pipeline; in the first generation, each stage was a separate chip or board.]

9
The Graphics Pipeline – 2nd Gen.
▪ Everything fixed function, with a certain number of modes
▪ Number of modes for each stage grew over time
▪ Hard to optimize HW
▪ Developers always wanted more flexibility

[Figure: the same fixed-function pipeline, now implemented with configurable modes per stage.]

10
The Graphics Pipeline – 3rd Gen.
▪ Vertex & pixel processing became programmable
▪ GPU architecture increasingly centers around 'shader' execution

[Figure: the programmable pipeline. A Vertex Shader replaces Transform and Lighting, and a Pixel Shader replaces the texture/color stages: API → Primitive Processing → Vertex Shader → Primitive Assembly → Rasterizer → Pixel Shader → Depth/Stencil Test → Color Buffer Blend → Dither → Frame Buffer]

11
Before CUDA

▪ Use the GPU for general-purpose computing by casting problems as graphics
o Turn data into images ("texture maps")
o Turn algorithms into image synthesis ("rendering passes")

▪ Drawback:
o Tough learning curve
o Potentially high overhead of graphics API
o Highly constrained memory layout & access model
Before CUDA
▪ What's wrong with the old GPGPU programming model 1

[Figure: limitations of the pixel-shader (program) model]
o APIs are specific to graphics
o Limited texture size and dimension
o Limited instruction set
o Limited local storage
o No thread communication
o Limited shader outputs

Before CUDA
▪ What's wrong with the old GPGPU programming model 2
Agenda

What is CUDA?

15
What is CUDA?
▪ What is CUDA?: Compute Unified Device Architecture
o Parallel computing platform and application programming interface (API) model
o A powerful parallel programming model for issuing and managing computations
on the GPU without mapping them to a graphics API
▪ Targeted Software stack
o Library, Runtime, Driver
▪ Advantages
o SW: program the GPU in C
• Scalable data parallel execution/memory model
• C with minimal yet powerful extensions
o HW: fully general data-parallel architecture
▪ Features
o Heterogeneous - mixed serial-parallel programming
o Scalable - hierarchical thread execution model
o Accessible - minimal but expressive changes to C
What is CUDA?

[Figure: Pixel Shader (program)]
Review : Heterogeneous Computing
▪ Use more than one kind of processor or core
o CPUs for sequential parts
o GPUs for parallel parts
[Figure: a CPU with its own main memory alongside a GPU with its own video memory.]
18
Simple CUDA Model
▪ Host : CPU + main memory (host memory)
▪ Device : GPU + video memory (device memory)

[Figure: Host = CPU + main memory; Device = GPU + video memory.]
Simple CUDA Model
▪ GNU gcc: Linux C compiler
▪ nvcc: NVIDIA CUDA compiler

[Figure: a CUDA source file (.cu) is split into CPU code (aka host code), compiled by GNU gcc for the host (CPU + main memory), and GPU code (aka device code, or kernel), compiled by nvcc for the device (GPU + video memory).]
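A minimal sketch of how one .cu file carries both kinds of code (the file name, the empty kernel body, and the launch line are illustrative placeholders; kernel launches are covered later in the agenda):

// hello.cu (hypothetical file name): one source file, two compilation targets.
#include <cstdio>

__global__ void emptyKernel(void) {
    // device code (kernel): runs on the GPU; real kernels are covered later in this lecture series
}

int main(void) {
    emptyKernel<<<1, 1>>>();     // kernel launch syntax, shown here only as a preview
    cudaDeviceSynchronize();     // wait until the device has finished
    printf("host code and device code compiled from one .cu file\n");
    return 0;
}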
CUDA Program Execution Scenario
▪ Integrated “host + device” C application
o Serial or modestly parallel parts in host code (serial)
o Highly parallel parts in device code (parallel)

▪ Execution Scenario
o Step 1: host code (serial)
• Serial execution: read data
• Prepare parallel execution
• Copy data from host memory to device memory
o Step 2: device code (kernel, parallel)
• Parallel processing
• Read/write data in device memory
o Step 3: host code (serial)
• Copy data from device memory to host memory
• Serial execution: print data
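A small host-side sketch of this three-step flow; N, readInput, and myKernel are placeholders, not part of the lecture's code:

// Step 1: serial host code reads and prepares data
int *data = (int*) malloc(N * sizeof(int));
readInput(data, N);

int *dev_data = 0;
cudaMalloc((void**)&dev_data, N * sizeof(int));
cudaMemcpy(dev_data, data, N * sizeof(int), cudaMemcpyHostToDevice);

// Step 2: device code (kernel) processes the data in parallel
// myKernel<<<grid, block>>>(dev_data, N);

// Step 3: copy the result back and finish serially on the host
cudaMemcpy(data, dev_data, N * sizeof(int), cudaMemcpyDeviceToHost);
printf("result[0] = %d\n", data[0]);
cudaFree(dev_data);
free(data);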
Agenda

Device Global Memory and Data Transfer

22
CUDA program uses CUDA memory
▪ GPU cores share the “global memory” (device memory)
o DRAM (e.g., GDDR, HBM) is used as global memory
▪ To execute a kernel on a device,
o allocate global memory on the device
o transfer data from the host memory to the allocated device memory
o transfer result data from the device memory back to the host memory
o release global memory

[Figure: the device's streaming multiprocessors, each with its shared memory (SMEM), all connected to global memory; the host CPU and host memory sit on the other side of the bus. Red lines mark global memory.]

23
Memory Spaces (before UVM!!)

▪ CPU and GPU have separate memory spaces


o Data is moved across data bus
o Use functions to allocate/set/copy memory on GPU
o Very similar to corresponding C functions

▪ Pointers are just addresses


o Use pointer to access CPU and GPU memory
o Can’t tell from the pointer value whether the address is on CPU or GPU
o Dereferencing CPU pointer on GPU will likely crash
o Same for vice versa
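A small sketch of that pitfall (pointer names are made up; the pointer values alone do not tell you which side they belong to):

int *host_p = (int*) malloc(sizeof(int));   // address in host (CPU) memory
int *dev_p  = 0;
cudaMalloc((void**)&dev_p, sizeof(int));    // address in device (GPU) memory

*host_p = 42;                               // fine: host pointer dereferenced on the CPU
// *dev_p = 42;                             // WRONG: device pointer dereferenced on the CPU will likely crash
cudaMemcpy(dev_p, host_p, sizeof(int), cudaMemcpyHostToDevice);   // move the value explicitly instead

cudaFree(dev_p);
free(host_p);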

24
CPU Memory Allocation / Release
▪ Host (CPU) manages host (CPU) memory:
o void* malloc (size_t nbytes)
o void* memset (void* pointer, int value, size_t count)
o void free (void* pointer)

int n = 1024;
int nbytes = n * sizeof(int);
int* ptr = 0;
ptr = (int*) malloc( nbytes );
memset( ptr, 0, nbytes );
free( ptr );

25
GPU Memory Allocation / Release
▪ Host (CPU) manages device (GPU) memory:
o cudaMalloc (void** pointer, size_t nbytes)
o cudaMemset (void* pointer, int value, size_t count)
o cudaFree (void* pointer)

int n = 1024;
int nbytes = n * sizeof(int);
int* dev_a = 0;
cudaMalloc( (void**)&dev_a, nbytes );
cudaMemset( dev_a, 0, nbytes);
cudaFree(dev_a);

26
CUDA function rules
▪ Every library function starts with “cuda”

▪ Most of them return an error code (or cudaSuccess).


o cudaError_t cudaMalloc(void** devPtr, size_t size);
o cudaError_t cudaFree(void* devPtr);
o cudaError_t cudaMemcpy(void* dst, const void* src, size_t size, cudaMemcpyKind kind);

▪ Example:
o if (cudaMalloc(&devPtr, SIZE) != cudaSuccess) {
exit(1);
}

27
CUDA Malloc
▪ cudaError_t cudaMalloc( void** devPtr, size_t nbytes );
o allocates nbytes bytes of linear memory on the device
o The start address is stored into “devPtr”
o The memory is not cleared.
o returns cudaSuccess or cudaErrorMemoryAllocation

▪ cudaError_t cudaFree( void* devPtr );


o frees the memory space pointed to by devPtr
o if devPtr == 0, no operation
o returns cudaSuccess or cudaErrorInvalidDevicePointer
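A small sketch of checking the return value before using the allocation (buffer name and size are illustrative):

int* dev_buf = 0;
cudaError_t err = cudaMalloc((void**)&dev_buf, 1024 * sizeof(int));
if (err != cudaSuccess) {          // e.g., cudaErrorMemoryAllocation
    printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
    exit(1);
}
// ... use dev_buf on the device ...
cudaFree(dev_buf);                 // cudaFree(0) would simply be a no-op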

28
CUDA Memset
▪ cudaError_t cudaMemset( void* devPtr, int value, size_t nbytes );
o fills the first nbytes bytes of the memory area pointed to by devPtr with the value
o returns cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer
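For example, zero-filling a fresh allocation (a small sketch with illustrative names; cudaMalloc does not clear memory, so cudaMemset does it explicitly):

int* dev_a = 0;
size_t nbytes = 1024 * sizeof(int);
cudaMalloc((void**)&dev_a, nbytes);
cudaMemset(dev_a, 0, nbytes);      // sets each of the first nbytes bytes to 0
cudaFree(dev_a);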

29
Data Copy
▪ cudaError_t cudaMemcpy( void* dst,
const void* src,
size_t nbytes,
enum cudaMemcpyKind direction);
o returns after the copy is complete
o blocks CPU thread until all bytes have been copied
o doesn’t start copying until previous CUDA calls complete

▪ enum cudaMemcpyKind
o cudaMemcpyHostToDevice
o cudaMemcpyDeviceToHost
o cudaMemcpyDeviceToDevice
o cudaMemcpyHostToHost
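A small sketch of all four directions (a/b are host arrays and dev_a/dev_b are device pointers, as in the example on the following slides):

cudaMemcpy(dev_a, a,     nbytes, cudaMemcpyHostToDevice);     // host   -> device
cudaMemcpy(dev_b, dev_a, nbytes, cudaMemcpyDeviceToDevice);   // device -> device
cudaMemcpy(b,     dev_b, nbytes, cudaMemcpyDeviceToHost);     // device -> host
cudaMemcpy(b,     a,     nbytes, cudaMemcpyHostToHost);       // host   -> host (plain memcpy also works)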

30
Data Copy
▪ host → host : memcpy (in C/C++)
▪ host → device, device → device, device → host : cudaMemcpy (CUDA)

[Figure: host code running on the CPU with main memory, the kernel running on the GPU with video memory, and cudaMemcpy moving data between host and device.]

31
Example: Host-Device Mem copy
▪ step 1.
o make a block of data
o print out the source data
▪ step 2.
o copy from host memory to device memory
o copy from device memory to device memory
o copy from device memory to host memory
▪ step 3.
o print out the result
[Figure: the same host/device diagram: host code on the CPU with host memory, the kernel on the GPU with device memory.]

32
Code: cuda memcpy (1/4)

#include <cstdio>      // for printf
#include <iostream>

int main(void) {
// host-side data
const int SIZE = 5;
const int a[SIZE] = { 1, 2, 3, 4, 5 }; // source data
int b[SIZE] = { 0, 0, 0, 0, 0 }; // final destination

// print source
printf("a = {%d,%d,%d,%d,%d}\n", a[0], a[1], a[2], a[3], a[4]);

33
Code: cuda memcpy (2/4)

// device-side data
int* dev_a = 0;
int* dev_b = 0;

// allocate device memory


cudaMalloc((void**)&dev_a, SIZE * sizeof(int));
cudaMalloc((void**)&dev_b, SIZE * sizeof(int));

// copy from host to device


cudaMemcpy(dev_a, a, SIZE * sizeof(int), cudaMemcpyHostToDevice);

34
Code: cuda memcpy (3/4)

// copy from device to device


cudaMemcpy(dev_b, dev_a, SIZE * sizeof(int),
cudaMemcpyDeviceToDevice);
// copy from device to host
cudaMemcpy(b, dev_b, SIZE * sizeof(int), cudaMemcpyDeviceToHost);

35
Code: cuda memcpy (4/4)

// free device memory


cudaFree(dev_a);
cudaFree(dev_b);
// print the result
printf("b = {%d,%d,%d,%d,%d}\n", b[0], b[1], b[2], b[3], b[4]);
// done
return 0;
}

36
Execution Result
▪ Compile the source code
o nvcc memcpy.cu -o ./memcpy

▪ executing ./memcpy
a = {1,2,3,4,5}
b = {1,2,3,4,5}

37
Agenda

Error Checking

38
Error Checking and Handling in CUDA
▪ It is important for a program to check and handle errors
▪ CUDA API functions return flags that indicate whether an error has
occurred

▪ Most of them return an error code (or cudaSuccess).


o cudaError_t cudaMalloc(void** devPtr, size_t size);
o cudaError_t cudaFree(void* devPtr);
o cudaError_t cudaMemcpy(void* dst, const void* src, size_t size, cudaMemcpyKind kind);

▪ Example:
o if (cudaMalloc(&devPtr, SIZE) != cudaSuccess) {
exit(1);
}

39
cudaError_t : data type
▪ typedef enum cudaError cudaError_t
▪ possible values:
o cudaSuccess, cudaErrorMissingConfiguration, cudaErrorMemoryAllocation,
cudaErrorInitializationError, cudaErrorLaunchFailure, cudaErrorLaunchTimeout,
cudaErrorLaunchOutOfResources, cudaErrorInvalidDeviceFunction,
cudaErrorInvalidConfiguration, cudaErrorInvalidDevice, cudaErrorInvalidValue,
cudaErrorInvalidPitchValue, cudaErrorInvalidSymbol, cudaErrorUnmapBufferObjectFailed,
cudaErrorInvalidHostPointer, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture,
cudaErrorInvalidTextureBinding, cudaErrorInvalidChannelDescriptor,
cudaErrorInvalidMemcpyDirection, cudaErrorInvalidFilterSetting,
cudaErrorInvalidNormSetting, cudaErrorUnknown, cudaErrorNotYetImplemented,
cudaErrorInvalidResourceHandle, cudaErrorInsufficientDriver, cudaErrorSetOnActiveProcess,
cudaErrorStartupFailure, cudaErrorApiFailureBase

40
cudaGetErrorName( err )
▪ const char* cudaGetErrorName( cudaError_t err )
o err : error code to convert to string
o returns:
• char* to a NULL-terminated string
• NULL if the error code is not valid

▪ cout << cudaGetErrorName( cudaErrorMemoryAllocation ) << endl;
▪ cout << cudaGetErrorName( cudaErrorInvalidValue ) << endl;
o shows:
cudaErrorMemoryAllocation
cudaErrorInvalidValue

41
cudaGetErrorString( err )
▪ const char* cudaGetErrorString( cudaError_t err )
o err : error code to convert to string
o returns:
• char* to a NULL-terminated string
• NULL if the error code is not valid

▪ cout << cudaGetErrorString( cudaErrorMemoryAllocation ) << endl;
▪ cout << cudaGetErrorString( cudaErrorInvalidValue ) << endl;
o shows:
out of memory
invalid argument

42
cudaGetLastError( void )
▪ cudaError_t cudaGetLastError( void)
o returns the last error due to CUDA runtime calls in the same host thread
o and resets it to cudaSuccess
o So, if no CUDA error since the last call, it returns cudaSuccess
o For multiple errors, it contains the last error only.

▪ cudaError_t cudaPeekAtLastError( void )


o returns the last error due to CUDA runtime calls in the same host thread
o Note that this call does NOT reset
o So, the last error code is still available
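A small sketch of the difference (assuming some earlier CUDA call in this host thread has failed):

cudaError_t e1 = cudaPeekAtLastError();   // reports the error, leaves it in place
cudaError_t e2 = cudaGetLastError();      // reports the error, then resets it to cudaSuccess
cudaError_t e3 = cudaGetLastError();      // now returns cudaSuccess: the error was consumed
printf("%s / %s / %s\n", cudaGetErrorName(e1), cudaGetErrorName(e2), cudaGetErrorName(e3));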

43
A simple CUDA error check code
cudaMemcpy( … );
cudaError_t e = cudaGetLastError();
if (e != cudaSuccess) {
printf("cuda failure %s:%d: '%s'\n",
__FILE__, __LINE__,
cudaGetErrorString(e) );
exit(0);
}

44
A simple CUDA error check macro
#define cudaCheckError( ) do { \
cudaError_t e = cudaGetLastError(); \
if (e != cudaSuccess) { \
printf("cuda failure %s:%d: '%s'\n", \
__FILE__, __LINE__, \
cudaGetErrorString(e) ); \
exit(0); \
}\
} while (0)

45
Example
▪ code segment

// allocate device memory


cudaMalloc((void**)&dev_a, sizeof(int));
cudaMalloc((void**)&dev_b, sizeof(int));
cudaMalloc((void**)&dev_c, sizeof(int));
cudaCheckError( );

46
More advanced macro
#ifdef DEBUG // debug mode
#define CUDA_CHECK(x) do {\
(x); \
cudaError_t e = cudaGetLastError(); \
if (cudaSuccess != e) { \
printf("cuda failure %s at %s:%d\n", \
cudaGetErrorString(e), \
__FILE__, __LINE__); \
exit(1); \
}\
} while (0)
#else
#define CUDA_CHECK(x) (x) // release mode
#endif

47
error_check.cu
#include <cstdio>      // for printf
#include <iostream>

#ifdef DEBUG
#define CUDA_CHECK(x) do {\
(x); \
cudaError_t e = cudaGetLastError(); \
if (cudaSuccess != e) { \
printf("cuda failure \"%s\" at %s:%d\n", \
cudaGetErrorString(e), \
__FILE__, __LINE__); \
exit(1); \
}\
} while (0)
#else
#define CUDA_CHECK(x) (x)
#endif

48
error_check.cu
// main program for the CPU
int main(void) {
// host-side data
const int SIZE = 5;
const int a[SIZE] = { 1, 2, 3, 4, 5 };
int b[SIZE] = { 0, 0, 0, 0, 0 };
// print source
printf("a = {%d,%d,%d,%d,%d}\n", a[0], a[1], a[2], a[3], a[4]);
// device-side data
int *dev_a = 0;
int *dev_b = 0;
// allocate device memory
CUDA_CHECK( cudaMalloc((void**)&dev_a, SIZE * sizeof(int)) );
CUDA_CHECK( cudaMalloc((void**)&dev_b, SIZE * sizeof(int)) );
// copy from host to device
CUDA_CHECK( cudaMemcpy(dev_a, a, SIZE * sizeof(int), cudaMemcpyDeviceToDevice) ); // BOMB here !
// copy from device to device
CUDA_CHECK( cudaMemcpy(dev_b, dev_a, SIZE * sizeof(int), cudaMemcpyDeviceToDevice) );
// copy from device to host
CUDA_CHECK( cudaMemcpy(b, dev_b, SIZE * sizeof(int), cudaMemcpyDeviceToHost) );
// free device memory
CUDA_CHECK( cudaFree(dev_a) );
CUDA_CHECK( cudaFree(dev_b) );
// print the result
printf("b = {%d,%d,%d,%d,%d}\n", b[0], b[1], b[2], b[3], b[4]);
// done
return 0;
}

49
Execution Result
▪ Compile the source code
o nvcc error_check.cu -DDEBUG -o ./error_check
o nvcc error_check.cu -o ./error_check

▪ executing ./error_check
a = {1,2,3,4,5}
b = {………..}

50
Agenda
▪ What is CUDA?
▪ Device Global Memory and Data Transfer
▪ Error Checking
▪ A Vector Addition Kernel
▪ Kernel Functions and Threading
▪ Kernel Launch

51
CUDA Resources
▪ CUDA API reference:
o http://docs.nvidia.com/cuda/index.html
o http://docs.nvidia.com/cuda/cuda-runtime-api/index.html
▪ CUDA course:
o https://developer.nvidia.com/cuda-education-training
o https://developer.nvidia.com/cuda-training

52
