Basic Elements of a CUDA Program
Basic steps of a CUDA program
Block
Threads in a grid are organized into groups called thread blocks.
[Figure: threads arranged along X, Y, and Z, with 4 threads in each dimension]
Kernel_name <<< number_of_blocks, thread_per_block >>> (arguments)
dim3 variable_name ( X, Y, Z)
variable_name.x
variable_name.y
variable_name.z
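The dim3 type can be tried out on the host alone. The following is a minimal sketch, assuming the CUDA runtime header is available and the file is compiled with nvcc; the helper threads_in_block is a hypothetical name introduced only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the slides): number of threads in one block.
unsigned int threads_in_block(dim3 block) {
    return block.x * block.y * block.z;
}

int main() {
    dim3 block(4, 4, 4);  // X = 4, Y = 4, Z = 4
    dim3 line(32);        // components that are left out default to 1

    printf("block = (%u, %u, %u), threads = %u\n",
           block.x, block.y, block.z, threads_in_block(block));
    printf("line  = (%u, %u, %u)\n", line.x, line.y, line.z);
    return 0;
}
```

This prints block = (4, 4, 4), threads = 64 followed by line = (32, 1, 1).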
[Figure: 32 threads along X, grouped into blocks of 4 threads]
dim3 block(4, 1, 1);
dim3 grid(8, 1, 1);
Kernel_name<<<grid, block>>>();
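To see what this 1D configuration does, here is a minimal sketch in which each of the 32 threads stores its own global index. The kernel name write_global_ids and the host wrapper run_1d_example are hypothetical names, and the sketch assumes a CUDA-capable device is present.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread stores its global index: block offset plus index within the block.
__global__ void write_global_ids(int *out) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    out[gid] = gid;
}

// Hypothetical host wrapper: launches 8 blocks of 4 threads and copies the
// 32 results into host_out. Returns true on success.
bool run_1d_example(int *host_out) {
    const int n = 32;
    int *dev_out = nullptr;
    if (cudaMalloc(&dev_out, n * sizeof(int)) != cudaSuccess) return false;

    dim3 block(4, 1, 1);
    dim3 grid(8, 1, 1);
    write_global_ids<<<grid, block>>>(dev_out);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    ok = ok && (cudaMemcpy(host_out, dev_out, n * sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(dev_out);
    return ok;
}

int main() {
    int ids[32];
    if (!run_1d_example(ids)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < 32; ++i) printf("%d ", ids[i]);
    printf("\n");
    return 0;
}
```

With 8 blocks of 4 threads, the global indices cover 0 through 31 exactly once.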
[Figure: 16 x 4 threads (X x Y), grouped into blocks of 8 x 2 threads]
dim3 block(8, 2, 1);
dim3 grid(2, 2, 1);
Kernel_name<<<grid, block>>>();
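The 2D configuration can be sketched the same way. In this illustration each thread computes its column and row from the built-in index variables and writes a flattened index; write_coords_2d and run_2d_example are hypothetical names, and a CUDA-capable device is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes its (col, row) position in the whole grid and stores
// the corresponding row-major index.
__global__ void write_coords_2d(int *out) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // 0..15
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // 0..3
    int width = gridDim.x * blockDim.x;               // 16 threads per row
    out[row * width + col] = row * width + col;
}

// Hypothetical host wrapper: launches a 2 x 2 grid of 8 x 2 blocks and copies
// the 64 results into host_out. Returns true on success.
bool run_2d_example(int *host_out) {
    const int n = 16 * 4;
    int *dev_out = nullptr;
    if (cudaMalloc(&dev_out, n * sizeof(int)) != cudaSuccess) return false;

    dim3 block(8, 2, 1);
    dim3 grid(2, 2, 1);
    write_coords_2d<<<grid, block>>>(dev_out);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    ok = ok && (cudaMemcpy(host_out, dev_out, n * sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(dev_out);
    return ok;
}

int main() {
    int cells[64];
    if (!run_2d_example(cells)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 16; ++col) printf("%2d ", cells[row * 16 + col]);
        printf("\n");
    }
    return 0;
}
```

The grid spans 2 x 8 = 16 threads in X and 2 x 2 = 4 threads in Y, so all 64 cells are written once.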
Limitation for block size
x <= 1024
y <= 1024
z <= 64
x * y * z <= 1024
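Rather than hard-coding these numbers, the limits can be read from the device at run time. The sketch below uses cudaGetDeviceProperties and the maxThreadsDim and maxThreadsPerBlock fields of cudaDeviceProp; the helper block_fits is a hypothetical name, and device 0 is assumed to be the device of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the block respects the per-dimension limits
// and the limit on the total number of threads per block.
bool block_fits(dim3 block, const cudaDeviceProp &prop) {
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;
    return block.x <= (unsigned int)prop.maxThreadsDim[0] &&
           block.y <= (unsigned int)prop.maxThreadsDim[1] &&
           block.z <= (unsigned int)prop.maxThreadsDim[2] &&
           total <= (unsigned long long)prop.maxThreadsPerBlock;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "could not query device 0\n");
        return 1;
    }
    printf("max block dims: (%d, %d, %d), max threads per block: %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2],
           prop.maxThreadsPerBlock);

    dim3 small(8, 2, 1);    // 16 threads
    dim3 large(32, 32, 2);  // 2048 threads
    printf("block(8, 2, 1) fits:   %s\n", block_fits(small, prop) ? "yes" : "no");
    printf("block(32, 32, 2) fits: %s\n", block_fits(large, prop) ? "yes" : "no");
    return 0;
}
```

A block of 32 x 32 x 2 threads satisfies each per-dimension limit listed above but holds 2048 threads, so it fails the x * y * z <= 1024 rule.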
Limitation for number of thread blocks in each dimension (compute capability 3.0 and later)
x <= 2^31 - 1
y <= 65535
z <= 65535
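The grid limits can also be queried from the device, and the number of blocks needed for a given problem size can be checked against them. This is a sketch under assumptions: blocks_needed and grid_fits are hypothetical helpers, the element count and block size in main are made-up values, and device 0 is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements,
// rounding up. threads_per_block must be nonzero.
unsigned int blocks_needed(unsigned int n, unsigned int threads_per_block) {
    return n / threads_per_block + (n % threads_per_block != 0 ? 1u : 0u);
}

// Hypothetical helper: true if the grid respects the per-dimension limits.
bool grid_fits(dim3 grid, const cudaDeviceProp &prop) {
    return grid.x <= (unsigned int)prop.maxGridSize[0] &&
           grid.y <= (unsigned int)prop.maxGridSize[1] &&
           grid.z <= (unsigned int)prop.maxGridSize[2];
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "could not query device 0\n");
        return 1;
    }
    printf("max grid dims: (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    unsigned int n = 1000000;  // illustrative element count
    dim3 block(256, 1, 1);     // illustrative block size
    dim3 grid(blocks_needed(n, block.x), 1, 1);
    printf("grid.x = %u, fits: %s\n", grid.x, grid_fits(grid, prop) ? "yes" : "no");
    return 0;
}
```

Rounding up matters when the element count is not a multiple of the block size; the kernel then needs a bounds check so the surplus threads in the last block do nothing.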