
www.nvidia.com
CUDA C++ Programming Guide PG-02829-001_v11.0 | v
5.2.2. Device Level .......... 117
5.2.3. Multiprocessor Level .......... 117
5.2.3.1. Occupancy Calculator .......... 119
5.3. Maximize Memory Throughput .......... 121
5.3.1. Data Transfer between Host and Device .......... 122
5.3.2. Device Memory Accesses .......... 123
5.4. Maximize Instruction Throughput .......... 127
5.4.1. Arithmetic Instructions .......... 127
5.4.2. Control Flow Instructions .......... 132
5.4.3. Synchronization Instruction .......... 133
Appendix A. CUDA-Enabled GPUs .......... 134
Appendix B. C++ Language Extensions .......... 135
B.1. Function Execution Space Specifiers .......... 135
B.1.1. __global__ .......... 135
B.1.2. __device__ .......... 135
B.1.3. __host__ .......... 136
B.1.4. __noinline__ and __forceinline__ .......... 136
B.2. Variable Memory Space Specifiers .......... 136
B.2.1. __device__ .......... 137
B.2.2. __constant__ .......... 137
B.2.3. __shared__ .......... 137
B.2.4. __managed__ .......... 138
B.2.5. __restrict__ .......... 138
B.3. Built-in Vector Types .......... 140
B.3.1. char, short, int, long, longlong, float, double .......... 140
B.3.2. dim3 .......... 141
B.4. Built-in Variables .......... 141
B.4.1. gridDim .......... 141
B.4.2. blockIdx .......... 141
B.4.3. blockDim .......... 141
B.4.4. threadIdx .......... 142
B.4.5. warpSize .......... 142
B.5. Memory Fence Functions .......... 142
B.6. Synchronization Functions .......... 145
B.7. Mathematical Functions .......... 146
B.8. Texture Functions .......... 146
B.8.1. Texture Object API .......... 146
B.8.1.1. tex1Dfetch() .......... 146
B.8.1.2. tex1D() .......... 146
B.8.1.3. tex1DLod() .......... 146
B.8.1.4. tex1DGrad() .......... 147
B.8.1.5. tex2D() .......... 147
B.8.1.6. tex2DLod() .......... 147