Declaring Variables in a CUDA kernel


Say you declare a new variable in a CUDA kernel and then use it in multiple threads, like:

__global__ void kernel(float* delt, float* deltb)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a;
    a = delt[i] + deltb[i];
    a += 1;
}

and the kernel call looks something like below, with multiple threads and blocks:

int threads = 200;
uint3 blocks = make_uint3(200, 1, 1);
kernel<<<blocks, threads>>>(d_delt, d_deltb);
  1. Is "a" stored on the stack?
  2. Is a new "a" created for each thread when they are initialized?
  3. Or will each thread independently access "a" at an unknown time, potentially messing up the algorithm?

Best Solution

Any variable (scalar or array) declared inside a kernel function without the `__shared__` qualifier is local to each thread: each thread gets its own private copy of that variable, so no data race among threads can occur. In other words, a fresh `a` exists for every thread, and no thread can read or write another thread's `a`.
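To make the per-thread behavior visible, here is a sketch of a variant of the kernel above that writes each thread's `a` into an output array (the `out` parameter is an addition for demonstration; `d_out` must hold at least `gridDim.x * blockDim.x` floats):

```cuda
__global__ void kernel(const float* delt, const float* deltb, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a;                  // private to this thread; never visible to others
    a = delt[i] + deltb[i];
    a += 1;
    out[i] = a;               // each thread writes only its own result
}
```

If `a` were shared among threads, the values in `out` would be scrambled by concurrent updates; because `a` is thread-local, each `out[i]` is exactly `delt[i] + deltb[i] + 1`.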

The compiler chooses whether a local variable resides in registers or in local memory (which is actually carved out of global memory), depending on the transformations and optimizations it performs.
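One way to see what the compiler decided is to ask `ptxas` for a verbose resource report at compile time (a sketch, assuming the kernel is in a file named `kernel.cu`):

```shell
# Report per-thread register and local-memory ("lmem"/"stack frame") usage
nvcc -Xptxas -v -c kernel.cu
```

If the report shows a nonzero stack frame or local-memory byte count, some local variables were spilled to local memory instead of being kept in registers.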

Further details on which variables are placed in local memory can be found in the NVIDIA CUDA C++ Programming Guide, in the chapter on the programming model and device memory.