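When I first started learning CUDA, I was never quite clear on how the index (idx) is computed at kernel launch, and I kept mixing up threadIdx.x, blockIdx.x, blockDim, gridDim, and so on. After going through a range of material, I am collecting my notes here and describing my own understanding.

1. The relationship between Grid, Block, and Thread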
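As the figure shows, a Grid can contain multiple Blocks, and a Block contains multiple Threads. Each of the three can be organized in one, two, or three dimensions. In a CUDA program, a thread's threadIdx is unique within its Block; combined with blockIdx, it identifies the thread uniquely across the whole Grid.

2. Dimensions

When launching a kernel, you need to specify the grid size and the block size:

    dim3 gridsize(x, y, z);
    dim3 blocksize(x, y, z);

blockDim.x, blockDim.y, and blockDim.z are the number of threads in a Block along the x, y, and z directions. The Dim values are counts, so they start at 1; the Idx values are indices, so they start at 0.

3. 1D, 2D, and 3D modes

3.1 1D mode

grid 1D, Block 1D (the grid is divided along one dimension, the block along one dimension)

Indexing and launch:

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    Kernel<<<numBlock, threadsPerBlock>>>(argv);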
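To make the 1D formula concrete, here is a minimal CPU-side sketch in plain C++ (not a real kernel). It assumes the usual pattern of rounding the number of blocks up and guarding with `idx < n`; the function name `emulate_launch_1d` and the variables `blockIdx_x`, `threadIdx_x`, and `blockDim_x` are hypothetical stand-ins for the CUDA built-ins.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for a 1D grid of 1D blocks: the two loops play
// the roles of blockIdx.x and threadIdx.x. Returns how often each of the
// n elements was visited.
std::vector<int> emulate_launch_1d(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    // Round up so that numBlock * threadsPerBlock >= n.
    int numBlock = (n + threadsPerBlock - 1) / threadsPerBlock;
    int blockDim_x = threadsPerBlock;
    std::vector<int> visits(n, 0);
    for (int blockIdx_x = 0; blockIdx_x < numBlock; ++blockIdx_x) {
        for (int threadIdx_x = 0; threadIdx_x < blockDim_x; ++threadIdx_x) {
            // Same formula as in the text above.
            int idx = blockIdx_x * blockDim_x + threadIdx_x;
            if (idx < n) {  // guard: the last block may run past n
                visits[idx] += 1;
            }
        }
    }
    return visits;
}
```

With n = 10 and 4 threads per block, three blocks are needed, the last two threads of the third block fail the guard, and every element is visited exactly once.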
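grid 1D, Block 2D (the grid is divided along one dimension, the block along two)

    int idx = blockIdx.x * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;
    dim3 dimBlock(x, y);
    Kernel<<<numBlock, dimBlock>>>(argv);

Take this case as an example. Because the grid is one-dimensional, blockIdx.x (numbered from 0) equals the number of Blocks that come before the current Block; for the last Block it is the number of Blocks in the grid minus 1. blockDim.x is the number of threads along x in a Block and blockDim.y is the number along y, so blockDim.x * blockDim.y is the number of threads in one Block, and blockIdx.x * blockDim.x * blockDim.y is the total number of threads in all the full Blocks before the current one. Now look inside the current Block: because the Block is two-dimensional, threadIdx.y * blockDim.x is the number of threads in the full rows (along x) before the current thread's row, and threadIdx.x is the thread's position within its own row. The sum of the three terms is the number of threads that precede the current thread, which is exactly its global index.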
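The three terms can be checked by hand with a small host-side C++ sketch. The function `flat_index_1d_grid_2d_block` and its parameter names are hypothetical; on the GPU the values would come from the built-in variables instead of arguments.

```cpp
#include <cassert>

// Hypothetical helper: evaluates the "grid 1D, Block 2D" formula on the CPU.
int flat_index_1d_grid_2d_block(int blockIdx_x,
                                int blockDim_x, int blockDim_y,
                                int threadIdx_x, int threadIdx_y) {
    assert(threadIdx_x < blockDim_x && threadIdx_y < blockDim_y);
    int threadsInEarlierBlocks = blockIdx_x * blockDim_x * blockDim_y;
    int threadsInEarlierRows   = threadIdx_y * blockDim_x;
    int positionInRow          = threadIdx_x;
    return threadsInEarlierBlocks + threadsInEarlierRows + positionInRow;
}
```

For a Block of 4 x 3 threads, the thread (x = 1, y = 2) in Block 2 has 2 * 12 = 24 threads in earlier Blocks and 2 * 4 = 8 threads in earlier rows, so its index is 24 + 8 + 1 = 33. The last thread of that Block, (x = 3, y = 2), gets 35, one less than the 36 threads in Blocks 0 to 2.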
grid 1D, Block 3D

    int idx = blockIdx.x * blockDim.x * blockDim.y * blockDim.z
            + threadIdx.z * blockDim.y * blockDim.x
            + threadIdx.y * blockDim.x
            + threadIdx.x;
    dim3 dimBlock(x, y, z);
    Kernel<<<numBlock, dimBlock>>>(argv);
grid 2D, Block 1D (the grid is divided along two dimensions, the block along one)

    int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    int Idx = blockId * blockDim.x + threadIdx.x;
    dim3 dimGrid(x, y);
    Kernel<<<dimGrid, threadsPerBlock>>>(argv);
grid 2D, Block 2D

    int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    int Idx = blockId * (blockDim.x * blockDim.y)
            + (threadIdx.y * blockDim.x)
            + threadIdx.x;
    dim3 dimGrid(x1, y1), dimBlock(x2, y2);
    Kernel<<<dimGrid, dimBlock>>>(argv);
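One way to gain confidence in the 2D/2D formula is to enumerate every (block, thread) pair on the CPU and check that each flat index in 0 .. total-1 comes up exactly once. The sketch below does this in plain C++; `covers_each_index_once` and the loop variable names are hypothetical and stand in for the built-in variables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU check: walk over all blocks and threads of a 2D grid of
// 2D blocks and confirm that the formula yields each index exactly once.
bool covers_each_index_once(int gridDim_x, int gridDim_y,
                            int blockDim_x, int blockDim_y) {
    assert(gridDim_x > 0 && gridDim_y > 0 && blockDim_x > 0 && blockDim_y > 0);
    int total = gridDim_x * gridDim_y * blockDim_x * blockDim_y;
    std::vector<int> hits(total, 0);
    for (int by = 0; by < gridDim_y; ++by)
        for (int bx = 0; bx < gridDim_x; ++bx)
            for (int ty = 0; ty < blockDim_y; ++ty)
                for (int tx = 0; tx < blockDim_x; ++tx) {
                    int blockId = by * gridDim_x + bx;
                    int Idx = blockId * (blockDim_x * blockDim_y)
                            + (ty * blockDim_x) + tx;
                    if (Idx < 0 || Idx >= total) return false;
                    hits[Idx] += 1;
                }
    for (std::size_t i = 0; i < hits.size(); ++i)
        if (hits[i] != 1) return false;
    return true;
}
```

Note that this numbering is contiguous within each Block: all threads of Block 0 come first, then all threads of Block 1, and so on.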
grid 2D, Block 3D

    int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    int Idx = blockId * (blockDim.x * blockDim.y * blockDim.z)
            + (threadIdx.z * (blockDim.x * blockDim.y))
            + (threadIdx.y * blockDim.x)
            + threadIdx.x;
    dim3 dimGrid(x1, y1), dimBlock(x2, y2, z2);
    Kernel<<<dimGrid, dimBlock>>>(argv);
grid 3D, Block 1D

    int blockId = blockIdx.x
                + blockIdx.y * gridDim.x
                + gridDim.x * gridDim.y * blockIdx.z;
    int Idx = blockId * blockDim.x + threadIdx.x;
    dim3 dimGrid(x, y, z);
    Kernel<<<dimGrid, threadsPerBlock>>>(argv);
grid 3D, Block 2D

    int blockId = blockIdx.x
                + blockIdx.y * gridDim.x
                + gridDim.x * gridDim.y * blockIdx.z;
    int Idx = blockId * (blockDim.x * blockDim.y)
            + (threadIdx.y * blockDim.x)
            + threadIdx.x;
    dim3 dimGrid(x1, y1, z1), dimBlock(x2, y2);
    Kernel<<<dimGrid, dimBlock>>>(argv);
grid 3D, Block 3D

    int blockId = blockIdx.x
                + blockIdx.y * gridDim.x
                + gridDim.x * gridDim.y * blockIdx.z;
    int Idx = blockId * (blockDim.x * blockDim.y * blockDim.z)
            + (threadIdx.z * (blockDim.x * blockDim.y))
            + (threadIdx.y * blockDim.x)
            + threadIdx.x;
    dim3 dimGrid(x1, y1, z1), dimBlock(x2, y2, z2);
    Kernel<<<dimGrid, dimBlock>>>(argv);
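The 3D/3D formula also covers all the lower-dimensional cases: an unused dimension has size 1 and index 0, so its terms drop out. The host-side C++ sketch below illustrates this. `Dim3`, `Idx3`, and `global_index` are hypothetical names for this example only; `Dim3` imitates the way CUDA's `dim3` defaults unspecified components to 1.

```cpp
#include <cassert>

// Hypothetical stand-in for dim3: unspecified sizes default to 1.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Hypothetical index triple: unspecified indices default to 0.
struct Idx3 {
    unsigned x, y, z;
    Idx3(unsigned x_ = 0, unsigned y_ = 0, unsigned z_ = 0) : x(x_), y(y_), z(z_) {}
};

// Evaluates the "grid 3D, Block 3D" formula on the CPU.
unsigned global_index(Dim3 gridDim, Dim3 blockDim, Idx3 blockIdx, Idx3 threadIdx) {
    assert(blockIdx.x < gridDim.x && blockIdx.y < gridDim.y && blockIdx.z < gridDim.z);
    assert(threadIdx.x < blockDim.x && threadIdx.y < blockDim.y && threadIdx.z < blockDim.z);
    unsigned blockId = blockIdx.x
                     + blockIdx.y * gridDim.x
                     + gridDim.x * gridDim.y * blockIdx.z;
    unsigned threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    unsigned localId = threadIdx.z * (blockDim.x * blockDim.y)
                     + threadIdx.y * blockDim.x
                     + threadIdx.x;
    return blockId * threadsPerBlock + localId;
}
```

For a 2 x 3 x 4 grid of 4 x 2 x 2 Blocks, the last thread (3, 1, 1) of the last Block (1, 2, 3) has blockId = 1 + 2 * 2 + 2 * 3 * 3 = 23 and localId = 8 + 4 + 3 = 15, giving 23 * 16 + 15 = 383, which is the total of 384 threads minus 1. Calling `global_index(Dim3(5), Dim3(8), Idx3(2), Idx3(3))` reduces to the 1D case, 2 * 8 + 3 = 19.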