3.2.11.3. CUDA Arrays
CUDA arrays are opaque memory layouts optimized for texture fetching. They are one-dimensional, two-dimensional, or three-dimensional and are composed of elements, each of which has 1, 2, or 4 components. Each component may be a signed or unsigned 8-, 16-, or 32-bit integer, a 16-bit float, or a 32-bit float. Kernels can access CUDA arrays only through texture fetching, as described in Texture Memory, or through surface reading and writing, as described in Surface Memory.
3.2.11.4. Read/Write Coherency
Texture and surface memory is cached (see Device Memory Accesses). Within a single kernel call, the cache is not kept coherent with global memory writes or surface memory writes, so any texture fetch or surface read of an address that has been written via a global write or a surface write in the same kernel call returns undefined data. In other words, a thread can safely read a texture or surface memory location only if that location was last updated by a previous kernel call or memory copy, not if it has been updated by the same thread or another thread in the same kernel call.
3.2.12. Graphics Interoperability
Some resources from OpenGL and Direct3D may be mapped into the address space of CUDA, either to enable CUDA to read data written by OpenGL or Direct3D, or to enable CUDA to write data for consumption by OpenGL or Direct3D.
Before a resource can be mapped, it must be registered with CUDA using the functions mentioned in OpenGL Interoperability and Direct3D Interoperability. These functions return a pointer to a CUDA graphics resource of type struct cudaGraphicsResource. Registering a resource is potentially high-overhead and is therefore typically done only once per resource. A CUDA graphics resource is unregistered using cudaGraphicsUnregisterResource(). Each CUDA context that intends to use the resource must register it separately.
Once a resource is registered with CUDA, it can be mapped and unmapped as many times as necessary using cudaGraphicsMapResources() and cudaGraphicsUnmapResources(). cudaGraphicsResourceSetMapFlags() can be called to specify usage hints (write-only, read-only) that the CUDA driver can use to optimize resource management.
A mapped resource can be read from or written to by kernels using the device memory address returned by cudaGraphicsResourceGetMappedPointer() for buffers and cudaGraphicsSubResourceGetMappedArray() for CUDA arrays.
Accessing a resource through OpenGL, Direct3D, or another CUDA context while it is mapped produces undefined results. OpenGL Interoperability and Direct3D Interoperability give specifics for each graphics API and some code samples. SLI Interoperability gives specifics for when the system is in SLI mode.
Notes and practical experience:
CUDA Arrays
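A CUDA array is storage whose layout has been optimized for texture fetching; the actual layout is opaque to the user. It is a one-, two-, or three-dimensional collection of elements (texels), and each element can have 1, 2, or 4 components (note that there is no 3-component element). Each component is an 8-bit, 16-bit, or 32-bit integer (signed or unsigned), or a 16-bit or 32-bit float. Inside a kernel, a CUDA array can be accessed only through texture fetches (reads) or through surface reads and writes, as described earlier in the Texture Memory and Surface Memory sections.
Is a CUDA array just an ordinary array? No. The layout of an ordinary array is known (one element after another, row by row), whereas NVIDIA does not disclose the layout of a CUDA array. All you need to know is that it is an undisclosed layout optimized for texture access; this is the biggest difference from an ordinary array. If you want to know what happens internally, third-party material available online (AMD's documentation in particular) describes what such layouts look like.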
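The following is a minimal sketch of what this opacity means in practice: the host never obtains a pointer into the CUDA array, so data goes in through a copy function and comes out in a kernel through a texture fetch. The kernel name copyFromTexture, the CHECK macro, and the 8x4 size are illustrative choices, not part of the original text.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Reads each texel through the texture object and stores it in linear memory.
__global__ void copyFromTexture(cudaTextureObject_t tex, float *out,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Unnormalized coordinates; +0.5f addresses the texel center.
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

int main()
{
    const int width = 8, height = 4;  // illustrative size
    std::vector<float> host(width * height);
    for (int i = 0; i < width * height; ++i) host[i] = static_cast<float>(i);

    // Elements with one 32-bit float component each.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    CHECK(cudaMallocArray(&array, &desc, width, height));

    // The layout is opaque, so the array is filled by a copy, not via a pointer.
    CHECK(cudaMemcpy2DToArray(array, 0, 0, host.data(), width * sizeof(float),
                              width * sizeof(float), height,
                              cudaMemcpyHostToDevice));

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;      // no interpolation
    texDesc.readMode = cudaReadModeElementType;    // return raw float values
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

    float *dOut = nullptr;
    CHECK(cudaMalloc(&dOut, width * height * sizeof(float)));
    dim3 block(8, 4);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    copyFromTexture<<<grid, block>>>(tex, dOut, width, height);
    CHECK(cudaGetLastError());

    std::vector<float> result(width * height);
    CHECK(cudaMemcpy(result.data(), dOut, result.size() * sizeof(float),
                     cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < width * height; ++i)
        if (result[i] != host[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFree(dOut));
    CHECK(cudaFreeArray(array));
    return 0;
}
```

Point filtering with unnormalized coordinates is chosen here so that each fetch returns exactly one stored element, which makes the round trip easy to check.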
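Read/Write Coherency: Texture and surface memory goes through a cache. Within a single kernel launch, this (read) cache is not kept coherent with ordinary global memory writes, nor with surface writes. So if, within one kernel launch, you use a texture fetch or a surface read on an address that was just written through an ordinary global memory pointer or through a surface write, the result is undefined. (This is what we said earlier: a write made in this launch only takes effect for the next launch.) Both kinds of write are mentioned because the backing store of a texture can be either ordinary linear memory or a CUDA array, while the backing store of a surface is a CUDA array, so the contents can be changed either by an ordinary write or by a surface write. Either way, reading the value back immediately in the same launch gives an undefined result: you may get the new value, the old value from before the write, or even a mix of the two. In other words, a device thread can safely read texture or surface contents only if they were written earlier by one of the cudaMemcpy*() functions or by a previous kernel, not by the same thread or another thread during the same kernel call.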
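As a sketch of the safe pattern described above, the example below splits the work into two launches: the first kernel writes a CUDA array through a surface object, and the second reads the same array through a texture object. The kernel names writePass and readPass, the CHECK macro, and the 16x16 size are hypothetical; the array is allocated with the cudaArraySurfaceLoadStore flag so that it can back a surface.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Launch 1: write through the surface. Fetching this location through a
// texture or surface read inside this same launch would be undefined.
__global__ void writePass(cudaSurfaceObject_t surf, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // The x coordinate of a surface access is given in bytes.
        surf2Dwrite(static_cast<float>(y * width + x), surf,
                    x * static_cast<int>(sizeof(float)), y);
    }
}

// Launch 2: the writes of launch 1 are now safe to read through the texture.
__global__ void readPass(cudaTextureObject_t tex, float *out,
                         int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

int main()
{
    const int width = 16, height = 16;  // illustrative size
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    CHECK(cudaMallocArray(&array, &desc, width, height,
                          cudaArraySurfaceLoadStore));

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaSurfaceObject_t surf = 0;
    CHECK(cudaCreateSurfaceObject(&surf, &resDesc));

    cudaTextureDesc texDesc = {};
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

    float *dOut = nullptr;
    CHECK(cudaMalloc(&dOut, width * height * sizeof(float)));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    writePass<<<grid, block>>>(surf, width, height);
    CHECK(cudaGetLastError());
    // Same stream, so readPass starts only after writePass has finished.
    readPass<<<grid, block>>>(tex, dOut, width, height);
    CHECK(cudaGetLastError());

    std::vector<float> result(width * height);
    CHECK(cudaMemcpy(result.data(), dOut, result.size() * sizeof(float),
                     cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < width * height; ++i)
        if (result[i] != static_cast<float>(i)) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaDestroySurfaceObject(surf));
    CHECK(cudaFree(dOut));
    CHECK(cudaFreeArray(array));
    return 0;
}
```

Merging the two kernels into one, so that a thread fetches through tex what it or another thread has just written through surf, is exactly the case the text calls undefined.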
Graphics Interoperability: This part covers interoperability with OpenGL and Direct3D. We are not very familiar with it, so we will not discuss it here. Our apologies!
If anything is unclear, please leave a comment below this post
or start a thread on our technical forum at bbs.gpuworld.cn