An explanation of stride and plane in YUV images

2022-07-15 01:36:11

Image Stride (the row span of an image in memory)

"Stride" can be translated as "span" (跨距).

The stride is the amount of memory each row of pixels occupies. As the figure below shows, to achieve memory alignment (or for other reasons), the memory each row occupies is not necessarily equal to the image width.

A plane usually appears as a luma plane or a chroma plane, that is, the luma layer and the chroma layer; it is like RGB, which needs three planes to store.

I was recently working on an HI3521 project and ran into a key technical problem. Our image-processing program needs images in RGB888 format, but the video stream I get from the HI3521 delivers frames in YUV420SP format. So I had to convert a YUV420SP frame into an RGB888 image; what I was really after was the RGB888 pixel data. In YUV420SP, the Y component is stored by itself, while the U and V components are stored interleaved. So far so good. But when I printed the frame information of a YUV420SP frame, I found that the stride of this 720*576 frame was 768. Puzzling. Even now, after successfully converting a YUV420SP frame into a BMP bitmap, I still don't know exactly what the extra 768-720 = 48 bytes are.

At first, ignoring the stride, I read the components directly from the start of the YUV planes, computed the RGB data, and saved it as a BMP, but the BMP came out completely garbled. Where was the problem? It had to be the stride. The stride is always greater than or equal to the frame width and is a multiple of 4, but there are plenty of multiples of 4 between 720 and 768, so why 768? Fine: since rows that fall short of the alignment are padded with zeros at the end, I simply assumed those 48 bytes sit at the end of each row, which means the addresses have to be offset accordingly when reading the YUV components. I tried it, and the BMP was saved correctly, looking just like a captured snapshot. The technical details are common knowledge, of course; I know at least three formulas for converting YUV to RGB.

The point of writing this note is to call out this stride issue: when converting video frames in formats such as YUV420P and YUV420SP, always pay attention to the stride parameter.
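To make the fix concrete, here is a minimal sketch of stride-aware plane copying in C. The helper name `copy_plane` and the 720/768 figures are my own illustration, not code from the HI3521 SDK: only the visible width of each row is copied, and the padding bytes at the end of every source row are skipped by advancing the pointer by the full stride.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one image plane row by row, skipping end-of-row padding.
 * 'width' is the visible row width in bytes; each stride is the full
 * row span in memory (stride >= width). */
void copy_plane(uint8_t *dst, int dst_stride,
                const uint8_t *src, int src_stride,
                int width, int height)
{
    for (int row = 0; row < height; row++) {
        memcpy(dst, src, (size_t)width);  /* copy only the visible pixels */
        dst += dst_stride;                /* jump over any padding bytes  */
        src += src_stride;
    }
}
```

For the frame described above, the Y plane would be copied with `src_stride = 768` and `width = 720`; the same loop works for the interleaved UV plane of YUV420SP with half the height.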

When a video image is stored in memory, the memory buffer might contain extra padding bytes after each row of pixels. The padding bytes affect how the image is stored in memory, but do not affect how the image is displayed.


The stride is the number of bytes from one row of pixels in memory to the next row of pixels in memory. Stride is also called pitch. If padding bytes are present, the stride is wider than the width of the image, as shown in the following illustration.

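One common padding convention is to round each row up to a multiple of some alignment. A small sketch (the function name `aligned_stride` and the alignment values below are illustrative assumptions; the actual rule is driver and format specific):

```c
#include <assert.h>

/* Round a row width in bytes up to the next multiple of 'align'.
 * This is one common convention, not a universal formula. */
int aligned_stride(int width_in_bytes, int align)
{
    return (width_in_bytes + align - 1) / align * align;
}
```

With a 256-byte alignment, for example, a 720-byte row rounds up to 768, which would be consistent with the stride seen in the HI3521 frame above; whether that is the chip's actual rule is only a guess.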

Two buffers that contain video frames with equal dimensions can have two different strides. If you process a video image, you must take the stride into account.


In addition, there are two ways that an image can be arranged in memory. In a top-down image, the top row of pixels in the image appears first in memory. In a bottom-up image, the last row of pixels appears first in memory. The following illustration shows the difference between a top-down image and a bottom-up image.


A bottom-up image has a negative stride, because stride is defined as the number of bytes needed to move down a row of pixels, relative to the displayed image. YUV images should always be top-down, and any image that is contained in a Direct3D surface must be top-down. RGB images in system memory are usually bottom-up.

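A quick sketch of how a negative stride works in practice (`display_row` is a hypothetical helper, not a Windows API): given a pointer to the first *displayed* row and a stride of either sign, the address of displayed row n is simply scan0 + n * stride. For a bottom-up image, scan0 points at the last row in memory and the stride is negative.

```c
#include <assert.h>
#include <stdint.h>

/* Address of displayed row 'row', given a pointer to displayed row 0
 * and a stride in bytes. Works for both positive (top-down) and
 * negative (bottom-up) strides. */
const uint8_t *display_row(const uint8_t *scan0, int stride, int row)
{
    return scan0 + (long)row * stride;
}
```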

Video transforms in particular need to handle buffers with mismatched strides, because the input buffer might not match the output buffer. For example, suppose that you want to convert a source image and write the result to a destination image. Assume that both images have the same width and height, but might not have the same pixel format or the same image stride.


The following example code shows a generalized approach for writing this kind of function. This is not a complete working example, because it abstracts many of the specific details.

Code example


void ProcessVideoImage(
    BYTE*       pDestScanLine0,    // first scan line of the destination image
    LONG        lDestStride,       // stride of the destination image, in bytes
    const BYTE* pSrcScanLine0,     // first scan line of the source image
    LONG        lSrcStride,        // stride of the source image, in bytes
    DWORD       dwWidthInPixels,   // width of the image, in pixels
    DWORD       dwHeightInPixels   // height of the image, in pixels
    )
{
    for (DWORD y = 0; y < dwHeightInPixels; y++)
    {
        // Reinterpret each row pointer as that image's pixel layout.
        const SOURCE_PIXEL_TYPE *pSrcPixel = (const SOURCE_PIXEL_TYPE*)pSrcScanLine0;
        DEST_PIXEL_TYPE *pDestPixel = (DEST_PIXEL_TYPE*)pDestScanLine0;

        for (DWORD x = 0; x < dwWidthInPixels; x++)
        {
            pDestPixel[x] = TransformPixelValue(pSrcPixel[x]);
        }

        // Advance both row pointers by their full strides, not by the width.
        pDestScanLine0 += lDestStride;
        pSrcScanLine0  += lSrcStride;
    }
}

This function takes six parameters:

A pointer to the start of scan line 0 in the destination image.

The stride of the destination image.

A pointer to the start of scan line 0 in the source image.

The stride of the source image.

The width of the image in pixels.

The height of the image in pixels.


The general idea is to process one row at a time, iterating over each pixel in the row. Assume that SOURCE_PIXEL_TYPE and DEST_PIXEL_TYPE are structures representing the pixel layout for the source and destination images, respectively. (For example, 32-bit RGB uses the RGBQUAD structure. Not every pixel format has a pre-defined structure.) Casting the array pointer to the structure type enables you to access the RGB or YUV components of each pixel. At the start of each row, the function stores a pointer to the row. At the end of the row, it increments the pointer by the width of the image stride, which advances the pointer to the next row.


This example calls a hypothetical function named TransformPixelValue for each pixel. This could be any function that calculates a target pixel from a source pixel. Of course, the exact details will depend on the particular task. For example, if you have a planar YUV format, you must access the chroma planes independently from the luma plane; with interlaced video, you might need to process the fields separately; and so forth.

To give a more concrete example, the quoted documentation goes on to convert a 32-bit RGB image into an AYUV image, accessing the RGB pixels through an RGBQUAD structure and the AYUV pixels through a DXVA2_AYUVSample8 structure. That code is not reproduced here; instead, the following conversion example of my own illustrates the same point.

I recently got an LCD panel that outputs images in the NTSC interlaced standard and takes color in the UYVY variant of YUV 4:2:2, while a certain video decoder outputs the I420 variant of YUV 4:2:0, so a conversion between the two is required. I420 is stored in planar format, while UYVY is stored in packed format. The conversion itself is not complicated; the principle is shown in Figure 1.

Each color component in Figure 2 is represented by one byte. A storage sequence such as U0Y0V0Y1 actually represents two pixels and takes four bytes in total, so each pixel occupies two bytes on average. The rationale behind the YUV color format is that the HVS (Human Visual System) is most sensitive to luma and less sensitive to chroma, so the chroma components of each row of pixels are subsampled to reduce the required storage. Packed YUV 4:2:2 takes 2/3 of the storage that YUV 4:4:4 does. For comparison, with YUV 4:4:4 every pixel needs all three components, i.e. three bytes per pixel.
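The storage arithmetic above can be checked with a few one-liners (the helper names are mine, and the sizes ignore any per-row padding):

```c
#include <assert.h>

/* Bytes needed for one frame, ignoring stride/padding. */
int frame_bytes_444(int w, int h)  { return w * h * 3; }          /* 3 bytes/pixel */
int frame_bytes_uyvy(int w, int h) { return w * h * 2; }          /* 2 bytes/pixel */
int frame_bytes_i420(int w, int h) { return w * h + 2 * (w / 2) * (h / 2); } /* 1.5 */
```

For a 720*576 frame this gives 1,244,160 bytes in 4:4:4, 829,440 bytes in packed 4:2:2 (exactly 2/3 of it), and 622,080 bytes in I420.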

Code implementation

void rv_csp_i420_uyvy(
    uint8_t *y_plane,   // Y plane of I420
    uint8_t *u_plane,   // U plane of I420
    uint8_t *v_plane,   // V plane of I420
    int y_stride,       // Y stride of I420, in pixel
    int uv_stride,      // U and V stride of I420, in pixel
    uint8_t *image,     // output UYVY image
    int width,          // image width
    int height)         // image height
{
    int row;
    int col;
    uint8_t *pImg = image;

    for (row = 0; row < height; row = row + 1)
    {
        for (col = 0; col < width; col = col + 2)
        {
            pImg[0] = u_plane[row/2 * uv_stride + col/2];
            pImg[1] = y_plane[row * y_stride + col];
            pImg[2] = v_plane[row/2 * uv_stride + col/2];
            pImg[3] = y_plane[row * y_stride + col + 1];
            pImg += 4;
        }
    }
}

The code seems to have a small problem: it does not account for the stride of the YUV 4:2:2 output when writing. Still, it explains the principle clearly enough.
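For completeness, one way the missing output stride could be handled is sketched below. The function name and the extra `uyvy_stride` parameter (in bytes, at least width * 2) are my additions, not part of the original code: each output row starts at image + row * uyvy_stride instead of running on continuously.

```c
#include <assert.h>
#include <stdint.h>

/* I420 -> UYVY, with an explicit stride for the packed output so each
 * output row may carry end-of-row padding as well. */
void rv_csp_i420_uyvy_strided(
    const uint8_t *y_plane, const uint8_t *u_plane, const uint8_t *v_plane,
    int y_stride, int uv_stride,
    uint8_t *image, int uyvy_stride,   /* output stride in bytes, >= width * 2 */
    int width, int height)
{
    for (int row = 0; row < height; row++) {
        uint8_t *pImg = image + row * uyvy_stride;  /* start of this output row */
        for (int col = 0; col < width; col += 2) {
            pImg[0] = u_plane[row / 2 * uv_stride + col / 2];
            pImg[1] = y_plane[row * y_stride + col];
            pImg[2] = v_plane[row / 2 * uv_stride + col / 2];
            pImg[3] = y_plane[row * y_stride + col + 1];
            pImg += 4;
        }
    }
}
```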
