Fastest approximate method to convert YUV to RGBA?
I am looking for the fastest way to convert a YUV array into an RGBA array. For example, given a YCbYCr array, which is a sequence of bytes:
YCbYCr = Luma0, Cb0, Luma1, Cr0, Luma2, ...
where the 8-bit blue-difference and red-difference chroma components are sampled with period 2, while the 8-bit luma is sampled with period 1. For example:
- image pixel (0,0) has luma0, Cr0 red-difference chroma, Cb0 blue-difference chroma
- image pixel (0,1) has luma1, Cr0 red-difference chroma, Cb0 blue-difference chroma
- image pixel (0,2) has luma2, Cr1 red-difference chroma, Cb1 blue-difference chroma
- image pixel (0,3) has luma3, Cr1 red-difference chroma, Cb1 blue-difference chroma
- etc.
An RGBA array should be produced, laid out as:
RGBA = R0, G0, B0, A0, R1, G1, ...
where all elements are unsigned chars and all bits of each A# are zero.
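The sample-to-pixel mapping listed above can be sketched as index arithmetic. This is a minimal illustration, not part of the original question: the helper name `ycbycr_fetch` is hypothetical, and the cast from `uint8_t` to `int8_t` assumes the usual two's-complement wraparound for bytes above 127.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the samples that belong to pixel i of a packed
 * row laid out as Y0 Cb0 Y1 Cr0 Y2 Cb1 Y3 Cr1 ...
 * Luma is stored once per pixel, chroma once per pair of pixels. */
void ycbycr_fetch(const uint8_t *row, size_t i,
                  uint8_t *y, int8_t *cb, int8_t *cr)
{
    size_t pair = i / 2;               /* pixels 2k and 2k+1 share chroma */
    *y  = row[2 * i];                  /* luma sits at every even byte    */
    *cb = (int8_t)row[4 * pair + 1];   /* assumed two's-complement cast   */
    *cr = (int8_t)row[4 * pair + 3];
}
```

On the output side, pixel `i` occupies bytes `4*i` through `4*i + 3` of the RGBA array, in the order R, G, B, A.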
The luma component in YCbYCr is an unsigned char in 0..255; Cb and Cr are signed chars in -127..+126. The standard approach is matrix multiplication, but it is very slow for real-time applications and it operates on floating-point numbers. I am looking for a fast approximate numerical method.
Comments (1)
The biggest single computational saving you're likely to get is simply by doing the computation in fixed-point rather than floating-point. It's likely to be an order of magnitude faster (at a guess).
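As a rough sketch of what the fixed-point version might look like for a single pixel: the answer does not name a conversion matrix, so the coefficients below are an assumption (full-range BT.601, JPEG-style, scaled by 256 and rounded), and the function names are hypothetical. Chroma is taken as signed and centered on zero, as the question states.

```c
#include <stdint.h>

/* Convert an 8.8 fixed-point value to 0..255 with clamping.
 * Negative values are clamped before the shift, so the shift is only
 * ever applied to non-negative numbers. */
static uint8_t fix8_to_u8(int32_t v)
{
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Assumed coefficients (not from the answer), each multiplied by 256:
 *   R = Y + 1.402 Cr              -> 359
 *   G = Y - 0.344 Cb - 0.714 Cr   -> 88, 183
 *   B = Y + 1.772 Cb              -> 454 */
void ycbcr_to_rgb_fixed(uint8_t y, int8_t cb, int8_t cr, uint8_t rgb[3])
{
    /* Luma in 8.8 fixed point, plus 128 (0.5) so the final shift rounds. */
    int32_t luma = ((int32_t)y << 8) + 128;
    rgb[0] = fix8_to_u8(luma + 359 * cr);
    rgb[1] = fix8_to_u8(luma - 88 * cb - 183 * cr);
    rgb[2] = fix8_to_u8(luma + 454 * cb);
}
```

Only integer multiplies, adds, and one shift per channel remain. Eight fractional bits keep every intermediate well inside a 32-bit integer.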
You can also take advantage of the redundancy in the subsampled chroma contributions. Given that the full matrix multiply is of the form:
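The matrix that followed this sentence does not appear in this copy. As a sketch, with $m_{ij}$ as placeholder symbols for the entries of whichever conversion matrix is in use (an assumption, not values given in the answer), the full multiply can be written as:

$$
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix}
m_{11} & m_{12} & m_{13} \\
m_{21} & m_{22} & m_{23} \\
m_{31} & m_{32} & m_{33}
\end{bmatrix}
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix}
$$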
You can compute the chroma partial sum half as often:
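The expression for this step is likewise missing here. Under the assumption that $m_{ij}$ denotes the entry in row $i$, column $j$ of the 3x3 conversion matrix applied to $(Y, C_b, C_r)$, the chroma partial sum uses only the second and third columns and is evaluated once per pair of pixels. The names $R_c$, $G_c$, $B_c$ are introduced here for illustration:

$$
\begin{bmatrix} R_c \\ G_c \\ B_c \end{bmatrix}
=
\begin{bmatrix}
m_{12} & m_{13} \\
m_{22} & m_{23} \\
m_{32} & m_{33}
\end{bmatrix}
\begin{bmatrix} C_b \\ C_r \end{bmatrix}
$$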
and then simply add it to the luma contribution at the full output rate:
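In symbols, if $R_c$ is the chroma partial sum for the red channel and $m_{11}$ the luma coefficient, each output is $R = m_{11} Y + R_c$, and likewise for G and B. The loop below is a hedged sketch of the whole idea in fixed point, not the answer's own code: the function names are hypothetical, the coefficients are assumed full-range BT.601 values scaled by 256 (for which the luma coefficient is 1), and the pixel count is assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an 8.8 fixed-point value to 0..255 with clamping. */
static uint8_t fix8_to_u8(int32_t v)
{
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Convert n_pixels (assumed even) of packed Y Cb Y Cr into RGBA, A = 0.
 * Chroma bytes are read as signed, as described in the question. */
void ycbycr_to_rgba(const uint8_t *src, uint8_t *dst, size_t n_pixels)
{
    for (size_t i = 0; i + 1 < n_pixels; i += 2) {
        int32_t cb = (int8_t)src[1];
        int32_t cr = (int8_t)src[3];

        /* Chroma partial sums: computed once, shared by both pixels.
         * The +128 folds the rounding term in here as well. */
        int32_t rc = 359 * cr + 128;
        int32_t gc = -88 * cb - 183 * cr + 128;
        int32_t bc = 454 * cb + 128;

        for (int k = 0; k < 2; ++k) {
            /* Luma contribution at the full output rate. */
            int32_t y = (int32_t)src[2 * k] << 8;
            dst[0] = fix8_to_u8(y + rc);
            dst[1] = fix8_to_u8(y + gc);
            dst[2] = fix8_to_u8(y + bc);
            dst[3] = 0;                 /* alpha: all bits zero */
            dst += 4;
        }
        src += 4;
    }
}
```

Per pixel pair this costs four multiplies in total, against eight if the chroma terms were recomputed for each pixel.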