
Fast rgb565 to YUV (or even rgb565 to Y)

Posted 2024-08-17 12:31:13


I'm working on a thing where I want an output option that goes to a video overlay. Some overlays support rgb565; if so, sweet, just copy the data across.

If not, I have to copy the data across with a conversion, a whole frame buffer at a time. I'm going to try a few things, but I thought this might be one of those problems that optimisers would be keen to have a go at as a bit of a challenge.

There are a variety of commonly supported YUV formats. The easiest would be a plane of Y followed by either one interleaved UV plane or separate U and V planes.
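For reference, a minimal sketch of how those two layouts usually sit in one buffer. It assumes (the post does not say) 8-bit samples, 4:2:0 subsampling, even width and height, and no row padding; the struct and function names are hypothetical. The separate-planes case matches I420 (YV12 is the same with U and V swapped), and the interleaved case matches NV12.

```c
#include <stddef.h>

/* Byte offsets of each plane in a tightly packed 4:2:0 frame.
 * Assumptions: 8-bit samples, even width/height, stride == width. */
typedef struct {
    size_t y_offset;   /* full-resolution luma plane starts the buffer */
    size_t u_offset;   /* U plane, or the first byte of the interleaved UV plane */
    size_t v_offset;   /* V plane, or the first V byte inside the UV plane */
    size_t total_size; /* bytes needed for the whole frame */
} yuv420_layout;

/* Separate planes: Y (w x h), then U (w/2 x h/2), then V (w/2 x h/2). */
static yuv420_layout layout_planar(size_t w, size_t h)
{
    size_t y_size = w * h;
    size_t c_size = (w / 2) * (h / 2);
    yuv420_layout l = { 0, y_size, y_size + c_size, y_size + 2 * c_size };
    return l;
}

/* Interleaved chroma: Y (w x h), then one plane of U,V,U,V... pairs. */
static yuv420_layout layout_semi_planar(size_t w, size_t h)
{
    size_t y_size = w * h;
    size_t uv_size = (w / 2) * (h / 2) * 2;
    yuv420_layout l = { 0, y_size, y_size + 1, y_size + uv_size };
    return l;
}
```

For a 640x480 frame both layouts need 460,800 bytes: 307,200 of Y plus 153,600 of chroma. Since Y is a plane of its own in either case, an rgb565-to-Y loop can write straight into the start of the buffer.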

I'm using Linux / Xv, but at the level I'm dealing with, it's just bytes and an x86.

I'm going to focus on speed at the cost of quality, but there are potentially hundreds of different paths to try out. There's a balance in there somewhere.

I looked at MMX, but I'm not sure there is anything useful there. Nothing strikes me as particularly suited to the task, and it would take a lot of shuffling to get things into the right places in the registers.
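On the shuffling concern: with the 16-bit lane instructions (MMX `psrlw`, `pand`, `paddw`, `packuswb`, or their SSE2 equivalents), each rgb565 pixel stays in its own lane, so the crude Y described next (2·G + 2·R + the top bit of B, which is what the loops below compute) needs no shuffling at all. A sketch with SSE2 intrinsics, assuming an x86 build with SSE2 enabled (it falls back to plain C otherwise); the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Crude Y for one pixel: ((p >> 4) & 0x7F) is 2*G plus the top bit of B,
 * ((p >> 10) & 0x3E) is 2*R. The sum is at most 189, so it fits in a byte. */
static inline uint8_t crude_y(uint16_t p)
{
    return (uint8_t)(((p >> 4) & 0x7F) + ((p >> 10) & 0x3E));
}

void rgb565_to_y_sse2(const uint16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i mask_g = _mm_set1_epi16(0x7F);
    const __m128i mask_r = _mm_set1_epi16(0x3E);
    for (; i + 16 <= n; i += 16) {
        /* 8 pixels per register, one per 16-bit lane; per-lane shifts
           never mix bits from neighbouring pixels. */
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i + 8));
        __m128i ya = _mm_add_epi16(_mm_and_si128(_mm_srli_epi16(a, 4), mask_g),
                                   _mm_and_si128(_mm_srli_epi16(a, 10), mask_r));
        __m128i yb = _mm_add_epi16(_mm_and_si128(_mm_srli_epi16(b, 4), mask_g),
                                   _mm_and_si128(_mm_srli_epi16(b, 10), mask_r));
        /* Every lane is <= 189, so the unsigned-saturating pack never clips. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(ya, yb));
    }
#endif
    for (; i < n; i++)   /* leftover pixels, or the whole row without SSE2 */
        dst[i] = crude_y(src[i]);
}
```

Each SSE2 iteration handles 16 pixels; with MMX the same instruction sequence works on 4 pixels per register.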

I'm trying a crude version with Y = Green*0.5 + Red*0.25 + Blue*notmuch. U and V are even less of a concern quality-wise; you can get away with murder on those channels.
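Worked through on the 5/6/5-bit fields R, G, B, assuming the 8-bit channels are recovered by a plain left shift and reading "notmuch" as a weight of 1/128 (that reading is an assumption, chosen because it is what the loops below compute):

```latex
\begin{aligned}
p &= 2048R + 32G + B, \qquad 0 \le R, B \le 31,\; 0 \le G \le 63 \\
R_8 &\approx 8R, \quad G_8 \approx 4G, \quad B_8 \approx 8B \\
Y &\approx 0.5\,G_8 + 0.25\,R_8 + \tfrac{1}{128}\,B_8 \\
  &= 2G + 2R + \tfrac{B}{16} \;\approx\; 2G + 2R + \lfloor B/16 \rfloor \\
Y_{\max} &= 2 \cdot 63 + 2 \cdot 31 + 1 = 189
\end{aligned}
```

So the crude Y is two shifted bit fields plus one blue bit, and it never exceeds 189, which is why the sum fits in a byte register. For comparison, ITU-R BT.601 luma is 0.299 R + 0.587 G + 0.114 B.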

Here is a simple loop:

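```asm
; NASM syntax: esi = rgb565 source, edi = Y destination, ecx = pixel count
next_pixel:
    movzx eax, word [esi]   ; eax = RRRRRGGG GGGBBBBB, zero-extended
    add   esi, 2
    shr   eax, 3            ; al = G<<2 | B>>3, ah = R
    shr   al, 1             ; al = 2*G + top bit of B
    add   ah, ah            ; ah = 2*R
    add   al, ah            ; Y = 2*G + 2*R + (B>>4), at most 189
    mov   [edi], al
    add   edi, 1
    dec   ecx
    jnz   next_pixel
```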

Of course, every instruction depends on the one before it, and 16-bit word reads aren't the best, so interleaving two pixels per iteration might gain a bit:

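```asm
; NASM syntax: two pixels per iteration, ecx = pixel count / 2
next_pair:
    mov   eax, [esi]        ; p0 in bits 0-15, p1 in bits 16-31
    add   esi, 4
    mov   ebx, eax
    movzx eax, ax           ; keep only p0 so p1's blue bits can't shift into ah
    shr   eax, 3            ; al = G0<<2 | B0>>3, ah = R0
    shr   ebx, 19           ; bl = G1<<2 | B1>>3, bh = R1
    shr   al, 1             ; halve G so that 2*G + 2*R fits in a byte
    shr   bl, 1
    add   ah, ah            ; 2*R0
    add   bh, bh            ; 2*R1
    add   al, ah            ; Y0
    add   bl, bh            ; Y1
    mov   ah, bl
    mov   [edi], ax         ; store Y0, then Y1
    add   edi, 2
    dec   ecx
    jnz   next_pair
```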

It would be quite easy to extend that to 4 pixels at a time, possibly for some benefit.
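One portable way to do 4 at a time without MMX is to treat a 64-bit integer as four 16-bit lanes (SWAR, "SIMD within a register"). In this sketch the masks take the place of the `movzx`/`shr al` fix-ups: they discard whatever bits slide in from the neighbouring pixel. It assumes a little-endian host such as x86, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four rgb565 pixels per 64-bit word, one per 16-bit lane.
 * Assumes a little-endian host, so src[i] lands in the lowest lane. */
void rgb565_to_y_swar(const uint16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint64_t v;
        memcpy(&v, src + i, sizeof v);   /* alignment-safe 64-bit load */
        /* Per lane: (v >> 4) & 0x7F = 2*G + top bit of B,
                     (v >> 10) & 0x3E = 2*R.
           Each lane sum is <= 189, so nothing carries into the next lane. */
        uint64_t y = ((v >> 4) & 0x007F007F007F007FULL)
                   + ((v >> 10) & 0x003E003E003E003EULL);
        dst[i + 0] = (uint8_t)(y);
        dst[i + 1] = (uint8_t)(y >> 16);
        dst[i + 2] = (uint8_t)(y >> 32);
        dst[i + 3] = (uint8_t)(y >> 48);
    }
    for (; i < n; i++) {   /* leftover 0-3 pixels, one at a time */
        uint16_t p = src[i];
        dst[i] = (uint8_t)(((p >> 4) & 0x7F) + ((p >> 10) & 0x3E));
    }
}
```

The same two masks work on a 32-bit register for two pixels at a time: `((v >> 4) & 0x007F007F) + ((v >> 10) & 0x003E003E)`.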

Can anyone come up with anything faster or better?

An interesting side point is whether a decent compiler can produce similar code.
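A plain C version of the simple loop gives a compiler something to compete with. This sketch (hypothetical function name) computes the same Y = 2*G + 2*R + (B >> 4) as the first asm loop. Comparing against the hand-written code is a matter of compiling with something like `gcc -O3 -S` and reading the output; `restrict` promises that `src` and `dst` don't overlap, which gives the optimiser more freedom, for example to vectorise.

```c
#include <stddef.h>
#include <stdint.h>

/* One pixel in, one Y byte out. */
void rgb565_to_y_scalar(const uint16_t *restrict src, uint8_t *restrict dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned p = src[i];
        unsigned r = p >> 11;          /* 5-bit red   */
        unsigned g = (p >> 5) & 0x3F;  /* 6-bit green */
        unsigned b = p & 0x1F;         /* 5-bit blue  */
        dst[i] = (uint8_t)(2 * g + 2 * r + (b >> 4));   /* at most 189 */
    }
}
```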
