How to make ARGB transparency using bitwise operators

Posted on 2024-10-24 08:32:38

I need to implement transparency (alpha blending) for two pixels:

pixel1: {A, R, G, B} - foreground pixel
pixel2: {A, R, G, B} - background pixel

A, R, G and B are byte values, so each color channel is in the range 0..255.

Currently I compute the blended color as:

newR = pixel2_R * alpha / 255 + pixel1_R * (255 - alpha) / 255
newG = pixel2_G * alpha / 255 + pixel1_G * (255 - alpha) / 255
newB = pixel2_B * alpha / 255 + pixel1_B * (255 - alpha) / 255

but this is too slow.
I need to do it with bitwise operators (AND, OR, XOR, NOT, bit shifts).

I want to do this on Windows Phone 7 with XNA.

--- attached C# code ---

    public static uint GetPixelForOpacity(uint reduceOpacityLevel, uint pixelBackground, uint pixelForeground, uint pixelCanvasAlpha)
    {
        byte surfaceR = (byte)((pixelForeground & 0x00FF0000) >> 16);
        byte surfaceG = (byte)((pixelForeground & 0x0000FF00) >> 8);
        byte surfaceB = (byte)((pixelForeground & 0x000000FF));

        byte sourceR = (byte)((pixelBackground & 0x00FF0000) >> 16);
        byte sourceG = (byte)((pixelBackground & 0x0000FF00) >> 8);
        byte sourceB = (byte)((pixelBackground & 0x000000FF));

        uint newR = sourceR * pixelCanvasAlpha / 256 + surfaceR * (255 - pixelCanvasAlpha) / 256;
        uint newG = sourceG * pixelCanvasAlpha / 256 + surfaceG * (255 - pixelCanvasAlpha) / 256;
        uint newB = sourceB * pixelCanvasAlpha / 256 + surfaceB * (255 - pixelCanvasAlpha) / 256;

        return (uint)255 << 24 | newR << 16 | newG << 8 | newB;
    }

Comments (2)

贪恋 2024-10-31 08:32:38

You can't do an 8 bit alpha blend using only bitwise operations, unless you basically re-invent multiplication with basic ops (8 shift-adds).
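
To illustrate what "re-inventing multiplication" would look like, here is a minimal sketch of an 8-bit shift-add multiply. The function name `mul8_shift_add` is made up for this example. Note that it still needs addition, which is not in the asker's operator list; addition could in turn be built from XOR/AND carry propagation, at even greater cost.

```c
#include <stdint.h>

/* Multiply two 8-bit values using only AND, shifts and adds.
   One conditional add per bit of b, i.e. 8 shift-add steps.
   (Hypothetical helper, for illustration only.) */
static uint16_t mul8_shift_add(uint8_t a, uint8_t b)
{
    uint16_t result = 0;
    uint16_t shifted = a;      /* holds a << i; at most 255 << 8, fits in 16 bits */

    for (int i = 0; i < 8; i++) {
        if (b & 1u)            /* is bit i of the original b set? */
            result += shifted; /* then add a << i to the product */
        shifted <<= 1;
        b >>= 1;
    }
    return result;             /* at most 255 * 255 = 65025, fits in 16 bits */
}
```

On any CPU with a hardware multiplier this is almost certainly slower than a plain `a * b`, which is why the approaches below keep the multiply and only replace the division.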

You can use either of the two methods mentioned in the other answers: use 256 instead of 255, or use a lookup table. Both have issues, but you can mitigate them. Which one wins really depends on the architecture you're running on: the relative speed of multiply, divide, shift, add and memory loads. In any case:

Lookup table: a trivial 256x256 lookup table is 64KB. This will thrash your data cache and end up being very slow. I wouldn't recommend it unless your CPU has an abysmally slow multiplier but does have low-latency RAM. You can improve performance by throwing away some alpha bits, e.g. A>>3, resulting in a 32x256=8KB table, which has a better chance of fitting in cache.
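
A rough sketch of the reduced table, assuming alpha is cut to 5 bits with A>>3. The names `lut5`, `init_lut5` and `blend_channel_lut5` are invented for this example, and rounding to nearest is my choice rather than something the approach requires. The blend follows the formula from the question (pixel2 weighted by alpha, pixel1 by its complement).

```c
#include <stdint.h>

#define ALPHA_BITS   5
#define ALPHA_LEVELS (1 << ALPHA_BITS)       /* 32 alpha levels: 0..31 */
#define ALPHA_MAX    (ALPHA_LEVELS - 1)      /* 31 means "fully weighted" */

static uint8_t lut5[ALPHA_LEVELS * 256];     /* 32 * 256 = 8 KB */

/* Fill the table once at startup: lut5[a][p] = round(p * a / 31). */
static void init_lut5(void)
{
    for (int a = 0; a < ALPHA_LEVELS; a++)
        for (int p = 0; p < 256; p++)
            lut5[a * 256 + p] = (uint8_t)((p * a + ALPHA_MAX / 2) / ALPHA_MAX);
}

/* Blend one channel: pixel2 weighted by alpha, pixel1 by the complement. */
static uint8_t blend_channel_lut5(uint8_t pixel2, uint8_t pixel1, uint8_t alpha)
{
    unsigned a5  = alpha >> 3;               /* drop the low 3 alpha bits */
    unsigned ia5 = ALPHA_MAX - a5;           /* complementary weight */
    return (uint8_t)(lut5[a5 * 256 + pixel2] + lut5[ia5 * 256 + pixel1]);
}
```

The price is that only 32 distinct opacity levels remain, which may show up as banding in smooth alpha gradients.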

Use 256 instead of 255: the idea is that dividing by 256 is just a right shift by 8. The result will be slightly off and tend to round down, darkening the image slightly, e.g. if R=255 and A=255 then (R*A)/256 = 254. You can cheat a little and compute (R*A+R+A)/256, or just (R*A+R)/256 or (R*A+A)/256, each of which gives 255 in that case. Or scale A to 0..256 first, e.g. A = (256*A)/255. That's just one expensive divide-by-255 instead of 6. Then (R*A)/256 = 255.
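
As a sketch of the shift-based variant, here is the function from the question ported to C, with alpha scaled to 0..256 once per pixel. The names `scale_alpha`, `mul_fixup` and `blend_argb` are made up for this example, and the output alpha is forced to 255 as in the question's code.

```c
#include <stdint.h>

/* Scale alpha from 0..255 to 0..256 so that full alpha is exactly 256.
   This is the single remaining divide-by-255. */
static uint32_t scale_alpha(uint8_t alpha)
{
    return (256u * alpha) / 255u;
}

/* The "cheat" variant: approximates r * a / 255 with a shift. */
static uint32_t mul_fixup(uint32_t r, uint32_t a)
{
    return (r * a + r) >> 8;
}

/* Blend two ARGB pixels: background weighted by alpha, foreground by the
   complement, matching the formula in the question. */
static uint32_t blend_argb(uint32_t background, uint32_t foreground, uint8_t alpha)
{
    uint32_t a  = scale_alpha(alpha);        /* 0..256 */
    uint32_t ia = 256u - a;                  /* complementary weight */

    uint32_t r = (((background >> 16) & 0xFFu) * a + ((foreground >> 16) & 0xFFu) * ia) >> 8;
    uint32_t g = (((background >>  8) & 0xFFu) * a + ((foreground >>  8) & 0xFFu) * ia) >> 8;
    uint32_t b = (( background        & 0xFFu) * a + ( foreground        & 0xFFu) * ia) >> 8;

    return 0xFF000000u | (r << 16) | (g << 8) | b;
}
```

Because the two weights always sum to 256, blending two identical pixels returns that pixel unchanged, so the darkening described above does not occur. If alpha is constant for a whole image, `scale_alpha` can be hoisted out of the per-pixel loop.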

手长情犹 2024-10-31 08:32:38

I don't think it can be done with the same precision using only those operators. Your best bet, I reckon, is a LUT (as long as the LUT fits in the CPU cache; otherwise it might even be slower):

// allocate the LUT (64KB)
// aligned(64) is a GCC/Clang extension; the 64-byte cache line size is an assumption
unsigned char lut[256*256] __attribute__((aligned(64)));

// macro to access the LUT
#define LUT(pixel, alpha) (lut[(alpha)*256+(pixel)])

// precompute the LUT
for (int alpha_value=0; alpha_value<256; alpha_value++) {
  for (int pixel_value=0; pixel_value<256; pixel_value++) {
    LUT(pixel_value, alpha_value) = (unsigned char)((double)(pixel_value) * (double)(alpha_value) / 255.0);
  }
}

// in the loop
unsigned char ialpha = 255-alpha;
newR = LUT(pixel2_R, alpha) + LUT(pixel1_R, ialpha);
newG = LUT(pixel2_G, alpha) + LUT(pixel1_G, ialpha);
newB = LUT(pixel2_B, alpha) + LUT(pixel1_B, ialpha);

Otherwise, you should try vectorizing your code. To get help with that, you should at least tell us more about your CPU architecture and compiler. Keep in mind that your compiler may be able to vectorize automatically if given the right options.
