Bit expansion/duplication algorithm?

Posted on 2024-12-29 07:31:11

Is there an efficient (fast) algorithm that will perform bit expansion/duplication?

For example, expand each bit in an 8-bit value by 3 (creating a 24-bit value):

1101 0101 => 11111100 01110001 11000111

The brute-force method that has been proposed is to create a lookup table. In the future, the expansion factor may need to be variable. That is, in the above example we are expanding by 3, but we may need to expand by other values. This would require multiple lookup tables, which I'd like to avoid if possible.

Comments (2)

鲸落 2025-01-05 07:31:11

This can be faster than a lookup table when arithmetic is cheaper than memory access. That may be the case if the calculations are vectorized (PPC AltiVec or Intel SSE) and/or if other parts of the program need every bit of cache memory.

If the expansion factor is 3, only 7 instructions are needed:

out = (((in * 0x101 & 0x0F00F) * 0x11 & 0x0C30C3) * 5 & 0x249249) * 7;

Alternatively, with 10 instructions:

out = (in | in << 8) & 0x0F00F;
out = (out | out << 4) & 0x0C30C3;
out = (out | out << 2) & 0x249249;
out *= 7;
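
To make the two factor-3 forms above easier to try, here is a minimal sketch that wraps each in a function and checks them against a plain bit-by-bit reference. The function names (`expand3_mul`, `expand3_or`, `expand3_ref`, `expand3_selfcheck`) and the use of `uint32_t` are assumptions for illustration, not part of the original answer.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply form from the answer: 4 multiplies and 3 ANDs. */
uint32_t expand3_mul(uint8_t in)
{
    return ((((uint32_t)in * 0x101 & 0x0F00F) * 0x11 & 0x0C30C3) * 5 & 0x249249) * 7;
}

/* Shift-and-OR form from the answer. */
uint32_t expand3_or(uint8_t in)
{
    uint32_t out = in;
    out = (out | out << 8) & 0x0F00F;   /* move bits 4..7 to 12..15 */
    out = (out | out << 4) & 0x0C30C3;  /* split each nibble into bit pairs */
    out = (out | out << 2) & 0x249249;  /* input bit i now sits at 3*i */
    return out * 7;                     /* smear each bit into a run of 3 */
}

/* Bit-by-bit reference, used only for checking (hypothetical helper). */
uint32_t expand3_ref(uint8_t in)
{
    uint32_t out = 0;
    for (unsigned bit = 0; bit < 8; bit++)
        if (in & (1u << bit))
            out |= 7u << (3 * bit);
    return out;
}

/* Asserts that both forms agree with the reference for all 256 inputs. */
void expand3_selfcheck(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint32_t want = expand3_ref((uint8_t)v);
        assert(expand3_mul((uint8_t)v) == want);
        assert(expand3_or((uint8_t)v) == want);
    }
}
```

For the input from the question, 1101 0101 (0xD5), both forms give 0xFC71C7, which is 11111100 01110001 11000111.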

For a general expansion factor N >= 3 (the 8 * N result bits must fit in unsigned, so N <= 4 when unsigned is 32 bits wide):

unsigned mask = 0x0FF;
unsigned out = in;
for (unsigned scale = 4; scale != 0; scale /= 2)
{
  unsigned shift = scale * (N - 1);
  mask &= ~(mask << scale);
  mask |= mask << (scale * N);
  out = out * ((1 << shift) + 1) & mask;
}
out *= (1 << N) - 1;

Alternatively, for expansion factors N >= 2 (OR instead of multiply, so overlapping copies cannot produce carries):

unsigned mask = 0x0FF;
unsigned out = in;
for (unsigned scale = 4; scale != 0; scale /= 2)
{
  unsigned shift = scale * (N - 1);
  mask &= ~(mask << scale);
  mask |= mask << (scale * N);
  out = (out | out << shift) & mask;
}
out *= (1 << N) - 1;

The shift and mask values depend only on N, so it is best to compute them once before processing the bit stream.
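
One way to do that precomputation is sketched below. It runs the answer's mask/shift loop once per factor and stores the three per-stage constants, so the per-byte work is only the shift, OR, AND and final multiply. The `expand_plan` struct and both function names are hypothetical, and the sketch assumes a 64-bit accumulator, which limits N to 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the per-stage constants; not from the answer. */
typedef struct {
    unsigned shift[3];   /* shift for scale = 4, 2, 1 */
    uint64_t mask[3];    /* mask applied after each stage */
    uint64_t fill;       /* (1 << n) - 1, smears each bit into a run of n */
} expand_plan;

/* Run the answer's mask/shift loop once for a given factor n (2..8). */
void expand_plan_init(expand_plan *p, unsigned n)
{
    assert(n >= 2 && n <= 8);   /* 8 * n bits must fit in 64 */
    uint64_t mask = 0xFF;
    unsigned stage = 0;
    for (unsigned scale = 4; scale != 0; scale /= 2, stage++) {
        mask &= ~(mask << scale);
        mask |= mask << (scale * n);
        p->shift[stage] = scale * (n - 1);
        p->mask[stage] = mask;
    }
    p->fill = ((uint64_t)1 << n) - 1;
}

/* Expand one byte using only the precomputed constants (shift-and-OR form). */
uint64_t expand_byte(const expand_plan *p, uint8_t in)
{
    uint64_t out = in;
    for (unsigned stage = 0; stage < 3; stage++)
        out = (out | out << p->shift[stage]) & p->mask[stage];
    return out * p->fill;
}
```

For N = 3 the plan comes out as shifts 8, 4, 2 with masks 0x00F00F, 0x0C30C3, 0x249249, which matches the hard-coded factor-3 constants. A caller would call `expand_plan_init` once when the factor changes and `expand_byte` for each byte of the stream.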

披肩女神 2025-01-05 07:31:11

You can do it one input bit at a time. Of course, it will be slower than a lookup table, but if you're doing something like writing for a tiny 8-bit microcontroller without enough room for a table, it should have the smallest possible ROM footprint.
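
The answer gives no code for this, so the following is only a sketch of what a bit-at-a-time loop might look like. The function name `expand_bitwise` and the 32-bit result (which limits the factor to 4) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Expand each bit of `in` into n copies, most significant bit first.
   Assumes n is 1..4 so the result fits in 32 bits. */
uint32_t expand_bitwise(uint8_t in, uint8_t n)
{
    assert(n >= 1 && n <= 4);
    uint32_t out = 0;
    for (uint8_t bit = 0; bit < 8; bit++) {
        uint8_t msb = (in & 0x80) ? 1 : 0;   /* take the top input bit */
        for (uint8_t rep = 0; rep < n; rep++)
            out = (out << 1) | msb;          /* append it n times */
        in <<= 1;                            /* move to the next input bit */
    }
    return out;
}
```

The loop needs no tables or precomputed masks, and the factor is an ordinary run-time parameter, at the cost of 8 * n shift-and-OR steps per input byte.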
