Fastest way to convert the 8 bits of an unsigned char into actual numbers

Posted on 2024-10-02 14:34:34

I am using an unsigned char to store 8 flags. Each flag represents one corner of a cube, so 00000001 is corner 1, 01000100 is corners 3 and 7, and so on. My current solution is to & the value with 1, 2, 4, 8, 16, 32, 64 and 128, check whether the result is non-zero, and store the corner, i.e. if (result & 1) corners.push_back(1);. Is there any chance I can get rid of that 'if' statement? I was hoping to replace it with bitwise operators, but I could not think of a way.

A little background on why I want to get rid of the if statement. This cube is actually a voxel that is part of a grid at least 512x512x512 in size, which is more than 134 million voxels. I am performing calculations on each of the voxels (well, not exactly, but I won't go into detail as it is irrelevant here), and that is a lot of calculations. I need to perform them every frame, so any speed-up, however minuscule per function call, helps at this volume. To give you an idea, my algorithm (at some point) needed to determine whether a float was negative, positive or zero (within some error). I had if statements with greater-than/less-than checks in there. I replaced them with a fast float-to-int function and shaved off a quarter of a second. Currently, each frame on a 128x128x128 grid takes a little more than 4 seconds.

Comments (5)

十级心震 2024-10-09 14:34:34

I would consider a different approach to it entirely: there are only 256 possibilities for different combinations of flags. Precalculate 256 vectors and index into them as needed.

// Built once: corners[flags] lists the bit values that are set in flags.
std::vector<std::vector<int> > corners(256);
for (int i = 0; i < 256; ++i) {
    std::vector<int>& v = corners[i];
    if (i & 1) v.push_back(1);
    if (i & 2) v.push_back(2);
    if (i & 4) v.push_back(4);
    if (i & 8) v.push_back(8);
    if (i & 16) v.push_back(16);
    if (i & 32) v.push_back(32);
    if (i & 64) v.push_back(64);
    if (i & 128) v.push_back(128);
}

// Per voxel: a single table lookup, no conditionals and no allocation.
for (int i = 0; i < NumVoxels(); ++i) {
    unsigned char flags = GetFlags(i);
    const std::vector<int>& v = corners[flags];

    ... // do whatever with v
}

This avoids all the per-voxel conditionals, and it also avoids having push_back call new, which I suspect is the more expensive part anyway.
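
A variation on the same idea, offered only as a sketch: since a byte has at most 8 set bits, each table entry can be a fixed-size array plus a count, so the whole table lives in one contiguous block with no heap allocation at all. The names CornerList, BuildCornerTable and CornersFor are made up for this example, and it assumes corners are numbered 1 to 8 (bit index plus one), as in the question.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper type: the corner numbers for one flags value.
struct CornerList {
    std::array<std::uint8_t, 8> corner{};  // only the first `count` entries are valid
    std::uint8_t count = 0;
};

// Builds all 256 entries. The conditionals run here, once, not per voxel.
inline std::array<CornerList, 256> BuildCornerTable() {
    std::array<CornerList, 256> table{};
    for (int flags = 0; flags < 256; ++flags) {
        CornerList& entry = table[flags];
        for (int bit = 0; bit < 8; ++bit) {
            if (flags & (1 << bit)) {
                // Assumption: corner number = bit index + 1.
                entry.corner[entry.count++] = static_cast<std::uint8_t>(bit + 1);
            }
        }
    }
    return table;
}

// Per-voxel lookup: one array index, no allocation.
inline const CornerList& CornersFor(unsigned char flags) {
    static const std::array<CornerList, 256> table = BuildCornerTable();
    return table[flags];
}
```

With this, CornersFor(0x44) yields a list with count 2 holding corners 3 and 7. Whether it beats the vector-of-vectors table in practice would need to be measured on the real workload.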

如若梦似彩虹 2024-10-09 14:34:34

If there's some operation that needs to be done if the bit is set and not if it's not, it seems you'll have to have a conditional of some kind somewhere. If it could be expressed as a calculation somehow, you could get around it like this, for example:

numCorners = ((result >> 0) & 1) + ((result >> 1) & 1) + ((result >> 2) & 1) + ...
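
One way the "express it as a calculation" idea might be extended to actually storing the corners is sketched below: write the corner number unconditionally and advance the output index by the bit value (0 or 1), so the store is kept only when the bit is set. CountCorners and ExtractCorners are hypothetical names, the output is a caller-provided 8-entry buffer rather than a vector, and corners are assumed to be numbered 1 to 8.

```cpp
#include <cstdint>

// Number of set bits, written as a loop over the sum from the answer above.
inline int CountCorners(unsigned char flags) {
    int n = 0;
    for (int bit = 0; bit < 8; ++bit) {
        n += (flags >> bit) & 1;
    }
    return n;
}

// Writes the corner numbers of all set bits to out[0..n-1] and returns n.
// `out` must have room for 8 entries; entries at index n and above are
// unspecified scratch values.
inline int ExtractCorners(unsigned char flags, std::uint8_t* out) {
    int n = 0;
    for (int bit = 0; bit < 8; ++bit) {
        out[n] = static_cast<std::uint8_t>(bit + 1);  // always store
        n += (flags >> bit) & 1;                       // keep the store only if the bit is set
    }
    return n;
}
```

The loop has a fixed trip count, so a compiler can unroll it, and the body contains no data-dependent branch. Whether this is faster than the plain if version depends on the compiler and on how predictable the flags are, so it is worth profiling both.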

银河中√捞星星 2024-10-09 14:34:34

Hacker's Delight, first page:

x & (-x) // isolates the lowest set bit
x & (x - 1) // clears the lowest set bit

Inlining your push_back method would also help (better still, create a function that receives all the flags together).

Usually if you need performance, you should design the whole system with that in mind. Maybe if you post more code it will be easier to help.

EDIT: here is a nice idea:

unsigned char LOG2_LUT[256] = {...};        // maps a one-bit value (1 << n) to n
int t;
switch (count_set_bits(flags)) {
    // Every case deliberately falls through to the next one, so exactly
    // count_set_bits(flags) bits are extracted without a loop.
    case 8:     t = flags;
                flags &= (flags - 1);       // clearing a bit that was set
                t ^= flags;                 // getting the changed bit
                corners.push_back(LOG2_LUT[t]);
    case 7:     t = flags;
                flags &= (flags - 1);
                t ^= flags;
                corners.push_back(LOG2_LUT[t]);
    case 6:     t = flags;
                flags &= (flags - 1);
                t ^= flags;
                corners.push_back(LOG2_LUT[t]);
    // etc...
}

count_set_bits() is a well-known function: http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
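
For comparison, the two identities can also be combined in a plain loop that runs once per set bit. This is a minimal sketch, not the switch-based version above: BitIndex and AppendCorners are hypothetical names, BitIndex uses a small shift loop as a stand-in for the elided LOG2_LUT, and corners are assumed to be numbered 1 to 8.

```cpp
#include <vector>

// Index (0..7) of the single set bit in `lowest`.
// Stand-in for a lookup table such as LOG2_LUT, whose contents are not shown.
inline int BitIndex(unsigned lowest) {
    int index = 0;
    while (lowest >>= 1) {
        ++index;
    }
    return index;
}

// Appends the corner number of every set bit in `flags`, lowest bit first.
inline void AppendCorners(unsigned char flags, std::vector<int>& corners) {
    unsigned x = flags;
    while (x != 0) {
        unsigned lowest = x & (0u - x);            // isolates the lowest set bit
        corners.push_back(BitIndex(lowest) + 1);   // assumption: corner = bit index + 1
        x &= x - 1;                                // clears the lowest set bit
    }
}
```

The loop still branches on x != 0, but it iterates only as many times as there are set bits, which may help when most flags are sparse.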

往事风中埋 2024-10-09 14:34:34

There is a way, it's not "pretty", but it works.

// push_back returns void, so the comma operator supplies a value for &&.
(result & 1)   && (corners.push_back(1), true);
(result & 2)   && (corners.push_back(2), true);
(result & 4)   && (corners.push_back(3), true);
(result & 8)   && (corners.push_back(4), true);
(result & 16)  && (corners.push_back(5), true);
(result & 32)  && (corners.push_back(6), true);
(result & 64)  && (corners.push_back(7), true);
(result & 128) && (corners.push_back(8), true);

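It uses a lesser-known feature of the C++ language: short-circuit evaluation of &&. Note that the right-hand side is still executed conditionally, so this removes the if keyword rather than the conditional itself.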

孤独患者 2024-10-09 14:34:34

I've noticed a similar algorithm in the OpenTTD code. It turned out to be utterly useless: you're better off not breaking down numbers like that. Instead, replace the iteration over the vector<> you have now with an iteration over the bits of the byte. This is far more cache-friendly.

I.e.

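#include <climits>  // UCHAR_MAX

unsigned char flags = Foo(); // the value you didn't put in a vector<>
// c starts at the highest bit (0x80 for an 8-bit char) and walks down to bit 0.
for (unsigned char c = (UCHAR_MAX >> 1) + 1; c != 0; c >>= 1)
{
  if (flags & c)
    Bar(flags & c);
}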