How to efficiently de-interleave bits (inverse Morton)
This question: How to de-interleave bits (UnMortonizing?) has a good answer for extracting one of the two halves of a Morton number (just the odd bits), but I need a solution which extracts both parts (the odd bits and the even bits) in as few operations as possible.
For my use I would need to take a 32 bit int and extract two 16 bit ints, where one is the even bits and the other is the odd bits shifted right by 1 bit, e.g.
input, z: 11101101 01010111 11011011 01101110
output, x: 11100001 10110111 // odd bits shifted right by 1
y: 10111111 11011010 // even bits
There seem to be plenty of solutions using shifts and masks with magic numbers for generating Morton numbers (i.e. interleaving bits), e.g. Interleave bits by Binary Magic Numbers, but I haven't yet found anything for doing the reverse (i.e. de-interleaving).
UPDATE
After re-reading the section from Hacker's Delight on perfect shuffles/unshuffles I found some useful examples which I adapted as follows:
// morton1 - extract even bits
uint32_t morton1(uint32_t x)
{
    x = x & 0x55555555;
    x = (x | (x >> 1)) & 0x33333333;
    x = (x | (x >> 2)) & 0x0F0F0F0F;
    x = (x | (x >> 4)) & 0x00FF00FF;
    x = (x | (x >> 8)) & 0x0000FFFF;
    return x;
}

// morton2 - extract odd and even bits
void morton2(uint32_t *x, uint32_t *y, uint32_t z)
{
    *x = morton1(z >> 1); // odd bits shifted right by 1, as in the example above
    *y = morton1(z);      // even bits
}
I think this can still be improved on, both in its current scalar form and also by taking advantage of SIMD, so I'm still interested in better solutions (either scalar or SIMD).
Comments (6)
If your processor handles 64 bit ints efficiently, you could combine the operations...
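The code for this comment did not survive, so here is a minimal sketch of how the combination could look. The idea is to park the even bits in the low half of a 64-bit word and the odd bits in the high half, then run the morton1 shift-and-mask sequence once over both halves. The function name morton2_64 is made up for illustration, and the x/y assignment follows the example in the question (x = odd bits, y = even bits).

```cpp
#include <cstdint>

// Hypothetical name; de-interleaves both halves with one pass over a 64-bit word.
inline void morton2_64(uint32_t *x, uint32_t *y, uint32_t z)
{
    // Even bits stay in bits 0..31; odd bits land on even positions of bits 32..63.
    uint64_t w = (uint64_t)(z & 0x55555555u)
               | ((uint64_t)(z & 0xAAAAAAAAu) << 31);

    // Same steps as morton1, with each mask repeated in both 32-bit halves.
    w = (w | (w >> 1)) & 0x3333333333333333ULL;
    w = (w | (w >> 2)) & 0x0F0F0F0F0F0F0F0FULL;
    w = (w | (w >> 4)) & 0x00FF00FF00FF00FFULL;
    w = (w | (w >> 8)) & 0x0000FFFF0000FFFFULL;

    *x = (uint32_t)(w >> 32);    // odd bits, compacted
    *y = (uint32_t)(w & 0xFFFFu); // even bits, compacted
}
```

Bits shifted down from the high half never fall inside the positions the low-half mask keeps at any step, so the two halves do not contaminate each other. Whether this beats two calls to morton1 depends on how cheap 64-bit operations are on the target.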
For Intel Haswell and later CPUs, you can use the BMI2 instruction set, which contains the pext and pdep instructions. These can (among other great things) be used to build your functions.
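The original code is missing here, so the following is only a sketch of how pext could be used. It assumes GCC or Clang, where the `__BMI2__` macro is defined when compiling with -mbmi2 (or a suitable -march), and it uses the `_pext_u32` intrinsic from `<immintrin.h>`. The wrapper names pext32 and morton2_pext are hypothetical, and the loop is a plain software stand-in so the sketch still builds when BMI2 is not enabled.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// Gathers the bits of `value` selected by `mask` into the low bits of the result.
inline uint32_t pext32(uint32_t value, uint32_t mask)
{
#if defined(__BMI2__)
    return _pext_u32(value, mask); // single instruction on BMI2 hardware
#else
    // Portable fallback: walk the set bits of the mask from lowest to highest.
    uint32_t result = 0;
    for (uint32_t out = 1; mask != 0; out <<= 1) {
        const uint32_t lowest = mask & (0u - mask); // isolate lowest set bit
        if (value & lowest) result |= out;
        mask &= mask - 1;                           // clear that bit
    }
    return result;
#endif
}

// x = odd bits, y = even bits, matching the example in the question.
inline void morton2_pext(uint32_t *x, uint32_t *y, uint32_t z)
{
    *x = pext32(z, 0xAAAAAAAAu);
    *y = pext32(z, 0x55555555u);
}
```

With BMI2 enabled, each half costs a single pext, so no separate shift by 1 is needed for the odd bits.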
In case someone is using Morton codes in 3D, and so needs to read one bit out of every three, here is the 64-bit function I used:
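The function itself was lost from this comment. A sketch of the usual shift-and-mask form for 3D is given below; the names compact3 and decode3 are hypothetical, and the masks are my own working-out of the "one bit every 3" pattern rather than a copy of the commenter's code.

```cpp
#include <cstdint>

// Keeps bits 0, 3, 6, ..., 63 of x and packs them into the low 22 bits.
inline uint64_t compact3(uint64_t x)
{
    x &= 0x9249249249249249ULL;                   // 1 bit kept out of every 3
    x = (x | (x >> 2))  & 0x30C30C30C30C30C3ULL;  // runs of 2 bits
    x = (x | (x >> 4))  & 0xF00F00F00F00F00FULL;  // runs of 4 bits
    x = (x | (x >> 8))  & 0x00FF0000FF0000FFULL;  // runs of 8 bits
    x = (x | (x >> 16)) & 0xFFFF00000000FFFFULL;  // runs of 16 bits
    x = (x | (x >> 32)) & 0x00000000FFFFFFFFULL;  // final contiguous run
    return x;
}

// Splits a 3D Morton code into its three coordinates (x in bits 0, 3, 6, ...).
inline void decode3(uint64_t code, uint32_t *x, uint32_t *y, uint32_t *z)
{
    *x = (uint32_t)compact3(code);
    *y = (uint32_t)compact3(code >> 1);
    *z = (uint32_t)compact3(code >> 2);
}
```

Each step doubles the run length, and the shift is twice the previous run length because two unused bits sit between every kept bit.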
You can extract 8 interleaved bits by multiplying like so:
It should be trivial to combine them for 32 bits or larger.
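The multiplication code did not survive extraction. The sketch below shows one way the trick can work for 8 bits taken from a 16-bit value; the constants are a set I worked out for this illustration and may differ from the ones the commenter used, and the names even_bits8 and odd_bits8 are hypothetical.

```cpp
#include <cstdint>

// Packs bits 0, 2, 4, ..., 14 of x into one byte using two multiplications.
inline uint8_t even_bits8(uint16_t x)
{
    // Four non-overlapping copies of (v | v << 1), at shifts 0, 18, 36 and 54.
    const uint64_t spread = (uint64_t)(x & 0x5555u) * 0x00C00030000C0003ULL;

    // Pick each source bit from the copy where it sits at "its" position modulo 8.
    const uint64_t picked = spread & 0x0600180060008001ULL;

    // Summing all bytes into the top byte ORs the picked bits together.
    return (uint8_t)((picked * 0x0101010101010101ULL) >> 56);
}

// The odd bits are the even bits of the value shifted right by one.
inline uint8_t odd_bits8(uint16_t x)
{
    return even_bits8((uint16_t)(x >> 1));
}
```

Because no two terms of either product ever land on the same bit, the multiplications behave like a series of shifts ORed together, with no carries. For a 32-bit input, one option is to call these on each 16-bit half and join the resulting bytes.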
If you need speed, you can use a table lookup to convert one byte at a time (a two-byte table is faster but too big). The procedure was written in the Delphi IDE, but the assembler/algorithm is the same.
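The Delphi listing is not available, so here is a rough C++ sketch of the one-byte table approach instead. It assumes a 256-entry table that maps a byte to its four even bits packed into a nibble; the names make_even_table and morton2_table are made up for this example.

```cpp
#include <array>
#include <cstdint>

// Entry b holds bits 0, 2, 4 and 6 of b, packed into the low nibble.
inline std::array<uint8_t, 256> make_even_table()
{
    std::array<uint8_t, 256> table{};
    for (unsigned b = 0; b < 256; ++b) {
        table[b] = (uint8_t)((b & 1u) | ((b >> 1) & 2u) |
                             ((b >> 2) & 4u) | ((b >> 3) & 8u));
    }
    return table;
}

// x = odd bits, y = even bits, matching the example in the question.
inline void morton2_table(uint32_t *x, uint32_t *y, uint32_t z)
{
    static const std::array<uint8_t, 256> even = make_even_table();
    uint32_t odd_out = 0, even_out = 0;
    for (int k = 0; k < 4; ++k) {
        const uint32_t byte = (z >> (8 * k)) & 0xFFu;
        even_out |= (uint32_t)even[byte] << (4 * k);      // even bits of this byte
        odd_out  |= (uint32_t)even[byte >> 1] << (4 * k); // odd bits, via shift by 1
    }
    *x = odd_out;
    *y = even_out;
}
```

A single table serves both halves because shifting a byte right by one moves its odd bits onto even positions. A 65536-entry table would halve the number of lookups at the cost of cache footprint, which is the trade-off the comment mentions.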
I didn't want to be limited to a fixed-size integer or to write out lists of similar commands with hardcoded constants, so I developed a C++11 solution which makes use of template metaprogramming to generate the functions and the constants. The assembly code generated with -O3 seems as tight as it can get without using BMI.
TL;DR: source repo and live demo.
Implementation
Basically, every step in the morton1 function works by shifting, OR-ing, and masking with one of a sequence of constants which look like this:
0b0101010101010101 (alternate 1 and 0)
0b0011001100110011 (alternate 2x 1 and 0)
0b0000111100001111 (alternate 4x 1 and 0)
0b0000000011111111 (alternate 8x 1 and 0)
If we were to use D dimensions, we would have a pattern with D-1 zeros and 1 one. So to generate these it's enough to generate consecutive ones and apply some bitwise or. The constants can then be generated at compile time for arbitrary dimensions.
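The snippets for this step are missing, so the following is a sketch of one way to build such constants at compile time. It uses C++11 constexpr recursion rather than the author's actual templates, and the names low_ones and run_mask are hypothetical. A mask is described by a run length: that many ones, followed by run * (dims - 1) zeros, repeated across the type.

```cpp
#include <cstdint>

// `run` consecutive ones in the low bits (all ones if run covers the whole type).
template <typename T>
constexpr T low_ones(unsigned run)
{
    return run >= sizeof(T) * 8 ? T(~T(0)) : T((T(1) << run) - 1);
}

// OR together copies of that run, one every run * dims bits.
template <typename T>
constexpr T run_mask(unsigned dims, unsigned run, unsigned offset = 0)
{
    return offset >= sizeof(T) * 8
         ? T(0)
         : T((low_ones<T>(run) << offset) |
             run_mask<T>(dims, run, offset + run * dims));
}

// The 2D masks from morton1, and the first 3D mask, checked at compile time.
static_assert(run_mask<uint32_t>(2, 1) == 0x55555555u, "alternate 1 and 0");
static_assert(run_mask<uint32_t>(2, 4) == 0x0F0F0F0Fu, "alternate 4x 1 and 0");
static_assert(run_mask<uint64_t>(3, 1) == 0x9249249249249249ULL, "1 bit in 3");
```

For two dimensions this reproduces the constants listed above; for three it gives the "one bit every 3" family.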
With the same type of recursion, we can generate functions for each of the steps of the algorithm, x = (x | (x >> K)) & M.
It remains to answer the question "how many steps do we need?". This also depends on the number of dimensions. In general, k steps compute 2^(k-1) output bits; the maximum number of meaningful bits for each dimension is given by z = sizeof(T) * 8 / dimensions, therefore it is enough to take 1 + log_2 z steps. The problem is now that we need this as a constexpr in order to use it as a template parameter. The best way I found to work around this is to define log2 via metaprogramming.
And finally, the whole extraction can be performed with one single call.
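Since none of the original snippets survived, here is a compact sketch of how the pieces described above might fit together. Every name in it (low_ones, run_mask, log2_ceil, compact_steps, deinterleave) is hypothetical and not taken from the author's repository. One deliberate assumption: the logarithm is rounded up, so that widths which are not powers of two (for example 21 bits per axis in 64-bit 3D codes) are still fully covered.

```cpp
#include <cstdint>

template <typename T>
constexpr T low_ones(unsigned run)
{
    return run >= sizeof(T) * 8 ? T(~T(0)) : T((T(1) << run) - 1);
}

// `run` ones followed by run * (dims - 1) zeros, repeated across T.
template <typename T>
constexpr T run_mask(unsigned dims, unsigned run, unsigned offset = 0)
{
    return offset >= sizeof(T) * 8
         ? T(0)
         : T((low_ones<T>(run) << offset) |
             run_mask<T>(dims, run, offset + run * dims));
}

// ceil(log2(N)) as a template, so it can feed another template parameter.
template <unsigned N>
struct log2_ceil { static constexpr unsigned value = 1 + log2_ceil<(N + 1) / 2>::value; };
template <>
struct log2_ceil<1> { static constexpr unsigned value = 0; };

// Step s applies x = (x | (x >> K)) & M with K = 2^(s-1) * (Dims-1), M = runs of 2^s.
template <typename T, unsigned Dims, unsigned Step, unsigned Steps>
struct compact_steps {
    static T apply(T x)
    {
        constexpr unsigned shift = (1u << (Step - 1)) * (Dims - 1);
        constexpr T mask = run_mask<T>(Dims, 1u << Step);
        return compact_steps<T, Dims, Step + 1, Steps>::apply(T((x | (x >> shift)) & mask));
    }
};
template <typename T, unsigned Dims, unsigned Steps>
struct compact_steps<T, Dims, Steps, Steps> {
    static T apply(T x) { return x; } // recursion ends here
};

// Extracts coordinate `dim` (bits dim, dim + Dims, dim + 2*Dims, ...) from code z.
template <unsigned Dims, typename T>
T deinterleave(T z, unsigned dim)
{
    constexpr unsigned bits = sizeof(T) * 8 / Dims;
    constexpr unsigned steps = 1 + log2_ceil<bits>::value;
    constexpr T first = run_mask<T>(Dims, 1);
    return compact_steps<T, Dims, 1, steps>::apply(T((z >> dim) & first));
}
```

With the value from the question, deinterleave<2>(uint32_t(0xED57DB6E), 1) gives the odd bits (x in the example) and deinterleave<2>(uint32_t(0xED57DB6E), 0) gives the even bits (y). For uint32_t and two dimensions the recursion unrolls to the same five mask operations as morton1.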