“2^n - 1”的类似 De Bruijn 的序列:它是如何构造的?
我正在查看条目 Find the log base 2 of an N-bit integer in O(lg(N)) 乘法和查找运算 来自 位玩弄黑客。
我可以很容易地看到该条目中的第二个算法是如何工作的
static const int MultiplyDeBruijnBitPosition2[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];
,它计算 n = log2 v
,其中 v
已知是 2 的幂。在本例中 0x077CB531
是一个普通的 De Bruijn 序列,其余的都是显而易见的。
然而,该条目中的第一个算法
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
对我来说看起来有点棘手。我们首先将 v
捕捉到最接近的较大 2^n - 1
值。然后将此 2^n - 1
值乘以 0x07C4ACDD
,在这种情况下,其作用与先前算法中的 DeBruijn 序列的作用相同。
我的问题是:我们如何构建这个神奇的 0x07C4ACDD
序列?即,我们如何构造一个序列,该序列可用于在乘以 2^n - 1
值时生成唯一索引?对于2^n
乘数来说,它只是一个普通的De Bruijn序列,正如我们在上面看到的,所以很清楚0x077CB531
来自哪里。但是 2^n - 1
乘数 0x07C4ACDD
又如何呢?我觉得我在这里遗漏了一些明显的东西。
PS为了澄清我的问题:我并不是真的在寻找一种算法来生成这些序列。我对一些或多或少的琐碎属性(如果存在的话)更感兴趣,这些属性使 0x07C4ACDD
按照我们希望的方式工作。对于0x077CB531
,使其工作的属性非常明显:它包含以 1 位步进的序列“存储”的所有 5 位组合(这基本上就是 De Bruijn 序列)。
另一方面,0x07C4ACDD
本身并不是 De Bruijn 序列。那么,他们在构造 0x07C4ACDD
时的目标是什么(除了非建设性的“它应该使上述算法起作用”)?确实有人以某种方式想出了上述算法。所以他们可能知道这种方法是可行的,并且存在适当的顺序。他们是怎么知道的?
例如,如果我要为任意 v
构建算法,我会
v |= v >> 1;
v |= v >> 2;
...
先这样做。然后我只需执行 ++v
将 v
转换为 2 的幂(假设它不会溢出)。然后我会应用第一个算法。最后我会执行 --r
以获得最终答案。然而,这些人设法对其进行了优化:他们只需更改乘数并重新排列表格即可消除前导 ++v
和尾随 --r
步骤。他们怎么知道这是可能的?这种优化背后的数学原理是什么?
I'm looking at the entry Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup from Bit Twiddling hacks.
I can easily see how the second algorithm in that entry works
static const int MultiplyDeBruijnBitPosition2[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];
which calculates n = log2 v
where v
is known to be a power of 2. In this case 0x077CB531
is an ordinary De Bruijn sequence, and the rest is obvious.
However, the first algorithm in that entry
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
looks a bit more tricky to me. We begin by snapping v
to a nearest greater 2^n - 1
value. This 2^n - 1
value is then multiplied by 0x07C4ACDD
, which in this case acts the same way as the DeBruijn sequence in the previous algorithm did.
My question is: how do we construct this magic 0x07C4ACDD
sequence? I.e. how do we construct a sequence that can be used to generate unique indices when multiplied by a 2^n - 1
value? For 2^n
multiplier it is just an ordinary De Bruijn sequence, as we can see above, so it is clear where 0x077CB531
came from. But what about 2^n - 1
multiplier 0x07C4ACDD
? I feel like I'm missing something obvious here.
P.S. To clarify my question: I'm not really looking for an algorithm to generate these sequences. I'm more interested in some more-or-less trivial property (if one exists) that makes 0x07C4ACDD
work as we want it to work. For 0x077CB531
the property that makes it work is pretty obvious: it contains all 5-bit combinations "stored" in the sequence with 1-bit stepping (which is basically what De Bruijn sequence is).
The 0x07C4ACDD
, on the other hand, is not a De Bruijn sequence by itself. So, what property were they aiming for when constructing 0x07C4ACDD
(besides the non-constructive "it should make the above algorithm work")? Someone did come up with the above algorithm somehow. So they probably knew that the approach is viable, and that the appropriate sequence exists. How did they know that?
For example, if I were to construct the algorithm for an arbitrary v
, I'd do
v |= v >> 1;
v |= v >> 2;
...
first. Then I'd just do ++v
to turn v
into a power of 2 (let's assume it doesn't overflow). Then I'd apply the first algorithm. And finally I'd do --r
to obtain the final answer. However, these people managed to optimize it: they eliminated the leading ++v
and the trailing --r
steps simply by changing the multiplier and rearranging the table. How did they know it was possible? What is the math behind this optimization?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
k 个符号上的 n 阶 De Bruijn 序列(长度为 k^n)具有这样的属性:每个可能的 n 长度单词都在其中显示为连续字符,其中一些具有循环换行。例如,在k=2、n=2的情况下,可能的单词是00、01、10、11,De Bruijn序列是0011。00、01、11出现在其中,10有换行。此属性自然意味着左移 De Bruijn 序列(乘以 2 的幂)并取其高 n 位会为 2 的每个幂乘数产生一个唯一的数字。那么你只需要一张查找表就能确定是哪一个。它的工作原理与 1 小于 2 的幂的数字类似,但这种情况下的神奇数字不是 De Bruijn 序列,而是一个类似物。定义属性简单地更改为“每个可能的 n 长度单词都显示为长度为 n, mod 2^n 的前 m 个子序列的总和”。该属性是算法工作所需的全部。他们只是使用这种不同类别的幻数来加速算法。我也这么做了。
构造 De Bruijn 数的一种可能方法是生成 De Bruijn 图的哈密顿路径,维基百科提供了此类图的示例。在这种情况下,节点是 2^5=32 位整数,有向边是它们之间的转换,其中转换是左移和根据边的标签 0 或 1 进行的二进制或运算。可能有直接类似于 2^n-1 类型的幻数,这可能值得探索,但这不是人们通常构建此类算法的方式。
在实践中,您可能会尝试以不同的方式构建它,特别是如果您希望它以稍微不同的方式运行。例如,在 bit twiddling hacks 页面上实现前导/尾随零数算法只能返回 [0..31] 中的值。它需要额外检查 0 的情况,它有 32 个零。此检查需要分支,并且在某些 CPU 上可能太慢。
我这样做的方式是,我使用 64 个元素的查找表而不是 32 个元素,生成随机幻数,并为每个元素建立一个具有两个输入幂的查找表,检查其正确性(内射性),然后验证它的所有 32 位数字。我继续下去,直到遇到一个正确的幻数。结果数字不满足“每个可能的 n 长度单词出现”的属性,因为只出现 33 个数字,这对于所有 33 个可能的输入都是唯一的。
穷举蛮力搜索听起来很慢,尤其是在好的幻数很少见的情况下,但如果我们首先测试两个值的已知幂作为输入,表很快就会被填满,拒绝也很快,并且拒绝率非常高。我们只需要在每个幻数之后清除表格即可。本质上,我利用了高拒绝率算法来构造幻数。
由此产生的算法是
至于你的问题他们是如何知道的,他们可能不知道。他们像我一样尝试、尝试改变事情。毕竟,2^n-1 个输入可以代替具有不同幻数和查找表的 2^n 个输入,这并不是一个很大的想象。
在这里,我制作了幻数生成器代码的简化版本。如果我们只检查两个输入的幂,它会在 5 分钟内检查所有可能的幻数,找到 1024 个幻数。检查其他输入是没有意义的,因为它们无论如何都会减少到 2^n-1 形式。不构造表,但是一旦你知道了幻数,它就变得微不足道了。
A De Bruijn sequence of order n over k symbols (and of k^n length) have a property that every possible n-length word appears as consecutive characters in it, some of them with cyclic wrapping. For example, in the case of k=2, n=2, the possible words are 00, 01, 10, 11, and a De Bruijn sequence is 0011. 00, 01, 11 appears in it, 10 with wrapping. This property naturally means that left shifting a De Bruijn sequence (multiplying with power of two) and taking its upper n bits results in a unique number for each power of two multiplier. Then you only need a lookup table to determine which one it is. It works on a similar principle to numbers which are one less than power of two, but the magic number in this case is not a De Bruijn sequence, but an analogue. The defining property simply changes to "every possible n-length word appears as the sum of the first m subsequences of length n, mod 2^n". This property is all that is needed for the algorithm to work. They simply used this different class of magic numbers to speed up the algorithm. I did as well.
One possible method of construction of De Bruijn numbers is the generation of a Hamiltonian path of the De Bruijn graph, Wikipedia provides an example of such a graph. In this case, the nodes are 2^5=32-bit integers, the directed edges are transitions between them, where a transition is a left shift and a binary or operation according to the label of the edge, 0 or 1. There might be a direct analogue to 2^n-1 type magic numbers, it might be worth exploring, but this isn't a way people usually construct such algorithms.
In practice you might try to construct it differently, especially if you want it to behave in a tad different manner. For example, the implementation of leading/trailing number of zeros algorithms on the bit twiddling hacks page can only return values in [0..31]. It needs additional checking for the case of 0, which has 32 zeros. This check requires a branching and can be way too slow on some CPUs.
The way I did it, I used a 64 element lookup table instead of 32, generated random magic numbers, and for each of them I built up a lookup table with power of two inputs, checked its correctness (injectivity), then verified it for all 32-bit numbers. I continued till I encountered a correct magic number. The resulting numbers do not fulfill a property of "every possible n-length word appears", since only 33 numbers appear, which are unique for all 33 possible input.
Exhaustive brute force search sounds slow, especially if good magic numbers are rare, but if we first test known power of two values as inputs, the table is filled quickly, rejection is fast, and the rejection rate is very high. We only need to clear the table after each magic number. In essence I exploited a high rejection rate algorithm to construct magic numbers.
The resulting algorithms are
As for your question of how did they know, they probably didn't. They experimented, tried to change things, just like me. After all, it isn't a big stretch of imagination that 2^n-1 inputs might work instead of 2^n inputs with different magic number and lookup table.
Here, I made a simplified version of my magic number generator code. It checks all possible magic numbers in 5 minutes if we check only for power of two inputs, finding 1024 magic numbers. Checking against other inputs are pointless since they are reduced to 2^n-1 form anyway. Does not construct the table, but it is trivial once you know the magic number.
它基于论文 Using de Bruijn Sequences to Index a 1 in a计算机单词。我猜测他们搜索了一个完美的哈希函数来将
2^n-1
映射到[0..31]
。他们描述了一种搜索最多两位设置的整数零计数的方法,其中涉及逐步建立乘法器。It's based on the paper Using de Bruijn Sequences to Index a 1 in a Computer Word. I'm going to guess that they did a search for a perfect hash function to map
2^n-1
to[0..31]
. They describe a method for searching for counting zeroes of integers with up to two bits set which involves incrementally building up the multiplier.来自: http://www.stmintz.com/ccc/index.php?id =306404
在我看来,0x07C4ACDD是一个5位de Bruijn序列。
From: http://www.stmintz.com/ccc/index.php?id=306404
It seems to me that 0x07C4ACDD is a 5-bit de Bruijn sequence.
你的问题的答案可能比预期的要简单一些。首先,
0x077CB531
和0x07C4ACDD
都是 de Bruijn 序列。我们将它们称为B
和C
。现在让我们看看如何查找n - 1
的最高位,其中n
是 2 的幂(因此非零)。请注意:(n - 1) * C
,它与n * C - C
相同。n * C - C
的前 5 位。n * C
是C
的左移版本。C
的前 5 位为零:如果C<,则减去
C
可能会将n * C
的前 5 位减 1 /code> 大于n * C
的底部 27 位。这是
C
的特殊之处:它被选择为始终大于那些底部 27 位。选择C
,使得在其前 5 位为零之后,接下来的 5 位全部为1
。每个
k
位 de Bruijn 序列恰好具有一组k
个连续的0
和一组k
个连续的 < code>1 且不再;然而,这两个子序列在所有 de Bruijn 序列中并不相邻。它们在C
中是相邻的。假设
n
不为零,则旋转后的n * C
在接下来的 5 位中永远不会有 5 个连续的1
:它总是至少有那里有一个零,因此C
的减法总是导致n * C
的前 5 位递减。换句话说,使用n - 1
而不是n
并不会真正改变任何内容,除了将查找表旋转一个条目之外。生成这样的德布鲁因序列
一种方法是简单地将上述约束应用于生成德布鲁因序列的任何方法。这是一个使用 LFSR 的简单示例。虽然 LFSR 生成基本上任何位长度的 de Bruijn 序列,但它们只能找到少数具有上述约束的序列,因此这仅是说明性的。
线性反馈移位寄存器(LFSR)的行为与 de Bruijn 序列非常相似:最大周期
n
位 LFSR 产生(2^n)-1
位的周期序列,其中最后一个任意点的n
位循环遍历所有n
位数字(除了 1,通常为零)。通过在 LFSR 输出序列中输出n-1
个连续零的点添加一个零位(它必须这样做,覆盖所有n< /code>
-位数字保存零)。
下面是示例 C 代码,用于查找给定范围内的所有最大周期 Galois LFSR位宽度,并通过添加缺失的零来显示相应的 de Bruijn 序列。事实证明,在进行右移时,通过设置最高位来启动 LFSR 是很简单的(从值 1 开始,末尾为零,而不是开头)。如果序列满足前面所述的约束,它还会显示
FOUND
。例如,它找到以下序列:
仔细观察,您可能会注意到它提供了对称(反转)的序列对。此外,没有一个与已经提到的其他 de Bruijn 序列相匹配,无论是移位、倒置还是反向:虽然 LFSR 找到了几个 de Bruijn 序列,但它们肯定不会找到所有这些序列。他们也没有找到 5 位的约束序列,并且很少发现 7 位以上的约束序列;它找到的那些似乎总是由两个水龙头组成,分别位于顶部和底部。
The answer to your question is perhaps a bit simpler than expected. First, both
0x077CB531
and0x07C4ACDD
are de Bruijn sequences. Let's call themB
andC
. Now let's look at finding the top bit ofn - 1
wheren
is a power of 2 (thus non-zero). Note that:(n - 1) * C
which is the same asn * C - C
.n * C - C
.n * C
is a left-shifted version ofC
.C
are zero: subtractingC
might decrement the top 5 bits ofn * C
by 1 ifC
is larger than the bottom 27 bits ofn * C
.Here's what's special about
C
: it was chosen such that it is always larger than those bottom 27 bits.C
was chosen such that after its top 5 bits which are zero, the next 5 bits are all1
s.Every
k
-bit de Bruijn sequence has exactly one set ofk
consecutive0s
and one set ofk
consecutive1
s and none longer; however, these two subsequences are not adjacent in all de Bruijn sequences. They are adjacent inC
.Assuming
n
is non-zero, the rotatedn * C
never has 5 consecutive1
s in those next 5 bits: it always has at least a zero there, so the subtraction byC
always results in decrementing the top 5 bits ofn * C
. In other words, usingn - 1
instead ofn
doesn't really change anything, except rotate the lookup table by one entry.Generating such de Bruijn sequences
One approach is to simply apply the above constraint to any method of generating de Bruijn sequences. Here's a simple example using LFSRs. While LFSRs generate de Bruijn sequences of essentially any bit length, they only find a few with the above constraint, so this is illustrative only.
Linear feedback shift registers (LFSRs) behave much like de Bruijn sequences: a maximal-period
n
-bit LFSR produces a periodic sequence of(2^n)-1
bits, for which the lastn
bits at any point cycles through alln
-bit numbers (except one, typically zero). This zero is trivially added back by adding a zero bit in the LFSR output sequence at the point where it outputsn-1
consecutive zeroes (which it must do, covering alln
-bit numbers save zero).Here's example C code to find all maximal-period Galois LFSRs of a given range of bit widths, and display the corresponding de Bruijn sequence by adding the missing zero. It turns out trivial to do by starting the LFSR with the top bit set when doing right shifts (starting with the value 1 ends up with zeroes at the end instead of at the beginning). It also displays
FOUND
if the sequence meets the constraints stated earlier.For example, it finds the following sequences:
Looking carefully, you might notice it provides symmetrical (reversed) pairs of sequences. Also, none match the other de Bruijn sequences already mentioned, whether shifted, inverted or reversed: while LFSRs find several de Bruijn sequences, they certainly do not find all of them. They also don't find constrained sequences for 5 bits and very few above 7 bits; the ones it does find seem to always consist of two taps, at the top and bottom bits.