DJ Bernstein 的《primegen》中 10^15 的限制在哪里?程序从何而来?
在 http://cr.yp.to/primegen.html 您可以找到以下程序的来源:使用阿特金筛来生成素数。正如作者所说,可能需要几个月的时间才能回复发送给他的电子邮件(我明白,他确实是一个忙碌的人!)我发布了这个问题。
该页面指出“primegen 可以生成最多 1000000000000000 个素数”。我试图理解为什么会这样。当然,最大限制是 2^64 ~ 2 * 10^19(long unsigned int 的大小),因为这就是数字的表示方式。我确信如果存在巨大的素数间隙(> 2^31),那么数字的打印将会失败。然而,在这个范围内,我认为不存在这样的素数差距。
要么是作者高估了界限(实际上大约是 10^19),要么是源代码中的某个地方算术运算可能会溢出或类似的情况。
有趣的是,你实际上可以运行它来获取数字 > 。 10^15:
./primes 10000000000000000 10000000000000100
10000000000000061
10000000000000069
10000000000000079
10000000000000099
如果您相信 Wolfram Alpha,那么它是正确的。
我进行了“逆向工程”的一些事实:
- 数字按 1,920 * PRIMEGEN_WORDS = 3,932,160 个数字的批次进行筛选(请参阅 primegen_next.c 中的 primegen_fill 函数)
- PRIMEGEN_WORDS 控制单次筛选的大小 - 您可以在 primegen_impl.h 中调整它以适应你的CPU缓存,
- 筛子本身的实现在primegen.c文件中 - 我假设这是正确的;你得到的是 pg->buf 中素数的位掩码(参见 primegen_fill 函数)。
- 分析位掩码并将素数存储在 pg->p 数组中。
我认为没有任何可能发生溢出的地方。
At http://cr.yp.to/primegen.html you can find sources of program that uses Atkin's sieve to generate primes. As the author says that it may take few months to answer an e-mail sent to him (I understand that, he sure is an occupied man!) I'm posting this question.
The page states that 'primegen can generate primes up to 1000000000000000'. I am trying to understand why it is so. There is of course a limitation up to 2^64 ~ 2 * 10^19 (size of long unsigned int) because this is how the numbers are represented. I know for sure that if there would be a huge prime gap (> 2^31) then printing of numbers would fail. However in this range I think there is no such prime gap.
Either the author overestimated the bound (and really it is around 10^19) or there is a place in the source code where the arithmetic operation can overflow or something like that.
The funny thing is that you actually MAY run it for numbers > 10^15:
./primes 10000000000000000 10000000000000100
10000000000000061
10000000000000069
10000000000000079
10000000000000099
and if you believe Wolfram Alpha, it is correct.
Some facts I had "reverse-engineered":
- numbers are sifted in batches of 1,920 * PRIMEGEN_WORDS = 3,932,160 numbers (see primegen_fill function in primegen_next.c)
- PRIMEGEN_WORDS controls how big a single sifting is - you can adjust it in primegen_impl.h to fit your CPU cache,
- the implementation of the sieve itself is in primegen.c file - I assume it is correct; what you get is a bitmask of primes in pg->buf (see primegen_fill function)
- The bitmask is analyzed and primes are stored in pg->p array.
I see no point where the overflow may happen.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我希望我能在电脑上查看,但我怀疑如果您从 1 作为下限开始,您会取得不同的成功。
I wish I was on my computer to look, but I suspect you would have different success if you started at 1 as your lower bound.
仅从算法来看,我可以得出结论,上限来自 32 位数字。
该页面提到 Pentium-III 作为 CPU,所以我猜它很旧并且不使用 64 位。
2^32 大约是 10^9。阿特金斯筛法(算法使用的)需要 N^(1/2) 位(它使用很大的位域)。这意味着在 2^32 大内存中,您可以(保守)N 约为 10^15。由于这个数字是一个粗略的保守上限(您有系统和其他程序占用内存,为 IO 保留地址范围,...),实际上限可能更高。
Just from the algorithm, I would conclude that the upper bound comes from the 32 bit numbers.
The page mentiones Pentium-III as CPU so my guess it is very old and does not use 64 bit.
2^32 are approx 10^9. Sieve of Atkins (which the algorithm uses) requires N^(1/2) bits (it uses a big bitfield). Which means in 2^32 big memory you can make (conservativ) N approx 10^15. As this number is a rough conservative upper bound (you have system and other programs occupying memory, reserving address ranges for IO,...) the real upper bound is/might be higher.