有效存储素数列表

发布于 2024-08-28 18:14:19 字数 836 浏览 7 评论 0原文

这篇文章说:

每个素数都可以表示为 30k±130k±730k±11 或 对于一些k30k±13。 这意味着我们每个可以使用八位 三十个数字来存储所有 素数;一百万个素数可以是 压缩至 33,334 字节


“这意味着我们可以使用每 30 个数字 8 位来存储所有素数”

这个“每 30 个数字 8 位”适用于 k,对吗?但每个k值不一定只占用一位。难道不应该是八个k值吗?


“一百万个素数可以压缩到 33,334 字节”

我不确定这是怎么回事。

我们需要指示两件事:

  • k 的值(可以任意大)

  • 来自八个状态之一的 STATE (-13,- 11,-7,-1,1,7,11,13)

我不明白“33,334 字节”是如何得出的,但我可以说一件事:随着素数的值变得越来越大,我们将需要更多空间来存储k的值。

那么我们如何才能将其修复为“33,334字节”呢?

This article says:

Every prime number can be expressed as
30k±1, 30k±7, 30k±11, or
30k±13 for some k.
That means we can use eight bits per
thirty numbers to store all the
primes; a million primes can be
compressed to 33,334 bytes


"That means we can use eight bits per thirty numbers to store all the primes"

This "eight bits per thirty numbers" would be for k, correct? But each k value will not necessarily take-up just one bit. Shouldn't it be eight k values instead?


"a million primes can be compressed to 33,334 bytes"

I am not sure how this is true.

We need to indicate two things:

  • VALUE of k (can be arbitrarily large)

  • STATE from one of the eight states (-13,-11,-7,-1,1,7,11,13)

I am not following how "33,334 bytes" was arrived at, but I can say one thing: as the prime numbers become larger and larger in value, we will need more space to store the value of k.

How, then can we fix it at "33,334 bytes"?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

小梨窩很甜 2024-09-04 18:14:19

这篇文章有点误导:我们无法存储 100 万个素数,但我们可以存储 100 万以下的所有素数。

k 的值来自它在列表中的位置。对于这 8 个排列中的每一个,我们只需要 1 位 (-13,-11..,11,13)

换句话说,我们将使用 8 位来存储 k=0, 8 来存储 k=1, 8存储 k=2 等。通过让这些按顺序排列,我们不需要为每 8 位指定 k 的值 - 它只是前 8 位的值 + 1。

因为 1,000,000 / 30 = 33,333 1 /3,我们可以存储这 8 位序列中的 33,334 个来表示低于 100 万的值是素数,因为我们覆盖了 k 可以拥有的所有值,而 30k-13 不超过 100 万的限制。

The article is a bit misleading: we can't store 1 million primes, but we can store all primes below 1 million.

The value of k comes from its position in the list. We only need 1 bit for each of those 8 permutations (-13,-11..,11,13)

In other words, we'll use 8 bits to store for k=0, 8 to store for k=1, 8 to store for k=2, etc. By letting these follow sequentially, we don't need to specify the value of k for each 8 bits - it's simply the value for the previous 8 bits + 1.

Since 1,000,000 / 30 = 33,333 1/3, we can store 33,334 of these 8 bit sequences to represent which values below 1 million are prime, since we cover all of the values k can have without 30k-13 exceeding the limit of 1 million.

秋叶绚丽 2024-09-04 18:14:19

您不需要存储 k 的每个值。如果要存储 100 万以下的素数,请使用 33,334 个字节 - 第一个字节对应于 k=0,第二个字节对应于 k=1 等。然后,在每个字节中,使用 1 位来表示“素数”或“合数” “ 30k+1、30k+7 等。

You don't need to store each value of k. If you want to store the prime numbers below 1 million, use 33,334 bytes - the first byte corresponds to k=0, the second to k=1 etc. Then, in each byte, use 1 bit to indicate "prime" or "composite" for 30k+1, 30k+7 etc.

对你而言 2024-09-04 18:14:19

它是一个位掩码——30 个可能是素数的值中的每一个对应 8 个位,因此每 30 个数字有 8 个位。要将 10^6 以内的所有素数制成表格,您需要 8*10^6/30 = 2666667 位 = 33334 字节。

为了解释为什么这是一个好方法,您需要看看明显的替代方案。

一种更简单的方法是使用位掩码。您需要一百万位,125000 字节。

您还可以存储素数本身的值。最多 1000000,这些值适合 20 位,并且有 78498 个素数,因此这给出了令人失望的 1569960 位(196245 字节)。

另一种方法(尽管对于查找素数不太有用)是存储每个素数与下一个素数之间的差异。不到一百万,这适合 6 位(只要您记住此时素数都是奇数,因此您只需要存储偶数差异,从而可以丢弃最低位),470998 位 == 58874 字节。 (您可以通过计算必须跳转多少个 mod-30 插槽来削减另一位。)

现在,除了 30 = 2*3*5 之外,30 没有什么特别之处,所以这个查找实际上是在引导您通过位掩码刚开始时埃拉托斯坦筛模式的表示。您可以改为使用 2*3*5*7 = 210,然后您必须考虑 +- 1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59、61、67、71、73、79、83、89、97、101、103,适用于 48 个值。如果您使用 7 个 30 块来执行此操作,则需要 7*8=56 位,因此这是一个微小的改进,但是呃...不值得这么麻烦。

因此,这是紧凑地存储相当小的素数的更好技巧之一。

(PS 有趣的是,如果素数随机出现(但与实际出现的相同数字最多出现 1000000 个),则存储在 1 到 10^6 之间的素数中的信息量将约为每个数字 0.397 位。因此,在天真的信息论假设下,您可能会认为存储前一百万个素数的最佳方法是使用 1000000*0.397 位或 49609 字节。)

It's a bitmask--one bit for each of the 8 values out of 30 that might be prime, so 8 bits per 30 numbers. To tabulate all primes up to 10^6, you thus need 8*10^6/30 = 2666667 bits = 33334 bytes.

To explain why this is a good way to go, you need to look at the obvious alternatives.

A more naive way to go would just be to use a bitmask. You need a million bits, 125000 bytes.

You could also store the values of the primes themselves. Up to 1000000, the values fit in 20 bits, and there are 78498 primes, so this gives a disappointing 1569960 bits (196245 bytes).

Another way to go--though less useful for looking up primes--is to store the differences between each prime and the next. Under a million, this fits in 6 bits (as long as you remember that the primes are all odd at that point, so you only need to store even differences and can thus throw away the lowest bit), for 470998 bits == 58874 bytes. (You could shave off another bit by counting how many mod-30 slots you had to jump.)

Now, there's nothing particularly special about 30 except that 30 = 2*3*5, so this lookup is actually walking you up through a bitmask representation of the Sieve of Eratosthanes pattern just after you've gotten started. You could instead use 2*3*5*7 = 210, and then you'd have to consider +- 1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, for 48 values. If you were doing this with 7 blocks of 30, you'd need 7*8=56 bits, so this is a slight improvement, but ugh...hardly worth the hassle.

So this is one of the better tricks out there for compactly storing reasonably small prime numbers.

(P.S. It's interesting to note that if primes appeared randomly (but the same number appeared up to 1000000 as actually appear) the amount of information stored in the primality of a number between 1 and 10^6 would be ~0.397 bits per number. Thus, under naive information-theoretic assumptions, you'd think that the best you could possibly do to store the first million primes was to use 1000000*0.397 bits, or 49609 bytes.)

花桑 2024-09-04 18:14:19

从另一个角度来看,前 23,163,298 个素数可以被认为是很好的可压缩的。它是每个间隙 <= 255(即适合单个字节)的素数的最大数量。

我在此处使用了这个事实,为了将素数缓存的内存占用减少 8 倍,即不使用 number(8 字节),我只缓存素数之间的间隙,每个素数仅使用 1 个字节。

As another perspective on this, the first 23,163,298 primes can be considered nicely compressible. It is the maximum number of primes for which every gap is <= 255, i.e. fits into a single byte.

I used this fact here, to reduce memory footprint for primes cache by 8 times, i.e. instead of using number (8 bytes), I'm caching only the gaps between primes, using just 1 byte per prime.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文