Sieve of Eratosthenes - Finding Primes Python
Just to clarify, this is not a homework problem :)
I wanted to find primes for a math application I am building & came across the Sieve of Eratosthenes approach.
I have written an implementation of it in Python. But it's terribly slow. Say I want to find all primes less than 2 million: it takes more than 20 minutes (I stopped it at that point). How can I speed this up?
def primes_sieve(limit):
    limitn = limit + 1
    primes = range(2, limitn)

    for i in primes:
        factors = range(i, limitn, i)
        for f in factors[1:]:
            if f in primes:
                primes.remove(f)
    return primes

print primes_sieve(2000)
UPDATE:
I ended up doing profiling on this code & found that quite a lot of time was spent on removing an element from the list. Quite understandable, considering it has to traverse the entire list (worst case) to find the element, then remove it, then readjust the list (maybe some copying goes on?). Anyway, I chucked the list in favour of a dictionary. My new implementation -
def primes_sieve1(limit):
    limitn = limit + 1
    primes = dict()
    for i in range(2, limitn):
        primes[i] = True

    for i in primes:
        factors = range(i, limitn, i)
        for f in factors[1:]:
            primes[f] = False
    return [i for i in primes if primes[i] == True]

print primes_sieve1(2000000)
You're not quite implementing the correct algorithm:
In your first example, primes_sieve doesn't maintain a list of primality flags to strike/unset (as in the algorithm), but instead resizes a list of integers continuously, which is very expensive: removing an item from a list requires shifting all subsequent items down by one.

In the second example, primes_sieve1 maintains a dictionary of primality flags, which is a step in the right direction, but it iterates over the dictionary in undefined order and redundantly strikes out factors of factors (instead of only factors of primes, as in the algorithm). You could fix this by sorting the keys and skipping non-primes (which already makes it an order of magnitude faster), but it's still much more efficient to just use a list directly.

The correct algorithm (with a list instead of a dictionary) looks something like:
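For example, as a generator (a sketch; the name primes_sieve2 is arbitrary):

def primes_sieve2(limit):
    # A list of primality flags; strike multiples starting at i*i.
    a = [True] * limit
    a[0] = a[1] = False
    for i, isprime in enumerate(a):
        if isprime:
            yield i
            for n in range(i * i, limit, i):
                a[n] = False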
(Note that this also includes the algorithmic optimization of starting the non-prime marking at the prime's square (
i*i
) instead of its double.)从数组(列表)的开头删除需要将其后面的所有项目向下移动。这意味着以这种方式从列表中从前面开始删除每个元素是一个 O(n^2) 操作。
Removing from the beginning of an array (list) requires moving all of the items after it down. That means that removing every element from a list in this way starting from the front is an O(n^2) operation.
You can do this much more efficiently with sets:
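For example, a sketch with a set (set.discard is O(1), so nothing has to be shifted; not necessarily the answer's exact code):

def primes_sieve_set(limit):
    # Discard composites from a set instead of removing from a list.
    candidates = set(range(2, limit + 1))
    for i in range(2, int(limit ** 0.5) + 1):
        if i in candidates:
            for f in range(i * i, limit + 1, i):
                candidates.discard(f)
    return sorted(candidates)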
Or, alternatively, avoid having to rearrange the list:
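For example, marking flags in place instead of removing elements (again a sketch):

def primes_sieve_flags(limit):
    # Mark composites in a flag list; nothing is ever removed.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for f in range(i * i, limit + 1, i):
                is_prime[f] = False
    return [i for i, flag in enumerate(is_prime) if flag]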
Much faster:
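A sketch of one such faster variant, using a bytearray with bulk slice assignment (assumed, not the poster's exact code):

def primes_fast(limit):
    # One byte per candidate; slice assignment clears whole runs at once.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b'\x00\x00'
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]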
Using a bit of numpy, I could find all primes below 100 million in a little over 2 seconds. There are two key features one should note:

- Strike out multiples of i only for i up to the root of n.
- Setting the multiples of i to False using x[2*i::i] = False is much faster than an explicit Python for loop.

These two significantly speed up your code. For limits below one million, there is no perceptible running time.
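A sketch of such a NumPy sieve (a reconstruction; the slice-assignment line is the trick the answer names):

import numpy as np

def primes_below(n):
    # Boolean flags; each slice assignment strikes a whole run at C speed.
    x = np.ones(n, dtype=bool)
    x[:2] = False
    for i in range(2, int(n ** 0.5) + 1):
        if x[i]:
            x[2 * i::i] = False
    return np.flatnonzero(x)

print(len(primes_below(100_000_000)))    # 5761455 primes below 100 million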
I realise this isn't really answering the question of how to generate primes quickly, but perhaps some will find this alternative interesting: because Python provides lazy evaluation via generators, the Sieve of Eratosthenes can be implemented exactly as stated:
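A sketch of that generator formulation (a reconstruction; each discovered prime wraps the stream in one more filter):

def intsfrom(n):
    # n, n+1, n+2, ... forever
    while True:
        yield n
        n += 1

def sieve(stream):
    p = next(stream)
    yield p
    # everything not divisible by p, sieved recursively
    yield from sieve(c for c in stream if c % p != 0)

try:
    for p in sieve(intsfrom(2)):
        print(p)
except RecursionError:
    pass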
The try block is there because the algorithm runs until it blows the stack, and without the try block the backtrace is displayed, pushing the actual output you want to see off the screen.
By combining contributions from many enthusiasts (including Glenn Maynard and MrHIDEn from the comments above), I came up with the following piece of code in Python 2:
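A sketch of such a combined version (the name get_primes is assumed; this runs under Python 2 or 3):

def get_primes(n):
    # Flag list over 0..n; strike multiples from i*i upward.
    m = n + 1
    numbers = [True] * m
    for i in range(2, int(n ** 0.5) + 1):
        if numbers[i]:
            for j in range(i * i, m, i):
                numbers[j] = False
    return [i for i in range(2, m) if numbers[i]]

print(get_primes(100))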
Time taken for computation on my machine, for different inputs in powers of 10, is:
A simple speed hack: when you define the variable "primes," set the step to 2 to skip all even numbers automatically, and set the starting point to 1.
Then you can further optimize: instead of for i in primes, use for i in primes[:round(len(primes) ** 0.5)]. That will dramatically increase performance. In addition, you can eliminate numbers ending in 5 to further increase speed.
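A sketch of that hack applied to the dictionary version from the question (a hypothetical adaptation; it starts at 3 rather than 1 so that 1 is not reported as prime):

def primes_sieve_odds(limit):
    # Odd candidates only, step 2; the prime 2 is appended at the end.
    odds = list(range(3, limit + 1, 2))
    is_prime = {p: True for p in odds}
    for i in odds[:round(len(odds) ** 0.5)]:
        if is_prime[i]:
            for f in range(i * i, limit + 1, 2 * i):   # odd multiples only
                is_prime[f] = False
    return [2] + [p for p in odds if is_prime[p]]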
My implementation:
Here's a version that's a bit more memory-efficient (and: a proper sieve, not trial division). Basically, instead of keeping an array of all the numbers and crossing out those that aren't prime, this keeps an array of counters - one for each prime it's discovered - and leapfrogs them ahead of the putative prime. That way, it uses storage proportional to the number of primes, not up to the highest prime.
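A sketch reconstructed from that description (the dictionary maps each upcoming composite to the primes whose counters sit on it):

def primes():
    # Infinite generator. For each discovered prime we keep a "counter":
    # the next odd multiple of it that lies ahead of the current candidate.
    yield 2
    multiples = {}                      # next composite -> primes striking it
    n = 3
    while True:
        if n in multiples:
            for p in multiples.pop(n):  # leapfrog each counter past n
                multiples.setdefault(n + 2 * p, []).append(p)
        else:
            yield n                     # hit by no counter: n is prime
            multiples[n * n] = [n]      # the first composite it must strike
        n += 2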
You'll note that primes() is a generator, so you can keep the results in a list or you can use them directly. Here are the first n primes:
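A hypothetical usage sketch (any small n works):

from itertools import islice

print(list(islice(primes(), 10)))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

And, for completeness, here's a timer to measure the performance - sketched here:

import time
from itertools import islice

start = time.time()
count = sum(1 for _ in islice(primes(), 100000))   # hypothetical workload
print('%d primes in %.2f s' % (count, time.time() - start))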
Just in case you're wondering, I also wrote primes() as a simple iterator (using __iter__ and __next__), and it ran at almost the same speed. Surprised me too!
I prefer NumPy because of speed.
Check the output:
Comparing the speed of the Sieve of Eratosthenes and brute force on a Jupyter Notebook: the Sieve of Eratosthenes is 539 times faster than brute force for a million elements.
I figured it must be possible to simply use the empty list as the terminating condition for the loop and came up with this:
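One plausible shape for that idea (a hypothetical reconstruction; note that the filtering step makes it closer to trial division than a true sieve):

def primes_sieve_shrinking(limit):
    # The smallest remaining candidate is always prime; filter out its
    # multiples, and stop once the candidate list is empty.
    candidates = list(range(2, limit + 1))
    found = []
    while candidates:
        p = candidates[0]
        found.append(p)
        candidates = [c for c in candidates if c % p != 0]
    return found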
I think this is the shortest code for finding primes with the Eratosthenes method:
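A short sketch in that spirit (not necessarily the poster's exact code):

limit = 100    # hypothetical bound
composites = {c for i in range(2, int(limit ** 0.5) + 1)
              for c in range(i * i, limit + 1, i)}
print(sorted(set(range(2, limit + 1)) - composites))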
The fastest implementation I could come up with:
I just came up with this. It may not be the fastest, but I'm not using anything other than straight additions and comparisons. Of course, what stops you here is the recursion limit.
I made a one-liner version of the Sieve of Eratosthenes:
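A hypothetical one-liner in the spirit described (it prints as a side effect, so the expression itself evaluates to a list of Nones, as the EDIT below notes):

n = 100  # hypothetical bound
[print(x) for x in range(2, n + 1) if x not in {m for i in range(2, int(n ** 0.5) + 1) for m in range(i * i, n + 1, i)}]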
In terms of performance, I am pretty sure this isn't the fastest thing by any means, and in terms of readability / following PEP8, this is pretty terrible, but it's more the novelty of the length than anything.
EDIT: Note that this simply prints the sieve and does not return it (if you attempt to print the result, you will get a list of Nones); if you want it to return values, change the print(x) in the list comprehension to just x.
Empirical Analysis and Visualization of Various Approaches to the Sieve of Eratosthenes
I discovered this algorithm at the end of the 10th chapter (Maps, Hash Tables, and Skip Lists)
of "Data Structures & Algorithms in Python". I ended up writing three versions:
Naive SOE
Out of all the issues, the additional space usage in the return statement takes the cake as the worst.
Suboptimal SOE
A drastic improvement, but still lacking on space usage, and an unfriendly inner-loop interval progression.
Fast SOE
Easy to understand (note the usage of a fractional exponent, to be explicitly consistent with the mathematical definition of roots), and just as performant as @Pi Delport's answer.
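A sketch of what the fast version might look like (a hypothetical reconstruction; note the n ** 0.5 fractional exponent):

def fast_soe(n):
    # Flags for 0..n; the outer loop runs only up to n ** 0.5.
    is_prime = [False, False] + [True] * (n - 1)
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]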
Empirical Analysis
To compare all three implementations, along with the selected answer from @Pi Delport,
I ran it through 45 iterations, from primes in range 100 up to primes in range 4500, at intervals of 100 (that's the sweet spot for the visualization because, despite the consistency of the shape of the graphs, the growth of the naive implementation dwarfs the visibility of the other three).
You can tweak the visualization code on the GitHub gist, but here's one sample output:
Bitarray and 6k±1 for size and speed
1 billion in 5 seconds; 2^24 in 3 milliseconds.
I used bitarray for a smaller footprint. bitarray also allows statements like:
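For example (a small assumed illustration of bulk slice assignment):

from bitarray import bitarray

flags = bitarray(100)
flags.setall(True)
flags[4::2] = False    # clear every second bit from index 4 in one statement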
The sieve code will do a 1-billion-sized sieve in about 5 seconds on my old i7.
Here is my sieve:
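What follows is a reconstructed sketch in the spirit described - bitarray flags plus 6k±1 stepping - rather than the poster's exact code:

from bitarray import bitarray

def sieve_bitarray(limit):
    # One bit per number; bulk slice assignment does all the marking.
    flags = bitarray(limit + 1)
    flags.setall(True)
    flags[0:2] = False
    flags[4::2] = False            # even numbers above 2
    flags[9::6] = False            # odd multiples of 3 above 3
    i = 5
    while i * i <= limit:          # candidates of the form 6k - 1 and 6k + 1
        if flags[i]:
            flags[i * i::2 * i] = False
        j = i + 2
        if j * j <= limit and flags[j]:
            flags[j * j::2 * j] = False
        i += 6
    return flags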
To just return primes:
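A small assumed helper that converts the flag array into a list of primes:

def primes_from(flags):
    # Indices of set bits are the primes.
    return [i for i, bit in enumerate(flags) if bit]

print(primes_from(sieve_bitarray(100)))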
Output: a 2**24 search takes about 3 milliseconds:
Not sure if my code is efficient - anyone care to comment?
Probably the quickest way to get prime numbers is the following:
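For example (the bound here is arbitrary):

from sympy import primerange

print(list(primerange(2, 2000000)))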
In case you don't need to store them, just use the code above without conversion to a list. sympy.primerange is a generator, so it does not consume memory.
Using recursion and walrus operator:
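A hypothetical sketch of that idea (Python 3.8+; the head of the remaining list is always prime):

def sieve(nums):
    # Bind the head as a prime with :=, then recurse on its non-multiples.
    return [] if not nums else [p := nums[0]] + sieve([n for n in nums[1:] if n % p])

print(sieve(list(range(2, 100))))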
A basic sieve with numpy is amazingly fast - it may be the fastest implementation.

Segmented sieve (uses less memory):
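A sketch of a segmented sieve (hypothetical helper name and segment size):

import numpy as np

def segmented_sieve(limit, segment_size=1 << 18):
    # Base primes up to sqrt(limit), then one fixed-size window at a time,
    # so only O(segment_size) flags are alive at once.
    root = int(limit ** 0.5) + 1
    base = np.ones(root + 1, dtype=bool)
    base[:2] = False
    for i in range(2, int(root ** 0.5) + 1):
        if base[i]:
            base[i * i::i] = False
    base_primes = np.flatnonzero(base)

    primes = [int(p) for p in base_primes if p <= limit]
    for low in range(root + 1, limit + 1, segment_size):
        high = min(low + segment_size, limit + 1)
        seg = np.ones(high - low, dtype=bool)
        for p in base_primes:
            start = (low + p - 1) // p * p       # first multiple of p >= low
            seg[start - low::p] = False
        primes.extend((np.flatnonzero(seg) + low).tolist())
    return primes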
Here is my solution, the same as Wikipedia's:
Thanks for the interesting question!
Right now I have written from scratch two versions of the classical Sieve of Eratosthenes: one single-core (one CPU core), the other multi-core (using all CPU cores).
The main speedup of both (single- and multi-core) versions comes from using NumPy, an official Python package; install it once through the command python -m pip install numpy. Another great speedup of the multi-core version comes from using all CPU cores, which gives an almost N-times speedup for N cores compared to the single-core version.

Right now I have only a two-core laptop. My program produced the following timings:
Original above means the code from your question's body - the second one, which you had already optimized. Simple is my single-core version. Multi is my multi-core version.

As you can see above, computing primes less than 8 million took 72 seconds with your original code; my single-core version took 5.7 seconds, which is 12.7x faster than your code; and my 2-core version took 2.6 seconds, which is 2.1x faster than my single-core version and 27x faster than your original code.
In other words, I got a 27x speedup with my multi-core code compared to yours - really a lot! And this is only on a 2-core laptop; if you have 8 cores or more, you'll get a much bigger speedup. But remember that the real speedup on a multi-core machine shows up only for quite a large prime limit; try a limit of 64 million or bigger, and for that modify the line primes_limit_M = 8 in my code, which sets the number of millions.

Just to dwell on the details: my single-core version is almost like your code, but it uses NumPy, which makes any array operation very fast, instead of pure Pythonic loops over lists.
The multi-core version also uses NumPy, but splits the array holding the sieved range into as many pieces as there are CPU cores on your machine, each piece of equal size. Every CPU core then sets boolean flags in its own part of the array. This technique gives a speedup only until you hit the speed limit of your memory (RAM), so past some point, adding CPU cores buys no extra speedup.
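A hypothetical sketch of such a multi-core layout (a reconstruction, not the linked code; names like primes_limit_M and cpu_cores follow the text):

import multiprocessing as mp
import numpy as np

primes_limit_M = 8          # limit in millions, as in the text above

def mark_segment(args):
    # Worker: flag composites inside [lo, hi) using the shared base primes.
    lo, hi, base_primes = args
    seg = np.ones(hi - lo, dtype=bool)
    for p in base_primes:
        start = max(p * p, (lo + p - 1) // p * p)   # first target >= lo
        seg[start - lo::p] = False
    return lo, seg

def primes_multi(limit, cpu_cores=None):
    cpu_cores = cpu_cores or mp.cpu_count()
    root = int(limit ** 0.5) + 1
    base = np.ones(root + 1, dtype=bool)
    base[:2] = False
    for i in range(2, int(root ** 0.5) + 1):
        if base[i]:
            base[i * i::i] = False
    base_primes = np.flatnonzero(base)
    # Equal-sized chunks, one per core; workers only read base_primes.
    bounds = np.linspace(2, limit + 1, cpu_cores + 1, dtype=np.int64)
    tasks = [(int(bounds[k]), int(bounds[k + 1]), base_primes)
             for k in range(cpu_cores)]
    with mp.Pool(cpu_cores) as pool:
        parts = [np.flatnonzero(seg) + lo
                 for lo, seg in pool.map(mark_segment, tasks)]
    return np.concatenate(parts)

if __name__ == '__main__':
    print(len(primes_multi(primes_limit_M * 1000000)))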
By default I use all CPU cores in the multi-core version, but you may experiment with setting fewer cores than your machine has; that may give an even better speedup, because it is not guaranteed that the most cores give exactly the fastest result. Tweak the number of cores by changing the line cpu_cores = mp.cpu_count() to something like cpu_cores = 3.

Try it online!
If you're looking for something even faster, you can use numba and CUDA as well, if you have an Nvidia GPU. Feel free to optimize as needed. We use 2**24, which is ~16 million numbers, at 239 ms on an Nvidia 3070.
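A hypothetical numba/CUDA sketch along those lines (assumes a CUDA-capable GPU and numba installed; it marks multiples of every candidate divisor in parallel, accepting redundant work in exchange for simplicity):

import math
import numpy as np
from numba import cuda

@cuda.jit
def mark_multiples(flags, limit):
    # One thread per candidate divisor i; strike all multiples of i.
    i = cuda.grid(1) + 2
    if i * i <= limit:
        for m in range(i * i, limit + 1, i):
            flags[m] = False

def primes_cuda(limit):
    flags = np.ones(limit + 1, dtype=np.bool_)
    flags[:2] = False
    d_flags = cuda.to_device(flags)
    divisors = math.isqrt(limit) - 1          # threads cover i = 2..sqrt(limit)
    threads = 256
    blocks = (divisors + threads - 1) // threads
    mark_multiples[blocks, threads](d_flags, limit)
    return np.flatnonzero(d_flags.copy_to_host())

print(len(primes_cuda(2 ** 24)))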