快速素因数分解模块
我正在寻找一种实现或清晰的算法,用于在Python、伪代码或其他易于阅读的内容中获取N的素因数。有一些要求/限制:
- N 介于 1 和 ~20 位数字之间
- 没有预先计算的查找表,但记忆化也可以
- 不需要经过数学证明(例如,如果需要,可以依赖哥德巴赫猜想) )
- 不需要精确,如果需要的话可以是概率性/确定性的
我需要一个快速素因数分解算法,不仅为了它本身,而且为了在许多其他算法中使用,比如计算欧拉 phi(n)。
我尝试过维基百科等的其他算法,但要么我无法理解它们(ECM),要么我无法从该算法(Pollard-Brent)创建一个有效的实现。
我对 Pollard-Brent 算法非常感兴趣,所以任何更多关于它的信息/实现都会非常好。
谢谢!
编辑
经过一番折腾后,我创建了一个相当快的素数/因式分解模块。它结合了优化的试除算法、Pollard-Brent 算法、米勒-拉宾素性测试和我在互联网上找到的最快的素数筛。 gcd 是常规 Euclid 的 GCD 实现(二进制 Euclid 的 GCD 比常规的慢得多)。
赏金
哦,高兴,可以获得赏金!但我怎样才能赢呢?
- 在我的模块中查找优化或错误。
- 提供替代/更好的算法/实现。
最完整/有建设性的答案将获得赏金。
最后是模块本身:
import random
def primesbelow(N):
# http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
#""" Input N>=6, Returns a list of primes, 2 <= p < N """
correction = N % 6 > 1
N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
sieve = [True] * (N // 3)
sieve[0] = False
for i in range(int(N ** .5) // 3 + 1):
if sieve[i]:
k = (3 * i + 1) | 1
sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
# http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
if n < 1:
raise ValueError("Out of bounds, first argument must be > 0")
elif n <= 3:
return n >= 2
elif n % 2 == 0:
return False
elif n < _smallprimeset:
return n in smallprimeset
d = n - 1
s = 0
while d % 2 == 0:
d //= 2
s += 1
for repeat in range(precision):
a = random.randrange(2, n - 2)
x = pow(a, d, n)
if x == 1 or x == n - 1: continue
for r in range(s - 1):
x = pow(x, 2, n)
if x == 1: return False
if x == n - 1: break
else: return False
return True
# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
if n % 2 == 0: return 2
if n % 3 == 0: return 3
y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
g, r, q = 1, 1, 1
while g == 1:
x = y
for i in range(r):
y = (pow(y, 2, n) + c) % n
k = 0
while k < r and g==1:
ys = y
for i in range(min(m, r-k)):
y = (pow(y, 2, n) + c) % n
q = q * abs(x-y) % n
g = gcd(q, n)
k += m
r *= 2
if g == n:
while True:
ys = (pow(ys, 2, n) + c) % n
g = gcd(abs(x - ys), n)
if g > 1:
break
return g
smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
factors = []
for checker in smallprimes:
while n % checker == 0:
factors.append(checker)
n //= checker
if checker > n: break
if n < 2: return factors
while n > 1:
if isprime(n):
factors.append(n)
break
factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
n //= factor
if sort: factors.sort()
return factors
def factorization(n):
factors = {}
for p1 in primefactors(n):
try:
factors[p1] += 1
except KeyError:
factors[p1] = 1
return factors
totients = {}
def totient(n):
if n == 0: return 1
try: return totients[n]
except KeyError: pass
tot = 1
for p, exp in factorization(n).items():
tot *= (p - 1) * p ** (exp - 1)
totients[n] = tot
return tot
def gcd(a, b):
if a == b: return a
while b > 0: a, b = b, a % b
return a
def lcm(a, b):
return abs((a // gcd(a, b)) * b)
I am looking for an implementation or clear algorithm for getting the prime factors of N in either python, pseudocode or anything else well-readable. There are a few requirements/constraints:
- N is between 1 and ~20 digits
- No pre-calculated lookup table, memoization is fine though
- Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
- Need not to be precise, is allowed to be probabilistic/deterministic if needed
I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).
I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't create a working implementation from the algorithm (Pollard-Brent).
I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.
Thanks!
EDIT
After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is much slower then the regular one).
Bounty
Oh joy, a bounty can be acquired! But how can I win it?
- Find an optimization or bug in my module.
- Provide alternative/better algorithms/implementations.
The answer which is the most complete/constructive gets the bounty.
And finally the module itself:
import random
def primesbelow(N):
# http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
#""" Input N>=6, Returns a list of primes, 2 <= p < N """
correction = N % 6 > 1
N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
sieve = [True] * (N // 3)
sieve[0] = False
for i in range(int(N ** .5) // 3 + 1):
if sieve[i]:
k = (3 * i + 1) | 1
sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
# http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
if n < 1:
raise ValueError("Out of bounds, first argument must be > 0")
elif n <= 3:
return n >= 2
elif n % 2 == 0:
return False
elif n < _smallprimeset:
return n in smallprimeset
d = n - 1
s = 0
while d % 2 == 0:
d //= 2
s += 1
for repeat in range(precision):
a = random.randrange(2, n - 2)
x = pow(a, d, n)
if x == 1 or x == n - 1: continue
for r in range(s - 1):
x = pow(x, 2, n)
if x == 1: return False
if x == n - 1: break
else: return False
return True
# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
if n % 2 == 0: return 2
if n % 3 == 0: return 3
y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
g, r, q = 1, 1, 1
while g == 1:
x = y
for i in range(r):
y = (pow(y, 2, n) + c) % n
k = 0
while k < r and g==1:
ys = y
for i in range(min(m, r-k)):
y = (pow(y, 2, n) + c) % n
q = q * abs(x-y) % n
g = gcd(q, n)
k += m
r *= 2
if g == n:
while True:
ys = (pow(ys, 2, n) + c) % n
g = gcd(abs(x - ys), n)
if g > 1:
break
return g
smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
factors = []
for checker in smallprimes:
while n % checker == 0:
factors.append(checker)
n //= checker
if checker > n: break
if n < 2: return factors
while n > 1:
if isprime(n):
factors.append(n)
break
factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
n //= factor
if sort: factors.sort()
return factors
def factorization(n):
factors = {}
for p1 in primefactors(n):
try:
factors[p1] += 1
except KeyError:
factors[p1] = 1
return factors
totients = {}
def totient(n):
if n == 0: return 1
try: return totients[n]
except KeyError: pass
tot = 1
for p, exp in factorization(n).items():
tot *= (p - 1) * p ** (exp - 1)
totients[n] = tot
return tot
def gcd(a, b):
if a == b: return a
while b > 0: a, b = b, a % b
return a
def lcm(a, b):
return abs((a // gcd(a, b)) * b)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
如果您不想重新发明轮子,请使用库 sympy
使用函数
sympy.ntheory.factorint
示例:
您可以因式分解一些非常大的数字:
If you don't want to reinvent the wheel, use the library sympy
Use the function
sympy.ntheory.factorint
Example:
You can factor some very large numbers:
无需使用
primesbelow
计算smallprimes
,请使用smallprimeset
来计算。smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)
将您的
primefactors
分成两个函数来处理smallprimes
和其他的pollard_brent
,这可以节省几次迭代,因为所有smallprimes 的幂都将从n 中除。通过考虑 Pomerance、Selfridge、Wagstaff 和 Jaeschke 的验证结果,您可以减少使用 Miller-Rabin 素性测试的
isprime
中的重复。来自 Wiki。编辑 1:更正了
if-else
的返回调用以追加bigfactors 到质因子中的因子。There is no need to calculate
smallprimes
usingprimesbelow
, usesmallprimeset
for that.smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)
Divide your
primefactors
into two functions for handlingsmallprimes
and other forpollard_brent
, this can save a couple of iterations as all the powers of smallprimes will be divided from n.By considering verified results of Pomerance, Selfridge and Wagstaff and Jaeschke, you can reduce the repetitions in
isprime
which uses Miller-Rabin primality test. From Wiki.Edit 1: Corrected return call of
if-else
to append bigfactors to factors inprimefactors
.即使在目前的情况下,也有几个值得注意的地方。
checker*checker
,而是使用s=ceil(sqrt(num))
和checher
checher
checher
checker*checker
。 sdivmod
而不是%
和//
Even on the current one, there are several spots to be noticed.
checker*checker
every loop, uses=ceil(sqrt(num))
andchecher < s
divmod
instead of%
and//
您可能应该做一些主要检测,您可以在这里查看,
查找素数的快速算法?
不过,您应该阅读整个博客,那里是他列出的一些用于测试素性的算法。
You should probably do some prime detection which you could look here,
Fast algorithm for finding prime numbers?
You should read that entire blog though, there is a few algorithms that he lists for testing primality.
有一个 python 库,其中包含一系列素性测试(包括不该做的错误测试)。它称为 pyprimes。我认为为了后代的目的值得一提。我认为它不包括你提到的算法。
There's a python library with a collection of primality tests (including incorrect ones for what not to do). It's called pyprimes. Figured it's worth mentioning for posterity's purpose. I don't think it includes the algorithms you mentioned.
您可以分解到一个极限,然后使用布伦特来获得更高的因子
You could factorize up to a limit then use brent to get higher factors
非常有趣的问题,谢谢!
今天我决定用Python从头开始为您实现椭圆曲线ECM分解方法。
我试图编写干净且易于理解的代码。您只需将整个函数
FactorECM()
从我的代码复制粘贴到您的代码中,它就可以正常工作,无需任何其他调整或依赖项。由于代码中的一些简化,算法并未高度优化,例如它不使用 多处理 模块使用进程和所有CPU核心。基本上我的算法是单核的单因子数。
我在代码中使用了以下子算法:试除因式分解、费马概率检验,埃拉托斯特尼筛法(素数生成器),欧几里得算法 (计算最大公约数 (GCD)、扩展欧几里得算法(带有 Bezu 系数的 GCD)、模乘逆,从右到左二进制指数(用于椭圆点乘法),椭圆曲线算法(点加法和乘法), Lenstra 椭圆曲线分解。
注意。如果您安装 gmpy2,您可以将我的代码加速 2.5 倍倍通过
python -m pip install gmpy2
命令行模块。但不要忘记取消注释行#import gmpy2
在我的代码中,如果取消注释,则将应用 2.5 倍提升。我使用 gmpy2 是为了更快地实现 gmpy2.gcdext () 函数,它实现模乘逆所需的扩展欧几里得算法。一般来说,ECM 算法如果优化得当(例如用 C++ 编写),就能够分解相当大的数字,即使是最难的 100 位数字(30 位数字),由两个 50 位素数组成,也可以在几秒钟内分解。
我的算法是根据维基百科中的ECM编写的,如下所示:
检查如果数字小于 2^16,则通过 Trial Division 方法将其分解。返回结果。
检查数字是否可能是素数,因为我使用 费马测试 与32 次试验。如果数字是素数,则将其作为唯一因子返回。
随机生成曲线参数
A, X, Y
并从曲线方程Y^2 = X^3 + AX + B (mod N)导出
。检查曲线是否正常,值B
4 * A ** 3 - 27 * B ** 2
应为非零。通过埃拉托斯特尼筛生成小素数,低于我们
界限的素数.每个素数提高到某个小幂,这个提高的素数将被称为
K
。当它比某些Bound2
小时,我会加注力量,在我的例子中是Sqrt(Bound)
。计算椭圆点乘法
P = k * P
,其中 K 取自上一步,P 为 (X, Y)。根据椭圆曲线算术Wiki进行计算。点乘法使用模乘逆,计算
GCD(SomeValue, N )
根据扩展欧几里得算法维基。如果此 GCD 不是 1,则它会给出 N 的重要因子。收集此因子,将其从数字中删除,然后重新运行 ECM 算法(上面的步骤1.-6.
)以获取剩余的数字.如果 Bound 之前的所有素数都相乘并且没有给出任何因子,则使用另一个随机曲线和更大的 Bound 再次重新运行 ECM 分解算法(上面的步骤
1.-6.
)。在我的代码中,我通过将 512 添加到旧边界来获取新边界。请参阅在
Test()
函数中使用分解函数的示例。下面的代码后面是用于分解最难的 72 位随机数(由两个 36 位素数组成)的示例控制台输出。如果您想要不同大小的数字,请将我的代码中的bits = 72
修改为您想要的输入随机数的位大小。在线试用!
控制台输出示例:
Very interesting question, thanks!
Today I decided to implement for you from scratch Elliptic Curve ECM factorization method in Python.
I tried to make clean and easy understandable code. You can just copy-paste whole function
FactorECM()
from my code into your code and it will work without any other tweaking or dependencies.Due to some simplifications in the code algorithm is NOT highly optimized, e.g. it doesn't use multiprocessing module to use processes and all CPU cores. Basically my algorithm is single core for single factored number.
I used following sub-algorithms in my code: Trial Division Factorization, Fermat Probability Test, Sieve of Eratosthenes (prime numbers generator), Euclidean Algorithm (computing Greatest Common Divisor, GCD), Extended Euclidean Algorithm (GCD with Bezu coefficients), Modular Multiplicative Inverse, Right-to-Left Binary Exponentation (for elliptic point multiplication), Elliptic Curve Arithmetics (point addition and multiplication), Lenstra Elliptic Curve Factorization.
NOTE. You can speedup my code 2.5x times if you install gmpy2 module through
python -m pip install gmpy2
command line. But don't forget to uncomment line#import gmpy2
in my code, if uncommented then 2.5x boost will be applied. I used gmpy2 for its considerably faster implementation of gmpy2.gcdext() functions, which implements Extended Euclidean Algorithm needed for Modular Multiplicative Inverse.ECM algorithm in general, if well optimized (for example written in C++), is capable of factoring quite big numbers, even hardest 100-bit number (30 digits), consisting of two 50-bit primes, can be factored within several seconds.
My algorithm is written according to ECM in Wikipedia and is as following:
Check if number is smaller than 2^16, then factor it through Trial Division method. Return result.
Check if number is probably prime with high condifence, for that I use Fermat Test with 32 trials. If number is primes return it as only factor.
Generate curve parameters
A, X, Y
randomly and deriveB
from curve equationY^2 = X^3 + AX + B (mod N)
. Check if curve is OK, value4 * A ** 3 - 27 * B ** 2
should be non-zero.Generate small primes through Sieve of Eratosthenes, primes below our
Bound
. Each prime raise to some small power, this raised prime would be calledK
. I do raising into power while it is smaller than someBound2
, which isSqrt(Bound)
in my case.Compute elliptic point multiplication
P = k * P
, where K taken from previous step and P is (X, Y). Compute according to Elliptic Curve Arithmetics Wiki.Point multiplication uses Modular Multiplicative Inverse, which computes
GCD(SomeValue, N)
according to Extended Euclidean Algorithm Wiki. If this GCD is not 1, then it gives non-trivial factor of N. Collect this factor, remove it from number and re-run ECM algorithm (steps1.-6.
above) for remaining number.If all primes till Bound were multiplied and gave no factor then re-run ECM factorization algorithm (steps
1.-6.
above) again with another random curve and bigger Bound. In my code I take new bound by adding 512 to old bound.See example of usage of my factoring function inside
Test()
function. After code below is located example console output for factoring hardest 72-bit random number (consisting of two 36-bit primes). If you want different size of number than modifybits = 72
in my code to your desired bit size of input random number.Try it online!
Example console output:
我刚刚在分解数字
2**1427 * 31
时遇到了此代码中的错误。此代码片段:
应该重写为
在实际输入上执行得更快。平方根很慢 - 基本上相当于许多乘法 -,
smallprimes
只有几十个成员,这样我们就可以从紧密的内部循环,这在分解像2**1427
这样的数字时肯定很有帮助。根本没有理由计算sqrt(2**1427)
、sqrt(2**1426)
、sqrt(2**1425)
等等,当我们关心的是“检查器的[平方]是否超过n
”时。重写的代码在出现大数字时不会抛出异常,根据
timeit
(在示例输入2
和2**718 * 31
)。另请注意,
isprime(2)
返回错误的结果,但只要我们不依赖它,就可以。恕我直言,你应该将该函数的介绍重写为I just ran into a bug in this code when factoring the number
2**1427 * 31
.This code snippet:
should be rewritten into
which will likely perform faster on realistic inputs anyway. Square root is slow — basically the equivalent of many multiplications —,
smallprimes
only has a few dozen members, and this way we remove the computation ofn ** .5
from the tight inner loop, which is certainly helpful when factoring numbers like2**1427
. There's simply no reason to computesqrt(2**1427)
,sqrt(2**1426)
,sqrt(2**1425)
, etc. etc., when all we care about is "does the [square of the] checker exceedn
".The rewritten code doesn't throw exceptions when presented with big numbers, and is roughly twice as fast according to
timeit
(on sample inputs2
and2**718 * 31
).Also notice that
isprime(2)
returns the wrong result, but this is okay as long as we don't rely on it. IMHO you should rewrite the intro of that function as最多有 1 个因子大于 sqrt(n),因此所有测试直到 n 的数字的方法都太慢。测试直到 sqrt(n),并单独检查最后一个因子。
At most 1 of the factors is bigger than sqrt(n), so all methods that test numbers up to n are too slow. Test up to sqrt(n), and check the last factor separately.