快速素因数分解模块

发布于 2024-10-10 20:28:09 字数 4492 浏览 10 评论 0原文

我正在寻找一种实现清晰的算法,用于在Python、伪代码或其他易于阅读的内容中获取N的素因数。有一些要求/限制:

  • N 介于 1 和 ~20 位数字之间
  • 没有预先计算的查找表,但记忆化也可以
  • 不需要经过数学证明(例如,如果需要,可以依赖哥德巴赫猜想) )
  • 不需要精确,如果需要的话可以是概率性/确定性的

我需要一个快速素因数分解算法,不仅为了它本身,而且为了在许多其他算法中使用,比如计算欧拉 phi(n)

我尝试过维基百科等的其他算法,但要么我无法理解它们(ECM),要么我无法从该算法(Pollard-Brent)创建一个有效的实现。

我对 Pollard-Brent 算法非常感兴趣,所以任何更多关于它的信息/实现都会非常好。

谢谢!

编辑

经过一番折腾后,我创建了一个相当快的素数/因式分解模块。它结合了优化的试除算法、Pollard-Brent 算法、米勒-拉宾素性测试和我在互联网上找到的最快的素数筛。 gcd 是常规 Euclid 的 GCD 实现(二进制 Euclid 的 GCD 比常规的慢得多)。

赏金

哦,高兴,可以获得赏金!但我怎样才能赢呢?

  • 在我的模块中查找优化或错误。
  • 提供替代/更好的算法/实现。

最完整/有建设性的答案将获得赏金。

最后是模块本身:

import random

def primesbelow(N):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]

smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
    # http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
    if n < 1:
        raise ValueError("Out of bounds, first argument must be > 0")
    elif n <= 3:
        return n >= 2
    elif n % 2 == 0:
        return False
    elif n < _smallprimeset:
        return n in smallprimeset


    d = n - 1
    s = 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for repeat in range(precision):
        a = random.randrange(2, n - 2)
        x = pow(a, d, n)
    
        if x == 1 or x == n - 1: continue
    
        for r in range(s - 1):
            x = pow(x, 2, n)
            if x == 1: return False
            if x == n - 1: break
        else: return False

    return True

# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
    if n % 2 == 0: return 2
    if n % 3 == 0: return 3

    y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
    g, r, q = 1, 1, 1
    while g == 1:
        x = y
        for i in range(r):
            y = (pow(y, 2, n) + c) % n

        k = 0
        while k < r and g==1:
            ys = y
            for i in range(min(m, r-k)):
                y = (pow(y, 2, n) + c) % n
                q = q * abs(x-y) % n
            g = gcd(q, n)
            k += m
        r *= 2
    if g == n:
        while True:
            ys = (pow(ys, 2, n) + c) % n
            g = gcd(abs(x - ys), n)
            if g > 1:
                break

    return g

smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
    factors = []

    for checker in smallprimes:
        while n % checker == 0:
            factors.append(checker)
            n //= checker
        if checker > n: break

    if n < 2: return factors

    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
        factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()

    return factors

def factorization(n):
    factors = {}
    for p1 in primefactors(n):
        try:
            factors[p1] += 1
        except KeyError:
            factors[p1] = 1
    return factors

totients = {}
def totient(n):
    if n == 0: return 1

    try: return totients[n]
    except KeyError: pass

    tot = 1
    for p, exp in factorization(n).items():
        tot *= (p - 1)  *  p ** (exp - 1)

    totients[n] = tot
    return tot

def gcd(a, b):
    if a == b: return a
    while b > 0: a, b = b, a % b
    return a

def lcm(a, b):
    return abs((a // gcd(a, b)) * b)

I am looking for an implementation or clear algorithm for getting the prime factors of N in either python, pseudocode or anything else well-readable. There are a few requirements/constraints:

  • N is between 1 and ~20 digits
  • No pre-calculated lookup table, memoization is fine though
  • Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
  • Need not to be precise, is allowed to be probabilistic/deterministic if needed

I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).

I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't create a working implementation from the algorithm (Pollard-Brent).

I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.

Thanks!

EDIT

After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is much slower then the regular one).

Bounty

Oh joy, a bounty can be acquired! But how can I win it?

  • Find an optimization or bug in my module.
  • Provide alternative/better algorithms/implementations.

The answer which is the most complete/constructive gets the bounty.

And finally the module itself:

import random

def primesbelow(N):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]

smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
    # http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
    if n < 1:
        raise ValueError("Out of bounds, first argument must be > 0")
    elif n <= 3:
        return n >= 2
    elif n % 2 == 0:
        return False
    elif n < _smallprimeset:
        return n in smallprimeset


    d = n - 1
    s = 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for repeat in range(precision):
        a = random.randrange(2, n - 2)
        x = pow(a, d, n)
    
        if x == 1 or x == n - 1: continue
    
        for r in range(s - 1):
            x = pow(x, 2, n)
            if x == 1: return False
            if x == n - 1: break
        else: return False

    return True

# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
    if n % 2 == 0: return 2
    if n % 3 == 0: return 3

    y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
    g, r, q = 1, 1, 1
    while g == 1:
        x = y
        for i in range(r):
            y = (pow(y, 2, n) + c) % n

        k = 0
        while k < r and g==1:
            ys = y
            for i in range(min(m, r-k)):
                y = (pow(y, 2, n) + c) % n
                q = q * abs(x-y) % n
            g = gcd(q, n)
            k += m
        r *= 2
    if g == n:
        while True:
            ys = (pow(ys, 2, n) + c) % n
            g = gcd(abs(x - ys), n)
            if g > 1:
                break

    return g

smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
    factors = []

    for checker in smallprimes:
        while n % checker == 0:
            factors.append(checker)
            n //= checker
        if checker > n: break

    if n < 2: return factors

    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
        factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()

    return factors

def factorization(n):
    factors = {}
    for p1 in primefactors(n):
        try:
            factors[p1] += 1
        except KeyError:
            factors[p1] = 1
    return factors

totients = {}
def totient(n):
    if n == 0: return 1

    try: return totients[n]
    except KeyError: pass

    tot = 1
    for p, exp in factorization(n).items():
        tot *= (p - 1)  *  p ** (exp - 1)

    totients[n] = tot
    return tot

def gcd(a, b):
    if a == b: return a
    while b > 0: a, b = b, a % b
    return a

def lcm(a, b):
    return abs((a // gcd(a, b)) * b)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

披肩女神 2024-10-17 20:28:09

如果您不想重新发明轮子,请使用库 sympy

pip install sympy

使用函数 sympy.ntheory.factorint

给定一个正整数nfactorint(n)返回一个包含
n 作为键的质因数及其各自的重数
作为价值观。例如:

示例:

>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}

您可以因式分解一些非常大的数字:

>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}

If you don't want to reinvent the wheel, use the library sympy

pip install sympy

Use the function sympy.ntheory.factorint

Given a positive integer n, factorint(n) returns a dict containing
the prime factors of n as keys and their respective multiplicities
as values. For example:

Example:

>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}

You can factor some very large numbers:

>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}
裂开嘴轻声笑有多痛 2024-10-17 20:28:09

无需使用 primesbelow 计算 smallprimes,请使用 smallprimeset 来计算。

smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)

将您的 primefactors 分成两个函数来处理 smallprimes 和其他的 pollard_brent,这可以节省几次迭代,因为所有smallprimes 的幂都将从n 中除。

def primefactors(n, sort=False):
    factors = []

    limit = int(n ** .5) + 1
    for checker in smallprimes:
        print smallprimes[-1]
        if checker > limit: break
        while n % checker == 0:
            factors.append(checker)
            n //= checker


    if n < 2: return factors
    else : 
        factors.extend(bigfactors(n,sort))
        return factors

def bigfactors(n, sort = False):
    factors = []
    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) 
        factors.extend(bigfactors(factor,sort)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()    
    return factors

通过考虑 Pomerance、Selfridge、Wagstaff 和 Jaeschke 的验证结果,您可以减少使用 Miller-Rabin 素性测试的 isprime 中的重复。来自 Wiki

  • 如果 n < 1,373,653,测试a=2和3就足够了;
  • 如果 n < 9,080,191,测试a=31和73就足够了;
  • 如果 n < 4,759,123,141,测试a=2、7、61就足够了;
  • 如果 n < 2,152,302,898,747,测试a = 2,3,5,7,11就足够了;
  • 如果 n < 3,474,749,660,383,测试a = 2,3,5,7,11,13就足够了;
  • 如果 n < 341,550,071,728,321,测试 a = 2, 3, 5, 7, 11, 13, 和 17 就足够了。

编辑 1:更正了 if-else 的返回调用以追加bigfactors 到质因子中的因子。

There is no need to calculate smallprimes using primesbelow, use smallprimeset for that.

smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)

Divide your primefactors into two functions for handling smallprimes and other for pollard_brent, this can save a couple of iterations as all the powers of smallprimes will be divided from n.

def primefactors(n, sort=False):
    factors = []

    limit = int(n ** .5) + 1
    for checker in smallprimes:
        print smallprimes[-1]
        if checker > limit: break
        while n % checker == 0:
            factors.append(checker)
            n //= checker


    if n < 2: return factors
    else : 
        factors.extend(bigfactors(n,sort))
        return factors

def bigfactors(n, sort = False):
    factors = []
    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) 
        factors.extend(bigfactors(factor,sort)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()    
    return factors

By considering verified results of Pomerance, Selfridge and Wagstaff and Jaeschke, you can reduce the repetitions in isprime which uses Miller-Rabin primality test. From Wiki.

  • if n < 1,373,653, it is enough to test a = 2 and 3;
  • if n < 9,080,191, it is enough to test a = 31 and 73;
  • if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
  • if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
  • if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
  • if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.

Edit 1: Corrected return call of if-else to append bigfactors to factors in primefactors.

南风起 2024-10-17 20:28:09

即使在目前的情况下,也有几个值得注意的地方。

  1. 不要在每个循环都执行 checker*checker,而是使用 s=ceil(sqrt(num))checher checher checher checker*checker 。 s
  2. checher 每次应加 2,忽略除 2 之外的所有偶数
  3. 使用 divmod 而不是 %//

Even on the current one, there are several spots to be noticed.

  1. Don't do checker*checker every loop, use s=ceil(sqrt(num)) and checher < s
  2. checher should plus 2 each time, ignore all even numbers except 2
  3. Use divmod instead of % and //
只想待在家 2024-10-17 20:28:09

您可能应该做一些主要检测,您可以在这里查看,
查找素数的快速算法?

不过,您应该阅读整个博客,那里是他列出的一些用于测试素性的算法。

You should probably do some prime detection which you could look here,
Fast algorithm for finding prime numbers?

You should read that entire blog though, there is a few algorithms that he lists for testing primality.

ぺ禁宫浮华殁 2024-10-17 20:28:09

有一个 python 库,其中包含一系列素性测试(包括不该做的错误测试)。它称为 pyprimes。我认为为了后代的目的值得一提。我认为它不包括你提到的算法。

There's a python library with a collection of primality tests (including incorrect ones for what not to do). It's called pyprimes. Figured it's worth mentioning for posterity's purpose. I don't think it includes the algorithms you mentioned.

近箐 2024-10-17 20:28:09

您可以分解到一个极限,然后使用布伦特来获得更高的因子

from fractions import gcd
from random import randint

def brent(N):
   if N%2==0: return 2
   y,c,m = randint(1, N-1),randint(1, N-1),randint(1, N-1)
   g,r,q = 1,1,1
   while g==1:             
       x = y
       for i in range(r):
          y = ((y*y)%N+c)%N
       k = 0
       while (k<r and g==1):
          ys = y
          for i in range(min(m,r-k)):
             y = ((y*y)%N+c)%N
             q = q*(abs(x-y))%N
          g = gcd(q,N)
          k = k + m
       r = r*2
   if g==N:
       while True:
          ys = ((ys*ys)%N+c)%N
          g = gcd(abs(x-ys),N)
          if g>1:  break
   return g

def factorize(n1):
    if n1==0: return []
    if n1==1: return [1]
    n=n1
    b=[]
    p=0
    mx=1000000
    while n % 2 ==0 : b.append(2);n//=2
    while n % 3 ==0 : b.append(3);n//=3
    i=5
    inc=2
    while i <=mx:
       while n % i ==0 : b.append(i); n//=i
       i+=inc
       inc=6-inc
    while n>mx:
      p1=n
      while p1!=p:
          p=p1
          p1=brent(p)
      b.append(p1);n//=p1 
    if n!=1:b.append(n)   
    return sorted(b)

from functools import reduce
#n= 2**1427 * 31 #
n= 67898771  * 492574361 * 10000223 *305175781* 722222227*880949 *908909
li=factorize(n)
print (li)
print (n - reduce(lambda x,y :x*y ,li))

You could factorize up to a limit then use brent to get higher factors

from fractions import gcd
from random import randint

def brent(N):
   if N%2==0: return 2
   y,c,m = randint(1, N-1),randint(1, N-1),randint(1, N-1)
   g,r,q = 1,1,1
   while g==1:             
       x = y
       for i in range(r):
          y = ((y*y)%N+c)%N
       k = 0
       while (k<r and g==1):
          ys = y
          for i in range(min(m,r-k)):
             y = ((y*y)%N+c)%N
             q = q*(abs(x-y))%N
          g = gcd(q,N)
          k = k + m
       r = r*2
   if g==N:
       while True:
          ys = ((ys*ys)%N+c)%N
          g = gcd(abs(x-ys),N)
          if g>1:  break
   return g

def factorize(n1):
    if n1==0: return []
    if n1==1: return [1]
    n=n1
    b=[]
    p=0
    mx=1000000
    while n % 2 ==0 : b.append(2);n//=2
    while n % 3 ==0 : b.append(3);n//=3
    i=5
    inc=2
    while i <=mx:
       while n % i ==0 : b.append(i); n//=i
       i+=inc
       inc=6-inc
    while n>mx:
      p1=n
      while p1!=p:
          p=p1
          p1=brent(p)
      b.append(p1);n//=p1 
    if n!=1:b.append(n)   
    return sorted(b)

from functools import reduce
#n= 2**1427 * 31 #
n= 67898771  * 492574361 * 10000223 *305175781* 722222227*880949 *908909
li=factorize(n)
print (li)
print (n - reduce(lambda x,y :x*y ,li))
记忆で 2024-10-17 20:28:09

非常有趣的问题,谢谢!

今天我决定用Python从头开始为您实现椭圆曲线ECM分解方法。

我试图编写干净且易于理解的代码。您只需将整个函数 FactorECM() 从我的代码复制粘贴到您的代码中,它就可以正常工作,无需任何其他调整或依赖项。

由于代码中的一些简化,算法并未高度优化,例如它不使用 多处理 模块使用进程和所有CPU核心。基本上我的算法是单核的单因子数。

我在代码中使用了以下子算法:试除因式分解费马概率检验埃拉托斯特尼筛法(素数生成器),欧几里得算法 (计算最大公约数 (GCD)、扩展欧几里得算法(带有 Bezu 系数的 GCD)、模乘逆从右到左二进制指数(用于椭圆点乘法),椭圆曲线算法(点加法和乘法), Lenstra 椭圆曲线分解

注意。如果您安装 gmpy2,您可以将我的代码加速 2.5 倍倍通过 python -m pip install gmpy2 命令行模块。但不要忘记取消注释行 #import gmpy2 在我的代码中,如果取消注释,则将应用 2.5 倍提升。我使用 gmpy2 是为了更快地实现 gmpy2.gcdext () 函数,它实现模乘逆所需的扩展欧几里得算法。

一般来说,ECM 算法如果优化得当(例如用 C++ 编写),就能够分解相当大的数字,即使是最难的 100 位数字(30 位数字),由两个 50 位素数组成,也可以在几秒钟内分解。

我的算法是根据维基百科中的ECM编写的,如下所示:

  1. 检查如果数字小于 2^16,则通过 Trial Division 方法将其分解。返回结果。

  2. 检查数字是否可能是素数,因为我使用 费马测试 与32 次试验。如果数字是素数,则将其作为唯一因子返回。

  3. 随机生成曲线参数A, X, Y并从曲线方程Y^2 = X^3 + AX + B (mod N)导出B。检查曲线是否正常,值 4 * A ** 3 - 27 * B ** 2 应为非零。

  4. 通过埃拉托斯特尼筛生成小素数,低于我们界限的素数.每个素数提高到某个小幂,这个提高的素数将被称为K。当它比某些 Bound2 小时,我会加注力量,在我的例子中是 Sqrt(Bound)


  5. 计算椭圆点乘法P = k * P,其中 K 取自上一步,P 为 (X, Y)。根据椭圆曲线算术Wiki进行计算。

  6. 点乘法使用模乘逆,计算GCD(SomeValue, N ) 根据扩展欧几里得算法维基。如果此 GCD 不是 1,则它会给出 N 的重要因子。收集此因子,将其从数字中删除,然后重新运行 ECM 算法(上面的步骤 1.-6.)以获取剩余的数字.

  7. 如果 Bound 之前的所有素数都相乘并且没有给出任何因子,则使用另一个随机曲线和更大的 Bound 再次重新运行 ECM 分解算法(上面的步骤 1.-6.)。在我的代码中,我通过将 512 添加到旧边界来获取新边界。

请参阅在 Test() 函数中使用分解函数的示例。下面的代码后面是用于分解最难的 72 位随机数(由两个 36 位素数组成)的示例控制台输出。如果您想要不同大小的数字,请将我的代码中的 bits = 72 修改为您想要的输入随机数的位大小。

在线试用!

def FactorECM(N0, *, verbose = False):
    # https://en.wikipedia.org/wiki/Lenstra_elliptic-curve_factorization
    import math, random, time; gmpy2 = None
    #import gmpy2
    def Factor_TrialDiv(x):
        # https://en.wikipedia.org/wiki/Trial_division
        fs = []
        while (x & 1) == 0:
            fs.append(2)
            x >>= 1
        for d in range(3, x + 1, 2):
            if d * d > x:
                break
            while x % d == 0:
                fs.append(d)
                x //= d
        if x > 1:
            fs.append(x)
        return sorted(fs)
    def IsProbablyPrime_Fermat(n, trials = 32):
        # https://en.wikipedia.org/wiki/Fermat_primality_test
        for i in range(trials):
            if pow(random.randint(2, n - 2), n - 1, n) != 1:
                return False
        return True
    def GenPrimes_SieveOfEratosthenes(end):
        # https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
        composite = [False] * end
        for p in range(2, int(math.sqrt(end) + 3)):
            if composite[p]:
                continue
            for i in range(p * p, end, p):
                composite[i] = True
        for p in range(2, end):
            if not composite[p]:
                yield p
    def GCD(a, b):
        # https://en.wikipedia.org/wiki/Euclidean_algorithm
        while b != 0:
            a, b = b, a % b
        return a
    def EGCD(a, b):
        # https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
        if gmpy2 is None:
            ro, r, so, s = a, b, 1, 0
            while r != 0:
                ro, (q, r) = r, divmod(ro, r)
                so, s = s, so - q * s
            return ro, so, (ro - so * a) // b
        else:
            return tuple(map(int, gmpy2.gcdext(a, b)))
    def ModularInverse(a, n):
        # https://en.wikipedia.org/wiki/Modular_multiplicative_inverse
        g, s, t = EGCD(a, n)
        if g != 1:
            raise ValueError(a)
        return s % n
    def EllipticCurveAdd(N, A, B, X0, Y0, X1, Y1):
        # https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication
        if X0 == X1 and Y0 == Y1:
            # Double
            l = ((3 * X0 ** 2 + A) * ModularInverse(2 * Y0, N)) % N
            x = (l ** 2 - 2 * X0) % N
            y = (l * (X0 - x) - Y0) % N
        else:
            # Add
            l = ((Y1 - Y0) * ModularInverse(X1 - X0, N)) % N
            x = (l ** 2 - X0 - X1) % N
            y = (l * (X0 - x) - Y0) % N
        return x, y
    def EllipticCurveMul(N, A, B, X, Y, k):
        # https://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method
        assert k >= 2, k
        k -= 1
        BX, BY = X, Y
        while k != 0:
            if k & 1:
                X, Y = EllipticCurveAdd(N, A, B, X, Y, BX, BY)
            BX, BY = EllipticCurveAdd(N, A, B, BX, BY, BX, BY)
            k >>= 1
        return X, Y
    
    bound_start = 1 << 9
    
    def Main(N, *, bound = bound_start, icurve = 0):
        def NextFactorECM(x):
            return Main(x, bound = bound + bound_start, icurve = icurve + 1)
        def PrimePow(p, *, bound2 = int(math.sqrt(bound) + 1.01)):
            mp = p
            while True:
                mp *= p
                if mp >= bound2:
                    return mp // p
        
        if N < (1 << 16):
            fs = Factor_TrialDiv(N)
            if verbose and len(fs) >= 2:
                print('Factors from TrialDiv:', fs, flush = True)
            return fs
        
        if IsProbablyPrime_Fermat(N):
            return [N]
        
        if verbose:
            print(f'Curve {icurve:>4},  bound 2^{math.log2(bound):>7.3f}', flush = True)
        
        while True:
            X, Y, A = [random.randrange(N) for i in range(3)]
            B = (Y ** 2 - X ** 3 - A * X) % N
            if 4 * A ** 3 - 27 * B ** 2 == 0:
                continue
            break
        
        for p in GenPrimes_SieveOfEratosthenes(bound):
            k = PrimePow(p)
            try:
                X, Y = EllipticCurveMul(N, A, B, X, Y, k)
            except ValueError as ex:
                g = GCD(ex.args[0], N)
                assert g > 1, g
                if g != N:
                    if verbose:
                        print('Factor from ECM:', g, flush = True)
                    return sorted(NextFactorECM(g) + NextFactorECM(N // g))
                else:
                    return NextFactorECM(N)
        
        return NextFactorECM(N)
    
    if verbose:
        print(f'ECM factoring N: {N0} (2^{math.log2(max(1, N0)):.2f})', flush = True)
        tb = time.time()
        fs = Main(N0)
        tb = time.time() - tb
        print(f'Factored N {N0}: {fs}')
        print(f'Time {tb:.3f} sec.', flush = True)
        return fs
    else:
        return Main(N0)

def Test():
    import random
    def IsProbablyPrime_Fermat(n, trials = 32):
        for i in range(trials):
            if pow(random.randint(2, n - 2), n - 1, n) != 1:
                return False
        return True
    def RandPrime(bits):
        while True:
            p = random.randrange(1 << (bits - 1), 1 << bits) | 1
            if IsProbablyPrime_Fermat(p):
                return p
    bits = 72
    N = RandPrime((bits + 1) // 2) * RandPrime(bits // 2) * random.randrange(1 << 16)
    FactorECM(N, verbose = True)

if __name__ == '__main__':
    Test()

控制台输出示例:

ECM factoring N: 25005272280974861996134424 (2^84.37)
Curve    0,  bound 2^  9.000
Factor from ECM: 2
Curve    1,  bound 2^ 10.000
Factor from ECM: 4
Factors from TrialDiv: [2, 2]
Curve    2,  bound 2^ 10.585
Factor from ECM: 1117
Curve    3,  bound 2^ 11.000
Curve    4,  bound 2^ 11.322
Curve    5,  bound 2^ 11.585
Curve    6,  bound 2^ 11.807
Factor from ECM: 54629318837
Factored N 25005272280974861996134424: [2, 2, 2, 1117, 51222720707, 54629318837]
Time 0.281 sec.

Very interesting question, thanks!

Today I decided to implement for you from scratch Elliptic Curve ECM factorization method in Python.

I tried to make clean and easy understandable code. You can just copy-paste whole function FactorECM() from my code into your code and it will work without any other tweaking or dependencies.

Due to some simplifications in the code algorithm is NOT highly optimized, e.g. it doesn't use multiprocessing module to use processes and all CPU cores. Basically my algorithm is single core for single factored number.

I used following sub-algorithms in my code: Trial Division Factorization, Fermat Probability Test, Sieve of Eratosthenes (prime numbers generator), Euclidean Algorithm (computing Greatest Common Divisor, GCD), Extended Euclidean Algorithm (GCD with Bezu coefficients), Modular Multiplicative Inverse, Right-to-Left Binary Exponentation (for elliptic point multiplication), Elliptic Curve Arithmetics (point addition and multiplication), Lenstra Elliptic Curve Factorization.

NOTE. You can speedup my code 2.5x times if you install gmpy2 module through python -m pip install gmpy2 command line. But don't forget to uncomment line #import gmpy2 in my code, if uncommented then 2.5x boost will be applied. I used gmpy2 for its considerably faster implementation of gmpy2.gcdext() functions, which implements Extended Euclidean Algorithm needed for Modular Multiplicative Inverse.

ECM algorithm in general, if well optimized (for example written in C++), is capable of factoring quite big numbers, even hardest 100-bit number (30 digits), consisting of two 50-bit primes, can be factored within several seconds.

My algorithm is written according to ECM in Wikipedia and is as following:

  1. Check if number is smaller than 2^16, then factor it through Trial Division method. Return result.

  2. Check if number is probably prime with high condifence, for that I use Fermat Test with 32 trials. If number is primes return it as only factor.

  3. Generate curve parameters A, X, Y randomly and derive B from curve equation Y^2 = X^3 + AX + B (mod N). Check if curve is OK, value 4 * A ** 3 - 27 * B ** 2 should be non-zero.

  4. Generate small primes through Sieve of Eratosthenes, primes below our Bound. Each prime raise to some small power, this raised prime would be called K. I do raising into power while it is smaller than some Bound2, which is Sqrt(Bound) in my case.

  5. Compute elliptic point multiplication P = k * P, where K taken from previous step and P is (X, Y). Compute according to Elliptic Curve Arithmetics Wiki.

  6. Point multiplication uses Modular Multiplicative Inverse, which computes GCD(SomeValue, N) according to Extended Euclidean Algorithm Wiki. If this GCD is not 1, then it gives non-trivial factor of N. Collect this factor, remove it from number and re-run ECM algorithm (steps 1.-6. above) for remaining number.

  7. If all primes till Bound were multiplied and gave no factor then re-run ECM factorization algorithm (steps 1.-6. above) again with another random curve and bigger Bound. In my code I take new bound by adding 512 to old bound.

See example of usage of my factoring function inside Test() function. After code below is located example console output for factoring hardest 72-bit random number (consisting of two 36-bit primes). If you want different size of number than modify bits = 72 in my code to your desired bit size of input random number.

Try it online!

def FactorECM(N0, *, verbose = False):
    # https://en.wikipedia.org/wiki/Lenstra_elliptic-curve_factorization
    import math, random, time; gmpy2 = None
    #import gmpy2
    def Factor_TrialDiv(x):
        # https://en.wikipedia.org/wiki/Trial_division
        fs = []
        while (x & 1) == 0:
            fs.append(2)
            x >>= 1
        for d in range(3, x + 1, 2):
            if d * d > x:
                break
            while x % d == 0:
                fs.append(d)
                x //= d
        if x > 1:
            fs.append(x)
        return sorted(fs)
    def IsProbablyPrime_Fermat(n, trials = 32):
        # https://en.wikipedia.org/wiki/Fermat_primality_test
        for i in range(trials):
            if pow(random.randint(2, n - 2), n - 1, n) != 1:
                return False
        return True
    def GenPrimes_SieveOfEratosthenes(end):
        # https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
        composite = [False] * end
        for p in range(2, int(math.sqrt(end) + 3)):
            if composite[p]:
                continue
            for i in range(p * p, end, p):
                composite[i] = True
        for p in range(2, end):
            if not composite[p]:
                yield p
    def GCD(a, b):
        # https://en.wikipedia.org/wiki/Euclidean_algorithm
        while b != 0:
            a, b = b, a % b
        return a
    def EGCD(a, b):
        # https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
        if gmpy2 is None:
            ro, r, so, s = a, b, 1, 0
            while r != 0:
                ro, (q, r) = r, divmod(ro, r)
                so, s = s, so - q * s
            return ro, so, (ro - so * a) // b
        else:
            return tuple(map(int, gmpy2.gcdext(a, b)))
    def ModularInverse(a, n):
        # https://en.wikipedia.org/wiki/Modular_multiplicative_inverse
        g, s, t = EGCD(a, n)
        if g != 1:
            raise ValueError(a)
        return s % n
    def EllipticCurveAdd(N, A, B, X0, Y0, X1, Y1):
        # https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication
        if X0 == X1 and Y0 == Y1:
            # Double
            l = ((3 * X0 ** 2 + A) * ModularInverse(2 * Y0, N)) % N
            x = (l ** 2 - 2 * X0) % N
            y = (l * (X0 - x) - Y0) % N
        else:
            # Add
            l = ((Y1 - Y0) * ModularInverse(X1 - X0, N)) % N
            x = (l ** 2 - X0 - X1) % N
            y = (l * (X0 - x) - Y0) % N
        return x, y
    def EllipticCurveMul(N, A, B, X, Y, k):
        # https://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method
        assert k >= 2, k
        k -= 1
        BX, BY = X, Y
        while k != 0:
            if k & 1:
                X, Y = EllipticCurveAdd(N, A, B, X, Y, BX, BY)
            BX, BY = EllipticCurveAdd(N, A, B, BX, BY, BX, BY)
            k >>= 1
        return X, Y
    
    bound_start = 1 << 9
    
    def Main(N, *, bound = bound_start, icurve = 0):
        def NextFactorECM(x):
            return Main(x, bound = bound + bound_start, icurve = icurve + 1)
        def PrimePow(p, *, bound2 = int(math.sqrt(bound) + 1.01)):
            mp = p
            while True:
                mp *= p
                if mp >= bound2:
                    return mp // p
        
        if N < (1 << 16):
            fs = Factor_TrialDiv(N)
            if verbose and len(fs) >= 2:
                print('Factors from TrialDiv:', fs, flush = True)
            return fs
        
        if IsProbablyPrime_Fermat(N):
            return [N]
        
        if verbose:
            print(f'Curve {icurve:>4},  bound 2^{math.log2(bound):>7.3f}', flush = True)
        
        while True:
            X, Y, A = [random.randrange(N) for i in range(3)]
            B = (Y ** 2 - X ** 3 - A * X) % N
            if 4 * A ** 3 - 27 * B ** 2 == 0:
                continue
            break
        
        for p in GenPrimes_SieveOfEratosthenes(bound):
            k = PrimePow(p)
            try:
                X, Y = EllipticCurveMul(N, A, B, X, Y, k)
            except ValueError as ex:
                g = GCD(ex.args[0], N)
                assert g > 1, g
                if g != N:
                    if verbose:
                        print('Factor from ECM:', g, flush = True)
                    return sorted(NextFactorECM(g) + NextFactorECM(N // g))
                else:
                    return NextFactorECM(N)
        
        return NextFactorECM(N)
    
    if verbose:
        print(f'ECM factoring N: {N0} (2^{math.log2(max(1, N0)):.2f})', flush = True)
        tb = time.time()
        fs = Main(N0)
        tb = time.time() - tb
        print(f'Factored N {N0}: {fs}')
        print(f'Time {tb:.3f} sec.', flush = True)
        return fs
    else:
        return Main(N0)

def Test():
    import random
    def IsProbablyPrime_Fermat(n, trials = 32):
        for i in range(trials):
            if pow(random.randint(2, n - 2), n - 1, n) != 1:
                return False
        return True
    def RandPrime(bits):
        while True:
            p = random.randrange(1 << (bits - 1), 1 << bits) | 1
            if IsProbablyPrime_Fermat(p):
                return p
    bits = 72
    N = RandPrime((bits + 1) // 2) * RandPrime(bits // 2) * random.randrange(1 << 16)
    FactorECM(N, verbose = True)

if __name__ == '__main__':
    Test()

Example console output:

ECM factoring N: 25005272280974861996134424 (2^84.37)
Curve    0,  bound 2^  9.000
Factor from ECM: 2
Curve    1,  bound 2^ 10.000
Factor from ECM: 4
Factors from TrialDiv: [2, 2]
Curve    2,  bound 2^ 10.585
Factor from ECM: 1117
Curve    3,  bound 2^ 11.000
Curve    4,  bound 2^ 11.322
Curve    5,  bound 2^ 11.585
Curve    6,  bound 2^ 11.807
Factor from ECM: 54629318837
Factored N 25005272280974861996134424: [2, 2, 2, 1117, 51222720707, 54629318837]
Time 0.281 sec.
芯好空 2024-10-17 20:28:09

我刚刚在分解数字 2**1427 * 31 时遇到了此代码中的错误。

  File "buckets.py", line 48, in prettyprime
    factors = primefactors.primefactors(n, sort=True)
  File "/private/tmp/primefactors.py", line 83, in primefactors
    limit = int(n ** .5) + 1
OverflowError: long int too large to convert to float

此代码片段:

limit = int(n ** .5) + 1
for checker in smallprimes:
    if checker > limit: break
    while n % checker == 0:
        factors.append(checker)
        n //= checker
        limit = int(n ** .5) + 1
        if checker > limit: break

应该重写为

for checker in smallprimes:
    while n % checker == 0:
        factors.append(checker)
        n //= checker
    if checker > n: break

在实际输入上执行得更快。平方根很慢 - 基本上相当于许多乘法 -,smallprimes 只有几十个成员,这样我们就可以从紧密的内部循环,这在分解像 2**1427 这样的数字时肯定很有帮助。根本没有理由计算 sqrt(2**1427)sqrt(2**1426)sqrt(2**1425)等等,当我们关心的是“检查器的[平方]是否超过n”时。

重写的代码在出现大数字时不会抛出异常,根据 timeit(在示例输入 22**718 * 31)。

另请注意,isprime(2) 返回错误的结果,但只要我们不依赖它,就可以。恕我直言,你应该将该函数的介绍重写为

if n <= 3:
    return n >= 2
...

I just ran into a bug in this code when factoring the number 2**1427 * 31.

  File "buckets.py", line 48, in prettyprime
    factors = primefactors.primefactors(n, sort=True)
  File "/private/tmp/primefactors.py", line 83, in primefactors
    limit = int(n ** .5) + 1
OverflowError: long int too large to convert to float

This code snippet:

limit = int(n ** .5) + 1
for checker in smallprimes:
    if checker > limit: break
    while n % checker == 0:
        factors.append(checker)
        n //= checker
        limit = int(n ** .5) + 1
        if checker > limit: break

should be rewritten into

for checker in smallprimes:
    while n % checker == 0:
        factors.append(checker)
        n //= checker
    if checker > n: break

which will likely perform faster on realistic inputs anyway. Square root is slow — basically the equivalent of many multiplications —, smallprimes only has a few dozen members, and this way we remove the computation of n ** .5 from the tight inner loop, which is certainly helpful when factoring numbers like 2**1427. There's simply no reason to compute sqrt(2**1427), sqrt(2**1426), sqrt(2**1425), etc. etc., when all we care about is "does the [square of the] checker exceed n".

The rewritten code doesn't throw exceptions when presented with big numbers, and is roughly twice as fast according to timeit (on sample inputs 2 and 2**718 * 31).

Also notice that isprime(2) returns the wrong result, but this is okay as long as we don't rely on it. IMHO you should rewrite the intro of that function as

if n <= 3:
    return n >= 2
...
零度℉ 2024-10-17 20:28:09

最多有 1 个因子大于 sqrt(n),因此所有测试直到 n 的数字的方法都太慢。测试直到 sqrt(n),并单独检查最后一个因子。

def prime_factors(n: int) -> dict:
    factors = {}
    i = 2
    while i * i <= n:
        if not n % i:
            factors[i] = factors.get(i, 0) + 1
            n //= i
        else:
            i += 1
    factors[n] = factors.get(n, 0) + 1
    return factors

At most 1 of the factors is bigger than sqrt(n), so all methods that test numbers up to n are too slow. Test up to sqrt(n), and check the last factor separately.

def prime_factors(n: int) -> dict:
    factors = {}
    i = 2
    while i * i <= n:
        if not n % i:
            factors[i] = factors.get(i, 0) + 1
            n //= i
        else:
            i += 1
    factors[n] = factors.get(n, 0) + 1
    return factors
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文