Finding prime factors of large numbers using a specially built CPU

Published 2024-07-29 20:50:04

My understanding is that many public key cryptographic algorithms these days depend on large prime numbers to make up the keys, and it is the difficulty in factoring the product of two primes that makes the encryption hard to break. It is also my understanding that one of the reasons that factoring such large numbers is so difficult, is that the sheer size of the numbers used means that no CPU can efficiently operate on the numbers, since our minuscule 32 and 64 bit CPUs are no match for 1024, 2048 or even 4096 bit numbers. Specialized Big Integer math libraries must be used in order to process those numbers, and those libraries are inherently slow since a CPU can only hold (and process) small chunks (like 32 or 64 bits) at one time.
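The chunked, word-at-a-time arithmetic described above can be sketched in a few lines (an illustrative toy, not any real bignum library's internals):

```python
# Adding two 1024-bit numbers on a machine with 32-bit words means
# walking a list of 32-bit "limbs" and propagating a carry, one word
# at a time -- this is the core loop every big-integer library runs.

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def to_limbs(n, count):
    """Split n into `count` little-endian 32-bit limbs."""
    return [(n >> (WORD_BITS * i)) & MASK for i in range(count)]

def from_limbs(limbs):
    """Reassemble a number from little-endian limbs."""
    return sum(limb << (WORD_BITS * i) for i, limb in enumerate(limbs))

def add_limbs(a, b):
    """Schoolbook multi-word addition with carry propagation."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)    # low 32 bits stay in this limb
        carry = s >> WORD_BITS  # overflow ripples into the next limb
    out.append(carry)
    return out

a, b = 2**1024 - 1, 12345
assert from_limbs(add_limbs(to_limbs(a, 33), to_limbs(b, 33))) == a + b
```

Real libraries such as GMP do the same thing in hand-tuned assembly, but the word-at-a-time structure is identical.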

So...

Why can't you build a highly specialized custom chip with 2048 bit registers, and giant arithmetic circuits, much in the same way that we scaled from 8 to 16 to 32 to 64-bit CPUs, just build one a LOT larger? This chip wouldn't need most of the circuitry on conventional CPUs, after all it wouldn't need to handle things like virtual memory, multithreading or I/O. It wouldn't even need to be a general-purpose processor supporting stored instructions. Just the bare minimum to perform the necessary arithmetical calculations on ginormous numbers.

I don't know a whole lot about IC design, but I do remember learning about how logic gates work, how to build a half adder, full adder, then link together a bunch of adders to do multi-bit arithmetic. Just scale up. A lot.
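The half-adder/full-adder construction can be written out directly (a toy bit-level model, not hardware description language):

```python
# A gate-level sketch of the "just chain adders together" idea: a full
# adder built from AND/XOR/OR, then an n-bit ripple-carry adder. Note
# that the carry must pass through every stage in sequence -- which is
# exactly what limits how fast such a chain can run as n grows.

def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, n):
    """Add two n-bit numbers bit by bit, like chained hardware adders."""
    result, carry = 0, 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```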

Now, I'm fairly certain that there is a very good reason (or 17) that the above won't work (since otherwise one of the many people smarter than I am would have already done it) but I am interested in knowing why it won't work.

(Note: This question may need some re-working, as I'm not even sure yet if the question makes sense)


为人所爱 2024-08-05 20:50:04

What @cube said, and the fact that a giant arithmetic logic unit would take more time for the logic signals to stabilize, and include other complications in digital design. Digital logic design includes something that you take for granted in software, namely that signals through combinational logic take a small but nonzero time to propagate and settle. A 32x32 multiplier needs to be designed carefully. A 1024x1024 multiplier would not only take a huge amount of physical resources in a chip, but it also would be slower than a 32x32 multiplier (though perhaps faster than a 32x32 multiplier computing all the partial products needed to perform a 1024x1024 multiply). Plus it's not only the multiplier that's the bottleneck: you've got memory pathways. You'd have to spend a bunch of time gathering the 1024 bits from a memory circuit that's only 32 bits wide, and storing the resulting 2048 bits back into the memory circuit.
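A back-of-envelope model of that settling-time argument (the per-gate delay figure is purely illustrative, not real silicon data):

```python
# A ripple-carry adder's worst-case settle time grows linearly with
# word width, so a 1024-bit ALU built naively is far slower per
# operation than a 32-bit one, before memory-path costs even enter.

GATE_DELAY_NS = 0.01  # assumed delay per gate stage; illustrative only

def ripple_settle_time(bits, delay=GATE_DELAY_NS):
    """Worst case: the carry passes through ~2 gate stages per bit."""
    return 2 * bits * delay

for width in (32, 64, 1024, 2048):
    print(f"{width:5}-bit ripple adder ~ {ripple_settle_time(width):.2f} ns")
```

Real wide adders use carry-lookahead or carry-save tricks to beat the linear ripple, but the delay still grows with width; it never stays flat.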

Almost certainly it's better to get a bunch of "conventional" 32-bit or 64-bit systems working in parallel: you get the speedup w/o the hardware design complexity.

edit: if anyone has ACM access (I don't), perhaps take a look at this paper to see what it says.

戏剧牡丹亭 2024-08-05 20:50:04

It's because this speedup would only be O(n), but the complexity of factoring the number is something like O(2^n) (with respect to the number of bits). So if you made this überprocessor and factored the numbers 1000 times faster, I would only have to make the numbers 10 bits larger and we would be back at the start again.
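The arithmetic behind that answer, spelled out:

```python
# If factoring cost scales like 2**n in the bit length n, a 1000x
# faster machine is cancelled by adding about 10 bits to the key,
# since 2**10 = 1024 >= 1000.

speedup = 1000
extra_bits = 0
while 2 ** extra_bits < speedup:
    extra_bits += 1
print(extra_bits)  # 10 bits of key growth absorb a 1000x speedup
```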

暮色兮凉城 2024-08-05 20:50:04

As indicated above, the primary problem is simply how many possibilities you have to go through to factor a number. That being said, specialized computers do exist to do this sort of thing.

The real progress for this sort of cryptography is improvements in number factoring algorithms. Currently, the fastest known general algorithm is the general number field sieve.

Historically, we seem to be able to factor numbers twice as large each decade. Part of that is faster hardware, and part of it is simply a better understanding of mathematics and how to perform factoring.
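The sub-exponential growth behind that steady progress: GNFS's heuristic running time for factoring an integer N is

```latex
L_N\!\left[\tfrac{1}{3},\, \sqrt[3]{64/9}\right]
  = \exp\!\left( \left( \sqrt[3]{64/9} + o(1) \right)
    (\ln N)^{1/3} (\ln \ln N)^{2/3} \right)
```

so each added bit still raises the cost, but far more slowly than naive brute-force search would suggest.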

旧情别恋 2024-08-05 20:50:04

I can't comment on the feasibility of an approach exactly like the one you described, but people do similar things very frequently using FPGAs:

夜唯美灬不弃 2024-08-05 20:50:04

Shamir & Tromer suggest a similar approach, using a kind of grid computing:

[circuit diagram comparing adders to TWIRL]

"This article discusses a new design for a custom hardware implementation of the sieving step, which reduces [the cost of sieving, relative to TWINKLE,] to about $10M. The new device, called TWIRL, can be seen as an extension of the TWINKLE device. However, unlike TWINKLE it does not have optoelectronic components, and can thus be manufactured using standard VLSI technology on silicon wafers. The underlying idea is to use a single copy of the input to solve many subproblems in parallel. Since input storage dominates cost, if the parallelization overhead is kept low then the resulting speedup is obtained essentially for free. Indeed, the main challenge lies in achieving this parallelism efficiently while allowing compact storage of the input. Addressing this involves myriad considerations, ranging from number theory to VLSI technology."

嘿咻 2024-08-05 20:50:04

Why don't you try building an uber-quantum computer and run Shor's algorithm on it?

"... If a quantum computer with a sufficient number of qubits were to be constructed, Shor's algorithm could be used to break public-key cryptography schemes such as the widely used RSA scheme. RSA is based on the assumption that factoring large numbers is computationally infeasible. So far as is known, this assumption is valid for classical (non-quantum) computers; no classical algorithm is known that can factor in polynomial time. However, Shor's algorithm shows that factoring is efficient on a quantum computer, so a sufficiently large quantum computer can break RSA. ..." -Wikipedia
