极大整数乘法和加法

发布于 2024-09-04 09:48:55 字数 296 浏览 4 评论 0原文

您好,

我需要将存储在文本文件中的两个极长整数值相乘(通过 GMP(准确地说是 MPIR)导出,因此它们可以是任何基数中的任何值)。现在,我通常只是通过 mpz_inp_str() 函数导入这些整数并在 RAM 中执行乘法,但是,这些值太长,以至于我无法真正加载它们(每个值大约 1 GB 数据)。最快的方法是什么?也许已经有一些外部库可以做这种事情了?是否有任何易于实现的方法(性能并不是非常重要,因为此操作只会执行一次或两次)?

tl;dr:我需要将值相乘,以至于它们不适合进程内存限制(Windows)。

感谢您抽出时间。

Greetings,

I need to multiply two extremely long integer values stored in a text file (exported via GMP (MPIR, to be exact), so they can be any in any base). Now, I would usually just import these integers via the mpz_inp_str() function and perform the multiplication in RAM, however, these values are so long that I can't really load them (about 1 GB of data each). What would be the fastest way to do this? Perhaps there are some external libraries that do this sort of thing already? Are there any easily implementable methods for this (performance is not incredibly important as this operation would only be performed once or twice)?

tl;dr: I need to multiply values so large they don't fit into process memory limits (Windows).

Thank you for your time.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

我不咬妳我踢妳 2024-09-11 09:48:55

我不知道是否有一个库支持这一点,但您可以在每个真正大数字(RBN)的部分上使用 GMP/MPIR。也就是说,首先将每个 RBN 分解为可管理的、大小统一的块(例如 10M 数字块,预计最高有效数字的块尺寸较小,另见下文)。

    RBN1 --> A B C D E
    RBN2 --> F G H I J

分块可以以 10 为基数进行,因此只需读取即可。每件文件中的字符。然后将每个数字的块一次相乘一个。

     AxF BxF CxF DxF ExF
    +    AxG BxG CxG DxG ExG
    +        AxH BxH CxH DxH ExH
    +            AxI BxI CxI DxI ExI
    +                AxJ BxJ CxJ DxJ ExJ

执行内存中最终总和的每一列。然后,将进位保留在内存中,将该列写入磁盘,对下一列重复...对于进位,将每列总和结果用 GMP 转换为字符串,写出底部的。结果的一部分,并将顶部部分作为进位的 GMP int 读回。

我建议为每个乘法动态选择块大小,以便将每个列加法保留在内存中;数字越大,需要添加的列越多,块大小需要越小。

对于读取和写入,我建议使用内存映射文件,boost 有一个很好的接口 this (请注意,这不会加载整个文件,它只是基本上在虚拟内存上缓冲 IO)。为每个输入 RBN 数字打开一个映射文件,并打开一个大小 = size(RBN1) + size(RBN2) + 1 的输出;对于内存映射文件,文件访问被视为原始 char*,因此您可以使用 gmp c-string io 方法直接读取/写入块。您可能需要读入中间缓冲区,以便为 GMP 提供以 NULL 结尾的字符串(除非您想临时更改内存映射文件)。

这并不是很容易正确实现,但话又说回来,这不是一个特别简单的问题(也许只是乏味)。这种方法的优点是它准确地反映了 GMP 在内存中所做的事情,因此算法是众所周知的。

I don't know if there is a library that supports this, but you could use GMP/MPIR on parts of each really big number (RBN). That is, start by breaking each RBN into manageable, uniformly sized chunks (e.g. 10M digit chunks, expect an undersized chunk for most significant digits, also see below).

    RBN1 --> A B C D E
    RBN2 --> F G H I J

The chunking can be done in base 10, so just read <chuck_size> characters from the file for each piece. Then multiply chunks from each number one at a time.

     AxF BxF CxF DxF ExF
    +    AxG BxG CxG DxG ExG
    +        AxH BxH CxH DxH ExH
    +            AxI BxI CxI DxI ExI
    +                AxJ BxJ CxJ DxJ ExJ

Perform each column of the final sum in memory. Then, keeping the carry in memory, write the column out to disk, repeat for next column... For carries, convert each column sum result to a string with GMP, write out the bottom <chunk size> portion of the result and read the top portion back in as a GMP int for the carry.

I'd suggest selecting a chunk size dynamically for each multiplication in order to keep each column addition in memory; the larger the numbers, the more column additions will need to be done, the smaller the chunk size will need to be.

For both reading and writing, I'd suggest using memory mapped files, boost has a nice interface for this (note that this does not load the entire file, it just basically buffers the IO on virtual memory). Open one mapped file for each input RBN numbers, and one output with size = size(RBN1) + size(RBN2) + 1; With memory mapped files, file access is treated as a raw char*, so you can read/write chunks directly using gmp c-string io methods. You will probably need to read into an intermediate buffer in order to NULL terminated strings for GMP (unless you want to temporarily alter the memory mapped file).

This isn't very easy to implement correctly, but then again this isn't a particularly easy problem (maybe just tedious). This approach has the advantage that it exactly mirrors what GMP is doing in memory, so the algorithms are well known.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文