File-based lookup table

Posted 2025-01-04 19:49:09 · 162 words · 6 views · 0 comments

You need an array of 10^10 4-byte integers to be used as a look-up table. Loading it into RAM would take 40 GB, which isn't feasible. You never need to write to this array after it has been initialized. You need to read individual integer values from random locations in this array concurrently from multiple threads of a single process. You're guaranteed to be on a 64-bit platform. What is the fastest implementation of this look-up table: regular file-reading functions, or something like a Boost memory-mapped file?
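To make the numbers in the question concrete, here is a minimal sketch of the size and offset arithmetic. The constant and function names (`kEntries`, `kEntrySize`, `kTableBytes`, `byte_offset`) are made up for illustration and do not come from the question.

```cpp
#include <cstdint>

// Sizes taken from the question: 10^10 entries of 4 bytes each.
constexpr std::uint64_t kEntries = 10'000'000'000ULL;
constexpr std::uint64_t kEntrySize = sizeof(std::int32_t);

// Total file size in bytes: 4 * 10^10 = 40 GB (roughly 37.25 GiB).
constexpr std::uint64_t kTableBytes = kEntries * kEntrySize;

// Byte offset of entry `index` within the file (hypothetical helper).
constexpr std::uint64_t byte_offset(std::uint64_t index) {
    return index * kEntrySize;
}

// Neither the entry count nor the byte offsets fit in 32 bits,
// so indices and offsets have to be 64-bit values.
static_assert(kTableBytes == 40'000'000'000ULL, "table is 40 GB");
static_assert(kEntries > UINT32_MAX, "entry count exceeds 32 bits");
```

This is why the 64-bit guarantee matters: both a memory mapping of the whole file and a plain file offset need more than 32 bits of range.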

Comments (2)

浅黛梨妆こ 2025-01-11 19:49:09

It sounds like you should do explicit reads.

Memory mapping gets its speed from bringing in large chunks of pages at a time (I believe Windows does 256 KiB, not sure about other platforms) and letting you re-access them without any penalty after the first time.

If you're just reading integers from random locations, you'll be reading in 256 KiB to get just 4 bytes out of one page, and you may never re-access that chunk. Such a waste! Also consider that you've just paged out a lot of possibly useful data from other apps and the filesystem cache.
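As a rough illustration of what "explicit reads" could look like, here is a minimal sketch. It assumes a POSIX system (`open`, `pread`) and a file of native-endian 32-bit integers; the class name `FileTable` and its `at` method are hypothetical. It has not been benchmarked against a memory-mapped version.

```cpp
#include <cstdint>
#include <stdexcept>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical wrapper around a read-only file of 32-bit integers.
class FileTable {
public:
    explicit FileTable(const char* path) : fd_(::open(path, O_RDONLY)) {
        if (fd_ < 0) throw std::runtime_error("open failed");
    }
    ~FileTable() { ::close(fd_); }
    FileTable(const FileTable&) = delete;
    FileTable& operator=(const FileTable&) = delete;

    // Read the entry at `index` with one 4-byte positional read.
    // pread takes the offset as an argument and does not move the
    // descriptor's file position, so several threads can call at()
    // on the same object without locking.
    std::int32_t at(std::uint64_t index) const {
        std::int32_t value = 0;
        const off_t offset = static_cast<off_t>(index * sizeof(value));
        const ssize_t n = ::pread(fd_, &value, sizeof(value), offset);
        if (n != static_cast<ssize_t>(sizeof(value)))
            throw std::runtime_error("short read or I/O error");
        return value;
    }

private:
    int fd_;
};
```

Each lookup asks for exactly 4 bytes. The operating system may still cache whole pages behind the scenes, so whether this beats a mapping on a given machine is something to measure rather than assume.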

一绘本一梦想 2025-01-11 19:49:09

Since you only ever need read-only access once the file is created, I wouldn't think you'd want the expense of a memory-mapped file, Boost or otherwise. That would be more useful if you had multiple processes that wanted to concurrently access the same data. In your case, you've just got read-only threads, so a simple 40 GB file should be the simplest and fastest.
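For the "read-only threads on a simple file" setup, a minimal sketch might look like the following. It assumes POSIX `pread` and C++11 threads; the function name `parallel_lookup` and its parameters are invented for illustration, and the work split (thread t takes every `num_threads`-th index) is just one simple choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <thread>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: fetch the entries at `indices` using `num_threads`
// workers that all share one read-only file descriptor.
std::vector<std::int32_t> parallel_lookup(
        const char* path,
        const std::vector<std::uint64_t>& indices,
        unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    const int fd = ::open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");

    std::vector<std::int32_t> out(indices.size(), 0);
    std::vector<char> ok(num_threads, 1);  // one status slot per thread
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles positions t, t + num_threads, ...
            // Each thread writes only its own elements of `out`.
            for (std::size_t i = t; i < indices.size(); i += num_threads) {
                const off_t off =
                    static_cast<off_t>(indices[i] * sizeof(std::int32_t));
                const ssize_t n =
                    ::pread(fd, &out[i], sizeof(std::int32_t), off);
                if (n != static_cast<ssize_t>(sizeof(std::int32_t))) {
                    ok[t] = 0;
                    return;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    ::close(fd);
    for (char status : ok)
        if (!status) throw std::runtime_error("read failed");
    return out;
}
```

No locking is needed here because the file is never written and `pread` does not touch shared file-position state. In a long-running process you would keep the descriptor open rather than reopening it per batch.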
