File-based look-up table
You need an array of 10^10 4-byte integers to be used as a look-up table. Loading it into RAM would take 40GB, which isn't feasible. You never need to write to this array after it has been initialized. You need to read individual integer values from random locations of this array concurrently from multiple threads of a single process. You're guaranteed to be on a 64-bit platform. What is the fastest implementation of this look-up table: regular file-reading functions, or, e.g., a Boost memory-mapped file?
2 Answers
It sounds like you should do explicit reads.
Memory mapping gets its speed from bringing in large chunks of pages at a time (I believe Windows uses 256KiB; not sure about other platforms) and letting you re-access them without any penalty after the first touch.
If you're just reading integers from random locations, you'll page in 256KiB just to use 4 bytes of one page, and you may never touch that page again. Such a waste! Also consider that you've just paged out a lot of possibly useful data belonging to other apps and the filesystem cache.
Since the file, once created, only ever needs to be accessed read-only, I wouldn't think you'd want the expense of a memory-mapped file, Boost or otherwise. That would be more useful if you had multiple processes wanting to access the same data concurrently. In your case, you've just got read-only threads, so a simple 40GB file should be the simplest and fastest.