Using the page file for caching?
I have to deal with a huge amount of data that usually doesn't fit into main memory. The way I access this data has high locality, so caching parts of it in memory looks like a good option. Is it feasible to just malloc() a huge array, and let the operating system figure out which bits to page out and which bits to keep?
Comments (2)
Assuming the data comes from a file, you're better off memory mapping that file. Otherwise, what you end up doing is allocating your array, and then copying the data from your file into the array -- and since your array is mapped to the page file, you're basically just copying the original file to the page file, and in the process polluting the "cache" (i.e., physical memory) so other data that's currently active has a much better chance of being evicted. Then, when you're done you (typically) write the data back from the array to the original file, which (in this case) means copying from the page file back to the original file.
Memory mapping the file just creates some address space and maps it directly to the original file. This avoids copying data from the original file to the page file (and back again when you're done), as well as temporarily moving data into physical memory on the way from the original file to the page file. The biggest win, of course, comes if there are substantial pieces of the original file that you never use (in which case they may never be read into physical memory, assuming the unused chunk is at least a page in size).
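To make the mapping approach concrete, here is a minimal sketch, assuming a POSIX system with a 64-bit address space and an existing, non-empty data file. The names `struct mapped_file`, `map_file`, and `unmap_file` are hypothetical helpers made up for illustration; only `open`, `fstat`, `mmap`, `msync`, `munmap`, and `close` are real calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a file mapped into the address space. */
struct mapped_file {
    void  *data;  /* start of the mapping */
    size_t size;  /* length of the mapping in bytes */
};

/* Map an existing, non-empty file read/write.
   Returns 0 on success, -1 on failure. */
int map_file(const char *path, struct mapped_file *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED: pages are backed by the file itself, so modified
       pages are written back to the file rather than to swap. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Flush modified pages to the file and release the mapping.
   Returns 0 on success, -1 on failure. */
int unmap_file(struct mapped_file *m)
{
    assert(m != NULL);

    int rc = 0;
    if (msync(m->data, m->size, MS_SYNC) < 0)
        rc = -1;
    if (munmap(m->data, m->size) < 0)
        rc = -1;

    m->data = NULL;
    m->size = 0;
    return rc;
}
```

After `map_file` succeeds, `data` can be treated as an ordinary array of `size` bytes. Pages are read from the file only when they are first touched, and the kernel decides which ones stay resident, which is the behavior the question hoped to get from a huge `malloc()`.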
If the data is in a large file, look into using mmap to read it. Modern computers have so much RAM that you might not have enough swap space available to back a huge malloc()'d array, whereas a mapped file is backed by the file itself.