Caching in mmap

Posted 2024-08-30 03:53:10


I am using an mmap call to read from a very big file with simple pointer arithmetic in C++. The problem is that when I read small chunks of data (on the order of KBs) multiple times, each read takes the same amount of time as the previous one. How can I tell whether the disk is being accessed to fulfill my request, or whether the calls after the first one are being served from main memory (the page cache)?
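One direct way to answer this, not taken from the answers, is to ask the kernel which pages of the file are resident. A minimal Linux-only sketch, assuming a non-empty regular file: the hypothetical helper resident_fraction maps the file and calls mincore(2), which fills one byte per page and sets bit 0 when that page is in memory. On Linux 5.0 and later, mincore may report page-cache residency only for files the caller owns or can write to; for other files it reflects only pages already mapped by the calling process.

```cpp
#include <sys/mman.h>   // mmap, munmap, mincore
#include <sys/stat.h>   // fstat
#include <fcntl.h>      // open
#include <unistd.h>     // close, sysconf
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fraction (0.0 to 1.0) of the pages of `path` that are
// currently resident in RAM, or -1.0 if the file cannot be opened, is empty,
// or cannot be mapped/queried.
double resident_fraction(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1.0;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1.0; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1.0;

    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    const std::size_t psz = static_cast<std::size_t>(page);
    const std::size_t npages = (len + psz - 1) / psz;

    // mincore needs a page-aligned address (mmap guarantees that) and one
    // output byte per page of the range.
    std::vector<unsigned char> vec(npages);
    const int rc = mincore(addr, len, vec.data());
    munmap(addr, len);
    if (rc != 0) return -1.0;

    std::size_t resident = 0;
    for (unsigned char v : vec) resident += (v & 1u);  // bit 0 = page resident
    return static_cast<double>(resident) / static_cast<double>(npages);
}
```

Calling it before and after a read shows whether that read could have been satisfied from the page cache; util-linux's fincore command reports the same kind of information from the shell.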


Comments (2)

猥︴琐丶欲为 2024-09-06 03:53:10


The issue was the following: both reads were being served from the cache. My guess is that caching starts when the file is opened or mmapped, before the data is even requested. To verify this, I issued:

echo 3 > /proc/sys/vm/drop_caches

which drops the page cache (and the dentry and inode caches). It must be run as root, and running sync first lets dirty pages be written back so they can be dropped too. After that, if I run two iterations fetching the same data, the first run is (in my case) 10 times slower than the second.
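Timing is an indirect signal. A complementary check, assuming Linux, is to count major page faults: getrusage(2) reports them in ru_majflt as page faults that required I/O activity. The sketch below uses hypothetical helpers (major_faults, touch_pages, measure_two_passes). Instead of dropping the whole cache as above, it asks the kernel through posix_fadvise(POSIX_FADV_DONTNEED) to evict only this file's clean cached pages; that needs no root, but it is only advice and the kernel may keep some pages.

```cpp
#include <sys/mman.h>      // mmap, munmap
#include <sys/resource.h>  // getrusage
#include <sys/stat.h>      // fstat
#include <fcntl.h>         // open, posix_fadvise
#include <unistd.h>        // close, sysconf
#include <cassert>
#include <cstddef>

// Major page faults (faults that needed I/O) taken so far by this process.
long major_faults() {
    struct rusage ru {};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_majflt;
}

// Reads one byte per page of [data, data + len); the checksum keeps the
// compiler from discarding the reads.
unsigned long touch_pages(const unsigned char* data, std::size_t len, std::size_t page) {
    assert(page > 0);
    unsigned long sum = 0;
    for (std::size_t off = 0; off < len; off += page) sum += data[off];
    return sum;
}

struct TwoPasses { long first_majflt; long second_majflt; bool ok; };

// Hypothetical helper repeating the two-iteration experiment for one file.
TwoPasses measure_two_passes(const char* path) {
    TwoPasses r{0, 0, false};
    int fd = open(path, O_RDONLY);
    if (fd < 0) return r;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return r; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);  // len 0 means "to end of file"
    void* addr = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED) return r;
    const auto* data = static_cast<const unsigned char*>(addr);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

    const long m0 = major_faults();
    unsigned long sum = touch_pages(data, len, page);   // first pass
    const long m1 = major_faults();
    sum += touch_pages(data, len, page);                // second, identical pass
    const long m2 = major_faults();
    volatile unsigned long sink = sum;  // keep the checksum observable
    (void)sink;
    munmap(addr, len);

    r.first_majflt = m1 - m0;
    r.second_majflt = m2 - m1;
    r.ok = true;
    return r;
}
```

A nonzero first_majflt means at least part of the first pass had to come from disk. Because of readahead, one major fault can bring in many pages, so the count can be much smaller than the number of pages. The second pass normally shows no major faults, since the pages are by then cached and mapped.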

〗斷ホ乔殘χμё〖 2024-09-06 03:53:10


You will get the best cache performance if you exploit locality of reference: access data that is close together in memory (for example, stepping through it one element at a time in increasing order) and perform those accesses close together in time (without many other memory accesses in between).

If each read takes roughly the same amount of time, the data is very likely being served from cache. When data is not being served from cache, the usual sign is several fast reads (cache hits), then a spike (a cache miss), then more fast reads. That is because on almost all systems a miss loads the whole block containing the requested data (a cache line for CPU caches, at least one page for the page cache), so subsequent accesses to nearby data in the same block are hits.
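The locality effect described in this answer can be sketched by reading the same bytes in two different orders. This is an illustration of CPU-cache locality under stated assumptions, not code from the answer; strided_sum and Pass are hypothetical names. Every byte is read exactly once for any stride, so the checksum is identical and only the access order (and hence the time) differs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Pass {
    unsigned long long sum;            // checksum, identical for every stride
    std::chrono::nanoseconds elapsed;  // wall-clock time of the traversal
};

// Reads every byte of `buf` exactly once. stride == 1 is a plain sequential
// scan (good spatial locality). With a larger stride, consecutive reads are
// `stride` bytes apart and tend to land in different cache lines or pages;
// on a buffer much larger than the cache, the block brought in by one miss
// is likely evicted before the traversal comes back to it.
Pass strided_sum(const std::vector<unsigned char>& buf, std::size_t stride) {
    assert(stride > 0);
    const auto t0 = std::chrono::steady_clock::now();
    unsigned long long sum = 0;
    for (std::size_t start = 0; start < stride && start < buf.size(); ++start)
        for (std::size_t i = start; i < buf.size(); i += stride)
            sum += buf[i];
    const auto t1 = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0)};
}
```

On a buffer of a few hundred MiB, comparing stride 1 with stride 4096 typically shows the strided pass to be noticeably slower; the exact ratio depends on cache sizes and the hardware prefetcher.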
