Which (OS X) dtrace probe fires when a page is faulted in from disk?
I'm writing up a document about page faulting and am trying to get some concrete numbers to work with, so I wrote up a simple program that reads 12*1024*1024 bytes of data. Easy:
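    #include <stdio.h>

    int main(void)
    {
        FILE *in = fopen("data.bin", "rb");
        int i;
        int total = 0;

        if (in == NULL)
            return 1;
        /* Read the whole 12 MiB file one byte at a time. */
        for (i = 0; i < 1024*1024*12; i++)
            total += fgetc(in);
        printf("%d\n", total);
        fclose(in);
        return 0;
    }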
So yes, it goes through and reads the entire file. The issue is that I need the dtrace probe that is going to fire 1536 times during this process (12M/8k). Even if I count all of the fbt:mach_kernel:vm_fault*: probes and all of the vminfo::: probes, I don't hit 500, so I know I'm not finding the right probes.
Anyone know where I can find the dtrace probes that fire when a page is faulted in from disk?
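The 1536 figure above comes from dividing the file size by an assumed 8 KiB page. As a small sketch of that arithmetic, the helpers below (the names `pages_spanned` and `system_page_size` are made up for illustration) compute the page count for any page size and ask the running system for its real page size via `sysconf(_SC_PAGESIZE)`, which may or may not be 8 KiB on a given machine.

```c
#include <stddef.h>
#include <unistd.h>

/* Number of pages needed to hold `bytes` bytes, rounding up. */
size_t pages_spanned(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Page size reported by the running system (not assumed to be 8 KiB). */
size_t system_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}
```

Calling `pages_spanned(12 * 1024 * 1024, system_page_size())` gives the number of pages the test file covers on the machine at hand, which is an upper bound on per-page faults rather than a prediction.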
UPDATE:
On the off chance that the issue was that there was some intelligent pre-fetching going on in the stdio functions, I tried the following:
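    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int in = open("data.bin", O_RDONLY | O_NONBLOCK);
        int i;
        int total = 0;
        char buf[128];

        if (in < 0)
            return 1;
        /* One read(2) system call per byte, bypassing stdio buffering. */
        for (i = 0; i < 1024*1024*12; i++)
        {
            read(in, buf, 1);
            total += buf[0];
        }
        printf("%d\n", total);
        close(in);
        return 0;
    }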
This version takes MUCH longer to run (42s real time, 10s of which was user and the rest was system time - page faults, I'm guessing) but still generates one fifth as many faults as I would expect.
For the curious, the time increase is not due to loop overhead and casting (char to int). A version of the code that does just those things takes 0.07 seconds.
Comments (3)
Not a direct answer, but it seems you are equating disk reads with page faults. They are not necessarily the same thing. In your code you are reading data from a file into a small chunk of user memory, so the I/O system can read the file into the buffer/VM cache in whatever way and size it sees fit. I might be wrong here; I don't know how Darwin does this.
I think a more reliable test would be to mmap(2) the whole file into process memory and then touch each page in that space.
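The mmap(2) test suggested above could look roughly like the following. This is a minimal sketch, not a verified benchmark: the function name `touch_mapped_pages` is invented for illustration, and the page size is taken from `sysconf(_SC_PAGESIZE)` rather than assumed.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `path` read-only and read one byte from every page.
 * Returns the number of pages touched, or -1 on error.
 * The byte sum is stored in *sum so the reads are not optimized away. */
long touch_mapped_pages(const char *path, unsigned long *sum)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long touched = 0;
    unsigned long total = 0;
    for (size_t off = 0; off < len; off += page) {
        total += p[off];            /* first access to a page may fault */
        touched++;
    }

    munmap(p, len);
    *sum = total;
    return touched;
}
```

Running a dtrace fault count around a call such as `touch_mapped_pages("data.bin", &sum)` isolates faults on the mapping from read(2) traffic, though the kernel may still bring in several pages per fault.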
I was down the same rathole recently. I don't have my DTrace scripts or test programs available just now, but I will give you the following advice:
1.) Get your hands on OS X Internals by Amit Singh and read section 8.3 on virtual memory (this will get you in the right frame of reference for selecting DTrace probes).
2.) Get your hands on Solaris Performance and Tools by Brendan Gregg / Jim Mauro. Read the section on virtual memory and pay close attention to the example DTrace scripts that make use of the vminfo provider.
3.) OS X is definitely prefetching large chunks of pages from the filesystem, and your test program is playing right into this optimization (since you're reading sequentially). Interestingly, this is not the case for Solaris. Try randomly accessing the big array to defeat the prefetch.
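The advice in point 3 to access the array randomly can be sketched as below. It assumes the data is already addressable through a pointer (for example an mmap'd file); `shuffle_pages` and `touch_in_order` are hypothetical helper names, and whether a shuffled order really defeats the prefetcher on a given OS is something to measure, not something this code guarantees.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill order[0..n-1] with a pseudo-random permutation of 0..n-1
 * (Fisher-Yates shuffle). The seed makes runs repeatable. */
void shuffle_pages(size_t *order, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;      /* 0 <= j < i */
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* Read one byte from each page of `base` in the given order and
 * return the sum, so the compiler cannot drop the loads. */
unsigned long touch_in_order(const unsigned char *base, size_t page_size,
                             const size_t *order, size_t n)
{
    unsigned long total = 0;
    for (size_t k = 0; k < n; k++)
        total += base[order[k] * page_size];
    return total;
}
```

Every page is still visited exactly once, so the fault count can be compared directly with a sequential pass over the same data.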
The assumption that the operating system will fault in each and every page that's being touched as a separate operation (and that therefore, if you touch N pages, you'll see the DTrace probe fire N times) is flawed; most UN*Xes will perform some sort of readahead or pre-faulting, and you're very unlikely to get exactly the same number of calls as you have pages. This is so even if you use mmap() directly.
The exact ratio may also depend on the filesystem, as readahead and page clustering implementations and thresholds are unlikely to be the same for all of them.
You can probably force a per-page fault policy if you use mmap directly and then apply madvise(MADV_DONTNEED) or similar, and/or purge the entire range with msync(MS_INVALIDATE).