Handling TLB misses
I want to see which pages are being accessed by my program. One way is to use mprotect together with a SIGSEGV handler to note down the pages as they are touched (a sketch of this approach is below). However, this involves the overhead of setting protection bits for every memory page I'm interested in.
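To make the idea concrete, here is a minimal sketch of that approach, assuming a single anonymous region is tracked; the region size, the fixed-size recording array, and the test access are illustrative, and real code would need error handling and an async-signal-safe recording scheme:

```c
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 16

static long page_size;
static void *accessed[NPAGES];   /* pages seen so far */
static int naccessed;

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    /* Round the faulting address down to its page and record it. */
    void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1));
    if (naccessed < NPAGES)
        accessed[naccessed++] = page;
    /* Re-enable access so the faulting instruction can be restarted. */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);

    struct sigaction sa = {0};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    /* The region to watch: mapped with no access rights, so the
     * first touch of each page raises SIGSEGV. */
    char *buf = mmap(NULL, NPAGES * page_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    buf[3 * page_size] = 42;   /* faults once, is recorded, then succeeds */
    return naccessed;          /* 1 */
}
```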
The second way that comes to mind is to invalidate the Translation Lookaside Buffer (TLB) at the start and then record the misses: on each miss I would note down the memory page being addressed. The question, then, is how to handle TLB misses in user space in a Linux program.
If you know an even faster method than TLB misses or mprotect for noting down dirtied memory pages, please let me know. Also, I only need a solution for x86.
3 Answers
You can simulate a CPU and collect this data; there are several variants of such simulators. Is this overhead too big for you?
You can't handle a TLB miss yourself, neither in user space nor in kernel space (on x86 and many other popular platforms). This is because most platforms manage TLB misses in hardware: the MMU (part of the CPU/chipset) walks the page tables and fetches the physical address transparently. Only when certain bits are set, or when the address region is not mapped, is a page-fault interrupt generated and delivered to the kernel.
Also, there seems to be no way to dump the TLB on modern CPUs (though the 386DX was able to do this).
You can try to detect a TLB miss by the delay it introduces, but this delay can be hidden by the out-of-order start of the TLB lookup. A timing probe of that kind is sketched below.
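For illustration only, such a probe might look like the following sketch (GCC/Clang x86 inline assembly; the idea of classifying the returned cycle count against a hit/miss threshold is an assumption, and as noted above it is unreliable):

```c
#include <stdint.h>

/* Time a single load with RDTSC; LFENCE keeps the timestamp reads
 * from being reordered around the probed access. */
static inline uint64_t timed_load(const volatile char *p)
{
    uint32_t lo0, hi0, lo1, hi1;
    __asm__ __volatile__("lfence\n\trdtsc"
                         : "=a"(lo0), "=d"(hi0) :: "memory");
    (void)*p;                        /* the probed memory access */
    __asm__ __volatile__("lfence\n\trdtsc"
                         : "=a"(lo1), "=d"(hi1) :: "memory");
    return (((uint64_t)hi1 << 32) | lo1) - (((uint64_t)hi0 << 32) | lo0);
}
/* One would then compare the result against a guessed threshold to
 * call the access a TLB hit or miss, but the measurement is noisy
 * for exactly the out-of-order reasons described above. */
```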
Also, most hardware events (memory accesses, TLB accesses, TLB hits, TLB misses) are counted by the hardware performance-monitoring unit (the part of the CPU used by VTune, CodeAnalyst and oprofile). Unfortunately, these are only global counters for events, and you can't activate more than 2-4 events at the same time. The good news is that you can set a perfmon counter to interrupt when some count is reached. You will then get (via the interrupt) the address of the instruction ($eip) at which the count was reached. So you can find TLB-miss-heavy hot spots with this hardware (it is in every modern x86 CPU, both Intel and AMD); a counting sketch follows.
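On modern Linux, one way to program these counters from user space is the perf_event_open syscall. A minimal counting sketch is below, using the kernel's generic dTLB read-miss cache event (which the kernel maps to the actual PMU event of your CPU); the interrupt-on-count behaviour described above would additionally need attr.sample_period and a mmap'ed ring buffer to read the sampled $eip values, which is how perf record works:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    /* Generic "data-TLB read miss" cache event. */
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = PERF_COUNT_HW_CACHE_DTLB
                | (PERF_COUNT_HW_CACHE_OP_READ << 8)
                | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    /* Count for this process, on any CPU (no glibc wrapper exists). */
    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* ... workload whose dTLB misses we want to count ... */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t misses;
    read(fd, &misses, sizeof(misses));
    printf("dTLB read misses: %llu\n", (unsigned long long)misses);
    return 0;
}
```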
The TLB is transparent to a user-space program; at most, you can count TLB misses with a performance counter, and that gives you counts, not addresses.
Take a look at the /proc/PID/maps file for your process. According to the documentation in http://www.kernel.org/doc/Documentation/filesystems/proc.txt, /proc/PID/maps gives the memory map of each process, which tells you "which pages are being accessed by my program". However, it looks like you want to know which of these are dirty pages. While I am not sure how to find the exact list of dirty pages, you can find how many pages are dirty by summing the Private_Dirty and Shared_Dirty fields in /proc/PID/smaps and dividing by the page size. Note that this method is quite fast, and I believe an approximate idea of which pages are dirty can be obtained by polling /proc/PID/smaps periodically; a sketch is below.
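For instance, a quick sketch of the counting step (the field names are as they appear in smaps; reading the calling process's own file and the arithmetic are illustrative):

```c
/* Sum the Private_Dirty/Shared_Dirty fields of /proc/self/smaps
 * to estimate how many of this process's pages are dirty. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[256];
    long dirty_kb = 0, kb;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1 ||
            sscanf(line, "Shared_Dirty: %ld kB", &kb) == 1)
            dirty_kb += kb;   /* smaps reports sizes in kB */
    }
    fclose(f);

    long page_kb = sysconf(_SC_PAGESIZE) / 1024;
    printf("~%ld dirty pages\n", dirty_kb / page_kb);
    return 0;
}
```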