Is there a way to check whether the processor cache has recently been flushed?
On i386 Linux. Preferably in C (using the C/POSIX standard libraries) or via /proc if possible. If not, is there any piece of assembly or a third-party library that can do this?
Edit: I'm trying to develop a test of whether a kernel module clears a single cache line or the whole processor cache (with wbinvd()). The program runs as root, but I'd prefer to stay in user space if possible.
Comments (3)
Cache-coherent systems do their utmost to hide such things from you. I think you will have to observe it indirectly, either by using performance-counting registers to detect cache misses or by carefully measuring the time to read a memory location with a high-resolution timer.
This program works on my x86_64 box to demonstrate the effects of clflush. It times how long it takes to read a global variable using rdtsc. Being a single instruction tied directly to the CPU clock makes direct use of rdtsc ideal for this.
Here is the output:
You see 3 trials: The first ensures i is in the cache (which it is, because it was just zeroed as part of BSS), the second is a read of i that should be in the cache. Then clflush kicks i out of the cache (along with its neighbors) and shows that re-reading it takes significantly longer. A final read verifies it is back in the cache. The results are very reproducible and the difference is substantial enough to easily see the cache misses. If you cared to calibrate the overhead of rdtsc() you could make the difference even more pronounced.
If you can't read the memory address you want to test (although even mmap of /dev/mem should work for these purposes), you may be able to infer what you want if you know the cache line size and associativity of the cache. Then you can use accessible memory locations to probe the activity in the set you're interested in.
Source code:
(1. Use static inline, or the other methods referenced here, if using a newer gcc. 2. Inspired by the comment, better use
asm volatile ("lfence;rdtsc;lfence" : "=a" (a), "=d" (d)::"memory");
if your CPU may reorder the instruction at runtime. Here volatile implies there is no need for mfence around clflush to ensure that instructions after clflush can observe its effect.)
I don't know of any generic command to get the cache state, but there are ways:
You mentioned WBINVD. AFAIK that will always flush the complete cache, i.e. all cache lines.
It may not be an answer to your specific question, but have you tried using a cache profiler such as Cachegrind? It can only be used to profile userspace code, but you might be able to use it nonetheless, e.g. by moving the code of your function to userspace if it does not depend on any kernel-specific interfaces.
It might actually be more effective than trying to ask the processor for information that may or may not exist and that will probably be affected by your mere asking about it. Yes, Heisenberg was way before his time :-)