Is there a way to check whether the processor cache has been recently flushed?

Posted 2024-11-07 18:42:05

On i386 Linux. Preferably in C (with the C/POSIX standard libraries) or through /proc, if possible. If not, is there any piece of assembly or a third-party library that can do this?

Edit: I'm trying to develop a test of whether a kernel module flushes a single cache line or the whole processor cache (with wbinvd()). The program runs as root, but I'd prefer to stay in user space if possible.

Comments (3)

固执像三岁 2024-11-14 18:42:05

Cache coherent systems do their utmost to hide such things from you. I think you will have to observe it indirectly, either by using performance counting registers to detect cache misses or by carefully measuring the time to read a memory location with a high resolution timer.

This program works on my x86_64 box to demonstrate the effects of clflush. It uses rdtsc to time how long it takes to read a global variable. rdtsc is a single instruction tied directly to the CPU clock, which makes using it directly ideal for this.

Here is the output:

took 81 ticks
took 81 ticks
flush: took 387 ticks
took 72 ticks

The output shows four reads. The first ensures i is in the cache (which it is, because it was just zeroed as part of BSS); the second is a read of i that should hit the cache. Then clflush kicks i out of the cache (along with its neighbors in the same cache line), and the third read shows that re-reading it takes significantly longer. A final read verifies that i is back in the cache. The results are very reproducible, and the difference is large enough to see the cache miss easily. If you cared to calibrate the overhead of rdtsc(), you could make the difference even more pronounced.

If you can't read the memory address you want to test (although even an mmap of /dev/mem should work for these purposes), you may be able to infer what you want if you know the cache line size and the associativity of the cache. You can then use accessible memory locations to probe the activity in the cache set you're interested in.
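To make the idea of "the set you're interested in" concrete, here is a minimal sketch of how an address maps to a cache set in a simple set-associative cache. The helper names and the geometry (64-byte lines, 64 sets, as in a 32 KiB 8-way L1 data cache) are assumptions for illustration, not values taken from the post. Real caches are usually indexed by physical address, and larger ones may hash addresses across slices, so treat this as a model rather than a description of any particular CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: set index of an address in a simple set-associative
 * cache. line_size and num_sets are assumed parameters of the cache. */
static size_t cache_set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    /* Drop the offset within the line, then wrap around the number of sets. */
    return (size_t)((addr / line_size) % num_sets);
}

/* Hypothetical helper: true if two addresses compete for the same set. */
static bool same_cache_set(uintptr_t a, uintptr_t b,
                           size_t line_size, size_t num_sets)
{
    return cache_set_index(a, line_size, num_sets) ==
           cache_set_index(b, line_size, num_sets);
}
```

Under these assumptions, addresses that are 64 * 64 = 4096 bytes apart fall into the same set, so timing reads of your own memory at that stride can reveal whether lines in that set were evicted.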

Source code:

(1. Declare the helper functions static inline when building with a newer gcc; a plain inline function may not get an out-of-line definition, and linking can then fail with undefined references.
2. Inspired by a comment: if your CPU may reorder instructions at run time, it is better to use asm volatile ("lfence;rdtsc;lfence" : "=a" (a), "=d" (d)::"memory");. According to this note, volatile here means no mfence is needed around clflush for the instructions after clflush to observe its effect.)

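#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Evict the cache line containing p from the cache hierarchy. */
static inline void
clflush(volatile void *p)
{
    asm volatile ("clflush (%0)" :: "r"(p) : "memory");
}

/* Read the 64-bit time stamp counter (low half in EAX, high half in EDX). */
static inline uint64_t
rdtsc(void)
{
    uint32_t a, d;
    asm volatile ("rdtsc" : "=a" (a), "=d" (d));
    return a | ((uint64_t)d << 32);
}

volatile int i;

/* Time a single read of the global i and print the elapsed ticks. */
static inline void
test(void)
{
    uint64_t start, end;
    volatile int j;

    start = rdtsc();
    j = i;
    end = rdtsc();
    (void)j;
    /* PRIu64 keeps the format correct on both i386 and x86_64. */
    printf("took %" PRIu64 " ticks\n", end - start);
}

int
main(void)
{
    test();                 /* warm-up: brings i into the cache */
    test();                 /* cached read */
    printf("flush: ");
    clflush(&i);
    test();                 /* read right after the flush: expect a miss */
    test();                 /* i is cached again */
    return 0;
}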
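
The two refinements mentioned in the text, fencing rdtsc and calibrating its overhead, could be sketched as follows. This is a variation on the listing, not the author's code: the function names are made up for illustration, the mfence after clflush is a conservative choice of mine, and the code assumes an x86 CPU with SSE2 (for lfence and mfence) and a GNU-compatible compiler.

```c
#include <stdint.h>

/* rdtsc bracketed by lfence so earlier and later instructions cannot drift
 * across the timestamp read (the variant suggested in note 2). */
static inline uint64_t rdtsc_fenced(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("lfence\n\trdtsc\n\tlfence"
                      : "=a"(lo), "=d"(hi) :: "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Flush the line containing p, then fence so the flush has completed
 * before any later timed access (conservative assumption). */
static inline void clflush_fenced(const volatile void *p)
{
    __asm__ volatile ("clflush (%0)\n\tmfence" :: "r"(p) : "memory");
}

/* Smallest gap seen between two back-to-back timestamp reads; an estimate
 * of the measurement overhead. Returns UINT64_MAX if samples <= 0. */
static uint64_t rdtsc_overhead(int samples)
{
    uint64_t best = UINT64_MAX;
    for (int k = 0; k < samples; k++) {
        uint64_t t0 = rdtsc_fenced();
        uint64_t t1 = rdtsc_fenced();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Ticks taken by one read of *p, including the measurement overhead. */
static uint64_t timed_read(const volatile int *p)
{
    uint64_t t0 = rdtsc_fenced();
    int sink = *p;              /* the access being timed */
    uint64_t t1 = rdtsc_fenced();
    (void)sink;
    return t1 - t0;
}
```

Subtracting rdtsc_overhead(1000) from timed_read(&i) before and after clflush_fenced(&i) should widen the relative gap between a hit and a miss, though the exact numbers depend on the machine and on whether it is virtualized.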
小矜持 2024-11-14 18:42:05

I don't know of any generic command to get the cache state, but there are ways:

  1. I guess this is the easiest: if you have the kernel module, just disassemble it and look for cache invalidation/flushing instructions (at the moment only three come to mind: WBINVD, CLFLUSH, INVD).
  2. You said it is for i386, but I guess you don't mean an 80386. The problem is that there are many different processors with different extensions and features. E.g. the newest Intel series includes performance/profiling registers for the cache system, which you can use to evaluate cache misses, hits, number of transfers and similar.
  3. Similar to 2, and very dependent on the system you have: in a multiprocessor configuration you could use the cache coherence protocol (MESI) to watch the first processor's cache from the second.

You mentioned WBINVD: as far as I know, it always flushes the complete cache, i.e. all cache lines.
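
On Linux, the performance counters from item 2 can be reached from user space through the perf_event_open system call. The following is a rough sketch under several assumptions: count_cache_misses, touch_buffer and PROBE_BYTES are hypothetical names, the generic PERF_COUNT_HW_CACHE_MISSES event typically counts last-level cache misses only, and the call may be refused (the function then returns -1) depending on kernel settings, permissions or virtualization.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define PROBE_BYTES 4096    /* hypothetical workload size */

/* Hypothetical workload: read one byte from each assumed 64-byte line. */
static void touch_buffer(void *arg)
{
    const volatile unsigned char *buf = arg;
    for (size_t off = 0; off < PROBE_BYTES; off += 64) {
        unsigned char sink = buf[off];
        (void)sink;
    }
}

/* Run fn(arg) and return the number of cache misses the calling thread
 * caused in user mode, or -1 if the counter could not be opened or read. */
static long long count_cache_misses(void (*fn)(void *), void *arg)
{
    struct perf_event_attr attr;
    long long count = -1;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;          /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;    /* count user-mode events only */
    attr.exclude_hv = 1;

    /* pid = 0 (this thread), cpu = -1 (any CPU), no group, no flags. */
    int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
}
```

Comparing the count for the same workload before and after the kernel module runs gives an indirect signal of whether the lines were flushed; the counts are noisy, so repeated runs are advisable.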

缱绻入梦 2024-11-14 18:42:05

It may not be an answer to your specific question, but have you tried using a cache profiler such as Cachegrind? It can only be used to profile userspace code, but you might be able to use it nonetheless, by e.g. moving the code of your function to userspace if it does not depend on any kernel-specific interfaces.

It might actually be more effective than trying to ask the processor for information that may or may not exist and that will probably be affected by your merely asking about it. Yes, Heisenberg was way ahead of his time :-)
