Is there a way to check whether the processor cache has been flushed recently?

Posted 2024-11-07 18:42:05

On i386 Linux. Preferably in C/(C/POSIX std libs)/proc if possible. If not, is there any assembly or third-party library that can do it?

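Edit: I'm trying to develop a test of whether a kernel module clears a cache line or the entire processor cache (with wbinvd()). The program runs as root, but I'd prefer to stay in user space if possible.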

固执像三岁 2024-11-14 18:42:05

Cache coherent systems do their utmost to hide such things from you. I think you will have to observe it indirectly, either by using performance counting registers to detect cache misses or by carefully measuring the time to read a memory location with a high resolution timer.
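On Linux, the performance-counting registers are reachable from user space through the perf_event_open(2) system call. A minimal sketch, assuming the kernel and CPU expose the generic PERF_COUNT_HW_CACHE_MISSES event (on most x86 CPUs it maps to last-level cache misses, but the exact event is CPU-specific) and that perf_event_paranoid allows the counter or you run as root. The helper names sys_perf_event_open and count_cache_misses are hypothetical:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: glibc has no perf_event_open(), so go through syscall(2). */
static long
sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                    int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Run fn() once and return the number of hardware cache misses counted for
 * this thread in user mode, or -1 if the counter cannot be opened or read
 * (no PMU access in a VM/container, perf_event_paranoid too strict, ...).
 */
static long long
count_cache_misses(void (*fn)(void))
{
    struct perf_event_attr attr;
    uint64_t count;
    int fd;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CACHE_MISSES; /* generic event, CPU-specific meaning */
    attr.disabled = 1;                        /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;                  /* count user-mode misses only */
    attr.exclude_hv = 1;

    fd = (int)sys_perf_event_open(&attr, 0 /* this thread */, -1 /* any CPU */,
                                  -1 /* no group */, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long long)count;
}
```

For the kernel-module test, fn could re-read a buffer you warmed before triggering the module; a clearly higher count than in a control run without the module suggests the lines were evicted. The counts are noisy, so compare against a baseline rather than expecting zero.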

This program (source code below) works on my x86_64 box to demonstrate the effects of clflush. It uses rdtsc to time how long it takes to read a global variable. rdtsc is well suited to this because it is a single instruction that reads the processor's time-stamp counter directly.

Here is the output:

took 81 ticks
took 81 ticks
flush: took 387 ticks
took 72 ticks

You see four reads. The first ensures i is in the cache (which it already is, because it was just zeroed as part of BSS), and the second is a read of i that should hit in the cache. Then clflush evicts i from the cache (along with its neighbors in the same cache line), and the third read shows that re-reading it takes significantly longer. The final read verifies that i is back in the cache. The results are very reproducible, and the difference is large enough to make the cache misses easy to see. If you cared to calibrate the overhead of rdtsc() itself, you could make the difference even more pronounced.
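One way to do the calibration mentioned above is to measure the smallest gap between two back-to-back rdtsc reads and subtract it from each measurement. A sketch, assuming GCC-style inline asm on x86; taking the minimum over many trials is one common choice rather than the only one, and tsc_read, tsc_overhead and timed_read are hypothetical names:

```c
#include <stdint.h>

/* Read the time-stamp counter: EDX holds the high 32 bits, EAX the low. */
static inline uint64_t
tsc_read(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi) :: "memory");
    return lo | ((uint64_t)hi << 32);
}

/* Overhead of the timing harness: smallest gap between two adjacent reads. */
static uint64_t
tsc_overhead(int trials)
{
    uint64_t best = UINT64_MAX;

    for (int t = 0; t < trials; t++) {
        uint64_t start = tsc_read();
        uint64_t end = tsc_read();
        if (end - start < best)
            best = end - start;
    }
    return best;
}

/* Time one read of *p, minus the overhead; clamp at 0 to avoid unsigned wrap. */
static uint64_t
timed_read(volatile int *p, uint64_t overhead)
{
    uint64_t start = tsc_read();
    int v = *p;              /* the volatile load being measured */
    uint64_t end = tsc_read();
    uint64_t ticks = end - start;

    (void)v;
    return ticks > overhead ? ticks - overhead : 0;
}
```

Calling tsc_overhead(1000) once at startup and passing the result to every timed_read call removes the fixed cost of rdtsc from the cached and flushed timings alike, so their ratio grows.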

If you can't read the memory address you want to test (although even an mmap of /dev/mem should work for these purposes), you may still be able to infer what you want if you know the cache line size and associativity of the cache. You can then use memory locations you do have access to that map to the same cache set to probe activity in the set you're interested in.
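As a sketch of the geometry side of this: Linux reports the line size, associativity and number of sets under /sys/devices/system/cpu/cpuN/cache/indexM/, and the set an address falls into is (address / line size) mod (number of sets). The code assumes index0 of cpu0 is the cache level you care about (check its level and type files) and that the set-index bits lie within the page offset, since user space only sees virtual addresses. cache_attr, cache_set_index and same_set_lines are hypothetical helpers:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one integer attribute of cache index `idx` on cpu0 from sysfs, e.g.
 * "coherency_line_size", "ways_of_associativity" or "number_of_sets".
 * Returns -1 if the file is missing or unreadable.
 */
static long
cache_attr(int idx, const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu0/cache/index%d/%s", idx, name);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Set index that an address maps to, for the given line size and set count. */
static size_t
cache_set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    return (size_t)((addr / line_size) % num_sets);
}

/*
 * Collect up to `max` line-aligned addresses inside buf that map to the same
 * set as `target`. Only meaningful if the set-index bits are inside the page
 * offset (typical for L1 data caches), because these are virtual addresses.
 */
static size_t
same_set_lines(char *buf, size_t len, uintptr_t target,
               size_t line_size, size_t num_sets, char **out, size_t max)
{
    size_t want = cache_set_index(target, line_size, num_sets);
    uintptr_t end = (uintptr_t)buf + len;
    uintptr_t a = ((uintptr_t)buf + line_size - 1) / line_size * line_size;
    size_t n = 0;

    for (; a + line_size <= end && n < max; a += line_size)
        if (cache_set_index(a, line_size, num_sets) == want)
            out[n++] = (char *)a;
    return n;
}
```

For example, with 64-byte lines and 64 sets, addresses 4096 bytes apart share a set, so same_set_lines on a page-aligned buffer yields one line per page; timing reads of those lines shows whether something evicted that set.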

Source code:

(1. If you use a newer GCC, declare the helpers static inline (or use another workaround for C99 inline semantics); with plain inline, a call the compiler does not inline can fail to link.
2. Inspired by a comment: if your CPU may reorder the instruction at runtime, it is better to use asm volatile ("lfence;rdtsc;lfence" : "=a" (a), "=d" (d)::"memory");. Here, the volatile means that no mfence is needed around clflush to ensure that
instructions after clflush can observe its effect.)

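#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>   /* PRIu64: portable printf format for uint64_t */

/* Evict the cache line containing p from the whole cache hierarchy. */
static inline void
clflush(volatile void *p)
{
    asm volatile ("clflush (%0)" :: "r"(p) : "memory");
}

/* Read the time-stamp counter: EDX:EAX hold the high and low 32 bits. */
static inline uint64_t
rdtsc(void)
{
    uint32_t a, d;
    asm volatile ("rdtsc" : "=a" (a), "=d" (d) :: "memory");
    return a | ((uint64_t)d << 32);
}

volatile int i;

/* Time a single read of the global i. */
static void
test(void)
{
    uint64_t start, end;
    volatile int j;

    start = rdtsc();
    j = i;
    end = rdtsc();
    printf("took %" PRIu64 " ticks\n", end - start);
}

int
main(void)
{
    test();             /* warm-up: make sure i is cached */
    test();             /* cached read */
    printf("flush: ");
    clflush(&i);
    test();             /* miss: i was just evicted */
    test();             /* i is cached again */
    return 0;
}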
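Following note 2, the fenced rdtsc can be wrapped in its own helper. This sketch assumes a CPU with SSE2 (LFENCE does not exist on older i386-class processors) and GCC-style inline asm; rdtsc_fenced and timed_read_fenced are hypothetical names:

```c
#include <stdint.h>

/*
 * rdtsc bracketed by lfence: the first lfence waits until earlier
 * instructions have completed, the second keeps later instructions from
 * starting before the counter is read. The "memory" clobber also stops
 * the compiler from moving memory accesses across the asm.
 */
static inline uint64_t
rdtsc_fenced(void)
{
    uint32_t a, d;
    __asm__ __volatile__ ("lfence; rdtsc; lfence"
                          : "=a" (a), "=d" (d) :: "memory");
    return a | ((uint64_t)d << 32);
}

/* Time one read of *p with the fenced counter. */
static uint64_t
timed_read_fenced(volatile int *p)
{
    uint64_t start = rdtsc_fenced();
    int v = *p;                 /* the volatile load being measured */
    uint64_t end = rdtsc_fenced();

    (void)v;
    return end - start;
}
```

Dropping rdtsc_fenced in place of rdtsc in test() above keeps the out-of-order core from overlapping the measured load with the timestamp reads, at the cost of a somewhat larger fixed overhead per measurement.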

小矜持 2024-11-14 18:42:05

I don't know of any general command to get the cache state, but there are a few approaches:

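1. I guess this is the simplest: if you have the kernel module, just disassemble it and look for cache invalidation/flush instructions (at the moment there are only three, I think: WBINVD, CLFLUSH, INVD).
2. You said it is for i386, but I guess you don't mean an actual 80386. The problem is that there are many different extensions and features. For example, the latest Intel families include performance/profiling registers for the cache system, which you can use to evaluate the number of cache misses, hits, transfers, and so on.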
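3. Similar to 2, a lot depends on the system you are on. But if you have a multiprocessor configuration, you could observe the first processor's cache coherence protocol (MESI) traffic from the second processor.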
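For approach 1, objdump -d on the .ko is the reliable way to look for these instructions. If you want to do it from C, a rough sketch is a byte-pattern scan for their encodings: WBINVD is 0F 09, INVD is 0F 08, and CLFLUSH is 0F AE /7 with a memory operand. This is a heuristic, not a disassembler (the same bytes can appear inside other instructions' immediates or displacements), and it misses flushes the module performs by calling a kernel helper instead of inlining the instruction. scan_cache_ops and struct cache_op_counts are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct cache_op_counts {
    size_t wbinvd;   /* 0F 09 */
    size_t invd;     /* 0F 08 */
    size_t clflush;  /* 0F AE /7, memory operand */
};

/*
 * Naive byte-pattern scan for cache-control instructions. Every hit is only
 * a candidate to confirm in a real disassembly. A 66 prefix in front of
 * 0F AE /7 would make it CLFLUSHOPT; this sketch counts that as clflush too.
 */
static struct cache_op_counts
scan_cache_ops(const uint8_t *code, size_t len)
{
    struct cache_op_counts c = {0, 0, 0};

    for (size_t i = 0; i + 1 < len; i++) {
        if (code[i] != 0x0F)
            continue;
        if (code[i + 1] == 0x09) {
            c.wbinvd++;
        } else if (code[i + 1] == 0x08) {
            c.invd++;
        } else if (code[i + 1] == 0xAE && i + 2 < len) {
            uint8_t modrm = code[i + 2];
            /* ModRM reg field 7 selects CLFLUSH; mod == 3 would be SFENCE. */
            if (((modrm >> 3) & 7) == 7 && (modrm >> 6) != 3)
                c.clflush++;
        }
    }
    return c;
}
```

You would read the .ko file (ideally just its .text section) into memory and pass it in, then check each candidate against the objdump output.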

You mentioned WBINVD: as far as I know, it always flushes the complete cache (i.e., all cache lines).
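Since WBINVD empties the whole cache, one user-space check in the spirit of the rdtsc timing approach above is to warm a small buffer, trigger the module, and then count how many of the buffer's lines have become slow to read. A sketch, assuming 64-byte lines, a buffer that fits in the cache, and a threshold you pick from calibration runs with clflush (somewhere between the cached and flushed times you observe); CACHE_LINE, rdtsc_fenced, warm and count_slow_lines are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; see coherency_line_size in sysfs */

/* Fenced timestamp read (x86 with SSE2, GCC-style inline asm). */
static inline uint64_t
rdtsc_fenced(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("lfence; rdtsc; lfence"
                          : "=a" (lo), "=d" (hi) :: "memory");
    return lo | ((uint64_t)hi << 32);
}

/* Touch one byte per line so the buffer is (as far as possible) cached. */
static void
warm(volatile const char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += CACHE_LINE) {
        char v = buf[off];
        (void)v;
    }
}

/*
 * Time a read of each line and count the lines taking at least `threshold`
 * ticks. If nearly every line of a previously warm buffer is now slow, the
 * lines were probably evicted, e.g. by a WBINVD in between.
 */
static size_t
count_slow_lines(volatile const char *buf, size_t len, uint64_t threshold)
{
    size_t slow = 0;

    for (size_t off = 0; off < len; off += CACHE_LINE) {
        uint64_t start = rdtsc_fenced();
        char v = buf[off];
        uint64_t end = rdtsc_fenced();
        (void)v;
        if (end - start >= threshold)
            slow++;
    }
    return slow;
}
```

Reading the lines in order lets the hardware prefetcher hide some of the misses, so a randomized order gives a cleaner signal. Pin the test thread (for example with sched_setaffinity) to the CPU on which the module runs WBINVD, so that the caches you warm are the ones being flushed.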
