
Flushing the iCache on x86

Posted on 2024-12-11 04:18:51

Is there any way I can flush the iCache on the x86 architecture? Something like WBINVD, which invalidates and flushes all the cache lines in the data cache.


暮年慕年 2024-12-18 04:18:51


According to the docs, wbinvd flushes and invalidates all caches, not just data and unified caches. (I'm not sure if that includes TLBs if you ran it with paging enabled.)

What are you trying to test? L1i miss / L2 hit for code-fetch? I don't think it's possible to purposely flush just the I-cache without also flushing all levels of cache.

You could create conflict misses for a specific line by executing code at 8 addresses that alias it, assuming an 8-way 32 KiB L1i cache. (With 64-byte lines, that geometry has 64 sets, so aliasing addresses are 4 KiB apart.) But cache replacement is usually pseudo-LRU, not true LRU, so you might want to jump through a set of more than 8 aliasing lines a couple of times to make sure.
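To make the aliasing arithmetic concrete, here is a minimal sketch in C. The 8-way 32 KiB geometry comes from the paragraph above; the 64-byte line size and the helper names l1i_set_index and l1i_aliases are assumptions for illustration only. It just computes addresses that map to the same set, it does not place or execute code there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 32 KiB, 8-way L1i (from the text) with 64-byte lines
 * (an assumption; check CPUID on the real machine). */
enum {
    L1I_SIZE     = 32 * 1024,
    L1I_WAYS     = 8,
    LINE_SIZE    = 64,
    L1I_SETS     = L1I_SIZE / (L1I_WAYS * LINE_SIZE), /* 64 sets */
    ALIAS_STRIDE = L1I_SETS * LINE_SIZE               /* 4096 bytes */
};

/* Set index that an address maps to under the assumed geometry. */
static unsigned l1i_set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % L1I_SETS);
}

/* Fill out[0..count-1] with the next `count` addresses above `target`
 * that fall in the same cache set as `target`. */
static void l1i_aliases(uintptr_t target, uintptr_t *out, size_t count)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = target + (uintptr_t)(i + 1) * ALIAS_STRIDE;
}
```

Under these assumptions the stride equals the 4 KiB page size, so the set index bits lie entirely within the page offset and are the same for virtual and physical addresses. Asking for more than 8 aliases (say 12) and looping over them a few times matches the pseudo-LRU caveat above.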

clflush / clflushopt should do the trick for a specific cache line. They're required to flush the line from all levels of cache in all cores.
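As a rough illustration of flushing specific lines, the sketch below walks a byte range and issues clflush on each line through the _mm_clflush intrinsic from <emmintrin.h>. The helper name flush_range and the 64-byte CACHE_LINE constant are assumptions, not something from the answer; it only builds for x86 targets with SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (SSE2, x86 only) */

#define CACHE_LINE 64  /* assumed line size; CPUID reports the real value */

/* Flush every cache line overlapping [p, p + len) using clflush. */
static void flush_range(const void *p, size_t len)
{
    assert(p != NULL);
    /* Round the start down to a line boundary so partial lines are covered. */
    uintptr_t start = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end   = (uintptr_t)p + len;

    for (uintptr_t a = start; a < end; a += CACHE_LINE)
        _mm_clflush((const void *)a);

    _mm_mfence(); /* order the flushes against later loads and stores */
}
```

Flushing does not change the contents of memory, so data in the range reads back unchanged afterwards. To target code rather than data, one would pass the address of the instruction bytes, which relies on the platform allowing a code address to be treated as a byte pointer.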

I assume they would also evict decoded uops from the (virtually addressed) uop cache.

From Intel's manual entry for CLFLUSH: "The CLFLUSH instruction can be used at all privilege levels and is subject to all permission checking and faults associated with a byte load (and in addition, a CLFLUSH instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSH instruction sets the A bit but not the D bit in the page tables."

But if you want this for correctness after JIT-compiling something, you don't need any flushing: merely jumping to or calling the newly written instructions is sufficient to avoid stale instruction fetch.

(In fact, current x86 implementations snoop stores to any code address in the pipeline, so you'll never see stale instruction fetch even when you have the same physical page mapped to different virtual addresses and write through one while executing the other. See the related question "Observing stale instruction fetching on x86 with self-modifying code".)

You only need to worry about your compiler optimizing away "dead stores" to a buffer you cast to a function pointer. In GNU C / C++, use __builtin___clear_cache on the range of bytes you wrote. It compiles to zero instructions on x86 (unlike ARM or other ISAs with non-coherent instruction caches), but it is still needed so that the stores of instruction bytes are not optimized away. See the related question "How does __builtin___clear_cache work?".
