Flushing the iCache on x86
Is there any way I can flush the iCache on the x86 architecture? Something like WBINVD, which invalidates and flushes all the cache lines in the data cache.
Comments (1)
According to the docs, wbinvd flushes and invalidates all caches, not just data and unified caches. (I'm not sure whether that includes the TLBs if you run it with paging enabled.)
What are you trying to test? An L1i miss / L2 hit for code fetch? I don't think it's possible to deliberately flush just the I-cache without also flushing all levels of cache.
You could create conflict misses for a specific line by executing code at 8 addresses that alias it, assuming an 8-way 32 KiB L1i cache. But cache replacement is usually pseudo-LRU, not true LRU, so you might want to jump through a set of more than 8 aliasing lines a couple of times to make sure.
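To make the aliasing idea concrete, here is a minimal sketch of the address arithmetic, assuming the 8-way 32 KiB L1i from above plus 64-byte lines (the line size is my assumption, as are all the names below; they are purely illustrative). With that geometry there are 64 sets, so addresses 4096 bytes apart land in the same set. Since that stride equals the 4 KiB page size, the index bits sit entirely inside the page offset, so the calculation is the same for virtual and physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry (not queried from the CPU): 32 KiB, 8-way, 64-byte lines. */
enum { L1I_SIZE = 32 * 1024, L1I_WAYS = 8, LINE_SIZE = 64 };

/* Number of sets: total size / (ways * line size) = 64 for the geometry above. */
size_t l1i_num_sets(void) { return L1I_SIZE / (L1I_WAYS * LINE_SIZE); }

/* Addresses this many bytes apart map to the same set: 4096 here. */
size_t l1i_alias_stride(void) { return L1I_SIZE / L1I_WAYS; }

/* Set index that a given address maps to. */
size_t l1i_set_index(uintptr_t addr) { return (addr / LINE_SIZE) % l1i_num_sets(); }

/* Fill out[0..n-1] with addresses that alias `target` in the cache set. */
void l1i_alias_addresses(uintptr_t target, uintptr_t *out, size_t n) {
    assert(n == 0 || out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = target + (i + 1) * l1i_alias_stride();
}
```

This only computes where the aliasing code would have to live. To actually evict the target line you would still need executable code placed at those addresses (for example, a run of tiny stubs in a suitably laid-out buffer) and you would jump through more than 8 of them, because of the pseudo-LRU caveat.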
clflush / clflushopt should do the trick for a specific cache line. They're required to flush the line from all levels of cache in all cores. I assume they would also evict decoded uops from the (virtually addressed) uop cache.
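As a rough sketch of what that looks like from user space, the following uses the `_mm_clflush` and `_mm_mfence` intrinsics from `<emmintrin.h>` to flush every line that a byte range touches. It assumes an x86 target with SSE2 and a 64-byte line size; `flush_range` and `CACHE_LINE` are hypothetical names of mine, not anything from the docs.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2, x86 only) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; not queried via CPUID */

/* Flush every cache line overlapping [start, start + len).
 * Returns the number of lines flushed. */
size_t flush_range(const void *start, size_t len) {
    if (len == 0)
        return 0;

    /* Round down to the start of the first line in the range. */
    uintptr_t p = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t lines = 0;

    for (; p < end; p += CACHE_LINE) {
        _mm_clflush((const void *)p);
        lines++;
    }
    _mm_mfence();  /* order the flushes with respect to later loads and stores */
    return lines;
}
```

Pointing this at the bytes of a function should evict those lines everywhere, which is the closest thing to a targeted I-cache flush that I know of from user space.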
But if all you want is correctness after JIT-compiling something, merely jumping to or calling the newly written instructions is sufficient to avoid stale instruction fetch.
(In fact, current x86 implementations snoop stores to any code address in the pipeline, so you'll never see stale instruction fetch even when you have the same physical page mapped to two different virtual addresses and write through one while executing from the other. See: Observing stale instruction fetching on x86 with self-modifying code.)
You only need to worry about your compiler optimizing away "dead stores" to a buffer that you cast to a function pointer. In GNU C / C++, use __builtin___clear_cache on the range of bytes you wrote. It compiles to zero instructions on x86 (unlike on ARM or other ISAs with non-coherent instruction caches), but it is still needed so that the stores of instruction bytes aren't optimized away. See: How does __builtin___clear_cache work?
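For illustration, here is a minimal JIT-style sketch of that pattern, assuming a POSIX system with `mmap`/`mprotect` and GCC or Clang. The helper name `jit_return_constant` is hypothetical. It writes the machine code for `mov eax, imm32; ret` into a fresh page, calls `__builtin___clear_cache` on the written range, makes the page executable, and then simply calls it.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: JIT a function that returns `value`, call it, and
 * return the result. Returns -1 if the mapping fails or the target is not
 * x86, so -1 is ambiguous if you pass -1 as the value. */
int jit_return_constant(int32_t value) {
#if defined(__x86_64__) || defined(__i386__)
    /* B8 imm32 = mov eax, imm32 ; C3 = ret */
    unsigned char code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    memcpy(code + 1, &value, sizeof value);   /* x86 is little-endian */

    size_t len = 4096;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, code, sizeof code);

    /* Zero instructions on x86, but it tells the compiler that these
     * stores are not dead and must happen before the call below. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);

    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    int (*fn)(void) = (int (*)(void))buf;  /* object-to-function cast: POSIX/GNU extension */
    int result = fn();
    munmap(buf, len);
    return result;
#else
    (void)value;
    return -1;  /* the byte sequence above is x86 machine code */
#endif
}
```

On an x86 Linux box, `jit_return_constant(42)` should return 42 with no explicit cache flush anywhere. Mapping the page writable first and switching it to executable afterwards is just a choice that tends to work on systems that refuse writable-and-executable mappings.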