How do I declare a memory range as uncacheable using GCC on x86?

Posted 2024-12-04 14:39:57

I have read about the movntdqa instruction, but I haven't figured out a clean way to mark a memory range as uncacheable, or to read data without polluting the cache.
I want to do this from GCC. My main goal is to swap random locations in a large array. I'm hoping to speed this up by bypassing the cache, since there is very little data reuse.


Comments (2)

许你一世情深 2024-12-11 14:39:57

I think what you're describing is Memory Type Range Registers (MTRRs). Under Linux you can control these through /proc/mtrr or ioctl(2), if they are available and you are root (uid 0); see here for an example. Because MTRRs operate on physical address ranges, I think you'll have a hard time using them in any reasonable way from a user-space program.

A better approach is to look at the compiler intrinsics GCC provides and find one or more that express your intent. Have a look at Ulrich Drepper's series "What every programmer should know about memory", in particular part 5, which deals with bypassing the cache. It looks like _mm_prefetch(ptr, _MM_HINT_NTA) might be appropriate for your needs.

As always with performance: measure, measure, measure. Part 7 of Drepper's series explains in detail how to do this, and the series also has code examples and other strategies to try when speeding up the memory performance of your code.
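As a rough illustration of the _mm_prefetch(ptr, _MM_HINT_NTA) suggestion, here is a minimal sketch of random swaps that issue a non-temporal prefetch a few iterations ahead. The function name swap_pairs_nta, the index arrays idx_a and idx_b, and the PREFETCH_AHEAD distance are assumptions made for the example, not something from the question; whether this helps at all depends on the hardware and has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Assumed prefetch distance; tune it by measurement. */
#define PREFETCH_AHEAD 8

/*
 * Hypothetical helper: swaps data[idx_a[i]] with data[idx_b[i]] for each
 * of the npairs index pairs. While working on pair i it asks the CPU to
 * fetch the elements of pair i + PREFETCH_AHEAD with the non-temporal
 * hint, which requests minimal cache pollution. The hint is advisory:
 * the CPU may treat it differently or ignore it.
 */
static void swap_pairs_nta(uint32_t *data,
                           const size_t *idx_a,
                           const size_t *idx_b,
                           size_t npairs)
{
    assert(data != NULL && idx_a != NULL && idx_b != NULL);

    for (size_t i = 0; i < npairs; i++) {
        if (i + PREFETCH_AHEAD < npairs) {
            _mm_prefetch((const char *)&data[idx_a[i + PREFETCH_AHEAD]],
                         _MM_HINT_NTA);
            _mm_prefetch((const char *)&data[idx_b[i + PREFETCH_AHEAD]],
                         _MM_HINT_NTA);
        }

        /* The swap itself uses ordinary loads and stores. */
        uint32_t tmp = data[idx_a[i]];
        data[idx_a[i]] = data[idx_b[i]];
        data[idx_b[i]] = tmp;
    }
}
```

Generating the random indices in a batch up front is what makes prefetching ahead possible; if each index depends on the result of the previous swap, there is nothing to prefetch.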

眼泪都笑了 2024-12-11 14:39:57

All good advice from user786653, especially the Ulrich Drepper article. I'll add:

  • Uncached or not, the VM hardware still has to look up page information in the TLB, which has limited capacity. Don't underestimate the impact of TLB thrashing on random-access performance. If you aren't already, see the results here for why you really want to use huge pages for your array data rather than the tiny 4K default (which goes back to the days of "640K ought to be enough for anybody"). Of course, if your arrays are bigger than even a TLB full of 2MB pages can reference, huge pages won't fully solve this either.

  • What have you got against the 'nt' instructions (e.g. the _mm_stream_ps intrinsic)? I'm not convinced that declaring pages uncached will give you any better performance than appropriate use of those instructions, and they are much easier to use than the alternatives. I would be very interested to see evidence to the contrary, though.
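The two points above can be sketched together, under some assumptions: Linux with transparent huge pages available, a 2 MiB huge page size, and SSE support. The helper names alloc_hinted and fill_stream are made up for this example. Note that _mm_stream_ps only covers the write side (it is a non-temporal store); it does not make the reads of a swap bypass the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>   /* madvise, MADV_HUGEPAGE (Linux-specific) */
#include <xmmintrin.h>  /* _mm_stream_ps, _mm_set1_ps, _mm_sfence */

/* Assumed huge page size: 2 MiB, the common x86-64 value. */
#define HUGE_PAGE_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: allocates room for nfloats floats, rounded up to
 * a multiple of the huge page size and aligned to it, then asks the
 * kernel to back the range with transparent huge pages. The madvise
 * call is only a hint; failure is ignored on purpose.
 * Release the buffer with free().
 */
static float *alloc_hinted(size_t nfloats)
{
    size_t bytes = nfloats * sizeof(float);
    bytes = (bytes + HUGE_PAGE_SIZE - 1) / HUGE_PAGE_SIZE * HUGE_PAGE_SIZE;

    float *p = aligned_alloc(HUGE_PAGE_SIZE, bytes);
    if (p == NULL)
        return NULL;
#ifdef MADV_HUGEPAGE
    (void)madvise(p, bytes, MADV_HUGEPAGE);
#endif
    return p;
}

/*
 * Hypothetical helper: fills dst with value using non-temporal stores.
 * _mm_stream_ps requires a 16-byte aligned address and writes 4 floats
 * at a time, so nfloats must be a multiple of 4.
 */
static void fill_stream(float *dst, size_t nfloats, float value)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(nfloats % 4 == 0);

    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < nfloats; i += 4)
        _mm_stream_ps(dst + i, v);

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```

Whether the kernel actually backs the range with huge pages depends on the system configuration, so check it (and the resulting TLB miss rate) with a profiler rather than assuming it.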
