_mm_load_ps vs. _mm_load_pd vs. etc. on the Intel x86 ISA

Posted on 2024-12-26 20:08:18

What's the difference between the following two lines?

__m128 x = _mm_load_ps((float *) ptr);
__m128d y = _mm_load_pd((double *)ptr);

In other words, why are there so many different _mm_load_xyz instructions, instead of a generic __m128 _mm_load(const void *)?

Comments (2)

热鲨 2025-01-02 20:08:18

There are different intrinsics because they correspond to different instructions.

There are different load instructions because Intel wants to maintain the freedom to design a processor on which double-precision vectors are backed by a different physical register file than are single-precision vectors or integer vectors, or use different execution units. Any of these might add additional latency if there were not a way to specify that data should be loaded into the appropriate register file or forwarding network.

One way to think about it is that the different instructions do the "same thing", but additionally provide a hint to the processor telling it how the data that is being loaded will be used by future instructions. This may help the processor make sure that the data is in the right place to be used as efficiently as possible, or it may be ignored by the processor.

Note that this isn't just hypothetical. There exist processors on which using an integer vector load (MOVDQA) to load data that is consumed by a floating-point operation takes more time than using a floating-point load for the same data (and vice versa). See the Intel Optimization Manual or Agner Fog's notes for more detail on the subject. Use the load that matches how you will use the data to avoid the risk of such performance hazards in the future.
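As a minimal sketch of "use the load that matches how you will use the data", the three helpers below pair each load intrinsic with arithmetic of the same type. The names `add4_floats`, `add2_doubles`, `add4_ints` and the `IS_ALIGNED16` macro are hypothetical and exist only for illustration. Compilers typically emit MOVAPS, MOVAPD and MOVDQA for these three loads, but they are free to choose differently.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128, __m128d, __m128i and their intrinsics */

/* Hypothetical helper: the aligned load/store intrinsics used below
   require 16-byte-aligned addresses. */
#define IS_ALIGNED16(p) (((uintptr_t)(p) & 15u) == 0)

/* Single-precision data: load with _mm_load_ps, consume with _mm_add_ps. */
void add4_floats(const float *a, const float *b, float *out)
{
    assert(IS_ALIGNED16(a) && IS_ALIGNED16(b) && IS_ALIGNED16(out));
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}

/* Double-precision data: load with _mm_load_pd, consume with _mm_add_pd. */
void add2_doubles(const double *a, const double *b, double *out)
{
    assert(IS_ALIGNED16(a) && IS_ALIGNED16(b) && IS_ALIGNED16(out));
    __m128d va = _mm_load_pd(a);
    __m128d vb = _mm_load_pd(b);
    _mm_store_pd(out, _mm_add_pd(va, vb));
}

/* Integer data: load with _mm_load_si128, consume with _mm_add_epi32. */
void add4_ints(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(IS_ALIGNED16(a) && IS_ALIGNED16(b) && IS_ALIGNED16(out));
    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Whether a mismatched load actually costs extra cycles depends on the microarchitecture, so this pairing is a defensive habit rather than a guaranteed speedup.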

永不分离 2025-01-02 20:08:18

_mm_load_ps loads 4 single-precision floating-point values

_mm_load_pd loads 2 double-precision floating-point values

These do different things, so I think it just makes sense to have different functions. Also, in C, there's no overloading.
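To make the lane counts concrete, here is a small sketch. The function names `loads_have_same_bits`, `float_lanes` and `double_lanes` are hypothetical. It shows that the two loads bring in the same 16 bytes, and that the difference lies in the vector type: `__m128` is treated as 4 floats, `__m128d` as 2 doubles. It assumes `ptr` is 16-byte aligned, as both aligned loads require.

```c
#include <string.h>
#include <emmintrin.h>  /* SSE2: __m128, __m128d, __m128i */

/* Load the same 16 bytes with both intrinsics and compare the raw
   register contents. Returns 1 if they are bit-identical. */
int loads_have_same_bits(const void *ptr)
{
    __m128  x = _mm_load_ps((const float *)ptr);
    __m128d y = _mm_load_pd((const double *)ptr);

    unsigned char bx[16], by[16];
    /* The cast intrinsics only reinterpret the bits; they convert nothing. */
    _mm_storeu_si128((__m128i *)bx, _mm_castps_si128(x));
    _mm_storeu_si128((__m128i *)by, _mm_castpd_si128(y));
    return memcmp(bx, by, sizeof bx) == 0;
}

/* Number of lanes each vector type holds. */
int float_lanes(void)  { return (int)(sizeof(__m128)  / sizeof(float));  }
int double_lanes(void) { return (int)(sizeof(__m128d) / sizeof(double)); }
```

Since C has no overloading, the type-specific intrinsic names are also what let the compiler reject, say, passing a `__m128d` to `_mm_add_ps` without an explicit cast such as `_mm_castpd_ps`.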
