
Casting an array of wrapper structs to a SIMD vector

Posted on 2025-02-11 04:35:05


Say I have a wrapper struct, serving as a phantom type.

struct Wrapper {
  float value;
};

Is it legal to load an array of this struct directly into an SIMD intrinsic type such as __m256? For example,

alignas(32) Wrapper arr[8] = {};
static_assert(sizeof(Wrapper) == sizeof(float));
__m256 x = _mm256_load_ps(reinterpret_cast<float*>(arr));


// or (I think this is equivalent):
__m256 y = *(__m256 *)arr;

Discussion:

- I know that I can't normally use an array of Wrapper like an array of float, because the pointer arithmetic is illegal: [expr.add]/6.
  - Even if I can ensure the object representation is identical on the given platform (e.g. no padding in Wrapper), it would still be undefined behavior, as described by Timur Doumler.
- That said, a load doesn't seem to use pointer arithmetic.
- __m256 is defined such that it can alias anything, and it seems intended that one can load, for example, an int16_t[] into an __m256i by casting.
- Casting a Wrapper* to a float* and taking the value does not violate strict aliasing, because of pointer-interconvertibility, though this seems irrelevant because only the final cast to __m256* really matters.

To me it seems like the pointer arithmetic rules don't apply: if I can assert that the types are compatible (e.g. there is no padding), then it's valid to cast directly because of the special properties of the vector type. But I haven't seen this particular usage and have some worries that a compiler might invoke the pointer arithmetic or another rule for the load operation and trigger UB. Given that this is a non-standard extension, can I be sure?


不喜欢何必死缠烂打 2025-02-18 04:35:05


This is fully safe

You're not directly dereferencing the float*, only passing it to _mm256_load_ps, which does an aliasing-safe load. In terms of language-lawyering, you can look at _mm256_load_ps / _mm256_store_ps as doing a memcpy (to or from a private local variable), except that it's UB if the pointer isn't 32-byte aligned.
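To make the "load as memcpy" reading concrete, here is a minimal portable sketch. It does not use the real intrinsic: Vec8, load_like_memcpy, and load_wrappers are hypothetical names introduced only for illustration, and the assert stands in for the alignment precondition that the real _mm256_load_ps leaves as undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Phantom-type wrapper from the question.
struct Wrapper {
    float value;
};
static_assert(sizeof(Wrapper) == sizeof(float), "Wrapper must have no padding");

// Hypothetical stand-in for __m256: eight float lanes, 32-byte aligned.
struct alignas(32) Vec8 {
    float lane[8];
};

// Model of an aligned vector load: copy 32 bytes of object representation
// into a private local. The assert models the 32-byte alignment requirement.
inline Vec8 load_like_memcpy(const void* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
    Vec8 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Fills an aligned Wrapper array with 0..7 and loads it through the model.
inline Vec8 load_wrappers() {
    alignas(32) Wrapper arr[8] = {};
    for (int i = 0; i < 8; ++i) arr[i].value = static_cast<float>(i);
    return load_like_memcpy(arr);
}
```

Because the copy works on bytes, the declared type of the source objects never enters into it, which is why the Wrapper/float distinction stops mattering under this model.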

Interconvertibility between Wrapper* and float* isn't really relevant; you're not dereferencing a float*.

If you'd been using _mm_load_ss(arr) on a buggy GCC version that implements it as _mm_set_ss( *ptr ) instead of using a may_alias typedef for float, then that would matter. (Unfortunately even current GCC still has that bug; _mm_loadu_si32 was fixed in GCC 11.3, but not the older _ss and _sd loads.) But that is a compiler bug, IMO. _mm_load_ps is aliasing-safe, so it makes no sense that _mm_load_ss wouldn't be, when they both take float*. If you wanted a load with normal C aliasing/alignment semantics to promise more to the optimizer, you'd just dereference the pointer yourself, using _mm_set_ss( *foo ).
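If the _mm_load_ss behaviour described above is a concern, one possible workaround is to do the scalar read through memcpy and only then build the vector. This is a sketch and an assumption on my part, not something the answer prescribes; load_scalar_safe is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ss, _mm_cvtss_f32

// Phantom-type wrapper from the question.
struct Wrapper {
    float value;
};
static_assert(sizeof(Wrapper) == sizeof(float), "Wrapper must have no padding");

// Reads 4 bytes as a float via memcpy, so no float lvalue is ever formed on
// the source object, then places the value in lane 0 (upper lanes are zero).
inline __m128 load_scalar_safe(const void* p) {
    assert(p != nullptr);
    float f;
    std::memcpy(&f, p, sizeof f);
    return _mm_set_ss(f);
}
```

For example, `_mm_cvtss_f32(load_scalar_safe(&w))` reads back `w.value` for a `Wrapper w`. Compilers typically turn the fixed-size memcpy into a plain scalar load.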

The exact aliasing semantics of Intel intrinsics are not, AFAIK, documented anywhere. A lot of x86-specific code has been developed with MSVC, which doesn't enforce strict aliasing at all, i.e. it's like gcc -fno-strict-aliasing, defining the behaviour of stuff like *(int*)my_float and even encouraging it for type-punning.

Not sure about Intel's compiler historically, but I'm guessing it also didn't do type-based aliasing optimizations, otherwise they hopefully would have defined better intrinsics for movd 32-bit integer loads/stores much earlier than _mm_loadu_si32, which only appeared in the last few years. You can tell from the void* arg that it's recent: Intel previously did insane stuff like _mm_loadl_epi64(__m128i*) for a movq load, taking a pointer to a 16-byte object but only loading the low 8 bytes (with no alignment requirement).

So a lot of Intel intrinsics stuff seemed pretty casual about C and C++ safety rules, like it was designed by people who thought of C as a portable assembler. Or at least that their intrinsics were supposed to work that way.

As I pointed out in my answer you linked in the question (Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?), Intel's intrinsics API effectively requires compilers to support creating misaligned pointers as long as you don't dereference them yourself. That includes a misaligned float* for _mm_loadu_ps, which supports any alignment, not just multiples of 4.

You could probably argue that supporting Intel's intrinsics API (in a way that's compatible with the examples Intel has published) might not require supporting arbitrary casting between pointer types (without dereferencing), but in practice all x86 compilers do, because they target a flat memory model with byte-addressable memory.
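A small sketch of that situation, assuming an x86 target with SSE: the float* below is deliberately misaligned and is only handed to _mm_loadu_ps, never dereferenced directly. It relies on the de-facto compiler behaviour described above rather than on any ISO C++ guarantee, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_cmpeq_ps, _mm_movemask_ps

// Loads four floats that start one byte past a 16-byte boundary, once with
// _mm_loadu_ps and once with memcpy, and reports whether all lanes match.
inline bool unaligned_load_matches_memcpy() {
    alignas(16) unsigned char buf[32] = {};
    const float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    std::memcpy(buf + 1, src, sizeof src);  // floats stored at a misaligned address

    // The source address is not even a multiple of alignof(float).
    assert(reinterpret_cast<std::uintptr_t>(buf + 1) % alignof(float) != 0);

    // The cast creates a misaligned float*; only the intrinsic ever reads through it.
    __m128 a = _mm_loadu_ps(reinterpret_cast<const float*>(buf + 1));

    // Reference result: a plain byte copy of the same 16 bytes.
    __m128 b;
    std::memcpy(&b, buf + 1, sizeof b);

    // All four lanes compare equal when the mask is 0b1111.
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```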

With the existence of intrinsics for gather and scatter, use-cases like using a 0 base with pointer elements for _mm256_i64gather_epi64 (e.g. to walk 4 linked lists in parallel) require that a C++ implementation use a sane object-representation for pointers if they want to support that.

As usual with Intel intrinsics, I don't think there's documentation that 100% nails down proof that it would be safe to use _mm_load_ps on a struct { int a; float b[3]; };, but I think everyone working with intrinsics expects that to be the case. And nobody would want to use a compiler that broke it for cases where memcpy with the same source pointer would be safe.

But in your case, you don't even need to depend on any de-facto guarantees here, beyond the fact that _mm256_load_ps itself is an aliasing-safe load. You've correctly shown that it's 100% safe to create that float* in ISO C, and pass it to an opaque function.

And yes, deref of an __m256* is exactly equivalent to _mm256_load_ps, and is in fact how most compilers implement _mm256_load_ps.
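The two forms can be compared side by side. The sketch below uses the 128-bit SSE analogue (__m128 and _mm_load_ps) so that it builds on x86-64 without extra flags; the same reasoning should carry over to __m256 and _mm256_load_ps when AVX is enabled. loads_agree is a hypothetical name, and dereferencing a vector pointer relies on compiler extensions (GCC's headers, for example, declare __m128 with the may_alias attribute), not on ISO C++.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_cmpeq_ps, _mm_movemask_ps

// Phantom-type wrapper from the question.
struct Wrapper {
    float value;
};
static_assert(sizeof(Wrapper) == sizeof(float), "Wrapper must have no padding");

// Returns true if the intrinsic load and the vector-pointer dereference
// produce the same four lanes for the same 16-byte-aligned Wrapper array.
inline bool loads_agree() {
    alignas(16) Wrapper arr[4] = {{1.0f}, {2.0f}, {3.0f}, {4.0f}};
    assert(reinterpret_cast<std::uintptr_t>(arr) % 16 == 0);

    // Form 1: pass a float* to the load intrinsic.
    __m128 x = _mm_load_ps(reinterpret_cast<const float*>(arr));

    // Form 2: dereference a vector pointer directly (compiler extension).
    __m128 y = *reinterpret_cast<const __m128*>(arr);

    // All four lanes compare equal when the mask is 0b1111.
    return _mm_movemask_ps(_mm_cmpeq_ps(x, y)) == 0xF;
}
```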

(By comparison, _mm256_loadu_ps would cast to a pointer to a less-aligned 32-byte vector type which isn't part of the documented API, like GCC's __m256_u*. Or maybe pass it to a builtin function. But however the compiler makes it happen, it's equivalent to a memcpy, including the lack of alignment requirement.)

About the author: 心凉怎暖
