Casting an array of wrapper structs to a SIMD vector
Say I have a wrapper struct, used as a phantom type.
```cpp
struct Wrapper {
    float value;
};
```
Is it legal to load an array of this struct directly into a SIMD intrinsic type (e.g. `__m256`)? For example,
```cpp
#include <immintrin.h>

alignas(32) Wrapper arr[8] = {};
static_assert(sizeof(Wrapper) == sizeof(float));
__m256 x = _mm256_load_ps(reinterpret_cast<float*>(arr));
// or (I think this is equivalent):
__m256 y = *(__m256 *)arr;
```
Discussion:
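- I know I can't normally use an array of `Wrapper` like an array of `T` (the wrapped type, `float` here), per [expr.add]/6. Even if I can ensure the object representation is the same on a given platform (e.g. no padding in `Wrapper`), it is still undefined behavior, as Timur Doumler explains.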
- That said, the load doesn't seem to use pointer arithmetic.
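- `__m256` is defined so that it can alias anything, and it seems intended that one can load, for example, an `int16_t[]` by casting to `__m256i*`.
- Casting `Wrapper*` to `float*` and loading through it doesn't violate strict aliasing, thanks to pointer-interconvertibility, although this seems irrelevant since only the final cast to `__m256*` really matters.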
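It seems to me that the pointer-arithmetic rules don't apply: if I can assert that the types are compatible (e.g. no padding), then the direct cast is valid because of the special properties of the vector types. But I haven't seen this particular usage, and I worry that the compiler might apply pointer arithmetic or other rules to the load operation that would trigger UB. Given that this is a non-standard extension, can I be sure?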
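One way to write down the "types are compatible (e.g. no padding)" assumption is a set of compile-time checks next to the load. This is a minimal sketch assuming C++11 or later; the checks only pin down the layout on the current platform and, per the [expr.add] point above, do not by themselves make indexing a `Wrapper` array through a `float*` well-defined.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout, std::is_trivially_copyable

struct Wrapper {
    float value;
};

// Standard-layout: a Wrapper is pointer-interconvertible with its first member.
static_assert(std::is_standard_layout<Wrapper>::value,
              "Wrapper must be standard-layout");
// Trivially copyable: its bytes may be copied (or loaded) as raw memory.
static_assert(std::is_trivially_copyable<Wrapper>::value,
              "Wrapper must be trivially copyable");
// The float sits at the very start of the struct...
static_assert(offsetof(Wrapper, value) == 0, "value must be at offset 0");
// ...and there is no trailing padding or extra alignment.
static_assert(sizeof(Wrapper) == sizeof(float), "Wrapper must be exactly one float");
static_assert(alignof(Wrapper) == alignof(float), "Wrapper must be aligned like float");
// Consequence: 8 Wrappers occupy the same contiguous bytes as 8 floats.
static_assert(sizeof(Wrapper[8]) == 8 * sizeof(float), "array must be dense");
```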
Answer:
This is fully safe.

You're not directly dereferencing the `float*`, only passing it to `_mm256_load_ps`, which does an aliasing-safe load. In terms of language-lawyering, you can look at `_mm256_load_ps` / `_mm256_store_ps` as doing a `memcpy` (to a private local variable), except that it's UB if the pointer isn't 32-byte aligned.

Interconvertibility between `Wrapper*` and `float*` isn't really relevant; you're not dereferencing a `float*`.

If you'd been using `_mm_load_ss(arr)` on a buggy GCC version that implements it as `_mm_set_ss( *ptr )` instead of using a `may_alias` typedef for `float`, then that would matter. (Unfortunately even current GCC still has that bug; `_mm_loadu_si32` was fixed in GCC 11.3, but not the older `_ss` and `_sd` loads.) But that is a compiler bug, IMO. `_mm_load_ps` is aliasing-safe, so it makes no sense that `_mm_load_ss` wouldn't be, when they both take `float*`. If you wanted a load with normal C aliasing/alignment semantics, to promise more to the optimizer, you'd just deref yourself, using `_mm_set_ss( *foo )`.

The exact aliasing semantics of Intel intrinsics are not, AFAIK, documented anywhere. A lot of x86-specific code has been developed with MSVC, which doesn't enforce strict aliasing at all, i.e. it's like `gcc -fno-strict-aliasing`, defining the behaviour of stuff like `*(int*)my_float` and even encouraging it for type-punning.

Not sure about Intel's compiler historically, but I'm guessing it also didn't do type-based aliasing optimizations; otherwise they hopefully would have defined better intrinsics for `movd` 32-bit integer loads/stores much earlier than `_mm_loadu_si32`, which only appeared in the last few years. You can tell from the `void*` arg that it's recent: Intel previously did insane stuff like `_mm_loadl_epi64(__m128i*)` for a `movq` load, taking a pointer to a 16-byte object but only loading the low 8 bytes (with no alignment requirement).

So a lot of Intel intrinsics stuff seemed pretty casual about C and C++ safety rules, as if it was designed by people who thought of C as a portable assembler, or at least that their intrinsics were supposed to work that way.
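To make the "`_mm_load_ps` is like a `memcpy` into a private local" model concrete, here is a minimal sketch. It swaps in 128-bit SSE (`__m128`, `_mm_load_ps`) for the question's AVX types so it builds on x86-64 without `-mavx`, and assumes GCC or Clang. The helper names (`load_via_intrinsic`, `load_via_deref`, `load_via_memcpy`, `load_ss_safe`, `same_lanes`) are hypothetical; `load_ss_safe` just illustrates an aliasing-safe scalar load written with `memcpy` instead of `_mm_set_ss(*p)` and is not part of any intrinsics header.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>
#include <cstring>

struct Wrapper {
    float value;
};
static_assert(sizeof(Wrapper) == sizeof(float), "no padding assumed");

// Aligned load through the intrinsic: the float* is formed but never dereferenced by us.
inline __m128 load_via_intrinsic(const Wrapper* arr) {
    // _mm_load_ps requires 16-byte alignment (32 bytes for _mm256_load_ps).
    assert(reinterpret_cast<std::uintptr_t>(arr) % 16 == 0);
    return _mm_load_ps(reinterpret_cast<const float*>(arr));
}

// The same load written as a vector dereference, with the same alignment
// requirement; per the answer, this is how most compilers implement _mm_load_ps.
inline __m128 load_via_deref(const Wrapper* arr) {
    return *reinterpret_cast<const __m128*>(arr);
}

// Model of the load as a memcpy into a private local (no alignment requirement).
inline __m128 load_via_memcpy(const void* p) {
    __m128 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical aliasing-safe scalar load: copy the 4 bytes, then build the vector.
inline __m128 load_ss_safe(const void* p) {
    float f;
    std::memcpy(&f, p, sizeof f);
    return _mm_set_ss(f);
}

// True if all four float lanes of a and b compare equal.
inline bool same_lanes(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

All three full-width loads read the same 16 bytes; only the `memcpy` model drops the alignment requirement, which matches the answer's "except it's UB if the pointer isn't aligned" caveat.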
As I pointed out in my answer you linked in the question (Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?), Intel's intrinsics API effectively requires compilers to support creating misaligned pointers as long as you don't deref them yourself. That includes misaligned `float*` for `_mm_loadu_ps`, which supports any alignment, not just multiples of 4.

You could probably argue that supporting Intel's intrinsics API (in a way that's compatible with the examples Intel has published) might not require supporting arbitrary casting between pointer types (without deref), but in practice all x86 compilers do, because they target a flat memory model with byte-addressable memory.
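A minimal sketch of the misaligned case, assuming GCC or Clang on x86 with SSE: four floats are copied to byte offset 1 of a 16-byte-aligned buffer, so the resulting `float*` is not even 4-byte aligned, and it is only handed to `_mm_loadu_ps`, never dereferenced as a `float`. The function name `load_from_odd_offset` is hypothetical.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy four floats to byte offset 1 of an aligned buffer, then load them with
// _mm_loadu_ps through a float* that is misaligned even for a scalar float.
inline __m128 load_from_odd_offset(const float src[4]) {
    alignas(16) unsigned char buf[1 + 4 * sizeof(float)];
    std::memcpy(buf + 1, src, 4 * sizeof(float));

    // Forming this pointer is what the intrinsics API relies on compilers to allow;
    // it is passed to the unaligned-load intrinsic and never dereferenced directly.
    const float* p = reinterpret_cast<const float*>(buf + 1);
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(float) != 0);
    return _mm_loadu_ps(p);
}
```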
With the existence of intrinsics for gather and scatter, use-cases like using a `0` base with pointer elements for `_mm256_i64gather_epi64` (e.g. to walk 4 linked lists in parallel) require that a C++ implementation use a sane object representation for pointers if it wants to support that.

As usual with Intel intrinsics, I don't think there's documentation that 100% nails down proof that it would be safe to use `_mm_load_ps` on a `struct { int a; float b[3]; };`, but I think everyone working with intrinsics expects that to be the case. And nobody would want to use a compiler that broke it in cases where `memcpy` with the same source pointer would be safe.

But in your case, you don't even need to depend on any de-facto guarantees here, beyond the fact that `_mm256_load_ps` itself is an aliasing-safe load. You've correctly shown that it's 100% safe to create that `float*` in ISO C++ and pass it to an opaque function.

And yes, dereferencing an `__m256*` is exactly equivalent to `_mm256_load_ps`, and is in fact how most compilers implement `_mm256_load_ps`.

(By comparison, `_mm256_loadu_ps` would cast to a pointer to a less-aligned 32-byte vector type that isn't part of the documented API, like GCC's `__m256_u*`, or maybe pass it to a builtin function. But however the compiler makes it happen, it's equivalent to a `memcpy`, including the lack of an alignment requirement.)
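The gather use-case mentioned in this answer (a `0` base with pointer elements for `_mm256_i64gather_epi64`, walking 4 linked lists in parallel) might look like the sketch below. It assumes GCC or Clang on x86-64 with 64-bit pointers (`__attribute__((target("avx2")))` and `__builtin_cpu_supports` are compiler extensions) and that every list has at least `len` nodes; `Node`, `sum4_avx2` and `sum4` are hypothetical names. It works only because pointers are plain byte addresses, which is exactly the "sane object representation" the paragraph refers to.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>

struct Node {
    long long value;
    Node* next;
};
static_assert(sizeof(Node*) == sizeof(long long), "64-bit pointers assumed");

// Sum the first `len` values of 4 linked lists at once with AVX2 gathers.
// Each lane holds a Node* as a 64-bit integer; with base = nullptr and
// scale = 1, lane j loads the 8 bytes at address index[j].
__attribute__((target("avx2")))
static void sum4_avx2(Node* const heads[4], int len, long long out[4]) {
    const long long* const base = nullptr;
    const __m256i next_off =
        _mm256_set1_epi64x(static_cast<long long>(offsetof(Node, next)));
    __m256i ptrs = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(heads));
    __m256i sums = _mm256_setzero_si256();
    for (int i = 0; i < len; ++i) {
        // node->value for all 4 lanes
        sums = _mm256_add_epi64(sums, _mm256_i64gather_epi64(base, ptrs, 1));
        // node->next for all 4 lanes
        ptrs = _mm256_i64gather_epi64(base, _mm256_add_epi64(ptrs, next_off), 1);
    }
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), sums);
}

// Dispatch: use the AVX2 version when the CPU supports it, else walk each list.
static void sum4(Node* const heads[4], int len, long long out[4]) {
    if (__builtin_cpu_supports("avx2")) {
        sum4_avx2(heads, len, out);
        return;
    }
    for (int lane = 0; lane < 4; ++lane) {
        long long s = 0;
        const Node* p = heads[lane];
        for (int i = 0; i < len; ++i, p = p->next) {
            assert(p != nullptr);  // every list must have at least len nodes
            s += p->value;
        }
        out[lane] = s;
    }
}
```

The scalar fallback computes the same sums, so the result does not depend on which path runs.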