GCC memory alignment pragma

Posted 2024-08-29 23:06:08

Does GCC have a memory alignment pragma, akin to #pragma vector aligned in the Intel compiler?
I would like to tell the compiler to optimize a particular loop using aligned load/store instructions. To avoid possible confusion, this is not about struct packing.

For example:

#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
        for (int a = 0; a < int(N); ++a) {
            q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
            q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
            q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
            q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
            q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
            q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
        }

Thanks

Comments (3)

终难遇 2024-09-05 23:06:08

You can tell GCC that a pointer points to aligned memory by using a typedef to create an over-aligned type that you can declare pointers to.

This helps GCC but not clang 7.0 or ICC 19; see the x86-64 non-AVX asm they emit on Godbolt. (Only GCC folds a load into a memory operand for mulps instead of using a separate movups.) You have to use __builtin_assume_aligned if you want to portably convey an alignment promise to GNU C compilers other than GCC itself.


From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}

This won't make aligned_double 16 bytes wide. It only promises alignment to a 16-byte boundary, or rather that the first element of an array is so aligned. Looking at the disassembly on my computer, as soon as I use the alignment attribute, I start to see a lot of vector ops. I am using a Power architecture computer at the moment, so it's AltiVec code, but I think this does what you want.

(Note: I wasn't using double when I tested this, because AltiVec doesn't support double-precision floats.)

You can see some other examples of autovectorization using the type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
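
As a hedged illustration of the typedef approach, the sketch below fills in the loop body with a hypothetical saxpy-style kernel. The name saxpy_aligned and the float element type are assumptions for illustration, not part of the answer. It assumes GCC (the attribute is a GNU extension), and the caller must really pass 16-byte-aligned pointers; the asserts only check that promise in debug builds.

```c
#include <assert.h>
#include <stdint.h>

/* Over-aligned element type, following the answer's typedef.
   sizeof(aligned_float) is still 4; only the alignment promise changes. */
typedef float aligned_float __attribute__((aligned(16)));

/* Hypothetical kernel: y[i] += a * x[i].
   The pointer types tell GCC that x and y start on 16-byte boundaries. */
void saxpy_aligned(float a, const aligned_float *x, aligned_float *y, int n)
{
    /* Debug-only check of the caller's alignment promise. */
    assert((uintptr_t)x % 16 == 0);
    assert((uintptr_t)y % 16 == 0);

    for (int i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

A caller could declare its buffers as float buf[8] __attribute__((aligned(16))) and pass them with an explicit cast to aligned_float *. Whether the loop is then vectorized with aligned loads depends on the GCC version and flags, so it is worth checking the generated assembly.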

往昔成烟 2024-09-05 23:06:08

I tried your solution with g++ version 4.5.2 (both Ubuntu and Windows) and it did not vectorize the loop.

If the alignment attribute is removed then it vectorizes the loop, using unaligned loads.

If the function is inlined so that the array can be accessed directly with the pointer eliminated, then it is vectorized with aligned loads.

In both cases, the alignment attribute prevents vectorization. This is ironic: The "aligned_double *x" was supposed to enable vectorization but it does the opposite.

Which compiler reported vectorized loops for you? I suspect it was not GCC.
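
The case where the array is accessed directly, with no pointer in between, can be sketched roughly as follows. All names here (xs, ys, COUNT, scale_arrays) are hypothetical, and the sketch assumes GCC's aligned variable attribute. Because the compiler sees the array declarations themselves, it knows their alignment without any promise attached to a pointer type.

```c
enum { COUNT = 1024 };

/* File-scope arrays whose alignment is visible at the point of use. */
static float xs[COUNT] __attribute__((aligned(16)));
static float ys[COUNT] __attribute__((aligned(16)));

/* Fill the input array with 0, 1, 2, ... */
void fill_inputs(void)
{
    for (int i = 0; i < COUNT; ++i) {
        xs[i] = (float)i;
    }
}

/* The loop indexes the arrays directly, so no pointer alignment
   has to be inferred or promised. */
void scale_arrays(float k)
{
    for (int i = 0; i < COUNT; ++i) {
        ys[i] = k * xs[i];
    }
}

/* Small accessor so callers can inspect the result. */
float result_at(int i)
{
    return ys[i];
}
```

This only illustrates the code shape described above; whether a given GCC version emits aligned vector loads for it still has to be confirmed from the vectorizer report or the assembly.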

淡墨 2024-09-05 23:06:08

Does gcc have memory alignment pragma, akin #pragma vector aligned

It looks like newer versions of GCC have __builtin_assume_aligned:

Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)

This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:

void *x = __builtin_assume_aligned (arg, 16);

means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:

void *x = __builtin_assume_aligned (arg, 32, 8);

means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.

Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.
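
A minimal sketch of how the two-argument form might be applied to a loop is shown below. The function name mul_aligned and its parameters are hypothetical, and the code assumes a compiler that provides __builtin_assume_aligned. The asserts are a debug-only check; passing a pointer that is not actually 16-byte aligned breaks the promise made to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel: out[i] = a[i] * b[i].
   restrict tells the compiler the three buffers do not overlap. */
void mul_aligned(float *restrict out, const float *restrict a,
                 const float *restrict b, int n)
{
    /* Debug-only check of the caller's alignment promise. */
    assert((uintptr_t)out % 16 == 0);
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);

    /* The builtin returns its first argument; the compiler may assume
       the returned pointers are at least 16-byte aligned. */
    float *po = __builtin_assume_aligned(out, 16);
    const float *pa = __builtin_assume_aligned(a, 16);
    const float *pb = __builtin_assume_aligned(b, 16);

    for (int i = 0; i < n; ++i) {
        po[i] = pa[i] * pb[i];
    }
}
```

The loop uses the returned pointers rather than the original arguments, since the alignment assumption is attached to the return value. In C++ the void * results would need explicit casts.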
