gcc memory alignment pragma

Posted 2024-08-29 23:06:08

Does gcc have a memory alignment pragma, akin to #pragma vector aligned in the Intel compiler?
I would like to tell the compiler to optimize a particular loop using aligned load/store instructions. To avoid possible confusion, this is not about struct packing.

For example:

#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
        for (int a = 0; a < int(N); ++a) {
            q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
            q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
            q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
            q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
            q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
            q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
        }

Thanks

终难遇 2024-09-05 23:06:08

You can tell GCC that a pointer points to aligned memory by using a typedef to create an over-aligned type that you can declare pointers to.

This helps gcc but not clang 7.0 or ICC 19; see the x86-64 non-AVX asm they emit on Godbolt. (Only GCC folds a load into a memory operand for mulps, instead of using a separate movups.) You have to use __builtin_assume_aligned if you want to portably convey an alignment promise to GNU C compilers other than GCC itself.

From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}

This won't make aligned_double 16 bytes wide. It just makes it aligned to a 16-byte boundary, or rather the first one in an array will be. Looking at the disassembly on my computer, as soon as I use the alignment directive, I start to see a LOT of vector ops. I am using a Power architecture computer at the moment, so it's AltiVec code, but I think this does what you want.

(Note: I wasn't using double when I tested this, because AltiVec doesn't support double-precision floats.)

You can see some other examples of autovectorization using the type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
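
As a rough illustration of the typedef approach, here is a minimal sketch with a concrete loop body in place of the "math!" placeholder. It uses float rather than double, in line with the note above. The names aligned_float, axpy_aligned and alloc_floats16 are hypothetical and not from the original answer, and the C11 aligned_alloc call is just one assumed way of obtaining memory that actually keeps the 16-byte promise. Whether the loop ends up vectorized with aligned loads depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdlib.h>

/* Over-aligned element type (hypothetical name). sizeof is still 4;
   a pointer to it promises that the pointed-to address is 16-byte aligned. */
typedef float aligned_float __attribute__((aligned(16)));

/* Hypothetical kernel: y[i] += a * x[i].
   restrict tells the compiler that x and y do not overlap. */
void axpy_aligned(aligned_float *restrict y, const aligned_float *restrict x,
                  float a, int n)
{
    for (int i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

/* Hypothetical helper: allocate room for n floats on a 16-byte boundary.
   aligned_alloc expects a size that is a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16. */
float *alloc_floats16(size_t n)
{
    size_t bytes = (n * sizeof(float) + 15) / 16 * 16;
    return aligned_alloc(16, bytes);
}
```

The caller is responsible for the promise: passing a pointer that is not really 16-byte aligned to axpy_aligned is undefined behavior and may fault if the compiler emits aligned load/store instructions.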

往昔成烟 2024-09-05 23:06:08

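I tried your solution with g++ version 4.5.2 (on both Ubuntu and Windows), and it did not vectorize the loop.

If the alignment attribute is removed, the loop is vectorized using unaligned loads.

If the function is inlined, so that the array can be accessed directly with the pointer eliminated, the loop is vectorized with aligned loads.

In both cases, the alignment attribute prevents vectorization. This is ironic: "aligned_double *x" was supposed to enable vectorization, but it does the opposite.

Which compiler reported vectorized loops for you? I suspect it was not a gcc compiler?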

淡墨 2024-09-05 23:06:08

Does gcc have a memory alignment pragma, akin to #pragma vector aligned?

It looks like newer versions of GCC have __builtin_assume_aligned:

Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)
This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned.
This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:
void *x = __builtin_assume_aligned (arg, 16);
means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:
void *x = __builtin_assume_aligned (arg, 32, 8);
means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.

Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.
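
Applied to a loop, the built-in could be used roughly as in the sketch below. The function name mul_arrays and the 16-byte figure are assumptions chosen for illustration, and this is not a drop-in equivalent of #pragma vector aligned: the promise attaches to the returned pointers, so the loop has to go through them rather than through the original arguments.

```c
#include <stddef.h>

/* Hypothetical kernel: out[i] = a[i] * b[i].
   The caller must guarantee that all three pointers are 16-byte aligned;
   the built-in only records that promise, it does not check it. */
void mul_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    /* Use the returned pointers inside the loop; the alignment
       assumption is attached to them, not to the original arguments. */
    float *o = __builtin_assume_aligned(out, 16);
    const float *pa = __builtin_assume_aligned(a, 16);
    const float *pb = __builtin_assume_aligned(b, 16);

    for (size_t i = 0; i < n; ++i) {
        o[i] = pa[i] * pb[i];
    }
}
```

In C++ the void * result needs an explicit cast, for example static_cast<float *>(...). If any of the pointers is not actually aligned as promised, the behavior is undefined.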
