Loop versioning with GCC

Posted 2024-08-11 08:22:33

I am working on auto-vectorization with GCC. Because of a customer requirement, I cannot use intrinsics or attributes. (I cannot rely on user input to support vectorization.)

If the alignment of an array accessed in a vectorizable loop is unknown, GCC performs "loop versioning". Loop versioning is carried out by the tree-level loop vectorizer. When a loop is identified as vectorizable but a constraint on data alignment or data dependence hinders vectorization because it cannot be resolved at compile time, GCC generates two versions of the loop: a vectorized one and a non-vectorized one, together with runtime checks on alignment or dependence that decide which version is executed.
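Conceptually, versioning for alignment turns the single loop into something like the hand-written C below. This is a minimal illustrative sketch, not GCC's actual output: the function name `copy_versioned`, the 16-byte vector width `VEC_BYTES`, and the use of a fixed-length inner loop to stand in for one aligned vector load/store are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width for illustration: 16 bytes = 8 shorts. */
#define VEC_BYTES 16
#define VEC_LANES (VEC_BYTES / sizeof(short))

/* Hypothetical helper: copies n shorts from src to dst.
   Returns 1 if the "vectorized" version ran, 0 for the scalar fallback. */
int copy_versioned(short *dst, const short *src, size_t n)
{
    /* Runtime alignment check selecting the version (cf. statement (A)). */
    if ((((uintptr_t)dst | (uintptr_t)src) & (VEC_BYTES - 1)) == 0) {
        size_t i = 0;
        /* "Vector" body: each outer iteration stands in for one aligned
           vector load/store of VEC_LANES elements. */
        for (; i + VEC_LANES <= n; i += VEC_LANES)
            for (size_t k = 0; k < VEC_LANES; k++)
                dst[i + k] = src[i + k];
        /* Scalar epilogue for the remaining n % VEC_LANES elements. */
        for (; i < n; i++)
            dst[i] = src[i];
        return 1;
    }
    /* Scalar version, used when alignment cannot be guaranteed. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```

Only the mask differs between targets: the dump in this question, for example, masks the addresses with 3 rather than 15.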

My question is: how can I enforce the alignment? Once a loop has been found to be vectorizable, I do not want two versions of it to be generated just because alignment information is missing.

For example, consider the following code:

short a[15]; short b[15]; short c[15];
int i;

void foo()
{
    for (i=0; i<15; i++)
    {
      a[i] = b[i] ;
    }
}

Tree dump (options: -fdump-tree-optimized -ftree-vectorize)

<SNIP>
     vector short int * vect_pa.49;
     vector short int * vect_pb.42;
     vector short int * vect_pa.35;
     vector short int * vect_pb.30;

    <bb 2>:
     vect_pb.30 = (vector short int *) &b;
     vect_pa.35 = (vector short int *) &a;
     if (((signed char) vect_pa.35 | (signed char) vect_pb.30) & 3 == 0)    ;; <== (A)
       goto <bb 3>;
     else
       goto <bb 4>;

    <bb 3>:
</SNIP>

At 'bb 3' the vectorized version of the loop is generated; at 'bb 4' the non-vectorized version is generated. Which one runs is decided by the alignment check in statement (A). Without using intrinsics or other attributes, how can I get only the vectorized code, without this runtime alignment check?
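Statement (A) can be read as ordinary C: OR-ing the two addresses and masking with 3 yields zero only if both pointers have their two low bits clear, i.e. both are 4-byte aligned. The sketch below assumes the dump simply omits parentheses (in C source, `&` binds more loosely than `==`, so the test has to be written `(x & 3) == 0`). It uses `unsigned char` instead of the dump's `signed char` to avoid an implementation-defined conversion; the low two bits are identical either way. The name `both_aligned_4` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper mirroring statement (A) in the dump. */
int both_aligned_4(const void *pa, const void *pb)
{
    /* Keep only the low byte of the OR-ed addresses. Truncating after the
       OR gives the same low byte as truncating each address first, and the
       mask below only inspects bits 0 and 1 anyway. */
    unsigned char lo = (unsigned char)((uintptr_t)pa | (uintptr_t)pb);
    /* Both pointers are 4-byte aligned iff bits 0 and 1 are clear. */
    return (lo & 3) == 0;
}
```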