I am doing lots of matrix arithmetic and would like to take advantage of C99's restrict pointer qualifier.
I'd like to set up my matrices as pointers to pointers to allow for easy subscripting, like so:
/* one pointer per column */
int **A = malloc(ncols * sizeof(int *));
/* one contiguous block holding every element */
A[0] = malloc(nrows * ncols * sizeof(int));
for (int i = 1; i < ncols; i++) {
    A[i] = A[0] + i*nrows;
}
Now, for a matrix multiplication function
void mmultiply(int nrows, int ncols, int **Out, int **A, int **B);
must I qualify both pointers of the arguments as restricted? It's valid syntax, but I'm having a hard time determining whether int *restrict *restrict behaves any differently from int **restrict.
Then, with the pointers properly restricted, is accessing elements through A[0][col*nrows + row] undefined? (That is, will the compiler assume that I only access the matrix through A[col][row] for values of row such that row < nrows?) Or must I simply remain consistent?
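As a side note on the allocation above, here is a rough sketch of how it could be wrapped in a pair of helpers that check both malloc calls. The names matrix_alloc and matrix_free are made up for this illustration and do not appear in the question; the layout is the same (one contiguous block plus ncols column pointers into it).

```c
#include <stdlib.h>

/* Hypothetical helper (not from the question): allocates an
 * nrows x ncols matrix as one contiguous block plus an array of
 * column pointers, so that A[col][row] works.
 * Returns NULL on allocation failure or for non-positive dimensions. */
int **matrix_alloc(int nrows, int ncols)
{
    if (nrows <= 0 || ncols <= 0)
        return NULL;

    int **A = malloc((size_t)ncols * sizeof *A);
    if (A == NULL)
        return NULL;

    A[0] = malloc((size_t)nrows * (size_t)ncols * sizeof **A);
    if (A[0] == NULL) {
        free(A);
        return NULL;
    }

    /* Point every further column into the contiguous block. */
    for (int i = 1; i < ncols; i++)
        A[i] = A[0] + (size_t)i * (size_t)nrows;

    return A;
}

/* Hypothetical counterpart: releases the block and the pointer array. */
void matrix_free(int **A)
{
    if (A != NULL) {
        free(A[0]);
        free(A);
    }
}
```

With this layout, A[col][row] and A[0][col*nrows + row] name the same int, which is exactly the overlap the second part of the question is about.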
Comments (3)
For the first question: yes, using both restrict qualifiers means something different; specifically, it says that the row pointers won't be aliased either. As to whether it makes any difference: in theory yes; in practice it depends on the optimizer.
For the second question: yes, the compiler will assume that anything accessed through a row pointer is accessed only through that row pointer.
You could throw const in there too.
Finally, if this is gcc at -O2, -O3, or -Os, the compiler is already doing alias analysis based on types. I'm sure other compilers do this too. This means the compiler already knows that the pointers cannot alias the ints, which leaves only the arrays of ints that could possibly store into each other.
In sum, the optimizer will assume that the pointers aren't being stored into as ints, and it knows it isn't doing any pointer writes during the loop.
So you will probably get the same code with only the one restrict.
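If you would rather not rely on how a given optimizer treats the doubly qualified form, one alternative (a different technique from the one discussed above, shown only as a sketch) is to keep a single restrict on the parameters and copy each column pointer into a block-scope restrict pointer before the inner loop. The function name add_columns and the element-wise addition are made up for this illustration.

```c
/* Sketch only: element-wise Out = A + B for matrices stored as arrays
 * of column pointers (M[col][row]). add_columns is a made-up name. */
void add_columns(int nrows, int ncols,
                 int **restrict Out, int **restrict A, int **restrict B)
{
    for (int c = 0; c < ncols; c++) {
        /* Load each column pointer once into a block-scope restrict
         * pointer, so the no-overlap promise is stated per column. */
        int *restrict out = Out[c];
        const int *restrict a = A[c];
        const int *restrict b = B[c];

        for (int r = 0; r < nrows; r++)
            out[r] = a[r] + b[r];
    }
}
```

The promise made here is only that, within one iteration, the column written through out does not overlap the columns read through a and b. Whether this changes the generated code is something to check on your own compiler.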
The outer (second) restrict tells the compiler that none of the arrays of pointers (A, B, and Out) alias. The inner (first) restrict tells the compiler that none of the arrays of ints (pointed to by elements of the arrays of pointers) alias.
If you access both A[0][col*nrows + row] and A[col][row] then you're violating the inner restrict, so things might break.
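To make the two levels concrete, here is a minimal sketch of a multiplication routine with both qualifiers that stays consistent, meaning every element is reached only through its own column pointer. The name mmultiply_sq and the single dimension n (square matrices) are assumptions made for this example; the question's prototype takes nrows and ncols.

```c
/* Sketch only: Out = A * B for n x n matrices stored as arrays of
 * column pointers, so M[col][row] is the element at (row, col).
 * mmultiply_sq and the parameter n are names invented for this
 * example. */
void mmultiply_sq(int n,
                  int *restrict *restrict Out,
                  int *restrict *restrict A,
                  int *restrict *restrict B)
{
    for (int c = 0; c < n; c++) {
        for (int r = 0; r < n; r++) {
            int sum = 0;
            for (int k = 0; k < n; k++)
                sum += A[k][r] * B[c][k];
            /* Each element of Out is written only through its own
             * column pointer Out[c], never through Out[0]. */
            Out[c][r] = sum;
        }
    }
}
```

An ordinary int ** argument converts to int *restrict *restrict without a cast, so callers using the allocation scheme from the question need no changes.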
int **restrict only asserts that the memory addressed by Out, A and B doesn't overlap (except that A and B can overlap, assuming your function doesn't modify either of them). This means the arrays of pointers. It doesn't assert anything about the contents of the memory pointed to by Out, A, and B. Footnote 117 in n1124 says:
By analogy with const, I suspect that qualifying with restrict twice will assert what you want, which is that none of the values in the array points to overlapping memory. But reading the standard, I can't prove to myself that it actually does. I reckon that "Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T" does indeed mean that for int *restrict *restrict A, A[0] and A[1] are objects designated as restrict-qualified pointers to int. But it's pretty heavy legalese.
I have no idea whether your compiler will actually do anything with that knowledge, mind you. Clearly it could; it's a question of whether it's implemented.
So I don't really know what you've gained over a conventional C 2-D array, where you just allocate rows * cols * sizeof(int), and index with A[cols*row + col]. Then you clearly only need one use of restrict, and any compiler that does anything with restrict will be able to reorder reads from A and B across writes to Out. Without restrict, of course, it can't, so by doing what you're doing, you're throwing yourself on your compiler's mercy. If it can't cope with double restrict, only the single-restrict case, then your double indirection has cost you the optimization.
At first guess, multiplication is likely to be faster than an additional pointer indirection anyway. You obviously care about performance or you wouldn't be using restrict at all, so I'd test performance fairly carefully (on all compilers you care about) before making this change for the sake of slightly nicer syntax and not having to remember how many columns there are in your array every time you access it.
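For comparison, here is a minimal sketch of the flat-array version with a single level of restrict. The function name mmultiply_flat and the dimension parameters n, m and p are chosen for this example only; the indexing follows the A[cols*row + col] scheme mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: out = a * b, where a is n x m, b is m x p and out is
 * n x p, all stored row-major in flat arrays and indexed as
 * M[cols*row + col]. mmultiply_flat, n, m and p are names invented
 * for this example. */
void mmultiply_flat(int n, int m, int p,
                    int *restrict out,
                    const int *restrict a,
                    const int *restrict b)
{
    /* restrict promises that out overlaps neither a nor b; this only
     * catches the most obvious violation of that promise. */
    assert(out != a && out != b);

    for (int row = 0; row < n; row++) {
        for (int col = 0; col < p; col++) {
            int sum = 0;
            for (int k = 0; k < m; k++)
                sum += a[(size_t)m * row + k] * b[(size_t)p * k + col];
            out[(size_t)p * row + col] = sum;
        }
    }
}
```

Since a and b are only read, they may legitimately point at the same matrix (for example to square it); only out has to be distinct.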
is accessing elements through A[0][col*nrows + row] undefined?
Yes, if the element is modified by one of the accesses, because this makes A[0] an alias for memory also accessed via A[col]. That'd be fine if only A and B were restrict-qualified pointers, but not if A[0] and A[col] are.
I assume that you don't modify A in this function, so actually that alias is fine. If you did the same thing with Out, though, behavior would be undefined.
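A small sketch of the harmless case: the function below reads every element both ways but never writes, so the overlap between A[0] and A[col] does not matter. It assumes the contiguous layout from the question, and count_mismatches is a made-up name.

```c
/* Sketch only: reads each element once as A[col][row] and once as
 * A[0][col*nrows + row] and counts how often the two reads differ.
 * Nothing is modified, so the aliasing between A[0] and A[col] is
 * allowed even with both restrict qualifiers. Assumes all columns
 * live in one contiguous block starting at A[0]. */
int count_mismatches(int nrows, int ncols, int *restrict *restrict A)
{
    int mismatches = 0;

    for (int col = 0; col < ncols; col++)
        for (int row = 0; row < nrows; row++)
            if (A[col][row] != A[0][col * nrows + row])
                mismatches++;

    return mismatches;
}
```

Replacing either read with a write (as would happen with Out) turns this into the undefined case described above.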