Does the restrict keyword provide significant benefits in gcc/g++?
Has anyone ever seen any numbers/analysis on whether or not use of the C/C++ restrict keyword in gcc/g++ actually provides any significant performance boost in reality (and not just in theory)?
I've read various articles recommending / disparaging its use, but I haven't run across any real numbers practically demonstrating either side's arguments.
EDIT
I know that restrict is not officially part of C++, but it is supported by some compilers and I've read a paper by Christer Ericson which strongly recommends its usage.
The restrict keyword does make a difference.
I've seen improvements of a factor of 2 and more in some situations (image processing). Most of the time the difference is not that large, though: about 10%.
Here is a little example that illustrates the difference. I've written a very basic 4x4 vector * matrix transform as a test. Note that I have to force the function not to be inlined. Otherwise GCC detects that there aren't any aliasing pointers in my benchmark code, and restrict wouldn't make a difference because of the inlining.
I could have moved the transform function to a different file as well.
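The benchmark listing itself is not reproduced here, so the following is only a minimal sketch of what such a test could look like. The function names `transform` and `transform_restrict`, the row-major matrix layout, and the parameter order are all assumptions made for illustration; `__attribute__((noinline))` is the GCC-specific way to keep the functions from being inlined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: names and layout are assumptions, not the
 * original benchmark. Each vector has 4 floats; matrix is 4x4 row-major. */

/* Without restrict: the compiler must assume that a store to out[]
 * may modify in[] or matrix[], so it reloads them after each store. */
__attribute__((noinline))
void transform(float *out, const float *in, const float *matrix, size_t n)
{
    assert(out && in && matrix);
    for (size_t i = 0; i < n; i++) {
        const float *v = in + 4 * i;
        float *o = out + 4 * i;
        for (int c = 0; c < 4; c++) {
            o[c] = v[0] * matrix[0 * 4 + c] + v[1] * matrix[1 * 4 + c]
                 + v[2] * matrix[2 * 4 + c] + v[3] * matrix[3 * 4 + c];
        }
    }
}

/* With restrict: the caller promises the three arrays do not overlap,
 * so the compiler may keep the matrix in registers across stores. */
__attribute__((noinline))
void transform_restrict(float *restrict out, const float *restrict in,
                        const float *restrict matrix, size_t n)
{
    assert(out && in && matrix);
    for (size_t i = 0; i < n; i++) {
        const float *v = in + 4 * i;
        float *o = out + 4 * i;
        for (int c = 0; c < 4; c++) {
            o[c] = v[0] * matrix[0 * 4 + c] + v[1] * matrix[1 * 4 + c]
                 + v[2] * matrix[2 * 4 + c] + v[3] * matrix[3 * 4 + c];
        }
    }
}
```

Both functions compute the same result when the buffers are distinct; timing each one over a large number of vectors is what would expose any difference on a given CPU.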
Results: (on my 2 GHz Core Duo)
As a rule of thumb, about 20% faster execution on that system.
To show how much it depends on the architecture, I ran the same code on a Cortex-A8 embedded CPU (with the loop count adjusted a bit because I didn't want to wait that long):
Here the difference is just 9% (same compiler btw.)
It can reduce the number of instructions, as shown in the example below, so use it whenever possible.
GCC 4.8 Linux x86-64 example
Input:
Compile and decompile:
With -O0, they are the same.
With -O3:
For the uninitiated, the calling convention is:
rdi = first parameter
rsi = second parameter
rdx = third parameter
Conclusion: 3 instructions instead of 4.
Of course, instructions can have different latencies, but this gives a good idea.
Why was GCC able to optimize that?
The code above was taken from the Wikipedia example, which is very illuminating.
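For reference, the Wikipedia example is usually written along the following lines. This is a sketch that assumes the functions f and fr discussed here match it, since the original listing is not reproduced in this post; the assert calls are an addition for illustration only.

```c
#include <assert.h>

/* No restrict: a, b and x may point to the same int, so after the
 * store to *a the compiler has to load *x again before updating *b. */
void f(int *a, int *b, int *x)
{
    assert(a && b && x);
    *a += *x;
    *b += *x;
}

/* restrict: the caller promises a, b and x refer to distinct objects,
 * so *x can be loaded once and reused for both additions. */
void fr(int *restrict a, int *restrict b, int *restrict x)
{
    assert(a && b && x);
    *a += *x;
    *b += *x;
}
```

The reload in f is not wasted work in general: a call such as f(&y, &y, &y) is valid and must see the updated value on the second line, whereas making the same call to fr would break the restrict promise and is undefined behavior.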
Pseudo assembly for f:
For fr:
Is it really any faster?
Ermmm... not for this simple test:
And then:
on Ubuntu 14.04 AMD64 CPU Intel i5-3210M.
I confess that I still don't understand modern CPUs. Let me know if you:
The article Demystifying The Restrict Keyword refers to the paper Why Programmer-specified Aliasing is a Bad Idea (pdf) which says it generally doesn't help and provides measurements to back this up.
Note that C++ compilers that allow the restrict keyword may still ignore it. That is the case, for example, here.
I tested this C program. Without restrict it took 12.640 seconds to complete; with restrict, 12.516 seconds. Looks like it can save some time.