C struct pointer dereference speed
I have a question regarding the speed of pointer dereferencing. I have a structure like so:
typedef struct _TD_RECT TD_RECT;
struct _TD_RECT {
    double left;
    double top;
    double right;
    double bottom;
};
My question is, which of these would be faster and why?
CASE 1:
TD_RECT *pRect;
...
for (i = 0; i < m; i++)
{
    if (p[i].x < pRect->left) ...
    if (p[i].x > pRect->right) ...
    if (p[i].y < pRect->top) ...
    if (p[i].y > pRect->bottom) ...
}
CASE 2:
TD_RECT *pRect;
double left = pRect->left;
double top = pRect->top;
double right = pRect->right;
double bottom = pRect->bottom;
...
for (i = 0; i < m; i++)
{
    if (p[i].x < left) ...
    if (p[i].x > right) ...
    if (p[i].y < top) ...
    if (p[i].y > bottom) ...
}
So in case 1, the loop dereferences the pRect pointer directly to obtain the comparison values. In case 2, new variables are created in the function's local space (on the stack) and the values are copied from pRect into them. The loop will perform many comparisons.
In my mind, they would be equally slow, because the local variable is also a memory reference on the stack, but I'm not sure...
Also, would it be better to keep referencing p[] by index, or to increment p by one element and dereference it directly without an index?
Any ideas? Thanks :)
You'll probably find it won't make a difference with modern compilers. Most of them would probably perform common subexpression elimination on the expressions that don't change within the loop. It's not wise to assume that there's a simple one-to-one mapping between your C statements and assembly code. I've seen gcc pump out code that would put my assembler skills to shame.
But this is neither a C nor a C++ question, since the ISO standard doesn't mandate how it's done. The best way to check for sure is to generate the assembler code with something like gcc -S and examine the two cases in detail.
You'll also get more return on your investment if you steer away from this sort of micro-optimisation and concentrate more on the macro level, such as algorithm selection.
And, as with all optimisation questions, measure, don't guess! There are too many variables which can affect it, so you should be benchmarking different approaches in the target environment, and with realistic data.
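To make the comparison concrete, here is a minimal sketch of the two cases as standalone functions that can be fed to the compiler. The point type POINT_D, the function names, and the choice to count the points outside the rectangle are assumptions made for illustration; the question elides the loop bodies.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double left, top, right, bottom; } TD_RECT;
typedef struct { double x, y; } POINT_D;   /* hypothetical point type */

/* Case 1: read the bounds through the pointer on every iteration. */
size_t count_outside_deref(const POINT_D *p, size_t m, const TD_RECT *pRect)
{
    assert(p != NULL && pRect != NULL);
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < pRect->left || p[i].x > pRect->right ||
            p[i].y < pRect->top  || p[i].y > pRect->bottom)
            n++;
    }
    return n;
}

/* Case 2: copy the bounds into locals once, before the loop. */
size_t count_outside_local(const POINT_D *p, size_t m, const TD_RECT *pRect)
{
    assert(p != NULL && pRect != NULL);
    const double left = pRect->left, top = pRect->top;
    const double right = pRect->right, bottom = pRect->bottom;
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < left || p[i].x > right ||
            p[i].y < top  || p[i].y > bottom)
            n++;
    }
    return n;
}
```

Saved as, say, rect.c (a hypothetical file name), this can be compiled with gcc -O2 -S rect.c -o rect.s and the two function bodies compared in rect.s. Repeating the exercise at -O0 shows how much of any difference depends on the optimisation level.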
It is not likely to be a hugely performance critical difference. You could profile doing each option multiple times and see. Ensure you have your compiler optimisations set in the test.
With regards to storing the doubles, you might get some performance hit by using const. How big is your array?
With regards to using pointer arithmetic, this can be faster, yes.
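For reference, the two traversal styles the question asks about might look like the sketch below. POINT_D and both function names are hypothetical, and the loop body (counting points left of a bound) is only a stand-in. Optimising compilers frequently emit the same code for both forms, so any speed difference should be measured rather than assumed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } POINT_D;   /* hypothetical point type */

/* Indexed traversal: p[i] is recomputed from the base pointer and i. */
size_t count_left_of_index(const POINT_D *p, size_t m, double left)
{
    assert(p != NULL);
    size_t n = 0;
    for (size_t i = 0; i < m; i++)
        if (p[i].x < left)
            n++;
    return n;
}

/* Pointer traversal: advance p one element at a time until it reaches end. */
size_t count_left_of_pointer(const POINT_D *p, size_t m, double left)
{
    assert(p != NULL);
    size_t n = 0;
    for (const POINT_D *end = p + m; p != end; ++p)
        if (p->x < left)
            n++;
    return n;
}
```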
You can instantly optimise if you know left < right in your rect (surely it must be). If x < left it can't also be > right so you can put in an "else".
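A small sketch of that "else" idea, assuming left <= right and top <= bottom. The flag names and the classify function are made up for illustration; the question does not say what happens inside each if.

```c
#include <assert.h>

typedef struct { double left, top, right, bottom; } TD_RECT;

/* Hypothetical flags describing on which sides a point lies outside. */
enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

unsigned classify(double x, double y, const TD_RECT *r)
{
    /* The else branches are only valid for a well-formed rectangle. */
    assert(r->left <= r->right && r->top <= r->bottom);

    unsigned flags = 0;
    if (x < r->left)
        flags |= OUT_LEFT;
    else if (x > r->right)      /* skipped when x < left already matched */
        flags |= OUT_RIGHT;

    if (y < r->top)
        flags |= OUT_TOP;
    else if (y > r->bottom)     /* skipped when y < top already matched */
        flags |= OUT_BOTTOM;

    return flags;
}
```

A point left of the rectangle now costs one x comparison instead of two, and likewise for y.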
Your big optimisation, if there is one, would come from not having to loop through all the items in your array and not having to perform 4 checks on all of them.
For example, if you indexed or sorted your array on x and y, you would be able, using binary search, to find all values that have x < left and loop through just those.
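As a rough sketch of the binary search step, assuming the points are already sorted by ascending x: the function below returns the index of the first point whose x is not less than left, so every point before that index has x < left. POINT_D and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } POINT_D;   /* hypothetical point type */

/* Requires p[0..m) sorted by ascending x.
 * Returns the first index i with p[i].x >= left, or m if there is none. */
size_t first_not_less_x(const POINT_D *p, size_t m, double left)
{
    assert(p != NULL || m == 0);
    size_t lo = 0, hi = m;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p[mid].x < left)
            lo = mid + 1;   /* mid is still left of the bound */
        else
            hi = mid;       /* mid could be the first one not less */
    }
    return lo;
}
```

The points with x < left are then p[0] up to p[k - 1], where k is the returned index, and only those need to be visited. Sorting costs O(m log m) up front, so this pays off mainly when the same array is queried many times.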
I think the second case is likely to be faster because you are not dereferencing the pointer to pRect on every loop iteration.
Practically, a compiler doing optimisation may notice this and there might be no difference in the code that is generated, but the possibility of pRect being an alias of an item in p[] could prevent this.
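The aliasing concern is easiest to see when the loop writes through p. In the sketch below (the clamping behaviour, POINT_D, and the function names are assumptions, not the question's code), the first version stores doubles inside the loop, so a compiler that cannot prove the stores leave *pRect untouched may reload the bounds on each iteration. The second version copies them into locals first, which removes the question entirely.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double left, top, right, bottom; } TD_RECT;
typedef struct { double x, y; } POINT_D;   /* hypothetical point type */

/* Bounds are read through pRect while doubles are being stored via p.
 * Depending on its alias analysis, the compiler may re-read them. */
void clamp_x_deref(POINT_D *p, size_t m, const TD_RECT *pRect)
{
    assert(p != NULL && pRect != NULL);
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < pRect->left)
            p[i].x = pRect->left;
        else if (p[i].x > pRect->right)
            p[i].x = pRect->right;
    }
}

/* Bounds are copied once; stores through p cannot change the locals. */
void clamp_x_local(POINT_D *p, size_t m, const TD_RECT *pRect)
{
    assert(p != NULL && pRect != NULL);
    const double left = pRect->left, right = pRect->right;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < left)
            p[i].x = left;
        else if (p[i].x > right)
            p[i].x = right;
    }
}
```

When the two pointers really do refer to separate objects, both functions give the same result. C99's restrict qualifier on the parameters is another way to promise the compiler that they do not overlap.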
An optimizing compiler will see that the structure accesses are loop invariant and so perform loop-invariant code motion, making your two cases look the same.
I would be surprised if even a totally non-optimized compile (-O0) produced different code for the two cases presented. In order to perform any operation on a modern processor, the data needs to be loaded into registers. So even when you declare automatic variables, these variables will not exist in main memory but rather in one of the processor's floating-point registers. This will be true even when you do not declare the variables yourself, and therefore I expect no difference in the generated machine code even when you declare the temporary variables in your C++ code.
But as others have said, compile the code into assembler and see for yourself.