C 语言中的 2D 动态数组:这 3 个片段中哪一个执行得更快?
gprof 在我的系统 (MinGW) 上无法正常工作,因此我想知道以下哪一个片段平均效率更高。
我知道 C 编译器在内部将所有内容转换为指针算术,但尽管如此,我想知道以下任何片段是否比其他片段具有任何显着优势。
该数组已作为一维数组在连续内存中动态分配,并且可以在运行时重新分配(它适用于简单的棋盘游戏,其中允许玩家根据自己的意愿重新定义棋盘的大小) 。
请注意,我& j 必须在每次循环迭代中计算并传递到函数 set_cell() 中(gridType 是一个简单的结构,带有一些整数和指向另一个单元结构的指针)。
提前致谢!
分配内存
grid = calloc( (nrows * ncols), sizeof(gridType) );
Snippet #1(按 1D 顺序解析)
gridType *gp = grid;
register int i=0 ,j=0; // we need to pass those in set_cell()
if ( !grid )
return;
for (gp=grid; gp < grid+(nrows*ncols); gp++)
{
set_cell( gp, i, j, !G_OPENED, !G_FOUND, value, NULL );
if (j == ncols-1) { // last col of current row has been reached
j=0;
i++;
}
else // last col of current row has NOT been reached
j++;
}
Snippet #2(解析为 2D 数组,仅使用指针)
gridType *gp1, *gp2;
if ( !grid )
return;
for (gp1=grid; gp1 < grid+nrows; gp1+=ncols)
for (gp2=gp1; gp2 < gp1+ncols; gp2++)
set_cell( gp2, (gp1-grid), (gp2-gp1), !G_OPENED, !G_FOUND, value, NULL );
Snippet # 3(解析为 2D,仅使用计数器)
register int i,j; // we need to pass those in set_cell()
for (i=0; i<nrows; i++)
for (j=0; j<ncols; j++)
set_cell( &grid[i * ncols + j], i, j, !G_OPENED, !G_FOUND, value, NULL);
可用内存
free( grid );
编辑: 在保罗修正之后,我在第一个循环中将 #2 形式 gp1++) 修复为 gp1+=ncols)(谢谢!)
gprof is not working properly on my system (MinGW) so I'd like to know which one of the following snippets is more efficient, on average.
I'm aware that internally C compilers convert everything into pointers arithmetic, but nevertheless I'd like to know if any of the following snippets has any significant advantage over the others.
The array has been allocated dynamically in contiguous memory as 1d array and may be re-allocated at run time (its for a simple board game, in which the player is allowed to re-define the board's size, as often as he wants to).
Please note that i & j must get calculated and passed into the function set_cell() in every loop iteration (gridType is a simple struct with a few ints and a pointer to another cell struct).
Thanks in advance!
Allocate memory
grid = calloc( (nrows * ncols), sizeof(gridType) );
Snippet #1 (parse sequentially as 1D)
gridType *gp = grid;
register int i=0 ,j=0; // we need to pass those in set_cell()
if ( !grid )
return;
for (gp=grid; gp < grid+(nrows*ncols); gp++)
{
set_cell( gp, i, j, !G_OPENED, !G_FOUND, value, NULL );
if (j == ncols-1) { // last col of current row has been reached
j=0;
i++;
}
else // last col of current row has NOT been reached
j++;
}
Snippet #2 (parse as 2D array, using pointers only)
gridType *gp1, *gp2;
if ( !grid )
return;
for (gp1=grid; gp1 < grid+nrows; gp1+=ncols)
for (gp2=gp1; gp2 < gp1+ncols; gp2++)
set_cell( gp2, (gp1-grid), (gp2-gp1), !G_OPENED, !G_FOUND, value, NULL );
Snippet #3 (parse as 2D, using counters only)
register int i,j; // we need to pass those in set_cell()
for (i=0; i<nrows; i++)
for (j=0; j<ncols; j++)
set_cell( &grid[i * ncols + j], i, j, !G_OPENED, !G_FOUND, value, NULL);
Free memory
free( grid );
EDIT:
I fixed #2 form gp1++) to gp1+=ncols), in the 1st loop, after Paul's correction (thx!)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
对于这样的事情,答案将取决于编译器和运行它的机器。您可以尝试每个代码片段,并计算每个代码片段需要多长时间。
然而,这是过早优化的一个典型例子。最好的办法是选择看起来最清晰、最可维护的片段。从长远来看,这样做会给你带来更多的好处,而不是选择你的机器上最快的机器(无论如何在别人的机器上可能不是最快的!)
For anything like this, the answer is going to depend on the compiler and the machine you're running it on. You could try each of your code snippets, and calculating how long each one takes.
However, this is a prime example of premature optimization. The best thing to do is to pick the snippet which looks the clearest and most maintainable. You'll get much more benefit from doing that in the long run than from any savings you'd make from choosing the one that's fastest on your machine (which might not be fastest on someone else's anyway!)
好吧,片段 2 并不完全有效。你需要不同的递增行为;外循环应为 for (gp1 = grid; gp1 < grid + (nrows * ncols); gp1 += ncols)。
在另外两个中,任何关注的编译器几乎肯定会将片段 3 转换为与片段 1 等效的内容。但实际上,如果不分析它们,就无法知道。
另外,请记住 Knuth 的话:“过早的优化是所有邪恶的根源。我看到以‘优化’之名造成的损害比所有其他原因造成的损害还要多,包括纯粹的、错误的愚蠢 ”。编写编译器的人比你聪明(除非你私下是 Knuth 或 Hofstadter),所以让编译器完成它的工作,你就可以继续你的工作了。尝试编写“聪明”的优化代码通常只会让编译器感到困惑,从而阻止它编写更好、更优化的代码。
Well, snippet 2 doesn't exactly work. You need different incrementing behavior; the outer loop should read
for (gp1 = grid; gp1 < grid + (nrows * ncols); gp1 += ncols)
.Of the other two, any compiler that's paying attention will almost certainly convert snippet 3 into something equivalent to snippet 1. But really, there's no way to know without profiling them.
Also, remember the words of Knuth: "Premature optimization is the ROOT OF ALL EVIL. I have seen more damage done in the name of 'optimization' than for all other causes combined, including sheer, wrongheaded stupidity." People who write compilers are smarter than you (unless you're secretly Knuth or Hofstadter), so let the compiler do its job and you can get on with yours. Trying to write "clever" optimized code will usually just confuse the compiler, preventing it from writing even better, more optimized code.
这就是我要写的方式。恕我直言,它比你的任何方法都更短、更清晰、更简单。
This is the way I'd write it. IMHO it's shorter, clearer and simpler than any of your ways.
原谅。您仍然可以设置
基准和衡量执行
时间。
现代 CPU 上的差异直到
nrows*ncols
变得非常大或发生重新分配
非常经常发生,因此您可能会优化代码中错误的部分。
set_cell
上,而其他所有内容都可以由编译器优化为相同或非常相似的代码。excuse. You can still set up a
benchmark and measure execution
time.
difference on modern CPUs until
nrows*ncols
is getting verylarge or the reallocation happens
very often, so you might optimize the wrong part of your code.
set_cell
and everything else could be optimized to the same or very similar code by the compiler.在你测量之前你不会知道。
任何像样的编译器都可能产生相同的代码,即使它没有缓存、堆砌、预测分支和其他聪明的东西的影响,这意味着仅仅猜测指令的数量是不够的
You don't know until you measure it.
Any decent compiler may produce the same code, even if it doesn't the effects of caching, pilelining, predictive branching and other clever stuff means that simply guessing the number of instructions isn't enough