You should rely on your compiler to optimise this stuff. Concentrate on using appropriate algorithms and writing reliable, readable and maintainable code.
The day tclhttpd, a webserver written in Tcl, one of the slowest scripting languages, managed to outperform Apache, a webserver written in C, supposedly one of the fastest compiled languages, was the day I became convinced that micro-optimizations pale in comparison to using a faster algorithm/technique*.
Never worry about micro-optimizations until you can measure and prove that they are the problem. Even then, I would recommend first coming here to SO and asking whether it is a good idea, in the hope that someone will convince you not to do it.
It is counter-intuitive, but very often code, especially tight nested loops or recursion, is optimized by adding code rather than removing it. The gaming industry has come up with countless tricks to speed up nested loops, using filters to avoid unnecessary processing. Those filters add significantly more instructions than the difference between i++ and ++i.
*note: We have learned a lot since then. The realization that a slow scripting language can outperform compiled machine code because spawning threads is expensive led to the development of lighttpd, NginX and Apache2.
There's a difference, I think, between a micro-optimization, a trick, and alternative means of doing something. It can be a micro-optimization to use ++i instead of i++, though I would think of it as merely avoiding a pessimization, because when you pre-increment (or decrement) the compiler need not insert code to keep track of the current value of the variable for use in the expression. If using pre-increment/decrement doesn't change the semantics of the expression, then you should use it and avoid the overhead.
A trick, on the other hand, is code that uses a non-obvious mechanism to achieve a result faster than a straightforward mechanism would. Tricks should be avoided unless absolutely needed. Gaining a small percentage of speed-up is generally not worth the damage to code readability unless that small percentage reflects a meaningful amount of time. Extremely long-running programs, especially calculation-heavy ones, and real-time programs are often candidates for tricks, because the amount of time saved may be necessary to meet the system's performance goals. Tricks should be clearly documented if used.
Alternatives are just that. There may be little or no performance gain; they just represent two different ways of expressing the same intent. The compiler may even produce the same code. In this case, choose the most readable expression. I would say to do so even if it results in some performance loss (though see the preceding paragraph).
I think you do not need to think about these micro-optimizations, because most of them are done by the compiler. These things can only make code more difficult to read.
Remember: premature optimization is evil.
To be honest, that question, while valid, is not relevant today. Why?
Compiler writers are a lot smarter than they were 20 years ago. Rewind back in time and these optimizations would have been very relevant: we were all working with old 80286/386 processors, and coders would often resort to tricks to squeeze even more bytes out of the compiled code.
Today, processors are fast and compiler writers know the intimate details of the instruction set needed to make everything work, considering that there is pipelining, multi-core processors, and acres of RAM. Remember, with an 80386 processor there would be 4 MB of RAM, and if you were lucky, 8 MB was considered superior!
The paradigm has shifted: it used to be about squeezing every byte out of compiled code; now it is more about programmer productivity and getting the release out the door sooner.
In the above I have described the nature of the processors and compilers I was talking about: the Intel 80x86 processor family and Borland/Microsoft compilers.
If you can easily see that two different code sequences produce identical results, without making assumptions about the data other than what's present in the code, then the compiler can too, and generally will.
It's only when the transformation from one to the other is highly non-obvious, or requires assuming something that you may know to be true but the compiler has no way to infer (e.g. that an operation cannot overflow, or that two pointers will never alias even though they aren't declared with the restrict keyword), that you should spend time thinking about these things. Even then, the best thing to do is usually to find a way to inform the compiler about the assumptions that it can make.
If you do find specific cases where the compiler misses simple transformations, 99% of the time you should just file a bug against the compiler and get on with working on more important things.
You would do better to consider every program you write primarily as a language in which you communicate your ideas, intentions and reasoning to other human beings who will have to bug-fix, reuse and understand it. They will spend more time on decoding garbled code than any compiler or runtime system will do executing it. To summarise, say what you mean in the clearest way, using the common idioms of the language in question.
For these specific examples in C, for(;;) is the idiom for an infinite loop and "i++" is the usual idiom for "add one to i" unless you use the value in an expression, in which case it depends whether the value with the clearest meaning is the one before or after the increment.
Someone on SO once remarked that micro-optimization was like "getting a haircut to lose weight". On American TV there is a show called "The Biggest Loser" where obese people compete to lose weight. If they were able to get their body weight down to a few grams, then getting a haircut would help.
Maybe that's overstating the analogy to micro-optimization, because I have seen (and written) code where micro-optimization actually did make a difference, but when starting off there is a lot more to be gained by simply not solving problems you don't have.
++i should be preferred over i++ in situations where you don't use the return value, because it better represents the semantics of what you are trying to do (increment i) rather than any possible optimisation (it might be slightly faster, and is probably not worse).
Generally, loops that count towards zero are faster than loops that count towards some other number. I can imagine a situation where the compiler can't make this optimization for you, but you can make it yourself.
Say that you have an array of length x, where x is some very big number, and that you need to perform some operation on each element. Further, let's say that you don't care what order these operations occur in. You might do this...
int i;
for (i = 0; i < x; i++)
    doStuff(array[i]);
But, you could get a little optimization by doing it this way instead -
int i;
for (i = x - 1; i != 0; i--)
    doStuff(array[i]);
doStuff(array[0]);
The compiler doesn't do it for you because it can't assume that order is unimportant.
MaR's example code is better. Consider this, assuming doStuff() returns an int:
int i = x;
while (i != 0)
{
    --i;
    printf("%d\n", doStuff(array[i]));
}
This is ok as long as printing the array contents in reverse order is acceptable, but the compiler can't decide that for you.
Whether this is an optimization is hardware dependent. From what I remember about writing assembler (many, many years ago), counting up rather than counting down to zero requires an extra machine instruction each time you go through the loop.
If your test is something like (x < y), then evaluation of the test goes something like this:
subtract y from x, storing the result in some register r1
test r1, to set the n and z flags
branch based on the values of the n and z flags
If your test is ( x != 0), you can do this:
test x, to set the z flag
branch based on the value of the z flag
You get to skip a subtract instruction for each iteration.
There are architectures where the subtract instruction itself sets the flags based on the result, and x86 is in fact one of them: sub, dec and cmp all update the flags, which is exactly why a dec/jnz count-down loop can skip the separate compare.
Keeping the fact that memory is the new disk in mind will likely improve your performance far more than applying any of those micro-optimizations.
For a slightly more pragmatic take on the question of ++i vs. i++ (at least in a C++ context) see http://llvm.org/docs/CodingStandards.html#micro_preincrement.
If Chris Lattner says it, I've got to pay attention. ;-)