How fast are lambda functions in GCC?
Having toyed around a bit with C++0x lambda expressions in G++, I was wondering how well they perform, in general and in specific situations, compared to alternatives that don't use lambda functions.
Does anybody know of a more or less comprehensive discussion of lambda expression performance, or of situations in which one should avoid them despite the added convenience while developing?
Comments (5)
If you consider the old-school way of defining a struct with operator() as the alternative, then there isn't going to be any difference, because lambdas are pretty much equivalent to that, just syntactically more convenient. Let's wait for a more complete answer...
When encountering lambda expressions for the first time, many people get the vague impression that some runtime compilation magic creates these functions. Specifically, if a function returns a newly made function as its result, it would seem that the returned function is "created" every time the surrounding function is called. But this is wrong: a lambda expression (in any language) contains code that can be compiled just like any other code, and it all happens statically, with no cost left for the runtime.
The only issue is what happens with variables that are closed over, but that does not preclude such compilation: to create a closure, you just pair up the closure data (these variables) with a pointer to the statically compiled code. In terms of performance, this means there should be no loss at all, closed-over variables or not. Even with closed-over variables there is no extra cost: any other way to approach whatever problem you're facing would require packaging up those values in some way, so the cost of allocation is the same regardless of how you keep them (explicitly, or implicitly in closed-over variables). If an alternative solution doesn't need to package some values, then there is no need to close over them in the closure version either. This is really the same as the local allocation needed to execute the code, which would obviously be the same regardless of whether the code comes from a closure with its local scope or from some other scope that needs the same kind of local state.
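To make the "closure data paired with statically compiled code" idea concrete in C++, here is a minimal sketch. The function names are hypothetical and chosen only for illustration. It assumes nothing beyond C++11: the closure is an ordinary local object that holds a copy of the captured variable, and its body is compiled once like any other function.

```cpp
#include <cstddef>

// Hypothetical names, for illustration only.

// Size of a closure that captures one int by value. On GCC and Clang this is
// typically sizeof(int); the standard leaves the exact size unspecified.
inline std::size_t closure_size() {
    int factor = 3;
    auto scale = [factor](int v) { return v * factor; };
    (void)scale;  // only its type is inspected here
    return sizeof(scale);
}

inline int scaled_sum(int a, int b) {
    int factor = 3;  // local state the lambda closes over
    // `scale` is a local object holding a copy of `factor`; the code of its
    // call operator is compiled ahead of time, not created at runtime.
    auto scale = [factor](int v) { return v * factor; };
    return scale(a) + scale(b);
}
```

Calling scaled_sum(1, 2) evaluates 1 * 3 + 2 * 3; the lambda expression itself requests no heap allocation, since the captured int lives inside the local closure object.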
Again, this is all stuff that holds in any language with closures, and there is no reason for C++ code to suffer from performance issues where no other language does. One oddity in C++ lambda expressions is the need to say which variables you close over (either by listing them or with a capture-default such as [=] or [&]), whereas in most other languages you just get everything closed over by default. This would seem to give C++ code some edge in having greater control over how much stuff needs to be packaged with the closure, but that's something that is very easy for a compiler to do automatically, without explicit annotations. It leads to one of the most common things that compilers of functional languages do, "lambda lifting", where a function is effectively lifted to the toplevel, avoiding the need to create closures at runtime if they're not needed. For example, if you write (in some JS-like pseudo code) a function that takes an argument x and returns an inner function whose body never mentions x, then it's easy (for a compiler as well as for a human) to see that the returned function does not depend on x, and the compiler can lift it, as if you had written the inner function at the top level and simply returned it. Similar techniques are used when only some values are closed over.
But I digress. The bottom line is that lambda expressions do not incur a performance penalty, closed variables or not.
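The lambda lifting described above can be sketched in C++ as well. This is an illustration only, not the pseudo code from the original answer, and every name in it (IntFn, foo, lifted, foo_lifted) is hypothetical. It relies on the C++11 rule that a lambda with no captures converts to a plain function pointer.

```cpp
// Hypothetical names, for illustration only.
using IntFn = int (*)(int);

// Original form: the returned lambda never uses `x`, so it captures nothing
// and can be handed out as a plain function pointer.
inline IntFn foo(int x) {
    (void)x;  // deliberately unused by the inner function
    return [](int y) { return y + 3; };
}

// Hand-lifted form: the inner function moved to the top level.
inline int lifted(int y) { return y + 3; }

inline IntFn foo_lifted(int x) {
    (void)x;
    return &lifted;
}
```

Both foo and foo_lifted return a pointer to code that already exists at compile time; no closure object has to be built when they are called.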
(As for comparing lambda expressions with a struct that defines operator(): C++ specifies a lambda as an object of an unnamed class type with an operator(), and since that operator is not virtual, calling it needs no runtime dispatch. The techniques above apply in that case too, so the call machinery can be compiled away, and the lambda should carry no additional cost compared with the hand-written struct.)
I don't see any design reason why closures should perform worse than an equivalent function with the same number and size of passed parameters.
Even for closures that capture all context variables, the compiler should be able to optimize away everything except the context variables actually used in the lambda.
Specific context variables (captured either by value or by reference) need some storage initialised at instantiation time, which happens at the point where the lambda expression is reached during execution. But this storage doesn't need to be on the heap; stack allocation is perfectly fine.
A lambda is exactly the same as a regular function; the only difference is entirely syntactical: it is defined inside another function and can capture some external variables, which are compiled as an additional context parameter. The context parameter might get a POD-like initialization at the point where the lambda is defined.
If a specific compiler (e.g. g++ or clang) behaves in conflict with the above, it's a warning sign of a bad implementation. Clang is designed to make it easy to add optimization passes, so any such shortcomings should be easier to address in the long run than in, say, g++.
The bottom line is that if you don't use context variables, a lambda is (or should be) totally indistinguishable from a regular free function to the compiler.
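The "additional context parameter" view can be written out by hand. The sketch below is only an approximation of what a compiler might generate, and the names (Context, body, with_explicit_context, with_lambda) are made up for this example. One version passes a stack-allocated context struct to a free function, the other lets a lambda capture the same variables.

```cpp
// Hypothetical names, for illustration only.
struct Context {
    int offset;        // corresponds to a by-value capture
    const int* scale;  // corresponds to a by-reference capture
};

// Free function taking the captured state as an explicit extra parameter.
inline int body(const Context& ctx, int v) { return v * *ctx.scale + ctx.offset; }

inline int with_explicit_context(int v) {
    int offset = 10, scale = 2;
    Context ctx{offset, &scale};  // POD-like initialization, on the stack
    return body(ctx, v);
}

inline int with_lambda(int v) {
    int offset = 10, scale = 2;
    // Same state, captured implicitly into a stack-allocated closure object.
    auto f = [offset, &scale](int x) { return x * scale + offset; };
    return f(v);
}
```

Both functions compute v * 2 + 10 from the same locals, and neither one needs the heap for the captured state.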
We're starting to avoid using lambdas in some cases (gaming environment) because the closure created (if it has captured values) has an associated new/delete to hold the captured values. While in many environments this is not an issue, we're in the business of shaving off microseconds at a time to get the best performance. Lambdas (those with enclosed variables) featured prominently in our profiling and were among the first to go.
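The post does not say where the new/delete came from. A lambda expression by itself does not request heap memory for its captures; one common source of allocation, assumed here and not confirmed by the author, is storing the closure in a type-erased wrapper such as std::function, which may allocate when the captured state does not fit its internal buffer (this depends on the implementation). The sketch below contrasts the two ways of passing a closure; all function names are hypothetical.

```cpp
#include <functional>

// Hypothetical helpers, for illustration only.

// Takes the closure by its own unnamed type: no type erasure, so the call
// can be inlined and the captured values stay inside the closure object.
template <typename F>
int apply_direct(F f, int v) { return f(v); }

// Takes a type-erased wrapper: building a std::function from a closure may
// allocate, depending on the implementation and the size of the captures.
inline int apply_erased(const std::function<int(int)>& f, int v) { return f(v); }

inline int demo_direct(int v) {
    int a = 1, b = 2, c = 3, d = 4;
    return apply_direct([a, b, c, d](int x) { return x + a + b + c + d; }, v);
}

inline int demo_erased(int v) {
    int a = 1, b = 2, c = 3, d = 4;
    return apply_erased([a, b, c, d](int x) { return x + a + b + c + d; }, v);
}
```

If the profile points at allocation, checking whether closures end up in std::function (or a similar wrapper) may be worth doing before giving up on lambdas altogether.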
Just as others mentioned, there is no reason why compiler-generated closures resulting from lambdas should differ in performance from hand-written ones. To verify it, I replaced the lambdas used in a parser with hand-written classes. They all contain only a few lines of tight code and are executed millions of times, so any change in performance would be immediately noticeable. The result: exactly the same execution time. So no, there is no difference in performance.
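For readers who want to try the same swap, here is a minimal sketch of a lambda next to the hand-written class it stands for. This is not the parser code from the answer; the predicate and function names are invented for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical predicate, for illustration only.
struct IsAbove {
    int limit;
    explicit IsAbove(int l) : limit(l) {}
    bool operator()(int v) const { return v > limit; }
};

// Hand-written function object passed to the algorithm.
inline long count_with_class(const std::vector<int>& xs, int limit) {
    return std::count_if(xs.begin(), xs.end(), IsAbove(limit));
}

// Equivalent lambda: the compiler generates a class much like IsAbove.
inline long count_with_lambda(const std::vector<int>& xs, int limit) {
    return std::count_if(xs.begin(), xs.end(),
                         [limit](int v) { return v > limit; });
}
```

Timing the two versions on your own workload, with optimizations enabled, is the only way to confirm the result for a given compiler; the sketch only shows that the two forms are interchangeable.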