Closures over values vs. context
I'm thinking through various implementations of closures and am wondering about the merits of different styles. It seems there are two choices, closing over the execution context or the values. For instance, over the context we have:
a = 1
def f():
    return a
f() # returns 1
a = 2
f() # returns 2
Alternatively, we can close over values and have:
a = 1
def f():
    return a
f() # returns 1
a = 2
f() # returns 1
Are there languages that implement the second? Are there advantages vs. disadvantages?
I think in this case it's not a matter of context vs. value, but whether you close over a variable as a reference cell or the value that the variable contains.
If you really mean the context, you're referring to dynamic vs. lexical scope. See this Wikipedia article for an in-depth comparison.
Most languages implement lexical scope (or try to). Some languages do implement dynamic scope: notably older Lisps like ELisp for emacs. Most languages with closures (e.g., Scheme, Haskell, ML, and so on) close over the values in the lexical scope. Dynamic scope is often considered a bad idea because it's more difficult to reason about (it's "spooky action at a distance").
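As a small illustration of the difference, here is a Python sketch (Python is lexically scoped; the function names are made up for this example). Under dynamic scope, show would see the x of whoever called it; under lexical scope it sees the x of the place where it was defined:

```python
def lexical_demo():
    x = "defining scope"

    def show():
        # Lexical scope: x resolves to the variable in the enclosing definition.
        return x

    def caller():
        x = "caller's scope"  # this is what show() would see under dynamic scope
        return show()

    return caller()


print(lexical_demo())  # defining scope
```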
Note that even in lexically scoped languages, you can get behavior like your first example if you close over a reference cell. That's why Scheme and JavaScript closures behave like they do (because variables are reference cells).
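A hedged Python sketch of the two behaviours from the question. Python variables act as reference cells, so a plain closure sees later assignments; binding the current value as a default argument is one common idiom for snapshotting it instead. The function names are illustrative only:

```python
def capture_demo():
    a = 1

    def by_cell():
        # Closes over the variable itself, so later assignments are visible.
        return a

    def by_value(a=a):
        # The default argument is evaluated once, when the function is defined.
        return a

    before = (by_cell(), by_value())
    a = 2
    after = (by_cell(), by_value())
    return before, after


print(capture_demo())  # ((1, 1), (2, 1))
```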
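C++ lambdas can capture explicitly by value, by naming the variable in the capture list ([a]), or by reference ([&a]). You can also capture implicitly either way, with [=] or [&]. In what follows, f1 stands for a lambda that captures a by value and f2 for one that captures a by reference.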
The advantage is that you control which variables are captured and whether each one is copied or referenced. A potential disadvantage is that you must beware of lifetime issues, because C++ references are non-owning: if a goes out of scope, then calling f1 is still valid, but calling f2 is undefined. If it's natural and you don't mind the overhead, you could always capture a shared_ptr<T> (a pointer with shared ownership).
So for immutable values:
Capturing by value forces a copy. Capturing by reference does not.
Capturing by value has no ownership issues. Capturing by reference does.
For mutable values, you must of course capture by reference if the lambda is to modify the original. Here's a contrived example similar to std::partial_sum():
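A rough Python analogue of that idea, in which a closure updates a running total held in the enclosing scope. The names running_sums, add and total are illustrative and not taken from the answer; nonlocal plays the role of capturing by reference:

```python
def running_sums(values):
    total = 0  # mutable state shared with the closure below

    def add(x):
        nonlocal total  # rebind the enclosing variable, not a local copy
        total += x
        return total

    return [add(v) for v in values]


print(running_sums([1, 2, 3, 4]))  # [1, 3, 6, 10]
```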
In most languages with both closures and mutable variables, closures capture locations, not values (that is, the first behavior). Examples include Scheme, Python, and JavaScript.
To do this safely, the language must, in many circumstances, heap-allocate mutable variables that are captured by closures. This is typically implemented by a compiler pass that converts variables that are actually mutated into explicitly-allocated mutable cells, after which the compiler can forget about the issue.
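The transformation can be sketched by hand in Python. This is only an illustration of the idea, not how any particular compiler does it; Cell and make_counter are hypothetical names. The mutated variable becomes an explicitly allocated box, and every closure that captures it holds a reference to the same box:

```python
class Cell:
    """Explicit heap-allocated box for one mutable variable."""

    def __init__(self, value):
        self.value = value


def make_counter():
    count = Cell(0)  # before conversion: count = 0

    def increment():
        count.value += 1  # before conversion: count += 1
        return count.value

    def current():
        return count.value  # both closures share the same cell

    return increment, current


increment, current = make_counter()
increment()
increment()
print(current())  # 2
```

After the conversion the variable count itself is never reassigned, so it can safely be copied into each closure.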
To avoid implicit heap allocation, Java requires local variables captured by inner classes to be declared final, i.e., immutable (since Java 8, being effectively final is enough). Other languages, like ML and Haskell, avoid the issue entirely because variables are always immutable. In C++, capture by reference can be unsafe, as Jon points out in his answer.
Closures should behave as in the first case, but some languages provide the 2nd case.
Smalltalk works according to the first case. Let's assume a class defines methods m and test:
To think about closures, you must think about the stack. If closure c is defined in method m and closes over the temporary variable counter, the stack frame of m cannot be removed until the closure is garbage collected. Closures are first-class, so you don't know when there will no longer be any reference to one.
But many closures do not close over any temporary variable, or close over temporary variables that are not modified after the closure is defined. In the latter case, the value of each temporary variable at the moment the closure is defined can be copied into the closure, so that the closure doesn't need a reference to the stack frame of m.
In the case of the closure c above, the closure can copy the value of counter. This is what Java mandates by forcing temporary variables that are closed over to be final.
If method m was
I guess it would defeat the optimization, because counter is mutated after the creation of the closure.
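A Python sketch of that situation. Only the names m, c and counter come from the text; the body is an assumption. Because counter is reassigned after c is created, copying its value at creation time would give the wrong answer:

```python
def m():
    counter = 0
    c = lambda: counter    # closure created while counter is 0
    counter = counter + 1  # mutated after the closure exists
    return c


print(m()())  # 1, not 0: the closure shares the variable
```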
That's how I understand closures at least.
Felix actually provides quite complex semantics, which are sometimes counter-intuitive. Closures capture the context via a pointer to the context's frame at the point the closure is formed. Therefore you would expect a captured variable to always reflect the current value of the variable at the time the closure is executed.
This is not the case, because the optimiser may replace the variable with its value, in particular, if the "variable" is declared like:
it is taken as an immutable value, and such a substitution is deemed safe. This is true even if the value is passed as an argument! For example:
It's likely we have fy defined as if:
had been written. In this case it may be the same for a variable:
by replacing the x with the value of z at the time of closure formation BUT it could also print 2, by replacing the x with the variable name z instead.
In Felix, it is not determinate which optimisation is applied and that is deliberate: it allows the compiler the freedom to choose (what it thinks is) the best optimisation.
If you want to force an interpretation, you can. For a function parameter:
fun f(var x:int) () => x; // forces eager evaluation, copies argument to parameter
fun f( x: unit -> int ) => x(); // forces lazy evaluation
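A loose Python analogue of the two forced interpretations (this is not Felix, and the helper names are invented for the example): passing the value copies it at the call, while passing a zero-argument function defers the read until the closure runs.

```python
def eager(x):
    # x holds the argument's value as of the call.
    return lambda: x


def lazy(get_x):
    # get_x is a thunk; the variable is read only when the closure runs.
    return lambda: get_x()


def demo():
    z = 1
    f_eager = eager(z)
    f_lazy = lazy(lambda: z)
    z = 2
    return f_eager(), f_lazy()


print(demo())  # (1, 2)
```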
And for the original question: you can force the lazy interpretation by simply using a pointer:
It is nonsense to force the eager interpretation. If you want that you do this:
I must say I am NOT HAPPY with these semantics, but that's what happens at the moment, and it seems quite logical. What is more troubling is this:
The for loop is flat, no stack frame. Here 'x' is a value, but it isn't immutable!
If you can predict the value printed by g() you're doing better than me (and I designed the language :)
Unfortunately the optimisations obtained by these semantics are mandatory: we do not want to end up with the performance of, er, well, Haskell (no offense intended).
The moral of the story is: if your code depends on the answer to the OP's question, on your head be it! Write code where the semantics are determinate if you require that.
Various languages have it in one of those two ways, or both.
The main distinction is what happens when you assign to the variable. Thus, as others have pointed out, in languages where variables are immutable [...]
In languages that capture by value, one issue is how to deal with assignments to that variable. Since it's captured by value [...]
[...] final, essentially for the purpose of capturing by value, because otherwise there would be confusion at having two separate mutable copies of a variable in the same scope; but when they are final, they can't be modified, so there is no difference between having one copy and many copies.
[...] & are by reference; otherwise, it's by value.
[...] = by itself captures all unlisted variables by value; & by itself captures all unlisted variables by reference. One has to be careful when capturing variables by reference, to not capture variables that go out of scope. Interestingly (unlike Java), it is possible to capture a variable by value, but have it be mutable, by using the mutable modifier on the anonymous function.
[...] & indicates capture by reference; otherwise by value.
[...] __block modifier when declaring that variable. This probably allocates it on the heap.