Why are closures suddenly useful for optimizing programs to run on multiple cores?

Posted 2024-08-08 14:42:16

I read an article that claims that closures (or "blocks") are a useful weapon in the "War on Multicores", because

[...] they allow you to create units of
work, which each have their own copy
of the stack, and don’t step on each
others toes as a result. What’s more,
you can pass these units around like
they are values, when in actual fact
they contain a whole stack of values
(pun intended), and executable code to
perform some operation.

Now, I am not disputing the usefulness of closures in general, or even for concurrent programming in a shared-memory model, but how is this different from a thread that only acts on local data (or a process, or an actor, or ...)?

Isn't a closure on its own as useful for concurrent programming as a thread without a scheduler?

And what about closures that have non-local side effects?

Comments (3)

我纯我任性 2024-08-15 14:42:16

The argument is that having closures in your programming language makes it easier to have some work done in another thread. I think the author should have mentioned the importance of higher-order functions in that argument.

My favorite introduction to higher-order functions is "Why functional programming matters"; I won't try to present a bad replica here.

So using closures doesn't give you parallelism for free if you're going to execute closures in for loops, e.g.

for (int i = 0; i < numElements; i++) {
  result[i] = closure(inputs[i], i);
}

because the language can't tell if closure(a, b) somehow changes other values in the result or inputs arrays. But languages with higher-order functions like map specify that the function passed to map shouldn't look at or change other values in the inputs, and prevent it from affecting other results. So, code like the following, which is common in functional languages, can be parallelized for you, without you needing to create a pool of worker threads and hand off the closure to them:

results = map(closure, inputs, [0..numElements-1]);
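As a rough illustration of the map-style call above, here is a minimal Java sketch using parallel streams. The names ParallelMapSketch and mapIndexed are made up for this example. Note one assumption that differs from a pure functional language: Java does not enforce that the closure leaves other elements alone, it only documents that the function passed to a stream should be non-interfering, so the guarantee rests on the programmer.

```java
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.stream.IntStream;

class ParallelMapSketch {
    // Hypothetical helper: applies closure to every (input, index) pair.
    // Each result depends only on its own element, so the stream library
    // is free to split the index range across cores.
    static int[] mapIndexed(BiFunction<Integer, Integer, Integer> closure, int[] inputs) {
        return IntStream.range(0, inputs.length)
                .parallel()
                .map(i -> closure.apply(inputs[i], i))
                .toArray(); // results keep the order of the indices
    }

    public static void main(String[] args) {
        int offset = 1; // free variable captured by the closure below
        int[] results = mapIndexed((x, i) -> 2 * x + offset, new int[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(results)); // [3, 5, 7, 9]
    }
}
```

No thread pool is created by hand here; the closure is simply handed to the higher-order function, which decides how to schedule the work.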

In these languages, closures take away the pain of declaring a new function somewhere for short pieces of code. That makes it more fun to use higher-order functions.

The following Haskell code defines a function f that takes a list of numbers and returns a list where each input i is replaced with 2i+1. Because the lambda saves you from declaring a separate named function to compute 2i+1, this takes one line of code instead of two.

f nums = map (\i -> 2*i+1) nums

Again, see "Why functional programming matters" for strong arguments as to how this scales up to real code bases.

奈何桥上唱咆哮 2024-08-15 14:42:16

Here is a nice definition of closures:

A "closure" is an expression
(typically a function) that can have
free variables together with an
environment that binds those variables
(that "closes" the expression).
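To make that definition concrete, here is a small Java sketch (the names ClosureSketch and makeAdder are invented for illustration). Inside the lambda, n is a free variable, and the returned function object carries the environment that binds it.

```java
import java.util.function.IntUnaryOperator;

class ClosureSketch {
    // n is a free variable inside the lambda; the returned closure carries
    // the binding of n that was in effect when makeAdder was called.
    static IntUnaryOperator makeAdder(int n) {
        return x -> x + n;
    }

    public static void main(String[] args) {
        IntUnaryOperator addFive = makeAdder(5);
        IntUnaryOperator addTen = makeAdder(10);
        System.out.println(addFive.applyAsInt(1)); // 6
        System.out.println(addTen.applyAsInt(1));  // 11
    }
}
```

Each call to makeAdder produces a separate environment, which is why addFive and addTen behave differently even though they share the same code.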

I think you are conflating definitions: in JavaScript, for example, my closures often have non-local side effects, because I am changing the DOM.

Closures are very useful, which is why C# added them to the language.

In functional programming languages, the runtime does not necessarily seem to create threads, which you pay a price for because of context switching; instead it creates lightweight processes. The framework, or compiler, controls what gets created to ensure that the processor is best utilized.

Whether you write with closures matters less than whether you use immutable data.

For example, if I have an application that has no global data, and every thread uses its own local copy, then it is up to the OS and the scheduler to determine which cores my application will use. Unfortunately, C/C++ compilers don't seem to know how to do that well, so by moving to FP we can use platforms such as Erlang that have been dealing with distributed processing for a long time, and leverage their experience.

Actors, in something like Erlang, will have less overhead than a C/C++ thread, as the switching seems to be faster with actors.

爱冒险 2024-08-15 14:42:16

This quote from the article packs quite a few misunderstandings into one sentence fragment:

[...] they allow you to create units
of work, which each have their own
copy of the stack, and don’t step on
each others toes as a result.

This is not generally true of languages with closures.

Firstly, for the sake of efficiency, closures more often hold references to variables on the stack than copies of them. And in the majority of languages, you can modify things through a reference. So in those languages, this feature does not isolate units of work at all. If anything, it can make matters more confusing.

Secondly, in most (sane) languages you can refer to anything that lexically encloses a local function, but not to just anything anywhere on the stack. For example, you cannot dig into the local variables of the function that called the function that called the function, and so on. You can only access the variables and parameters of the functions whose text encloses the text in which the usage occurs. Also, local variables and parameters ("on the stack") are not the only things that may lexically enclose a function. So "the stack" is the wrong concept to invoke here.

Java is one example of a language that can only take copies of local variables and parameters in its "anonymous inner class" closures. However, it still captures the this reference of the outer class by reference.

And in the case of closing over this, the inner class will now store an implicit reference to the outer class - nothing to do with the stack, really.
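The difference between the copied local and the referenced outer instance can be seen in a short Java sketch. The class CaptureSketch and its members are hypothetical names chosen for this example.

```java
import java.util.function.IntSupplier;

class CaptureSketch {
    private int field = 1;

    IntSupplier makeReader() {
        final int local = field; // copied into the inner class when it is created
        return new IntSupplier() {
            @Override
            public int getAsInt() {
                // local is the snapshot; field is read through the implicit
                // reference to the enclosing CaptureSketch instance.
                return local * 100 + field;
            }
        };
    }

    void setField(int value) {
        field = value;
    }

    public static void main(String[] args) {
        CaptureSketch outer = new CaptureSketch();
        IntSupplier reader = outer.makeReader();
        outer.setField(7);
        System.out.println(reader.getAsInt()); // 107: local stayed 1, field is now 7
    }
}
```

So even in Java the closure is not isolated from later changes: anything reachable through the outer instance is shared, not copied.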

The situation is similar in C#, except local variables and parameters are captured by reference instead of being copied.

var counter = 0;
Repeat(10, () => counter++);

Suppose Repeat is a library function that starts another thread and will now call the Action lambda passed to it every 10 milliseconds. You can hopefully see how this is a very succinct way to create race conditions!
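The same hazard can be sketched in Java, with some stated assumptions. Java lambdas can only capture effectively final locals, so a one-element array stands in for the C# counter variable, and several threads running the same closure stand in for the hypothetical Repeat. All names below (RaceSketch, runOnThreads, unsafeCount, safeCount) are invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RaceSketch {
    // Runs the same closure on several threads and waits for all of them.
    static void runOnThreads(int threads, Runnable work) {
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(work);
            pool[t].start();
        }
        try {
            for (Thread thread : pool) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Every thread increments the same captured array cell without synchronization.
    static int unsafeCount(int threads, int perThread) {
        int[] counter = {0}; // shared mutable state reached through the closure
        runOnThreads(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter[0]++; // read-modify-write, not atomic
            }
        });
        return counter[0];
    }

    // Same shape, but the shared state is an AtomicInteger.
    static int safeCount(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        runOnThreads(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + unsafeCount(4, 100_000)); // may be below 400000
        System.out.println("safe:   " + safeCount(4, 100_000));   // 400000
    }
}
```

The two methods use closures in exactly the same way; only the treatment of the shared state differs, which is the point being made here.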

The only kind of language that would avoid this problem would be a pure functional language such as Haskell, but that would not be due to closures - clearly - but instead due to the fact that you can never modify any shared state. (And Haskell still wouldn't avoid the problem entirely; most real software has to interact with shared state external to the program at some point, and Haskell's standard library has a way of doing that).
