对调用者来说看起来很纯粹但内部使用突变的函数
我刚刚拿到了 Expert F# 2.0 的副本,并发现了这样的说法,这让我有些惊讶:
例如,当需要时,您可以 对私有数据使用副作用 开始时分配的结构 一个算法,然后丢弃这些 返回之前的数据结构 结果;那么总体结果是 有效无副作用 功能。分离的一个例子 来自 F# 库的是该库的 List.map 的实现,它使用 内部突变;写入发生 基于内部的、分离的数据 其他代码无法实现的结构 访问。
现在,显然这种方法的优点是性能。我只是好奇是否有任何缺点——可能带来副作用的任何陷阱都适用于此吗?并行性是否受到影响?
换句话说,如果抛开性能,用纯粹的方式实现List.map
会更好吗?
(显然这特别涉及 F#,但我也对一般哲学感到好奇)
I just got my copy of Expert F# 2.0 and came across this statement, which somewhat surprised me:
For example, when necessary, you can
use side effects on private data
structures allocated at the start of
an algorithm and then discard these
data structures before returning a
result; the overall result is then
effectively a side-effect-free
function. One example of separation
from the F# library is the library's
implementation of List.map, which uses
mutation internally; the writes occur
on an internal, separated data
structure that no other code can
access.
Now, obviously the advantage to this approach is performance. I'm just curious if there are any disadvantages--do any of the pitfalls that can come with side-effects apply here? Is parallelisibility affected?
In other words, if performance were set aside, would it be preferable to implement List.map
in a pure way?
(Obviously this deals with F# in particular, but I'm also curious about general philosophy)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
我认为副作用的几乎所有缺点都与“与程序其他部分的交互”有关。副作用本身并不坏(正如 @Gabe 所说,即使是纯功能程序也会不断改变 RAM),而是效果的常见副作用(非本地交互)导致问题(调试/性能/可理解性) /ETC。)。因此,对纯局部状态(例如,对不转义的局部变量)的影响是很好的。
(我能想到的唯一缺点是,当人类看到这样的局部可变变量时,他们必须推理它是否可以逃逸。在 F# 中,局部可变变量永远无法逃逸(闭包无法捕获可变变量),因此唯一的潜力是“心理税”来自于对可变引用类型的推理。)
摘要:使用效果是可以的,只要很容易让自己相信效果只发生在非逃逸的当地人身上。 (在其他情况下使用效果也是可以的,但我忽略了其他情况,因为在这个问题线程上,我们是开明的函数式程序员,在合理的情况下试图避开效果。:))
(如果你想要非常深层的局部效果(如 F# 的 List.map 实现中的效果)不仅不会妨碍并行性,而且实际上是一个好处,从更高效的实现分配更少的角度来看,因此也更少对 GC 的共享资源造成压力。)
I think nearly every disadvantage of side-effects is tied to "interaction with other portions of the program". Side-effects themselves aren't bad (as @Gabe says, even a pure functional program is constantly mutating RAM), it's the common-side-consequences of effects (non-local interactions) which cause problems (with debugging/performance/understandability/etc.). So effects on purely local state (e.g. on a local variable that does not escape) are fine.
(The only detriment I can think of is that, when a human sees such a local mutable, they have to reason about whether it can escape. In F#, local mutables can never escape (closures cannot capture mutables), so the only potential "mental tax" comes from reasoning about mutable reference types.)
Summary: it's fine to use effects, so long as it's simple to convince one's self that the effects only happen on non-escaping locals. (It's also ok to use effects in other cases, but I'm ignoring those other cases, since on this question-thread we are the enlightened functional programmers trying to eschew effects whenever it's reasonable. :) )
(If you want to go very deep, local effects, like those in the implementation of F#'s List.map, are not only a non-hindrance to parallelizability, but actually a benefit, from the point of view that the more-efficient implementation allocates less, and thus is less of a strain on the shared resource of the GC.)
您可能对 Simon Peyton Jones 的 “惰性功能状态线程”。我只浏览了前几页,内容非常清晰(我确信其余部分也非常清晰)。
重要的一点是,当您在 Haskell 中使用 Control.Monad.ST 执行此类操作时,类型系统本身会强制执行封装。在 Scala(也可能是 F#)中,方法更多的是“相信我们,我们不会在您的
map
中使用这个ListBuffer
做任何偷偷摸摸的事情”。You might be interested in Simon Peyton Jones's "Lazy Functional State Threads". I've only ever made it through the first few pages, which are very clear (I'm sure the rest is very clear as well).
The important point is that when you use
Control.Monad.ST
to do this kind of thing in Haskell, the type system itself enforces the encapsulation. In Scala (and probably F#) the approach is more "just trust us that we're not doing anything sneaky over here with thisListBuffer
in yourmap
".如果函数使用本地、私有(对于函数而言)可变数据结构,则并行化不受影响。因此,如果
map
函数在内部创建一个与列表大小相同的数组,并迭代填充该数组的元素,您仍然可以在同一台机器上同时运行map
100 次不用担心,因为map
的每个实例都有自己的私有数组。由于您的代码在填充数组之前无法看到数组的内容,因此它实际上是纯粹的(请记住,在某种程度上,您的计算机必须实际改变 RAM 的状态)。另一方面,如果函数使用全局可变数据结构,并行化可能会受到影响。例如,假设您有一个
Memoize
函数。显然,它的全部要点是维护某种全局状态(尽管“全局”是指它不是函数调用的本地状态,但它仍然是“私有”,因为它在函数外部不可访问),以便它不必使用相同的参数多次运行一个函数,但它仍然是纯粹的,因为相同的输入总是会产生相同的输出。如果您的缓存数据结构是线程安全的(例如ConcurrentDictionary
),那么您仍然可以与其自身并行运行您的函数。如果不是,那么您可能会认为该函数不是纯粹的,因为它具有并发运行时可以观察到的副作用。我应该补充一点,F# 中的一种常见技术是从纯函数例程开始,然后在分析显示速度太慢时利用可变状态(例如缓存、显式循环)对其进行优化。
If a function uses local, private (to the function) mutable data structures, parallelization is not affected. So if the
map
function internally creates an array of the size of the list and iterates over its elements filling in the array, you could still runmap
100 times concurrently on the same list and not worry because each instance ofmap
will have its own private array. Since your code cannot see the contents of the array until it has been populated, it is effectively pure (remember, at some level your computer has to actually mutate the state of RAM).On the other hand if a function uses global mutable data structures, parallelization could be affected. For example, let's say you have a
Memoize
function. Obviously the whole point of it is to maintain some global state (although "global" in the sense that it is not local to the function call, it is still "private" in the sense that it is not accessible outside the function) so that it doesn't have to run a function multiple times with the same arguments, yet it is still pure because the same inputs will always produce the same outputs. If your cache data structure is thread-safe (likeConcurrentDictionary
) then you can still run your function in parallel with itself. If not, then you could argue that the function is not pure because it has side effects that are observable when run concurrently.I should add that it is a common technique in F# to start off with a purely functional routine, then optimize it by taking advantage of mutable state (e.g. caching, explicit looping) when profiling shows that it's too slow.
Clojure 中也使用了相同的方法。 Clojure 中的不可变数据结构(列表、映射和向量)具有可变的“瞬时”对应数据结构。 关于瞬态的 Clojure 参考 敦促仅在“任何其他代码”无法看到的代码中使用它们。
客户端代码中有针对泄漏瞬态的防护措施:
适用于不可变数据结构的常用函数不适用于瞬态。调用它们将引发异常。
瞬态被绑定到创建它们的线程。从任何其他线程修改它们都会抛出异常。
clojure.core 代码本身在幕后使用了大量瞬态。
使用瞬态的主要好处是它们提供了巨大的加速。
因此,在函数式语言中,严格控制可变状态的使用似乎是可以的。
Same approach can be found in use in Clojure. The immutable data structures in Clojure - list, map and vector - have their "transient" counterparts which are mutable. The Clojure reference about transient urges to use them only in the code which cannot be seen by "any other code".
There are guards against leaking transients in client code:
The usual function which work on the immutable data structures don't work on transients. Calling them will throw an exception.
The transients are bound to the thread they are created in. Modifying them from any other thread will throw an exception.
The clojure.core code itself uses a lot of transients behind the scenes.
The main benefit of using transients is the massive speed-up they provide.
So the tightly controlled use of mutable state seems to be OK in the functional languages.
它不会影响该函数是否可以与其他函数并行运行。它会影响函数的内部是否可以并行 - 但这对于大多数针对 PC 的小型函数(例如地图)来说不太可能成为问题。
我注意到一些优秀的 F# 程序员(在网络上和书中)似乎对使用命令式循环技术非常放松。他们似乎更喜欢带有可变循环变量的简单循环,而不是复杂的递归函数。
It won't affect whether the function can be run in parallel with other functions. It will affect whether the function's internals can be made parallel though - but that's unlikely to be an issue for most small functions (such as map), targeting a PC.
I've noticed is that some good F# programmers (on the web, and in books) seem be very relaxed about using imperative techniques for loops. They seem to prefer a simple loop, with mutable loop variables, to a complex recursive function.
一个问题可能是,一个好的函数式编译器是为了很好地优化“函数式”代码而构建的,但是如果您使用一些可变的东西,编译器的优化效果可能不如其他情况。在最坏的情况下,这会导致算法比不可变变体效率更低。
我能想到的另一个问题是惰性 - 可变数据结构通常不是惰性的,因此可变函数可能会强制对参数进行不必要的评估。
One issue can be, that a good functional compiler is constructed to optimize "functional" code well, but if you're using some mutable stuff, the compiler may not optimize as good as in the other case. In the worst case, this leads to more inefficient algorithms then the immutable variant.
The other issue I can think of is laziness - a mutable data structure is usually not lazy, thus a mutable unction may force unneccesary evaluation of arguments.
我会用一个问题来回答这个问题:“你是在编写这个函数,还是在使用这个函数?”
功能有两个方面,用户和开发者。
作为用户,我们根本不关心函数的内部结构。它可以用字节码进行编码,并从现在到审判日在内部使用硬副作用,只要它符合预期的数据输入->数据输出的契约。函数是一个黑匣子或一个预言机,它的内部结构是无关紧要的(假设它不做任何愚蠢的和外部的事情)。
作为一个功能的开发者,内部结构非常重要。不变性、常量正确性和避免副作用都有助于开发和维护函数,并将函数扩展到并行域。
许多人开发了一个然后使用的功能,因此这两方面都适用。
不变性与可变结构的优点是什么是另一个问题。
I would answer this with a question: "are you writing the function, or using the function?"
There are 2 aspects to functions, users and developers.
As a user, one doesn't care about the internal structure of a function at all. It can be coded in byte code and use hard side effects internally from now until judgement day so long as it matches the contract of data in->data out that one expects. A function is a black box or an oracle, it's internal structure is irrelevant(assuming it doesn't do anything stupid and external).
As a developer of a function, the internal structure matters a lot. Immutability, const correctness and avoiding side effects all help develop and maintain a function, and expand the function into the parallel domain.
Many people develop a function that they then use, so both of these aspects apply.
What the advantages are of immutability vs. mutable structures is a different question.