How are lazy sequences implemented in Clojure?

I like Clojure. One thing that bothers me about the language is that I don't know how lazy sequences are implemented, or how they work.

I know that lazy sequences only evaluate the items in the sequence that are asked for. How do they do this?

  • What makes lazy sequences so efficient that they don't consume much stack?
  • How come you can wrap recursive calls in a lazy sequence and no longer get a stack overflow for large computations?
  • What resources do lazy sequences consume to do what they do?
  • In what scenarios are lazy sequences inefficient?
  • In what scenarios are lazy sequences most efficient?

Comments (3)

娇女薄笑 2024-09-16 06:22:10

Let's do this.

I know that lazy sequences only evaluate the items in the sequence that are asked for; how do they do this?

Lazy sequences (henceforth LS, because I am a LP, or Lazy Person) are composed of parts. The head, or the part(s) of the sequence that have already been evaluated (really, 32 elements are evaluated at a time as of Clojure 1.1, or I think 1.2), is followed by something called a thunk, which is basically a chunk of information waiting to be called (think of it as the rest of your function that creates the sequence, unevaluated). When it is called, the thunk evaluates however much is asked of it, and a new thunk is created, with context as necessary (how much has been called already, so it can resume from where it was before).

So say you evaluate (take 10 (whole-numbers)), where whole-numbers is assumed to be a lazy sequence of whole numbers. That means you're forcing evaluation of the thunk 10 times (though internally this may differ a little depending on optimizations).
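
As a minimal sketch of such a producer (whole-numbers is the asker's hypothetical, not a clojure.core function), it might look like this:

;; A hypothetical whole-numbers producer: each cell is a cons whose rest
;; is a thunk that only builds the next cell when forced.
(defn whole-numbers
  ([] (whole-numbers 0))
  ([n] (lazy-seq (cons n (whole-numbers (inc n))))))

(take 10 (whole-numbers)) ;=> (0 1 2 3 4 5 6 7 8 9)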

What makes lazy sequences so efficient that they don't consume much stack?

This becomes clearer once you read the previous answer (I hope): unless you call for something in particular, nothing is evaluated. When you call for something, each element of the sequence can be evaluated individually, then discarded.

If the sequence is not lazy, oftentimes it is holding onto its head, which consumes heap space. If it is lazy, it is computed, then discarded, as it is not required for subsequent computations.
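
A classic way to see the head-holding difference at the REPL (a sketch; the exact memory behavior depends on your JVM heap settings):

;; Holding the head: the var r keeps the first cell reachable while last
;; realizes all the elements, so they all stay live (OutOfMemoryError risk).
(def r (range 100000000))
(last r)

;; No reference to the head: elements are discarded as last walks past them.
(last (range 100000000))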

How come you can wrap recursive calls in a lazy sequence and no longer get a stack overflow for large computations?

See the previous answer and consider: the lazy-seq macro (from the documentation)

will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls.
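
You can watch that cache at work in the REPL; the println side effect (purely illustrative) fires only once:

(def s (lazy-seq (println "realizing!") (list 1 2 3)))

(first s) ; prints realizing! and returns 1
(first s) ; returns 1 from the cache, nothing printed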

Check out the filter function for a cool LS that uses recursion:

(defn filter
  "Returns a lazy sequence of the items in coll for which
  (pred item) returns true. pred must be free of side-effects."
  [pred coll]
  (let [step (fn [p c]
                 (when-let [s (seq c)]
                   (if (p (first s))
                     (cons (first s) (filter p (rest s)))
                     (recur p (rest s)))))]
    (lazy-seq (step pred coll))))
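
For example (even? is just an arbitrary predicate here), none of step's recursion happens until elements are demanded, which is why it can run over an infinite source:

(take 5 (filter even? (range))) ;=> (0 2 4 6 8)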

What resources do lazy sequences consume to do what they do?

I'm not quite sure what you're asking here. LSs require memory and CPU cycles. They just don't keep banging the stack and filling it up with the results of the computations required to get the sequence elements.

In what scenarios are lazy sequences inefficient?

When you're using small seqs that are fast to compute and won't be used much, making them lazy is inefficient, because an LS requires another couple of characters to create.

In all seriousness, unless you're trying to make something extremely performant, LSs are the way to go.

In what scenarios are lazy sequences most efficient?

When you're dealing with seqs that are huge and you're only using bits and pieces of them, that is when you get the most benefit from using them.

Really, it's pretty much always better to use LSs over non-LSs, in terms of convenience, ease of understanding (once you get the hang of them) and reasoning about your code, and speed.

与之呼应 2024-09-16 06:22:10

I know that lazy sequences only evaluate the items in the sequence that are asked for; how do they do this?

I think the previously posted answers already do a good job explaining this part. I'll only add that the "forcing" of a lazy sequence is an implicit (paren-free! :-)) function call; perhaps this way of thinking about it will make some things clearer. Also note that forcing a lazy sequence involves a hidden mutation: the thunk being forced needs to produce a value, store it in a cache (mutation!) and throw away its executable code, which will not be required again (mutation again!).
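
Here is a rough model of that hidden mutation, written as a sketch in Clojure itself (the real machinery lives in the clojure.lang.LazySeq Java class; lazy-cell is purely illustrative):

;; A lazy cell holds either its thunk or its cached value, never both.
(defn lazy-cell [thunk]
  (let [state (atom {:thunk thunk})]
    (fn []
      (if-let [f (:thunk @state)]
        (let [v (f)]
          (reset! state {:value v}) ; mutation: cache the value, drop the code
          v)
        (:value @state)))))

(def cell (lazy-cell #(do (println "computing...") 42)))
(cell) ; prints computing... and returns 42
(cell) ; returns 42 straight from the cache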

What makes lazy sequences so efficient that they don't consume much stack?

What resources do lazy sequences consume to do what they do?

They don't consume stack, because they consume heap instead. A lazy sequence is a data structure, living on the heap, which contains a small bit of executable code that can be called to produce more of the data structure if/when that is required.

How come you can wrap recursive calls in a lazy sequence and no longer get a stack overflow for large computations?

Firstly, as mentioned by dbyrne, you can very well get an SO when working with lazy sequences if the thunks themselves need to execute code with a very deeply nested call structure.

However, in a certain sense you can use lazy seqs in place of tail recursion, and to the degree that this works for you you can say that they help in avoiding SOs. In fact, rather importantly, functions producing lazy sequences should not be tail recursive; the conservation of stack space with lazy seq producers arises from the aforementioned stack -> heap transfer and any attempts to write them in a tail recursive fashion will only break things.

The key insight is that a lazy sequence is an object which, when first created, doesn't hold any items (as a strict sequence always does); when a function returns a lazy sequence, only this "lazy sequence object" is returned to the caller, before any forcing takes place. Thus the stack frame used up by the call which returned the lazy sequence is popped before any forcing takes place. Let's have a look at an example producer function:

(defn foo-producer [] ; not tail recursive...
  (lazy-seq
    (cons :foo        ; because it returns the value of the cons call...
          (foo-producer)))) ; which wraps a non-tail self-call

This works because lazy-seq returns immediately, thus (cons :foo (foo-producer)) also returns immediately and the stack frame used up by the outer call to foo-producer is immediately popped. The inner call to foo-producer is hidden in the rest part of the sequence, which is a thunk; if/when that thunk is forced, it will briefly use up its own frame on the stack, but then return immediately as described above, and so on.

Chunking (mentioned by dbyrne) changes this picture very slightly, because a larger number of elements gets produced at each step, but the principle remains the same: each step uses up some stack while the corresponding elements of the lazy seq are being produced, then that stack is reclaimed before more forcing takes place.

In what scenarios are lazy sequences inefficient?

In what scenarios are lazy sequences most efficient?

There's no point to being lazy if you need to hold the entire thing at once anyway. A lazy sequence makes a heap allocation at every step when not chunked or at every chunk -- once every 32 steps -- when chunked; avoiding that can net you a performance gain in some situations.

However, lazy sequences enable a pipelined mode of data processing:

(->> (lazy-seq-producer)               ; possibly (->> (range)
     (a-lazy-seq-transformer-function) ;               (filter even?)
     (another-transformer-function))   ;               (map inc))

Doing this in a strict way would allocate plenty of heap anyway, because you'd have to keep the intermediate results around to pass them to the next processing stage. Moreover, you'd need to keep the whole thing around, which is actually impossible in the case of (range) -- an infinite sequence! -- and when it is possible, it is usually inefficient.
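
For instance, spelling out the hints in the comments above, the pipeline runs fine over the infinite (range) precisely because each stage only produces what the next one demands:

(->> (range)         ; infinite source
     (filter even?)  ; 0 2 4 6 8 ...
     (map inc)       ; 1 3 5 7 9 ...
     (take 5))       ;=> (1 3 5 7 9)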

你的呼吸 2024-09-16 06:22:10

Originally, lazy sequences in Clojure were evaluated item-by-item as they were needed. Chunked sequences were added in Clojure 1.1 to improve performance. Instead of item-by-item evaluation, "chunks" of 32 elements are evaluated at a time. This reduces the overhead that lazy evaluation incurs. Also, it allows Clojure to take advantage of the underlying data structures. For example, PersistentVector is implemented as a tree of 32-element arrays. This means that to access an element, you must traverse the tree until the appropriate array is found. With chunked sequences, entire arrays are grabbed at a time. This means each of the 32 elements can be retrieved before the tree needs to be re-traversed.
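
The chunk size is easy to observe at the REPL; the print side effect below is purely illustrative:

;; Asking for one element of a chunked seq realizes its whole 32-element chunk:
(first (map #(do (print "*") %) (vec (range 100))))
;; prints 32 stars, not 1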

There has been discussion about providing a way to force item-by-item evaluation in situations where full laziness is required. However, I don't think it has been added to the language yet.
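
A commonly circulated workaround in the meantime is to rewrap a chunked seq one element at a time; unchunk below is a user-space helper, not part of clojure.core:

(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

(first (map #(do (print "*") %) (unchunk (range 100)))) ; prints a single star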

How come you can wrap recursive calls in a lazy sequence and no longer get a stack overflow for large computations?

Do you have an example of what you mean? If you have a recursive binding to a lazy-seq, it can definitely cause a stack overflow.
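
As a hypothetical illustration of the kind of thing that can still blow the stack: stacking many lazy transformations so that forcing the first element has to unwind all of them in one go (the exact depth that overflows depends on the JVM stack size):

;; Each pass wraps the seq in another lazy map layer; nothing is forced yet.
;; Forcing first must then descend through every layer at once.
(first
  (reduce (fn [s _] (map inc s))
          (range)
          (range 100000)))
;; => likely StackOverflowError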
