Does laziness make sense for Clojure hash-maps?

Posted 2025-01-08 19:56:37

I need to return a sequence, a number and a hash-map from my function (all wrapped in a vector) so that the printed return value looks like this:

[ ([:c :a] [:e :c] [:f :e] [:d :e] [:g :f] [:b :a])  15
  {:g :c, :f :a, :c :e, :d :a, :b :a, :c :a} ]

Since my inputs could be large, I'd like to return lazy sequences/objects from my function.
The sequence of pairs (the first object in my return vector) was easy enough to make lazy by wrapping 'lazy-seq' around the conj calls that build it up.

The hash-map (the 3rd object in my return vector, and potentially very large like my sequence) is being built up with assoc calls in the same loop-recur block as the sequence. The hash-map is additional info that some of my callers will use. If the pairs-sequence is returned lazily, though, I'm wondering whether it makes sense to send back a potentially huge hash-map alongside an (efficient) lazy-seq, even if I make the map an optional return value. The entries in the hash-map are related to the pairs in the lazy sequence.

So here is my newbie question: Is there any sense in sending back a lazy sequence of MapEntry objects in place of a large HashMap? That is, assume a user would take a chunk of the lazy-seq of MapEntry objects and convert it to a hash-map to do a lookup; if the lookup fails, they'd take the next chunk, and so on. Is this a sensible way to lazily use associative data?
Are there some idiomatic ways to return/manage large associative data in Clojure?
I would appreciate any ideas as to what my options are. Thanks in advance for your help.

愿得七秒忆 2025-01-15 19:56:37

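No, it's not possible to give them a lazy map. A lazy sequence of MapEntries is possible, but not very useful. There are a number of other options, though, that are similar and may make sense.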

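You say the callers may not need the map at all: so return a delay of the map, which they can force if they need it.
If the keys are cheap to compute but the values are expensive, you could return a full map with the correct keys and with each value wrapped in a delay; callers can then force only the values they need.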
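The two delay-based options above can be sketched roughly as follows. This is only an illustration under assumptions: the names `pairs->delayed-map`, `slow-square`, and `delayed-values` are hypothetical, and `slow-square` merely stands in for whatever expensive computation produces the values.

```clojure
;; Option 1: wrap the whole map in a delay.
;; Nothing is built until a caller derefs it.
(defn pairs->delayed-map [pairs]
  (delay (into {} pairs)))

(def m (pairs->delayed-map [[:c :a] [:e :c] [:f :e]]))

(realized? m)  ; false: the map has not been built yet
(get @m :e)    ; deref forces the delay, then looks up :e

;; Option 2: eager keys, one delay per value.
;; slow-square is a hypothetical stand-in for an expensive computation.
(defn slow-square [n]
  (Thread/sleep 5)
  (* n n))

(defn delayed-values [ks]
  (into {} (map (fn [k] [k (delay (slow-square k))])) ks))

(def dm (delayed-values [1 2 3]))

@(dm 3)  ; forces only the value for key 3; keys 1 and 2 stay unrealized
```

With option 1 the caller pays for the whole map or for nothing; with option 2 the map itself (and all of its keys) exists up front, and only the per-key values are deferred.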

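You can still return a lazy sequence of vectors (I wouldn't bother making them MapEntries), but it's basically impossible for the caller to treat that as a lazy map. Either they only want to look up a fixed set of known keys (in which case they just lazily filter the entries and never make it a map), or they want to look up entries arbitrarily. In the latter case they have to keep all the entries in memory after looking up the first one, so that they can still look up the second, and so they may as well dump the whole thing into a fully realized map.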
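For the "fixed set of known keys" case, a minimal sketch might look like this. The names `entries-for` and `squares` are hypothetical, and `squares` is just an assumed stand-in for a large (here infinite) lazy sequence of `[k v]` vectors.

```clojure
;; Lazily keep only the entries whose key is in the set `wanted`.
;; No map is ever built; entries are examined one at a time.
(defn entries-for [wanted entries]
  (filter (fn [[k _]] (contains? wanted k)) entries))

;; Hypothetical source of entries: an infinite lazy seq of [n n*n] pairs.
(defn squares []
  (map (fn [i] [i (* i i)]) (range)))

;; Realizes the source only as far as the matching entry.
(first (entries-for #{1000} (squares)))

;; Take exactly as many matches as there are wanted keys.
(take 2 (entries-for #{3 5} (squares)))
```

Note that asking for more matches than exist never terminates on an infinite source, and on a finite one it walks the whole sequence, which is one reason arbitrary lookups push callers toward a fully realized map.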

白色秋天 2025-01-15 19:56:37

No, Clojure does not have lazy maps.

Also, if you are building up a sequence using loop/recur, I don't believe that trying to make it lazy accomplishes anything (unless generating each element is slow).

Look at these two functions:

(defn bad-lazy-range [begin end]
  (loop [i (dec end) lst nil]
    (if (>= i begin)
      (recur (dec i) (lazy-seq (cons i lst)))
      lst)))

(defn good-lazy-range [begin end]
  (if (>= begin end)
    nil
    (lazy-seq (cons begin (good-lazy-range (inc begin) end)))))

bad-lazy-range will recur (end - begin) times, generating a thunk (a lazy sequence link) each time, and then return the outermost thunk. This thunk needs to keep a reference to the next thunk, which needs a reference to the third thunk, etc. You do all the looping immediately and generate a pseudo-linked list of thunks, which takes up more space than a normal list would.

good-lazy-range, however, returns immediately without recursing more -- the recursive call is hidden inside the thunk and won't be evaluated until necessary. This also prevents a stack overflow: without the lazy-seq call, the recursion could overflow the stack on a large range, but with it, each step evaluates one call to good-lazy-range and returns. The caller can then evaluate the next call, but at this point, the stack frame from the first call is long gone.
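One way to see the difference is to count how many steps each function runs before it returns. The sketch below uses instrumented copies of the two functions; the `steps` atom and the `steps-to-build` helper are additions for illustration only and are not part of the functions above.

```clojure
;; Counts loop iterations / function calls that actually run.
(def steps (atom 0))

(defn bad-lazy-range [begin end]
  (loop [i (dec end) lst nil]
    (if (>= i begin)
      (do (swap! steps inc)            ; one step per loop iteration
          (recur (dec i) (lazy-seq (cons i lst))))
      lst)))

(defn good-lazy-range [begin end]
  (swap! steps inc)                    ; one step per call
  (if (>= begin end)
    nil
    (lazy-seq (cons begin (good-lazy-range (inc begin) end)))))

;; Builds the range 0..999 with f and reports how many steps ran
;; before f returned (nothing is realized by the caller).
(defn steps-to-build [f]
  (reset! steps 0)
  (f 0 1000)
  @steps)

(steps-to-build bad-lazy-range)   ; => 1000, the whole loop ran up front
(steps-to-build good-lazy-range)  ; => 1, only the first call ran

;; The lazy version can describe a range far too large to build eagerly.
(take 3 (good-lazy-range 0 1000000000000))  ; => (0 1 2)
```

Both functions produce the same elements; they differ only in when the traversal happens.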

In general, only use lazy-seq if you can wrap it around a significant amount of computation. In the first function, it is only wrapped around a call to cons, which would return quickly anyway. In the second function, however, it is wrapped around a call to cons and the recursive call, which means that it is delaying a worthwhile amount of computation.

If your code uses laziness correctly and uses loop/recur, please post it -- I would be interested to see how you did it.