Scala: read and save all elements of an Iterable

Posted 2024-11-25 07:09:47


I have an Iterable[T] that is really a stream of unknown length, and want to read it all and save it into something that is still an instance of Iterable. I really do have to read it and save it; I can't do it in a lazy way. The original Iterable can have a few thousand elements, at least. What's the most efficient/best/canonical way? Should I use an ArrayBuffer, a List, a Vector?

Suppose xs is my Iterable. I can think of doing these possibilities:

xs.toArray.toIterable     // Ugh?
xs.toList                 // Fast?
xs.copyToBuffer(anArrayBuffer)
Vector(xs: _*)            // There's no toVector, sadly. Is this construct as efficient?

EDIT: I see by the questions I should be more specific. Here's a strawman example:

def f(xs: Iterable[SomeType]) {    // xs might be a stream, though I can't be sure
    val allOfXS = <xs all read in at once>
    g(allOfXS)
    h(allOfXS)    // Both g() and h() take an Iterable[SomeType]
}


司马昭之心 2024-12-02 07:09:47

This is easy. A few thousand elements is nothing, so it hardly matters unless it's a really tight loop. So the flippant answer is: use whatever you feel is most elegant.

But, okay, let's suppose that this is actually in some tight loop, and you can predict or have benchmarked your code enough to know that this is performance-limiting.

Your best performance for an immutable solution will likely be a Vector, used like so:

Vector() ++ xs

In my tests, this can copy a 10k-element iterable about 4k-5k times per second. List is about half the speed.
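As a rough sketch of how this slots into the question's strawman f: the g and h below are hypothetical stand-ins, and SomeType is replaced by Int so that the example compiles on its own. On Scala 2.10 and later, xs.toVector should do the same job.

```scala
object EagerCopy {
  // Hypothetical stand-ins for the question's g() and h();
  // each one traverses its argument independently.
  def g(xs: Iterable[Int]): Int = xs.sum
  def h(xs: Iterable[Int]): Int = xs.size

  def f(xs: Iterable[Int]): (Int, Int) = {
    // Read the source once into a strict, immutable Vector.
    val allOfXs: Vector[Int] = Vector() ++ xs
    // Both calls now work on the saved copy rather than on the original source.
    (g(allOfXs), h(allOfXs))
  }
}
```

Because allOfXs is a Vector, it is still an Iterable, so g and h need no changes.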

If you're willing to try a mutable solution under the hood, xs.toArray.toIterable usually takes the cake with about 10k copies per second. ArrayBuffer is about the same speed as List.
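A minimal sketch of the array-backed variant for a generic element type, assuming a ClassTag is available at the call site (toArray needs one to create the array). The name readAll is made up for illustration.

```scala
import scala.reflect.ClassTag

object ArrayCopy {
  // Hypothetical helper: copies xs into a fresh array, then exposes it as an Iterable.
  def readAll[T: ClassTag](xs: Iterable[T]): Iterable[T] = {
    val arr: Array[T] = xs.toArray // the single pass over the source happens here
    arr.toIterable                 // wraps the array; no second copy of the elements
  }
}
```

The array itself is mutable, but it never escapes readAll, so callers only ever see it through the Iterable wrapper.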

If you actually know the size of the target (i.e. size is O(1) or you know it from somewhere else), you can shave another 20-30% off the execution time by allocating just the right size and writing a while loop.
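The pre-sized variant might look roughly like this. It is a sketch under the assumption that the caller really does know the exact element count; the parameter n and the name readAll are hypothetical, and the require check is only there to catch a wrong count.

```scala
import scala.reflect.ClassTag

object SizedCopy {
  // n is assumed to be the exact number of elements in xs (supplied by the caller).
  def readAll[T: ClassTag](xs: Iterable[T], n: Int): Iterable[T] = {
    val out = new Array[T](n) // allocate exactly once, at the right size
    val it = xs.iterator
    var i = 0
    while (i < n && it.hasNext) {
      out(i) = it.next()
      i += 1
    }
    // Fail loudly if the promised size was wrong in either direction.
    require(i == n && !it.hasNext, "n did not match the number of elements in xs")
    out.toIterable
  }
}
```

Whether the saving is worth the extra code depends on how hot the loop is; the 20-30% figure is the answer's own measurement, not something this sketch demonstrates.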

If it's actually primitives, you can gain a factor of 10 by writing your own specialized Iterable-like-thing that acts on arrays and converts to regular collections via the underlying array.
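One way to read "specialized Iterable-like-thing" is a thin wrapper around a primitive array. The class below is a hypothetical illustration for Int only (IntChunk, from, and asIterable are invented names), not a reconstruction of whatever the answerer benchmarked.

```scala
// Holds the elements as an unboxed Array[Int].
final class IntChunk private (private val data: Array[Int]) {
  def length: Int = data.length
  def apply(i: Int): Int = data(i)

  // Operations written directly against the array stay on primitives.
  def sum: Long = {
    var total = 0L
    var i = 0
    while (i < data.length) {
      total += data(i)
      i += 1
    }
    total
  }

  // Bridge back to the regular collections via the underlying array.
  def asIterable: Iterable[Int] = data.toIterable
}

object IntChunk {
  // Reads the source once into a primitive array.
  def from(xs: Iterable[Int]): IntChunk = new IntChunk(xs.toArray)
}
```

Code that only needs an Iterable[Int] can call asIterable, while hot paths can use apply, length, and sum without going through generic collection methods.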

Bottom line: for a great blend of power, speed, and flexibility, use Vector() ++ xs in most situations. xs.toIndexedSeq defaults to the same thing, with the benefit that if xs is already a Vector it takes no time at all (and it chains nicely without parentheses), and the drawback that you are relying on a convention rather than a specification of behavior (and it takes 1-3 more characters to type).
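For comparison, the toIndexedSeq form as a small sketch (readAll is again a made-up name). The return type is deliberately declared as IndexedSeq rather than Vector, since, as noted above, getting a Vector back is a convention rather than a guarantee.

```scala
object IndexedCopy {
  // Returns an indexed sequence; the concrete class is left to the library.
  def readAll[T](xs: Iterable[T]): IndexedSeq[T] = xs.toIndexedSeq
}
```

If g() and h() only require an Iterable, nothing downstream depends on which concrete class comes back.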
