巨大地图的不可变地图实现

发布于 2024-08-22 04:46:12 字数 477 浏览 8 评论 0原文

如果我有一个不可变地图,我可能希望(在很短的时间内 - 比如几秒钟)从中添加/删除数十万项目,标准 HashMap 是一个坏主意吗?假设我想在 10 秒内通过 Map 传递 1Gb 的数据,使得 Map 在任何瞬间的最大大小仅为 256Mb。

我的印象是地图保留了某种“历史”,但我将总是访问最后更新的表(即我不传递地图),因为它是私有成员变量一个仅从反应内部更新/访问的Actor

基本上我怀疑这个数据结构可能(部分)有问题问题我发现 JVM 在短时间内读取大量数据时会出现内存不足的情况。

如果我使用不同的地图实现会更好吗?如果是的话,它是什么?

If I have an immutable Map which I might expect (over a very short period of time - like a few seconds) to be adding/removing hundreds of thousands of items from, is the standard HashMap a bad idea? Let's say I want to pass 1Gb of data through the Map in <10 seconds in such a way that the maximum size of the Map at any once instant is only 256Mb.

I get the impression that the map keeps some kind of "history" but I will always be accessing the last-updated table (i.e. I do not pass the map around) because it is a private member variable of an Actor which is updated/accessed only from within reactions.

Basically I suspect that this data structure may be (partly) at fault for issues I am seeing around JVMs going out of memory when reading in large amounts of data in a short time.

Would I be better off with a different map implementation and, if so, what is it?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

帝王念 2024-08-29 04:46:12

哎哟。为什么必须使用不可变的映射?可怜的垃圾收集者!除了 (log n) 时间之外,不可变映射通常每次操作还需要 (log n) 个新对象,或者它们实际上只是将可变哈希映射和层变更集包装在顶部(这会减慢速度并增加对象创建的数量)。

不变性很棒,但在我看来,现在不是使用它的时候。如果我是你,我会坚持使用 scala.collection.mutable.HashMap。如果您需要并发访问,请包装 Java util.concurrent 。

您可能还希望增加 JVM 中年轻代的大小:-Xmn1G 或更多(假设您使用 -Xmx3G 运行)。另外,使用吞吐量(并行)垃圾收集器。

Ouch. Why do you have to use an immutable map? Poor garbage collector! Immutable maps generally require (log n) new objects per operation in addition to (log n) time, or they really just wrap mutable hash maps and layer changesets on top (which slows things down and can increase the number of object creations).

Immutability is great, but this does not seem to me like the time to use it. If I were you, I'd stick with scala.collection.mutable.HashMap. If you need concurrent access, wrap the Java util.concurrent one instead.

You also might want to increase the size of the young generation in the JVM: -Xmn1G or more (assuming you're running with -Xmx3G). Also, use the throughput (parallel) garbage collector.

守望孤独 2024-08-29 04:46:12

那太糟糕了。你说你总是想访问最后更新的表,这意味着你只需要一个临时数据结构,不需要支付持久数据结构的成本- 这就像交易时间和记忆来获得完全有争议的“风格点”。当你不需要时,你不会通过盲目地使用持久结构来建立你的业力。

此外,哈希表是一种特别难以持久化的结构。换句话说,“非常非常慢”(基本上,当读取数量大大超过写入数量时,它是可用的 - 而且您似乎谈论了很多写入)。

顺便说一句,考虑到映射是从单个参与者访问的(这就是我从描述中理解的),ConcurrentHashMap 在此设计中没有意义。

That would be awful. You say you always want to access the last-updated table, that means you only need an ephemeral data structure, there is no need to pay the cost for a persistent data structure - it's like trading time and memory to gain completely arguable "style points". You are not building your karma by using blindly persistent structures when they are not called for.

Also, a hashtable is a particularly difficult structure to make persistent. In other words, "very, very slow" (basically it is usable when reads greatly outnumber writes - and you seem to talk about many writes).

By the way, a ConcurrentHashMap wouldn't make sense in this design, given that the map is accessed from a single actor (that's what I understand from the description).

同尘 2024-08-29 04:46:12

Scala 所谓的(*)不可变 Map 在 Scala 2.7 之前已经超出了基本用法。不要相信我,只要查一下它的开放门票数量即可。解决方案只是“它将被 Scala 2.8 上的其他东西取代”(它确实做到了)。

因此,如果您想要 Scala 2.7.x 的不可变映射,我建议您在 Scala 以外的其他地方寻找它。或者直接使用 TreeHashMap 代替。

(*) Scala 的不可变 Map 并不是真正不可变的。它内部是一个可变的数据结构,需要大量的同步。

Scala's so-called(*) immutable Map is broken beyond basic usage up to Scala 2.7. Don't trust me, just look up the number of open tickets for it. And the solution is just "it will be replaced with something else on Scala 2.8" (which it did).

So, if you want an immutable map for Scala 2.7.x, I'd advise looking for it in something other than Scala. Or just use TreeHashMap instead.

(*) Scala's immutable Map isn't really immutable. It is a mutable data structure internally, which requires lot of synchronization.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文