在我的代码中,我有一个被大量使用的地图,在几秒钟内使用了数千次。最初我有一个 TreeMap,但是当测试 9,000 个条目时,我发现我的旧处理器融化了。这需要扩大规模。所以我转向了 HashMap,性能非常出色。
现在我正在改变我的设计并正在寻找 MultiMap。然而,我担心对 get()
端的性能影响,因为它必须迭代所述大映射以挑选出匹配的键,并且当多次调用甚至同步时,它似乎会慢一点。
是否有一个好的 MultiMap 能够以出色的性能处理如此大的值?在此应用程序中,性能至关重要,因为可能有许多大型独立映射处理非常大的工作负载,从而使“小”性能损失成为很大的问题。
如果它可以被提取出来单独工作而没有任何依赖,那就加分了。
In my code I have a map that is used heavily, several thousand times in a few seconds. Originally I had a TreeMap, but when testing with 9,000 entries I watched my old processor melt. And this needs to scale. So I moved to a HashMap and performance was excellent.
Now I am changing my design and am looking for a MultiMap. However I'm afraid of the performance impact on the get()
side, as it must iterate over said large map picking out the matching keys, and when called many many times even synchronized it seems like it would be slow.
Is there a good MultiMap that can handle such large values with great performance? Performance is critical in this application, as there could be many large separate maps handling a very large workload, making "small" performance losses very big issues.
Bonus points if it can be extracted to work alone without any dependencies.
发布评论
评论(5)
我有一个要求,我必须有一个
Map>
,其中 Map 上的插入是并发的,并且在相应的 Set 上也是并发的,但是一旦从 Map 中消耗了一个 Key ,它必须被删除,想想如果作为一个每两秒运行一次的作业,它会消耗来自特定键的整个Set
,但插入是完全并发的,以便在作业时大多数值被缓冲开始,这是我的实现:注意:我使用 Guava 的辅助类 Maps 来创建并发 Map,此外,该解决方案模拟 实践清单 5.19 中的 Java 并发 :
I had a requirement where I had to have a
Map<Comparable, Set<Comparable>>
where insertion on the Map be concurrent and also on the corresponding Set, but once a Key was consumed from the Map, it had to be deleted, think if as a Job running every two seconds which is consuming the wholeSet<Comparable>
from an specific Key but insertion be totally concurrent so that most values be buffered when the Job kicks in, here is my implementation:Note: I use Guava's helper class Maps to create the concurrent Maps, also, this solution emulates Java concurrency in Practice Listing 5.19:
选择很大程度上取决于您想做什么。数据结构有很多种,有些在特定领域比其他更好,反之亦然。
我可以向您推荐潜在的候选人。如果完全读取,ImmutableMultiMap 可能是一个不错的选择。
如果您需要并发读/写,那么我会实现我自己的多重映射,也许使用ConcurrentHashMap和ConcurrentSkipListSet(您需要小心,因为同步多重映射和使用非这种方式创建的多重映射之间的语义) -阻塞数据结构不同)。如果您使用 ConcurrentSkipListSet,则可以使用二分搜索,它比仅仅迭代更快。
如果您有很多行,您也可以从使用 ConcurrentHashMap 和同步列表开始。这可以显着减少争用,这可能足以解决您的性能问题,而且很简单。
The choice would largely depend on what you want to do. There are many data-structures and some are better than others in specific areas and vice versa.
I could recommend you potential candidates. If it is entirely read, ImmutableMultiMap might be a good fit.
If you need concurrent read/write, then I'd implement my own multimap, perhaps using ConcurrentHashMap and ConcurrentSkipListSet (you need to be careful because the semantics between a synchronized multimap and a multipmap created this way using non-blocking data structures differ). If you use ConcurrentSkipListSet, you can then use binary search and it's faster than just iterating.
If you have a lot of rows, you could also start by just using a ConcurrentHashMap and a synchronized list. That could significantly reduce the contention, which might be enough to resolve your performance problem and it's simple.
我一直在尽可能使用 Google Guava 作为 Apache Commons 的替代品...这是一个使用 Multimap 实现 HashMultiMap 的示例,请注意,映射的值是值的集合而不是单个引用。方法“contains()”用于获取 get(key) 的结果。
I've been using Google Guava as a replacement to Apache Commons whenever possible... Here's an example with its Multimap's implementation HashMultiMap, and notice that the values of the map is a collection of the values instead of a single reference. The method "contains()" is used for the result of get(key).
当您提到您“迭代所述大地图以挑选出匹配的键”时,这让我想知道您是否正在使用最好的数据结构。有没有办法可以避免这种迭代?
请注意,Guava 包含多个具有不同性能特征的多重映射实现。正如 Zwei 提到的,ImmutableMultimap 比可变多重映射具有更好的性能。如果您的代码检查多重映射是否包含特定值,则 SetMultimaps 会更快;否则 ArrayListMultimap 的性能会更好。
When you mention that you "iterate over said large map picking out the matching keys", that makes me wonder whether you're using the best data structure. Is there a way you could avoid that iteration?
Note that Guava includes multiple multimap implementations with different performance characteristics. As Zwei mentioned, an ImmutableMultimap has better performance than the mutable multimaps. The SetMultimaps are faster if your code checks whether the multimap contains a particular value; otherwise an ArrayListMultimap performs better.
在我的一个问题中向我推荐的是 Apache Commons MultiMap:
http://commons.apache。 org/collections/api-3.2.1/org/apache/commons/collections/MultiHashMap.html
它是免费软件,因此您至少可以获取源代码来查看它,并且根据您的许可证情况,您可以可以修改它或独立使用它。
它在内部使用 ArrayList,但我想你可以将其更改为使用 HashSet 或其他东西。我会查看 createCollection(Collection coll) 方法。
更新:实际上,Guava 的 HashMultiMap 似乎已经是我所说的:
https://github。 com/google/guava/blob/master/guava/src/com/google/common/collect/Multimap.java
我查看了源代码,似乎每个值的集合实际上都由 HashSet 支持。
The one that was recommended to me in one of my questions was the Apache Commons MultiMap:
http://commons.apache.org/collections/api-3.2.1/org/apache/commons/collections/MultiHashMap.html
It's free software, so you can at least get the source to look at it, and depending on your license situation, you can modify it or use it standalone.
It uses an ArrayList internally, but I imagine you can probably change it to use a HashSet or something. I would look at the
createCollection(Collection coll)
method.UPDATE: Actually, Guava's HashMultiMap appears to already be what I was talking about:
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Multimap.java
I looked at the source and it seems that each collection of values is in fact backed by a HashSet.