具有良好性能的多重映射

发布于 2024-09-14 03:40:36 字数 361 浏览 10 评论 0 原文

在我的代码中,我有一个被大量使用的地图,在几秒钟内使用了数千次。最初我有一个 TreeMap,但是当测试 9,000 个条目时,我发现我的旧处理器融化了。这需要扩大规模。所以我转向了 HashMap,性能非常出色。

现在我正在改变我的设计并正在寻找 MultiMap。然而,我担心对 get() 端的性能影响,因为它必须迭代所述大映射以挑选出匹配的键,并且当多次调用甚至同步时,它似乎会慢一点。

是否有一个好的 MultiMap 能够以出色的性能处理如此大的值?在此应用程序中,性能至关重要,因为可能有许多大型独立映射处理非常大的工作负载,从而使“小”性能损失成为很大的问题。

如果它可以被提取出来单独工作而没有任何依赖,那就加分了。

In my code I have a map that is used heavily, several thousand times in a few seconds. Originally I had a TreeMap, but when testing with 9,000 entries I watched my old processor melt. And this needs to scale. So I moved to a HashMap and performance was excellent.

Now I am changing my design and am looking for a MultiMap. However I'm afraid of the performance impact on the get() side, as it must iterate over said large map picking out the matching keys, and when called many many times even synchronized it seems like it would be slow.

Is there a good MultiMap that can handle such large values with great performance? Performance is critical in this application, as there could be many large separate maps handling a very large workload, making "small" performance losses very big issues.

Bonus points if it can be extracted to work alone without any dependencies.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

幽梦紫曦~ 2024-09-21 03:40:37

我有一个要求,我必须有一个 Map> ,其中 Map 上的插入是并发的,并且在相应的 Set 上也是并发的,但是一旦从 Map 中消耗了一个 Key ,它必须被删除,想想如果作为一个每两秒运行一次的作业,它会消耗来自特定键的整个Set,但插入是完全并发的,以便在作业时大多数值被缓冲开始,这是我的实现:

注意:我使用 Guava 的辅助类 Maps 来创建并发 Map,此外,该解决方案模拟 实践清单 5.19 中的 Java 并发

import com.google.common.collect.MapMaker;

import java.util.concurrent.ConcurrentMap;

/**
 * Created by IntelliJ IDEA.
 * User: gmedina
 * Date: 18-Sep-2012
 * Time: 09:17:50
 */
public class LockMap<K extends Comparable>
{
  private final ConcurrentMap<K, Object> locks;

  public LockMap()
  {
    this(16, 64);
  }

  public LockMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 64);
  }

  public LockMap(final int concurrencyLevel, final int initialCapacity)
  {
    locks=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(initialCapacity).weakValues().makeMap();
  }

  public Object getLock(final K key)
  {
    final Object object=new Object();
    Object lock=locks.putIfAbsent(key, object);
    return lock == null ? object : lock;
  }

}


import com.google.common.collect.MapMaker;
import com.google.common.collect.Sets;

import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

/**
 * A general purpose Multimap implementation for delayed processing and concurrent insertion/deletes.
 *
 * @param <K> A comparable Key
 * @param <V> A comparable Value
 */
public class ConcurrentMultiMap<K extends Comparable, V extends Comparable>
{
  private final int initialCapacity;
  private final LockMap<K> locks;
  private final ConcurrentMap<K, Set<V>> cache;

  public ConcurrentMultiMap()
  {
    this(16, 64);
  }

  public ConcurrentMultiMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 64);
  }

  public ConcurrentMultiMap(final int concurrencyLevel, final int initialCapacity)
  {
    this.initialCapacity=initialCapacity;
    cache=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(initialCapacity).makeMap();
    locks=new LockMap<K>(concurrencyLevel, initialCapacity);
  }

  public void put(final K key, final V value)
  {
    synchronized(locks.getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(initialCapacity);
        cache.put(key, set);
      }
      set.add(value);
    }
  }

  public void putAll(final K key, final Collection<V> values)
  {
    synchronized(locks.getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(initialCapacity);
        cache.put(key, set);
      }
      set.addAll(values);
    }
  }

  public Set<V> remove(final K key)
  {
    synchronized(locks.getLock(key)){
      return cache.remove(key);
    }
  }

  public Set<K> getKeySet()
  {
    return cache.keySet();
  }

  public int size()
  {
    return cache.size();
  }

}

I had a requirement where I had to have a Map<Comparable, Set<Comparable>> where insertion on the Map be concurrent and also on the corresponding Set, but once a Key was consumed from the Map, it had to be deleted, think if as a Job running every two seconds which is consuming the whole Set<Comparable> from an specific Key but insertion be totally concurrent so that most values be buffered when the Job kicks in, here is my implementation:

Note: I use Guava's helper class Maps to create the concurrent Maps, also, this solution emulates Java concurrency in Practice Listing 5.19:

import com.google.common.collect.MapMaker;

import java.util.concurrent.ConcurrentMap;

/**
 * Created by IntelliJ IDEA.
 * User: gmedina
 * Date: 18-Sep-2012
 * Time: 09:17:50
 */
public class LockMap<K extends Comparable>
{
  private final ConcurrentMap<K, Object> locks;

  public LockMap()
  {
    this(16, 64);
  }

  public LockMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 64);
  }

  public LockMap(final int concurrencyLevel, final int initialCapacity)
  {
    locks=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(initialCapacity).weakValues().makeMap();
  }

  public Object getLock(final K key)
  {
    final Object object=new Object();
    Object lock=locks.putIfAbsent(key, object);
    return lock == null ? object : lock;
  }

}


import com.google.common.collect.MapMaker;
import com.google.common.collect.Sets;

import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

/**
 * A general purpose Multimap implementation for delayed processing and concurrent insertion/deletes.
 *
 * @param <K> A comparable Key
 * @param <V> A comparable Value
 */
public class ConcurrentMultiMap<K extends Comparable, V extends Comparable>
{
  private final int initialCapacity;
  private final LockMap<K> locks;
  private final ConcurrentMap<K, Set<V>> cache;

  public ConcurrentMultiMap()
  {
    this(16, 64);
  }

  public ConcurrentMultiMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 64);
  }

  public ConcurrentMultiMap(final int concurrencyLevel, final int initialCapacity)
  {
    this.initialCapacity=initialCapacity;
    cache=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(initialCapacity).makeMap();
    locks=new LockMap<K>(concurrencyLevel, initialCapacity);
  }

  public void put(final K key, final V value)
  {
    synchronized(locks.getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(initialCapacity);
        cache.put(key, set);
      }
      set.add(value);
    }
  }

  public void putAll(final K key, final Collection<V> values)
  {
    synchronized(locks.getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(initialCapacity);
        cache.put(key, set);
      }
      set.addAll(values);
    }
  }

  public Set<V> remove(final K key)
  {
    synchronized(locks.getLock(key)){
      return cache.remove(key);
    }
  }

  public Set<K> getKeySet()
  {
    return cache.keySet();
  }

  public int size()
  {
    return cache.size();
  }

}
所有深爱都是秘密 2024-09-21 03:40:37

选择很大程度上取决于您想做什么。数据结构有很多种,有些在特定领域比其他更好,反之亦然。

我可以向您推荐潜在的候选人。如果完全读取,ImmutableMultiMap 可能是一个不错的选择。

如果您需要并发读/写,那么我会实现我自己的多重映射,也许使用ConcurrentHashMap和ConcurrentSkipListSet(您需要小心,因为同步多重映射和使用非这种方式创建的多重映射之间的语义) -阻塞数据结构不同)。如果您使用 ConcurrentSkipListSet,则可以使用二分搜索,它比仅仅迭代更快。

如果您有很多行,您也可以从使用 ConcurrentHashMap 和同步列表开始。这可以显着减少争用,这可能足以解决您的性能问题,而且很简单。

The choice would largely depend on what you want to do. There are many data-structures and some are better than others in specific areas and vice versa.

I could recommend you potential candidates. If it is entirely read, ImmutableMultiMap might be a good fit.

If you need concurrent read/write, then I'd implement my own multimap, perhaps using ConcurrentHashMap and ConcurrentSkipListSet (you need to be careful because the semantics between a synchronized multimap and a multipmap created this way using non-blocking data structures differ). If you use ConcurrentSkipListSet, you can then use binary search and it's faster than just iterating.

If you have a lot of rows, you could also start by just using a ConcurrentHashMap and a synchronized list. That could significantly reduce the contention, which might be enough to resolve your performance problem and it's simple.

ゝ偶尔ゞ 2024-09-21 03:40:37

我一直在尽可能使用 Google Guava 作为 Apache Commons 的替代品...这是一个使用 Multimap 实现 HashMultiMap 的示例,请注意,映射的值是值的集合而不是单个引用。方法“contains()”用于获取 get(key) 的结果。

private Multimap<Phase, ResultingState> phaseResults = HashMultimap.create();

/**
 * @param withState is the state to be verified.
 * @param onPhase is the phase to be verified.
 * @return Whether the given result was reported in the given phase.
 */
public boolean wasReported(ResultingState withState, Phase onPhase) {
    return phaseResults.containsKey(onPhase) && phaseResults.get(onPhase).contains(withState);
}

/**
 * @param resultingState is the resulting state.
 * @return Whether the given resulting state has ever been reported.
 */
public boolean anyReported(ResultingState resultingState) {
    return phaseResults.values().contains(resultingState);
}

I've been using Google Guava as a replacement to Apache Commons whenever possible... Here's an example with its Multimap's implementation HashMultiMap, and notice that the values of the map is a collection of the values instead of a single reference. The method "contains()" is used for the result of get(key).

private Multimap<Phase, ResultingState> phaseResults = HashMultimap.create();

/**
 * @param withState is the state to be verified.
 * @param onPhase is the phase to be verified.
 * @return Whether the given result was reported in the given phase.
 */
public boolean wasReported(ResultingState withState, Phase onPhase) {
    return phaseResults.containsKey(onPhase) && phaseResults.get(onPhase).contains(withState);
}

/**
 * @param resultingState is the resulting state.
 * @return Whether the given resulting state has ever been reported.
 */
public boolean anyReported(ResultingState resultingState) {
    return phaseResults.values().contains(resultingState);
}
沉溺在你眼里的海 2024-09-21 03:40:37

当您提到您“迭代所述大地图以挑选出匹配的键”时,这让我想知道您是否正在使用最好的数据结构。有没有办法可以避免这种迭代?

请注意,Guava 包含多个具有不同性能特征的多重映射实现。正如 Zwei 提到的,ImmutableMultimap 比可变多重映射具有更好的性能。如果您的代码检查多重映射是否包含特定值,则 SetMultimaps 会更快;否则 ArrayListMultimap 的性能会更好。

When you mention that you "iterate over said large map picking out the matching keys", that makes me wonder whether you're using the best data structure. Is there a way you could avoid that iteration?

Note that Guava includes multiple multimap implementations with different performance characteristics. As Zwei mentioned, an ImmutableMultimap has better performance than the mutable multimaps. The SetMultimaps are faster if your code checks whether the multimap contains a particular value; otherwise an ArrayListMultimap performs better.

ら栖息 2024-09-21 03:40:36

在我的一个问题中向我推荐的是 Apache Commons MultiMap:
http://commons.apache。 org/collections/api-3.2.1/org/apache/commons/collections/MultiHashMap.html

它是免费软件,因此您至少可以获取源代码来查看它,并且根据您的许可证情况,您可以可以修改它或独立使用它。

它在内部使用 ArrayList,但我想你可以将其更改为使用 HashSet 或其他东西。我会查看 createCollection(Collection coll) 方法。

更新:实际上,Guava 的 HashMultiMap 似乎已经是我所说的:
https://github。 com/google/guava/blob/master/guava/src/com/google/common/collect/Multimap.java

我查看了源代码,似乎每个值的集合实际上都由 HashSet 支持。

The one that was recommended to me in one of my questions was the Apache Commons MultiMap:
http://commons.apache.org/collections/api-3.2.1/org/apache/commons/collections/MultiHashMap.html

It's free software, so you can at least get the source to look at it, and depending on your license situation, you can modify it or use it standalone.

It uses an ArrayList internally, but I imagine you can probably change it to use a HashSet or something. I would look at the createCollection(Collection coll) method.

UPDATE: Actually, Guava's HashMultiMap appears to already be what I was talking about:
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Multimap.java

I looked at the source and it seems that each collection of values is in fact backed by a HashSet.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文