保证键唯一时 HashMap 的性能

发布于 2024-11-19 18:25:48 字数 282 浏览 3 评论 0原文

如果我希望使用的密钥保证是唯一的(或者至少可以假设密钥是唯一的),那么使用“vanilla”ConcurrentHashMap 提供最佳性能,或者是否需要修改哈希函数或 put 方法以避免不必要的哈希?

另外,数字键相对于非数字键(例如具有适当哈希函数的字符串或 POJO)是否具有任何性能优势?

If the keys I wish to use are guaranteed to be unique (or at least the assumption can be made that the keys are unique), does using a 'vanilla' ConcurrentHashMap provide the best performance, or does a hashing function or put method need to be modified to avoid needless hashing?

Also, does a numeric key have any performance benefit over a non-numeric key (such as a String or POJO with a proper hashing function)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

初熏 2024-11-26 18:25:48

正如评论中已经提到的,如果您不需要线程安全方面,那么就不要使用ConcurrentHashMap。

如果您想要绝对最佳的性能,请考虑保留您的密钥并使用 IdentityHashMap。这避免了计算对象的哈希(并且,正如注释中提到的,不需要评估 equals),而是假设引用本身就是哈希。

显然,您必须确保同一键的两个实例是同一对象(例如,您必须确保引用相等,而不仅仅是对象相等)。保留所有密钥是实现这一目标的一种方法。

实现说明:这是一个简单的线性探测哈希表,如 Sedgewick 和 Knuth 的文本中所述。该数组交替保存键和值。 (对于大型表来说,这比使用单独的数组具有更好的局部性。)对于许多 JRE 实现和操作混合,此类将比 HashMap(使用链接而不是线性探测)产生更好的性能。

如果您知道所有密钥,也许您还可以考虑完美哈希?或者映射到一个简单的数组结构?

As already mentioned in the comments, if you don't need the thread-safe aspects then don't use ConcurrentHashMap.

If you want the absolute best performance consider interning your keys and using an IdentityHashMap. This avoids calculating the hash of the object (and, as mentioned in the comments, negates the need for equals to be evaluated) and instead assumes that the reference itself is the hash.

Note obviously that you've got to make sure that two instances of the same key are the same object (e.g. you have to ensure reference equality, not just object equality). Interning all your keys is one approach for achieving this.

Implementation note: This is a simple linear-probe hash table, as described for example in texts by Sedgewick and Knuth. The array alternates holding keys and values. (This has better locality for large tables than does using separate arrays.) For many JRE implementations and operation mixes, this class will yield better performance than HashMap (which uses chaining rather than linear-probing).

If you know all the keys, perhaps you could also consider perfect hashing? Or map to a simple array structure?

旧情别恋 2024-11-26 18:25:48

ConcurrentHashMap 是 HashMap 实现中最昂贵的,这是因为它是线程安全的。

所有地图都必须有唯一的键,因此这是给定的。

如果您使用支持 TLongHashMap 等基元的集合,则使用数字具有性能优势,但是使用自定义哈希映射可能会更快。

来自 http://vanillajava.blogspot.com/ 2011/07/low-gc-in-java-using-primitives.html

Test                                    Performance Memory used
Use Integer wrappers and HashMap        71 - 134 (ns)   53 MB/sec
Use int primitives and HashMap          45 - 76 (ns)    36 MB/sec
Use int primitives and FastMap          58 - 93 (ns)    28 MB/sec
Use int primitives and TIntIntHashMap   18 - 28 (ns)    nonimal
Use int primitives and simple hash map   6 - 9 (ns)     nonimal 

ConcurrentHashMap is the most expensive of the HashMap implementations, this is becuase it is thread safe.

All Maps must have unique keys so this is a given.

Using numbers has a performance advantage if you use a collection which supports primtives like TLongHashMap, however you may be able to go much faster using a custom hash map.

From http://vanillajava.blogspot.com/2011/07/low-gc-in-java-using-primitives.html

Test                                    Performance Memory used
Use Integer wrappers and HashMap        71 - 134 (ns)   53 MB/sec
Use int primitives and HashMap          45 - 76 (ns)    36 MB/sec
Use int primitives and FastMap          58 - 93 (ns)    28 MB/sec
Use int primitives and TIntIntHashMap   18 - 28 (ns)    nonimal
Use int primitives and simple hash map   6 - 9 (ns)     nonimal 
半边脸i 2024-11-26 18:25:48

如果我希望使用的键保证是唯一的(或者至少可以假设键是唯一的),那么使用“vanilla”ConcurrentHashMap 是否可以提供最佳性能,

您通常会使用 ConcurrentHashMap code> 如果 Map 是潜在的并发瓶颈。如果您的应用程序是单线程的或者没有争用,则 ConcurrentHashMap 比 HashMap 慢。

或者是否需要修改哈希函数或 put 方法以避免不必要的哈希?

哈希函数每次“探测”哈希表时都会被评估一次;例如,每个 getput 操作一次。您可以通过缓存结果来降低哈希函数的成本,但这会导致每个键对象额外占用 4 个字节的存储空间。缓存是否值得优化取决于:

  • 与应用程序的其余部分相比,哈希的相对成本是多少,以及
  • 实际使用缓存值的 hashCode() 调用比例。

这两个因素都是高度特定于应用的。

(顺便说一句,使用身份哈希码作为哈希值的长期成本也是额外的 4 个字节的存储空间。)

此外,与非数字键(例如具有适当哈希函数的字符串或 POJO)相比,数字键是否具有任何性能优势?

在数字情况下,哈希函数可能更便宜,但是否值得取决于使用数字键是否存在特定于应用程序的缺点。并且,如上所述,相对成本取决于应用的具体情况。例如,String.hashCode() 的成本与被散列的字符串的长度成正比。

If the keys I wish to use are guaranteed to be unique (or at least the assumption can be made that the keys are unique), does using a 'vanilla' ConcurrentHashMap provide the best performance,

You would typically use ConcurrentHashMap if the Map is a potential concurrency bottleneck. If your application is single threaded or if there is no contention, ConcurrentHashMap is slower than HashMap.

or does a hashing function or put method need to be modified to avoid needless hashing?

The hash function gets evaluated once per "probe" of the hash table; e.g. once per get or put operation. You can reduce the cost of the hash function by caching the result, but this costs you an extra 4 bytes of storage per key object. Whether caching is a worthwhile optimization depends on:

  • what the relative cost of hashing is compared with the rest of the application, and
  • the proportion of calls to hashCode() that will actually make use of the cached value.

Both of these factors are highly application specific.

(Incidentally, the long term cost of using the identity hashcode as the hash value is also an extra 4 bytes of storage.)

Also, does a numeric key have any performance benefit over a non-numeric key (such as a String or POJO with a proper hashing function)?

The hash function is likely to be cheaper in the numeric case, but whether it is worth it depends on whether there are application-specific downsides of using a numeric key. And, as above, the relative costs are application specifics. For instance, the cost of String.hashCode() is proportional to the length of the string being hashed.

萌酱 2024-11-26 18:25:48

Java 的 HashMap 最终由 Entry 数组支持,其中 K 的哈希码用于确定存储 Entry 的数组中的槽。

使用的数组的大小 (通常从 16 开始)远小于可能的哈希码数量(2^32 ~= 40 亿),因此即使哈希码是唯一的,该数组中也必然存在冲突。

只要您的 hashcode() 方法很快,用作 Key 的类型之间就没有区别。请记住,hashcode() 方法可能会被调用很多次,因此如果它很慢,您可以将其缓存在对象内部。

Java's HashMaps are eventually backed by an array of Entry<K,V> where the hashcode of K is used to determine the slot in the array that the Entry is stored in.

The size of the array used (typically starts at 16) is much smaller than the number of possible hashcodes (2^32 ~= 4 billion), so there are bound to be collisions in this array, even if the hashcodes are unique.

So long as your hashcode() method is fast, there is no difference between the types that are used as the Key. Remember that the hashcode() method may be called lots of times, so if it is slow you can cache it internally in the object.

在你怀里撒娇 2024-11-26 18:25:48

我有 ConcurrentHashMap 实例映射,可以通过多线程访问。参见下面的代码片段。这些怎么样?

Iterator<String> it = new TreeSet<String>(map.keySet()).iterator();
            while(it.hasNext())
            {
                id = it.next();
                synchronized(map)
                {
                    msg = map.get(id);
                    if(msg != null)
                        map.remove(id);
                }
                if(msg != null)
                listener.procMessage(msg);
            }

i have ConcurrentHashMap instance map which access by multithread.seeing below code snippet. how about these?

Iterator<String> it = new TreeSet<String>(map.keySet()).iterator();
            while(it.hasNext())
            {
                id = it.next();
                synchronized(map)
                {
                    msg = map.get(id);
                    if(msg != null)
                        map.remove(id);
                }
                if(msg != null)
                listener.procMessage(msg);
            }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文