保证键唯一时 HashMap 的性能
如果我希望使用的密钥保证是唯一的(或者至少可以假设密钥是唯一的),那么使用“vanilla”ConcurrentHashMap 提供最佳性能,或者是否需要修改哈希函数或 put 方法以避免不必要的哈希?
另外,数字键相对于非数字键(例如具有适当哈希函数的字符串或 POJO)是否具有任何性能优势?
If the keys I wish to use are guaranteed to be unique (or at least the assumption can be made that the keys are unique), does using a 'vanilla' ConcurrentHashMap provide the best performance, or does a hashing function or put method need to be modified to avoid needless hashing?
Also, does a numeric key have any performance benefit over a non-numeric key (such as a String or POJO with a proper hashing function)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
正如评论中已经提到的,如果您不需要线程安全方面,那么就不要使用ConcurrentHashMap。
如果您想要绝对最佳的性能,请考虑保留您的密钥并使用 IdentityHashMap。这避免了计算对象的哈希(并且,正如注释中提到的,不需要评估
equals
),而是假设引用本身就是哈希。显然,您必须确保同一键的两个实例是同一对象(例如,您必须确保引用相等,而不仅仅是对象相等)。保留所有密钥是实现这一目标的一种方法。
如果您知道所有密钥,也许您还可以考虑完美哈希?或者映射到一个简单的数组结构?
As already mentioned in the comments, if you don't need the thread-safe aspects then don't use
ConcurrentHashMap
.If you want the absolute best performance consider interning your keys and using an IdentityHashMap. This avoids calculating the hash of the object (and, as mentioned in the comments, negates the need for
equals
to be evaluated) and instead assumes that the reference itself is the hash.Note obviously that you've got to make sure that two instances of the same key are the same object (e.g. you have to ensure reference equality, not just object equality). Interning all your keys is one approach for achieving this.
If you know all the keys, perhaps you could also consider perfect hashing? Or map to a simple array structure?
ConcurrentHashMap 是 HashMap 实现中最昂贵的,这是因为它是线程安全的。
所有地图都必须有唯一的键,因此这是给定的。
如果您使用支持 TLongHashMap 等基元的集合,则使用数字具有性能优势,但是使用自定义哈希映射可能会更快。
来自 http://vanillajava.blogspot.com/ 2011/07/low-gc-in-java-using-primitives.html
ConcurrentHashMap is the most expensive of the HashMap implementations, this is becuase it is thread safe.
All Maps must have unique keys so this is a given.
Using numbers has a performance advantage if you use a collection which supports primtives like TLongHashMap, however you may be able to go much faster using a custom hash map.
From http://vanillajava.blogspot.com/2011/07/low-gc-in-java-using-primitives.html
您通常会使用 ConcurrentHashMap code> 如果
Map
是潜在的并发瓶颈。如果您的应用程序是单线程的或者没有争用,则 ConcurrentHashMap 比 HashMap 慢。哈希函数每次“探测”哈希表时都会被评估一次;例如,每个
get
或put
操作一次。您可以通过缓存结果来降低哈希函数的成本,但这会导致每个键对象额外占用 4 个字节的存储空间。缓存是否值得优化取决于:hashCode()
调用比例。这两个因素都是高度特定于应用的。
(顺便说一句,使用身份哈希码作为哈希值的长期成本也是额外的 4 个字节的存储空间。)
在数字情况下,哈希函数可能更便宜,但是否值得取决于使用数字键是否存在特定于应用程序的缺点。并且,如上所述,相对成本取决于应用的具体情况。例如,
String.hashCode()
的成本与被散列的字符串的长度成正比。You would typically use
ConcurrentHashMap
if theMap
is a potential concurrency bottleneck. If your application is single threaded or if there is no contention,ConcurrentHashMap
is slower thanHashMap
.The hash function gets evaluated once per "probe" of the hash table; e.g. once per
get
orput
operation. You can reduce the cost of the hash function by caching the result, but this costs you an extra 4 bytes of storage per key object. Whether caching is a worthwhile optimization depends on:hashCode()
that will actually make use of the cached value.Both of these factors are highly application specific.
(Incidentally, the long term cost of using the identity hashcode as the hash value is also an extra 4 bytes of storage.)
The hash function is likely to be cheaper in the numeric case, but whether it is worth it depends on whether there are application-specific downsides of using a numeric key. And, as above, the relative costs are application specifics. For instance, the cost of
String.hashCode()
is proportional to the length of the string being hashed.Java 的 HashMap 最终由
Entry
数组支持,其中 K 的哈希码用于确定存储 Entry 的数组中的槽。使用的数组的大小 (通常从 16 开始)远小于可能的哈希码数量(2^32 ~= 40 亿),因此即使哈希码是唯一的,该数组中也必然存在冲突。
只要您的 hashcode() 方法很快,用作 Key 的类型之间就没有区别。请记住,hashcode() 方法可能会被调用很多次,因此如果它很慢,您可以将其缓存在对象内部。
Java's HashMaps are eventually backed by an array of
Entry<K,V>
where the hashcode of K is used to determine the slot in the array that the Entry is stored in.The size of the array used (typically starts at 16) is much smaller than the number of possible hashcodes (2^32 ~= 4 billion), so there are bound to be collisions in this array, even if the hashcodes are unique.
So long as your hashcode() method is fast, there is no difference between the types that are used as the Key. Remember that the hashcode() method may be called lots of times, so if it is slow you can cache it internally in the object.
我有 ConcurrentHashMap 实例映射,可以通过多线程访问。参见下面的代码片段。这些怎么样?
i have ConcurrentHashMap instance map which access by multithread.seeing below code snippet. how about these?