ConcurrentHashMap constructor parameters?
I am wondering about the parameters for constructing a ConcurrentHashMap:
- initialCapacity is 16 by default (understood).
- loadFactor is 0.75 by default.
- concurrencyLevel is 16 by default.
My questions are:
- What criteria should be used to adjust loadFactor up or down?
- How do we establish the number of concurrently updating threads?
- What criteria should be used to adjust concurrencyLevel up or down?
Additionally:
- What are the hallmarks of a good hashcode implementation? (If an SO question addresses this, just link to it.)
Thank you!
The short answer: set "initial capacity" to roughly how many mappings you expect to put in the map, and leave the other parameters at their default.
Long answer:
- load factor is the ratio between the number of "buckets" in the map and the number of expected elements; 0.75 is usually a reasonable compromise: as I recall, it means that with a good hash function, on average we expect about 1.6 redirects to find an element in the map (or around that figure); changing the load factor changes the compromise between more redirects to find an element and less wasted space, but 0.75 is really usually a good value;
- in principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have a bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago, in case you're interested)
Informally, your hash function should essentially aim to have as much "randomness" in the bits as possible. Or more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works and associated hash function guidelines. Feedback is obviously welcome on any of this stuff.
One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
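The short answer above translates directly into code; the capacity figure below is just an illustrative assumption:

```java
import java.util.concurrent.ConcurrentHashMap;

public class CtorDemo {
    public static void main(String[] args) {
        // Size the map for the expected number of mappings and leave
        // loadFactor/concurrencyLevel at their defaults (0.75 and 16).
        int expectedMappings = 10_000; // hypothetical figure

        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<>(expectedMappings);

        // The three-argument constructor exists if you do want to tune
        // all three parameters explicitly:
        ConcurrentHashMap<String, Integer> tuned =
                new ConcurrentHashMap<>(expectedMappings, 0.75f, 16);

        map.put("answers", 4);
        System.out.println(map.get("answers")); // prints 4
    }
}
```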
Load factor is primarily related to the quality of the hash function. The closer to zero the load factor, the less likely there are to be collisions even if the hash function isn't so great. The trade-off is that the memory footprint is larger. In other words, the HashMap isn't distributing the entries in separate buckets for each separate hashcode; it is grouping them by proximity, so the more buckets it has, the more spread out the distribution, and the less likely it is that there are collisions.
So the bottom line is you fiddle with load factor to improve lookup time or reduce memory, according to your needs and the objects you are storing in the Map.
ConcurrencyLevel really depends on your application. If you only have two or three threads running in the application, there you go. If you are an application server with an arbitrary number of threads, then you need to understand what your load capacity is and what point you want to optimize for.
A good quality hashcode implementation provides as wide a distribution across potential values of the object as possible with the least number of collisions, while honoring the contract. In other words, it allows the HashMap (or Set as the case may be) to distribute the objects into separate buckets making lookups faster.
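As a sketch of the point above, here is a hypothetical value class whose hashCode() honors the equals/hashCode contract (equal objects produce equal hashes) while spreading distinct values across buckets:

```java
public class HashCodeDemo {
    // Hypothetical value class; Point and its fields are illustrative.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        // Combine every field that participates in equals(); mixing
        // with a prime (31) keeps e.g. (1, 2) and (2, 1) from
        // colliding. java.util.Objects.hash(x, y) does the same thing.
        @Override public int hashCode() {
            return 31 * x + y;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).hashCode()); // prints 33
        System.out.println(new Point(2, 1).hashCode()); // prints 63
    }
}
```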
loadFactor: controls when the implementation decides to resize the hashtable. Too high a value will waste space; too low a value will result in expensive resize operations.
concurrencyLevel: tells the implementation to try to optimize for the given number of writing threads. According to the API docs, being off by up to a factor of 10 shouldn't have much effect on performance.
A good hashcode implementation will distribute the hash values uniformly over any interval. If the set of keys is known in advance it is possible to define a "perfect" hash function that creates a unique hash value for each key.
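The "perfect hash for a known key set" idea can be illustrated with enums, where each key already has a unique index; EnumMap exploits exactly this, indexing an array by ordinal() with zero collisions (the Status enum below is a made-up example):

```java
import java.util.EnumMap;

public class PerfectHashDemo {
    // When the key set is fixed and known in advance, each key can be
    // assigned a unique slot -- ordinal() is a perfect hash here.
    enum Status { NEW, ACTIVE, CLOSED }

    public static void main(String[] args) {
        EnumMap<Status, String> labels = new EnumMap<>(Status.class);
        labels.put(Status.NEW, "new");
        labels.put(Status.ACTIVE, "active");
        System.out.println(labels.get(Status.ACTIVE)); // prints "active"
    }
}
```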
You need some background in how hash maps work before you can understand how this works. The map is essentially a series of buckets. Each value in the map gets put into a bucket depending on what its hash code is. The loadFactor means that if the buckets are more than 75% full, the Map should be resized.
This is asking how many threads you expect to modify the Map concurrently (simultaneously).
For hash codes, see Joshua Bloch's Effective Java
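The bucket selection described above can be sketched as follows; this mirrors what HashMap does internally (capacity is kept a power of two, and Java 8's HashMap first mixes the high bits of the hash with `h ^ (h >>> 16)`), but is written here as standalone arithmetic:

```java
public class BucketDemo {
    public static void main(String[] args) {
        int capacity = 16;                    // default table size
        int h = "example".hashCode();
        int spread = h ^ (h >>> 16);          // bit-mixing step
        int bucket = spread & (capacity - 1); // index in [0, 15]
        System.out.println(bucket);

        // loadFactor 0.75 means the table grows once more than
        // 16 * 0.75 = 12 entries are present.
        System.out.println((int) (capacity * 0.75f)); // prints 12
    }
}
```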