ConcurrentHashMap constructor parameters?

Posted on 2024-08-07 09:43:03


I am wondering about the parameters for constructing a ConcurrentHashMap:

  • initialCapacity is 16 by default (understood).
  • loadFactor is 0.75 by default.
  • concurrencyLevel is 16 by default.

My questions are:

  • What criteria should be used to adjust loadFactor up or down?
  • How do we establish the number of concurrently updating threads?
  • What criteria should be used to adjust concurrencyLevel up or down?

Additionally:

  • What are the hallmarks of a good hashcode implementation? (If an SO question addresses this, just link to it.)

Thank you!


Comments (4)

与君绝 2024-08-14 09:43:03


The short answer: set "initial capacity" to roughly how many mappings you expect to put in the map, and leave the other parameters at their defaults (a construction sketch follows the list below).

Long answer:

  • load factor is the ratio between the number of "buckets" in the map and the number of expected elements;

  • 0.75 is usually a reasonable compromise: as I recall, it means that with a good hash function, on average we expect about 1.6 redirects to find an element in the map (or around that figure);

  • changing the load factor changes the compromise between more redirects to find an element and less wasted space, but 0.75 really is usually a good value;

  • in principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have a bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago, in case you're interested).
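
As a minimal sketch of that advice (the expected mapping count and writer-thread count below are invented figures, not recommendations):

    import java.util.concurrent.ConcurrentHashMap;

    public class MapSizingExample {
        public static void main(String[] args) {
            // Hypothetical figure: we expect roughly 10,000 mappings.
            int expectedMappings = 10_000;

            // Per the advice above: size initialCapacity to the expected
            // number of mappings and leave loadFactor and concurrencyLevel
            // at their defaults (0.75f and 16).
            ConcurrentHashMap<String, Integer> byName =
                    new ConcurrentHashMap<>(expectedMappings);

            // The three-argument constructor is there if you do want to
            // tune the other parameters explicitly (here, 8 writer threads).
            ConcurrentHashMap<String, Integer> tuned =
                    new ConcurrentHashMap<>(expectedMappings, 0.75f, 8);

            byName.put("answer", 42);
            tuned.put("answer", 42);
            System.out.println(byName.get("answer")); // prints 42
        }
    }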

Informally, your hash function should essentially aim to have as much "randomness" in the bits as possible. Or more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works and associated hash function guidelines. Feedback is obviously welcome on any of this stuff.

One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
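
For instance, a minimal sketch of that string fallback (the Point class here is invented for illustration):

    // Crude but serviceable: concatenate representative fields into a
    // String and reuse String.hashCode(), which already mixes its input
    // bits reasonably well.
    final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            // Delegate the bit-mixing to String's hash function.
            return (x + "," + y).hashCode();
        }
    }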

独享拥抱 2024-08-14 09:43:03

Load Factor is primarily related to the quality of the hash function. The closer the load factor is to zero, the less likely collisions are, even if the hash function isn't so great. The trade-off is that the memory footprint is larger. In other words, the HashMap isn't distributing the entries in separate buckets for each separate hashcode; it is grouping them by proximity, so the more buckets it has, the more spread out the distribution and the less likely collisions are.

So the bottom line is that you fiddle with the load factor to improve lookup time or reduce memory, according to your needs and the objects you are storing in the Map.
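
As a small worked example of that trade-off (capacityFor is a hypothetical helper, not a JDK method):

    public class LoadFactorMath {
        // To hold `expected` entries without triggering a resize, a hash
        // table needs at least expected / loadFactor buckets (the JDK maps
        // then round this up to the next power of two internally).
        static int capacityFor(int expected, float loadFactor) {
            return (int) Math.ceil(expected / (double) loadFactor);
        }

        public static void main(String[] args) {
            System.out.println(capacityFor(10_000, 0.75f)); // 13334 -- the default trade-off
            System.out.println(capacityFor(10_000, 0.50f)); // 20000 -- more buckets: fewer collisions, more memory
            System.out.println(capacityFor(10_000, 0.90f)); // 11112 -- fewer buckets: denser table, more collisions
        }
    }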

ConcurrencyLevel really depends on your application. If you only have two or three threads running in the application, there you go. If you are writing an application server with an arbitrary number of threads, then you need to understand what your load capacity is and what point you want to optimize for.

A good quality hashcode implementation provides as wide a distribution across potential values of the object as possible with the least number of collisions, while honoring the contract. In other words, it allows the HashMap (or Set as the case may be) to distribute the objects into separate buckets making lookups faster.

娇柔作态 2024-08-14 09:43:03

loadFactor: controls when the implementation decides to resize the hashtable. Too low a value wastes space and triggers resize operations sooner; too high a value makes collisions, and therefore lookups, more expensive.

concurrencyLevel: tells the implementation to try to optimize for the given number of writing threads. According to the API docs, being off by up to a factor of 10 shouldn't have much effect on performance.

The allowed concurrency among update operations is guided by the optional concurrencyLevel constructor argument (default 16), which is used as a hint for internal sizing. The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention. But overestimates and underestimates within an order of magnitude do not usually have much noticeable impact.

A good hashcode implementation will distribute the hash values uniformly over any interval. If the set of keys is known in advance it is possible to define a "perfect" hash function that creates a unique hash value for each key.
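
For illustration, a minimal sketch of a "perfect" hash over a key set known in advance (the keys and the lookup table are invented for this example):

    public class PerfectHashSketch {
        // Each of the three known keys maps to a distinct slot 0..2,
        // so a table of exactly three entries has no collisions at all.
        static int perfectHash(String key) {
            switch (key) {
                case "GET":    return 0;
                case "PUT":    return 1;
                case "DELETE": return 2;
                default: throw new IllegalArgumentException("unknown key: " + key);
            }
        }

        public static void main(String[] args) {
            String[] table = new String[3];
            table[perfectHash("GET")] = "handleGet";
            table[perfectHash("PUT")] = "handlePut";
            table[perfectHash("DELETE")] = "handleDelete";
            System.out.println(table[perfectHash("PUT")]); // prints handlePut
        }
    }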


loadFactor is set to 0.75 by default, what criteria should be used to adjust this up or down?

You need some background in how hash maps work before you can understand this. The map is essentially a series of buckets; each value in the map gets put into a bucket depending on what its hash code is. A loadFactor of 0.75 means that once the buckets are more than 75% full, the Map should be resized.
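
A tiny worked example of that rule, assuming the default parameters of a JDK hash map:

    public class ResizeThreshold {
        public static void main(String[] args) {
            // With the default loadFactor of 0.75, a table of 16 buckets
            // is resized (doubled) once it holds more than
            // 16 * 0.75 = 12 entries.
            int buckets = 16;
            float loadFactor = 0.75f;
            int threshold = (int) (buckets * loadFactor);
            System.out.println("resize after " + threshold + " entries"); // resize after 12 entries
        }
    }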

concurrencyLevel is set to 16 by default, how do we establish the number of concurrently updating threads? What criteria should be used to adjust this up or down?

This is asking how many threads you expect to modify the Map concurrently (simultaneously).

For hash codes, see Joshua Bloch's Effective Java.
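
As a sketch of the conventional recipe popularized by that book (the Person class is invented for illustration):

    import java.util.Objects;

    // Combine each significant field into the result with a small odd
    // multiplier (31), so every field influences the final bits.
    final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return age == p.age && Objects.equals(name, p.name);
        }

        @Override
        public int hashCode() {
            int result = Objects.hashCode(name); // null-safe
            result = 31 * result + age;          // 31 * r == (r << 5) - r
            return result;
        }
    }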
