HashSet、Vector、LinkedList 的最大大小
HashSet
、Vector
、LinkedList
的最大大小是多少?我知道 ArrayList 可以存储超过 3277000 个数字。
然而列表的大小取决于内存(堆)的大小。如果达到最大值,JDK 会抛出 OutOfMemoryError
。
但我不知道 HashSet
、Vector
和 LinkedList
中元素数量的限制。
What is the maximum size of HashSet
, Vector
, LinkedList
? I know that ArrayList
can store more than 3277000 numbers.
However the size of list depends on the memory (heap) size. If it reaches maximum the JDK throws an OutOfMemoryError
.
But I don't know the limit for the number of elements in HashSet
, Vector
and LinkedList
.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
这些结构没有指定最大尺寸。
实际的实际大小限制可能在
Integer.MAX_VALUE
区域内(即 2147483647,大约 20 亿个元素),因为这是 Java 中数组的最大大小。HashSet
在内部使用HashMap
,因此它具有与HashMap
相同的最大大小HashMap
使用的数组的大小始终为 2 的幂,因此它最多可以有 230 = 1073741824 个元素(从下一个开始) 2 的幂大于Integer.MAX_VALUE
)。HashMap
停止调整大小时,它仍然允许您添加元素,利用每个存储桶都是通过链接管理的事实列表。因此,HashMap
/HashSet
中元素的唯一限制是内存。Vector
在内部使用一个数组,其最大大小恰好为Integer.MAX_VALUE
,因此它不能支持超过那么多的元素LinkedList
大于Integer.MAX_VALUE
,它会错误地报告大小,因为它使用int
字段来存储大小,并且size()
的返回类型也是int
。请注意,虽然
Collection
API 确实定义了包含超过Integer.MAX_VALUE
元素的Collection
的行为方式。最重要的是,它指出了这个size()
文档:请注意,虽然
HashMap
、HashSet
和LinkedList
似乎 支持的不仅仅是Integer.MAX_VALUE
元素,没有以这种方式实现size()
方法(即它们只是让内部size
字段溢出)。这让我相信在这种情况下其他操作也没有明确定义。
因此,我认为使用这些通用集合最多包含
Integer.MAX_VLAUE
元素是安全的。如果您知道需要存储更多内容,那么您应该切换到实际支持此操作的专用集合实现。There is no specified maximum size of these structures.
The actual practical size limit is probably somewhere in the region of
Integer.MAX_VALUE
(i.e. 2147483647, roughly 2 billion elements), as that's the maximum size of an array in Java.HashSet
uses aHashMap
internally, so it has the same maximum size as thatHashMap
uses an array which always has a size that is a power of two, so it can be at most 230 = 1073741824 elements big (since the next power of two is bigger thanInteger.MAX_VALUE
).HashMap
stops resizing, then it will still allow you to add elements, exploiting the fact that each bucket is managed via a linked list. Therefore the only limit for elements in aHashMap
/HashSet
is memory.Vector
uses an array internally which has a maximum size of exactlyInteger.MAX_VALUE
, so it can't support more than that many elementsLinkedList
doesn't use an array as the underlying storage, so that doesn't limit the size. It uses a classical doubly linked list structure with no inherent limit, so its size is only bounded by the available memory. Note that aLinkedList
will report the size wrongly if it is bigger thanInteger.MAX_VALUE
, because it uses aint
field to store the size and the return type ofsize()
isint
as well.Note that while the
Collection
API does define how aCollection
with more thanInteger.MAX_VALUE
elements should behave. Most importantly it states this thesize()
documentation:Note that while
HashMap
,HashSet
andLinkedList
seem to support more thanInteger.MAX_VALUE
elements, none of those implement thesize()
method in this way (i.e. they simply let the internalsize
field overflow).This leads me to believe that other operations also aren't well-defined in this condition.
So I'd say it's safe to use those general-purpose collections with up to
Integer.MAX_VLAUE
elements. If you know that you'll need to store more than that, then you should switch to dedicated collection implementations that actually support this.在所有情况下,您可能会受到 JVM 堆大小的限制,而不是其他任何因素。最终你总是会开始使用数组,所以我非常怀疑它们中的任何一个都将管理超过 231 - 1 个元素,但你非常非常有可能在此之前耗尽堆反正。
In all cases, you're likely to be limited by the JVM heap size rather than anything else. Eventually you'll always get down to arrays so I very much doubt that any of them will manage more than 231 - 1 elements, but you're very, very likely to run out of heap before then anyway.
这在很大程度上取决于实施细节。
HashSet 使用数组作为底层存储,默认情况下,当集合已满 75% 时,它会尝试增长。这意味着如果您尝试添加超过大约 750,000,000 个条目,它将失败。 (它无法将数组从 2^30 项增加到 2^31 项)
增加加载因子会增加集合的最大大小。例如,负载因子 10 允许 100 亿个元素。 (值得注意的是,HashSet 在超过 1 亿个元素时效率相对较低,因为 32 位哈希码的分布开始看起来不那么随机,并且冲突次数增加)
Vector 将其容量加倍并从 10 开始。这意味着它将未能增长到约 13.4 亿以上。将初始大小更改为 2^n-1 可以为您提供更多的头部空间。
顺便说一句:如果可以的话,使用 ArrayList 而不是 Vector。
LinkedList 没有固有的限制,可以增长超过 21 亿。此时 size() 可以返回 Integer.MAX_VALUE,但是某些函数(例如 toArray)将失败,因为它无法将所有对象放入数组中, in 会为您提供第一个 Integer.MAX_VALUE 而不是抛出异常。
正如 @Joachim Sauer 所指出的,当前的 OpenJDK 可能会针对大于 Integer.MAX_VALUE 的大小返回错误的结果。例如,它可能是负数。
It very much depends on the implementation details.
A HashSet uses an array as an underlying store which by default it attempt to grow when the collection is 75% full. This means it will fail if you try to add more than about 750,000,000 entries. (It cannot grow the array from 2^30 to 2^31 entries)
Increasing the load factor increases the maximum size of the collection. e.g. a load factor of 10 allows 10 billion elements. (It is worth noting that HashSet is relatively inefficient past 100 million elements as the distribution of the 32-bit hashcode starts to look less random, and the number of collisions increases)
A Vector doubles its capacity and starts at 10. This means it will fail to grow above approx 1.34 billion. Changing the initial size to 2^n-1 gives you slightly more head room.
BTW: Use ArrayList rather than Vector if you can.
A LinkedList has no inherent limit and can grow beyond 2.1 billion. At this point size() could return Integer.MAX_VALUE, however some functions such as toArray will fail as it cannot put all objects into an array, in will instead give you the first Integer.MAX_VALUE rather than throw an exception.
As @Joachim Sauer notes, the current OpenJDK could return an incorrect result for sizes above Integer.MAX_VALUE. e.g. it could be a negative number.
最大大小取决于 JVM 的内存设置,当然还有可用的系统内存。每个列表条目的内存消耗的具体大小在平台之间也有所不同,因此最简单的方法可能是运行简单的测试。
The maximum size depends on the memory settings of the JVM and of course the available system memory. Specific size of memory consumption per list entry also differs between platforms, so the easiest way might be to run simple tests.
正如其他答案中所述,一个数组不能达到 2^31 个条目。其他数据类型要么受此限制,要么最终可能会误报其 size()。但是,在某些系统上无法达到这些理论限制:
在 32 位系统上,可用字节数永远不会超过 2^32。这是假设您没有占用内存的操作系统。 32 位指针为 4 个字节。任何不依赖数组的事物都必须在每个条目中至少包含一个指针:这意味着对于不使用数组的事物,最大条目数为 2^32/4 或 2^30。
普通数组可以达到理论极限,但只有字节数组,长度为 2^31-1 的短数组将占用大约 2^32+38 个字节。
一些 java VM 引入了使用压缩指针的新内存模型。通过调整指针对齐方式,可以用 32 字节指针引用略多于 2^32 的字节。大约是四倍。这足以导致 LinkedList size() 变为负数,但不足以让它回绕到零。
64 位系统有 64 位指针,使所有指针变大一倍,使非数组列表变得更胖。这也意味着支持的最大容量准确地跃升至 2^64 字节。这足以让二维阵列达到其理论最大值。 byte[0x7fffffff][0x7fffffff] 使用的内存大约等于 40+40*(2^31-1)+(2^31-1)(2^31-1)=40+40( 2^31-1)+(2^62-2^32+1)
As stated in other answers, an array cannot reach 2^31 entries. Other data types are limited either by this or they will likely misreport their size() eventually. However, these theoretical limits cannot be reached on some systems:
On a 32 bit system, the number of bytes available never exceeds 2^32 exactly. And that is assuming that you have no operating system taking up memory. A 32 bit pointer is 4 bytes. Anything which does not rely on arrays must include at least one pointer per entry: this means that the maximum number of entries is 2^32/4 or 2^30 for things that do not utilize arrays.
A plain array can achieve it's theoretical limit, but only a byte array, a short array of length 2^31-1 would use up about 2^32+38 bytes.
Some java VMs have introduced a new memory model that uses compressed pointers. By adjusting pointer alignment, slightly more than 2^32 bytes may be referenced with 32 byte pointers. Around four times more. This is enough to cause a LinkedList size() to become negative, but not enough to allow it to wrap around to zero.
A sixty four bit system has sixty four bit pointers, making all pointers twice as big, making non array lists a bunch fatter. This also means that the maximum capacity supported jumps to 2^64 bytes exactly. This is enough for a 2D array to reach its theoretical maximum. byte[0x7fffffff][0x7fffffff] uses memory apporximately equal to 40+40*(2^31-1)+(2^31-1)(2^31-1)=40+40(2^31-1)+(2^62-2^32+1)