Java Collections: what happens when "size" exceeds "int"?
I know that Java collections are very memory-hungry, and I ran a test myself showing that 4GB is barely enough to store a few million Integers in a HashSet.
But what if I have "enough" memory? What would happen to Collection.size()?
EDIT: Solved: Collection.size() returns Integer.MAX_VALUE when the collection holds more elements than an int can represent.
New question: how can I determine the "real" number of elements in a collection then?
NOTE 1: Sorry, this is probably a let-me-google-it-for-you question, but I really didn't find anything ;)
NOTE 2: As far as I understand it, each integer entry of a set is: reference + cached_hashcode + boxed_integer_object + real_int_value, right?
NOTE 3: Funny: even with JDK7 and "compressed pointers", when the JVM uses 2GB of real memory, VisualVM shows only 1.5GB of allocated memory.
For those who care, here are the test sources:
    import java.lang.management.ManagementFactory;
    import java.util.HashSet;
    import java.util.Set;

    public final class _BoxedValuesInSetMemoryConsumption {
        private static final int MILLION = 1000 * 1000;

        public static void main(String... args) {
            Set<Integer> set = new HashSet<Integer>();
            // Add boxed ints until the heap runs out, reporting once per million entries.
            for (int i = 1;; ++i) {
                if ((i % MILLION) == 0) {
                    int milsOfEntries = i / MILLION;
                    // Used heap in MB (10^6 bytes), as reported by the JVM itself.
                    long mbytes = ManagementFactory.getMemoryMXBean()
                            .getHeapMemoryUsage().getUsed() / MILLION;
                    // MB per million entries is the same as bytes per entry.
                    int ratio = (int) mbytes / milsOfEntries;
                    System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
                            + " ratio of bytes per entry: " + ratio);
                }
                set.add(i);
            }
        }
    }
Tested with the x64 version of JDK7 build 105 under OpenSuse 11.3 x64.
Execution parameters:
    -XX:+UseCompressedOops -Xmx2048m
Output result:
1 mil, 56 MB used, ratio of bytes per entry: 56
2 mil, 113 MB used, ratio of bytes per entry: 56
3 mil, 161 MB used, ratio of bytes per entry: 53
4 mil, 225 MB used, ratio of bytes per entry: 56
5 mil, 274 MB used, ratio of bytes per entry: 54
6 mil, 322 MB used, ratio of bytes per entry: 53
7 mil, 403 MB used, ratio of bytes per entry: 57
8 mil, 452 MB used, ratio of bytes per entry: 56
9 mil, 499 MB used, ratio of bytes per entry: 55
10 mil, 548 MB used, ratio of bytes per entry: 54
11 mil, 596 MB used, ratio of bytes per entry: 54
12 mil, 644 MB used, ratio of bytes per entry: 53
13 mil, 827 MB used, ratio of bytes per entry: 63
14 mil, 874 MB used, ratio of bytes per entry: 62
15 mil, 855 MB used, ratio of bytes per entry: 57
16 mil, 902 MB used, ratio of bytes per entry: 56
17 mil, 951 MB used, ratio of bytes per entry: 55
18 mil, 999 MB used, ratio of bytes per entry: 55
19 mil, 1047 MB used, ratio of bytes per entry: 55
20 mil, 1096 MB used, ratio of bytes per entry: 54
21 mil, 1143 MB used, ratio of bytes per entry: 54
22 mil, 1191 MB used, ratio of bytes per entry: 54
23 mil, 1239 MB used, ratio of bytes per entry: 53
24 mil, 1288 MB used, ratio of bytes per entry: 53
25 mil, 1337 MB used, ratio of bytes per entry: 53
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
At the end, about 2 GiB of real memory was in use rather than the 1.3 GiB displayed, so the real consumption per entry is even larger than 53 bytes.
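
The breakdown in NOTE 2 can be turned into a rough back-of-the-envelope estimate. The sketch below is not a measurement. It assumes a 64-bit HotSpot JVM with compressed oops (12-byte object headers, 4-byte references, 8-byte alignment) and a HashSet backed by a HashMap whose entries hold key, value, and next references plus a cached hash. The class and method names are made up for illustration.

```java
public class EntrySizeEstimate {
    /** Rounds an object size up to the next multiple of 8 bytes (assumed alignment). */
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Estimated heap bytes per set entry for a given table load (entries / table slots). */
    static double bytesPerEntry(double load) {
        // Assumed layout constants: 64-bit HotSpot with compressed oops.
        final int header = 12, ref = 4, intField = 4;
        long entry = align8(header + 3 * ref + intField); // key, value, next refs + cached hash
        long boxed = align8(header + intField);           // the java.lang.Integer itself
        return entry + boxed + ref / load;                // plus a share of the bucket array
    }

    public static void main(String[] args) {
        System.out.println(bytesPerEntry(0.75));  // table almost full, just before a resize
        System.out.println(bytesPerEntry(0.375)); // table just doubled
    }
}
```

Under these assumptions the estimate lands at roughly 53 to 59 bytes per entry, which is in the same range as the ratios printed by the test.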
Comments (4)
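Java heap != system memory. Java's default heap size is only 128MB. Note that this is also different from the memory the JVM process itself uses.
Regarding your question, from the docs:
public int size()
Returns the number of elements in this collection. If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.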
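
To make that contract concrete without needing tens of gigabytes of heap, here is a small sketch. LongRange is a hypothetical, made-up collection that generates its elements on the fly, and realCount is an illustrative helper, not a JDK method. It shows size() clamping at Integer.MAX_VALUE and counting by iteration with a long.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class SaturatingSizeDemo {
    /** Hypothetical read-only collection of the values 0..count-1, generated lazily. */
    static final class LongRange extends AbstractCollection<Long> {
        private final long count;

        LongRange(long count) {
            this.count = count;
        }

        @Override
        public int size() {
            // Follow the Collection.size() contract: clamp instead of overflowing.
            return count > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) count;
        }

        @Override
        public Iterator<Long> iterator() {
            return new Iterator<Long>() {
                private long next = 0;

                @Override
                public boolean hasNext() {
                    return next < count;
                }

                @Override
                public Long next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return next++;
                }
            };
        }
    }

    /** Counts elements by iterating, using a long so the result is never clamped. */
    static long realCount(Iterable<?> elements) {
        long n = 0;
        for (Object ignored : elements) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(new LongRange(3_000_000_000L).size());  // prints 2147483647
        System.out.println(realCount(new LongRange(1_000_000L)));  // prints 1000000
    }
}
```

Counting this way is O(n), so on a collection that really holds more than 2^31-1 elements it will take a while.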
Your question seems to have quite different content than the title.
You already answered the question in the title (Integer.MAX_VALUE is returned). And no: there's no way to find out the "true" size with the normal APIs, save for iterating over the collection and counting (using a long, of course).
If you want to store a Set of int values and you know that the range and number of values can become very big, then a BitSet might actually be a better implementation. It gives you a constant-size data structure that can hold every value inside the range without changing size, while occupying a relatively small amount of memory (1 bit per possible value plus some overhead).
This method has two drawbacks, however:
- it cannot hold negative int values
- it does not implement the Set API
Both can easily be worked around by writing a wrapper that uses two BitSet objects (possibly lazily allocated) to hold the positive and negative value ranges respectively and implements adapter methods for the Set interface.
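
A minimal sketch of such a wrapper might look like the following. The class name TwoBitSetIntSet and its methods are made up for illustration. To keep it short it offers only add, contains, and a long-valued size, and it leaves out the adapter methods needed for the full Set interface (for example by extending AbstractSet and supplying an iterator).

```java
import java.util.BitSet;

/** Hypothetical int set backed by two lazily allocated BitSets. */
public final class TwoBitSetIntSet {
    private BitSet nonNegative; // bit v set  <=> value v (v >= 0) is present
    private BitSet negative;    // bit ~v set <=> value v (v < 0) is present

    /** Adds the value and returns true if it was not present before. */
    public boolean add(int v) {
        if (v >= 0) {
            if (nonNegative == null) nonNegative = new BitSet();
            boolean absent = !nonNegative.get(v);
            nonNegative.set(v);
            return absent;
        }
        if (negative == null) negative = new BitSet();
        int index = ~v; // maps -1 -> 0, -2 -> 1, ..., Integer.MIN_VALUE -> Integer.MAX_VALUE
        boolean absent = !negative.get(index);
        negative.set(index);
        return absent;
    }

    public boolean contains(int v) {
        BitSet bits = v >= 0 ? nonNegative : negative;
        return bits != null && bits.get(v >= 0 ? v : ~v);
    }

    /**
     * Element count as a long. BitSet.cardinality() itself returns an int,
     * so a half that had every one of its 2^31 bits set is not handled here.
     */
    public long size() {
        long n = 0;
        if (nonNegative != null) n += nonNegative.cardinality();
        if (negative != null) n += negative.cardinality();
        return n;
    }

    public static void main(String[] args) {
        TwoBitSetIntSet set = new TwoBitSetIntSet();
        set.add(7);
        set.add(-5);
        set.add(7); // duplicate, ignored
        System.out.println(set.size());       // prints 2
        System.out.println(set.contains(-5)); // prints true
    }
}
```

Mapping negative values with ~v keeps every index non-negative, which BitSet requires, and lets Integer.MIN_VALUE fit as well.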
From the source code:
The generic answer for any real processor architecture is that you just cannot. The reason is simple: there can't be more allocated objects (of at least 1 word in size) than there is addressable memory.
Of course, given the virtual nature of the JVM, there is a scenario where it can happen. An int will always be 32-bit signed, and you can implement and run the JVM on top of a 64-bit machine where more than 2GB of memory can be addressed.
In that case, the documentation tells us that Integer.MAX_VALUE would be returned... And that's a big problem, because any loop that uses an int variable and relies on i < col.size() to stop would end after 2**31-1 iterations without ever reaching the remaining elements, and a loop written with i <= col.size() would never stop at all, since the int counter wraps around (although I think that anything that loops 2**31-1 times would take long enough to make you want to kill the process anyway).
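
How an int loop counter behaves at that boundary can be checked without looping two billion times. The sketch below uses made-up helper names. It starts the counter just below Integer.MAX_VALUE and caps the number of iterations, so a loop that would otherwise never finish still returns.

```java
public class IntLoopBoundDemo {
    /** Iterations of `for (i = start; i < bound; i++)`, stopped early at cap. */
    static long iterationsStrict(int start, int bound, long cap) {
        long n = 0;
        for (int i = start; i < bound && n < cap; i++) n++;
        return n;
    }

    /** Iterations of `for (i = start; i <= bound; i++)`, stopped early at cap. */
    static long iterationsInclusive(int start, int bound, long cap) {
        long n = 0;
        // When bound is Integer.MAX_VALUE, i <= bound is always true:
        // i wraps around to Integer.MIN_VALUE instead of exceeding it.
        for (int i = start; i <= bound && n < cap; i++) n++;
        return n;
    }

    public static void main(String[] args) {
        int size = Integer.MAX_VALUE; // what size() reports for an oversized collection
        System.out.println(iterationsStrict(size - 3, size, 1000));    // prints 3
        System.out.println(iterationsInclusive(size - 3, size, 1000)); // prints 1000 (hit the cap)
    }
}
```

An Iterator, or a long counter as in the counting approach mentioned above, avoids both problems.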