Java Collections: What happens when "size" exceeds "int"?

Posted on 2024-09-15 18:32:13

I know that Java collections are very memory-hungry, and I ran a test myself showing that 4GB is barely enough to store a few million Integers in a HashSet.

But what if I have "enough" memory? What would happen to Collection.size()?

EDIT: Solved: Collection.size() returns Integer.MAX_VALUE when the integer range is exceeded.
New question: how to determine the "real" count of elements of a collection then?

NOTE 1: Sorry, this is probably a let-me-google-it-for-you question, but I really didn't find anything ;)

NOTE 2: As far as I understand it, each integer entry of a set is:
reference + cached_hashcode + boxed_integer_object + real_int_value, right?
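
As a rough cross-check of that breakdown, here is a back-of-the-envelope sketch. The layout numbers are assumptions, not something measured in this test: a 64-bit HotSpot JVM with compressed oops, 12-byte object headers, 4-byte references, 8-byte alignment, and a HashMap-style entry node holding key, value, next and the cached hash. The class and method names are made up for illustration.

```java
public final class EntrySizeEstimate {
  // Assumed sizes for 64-bit HotSpot with -XX:+UseCompressedOops (not measured here).
  static final int HEADER = 12;
  static final int REFERENCE = 4;
  static final int INT = 4;

  // Objects are assumed to be padded to a multiple of 8 bytes.
  public static long align8(long bytes) {
    return (bytes + 7) / 8 * 8;
  }

  // Entry node: header + key, value, next references + cached hash.
  public static long entryNodeBytes() {
    return align8(HEADER + 3 * REFERENCE + INT);
  }

  // Boxed Integer: header + the real int value.
  public static long boxedIntegerBytes() {
    return align8(HEADER + INT);
  }

  // Each entry also pays for its share of the bucket array (one reference per slot).
  public static double bytesPerEntry(double tableSlotsPerEntry) {
    return entryNodeBytes() + boxedIntegerBytes() + REFERENCE * tableSlotsPerEntry;
  }

  public static void main(String[] args) {
    // With load factor 0.75 the table has between 1/0.75 slots per entry
    // (just before a resize) and 2/0.75 slots per entry (just after one).
    System.out.println("table nearly full:  " + bytesPerEntry(1 / 0.75));
    System.out.println("table just resized: " + bytesPerEntry(2 / 0.75));
  }
}
```

Under these assumptions the estimate lands in the same range as the 53 to 57 bytes per entry printed by the test, which suggests the breakdown in NOTE 2 is roughly right, with the bucket array as the one extra cost.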

NOTE 3: Funny, even with JDK7 and "compressed pointers", when the JVM uses 2GB of real memory, it shows only 1.5GB allocated memory in VisualVM.

For those who care:

Test sources:

import java.util.*;
import java.lang.management.*;

public final class _BoxedValuesInSetMemoryConsumption {
  private final static int MILLION = 1000 * 1000;

  public static void main(String... args) {
    Set<Integer> set = new HashSet<Integer>();

    for (int i = 1;; ++i) {
      if ((i % MILLION) == 0) {
        int milsOfEntries = (i / MILLION);
        long mbytes = ManagementFactory.getMemoryMXBean().
            getHeapMemoryUsage().getUsed() / MILLION;
        int ratio = (int) mbytes / milsOfEntries;
        System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
            + " ratio of bytes per entry: " + ratio);
      }

      set.add(i);
    }
  }
}

Execution parameters:

Tested with x64 version of JDK7 build 105 under OpenSuse 11.3 x64.

-XX:+UseCompressedOops -Xmx2048m

Output result:

1 mil, 56 MB used,  ratio of bytes per entry: 56
2 mil, 113 MB used,  ratio of bytes per entry: 56
3 mil, 161 MB used,  ratio of bytes per entry: 53
4 mil, 225 MB used,  ratio of bytes per entry: 56
5 mil, 274 MB used,  ratio of bytes per entry: 54
6 mil, 322 MB used,  ratio of bytes per entry: 53
7 mil, 403 MB used,  ratio of bytes per entry: 57
8 mil, 452 MB used,  ratio of bytes per entry: 56
9 mil, 499 MB used,  ratio of bytes per entry: 55
10 mil, 548 MB used,  ratio of bytes per entry: 54
11 mil, 596 MB used,  ratio of bytes per entry: 54
12 mil, 644 MB used,  ratio of bytes per entry: 53
13 mil, 827 MB used,  ratio of bytes per entry: 63
14 mil, 874 MB used,  ratio of bytes per entry: 62
15 mil, 855 MB used,  ratio of bytes per entry: 57
16 mil, 902 MB used,  ratio of bytes per entry: 56
17 mil, 951 MB used,  ratio of bytes per entry: 55
18 mil, 999 MB used,  ratio of bytes per entry: 55
19 mil, 1047 MB used,  ratio of bytes per entry: 55
20 mil, 1096 MB used,  ratio of bytes per entry: 54
21 mil, 1143 MB used,  ratio of bytes per entry: 54
22 mil, 1191 MB used,  ratio of bytes per entry: 54
23 mil, 1239 MB used,  ratio of bytes per entry: 53
24 mil, 1288 MB used,  ratio of bytes per entry: 53
25 mil, 1337 MB used,  ratio of bytes per entry: 53
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

At the end, about 2 GiB of real memory was in use rather than the 1.3 GiB displayed, so the consumption per entry is even larger than 53 bytes.

Comments (4)

梦醒时光 2024-09-22 18:32:13

I know that Java collections are very
memory-hungry, and did a test myself,
proving that 4GB is barely enough to
store few millions of Integers into a
HashSet.

Java Heap != System Memory. Java's default heap size is only 128MB. Note this is also different from the memory the JVM uses.
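
To see the difference from inside the JVM, something like the following sketch can help. It uses the same MemoryMXBean as the test in the question and prints heap and non-heap usage side by side. The class name and the describe helper are invented for this example, and even the non-heap figure does not cover everything the OS counts for the process (thread stacks, GC bookkeeping and so on).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public final class HeapVersusProcessMemory {
  // Formats one MemoryUsage snapshot in MiB.
  public static String describe(String label, MemoryUsage usage) {
    long mib = 1024L * 1024L;
    // getMax() returns -1 when no maximum is defined.
    String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / mib) + " MiB";
    return label + ": used " + (usage.getUsed() / mib) + " MiB, committed "
        + (usage.getCommitted() / mib) + " MiB, max " + max;
  }

  public static void main(String[] args) {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    // "used" is live plus not yet collected objects; "committed" is what the JVM
    // has actually reserved from the OS for this area.
    System.out.println(describe("heap", memory.getHeapMemoryUsage()));
    System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
  }
}
```

Comparing "used" with "committed" is one plausible explanation for a tool showing less memory than the process really holds.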

Regarding your question: from the docs,

public int size()

Returns the number of elements in this
collection. If this collection
contains more than Integer.MAX_VALUE
elements, returns Integer.MAX_VALUE.
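
To make that contract concrete without needing hundreds of gigabytes of heap, here is a small sketch of a virtual collection that can "contain" more than Integer.MAX_VALUE elements. LongRange and longSize() are hypothetical names for this illustration only; nothing like them exists in the JDK.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only collection of the longs in [start, endExclusive).
public final class LongRange extends AbstractCollection<Long> {
  private final long start;
  private final long length;

  public LongRange(long start, long endExclusive) {
    if (endExclusive < start) {
      throw new IllegalArgumentException("endExclusive < start");
    }
    this.start = start;
    // subtractExact (Java 8+) throws instead of silently overflowing.
    this.length = Math.subtractExact(endExclusive, start);
  }

  // The real element count, which may not fit in an int.
  public long longSize() {
    return length;
  }

  // Saturates at Integer.MAX_VALUE, as the Collection.size() documentation requires.
  @Override
  public int size() {
    return (int) Math.min(length, Integer.MAX_VALUE);
  }

  @Override
  public Iterator<Long> iterator() {
    return new Iterator<Long>() {
      private long offset = 0;

      @Override
      public boolean hasNext() {
        return offset < length;
      }

      @Override
      public Long next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return start + offset++;
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  public static void main(String[] args) {
    LongRange big = new LongRange(0, 5_000_000_000L);
    System.out.println(big.size() == Integer.MAX_VALUE);
    System.out.println(big.longSize());
  }
}
```

The point of the sketch is only that size() can be smaller than the number of elements the iterator will actually produce.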

镜花水月 2024-09-22 18:32:13

Your question seems to have a quite different content than the title.

You already answered the question in the title (Integer.MAX_VALUE is returned). And no: there's no way to find out the "true" size with the normal APIs, other than iterating over the collection and counting (using a long, of course).
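
A minimal sketch of that counting approach, in case it helps. ElementCounter and countElements are names made up here; it costs a full pass over the collection, so it is only worth doing when size() has actually returned Integer.MAX_VALUE.

```java
import java.util.Arrays;
import java.util.List;

public final class ElementCounter {
  // Counts elements with a long so the result cannot saturate at Integer.MAX_VALUE.
  public static long countElements(Iterable<?> elements) {
    long count = 0;
    for (Object ignored : elements) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList("a", "b", "c");
    System.out.println(countElements(sample)); // prints 3
  }
}
```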

If you want to store a Set of int values and you know that the range and amount of values can become very big, then a BitSet might actually be a better implementation:

import java.util.*;
import java.lang.management.*;

public final class IntegersInBitSetMemoryConsumption {
  private final static int MILLION = 1000 * 1000;

  public static void main(String... args) {
    BitSet set = new BitSet(Integer.MAX_VALUE);

    // Stop before i overflows: BitSet indices must be non-negative.
    for (int i = 1; i > 0; ++i) {
      if ((i % MILLION) == 0) {
        int milsOfEntries = (i / MILLION);
        long mbytes = ManagementFactory.getMemoryMXBean().
            getHeapMemoryUsage().getUsed() / MILLION;
        // Cast first so the division is not truncated to a whole number.
        double ratio = (double) mbytes / milsOfEntries;
        System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
            + " ratio of bytes per entry: " + ratio);
      }

      set.set(i);
    }
  }
}

This produces a constant-size data structure that can hold every value in the range without growing, and it occupies a relatively small amount of memory (1 bit per possible value plus some overhead, which is about 256 MiB for the full non-negative int range).

This method has two drawbacks, however:

  • it doesn't support negative int values
  • it doesn't provide the Set API

Both can easily be worked around by writing a wrapper that uses two BitSet objects (possibly lazily allocated) to hold the positive and negative value range respectively and implements adapter methods for the Set interface.
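
A rough sketch of such a wrapper, under a few assumptions of my own: the class name IntBitSet and the longSize() method are invented, negative values are mapped to the index ~v in the second BitSet, and removal through the iterator is left unsupported to keep it short.

```java
import java.util.AbstractSet;
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper, not from the answer: a Set<Integer> backed by two BitSets.
public final class IntBitSet extends AbstractSet<Integer> {
  private final BitSet nonNegative = new BitSet(); // bit v stands for value v >= 0
  private BitSet negative;                         // bit ~v stands for value v < 0, allocated lazily

  @Override
  public boolean add(Integer value) {
    int v = value;
    if (v < 0 && negative == null) {
      negative = new BitSet();
    }
    BitSet bits = (v >= 0) ? nonNegative : negative;
    int index = (v >= 0) ? v : ~v; // ~v maps -1..Integer.MIN_VALUE to 0..Integer.MAX_VALUE
    boolean isNew = !bits.get(index);
    bits.set(index);
    return isNew;
  }

  @Override
  public boolean contains(Object o) {
    if (!(o instanceof Integer)) {
      return false;
    }
    int v = (Integer) o;
    return (v >= 0) ? nonNegative.get(v) : (negative != null && negative.get(~v));
  }

  // Element count as a long. cardinality() returns an int, so a completely full
  // half (2^31 values) would overflow it; this sketch ignores that edge case.
  public long longSize() {
    long count = nonNegative.cardinality();
    return (negative == null) ? count : count + negative.cardinality();
  }

  @Override
  public int size() {
    return (int) Math.min(longSize(), Integer.MAX_VALUE);
  }

  @Override
  public Iterator<Integer> iterator() {
    return new Iterator<Integer>() {
      private boolean onNegative = false; // which BitSet is being walked
      private int index = nonNegative.nextSetBit(0);

      @Override
      public boolean hasNext() {
        if (index < 0 && !onNegative) { // first half exhausted, switch to the second
          onNegative = true;
          index = (negative == null) ? -1 : negative.nextSetBit(0);
        }
        return index >= 0;
      }

      @Override
      public Integer next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        int value = onNegative ? ~index : index;
        BitSet bits = onNegative ? negative : nonNegative;
        // Guard against index + 1 overflowing at the very last bit.
        index = (index == Integer.MAX_VALUE) ? -1 : bits.nextSetBit(index + 1);
        return value;
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  public static void main(String[] args) {
    IntBitSet set = new IntBitSet();
    set.add(7);
    set.add(-3);
    set.add(0);
    System.out.println(set.contains(-3) + " " + set.contains(4) + " " + set.longSize());
  }
}
```

Extending AbstractSet means only add, contains, size and iterator need writing; the remaining Set methods come for free, although remove(Object) would need its own override to work.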

时光与爱终年不遇 2024-09-22 18:32:13

From the source code:

 /**
 * Returns the number of elements in this collection.  If this collection
 * contains more than <tt>Integer.MAX_VALUE</tt> elements, returns
 * <tt>Integer.MAX_VALUE</tt>.
 * 
 * @return the number of elements in this collection
 */
int size();

毅然前行 2024-09-22 18:32:13

The generic answer for any real processor architecture is that you just cannot. The reason is simple: there can't be more allocated objects (of at least 1 word size) than addressable memory.

Of course, given the virtual nature of the JVM, there's a scenario where that can happen.
int will always be 32-bit signed, and you can implement and run the JVM on top of a 64-bit machine where more than 2 GB of memory can be addressed.

In that case, the documentation tells us that Integer.MAX_VALUE would be returned... And that's a big problem, because any loop that uses an int variable and relies on i < col.size() to stop would end after Integer.MAX_VALUE iterations and never reach the remaining elements (although I think that anything that loops 2**31-1 times would take long enough to make you want to kill the process anyway).
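
The loop condition can be checked without actually running two billion iterations. This is just a sketch of the int arithmetic involved, with made-up names: with size() pinned at Integer.MAX_VALUE, a counter compared with < does reach the bound, whereas a comparison with <= can never become false.

```java
public final class LoopBoundCheck {
  // The usual loop test: for (int i = 0; i < size; i++)
  public static boolean continuesStrict(int i, int size) {
    return i < size;
  }

  // The off-by-one variant: for (int i = 0; i <= size; i++)
  public static boolean continuesInclusive(int i, int size) {
    return i <= size;
  }

  public static void main(String[] args) {
    int size = Integer.MAX_VALUE; // what size() reports for an oversized collection

    // The last index visited is size - 1; after i++ the strict test fails.
    System.out.println(continuesStrict(size - 1, size));
    System.out.println(continuesStrict(size, size));

    // Every int is <= Integer.MAX_VALUE, and i++ wraps around to a negative
    // number, so the inclusive test never stops the loop.
    System.out.println(continuesInclusive(size, size));
    System.out.println(size + 1 == Integer.MIN_VALUE);
  }
}
```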
