Java Collections: What happens when "size" exceeds "int"?

Posted on 2024-09-15 18:32:13

I know that Java collections are very memory-hungry, and I did a test myself, proving that 4GB is barely enough to store a few million Integers in a HashSet.

But what if I have "enough" memory? What would happen to Collection.size()?

EDIT: Solved: Collection.size() returns Integer.MAX_VALUE when the number of elements exceeds the int range.
New question: how can I determine the "real" number of elements in a collection, then?

NOTE 1: Sorry, this is probably a let-me-google-it-for-you question, but I really didn't find anything ;)

NOTE 2: As far as I understand it, each integer entry of a set is:
reference + cached_hashcode + boxed_integer_object + real_int_value, right?

NOTE 3: Curiously, even with JDK7 and "compressed pointers", when the JVM uses 2GB of real memory, VisualVM shows only 1.5GB of allocated memory.

For those who care:

Test sources:

import java.util.*;
import java.lang.management.*;

public final class _BoxedValuesInSetMemoryConsumption {
  private final static int MILLION = 1000 * 1000;

  public static void main(String... args) {
    Set<Integer> set = new HashSet<Integer>();

    // Keep adding boxed Integers until the heap runs out (OutOfMemoryError).
    for (int i = 1;; ++i) {
      if ((i % MILLION) == 0) {
        int milsOfEntries = (i / MILLION);
        // Heap in use, in MB of 10^6 bytes; this includes garbage not yet collected.
        long mbytes = ManagementFactory.getMemoryMXBean().
            getHeapMemoryUsage().getUsed() / MILLION;
        // MB per million entries equals bytes per entry (integer division).
        int ratio = (int) mbytes / milsOfEntries;
        System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
            + " ratio of bytes per entry: " + ratio);
      }

      set.add(i);
    }
  }
}

Execution parameters:

Tested with x64 version of JDK7 build 105 under OpenSuse 11.3 x64.

-XX:+UseCompressedOops -Xmx2048m

Output:

1 mil, 56 MB used,  ratio of bytes per entry: 56
2 mil, 113 MB used,  ratio of bytes per entry: 56
3 mil, 161 MB used,  ratio of bytes per entry: 53
4 mil, 225 MB used,  ratio of bytes per entry: 56
5 mil, 274 MB used,  ratio of bytes per entry: 54
6 mil, 322 MB used,  ratio of bytes per entry: 53
7 mil, 403 MB used,  ratio of bytes per entry: 57
8 mil, 452 MB used,  ratio of bytes per entry: 56
9 mil, 499 MB used,  ratio of bytes per entry: 55
10 mil, 548 MB used,  ratio of bytes per entry: 54
11 mil, 596 MB used,  ratio of bytes per entry: 54
12 mil, 644 MB used,  ratio of bytes per entry: 53
13 mil, 827 MB used,  ratio of bytes per entry: 63
14 mil, 874 MB used,  ratio of bytes per entry: 62
15 mil, 855 MB used,  ratio of bytes per entry: 57
16 mil, 902 MB used,  ratio of bytes per entry: 56
17 mil, 951 MB used,  ratio of bytes per entry: 55
18 mil, 999 MB used,  ratio of bytes per entry: 55
19 mil, 1047 MB used,  ratio of bytes per entry: 55
20 mil, 1096 MB used,  ratio of bytes per entry: 54
21 mil, 1143 MB used,  ratio of bytes per entry: 54
22 mil, 1191 MB used,  ratio of bytes per entry: 54
23 mil, 1239 MB used,  ratio of bytes per entry: 53
24 mil, 1288 MB used,  ratio of bytes per entry: 53
25 mil, 1337 MB used,  ratio of bytes per entry: 53
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

In the end, about 2 GiB of real memory was used rather than the 1.3 GiB displayed, so each entry actually consumes even more than 53 bytes.
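NOTE 2 asks what one entry is made of. A back-of-the-envelope estimate that roughly reproduces the measured 53 to 57 bytes per entry is sketched below. It assumes 64-bit HotSpot with compressed oops (12-byte object headers, 4-byte references, 8-byte object alignment), the JDK 7 HashMap.Entry layout (key, value, next, hash) that backs HashSet, and the default load factor of 0.75. The single shared value object that HashSet stores for every key is not counted per entry.

```latex
\begin{aligned}
\text{Integer object} &= 12\ (\text{header}) + 4\ (\texttt{int}) = 16\ \text{B}\\
\text{HashMap.Entry} &= 12\ (\text{header}) + 3 \times 4\ (\texttt{key}, \texttt{value}, \texttt{next}) + 4\ (\texttt{hash}) = 28 \rightarrow 32\ \text{B}\\
\text{bucket-array share} &= 4\ \text{B} \times \frac{\text{capacity}}{\text{size}} \in \left[\frac{4}{0.75},\ \frac{4}{0.375}\right] \approx [5.3,\ 10.7]\ \text{B}\\
\text{total per entry} &\approx 16 + 32 + [5.3,\ 10.7] \approx [53,\ 59]\ \text{B}
\end{aligned}
```

Under the same assumptions, the jump between 12 and 13 million entries lines up with the bucket array doubling from 2^24 to 2^25 slots (threshold 0.75 × 2^24 ≈ 12.6 million), which is a plausible explanation rather than something the measurement proves.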

梦醒时光 2024-09-22 18:32:13

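"I know that Java collections are very memory-hungry, and did a test myself, proving that 4GB is barely enough to store few millions of Integers into a HashSet."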

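Java heap != system memory. Java's default maximum heap size is much smaller than physical memory (for example 128MB on some JVMs; the exact default depends on the JVM version and the machine). Note that the heap is also not the same as the total memory used by the JVM process.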
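To check which heap limit a given JVM actually runs with, java.lang.Runtime reports the heap figures directly. A minimal sketch (the class name HeapLimits is hypothetical); these numbers cover only the Java heap, not memory the JVM process uses outside it, such as thread stacks.

```java
public final class HeapLimits {
  private static final long MIB = 1024L * 1024L;

  /** The most heap the JVM will try to use (-Xmx or its default); Long.MAX_VALUE if unlimited. */
  public static long maxHeapBytes() {
    return Runtime.getRuntime().maxMemory();
  }

  public static void main(String... args) {
    Runtime rt = Runtime.getRuntime();
    System.out.println("max heap:       " + maxHeapBytes() / MIB + " MiB");
    // Heap currently reserved from the OS; it can grow up to the max above.
    System.out.println("committed heap: " + rt.totalMemory() / MIB + " MiB");
    // Committed minus free, including garbage not yet collected.
    System.out.println("used heap:      " + (rt.totalMemory() - rt.freeMemory()) / MIB + " MiB");
  }
}
```

Running it with and without -Xmx2048m shows how far the default limit is from the explicitly requested one.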

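As for your question, from the documentation:

public int size()

Returns the number of elements in this collection. If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.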

镜花水月 2024-09-22 18:32:13

The question in your post seems quite different from the one in your title.

You already answered the question in your title (Integer.MAX_VALUE is returned). And no: with the normal APIs there is no way to find out the "true" size, save for iterating over the collection and counting (using a long, of course).
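A minimal sketch of that counting approach. The names RealSize, countByIterating and realSize are hypothetical. realSize assumes the collection honors the Collection.size() contract, so any value below Integer.MAX_VALUE is already exact and only a possibly clamped value triggers the O(n) walk; it also assumes the iterator really visits every element.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public final class RealSize {
  /** Walks the elements with a long counter: O(n), but not capped at Integer.MAX_VALUE. */
  public static long countByIterating(Iterable<?> elements) {
    long count = 0L;
    for (Object ignored : elements) {
      count++;
    }
    return count;
  }

  /** Trusts size() below Integer.MAX_VALUE and counts only when the value may be clamped. */
  public static long realSize(Collection<?> c) {
    int reported = c.size();
    // A negative size would mean a broken implementation, so count in that case too.
    if (reported >= 0 && reported < Integer.MAX_VALUE) {
      return reported;
    }
    return countByIterating(c);
  }

  public static void main(String... args) {
    Set<Integer> set = new HashSet<Integer>(Arrays.asList(1, 2, 3, 3));
    System.out.println(realSize(set)); // prints 3: the duplicate 3 is stored once
  }
}
```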

If you want to store a Set of int values and you know that the range and amount of values can become very big, then a BitSet might actually be a better implementation:

import java.util.*;
import java.lang.management.*;

public final class IntegersInBitSetMemoryConsumption {
  private final static int MILLION = 1000 * 1000;

  public static void main(String... args) {
    // Pre-size the backing array (2^25 long words, 256 MiB) so it never has to grow.
    BitSet set = new BitSet(Integer.MAX_VALUE);

    // Stop when i overflows to Integer.MIN_VALUE; BitSet rejects negative indices.
    for (int i = 1; i > 0; ++i) {
      if ((i % MILLION) == 0) {
        int milsOfEntries = (i / MILLION);
        long mbytes = ManagementFactory.getMemoryMXBean().
            getHeapMemoryUsage().getUsed() / MILLION;
        // Cast before dividing so the fraction is not lost to integer division.
        double ratio = (double) mbytes / milsOfEntries;
        System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
            + " ratio of bytes per entry: " + ratio);
      }

      set.set(i);
    }
  }
}

This produces a fixed-size data structure that can hold every value in the range without ever resizing, while occupying a relatively small amount of memory: 1 bit per possible value plus some overhead, which for BitSet(Integer.MAX_VALUE) comes to about 256 MiB (2^31 bits).

This method has two drawbacks, however:

it doesn't support negative int values
it doesn't provide the Set API

Both can easily be worked around by writing a wrapper that uses two BitSet objects (possibly lazily allocated) to hold the positive and negative value range respectively and implements adapter methods for the Set interface.
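A minimal sketch of such a wrapper, under assumptions that are not in the answer: the class name IntBitSet and the method longSize() are hypothetical, negative values are stored at bit index ~value (so -1 maps to 0 and Integer.MIN_VALUE to Integer.MAX_VALUE), and instead of explicit lazy allocation it relies on new BitSet() starting small and growing on demand. Extending java.util.AbstractSet means only add, contains, remove, size and iterator have to be written.

```java
import java.util.AbstractSet;
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical Set<Integer> over two BitSets; null elements are rejected by unboxing. */
public final class IntBitSet extends AbstractSet<Integer> {
  // new BitSet() starts small and grows on demand, standing in for lazy allocation.
  private final BitSet nonNegatives = new BitSet();
  private final BitSet negatives = new BitSet();

  private BitSet bitsFor(int value) {
    return value >= 0 ? nonNegatives : negatives;
  }

  private static int indexFor(int value) {
    return value >= 0 ? value : ~value; // ~value == -value - 1, never negative here
  }

  @Override
  public boolean add(Integer value) {
    int v = value;
    BitSet bits = bitsFor(v);
    boolean wasAbsent = !bits.get(indexFor(v));
    bits.set(indexFor(v));
    return wasAbsent;
  }

  @Override
  public boolean contains(Object o) {
    if (!(o instanceof Integer)) return false;
    int v = (Integer) o;
    return bitsFor(v).get(indexFor(v));
  }

  @Override
  public boolean remove(Object o) {
    if (!contains(o)) return false;
    int v = (Integer) o;
    bitsFor(v).clear(indexFor(v));
    return true;
  }

  /** Count as a long; BitSet.cardinality() is an int, so a half with all 2^31 bits set would overflow. */
  public long longSize() {
    return (long) nonNegatives.cardinality() + negatives.cardinality();
  }

  /** Honors the Collection.size() contract by clamping to Integer.MAX_VALUE. */
  @Override
  public int size() {
    return (int) Math.min(longSize(), Integer.MAX_VALUE);
  }

  @Override
  public Iterator<Integer> iterator() {
    return new Iterator<Integer>() {
      private boolean inNegatives = true; // visit negative values first
      private int nextIndex = negatives.nextSetBit(0);
      private Integer lastReturned;

      @Override
      public boolean hasNext() {
        if (inNegatives && nextIndex < 0) {
          inNegatives = false;
          nextIndex = nonNegatives.nextSetBit(0);
        }
        return nextIndex >= 0;
      }

      @Override
      public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        int index = nextIndex;
        BitSet bits = inNegatives ? negatives : nonNegatives;
        // index + 1 would overflow past Integer.MAX_VALUE, so stop there explicitly.
        nextIndex = index == Integer.MAX_VALUE ? -1 : bits.nextSetBit(index + 1);
        lastReturned = inNegatives ? ~index : index;
        return lastReturned;
      }

      @Override
      public void remove() {
        if (lastReturned == null) throw new IllegalStateException();
        IntBitSet.this.remove(lastReturned);
        lastReturned = null;
      }
    };
  }
}
```

With this adapter, the "real" count asked about in the question is available from longSize(), while size() still follows the Collection contract.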

时光与爱终年不遇 2024-09-22 18:32:13

From the source code of java.util.Collection:

 /**
 * Returns the number of elements in this collection.  If this collection
 * contains more than <tt>Integer.MAX_VALUE</tt> elements, returns
 * <tt>Integer.MAX_VALUE</tt>.
 * 
 * @return the number of elements in this collection
 */
int size();
