Retained heap size of a String in Java
This is a question that we have had trouble understanding. It's tricky to describe in text, but I hope the gist comes across.
I understand that a string's actual content is held in an internal char array. Normally, the retained heap size of the string is 40 bytes plus the size of the character array. This is explained here. When substring is called, the new string keeps a reference to the original string's character array, so the retained size of the character array can be a lot bigger than the string itself.
However, when profiling memory usage with YourKit or MAT, something strange seems to happen: the retained size of the string that references the char array does not include the retained size of the character array.
An example could be as follows (semi-pseudocode):
String date = "2011-11-33";   // String object: 24 bytes
date.value = char[1172];      // backing char array: 2360 bytes
The string's retained size is reported as 24 bytes, without the character array's retained size. This could make sense if many strings reference the character array as a result of many substring operations.
Now, when this string is placed in a collection such as an array or list, the retained size of that collection includes the retained size of all the strings, including the character array's retained size.
We then have a situation like this:
Array's retained size = 300 bytes
array[0] = String 40 bytes;
array[1] = String 40 bytes;
array[1].value = char[] (220 bytes)
You therefore have to look into each array entry to work out where the retained size comes from.
Again, this can be explained: the array holds all the strings that reference the same character array, so taken together the array's retained size is correct.
Now we get to the problem.
In a separate object I keep a reference to the array discussed above, as well as a different array containing the same strings. In both arrays the strings refer to the same character array. This is expected; after all, we are talking about the same string. However, the retained size of this character array is counted for both arrays in this new object. In other words, the retained size seems to be doubled. If I delete the first array, the second array will still hold a reference to the character array, and vice versa. This is confusing, because it looks as if Java is holding two separate copies of the same character array. How can this be? Is this a problem with Java's memory, or is it just the way the profilers display information?
This problem caused a lot of headaches for us in trying to track down huge memory usage in our application.
Again, I hope that someone out there will be able to understand the question and explain it.
Thanks for your help
Comments (4)
What you have here is a transitive reference in a dominator tree:
The character array should not show up in the retained size of either array. If the profiler displays it that way, then that's misleading.
This is how JProfiler shows this situation in the biggest objects view (screenshot: https://i.sstatic.net/jD1aE.png):
The string instance that is contained in both arrays is shown outside the array instances, with a [transitive reference] label. If you want to explore the actual paths, you can add the array holder and the string to the graph and find all paths between them:
Disclaimer: My company develops JProfiler.
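The dominator-tree argument can be checked on a toy object graph. The following is only a sketch: the node names (holder, arrayA, arrayB, string, chars) and the byte sizes are made up for illustration, and retained size is computed naively as "what becomes unreachable from the root if this object is removed", which is not necessarily how any particular profiler implements it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RetainedSizeSketch {
    // Toy heap: object name -> shallow size in bytes (hypothetical numbers).
    static final Map<String, Integer> SIZE = new LinkedHashMap<>();
    // Toy heap: object name -> names of the objects it references.
    static final Map<String, List<String>> REFS = new HashMap<>();

    static void node(String name, int size, String... refs) {
        SIZE.put(name, size);
        REFS.put(name, Arrays.asList(refs));
    }

    static {
        node("holder", 24, "arrayA", "arrayB"); // object holding both arrays
        node("arrayA", 24, "string");
        node("arrayB", 24, "string");           // same String instance as in arrayA
        node("string", 24, "chars");
        node("chars", 220);                     // shared backing char[]
    }

    // All objects reachable from root without ever visiting 'skip' (null = skip nothing).
    static Set<String> reachable(String root, String skip) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        if (!root.equals(skip)) todo.push(root);
        while (!todo.isEmpty()) {
            String n = todo.pop();
            if (!seen.add(n)) continue;
            for (String r : REFS.get(n)) {
                if (!r.equals(skip)) todo.push(r);
            }
        }
        return seen;
    }

    // Retained size of target: total shallow size of everything that becomes
    // unreachable from root once target is taken out of the graph.
    static int retainedSize(String root, String target) {
        Set<String> freed = reachable(root, null);
        freed.removeAll(reachable(root, target));
        int sum = 0;
        for (String n : freed) sum += SIZE.get(n);
        return sum;
    }

    public static void main(String[] args) {
        for (String n : SIZE.keySet()) {
            System.out.println(n + " retains " + retainedSize("holder", n) + " bytes");
        }
    }
}
```

In this model each array retains only its own 24 bytes, because the string and its char array stay reachable through the other array. The shared 244 bytes are attributed to the holder, which is the first object that dominates both paths.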
I'd say it is just the way the profiler displays the information. It has no idea that the two arrays should be considered for "deduplication". How about you wrap the two arrays into some kind of dummy holder object, and run your profiler against that? Then, it should be able to take care of the "double-counting".
Unless the strings are interned, they can be equals() but not ==. When constructing a String object from a char array, the constructor will make a copy of the char array. (This is the only way to shield the immutable String from later changes in the char array values.)
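A small sketch of these three points, using only the standard String API. The class and method names (StringCopySketch and its helpers) are hypothetical and exist only for this illustration.

```java
class StringCopySketch {
    // The String(char[]) constructor copies its argument, so changing the
    // array afterwards does not affect the string.
    static boolean survivesMutation() {
        char[] chars = {'2', '0', '1', '1'};
        String s = new String(chars);
        chars[0] = 'X';
        return s.equals("2011");
    }

    // Two strings built from the same array are equal but are distinct objects.
    static boolean equalButDistinct() {
        char[] chars = {'2', '0', '1', '1'};
        String a = new String(chars);
        String b = new String(chars);
        return a.equals(b) && a != b;
    }

    // After interning, both references point to the same canonical instance.
    static boolean internedAreSame() {
        char[] chars = {'2', '0', '1', '1'};
        String a = new String(chars).intern();
        String b = new String(chars).intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("survives mutation:  " + survivesMutation());
        System.out.println("equal but distinct: " + equalButDistinct());
        System.out.println("interned are same:  " + internedAreSame());
    }
}
```

So two arrays only share a backing char array if they hold the same String instance (or strings derived from one another); strings that merely have equal content each carry their own copy.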
If you run with -XX:-UseTLAB, the output shows the program consuming more memory than you might expect if the strings shared the same backing store.
If you look at the code in the String class, you can see that substring does not take a copy of the underlying value array. (That is true of Java 6 and early Java 7; from Java 7 update 6 onward, substring copies the characters it needs.)
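To make the sharing concrete, here is a simplified model of the older String layout, in which a string is a view (offset and count) onto a char array. This is an illustrative sketch, not the JDK source: the class name OldStyleString is hypothetical, and only the general value/offset/count idea is assumed.

```java
class OldStyleString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of the first character of this string in value
    final int count;     // number of characters in this string

    OldStyleString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static OldStyleString of(String s) {
        char[] chars = s.toCharArray();
        return new OldStyleString(chars, 0, chars.length);
    }

    // No copy is made: the result points into the same backing array.
    OldStyleString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new OldStyleString(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        OldStyleString line = OldStyleString.of("2011-11-33 followed by a long log line");
        OldStyleString date = line.substring(0, 10);
        System.out.println(date + " shares backing array: " + (date.value == line.value));
    }
}
```

In this model a 10-character date keeps the whole original array alive, which is how a short string can end up referencing a char array of more than a thousand characters.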
Another thing to consider is -XX:+UseCompressedStrings, which encourages the JVM to use byte[] instead of char[] where possible; whether it is available and enabled depends on the JVM version. The size of the headers for the String and array objects also varies between 32-bit JVMs, 64-bit JVMs with 32-bit references and 64-bit JVMs with 64-bit references.