Memory usage of simple Java data structures
I want to have a quite exact measurement of my cache implemented in Java. Please tell me if this approach is possible.
I have a hashmap mapping a string to an array of strings. Is there some way to get a good approximation of the size of this data structure?
How do I get the size of a string? Call String.getBytes() and add something for the overhead of holding the object?
Is a string array the sum of all strings? Or is there some overhead?
Does the hashmap also have some overhead, maybe wrapping the objects into some entry object?
For all unused space in the map, the hashmap still allocates some space. Can I sum up 2 * null pointer for all unused spaces in the map?
I'm happy with partial answers as well, pointing me in the right direction.
I think a good practical approach is to use a memory profiler such as YourKit.
Have you tried Instrumentation.getObjectSize()? This might tell you what you want, though the JavaDoc says that it is only an estimate.
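The Instrumentation instance is only handed to a Java agent, so you need an agent before you can call getObjectSize(). Below is a minimal sketch of one, assuming these details:

- ObjectSizeAgent is a hypothetical class name.
- The class is packaged in a jar whose manifest contains "Premain-Class: ObjectSizeAgent".
- The JVM is started with -javaagent:<that jar>.

getObjectSize() returns an implementation-specific estimate for a single object. In HotSpot this is a shallow size, so for a String it does not include the internal character array.

```java
import java.lang.instrument.Instrumentation;

/**
 * Hypothetical agent exposing Instrumentation.getObjectSize().
 * Assumed packaging: a jar whose manifest has "Premain-Class: ObjectSizeAgent",
 * with the JVM started as: java -javaagent:agent.jar ...
 */
public final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    private ObjectSizeAgent() {}

    /** Invoked by the JVM before main() when the jar is loaded with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    /**
     * Implementation-specific size estimate for this one object (in HotSpot a shallow size:
     * a String's character array is not included), or -1 if the agent was not loaded.
     */
    public static long shallowSizeOf(Object object) {
        Instrumentation inst = instrumentation;
        return inst == null ? -1L : inst.getObjectSize(object);
    }

    public static void main(String[] args) {
        long size = shallowSizeOf("hello");
        System.out.println(size < 0
                ? "agent not loaded: start the JVM with -javaagent"
                : "shallow size of \"hello\": " + size + " bytes");
    }
}
```

To size a whole HashMap<String, String[]> this way, you would still have to walk the graph yourself: the map, its table, every entry, key, array and string. Each of those would go through shallowSizeOf().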
The actual memory overhead implied by an object instance depends on some internal details of the JVM implementation, and may be hard to define because it can change throughout the lifetime of the object (within the garbage collector, an object can "move" between generations which use distinct memory management structures).
A very rough approximation is that each instance of any object includes two "words" (two 32-bit values on a 32-bit machine, two 64-bit values on a 64-bit machine). One of the words is more or less a pointer to the Class instance for that object. The other holds some object state, such as the monitor for that object (the one you lock with synchronized). Then there are the object fields. For an array, the array length must be written somewhere in the object, and also the values.

At that point, have a look at the source code for the Java classes (look for a file named src.zip in the JDK distribution). In the String.java file, we can see that, internally, a String instance has four fields: a reference to an array of char values, and three int fields. The first int is the index of the first string character within the array, the second is the string length, and the third caches the string hash code. So, for a 32-bit machine, you can estimate that the minimal memory usage for a String instance of n characters is the sum of:
- the String instance object header (two words);
- the four String instance fields (four 32-bit words);
- the char array object header (two words) plus the array length (one word);
- the array contents, 2n bytes (a Java char is 16-bit).

That's only a minimum, because the String instance only references a chunk of the internal character array, so the array memory size could be larger. On the other hand, the array of characters may be shared between several String instances. This structure allows String.substring() to be very fast: the new String instance internally uses the same array, so there is no data copying involved.

But it also means that if you have a big string, take a small substring of it, and store that small substring, you are actually retaining the big array in RAM as well. For a String instance str, you can call new String(str) to get a new instance that internally uses a newly allocated and trimmed-down array. On the bright side, if you have two strings, one being a substring of the other, and you store both in your cache, then you pay only once for the common internal array.

(This describes the String implementation of older JDKs. Since Java 7u6, String no longer has the offset and count fields, and substring() copies the characters. Since Java 9, the contents are stored in a byte[], using one byte per character for Latin-1 strings.)

Hence, even without considering all the hidden costs implied by the GC, it is quite hard to know what "memory size for a string" means. If two String instances share the same internal array, how do you count the "size" of each string?

Looking at the source for HashMap will show you that there are internal instances which are also allocated. There is an array of references to HashMap.Entry instances, and one HashMap.Entry instance for every stored key-value pair. The array size is dynamically adjusted depending on the number of entries and the configured load factor.

Since accounting for the memory size is hard, an altogether different solution is to let the GC itself decide when old cache entries should be removed. This internally uses "soft references": a kind of pointer which the GC may set to null when memory becomes tight (breaking references may allow the GC to free more objects). This makes for a crude "memory-aware" cache which is automatically pruned depending on the available RAM. A useful library for that is Google's Guava and its MapMaker class.
1) Let's assume so, although it's not guaranteed (different JVMs can behave differently).
2) The sum of the strings plus the overhead of holding an object (the array).
3) Sure, a lot. Objects are wrapped into entries, and these entries are then stored in an internal array, etc... Well, at least in the Oracle JVM.
4) There's no "unused" space in the map... What do you mean?
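Points 2 and 3 can be given rough numbers. The sketch below encodes the simplified 32-bit model from the earlier answer:

- 4-byte words and references;
- a two-word object header;
- the old String layout (a char[] reference plus three int fields);
- 16-bit chars.

Two further details are my assumptions, based on the OpenJDK HashMap sources:

- The entry layout is a header plus key, value, next and hash fields.
- The table starts at 16 buckets and doubles when the entry count exceeds capacity * load factor.

Alignment padding, shared char arrays and GC overhead are ignored. Treat the results as lower-bound ballpark figures, not measurements. RoughSizeModel is a hypothetical name.

```java
/**
 * Back-of-the-envelope sizes for a HashMap<String, String[]> on a 32-bit JVM.
 * ALL constants are assumptions from the simplified model discussed above;
 * alignment padding, shared char arrays and GC overhead are ignored.
 */
public final class RoughSizeModel {
    static final int WORD = 4;           // assumed bytes per word and per reference (32-bit JVM)
    static final int HEADER = 2 * WORD;  // assumed object header: class pointer + state word

    private RoughSizeModel() {}

    /** String object (header + 4 field words) plus its char[] (header + length word + 2 bytes per char). */
    static long stringBytes(int chars) {
        long stringObject = HEADER + 4 * WORD;
        long charArray = HEADER + WORD + 2L * chars;
        return stringObject + charArray;
    }

    /** String[]: header + length word + one reference per element, plus every string it points to. */
    static long stringArrayBytes(int... lengths) {
        long total = HEADER + WORD + (long) WORD * lengths.length;
        for (int len : lengths) {
            total += stringBytes(len);
        }
        return total;
    }

    /** One entry object (header + key, value, next, hash fields) plus its key string and value array. */
    static long entryBytes(int keyLength, int... valueLengths) {
        long entryObject = HEADER + 4 * WORD;
        return entryObject + stringBytes(keyLength) + stringArrayBytes(valueLengths);
    }

    /** Assumed bucket-array length after n >= 1 inserts: start at 16, double while n > capacity * loadFactor. */
    static int tableLength(int entries, float loadFactor) {
        int capacity = 16;
        while (entries > capacity * loadFactor) {
            capacity *= 2;
        }
        return capacity;
    }

    /** Bucket array: header + length word + one reference per slot, occupied or not. */
    static long tableBytes(int tableLength) {
        return HEADER + WORD + (long) WORD * tableLength;
    }

    public static void main(String[] args) {
        System.out.println(stringBytes(10));          // 56
        System.out.println(stringArrayBytes(1, 2));   // 98
        System.out.println(entryBytes(3, 1, 2));      // 164
        System.out.println(tableLength(13, 0.75f));   // 32
        System.out.println(tableBytes(32));           // 140
    }
}
```

Under this model, the "unused space" from the question is the bucket array itself. It costs one reference per slot whether the slot is occupied or not, and with the default load factor of 0.75 at least a quarter of the slots are always spare.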
Well, to sum up, unfortunately, there's NO way to get a precise answer to any of these questions. It depends on the VM, the GC, the operating system, etc. A profiler could give you some useful information related to one configuration, but that's the most you can ever hope to get.
It's by design: Java and its garbage collector want you to never have to worry about memory allocation and management details. That's awesome most of the time; in your case, it's a burden. Why do you have such a need, anyway?
A simple way to quantify your memory usage would be to use the following:
jmap -histo:live <pid>
where <pid> is the process ID of your Java process. This will give you a histogram of the heap: for each Java class, the number of objects, their memory size in bytes, and the fully qualified class name are printed.
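You can also run:
jmap -dump:live,format=b,file=heap.hprof <pid>
which dumps the live objects of the Java heap in hprof binary format to the given file (heap.hprof here is just an example name).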
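You can also trigger the dump from inside the application instead of running jmap. This is a swapped-in alternative, not part of jmap: HotSpot-based JVMs expose the same operation through com.sun.management.HotSpotDiagnosticMXBean.

Below is a minimal sketch; HeapDumper is a hypothetical name. dumpHeap() refuses to overwrite an existing file, and recent JDKs reject file names that do not end in .hprof, so the example writes to a fresh .hprof path.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: in-process heap dump on HotSpot-based JVMs. */
public final class HeapDumper {
    private HeapDumper() {}

    /** Writes an hprof dump; live = true keeps only reachable objects, like jmap -dump:live. */
    public static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live); // throws if the file already exists
    }

    public static void main(String[] args) throws IOException {
        // A fresh temporary directory guarantees the target file does not exist yet.
        Path out = Files.createTempDirectory("heapdump").resolve("heap.hprof");
        dump(out, true);
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

The resulting file can be opened in the same tools as a jmap dump, for example VisualVM.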
I would look more into jmap. It is very helpful when memory is the bottleneck of your Java application.
For example, you can create a script that does a jmap -histo every 30 seconds. Then you can graph the output and see the evolution of memory for each object created in your java classes.
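The periodic-sampling idea can also be written in Java itself. This is a rough sketch that makes three assumptions:

- jmap is on the PATH.
- Histogram rows look like "   1:   12345   1234560  [C" (rank, instance count, bytes, class name).
- Newer JDKs may append a module name after the class name; the parser ignores it.

It prints CSV lines that are easy to graph; HistoSampler is a hypothetical name. Keep in mind that -histo:live forces a full GC on the target process at every sample.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: samples "jmap -histo:live <pid>" periodically and prints CSV. */
public final class HistoSampler {
    /** One data row of a class histogram. */
    public static final class Row {
        public final long instances;
        public final long bytes;
        public final String className;

        Row(long instances, long bytes, String className) {
            this.instances = instances;
            this.bytes = bytes;
            this.className = className;
        }
    }

    private HistoSampler() {}

    /** Parses a row like "   1:   12345   1234560  [C"; header, separator and Total lines yield empty. */
    static Optional<Row> parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 4 || !parts[0].endsWith(":")) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Row(Long.parseLong(parts[1]), Long.parseLong(parts[2]), parts[3]));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Runs jmap once (assumed to be on the PATH) and returns the parsed rows. */
    static List<Row> sample(long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("jmap", "-histo:live", Long.toString(pid))
                .redirectErrorStream(true)
                .start();
        List<Row> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                parseLine(line).ifPresent(rows::add);
            }
        }
        process.waitFor();
        return rows;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java HistoSampler <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        while (true) {
            long timestamp = System.currentTimeMillis();
            for (Row row : sample(pid)) {
                // timestamp,class,instances,bytes: one line per class per sample, ready to plot
                System.out.println(timestamp + "," + row.className + "," + row.instances + "," + row.bytes);
            }
            Thread.sleep(30_000); // every 30 seconds, as suggested above
        }
    }
}
```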
Here is one example of jmap -histo: [example output not preserved]
Also, profiling your process would be a good choice.
I would recommend using VisualVM (free) or JProfiler 7 (not free, but awesome!).