Is an array of "structs" theoretically possible in Java?
There are cases where one needs a memory-efficient way to store lots of objects. To do that in Java, you are forced to use several primitive arrays (see below for why) or one big byte array, which adds some CPU overhead for conversion.
Example: you have a class `Point { float x; float y; }`. Now you want to store N points in an array, which on a 32-bit JVM takes at least N * 8 bytes for the floats plus N * 4 bytes for the references. So at least 1/3 of the memory is wasted (not counting the normal per-object overhead here). If you stored the same data in two float arrays instead, all would be fine.
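The two-float-array workaround mentioned above can be sketched roughly as follows. This is only an illustration: the names `PointColumns`, `PointColumnsDemo`, and `referenceArrayEstimate` are hypothetical, and the byte counts repeat the question's own 32-bit estimate without object or array headers.

```java
// Hypothetical sketch of the "two float arrays" layout from the question.
final class PointColumns {
    private final float[] xs; // all x coordinates, stored contiguously
    private final float[] ys; // all y coordinates, stored contiguously

    PointColumns(int n) {
        xs = new float[n];
        ys = new float[n];
    }

    void set(int i, float x, float y) { xs[i] = x; ys[i] = y; }
    float x(int i) { return xs[i]; }
    float y(int i) { return ys[i]; }
    int size() { return xs.length; }

    // Payload only: two floats per point, array headers ignored.
    long payloadBytes() { return (long) xs.length * 2 * Float.BYTES; }
}

final class PointColumnsDemo {
    // The question's estimate for Point[]: 8 bytes of floats plus an assumed
    // 4-byte reference per element (32-bit JVM), object headers ignored.
    static long referenceArrayEstimate(int n) {
        return (long) n * (2 * Float.BYTES + 4);
    }

    public static void main(String[] args) {
        PointColumns pts = new PointColumns(1_000);
        pts.set(0, 1.5f, 2.5f);
        System.out.println(pts.x(0) + ", " + pts.y(0));
        System.out.println(pts.payloadBytes() + " vs " + referenceArrayEstimate(1_000));
    }
}
```

The price of this layout is that there is no `Point` object to pass around; callers work with an index into the columns instead.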
My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?
For example, marking the class `Point` as final should be sufficient for the JVM to determine the maximum size of the data for the `Point` class. Or where would this violate the specification? This would also save a lot of memory when handling large n-dimensional matrices and the like.
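For comparison, an embedded layout can be emulated by hand today with a single byte buffer, which is the "big byte array" option mentioned earlier. The sketch below is an assumption-laden illustration, not something the JVM does for you: `PackedPoints` and its accessors are hypothetical names, and each access pays a small conversion cost through `ByteBuffer`.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: points stored inline as consecutive (x, y) float pairs.
final class PackedPoints {
    private static final int STRIDE = 2 * Float.BYTES; // bytes per point: x, then y
    private final ByteBuffer data;
    private final int count;

    PackedPoints(int n) {
        // multiplyExact guards against int overflow for very large n
        data = ByteBuffer.allocate(Math.multiplyExact(n, STRIDE));
        count = n;
    }

    float getX(int i) { return data.getFloat(offset(i)); }
    float getY(int i) { return data.getFloat(offset(i) + Float.BYTES); }
    void setX(int i, float x) { data.putFloat(offset(i), x); }
    void setY(int i, float y) { data.putFloat(offset(i) + Float.BYTES, y); }
    int size() { return count; }

    // Byte offset of point i, with an explicit bounds check.
    private int offset(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return i * STRIDE;
    }
}
```

Note that there is no way to write `packed[i] = new Point()` here; values can only be copied in field by field, which is exactly the copy-semantics issue raised in the conclusions.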
Update:
I would like to know whether the JVM could theoretically optimize this (e.g. behind the scenes) and under which conditions, not whether I can force the JVM to do it somehow. I think the second point of the conclusions is the reason it cannot be done easily, if at all.
Conclusions on what the JVM would need to know:
- The class needs to be final to let the JVM determine the length of one array entry.
- The array needs to be read-only. Of course you can change the values, as in `Point p = arr[i]; p.setX(i)`, but you cannot write to the array via `inlineArr[i] = new Point()`. Otherwise the JVM would have to introduce copy semantics, which would be against the "Java way". See aroth's answer.
- How to initialize the array (calling the default constructor or leaving the members initialized to their default values).
Comments (3)
Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this particular optimization is implemented, but I do know that similar ones are. For example, objects allocated with `new` are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local, it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
Update for updated question:
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.
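The bit-vector idea for null elements can be sketched at the library level. This is a hand-written illustration under assumptions, not a description of anything the JVM does internally: `NullablePointColumns` and its methods are hypothetical names, and the fields are stored in two float arrays with a `java.util.BitSet` recording which slots hold a value.

```java
import java.util.BitSet;

// Hypothetical sketch: inline point storage plus a bit vector marking non-null slots.
final class NullablePointColumns {
    private final float[] xs;
    private final float[] ys;
    private final BitSet present; // bit i set means slot i holds a point

    NullablePointColumns(int n) {
        xs = new float[n];
        ys = new float[n];
        present = new BitSet(n); // all bits clear: every slot starts out "null"
    }

    void put(int i, float x, float y) {
        xs[i] = x;
        ys[i] = y;
        present.set(i);
    }

    // Equivalent of arr[i] = null for a reference array.
    void clear(int i) {
        xs[i] = 0f;
        ys[i] = 0f;
        present.clear(i);
    }

    boolean isNull(int i) { return !present.get(i); }

    float x(int i) { requirePresent(i); return xs[i]; }
    float y(int i) { requirePresent(i); return ys[i]; }

    // Mimics the NullPointerException a dereference of arr[i] would raise.
    private void requirePresent(int i) {
        if (!present.get(i)) {
            throw new NullPointerException("element " + i + " is null");
        }
    }
}
```

The mask costs roughly one bit per element, which is small next to the 4-byte reference per element it stands in for.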
The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it would probably add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do `new Point()`, the object you create is dynamically allocated on the heap. So if you allocate 100 `Point` instances by calling `new Point()`, there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated in a contiguous block of memory).
So how would a `Point` instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in `Point` into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original `Point` instance is still taking up space on the heap, as well as inside of the array. So unless it gets garbage-collected immediately (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage collection of the original instance), you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
...you would get:
...even though logically there should only be a single instance of `Point`. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is shared between multiple arrays, your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
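The difference between sharing a reference and copying fields can be illustrated with a small sketch. This is not the answer's original snippet, only a hypothetical example: `MutablePoint` and `CopySemanticsDemo` are made-up names, and the "compressed" arrays are emulated with plain `float[]` copies.

```java
// Hypothetical mutable point type used only for this illustration.
final class MutablePoint {
    float x;
    float y;

    MutablePoint(float x, float y) {
        this.x = x;
        this.y = y;
    }
}

final class CopySemanticsDemo {
    // Reference semantics (Java as it is): both arrays see the same object.
    static float sharedReference() {
        MutablePoint p = new MutablePoint(1f, 2f);
        MutablePoint[] a = { p };
        MutablePoint[] b = { p };
        a[0].x = 10f;  // mutate through the first array
        return b[0].x; // the second array observes the change
    }

    // Emulated inline storage: each array holds its own copy of the fields.
    static float copiedFields() {
        MutablePoint p = new MutablePoint(1f, 2f);
        float[] a = { p.x, p.y }; // copy of p's fields
        float[] b = { p.x, p.y }; // a second, independent copy
        a[0] = 10f;               // only the first copy changes
        return b[0];              // the second copy still holds the old value
    }

    public static void main(String[] args) {
        System.out.println(sharedReference()); // prints 10.0
        System.out.println(copiedFields());    // prints 1.0
    }
}
```

With copied fields, the two arrays silently diverge after the first write, which is the behaviour that would surprise code written against ordinary Java reference semantics.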
Isn't this tantamount to providing trivial classes such as the following?
Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++), that is, ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision; is there any reason it couldn't do the same for arrays too?