Are arrays of "structs" theoretically possible in Java?

Posted on 2025-01-07 19:56:48

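There are cases where you need a memory-efficient way to store a large number of objects. To do that in Java you are forced to use either several primitive arrays (see below for why) or one big byte array, which adds some CPU overhead for converting values.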
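As a rough illustration of the byte-array option, the sketch below packs (x, y) float pairs into a single `java.nio.ByteBuffer`. The class name `PackedPointBuffer` and its methods are made up for this example; the `putFloat`/`getFloat` calls are where the conversion overhead comes from.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not a standard class: stores N (x, y) float pairs
// back to back in one byte array, 8 bytes per point.
final class PackedPointBuffer {
    private static final int BYTES_PER_POINT = 2 * Float.BYTES;
    private final ByteBuffer buf;

    PackedPointBuffer(int count) {
        buf = ByteBuffer.allocate(count * BYTES_PER_POINT);
    }

    void set(int i, float x, float y) {
        int base = i * BYTES_PER_POINT;
        buf.putFloat(base, x);                 // absolute put: float -> 4 bytes
        buf.putFloat(base + Float.BYTES, y);
    }

    float x(int i) { return buf.getFloat(i * BYTES_PER_POINT); }

    float y(int i) { return buf.getFloat(i * BYTES_PER_POINT + Float.BYTES); }

    int sizeInBytes() { return buf.capacity(); }

    public static void main(String[] args) {
        PackedPointBuffer points = new PackedPointBuffer(1000);
        points.set(42, 1.5f, -2.0f);
        System.out.println(points.x(42) + ", " + points.y(42));
        System.out.println(points.sizeInBytes() + " bytes of payload");
    }
}
```

Every read and write goes through an index calculation plus a byte-to-float conversion, which is the price paid for having no per-element reference or object header.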

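Example: you have a class Point { float x; float y; }. Now you want to store N points in an array. On a 32-bit JVM that takes at least N * 8 bytes for the floats plus N * 4 bytes for the references, so at least 1/3 of the memory is wasted (not counting the normal per-object overhead here). If you store the data in two float arrays instead, the waste disappears.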
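The arithmetic in this example, and the two-array workaround, can be sketched as follows. The 4-byte reference size is the 32-bit figure from the example, and object and array headers are ignored just as in the question. `PointColumns` and its method names are invented for illustration.

```java
// Hypothetical "two float arrays" layout for N points; names are illustrative.
final class PointColumns {
    final float[] xs;
    final float[] ys;

    PointColumns(int n) {
        xs = new float[n];
        ys = new float[n];
    }

    // Payload only: two floats per point, no per-element reference.
    static long columnBytes(long n) {
        return n * 2 * Float.BYTES;
    }

    // Point[] as estimated in the question for a 32-bit JVM: 8 bytes of floats
    // plus an assumed 4-byte reference per element (object headers ignored).
    static long referenceArrayBytes(long n) {
        final int assumedReferenceBytes = 4;
        return n * (2 * Float.BYTES + assumedReferenceBytes);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("Point[]     : " + referenceArrayBytes(n) + " bytes");
        System.out.println("two float[] : " + columnBytes(n) + " bytes");
    }
}
```

With these assumptions the references account for 4 of every 12 bytes, which is the 1/3 quoted above.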

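My question: why doesn't Java optimize the memory usage of arrays of references? That is, why not embed the objects directly in the array, as C++ does?

For example, marking the Point class final should be enough for the JVM to know the maximum size of a Point's data. Or where would this violate the specification? It would also save a lot of memory when handling large n-dimensional matrices and the like.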

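Update:

I would like to know whether the JVM could theoretically perform this optimization (e.g. behind the scenes) and under which conditions, not whether I can somehow force the JVM to do it. I think the second point of the conclusions below is the reason it cannot be done easily, if at all.

Conclusions about what the JVM would need to know:

1. The class would need to be final, so that the JVM can determine the size of one array entry.
2. The array slots would need to be read-only. You could still change values, e.g. Point p = arr[i]; p.setX(i), but you could not write to the array via inlineArr[i] = new Point(). Otherwise the JVM would have to introduce copy semantics, which would go against the "Java way". See aroth's answer.
3. How the array is initialized (by calling the default constructor, or by leaving the members at their default values).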
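To make the second condition more concrete, here is a hand-written sketch of how an inlined array could behave. `InlinePoints`, `PointView`, and `copyInto` are hypothetical names, and no JVM is claimed to do this automatically. Reading an element yields a view that writes through to the backing storage, while storing an element can only copy the fields in, which is the copy semantics mentioned above.

```java
// Hypothetical hand-rolled "inline array" of points; illustrative only.
final class InlinePoints {
    private final float[] data; // x0, y0, x1, y1, ...

    // Every slot starts at its default value (0, 0); no constructor is run.
    InlinePoints(int n) {
        data = new float[2 * n];
    }

    // A view onto slot i: setters write straight through to the backing array.
    final class PointView {
        private final int base;

        PointView(int i) { base = 2 * i; }

        float getX() { return data[base]; }
        float getY() { return data[base + 1]; }
        void setX(float x) { data[base] = x; }
        void setY(float y) { data[base + 1] = y; }
    }

    PointView get(int i) {
        return new PointView(i);
    }

    // The equivalent of "arr[i] = p" can only copy the fields:
    // the slot keeps no link to the source afterwards.
    void copyInto(int i, PointView source) {
        float x = source.getX();
        float y = source.getY();
        data[2 * i] = x;
        data[2 * i + 1] = y;
    }
}
```

For example, after `arr.copyInto(1, arr.get(0))`, a later `arr.get(0).setX(7)` changes slot 0 only; slot 1 keeps the value it was given at copy time.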

左岸枫 2025-01-14 19:56:48

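Java doesn't provide a way to do this because it is not a language-level choice. C, C++, and similar languages expose ways to do it because they are system-level programming languages, where you are expected to know system-level features and make decisions based on the specific architecture you are using.

In Java, you are targeting the JVM. The JVM does not specify whether or not this is permissible (I am assuming this is true; I haven't combed through the JLS thoroughly to prove it). The idea is that when you write Java code, you trust the JIT to make intelligent decisions, and that is where reference types could be folded into an array or the like. So the "Java way" here is that you cannot specify whether it happens, but if the JIT can make that optimization and improve performance, it could and should.

I am not sure whether this particular optimization is implemented, but I do know that similar ones are. For example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that an object is method-local, it can allocate the object's fields on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.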
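A small sketch of the kind of code escape analysis targets is shown below. `EscapeDemo` and its nested `Point` are stand-ins written for this example. Whether the allocation is actually removed depends on the JVM and its JIT, and the Java code cannot observe the difference, so treat the comment as a description of what a JIT may do rather than a guarantee.

```java
final class EscapeDemo {
    static final class Point {
        final float x;
        final float y;

        Point(float x, float y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point is never stored in a field, put in an array, or returned,
    // so it does not escape this method. A JIT with escape analysis MAY
    // replace it with two local floats instead of a heap allocation.
    static float lengthSquared(float x, float y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        float sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += lengthSquared(i, 1);
        }
        System.out.println(sum);
    }
}
```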

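Update for the updated question

If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers), but you should be able to work around them. For null references, the JVM could either convince itself that there will never be null elements, or keep a bit vector marking which slots are null. Both techniques would likely depend on escape analysis showing that the array reference never leaves the method, because I can see the bookkeeping becoming tricky if you try to, say, store it in an object field.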
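The bit-vector idea can be sketched by hand with `java.util.BitSet`. `NullablePackedPoints` and its methods are hypothetical names for this example; a JVM doing this internally would not necessarily look anything like it.

```java
import java.util.BitSet;

// Hypothetical packed point array that still supports "null" elements.
final class NullablePackedPoints {
    private final float[] data;   // x0, y0, x1, y1, ...
    private final BitSet present; // bit i set => slot i holds a point

    NullablePackedPoints(int n) {
        data = new float[2 * n];
        present = new BitSet(n);  // all bits clear: every slot starts as null
    }

    boolean isNull(int i) {
        return !present.get(i);
    }

    void set(int i, float x, float y) {
        data[2 * i] = x;
        data[2 * i + 1] = y;
        present.set(i);
    }

    void setNull(int i) {
        present.clear(i);
    }

    // Mimics dereferencing a null array element.
    float x(int i) {
        if (isNull(i)) {
            throw new NullPointerException("slot " + i + " is null");
        }
        return data[2 * i];
    }

    float y(int i) {
        if (isNull(i)) {
            throw new NullPointerException("slot " + i + " is null");
        }
        return data[2 * i + 1];
    }
}
```

The cost is one extra bit per element plus a check on every read, which is far less than a full reference per element.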

何必那么矫情 2025-01-14 19:56:48

The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).

So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.

Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:

Point p = new Point(0, 0);
Point[] compressedA = {p};  // assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p};  // assuming 'p' is "optimally" stored as {0,0}

compressedA[0].setX(5);
compressedB[0].setX(1);

System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);

...you would get:

0
5
1

...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
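For contrast, here is the same experiment as it behaves in real Java, where arrays hold references. The `Point` class is a minimal stand-in, since the answer does not show one, and `SharedReferenceDemo` is just a wrapper name. Both arrays and `p` refer to one object, so the last write is visible everywhere.

```java
final class SharedReferenceDemo {
    // Minimal stand-in for the Point class assumed by the snippet above.
    static final class Point {
        float x;
        float y;

        Point(float x, float y) {
            this.x = x;
            this.y = y;
        }

        void setX(float x) {
            this.x = x;
        }
    }

    static float[] run() {
        Point p = new Point(0, 0);
        Point[] a = {p}; // stores a reference to p, not a copy
        Point[] b = {p}; // same object again

        a[0].setX(5);
        b[0].setX(1);

        return new float[] {p.x, a[0].x, b[0].x};
    }

    public static void main(String[] args) {
        for (float v : run()) {
            System.out.println(v); // prints 1.0 three times
        }
    }
}
```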

无语# 2025-01-14 19:56:48

Isn't this tantamount to providing trivial classes such as the following?

class Fixed {
   float[] hiddenArr;  // x0, y0, x1, y1, ...

   // Materializes element 'position' as a fresh Point copied from the array.
   Point pointArray(int position) {
      return new Point(hiddenArr[position*2], hiddenArr[position*2+1]);
   }
}

Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++), that is, types that contain only other plain-old-data types. I believe HotSpot uses this information during stack elision, so is there any reason it couldn't do the same for arrays?
