Array memory allocation - paging
I'm not sure whether the answer is the same for Java, C# and C++, so I tagged all of them. An answer for each language would be nice.
I have always thought that if I allocate an array, all of its cells end up in one contiguous space. So if the system doesn't have enough memory in one piece, an out-of-memory exception is raised.
Is what I said right? Or is it possible that an allocated array would be paged?
Comments (5)
C++ arrays are contiguous, meaning that the memory has consecutive addresses, i.e. it's contiguous in virtual address space. It need not be contiguous in physical address space, since modern processors (or their memory subsystems) have a big map that associates virtual pages with physical pages. Processes running in user mode never see physical addresses of their arrays.
I think in practice most or all Java implementations are the same. But the programmer never sees an actual address of an array element, just a reference to the array and the means to index it. So in theory, a Java implementation could fracture arrays and hide that fact in the [] operator, although JNI code can still view the array in the C++ style, at which point a contiguous block would be needed. This is assuming there's nothing in the JVM spec about the layout of arrays, which jarnbjo tells me there isn't.
I don't know C#, but I expect the situation is pretty similar to Java: you can imagine that an implementation might use the [] operator to hide the fact that an array isn't contiguous in virtual address space. The pretense would fail as soon as someone obtained a pointer into it. [Edit: Polynomial says that arrays in C# can be discontiguous until someone pins them, which makes sense since you know you have to pin objects before passing them into low-level code that uses addresses.]
Note that if you allocate an array of some large object type, then in C++ the array actually is that many large structures laid end-to-end, so the required size of the contiguous allocation depends on the size of the object. In Java, an array of objects is "really" an array of references. So that's a smaller contiguous block than the C++ array. For native types they're the same.
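To make the last point concrete, here is a rough C++ sketch. It checks that the elements of an array sit end-to-end in virtual address space, and compares the contiguous block needed for objects stored by value with the block needed for an array of pointers (roughly what a Java array of references amounts to). The `Big` type and all function names are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "large object" type, used only for this illustration.
struct Big {
    char payload[1024];
};

// Returns true if every element starts exactly sizeof(T) bytes after the
// previous one, i.e. the array is contiguous in virtual address space.
template <typename T>
bool is_contiguous(const T* arr, std::size_t count) {
    for (std::size_t i = 1; i < count; ++i) {
        auto prev = reinterpret_cast<std::uintptr_t>(&arr[i - 1]);
        auto curr = reinterpret_cast<std::uintptr_t>(&arr[i]);
        if (curr - prev != sizeof(T)) return false;
    }
    return true;
}

// Contiguous bytes needed when the objects themselves live in the array (C++ style).
constexpr std::size_t bytes_by_value(std::size_t count) {
    return count * sizeof(Big);
}

// Contiguous bytes needed when the array only holds pointers to the objects
// (an approximation of Java's array of references).
constexpr std::size_t bytes_by_pointer(std::size_t count) {
    return count * sizeof(Big*);
}
```

For example, after `Big* objs = new Big[100];` the call `is_contiguous(objs, 100)` should return true, and `bytes_by_pointer(100)` is far smaller than `bytes_by_value(100)`.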
In C# you can't guarantee that the memory block will be contiguous. The CLR tries to allocate the memory in one contiguous block, but it may allocate it in several blocks. There is little defined behaviour about how a CLR should manage C# memory, because it is designed to be abstracted away by managed constructs.
The only time it should really matter in C# is if you're passing the array as a pointer via P/Invoke to some unmanaged code, in which case you should pin it (for example with a fixed statement or GCHandle.Alloc with GCHandleType.Pinned) to lock the object's location in memory. Perhaps someone else will be able to explain how the CLR and GC handle the need for contiguous memory in this case.
True in Java and C#, but in C++ you will only get an error when you have reached the process or system limit. The difference is that in Java and C# it's the application imposing a limit on itself. In C++ the limit is imposed by the OS.
This is also possible. However, in Java, having the heap paged is very bad for performance. When a GC runs, all the objects examined have to be in memory. In C++ it's not great but has less impact.
If you want large structures which could be paged in Java, you can use ByteBuffer.allocateDirect() or memory-mapped files. This works by using memory off the heap (basically what C++ uses).
In C and C++ programs, arrays are typically contiguous in the virtual address space (that is, unless the code is interpreted rather than compiled and executed directly, and assuming the platform has a virtual address space at all).
There, if a big array can't be allocated contiguously, even if there's enough free memory, you will get either the std::bad_alloc exception (in C++) or NULL (from malloc()-like functions in C/C++ or nonthrowing operator new in C++).
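As a rough sketch of those two failure modes, the helpers below try a contiguous allocation with the throwing and the non-throwing form of operator new. The function names and the `kAbsurdSize` constant are invented for this example, and how large a request has to be before it fails depends on the platform and its overcommit settings.

```cpp
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical size, chosen only so that the request cannot fit in the address space.
constexpr std::size_t kAbsurdSize = std::numeric_limits<std::size_t>::max() / 2;

// Throwing form: failure is reported as std::bad_alloc (or a class derived from it).
bool can_allocate_throwing(std::size_t bytes) {
    if (bytes == 0) return true;
    try {
        char* block = new char[bytes];
        // Touch the block through a volatile pointer so the compiler
        // cannot optimize the allocation away.
        static_cast<volatile char*>(block)[0] = 0;
        delete[] block;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// Non-throwing form: failure is reported as a null pointer.
bool can_allocate_nothrow(std::size_t bytes) {
    if (bytes == 0) return true;
    char* block = new (std::nothrow) char[bytes];
    if (block == nullptr) return false;
    static_cast<volatile char*>(block)[0] = 0;
    delete[] block;
    return true;
}
```

A small request such as `can_allocate_throwing(4096)` should succeed, while `can_allocate_nothrow(kAbsurdSize)` should report failure without throwing.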
Virtual memory (and paging to/from disk) usually doesn't solve virtual address space fragmentation problems, or at least not directly, because its purpose is different. It's normally used to let programs think there's enough memory when in fact there isn't. The RAM is effectively extended by the free disk space at the expense of lower performance, because the OS has to exchange data between the RAM and disk when there's memory pressure.
Your array (in parts or in whole) can be offloaded to the disk by the OS. But this is made transparent to your program because whenever it needs to access something from the array the OS will load it back (again, in parts or in whole, as the OS deems necessary).
On systems without virtual memory, there's no virtual to physical address translation and your program will work directly with physical memory, hence, it will have to deal with the physical memory fragmentation and also compete with other programs for both free memory and the address space, making allocation failures more likely to occur in general (systems with virtual memory often run programs in separate virtual address spaces and fragmentation in app A's virtual address space won't affect that of app B's).
With Java and C# certainly. We can show this by running byte[] array = new byte[4097]; on a Windows machine where the memory page size is 4096 bytes. It hence must be in more than one page.
Of course paging impacts performance, but this can be one of the cases where GC-using frameworks like .NET or Java can have an advantage, because the GC was written by people who know paging happens. There are still advantages in structures that make it more likely to have related elements on the same page (favouring array-backed collections over pointer-chasing collections). This also has an advantage in terms of CPU caches. (Large arrays are still one of the best ways to cause heap fragmentation that the GC has to struggle with. Still, since the GC is pretty good at dealing with it, it's going to be a win over many other ways of dealing with the same issue.)
With C++ almost certainly, because we normally code at the level of the memory-management of the operating system - arrays are in contiguous virtual space (whether on the heap or the stack), not contiguous physical space. It's possible in C or C++ to code at a level below that, but that's normally only done by people actually writing the memory-management code itself.
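The 4097-byte argument can be checked with a little arithmetic. This is only a sketch: `pages_spanned` is a made-up helper, and the 4096-byte page size is the figure assumed above rather than something queried from the OS.

```cpp
#include <cstddef>
#include <cstdint>

// Number of virtual memory pages touched by a block of `length` bytes
// starting at virtual address `start`, for a given page size.
std::size_t pages_spanned(std::uintptr_t start, std::size_t length,
                          std::size_t page_size) {
    if (length == 0) return 0;
    std::uintptr_t first_page = start / page_size;
    std::uintptr_t last_page = (start + length - 1) / page_size;
    return static_cast<std::size_t>(last_page - first_page + 1);
}
```

For a real array you could pass `reinterpret_cast<std::uintptr_t>(buffer)` as `start`. With 4096-byte pages, any 4097-byte block touches two pages no matter where it starts, and even a 2-byte block can straddle a page boundary.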