Memory layout of .NET arrays
What is the memory layout of a .NET array?
Take for instance this array:
Int32[] x = new Int32[10];
I understand that the bulk of the array is like this:
0000111122223333444455556666777788889999
Here each character is one byte, and the digits correspond to indices into the array.
Additionally, I know that there is a type reference, and a syncblock-index for all objects, so the above can be adjusted to this:
ttttssss0000111122223333444455556666777788889999
^
+- object reference points here
Additionally, the length of the array needs to be stored, so perhaps this is more correct:
ttttssssllll0000111122223333444455556666777788889999
^
+- object reference points here
Is this complete? Is there more data in an array?
The reason I'm asking is that we're trying to estimate how much memory a couple of different in-memory representations of a rather large data corpus will take. The size of the arrays varies quite a bit, so the per-array overhead might have a large impact in one solution but not so much in the other.
So my question is basically this: how much overhead is there for an array?
And before the "arrays are bad" squad wakes up: this part of the solution is a static, build-once-reference-often type of thing, so growable lists are not necessary here.
Comments (4)
One way to examine this is to look at the code in WinDbg. So given the code below, let's see how that appears on the heap.
The first thing to do is to locate the instance. As I have made this a local in Main(), it is easy to find the address of the instance.
From the address we can dump the actual instance, which gives us:
This tells us that it is our Int32 array with 10 elements and a total size of 52 bytes.
Let's dump the memory where the instance is located.
I have inserted brackets for the 52 bytes.
Edit: Forgot length in first posting.
The listing is slightly incorrect: as romkyns points out, the instance actually begins at the address minus 4, and the first field is the sync block.
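To connect those 52 bytes back to the question, here is a rough sketch of the arithmetic. It assumes the 32-bit layout seen in the dump: a 4-byte sync block, a 4-byte type reference and a 4-byte length, followed by the elements. The names ArraySizeEstimate and EstimateBytes are made up for illustration, and the rounding to a multiple of 4 bytes is my assumption rather than something the dump shows. A 64-bit process will use larger fields, so treat the constants as placeholders.

```csharp
using System;

static class ArraySizeEstimate
{
    // Assumed 32-bit layout, taken from the dump discussed above.
    const int SyncBlockBytes = 4;      // sits at (object address - 4)
    const int TypeReferenceBytes = 4;  // the object reference points here
    const int LengthBytes = 4;         // number of elements

    public const int HeaderBytes = SyncBlockBytes + TypeReferenceBytes + LengthBytes;

    // Hypothetical helper: estimated heap size of a one-dimensional array.
    public static long EstimateBytes(int elementCount, int elementSize)
    {
        long raw = HeaderBytes + (long)elementCount * elementSize;
        // Assumption: the object size is rounded up to a multiple of 4 bytes.
        return (raw + 3) / 4 * 4;
    }

    static void Main()
    {
        // Int32[10] from the question: 12 bytes of header + 40 bytes of data.
        Console.WriteLine(EstimateBytes(10, sizeof(int)));
    }
}
```

For the Int32[10] from the question this prints 52, matching the size reported above, so the fixed overhead in this layout is 12 bytes per array.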
Great question! I wanted to see it for myself, and it seemed a good opportunity to try out CorDbg.exe...
It seems that for simple integer arrays, the format is:
where s is the sync block, l is the length of the array, and the individual elements follow. There seems to be a final 0 at the end; I'm not sure why.
For multidimensional arrays:
where s is the sync block, t is the total number of elements, l1 is the length of the first dimension, l2 is the length of the second dimension, then come what look like two zeroes, followed by all the elements sequentially, and finally a zero again.
Object arrays are laid out like the integer array, except that the contents are references. Jagged arrays are object arrays where the references point to other arrays.
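The values described for the multidimensional case can also be read through the public Array API. The sketch below is only an illustration: the ArrayShapeDemo and Describe names are hypothetical, and the idea that the two zeroes might be the per-dimension lower bounds is a guess on my part, not something the dump confirms.

```csharp
using System;
using System.Collections.Generic;

static class ArrayShapeDemo
{
    // Hypothetical helper: collects total length, per-dimension lengths,
    // then per-dimension lower bounds, in that order.
    public static int[] Describe(Array array)
    {
        var fields = new List<int> { array.Length };
        for (int d = 0; d < array.Rank; d++)
            fields.Add(array.GetLength(d));
        for (int d = 0; d < array.Rank; d++)
            fields.Add(array.GetLowerBound(d));
        return fields.ToArray();
    }

    static void Main()
    {
        // A 2 x 3 rectangular array with the default lower bounds of 0.
        Console.WriteLine(string.Join(" ", Describe(new int[2, 3])));

        // Array.CreateInstance accepts explicit lower bounds per dimension.
        Array shifted = Array.CreateInstance(typeof(int), new[] { 2, 3 }, new[] { 5, 7 });
        Console.WriteLine(string.Join(" ", Describe(shifted)));
    }
}
```

For the new int[2, 3] this prints 6 2 3 0 0, which has the same shape as the t, l1, l2, 0, 0 sequence above. The second array reports lower bounds of 5 and 7, so the bounds are per-array data that has to be stored somewhere.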
Great question. I found this article, which contains block diagrams for both value types and reference types. Also see this article, in which Richter states:
An array object would have to store how many dimensions it has and the length of each dimension. So there is at least one more data element to add to your model.