Memory layout of a .NET array

Posted 2024-07-11 23:40:29


What is the memory layout of a .NET array?

Take for instance this array:

Int32[] x = new Int32[10];

I understand that the bulk of the array looks like this:

0000111122223333444455556666777788889999

where each character is one byte and the digits correspond to indices into the array.

I also know that every object has a type reference and a sync block index, so the above can be adjusted to this:

ttttssss0000111122223333444455556666777788889999
        ^
        +- object reference points here

The length of the array also needs to be stored, so perhaps this is more correct:

ttttssssllll0000111122223333444455556666777788889999
        ^
        +- object reference points here

Is this complete? Is there more data in an array?

I'm asking because we are trying to estimate how much memory a couple of different in-memory representations of a rather large data corpus will take. The array sizes vary quite a bit, so the overhead might have a large impact on one solution but much less on the other.

So, basically, my question is: how much overhead does an array have?

And before the "arrays are bad" squad wakes up: this part of the solution is a static, build-once, reference-often kind of thing, so growable lists are not necessary here.
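Besides reading the layout in a debugger, the per-array overhead can also be estimated empirically on the target runtime. The sketch below is one way to do that, assuming .NET Core 3.0 or later (where GC.GetAllocatedBytesForCurrentThread exists); ArrayAllocationProbe and its methods are hypothetical names. It subtracts the raw element bytes from the bytes the GC reports for a single allocation, so the result should approximate header overhead plus any alignment padding.

```csharp
using System;

// Hypothetical helper: measures how many bytes the GC reports for one new Int32[length].
// Assumes .NET Core 3.0+ (GC.GetAllocatedBytesForCurrentThread).
public static class ArrayAllocationProbe
{
    // Storing the array in a static field makes it escape, so the JIT cannot
    // eliminate the allocation or place it on the stack.
    private static object? s_sink;

    public static long MeasureInt32Array(int length)
    {
        // Warm-up allocation so one-time effects are not counted in the measured run.
        s_sink = new int[length];

        long before = GC.GetAllocatedBytesForCurrentThread();
        s_sink = new int[length];
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }

    // Reported bytes minus the bytes the elements themselves need.
    public static long OverheadInt32Array(int length) =>
        MeasureInt32Array(length) - (long)length * sizeof(int);
}
```

Calling OverheadInt32Array with a few lengths (say 0, 10 and 1000) in the actual process gives overhead figures that can be plugged into a size estimate; the numbers may differ between 32-bit and 64-bit processes and between runtime versions.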


Comments (4)

允世 2024-07-18 23:40:29


One way to examine this is to look at the instance in WinDbg. Given the code below, let's see how the array appears on the heap.

var numbers = new Int32[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

The first thing to do is locate the instance. Since I made it a local in Main(), it is easy to find its address.

From the address we can dump the actual instance, which gives us:

0:000> !do 0x0141ffc0
Name: System.Int32[]
MethodTable: 01309584
EEClass: 01309510
Size: 52(0x34) bytes
Array: Rank 1, Number of elements 10, Type Int32
Element Type: System.Int32
Fields:
None

This tells us that it is our Int32 array with 10 elements and a total size of 52 bytes.

Let's dump the memory where the instance is located.

0:000> d 0x0141ffc0
0141ffc0 [84 95 30 01 0a 00 00 00-00 00 00 00 01 00 00 00  ..0.............
0141ffd0  02 00 00 00 03 00 00 00-04 00 00 00 05 00 00 00  ................
0141ffe0  06 00 00 00 07 00 00 00-08 00 00 00 09 00 00 00  ................
0141fff0  00 00 00 00]a0 20 40 03-00 00 00 00 00 00 00 00  ..... @.........
01420000  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
01420010  10 6d 99 00 00 00 00 00-00 00 01 40 50 f7 3d 03  .m.........@P.=.
01420020  03 00 00 00 08 00 00 00-00 01 00 00 00 00 00 00  ................
01420030  1c 24 40 03 00 00 00 00-00 00 00 00 00 00 00 00  .$@.............

I have inserted brackets for the 52 bytes.

  • The first four bytes are the reference to the method table at 01309584.
  • Then four bytes for the Length of the array.
  • Following that are the numbers 0 to 9 (each four bytes).
  • The last four bytes are zero. I'm not entirely sure, but I guess that must be where the reference to the syncblock array is stored if the instance is used for locking.
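The bullet points above can be checked by decoding the bracketed bytes by hand. This sketch copies the first 48 bytes of the dump (method-table pointer, length, ten elements) and reads them as little-endian 32-bit values, which is how x86 stores them. DumpDecoder is a hypothetical name; the reading is done with System.Buffers.Binary.BinaryPrimitives (.NET Core 2.1 or later).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: decodes the bytes starting at 0x0141ffc0 in the WinDbg dump.
public static class DumpDecoder
{
    // Copied from the dump; the four trailing zero bytes inside the brackets are left out.
    public static readonly byte[] Bytes =
    {
        0x84, 0x95, 0x30, 0x01, 0x0a, 0x00, 0x00, 0x00, // method table pointer, length
        0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, // elements 0, 1
        0x02, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, // elements 2, 3
        0x04, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, // elements 4, 5
        0x06, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, // elements 6, 7
        0x08, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, // elements 8, 9
    };

    public static (uint MethodTable, int Length, int[] Elements) Decode(ReadOnlySpan<byte> dump)
    {
        // Offset 0: method-table pointer (a 32-bit pointer in this 32-bit process).
        uint methodTable = BinaryPrimitives.ReadUInt32LittleEndian(dump);
        // Offset 4: number of elements.
        int length = BinaryPrimitives.ReadInt32LittleEndian(dump.Slice(4));
        // Offset 8 onward: the Int32 elements, four bytes each.
        var elements = new int[length];
        for (int i = 0; i < length; i++)
            elements[i] = BinaryPrimitives.ReadInt32LittleEndian(dump.Slice(8 + 4 * i));
        return (methodTable, length, elements);
    }
}
```

Decode(DumpDecoder.Bytes) yields the method table 0x01309584, a length of 10 and the elements 0 through 9, which matches the !do output.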

Edit: I forgot the length in the first version of this post.

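The listing is slightly incorrect: as romkyns points out, the instance actually begins at the address minus 4, and its first field is the sync block index. The 52 bytes are therefore 4 (sync block index) + 4 (method table) + 4 (length) + 40 (ten Int32 elements), and the four zero bytes at the end of the bracketed range most likely belong to the next object on the heap, whose method-table pointer (a0 20 40 03) follows immediately.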
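The corrected 32-bit layout suggests a rough estimator for single-dimensional arrays of value types. In the sketch below, the 32-bit figures follow the WinDbg result; the 64-bit figures are an assumption based on the commonly described CLR layout (8-byte header, 8-byte method-table pointer, 4-byte length padded to 8 bytes), not something shown in this dump. ArraySizeEstimator is a hypothetical name, and reference-type or multidimensional arrays may carry extra fields that it ignores.

```csharp
using System;

// Hypothetical estimator for single-dimensional arrays of value types (such as Int32[]).
// Assumed layout:
//   32-bit: 4-byte sync block index + 4-byte method table pointer + 4-byte length + elements
//   64-bit: 8-byte header (sync block index + padding) + 8-byte method table pointer
//           + 4-byte length + 4 bytes padding + elements
// The total is rounded up to the pointer size.
public static class ArraySizeEstimator
{
    public static long HeaderBytes(bool is64Bit) => is64Bit ? 24 : 12;

    public static long EstimateBytes(int elementSize, long count, bool is64Bit)
    {
        long alignment = is64Bit ? 8 : 4;
        long raw = HeaderBytes(is64Bit) + elementSize * count;
        // Round up to the next multiple of the alignment.
        return (raw + alignment - 1) / alignment * alignment;
    }
}
```

EstimateBytes(4, 10, false) returns 52, matching the WinDbg result, and EstimateBytes(4, 10, true) returns 64 under the assumed 64-bit layout. The fixed header cost (12 or 24 bytes) plus up to one alignment unit of padding is what matters when comparing representations with many small arrays.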

九公里浅绿 2024-07-18 23:40:29


Great question! I wanted to see it for myself, and it seemed a good opportunity to try out CorDbg.exe...

It seems that for simple integer arrays, the format is:

ssssllll000011112222....nnnn0000

where s is the sync block, l is the length of the array, followed by the individual elements. There seems to be a final 0 at the end; I'm not sure why.
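The length field that sits just in front of the elements can also be peeked at from C# without a debugger. This sketch relies on an undocumented implementation detail: the length is assumed to be stored IntPtr.Size bytes before element 0 (a 4-byte length on 32-bit, a 4-byte length plus 4 bytes of padding on 64-bit). It needs .NET 5 or later for MemoryMarshal.GetArrayDataReference, and LengthPeek is a hypothetical name. It is meant for exploration only, not for production code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper that reads the runtime's stored length field directly.
// ASSUMPTION: the length lives IntPtr.Size bytes before the first element.
public static class LengthPeek
{
    public static int ReadStoredLength(int[] array)
    {
        // Managed reference to element 0 (valid even for an empty array).
        ref int first = ref MemoryMarshal.GetArrayDataReference(array);
        // Reinterpret as a byte reference so we can move by a byte offset.
        ref byte firstByte = ref Unsafe.As<int, byte>(ref first);
        // Step back to where the length field is assumed to be.
        ref byte lengthField = ref Unsafe.SubtractByteOffset(ref firstByte, (IntPtr)IntPtr.Size);
        return Unsafe.ReadUnaligned<int>(ref lengthField);
    }
}
```

If the assumption holds, ReadStoredLength(new int[10]) returns the same value as the Length property.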

For multidimensional arrays:

ssssttttl1l1l2l2????????
    000011112222....nnnn000011112222....nnnn....000011112222....nnnn0000

where s is the sync block, t is the total number of elements, l1 is the length of the first dimension, l2 is the length of the second dimension, then two zeroes (possibly the lower bound of each dimension), followed by all the elements in sequence, and finally a zero again.
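The same pieces of information can be read back from managed code for a rectangular array. In this sketch (MultiDimInfo is a hypothetical name), Array.Length gives the total element count (t), GetLength gives the per-dimension lengths (l1, l2), and GetLowerBound gives the lower bounds, which presumably correspond to the two zero values in the dump; that mapping to memory is an assumption.

```csharp
using System;

// Hypothetical helper: collects total count, per-dimension lengths and lower bounds.
public static class MultiDimInfo
{
    public static (int Total, int[] Lengths, int[] LowerBounds) Describe(Array array)
    {
        var lengths = new int[array.Rank];
        var lowerBounds = new int[array.Rank];
        for (int d = 0; d < array.Rank; d++)
        {
            lengths[d] = array.GetLength(d);         // l1, l2, ...
            lowerBounds[d] = array.GetLowerBound(d); // 0 for arrays created with new
        }
        return (array.Length, lengths, lowerBounds); // Length is the total element count
    }
}
```

For new int[3, 4], Describe reports a total of 12, lengths 3 and 4, and lower bounds 0 and 0.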

Object arrays are laid out like the integer array, except that the contents are references. Jagged arrays are object arrays whose references point to other arrays.
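Because every row of a jagged array is a separate array, the per-array overhead is paid once per row. The back-of-the-envelope sketch below compares a jagged int[rows][cols] with a single flat int[rows * cols]. It assumes a 64-bit runtime with a 24-byte array header, 8-byte alignment and 8-byte references, and it ignores any extra field some runtimes may add to arrays of references; JaggedVsFlat is a hypothetical name.

```csharp
using System;

// Hypothetical size comparison under assumed 64-bit layout figures (not guaranteed by any spec).
public static class JaggedVsFlat
{
    const long Header = 24, Align = 8, RefSize = 8;

    // Header + elements, rounded up to the alignment.
    static long ArrayBytes(long elementSize, long count) =>
        (Header + elementSize * count + Align - 1) / Align * Align;

    public static long Jagged(long rows, long cols) =>
        ArrayBytes(RefSize, rows)                 // outer array holding one reference per row
        + rows * ArrayBytes(sizeof(int), cols);   // one inner Int32[] per row

    public static long Flat(long rows, long cols) =>
        ArrayBytes(sizeof(int), rows * cols);     // a single Int32[] holding everything
}
```

With 1000 rows of 4 ints, Jagged estimates 48,024 bytes and Flat 16,024 bytes, so for many short rows the headers and the outer reference array dominate.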

迷路的信 2024-07-18 23:40:29


Great question. I found this article, which contains block diagrams for both value types and reference types. Also see this article, in which Richter states:

[snip] each array has some additional overhead information associated with it. This information contains the rank of the array (number of dimensions), the lower bounds for each dimension of the array (almost always 0), and the length of each dimension. The overhead also contains the type of each element in the array.
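The lower bounds in that quote are not only theoretical. Array.CreateInstance accepts explicit lower bounds, so the runtime has to record them somewhere. A minimal sketch follows; LowerBoundDemo is a hypothetical name, and non-zero lower bounds may not be supported on every platform.

```csharp
using System;

// Hypothetical helper: creates a one-dimensional Int32 array indexed from 1 instead of 0.
public static class LowerBoundDemo
{
    public static Array CreateOneBased(int length) =>
        // lengths = { length }, lowerBounds = { 1 }
        Array.CreateInstance(typeof(int), new[] { length }, new[] { 1 });
}
```

For CreateOneBased(5), GetLowerBound(0) is 1 and GetUpperBound(0) is 5, and the result is not an int[], because int[] is always indexed from 0.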

不知在何时 2024-07-18 23:40:29


An array object would have to store how many dimensions it has and the length of each dimension. So there is at least one more data element to add to your model.
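Both the number of dimensions and the length of each dimension can be read back from any array instance. In this sketch (RankInfo is a hypothetical name), the rank also shows up in the type name, because int[] and int[,] are distinct types. Whether a given runtime stores the rank in each object or only with the type is an implementation detail that this code cannot reveal.

```csharp
using System;

// Hypothetical helper: reports the array's type name, rank and per-dimension lengths.
public static class RankInfo
{
    public static string Describe(Array array)
    {
        var parts = new string[array.Rank];
        for (int d = 0; d < array.Rank; d++)
            parts[d] = array.GetLength(d).ToString();
        // GetType().Name encodes the rank, e.g. "Int32[]" versus "Int32[,]".
        return $"{array.GetType().Name}: rank {array.Rank}, lengths [{string.Join(", ", parts)}]";
    }
}
```

Describe(new int[2, 3]) returns "Int32[,]: rank 2, lengths [2, 3]". Type.GetArrayRank gives the rank from the type alone, with no instance at all.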
