Overhead of a .NET array?
I was trying to determine the overhead of the header on a .NET array (in a 32-bit process) using this code:
long bytes1 = GC.GetTotalMemory(false);
object[] array = new object[10000];
for (int i = 0; i < 10000; i++)
    array[i] = new int[1];
long bytes2 = GC.GetTotalMemory(false);
array[0] = null; // ensure no garbage collection before this point
Console.WriteLine(bytes2 - bytes1);
// Calculate array overhead in bytes by subtracting the size of
// the array elements (40000 for object[10000] and 4 for each
// array), and dividing by the number of arrays (10001)
Console.WriteLine("Array overhead: {0:0.000}",
    ((double)(bytes2 - bytes1) - 40000) / 10001 - 4);
Console.Write("Press any key to continue...");
Console.ReadKey();
The result was
204800
Array overhead: 12.478
In a 32-bit process, object[1] should be the same size as int[1], but in fact the overhead jumps by 3.28 bytes to
237568
Array overhead: 15.755
Anyone know why?
(By the way, if anyone's curious, the overhead for non-array objects, e.g. (object)i in the loop above, is about 8 bytes (8.384). I heard it's 16 bytes in 64-bit processes.)
Here's a slightly neater (IMO) short but complete program to demonstrate the same thing:
But I get the same results - the overhead for any reference type array is 16 bytes, whereas the overhead for any value type array is 12 bytes. I'm still trying to work out why that is, with the help of the CLI spec. Don't forget that reference type arrays are covariant, which may be relevant...
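A minimal sketch of this kind of measurement follows. It is not the listing originally posted here: the names OverheadProbe and BytesPerArray are made up for illustration, and the numbers it prints depend on the runtime and on whether the process is 32-bit or 64-bit.

```csharp
using System;

static class OverheadProbe
{
    const int Count = 10000;

    // Average heap growth per allocated object, payload included.
    // The holder array is created before the first reading so that
    // its own size is not counted.
    public static double BytesPerArray(Func<object> factory)
    {
        object[] holder = new object[Count];
        long before = GC.GetTotalMemory(true);   // true = force a collection first
        for (int i = 0; i < Count; i++)
            holder[i] = factory();
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(holder);                    // keep everything alive until measured
        return (double)(after - before) / Count;
    }

    static void Main()
    {
        Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);
        Console.WriteLine("int[1]:    {0:0.00} bytes each", BytesPerArray(() => new int[1]));
        Console.WriteLine("object[1]: {0:0.00} bytes each", BytesPerArray(() => new object[1]));
        Console.WriteLine("string[1]: {0:0.00} bytes each", BytesPerArray(() => new string[1]));
    }
}
```

Each figure includes the single element (4 bytes for int, one pointer for a reference), so subtract that to get the header overhead.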
EDIT: With the help of cordbg, I can confirm Brian's answer - the type pointer of a reference-type array is the same regardless of the actual element type. Presumably there's some funkiness in object.GetType() (which is non-virtual, remember) to account for this.
So, with code of:
We end up with something like the following:
Note that I've dumped the memory 1 word before the value of the variable itself.
For x and y, the values are:
For z, the values are:
Different value type arrays (byte[], int[] etc) end up with different type pointers, whereas all reference type arrays use the same type pointer, but have a different element type pointer. The element type pointer is the same value as you'd find as the type pointer for an object of that type. So if we looked at a string object's memory in the above run, it would have a type pointer of 0x00329134.
The word before the type pointer certainly has something to do with either the monitor or the hash code: calling GetHashCode() populates that bit of memory, and I believe the default object.GetHashCode() obtains a sync block to ensure hash code uniqueness for the lifetime of the object. However, just doing lock(x){} didn't do anything, which surprised me...
All of this is only valid for "vector" types, by the way - in the CLR, a "vector" type is a single-dimensional array with a lower-bound of 0. Other arrays will have a different layout - for one thing, they'd need the lower bound stored...
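To make the vector distinction concrete, here is a small sketch that builds a one-dimensional array with a non-zero lower bound through Array.CreateInstance. The class and method names (VectorSketch, MakeOffsetArray) are made up for illustration. It only shows that such an array is a different type from int[]; it does not inspect the memory layout.

```csharp
using System;

static class VectorSketch
{
    // One-dimensional int array whose first valid index is lowerBound.
    public static Array MakeOffsetArray(int length, int lowerBound)
    {
        return Array.CreateInstance(typeof(int), new[] { length }, new[] { lowerBound });
    }

    static void Main()
    {
        int[] vector = new int[3];            // a "vector": one dimension, lower bound 0
        Array offset = MakeOffsetArray(3, 5); // one dimension, lower bound 5

        Console.WriteLine(vector.GetLowerBound(0)); // 0
        Console.WriteLine(offset.GetLowerBound(0)); // 5
        Console.WriteLine(offset is int[]);         // False: not a vector type
    }
}
```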
So far this has been experimentation, but here's the guesswork - the reason for the system being implemented the way it has. From here on, I really am just guessing.
object[] arrays can share the same JIT code. They're going to behave the same way in terms of memory allocation, array access, Length property and (importantly) the layout of references for the GC. Compare that with value type arrays, where different value types may have different GC "footprints" (e.g. one might have a byte and then a reference, others will have no references at all, etc).
Every time you assign a value within an object[] the runtime needs to check that it's valid. It needs to check that the type of the object whose reference you're using for the new element value is compatible with the element type of the array. For instance:
This is the covariance I mentioned earlier. Now given that this is going to happen for every single assignment, it makes sense to reduce the number of indirections. In particular, I suspect you don't really want to blow the cache by having to go to the type object for each assignment to get the element type. I suspect (and my x86 assembly isn't good enough to verify this) that the test is something like:
If we can terminate the search in the first three steps, there's not a lot of indirection - which is good for something that's going to happen as often as array assignments. None of this needs to happen for value type assignments, because that's statically verifiable.
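The covariance at issue, and one guess at the shape of such a store check, can be sketched as below. StoreIsValid is a hypothetical model that uses System.Type objects in place of raw type pointers. It is not how the CLR implements the check, and the step order is an assumption. StoreThrows exercises the real runtime behaviour.

```csharp
using System;

static class CovarianceSketch
{
    // Hypothetical model of the per-assignment check (an assumption, not CLR code).
    public static bool StoreIsValid(Type elementType, object value)
    {
        if (value == null) return true;                 // 1. null fits any reference array
        Type valueType = value.GetType();               // 2. fetch the object's type
        if (valueType == elementType) return true;      // 3. cheap exact-match test
        return elementType.IsAssignableFrom(valueType); // 4. slower compatibility test
    }

    // Real behaviour: reports whether the runtime rejects the store.
    public static bool StoreThrows(object[] array, object value)
    {
        try { array[0] = value; return false; }
        catch (ArrayTypeMismatchException) { return true; }
    }

    static void Main()
    {
        object[] x = new string[1]; // legal: reference type arrays are covariant
        Console.WriteLine(StoreThrows(x, "hello"));                    // False
        Console.WriteLine(StoreThrows(x, new object()));               // True
        Console.WriteLine(StoreIsValid(typeof(string), new object())); // False
    }
}
```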
So, that's why I believe reference type arrays are slightly bigger than value type arrays.
Great question - really interesting to delve into it :)
Array is a reference type. All reference types carry two additional word-sized fields: the type reference and a SyncBlock index field, which among other things is used to implement locks in the CLR. So the type overhead on reference types is 8 bytes on 32 bit. On top of that the array itself also stores its length, which is another 4 bytes. This brings the total overhead to 12 bytes.
And as I just learned from Jon Skeet's answer, arrays of reference types have an additional 4 bytes of overhead. This can be confirmed using WinDbg. It turns out that the additional word is another type reference, for the type stored in the array. All arrays of reference types are stored internally as object[], with the additional reference to the type object of the actual type. So a string[] is really just an object[] with an additional type reference to the type string. For details please see below.
Values stored in arrays: Arrays of reference types hold references to objects, so each entry in the array is the size of a reference (i.e. 4 bytes on 32 bit). Arrays of value types store the values inline and thus each element will take up the size of the type in question.
This question may also be of interest: C# List<double> size vs double[] size
Gory Details
Consider the following code
Attaching WinDbg shows the following:
First let's take a look at the value type array.
First we dump the array and the one element with value of 42. As can be seen the size is 16 bytes. That is 4 bytes for the int32 value itself, 8 bytes for regular reference type overhead and another 4 bytes for the length of the array.
The raw dump shows the SyncBlock, the method table for int[], the length, and the value of 42 (2a in hex). Notice that the SyncBlock is located just in front of the object reference.
Next, let's look at the string[] to find out what the additional word is used for.
First we dump the array and the string. Next we dump the size of the string[]. Notice that WinDbg lists the type as System.Object[] here. The object size in this case includes the string itself, so the total size is the 20 from the array plus the 40 for the string.
By dumping the raw bytes of the instance we can see the following: First we have the SyncBlock, then follows the method table for object[], then the length of the array. After that we find the additional 4 bytes with the reference to the method table for string. This can be verified by the dumpmt command as shown above. Finally we find the single reference to the actual string instance.
In conclusion
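The overhead for arrays can be broken down as follows (on 32 bit that is):
- 4 bytes for the SyncBlock
- 4 bytes for the method table (type reference) of the array itself
- 4 bytes for the length of the array
- arrays of reference types add another 4 bytes to hold the method table of the actual element type (reference type arrays are object[] under the hood)
I.e. the overhead is 12 bytes for value type arrays and 16 bytes for reference type arrays.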
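The breakdown above can be written out as arithmetic. This is only a sketch of the 32-bit figures quoted in this answer: ArrayLayout32 and its members are made-up names, and alignment padding is ignored.

```csharp
using System;

static class ArrayLayout32
{
    // Sizes in bytes on a 32-bit CLR, as described in the answer.
    const int SyncBlock = 4;
    const int MethodTable = 4;
    const int Length = 4;
    const int ElementTypePointer = 4; // only present for reference type arrays

    public static int OverheadBytes(bool referenceElements)
    {
        int overhead = SyncBlock + MethodTable + Length;
        return referenceElements ? overhead + ElementTypePointer : overhead;
    }

    // Header overhead plus the elements themselves (padding not modelled).
    public static int TotalBytes(int length, int elementSize, bool referenceElements)
    {
        return OverheadBytes(referenceElements) + length * elementSize;
    }

    static void Main()
    {
        Console.WriteLine(TotalBytes(1, 4, false)); // int[1]:    16
        Console.WriteLine(TotalBytes(1, 4, true));  // string[1]: 20
    }
}
```

These match the 16 bytes reported for the int[] and the 20 bytes reported for the string[] in the dumps described above.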
I think you are making some faulty assumptions while measuring, as the memory allocation (via GetTotalMemory) during your loop may be different than the actual required memory for just the arrays - the memory may be allocated in larger blocks, there may be other objects in memory that are reclaimed during the loop, etc.
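One way to sidestep some of these effects is to count the bytes allocated by the current thread instead of the size of the whole heap. The sketch below assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (.NET Core 3.0 or later, or .NET Framework 4.8). AllocationProbe and BytesAllocatedBy are made-up names, and the figure still includes any padding the allocator adds.

```csharp
using System;

static class AllocationProbe
{
    // Bytes allocated on this thread while action runs.
    public static long BytesAllocatedBy(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }

    static void Main()
    {
        const int count = 10000;
        object[] holder = new object[count]; // allocated outside the measured region

        long bytes = BytesAllocatedBy(() =>
        {
            for (int i = 0; i < count; i++)
                holder[i] = new int[1];
        });

        Console.WriteLine("{0:0.00} bytes per int[1]", (double)bytes / count);
        GC.KeepAlive(holder);
    }
}
```

Because this counts allocations and not live heap size, a collection during the loop does not change the result.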
Here's some info for you on array overhead:
Because heap management (which is what you are measuring with GetTotalMemory) can only allocate rather large blocks, which the CLR then hands out in smaller chunks to the program.