In C#, I need to write a T[] to a stream, ideally without any additional buffers. I have dynamically generated code that converts a T[] (where T is a struct containing no object references) to a void* and pins it in memory, and that works great. When the stream was a file, I could use the native Windows API and pass the void* directly, but now I need to write to a generic Stream object, which takes a byte[].
Question: Can anyone suggest a hack to create a dummy array object that does not allocate anything on the heap, but instead points to an already existing (and pinned) heap location?
This is the pseudo-code that I need:
    void Write(Stream stream, T[] buffer)
    {
        fixed( void* ptr = &buffer )  // done with dynamic code generation
        {
            int typeSize = sizeof(T);  // done as well
            byte[] dummy = (byte[]) ptr;  // <-- how do I create this fake array?
            stream.Write( dummy, 0, buffer.Length*typeSize );
        }
    }
Update:
I described how to do fixed(void* ptr=&buffer) in depth in this article. I could always create a byte[], pin it in memory, do an unsafe byte copy from one pointer to the other, and then send that array to the stream, but I was hoping to avoid the extra allocation and copying.
Impossible?
On further thought, a byte[] carries metadata on the heap that records the array dimensions and the element type. Simply passing a reference (pointer) to the T[] as a byte[] might not work, because the block's metadata would still be that of T[]. And even if the metadata has an identical structure, the length recorded for the T[] is much smaller than the byte count of the equivalent byte[], so any subsequent access to the byte[] by managed code would produce incorrect results.
Feature requested @ Microsoft Connect
Please vote for this request; hopefully MS will listen.
Comments (4)
This kind of code can never work in a generic way. It relies on the hard assumption that the memory layout of T is predictable and consistent, which is only true if T is a simple value type (ignoring endianness for the moment). You are dead in the water if T is a reference type: you would be copying tracking handles that can never be deserialized. At a minimum, you have to give T the struct constraint.
But that's not enough, because structure types are not copyable either, even when they have no reference type fields, which is something you can't express as a constraint. The internal layout is determined by the JIT compiler. It swaps fields at its leisure, selecting a layout in which the fields are properly aligned and the structure value takes the minimum storage size. The values you serialize can only be read back properly by a program running on the exact same CPU architecture and JIT compiler version.
There are already plenty of classes in the framework that do what you are doing. The closest match is the .NET 4.0 MemoryMappedViewAccessor class. It needs to do the same job: making raw bytes available in the memory-mapped file. The workhorse there is the System.Runtime.InteropServices.SafeBuffer class; have a look at it with Reflector. Unfortunately, you can't just copy the class, because it relies on the CLR to make the transformation. Then again, it is only another week before it's available.
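One way to narrow the layout concern is to declare the layout explicitly instead of leaving it to defaults. The following is only a sketch: the struct and class names are made up for illustration, and it does nothing about endianness. Marshal.SizeOf reports the marshaled size, which for simple structs like these reflects the declared packing.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example types, not from the original post.
[StructLayout(LayoutKind.Sequential)]
struct DefaultPacked
{
    public byte Tag;   // followed by padding so that Value is 4-byte aligned
    public int Value;
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct TightlyPacked
{
    public byte Tag;   // no padding: Pack = 1 removes alignment gaps
    public int Value;
}

static class LayoutDemo
{
    // Marshaled size with default packing (padding included).
    public static int SizeOfDefault()
    {
        return Marshal.SizeOf(typeof(DefaultPacked));
    }

    // Marshaled size with Pack = 1 (no padding).
    public static int SizeOfTight()
    {
        return Marshal.SizeOf(typeof(TightlyPacked));
    }
}
```

Pinning the layout this way makes the byte image easier to reason about, but a reader on a different architecture still has to agree on byte order.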
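As a rough illustration of what MemoryMappedViewAccessor offers, here is a minimal sketch that round-trips an int[] through an anonymous memory-mapped file using WriteArray and ReadArray. The helper name RoundTrip is hypothetical, and the example assumes .NET 4.0 or later.

```csharp
using System;
using System.IO.MemoryMappedFiles;

static class MappedArrayDemo
{
    // Hypothetical helper: writes a struct array into an unnamed
    // memory-mapped file and reads it back into a new array.
    public static int[] RoundTrip(int[] data)
    {
        // Capacity must be greater than zero, even for an empty array.
        long capacity = Math.Max(1, data.Length) * sizeof(int);

        using (MemoryMappedFile mmf = MemoryMappedFile.CreateNew(null, capacity))
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
        {
            // WriteArray<T> copies the elements' raw bytes into the view.
            accessor.WriteArray(0, data, 0, data.Length);

            int[] copy = new int[data.Length];
            accessor.ReadArray(0, copy, 0, copy.Length);
            return copy;
        }
    }
}
```

This only covers memory-mapped files, so it does not solve the general Stream case from the question.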
Because stream.Write cannot take a pointer, you cannot avoid copying memory, so you will see some slowdown. You might want to consider using BinaryReader and BinaryWriter to serialize your objects, but here is code that will let you do what you want. Keep in mind that all members of T must also be structs.
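The copying approach described here might look roughly like the following. This is a minimal sketch rather than the answerer's own code: the class and method names are hypothetical, and it assumes T is a blittable struct (GCHandle.Alloc with GCHandleType.Pinned throws an ArgumentException otherwise).

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class StructArrayWriter
{
    // Hypothetical helper: pins the T[], copies its raw bytes into a
    // temporary byte[], and writes that byte[] to the stream.
    public static void Write<T>(Stream stream, T[] buffer) where T : struct
    {
        // For blittable structs the marshaled size matches the in-memory size.
        int byteCount = buffer.Length * Marshal.SizeOf(typeof(T));
        byte[] bytes = new byte[byteCount];

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // The one extra copy that this approach cannot avoid.
            Marshal.Copy(handle.AddrOfPinnedObject(), bytes, 0, byteCount);
        }
        finally
        {
            handle.Free();
        }

        stream.Write(bytes, 0, byteCount);
    }
}
```

For large arrays, the temporary byte[] could be a fixed-size chunk reused across several Write calls, which bounds the extra allocation but not the copying.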
Check out my answer to a related question:
What is the fastest way to convert a float[] to a byte[]?
In it, I temporarily transform an array of floats into an array of bytes without any memory allocation or copying.
To do this, I changed the CLR's metadata for the array using memory manipulation.
Unfortunately, this solution does not lend itself well to generics. However, you can combine this hack with code generation techniques to solve your problem.
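On runtimes newer than the ones discussed in this thread (.NET Core 2.1 and later), a supported way to view a T[] as bytes without touching array metadata is MemoryMarshal.AsBytes together with the Stream.Write(ReadOnlySpan<byte>) overload. This is a different technique from the metadata hack above, shown only as a sketch; the class name is hypothetical, and the unmanaged constraint requires C# 7.3 or later.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class SpanWriter
{
    // Hypothetical helper: reinterprets the T[] as a read-only span of
    // bytes and hands it to the stream without allocating a byte[].
    public static void Write<T>(Stream stream, T[] buffer) where T : unmanaged
    {
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(new ReadOnlySpan<T>(buffer));
        stream.Write(bytes);
    }
}
```

Streams that do not override the span-based overload may still copy through an internal buffer, so whether a copy is truly avoided depends on the concrete Stream type.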
Look at the article "Inline MSIL in C#/VB.NET and Generic Pointers"; it is the best way to get your dream code :)