What is the most efficient way to marshal C++ structs to C#?
I am about to begin reading tons of binary files, each with 1000 or more records. New files are added constantly, so I'm writing a Windows service to monitor the directories and process new files as they are received. The files were created with a C++ program. I've recreated the struct definitions in C# and can read the data fine, but I'm concerned that the way I'm doing it will eventually kill my application.
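using (BinaryReader br = new BinaryReader(File.Open("myfile.bin", FileMode.Open)))
{
    long pos = 0L;
    long length = br.BaseStream.Length;
    CPP_STRUCT_DEF record;
    int recordSize = Marshal.SizeOf(typeof(CPP_STRUCT_DEF));
    byte[] buffer;
    GCHandle pin;
    while (pos < length)
    {
        // ReadBytes returns a newly allocated array on every call
        buffer = br.ReadBytes(recordSize);
        if (buffer.Length < recordSize)
            break; // truncated trailing record: PtrToStructure would read past the array
        // Pin the array so its address stays valid while PtrToStructure copies from it
        pin = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            record = (CPP_STRUCT_DEF)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(CPP_STRUCT_DEF));
        }
        finally
        {
            pin.Free(); // release the pin even if PtrToStructure throws
        }
        pos += buffer.Length;
        /* Do stuff with my record */
    }
}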
I don't think I need to use GCHandle because I'm not actually communicating with the C++ app; everything is being done from managed code. But I don't know of an alternative method.
Comments (6)
Using Marshal.PtrToStructure is rather slow. I found the following article on CodeProject, which compares (and benchmarks) different ways of reading binary data, very helpful:
For your particular application, only one thing will give you the definitive answer: Profile it.
That being said, here are the lessons I've learned while working with large PInvoke solutions. The most effective way to marshal data is to marshal fields which are blittable, meaning the CLR can simply do what amounts to a memcpy to move data between native and managed code. In simple terms, get all of the non-inline arrays and strings out of your structures. If they are present in the native structure, represent them with an IntPtr and marshal the values into managed code on demand.
I haven't ever profiled the difference between using Marshal.PtrToStructure and having a native API dereference the value. This is probably something you should invest in if profiling reveals PtrToStructure to be a bottleneck.
For large hierarchies, marshal on demand rather than pulling an entire structure into managed code at once. I've run into this issue most often when dealing with large tree structures. Marshalling an individual node is very fast if it's blittable, and performance-wise it works out best to marshal only what you need at that moment.
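A minimal sketch of that advice for the file-reading case, assuming a hypothetical packed record `{ int id; double value; char name[16]; }` because the real `CPP_STRUCT_DEF` layout isn't shown. The managed struct (`RecordHeader`, an illustrative name) keeps only the blittable fields, so `MemoryMarshal.Read` can copy it directly. The inline name stays in the raw bytes and is decoded only when requested, instead of going through a `[MarshalAs(UnmanagedType.ByValTStr)] string` field, which would make the struct non-blittable and force the marshaler to do the work. Assumes .NET Core 2.1 or later for `MemoryMarshal` and the span overloads.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical layout standing in for CPP_STRUCT_DEF:
//   struct { int id; double value; char name[16]; }  packed with no padding.
// Only the blittable fields live in the managed struct.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct RecordHeader
{
    public int Id;
    public double Value;
}

public static class BlittableRecords
{
    public const int NameLength = 16;
    public static readonly int RecordSize = Marshal.SizeOf<RecordHeader>() + NameLength;

    // RecordHeader holds no strings or arrays, so this is a plain byte copy.
    public static RecordHeader ReadHeader(ReadOnlySpan<byte> bytes) =>
        MemoryMarshal.Read<RecordHeader>(bytes);

    // Decode the inline char[16] only on demand, trimming the NUL padding.
    public static string ReadName(ReadOnlySpan<byte> bytes)
    {
        ReadOnlySpan<byte> raw = bytes.Slice(Marshal.SizeOf<RecordHeader>(), NameLength);
        int end = raw.IndexOf((byte)0);
        return Encoding.ASCII.GetString(end < 0 ? raw : raw.Slice(0, end));
    }
}
```

Code that only needs `Id` and `Value` never pays for the string conversion.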
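One way to picture on-demand marshalling for a tree, as a sketch only: assume (hypothetically) the file stores a binary search tree as an array of fixed-size `Node` records whose children are referenced by index, with -1 meaning "no child". The `LazyTree` class (an illustrative name) keeps the raw bytes and converts a node only when a lookup actually reaches it, so a search touches one root-to-leaf path instead of the whole tree.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical on-disk node: children are referenced by index, -1 = no child.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Node
{
    public int Key;
    public int Left;
    public int Right;
}

public sealed class LazyTree
{
    private static readonly int NodeSize = Marshal.SizeOf<Node>();
    private readonly byte[] _data;  // raw file contents; nothing is pre-converted

    public LazyTree(byte[] data) => _data = data;

    // Number of nodes converted so far, to show how little work a lookup does.
    public int NodesRead { get; private set; }

    // Converts a single node only when the traversal reaches it.
    private Node ReadNode(int index)
    {
        NodesRead++;
        return MemoryMarshal.Read<Node>(_data.AsSpan(index * NodeSize, NodeSize));
    }

    // Walks from the root (index 0) along a single path.
    public bool Contains(int key)
    {
        int index = _data.Length >= NodeSize ? 0 : -1;
        while (index >= 0)
        {
            Node node = ReadNode(index);
            if (key == node.Key) return true;
            index = key < node.Key ? node.Left : node.Right;
        }
        return false;
    }
}
```

For a balanced tree of a million nodes, a lookup converts roughly twenty of them, whereas converting the whole structure up front touches every node.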
In addition to JaredPar's comprehensive answer, you don't need to use GCHandle; you can use unsafe code instead. The whole purpose of GCHandle and the fixed statement is to pin (fix) a particular memory segment, making the memory immovable from the GC's point of view. If the memory were movable, any relocation would render your pointers invalid. I'm not sure which way is faster, though.
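A sketch of the `fixed` alternative, assuming a hypothetical blittable `CppRecord` in place of `CPP_STRUCT_DEF`. The `fixed` statement pins the array only for the duration of its block, which replaces the explicit `GCHandle.Alloc`/`Free` pair. The project must be compiled with `AllowUnsafeBlocks` enabled, and the pointer dereference is a raw memory copy, so it is only valid for blittable types whose layout matches the file exactly.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical blittable record standing in for CPP_STRUCT_DEF.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct CppRecord
{
    public int Id;
    public double Value;
}

public static class UnsafeReader
{
    // Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> (or /unsafe).
    public static unsafe CppRecord FromBytes(byte[] buffer, int offset)
    {
        // Bounds check: the raw copy below does no checking of its own.
        if (offset < 0 || offset + sizeof(CppRecord) > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        // 'fixed' pins the array only while this block runs.
        fixed (byte* p = &buffer[offset])
        {
            return *(CppRecord*)p;  // copy sizeof(CppRecord) bytes into a struct
        }
    }
}
```

Read the whole file into one array (for example with `File.ReadAllBytes`) and step `offset` by `sizeof(CppRecord)`. Because the pointer may land on an unaligned address, `Unsafe.ReadUnaligned<CppRecord>(p)` is the more cautious choice on platforms that are strict about alignment.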
This may be outside the bounds of your question, but I would be inclined to write a small assembly in Managed C++ that does an fread() or something similarly fast to read in the structs. Once you've read them in, you can use C# to do everything else you need with them.
Here's a small class I made a while back while playing with structured files. It was the fastest method I could figure out at the time short of going unsafe (which was what I was trying to replace while maintaining comparable performance).
To use:
(Pretty new here, hope that wasn't too much to post... I just pasted in the class and didn't chop out the comments or anything to shorten it.)
It seems this has nothing to do with either C++ or marshalling. You know the structure; what else do you need?
Obviously you need simple code that reads the group of bytes representing one struct and then uses BitConverter to place the bytes into the corresponding C# fields.
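A minimal sketch of the BitConverter approach, assuming a hypothetical packed layout `{ int id; double value; short flags; }` (14 bytes) because the real struct isn't shown. Each field is converted from its known byte offset, and a single buffer is reused for every record, so there is no pinning and no per-record allocation. `BitConverter` uses the machine's byte order (see `BitConverter.IsLittleEndian`), so this assumes the file was written on a machine with the same endianness.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical C++ layout (packed):  struct { int id; double value; short flags; };
public struct Record
{
    public int Id;
    public double Value;
    public short Flags;
}

public static class BitConverterReader
{
    public const int RecordSize = 4 + 8 + 2;

    // Converts each field from its fixed offset; no Marshal calls involved.
    public static Record Parse(byte[] buffer, int offset) => new Record
    {
        Id = BitConverter.ToInt32(buffer, offset),
        Value = BitConverter.ToDouble(buffer, offset + 4),
        Flags = BitConverter.ToInt16(buffer, offset + 12),
    };

    // Yields every complete record; a truncated trailing record is skipped.
    // The caller owns (and disposes) the stream.
    public static IEnumerable<Record> ReadAll(Stream stream)
    {
        var buffer = new byte[RecordSize];  // reused for every record
        while (Fill(stream, buffer))
            yield return Parse(buffer, 0);
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    private static bool Fill(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) return false;  // end of stream
            total += n;
        }
        return true;
    }
}
```

With the file from the question, `using var fs = File.OpenRead("myfile.bin");` followed by `foreach (var r in BitConverterReader.ReadAll(fs))` would walk every complete record.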