Preferred way to parse a custom binary flat file?
I have a flat file generated by a C program. Each record in the file consists of a fixed-length header followed by data. The header contains a field indicating the size of the data that follows. My ultimate goal is to write a C#/.NET program to query this flat file, so I'm looking for the most efficient way to read the file using C#.
I am having trouble finding the .NET equivalent of the header read, fread(&header, 48, 1, fp), in the following code. As far as I can tell, I have to issue multiple reads (one for each field of the header, using BinaryReader) and then issue one more read to get the data following the header. I'm trying to learn a way to parse a record in two read operations (one read to get the fixed-length header and a second read to get the data that follows).
This is the C code I am trying to duplicate using C#/.NET:
struct header header; /* 1-byte aligned structure (48 bytes) */
char *data;
FILE* fp = fopen("flatfile", "rb"); /* binary mode */
/* Loop on fread's result; feof() only becomes true after a failed read. */
while (fread(&header, 48, 1, fp) == 1)
{
    /* Read header.len number of bytes to get the data. */
    data = (char*)malloc(header.len);
    fread(data, header.len, 1, fp);
    /* Do stuff... */
    free(data);
}
This is the C structure of the header:
struct header
{
    char id[2];
    char toname[12];
    char fromname[12];
    char routeto[6];
    char routefrom[6];
    char flag1;
    char flag2;
    char flag3;
    char flag4;
    char cycl[4];
    unsigned short len;
};
I've come up with this C# object to represent the C header:
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi, Size = 48)]
class RouterHeader
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
    char[] Type;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    char[] To;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    char[] From;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]
    char[] RouteTo;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]
    char[] RouteFrom;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    char[] Flags;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    char[] Cycle;
    UInt16 Length;
}
Comments (5)
Well, you can use one call to Stream.Read to read the header, which contains the length (although you need to check the return value to make sure you've read everything you asked for; you may not get it all in one go), and then another call to Stream.Read to get the data itself into a byte array (again, looping until you've read everything). Once it's all in memory, you can pick out the appropriate bytes from the buffer to create an instance of your struct (or class).
Personally I prefer to do all of this explicitly rather than using StructLayout; the latter always feels somewhat brittle to me.
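The "loop until everything has arrived" advice for Stream.Read can be sketched roughly as follows. This is only an illustration under stated assumptions: the names FlatFile, ReadFully and ReadRecords are made up for this example, the 48-byte header size and the offset 46 of the length field are derived from the C struct in the question, and the length is assumed to have been written little-endian.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FlatFile
{
    const int HeaderSize = 48;   // size of the C struct in the question
    const int LengthOffset = 46; // 'len' follows 46 bytes of char fields

    // Fills 'buffer' completely, looping because Stream.Read may return
    // fewer bytes than requested. Returns false only when the stream is
    // already at its end; throws if it ends partway through the buffer.
    public static bool ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
            {
                if (offset == 0) return false;
                throw new EndOfStreamException("Stream ended mid-record.");
            }
            offset += read;
        }
        return true;
    }

    // Yields one (header, data) pair per record: two logical reads each.
    public static IEnumerable<(byte[] Header, byte[] Data)> ReadRecords(Stream stream)
    {
        while (true)
        {
            var header = new byte[HeaderSize];
            if (!ReadFully(stream, header)) yield break; // clean end of file

            // Assumption: 'len' was written little-endian.
            int length = header[LengthOffset] | (header[LengthOffset + 1] << 8);

            var data = new byte[length];
            if (!ReadFully(stream, data))
                throw new EndOfStreamException("Header without data.");

            yield return (header, data);
        }
    }
}
```

A caller would typically pass a FileStream, for example the result of File.OpenRead("flatfile"), and then pick the individual fields out of each header array.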
As an alternative, you could try using a union-like structure to create a header struct that you can read in one go (as a string of the appropriate length, for example) while still being able to reference the individual fields when you need information from that struct.
You can find some more details on using StructLayout and FieldOffset to achieve that sort of thing here.
There's some further discussion on reading and writing binary files with C# here. It suggests that using BinaryReader to read multiple fields is generally more efficient for a small number of fields (fewer than 40).
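For reference, one way to fill a StructLayout-annotated type from a single 48-byte read is to pin the buffer and let the marshaller copy it. The sketch below is not taken from the linked articles. It uses LayoutKind.Sequential with Pack = 1, as in the question, rather than FieldOffset, and it declares the text fields as byte[] so that each element is exactly one byte. RouterHeaderRaw and HeaderMarshal are hypothetical names, and Length is interpreted in the byte order of the machine running the code.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct RouterHeaderRaw
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]  public byte[] Type;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)] public byte[] To;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)] public byte[] From;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]  public byte[] RouteTo;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]  public byte[] RouteFrom;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]  public byte[] Flags;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]  public byte[] Cycle;
    public ushort Length; // read in the host's byte order
}

static class HeaderMarshal
{
    // Copies the first 48 bytes of 'buffer' into a RouterHeaderRaw.
    public static RouterHeaderRaw FromBytes(byte[] buffer)
    {
        if (buffer.Length < Marshal.SizeOf<RouterHeaderRaw>())
            throw new ArgumentException("Buffer is smaller than the header.");

        // Pin the array so the garbage collector cannot move it while
        // the marshaller reads from its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return Marshal.PtrToStructure<RouterHeaderRaw>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}
```

The trade-off is the one raised in the other replies: the managed type has to mirror the on-disk layout exactly, so any change to the file format means a change to the struct.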
I would recommend you just write code (one statement per field) that reads the fields one by one. It is a little extra code, but it gives more flexibility. To begin with, it relieves you of the requirement that your in-memory data structure has the same layout as the file on disk. It could be part of another structure, or you could use String instead of char[], for example.
Also consider: what if you need to write a version 2.0, where a new field is added at the end of the struct? In your example, you'd need to define a new struct, and you'd be stuck with both definitions. If you choose to read/write in code, you can support both with the same code by reading the new element conditionally.
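A field-by-field reader along these lines might look like the sketch below. Several things here are assumptions made for illustration: the class name ParsedHeader, ASCII text padded with NULs or spaces, and in particular the version parameter and the Extra field, which stand in for a hypothetical "version 2.0" addition and do not exist in the question's format. BinaryReader.ReadUInt16 always reads little-endian.

```csharp
using System.IO;
using System.Text;

class ParsedHeader
{
    public string Type, To, From, RouteTo, RouteFrom, Cycle;
    public byte[] Flags;
    public ushort Length;
    public ushort Extra; // HYPOTHETICAL version-2 field, not in the question

    // 'version' is a hypothetical parameter; the question does not say
    // how a file version would be detected.
    public static ParsedHeader Read(BinaryReader reader, int version = 1)
    {
        var h = new ParsedHeader();
        h.Type = ReadText(reader, 2);
        h.To = ReadText(reader, 12);
        h.From = ReadText(reader, 12);
        h.RouteTo = ReadText(reader, 6);
        h.RouteFrom = ReadText(reader, 6);
        h.Flags = reader.ReadBytes(4);
        h.Cycle = ReadText(reader, 4);
        h.Length = reader.ReadUInt16(); // little-endian

        // The same code serves both layouts by reading the new
        // element only when the newer version is in use.
        if (version >= 2)
            h.Extra = reader.ReadUInt16();
        return h;
    }

    // Reads a fixed-width text field and strips trailing padding.
    static string ReadText(BinaryReader reader, int size)
    {
        byte[] raw = reader.ReadBytes(size);
        if (raw.Length != size)
            throw new EndOfStreamException("Header truncated.");
        return Encoding.ASCII.GetString(raw).TrimEnd('\0', ' ');
    }
}
```

Because the in-memory class no longer has to match the file layout, the text fields can be plain strings and the layout attributes disappear entirely.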
My inclination would be to read the data into an array, and then assemble the data object appropriately, using shifts and adds to handle words, longwords, etc. I have some utility classes to handle that sort of thing.
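The utility classes mentioned are not shown, so the following is only a guess at their general shape: a few hypothetical helpers (the ByteOrder class and its method names are invented here) that assemble 16-bit and 32-bit values from a byte array with shifts, with the byte order chosen explicitly by the caller.

```csharp
static class ByteOrder
{
    // Least significant byte first.
    public static ushort ReadUInt16LE(byte[] b, int o) =>
        (ushort)(b[o] | (b[o + 1] << 8));

    // Most significant byte first.
    public static ushort ReadUInt16BE(byte[] b, int o) =>
        (ushort)((b[o] << 8) | b[o + 1]);

    // Each byte is widened to uint before shifting so the top byte
    // cannot turn the intermediate result negative.
    public static uint ReadUInt32LE(byte[] b, int o) =>
        (uint)b[o] | ((uint)b[o + 1] << 8) |
        ((uint)b[o + 2] << 16) | ((uint)b[o + 3] << 24);

    public static uint ReadUInt32BE(byte[] b, int o) =>
        ((uint)b[o] << 24) | ((uint)b[o + 1] << 16) |
        ((uint)b[o + 2] << 8) | (uint)b[o + 3];
}
```

With the header from the question held in a 48-byte array, ByteOrder.ReadUInt16LE(header, 46) would give the data length, assuming the file was written on a little-endian machine.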
The link Hans Passant provided has the answer. I would give him credit, but I'm not sure what to do since he posted as a comment instead of an answer.