How to read a block of bytes into a struct
I have a resource file that I need to process, which packs a set of files.
First, the resource file lists all the files contained within it, plus some other data, as in this struct:
struct FileEntry{
byte Value1;
char Filename[12];
byte Value2;
byte FileOffset[3];
float whatever;
}
So I would need to read blocks exactly this size.
I am using the Read function from FileStream, but how can I specify the size of the struct?
I used:
int sizeToRead = Marshal.SizeOf(typeof(Header));
and then passed this value to Read, but then all I get is a byte[], which I do not know how to convert into the specified values (well, I do know how to get the single byte values... but not the rest of them).
Also, it seems I need to specify an unsafe context, and I don't know whether that is correct or not...
It seems to me that reading byte streams is tougher than I thought in .NET :)
Thanks!
Comments (5)
Assuming this is C#, I wouldn't create a struct as the FileEntry type. I would replace the char[12] array with a string and use a BinaryReader (http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx) to read the individual fields. You must read the data in the same order in which it was written.
Something like:
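A minimal sketch of that approach, under stated assumptions: the class and method names (FileEntry, FileEntryReader, Read) are made up for illustration, the filename is assumed to be zero-terminated ASCII, and the 3-byte FileOffset is assumed to be little-endian. None of that is confirmed by the question.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical class mirroring the 21-byte on-disk record from the question.
public sealed class FileEntry
{
    public byte Value1;
    public string Filename = "";
    public byte Value2;
    public int FileOffset;   // decoded from the 3 raw bytes
    public float Whatever;
}

public static class FileEntryReader
{
    // Reads the fields in the same order as they appear in the record.
    public static FileEntry Read(BinaryReader reader)
    {
        var entry = new FileEntry();
        entry.Value1 = reader.ReadByte();

        byte[] name = reader.ReadBytes(12);
        if (name.Length < 12) throw new EndOfStreamException("Truncated filename.");
        // Assumption: the name ends at the first zero byte (C-style string).
        int end = Array.IndexOf(name, (byte)0);
        if (end < 0) end = name.Length;
        entry.Filename = Encoding.ASCII.GetString(name, 0, end);

        entry.Value2 = reader.ReadByte();

        byte[] off = reader.ReadBytes(3);
        if (off.Length < 3) throw new EndOfStreamException("Truncated offset.");
        // Assumption: little-endian 24-bit value.
        entry.FileOffset = off[0] | (off[1] << 8) | (off[2] << 16);

        // ReadSingle reads a 4-byte little-endian float.
        entry.Whatever = reader.ReadSingle();
        return entry;
    }
}
```

Calling FileEntryReader.Read repeatedly on a BinaryReader wrapped around the FileStream would yield one entry per 21 bytes consumed.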
If you insist on having a struct, you should make it immutable and create a constructor with an argument for each field.
If you can use unsafe code:
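The following is only a sketch of what an unsafe version might look like, and it needs the project to allow unsafe code (AllowUnsafeBlocks). The names RawFileEntry and FromBytes are hypothetical. It assumes the on-disk record is tightly packed (21 bytes) with one byte per filename character, so the buffers are declared as fixed byte rather than fixed char, for the size reason explained below.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct; Pack = 1 removes padding so the layout matches
// the assumed 21-byte record (1 + 12 + 1 + 3 + 4).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct RawFileEntry
{
    public byte Value1;
    public fixed byte Filename[12];   // fixed-size buffer stored inline in the struct
    public byte Value2;
    public fixed byte FileOffset[3];
    public float Whatever;
}

public static class RawFileEntryReader
{
    // Copies one record out of the buffer by reinterpreting the bytes at offset.
    public static unsafe RawFileEntry FromBytes(byte[] buffer, int offset)
    {
        if (offset < 0 || buffer.Length - offset < sizeof(RawFileEntry))
            throw new ArgumentException("Buffer too small for one record.");

        // Pin the array only for the duration of the copy.
        fixed (byte* p = &buffer[offset])
        {
            return *(RawFileEntry*)p;
        }
    }
}
```

The struct is returned by value, so nothing stays pinned after the call. The fixed buffers themselves can only be indexed from an unsafe context.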
The fixed keyword embeds the array in the struct. Since it is fixed, this can cause GC issues if you are constantly creating these and never letting them go. Keep in mind that the size of each fixed buffer is n*sizeof(T). So a char Filename[12] allocates 24 bytes (each char is a 2-byte Unicode character), while a byte FileOffset[3] allocates 3 bytes. This matters if the data on disk is not Unicode. I would recommend changing Filename to a byte[] and converting the struct to a usable class where you can convert the string.
If you can't use unsafe, you can do the whole BinaryReader approach and read the fields one at a time.
The unsafe way is nearly instant and far faster, especially when you're converting a lot of structs at once. The question is whether you want to use unsafe. My recommendation is to use the unsafe method only if you absolutely need the performance boost.
Wrapping your FileStream with a BinaryReader will give you dedicated Read*() methods for primitive types: http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx
Off the top of my head, you could probably mark your struct with [StructLayout(LayoutKind.Sequential)] (to ensure proper representation in memory) and use a pointer in an unsafe block to actually fill the struct C-style. Going unsafe is not recommended if you don't really need it (interop, heavy operations like image processing and so on), however.
Based on this article (only I have made it generic), this is how to marshal the data directly into the struct. Very useful for longer data types.
Example Usage:
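As a hedged sketch of what such a generic helper and its usage could look like: StructReader, ReadStruct and MarshalledFileEntry are hypothetical names, and the struct layout (Pack = 1, byte arrays marshalled by value) is an assumption about the file format rather than something the question confirms.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class StructReader
{
    // Reads Marshal.SizeOf<T>() bytes and marshals them into a T.
    public static T ReadStruct<T>(BinaryReader reader) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        byte[] bytes = reader.ReadBytes(size);
        if (bytes.Length != size)
            throw new EndOfStreamException("Not enough bytes for " + typeof(T).Name);

        // Pin the buffer so the marshaller sees a stable address.
        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            return Marshal.PtrToStructure<T>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}

// Hypothetical layout for the record in the question (21 bytes when packed).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MarshalledFileEntry
{
    public byte Value1;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    public byte[] Filename;
    public byte Value2;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] FileOffset;
    public float Whatever;
}

// Usage, inside some method ("resource.dat" is a placeholder path):
//   using (var reader = new BinaryReader(File.OpenRead("resource.dat")))
//   {
//       MarshalledFileEntry entry = StructReader.ReadStruct<MarshalledFileEntry>(reader);
//   }
```

This version needs no unsafe context; the cost is an extra copy and the marshalling overhead per record.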
Not a full answer (it's been covered, I think), but a specific note on the filename:
The Char type is not a one-byte thing in C#, since .NET characters are Unicode (two bytes each), meaning they support character values far beyond 255, so interpreting your filename data as a Char[] array will cause problems. So the first step is definitely to read it as Byte[12], not Char[12].
A straight conversion from byte array to char array is also not advised, though, since in binary indices like this, filenames that are shorter than the allowed 12 characters will probably be padded with 00 bytes, so a straight conversion will result in a string that is always 12 characters long and might end in these zero characters.
However, simply trimming these zeroes off is not advised either, since reading systems for such data usually just read up to the first zero they encounter, and the data after it in the array might actually contain garbage if the writing system doesn't bother to clear its buffer with zeroes before putting the string into it. That is something a lot of programs don't bother doing, since they assume the reading system will only interpret the string up to the first zero anyway.
So, assuming this is indeed such a typical zero-terminated (C-style) string, saved in a one-byte-per-character text encoding (like ASCII, DOS-437 or Win-1252), the second step is to cut the string off at the first zero. You can easily do this with Linq's TakeWhile function. Then the third and final step is to convert the resulting byte array to a string using whatever one-byte-per-character text encoding it was written with.
As I said, the encoding will probably be something like pure ASCII, which can be accessed from Encoding.ASCII; standard US DOS encoding, which is Encoding.GetEncoding(437); or Windows-1252, the standard US / Western European Windows text encoding, which you can retrieve with Encoding.GetEncoding("Windows-1252").
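The second and third steps can be sketched roughly as follows. FilenameDecoder and Decode are hypothetical names, and which encoding to pass in remains an assumption about how the file was written.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FilenameDecoder
{
    // Cuts the raw field at the first zero byte, then decodes what is left
    // with the given single-byte encoding.
    public static string Decode(byte[] raw, Encoding encoding)
    {
        byte[] trimmed = raw.TakeWhile(b => b != 0).ToArray();
        return encoding.GetString(trimmed);
    }
}
```

For example, FilenameDecoder.Decode(nameBytes, Encoding.ASCII) ignores any garbage that follows the terminator. On .NET Core and later, code page encodings such as 437 or Windows-1252 are, as far as I know, only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).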