How to read a block of bytes into a struct

Posted 2024-11-27 17:22:55

I have this resource file which I need to process, which packs a set of files.

First, the resource file lists all the files contained within, plus some other data, such as in this struct:

struct FileEntry{
     byte Value1;
     char Filename[12];
     byte Value2;
     byte FileOffset[3];
     float whatever;
}

So I would need to read blocks exactly this size.

I am using the Read function from FileStream, but how can I specify the size of the struct? I used:

int sizeToRead = Marshal.SizeOf(typeof(Header));

and then passed this value to Read, but then I can only read into a byte[], which I do not know how to convert into the specified values (well, I do know how to get the single-byte values... but not the rest of them).

Also, I need to specify an unsafe context, and I don't know whether that's correct or not...

It seems to me that reading byte streams is tougher than I thought in .NET :)

Thanks!

尤怨 2024-12-04 17:22:55

Assuming this is C#, I wouldn't create a struct as the FileEntry type. I would replace char[12] with a string and use a BinaryReader - http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx to read the individual fields. You must read the data in the same order in which it was written.

Something like:

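// needs: using System.IO;
class FileEntry {
     // public so that the object initializer below can set them
     public byte Value1;
     public char[] Filename;
     public byte Value2;
     public byte[] FileOffset;
     public float whatever;
}

  using (var reader = new BinaryReader(File.OpenRead("path"))) {
     // initializer members are evaluated top to bottom, i.e. in file order
     var entry = new FileEntry {
        Value1 = reader.ReadByte(),
        Filename = reader.ReadChars(12), // would replace this with string
        Value2 = reader.ReadByte(),      // must be read here to stay in file order
        FileOffset = reader.ReadBytes(3),
        whatever = reader.ReadSingle()   // BinaryReader has ReadSingle, not ReadFloat
     };
  }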

If you insist on having a struct, you should make your struct immutable and create a constructor with an argument for each of your fields.
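
As a rough illustration of that last suggestion, here is a minimal sketch of an immutable struct with a constructor and a small factory method built on BinaryReader. The ReadFrom name, the ASCII filename, the NUL padding and the little-endian 3-byte offset are all assumptions made for the example; nothing in the question confirms them.

```csharp
using System;
using System.IO;
using System.Text;

// Immutable value type: every field is set exactly once, in the constructor.
public readonly struct FileEntry
{
    public byte Value1 { get; }
    public string Filename { get; }
    public byte Value2 { get; }
    public int FileOffset { get; }
    public float Whatever { get; }

    public FileEntry(byte value1, string filename, byte value2, int fileOffset, float whatever)
    {
        Value1 = value1;
        Filename = filename;
        Value2 = value2;
        FileOffset = fileOffset;
        Whatever = whatever;
    }

    // Hypothetical factory: reads the fields in on-disk order (1 + 12 + 1 + 3 + 4 bytes).
    public static FileEntry ReadFrom(BinaryReader reader)
    {
        byte value1 = reader.ReadByte();
        byte[] nameBytes = reader.ReadBytes(12);
        byte value2 = reader.ReadByte();
        byte[] offsetBytes = reader.ReadBytes(3);
        // ReadBytes returns a short array at end of stream instead of throwing.
        if (nameBytes.Length != 12 || offsetBytes.Length != 3)
            throw new EndOfStreamException("Truncated entry.");
        float whatever = reader.ReadSingle();

        // Assumption: the name is ASCII, terminated or padded with zero bytes.
        int nameLength = Array.IndexOf(nameBytes, (byte)0);
        if (nameLength < 0) nameLength = nameBytes.Length;
        string filename = Encoding.ASCII.GetString(nameBytes, 0, nameLength);

        // Assumption: the 3-byte offset is stored little-endian.
        int fileOffset = offsetBytes[0] | (offsetBytes[1] << 8) | (offsetBytes[2] << 16);

        return new FileEntry(value1, filename, value2, fileOffset, whatever);
    }
}
```

Calling FileEntry.ReadFrom(reader) repeatedly on the same BinaryReader would then walk the entry table one record at a time.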

淡淡の花香 2024-12-04 17:22:55

If you can use unsafe code:

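unsafe struct FileEntry{
     public byte Value1;
     public fixed char Filename[12];   // 12 chars = 24 bytes
     public byte Value2;
     public fixed byte FileOffset[3];  // 3 bytes
     public float whatever;
}

public unsafe FileEntry Get(byte[] src)
{
     // never read past the end of src
     if (src.Length < sizeof(FileEntry))
         throw new ArgumentException("src is smaller than a FileEntry.");

     fixed(byte* pb = &src[0])         // pins src while the pointer is in use
     {
         return *(FileEntry*)pb;       // copies the bytes into a FileEntry
     } 
}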

The fixed keyword on a field embeds the array in the struct as a fixed-size buffer. The fixed statement inside Get is a separate feature: it pins src so the GC cannot move it while the pointer is in use, and pins that are held for a long time can cause GC issues. Keep in mind that the size of a fixed buffer is n*sizeof(T). So Filename[12] takes 24 bytes (each char is a 2-byte UTF-16 unit) and FileOffset[3] takes 3 bytes. This matters if you're not dealing with 2-byte Unicode data on disk: the struct is then larger than the on-disk record and every field after Value1 is misread. In that case I would recommend changing Filename to a fixed byte buffer and converting the struct to a usable class where you can convert the bytes to a string.

If you can't use unsafe, you can do the whole BinaryReader approach:

public FileEntry Get(Stream src)
{
     FileEntry fe = new FileEntry();
     var br = new BinaryReader(src);
     fe.Value1 = br.ReadByte();
     ...
}

The unsafe way is nearly instant and far faster, especially when you're converting a lot of structs at once. The question is whether you want to use unsafe. My recommendation is to use the unsafe method only if you absolutely need the performance boost.
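
A possible middle ground, which is not one of the two approaches above, is to read the whole entry into a byte[] and decode the multi-byte fields by hand. The sketch below covers the two fields the question was stuck on. It assumes a packed 21-byte record, a little-endian 24-bit FileOffset, and a float stored in the machine's own byte order; the EntryDecoder class and its members are hypothetical names.

```csharp
using System;

public static class EntryDecoder
{
    // Assumption: packed on-disk layout with no padding:
    // Value1 (1) + Filename (12) + Value2 (1) + FileOffset (3) + whatever (4).
    public const int EntrySize = 21;

    // Assumption: FileOffset is an unsigned 24-bit little-endian integer.
    public static int DecodeFileOffset(byte[] entry)
    {
        const int start = 14; // 1 + 12 + 1
        return entry[start] | (entry[start + 1] << 8) | (entry[start + 2] << 16);
    }

    // BitConverter uses the machine's byte order, so this is only correct
    // when the file uses the same order (see BitConverter.IsLittleEndian).
    public static float DecodeWhatever(byte[] entry)
    {
        const int start = 17; // 1 + 12 + 1 + 3
        return BitConverter.ToSingle(entry, start);
    }
}
```

This costs a few lines per field but needs neither unsafe nor marshalling attributes, and the byte order of every field stays explicit.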

勿挽旧人 2024-12-04 17:22:55

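Wrapping your FileStream in a BinaryReader will give you dedicated Read*() methods for primitive types:
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx

It seems to me that you could probably mark your struct with [StructLayout(LayoutKind.Sequential)] (to ensure the proper representation in memory) and use a pointer in an unsafe block to actually fill the struct C-style. Going unsafe is not recommended, though, unless you really need it (interop, heavy operations like image processing, and so on).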
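
Whichever way the struct ends up being filled, the raw bytes have to be complete first, and Stream.Read is allowed to return fewer bytes than requested. A minimal sketch of a helper that keeps reading until one whole block has arrived (StreamHelpers and ReadExactBytes are hypothetical names, not framework APIs):

```csharp
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: returns exactly `count` bytes or throws.
    public static byte[] ReadExactBytes(Stream stream, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            // Read may return fewer bytes than asked for; 0 means end of stream.
            int read = stream.Read(buffer, total, count - total);
            if (read == 0)
                throw new EndOfStreamException($"Expected {count} bytes, got {total}.");
            total += read;
        }
        return buffer;
    }
}
```

The resulting array can then be passed to any of the conversion approaches in this thread.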

奢华的一滴泪 2024-12-04 17:22:55

Based on this article (only I have made it generic), this is how to marshal the data directly to the struct. Very useful on longer data types.

// needs: using System.Runtime.InteropServices; (GCHandle, Marshal)
public static T RawDataToObject<T>(byte[] rawData) where T : struct
{
    var pinnedRawData = GCHandle.Alloc(rawData,
                                       GCHandleType.Pinned);
    try
    {
        // Get the address of the data array
        var pinnedRawDataPtr = pinnedRawData.AddrOfPinnedObject();

        // overlay the data type on top of the raw data
        return (T) Marshal.PtrToStructure(pinnedRawDataPtr, typeof(T));
    }
    finally
    {
        // must explicitly release
        pinnedRawData.Free();
    }
}

Example Usage:

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
public struct FileEntry
{
    public readonly byte Value1;

    // a string field needs ByValTStr (ByValArray only applies to arrays);
    // you may need to play around with this one
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
    public readonly string Filename;

    public readonly byte Value2;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public readonly byte[] FileOffset;

    public readonly float whatever;
}

private static void Main(string[] args)
{
    // ReadEntryBytes is a hypothetical placeholder: from file stream or whatever
    byte[] data = ReadEntryBytes();
    //usage
    FileEntry entry = RawDataToObject<FileEntry>(data);
}
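
One detail worth checking with this approach is padding. With the default packing the marshaller would typically align the float to a 4-byte boundary, giving a 24-byte struct, while the fields themselves add up to 21 bytes. The sketch below assumes the file stores entries with no padding and therefore sets Pack = 1. PackedFileEntry and EntryMarshaller are hypothetical names, and Filename is kept as raw bytes to sidestep the string marshalling question.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumption: entries are stored on disk with no padding (21 bytes each).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedFileEntry
{
    public byte Value1;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    public byte[] Filename;

    public byte Value2;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] FileOffset;

    public float Whatever;
}

public static class EntryMarshaller
{
    // Marshalled size of one entry; this is the number of bytes to read per entry.
    public static int EntrySize => Marshal.SizeOf<PackedFileEntry>();

    public static PackedFileEntry FromBytes(byte[] rawData)
    {
        if (rawData.Length < EntrySize)
            throw new ArgumentException("Buffer is smaller than one entry.", nameof(rawData));

        // Pin the array so its address stays valid while the marshaller copies from it.
        var handle = GCHandle.Alloc(rawData, GCHandleType.Pinned);
        try
        {
            return Marshal.PtrToStructure<PackedFileEntry>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}
```

Comparing EntryMarshaller.EntrySize against the record size documented for the file format is a quick sanity check before trusting any decoded values.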

梦忆晨望 2024-12-04 17:22:55

Not a full answer (it's been covered I think), but a specific note on the filename:

The Char type is not a one-byte thing in C#: .NET characters are 2-byte UTF-16 code units, meaning they support character values far beyond 255, so interpreting your filename data as a Char[] array will give problems. So the first step is definitely to read it as Byte[12], not Char[12].

A straight conversion from byte array to char array is also not advised, though, since in binary indices like this, filenames that are shorter than the allowed 12 characters will probably be padded with '00' bytes, so a straight conversion will result in a string that's always 12 characters long and might end on these zero-characters.

However, simply trimming these zeroes off is not advised, since reading systems for such data usually simply read up to the first encountered zero, and the data behind that in the array might actually contain garbage if the writing system doesn't bother to specifically clear its buffer with zeroes before putting the string into it. It's something a lot of programs don't bother doing, since they assume the reading system will only interpret the string up to the first zero anyway.

So, assuming this is indeed such a typical zero-terminated (C-style) string, saved in a one-byte-per-character text encoding (like ASCII, DOS-437 or Win-1252), the second step is to cut off the string at the first zero. You can easily do this with Linq's TakeWhile function. Then the third and final step is to convert the resulting byte array to a string, using whichever one-byte-per-character text encoding it happens to be written in:

// needs: using System.Linq; using System.Text;
public String StringFromCStringArray(Byte[] readData, Encoding encoding)
{
    return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
}

As I said, the encoding will probably be one of three: pure ASCII, available as Encoding.ASCII; the standard US DOS encoding, which is Encoding.GetEncoding(437); or Windows-1252, the standard US / western Europe Windows text encoding, which you can retrieve with Encoding.GetEncoding("Windows-1252").
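
To make the difference between trimming and cutting at the first zero concrete, here is a small sketch using a made-up 12-byte field: "README.TXT", a terminator, and one stale byte left over in the writer's buffer. The CStringDemo class and the sample data are hypothetical, and ASCII is assumed as the encoding.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CStringDemo
{
    // Same approach as the method above: keep bytes up to the first zero, then decode.
    public static string StringFromCStringArray(byte[] readData, Encoding encoding)
    {
        return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
    }

    // Hypothetical field contents: name, terminator, then one byte of garbage.
    public static byte[] SampleField()
    {
        return new byte[]
        {
            (byte)'R', (byte)'E', (byte)'A', (byte)'D', (byte)'M', (byte)'E',
            (byte)'.', (byte)'T', (byte)'X', (byte)'T', 0, (byte)'?'
        };
    }

    // Trimming trailing zeros does nothing here: the field ends in garbage, not a zero.
    public static string TrimOnly()
    {
        return Encoding.ASCII.GetString(SampleField()).TrimEnd('\0');
    }

    // Cutting at the first zero drops the terminator and everything after it.
    public static string CutAtFirstZero()
    {
        return StringFromCStringArray(SampleField(), Encoding.ASCII);
    }
}
```

On .NET Core and later, code pages such as 437 and 1252 are, as far as I know, only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance), so it is worth checking that before relying on Encoding.GetEncoding for them.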
