Parsing content out of a structure in a binary file

Posted 2024-09-03 06:19:20

Using C#, I need to read a packed binary file created using FORTRAN. The file is stored in an "Unformatted Sequential" format as described here (about half-way down the page in the "Unformatted Sequential Files" section):

http://www.tacc.utexas.edu/services/userguides/intel8/fc/f_ug1/pggfmsp.htm

As the linked page describes, the file is organized into "chunks" of 130 bytes or less, and each chunk is surrounded by 2 length bytes (inserted by the FORTRAN compiler).

So, I need to find an efficient way to parse the actual file payload away from the compiler-inserted formatting.

Once I've extracted the actual payload from the file, I'll then need to parse it up into its varying data types. That'll be the next exercise.

My first thoughts are to slurp up the entire file into a byte array using File.ReadAllBytes. Then, just iterate through the bytes, skipping the formatting and transferring the actual data to a second byte array.

In the end, that second byte array should contain the actual file contents minus all the formatting, which I'd then need to go back through to get what I need.
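
For reference, a minimal sketch of that first idea might look like the following. It assumes each chunk is laid out as one length byte, then the payload, then the same length byte repeated; that layout is my reading of "2 length bytes surrounding each chunk" and has not been checked against the linked page, which may describe extra header, trailer, or continuation bytes that this does not handle. `ChunkStripper` and `StripLengthBytes` are made-up names.

```csharp
using System;
using System.IO;

public static class ChunkStripper
{
    // ASSUMED layout (not verified against the linked page):
    // [1 length byte][payload of that many bytes][same length byte], repeated.
    public static byte[] StripLengthBytes(byte[] raw)
    {
        // The payload can never be larger than the raw file.
        using (var payload = new MemoryStream(raw.Length))
        {
            int pos = 0;
            while (pos < raw.Length)
            {
                int length = raw[pos];       // leading length byte
                int end = pos + 1 + length;  // index of the trailing length byte
                if (end >= raw.Length || raw[end] != length)
                    throw new InvalidDataException("Malformed chunk at offset " + pos);

                payload.Write(raw, pos + 1, length); // copy only the payload bytes
                pos = end + 1;                       // step past the trailing length byte
            }
            return payload.ToArray();
        }
    }
}
```

It would be called as `ChunkStripper.StripLengthBytes(File.ReadAllBytes(path))`, where `path` is whatever file is being read. Checking the trailing length byte against the leading one is a cheap way to notice early if the assumed layout is wrong.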

As I'm fairly new to C#, I thought there might be a better, more accepted way of tackling this.

Also, in case it's helpful, these files could be fairly large (say 30MB), though most will be much smaller...

Comments (2)

吹梦到西洲 2024-09-10 06:19:20

One way to read files like this is record by record (e.g., read the length bytes and then the data chunk, building up a list of records, which are just byte arrays). The collection of records is then passed to further parsing routines.
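
A rough sketch of the record-by-record approach, reading from any `Stream` so the whole file never has to sit in memory at once. It assumes one length byte before and one after each chunk, which is an assumption about the format rather than something taken from the linked page; `RecordReader` and `ReadRecords` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RecordReader
{
    // ASSUMED layout: [1 length byte][payload][same length byte], repeated to end of stream.
    public static List<byte[]> ReadRecords(Stream input)
    {
        var records = new List<byte[]>();
        while (true)
        {
            int length = input.ReadByte();  // returns -1 at end of stream
            if (length < 0) break;

            byte[] data = ReadExact(input, length);

            int trailing = input.ReadByte();
            if (trailing != length)
                throw new InvalidDataException("Trailing length byte does not match");

            records.Add(data);
        }
        return records;
    }

    // Stream.Read may return fewer bytes than requested, so loop until the chunk is complete.
    private static byte[] ReadExact(Stream input, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = input.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Chunk truncated");
            offset += read;
        }
        return buffer;
    }
}
```

With a file this would be used as `using (var fs = File.OpenRead(path)) { var records = RecordReader.ReadRecords(fs); }`. The caller owns the stream, so the method does not dispose it.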

However, if you're on .NET 4.0, there is a new class for file mapping, MemoryMappedFile, which can be more efficient for large files yet works similarly to ReadAllBytes.

If you're using ReadAllBytes or MemoryMappedFile it's nice to build an in-memory "index" into the large binary file by parsing all the record lengths first. This is especially useful if you will only need certain records.
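
Something like the following could serve as that index. It is only a sketch over a byte array (as returned by ReadAllBytes), and it assumes one length byte before and after each chunk; the same loop could presumably be adapted to read through a memory-mapped view instead. `RecordLocation` and `RecordIndex` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Where one record's payload lives inside the raw file bytes.
public struct RecordLocation
{
    public readonly int Offset;  // index of the first payload byte
    public readonly int Length;  // payload length in bytes

    public RecordLocation(int offset, int length)
    {
        Offset = offset;
        Length = length;
    }
}

public static class RecordIndex
{
    // ASSUMED layout: [1 length byte][payload][same length byte], repeated.
    // Only the length bytes are inspected; no payload is copied while indexing.
    public static List<RecordLocation> Build(byte[] raw)
    {
        var index = new List<RecordLocation>();
        int pos = 0;
        while (pos < raw.Length)
        {
            int length = raw[pos];
            int end = pos + 1 + length;  // index of the trailing length byte
            if (end >= raw.Length || raw[end] != length)
                throw new InvalidDataException("Malformed chunk at offset " + pos);

            index.Add(new RecordLocation(pos + 1, length));
            pos = end + 1;
        }
        return index;
    }

    // Copy out a single record on demand.
    public static byte[] Extract(byte[] raw, RecordLocation location)
    {
        var data = new byte[location.Length];
        Buffer.BlockCopy(raw, location.Offset, data, 0, location.Length);
        return data;
    }
}
```

Once the index exists, pulling out, say, only the fifth record is a single `Extract(raw, index[4])` call, with no need to walk the preceding chunks again.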

怪我闹别瞎闹 2024-09-10 06:19:20

Rather than iterate through the bytes, take a look at System.IO.BinaryReader. Open the file as a FileStream, wrap it in a BinaryReader, and you can read primitive types from it directly, with the stream pointer keeping track of your offset into the blob. You might have to account for endianness and custom types yourself, maybe building your own extension methods for BinaryReader on top of its method for reading individual bytes.

If you do need the data in a byte array, you can still use BinaryReader if you wrap the array in a MemoryStream first.
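
To illustrate, here is a small sketch that wraps a byte array in a MemoryStream and reads values through BinaryReader, plus one extension method for a big-endian integer. BinaryReader itself reads integers as little-endian, so the extension is only needed if the data turns out to be big-endian. The field layout used here (one little-endian Int32 followed by one big-endian Int32) is invented purely for the example, as are the names `ReadInt32BigEndian` and `PayloadDemo`.

```csharp
using System;
using System.IO;

public static class BinaryReaderExtensions
{
    // Reads 4 bytes and assembles them most-significant byte first.
    public static int ReadInt32BigEndian(this BinaryReader reader)
    {
        byte[] b = reader.ReadBytes(4);
        if (b.Length < 4) throw new EndOfStreamException();
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }
}

public static class PayloadDemo
{
    // HYPOTHETICAL layout: a little-endian Int32 followed by a big-endian Int32.
    public static int[] ReadTwoInts(byte[] payload)
    {
        using (var reader = new BinaryReader(new MemoryStream(payload)))
        {
            int little = reader.ReadInt32();        // built-in, little-endian
            int big = reader.ReadInt32BigEndian();  // custom extension method
            return new[] { little, big };
        }
    }
}
```

The reader advances through the stream on every call, so there is no offset bookkeeping to do by hand. Swapping the MemoryStream for a FileStream leaves the reading code unchanged.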

With files that large, I'd steer clear of File.ReadAllBytes. FileStream should buffer for you, and Stephen's suggestion (in the other answer) of using memory-mapped files sounds like a more sophisticated (possibly more efficient) alternative to that, especially if you need to make a second pass for the formatting.
