Reading/writing a structured binary file

Posted 2024-12-13 12:03:14

I want to read/write a binary file that has the following structure:

[Image: hex view of the file, with the record fields described below highlighted in red, green and blue]

The file is composed of records. Each record has the following structure
(I will use the first record as an example):

  • (red) START byte: 0x5A (always 1 byte, fixed value 0x5A)
  • (green) LENGTH bytes: 0x00 0x16 (always 2 bytes, big-endian; the value can range from
    "0x00 0x02" to "0xFF 0xFF")
  • (blue) CONTENT: the number of bytes given by the value of the LENGTH field minus 2. In this case the LENGTH field value is 22 (0x00 0x16 converted to decimal), so the CONTENT contains 20 (22 - 2) bytes.
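To make the layout concrete, here is a minimal sketch of decoding and encoding the LENGTH field. It assumes the field is big-endian (which matches 0x00 0x16 = 22) and that CONTENT is always LENGTH minus 2 bytes, as described above. `RecordLayout`, `ContentLength` and `BuildRecord` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helpers illustrating the record layout:
// [0x5A][LENGTH hi][LENGTH lo][CONTENT: LENGTH - 2 bytes]
static class RecordLayout
{
    public const byte Start = 0x5A;

    // Decode the big-endian LENGTH field and return the CONTENT size.
    public static int ContentLength(byte hi, byte lo)
    {
        int length = (hi << 8) | lo;   // 0x00 0x16 -> 22
        if (length < 2)
            throw new ArgumentOutOfRangeException(nameof(length), "LENGTH must be at least 2");
        return length - 2;             // 22 -> 20 content bytes
    }

    // Build a complete record, recomputing LENGTH from the content size
    // (useful if the processing step changes how long the content is).
    public static byte[] BuildRecord(byte[] content)
    {
        int length = content.Length + 2;
        if (length > 0xFFFF)
            throw new ArgumentException("CONTENT too long for a 2-byte LENGTH field", nameof(content));

        byte[] record = new byte[3 + content.Length];
        record[0] = Start;
        record[1] = (byte)(length >> 8);   // high byte first
        record[2] = (byte)length;          // low byte
        Buffer.BlockCopy(content, 0, record, 3, content.Length);
        return record;
    }
}
```

For the first record, `RecordLayout.ContentLength(0x00, 0x16)` gives 20, and `BuildRecord` on a 20-byte array produces a 23-byte record starting with 0x5A 0x00 0x16.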

My goal is to read each record one by one and write it to an output file.
Currently I have a read function and a write function (some pseudocode):

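using System;
using System.IO;

class RecordProcessor
{
    private const byte START = 0x5A;

    private void Read(BinaryReader binaryReader, BinaryWriter binaryWriter)
    {
        Stream input = binaryReader.BaseStream;

        // PeekChar() decodes a character with the reader's text encoding, which is
        // unreliable on arbitrary binary data; for a seekable stream such as
        // FileStream, compare Position with Length instead.
        while (input.Position < input.Length)
        {
            // Check the first byte, which should be equal to 0x5A
            if (binaryReader.ReadByte() != START)
            {
                throw new InvalidDataException("0x5A expected");
            }

            // Extract the 2-byte length field
            byte[] length = binaryReader.ReadBytes(2);
            if (length.Length != 2)
            {
                throw new EndOfStreamException("Truncated LENGTH field");
            }

            // Convert the length field to an integer (0x00 0x16 -> 22)
            int recordLength = GetLength(length);
            if (recordLength < 2)
            {
                throw new InvalidDataException("LENGTH must be at least 2");
            }

            // Extract the content field (LENGTH minus 2 bytes)
            byte[] content = binaryReader.ReadBytes(recordLength - 2);
            if (content.Length != recordLength - 2)
            {
                throw new EndOfStreamException("Truncated CONTENT field");
            }

            // DO WORK
            // modifying the content (if its size changes, update length too)

            // Writing the record
            Write(binaryWriter, content, length, START);
        }
    }

    // The LENGTH field is big-endian: high byte first
    private static int GetLength(byte[] length)
    {
        return (length[0] << 8) | length[1];
    }

    private void Write(BinaryWriter binaryWriter, byte[] content, byte[] length, byte start)
    {
        binaryWriter.Write(start);
        binaryWriter.Write(length);
        binaryWriter.Write(content);
    }
}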

This approach works.
However, since I am dealing with very large files, I find it's not performing at all, because I do three reads and three writes for each record. I would like to read big chunks of data instead of small amounts of bytes, and maybe work in memory, but my experience with streams stops at BinaryReader and BinaryWriter. Thanks in advance.
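Since the question asks how to read bigger pieces and work in memory, here is a minimal sketch that uses `Stream` directly instead of `BinaryReader`/`BinaryWriter`. It reads START and LENGTH with one call, reuses a single 64 KiB buffer (enough for the largest possible record), and writes each record with one `Write` call. `RecordCopier`, `CopyRecords` and `ReadFully` are hypothetical names, and the sketch assumes the content is modified in place without changing its size.

```csharp
using System;
using System.IO;

// Hypothetical helper class: copies records with one header read,
// one content read and one write per record, reusing a single buffer.
static class RecordCopier
{
    // Largest possible record: 3 header bytes + (0xFFFF - 2) content bytes = 65536.
    const int MaxRecordSize = 3 + 0xFFFF - 2;

    // Fill buffer[offset .. offset + count) or throw if the stream ends first.
    static void ReadFully(Stream s, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int n = s.Read(buffer, offset, count);
            if (n == 0) throw new EndOfStreamException("Truncated record");
            offset += n;
            count -= n;
        }
    }

    // Copies every record from input to output; returns the number of records.
    public static int CopyRecords(Stream input, Stream output)
    {
        byte[] record = new byte[MaxRecordSize];   // reused for every record
        int copied = 0;

        while (true)
        {
            // START + LENGTH in one read (Read may return fewer than 3 bytes).
            int got = input.Read(record, 0, 3);
            if (got == 0) break;                    // clean end of input
            ReadFully(input, record, got, 3 - got);

            if (record[0] != 0x5A) throw new InvalidDataException("0x5A expected");
            int length = (record[1] << 8) | record[2];
            if (length < 2) throw new InvalidDataException("LENGTH must be at least 2");

            int contentLength = length - 2;
            ReadFully(input, record, 3, contentLength);

            // DO WORK: modify record[3 .. 3 + contentLength) in place here.

            output.Write(record, 0, 3 + contentLength); // one write per record
            copied++;
        }
        return copied;
    }
}
```

With `FileStream` on both sides, the actual disk access still goes through FileStream's own buffer, so the gain here comes mostly from fewer calls and no per-record allocations; it is worth measuring before and after.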


Comments (2)

感性 2024-12-20 12:03:14

FileStream is already buffered, so I'd expect it to work pretty well. You could always create a BufferedStream around the original stream to add more buffering if you really need to, but I doubt it would make a significant difference.
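Following the suggestion above, a minimal sketch of adding more buffering: either pass a larger `bufferSize` to the `FileStream` constructor (the default is 4096 bytes) or wrap an existing stream in a `BufferedStream`. The 1 MiB size is an arbitrary assumption to tune, and `BufferedFiles` is a hypothetical helper name.

```csharp
using System;
using System.IO;

// Hypothetical helper: opens streams with larger buffers than the default.
static class BufferedFiles
{
    // Option 1: ask FileStream itself for a bigger internal buffer.
    public static (Stream input, Stream output) Open(string inPath, string outPath, int bufferSize = 1 << 20)
    {
        var input = new FileStream(inPath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize);
        var output = new FileStream(outPath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize);
        return (input, output);
    }

    // Option 2: wrap an existing stream in a BufferedStream.
    public static Stream AddBuffering(Stream s, int bufferSize = 1 << 20)
        => new BufferedStream(s, bufferSize);
}
```

Either stream can then be handed to a `BinaryReader`/`BinaryWriter` exactly as in the question's code.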

You say it's "not performing at all": how fast is it actually running? How sure are you that the I/O is where your time is going? Have you profiled the code?
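One rough way to check whether I/O is the bottleneck is to time a plain copy of the same file and compare it with the record loop. This sketch assumes both runs see the same OS file-cache state (a second pass over a file is often faster because it is cached); `IoBaseline` and its methods are hypothetical names. If the record loop is far slower than the raw copy, the time is probably going into per-record work rather than the disk.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: an I/O baseline to compare against the record loop.
static class IoBaseline
{
    // Time a plain copy in large chunks, with no per-record work.
    public static TimeSpan TimeRawCopy(Stream input, Stream output)
    {
        var sw = Stopwatch.StartNew();
        input.CopyTo(output, 1 << 20);
        output.Flush();
        sw.Stop();
        return sw.Elapsed;
    }

    // Throughput in MB/s for a byte count and an elapsed time.
    public static double MegabytesPerSecond(long bytes, TimeSpan elapsed)
        => bytes / 1e6 / Math.Max(elapsed.TotalSeconds, 1e-9);
}
```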

固执像三岁 2024-12-20 12:03:14

I might also suggest that you read 3 (or 6?) bytes initially, instead of doing 2 separate reads. Put the initial bytes in a small array, check the 5A byte, then the 2-byte length indicator, then the 3-byte AFP op-code, and THEN read the remainder of the AFP record.

It's a small difference, but it gets rid of one of your read calls.
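The 6-byte variant of this suggestion could look like the sketch below: one `ReadBytes(6)` call, after which the START check, LENGTH and op-code are taken from the small array. It assumes every record carries a 3-byte op-code right after LENGTH (so LENGTH is at least 5), which the question's format alone does not guarantee; `PrefixReader` and `TryReadPrefix` are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper: reads [0x5A][LENGTH hi][LENGTH lo][3-byte op-code]
// with a single ReadBytes call.
static class PrefixReader
{
    // Returns false at a clean end of input.
    public static bool TryReadPrefix(BinaryReader reader, out int length, out int opCode)
    {
        length = 0;
        opCode = 0;

        byte[] prefix = reader.ReadBytes(6);   // returns fewer bytes only at end of stream
        if (prefix.Length == 0) return false;
        if (prefix.Length < 6) throw new EndOfStreamException("Truncated record prefix");
        if (prefix[0] != 0x5A) throw new InvalidDataException("0x5A expected");

        length = (prefix[1] << 8) | prefix[2];                       // big-endian LENGTH
        if (length < 5) throw new InvalidDataException("Record too short for an op-code");
        opCode = (prefix[3] << 16) | (prefix[4] << 8) | prefix[5];   // 24-bit op-code
        return true;
    }
}
```

The caller would then read the remaining `length - 5` bytes of the record, for example with `reader.ReadBytes(length - 5)`.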

I'm no Jon Skeet, but I did work at one of the biggest print & mail shops in the country for quite a while, and we did mostly AFP output :-)

(usually in C, though)
