Reading/writing a structured binary file
I want to read/write a binary file which has the following structure:
The file is composed of "RECORDS". Each "RECORD" has the following structure (I will use the first record as an example):
- (red) START byte: 0x5A (always 1 byte, fixed value 0x5A)
- (green) LENGTH bytes: 0x00 0x16 (always 2 bytes, value can range from "0x00 0x02" to "0xFF 0xFF")
- (blue) CONTENT: the number of bytes indicated by the decimal value of the LENGTH field minus 2. In this case the LENGTH field value is 22 (0x00 0x16 converted to decimal), therefore the CONTENT will contain 20 (22 - 2) bytes.
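The layout described above can be sketched as a small in-memory helper. This is only an illustration under stated assumptions: the class name RecordLayout and the method TryLocate are hypothetical (they do not appear in the post), and LENGTH is taken to be big-endian because the example bytes 0x00 0x16 are said to mean 22.

```csharp
using System;

public static class RecordLayout
{
    public const byte Start = 0x5A;

    // Tries to locate one record beginning at buffer[offset].
    // On success, contentOffset/contentLength describe the CONTENT bytes.
    // Returns false if the buffer ends before the record is complete.
    public static bool TryLocate(byte[] buffer, int offset,
                                 out int contentOffset, out int contentLength)
    {
        contentOffset = 0;
        contentLength = 0;

        // Need at least START (1 byte) + LENGTH (2 bytes).
        if (buffer.Length - offset < 3) return false;
        if (buffer[offset] != Start)
            throw new FormatException("0x5A expected at offset " + offset);

        // Assumption: LENGTH is big-endian and counts its own 2 bytes.
        int length = (buffer[offset + 1] << 8) | buffer[offset + 2];
        if (length < 2)
            throw new FormatException("LENGTH must be at least 2");

        contentOffset = offset + 3;
        contentLength = length - 2;
        return buffer.Length - contentOffset >= contentLength;
    }
}
```

For the example record, START (1 byte) + LENGTH (2 bytes) + CONTENT (20 bytes) gives 23 bytes in total, so the next record would start at offset + 23.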
My goal is to read each record one by one and write it to an output file.
I currently have a read function and a write function (some pseudocode):
// Requires: using System; using System.IO;
private void Read(BinaryReader binaryReader, BinaryWriter binaryWriter)
{
    const byte START = 0x5A;
    Stream input = binaryReader.BaseStream;

    // Loop until the end of the input (assumes a seekable stream such as a
    // FileStream). PeekChar() decodes characters and is not reliable on
    // arbitrary binary data.
    while (input.Position < input.Length)
    {
        // Check the first byte, which should be equal to 0x5A
        if (binaryReader.ReadByte() != START)
        {
            throw new Exception("0x5A Expected");
        }
        // Extract the length field value
        byte[] length = binaryReader.ReadBytes(2);
        // Convert the length field to decimal
        int decimalLength = GetLength(length);
        // Extract the content field value
        byte[] content = binaryReader.ReadBytes(decimalLength - 2);
        // DO WORK
        // modifying the content
        // Writing the record
        Write(binaryWriter, content, length, START);
    }
}

// Not shown in the original post; assumed here to combine the two length
// bytes big-endian, so that 0x00 0x16 gives 22.
private static int GetLength(byte[] length)
{
    return (length[0] << 8) | length[1];
}

private void Write(BinaryWriter binaryWriter, byte[] content, byte[] length, byte START)
{
    binaryWriter.Write(START);
    binaryWriter.Write(length);
    binaryWriter.Write(content);
}
This approach actually works.
However, since I am dealing with very large files, I find it to be not performing at all, because I read and write 3 times for each record. I would like to read big chunks of data instead of small amounts of bytes, and maybe work in memory, but my experience with Stream stops at BinaryReader and BinaryWriter. Thanks in advance.
Comments (2)
FileStream is already buffered, so I'd expect it to work pretty well. You could always create a BufferedStream around the original stream to add extra buffering if you really need to, but I doubt it would make a significant difference.
You say it's "not performing at all" - how fast is it working? How sure are you that the IO is where your time is going? Have you performed any profiling of the code?
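As a rough way to try both suggestions, the sketch below wraps the files in BufferedStream with an explicit buffer size and times the whole run with Stopwatch. The names BufferedCopy, Run and processRecords are hypothetical; processRecords stands in for the Read method from the question, and a wall-clock timing like this is only a first check, not a substitute for a profiler.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BufferedCopy
{
    // Opens input and output behind BufferedStream wrappers of the given
    // size, runs the record-processing delegate, and returns the elapsed time.
    public static TimeSpan Run(string inputPath, string outputPath, int bufferSize,
                               Action<BinaryReader, BinaryWriter> processRecords)
    {
        var stopwatch = Stopwatch.StartNew();
        using (var input = new BufferedStream(File.OpenRead(inputPath), bufferSize))
        using (var output = new BufferedStream(File.Create(outputPath), bufferSize))
        using (var reader = new BinaryReader(input))
        using (var writer = new BinaryWriter(output))
        {
            processRecords(reader, writer);
        }
        // Leaving the using block flushes and closes the writer and its streams.
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Running it with a few different buffer sizes (for example 4 KB, 64 KB, 1 MB) on the same input should show fairly quickly whether buffering matters or whether the time is going elsewhere.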
I might also suggest that you read 3 (or 6?) bytes initially, instead of 2 separate reads. Put the initial bytes in a small array, check the 0x5A check byte, then the 2-byte length indicator, then the 3-byte AFP op-code, and THEN read the remainder of the AFP record.
It's a small difference, but it gets rid of one of your read calls.
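A minimal sketch of that idea, reading only the 3 header bytes described in the question (START plus LENGTH) with one call. The names HeaderReader and ReadContentLength are hypothetical, big-endian LENGTH is assumed from the 0x00 0x16 = 22 example, and the 3-byte AFP op-code is left out because the question does not confirm the file is AFP.

```csharp
using System;
using System.IO;

public static class HeaderReader
{
    // Reads START + LENGTH with a single ReadBytes(3) call and returns the
    // number of CONTENT bytes that follow, or -1 at a clean end of stream.
    public static int ReadContentLength(BinaryReader reader)
    {
        byte[] header = reader.ReadBytes(3);
        if (header.Length == 0) return -1;           // no more records
        if (header.Length < 3)
            throw new EndOfStreamException("Truncated record header");
        if (header[0] != 0x5A)
            throw new InvalidDataException("0x5A expected");

        // Assumption: big-endian LENGTH that includes its own 2 bytes.
        int length = (header[1] << 8) | header[2];
        if (length < 2)
            throw new InvalidDataException("LENGTH must be at least 2");
        return length - 2;
    }
}
```

A caller would loop on ReadContentLength until it returns -1, following each call with ReadBytes(contentLength). This also removes the need for a separate end-of-file check before each record.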
I'm no Jon Skeet, but I did work at one of the biggest print & mail shops in the country for quite a while, and we did mostly AFP output :-)
(usually in C, though)