Reading a compressed file and writing it to a new file makes it impossible to decompress
I have a test program that demonstrates the end result that I am hoping for (even though in this test program the steps may seem unnecessary).
The program compresses data to a file using GZipStream. The resulting compressed file is C:\mydata.dat.
I then read this file, and write it to a new file.
```csharp
// Read original file
string compressedFile = String.Empty;
using (StreamReader reader = new StreamReader(@"C:\mydata.dat"))
{
    compressedFile = reader.ReadToEnd();
    reader.Close();
    reader.Dispose();
}

// Write to a new file
using (StreamWriter file = new StreamWriter(@"C:\mynewdata.dat"))
{
    file.WriteLine(compressedFile);
}
```
When I try to decompress the two files, the original one decompresses perfectly, but the new file throws an InvalidDataException with the message "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
Why are these files different?
2 Answers
StreamReader is for reading a sequence of characters, not bytes. The same applies to StreamWriter. Since treating a compressed file as a stream of characters doesn't make any sense, you should use some implementation of Stream. If you want to get the stream as an array of bytes, you can use MemoryStream.

The exact reason why using character streams doesn't work is that they assume the UTF-8 encoding by default. If some byte is not valid UTF-8 (like the second byte of the GZip header, 0x8B), it is represented as the Unicode "replacement character" (U+FFFD). When the string is written back, that character is encoded using UTF-8 into something completely different from what was in the source.

For example, to read a file from a stream, get it as an array of bytes and then write it to another file as a stream:
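The code block that originally followed this sentence did not survive extraction. A minimal sketch of what such a stream-based copy might look like, assuming the file paths from the question (this is a reconstruction, not necessarily the answer's original code):

```csharp
using System.IO;

class CopyViaMemoryStream
{
    static void Main()
    {
        byte[] bytes;

        // Read the raw bytes through a Stream; no text decoding happens here.
        using (FileStream input = File.OpenRead(@"C:\mydata.dat"))
        using (MemoryStream memory = new MemoryStream())
        {
            input.CopyTo(memory);        // Stream.CopyTo is available in .NET 4 and later
            bytes = memory.ToArray();
        }

        // Write the same bytes back out, untouched by any encoding.
        using (FileStream output = File.Create(@"C:\mynewdata.dat"))
        {
            output.Write(bytes, 0, bytes.Length);
        }
    }
}
```

Because every byte passes through as-is, the GZip magic number (0x1F 0x8B) in the header survives the round trip.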
The CopyTo() method is only available in .NET 4 and later, but you can write your own if you are using an older version.

Of course, for this simple example, there is no need to use streams at all. You can simply do:
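The simpler code block referenced here was also lost in extraction; given the surrounding text, it was presumably a File.ReadAllBytes/File.WriteAllBytes one-liner along these lines (paths assumed from the question):

```csharp
using System.IO;

// Copy the file byte-for-byte, with no character decoding involved.
File.WriteAllBytes(@"C:\mynewdata.dat", File.ReadAllBytes(@"C:\mydata.dat"));
```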
EDIT: Apparently, my suggestions are wrong/invalid/whatever... please use one of the others, which have no doubt been highly refactored to the point where no extra performance could possibly be achieved (otherwise, that would mean they are just as invalid as mine)
Read all bytes / write all bytes (from svick's answer):
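The code this line introduced was not preserved; it was presumably the same read-all-bytes/write-all-bytes one-liner from svick's answer, i.e. something like:

```csharp
using System.IO;

// svick's "simple" alternative: a byte-for-byte copy with no streams in user code.
File.WriteAllBytes(@"C:\mynewdata.dat", File.ReadAllBytes(@"C:\mydata.dat"));
```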
PERFORMANCE TESTING WITH OTHER ANSWERS:
Just did a quick test between my answer (StreamReader) (first part above, file copy) and svick's answer (FileStream/MemoryStream) (the first one). The test is 1000 iterations of the code; here are the results from 4 tests (results are in whole seconds; all actual results were slightly over these values):
As you can see, in my test at least, my code performed better. One thing perhaps to note with mine is that I am not reading a character stream; I am in fact accessing the BaseStream, which provides a byte stream. Perhaps svick's answer is slow because he is using two streams for reading and then two for writing. Of course, there is a lot of optimisation that could be done to svick's answer to improve the performance (and he also provided an alternative for simple file copying).
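The answer's own code block did not survive extraction. A minimal sketch of a copy that goes through StreamReader.BaseStream as described above (paths assumed from the question; a reconstruction, not the author's exact code):

```csharp
using System.IO;

class CopyViaBaseStream
{
    static void Main()
    {
        // A StreamReader is opened, but its character decoding is bypassed
        // entirely by copying the underlying byte stream (BaseStream) directly.
        using (StreamReader reader = new StreamReader(@"C:\mydata.dat"))
        using (FileStream output = File.Create(@"C:\mynewdata.dat"))
        {
            reader.BaseStream.CopyTo(output);   // raw byte copy, .NET 4 and later
        }
    }
}
```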
Testing with the third option (ReadAllBytes/WriteAllBytes):

Note: in milliseconds, the third option was always better.