Raw stream has data, but Deflate returns zero bytes

Posted 2024-10-18 01:33:12

I'm reading data (an adCenter report, as it happens) that is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple of thousand bytes of gibberish, so this seems plausible. So I feed the stream to DeflateStream.

First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.

However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?

Here's the code. Naturally, I enabled only one of the two commented-out blocks at a time when testing.

_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
  {
  // Skip the zlib prefix, which conflicts with the deflate specification
  compressed.ReadByte();  compressed.ReadByte();

  // Reports reading 3,000-odd bytes, followed by random characters
  /*byte[]  buffer    = new byte[4096];
  int     bytesRead = compressed.Read(buffer, 0, 4096);
  Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
  string  content   = Encoding.ASCII.GetString(buffer, 0, bytesRead);
  Console.WriteLine(content);*/

  using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
    {
    // Reports reading 0 bytes, and no output
    /*byte[]  buffer    = new byte[4096];
    int     bytesRead = decompressed.Read(buffer, 0, 4096);
    Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
    string  content   = Encoding.ASCII.GetString(buffer, 0, bytesRead);
    Console.WriteLine(content);*/

    using (StreamReader reader = new StreamReader(decompressed))
      while (reader.EndOfStream == false)
        _results.Add(reader.ReadLine().Split('\t'));
    }
  }

As you can probably guess from the last line, the unzipped content should be TDT (tab-delimited text).

Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. Microsoft's docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
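The symptoms described so far are consistent with the payload being a ZIP archive rather than a bare deflate or zlib stream. A ZIP local file header starts with the bytes 50 4B 03 04 ("PK" followed by 03 04). Read as raw deflate, 0x50 announces a stored block whose length fields then fail the complement check. After skipping "PK", the bytes 03 04 decode as a final fixed-Huffman block that ends immediately, so the inflater should yield zero bytes, which matches what is reported above. The sketch below checks the leading bytes; FormatSniffer, Sniff and InflateCount are hypothetical names, and the sample header bytes are an assumption, not taken from the actual report.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FormatSniffer
{
    // Guess the container format from the first bytes of the payload.
    public static string Sniff(byte[] head)
    {
        // ZIP local file header signature: "PK" 0x03 0x04
        if (head.Length >= 4 && head[0] == 0x50 && head[1] == 0x4B && head[2] == 0x03 && head[3] == 0x04)
            return "zip";
        // gzip magic number
        if (head.Length >= 2 && head[0] == 0x1F && head[1] == 0x8B)
            return "gzip";
        // zlib header: compression method 8, and the 16-bit header is a multiple of 31
        if (head.Length >= 2 && (head[0] & 0x0F) == 8 && ((head[0] << 8) | head[1]) % 31 == 0)
            return "zlib";
        return "unknown (possibly raw deflate)";
    }

    // Inflate the buffer as raw deflate from the given offset; return how many bytes come out.
    public static int InflateCount(byte[] data, int offset)
    {
        using (var source = new MemoryStream(data, offset, data.Length - offset))
        using (var inflater = new DeflateStream(source, CompressionMode.Decompress))
        using (var sink = new MemoryStream())
        {
            inflater.CopyTo(sink);
            return (int)sink.Length;
        }
    }

    public static void Main()
    {
        // Assumed sample: the first bytes of a typical ZIP local file header.
        byte[] head = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00, 0x08, 0x00 };
        Console.WriteLine(Sniff(head));
        // Skipping "PK" and inflating the remainder as raw deflate.
        Console.WriteLine(InflateCount(head, 2));
    }
}
```

If Sniff reports "zip", neither DeflateStream nor GZipStream is the right tool on its own; the archive has to be opened with a ZIP reader first.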

Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with, it's acceptable. I'll take it!

// Assumes: using System.IO; using System.Net; using Ionic.Zip;
// (ZipFile and ZipEntry appear to be the DotNetZip types; that is an assumption)
WebRequest   request  = WebRequest.Create(reportURL);
WebResponse  response = request.GetResponse();

_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
  {
  // Save the content to a temporary location (binary copy, no text encoding involved)
  string  zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
  using (FileStream file = File.Create(zipFilePath))
    compressed.CopyTo(file);

  // Get the first (and only expected) file from the temporary zip
  using (ZipFile zipFile = ZipFile.Read(zipFilePath))
    {
    if (zipFile.Entries.Count != 1)  throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
    ZipEntry  report = zipFile[0];

    // Extract the data
    using (MemoryStream decompressed = new MemoryStream())
      {
      report.Extract(decompressed);
      decompressed.Position = 0;  // Extract leaves the position at the end of the data, so rewind before reading
      using (StreamReader reader = new StreamReader(decompressed))
        while (!reader.EndOfStream)
          _results.Add(reader.ReadLine().Split('\t'));
      }
    }
  }
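The round trip through a temporary file can probably be avoided by buffering the response in memory. The following is a minimal sketch of that idea, not the code used above: it swaps in System.IO.Compression.ZipArchive (assuming .NET Framework 4.5 or later) instead of the ZipFile/ZipEntry types, and ReportReader and ReadRows are hypothetical names. The Main method builds a small stand-in archive rather than downloading a real report.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ReportReader
{
    // Reads the single tab-delimited entry of a ZIP archive supplied as a stream.
    public static List<string[]> ReadRows(Stream zipped)
    {
        var rows = new List<string[]>();

        // A network stream is not seekable, but a ZIP reader needs the central
        // directory at the end of the archive, so copy everything into memory first.
        using (var buffer = new MemoryStream())
        {
            zipped.CopyTo(buffer);
            buffer.Position = 0;

            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Read))
            {
                if (archive.Entries.Count != 1)
                    throw new InvalidDataException("Found " + archive.Entries.Count + " entries in the report; expected 1.");

                using (var reader = new StreamReader(archive.Entries[0].Open()))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                        rows.Add(line.Split('\t'));
                }
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Build a small archive in memory to stand in for the downloaded report.
        var zipBytes = new MemoryStream();
        using (var archive = new ZipArchive(zipBytes, ZipArchiveMode.Create, true))
        using (var writer = new StreamWriter(archive.CreateEntry("report.tsv").Open()))
            writer.Write("a\tb\n1\t2\n");
        zipBytes.Position = 0;

        foreach (string[] row in ReadRows(zipBytes))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

In the real program, the stream returned by response.GetResponseStream() would be passed to ReadRows. Buffering the whole archive is reasonable for reports of a few kilobytes; much larger downloads may still be better served by a temporary file.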

墨小沫ゞ 2024-10-25 01:33:12

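You'll find that DeflateStream is very limited in what data it can decompress. In fact, if you are expecting whole files, it is of no use at all.
There are hundreds of (mostly minor) variations of ZIP files, and DeflateStream gets along with only two or three of them.

The best way is probably to use a dedicated library for reading Zip files/streams, such as DotNetZip or SharpZipLib (somewhat unmaintained).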

韵柒 2024-10-25 01:33:12

You could write the stream to a file and try my tool Precomp on it. If you use it like this:

precomp -c- -v [name of input file]

any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).

Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.

If it doesn't detect a stream like this, try to add -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If this fails, you can try -brute which even detects deflate streams that lack the two byte header, but this will be extremely slow and can cause false positives.

After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.
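To make the "two byte header" concrete: a zlib stream (RFC 1950) is a 2-byte header, a raw deflate body (RFC 1951), and a 4-byte big-endian Adler-32 checksum. The sketch below builds such a stream by hand and then reads it back with DeflateStream after validating and skipping the header. ZlibDemo and its methods are hypothetical names, and the 0x78 0x9C header is just one common valid value. This applies only when the data really is zlib-wrapped; it does not turn a ZIP archive into something DeflateStream can read.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZlibDemo
{
    // Adler-32 checksum as defined in RFC 1950.
    public static uint Adler32(byte[] data)
    {
        uint a = 1, b = 0;
        foreach (byte x in data)
        {
            a = (a + x) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    // Wrap data as a zlib stream: header, raw deflate body, big-endian Adler-32.
    public static byte[] ZlibCompress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            output.WriteByte(0x78);
            output.WriteByte(0x9C);
            using (var deflater = new DeflateStream(output, CompressionMode.Compress, true))
                deflater.Write(data, 0, data.Length);
            uint adler = Adler32(data);
            output.WriteByte((byte)(adler >> 24));
            output.WriteByte((byte)(adler >> 16));
            output.WriteByte((byte)(adler >> 8));
            output.WriteByte((byte)adler);
            return output.ToArray();
        }
    }

    // Validate the 2-byte header, skip it, and inflate the raw deflate body.
    // The trailing checksum is not verified in this sketch.
    public static byte[] ZlibDecompress(byte[] zlibData)
    {
        if (zlibData.Length < 2 || (zlibData[0] & 0x0F) != 8 || ((zlibData[0] << 8) | zlibData[1]) % 31 != 0)
            throw new InvalidDataException("Not a zlib stream.");

        using (var source = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var inflater = new DeflateStream(source, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            inflater.CopyTo(result);
            return result.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = Encoding.ASCII.GetBytes("col1\tcol2\nval1\tval2\n");
        byte[] roundTripped = ZlibDecompress(ZlibCompress(original));
        Console.WriteLine(Encoding.ASCII.GetString(roundTripped));
    }
}
```

On .NET 6 and later, System.IO.Compression.ZLibStream handles the header and checksum directly, so the manual skip is mainly of interest on older frameworks.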
