C#: Decoding (decompressing) Deflate data from a PDF file

Posted on 2025-01-04 05:54:15

I would like to decompress some DeflateCoded data (extracted from a PDF) in C#.
Unfortunately, I get the exception "Found invalid data while decoding." every time.
But the data is valid.

private void Decompress()
{
    FileStream fs = new FileStream(@"S:\Temp\myFile.bin", FileMode.Open);

    //First two bytes are irrelevant
    fs.ReadByte();
    fs.ReadByte();

    DeflateStream d_Stream = new DeflateStream(fs, CompressionMode.Decompress);

    StreamToFile(d_Stream, @"S:\Temp\myFile1.txt", FileMode.OpenOrCreate);

    d_Stream.Close();
    fs.Close();
}

private static void StreamToFile(Stream inputStream, string outputFile, FileMode fileMode)
{
    if (inputStream == null)
        throw new ArgumentNullException("inputStream");

    if (String.IsNullOrEmpty(outputFile))
        throw new ArgumentException("Argument null or empty.", "outputFile");

    using (FileStream outputStream = new FileStream(outputFile, fileMode, FileAccess.Write))
    {
        int cnt = 0;
        const int LEN = 4096;
        byte[] buffer = new byte[LEN];

        while ((cnt = inputStream.Read(buffer, 0, LEN)) != 0)
            outputStream.Write(buffer, 0, cnt);
    }
}

Does anyone have any ideas?
Thanks.

Comments (4)

執念 2025-01-11 05:54:15

I added this for test data:-

private static void Compress()
{
  FileStream fs = new FileStream(@"C:\Temp\myFile.bin", FileMode.Create);

  DeflateStream d_Stream = new DeflateStream(fs, CompressionMode.Compress);
  for (byte n = 0; n < 255; n++)
    d_Stream.WriteByte(n);
  d_Stream.Close();
  fs.Close();
}

Modified Decompress like this:-

private static void Decompress()
{
  FileStream fs = new FileStream(@"C:\Temp\myFile.bin", FileMode.Open);

  //First two bytes are irrelevant
  //      fs.ReadByte();
  //      fs.ReadByte();

  DeflateStream d_Stream = new DeflateStream(fs, CompressionMode.Decompress);

  StreamToFile(d_Stream, @"C:\Temp\myFile1.txt", FileMode.OpenOrCreate);

  d_Stream.Close();
  fs.Close();
}

Ran it like this:-

static void Main(string[] args)
{
  Compress();
  Decompress();
}

And got no errors.

I conclude that either the first two bytes are relevant (they obviously are with my particular test data) or your data has a problem.

Can we have some of your test data to play with?

(Obviously don't if it's sensitive)
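
The round trip above works because DeflateStream reads and writes raw deflate data with no wrapper, whereas Flate-encoded PDF streams normally carry a zlib (RFC 1950) wrapper. For test data closer to the PDF case, here is a minimal sketch that assumes .NET 6 or later, where System.IO.Compression.ZLibStream exists. The names ZlibTestData and MakeZlibSample are made up for this illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZlibTestData
{
    // Hypothetical helper: wraps the payload in an RFC 1950 (zlib) stream.
    public static byte[] MakeZlibSample(byte[] payload)
    {
        using var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            zlib.Write(payload, 0, payload.Length);
        } // disposing the ZLibStream flushes the compressed data and the trailer
        return output.ToArray();
    }

    public static void Main()
    {
        // Same test payload as in the Compress() method above: bytes 0..254.
        byte[] payload = new byte[255];
        for (int n = 0; n < payload.Length; n++) payload[n] = (byte)n;

        byte[] sample = MakeZlibSample(payload);

        // Print the two header bytes that a raw DeflateStream does not expect.
        Console.WriteLine($"Header bytes: {sample[0]:X2} {sample[1]:X2}");
    }
}
```

Feeding such a sample to Decompress() with and without the two ReadByte() calls should show whether the header bytes are what triggers the exception.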

唯憾梦倾城 2025-01-11 05:54:15

private static string decompress(byte[] input)
{
    byte[] cutinput = new byte[input.Length - 2];
    Array.Copy(input, 2, cutinput, 0, cutinput.Length);

    var stream = new MemoryStream();

    using (var compressStream = new MemoryStream(cutinput))
    using (var decompressor = new DeflateStream(compressStream, CompressionMode.Decompress))
        decompressor.CopyTo(stream);

    return Encoding.Default.GetString(stream.ToArray());
}

Thank you user159335 and user1011394 for putting me on the right track! Just pass all bytes of the stream to the function above as input. Make sure the byte count matches the length specified for the stream (presumably the /Length entry of the PDF stream dictionary).
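
Skipping two bytes unconditionally assumes the data always starts with a zlib header. A slightly more defensive sketch checks the header first, using the RFC 1950 rules: the low nibble of the first byte is 8 for deflate, and the two bytes read as a big-endian 16-bit number are a multiple of 31. The names PdfFlate, HasZlibHeader and Inflate are hypothetical, and the sketch assumes no preset dictionary (FDICT) is in use.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class PdfFlate
{
    // True when the first two bytes form a valid RFC 1950 header for deflate.
    public static bool HasZlibHeader(byte[] data)
    {
        if (data == null || data.Length < 2) return false;
        int cmf = data[0], flg = data[1];
        bool isDeflate = (cmf & 0x0F) == 8;           // CM = 8 means deflate
        bool checkOk = ((cmf << 8) | flg) % 31 == 0;  // FCHECK makes the value a multiple of 31
        return isDeflate && checkOk;
    }

    // Skips the two header bytes when present, then inflates the raw deflate data.
    // The 4-byte Adler-32 trailer is not verified here.
    public static byte[] Inflate(byte[] data)
    {
        int offset = HasZlibHeader(data) ? 2 : 0;
        using var input = new MemoryStream(data, offset, data.Length - offset);
        using var inflater = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflater.CopyTo(output);
        return output.ToArray();
    }
}
```

Returning byte[] instead of a string leaves the choice of text encoding to the caller, since not every PDF stream holds text. The header check is a heuristic: raw deflate data could in principle begin with two bytes that pass it.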

冷月断魂刀 2025-01-11 05:54:15

All you need to do is use GZip instead of Deflate. Below is the code I use for the content of the stream… endstream section in a PDF document:

        using System;
        using System.IO;
        using System.IO.Compression;

        public void DecompressStreamData(byte[] data)
        {
            // Skip the CR/LF bytes that follow the "stream" keyword.
            int start = 0;
            while (start < data.Length && (data[start] == 0x0a || data[start] == 0x0d)) start++;

            byte[] tempdata = new byte[data.Length - start];
            Array.Copy(data, start, tempdata, 0, data.Length - start);

            MemoryStream msInput = new MemoryStream(tempdata);
            MemoryStream msOutput = new MemoryStream();
            try
            {
                // Dispose the GZipStream once the data has been copied.
                using (GZipStream decomp = new GZipStream(msInput, CompressionMode.Decompress))
                {
                    decomp.CopyTo(msOutput);
                }
                // On success, msOutput holds the decompressed bytes.
            }
            catch (Exception e)
            {
                MessageBox.Show(e.Message);
            }
        }

一口甜 2025-01-11 05:54:15

None of these solutions worked for me on Deflate attachments in a PDF/A-3 document. Some research showed that the .NET DeflateStream does not support compressed streams with a header and trailer as per RFC 1950.

Error message for reference: The archive entry was compressed using an unsupported compression method.

The solution is to use an alternative library, SharpZipLib.

Here is a simple method that successfully decoded a Deflate attachment from a PDF/A-3 file for me:

using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

public static string SZLDecompress(byte[] data) {
    var outputStream = new MemoryStream();
    using var compressedStream = new MemoryStream(data);
    // InflaterInputStream handles the zlib header and trailer itself.
    using var inputStream = new InflaterInputStream(compressedStream);
    inputStream.CopyTo(outputStream);
    outputStream.Position = 0;
    return Encoding.Default.GetString(outputStream.ToArray());
}
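
For projects that can target .NET 6 or later, the framework's own System.IO.Compression.ZLibStream also reads the RFC 1950 wrapper, so an external library may not be needed. The following is a minimal sketch under that assumption. The names ZlibInflate and Decompress are hypothetical, and it has not been tested against PDF/A-3 attachments.

```csharp
using System.IO;
using System.IO.Compression;

static class ZlibInflate
{
    // Assumes .NET 6 or later, where System.IO.Compression.ZLibStream is available.
    public static byte[] Decompress(byte[] data)
    {
        using var input = new MemoryStream(data);
        // ZLibStream consumes the zlib header and trailer around the deflate data.
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```

As with the SharpZipLib version, pass in all bytes of the stream including the two header bytes. The caller can convert the result to text with whichever encoding fits the content.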