Java: CRC error when using setDictionary for the Deflater of a GZIPOutputStream

Posted on 2025-01-03 15:47:54

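I'm trying to take a stream of data from standard input, compress it one 128-byte block at a time, and write the result to standard output. (Example: "cat file.txt | java Dict | gzip -d | cmp file.txt", where file.txt just contains some ASCII characters.)

For each subsequent block, I also need to use a 32-byte dictionary taken from the end of the previous 128-byte block. (The first block uses its own first 32 bytes as its dictionary.) When I don't set the dictionary at all, the compression works fine. However, when I do set the dictionary, gzip reports an error when it tries to decompress the data: "gzip: stdin: invalid compressed data--crc error".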

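I've tried adding and changing several parts of the code, but nothing has worked so far, and I haven't had any luck finding a solution with Google.

I've tried the following, none of which worked:

  • Adding "def.reset()" before "def.setDictionary(b)" near the bottom of the code.
  • Setting the dictionary only for blocks after the first one. (The first block then uses no dictionary.)
  • Calling updateCRC with the "input" array before or after compressor.write(input, 0, bytesRead).

I'd really appreciate any suggestions. Is there anything obvious I'm missing or doing wrong?

This is what I have in my Dict.java file: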

import java.io.*;
import java.util.zip.GZIPOutputStream;

public class Dict {
  protected static final int BLOCK_SIZE = 128;
  protected static final int DICT_SIZE = 32;

  public static void main(String[] args) {
    InputStream stdinBytes = System.in;
    byte[] input = new byte[BLOCK_SIZE];
    byte[] dict = new byte[DICT_SIZE];
    int bytesRead = 0;

    try {
        DictGZIPOuputStream compressor = new DictGZIPOuputStream(System.out);
        bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);
        if (bytesRead >= DICT_SIZE) {
            System.arraycopy(input, 0, dict, 0, DICT_SIZE);
            compressor.setDictionary(dict);
        }

        do {
            compressor.write(input, 0, bytesRead);
            compressor.flush();

            if (bytesRead == BLOCK_SIZE) {
                System.arraycopy(input, BLOCK_SIZE-DICT_SIZE-1, dict, 0, DICT_SIZE);
                compressor.setDictionary(dict);
            }
            bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);
        } while (bytesRead > 0);

        compressor.finish();
    }
    catch (IOException e) {e.printStackTrace();}
  }

  public static class DictGZIPOuputStream extends GZIPOutputStream {
    public DictGZIPOuputStream(OutputStream out) throws IOException {
        super(out);
    }

    public void setDictionary(byte[] b) {
        def.setDictionary(b);
    }
    public void updateCRC(byte[] input) {
        crc.update(input);
    }
  }
}
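
For reference, here is a minimal sketch of how a preset dictionary behaves in java.util.zip outside of gzip. It assumes the default zlib wrapper of Deflater rather than the raw deflate stream that GZIPOutputStream writes, and the names DictRoundTrip, compress and decompress are hypothetical, not part of the code above. It illustrates that the decompressing side has to be given the same dictionary through Inflater.setDictionary before it can decode the data, which a plain "gzip -d" pipeline has no way to supply.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DictRoundTrip {
    // Compress data in zlib format against a preset dictionary.
    static byte[] compress(byte[] data, byte[] dict) {
        Deflater def = new Deflater();
        def.setDictionary(dict); // set before the first deflate() call
        def.setInput(data);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!def.finished()) {
            out.write(buf, 0, def.deflate(buf));
        }
        def.end();
        return out.toByteArray();
    }

    // Decompress, supplying the dictionary when the Inflater asks for it.
    static byte[] decompress(byte[] packed, byte[] dict) {
        Inflater inf = new Inflater();
        inf.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0) {
                    if (inf.needsDictionary()) {
                        inf.setDictionary(dict); // must match the compressor's dictionary
                    } else if (inf.needsInput()) {
                        break; // input ended early
                    }
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] dict = "hello world, how are you?".getBytes(StandardCharsets.US_ASCII);
        byte[] data = "hello world, how are you?1e3djw adfa asdfas".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = compress(data, dict);
        System.out.println(Arrays.equals(data, decompress(packed, dict))); // prints true
    }
}
```

In this sketch the first inflate() call returns 0 with needsDictionary() set, and decoding only proceeds once the same 25 dictionary bytes are supplied.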



Comments (1)

挽你眉间 2025-01-10 15:47:54

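I don't know exactly how the zlib algorithm works internally, but as I understand DictGZIPOutputStream, calling write() also updates the CRC for the bytes that were written. If you then call updateCRC() on the same data in your own code, the CRC is updated twice and no longer matches the data. When gzip -d runs, it complains "invalid compressed data--crc error" as a result.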
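
The double-update point can be sketched as follows. This is an illustration under assumptions, not the code from the question: CrcTwice, PeekGzip, plainCrc and crcAfterWrite are hypothetical names, and PeekGzip only exposes the protected crc field that GZIPOutputStream already maintains.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

public class CrcTwice {
    // Hypothetical subclass that exposes the inherited protected crc field.
    static class PeekGzip extends GZIPOutputStream {
        PeekGzip(OutputStream out) throws IOException {
            super(out);
        }

        long crcValue() {
            return crc.getValue();
        }

        void updateCRC(byte[] input) {
            crc.update(input);
        }
    }

    // CRC-32 of the data, computed independently of any stream.
    static long plainCrc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    // CRC held by the stream after one write(), optionally followed by a manual update.
    static long crcAfterWrite(byte[] data, boolean extraUpdate) {
        try (PeekGzip gz = new PeekGzip(new ByteArrayOutputStream())) {
            gz.write(data, 0, data.length);
            if (extraUpdate) {
                gz.updateCRC(data); // second update over the same bytes
            }
            return gz.crcValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] block = "123456789".getBytes(StandardCharsets.US_ASCII);
        long expected = plainCrc(block);
        System.out.println(crcAfterWrite(block, false) == expected); // prints true
        System.out.println(crcAfterWrite(block, true) == expected);  // prints false
    }
}
```

After a single write() the stream's CRC already equals the CRC-32 of the data, so the extra update makes the trailer describe the data twice over.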

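I also noticed that you did not close the compressor after using it. When I ran the code pasted above, it gave the error "gzip: stdin: unexpected end of file". So always make sure that flush() and close() are called at the end. With that said, I have the following: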

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;


public class Dict
{
    protected static final int BLOCK_SIZE = 128;
    protected static final int DICT_SIZE = 32;

    public static void main(String[] args)
    {
        InputStream stdinBytes = System.in;
        byte[] input = new byte[BLOCK_SIZE];
        byte[] dict = new byte[DICT_SIZE];
        int bytesRead = 0;

        try
        {
            DictGZIPOutputStream compressor = new DictGZIPOutputStream(System.out);
            bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);

            // Copied but not applied: the first block is written without a dictionary.
            if (bytesRead >= DICT_SIZE)
            {
                System.arraycopy(input, 0, dict, 0, DICT_SIZE);
            }

            // A plain while loop also covers empty input, where the first read() returns -1.
            while (bytesRead > 0)
            {
                // write() updates the CRC itself, so updateCRC() is not called here.
                compressor.write(input, 0, bytesRead);

                if (bytesRead == BLOCK_SIZE)
                {
                    // Take the last DICT_SIZE bytes of this block as the next dictionary.
                    System.arraycopy(input, BLOCK_SIZE - DICT_SIZE, dict, 0, DICT_SIZE);
                    compressor.setDictionary(dict);
                }

                bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);
            }
            compressor.flush();
            // close() finishes the stream and writes the gzip trailer.
            compressor.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

    }

    public static class DictGZIPOutputStream extends GZIPOutputStream
    {

        public DictGZIPOutputStream(OutputStream out) throws IOException
        {
            super(out);
        }

        public void setDictionary(byte[] b)
        {
            def.setDictionary(b);
        }

        public void updateCRC(byte[] input)
        {
            crc.update(input);
        }
    }

}

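The test result at the console (file.txt here is shorter than one 128-byte block, so the setDictionary call is not reached in this run):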

$ cat file.txt 
hello world, how are you?1e3djw
hello world, how are you?1e3djw adfa asdfas

$ cat file.txt | java Dict | gzip -d | cmp file.txt ; echo $?
0
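
The point about closing the stream can be sketched in isolation. This is a minimal illustration assuming in-memory streams instead of stdin and stdout; CloseDemo, gzip and gunzip are hypothetical names. It compares the bytes produced with and without close() and checks that the closed output round-trips through GZIPInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CloseDemo {
    // Gzip the data; when close is false the stream is only flushed, so no trailer is written.
    static byte[] gzip(byte[] data, boolean close) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            GZIPOutputStream gz = new GZIPOutputStream(sink);
            gz.write(data, 0, data.length);
            gz.flush();
            if (close) {
                gz.close(); // finishes the deflate stream and appends the gzip trailer
            }
            // The unclosed case is left open on purpose, for demonstration only.
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompress a complete gzip stream back into bytes.
    static byte[] gunzip(byte[] packed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello world, how are you?1e3djw".getBytes(StandardCharsets.US_ASCII);
        byte[] closed = gzip(data, true);
        byte[] unclosed = gzip(data, false);
        System.out.println(unclosed.length < closed.length);     // prints true
        System.out.println(Arrays.equals(data, gunzip(closed))); // prints true
    }
}
```

In real code a try-with-resources block around the GZIPOutputStream gives the same guarantee without an explicit close() call.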
