
Compression on Java NIO direct buffers

Posted on 2024-12-25 06:22:39

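The gzip input/output streams don't operate on Java direct buffers.

Is there a compression algorithm implementation that operates directly on direct buffers?

That way there would be no overhead from copying the direct buffer into a Java byte array for compression.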

淡淡的优雅 2025-01-01 06:22:39

I don't mean to detract from your question, but is this really a good optimization point in your program? Have you verified with a profiler that you indeed have a problem? Your question as stated implies you have not done any research, but are merely guessing that you will have a performance or memory problem by allocating a byte[]. Since all the answers in this thread are likely to be hacks of some sort, you should really verify that you actually have a problem before fixing it.

Back to the question: if you want to compress the data "in place" in a ByteBuffer, the answer is no, there is no capability built into Java to do that.

If you allocated your buffer like the following:

byte[] bytes = getMyData();
ByteBuffer buf = ByteBuffer.wrap(bytes);

You can filter your byte[] through a ByteBufferInputStream as the previous answer suggested.
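For a heap buffer created with ByteBuffer.wrap, the backing array is accessible, so a Deflater can read from it without an intermediate copy. Below is a minimal sketch of that idea, assuming the buffer reports hasArray() (direct and read-only buffers do not). The class name HeapBufferDeflate and its deflate method are hypothetical, not something from this thread.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.Deflater;

final class HeapBufferDeflate {

    // Compresses the remaining bytes of a heap-backed buffer into a new byte[].
    // Assumption: src.hasArray() is true, e.g. src came from ByteBuffer.wrap(...).
    static byte[] deflate(ByteBuffer src) {
        if (!src.hasArray()) {
            throw new IllegalArgumentException("buffer has no accessible backing array");
        }
        var def = new Deflater();
        try {
            // Hand the backing array to the Deflater directly; no copy is made here.
            def.setInput(src.array(), src.arrayOffset() + src.position(), src.remaining());
            def.finish();

            var out = new byte[64];
            int n = 0;
            while (!def.finished()) {
                if (n == out.length) {
                    out = Arrays.copyOf(out, out.length * 2); // grow the output as needed
                }
                n += def.deflate(out, n, out.length - n);
            }
            src.position(src.limit()); // mark the input as consumed
            return Arrays.copyOf(out, n);
        } finally {
            def.end();
        }
    }
}
```

This only avoids the copy on the input side, and it does not help with direct buffers, which is what the question is about.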


赤濁 2025-01-01 06:22:39

Wow, old question, but I stumbled upon this today.

Probably some libs like zip4j can handle this, but you can get the job done with no external dependencies since Java 11:

If you are interested only in compressing data, you can just do:

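void compress(ByteBuffer src, ByteBuffer dst) {
    // nowrap = true: emit a raw deflate stream without the zlib header and checksum
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        def.deflate(dst, Deflater.SYNC_FLUSH);

        // src can be fully consumed while output is still pending inside the
        // Deflater, so check finished() rather than src.hasRemaining()
        if (!def.finished()) {
            throw new RuntimeException("dst too small");
        }
    } finally {
        def.end();
    }
}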

Both src and dst will change positions, so you might have to flip them after compress returns.

In order to recover compressed data:

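void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    // nowrap = true: expect a raw deflate stream, matching compress above
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        inf.inflate(dst);

        // finished() is true only once the end of the deflate stream was reached
        if (!inf.finished()) {
            throw new RuntimeException("dst too small");
        }

    } finally {
        inf.end();
    }
}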
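To make the flipping explicit, here is a minimal round-trip sketch over direct buffers. It assumes Java 11+ and inlines single-pass versions of compress and decompress so that it is self-contained. The class name DirectRoundTrip, the roundTrip helper and the 64 bytes of headroom are my own assumptions, chosen for small, non-empty inputs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DirectRoundTrip {

    // Single-pass raw deflate; dst must be large enough for the whole output.
    static void compress(ByteBuffer src, ByteBuffer dst) {
        var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            def.setInput(src);
            def.finish();
            def.deflate(dst);
            if (!def.finished()) {
                throw new IllegalStateException("dst too small");
            }
        } finally {
            def.end();
        }
    }

    // Single-pass raw inflate; dst must be large enough for the whole output.
    static void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
        var inf = new Inflater(true);
        try {
            inf.setInput(src);
            inf.inflate(dst);
            if (!inf.finished()) {
                throw new IllegalStateException("dst too small");
            }
        } finally {
            inf.end();
        }
    }

    // Copies text into a direct buffer, compresses and decompresses it,
    // and returns the restored text.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        var plain = ByteBuffer.allocateDirect(bytes.length);
        plain.put(bytes).flip();                  // flip: switch from writing to reading

        // Assumed headroom for small inputs that do not compress well.
        var packed = ByteBuffer.allocateDirect(bytes.length + 64);
        compress(plain, packed);
        packed.flip();                            // compressed bytes are now readable

        var restored = ByteBuffer.allocateDirect(bytes.length);
        try {
            decompress(packed, restored);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
        restored.flip();
        return StandardCharsets.UTF_8.decode(restored).toString();
    }
}
```

No byte[] sits between the two direct buffers here; the only arrays are the ones used to get the test string in and out.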

Note that both methods expect (de)compression to happen in a single pass. However, we could use slightly modified versions in order to stream it:

void compress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) {
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        int cmp;
        do {
            // dst is reused as a scratch buffer: the sink receives it flipped
            // and must drain it before returning
            cmp = def.deflate(dst, Deflater.SYNC_FLUSH);
            if (cmp > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }
        } while (cmp > 0);
    } finally {
        def.end();
    }
}

void decompress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) throws DataFormatException {
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        int dec;
        do {
            // same scratch-buffer contract as in compress
            dec = inf.inflate(dst);

            if (dec > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }

        } while (dec > 0);
    } finally {
        inf.end();
    }
}
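As an illustration of the streaming variants, the sketch below pushes data through a small direct scratch buffer and collects the chunks in a ByteArrayOutputStream. It is self-contained, so it repeats the two methods with loops that run until finished() and fail if no progress is made. StreamingDeflate, collectInto and roundTrip are hypothetical names, and the scratch size is assumed to be greater than zero.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.function.Consumer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class StreamingDeflate {

    static void compress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) {
        var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            def.setInput(src);
            def.finish();
            while (!def.finished()) {
                if (def.deflate(dst) > 0) {
                    sink.accept(dst.flip()); // the sink must drain the buffer
                    dst.clear();
                } else if (!def.finished()) {
                    throw new IllegalStateException("no progress: dst has no space");
                }
            }
        } finally {
            def.end();
        }
    }

    static void decompress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink)
            throws DataFormatException {
        var inf = new Inflater(true);
        try {
            inf.setInput(src);
            while (!inf.finished()) {
                if (inf.inflate(dst) > 0) {
                    sink.accept(dst.flip());
                    dst.clear();
                } else if (!inf.finished()) {
                    throw new DataFormatException("no progress: truncated input or full dst");
                }
            }
        } finally {
            inf.end();
        }
    }

    // Hypothetical sink: appends each drained chunk to a ByteArrayOutputStream.
    static Consumer<ByteBuffer> collectInto(ByteArrayOutputStream out) {
        return bb -> {
            byte[] chunk = new byte[bb.remaining()];
            bb.get(chunk);
            out.write(chunk, 0, chunk.length);
        };
    }

    // Compresses and restores data using one reusable scratch buffer.
    static byte[] roundTrip(byte[] data, int scratchSize) {
        var scratch = ByteBuffer.allocateDirect(scratchSize);
        var src = ByteBuffer.allocateDirect(data.length);
        src.put(data).flip();

        var packed = new ByteArrayOutputStream();
        compress(src, scratch, collectInto(packed));

        var restored = new ByteArrayOutputStream();
        try {
            decompress(ByteBuffer.wrap(packed.toByteArray()), scratch, collectInto(restored));
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
        return restored.toByteArray();
    }
}
```

The byte arrays in the sink exist only to make the result easy to inspect. A real sink would more likely write the buffer to a channel.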

Example:

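void compressLargeFile() throws IOException {
    try (var in = FileChannel.open(Paths.get("large"), StandardOpenOption.READ);
         var out = FileChannel.open(Paths.get("large.zip"), StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
        var temp = ByteBuffer.allocateDirect(1024 * 1024);

        long start = 0;
        long rem = in.size();
        while (rem > 0) {
            long mapped = Math.min(16L * 1024 * 1024, rem);
            var src = in.map(FileChannel.MapMode.READ_ONLY, start, mapped);

            // Each mapped chunk is written as its own raw deflate stream
            compress(src, temp, (bb) -> {
                try {
                    while (bb.hasRemaining()) {
                        out.write(bb);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            start += mapped;
            rem -= mapped;
        }
    }
}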

If you want fully gzip-compliant data (the header and trailer below are the gzip ones, which is what GZIPInputStream reads):

void zip(ByteBuffer src, ByteBuffer dst) {
    var u = src.remaining();
    var crc = new CRC32();
    crc.update(src.duplicate());
    writeHeader(dst);

    compress(src, dst);

    writeTrailer(crc, u, dst);
}

Where:

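void writeHeader(ByteBuffer dst) {
    // gzip header: magic bytes 0x1f 0x8b, compression method (8 = deflate),
    // then flags, 4-byte mtime, extra flags and OS, all left at zero
    var header = new byte[] { (byte) 0x8b1f, (byte) (0x8b1f >> 8), Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };
    dst.put(header);
}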

And:

void writeTrailer(CRC32 crc, int uncompressed, ByteBuffer dst) {
    // The gzip trailer (CRC32, then uncompressed size) is stored little-endian
    if (dst.order() == ByteOrder.LITTLE_ENDIAN) {
        dst.putInt((int) crc.getValue());
        dst.putInt(uncompressed);
    } else {
        dst.putInt(Integer.reverseBytes((int) crc.getValue()));
        dst.putInt(Integer.reverseBytes(uncompressed));
    }
}

So, the gzip framing imposes 10+8 bytes of overhead.
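The 18 bytes of framing can be checked with a small self-contained sketch: it writes a gzip member into a direct buffer, reports the size of the deflate body, and lets GZIPInputStream validate the result. The names GzipFraming, gzip and gunzip are hypothetical. The header bytes follow the gzip layout (magic 0x1f 0x8b, method 8, the remaining fields zero).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

final class GzipFraming {

    // Writes a gzip member: 10-byte header, raw deflate body, 8-byte trailer.
    // Returns the length of the deflate body so the framing size can be checked.
    static int gzip(ByteBuffer src, ByteBuffer dst) {
        int uncompressed = src.remaining();
        var crc = new CRC32();
        crc.update(src.duplicate()); // duplicate() leaves src's position untouched

        // Magic (2 bytes), method 8 = deflate, flags, mtime (4 bytes), xfl, os
        dst.put(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0});

        int bodyStart = dst.position();
        var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            def.setInput(src);
            def.finish();
            def.deflate(dst);
            if (!def.finished()) {
                throw new IllegalStateException("dst too small");
            }
        } finally {
            def.end();
        }
        int bodyLength = dst.position() - bodyStart;

        // Trailer is little-endian: CRC32 first, then the uncompressed size.
        boolean littleEndian = dst.order() == ByteOrder.LITTLE_ENDIAN;
        int crcValue = (int) crc.getValue();
        dst.putInt(littleEndian ? crcValue : Integer.reverseBytes(crcValue));
        dst.putInt(littleEndian ? uncompressed : Integer.reverseBytes(uncompressed));
        return bodyLength;
    }

    // Reads the data back with the standard java.io-based gzip reader.
    static byte[] gunzip(byte[] gz) {
        try (var in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After dst.flip(), dst.remaining() should equal the returned body length plus 18, and gunzip should return the original bytes.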

In order to unzip a direct buffer into another, you can wrap the src buffer into an InputStream:

class ByteBufferInputStream extends InputStream {

    final ByteBuffer bb;

    public ByteBufferInputStream(ByteBuffer bb) {
        this.bb = bb;
    }

    @Override
    public int available() throws IOException {
        return bb.remaining();
    }

    @Override
    public int read() throws IOException {
        return bb.hasRemaining() ? bb.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        var rem = bb.remaining();

        if (rem == 0) {
            return -1;
        }

        len = Math.min(rem, len);

        bb.get(b, off, len);

        return len;
    }

    @Override
    public long skip(long n) throws IOException {
        var rem = bb.remaining();

        if (n > rem) {
            bb.position(bb.limit());
            n = rem;
        } else {
            bb.position((int) (bb.position() + n));
        }

        return n;
    }
}

and use:

void unzip(ByteBuffer src, ByteBuffer dst) throws IOException {
    try (var is = new ByteBufferInputStream(src); var gis = new GZIPInputStream(is)) {
        var tmp = new byte[1024];

        var r = gis.read(tmp);

        if (r > 0) {
            do {
                dst.put(tmp, 0, r);
                r = gis.read(tmp);
            } while (r > 0);
        }

    }
}

Of course, this is not ideal since we are copying data through a temporary array. Nevertheless, it serves as a round-trip check: it shows that the NIO-based gzip encoding writes valid data that standard java.io-based consumers can read.

So, if we just ignore crc consistency checks we can just drop header/footer:

void unzipNoCheck(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    src.position(src.position() + 10).limit(src.limit() - 8);

    decompress(src, dst);
}


久随 2025-01-01 06:22:39

If you are using ByteBuffers you can use some simple Input/OutputStream wrappers such as these:

public class ByteBufferInputStream extends InputStream {

    private final ByteBuffer buffer;

    public ByteBufferInputStream( ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public int read() throws IOException {
        // Return -1 at the end of the buffer, as the InputStream contract requires,
        // instead of letting get() throw BufferUnderflowException
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }
}

public class ByteBufferOutputStream extends OutputStream {

    private final ByteBuffer buffer;

    public ByteBufferOutputStream( ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.put( (byte)(b & 0xFF) );
    }

}

Test:

ByteBuffer buffer = ByteBuffer.allocate( 1000 );
ByteBufferOutputStream bufferOutput = new ByteBufferOutputStream( buffer );
GZIPOutputStream output = new GZIPOutputStream( bufferOutput );
output.write("stackexchange".getBytes());
output.close();

// flip() limits the buffer to the bytes actually written before reading them back
buffer.flip();

byte[] result = new byte[ 1000 ];

ByteBufferInputStream bufferInput = new ByteBufferInputStream( buffer );
GZIPInputStream input = new GZIPInputStream( bufferInput );
int n = input.read( result );

// Decode only the bytes that were read, not the whole 1000-byte array
System.out.println( new String( result, 0, n ) );
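The wrappers above move one byte per call, which gets slow because the GZIP streams read and write in blocks. A possible refinement, sketched here with hypothetical names (ByteBufferStreams, In, Out, roundTrip), is to override the bulk read and write methods so each block goes through a single ByteBuffer.get or put. The buffer size in roundTrip is an assumption that suits short strings only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ByteBufferStreams {

    static final class In extends InputStream {
        private final ByteBuffer buffer;

        In(ByteBuffer buffer) { this.buffer = buffer; }

        @Override
        public int read() {
            return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (!buffer.hasRemaining()) return -1;
            int n = Math.min(len, buffer.remaining());
            buffer.get(b, off, n); // one bulk transfer instead of n single reads
            return n;
        }

        @Override
        public int available() { return buffer.remaining(); }
    }

    static final class Out extends OutputStream {
        private final ByteBuffer buffer;

        Out(ByteBuffer buffer) { this.buffer = buffer; }

        @Override
        public void write(int b) throws IOException {
            if (!buffer.hasRemaining()) throw new IOException("buffer full");
            buffer.put((byte) b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            if (len > buffer.remaining()) throw new IOException("buffer full");
            buffer.put(b, off, len); // one bulk transfer
        }
    }

    // Gzips text into a direct buffer and reads it back through the wrappers.
    static String roundTrip(String text) {
        // Assumed generous size for short strings plus gzip framing.
        var buffer = ByteBuffer.allocateDirect(text.length() * 4 + 64);
        try {
            try (var out = new GZIPOutputStream(new Out(buffer))) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            buffer.flip();
            try (var in = new GZIPInputStream(new In(buffer))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The GZIP streams still copy through their own internal byte arrays, so this reduces per-byte call overhead but does not remove the copy the question asks about.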
