Determining the best compression algorithm for a series of bytes
For a personal project of mine, I'm writing up a small class to compress to and decompress from a rather obscure format. I've got the full spec, but that's not where the problem is.
First, this 'format' uses a set of 6 different compression types as well as uncompressed blocks of byte data. The formats are RLE, an offshoot of RLE where the number increments each byte (e.g. 3, 4, 5, ...), a 16-bit RLE, LZ-Copy, a reverse LZ-copy, and LZ-Copy Xor'd with 255. It's not the cleanest of specs, but I didn't design it either.
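To make the incrementing-RLE variant concrete, here is a toy decoder for it, assuming a simple (count, start byte) encoding; the real spec's wire format is not shown in the question and will differ.

```python
def decode_incrementing_rle(count, start):
    """Expand (count, start) into start, start+1, start+2, ... (mod 256)."""
    return bytes((start + k) % 256 for k in range(count))

print(decode_incrementing_rle(4, 3))   # the 3, 4, 5, ... pattern from the spec
```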
My compression routine is supposed to take in an array of anywhere from 1 to 65535 bytes and (hopefully) compress it as much as possible. My previous attempt simply calculated, starting from the current index in the uncompressed stream, which of the compression techniques above would provide the best compression, compressed however many bytes that method covered into the array of compressed bytes, and then repeated from the new 'uncompressed' index, e.g.:
{0,0,0,1,2,3,4}
The algorithm would first read that there were three zeroes at the start, and then output the RLE encoding for them that the spec used, and then starting from the fourth element, would read that incrementing RLE would cover the '1,2,3,4' well enough and compress that before returning.
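The greedy loop described above can be sketched roughly as follows. The opcodes, headers, and the restriction to just three methods are all hypothetical stand-ins; the real spec's six encodings and their formats are not reproduced here.

```python
def run_length(data, i):
    """Length of the run of identical bytes starting at index i."""
    j = i
    while j < len(data) and data[j] == data[i]:
        j += 1
    return j - i

def incrementing_run_length(data, i):
    """Length of the run where each byte is the previous byte + 1 (mod 256)."""
    j = i + 1
    while j < len(data) and data[j] == (data[j - 1] + 1) % 256:
        j += 1
    return j - i

def compress_greedy(data):
    """At each index, pick whichever method covers the most bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        rle = run_length(data, i)
        inc = incrementing_run_length(data, i)
        if rle >= inc and rle >= 3:
            out += bytes([0x00, rle, data[i]])   # hypothetical RLE opcode
            i += rle
        elif inc >= 3:
            out += bytes([0x01, inc, data[i]])   # hypothetical incrementing-RLE opcode
            i += inc
        else:
            out += bytes([0x02, 1, data[i]])     # hypothetical literal opcode
            i += 1
    return bytes(out)

# The example from the question: three zeroes, then an incrementing run.
print(compress_greedy(bytes([0, 0, 0, 1, 2, 3, 4])))
```

Note that each length scan here is linear; the LZ-copy variants are what make a naive search slow, since finding the best match at every index is quadratic (or worse) without a hash table or suffix structure over earlier positions.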
The problem, summarized, is that while trying to find the best method to use, the routine is very slow even on small (20-30 byte) arrays. Can anyone offer tips on how I might optimize this, or tell me if there's any more information I could provide to help?
1 Answer
It sounds like what you're trying to do is work out a large number of compression possibilities for every possible segment (let's call your variable-length 1-64K blocks "segments") of the file. Correct me if I'm wrong, but are you working out the best compression for the first segment by trying every method (with method 0 being uncompressed) at every possible segment length?
That's going to take a huge amount of time (roughly 420,000 compression attempts per segment). If that is what you're doing, you'll be better off choosing a single segment size (e.g., 64K) and applying each of the seven compression methods to it to choose the best. Then, for each segment, output the "method" byte followed by the compressed data.
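The suggestion above, run every candidate method over the whole segment once, keep the smallest result, and prefix a method byte, might look like the sketch below. The two compressors are illustrative stand-ins, not the spec's real encodings, and a full implementation would list all seven methods.

```python
def identity(seg):
    """Method 0: store the segment uncompressed."""
    return bytes(seg)

def simple_rle(seg):
    """Stand-in RLE: (count, byte) pairs, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(seg):
        j = i
        while j < len(seg) and seg[j] == seg[i] and j - i < 255:
            j += 1
        out += bytes([j - i, seg[i]])
        i = j
    return bytes(out)

METHODS = [identity, simple_rle]   # the real format would have seven entries

def compress_segment(seg):
    """Try each method once on the whole segment; emit method byte + data."""
    best_method, best_out = 0, METHODS[0](seg)
    for m, fn in enumerate(METHODS[1:], start=1):
        out = fn(seg)
        if len(out) < len(best_out):
            best_method, best_out = m, out
    return bytes([best_method]) + best_out
```

This reduces the work per segment to one pass per method, at the cost of never mixing methods within a segment; a middle ground is to keep the asker's greedy per-index choice but cap or memoize the match searches.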