LZW compression/decompression under low-memory conditions
Can anybody give pointers on how I can implement LZW compression/decompression in low-memory conditions (< 2 KB)? Is that possible?
8 Answers
The zlib library that everyone uses is bloated, among other problems (for embedded). I am pretty sure it won't work for your case. I had a little more memory, maybe 16 KB, and couldn't get it to fit. It allocates and zeroes large chunks of memory, keeps copies of things, etc. The algorithm can maybe do it, but finding existing code is the challenge.
I went with http://lzfx.googlecode.com. The decompression loop is tiny; it is the older LZ type of compression that relies on the prior results, so you need access to the uncompressed output so far: the next byte is a 0x5, the next byte is a 0x23, the next 15 bytes are a copy of the 15 bytes found 200 bytes back, the next 6 bytes are a copy of the bytes 127 back... The newer LZ algorithms are based on a variable-width table that can be big, or grow, depending on the implementation.
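As an illustration of that older LZ scheme (literal bytes mixed with copy-from-N-bytes-back references), here is a minimal C sketch; the token layout and the lz_decode name are invented for the example and are not lzfx's actual format:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a hypothetical LZ77-style token stream (illustrative only;
   assumes a well-formed stream, so truncation checks are elided):
     0x00 len ...    -> emit the next `len` input bytes literally
     0x01 dist len   -> copy `len` bytes starting `dist` bytes back */
size_t lz_decode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        uint8_t tag = in[i++];
        if (tag == 0x00) {                      /* literal run */
            uint8_t len = in[i++];
            while (len-- && o < out_cap)
                out[o++] = in[i++];
        } else {                                /* back reference */
            uint8_t dist = in[i++];
            uint8_t len  = in[i++];
            /* Copy byte by byte: if len > dist the copy overlaps its own
               output, which is how these formats encode repeated runs. */
            while (len-- && o < out_cap) {
                out[o] = out[o - dist];
                o++;
            }
        }
    }
    return o;   /* number of bytes produced */
}
```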
I was dealing with repetitive data and trying to squeeze a few KB down into a few hundred bytes. I think the compression was about 50%; not great, but it did the job, and the decompression routine was tiny. The lzfx package above is small, not like zlib: two main functions with the code right there, not dozens of files. You could likely change the depth of the buffer, and perhaps improve the compression algorithm if you so desire. I did have to modify the decompression code (perhaps 20 or 30 lines); it was pointer-heavy, and I switched it to arrays because in my embedded environment the pointers were in the wrong place. That burns maybe an extra register or not, depending on how you implement it and on your compiler. I also did it so I could abstract the fetches and stores of the bytes, since I had them packed into memory that wasn't byte-addressable.
If you find something better, please post it here or ping me through Stack Overflow; I am also very interested in other embedded solutions. I searched quite a bit, and the above was the only useful one I found. I was lucky that my data was such that it compressed well enough using that algorithm... for now.
Why LZW? LZW needs lots of memory. It is based on a hash/dictionary, and the compression ratio is proportional to the hash/dictionary size: more memory means better compression, while with less memory the output can even be larger than the input. (For instance, a typical 12-bit LZW table has 4096 entries of at least 3 bytes each, which is already over 12 KB.)
I haven't touched encoding for a very long time, but IIRC Huffman coding is a little bit better when it comes to memory consumption.
But it all depends on the type of information you want to compress.
I have used LZSS. I used code from Haruhiko Okumura as a base. It uses the last portion of the uncompressed data (2 KB) as the dictionary. The code I linked can be modified to use almost no extra memory if you have all the uncompressed data available in memory. With a bit of googling you will find a lot of different implementations.
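Purely as a sketch of that idea (the decoder reads flag bits and copies from its own already-written output, so the 2 KB window costs no RAM beyond the output buffer), here is one possible LZSS decoder in C; the 11-bit offset / 5-bit length token layout is an assumption, not Okumura's exact format:

```c
#include <stddef.h>
#include <stdint.h>

/* LZSS-style decoder sketch: a flag byte carries one bit per token
   (1 = literal byte, 0 = 16-bit pair of 11-bit offset + 5-bit length).
   Assumes a well-formed stream; truncation checks are elided. */
size_t lzss_decode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        uint8_t flags = in[i++];            /* 1 bit per token, LSB first */
        for (int b = 0; b < 8 && i < in_len; b++) {
            if (flags & (1u << b)) {        /* literal byte */
                if (o < out_cap) out[o++] = in[i++];
            } else {                        /* (offset, length) pair */
                uint16_t pair = (uint16_t)(in[i] | (in[i + 1] << 8));
                i += 2;
                size_t off = (pair & 0x07FFu) + 1;   /* 1..2048 bytes back */
                size_t len = (pair >> 11) + 3;       /* 3..34 bytes long   */
                /* The window is simply the output produced so far. */
                while (len-- && o < out_cap) {
                    out[o] = out[o - off];
                    o++;
                }
            }
        }
    }
    return o;
}
```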
If the choice of compression algorithm isn't set in stone, you might try gzip/LZ77 instead. Here's a very simple implementation I used and adapted once:
ftp://quatramaran.ens.fr/pub/madore/misc/myunzip.c
You'll need to clean up the way it reads input, handle errors, etc., but it's a good start. It's probably also way too big if your data AND code need to fit in 2 KB, but at least the data size is small already.
Big plus is that it's public domain so you can use it however you like!
It has been over 15 years since I last played with the LZW compression algorithm, so take the following with a grain of salt.
Given the memory constraints, this is going to be difficult at best. The dictionary you build is going to consume the vast majority of what you have available. (Assuming that code + memory <= 2k.)
Pick a small fixed size for your dictionary. Say 1024 entries.
Let each dictionary entry take the form of a pair: the index of the previous (prefix) entry, plus the one new character appended to it (sketched below).
This structure makes the dictionary recursive. You need the item at the previous index to be valid in order for it to work properly. Is this workable? I'm not sure. However, let us assume for the moment that it is and find out where it leads us ....
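Reading the two paragraphs above together, each entry is evidently a back-pointer plus one character; a naive sketch with standard types (field names are mine):

```c
#include <stdint.h>

/* One LZW dictionary entry as the answer describes it: a word is the
   word at prev_idx with new_char appended. (Reconstruction; names mine.) */
struct entry {
    uint16_t prev_idx;   /* index of the prefix entry; 10 bits suffice for 1024 entries */
    uint8_t  new_char;   /* the character appended to that prefix; 8 bits */
};
/* With standard types this costs 3-4 bytes per entry after padding,
   which is exactly the waste the bit-packing discussed next avoids. */
```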
If you use the standard types for int and char, you are going to run out of memory fast. You will want to pack things together as tightly as possible. 1024 entries will take 10 bits to store. Your new character will likely take 8 bits. Total = 18 bits.
18 bits * 1024 entries = 18432 bits or 2304 bytes.
At first glance this appears too large. What do we do? Take advantage of the fact that the first 256 entries are already known--your typical extended ASCII set or what have you. This means we really only need to store 768 entries.
768 * 18 bits = 13824 bits or 1728 bytes.
This leaves you with about 320 bytes to play with for code. Naturally, you can play around with the dictionary size and see what's good for you, but you will not end up with very much space for your code. Since you are looking at so little code space, I would expect that you would end up coding in assembly.
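To make the 1728-byte figure concrete, here is one hedged way to bit-pack the 768 dynamic entries at 18 bits apiece into a flat array (layout and helper names are mine, not part of the original answer):

```c
#include <stdint.h>

#define FIRST_CODE   256   /* codes 0..255 are the implicit single-byte entries */
#define NUM_ENTRIES  768   /* 1024 total codes minus the 256 implicit ones      */
#define ENTRY_BITS    18   /* 10-bit prev index + 8-bit new char                */

static uint8_t dict[(NUM_ENTRIES * ENTRY_BITS + 7) / 8];   /* = 1728 bytes */

/* ENTRY_BITS is even, so the bit offset within a byte is always 0, 2, 4,
   or 6, and an 18-bit field always fits inside three consecutive bytes. */

static uint32_t get_entry(unsigned code)            /* code >= FIRST_CODE */
{
    uint32_t bitpos = (code - FIRST_CODE) * ENTRY_BITS;
    uint32_t byte = bitpos >> 3, shift = bitpos & 7;
    uint32_t w = dict[byte] | ((uint32_t)dict[byte + 1] << 8)
                            | ((uint32_t)dict[byte + 2] << 16);
    return (w >> shift) & 0x3FFFFu;   /* prev index in bits 17..8, char in 7..0 */
}

static void set_entry(unsigned code, unsigned prev_idx, uint8_t new_char)
{
    uint32_t val    = ((uint32_t)(prev_idx & 0x3FFu) << 8) | new_char;
    uint32_t bitpos = (code - FIRST_CODE) * ENTRY_BITS;
    uint32_t byte = bitpos >> 3, shift = bitpos & 7;
    uint32_t mask = 0x3FFFFu << shift;
    uint32_t w = dict[byte] | ((uint32_t)dict[byte + 1] << 8)
                            | ((uint32_t)dict[byte + 2] << 16);
    w = (w & ~mask) | (val << shift);
    dict[byte]     = (uint8_t)w;
    dict[byte + 1] = (uint8_t)(w >> 8);
    dict[byte + 2] = (uint8_t)(w >> 16);
}
```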
I hope this helps.
My best recommendation is to examine the BusyBox source and see if their LZW implementation is sufficiently small to work in your environment.
The smallest dictionary for LZW is a trie on a linked list. See the original implementation in LZW AB; I've rewritten it in my fork, LZWS. The fork is compatible with compress, and there is detailed documentation. An n-bit dictionary requires (2 ** n) * sizeof(code) + ((2 ** n) - 257) * sizeof(code) + (2 ** n) - 257 bytes.
Please be aware that this is the requirement for the dictionary alone; you also need about 100-150 bytes for state and variables on the stack. The decompressor will use less memory than the compressor.
So I think you can try to compress your data with the 9-bit version, but it won't provide a good compression ratio: the more bits you have, the better the ratio.
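To put numbers on that formula, here is a small program that evaluates it for each dictionary width; the assumption that code is a 2-byte type is mine:

```c
#include <stdio.h>
#include <stdint.h>

typedef uint16_t code;   /* assumption: codes of up to 16 bits held in 2 bytes */

int main(void)
{
    for (unsigned n = 9; n <= 16; n++) {
        unsigned long codes = 1UL << n;
        /* The formula quoted above: one code-sized array of 2**n entries,
           plus a code-sized array and a byte array of 2**n - 257 entries each. */
        unsigned long bytes = codes * sizeof(code)
                            + (codes - 257) * sizeof(code)
                            + (codes - 257);
        printf("%2u-bit dictionary: %6lu bytes\n", n, bytes);
    }
    return 0;   /* prints 1789 bytes for 9 bits: the only width under 2 KB */
}
```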