压缩算法的完整描述
我正在寻找一种压缩算法(用于编程竞赛),并且我需要如何实现它的完整描述(所有技术细节),任何无损且无专利的算法都可以,但易于实现是一个额外的好处:)
(虽然可能不相关)我计划用 C++ 实现该算法...
提前致谢。
编辑: 我将仅压缩文本文件,不压缩其他文件类型......
I am looking for a compression algorithm (for a programming competition) and I need a full description of how to implement it (all technical details), any loseless and patent-free algorithm will do, but the ease of implementation is a bonus :)
(Although possibly irrelevant) I plan to implement the algorithm in C++...
Thanks in advance.
EDIT:
I will be compressing text files only, no other file types...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
好吧,我无法为您完成比赛,但请查看 wiki 上的这篇文章: 行程长度编码。 它是迄今为止最简单的数据压缩方法之一,尽管并不总是一种有效的方法。 压缩也是特定于领域的,即使在无损算法中,您也会发现您正在压缩的内容决定了如何最好对其进行编码。
Well, I can't go so far as to complete the competition for you, but please check out this article on wiki: Run Length Encoding. It is by far one of the simplest ways to compress data, albeit not always an efficient one. Compression is also domain specific, even amongst lossless algorithms you will find that what you are compressing determines how best to encode it.
我会从这里开始维基百科。
有很多选择,但如果不更多地了解您想要什么,就很难提供更多帮助。 您正在压缩文本、图像、视频还是只是随机文件? 每个人都有自己的一套技术和挑战以获得最佳结果。
如果易于实施是唯一的标准,我会使用“文件复制”压缩。 保证压缩比恰好为 1:1,并且实现简单......
I'd start here on Wikipedia.
There's a whole lot to choose from, but without knowing more about what you want it's difficult to help more. Are you compressing text, images, video or just random files? Each one has it's own set of techniques and challenges for optimal results.
If ease of implementation is the sole criterion I'd use "filecopy" compression. Guaranteed compression ratio of exactly 1:1, and trivial implementation...
如果您要压缩纯文本,Huffman 非常有用。 下面的所有评论者都向我保证,实现
;D
是一种乐趣Huffman is good if you're compressing plain text. And all the commenters below assure me it's a joy to implement
;D
RFC 1951 描述了 inflate/deflate,包括压缩器算法的简要描述。 Antaeus Feldspar 的An Explanation of the Deflate Algorithm 提供了更多背景知识。
此外,zlib 源代码发行版在 contrib/puff/puff.c 中包含一个简化的参考充气器,它有助于阅读以准确理解位的排列方式(但它不包含 deflate,仅包含 inflate)。
RFC 1951 describes inflate/deflate, including a brief description of the compressor's algorithm. Antaeus Feldspar's An Explanation of the Deflate Algorithm provides a bit more background.
Also, the zlib source distribution contains a simplified reference inflater in contrib/puff/puff.c that can be helpful reading to understand exactly how the bits are arranged (but it doesn't contain a deflate, only inflate).
如果您在互联网浏览器中进入“查看”,应该有一个“缩小”或缩小文本的选项。
选择其中之一然后...
BAM!
您刚刚在同一屏幕上看到了更多文本! 耶压缩!
If you go under "View" in your internet browser, there should be an option to either "Zoom Out" or make the text smaller.
Select one of those and...
BAM!
You just got more text on the same screen! Yay compression!
易于实施:霍夫曼,如前所述。 我相信 LZW 不再受专利保护,但我不确定。 这是一个相对简单的算法。 不过,LZ77 应该可用。 最后,Burrows-Wheeler 变换允许压缩,但实现起来要困难得多。
Ease of implementation: Huffman, as stated before. I believe LZW is no longer under patent, but I don't know for sure. It' a relatively simple algorithm. LZ77 should be available, though. Lastly, the Burrows-Wheeler transform allows for compression, but it's significantly more difficult to implement.
我喜欢这个对 Burrows-Wheeler 变换 的介绍。
I like this introduction to the Burrows-Wheeler Transform.
现在就安全! 播客最近推出了一集重点介绍数据压缩算法。 Steve Gibson 对 Huffman 和 Lempel-Ziv 压缩技术的基础知识做了很好的解释。 您可以收听音频播客或阅读第 205 集的文字记录。
The Security Now! podcast recently put out an episode highlighting data compression algorithms. Steve Gibson gives a pretty good explanation of the basics of Huffman and Lempel-Ziv compression techniques. You can listen to the audio podcast or read the transcript for Episode 205.