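Are algorithms like Huffman coding actually used in production?
I'm currently developing an app that needs to store a lot of text on the iPad. My question is: are algorithms like Huffman coding actually used in production? I only need a very simple compression algorithm (there won't be a huge amount of text; it just needs a more efficient way of storing it), so would something like Huffman work? Or should I look into other kinds of compression libraries?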
Yes, I'm using Huffman compression in my web app to store a complete snapshot of my engine in a hidden input field. At first it was just curiosity, but it offloaded my SESSION memory by moving the data into the client's browser memory, and I also used it to save the snapshot to a file so I could back it up and exchange it with my colleague. Man, you have to see their faces when you can just load a file in an admin panel to load the engine on the web!!! It's basically a serialized, compressed, and base64-encoded array. It saves me about 15% bandwidth, but I think I can do better now.
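The answer doesn't say which language or serializer was used (the SESSION reference hints at PHP). As a minimal sketch, assuming JSON serialization and zlib's deflate (which combines LZ77 with Huffman coding) in place of a pure Huffman coder, the serialize, compress, base64-encode pipeline could look like this in Python:

```python
import base64
import json
import zlib


def pack_snapshot(state: dict) -> str:
    """Serialize, deflate-compress, and base64-encode a state snapshot."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    compressed = zlib.compress(raw, 9)  # deflate = LZ77 + Huffman coding
    # base64 turns the binary output into ASCII that fits in a hidden <input>
    return base64.b64encode(compressed).decode("ascii")


def unpack_snapshot(blob: str) -> dict:
    """Reverse pack_snapshot: base64-decode, inflate, then deserialize."""
    compressed = base64.b64decode(blob)
    return json.loads(zlib.decompress(compressed).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical engine state, for illustration only.
    snapshot = {"engine": "demo", "rules": [{"id": i, "active": i % 2 == 0} for i in range(50)]}
    blob = pack_snapshot(snapshot)
    print(len(json.dumps(snapshot)), "->", len(blob))
```

Keep in mind that base64 expands its input by about a third (4 output bytes for every 3 input bytes), so the net saving is smaller than the raw compression ratio.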
Yes, you're using Huffman coding (decoding) to read this page, since web pages are compressed to the gzip format. So it is used pretty much every nanosecond by web browsers all over the planet.
Huffman coding is almost never used by itself, but rather always with some higher-order modeling of the data to give the Huffman algorithm what it needs to work with. So LZ77 models text and other byte-oriented data as repeating strings coded as literals and length/distance pairs, which is then fed to Huffman coding, e.g. in the deflate compressed format using zlib. Or with difference or other prediction coding of pixels for PNG, followed by Huffman coding.
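zlib lets you switch off deflate's LZ77 stage, which makes this point easy to check. A minimal Python sketch, assuming Python's built-in zlib module: the `Z_HUFFMAN_ONLY` strategy disables string matching and leaves only Huffman coding of literal bytes, while the default strategy uses LZ77 followed by Huffman coding.

```python
import zlib


def deflate_size(data: bytes, strategy: int) -> int:
    """Return the raw-deflate compressed size of `data` with the given strategy."""
    # wbits=-15 produces raw deflate (no zlib header/trailer)
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15,
                            memLevel=9, strategy=strategy)
    return len(comp.compress(data) + comp.flush())


text = ("Huffman coding is almost never used by itself. " * 200).encode("ascii")

huffman_only = deflate_size(text, zlib.Z_HUFFMAN_ONLY)           # literals only
lz77_plus_huffman = deflate_size(text, zlib.Z_DEFAULT_STRATEGY)  # matches + Huffman

print(len(text), huffman_only, lz77_plus_huffman)
```

On repetitive text like this sample, the LZ77+Huffman output should be far smaller than Huffman alone, because the modeling stage turns repeats into short length/distance pairs; exact byte counts depend on the zlib version.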
As for what you should use, look at lz4, zlib, and zstd.
From Wikipedia on the subject:
So yes, Huffman coding is used in production. Quite a lot, even.
Huffman coding (a form of entropy coding) is used very widely. Almost anything you can think of that gets compressed, apart from some very old schemes, uses it: image compression, Zip and RAR archives, virtually every codec, and so on.
Keep in mind that Huffman coding is lossless and requires you to know all of the data you're compressing in advance. If you're doing lossy compression, you first need to transform your data to reduce its entropy (in JPEG, the DCT coefficients are quantized and many are discarded). If you want Huffman coding to work on real-time data (where you don't know every bit in advance), adaptive Huffman coding is used. There is a great deal on this topic in the signal processing literature.
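The "know all the data in advance" requirement comes from how a static Huffman code is built: one pass over the input to count symbol frequencies, then repeated merging of the two least frequent subtrees. A minimal Python sketch over characters (it ignores storing the code table alongside the output, which a real format must do):

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def huffman_code(data: str) -> Dict[str, str]:
    """Build a static Huffman code: count symbols first, then merge subtrees."""
    freqs = Counter(data)  # first pass: needs the whole input up front
    if len(freqs) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(data: str, code: Dict[str, str]) -> str:
    """Second pass: replace each symbol with its codeword."""
    return "".join(code[ch] for ch in data)


if __name__ == "__main__":
    text = "abracadabra"
    code = huffman_code(text)
    print(code, len(encode(text, code)), "bits vs", 8 * len(text))
```

For "abracadabra" this needs 23 bits instead of 88 at 8 bits per character. Adaptive Huffman coding avoids the counting pass by updating the tree as symbols arrive, at the cost of a more complex coder.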
Pre-Huffman compression schemes include run-length coding (fax machines). Run-length coding is still sometimes used in combination with Huffman coding (JPEG, again).
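Run-length coding is simple enough to show in a few lines. A minimal Python sketch, assuming the input is a string of symbols such as one hypothetical scanline of a black-and-white image:

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(data: str) -> List[Tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(data)]


def rle_decode(pairs: List[Tuple[str, int]]) -> str:
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(sym * n for sym, n in pairs)


if __name__ == "__main__":
    scanline = "WWWWWWWWWWBBBWWWWWWWWWWWWWWWWB"  # hypothetical image row: W=white, B=black
    print(rle_encode(scanline))
```

Combining it with Huffman coding, as described above, means feeding the run lengths to a Huffman coder instead of storing them as plain integers.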
Yes, they are used in production.
As others have mentioned, true Huffman requires you to analyze the entire corpus first to get the most efficient encoding, so it isn't typically used by itself.
Probably shortly after you were born, I implemented Huffman compression on the Psion Series 3 handheld computer in C in order to compress data which was preloaded onto data packs and only decompressed on the handheld. In those days, space was tight and there was no built-in compression library.
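That set-up (compress once ahead of time, decompress only on the device) suits Huffman well, because the device only needs the code table and a small decoder. The original was written in C and its details aren't given; as a minimal Python sketch, assuming a hypothetical prefix-free table built offline, decoding is just reading bits until they match a codeword:

```python
from typing import Dict


def decode(bits: str, code: Dict[str, str]) -> str:
    """Decode a bit string produced with a prefix-free code table."""
    reverse = {c: sym for sym, c in code.items()}  # codeword -> symbol
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is correct
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical table, built offline and shipped alongside the compressed data.
    table = {"a": "0", "b": "10", "r": "110", "c": "1110", "d": "1111"}
    print(decode("0101100", table))
```

Production decoders usually look up several bits at a time in a precomputed table rather than going bit by bit, but they rely on the same prefix-free property.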
As with most well-specified functionality, I would strongly consider using whatever is built into iOS or the standard packages available in your development environment.
This will save a lot of debugging and allow you to concentrate on the most significant portions of your app which add value.
Large amounts of text will be amenable to zip-style compression, and spending effort on improving its performance (in either space or time) is unlikely to pay off in the long run.
iOS ships with zlib, which you can use from Objective-C through zlib.h.
You could implement your own compression, also try the built-in zlib functions, and compare the performance.
I expect the built-in zlib functionality to be faster and to give a higher compression ratio.
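A minimal sketch of that comparison, written in Python because its zlib module wraps the same library; on iOS the equivalent C calls in zlib.h would be `compress2()` and `uncompress()`. The sample text and the compression levels tried are assumptions; substitute text that is representative of your app.

```python
import time
import zlib
from typing import Dict, Tuple


def benchmark(data: bytes, levels=(1, 6, 9)) -> Dict[int, Tuple[float, float]]:
    """Return {level: (compression_ratio, seconds)} for a few zlib levels."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        assert zlib.decompress(packed) == data  # confirm the round trip is lossless
        results[level] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = ("Some representative app text goes here. " * 500).encode("utf-8")
    for level, (ratio, secs) in benchmark(sample).items():
        print(f"level {level}: ratio {ratio:.1f}x in {secs * 1000:.2f} ms")
```

Timings from a single run are noisy, so repeat the measurement before drawing conclusions.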
Huffman codes are the backbone of many "real world" production algorithms. Today's common compression algorithms improve on plain Huffman coding by first transforming the data to improve compression ratios. Of course, there are many application-specific techniques for doing this.
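Differencing (delta coding), the kind of filtering PNG applies before its deflate stage, is a simple example of such a transformation. A minimal Python sketch, using a synthetic slowly drifting signal as an assumed stand-in for real pixel rows or sensor readings: after differencing, only three distinct byte values remain, so deflate's Huffman stage can code them in far fewer bits.

```python
import random
import zlib


def delta_encode(values: bytes) -> bytes:
    """Replace each byte with its difference from the previous one (mod 256)."""
    prev, out = 0, bytearray()
    for v in values:
        out.append((v - prev) % 256)
        prev = v
    return bytes(out)


def delta_decode(deltas: bytes) -> bytes:
    """Undo delta_encode by accumulating the differences (mod 256)."""
    prev, out = 0, bytearray()
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic signal is reproducible
    level, samples = 0, bytearray()
    for _ in range(10_000):  # each step moves the signal up by 0, 1, or 2
        level = (level + rng.choice((0, 1, 2))) % 256
        samples.append(level)
    raw = bytes(samples)
    print(len(zlib.compress(raw, 9)), len(zlib.compress(delta_encode(raw), 9)))
```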
As for whether or not you should use Huffman codes, my question is: why would you, when you can get better compression and simpler code from an existing third-party library?