algorithm compression huffman-code information-theory

Compressibility Example

Posted 2024-09-05 11:52:38

From my algorithms textbook:

The annual county horse race is bringing in three thoroughbreds who have never competed against one another. Excited, you study their past 200 races and summarize these as probability distributions over four outcomes: first (“first place”), second, third, and other.

    Outcome     Aurora   Whirlwind   Phantasm
    first        0.15      0.30        0.20
    second       0.10      0.05        0.30
    third        0.70      0.25        0.30
    other        0.05      0.40        0.20

Which horse is the most predictable? One quantitative approach to this question is to look at compressibility. Write down the history of each horse as a string of 200 values (first, second, third, other). The total number of bits needed to encode these track-record strings can then be computed using Huffman’s algorithm. This works out to 290 bits for Aurora, 380 for Whirlwind, and 420 for Phantasm (check it!). Aurora has the shortest encoding and is therefore in a strong sense the most predictable.

How did they get 420 for Phantasm? I keep getting 400 bytes, like so:

Combine first and other (0.2 + 0.2 = 0.4), then combine second and third (0.3 + 0.3 = 0.6). Every outcome ends up with a 2-bit code, so 200 outcomes take 400.

Is there something I've misunderstood about the Huffman encoding algorithm?

Textbook available here: http://www.cs.berkeley.edu/~vazirani/algorithms.html (page 156).

Comments (1)

空心↖ 2024-09-12 11:52:38

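I think you're right: Phantasm's 200 outcomes can be represented using 400 bits (not bytes). The figures of 290 for Aurora and 380 for Whirlwind are correct.

The correct Huffman code is generated as follows:

Combine the two least probable outcomes: 0.2 and 0.2. Get 0.4.
Combine the next two least probable outcomes: 0.3 and 0.3. Get 0.6.
Combine 0.4 and 0.6. Get 1.0.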
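
The totals can be checked with a short script. This is only a sketch: it assumes each probability is converted to an occurrence count by multiplying by 200 (for example 0.15 becomes 30), and the function name `huffman_total_bits` is made up for illustration. It uses the fact that every merge makes each symbol beneath it one bit longer, so the encoded length is the sum of all merged weights.

```python
import heapq

def huffman_total_bits(counts):
    """Total Huffman-encoded length in bits for the given symbol counts."""
    heap = list(counts)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Always merge the two smallest remaining weights.
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # each symbol under this merge gains one bit
        heapq.heappush(heap, a + b)
    return total

# Assumed counts: probability * 200, in the order first, second, third, other.
records = {
    "Aurora": [30, 20, 140, 10],
    "Whirlwind": [60, 10, 50, 80],
    "Phantasm": [40, 60, 60, 40],
}

for horse, counts in records.items():
    print(horse, huffman_total_bits(counts))
# Aurora 290
# Whirlwind 380
# Phantasm 400
```

Working with integer counts rather than the probabilities avoids floating-point ties when the heap compares weights.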

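You would get 420 bits if you did this instead:

Combine the two least probable outcomes: 0.2 and 0.2. Get 0.4.
Combine 0.4 and 0.3. (Wrong! The two smallest remaining weights are 0.3 and 0.3.) Get 0.7.
Combine 0.7 and 0.3. Get 1.0.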
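
To see where 400 and 420 come from, the two merge orders can be compared by the code length each one gives every outcome. This is a sketch under assumptions: counts are taken as probability times 200, the mistaken order is assumed to merge "second" before "third" (the two 0.3 outcomes are interchangeable), and the names `total_bits`, `correct_lengths`, and `mistaken_lengths` are illustrative only.

```python
# Phantasm's assumed counts over 200 races (probability * 200).
counts = {"first": 40, "second": 60, "third": 60, "other": 40}

# Code lengths (tree depths) that result from each merge order.
# Correct order: (first, other) and (second, third) are merged separately,
# so every outcome sits at depth 2.
correct_lengths = {"first": 2, "second": 2, "third": 2, "other": 2}
# Mistaken order: (first, other), then that pair with second, then with third.
mistaken_lengths = {"first": 3, "other": 3, "second": 2, "third": 1}

def total_bits(counts, lengths):
    """Encoded length: each outcome's count times its code length."""
    return sum(counts[s] * lengths[s] for s in counts)

print(total_bits(counts, correct_lengths))   # 400
print(total_bits(counts, mistaken_lengths))  # 420
```

The mistaken tree spends 3 bits on 80 outcomes, 2 bits on 60, and 1 bit on 60, which gives 240 + 120 + 60 = 420.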

About the author: 执手闯天涯
