对霍夫曼树感到困惑

发布于 2024-09-05 01:57:01 字数 413 浏览 5 评论 0原文

生成哈夫曼树的快速教程

困惑关于霍夫曼树。在上面链接的末尾附近,它显示了剩下 2 个元素的树,然后是完整的树。我对它的分支方式感到困惑。哈夫曼树有没有特定的分支方式?

例如,57:* 及其右子级 35:* 向右分支。是否可能有 35 个向左分支,22 个向右分支?另外,为什么 22:* 不与 15:4 配对 - 它只是与 20:5 配对来创建一棵新树。

从最初的观察来看,除了叶子的频率加起来等于父节点的值之外,树似乎不需要平衡或具有任何特定的顺序。两个使用相同数据创建哈夫曼树的人最终会得到不同的编码值吗?

A quick tutorial on generating a huffman tree

Confused about Huffman Trees. Near the end of that link above, it shows the tree with 2 elements left, and then the completed tree. I'm confused about the way that it is branched. Is there a specific way a huffman tree needs to be branched?

For example, 57:* with its right child 35:* is branched off to the right. Could it have been 35 branched to the left with 22 branched to the right? Also, why wasn't 22:* paired up with 15:4 - it just paired with 20:5 to create a new tree.

From initial obersvations it seems the tree does not need to be balanced or have any specific order other than that the frequencies of a leaf add up to the value of the parent node. Could two people creating a huffman tree with the same data end up with different encoding values?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

仅冇旳回忆 2024-09-12 01:57:01

到目前为止,这些帖子都是错误且具有误导性的:选择具有相同权重的叶子确实很重要,并且它们确实改变了数据压缩的效果。

这是一个反例,演示了不同的选择如何导致不同的压缩率:
ABBBCCCDDDDEEEEEEEE

A:1、B:3、C:3、D:4、E:8。
第一步:将A和B组成一个权重为4的节点。
第二步:

如果你用C取第一步新创建的节点,那么你得到
(19 (11 (7 (4 (1-A) (3-B)) (3-C)) (4-D)) (8-E)) 给出 37 位压缩数据。

另一方面,如果您采用权重为 4 的 D,而不是新创建的节点,您将得到
(19 (11 (4 (1-A) (3-B)) (7 (3-C) (4-D))) (8-E)) 给出 41 位压缩数据。

The posts so far are wrong and misleading: the choice of leaves with equal weights does matter and they do change how well they compress data.

Here's a counter example that demonstrates how different choices lead to different compression rates:
ABBBCCCDDDDEEEEEEEE

A:1, B:3, C:3, D:4, E:8.
First step: take A and B to form a node with weight 4.
Second step:

If you take the newly created node in the first step with C, then you get
(19 (11 (7 (4 (1-A) (3-B)) (3-C)) (4-D)) (8-E)) which gives 37-bits compressed data.

If, on the other hand, you take D, which also has the weight 4, instead of the newly created node, you get
(19 (11 (4 (1-A) (3-B)) (7 (3-C) (4-D))) (8-E)) which gives 41-bits compressed data.

空袭的梦i 2024-09-12 01:57:01

哈夫曼树的关键是:

按频率对列表进行排序,并将两个最低元素放入叶子

如果有两个以上频率最低的元素(例如 3,4,4...),则任何两个都可以( 3 和任意一个 4 - 而不是两个 4)。此外,这些最低元素中哪个被分配 0、哪个被分配 1 并不重要。这两个事实允许从相同的数据中产生不同但有效的霍夫曼编码。

哈夫曼树应该通过频率而不是节点数量来平衡。因此,以下内容是平衡的:

(100 (50 (25 (12 (12 1)))))

而这不是:

(((100 50) 25) ((12 12) 1)))

具体来说,在您的问题中,15 与 20 配对,而不是 22,因为 15 和 20 是两个最低的剩余值(均低于 22)。任何一个分支(左分支或右分支)都可以,只要它是一致的(在同一算法中总是较小的左分支,或者总是较小的右分支,以便可以在另一端重建编码)。

The key to Huffman trees is this:

Sort this list by frequency and make the two-lowest elements into leaves

If you have more than two elements that have the lowest frequency (e.g. 3,4,4...), any two will do (3 and either of 4s - not two 4s). Also, it is not important which of these lowest elements is assigned 0 and which is 1. These two facts allow different yet valid Huffman encodings to arise from the same data.

The Huffman tree is supposed to be balanced by frequencies, not by the number of nodes. Thus the following is balanced:

(100 (50 (25 (12 (12 1)))))

and this is not:

(((100 50) 25) ((12 12) 1)))

Specifically in your question, 15 is paired with 20 and not 22 because 15 and 20 are the two lowest remaining values (both lower than 22). Either branching (left or right) would have been fine, as long as it's consistent (always smaller-left, or always smaller-right, within the same algorithm, so that the encoding can be reconstructed at the other end).

东走西顾 2024-09-12 01:57:01

页面上已对所有内容进行了说明。 22:* 未与 15:4 配对,因为在每个步骤中,两个具有最低元素的节点被组合。这创造了一个独特的秩序。

霍夫曼代码可以不同(如果您有多个具有相同频率的值或交换左/右的 0 和 1 表示),但霍夫曼长度不能不同。

左/右分支只是如何绘制树或以图形方式表示树的问题,所以这并不重要。

Everything is explained on the page. 22:* wasn't paired up with 15:4 because in each step, two nodes with the lowest elements are combined. This creates an unique order.

Huffman codes can be different (if you have multiple values with the same frequency or exchange 0 and 1 representation of left/right), but huffman lengths can't be.

Branching left/right is just a matter of how to draw the tree or represent it graphical, so this doesn't matter.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文