Huffman coding - header and encoding EOF

Posted on 2024-12-16 21:27:23

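I'm currently working on a Java program based on the Huffman algorithm, and I'm at the stage where I need to write the encoded content to a file. I'm a bit confused about how to implement the header and the EOF marker needed for decoding. For the header, I currently store every unique value that occurs in the input file along with its frequency, but in some articles I've seen people represent the nodes with a 0 or 1 followed by the frequency (which confuses me, since it doesn't say what the symbol is).

As for the EOF, as I understand it, I encode it like any other symbol so that it can be read and decoded. However, I'm not sure what value I can use for it that is guaranteed not to occur in the input. I know its weight needs to be 1, but I'm not sure how to make sure it never actually appears in the file.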

甜点 2024-12-23 21:27:23

I had to do this once for an assignment, and this is the approach we used.

We encoded the header by using 0 and 1 to describe the structure of the tree (rather than the frequencies). A "0" means an internal node, so we keep moving down the tree; a "1" means we are at a leaf node, and it is followed by that leaf's symbol. The result is a pre-order traversal of the tree that encodes it uniquely.

For example, a tree like (((a b) c) (d e)) would be encoded as "0001a1b1c01d1e", where a, b, c, d, e stand for the ASCII codes of those characters.

Here's the tree in a graphical form:

     / \
   /\   /\
 /\  c d  e
a  b
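
To make the traversal concrete, here is a minimal Java sketch of writing and reading such a header. The `Node` and `TreeHeader` names are hypothetical and not from our assignment, and the header is built as a readable string of characters purely for illustration; a real encoder would presumably emit a single bit for each marker and a full byte for each symbol.

```java
// Hypothetical node type for illustration only.
final class Node {
    final char symbol;      // meaningful only for leaves
    final Node left, right; // both null for a leaf

    Node(char symbol) { this.symbol = symbol; this.left = null; this.right = null; }
    Node(Node left, Node right) { this.symbol = '\0'; this.left = left; this.right = right; }

    boolean isLeaf() { return left == null && right == null; }
}

final class TreeHeader {
    // Pre-order traversal: '0' for an internal node, '1' plus the symbol for a leaf.
    static void write(Node node, StringBuilder out) {
        if (node.isLeaf()) {
            out.append('1').append(node.symbol);
        } else {
            out.append('0');
            write(node.left, out);
            write(node.right, out);
        }
    }

    static String write(Node root) {
        StringBuilder out = new StringBuilder();
        write(root, out);
        return out.toString();
    }

    // Rebuilds the tree; pos is a one-element array used as a moving cursor.
    static Node read(String header, int[] pos) {
        char marker = header.charAt(pos[0]++);
        if (marker == '1') {
            // The character after a '1' is always the symbol, even if it is '0' or '1'.
            return new Node(header.charAt(pos[0]++));
        }
        Node left = read(header, pos);
        Node right = read(header, pos);
        return new Node(left, right);
    }
}
```

Because every internal node of a Huffman tree has exactly two children, the reader needs no length field: it knows the header has ended as soon as the recursion returns to the root.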

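For the EOF, we used the last 3 bits of the file to specify how many bits of the last two bytes are actual data. We read one byte ahead, so when we hit the end of the file we were still working on the second-to-last byte. At that point we checked the last 3 bits of the final byte: they encode the number of data bits in those two bytes, minus 6. So 110101xxxxxxx000 means "read 110101 (6 bits) and discard everything else", 1101011xxxxxx001 means "read 1101011 (7 bits) and discard the rest", and so on. Three bits cover the values 0 to 7, which corresponds to 6 to 13 data bits, and 13 is exactly the 16 bits of the last two bytes minus the 3-bit count.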
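
The arithmetic for those last two bytes can be sketched as follows. `Trailer` and `payload` are hypothetical names, and returning the bits as a string of '0' and '1' characters is only for readability; the sketch assumes the two bytes are passed as values in the range 0 to 255.

```java
final class Trailer {
    // Returns the data bits held in the final two bytes of the file.
    static String payload(int secondLast, int last) {
        // Join the two bytes into one 16-bit window, most significant bit first.
        int window = ((secondLast & 0xFF) << 8) | (last & 0xFF);
        // The low 3 bits of the last byte store (number of data bits - 6).
        int count = (last & 0b111) + 6;
        StringBuilder bits = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            bits.append((window >> (15 - i)) & 1); // appends "0" or "1"
        }
        return bits.toString();
    }
}
```

With the padding bits set to zero, the first example above is the byte pair 0xD4, 0x00, for which `Trailer.payload(0xD4, 0x00)` gives the 6 bits 110101.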

Doing it this way meant we didn't need a special value denoting the EOF, and we could still read the file one byte at a time (although we had to read one byte ahead of the one we were working on).

(I haven't read the articles you mention, so I don't know whether our EOF idea fits your approach.)
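
A rough sketch of that read-ahead loop, assuming the input stream contains only the encoded data followed by the two trailer bytes (the tree header would have been consumed beforehand). `LookaheadReader` and `readPayload` are hypothetical names, and collecting the bits in a string stands in for feeding them to the Huffman decoder.

```java
import java.io.IOException;
import java.io.InputStream;

final class LookaheadReader {
    // Returns every data bit in the stream as a string of '0' and '1' characters.
    static String readPayload(InputStream in) throws IOException {
        StringBuilder bits = new StringBuilder();
        int a = in.read();
        int b = in.read();
        if (a < 0 || b < 0) {
            throw new IOException("need at least two bytes for the trailer");
        }
        int c;
        while ((c = in.read()) >= 0) {
            // A third byte exists, so a is not part of the final pair: all 8 bits are data.
            appendBits(bits, a, 8);
            a = b;
            b = c;
        }
        // End of stream: a and b are the final two bytes.
        int window = (a << 8) | b;
        int count = (b & 0b111) + 6; // low 3 bits of the last byte, plus 6
        appendBits(bits, window >> (16 - count), count);
        return bits.toString();
    }

    // Appends the low `count` bits of value, most significant bit first.
    private static void appendBits(StringBuilder out, int value, int count) {
        for (int i = count - 1; i >= 0; i--) {
            out.append((value >> i) & 1);
        }
    }
}
```

For the three bytes 0xAB, 0xD4, 0x00 this yields the 8 bits of 0xAB followed by 110101. Wrapping a file stream in a `BufferedInputStream` keeps the byte-by-byte `read()` calls cheap.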
