Huffman coding: encoding two characters as one symbol

Posted on 2024-09-02 17:31:11

I need a Huffman coder (preferably in Python or Java) that encodes text not one character at a time (a = 10, b = 11) but two characters at a time (ab = 11, ag = 10). Is this possible, and if so, where can I find one? Maybe it is somewhere on the internet and I just can't find it.


Comments (3)

策马西风 2024-09-09 17:31:11

Huffman coding doesn't care about characters, it cares about symbols. It is usually applied to an alphabet of single characters, but it generalizes very easily to strings of characters. Basically, you would take an existing implementation and allow the symbols to be strings rather than characters. Each leaf node would then correspond to a multi-character string instead of a single character.
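To make that concrete, here is a minimal sketch of a Huffman coder whose symbols are two-character strings. It is not taken from any existing implementation: the function names (`split_pairs`, `build_codes`, `encode`, `decode`) are made up for illustration, the text is assumed to be cut into non-overlapping pairs, and odd-length input is padded with a NUL character that is assumed not to occur in the text itself.

```python
import heapq
from collections import Counter
from itertools import count


def split_pairs(text, pad="\0"):
    """Cut text into non-overlapping two-character symbols.

    Odd-length text is padded with `pad` (assumed absent from the input).
    """
    if len(text) % 2:
        text += pad
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def build_codes(symbols):
    """Build a {symbol: bitstring} Huffman table for any hashable symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(text):
    """Return (bitstring, code table) for text using pair symbols."""
    symbols = split_pairs(text)
    codes = build_codes(symbols)
    return "".join(codes[s] for s in symbols), codes


def decode(bits, codes, pad="\0"):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out).rstrip(pad)


if __name__ == "__main__":
    bits, codes = encode("abagabab")
    print(codes, bits)
    print(decode(bits, codes))
```

The only part that knows about pairs is `split_pairs`; `build_codes` works on whatever symbols it is given, which is the point of the answer above. The bits are kept as a Python string for readability; a real encoder would pack them and store the code table alongside the data.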

揽月 2024-09-09 17:31:11

There's a Huffman encoder example distributed with the Python bitarray module, if that's any use to you.

如歌彻婉言 2024-09-09 17:31:11

There is probably some code for this somewhere, but it sounds like a parsing and tokenising question. One of the first things I would find out is how many unique pairs you are dealing with. Huffman encoding works best with a small number of tokens, for example the 101 characters on your keyboard. If your two characters can be anything, the maximum number of symbols grows massively: up to 101 × 101 = 10,201 possible pairs in that example.
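One way to answer the "how many unique pairs" question for a given text is simply to count them. The sketch below is only an illustration: `pair_stats` is a hypothetical helper, it assumes non-overlapping pairs, and it ignores a trailing unpaired character.

```python
from collections import Counter


def pair_stats(text):
    """Compare the pairs that actually occur with the theoretical maximum."""
    # Non-overlapping pairs; a trailing single character is ignored here.
    pairs = Counter(text[i:i + 2] for i in range(0, len(text) - 1, 2))
    alphabet = len(set(text))
    return {
        "alphabet_size": alphabet,
        "possible_pairs": alphabet ** 2,
        "distinct_pairs": len(pairs),
    }


if __name__ == "__main__":
    print(pair_stats("abagabab"))
```

If the number of distinct pairs stays far below the number of possible pairs, the code table remains small enough to store with the compressed data; if it approaches the maximum, the table itself may cost more than the pair symbols save.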
