Encode a string into another base with more characters?

Posted 2024-12-09 05:20:15

I know that I can encode numbers in a base such as 65 to reduce the number of characters needed to display them (even if the number is smaller in binary).

However, is there a way to encode UTF-8 text in another base with more characters than our standard 26-letter English alphabet? In other words, instead of requiring 4 "characters" for the word "four", could I create a representation or hash using only, say, 2 (e.g. "6$")?

Comments (2)

面如桃花 2024-12-16 05:20:15

I believe the point of Base64 is that you can easily convert any binary data into "human readable" letters and numbers. That makes it easy to transcribe arbitrary data to newsgroups or to transmit it over text-based protocols.

If you want to further "compress" this data, you need to figure out how many characters you want to allow. There are only so many combinations of 8 bits. The most efficient approach would be to use all of them, in which case why not just use gzip?
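To make the trade-off concrete, here is a minimal Python sketch that compares how many characters or bytes the same text takes under a few encodings. The helper name `encoded_lengths` is made up for illustration, and `zlib` stands in for gzip here (both use DEFLATE, gzip just adds a larger header). The numbers it prints depend on the input, so treat it as a way to experiment rather than a benchmark.

```python
import base64
import zlib


def encoded_lengths(text: str) -> dict:
    """Return the size of `text` under several encodings (illustrative helper)."""
    raw = text.encode("utf-8")
    return {
        "utf8_bytes": len(raw),
        # Base64 uses a 64-character alphabet: 4 output characters per 3 input bytes.
        "base64_chars": len(base64.b64encode(raw)),
        # Base85 uses a larger alphabet: 5 output characters per 4 input bytes.
        "base85_chars": len(base64.b85encode(raw)),
        # DEFLATE compression; output is arbitrary bytes, not printable text.
        "zlib_bytes": len(zlib.compress(raw, 9)),
    }


if __name__ == "__main__":
    for sample in ("four", "four " * 200):
        print(repr(sample[:20]), encoded_lengths(sample))
```

A larger alphabet only reduces the overhead of representing bytes as printable characters; it does not go below the original byte count. Real savings come from compression, and only when the input is long or repetitive enough to outweigh the compressor's fixed overhead.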

情仇皆在手 2024-12-16 05:20:15

Your question seems related to order-0 entropy coding:
http://en.wikipedia.org/wiki/Entropy_encoding

The most famous algorithm in this family is Huffman coding:
http://en.wikipedia.org/wiki/Huffman_coding

Huffman will not only tell you that only 64 characters are used and therefore only 6 bits per character are necessary: it will also distinguish between frequent characters, such as (space), and rare ones, such as (;). It will then create a code in which frequent characters use fewer bits than rarer ones, resulting in better compression (typically 4.5 bits per character on English text).
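As a rough illustration of the idea, the following Python sketch builds an order-0 Huffman code from character frequencies and reports the average number of bits per character. The function names are hypothetical, and this is not the Huff0 implementation linked in this answer; it only computes code lengths and does not produce an actual compressed bitstream.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict:
    """Return {character: code length in bits} for an order-0 Huffman code."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs at least one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique
    # tiebreak integer keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def average_bits_per_char(text: str) -> float:
    """Average Huffman code length over the characters of `text`."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    return sum(freq[sym] * lengths[sym] for sym in freq) / len(text)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog and then rests"
    print(huffman_code_lengths(sample))
    print(average_bits_per_char(sample), "bits per character")
```

Frequent characters end up with shorter codes than rare ones, so the average falls below the fixed 8 bits per character of plain ASCII text. The exact figure depends on the input.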

Huffman coding is a general-purpose compression technique, used as part of many compression algorithms, including zip.
You can find a demo program (Huff0) that applies only a single pass of Huffman compression here; it will help you determine how much can be gained by using this technique on your sample inputs:
http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html
