Is there a more efficient compression algorithm for base64-encoded strings?

Posted on 2025-02-05 19:08:32

Say I have a 5-character string, where each character can be one of 64 characters [a-zA-Z0-9+/]. I want to generate every possible string (64^5 strings), store them in a DB, and minimize the space needed to store them.

I don't know much about compression algorithms, but I was thinking that I could just encode each character as a 6-bit code (as base64 does) and pack the codes contiguously into 4-byte blocks. Five 6-bit codes take 30 bits, so I'd only be wasting 2 bits per string.
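A minimal sketch of the packing idea described above, assuming the symbols are ordered as in the standard base64 alphabet (any fixed ordering of the 64 symbols would work). The names `ALPHABET`, `pack`, and `unpack` are hypothetical helpers for illustration only.

```python
import string

# Assumption: standard base64 symbol order (A-Z, a-z, 0-9, +, /).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack(s: str) -> bytes:
    """Pack a 5-character string into 4 bytes (30 bits used, 2 bits unused)."""
    if len(s) != 5:
        raise ValueError("expected exactly 5 characters")
    value = 0
    for ch in s:
        # Shift in one 6-bit code per character.
        value = (value << 6) | INDEX[ch]
    return value.to_bytes(4, "big")


def unpack(b: bytes) -> str:
    """Recover the 5-character string from its 4-byte packed form."""
    value = int.from_bytes(b, "big")
    chars = []
    for _ in range(5):
        # Read the codes back from the least significant end.
        chars.append(ALPHABET[value & 0x3F])
        value >>= 6
    return "".join(reversed(chars))


packed = pack("aB3+/")
print(len(packed), unpack(packed))
```

Because the top 2 bits of each 4-byte block are always zero here, this is the "2 wasted bits per string" mentioned above.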

When I look up compression algorithms, I see things like Huffman coding, which is supposed to be really efficient. For example, I took a look at this post about the Huffman coding of the string "BCAADDDCCACACAC". The string is 15 characters long and each character takes one byte, so it needs 15 bytes of storage. With Huffman coding, they reduce the size to 75 bits, which is 10 bytes. But couldn't you do much better with a plain fixed-width binary encoding? There are only 4 distinct characters, so you could store a contiguous stream of fifteen 2-bit codes, which would only require 30 bits = 4 bytes.
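To make the comparison concrete, here is a rough sketch that counts the bits needed for the example string under a Huffman code and under a fixed 2-bit code. It only counts payload bits; a total such as 75 bits would presumably also include the cost of storing the code table, which is an assumption about the referenced post rather than something stated here. The function name `huffman_code_lengths` is a hypothetical helper.

```python
import heapq
from collections import Counter
from math import ceil, log2


def huffman_code_lengths(text: str) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "BCAADDDCCACACAC"
freq = Counter(text)
lengths = huffman_code_lengths(text)

huffman_bits = sum(freq[s] * lengths[s] for s in freq)
fixed_bits = len(text) * ceil(log2(len(freq)))

print(huffman_bits)  # 28 payload bits (code table not included)
print(fixed_bits)    # 30 bits with a fixed 2-bit code
```

For this string the Huffman payload (28 bits) is slightly smaller than the fixed-width encoding (30 bits), because the symbols are not equally frequent. In both cases the decoder also needs to know the symbol-to-code mapping.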

I'm just not sure if there's something I'm missing here.
