Python: encoding a string into a shorter, meaningless string in a reversible way

Posted on 2024-10-12 20:53:31

I'm sorry for the generic question (I don't have any past knowledge about compression and I don't know if it has a possible solution).

I have some codes of always 19 characters.

These characters can be only: A-Z, a-z, 0-9, ., :, -

An example can be something like 1995AbC...123..456Z

What I want to do is find a way to convert that string, reversibly, into a shorter one that contains only ASCII characters, something like gfSDd2H.

  • Is it possible?
  • Is there a way to do it in Python?

Thanks!


Comments (3)

深空失忆 2024-10-19 20:53:32

You can try to compress the string and then encode the result as, for example, base64. This, of course, assumes that your original strings are compressible, which seems unlikely for strings of only 19 characters.
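
A quick check of the compress-then-base64 idea, using only the standard-library zlib and base64 modules on the example code from the question. The function names are made up for illustration. zlib adds a 2-byte header and a 4-byte checksum, and base64 turns every 3 bytes into 4 characters, so for a 19-character input the result comes out longer, not shorter.

```python
import base64
import zlib

def compress_and_encode(code: str) -> str:
    """zlib-compress the code (max level), then base64-encode the compressed bytes."""
    return base64.b64encode(zlib.compress(code.encode("ascii"), 9)).decode("ascii")

def decode_and_decompress(encoded: str) -> str:
    """Reverse compress_and_encode()."""
    return zlib.decompress(base64.b64decode(encoded)).decode("ascii")

code = "1995AbC...123..456Z"
encoded = compress_and_encode(code)
print(len(code), len(encoded))  # the encoded form is longer than the input
assert decode_and_decompress(encoded) == code  # reversible, but no shorter
```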

If you are allowed to persist some data, you can map the first string to 1, the second to 2, and so on, and store that mapping (for example, in a database) so that you can reverse it. You can then encode the number as a base-64 (or some other base) string.

This is similar to how URL shortening services work.
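
A minimal sketch of the ID-mapping approach, assuming a plain in-memory dictionary as a stand-in for the database and base62 (digits plus letters) as the output base; the names `shorten`, `expand`, `int_to_str` and `str_to_int` are hypothetical. Unlike a pure encoding, the short form can only be reversed while the stored mapping is available.

```python
import string

# Hypothetical in-memory stand-in for the database table holding the mapping.
_id_to_code = {}
_code_to_id = {}

# Output alphabet: 62 alphanumeric characters (base62).
ALPHABET = string.digits + string.ascii_letters

def int_to_str(n: int) -> str:
    """Encode a non-negative int in base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, d = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))

def str_to_int(s: str) -> int:
    """Decode a base62 string back to an int."""
    n = 0
    for c in s:
        n = n * len(ALPHABET) + ALPHABET.index(c)
    return n

def shorten(code: str) -> str:
    """Give each new code the next sequential ID (1, 2, ...) and return it in base62."""
    if code not in _code_to_id:
        new_id = len(_id_to_code) + 1
        _id_to_code[new_id] = code
        _code_to_id[code] = new_id
    return int_to_str(_code_to_id[code])

def expand(short: str) -> str:
    """Look the original code up again in the stored mapping."""
    return _id_to_code[str_to_int(short)]
```

The short strings grow only with the number of stored codes, not with the code length: the first 62 codes get a single character each.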

森罗 2024-10-19 20:53:32

You allow 65 different characters. Assuming all inputs are equally likely, each code carries 19·log2(65) ≈ 114.4 bits of information. Even if all 128 ASCII codes (7 bits each) could be used, any encoding would need at least ⌈114.4/7⌉ = 17 characters. Since you probably want to exclude unprintable characters, a perfect mapping onto the 95 printable ASCII characters needs at least ⌈114.4/log2(95)⌉ = ⌈17.4⌉ = 18 characters. Therefore, no such mapping can significantly reduce the length.
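
The bound can be checked directly: find the smallest length L for which the number of possible outputs, k^L for an output alphabet of k symbols, is at least the number of possible codes, 65^19. The sketch below uses exact integer arithmetic so that floating-point rounding cannot affect the result; `min_length` is a name chosen for illustration.

```python
import math

CODE_LENGTH = 19
INPUT_SYMBOLS = 65  # A-Z, a-z, 0-9, ".", ":", "-"

# Information content of one code, assuming all codes are equally likely.
bits = CODE_LENGTH * math.log2(INPUT_SYMBOLS)

def min_length(output_symbols: int) -> int:
    """Smallest L with output_symbols**L >= INPUT_SYMBOLS**CODE_LENGTH."""
    needed = INPUT_SYMBOLS ** CODE_LENGTH
    length = 0
    while output_symbols ** length < needed:
        length += 1
    return length

print(f"{bits:.1f} bits")  # 114.4 bits
print(min_length(256))     # 15 (raw bytes)
print(min_length(128))     # 17 (all 7-bit ASCII codes)
print(min_length(95))      # 18 (printable ASCII only)
```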

永言不败 2024-10-19 20:53:32

Of course (?) it's possible in Python. All you would be doing is converting a base-65 number into a base-95 or base-94 number and back again. It's just that it would be a bit slow and, as pointed out in another answer, you wouldn't save much space.

Here (untested) are the basic building blocks:

def ttoi(text, base, letter_values):
    """converts a base-"base" string to an int"""
    n = 0
    for c in text:
        n = n * base + letter_values[c]
    return n

def itot(number, base, alphabet, padsize):
    """converts an int into a base-"base" string
       The result is left-padded to "padsize" using the zero-value character"""
    temp = []
    assert number >= 0
    while number:
        number, digit = divmod(number, base)
        temp.append(alphabet[digit])
    return max(0, padsize - len(temp)) * alphabet[0] + "".join(reversed(temp))

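Definitions for, e.g., your existing base-65 code:

import string

# A-Z -> 0-25, a-z -> 26-51, 0-9 -> 52-61, then ".", ":" and "-" -> 62-64
b65_alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + ".:-"
b65_letter_values = {c: i for i, c in enumerate(b65_alphabet)}
b65_padsize = 19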
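
Putting the building blocks together, here is a sketch of a complete round trip. It assumes the output alphabet is the 94 printable ASCII characters from "!" to "~" (space is left out so the result never has leading or trailing whitespace) and an output length of 18, which is the smallest that can hold every 19-digit base-65 number. The names `B94`, `shorten` and `expand` are hypothetical.

```python
import string

def ttoi(text, base, letter_values):
    """Convert a base-`base` string to an int."""
    n = 0
    for c in text:
        n = n * base + letter_values[c]
    return n

def itot(number, base, alphabet, padsize):
    """Convert a non-negative int to a base-`base` string, left-padded to `padsize`."""
    assert number >= 0
    temp = []
    while number:
        number, digit = divmod(number, base)
        temp.append(alphabet[digit])
    return alphabet[0] * max(0, padsize - len(temp)) + "".join(reversed(temp))

# Input alphabet: the 65 characters allowed in the codes.
B65 = string.ascii_uppercase + string.ascii_lowercase + string.digits + ".:-"
B65_VALUES = {c: i for i, c in enumerate(B65)}

# Output alphabet (assumption): printable ASCII "!" (33) through "~" (126), no space.
B94 = "".join(chr(i) for i in range(33, 127))
B94_VALUES = {c: i for i, c in enumerate(B94)}

def shorten(code):
    """Map a 19-character base-65 code to an 18-character base-94 string."""
    return itot(ttoi(code, 65, B65_VALUES), 94, B94, 18)

def expand(short):
    """Invert shorten(); the fixed length 19 restores any leading 'A's."""
    return itot(ttoi(short, 94, B94_VALUES), 65, B65, 19)

code = "1995AbC...123..456Z"
short = shorten(code)
assert len(short) == 18 and expand(short) == code
```

Every 19-character code maps to exactly 18 characters, so the saving is a single character.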