当前位置：文江博客话题详情

将数字列表压缩或编码为单个字母数字字符串的最佳方法是什么？

发布于 2024-09-26 17:05:54 字数 205 浏览 4 评论 0原文

将任意长度和大小的数字列表压缩或编码为单个字母数字字符串的最佳方法是什么？

目标是能够将 1,5,8,3,20,212,42 之类的内容转换为 a8D1jN 之类的内容以在 URL 中使用，然后再转换回 1,5,8,3,20,212,42。

对于生成的字符串，我可以使用任何数字和任何 ASCII 字母（小写和大写），因此：0-9a-zA-Z。我不喜欢有任何标点符号。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

魂牵梦绕锁你心扉 2024-10-03 17:05:54

如果您将列表视为字符串，则需要对 11 个不同的字符进行编码（0-9 和逗号）。这可以用4位来表示。如果您愿意添加，请说 $ 和 !到您可接受的字符列表，那么您将有 64 个不同的输出字符，因此能够对每个字符编码 6 位。

这意味着您可以将字符串映射到编码字符串，该字符串比原始字符串短约 30%，并且看起来相当模糊且随机。

这样您就可以将数字系列 [1,5,8,3,20,212,42] 转码为字符串“gLQfoIcIeQqq”。

更新：我受到启发并为此解决方案编写了一个Python解决方案（速度不快但功能足够......）

    ZERO = ord('0')
    OUTPUT_CHARACTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$!"

    def encode(numberlist):

        # convert to string -> '1,5,8,3,20,212,42'
        s = str(numberlist).replace(' ','')[1:-1]

        # convert to four bit values -> ['0010', '1011', '0110', ... ]
        # (add 1 to avoid the '0000' series used for padding later)
        four_bit_ints = [0 <= (ord(ch) - ZERO) <= 9 and (ord(ch) - ZERO) + 1 or 11 for ch in s]
        four_bits = [bin(x).lstrip('-0b').zfill(4) for x in four_bit_ints]

        # make binary string and pad with 0 to align to 6 -> '00101011011010111001101101...'
        bin_str = "".join(four_bits)
        bin_str = bin_str + '0' * (6 - len(bin_str) % 6)

        # split to 6bit blocks and map those to ints
        six_bits = [bin_str[x * 6 : x * 6 + 6] for x in range(0, len(bin_str) / 6)]
        six_bit_ints = [int(x, 2) for x in six_bits]

        # map the 6bit integers to characters
        output = "".join([OUTPUT_CHARACTERS[x] for x in six_bit_ints])

        return output

    def decode(input_str):

        # map the input string from characters to 6bit integers, and convert those to bitstrings
        six_bit_ints = [OUTPUT_CHARACTERS.index(x) for x in input_str]
        six_bits = [bin(x).lstrip('-0b').zfill(6) for x in six_bit_ints]

        # join to a single binarystring
        bin_str = "".join(six_bits)

        # split to four bits groups, and convert those to integers
        four_bits = [bin_str[x * 4 : x * 4 + 4] for x in range(0, len(bin_str) / 4)]
        four_bit_ints = [int(x, 2) for x in four_bits]

        # filter out 0 values (padding)
        four_bit_ints = [x for x in four_bit_ints if x > 0]

        # convert back to the original characters -> '1',',','5',',','8',',','3',',','2','0',',','2','1','2',',','4','2'
        chars = [x < 11 and str(x - 1) or ',' for x in four_bit_ints]

        # join, split on ',' convert to int
        output = [int(x) for x in "".join(chars).split(',') if x]

        return output


    if __name__ == "__main__":

        # test
        for i in range(100):
            numbers = range(i)
            out = decode(encode(numbers))
            assert out == numbers

        # test with original series
        numbers = [1,5,8,3,20,212,42]
        encoded = encode(numbers)
        print encoded         # prints 'k2UBsZgZi7uW'
        print decode(encoded) # prints [1, 5, 8, 3, 20, 212, 42]

If you consider your list as a string, then you have 11 different characters to encode (0-9 and comma). This can be expressed in 4 bits. If you were willing to add, say $ and ! to your list of acceptable characters, then you would have 64 different output characters, and thus be able to encode 6 bits per character.

This would mean that you could map the string to an encoded string that would be about 30% shorter than the original one, and fairly obfuscated and random looking.

This way you could transcode the number series [1,5,8,3,20,212,42] to the string "gLQfoIcIeQqq".

UPDATE: I felt inspired and wrote a python solution for this solution (not fast but functional enough...)

    ZERO = ord('0')
    OUTPUT_CHARACTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$!"

    def encode(numberlist):

        # convert to string -> '1,5,8,3,20,212,42'
        s = str(numberlist).replace(' ','')[1:-1]

        # convert to four bit values -> ['0010', '1011', '0110', ... ]
        # (add 1 to avoid the '0000' series used for padding later)
        four_bit_ints = [0 <= (ord(ch) - ZERO) <= 9 and (ord(ch) - ZERO) + 1 or 11 for ch in s]
        four_bits = [bin(x).lstrip('-0b').zfill(4) for x in four_bit_ints]

        # make binary string and pad with 0 to align to 6 -> '00101011011010111001101101...'
        bin_str = "".join(four_bits)
        bin_str = bin_str + '0' * (6 - len(bin_str) % 6)

        # split to 6bit blocks and map those to ints
        six_bits = [bin_str[x * 6 : x * 6 + 6] for x in range(0, len(bin_str) / 6)]
        six_bit_ints = [int(x, 2) for x in six_bits]

        # map the 6bit integers to characters
        output = "".join([OUTPUT_CHARACTERS[x] for x in six_bit_ints])

        return output

    def decode(input_str):

        # map the input string from characters to 6bit integers, and convert those to bitstrings
        six_bit_ints = [OUTPUT_CHARACTERS.index(x) for x in input_str]
        six_bits = [bin(x).lstrip('-0b').zfill(6) for x in six_bit_ints]

        # join to a single binarystring
        bin_str = "".join(six_bits)

        # split to four bits groups, and convert those to integers
        four_bits = [bin_str[x * 4 : x * 4 + 4] for x in range(0, len(bin_str) / 4)]
        four_bit_ints = [int(x, 2) for x in four_bits]

        # filter out 0 values (padding)
        four_bit_ints = [x for x in four_bit_ints if x > 0]

        # convert back to the original characters -> '1',',','5',',','8',',','3',',','2','0',',','2','1','2',',','4','2'
        chars = [x < 11 and str(x - 1) or ',' for x in four_bit_ints]

        # join, split on ',' convert to int
        output = [int(x) for x in "".join(chars).split(',') if x]

        return output


    if __name__ == "__main__":

        # test
        for i in range(100):
            numbers = range(i)
            out = decode(encode(numbers))
            assert out == numbers

        # test with original series
        numbers = [1,5,8,3,20,212,42]
        encoded = encode(numbers)
        print encoded         # prints 'k2UBsZgZi7uW'
        print decode(encoded) # prints [1, 5, 8, 3, 20, 212, 42]

回复收藏 0 原文

巡山小妖精 2024-10-03 17:05:54

您可以使用 Base64 等编码方案。

Base64 模块或库在多种编程语言中都很常见。

回复收藏 0 原文

哥，最终变帅啦 2024-10-03 17:05:54

您可以进行简单的编码，用“a”+数字替换每个数字的最后一位数字，而不是用逗号分隔数字。因此，您的列表[1,5,8,3,20,212,42]将变得神秘起来bfid2a21c4c。 :)

仅当有少量数字时我才会使用类似的东西，其中压缩无法大大缩短字符串。如果它是我们正在讨论的很多数字，您可以尝试对数据执行某种压缩 + Base64 编码。

回复收藏 0 原文

云朵有点甜 2024-10-03 17:05:54

取决于数字的范围——在合理的范围内，简单的字典压缩方案就可以工作。

考虑到您对 10k 行的编辑和估计，每个数字映射到 [A-Za-z0-9] 的三元组的字典方案对于 62*62*62 不同的条目来说可能是唯一的。

回复收藏 0 原文

短暂陪伴 2024-10-03 17:05:54

可能有一个超级酷且高效的算法适合您的情况。然而，一个非常简单、经过测试且可靠的算法是对逗号分隔的数字字符串使用“通用”编码或压缩算法。

有很多可供选择。

回复收藏 0 原文

多情癖 2024-10-03 17:05:54

“最好”取决于您的标准是什么。

如果最好的意思是简单：只需将数字串在一起并用固定字符分隔：

1a5a8a3a20a212a42

这也应该快速

如果您希望结果字符串小， em>，您可以通过某种压缩算法（如 zip）运行上面的字符串，然后通过某种编码（如 base64 或类似算法）运行结果。

回复收藏 0 原文

紫竹語嫣☆ 2024-10-03 17:05:54

您也可以使用中国剩余定理。

这个想法是找到一个数字 X，使得

X = a1 mod n1
X = a2 mod n2
...
X = ak mod nk

每个组合 (ij) 的 gcd(Ni Nj)=1。

CRT 说明了如何找到满足这些方程的最小数 X。

像这样，您可以将数字 a1...ak 编码为 X，并保持 N 的列表固定。每个 Ni 必须大于 ai，确实如此。

You can use the Chinese Remainder Theorem as well.

The idea is to find a number X such that

X = a1 mod n1
X = a2 mod n2
...
X = ak mod nk

gcd(Ni Nj)=1 for each combination (i j).

The CRT says how to find a smallest number X that satisfies these equations.

Like this, you can encode the numbers a1...ak as X, and keep a list of Ns fixed. Each Ni must be greater than ai, quite so.

回复收藏 0 原文

~没有更多了~

关于作者

少女净妖师

暂无简介

0 文章

0 评论

672 人气

关注发私信

友情链接

文江博客

将数字列表压缩或编码为单个字母数字字符串的最佳方法是什么？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（7）

关于作者

相关话题

热门标签

推荐作者

玍銹的英雄夢

我不会写诗

十六岁半

浸婚纱

qq_kJ6XkX

旧伤还要旧人安

友情链接

将数字列表压缩或编码为单个字母数字字符串的最佳方法是什么？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（7）

关于作者

相关话题

热门标签

推荐作者

玍銹的英雄夢

我不会写诗

十六岁半

浸婚纱

qq_kJ6XkX

旧伤还要旧人安

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。