将数字列表压缩或编码为单个字母数字字符串的最佳方法是什么?

发布于 2024-09-26 17:05:54 字数 205 浏览 4 评论 0原文

将任意长度和大小的数字列表压缩或编码为单个字母数字字符串的最佳方法是什么?

目标是能够将 1,5,8,3,20,212,42 之类的内容转换为 a8D1jN 之类的内容以在 URL 中使用,然后再转换回 1,5,8,3,20,212,42。

对于生成的字符串,我可以使用任何数字和任何 ASCII 字母(小写和大写),因此:0-9a-zA-Z。我不喜欢有任何标点符号。

What's the best way to compress or encode a list of numbers of arbitrary length and sizes into a single alphanumeric string?

The goal is to be able to convert something like 1,5,8,3,20,212,42 into something like a8D1jN to be used in a URL, and then back to 1,5,8,3,20,212,42.

For the resulting string I'm fine with any number and any ASCII letter, lowercase and uppercase, so: 0-9a-zA-Z. I prefer not to have any punctuation whatsoever.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

魂牵梦绕锁你心扉 2024-10-03 17:05:54

如果您将列表视为字符串,则需要对 11 个不同的字符进行编码(0-9 和逗号)。这可以用4位来表示。如果您愿意添加,请说 $ 和 !到您可接受的字符列表,那么您将有 64 个不同的输出字符,因此能够对每个字符编码 6 位。

这意味着您可以将字符串映射到编码字符串,该字符串比原始字符串短约 30%,并且看起来相当模糊且随机。

这样您就可以将数字系列 [1,5,8,3,20,212,42] 转码为字符串“gLQfoIcIeQqq”。

更新:我受到启发并为此解决方案编写了一个Python解决方案(速度不快但功能足够......)

    ZERO = ord('0')
    OUTPUT_CHARACTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$!"

    def encode(numberlist):

        # convert to string -> '1,5,8,3,20,212,42'
        s = str(numberlist).replace(' ','')[1:-1]

        # convert to four bit values -> ['0010', '1011', '0110', ... ]
        # (add 1 to avoid the '0000' series used for padding later)
        four_bit_ints = [0 <= (ord(ch) - ZERO) <= 9 and (ord(ch) - ZERO) + 1 or 11 for ch in s]
        four_bits = [bin(x).lstrip('-0b').zfill(4) for x in four_bit_ints]

        # make binary string and pad with 0 to align to 6 -> '00101011011010111001101101...'
        bin_str = "".join(four_bits)
        bin_str = bin_str + '0' * (6 - len(bin_str) % 6)

        # split to 6bit blocks and map those to ints
        six_bits = [bin_str[x * 6 : x * 6 + 6] for x in range(0, len(bin_str) / 6)]
        six_bit_ints = [int(x, 2) for x in six_bits]

        # map the 6bit integers to characters
        output = "".join([OUTPUT_CHARACTERS[x] for x in six_bit_ints])

        return output

    def decode(input_str):

        # map the input string from characters to 6bit integers, and convert those to bitstrings
        six_bit_ints = [OUTPUT_CHARACTERS.index(x) for x in input_str]
        six_bits = [bin(x).lstrip('-0b').zfill(6) for x in six_bit_ints]

        # join to a single binarystring
        bin_str = "".join(six_bits)

        # split to four bits groups, and convert those to integers
        four_bits = [bin_str[x * 4 : x * 4 + 4] for x in range(0, len(bin_str) / 4)]
        four_bit_ints = [int(x, 2) for x in four_bits]

        # filter out 0 values (padding)
        four_bit_ints = [x for x in four_bit_ints if x > 0]

        # convert back to the original characters -> '1',',','5',',','8',',','3',',','2','0',',','2','1','2',',','4','2'
        chars = [x < 11 and str(x - 1) or ',' for x in four_bit_ints]

        # join, split on ',' convert to int
        output = [int(x) for x in "".join(chars).split(',') if x]

        return output


    if __name__ == "__main__":

        # test
        for i in range(100):
            numbers = range(i)
            out = decode(encode(numbers))
            assert out == numbers

        # test with original series
        numbers = [1,5,8,3,20,212,42]
        encoded = encode(numbers)
        print encoded         # prints 'k2UBsZgZi7uW'
        print decode(encoded) # prints [1, 5, 8, 3, 20, 212, 42]


If you consider your list as a string, then you have 11 different characters to encode (0-9 and comma). This can be expressed in 4 bits. If you were willing to add, say $ and ! to your list of acceptable characters, then you would have 64 different output characters, and thus be able to encode 6 bits per character.

This would mean that you could map the string to an encoded string that would be about 30% shorter than the original one, and fairly obfuscated and random looking.

This way you could transcode the number series [1,5,8,3,20,212,42] to the string "gLQfoIcIeQqq".

UPDATE: I felt inspired and wrote a python solution for this solution (not fast but functional enough...)

    ZERO = ord('0')
    OUTPUT_CHARACTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$!"

    def encode(numberlist):

        # convert to string -> '1,5,8,3,20,212,42'
        s = str(numberlist).replace(' ','')[1:-1]

        # convert to four bit values -> ['0010', '1011', '0110', ... ]
        # (add 1 to avoid the '0000' series used for padding later)
        four_bit_ints = [0 <= (ord(ch) - ZERO) <= 9 and (ord(ch) - ZERO) + 1 or 11 for ch in s]
        four_bits = [bin(x).lstrip('-0b').zfill(4) for x in four_bit_ints]

        # make binary string and pad with 0 to align to 6 -> '00101011011010111001101101...'
        bin_str = "".join(four_bits)
        bin_str = bin_str + '0' * (6 - len(bin_str) % 6)

        # split to 6bit blocks and map those to ints
        six_bits = [bin_str[x * 6 : x * 6 + 6] for x in range(0, len(bin_str) / 6)]
        six_bit_ints = [int(x, 2) for x in six_bits]

        # map the 6bit integers to characters
        output = "".join([OUTPUT_CHARACTERS[x] for x in six_bit_ints])

        return output

    def decode(input_str):

        # map the input string from characters to 6bit integers, and convert those to bitstrings
        six_bit_ints = [OUTPUT_CHARACTERS.index(x) for x in input_str]
        six_bits = [bin(x).lstrip('-0b').zfill(6) for x in six_bit_ints]

        # join to a single binarystring
        bin_str = "".join(six_bits)

        # split to four bits groups, and convert those to integers
        four_bits = [bin_str[x * 4 : x * 4 + 4] for x in range(0, len(bin_str) / 4)]
        four_bit_ints = [int(x, 2) for x in four_bits]

        # filter out 0 values (padding)
        four_bit_ints = [x for x in four_bit_ints if x > 0]

        # convert back to the original characters -> '1',',','5',',','8',',','3',',','2','0',',','2','1','2',',','4','2'
        chars = [x < 11 and str(x - 1) or ',' for x in four_bit_ints]

        # join, split on ',' convert to int
        output = [int(x) for x in "".join(chars).split(',') if x]

        return output


    if __name__ == "__main__":

        # test
        for i in range(100):
            numbers = range(i)
            out = decode(encode(numbers))
            assert out == numbers

        # test with original series
        numbers = [1,5,8,3,20,212,42]
        encoded = encode(numbers)
        print encoded         # prints 'k2UBsZgZi7uW'
        print decode(encoded) # prints [1, 5, 8, 3, 20, 212, 42]


巡山小妖精 2024-10-03 17:05:54

您可以使用 Base64 等编码方案。

Base64 模块或库在多种编程语言中都很常见。

You can use an encoding scheme like the Base64.

Base64 modules or libraries are common in multiple programming languages.

哥,最终变帅啦 2024-10-03 17:05:54

您可以进行简单的编码,用“a”+数字替换每个数字的最后一位数字,而不是用逗号分隔数字。因此,您的列表[1,5,8,3,20,212,42]将变得神秘起来bfid2a21c4c。 :)

仅当有少量数字时我才会使用类似的东西,其中压缩无法大大缩短字符串。如果它是我们正在讨论的很多数字,您可以尝试对数据执行某种压缩 + Base64 编码。

Instead of comma separating the numbers, you could do a simple encoding where you replace the last digit of each number with 'a'+digit. So, your list [1,5,8,3,20,212,42] would become mysterious looking bfid2a21c4c. :)

I would use something like this only if there are a handful of numbers, where compression won't be able to shorten the string a lot. If it s a lot of numbers we are talking about, you could try to perform some sort of compression + base64 encoding on the data instead.

云朵有点甜 2024-10-03 17:05:54

取决于数字的范围——在合理的范围内,简单的字典压缩方案就可以工作。

考虑到您对 10k 行的编辑和估计,每个数字映射到 [A-Za-z0-9] 的三元组的字典方案对于 62*62*62 不同的条目来说可能是唯一的。

Depends on the range of the numbers -- with a reasonable range a simple dictionary compression scheme could work.

Given your edit and estimate of 10k rows, a dictionary scheme where each number is mapped to a triple of [A-Za-z0-9] could be unique for 62*62*62 different entries.

短暂陪伴 2024-10-03 17:05:54

可能有一个超级酷且高效的算法适合您的情况。然而,一个非常简单、经过测试且可靠的算法是对逗号分隔的数字字符串使用“通用”编码或压缩算法。

有很多可供选择。

There might be a super cool and efficient algorithm for your case. However, a very simple, tested, and reliable algorithm would be to use a "common" encoding or compression algorithm on the comma seperated string of numbers.

There are many available to choose from.

多情癖 2024-10-03 17:05:54

“最好”取决于您的标准是什么。

如果最好的意思是简单:只需将数字串在一起并用固定字符分隔:

1a5a8a3a20a212a42

这也应该快速

如果您希望结果字符串, em>,您可以通过某种压缩算法(如 zip)运行上面的字符串,然后通过某种编码(如 base64 或类似算法)运行结果。

'best' depends on what your criteria are.

If best means simple : just string the numbers together seperated by a fixed character:

1a5a8a3a20a212a42

This should also be fast

If you want the resulting string to be small, you can run the string above through some compression algorithm like zip, then the result through some encoding like base64 or similar.

紫竹語嫣☆ 2024-10-03 17:05:54

您也可以使用中国剩余定理。

这个想法是找到一个数字 X,使得

X = a1 mod n1
X = a2 mod n2
...
X = ak mod nk

每个组合 (ij) 的 gcd(Ni Nj)=1。

CRT 说明了如何找到满足这些方程的最小数 X。

像这样,您可以将数字 a1...ak 编码为 X,并保持 N 的列表固定。每个 Ni 必须大于 ai,确实如此。

You can use the Chinese Remainder Theorem as well.

The idea is to find a number X such that

X = a1 mod n1
X = a2 mod n2
...
X = ak mod nk

gcd(Ni Nj)=1 for each combination (i j).

The CRT says how to find a smallest number X that satisfies these equations.

Like this, you can encode the numbers a1...ak as X, and keep a list of Ns fixed. Each Ni must be greater than ai, quite so.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文