将数字列表压缩或编码为单个字母数字字符串的最佳方法是什么?
将任意长度和大小的数字列表压缩或编码为单个字母数字字符串的最佳方法是什么?
目标是能够将 1,5,8,3,20,212,42 之类的内容转换为 a8D1jN 之类的内容以在 URL 中使用,然后再转换回 1,5,8,3,20,212,42。
对于生成的字符串,我可以使用任何数字和任何 ASCII 字母(小写和大写),因此:0-9a-zA-Z。我不喜欢有任何标点符号。
What's the best way to compress or encode a list of numbers of arbitrary length and sizes into a single alphanumeric string?
The goal is to be able to convert something like 1,5,8,3,20,212,42 into something like a8D1jN to be used in a URL, and then back to 1,5,8,3,20,212,42.
For the resulting string I'm fine with any number and any ASCII letter, lowercase and uppercase, so: 0-9a-zA-Z. I prefer not to have any punctuation whatsoever.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
如果您将列表视为字符串,则需要对 11 个不同的字符进行编码(0-9 和逗号)。这可以用4位来表示。如果您愿意添加,请说 $ 和 !到您可接受的字符列表,那么您将有 64 个不同的输出字符,因此能够对每个字符编码 6 位。
这意味着您可以将字符串映射到编码字符串,该字符串比原始字符串短约 30%,并且看起来相当模糊且随机。
这样您就可以将数字系列 [1,5,8,3,20,212,42] 转码为字符串“gLQfoIcIeQqq”。
更新:我受到启发并为此解决方案编写了一个Python解决方案(速度不快但功能足够......)
If you consider your list as a string, then you have 11 different characters to encode (0-9 and comma). This can be expressed in 4 bits. If you were willing to add, say $ and ! to your list of acceptable characters, then you would have 64 different output characters, and thus be able to encode 6 bits per character.
This would mean that you could map the string to an encoded string that would be about 30% shorter than the original one, and fairly obfuscated and random looking.
This way you could transcode the number series [1,5,8,3,20,212,42] to the string "gLQfoIcIeQqq".
UPDATE: I felt inspired and wrote a python solution for this solution (not fast but functional enough...)
您可以使用
Base64
等编码方案。Base64 模块或库在多种编程语言中都很常见。
You can use an encoding scheme like the
Base64
.Base64 modules or libraries are common in multiple programming languages.
您可以进行简单的编码,用“a”+数字替换每个数字的最后一位数字,而不是用逗号分隔数字。因此,您的列表
[1,5,8,3,20,212,42]
将变得神秘起来bfid2a21c4c
。 :)仅当有少量数字时我才会使用类似的东西,其中压缩无法大大缩短字符串。如果它是我们正在讨论的很多数字,您可以尝试对数据执行某种压缩 + Base64 编码。
Instead of comma separating the numbers, you could do a simple encoding where you replace the last digit of each number with 'a'+digit. So, your list
[1,5,8,3,20,212,42]
would become mysterious lookingbfid2a21c4c
. :)I would use something like this only if there are a handful of numbers, where compression won't be able to shorten the string a lot. If it s a lot of numbers we are talking about, you could try to perform some sort of compression + base64 encoding on the data instead.
取决于数字的范围——在合理的范围内,简单的字典压缩方案就可以工作。
考虑到您对 10k 行的编辑和估计,每个数字映射到 [A-Za-z0-9] 的三元组的字典方案对于 62*62*62 不同的条目来说可能是唯一的。
Depends on the range of the numbers -- with a reasonable range a simple dictionary compression scheme could work.
Given your edit and estimate of 10k rows, a dictionary scheme where each number is mapped to a triple of [A-Za-z0-9] could be unique for 62*62*62 different entries.
可能有一个超级酷且高效的算法适合您的情况。然而,一个非常简单、经过测试且可靠的算法是对逗号分隔的数字字符串使用“通用”编码或压缩算法。
有很多可供选择。
There might be a super cool and efficient algorithm for your case. However, a very simple, tested, and reliable algorithm would be to use a "common" encoding or compression algorithm on the comma seperated string of numbers.
There are many available to choose from.
“最好”取决于您的标准是什么。
如果最好的意思是简单:只需将数字串在一起并用固定字符分隔:
1a5a8a3a20a212a42
这也应该快速
如果您希望结果字符串小, em>,您可以通过某种压缩算法(如 zip)运行上面的字符串,然后通过某种编码(如 base64 或类似算法)运行结果。
'best' depends on what your criteria are.
If best means simple : just string the numbers together seperated by a fixed character:
1a5a8a3a20a212a42
This should also be fast
If you want the resulting string to be small, you can run the string above through some compression algorithm like zip, then the result through some encoding like base64 or similar.
您也可以使用中国剩余定理。
这个想法是找到一个数字 X,使得
每个组合 (ij) 的 gcd(Ni Nj)=1。
CRT 说明了如何找到满足这些方程的最小数 X。
像这样,您可以将数字 a1...ak 编码为 X,并保持 N 的列表固定。每个 Ni 必须大于 ai,确实如此。
You can use the Chinese Remainder Theorem as well.
The idea is to find a number X such that
gcd(Ni Nj)=1 for each combination (i j).
The CRT says how to find a smallest number X that satisfies these equations.
Like this, you can encode the numbers a1...ak as X, and keep a list of Ns fixed. Each Ni must be greater than ai, quite so.