Does this URL shortener obfuscation algorithm work?

Posted on 2024-11-14 08:11:08 | Word count: 1403 | Views: 3 | Comments: 0

DISCLAIMER: I am not asking how to make a URL shortener (I have already implemented the "bijective function" answer found here, which uses a base-62 encoded string). Instead, I want to extend that implementation to obfuscate the generated string so that it is both:

A) not an easily guessable sequence, and

B) still bijective.

You can easily randomize your base-62 character set, but the problem is that the result still increments like any other number in any other base. For example, one possible incremental progression might be {aX9fgE, aX9fg3, aX9fgf, aX9fgR, …}.
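To make the problem concrete, here is a minimal sketch, assuming a plain positional base-62 encoder over a shuffled alphabet. The `encode` helper, the seed, and the starting ID are illustrative assumptions, not the code from the linked answer.

```python
import random
import string


def encode(n, alphabet):
    """Encode a non-negative integer as a positional string over `alphabet`."""
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


# Hypothetical shuffled base-62 alphabet; the seed is arbitrary.
chars = list(string.ascii_letters + string.digits)
random.Random(42).shuffle(chars)
ALPHABET = "".join(chars)

# Four consecutive IDs chosen so that no carry happens between them.
codes = [encode(n, ALPHABET) for n in range(1_000_000, 1_000_004)]
print(codes)
```

Shuffling hides which character comes next, but the four codes still share the same prefix and differ only in the last character, which is the pattern I want to hide.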

I have come up with an obfuscation technique that I am pleased with in terms of requirement A), but I'm only partially sure that it satisfies B). The idea is this:

The only thing that is guaranteed to change in the incremental approach is the "1's place" (I'll use decimal terminology for practicality reasons). In the sample progression I gave earlier, that would be {E, 3, f, R, …}. So if each character in the base-62 set had its own unique offset number (say, its distance from the "zero character"), then you could apply the offset of the "1's place" character to the rest of the string.

For instance, let's assume a base-6 set with characters {A, f, 9, p, Z, 3} (in ascending order from 0 to 5). Each one would then have a unique offset of 0 to 5 respectively. Counting would look like {A, f, 9, p, Z, 3, fA, ff, f9, fp, …} and so on. So the algorithm, when given a value of fZ3p, would look at the p and, since p has an offset of +3, would shift every other character by 3 to produce Zf9p (treating the base-6 set as a circular array). The next incremental number would be fZ3Z, and with Z's offset being +4, the algorithm returns 39pZ. These permuted results would be handed to the user as their "unique URL"; the user would never see the actual base-62 encoded string.
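The shift described above can be sketched as follows. The example set has six characters, so this sketch wraps around modulo 6. The function name `obfuscate` is just an illustrative choice.

```python
ALPHABET = "Af9pZ3"  # the example set; a character's index is its offset


def obfuscate(code, alphabet=ALPHABET):
    """Shift every character except the last by the last character's offset."""
    base = len(alphabet)
    shift = alphabet.index(code[-1])
    head = "".join(
        alphabet[(alphabet.index(c) + shift) % base] for c in code[:-1]
    )
    return head + code[-1]


print(obfuscate("fZ3p"))  # Zf9p
print(obfuscate("fZ3Z"))  # 39pZ
```

The last character is left untouched, since it carries the offset needed to undo the shift.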

This approach certainly seems reversible: just look at the last character and perform the same shift with the negative offset. For that reason I think it must still be bijective, but I don't know whether that is necessarily true. Are there any edge/corner cases I'm not considering?
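One way to probe this is a brute-force round trip over every string of a given length. The sketch below assumes the six-character example set and only checks lengths 1 to 4, so it is evidence rather than a proof. The helper names are illustrative.

```python
from itertools import product

ALPHABET = "Af9pZ3"  # the example set; a character's index is its offset


def shift_head(code, sign, alphabet=ALPHABET):
    """Shift all characters except the last by +/- the last character's offset."""
    base = len(alphabet)
    shift = sign * alphabet.index(code[-1])
    head = "".join(
        alphabet[(alphabet.index(c) + shift) % base] for c in code[:-1]
    )
    return head + code[-1]


def obfuscate(code):
    return shift_head(code, +1)


def deobfuscate(code):
    return shift_head(code, -1)


for length in range(1, 5):
    words = ["".join(p) for p in product(ALPHABET, repeat=length)]
    images = [obfuscate(w) for w in words]
    # Every string decodes back to itself, and no two strings collide.
    assert all(deobfuscate(o) == w for w, o in zip(words, images))
    assert len(set(images)) == len(words)
```

One thing this makes visible: an obfuscated string can begin with the zero character, for example `3f` becomes `Af`. The round trip still works as long as the decoder keeps the string at its full length and does not strip leading zero characters before undoing the shift.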

EDIT: My priority is the length of the shortened URL rather than the security of the pattern. I realize there are plenty of solutions involving cryptographic functions, block ciphers, etc. But I would like to emphasize that I am not asking for the best way to achieve A), but rather, "does my offset approach satisfy B)?"

Any holes you can find would be appreciated.
