这种 URL 缩短器混淆算法是否有效?

发布于 2024-11-14 08:11:08 字数 1403 浏览 3 评论 0原文

免责声明:我不是问如何制作 URL 缩短器(我已经实现了找到的“双射函数”答案 此处 使用 base-62 编码字符串)。相反,我想扩展此实现以混淆生成的字符串,以便它既:

A)不是一个容易猜测的序列,并且

B)仍然是双射的。

您可以轻松地随机化您的 base-62 字符集,但问题是它仍然像任何其他基数中的任何其他数字一样递增。例如,一个可能的增量进展可能是 {aX9fgE, aX9fg3, aX9fgf, aX9fgR, … ,}

我已经提出了一种混淆技术,在要求方面我很满意 A) ,但我只是部分确定它满足B)。这个想法是这样的:

增量方法中唯一能保证改变的是“1 的位置”(出于实用原因,我将使用十进制术语)。在我之前给出的示例级数中,这将是 {E, 3, f, R, …}。因此,如果 base-62 集中的每个字符都有自己唯一的偏移量(例如,它与“零字符”的距离),那么您可以将“1 位置”字符的偏移量应用到字符串的其余部分。

例如,假设一个包含字符 {A, f, 9, p, Z, 3} 的基数为 5 的集合(按从 0 到 5 的升序排列)。然后,每个都将分别具有 0 到 5 的唯一偏移量。计数看起来像 {A, f, 9, p, Z, 3, fA, ff, f9, fp, …} 等等。因此,当给定值 fZ3p 时,该算法会查看 p,并且偏移量为 +3,会将字符串排列为 Zf9p code> (假设以 5 为基数的集合是循环数组)。下一个增量数将是 fZ3Z,并且 Z 的偏移量为 +4,算法返回 39pZ。这些排列后的结果将作为用户的“唯一 URL”传递给用户,而用户永远不会看到实际的 base-62 编码字符串。

这种方法看起来确实是可逆的。只需查看最后一个字符,并使用负偏移量执行相同的排列。我认为,由于这个原因,它仍然必须是双射的。但我不知道这是否必然是真的?是否有任何我没有考虑的边缘/角落情况?

编辑:我的意图更注重缩短 URL 的长度,而不是模式的安全性。我意识到有很多涉及加密函数、分组密码等的解决方案。但我想强调的是,我不是询问实现A)的最佳方法,但是相反,“我的偏移方法是否满足B)”。

如果您能找到任何漏洞,我们将不胜感激。

DISCLAIMER: I am not asking how to make a URL shortener (I have already implemented the "bijective function" answer found HERE that uses a base-62 encoded string). Instead, I want to expand this implementation to obfuscate the generated string so that it is both:

A) not an easily guessable sequence, and

B) still bijective.

You can easily randomize your base-62 character set, but the problem is that it still increments like any other number in any other base. For example, one possible incremental progression might be {aX9fgE, aX9fg3, aX9fgf, aX9fgR, … ,}

I have come up with an obfuscation technique that I am pleased with in terms of requirement A), but I'm only partially sure that it satisfies B). The idea is this:

The only thing that is guaranteed to change in the incremental approach is the "1's place" (I'll use decimal terminology for practicality reasons). In the sample progression I gave earlier, that would be {E, 3, f, R, …}. So if each character in the base-62 set had its own unique offset number (say, its distance from the "zero character"), then you could apply the offset of the "1's place" character to the rest of the string.

For instance, let's assume a base-5 set with characters {A, f, 9, p, Z, 3} (in ascending order from 0 to 5). Each one would then have a unique offset of 0 to 5 respectively. Counting would look like {A, f, 9, p, Z, 3, fA, ff, f9, fp, …} and so on. So the algorithm, when given a value of fZ3p, would look at the p and, having an offset of +3, would permute the string into Zf9p (assuming the base-5 set is a circular array). The next incremental number would be fZ3Z, and with Z's offset being +4, the algorithm returns 39pZ. These permutated results would be handed off to the user as his/her "unique URL", who would never see the actual base-62 encoded string.

This approach certainly seems reversible; just look at the last character, and perform the same permutation with the negative offset. And I'm thinking that for this reason, it has to still be bijective. But I don't know if this is necessarily true? Are there any edge/corner cases I'm not considering?

EDIT : My intentions are more heavily weighed towards the length of the shortened-URL rather than the security of the pattern. I realize there are plenty of solutions involving cryptographic functions, block ciphers, etc. But I would like to emphasize that I am not asking the best way to achieve A), but rather, "is my offset-approach satisfying B)".

Any holes you can find would be appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

辞取 2024-11-21 08:11:08

如果您确实希望它们难以猜测,请保持简单。

从在计数器模式下运行的正常加密算法开始。当您获得要缩短的 URL 时,增加计数器,对其进行加密,将结果转换为使用可打印字符(例如,基数 64)的内容,并将原始 URL 和缩短的版本放入表中,以便您可以从需要时缩短版本。

此时唯一真正的问题是使用什么加密算法。反过来,这取决于您的威胁模型。我不明白通过使缩短的 URL 难以猜测你到底能得到什么,所以我对威胁模型有点不确定。

如果你想让它稍微难以猜测,你可以使用 40 位版本的 RC4 之类的东西。这很容易被破坏,但足以让大多数人避免打扰。

如果您想要更高的安全性,可以升级到 DES。这已经被打破了,但即使在这么晚的时候,打破它也是相当困难的。

如果您想要更高的安全性,可以使用 AES。

请注意,随着安全性的提高,缩短的 URL 会变得更长。 RC4-40 以 5 字节开始,DES 7 字节,AES 32 字节。根据您转换为可打印文本的方式,它至少会扩展一点。

If you honestly want them to be hard to guess, keep it simple.

Start with a normal encryption algorithm running in counter mode. When you get a URL to shorten, increment your counter, encrypt it, convert the result to something using printable characters (e.g., base 64) and put the original URL and the shortened version into your table so you can get the original URL from the shortened version when needed.

The only real question at that point is what encryption algorithm to use. That, in turn, depends on your threat model. I don't see exactly what you gain by making shortened URLs hard to guess, so I'm a bit uncertain about the threat model.

If you want to make it mildly difficult to guess, you could use something like a 40-bit version of RC4. This is pretty easy to break, but enough to keep most people from bothering.

If you want a bit more security, you could step up to DES. That's been broken, but even at this late date breaking it is quite a bit of work.

If you want more security than that, you can use AES.

Note that as you increase the security, the shortened URL gets longer though. RC4-40 starts out with 5 bytes, DES 7 bytes, and AES with 32 bytes. Depending on how you convert to printable text, that's going to expand at least a little.

酒与心事 2024-11-21 08:11:08

另一种选择是使用 Luby-Rackoff 结构(另请参阅此处),这是一种从伪随机排列生成伪随机排列的方法-随机函数。

您只需选择一个“轮函数”F。F 必须将密钥 K 和您正在编码的大小一半的位块作为输入。 F 必须生成一个位块作为输出,该位块的大小也是您正在编码的内容的一半。

然后,您只需运行 Luby-Rackoff 构造(又名“Feistel 网络”)四轮,每轮使用不同的 K。

该构造保证结果是双射映射,并且如果 F 为很难逆转。

Another option is to use the Luby-Rackoff construction (see also here), which is a way to generate a pseudo-random permutation from a pseudo-random function.

You just have to pick a "round function" F. F must take as input a key K and a block of bits half the size of what you are encoding. F must produce as output a block of bits also half the size of whatever you are encoding.

Then you just run the Luby-Rackoff construction (aka. "Feistel network") for four rounds, each round using a different K.

The construction guarantees that the result is a bijective map, and it will be hard to invert provided that F is hard to invert.

活雷疯 2024-11-21 08:11:08

我尝试解决同样的问题(在 php 中)并最终得到了这些函数:

所以对于A):它不容易猜到(对我来说),因为你不能在没有算法的情况下增加字符串来获取下一条记录

对于B):据我所知,它是100%双射的。

感谢 @Nemo 为 Feistel 网络命名,这使我找到了我链接到的第一个函数。

I tried to solve the same problem (in php) and ended up with those functions:

So for the A): it's not easily guessable (to me) as you cant increment a string to get the next record without an algo

And for the B): for what I understand it's 100% bijective.

Thanks to @Nemo for naming the feistel network, which lead me to the first function i have linked to.

落墨 2024-11-21 08:11:08

如果您试图避免人们抓取 URL,我认为 Nick Johnson 的想法是正确的,您需要确保 URL 空间不密集。

这里有一个简单的想法:获取您的 URL,并在其前面添加一些随机字符。然后通过压缩算法运行它——我会尝试范围编码(如果你找到一个好的库,你可能可以指定基础)。这应该可以解压缩为原始形式,并且应该会影响局部性并使编码空间更加稀疏。

也就是说,我想几乎所有的 URL 缩短器都会在服务器端保留一个包含状态的哈希表。您还如何将 100 个字符的 URL 无损压缩为 5 或 6 个字符?

If you're trying to avoid people crawling the URLs, I think Nick Johnson has the right idea, that you need to make sure your URL space is not dense.

Here's a simple idea: take your URL, and prepend a few random characters to it. Then run it through a compression algorithm -- I'd try range encoding (you can probably specify the basis if you find a good library). This should be decompressible to the original form, and should both impact locality and make the encoded space more sparse.

That said, I imagine nearly all URL shorteners out there keep a hash table with state on the server side. How else are you going to losslessly compress a hundred-character URL into 5 or 6 characters?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文