URL shortening: what's the best encoding to use?

Posted on 2024-08-04 16:53:21

I'm adding a feature to my project where we generate links to internal content on our website. We want these links to be as short as possible, so we'll be making our own "URL shortener".

I'm wondering what the best encoding / alphabet is for the generated short URLs.
This is largely a subjective question; I'd like to know your opinions on the best approach and the trade-offs.

Several options I've thought of:
- Digits, uppercase + lowercase (base 62)
- Digits, only lowercase (base 36)
- Base 32 (http://www.crockford.com/wrmg/base32.html)
- linkpot.net (using common short English words)

Of course, the last two are better for uses other than clicking, and the first two are better for Twitter.

Also, if I'm going with "clickable-only" URLs, I'd like to make the alphabet as large as possible, adding other symbols.

  • What symbols can I use in URLs that won't get URL encoded?
  • What symbols should I use? Could some of these prove problematic? I'm thinking slash and dot, for example.

What do you think?

NOTE: The main target for these URLs is Twitter. With that in mind, we should probably use the largest alphabet possible, since most people will be clicking. However, I'm interested in your experience with people using short URLs in other ways (over the phone, on printed paper, etc.). How likely is that to happen?

NOTE 2: I'm not making "yet another URL shortener", so please don't condemn me with downvotes. We are generating short URLs for internal content on our site, not allowing anyone to shorten any URL. Think of Google Maps giving you a short URL when you generate a link to a specific coordinate.

Comments (3)

勿忘心安 2024-08-11 16:53:21

If these are "clickable-only" URLs, I'd probably go with a base-64 encoding. MIME's base-64 uses a couple of characters you shouldn't use, but there are enough unreserved safe characters in URLs that you can just swap them out. (Also, you don't need the padding that MIME's base-64 uses, since you know where your URL ends.)

Here's a page that discusses one way to do this.

You can look at RFC 2396 to figure out exactly which characters are safe in URIs if you want to double-check.
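As a rough sketch of the swap described above, Python's standard `base64` module already ships a URL-safe variant that uses `-` and `_` in place of MIME's `+` and `/`. The helper names `encode_id` and `decode_id` are made up for illustration, and the sketch assumes each short URL stands for a non-negative integer ID, which the question doesn't actually state. Note that RFC 2396 has since been obsoleted by RFC 3986, which lists the unreserved characters as letters, digits, `-`, `.`, `_` and `~`.

```python
import base64


def encode_id(n: int) -> str:
    """Encode a non-negative integer as unpadded, URL-safe base-64 (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Use the fewest whole bytes that can hold n (at least one), big-endian.
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    # urlsafe_b64encode substitutes '-' and '_' for '+' and '/'.
    # The trailing '=' padding is dropped because the URL's end is known.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(token: str) -> int:
    """Reverse encode_id: restore the stripped padding, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


if __name__ == "__main__":
    for page_id in (0, 255, 123456789):
        token = encode_id(page_id)
        print(page_id, "->", token, "->", decode_id(token))
```

Because this works on whole bytes, it can come out a character longer than a digit-by-digit conversion would for some IDs; whether that matters depends on how many pages you have.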

Bonjour°[大白 2024-08-11 16:53:21

I would go with base 62; it's the shortest of the alphanumeric options. A shortened URL isn't meant to be typed in manually anyway, so don't worry about case sensitivity.
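For what it's worth, a minimal base-62 sketch could look like the following. It assumes the short URL is derived from a non-negative integer ID (for example a database key), and the digit order in `ALPHABET` is an arbitrary choice of mine, not any kind of standard.

```python
import string

# Assumed ordering: 0-9, then A-Z, then a-z. Any fixed order works
# as long as encoding and decoding agree on it.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Convert a non-negative integer to a base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Convert a base-62 string back to an integer."""
    n = 0
    for ch in s:
        # str.index raises ValueError for characters outside the alphabet.
        n = n * BASE + ALPHABET.index(ch)
    return n


if __name__ == "__main__":
    print(base62_encode(61))  # z
    print(base62_encode(62))  # 10
```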

°如果伤别离去 2024-08-11 16:53:21

I'd be curious to know a little more about the implementation. How will these URLs be "unshortened", or will the internal pages being accessed be saved as shortened URLs? In either case, even if you went with the encoding set of [A-Z] you'd be able to reference 26 * 26 * 26 = 17,576 pages with only 3 characters; how many internal web pages are you talking about?
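To put numbers on that, here is a small sketch that works out how many characters each alphabet from the question needs for a given page count. The function name `min_length` and the one-million-page figure are hypothetical, picked only to have something to compare.

```python
def min_length(pages: int, alphabet_size: int) -> int:
    """Smallest code length whose capacity (alphabet_size ** length) covers `pages`."""
    if pages < 1 or alphabet_size < 2:
        raise ValueError("need at least one page and an alphabet of two or more symbols")
    length = 1
    capacity = alphabet_size
    # Integer arithmetic only, so there are no floating-point rounding surprises.
    while capacity < pages:
        capacity *= alphabet_size
        length += 1
    return length


if __name__ == "__main__":
    print(26 ** 3)  # 17576, the three-letter [A-Z] case mentioned above
    pages = 1_000_000  # made-up page count for illustration
    for name, size in [("[A-Z]", 26), ("base 32", 32), ("base 36", 36), ("base 62", 62)]:
        print(f"{name}: {min_length(pages, size)} characters")
```

If the lengths come out the same or within one character of each other for your page count, the smaller, friendlier alphabet costs you very little.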

In general I would lean on what your use case requirements are for picking the right encoding set. Are you planning on having these links available for "uses other than clicking"? What would those uses be, and how do you suspect they'll alter the encoding? (For example, using parts of the URL as a file name on a case-insensitive file system reduces the available character set.)

Here's an informative page on the character set you have available to you when writing a URL.
