C#: shortened strings for URLs

Posted 2024-08-17 10:40:41

I want to uniquely shorten strings (file IDs) for use in URLs, like the ones on bit.ly. I could use the IDs from a database, but I want the URLs to look random.

What would be the best solution?

The site will be a mobile site, so I want the URLs to be as short as possible.

Comments (5)

清风无影 2024-08-24 10:40:41

You can't "uniquely shorten" arbitrary strings. Pigeonhole principle and all.

What you want to do (and, AFAIK what url-shortening services do) is keep a database of everything submitted, and the short string used. Then you can look it up in the database.

You can generate the short strings by simply incrementing a number and Base64 encoding it each time (with a URL-safe alphabet, since standard Base64 includes '+' and '/').
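A minimal sketch of the counter idea, assuming the number comes from something like an auto-increment database column. The class name `ShortCode` and its methods are hypothetical, and the encoding writes the number in base 64 using the URL-safe alphabet rather than calling `Convert.ToBase64String` on its bytes, which keeps small numbers short. Codes produced this way are sequential, so they are short but not random-looking.

```csharp
using System;
using System.Text;

public static class ShortCode
{
    // URL-safe Base64 alphabet (RFC 4648 "base64url"): no '+', '/' or '='.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Writes the counter value in base 64, most significant digit first.
    public static string Encode(ulong id)
    {
        if (id == 0) return Alphabet[0].ToString();
        var sb = new StringBuilder();
        while (id > 0)
        {
            sb.Insert(0, Alphabet[(int)(id & 63)]); // lowest 6 bits select one character
            id >>= 6;
        }
        return sb.ToString();
    }

    // Reverses Encode; throws on characters outside the alphabet.
    public static ulong Decode(string code)
    {
        ulong id = 0;
        foreach (char c in code)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + c);
            id = (id << 6) | (uint)digit;
        }
        return id;
    }
}
```

With this sketch, `ShortCode.Encode(64)` yields `"BA"`, and `ShortCode.Decode` maps a code back to the database key for the lookup.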

戈亓 2024-08-24 10:40:41

There are two ways to implement a mapping service like the one you describe.

  1. Clients submit globally unique ids, or
  2. Server generates globally unique ids

Clients submit globally unique ids

As far as I know, option 1 should only be attempted with Guids, unless you devise a similar means of cramming sufficiently distinct information into a short byte stream. Either way, if you have a stream of bytes that represents a globally unique identifier, you may do something like this

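using System;

// source is either a Guid, or some other globally unique byte stream
byte[] bytes = Guid.NewGuid().ToByteArray();
// TrimEnd(char) strips the padding (Trim has no string overload); '+' and '/' are not URL-safe
string base64String = Convert.ToBase64String(bytes)
    .TrimEnd('=').Replace('+', '-').Replace('/', '_');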

to obtain a compact string that appears random, but avoids collisions inherent in other random schemes. A Guid contains 16 bytes, or 128 bits, which at 6 bits per Base64 character comes to 22 characters once the padding is stripped (24 with it).
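For the lookup side, the short string has to be turned back into the Guid. The following is a sketch under the assumption that the URL-safe substitutions ('-' for '+', '_' for '/') are used; the class name `GuidCode` is hypothetical.

```csharp
using System;

public static class GuidCode
{
    // 16 bytes -> 24 Base64 characters ending in "==" -> 22 once the padding is removed.
    public static string Encode(Guid guid) =>
        Convert.ToBase64String(guid.ToByteArray())
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');

    // Restores the standard Base64 characters and the padding, then rebuilds the Guid.
    public static Guid Decode(string code)
    {
        string base64 = code.Replace('-', '+').Replace('_', '/') + "==";
        return new Guid(Convert.FromBase64String(base64));
    }
}
```

`GuidCode.Decode(GuidCode.Encode(g))` returns the original `g`, so the 22-character form can serve directly as the key in the URL.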

The advantage of this approach is that clients may generate their own tiny Uris without a central authority. The downside is either the hefty length if you roll with Guid, or having to implement your own globally unique byte stream, which (let's face it) is error prone.

If you do go this route, consider searching for globally unique byte streams or the like. Oh, and STAY AWAY FROM RANDOM BYTES, otherwise you will have to build collision resolution ON TOP OF your tiny Uri generator.

Server generates globally unique ids

Again, the primary advantage of the above is that clients may generate their Uris a priori. That is particularly handy if you are about to submit a long-running request you wish to check up on. This may not be particularly relevant to your situation, and may provide only limited value.

So, that aside, a server-centric approach, in which a single authority generates and doles out ids, may be more appealing. If this is the route you choose, then the only question is how long you would like your Uri to be.

Presuming a desired length of 5 characters, and let's say you go with a Base64 encoding, each id carries 5 characters at 6 bits per character, which equals 30 bits or 2^30 [1 073 741 824] distinct values. That's a fairly large domain. *
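To pick a length for a different expected volume, the same arithmetic can be sketched in code. This assumes a 64-character alphabet, so each character carries 6 bits; the class name `IdSpace` and its methods are hypothetical.

```csharp
using System;

public static class IdSpace
{
    // Number of distinct ids of exactly `length` characters: 64^length = 2^(6 * length).
    public static ulong Capacity(int length)
    {
        // 64^10 = 2^60 is the largest power of 64 that fits in a ulong.
        if (length < 0 || length > 10) throw new ArgumentOutOfRangeException(nameof(length));
        return 1UL << (6 * length);
    }

    // Smallest fixed length whose capacity covers `count` ids.
    public static int MinLength(ulong count)
    {
        for (int length = 1; length <= 10; length++)
            if (Capacity(length) >= count) return length;
        return 11; // 64^11 = 2^66 exceeds every ulong value
    }
}
```

For example, `IdSpace.Capacity(5)` is 1073741824, and one million ids fit in 4 characters.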

Then it becomes a question of returning a value for a given submission. There are probably a great many ways to do this, but I would go with something like this,

  • Enumerate all possible values within a "free list" in your database
  • Remove value from free list when consumed
  • Add value to free list when released

Enhancements or optimizations may include

  • Do not enumerate every value in the range up front; instead enumerate a manageable subset, say 100 000 values at a time, and when all of them are consumed, simply generate the next 100 000 values in sequence and continue
  • Add an expiry date to values, and recycle expired values at the end of each day
  • Distribute your service: when parallelizing it, simply dole out small, mutually exclusive subsets of your free list to the distributed instances
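The free list with block-wise enumeration might look roughly like the sketch below. It is an in-memory stand-in for the database table, with a hypothetical class name `IdPool`; it is not thread-safe, does not guard against releasing the same id twice, and leaves out expiry and distribution.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a database-backed free list.
public sealed class IdPool
{
    private readonly Queue<ulong> _free = new Queue<ulong>();
    private readonly ulong _limit;   // exclusive upper bound of the id space
    private readonly int _blockSize; // how many ids to enumerate per refill
    private ulong _next;             // first value that has never been enumerated

    public IdPool(ulong limit, int blockSize = 100000)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        _limit = limit;
        _blockSize = blockSize;
    }

    // Takes an id from the free list, enumerating the next block when the list is empty.
    public ulong Acquire()
    {
        if (_free.Count == 0) Refill();
        if (_free.Count == 0) throw new InvalidOperationException("Id space exhausted.");
        return _free.Dequeue();
    }

    // Returns an id to the free list so it can be handed out again.
    public void Release(ulong id) => _free.Enqueue(id);

    private void Refill()
    {
        for (int i = 0; i < _blockSize && _next < _limit; i++)
            _free.Enqueue(_next++);
    }
}
```

Ids come out in sequence here; shuffling each block before enqueueing it would be one way to make them look less predictable.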

Conclusion

Bottom line is, you want to guarantee uniqueness - so collisions are a big no-no.


* 1 073 741 824 (2^30, from 5 Base64 characters at 6 bits each) is the number of ids of exactly 5 characters. If you also accept shorter ids, lengths 1 to 4 add 64 + 64^2 + 64^3 + 64^4 = 17 043 520 more, for 1 090 785 344 ids of length 1 to 5, so the domain is large either way :)

浮华 2024-08-24 10:40:41

Store a random alphanumeric string and use that for your short URL. Make it whatever length you think is best for your site and its users, something like www.yoursite.com/d8f3
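A rough sketch of this approach, assuming .NET Core 3.0 or later for `RandomNumberGenerator.GetInt32`. The class name `RandomCodeGenerator` is hypothetical, and the `HashSet` stands in for a database column with a unique constraint. Because random codes can repeat, the sketch retries on a collision.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed class RandomCodeGenerator
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Stand-in for a database column with a unique constraint.
    private readonly HashSet<string> _used = new HashSet<string>();

    // Returns a code that has not been handed out before by this instance.
    public string Next(int length)
    {
        for (int attempt = 0; attempt < 100; attempt++)
        {
            string code = RandomCode(length);
            if (_used.Add(code)) return code; // Add returns false on a collision
        }
        throw new InvalidOperationException("Too many collisions; use a longer code.");
    }

    private static string RandomCode(int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
        return new string(chars);
    }
}
```

Collisions become more frequent as the code space fills up, so the length should leave plenty of headroom over the number of links you expect to store.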

深海里的那抹蓝 2024-08-24 10:40:41

You could use a hash (for example CRC32) to produce quite short URLs. You will never be able to get 'unique' URLs, as you are reducing the data, so there are bound to be collisions.
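As an illustration, here is a sketch of a table-driven CRC32 (the common reflected variant with polynomial 0xEDB88320) that turns a URL into 8 hex characters. The class name `Crc32Code` is hypothetical, and hashing the UTF-8 bytes of the URL is an assumption. Two different URLs can map to the same code, so a lookup table with collision handling is still needed.

```csharp
using System;
using System.Text;

public static class Crc32Code
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // CRC32 of the UTF-8 bytes of the text.
    public static uint Compute(string text)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // Eight lowercase hex characters, zero-padded.
    public static string ToShortCode(string url) => Compute(url).ToString("x8");
}
```

The standard check value applies: `Crc32Code.ToShortCode("123456789")` gives `"cbf43926"`.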

樱娆 2024-08-24 10:40:41

Hey nll, as several other people have told you, if you start compressing the URL into something small, it will be impossible to keep it unique. That said, you need to create your own code for every URL submitted to you. One (easy) way to do it is to store the submitted URLs in a database, generate a guid field for each one, and take a substring of it, checking every time you register something that the substring differs from all previous ones.

For instance: www.google.com with the guid F9168C5E-CEB2-4faa-B6BF-329BF39FA1E4 -> http://www.mysite.com/?q=CEB2

The more characters you use, the more links you can keep track of. For this sample you will have 65536 different links (with only 4 hex characters).

Hope this helps.
