C#: shortening strings for URLs
I want to uniquely shorten strings (file ids) so I can use them in URLs, like the ones on bit.ly. I could use ids from a database, but I want the URLs to look random.
What would be the best solution?
The site will be a mobile site, so I want the URLs to be as short as possible.
Comments (5)
You can't "uniquely shorten" arbitrary strings. Pigeonhole principle and all.
What you want to do (and, AFAIK, what URL-shortening services do) is keep a database of everything submitted, along with the short string assigned to each entry. Then you can look it up in the database.
You can generate the short strings by simply incrementing a number and Base64-encoding it each time.
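A minimal sketch of the counter idea, assuming the counter itself comes from the database (for example an auto-increment key). The class name `CounterCodec` is made up for illustration. Standard Base64 uses `+` and `/`, which are awkward in URLs, so this sketch swaps them for `-` and `_` and drops the `=` padding; that substitution is an assumption on top of the answer, not something it specifies.

```csharp
using System;

public static class CounterCodec
{
    // Encodes a non-negative counter as unpadded, URL-safe Base64.
    public static string Encode(long counter)
    {
        if (counter < 0) throw new ArgumentOutOfRangeException(nameof(counter));

        // Take the value's bytes, least significant first.
        byte[] all = BitConverter.GetBytes(counter);
        if (!BitConverter.IsLittleEndian) Array.Reverse(all);

        // Drop the high-order zero bytes so small counters give short strings,
        // but always keep at least one byte.
        int length = all.Length;
        while (length > 1 && all[length - 1] == 0) length--;

        return Convert.ToBase64String(all, 0, length)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }
}
```

Every counter value maps to a different string, so uniqueness comes from the counter rather than from the encoding. The results are sequential rather than random-looking, which is the trade-off of this approach.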
There are two ways to implement a mapping service like the one you describe: (1) clients submit globally unique ids, or (2) the server generates globally unique ids.
Clients submit globally unique ids
As far as I know, option 1 should only be attempted with Guids, unless you devise a similar means to cram sufficiently distinct information into a short byte stream. Either way, if you have a stream of bytes that represents a globally unique identifier, you can Base64-encode it to obtain a user-readable alphanumeric string that appears random but avoids the collisions inherent in other random schemes. A Guid contains 16 bytes, or 128 bits, which translates to 22 characters for a full Base64 encoding (24 with padding).
The advantage of this approach is that clients may generate their own tiny Uris without a central authority. The downside is the hefty length if you roll with Guid, or having to implement your own globally unique byte stream, which (let's face it) is error prone.
If you do go this route, consider Googling globally unique byte streams or the like. Oh, and STAY AWAY FROM RANDOM BYTES, otherwise you will have to build collision resolution ON TOP OF your tiny Uri generator.
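The Guid route might look like the sketch below. The class and method names (`GuidCodec`, `ToShortString`, `FromShortString`) are hypothetical, and replacing `+` and `/` with `-` and `_` is an assumption made here to keep the result URL-safe. A 16-byte Guid always encodes to 24 Base64 characters ending in `==`, so trimming the padding leaves 22.

```csharp
using System;

public static class GuidCodec
{
    // Turns a Guid into a 22-character URL-safe token.
    public static string ToShortString(Guid id)
    {
        return Convert.ToBase64String(id.ToByteArray())
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Reverses ToShortString; the "==" padding is restored before decoding.
    public static Guid FromShortString(string text)
    {
        string base64 = text.Replace('-', '+').Replace('_', '/') + "==";
        return new Guid(Convert.FromBase64String(base64));
    }
}
```

A client could call `GuidCodec.ToShortString(Guid.NewGuid())` without contacting the server, which is the advantage described above; the cost is that 22 characters is long for a mobile URL.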
Server generates globally unique ids
Again, the primary advantage of the above is that clients may generate their Uris a priori. That is particularly handy if you are about to submit a long-running request that you wish to check up on. This may not be particularly relevant to your situation, and may provide only limited value.
So, that aside, a server-centric approach, in which a single authority generates and doles out ids, may be more appealing. If this is the route you choose, then the only question is how long you would like your Uris to be.
Presuming a desired length of 5 characters and a Base64 encoding, each character carries 6 bits, so an id holds 5 × 6 = 30 bits, or 2^30 [1 073 741 824] distinct values. That's a fairly large domain. *
Then it becomes a question of returning a value for a given submission. There are probably a great many ways to do this, but I would go with something like this,
Enhancements or optimizations may include
Conclusion
Bottom line is, you want to guarantee uniqueness, so collisions are a big no-no.
* 1 073 741 824 is the number of ids that are exactly 5 characters long. If you also allow shorter ids, lengths 1 to 4 add 64 + 64^2 + 64^3 + 64^4 = 17 043 520 more, for 1 090 785 344 in total, which is still quite large :)
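As one possible reading of the server-centric approach, here is a sketch of an in-memory store that hands out fixed-length 5-character ids from a counter. It is not the answer's own code: the class `ShortUrlStore`, its methods, and the URL-safe 64-character alphabet are all assumptions for illustration, and a real service would persist the mapping in a database rather than in dictionaries.

```csharp
using System;
using System.Collections.Generic;

public sealed class ShortUrlStore
{
    // 64 URL-safe symbols, so each character carries 6 bits.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    private const int IdLength = 5;
    private const long Capacity = 1L << (6 * IdLength); // 64^5 = 2^30 ids

    private readonly object gate = new object();
    private readonly Dictionary<string, string> urlById = new Dictionary<string, string>();
    private readonly Dictionary<string, string> idByUrl = new Dictionary<string, string>();
    private long next;

    // Returns the id for a URL, allocating a new one on first submission.
    public string Add(string url)
    {
        lock (gate)
        {
            if (idByUrl.TryGetValue(url, out string existing)) return existing;
            if (next >= Capacity) throw new InvalidOperationException("Id space exhausted.");

            string id = Encode(next++);
            urlById[id] = url;
            idByUrl[url] = id;
            return id;
        }
    }

    public bool TryResolve(string id, out string url)
    {
        lock (gate) { return urlById.TryGetValue(id, out url); }
    }

    // Writes the value as exactly IdLength base-64 digits, most significant first.
    private static string Encode(long value)
    {
        var chars = new char[IdLength];
        for (int i = IdLength - 1; i >= 0; i--)
        {
            chars[i] = Alphabet[(int)(value & 63)];
            value >>= 6;
        }
        return new string(chars);
    }
}
```

Because a single counter allocates every id, collisions cannot happen. The ids come out sequential ("AAAAA", "AAAAB", ...); making them look random while staying unique would need an additional reversible scrambling step, which is not shown here.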
Store a random alphanumeric string and use that for your short URL. Make it whatever length you think is best for your site and its users, something like
www.yoursite.com/d8f3
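A rough sketch of the random-string idea follows. Random codes can repeat, so this version retries until it finds an unused one; that retry loop is an addition the answer does not mention. The names `RandomCodeStore`, `Add`, and `TryResolve` are hypothetical, the dictionary stands in for a database table, and `RandomNumberGenerator.GetInt32` assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed class RandomCodeStore
{
    private const string Alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
    private readonly Dictionary<string, string> urlByCode = new Dictionary<string, string>();

    // Picks random codes until one is free, then stores the URL under it.
    // Note: this loops forever if every code of the given length is taken.
    public string Add(string url, int length = 4)
    {
        while (true)
        {
            var chars = new char[length];
            for (int i = 0; i < length; i++)
                chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];

            string code = new string(chars);
            if (urlByCode.TryAdd(code, url)) return code; // false means collision, so retry
        }
    }

    public bool TryResolve(string code, out string url) => urlByCode.TryGetValue(code, out url);
}
```

With 36 symbols and 4 characters there are 36^4 = 1 679 616 possible codes, and retries become more frequent as the table fills, so the length should be chosen with the expected number of links in mind.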
You could use a hash (for example CRC32) to produce quite short URLs. You will never get truly 'unique' URLs this way, because you are reducing the data, so there will be collisions.
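For illustration, here is a small self-contained CRC32 (the common reflected variant with polynomial 0xEDB88320) that turns a URL into 8 hex characters. The class `Crc32Code` and its methods are made-up names, and hashing the UTF-8 bytes of the URL is an assumption. As noted above, two different URLs can produce the same code, so a lookup table and some collision handling would still be needed.

```csharp
using System;
using System.Text;

public static class Crc32Code
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // CRC32 over the UTF-8 bytes of the text.
    public static uint Compute(string text)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // Eight lowercase hex characters suitable for a URL path segment.
    public static string ToCode(string url) => Compute(url).ToString("x8");
}
```

The standard check input "123456789" hashes to cbf43926 with this variant, which is a quick way to confirm the table was built correctly.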
Hey nll, as several other people have told you: if you start compressing the URL into something small, it will be impossible to keep it unique. That said, you need to create your own code for every URL submitted to you. One (easy) way to do it is to keep a database of the submitted URLs, generate a guid for each one, and take a substring of it, making sure each time you register something that the substring differs from all the previous ones.
For instance: www.google.com with the guid F9168C5E-CEB2-4faa-B6BF-329BF39FA1E4 -> http://www.mysite.com/?q=CEB2
The more characters you use, the more links you can keep track of. In this example you have 65536 different links (with only 4 hex characters).
Hope this helps.