What's the best way to create a short hash, similar to what tiny Url does?
I'm currently using MD5 hashes but I would like to find something that will create a shorter hash that uses just [a-z][A-Z][0-9]. It only needs to be around 5-10 characters long.
Is there something out there that already does this?
Update 1:
I like the CRC32 hash. Is there a clean way of calculating it in .NET?
Update 2:
I'm using the CRC32 function from the link Joe provided. How can I convert the uInt into the characters defined above?
.NET string object has a GetHashCode() function. It returns an integer.
Convert it into a hex and then to an 8 characters long string.
Like so:
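The snippet that followed is missing from this copy. As a rough sketch of the conversion step only, written in Python rather than .NET: it assumes you already hold a signed 32-bit integer (such as the value GetHashCode() returns) and formats it as exactly 8 hex characters. The helper name to_hex8 is made up for this example.

```python
def to_hex8(code: int) -> str:
    """Format a signed 32-bit integer as 8 uppercase hex characters."""
    # Mask to 32 bits so negative values map to their two's-complement form.
    return format(code & 0xFFFFFFFF, "08X")


print(to_hex8(255))  # 000000FF
print(to_hex8(-1))   # FFFFFFFF
```

Note that the result only uses [0-9A-F], so it is longer than a base-62 encoding of the same integer would be.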
More on that: http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx
UPDATE: Added the remarks from the link above to this answer:
Is your goal to create a URL shortener or to create a hash function?
If your goal is to create a URL shortener, then you don't need a hash function. In that case, you just want to pre-generate a sequence of cryptographically secure random numbers, and then assign each URL to be encoded a unique number from the sequence.
You can do this using code like:
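The original code is not present in this copy. A minimal Python sketch of the idea, assuming 64-bit numbers and an in-memory pool (the names pregenerate_ids, pool and assignment are illustrative, not from the answer):

```python
import secrets


def pregenerate_ids(count: int) -> list:
    """Return `count` distinct 64-bit numbers drawn from the OS CSPRNG."""
    seen = set()
    while len(seen) < count:
        # A repeated draw is simply absorbed by the set, keeping the pool unique.
        seen.add(secrets.randbits(64))
    return list(seen)


pool = pregenerate_ids(1000)
urls = ["http://example.com/a", "http://example.com/b"]
# Hand each URL the next unused number from the pool.
assignment = {url: pool.pop() for url in urls}
```

In a real service the pool and the assignments would live in a database rather than in memory.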
Using the cryptographic number generator will make it very difficult for people to predict the strings you generate, which I assume is important to you.
You can then convert the 8 byte random number into a string using the chars in your alphabet. This is basically a change of base calculation (from base 256 to base 62).
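The change of base can be sketched like this in Python. The alphabet order (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as it never changes. The function names are hypothetical.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the order is an arbitrary but fixed choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(number: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def bytes_to_base62(raw: bytes) -> str:
    """Treat the bytes as one big-endian integer and encode it in base 62."""
    return to_base62(int.from_bytes(raw, "big"))
```

An 8-byte value needs at most 11 base-62 characters; the same to_base62 function also works for a 32-bit value such as a CRC32, which needs at most 6.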
I don't think URL shortening services use hashes; I think they just have a running alphanumeric string that is incremented with every new URL and stored in a database.
If you really need to use a hash function have a look at this link: some hash functions
Also, a bit off-topic, but depending on what you are working on this might be interesting: Coding Horror article
Just take a Base36 (case-insensitive) or Base64 of the ID of the entry.
So, let's say I wanted to use Base36:
(ID - Base36)
1 - 1
2 - 2
3 - 3
10 - A
11 - B
12 - C
...
10000 - 7PS
22000 - GZ4
34000 - Q8G
...
1000000 - LFLS
2345000 - 1E9EW
6000000 - 3KLMO
You could keep these even shorter if you went with base64, but then the URLs would be case-sensitive. You can see you still get your nice, neat alphanumeric key, with a guarantee that there will be no collisions!
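A short Python sketch of the ID-to-Base36 mapping, assuming the ID is a non-negative integer such as a database auto-increment key (encode_base36 and decode_base36 are illustrative names):

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 then A-Z, 36 symbols


def encode_base36(entry_id: int) -> str:
    """Turn a non-negative entry ID into its Base36 key."""
    if entry_id < 0:
        raise ValueError("entry_id must be non-negative")
    if entry_id == 0:
        return "0"
    digits = []
    while entry_id:
        entry_id, remainder = divmod(entry_id, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base36(key: str) -> int:
    """Recover the entry ID; int() accepts either letter case for base 36."""
    return int(key, 36)


print(encode_base36(10000))    # 7PS
print(encode_base36(1000000))  # LFLS
print(decode_base36("7ps"))    # 10000
```

Because the key is just another spelling of the ID, lookups go straight to the primary key and no collision check is needed.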
You cannot use a short hash as you need a one-to-one mapping from the short version to the actual value. For a short hash the chance for a collision would be far too high. Normal, long hashes, would not be very user-friendly (and even though the chance for a collision would probably be small enough then, it still wouldn't feel "right" to me).
TinyURL.com seems to use an incremented number that is converted to Base 36 (0-9, A-Z).
First I get a list of random distinct numbers. Then I select each char from the base string, append, and return the result. I'm selecting 5 chars, which amounts to 6,471,002 combinations out of base 62 (that figure is C(62,5), the number of unordered sets of 5 distinct characters). The second part is to check against the DB to see whether the code already exists; if not, save the short URL.
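A rough Python sketch of the random-code approach described above (pick 5 distinct characters, then check for an existing entry). The dict named db is a stand-in for the real database table, and both function names are hypothetical.

```python
import random
import string

BASE = string.ascii_letters + string.digits  # the 62-character base string


def random_code(length: int = 5) -> str:
    """Pick `length` distinct characters from the base string."""
    # random.sample never repeats a position, so no character appears twice.
    return "".join(random.sample(BASE, length))


def shorten(url: str, db: dict) -> str:
    """Store `url` under a fresh code; `db` maps code -> url."""
    while True:
        code = random_code()
        if code not in db:  # retry on the (rare) collision
            db[code] = url
            return code


db = {}
code = shorten("http://example.com/some/long/path", db)
```

If the codes must be hard to guess, random.SystemRandom().sample can be used in place of random.sample.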
You can decrease the number of characters from the MD5 hash by encoding it as alphanumerics. Each MD5 character is usually represented as hex, so it has 16 possible values (4 bits). [a-zA-Z0-9] includes 62 possible values (just under 6 bits), so each alphanumeric character can carry more of the hash than a hex digit does, and the full 128-bit hash fits in 22 characters instead of 32.
EDIT:
here's a function that takes a number ( 4 hex digits long ) and returns [0-9a-zA-Z]. This should give you an idea of how to implement it. Note that there may be some issues with the types; I didn't test this code.
You can use CRC32; it is 8 hex characters (4 bytes) long and is used much like MD5. Unique values can be supported by adding a timestamp to the actual value.
So it will look like http://foo.bar/abcdefg12.
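One way this could look in Python, using zlib.crc32 from the standard library. How the timestamp is combined with the value (here joined with a "|" separator) is an assumption, and crc32_hex is an illustrative name.

```python
import time
import zlib


def crc32_hex(value: str, timestamp=None) -> str:
    """CRC32 of the value plus a timestamp, as 8 lowercase hex characters."""
    if timestamp is None:
        timestamp = time.time()
    data = f"{value}|{timestamp}".encode("utf-8")
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return format(zlib.crc32(data), "08x")


short = crc32_hex("http://example.com/some/long/path")
```

The timestamp makes repeated inputs produce different codes, but two different inputs can still share a CRC32, so a uniqueness check against stored codes is still advisable.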
If you're looking for a library that generates tiny unique hashes from integers, I can highly recommend http://hashids.org/net/. I use it in many projects and it works fantastically. You can also specify your own character set for custom hashes.
If you don't care about cryptographic strength, any of the CRC functions will do.
Wikipedia lists a bunch of different hash functions, including length of output. Converting their output to [a-z][A-Z][0-9] is trivial.
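You could encode your md5 hash code with base64 instead of hexadecimal; this way you get a shorter url (22 characters instead of 32) using mostly the characters [a-z][A-Z][0-9]. Note that standard Base64 also emits +, / and = padding, so a URL-safe variant is the better fit.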
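For illustration, a Python sketch of Base64-encoding the raw MD5 digest. It uses the URL-safe Base64 alphabet and strips the padding, which is a choice made for this example; the output can contain - and _ in addition to letters and digits. The name md5_base64 is hypothetical.

```python
import base64
import hashlib


def md5_base64(text: str) -> str:
    """MD5 digest of the text as unpadded URL-safe Base64."""
    digest = hashlib.md5(text.encode("utf-8")).digest()  # 16 raw bytes
    # URL-safe Base64 uses '-' and '_' instead of '+' and '/'.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return encoded.rstrip("=")  # drop the two '=' padding characters


token = md5_base64("http://example.com/some/long/path")
```

The 16-byte digest always encodes to 22 characters this way, compared with 32 in hexadecimal.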
There's a wonderful but ancient program called btoa, which converts binary to ASCII using upper- and lower-case letters, digits, and two additional characters. There's also the MIME base64 encoding; most Linux systems probably have a program called base64 or base64encode. Either one would give you a short, readable string from a 32-bit CRC.
You could take the first alphanumeric 5-10 characters of the MD5 hash.
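In Python this is a one-liner on the hex digest; a small sketch, with md5_prefix as an illustrative name and 8 as an arbitrary default length:

```python
import hashlib


def md5_prefix(text: str, length: int = 8) -> str:
    """First `length` hex characters of the MD5 digest of the text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


print(md5_prefix(""))  # d41d8cd9
```

Keep in mind that the hex digest only uses [0-9a-f], and that truncating makes collisions far more likely than with the full hash, so stored codes should still be checked for duplicates.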
If you need the hash to change on every call, you can do something like:
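The code that followed is missing from this copy, so the original technique is unknown. One possible approach, sketched in Python under that caveat, is to mix a fresh random nonce into the input before hashing; the function name and the 8-character length are assumptions.

```python
import hashlib
import secrets


def changing_hash(text: str, length: int = 8) -> str:
    """Short hex hash of the text that differs from call to call."""
    nonce = secrets.token_bytes(8)  # fresh random salt on every call
    digest = hashlib.md5(nonce + text.encode("utf-8")).hexdigest()
    return digest[:length]


first = changing_hash("http://example.com")
second = changing_hash("http://example.com")
```

Since the nonce is not stored, the result cannot be recomputed from the input later; it has to be saved alongside the value it stands for.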