How can I generate a long hash of a String?
I have a Java application in which I want to generate long ids for strings (in order to store those strings in Neo4j). To avoid data duplication, I would like to generate an id for each string, stored in a long integer, which should be unique for each string. How can I do that?
This code will calculate a pretty good hash:
Why don't you have a look at the hashCode() function of String, and just adapt it to use long values instead?
Btw., if there were a way to create a unique ID for each String, you would have found a compression algorithm able to pack every String into 8 bytes (impossible by definition).
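A minimal sketch of that idea, assuming the same polynomial scheme as String.hashCode() (multiply by 31, add the next char) but accumulating into a long. The class name, method name, and seed constant are illustrative choices, not taken from the JDK, and collisions remain possible:

```java
public class LongStringHash {

    // 64-bit variant of the String.hashCode() polynomial.
    // The seed is an arbitrary non-zero constant (an assumption of this sketch).
    static long hash64(String s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // overflow wraps around, as in hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash64("neo4j"));
    }
}
```

Equal strings always yield the same value, so the result can serve as a lookup key as long as the application tolerates or detects the rare collision.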
A long has 64 bits. A String of length 9 has 72 bits, even counting only 8 bits per character. By the pigeonhole principle, you cannot map every 9-character string to a unique long.
If you still want a long hash: you can take two standard [different!] hash functions String->int, hash1() and hash2(), and calculate: hash(s) = 2^32 * hash1(s) + hash2(s)
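The formula above can be sketched as follows. The choice of the two functions is an assumption of this example: hash1() is the built-in String.hashCode() and hash2() is a 32-bit FNV-1a over the UTF-8 bytes; any two unrelated int hashes would do. In Java an int is sign-extended when widened to long, so hash2(s) is masked to its low 32 bits before being combined:

```java
import java.nio.charset.StandardCharsets;

public class TwoHashCombiner {

    // First 32-bit hash: Java's built-in String.hashCode().
    static int hash1(String s) {
        return s.hashCode();
    }

    // Second, different 32-bit hash: FNV-1a over the UTF-8 bytes (illustrative choice).
    static int hash2(String s) {
        int h = 0x811C9DC5; // FNV-1a 32-bit offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193; // FNV 32-bit prime
        }
        return h;
    }

    // hash(s) = 2^32 * hash1(s) + hash2(s), with hash2 treated as unsigned.
    static long hash(String s) {
        return ((long) hash1(s) << 32) | (hash2(s) & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(hash("neo4j")));
    }
}
```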
A simple 64-bit hash can be implemented by combining CRC32 with Adler32, although neither was designed for hashing. Of course, the combination is not as strong as modern hash techniques, but it is portable across languages that natively provide a CRC library.
Example in Java:
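A minimal sketch of such a combination, using java.util.zip.CRC32 and java.util.zip.Adler32. Encoding the string as UTF-8 and placing the CRC32 value in the upper 32 bits are assumptions of this sketch, and the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class Crc32Adler32Hash {

    static long hash64(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);

        CRC32 crc = new CRC32();
        crc.update(bytes);

        Adler32 adler = new Adler32();
        adler.update(bytes);

        // getValue() returns each 32-bit checksum as a non-negative long,
        // so no masking is needed before combining the two halves.
        return (crc.getValue() << 32) | adler.getValue();
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(hash64("neo4j")));
    }
}
```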
Example in Python:
This Gist compares some hash methods:
https://gist.github.com/fabiolimace/507eac3d35900050eeb9772e5b1871ba
There are many answers; try the following:
http://stackoverflow.com/questions/415953/generate-md5-hash-in-java
EDIT: removed, I missed the long requirement. Mea culpa.
Or, as suggested before, check out the sources.
PS. One more technique is to maintain a dictionary of strings: since you're unlikely to get 2^64 strings any time soon, you can have a perfect mapping. Note, though, that this mapping may well become a major bottleneck.
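The dictionary technique could look roughly like this: an in-memory map that hands out the next free long the first time a string is seen. The class and method names are hypothetical, and a real application would likely need to persist the mapping; the single lock on idFor() is where the bottleneck mentioned above would show up:

```java
import java.util.HashMap;
import java.util.Map;

public class StringIdDictionary {

    private final Map<String, Long> ids = new HashMap<>();
    private long nextId = 0;

    // Returns the id already assigned to s, or assigns the next free one.
    // Ids are unique by construction, unlike hash values.
    public synchronized long idFor(String s) {
        Long existing = ids.get(s);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        ids.put(s, id);
        return id;
    }

    public static void main(String[] args) {
        StringIdDictionary dict = new StringIdDictionary();
        System.out.println(dict.idFor("alice"));
        System.out.println(dict.idFor("bob"));
        System.out.println(dict.idFor("alice"));
    }
}
```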