Creating a perfect hash for millions of items - result just needs to be "exists or not"
Does anyone know of a good library (Windows) that will let me create a static (build-time, not runtime) perfect hash for millions of items (probably about 10 million)?
Essentially, I have a set of millions of strings, and I want to know in O(1) time, with as little overhead as possible, whether a given string is in the set. That's it. I don't need to look anything up for the string: there is no value behind it, only its existence.
Comments (2)
Try gperf, CMPH, or Jenkins's perfect.
Both perfect and gperf emit their tables as C source code, which should work fine on Windows. I don't know what form CMPH's output takes.
CMPH has a comment saying:
If that's correct, then with your million-key case, you should probably prefer CMPH to gperf. I don't know how they compare to Jenkins's perfect. It should be simple enough to try all three and benchmark them against each other.
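One caveat for a pure membership test, whichever generator you pick: a perfect hash function is only guaranteed to be collision-free on the keys it was built from. A string that is not in the set still hashes to some slot, so the lookup has to compare against what is stored there. Below is a minimal Python sketch of that idea using a toy hash-and-displace construction. It is an illustration only, not necessarily how gperf, CMPH, or perfect work internally, and the names `build_table` and `contains` are made up for this example.

```python
import hashlib


def _hash(key: str, seed: int, size: int) -> int:
    """Seeded hash of `key`, reduced to the range [0, size)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % size


def build_table(keys):
    """Build a minimal perfect hash table (hypothetical helper, toy scale)."""
    keys = list(dict.fromkeys(keys))  # drop duplicates, keep order
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")

    # First level: seed 0 assigns every key to a bucket.
    buckets = [[] for _ in range(n)]
    for key in keys:
        buckets[_hash(key, 0, n)].append(key)

    seeds = [0] * n      # 0 marks an empty bucket
    slots = [None] * n   # final table: exactly one key per slot

    # Place the largest buckets first; they are the hardest to fit.
    for b in sorted(range(n), key=lambda i: -len(buckets[i])):
        bucket = buckets[b]
        if not bucket:
            break  # all remaining buckets are empty
        seed = 1
        while True:
            positions = [_hash(key, seed, n) for key in bucket]
            no_self_collision = len(set(positions)) == len(positions)
            if no_self_collision and all(slots[p] is None for p in positions):
                break
            seed += 1
        seeds[b] = seed
        for key, p in zip(bucket, positions):
            slots[p] = key
    return seeds, slots


def contains(table, key: str) -> bool:
    """O(1) membership test: two hash evaluations and one comparison."""
    seeds, slots = table
    n = len(slots)
    seed = seeds[_hash(key, 0, n)]
    if seed == 0:
        return False  # no key from the set ever landed in this bucket
    # The comparison is what rejects strings that are not in the set.
    return slots[_hash(key, seed, n)] == key


table = build_table(["alpha", "beta", "gamma", "delta", "epsilon"])
print(contains(table, "gamma"))  # a member
print(contains(table, "zeta"))   # not a member
```

Storing the full keys keeps the answer exact. If memory matters more than exactness, each slot could hold a short fingerprint of the key instead, at the cost of a small false-positive rate.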
A Bloom filter will do what you want, provided you can tolerate a small false-positive rate. I would look around for libraries that have one, or you could try writing one yourself.
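If you do write one yourself, here is a minimal Python sketch. Keep in mind that a Bloom filter never gives a false negative but can give false positives, so it answers "definitely not in the set" or "probably in the set". The class name, the default 1% error rate, and the choice of BLAKE2b with double hashing are assumptions made for this example, not anything taken from a particular library.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter for strings (illustrative sketch only)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        if expected_items <= 0:
            raise ValueError("expected_items must be positive")
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            8,
            math.ceil(
                -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
            ),
        )
        self.num_hashes = max(
            1, round(self.num_bits / expected_items * math.log(2))
        )
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all k bit positions from one 128-bit digest.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item)
        )


bf = BloomFilter(expected_items=1000)
for i in range(1000):
    bf.add(f"key-{i}")
print("key-42" in bf)  # True: added items are always reported as present
```

With this sizing formula, a 1% false-positive rate costs roughly 9.6 bits per key, so about 12 MB for 10 million strings regardless of how long the strings are.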