为数百万个项目创建完美的哈希 - 结果只需“存在或不存在”即可

发布于 2024-11-26 18:37:33 字数 153 浏览 0 评论 0原文

有谁知道一个好的库(Windows)可以让我为数百万个项目(可能大约10m)创建一个静态(非运行时)完美哈希?

我本质上有数百万组字符串,我想以最小的 O(1) 知道一个字符串是否在我的集合中 - 就是这样。我不需要它来实际查找字符串 - 它背后没有任何价值(除了存在之外)。

Does anyone know of a good library (windows) that will allow me to create a static (not runtime) perfect hash for millions of items (probably about 10m)?

I essentially have millions of sets of strings and I want to know at a minimal O(1) if a string is in my set or not - that's it. I don't need it to actually look up the string - there's no value behind it (other than the existance).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

卷耳 2024-12-03 18:37:34

尝试一下:

Perfect 和 gperf 生成 C 代码形式的表,在 Windows 上应该可以正常工作。我不知道CMPH的输出是多少。

CMPH 有评论说:

gperf 有点不同,因为它的设想是为小型密钥集创建非常快速的完美哈希函数,而 CMPH 库的设想是为非常大的密钥集创建最小的完美哈希函数。

如果这是正确的,那么对于您的百万密钥情况,您可能应该更喜欢 CMPH 而不是 gperf。我不知道它们与詹金斯的完美相比如何。尝试这三种方法并相互比较应该很简单。

Try:

perfect and gperf produce tables in C code form, which should work fine on Windows. I don't know what CMPH's output is.

CMPH has a comment saying:

gperf is a bit different, since it was conceived to create very fast perfect hash functions for small sets of keys and CMPH Library was conceived to create minimal perfect hash functions for very large sets of keys.

If that's correct, then with your million-key case, you should probably prefer CMPH to gperf. I don't know how they compare to Jenkins's perfect. It should be simple enough to try all three and benchmark them against each other.

南笙 2024-12-03 18:37:34

布隆过滤器会做你想做的事,我会四处寻找拥有它们的库,或者你可以尝试自己编写一个。

A Bloom filter will do what you want, I would look around for libraries that have them or you can attempt to write one yourself.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文