局部敏感哈希实现?

发布于 2024-11-03 15:39:53 字数 169 浏览 0 评论 0原文

C/C++/Java/C# 中是否有相对简单易懂(且易于实现)的局部敏感哈希示例?

我想了解更多关于这个概念的信息,所以想尝试在几个文本文件上实现,只是为了看看它是如何工作的,所以我不需要任何高性能或任何东西......只是一个哈希的例子为相似的输入返回相似的哈希值的函数。之后我可以通过例子学到更多。 :)

Are there any relatively simple to understand (and simple to implement) locality-sensitive hash examples in C/C++/Java/C#?

I'd like to learn more about the concept and so want to try an implementation on a few text files just to see how it works, so I don't need anything high-performance or anything... just an example of a hash function that returns similar hashes for similar inputs. I can learn more from it by example afterwards. :)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

合约呢 2024-11-10 15:39:53

对于字符串,您可以使用近似匹配算法。

如果字符串与参考字符串等距,那么它们很可能彼此相似。这样你就有了一个字符串的局部敏感哈希实现。

您可以为一定距离范围创建不同的哈希桶。

编辑:您可以尝试其他变化的字符串距离。一个更简单的算法只会返回“否”。两个字符串之间的公共字符。

For strings you can use approximate matching algorithm.

If the strings are equidistant from a reference string then chances are that they are similar to each other. And there you go you have a locality senitive hash implementation for strings.

You can create different hash buckets for a range of distances.

EDIT: You can try other variations of string distance. A simpler algorithm would just return no. of common characters between two strings.

述情 2024-11-10 15:39:53

MSDN 博客上有一篇精彩的文章: http://blogs.msdn.com/b/spt/archive/2008/06/11/locality-sensitive-hashing-lsh-and-min-hash.aspx

另外,至少还有一个 C++ 库,您可以在此处检查源代码:http://sourceforge。净/项目/lshkit/

Well there is an excellent into article at MSDN blogs here: http://blogs.msdn.com/b/spt/archive/2008/06/11/locality-sensitive-hashing-lsh-and-min-hash.aspx

Also there is at least once C++ library which you can inspect the source code of here: http://sourceforge.net/projects/lshkit/

各空 2024-11-10 15:39:53

Hadoop 上还有一个 Java 实现。它在文档方面做得很好。

它的名称为LikeLike

目前仅支持Likelike
Min-Wise 独立排列。
Min-Wise 独立排列是
应用于推荐
谷歌新闻

There is also a Java Implementation on Hadoop. it does a good job on documents.

it's called LikeLike

Currently Likelike supports only
Min-Wise independent permutations.
Min-Wise independent permutations is
applied to the recommendation of
Google News

最偏执的依靠 2024-11-10 15:39:53

我知道您明确要求 C/C++/C#,但有 nilsimsa 哈希的“nofollow">Python 端口,可能是比其他更大的图书馆更容易理解。

I realise you explicitly asked for C/C++/C#, but there is a Python port of the nilsimsa hash which might be easier to grok than other, larger libraries.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文