How do I store locality-sensitive hash values?

Posted on 2024-11-17 23:36:28

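I already have an algorithm that generates locality-sensitive hash values, but how should I store them so that I can take advantage of their property (namely, that similar items have hash values that are close in Hamming distance)?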

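In the matlab code I found, they simply build a distance matrix between the hash values of the points being searched for and the hash values of the points in the database, to keep the code simple, while referring to a so-called Charikar method for an actually good implementation of the search.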
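For reference, the brute-force baseline described above (compare the query hash against every stored hash) might look like the following in Python. This is only a sketch: it assumes each hash is stored as a plain integer, and the 8-bit codes and the function names `hamming` and `nearest_brute_force` are made up for illustration, not taken from the matlab code.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def nearest_brute_force(query: int, database: list[int]) -> int:
    """Index of the database hash closest to `query` in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 8-bit hash codes, invented for this example.
database = [0b10110010, 0b01101100, 0b10110110, 0b00001111]
query = 0b10110111
best = nearest_brute_force(query, database)  # index of the closest stored hash
```

This costs one comparison per stored hash for every query, which is the cost a smarter storage scheme is meant to avoid.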

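I tried to look this up, but I'm not sure how to apply any of the methods I found (such as the multi-probe method) to my case. None of these techniques seems easy to plug in if you already have the hash values. Is there any simple example code? Or any suggestions?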

Here is the link to the page with the matlab code I'm talking about: http://www.eecs.berkeley.edu/~kulis/klsh/klsh.htm

青丝拂面 2024-11-24 23:36:28

Based on "Search in locality sensitive hashing", and after reading "Similarity Estimation Techniques from Rounding Algorithms", I would say this:

This question is somewhat broad, so I am just going to give a minimal (abstract) example here:

We have 6 (= n) vectors in our dataset, each with d bits. Let's assume that we do 2 (= N) random permutations.

Let the 1st random permutation begin! Remember that we permute the bits, not the order of the vectors. After permuting the bits, we sort the permuted vectors lexicographically, which gives them an order, for example:

v1
v5
v0
v3
v2
v4
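The permute-then-sort step could be sketched in Python as follows. The assumptions here are mine: each vector is a d-bit integer, bit position 0 is the most significant bit, and the names `permute_bits` and `build_sorted_table` as well as the example codes are hypothetical. Comparing d-bit integers numerically is the same as comparing their bit strings lexicographically, so a plain `sorted` is enough.

```python
import random


def permute_bits(code: int, perm: list[int]) -> int:
    """Return a d-bit code whose i-th bit (counting from the most
    significant side) is bit perm[i] of `code`."""
    d = len(perm)
    out = 0
    for src in perm:
        bit = (code >> (d - 1 - src)) & 1
        out = (out << 1) | bit
    return out


def build_sorted_table(codes: list[int], perm: list[int]) -> list[tuple[int, int]]:
    """Return (permuted_code, original_index) pairs sorted by permuted code."""
    return sorted((permute_bits(c, perm), i) for i, c in enumerate(codes))


d = 8
rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
perm = list(range(d))
rng.shuffle(perm)       # one random permutation of the d bit positions

# Six hypothetical 8-bit vectors v0..v5.
codes = [0b10110010, 0b01101100, 0b10110110, 0b00001111, 0b11100001, 0b01010101]
table = build_sorted_table(codes, perm)
```

Keeping the original index next to each permuted code lets you map a position in the sorted table back to the vector it came from.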

Now the query vector, q, arrives, but it is unlikely to be identical to any vector in our dataset (after the permutation), so a binary search will not find an exact match.

However, the binary search will end up between two vectors. So now we can imagine the scenario to be like this (for example, q lies between v0 and v3):

v1
v5
v0 <-- up pointer
   <-- q lies here
v3 <-- down pointer
v2
v4

Now we move either the up pointer or the down pointer, looking for the vector vi that matches q in the most bits. Let's say it was v0.
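A rough Python sketch of locating q and then walking the two pointers is below. It assumes the permuted vectors are d-bit integers in a sorted list. The rule for which pointer to move (prefer the side that shares the longer bit prefix with q) and the `budget` cap on how many entries get examined are my assumptions, as are the function names.

```python
from bisect import bisect_left


def hamming(a: int, b: int) -> int:
    """Number of differing bits."""
    return bin(a ^ b).count("1")


def common_prefix_len(a: int, b: int, d: int) -> int:
    """Number of leading bits (out of d) on which a and b agree."""
    return d - (a ^ b).bit_length()


def scan_neighbours(sorted_codes: list[int], q: int, d: int, budget: int):
    """Examine up to `budget` entries around q's insertion point and
    return (position, distance) of the closest one seen."""
    down = bisect_left(sorted_codes, q)  # first entry >= q
    up = down - 1                        # last entry < q
    best_pos, best_dist = None, d + 1
    for _ in range(budget):
        can_up, can_down = up >= 0, down < len(sorted_codes)
        if not (can_up or can_down):
            break
        # Move the pointer whose entry shares the longer prefix with q.
        if can_up and (not can_down or
                       common_prefix_len(sorted_codes[up], q, d)
                       >= common_prefix_len(sorted_codes[down], q, d)):
            pos, up = up, up - 1
        else:
            pos, down = down, down + 1
        dist = hamming(sorted_codes[pos], q)
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos, best_dist


# Hypothetical 8-bit codes, already in sorted order.
sorted_codes = [0b00001111, 0b01010101, 0b01101100,
                0b10110010, 0b10110110, 0b11100001]
result = scan_neighbours(sorted_codes, 0b10110111, d=8, budget=2)
```

A larger `budget` examines more candidates per permutation at the cost of more comparisons.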

Similarly, we do the second permutation and find a vector vi, let's say v4. We now compare v0 from the first permutation with v4 to see which one is closest to q, i.e. which one has the most bits equal to q.
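Putting the pieces together, one way to store the hashes is one sorted table per permutation, with the query keeping the best candidate over all tables. The Python sketch below is under my own assumptions, not the paper's code: hashes are distinct d-bit integers, the class name `PermutationIndex` and its parameters are hypothetical, and instead of a pointer walk it checks a fixed `window` of entries on each side of the insertion point to keep the example short.

```python
import random
from bisect import bisect_left


def hamming(a: int, b: int) -> int:
    """Number of differing bits."""
    return bin(a ^ b).count("1")


def permute_bits(code: int, perm: list[int]) -> int:
    """Rearrange the d bits of `code`; bit index 0 is the most significant."""
    d = len(perm)
    out = 0
    for src in perm:
        out = (out << 1) | ((code >> (d - 1 - src)) & 1)
    return out


class PermutationIndex:
    """Stores one sorted table of permuted hashes per random permutation."""

    def __init__(self, codes: list[int], d: int, num_perms: int, seed: int = 0):
        rng = random.Random(seed)
        self.codes = codes
        self.tables = []
        for _ in range(num_perms):
            perm = list(range(d))
            rng.shuffle(perm)
            table = sorted((permute_bits(c, perm), i) for i, c in enumerate(codes))
            self.tables.append((perm, table))

    def query(self, q: int, window: int = 2):
        """Return (index, distance) of the best candidate over all tables."""
        best_idx, best_dist = None, None
        for perm, table in self.tables:
            pos = bisect_left(table, (permute_bits(q, perm), -1))
            for _, idx in table[max(0, pos - window):pos + window]:
                # Permuting bits does not change Hamming distance, so the
                # comparison can use the original, unpermuted codes.
                dist = hamming(self.codes[idx], q)
                if best_dist is None or dist < best_dist:
                    best_idx, best_dist = idx, dist
        return best_idx, best_dist


# Six hypothetical 8-bit hashes and N = 2 permutations, as in the example.
codes = [0b10110010, 0b01101100, 0b10110110, 0b00001111, 0b11100001, 0b01010101]
index = PermutationIndex(codes, d=8, num_perms=2)
candidate = index.query(0b10110111)
```

The result is approximate: with a small `window` and few permutations the true nearest neighbour can be missed, and adding permutations or widening the window trades speed for recall.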

However, if you are looking for a ready-made implementation, you should ask on Software Recommendations. I would also look at the paper I linked to, to see whether the author(s) made the code public, or whether they would be willing to share it if you contact them.
