LSH and MinHashing - why does hashing the signature matrix make sense?
I'm learning about LSH and MinHashing, and I'm trying to understand the rationale for hashing the signature matrix:
We divide the signature matrix into bands, and we hash (using which hash function?) each column's portion within a band into k buckets. Why does that make sense? If we use a regular hash function, then even a slight difference between two columns would probably send them to different buckets.
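A minimal sketch of the banding step, under stated assumptions: signature columns are plain Python lists, the helper names `lsh_candidate_pairs` and `candidate_probability` are hypothetical, and instead of an explicit hash into k buckets each band's slice is used directly as a dictionary key. Python hashes the tuple and then checks equality, so two columns share a bucket in a band only when all r values of that band match exactly.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands, rows):
    """Return index pairs of columns that share a bucket in at least one band.

    Hypothetical helper: `signatures` is a list of columns, each a list of
    bands * rows minhash values.
    """
    candidates = set()
    for b in range(bands):
        # A fresh bucket table per band, so different bands never mix.
        buckets = defaultdict(list)
        for col, sig in enumerate(signatures):
            # The band's slice is the bucket key. The dict hashes the tuple and
            # then compares for equality, so only identical bands share a list.
            key = tuple(sig[b * rows:(b + 1) * rows])
            buckets[key].append(col)
        for cols in buckets.values():
            # Every pair of columns in the same bucket is a candidate pair.
            candidates.update(combinations(cols, 2))
    return candidates


def candidate_probability(s, bands, rows):
    """Chance that two columns whose rows agree with probability s become candidates."""
    return 1 - (1 - s ** rows) ** bands


signatures = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 9, 4, 5, 6],  # differs from column 0 only in row 2
    [7, 8, 9, 0, 1, 2],
]
# 3 bands of 2 rows: columns 0 and 1 still match exactly in bands 0 and 2.
print(lsh_candidate_pairs(signatures, bands=3, rows=2))  # {(0, 1)}
print(candidate_probability(0.3, bands=20, rows=5))
print(candidate_probability(0.8, bands=20, rows=5))
```

With this choice, a slight difference between two columns only breaks the bands that contain the differing rows; highly similar columns are still likely to agree exactly on some other band and become a candidate pair. Assuming each signature row agrees with probability s (the Jaccard similarity), a pair becomes a candidate with probability 1 - (1 - s^r)^b; with b = 20 and r = 5 that is roughly 0.047 at s = 0.3 but roughly 0.9996 at s = 0.8.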
I do understand the relation between the signature matrix and the Jaccard distance, but I don't understand the next step, which is essentially a hashing step that distributes items evenly.
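The relation between the signature matrix and Jaccard similarity (one minus the Jaccard distance) can be checked with a small sketch that follows the textbook definition: each minhash row is the smallest permuted position among a set's elements under one random permutation of the universe, and the fraction of rows on which two signatures agree estimates their Jaccard similarity. The helper names, the universe size of 200 and the 500 explicit permutations are illustrative assumptions; practical implementations usually replace permutations with random hash functions.

```python
import random


def make_permutations(universe_size, num_perms, seed=0):
    """Hypothetical helper: num_perms random permutations of range(universe_size)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_perms):
        perm = list(range(universe_size))
        rng.shuffle(perm)  # perm[x] is the new position of element x
        perms.append(perm)
    return perms


def minhash_signature(items, permutations):
    """One value per permutation: the smallest new position of any element in the set."""
    return [min(perm[x] for x in items) for perm in permutations]


def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2)


def signature_agreement(sig1, sig2):
    """Fraction of signature rows on which two columns agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


perms = make_permutations(universe_size=200, num_perms=500)
A = set(range(0, 100))
B = set(range(20, 120))
print(jaccard(A, B))  # exact value: 80 / 120
print(signature_agreement(minhash_signature(A, perms), minhash_signature(B, perms)))
```

The second value is a random estimate that should land close to the exact Jaccard similarity; its spread shrinks as the number of permutations grows.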