Is Lucene's fuzzy search lazy?
I would like to use Lucene's fuzzy search, which I understand is based on some sort of Levenshtein-like algorithm. If I use a fairly high threshold (e.g., "new york~0.9"), will it first compute the full edit distance and then check whether it is less than whatever 0.9 corresponds to, or will it cut the algorithm off as soon as it becomes apparent that the document does not match the query that closely? I understand that this is possible with the Levenshtein algorithm.
No. The code you want to see is lines 57-59 of FuzzyTermEnum:
You can see that it calculates the full distance first and only then compares it against the threshold.
Why do you care about this, though? Unless your terms are thousands of characters long, calculating the full edit distance is very fast.
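For reference, the early cutoff the question describes could look roughly like the sketch below. This is not Lucene's code. The class name `BoundedLevenshtein`, the `maxDistance` parameter, and the `-1` return value are all made up for illustration. It relies on the fact that the smallest value in each row of the dynamic-programming table never decreases from one row to the next, so once a whole row exceeds the limit the final distance must exceed it too.

```java
public final class BoundedLevenshtein {

    /**
     * Returns the edit distance between a and b, or -1 as soon as it is
     * certain that the distance exceeds maxDistance (hypothetical API).
     */
    public static int distance(String a, String b, int maxDistance) {
        // The difference in length is a lower bound on the edit distance.
        if (Math.abs(a.length() - b.length()) > maxDistance) {
            return -1;
        }

        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so the final distance is >= rowMin.
            if (rowMin > maxDistance) {
                return -1; // early exit: remaining rows are never computed
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        int result = prev[b.length()];
        return result <= maxDistance ? result : -1;
    }
}
```

With short terms the saving is small, which is consistent with the point above that the full computation is already cheap.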