Is Lucene fuzzy search lazy?

Posted on 2024-09-06 19:52:15

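I want to use Lucene's fuzzy search, which I understand is based on some Levenshtein-like algorithm. If I use a fairly high threshold (e.g. "new york~0.9"), will it first compute the full edit distance and then check whether it is less than whatever 0.9 corresponds to, or will it cut the algorithm off once it becomes apparent that the document does not match the query that closely? I know this is possible with edit-distance algorithms.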

预谋 2024-09-13 19:52:15

Will it cut off the algorithm if it becomes apparent that the document does not match the query that closely?

No. The code you want to see is lines 57-59 of FuzzyTermEnum:

int dist = editDistance(text, target, textlen, targetlen);
distance = 1 - ((double)dist / (double)Math.min(textlen, targetlen));
return (distance > FUZZY_THRESHOLD);

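You can see that it always computes the full edit distance, converts it to a similarity score (stored, somewhat confusingly, in a variable named distance), and returns whether that score is greater than the threshold.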

Why do you care about this, though? Unless your terms are thousands of characters long, computing the full edit distance will be very quick.
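
For comparison, the early cut-off the question describes could look roughly like the sketch below. This is not Lucene code: the class BoundedFuzzyMatch and all of its method names are hypothetical, and the only thing taken from the snippet above is the similarity formula. The idea is to turn the threshold into a maximum allowed edit distance and abandon the dynamic-programming table as soon as every entry in the current row exceeds it.

```java
// Hypothetical sketch, not part of Lucene: edit distance with an early cut-off.
final class BoundedFuzzyMatch {

    /** Similarity as in the quoted snippet: 1 - dist / min(textLen, targetLen). */
    static double similarity(int dist, int textLen, int targetLen) {
        return 1.0 - (double) dist / (double) Math.min(textLen, targetLen);
    }

    /** Largest edit distance whose similarity is still above the threshold, or -1 if none. */
    static int maxDistance(double threshold, int textLen, int targetLen) {
        int d = 0;
        // Similarity falls as d grows, so stop at the first d that fails the test.
        while (similarity(d, textLen, targetLen) > threshold) {
            d++;
        }
        return d - 1;
    }

    /** Returns the Levenshtein distance if it is at most maxDist, otherwise -1. */
    static int boundedEditDistance(String text, String target, int maxDist) {
        if (maxDist < 0) {
            return -1;
        }
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(text.length() - target.length()) > maxDist) {
            return -1;
        }
        int[] prev = new int[target.length() + 1];
        int[] curr = new int[target.length() + 1];
        for (int j = 0; j <= target.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= text.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= target.length(); j++) {
                int cost = text.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // The final distance can never be smaller than the minimum of any row,
            // so once a whole row exceeds maxDist the remaining rows can be skipped.
            if (rowMin > maxDist) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int dist = prev[target.length()];
        return dist <= maxDist ? dist : -1;
    }

    /** Same accept/reject decision as the quoted snippet, but with the cut-off. */
    static boolean matches(String text, String target, double threshold) {
        int maxDist = maxDistance(threshold, text.length(), target.length());
        return boundedEditDistance(text, target, maxDist) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(maxDistance(0.9, 8, 8));
        System.out.println(matches("new york", "new yorc", 0.9));
        System.out.println(matches("new york", "new yorc", 0.8));
    }
}
```

With two 8-character strings and a threshold of 0.9, maxDistance allows no edits at all, so a high threshold leaves very little work to cut off. As noted above, for terms of ordinary length the saving is unlikely to matter.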
