Lucene: Searching within a search using FuzzyQuery
I need to run a FuzzyQuery against an index that contains around 8 million lines. That kind of query is pretty slow, needing about 20 seconds for every match. However, I can narrow the results down to about 5000 hits using another field before doing the fuzzy search. For this to work, I need to search by the "narrower" field first and then run the fuzzy search within those results.
According to the Lucene FAQ, all I have to do is build a BooleanQuery in which the "narrower" clause is required (BooleanClause.Occur.MUST in Lucene 3).
Now I have tried two different approaches:
a) Using the Query Parser, with an input like: narrower:+narrowing_text fuzzy:fuzzy_text~0.9
b) Constructing a BooleanQuery with a TermQuery and a FuzzyQuery
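For concreteness, approach (b) could look roughly like the sketch below. It assumes PyLucene 3.x's flat `lucene` namespace and that the JVM has already been started with `lucene.initVM`. The field names `narrower` and `fuzzy` are taken from the query string in (a); the function name `build_narrowed_query`, its parameters, and the choice to mark both clauses as MUST are assumptions made for illustration.

```python
def build_narrowed_query(narrowing_text, fuzzy_text, min_similarity=0.9):
    """Require an exact match on `narrower` AND a fuzzy match on `fuzzy`."""
    # Imported lazily so this module still loads where PyLucene is absent.
    from lucene import BooleanClause, BooleanQuery, FuzzyQuery, Term, TermQuery

    query = BooleanQuery()
    # Exact-match clause on the narrowing field (hypothetical field name).
    query.add(TermQuery(Term("narrower", narrowing_text)),
              BooleanClause.Occur.MUST)
    # Fuzzy clause; 0.9 mirrors the minimum similarity used in (a).
    query.add(FuzzyQuery(Term("fuzzy", fuzzy_text), min_similarity),
              BooleanClause.Occur.MUST)
    return query
```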
Neither worked; I'm getting about the same times as when the narrower is not used.
Also, to check that the times would be much better if the narrower were working, I built an index containing only the 5000 items that match the narrower, and the search was fast as hell.
In case anyone wonders, I'm using pylucene 3.0.2.
Comments (1)
Doppleganger, you can probably use a Filter, specifically a QueryWrapperFilter.
Follow the example from Lucene in Action. You may have to make some modifications for use in Python, but otherwise it should be simple.
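A minimal sketch of the filter approach, assuming PyLucene 3.x's flat `lucene` namespace and an already opened `IndexSearcher` (with the JVM started via `lucene.initVM`). This is not the book's example. The field names `narrower` and `fuzzy` come from the question, and the function name `narrowed_fuzzy_search` and its parameters are hypothetical. Whether it actually shortens the query is not guaranteed: in Lucene 3.x a FuzzyQuery is first expanded against the field's term dictionary, and the filter only restricts which documents are scored afterwards, so the term scan may still dominate.

```python
def narrowed_fuzzy_search(searcher, narrowing_text, fuzzy_text,
                          min_similarity=0.9, max_hits=50):
    """Run a fuzzy query restricted to documents matching `narrower`."""
    # Imported lazily so this module still loads where PyLucene is absent.
    from lucene import FuzzyQuery, QueryWrapperFilter, Term, TermQuery

    # Wrap the exact-match query in a filter: it restricts the candidate
    # documents without contributing to the score.
    narrow_filter = QueryWrapperFilter(
        TermQuery(Term("narrower", narrowing_text)))
    fuzzy_query = FuzzyQuery(Term("fuzzy", fuzzy_text), min_similarity)

    # IndexSearcher.search(Query, Filter, int) returns a TopDocs object.
    top_docs = searcher.search(fuzzy_query, narrow_filter, max_hits)
    return [(hit.doc, hit.score) for hit in top_docs.scoreDocs]
```

Each returned pair is a document id and its score; the stored fields can then be loaded with `searcher.doc(doc_id)`. If the term scan turns out to be the bottleneck, the three-argument `FuzzyQuery(term, minimumSimilarity, prefixLength)` constructor with a non-zero prefix length may be worth trying, since it limits the terms that have to be compared.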