Whoosh: combining query terms to match documents that contain "most" of the terms
I've just started using Whoosh and noticed that queries seem to have logic applied to each term such as AND([term1, term2, ...]) or OR([term1, term2, ...]).
My problem is that I want to include documents that contain most of the terms in my search string, but not necessarily all of them. The more terms a document has, the more "relevant" it should be. For example, if I search for "big brown cow", I want the results to include documents that match only "brown" and "cow", or only "big" and "brown", and not necessarily all three terms. Of course, documents that contain all of the terms should rank higher than the others.
How can I accomplish this? (Without having to do a separate search for each individual combination of terms!)
Comments (1)
You can configure the Whoosh parser to default to using OR rather than AND between query terms. See http://packages.python.org/Whoosh/parsing.html#common-customizations. You can then write a custom scoring class that scores items higher if they have more of the search terms. See http://packages.python.org/Whoosh/searching.html#scoring-and-sorting and http://packages.python.org/Whoosh/api/scoring.html#module-whoosh.scoring.
In all, the documentation is a good place to start looking for answers to questions like these.