Filtering the results of a sorted query in Lucene.NET
I'm using Lucene.NET, which currently tracks Lucene 2.9. I'm trying to implement a kind of "select distinct", without needing to drill down into any groups. I know that Lucene 3.2 has a faceted search that may solve this, but I don't have time to port it to 2.9 yet.
As I understand it, when you run a paged query with a sort, Lucene has to find all the documents that match the query, sort them, and then take the top N results, where N is the page size. I'd like to build something that is applied after the sort has completed, but that takes the top N unique results and returns them. I'm thinking of using a HashSet and one of the indexed fields to determine uniqueness. For performance reasons, I'd rather extend something inside Lucene than do this after the results have already been returned.
Custom filters seem to run before the main query is even applied, and custom collectors run before sorting is applied, unless you are sorting by Lucene's document ID. So what is the best approach to this problem? A pointer to the right component to extend will get you the accepted answer; an example implementation most definitely will. Thanks in advance.
Comments (1)
I'd run the search without sorting and, in a custom collector, collect the results into a sorted list of size N, using the "uniqueness" field to discard duplicates.
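To make the idea concrete, here is a minimal, library-free sketch of the bookkeeping such a collector might do. It is written in Python purely for illustration and does not use the Lucene or Lucene.NET API; the class name `TopNUniqueCollector`, its `collect` signature, and the sample data are all hypothetical. It assumes that for each hit the caller can supply the document ID, the value to sort by, and the value of the uniqueness field (in a real collector these would presumably be looked up per document, for example through Lucene's FieldCache).

```python
from typing import Dict, Hashable, List, Tuple


class TopNUniqueCollector:
    """Keeps the N best hits seen so far, at most one per uniqueness key.

    Hypothetical helper, not part of Lucene: a real collector would call
    collect() once for every matching document.
    """

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        # uniqueness key -> (sort_value, doc_id) of the best hit for that key
        self._best: Dict[Hashable, Tuple[str, int]] = {}

    def collect(self, doc_id: int, sort_value: str, unique_key: Hashable) -> None:
        candidate = (sort_value, doc_id)  # smaller tuple = ranks earlier
        current = self._best.get(unique_key)
        if current is not None:
            # Key already retained: keep whichever hit ranks earlier.
            if candidate < current:
                self._best[unique_key] = candidate
            return
        if len(self._best) < self.n:
            self._best[unique_key] = candidate
            return
        # Full: the new key only gets in if it beats the current worst entry.
        worst_key = max(self._best, key=self._best.get)
        if candidate < self._best[worst_key]:
            del self._best[worst_key]
            self._best[unique_key] = candidate

    def top_docs(self) -> List[int]:
        """Document IDs of the retained hits, in sort order."""
        return [doc_id for _, doc_id in sorted(self._best.values())]


# Made-up hits: (doc_id, sort_value, unique_key)
hits = [
    (0, "delta", "a"),
    (1, "alpha", "b"),
    (2, "bravo", "a"),
    (3, "charlie", "c"),
    (4, "alpha", "b"),
]
collector = TopNUniqueCollector(2)
for doc_id, sort_value, unique_key in hits:
    collector.collect(doc_id, sort_value, unique_key)
print(collector.top_docs())  # prints [1, 2]
```

Because the collector does the ordering itself, the search can run unsorted, and memory stays bounded by N rather than by the number of matches. The linear scan for the worst entry is fine when N is a page size; a heap would be the obvious refinement. For paging, one option under the same assumptions is to size the collector at page number times page size and slice off the last page.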