Paging with a query in Lucene.net?
I'm using Lucene.net to build an index and search it. I'm actually using the API indirectly through the Examine project on CodePlex. Everything currently works and the paging logic is in place; however, the current logic pages the results after the search has completed. I don't like this because the search may return thousands of records, and only then does my code take the 10-20 records it needs and discard the rest, which is a major waste of resources. Even if each SearchResult item is only 3KB, the memory needed to execute these searches will grow over time and become a huge memory hog. My shared host only guarantees 1GB of dedicated memory, so this is a big concern for my website.
So the question is: how do I limit the results in a paged manner using the Lucene query language alone? I looked at the Apache Lucene project, from which Lucene.net is ported, and I don't see any syntax that lets me do this. Basically, I want the equivalent of what SQL Server offers for limiting rows at the query-language level.
For example, this is how we would page in SQL; it returns only 20 records, not every record that matches the WHERE clause:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum,
                      OrderID,
                      OrderDate
               FROM SalesOrders
               WHERE OrderCustomerName LIKE 'Davis%') O
WHERE RowNum BETWEEN 1 AND 20
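The hard-coded BETWEEN 1 AND 20 generalizes to any page: for a 1-based page p of size s, the row window is (p-1)*s+1 through p*s. A minimal Java sketch of that arithmetic follows; PageWindow and rowRange are hypothetical names for illustration, not part of Lucene or Examine.

```java
public class PageWindow {

    /** Inclusive, 1-based row numbers for a 1-based page, matching the SQL BETWEEN bounds. */
    static int[] rowRange(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        int first = (page - 1) * pageSize + 1;
        return new int[] {first, first + pageSize - 1};
    }

    public static void main(String[] args) {
        int[] r = rowRange(3, 20);
        // Page 3 of 20 rows each covers rows 41..60.
        System.out.println("RowNum BETWEEN " + r[0] + " AND " + r[1]);
    }
}
```

Note that the upper bound p*s is also how many ranked rows the engine must consider before it can return that page.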
Comments (1)
I don't think there is a major waste of resources, since search is (to simplify) nothing more than calculating the bit vector and the scores. What is costly is reading the documents from the index. Except for the deprecated Hits class, search results don't read the documents; they just return the doc IDs, so there isn't much overhead in skipping the first N results. The exception is when you want to sort the results by some field: then all documents in the search result list must be read from the index so that they can be returned in the correct order.
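To illustrate why skipping the first N results is cheap, here is a small, self-contained Java simulation. It does not use the Lucene API: the scores array is a hypothetical stand-in for the per-document scores a query produces, and TopNPaging, Hit and page are illustrative names. It keeps only the best page * pageSize (docId, score) pairs in a bounded min-heap, similar in spirit to a top-N hit collector, so memory grows with how deep the requested page is rather than with the total number of matches, and no document contents are touched. In Lucene itself the idea corresponds, roughly, to asking the searcher for the top n = page * pageSize hits (for example IndexSearcher.search(query, n), which returns TopDocs) and then loading stored documents only for the ScoreDoc entries inside the requested window. The sketch does not model sorting by a field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNPaging {

    /** A ranked hit: only a document id and a score, no stored fields. */
    public record Hit(int docId, float score) {}

    /**
     * Returns the hits on a 1-based page. Every candidate is scanned once, but at most
     * page * pageSize (docId, score) pairs are kept. Assumption: scores[docId] > 0 means a match.
     */
    public static List<Hit> page(float[] scores, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        int numHits = page * pageSize;
        // Weakest hit on top: lower score first; on equal scores, the higher docId is weaker.
        Comparator<Hit> weakestFirst = Comparator.comparingDouble((Hit h) -> h.score())
                .thenComparing(Comparator.comparingInt((Hit h) -> h.docId()).reversed());
        PriorityQueue<Hit> heap = new PriorityQueue<>(weakestFirst);
        for (int docId = 0; docId < scores.length; docId++) {
            if (scores[docId] <= 0f) {
                continue; // not a match
            }
            heap.offer(new Hit(docId, scores[docId]));
            if (heap.size() > numHits) {
                heap.poll(); // evict the weakest hit so memory stays bounded by numHits
            }
        }
        // Drain weakest-to-strongest, then reverse into best-first rank order.
        List<Hit> ranked = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            ranked.add(heap.poll());
        }
        Collections.reverse(ranked);
        // Skip the earlier pages; only this slice would need its documents loaded.
        int from = Math.min((page - 1) * pageSize, ranked.size());
        return ranked.subList(from, ranked.size());
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0f, 0.9f, 0.7f, 0.1f};
        for (int p = 1; p <= 3; p++) {
            List<Integer> ids = new ArrayList<>();
            for (Hit h : page(scores, p, 2)) {
                ids.add(h.docId());
            }
            System.out.println("page " + p + ": " + ids);
        }
    }
}
```

Running main prints page 1: [2, 3], page 2: [0, 4] and page 3: []. The heap never holds more than page * pageSize entries, which is why fetching page 2 costs about the same as fetching page 1 plus one more page of (docId, score) pairs.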