Sort by most recent access in Lucene / Solr
In my Solr queries, I want to sort the most recently accessed documents to the top ("accessed" meaning opened by user action). No other search criterion carries any weight for me: of the documents whose text matches the query, I want them in order of most recent use. I can only think of two ways to do this:
1) Include a 'last accessed' date field in each doc to have Solr sort upon. Trie Date fields can be sorted very quickly, I'm told. The problem of course is keeping the field up to date, which would require storing each document's text so I can delete and re-add any document with an updated 'last accessed' field. Mutable fields would obviate this, but Lucene/Solr still doesn't offer mutable fields.
2) Alternatively, store the mutable 'last accessed' dates and keep them updated in another db. This would require Solr to return the full list of matching documents, which could be upwards of hundreds of thousands of documents. This huge list of document ids would then be matched up against dates in the db and then sorted. It would work OK for uncommon search terms, but not for broad, common search terms.
So the trade-off is between 1) index size plus a processing cost every time a document is accessed and 2) big query overhead, especially for unfocused search terms.
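To make the cost of option 2 concrete, here is a minimal sketch of the client-side step, assuming the matching ids have already been pulled from Solr and the access times live in some external key-value store. The names `top_recent`, `matching_ids` and `last_accessed` are hypothetical, and a plain dict stands in for the database.

```python
import heapq

def top_recent(matching_ids, last_accessed, k=10):
    """Return the k matching ids with the most recent access time."""
    # Ids missing from the store are treated as never accessed (timestamp 0).
    return heapq.nlargest(k, matching_ids, key=lambda d: last_accessed.get(d, 0))

# Hypothetical data: epoch-style timestamps keyed by document id.
last_accessed = {"a": 100, "b": 300, "c": 200}
result = top_recent(["a", "b", "c", "d"], last_accessed, k=2)
print(result)  # ['b', 'c']
```

Selecting the top k is cheap on its own; the expense is that every matching id still has to be returned by Solr and looked up in the store before any ordering can happen.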
Do I have any alternatives?
Comments (3)
http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-WorkingwithExternalFiles
http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html
You should be able to do this with the atomic update functionality.
http://wiki.apache.org/solr/Atomic_Updates
This functionality is available as of Solr 4.0. It allows you to update a single field in a document without having to reindex the entire document. I only know about this functionality from the documentation. I have not used it myself, so I can't say how well it works or if there are any pitfalls.
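For illustration, a minimal sketch of what such an atomic update request body could look like, built in Python without contacting a server. The field name `last_accessed`, the document id `doc42` and the helper `build_atomic_update` are assumptions made for the example; the `{"set": ...}` modifier is the one described on the Atomic Updates page.

```python
import json
from datetime import datetime, timezone

def build_atomic_update(doc_id, accessed_at, field="last_accessed"):
    """Return a JSON body that sets one date field on one document."""
    # Solr date fields expect UTC timestamps in the form 2013-05-01T12:00:00Z.
    stamp = accessed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps([{"id": doc_id, field: {"set": stamp}}])

body = build_atomic_update("doc42", datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc))
print(body)
```

The body would then be POSTed as `application/json` to the update handler of the core or collection, followed by a commit. One thing to check before relying on this: as far as I understand the documentation, Solr rebuilds the whole document internally from its stored fields, so the other fields generally need to be stored for the update not to lose data.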
Definitely use option 1, using SOLR queries and updating the lastAccessed field as needed.
Since SOLR 4.0, partial document updates are supported in several flavours: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
For your application it seems that a simple atomic update would be sufficient.
With respect to performance, this should work very well for large collections and fast document updates.
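On the query side, a minimal sketch of the request that would return matches ordered by recency, assuming the date field is called `lastAccessed` as above and that `id` is the unique key. The helper name `build_select_query` and the example query `text:lucene` are made up for illustration.

```python
from urllib.parse import urlencode

def build_select_query(user_query, field="lastAccessed", rows=10):
    """Build a /select query string sorted by the access date, newest first."""
    params = {
        "q": user_query,
        "sort": f"{field} desc",  # ignore relevance, order purely by recency
        "rows": rows,             # fetch only the first page of results
        "fl": "id",               # return just the ids
    }
    return "/select?" + urlencode(params)

url = build_select_query("text:lucene")
print(url)
```

Because the sort runs inside Solr, only the requested page of ids comes back, which avoids the large result transfer described in option 2.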