Solr delete/optimize affects ranking scores
Does anyone know why Solr's ranking is affected by deleting (but not purging) documents?
For example, if I add a document and search for it, its score might be 4.7. If I then re-add it (i.e. Solr deletes the old copy and adds it again with the same values) and run the same query, the result has a score of 4.5. If I optimize the index, the score returns to 4.7.
I reckon this is due to the difference between maxDoc and numDocs in Solr when a document has been logically deleted but not yet purged from the index.
Is this a bug? In my case it causes problems, because the sort order ends up unstable when an unrelated document (not in my result set) is deleted.
This is Solr 3.2.0.
-Matt
Comments (1)
It's not really a bug; it's how Solr works by default. As you surmise, deleting a document only marks it as deleted. It isn't physically removed until the segment holding it is merged, which an optimize forces, so the index statistics used for scoring still count the deleted document until then. The benefit is that deletion is a fast operation (optimization is usually run only occasionally). Some other engines, such as Xapian, do remove documents completely on delete.
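To make the effect concrete, here is a minimal sketch of the idf factor, assuming the classic Lucene 3.x DefaultSimilarity formula `idf = 1 + ln(maxDoc / (docFreq + 1))`. The index sizes are made-up numbers for illustration, and `classic_idf` is a hypothetical helper, not a Solr API. The real score also involves tf, norms and query normalization, so this shows only the part that shifts when a deleted copy lingers in the index.

```python
import math


def classic_idf(doc_freq: int, max_doc: int) -> float:
    """idf factor, assuming Lucene 3.x DefaultSimilarity: 1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical index: 1000 documents, 10 of which contain the query term.
before = classic_idf(doc_freq=10, max_doc=1000)

# Re-adding one matching document: the old copy is only marked deleted,
# so both maxDoc and docFreq still count it alongside the new copy.
after_readd = classic_idf(doc_freq=11, max_doc=1001)

# After an optimize the deleted copy is purged and the statistics shrink back.
after_optimize = classic_idf(doc_freq=10, max_doc=1000)

print(f"before re-add:  {before:.4f}")
print(f"after re-add:   {after_readd:.4f}")
print(f"after optimize: {after_optimize:.4f}")
```

Under these assumptions the idf drops slightly after the re-add and returns to its original value after the optimize, which matches the 4.7, 4.5, 4.7 pattern in the question. If a full optimize is too expensive, Solr's XML update handler also accepts `<commit expungeDeletes="true"/>`, which asks for segments containing deletions to be merged; whether that is enough to stabilize the sort order in a given setup would need testing.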