How do I remove logically deleted documents from a Solr index?
I am implementing Solr for free-text search in a project where the searchable records will need to be added and deleted on a large scale every day.
Because of that scale, I need to make sure the index stays an appropriate size.
On my test installation of Solr, I index a set of 10 documents. Then I change one of the documents and want to replace the document with the same ID in the index. This works correctly and behaves as expected when I search.
I am using this code to update the document:
getSolrServer().deleteById(document.getIndexId());
getSolrServer().add(document.getSolrInputDocument());
getSolrServer().commit();
What I noticed, though, is that the figures on the Solr server's stats page are not what I expect.
After the initial indexing, numDocs and maxDoc both equal 10, as expected. After I update the document, however, numDocs is still 10 (expected) but maxDoc is 11 (unexpected).
When reading the documentation I see that
maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index.
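To make the quoted sentence concrete, here is a toy model of the two counters. It is only an illustration of the bookkeeping and not Lucene's actual data structures; the class `ToyIndex` and its methods are hypothetical names invented for this sketch.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model only: it mimics how the counters move, not how Lucene stores segments.
final class ToyIndex {
    private final BitSet live = new BitSet();                 // one bit per allocated slot
    private final Map<String, Integer> slotById = new HashMap<>();
    private int maxDoc = 0;                                   // slots allocated so far

    /** Adds a document; an older document with the same id is only marked deleted. */
    void addOrReplace(String id) {
        Integer oldSlot = slotById.get(id);
        if (oldSlot != null) {
            live.clear(oldSlot);      // logical delete: the slot itself stays allocated
        }
        int slot = maxDoc++;
        live.set(slot);
        slotById.put(id, slot);
    }

    /** Live documents only. */
    int numDocs() { return live.cardinality(); }

    /** Live documents plus logically deleted ones. */
    int maxDoc() { return maxDoc; }

    /** Rewrites the index keeping only live documents, roughly what a full merge achieves. */
    void mergeAll() {
        int next = 0;
        for (Map.Entry<String, Integer> entry : slotById.entrySet()) {
            entry.setValue(next++);   // reassign compact slot numbers
        }
        live.clear();
        live.set(0, next);
        maxDoc = next;
    }
}
```

In this model, indexing 10 documents and then replacing one gives numDocs = 10 and maxDoc = 11, matching the stats page; after `mergeAll()` both are 10 again.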
So the question is: how do I remove logically deleted documents from the index?
If these documents still exist in the index, do I run the risk of performance penalties when this runs with a very large volume of documents?
Thanks :)
Comments (1)
You have to optimize your index.
Note that an optimize is expensive; you probably should not run it more than once a day.
Here is some more information on optimize:
http://www.lucidimagination.com/search/document/CDRG_ch06_6.3.1.3
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
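As a rough sketch of what triggering this could look like: Solr's XML update handler accepts an `<optimize/>` message, and, in versions that support it, `<commit expungeDeletes="true"/>` as a lighter option that merges only segments containing deletes. The example below posts those messages using just the JDK (Java 11+). The class name `SolrMaintenance`, its helper methods, and the URL `http://localhost:8983/solr/update` are assumptions for illustration; adjust the host, port, and core path to your installation. If you are already using SolrJ as in the question, calling `optimize()` on the same server object should have the same effect as the first message.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: sends maintenance messages to Solr's XML update handler.
final class SolrMaintenance {
    // Assumed endpoint of a default single-core install; change for your setup.
    static final String UPDATE_URL = "http://localhost:8983/solr/update";

    /** Full optimize: rewrites the index and drops all logically deleted documents. */
    static String optimizeMessage() {
        return "<optimize/>";
    }

    /** Commit that also merges away segments containing deletes (if supported). */
    static String expungeDeletesMessage() {
        return "<commit expungeDeletes=\"true\"/>";
    }

    /** Builds the POST request without sending it. */
    static HttpRequest buildRequest(String updateUrl, String xmlMessage) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xmlMessage))
                .build();
    }

    /** Sends the message and returns the HTTP status code. */
    static int send(String updateUrl, String xmlMessage) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(updateUrl, xmlMessage), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

For example, a nightly job could call `SolrMaintenance.send(SolrMaintenance.UPDATE_URL, SolrMaintenance.optimizeMessage())` during a quiet period, in line with the once-a-day advice above.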