Is it mandatory to optimize the Lucene index after writing?
Currently I am calling the optimize method of the IndexWriter after the write completes. Since my data set is huge, optimizing the index takes a long time (and needs more space, about 2*actual size). I am very concerned about this because many documents are added to the index frequently.
So:
- Is it OK to turn off optimize?
- What are the performance implications, e.g. how much slower is querying when the index is not optimized?
Cheers
Comments (2)
The Lucene FAQ says:
You know your data best, so I would suggest you run some tests to measure how fast your queries run with and without the optimize step.
According to the javadocs, "in environments with frequent updates, optimize is best done during low volume times, if at all". You should only optimize when necessary. If only 5% of your documents have changed since the last optimize, then it is not necessary, so get a feel for how frequently your documents change. Maybe you can optimize less often, say once every few hours or once a day.
Also take a look at this thread, in which they advise against calling optimize at all in an environment whose indices are constantly updated, and suggest setting a low mergeFactor instead.
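To make the "only optimize when necessary" advice concrete, here is a minimal sketch in plain Java of a change-tracking policy. The class OptimizePolicy and its methods are hypothetical and not part of Lucene. It only decides when an optimize might be worthwhile, and the 5% threshold is taken from the rule of thumb above rather than from any measurement.

```java
/**
 * Hypothetical helper (not a Lucene class): tracks how many documents have
 * changed since the last optimize and reports whether the changed fraction
 * exceeds a configurable threshold.
 */
final class OptimizePolicy {
    // Fraction of the index that must have changed before optimizing, e.g. 0.05 for 5%.
    private final double minChangedFraction;
    // Documents added, updated, or deleted since the last optimize.
    private long changedSinceLastOptimize = 0;

    OptimizePolicy(double minChangedFraction) {
        if (minChangedFraction < 0.0 || minChangedFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        this.minChangedFraction = minChangedFraction;
    }

    /** Call after each batch of writes with the number of documents touched. */
    void recordChanges(long count) {
        changedSinceLastOptimize += count;
    }

    /** True only if strictly more than the threshold fraction of the index changed. */
    boolean shouldOptimize(long totalDocs) {
        if (totalDocs <= 0) {
            return false; // nothing to optimize in an empty index
        }
        return (double) changedSinceLastOptimize / totalDocs > minChangedFraction;
    }

    /** Call after the application has actually run the optimize step. */
    void markOptimized() {
        changedSinceLastOptimize = 0;
    }
}
```

In the indexing code, the existing unconditional call to the IndexWriter's optimize method would be wrapped in a check of shouldOptimize(...), ideally during a low-volume period, followed by markOptimized(). Whether 5% is the right cut-off for a given index is something only the with/without-optimize query timings can tell you.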