Is it mandatory to optimize a Lucene index after writing?

Posted on 2024-09-27 05:36:33

Currently I am calling the optimize method of the IndexWriter after the write completes. Since my data set is huge, optimizing the index takes a long time and needs more disk space (about 2 * the actual index size). I am very concerned about this because many documents are added to the index frequently.

So

  1. Is it OK to turn off optimize?
  2. What are the performance implications? For example, how much slower is querying when the index is not optimized?

Cheers

Comments (2)

情魔剑神 2024-10-04 05:36:33

The Lucene FAQ says:

What is index optimization and when should I use it?

The IndexWriter class supports an optimize() method that compacts the index database and speeds up queries. You may want to use this method after performing a complete indexing of your document set or after incremental updates of the index. If your incremental update adds documents frequently, you want to perform the optimization only once in a while to avoid the extra overhead of the optimization.

If I decide not to optimize the index, when will the deleted documents actually get deleted?

Documents that are deleted are marked as deleted. However, the space they consume in the index does not get reclaimed until the index is optimized. That space will also eventually be reclaimed as more documents are added to the index, even if the index does not get optimized.

蔚蓝源自深海 2024-10-04 05:36:33

You know your data best so I would suggest you perform some tests to measure how fast your queries run with and without the optimize step.
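
A rough way to run such a comparison is a small timing harness like the sketch below. It is plain Java and does not call Lucene; `QueryTimer` and `medianNanos` are hypothetical names, and the `Runnable` is assumed to wrap whatever search call you run against the optimized and unoptimized copies of the index.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): times a task and reports the median.
final class QueryTimer {

    /** Runs the task `warmup` times untimed, then `runs` times timed; returns the median in nanoseconds. */
    static long medianNanos(Runnable task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Warm-up iterations let the JIT and OS caches settle before measuring.
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        // The median is less sensitive to GC pauses than the mean.
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with a search against each copy of the index.
        Runnable placeholderQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) sum += i;
            if (sum < 0) System.out.println("unreachable");
        };
        System.out.println("median ns: " + medianNanos(placeholderQuery, 20, 101));
    }
}
```

Use the same queries and a realistic query mix for both measurements, otherwise the numbers are not comparable.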

According to the javadocs, "in environments with frequent updates, optimize is best done during low volume times, if at all". You should only optimize when necessary. If only 5% of your documents have changed since the last optimize, it is not necessary, so get a feel for how frequently your documents change. Maybe you can optimize less often, say once every few hours or once a day.
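
The "only when necessary" rule can be sketched as a simple threshold check. This is an illustration only: `OptimizePolicy`, `shouldOptimize`, and the threshold value are assumptions, not Lucene APIs, and your application would have to count changed documents itself.

```java
// Hypothetical helper (not part of Lucene): decides whether enough of the
// index has changed since the last optimize to justify running it again.
final class OptimizePolicy {

    /**
     * @param changedDocs documents added, updated, or deleted since the last optimize
     * @param totalDocs   documents currently in the index
     * @param threshold   fraction of changed documents that triggers an optimize (assumed value, tune it)
     */
    static boolean shouldOptimize(long changedDocs, long totalDocs, double threshold) {
        if (totalDocs <= 0) {
            return false; // empty index: nothing to optimize
        }
        return (double) changedDocs / totalDocs >= threshold;
    }

    public static void main(String[] args) {
        // 5% changed with a 20% threshold: skip the optimize.
        System.out.println(shouldOptimize(5_000, 100_000, 0.20));
        // 30% changed: worth optimizing, ideally during a low-traffic window.
        System.out.println(shouldOptimize(30_000, 100_000, 0.20));
    }
}
```

A scheduled job could call this check once every few hours or once a day and run the optimize only when it returns true.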

Also take a look at this thread, in which they advise against calling optimize at all in an environment whose indices are constantly updated, and recommend setting a low mergeFactor instead.
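
To see why a low mergeFactor helps, here is a deliberately simplified model of level-based merging. It is not Lucene's actual merge policy code: it assumes every flush writes one small segment and that whenever mergeFactor segments sit at the same level they are merged into one segment at the next level. `MergeFactorModel` and `segmentsAfter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not Lucene's implementation) of how
// mergeFactor bounds the number of segments a search has to visit.
final class MergeFactorModel {

    /** Returns how many segments remain after `flushes` flushes under the model above. */
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> perLevel = new ArrayList<>(); // perLevel.get(k) = segments at level k
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (perLevel.size() <= level) {
                    perLevel.add(0);
                }
                int count = perLevel.get(level) + 1;
                if (count < mergeFactor) {
                    perLevel.set(level, count);
                    break;
                }
                // mergeFactor segments at this level: merge them into one at the next level.
                perLevel.set(level, 0);
                level++;
            }
        }
        int total = 0;
        for (int count : perLevel) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(segmentsAfter(999, 10)); // prints 27
        System.out.println(segmentsAfter(999, 2));  // prints 8
    }
}
```

In this model a lower mergeFactor leaves fewer segments on disk at any time, so searches touch fewer segments without a full optimize. The price is more merge work during indexing, which is the trade-off to measure on your own data.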
