How do I gracefully terminate a Lucene NRT reader/writer process?
We are using Lucene's near real-time (NRT) search feature for full-text search in our application. Since commits are costly, suppose we commit to the index after every 10 added documents (we expect around 150 to 200 documents per hour). If I want to terminate the process, how do I make sure that all documents in memory are committed to disk before it is killed? Is there a recommended approach? Or is my document volume too low to worry about, so that I should simply commit on every addition?
Should I track all uncommitted documents? If the process gets killed before they are committed to disk, should I index them again when the process starts up?
Lucene NRT is used in a process that runs embedded Jetty. Is it the right approach to send a shutdown command to Jetty (by invoking some servlet), wait until all documents are committed, and then terminate using System.exit()?
Comments (1)
You could commit all buffered documents in the destroy method of your servlet, and make sure that the embedded servlet container is shut down before System.exit is called (for example by adding a shutdown hook to the JVM).
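As a rough illustration of that idea, here is a minimal sketch that uses only the JDK. The class CommitOnShutdown and its FlushAction interface are hypothetical names made up for this example, not part of Lucene or Jetty. In a real application the action would presumably call IndexWriter.commit() followed by IndexWriter.close(). The guard lets both the servlet's destroy method and a JVM shutdown hook call it without committing twice.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Runs a flush action at most once, whether triggered by destroy() or by a JVM shutdown hook. */
final class CommitOnShutdown {
    /** Hypothetical stand-in for the real work, e.g. committing and closing the index writer. */
    interface FlushAction {
        void flush() throws Exception;
    }

    private final FlushAction action;
    private final AtomicBoolean done = new AtomicBoolean(false);

    CommitOnShutdown(FlushAction action) {
        this.action = action;
    }

    /** Only the first call runs the action; returns true if this call was the one that ran it. */
    boolean flushOnce() {
        if (!done.compareAndSet(false, true)) {
            return false;
        }
        try {
            action.flush();
        } catch (Exception e) {
            // At shutdown there is little left to do except report the failure.
            System.err.println("Flush at shutdown failed: " + e);
        }
        return true;
    }

    /** Registers a JVM shutdown hook; it runs on System.exit() and SIGTERM, but not on SIGKILL. */
    Thread registerShutdownHook() {
        Thread hook = new Thread(this::flushOnce, "commit-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

One possible wiring: create a single CommitOnShutdown at startup, call registerShutdownHook() once, and call flushOnce() from the servlet's destroy method as well, so whichever path runs first does the commit.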
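But this is still not perfect: if the process is killed abruptly (for example with kill -9), the shutdown path never runs and all buffered documents are lost. Another option is soft commits, which are cheap because no fsync is performed. Note that "soft commit" is Solr terminology; in plain Lucene the closest equivalent is reopening the NRT reader, which makes documents searchable but does not make them durable. Only IndexWriter.commit() guarantees that documents survive a process kill or an unexpected server shutdown.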
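Regarding the question of tracking uncommitted documents: one way to survive an abrupt kill is a small journal of pending document ids outside the index. The sketch below is only an assumption about how that could look. PendingJournal and its method names are hypothetical, and it uses nothing but java.nio. The idea is to record an id before adding the document, clear the journal after each successful commit, and re-index whatever is still listed at startup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Keeps the ids of documents that were added to the index but not yet committed. */
final class PendingJournal {
    private final Path file;

    PendingJournal(Path file) {
        this.file = file;
    }

    /** Record a document id before handing the document to the index. */
    void recordPending(String docId) throws IOException {
        byte[] line = (docId + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(file, line,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Call right after a successful index commit: nothing is pending any more. */
    void markCommitted() throws IOException {
        Files.write(file, new byte[0],
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    /** Call at startup: returns ids that were recorded but never marked as committed. */
    List<String> readPending() throws IOException {
        List<String> ids = new ArrayList<>();
        if (!Files.exists(file)) {
            return ids;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                ids.add(line);
            }
        }
        return ids;
    }
}
```

A kill between the index commit and markCommitted() would leave already-committed ids in the journal, so re-indexing at startup should be idempotent, for instance by using IndexWriter.updateDocument with a unique id term rather than addDocument. At 150 to 200 documents per hour the extra file write per document is unlikely to matter.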
To sum up: