Elasticsearch documents being deleted during bulk insert
I have the following situation:
Logstash with a JDBC input connector and an Elasticsearch output connector.
The data is loaded initially into ES, meaning the indices being filled do not exist prior to the Logstash load.
Logstash creates the indices based on mapping templates.
Logstash and ES are both on version 7.17.0.
This scenario worked perfectly fine until recently.
Issue
The indexing works perfectly fine until around 1 million documents. Then indexing of new documents slows down rapidly, and the number of deleted documents oscillates wildly: it increases a lot and drops frequently.
I am aware that documents are marked as deleted as part of an update operation, and this was already the case before. But back then the deleted-documents count mostly increased and did not oscillate to that extent.
Before the issue occurred, the situation at the end of the load was around 8 million docs.count and around 3 million docs.deleted. At the moment the number stays at around 1.2 million.
The scenario is deployed on two environments, and only one of them shows this behavior. Both have the same source and Logstash configuration. The environments differ only in their ES instance, and both instances are configured identically.
The disk space is fine and there are no errors in the ES logs.
I can also confirm that Logstash is running fine and is constantly sending bulk requests to ES.
There is another thread in which someone mentioned that ES drops documents in overload scenarios. I could not find any proof of that behavior, but it would at least be a good explanation for what I am experiencing:
Documents are automatically getting deleted in Elasticsearch after insertion
Time series of the metrics
TIME 1230:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myindex LLC8qxSWTWyO1U25Olljlg 1 1 1182676 166056 642.3mb 642.3mb
TIME 1240:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myindex LLC8qxSWTWyO1U25Olljlg 1 1 1182676 533339 946.9mb 946.9mb
TIME 1300:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myindex LLC8qxSWTWyO1U25Olljlg 1 1 1182676 349747 701.9mb 701.9mb
TIME 1400:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myindex LLC8qxSWTWyO1U25Olljlg 1 1 1182678 467651 1gb 1gb
TIME 1430:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myindex LLC8qxSWTWyO1U25Olljlg 1 1 1182678 693906 1gb 1gb
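The oscillation in `docs.deleted` above can be tracked programmatically by parsing the whitespace-delimited `_cat/indices?v` output. A minimal sketch (the sample row is copied from the 12:40 snapshot; the parsing helper is my own, not part of any ES client library):

```python
def parse_cat_indices(header_line, data_line):
    """Map `_cat/indices?v` header column names to the values of one row."""
    headers = header_line.split()
    values = data_line.split()
    return dict(zip(headers, values))

header = ("health status index uuid pri rep "
          "docs.count docs.deleted store.size pri.store.size")
row = ("yellow open myindex LLC8qxSWTWyO1U25Olljlg 1 1 "
       "1182676 533339 946.9mb 946.9mb")

stats = parse_cat_indices(header, row)
print(stats["docs.count"], stats["docs.deleted"])  # 1182676 533339
```

Polling the endpoint like this at a fixed interval and logging `docs.count` and `docs.deleted` makes the oscillation pattern easy to plot.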
Comments (1)
Actually, after analyzing Logstash in detail, we realized that Logstash is not updating its metadata for some reason, and hence it sends the same data in an endless loop.
ES works perfectly fine, and this also explains the oscillating deleted-documents count: re-indexing the same document IDs marks the old versions as deleted, and segment merges periodically reclaim them.
It seems to be a network issue, which we need to investigate further.
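For context, the metadata in question is the tracking value (`:sql_last_value`) that the Logstash JDBC input persists to the file named by `last_run_metadata_path` after each run; if that write fails, the value never advances and the same window of rows is fetched again on every schedule tick. A sketch of the relevant settings (connection details are hypothetical, `use_column_value`, `tracking_column`, and `last_run_metadata_path` are real options of the `jdbc` input plugin):

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db:5432/mydb"  # hypothetical
    jdbc_user => "logstash"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "* * * * *"
    statement => "SELECT * FROM events WHERE id > :sql_last_value ORDER BY id"
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    # Logstash must be able to write this file after each run;
    # if the update fails, :sql_last_value never advances and
    # the same rows are re-sent indefinitely.
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run"
  }
}
```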