当Fluentbit在重载下输出日志时,OpenSearch抛出429误差
使用以下Fluentbit配置,我们将在重负载下从Opensearch遇到错误。
fluentbit config:
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
DB /var/log/flb_kube.db
Mem_Buf_Limit 400M
storage.type filesystem
Skip_Long_Lines On
Refresh_Interval 1
Rotate_Wait 600
[OUTPUT]
Name es
Match kube.*
Host ${ES_HOST}
Port ${PORT}
Buffer_Size False
AWS_Auth Off
AWS_Role_ARN ${ES_ARN}
AWS_External_ID ${ES_IAMROLE}
HTTP_User ${ES_USER}
HTTP_Passwd ${ES_PASSWD}
tls On
tls.verify Off
Trace_Output ${TRACE_OUTPUT}
Trace_Error On
Replace_Dots On
Index fluentbit
Type flb
AWS_Region ${AWS_REGION}
Logstash_Format On
Logstash_Prefix ${ES_LOGSTASHPREFIX}_app_log
Logstash_DateFormat %Y.%m.%d
Retry_Limit 10
storage.total_limit_size 1G
为了解决此问题,我们已将OpenSearch实例类型从r5.xlarge.search(4个节点)升级到r5.2xlarge.search.search(3个节点),但这也没有解决该问题。 我们还将ES索引Refresh_interval提高到60年代,但这无济于事。
我们读到可以通过缓冲控制向ES输出到ES,因此我们将MEM_BUF_LIMIT降低到400m,这无济于事。
如果可以尝试任何其他事情,或者我们缺少某些东西,有人可以帮忙吗?
With the below fluentbit configuration we are getting errors from opensearch under heavy load.
Http bulk requests to opensearch by fluentbit(respresenting 429 errors as spike)
Fluentbit config:
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
DB /var/log/flb_kube.db
Mem_Buf_Limit 400M
storage.type filesystem
Skip_Long_Lines On
Refresh_Interval 1
Rotate_Wait 600
[OUTPUT]
Name es
Match kube.*
Host ${ES_HOST}
Port ${PORT}
Buffer_Size False
AWS_Auth Off
AWS_Role_ARN ${ES_ARN}
AWS_External_ID ${ES_IAMROLE}
HTTP_User ${ES_USER}
HTTP_Passwd ${ES_PASSWD}
tls On
tls.verify Off
Trace_Output ${TRACE_OUTPUT}
Trace_Error On
Replace_Dots On
Index fluentbit
Type flb
AWS_Region ${AWS_REGION}
Logstash_Format On
Logstash_Prefix ${ES_LOGSTASHPREFIX}_app_log
Logstash_DateFormat %Y.%m.%d
Retry_Limit 10
storage.total_limit_size 1G
For resolving this we have upgraded our opensearch instance type from r5.xlarge.search(4 nodes) to r5.2xlarge.search(3 nodes) but that also didn't solve the issue.
We have also increased the ES index refresh_interval to 60s but that didn't help.
We read that output to ES from fluentbit can be controlled via buffering so we decreased Mem_Buf_Limit to 400M and it didn't help.
Can someone help if can try any other things or we are missing something.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这里的问题不是Fluentbit,而是OpenSearch/Elasticsearch。
ES中发生的HTTP 429错误(es_request_rejected_exception)发生时,当发送到群集的请求太多,而不是线程池可以处理的内容。搜索不同任务的线程池以不同的方式分配了不同的任务。手动修改线程池分配的选项不适合5.1版和更高版本。
您可以尝试通过几种方式解决此问题。
1:刷新率(您已经做到了,没有帮助)。
2:更改索引速度。尝试以大于当前的间隔发送日志。
3:高档(您做到了,也没有起作用)
您可以使用以下线程池的以下公式获得一个想法。
分配给Writes的线程池=虚拟CPU的数量(您的情况)
分配的搜索线程池数量=((3 *虚拟CPU的数量)/2) + 1
,所以我想您的问题是大量的碎片!您可以减少每个索引的碎片,或者如果只有额外负载时只有偶尔遇到此问题,则可以将复制品计数更改为0,并且在周期完成后,将其更改回原始。
检查这两个链接,以了解有关优化ES域的更多信息。
索引性能
最佳实践< /a>
The issue here is not that of fluentbit but is of opensearch/elasticsearch.
The HTTP 429 errors (es_request_rejected_exception) in ES occur when too many requests are sent to the cluster, than what the thread pool for it can handle. The thread pool in OpenSearch for different tasks are allocated differently with search operations getting a larger share. The option to manually modify thread pool allocation is not available for versions 5.1 and later.
You can try to resolve this by few ways.
1: Refresh rate (you already did that and it didn't help).
2: Change the indexing speed. Try to send logs with an interval greater than your current.
3: Upscale (you did and it didn't work either)
You can get an idea with the following formula for thread pools.
Number of thread pools allocated for writes = Number of Virtual CPUs (your case)
Number of thread pools allocated for search = ((3 * Number of virtual CPUs)/2) + 1
So, I am guessing your issue here is a big number of shards! You can either decrease the shards for each index or if you are having this issue only once in a while when there is extra load, you can change the replica count to 0 and when the period is finished, change it back to the original.
Check these two links to find out more about optimizing your ES domain.
indexing performance
Best practices