Airflow Helm Chart Worker Node Error - CrashLoopBackOff
I am using the official Helm chart for Airflow. Every pod works properly except the worker pod.
Even in that worker pod, two of the containers (git-sync and worker-log-groomer) work fine.
The error happens in the third container (worker), which goes into CrashLoopBackOff. The exit code is 137 with reason OOMKilled.
In my OpenShift console, memory usage shows as 70%.
This error usually comes from a memory leak, but that doesn't seem to be the case here. Please help, I have been stuck on this for a week now.
kubectl describe pod airflow-worker-0 ->
  worker:
    Container ID: <>
    Image: <>
    Image ID: <>
    Port: <>
    Host Port: <>
    Args:
      bash
      -c
      exec \
      airflow celery worker
    State: Running
      Started: <>
    Last State: Terminated
      Reason: OOMKilled
      Exit Code: 137
      Started: <>
      Finished: <>
    Ready: True
    Restart Count: 3
    Limits:
      ephemeral-storage: 30G
      memory: 1Gi
    Requests:
      cpu: 50m
      ephemeral-storage: 100M
      memory: 409Mi
    Environment:
      DUMB_INIT_SETSID: 0
      AIRFLOW__CORE__FERNET_KEY: <> Optional: false
    Mounts:
      <>
  git-sync:
    Container ID: <>
    Image: <>
    Image ID: <>
    Port: <none>
    Host Port: <none>
    State: Running
      Started: <>
    Ready: True
    Restart Count: 0
    Limits:
      ephemeral-storage: 30G
      memory: 1Gi
    Requests:
      cpu: 50m
      ephemeral-storage: 100M
      memory: 409Mi
    Environment:
      GIT_SYNC_REV: HEAD
    Mounts:
      <>
  worker-log-groomer:
    Container ID: <>
    Image: <>
    Image ID: <>
    Port: <none>
    Host Port: <none>
    Args:
      bash
      /clean-logs
    State: Running
      Started: <>
    Ready: True
    Restart Count: 0
    Limits:
      ephemeral-storage: 30G
      memory: 1Gi
    Requests:
      cpu: 50m
      ephemeral-storage: 100M
      memory: 409Mi
    Environment:
      AIRFLOW__LOG_RETENTION_DAYS: 5
    Mounts:
      <>
I am pretty sure you know the answer; I have read all your articles on Airflow. Thank you :)
https://stackoverflow.com/users/1376561/marc-lamberti
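For readers decoding the describe output in the question: exit code 137 is conventionally 128 plus a signal number, and a container is typically OOM-killed against its own memory limit (1Gi for the worker here), not against the cluster-wide usage figure. The sketch below is only an illustration; the helper names `parse_quantity` and `signal_from_exit_code` are hypothetical, it handles just the memory/storage suffixes that appear in this output (not CPU values such as `50m`), and it assumes a POSIX system for the signal lookup.

```python
import signal

# Binary (power-of-two) suffixes, as in "1Gi" or "409Mi".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
# Decimal (power-of-ten) suffixes, as in "30G" or "100M".
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(quantity: str) -> int:
    """Convert a memory/storage quantity string to bytes (hypothetical helper)."""
    # Check two-letter binary suffixes first so "Gi" is not mistaken for "G".
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def signal_from_exit_code(code: int) -> str:
    """Name the signal behind an exit code above 128 (hypothetical helper)."""
    if code <= 128:
        raise ValueError("exit codes of 128 or below do not encode a signal")
    return signal.Signals(code - 128).name


if __name__ == "__main__":
    print(signal_from_exit_code(137))  # SIGKILL
    print(parse_quantity("1Gi"))       # 1073741824
    print(parse_quantity("409Mi"))     # 428867584
```

So the worker was sent SIGKILL after its container exceeded roughly 1.07 GB, while it had only requested about 0.43 GB.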
Comments (1)
The issue is caused by the limit set in "resources" under the Helm chart's values.yaml for any of the pods.
By default it is -
but this causes a problem, because the pods can then use as much memory as they need, without any bound.
By changing it to -
the pod is told explicitly how much it can request and how much it is allowed to use.
This solved my issue.
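The two snippets the answer refers to are not present in this copy. As a rough sketch only, and not the answerer's actual values, an explicit setting might look like the fragment below. It assumes the official chart exposes worker resources under `workers.resources` in values.yaml; every number is a hypothetical placeholder to be sized against the worker's observed usage.

```yaml
# values.yaml (sketch; all numbers are hypothetical placeholders)
workers:
  resources:
    requests:
      cpu: 100m
      memory: 1Gi      # what the scheduler reserves for the worker container
    limits:
      memory: 2Gi      # exceeding this is what leads to an OOMKilled restart
```

If a metrics server is available, `kubectl top pod airflow-worker-0 --containers` can show how much memory the worker container actually uses, which helps in choosing a limit with some headroom above its peak.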