Airflow Helm Chart Worker Node Error - CrashLoopBackOff

Posted 2025-01-10 14:36:06


I am using the official Helm chart for Airflow. Every pod works properly except the worker.

Even in that worker pod, two of the containers (git-sync and worker-log-groomer) work fine.

The error happens in the third container (worker), which goes into CrashLoopBackOff with exit code 137 (OOMKilled).

In my OpenShift console, memory usage shows at about 70%.

This error usually comes from a memory leak, but that doesn't seem to be the case here. Please help, I have been stuck on this for a week now.
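One way to double-check which container was actually OOM-killed, and how close its usage sits to the 1Gi limit, is a quick kubectl query. This is a sketch, assuming kubectl access to the cluster, the pod name airflow-worker-0 from this question, and (for the second command) a metrics-server installation:

```shell
# List each container's last termination reason (expect "OOMKilled" for worker).
kubectl get pod airflow-worker-0 \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'

# Compare live per-container memory usage against the 1Gi limit
# (requires metrics-server to be running in the cluster).
kubectl top pod airflow-worker-0 --containers
```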

Kubectl describe pod airflow-worker-0 ->

worker:
    Container ID:  <>
    Image:         <>
    Image ID:     <>
    Port:          <>
    Host Port:     <>
    Args:
      bash
      -c
      exec \
      airflow celery worker
    State:          Running
      Started:      <>
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      <>
      Finished:     <>
    Ready:          True
    Restart Count:  3
    Limits:
      ephemeral-storage:  30G
      memory:             1Gi
    Requests:
      cpu:                50m
      ephemeral-storage:  100M
      memory:             409Mi
    Environment:
      DUMB_INIT_SETSID:                        0
      AIRFLOW__CORE__FERNET_KEY:               <>                     Optional: false
    Mounts:
      <>
  git-sync:
    Container ID:   <>
    Image:          <>
    Image ID:       <>
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      <>
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  30G
      memory:             1Gi
    Requests:
      cpu:                50m
      ephemeral-storage:  100M
      memory:             409Mi
    Environment:
      GIT_SYNC_REV:                HEAD
    Mounts:
      <>
  worker-log-groomer:
    Container ID:  <>
    Image:         <>
    Image ID:      <>
    Port:          <none>
    Host Port:     <none>
    Args:
      bash
      /clean-logs
    State:          Running
      Started:      <>
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  30G
      memory:             1Gi
    Requests:
      cpu:                50m
      ephemeral-storage:  100M
      memory:             409Mi
    Environment:
      AIRFLOW__LOG_RETENTION_DAYS:  5
    Mounts:
      <>

I am pretty sure you know the answer. I've read all your articles on Airflow. Thank you :)
https://stackoverflow.com/users/1376561/marc-lamberti


Answer from 萌面超妹 (2025-01-17 14:36:06):


The issue occurs due to the limits placed under "resources" in the Helm chart's values.yaml for any of the pods.

By default it is:

resources: {}

but this causes an issue, as the pods can then consume unlimited memory as required.

By changing it to:

resources:
  limits:
    cpu: 200m
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 512Mi

the pod knows exactly how much it can request and access.
This solved my issue.
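For the official Airflow chart specifically, the worker containers are configured under a workers section of values.yaml. A minimal override might look like the sketch below; the 200m/2Gi figures are the answer's suggested values, not universal recommendations, and the exact key layout assumes a recent version of the apache-airflow chart:

```yaml
# values.yaml override for the official apache-airflow Helm chart
# (assumes a chart version with a top-level `workers` section).
workers:
  resources:
    limits:
      cpu: 200m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 512Mi
```

This would typically be applied with something like helm upgrade airflow apache-airflow/airflow -f values.yaml, where the release name and repo alias are assumptions here.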
