Airflow worker pods on Kubernetes crash immediately

Hi guys, I'm in trouble with this one.

I set up Airflow on Kubernetes infrastructure.

I'm using AWS EKS and AWS EFS (for the persistent volume).

Airflow: 2.2.3-python3.8
Kubernetes: 1.21

airflow uid: 50000, gid: 0

I followed this blog to deploy the infrastructure.
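As a sanity check, the uid/gid the containers actually run with can be confirmed by exec-ing into a running container (the deployment and container names below match the manifests further down; the expected output is an assumption based on the image defaults, not captured from my cluster):

# check the runtime user of the webserver container
kubectl exec -n airflow deploy/airflow -c webserver -- id
# expected: uid=50000(airflow) gid=0(root) groups=0(root)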

My Dockerfile

#  Licensed to the Apache Software Foundation (ASF) under one   *
#  or more contributor license agreements.  See the NOTICE file *
#  distributed with this work for additional information        *
#  regarding copyright ownership.  The ASF licenses this file   *
#  to you under the Apache License, Version 2.0 (the            *
#  "License"); you may not use this file except in compliance   *
#  with the License.  You may obtain a copy of the License at   *
#                                                               *
#    http://www.apache.org/licenses/LICENSE-2.0                 *
#                                                               *
#  Unless required by applicable law or agreed to in writing,   *
#  software distributed under the License is distributed on an  *
#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY       *
#  KIND, either express or implied.  See the License for the    *
#  specific language governing permissions and limitations      *
#  under the License.                                           *

FROM apache/airflow:2.2.3-python3.8

# install deps (usermod and apt-get need root, so switch users first;
# the base image defaults to the non-root airflow user)
USER root
RUN usermod -g 0 airflow
RUN apt-get update -y && apt-get install -y \
    libczmq-dev \
    libssl-dev \
    inetutils-telnet \
    python3-dev \
    build-essential \
    postgresql postgresql-contrib \
    bind9utils \
    gcc \
    git \
    && apt-get clean

# vim
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
         vim \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

USER airflow
RUN pip install --upgrade pip
COPY requirement.txt /tmp/requirement.txt

RUN pip install -r /tmp/requirement.txt


COPY airflow-test-env-init.sh /tmp/airflow-test-env-init.sh

COPY bootstrap.sh /bootstrap.sh

ENTRYPOINT ["/bootstrap.sh"]

My Kubernetes manifests (airflow.cfg itself is mounted into the containers from the airflow-configmap ConfigMap referenced below)

#  Licensed to the Apache Software Foundation (ASF) under one   *
#  or more contributor license agreements.  See the NOTICE file *
#  distributed with this work for additional information        *
#  regarding copyright ownership.  The ASF licenses this file   *
#  to you under the Apache License, Version 2.0 (the            *
#  "License"); you may not use this file except in compliance   *
#  with the License.  You may obtain a copy of the License at   *
#                                                               *
#    http://www.apache.org/licenses/LICENSE-2.0                 *
#                                                               *
#  Unless required by applicable law or agreed to in writing,   *
#  software distributed under the License is distributed on an  *
#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY       *
#  KIND, either express or implied.  See the License for the    *
#  specific language governing permissions and limitations      *
#  under the License.                                           *

# Note: The airflow image used in this example is obtained by   *
# building the image from the local docker subdirectory.        *
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow
  namespace: airflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: airflow
  name: airflow
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: [ "" ]
    resources: [ "pods/log" ]
    verbs: [ "get", "list" ]
  - apiGroups: [ "" ]
    resources: [ "pods/exec" ]
    verbs: [ "create", "get" ]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow
  namespace: airflow
subjects:
  - kind: ServiceAccount
    name: airflow # Name of the ServiceAccount
    namespace: airflow
roleRef:
  kind: Role # This must be Role or ClusterRole
  name: airflow # This must match the name of the Role
                #   or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      name: airflow
  template:
    metadata:
      labels:
        name: airflow
    spec:
      serviceAccountName: airflow
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: lifecycle
                operator: NotIn
                values:
                - Ec2Spot
      containers:
      - name: webserver
        image: {{AIRFLOW_IMAGE}}:{{AIRFLOW_TAG}}
        imagePullPolicy: Always
        ports:
        - name: webserver
          containerPort: 8080
        args: ["webserver"]
        env:
        - name: AIRFLOW_KUBE_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: SQL_ALCHEMY_CONN
          valueFrom:
            secretKeyRef:
              name: airflow-secrets
              key: sql_alchemy_conn
        volumeMounts:
        - name: airflow-configmap
          mountPath: /opt/airflow/airflow.cfg
          subPath: airflow.cfg
        - name: {{POD_AIRFLOW_VOLUME_NAME}}
          mountPath: /opt/airflow/dags
        - name: {{POD_AIRFLOW_VOLUME_NAME}}
          mountPath: /opt/airflow/logs
      - name: scheduler
        image: {{AIRFLOW_IMAGE}}:{{AIRFLOW_TAG}}
        imagePullPolicy: Always
        args: ["scheduler"]
        env:
        - name: AIRFLOW_KUBE_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: SQL_ALCHEMY_CONN
          valueFrom:
            secretKeyRef:
              name: airflow-secrets
              key: sql_alchemy_conn
        volumeMounts:
        - name: airflow-configmap
          mountPath: /opt/airflow/airflow.cfg
          subPath: airflow.cfg
        - name: {{POD_AIRFLOW_VOLUME_NAME}}
          mountPath: /opt/airflow/dags
        - name: {{POD_AIRFLOW_VOLUME_NAME}}
          mountPath: /opt/airflow/logs
      - name: git-sync
        image: k8s.gcr.io/git-sync/git-sync:v3.4.0
        imagePullPolicy: IfNotPresent
        envFrom:
          - configMapRef:
              name: airflow-gitsync
          - secretRef:
              name: airflow-secrets
        volumeMounts:
          - name: {{POD_AIRFLOW_VOLUME_NAME}}
            mountPath: /git
      volumes:
      - name: {{POD_AIRFLOW_VOLUME_NAME}}
        persistentVolumeClaim:
          claimName: airflow-efs-pvc
      - name: airflow-dags-fake
        emptyDir: {}
      - name: airflow-configmap
        configMap:
          name: airflow-configmap
      securityContext:
        runAsUser: 50000
        fsGroup: 0
---
apiVersion: v1
kind: Service
metadata:
  name: airflow
  namespace: airflow
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: {{AOK_SSL_ENDPOINT}}
spec:
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30031
      name: http
    - protocol: TCP
      port: 443
      targetPort: 8080
      nodePort: 30032
      name: https
  selector:
    name: airflow
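I haven't pasted my airflow.cfg here; for context, the [kubernetes] section that the KubernetesExecutor reads in Airflow 2.2 typically looks like the sketch below (repository/tag values are placeholders, not my exact config). Setting delete_worker_pods = False can be handy while debugging, since finished worker pods then stick around for inspection:

[core]
executor = KubernetesExecutor

[kubernetes]
namespace = airflow
in_cluster = True
worker_container_repository = xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/my-repo
worker_container_tag = latest
delete_worker_pods = False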

Get logs and describe pods

NAME                                                               READY   STATUS             RESTARTS   AGE
airflow-bfd79c998-d5gjf                                            3/3     Running            0          2m14s
examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa    0/1     CrashLoopBackOff   1          15s
examplebashoperatorrunme0.9fd08bc8182a4bb7ad3d41cbb57942ff         0/1     CrashLoopBackOff   1          17s
examplebashoperatorrunme1.20e9bd925aaf4b4eb7645ad181267a8f         0/1     CrashLoopBackOff   1          17s
examplebashoperatorrunme2.58fb15f683184e83b4e714bd0e27ccb8         0/1     CrashLoopBackOff   1          16s
examplebashoperatorthiswillskip.71370cbbaa324a21915d73f4e07dc307   0/1     CrashLoopBackOff   1          13s

kubectl logs -n airflow -f examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa --previous
unable to retrieve container logs for docker://b81e5ea6ffa99d21b62b46500a865fbc7bfb6560683f8d8bfba4786ea02f361a
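Since --previous can't recover the logs once the old container is gone, a few other ways one might try to catch the output (same pod name as above):

# watch the live container before it exits again
kubectl logs -n airflow examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa -f
# dump the full pod object, including restartPolicy and containerStatuses
kubectl get pod -n airflow examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa -o yaml
# list the events for just this pod
kubectl get events -n airflow --field-selector involvedObject.name=examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa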

kubectl describe pod examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa -n airflow

Name:         examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa
Namespace:    airflow
Priority:     0
Node:         ip-xxx.xxx.xxx.xxx.my-region.compute.internal/xxx.xxx.xxx.xxx
Start Time:   Tue, 22 Feb 2022 22:22:27 +0900
Labels:       airflow-worker=144
              airflow_version=2.2.3
              dag_id=example_bash_operator
              kubernetes_executor=True
              run_id=manual__2022-02-22T132224.6817590000-81c9256fb
              task_id=also_run_this
              try_number=1
Annotations:  dag_id: example_bash_operator
              kubernetes.io/psp: eks.privileged
              run_id: manual__2022-02-22T13:22:24.681759+00:00
              task_id: also_run_this
              try_number: 1
Status:       Running
IP:           xxx.xxx.xxx.xxx
IPs:
  IP:  xxx.xxx.xxx.xxx
Containers:
  base:
    Container ID:  docker://f2e0648c4a6a585b753529964d4bc26bc5c5c061e4c74a9c9e71aab00b1505e0
    Image:         xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/my-repo:latest
    Image ID:      docker-pullable://xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo@xxxxxxxxxxxx
    Port:          <none>
    Host Port:     <none>
    Args:
      airflow
      tasks
      run
      example_bash_operator
      also_run_this
      manual__2022-02-22T13:22:24.681759+00:00
      --local
      --subdir
      DAGS_FOLDER/example_bash_operator.py
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 22 Feb 2022 22:23:54 +0900
      Finished:     Tue, 22 Feb 2022 22:23:54 +0900
    Ready:          False
    Restart Count:  4
    Environment:
      AIRFLOW_IS_K8S_EXECUTOR_POD:  True
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bh4kp (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-bh4kp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m7s                default-scheduler  Successfully assigned airflow/examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa to ip-xxx.xxx.xxx.xx.my-region.compute.internal
  Normal   Pulled     2m5s                kubelet            Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 94.764374ms
  Normal   Pulled     2m4s                kubelet            Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 93.874971ms
  Normal   Pulled     108s                kubelet            Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 106.66327ms
  Normal   Created    81s (x4 over 2m5s)  kubelet            Created container base
  Normal   Started    81s (x4 over 2m5s)  kubelet            Started container base
  Normal   Pulled     81s                 kubelet            Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 82.336875ms
  Warning  BackOff    54s (x7 over 2m3s)  kubelet            Back-off restarting failed container
  Normal   Pulling    40s (x5 over 2m5s)  kubelet            Pulling image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest"
  Normal   Pulled     40s                 kubelet            Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 91.959453ms
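One detail worth pulling out of the spec, given that the container above terminates with Reason: Completed and Exit Code: 0 yet still ends up in CrashLoopBackOff, is the pod's restart policy:

# a pod whose container exits 0 will still be restarted if this is Always
kubectl get pod -n airflow examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa -o jsonpath='{.spec.restartPolicy}'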

All the steps seem to complete correctly (I think), but when I trigger the DAG (manually or on a schedule), the worker pods crash immediately and don't write any log files (in the persistent volume I set up).
[Screenshot: AWS console error message showing the crash loop backoff]

[Screenshot: kubectl get pods -n airflow output]

I need help... please somebody help me out of this hell...
