Configuring volumes in the Airflow GKEStartPodOperator

Posted 2025-02-01 03:47:57


I have a Google Cloud Composer environment. In my DAG I want to create a pod in GKE. When I deploy a simple app based on a Docker container that doesn't need any volume configuration or secrets, everything works fine, for example:

kubernetes_max = GKEStartPodOperator(
    # The ID specified for the task.
    task_id="python-simple-app",
    # Name of task you want to run, used to generate Pod ID.
    name="python-demo-app",
    project_id=PROJECT_ID,
    location=CLUSTER_REGION,
    cluster_name=CLUSTER_NAME,
    # Entrypoint of the container, if not specified the Docker container's
    # entrypoint is used. The cmds parameter is templated.
    cmds=["python", "app.py"],
    namespace="production",
    image="gcr.io/path/to/lab-python-job:latest",
)

But when I have an application that needs to access volumes in my GKE cluster, I need to configure volumes in my pod. The issue is that the documentation is not clear on this. The only example I have ever found is this:

volume = k8s.V1Volume(
    name='test-volume',
    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(claim_name='test-volume'),
)

Meanwhile, the volumes section in my manifest file (which I use to deploy my app locally) looks like this:

volumes:
  - name: volume-prod
    secret:
      secretName: volume-prod
      items:
        - key: config
          path: config.json
        - key: another_config
          path: another_config.conf
        - key: random-ca
          path: random-ca.pem

Here is how the two volumes compare in the console (the manifest file I deploy manually runs successfully, while the pod deployed via Cloud Composer fails):

  • The successful run - Manifest file:

    volume-prod
    Name: volume-prod
    Type: secret
    Source volume identifier: volume-prod

  • The failed run - Composer GKEStartPodOperator:

    volume-prod
    Name: volume-prod
    Type: emptyDir
    Source volume identifier: Node's default medium

How can I configure my pod from Cloud Composer so that it can read the volume in my cluster?

旧人 2025-02-08 03:47:57


The KubernetesPodOperator/GKEStartPodOperator is just a wrapper around the Python Kubernetes SDK. I agree that it isn't well documented in the Airflow/Cloud Composer documentation, but the Python SDK for Kubernetes itself is well documented.

Start with the Kubernetes Python SDK documentation for the pod spec: https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md

You'll notice that the arguments the KubernetesPodOperator/GKEStartPodOperator take match this spec. If you dig into the source code of the operators you'll see that each operator is nothing more than a builder that creates a kubernetes.client.models.V1Pod object and uses the API to deploy the pod.

The operator takes a volumes parameter, which should be of type List[V1Volume]; the documentation for V1Volume lives in the same docs folder of that repository.

So in your case you would need to provide:

from kubernetes.client import models as k8s

kubernetes_max = GKEStartPodOperator(
    # The ID specified for the task.
    task_id="python-simple-app",
    # Name of task you want to run, used to generate Pod ID.
    name="python-demo-app",
    project_id=PROJECT_ID,
    location=CLUSTER_REGION,
    cluster_name=CLUSTER_NAME,
    # Entrypoint of the container, if not specified the Docker container's
    # entrypoint is used. The cmds parameter is templated.
    cmds=["python", "app.py"],
    namespace="production",
    image="gcr.io/path/to/lab-python-job:latest",
    volumes=[
        k8s.V1Volume(
            name="volume-prod",
            secret=k8s.V1SecretVolumeSource(
                secret_name="volume-prod",
                items=[
                    k8s.V1KeyToPath(key="config", path="config.json"),
                    k8s.V1KeyToPath(key="another_config", path="another_config.conf"),
                    k8s.V1KeyToPath(key="random-ca", path="random-ca.pem"),
                ],
            )
        )
    ]
)

Alternatively, you can provide your manifest via the pod_template_file argument of GKEStartPodOperator - this file will need to be available to the workers inside Airflow.
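For reference, a pod_template_file is just a regular pod manifest. A minimal sketch reusing the volume from the question (the container name "base" and the mount path are assumptions, not from the original post):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: python-demo-app
  namespace: production
spec:
  containers:
    - name: base                       # assumed main-container name
      image: gcr.io/path/to/lab-python-job:latest
      command: ["python", "app.py"]
      volumeMounts:
        - name: volume-prod
          mountPath: /etc/app-config   # hypothetical mount path
          readOnly: true
  volumes:
    - name: volume-prod
      secret:
        secretName: volume-prod
```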

There are three ways to create pods in Airflow using this operator:

  1. Use the arguments of the operator to specify what you need and have the operator build the V1Pod for you.
  2. Provide a manifest by passing in the pod_template_file argument.
  3. Use the Kubernetes SDK to create a V1Pod object yourself and pass this to the full_pod_spec argument.