Difference between a K8s Job and a Pod when using hostname + subdomain

Posted on 2025-01-19 16:18:01


I have Kubernetes, used with Helm 3.

  1. I need to access a K8s Job while it is running (the Job is defined in a YAML file created by Helm).

The kubectl version:

Client Version: version.Info{Major:"1", Minor:"21",
GitVersion:"v1.21.6",
GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097",
GitTreeState:"clean", BuildDate:"2021-10-27T17:50:34Z",
GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"} Server
Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6",
GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097",
GitTreeState:"clean", BuildDate:"2021-10-27T17:44:26Z",
GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

Helm version:

version.BuildInfo{Version:"v3.3.4",
GitCommit:"a61ce5633af99708171414353ed49547cf05013d",
GitTreeState:"clean", GoVersion:"go1.14.9"}

Following the Kubernetes documentation on DNS for Services and Pods (https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/):

It works fine for a Pod, but not for a Job.

As explained there, I put the hostname and subdomain in the Pod's YAML file and added a Service that holds the domain...

  2. I need to check whether it is running.

For a Pod, there is a Ready condition:

kubectl wait pod/pod-name --for=condition=ready ...

For a Job there is no Ready condition (even while the Pod behind it is running).

How can I check the state of the Pod behind the Job (i.e. that the Job is running), and how can I use hostname + subdomain with a Job?

My code is below (I removed some security tags, but it is otherwise the same; it may look a bit complicated).

I create a listener Job that runs and listens, plus a tester Job that needs to run a curl command against it - which only works if the tester can reach the Pod behind the listener Job.

Listener (its Pod is the one the tester Job at the end needs to reach):

What I added are the hostname and subdomain fields (which work for a Pod, but not for a Job). If this were a plain Pod - no problem.

I also realized that the name of the Pod (created by the Job) gets an automatically appended hash suffix.
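(For reference: the Pods a Job creates carry a job-name label, so they can be found and waited on without knowing the hash suffix. A minimal sketch, assuming the rendered Job name is my-project-listener:)

kubectl get pods -l job-name=my-project-listener
kubectl wait pod -l job-name=my-project-listener --for=condition=ready --timeout=120s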

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "my-project.fullname" . }}-listener
  namespace: {{ .Release.Namespace }}
  labels:
    name: {{ include "my-project.fullname" . }}-listener
    app: {{ include "my-project.fullname" . }}-listener
    component: {{ .Chart.Name }}
    subcomponent: {{ .Chart.Name }}-listener
  annotations:
    "prometheus.io/scrape": {{ .Values.prometheus.scrape | quote }}
    "prometheus.io/path": {{ .Values.prometheus.path }}
    "prometheus.io/port": {{ .Values.ports.api.container | quote }}
spec:
  template: #PodTemplateSpec (Core/V1)
    spec: #PodSpec (core/v1)
      hostname: {{ include "my-project.fullname" . }}-listener
      subdomain: {{ include "my-project.fullname" . }}-listener-dmn
      initContainers:
        # appears twice - could be factored out into helpers.tpl
        - name: wait-mysql-exist-pod
          image: {{ .Values.global.registry }}/{{ .Values.global.k8s.image }}:{{ .Values.global.k8s.tag | default "latest" }}
          imagePullPolicy: IfNotPresent
          env:
            - name: MYSQL_POD_NAME
              value: {{ .Release.Name }}-mysql
            - name: COMPONENT_NAME
              value: {{ .Values.global.mysql.database.name }}
          command:
            - /bin/sh
          args:
            - -c
            - |-
              while [ "$(kubectl get pod $MYSQL_POD_NAME 2>/dev/null | grep $MYSQL_POD_NAME | awk '{print $1;}')" \!= "$MYSQL_POD_NAME" ];do
                echo 'Waiting for mysql pod to be existed...';
                sleep 5;
              done
        - name: wait-mysql-ready
          image: {{ .Values.global.registry }}/{{ .Values.global.k8s.image }}:{{ .Values.global.k8s.tag | default "latest" }}
          imagePullPolicy: IfNotPresent
          env:
            - name: MYSQL_POD_NAME
              value: {{ .Release.Name }}-mysql
          command:
            - kubectl
          args:
            - wait
            - pod/$(MYSQL_POD_NAME)
            - --for=condition=ready
            - --timeout=120s
        - name: wait-mysql-has-db
          image: {{ .Values.global.registry }}/{{ .Values.global.k8s.image }}:{{ .Values.global.k8s.tag | default "latest" }}
          imagePullPolicy: IfNotPresent
          env:
            {{- include "k8s.db.env" . | nindent 12 }}
            - name: MYSQL_POD_NAME
              value: {{ .Release.Name }}-mysql
          command:
            - /bin/sh
          args:
            - -c
            - |-
             while [ "$(kubectl exec $MYSQL_POD_NAME -- mysql -uroot -p$MYSQL_ROOT_PASSWORD -e 'show databases' 2>/dev/null | grep $MYSQL_DATABASE | awk '{print $1;}')" \!= "$MYSQL_DATABASE" ]; do
                echo 'Waiting for mysql database up...';
                sleep 5;
             done
      containers:
        - name: {{ include "my-project.fullname" . }}-listener
          image:  {{ .Values.global.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag | default "latest" }}
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
          {{- include "k8s.db.env" . | nindent 12 }}
            - name: SCHEDULER_DB
              value: $(CONNECTION_STRING)
          command: {{- toYaml .Values.image.entrypoint | nindent 12 }}
          args: # some args ...
          ports:
            - name: api
              containerPort: 8081
          resources:
            limits:
              cpu: 1
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 50Mi
          readinessProbe:
            httpGet:
              path: /api/scheduler/healthcheck
              port: api
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 1
          livenessProbe:
            tcpSocket:
              port: api
            initialDelaySeconds: 120
            periodSeconds: 10
            timeoutSeconds: 5
          volumeMounts:
            - name: {{ include "my-project.fullname" . }}-volume
              mountPath: /etc/test/scheduler.yaml
              subPath: scheduler.yaml
              readOnly: true
      volumes:
      - name: {{ include "my-project.fullname" . }}-volume
        configMap:
          name: {{ include "my-project.fullname" . }}-config
      restartPolicy: Never

The service (for the subdomain):

apiVersion: v1
kind: Service
metadata:
  name: {{ include "my-project.fullname" . }}-listener-dmn
spec:
  selector:
    name: {{ include "my-project.fullname" . }}-listener
  ports:
    - name: api
      port: 8081
      targetPort: 8081
  type: ClusterIP

Role + RoleBinding (to enable access for the kubectl and curl commands):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "my-project.fullname" . }}-role
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list", "update"]
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods/exec"]
  verbs: ["create", "delete", "deletecollection", "get", "list", "patch", "update", "watch"]
- apiGroups: ["", "app", "batch"] # "" indicates the core API group
  resources: ["jobs"]
  verbs: ["get", "watch", "list"]

Role-Binding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "go-scheduler.fullname" . }}-rolebinding
subjects:
- kind: ServiceAccount
  name: default
roleRef:
  kind: Role
  name: {{ include "go-scheduler.fullname" . }}-role
  apiGroup: rbac.authorization.k8s.io

And finally, a tester Job that runs the curl command:

(For the check I put tail -f instead, and exec into the Pod.)

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "my-project.fullname" . }}-test
  namespace: {{ .Release.Namespace }}
  labels:
    name: {{ include "my-project.fullname" . }}-test
    app: {{ include "my-project.fullname" . }}-test 
  annotations:
    "prometheus.io/scrape": {{ .Values.prometheus.scrape | quote }}
    "prometheus.io/path": {{ .Values.prometheus.path }}
    "prometheus.io/port": {{ .Values.ports.api.container | quote }}
spec:
  template: #PodTemplateSpec (Core/V1)
    spec: #PodSpec (core/v1)
      initContainers:
        # appears twice - could be factored out into helpers.tpl
        #
        - name: wait-sched-listener-exists
          image: {{ .Values.global.registry }}/{{ .Values.global.k8s.image }}:{{ .Values.global.k8s.tag | default "latest" }}
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              value: {{ include "my-project.fullname" . }}-listener
          command:
            - /bin/sh
          args:
            - -c
            - |-
              while [ "$(kubectl get job $POD_NAME 2>/dev/null | grep $POD_NAME | awk '{print $1;}')" \!= "$POD_NAME" ];do
                echo 'Waiting for scheduler pod to exist ...';
                sleep 5;
              done
        - name: wait-listener-running
          image: {{ .Values.global.registry }}/{{ .Values.global.k8s.image }}:{{ .Values.global.k8s.tag | default "latest" }}
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              value: {{ include "my-project.fullname" . }}-listener
          command:
            - /bin/sh
          args:
            - -c
            - |-
              while [ "$(kubectl get pods 2>/dev/null | grep $POD_NAME | awk '{print $3;}')" \!= "Running" ];do
                echo 'Waiting for scheduler pod to run ...';
                sleep 5;
              done
      containers:
        - name: {{ include "my-project.fullname" . }}-test
          image:  {{ .Values.global.registry }}/{{ .Values.global.k8s.image }}:{{ .Values.global.k8s.tag | default "latest" }}
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command:
            - /bin/sh
          args:
            - -c
            - "tail -f"
     # instead of the above, this could be: "curl -H 'Accept: application/json' -X GET my-project-listener.my-project-listener-dmn:8081/api/scheduler/jobs"

      restartPolicy: Never

I exec into the test Pod

kubectl exec -it my-tester-<hash> -- /bin/sh

... and run the command:

ping my-project-listener.my-project-listener-dmn

Got:

ping: bad address 'my-project-listener.my-project-listener-dmn'

When doing the same for a plain Pod, I get:

PING pod-hostname.pod-subdomain (): ... data bytes


Answer by 无语 (2025-01-26 16:18:01):


There's a lot here, but I think you should be able to resolve all of this with a couple of small changes.

In summary, I'd suggest changing:

apiVersion: apps/v1
kind: Deployment     # <-- not a Job
metadata: &original-job-metadata-from-the-question
spec:
  template:
    metadata:
      labels:   # vvv matching the Service selector
        name: {{ include "my-project.fullname" . }}-listener
    spec:
      # delete all of the initContainers:
      containers: &original-container-list-from-the-question
      volumes: &original-volume-list-from-the-question
      # delete restartPolicy: (default value Always)

Delete the Role and RoleBinding objects; connect to the Service http://my-project-listener-dmn:8081 and not an individual Pod; and you can kubectl wait --for=condition=available on the Deployment.

Connect to Services, not individual Pods (or Jobs or Deployments). The Service is named {{ include "my-project.fullname" . }}-listener-dmn and that is the host name you should connect to. The Service acts as a very lightweight in-cluster load balancer, and will forward requests on to one of the pods identified by its selector.

So in this example you'd connect to the Service's name and port, http://my-project-listener-dmn:8081. Your application doesn't answer the very-low-level ICMP protocol and I'd avoid ping(1) in favor of a more useful diagnostic. Also consider setting the Service's port to the default HTTP port 80; it doesn't necessarily need to match the Pod's port.
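For example, the tester could make a plain HTTP request to the Service name instead of pinging; a sketch, assuming the rendered Service name is my-project-listener-dmn and reusing the /api/scheduler/jobs path from the question:

curl -H 'Accept: application/json' http://my-project-listener-dmn:8081/api/scheduler/jobs
# or, if the Service port were changed to the default 80 (targetPort still 8081):
curl -H 'Accept: application/json' http://my-project-listener-dmn/api/scheduler/jobs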

The Service selector needs to match the Pod labels (and not the Job's or Deployment's labels). A Service attaches to Pods; a Job or a Deployment has a template to create Pods; and it's those labels that need to match up. You need to add labels to the Pod template:

spec:
  template:
    metadata:
      labels:
        name: {{ include "my-project.fullname" . }}-listener

Or, in a Helm chart where you have a helper to generate these labels,

      labels: {{- include "my-project.labels" . | nindent 8 }}

The thing to check here is kubectl describe service my-project-listener-dmn. There should be a line at the bottom that says Endpoints: with some IP addresses (technically some individual Pod IP addresses, but you don't usually need to know that). If it says Endpoints: <none> that's usually a sign that the labels don't match up.
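A quick way to run that check, assuming the rendered names from the question:

kubectl describe service my-project-listener-dmn   # look at the Endpoints: line
kubectl get endpoints my-project-listener-dmn
kubectl get pods -l name=my-project-listener --show-labels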

You probably want some level of automatic restarts. A Pod can fail for lots of reasons, including code bugs and network hiccups. If you set restartPolicy: Never then you'll have a Failed Pod, and requests to the Service will fail until you take manual intervention of some sort. I'd suggest setting this to at least restartPolicy: OnFailure, or (for a Deployment) leaving it at its default value of Always. (There is more discussion on Job restart policies in the Kubernetes documentation.)
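In the Job's Pod template that would look like the sketch below (a Deployment does not need the field at all, since Always is its default and the only value it accepts):

spec:
  template:
    spec:
      # Job Pods may use Never or OnFailure; Deployment Pods only allow Always
      restartPolicy: OnFailure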

You probably want a Deployment here. A Job is meant for a case where you do some set of batch processing and then the job completes; that's part of why kubectl wait doesn't have the lifecycle option you're looking for.
I'm guessing you want a Deployment instead. With what you've shown here I don't think you need to make any changes at all besides

apiVersion: apps/v1
kind: Deployment
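One caveat: unlike a Job, an apps/v1 Deployment also requires spec.selector, and it has to match the Pod template labels. A minimal sketch of the conversion, reusing the names from the question and the placeholder-anchor convention from the summary above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-project.fullname" . }}-listener
spec:
  replicas: 1
  selector:
    matchLabels:
      name: {{ include "my-project.fullname" . }}-listener
  template:
    metadata:
      labels:
        name: {{ include "my-project.fullname" . }}-listener
    spec:
      # containers: and volumes: exactly as in the original Job's pod spec
      containers: &original-container-list-from-the-question
      volumes: &original-volume-list-from-the-question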

Everything so far about Services and DNS and labels still applies.

You can kubectl wait for a Deployment to be available. Since a Job is expected to run to completion and exit, that's the state kubectl wait allows. A Deployment is "available" if there is at least a minimum number of managed Pods running that pass their health checks, which I think is the state you're after.

kubectl wait --for=condition=available deployment/my-project-listener
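If the tester still has to block until the listener is up, that one command (with a timeout) could replace both polling initContainers in the tester Job; a sketch, assuming the Deployment keeps the -listener name:

kubectl wait --for=condition=available --timeout=300s deployment/my-project-listener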

There are simpler ways to check for database liveness. A huge fraction of what you show here is an involved sequence with special permissions to see if the database is running before the pod starts up.

What happens if the database fails while the pod is running? One common thing that will happen is you'll get a cascading sequence of exceptions and your pod will crash. Then with restartPolicy: Always Kubernetes will try to restart it; but if the database still isn't available, it will crash again; and you'll get to a CrashLoopBackOff state. If the database does become available again then eventually Kubernetes will try to restart the Pod and it will succeed.

This same logic can apply at startup time. If the Pod tries to start up, and the database isn't ready yet, and it crashes, Kubernetes will by default restart it, adding some delays after the first couple of attempts. If the database starts up within 30 seconds or so then the application will be up within a minute or so. The restart count will be greater than 0, but kubectl logs --previous will hopefully have a clear exception.

This will let you delete about half of what you show here. Delete all of the initContainers: block; then, since you're not doing any Kubernetes API operations, delete the Role and RoleBinding objects too.

If you really do want to force the Pod to wait for the database and treat startup as a special case, I'd suggest a simpler shell script using the mysql client tool, or even the wait-for script that makes basic TCP calls (the mechanism described in Docker Compose wait for container X before starting Y). This still lets you avoid all of the Kubernetes RBAC setup.
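For instance, a single unprivileged initContainer along these lines would do the database wait without any RBAC; a minimal sketch, assuming the MySQL Service is reachable as {{ .Release.Name }}-mysql on port 3306 and that the chart's k8s.db.env helper provides MYSQL_ROOT_PASSWORD and MYSQL_DATABASE as it does elsewhere in the question:

initContainers:
  - name: wait-mysql
    image: mysql:8.0   # assumption: any image that ships the mysql client would do
    env:
      # adjust the nindent value to wherever this sits in the pod spec
      {{- include "k8s.db.env" . | nindent 6 }}
    command:
      - /bin/sh
      - -c
      - |
        until mysql -h {{ .Release.Name }}-mysql -P 3306 -uroot -p"$MYSQL_ROOT_PASSWORD" \
              -e 'SELECT 1' "$MYSQL_DATABASE" >/dev/null 2>&1; do
          echo 'Waiting for mysql database to come up...'
          sleep 5
        done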
