GKE Django MySQL not accessible during rolling update
I have a Django application deployed in GKE (set up following this tutorial).
My configuration file: myapp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp-app
        image: gcr.io/myproject/myapp
        imagePullPolicy: IfNotPresent
        # ---------
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
settings.py
import os  # needed for os.environ / os.getenv below

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': os.environ['DATABASE_NAME'],
        'USER': os.environ['DATABASE_USER'],
        'PASSWORD': os.environ['DATABASE_PASSWORD'],
        'HOST': '127.0.0.1',  # the cloudsql-proxy sidecar listens here
        'PORT': os.getenv('DATABASE_PORT', '3306'),
    }
}
Now, when I trigger a rolling update via
kubectl rollout restart deployment myapp
or
kubectl apply -f myapp.yaml
kubectl get pods
shows the following state:
NAME                     READY   STATUS        RESTARTS   AGE
myapp-8477898cff-5wztr   2/2     Terminating   0          88s
myapp-8477898cff-ndt5b   2/2     Terminating   0          85s
myapp-8477898cff-qxzsh   2/2     Terminating   0          82s
myapp-97d6ccfc4-4qmpj    2/2     Running       0          6s
myapp-97d6ccfc4-vr6mb    2/2     Running       0          4s
myapp-97d6ccfc4-xw294    2/2     Running       0          7s
I get the following error for some time during the rollout:
OperationalError at /
(2003, "Can't connect to MySQL server on '127.0.0.1' (111)")
Please advise how I can adjust the settings so the rollout happens without downtime / this error.
UPD
I have figured out by looking at the logs that this happens because cloudsql-proxy is brought down first while the application container is still alive.
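For reference, the two logs below can be pulled per container with kubectl (the container names are the ones from the Deployment above):

kubectl logs deployment/myapp -c myapp-app
kubectl logs deployment/myapp -c cloudsql-proxy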
Application log:
Found 3 pods, using pod/myapp-f59c686b5-6t7c4
[2022-02-27 17:39:55 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2022-02-27 17:39:55 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[2022-02-27 17:39:55 +0000] [7] [INFO] Using worker: sync
[2022-02-27 17:39:55 +0000] [10] [INFO] Booting worker with pid: 10
Internal Server Error: /api/health/ # here cloudsql-proxy died
Internal Server Error: /api/health/
Internal Server Error: /api/health/
.... here more messages of Internal Server Error ...
rpc error: code = NotFound desc = an error occurred when try to find container "ec7658770c772eff6efb544a502fcd1841d7401add6efb2b53bf264b8eca1bb6": not found
cloudsql-proxy log:
2022/02/28 08:17:58 New connection for "myapp:europe-north1:myapp"
2022/02/28 08:17:58 Client closed local connection on 127.0.0.1:3306
2022/02/28 08:17:58 Client closed local connection on 127.0.0.1:3306
2022/02/28 08:17:59 Received TERM signal. Waiting up to 0s before terminating.
So I guess the solution should be to enforce an order during shutdown: somehow shut down the application before shutting down cloudsql-proxy when the pod is updated. A sketch of one way to do that is below.
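A common pattern for this (an assumption on my part, not something from the tutorial) is a preStop hook on the sidecar: Kubernetes runs a container's preStop hook before sending that container SIGTERM, so a sleep there keeps the proxy alive while the app container drains. A minimal sketch against the Deployment above, assuming the proxy image ships a sleep binary (distroless image variants do not):

    spec:
      # must exceed the preStop delay, or the pod is killed mid-sleep
      terminationGracePeriodSeconds: 30
      containers:
      - name: myapp-app
        image: gcr.io/myproject/myapp
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.16
        lifecycle:
          preStop:
            exec:
              # delay the proxy's SIGTERM so the app can finish in-flight requests
              command: ["sleep", "20"]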
Comments (1)
It looks like the sidecar for the proxy is terminating, and not letting you clean up before the application does.
Consider using the
-term_timeout
flag to give yourself some time: https://github.com/GoogleCloudPlatform/cloudsql-proxy#-term_timeout30s
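For example (a sketch based on the Deployment in the question; 30s is an arbitrary value), add the flag to the proxy's command and keep the pod's terminationGracePeriodSeconds at least as long:

        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json",
                  "-term_timeout=30s"]

With -term_timeout set, the proxy waits up to that long after SIGTERM for existing connections to close instead of exiting immediately, so the "Waiting up to 0s before terminating" line in the log above becomes "Waiting up to 30s before terminating".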