GKE Django MySQL not accessible during rolling update
I have a Django application deployed in GKE (set up following this tutorial).
My configuration file: myapp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp-app
        image: gcr.io/myproject/myapp
        imagePullPolicy: IfNotPresent
        # ---------
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
settings.py
import os  # needed for os.environ / os.getenv below

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': os.environ['DATABASE_NAME'],
        'USER': os.environ['DATABASE_USER'],
        'PASSWORD': os.environ['DATABASE_PASSWORD'],
        'HOST': '127.0.0.1',  # the cloudsql-proxy sidecar listens here
        'PORT': os.getenv('DATABASE_PORT', '3306'),
    }
}
Now, when I trigger a rolling update via
kubectl rollout restart deployment myapp
or
kubectl apply -f myapp.yaml
kubectl get pods
shows the following state:
NAME                     READY   STATUS        RESTARTS   AGE
myapp-8477898cff-5wztr   2/2     Terminating   0          88s
myapp-8477898cff-ndt5b   2/2     Terminating   0          85s
myapp-8477898cff-qxzsh   2/2     Terminating   0          82s
myapp-97d6ccfc4-4qmpj    2/2     Running       0          6s
myapp-97d6ccfc4-vr6mb    2/2     Running       0          4s
myapp-97d6ccfc4-xw294    2/2     Running       0          7s
I get the following error for some time during the rollout:
OperationalError at /
(2003, "Can't connect to MySQL server on '127.0.0.1' (111)")
Please advise how I can adjust the settings so the rollout happens without downtime / this error.
UPD
I have figured out by looking at the logs that this happens because cloudsql-proxy is brought down first while the application container is still alive.
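For reference, the two logs below can be pulled per container with kubectl (the container names are the ones from the Deployment above):

kubectl logs deployment/myapp -c myapp-app
kubectl logs deployment/myapp -c cloudsql-proxy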
Application log:
Found 3 pods, using pod/myapp-f59c686b5-6t7c4
[2022-02-27 17:39:55 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2022-02-27 17:39:55 +0000] [7] [INFO] Listening at: http://0.0.0.0:8080 (7)
[2022-02-27 17:39:55 +0000] [7] [INFO] Using worker: sync
[2022-02-27 17:39:55 +0000] [10] [INFO] Booting worker with pid: 10
Internal Server Error: /api/health/ # here cloudsql-proxy died
Internal Server Error: /api/health/
Internal Server Error: /api/health/
.... here more messages of Internal Server Error ...
rpc error: code = NotFound desc = an error occurred when try to find container "ec7658770c772eff6efb544a502fcd1841d7401add6efb2b53bf264b8eca1bb6": not found
cloudsql-proxy log:
2022/02/28 08:17:58 New connection for "myapp:europe-north1:myapp"
2022/02/28 08:17:58 Client closed local connection on 127.0.0.1:3306
2022/02/28 08:17:58 Client closed local connection on 127.0.0.1:3306
2022/02/28 08:17:59 Received TERM signal. Waiting up to 0s before terminating.
So I guess the solution should be to enforce an order during shutdown: somehow shut down the application before shutting down cloudsql-proxy when the pod is updated. A sketch of one way to do that is below.
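A common pattern for this (an assumption on my part, not something from the tutorial) is a preStop hook on the sidecar: Kubernetes runs a container's preStop hook before sending that container SIGTERM, so a sleep there keeps the proxy alive while the app container drains. A minimal sketch against the Deployment above, assuming the proxy image ships a sleep binary (distroless image variants do not):

    spec:
      # must exceed the preStop delay, or the pod is killed mid-sleep
      terminationGracePeriodSeconds: 30
      containers:
      - name: myapp-app
        image: gcr.io/myproject/myapp
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.16
        lifecycle:
          preStop:
            exec:
              # delay the proxy's SIGTERM so the app can finish in-flight requests
              command: ["sleep", "20"]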
Comments (1)
It looks like the sidecar for the proxy is terminating, and not letting you clean up before the application does.
Consider using the
-term_timeout
flag to give yourself some time: https://github.com/GoogleCloudPlatform/cloudsql-proxy#-term_timeout30s
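For example (a sketch based on the Deployment in the question; 30s is an arbitrary value), add the flag to the proxy's command and keep the pod's terminationGracePeriodSeconds at least as long:

        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=myproject:europe-north1:myapp=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json",
                  "-term_timeout=30s"]

With -term_timeout set, the proxy waits up to that long after SIGTERM for existing connections to close instead of exiting immediately, so the "Waiting up to 0s before terminating" line in the log above becomes "Waiting up to 30s before terminating".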