Airflow scheduler does not start after Google Composer upgrade
Good morning,
After upgrading Google Composer to version 1.18 and Apache Airflow to version 1.10.15 (using Composer's auto-upgrade), the scheduler does not seem to be able to start.
Airflow message: "The scheduler does not appear to be running. Last heartbeat was received 1 day ago. The DAGs list may not update, and new tasks will not be scheduled."
After getting this, I tried:
Restarting the web server:
gcloud beta composer environments restart-web-server
Restarting the Airflow scheduler:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
I looked at the pod's details:
kubectl describe pod airflow-scheduler
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 23 Feb 2022 15:59:13 +0000
Finished: Wed, 23 Feb 2022 16:04:09 +0000
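The `Last State` output only says that the container exited with code 1, not why. A minimal sketch for pulling the output of the crashed container instance is shown here. It assumes `kubectl` already points at the environment's GKE cluster. The function name `previous_logs` and its pod, container, and namespace arguments are placeholders for illustration, not values taken from this post.

```shell
# Hypothetical helper: print the kubectl command that fetches logs from the
# previous (crashed) instance of a container, or execute it when RUN=1.
# Arguments: pod name, container name, namespace (defaults to "default").
previous_logs() {
  pod="$1"; container="$2"; namespace="${3:-default}"
  # Rebuild the positional parameters as the command to run.
  set -- kubectl logs "$pod" -c "$container" -n "$namespace" --previous --tail=200
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}
```

Calling it without `RUN=1` only prints the command, which makes it easy to check the pod and container names before running anything against the cluster.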
So I deleted the pod and waited for it to be recreated on its own:
kubectl delete pod airflow-scheduler-...
EDIT 1: The logs from the pod:
Dags and plugins are not synced yet
EDIT 2: Additional logs:
Building synchronization state...
Starting synchronization...
Copying gs://europe-west1-********-bucket/dags/sql/...
Skipping attempt to download to filename ending with slash
(/home/airflow/gcs/dags/sql/). This typically happens when using
gsutil to download from a subdirectory created by the Cloud Console
(https://cloud.google.com/console)
/ [0/1 files][ 0.0 B/ 11.0 B] 0% Done InvalidUrl Error: Invalid destination path: /home/airflow/gcs/dags/sql/
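The error above points at an object whose name ends in a slash (`dags/sql/`), which is how folders created in the Cloud Console are stored. One way to look for such placeholder objects is to filter a long listing of the bucket. The sketch below assumes the usual three-column `gsutil ls -l` line format (size, timestamp, URL) and object names without spaces. `placeholder_objects` and `BUCKET` are hypothetical names.

```shell
# Hypothetical filter: read `gsutil ls -l` output on stdin and print the URLs
# of objects whose name ends in "/" (folder placeholder objects).
# Only lines of the form "<size> <timestamp> <url>" are considered, so the
# trailing TOTAL line and bare prefix lines are ignored.
placeholder_objects() {
  awk 'NF == 3 && $1 ~ /^[0-9]+$/ && $3 ~ /^gs:\/\/.*\/$/ { print $3 }'
}

# Example against a real bucket (BUCKET is a placeholder; not executed here):
#   gsutil ls -l "gs://BUCKET/dags/**" | placeholder_objects
```

The filter matches on the object name and not on size, since the placeholder in the log above is reported as 11 B and not as an empty object.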
But the pod keeps restarting on its own and sometimes shows CrashLoopBackOff, which indicates that a container is repeatedly crashing after restarting.
Not sure what else I could do :/.
Thanks for the help :)
Comments (2)
The problem you are facing appears to be that resources are hitting their limits, which prevents the scheduler from starting.
My assumptions are that this could be happening:
[...] get killed; can you remove them to check whether that stops the crash loop?
[...] start the scheduler job; you can add resources to this.
[...] thing that you could do with this is to restart the scheduler on your own, by using ssh to connect to the instance.
In our DAGs folder (inside the bucket) we have another folder with all the SQL files that are triggered by the different BigQuery operators. For some reason the synchronisation of that folder was not being done correctly, so after deleting the folder and adding it again the workers were up again.
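The delete-and-re-add step could be sketched with `gsutil` as follows. This is an illustration under assumptions, not the exact commands used: `recreate_dag_subfolder` and `run` are hypothetical helpers, the bucket name is a placeholder, and the local directory is assumed to have the same name as the folder under `dags/`. By default the commands are only printed; setting `RUN=1` executes them.

```shell
# Hypothetical dry-run wrapper: echo the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical helper: delete dags/<folder> in the bucket, then upload the
# local directory of the same name again.
# Arguments: bucket name, path to the local copy of the folder.
recreate_dag_subfolder() {
  bucket="$1"; local_dir="$2"
  folder=$(basename "$local_dir")
  run gsutil -m rm -r "gs://$bucket/dags/$folder" &&
  run gsutil -m cp -r "$local_dir" "gs://$bucket/dags/"
}
```

For example, `recreate_dag_subfolder BUCKET ./sql` prints the two commands for review. Uploading with `gsutil cp -r` writes only the files themselves, so it is worth listing the bucket afterwards to confirm that no object named `dags/sql/` remains.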