Google Cloud Composer Airflow SQLAlchemy OperationalError causes DAG to hang forever
I have a bunch of tasks within a Cloud Composer Airflow DAG, one of which is a KubernetesPodOperator. This task seems to get stuck in the scheduled state forever, and so the DAG runs continuously for 15 hours without finishing (it normally takes about an hour). I have to manually mark it failed for it to end.
I've set the DAG timeout to 2 hours but it does not make any difference.
The Cloud Composer logs show the following error:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server:
Connection refused
Is the server running on host "airflow-sqlproxy-service.default.svc.cluster.local" (10.7.124.107)
and accepting TCP/IP connections on port 3306?
The error log also gives me a link to this documentation about that error type: https://docs.sqlalchemy.org/en/13/errors.html#operationalerror
When the DAG is next triggered on schedule, it works fine without any fix required. This issue happens intermittently, and we have not been able to reproduce it.
Does anyone know the cause of this error and how to fix it?
Answer:
The issue is related to how SQLAlchemy manages sessions: a session is scoped to a thread, and Airflow creates a callable session object that is reused later in the Airflow code. If there is a long enough delay between queries in the same session, MySQL might close the idle connection. The connection timeout is set to approximately 10 minutes.
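To make the idle-timeout explanation concrete, here is a minimal, self-contained simulation in plain Python. It does not use SQLAlchemy or Airflow: FakeClock, IdleTimeoutConnection, one_long_session and short_session are hypothetical names, and the 600-second timeout is an assumption based on the roughly 10-minute figure above. It only illustrates why a connection held open across a long pause can fail while a fresh, short-lived connection per query does not.

```python
# Hypothetical simulation: a "server" that drops connections idle for longer
# than IDLE_TIMEOUT seconds. A fake clock is used so nothing actually sleeps.

IDLE_TIMEOUT = 10 * 60  # assumed, based on the ~10-minute timeout above


class FakeClock:
    """Manually advanced clock, standing in for wall-clock time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class IdleTimeoutConnection:
    """Stand-in for a DB connection that the server drops when idle too long."""

    def __init__(self, clock: FakeClock, idle_timeout: float = IDLE_TIMEOUT) -> None:
        self._clock = clock
        self._idle_timeout = idle_timeout
        self._last_used = clock()
        self.closed = False

    def query(self, sql: str) -> str:
        if self.closed:
            raise RuntimeError("connection already closed by the client")
        if self._clock() - self._last_used > self._idle_timeout:
            # The server has already dropped this idle connection.
            raise ConnectionError("server closed the idle connection")
        self._last_used = self._clock()
        return f"result of {sql}"

    def close(self) -> None:
        self.closed = True


def one_long_session(clock: FakeClock) -> str:
    """Reuses one connection across a long pause: the failing pattern."""
    conn = IdleTimeoutConnection(clock)
    conn.query("SELECT 1")
    clock.advance(15 * 60)  # long gap between queries
    return conn.query("SELECT 2")  # raises ConnectionError


def short_session(clock: FakeClock, sql: str) -> str:
    """Opens a fresh connection for a single query and always closes it."""
    conn = IdleTimeoutConnection(clock)
    try:
        return conn.query(sql)
    finally:
        conn.close()


if __name__ == "__main__":
    clock = FakeClock()
    try:
        one_long_session(clock)
    except ConnectionError as exc:
        print("long-lived session failed:", exc)
    short_session(clock, "SELECT 1")
    clock.advance(15 * 60)
    print(short_session(clock, "SELECT 2"))  # new connection, so no idle timeout
```

The simulated 15-minute gap exceeds the assumed idle limit, so only the pattern that keeps a single connection open across it fails.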
Solutions:
- Use the airflow.utils.db.provide_session decorator, which provides a valid session to the Airflow database in the session parameter and closes the session at the end of the function.
- Split the queries into separate functions, so that there are multiple functions with the airflow.utils.db.provide_session decorator. In this case, sessions are automatically closed after retrieving query results.
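The decorator-based approach can be sketched without Airflow. This is not Airflow's implementation: it uses the standard-library sqlite3 module as a stand-in for the Airflow metadata database, and create_session, provide_session, count_rows and server_version here are illustrative re-creations. Airflow's real provide_session (airflow.utils.db in 1.10, airflow.utils.session in Airflow 2) also detects a session passed positionally, which this sketch does not.

```python
import contextlib
import functools
import sqlite3
from typing import Iterator

DB_PATH = ":memory:"  # hypothetical stand-in for the Airflow metadata database


@contextlib.contextmanager
def create_session() -> Iterator[sqlite3.Connection]:
    """Open a connection, commit on success, roll back on error, always close."""
    session = sqlite3.connect(DB_PATH)
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def provide_session(func):
    """Inject a short-lived session unless the caller passed one by keyword."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if kwargs.get("session") is not None:
            # Caller owns this session; do not open or close anything.
            return func(*args, **kwargs)
        with create_session() as session:
            kwargs["session"] = session
            return func(*args, **kwargs)

    return wrapper


# Each query lives in its own decorated function, so each call gets its own
# session, which is closed as soon as the result has been retrieved.
@provide_session
def count_rows(session=None) -> int:
    session.execute("CREATE TABLE IF NOT EXISTS task (id INTEGER)")
    return session.execute("SELECT COUNT(*) FROM task").fetchone()[0]


@provide_session
def server_version(session=None) -> str:
    return session.execute("SELECT sqlite_version()").fetchone()[0]
```

Because count_rows and server_version each open and close their own session, a long pause between calling them does not leave an idle connection open for the server to drop.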