MWAA - Airflow - PythonVirtualenvOperator requires virtualenv
I am using AWS's MWAA service (2.2.2) to run a variety of DAGs, most of which are implemented with standard PythonOperator types. I bundle the DAGs into an S3 bucket alongside any shared requirements, then point MWAA to the relevant objects & versions. Everything runs smoothly so far.
I would now like to implement a DAG using the PythonVirtualenvOperator type, which AWS acknowledge is not supported out of the box. I am following their guide on how to patch the behaviour using a custom plugin, but continue to receive an error from Airflow, shown at the top of the dashboard in big red writing:
DAG Import Errors (1)
... ...
AirflowException: PythonVirtualenvOperator requires virtualenv, please install it.
I've confirmed that the plugin is indeed being picked up by Airflow (I see it referenced in the admin screen), and for the avoidance of doubt I am using the exact code provided by AWS in their examples for the DAG. AWS's documentation on this is pretty light and I've yet to stumble across any community discussion for the same.
From AWS's docs, we'd expect the plugin to run at startup, before any DAGs are processed. The plugin itself appears to rewrite the venv command to use the pip-installed version rather than the one installed on the machine. However, I've struggled to verify that things are happening in the order I expect. Any pointers on debugging the instance's behavior would be very much appreciated.
Has anyone faced a similar issue? Is there a gap in the MWAA documentation that needs addressing? Am I missing something incredibly obvious?
Possibly related: I do see this warning in the scheduler's logs, which may indicate why MWAA is struggling to resolve the dependency:
WARNING: The script virtualenv is installed in '/usr/local/airflow/.local/bin' which is not on PATH.
Comments (2)
Airflow uses shutil.which to look for virtualenv. The virtualenv installed via requirements.txt isn't on the PATH, so the lookup fails. Adding the directory that contains virtualenv to PATH solves this.
The doc here is wrong: https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html
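As a rough illustration of the fix described above, here is a minimal sketch. It assumes the directory reported in the scheduler warning from the question (/usr/local/airflow/.local/bin) is where virtualenv was installed, and that the code runs before DAGs are parsed, for example at import time in a custom plugin module. The helper names ensure_on_path and find_virtualenv are hypothetical and not part of Airflow or MWAA.

```python
import os
import shutil

# Assumption: directory taken from the scheduler warning quoted in the question.
USER_BIN = "/usr/local/airflow/.local/bin"


def ensure_on_path(directory, env=os.environ):
    """Append `directory` to env["PATH"] if it is missing.

    Returns True when PATH was modified, False when it was already present.
    """
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory in entries:
        return False
    env["PATH"] = os.pathsep.join(entries + [directory])
    return True


def find_virtualenv(env=os.environ):
    """Look up the executable with shutil.which, the call the answer says Airflow uses."""
    return shutil.which("virtualenv", path=env.get("PATH", ""))


# Hypothetical usage at plugin import time:
#   ensure_on_path(USER_BIN)
#   print(find_virtualenv())  # None means the operator would still fail
```

Logging the result of find_virtualenv() before and after the PATH change is one way to check the ordering the question asks about: if it still returns None once the plugin has loaded, either the directory is wrong or the plugin ran in a different process than the one parsing the DAGs.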
I wanted to make this a comment on the accepted answer, but I don't have enough reputation yet.
If you're on the newest version of MWAA (using Airflow 2.4.3), then you'll need to change this line:
to this:
to account for the new version of Python.
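The two lines this answer refers to are not reproduced here. If the difference between them is only a Python version hardcoded in a path under /usr/local/airflow/.local (an assumption, not confirmed by the answer), a sketch like the following derives the version from the running interpreter instead, so the line would not need editing after each MWAA upgrade. The function name user_site_packages and the default base directory are hypothetical; the base is inferred from the warning in the question.

```python
import os
import sys


def user_site_packages(base="/usr/local/airflow/.local"):
    """Build <base>/lib/pythonX.Y/site-packages for the running interpreter.

    Assumption: packages from requirements.txt live in the usual per-user
    layout under `base`; this is not confirmed by the MWAA documentation.
    """
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return os.path.join(base, "lib", version, "site-packages")
```

Calling user_site_packages() inside the plugin gives a path that matches whichever Python version the environment runs, rather than one fixed when the plugin was written.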