使用Apache气流检查文件是否存在于Azure DataLake上的最佳方法是什么?
我有一个DAG,该DAG应检查是否已将文件上传到特定目录中的Azure Datalake。如果是这样,它允许其他DAG运行。
我考虑过使用FileSensor,但我认为FSCONNID参数不足以针对数据验证
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
Azure provider but you can easily implement one since the
我没有测试它,但这应该可以工作:
用法示例:
There is no
AzureDataLakeSensor
in the Azure provider but you can easily implement one since theAzureDataLakeHook
hascheck_for_file
function so all needed is to wrap this function with Sensor class implementingpoke()
function ofBaseSensorOperator
. By doing so you can use Microsoft Azure Data Lake Connection directly.I didn't test it but this should work:
Usage example:
首先,看看。
我们可以看到,有专门的操作员对 Azure DataLake存储不幸的是,目前似乎只有
ADLSDELETEEPERATOR
可用。此
adlsdeleteOperator
使用 azuredatalakehook 您应该在自己的自定义操作员中重复使用以检查文件是否存在。我对您的建议是创建一个 checkoperator 使用ADLS钩检查输入中提供的文件是否存在
check_for_file
hook的函数。更新:正如注释中指出的那样,Checkoperator似乎是通过与SQL查询绑定的,并已弃用。使用自己的自定义传感器或自定义操作员是必经之路。
First of all, have a look at official Microsoft Operators for Airflow.
We can see that there are dedicated Operators to Azure DataLake Storage, unfortunately, only the
ADLSDeleteOperator
seems available at the moment.This
ADLSDeleteOperator
uses a AzureDataLakeHook which you should reuse in your own custom operator to check for file presence.My advice for you is to create a Child class of CheckOperator using the ADLS hook check if the file provided in input exists with
check_for_file
function of the hook.UPDATE: as pointed in comments, CheckOperator seems to by tied to SQL queries and is deprecated. Using your own custom Sensor or custom Operator is the way to go.
我使用拟议的API遇到了严重的问题。因此,我将Microsoft API嵌入了气流中。这很好。然后,您需要做的就是使用此操作员并通过Account_url和Access_Token。
I had severe issues using the proposed API. So I embedded the Microsoft API into Airflow. This was working fine. All you need to do then is to use this operator and pass account_url and access_token.