After SSHing into a remote machine/edge node, how do I download a file from an S3 bucket to a specific path on that same remote machine?
I am relatively new to Airflow and am trying to solve a particular problem.
I want to know how to download a file residing in an S3 bucket after SSHing into a remote machine/client through the SSHOperator.
Can I simply use a bash script (I have tried this with the S3 URL, but it does not work), or is there an operator in Airflow that can perform this operation as part of my DAG? I tried the PythonOperator, but it seems to work only when I specify a local file path (in the op_kwargs dictionary).
Below is my sample Airflow code; any help will be highly appreciated.
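# Airflow 2.x import paths (on Airflow 1.10 these were
# airflow.contrib.operators.ssh_operator and airflow.operators.python_operator)
from airflow.operators.python import PythonOperator
from airflow.providers.ssh.operators.ssh import SSHOperator

# `dag` and the `download_from_s3` callable are defined elsewhere in the DAG file.

# command to execute on the remote edge node (it only prepares an empty
# directory; nothing is downloaded yet)
download_file_cmd = 'rm -rf airflow_pipeline && mkdir -p airflow_pipeline && cd airflow_pipeline'

download_to_edge_node = SSHOperator(
    task_id='download_file_to_edge_node',
    ssh_conn_id='itversity',
    command=download_file_cmd,
    dag=dag
)

# The callable runs on the Airflow worker, so local_path is a path on the
# worker's filesystem, not on the edge node.
task_download_from_s3 = PythonOperator(
    task_id='download_file_from_s3',
    python_callable=download_from_s3,
    op_kwargs={
        'key': 'orders.csv',
        'bucket_name': 'airflow-pipeline-files',
        'local_path': '/home/danish/test'
    },
    dag=dag
)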
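One way to approach the "bash script" route: since the file has to land on the edge node, the download itself can run there as part of the SSHOperator's `command`, using the AWS CLI rather than fetching the `s3://` URL directly (if the URL was tried with curl or wget, those tools do not understand the `s3://` scheme). The sketch below only builds that command string from the same values the question passes in `op_kwargs`. It assumes the AWS CLI is installed on the edge node and already has credentials (for example via `aws configure` or an instance role); the helper name `build_s3_download_cmd` is hypothetical.

```python
import shlex


def build_s3_download_cmd(bucket_name: str, key: str, local_path: str) -> str:
    """Build a shell command that copies s3://<bucket_name>/<key> into the
    directory local_path on whichever host executes it.

    Assumption: the AWS CLI is installed and configured on that host
    (here, the remote edge node reached through SSHOperator).
    """
    s3_uri = f"s3://{bucket_name}/{key}"
    # The trailing slash makes `aws s3 cp` treat the destination as a
    # directory, so the object keeps its own file name (e.g. orders.csv).
    target_dir = local_path.rstrip("/") + "/"
    # shlex.quote protects paths/keys containing spaces or shell metacharacters.
    return " && ".join([
        f"mkdir -p {shlex.quote(target_dir)}",
        f"aws s3 cp {shlex.quote(s3_uri)} {shlex.quote(target_dir)}",
    ])


# Same bucket, key and path the question passes to the PythonOperator.
cmd = build_s3_download_cmd(
    bucket_name="airflow-pipeline-files",
    key="orders.csv",
    local_path="/home/danish/test",
)
print(cmd)
```

The resulting string could be used in place of `download_file_cmd` as the `command` of `download_to_edge_node`, so the file is fetched directly on the edge node and the separate PythonOperator is no longer needed for this step. If the AWS CLI cannot be installed on the edge node, another option is to download on the Airflow worker (for example with the Amazon provider's `S3Hook`) and then copy the file to the edge node over SFTP.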