Recursively move files from SFTP to S3 while preserving structure
I'm trying to recursively move files from an SFTP server to S3, possibly using boto3. I want to preserve the folder/file structure as well. I was looking to do it this way:
import pysftp
private_key = "/mnt/results/sftpkey"
srv = pysftp.Connection(host="server.com", username="user1", private_key=private_key)
srv.get_r("/mnt/folder", "./output_folder")
Then take those files and upload them to S3 using boto3. However, the folders and files on the server are numerous, deeply nested, and large, so my machine ends up running out of memory and disk space. I was thinking of a script where I could download a single file, upload it to S3, delete it locally, and repeat.
I know this would take a long time to finish, but I could run it as a job without running out of space and without keeping my machine on the entire time. Has anyone done something similar? Any help is appreciated!
2 Answers
If you can't (or don't want to) download all of the files before sending them to S3, then you need to download them one at a time.
From there, you'll need to build a list of files to download, then work through it, transferring each file to your local machine and then sending it to S3.
A very simple version of this would look something like this:
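A minimal sketch of that loop, assuming pysftp and boto3 are available; the host, username, key path, remote root, and bucket name are placeholders, and mapping the remote path to the S3 key by stripping the leading slash is one illustrative choice:

```python
import os

def remote_to_key(remote_path):
    """Map an absolute SFTP path to an S3 key, preserving the folder structure."""
    return remote_path.lstrip("/")

def walk_remote(sftp, remote_dir):
    """Recursively yield the path of every file under remote_dir."""
    for entry in sftp.listdir_attr(remote_dir):
        remote_path = f"{remote_dir}/{entry.filename}"
        if sftp.isdir(remote_path):
            yield from walk_remote(sftp, remote_path)
        else:
            yield remote_path

def transfer_all(host, username, private_key, remote_root, bucket):
    """Download one file at a time, upload it to S3, then delete the local copy."""
    import boto3   # third-party clients imported here so the helpers above stay dependency-free
    import pysftp
    s3 = boto3.client("s3")
    with pysftp.Connection(host=host, username=username, private_key=private_key) as srv:
        for remote_path in walk_remote(srv, remote_root):
            local_path = os.path.basename(remote_path)
            srv.get(remote_path, local_path)        # download a single file
            s3.upload_file(local_path, bucket, remote_to_key(remote_path))
            os.remove(local_path)                   # free the disk space before the next file

# Example call (placeholders):
# transfer_all("server.com", "user1", "/mnt/results/sftpkey", "/mnt/folder", "my-bucket")
```

Only one file ever sits on local disk at a time, so disk usage stays bounded by the largest single file.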
Error checking and speed improvements are obviously possible.
You have to do it file-by-file.
Start with the recursive download code here:
Python pysftp get_r from Linux works fine on Linux but not on Windows
After each sftp.get, do the S3 upload and remove the local file. Actually, you can even copy the file from SFTP to S3 without storing the file locally:
Transfer file from SFTP to S3 using Paramiko
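A sketch of that streaming variant using Paramiko directly: the remote file object is handed to boto3's upload_fileobj, so the data never touches local disk. The port, RSA key type, and all connection parameters below are assumptions to make the example self-contained:

```python
def stream_to_s3(host, username, key_file, remote_path, bucket, s3_key, port=22):
    """Stream one file from SFTP straight into S3 without a local copy."""
    import boto3      # imported here to keep the module importable without these deps
    import paramiko
    s3 = boto3.client("s3")
    transport = paramiko.Transport((host, port))
    transport.connect(username=username,
                      pkey=paramiko.RSAKey.from_private_key_file(key_file))
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        with sftp.open(remote_path, "rb") as remote_file:
            remote_file.prefetch()  # pipeline the remote reads for throughput
            # boto3 reads the file object in chunks (multipart upload for large files)
            s3.upload_fileobj(remote_file, bucket, s3_key)
    finally:
        sftp.close()
        transport.close()
```

Combined with a recursive listing (as in the linked answer), this avoids local disk usage entirely, at the cost of re-downloading a file if an upload fails partway.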