Downloading multiple files from different SFTP directories to local

Posted on 2025-02-09 07:47:14


I have a scenario where we need to download certain image files in different directories in SFTP server to local.

Example : 
/IMAGES/folder1 has img11, img12, img13, img14
/IMAGES/folder2 has img21, img22, img23, img24
/IMAGES/folder3 has img31, img32, img33, img34
And I need to download img12, img23 and img34 from folder 1, 2 and 3 respectively

Right now I go inside each folder and get the images individually, which takes an extraordinary amount of time (since there are 10,000s of images to download).

I have also found out that downloading a single file of the same size (as that of the multiple image files) takes a fraction of the time.

My question is, is there a way to get these multiple files together instead of downloading them one after another?

One approach I came up with was to copy all the files to a temp folder on the SFTP server and then download that directory, but SFTP does not allow 'copy', and I cannot use 'rename' because then I would be moving the files to the temp directory.


Comments (1)

青春有你 2025-02-16 07:47:14


You could use a process pool to open multiple SFTP connections and download in parallel. For example,

from paramiko import SSHClient
from multiprocessing import Pool

def download_init(host):
    # Runs once in each pool process: open one SSH/SFTP connection and keep
    # it in module globals so download_worker can reuse it for every file.
    global client, sftp
    client = SSHClient()
    client.load_system_host_keys()
    client.connect(host)
    sftp = client.open_sftp()

def download_close(dummy):
    # Closes the per-process connection; see the note below on why this is
    # dispatched as an ordinary work item.
    client.close()

def download_worker(params):
    # params is a [local_path, remote_path] pair from the work list.
    local_path, remote_path = params
    sftp.get(remote_path, local_path)

list_of_local_and_remote_files = [
    ["/client/files/folder1/img11", "/IMAGES/folder1/img11"],
]

def downloader(files):
    pool_size = 8
    # One SFTP connection per pool process; chunksize batches work items to
    # reduce inter-process overhead.
    pool = Pool(pool_size, initializer=download_init,
        initargs=["sftpserver.example.com"])
    result = pool.map(download_worker, files, chunksize=10)
    pool.map(download_close, range(pool_size))

if __name__ == "__main__":
    downloader(list_of_local_and_remote_files)

It's unfortunate that Pool doesn't have a finalizer to undo what was set up in the initializer. It's not usually necessary - the exiting process is cleanup enough. In the example I just wrote a separate worker function that cleans things up. By having one work item per pool process, they each get one call.
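For reference, a minimal sketch of how the [local_path, remote_path] pairs for downloader() could be assembled from the folder layout in the question. The local base directory /client/files is just the placeholder used in the example above, and the per-folder file names are assumed.

# Assumed mapping of remote folders to the file names wanted from each.
wanted = {
    "/IMAGES/folder1": ["img12"],
    "/IMAGES/folder2": ["img23"],
    "/IMAGES/folder3": ["img34"],
}

# Build [local_path, remote_path] pairs, mirroring each remote folder name locally,
# e.g. ["/client/files/folder1/img12", "/IMAGES/folder1/img12"].
list_of_local_and_remote_files = [
    ["/client/files/" + folder.rsplit("/", 1)[-1] + "/" + name, folder + "/" + name]
    for folder, names in wanted.items()
    for name in names
]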
