How can I upload more than one file at a time to Cloud Files with Python?

Posted 2024-10-21 15:56:20 · 618 characters · 2 views · 0 comments

I'm using the cloudfiles module to upload files to Rackspace Cloud Files, using something like this pseudocode:

import cloudfiles

username = '---'
api_key = '---'

conn = cloudfiles.get_connection(username, api_key)
testcontainer = conn.create_container('test')

for f in get_filenames():
    obj = testcontainer.create_object(f)
    obj.load_from_filename(f)

My problem is that I have a lot of small files to upload, and it takes too long this way.

Buried in the documentation, I see that there is a class ConnectionPool, which supposedly can be used to upload files in parallel.

Could someone please show how I can make this piece of code upload more than one file at a time?

[旋木] 2024-10-28 15:56:20

The ConnectionPool class is meant for a multithreaded application that occasionally has to send something to Rackspace.

That way you can reuse your connection but you don't have to keep 100 connections open if you have 100 threads.
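
To make the pooling idea concrete, here is a minimal, generic sketch of a connection pool. It does not use the cloudfiles ConnectionPool API; `FakeConnection`, `SimplePool`, and `send_all` are hypothetical names invented for illustration, and a real connection object would replace the fake one.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a real connection; counts how many get opened."""
    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.created += 1

    def send(self, item):
        return "sent %s" % item


class SimplePool:
    """Hands out at most `size` connections, opening them lazily."""

    def __init__(self, factory, size):
        self._factory = factory
        self._idle = queue.Queue()
        self._slots = threading.Semaphore(size)

    def get(self):
        self._slots.acquire()  # blocks while `size` connections are checked out
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._factory()  # nothing idle yet, so open a new one

    def put(self, conn):
        self._idle.put(conn)  # return the connection before freeing the slot
        self._slots.release()


def send_all(items, pool, threads=8):
    """Many threads share the few connections held by the pool."""
    def work(item):
        conn = pool.get()
        try:
            return conn.send(item)
        finally:
            pool.put(conn)

    with ThreadPoolExecutor(max_workers=threads) as executor:
        return list(executor.map(work, items))
```

With `SimplePool(FakeConnection, 3)`, eight threads can work through the items while no more than three connections are ever opened, which is the trade-off described above.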

You are simply looking for a multithreading/multiprocessing uploader.
Here's an example using the multiprocessing library:

import cloudfiles
import multiprocessing

USERNAME = '---'
API_KEY = '---'


def get_container():
    conn = cloudfiles.get_connection(USERNAME, API_KEY)
    testcontainer = conn.create_container('test')
    return testcontainer

def uploader(filenames):
    '''Worker process to upload the given files'''
    container = get_container()

    # Keep going till you reach STOP
    for filename in iter(filenames.get, 'STOP'):
        # Create the object and upload
        obj = container.create_object(filename)
        obj.load_from_filename(filename)

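def main():
    NUMBER_OF_PROCESSES = 16

    # Add your filenames to this queue
    filenames = multiprocessing.Queue()

    # Start worker processes and keep references so we can wait for them
    workers = []
    for i in range(NUMBER_OF_PROCESSES):
        worker = multiprocessing.Process(target=uploader, args=(filenames,))
        worker.start()
        workers.append(worker)

    # You can keep adding tasks until you add STOP
    filenames.put('some filename')

    # Send one STOP sentinel per worker so every child process exits
    for i in range(NUMBER_OF_PROCESSES):
        filenames.put('STOP')

    # Wait for all uploads to finish
    for worker in workers:
        worker.join()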

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()
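
Since uploading many small files is mostly waiting on the network, a thread pool may work as well as separate processes. The following is a sketch only, assuming an interpreter that provides `concurrent.futures` (standard library from Python 3.2) and on which the cloudfiles module runs, which you would need to verify. `upload_all` and its `make_container` parameter are hypothetical names; `make_container` is expected to be something like the `get_container` function above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def upload_all(filenames, make_container, max_workers=16):
    """Upload files from a pool of threads.

    Returns a list of (filename, exception) pairs for uploads that failed.
    """
    local = threading.local()

    def upload_one(filename):
        # Each thread lazily opens its own container (and connection) once,
        # so connections are never shared between threads.
        if not hasattr(local, "container"):
            local.container = make_container()
        try:
            obj = local.container.create_object(filename)
            obj.load_from_filename(filename)
            return None
        except Exception as exc:
            # Record the failure instead of aborting the other uploads
            return (filename, exc)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(upload_one, filenames))
    return [r for r in results if r is not None]
```

A call such as `failed = upload_all(get_filenames(), get_container)` would then give you the files worth retrying. The worker count of 16 simply mirrors the example above and is worth tuning for your own files and network.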
