How to let Pool.map take a lambda function

Posted on 2024-10-14 21:28:46

I have the following function:

def copy_file(source_file, target_dir):
    pass

Now I would like to use multiprocessing to run this function on many files in parallel:

from multiprocessing import Pool

p = Pool(12)
p.map(lambda x: copy_file(x, target_dir), file_list)

The problem is that lambdas can't be pickled, so this fails. What is the neatest (most Pythonic) way to fix this?
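The failure can be reproduced without a pool. The sketch below is only an illustration, and `can_pickle` is a hypothetical helper name that does not come from the question. It relies on the fact that the standard pickle module serializes functions by their importable name, which a lambda does not have.

```python
import os.path
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions are pickled by reference to their qualified name;
        # a lambda has no name that can be looked up again, so it fails.
        return False


print(can_pickle(os.path.basename))  # a named, importable function
print(can_pickle(lambda x: x))       # an anonymous function
```

Pool.map has to send the callable to the worker processes through pickle, which is why the named function works and the lambda does not.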

Comments (5)

情丝乱 2024-10-21 21:28:46

Use a function object:

class Copier(object):
    def __init__(self, tgtdir):
        self.target_dir = tgtdir
    def __call__(self, src):
        copy_file(src, self.target_dir)

To run your Pool.map:

p.map(Copier(target_dir), file_list)
救赎№ 2024-10-21 21:28:46

For Python 2.7+ or Python 3, you could use functools.partial:

import functools
copier = functools.partial(copy_file, target_dir=target_dir)
p.map(copier, file_list)
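For a version that runs end to end, here is a minimal sketch of the same idea. It assumes copy_file is implemented with shutil.copy, since the question leaves its body empty. The temporary directories and the `make_demo_files` helper are hypothetical and exist only to give the pool something to copy.

```python
import functools
import os
import shutil
import tempfile
from multiprocessing import Pool


def copy_file(source_file, target_dir):
    """Copy one file into target_dir and return the path of the copy."""
    return shutil.copy(source_file, target_dir)


def make_demo_files(directory, count=3):
    """Create a few small files to copy (hypothetical demo data)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"demo{i}.txt")
        with open(path, "w") as handle:
            handle.write(f"file {i}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        file_list = make_demo_files(src)
        # A partial pickles as long as the wrapped function is defined at
        # module level, so the pool can send it to the worker processes.
        copier = functools.partial(copy_file, target_dir=dst)
        with Pool(2) as p:
            copied = p.map(copier, file_list)
        print(sorted(os.path.basename(path) for path in copied))
```

Passing target_dir as a keyword argument leaves the first positional parameter, source_file, free for the items that map() supplies.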
默嘫て 2024-10-21 21:28:46

The question is a bit old, but if you are still using Python 2 my answer may be useful.

The trick is to use multiprocess, a fork of multiprocessing that is part of the pathos project. It gets rid of this annoying limitation of the original multiprocessing.

Installation: pip install multiprocess

Usage:

>>> from multiprocess import Pool
>>> p = Pool(4)
>>> print p.map(lambda x: (lambda y:y**2)(x) + x, xrange(10))
[0, 2, 6, 12, 20, 30, 42, 56, 72, 90]
故乡的云 2024-10-21 21:28:46

Following this answer, pathos lets you run your lambda, p.map(lambda x: copy_file(x, target_dir), file_list), directly, avoiding all the workarounds and hacks.

爱情眠于流年 2024-10-21 21:28:46

You can use starmap() to solve this problem with a pool.

Given that you have a list of files, say in your working directory, and a location you would like to copy them to, you can import os and use os.system() to run shell commands from Python. This lets you copy the files with cp.

Before you start, build a list res = [(file, target_dir) for file in file_list] that pairs each file with the target directory.

It will look like...

[('test1.pdf', '/home/mcurie/files/pdfs/'), ('test2.pdf', '/home/mcurie/files/pdfs/'), ('test3.pdf', '/home/mcurie/files/pdfs/'), ('test4.pdf', '/home/mcurie/files/pdfs/')]

For this use case you could simplify things by storing each file and its target directory in a single string to begin with, but that would obscure how the method works.

The idea is that starmap() takes each tuple in res, unpacks it into the arguments of copy_file(source_file, target_dir), and runs the calls in parallel across the pool's worker processes (by default the pool starts one worker per available CPU core). starmap() itself blocks until every call has finished.

Therefore, the first call a worker runs will look like

copy_file('test1.pdf', '/home/mcurie/files/pdfs/')

I hope this helps. The full code is below.

from multiprocessing.pool import Pool
import os

file_list = ["test1.pdf", "test2.pdf", "test3.pdf", "test4.pdf"]
target_dir = "/home/mcurie/files/pdfs/"


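def copy_file(source_file, target_dir):
    # Shell out to cp; names containing spaces or shell metacharacters
    # would need quoting before being placed in the command.
    os.system(f"cp {source_file} {target_dir + source_file}")


if __name__ == '__main__':
    with Pool() as p:
        # Pair every file with the target directory.
        res = [(file, target_dir) for file in file_list]
        # starmap() unpacks each tuple into copy_file(source_file, target_dir).
        p.starmap(copy_file, res)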
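The argument unpacking that starmap() performs can be seen without starting any processes. The sketch below uses itertools.starmap, the serial counterpart from the standard library, together with a stand-in copy_file that only returns a description of what it would copy. The stand-in is an assumption made for illustration, not the function from the answer.

```python
from itertools import starmap


def copy_file(source_file, target_dir):
    """Stand-in that describes the copy instead of performing it."""
    return f"{source_file} -> {target_dir}{source_file}"


file_list = ["test1.pdf", "test2.pdf"]
target_dir = "/home/mcurie/files/pdfs/"
res = [(file, target_dir) for file in file_list]

# Each tuple is unpacked into positional arguments: copy_file(*item).
for line in starmap(copy_file, res):
    print(line)
```

Pool.starmap unpacks its items the same way, but distributes the calls over the worker processes and returns the results as a list.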