Automatically splitting a list for multiprocessing

Posted 2025-01-28 15:20:53, 858 words, 1 view, 0 comments

I am learning multiprocessing in Python and ran into a problem. Given a shared list (nums = mp.Manager().list), is there any way to split it automatically across all the processes, so that they do not compute on the same numbers in parallel?

Current code:

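# multiple processes
import multiprocessing as mp
import time

def get_square(list_of_num, results_sharedlist):
    # simple get square: put the full list of squares on the queue
    results_sharedlist.put(list(map(lambda x: x**2, list_of_num)))

if __name__ == "__main__":
    nums = mp.Manager().list(range(10000))
    results = mp.Queue()

    start = time.time()
    # both processes receive the whole list
    process1 = mp.Process(target=get_square, args=(nums, results))
    process2 = mp.Process(target=get_square, args=(nums, results))

    process1.start()
    process2.start()
    # fetch one result per process before joining, so a full queue cannot block join()
    outputs = [results.get() for _ in range(2)]
    process1.join()
    process2.join()

    print(time.time() - start)
    for output in outputs:
        print(output)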

Current behaviour

It computes the squares of the same list twice.

What I want

I want process 1 and process 2 together to compute the squares of the nums list only once, in parallel, without my having to define the split.

Comments (2)

梦里梦着梦中梦 2025-02-04 15:20:53

You can have the function itself decide which part of the data it operates on. In the current scenario, you want the function to divide the squaring work on its own, based on how many processes are running in parallel.

To do so, the function needs to know which process it is running in and how many processes are working alongside it, so that it only touches its own share of the data. You can pass two more parameters to the function that carry this information: current_process and total_process.

If the list has a length divisible by 2 and you want to compute its squares using two processes, the function would look something like this:

def get_square(list_of_num, results_sharedlist, current_process, total_process):
    # current_process is 1-based; each process takes one equal-sized slice
    total_length = len(list_of_num)
    start = (total_length // total_process) * (current_process - 1)
    end = (total_length // total_process) * current_process
    results_sharedlist.put(list(map(lambda x: x**2, list_of_num[start:end])))

TOTAL_PROCESSES = 2
process1 = mp.Process(target=get_square, args=(nums, results, 1, TOTAL_PROCESSES))
process2 = mp.Process(target=get_square, args=(nums, results, 2, TOTAL_PROCESSES))

The assumption I have made here is that the length of the list is a multiple of the number of processes you allocate. If it is not, the current logic leaves the last few numbers without output.
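If the length is not a multiple of the process count, one possible way around that limitation is to multiply before dividing when computing the slice bounds. The sketch below is only an illustration: chunk_bounds is a hypothetical helper name, not something from the code above, and it keeps the same 1-based current_process convention.

```python
def chunk_bounds(total_length, current_process, total_process):
    """Return (start, end) for a 1-based process index.

    Hypothetical helper: multiplying before the floor division spreads
    the remainder across the slices, so no item is left out.
    """
    start = total_length * (current_process - 1) // total_process
    end = total_length * current_process // total_process
    return start, end


nums = list(range(10))
# Three processes over ten numbers: the slices differ in size by at most one.
parts = [nums[slice(*chunk_bounds(len(nums), i, 3))] for i in (1, 2, 3)]
print(parts)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Inside get_square, start and end could then come from chunk_bounds(len(list_of_num), current_process, total_process) instead of the two floor-division lines.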

Hope this answers your question!

极致的悲 2025-02-04 15:20:53

I agree with Jake's answer here, but as a bonus:
if you are using a multiprocessing.Pool(), it keeps an internal counter of the worker processes it spawns, so you can avoid the extra parameter that identifies the current process by reading the private attribute _identity from multiprocessing's current_process(), like this:

from multiprocessing import current_process, Pool

# run this inside a Pool worker; in the main process _identity is an empty tuple
p = current_process()
print('process counter:', p._identity[0])

More info in this answer.
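For the original question, a Pool may also remove the need for a manual split altogether, since Pool.map divides the input into chunks and hands them to the workers itself. The following is a minimal sketch assuming a plain list is acceptable in place of the Manager list; square and worker_counter are illustrative names, and _identity is a private attribute that may change between Python versions.

```python
from multiprocessing import Pool, current_process


def square(x):
    # Runs in a worker process; the Pool decides which worker gets which items.
    return x ** 2


def worker_counter(_):
    # _identity is private: a non-empty tuple inside Pool workers.
    return current_process()._identity[0]


if __name__ == "__main__":
    nums = list(range(10000))
    with Pool(processes=2) as pool:
        # map() splits nums into chunks, distributes them to the workers,
        # and returns the results in input order.
        results = pool.map(square, nums)
        # Collect the counters of the workers that handled these calls.
        counters = set(pool.map(worker_counter, range(8)))
    print(results[:5])  # [0, 1, 4, 9, 16]
    print(len(results))  # 10000
    print(sorted(counters))
```

Each number is squared exactly once, and the chunk size can be tuned through the chunksize argument of map() if the default does not suit the workload.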
