Python multiprocessing only utilizes one core

I'm trying out a code snippet from the standard Python documentation to learn how to use the multiprocessing module. The code is pasted at the end of this message.

I'm using Python 2.7.1 on Ubuntu 11.04 on a quad-core machine (which, according to the system monitor, gives me eight cores due to hyper-threading).

Problem: all the workload seems to be scheduled to just one core, which gets close to 100% utilization, despite the fact that several processes are started. Occasionally all the workload migrates to another core, but it is never distributed among them.

Any ideas why this is so?

Best regards,
Paul
#
# Simple example which uses a pool of workers to carry out some tasks.
#
# Notice that the results will probably not come out of the output
# queue in the same order as the corresponding tasks were put on the
# input queue. If it is important to get the results back in the
# original order then consider using `Pool.map()` or `Pool.imap()`
# (which will save on the amount of code needed anyway).
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
import time
import random

from multiprocessing import Process, Queue, current_process, freeze_support

#
# Function run by worker processes
#
def worker(input, output):
    for func, args in iter(input.get, 'STOP'):
        result = calculate(func, args)
        output.put(result)

#
# Function used to calculate result
#
def calculate(func, args):
    result = func(*args)
    return '%s says that %s%s = %s' % \
        (current_process().name, func.__name__, args, result)

#
# Functions referenced by tasks
#
def mul(a, b):
    time.sleep(0.5*random.random())
    return a * b

def plus(a, b):
    time.sleep(0.5*random.random())
    return a + b

def test():
    NUMBER_OF_PROCESSES = 4
    TASKS1 = [(mul, (i, 7)) for i in range(500)]
    TASKS2 = [(plus, (i, 8)) for i in range(250)]

    # Create queues
    task_queue = Queue()
    done_queue = Queue()

    # Submit tasks
    for task in TASKS1:
        task_queue.put(task)

    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        Process(target=worker, args=(task_queue, done_queue)).start()

    # Get and print results
    print 'Unordered results:'
    for i in range(len(TASKS1)):
        print '\t', done_queue.get()

    # Add more tasks using `put()`
    for task in TASKS2:
        task_queue.put(task)

    # Get and print some more results
    for i in range(len(TASKS2)):
        print '\t', done_queue.get()

    # Tell child processes to stop
    for i in range(NUMBER_OF_PROCESSES):
        task_queue.put('STOP')

if __name__ == '__main__':
    freeze_support()
    test()
Comments (4)
Try replacing the `time.sleep` with something that actually requires CPU and you will see that `multiprocessing` works just fine! For example:
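The answer's original example did not survive this copy; below is a minimal sketch of the idea, swapping the sleep in the question's `mul` task for a hypothetical CPU-bound busy loop:

def mul(a, b):
    # Burn CPU instead of sleeping, so each worker actually keeps a core
    # busy and the OS has a reason to spread the processes across cores.
    x = 0
    for i in xrange(5 * 10 ** 6):
        x += i
    return a * b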
Somehow the CPU affinity has been changed. I had this problem with numpy before. I found the solution here: http://bugs.python.org/issue17038#msg180663
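For reference, a commonly cited workaround for this kind of affinity problem on Linux (whether or not it is the exact fix in the linked message) is to reset the process's affinity mask before spawning workers; a minimal sketch, assuming the `taskset` utility is available (the mask 0xff covers the eight logical cores mentioned above):

import os

# Reset this process's CPU affinity to all eight logical cores (mask 0xff).
# Some libraries (certain numpy builds, for example) restrict the affinity
# of the importing process as a side effect; undo that before forking workers.
os.system("taskset -p 0xff %d" % os.getpid())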
multiprocessing does not mean you'll use all the cores of a processor; you just get multiple processes, not multi-core processes. Where they run is handled by the OS and is not deterministic. The question @Devraj posted in the comments has answers that accomplish what you want.
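To watch the OS's scheduling decisions directly, one option on Linux is to ask `ps` which processor each worker last ran on; a small sketch (the CPU-bound `busy` helper is hypothetical):

import subprocess
from multiprocessing import Process

def busy():
    # Hypothetical CPU-bound task so the workers have something to do.
    x = 0
    for i in xrange(10 ** 7):
        x += i

if __name__ == '__main__':
    procs = [Process(target=busy) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        # `ps -o psr=` reports the processor a process last ran on (Linux).
        core = subprocess.check_output(['ps', '-o', 'psr=', '-p', str(p.pid)])
        print 'PID %d last ran on core %s' % (p.pid, core.strip())
    for p in procs:
        p.join()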
I have found a workaround using Parallel Python. I know this is not a solution that uses only the basic Python libraries, but the code is simple and works like a charm.
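The answerer's code is not shown; as an illustration only, here is a minimal sketch of what such a workaround might look like with the `pp` module, reusing the question's `mul` task (assumes Parallel Python is installed):

import pp

def mul(a, b):
    return a * b

# pp.Server() autodetects the number of CPUs and starts that many workers.
job_server = pp.Server()
jobs = [job_server.submit(mul, (i, 7)) for i in range(500)]
for job in jobs:
    print job()  # calling the job object blocks until its result is ready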