Why is multiprocessing.Pool.map slower than the built-in map?
import multiprocessing
import time
from subprocess import call, STDOUT
from glob import glob
import sys

def do_calculation(data):
    x = time.time()
    with open(data + '.classes.report', 'w') as f:
        call(["external script", data], stdout=f.fileno(), stderr=STDOUT)
    return 'apk: {data!s} time {tim!s}'.format(data=data, tim=time.time() - x)

def start_process():
    print 'Starting', multiprocessing.current_process().name

if __name__ == '__main__':
    inputs = glob('./*.dex')
    builtin_outputs = map(do_calculation, inputs)
    print 'Built-in:'
    for i in builtin_outputs:
        print i
    pool_size = multiprocessing.cpu_count() * 2
    print 'Worker Pool size: %s' % pool_size
    pool = multiprocessing.Pool(processes=pool_size,
                                initializer=start_process,
                                )
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()  # no more tasks
    pool.join()  # wrap up current tasks
    print 'Pool output:'
    for i in pool_outputs:
        print i
Surprisingly, builtin_outputs has a faster execution time than pool_outputs:
Built-in:
apk: ./TooDo_2.0.8.classes.dex time 5.69289898872
apk: ./TooDo_2.0.9.classes.dex time 5.37206411362
apk: ./Twitter_Client.classes.dex time 0.272782087326
apk: ./zaTelnet_Light.classes.dex time 0.141801118851
apk: ./Temperature_Converter.classes.dex time 0.270312070847
apk: ./Tipper_1.0.classes.dex time 0.293262958527
apk: ./XLive.classes.dex time 0.361288070679
apk: ./TwitterDroid_0.1.2_alpha.classes.dex time 0.381947040558
apk: ./Universal_Conversion_Application.classes.dex time 0.404763936996
Worker Pool size: 8
Pool output:
apk: ./TooDo_2.0.8.classes.dex time 5.72440505028
apk: ./TooDo_2.0.9.classes.dex time 5.9017829895
apk: ./Twitter_Client.classes.dex time 0.309305906296
apk: ./zaTelnet_Light.classes.dex time 0.374011039734
apk: ./Temperature_Converter.classes.dex time 0.450366973877
apk: ./Tipper_1.0.classes.dex time 0.379780054092
apk: ./XLive.classes.dex time 0.394504070282
apk: ./TwitterDroid_0.1.2_alpha.classes.dex time 0.505702018738
apk: ./Universal_Conversion_Application.classes.dex time 0.512043952942
How can this performance difference be explained?
Comments (3)
When you use multiprocessing, give each worker process enough computation to last at least a few seconds. If a worker finishes too quickly, too much time goes to setting up the pool, spawning the subprocesses, and (potentially) switching between processes, and too little to the intended computation, to justify using multiprocessing.
Also, if the computation is CPU-bound, initializing a pool with more processes than cores (multiprocessing.cpu_count()) is counterproductive. It makes the OS switch between processes without letting the computation proceed any faster.
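As a rough illustration of the sizing advice, the helper below caps the worker count at the number of cores and at the number of tasks. The name choose_pool_size and its cores parameter are hypothetical, introduced only for this sketch, and it is written in Python 3 syntax, unlike the Python 2 script in the question. Whether the cap actually helps depends on whether "external script" is CPU-bound, which the question does not say.

```python
import multiprocessing

def choose_pool_size(n_tasks, cores=None):
    """Return a worker count capped by both the core count and the task count.

    `cores` is a hypothetical override for experiments; by default the value
    reported by multiprocessing.cpu_count() is used.
    """
    if cores is None:
        cores = multiprocessing.cpu_count()
    # Never ask for fewer than one worker, even when there is nothing to do.
    return max(1, min(n_tasks, cores))
```

In the question's script, this value would take the place of multiprocessing.cpu_count() * 2 when constructing the pool.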

If the workload in "external script" is IO-heavy enough to saturate your hard disk, running multiple copies in parallel will only slow you down, because reading from multiple files incurs additional seeks.
The same goes if you are saturating your CPU and do not have multiple CPU cores available.
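One way to check whether a resource is already saturated is to time the same workload with several worker counts and see where the total stops improving. The sketch below does this under stated assumptions: fake_task is a sleep-based stand-in for the external script, and multiprocessing.pool.ThreadPool is swapped in for multiprocessing.Pool so the example runs without process-spawning concerns. With a real disk-bound or CPU-bound task the curve may flatten or get worse much earlier than it does for sleeps.

```python
import time
from multiprocessing.pool import ThreadPool

def fake_task(seconds):
    """Stand-in for the external script: just waits, then returns its input."""
    time.sleep(seconds)
    return seconds

def time_with_workers(func, inputs, n_workers):
    """Return the wall-clock seconds needed to map func over inputs."""
    start = time.perf_counter()
    with ThreadPool(processes=n_workers) as pool:
        pool.map(func, inputs)
    return time.perf_counter() - start

def sweep(func, inputs, worker_counts):
    """Time the same workload once per worker count."""
    return {n: time_with_workers(func, inputs, n) for n in worker_counts}

if __name__ == "__main__":
    timings = sweep(fake_task, [0.05] * 8, [1, 2, 4, 8])
    for n, elapsed in sorted(timings.items()):
        print("workers=%d total=%.2fs" % (n, elapsed))
```

Because sleeping tasks do not compete for disk or CPU, the totals here keep shrinking as workers are added; the point of the sweep is to reveal when a real workload stops behaving that way.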

You are measuring the time required to perform a single task. Running tasks in parallel does not make each individual task shorter; rather, they all run at the same time. In other words, you are measuring the wrong thing: you should time all the tasks together, not each task individually.
The slowdown is probably because tasks running at the same time interfere with each other somewhat, so they do not run at full speed.
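The difference between per-task time and total time can be seen in a small sketch. It assumes a sleep-based stand-in (timed_task) instead of the real do_calculation, uses ThreadPool in place of multiprocessing.Pool so it runs without process-spawning concerns, and is written in Python 3 syntax. All helper names are hypothetical.

```python
import time
from multiprocessing.pool import ThreadPool

def timed_task(seconds):
    """Stand-in for do_calculation: sleeps, then reports its own duration."""
    start = time.perf_counter()
    time.sleep(seconds)
    return time.perf_counter() - start

def run_and_time(map_fn, inputs):
    """Return (per-task durations, total wall-clock time) for one mapping call."""
    start = time.perf_counter()
    per_task = list(map_fn(timed_task, inputs))
    return per_task, time.perf_counter() - start

def compare(inputs, n_workers):
    """Run the workload serially and through a pool; return both measurements."""
    serial = run_and_time(map, inputs)
    # Pool creation is outside the timed region in this sketch.
    with ThreadPool(processes=n_workers) as pool:
        pooled = run_and_time(pool.map, inputs)
    return serial, pooled

if __name__ == "__main__":
    (s_tasks, s_total), (p_tasks, p_total) = compare([0.1] * 4, n_workers=4)
    print("serial: slowest task %.2fs, total %.2fs" % (max(s_tasks), s_total))
    print("pooled: slowest task %.2fs, total %.2fs" % (max(p_tasks), p_total))
```

Each task still takes about as long as it did serially, which is what the question's output shows; only the total for the whole batch is expected to drop.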