Python multiprocessing: caller (and callee) invoked multiple times on Windows XP
Possible Duplicate:
Multiprocessing launching too many instances of Python VM
I'm trying to use Python multiprocessing to parallelize web fetching, but I'm finding that the application calling multiprocessing gets instantiated multiple times, not just the function I want called. This is a problem because the caller depends on a library that is slow to instantiate, losing most of the performance gains from parallelism.
What am I doing wrong or how is this avoided?
my_app.py:

    from url_fetcher import url_fetch, parallel_fetch
    import my_slow_stuff

    if __name__ == '__main__':
        import datetime
        urls = ['http://www.microsoft.com'] * 20
        results = parallel_fetch(urls, fn=url_fetch)
        print([x[:20] for x in results])

my_slow_stuff.py:

    class MySlowStuff(object):
        import time
        print('doing slow stuff')
        time.sleep(0)
        print('done slow stuff')
url_fetcher.py:

    import multiprocessing
    import urllib

    def url_fetch(url):
        #return urllib.urlopen(url).read()
        return url

    def parallel_fetch(urls, fn):
        PROCESSES = 10
        CHUNK_SIZE = 1
        pool = multiprocessing.Pool(PROCESSES)
        results = pool.imap(fn, urls, CHUNK_SIZE)
        return results

    if __name__ == '__main__':
        import datetime
        urls = ['http://www.microsoft.com'] * 20
        results = parallel_fetch(urls, fn=url_fetch)
        print([x[:20] for x in results])
partial output:
$ python my_app.py
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
...
2 Answers
The multiprocessing module on Windows doesn't work the same as on Unix/Linux. On Linux it uses fork, and all of the parent's context is copied/duplicated into the new process, just as it is whenever you fork.

The fork system call does not exist on Windows, so the multiprocessing module has to create a new Python process and load all the modules again. This is why the Python library documentation forces you to use the

if __name__ == '__main__'

trick when using multiprocessing on Windows. The solution in this case is to use threads instead: this is an IO-bound workload, so multiprocessing's main advantage, avoiding GIL problems, does not affect you.

More info at http://docs.python.org/library/multiprocessing.html#windows
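For illustration, here is a minimal thread-based sketch of the question's url_fetcher.py (my own variant, not the answerer's code). It assumes the standard multiprocessing.dummy module, whose Pool exposes the same API as multiprocessing.Pool but is backed by threads in the same process:

    # url_fetcher.py - hypothetical thread-based variant
    from multiprocessing.dummy import Pool  # same Pool API, but threads: no child interpreters spawned

    def url_fetch(url):
        #return urllib.urlopen(url).read()
        return url

    def parallel_fetch(urls, fn):
        THREADS = 10
        CHUNK_SIZE = 1
        pool = Pool(THREADS)
        # imap yields results lazily, in input order
        return pool.imap(fn, urls, CHUNK_SIZE)

Because the worker threads live in the parent interpreter, my_slow_stuff is imported exactly once, and the GIL is released while each thread waits on network IO.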
The Python multiprocessing module behaves slightly differently on Windows because Python doesn't implement os.fork() on this platform. In particular, the global class MySlowStuff always gets evaluated by the newly started child processes on Windows. To fix that, class MySlowStuff should be defined only when __name__ == '__main__'. See 16.6.3.2. Windows for more details.
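As a sketch of that idea (my own restructuring, not code from the answer): rather than literally nesting the class under the guard, the slow work can be moved out of the class body, so that merely importing the module stays cheap in each spawned child:

    # my_slow_stuff.py - hypothetical restructuring with no import-time side effects
    import time

    class MySlowStuff(object):
        def __init__(self):
            # the slow work now runs only when an instance is created,
            # not whenever a child process imports the module
            print('doing slow stuff')
            time.sleep(0)
            print('done slow stuff')

    if __name__ == '__main__':
        MySlowStuff()  # only a direct run pays the cost

With this layout the spawned workers still re-import my_slow_stuff, but importing no longer triggers the slow initialization.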