Python multiprocessing: caller (and callee) invoked multiple times on Windows XP
Possible Duplicate:
Multiprocessing launching too many instances of Python VM
I'm trying to use Python multiprocessing to parallelize web fetching, but I'm finding that the application calling multiprocessing gets instantiated multiple times, not just the function I want called. This is a problem because the caller depends on a library that is slow to instantiate, losing most of the performance gains from parallelism.
What am I doing wrong or how is this avoided?
my_app.py:

    from url_fetcher import url_fetch, parallel_fetch
    import my_slow_stuff

    if __name__ == '__main__':
        import datetime
        urls = ['http://www.microsoft.com'] * 20
        results = parallel_fetch(urls, fn=url_fetch)
        print([x[:20] for x in results])

my_slow_stuff.py:

    class MySlowStuff(object):
        import time
        print('doing slow stuff')
        time.sleep(0)
        print('done slow stuff')
url_fetcher.py:

    import multiprocessing
    import urllib

    def url_fetch(url):
        #return urllib.urlopen(url).read()
        return url

    def parallel_fetch(urls, fn):
        PROCESSES = 10
        CHUNK_SIZE = 1
        pool = multiprocessing.Pool(PROCESSES)
        results = pool.imap(fn, urls, CHUNK_SIZE)
        return results

    if __name__ == '__main__':
        import datetime
        urls = ['http://www.microsoft.com'] * 20
        results = parallel_fetch(urls, fn=url_fetch)
        print([x[:20] for x in results])
partial output:
$ python my_app.py
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
...
2 Answers
The multiprocessing module on Windows doesn't work the same as on Unix/Linux. On Linux it uses fork, and all of the parent's context is copied/duplicated into the new process, just as it is whenever you fork.

The fork system call does not exist on Windows, so the multiprocessing module has to create a new Python process and load all the modules again. This is why the Python library documentation forces you to use the

if __name__ == '__main__'

trick when using multiprocessing on Windows. The solution in this case is to use threads instead: this is an IO-bound workload, so multiprocessing's main advantage, avoiding GIL problems, does not affect you.

More info at http://docs.python.org/library/multiprocessing.html#windows
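For illustration, here is a minimal thread-based sketch of the question's url_fetcher.py (my own variant, not the answerer's code). It assumes the standard multiprocessing.dummy module, whose Pool exposes the same API as multiprocessing.Pool but is backed by threads in the same process:

    # url_fetcher.py - hypothetical thread-based variant
    from multiprocessing.dummy import Pool  # same Pool API, but threads: no child interpreters spawned

    def url_fetch(url):
        #return urllib.urlopen(url).read()
        return url

    def parallel_fetch(urls, fn):
        THREADS = 10
        CHUNK_SIZE = 1
        pool = Pool(THREADS)
        # imap yields results lazily, in input order
        return pool.imap(fn, urls, CHUNK_SIZE)

Because the worker threads live in the parent interpreter, my_slow_stuff is imported exactly once, and the GIL is released while each thread waits on network IO.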
The Python multiprocessing module behaves slightly differently on Windows because Python doesn't implement os.fork() on this platform. In particular, the global class MySlowStuff always gets evaluated by the newly started child processes on Windows. To fix that, class MySlowStuff should be defined only when __name__ == '__main__'. See 16.6.3.2. Windows for more details.
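As a sketch of that idea (my own restructuring, not code from the answer): rather than literally nesting the class under the guard, the slow work can be moved out of the class body, so that merely importing the module stays cheap in each spawned child:

    # my_slow_stuff.py - hypothetical restructuring with no import-time side effects
    import time

    class MySlowStuff(object):
        def __init__(self):
            # the slow work now runs only when an instance is created,
            # not whenever a child process imports the module
            print('doing slow stuff')
            time.sleep(0)
            print('done slow stuff')

    if __name__ == '__main__':
        MySlowStuff()  # only a direct run pays the cost

With this layout the spawned workers still re-import my_slow_stuff, but importing no longer triggers the slow initialization.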