Python: how can I fix this code so it runs on Windows?

Posted 2024-12-15 15:53:44

import lxml.html
import mechanize, cookielib
import multiprocessing

browser = None

def download(i):
    link = 'www.google.com'
    response = browser.open(link)
    tree = lxml.html.parse(response)
    print tree
    return 0

if __name__ == '__main__':    
    browser = mechanize.Browser()
    cookie_jar = cookielib.LWPCookieJar()
    browser.set_cookiejar(cookie_jar)
    browser.set_handle_equiv(True)
    browser.set_handle_gzip(True)
    browser.set_handle_redirect(True)
    browser.set_handle_referer(False) # was initially on, but probably better off
    browser.set_handle_robots(False)
    browser.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    browser.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:2.0.1) Gecko/20100101 Ubuntu/11.04 maverick Firefox/4.0.1')]

    pool = multiprocessing.Pool(None)
    tasks = range(8)
    r = pool.map_async(download, tasks)
    r.wait() # Wait on the results

If I remove the multiprocessing part, it works. If I don't call the browser inside the download function, it also works. However, it seems as if multiprocessing + mechanize is simply not working.

How can I fix this? It doesn't happen under Linux.


Comments (2)

两相知 2024-12-22 15:53:44

Only the main process executes the gated if __name__ == '__main__' block. Since Windows lacks a fork system call, each process created in the pool needs its own browser. You can do this with an initializer function. For reference, see the initializer and initargs options of multiprocessing.Pool.

import lxml.html
import mechanize, cookielib
import multiprocessing as mp

def download(i):
    link = 'http://www.google.com'
    response = browser.open(link)
    tree = lxml.html.parse(response)
    print tree
    return 0

def init(count):
    global browser
    browser = mechanize.Browser()
    cookie_jar = cookielib.LWPCookieJar()
    browser.set_cookiejar(cookie_jar)
    browser.set_handle_equiv(True)
    browser.set_handle_gzip(True)  #warning
    browser.set_handle_redirect(True)
    browser.set_handle_referer(False)
    browser.set_handle_robots(False)
    browser.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), 
                               max_time=1)
    browser.addheaders = [('User-agent', 
        'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:2.0.1) '
        'Gecko/20100101 Ubuntu/11.04 maverick Firefox/4.0.1')]

    count.value -= 1

if __name__ == '__main__':
    import time
    count = mp.Value('I', mp.cpu_count())
    pool = mp.Pool(count.value, initializer=init, initargs=(count,))
    #wait until all processes are initialized
    while count.value > 0:
        time.sleep(0.1)

    tasks = range(8)
    r = pool.map_async(download, tasks)
    r.wait()
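A modern Python 3 sketch of the same initializer pattern, with the mechanize browser replaced by a plain string so the example is self-contained (the question's code would build a mechanize.Browser in init() instead):

```python
import multiprocessing as mp

worker_state = None  # each pool process gets its own copy via init()

def init(prefix):
    # Runs once in every worker process. Under Windows's "spawn" start
    # method, workers re-import the module and skip the __main__ guard,
    # so this is the reliable place to build per-process state
    # (here a string; in the question, a mechanize.Browser instance).
    global worker_state
    worker_state = prefix

def work(i):
    # Uses the per-process state set up by the initializer.
    return '{}-{}'.format(worker_state, i)

if __name__ == '__main__':
    with mp.Pool(2, initializer=init, initargs=('task',)) as pool:
        print(pool.map(work, range(4)))  # ['task-0', 'task-1', 'task-2', 'task-3']
```

Python 3's multiprocessing accepts initializer/initargs exactly as the Python 2 answer above uses them, so the pattern carries over unchanged.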

我们只是彼此的过ke 2024-12-22 15:53:44

I would try to:

  • remove browser = None
    or
  • move the code in the __name__ == "__main__" block into a main() function and add global browser before browser = mechanize.Browser()
    or
  • move the code that initializes the browser into an initializer

If your tasks are I/O bound then you don't necessarily need multiprocessing to make concurrent requests. For example, you could use concurrent.futures.ThreadPoolExecutor, gevent, or Twisted instead.
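For I/O-bound downloads, a thread pool sidesteps the Windows start-method issue entirely, since everything stays in one process. A sketch with concurrent.futures.ThreadPoolExecutor, the network call replaced by a deterministic stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

def download(i):
    # Stand-in for the real I/O work (browser.open + lxml parsing);
    # while a thread waits on the network, the GIL is released, so
    # threads overlap I/O-bound requests effectively.
    return i * 2

def run(tasks, workers=8):
    # executor.map preserves input order, like pool.map in the question.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(download, tasks))

if __name__ == '__main__':
    print(run(range(8)))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

One caveat: a mechanize.Browser keeps mutable state, so if threads share a single instance, guard it with a lock or give each thread its own (for example via threading.local).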

Related: Problem with multi threaded Python app and socket connections

