Python GUI scraper hanging issue
I wrote a scraper in Python a while back, and it worked fine on the command line. I have now made a GUI for the application, but I am having trouble with one issue. When I attempt to update text inside the GUI (e.g. 'fetching URL 12/50'), I am unable to, seeing as the function within the scraper is busy grabbing 100+ links. Also, when going from one scraping function, to a function that should update the GUI, to another scraping function, the GUI update function seems to be skipped over while the next scrape function is run. An example would be:
scrapeLinksA() #takes 20 seconds
updateInfo("LinksA done")
scrapeLinksB() #takes another 20 seconds
In the above example, updateInfo is never executed unless I end the program with a KeyboardInterrupt.
I'm thinking my solution is threading, but I'm not sure. What can I do to fix this?
I am using:
- PyQt4
- urllib2
- BeautifulSoup
Comments (2)
Lukáš Lalinský's answer is very good.
Another possibility would be to use the PyQt threads.
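As a rough illustration of the thread idea, here is a minimal sketch, assuming PyQt4's QThread and new-style signals. The names ScrapeWorker and run_steps are made up for this example, and the two scrape functions are sleeping placeholders standing in for the real ones from the question.

```python
import sys
import time


def scrapeLinksA():
    time.sleep(2)  # placeholder for the real, slow scrape


def scrapeLinksB():
    time.sleep(2)  # placeholder for the real, slow scrape


def run_steps(steps, report):
    """Run each (label, func) pair in order and report when it finishes."""
    for label, func in steps:
        func()
        report("%s done" % label)


def main():
    # Qt is imported here so run_steps can be tried without Qt installed.
    from PyQt4.QtCore import QThread, pyqtSignal
    from PyQt4.QtGui import QApplication, QLabel

    class ScrapeWorker(QThread):
        progress = pyqtSignal(str)  # emitted from the worker thread

        def __init__(self, steps, parent=None):
            QThread.__init__(self, parent)
            self.steps = steps

        def run(self):
            # Executes in the worker thread; widgets are not touched here.
            run_steps(self.steps, self.progress.emit)

    app = QApplication(sys.argv)
    label = QLabel("starting")
    label.show()

    worker = ScrapeWorker([("LinksA", scrapeLinksA), ("LinksB", scrapeLinksB)])
    # The signal is delivered in the GUI thread, so setText is safe here.
    worker.progress.connect(label.setText)
    worker.start()
    return app.exec_()
```

Calling main() from a script starts the window. The scraping runs off the GUI thread, so the event loop stays free to repaint the label each time a step reports in.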
If the problem is merely the 'updating' part (and not the need for asynchronous processing), try putting this call:
between scrapeLinksA and scrapeLinksB to see if that helps (it temporarily interrupts the main event loop to check whether other events, e.g. paint requests, are pending). If it doesn't, please provide us with the source of updateInfo.
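The call itself did not survive on this page. One Qt call that matches the description is QApplication.processEvents(), so the sketch below assumes that is what was meant. The helper scrape_in_steps and the sleeping scrape functions are hypothetical stand-ins, not code from the question.

```python
import sys
import time


def scrapeLinksA():
    time.sleep(2)  # placeholder for the real, slow scrape


def scrapeLinksB():
    time.sleep(2)  # placeholder for the real, slow scrape


def scrape_in_steps(steps, update_info, pump_events):
    """Run each scrape, update the status text, then let the GUI catch up."""
    for label, scrape in steps:
        scrape()
        update_info("%s done" % label)
        pump_events()  # handle pending events (e.g. repaints) before the next step


def main():
    # Qt is imported here so scrape_in_steps can be tried without Qt installed.
    from PyQt4.QtGui import QApplication, QLabel

    app = QApplication(sys.argv)
    label = QLabel("starting")
    label.show()
    QApplication.processEvents()  # let the window appear before blocking

    steps = [("LinksA", scrapeLinksA), ("LinksB", scrapeLinksB)]
    # Assumption: processEvents() is the call the answer refers to.
    scrape_in_steps(steps, label.setText, QApplication.processEvents)
    return app.exec_()
```

Note that this only lets the window repaint between the scrape functions. While one of them is running, the GUI is still frozen, which is where threads or a non-blocking download come in.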
I'd suggest using QNetworkAccessManager for a non-blocking way of downloading the websites. It's a different approach, so you will probably have to rewrite the handling part of your application. Instead of waiting until the page is downloaded so that you can parse it, you have multiple smaller functions, connected via signals, that are executed when certain events happen (e.g. "the page is downloaded").
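To make the signal-driven approach a bit more concrete, here is a minimal sketch, assuming PyQt4's QtNetwork module. The helper progress_text, the pages list and the on_finished handler are names invented for this example, and error handling is left out.

```python
import sys


def progress_text(done, total):
    """Build the status line from the question, e.g. 'fetching URL 12/50'."""
    return "fetching URL %d/%d" % (done, total)


def main(urls):
    # Qt is imported here so progress_text can be tried without Qt installed.
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication, QLabel
    from PyQt4.QtNetwork import QNetworkAccessManager, QNetworkRequest

    app = QApplication(sys.argv)
    label = QLabel(progress_text(0, len(urls)))
    label.show()

    manager = QNetworkAccessManager()
    pages = []  # raw page bodies, ready to hand to a parser such as BeautifulSoup

    def on_finished(reply):
        # Runs in the GUI thread each time one request completes.
        # A failed request also ends up here, with an empty body.
        pages.append(reply.readAll().data())
        label.setText(progress_text(len(pages), len(urls)))
        reply.deleteLater()

    manager.finished.connect(on_finished)

    # get() returns immediately; the downloads proceed while the event loop runs.
    for url in urls:
        manager.get(QNetworkRequest(QUrl(url)))

    return app.exec_()
```

Calling main() with a list of URL strings starts the downloads. Since nothing blocks the event loop, the label can be updated after every finished page instead of only after the whole batch.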