Mechanize and JavaScript
I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM events and AJAX, and so far I've found no way to do that.
I looked at some Python client browsers that support JavaScript, such as Spynner and Zope, but none of them really works for me. Spynner crashes PyQt all the time, and Zope doesn't seem to support JavaScript at all.
Is there a way to simulate browsing with Python only (no extra processes), like WATIR or the libraries that drive Firefox or Internet Explorer, while fully supporting JavaScript as if actually browsing the page?
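To see why Mechanize alone cannot do this: it is an HTTP client that hands you the HTML exactly as the server sent it and never executes `<script>` elements. The sketch below uses the standard library's `html.parser` as a stand-in for that kind of static parsing. The sample page, the `out` id and the helper names are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page: #out is empty in the HTML and only filled in by JavaScript.
PAGE = """
<html><body>
  <div id="out"></div>
  <script>
    document.getElementById("out").textContent = "Loaded by JS";
  </script>
</body></html>
"""

# Elements that never get a closing tag, so they must not change the depth.
VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the element with a given id, without running JS."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_text(html, element_id):
    """Text of element_id as a non-JS client such as Mechanize would see it."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(repr(static_text(PAGE, "out")))  # '' : the script never ran
```

In a real browser the same element would read "Loaded by JS"; closing that gap requires an actual JavaScript engine and DOM.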
I've played with a new alternative to Mechanize (which I love) called PhantomJS.
It is a full WebKit browser like Safari or Chrome, but it is headless and scriptable. You script it with JavaScript, not Python (as far as I know, at least).
There are some example scripts to get you started. It's a lot like using Firebug. I've only spent a few minutes with it, but I found I was quite productive right from the start.
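If you still want to drive it from Python, one option is to write a small PhantomJS script and run the `phantomjs` binary as a subprocess. This is a sketch under assumptions: the helper names are hypothetical, it assumes `phantomjs` is on your PATH, and it does start an extra process, which the question hoped to avoid. Note also that PhantomJS development has since been suspended. The `page.open` callback fires when the page has loaded, so content that arrives later via AJAX may need extra waiting inside the script.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# PhantomJS script: open the URL passed on the command line, let the page's
# JavaScript run, then print document.title and exit.
PHANTOM_SCRIPT = """\
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        console.log('FAILED to load ' + system.args[1]);
        phantom.exit(1);
    } else {
        var title = page.evaluate(function () { return document.title; });
        console.log(title);
        phantom.exit(0);
    }
});
"""


def rendered_title(url, timeout=30):
    """Return the title after scripts ran, or None if phantomjs is not installed."""
    exe = shutil.which("phantomjs")
    if exe is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "title.js"
        script.write_text(PHANTOM_SCRIPT)
        # check=True raises CalledProcessError if the page failed to load.
        result = subprocess.run(
            [exe, str(script), url],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    return result.stdout.strip()
```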
From http://wwwsearch.sourceforge.net/mechanize/faq.html#general
Basically, if you want something that deals with JavaScript, you need a real JavaScript engine, and that invariably means automating a real browser (I'm including headless ones here).
Java’s HtmlUnit doesn't do a very good job, as it doesn't use the JavaScript engine from an actual browser. PhantomJS sounds ideal (as newz2000 points out); however, I find that when manipulating pages with JavaScript, it can be very difficult to debug your script if you can't actually see the page you're dealing with.
This leads to solutions such as Selenium WebDriver, which has a full Python API for automating various browsers. However, you must run a Java jar and it actually launches the browser, so it is not the pure-Python solution you're after (but I think it is as close as you can get).
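A minimal sketch of the Selenium approach, assuming Selenium 4 and a local Firefox. With current Python bindings a local browser is driven through its driver executable (geckodriver for Firefox), which recent releases can locate via Selenium Manager; the Java server is mainly needed for remote or Grid setups. The helper names are hypothetical, and the "AJAX finished" check assumes the page uses jQuery (`jQuery.active`); pages built on `fetch()` or another library need a different condition.

```python
def page_settled(driver):
    """Wait condition: document fully loaded and no jQuery AJAX requests pending.

    Assumes the page reports AJAX activity through jQuery.active; if jQuery is
    absent the check falls back to document.readyState alone.
    """
    ready, active = driver.execute_script(
        "return [document.readyState,"
        " (typeof jQuery !== 'undefined') ? jQuery.active : 0];"
    )
    return ready == "complete" and active == 0


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Firefox and return the DOM after scripts have run."""
    # Imported here so page_settled can be used (and tested) without Selenium.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # WebDriverWait calls page_settled(driver) until it returns True.
        WebDriverWait(driver, timeout).until(page_settled)
        return driver.page_source
    finally:
        driver.quit()
```

Dropping the `-headless` argument shows the browser window, which helps with the debugging problem mentioned above.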
You can use Selenium with Python. You can then scrape JavaScript-generated content and manipulate the page with additional JavaScript (or with Python).
You can run the code in a Python REPL and use autocomplete to discover the methods available on browser (your WebDriver instance) or on whatever element you have selected. Or do something like print(dir(browser)) to see what is available.
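The output of dir() is noisy, with dunder names and plain attributes mixed in. A small helper (hypothetical, not part of Selenium) can keep only public methods. It uses inspect.getattr_static so that properties such as a driver's title are not evaluated, since on a live WebDriver that would send commands to the browser.

```python
import inspect


def public_methods(obj):
    """Like dir(obj), but only public callables; handy on a WebDriver or WebElement."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue
        # getattr_static returns the raw class attribute without running
        # property getters, so we can skip properties safely.
        if isinstance(inspect.getattr_static(obj, name, None), property):
            continue
        if callable(getattr(obj, name, None)):
            names.append(name)
    return names
```

In the REPL you would then call print(public_methods(browser)) instead of print(dir(browser)).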
An example of how to use PyV8 to run JS on a DOM with Python can be found here:
https://github.com/buffer/thug
It should be fairly easy to make this run together with Mechanize.
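The Mechanize half of such a combination could look roughly like this: fetch the page, pull out the inline `<script>` bodies, and hand each one to a JavaScript engine. This is only a sketch: the helper names are hypothetical, external `src=` scripts are skipped, and the engine call itself is left out because it depends on the binding you use. Scripts that touch `document` or `window` will fail in a bare engine; an emulated DOM like the one thug provides is what makes them work.

```python
from html.parser import HTMLParser


class InlineScripts(HTMLParser):
    """Collect the bodies of inline <script> elements (src= scripts are skipped)."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._buf = None  # list of text chunks while inside an inline <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script" and "src" not in dict(attrs):
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.scripts.append("".join(self._buf))
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def extract_inline_scripts(html):
    """Return the non-empty inline script bodies found in html."""
    parser = InlineScripts()
    parser.feed(html)
    parser.close()
    return [s.strip() for s in parser.scripts if s.strip()]
```

With Mechanize, the HTML would come from something like mechanize.Browser().open(url).read().decode(), and each string returned by extract_inline_scripts would then be evaluated in the engine's context.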