Mechanize and JavaScript

Posted on 2024-11-03 14:03:58

I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM Events and AJAX, and so far I've found no way to do that.

I looked at some Python client browsers that support JavaScript, such as Spynner and Zope, and none of them really works for me. Spynner crashes PyQt all the time, and Zope doesn't seem to support JavaScript.

Is there a way to simulate browsing with Python only (no extra processes), in the manner of WATIR or libraries that manipulate Firefox or Internet Explorer, while supporting JavaScript fully, as if actually browsing the page?

Comments (5)

我们的影子 2024-11-10 14:03:58

I've played with a new alternative to Mechanize (which I love) called PhantomJS.

It is a full WebKit browser like Safari or Chrome, but it is headless and scriptable. You script it with JavaScript, not Python (as far as I know, at least).

There are some example scripts to get you started. It's a lot like using Firebug. I've only spent a few minutes using it, but I found I was quite productive right from the start.

墨洒年华 2024-11-10 14:03:58

From http://wwwsearch.sourceforge.net/mechanize/faq.html#general

If you come across this in a page you want to automate, you have four options. Here they are, roughly in order of simplicity.

1. Figure out what the JavaScript is doing and emulate it in your Python code: for example, by manually adding cookies to your CookieJar instance, calling methods on HTMLForms, calling urlopen, etc. See above re forms.

2. Use Java’s HtmlUnit or HttpUnit from Jython, since they know some JavaScript.

3. Instead of using mechanize, automate a browser instead. For example use MS Internet Explorer via its COM automation interfaces, using the Python for Windows extensions, aka pywin32, aka win32all (e.g. simple function, pamie; pywin32 chapter from the O’Reilly book) or ctypes (example). This kind of thing may also come in useful on Windows for cases where the automation API is lacking. For Firefox, there is PyXPCOM.

4. Get ambitious and automatically delegate the work to an appropriate interpreter (Mozilla’s JavaScript interpreter, for instance). This is what HtmlUnit and httpunit do. I did a spike along these lines some years ago, but I think it would (still) be quite a lot of work to do well.
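
The first option in the FAQ (emulating by hand what the JavaScript does) can be sketched with the Python 3 standard library alone. This is only an illustration under assumptions: the cookie name js_token, its value, the example.com URLs and the form fields are all hypothetical, and the sketch uses http.cookiejar and urllib rather than mechanize itself. Nothing is sent over the network; the code only builds the requests a script on the page might have produced.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie like one a page script might set via document.cookie."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical: the page's JavaScript sets a "js_token" cookie before any AJAX call.
jar = CookieJar()
jar.set_cookie(make_cookie("js_token", "abc123", "example.com"))

# An opener that would send the jar's cookies on every request it opens.
opener = build_opener(HTTPCookieProcessor(jar))

# Check which Cookie header the jar would attach, without opening a connection.
page_request = Request("http://example.com/data")
jar.add_cookie_header(page_request)
cookie_header = page_request.get_header("Cookie")

# Hypothetical AJAX call: a form-encoded POST carrying the header that many
# JavaScript libraries add to XMLHttpRequest calls.
ajax_request = Request(
    "http://example.com/ajax/search",
    data=urlencode({"q": "mechanize", "page": 2}).encode("ascii"),
    headers={"X-Requested-With": "XMLHttpRequest"},
)

print(cookie_header)
print(ajax_request.get_method(), ajax_request.data)
# To actually send it: response = opener.open(ajax_request)
```

Finding out which cookies and requests to reproduce usually means watching the real page in a browser's network inspector first; the approach stops being practical once the page's scripts compute values that are hard to replicate.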

淤浪 2024-11-10 14:03:58

Basically, if you want something that deals with JavaScript, you need a real JavaScript engine, and that invariably involves automating a real browser (I'm including headless ones in this).

Java’s HtmlUnit doesn't do a very good job, as it doesn't use a JavaScript engine from an actual browser. PhantomJS sounds ideal (as newz2000 points out); however, I find that when manipulating pages with JavaScript it can be very difficult to debug your script if you can't actually see the page you're dealing with.

This leads to solutions such as Selenium WebDriver, which has a full Python API to automate various browsers. However, you must run a Java jar and it actually launches the browser, so it is not the pure Python solution you're after (but I think this is as close as you can get).

咋地 2024-11-10 14:03:58

You can use Selenium with Python. You can then scrape JavaScript-generated content as well as manipulate the page with additional JavaScript (as well as Python).

# In your virtualenv: pip install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch Firefox GUI
browser = webdriver.Firefox()

# Alternatively, you can drive PhantomJS without a GUI
# With Node.js installed: `npm install -g phantomjs`
# (webdriver.PhantomJS was removed in Selenium 4, so this needs an older release)
# browser = webdriver.PhantomJS()

# Fetch a webpage
browser.get('http://example.com')

# If you need the whole HTML document
# just like inspecting the rendered page with the console
html = browser.page_source

# Get an element, even if it was created with JS
# (current Selenium 4 releases use find_element with By
# instead of find_element_by_css_selector)
button = browser.find_element(
    By.CSS_SELECTOR, 'div.some-class > input.the-submit-button')

# Click on something
button.click()

# Execute some JavaScript (assumes jQuery is loaded on the page)
browser.execute_script("$('html, body').animate({ scrollTop: 500 }, 50);")

You can run the code in a Python REPL and use autocomplete to discover the methods available on browser or whatever element you have selected. Or do something like print(dir(browser)) to see what is available.

千鲤 2024-11-10 14:03:58

An example of how to use PyV8 to run JS on a DOM with Python can be found here:

https://github.com/buffer/thug

It should be fairly easy to make this run together with mechanize.
