Programmatic Python browser with JavaScript
I want to screen-scrape a website that uses JavaScript.
There is mechanize, the programmatic web browser for Python. However, it (understandably) doesn't interpret JavaScript. Is there any programmatic browser for Python which does? If not, is there any JavaScript implementation in Python that I could use to attempt to create one?
You might be better off using a tool like Selenium to automate the scraping using a web browser, so the JS executes and the page renders just like it would for a real user.
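As a rough illustration of the Selenium approach, here is a minimal sketch. It assumes the third-party `selenium` package (version 4 or later) and a local Firefox install; the helper names `fetch_rendered_html` and `make_headless_firefox` are made up for this example and are not part of Selenium.

```python
def fetch_rendered_html(driver, url):
    """Load `url` in a WebDriver-like object and return the current DOM as HTML.

    `driver` only needs a `get(url)` method and a `page_source` attribute,
    which is what Selenium's WebDriver objects provide.
    """
    driver.get(url)
    # page_source reflects the DOM after the scripts that have run so far,
    # not just the raw HTML sent by the server.
    return driver.page_source


def make_headless_firefox():
    """Start a headless Firefox session (assumes Selenium 4+ and Firefox)."""
    # Imported lazily so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

Typical use would be `driver = make_headless_firefox()`, then `fetch_rendered_html(driver, url)`, then `driver.quit()` in a `finally` block. Pages that load data asynchronously may need an explicit wait before reading `page_source`.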
The PyV8 package nicely wraps Google's V8 Javascript engine for Python. It's particularly nice because not only can you call from Python to Javascript code, but you can call back from Javascript to Python code. This makes it quite straightforward to implement the usual browser-supplied objects (that is, everything in the Javascript global namespace: "window", "document", and so on), which you'd need to do if you were going to make a Javascript-capable Python browser emulator thing, possibly by hooking this up with mechanize.
My favorite is PyPhantomJS. It's written using Python and PyQt4. It's completely headless and you can control it completely from JavaScript.
However, if you are looking to actually see the page, you can use QWebView from PyQt4 as well.
There is also spynner, "a stateful programmatic web browser module for Python with Javascript/AJAX support based on the QtWebkit framework": http://code.google.com/p/spynner/
You could also try defining Chickenfoot page triggers on the pages in question, executing whatever operations you want on the page and saving the results of the operation to a local file, and calling Firefox from the command line inside your program, followed by reading the file.
I recommend that you take a look at some of the options available to you at http://wiki.python.org/moin/WebBrowserProgramming - surprisingly, this is coming up as a common question (I've found three on Stack Overflow today by searching for the words "python browser" on Google). If you do the same, you'll find the other answers I gave.
You may try zope.testbrowser:
http://pypi.python.org/pypi?:action=display&name=zope.testbrowser
Playwright or pyppeteer are both reasonably good, and use headless Chromium to render pages and interpret JavaScript.
I'd pick Playwright out of the two, simply because it's backed by a larger entity, and supports Chromium/Firefox/WebKit out of the box.
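For what it's worth, a minimal sketch of the Playwright route might look like the following. It assumes the third-party `playwright` package with its browsers installed (`playwright install`); the helper names `rendered_html` and `scrape` are made up for this example.

```python
def rendered_html(page, url):
    """Navigate a Playwright-style page to `url` and return the rendered HTML.

    `page` only needs `goto(url)` and `content()`, which Playwright's
    sync Page object provides.
    """
    page.goto(url)
    # content() serializes the DOM as it stands after scripts have run.
    return page.content()


def scrape(url):
    """Open headless Chromium, render `url`, and return the resulting HTML."""
    # Imported lazily so rendered_html() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        try:
            page = browser.new_page()
            return rendered_html(page, url)
        finally:
            browser.close()
```

Swapping `p.chromium` for `p.firefox` or `p.webkit` should be enough to try the other engines.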