Python 如何与 javascript 一起工作
我正在开发一个 scrapy 应用程序来抓取网页上的一些数据
,但是有一些数据是由 ajax 加载的,因此 python 无法执行它来获取数据。
有没有模拟浏览器行为的库?
I am working on a scrapy app to scrapte some data on a web page
But there is some data loaded by ajax, and thus python just cannot execute that to get the data.
Is there any lib that simulate the behavior of a browser?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
为此,您必须使用成熟的 Javascript 引擎(例如 Chrome 中的 Google V8),才能获得浏览器的真正功能及其交互方式。但是,您可以通过查找源中的所有 URL 并向每个 URL 发出请求来获取一些信息,希望获得一些有效数据。但总的来说,如果没有完整的 Javascript 引擎,你就会陷入困境。
类似于 python-spidermonkey。 Mozilla Javascript 引擎的包装器。然而,使用它可能相当复杂,但这取决于您的具体应用程序。
你基本上必须构建一个浏览器,但 Python 开发者似乎已经让这一切变得简单了。使用 PyWebkitGtk 你会得到 dom 并使用前面提到的 python-spidermonkey 或 PyV8 理论上您可以获得浏览器/网络爬虫所需的全部功能。
For that you'd have to use a full-blown Javascript engine (like Google V8 in Chrome), to get the real functionality of the browser and how it interacts. However, you could possibly get some information by looking up all URLs in the source and doing a request to each, hoping for some valid data. But in overall, you're stuck without a full Javascript engine.
Something like python-spidermonkey. A wrapper to the Javascript engine of Mozilla. However using it might be rather complicated, but that's dependant on your specific application.
You'd basically have to build a browser, but seems Python-people have made it simple. With PyWebkitGtk you'd get the dom and using either python-spidermonkey mentioned before or PyV8 mentioned by Duncan you'd theoretically get the full functionality needed for a browser/webscraper.
问题是您不仅需要能够执行一些 Javascript(这很简单),您还必须模拟浏览器 DOM,这需要大量工作。
如果您希望能够运行 Javascript,那么您可以使用 PyV8。使用
easy_install PyV8
安装它,然后您可以执行任何独立的 javascript:您还可以传入 Python 中定义的类,因此原则上可能可以模拟足够的 DOM 来满足您的目的。
The problem is that you don't just have to be able to execute some Javascript (that's easy), you also have to emulate the browser DOM and that's a lot of work.
If you want to be able to run Javascript then you can use PyV8. Install it with
easy_install PyV8
and then you can execute any standalone javascript:You can also pass in classes defined in Python, so in principle might be able could emulate enough of the DOM for your purposes.
AJAX 请求是异步执行的普通 Web 请求。您所需要的只是 JavaScript 代码发送到服务器的 URL。使用该 URL 和 urllib 来获取相同的数据。
An AJAX request is a normal web request which is executed asynchronously. All you need is the URL which the JavaScript code sends to the server. Use that URL with urllib to get at the same data.
完成工作的最简单方法是使用“PyExecJS”。
https://pypi.python.org/pypi/PyExecJS
PyExecJS 是 ExecJS 从 Ruby 的移植。 PyExecJS 自动选择可用于评估 JavaScript 程序的最佳运行时,然后将结果作为 Python 对象返回给您。
我使用 Macbook 并安装了 node.js,因此 pyexecjs 可以使用 node.js javascript 运行时。
pip install PyExecJS
测试代码:
import execjs
execjs.eval("'红黄蓝'.split(' ')")
祝你好运!
The simplest way to get work done is by using 'PyExecJS'.
https://pypi.python.org/pypi/PyExecJS
PyExecJS is a porting of ExecJS from Ruby. PyExecJS automatically picks the best runtime available to evaluate your JavaScript program, then returns the result to you as a Python object.
I use Macbook and installed node.js so pyexecjs can use node.js javascript runtime.
pip install PyExecJS
Test code:
import execjs
execjs.eval("'red yellow blue'.split(' ')")
Good luck!
2020 年的一点更新
我查看了 Google 搜索推荐的结果。结果最好的选择是 #4,然后 #1 #2 已被弃用!
已弃用的
https://github.com/sony/v8eval两年来几乎没有得到维护。需要 20 分钟才能构建并失败...(为此提交了错误报告 https://github .com/sony/v8eval/issues/34https://github.com/doloopwhile/PyExecJS
已被所有者终止A little update for 2020
i reviewed the results recommended by Google Search. Turns out the best choice is #4 and then #1 #2 are deprecated!
Deprecated
https://github.com/sony/v8evalhas received very little maintenance for 2 years. takes 20 minutes to build and fails... (filed a bug report for that https://github.com/sony/v8eval/issues/34https://github.com/doloopwhile/PyExecJS
is discontinued by the owner