Python scraper for JavaScript?
Can anyone point me to a good Python screen-scraping library that can handle JavaScript (ideally one with good documentation or tutorials)? I'd like to see what options are out there, but above all I want the one that is easiest to learn and gets results fastest. Does anyone have experience with this? I've heard a bit about SpiderMonkey, but maybe there are better options?
Specifically, I use BeautifulSoup and Mechanize to get this far, but I need a way to open the JavaScript popup, submit data, and download and parse the results shown in that popup:
<a href="javascript:openFindItem(12510109)" onclick="s_objectID=&quot;javascript:openFindItem(12510109)_1&quot;;return this.s_oc?this.s_oc(e):true">Find Item</a>
I'd like to implement this with Google App Engine and Django. Thanks!
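Before reaching for a JavaScript engine, it can be worth checking whether the popup is just an ordinary URL that the page's JavaScript opens. The sketch below assumes that is the case: it uses the standard-library `html.parser` to pull the function name and arguments out of `javascript:` links such as the one above, then builds (but does not send) a POST request for the popup. The URL template and the form field names are hypothetical; the real URL has to be read from the page's definition of `openFindItem()` in its script source.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request

# Matches a javascript: pseudo-URL that calls one function with simple
# comma-separated arguments, e.g. "javascript:openFindItem(12510109)".
JS_CALL = re.compile(r"^javascript:\s*([A-Za-z_$][\w$]*)\((.*)\)\s*;?\s*$")

# HYPOTHETICAL: replace with the URL that openFindItem() actually opens,
# as found in the page's <script> source.
POPUP_URL_TEMPLATE = "https://www.example.com/findItem?id={item_id}"


class JsLinkParser(HTMLParser):
    """Collects (link text, function name, argument list) for javascript: links."""

    def __init__(self):
        super().__init__()
        self.calls = []
        self._current = None  # (func, args) of the <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        match = JS_CALL.match(href)
        if match:
            func, raw_args = match.groups()
            # Strip whitespace and surrounding quotes from each argument.
            args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
            self._current = (func, args)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            func, args = self._current
            self.calls.append(("".join(self._text).strip(), func, args))
            self._current = None


def find_js_calls(html):
    """Return every javascript: link in `html` as (text, function, args)."""
    parser = JsLinkParser()
    parser.feed(html)
    parser.close()
    return parser.calls


def build_popup_request(item_id, form_fields):
    """Build a POST request to the (hypothetical) popup URL with form data."""
    url = POPUP_URL_TEMPLATE.format(item_id=item_id)
    data = urlencode(form_fields).encode("ascii")
    return Request(url, data=data, method="POST")


if __name__ == "__main__":
    html = ('<a href="javascript:openFindItem(12510109)" '
            'onclick="s_objectID=&quot;javascript:openFindItem(12510109)_1&quot;;'
            'return this.s_oc?this.s_oc(e):true">Find Item</a>')
    calls = find_js_calls(html)
    print(calls)  # [('Find Item', 'openFindItem', ['12510109'])]
    req = build_popup_request(calls[0][2][0], {"qty": "1"})  # "qty" is a made-up field
    print(req.get_method(), req.full_url)
```

If the popup really is a plain URL, the request can be sent with `urllib.request.urlopen` (or the URL handed to Mechanize) and the response parsed with BeautifulSoup as usual. If the popup's content is generated by JavaScript itself, this approach will not be enough, and a real browser engine is needed.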
Comments (3)
What I usually do in these cases is automate an actual browser and grab the processed HTML from it.
Edit:
Here's an example of automating Internet Explorer to navigate to a URL and grab the page title and location once the page has loaded.
I use the Python bindings to WebKit to render basic JavaScript, and Chickenfoot for more advanced interactions. See this WebKit example for more info.
You can also use a "programmatic web browser" called Spynner. I found it to be the best solution, and it is relatively easy to use.