I need to write some scripts that access some websites. A script from the command line would get some pages, post some forms, screen-scrape some information, etc.
It cannot really be a library "browser" like libwww-perl, because some steps might require user interactions (CAPTCHAs, Ajax-only forms, any interaction surprises, etc.).
The most practical way I can think of would be remotely opening a tab in Firefox, and injecting JavaScript code into it, something a bit like what Greasemonkey and Selenium do. It doesn't necessarily have to be for Firefox and can be a different browser if that's easier.
So what would be the best way to do that?
Have you considered Selenium Remote Control? I've automated browser interaction with it before, and it works very well and offers a lot of flexibility.
Depending on your exact needs, you might be able to leverage Selenium IDE, an easy-to-use Firefox plugin that makes scripting simple.
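As a rough sketch of driving Selenium Remote Control from a command-line script: the legacy RC server accepted commands as form-encoded HTTP POSTs to `/selenium-server/driver/`, with the command name in `cmd`, positional arguments in `1`, `2`, and so on, plus a `sessionId`, and it replied with text starting with `OK`. The code below assumes that protocol, a server on `localhost:4444`, and the command names `getNewBrowserSession`, `open`, `getEval` and `testComplete`; all helper names are hypothetical, and the command list should be checked against the RC documentation for your Selenium version.

```python
"""Sketch of a client for the legacy Selenium RC HTTP command protocol."""
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a Selenium RC server is listening on its usual port 4444.
DRIVER_URL = "http://localhost:4444/selenium-server/driver/"


def build_command_body(verb, args, session_id=None):
    """Form-encode a command as cmd=<verb>&1=<arg1>&2=<arg2>&sessionId=<id>."""
    fields = [("cmd", verb)]
    fields += [(str(i), arg) for i, arg in enumerate(args, start=1)]
    if session_id is not None:
        fields.append(("sessionId", session_id))
    return urlencode(fields)


def parse_reply(text):
    """Return the value after 'OK,' (or '' for a bare 'OK'); raise on anything else."""
    if not text.startswith("OK"):
        raise RuntimeError("Selenium RC error: " + text)
    return text[3:] if text.startswith("OK,") else ""


def do_command(verb, args, session_id=None, url=DRIVER_URL):
    """POST one command to the RC server and return the parsed reply."""
    body = build_command_body(verb, args, session_id).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    with urlopen(Request(url, data=body, headers=headers)) as response:
        return parse_reply(response.read().decode("utf-8"))


def demo():
    """Hypothetical session: needs a running RC server, so it is not called here."""
    # Arguments: browser launcher, base URL, extension JavaScript (empty here).
    sid = do_command("getNewBrowserSession", ["*firefox", "http://example.com/", ""])
    try:
        do_command("open", ["/"], sid)
        # getEval evaluates a JavaScript snippet and returns its result as text.
        print(do_command("getEval", ["window.document.title"], sid))
    finally:
        do_command("testComplete", [], sid)
```

Because every step is an ordinary HTTP request, a script can pause between commands (for example to let a human solve a CAPTCHA in the opened Firefox window) and then carry on with the same `sessionId`.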
You can use XPCOM to extend Firefox in just about any way imaginable. You could perhaps write some kind of interface that connects Firefox to another process.
I'm not sure what the "best" way to do it would be, but one possibility would be to use AppleScript for the job. Firefox, however, doesn't have extensive scripting capabilities. If you are willing to use Safari, there is an AppleScript command for injecting JavaScript code into a document: the `do JavaScript` command (look it up in Safari's scripting dictionary, available from within Script Editor). Also, to run AppleScripts from the command line, use `osascript`.
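A minimal Python sketch of calling `do JavaScript` through `osascript` from a command-line script. It builds the equivalent of running `osascript -e 'tell application "Safari" to do JavaScript "..." in document 1'` in a shell, escaping the JavaScript as an AppleScript string literal. The helper names are hypothetical, it only works on macOS, and newer Safari versions may additionally require allowing JavaScript from Apple Events in the Develop menu.

```python
import subprocess
import sys


def applescript_string(text):
    """Quote text as an AppleScript string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def safari_js_command(js):
    """Build the osascript argv that runs `js` in Safari's front document."""
    script = ('tell application "Safari" to do JavaScript '
              + applescript_string(js) + " in document 1")
    return ["osascript", "-e", script]


def run_in_safari(js):
    """Run `js` in Safari via osascript and return the printed result (macOS only)."""
    if sys.platform != "darwin":
        raise OSError("osascript is only available on macOS")
    completed = subprocess.run(safari_js_command(js), capture_output=True,
                               text=True, check=True)
    return completed.stdout.strip()
```

Building the argument list separately from running it keeps the quoting logic testable on any platform, and passing a list to `subprocess.run` avoids a second layer of shell quoting.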
To write scripts on OS X there are two ways I would recommend, and both of them are in Ruby. The first is Watir, an automated testing framework that can control both Firefox and Safari on Mac OS X.
Another, perhaps better, way to do screen scraping would be to use Hpricot, an HTML parser that is really easy to use.
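Hpricot itself is a Ruby library; to keep to one language here, the sketch below swaps in Python's standard-library `html.parser` to illustrate the same idea of pulling data out of fetched HTML. It collects link targets and named `<input>` values, such as hidden form tokens you would need to send back when posting a form. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndFieldScraper(HTMLParser):
    """Collect link targets and named form inputs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "input" and "name" in attributes:
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def scrape(html):
    """Return (links, fields) found in an HTML string."""
    parser = LinkAndFieldScraper()
    parser.feed(html)
    parser.close()
    return parser.links, parser.fields
```

Self-closing tags such as `<input ... />` are handled too, because the default `handle_startendtag` forwards them to `handle_starttag`.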
Behind the scenes, Watir uses JSSh, a TCP/IP JavaScript Shell Server for Firefox, to do this. JSSh lets you control the browser from a telnet session.
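To illustrate what controlling the browser from a telnet session amounts to, here is a sketch of a tiny JSSh-style client in Python: it opens a TCP connection, discards the welcome banner, sends one JavaScript statement per line and reads each reply up to the shell prompt. The port 9997 and the `> ` prompt are assumptions, so adjust both to whatever your JSSh install actually uses; `jssh_eval` and `read_until_prompt` are hypothetical names.

```python
import socket

# Assumptions: the shell listens on localhost:9997 and prints "> " as its
# prompt after the banner and after every result.
JSSH_HOST, JSSH_PORT, PROMPT = "127.0.0.1", 9997, b"> "


def read_until_prompt(sock, prompt=PROMPT):
    """Read from the socket until the buffer ends with the shell prompt."""
    buffer = b""
    while not buffer.endswith(prompt):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("shell closed the connection")
        buffer += chunk
    return buffer[: -len(prompt)].decode("utf-8").strip()


def jssh_eval(statements, host=JSSH_HOST, port=JSSH_PORT):
    """Send each JavaScript statement on its own line and collect the replies."""
    with socket.create_connection((host, port), timeout=10) as sock:
        read_until_prompt(sock)  # discard the welcome banner
        replies = []
        for statement in statements:
            sock.sendall(statement.encode("utf-8") + b"\n")
            replies.append(read_until_prompt(sock))
        return replies
```

Waiting for the prompt before sending the next line keeps requests and replies paired up; the simple `endswith` check would be fooled by a result that itself ends in `> `, which is acceptable for a sketch but not for production use.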
Whichever way you go, though, CAPTCHAs will stop you if there are any. That's sort of the whole point of them :-)