How to control Firefox from R to handle AJAX/JavaScript
I'm trying to figure out a way to control a browser (preferably Firefox) from R scripts in order to retrieve information that websites populate via AJAX/JavaScript. For example, how could I retrieve the values in the "Modell" field at http://www.mobile.de/home/index.html?
AFAIU, Gabe Becker's package "RFirefox" provides some sort of link between R and Firefox. But being a Windows kid (not by conviction, but due to longstanding network effects ;-)), I haven't been able to try it myself yet, so I'm not sure whether it can do what I'm after.
So: does anyone out there have experience with either RFirefox or handling AJAX via R? I don't want you to do my homework, but before I plunge into the Linux world I'd like to assess whether it's worth it.
Nevertheless, any code examples would be greatly appreciated. ;-)
Comments (1)
I'm not clear on why you need a browser to do this. It's just web scraping; it will certainly require some kind of parser, but not necessarily a browser. I think that RFirefox may be barking up the wrong tree. If you want to play with JavaScript+R connections, take a look at Duncan Temple Lang's SpiderMonkey.
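To illustrate the "parser, not browser" point, here is a minimal sketch in Python (chosen only because Python comes up later in this answer as a bridge language) that pulls the options of a drop-down out of HTML using nothing but the standard library. The markup, the `model` field name, and the option values are all made up for illustration. The sketch also only applies when the options are present in the HTML the server returns; if the page fills them in via AJAX, you would instead have to request whatever URL the page's script calls, which is not known here.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the page's "Modell" drop-down;
# the real page's element names and values are not known here.
SAMPLE_HTML = """
<form>
  <select name="make">
    <option value="1">Make A</option>
  </select>
  <select name="model">
    <option value="">Any</option>
    <option value="10">Model X</option>
    <option value="20">Model Y</option>
  </select>
</form>
"""


class SelectOptionParser(HTMLParser):
    """Collect (value, label) pairs for the options of one named <select>.

    Assumes every <option> has an explicit closing tag, as in SAMPLE_HTML.
    """

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_target_select = False
        self.current_value = None  # value attribute of the currently open <option>
        self.current_label = []    # text chunks seen inside that <option>
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.in_target_select = attrs.get("name") == self.select_name
        elif tag == "option" and self.in_target_select:
            self.current_value = attrs.get("value") or ""
            self.current_label = []

    def handle_data(self, data):
        # Only keep text that appears inside an <option> of the target <select>.
        if self.current_value is not None:
            self.current_label.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self.current_value is not None:
            label = "".join(self.current_label).strip()
            self.options.append((self.current_value, label))
            self.current_value = None
        elif tag == "select":
            self.in_target_select = False


def extract_options(html, select_name):
    """Return the (value, label) pairs of the <select> with the given name."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    parser.close()
    return parser.options


print(extract_options(SAMPLE_HTML, "model"))
```

For real pages, a more forgiving HTML parser would probably be a better choice than this hand-rolled one; the point is only that no browser is involved.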
Even so, I think it may be better to collect the data with a more serious crawling/scraping facility suited to working with JavaScript. This question on SO seems particularly aligned with that. My recommendation would be to get a tool that does what you need, and then interface it with R at the simplest level possible. There are bindings for Webkit in several languages, although there don't seem to be any for R.
This question addresses your situation even more closely: it is also on Windows, and it doesn't use Webkit. The three suggestions in the accepted answer refer to accessing tools written in C/C++ from Python. R has interfaces to both, so you may find it easier to write some glue code that works with these tools and passes objects and instructions back and forth between R and Python or C/C++.
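As a rough sketch of what interfacing "at the simplest level possible" could look like, the Python side can simply serialize whatever it scraped to CSV, which R can read without any special bridge package. The records and the `records_to_csv` helper below are hypothetical placeholders, not part of any of the tools mentioned above.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up records standing in for whatever the scraper collected.
scraped = [
    {"value": "10", "label": "Model X"},
    {"value": "20", "label": "Model Y"},
]

# Print to stdout so the output can be redirected to a file.
print(records_to_csv(scraped, ["value", "label"]), end="")
```

If the output is redirected to a file, R's `read.csv` should be able to load it as a data frame. A tighter coupling between R and Python is possible, but a plain file hand-off keeps the two sides independent and easy to debug.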