I have the following situation.
We are using a LaTeX-based documentation system which needs to obtain information that is posted to a Jive (Clearspace) webpage. This information is provided on a webpage rendered with JavaScript, so it cannot be retrieved properly using conventional tools like cURL, wget, or lynx.
I am looking for some means to do the following (bash/command line preferred):
1) Login to the Jive server with appropriate credentials (will probably need to save a cookie)
2) Access the specific page and dump the relevant information to a text file
Any ideas/suggestions? I typically do most of these automated procedures in bash because it makes integration with LaTeX (compiled from the command line) much easier. However, I am open to other routes with a bit of guidance.
It is possible to post and save cookies with wget (and presumably cURL). Here is an example from the wget man page:
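A login step in the spirit of the man-page example might look like the following. The host, the login path, and the form field names (`username`, `password`) are assumptions here; inspect your Jive server's login form to find the real values.

```shell
#!/usr/bin/env bash
# Log in by POSTing the credentials the login form expects, and save the
# session cookies for later requests. URL and field names are placeholders.
wget --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data 'username=myuser&password=mypass' \
     -O /dev/null \
     'https://jive.example.com/login.jspa'
```

`--keep-session-cookies` matters here: login sessions are usually tracked with session (non-persistent) cookies, which wget discards by default when saving.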
You can then make the subsequent request to the page that displays the data with:
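A sketch of that follow-up request, reusing the saved cookies; the document URL and output filename are placeholders for your actual page.

```shell
# Load the cookies saved during login and dump the page to a text file.
wget --load-cookies cookies.txt \
     -O page.html \
     'https://jive.example.com/docs/DOC-1234'
```

Note that wget only fetches the raw HTML; if the data you need is injected by JavaScript after page load, you may instead have to request the underlying JSON/XHR endpoint that the script calls (the debugging proxy mentioned below will reveal it).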
As for figuring out what to put in the post-data option, I recommend installing a debugging proxy such as Fiddler2. This makes it easy to see the full structure of the HTTP messages sent to the server and lets you reverse engineer the forms.