Saving PDF files with Chickenfoot
I'm writing a web-crawler using Chickenfoot and need to save PDF files. I can either click the link on the page or grab the PDF's URL and use
go("http://www.whatever.com/file.pdf")
and I get the Firefox "Opening file.pdf" dialog box, but I can't click the "OK" button to actually save the file.
I've tried other ways of downloading the files (wget, Python's urllib2, twill), but the PDF files are gated, so none of those work.
Any help is appreciated.
This example in the Mozilla developer documentation of how to save a target looks like it should do exactly what you want. I've tested a very similar Chickenfoot example that gets the temp environment variable, and it worked well for me in Chickenfoot.
https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIWebBrowserPersist#Example
You might have to play with the application associations in Tools, Options, Applications to make sure the action is set to Save File, but those settings might not apply to these functions.
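A minimal sketch of the nsIWebBrowserPersist approach as a Chickenfoot (JavaScript) script, assuming Firefox 3-era chrome privileges so that `Components` is available. The helper names `fileNameFromUrl`, `tempFileFor` and `savePdf` are hypothetical, and the six-argument `saveURI` call follows the Firefox 3 interface; later Firefox versions added trailing parameters, so check the interface reference for your version. Because the request goes through Firefox itself, it should carry the browser's session cookies for gated files, but that is an assumption, not something verified here.

```javascript
// Sketch for a Chickenfoot script on Firefox 3 (chrome privileges assumed).
// fileNameFromUrl, tempFileFor and savePdf are hypothetical helper names.

// Derive a local file name from the last path segment of the URL.
function fileNameFromUrl(url) {
  var path = url.split(/[?#]/)[0];               // drop query string and fragment
  var name = path.substring(path.lastIndexOf("/") + 1);
  return name ? decodeURIComponent(name) : "download.pdf";
}

// Build an nsIFile inside the OS temp directory ("TmpD").
function tempFileFor(url) {
  var Cc = Components.classes, Ci = Components.interfaces;
  var file = Cc["@mozilla.org/file/directory_service;1"]
               .getService(Ci.nsIProperties)
               .get("TmpD", Ci.nsIFile);
  file.append(fileNameFromUrl(url));
  return file;
}

// Save the URL straight to disk with nsIWebBrowserPersist, so no
// "Opening file.pdf" dialog is involved.
function savePdf(url, file) {
  var Cc = Components.classes, Ci = Components.interfaces;
  var uri = Cc["@mozilla.org/network/io-service;1"]
              .getService(Ci.nsIIOService)
              .newURI(url, null, null);
  var persist = Cc["@mozilla.org/embedding/browser/nsWebBrowserPersist;1"]
                  .createInstance(Ci.nsIWebBrowserPersist);
  persist.persistFlags = Ci.nsIWebBrowserPersist.PERSIST_FLAGS_REPLACE_EXISTING_FILES;
  // Firefox 3 signature: saveURI(uri, cacheKey, referrer, postData, extraHeaders, file).
  // The download is asynchronous: this call returns before the file is complete.
  persist.saveURI(uri, null, null, null, "", file);
  return file;
}

// Only attempt the download inside Firefox/Chickenfoot, where Components exists.
if (typeof Components !== "undefined") {
  var pdfUrl = "http://www.whatever.com/file.pdf";
  var target = savePdf(pdfUrl, tempFileFor(pdfUrl));
  output("Saving to " + target.path);  // output() is Chickenfoot's print function
}
```

Since `saveURI` returns before the transfer finishes, a script that needs the finished file should attach a `progressListener`, as the MDN example does.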
End Answer, begin related grumblings...
I sure wish someone would fix the many bugs in Chickenfoot and write a nice Cookbook programming guide. I've been using it for years, and there are still many basic things I've not been able to figure out how to do. I finally broke down and subscribed to the mailing list, as the archives have some decent script examples. It takes a lot of searching through the PDF references, blogs, etc., as the web API reference is very sparse.
I love how simple Chickenfoot can make automating some tasks, but it takes me days of searching JavaScript, DOM, and Firefox documents to find ways to do some of the things it can't, since I'm not really a web programmer. The goal of Chickenfoot seems to be that I shouldn't have to be, but unfortunately few people are refining the proof of concept, as MIT has dropped the project.
I tried to do this several ways using only Chickenfoot commands and confirmed they don't work with the latest Firefox 3 and Chickenfoot 1.0.7.
I hope this helps! Good luck. Sorry I only ran across your question yesterday, but found it too interesting to leave alone.
You won't be able to click on Firefox dialogs, for security reasons.
The best way to download the content of a URL is to read the content and then write it to a file.
You might find it helpful to use jQuery with Chickenfoot.
http://groups.csail.mit.edu/uid/chickenfoot/scripts/index.php?title=Using_jQuery,_jQuery_UI_and_similar_libraries
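One way to "read then write" without jQuery, sketched for a Chickenfoot script on Firefox 3 with chrome privileges: a synchronous XMLHttpRequest fetches the bytes using the old "x-user-defined" charset trick, and an nsIBinaryOutputStream writes them to disk. The names `readUrlBytes`, `writeBytes` and `responseTextToBytes`, and the `C:\temp\file.pdf` path, are hypothetical; the XPCOM contract IDs are the Firefox 3-era ones, and this is an untested sketch rather than a known-good recipe.

```javascript
// Sketch for Chickenfoot on Firefox 3 (chrome privileges assumed).
// readUrlBytes, writeBytes and responseTextToBytes are hypothetical names.

// With the "x-user-defined" charset, each response byte b comes back as a
// character whose low 8 bits equal b, so masking with 0xff recovers the bytes.
function responseTextToBytes(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    bytes.push(text.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// Read: fetch the URL synchronously through Firefox's own network stack.
function readUrlBytes(url) {
  var xhr = Components.classes["@mozilla.org/xmlextras/xmlhttprequest;1"]
              .createInstance(Components.interfaces.nsIXMLHttpRequest);
  xhr.open("GET", url, false);                       // false = synchronous
  xhr.overrideMimeType("text/plain; charset=x-user-defined");
  xhr.send(null);
  if (xhr.status !== 200) {
    throw new Error("HTTP " + xhr.status + " for " + url);
  }
  return responseTextToBytes(xhr.responseText);
}

// Write: dump the byte array to a local file, replacing any existing one.
function writeBytes(path, bytes) {
  var Cc = Components.classes, Ci = Components.interfaces;
  var file = Cc["@mozilla.org/file/local;1"].createInstance(Ci.nsILocalFile);
  file.initWithPath(path);
  var out = Cc["@mozilla.org/network/file-output-stream;1"]
              .createInstance(Ci.nsIFileOutputStream);
  // 0x02 | 0x08 | 0x20 = write-only | create | truncate; 420 is octal 0644.
  out.init(file, 0x02 | 0x08 | 0x20, 420, 0);
  var bin = Cc["@mozilla.org/binaryoutputstream;1"]
              .createInstance(Ci.nsIBinaryOutputStream);
  bin.setOutputStream(out);
  bin.writeByteArray(bytes, bytes.length);
  bin.close();                                       // also closes the file stream
}

// Only run inside Firefox/Chickenfoot, where Components exists.
if (typeof Components !== "undefined") {
  writeBytes("C:\\temp\\file.pdf", readUrlBytes("http://www.whatever.com/file.pdf"));
}
```

Unlike the nsIWebBrowserPersist route, this holds the whole file in memory and blocks the script until the request completes, which is simple for a crawler but less suited to very large files.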
This has worked for me to save Excel files from NCES portal.
http://muaz-khan.blogspot.com/2012/10/save-files-on-disk-using-javascript-or.html
I was using Firefox 3.0 and the "old syntax" version of the code. I also stripped out the code intended for IE, as well as the line "(window.URL || window.webkitURL).revokeObjectURL(save.href);", which generated an error.