Saving PDF files with Chickenfoot

Posted 2024-10-06 02:59:50

I'm writing a web-crawler using Chickenfoot and need to save PDF files. I can either click the link on the page or grab the PDF's URL and use

go("http://www.whatever.com/file.pdf")

and I get the Firefox "Opening file.pdf" dialog box, but I can't click the "OK" button to actually save the file.

I've tried using other means to download the files (wget, python's urllib2, twill), but the PDF files are gated so none of those will work.

Any help is appreciated.


寂寞清仓 2024-10-13 02:59:50

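This example from the Mozilla developer documentation, on how to save a target, looks like it should do exactly what you want. I tested a very similar Chickenfoot example that reads the temp directory environment variable, and it worked fine for me in Chickenfoot.

https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIWebBrowserPersist#Example

You may need to check the application associations under Tools > Options > Applications and make sure the action for PDF files is set to "Save File". Those settings may not apply to these functions, though.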
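The MDN example linked above comes down to a few XPCOM calls. Below is a minimal sketch of how that might look inside a Chickenfoot script. It makes several assumptions:

- Chickenfoot 1.0.7 on Firefox 3, where scripts run with chrome privileges and can reach Components.
- The six-argument saveURI signature from the Firefox 3 era.
- savePdf and fileNameFromUrl are hypothetical helper names, not part of Chickenfoot.

The request goes through Firefox's own network stack, so it should send the same cookies as the logged-in browser session. That is what a gated PDF download needs.

```javascript
// Sketch only: assumes Chickenfoot 1.0.7 on Firefox 3 (chrome privileges,
// so Components is available). savePdf and fileNameFromUrl are hypothetical
// helper names, not Chickenfoot built-ins.

// Derive a local file name from the last path segment of a URL,
// ignoring any query string or fragment.
function fileNameFromUrl(url) {
  var path = url.split("#")[0].split("?")[0];
  var name = path.substring(path.lastIndexOf("/") + 1);
  return name || "download.pdf"; // fallback when the URL ends in "/"
}

// Save url to localPath (an absolute, platform-native path) using
// nsIWebBrowserPersist, following the MDN "save a target" example.
function savePdf(url, localPath) {
  var Cc = Components.classes;
  var Ci = Components.interfaces;

  var persist = Cc["@mozilla.org/embedding/browser/nsWebBrowserPersist;1"]
      .createInstance(Ci.nsIWebBrowserPersist);
  persist.persistFlags = Ci.nsIWebBrowserPersist.PERSIST_FLAGS_REPLACE_EXISTING_FILES;

  var file = Cc["@mozilla.org/file/local;1"].createInstance(Ci.nsILocalFile);
  file.initWithPath(localPath);

  var uri = Cc["@mozilla.org/network/io-service;1"]
      .getService(Ci.nsIIOService)
      .newURI(url, null, null);

  // Firefox 3-era signature: (aURI, aCacheKey, aReferrer, aPostData,
  // aExtraHeaders, aFile). Later Firefox versions added more arguments.
  persist.saveURI(uri, null, null, null, "", file);
  return persist; // saveURI returns immediately; the download runs asynchronously
}
```

In a Chickenfoot script, the helpers would be called like this:

savePdf("http://www.whatever.com/file.pdf", "C:\\pdfs\\" + fileNameFromUrl("http://www.whatever.com/file.pdf"))

This should save the file without showing the dialog. The C:\pdfs folder is only a placeholder.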

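End of answer; start of related rant...

I really wish someone would fix the many bugs in Chickenfoot and write a decent cookbook-style programming guide. I've been using it for years, and there are still many basic things I don't know how to do. I finally gave in and subscribed to the mailing list, because the archives have some good example scripts. The Web API reference is so sparse that getting anything done takes a lot of searching through PDF references, blogs, and so on.
I love how easily Chickenfoot automates certain tasks. But since I'm not really a web programmer, I have to spend days searching the JavaScript, DOM and Firefox documentation to find ways to do the things it can't. The whole point of Chickenfoot seems to be that I shouldn't have to. Unfortunately, few people are polishing the proof of concept now that MIT has abandoned the project.

I tried several approaches using only Chickenfoot commands and confirmed that none of them work with the latest Firefox 3 and Chickenfoot 1.0.7.

I hope this helps! Good luck. Sorry I only saw your question yesterday, but I found it too interesting to leave alone.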


瞎闹 2024-10-13 02:59:50

For security reasons, a script can't click on Firefox dialogs.
The best way to download a URL is to read its content and then write that content to a file.

// Chickenfoot 1.0.7 Javascript Code to download the content of a url.
include( "fileio.js" ); // enables the write function.
var url = "http://google.com", 
    saveFileTo = "c://chickenfoot-google.com";

write( saveFileTo, read( url ) );

You might find it helpful to use jQuery with Chickenfoot.
http://groups.csail.mit.edu/uid/chickenfoot/scripts/index.php?title=Using_jQuery,_jQuery_UI_and_similar_libraries
