Render an HTML page and save it using the command line
I would like to load a web page and save it from the command line, getting behavior similar to "Save Page As" for a complete page in Firefox or Chrome.
I tried wget and HTTrack, and they fetch the HTML files correctly. But when the HTML is malformed, a browser corrects it while rendering, so "Save As" yields the corrected HTML. That does not happen with wget or HTTrack.
Is there any tool that would render the page and save it locally, along with all the images, Flash, and other resources?
Comments (5)
I couldn't find anything else, so I ended up opening the page in Firefox, clicking the "Save As" button, and saving it. I wrote a script using Firefox and xdotool to automate the whole task.
Thanks for all the help and views, friends.
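For readers who want to try the same route, here is a minimal sketch of how the Firefox plus xdotool automation described above might look. It is not the poster's script. The function names, the wait times, and the window class name "firefox" are assumptions made for illustration, and it presumes an X11 session with firefox and xdotool on the PATH.

```python
import subprocess
import time


def firefox_command(url):
    """Command that opens the page in Firefox (assumes `firefox` is on PATH)."""
    return ["firefox", url]


def xdotool_steps(dest_path):
    """xdotool invocations that drive the Save As dialog, in order."""
    return [
        # Wait for a visible Firefox window and focus it. The class name
        # "firefox" is an assumption and may differ on some systems.
        ["xdotool", "search", "--sync", "--onlyvisible", "--class", "firefox",
         "windowactivate"],
        # Ctrl+S opens the Save As dialog.
        ["xdotool", "key", "--clearmodifiers", "ctrl+s"],
        # Type the destination path into the dialog's file name field.
        ["xdotool", "type", "--delay", "20", dest_path],
        # Confirm the dialog.
        ["xdotool", "key", "Return"],
    ]


def save_page_as(url, dest_path, load_wait=10.0, step_wait=2.0):
    """Open url in Firefox and save it via the GUI.

    The waits are guesses: slow pages or machines need larger values.
    """
    # Firefox keeps running, so start it without waiting for it to exit.
    subprocess.Popen(firefox_command(url))
    time.sleep(load_wait)
    for argv in xdotool_steps(dest_path):
        subprocess.run(argv, check=True)
        time.sleep(step_wait)


# Example call (needs a running X session):
# save_page_as("https://example.com", "/tmp/example.html")
```

Because this works through simulated keystrokes, it depends on window focus and timing. Fixed sleeps are the simplest approach but also the most fragile part of the sketch.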
When I want to save pages for offline use, I use a Firefox plugin called "Scrapbook". That, of course, does not meet your command-line requirement. But with a tool like 'htmlunit' or something similar, you can drive the Firefox browser to go to the page you want to save.
I felt the need for something similar today (and went the xdotool path). You can find my version (a reusable bash script) at: https://github.com/abiyani/automate-save-page-as
You could use curl or wget in combination with HTML Tidy: download the page, then run the result through tidy.
Tidy should be able to convert any invalid HTML markup to valid XHTML.
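The command that originally followed this answer is not preserved here, so the following is only a sketch of what such a pipeline might look like, not the answerer's exact invocation. The function names are hypothetical. The curl and tidy options used (-s, -L, -q, -asxhtml, -o, --force-output) are standard ones, and both programs are assumed to be installed.

```python
import subprocess


def curl_command(url):
    """curl invocation that writes the page to stdout.

    -s hides the progress meter, -L follows redirects.
    """
    return ["curl", "-sL", url]


def tidy_command(output_path):
    """tidy invocation that reads HTML on stdin and writes XHTML to a file.

    -q is quiet mode, -asxhtml emits XHTML, -o names the output file, and
    --force-output yes writes a file even when Tidy reports errors.
    """
    return ["tidy", "-q", "-asxhtml", "--force-output", "yes",
            "-o", output_path]


def fetch_and_tidy(url, output_path):
    """Pipe `curl URL` into `tidy` and return Tidy's exit status.

    Tidy exits with 0 when the input was clean, 1 when it issued warnings,
    and 2 when it found errors.
    """
    curl = subprocess.Popen(curl_command(url), stdout=subprocess.PIPE)
    tidy = subprocess.run(tidy_command(output_path), stdin=curl.stdout)
    curl.stdout.close()
    if curl.wait() != 0:
        raise RuntimeError("curl could not download " + url)
    return tidy.returncode


# Example call (needs network access plus curl and tidy installed):
# status = fetch_and_tidy("https://example.com", "example.xhtml")
```

Note that Tidy only repairs the markup it is given. It does not run scripts or download images, so the result is closer to a cleaned-up wget output than to a browser's "Save As" of a complete page.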
There is some sophisticated software available that does exactly that:
https://launchpad.net/shotfactory