Save an HTML page and rewrite all links to point to the right place
You probably know that IE can save a web page: it automatically downloads the HTML file and all the image/css/js files that the HTML file uses.
There is one problem with this: the links in the HTML file are not changed.
So if I download a page from example.com that contains <a href=/hi.html>, the copy saved by IE will have a link to C:\Documents and Settings...(path to the folder that the html file is in).
Is there a Python library that will download an HTML page for me, along with all of its contents (images/js/css)?
If so, is there a library that will also rewrite the links for me?
Thanks!!
Comments (2)
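Since you mention IE specifically, I'm not sure whether this will be of any use to you, but on Linux the easiest way to completely mirror a website is the wget command. Its --mirror, --page-requisites, and --convert-links options handle recursive download, fetching images/css/js, and rewriting links in the saved pages.
Run man wget if you need more options.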
I've written a tool that saves a web page into a single standalone HTML file, with the links pointing where they should.
https://github.com/zTrix/webpage2html
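For the link-rewriting part of the question, here is a minimal sketch using only the Python standard library. It is not how webpage2html or wget work internally; it only illustrates the idea of resolving each href/src value against the page's original URL so that a saved copy still points at the site. The function name absolutize_links and the regular expression are assumptions made for this example, and a real HTML parser would be more robust than a regex.

```python
import re
from urllib.parse import urljoin

# Hypothetical helper pattern: matches href=... or src=... whose value is
# double-quoted, single-quoted, or unquoted. The lookbehind skips
# attributes such as data-href.
ATTR_RE = re.compile(
    r'''(?<![\w-])(?P<attr>href|src)\s*=\s*(?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<bare>[^\s>]+))''',
    re.IGNORECASE,
)


def absolutize_links(html: str, base_url: str) -> str:
    """Return html with every href/src value resolved against base_url."""

    def replace(match: re.Match) -> str:
        # Exactly one of the three value groups took part in the match.
        value = match.group("dq")
        if value is None:
            value = match.group("sq")
        if value is None:
            value = match.group("bare")
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        return '{}="{}"'.format(match.group("attr"), urljoin(base_url, value))

    return ATTR_RE.sub(replace, html)


if __name__ == "__main__":
    page = "<a href=/hi.html>hi</a>"
    print(absolutize_links(page, "http://example.com/index.html"))
```

Run on the example from the question, the link /hi.html is resolved against the page URL rather than against the local folder. Downloading the page and its assets would still be a separate step (for instance with urllib.request), which this sketch deliberately leaves out.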