What is the best way to save a complete webpage on a Linux server?
I need to archive complete pages, including any linked images etc., on my Linux server. Looking for the best solution. Is there a way to save all assets and then relink them all to work in the same directory?

I've thought about using curl, but I'm unsure of how to do all of this. Also, will I maybe need PHP-DOM?

Is there a way to use Firefox on the server and copy the temp files after the address has been loaded, or something similar?

Any and all input welcome.

Edit:

It seems as though wget is 'not' going to work, as the files need to be rendered. I have Firefox installed on the server; is there a way to load the URL in Firefox, grab the temp files, and then clear the temp files afterwards?
Comments (5)
wget should be sufficient and will grab images/media; there are plenty of options you can feed it.

Note: I believe neither wget nor any other program supports downloading images specified through CSS, so you may need to do that part yourself manually. Here are some useful arguments: http://www.linuxjournal.com/content/downloading-entire-web-site-wget
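For instance, a minimal invocation along those lines might look like this (example.com is just a placeholder, and the exact flags are a suggestion rather than part of the original answer):

```sh
# -p (--page-requisites) downloads images, CSS and other assets the page needs;
# -k (--convert-links) rewrites links so the saved copy works from the local directory
wget -p -k https://example.com/
```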
Use wget with the -E, -k, and -p flags (see the sketch below). Use -E to adjust extensions. Use -k to convert links so the page loads from your local storage. Use -p to download all objects inside the page. Please note that this command does not download other pages hyperlinked from the specified page; it only downloads the objects required to load the specified page properly.
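Assuming https://example.com/page.html as a placeholder URL, the combination of flags described above would look roughly like this:

```sh
# -E adjusts extensions, -k converts links for local viewing, -p fetches all page requisites
wget -E -k -p https://example.com/page.html
```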
wget can do that; for example, a recursive command like the one sketched below will mirror the whole example.com site.

Some interesting options are:
-Dexample.com: do not follow links to other domains.
--html-extension: renames pages with a text/html content type to .html.

Manual: http://www.gnu.org/software/wget/manual/
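A sketch of such a mirror command, assuming http://example.com/ as the target; the -r, -p, and -k flags are additions for a working offline copy and were not spelled out in the answer:

```sh
# -r recurses through the site, -Dexample.com restricts crawling to that domain,
# --html-extension saves text/html pages with an .html suffix,
# -p and -k fetch page requisites and rewrite links for offline viewing
wget -r -Dexample.com --html-extension -p -k http://example.com/
```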
I tried a couple of tools, curl and wget included, but nothing worked up to my expectations. Finally I found a tool that saves a complete webpage (images, scripts, linked stylesheets... everything included). It's written in Rust and is named monolith. Take a look. It packs the images and other scripts/stylesheets into one HTML file.

Example

I can save the webpage https://nodejs.org/en/docs/es6 to a local file es6.html, with all page requisites packed into one file, using the following command:
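Presumably something along these lines, based on monolith's usual command-line form, where -o names the output file:

```sh
monolith https://nodejs.org/en/docs/es6 -o es6.html
```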
If all the content in the web page were static, you could get around this issue with something like the wget invocation sketched below, or some variation thereof.
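For instance (a sketch only; www.example.com is a placeholder):

```sh
# Recursively mirror the site, grab page requisites and rewrite links so the copy works locally
wget -r -p -k http://www.example.com/
```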
Since you also have dynamic pages, you cannot in general archive such a web page using wget or any simple HTTP client. A proper archive needs to incorporate the contents of the backend database and any server-side scripts. That means that the only way to do this properly is to copy the backing server-side files. That includes at least the HTTP server document root and any database files.

EDIT:
As a work-around, you could modify your webpage so that a suitably privileged user could download all the server-side files, as well as a text-mode dump of the backing database (e.g. an SQL dump). You should take extreme care to avoid opening any security holes through this archiving system.
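As a rough illustration of such a dump, assuming a MySQL backend and a standard document root (database name, credentials, and paths are placeholders for your own setup):

```sh
# Dump the backing database to a text-mode SQL file (connection credentials omitted here)
mysqldump --single-transaction mydatabase > /backups/mydatabase.sql
# Bundle the HTTP server document root alongside it
tar czf /backups/docroot.tar.gz /var/www/html
```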
If you are using a virtual hosting provider, most of them provide some kind of Web interface that allows backing up the whole site. If you use an actual server, there are plenty of backup solutions you could install, including a few Web-based ones for hosted sites.