How can I use wget to mirror an HTML file and its accompanying images locally with specified filenames?

Posted on 2025-01-07 17:30:20

I need to grab a URL as HTML along with its accompanying images. The HTML is to be saved with a custom filename (I'm giving it a timestamp in the calling script), and the images need to carry the same timestamps.

The resulting folder should be structured like this:

2012-02-22 06:00:00 UTC.html
2012-02-22 07:00:00 UTC.html
2012-02-22 08:00:00 UTC.html
img1_2012-02-22 06:00:00 UTC.gif
img2_2012-02-22 06:00:00 UTC.gif
img1_2012-02-22 07:00:00 UTC.gif
img2_2012-02-22 07:00:00 UTC.gif
img1_2012-02-22 08:00:00 UTC.gif
img2_2012-02-22 08:00:00 UTC.gif

Essentially, this is a mirror that needs to rewrite the image paths in the HTML to local relative paths. I've played around with wget's --directory-prefix and --output-document, with no real success, since the images end up embedded in the HTML output file.

Is this doable with stock wget, or is it better to write my own script that pulls each file down and then parses the HTML file, replacing the strings appropriately?
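
For illustration, the "write my own script" route might look roughly like the Python sketch below. It is only a sketch under several assumptions: the function names (local_image_name, rewrite_images, mirror) are made up for this example, images are assumed to appear only as plain <img src="..."> attributes, a regular expression stands in for a real HTML parser, and the page is assumed to decode as UTF-8.

```python
import re
from pathlib import Path, PurePosixPath
from urllib.parse import quote, urljoin, urlparse
from urllib.request import urlopen

# Matches the quoted src attribute of <img> tags. A regex is a simplification;
# a real HTML parser would cope better with unusual markup.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def local_image_name(src, stamp):
    """Turn e.g. 'images/img1.gif' into 'img1_<stamp>.gif'."""
    path = PurePosixPath(urlparse(src).path)
    return f"{path.stem}_{stamp}{path.suffix}"


def rewrite_images(html, base_url, stamp):
    """Return (rewritten_html, [(absolute_image_url, local_name), ...])."""
    downloads = []

    def replace(match):
        prefix, quote_char, src = match.groups()
        name = local_image_name(src, stamp)
        downloads.append((urljoin(base_url, src), name))
        # Percent-encode the spaces and colons so the attribute is a valid
        # relative URL; the file on disk keeps the unencoded name.
        return f"{prefix}{quote_char}{quote(name)}{quote_char}"

    return IMG_SRC.sub(replace, html), downloads


def mirror(url, stamp, out_dir="."):
    """Fetch the page and its images, saving everything under timestamped names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    new_html, downloads = rewrite_images(html, url, stamp)
    (out / f"{stamp}.html").write_text(new_html, encoding="utf-8")
    for remote, name in downloads:
        with urlopen(remote) as resp:
            (out / name).write_bytes(resp.read())
```

The calling script would then run something like mirror("http://example.com/page/index.html", "2012-02-22 06:00:00 UTC", "snapshots"), where the URL and the output directory are placeholders. Keeping rewrite_images free of network access makes the path rewriting easy to check on its own. Note that filenames containing colons are not valid on every filesystem, so the timestamp format may need adjusting.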
