Is there a utility that combines wget's recursive download and timestamping features with curl's date filtering?
I have set up this cron job, which downloads everything recursively and does not replace files in the destination directory if they aren't older than the ones on the site (or a different size):
* * * * * wget -r -N -c -P /home/user1/ http://SomeURL
(The frequency of the cron being called was just set to every minute for my own testing purposes. I don't plan on running it every minute.)
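For reference, here is what each flag in that wget invocation does, per the wget man page:

wget -r -N -c -P /home/user1/ http://SomeURL
# -r              recurse through the site, following links
# -N              time-stamping: skip files not newer than the local copy
# -c              continue/resume partially downloaded files
# -P /home/user1/ directory prefix under which downloads are saved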
I would like to know how I could modify this, or use some other utility, to only download files whose modification date falls within the last X days. The reason is that there are a lot of files there, we only need the ones X days old or younger, and I'd rather not have it download everything (even if that would only happen the first time).
I have seen that curl has a feature to download something only if it is newer than a certain date, but curl can only download more than one file at a time if the URLs follow a very simple pattern (at least that's my understanding).
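For example, a single conditional download with curl looks something like this (a sketch; the file name is a placeholder, and the relative "7 days ago" assumes GNU date):

# Only transfer the file if it was modified in the last 7 days;
# -z makes curl send an If-Modified-Since header with that date.
curl -z "$(date -u -d '7 days ago' '+%a, %d %b %Y %H:%M:%S GMT')" -O http://SomeURL/somefile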
I've thought about using wget to recursively get a list of files and then running curl on each one, but I could not find a way to get such a list out of the wget command (over HTTP). curl has a way to fetch a list, but it is not recursive.
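The closest I have gotten is the following untested sketch: let wget's spider mode do the recursive crawl, scrape the visited URLs out of its log, and hand each one to curl with a time condition. DAYS, DEST, and the grep that pulls URLs out of wget's log format are all assumptions on my part:

#!/bin/sh
DAYS=7                # only fetch files modified within the last DAYS days
DEST=/home/user1      # local destination directory
CUTOFF=$(date -u -d "$DAYS days ago" '+%a, %d %b %Y %H:%M:%S GMT')

# --spider crawls without saving files; the visited URLs end up in the
# log on stderr. Directory-index URLs (trailing slash) are filtered out.
wget --spider -r http://SomeURL 2>&1 \
  | grep -o 'http://[^ ]*' \
  | grep -v '/$' \
  | sort -u \
  | while read -r url; do
      # -z sends If-Modified-Since, so only files newer than CUTOFF
      # are actually transferred; --create-dirs mirrors the remote
      # path layout under DEST.
      curl -s --create-dirs -z "$CUTOFF" -o "$DEST/${url#http://}" "$url"
    done

This only works if the server honors If-Modified-Since (which curl's -z relies on), and scraping URLs out of wget's log feels fragile.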
I'm hoping that there is some other utility that I do not know about that can achieve this task.
Thanks,
Ben