How to download about 1000 small files (~20 KB) using bash

Posted on 2025-02-03 05:47:49 | Words: 356 | Views: 3 | Comments: 0

There are about 1000 small JPEG files (about 20 KB each) located at URLs like:

https://example.com/file=1.0
https://example.com/file=1.1
...
https://example.com/file=1.973
https://example.com/file=1.974

How can I download them with a bash script? I don't know how to write scripts, but I assume there is a simple way to do it, for example with wget.
They all have the same name, filename.jpeg, so they need to be saved under consecutive names such as filename-1.jpg, filename-2.jpg, and so on.

Comments (2)

贱人配狗天长地久 2025-02-10 05:47:49

curl has a built-in feature for downloading multiple URLs from a generated sequence and applying the same sequence values to the saved file names:

curl \
  --user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36' \
  --parallel \
  --url "https://example.com/file=1.[0-974]" \
  --output "filename-#1.jpg"

See: https://curl.se/docs/manpage.html#-o

-o, --output <file>

Write output to <file> instead of stdout. If you are using {} or [] to fetch multiple documents, you should quote the URL and you can use '#' followed by a number in the <file> specifier. That variable will be replaced with the current string for the URL being fetched. Like in:

  curl "http://{one,two}.example.com" -o "file_#1.txt"

or use several variables like:

  curl "http://{site,host}.host[1-5].com" -o "#1_#2"

You may use this option as many times as the number of URLs you have. For example, if you specify two URLs on the same command line, you can use it like this:

  curl -o aa example.com -o bb example.net

and the order of the -o options and the URLs does not matter, just that the first -o is for the first URL and so on, so the above command line can also be written as

  curl example.com example.net -o aa -o bb

See also the --create-dirs option to create the local directories dynamically. Specifying the output as '-' (a single dash) will force the output to be done to stdout.

To suppress response bodies, you can redirect output to /dev/null:

  curl example.com -o /dev/null

Or for Windows use nul:

  curl example.com -o nul

Examples:

  curl -o file https://example.com
  curl "http://{one,two}.example.com" -o "file_#1.txt"
  curl "http://{site,host}.host[1-5].com" -o "#1_#2"
  curl -o file https://example.com -o file2 https://example.net

See also -O, --remote-name, --remote-name-all and -J, --remote-header-name.
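
To see how a `[N-M]` range and `#1` work together without touching a real server, the same idea can be tried on local files through file:// URLs. This is only a sketch: the temporary directory and the part.N file names are made up for the demonstration, and it assumes a curl build with file:// support is installed.

```shell
#!/usr/bin/env bash
# Offline demonstration of curl URL globbing combined with "#1" in the output name.
# The directory layout and the part.N names are illustrative assumptions.
set -euo pipefail

workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"

# Create three small stand-in files: part.0, part.1, part.2
for i in 0 1 2; do
  printf 'payload %s\n' "$i" > "$workdir/src/part.$i"
done

# [0-2] expands to 0, 1 and 2; "#1" in --output is replaced by the current value.
curl --silent \
  --url "file://$workdir/src/part.[0-2]" \
  --output "$workdir/out/filename-#1.jpg"

# List the files curl wrote
ls "$workdir/out"
```

If it behaves as expected, the out directory ends up with one file per value in the range, named after that value, which is the same mapping the `[0-974]` command above relies on.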

昵称有卵用 2025-02-10 05:47:49

Edit: An improved version of the below would be (thanks to the comments):

for i in {0..974}; do wget https://example.com/file=1."$i"; done;

(The quotes might not be required in this particular case as there are only numeric values, but it's good to adhere to best practices.)

You can use a for loop with seq like this:

for i in `seq 0 974`; do wget https://example.com/file=1.$i; done;
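
The loops above let wget pick the local file name, which by default comes from the last part of the URL. Since the question asks for consecutive names like filename-1.jpg, here is a rough sketch that sets the name explicitly with wget's `--output-document` option. The `make_pairs` and `download_pairs` helpers, the `DOWNLOAD` switch, and the 1-based numbering (index 0 saved as filename-1.jpg) are assumptions made for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: fetch file=1.0 .. file=1.974 and save them as filename-1.jpg .. filename-975.jpg.
# base_url, the "filename" prefix and the DOWNLOAD switch are illustrative assumptions.

base_url='https://example.com/file=1.'

# Print "URL<TAB>target name" for every index from $1 to $2.
make_pairs() {
  local first=$1 last=$2 i
  for ((i = first; i <= last; i++)); do
    printf '%s%d\tfilename-%d.jpg\n' "$base_url" "$i" "$((i + 1))"
  done
}

# Read the pairs from stdin and download each URL under its target name.
download_pairs() {
  local url name
  while IFS=$'\t' read -r url name; do
    wget --quiet --output-document="$name" "$url" || echo "failed: $url" >&2
  done
}

if [[ ${DOWNLOAD:-0} == 1 ]]; then
  make_pairs 0 974 | download_pairs
else
  make_pairs 0 2   # dry run: only preview the first three pairs
fi
```

Run as is, the script only prints the first three URL and file name pairs so the mapping can be checked. Saved under a hypothetical name such as fetch.sh, `DOWNLOAD=1 bash fetch.sh` would perform the actual downloads one after another.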