How to get the size of a remote file from a shell script?

Is there a way to get the size of a remote file like

http://api.twitter.com/1/statuses/public_timeline.json

in a shell script?

嘿嘿嘿 2024-10-16 05:09:05

You can download the file and get its size, but we can do better.

Use curl to fetch only the response headers, via the -I option.

In the response headers, look for Content-Length:, which is followed by the size of the file in bytes.

$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI "$URL" | grep -i Content-Length
Content-Length: 134

To get the size use a filter to extract the numeric part from the output above:

$ curl -sI "$URL" | grep -i Content-Length | awk '{print $2}'
134
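
One caveat worth noting: HTTP header lines end in \r\n, so the value awk extracts can carry an invisible trailing carriage return. A minimal sketch that strips it before the value is used in comparisons or arithmetic:

URL="http://api.twitter.com/1/statuses/public_timeline.json"
# Extract the Content-Length value and delete the trailing carriage return
size=$(curl -sI "$URL" | grep -i Content-Length | awk '{print $2}' | tr -d '\r')
echo "$size"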
几味少女 2024-10-16 05:09:05

Two caveats to the other answers:

  1. Some servers don't return the correct Content-Length for a HEAD request, so you might need to do the full download.
  2. You'll likely get an unrealistically large response (compared to a modern browser) unless you specify gzip/deflate headers.

Also, you can do this without grep/awk or piping:

curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent --write-out 'size_download=%{size_download}\n' --output /dev/null

And the same request with compression:

curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent  -H 'Accept-Encoding: gzip,deflate' --write-out 'size_download=%{size_download}\n' --output /dev/null
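
If you want the header-declared size (rather than counting downloaded bytes) without grep/awk, newer curl releases (7.84 and later, to the best of my knowledge) can pull a named header straight out of the response with --write-out; a sketch under that assumption:

# HEAD request; print only the Content-Length header of the final response (curl >= 7.84)
curl -sIL 'http://api.twitter.com/1/statuses/public_timeline.json' --write-out '%header{content-length}\n' --output /dev/null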
っ左 2024-10-16 05:09:05

Similar to codaddict's answer, but without the call to grep:

curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/Content-Length/ { print $2 }'
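
One caveat: that awk pattern is case-sensitive, and HTTP/2 servers typically send the header in lowercase as content-length. A case-insensitive variant using standard awk's tolower:

curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk 'tolower($1) == "content-length:" { print $2 }'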
同展鸳鸯锦 2024-10-16 05:09:05

I think the easiest way to do this would be to:

  1. use cURL in silent mode with -s,

  2. pull only the headers with -I (so as to avoid downloading the whole file),

  3. then do a case-insensitive grep with -i,

  4. and print the second field using awk's $2.

  5. The output is returned in bytes.

Examples:

curl -sI http://api.twitter.com/1/statuses/public_timeline.json | grep -i content-length | awk '{print $2}'

//output: 52

or

curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}'

//output: 86709

or

curl -sI http://download.thinkbroadband.com/1GB.zip | grep -i content-length | awk '{print $2}'

//output: 1073741824

Show as Kilobytes/Megabytes

If you would like to show the size in kilobytes, change the awk to:

awk '{print $2/1024}'

or for megabytes:

awk '{print $2/1024/1024}'
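
Wrapped up as a small helper (the name remote_size is just an illustration), the same pipeline becomes reusable; the tr at the end strips the carriage return that HTTP headers carry:

# Print a remote file's size in bytes
remote_size() {
  curl -sI "$1" | grep -i content-length | awk '{print $2}' | tr -d '\r'
}

remote_size https://code.jquery.com/jquery-3.1.1.min.js

//output: 86709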
柠栀 2024-10-16 05:09:05

The preceding answers won't work when there are redirections. For example, if you want the size of a Debian ISO DVD, you must use the --location option; otherwise, the reported size may be that of the 302 Moved Temporarily response body, not that of the real file.
Suppose you have the following url:

$ url=http://cdimage.debian.org/debian-cd/8.1.0/amd64/iso-dvd/debian-8.1.0-amd64-DVD-1.iso

With curl, you could obtain:

$ curl --head --location ${url}
HTTP/1.0 302 Moved Temporarily
...
Content-Type: text/html; charset=iso-8859-1
...

HTTP/1.0 200 OK
...
Content-Length: 3994091520
...
Content-Type: application/x-iso9660-image
...

That's why I prefer using HEAD, which is an alias for the lwp-request command from the libwww-perl package (on Debian). Another advantage is that it strips the extra \r characters, which eases subsequent string processing.

So to retrieve the size of the debian iso DVD, one could do for example:

$ size=$(HEAD ${url})
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}

Please note that:

  • this method requires launching only one process,
  • it relies on the shell's special parameter-expansion syntax (shown here for bash).

For other shells, you may have to resort to sed, awk, grep, et al.
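
If lwp-request isn't installed, a curl-only equivalent of the same logic is possible; a sketch (tr removes the \r characters that HEAD would have stripped for you, and the greedy ## match keeps the last Content-Length, i.e. the one from the final response after redirects):

$ size=$(curl -sIL "${url}" | tr -d '\r')
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}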

清风夜微凉 2024-10-16 05:09:05

The accepted solution didn't work for me; this did (it downloads the whole body and counts the bytes locally with wc -c, so it works even when the server omits Content-Length, at the cost of transferring the full file):

curl -s https://code.jquery.com/jquery-3.1.1.min.js | wc -c
横笛休吹塞上声 2024-10-16 05:09:05

I have a shell function, based on codaddict's answer, which gives a remote file's size in a human-readable format:

remote_file_size () {
  printf "%q" "$*"           |  # quote the URL safely for xargs
    xargs curl -sI           |  # fetch only the response headers
    grep Content-Length      |  # keep the size line
    awk '{print $2}'         |  # take the numeric field
    tr -d '\040\011\012\015' |  # strip spaces, tabs, newlines, carriage returns
    gnumfmt --to=iec-i --suffix=B
}
# The `g' prefix on `numfmt' is only for systems that lack the GNU coreutils
# by default, i.e., non-Linux systems. In other words: if you're on Linux,
# remove the letter `g'; if you're on BSD or Mac, install the GNU coreutils.
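
A quick usage sketch (the printed value is illustrative):

remote_file_size https://code.jquery.com/jquery-3.1.1.min.js
# -> 85KiB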
软糯酥胸 2024-10-16 05:09:05

This will show you detailed info about the ongoing download; you just need to specify a URL, as in the example below.

$ curl -O -w 'We downloaded %{size_download} bytes\n' \
  https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7328k  100 7328k    0     0   244k      0  0:00:29  0:00:29 --:--:--  365k
We downloaded 7504706 bytes

For automation purposes, you just need to add the command to your script file.
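
To capture just the byte count for use in a script, a small sketch (-s hides the progress meter so that only the --write-out value reaches stdout):

bytes=$(curl -sO -w '%{size_download}' https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz)
echo "We downloaded $bytes bytes"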

浅笑依然 2024-10-16 05:09:05

Combining all of the above, this works for me:

URL="http://cdimage.debian.org/debian-cd/current/i386/iso-dvd/debian-9.5.0-i386-DVD-1.iso"
curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t\r" | cut -d ':' -f 2

This will return just the content length in bytes:

3767500800
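
For use inside a script, the same pipeline can feed a variable:

SIZE=$(curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t\r" | cut -d ':' -f 2)
echo "$SIZE bytes"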
当梦初醒 2024-10-16 05:09:05

You can do it like this, including auto-following 301/302 redirections:

curl -ILs 'https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=fals' |
mawk 'NF *= !_<NF' OFS= \
     FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '

41

It's very brute force but gets the job done: the FS regex matches a case-insensitive Content-Length: prefix, so only those header lines survive the NF filter, and rebuilding the record with an empty OFS leaves just the value. Note it's whatever raw value the server reports, so you may have to adjust it as you see fit.

You may also have to add the -g flag so it can handle the switchover from vanilla http to https automatically:

curl -gILs 'http://apple.com' |
mawk 'NF *= !_<NF' OFS= \
     FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '

304
106049

(I'm *guessing* the second item might be the main site, and the first was the redirection page?)
只是一片海 2024-10-16 05:09:05

The question is old and has already been sufficiently answered, but let's expand upon the existing answers. If you want to automate this task (to check the file sizes of multiple files), here's a one-liner.

First, write the URLs of the files into a file:

cat url_of_files.txt

https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg002_nis_x1dints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg003_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04102_00001-seg001_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_02101_00002-seg001_nis_cal.fits
... 

Then, from the command line (in the same directory as your url_of_files.txt):

eval $(sed -rn '/^https/s/(https.*$)/curl -sI \1/p' url_of_files.txt) | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ?  gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'


This is for checking file sizes ranging from bytes to gigabytes. I use this line to check the FITS data files made available by the JWST team.

It checks the file size and, depending on the magnitude, roughly converts it to an appropriate number with a B, K, M, or G suffix, denoting the size in bytes, kilobytes, megabytes, or gigabytes.

Result:

...
177.188 K
177.188 K
236.429 M
177.188 K
5.95184 M
1.83608 G
1.20326 G
130.059 M
1.20326 G
...
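
As an aside, the same batch check can be done without eval, which is a bit safer if the URL file isn't fully trusted; a sketch using xargs with the identical awk filter:

grep '^https' url_of_files.txt | xargs -n1 curl -sI | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ?  gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'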
长途伴 2024-10-16 05:09:05

My solution uses awk's END block to ensure that only the last Content-Length (the one from the final response after any redirects) is printed:

function curl2contentlength() {
    curl -sI -L -H 'Accept-Encoding: gzip,deflate' "$1" | grep -i Content-Length | awk 'END{print $2}'
}
curl2contentlength "$@"

./curl2contentlength.sh "https://chrt.fm/track/B63133/stitcher.simplecastaudio.com/ec74d48c-cbf1-4764-923e-7d584dce50fa/episodes/a85954a3-24c3-48ed-bced-ef0607b7149a/audio/128/default.mp3?aid=rss_feed&awCollectionId=ec74d48c-cbf1-4764-923e-7d584dce50fa&awEpisodeId=a85954a3-24c3-48ed-bced-ef0607b7149a&feed=qm_9xx0g"

10806508

In fact, without the END block, the output would have been:

0
0
10806508
梦晓ヶ微光ヅ倾城 2024-10-16 05:09:05

I use the pattern like this ([Cc]ontent-[Ll]ength:), because I've seen a server mention Content-Length more than once in its header response:

curl -sI "http://someserver.com/hls/125454.ts" | grep [Cc]ontent-[Ll]ength: | awk '{ print $2 }'

Accept-Ranges: bytes
Access-Control-Expose-Headers: Date, Server, Content-Type, Content-Length
Server: WowzaStreamingEngine/4.5.0
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS, GET, POST, HEAD
Access-Control-Allow-Headers: Content-Type, User-Agent, If-Modified-Since, Cache-Control, Range
Date: Tue, 10 Jan 2017 01:56:08 GMT
Content-Type: video/MP2T
Content-Length: 666460

兲鉂ぱ嘚淚 2024-10-16 05:09:05

A different solution (this one requires SSH access to the machine that hosts the file):

ssh userName@IP ls -s PATH | grep FILENAME | awk '{print $1}'

gives you the size in KB (strictly, ls -s reports allocated blocks, which are 1 KB units on most Linux systems)
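
A variant that avoids parsing ls output, assuming GNU stat is available on the remote host (PATH/FILENAME stands for the full remote path; BSD/macOS spells it stat -f %z):

# Exact size in bytes via GNU stat over SSH
ssh userName@IP stat -c %s PATH/FILENAME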
