Portable way to get file size (in bytes) in the shell
On Linux, I use
    stat --format="%s" FILE
but the Solaris machine I have access to doesn't have the stat command. What should I use instead?
I'm writing Bash scripts and can't really install any new software on the system.
I've already considered using:
    perl -e '@x=stat(shift);print $x[7]' FILE
or even:
    ls -nl FILE | awk '{print $5}'
But neither of these looks sensible: running Perl just to get a file size? Or running two programs to do the same thing?

Comments (16)

Try
    du -ks FILE | awk '{print $1*1024}'
That might just work.

Your first Perl example doesn't look unreasonable to me.
It's for reasons like this that I migrated from writing shell scripts (in Bash, sh, etc.) to writing all but the most trivial scripts in Perl. I found that I was having to launch Perl for particular requirements, and as I did that more and more, I realised that writing the scripts in Perl was probably a more powerful (in terms of the language and the wide array of libraries available via CPAN) and more efficient way to achieve what I wanted.
Note that other scripting languages (e.g., Python and Ruby) will no doubt have similar facilities, and you may want to evaluate these for your purposes. I only discuss Perl since that's the language I use and am familiar with.

I don't know how portable GNU Gawk's filefuncs extension is. Its syntax allows checking multiple files at once as well as a single file. It is hardly any incremental saving, and admittedly it is slightly slower than calling stat directly.
Finally, there is a terse method that reads every single byte into an AWK array. This method works for binary files (front or back makes no difference).
But that's not the fastest way, because you're storing the whole file in RAM. The normal AWK paradigm operates on lines. The issue is that for binary files such as MP4 files, if they don't end exactly on \n, the length + NR method of summing would overcount by one. A catch-all form is to explicitly use the file's last one or two bytes as the record separator RS.
I found the 2-byte method much faster for binaries, and the 1-byte method faster for a typical text file that ends with a newline. With binaries, the 1-byte variant may end up splitting rows far too often, which slows it down.
But we're close to nitpicking here: all it took mawk2 to read every single byte of a 1.83 GB .txt file was 0.95 seconds, so unless you're processing massive volumes, the difference is negligible.
Nonetheless, stat is still by far the fastest, as others have mentioned, since it's an OS filesystem call.
(The file permissions for the MP4 were updated because the AWK method required it.)
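As a rough illustration of the line-based length + NR idea described above, here is a minimal sketch for plain text files. The function name awk_text_size is made up for this example, and setting LC_ALL=C is my assumption so that length counts bytes rather than characters. As noted above, the result is one too high when the file does not end with a newline.

```sh
# Hypothetical helper (not from the original answer): byte count of a
# newline-terminated text file, computed as the sum of the record
# lengths plus one newline per record.
awk_text_size() {
    # Read via redirection so odd file names are never parsed by awk.
    LC_ALL=C awk '{ n += length($0) + 1 } END { print n + 0 }' < "$1"
}
```

Usage would be something like `awk_text_size notes.txt`; for anything other than text files, one of the stat or wc approaches is probably the safer choice.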

I'd use ls for better speed instead of wc, which reads the whole stream in a pipeline. The ls output is in plain bytes.
Use the flag --b M or --b G for output in megabytes or gigabytes (not portable, as @Andrew Henle noted in the comments).
BTW, the same can be done with du and cut, with du and awk, or with stat.

If you have Perl on your Solaris machine, then use it. Otherwise, ls with AWK is your next best bet, since you don't have stat and your find is not GNU find.
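That preference order could be wrapped up roughly as follows. This is only a sketch: the function name size_perl_or_ls is invented here, and it assumes that field 5 of the ls -n output is the size, which holds for regular files.

```sh
# Hypothetical helper: use Perl's stat() when Perl is installed,
# otherwise fall back to ls piped through awk.
size_perl_or_ls() {
    if command -v perl >/dev/null 2>&1; then
        # Element 7 of the list returned by stat() is the size in bytes.
        perl -e '@s = stat(shift) or exit 1; print "$s[7]\n"' -- "$1"
    else
        # -d keeps directories on one line, -n avoids name lookups.
        ls -dn -- "$1" | awk '{ print $5; exit }'
    fi
}
```

Calling `size_perl_or_ls FILE` prints the size on a line by itself in both branches, so the caller does not need to know which tool was used.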

There is a trick I have used on Solaris. If you ask for the size of more than one file, it returns just the total size with no names, so include an empty file such as /dev/null as the second file.
I can't remember which size command this works for (ls, wc, etc.); unfortunately I don't have a Solaris box to test it on.

On Linux you can use
    du -h "$FILE"
That may work on Solaris too.

wc -c < filename (wc is short for word count; -c prints the byte count) is a portable, POSIX solution. Only the output format might not be uniform across platforms, as some spaces may be prepended (which is the case on Solaris).
Do not omit the input redirection. When the file is passed as an argument, the file name is printed after the byte count.
I was worried it wouldn't work for binary files, but it works OK on both Linux and Solaris. You can try it with
    wc -c < /usr/bin/wc
Moreover, POSIX utilities are guaranteed to handle binary files unless explicitly specified otherwise.
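To deal with the leading spaces that some platforms put in front of the wc -c output, one option is to pass the result through shell arithmetic. The sketch below is just one way to do it, and the function name size_via_wc is made up for the example.

```sh
# Hypothetical helper: byte count via wc -c, with any leading padding
# removed. The body runs in a subshell so that n does not leak out.
size_via_wc() (
    # The redirection fails (and we exit non-zero) if the file is unreadable.
    n=$(wc -c < "$1") || exit 1
    # Arithmetic expansion ignores the surrounding blanks.
    echo $(($n))
)
```

Note that wc still has to read the whole file, so on very large files this may be slower than asking the filesystem for the size.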

I ended up writing my own (really small) program to display just the size. More information is in bfsize - print file size in bytes (and just that).
In my opinion there are two clean ways to do this with common Linux tools, but I just don't want to type parameters or pipe the output merely to get a file size, so I'm using my own bfsize.

Even though du usually prints disk usage and not the actual data size, du from GNU Core Utilities can print a file's "apparent size" in bytes. But that won't work under BSD, Solaris, macOS, etc.
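With GNU du, the apparent size in bytes is available through -b (shorthand for --apparent-size --block-size=1). A hedged sketch follows; the function name size_via_du and the fallback to wc -c on systems whose du lacks -b are my own additions, not part of the answer above.

```sh
# Hypothetical helper: apparent size via GNU du, falling back to wc -c
# where du does not understand -b. Runs in a subshell to keep "out" local.
size_via_du() (
    [ -e "$1" ] || exit 1
    if out=$(du -sb -- "$1" 2>/dev/null); then
        # du prints "<size><TAB><name>"; keep only the size.
        printf '%s\n' "${out%%[[:space:]]*}"
    else
        echo $(( $(wc -c < "$1") ))
    fi
)
```

For a regular file this should match what stat reports; for a directory, GNU du sums the apparent sizes of everything below it, which is probably not what you want here.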

In the end I decided to use ls and Bash array expansion.
It's not really nice, but at least it does only one fork+execve, and it doesn't rely on a secondary programming language (Perl, Ruby, Python, or whatever).

BSD systems have a stat with different options from the GNU Core Utilities one, but with similar capabilities. It works on macOS (tested on 10.12), FreeBSD, NetBSD and OpenBSD.
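A script that has to run on both families could try the GNU form first and the BSD form second. This is a sketch under the assumption that the BSD stat accepts -f with the %z (size) format and rejects -c; the function name size_via_stat is invented for the example.

```sh
# Hypothetical helper: size in bytes via whichever stat is installed.
size_via_stat() {
    [ -e "$1" ] || return 1
    # GNU coreutils: -c FORMAT, %s = size in bytes.
    # BSD/macOS:     -f FORMAT, %z = size in bytes.
    stat -c %s -- "$1" 2>/dev/null || stat -f %z -- "$1" 2>/dev/null
}
```

The order matters: on GNU systems -f means "file system status", so the BSD form is only reached when the GNU form has already failed.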

When processing ls -n output, as an alternative to poorly portable shell arrays, you can use the positional arguments, which form the only array and are the only local variables in the standard shell. Wrap the overwrite of positional arguments in a function to preserve the original arguments to your script or function.
The approach splits the output of ls -dn according to the current IFS environment variable setting, assigns it to the positional arguments and echoes the fifth one. The -d ensures directories are handled properly, and the -n ensures that user and group names do not need to be resolved, unlike with -l. Also, user and group names containing white space could theoretically break the expected line structure; they are usually disallowed, but this possibility still makes the programmer stop and think.
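The positional-argument approach described above might look roughly like this. It is a sketch rather than the original code: the function name size_via_ls is made up, and the subshell body and set -f are extra precautions I added.

```sh
# Hypothetical helper: split "ls -dn" output into positional arguments
# and print the fifth field (the size). The subshell body keeps the
# caller's own arguments and shell options untouched.
size_via_ls() (
    set -f                          # no globbing on the ls output
    out=$(ls -dn -- "$1") || exit 1
    set -- $out                     # word-split according to IFS
    echo "$5"
)
```

The ls output is captured in a variable first because the exit status of `set -- $(ls ...)` would be that of set, not of ls, so a missing file would otherwise go unnoticed.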

Cross-platform fastest solution (it uses only a single fork() for ls, doesn't attempt to count actual characters, and doesn't spawn unneeded awk, perl, etc.).
It was tested on Mac OS X and Linux. It may require minor modification for Solaris: if required, simplify the ls arguments and adjust the offset in ${__ln[3]}.
Note: It will follow symbolic links.

If you use find from GNU fileutils, its -maxdepth and -printf options can do the job. Unfortunately, other implementations of find usually support neither -maxdepth nor -printf. This is the case for, e.g., the Solaris and macOS find.
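With GNU find, a single-file size lookup could be sketched as below. The function name size_via_find is hypothetical, and prefixing relative paths with ./ is my own precaution so that a name starting with a dash is not taken for an option.

```sh
# Hypothetical helper (GNU find only): print the size of one file.
size_via_find() {
    case $1 in
        /*) set -- "$1" ;;      # absolute path: use as is
        *)  set -- "./$1" ;;    # relative path: guard against leading "-"
    esac
    # -maxdepth 0 looks only at the path itself; %s is the size in bytes.
    find "$1" -maxdepth 0 -printf '%s\n'
}
```

On systems where find lacks -printf this simply fails, so it is only worth using where GNU findutils is known to be installed.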

You can use the find command to get some set of files (here, temporary files are extracted). Then you can use the du command with the -h switch to get the size of each file in human-readable form.