A portable way to get a file's size in bytes in the shell

Published 2024-08-12 13:22:14

On Linux, I use stat --format="%s" FILE, but the Solaris machine I have access to doesn't have the stat command. What should I use then?

I'm writing Bash scripts and can't really install any new software on the system.

I've considered already using:

perl -e '@x=stat(shift);print $x[7]' FILE

or even:

ls -nl FILE | awk '{print $5}'

But neither of these looks sensible - running Perl just to get file size? Or running two programs to do the same?


慵挽 2024-08-19 13:22:16

Try du -ks | awk '{print $1*1024}'. That might just work.

瞄了个咪的 2024-08-19 13:22:15

Your first Perl example doesn't look unreasonable to me.

It's for reasons like this that I migrated from writing shell scripts (in Bash, sh, etc.) to writing all but the most trivial scripts in Perl. I found that I was having to launch Perl for particular requirements, and as I did that more and more, I realised that writing the scripts in Perl was probably a more powerful (in terms of the language and the wide array of libraries available via CPAN) and more efficient way to achieve what I wanted.

Note that other shell-scripting languages (e.g., Python and Ruby) will no doubt have similar facilities, and you may want to evaluate these for your purposes. I only discuss Perl since that's the language I use and am familiar with.
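
For instance, Perl's built-in -s file-test operator returns a file's size in bytes directly, which is a little terser than slicing the stat() list (a sketch equivalent to the question's one-liner):

```shell
# Perl's -s file-test operator yields the size in bytes of its operand.
perl -e 'print -s shift' FILE
```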

温柔一刀 2024-08-19 13:22:15

I don't know how portable GNU Gawk's filefuncs extension is. The basic syntax is

time gawk -e '@load "filefuncs"; BEGIN {
     fnL[1] = ARGV[ARGC-1];
     fts(fnL, FTS_PHYSICAL, arr); print "";

     for (fn0 in arr) {
         print arr[fn0]["path"] \
           " :: "arr[fn0]["stat"]["size"]; };

     print ""; }' genieMV_204583_1.mp4


genieMV_204583_1.mp4 :: 259105690
real    0m0.013s


ls -Aln genieMV_204583_1.mp4

----------  1 501  20  259105690 Jan 25 09:31
            genieMV_204583_1.mp4

That syntax allows checking multiple files at once. For a single file, it's

time gawk -e '@load "filefuncs"; BEGIN {
      stat(ARGV[ARGC-1], arr);
      printf("\n%s :: %s\n", arr["name"], \
           arr["size"]); }' genieMV_204583_1.mp4

   genieMV_204583_1.mp4 :: 259105690
   real    0m0.013s

It is hardly any incremental savings. But admittedly it is slightly slower than stat straight up:

time stat -f '%z' genieMV_204583_1.mp4

259105690
real    0m0.006s (BSD-stat)


time gstat -c '%s' genieMV_204583_1.mp4

259105690
real    0m0.009s (GNU-stat)

And finally, a terse method of reading every single byte into an AWK array. This method works for binary files (front or back makes no diff):

time mawk2 'BEGIN { RS = FS = "^$";
     FILENAME = ARGV[ARGC-1]; getline;
     print "\n" FILENAME " :: "length"\n"; }' genieMV_204583_1.mp4

genieMV_204583_1.mp4 :: 259105690
real    0m0.270s


time mawk2 'BEGIN { RS = FS = "^$";
   } END { print "\n" FILENAME " :: " \
     length "\n"; }'  genieMV_204583_1.mp4

genieMV_204583_1.mp4 :: 259105690
real    0m0.269s

But that's not the fastest way, because you're storing the whole file in RAM. The normal AWK paradigm operates upon lines. The issue is that for binary files like MP4 files, if they don't end exactly on \n, summing length + NR would overcount by one. The code below is a catch-all that explicitly uses the file's last one or two bytes as the record separator RS.

I found the 2-byte method much faster for binaries, while the 1-byte method suits typical text files that end with newlines. With binaries, the 1-byte variant may end up row-splitting far too often and slowing things down.

But we're close to nitpicking here, since all it took mawk2 to read in every single byte of that 1.83 GB .txt file was 0.95 seconds, so unless you're processing massive volumes, it's negligible.

Nonetheless, stat is still by far the fastest, as mentioned by others, since it's an OS filesystem call.

time mawk2 'BEGIN { FS = "^$";
    FILENAME = ARGV[ARGC-1];
    cmd = "tail -c 2 \""FILENAME"\"";
    cmd | getline XRS;
    close(cmd);

    RS = ( length(XRS) == 1 ) ? ORS : XRS ;

} { bytes += length } END {

    print FILENAME " :: "  bytes + NR * length(RS) }' genieMV_204583_1.mp4

        genieMV_204583_1.mp4 :: 259105690
        real    0m0.092s

        m23lyricsRTM_dict_15.txt :: 1961512986
        real    0m0.950s


ls -AlnFT "${m3t}" genieMV_204583_1.mp4

-rw-r--r--  1 501  20  1961512986 Mar 12 07:24:11 2021 m23lyricsRTM_dict_15.txt

-r--r--r--@ 1 501  20   259105690 Jan 25 09:31:43 2021 genieMV_204583_1.mp4

(The file permissions for the MP4 were updated because the AWK method required read access.)

贪了杯 2024-08-19 13:22:15

I'd use ls for a better speed instead of wc which will read all the stream in a pipeline:

ls -l <filename> | cut -d ' ' -f5
  • This is in plain bytes

  • Use the flag --b M or --b G for output in megabytes or gigabytes (noted as not portable by @Andrew Henle in the comments).

BTW, if you're planning to go with du and cut:

du -b <filename> | cut -f -1
  • use -h for a better human reading

Or, with du and awk:

du -h <filename> | awk '{print $1}'

Or stat:

stat <filename> | grep Size: | awk '{print $2}'
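
One caveat with the cut variant above: cut -d ' ' treats every single space as a delimiter, so ls's column padding can shift the size out of field 5. awk splits on runs of whitespace, which sidesteps that (a sketch):

```shell
# awk collapses runs of spaces, so $5 is reliably the size column;
# cut -d ' ' counts each individual space as its own delimiter.
ls -ln FILE | awk '{print $5}'
```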
手心的海 2024-08-19 13:22:15

If you have Perl on your Solaris, then use it. Otherwise, ls with AWK is your next best bet, since you don't have stat or your find is not GNU find.

我不在是我 2024-08-19 13:22:15

There is a trick in Solaris I have used. If you ask for the size of more than one file, it returns just the total size with no names - so include an empty file like /dev/null as the second file:

For example,

command fileyouwant /dev/null

I can't remember which size command this works for - ls, wc, etc. - unfortunately I don't have a Solaris box to test it.

合约呢 2024-08-19 13:22:15

On Linux you can use du -h $FILE. That may work on Solaris too.

夏夜暖风 2024-08-19 13:22:14

wc -c < filename (short for word count, -c prints the byte count) is a portable, POSIX solution. Only the output format might not be uniform across platforms as some spaces may be prepended (which is the case for Solaris).

Do not omit the input redirection. When the file is passed as an argument, the file name is printed after the byte count.

I was worried it wouldn't work for binary files, but it works OK on both Linux and Solaris. You can try it with wc -c < /usr/bin/wc. Moreover, POSIX utilities are guaranteed to handle binary files, unless specified otherwise explicitly.
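
Since Solaris wc pads the count with leading spaces, one way to normalize the output is to run it through arithmetic expansion, which discards surrounding whitespace (a small sketch):

```shell
# $(( )) evaluates the count as a number, stripping any padding wc adds.
size=$(($(wc -c < FILE)))
echo "$size"
```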

情绪失控 2024-08-19 13:22:14

I ended up writing my own program (really small) to display just the size. More information is in bfsize - print file size in bytes (and just that).

The two cleanest ways in my opinion with common Linux tools are:

stat -c %s /usr/bin/stat

50000


wc -c < /usr/bin/wc

36912

But I just don't want to be typing parameters or pipe the output just to get a file size, so I'm using my own bfsize.

月朦胧 2024-08-19 13:22:14

Even though du usually prints disk usage and not actual data size, the GNU Core Utilities du can print a file's "apparent size" in bytes:

du -b FILE

But it won't work under BSD, Solaris, macOS, etc.
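
With GNU du, -b is shorthand for --apparent-size --block-size=1; the long spelling can make a script's intent more obvious (still GNU-only, as noted above):

```shell
# GNU-only: report the file's apparent size in bytes, not its disk usage.
du --apparent-size --block-size=1 FILE | cut -f1
```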

萌面超妹 2024-08-19 13:22:14

Finally I decided to use ls, and Bash array expansion:

TEMP=( $( ls -ln FILE ) )
SIZE=${TEMP[4]}

It's not really nice, but at least it does only one fork+execve, and it doesn't rely on a secondary programming language (Perl, Ruby, Python, or whatever).

故人的歌 2024-08-19 13:22:14

BSD systems have stat with different options from the GNU Core Utilities one, but with similar capabilities.

stat -f %z <file name>

This works on macOS (tested on 10.12), FreeBSD, NetBSD and OpenBSD.
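
Combining the GNU and BSD spellings, a small wrapper can try each in turn and fall back to POSIX wc -c; this is a sketch and hasn't been exercised on every platform:

```shell
# Try GNU stat, then BSD stat, then fall back to POSIX wc -c.
filesize() {
    stat -c %s "$1" 2>/dev/null ||
    stat -f %z "$1" 2>/dev/null ||
    echo $(($(wc -c < "$1")))
}
filesize FILE
```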

可爱咩 2024-08-19 13:22:14

When processing ls -n output, as an alternative to ill-portable shell arrays, you can use the positional arguments, which form the only array and are the only local variables in the standard shell. Wrap the overwrite of positional arguments in a function to preserve the original arguments to your script or function.

getsize() { set -- $(ls -dn "$1") && echo $5; }
getsize FILE

This splits the output of ls -dn according to the current IFS environment variable settings, assigns it to the positional arguments, and echoes the fifth one. The -d ensures directories are handled properly and the -n ensures that user and group names do not need to be resolved, unlike with -l. Also, user and group names containing white space could theoretically break the expected line structure; they are usually disallowed, but this possibility still makes the programmer stop and think.

站稳脚跟 2024-08-19 13:22:14

Cross-platform fastest solution (it only uses a single fork() for ls, doesn't attempt to count actual characters, doesn't spawn unneeded awk, perl, etc.).

It was tested on Mac OS X and Linux. It may require minor modification for Solaris:

__ln=( $( ls -Lon "$1" ) )
__size=${__ln[3]}
echo "Size is: $__size bytes"

If required, simplify ls arguments, and adjust the offset in ${__ln[3]}.

Note: It will follow symbolic links.

笨死的猪 2024-08-19 13:22:14

If you use find from GNU fileutils:

size=$( find . -maxdepth 1 -type f -name filename -printf '%s' )

Unfortunately, other implementations of find usually don't support -maxdepth, nor -printf. This is the case for e.g. Solaris and macOS find.
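
Where GNU find is available, the -printf approach also scales to reporting many files at once (a sketch):

```shell
# GNU-only: print size and path for every regular file in the current directory.
find . -maxdepth 1 -type f -printf '%s %p\n'
```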

隐诗 2024-08-19 13:22:14

You can use the find command to get some set of files (here temporary files are extracted). Then you can use the du command to get the file size of each file in a human-readable form using the -h switch.

find $HOME -type f -name "*~" -exec du -h {} \;

Output:

4.0K    /home/turing/Desktop/JavaExmp/TwoButtons.java~
4.0K    /home/turing/Desktop/JavaExmp/MyDrawPanel.java~
4.0K    /home/turing/Desktop/JavaExmp/Instream.java~
4.0K    /home/turing/Desktop/JavaExmp/RandomDemo.java~
4.0K    /home/turing/Desktop/JavaExmp/Buff.java~
4.0K    /home/turing/Desktop/JavaExmp/SimpleGui2.java~