Fastest way to calculate directory sizes

What is the best and fastest way to calculate directory sizes? For example, we will have the following structure:

/users
      /a
      /b
      /c
      /...

We need the output to be per user directory:

a = 1224KB
b = 3533KB
c = 3324KB
...

We plan on having tens or maybe even hundreds of thousands of directories under /users. The following shell command works:

du -cms /users/a | grep total | awk '{print $1}'

But we would have to call it N times. The whole point is the output: each user's directory size will be stored in our database. Also, we would love to have it update as frequently as possible, but without blocking all the resources on the server. Is it even possible to have it calculate user directory sizes every minute? How about every 5 minutes?
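
One hedged way to keep the scan frequent without starving the rest of the server is to run it from cron at idle I/O and lowest CPU priority. A minimal sketch (the crontab line and output path below are only placeholders):

# Hypothetical crontab entry: rescan every 5 minutes at idle I/O class and lowest CPU priority
*/5 * * * * ionice -c3 nice -n19 du -sm /users/* > /var/tmp/user_sizes.txt 2>/dev/null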

Now that I am thinking about it some more, would it make sense to use Node.js? That way, we could calculate the directory sizes and insert them into the database all in one transaction. We could do that in PHP or Python as well, but we're not sure it would be as fast.
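
For comparison, the same "one pass, one transaction" idea also works from a plain shell script. A minimal sketch, assuming SQLite, a hypothetical database path and usage table, and user directory names without spaces or quotes:

#!/bin/sh
# Sketch only: one du pass, one database transaction. Assumptions:
#   - sqlite3 is installed
#   - a database at /var/lib/usage.db with table
#     usage(user TEXT PRIMARY KEY, size_mb INTEGER, updated_at TEXT)
#   - user directory names contain no spaces or quotes
{
  echo "BEGIN TRANSACTION;"
  du -sm /users/* | while read -r size path; do
    user=$(basename "$path")
    echo "INSERT OR REPLACE INTO usage (user, size_mb, updated_at) VALUES ('$user', $size, datetime('now'));"
  done
  echo "COMMIT;"
} | sqlite3 /var/lib/usage.db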

Thanks.

Answers (7)

口干舌燥 2024-10-11 15:20:29

Why not just:

du -sm /users/*

(The slowest part is still likely to be du traversing the filesystem to calculate the size, though).
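
To get the exact "user = size" lines from the question out of that command, one option (a sketch, assuming user directory names without spaces) is to post-process with awk; the question asked for KB, so use -sk:

# Reshape "size<TAB>/users/a" lines into "a = 1224KB"
du -sk /users/* | awk '{ n = split($2, p, "/"); print p[n] " = " $1 "KB" }'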

⒈起吃苦の倖褔 2024-10-11 15:20:29

What do you need this information for? If it's only for reminding the users that their home directories are too big, you should add quota limits to the filesystem. You can set the quota to 1000 GB if you just want the numbers without really limiting disk usage.

The numbers are usually accurate whenever you access anything on the disk. The only downside is that they tell you how large the files are that are owned by a particular user, instead of how large the files below his home directory are. But maybe you can live with that.
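
For reference, a hedged sketch of what that looks like with the standard Linux quota tools (the username "alice" and the mount point are placeholders; the filesystem must be mounted with usrquota and quota support enabled):

# Effectively unlimited 1000 GB block quota for one user (limits are in 1 KB blocks), no inode limit
setquota -u alice 1048576000 1048576000 0 0 /users
# Per-user usage report for the filesystem; these are the numbers you would load into the database
repquota /users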

心如狂蝶 2024-10-11 15:20:29

The fastest way to analyze storage is with the ncdu package:

sudo apt-get install ncdu

Example command:

ncdu /your/directory/
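
ncdu is interactive by default; for unattended use it can also export a scan to a file and reload it later without rescanning (a sketch, assuming a reasonably recent ncdu):

ncdu -o /tmp/users-scan /users   # scan once and write the results to a file
ncdu -f /tmp/users-scan          # browse the saved scan later without rescanning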

我一向站在原地 2024-10-11 15:20:29

I think what you are looking for is:

du -cm --max-depth=1 /users | awk '{user = substr($2,7,300);
                                    ans = user ": " $1;
                                    print ans}'

The magic number 7 strips the leading "/users/" prefix, and 300 is just an arbitrarily large length (awk is not one of my best languages =D, but I am guessing that part is not going to be written in awk anyway). It's faster since you don't involve grepping for the total, and the loop is contained inside du. I bet it can be done faster, but this should be fast enough.
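
The hard-coded substr() offsets can be avoided by splitting the path on "/" instead. A sketch of the same pipeline (the "total" line from -c and the /users line itself still appear in the output):

du -cm --max-depth=1 /users | awk '{ n = split($2, p, "/"); print p[n] ": " $1 }'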

念三年u 2024-10-11 15:20:29

If you have multiple cores you can run the du command in parallel.

For example (running from the folder you want to examine):

parallel du -sm ::: *

ls -A | xargs -P4 du -sm

[The number after the -P argument sets the number of CPUs you want to use; ls -A skips the . and .. entries so du does not rescan the whole tree.]

弄潮 2024-10-11 15:20:29

Not that slow, but it will show you folder sizes: du -sh /* > total.size.files.txt

又怨 2024-10-11 15:20:29

I recommend the utility DUC. First run

duc index ~/

that will cache all directory sizes in ~/cache/duc. You can get the format you asked for with duc ls ~/. Or, if you only want the current size:

result_in_bytes=$(duc ls -b ~/ | awk '{sum += $1} END {print sum}')
result_in_mb=$(echo "scale=1; $result_in_bytes / 1024 / 1024" | bc)
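
Applied to the /users layout from the question, the same two steps would look roughly like this (a sketch; duc must be installed, and the index step rerun, e.g. from cron, to stay current):

duc index /users    # build or refresh the cached size database for /users
duc ls /users       # print per-user directory sizes from the cached index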