Calculating a directory's size using Python?
Before I re-invent this particular wheel, has anybody got a nice routine for calculating the size of a directory using Python? It would be very nice if the routine also formatted the size nicely in MB/GB, etc.
This walks all sub-directories, summing file sizes:
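A minimal sketch of that os.walk approach, assuming a helper named get_size with a default start_path of "." (both names are my own). It also skips symbolic links so that a link is not counted at its target's size:

```python
import os


def get_size(start_path="."):
    """Return the total size in bytes of all files under start_path."""
    total_size = 0
    # os.walk yields (dirpath, dirnames, filenames) for every directory in the tree
    for dirpath, dirnames, filenames in os.walk(start_path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            # Skip symbolic links so a link is not counted at its target's size
            if not os.path.islink(fp):
                total_size += os.path.getsize(fp)
    return total_size
```

Usage: `print(get_size(), "bytes")`.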
And a one-liner for fun using os.listdir (does not include sub-directories):
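A sketch of such a one-liner, wrapped in a hypothetical top_level_size function that takes a path instead of assuming the current directory. It only looks at regular files directly inside path:

```python
import os


def top_level_size(path="."):
    # Sum only the regular files directly inside path; sub-directories are not descended into
    return sum(os.path.getsize(os.path.join(path, f))
               for f in os.listdir(path)
               if os.path.isfile(os.path.join(path, f)))
```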
Reference:
Updated to use os.path.getsize, which is clearer than the os.stat().st_size method.
Thanks to ghostdog74 for pointing this out!
os.stat: st_size gives the size in bytes. It can also be used to get the file size and other file-related information.
Update 2018
If you use Python 3.4 or earlier, you may consider using the more efficient walk method provided by the third-party scandir package. In Python 3.5 and later, this package has been incorporated into the standard library, and os.walk has received the corresponding increase in performance.
Update 2019
Recently I've been using pathlib more and more; here's a pathlib solution:
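A minimal pathlib sketch (the function name get_size is an assumption). rglob("*") yields every entry below the folder, and is_file() filters out directories. Both is_file() and stat() follow symlinks, so a link to a file counts at the target's size:

```python
from pathlib import Path


def get_size(folder="."):
    """Total size in bytes of the regular files under folder, recursively."""
    # rglob("*") yields every entry below folder; is_file() drops directories
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())
```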
Some of the approaches suggested so far implement recursion, others employ a shell or will not produce neatly formatted results. When your code is a one-off for Linux platforms, you can get formatting as usual, recursion included, as a one-liner. Except for the print in the last line, it will work for current versions of python2 and python3:
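A sketch of that idea, assuming a Unix-like system with du on the PATH. The function name du is my own. du -sh reports disk usage rounded to human units (e.g. 4.0K), not the exact byte count. The .decode() call makes this version Python 3 only:

```python
import subprocess


def du(path):
    """Disk usage of path in human-readable form, as reported by `du -sh`."""
    # du prints "<size>\t<path>"; keep only the first field
    return subprocess.check_output(["du", "-sh", path]).split()[0].decode("utf-8")
```

Usage: `print(du("."))`. check_output raises CalledProcessError if du exits with an error.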
This is simple, efficient and will work for files and multilevel directories:
Using pathlib, I came up with this one-liner to get the size of a folder:
And this is what I came up with for a nicely formatted output:
Usage:
I also came across this question, which has some more compact and probably more performant strategies for printing file sizes.
Here is a recursive function (it recursively sums up the size of all subfolders and their respective files) which returns exactly the same number of bytes as running "du -sb ." in Linux (where "." means "the current folder"):
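A sketch of such a function (get_folder_size is a hypothetical name). It adds each directory's own st_size, which is what makes the total comparable to du -sb. It will still differ from du when hard links are present, because du counts them once. It will also differ when symlinks are present, because this version follows them and du does not:

```python
import os


def get_folder_size(folder):
    # Start with the size of the directory entry itself, as du -sb does
    total_size = os.path.getsize(folder)
    for item in os.listdir(folder):
        itempath = os.path.join(folder, item)
        if os.path.isfile(itempath):
            total_size += os.path.getsize(itempath)
        elif os.path.isdir(itempath):
            total_size += get_folder_size(itempath)  # recurse into sub-folders
    return total_size
```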
Python 3.5 recursive folder size using os.scandir:
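A minimal recursive os.scandir sketch (get_tree_size is a hypothetical name). entry.is_dir() and entry.is_file() follow symlinks by default, so symlinked folders are descended into. The iterator is consumed without a with block to keep the sketch valid on 3.5, where os.scandir() is not yet a context manager:

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of the files under path."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir():
            total += get_tree_size(entry.path)  # recurse into the sub-directory
        elif entry.is_file():
            total += entry.stat().st_size  # the stat result is cached on the DirEntry
    return total
```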
For Python 3.5+:
Usage:
monknut's answer is good, but it fails on broken symlinks, so you also have to check whether the path really exists.
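A sketch of an os.walk summing loop with that existence check added. The names are hypothetical, and this is not necessarily monknut's exact code:

```python
import os


def get_size(start_path="."):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            # os.path.exists() follows symlinks, so it is False for a broken link;
            # without this check os.path.getsize() would raise FileNotFoundError
            if os.path.exists(fp):
                total_size += os.path.getsize(fp)
    return total_size
```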
The accepted answer doesn't take into account hard or soft links, and would count those files twice. You'd want to keep track of which inodes you've seen, and not add the size for those files.
A recursive one-liner:
Chris' answer is good, but it could be made more idiomatic by using a set to check for seen directories, which also avoids using an exception for control flow:
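A sketch of the set-based idea. It assumes the set holds the (device, inode) pairs of files already counted, which addresses the hard/soft-link double counting described above. The name directory_size is hypothetical. The try/except only skips entries that cannot be stat'ed, such as broken symlinks. Duplicates are detected by the set, not by an exception. st_ino is meaningful on Unix-like systems:

```python
import os


def directory_size(path):
    total_size = 0
    seen = set()  # (st_dev, st_ino) pairs already counted
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            try:
                st = os.stat(fp)  # follows symlinks, so a link and its target share an inode
            except OSError:
                continue  # e.g. a broken symlink
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # hard link or symlink to a file already counted
            seen.add(key)
            total_size += st.st_size
    return total_size
```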
A little late to the party, but this fits in one line provided that you have glob2 and humanize installed. Note that in Python 3, the default iglob has a recursive mode. How to modify the code for Python 3 is left as a trivial exercise for the reader.
Get directory size
Properties of the solution:
… du does
… st.st_blocks for disk space used, thus works only on Unix-like systems
The code:
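A sketch that I assume fits those properties. It measures symlinks themselves rather than following them, as du does. It uses st_blocks, so it is Unix only. It also counts each hard-linked inode once and uses an explicit stack instead of recursion; these two choices are mine. The name dir_size is hypothetical. It returns both the apparent size and the disk space used:

```python
import os
import stat


def dir_size(path):
    """Return (apparent_bytes, disk_bytes) for the tree rooted at path (Unix only)."""
    apparent = disk = 0
    seen = set()    # (st_dev, st_ino) of inodes already counted, so hard links count once
    stack = [path]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        st = os.lstat(current)  # lstat: symlinks are measured, not followed
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue
        seen.add(key)
        apparent += st.st_size
        disk += st.st_blocks * 512  # st_blocks counts 512-byte blocks
        if stat.S_ISDIR(st.st_mode):
            with os.scandir(current) as it:
                stack.extend(entry.path for entry in it)
    return apparent, disk
```

Usage: `apparent, disk = dir_size(".")`. Like du, the directories' own entries are included in both totals.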
Example usage:
Human readable file size
Properties of the solution:
The code:
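A stand-in sketch of a human-readable formatter. The name human_size, its units and its defaults are my own assumptions. It divides by 1024 (binary units: KiB, MiB, ...) or by 1000 (decimal units: kB, MB, ...) until the number drops below the base. A loop is used rather than math.log, which can land just below an exact power and pick the wrong unit:

```python
def human_size(num_bytes, binary=True, precision=1):
    """Format a byte count, e.g. 1536 -> '1.5 KiB' (binary) or '1.5 kB' (decimal)."""
    base = 1024 if binary else 1000
    units = (("B", "KiB", "MiB", "GiB", "TiB", "PiB") if binary
             else ("B", "kB", "MB", "GB", "TB", "PB"))
    if num_bytes < base:
        return f"{num_bytes} B"  # whole bytes need no decimals
    size = float(num_bytes)
    for unit in units[1:]:
        size /= base
        # Stop at the first unit that keeps the number below base, or at the last unit
        if size < base or unit == units[-1]:
            return f"{size:.{precision}f} {unit}"
```

Example: `human_size(1536)` gives `'1.5 KiB'`, and `human_size(5_000_000_000, binary=False)` gives `'5.0 GB'`.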
Example usage:
You can do something like this:
In this case I have not tested the result before returning it; if you want, you can check it with commands.getstatusoutput (Python 2 only; in Python 3 the equivalent is subprocess.getstatusoutput).
For the second part of the question
One-liner, you say...
Here is a one-liner:
Although I would probably split it out, and it performs no checks.
To convert to KB, see "Reusable library to get human readable version of file size?" and work it in.
For getting the size of one file, there is os.path.getsize().
It's reported in bytes.
The following script prints the size of every sub-directory of the specified directory. It also tries to benefit (if possible) from caching the calls of a recursive function. If the argument is omitted, the script works on the current directory. The output is sorted by directory size, from biggest to smallest, so you can adapt it to your needs.
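A sketch of the caching idea. It assumes functools.lru_cache as the memoizing decorator; the script's own null_decorator fallback is not reproduced. The names dir_size and all_subdir_sizes are hypothetical. Because parents are measured before their children, each child's size is already cached when it is listed:

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)  # memoize: each directory's size is computed once per run
def dir_size(path):
    """Total size in bytes of the files under path (symlinks are not followed)."""
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                total += dir_size(entry.path)  # cached result is reused later
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def all_subdir_sizes(root="."):
    """Return [(size, path), ...] for every sub-directory under root, biggest first."""
    paths = [os.path.join(dirpath, d)
             for dirpath, dirnames, _ in os.walk(root)
             for d in dirnames
             if not os.path.islink(os.path.join(dirpath, d))]
    return sorted(((dir_size(p), p) for p in paths), reverse=True)
```

Usage: `for size, path in all_subdir_sizes("."): print(size, path)`. Call `dir_size.cache_clear()` if the tree changes while the process is running.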
PS: I've used recipe 578019 for showing directory sizes in a human-friendly format (http://code.activestate.com/recipes/578019/).
Sample output:
EDIT: moved null_decorator above, as user2233949 recommended
Use the sh library; its du command does it:
If you want to pass an asterisk, use glob as described here.
To convert the values into human-readable form, use humanize:
For what it's worth... the tree command does all of this for free:
I love Python, but by far the simplest solution to the problem requires no new code.
It is handy:
Here is a one-liner that does it recursively (the recursive option is available as of Python 3.5):
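A sketch of such a one-liner, wrapped in a function (the name glob_size is hypothetical). glob.escape() keeps the base path from being read as a pattern. Like the shell, glob skips names that start with "." by default, so hidden files are not counted:

```python
import glob
import os


def glob_size(path="."):
    # "**" with recursive=True (Python 3.5+) matches any depth of sub-directories
    pattern = os.path.join(glob.escape(path), "**", "*")
    return sum(os.path.getsize(f)
               for f in glob.iglob(pattern, recursive=True)
               if os.path.isfile(f))
```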
A solution that works on Python 3.6 using pathlib.
I wrote this function, which gives me the accurate overall size of a directory. I tried other for-loop solutions with os.walk, but I don't know why the end result was always less than the actual size (in an Ubuntu 18 environment). I must have done something wrong, but who cares: this one works perfectly fine.
I'm sure this helps! It works for folders and files as well!
I'm using Python 2.7.13 with scandir, and here's my one-liner recursive function to get the total size of a folder:
https://pypi.python.org/pypi/scandir
When the size of a sub-directory is computed, it should update its parent folder's size, and this goes on until it reaches the root parent.
The following function computes the size of the folder and all its sub-folders.
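A sketch of that bottom-up propagation. It assumes os.walk(topdown=False), which visits children before their parents. The name folder_sizes and the returned per-folder dict are my own choices. Symlinked files are skipped. Symlinked folders are not walked, so they contribute 0:

```python
import os


def folder_sizes(root):
    """Return {dirpath: total bytes} for root and every sub-folder."""
    sizes = {}
    # topdown=False yields children before their parents, so each child's
    # total is ready when its parent is visited
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = sum(os.path.getsize(os.path.join(dirpath, f))
                    for f in filenames
                    if not os.path.islink(os.path.join(dirpath, f)))
        # Add the already-computed totals of the immediate sub-folders
        total += sum(sizes.get(os.path.join(dirpath, d), 0) for d in dirnames)
        sizes[dirpath] = total
    return sizes
```

Usage: `folder_sizes(".")["."]` is the size of the whole tree.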
Python 3.6+ recursive folder/file size using os.scandir. As powerful as the answer by @blakev, but shorter and in EAFP Python style.
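A sketch in that EAFP spirit; it is my own version, not necessarily this poster's code. It tries to open the path as a directory and falls back to the file size when os.scandir raises NotADirectoryError. Symlinks are followed here. As a result, a broken link raises FileNotFoundError, and a symlink loop would recurse without end:

```python
import os


def get_size(path="."):
    """Size in bytes of a file, or of everything under a directory (EAFP style)."""
    try:
        # Assume path is a directory; scandir raises NotADirectoryError if it is not
        with os.scandir(path) as it:
            return sum(get_size(entry.path) for entry in it)
    except NotADirectoryError:
        return os.path.getsize(path)
```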
du does not follow symlinks by default. No answer here makes use of follow_symlinks=False.
Here is an implementation which follows the default behavior of du:
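A sketch of that behavior (du_apparent is a hypothetical name). Entries are never followed, so a symlink contributes only its own size. Each directory's own entry size is added, as du does. Unlike du, hard links are not de-duplicated here:

```python
import os


def du_apparent(path):
    """Apparent size in bytes without following symlinks (roughly like `du -sb`)."""
    total = os.lstat(path).st_size  # du also counts the directory entry itself
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                total += du_apparent(entry.path)
            else:
                # Regular files, symlinks (measured as the link itself), sockets, ...
                total += entry.stat(follow_symlinks=False).st_size
    return total
```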
Test:
This script tells you which file is the biggest in the CWD and also in which folder that file is.
This script works for me on Windows 8 with the Python 3.3.3 shell.
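A sketch of that idea (biggest_file is a hypothetical name; symlinks are skipped). It returns the size, the folder and the name of the largest file, or None if there are no files:

```python
import os


def biggest_file(start="."):
    """Return (size, folder, filename) of the largest regular file under start."""
    best = None
    for dirpath, dirnames, filenames in os.walk(start):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue  # skip symlinks so a link is not mistaken for its target
            size = os.path.getsize(fp)
            if best is None or size > best[0]:
                best = (size, dirpath, name)
    return best
```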
Admittedly, this is kind of hackish and only works on Unix/Linux.
It matches du -sb . because in effect this is a Python bash wrapper that runs the du -sb . command.