Here are some easy-to-copy one-liners (bytes, kilobits, kilobytes, megabits, megabytes, gigabits, gigabytes, terabytes) to use if you already know what unit size you want. If you're looking for a more generic function with a few nice options, see my FEB 2021 update further on.
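The one-liners themselves seem to be missing from this copy of the answer. As a rough sketch only, assuming they divide the result of os.path.getsize by a power of two (both the function and the 1<<20 shift are mentioned in the OCT 2022 update), they might look like the following. The temporary file and the variable names are hypothetical stand-ins, not the author's originals.

```python
import os
import tempfile

# Hypothetical stand-in for a real file: 3 MiB of zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024))
some_file = tmp.name

size_bytes = os.path.getsize(some_file)                # Bytes
size_kilobits = os.path.getsize(some_file) / (1 << 7)   # Kilobits (1024 bits = 128 bytes)
size_kilobytes = os.path.getsize(some_file) / (1 << 10)  # Kilobytes
size_megabits = os.path.getsize(some_file) / (1 << 17)   # Megabits
size_megabytes = os.path.getsize(some_file) / (1 << 20)  # Megabytes
size_gigabits = os.path.getsize(some_file) / (1 << 27)   # Gigabits
size_gigabytes = os.path.getsize(some_file) / (1 << 30)  # Gigabytes
size_terabytes = os.path.getsize(some_file) / (1 << 40)  # Terabytes

os.remove(some_file)  # clean up the stand-in file
```

Each bit-based unit uses a shift that is 3 smaller than its byte-based sibling, because a byte is 8 (2**3) bits.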
UPDATE FEB 2021
Here are my updated and fleshed-out functions to a) get file/folder size, b) convert into desired units:
from pathlib import Path

def get_path_size(path=Path('.'), recursive=False):
    """
    Gets file size, or total directory size

    Parameters
    ----------
    path: str | pathlib.Path
        File path or directory/folder path
    recursive: bool
        True -> use .rglob i.e. include nested files and directories
        False -> use .glob i.e. only process current directory/folder

    Returns
    -------
    int:
        File size or recursive directory size in bytes
        Use cleverutils.format_bytes to convert to other units e.g. MB
    """
    path = Path(path)
    if path.is_file():
        return path.stat().st_size
    if path.is_dir():
        # '*' rather than '*.*' so that files without an extension are counted too
        path_glob = path.rglob('*') if recursive else path.glob('*')
        # Sum regular files only; directory entries report a platform-dependent size
        return sum(file.stat().st_size for file in path_glob if file.is_file())
    # Neither a file nor a directory (e.g. the path does not exist)
    raise FileNotFoundError(f"No such file or directory: {path}")
def format_bytes(bytes, unit, SI=False):
    """
    Converts bytes to common units such as kb, kib, KB, mb, mib, MB

    Parameters
    ----------
    bytes: int
        Number of bytes to be converted
    unit: str
        Desired unit of measure for output
    SI: bool
        True -> Use SI standard e.g. KB = 1000 bytes
        False -> Use JEDEC standard e.g. KB = 1024 bytes

    Returns
    -------
    str:
        E.g. "7 MiB" where MiB is the original unit abbreviation supplied
    """
    if unit.lower() in "b bit bits".split():
        return f"{bytes*8} {unit}"
    unitN = unit[0].upper()+unit[1:].replace("s","")  # Normalised
    reference = {"Kb Kib Kibibit Kilobit": (7, 1),
                 "KB KiB Kibibyte Kilobyte": (10, 1),
                 "Mb Mib Mebibit Megabit": (17, 2),
                 "MB MiB Mebibyte Megabyte": (20, 2),
                 "Gb Gib Gibibit Gigabit": (27, 3),
                 "GB GiB Gibibyte Gigabyte": (30, 3),
                 "Tb Tib Tebibit Terabit": (37, 4),
                 "TB TiB Tebibyte Terabyte": (40, 4),
                 "Pb Pib Pebibit Petabit": (47, 5),
                 "PB PiB Pebibyte Petabyte": (50, 5),
                 "Eb Eib Exbibit Exabit": (57, 6),
                 "EB EiB Exbibyte Exabyte": (60, 6),
                 "Zb Zib Zebibit Zettabit": (67, 7),
                 "ZB ZiB Zebibyte Zettabyte": (70, 7),
                 "Yb Yib Yobibit Yottabit": (77, 8),
                 "YB YiB Yobibyte Yottabyte": (80, 8),
                 }
    key_list = '\n'.join(["     b Bit"] + [x for x in reference.keys()]) +"\n"
    if unitN not in key_list:
        raise IndexError(f"\n\nConversion unit must be one of:\n\n{key_list}")
    units, divisors = [(k,v) for k,v in reference.items() if unitN in k][0]
    if SI:
        divisor = 1000**divisors[1]/8 if "bit" in units else 1000**divisors[1]
    else:
        divisor = float(1 << divisors[0])
    value = bytes / divisor
    return f"{value:,.0f} {unitN}{(value != 1 and len(unitN) > 3)*'s'}"

# Tests
>>> assert format_bytes(1,"b") == '8 b'
>>> assert format_bytes(1,"bits") == '8 bits'
>>> assert format_bytes(1024, "kilobyte") == "1 Kilobyte"
>>> assert format_bytes(1024, "kB") == "1 KB"
>>> assert format_bytes(7141000, "mb") == '54 Mb'
>>> assert format_bytes(7141000, "mib") == '54 Mib'
>>> assert format_bytes(7141000, "Mb") == '54 Mb'
>>> assert format_bytes(7141000, "MB") == '7 MB'
>>> assert format_bytes(7141000, "mebibytes") == '7 Mebibytes'
>>> assert format_bytes(7141000, "gb") == '0 Gb'
>>> assert format_bytes(1000000, "kB") == '977 KB'
>>> assert format_bytes(1000000, "kB", SI=True) == '1,000 KB'
>>> assert format_bytes(1000000, "kb") == '7,812 Kb'
>>> assert format_bytes(1000000, "kb", SI=True) == '8,000 Kb'
>>> assert format_bytes(125000, "kb") == '977 Kb'
>>> assert format_bytes(125000, "kb", SI=True) == '1,000 Kb'
>>> assert format_bytes(125*1024, "kb") == '1,000 Kb'
>>> assert format_bytes(125*1024, "kb", SI=True) == '1,024 Kb'
UPDATE OCT 2022
My answer to a recent comment was too long, so here's some further explanation of the 1<<20 magic! I also notice that float isn't needed so I've removed that from the examples above.
As stated in another reply (above), "<<" is a bitwise operator (a left shift). It treats the left-hand side as a binary number and moves its digits 20 places to the left (in this case). When we count normally in decimal, the total number of digits dictates whether we've reached the tens, hundreds, thousands, millions, etc. It's similar in binary, except that the number of digits dictates whether we're talking bits, bytes, kilobytes, megabytes, etc. So 1<<20 is the same as (binary) 1 with 20 (binary) zeros after it or, if you remember how to convert from binary to decimal, 2 to the power of 20 (2**20), which equals 1048576. In the snippets above, os.path.getsize returns a value in BYTES, and 1048576 bytes are strictly speaking a Mebibyte (MiB) and casually speaking a Megabyte (MB).
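To make the explanation concrete, here is a small check of the identities described above. The sample value 7141000 is borrowed from the tests earlier in this answer; the variable names are just for illustration.

```python
# Shifting 1 left by n places appends n binary zeros, i.e. it equals 2**n.
assert 1 << 10 == 2 ** 10 == 1024
assert 1 << 20 == 2 ** 20 == 1048576
assert bin(1 << 20) == "0b1" + "0" * 20  # a 1 followed by 20 zeros

size_in_bytes = 7141000                  # sample byte count
size_in_mib = size_in_bytes / (1 << 20)  # "/" is true division in Python 3, so no float() is needed
print(f"{size_in_mib:.2f} MiB")          # prints "6.81 MiB"
```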
def GetHumanReadable(size, precision=2):
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB']
    suffixIndex = 0
    # >= so that exactly 1024 bytes is reported as 1.00KB rather than 1024.00B
    while size >= 1024 and suffixIndex < 4:
        suffixIndex += 1  # increment the index of the suffix
        size = size / 1024.0  # apply the division
    return "%.*f%s" % (precision, size, suffixes[suffixIndex])

UNITS = {1000: ['KB', 'MB', 'GB'],
         1024: ['KiB', 'MiB', 'GiB']}

def approximate_size(size, use_base_1024=True):
    mult = 1024 if use_base_1024 else 1000
    for unit in UNITS[mult]:
        size = size / mult
        if size < mult:
            return '{0:.3f} {1}'.format(size, unit)
    # Larger than the biggest listed unit: report in that unit instead of returning None
    return '{0:.3f} {1}'.format(size, unit)

approximate_size(2123, False)  # '2.123 KB'

Here's a version that matches the output of ls -lh.
def human_size(num: int) -> str:
    base = 1
    for unit in ['B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']:
        n = num / base
        if n < 9.95 and unit != 'B':
            # Less than 10 then keep 1 decimal place
            value = "{:.1f}{}".format(n, unit)
            return value
        if round(n) < 1000:
            # Less than 4 digits so use this
            value = "{}{}".format(round(n), unit)
            return value
        base *= 1024
    value = "{}{}".format(round(n), unit)
    return value

I wanted two-way conversion, and I wanted to use Python 3 format() support to be as pythonic as possible. Maybe try the datasize library module? https://pypi.org/project/datasize/
$ pip install -qqq datasize
$ python
...
>>> from datasize import DataSize
>>> 'My new {:GB} SSD really only stores {:.2GiB} of data.'.format(DataSize('750GB'),DataSize(DataSize('750GB') * 0.8))
'My new 750GB SSD really only stores 558.79GiB of data.'
Here is my implementation:
from bisect import bisect

def to_filesize(bytes_num, si=True):
    decade = 1000 if si else 1024
    partitions = tuple(decade ** n for n in range(1, 6))
    suffixes = tuple('BKMGTP')

    i = bisect(partitions, bytes_num)
    s = suffixes[i]

    for n in range(i):
        bytes_num /= decade

    f = '{:.3f}'.format(bytes_num)
    return '{}{}'.format(f.rstrip('0').rstrip('.'), s)

It will print up to three decimals and it strips trailing zeros and periods. The boolean parameter si toggles between 1000-based and 1024-based size magnitudes.
This is its counterpart. It lets you write clean configuration files like {'maximum_filesize': from_filesize('10M')}. It returns an integer that approximates the intended filesize. I am not using bit shifting because the source value is a floating point number (it will accept from_filesize('2.15M') just fine). Converting it to an integer/decimal would work but would make the code more complicated, and it already works as it is.
def from_filesize(spec, si=True):
    decade = 1000 if si else 1024
    suffixes = tuple('BKMGTP')

    num = float(spec[:-1])
    s = spec[-1]
    i = suffixes.index(s)

    for n in range(i):
        num *= decade

    return int(num)

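For anyone curious about the integer/decimal route mentioned above, here is a minimal sketch using the standard library's decimal module. The function name from_filesize_exact is hypothetical and this is not part of the original answer; it only illustrates how the parsing could avoid binary floating point.

```python
from decimal import Decimal

def from_filesize_exact(spec, si=True):
    """Hypothetical Decimal-based variant of from_filesize."""
    base = 1000 if si else 1024
    suffixes = "BKMGTP"
    # Decimal parses "2.15" exactly, so no binary rounding creeps in
    number, suffix = Decimal(spec[:-1]), spec[-1]
    exponent = suffixes.index(suffix)  # raises ValueError for an unknown suffix
    return int(number * base ** exponent)

print(from_filesize_exact("2.15M"))        # 2150000
print(from_filesize_exact("1K", si=False)) # 1024
```

Whether the extra import is worth it depends on how much exactness matters for your configuration values.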
Here is what I use:
NB: size should be passed in bytes.
There is hurry.filesize that will take the size in bytes and make a nice string out of it.
Or if you want 1K == 1000 (which is what most users assume):
It has IEC support as well (but that wasn't documented):
Because it's written by the Awesome Martijn Faassen, the code is small, clear and extensible. Writing your own systems is dead easy.
Here is one:
Used like so:
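The snippets for this answer did not survive in this copy. As a rough stand-in, the table-driven idea behind "writing your own system" can be sketched with the standard library alone. Everything below (nice_size, TRADITIONAL, SI, MY_SYSTEM) is a hypothetical illustration, not hurry.filesize's actual code or API.

```python
# Each "system" is a list of (factor, suffix) pairs, largest factor first.
TRADITIONAL = [(1024 ** 4, "T"), (1024 ** 3, "G"), (1024 ** 2, "M"),
               (1024, "K"), (1, "B")]
SI = [(1000 ** 4, "T"), (1000 ** 3, "G"), (1000 ** 2, "M"),
      (1000, "K"), (1, "B")]

def nice_size(num_bytes, system=TRADITIONAL):
    """Format num_bytes with the largest unit in `system` that fits."""
    for factor, suffix in system:
        if num_bytes >= factor:
            break
    # If nothing fits (num_bytes == 0), the last pair in the table is used
    return f"{num_bytes // factor}{suffix}"

# A made-up custom system, just to show that the table is the only thing to change
MY_SYSTEM = [(1024 ** 2, " megas"), (1024, " kilos"), (1, " bytes")]

print(nice_size(11000))             # 10K
print(nice_size(11000, system=SI))  # 11K
print(nice_size(2048, MY_SYSTEM))   # 2 kilos
```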
Instead of a size divisor of 1024 * 1024 you could use the << bitwise shifting operator, i.e. 1<<20 to get megabytes, 1<<30 to get gigabytes, etc. In the simplest scenario you can have e.g. a constant MBFACTOR = float(1<<20) which can then be used with bytes, i.e.: megas = size_in_bytes/MBFACTOR. Megabytes are usually all that you need, or otherwise something like this can be used:
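The code that followed is missing from this copy, so here is a minimal sketch of the shift-based approach, assuming Python 3 (where "/" is already true division, so the float() wrapper is optional). UNITS_BY_SHIFT and pretty_size are hypothetical names, not the author's original code.

```python
MBFACTOR = 1 << 20  # bytes per megabyte (strictly, per MiB)

# (shift, label) pairs, largest first; 1 << shift is the size of one unit in bytes
UNITS_BY_SHIFT = [(40, "TB"), (30, "GB"), (20, "MB"), (10, "KB")]

def pretty_size(size_in_bytes, precision=1):
    """Format a byte count using the largest unit that fits."""
    for shift, label in UNITS_BY_SHIFT:
        factor = 1 << shift
        if size_in_bytes >= factor:
            return f"{size_in_bytes / factor:.{precision}f} {label}"
    return f"{size_in_bytes} bytes"  # anything below 1 KB

size_in_bytes = 5 * MBFACTOR
megas = size_in_bytes / MBFACTOR    # 5.0
print(pretty_size(size_in_bytes))   # 5.0 MB
print(pretty_size(1536))            # 1.5 KB
```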
Here is a compact function to calculate size.
For more detailed output and the reverse operation, please refer to: http://code.activestate.com/recipes/578019-bytes-to-human-human-to-bytes-converter/
Here it is:
Output
Just in case anyone's searching for the reverse of this problem (as I sure did) here's what works for me:
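The original snippet is not present in this copy. One possible shape for the reverse conversion (human-readable string to bytes) is sketched below; parse_size and UNIT_FACTORS are hypothetical names, and the 1024-based factors are an assumption.

```python
import re

# Assumed 1024-based factors, keyed by upper-case unit label
UNIT_FACTORS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}

def parse_size(text):
    """Convert a string such as '1.5 KB' or '10MB' into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"Cannot parse size: {text!r}")
    number, unit = match.groups()
    factor = UNIT_FACTORS.get(unit.upper())
    if factor is None:
        raise ValueError(f"Unknown unit: {unit!r}")
    return int(float(number) * factor)

print(parse_size("1.5 KB"))  # 1536
print(parse_size("10MB"))    # 10485760
```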
Here are my two cents, which permit casting up and down, and add customizable precision:
Add TB, etc., as you wish.