Saving a numpy array as binary does not improve disk usage compared to uint8

Posted on 2025-01-15 00:57:05

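I'm saving numpy arrays while trying to use as little disk space as possible. Along the way I realized that saving a boolean numpy array does not improve disk usage compared to a uint8 array. Is there a reason for that, or am I doing something wrong here?

Here is a minimal example: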

import sys
import numpy as np

rand_array = np.random.randint(0, 2, size=(100, 100), dtype=np.uint8)  # create a random dual state numpy array

array_uint8 = rand_array * 255  # array, type uint8

array_bool = np.array(rand_array, dtype=bool)  # array, type bool

print(f"size array uint8 {sys.getsizeof(array_uint8)}")
# ==> size array uint8 10120
print(f"size array bool {sys.getsizeof(array_bool)}")
# ==> size array bool 10120

np.save("array_uint8", array_uint8, allow_pickle=False, fix_imports=False)
# size in fs: 10128
np.save("array_bool", array_bool, allow_pickle=False, fix_imports=False)
# size in fs: 10128
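
A possible explanation, offered as a sketch rather than a definitive answer: NumPy stores each bool element in a full byte, so a bool array and a uint8 array of the same shape hold the same number of data bytes. If eight values per byte are wanted, np.packbits can pack the flags before saving. The variable names and the fixed seed below are illustrative assumptions, not part of the original example.

```python
import numpy as np

# Illustrative data: a 100x100 array of random True/False flags (seed is arbitrary).
rng = np.random.default_rng(0)
array_bool = rng.integers(0, 2, size=(100, 100)).astype(bool)

# NumPy's bool dtype occupies one byte per element, the same as uint8.
print(array_bool.itemsize)  # 1
print(array_bool.nbytes)    # 10000

# Pack eight flags into each byte; with no axis given, the input is flattened.
packed = np.packbits(array_bool)
print(packed.nbytes)        # 1250

# The shape is lost by packing, so keep it around to restore the array later.
restored = (
    np.unpackbits(packed, count=array_bool.size)
    .reshape(array_bool.shape)
    .astype(bool)
)
print(np.array_equal(restored, array_bool))  # True
```

The packed array can be written with np.save like any other uint8 array. The original shape has to be stored separately (or known in advance), because np.packbits pads the last byte and discards the dimensions.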
