How can I compress a fairly long binary string in Python so that I can access it later?
I have a long array of items (4700) that will ultimately be 1 or 0 when compared to settings in another list. I want to be able to construct a single integer/string item that I can store in some of the metadata such that it can be accessed later in order to uniquely identify the combination of items that goes into it.
I am writing this all in Python. I am thinking of doing something like zlib compression plus a hex conversion, but I am getting confused about how to do the inverse transformation. Assuming bin_string is a string of 1's and 0's, it should look something like this:
import zlib
# Example bin_string; the real one is much longer
bin_string = "1001010010100101010010100101010010101010000010100101010"
# Compress the ASCII bytes of the string, then render them as hex text
compressed = zlib.compress(bin_string.encode())
this_hex = compressed.hex()
where I can then save this_hex to the metadata. The question is, how do I get the original bin_string back from my hex value? I have lots of Python experience with numerical methods and such but little with compression, so any basic insights would be very valuable.
Answers (2)
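Just do the inverse of each operation: convert the hex string back to bytes with bytes.fromhex(), decompress with zlib.decompress(), and decode the result. That will return your original string.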
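As a minimal sketch of that round trip, reusing the names from the question (restored is a name introduced here for illustration):

```python
import zlib

# Example bin_string from the question; the real one is much longer
bin_string = "1001010010100101010010100101010010101010000010100101010"

# Forward direction, as in the question: str -> bytes -> zlib -> hex text
this_hex = zlib.compress(bin_string.encode()).hex()

# Inverse direction, undoing each step in reverse order:
# hex text -> bytes -> decompressed bytes -> str
restored = zlib.decompress(bytes.fromhex(this_hex)).decode()
```

Each call on the last line undoes one call from the forward direction, applied in the opposite order.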
It would be faster, and might even give better compression, to simply encode your bits as bits in a byte string, followed by a terminating one bit and then zeros to pad out the last byte. For your example that would be seven bytes instead of the 22 you're getting from zlib.compress(). zlib would do better only if there is a strong bias toward 0's or 1's, and/or there are repeating patterns in the 0's and 1's.

As for encoding for the metadata, Base64 would be more compact than hexadecimal. Your example would be lKVKVKoKVQ==.
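One possible way to sketch the bit-packing idea, using only the standard library. The helper names pack_bits and unpack_bits are hypothetical, and this is an illustration of the approach described above rather than the answerer's own code:

```python
import base64

def pack_bits(bin_string):
    """Pack a string of '0'/'1' characters into Base64 text."""
    # Append the terminating 1 bit, then zero-pad to a whole number of bytes
    padded = bin_string + "1"
    padded += "0" * (-len(padded) % 8)
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return base64.b64encode(raw).decode("ascii")

def unpack_bits(token):
    """Recover the original '0'/'1' string from pack_bits() output."""
    raw = base64.b64decode(token)
    # Rebuild the full bit string, keeping leading zeros
    bits = bin(int.from_bytes(raw, "big"))[2:].zfill(len(raw) * 8)
    # Drop the zero padding, then the terminating 1 bit
    return bits.rstrip("0")[:-1]

bin_string = "1001010010100101010010100101010010101010000010100101010"
token = pack_bits(bin_string)        # "lKVKVKoKVQ=="
restored = unpack_bits(token)        # equal to bin_string
```

The terminating bit is what lets the decoder tell real trailing zeros from padding. With this scheme, 4700 bits would take 588 bytes, or 784 Base64 characters.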
You should try NumPy's savez_compressed() function. Convert your plain list into a NumPy array and save it with numpy.savez_compressed(). Use numpy.load() to load the .npz file again.
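A minimal sketch of that suggestion, assuming NumPy is installed. The helper names save_bits and load_bits, the file name flags.npz, and the array key bits are all hypothetical choices made for this example:

```python
import os
import tempfile

import numpy as np

def save_bits(bits, path):
    """Store a list of 0/1 values in a compressed .npz archive."""
    # "bits" is an arbitrary key chosen for this example
    np.savez_compressed(path, bits=np.asarray(bits, dtype=np.uint8))

def load_bits(path):
    """Read the list back from the archive written by save_bits()."""
    with np.load(path) as data:
        return data["bits"].tolist()

# Short illustrative list; the real one has 4700 entries
flags = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "flags.npz")
    save_bits(flags, npz_path)
    restored = load_bits(npz_path)
```

Note that this produces a file on disk rather than a single string, so it fits best when the metadata can point to a file.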